Whamcloud - gitweb
fs/lustre-release.git
13 months agoLU-16661 build: add Recommends and Suggests for Debian 96/50396/5
Andreas Dilger [Fri, 24 Mar 2023 00:13:20 +0000 (18:13 -0600)]
LU-16661 build: add Recommends and Suggests for Debian

Add Suggests: bash-completion for lustre-client-tools and
lustre-server-tools for lctl and lfs completion.

Move perl from Depends to Recommends, since there are only some
uncommonly used tools (llstat, llobdstat) that are using perl.

Add python3 to Recommends for lustre-server-tools for lljobstat.
Remove python3 from lustre-iokit since it isn't used anywhere.

Change Maintainer for Debian packages to the lustre-devel mailing
list, instead of someone who hasn't worked on Lustre for 6 years.

Increase minimum kernel version for client from 2.6.32 to 3.10.

Improve package descriptions slightly.

Test-Parameters: trivial testlist=runtests clientdistro=ubuntu2204
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I43248cc78ae6a47ad77817c27ba11de25b3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50396
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
13 months agoLU-15740 tests: scale fs_log_size by OSTCOUNT 19/50419/8
Andreas Dilger [Fri, 24 Mar 2023 23:09:44 +0000 (17:09 -0600)]
LU-15740 tests: scale fs_log_size by OSTCOUNT

The fs_log_size "free space skew" was being scaled by MDSCOUNT,
but in fact this parameter is only ever used to compare the OST
free space usage, so the OSTCOUNT should be used when scaling it.

It is likely that the skew is actually caused by blocks allocated
by OST object directories and not llogs (llogs have not been used on
OSTs for many years), but it isn't worthwhile to rename the function.

Test-Parameters: trivial testlist=replay-single env=ONLY="20b 89"
Test-Parameters: testlist=runtests clientdistro=ubuntu2204
Test-Parameters: testlist=replay-ost-single env=ONLY="6 7"
Test-Parameters: testlist=sanity-sec env=ONLY="16-22"
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I97f05b10fa7ec367534b5bdce09feae5e93ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50419
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-16516 tests: ONLY and ONLY_REPEAT improvements 95/50395/3
Andreas Dilger [Thu, 23 Mar 2023 21:38:54 +0000 (15:38 -0600)]
LU-16516 tests: ONLY and ONLY_REPEAT improvements

ONLY_REPEAT=N did not work if multiple subtests were selected via
a "base" test number (e.g. "ONLY=118") and that resulted in more
than one subtest being run (e.g. ONLY_118a=true, ONLY_118b=true, ...).
Since the run_one() caller of run_one_logged() is already checking
whether the test should be run or not, don't repeat that check
for ONLY_REPEAT.

Allow ONLY_REPEAT to be used when multiple subtests are specified
via ONLY, even if the subtests are not explicitly listed.

Allow tests in ONLY, EXCEPT, ALWAYS_EXCEPT, and SLOW to be separated
by '+' or ',' in addition to spaces.  That avoids issues with handling
space-separated test lists in the shell or when they are specified via
Test-Parameters ('+' must be used in that case).

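The separator handling described above can be illustrated with a short C
sketch. This is only an illustration: test-framework.sh does the real
parsing in shell, and the helper name split_test_list() and the fixed
buffer sizes are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split a test list such as "118+120,27 31" on any
 * of ' ', ',' or '+' and copy up to max_tests entries into out[].
 * Returns the number of tests found. Not reentrant (uses strtok()).
 */
static size_t split_test_list(const char *list, char out[][16],
			      size_t max_tests)
{
	char buf[256];
	size_t n = 0;
	char *tok;

	/* strtok() modifies its input, so work on a bounded copy */
	snprintf(buf, sizeof(buf), "%s", list);

	for (tok = strtok(buf, " ,+"); tok != NULL && n < max_tests;
	     tok = strtok(NULL, " ,+")) {
		snprintf(out[n], sizeof(out[n]), "%s", tok);
		n++;
	}
	return n;
}
```

For example, "118+120,27 31" yields the four entries 118, 120, 27 and 31;
runs of separators are skipped because strtok() treats consecutive
delimiters as one.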
Test-Parameters: trivial testlist=sanity env=ONLY=118,ONLY_REPEAT=5
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieac578c098ae76994a211c7db094dd99923bcc8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50395
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16163 tests: skip racer_on_nfs for NFSv3 85/50385/2
Andreas Dilger [Wed, 22 Mar 2023 23:42:10 +0000 (17:42 -0600)]
LU-16163 tests: skip racer_on_nfs for NFSv3

This test is continually failing and nobody is available to
fix it (it may be an NFS bug or a Lustre bug, unsure).

This same test passes on NFSv4 regularly.

Test-Parameters: trivial testlist=parallel-scale-nfsv3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7d9ac390c26aa8478dd35457ba20061747c2b92e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50385
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-9859 libcfs: discard MKSTR() macro 76/50376/2
Mr. NeilBrown [Wed, 22 Mar 2023 14:33:37 +0000 (10:33 -0400)]
LU-9859 libcfs: discard MKSTR() macro

This is only used for tracing when some strings might
be NULL.  NULL strings are not a problem for tracing;
vsnprintf() will report them as "(null)", which is probably
better (easier to parse) than an empty string.

Linux-commit: dd0393a5f29633f0e3d52e4c26ae4123c873c016

Test-Parameters: trivial
Change-Id: Ia305ea0d0dc05602a03dea589f928b6a599ee55e
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50376
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16643 lnet: Health logging improvements 05/50305/7
Chris Horn [Wed, 23 Nov 2022 17:28:45 +0000 (10:28 -0700)]
LU-16643 lnet: Health logging improvements

LNet health activity can generate noise in console logs. NI/peer NI
recovery pings may be expected to fail, and the related messages
from lnet_handle_recovery_reply() are generally redundant.

Improve this logging by having the lnet_monitor_thread() provide a
summary of NIs in recovery.

Another useful metric for spotting network trouble is whether
messages are exceeding their deadline. We do not currently log this
information. Keep a count of messages that have exceeded their
deadline and track the total excess time. The lnet_monitor_thread()
will then provide a summary of the number of messages and their
average excess time at a regular interval. These stats are then
reset when the monitor thread prints this information to the console.

Because NIs can be in recovery for extended periods of time, the
interval of console updates will increase from 1 to 5 minutes.
The interval is reset when it is detected that there are no longer any
NIs in recovery and there haven't been any messages past their
deadline since the last console update.
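The counting and interval logic described above might look roughly like
the following C sketch. All names are hypothetical, and the sketch
assumes the interval simply switches between 60 and 300 seconds; the
real lnet_monitor_thread() code may step the interval differently.

```c
#include <stdint.h>

/* Hypothetical console-update intervals, in seconds. */
#define MONITOR_UPDATE_MIN_SEC 60	/* 1 minute */
#define MONITOR_UPDATE_MAX_SEC 300	/* 5 minutes */

/* Hypothetical per-interval counters for messages past their deadline. */
struct deadline_stats {
	uint64_t late_msgs;		/* messages that exceeded their deadline */
	uint64_t total_excess_ms;	/* summed time past the deadline */
};

/* Called when a message is found to have exceeded its deadline. */
static void record_late_msg(struct deadline_stats *s, uint64_t excess_ms)
{
	s->late_msgs++;
	s->total_excess_ms += excess_ms;
}

/* Back off while NIs are in recovery or messages were late since the
 * last update; otherwise return to the minimum interval. */
static unsigned int next_update_interval(unsigned int nis_in_recovery,
					 const struct deadline_stats *s)
{
	if (nis_in_recovery == 0 && s->late_msgs == 0)
		return MONITOR_UPDATE_MIN_SEC;
	return MONITOR_UPDATE_MAX_SEC;
}

/* Return the average excess time (ms) and reset the counters, as is
 * done when the summary is printed to the console. */
static uint64_t summarize_and_reset(struct deadline_stats *s)
{
	uint64_t avg = s->late_msgs ? s->total_excess_ms / s->late_msgs : 0;

	s->late_msgs = 0;
	s->total_excess_ms = 0;
	return avg;
}
```

A caller would choose the next interval before calling
summarize_and_reset(), since the reset clears the late-message count
that the interval decision depends on.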

Test-Parameters: trivial
HPE-bug-id: LUS-11500
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4ffffd0412806184282178ce0aca3073dd30d7e0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50305
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16260 lnet: enforce a positive minimum for lnd_timeout 36/50236/3
Frank Sehr [Thu, 9 Mar 2023 01:36:37 +0000 (17:36 -0800)]
LU-16260 lnet: enforce a positive minimum for lnd_timeout

Set the lnet_lnd_timeout to at least 1 second. The lnd_timeout is
calculated using the following formula:
max((lnet_transaction_timeout - 1) / (lnet_retry_count + 1), 1U);
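The formula above can be written as a small C function. The name
calc_lnd_timeout() is hypothetical and is not the LNet function that
performs this computation; the zero-timeout guard is an addition for
the sketch.

```c
/*
 * Sketch of: max((lnet_transaction_timeout - 1) / (lnet_retry_count + 1), 1U)
 * All values are in seconds.
 */
static unsigned int calc_lnd_timeout(unsigned int transaction_timeout,
				     unsigned int retry_count)
{
	unsigned int timeout;

	/* Avoid unsigned wrap-around of (transaction_timeout - 1) when it
	 * is 0; for a value of 1 the formula already yields the floor. */
	if (transaction_timeout <= 1)
		return 1U;

	timeout = (transaction_timeout - 1) / (retry_count + 1);

	/* enforce the positive 1 second minimum */
	return timeout > 1U ? timeout : 1U;
}
```

For example, a 50 second transaction timeout with 3 retries gives
49 / 4 = 12 seconds, while a 5 second timeout with 10 retries gives
4 / 11 = 0 in integer division, which the minimum raises to 1 second.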

Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: I64fd133974bd1f60ff3d7354bf9e0990c56d4c04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50236
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16529 test: wait quota synced on quota slaves 97/50197/5
Hongchao Zhang [Wed, 1 Mar 2023 00:48:28 +0000 (19:48 -0500)]
LU-16529 test: wait quota synced on quota slaves

Check and wait for the quota settings to be synchronized on
quota slaves before running the actual sanity-quota test_84.

Test-Parameters: trivial testlist=sanity-quota env=ONLY=84,ONLY_REPEAT=100
Fixes: a2fd4d3aee ("LU-15880 quota: fix insane grant quota")
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I7752bff33f24d1d38dc340b2addbfc98d6f7c857
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50197
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-13485 build: fscrypt checks can be run in parallel 10/50110/6
Shaun Tancheff [Fri, 24 Mar 2023 10:49:24 +0000 (05:49 -0500)]
LU-13485 build: fscrypt checks can be run in parallel

Run some fscrypt checks in parallel.

Test-Parameters: trivial
HPE-bug-id: LUS-8584
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib4b882105958cdc6e47997992e1e978cfa01adf5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50110
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-16518 utils: fix unused function errors 01/49901/5
Timothy Day [Fri, 3 Feb 2023 03:32:59 +0000 (03:32 +0000)]
LU-16518 utils: fix unused function errors

Clang has default errors related to unused functions.
The errors related to 'fid_flatten' and 'fid_flatten32'
were resolved by moving the definitions of these
functions to the 'lustre_fid' header. This is a better
place for them, since they are small 'static inline'
functions, and this has the added benefit of cutting down
code duplication.

The error related to the 'static inline' function
'list_replace_init' was resolved by moving it to
'ofd_access_batch.h'.

The userspace implementation of 'fid_hash' has been
moved to the 'lustreapi.h' header.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I9714a2f36910c871c0a4579cf9400cb9ba72ec27
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49901
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-13485 libcfs: Remove unused iter_type check 91/48091/7
Shaun Tancheff [Fri, 23 Sep 2022 05:27:14 +0000 (12:27 +0700)]
LU-13485 libcfs: Remove unused iter_type check

The iter_type member check is not used; remove it.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I48d536a27738e73314feb88317d41d8479c72528
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48091
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-15660 statahead: statahead thread doesn't stop 73/47673/12
Yang Sheng [Fri, 17 Jun 2022 12:30:34 +0000 (20:30 +0800)]
LU-15660 statahead: statahead thread doesn't stop

Add a barrier to ensure that changes to sai_task can be seen
when it is accessed without locking. Otherwise, the statahead
thread could sleep forever because a wake_up was lost.

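The commit does not show where the barrier is placed, so the sketch
below only illustrates the general lost-wakeup pattern such a barrier
guards against, using C11 atomics in user space rather than kernel
primitives such as smp_mb(). The structure and function names are
hypothetical: stop_requested stands in for the sai_task change and
waiter_sleeping for the statahead thread preparing to sleep.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state between a stopper and the statahead thread. */
struct sa_state {
	atomic_bool stop_requested;	/* e.g. sai_task being changed */
	atomic_bool waiter_sleeping;	/* thread about to sleep */
};

/* Stopper side: returns true if it must issue a wakeup. */
static bool request_stop(struct sa_state *s)
{
	atomic_store_explicit(&s->stop_requested, true, memory_order_relaxed);
	/* full fence between the store and the load, like smp_mb() */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&s->waiter_sleeping, memory_order_relaxed);
}

/* Statahead side: returns true if it may go to sleep. */
static bool prepare_to_sleep(struct sa_state *s)
{
	atomic_store_explicit(&s->waiter_sleeping, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_load_explicit(&s->stop_requested, memory_order_relaxed);
}
```

With a sequentially consistent fence on both sides, at least one side is
guaranteed to observe the other's store, so either the stopper issues a
wakeup or the waiter declines to sleep. Without the fences, both loads
may see stale values and the wakeup can be lost.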
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I211e99f1bdddaaaf028a205658f603fda034d389
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47673
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-13485 lnet: Parallel configure tests for lnet 68/38368/35
Shaun Tancheff [Mon, 3 Oct 2022 05:10:14 +0000 (12:10 +0700)]
LU-13485 lnet: Parallel configure tests for lnet

Transform the compile tests in lustre-lnet to run in parallel.
Also fix the generated Makefile to work with MOFED and in-kernel
OFED.

configure run times on an 8-core, 8G VM, current serial vs. parallel:

             serial      parallel
            --------     --------
    real    8m27.824s    1m28.375s
    user    5m29.448s    2m11.558s
    sys     3m48.258s    0m51.763s
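The wall-clock (real) improvement in the table above works out to about
5.7x. The helpers below are a trivial, hypothetical way to reproduce
that arithmetic from the time(1) values.

```c
/* Convert a time(1) value such as "8m27.824s" (minutes, seconds) to seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Ratio of serial to parallel elapsed time. */
static double speedup(double serial_s, double parallel_s)
{
	return serial_s / parallel_s;
}
```

Applied to the real row, this gives 507.824 s / 88.375 s, or roughly a
5.7x reduction in configure wall-clock time.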

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4f0cb8584e1c3149ec3f005dd55fed0c47b50472
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38368
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-6142 build: add SPDX/GPL license to build files 47/50347/3
Timothy Day [Tue, 21 Mar 2023 03:31:05 +0000 (03:31 +0000)]
LU-6142 build: add SPDX/GPL license to build files

Update the file header to have the SPDX license and
use the standard format.

Convert spaces to tabs.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id02218aa5b435bc0de96a39d3daa53a83a51c857
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50347
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-16371 ldlm: clear lock converting flag on resource cleanup 39/49339/3
Bobi Jam [Wed, 7 Dec 2022 16:03:20 +0000 (00:03 +0800)]
LU-16371 ldlm: clear lock converting flag on resource cleanup

During resource cleanup, clear the lock's converting flag so that
ldlm_cli_cancel() won't erroneously trip the assertion, which is
intended for normal lock revoke callbacks.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I1be4d7f16dbc7e026b460fd5358a0fe509b97a59
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49339
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16658 tests: disable performance-sanity test_6 86/50386/2
Andreas Dilger [Wed, 22 Mar 2023 23:59:27 +0000 (17:59 -0600)]
LU-16658 tests: disable performance-sanity test_6

This test is likely failing due to a bug in mdsrate, which is no
longer actively developed.  It should be replaced by mdtest.

Test-Parameters: trivial testlist=performance-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I05378fb75ed30e56983f4668c03725824ad5a8ab
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50386
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16634 build: improve checkpatch warnings 31/50331/8
Andreas Dilger [Sat, 18 Mar 2023 01:03:49 +0000 (19:03 -0600)]
LU-16634 build: improve checkpatch warnings

Change checkpatch.pl to allow RETURN/GOTO as "end of switch case".

Improve CERROR/CWARN/LCONSOLE/CDEBUG message checking/warning to
print more useful message style advice than just "think hard".

Allow "DFID|DOSTID" within long error strings without complaint.

Add a spelling.txt rule to warn if version checks are added in a
test for future versions.  This is mostly useful for maintenance
branches when patches are being backported with test cases.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1a0d2f839949debf346aa15c65b0f407e0ce7057
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50331
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-15404 ldiskfs: use per-filesystem workqueues to avoid deadlocks 54/50354/3
Andrew Perepechko [Tue, 21 Mar 2023 12:30:58 +0000 (08:30 -0400)]
LU-15404 ldiskfs: use per-filesystem workqueues to avoid deadlocks

Calling flush_scheduled_work() under s_umount is dangerous and may
cause deadlocks. This patch backports the fix from
https://lore.kernel.org/all/20220402084023.1841375-1-anserper@ya.ru/

Fixes: e239a14001 ("LU-15404 ldiskfs: truncate during setxattr leads to kernel panic")
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Change-Id: Ia191b70166f94f34e96a282ec18bd8650871e108
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50354
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-16683 tests: fix sanity-sec test_61 for SSK 76/50476/4
Sebastien Buisson [Thu, 30 Mar 2023 11:42:58 +0000 (13:42 +0200)]
LU-16683 tests: fix sanity-sec test_61 for SSK

When SHARED_KEY is in use, nodemap-specific shared keys must be loaded
explicitly because sanity-sec test_61 defines a nodemap dedicated to
the client.

Fixes: a7222127c7 ("LU-16642 tests: improve sanity-sec test_61")
Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec env=ONLY=61
Test-Parameters: testlist=sanity-sec env=SHARED_KEY=true,ONLY=61
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I206205496352b6f36341c8b962bb7de4b71541d5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50476
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-16515 tests: disable sanity test_118c/118d 70/50470/2
Andreas Dilger [Wed, 29 Mar 2023 21:39:50 +0000 (15:39 -0600)]
LU-16515 tests: disable sanity test_118c/118d

Temporarily disable sanity test_118c and test_118d until a fix is
available, since they are failing a large fraction of test runs.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I16ebbc470a126bb99b5c3ecdf93407d6b73ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50470
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16642 tests: improve sanity-sec test_61 17/50317/2
Sebastien Buisson [Thu, 16 Mar 2023 16:59:59 +0000 (17:59 +0100)]
LU-16642 tests: improve sanity-sec test_61

Improve sanity-sec test_61 by using a client-specific nodemap rather
than the default nodemap.

Test-Parameters: trivial testlist=sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie0c9e381e42a93d89558947dee9a60537cf01e65
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50317
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16639 misc: cleanup console messages 83/50283/7
Andreas Dilger [Mon, 13 Mar 2023 22:08:30 +0000 (16:08 -0600)]
LU-16639 misc: cleanup console messages

lprocfs_job_cleanup() was not properly dropping all jobstats
from the hash table, which caused job_stat_exit() to print errors
at unmount.  Ensure all stats are "old enough" when @clear is set.

Change early libcfs cfs_cpu_init() messages from CERROR() to
pr_err() to avoid circular dependencies on libcfs setup before
printing an error message to the console during module init.

Test-Parameters: trivial
Fixes: ea2cd3af7b ("LU-11407 obdclass: add start time to stats files")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ide3f502103392a79419cc1836200bf5a1a3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50283
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16589 tests: add sanity/31l to test ln command 65/50265/2
Andreas Dilger [Mon, 13 Mar 2023 04:43:53 +0000 (21:43 -0700)]
LU-16589 tests: add sanity/31l to test ln command

This patch adds a new subtest sanity/31l to test
hard-linking a file to a target directory that has
a trailing "/".

The subtest will be skipped if the coreutils version is
>= 8.31 and the kernel version is < 5.18, because the
coreutils commit v8.30-18-g571f63f5010b reveals
a kernel issue, which is fixed by kernel commit
v5.18-rc2-188-gb3d4650d82c7.
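The skip condition above can be expressed as a small C predicate. The
real check lives in the shell test script; this hypothetical sketch
compares only major.minor version numbers.

```c
#include <stdbool.h>

/* Compare two major.minor versions; returns <0, 0 or >0 like strcmp(). */
static int ver_cmp(int a_major, int a_minor, int b_major, int b_minor)
{
	if (a_major != b_major)
		return a_major < b_major ? -1 : 1;
	if (a_minor != b_minor)
		return a_minor < b_minor ? -1 : 1;
	return 0;
}

/* Skip sanity/31l when coreutils >= 8.31 and the kernel is < 5.18. */
static bool should_skip_31l(int cu_major, int cu_minor,
			    int k_major, int k_minor)
{
	return ver_cmp(cu_major, cu_minor, 8, 31) >= 0 &&
	       ver_cmp(k_major, k_minor, 5, 18) < 0;
}
```

Both conditions must hold: an older coreutils does not trigger the
kernel issue, and a 5.18 or newer kernel contains the fix.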

Test-Parameters: trivial clientdistro=el7.9 env=ONLY=31l testlist=sanity
Test-Parameters: trivial clientdistro=el8.7 env=ONLY=31l testlist=sanity
Test-Parameters: trivial clientdistro=el9.1 env=ONLY=31l testlist=sanity
Test-Parameters: trivial clientdistro=el9.0 env=ONLY=31l testlist=sanity
Test-Parameters: trivial clientdistro=sles15sp4 env=ONLY=31l testlist=sanity
Test-Parameters: trivial clientdistro=sles15sp3 env=ONLY=31l testlist=sanity

Change-Id: I45d7c277a37fa538d5137150bfc7ba1704052873
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50265
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-930 docs: fix whatis output 64/50264/4
Timothy Day [Sun, 12 Mar 2023 15:19:54 +0000 (15:19 +0000)]
LU-930 docs: fix whatis output

The ".SH NAME" section has to be formatted in a certain
way for whatis and apropos to work correctly. Otherwise,
users will just see "(unknown subject)".

This patch fixes issues for all man pages.

Add a couple of one-line man page redirects.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ie11eb921c84ff9ad19b50973c616f6fb6df1f461
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16632 tests: more margin of error for sanity/56xh 62/50262/4
Timothy Day [Sat, 11 Mar 2023 22:55:09 +0000 (22:55 +0000)]
LU-16632 tests: more margin of error for sanity/56xh

Give sanity test_56xh more time to migrate files inside the
VMs before failing.

Also, fix a typo.

Test-Parameters: trivial testlist=sanity env=ONLY=56xh,ONLY_REPEAT=100
Fixes: 55968bfabe ("LU-13482 utils: bandwidth limit for lfs migrate")
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If89c8c3ee113c8a14d4c0463c7bb79e353130c08
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50262
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
13 months agoLU-16633 obdclass: fix rpc slot leakage 61/50261/12
Alex Zhuravlev [Fri, 10 Mar 2023 17:47:05 +0000 (20:47 +0300)]
LU-16633 obdclass: fix rpc slot leakage

obd_get_mod_rpc_slot() can race with obd_put_mod_rpc_slot():
finishing wait_woken() resets WQ_FLAG_WOKEN (which is set
when the corresponding thread gets a slot, incrementing
cl_mod_rpcs_in_flight). Then another thread executing
__wake_up_locked_key() may find that wq_entry again and call
claim_mod_rpc_function() one more time, again incrementing
cl_mod_rpcs_in_flight. Thus it is incremented twice for a
single obd_get_mod_rpc_slot().

 #1: obd_get_mod_rpc_slot()       #2: obd_put_mod_rpc_slot()
 flags &= ~WQ_FLAG_WOKEN
 list_add()
 wait_woken()
   schedule                       claim_mod_rpc_function()
                                    cl_mod_rpcs_in_flight++
                                  wake_up()

   flags &= ~WQ_FLAG_WOKEN

                                  #3: obd_put_mod_rpc_slot()
                                  claim_mod_rpc_function()
                                    cl_mod_rpcs_in_flight++
                                  wake_up()
 list_del()

the patch introduces a replacement for WQ_FLAG_WOKEN which is never
reset once set.
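The effect of replacing a resettable flag with a sticky one can be shown
with a simplified, single-threaded C model of the sequence above. All
names are hypothetical, the real code runs with real threads under the
wait queue lock, and the assumption that the grant is keyed on
WQ_FLAG_WOKEN in exactly this way is made only for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified waiter entry. */
struct slot_waiter {
	bool woken;	/* like WQ_FLAG_WOKEN: cleared when wait_woken() finishes */
	bool claimed;	/* like the replacement flag: never reset once set */
};

/* Old behaviour: a grant keyed on 'woken' can happen a second time
 * after the waiter has cleared the flag. */
static bool try_claim_slot_buggy(struct slot_waiter *w, unsigned int *in_flight)
{
	if (w->woken)
		return false;
	w->woken = true;
	(*in_flight)++;
	return true;
}

/* Fixed behaviour: the sticky 'claimed' flag allows only one grant. */
static bool try_claim_slot(struct slot_waiter *w, unsigned int *in_flight)
{
	if (w->claimed)
		return false;	/* slot already counted for this waiter */
	w->claimed = true;
	w->woken = true;
	(*in_flight)++;
	return true;
}

/* Waiter side: returning from wait_woken() clears only the woken flag. */
static void waiter_resume(struct slot_waiter *w)
{
	w->woken = false;
}
```

Replaying #2, the waiter's resume, and #3 against the buggy version
increments the in-flight counter twice; with the sticky flag the second
grant is refused and the counter stays at one.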

Fixes: 5243630b09 ("LU-15947 obdclass: improve precision of wakeups for mod_rpcs")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I29371c8c85414413c5a8e41dec3632f64ad127bb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50261
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-14291 batch: don't include lustre_update.h for client only builds 58/50258/2
James Simmons [Fri, 10 Mar 2023 14:12:27 +0000 (09:12 -0500)]
LU-14291 batch: don't include lustre_update.h for client only builds

The header lustre_update.h contains a huge amount of server-only
code. Do not include lustre_update.h for client-only builds, and
instead directly include only what the client needs.

Test-Parameters: trivial
Change-Id: I84dc39672340045bde09249d98f32aa9abec63b8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50258
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16615 utils: add messages in l_getidentity 13/50213/3
Lai Siyao [Wed, 18 Jan 2023 00:23:05 +0000 (19:23 -0500)]
LU-16615 utils: add messages in l_getidentity

Add timing-related messages to l_getidentity to help debug
timeouts, which may cause -EACCES errors in user applications.

Test-Parameters: trivial
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I87ebfb85d05e19886d8becc6b14ed0233eaed42d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50213
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16601 kernel: update SLES15 SP4 [5.14.21-150400.24.46.1] 79/50179/2
Jian Yu [Thu, 2 Mar 2023 02:34:18 +0000 (18:34 -0800)]
LU-16601 kernel: update SLES15 SP4 [5.14.21-150400.24.46.1]

Update SLES15 SP4 kernel to 5.14.21-150400.24.46.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp4 testlist=sanity

Change-Id: I5b9e39359e61e929adaeddece60f4d247996a00a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50179
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16599 obdclass: job_stats can parse escaped jobid string 60/50160/7
Lei Feng [Wed, 1 Mar 2023 00:16:03 +0000 (08:16 +0800)]
LU-16599 obdclass: job_stats can parse escaped jobid string

Writing a jobid to the job_stats proc entry asks Lustre to clear
the stats of that specific jobid. Since job_stats outputs an
escaped jobid string in some cases, it should be able to parse
an escaped jobid string when one is written to it.

Test-Parameters: trivial
Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Idbc63dac6c3b35331317927107e634a3d638dd66
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50160
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-14668 lnet: add 'lock_prim_nid' lnet module parameter 59/50159/8
Serguei Smirnov [Tue, 28 Feb 2023 23:02:20 +0000 (15:02 -0800)]
LU-14668 lnet: add 'lock_prim_nid' lnet module parameter

Add the 'lock_prim_nid' lnet module parameter to allow control
of how the Lustre peer primary NID is selected.
If set to 1 (default), the NID specified by Lustre when
calling the LNet API is designated as primary for the peer,
allowing non-blocking discovery in the background.
If set to 0, peer discovery blocks until it is complete,
and the NID listed first in the discovery response is designated
as primary.

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I6ed1cb0c637f4aa7a7340a6f01819ba9a85858f4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50159
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16598 osp: cleanup comment in osp_sync.c 46/50146/2
Li Xi [Mon, 27 Feb 2023 14:47:59 +0000 (22:47 +0800)]
LU-16598 osp: cleanup comment in osp_sync.c

The comment in osp_sync.c is outdated and can be cleaned up
a little to better explain the implementation.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I60b39ab5f7360521258200cf55e5c85373cf4aa2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50146
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16595 test: save one second in wait_destroy_complete() 44/50144/2
Li Xi [Mon, 27 Feb 2023 03:22:28 +0000 (11:22 +0800)]
LU-16595 test: save one second in wait_destroy_complete()

In wait_destroy_complete(), there is no need to wait another
second once all in-flight destroys have finished.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I351616ecf261f1e77c3f8d61f5541a51e327fa83
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50144
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16563 lnet: use discovered ni status to set initial health 27/50027/8
Serguei Smirnov [Thu, 16 Feb 2023 18:34:03 +0000 (10:34 -0800)]
LU-16563 lnet: use discovered ni status to set initial health

If not routing, track the local NI status in the ping buffer
so that a locally recognized "down" state (for example, due to
a downed network interface/link) is available to any
discovering peer.
If the NI 'fatal' status changes, push an update to peers.

On the active side of discovery, check the peer NI status; if the
NI is down, decrement its health score and queue it for recovery.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I513c7942099c0da9088fa6d4460f76386ea91d3b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50027
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-16221 kernel: update RHEL 9.1 [5.14.0-162.18.1.el9_1] 77/50177/4
Jian Yu [Fri, 10 Mar 2023 18:50:00 +0000 (10:50 -0800)]
LU-16221 kernel: update RHEL 9.1 [5.14.0-162.18.1.el9_1]

Update RHEL 9.1 kernel to 5.14.0-162.18.1.el9_1 for Lustre client.

Test-Parameters: trivial clientdistro=el9.1 testlist=sanity
Test-Parameters: trivial serverdistro=el8.7 clientdistro=el9.1 \
testlist=sanity

Change-Id: I032f69f1ecba60248729bb856a3aad78e5f05680
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50177
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-15053 tests: reset quota if ENABLE_QUOTA=1 23/49823/5
Sergey Cheremencev [Mon, 30 Jan 2023 17:33:08 +0000 (20:33 +0300)]
LU-15053 tests: reset quota if ENABLE_QUOTA=1

Quota limits set in setup_quota() with ENABLE_QUOTA=1
should be cleaned up at the end to avoid failures in
subsequent sessions.

Test-Parameters: testlist=sanity-quota env=ENABLE_QUOTA=yes
Test-Parameters: testgroup=review-dne-part-4 env=ENABLE_QUOTA=yes
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ia6b034739cfe800c6661f199420d0a4dbe7110fc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49823
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-16382 build: udev files in /usr/lib 69/49369/7
Mr NeilBrown [Mon, 13 Mar 2023 18:56:40 +0000 (14:56 -0400)]
LU-16382 build: udev files in /usr/lib

udev rules files should go in /usr/lib/udev/rules.d;
/etc/udev/rules.d is meant for local configuration.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I525d25c54903c25d19b5909231e21e7a3a717d9b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49369
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16338 readahead: clip readahead with kms 26/49226/21
Qian Yingjin [Wed, 23 Nov 2022 13:03:41 +0000 (08:03 -0500)]
LU-16338 readahead: clip readahead with kms

During I/O testing, it was found that the number of readahead pages
reached 255 for small files of only a few KiB, so more than 1MiB of
data was read.
The reason is that the granted DLM extent lock is [0, EOF], which
is larger than the requested extent. During readahead, the OSC
layer also returns the [0, EOF] extent, which is then clipped to the
stripe size (1MiB) regardless of the actual object size.
In this patch, the readahead range is clipped to the known minimum
size (kms) in the OSC layer during readahead. This way, readahead
will not go beyond the last page of the file.
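The clipping step can be sketched as follows, assuming 4 KiB pages and an inclusive [start, end] readahead window expressed in page indices. The function name and the page-shift macro are hypothetical and do not come from the OSC code.

```c
#include <stdint.h>

#define RA_PAGE_SHIFT 12	/* assumed 4 KiB pages, for illustration only */

/*
 * Clip a readahead window [start, end] (inclusive page indices) to the
 * last page covered by kms.  Returns the new end index, or -1 if nothing
 * should be read ahead (empty object, or start already beyond EOF).
 */
static int64_t ra_clip_end_to_kms(uint64_t start, uint64_t end, uint64_t kms)
{
	uint64_t last;

	if (kms == 0)
		return -1;
	last = (kms - 1) >> RA_PAGE_SHIFT;	/* page holding the last byte */
	if (start > last)
		return -1;
	return (int64_t)(end < last ? end : last);
}
```

With an 8 KiB file and a 1 MiB window of pages 0..255, the window is clipped to pages 0..1 instead of reading all 256 pages.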

Add sanity/101m to verify it.

This patch also fixes multiop to return successfully when reaching
EOF instead of exiting with ENODATA during read.

Test-Parameters: testlist=sanity env=ONLY=101k,ONLY_REPEAT=3
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I285e3e1d84ad06231039306106c74d775c1b0b50
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49226
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-13107 utils: remove duplicate lctl erase/fork_lcfg 86/48886/3
Andreas Dilger [Wed, 13 Apr 2022 01:24:23 +0000 (19:24 -0600)]
LU-13107 utils: remove duplicate lctl erase/fork_lcfg

A patch merge error resulted in duplicate erase_lcfg and fork_lcfg
sub-commands in lctl.  Remove the duplicates, and move them to the
llog section, since they relate to the configuration llogs.

Fixes: b0efebdaef52 ("LU-13107 utils: clean up lctl command usage")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4449f7dbb0ab7b643e5057131bbc9620ac457a3d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16217 iokit: Add lst.sh wrapper and lst-survey 99/48799/5
Chris Horn [Tue, 4 Oct 2022 10:05:15 +0000 (05:05 -0500)]
LU-16217 iokit: Add lst.sh wrapper and lst-survey

lst.sh is a wrapper around the LNet selftest (lst) utility. It
provides a streamlined interface for executing read, write, combined
read/write and ping lst tests.

lst-survey leverages lst.sh to test the performance of groups of LNet
peers against each other.

HPE-bug-id: LUS-10279
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4c2593df1289b0b97760cb402de1e101ca22c319
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48799
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-12805 tests: disable replay-single/36 91/36291/13
Alex Zhuravlev [Wed, 25 Sep 2019 18:00:40 +0000 (21:00 +0300)]
LU-12805 tests: disable replay-single/36

The test is broken because it checks for a server-side message
on the client. It fails consistently when changed to check in
the correct place (on the server).
See LU-12805 for details.

Test-Parameters: trivial
Test-Parameters: testlist=replay-single
Change-Id: I70db3994ba51076ce9a8ef47efded1acb4ddaf52
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/36291
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16604 kfilnd: kfilnd_peer ref leak on send 57/50157/2
Chris Horn [Tue, 28 Feb 2023 20:09:57 +0000 (13:09 -0700)]
LU-16604 kfilnd: kfilnd_peer ref leak on send

There is an extra refcount_inc() done by kfilnd_tn_alloc_for_peer().
This is correct when we are allocating a TN for a HELLO request,
because our caller does not take an extra ref on kfilnd_peer, but
it is wrong in the normal kfilnd_tn_alloc() path, because
kfilnd_tn_alloc() already takes this reference via a call to
kfilnd_peer_get().

Move the refcount_inc() from kfilnd_tn_alloc_for_peer() to
kfilnd_send_hello_request() where it is needed.
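The reference-ownership rule behind the fix can be illustrated with a toy counter. All names below are hypothetical simplifications of the kfilnd functions mentioned above. The point is that each path takes exactly one reference, so freeing the transaction returns the count to its starting value.

```c
/* Hypothetical model of the peer refcount; not the kfilnd structures. */
struct peer_model { int refs; };

static void peer_get(struct peer_model *p) { p->refs++; }
static void peer_put(struct peer_model *p) { p->refs--; }

/* After the fix: the common allocator no longer takes its own reference. */
static void tn_alloc_for_peer(struct peer_model *p)
{
	(void)p;	/* no refcount_inc() here any more */
}

/* Normal path: the reference comes from peer_get(), as in kfilnd_tn_alloc(). */
static void tn_alloc(struct peer_model *p)
{
	peer_get(p);
	tn_alloc_for_peer(p);
}

/* HELLO path: the caller holds no extra reference, so take one here. */
static void send_hello_request(struct peer_model *p)
{
	peer_get(p);
	tn_alloc_for_peer(p);
}

/* Freeing a transaction drops exactly one reference. */
static void tn_free(struct peer_model *p) { peer_put(p); }
```

If tn_alloc_for_peer() still incremented the count, every normal-path allocate/free cycle would leave one reference behind.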

Test-Parameters: trivial
HPE-bug-id: LUS-11128
Fixes: 11a32d886b ("LU-16213 kfilnd: Allow one HELLO in-flight per peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3d723a829ec42929ce22a80ffda97dbd87917d4b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50157
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-9680 lnet: handle multi-rail setups 26/50026/5
James Simmons [Tue, 7 Mar 2023 17:44:38 +0000 (12:44 -0500)]
LU-9680 lnet: handle multi-rail setups

For multi-rail setups we can push more than one interface at a
time to set up the local NIs, but our netlink code ignored all but
one interface. Refactor both lnet_genl_parse_local_ni() and
lnet_net_cmd() to set up all of the passed-in interfaces. Also stop
setting ni to NULL in the NI deletion case, which causes an oops
when we have more than one interface.

Lastly, rework the Netlink userland library code to properly pack
netlink packets sent to the kernel. We were treating YAML mappings
the same as YAML sequences. This is wrong, so we now handle each
case separately: mappings are translated into a nested collection
of data, and sequences are arrays of these collections. This ends
up packing a nested collection inside another nested collection.
Previously we didn't have this layering, which led to improper
packing.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Icb220127fdabfc5ebf4bb848cf2715048c40f674
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50026
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH 24/38424/28
Li Dongyang [Mon, 22 Nov 2021 11:43:03 +0000 (22:43 +1100)]
LU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH

Reduce LUSTRE_DATA_SEQ_MAX_WIDTH from ~4B to ~32M
to limit the number of objects under /O/[seq]/d[0..31]
dir on OSTs.
This keeps the directories optimal for ldiskfs and avoids
going into the largedir/3-level htree territory.

Remove the hard-coded LUSTRE_DATA_SEQ_MAX_WIDTH checks
in ofd and check seq->lcs_width instead. This is a
tunable that defaults to LUSTRE_DATA_SEQ_MAX_WIDTH and
can be raised up to IDIF_MAX_OID if a larger seq width
is needed.

Use obdo->o_size in the OST_CREATE RPC reply from ofd
to update osp with the current seq width setting.
osp then uses this seq width to determine when to roll over
to a new seq.

The seq will roll over when the seq width is exhausted;
the default width is LUSTRE_DATA_SEQ_MAX_WIDTH.
For seq >= FID_SEQ_NORMAL objects, the upper limit of the
seq width is OBIF_MAX_OID.
For IDIF/MDT0 objects, the upper limit is IDIF_MAX_OID.
The seq FID_SEQ_OST_MDT0 will change to a normal seq after the
rollover.
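The rollover rule can be sketched as a tiny allocator. This is a hypothetical model: the struct and function are illustrative, and the new sequence number is simply passed in rather than obtained from the FID allocator. A width of about 32M spread over d0..d31 works out to roughly 1M entries per directory, which is the reason for the smaller default.

```c
#include <stdint.h>

/* Hypothetical sequence state; width plays the role of seq->lcs_width. */
struct seq_state {
	uint64_t seq;		/* current sequence */
	uint64_t next_oid;	/* next object id to hand out (starts at 1) */
	uint64_t width;		/* max objects per sequence */
};

/*
 * Hand out one object id; once the width is exhausted, roll over to
 * new_seq and restart numbering.  new_seq stands in for whatever the
 * FID allocator would supply.
 */
static uint64_t alloc_oid(struct seq_state *s, uint64_t new_seq)
{
	if (s->next_oid > s->width) {
		s->seq = new_seq;
		s->next_oid = 1;
	}
	return s->next_oid++;
}
```

Setting a small width (as the tests do with 16384) makes rollovers happen frequently, which is what exposes the precreate edge cases described below.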

Fix osp_precreate_reserve() for the case where the last precreated
object is at the end of the seq and osp_objs_precreated cannot hold
all of the requested objects. The mdt thread would get stuck: it
wakes up the osp precreate thread in a loop waiting for progress,
but the osp thread will not try to do anything until the seq is
used up. This is easier to reproduce by setting seq->lcs_width to
a low number and trying to create an overstriped file with a stripe
count larger than seq->lcs_width.

Fix the precreate thread spinning when the precreate pool
is at the end of the seq, and is nearly empty.

Change seq->lcs_width to 16384 for all tests in
test-framework.sh, except for a few slow tests (to avoid
timeouts) and some overstriping tests that create
LOV_MAX_STRIPE_COUNT stripes (to avoid overstriping creating
fewer objects than expected when the precreate pool is at the
end of the seq and there are not enough objects).

Fix the problem where the seq could still change after
replay_barrier. To achieve this, introduce a new fail_loc
OBD_FAIL_OSP_FORCE_NEW_SEQ and force_new_seq/force_new_seq_all
to drain the objects in the precreate pool and then roll over
to a new seq. This is applied to several test suites that use
replay_barrier heavily.

Change-Id: I2749c1004b7bf3197b691cc94527f90145bcdef8
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38424
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16629 osd: refill the existing env 70/50270/2
Hongchao Zhang [Mon, 6 Mar 2023 13:17:10 +0000 (08:17 -0500)]
LU-16629 osd: refill the existing env

During the LDLM lock callback, the "lu_env" is created in
ldlm_bl_thread_main, which is started by "ldlm_setup", and it
may not yet have the "osd_thread_info" key, so "lu_env_refill"
needs to be called to refill the keys.
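Roughly, a refill fills in context values only for keys that the environment does not have yet, and leaves existing values alone. The sketch below is a hypothetical model of that idea with a fixed-size slot array; it is not the lu_env or lu_context_key implementation.

```c
#include <stdlib.h>

#define NKEYS 4	/* hypothetical number of registered context keys */

/* Hypothetical env: one slot per registered key, NULL if not filled yet. */
struct env_model { void *keys[NKEYS]; };

/*
 * Allocate values only for slots that are still empty (e.g. keys that
 * were registered after this env was created).  Existing values are kept.
 * Returns 0 on success, -1 on allocation failure.
 */
static int env_refill(struct env_model *e, size_t value_size)
{
	for (int i = 0; i < NKEYS; i++) {
		if (e->keys[i] != NULL)
			continue;
		e->keys[i] = calloc(1, value_size);
		if (e->keys[i] == NULL)
			return -1;
	}
	return 0;
}
```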

Change-Id: Ibae978a5a10826c2e3186012911870ce7bf0b147
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50270
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16478 target: disconnected export 41/50041/7
Alex Zhuravlev [Fri, 17 Feb 2023 08:00:20 +0000 (11:00 +0300)]
LU-16478 target: disconnected export

Eviction can race with a reconnect, and this in turn can lead
to a leaked export reference that prevents a later umount:
mdt_obd_reconnect() grabs a reference via nodemap_add_member().
Call obd_disconnect() if such a case is observed, to balance
obd_reconnect().

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I3fd49429ef40ef391d58e042e091258dcb9add72
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50041
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16524 sec: add fscrypt_admin rbac role 84/50184/13
Sebastien Buisson [Wed, 1 Mar 2023 15:11:19 +0000 (16:11 +0100)]
LU-16524 sec: add fscrypt_admin rbac role

The purpose of the new fscrypt_admin rbac role is to control admin
tasks related to fscrypt. When it is not set, all users, including
root, are forbidden to modify existing protectors or policies, or to
create new ones. However, it remains possible to lock and unlock
encrypted directories.

Internally, this is achieved by marking fscrypt metadata files and
directories, i.e. everything under ROOT/.fscrypt, with a special mdt
object flag LOHA_FSCRYPT_MD.
Upon request processing, the mdt layer returns -EPERM if the flag
LOHA_FSCRYPT_MD is found on an object that is the target of a modify
request.
The LUSTRE_IMMUTABLE_FL flag is also returned to clients for such
objects.
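As a rough sketch of the server-side decision, assume a made-up flag value standing in for LOHA_FSCRYPT_MD and a boolean standing in for the nodemap's fscrypt_admin role. Neither is the real Lustre definition.

```c
#include <errno.h>
#include <stdbool.h>

#define OBJ_FSCRYPT_MD 0x1u	/* stands in for LOHA_FSCRYPT_MD; value is made up */

/*
 * Reject modifying requests on fscrypt metadata objects unless the
 * client's nodemap grants the fscrypt_admin role.  Read-only access,
 * such as what lock/unlock needs, is not affected.
 */
static int check_fscrypt_modify(unsigned int obj_flags, bool is_modify,
				bool has_fscrypt_admin)
{
	if (is_modify && (obj_flags & OBJ_FSCRYPT_MD) && !has_fscrypt_admin)
		return -EPERM;
	return 0;
}
```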

sanity-sec test_64f is added to exercise the new fscrypt_admin flag.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I99956499133994444ccd88e33340067790a182ce
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50184
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-16524 sec: enforce rbac roles 07/49907/18
Sebastien Buisson [Fri, 3 Feb 2023 13:11:51 +0000 (14:11 +0100)]
LU-16524 sec: enforce rbac roles

There are 5 different rbac roles defined via nodemap:
- byfid_ops, to allow operations by FID (e.g. 'lfs rmfid').
- chlg_ops, to allow access to Lustre Changelogs.
- dne_ops, to allow operations related to DNE (e.g. 'lfs mkdir').
- file_perms, to allow modifications of file permissions and owners.
- quota_ops, to allow quota modifications.
Enforce these roles by checking the value of the 'rbac' nodemap
property on server side and returning -EPERM if operation is
forbidden.
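The enforcement amounts to a bitmask test on the server. In the sketch below, the bit assignment is a hypothetical choice, and the real nodemap encoding may differ. Each operation names the role it needs, and the request is refused with -EPERM if that bit is missing from the nodemap's rbac mask.

```c
#include <errno.h>

/* Hypothetical role bits; the real nodemap encoding may differ. */
enum rbac_role {
	RBAC_BYFID_OPS	= 1 << 0,
	RBAC_CHLG_OPS	= 1 << 1,
	RBAC_DNE_OPS	= 1 << 2,
	RBAC_FILE_PERMS	= 1 << 3,
	RBAC_QUOTA_OPS	= 1 << 4,
};

/* The operation's role must be present in the nodemap's rbac mask. */
static int rbac_check(unsigned int nodemap_rbac, enum rbac_role needed)
{
	return (nodemap_rbac & (unsigned int)needed) ? 0 : -EPERM;
}
```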

Add sanity-sec test_64* to exercise these capabilities.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I37057f0ab50c02fa99db03cb04149a437e35ee0a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49907
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
13 months agoLU-16524 nodemap: add rbac property to nodemap 73/49873/11
Sebastien Buisson [Wed, 25 Jan 2023 12:54:00 +0000 (13:54 +0100)]
LU-16524 nodemap: add rbac property to nodemap

Add a new rbac property to nodemap. Internally this is a mask of
allowed roles. Externally it defaults to 'all', which means all roles
are allowed, and it can take the following values (multiple values
can be specified, comma separated), with the following semantics:
- byfid_ops, to allow operations by FID (e.g. 'lfs rmfid').
- chlg_ops, to allow access to Lustre Changelogs.
- dne_ops, to allow operations related to DNE (e.g. 'lfs mkdir').
- file_perms, to allow modifications of file permissions and owners.
- quota_ops, to allow quota modifications.
Apart from 'all', any role not explicitly specified is forbidden. To
forbid all roles, use the value 'none'.
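A minimal sketch of turning the property value into the internal mask, assuming bit i corresponds to the i-th role in the list above (a hypothetical encoding). This is not the lctl or nodemap parser. Tokens are processed left to right, which is an assumption for how 'all' and 'none' combine with other values.

```c
#include <string.h>

/* Hypothetical bit assignment: bit i corresponds to rbac_names[i]. */
static const char *const rbac_names[] = {
	"byfid_ops", "chlg_ops", "dne_ops", "file_perms", "quota_ops",
};
#define RBAC_NROLES (sizeof(rbac_names) / sizeof(rbac_names[0]))
#define RBAC_ALL ((1u << RBAC_NROLES) - 1)

/*
 * Parse a comma-separated role list into a mask.  "all" allows every role,
 * "none" clears the mask, and any role not listed stays forbidden.
 * Returns 0 on success, -1 for an unknown role or an over-long string.
 */
static int rbac_parse(const char *spec, unsigned int *mask)
{
	char buf[128];
	char *tok;

	if (strlen(spec) >= sizeof(buf))
		return -1;
	strcpy(buf, spec);
	*mask = 0;
	for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
		size_t i;

		if (strcmp(tok, "all") == 0) {
			*mask = RBAC_ALL;
			continue;
		}
		if (strcmp(tok, "none") == 0) {
			*mask = 0;
			continue;
		}
		for (i = 0; i < RBAC_NROLES; i++)
			if (strcmp(tok, rbac_names[i]) == 0)
				break;
		if (i == RBAC_NROLES)
			return -1;	/* unknown role name */
		*mask |= 1u << i;
	}
	return 0;
}
```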

Update lctl-nodemap-modify man page to mention this new property.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4cedf03c75948f4b1e9b55292414ab9110701874
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49873
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16637 llite: call truncate_inode_pages() under inode lock 84/50284/6
Bobi Jam [Tue, 14 Mar 2023 02:02:12 +0000 (10:02 +0800)]
LU-16637 llite: call truncate_inode_pages() under inode lock

truncate_inode_pages() is required to be called under (and serialised
by) inode lock.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I0f1a09c8756522f87a2e5d8030d12f80e2f630b4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50284
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-15626 lnet: fix shellcheck warning for lnet utils 44/50244/4
Timothy Day [Thu, 9 Mar 2023 19:09:27 +0000 (19:09 +0000)]
LU-15626 lnet: fix shellcheck warning for lnet utils

Fix two small shellcheck warnings for
lnetunload.

Update the file headers to have the
SPDX license and use the standard
format.

Convert spaces to tabs.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ic21d917b5b4d6ba6b679453bd50a7699f9908267
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50244
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16549 osp: Fix sizeof for RMF_OUT_UPDATE_HEADER 24/50224/2
Vitaliy Kuznetsov [Tue, 7 Mar 2023 14:10:24 +0000 (17:10 +0300)]
LU-16549 osp: Fix sizeof for RMF_OUT_UPDATE_HEADER

The first struct packed is actually a struct
out_update_header with no inline data, so it must be
packed with the size of struct out_update_header, but
it was packed with length 120, the size of struct
osp_update_request (which should never go over the wire).
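This class of bug is easy to reproduce in isolation. The structs below are hypothetical stand-ins, not the real out_update_header and osp_update_request layouts. Sizing a buffer with sizeof() of the in-memory request instead of the wire header packs the wrong length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a small wire header and a larger in-memory request. */
struct wire_update_header {	/* plays the role of out_update_header */
	uint32_t magic;
	uint32_t count;
	uint32_t inline_length;
	uint32_t padding;
};

struct local_update_request {	/* plays the role of osp_update_request */
	struct wire_update_header *hdr;
	void *private_state[12];	/* in-memory only, never sent */
};

/* Buggy sizing: uses the in-memory request struct. */
static size_t header_len_buggy(const struct local_update_request *req)
{
	return sizeof(*req);
}

/* Fixed sizing: a header with no inline data is exactly the header size. */
static size_t header_len_fixed(const struct local_update_request *req)
{
	(void)req;
	return sizeof(struct wire_update_header);
}
```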

Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I692461ddf0493727e2dea222b6721c952d3166a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-7668 tests: skip conf-sanity test_33a for old MGS 21/50221/2
Andreas Dilger [Tue, 7 Mar 2023 01:00:41 +0000 (18:00 -0700)]
LU-7668 tests: skip conf-sanity test_33a for old MGS

Skip del_ost test for MGS versions that do not have this command.

Test-Parameters: trivial
Test-Parameters: testlist=conf-sanity env=ONLY="33a 123ah" serverversion=2.15.0
Fixes: 1121816c4a ("LU-7668 utils: add lctl del_ost")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib8ca9c604404e5717533be32fd6b5ccfbf70428e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50221
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16491 lfs: test getdirstripe YAML 08/50208/3
Timothy Day [Sun, 5 Mar 2023 04:37:46 +0000 (04:37 +0000)]
LU-16491 lfs: test getdirstripe YAML

Add a test to ensure that getdirstripe
is outputting valid YAML for layouts.

Fix the verbose flag for getdirstripe
by changing the verbose enum used. The
verbose flag is used in the test.

Add some parentheses to improve readability.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ic862a039ab01b004f212bd168d9e28de4fee15c4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50208
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16610 ldiskfs: fix directory corruption on openeuler 22.03 92/50192/2
Xinliang Liu [Thu, 23 Feb 2023 07:54:15 +0000 (07:54 +0000)]
LU-16610 ldiskfs: fix directory corruption on openeuler 22.03

This fixes the directory corruption error below:
LDISKFS-fs error (device dm-0): ldiskfs_find_dest_de:2412: inode
rec_len is smaller than minimal - offset=0, inode=0, rec_len=8,
name_len=0, size=4096

The fix is to:
- Extend the ei->i_append_sem critical section so that it also
  covers ext4_journal_get_write_access(), i.e. call
  up(&ei->i_append_sem) only after it, as in the RHEL 9.1
  ext4-pdirop.patch.
- Remove the erroneous dx_move_dirents() call before the condition
  "if (hinfo->hash < hash2)", as in the other ext4-pdirop.patch
  versions.
- Move the code block

    if (indirect == level) { /* the last index level */
        struct ext4_dir_lock_data *ld;
        u64 myblock;
        ...
    }

  so that it comes after the code block

    block = dx_get_block(at);
    for (i = 0; i <= level; i++) {
        ...
    }

Change-Id: Ie33623ba4428d58f5c612871287c19e7e239755d
Test-Parameters: trivial
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50192
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16509 lnet: memcpy false positive in brw_test 85/50185/2
Patrick Farrell [Thu, 2 Mar 2023 18:02:25 +0000 (13:02 -0500)]
LU-16509 lnet: memcpy false positive in brw_test

The flexible array at the end of srcp_bulk is triggering a
false positive in fortified memcpy().  Quash it with
unsafe_memcpy().

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I13386c0a8e73b04af8d398aa49361bfdf6a05ad8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50185
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16518 libcfs: fix clang build errors 62/50162/4
Timothy Day [Tue, 28 Feb 2023 18:48:36 +0000 (18:48 +0000)]
LU-16518 libcfs: fix clang build errors

Adjust a strncat and the preceding if statement
to account for the null terminator in the string.

Use (void) to designate two variables as unused
in a function to avoid doing a self-assign.

Also, use an explicit format to fix the format
security warning around alloc_workqueue.
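The three idioms can be sketched in plain C. The helpers below are hypothetical and do not come from libcfs. strncat() writes up to n characters plus a terminating NUL, so the size check and the n argument must leave room for that byte. Unused variables are silenced with a (void) cast instead of a self-assignment. Caller-supplied text is passed as an argument to an explicit "%s" format rather than used as the format string.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append src to dst (a buffer of dst_size bytes) only if the result,
 * including the NUL terminator, fits.  Returns 0 on success, -1 otherwise.
 */
static int append_checked(char *dst, size_t dst_size, const char *src)
{
	size_t used = strlen(dst);

	if (used + strlen(src) + 1 > dst_size)	/* +1 for the NUL terminator */
		return -1;
	strncat(dst, src, dst_size - used - 1);
	return 0;
}

/* Mark an intentionally unused parameter instead of writing "arg = arg;". */
static int ignore_arg(int arg)
{
	(void)arg;
	return 0;
}

/* Explicit format: name is data, so any '%' in it is printed literally. */
static int format_name(char *out, size_t size, const char *name)
{
	return snprintf(out, size, "%s", name);
}
```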

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I14a19ba83c063cd81c16723c31d0488c2b4f607e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50162
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16518 utils: fix clang build errors 61/50161/4
Timothy Day [Tue, 28 Feb 2023 18:47:06 +0000 (18:47 +0000)]
LU-16518 utils: fix clang build errors

This patch fixes a number of small clang build
errors in Lustre utils. Many errors are related
to nuances in typing or statements which appear
to be tautologies. These are resolved.

Some unneeded parentheses are removed. A variable
that could potentially be left uninitialized is now
initialized. And a comparison that seemed to have
been left out is added.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id3f40b033e640f8d2ae6386f66a88de06fc89666
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50161
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16603 protocol: add OBD_BRW_COMPRESSED 54/50154/5
Alex Zhuravlev [Tue, 28 Feb 2023 09:49:11 +0000 (12:49 +0300)]
LU-16603 protocol: add OBD_BRW_COMPRESSED

Add OBD_BRW_COMPRESSED so that the client can hint to the OST
that the data is compressed.

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4b721db3ad349d5745ee6698de368d0cb0138954
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50154
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-16589 tests: fix hard-link failure in sanityn/55d 27/50127/3
Jian Yu [Mon, 13 Mar 2023 00:32:16 +0000 (17:32 -0700)]
LU-16589 tests: fix hard-link failure in sanityn/55d

Since coreutils version 8.31, the stat() and lstat()
operations were removed from ln by commit 571f63f5010b,
which caused the following dir hard-link failure in
sanityn/55d:

ln: failed to create hard link '/mnt/lustre2/d55d.sanityn/d55d.sanityn/'
=> '/mnt/lustre2/d55d.sanityn/f1': No such file or directory

This actually reveals a kernel issue which is fixed by commit
v5.18-rc2-188-gb3d4650d82c7.

To avoid the kernel issue and keep the test effective,
this patch appends the target filename to $tdir/
to fix the hard-link failure.

Test-Parameters: trivial env=ONLY=55d testlist=sanityn
Test-Parameters: trivial clientdistro=el9.1 env=ONLY=55d testlist=sanityn
Test-Parameters: trivial clientdistro=el9.0 env=ONLY=55d testlist=sanityn
Test-Parameters: trivial clientdistro=sles15sp4 env=ONLY=55d testlist=sanityn
Test-Parameters: trivial clientdistro=sles15sp3 env=ONLY=55d testlist=sanityn

Change-Id: I42313e43eaea3d94007d534bf38efdeacf2ede43
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50127
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16587 utils: give lfs migrate a larger buffer 18/50118/5
Nathan Rutman [Wed, 22 Feb 2023 22:34:09 +0000 (14:34 -0800)]
LU-16587 utils: give lfs migrate a larger buffer

lfs migrate is slow because it mostly uses a small 1MB buffer.
Use a larger buffer.

[root@kjlmo4n00 16G]# time lfs migrate -S 1M -p flash 16G.1
real 0m25.341s
[root@kjlmo4n00 16G]# time /root/tools/lfs_nzr migrate -S 1M -p flash 16G.1
real 0m6.526s
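Why the buffer size matters can be sketched with a chunked copy loop. This is a hypothetical model, not the lfs migrate code. Each iteration stands in for one read()/write() pair, so the iteration count (and with it the per-call overhead) scales with file size divided by buffer size. For the timing above, and assuming 16G.1 is a 16 GiB file, a 1 MiB buffer means 16384 iterations. The buffer size the patch actually chose is not stated here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes from src to dst through a bounce buffer of bufsize bytes
 * and return the number of chunk iterations.  Each iteration stands in for
 * one read()/write() pair in a real data mover.
 */
static size_t chunked_copy(char *dst, const char *src, size_t len,
			   char *bounce, size_t bufsize)
{
	size_t off = 0, iters = 0;

	while (off < len) {
		size_t n = len - off < bufsize ? len - off : bufsize;

		memcpy(bounce, src + off, n);	/* "read" into the buffer */
		memcpy(dst + off, bounce, n);	/* "write" it back out */
		off += n;
		iters++;
	}
	return iters;
}
```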

Signed-off-by: Nathan Rutman <nathan.rutman@hpe.com>
Change-Id: I850ca475fcd0efe2d71d26e4d1544f462c60252a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50118
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-16586 build: Remove old check for linux_selinux_enabled 11/50111/2
Shaun Tancheff [Wed, 22 Feb 2023 10:58:58 +0000 (04:58 -0600)]
LU-16586 build: Remove old check for linux_selinux_enabled

LC_SRC_HAS_LINUX_SELINUX_ENABLED is used but not defined, so it
should be removed.

LC_PROG_LINUX_SRC and LC_PROG_LINUX_RESULTS are defined twice,
remove the unused ones.

Fixes: a346cf6cf22 ("LU-13485 kernel: Parallel core configure tests")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I37865c7f7bd6a3f7825084f88295f3fdea8cf920
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16626 build: remove python2 dependencies 41/50241/2
Alex Deiter [Thu, 9 Mar 2023 14:09:19 +0000 (18:09 +0400)]
LU-16626 build: remove python2 dependencies

Fix a packaging issue caused by the zfsobj2fid script.

Test-Parameters: trivial
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Change-Id: I4375038b0d2c2b42ac4080fe834d35bdd3ef54f8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50241
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16374 ldiskfs: round-up enc file size 10/49410/6
Sebastien Buisson [Wed, 14 Dec 2022 17:21:14 +0000 (18:21 +0100)]
LU-16374 ldiskfs: round-up enc file size

When accessing encrypted files on targets mounted as ldiskfs, the
sizes must be rounded up to the next encryption block size. This is
required in order to read or write full encryption blocks.
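The rounding itself is the usual power-of-two round-up. The sketch below is a hypothetical helper, not the ldiskfs code. The 4096-byte unit used in the checks is an assumed value for illustration, and the helper requires the unit to be a power of two.

```c
#include <stdint.h>

/* Round size up to the next multiple of unit (unit must be a power of two). */
static uint64_t enc_round_up(uint64_t size, uint64_t unit)
{
	return (size + unit - 1) & ~(unit - 1);
}
```

For example, with a 4096-byte unit a 4097-byte file is accessed as 8192 bytes, so the last, partially used encryption block is still read or written in full.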

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6d08a2eb34f0dff864891a4e3e77977688412ec8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49410
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-14736 tests: improve leak-finder compatibility 57/49357/6
Andreas Dilger [Sat, 10 Dec 2022 04:59:56 +0000 (21:59 -0700)]
LU-14736 tests: improve leak-finder compatibility

Add regexps for leak-finder.pl to still be able to parse older
debug logs for ease of use.

Improve --summary mode so that it only prints the summary, instead of
every line that is parsed.  Add --debug to print every line if needed.

Track free-without-alloc calls so that --summary can be used on debug
logs captured during cleanup to aggregate frees that were allocated
before the debug log was started.  This will show "leaks" as negative
allocations for each callsite (i.e. more memory freed than allocated).

Print total allocated and freed in summary line.
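The summary bookkeeping can be modelled as a per-callsite tally. This is a hypothetical C sketch of the idea, not the Perl in leak-finder.pl, and it simply attributes each event to the callsite name it is given. A free with no matching allocation still counts, so memory allocated before the log started shows up as a negative net value.

```c
#include <string.h>

#define MAX_SITES 16

/* Hypothetical per-callsite tally: net bytes = allocated - freed. */
struct site { const char *name; long net; };
struct tally {
	struct site sites[MAX_SITES];
	int n;
	long total_alloc, total_free;
};

/* Find a callsite entry, creating it if there is room. */
static struct site *site_lookup(struct tally *t, const char *name)
{
	for (int i = 0; i < t->n; i++)
		if (strcmp(t->sites[i].name, name) == 0)
			return &t->sites[i];
	if (t->n == MAX_SITES)
		return NULL;
	t->sites[t->n].name = name;
	t->sites[t->n].net = 0;
	return &t->sites[t->n++];
}

/* Record one alloc or free event; frees without allocs go negative. */
static void tally_event(struct tally *t, const char *site, long bytes,
			int is_free)
{
	struct site *s = site_lookup(t, site);

	if (s == NULL)
		return;
	if (is_free) {
		s->net -= bytes;
		t->total_free += bytes;
	} else {
		s->net += bytes;
		t->total_alloc += bytes;
	}
}
```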

Fixes: 5b998d803f ("LU-14736 utils: update leak-finder.pl for new format")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieefd5f5336252edcd3916a409c6c046a57b07dc4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49357
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16250 tests: remove metadata-updates.sh script 13/48913/3
Andreas Dilger [Tue, 18 Oct 2022 23:53:01 +0000 (17:53 -0600)]
LU-16250 tests: remove metadata-updates.sh script

The metadata-updates.sh test script doesn't really test anything
particularly interesting, and appears never to fail unless all of
the tests in that session fail, which indicates to me that anything
of interest that it might have tested has already been caught by
some earlier test. It has not been updated for anything except
test-framework changes in many years. The only minor part of interest
is write_disjoint, but that is also tested by parallel-scale (along
with more MPI tests).

Test-Parameters: trivial optional testgroup=review-dne-part-8
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3891cf4cabdc777c2648a95fd821a376f7e6c87f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48913
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-15873 osd: heed readonly mount upon osd-ldiskfs device 98/48098/6
Bobi Jam [Fri, 29 Jul 2022 04:12:12 +0000 (12:12 +0800)]
LU-15873 osd: heed readonly mount upon osd-ldiskfs device

Mount the device read-only for scrub if instructed.

Avoid writing the FID EA on a read-only OSD device.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9d24bdebfa2c6a98dc583760413e957ebcf4bca7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48098
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-15873 quota: don't release for RO device 96/48096/6
Wang Shilong [Thu, 7 Jan 2021 13:34:00 +0000 (21:34 +0800)]
LU-15873 quota: don't release for RO device

There is no need to release quota space for a read-only
device.

A further problem is an inconsistency between the Lustre
OSD and ldiskfs: ldiskfs won't load the quota inode on a
read-only mount, but the Lustre OSD is not aware of this and
loads the accounting objects even in RO mode. This might
cause problems when Lustre wants to access quota.

Change-Id: I7db5fe3f3bed3103ed62f6beba1d6f47fce12a21
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48096
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-13494 osp: wait for import invalidation 99/45499/31
Alex Zhuravlev [Tue, 9 Nov 2021 08:03:32 +0000 (11:03 +0300)]
LU-13494 osp: wait for import invalidation

osp_update_fini() should wait until a racing ptlrpc_import_invalidate()
(running in a dedicated invalidation thread) is complete, as both
threads access ou_update and osp_update_fini() releases the structure.

The change also fixes the kernel warning about scheduling while atomic:
the old osp_update_fini() took a spinlock to protect ou_list, but
with the new wait_event() the list no longer needs that protection.
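Here is a user-space analogue of the fix. It uses POSIX threads in place of the kernel's wait_event(). The teardown path waits on a condition variable until the racing thread reports that it is done, and only then frees the shared structure. struct update_ctx, update_fini() and demo_race() are hypothetical names, not the osp code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the structure shared by both threads. */
struct update_ctx {
	pthread_mutex_t lock;
	pthread_cond_t  done;
	int             invalidating;  /* non-zero while the other thread uses it */
};

/* Plays the role of the dedicated invalidation thread. */
static void *invalidate_thread(void *arg)
{
	struct update_ctx *ctx = arg;

	/* ... work on ctx ... */
	pthread_mutex_lock(&ctx->lock);
	ctx->invalidating = 0;
	pthread_cond_broadcast(&ctx->done);
	pthread_mutex_unlock(&ctx->lock);
	return NULL;
}

/* Teardown: wait for the racing thread before releasing the structure. */
static void update_fini(struct update_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	while (ctx->invalidating)
		pthread_cond_wait(&ctx->done, &ctx->lock);
	assert(!ctx->invalidating);
	pthread_mutex_unlock(&ctx->lock);
	/* only now is it safe to free */
	pthread_mutex_destroy(&ctx->lock);
	pthread_cond_destroy(&ctx->done);
	free(ctx);
}

/* Starts an invalidation thread, then tears down safely. 0 on success. */
static int demo_race(void)
{
	struct update_ctx *ctx = malloc(sizeof(*ctx));
	pthread_t tid;

	if (ctx == NULL)
		return -1;
	pthread_mutex_init(&ctx->lock, NULL);
	pthread_cond_init(&ctx->done, NULL);
	ctx->invalidating = 1;
	if (pthread_create(&tid, NULL, invalidate_thread, ctx) != 0) {
		ctx->invalidating = 0;
		update_fini(ctx);
		return -1;
	}
	update_fini(ctx);       /* blocks until invalidate_thread is done */
	pthread_join(tid, NULL);
	return 0;
}
```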

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iea40d3be8b1b3079b9fe8bdd015cc3392027b64d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45499
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-15162 osd: improve OI lookup concurrency 53/45353/27
Alex Zhuravlev [Mon, 25 Oct 2021 17:27:15 +0000 (20:27 +0300)]
LU-15162 osd: improve OI lookup concurrency

replace inode->i_mutex with i_rwsem in osd_obj_map_lookup()

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id8df20e00ae254ea4dcf4b10415e1927fac6bd44
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45353
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-14834: list the UUIDs of stale clients during recovery 96/44196/8
Courrier Guillaume [Thu, 1 Jul 2021 09:35:19 +0000 (11:35 +0200)]
LU-14834: list the UUIDs of stale clients during recovery

Add a new entry in debugfs for the MDT and OST to display the UUIDs of
clients that have yet to reconnect during recovery.

For example:

$ lctl get_param obdfilter.lustre-OST0000.recovery_stale_clients
obdfilter.lustre-OST0000.recovery_stale_clients=
9a7ab21d-207c-4680-b9bf-b5873fd05540

During recovery, this displays the UUIDs of clients that are
expected to reconnect but have not yet done so.

Signed-off-by: Courrier Guillaume <guillaume.courrier@cea.fr>
Change-Id: Ib8c0b500adc9098e3cfb1998df06757a7d31b7b9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44196
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-14760 llite: restart clio for AIO if necessary 95/43995/6
Li Dongyang [Mon, 14 Jun 2021 01:31:33 +0000 (11:31 +1000)]
LU-14760 llite: restart clio for AIO if necessary

If the clio needs to be restarted from where it left off,
do it for AIO as well, so we don't end up with short IO.
Limit the number of retries to 1000, to avoid potential
issues if the loop is stuck forever.
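Here is a sketch of a bounded restart loop. It assumes a hypothetical io_step_fn callback that may return early. The loop continues from where the previous pass stopped and gives up after 1000 restarts, as described above. Returning the partial byte count on error is an assumption, not necessarily what llite does. The two example steps are hypothetical too.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_IO_RESTARTS 1000

/* Hypothetical I/O pass: handles up to `len` bytes starting at `off` and
 * returns bytes done (<= len) or a negative errno; it may stop early. */
typedef ssize_t (*io_step_fn)(void *priv, size_t off, size_t len);

/* Restart from where the last pass stopped so the caller does not see a
 * short I/O, but give up after MAX_IO_RESTARTS restarts. */
static ssize_t io_with_restart(io_step_fn step, void *priv, size_t len)
{
	size_t done = 0;
	int restarts = 0;

	assert(step != NULL);
	while (done < len) {
		ssize_t rc = step(priv, done, len - done);

		if (rc < 0)
			return done > 0 ? (ssize_t)done : rc;
		done += (size_t)rc;
		if (done < len && ++restarts > MAX_IO_RESTARTS)
			break;  /* stuck: return a short count instead of looping */
	}
	return (ssize_t)done;
}

/* Example pass that makes at most 4096 bytes of progress; counts calls. */
static ssize_t step_one_page(void *priv, size_t off, size_t len)
{
	(void)off;
	++*(int *)priv;
	return len < 4096 ? (ssize_t)len : 4096;
}

/* Example pass that never makes progress; counts calls. */
static ssize_t step_stuck(void *priv, size_t off, size_t len)
{
	(void)off;
	(void)len;
	++*(int *)priv;
	return 0;
}
```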

Change-Id: Iccca31b032b01b940656864bfff22a821ff5061d
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43995
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-13654 target: print info of evicted clients to syslog 78/38878/2
Hongchao Zhang [Sun, 31 May 2020 21:00:01 +0000 (05:00 +0800)]
LU-13654 target: print info of evicted clients to syslog

During recovery, information about the evicted clients can be
useful for administrators to evaluate the effect of the recovery.
Since the debug log may not be easy to obtain in this case, it is
better to dump this information to syslog.

Change-Id: Ib32e10b1bbf12fe65be7862a018144827201e58a
Test-Parameters: trivial
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38878
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-14692 osp: deprecate IDIF sequence for MDT0000 22/45822/22
Li Dongyang [Fri, 10 Dec 2021 11:44:09 +0000 (22:44 +1100)]
LU-14692 osp: deprecate IDIF sequence for MDT0000

Always return true from osp_fid_end_seq() for an IDIF sequence,
so that OSP precreate will roll over to a new sequence in
the FID_SEQ_NORMAL range for MDT0000.

Remove conf-sanity test_122b, "Check OST sequence wouldn't change
when IDIF 32bit overflows".

Change-Id: I85a0e38266331c96d971d68ec353949ccac3fc21
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45822
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
13 months agoLU-16612 llite: protect cp_state with vmpage lock 80/50180/2
Bobi Jam [Thu, 2 Mar 2023 09:39:01 +0000 (17:39 +0800)]
LU-16612 llite: protect cp_state with vmpage lock

cl_page_make_ready() calls cl_page_io_start() without holding the
vmpage lock, which could corrupt the cl_page's cp_state/cp_owner.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Id0df7e14246aa561494a9b6e581cebc55241c4b9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50180
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16501 lod: add qos_ost_weights to debugfs 74/50074/6
Sergey Cheremencev [Mon, 20 Feb 2023 17:27:52 +0000 (20:27 +0300)]
LU-16501 lod: add qos_ost_weights to debugfs

This patch adds the files qos_ost_weights and qos_mdt_weights
to the lod directory in debugfs. A qos_ost_weights file is
also added for each OST pool in a new lod/pool directory.

    lod.<fsname>-MDT*-mdtlov.qos_mdt_weights
    lod.<fsname>-MDT*-mdtlov.qos_ost_weights
    lod.<fsname>-MDT*-mdtlov.pool.<pool>.qos_ost_weights

These files provide, in YAML format, the target and server weights,
penalties, and other data needed to debug QOS allocator imbalance issues:

- { ost_idx: 0, tgt_weight: 1137680, tgt_penalty: 0,
    tgt_penalty_per_obj: 115544, tgt_avail: 1137680,
    tgt_last_used: 1677104866, svr_nid: 192.168.100.31@tcp,
    svr_bavail: 2070560, svr_iavail: 1, svr_penalty: 0,
    svr_penalty_per_obj: 52572, svr_last_used: 1677104866 }

Writing to qos_ost_weights/qos_mdt_weights resets
tgt_weight, tgt_penalty and svr_penalty.
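This small C helper pulls a numeric field out of one entry of the YAML output shown above. It assumes each target entry is joined into a single flow-mapping string. qos_field() is hypothetical and only handles numeric values; svr_nid, for example, is not numeric. A real tool would be better served by a proper YAML parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Store the numeric value of `key` in *out; return 0, or -1 if absent. */
static int qos_field(const char *line, const char *key, unsigned long long *out)
{
	size_t klen;
	const char *p = line;

	assert(line != NULL && key != NULL && out != NULL);
	klen = strlen(key);
	while ((p = strstr(p, key)) != NULL) {
		/* whole-key match only: "tgt_penalty" must not match inside
		 * "tgt_penalty_per_obj", so require a delimiter before and ':' after */
		int start_ok = p == line || p[-1] == '{' || p[-1] == ' ' || p[-1] == ',';

		if (start_ok && p[klen] == ':') {
			char *end;

			*out = strtoull(p + klen + 1, &end, 10);
			return end == p + klen + 1 ? -1 : 0;
		}
		p += klen;
	}
	return -1;
}
```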

The patch also adds sanity test_205f to check the YAML output.

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I27e3f5abeb2f31b1c445658be035ec7e76c1572e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50074
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
13 months agoLU-16263 lov: continue fsync on other OST objs even on -ENOENT 05/50005/4
Bobi Jam [Wed, 15 Feb 2023 09:09:53 +0000 (17:09 +0800)]
LU-16263 lov: continue fsync on other OST objs even on -ENOENT

When fsync races with truncate, continue to fsync the other OST
objects even if the fsync of some stripe returns -ENOENT, so that
the client can still discard cached pages by calling
osc_io_fsync_start()->osc_cache_writeback_range().

Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I457ba80063086e310df55aaa22778b51a6ea211e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50005
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
13 months agoLU-16579 llite: fix the wrong beyond read end calculation 65/50065/5
Qian Yingjin [Mon, 20 Feb 2023 03:11:54 +0000 (22:11 -0500)]
LU-16579 llite: fix the wrong beyond read end calculation

During testing, we found an infinite loop in the read path, which
endlessly returns AOP_TRUNCATED_PAGE (0x80001).
The cause is that the end offset beyond which reading must stop is
calculated incorrectly as (iter->count + iocb->ki_pos).
This end offset is supposed to stay unchanged throughout the per-page
read I/O loop in buffered I/O mode.
However, @iter->count is decreased by the bytes read each time the
read of a page finishes: @iter->count -= read_bytes.

This patch stores the end page index in @lcc->lcc_end_index before
calling @generic_file_read_iter, which loops over each page to be
read, thereby fixing this bug.
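The following is a simplified model of the bug. It assumes page-aligned reads, and assumes that the file position stays fixed while the remaining count shrinks by one page per iteration. If the end index is recomputed from the shrinking count, later pages look "beyond the end". Computing it once up front, analogous to saving it in lcc_end_index, does not have this problem. Instead of modelling the endless AOP_TRUNCATED_PAGE retries, the sketch just stops and reports how many pages were read. read_pages() and its constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Count pages read before one is judged to lie beyond the request end.
 * `fixed` selects the corrected behaviour. */
static size_t read_pages(unsigned long long pos, size_t count, bool fixed)
{
	unsigned long long index = pos >> SKETCH_PAGE_SHIFT;
	unsigned long long end_index;
	size_t pages = 0;

	assert(count > 0);
	/* fixed: last page index computed once, before the loop */
	end_index = (pos + count - 1) >> SKETCH_PAGE_SHIFT;
	while (count > 0) {
		/* buggy: recomputed from a count that shrinks every page
		 * while pos has not advanced */
		if (!fixed)
			end_index = (pos + count - 1) >> SKETCH_PAGE_SHIFT;
		if (index > end_index)
			break;          /* page treated as beyond the read end */
		count -= count < SKETCH_PAGE_SIZE ? count : SKETCH_PAGE_SIZE;
		index++;
		pages++;
	}
	return pages;
}
```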

Fixes: 2f8f38effa ("LU-16412 llite: check read page past requested")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I5bb7ab82e5e2de8b9bd911798fb8ae65fc7c91af
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50065
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-15542 tests: skip interop conf-sanity test_61b 22/50222/2
Andreas Dilger [Tue, 7 Mar 2023 01:11:01 +0000 (18:11 -0700)]
LU-15542 tests: skip interop conf-sanity test_61b

Skip the large_xattr test for older MDS versions without the large_xattr fix.
Minor code style cleanup.

Test-Parameters: trivial
Test-Parameters: testlist=conf-sanity env=ONLY="61" serverversion=2.15.0
Fixes: 716de353b7 ("LU-15542 osd-ldiskfs: exclude EA inode from processing")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieb30568b617177a9986a139b289ba1ced63ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50222
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16381 ofd: fix soft_sync_limit lprocfs write handler 62/49362/8
Andrew Perepechko [Sun, 11 Dec 2022 11:22:49 +0000 (14:22 +0300)]
LU-16381 ofd: fix soft_sync_limit lprocfs write handler

soft_sync_limit_store() must return the size of the input buffer
in the success case, otherwise the writer may loop forever.
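Here is a user-space model of why the return value matters. It assumes a writer that keeps retrying until every byte has been accepted; max_tries exists only so the sketch terminates. The store handlers take a simplified (buf, count) signature rather than the real sysfs one and are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

static unsigned long soft_sync_limit;

/* Buggy handler: parses the value but reports that nothing was consumed. */
static ssize_t store_buggy(const char *buf, size_t count)
{
	(void)count;
	soft_sync_limit = strtoul(buf, NULL, 0);
	return 0;
}

/* Fixed handler: on success, reports the whole buffer as consumed. */
static ssize_t store_fixed(const char *buf, size_t count)
{
	soft_sync_limit = strtoul(buf, NULL, 0);
	return (ssize_t)count;
}

/* Writer that retries until every byte is accepted, capped at max_tries.
 * Returns the number of store calls made. */
static int write_all(ssize_t (*store)(const char *, size_t), const char *buf,
		     size_t count, int max_tries)
{
	size_t off = 0;
	int tries = 0;

	assert(store != NULL);
	while (off < count && tries < max_tries) {
		ssize_t rc = store(buf + off, count - off);

		tries++;
		if (rc < 0)
			break;
		off += (size_t)rc;
	}
	return tries;
}
```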

Change-Id: I2d926eaba7062c495a56c6ee32e1b82e08df63ce
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49362
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-14980 osd: check for locks in osd_trans_start() 22/44822/30
Alex Zhuravlev [Thu, 2 Sep 2021 15:21:38 +0000 (18:21 +0300)]
LU-14980 osd: check for locks in osd_trans_start()

since LU-10048 we shouldn't be starting a transaction with
object (osd) locks held.
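Below is a sketch of the kind of check described above. It assumes a hypothetical per-thread counter of held object locks. Returning -EDEADLK is an illustrative choice; the real osd_trans_start() check may be an assertion or a warning instead. The correct order is trans_start() first, then obj_lock().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-thread count of object (OSD) locks currently held. */
static _Thread_local int obj_locks_held;

static void obj_lock(void)
{
	obj_locks_held++;
}

static void obj_unlock(void)
{
	assert(obj_locks_held > 0);
	obj_locks_held--;
}

/* Refuse to start a transaction while any object lock is held. */
static int trans_start(void)
{
	if (obj_locks_held != 0)
		return -EDEADLK;   /* would break the transaction-then-lock order */
	/* ... start the transaction ... */
	return 0;
}
```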

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia7d1de9351a23f8e0de52f3d5d0948f1e65529e7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44822
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-14980 mdd: mdd_layout_swap() to follow tx-lock rule 24/44824/25
Alex Zhuravlev [Thu, 2 Sep 2021 16:17:03 +0000 (19:17 +0300)]
LU-14980 mdd: mdd_layout_swap() to follow tx-lock rule

That is, start the transaction first, then take the local (OSD) locks
on the objects involved.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9a4add277f0911fa02d9b214e996c441d0952f9c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44824
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-14980 mdd: unlock object before changelog 28/44828/24
Alex Zhuravlev [Thu, 2 Sep 2021 18:24:04 +0000 (21:24 +0300)]
LU-14980 mdd: unlock object before changelog

We can't hold the object (OSD) lock across transaction start
due to the locking rules, and we don't need the object to be
locked, as only the FID is used at that point.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iaeb4645bfc9271d21d3644398c4c83f8e9b7aa04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44828
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16614 lfs: Fix memory leak 96/50196/2
Arshad Hussain [Fri, 3 Mar 2023 09:10:01 +0000 (14:40 +0530)]
LU-16614 lfs: Fix memory leak

Fix a memory leak when running "lfs check all".

Before patch:
=============
$ valgrind --leak-check=full lfs check all
<snip>
==93768== LEAK SUMMARY:
==93768==    definitely lost: 56 bytes in 2 blocks
==93768==    indirectly lost: 282 bytes in 5 blocks
==93768==      possibly lost: 0 bytes in 0 blocks
==93768==    still reachable: 0 bytes in 0 blocks
==93768==         suppressed: 0 bytes in 0 blocks
<snip>

After patch:
=============
$ valgrind --leak-check=full lfs check all
<snip>
 HEAP SUMMARY:
   in use at exit: 0 bytes in 0 blocks
 total heap usage: 98 allocs, 98 frees, 294,999 bytes allocated
 All heap blocks were freed -- no leaks are possible
<snip>

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ib39d1f89a9c6937668f2a7d515606c7df86ac1df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50196
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16607 tests: Use real IPs for router tests 75/50175/4
Chris Horn [Wed, 1 Mar 2023 20:55:05 +0000 (14:55 -0600)]
LU-16607 tests: Use real IPs for router tests

Use real IPs for some of the router tests in sanity-lnet, i.e. IPs
that are on the same network as the interface that is configured for
LNet. This decreases the time it takes to clean up after some test
cases because socklnd is able to tear down connections more easily.

For example, running all of sanity-lnet in a single node config takes
approximately 912 seconds without this patch. With this patch it takes
190 seconds.

validate_gateway_nids() is also enhanced to check that each gateway
NID that is expected to be configured is present in the actual
YAML configuration.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I962c6a52c411f27ed7f0258e1840073da12f31f0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50175
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16535 ldiskfs: Add SUSE 15 SP5 server support 23/49923/5
Shaun Tancheff [Tue, 7 Feb 2023 06:39:01 +0000 (00:39 -0600)]
LU-16535 ldiskfs: Add SUSE 15 SP5 server support

SUSE 15 SP5 server support needs an updated
  ext4-mballoc-pa-free-mismatch.patch
for Linux v5.18 and later, as linux/genhd.h was removed.

Test-Parameters: trivial
HPE-bug-id: LUS-11471
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4c54edda99e9871389d560de58c415fc8c0ff0a2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16534 build: Prefer timer_delete[_sync] 22/49922/6
Shaun Tancheff [Tue, 7 Feb 2023 08:18:36 +0000 (02:18 -0600)]
LU-16534 build: Prefer timer_delete[_sync]

Linux commit v6.1-rc1-7-g9a5a30568697
  timers: Get rid of del_singleshot_timer_sync()
Linux commit v6.1-rc1-11-g9b13df3fb64e
  timers: Rename del_timer_sync() to timer_delete_sync()
Linux commit v6.1-rc1-12-gbb663f0f3c39
  timers: Rename del_timer() to timer_delete()

Prefer timer_delete_sync() to del_singleshot_timer_sync()
Prefer timer_delete_sync() to del_timer_sync()
Prefer timer_delete() to del_timer()

Fall back to del_timer() and del_timer_sync() when
timer_delete[_sync]() is not available

Test-Parameters: trivial
HPE-bug-id: LUS-11470
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4c946c315a83482dd0bd69e5e89f0302a67bf81c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49922
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16527 llite: dir layout inheritance fixes 82/49882/4
Vitaly Fertman [Thu, 2 Feb 2023 20:20:04 +0000 (23:20 +0300)]
LU-16527 llite: dir layout inheritance fixes

Fixes for some minor problems:
- the depth may not be set on a dir, so do not consider
  depth == 0 as a real depth while checking if the root default is
  applicable;
- the setdirstripe utility implicitly sets max_inherit to 3 for a
  non-striped dir when the -i option is given but -c is not, whereas
  3 is the default for striped dirs only;
- getdirstripe shows inherited default layouts with max_inherit==0,
  although they are no longer meaningful; the same applies to a default
  layout explicitly set on a dir/root with max_inherit==0;
- getdirstripe hides max_inherit_rr when stripe_offset != -1, since it
  is meaningless and reset to 0 in that case; however, this confuses
  users;

HPE-bug-id: LUS-11090
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I65daaac76533fad44a88e1c5a8aad4467c9f7682
Reviewed-on: https://es-gerrit.dev.cray.com/161035
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16526 lmv: module params for default dir layout 78/49878/3
Vitaly Fertman [Thu, 18 Aug 2022 15:40:03 +0000 (18:40 +0300)]
LU-16526 lmv: module params for default dir layout

Add module parameters to make the default ROOT layout flexible
enough to be disabled now and set to something arbitrary in the future.

HPE-bug-id: LUS-11140
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I66cae2e0aa5be9c56552bcf0e8810f3f1b944836
Reviewed-on: https://es-gerrit.dev.cray.com/161058
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49878
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-15626 tests: Fix "error" reported by shellcheck (5/5) 39/49439/8
Arshad Hussain [Wed, 22 Jun 2022 12:59:17 +0000 (18:29 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck (5/5)

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/test-framework.sh. This patch also
converts spaces to tabs.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic0d1046577dacf242787841b64319a6e206bae22
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49439
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
14 months agoLU-15626 tests: Fix "error" reported by shellcheck (2/5) 36/49436/8
Arshad Hussain [Wed, 22 Jun 2022 11:28:54 +0000 (16:58 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck (2/5)

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/test-framework.sh. This patch also
converts spaces to tabs.

Change-Id: Ib2bb4e864ef0032f2906bf7e0f9ad0542a8411b3
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49436
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
14 months agoLU-16341 quota: fix panic in qmt_site_recalc_cb 41/49241/4
Sergey Cheremencev [Fri, 24 Jun 2022 20:38:29 +0000 (23:38 +0300)]
LU-16341 quota: fix panic in qmt_site_recalc_cb

The panic occurred because the qit_lqes array was empty after
qmt_pool_lqes_lookup_spec(). This can happen when the global lqe
is not enforced. Return -ENOENT from qmt_pool_lqes_lookup_spec()
if no lqes have been added.
This fixes the following panic:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
...
RIP: 0010:qmt_site_recalc_cb+0x2ec/0x780 [lquota]
...
[ffffa5564118fda0] cfs_hash_for_each_tight at ffffffffc0c72c81 [libcfs]
[ffffa5564118fe08] qmt_pool_recalc at ffffffffc142dec7 [lquota]
[ffffa5564118ff10] kthread at ffffffffb45043a6
[ffffa5564118ff50] ret_from_fork at ffffffffb4e00255

Add sanity-quota test_14, which reproduces the above panic
without the fix.
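Here is a simplified sketch of the fix pattern. It assumes hypothetical lookup_spec() and first_limit() functions in place of qmt_pool_lqes_lookup_spec() and its caller, and a hypothetical simplified struct lqe. The lookup reports -ENOENT when it added no entries, and the caller checks that before dereferencing the first element.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_LQES 8

struct lqe { int enforced; long long limit; };          /* simplified entry */
struct lqe_array { struct lqe *lqes[MAX_LQES]; int cnt; };

/* Collect enforced entries; return -ENOENT if none were added. */
static int lookup_spec(struct lqe *candidates[], int n, struct lqe_array *out)
{
	out->cnt = 0;
	for (int i = 0; i < n && out->cnt < MAX_LQES; i++)
		if (candidates[i]->enforced)
			out->lqes[out->cnt++] = candidates[i];
	return out->cnt ? 0 : -ENOENT;
}

/* Caller checks the return code instead of indexing an empty array. */
static long long first_limit(struct lqe *candidates[], int n)
{
	struct lqe_array arr;
	int rc = lookup_spec(candidates, n, &arr);

	if (rc)
		return rc;
	assert(arr.cnt > 0);
	return arr.lqes[0]->limit;
}
```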

HPE-bug-id: LUS-11007
Change-Id: Ie51396269fae7ed84379bef5fc964cce789eba7c
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/160828
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49241
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-15971 uapi: add DMV_IMP_INHERIT connect flag 88/47788/10
Lai Siyao [Mon, 27 Jun 2022 07:47:22 +0000 (03:47 -0400)]
LU-15971 uapi: add DMV_IMP_INHERIT connect flag

Add OBD_CONNECT2_DMV_IMP_INHERIT for implicit default LMV inherit.

Test-Parameters: trivial
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I49c217952df65461567c236790a49211a66a33d3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47788
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-5170 utils: Continue on error when multiple files requested 63/30663/16
Vitaliy Kuznetsov [Fri, 3 Mar 2023 13:17:52 +0000 (16:17 +0300)]
LU-5170 utils: Continue on error when multiple files requested

For lfs commands that accept multiple file arguments, processing
continues even if some of the files generate errors, instead
of immediately aborting. This follows the Unix convention of
attempting to process all requested files. Any relevant error
messages are reported as they are encountered.
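Below is a sketch of the "process every file, report errors as they occur" convention, assuming a hypothetical per-file callback. Returning the first error, so that the command still exits non-zero, is an assumption about how lfs reports the overall status. fake_op() is a hypothetical example operation.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-file operation: 0 on success or a negative errno. */
typedef int (*file_op_fn)(const char *path);

/* Run op on every path, reporting each failure immediately; return the
 * first error seen (0 if all succeeded). */
static int for_each_file(file_op_fn op, const char *const paths[], int npaths,
			 int *nprocessed)
{
	int rc = 0;

	assert(op != NULL && nprocessed != NULL);
	*nprocessed = 0;
	for (int i = 0; i < npaths; i++) {
		int rc2 = op(paths[i]);

		(*nprocessed)++;
		if (rc2 < 0) {
			fprintf(stderr, "%s: %s\n", paths[i], strerror(-rc2));
			if (rc == 0)
				rc = rc2;   /* remember the first failure, keep going */
		}
	}
	return rc;
}

/* Example operation: fails for paths starting with 'x'. */
static int fake_op(const char *path)
{
	return path[0] == 'x' ? -ENOENT : 0;
}
```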

Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I1bcfc18d17c76505796b8e367224d01f48731d9f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/30663
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16621 enc: file names encryption when using secure boot 19/50219/2
Alex Deiter [Mon, 6 Mar 2023 13:59:46 +0000 (13:59 +0000)]
LU-16621 enc: file names encryption when using secure boot

Secure boot activates lockdown mode in the Linux kernel,
and debugfs is restricted when the kernel is locked down.
This patch moves the file name encryption setting from debugfs to sysfs.

Test-Parameters: trivial testlist=sanity-sec
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Change-Id: I434714941ffac2a4694cabd33f613aef70933678
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50219
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16501 tgt: add qos debug 77/49977/4
Sergey Cheremencev [Mon, 13 Feb 2023 19:38:23 +0000 (22:38 +0300)]
LU-16501 tgt: add qos debug

Add several debug messages for the QOS allocator.
The patch also changes the debug subsystem from S_CLASS to S_LOV in
lu_tgt_desc_tgt.c, so that it can be enabled to capture
only QOS debugging.

Test-Parameters: trivial
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I2dd53022d7f199e7c521bbf78acc4a8bf4abca51
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49977
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16606 lnet: lnet_parse_route uses wrong loop var 73/50173/3
Chris Horn [Wed, 1 Mar 2023 19:27:21 +0000 (13:27 -0600)]
LU-16606 lnet: lnet_parse_route uses wrong loop var

When looping over the gateways list, we're referencing the wrong
loop variable to get the gateway nid (ltb instead of ltb2).

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 3b76020810 ("LU-6142 lnet: use list_first_entry() in lnet/lnet subdirectory.")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Idbcc5e211fc8fd49831ba572805b60be511d0ffd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-9341 tests: skip interop sanity test_27U 35/49935/4
Wei Liu [Tue, 7 Feb 2023 21:25:53 +0000 (13:25 -0800)]
LU-9341 tests: skip interop sanity test_27U

Skip sanity test_27U for MDS versions older than 2.15.51.
The append pool was added by LU-9341, and test_27U was
added by LU-15727.
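Here is a sketch of the version gate in C. It assumes a hypothetical one-byte-per-component encoding of major.minor.patch. The test framework's own version helper may encode versions differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical encoding: one byte each for major, minor and patch. */
static unsigned int version_code(const char *ver)
{
	unsigned int major = 0, minor = 0, patch = 0;

	assert(ver != NULL);
	/* missing components stay 0, e.g. "2.15" is treated as 2.15.0 */
	(void)sscanf(ver, "%u.%u.%u", &major, &minor, &patch);
	return (major << 16) | (minor << 8) | patch;
}

/* Skip test_27U unless the MDS is at least 2.15.51. */
static int should_skip_27U(const char *mds_version)
{
	return version_code(mds_version) < version_code("2.15.51");
}
```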

Fixes: e2ac6e1eaa ("LU-9341 lod: Add special O_APPEND striping")
Fixes: 0396310692 ("LU-15727 lod: honor append_pool with default composite layouts")
Test-Parameters: trivial testlist=sanity env=ONLY=27U serverversion=2.14.0

Change-Id: I5c9d2ebf4e1e660fa60d845c9894409f0e96e01f
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-15129 tests: sanity-quota_75_dom fix 64/50164/2
Sergey Cheremencev [Tue, 28 Feb 2023 18:30:32 +0000 (22:30 +0400)]
LU-15129 tests: sanity-quota_75_dom fix

oflag=sync causes dd to write page by page instead
of sending several pages in one RPC. Furthermore, when
granted space approaches soft_limit (i.e. over 9MB
if soft_limit is 10MB), the OST can no longer pre-acquire
space. Also, the OST can acquire only the requested
amount of space (see qmt_alloc_expand). Thus the OST has
to send a quota acquire request to the MDT for each BRW
request from the client. Sometimes 20 seconds is not enough
to write 10MB. Replace oflag=sync with conv=fsync to reduce
the number of RPCs between the client and OST and between
QSDs and QMT. A single fsync at close should help avoid
the timeout failure.

Test-Parameters: trivial testlist=sanity-quota
Test-Parameters: testlist=sanity-quota env=ONLY=75,ONLY_REPEAT=50
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Iad363fdc8a0984861055c295ea9cc3f23110fd9f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50164
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16285 ldlm: BL_AST lock cancel still can be batched 58/50158/3
Vitaly Fertman [Tue, 28 Feb 2023 01:45:15 +0000 (04:45 +0300)]
LU-16285 ldlm: BL_AST lock cancel still can be batched

The previous patch makes BL_AST locks be cancelled separately.
However, the main problem is flushing the data under the other batched
locks, so it is still possible to batch the cancel with locks that
have no data. This can be done only for locks that are not yet
CANCELLING; otherwise the lock is already in the l_bl_ast list.

Fixes: b65374d9 ("LU-16285 ldlm: send the cancel RPC asap")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: Ie4a7c7f3e0f5462290f72af7c3b2ff410a31f5e7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50158
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>