Whamcloud - gitweb
fs/lustre-release.git
13 months ago LU-17379 mgc: try MGS nodes faster
Mikhail Pershin [Mon, 22 Jan 2024 12:58:23 +0000 (15:58 +0300)]
LU-17379 mgc: try MGS nodes faster

Re-organize import_select_connection to try all NIDs
faster, at least in the first round.

- check NID LNet discovery status and skip those not
  discovered yet in the first round; in the next round just
  select the least recently used one
- reset AT timeout to minimal values in the first round
- track per-connection total connection attempts,
  how many were replied to, and discovery status, and
  output these in import stats
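
The selection policy in the list above can be modelled roughly as follows.
This is only a conceptual sketch in Python, not the kernel code: the Conn
class, its fields, and select_connection() are hypothetical names, and the
fallback used when no NID has been discovered yet is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical per-connection record (not the real kernel struct)."""
    nid: str
    discovered: bool      # assumed result of an LNet discovery check
    last_attempt: float   # time of the last attempt, 0.0 means never tried
    attempts: int = 0     # total connection attempts, kept for stats


def select_connection(conns, first_round):
    """Pick the next connection to try."""
    candidates = conns
    if first_round:
        # First round: skip NIDs that have not been discovered yet.
        discovered = [c for c in conns if c.discovered]
        # Assumption: if nothing is discovered yet, fall back to all NIDs.
        candidates = discovered or conns
    # Otherwise (and among the first-round candidates) take the least
    # recently used connection.
    chosen = min(candidates, key=lambda c: c.last_attempt)
    chosen.attempts += 1
    return chosen


conns = [
    Conn("a@tcp", discovered=False, last_attempt=0.0),
    Conn("b@tcp", discovered=True, last_attempt=5.0),
    Conn("c@tcp", discovered=True, last_attempt=2.0),
]
first = select_connection(conns, first_round=True)
later = select_connection(conns, first_round=False)
print(first.nid, later.nid)
```

In the first round the undiscovered NID is passed over even though it was
never tried; in a later round it becomes the least recently used candidate.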

Lustre-change: https://review.whamcloud.com/54022
Lustre-commit: 94d05d0737db256a64626bfe6fa9801819230d8a

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib4d043e82bf156cc3e7c9ddeff0055790edcc9ee
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54949
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17379 lnet: add LNetPeerDiscovered to LNet API
Serguei Smirnov [Mon, 5 Feb 2024 20:14:30 +0000 (12:14 -0800)]
LU-17379 lnet: add LNetPeerDiscovered to LNet API

LNetPeerDiscovered is added to allow Lustre to check
whether the peer has been successfully discovered by LNet
before attempting to open a connection to it.
For example, given a mount command with a list of NIDs,
Lustre can use LNetAddPeer API to initiate discovery on
every candidate first, and later use LNetPeerDiscovered
to select a reachable peer to connect to.

Lustre-change: https://review.whamcloud.com/53926
Lustre-commit: dba41355565397228f587f13a901b5d762521ed0

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I7c9964148a5a2a24d7889b8b4c2e488a433ca258
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54950
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-15277 quota: don't print extra default quota info
Hongchao Zhang [Wed, 20 Mar 2024 14:14:27 +0000 (22:14 +0800)]
LU-15277 quota: don't print extra default quota info

When getting quota info with "lfs quota", it's better to include the
default quota in the quota output of the specific quota ID.

Lustre-change: https://review.whamcloud.com/45725
Lustre-commit: 0a97a8a41796caa52ef27b0cc00b11ee5889c1fe

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I6726888b8857f9a45a96c83db0a546b29507cf8a
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55029
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17815 tests: skip conf-sanity.sh test_5h
Emoly Liu [Mon, 6 May 2024 04:24:00 +0000 (12:24 +0800)]
LU-17815 tests: skip conf-sanity.sh test_5h

Skip conf-sanity.sh test_5h because it consistently caused test_102 and
test_108 failures in recent interop testing.

Lustre-change: https://review.whamcloud.com/55012
Lustre-commit: TBD (from ab300315dbbd745aec91482fc59ea15e6909fe15)

Test-Parameters: trivial serverbuildno=606 serverjob=lustre-b_es5_2 serverdistro=el7.9 testlist=conf-sanity env=ONLY="5h 102 108",HONOR_EXCEPT=y
Test-Parameters: trivial testlist=conf-sanity
Fixes: d1b5146eda ("LU-12206 mdt: mdt_init0 failure handling")
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id6ffe8b5d88e1d79883cbf2d84d73796945fc734
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55013
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago EX-8423 csdc: disable gzip
Sergey Cheremencev [Sat, 27 Apr 2024 10:22:15 +0000 (13:22 +0300)]
EX-8423 csdc: disable gzip

Gzip compression periodically causes the
following assertion on clients:

 decompress_request()) ASSERTION( dst_size <= chunk_size )

Until this is fixed:
1. forbid setting gzip layout
2. remove gzip from sanity-compr.sh, sanity.sh, sanity-flr.sh,
   sanity-lfsck.sh, sanity-pfl.sh
3. remove gzip from ll_compression_scan

There is still a backdoor to set gzip for test purposes,
by setting LFS_SETSTRIPE_COMPR_OK. When it is set, gzip will be
applied in sanity-flr(43c, 43d) and sanity-compr(1a).
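
A minimal sketch of such an environment-gated check, in Python for
illustration only. The function names are hypothetical, and treating the
variable as "set" whenever it is present in the environment (regardless of
its value) is an assumption about how the override is interpreted.

```python
import os


def gzip_allowed(environ=None):
    """Return True only when the test-only override variable is present."""
    env = os.environ if environ is None else environ
    return "LFS_SETSTRIPE_COMPR_OK" in env


def compression_allowed(compr_type, environ=None):
    """Hypothetical gate: only gzip is restricted in this sketch."""
    if compr_type == "gzip":
        return gzip_allowed(environ)
    return True


print(compression_allowed("gzip", {}))
print(compression_allowed("gzip", {"LFS_SETSTRIPE_COMPR_OK": "1"}))
```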

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I5461ba756dcd15e0d705f3a3c51a125a59ec19a5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54943
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago EX-9449 csdc: replace assert with error message
Artem Blagodarenko [Tue, 16 Apr 2024 20:40:41 +0000 (21:40 +0100)]
EX-9449 csdc: replace assert with error message

Remove the assertion, which is based on data from the wire, and
replace it with an error message, which will be useful in case this
error happens.

The -EAGAIN error is reasonable in this case.
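
The general pattern (validate untrusted wire data and return an error
instead of asserting) might look like the sketch below. It is a Python
illustration only: the function name and the specific size comparison are
assumptions, since the commit message does not say which condition was
asserted.

```python
import errno
import logging

log = logging.getLogger("csdc_sketch")


def check_decompressed_size(dst_size, chunk_size):
    """Return 0 when the size is sane, or -EAGAIN when it is not.

    The values come from the wire, so a bad value is reported as an
    error rather than treated as a programming bug.
    """
    if dst_size > chunk_size:
        log.error("decompressed size %d exceeds chunk size %d",
                  dst_size, chunk_size)
        return -errno.EAGAIN
    return 0


print(check_decompressed_size(4096, 65536))
```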

Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I2f37d0204123af1c23352b967dad1de5e7860b64
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54817
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-15057 utils: pool quota man
Sergey Cheremencev [Wed, 31 Mar 2021 12:13:53 +0000 (15:13 +0300)]
LU-15057 utils: pool quota man

Add pool quota documentation to the man pages for the
setquota and quota commands.
Remove [-o <obd_uuid>|-i <mdt_idx>|-I <ost_idx>]
from the case "lfs quota -t". The grace period
is stored only on the quota master. Furthermore,
the command "lfs quota -t -I 0 /mnt/testfs" fails
with EOPNOTSUPP.

Test-Parameters: trivial
Lustre-change: https://review.whamcloud.com/45121/
Lustre-commit: I368e22b782bd3626f64907059ea329e94986535b

Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Change-Id: I0e2d2c3df05c0053a1306dec9aa7353ce80162df
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17498 tests: show NIDs in node summary page
Andreas Dilger [Mon, 25 Sep 2023 17:53:18 +0000 (11:53 -0600)]
LU-17498 tests: show NIDs in node summary page

Instead of only showing the network type for each node,
show the full NID in the YAML file to help with debugging and
identifying nodes in the logs.

Lustre-change: https://review.whamcloud.com/52500
Lustre-commit: 8e1f0cc90785463fb9ea847a8d1362941e82bcae

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7ee39b08c5cae5a3f9ee4ea4dbee001a6d889fbb
Reviewed-by: Lee Ochoa <lochoa@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54958
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17761 tests: make sanity-compr sanity/sanityn return 0
Jian Yu [Mon, 29 Apr 2024 17:26:50 +0000 (10:26 -0700)]
LU-17761 tests: make sanity-compr sanity/sanityn return 0

While running sanity-compr sanity/sanityn, if there was a
sub-subtest failure, the sanity/sanityn test_cleanup would
be incorrectly marked as FAIL.

We should leave it to the individual sanity/sanityn subtests
to mark their failures; test_sanity() and test_sanityn()
should not also return an error.

Lustre-change: https://review.whamcloud.com/54855
Lustre-commit: TBD (from 96767ff8af44b5dac0677db759634515de1d1802)

Test-Parameters: trivial testlist=sanity-compr
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I1fd645b80b92e583f1a564f85e6d2d6d871b8fa8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54856
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17717 tests: skip sanity-lnet/252 for interop
Alex Zhuravlev [Tue, 9 Apr 2024 10:14:01 +0000 (13:14 +0300)]
LU-17717 tests: skip sanity-lnet/252 for interop

Skip the subtest because it fails on finding the memory leak, which
has been fixed recently.

Lustre-change: https://review.whamcloud.com/54707
Lustre-commit: d1c08e04cd331cdcc90a38cc6b1adc73b7da9c93

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ide80e0b39a053a2774804b025306ebdb1fc964a8
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54956
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago EX-9611 lipe: ignore pylint warnings
Andreas Dilger [Tue, 16 Apr 2024 21:34:14 +0000 (15:34 -0600)]
EX-9611 lipe: ignore pylint warnings

The Python-based lipe code is deprecated, but fails during
building because of newer pylint warnings.  Ignore errors
from pylint during building until someone fixes them.

Make the installation of pylint optional to simplify builds.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ibd94f6ef5ef69b1fd597f40bbecca6e3c3fb8f02
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54861
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
13 months ago EX-9681 build: disable objtool and UBSAN warnings
Jian Yu [Mon, 29 Apr 2024 08:24:41 +0000 (01:24 -0700)]
EX-9681 build: disable objtool and UBSAN warnings

While building and running Lustre client with
kernel 6.8.0-31-generic, there are lots of
objtool compile-time warnings as follows:

  warning: objtool: __cfs_fail_check_set()
  falls through to next function __cfs_fail_timeout_set()

and also UBSAN runtime warnings as follows:

  UBSAN: array-index-out-of-bounds in libcfs_mem.c:97:3
  index 0 is out of range for type 'void *[*]'

Before all of the warnings are actually fixed,
we temporarily disable them to quiet the warnings
in build and system logs.

Change-Id: I18630f9a8aa6fd7c2b33b4eb8103fd7e2f6e19de
Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54948
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago EX-9688 build: Update lipe distro support
Minh Diep [Mon, 29 Apr 2024 23:52:39 +0000 (16:52 -0700)]
EX-9688 build: Update lipe distro support

In ARM rocky9.3, Rocky is now RockyLinux

Test-Parameters: trivial

Change-Id: I232a1066e3cda8e4cb1be04133432075a20402fe
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54957
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago RM-620 build: New tag 2.14.0-ddn145
Andreas Dilger [Sat, 27 Apr 2024 22:34:52 +0000 (16:34 -0600)]
RM-620 build: New tag 2.14.0-ddn145

New tag 2.14.0-ddn145

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I787fe5c5f8e68bc1a2b6b6c095eca3cbdf68c86c

13 months ago RM-620 build: New tag lipe-2.49
Andreas Dilger [Sat, 27 Apr 2024 22:34:25 +0000 (16:34 -0600)]
RM-620 build: New tag lipe-2.49

New tag lipe-2.49

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4d2205573cc92586895c54bf513ca78d501982fe

13 months ago LU-17463 osc: add option to disable page cache shrinker
Qian Yingjin [Wed, 24 Jan 2024 02:43:38 +0000 (21:43 -0500)]
LU-17463 osc: add option to disable page cache shrinker

Pages mapped into VM_LOCKED [mlock()ed] VMAs are unevictable
pages. Those pages are marked with PG_mlocked.
However, the page cache shrinker in Lustre treats all cached pages
equally, even if some of them are unevictable. It may wrongly evict
pages locked by mlock() or mlockall() calls.

This patch adds a tunable option to enable or disable the page cache
shrinker:
- osc.*.enable_page_cache_shrink
It is enabled by default.

Lustre-Change: https://review.whamcloud.com/53795
Lustre-Commit: d90ce0aab10ee8856140720cd71935da6877a5ab

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I23ebf6d438a71c7917b0cb3375407a64587e15db
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54754
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-16736 quota: set revoke time to avoid endless wait
Hongchao Zhang [Fri, 26 Jan 2024 10:57:34 +0000 (18:57 +0800)]
LU-16736 quota: set revoke time to avoid endless wait

The revoke time of the lquota entry should be set when its qunit
reaches the least qunit, but in some rare cases it is not set,
which could be related to a broken quota LDLM lock. Set it in
"qmt_acquire" to avoid an endless wait in QSD.

Lustre-change: https://review.whamcloud.com/50626
Lustre-commit: 49730821c4e5116f188c931830ce23b2da2d8a41

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ib68c5dc881346e0e619d43553ee490847ae5e225
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54907
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago EX-9588 lipe: lipe_scan3 compatibility for EXA5.2.8
Andreas Dilger [Fri, 26 Apr 2024 04:02:14 +0000 (22:02 -0600)]
EX-9588 lipe: lipe_scan3 compatibility for EXA5.2.8

Add a compatibility implementation of llapi_layout_compress_get()
so lipe_scan3 can run on EXA5.2.8 that has an old liblustreapi.so
without this function.

Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7de9164467ea889ee2d47c7fbb18bfd7acce7057
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54924
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
13 months ago EX-8779 build: enable build kernel_abi_stablelists
Minh Diep [Tue, 9 Apr 2024 15:49:23 +0000 (08:49 -0700)]
EX-8779 build: enable build kernel_abi_stablelists

To build kernel_abi_stablelists, we need to rebuild with noarch

Test-Parameters: trivial

Change-Id: I0f8abfa9a4a20539ffd0faa9ad70037fd4ef1685
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54711
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago EX-7680 tests: Skip sanity 430c for CSDC layout
Wei Liu [Mon, 15 Apr 2024 23:26:02 +0000 (16:26 -0700)]
EX-7680 tests: Skip sanity 430c for CSDC layout

Skip sanity test_430c until SEEK_HOLE is implemented for CSDC

Test-Parameters: trivial testlist=sanity-compr env=ONLY="sanity",COMPR_EXTRA_LAYOUT="-E 1M -c 1 -E eof -Z lz4:3"
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I7359262de9e3d1644d2a45b5336328bd8253f91b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54798
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago EX-8130 lipe: Add JSON report type for dirs stats
Vitaliy Kuznetsov [Thu, 18 Apr 2024 16:47:57 +0000 (18:47 +0200)]
EX-8130 lipe: Add JSON report type for dirs stats

This patch adds functions for displaying size statistics
for directories in the general report. This is necessary
to merge reports in the future.
This patch adds support for *.json format only.

Example structure:
"DirectoriesStats":{
  "SourceDirectory":"",
  "MaxDepth":4,
  "TotalSizeBytes":104861696,
  "TotalAllocatedSizeBytes":104861696,
  "RatingMinSizeBytes":0,
  "RatingMaxSizeBytes":104861696,
  "Rating":[
    {
      "RatingPosition":0,
      "SizeBytes":104861696,
      "AllocatedSizeBytes":104861696,
      "Depth":0,
      "FilesCount":1,
      "DirsCount":1,
      "UserID":0,
      "FID":"0x200000401:0x2:0x0",
      "DirectoryName":"d308.sanity-lipe-scan3",
      "Path":"d308.sanity-lipe-scan3"
    }
  ],
  "MainTree":{
    "ChildDirectories":[
      {
        "SizeBytes":104861696,
        "AllocatedSizeBytes":104861696,
        "Depth":0,
        "FilesCount":1,
        "DirsCount":1,
        "UserID":0,
        "GroupID":0,
        "ProjID":0,
        "Atime":1713188451,
        "Mtime":1713188451,
        "Ctime":1713188451,
        "Crtime":1713188451,
        "FID":"0x200000401:0x2:0x0",
        "DirectoryName":"d308.sanity-lipe-scan3",
        "Path":"d308.sanity-lipe-scan3",
        "ChildDirectories":[
        ]
      }
    ]
  }
}

Additional fields will also be added, and some
checks to display file statistics in JSON format.
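
A consumer of this report could walk the directory tree along the following
lines. This is a hedged Python sketch: iter_dirs() is a hypothetical helper,
and it assumes the "DirectoriesStats" member shown above sits inside a
top-level JSON object. Only a subset of the fields from the example is used.

```python
import json


def iter_dirs(node):
    """Yield (path, size_bytes) for every directory below 'node'."""
    for child in node.get("ChildDirectories", []):
        yield child["Path"], child["SizeBytes"]
        # Recurse into nested ChildDirectories lists.
        yield from iter_dirs(child)


# Trimmed-down copy of the example structure, wrapped in an object.
report_text = """
{"DirectoriesStats": {
  "TotalSizeBytes": 104861696,
  "MainTree": {"ChildDirectories": [
    {"Path": "d308.sanity-lipe-scan3",
     "SizeBytes": 104861696,
     "ChildDirectories": []}
  ]}
}}
"""

stats = json.loads(report_text)["DirectoriesStats"]
dirs = list(iter_dirs(stats["MainTree"]))
print(dirs)
```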

Test-Parameters: trivial testlist=sanity-lipe-scan3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Ib250dc684cdd16e21187a710b855f4fffcf0eed1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54283
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago EX-9136 lipe: improve attributes wording in report
Andreas Dilger [Tue, 16 Apr 2024 07:48:55 +0000 (01:48 -0600)]
EX-9136 lipe: improve attributes wording in report

Print "/" for root directory instead of "".

Improve wording of attributes descriptions in the report.

Change LS3_STATS_TYPE_EQUAL_OVERHEAD to have a 1-block margin
for the "equal" size.  Remove "overhead" from field description.

Remove extra spaces before tabs throughout file.

Test-Parameters: trivial testlist=sanity-lipe-find3,sanity-lipe-scan3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieabde4da50e1c24887789196c6c9a14a57fc9d4f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54805
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
13 months ago LU-14291 tests: make module loading of ost optional
James Simmons [Wed, 14 Feb 2024 12:38:25 +0000 (07:38 -0500)]
LU-14291 tests: make module loading of ost optional

Future Lustre versions will no longer have an ost kernel module.
load_module in the test framework will then fail, so capture the
failure and ignore it. We will need this for interop testing.

Lustre-change: https://review.whamcloud.com/54040
Lustre-commit: ef7deb7b076e554279f88f6d57afa17884027f9a

Test-Parameters: trivial
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Iedff4f6a36ceffa9428e3f891db78b7538217085
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54799
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
13 months ago LU-16692 tests: force_new_seq_all interop version checking
Li Dongyang [Thu, 18 Apr 2024 11:10:39 +0000 (21:10 +1000)]
LU-16692 tests: force_new_seq_all interop version checking

force_new_seq_all is still needed in those test suites
when testing against servers that don't have v2_15_61-226-gf00d2467fc

Lustre-change: https://review.whamcloud.com/54840
Lustre-commit: TBD (from 944c6d7017c08cc81d72b43cc4fc73a820111dd1)

Test-Parameters:trivial serverversion=EXA6.3.0 testlist=replay-single,replay-ost-single,replay-dual,recovery-small,replay-vbr,sanity-pfl

Change-Id: Iab963ac10308b56a60508774c1a63bcdfffdba85
Fixes: c0c664cac1 ("LU-16692 tests: remove force_new_seq from some test suites")
Fixes: 55a9dfb82d ("LU-16692 osp: do not assert on seq got over network")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17650 gss: fix use out of bounds in ptlrpc_gss
Oleg Drokin [Tue, 19 Mar 2024 03:10:13 +0000 (23:10 -0400)]
LU-17650 gss: fix use out of bounds in ptlrpc_gss

KASAN highlighted that the sockaddr_un struct is not big enough
for the kernel primitives we use, so we have to use the
bigger sockaddr_storage for allocation. Alas, the field
names inside are different, so we have to jump through some
hoops to make it actually work.
Also, for a 128-byte allocation an on-stack variable is fine and
cannot fail, so convert to that.

Lustre-change: https://review.whamcloud.com/54452
Lustre-commit: 9519751c59f3a31b1c1fc2f7771699000aca09a2

Change-Id: I2292900b54756bf39530c96f7c5c228835562bef
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54892
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-9839 clio: lov active ios accounting fix
Alexander Zarochentsev [Tue, 21 Nov 2023 14:46:44 +0000 (09:46 -0500)]
LU-9839 clio: lov active ios accounting fix

ASSERT(atomic_read(&lov->lo_active_ios)==0) is triggered due to a
bug in active_ios accounting. For some cl_io_init(,CIT_MISC,,)
calls, the increment of the lov_active_ios counter is not protected
by the layout lock. So the checks for active_ios != 0 are racy and do
not prevent another thread from starting a new cl_io and incrementing
the active_ios counter after any check but before the assertion.

The lov_active_ios counter increment should be done under the
same condition as taking the layout type lock.
The ci_type=CIT_MISC and ci_ignore_layout=1 should not be used
in ll_dom_finish_open() as the I/O doesn't come
"from the osc layer" and may race with a layout change.

Lustre-change: https://review.whamcloud.com/51638
Lustre-commit: 5bc1dd825b700677b002a43463a463c3ccb665ec

HPE-bug-id: LUS-11628
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I35fda85b968b847a87e73dd36bbb1648c744d62c
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17624 ssk: support FIPS mode on client
Sebastien Buisson [Wed, 6 Mar 2024 15:33:25 +0000 (15:33 +0000)]
LU-17624 ssk: support FIPS mode on client

In FIPS mode, only certain crypto methods are allowed. This has an
impact on the DHKE mechanism implemented for SSK, as this relies on
a prime number generated for the client key. More specifically, FIPS
mode imposes that only certain safe, well-known primes be used.

OpenSSL prior to v1.1 just imposes a requirement on the prime length.
OpenSSL v1.1 requires the use of a specific primitive when FIPS mode
is on, to fetch a well-known prime based on a prime NID.
OpenSSL v3 is capable of detecting FIPS mode is enforced, and picks up
a well-known prime instead of generating one.

Because of this, primes used for the DHKE are identical on all clients
in FIPS mode. So urge admins to use a short expiration time on SSK
keys, one day instead of one week, so that security contexts are
re-negotiated more often.

For the NIST recommended primes, see Table 26 in Appendix D of:
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Ar3.pdf

Lustre-change: https://review.whamcloud.com/54314
Lustre-commit: 5dc91df283fb5a7030b384f224085d73268dcca5

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1 clientdistro=el9.2
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2 clientdistro=el9.2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I52b1926393e51fba6a9e92a837f86a38516ef6ad
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54804
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17643 gss: make a local copy of the sptlrpc llog
Sebastien Buisson [Thu, 14 Mar 2024 17:15:29 +0000 (18:15 +0100)]
LU-17643 gss: make a local copy of the sptlrpc llog

Make a local copy on server side of the sptlrpc llog, so that
the targets that do not manage to connect to the MGS know at least
which security flavor to accept from clients.
This needs to pass the super_block to config_log_find_or_add().

Add sanity-sec test_70 to check that sptlrpc llog on MDS and OSS side
is equivalent to the one from the MGS.

Lustre-change: https://review.whamcloud.com/54394
Lustre-commit: 5921cb2a5b8b7e1301b2c1502be6f8006ab4082a

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I81f0136746e2df7cca1b34c4a17e4b7135a43c29
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54803
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-5134 utils: Add parallel option to lctl set_param
Ryan Haasken [Tue, 3 May 2016 19:49:57 +0000 (15:49 -0400)]
LU-5134 utils: Add parallel option to lctl set_param

Add a "-t" option to lctl set_param to enable setting multiple matched
parameters in parallel. When called with "-t", lctl will set up a work
queue of matched file names and spawn a fixed number of threads per
CPU. Each thread will pull items off the work queue, write to the file
associated with each work item, and return when there are no more
items on the work queue.

A field called po_parallel_threads is added to struct param_opts to
indicate the number of threads set_param should run in parallel. If in
parallel, jt_lcfg_setparam initializes a work queue and passes it to
do_param_op, which adds each matched item to the work queue. Once
jt_lcfg_setparam has called do_param_op for each param-value pair, it
passes the work queue to sp_run_threads, which creates threads, each
of which calls write_param to set the parameter. If not in parallel,
jt_lcfg_setparam does not pass a work queue to do_param_op, and
do_param_op directly calls write_param on each matched param.
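
The control flow described above can be sketched as below. This is a Python
model for illustration, not the lctl implementation: the function names only
echo the ones in the commit message, the dictionary stands in for the
parameter files, and the parameter names in the usage lines are made up.

```python
import queue
import threading


def write_param(path, value, store):
    """Stand-in for writing 'value' to a parameter file."""
    store[path] = value


def run_threads(work, num_threads, store):
    """Drain the work queue with a fixed number of worker threads."""
    def worker():
        while True:
            try:
                path, value = work.get_nowait()
            except queue.Empty:
                return  # no more items: the thread exits
            write_param(path, value, store)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def set_params(matches, parallel_threads=0):
    """Set every matched (path, value) pair, serially or in parallel."""
    store = {}
    if parallel_threads > 0:
        # Parallel mode: queue every matched item, then run the workers.
        work = queue.Queue()
        for item in matches:
            work.put(item)
        run_threads(work, parallel_threads, store)
    else:
        # Serial mode: write each matched parameter directly.
        for path, value in matches:
            write_param(path, value, store)
    return store


matches = [("example.param%d" % i, "64") for i in range(8)]
print(set_params(matches, parallel_threads=4) == set_params(matches))
```

Both modes end with the same parameters set; only the order of the writes
differs in the parallel case.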

param_display was renamed to do_param_op to more accurately reflect
what it does.

If lctl is compiled without pthread support, "lctl set_param" will
still accept the "-t" option, but it will print a warning message, and
it will set the parameters in series.

The new "-t" option to set_param was documented in the lctl usage and
in the man page.

Lustre-change: https://review.whamcloud.com/10555
Lustre-commit: 345a2497d08f6b9afd74ed0188a70489f7a43e5d

HPE-bug-id: LUS-2592
Signed-off-by: Ryan Haasken <haasken@cray.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3f96a6f06c50d4ba2ce97050c35f46b976dfc005
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54878
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17713 mdd: validate the length of mdd append_pool name
Emoly Liu [Wed, 10 Apr 2024 09:18:03 +0000 (09:18 +0000)]
LU-17713 mdd: validate the length of mdd append_pool name

Validate the length of mdd append_pool name (<= LOV_MAXPOOLNAME)
before saving it in function append_pool_store().
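
The check amounts to a bounds test before the value is stored, roughly as in
this Python sketch. The value 15 for LOV_MAXPOOLNAME and the -ENAMETOOLONG
return code are assumptions made for illustration; the dictionary stands in
for the mdd device state.

```python
import errno

LOV_MAXPOOLNAME = 15  # assumed value, for illustration only


def append_pool_store(state, name):
    """Store the pool name if it fits, else return a negative errno."""
    if len(name) > LOV_MAXPOOLNAME:
        # Hypothetical error code; the real one is not stated here.
        return -errno.ENAMETOOLONG
    state["append_pool"] = name
    return 0


state = {}
print(append_pool_store(state, "pool1"), state)
```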
Also, sanity.sh test_27M is improved a little to verify this fix.

Lustre-change: https://review.whamcloud.com/54691
Lustre-commit: 509a7cf9778968f796794c3743e62bc6b2a71592

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id7083fab60e9a18af4d8eedfa3d55f37544ba15d
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54889
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17692 flock: get extra reference for lockd
Yang Sheng [Thu, 28 Mar 2024 19:54:06 +0000 (03:54 +0800)]
LU-17692 flock: get extra reference for lockd

We should get local locking first for GETLK. Else
the lock_owner could be released while working with
lockd.

Lustre-change: https://review.whamcloud.com/54622
Lustre-commit: 7f8af8f37eadb0d332c94472ae9cb9556f4425d2

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I56e4204e315c2bdbc496b7961519ae45ab1820fe
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago EX-9392 sec: add server_upcall rbac role
Sebastien Buisson [Tue, 12 Mar 2024 10:32:59 +0000 (11:32 +0100)]
EX-9392 sec: add server_upcall rbac role

The purpose of the new server_upcall rbac role is to control whether
clients use the server side defined identity upcall. When set, clients
do comply with the server side identity upcall. When not set, clients
are leveraging the special INTERNAL identity upcall, which means
servers trust supplementary groups as provided by the clients.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I01dcedad5da0e175aa7b8d187f2affd34d933e39
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17518 gss: do not trust supp groups from client with krb
Sebastien Buisson [Fri, 9 Feb 2024 15:42:40 +0000 (16:42 +0100)]
LU-17518 gss: do not trust supp groups from client with krb

Thanks to Kerberos, Lustre does not have to trust clients anymore,
but relies on keytabs and tickets, cryptographically validated, to
recognize clients and users.
RPC provided supplementary groups should not be trusted, but checked
thanks to identity upcall and the trusted UID from the ticket.

Add sanity-krb5 test_9 to exercise this.

Lustre-change: https://review.whamcloud.com/53987
Lustre-commit: b09f56c208c6c34375d098f66075688f329b7c76

Test-Parameters: kerberos=true testlist=sanity-krb5 serverdistro=el8.8
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4113ef654492e76fcd377b2c0cc74e484b27850b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54801
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago EX-9530 tests: fix issues in backport of LU-13569
Serguei Smirnov [Fri, 26 Apr 2024 21:48:55 +0000 (14:48 -0700)]
EX-9530 tests: fix issues in backport of LU-13569

Backport of "LU-13569 tests: Check LNet Health recovery logic"
introduced adding of redundant lnets and drop rules.
Clean this up.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: trivial testlist=sanity-lnet clientversion=EXA6 serverversion=2.15
Fixes: 2b6f7a39 ("LU-13569 tests: Check LNet Health recovery logic")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I1e2d5d31f77a29504182650be30f9db7087d82cc
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-17440 lnet: prevent erroneous decref for asym route
Gian-Carlo DeFazio [Thu, 29 Feb 2024 00:44:48 +0000 (16:44 -0800)]
LU-17440 lnet: prevent erroneous decref for asym route

The following stack trace was seen on a lustre server:
Call Trace TBD:
[<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[<0>] lnet_destroy_peer_ni_locked+0x44d/0x4e0 [lnet]
[<0>] lnet_handle_find_routed_path+0x86c/0xee0 [lnet]
[<0>] lnet_select_pathway+0xb95/0x16c0 [lnet]
[<0>] lnet_send+0x6d/0x1e0 [lnet]
[<0>] lnet_parse_local+0x3ed/0xdd0 [lnet]
[<0>] lnet_parse+0xd7d/0x1490 [lnet]
[<0>] kiblnd_handle_rx+0x30e/0x900 [ko2iblnd]
[<0>] kiblnd_scheduler+0x104b/0x10d0 [ko2iblnd]
[<0>] kthread+0x14c/0x170
[<0>] ret_from_fork+0x1f/0x40

It was discovered that the lnet routes between the server
and a client cluster were misconfigured, so that the clients
had routes to the server through all 8 available routers,
but the server had routes to the clients through only 7 of
the routers.

The server was contacted by a client node through the
router with the missing route. It incremented the ref count
for the corresponding struct lnet_peer_ni for that router,
but then, because it had no route through that peer, changed
the value of the struct lnet_peer_ni to a peer with a route
back to the client. It then decremented the new
struct lnet_peer_ni which resulted in the ref count being
decremented to 0 which caused an LBUG.

Detect if the peer is a router to the appropriate net.
If so, decrement its ref count at the end of the function,
if not, decrement its ref count immediately.
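
As a rough illustration of the rule above, the following Python sketch
models the reference counting in user space. It is not the LNet code:
PeerNI, find_routed_path() and the routes_to set are hypothetical names
introduced only for this example, and the baseline reference of 1 is an
assumption.

```python
class PeerNI:
    """Toy stand-in for struct lnet_peer_ni (hypothetical model)."""

    def __init__(self, name, routes_to=()):
        self.name = name
        self.routes_to = set(routes_to)  # nets this peer can route to
        self.refcount = 1                # baseline reference held elsewhere

    def addref(self):
        self.refcount += 1

    def decref(self):
        # Dropping the baseline reference models the LBUG in the report.
        if self.refcount <= 1:
            raise RuntimeError("refcount of %s dropped to 0" % self.name)
        self.refcount -= 1


def find_routed_path(arrival_gw, routers, dst_net):
    """Pick a gateway for the reply while keeping addref/decref balanced."""
    arrival_gw.addref()
    if dst_net in arrival_gw.routes_to:
        # The arrival gateway is a router to the net: keep using it and
        # drop the reference only at the end of the function.
        chosen = arrival_gw
        arrival_gw.decref()
        return chosen
    # No route through the arrival gateway: drop its reference right away,
    # before the variable is pointed at a different peer.
    arrival_gw.decref()
    return next(r for r in routers if dst_net in r.routes_to)
```

In this model every addref() is paired with a decref() on the same
object, so a peer that was never referenced is never decremented.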

Lustre-change: https://review.whamcloud.com/53896
Lustre-commit: 2b210f39059be998b80b0acc13c12451960b63bb

Fixes: 60cfce ("LU-17062 lnet: Update lnet_peer_*_decref_locked usage")
Test-Parameters: testlist=sanity-lnet mdscount=1 osscount=2 clientcount=1
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I2d00faef60ae8768afa7afbb1b00a62ba90535bb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54883
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
13 months agoLU-17062 lnet: Update lnet_peer_*_decref_locked usage
Shaun Tancheff [Sat, 16 Sep 2023 05:54:54 +0000 (00:54 -0500)]
LU-17062 lnet: Update lnet_peer_*_decref_locked usage

Move decref's to occur after last reference to prevent
use after free.

Lustre-change: https://review.whamcloud.com/52184
Lustre-commit: 60cfceb8c59364f786b31ac36c2c245b9a1e495a

HPE-bug-id: LUS-11799
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2382ece560039383f644b6aee73a9481d6bb5673
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54897
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-17724 gss: fix bad use of user buffer in rsi upcall
Sebastien Buisson [Thu, 11 Apr 2024 06:58:19 +0000 (08:58 +0200)]
LU-17724 gss: fix bad use of user buffer in rsi upcall

Use the proper kernel buffer to print message out when
upcall_cache_set_upcall() returns an error.

Lustre-change: https://review.whamcloud.com/54730
Lustre-commit: fe8c195f7a5ef3e653b6eaff8863c4c94e97e28c

Fixes: a462a119ec ("LU-17497 obdclass: check upcall incorrect values")
Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ice781b4506822f1fd4ce0a062ce742f51e366525
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54887
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-15851 lnet: Adjust niov checks for large MD
Chris Horn [Sat, 16 Apr 2022 16:01:57 +0000 (10:01 -0600)]
LU-15851 lnet: Adjust niov checks for large MD

An LNet user can allocate a large contiguous MD. That MD can have >
LNET_MAX_IOV pages which causes some LNDs to assert on either niov
argument passed to lnd_recv() or the value stored in
lnet_msg::msg_niov. This is true even in cases where the actual
transfer size is <= LNET_MTU and will not exceed limits in the LNDs.

Adjust ksocklnd_send()/ksocklnd_recv() to assert on the return value
of lnet_extract_kiov().

Remove the assert on msg_niov (payload_niov) from kiblnd_send().
kiblnd_setup_rd_kiov() will already fail if we exceed ko2iblnd's
available scatter gather entries.

Lustre-change: https://review.whamcloud.com/47319
Lustre-commit: 105193b4a147257a0f9332053a16eb676dc99623

HPE-bug-id: LUS-10878
Test-Parameters: trivial
Fixes: 857f11169f ("LU-13004 lnet: always put a page list into struct lnet_libmd")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaa851d90f735d04e5167bb9c07235625759245b2
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54847
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-17630 osc: add cond_resched() to osc_lru_shrink()
Alex Zhuravlev [Tue, 9 Apr 2024 10:35:00 +0000 (13:35 +0300)]
LU-17630 osc: add cond_resched() to osc_lru_shrink()

osc_lru_shrink() may need to handle a large number of pages and can
therefore block scheduling for a long time. Add a couple of
cond_resched() calls to prevent kernel warnings and starvation of
other threads.

Lustre-change: https://review.whamcloud.com/54346
Lustre-commit: 69eb7b89c7f36ec6a8970e87fc8859207f4b9c0c

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I862c568ac777c0b929a1ffb61e246b079aee6718
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54708
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoEX-9192 ofd: take local chunk-aligned lock
Alex Zhuravlev [Wed, 24 Apr 2024 11:53:17 +0000 (14:53 +0300)]
EX-9192 ofd: take local chunk-aligned lock

Take a local chunk-aligned lock on the OST side to prevent racing
read-modify-write operations against the same
compressed chunk from the same client.
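
To illustrate what "chunk-aligned" means here, this Python sketch
expands a byte range to chunk boundaries. It is only an assumption about
the arithmetic, not the ofd code: chunk_align() is a hypothetical helper,
offsets are taken as inclusive, and the 64 KiB chunk size used in the
usage line is an example value.

```python
def chunk_align(start, end, chunk_size):
    """Expand the inclusive byte range [start, end] to chunk boundaries."""
    aligned_start = start - start % chunk_size
    aligned_end = (end // chunk_size + 1) * chunk_size - 1
    return aligned_start, aligned_end


# Two writes that touch the same chunk map to overlapping lock extents.
first = chunk_align(70000, 80000, 64 * 1024)
second = chunk_align(90000, 100000, 64 * 1024)
```

Because both ranges fall inside the same chunk, the aligned extents are
identical, so a lock on one would serialize against the other.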

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Change-Id: Iffaf2d2856e276cb2f9becce2506154314217e3c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54890
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months agoLU-17767 build: struct lsmcontext has slot or id member
Sebastien Buisson [Tue, 23 Apr 2024 19:03:19 +0000 (12:03 -0700)]
LU-17767 build: struct lsmcontext has slot or id member

With Ubuntu 24.04 kernel 6.8.0-31-generic, the struct lsmcontext uses
a field named 'id' to identify the LSM module, instead of 'slot' in
previous kernel versions.

Lustre-change: https://review.whamcloud.com/54881
Lustre-commit: TBD (from 7764f73b8e25f8658867e7ab080fe5d8ec62230b)

Fixes: 0e66489401 ("LU-16619 build: Ubuntu jammy 5.19 client support")
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I5080e60614b42ed63103f93cae1f481851742d0b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-17774 build: pass systemdsystemunitdir to "make debs"
Jian Yu [Fri, 26 Apr 2024 17:32:46 +0000 (10:32 -0700)]
LU-17774 build: pass systemdsystemunitdir to "make debs"

This patch passes "--with-systemdsystemunitdir" configure
option to the configure command performed in "make debs".
It also updates debian/lustre-{client,server}-utils.install
with the detected/specified directory for systemd service files.

Lustre-change: https://review.whamcloud.com/54902
Lustre-commit: TBD (from f2621099bbbc032a053800940cb62d03dfbd7120)

Test-Parameters: trivial clientdistro=ubuntu2204

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I7c36904ea0ed0f393a76b0fb0ad444b330dfa78c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54921
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoEX-9192 csdc: Fix the upper mergeable chunk pointer
Artem Blagodarenko [Tue, 16 Apr 2024 18:02:32 +0000 (19:02 +0100)]
EX-9192 csdc: Fix the upper mergeable chunk pointer

If a full chunk is followed by an unmergeable page, the upper
mergeable chunk pointer is occasionally set to this unmergeable page.
The chunk size is then calculated incorrectly, and the next condition
decides not to compress this chunk because its size is not equal to
the expected size.

The pointer update should be moved to the first instruction after
can_merge_pages().

Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I09fedc770c8bbcac4864b32372a941da5e0c7ac3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54814
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-17504 build: fix lock_handle array-index-out-of-bounds
Andreas Dilger [Sat, 27 Apr 2024 01:48:49 +0000 (18:48 -0700)]
LU-17504 build: fix lock_handle array-index-out-of-bounds

After Linux kernel patch "ubsan: Tighten UBSAN_BOUNDS on GCC"
(commit v6.4-rc2-1-g2d47c6956ab3), flexible trailing arrays
declared like 'lock_handle[2]' will generate warnings when
CONFIG_UBSAN & co. is enabled:

    UBSAN: array-index-out-of-bounds in ldlm_request.c:1282:18
    index 2 is out of range for type 'lustre_handle [2]'

The declaration lock_handle[LDLM_LOCKREQ_HANDLES] confuses the
compiler into thinking there are only two fields in lock_handle,
but the caller often allocates extra fields beyond this for more
locks to be cancelled due to Early Lock Cancellation or from LRU.

Rather than have a second flexible array after lustre_handle[2],
declare the whole array as flexible, and fix up the few sites
that are allocating this array to ensure LDLM_LOCKREQ_HANDLES
fields are allocated at a minimum.
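
The allocation rule described above can be sketched as a size
calculation. This Python model is an illustration only:
request_buffer_size() and its fixed_size parameter are hypothetical, and
the 8-byte handle size is an assumption, not taken from this patch.

```python
LDLM_LOCKREQ_HANDLES = 2   # minimum number of handles, as named in the text
HANDLE_SIZE = 8            # assumed size in bytes of one lustre_handle


def request_buffer_size(fixed_size, lock_count):
    """Bytes needed for the fixed part plus the flexible handle array."""
    handles = max(lock_count, LDLM_LOCKREQ_HANDLES)
    return fixed_size + handles * HANDLE_SIZE
```

With the whole array flexible, a caller asking for zero or one handle
still gets room for LDLM_LOCKREQ_HANDLES entries, while larger counts
grow the buffer linearly.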

This subtly changes the checks in wiretest.c due to the removal
of the 2 "base" handles in ldlm_request, but I believe this is not
changing the wire protocol because it still allocates those handles
directly, and I have verified interoperability with a 2.14.0 server.

Lustre-change: https://review.whamcloud.com/54926
Lustre-commit: TBD (from 765bf07e894178a6a6f1477559a793af3a52412e)

Test-Parameters: testlist=runtests clientversion=2.14
Test-Parameters: testlist=runtests serverversion=2.14
Test-Parameters: testlist=runtests clientversion=2.15
Test-Parameters: testlist=runtests serverversion=2.15
Test-Parameters: testlist=runtests clientversion=EXA5
Test-Parameters: testlist=runtests serverversion=EXA5
Test-Parameters: testlist=runtests clientversion=EXA6
Test-Parameters: testlist=runtests serverversion=EXA6
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9695fb44f1b5c84bb750d2983cdd8b939e3ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54941
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
13 months agoLU-17784 build: improve wiretest for flexible arrays
Shaun Tancheff [Fri, 26 Apr 2024 18:32:09 +0000 (11:32 -0700)]
LU-17784 build: improve wiretest for flexible arrays

Flexible array checking can additionally probe that the size
of the array element is correct.

Lustre-change: https://review.whamcloud.com/54929
Lustre-commit: TBD (from a5cbf26e7985dfe60471d060439eb7cd90a17fc2)

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib7de3d156a2e77dfaf2e9ab1df8fab524c073610
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54938
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-17504 build: fix gcc-13 [-Werror=stringop-overread] error
Shaun Tancheff [Thu, 25 Apr 2024 22:36:44 +0000 (15:36 -0700)]
LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error

This patch fixes the following [-Werror=stringop-overread] and
[-Werror=attribute-warning] errors detected by gcc 13:

lustre/mgc/mgc_request.c:190:21: error: 'strcmp' reading 1 or
more bytes from a region of size 0 [-Werror=stringop-overread]
  190 | if (strcmp(logname, cld->cld_logname) == 0) {
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In function 'fortify_memcpy_chk',
    inlined from 'class_handle_ioctl' at
/root/lustre-release/lustre/obdclass/class_obd.c:381:3:
include/linux/fortify-string.h:528:25: error:
call to '__write_overflow_field' declared with attribute warning:
detected write beyond size of field (1st parameter);
maybe use struct_group()? [-Werror=attribute-warning]
  528 |  __write_overflow_field(p_size_field, size);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lustre-change: https://review.whamcloud.com/54834
Lustre-commit: TBD (from 787b45323742a00e262334ba6dfa8c7aff80bdac)

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I59f5a88b4cd64c9f4e67e568546baada371543b1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54874
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months agoLU-17657 build: gcc 13 stricter enum checking
Shaun Tancheff [Fri, 26 Apr 2024 17:53:36 +0000 (10:53 -0700)]
LU-17657 build: gcc 13 stricter enum checking

gcc 13 does not allow mixing of enum and integer
types between function declaration and implementation.

Clean up a couple of instances where an enum is treated
as a uint32_t / __u32 and treat it as an enum type.

lustre/lov/lov_ea.c: In function 'lsme_unpack_comp':
lustre/lov/lov_ea.c:531:21: error: array subscript
   'struct lov_stripe_md_entry[0]' is partly outside array bounds
    of 'struct lov_stripe_md_entry[0]' [-Werror=array-bounds=]
  531 |                 lsme->lsme_magic = magic;

Lustre-change: https://review.whamcloud.com/54468
Lustre-commit: TBD (from 617e7a25b12e0cdb865188414b6d1206eedec69a)

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8e2ef989ecbdebe5e13bcea0fbb210c4a14eb45e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54873
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoRM-620 build: New tag 2.14.0-ddn144
Andreas Dilger [Mon, 15 Apr 2024 09:59:08 +0000 (03:59 -0600)]
RM-620 build: New tag 2.14.0-ddn144

New tag 2.14.0-ddn144

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2f3f6483d625cc777bcdd310e3acdde0530b3fb8

14 months agoRM-620 build: New tag lipe-2.48
Andreas Dilger [Mon, 15 Apr 2024 09:58:40 +0000 (03:58 -0600)]
RM-620 build: New tag lipe-2.48

New tag lipe-2.48

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I89c19d04e633a386d12553fda6cecca2b5b38322

14 months agoEX-9585 lipe: add lipe_find3 projid option
Andreas Dilger [Sat, 13 Apr 2024 05:38:00 +0000 (23:38 -0600)]
EX-9585 lipe: add lipe_find3 projid option

Add an option to print the project ID for a file with the
"-printf" argument, both as the long option "%{projid}" and as
the short option "%LP", which is compatible with "lfs find".

Sort all of the existing and new options alphabetically so that
it is easier to see which ones are implemented in the future.

Update the lipe-find3.1 man page and add a test case.

Test-Parameters: trivial testlist=sanity-lipe-find3.sh
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I18d2d3cc161c8aa92eb27c33b06214b6f5ce7057
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54784
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17685 utils: Allow nocompr flag in lfs mirror extend
Alexandre Ioffe [Thu, 28 Mar 2024 02:34:58 +0000 (19:34 -0700)]
LU-17685 utils: Allow nocompr flag in lfs mirror extend

Extend the set of allowed optional flags in
'lfs mirror extend' command by LCME_FL_NOCOMPR. Allowed syntax:
--flags=prefer
--flags=nocompr
--flags=prefer,nocompr
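
The accepted syntax can be illustrated with a small validator. This is
a Python sketch, not the lfs option parser: ALLOWED_FLAGS and
parse_mirror_extend_flags() are hypothetical names, and the set only
contains the two flag names listed above.

```python
ALLOWED_FLAGS = {"prefer", "nocompr"}


def parse_mirror_extend_flags(value):
    """Split a --flags= value and reject names outside the allowed set."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in ALLOWED_FLAGS]
    if unknown:
        raise ValueError("unsupported flag(s): " + ", ".join(unknown))
    return set(names)
```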

Lustre-change: https://review.whamcloud.com/54640
Lustre-commit: 37e1316050c93e5233f77ebcd399a8272b989605

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Id1538182eca0142464c19c0c4b1406592e615be1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54593
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
14 months agoEX-9524 mdt: enable parallel_rename_crossdir
Li Xi [Fri, 5 Apr 2024 13:21:45 +0000 (21:21 +0800)]
EX-9524 mdt: enable parallel_rename_crossdir

parallel_rename_crossdir was not enabled due to a problem when
porting the following patch.

Fixes: ce01016a4a ("LU-17426 mdt: relax same MDT file rename lock")

The test case that exercises the feature was not run due to
a version check problem when porting the following patch.

Fixes: bc59df8232 ("LU-17426 tests: add crossdir parallel rename test")

Change-Id: I9316c599c6bd24891fbab3484935147d812b6f1c
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54682
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17078 ldlm: do not spin up thread for local cancels
Patrick Farrell [Thu, 31 Aug 2023 00:10:30 +0000 (20:10 -0400)]
LU-17078 ldlm: do not spin up thread for local cancels

When doing lockless IO on the client, the server is
responsible for taking LDLM locks for each IO.

Currently, the server sends these locks to a separate
thread for cancellation.  This behavior is necessary on the
client where a lock may protect a large number of cached
pages, so cancelling it in a user thread may introduce
unacceptable delays.  But the server doesn't have cached
pages, so it makes more sense for the server to do the
cancellation in the same thread.

We do this by not spinning up an ldlm_bl thread for
cancellations of local (server side only) locks.

This improves 4K DIO random read performance by about 9%.

Without patch, maximum server IOPs on 4K reads:
2864k IOPS

With patch:
3118k IOPS

This is the maximum performance achieved with many clients
and client threads doing 4K random AIO reads from different
files.
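
The "about 9%" figure follows directly from the two IOPS numbers above;
the arithmetic can be checked with a few lines of Python (percent_gain()
is a helper name made up for this example):

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


gain = percent_gain(2864, 3118)   # both values in thousands of IOPS
print("improvement: %.1f%%" % gain)
```

The result is just under 8.9%, which the commit message rounds to 9%.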

Lustre-change: https://review.whamcloud.com/52192
Lustre-commit: 291ac6e6925e3bdf31f527de2bedf5f19706b230

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia996732780d278c5d0bc290c5484e3bc325a347a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52193
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17297 grant: move tgt_grant_sanity_check() calls
Vladimir Saveliev [Fri, 17 Nov 2023 15:30:06 +0000 (18:30 +0300)]
LU-17297 grant: move tgt_grant_sanity_check() calls

Call tgt_grant_sanity_check() in ofd_obd_disconnect() and in
mdt_obd_disconnect() after call to tgt_grant_discard().

Otherwise, the sum of grants does not match the total grant counter,
which is reported as a LustreError:
    ofd_obd_disconnect: tot_granted 0 != fo_tot_granted 8388608

This is because on stale export eviction
class_disconnect_stale_exports() moves stale exports to a separate list
but does not update the obd's grant counters.
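
The ordering problem can be reproduced in a toy model. The Python
sketch below is an assumption-laden illustration, not the target code:
the Target class, its linked/stale dictionaries and the check_first
switch are all hypothetical and only mimic the behaviour described
above.

```python
class Target:
    """Toy model of per-export grants and the target-wide counter."""

    def __init__(self):
        self.linked = {}       # exports still on the main list: name -> grant
        self.stale = {}        # exports moved aside on eviction
        self.tot_granted = 0   # target-wide counter

    def connect(self, name, grant):
        self.linked[name] = grant
        self.tot_granted += grant

    def evict_stale(self, name):
        # Moves the export to a separate list; the counter is not updated.
        self.stale[name] = self.linked.pop(name)

    def grant_discard(self, name):
        self.tot_granted -= self.stale[name]
        self.stale[name] = 0

    def sanity_check(self):
        return sum(self.linked.values()) == self.tot_granted

    def disconnect(self, name, check_first):
        """Return the sanity-check result for the chosen call order."""
        if check_first:              # old order: check, then discard
            ok = self.sanity_check()
            self.grant_discard(name)
        else:                        # patched order: discard, then check
            self.grant_discard(name)
            ok = self.sanity_check()
        return ok
```

In this model the check only passes when it runs after the discard,
which is the call order the patch establishes.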

Test to illustrate the issue is included.

Lustre-change: https://review.whamcloud.com/53171
Lustre-commit: 9df01eee755bbac5bed560f365fab85c1b1164ae

Test-Parameters: trivial testlist=recovery-small env=ONLY=156 serverversion=EXA5
Test-Parameters: trivial testlist=recovery-small env=ONLY=156 serverversion=2.15.4
HPE-bug-id: LUS-11469
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I0b4568b88a2fe7b50f4eac50b4b064d7afbc7a75
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54672
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17510 obdclass: fix wake up when queuing close request.
Mr NeilBrown [Mon, 4 Mar 2024 02:15:17 +0000 (13:15 +1100)]
LU-17510 obdclass: fix wake up when queuing close request.

The waitqueue for requests that need to be sent but that haven't been
allocated a slot is kept ordered by request arrival for fairness.  So
new requests are added to the end.

For requests other than 'close' there is a limit to the number of
active requests (slots) and requests are assigned to slots on a
first-come-first-served basis, so they are simply removed from the
head of the list.

For 'close' requests it is important that these not block indefinitely
behind other requests so there is one slot that can only be used
by a close request - and only if no other slots are used by a close
request.  These requests do not follow a strict FIFO order.

When a non-"close" request completes we wake the first request on the
list.  There is no point searching all the way down the list for a
close request that could also be woken.  We only do that when a
"close" request completes.  This optimises the common case.

However: when a request is first queued we add it to the end of the
queue and then wake up the first deserving request if there is one.
When there are free slots, this is expected to wake the request just
queued.  When there are no free slots, nothing is woken.

When a "close" request is queued and added to the end of the queue
after other non-close requests, we need to potentially search to the
end of the queue for a close request to wake, just as we do when a
close request completes.  Unfortunately we don't.  This can result in
a close request blocking indefinitely.

So: change the wakeup in obd_get_mod_rpc_slot() to match the wakeup in
obd_put_mod_rpc_slot().  This ensures consistent handling and in
particular will handle a close request immediately if there are no
other close requests in flight.
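
The queueing rules described above can be modelled in a few lines of
Python. This is a simplified user-space sketch, not the obdclass code:
ModRpcSlots, get_slot(), put_slot() and look_for_close are hypothetical
names, and the model wakes at most one waiter per call.

```python
class ModRpcSlots:
    """Toy model of the modify-RPC slot queue (all names hypothetical)."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.in_flight = 0        # requests currently holding a slot
        self.close_in_flight = 0  # how many of those are close requests
        self.waiters = []         # FIFO of (name, is_close)

    def _wake_one(self, look_for_close):
        """Wake at most one waiter; return its name or None."""
        for i, (name, is_close) in enumerate(self.waiters):
            if (self.in_flight < self.max_slots or
                    (is_close and self.close_in_flight == 0)):
                del self.waiters[i]
                self.in_flight += 1
                self.close_in_flight += int(is_close)
                return name
            if self.close_in_flight or not look_for_close:
                # Either the extra close slot is taken, or this wakeup is
                # unrelated to a close request: stop at the queue head.
                return None
        return None

    def get_slot(self, name, is_close=False):
        """Queue a request, then wake the first deserving waiter."""
        self.waiters.append((name, is_close))
        # The fix: scan past the head when the new request is a close.
        return self._wake_one(look_for_close=is_close)

    def put_slot(self, was_close=False):
        """Release a slot and wake the next deserving waiter."""
        self.in_flight -= 1
        self.close_in_flight -= int(was_close)
        return self._wake_one(look_for_close=was_close)
```

With max_slots=1, one modify request in flight and another waiting, a
newly queued close request is woken immediately in this model, because
get_slot() now scans the queue the same way put_slot() does.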

Clarify the comment in claim_mod_rpc_function() and perform minor
code cleanup there.

Lustre-change: https://review.whamcloud.com/54259
Lustre-commit: 7a2296a397381a5f6f9473b297f0062e8ff15948

Fixes: b5fde4d6c023 ("LU-17197 obdclass: preserve fairness when waiting for rpc slot")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7b658efc0298a091166f0f18ce460fc3148047eb
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54688
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17261 lov: unlink can handle bogus striping
Alex Zhuravlev [Sat, 23 Mar 2024 17:13:32 +0000 (20:13 +0300)]
LU-17261 lov: unlink can handle bogus striping

Allow removing a file which has uninitialized OST objects in the
layout, possibly because LFSCK reconnected an orphan object back
into a mirrored file after the mirror had been deleted.

Don't wait and retry to access the bogus OST or MDT index in this
case, because the target will never appear, so waiting is futile.

Lustre-change: https://review.whamcloud.com/54544
Lustre-commit: 4ae823762db40d790ddd00c29e969b5c8e376430

Lustre-change: https://review.whamcloud.com/54719
Lustre-commit: 47573f85e60ac91f69c09b9edfbffc3f74fef298

Test-Parameters: testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr
Fixes: 94a4663db9 ("LU-17334 lmv: handle object created on newly added MDT")
Fixes: f35f897ec8 ("LU-17334 lov: handle object created on newly added OST")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I90b97c0e2d560d71b2a4c32a47fcfd7ae4e5535d
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54752
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16350 osd-ldiskfs: no_llseek removed, dquot_transfer
Shaun Tancheff [Thu, 11 Apr 2024 18:12:15 +0000 (11:12 -0700)]
LU-16350 osd-ldiskfs: no_llseek removed, dquot_transfer

Linux commit v5.19-rc2-6-g868941b14441
  fs: remove no_llseek

With the removal of no_llseek, leaving .llseek set to NULL
is functionally equivalent. Only provide no_llseek if it exists.

Linux commit v5.19-rc3-6-g71e7b535b890
 quota: port quota helpers mount ids

dquot_transfer adds a user namespace argument. Provide an
osd_dquot_transfer() wrapper to discard the additional
argument for older kernels.

Lustre-change: https://review.whamcloud.com/49266
Lustre-commit: 2de1dbd440e2b26ea1bdf663b92a3e8c62a95ee7

Test-Parameters: trivial
HPE-bug-id: LUS-11376
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If3165aed0d7b827b90e26d9f0174137d087ce57a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54745
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-16692 tests: remove force_new_seq from some test suites
Li Dongyang [Fri, 15 Mar 2024 11:39:30 +0000 (22:39 +1100)]
LU-16692 tests: remove force_new_seq from some test suites

force_new_seq was used in some tests to avoid the
situation where the sequence from a replay request
could be different from the one the osp is at, because
the previous sequence width had been used up.

Now this can be handled, so remove the force_new_seq
calls to speed up test runs.
Some force_new_seq are still required to make sure
there are enough objects in the current precreate pool
for the overstriping test cases.

Lustre-change: https://review.whamcloud.com/54433
Lustre-commit: 9ef186b71b350127e7cfb67be5729f9e0bd39c79

Change-Id: Id1bc6760e721db61c11b1c3d6b2fa82965459728
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54705
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16692 osp: do not assert on seq got over network
Li Dongyang [Tue, 13 Feb 2024 04:10:53 +0000 (15:10 +1100)]
LU-16692 osp: do not assert on seq got over network

Replay requests have FIDs already assigned and the
sequence could be different to the osp:
seq rollover happened after the original request,
then something triggers replay, or osp lost the
seq rollover record on storage.

Detect this and avoid the assert in osp_fid_diff().
We don't update the last id on osp in this case,
otherwise orphan cleanup could clean up the objects
in the current osp's sequence.
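
A minimal sketch of this check, written in Python for illustration
only: Fid and updated_last_created() are hypothetical names, and the
model assumes the last id simply tracks the highest object id seen in
the current sequence.

```python
from typing import NamedTuple


class Fid(NamedTuple):
    seq: int   # sequence number
    oid: int   # object id within the sequence


def updated_last_created(last_created, replayed):
    """Return the new last-created FID after seeing a replayed FID."""
    if replayed.seq != last_created.seq:
        # Different sequence: comparing object ids is meaningless, so
        # leave the last id alone instead of asserting.
        return last_created
    if replayed.oid > last_created.oid:
        return replayed
    return last_created
```

Leaving the last id untouched for a foreign sequence keeps the current
sequence's range intact for later orphan cleanup in this model.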

Also when rollover seq happens in osp, do not
LASSERT() if we didn't get a new seq, most likely
on ofd/ost the previous seq update was lost on storage.
We could return the error code and let precreate
thread try again.

Cleanup lu_fid_diff() which is not used.
In osp_create(), do not call osp_update_last_fid()
again for the regular non-replay case, it's already
done via osp_object_assign_fid()->osp_precreate_get_fid().

Lustre-change: https://review.whamcloud.com/54020
Lustre-commit: f00d2467fc7c5ebd8a313683e039bf945a4b7094

Change-Id: I509c00b998933d45865c9540e12a2db7d1b2b8ed
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54704
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16692 osp: osp_fid_diff vs rollover_new_seq race
Li Dongyang [Mon, 19 Feb 2024 02:27:22 +0000 (13:27 +1100)]
LU-16692 osp: osp_fid_diff vs rollover_new_seq race

osp_fid_diff/osp_objs_precreated is accessing the
last_created_fid and pre_used_fid without opd_pre_lock,
and this could race with osp_precreate_rollover_new_seq()
when updating them to new fids.

Lustre-change: https://review.whamcloud.com/54087
Lustre-commit: bc256c25631960e1386f3359bb6c85cfe6481fb7

Change-Id: I3a61c99570b5532776ddc43247c1513b8c89fb32
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54703
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-9806 obdclass: wait for all exports to go
Alex Zhuravlev [Fri, 12 Apr 2024 05:28:28 +0000 (08:28 +0300)]
LU-9806 obdclass: wait for all exports to go

obd_zombie_export_add() removes an export from the stale list
and then schedules a job to destroy that export. In this short
window ofd_fini()/mdt_fini() can find the obd_linked_exports list
empty and no work in the zombie work queue. The obd is then
removed, and a concurrent export destroy may find the obd in an
unexpected state:
LustreError: 11166:0:(tgt_lastrcvd.c:469:tgt_client_free())
ASSERTION( lut && lut->lut_client_bitmap ) failed

Use the obd_stale_export_num counter to block in obd_zombie_barrier.

Move atomic_inc() from class_unlink_export to obd_export_zombie_add(),
as self-exports are not added to the stale list.

Lustre-change: https://review.whamcloud.com/50147
Lustre-commit: 08f9ebe93b300c39d2af1fb8e82a22e9c84f401b

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I62ed019f86becd3c66f5fcdf991f13cd47466e5e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17557 osd: only accounting inodes are special
Alex Zhuravlev [Mon, 19 Feb 2024 08:18:45 +0000 (11:18 +0300)]
LU-17557 osd: only accounting inodes are special

Don't treat all inodes as special (system), because kernel 5.14 turns
the filesystem read-only when we try to access a non-existing inode
with the LDISKFS_IGET_SPECIAL flag.

Lustre-change: https://review.whamcloud.com/54091
Lustre-commit: 333c7518f18fad80fe504766ae9645f2ede0108c

Fixes: 2c0b2b7540 ("LU-13166 osd-ldiskfs: fix to allow to get system inode")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0c05adaf7b94e04c094cb069e8271bf478010b8c
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54716
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16887 scrub: delete OI when inode missing
Alexander Boyko [Thu, 11 Apr 2024 11:22:17 +0000 (19:22 +0800)]
LU-16887 scrub: delete OI when inode missing

The osd_iget_check() function has no ability to check the
OI when osd_iget() returns an error, because the inode is
lost during the error. Let's return to the old logic.

Scrub doesn't check consistency between the OI and the inode
for items from the inconsistent list. When an OI points to
the wrong inode, the OI record should be deleted.
(This part of 51263 had been merged into b_es6_0 along with
https://review.whamcloud.com/52037)

Lustre-change: https://review.whamcloud.com/51263
Lustre-commit: c24a090ec389ae9ca2bedb4c7e3ee777deb63c7f

Fixes: 716de353b ("LU-15542 osd-ldiskfs: exclude EA inode from processing")
HPE-bug-id: LUS-11540, LUS-11585
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ic1618db1c8ee24bb307a9cf3f5ca98441a739b7f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-16623 tests: ignore sanity-pfl stripe-count off-by-1
Andreas Dilger [Sun, 14 Apr 2024 06:03:24 +0000 (00:03 -0600)]
LU-16623 tests: ignore sanity-pfl stripe-count off-by-1

In some cases the MDS may not create all stripes on a file, if the
MDT-OST connection does not have precreated objects.  This is OK,
so the tests should not fail the stripe-count check if trying to
create a fully-striped file and one of the stripes is missing.

Lustre-change: https://review.whamcloud.com/54778
Lustre-commit: TBD (from 6380f3f13f7ffe854365bf55410bb34db801529a)

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie482fdf86f82e7a2292c021761885249a6c551f1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54779
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
14 months agoLU-17497 tests: skip sanity-sec/69 for old MDS
Andreas Dilger [Sun, 14 Apr 2024 07:49:46 +0000 (01:49 -0600)]
LU-17497 tests: skip sanity-sec/69 for old MDS

Older MDS versions do not have strict checking for identity_upcall
or rsi_upcall, don't run the test with those servers.

Lustre-change: https://review.whamcloud.com/54782
Lustre-commit: TBD (from 57b39fb5fecc895dc220835789d6011479bdd4db)

Test-Parameters: trivial testlist=sanity-sec env=ONLY=69 serverversion=2.15
Fixes: a462a119ec ("LU-17497 obdclass: check upcall incorrect values")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icdfda82eca32c2de7e88991ead0d9723023ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54783
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
14 months agoEX-8981 tests: fix compression_enabled check
Andreas Dilger [Sun, 14 Apr 2024 19:18:55 +0000 (13:18 -0600)]
EX-8981 tests: fix compression_enabled check

The compression_enabled() check was returning
"true" and "false" but these are invalid return values
for bash, and need to be numeric values.  As such,
the function was essentially always returning "false"
and causing every subtest using this function to be
skipped during testing since it was introduced.

Change it to return a numeric value as it should.
Run testing for affected tests on both x86_64 and aarch64
to test that it is working both ways.

Test-Parameters: testlist=sanity-compr env=SANITY_ONLY="460",ONLY="0 1 1000-1080",HONOR_EXCEPT=y
Test-Parameters: testlist=hot-pools env=ONLY=80,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity-flr env=ONLY=43,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity-sec env=ONLY="66 67",HONOR_EXCEPT=y
Test-Parameters: testlist=sanity-pfl env=ONLY=100,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity-pfl env=ONLY=100,HONOR_EXCEPT=y clientdistro=el8.8 clientarch=aarch64
Fixes: 8465bfa296 ("EX-8981 csdc: execute tests if compression is enabled")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ice87f55617038b5c34da0bc1f76c3998d3ec639f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54786
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
14 months agoEX-9482 tests: skip sanity-pfl tests if no compression
Andreas Dilger [Sun, 14 Apr 2024 07:07:21 +0000 (01:07 -0600)]
EX-9482 tests: skip sanity-pfl tests if no compression

Skip sanity-pfl test_100* for servers that do not understand CSDC.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-pfl
Test-Parameters: testlist=sanity-pfl serverjob=lustre-master serverbuildno=4521 serverdistro=el8.9
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib3fcfd77e9e7ffb122ed6ade9015b02d42ea8319
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54781
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
14 months agoEX-8981 tests: skip sanity-lfsck/18i if no CSDC
Andreas Dilger [Sun, 14 Apr 2024 06:29:13 +0000 (00:29 -0600)]
EX-8981 tests: skip sanity-lfsck/18i if no CSDC

Skip sanity-lfsck test_18i if compression is not enabled/available.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lfsck env=ONLY=18,HONOR_EXCEPT=y clientdistro=el8.8 clientarch=aarch64
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I32b701ac91f072137f9f61d2cca39482f40b5ce5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54780
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
14 months agoLU-15378 tests: skip sanity test_64h for old servers
Andreas Dilger [Mon, 8 Apr 2024 19:14:48 +0000 (13:14 -0600)]
LU-15378 tests: skip sanity test_64h for old servers

Running sanity test_64h fails intermittently with EXA5.2 servers,
skip it during interop since there are a number of fixes in this
area and EXA5 grant interop isn't super critical.

Test-Parameters: trivial testlist=sanity env=ONLY=64 serverversion=EXA5
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I65d9e247aa62c02345c3cd0f9575e3e0ba1ff2ce
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54699
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
14 months agoRM-620 build: New tag 2.14.0-ddn143
Andreas Dilger [Tue, 9 Apr 2024 21:49:45 +0000 (15:49 -0600)]
RM-620 build: New tag 2.14.0-ddn143

New tag 2.14.0-ddn143

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4472a2c301d3d730108fcceed03bce1933d0c4cd

14 months agoLU-17034 quota: tmp fix against memory corruption
Sergey Cheremencev [Mon, 8 Apr 2024 11:43:53 +0000 (14:43 +0300)]
LU-17034 quota: tmp fix against memory corruption

Change QMT_INIT_SLV_CNT from 64 to 2000 to avoid accessing
memory outside the array lqeg_arr. This could happen when at least
one of the OSTs has an index larger than the total number of OSTs.
It is a temporary solution and the maximum supported OST index
is 0x7d0. It will later be replaced by a long-term
solution.
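
The overflow is easiest to see with sparse OST indices. The following is a
minimal sketch, assuming an array that is sized by a fixed slot count and
indexed directly by OST index. slots_cover_indices() and the index values
are hypothetical and are not taken from the lquota code; only the two slot
counts come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Slot counts quoted in the commit message. */
#define SLV_CNT_OLD 64
#define SLV_CNT_NEW 2000

/*
 * Return 1 if every OST index can be used directly as an offset into an
 * array of `slots` elements, 0 if at least one index would fall outside it.
 */
static int slots_cover_indices(const unsigned int *ost_idx, size_t n,
			       size_t slots)
{
	size_t i;

	assert(ost_idx != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (ost_idx[i] >= slots)
			return 0;
	return 1;
}
```

With three OSTs whose indices are 0, 1 and 100, the check fails for 64
slots and passes for 2000, even though only three OSTs exist.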

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I8d9444017fa9847142f3df77c63368282ff134c4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54696
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoDDN-4905 revert: "quota: lqeg_arr memmory corruption"
Sergey Cheremencev [Mon, 8 Apr 2024 11:25:41 +0000 (14:25 +0300)]
DDN-4905 revert: "quota: lqeg_arr memmory corruption"

This reverts commit 7c6d08994b23cc3ef112e3626f9402dbccf0bc2c
("LU-17034 quota: lqeg_arr memmory corruption")
as it causes the following panic:

 qmt_map_lge_idx()) ASSERTION( k < lgd->lqeg_num_used ) failed: Cannot map ostidx 32 for 0000000072ee3f23
 qmt_map_lge_idx()) LBUG
 Call Trace TBD:
 libcfs_call_trace+0x6f/0xa0 [libcfs]
 lbug_with_loc+0x3f/0x70 [libcfs]
 qmt_map_lge_idx+0x7f/0x90 [lquota]
 qmt_seed_glbe_all+0x17f/0x770 [lquota]
 qmt_revalidate_lqes+0x213/0x360 [lquota]
 qmt_dqacq0+0x7d5/0x2320 [lquota]
 qmt_intent_policy+0x8d2/0xf10 [lquota]
 mdt_intent_opc+0x9a9/0xa80 [mdt]
 mdt_intent_policy+0x1fd/0x390 [mdt]
 ldlm_lock_enqueue+0x469/0xa90 [ptlrpc]
 ldlm_handle_enqueue0+0x61a/0x16c0 [ptlrpc]
 tgt_enqueue+0xa4/0x200 [ptlrpc]
 tgt_request_handle+0xc9c/0x1950 [ptlrpc]
 ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
 ptlrpc_main+0xbf1/0x1510 [ptlrpc]
 kthread+0x134/0x150
 ret_from_fork+0x1f/0x40
 Kernel panic - not syncing: LBUG

Fixes: 7c6d08994b ("LU-17034 quota: lqeg_arr memmory corruption")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Iff377529d2862c869b751b4c942b476262951570
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54695
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoRM-620 build: New tag 2.14.0-ddn142
Andreas Dilger [Sun, 7 Apr 2024 19:18:52 +0000 (13:18 -0600)]
RM-620 build: New tag 2.14.0-ddn142

New tag 2.14.0-ddn142

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6f94ce2c50e7be19c386d214dde67f1455aebeb7

14 months agoRM-620 build: New tag lipe-2.47
Andreas Dilger [Sun, 7 Apr 2024 19:18:32 +0000 (13:18 -0600)]
RM-620 build: New tag lipe-2.47

New tag lipe-2.47

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1d96d0f7049eeafd353e60d6a79ec19cf6554c0d

14 months agoEX-9523 csdc: fix ofd_preprw_write() for sanity 819b test
Artem Blagodarenko [Fri, 5 Apr 2024 13:03:56 +0000 (14:03 +0100)]
EX-9523 csdc: fix ofd_preprw_write() for sanity 819b test

Sanity 819b asserts:
tgt_brw_write()) ASSERTION( npages_local == npages_remote )

The test triggers fault injection in ofd_preprw_write():
if (OBD_FAIL_CHECK(OBD_FAIL_OST_2BIG_NIOBUF))
         rnb[i].rnb_len += PAGE_SIZE;

ofd_preprw_write() calculates npages_local taking into account the
additional length from the fault injection, BUT npages_remote is calculated
BEFORE the fault injection. So npages_remote was not adjusted.

To solve the problem it is enough to move the range_to_page_count() call.
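
The ordering problem can be shown in isolation. This is only a sketch:
pages_in_range() is a hypothetical stand-in for range_to_page_count(),
SKETCH_PAGE_SIZE is an assumed 4096-byte page, and the `inject` argument
stands in for the OBD_FAIL_CHECK() condition.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Number of pages touched by the byte range [offset, offset + len). */
static unsigned long pages_in_range(unsigned long offset, unsigned long len)
{
	assert(offset + len >= offset);	/* no wrap-around */
	if (len == 0)
		return 0;
	return (offset + len - 1) / SKETCH_PAGE_SIZE -
	       offset / SKETCH_PAGE_SIZE + 1;
}

/* Old order: the count is taken first, so the injected growth is missed. */
static unsigned long count_then_inject(unsigned long offset,
				       unsigned long len, int inject)
{
	unsigned long npages = pages_in_range(offset, len);

	if (inject)
		len += SKETCH_PAGE_SIZE;	/* too late to affect npages */
	(void)len;
	return npages;
}

/* New order: the length is adjusted first, then the pages are counted. */
static unsigned long inject_then_count(unsigned long offset,
				       unsigned long len, int inject)
{
	if (inject)
		len += SKETCH_PAGE_SIZE;
	return pages_in_range(offset, len);
}
```

For a one-page buffer with injection enabled, the first variant reports one
page and the second reports two, which is the kind of mismatch the
assertion catches.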

Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Fixes: 217341228f ("EX-7601 tgt: add remote_pages for writes")
Change-Id: Ifd659985a78c7630049a17622aff2eb7f4525fb1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54681
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17566 mdt: move squash code in new/old_init_ucred
Aurelien Degremont [Tue, 27 Feb 2024 12:20:33 +0000 (13:20 +0100)]
LU-17566 mdt: move squash code in new/old_init_ucred

Move the uid/gid squashing code to the same place,
at the bottom of the function, to make code refactoring
simpler later.

The squashing code is mostly clearing suppgids from ucred,
and no code was using them between the old and new positions in
the function. So that should be pretty safe.

Handle suppgids clearing the same way for both functions
and for both UID and GID squashing.

Lustre-change: https://review.whamcloud.com/54194
Lustre-commit: 1730d8093fb36e7957414d314755ae5208da1011

Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: I29669af26cf68491bf1b6020548116acf318c0c7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54558
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17056 tests: force osc import reconnect in sanity-sec 30b
Sebastien Buisson [Mon, 28 Aug 2023 08:09:53 +0000 (10:09 +0200)]
LU-17056 tests: force osc import reconnect in sanity-sec 30b

In sanity-sec test_30b, force reconnect of idle osc imports
so that security flavor is correctly updated.
In case of failure, dump more information about state of the imports
and the srpc connections.

Lustre-change: https://review.whamcloud.com/54349
Lustre-commit: fa2cfb49decf3d897f63023c998a23fd98c5c3ea

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaecc7321b12e61a266e97d3640a3288f0e7ec9dd
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54657
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17666: configure lnet before add net in sanity-sec:31
Li Xi [Fri, 22 Mar 2024 12:30:57 +0000 (20:30 +0800)]
LU-17666: configure lnet before add net in sanity-sec:31

If "options lnet config_on_load=1" is not configured in
modprobe.d, LNet will not be configured when trying to
add a network, and the command will fail:

/usr/sbin/lnetctl net add --if eth1 --net tcp999
add:
    - net:
          errno: -22
          descr: "cannot add network: Invalid argument"

Test-Parameters: trivial testlist=sanity-sec env=ONLY=31

Lustre-change: https://review.whamcloud.com/54543
Lustre-commit: e163883f76dac45a516b7d89671513d31063b7d6

Change-Id: If65b7cb372d4f04a10ea066d62f3ae43029fcf65
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54654
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-14911 osp: release thandle if it was created
Alex Zhuravlev [Thu, 4 Apr 2024 16:49:49 +0000 (19:49 +0300)]
LU-14911 osp: release thandle if it was created

osp_statfs_update() could leak the thandle if the transaction couldn't
start for some reason.
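
The general pattern behind this kind of fix is a single exit path that
releases the handle whether or not the start step succeeded. The sketch
below is illustrative only: struct sketch_handle and all helper names are
hypothetical and do not correspond to the OSP or dt transaction API.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_handle {
	int started;
};

static int live_handles;	/* handles currently allocated */

static struct sketch_handle *handle_create(void)
{
	struct sketch_handle *h = calloc(1, sizeof(*h));

	if (h != NULL)
		live_handles++;
	return h;
}

static void handle_release(struct sketch_handle *h)
{
	assert(h != NULL);
	free(h);
	live_handles--;
}

/* fail_start != 0 simulates a transaction that cannot start. */
static int handle_start(struct sketch_handle *h, int fail_start)
{
	if (fail_start)
		return -1;
	h->started = 1;
	return 0;
}

static int update_with_cleanup(int fail_start)
{
	struct sketch_handle *h = handle_create();
	int rc;

	if (h == NULL)
		return -1;
	rc = handle_start(h, fail_start);
	if (rc != 0)
		goto out;	/* without this path the handle would leak */
	/* ... the update itself would go here ... */
out:
	handle_release(h);
	return rc;
}
```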

Lustre-change: https://review.whamcloud.com/44504
Lustre-commit: c807e3f33b39409a061fa997cac57ac394c503ba

Change-Id: I541a5e4a7860008eb179d905ac57997b737f178c
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17634 hsm: serialize HSM restore for a file on a client
Qian Yingjin [Wed, 13 Mar 2024 01:33:19 +0000 (21:33 -0400)]
LU-17634 hsm: serialize HSM restore for a file on a client

For a file in the HSM released, exists, archived state, when tens of
processes are started to read it in parallel on a client, one read process
may report a "No data available" error.

After analyzing the error, we found the following bug in the HSM code:
Reading a released file already granted LAYOUT lock on a client:
P1:
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
  ->ll_layout_restore()
    ->mdc_ioc_hsm_request()
      ->mdc_hsm_request_lock_to_cancel()
        ->ldlm_cancel_resource_local()
          remove LAYOUT lock from resource into cancel list
          NOT yet cancel the LAYOUT lock on the client via ELC...

P2:
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
  ->ll_layout_restore()
    ->mdc_ioc_hsm_request()
      ->mdc_hsm_request_lock_to_cancel()
      SKIP: No conflicting LAYOUT lock on the resource lock list, as P1
      has already moved it (if any) into its cancel list
    ->mdt_hsm_request()
      ->cdt_restore_handle_add()
        ->cdt_restore_handle_find()
        ->list_add_tail(): add @crh to restore handle list
        NOT yet obtain EX LAYOUT lock to cancel cached LAYOUT
        locks on client side...

P3:
->ll_file_read_iter()
->ll_do_fast_read(): => return -ENODATA;
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
  ->ll_layout_restore()
    ->mdc_ioc_hsm_request()
      ->mdc_hsm_request_lock_to_cancel()
      SKIP as P1 has already moved the conflicting LAYOUT lock
      (if any) into its cancel list
    ->mdt_hsm_request()
      ->cdt_restore_handle_add()
        ->cdt_restore_handle_find()
        SKIP as a restore handle with the same FID, added by P2,
        was found in the restore handle list.
  ->ll_layout_refresh()
  ->io->ci_need_restart = vio->vui_layout_gen != gen;
  ->LAYOUT gen does not have any change as the LAYOUT lock on
    the client is not revoked yet, will not restart I/O...
->return -ENODATA; =>from fast read

We can fix this bug simply by serializing the HSM restore operation on a
client using @lli->lli_layout_mutex.
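
As a rough illustration of the serialization idea only, the sketch below
guards a "check whether a restore is needed, then request it" step with a
mutex so that a second caller sees the result of the first. It assumes a
userspace pthread mutex in place of lli_layout_mutex; struct sketch_file
and its fields are hypothetical, and the LAYOUT lock and coordinator
behaviour traced above are not modelled.

```c
#include <assert.h>
#include <pthread.h>

struct sketch_file {
	pthread_mutex_t layout_mutex;	/* stands in for lli_layout_mutex */
	int released;			/* 1 while data is only in the archive */
	int restore_requests;		/* restore requests actually issued */
};

static void sketch_file_init(struct sketch_file *f)
{
	assert(f != NULL);
	pthread_mutex_init(&f->layout_mutex, NULL);
	f->released = 1;
	f->restore_requests = 0;
}

/* Check and request happen under one lock, so callers cannot interleave. */
static void sketch_restore(struct sketch_file *f)
{
	assert(f != NULL);
	pthread_mutex_lock(&f->layout_mutex);
	if (f->released) {
		f->restore_requests++;
		f->released = 0;	/* pretend the restore completed */
	}
	pthread_mutex_unlock(&f->layout_mutex);
}
```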

Add sanity-hsm/test_12{t, u} to verify it.

Lustre-change: https://review.whamcloud.com/54366
Lustre-commit: a6b3faffeaea7abbef389ad5296880a522a13460

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idc2a8c1818386c64798d7e28500c20c80ff369f1
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54653
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17499 llite: inode lock in ll_migrate()
Alex Zhuravlev [Thu, 15 Feb 2024 16:24:12 +0000 (19:24 +0300)]
LU-17499 llite: inode lock in ll_migrate()

should be taken after the data version check as this is the
correct locking order used in other paths like lseek.

Lustre-change: https://review.whamcloud.com/54041
Lustre-commit: 133fd8b4b11a0228f71d60e3d145d93be16014c9

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0bafb8db215a2ea004928ff36049d8f053507c6f
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17448 lod: don't skip uninited components
Alex Zhuravlev [Wed, 6 Mar 2024 18:00:51 +0000 (21:00 +0300)]
LU-17448 lod: don't skip uninited components

don't skip uninitialized components during declaration as we need
to declare potential records to llogs if the component is created
in this transaction later.

Lustre-change: https://review.whamcloud.com/54302
Lustre-commit: 35b1076aef8fbb2840de2b831765a20ec937d034

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia1cbfaae9b28e40fd68fa125d748ec0b5319f512
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17546 osd: use __vfs_removexattr
Alex Zhuravlev [Fri, 16 Feb 2024 05:31:41 +0000 (08:31 +0300)]
LU-17546 osd: use __vfs_removexattr

as otherwise vfs_removexattr() takes the inode's lock, which conflicts with
osd_execute_truncate(), while we don't really need the inode's lock
because another per-object lock has already been taken.

Lustre-change: https://review.whamcloud.com/54072
Lustre-commit: b9ef5d1e7f7dd1055a6ea6d3dc9f176fa910a372

Fixes: dcd5607ce0 ("LU-13430 vfs: add ll_vfs_getxattr/ll_vfs_setxattr compat macro")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I43c1c60d2a9f911b6395e1b7546507074a90b1cf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17497 obdclass: check upcall incorrect values
Sebastien Buisson [Thu, 1 Feb 2024 15:52:22 +0000 (16:52 +0100)]
LU-17497 obdclass: check upcall incorrect values

Identity upcall is set via lctl set_param mdt.*.identity_upcall=xxx,
and rsi upcall is set via lctl set_param sptlrpc.gss.rsi_upcall=xxx.
Possible values are a valid path to an executable, and also INTERNAL
to enable support of supplementary groups from client, or NONE to
disable identity upcall.
Add an upcall cache function that checks the user provided string, to
make sure we do not store an invalid value. And print a message to
stdout to explain the accepted values.
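
A check of this kind might look roughly like the sketch below. It is not
the upcall cache function added by the patch: upcall_value_ok() is a
hypothetical name, and treating any absolute path as acceptable (without
testing that it exists or is executable) is an assumption made for the
example.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if `val` is one of the keywords named above or looks like an
 * absolute path, 0 otherwise.
 */
static int upcall_value_ok(const char *val)
{
	assert(val != NULL);
	if (strcmp(val, "NONE") == 0 || strcmp(val, "INTERNAL") == 0)
		return 1;
	/* Assumption: a path must be absolute and longer than "/". */
	return val[0] == '/' && val[1] != '\0';
}
```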

Lustre-change: https://review.whamcloud.com/53878
Lustre-commit: 2153e86541884ef7a5c1697a5d00daf6fa6461a4

Add sanity-sec test_69 to exercise this.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaf59e72aa1612f5579db175d8999dcf0053308ed
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53879
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17446 revert: "ldlm: Do not wait for BL AST RPC completion on cancel"
Andreas Dilger [Mon, 1 Apr 2024 17:12:43 +0000 (17:12 +0000)]
LU-17446 revert: "ldlm: Do not wait for BL AST RPC completion on cancel"

This reverts commit cfd5411db998c2b0427e310a19b8741b1ec3644e.
An LASSERT can be triggered due to blocking callbacks on one lock.
This will be fixed in a more generic manner.

Change-Id: I5bdd59e3668de0f1db02f3654c73531712a77c72
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54643
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoEX-9136 lipe: Update the display format for fstats
Vitaliy Kuznetsov [Sat, 6 Apr 2024 09:08:34 +0000 (11:08 +0200)]
EX-9136 lipe: Update the display format for fstats

This patch affects only .out format report types.
The data output format has been updated according to
the request in the ticket.

Also:
1. Fixed incorrect information display in some tables;
2. Expanded additional information for each table;
3. The size in tables is now displayed in various formats,
   not just in KB;
4. Except for the “File Size” table, the size obtained from
   block_size is now used everywhere;
5. Fixed an issue with displaying the allocated size
   for directories;
6. Fixed the total size calculation for all directories;
7. Removed tables that are not yet available;
8. Added additional information about the number of missed
   files for each table;
9. All txt information for working with reports in the .out
   format is combined into a single array to
   simplify code maintenance.

Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Id75b3af12ea00761850a9009848621539c016446
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54658
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17696 llite: remove LASSERT from ll_ddelete()
Jian Yu [Fri, 5 Apr 2024 08:37:03 +0000 (01:37 -0700)]
LU-17696 llite: remove LASSERT from ll_ddelete()

On Linux kernel 6.8, the changes in commit 2f42f1eb9093
("Call retain_dentry() with refcount 0") made d_delete()
instances called for dentries with ->d_lock held and
refcount equal to 0, which caused the following assertion
failure on Lustre client:

(dcache.c:136:ll_ddelete()) ASSERTION( d_count(de) == 1 ) failed

The value of d_count(de) became 0 instead of 1. Since
retain_dentry() was called either with refcount 0 or 1,
we can simply remove the LASSERT(ll_d_count(de) == 1)
from ll_ddelete() to avoid the above failure.

Lustre-change: https://review.whamcloud.com/54676
Lustre-commit: TBD (from 50bd3822d2977cd45e56521d137aec2ce5829529)

Change-Id: Ic4a39d9328326634190cd0719b4c0637e1bf315c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54679
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17504 build: fix array-index-out-of-bounds warning
Jian Yu [Wed, 3 Apr 2024 18:53:03 +0000 (11:53 -0700)]
LU-17504 build: fix array-index-out-of-bounds warning

On Linux kernel 6.5, due to commit 2d47c6956ab3
("ubsan: Tighten UBSAN_BOUNDS on GCC"), flexible
trailing arrays declared like 'lc_array_sum[1];'
will generate warnings when CONFIG_UBSAN & co. is
enabled:

  UBSAN: array-index-out-of-bounds in lprocfs_status.c:1609:17
  index 1 is out of range for type '__s64 [1]'

Since LPROCFS_STATS_FLAG_IRQ_SAFE flag is only used
in one place - obd_memory() counter, we can just
remove it and change obd_memory over to a regular
percpu_counter. This would both simplify the
lprocfs_counter() code, move over to using more
kernel functionality instead of libcfs, as well as
reduce overhead slightly for the memory accounting code.
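
For context on the warning itself: a trailing array declared with one
element but indexed past it is what the bounds check objects to. The
sketch below shows the C99 flexible array member form of such a
declaration. It is an illustration only and is not what this patch does
(the patch removes the flag and switches to percpu_counter); struct
sketch_counter and its fields are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_counter {
	unsigned int sc_num;	/* number of elements in sc_sum[] */
	long long    sc_sum[];	/* flexible array member, not sc_sum[1] */
};

/* Allocate the header plus `num` trailing elements in one block. */
static struct sketch_counter *sketch_counter_alloc(unsigned int num)
{
	struct sketch_counter *sc;

	assert(num > 0);
	sc = calloc(1, sizeof(*sc) + num * sizeof(sc->sc_sum[0]));
	if (sc != NULL)
		sc->sc_num = num;
	return sc;
}
```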

Lustre-change: https://review.whamcloud.com/54365
Lustre-commit: TBD (from 21505a19d671868171de2ad0f94120b1ca779695)

Change-Id: Ic461c4b30317bfd2b1e9f5b6be84c4a7fb4e3eb9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54660
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17592 build: compatibility updates for kernel 6.8
Shaun Tancheff [Tue, 2 Apr 2024 22:52:02 +0000 (15:52 -0700)]
LU-17592 build: compatibility updates for kernel 6.8

Linux commit v4.9-12227-g7b737965b331 introduced
  staging/lustre/libcfs: Convert to hotplug state machine
Linux commit v4.10-rc1-5-g4205e4786d0b
  cpu/hotplug: Provide dynamic range for prepare stage
Linux commit v6.7-rc2-1-g15bece7bec0d
  cpu/hotplug: Remove unused CPU hotplug states

CPUHP_LUSTRE_CFS_DEAD was introduced in 4.9 and removed in 6.8
CPUHP_BP_PREPARE_DYN was introduced in 4.10

With no distro kernels between 4.10 and 4.11 switch to
CPUHP_BP_PREPARE_DYN

Linux commit v6.7-rc1-3-gda549bdd15c2
  dentry: switch the lists of children to hlist
Provide trivial wrappers to abstract the changed members

Linux commit v6.7-rc4-79-gaf7628d6ec19
  fs: convert error_remove_page to error_remove_folio
Provide a generic_error_remove_folio() for older kernels.

Lustre-change: https://review.whamcloud.com/54229
Lustre-commit: TBD (from 2036974a891ffac3ecffc7b2a21ca50bc6c94f78)

HPE-bug-id: LUS-12181
Fixes: ce98bfe5f72 ("LU-10499 pcc: add readonly mode for PCC")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib2e85c2acd3d0934e1c4712dad53b80f0ddb1b08
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54586
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17592 build: kernel 6.8 -Werror=missing-prototypes
Shaun Tancheff [Tue, 2 Apr 2024 22:49:17 +0000 (15:49 -0700)]
LU-17592 build: kernel 6.8 -Werror=missing-prototypes

Linux commit v6.7-rc4-156-g0fcb70851fbf
  Makefile.extrawarn: turn on missing-prototypes globally

With -Wmissing-prototypes and -Werror, clean up some additional
functions that are implicitly static and provide declarations
for those that are exported.

Add SERVER_ONLY and SERVER_ONLY_EXPORT_SYMBOL to wrap functions
that are only exported for and used by server components.
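
The two cases the warning distinguishes can be sketched as follows. The
function names are hypothetical, and the SERVER_ONLY macros are not shown
because their definitions are not given here.

```c
#include <assert.h>

/*
 * Normally placed in a header: the prior declaration that
 * -Wmissing-prototypes expects for a function with external linkage.
 */
int sketch_exported_sum(int a, int b);

/* Used only in this file, so it is static and needs no prototype. */
static int sketch_clamp(int v)
{
	return v < 0 ? 0 : v;
}

int sketch_exported_sum(int a, int b)
{
	return sketch_clamp(a) + sketch_clamp(b);
}
```

Dropping the declaration while leaving sketch_exported_sum() non-static is
the situation that -Wmissing-prototypes reports, and -Werror then turns
the report into a build failure.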

Lustre-change: https://review.whamcloud.com/54228
Lustre-commit: TBD (from 1da648a24984c94cccdf6686ab9c3aed28d32a47)

HPE-bug-id: LUS-12181
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ice5219df5463effe964d2cd2114f003d185337da
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54584
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17592 build: kernel 6.8 removed strlcpy()
Shaun Tancheff [Tue, 2 Apr 2024 22:45:49 +0000 (15:45 -0700)]
LU-17592 build: kernel 6.8 removed strlcpy()

Linux commit v6.7-11707-gd26270061ae6
  string: Remove strlcpy()

strlcpy() is removed, use strscpy() and provide a strscpy()
for kernels that do not have one.
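
For readers unfamiliar with the difference, the sketch below mimics the
return convention of the kernel's strscpy() in userspace: the number of
characters copied, or -E2BIG when the source did not fit, with the
destination always NUL-terminated when the size is non-zero. It is not the
compat implementation from this patch, and sketch_strscpy() is a
hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Copy at most size - 1 characters from src to dst and NUL-terminate.
 * Returns the number of characters copied (excluding the NUL), or -E2BIG
 * if size is 0 or src was truncated.
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

Unlike strlcpy(), which returns the length of the source string, the
caller here detects truncation from a negative return value.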

Lustre-change: https://review.whamcloud.com/54227
Lustre-commit: TBD (from 1861e4ce8ec6d66d17ed73042f39bacb6496685c)

HPE-bug-id: LUS-12181
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ieab872f20e08d17a4842bc944fa38f9867de81f9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54576
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoRM-620 build: New tag 2.14.0-ddn141
Andreas Dilger [Sun, 31 Mar 2024 15:38:40 +0000 (09:38 -0600)]
RM-620 build: New tag 2.14.0-ddn141

New tag 2.14.0-ddn141

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5951e02d762098a10026e744a3894d2dd77b2c0a

14 months agoRM-620 build: New tag lipe-2.46
Andreas Dilger [Sun, 31 Mar 2024 15:38:16 +0000 (09:38 -0600)]
RM-620 build: New tag lipe-2.46

New tag lipe-2.46

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I484423fbdfcd8ef6c5162e79fcb040a4039e20e0

14 months agoLU-8191 lnet: remove unused, fix non-static functions
Timothy Day [Thu, 28 Mar 2024 08:11:45 +0000 (01:11 -0700)]
LU-8191 lnet: remove unused, fix non-static functions

Static analysis shows that a number of functions
could be made static. This patch also declares
several functions in lnet static.

Lustre-change: https://review.whamcloud.com/51436
Lustre-commit: 43cbc93f1edc493e47fe5c4059bf0bae6a20c207

It is wrong to remove lnet_selftest_structure_assertion()
since it contained BUILD_BUGs used to ensure different LNet
Selftest versions can interoperate.

Add a dummy user for lnet_selftest_structure_assertion() in
LNet Selftest init. This should prevent analyzers from picking
this up as an unused function.
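
The idea can be sketched with C11 _Static_assert in place of the kernel's
build-time checks. struct sketch_wire_hdr and both function names are
hypothetical and are not the LNet Selftest structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A made-up wire structure whose layout must not change. */
struct sketch_wire_hdr {
	uint32_t magic;
	uint32_t version;
	uint64_t length;
};

/* Holds only compile-time layout checks; it has no runtime effect. */
static void sketch_structure_assertion(void)
{
	_Static_assert(sizeof(struct sketch_wire_hdr) == 16,
		       "wire header size changed");
	_Static_assert(offsetof(struct sketch_wire_hdr, version) == 4,
		       "version field moved");
	_Static_assert(offsetof(struct sketch_wire_hdr, length) == 8,
		       "length field moved");
}

static int sketch_module_init(void)
{
	/* Dummy user, so the function is not reported as unused. */
	sketch_structure_assertion();
	return 0;
}
```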

Lustre-change: https://review.whamcloud.com/54635
Lustre-commit: TBD (from ed2a2286d17a7d23b86a87094d1eb2abac8ea015)

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ie1b49c5652553715cd9f96b56090d33a95e3b438
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17334 lmv: exclude newly added MDT in mkdir
Lai Siyao [Thu, 18 Jan 2024 15:59:25 +0000 (10:59 -0500)]
LU-17334 lmv: exclude newly added MDT in mkdir

Exclude newly added MDT in QoS mkdir for 30 seconds in case
connections between MDTs are not ready, which may cause lookup fail.

Lustre-change: https://review.whamcloud.com/53860
Lustre-commit: a2b08583a1dc8ab18c4ea4a4b900870761a5c252

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibb5e6eda29ddfff8f66708d72e33453a96f5e7ef
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54608
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoEX-9212 lipe: Fix for 'stripe_count = -1' in stats reports
Vitaliy Kuznetsov [Thu, 28 Mar 2024 18:11:47 +0000 (19:11 +0100)]
EX-9212 lipe: Fix for 'stripe_count = -1' in stats reports

This patch corrects the display of the range for the table by the
number of stripes. Now the range will only contain one position
and will support a value of -1.

Also corrects the display for a range in other tables by
removing the fractional part.

For example:
[        -1 ] ...
[         1 ] ...
[         2 ] ...
[        10 ] ...

Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Iff1933f69b713c7e5dff9145c5516fa050294d2e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54611
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoEX-9230 lipe: Add device name in file report name
Vitaliy Kuznetsov [Tue, 26 Mar 2024 13:55:30 +0000 (14:55 +0100)]
EX-9230 lipe: Add device name in file report name

This patch adds the device name (e.g. MDT-xxxx) to the
report name when automatically generating the name.
It also corrects the end time in the file name
(when scanning is completed) to the initial time
(when scanning began). Only for lipe_scan3.

Example of a new file name for a report:
files_sizes_report_lustre-MDT0000.2024-03-26-09:54:25.out
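
A sketch of how a name of this shape could be assembled from the device
name and the scan start time. The function name is hypothetical; the
prefix and timestamp layout are copied from the example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Build "files_sizes_report_<device>.<YYYY-MM-DD-HH:MM:SS>.out".
 * 'start' is the time the scan began, not the time it finished.
 * Returns the snprintf() result, or -1 on bad input.
 */
int build_report_name(char *buf, size_t len, const char *device,
		      const struct tm *start)
{
	char stamp[32];

	assert(buf != NULL && device != NULL && start != NULL);
	if (strlen(device) == 0)
		return -1;
	if (strftime(stamp, sizeof(stamp), "%Y-%m-%d-%H:%M:%S", start) == 0)
		return -1;
	return snprintf(buf, len, "files_sizes_report_%s.%s.out",
			device, stamp);
}
```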

Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I2e79404e459b5717858b92a0783fe3f1bad552ab
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54574
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoEX-9459 lipe: Fix behavior when getting attributes
Vitaliy Kuznetsov [Fri, 29 Mar 2024 14:29:49 +0000 (15:29 +0100)]
EX-9459 lipe: Fix behavior when getting attributes

This improvement targets the --collect-fsize-stats output
policy in lipe_scan3.

This patch prevents the scanning process from stopping early
if any LOV attribute is not received correctly.
Instead of halting the scan, the patch adds additional error
counters, and all types of reports will now include new
error statistics.

Also add counters for objects that have no size or allocated size.

Example of the new error block in a report with the .out
extension, which contains the following fields:

Error counters:
Allocated blocks is empty: 11101
Size is empty: 0
Without size (all size value empty): 59
Failed to get LOV attr: 0
Failed to get mirror count: 0
Failed to get stripe count: 0
Failed to get stripe size: 0
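
The count-and-continue approach can be sketched as below. The structs
and field names are hypothetical and cover only a subset of the
counters listed above; this is not the lipe_scan3 implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters mirroring some of the report fields. */
struct scan_errors {
	unsigned long lov_attr_failed;	/* "Failed to get LOV attr" */
	unsigned long size_empty;	/* "Size is empty" */
	unsigned long blocks_empty;	/* "Allocated blocks is empty" */
};

/* Hypothetical per-object result of attribute collection. */
struct scan_object {
	int has_lov_attr;
	int has_size;
	int has_blocks;
};

/*
 * Walk all objects. A missing attribute bumps a counter instead of
 * aborting the scan, so every object is still visited.
 * Returns the number of objects processed.
 */
size_t scan_objects(const struct scan_object *objs, size_t count,
		    struct scan_errors *errs)
{
	size_t processed = 0;

	assert(errs != NULL);
	for (size_t i = 0; i < count; i++) {
		if (!objs[i].has_lov_attr)
			errs->lov_attr_failed++;
		if (!objs[i].has_size)
			errs->size_empty++;
		if (!objs[i].has_blocks)
			errs->blocks_empty++;
		processed++;
	}
	return processed;
}
```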

Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I1817ea189f3d554894822ad8d12a8514546b13b0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17611 utils: fix wrong static declarations
Mikhail Pershin [Fri, 29 Mar 2024 08:12:21 +0000 (01:12 -0700)]
LU-17611 utils: fix wrong static declarations

Revert incorrect changes made to the ZFS mount utils.

Lustre-change: https://review.whamcloud.com/54293
Lustre-commit: f45a0288b00597bc797963f7aa01cae5167b024e

Test-Parameters: trivial
Fixes: c7e9bdf8d4 ("LU-8191 utils: remove unused, fix non-static functions")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I162d349ebadbf93a89abf49bd41465979d561423
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54630
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 utils: remove unused, fix non-static functions
Timothy Day [Fri, 29 Mar 2024 08:06:32 +0000 (01:06 -0700)]
LU-8191 utils: remove unused, fix non-static functions

Remove several functions which are never called.

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in various Lustre utils static.

Some missing headers caused some functions to be
incorrectly marked as possible candidates for
being made static. These missing headers have
been added.
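
A generic illustration of the distinction, with made-up names that do
not come from the Lustre utils: a helper used in only one file can be
static, while a function with a prototype in a shared header must keep
external linkage. If the file defining it does not include that header,
an analysis tool may see no declaration and wrongly suggest static.

```c
#include <assert.h>
#include <stddef.h>

/*
 * In a real tree this prototype would live in a shared header
 * (a hypothetical "util.h") that the defining file includes.
 */
int util_is_option(const char *arg);

/* Used only in this file, so internal linkage is correct. */
static int is_dash(char c)
{
	return c == '-';
}

/* Called from other files through the header, so it stays non-static. */
int util_is_option(const char *arg)
{
	assert(arg != NULL);
	return is_dash(arg[0]) && arg[1] != '\0';
}
```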

Lustre-change: https://review.whamcloud.com/51439
Lustre-commit: c7e9bdf8d4bb5e1127eb87472fbf0414823d5461

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id51f922be57c33c011ee2f9e509ca164cc480edf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54629
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>