Whamcloud - gitweb
Andreas Dilger [Wed, 22 Nov 2023 21:11:28 +0000 (14:11 -0700)]
RM-620 build: New tag 2.14.0-ddn116
New tag 2.14.0-ddn116
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iaf3d0d8a468b44c0bd179bc729fc66483cb45581
Andreas Dilger [Wed, 22 Nov 2023 21:10:48 +0000 (14:10 -0700)]
RM-620 build: New tag 2.14.0-2.14.0-ddn116
New tag 2.14.0-2.14.0-ddn116
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I752cf0dfd78de778fe34787b2e026fec0277f610
Qian Yingjin [Fri, 10 Nov 2023 09:23:46 +0000 (04:23 -0500)]
EX-8236 pcc: abort data copy via ll_fid_path_copy
For data copying via ll_fid_path_copy in direct I/O mode in user
space, the client calls llapi_pcc_state_fd() to obtain the file
PCC state. If it is marked with PCC_STATE_FL_ATTACH_ABORTING, the
data copy process ll_fid_path_copy exits immediately.
To reduce the overhead of these checks, the state is not checked on
every data copy iteration; instead, it is checked once every given
number of I/Os (32 by default). With an I/O size of 32MiB, that is one
check per second at 1GiB/s, so there may be some lag before the copy
tool finally exits.
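A minimal sketch of the amortized abort check described above. It assumes a hypothetical callback in place of the real llapi_pcc_state_fd() query, and copy_with_abort_check(), abort_check_fn and aborting_since() are illustrative names, not the ll_fid_path_copy source. Only every CHECK_INTERVAL-th iteration pays for the state lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Check for an abort once every CHECK_INTERVAL iterations instead of on
 * every iteration (the text uses 32 I/Os by default). */
#define CHECK_INTERVAL 32

/* Hypothetical stand-in for querying the PCC state (in the text, via
 * llapi_pcc_state_fd()); returns true once the attach is aborting. */
typedef bool (*abort_check_fn)(size_t iter, void *ctx);

/* Copy up to max_iters chunks; returns how many chunks were copied. */
size_t copy_with_abort_check(size_t max_iters, abort_check_fn aborting,
			     void *ctx)
{
	size_t i;

	assert(aborting != NULL);
	for (i = 0; i < max_iters; i++) {
		if (i > 0 && i % CHECK_INTERVAL == 0 && aborting(i, ctx))
			break;	/* abort seen: stop copying early */
		/* ... read one chunk from the source, write it to PCC ... */
	}
	return i;
}

/* Simulated state for illustration: the abort flag is set from
 * iteration *(size_t *)ctx onward. */
bool aborting_since(size_t iter, void *ctx)
{
	return iter >= *(const size_t *)ctx;
}
```

With 32MiB I/Os, 32 iterations cover 1GiB, which is where the one-check-per-second figure at 1GiB/s comes from. The price of amortizing is that up to CHECK_INTERVAL - 1 further chunks may be copied after the abort flag is set (an abort raised at iteration 40 is only noticed at iteration 64).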
Change-Id: I20631e5481a7e97d7a1ed0729bcd269ef6248a2c
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Bobi Jam [Fri, 10 Nov 2023 09:17:50 +0000 (17:17 +0800)]
EX-7331 csdc: prohibit set compression upon encrypted file
Setting a compression layout component on an encrypted file is not
allowed for now.
This patch adds this check on the MDS when creating a file with a
layout, and when adding/merging a new mirror into an existing file.
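The check can be pictured as follows. This is a sketch over a hypothetical, simplified component descriptor; the real MDS code works on LOV layout structures, and the -EOPNOTSUPP return value is an assumption, not taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one layout component. */
struct comp_desc {
	bool compressed;	/* component requests data compression */
};

/* Reject a layout (new file, or a mirror being added/merged) that would
 * put a compressed component on an encrypted file.  The error code is
 * an assumption for illustration. */
int check_compr_vs_enc(bool file_encrypted, const struct comp_desc *comps,
		       size_t count)
{
	size_t i;

	if (!file_encrypted)
		return 0;
	for (i = 0; i < count; i++)
		if (comps[i].compressed)
			return -EOPNOTSUPP;
	return 0;
}
```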
Test-Parameters: testlist=sanity-sec env=ONLY=67,PTLDEBUG=-1
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I60d9f4bfce3a498f1eb3994c6276afb9d89c99a7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53075
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Fri, 17 Nov 2023 07:53:21 +0000 (15:53 +0800)]
EX-8584 tests: check and wait lpcc_purge scanning ends
Check the lpcc_purge status to make sure it finishes at least
one round of scanning.
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc env=ONLY="200 201 202",ONLY_REPEAT=50
Change-Id: I8e6f50393d1a3cbb7a1bc976942631db6ecceb67
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53167
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Tue, 1 Nov 2022 02:57:39 +0000 (10:57 +0800)]
LU-16284 utils: lfs getstripe follows symlink
'lfs getstripe' prints the information of symlink target by default.
With '--no-follow' option it prints the information of symlink itself.
Lustre-change: https://review.whamcloud.com/49003
Lustre-commit: af32b516593dbf2a8e7a85d885c33fd017926ada
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I6cef01af5bb2235bdcbf0b5c99af4b9ed5869515
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53139
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Mon, 20 Nov 2023 22:32:40 +0000 (14:32 -0800)]
LU-17275 kernel: RHEL 8.9 client support
This patch makes changes to support RHEL 8.9 release
with kernel 4.18.0-513.5.1.el8_9 for Lustre client.
Lustre-change: https://review.whamcloud.com/53071
Lustre-commit: TBD (from 0da16c715a06b6426a6b99c111147fc875784e85)
Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity
Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-1
Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-2
Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-3
Change-Id: Ia3672d134534b877bb6aaffb4cea0339bc55974f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Fri, 17 Nov 2023 18:02:00 +0000 (10:02 -0800)]
LU-17293 kernel: update SLES15 SP5 [5.14.21-150500.55.36.1]
Update SLES15 SP5 kernel to 5.14.21-150500.55.36.1 for Lustre client.
Lustre-change: https://review.whamcloud.com/53156
Lustre-commit: TBD (from 3e50280434d250996dfaa9d68d7da5e2c45d59ef)
Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=sles15sp5 testlist=sanity
Change-Id: I5a9afb313e9bf315ef4af5b6602785ee68c4c247
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Thu, 9 Nov 2023 19:01:19 +0000 (11:01 -0800)]
LU-17274 kernel: new kernel [RHEL 9.3 5.14.0-362.8.1.el9_3]
This patch makes changes to support new RHEL 9.3 release
for Lustre client.
Lustre-change: https://review.whamcloud.com/53054
Lustre-commit: TBD (from 9146471f862d6c6fae6c1f6ac99f55d8280a2891)
Test-Parameters: trivial env=SANITY_EXCEPT="906" \
mdtcount=4 mdscount=2 clientdistro=el9.3 testlist=sanity
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-3
Change-Id: I9cce1a7d2249cb4df39106c44ba4417411ee0757
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53056
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Tue, 24 Aug 2021 20:48:41 +0000 (13:48 -0700)]
LU-14955 lnet: Use fatal NI if none other available
Allow NI in fatal state to be selected for sending if there are no
NIs in non-fatal state.
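A sketch of the fallback rule, assuming a hypothetical struct ni with a fatal flag and a credit count as the only tie-breaker; actual LNet NI selection weighs more criteria than this. A non-fatal NI always wins over a fatal one, and a fatal NI is returned only when nothing else is available.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified network interface descriptor. */
struct ni {
	int id;
	bool fatal;	/* NI is in a fatal state */
	int credits;	/* higher is better; used as a tie-breaker */
};

/* Prefer any non-fatal NI; fall back to a fatal NI only when no
 * non-fatal NI exists.  Returns NULL if count == 0. */
const struct ni *select_ni(const struct ni *nis, size_t count)
{
	const struct ni *best = NULL;
	size_t i;

	for (i = 0; i < count; i++) {
		const struct ni *cur = &nis[i];

		if (best == NULL ||
		    (best->fatal && !cur->fatal) ||
		    (best->fatal == cur->fatal && cur->credits > best->credits))
			best = cur;
	}
	return best;
}
```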
Lustre-change: https://review.whamcloud.com/44746/
Lustre-commit: ff3322fd0c77a8042558711d9f410326d2aa6375
Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11019
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iab8ef6ee5c5f45896196dbd88a2f61e004278297
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53153
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Tue, 14 Nov 2023 22:38:26 +0000 (15:38 -0700)]
RM-620 build: New tag 2.14.0-ddn115
New tag 2.14.0-ddn115
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8d964022825701d68ab711fb7fd5c22d7c1f6e2b
Sebastien Buisson [Sun, 24 Sep 2023 16:07:44 +0000 (12:07 -0400)]
LU-16374 enc: rename O_FILE_ENC to O_CIPHERTEXT
Rename O_FILE_ENC to O_CIPHERTEXT as per the discussion on the
linux-fscrypt mailing list.
Also change the flag combination to be:
O_NOCTTY | O_NDELAY | O_DSYNC
to avoid the risk of accidental conflicts with tar, which already opens
files with the 'O_NOCTTY | O_NDELAY' combination.
O_DSYNC does not make much sense for O_RDONLY files, but it will force
writes on encrypted restore to be synchronous. With O_DIRECT and large
enough writes (32MB?) that might be OK, but it is not ideal for small
files.
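A sketch of why the combination is safe against tar's opens: a flag test has to require all three bits, so 'O_NOCTTY | O_NDELAY' alone does not match. The macro is guarded in case a header already defines it, and is_ciphertext_open() is a hypothetical helper name. On Linux, O_NDELAY is a synonym for O_NONBLOCK.

```c
#define _POSIX_C_SOURCE 200809L	/* expose O_DSYNC under strict modes */
#include <fcntl.h>
#include <stdbool.h>

/* Flag combination from this commit, guarded in case a header already
 * provides the real definition. */
#ifndef O_CIPHERTEXT
#define O_CIPHERTEXT (O_NOCTTY | O_NDELAY | O_DSYNC)
#endif

/* All three bits must be set: tar's plain O_NOCTTY | O_NDELAY opens
 * must not be mistaken for raw-ciphertext access. */
static inline bool is_ciphertext_open(int flags)
{
	return (flags & O_CIPHERTEXT) == O_CIPHERTEXT;
}
```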
Lustre-Commit: ac522557b1fe3ea2b7275fa6d5df73691b8d06db
Lustre-Change: https://review.whamcloud.com/51640
Fixes: 4869c7a530 ("LU-14677 sec: no encryption key migrate/extend/resync/split")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I36fed17a413ee690bc445c3e76674ed5fc337de5
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53049
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Mikhail Pershin [Fri, 13 Oct 2023 21:28:58 +0000 (00:28 +0300)]
LU-17184 mgc: remove damaged local configs
If a local config llog is damaged, it cannot be removed and it
prevents the target from mounting. This happens because
mgc_llog_local_copy() uses llog_erase() to remove llogs, which
cannot do the job if the llog header is damaged.
Patch changes are:
- llog_erase() no longer initializes the header, but just destroys
the llog file
- mgc_llog_local_copy() no longer exits if the backup to the temp
file fails, but continues with remote llog copying anyway
- conf-sanity test_151 is added to check that the target can
mount with a damaged local config
Lustre-change: https://review.whamcloud.com/52697
Lustre-commit: 6a6e4ee20fe5aaad4beab5477e1c7d05e4e702e2
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I637749c38fd5ed03bdac5ca1cd60196f724ab0d1
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53124
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Artem Blagodarenko [Fri, 13 Oct 2023 07:49:07 +0000 (15:49 +0800)]
LU-16032 osd: move unlink of large objects to separate thread
The final unlink and freeing of blocks for large objects can lead to
a hung thread with this call stack:
Net: Service thread pid 1739 was inactive for 200.16s.
The thread might be hung, or it might only be slow and will
resume later.
Dumping the stack trace for debugging purposes:
__wait_on_buffer+0x2a/0x30
ldiskfs_wait_block_bitmap+0xe0/0xf0 [ldiskfs]
ldiskfs_read_block_bitmap+0x31/0x60 [ldiskfs]
ldiskfs_free_blocks+0x329/0xbb0 [ldiskfs]
ldiskfs_ext_remove_space+0x8a9/0x1150 [ldiskfs]
ldiskfs_ext_truncate+0xb0/0xe0 [ldiskfs]
ldiskfs_truncate+0x3b7/0x3f0 [ldiskfs]
ldiskfs_evict_inode+0x58a/0x630 [ldiskfs]
evict+0xb4/0x180
iput+0xfc/0x190
osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
lu_object_free.isra.30+0x68/0x170 [obdclass]
lu_object_put+0xc5/0x3e0 [obdclass]
ofd_destroy_by_fid+0x20e/0x500 [ofd]
ofd_destroy_hdl+0x267/0x9f0 [ofd]
tgt_request_handle+0xaee/0x15f0 [ptlrpc]
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
ptlrpc_main+0xb34/0x1470 [ptlrpc]
kthread+0xd1/0xe0
Move the final unlink to a workqueue if the inode size is > 1GB. The
size threshold can be configured by setting the minimum async truncate
size with the "osd-ldiskfs.*.delay_unlink_mb" parameter.
Writing to the "osd-ldiskfs.*.force_sync" parameter will flush pending
delayed unlinks so that space can be reclaimed as needed.
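A user-space model of the policy, assuming hypothetical names and a fixed-size array standing in for the workqueue; whether the real threshold comparison is '>' or '>=' is an assumption here ('>=' matches the "minimum async truncate size" wording). Large objects are deferred, small ones are freed inline, and a force_sync write drains the queue.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16	/* hypothetical queue depth */

/* Hypothetical model of the OSD state relevant to delayed unlink. */
struct unlink_queue {
	unsigned long long delay_unlink_mb;	/* threshold, in MiB */
	unsigned long long pending[MAX_PENDING];	/* deferred sizes, bytes */
	size_t npending;
	unsigned long long freed_bytes;		/* space reclaimed so far */
};

/* Final unlink of an object of `size` bytes.  Returns true if freeing
 * the blocks was deferred, false if it was done inline (which is also
 * the fallback when the queue is full). */
bool unlink_object(struct unlink_queue *q, unsigned long long size)
{
	if (size >= (q->delay_unlink_mb << 20) && q->npending < MAX_PENDING) {
		q->pending[q->npending++] = size;
		return true;
	}
	q->freed_bytes += size;
	return false;
}

/* Model of a write to force_sync: flush every pending delayed unlink. */
void force_sync_flush(struct unlink_queue *q)
{
	while (q->npending > 0)
		q->freed_bytes += q->pending[--q->npending];
}
```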
Lustre-change: https://review.whamcloud.com/47995
Lustre-commit: a772e90243ea0ff1de6ae9c67e1f6384c431d200
Change-Id: Id535ae4c58732769effabee42835bc2da8cb5cc1
Signed-off-by: Artem Blagodarenko <ablagodarenko@whamcloud.com>
DDN-bug-id: DDN-3144
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53104
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Vitaliy Kuznetsov [Fri, 10 Nov 2023 20:35:56 +0000 (21:35 +0100)]
LU-16827 obdfilter: Fix "emfperf obdfilter-survey" error
This patch fixes the definition of the lctl variable. It changes
the logic so that the LCTL value is assigned only when it was
defined earlier.
Lustre-change: https://review.whamcloud.com/53083
Lustre-commit: 95387e580a639eb9ff0648aecf69d0a4951325ef
Test-Parameters: trivial testlist=obdfilter-survey
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I4dfd7e3d1f78208b33b897d8e6680e59b690014c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53084
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Timothy Day [Sat, 11 Mar 2023 22:55:09 +0000 (22:55 +0000)]
LU-16632 tests: more margin of error for sanity/56xh
Give sanity test_56xh more time to migrate files inside the
VMs before failing.
Also, fix a typo.
Lustre-change: https://review.whamcloud.com/50262
Lustre-commit: 36cbba150bce9e2890c8b462ec2ce4af2d6353a5
Test-Parameters: trivial testlist=sanity env=ONLY=56xh,ONLY_REPEAT=100
Fixes: 55968bfabe ("LU-13482 utils: bandwidth limit for lfs migrate")
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If89c8c3ee113c8a14d4c0463c7bb79e353130c08
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53086
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Chris Horn [Thu, 2 Nov 2023 19:28:45 +0000 (12:28 -0700)]
LU-17258 socklnd: ensure connection type established upon race
When a connection race is hit between two peers, only increment the
retry count if a connection of the specific type has already been
established; otherwise, this can lead to an unexpected value set in
ksnr_connected and some of the assertions being triggered in
ksocknal_connect():
"ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed"
Lustre-change: https://review.whamcloud.com/52957
Lustre-commit: 5afe3b0538c533c3cca370bc9c0901abccca299a
Fixes: da893c6c97 ("LU-16191 socklnd: limit retries on conns_per_peer mismatch")
HPE-bug-id: LUS-11922
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I6e8abb39ad3c0bcd7fbc8f8c5478c903029df908
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53046
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Fri, 10 Nov 2023 09:38:19 +0000 (02:38 -0700)]
RM-620 build: New tag 2.14.0-ddn114
New tag 2.14.0-ddn114
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia94862790d1dec3d8080b6d00445ca163afebf81
Patrick Farrell [Wed, 1 Nov 2023 20:14:12 +0000 (16:14 -0400)]
EX-7601 osc: walk chunk unaligned RPC correctly
For decompression, the client must start looking for
compressed chunks at a chunk-aligned offset.
Implement this in decompress_request.
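For a power-of-two chunk size (assumed here), the chunk-aligned starting point is obtained by clearing the low bits of the offset. The helper names below are hypothetical, not taken from decompress_request.

```c
#include <assert.h>
#include <stdint.h>

/* Round offset down to the start of its chunk (chunk_size: power of 2). */
uint64_t chunk_start(uint64_t offset, uint64_t chunk_size)
{
	assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);
	return offset & ~(chunk_size - 1);
}

/* Number of chunks touched by the byte range [offset, offset + len),
 * len > 0; a walk over them has to begin at the first aligned chunk. */
uint64_t chunks_spanned(uint64_t offset, uint64_t len, uint64_t chunk_size)
{
	uint64_t first = chunk_start(offset, chunk_size);
	uint64_t last = chunk_start(offset + len - 1, chunk_size);

	return (last - first) / chunk_size + 1;
}
```

For example, an RPC starting at byte 200000 with 64KiB chunks has to be walked from byte 196608, and a 64KiB RPC starting at 4096 touches two chunks.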
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3273135990ddf51e8b3c651734e19350e91f659c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52933
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Patrick Farrell [Fri, 3 Nov 2023 19:56:27 +0000 (15:56 -0400)]
EX-7601 osc: remove unused 'wrkmem'
compress_chunk() takes a wrkmem buffer, which it does not
use.
Remove this and its allocation in compress_request.
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I6f236f018f5b79c57cc8725ca0f95125810a4064
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52980
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Patrick Farrell [Fri, 3 Nov 2023 18:11:46 +0000 (14:11 -0400)]
EX-7601 osc: apply compressed flag to dst page
The existing code to apply brw flags to compressed pages
has two issues:
1. The dst_page is NOT an osc async page, it is a bare BRW
page. This means the brw_page2oap macro isn't right,
because there is no oap page.
Because oap_brw_flags is actually oap_brw_page.flag, we
never access the memory pointed at by the OAP, but just use it
to find an offset back into the brw page.
This means the flags are set correctly, but we still
shouldn't use this macro.
2. However, the function then overwrites these flags by
copying from a page in the source, so OBD_BRW_COMPRESSED is
lost.
Add OBD_BRW_COMPRESSED when we set flags. This ensures the
flag is actually sent to the server on compressed IO.
This was not causing any problems because the server does
not actually use the OBD_BRW_COMPRESSED flag yet.
(EX-7601 uses this flag)
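A sketch of the ordering problem in point 2: any flag set before the flags are copied from the source page is overwritten, so the compressed bit must be added afterwards. BRW_COMPRESSED_SKETCH is an illustrative value, not the real OBD_BRW_COMPRESSED bit, and dst_page_flags() is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative bit only; not the real OBD_BRW_COMPRESSED value. */
#define BRW_COMPRESSED_SKETCH 0x1000u

/* Build the flags for a compressed destination page. */
uint32_t dst_page_flags(uint32_t src_flags)
{
	uint32_t flags = src_flags;	/* copied from a source page */

	/* Setting the bit before this copy would lose it, so add it after. */
	flags |= BRW_COMPRESSED_SKETCH;
	return flags;
}
```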
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia94cdc803868ce16a0b66fd58578ec8b2d00cbae
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Andreas Dilger [Thu, 9 Nov 2023 00:10:05 +0000 (17:10 -0700)]
EX-8270 sptlrpc: don't crash for too-large chunk size
If the chunk size is too large, don't fall off the
end of the page_pool[] array with a large "order".
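A sketch of the bounds check, assuming 4 KiB pages and a hypothetical five-entry page_pool[] indexed by allocation order; the real pool count and error handling may differ.

```c
#include <stddef.h>

#define PAGE_SHIFT_SKETCH 12	/* assume 4 KiB pages */
#define POOLS_COUNT_SKETCH 5	/* hypothetical page_pool[] length */

/* Return the page_pool[] index (allocation order) whose buffers can
 * hold chunk_size bytes, or -1 if no pool is large enough, rather than
 * indexing past the end of the array. */
int pool_index_for_chunk(size_t chunk_size)
{
	int order;

	for (order = 0; order < POOLS_COUNT_SKETCH; order++)
		if (((size_t)1 << (PAGE_SHIFT_SKETCH + order)) >= chunk_size)
			return order;
	return -1;	/* too large: the caller must fail the request */
}
```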
Test-Parameters: trivial
Fixes: d945f1b064 ("EX-6261 ptlrpc: extend sec bulk functionality")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I192ac1b227f1cab8405f6657e754101d353ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53044
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Bobi Jam [Wed, 23 Aug 2023 16:43:56 +0000 (00:43 +0800)]
EX-7806 csdc: not support data compression on MDT
Do not allow setting a data compression component on DoM until
data compression on the MDT is implemented.
Test-Parameters: trivial testlist=sanity-pfl
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I3794460140f08a073377c418dd56e7dda907d96d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52062
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Thu, 9 Nov 2023 00:29:26 +0000 (17:29 -0700)]
EX-7601 csdc: improve preview warning messages
Avoid printing duplicate warning messages on the console when
creating files with multiple compressed components. On the
flip side, log a console message when compression is enabled
so that this will later be visible if enabled on a system.
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8cb2f67689824513335f3fa65e9ea751923ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53045
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Aurelien Degremont [Tue, 17 Oct 2023 13:07:45 +0000 (15:07 +0200)]
LU-17205 utils: add lctl get_param -H option
- Add a new '-H' option to 'lctl get_param' that will prefix
each output line with the parameter name instead of only
the first line by default.
That makes grepping lctl get_param with wildcards much easier
as you can now easily know which parameter returns which value.
$ lctl get_param -H osc.*.state | grep current
osc.lustre-OST0000-osc-ff1148c0.state=current_state: FULL
osc.lustre-OST0001-osc-ff1248c0.state=current_state: DISCONN
osc.lustre-OST0002-osc-ff1348c0.state=current_state: FULL
osc.lustre-OST0003-osc-ff1448c0.state=current_state: FULL
osc.lustre-OST0004-osc-ff1548c0.state=current_state: FULL
It also prints an output line even for empty values. That also
makes life easier for admins.
- The patch also removes the forced line feed if the parameter
value was longer than 80 chars. This was considered a misfeature
and is now dropped for all usages, with or without -H.
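A sketch of the -H output rule as described: every line of the value gets the "name=" prefix, and an empty value still produces one line. format_param_prefixed() is a hypothetical helper, not lctl's implementation.

```c
#include <stdio.h>
#include <string.h>

/* Format `value` into buf with every line prefixed by "name=".  An
 * empty value still yields one "name=" line, and a trailing newline
 * does not add an extra empty line.  Returns 0, or -1 if buf is too
 * small (bufsz must be > 0). */
int format_param_prefixed(char *buf, size_t bufsz, const char *name,
			  const char *value)
{
	size_t pos = 0;
	const char *line = value;

	if (bufsz == 0)
		return -1;
	buf[0] = '\0';
	do {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);
		int n = snprintf(buf + pos, bufsz - pos, "%s=%.*s\n",
				 name, (int)len, line);

		if (n < 0 || (size_t)n >= bufsz - pos)
			return -1;	/* output would be truncated */
		pos += (size_t)n;
		line = nl ? nl + 1 : line + len;
	} while (*line != '\0');
	return 0;
}
```

For example, name "p" with value "x: 1\ny: 2\n" yields the two lines "p=x: 1" and "p=y: 2", and an empty value yields just "p=".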
Lustre-change: https://review.whamcloud.com/52730
Lustre-commit: a12c352a3dd8d424b1da09efc6884530c60d105b
Test-Parameters: trivial
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: Ib1fa0dc400db4c19fed10ad4cced9be5668418e3
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53067
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Mon, 13 Mar 2023 22:08:30 +0000 (16:08 -0600)]
LU-16639 misc: cleanup console messages
lprocfs_job_cleanup() was not properly dropping all jobstats
from the hash table, and job_stat_exit() was printing errors at
unmount. Ensure all stats are "old enough" when @clear is set.
Change early libcfs cfs_cpu_init() messages from CERROR() to
pr_err() to avoid circular dependencies on libcfs setup before
printing an error message to the console during module init.
Lustre-change: https://review.whamcloud.com/50283
Lustre-commit: 8f40a3d7110da1af8e310a4b7f40b86f13080938
Test-Parameters: trivial
Fixes: ea2cd3af7b ("LU-11407 obdclass: add start time to stats files")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ide3f502103392a79419cc1836200bf5a1a3ebbe5
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53063
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Thu, 9 Nov 2023 23:28:45 +0000 (16:28 -0700)]
LU-17251 tests: use stderr in precreated_ost_obj_count()
Write the status output to stderr instead of stdout, so that
it doesn't confuse the caller that is expecting the number
of objects precreated in stdout.
Test-Parameters: trivial testlist=sanity-scrub,sanity-lfsck
Fixes: c39bdce94f ("LU-17251 test: improve parallel-scale rr_alloc test")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib9b132a04a88b15cea34872954bfa5c4ddf8cde7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53062
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Thu, 9 Nov 2023 09:38:47 +0000 (02:38 -0700)]
RM-620 build: New tag 2.14.0-ddn113
New tag 2.14.0-ddn113
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I73eab3dc06a0488b7e68c7434cb8a6af2c590a2f
Andreas Dilger [Thu, 9 Nov 2023 09:38:11 +0000 (02:38 -0700)]
RM-620 build: New tag lipe-2.36
New tag lipe-2.36
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6c986a17d42f4bd95009d9d0f03acc601c9ee2dd
Andreas Dilger [Wed, 8 Nov 2023 22:39:28 +0000 (15:39 -0700)]
EX-7601 tests: improve/skip sanity test_460a
Skip sanity test_460a for el9.2 clients, since they appear to be
failing that test regularly, but no other distro client is.
Improve the log messages to see what stage is currently running.
Limit the "cmp --verbose" messages to one chunk, otherwise it
may print the entire 14MB test file (about 80 MiB of ASCII).
Move enable_compression() and disable_compression() functions
into test-framework.sh so that they can be used for all tests.
Set LFS_SETSTRIPE_COMPR_OK=y in enable_compression() since we
already know this is a preview and don't need it printed.
Allow sanity-compr.sh to specify SANITY_ONLY and/or SANITYN_ONLY,
and skip the other test script run if only one of them is set.
Test-Parameters: trivial
Test-Parameters: testlist=sanity env=ONLY=460,HONOR_EXCEPT=y clientdistro=el9.2
Test-Parameters: testlist=sanity-compr env=SANITY_ONLY=460 clientdistro=ubuntu2204
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8cb2f67689824513335f3fa65e9ea7519e3ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53043
Tested-by: jenkins <devops@whamcloud.com>
Lei Feng [Wed, 8 Nov 2023 08:01:05 +0000 (16:01 +0800)]
EX-8570 lipe: add lpcc sub command to trigger purge scan
Add a sub command 'lpcc purge-scan' to trigger purge
scanning by sending SIGUSR2 to matching lpcc_purge
process.
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I976621fe787daa15b8206eed97efdebe75cd7425
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53036
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Wed, 8 Nov 2023 04:46:45 +0000 (12:46 +0800)]
EX-8569 lipe: trigger lpcc_purge scan by SIGUSR2
Send SIGUSR2 to lpcc_purge to trigger a scan
immediately.
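A sketch of the usual pattern for a signal-triggered scan, assuming hypothetical function names; how lpcc_purge actually structures its main loop is not shown in this log. The handler only sets a sig_atomic_t flag, and the scan loop polls and clears it. The sending side, as used by 'lpcc purge-scan' (EX-8570 above), amounts to kill(pid, SIGUSR2) on the matching process.

```c
#define _POSIX_C_SOURCE 200809L	/* for sigaction() under strict modes */
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set by the signal handler; polled and cleared by the scan loop. */
static volatile sig_atomic_t scan_requested;

static void on_sigusr2(int signo)
{
	(void)signo;
	scan_requested = 1;	/* only async-signal-safe work here */
}

/* Install the SIGUSR2 handler; returns 0 on success, -1 on error. */
int install_scan_trigger(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr2;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR2, &sa, NULL);
}

/* Called from the main loop: true if a scan should start right now.
 * A signal arriving between the test and the clear is folded into the
 * scan that is about to start. */
bool take_scan_request(void)
{
	if (!scan_requested)
		return false;
	scan_requested = 0;
	return true;
}
```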
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I2811c90ac75c93167e8104e90b424ac31c8cc50c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53034
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Wed, 8 Nov 2023 02:07:32 +0000 (10:07 +0800)]
EX-8568 lipe: lpcc_purge can disable force scanning
When force_scan_interval is set to -1, lpcc_purge will never
start force scanning.
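A sketch of the rule, assuming intervals are in seconds and a hypothetical helper name; only -1 is documented as the disabling value, and treating every negative value the same way is a choice of this sketch.

```c
#include <stdbool.h>
#include <time.h>

/* Decide whether a forced scan is due.  A negative interval (the text
 * documents -1) disables forced scanning entirely. */
bool should_force_scan(long interval, time_t last_scan, time_t now)
{
	if (interval < 0)
		return false;
	return now - last_scan >= interval;
}
```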
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I21bcadb97f09622eae08af73082196e816b2c9ae
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Mon, 6 Nov 2023 06:40:13 +0000 (14:40 +0800)]
EX-4125 lipe: adjust atime in lpcc_purge
Sometimes atime < mtime. In this case, set atime to mtime.
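The adjustment amounts to taking the later of the two timestamps. A one-function sketch with a hypothetical helper name:

```c
#include <time.h>

/* Use mtime in place of atime when atime < mtime. */
time_t adjusted_atime(time_t atime, time_t mtime)
{
	return atime < mtime ? mtime : atime;
}
```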
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc env=ONLY="200-203"
Change-Id: I35b3da543b57265b09ef65f4e810761aa727f483
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53002
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Tue, 7 Nov 2023 04:06:24 +0000 (12:06 +0800)]
EX-8551 lipe: build arch-specific lipe-lpcc package
lpcc_purge in the lipe-lpcc package is an executable binary,
so the lipe-lpcc package needs to be arch-specific.
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I0387e258eaec6e39156f823d3a38b5dc3fb9a4cd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53007
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Raphael Druon <rdruon@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Tue, 22 Feb 2022 07:28:50 +0000 (01:28 -0600)]
LU-15576 osp: Interop skip sanity test 823 for MDS < 2.14.0-ddn112
Prior to v2_14_55-29-g06e586016d, setting create_count greater
than the maximum returned -ERANGE.
During interop testing, skip sanity/823 for MDS older than 2.14.0-ddn112.
Lustre-change: https://review.whamcloud.com/46567
Lustre-commit: 5da859e262dd5e93bfeb2bfa1366a9e20395d3f4
Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=823
Fixes: 06e586016d3a ("LU-13941 osp: Silently lower requested create_count to maximum")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie79617deea047b2a846f696473b9c2b5681953be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53022
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Fri, 3 Nov 2023 23:49:29 +0000 (17:49 -0600)]
LU-10465 osd-ldiskfs: 8MiB IOs should bypass cache
Changes the writethrough_max_io_mb and readcache_max_io_mb
params to check for IO size >= max_io_mb instead of > max_io_mb
when deciding to bypass cache.
Read/write IOs that are 8MiB in size should bypass the pagecache
on the OSTs, rather than requiring IOs that are slightly larger
than this. 8MiB is enough to submit 1MiB to each HDD spindle in
an 8+2 RAID6, and caching these writes on the OSS is not helping.
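A sketch of the changed comparison, with a hypothetical helper name: an IO of exactly max_io_mb now bypasses the cache. The 8MiB figure follows from 8 data disks receiving 1MiB each in an 8+2 RAID6 stripe. Any special meaning the real parameters give to a value of 0 is not modeled here.

```c
#include <stdbool.h>

/* Bypass the page cache for IOs of at least max_io_mb MiB ('>=' after
 * this change; previously '>'). */
bool bypass_cache(unsigned long long io_bytes, unsigned long long max_io_mb)
{
	return io_bytes >= (max_io_mb << 20);
}
```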
Lustre-change: https://review.whamcloud.com/52989
Lustre-commit: TBD (from dcdc4748f1443981a170bc2945b178226e64a6d4)
Test-Parameters: trivial
Fixes: 3043c6f189 ("LU-12071 osd-ldiskfs: bypass pagecache if requested")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iae435f5b99e2e8bc6a9458fedad65a81c2853350
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53033
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Bobi Jam [Thu, 21 Sep 2023 14:24:32 +0000 (22:24 +0800)]
LU-16958 llite: migrate deadlock on not responding lock cancel
An lfs migrate race makes the MDS hang with the following backtrace:
[ 3683.248584] [<0>] ldlm_completion_ast+0x78d/0x8e0 [ptlrpc]
[ 3683.250122] [<0>] ldlm_cli_enqueue_local+0x2fd/0x840 [ptlrpc]
[ 3683.251363] [<0>] mdt_object_local_lock+0x50e/0xb10 [mdt]
[ 3683.252615] [<0>] mdt_object_lock_internal+0x187/0x430 [mdt]
[ 3683.253793] [<0>] mdt_object_lock_try+0x22/0xa0 [mdt]
[ 3683.254857] [<0>] mdt_getattr_name_lock+0x1317/0x1dc0 [mdt]
[ 3683.256016] [<0>] mdt_intent_getattr+0x264/0x440 [mdt]
[ 3683.257105] [<0>] mdt_intent_opc+0x452/0xa80 [mdt]
[ 3683.258126] [<0>] mdt_intent_policy+0x1fd/0x390 [mdt]
[ 3683.259191] [<0>] ldlm_lock_enqueue+0x469/0xa90 [ptlrpc]
[ 3683.260350] [<0>] ldlm_handle_enqueue0+0x61a/0x16c0 [ptlrpc]
[ 3683.261596] [<0>] tgt_enqueue+0xa4/0x200 [ptlrpc]
[ 3683.262662] [<0>] tgt_request_handle+0xc9c/0x1a40 [ptlrpc]
[ 3683.263860] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
[ 3683.265220] [<0>] ptlrpc_main+0xbf3/0x1540 [ptlrpc]
[ 3683.266303] [<0>] kthread+0x134/0x150
[ 3683.267111] [<0>] ret_from_fork+0x35/0x40
The deadlock happens as follows:
T1:
vvp_io_init()
->ll_layout_refresh() <= take lli_layout_mutex
->ll_layout_intent()
->ll_take_md_lock() <= take the CR layout lock ref
->ll_layout_conf()
->vvp_prune()
->vvp_inode_ops() <= release lli_layout_mutex
->vvp_inode_ops() <= try to acquire lli_layout_mutex
-> racer waits here for T2
T2:
->ll_file_write_iter()
->vvp_io_init()
->ll_layout_refresh() <= take lli_layout_mutex
->ll_layout_intent() <= Request layout from MDT
-> racer waits for the server...
The server wants to cancel the CR layout lock held by T1, but that
won't happen. T1 could also have taken an extent ldlm lock while
waiting for the lli_layout_mutex held by T2, and ofd_destroy_hdl does
not get the lock cancellation response from T1.
lli_layout_mutex is only needed for enqueuing the layout lock from the
server, so ll_layout_conf() does not need to involve lli_layout_mutex.
Lustre-commit: TBD (from 7de620b53bea8a2fc252ceea4787f1226ce63a02)
Lustre-change: https://review.whamcloud.com/52388
Fixes: 8f2c1592c3 ("LU-16958 llite: migrate vs regular ops deadlock")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib94de2c63544c3a962199aa0537418255980ae8c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52451
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vladimir Saveliev [Wed, 26 Jul 2023 13:09:18 +0000 (16:09 +0300)]
LU-16043 osc: allow error for write on CL_FSYNC_DISCARD
In case of CL_FSYNC_DISCARD, an error is allowed for a write of an osc object.
Otherwise, the included test fails in rm with:
(osc_page.c:174:osc_page_delete()) Trying to teardown failed: -16
(osc_page.c:175:osc_page_delete()) ASSERTION( 0 ) failed:
(osc_page.c:175:osc_page_delete()) LBUG
Lustre-change: https://review.whamcloud.com/48032
Lustre-commit: 050c2fb23b1f98745305a3dfe3062ea5a66dfdb4
Test-Parameters: trivial testlist=sanity env=ONLY=907
HPE-bug-id: LUS-10410
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I0aae0dc470ba0371964e7643a6d84b19a1b4e106
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53009
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andrew Perepechko [Tue, 10 Jan 2023 21:53:38 +0000 (16:53 -0500)]
LU-16609 target: top_trans_create cannot alloc memory
top_trans_create() requests __GFP_IO memory allocation,
which does not allow direct reclaim. However, if the
memory shortage is temporary, direct reclaim is reasonable.
GFP_NOFS is __GFP_IO with additional reclaim bits.
Lustre-change: https://review.whamcloud.com/50176
Lustre-commit: 9d1f8f1e3557ee3349c623f4f5596df44f60b082
Change-Id: I2c84d9d74188660063c948573780745a2b59a688
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-11293
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53031
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Wed, 18 Oct 2023 03:54:59 +0000 (22:54 -0500)]
LU-17197 obdclass: preserve fairness when waiting for rpc slot
When obd_get_mod_rpc_slot() waits for an available slot it places the
waiting thread at the HEAD of the queue, so it will be woken before
anything else that is already queued. This is clearly unfair and can
hurt performance.
So change to always add to the tail to ensure a FIFO ordering (except
that CLOSE might sometimes be woken a bit early).
This regression was introduced in a rewrite that was supposed to make
waiting more fair - by avoiding a broadcast wakeup for "close"
requests.
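The ordering difference can be modelled outside the kernel with a plain singly linked queue. This is a minimal userspace sketch, not the obd_get_mod_rpc_slot() code: struct waiter, struct waitq and the waitq_* helpers are hypothetical stand-ins for a wait_queue_entry and the kernel's head/tail wait-queue insertion.

```c
#include <stddef.h>

/* Hypothetical waiter record, standing in for a wait_queue_entry. */
struct waiter {
	int id;
	struct waiter *next;
};

struct waitq {
	struct waiter *head;
	struct waiter *tail;
};

/* Head insertion: the newest waiter jumps ahead of everyone queued. */
static void waitq_add_head(struct waitq *q, struct waiter *w)
{
	w->next = q->head;
	q->head = w;
	if (q->tail == NULL)
		q->tail = w;
}

/* Tail insertion: the newest waiter lines up behind everyone (FIFO). */
static void waitq_add_tail(struct waitq *q, struct waiter *w)
{
	w->next = NULL;
	if (q->tail != NULL)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Wake (dequeue) the waiter at the head; returns NULL if empty. */
static struct waiter *waitq_wake_one(struct waitq *q)
{
	struct waiter *w = q->head;

	if (w != NULL) {
		q->head = w->next;
		if (q->head == NULL)
			q->tail = NULL;
	}
	return w;
}
```

With tail insertion, waiters are woken in arrival order; with head insertion, a newly arrived waiter is woken before every thread that was already waiting.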
Also fix some stale comments and expose __add_wait_queue_entry_tail.
Running mdtest with the patch applied shows about a 3% improvement:
                    master             patched
mdtest-easy-write    350.585906 kIOPS   353.783545 kIOPS
mdtest-easy-stat    1320.329353 kIOPS  1408.320419 kIOPS
mdtest-easy-delete   285.084103 kIOPS   289.625900 kIOPS
[SCORE]              509.115803 kIOPS   524.516113 kIOPS
Lustre-change: https://review.whamcloud.com/52738
Lustre-commit: TBD (from 7e28964085a4d98111b926fe125abc7f815e70be)
Fixes: 5243630b09d2 ("LU-15947 obdclass: improve precision of wakeups for mod_rpcs")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If767c4299bcbab71589b0f3c01e85bf461686ca5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Deiter [Wed, 1 Nov 2023 22:54:32 +0000 (02:54 +0400)]
LU-17251 test: improve parallel-scale rr_alloc test
Add a check for precreated OST objects before running the
rr_alloc test, waiting up to 60 seconds for them.
Lustre-change: https://review.whamcloud.com/52940
Lustre-commit: TBD (from 3f1f70264e1ed9ba77094435fc598bc9abbbc044)
Test-Parameters: trivial
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Signed-off-by: Alex Deiter <adeiter@tintri.com>
Change-Id: Ib604b99138ceccf384476ad2876d9df7cd7d524b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52999
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Fri, 3 Nov 2023 00:32:44 +0000 (18:32 -0600)]
LU-17251 osp: force precreate if create_count grows
Force the MDS to precreate OST objects if "osp.*.create_count" is
written and the OSP does not have at least that many precreated
objects locally. This avoids doing complex operations in test
scripts to force precreation to run, which slows down the tests
and increases the chance that a test might fail.
Previously, opd_precreate_force was only used for handling OSTs
that were reformatted, and this reset "create_count" to the minimum.
Move that reset into the reformat case rather than the precreate code
path, so it does not reset "create_count" right after it was set.
Remove the "env" argument from several precreate-related functions,
since it wasn't used in those functions, and that made it difficult
to call them from the "create_count" parameter handling.
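A minimal sketch of the decision described above, assuming a simplified per-OST state. struct osp_precreate_state, its fields and osp_set_create_count() are hypothetical and do not mirror the real OSP structures; only the role of the force flag corresponds to opd_precreate_force, and the reset-on-reformat path is not modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-OST precreate state on the MDS. */
struct osp_precreate_state {
	unsigned int create_count;  /* objects requested per precreate batch */
	unsigned int precreated;    /* objects already precreated locally */
	bool precreate_force;       /* plays the role of opd_precreate_force */
};

/*
 * Handle a write to "create_count": store the new value and, if fewer
 * objects are precreated than requested, force a precreate run rather
 * than waiting for the normal trigger. Returns true if this call forced
 * precreation.
 */
static bool osp_set_create_count(struct osp_precreate_state *s,
				 unsigned int count)
{
	bool force = s->precreated < count;

	s->create_count = count;
	if (force)
		s->precreate_force = true;
	return force;
}
```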
Lustre-change: https://review.whamcloud.com/52968
Lustre-commit: TBD (from 0206ef4d765aca3f298e24dd630f155114781986)
Test-Parameters: testlist=parallel-scale env=ONLY=test_rr_alloc
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iac35c1b981fcd6ab2d1ea5abc9ffe2e4563ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52998
Tested-by: jenkins <devops@whamcloud.com>
Mikhail Pershin [Wed, 1 Nov 2023 14:55:39 +0000 (17:55 +0300)]
LU-17249 ptlrpc: protect scp_rqbd_idle list operations
Protect getting an entry from the scp_rqbd_idle list with the
spinlock in ptlrpc_service_purge_all(), as is done in all other
places where the rqbd_list linkage is managed.
Lustre-change: https://review.whamcloud.com/52931
Lustre-commit: 9ba375983d498690f5caa29c289c137470a76505
Test-Parameters: testgroup=full-part-1 env="SLOW=yes,ENABLE_QUOTA=yes"
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Iace37b1ee79bfd0c3a54a35722952e17d860a91c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Tue, 26 Sep 2023 23:57:46 +0000 (16:57 -0700)]
LU-17103 lnet: use workqueue for lnd ping buffer updates
Introduce a workqueue for handling lnd-initiated ping buffer
update requests.
This avoids a possible lockup of the monitor thread while it
waits for the refcount on the "old" ping buffer to be decremented
during the update, when the message that triggers the decrement
is on the monitor thread's own queue waiting to be processed.
Lustre-change: https://review.whamcloud.com/52522/
Lustre-commit: TBD (from 1200e9ce1b8272f4affb20386570a9a6e79ceeb4)
Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY="207 500",ONLY_REPEAT=50
Fixes: 7ac399c5 ("LU-16949 lnet: get monitor thread to update ping buffer")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I5176581703e52f4adbfff417040bebcc2489b79e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52936
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Tue, 17 Oct 2023 18:43:14 +0000 (11:43 -0700)]
LU-17207 lnet: race b/w monitor thr stop and discovery push
As a result of a race, the discovery thread may attempt to dereference
a message on ln_mt_resendqs that was just freed by the stopping monitor
thread. Make sure the discovery thread is stopped first.
Lustre-change: https://review.whamcloud.com/52734/
Lustre-commit: TBD (from 5c6ca4991382a805da6e824c1dbfab931987dda6)
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I0dfcf3bc5bb3c8df195388599f571bdd3caaa3d7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Tue, 7 Nov 2023 16:00:37 +0000 (17:00 +0100)]
EX-8543 tools: remove laudit/laudit-report
laudit/laudit-report is a demonstration tool for what is possible in
terms of Lustre audit. It is not meant to be used in production
because it stores the audit data as plaintext flat files, which is
neither secure nor scalable. It is also largely untested at scale.
So remove laudit/laudit-report from lipe sources, and fix build and
packaging mechanisms accordingly.
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I36fbd50cd4485f2cc7b0ee91922e58f92e008255
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53015
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Li Dongyang [Mon, 6 Nov 2023 04:22:47 +0000 (15:22 +1100)]
LU-17248 kernel: wait for pages under writeback for bdev
Use a better version of the kernel patch instead of
just adding the SB_I_STABLE_WRITES flag to the bdev superblock.
We only need to wait for page writeback on block devices that
require stable pages, not on all block devices.
Test-Parameters: trivial
Fixes: 5968bc3954 ("LU-17248 kernel: add SB_I_STABLE_WRITES to bdev sb flag")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I20cfa33c4ef45b10e6a732e325698c6b1b00bc79
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53001
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Zhuravlev [Tue, 7 Nov 2023 14:56:16 +0000 (17:56 +0300)]
LU-16843 ldiskfs: merge extent blocks
There are cases (e.g. file written synchronously with discontiguous
blocks that are later filled in) when a lot of extents are created
initially, then the extents get merged over time, but there is no
way to merge the index blocks. This can cause a very deep extent
index tree (above 5 levels) and cause problems like:
inode has invalid extent depth: 6
Merge leaf/index blocks (at most one at each level) to the right/left
when extents are removed from the index.
Submitted to the linux-ext4 mailing list:
https://lore.kernel.org/linux-ext4/7A2B8861-96AA-4815-BB58-180F63F62436@whamcloud.com/
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I09dfab193d82e7c99620ddb95aff2015023f73aa
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52301
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Mon, 30 Oct 2023 10:16:49 +0000 (13:16 +0300)]
EX-8369 ldiskfs: fix dense writes
Don't mix "dense" and regular writes, as regular writes are bound
to logical offsets.
Fixes: f36eda6a1e ("LU-10026 osd-ldiskfs: use preallocation for dense writes")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9f6b2c600f2132dcad23726f2fb3848ab02cc117
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52888
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Aurelien Degremont [Tue, 7 Nov 2023 19:38:53 +0000 (11:38 -0800)]
LU-17254 lnet: Fix ofed detection with specific kernel version
Improve the OFED configure step for LNET when the kernel version
contains special characters that could be interpreted as regexp
metacharacters.
It is not uncommon in the Debian world to have '+' in the kernel version.
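The configure check itself is shell/m4, but the underlying idea, escaping regexp metacharacters before using the kernel version as a pattern, can be sketched in C. regex_escape() is a hypothetical helper, not part of the Lustre build scripts; it escapes the POSIX extended-regexp special characters so that a '+' in a kernel version is matched literally.

```c
#include <string.h>

/*
 * Copy src into dst, prefixing every POSIX ERE special character with a
 * backslash so the result matches src literally (e.g. a '+' in a Debian
 * kernel version). dst must have room for 2 * strlen(src) + 1 bytes.
 */
static char *regex_escape(char *dst, const char *src)
{
	static const char ere_special[] = ".[\\()*+?{|^$";
	char *p = dst;

	for (; *src != '\0'; src++) {
		if (strchr(ere_special, *src) != NULL)
			*p++ = '\\';
		*p++ = *src;
	}
	*p = '\0';
	return dst;
}
```

Unescaped, a pattern such as "5.10.0+deb" would treat '.' as "any character" and '+' as "one or more", so it could match other versions or fail to match the real one.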
Lustre-change: https://review.whamcloud.com/52949
Lustre-commit: b83156304df2d418aadb5d3dfd5f570ef72a7e2e
Test-parameters: trivial
Change-Id: Ia3da59c74d8c2e59e16525dd50c7b83c2b5fada8
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53021
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Tue, 7 Nov 2023 19:47:01 +0000 (11:47 -0800)]
LU-17257 build: use pkg-config to find krb5 libdir
This patch fixes kerberos5.m4 to use pkg-config to
find krb5 libdir instead of looking for the krb5
libraries in a static list of paths.
Lustre-change: https://review.whamcloud.com/53010
Lustre-commit: TBD (from 9cccb643173acf536f542103d47e4af7057c0ff9)
Test-Parameters: trivial kerberos=true testlist=sanity-krb5
Change-Id: Ia15812932942171b019f3e73034a78f9185c16ce
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53024
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Timothy Day [Wed, 8 Nov 2023 20:22:54 +0000 (12:22 -0800)]
LU-16518 utils: fix clang build errors
This patch fixes a number of small clang build
errors in Lustre utils. Many errors are related
to nuances in typing or to statements which appear
to be tautologies. These are resolved.
Some unneeded parentheses are removed. A variable
that could potentially be left uninitialized is
now initialized. And a comparison that seemed to
have been left out is added.
Lustre-change: https://review.whamcloud.com/50161
Lustre-commit: 632dc6729abcaf83aeaef8167a73ce18b9a41a67
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id3f40b033e640f8d2ae6386f66a88de06fc89666
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53042
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Aurelien Degremont [Tue, 7 Nov 2023 19:43:48 +0000 (11:43 -0800)]
LU-17256 debian: allow building client dkms on arm64
Just add 'arm64' to the supported architecture list
for the 'lustre-client-modules-dkms' debian package.
Lustre-change: https://review.whamcloud.com/52951
Lustre-commit: c4c9a8eea31cf9aa02f75ca3f119f90d67c70cc5
Test-Parameters: trivial
Change-Id: I2af307ee87448faeec81f6e0e27573ae980710f1
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53023
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Sun, 5 Nov 2023 10:52:56 +0000 (03:52 -0700)]
RM-620 build: New tag 2.14.0-ddn112
New tag 2.14.0-ddn112
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ibd5e877813d29da337ac1343dcdd3223ef2e7355
Andreas Dilger [Sun, 5 Nov 2023 10:52:20 +0000 (03:52 -0700)]
RM-620 build: New tag lipe-2.35
New tag lipe-2.35
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I85314a2e67b809e0ebe40d3428db6ab19a5c554a
Jian Yu [Thu, 2 Nov 2023 18:54:42 +0000 (11:54 -0700)]
LU-16667 build: fix extra errors related to struct mnt_idmap
This patch fixes the following extra build errors related to
struct mnt_idmap:
lustre/llite/pcc.c:2440:40: error: passing argument 1 of
'inode_owner_or_capable' from incompatible pointer type
[-Werror=incompatible-pointer-types]
2440 | inode_owner_or_capable(&init_user_ns, inode)) ||
| ^~~~~~~~~~~~~
| |
| struct user_namespace *
include/linux/fs.h:1624:47: note: expected 'struct mnt_idmap *'
but argument is of type 'struct user_namespace *'
1624 | bool inode_owner_or_capable(struct mnt_idmap *idmap,
| ~~~~~~~~~~~~~~~~~~^~~~~
lustre/llite/pcc.c:3656:40: error: passing argument 1 of
'inode->i_op->fileattr_set' from incompatible pointer type
[-Werror=incompatible-pointer-types]
3656 | rc = inode->i_op->fileattr_set(&init_user_ns, dentry, &fa);
| ^~~~~~~~~~~~~
| |
| struct user_namespace *
lustre/llite/pcc.c:3656:40: note: expected 'struct mnt_idmap *'
but argument is of type 'struct user_namespace *'
Change-Id: Ia310f6f9053228b38b41243912dfe7818cfef33a
Test-Parameters: trivial
Fixes: 3011aa5 ("LU-16667 build: struct mnt_idmap, linux/filelock.h")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52955
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Thu, 2 Nov 2023 18:50:25 +0000 (11:50 -0700)]
LU-16802 build: compatibility for 6.4 kernels
linux kernel v6.3-rc4-32-g6eb203e1a868
iov_iter: remove iov_iter_iovec()
Provide a replacement iov_iter_iovec() when one is not provided.
linux kernel v6.3-rc4-34-g747b1f65d39a
iov_iter: overlay struct iovec and ubuf/len
This renames iov_iter member iov to __iov and provides the
iov_iter() accessor.
Define __iov as iov when __iov is not present.
Provide an iov_iter() for older kernels.
linux kernel v6.3-rc1-13-g1aaba11da9aa
driver core: class: remove module * from class_create()
Provide an ll_class_create() to pass THIS_MODULE, or not,
as needed by class_create().
Linux commit v6.2-rc1-20-gf861646a6562
quota: port to mnt_idmap
Update osd_dquot_transfer to use mnt_idmap, falling back
to user_ns if dquot_transfer requires it.
Linux commit v6.3-rc7-2433-gcf64b9bce950
SUNRPC: return proper error from get_expiry()
Updated get_expiry() requires a time64_t pointer to be passed
to hold the expiry time. A non-zero return value indicates an
error, nominally -EINVAL. For kernels where get_expiry() still
returns a time64_t, provide a wrapper that returns -EINVAL on error.
Lustre-change: https://review.whamcloud.com/50875
Lustre-commit: TBD (from 1bd4e67d1f635e0a5f94280c4bab85668ce677ca)
Test-Parameters: trivial
HPE-bug-id: LUS-11614
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I765d6257eec8b5a9bf1bd5947f03370eb9df1625
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52954
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Fri, 11 Aug 2023 00:58:11 +0000 (17:58 -0700)]
LU-17006 lnet: set up routes for going across subnets
Modify ksocklnd-config to set up a route that uses the
default gateway for the subnet if a default gateway
is defined, for example:
ip route add default via <gw_for_eth0> dev eth0 table eth0
which results in a route similar to the following added to
the eth0 route table:
default via <gw_for_eth0> dev eth0
If no gateway is found for the eth0 subnet, keep the old
behaviour, which results in the following being added to the
eth0 route table:
<eth0_subnet> dev eth0 proto kernel scope link src <eth0_ip>
This makes sure that MR traffic goes out the intended interface
as selected by LNet no matter whether going across subnets or not.
Lustre-change: https://review.whamcloud.com/51921
Lustre-commit: 7f60b2b5580f67ca55e53a78dbaf7d50b5b7ab47
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I84a299c8b7eb4cdb4fc24408a1e42ad0283d9219
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52190
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Chris Horn [Mon, 25 Sep 2023 19:03:20 +0000 (14:03 -0500)]
LU-17103 lnet: Avoid deadlock when updating ping buffer
lnet_peer_send_push() adds a reference to the the_lnet.ln_ping_target
lnet_ping_buffer. This reference is dropped by
lnet_discovery_event_handler. When the LNet configuration is modified
the ln_api_mutex is held and lnet_ping_target_update() is called to
update the ln_ping_target to reflect the new configuration.
While holding the ln_api_mutex, lnet_ping_target_update() will wait
until all refs on the old ping buffer are dropped. This can result
in a deadlock if the ln_api_mutex is required to complete the push.
Here is one scenario where this can happen:
1. PUSH is sent by discovery thread
2. LNet configuration is modified. lnetctl process is holding
ln_api_mutex and waiting in lnet_ping_target_update()
3. Local NI goes into recovery
4. Monitor thread wakes and attempts to send ping to local NI. If this
is the first ping sent to this NI then monitor thread needs
ln_api_mutex to create peer NI object for local NI.
(LNetGet->
lnet_send->
lnet_select_pathway->
lnet_peerni_by_nid_locked->
mutex_lock(&the_lnet.ln_api_mutex))
5. PUSH (1) fails with local timeout. It is placed on monitor thread
resend queue.
6. monitor thread cannot process resend queue until it acquires
ln_api_mutex. ln_api_mutex cannot be acquired until monitor thread
processes resend queue. Deadlock.
Fix is to drop ln_api_mutex before calling lnet_ping_md_unlink() in
lnet_ping_target_update(). This should ensure that updates to the
ping target are still synchronized via ln_api_mutex as intended, but
we're able to clear refs on the old ping buffer as needed.
Lustre-change: https://review.whamcloud.com/52479/
Lustre-commit: 3ca6ba39a21cfebc81bbe7f889c486bb82bb563a
Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=207,ONLY_REPEAT=50
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I20cda185a865192f1ad162eaef1b8b4e5d751b2c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52934
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Li Dongyang [Wed, 1 Nov 2023 11:36:10 +0000 (22:36 +1100)]
LU-17248 kernel: add SB_I_STABLE_WRITES to bdev sb flag
Since RHEL 8.6, wait_for_stable_page() is controlled by
a new flag, SB_I_STABLE_WRITES, on the super block.
However, the new flag is not set on the bdev pseudo sb,
which means that when writing directly to the block device
we do not wait on page writeback. This can trigger false
block integrity errors: a page can be modified again while
under writeback, so the integrity checksum no longer
matches the new data.
Lustre-change: https://review.whamcloud.com/52922
Lustre-commit: TBD (from 5aeffdbec699abad07ed2326723c7743faadbf8a)
Change-Id: Ie088abf29f40b294c31f993bcfad56d6081a3fce
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52969
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Mon, 30 Oct 2023 19:13:45 +0000 (12:13 -0700)]
LU-17235 o2iblnd: adding alias ib interface causes crash
Commit 09c6e2b872 (LU-16836) causes the o2iblnd startup routine to
crash when an alias ib interface is used:
ifconfig ib0:0 10.1.0.52 up
modprobe lnet
lnetctl lnet configure
lnetctl net add --net o2ib --if ib0:0
Fix the code that attempts to set the NI status on startup so it
gracefully handles the case where the corresponding net_device is
not found.
Lustre-change: https://review.whamcloud.com/52894/
Lustre-commit: TBD (from 26a00e20fad0cd7871c30fe65653415566b498dc)
Test-Parameters: trivial testlist=sanity-lnet
Fixes: 09c6e2b872 ("LU-16836 lnet: ensure dev notification")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iaef9280a10f27ac28b872d9f4bc119c4d459ef40
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52910
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Fri, 9 Jun 2023 03:04:37 +0000 (11:04 +0800)]
EX-7584 ptlrpc: define nrs_orr_object.oo_ref atomic_t
nrs_orr_object.oo_ref is a reference counter but is not an atomic type.
nrs_trr_hash_ops.hs_put() is set to nrs_orr_hop_put(), which
decrements oo_ref without any protection. Change it to atomic_t
to eliminate this potential race condition.
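In the kernel the fix uses atomic_t; the userspace sketch below shows the same idea with C11 <stdatomic.h>. struct orr_object and the get/put helpers are hypothetical stand-ins, and atomic_fetch_sub() returning 1 plays the role of the kernel's atomic_dec_and_test().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for nrs_orr_object: only the refcount is modelled. */
struct orr_object {
	atomic_int oo_ref;
};

static void orr_object_get(struct orr_object *obj)
{
	atomic_fetch_add(&obj->oo_ref, 1);
}

/*
 * Drop a reference; returns true when the last reference is gone.
 * atomic_fetch_sub() is a single atomic read-modify-write, so a
 * concurrent get/put cannot interleave with it the way it can with a
 * plain "obj->oo_ref--" (load, decrement, store).
 */
static bool orr_object_put(struct orr_object *obj)
{
	return atomic_fetch_sub(&obj->oo_ref, 1) == 1;
}
```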
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=77d,ONLY_REPEAT=100
Change-Id: I69d27eebdddab79d7dd7e279391cd841e438b5d3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52948
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Mon, 23 Aug 2021 14:40:39 +0000 (09:40 -0500)]
LU-13941 osp: Silently lower requested create_count to maximum
When create_count is set, silently accept a value larger than the
current maximum and truncate it to that maximum.
This avoids issues if that limit is changed in the future.
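The behaviour amounts to clamping instead of rejecting. A minimal sketch; osp_clamp_create_count() is hypothetical, and the maximum is passed in rather than taken from the real OSP limit.

```c
/*
 * Accept any requested create_count, silently truncating values above
 * the current maximum instead of returning an error, so callers keep
 * working if the maximum changes later.
 */
static unsigned int osp_clamp_create_count(unsigned int requested,
					   unsigned int max_count)
{
	return requested > max_count ? max_count : requested;
}
```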
Lustre-change: https://review.whamcloud.com/39967
Lustre-commit: 06e586016d3acc490f922e43e3aee6b8112a2803
HPE-bug-id: LUS-5960
Test-Parameters: trivial testlist=parallel-scale,sanity
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4727ba6fca747e1ae9850188ef63c7abb89904be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52967
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Artem Blagodarenko [Wed, 11 Oct 2023 16:23:56 +0000 (12:23 -0400)]
EX-7600 osc: save compressed object size
CSDC relies on sparse files. A client writes compressed data chunks
at their original offsets, so the same data is expected to be read
back from the same offsets.
There are no writes after the last compressed chunk, so there is no
"hole" after the last compressed chunk.
As a result, the compressed file size (based on the OST object sizes)
is smaller than the original by the "original last chunk size -
compressed last chunk size" delta.
The object size should be set to the uncompressed size. This size is
used to calculate the file size, and allows removing the workaround of
not compressing the last chunk in the file.
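A worked example of the size arithmetic, as a small C sketch. It assumes a single-stripe file, fixed-size chunks where every chunk but the last is full, and a non-zero file size; csdc_object_size() is a hypothetical helper, not the OSC code. With 64 KiB chunks and a 200 KiB file, the last chunk starts at 192 KiB and holds 8 KiB; if it compresses to 2 KiB, the object ends at 194 KiB, 6 KiB (8 KiB - 2 KiB) short of the original size.

```c
/*
 * Size of an object holding CSDC-compressed data: chunks are written at
 * their original (uncompressed) offsets, so the object ends right after
 * the compressed bytes of the last chunk. file_size must be non-zero.
 */
static unsigned long long csdc_object_size(unsigned long long file_size,
					   unsigned long long chunk_size,
					   unsigned long long last_compressed)
{
	/* Original offset of the last (possibly partial) chunk. */
	unsigned long long last_off = (file_size - 1) / chunk_size * chunk_size;

	return last_off + last_compressed;
}
```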
Fixes: caee1c5 ("EX-7818 osc: don't check for start inside the chunk")
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I387c282e1cf788c3b8f6230ef555d73ffffe49c1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51595
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Tue, 24 Oct 2023 23:55:20 +0000 (03:55 +0400)]
EX-7849 quota: extra debug messages
Add extra debug messages into qmt to find the
root cause of the panic:
qmt_id_lock_glimpse()) ASSERTION( lqe->lqe_gl )
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I05377222e1887b660f759ed11de53cd9e4023ed1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52906
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Tue, 31 Oct 2023 07:36:49 +0000 (01:36 -0600)]
LU-16868 tests: skip conf-sanity/66 in interop
Do not run conf-sanity.sh test_66* in interop testing. Otherwise,
it is possible that the version of the test script running on the
client does not perform the upgrades with the right steps needed
for remote servers that are running a different version.
Lustre-change: https://review.whamcloud.com/52899
Lustre-commit: TBD (from 774e626146ddcbeb527c0939e0210f92bab4c6c3)
Test-Parameters: trivial testlist=conf-sanity env=ONLY=66
Test-Parameters: testlist=conf-sanity env=ONLY=66 serverversion=2.12.9
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7b28b5f123a7348f87d43c54c806eaf6173ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52900
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Vitaliy Kuznetsov [Fri, 3 Nov 2023 15:39:14 +0000 (16:39 +0100)]
EX-5258 lipe: add hidden option -show_counters
This patch adds a hidden output policy option
to lipe_find3 that shows the total number of
inodes scanned and the number of inodes that
matched the filters.
Usage example:
lipe_find3 /dev/nvme1n1 -path dir-1/* -show-counters
Output:
scanned: 1460
matched: 200
Test-Parameters: trivial
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Ibcd22a94e01ea6322cd38fd414e6058314aac8ef
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52937
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Tue, 31 Oct 2023 05:49:23 +0000 (22:49 -0700)]
EX-8441 lipe: lamigo fix compatibility with older lustre
- lfs mirror extend may dump help text to either stdout or stderr.
  Lamigo now handles both cases
- Exit the loop correctly when the remote ssh session fails
- Skip hot-pools tests 75a,75b,75c if lfs mirror extend
  does not support --stats-interval
- Minor code fixes
Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iba035043bc4868e7898f3739d03607d5d3e21574
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52898
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Fri, 27 Oct 2023 22:03:50 +0000 (16:03 -0600)]
RM-620 build: New tag 2.14.0-ddn111
New tag 2.14.0-ddn111
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I95b2ba340e1e65cee4b661b8d986d365863fa633
Patrick Farrell [Fri, 27 Oct 2023 20:39:22 +0000 (16:39 -0400)]
EX-8421 llite: remove LBUG() from ll_readpage()
This LBUG() is still sometimes hit in sanity-PCC, which
means EX-8421 is not completely fixed.
Until we can fully sort out EX-8421, we don't want to have
this LBUG enabled on customer systems. The underlying bug
has been present for some time and the first attempt at an
EX-8421 fix improves the situation.
So, remove the LBUG(), with the intent of putting it back
later once EX-8421 is fixed for real.
Fixes: 3a701bf587 ("EX-8421 llite: disable kernel readahead for pcc mmap")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I21f951d38f67b37626f33068d2a4b64377f4c46a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52858
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Thu, 14 Sep 2023 09:23:41 +0000 (05:23 -0400)]
EX-8236 pcc: abort in-progress attach by PCC detach command
A user may want to abort an in-progress attach for purposes such
as freeing space on the PCC backend.
To support this operation, add an "abort" option to the PCC detach
command that aborts the in-progress attach.
Change-Id: I49fb1c42838f8d7e9728a5c4c6f3d60e959b233b
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52375
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Thu, 14 Sep 2023 08:18:23 +0000 (04:18 -0400)]
EX-8027 pcc: add --wait option for PCC detach command
This patch adds a "--wait" option to the PCC detach command.
With this option, PCC detach waits for any in-progress attach on
this file to finish.
Add sanity-pcc/test_107 to verify it.
Change-Id: I63d52d514884b15a7b534d0f03deee441a12d3f1
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52374
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Thu, 14 Sep 2023 06:43:32 +0000 (02:43 -0400)]
EX-8027 pcc: wait for attach finished for detach command
When detaching a file from a PCC backend, the file may still be in
the attaching state. In that case, add a flag to wait for the attach
to finish (PCC_DEATCH_FL_ATTACHING_WAIT), then retry the detach.
Change-Id: If85d95be744e3f7d6a07f880e78de5b68b579ed6
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52373
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Wed, 25 Oct 2023 15:26:16 +0000 (18:26 +0300)]
EX-7849 tests: add "+quota" to racer
Temporarily set the "quota" debug level in racer.sh. This
should be removed once debugging is done.
Test-Parameters: trivial
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ie03b8f51bd3298d272d78447f5e6ff6969901886
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52831
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Fri, 22 Sep 2023 16:19:34 +0000 (18:19 +0200)]
LU-17138 enc: prefer specific crypto engines
Some ciphers provided by external accelerators might register under
the generic cipher name. To avoid using them with Lustre, prefer the
AES-NI variant implemented directly in the CPU, and fall back to the
generic cipher if AES-NI is not available.
Introduce a new libcfs kernel module parameter named
'client_encryption_engine' to make the cipher selectable.
By default its value is 'aes-ni', which makes Lustre look for the
AES-NI cipher first. This parameter can be set to 'system-default',
which makes Lustre pick the generic cipher.
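A minimal sketch of the selection rule described above, assuming a simple "prefer AES-NI, else generic" decision. Only the parameter values 'aes-ni' and 'system-default' come from the commit; the driver names DEMO_CIPHER_AESNI and DEMO_CIPHER_GENERIC and the function demo_pick_cipher() are hypothetical placeholders, not the names Lustre passes to the kernel crypto API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder names, NOT real kernel crypto driver names. */
#define DEMO_CIPHER_AESNI   "demo-aes-aesni"    /* CPU (AES-NI) implementation */
#define DEMO_CIPHER_GENERIC "demo-aes-generic"  /* generic name, may resolve to an accelerator */

/*
 * Pick a cipher according to a 'client_encryption_engine'-style value.
 * "aes-ni" (or no value, the default) prefers the CPU implementation when
 * it is present; any other value, e.g. "system-default", uses the generic name.
 */
const char *demo_pick_cipher(const char *engine, bool aesni_available)
{
	bool prefer_aesni = engine == NULL || strcmp(engine, "aes-ni") == 0;

	if (prefer_aesni && aesni_available)
		return DEMO_CIPHER_AESNI;
	/* AES-NI not wanted or not present: fall back to the generic cipher */
	return DEMO_CIPHER_GENERIC;
}
```

Selecting by a specific implementation name, rather than the generic algorithm name, is what keeps an external accelerator that registered under the generic name from being chosen implicitly.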
Lustre-change: https://review.whamcloud.com/52477
Lustre-commit: 056eb9dcc0d5f80451c400342d54037f6de24bd9
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8b00f1c3c8dcf11c58e9f40a410b57b2f255e642
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52828
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Wed, 8 Mar 2023 23:40:21 +0000 (16:40 -0700)]
LU-16623 lod: handle object allocation consistently
Consistently handle the various OS_STATFS_* flags that indicate
an OST or MDT is full or otherwise marked ineligible for use.
Fix lod_statfs_check() so it skips MDTs with OS_STATFS_ENOINO
for allocating dir stripes instead of only checking OST targets.
In the LOD code, ltd_active=0 indicates that the device is not
usable for new object allocations for a variety of reasons. That
includes out of space or inodes, read-only, max_create_count=0,
or disconnected export, not *only* that the OSP is disconnected
from the OST as with imp_deactive. Targets marked ltd_active=0
will not be counted in ld_active_tgt_count, so these OSTs will
not count toward stripe_count for stripe_count=-1 files.
Set flags = LOD_USES_DEFAULT_STRIPE in lod_qos_prep_create() for
stripe_count = -1 layouts and pass it to lod_stripe_count_min()
to avoid use of *all* OSTs when free space is imbalanced or OSTs
are not available, and be happy with allocations on 3/4 of OSTs.
It looks like this functionality was missed when object allocations
transitioned from the LOV to LOD module. Put the LOV_USES_* into
an enum and rename to LOD_USES_* for consistency with current code.
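The commit names lod_stripe_count_min() and LOD_USES_DEFAULT_STRIPE but does not show the function body. The following is a hypothetical sketch of the "accept allocations on 3/4 of the OSTs" rule for stripe_count=-1 layouts; the enum values and the demo_ names are assumptions, not the real LOD definitions.

```c
#include <stdint.h>

/* Hypothetical flag values; the real enum in the LOD code may differ. */
enum demo_lod_uses_hint {
	DEMO_LOD_USES_ASSIGNED_STRIPE = 0x0,
	DEMO_LOD_USES_DEFAULT_STRIPE  = 0x1,
};

/*
 * Minimum stripe count the allocator accepts before giving up. For
 * stripe_count=-1 (default) layouts, about 3/4 of the requested count is
 * good enough; an explicitly assigned count must be met in full.
 */
uint32_t demo_stripe_count_min(uint32_t stripe_count,
			       enum demo_lod_uses_hint flags)
{
	if (flags & DEMO_LOD_USES_DEFAULT_STRIPE)
		return stripe_count - stripe_count / 4;
	return stripe_count;
}
```

With 16 active OSTs, a default-striped file would be satisfied with 12 stripes, while a layout that explicitly asked for 16 still requires all 16.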
Apply the lod.*.max_stripe_count limits to PFL components as well
as plain file layouts in lod_comp_entry_stripe_count().
Rename ltd_connecting to ltd_discon, since there is no guarantee
that this target is actually *connecting*, only that it is currently
disconnected. Use ltd_discon in places that checked ltd_active to
decide if the OSP was disconnected from the OST, which shouldn't be
skipped just because the OST is full or has creates disabled.
Lustre-change: https://review.whamcloud.com/50250
Lustre-commit: ced540165ef573570b8a8cba6e43f79e5fc6539f
LU-16981 lod: update llc_stripe_count after ost inactive
If an OST gets deactivated while lod_ost_alloc_qos() is trying to
allocate stripes for a file create, this is normally caught and
EAGAIN is returned, which causes lod_comp->llc_stripe_count to
be updated to accurately reflect the stripe count. But there is a
race condition: if the OST is deactivated after the call to
ltd_qos_is_usable() but before the stripes are allocated, the
stripe count is never updated.
This causes an LBUG later in lod_striped_create() because fewer
stripes are allocated than the number in llc_stripe_count, so it
finds a stripe that is NULL.
The solution is to properly update lod_comp->llc_stripe_count when
the number of stripes created is less than expected.
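A hypothetical sketch of the fix: after the allocation loop, shrink the component's stripe count to the number of stripes actually created, so later code never walks into a NULL slot. struct demo_comp stands in for the real lod_comp (its stripe_count field plays the role of llc_stripe_count), and it assumes created stripes occupy slots 0..created-1.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_STRIPES 16   /* hypothetical fixed capacity for the sketch */

struct demo_comp {
	uint32_t stripe_count;              /* stands in for llc_stripe_count */
	void    *stripes[DEMO_MAX_STRIPES]; /* allocated objects, NULL if missing */
};

/* Shrink stripe_count if fewer stripes were created than expected. */
void demo_fixup_stripe_count(struct demo_comp *comp, uint32_t created)
{
	if (created < comp->stripe_count)
		comp->stripe_count = created;
}

/* Count non-NULL stripes within the first stripe_count slots. */
uint32_t demo_count_valid(const struct demo_comp *comp)
{
	uint32_t n = 0;

	for (uint32_t i = 0; i < comp->stripe_count; i++)
		if (comp->stripes[i] != NULL)
			n++;
	return n;
}
```

After the fixup, every slot below stripe_count is non-NULL, which is the invariant a later create step relies on.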
Lustre-change: https://review.whamcloud.com/51759
Lustre-commit: 78336aa166f4a7a0128a5891c747eecf26ff9565
Test-Parameters: testlist=sanity env=ONLY=27V,ONLY_REPEAT=100
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Fixes: 7b124fef76 ("LU-4277 lod: handle os_state as a flag, check READONLY")
Fixes: 5b147e47de ("LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms")
Fixes: c7f2e70a27 ("LU-1303 lod: QoS allocation policy")
Fixes: c1d0a355a6 ("LU-12624 lod: alloc dir stripes by QoS")
Fixes: 3c9580931d ("LU-9162 lod: option to set max stripe count per filesystem")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ifb9443fe6c80b4d7f82b442060db7ac8423ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52729
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Qian Yingjin [Wed, 23 Nov 2022 07:44:47 +0000 (02:44 -0500)]
LU-16334 llite: update statx size/ctime for fallocate
Lustre's ->fallocate() VFS method should update i_size and i_ctime,
as returned by statx(), when the file size grows.
Add sanity/150h.
The fallocate() call does not update the attributes on the MDT.
The test uses STATX in cached-always mode to verify this, since that
mode does not send glimpse lock RPCs to the OSTs to obtain the file
size, and instead uses the cached attributes (size) on the client
side as much as possible.
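A minimal POSIX sketch of the size behavior that ->fallocate() should produce: extending a file with (default-mode) fallocation grows i_size. It uses posix_fallocate() and fstat() on a local temporary file, not statx() in Lustre's cached-always mode, so it does not reproduce sanity/150h itself; demo_size_after_fallocate() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file, extend it with posix_fallocate(), and return
 * the size reported by fstat() afterwards (or -1 on error).
 */
long long demo_size_after_fallocate(off_t len)
{
	char path[] = "/tmp/fallocate-demo-XXXXXX";
	int fd = mkstemp(path);
	long long size = -1;
	struct stat st;

	if (fd < 0)
		return -1;
	unlink(path);   /* the file is removed once fd is closed */

	/* posix_fallocate() returns an error number instead of setting errno */
	if (posix_fallocate(fd, 0, len) == 0 && fstat(fd, &st) == 0)
		size = (long long)st.st_size;
	close(fd);
	return size;
}
```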
Lustre-change: https://review.whamcloud.com/49221
Lustre-commit: 51851705e936b2dbc9cf141ecf7ab4e3be04333a
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ib8128892222a01cd00250c704328bd13cfb12e2d
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52736
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Thu, 19 Oct 2023 18:01:59 +0000 (11:01 -0700)]
EX-8353 csdc: remove holes from struct ll_compr_hdr
This patch reorganizes struct ll_compr_hdr to remove
alignment holes.
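To illustrate what "alignment holes" means, here is a sketch with hypothetical fields (these are NOT the real ll_compr_hdr members). Interleaving small fields with a 64-bit field forces padding; ordering fields from largest to smallest removes most of it.

```c
#include <stdint.h>

/* Hypothetical layout with holes: padding follows each small field. */
struct demo_hdr_holey {
	uint8_t  type;     /* 1 byte, then padding up to offset's alignment */
	uint64_t offset;
	uint8_t  flags;    /* 1 byte, then padding up to length's alignment */
	uint32_t length;
};

/* Same fields ordered largest to smallest: the holes disappear. */
struct demo_hdr_reordered {
	uint64_t offset;
	uint32_t length;
	uint8_t  type;
	uint8_t  flags;
};
```

On a typical LP64 ABI the first struct is 24 bytes and the second 16; tools such as pahole can report holes like these in real structures.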
Test-Parameters: trivial
Change-Id: I59800b00e3a17972d621bae21ba06509a39b1036
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Wed, 18 Oct 2023 15:34:02 +0000 (17:34 +0200)]
EX-7331 sec: disable compression for encrypted files
If a read-modify-write I/O pattern is carried out on a compressed
file, it has to be handled on the server side.
But because encryption cannot be done on the server side for security
reasons, we are not able to handle that kind of I/O pattern if the
file is both encrypted and compressed.
So just disable compression for all encrypted files.
Fixes: eb70ba19e9 ("EX-7331 sec: add support for encryption plus compression")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I84881fb1235f015d022751d4cce2d43a7231c2b4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52746
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Thu, 15 Jun 2023 19:32:05 +0000 (13:32 -0600)]
LU-15404 ldiskfs: fix truncate during setxattr for el7.9
Backport the ext4-delayed-iput.patch to rhel7.9 kernels so the
delayed osd-ldiskfs truncate can use s_misc_wq consistently.
This moves the call to the final iput into a separate thread.
This way, setxattr transactions will never be split into two.
Since the setxattr code adds xattr inodes with nlink=0 into the
orphan list, old xattr inodes will be properly cleaned up in
any case.
Lustre-change: https://review.whamcloud.com/51335
Lustre-commit: 471ce3d95651ca06209a76973cae3bbdb5b6aa2f
Test-Parameters: trivial serverdistro=el7.9
Fixes: e239a14001 ("LU-15404 ldiskfs: truncate during setxattr leads to kernel panic")
Change-Id: Idd70befa6a83818ece06daccf9bb6256813ebbe5
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52809
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Li Dongyang [Mon, 23 Oct 2023 11:49:55 +0000 (22:49 +1100)]
LU-11912 tests: fix racing in force_new_seq_all
We run force_new_seq in parallel to reduce time spent
on consuming precreated objects.
However, this can be racy when multiple MDTs are on
the same MDS: a task could finish early for one MDT
and reset the fail_loc to 0 on the MDS while other tasks
are still working on other MDTs.
Replace OBD_FAIL_OSP_FORCE_NEW_SEQ with a new param
prealloc_force_new_seq for osp, so we can control
the seq rollover individually for each osp device.
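A hypothetical model of why a per-device parameter avoids the race: each parallel task sets and clears only its own OSP's switch, so a task that finishes early cannot turn off seq rollover for devices other tasks are still working on. Only the name prealloc_force_new_seq comes from the commit; the demo_ structures and functions are assumptions.

```c
#include <stdbool.h>

#define DEMO_NUM_OSP 4   /* hypothetical number of OSP devices */

struct demo_osp {
	bool prealloc_force_new_seq;   /* mirrors the new per-OSP parameter */
};

static struct demo_osp demo_osps[DEMO_NUM_OSP];

/* A task enables forced seq rollover on its own device only. */
void demo_force_new_seq_start(int idx)
{
	demo_osps[idx].prealloc_force_new_seq = true;
}

/* A task that finishes early clears only its own device's switch. */
void demo_force_new_seq_done(int idx)
{
	demo_osps[idx].prealloc_force_new_seq = false;
}

bool demo_force_new_seq_active(int idx)
{
	return demo_osps[idx].prealloc_force_new_seq;
}
```

With a single MDS-wide fail_loc, the equivalent of demo_force_new_seq_done(0) would also have disabled rollover for device 1.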
Lustre-change: https://review.whamcloud.com/52801
Lustre-commit: TBD (from af6dcd597d7f5134de553349c05091e51e0f3dd6)
Change-Id: I52dbd550564ca628a8a85c42951694d58b2b93a9
Fixes: 656fc937cf ("LU-11912 tests: consume precreated objects in parallel")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52802
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Zhuravlev [Mon, 16 Oct 2023 12:52:58 +0000 (15:52 +0300)]
LU-16966 osd: take trunc_lock for fallocate
As fallocate may need several transactions (or a restarted
transaction), we have to avoid any concurrent writes/truncates on
this object until fallocate supports 'restart-from-beginning': first
stop the transaction, then release the lock, then repeat (as the
write path does).
Lustre-change: https://review.whamcloud.com/52264
Lustre-commit: 51529fb57f85210e292a15c882cf25a4689ea77d
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0bf38b1886fbf24656b45fe0f87fcbad2227672a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexander Boyko [Thu, 17 Aug 2023 10:03:07 +0000 (06:03 -0400)]
LU-17040 scrub: inconsistent item
When the OI does not include the fid, scrub will attempt to
fix it with a zero inode number. There is a low chance
that the fid would be found during a full inode scan,
but the inode scan requires an empty inconsistent list.
With repeated EINPROGRESS replies, the inconsistent list
is never empty.
Move fids with zero inode numbers to the stale list.
1. scrub fix to print the real OI resurrect and
skip unrelated ones
2. out_handle debug for a failed dt_locate() of a fid
3. debug for out requests when they are interrupted
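A hypothetical array-based model of "move fids with zero inode numbers to the stale list": the inconsistent list is compacted in place so it can drain to empty, which is what the full inode scan waits for. The real scrub uses kernel linked lists and struct lu_fid; the demo_ types and function are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_item {
	uint64_t fid;   /* stands in for a struct lu_fid */
	uint32_t ino;   /* 0 means no inode was found for this fid */
};

/*
 * Move every zero-inode item from the inconsistent list to the stale
 * list, compacting the inconsistent list in place. Returns the new
 * inconsistent count; *nstale grows by the number of items moved.
 * The stale array is assumed to have room for all moved items.
 */
size_t demo_move_zero_ino(struct demo_item *incons, size_t nincons,
			  struct demo_item *stale, size_t *nstale)
{
	size_t keep = 0;

	for (size_t i = 0; i < nincons; i++) {
		if (incons[i].ino == 0)
			stale[(*nstale)++] = incons[i];
		else
			incons[keep++] = incons[i];
	}
	return keep;
}
```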
Lustre-change: https://review.whamcloud.com/51997
Lustre-commit: 461e3867ea11240c77ccd1bb71a3758506cf882e
HPE-bug-id: LUS-10780
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iad9e9cba90b4648eb0fe8fa6c99984ada60fde70
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52839
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lai Siyao [Mon, 25 Sep 2023 14:28:51 +0000 (10:28 -0400)]
LU-17144 mdt: set dmv by setxattr
Client side: convert setxattr("trusted.dmv") to "setdirstripe -D", which
will help restore the directory default LMV from backup.
Server side: add a tunable to enable setxattr("trusted.dmv"); it can
be turned on by "lctl set_param -n mdt.*.enable_dmv_xattr=1" and is
off by default. Since an empty buffer can be set by setxattr, add a
check in the server code to avoid a crash.
Add sanity 413j.
Lustre-change: https://review.whamcloud.com/52510
Lustre-commit: 1ebe91ec0ab55f686a730d448e7a1ba2ce99639a
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I27d784998a9c4a182b4fffb8b06c84e9d9190919
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52511
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Thu, 19 Oct 2023 14:45:51 +0000 (17:45 +0300)]
LU-17136 ldiskfs: increase max extent tree depth
Increase the max extent tree depth to 8.
This is a workaround until LU-16843 is ready.
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie1b6bd64ff6d5179b47b6a537c6b9f85670c3f69
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52758
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Thu, 26 Oct 2023 01:00:40 +0000 (18:00 -0700)]
LU-17193 build: fix gcc-12 compiler warnings
A few instances of QCTL_COPY() should be QCTL_COPY_NO_PNAME()
as the zero-length array to hold the pool name is not
allocated in these cases.
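QCTL_COPY() and QCTL_COPY_NO_PNAME() are the real macro names from the commit, but their definitions are not shown here. The following hypothetical sketch shows the underlying issue: a struct ending in a flexible (zero-length) pool-name array only has storage for the name when it was allocated with extra space, so copying the name into or out of a plain struct overruns it, which newer compilers can flag.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical quota-control-like struct; not the real if_quotactl. */
struct demo_qctl {
	int  qc_cmd;
	int  qc_id;
	char qc_poolname[];   /* flexible array member: storage is optional */
};

/* Copy only the fixed part: safe even if no pool name was allocated. */
void demo_copy_no_pname(struct demo_qctl *dst, const struct demo_qctl *src)
{
	memcpy(dst, src, sizeof(*src));   /* sizeof excludes qc_poolname */
}

/* Copy the fixed part plus the pool name; both buffers must hold it. */
void demo_copy_with_pname(struct demo_qctl *dst, const struct demo_qctl *src)
{
	demo_copy_no_pname(dst, src);
	strcpy(dst->qc_poolname, src->qc_poolname);
}
```

Calling demo_copy_with_pname() on a stack-allocated struct demo_qctl would write past its end; that case needs the no-pname variant.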
Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/52687
Lustre-commit: 1b0de05f81372eeda9a2a38142553ead7e88a431
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I72bda8b46c51dbd42fb42bf569ba29572526acfe
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Timothy Day [Tue, 10 Oct 2023 00:07:24 +0000 (00:07 +0000)]
LU-17151 tests: increase sanity/411b memory limit
This test fails most of the time when run using
arm clients. It seems like the cgroup memory limit
was increased in a past revision for a similar issue.
Increase it a bit more for aarch64, and by a
smaller amount for x86.
Also, add better logging for some other issues.
There's likely a better fix for this, but hopefully
this will let the test pass and provide some value
without having to do a full revert.
Lustre-change: https://review.whamcloud.com/52610
Lustre-commit: 0e878390e1c8c5883bccd01758392eaa16a67f31
Fixes: 8aa231a99 ("LU-16713 llite: writeback/commit pages under memory pressure")
Test-Parameters: trivial
Test-Parameters: testgroup=review-ldiskfs-arm testlist=sanity env=ONLY=411b,ONLY_REPEAT=50
Test-Parameters: clientdistro=el8.7 testlist=sanity env=ONLY=411b,ONLY_REPEAT=50
Test-Parameters: clientdistro=el9.1 testlist=sanity env=ONLY=411b,ONLY_REPEAT=50
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If850077c0d7f6466082433776d370d24eee9736c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52838
Qian Yingjin [Tue, 6 Jun 2023 08:11:30 +0000 (15:11 +0700)]
LU-16713 llite: writeback/commit pages under memory pressure
Lustre buffered I/O does not work well with restrictive memcg
control. This may result in OOM when the system is under memory
pressure.
Lustre has implemented unstable pages support similar to NFS,
but it is disabled by default for performance reasons.
In Lustre, a client pins the cache pages for writes until the
write transaction is committed on the server (OST), even after
these pinned pages have finished writeback. The server starts
a transaction commit either because the commit interval (5
seconds by default) for the backend storage (i.e. OST/ldiskfs)
has been reached or because there is not enough room in the
journal for a particular handle to start. Until the write
transaction has been committed and the client notified, these
pages are pinned and cannot be flushed in any way by the kernel.
This means that when a client hits memory pressure there can
be a large number of unfreeable (pinned and uncommitted) pages,
so the application on the client will end up OOM-killed because,
when asked to free up memory, it cannot.
This is particularly common with cgroups, because when cgroups
are in use, the memory limit is generally much lower than the
total system memory and is more likely to be reached.
The Linux kernel has a mature memory reclaim mechanism to avoid
OOM, even with cgroups.
After a write dirties a page, the kernel calls
@balance_dirty_pages(). If the dirtied and uncommitted pages
are over the background threshold for the global memory limits
or the memory cgroup limits, the writeback threads are woken to
perform some writeout.
When allocating a new page for I/O under memory pressure, the
kernel will try direct reclaim and then allocate. For cgroups,
it will try to reclaim pages from the memory cgroup over its soft
limit. The slow page allocation path with direct reclaim will
call @wakeup_flusher_threads() with WB_REASON_VMSCAN to start
writing back dirty pages.
Our solution uses the kernel's page reclaim mechanism
directly.
On completion of page writeback (in @brw_interpret), call
@__mark_inode_dirty() to add the dirty inode that has pinned
uncommitted pages into the @bdi_writeback; each memory
cgroup has its own @bdi_writeback to control the writeback of
buffered writes within it.
Thus, under memory pressure, the writeback threads will be woken
up and will call @ll_writepages() to write out data.
For background writeout (over the background dirty threshold) or
writeback with WB_REASON_VMSCAN for direct reclaim, we first
flush dirtied pages to the OSTs and then sync them, forcing the
OSTs to commit these pages so they are released quickly.
When a cgroup is under memory pressure, the kernel asks for
writeback, which then does an fsync to the OSTs. This commits
the uncommitted/unstable pages, and then the kernel can finally
free them.
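A deliberately simplified, hypothetical page-accounting model of the flush-then-commit step described above. It only tracks counters for one cgroup: dirty pages become unstable (pinned) once sent to the OST, and become freeable only after the OST commits. It is not Lustre or kernel code, and it ignores kernel-side writeback of dirty pages when the flush path is off.

```c
#include <stdbool.h>

/* Hypothetical counters for one memory cgroup, in pages. */
struct demo_memcg {
	long dirty;      /* written by the application, not yet sent to the OST */
	long unstable;   /* sent to the OST, pinned until the OST commits */
	long clean;      /* committed and freeable */
};

/* Writeback: send dirty pages; they become unstable (still pinned). */
static void demo_flush(struct demo_memcg *cg)
{
	cg->unstable += cg->dirty;
	cg->dirty = 0;
}

/* Forced commit (the sync step): pinned pages become freeable. */
static void demo_commit(struct demo_memcg *cg)
{
	cg->clean += cg->unstable;
	cg->unstable = 0;
}

/*
 * Reclaim under pressure. Without flush_and_commit only already-clean
 * pages can be freed; with it, pages are flushed and committed first.
 * Returns the number of pages freed.
 */
long demo_reclaim(struct demo_memcg *cg, bool flush_and_commit)
{
	long freed;

	if (flush_and_commit) {
		demo_flush(cg);
		demo_commit(cg);
	}
	freed = cg->clean;
	cg->clean = 0;
	return freed;
}
```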
Below are some performance results.
The client has 512G of memory in total.
1. dd if=/dev/zero of=$test bs=1M count=$size
I/O size            128G   256G   512G   1024G
unpatched (GB/s)    2.2    2.2    2.1    2.0
patched (GB/s)      2.2    2.2    2.1    2.0
There is no performance regression after enabling unstable page
accounting with the patch.
2. One process under different memcg limits, with the total I/O
size varying from 2X memlimit down to 0.5X memlimit:
dd if=/dev/zero of=$file bs=1M count=$((memlimit_mb * time))
memcg limits           1G    4G    16G   64G
2X memlimit (GB/s)     1.7   1.6   1.8   1.7
1X memlimit (GB/s)     1.9   1.9   2.2   2.2
0.5X memlimit (GB/s)   2.3   2.3   2.2   2.3
Without this patch, dd with an I/O size larger than the memcg limit
will be OOM-killed.
3. Multiple cgroups testing:
8 cgroups in total, each with a memory limit of 8G.
Run dd write in each cgroup with an I/O size of 2X the memory
limit (16G).
17179869184 bytes (17 GB, 16 GiB) copied, 12.7842 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.7889 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.9504 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.9577 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.4066 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.5397 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.5769 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.6605 s, 1.3 GB/s
4. Two dd writers: one (A) is under memcg control and the other
(B) is not. The total write data is 128G. The memcg limit varies
from 1G to 128G.
cmd: ./t2p.sh $memlimit_mb
memlimit   dd writer (A)   dd writer (B)
1G         1.3GB/s         2.2GB/s
4G         1.3GB/s         2.2GB/s
16G        1.4GB/s         2.2GB/s
32G        1.5GB/s         2.2GB/s
64G        1.8GB/s         2.2GB/s
128G       2.1GB/s         2.1GB/s
The results demonstrate that the process with memcg limits
has nearly no impact on the performance of the process without
limits.
Lustre-change: https://review.whamcloud.com/50544
Lustre-commit: 8aa231a994683a9224d42c0e7ae48aaebe2f583c
Test-Parameters: clientdistro=el8.7 testlist=sanity env=ONLY=411b,ONLY_REPEAT=10
Test-Parameters: clientdistro=el9.1 testlist=sanity env=ONLY=411b,ONLY_REPEAT=10
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I7b548dcc214995c9f00d57817028ec64fd917eab
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52527
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Raphael Druon [Thu, 19 Oct 2023 15:05:25 +0000 (09:05 -0600)]
EX-8362 scripts: Improve estimated ratio
ll_compression_scan does not take into account the size of the
sampled files, which might lead to an incorrect estimated ratio for
non-homogeneous files.
This patch takes the compression ratio estimated from the sampled
data and applies it to the entire file size, assuming the file has
the same compression ratio throughout.
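A worked sketch of the size-weighted estimate, written in C for consistency with the rest of this log (ll_compression_scan itself is a script, and this is not its code). Each file's sampled ratio (uncompressed/compressed) is applied to the whole file size, so large files dominate the overall estimate. demo_estimated_ratio() is a hypothetical name, and it assumes every ratio is greater than zero.

```c
#include <stddef.h>

/*
 * Overall ratio = total uncompressed size / total estimated compressed
 * size, where each file's compressed size is file_size / sample_ratio.
 * Returns 1.0 (no compression) when there is no data.
 */
double demo_estimated_ratio(const double *file_size,
			    const double *sample_ratio, size_t n)
{
	double total = 0.0, compressed = 0.0;

	for (size_t i = 0; i < n; i++) {
		total += file_size[i];
		compressed += file_size[i] / sample_ratio[i];
	}
	return compressed > 0.0 ? total / compressed : 1.0;
}
```

For two files of 100 and 900 units with sampled ratios 2.0 and 1.0, the estimate is 1000 / (50 + 900), about 1.05, whereas a plain average of the two ratios would report 1.5.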
Test-Parameters: trivial
Signed-off-by: Raphael Druon <rdruon@ddn.com>
Change-Id: Ic4a26460e17c666b9edf4c0d8d450a06fad5920f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52759
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Bobi Jam [Wed, 25 Oct 2023 06:52:20 +0000 (14:52 +0800)]
LU-16837 lov: NULL dereference in lov_delete_composite
Commit 14ed4a6f8f reintroduced the issue fixed by commit
5da049d9ef ("LU-14389 lov: avoid NULL dereference in cleanup"). This
patch makes the fix also cover the new case added by
14ed4a6f8f.
Lustre-change: https://review.whamcloud.com/52826
Lustre-commit: TBD (from 10b4a14b389cb00e1033e2f49e3d1f5a554b259a)
Fixes: 14ed4a6f8f ("LU-16837 llite: handle unknown layout component")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I4a2b72e21139b60519ed523b4851723c91f523c1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52827
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vitaliy Kuznetsov [Mon, 23 Oct 2023 10:21:55 +0000 (12:21 +0200)]
LU-16827 obdfilter: Fix obdfilter-survey/1a
local_node() in test-framework is used
to determine whether a node is remote or local.
local_node() returns "true" if the node is
local; otherwise, for a remote node, it returns "false".
This patch fixes the obdfilter-survey/1a test case,
which was using inverted logic when calling
local_node() to determine remote/local nodes.
This patch modifies local_node() to return
"true"/"false" instead of 0/1.
This patch also replaces lctl with $LCTL.
Lustre-change: https://review.whamcloud.com/51035
Lustre-commit: 91a3b286ba57bb491b5c17600d7cec9e516a428f
Test-Parameters: testlist=obdfilter-survey,sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I7bcb483975ec46d9847e0050e5a1f22f68663c80
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52800
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Hongchao Zhang [Thu, 12 Oct 2023 10:32:29 +0000 (18:32 +0800)]
LU-15461 test: add pool quota check
test_79 in sanity-quota requires quota pool support.
The removal of the "stop file" is also improved so that it does not
trigger a test error if the file has already been deleted.
Lustre-change: https://review.whamcloud.com/52737
Lustre-commit: TBD (from a4b3cd91ae157a63644350769ebb248f21dd6eac)
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I4acd36e61faf4259c2821293ffb7913d4cca76bd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52659
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Thu, 26 Oct 2023 18:41:57 +0000 (11:41 -0700)]
LU-17220 kernel: update RHEL 7.9 [3.10.0-1160.102.1.el7]
Update RHEL 7.9 kernel to 3.10.0-1160.102.1.el7.
Lustre-change: https://review.whamcloud.com/52819
Lustre-commit: TBD (from 1feea616fd7addf842afdc836e7f32686ea159ae)
Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9
Change-Id: Ifc56766dedf055dc3762e200835beb220fd63afb
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52843
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Thu, 26 Oct 2023 18:35:22 +0000 (11:35 -0700)]
LU-17221 kernel: update SLES15 SP4 [5.14.21-150400.24.92.1]
Update SLES15 SP4 kernel to 5.14.21-150400.24.92.1 for Lustre client.
Lustre-change: https://review.whamcloud.com/52820
Lustre-commit: TBD (from 92cf005d01e327e53bd312b411211ed2f1d827b9)
Test-Parameters: trivial clientdistro=sles15sp4 testlist=sanity
Change-Id: Id82d0ce48179df1f12dc367cced8cf84e1b918d9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52825
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Wed, 25 Oct 2023 06:48:24 +0000 (23:48 -0700)]
LU-17222 kernel: update SLES15 SP5 [5.14.21-150500.55.31.1]
Update SLES15 SP5 kernel to 5.14.21-150500.55.31.1 for Lustre client.
Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/52821
Lustre-commit: TBD (from b2159275aaf3595776ae89b3efeda4ec8bde14ff)
Test-Parameters: trivial clientdistro=sles15sp5 testlist=sanity
Change-Id: I5719e8c79740a58223b2e0bea6f6b269f281968a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52824
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Thu, 26 Oct 2023 01:14:57 +0000 (19:14 -0600)]
LU-16868 tests: skip conf-sanity/32 in interop
Do not run conf-sanity.sh test_32* in interop testing. Otherwise,
it is possible that the version of the test script running on the
client does not perform the upgrades with the right steps needed
for remote servers that are running a different version.
Lustre-change: https://review.whamcloud.com/52835
Lustre-commit: TBD (from 6368e97e593707d2ae1423dcb41c7f001f1d2152)
Test-Parameters: trivial testlist=conf-sanity env=ONLY=32a
Test-Parameters: testlist=conf-sanity env=ONLY=32a serverversion=EXA5
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iabe1469a87d58c49e3c38b76ab18f8997f3ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52836
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Alex Zhuravlev [Wed, 15 Apr 2020 14:54:07 +0000 (17:54 +0300)]
LU-13453 osd-ldiskfs: do not leak inode if OI insertion fails
osd_create() should destroy the just-created inode if OI insertion
fails.
Also fix lustre_index_restore() to drop nlink for the object to
be removed.
The patch adds two tests:
- ENOSPC on OI insertion
- ENOSPC on ".." insertion, i.e. directory block allocation
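A hypothetical sketch of the error-path pattern the first fix describes: if the index insertion fails after the inode was allocated, undo the allocation instead of leaking it. The demo_ helpers model inode allocation and the OI with counters and an injected ENOSPC; they are not the osd-ldiskfs functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for inode allocation and the OI index. */
struct demo_fs {
	int  inodes_in_use;
	bool oi_full;   /* inject ENOSPC on OI insertion */
};

static int demo_alloc_inode(struct demo_fs *fs)
{
	fs->inodes_in_use++;
	return 0;
}

static void demo_destroy_inode(struct demo_fs *fs)
{
	fs->inodes_in_use--;
}

static int demo_oi_insert(struct demo_fs *fs)
{
	return fs->oi_full ? -ENOSPC : 0;
}

/* Create an object; on OI insertion failure, destroy the new inode. */
int demo_create(struct demo_fs *fs)
{
	int rc = demo_alloc_inode(fs);

	if (rc)
		return rc;
	rc = demo_oi_insert(fs);
	if (rc)
		demo_destroy_inode(fs);   /* undo the allocation, no leak */
	return rc;
}
```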
Lustre-change: https://review.whamcloud.com/38235
Lustre-commit: e45e8a92a2ecab742b3680716a55aaa1d9827057
Test-Parameters: testlist=sanity-scrub mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-scrub mdscount=2 mdtcount=4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2a5db657c7dab54b8dc2c50bc29365d5ee754a2e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52846
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Sat, 21 Oct 2023 17:47:03 +0000 (11:47 -0600)]
RM-620 build: New tag 2.14.0-ddn110
New tag 2.14.0-ddn110
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iab6709369cc3d4e50fd799fcd6db3796202905e7
Lai Siyao [Wed, 4 Aug 2021 04:37:29 +0000 (00:37 -0400)]
LU-14659 test: improve generate_uneven_mdts() in sanity.sh
Improve generate_uneven_mdts() in several places:
1. set qos maxage to 1, so the result is up to date, and avoid filling
up the MDT.
2. fill the MDT with files of size 64K instead of 1M, so MDT imbalance
is quicker to achieve.
3. when checking the minimum imbalance after the test, look up the max
value from the result rather than by the index stored before directory
creation, because the result is dynamic if several MDTs have almost
the same free space and inodes.
Lustre-change: https://review.whamcloud.com/44649
Lustre-commit: d45be79a069f527657c1ce91630183031ea42b27
Test-Parameter: trivial mdscount=2 mdtcount=4 testlist=sanity
Fixes: 233344d451e ("LU-13417 test: generate uneven MDTs early for sanity 413")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2807101ff632404e25fdb640840d83d1991c88d9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52751
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>