Whamcloud - gitweb
fs/lustre-release.git
18 months ago EX-7601 osc: replace assert with error
Patrick Farrell [Mon, 13 Nov 2023 04:20:01 +0000 (23:20 -0500)]
EX-7601 osc: replace assert with error

We shouldn't assert on values read from storage; instead, if
they are incorrect, we should return EIO.
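A minimal sketch of this pattern in C, assuming a hypothetical on-disk chunk header; the struct, the CHUNK_MAGIC value and chunk_hdr_check() are illustrative names, not the actual Lustre code:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical on-disk chunk header; not the real Lustre layout. */
struct chunk_hdr {
	uint32_t ch_magic;
	uint32_t ch_compr_size;	/* compressed payload size in bytes */
};

#define CHUNK_MAGIC 0x4c5a4331U	/* hypothetical magic value */

/*
 * Validate a header read from storage.  Corrupt input is an I/O
 * error for the caller to handle, not a programming bug, so return
 * -EIO instead of asserting (which would crash the node).
 */
static int chunk_hdr_check(const struct chunk_hdr *hdr, uint32_t chunk_size)
{
	if (hdr->ch_magic != CHUNK_MAGIC)
		return -EIO;
	if (hdr->ch_compr_size == 0 || hdr->ch_compr_size > chunk_size)
		return -EIO;
	return 0;
}
```

The caller can then fail just the affected read, while an assertion would take down the whole client for a single bad block.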

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Icda213e3c5a90a848c9b008788e92ee49e2efcb1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53108
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 osc: variable cleanup in decompress_req
Patrick Farrell [Mon, 13 Nov 2023 04:13:10 +0000 (23:13 -0500)]
EX-7601 osc: variable cleanup in decompress_req

Use type and lvl variables in decompress_request.

Remove an unused variable and an assert which can never
fire.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ieff57411a2a41215fd368d731614801bd0f43e38
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53107
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 obd: move module load to function
Patrick Farrell [Sat, 11 Nov 2023 20:21:01 +0000 (15:21 -0500)]
EX-7601 obd: move module load to function

This is a trivial code change to make alloc_compr a bit
shorter.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0a790afe7afebde1d223420d9a578529da6ff7e5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53102
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 ofd: make compress_chunk take chunk_bits
Patrick Farrell [Fri, 10 Nov 2023 22:21:52 +0000 (17:21 -0500)]
EX-7601 ofd: make compress_chunk take chunk_bits

chunk_bits is used everywhere, so have compress_chunk convert
to log bits rather than having the callers do it.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic01bb749425cb95d9c5717965d692a18138ceeb7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53100
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 osc: cleanup compression variables
Patrick Farrell [Fri, 10 Nov 2023 22:19:00 +0000 (17:19 -0500)]
EX-7601 osc: cleanup compression variables

Make usage of the compression variables more readable.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I6daff56b56877c8f36e02303cc0579ba7faa731b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53099
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 osc: rename 'done'
Patrick Farrell [Fri, 10 Nov 2023 22:10:34 +0000 (17:10 -0500)]
EX-7601 osc: rename 'done'

Rename the ambiguous 'done' and remove it where not used.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8fb88b7a91fcc7dbd5ce2d29a61c18330fc0cda3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53098
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 osc: rename pages_in_chunk
Patrick Farrell [Fri, 10 Nov 2023 22:07:39 +0000 (17:07 -0500)]
EX-7601 osc: rename pages_in_chunk

Use the more standard pages_per_chunk.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I47e0995fe8aa8d1a9a610669d6cd4c39559b6fa4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53097
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7600 osc: use pages_left in unmerge_chunk
Patrick Farrell [Sun, 12 Nov 2023 19:52:28 +0000 (14:52 -0500)]
EX-7600 osc: use pages_left in unmerge_chunk

Since we have compressed chunks < chunk_size (if they're
after EOF), we must use pages_left in unmerge_chunk or it
will go off the end of the page array.

This also lets us remove the workaround where unmerge_chunk
would skip pages that were not present.  unmerge_chunk
always works with a known and complete set of pages, so this
check is unneeded.

We should also check that our count of bytes is correct
when we finish.
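A simplified model of the bounded walk and final byte-count check described above, assuming each page is represented only by its byte count; unmerge_sketch() and its parameters are hypothetical and do no real copying:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * page_len[] holds the byte count of each page in the array and
 * npages is the array length (first must be <= npages).  Starting at
 * index first, walk at most pages_per_chunk pages but never past the
 * end of the array: a chunk after EOF can be shorter than a full
 * chunk.  Returns 0 if the bytes walked equal expected_bytes, else -EIO.
 */
static int unmerge_sketch(const unsigned int *page_len, size_t npages,
			  size_t first, size_t pages_per_chunk,
			  size_t expected_bytes)
{
	size_t pages_left = npages - first;
	size_t count = pages_left < pages_per_chunk ? pages_left
						    : pages_per_chunk;
	size_t bytes = 0;
	size_t i;

	for (i = 0; i < count; i++)
		bytes += page_len[first + i];	/* stand-in for a page copy */

	/* sanity check that the byte count is what we expected */
	return bytes == expected_bytes ? 0 : -EIO;
}
```

With three pages of 4096, 4096 and 100 bytes and a 16-page chunk, the loop stops after the third page instead of reading 13 entries past the array.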

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I88896307990ff839514e54e9a7e18390a457e5d8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53095
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 osc: only set compressed flag on compressed pages
Patrick Farrell [Mon, 13 Nov 2023 16:18:49 +0000 (11:18 -0500)]
EX-7601 osc: only set compressed flag on compressed pages

The code accidentally sets the compressed flag on all
pages processed through fill_cpga, even if they're not
compressed.  Oops.

Also stop setting pg->index on the pages in the compressed
pga; this is only used by encryption, which is no longer
supported with compression.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I313fd943a18b71cd52493852a6884f30d187e52f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53118
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months ago EX-7601 osc: remove cpga fill bits
Patrick Farrell [Fri, 10 Nov 2023 19:07:07 +0000 (14:07 -0500)]
EX-7601 osc: remove cpga fill bits

cpga fill bits are not needed now that we don't support
combining compression and encryption.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I13c2278e085e9b288bd896585947e28e2ea505ca
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53082
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 ofd: add obd level compression lib
Patrick Farrell [Wed, 1 Nov 2023 21:07:23 +0000 (17:07 -0400)]
EX-7601 ofd: add obd level compression lib

Some compression functions will be used by several areas
of Lustre, so they need to be in obdclass.

This moves merge_chunk and unmerge_chunk there and adds the
ability for them to merge lnbs.  This is used in a future
patch.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If4a318119bb7685e41adb9f3b31a66074031e6ac
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52938
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 llite: restrict readahead to eof
Patrick Farrell [Tue, 14 Nov 2023 22:58:16 +0000 (17:58 -0500)]
EX-7601 llite: restrict readahead to eof

Compressed file readahead rounding needs to come before
readahead is limited to EOF.
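A sketch of why the order matters, working in bytes and assuming a power-of-two chunk size; ra_end() and round_up_pow2() are hypothetical helpers, and the real llite code works on page indices:

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * End (exclusive, in bytes) of a readahead window for a compressed
 * file: first round the window out to a whole chunk, then clamp to
 * EOF, so the final window never extends past the end of the file.
 */
static uint64_t ra_end(uint64_t want_end, uint64_t chunk_size, uint64_t eof)
{
	uint64_t end = round_up_pow2(want_end, chunk_size);

	return end < eof ? end : eof;
}
```

With a 64KiB chunk and EOF at 100000, a window ending at 70000 rounds up to 131072 and is then clamped to 100000; clamping first and rounding afterwards would leave the window ending past EOF at 131072.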

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I4e9e7fe63301c08efcb05f170726735593a9431d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53137
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16032 tests: restore delay_unlink_mb in sanity/360
Andreas Dilger [Thu, 23 Nov 2023 22:56:00 +0000 (15:56 -0700)]
LU-16032 tests: restore delay_unlink_mb in sanity/360

Restore the original value of osd-ldiskfs.*.delay_unlink_mb after
sanity test_360 is finished, so that it doesn't affect later tests;
in particular, sanity-quota.sh was seeing some delay in freeing
quota for files that had just been deleted.

Lustre-change: https://review.whamcloud.com/53218
Lustre-commit: TBD (from 8fa0580fd64fe7cbe969817ece87a161c517c4c3)

Test-Parameters: trivial testlist=sanity-quota
Fixes: a772e90243 ("LU-16032 osd: move unlink of large objects to separate thread")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7c1ab02262afdef2fc51f9fbc3932d954a4f8304
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53219
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-15777 hsm: set changelog error for restore layout swap failure
Nikitas Angelinas [Wed, 11 May 2022 22:54:08 +0000 (15:54 -0700)]
LU-15777 hsm: set changelog error for restore layout swap failure

Set the error code in the changelog record generated, if the layout swap
fails at the end of an HSM restore operation. Also, handle error code
overflow inside hsm_set_cl_error(), so that callers don't need to do
this themselves.

Lustre-change: https://review.whamcloud.com/47121
Lustre-commit: 09fe64719b888cd212b6cffe923545b7650f230f

Suggested-by: Olaf Weber <olaf.weber@hpe.com>
Suggested-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I4ed2ebffa3bc1c6a0f87ea9f13734e344f77006f
HPE-bug-id: LUS-10863
Test-Parameters: testlist=sanity-hsm,sanity-pcc
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53213
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17115 quota: fix race of acquiring locks in qmt
Hongchao Zhang [Thu, 26 Oct 2023 12:46:44 +0000 (20:46 +0800)]
LU-17115 quota: fix race of acquiring locks in qmt

In qmt_delete_qid and qmt_reset_qid, the order of acquiring
the lquota_entry lock and the journal differs from that
in qmt_dqacq0, which could cause a deadlock in some cases.

Lustre-change: https://review.whamcloud.com/52371
Lustre-commit: ee0e9447e7022e2caa8b161657d505e17ccdc4a1

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ic439f2c5d6ca22429422b87f0dde65e0d2e6113d
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16097 quota: release preacquired quota when over limits
Hongchao Zhang [Thu, 19 Oct 2023 06:33:47 +0000 (14:33 +0800)]
LU-16097 quota: release preacquired quota when over limits

The pre-acquired quota on each MDT or OST should be released when
the whole quota is over limits, for instance after the quota limits
have been decreased for some quota ID by an administrator.

Lustre-change: https://review.whamcloud.com/48576
Lustre-commit: 57ac32a22372065b789ca491a568f075e755d339

Test-Parameters: testlist=sanity-quota
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I6263b835d4ae6a3fd03f9a2bc4f463949cbc74d4
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17142 mgc: reconnection without pinger
Alexander Boyko [Tue, 22 Aug 2023 09:53:14 +0000 (05:53 -0400)]
LU-17142 mgc: reconnection without pinger

When the MGS has been offline for some time, the adaptive timeout
(AT) increases and the connection request deadline becomes high.
Reconnecting via the pinger waits for the request deadline before
the next attempt. The situation is worse with a failover partner,
when different connections are used. Reconnection can fail with a
local MGS too.

Here is the error seen when the MGC could not connect to a local MGS
(MDT combined with MGS):

    LustreError: 15c-8: MGC90@kfi:
    Confguration from log kjlmo12-MDT0000 failed from MGS -5.

This patch forces reconnection with import invalidation and aborts
in-flight requests.

ptlrpc_recover_import() aborted its wait when the import was in the
disconnected state. But a disconnect happens between connection
attempts, and that state is valid. This is fixed.

Reset the adaptive timeout when the local MGS starts. This allows
the MGC to reconnect efficiently.

mgs_barrier_gl_interpret_reply() should handle -EINVAL from a client;
it means the client doesn't have a lock.

Lustre-change: https://review.whamcloud.com/52498
Lustre-commit: 867ba433e3a0fce4a1b2f8d37a91d550ada41a26

HPE-bug-id: LUS-11633
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ie631e04fb3e72900af076cf7f268f20f7b285445
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53116
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago RM-620 build: New tag 2.14.0-ddn116
Andreas Dilger [Wed, 22 Nov 2023 21:11:28 +0000 (14:11 -0700)]
RM-620 build: New tag 2.14.0-ddn116

New tag 2.14.0-ddn116

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iaf3d0d8a468b44c0bd179bc729fc66483cb45581

18 months ago RM-620 build: New tag 2.14.0-2.14.0-ddn116
Andreas Dilger [Wed, 22 Nov 2023 21:10:48 +0000 (14:10 -0700)]
RM-620 build: New tag 2.14.0-2.14.0-ddn116

New tag 2.14.0-2.14.0-ddn116

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I752cf0dfd78de778fe34787b2e026fec0277f610

18 months ago EX-8236 pcc: abort data copy via ll_fid_path_copy
Qian Yingjin [Fri, 10 Nov 2023 09:23:46 +0000 (04:23 -0500)]
EX-8236 pcc: abort data copy via ll_fid_path_copy

For data copying via ll_fid_path_copy in direct I/O mode in user
space, the client calls llapi_pcc_state_fd() to obtain the file
PCC state. If it is marked with PCC_STATE_FL_ATTACH_ABORTING, the
data copy process ll_fid_path_copy exits immediately.
To reduce the overhead of this check, we do not check on every
data copy iteration; instead, we check once every N I/Os
(32 by default). With an I/O size of 32MiB, this means checking
once per second at 1GiB/s, so there may be some lag
before the copy tool finally quits.
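The periodic check can be sketched as follows. copy_loop() and the callbacks are hypothetical stand-ins; the real tool queries llapi_pcc_state_fd() for PCC_STATE_FL_ATTACH_ABORTING, which this sketch does not call:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical copy loop that polls an (expensive) abort check only
 * once every check_interval I/Os instead of on every I/O.
 * check_interval must be non-zero.  Returns the number of I/Os
 * performed before stopping; *checks counts how often we polled.
 */
static unsigned long copy_loop(unsigned long total_ios,
			       unsigned long check_interval,
			       bool (*is_aborting)(unsigned long io_done),
			       unsigned long *checks)
{
	unsigned long i;

	*checks = 0;
	for (i = 0; i < total_ios; i++) {
		if (i % check_interval == 0) {
			(*checks)++;
			if (is_aborting(i))
				break;
		}
		/* ... one read + write of the I/O size would go here ... */
	}
	return i;
}

/* Example callbacks standing in for the PCC state query. */
static bool never_abort(unsigned long io_done)
{
	(void)io_done;
	return false;
}

static bool abort_after_40(unsigned long io_done)
{
	return io_done >= 40;
}
```

With an interval of 32 and an abort raised at I/O 40, the loop only notices at I/O 64, which is the lag mentioned above; 100 I/Os with no abort cost only 4 state queries.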

Change-Id: I20631e5481a7e97d7a1ed0729bcd269ef6248a2c
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7331 csdc: prohibit set compression upon encrypted file
Bobi Jam [Fri, 10 Nov 2023 09:17:50 +0000 (17:17 +0800)]
EX-7331 csdc: prohibit set compression upon encrypted file

Setting a compression layout component on an encrypted file is not
allowed for now.

This patch adds this check on the MDS when creating a file with a
layout and when adding/merging a new mirror into an existing file.

Test-Parameters: testlist=sanity-sec env=ONLY=67,PTLDEBUG=-1
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I60d9f4bfce3a498f1eb3994c6276afb9d89c99a7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53075
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-8584 tests: check and wait lpcc_purge scanning ends
Lei Feng [Fri, 17 Nov 2023 07:53:21 +0000 (15:53 +0800)]
EX-8584 tests: check and wait lpcc_purge scanning ends

Check the lpcc_purge status to make sure it finishes at least
one round of scanning.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc env=ONLY="200 201 202",ONLY_REPEAT=50
Change-Id: I8e6f50393d1a3cbb7a1bc976942631db6ecceb67
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53167
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-16284 utils: lfs getstripe follows symlink
Lei Feng [Tue, 1 Nov 2022 02:57:39 +0000 (10:57 +0800)]
LU-16284 utils: lfs getstripe follows symlink

'lfs getstripe' prints the information of the symlink target by default.
With the '--no-follow' option, it prints the information of the symlink itself.

Lustre-change: https://review.whamcloud.com/49003
Lustre-commit: af32b516593dbf2a8e7a85d885c33fd017926ada

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I6cef01af5bb2235bdcbf0b5c99af4b9ed5869515
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53139
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17275 kernel: RHEL 8.9 client support
Jian Yu [Mon, 20 Nov 2023 22:32:40 +0000 (14:32 -0800)]
LU-17275 kernel: RHEL 8.9 client support

This patch makes changes to support the RHEL 8.9 release
with kernel 4.18.0-513.5.1.el8_9 for Lustre client.

Lustre-change: https://review.whamcloud.com/53071
Lustre-commit: TBD (from 0da16c715a06b6426a6b99c111147fc875784e85)

Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-3

Change-Id: Ia3672d134534b877bb6aaffb4cea0339bc55974f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17293 kernel: update SLES15 SP5 [5.14.21-150500.55.36.1]
Jian Yu [Fri, 17 Nov 2023 18:02:00 +0000 (10:02 -0800)]
LU-17293 kernel: update SLES15 SP5 [5.14.21-150500.55.36.1]

Update SLES15 SP5 kernel to 5.14.21-150500.55.36.1 for Lustre client.

Lustre-change: https://review.whamcloud.com/53156
Lustre-commit: TBD (from 3e50280434d250996dfaa9d68d7da5e2c45d59ef)

Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=sles15sp5 testlist=sanity

Change-Id: I5a9afb313e9bf315ef4af5b6602785ee68c4c247
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17274 kernel: new kernel [RHEL 9.3 5.14.0-362.8.1.el9_3]
Jian Yu [Thu, 9 Nov 2023 19:01:19 +0000 (11:01 -0800)]
LU-17274 kernel: new kernel [RHEL 9.3 5.14.0-362.8.1.el9_3]

This patch makes changes to support the new RHEL 9.3 release
for Lustre client.

Lustre-change: https://review.whamcloud.com/53054
Lustre-commit: TBD (from 9146471f862d6c6fae6c1f6ac99f55d8280a2891)

Test-Parameters: trivial env=SANITY_EXCEPT="906" \
  mdtcount=4 mdscount=2 clientdistro=el9.3 testlist=sanity
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-3

Change-Id: I9cce1a7d2249cb4df39106c44ba4417411ee0757
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53056
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-14955 lnet: Use fatal NI if none other available
Serguei Smirnov [Tue, 24 Aug 2021 20:48:41 +0000 (13:48 -0700)]
LU-14955 lnet: Use fatal NI if none other available

Allow NI in fatal state to be selected for sending if there are no
NIs in non-fatal state.

Lustre-change: https://review.whamcloud.com/44746/
Lustre-commit: ff3322fd0c77a8042558711d9f410326d2aa6375

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11019
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iab8ef6ee5c5f45896196dbd88a2f61e004278297
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53153
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago RM-620 build: New tag 2.14.0-ddn115
Andreas Dilger [Tue, 14 Nov 2023 22:38:26 +0000 (15:38 -0700)]
RM-620 build: New tag 2.14.0-ddn115

New tag 2.14.0-ddn115

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8d964022825701d68ab711fb7fd5c22d7c1f6e2b

19 months ago LU-16374 enc: rename O_FILE_ENC to O_CIPHERTEXT
Sebastien Buisson [Sun, 24 Sep 2023 16:07:44 +0000 (12:07 -0400)]
LU-16374 enc: rename O_FILE_ENC to O_CIPHERTEXT

Rename O_FILE_ENC to O_CIPHERTEXT as per discussion on the linux-fscrypt
mailing list.
Also change the flag combination to be:
O_NOCTTY | O_NDELAY | O_DSYNC
to avoid the risk of accidental issues with tar, which already opens
files with the 'O_NOCTTY | O_NDELAY' combination.

O_DSYNC does not make much sense for O_RDONLY files, but will force
writes on encrypted restore to be synchronous. With O_DIRECT and large
enough writes (32MB?) that might be OK, but it is not ideal for small files.

Lustre-Commit: ac522557b1fe3ea2b7275fa6d5df73691b8d06db
Lustre-Change: https://review.whamcloud.com/51640

Fixes: 4869c7a530 ("LU-14677 sec: no encryption key migrate/extend/resync/split")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I36fed17a413ee690bc445c3e76674ed5fc337de5
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53049
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months ago LU-17184 mgc: remove damaged local configs
Mikhail Pershin [Fri, 13 Oct 2023 21:28:58 +0000 (00:28 +0300)]
LU-17184 mgc: remove damaged local configs

If the local config llog is damaged, it can't be removed, and it
prevents the target from mounting. This happens because
mgc_llog_local_copy() uses llog_erase() to remove llogs,
which can't do the job if the llog header is damaged.

The patch changes:
- llog_erase() to not initialize the header but just destroy the
  llog file
- mgc_llog_local_copy() to not exit when the backup to a temp
  file fails, but to continue with remote llog copying anyway
- conf-sanity test_151 is added to check that the target can
  mount with a damaged local config

Lustre-change: https://review.whamcloud.com/52697
Lustre-commit: 6a6e4ee20fe5aaad4beab5477e1c7d05e4e702e2

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I637749c38fd5ed03bdac5ca1cd60196f724ab0d1
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53124
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months ago LU-16032 osd: move unlink of large objects to separate thread
Artem Blagodarenko [Fri, 13 Oct 2023 07:49:07 +0000 (15:49 +0800)]
LU-16032 osd: move unlink of large objects to separate thread

Final unlink and freeing of blocks for large objects can lead to
a hung thread with this call stack:

  Net: Service thread pid 1739 was inactive for 200.16s.
  The thread might be hung, or it might only be slow and will
  resume later.
  Dumping the stack trace for debugging purposes:
    __wait_on_buffer+0x2a/0x30
    ldiskfs_wait_block_bitmap+0xe0/0xf0 [ldiskfs]
    ldiskfs_read_block_bitmap+0x31/0x60 [ldiskfs]
    ldiskfs_free_blocks+0x329/0xbb0 [ldiskfs]
    ldiskfs_ext_remove_space+0x8a9/0x1150 [ldiskfs]
    ldiskfs_ext_truncate+0xb0/0xe0 [ldiskfs]
    ldiskfs_truncate+0x3b7/0x3f0 [ldiskfs]
    ldiskfs_evict_inode+0x58a/0x630 [ldiskfs]
    evict+0xb4/0x180
    iput+0xfc/0x190
    osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
    lu_object_free.isra.30+0x68/0x170 [obdclass]
    lu_object_put+0xc5/0x3e0 [obdclass]
    ofd_destroy_by_fid+0x20e/0x500 [ofd]
    ofd_destroy_hdl+0x267/0x9f0 [ofd]
    tgt_request_handle+0xaee/0x15f0 [ptlrpc]
    ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
    ptlrpc_main+0xb34/0x1470 [ptlrpc]
    kthread+0xd1/0xe0

Let's move the final unlink to a workqueue if the inode size is > 1GB.
The size threshold can be configured by setting the minimum async
truncate size with the "osd-ldiskfs.*.delay_unlink_mb" parameter.

Writes to "osd-ldiskfs.*.force_sync" parameter will flush pending
delayed unlinks so that space can be reclaimed as needed.
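A sketch of the threshold decision, assuming delay_unlink_mb is in MiB and that objects at or above the threshold are deferred (the text says "> 1GB" but also calls the parameter a minimum, so the exact boundary is an assumption); should_delay_unlink() is a hypothetical helper, not an osd-ldiskfs function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical decision helper: an object whose size is at least
 * delay_unlink_mb MiB is handed to a background worker for its final
 * unlink; smaller objects are still freed inline by the service thread.
 */
static bool should_delay_unlink(uint64_t size_bytes,
				unsigned int delay_unlink_mb)
{
	return size_bytes >= ((uint64_t)delay_unlink_mb << 20);
}
```

With a 1024 MiB threshold, a 2 GiB object would be deferred and a 512 MiB one freed inline, so the service thread is no longer blocked on the long block-freeing path in the stack above.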

Lustre-change: https://review.whamcloud.com/47995
Lustre-commit: a772e90243ea0ff1de6ae9c67e1f6384c431d200

Change-Id: Id535ae4c58732769effabee42835bc2da8cb5cc1
Signed-off-by: Artem Blagodarenko <ablagodarenko@whamcloud.com>
DDN-bug-id: DDN-3144
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53104
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months ago LU-16827 obdfilter: Fix "emfperf obdfilter-survey" error
Vitaliy Kuznetsov [Fri, 10 Nov 2023 20:35:56 +0000 (21:35 +0100)]
LU-16827 obdfilter: Fix "emfperf obdfilter-survey" error

This patch fixes the definition of the lctl variable. It changes
the logic so that the LCTL value is assigned only when it was
defined earlier.

Lustre-change: https://review.whamcloud.com/53083
Lustre-commit: 95387e580a639eb9ff0648aecf69d0a4951325ef

Test-Parameters: trivial testlist=obdfilter-survey
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I4dfd7e3d1f78208b33b897d8e6680e59b690014c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53084
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago LU-16632 tests: more margin of error for sanity/56xh
Timothy Day [Sat, 11 Mar 2023 22:55:09 +0000 (22:55 +0000)]
LU-16632 tests: more margin of error for sanity/56xh

Give sanity test_56xh more time to migrate files inside the
VMs before failing.

Also, fix a typo.

Lustre-change: https://review.whamcloud.com/50262
Lustre-commit: 36cbba150bce9e2890c8b462ec2ce4af2d6353a5

Test-Parameters: trivial testlist=sanity env=ONLY=56xh,ONLY_REPEAT=100
Fixes: 55968bfabe ("LU-13482 utils: bandwidth limit for lfs migrate")
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If89c8c3ee113c8a14d4c0463c7bb79e353130c08
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53086
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
19 months ago LU-17258 socklnd: ensure connection type established upon race
Chris Horn [Thu, 2 Nov 2023 19:28:45 +0000 (12:28 -0700)]
LU-17258 socklnd: ensure connection type established upon race

When a connection race is hit between two peers, only increment the
retry count if a connection of the specific type has already been
established; otherwise, this can lead to an unexpected value set in
ksnr_connected and some of the assertions being triggered in
ksocknal_connect():

"ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed"
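A simplified model of the fix, assuming the connected state is a bitmask indexed by connection type, as the assertion suggests; struct route_sketch, on_conn_race() and wanted_types() are hypothetical names, not socklnd functions:

```c
#include <assert.h>

#define SK_BIT(n) (1UL << (n))

/* Hypothetical per-peer route state. */
struct route_sketch {
	unsigned long connected;	/* bitmask of established types */
	unsigned int retries;		/* retry count bumped on races */
};

/*
 * On a connection race for the given type, count a retry only if a
 * connection of that type is already established; otherwise leave
 * the state alone so the type is still wanted and gets connected.
 */
static void on_conn_race(struct route_sketch *r, int type)
{
	if (r->connected & SK_BIT(type))
		r->retries++;
}

/* Connection types from want_mask that are not yet established. */
static unsigned long wanted_types(const struct route_sketch *r,
				  unsigned long want_mask)
{
	return want_mask & ~r->connected;
}
```

In this model a race on a type that is not yet connected leaves that type in the wanted mask, which is the invariant the assertion above checks.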

Lustre-change: https://review.whamcloud.com/52957
Lustre-commit: 5afe3b0538c533c3cca370bc9c0901abccca299a

Fixes: da893c6c97 ("LU-16191 socklnd: limit retries on conns_per_peer mismatch")
HPE-bug-id: LUS-11922
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I6e8abb39ad3c0bcd7fbc8f8c5478c903029df908
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53046
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago RM-620 build: New tag 2.14.0-ddn114
Andreas Dilger [Fri, 10 Nov 2023 09:38:19 +0000 (02:38 -0700)]
RM-620 build: New tag 2.14.0-ddn114

New tag 2.14.0-ddn114

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia94862790d1dec3d8080b6d00445ca163afebf81

19 months ago EX-7601 osc: walk chunk unaligned RPC correctly
Patrick Farrell [Wed, 1 Nov 2023 20:14:12 +0000 (16:14 -0400)]
EX-7601 osc: walk chunk unaligned RPC correctly

For decompression, the client must start looking for
compressed chunks at a chunk aligned offset.

Implement this in decompress_request.
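Finding the chunk-aligned start amounts to rounding the RPC offset down, assuming chunk sizes are powers of two; chunk_start() is a hypothetical helper, and the reason (a compressed chunk can presumably only be parsed from its own start) is an inference from the text:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a file offset down to the start of the chunk containing it.
 * chunk_size must be a power of two.
 */
static uint64_t chunk_start(uint64_t offset, uint64_t chunk_size)
{
	return offset & ~(chunk_size - 1);
}
```

For a 64KiB chunk, an RPC starting at offset 70000 would be walked from 65536.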

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3273135990ddf51e8b3c651734e19350e91f659c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52933
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
19 months ago EX-7601 osc: remove unused 'wrkmem'
Patrick Farrell [Fri, 3 Nov 2023 19:56:27 +0000 (15:56 -0400)]
EX-7601 osc: remove unused 'wrkmem'

compress_chunk() takes a wrkmem buffer, which it does not
use.

Remove this and its allocation in compress_request.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I6f236f018f5b79c57cc8725ca0f95125810a4064
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52980
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
19 months ago EX-7601 osc: apply compressed flag to dst page
Patrick Farrell [Fri, 3 Nov 2023 18:11:46 +0000 (14:11 -0400)]
EX-7601 osc: apply compressed flag to dst page

The existing code to apply brw flags to compressed pages
has two issues:
1. The dst_page is NOT an osc async page; it is a bare BRW
page.  This means the brw_page2oap macro isn't right,
because there is no oap page.
Because oap_brw_flags is actually oap_brw_page.flag, we
never access the memory pointed at by the OAP; we only use it
to find an offset back into the brw page.

This means the flags are set correctly, but we still
shouldn't use this macro.
2. However, the function then overwrites these flags by
copying from a page in the source, so OBD_BRW_COMPRESSED is
lost.

Add OBD_BRW_COMPRESSED when we set flags.  This ensures the
flag is actually sent to the server on compressed IO.

This was not causing any problems because the server does
not actually use the OBD_BRW_COMPRESSED flag yet.
(EX-7601 uses this flag)
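The flags still landed in the right place because the offset applied by a container_of()-style macro cancels out when the member is accessed again. The sketch below shows this with integer address arithmetic (to avoid forming an invalid pointer), plus the corrected flag assignment; all struct names and the OBD_BRW_COMPRESSED_SKETCH value are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for brw_page and osc_async_page. */
struct brw_page_sketch {
	unsigned int flag;
};

struct oap_sketch {
	int other_state;			/* fields before the brw page */
	struct brw_page_sketch oap_brw_page;	/* embedded brw page */
};

#define OBD_BRW_COMPRESSED_SKETCH 0x1000u	/* hypothetical flag value */

/*
 * Address of "oap->oap_brw_page.flag" when oap was derived from pga by
 * subtracting offsetof(struct oap_sketch, oap_brw_page), as a
 * container_of()-style macro does.  The subtraction and the addition
 * cancel, so the result is &pga->flag even for a bare brw page.
 */
static uintptr_t flag_addr_via_oap(const struct brw_page_sketch *pga)
{
	uintptr_t oap = (uintptr_t)pga -
			offsetof(struct oap_sketch, oap_brw_page);

	return oap + offsetof(struct oap_sketch, oap_brw_page) +
	       offsetof(struct brw_page_sketch, flag);
}

/* Corrected approach: set flags on the brw page directly, keeping
 * the compressed bit after copying the source page's flags. */
static void set_dst_flags(struct brw_page_sketch *dst,
			  const struct brw_page_sketch *src)
{
	dst->flag = src->flag | OBD_BRW_COMPRESSED_SKETCH;
}
```

Setting the flag after the copy also addresses the second issue, since copying the source page's flags can no longer drop the compressed bit.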

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia94cdc803868ce16a0b66fd58578ec8b2d00cbae
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
19 months ago EX-8270 sptlrpc: don't crash for too-large chunk size
Andreas Dilger [Thu, 9 Nov 2023 00:10:05 +0000 (17:10 -0700)]
EX-8270 sptlrpc: don't crash for too-large chunk size

If the chunk size is too large, don't fall off the
end of the page_pool[] array with a large "order".
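A sketch of such a bounds check, assuming 4KiB pages and a pool array with one entry per allocation order; the constants and pool_index() are hypothetical, and the actual patch may handle the oversized case differently than returning -EINVAL:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SHIFT_SKETCH 12	/* assume 4 KiB pages */
#define POOLS_COUNT_SKETCH 5	/* hypothetical: one pool per order 0..4 */

/* Smallest order such that (page size << order) >= size. */
static unsigned int size_to_order(size_t size)
{
	unsigned int order = 0;

	while (((size_t)1 << (PAGE_SHIFT_SKETCH + order)) < size)
		order++;
	return order;
}

/*
 * Pool index for a chunk of the given size, or -EINVAL if the order
 * would index past the end of a page_pool[]-style array.
 */
static int pool_index(size_t chunk_size)
{
	unsigned int order = size_to_order(chunk_size);

	if (order >= POOLS_COUNT_SKETCH)
		return -EINVAL;
	return (int)order;
}
```

With these constants, a 64KiB chunk maps to order 4, the last valid pool, while a 128KiB chunk is rejected rather than indexing page_pool[5].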

Test-Parameters: trivial
Fixes: d945f1b064 ("EX-6261 ptlrpc: extend sec bulk functionality")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I192ac1b227f1cab8405f6657e754101d353ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53044
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
19 months agoEX-7806 csdc: not support data compression on MDT
Bobi Jam [Wed, 23 Aug 2023 16:43:56 +0000 (00:43 +0800)]
EX-7806 csdc: not support data compression on MDT

Do not support setting a data compression component on DoM until
data compression on the MDT is implemented.
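A sketch of such a check follows. It uses a hypothetical, simplified component descriptor, and the real layout code and error code may differ. A component placed on the MDT (DoM) that also requests compression is rejected, here with `-EOPNOTSUPP`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one layout component. */
struct layout_comp {
	bool is_dom;          /* data stored on the MDT */
	int  compr_type;      /* 0 means no compression requested */
};

static int check_comp_compression(const struct layout_comp *comp)
{
	/* Compression on the MDT is not implemented yet. */
	if (comp->is_dom && comp->compr_type != 0)
		return -EOPNOTSUPP;
	return 0;
}
```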

Test-Parameters: trivial testlist=sanity-pfl
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I3794460140f08a073377c418dd56e7dda907d96d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52062
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoEX-7601 csdc: improve preview warning messages
Andreas Dilger [Thu, 9 Nov 2023 00:29:26 +0000 (17:29 -0700)]
EX-7601 csdc: improve preview warning messages

Avoid printing duplicate warning messages on the console when
creating files with multiple compressed components.  On the
flip side, log a console message when compression is enabled,
so that it will later be visible whether compression was
enabled on a system.
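The "warn only once" behavior can be sketched with a static flag, in the spirit of the kernel's `pr_warn_once()` helper. This userspace version is only an illustration. It is not thread-safe, and the message text and function name are hypothetical. It returns whether it printed, so the behavior can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Print the preview warning at most once per process. */
static bool compr_preview_warn(void)
{
	static bool warned;

	if (warned)
		return false;
	warned = true;
	fprintf(stderr, "compression is a technology preview\n");
	return true;
}
```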

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8cb2f67689824513335f3fa65e9ea751923ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53045
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
19 months agoLU-17205 utils: add lctl get_param -H option
Aurelien Degremont [Tue, 17 Oct 2023 13:07:45 +0000 (15:07 +0200)]
LU-17205 utils: add lctl get_param -H option

- Add a new '-H' option to 'lctl get_param' that prefixes
each output line with the parameter name, instead of only
the first line as is done by default.

That makes grepping lctl get_param with wildcards much easier
as you can now easily know which parameter returns which value.

  $ lctl get_param -H osc.*.state | grep current
  osc.lustre-OST0000-osc-ff1148c0.state=current_state: FULL
  osc.lustre-OST0001-osc-ff1248c0.state=current_state: DISCONN
  osc.lustre-OST0002-osc-ff1348c0.state=current_state: FULL
  osc.lustre-OST0003-osc-ff1448c0.state=current_state: FULL
  osc.lustre-OST0004-osc-ff1548c0.state=current_state: FULL

It also prints an output line even for empty values. That also
makes life easier for admins.

- The patch also removes the forced line feed if the parameter
value was longer than 80 chars. This was considered a misfeature
and is now dropped for all usages, with or without -H.
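The new output format can be sketched as a function that prefixes every line of a value with `name=` and still emits a line for an empty value. This is a simplified illustration with a hypothetical helper name, not the actual lctl code. It writes into a caller-supplied buffer so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "name=line" for every line of value; emit "name=" if value is empty. */
static void format_with_header(const char *name, const char *value,
			       char *out, size_t outlen)
{
	const char *p = value;
	size_t used = 0;

	out[0] = '\0';
	do {
		const char *nl = strchr(p, '\n');
		int len = nl ? (int)(nl - p) : (int)strlen(p);
		int n = snprintf(out + used, outlen - used, "%s=%.*s\n",
				 name, len, p);

		if (n < 0 || (size_t)n >= outlen - used)
			return;           /* out of space: output is truncated */
		used += (size_t)n;
		p = nl ? nl + 1 : p + len;
	} while (*p != '\0');
}
```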

Lustre-change: https://review.whamcloud.com/52730
Lustre-commit: a12c352a3dd8d424b1da09efc6884530c60d105b

Test-Parameters: trivial
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: Ib1fa0dc400db4c19fed10ad4cced9be5668418e3
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53067
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months agoLU-16639 misc: cleanup concole messages
Andreas Dilger [Mon, 13 Mar 2023 22:08:30 +0000 (16:08 -0600)]
LU-16639 misc: cleanup concole messages

lprocfs_job_cleanup() was not properly dropping all jobstats
from the hash table, and job_stat_exit() was printing errors at
unmount.  Ensure all stats are "old enough" when @clear is set.
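The idea of forcing every entry to count as "old enough" when `@clear` is set can be sketched as below. The structure, field name, age threshold and helper are simplified and hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

struct job_stat {
	time_t js_time_latest;   /* last time this job reported stats */
};

/* Decide whether a job_stat entry should be dropped from the hash table. */
static bool job_stat_expired(const struct job_stat *js, time_t now,
			     time_t max_age, bool clear)
{
	/* On cleanup (clear == true) every entry is treated as old enough. */
	if (clear)
		return true;
	return now - js->js_time_latest >= max_age;
}
```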

Change early libcfs cfs_cpu_init() messages from CERROR() to
pr_err() to avoid circular dependencies on libcfs setup before
printing an error message to the console during module init.

Lustre-change: https://review.whamcloud.com/50283
Lustre-commit: 8f40a3d7110da1af8e310a4b7f40b86f13080938

Test-Parameters: trivial
Fixes: ea2cd3af7b ("LU-11407 obdclass: add start time to stats files")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ide3f502103392a79419cc1836200bf5a1a3ebbe5
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53063
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months agoLU-17251 tests: use stderr in precreated_ost_obj_count()
Andreas Dilger [Thu, 9 Nov 2023 23:28:45 +0000 (16:28 -0700)]
LU-17251 tests: use stderr in precreated_ost_obj_count()

Write the status output to stderr instead of stdout, so that
it doesn't confuse the caller that is expecting the number
of objects precreated in stdout.

Test-Parameters: trivial testlist=sanity-scrub,sanity-lfsck
Fixes: c39bdce94f ("LU-17251 test: improve parallel-scale rr_alloc test")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib9b132a04a88b15cea34872954bfa5c4ddf8cde7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53062
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months agoRM-620 build: New tag 2.14.0-ddn113
Andreas Dilger [Thu, 9 Nov 2023 09:38:47 +0000 (02:38 -0700)]
RM-620 build: New tag 2.14.0-ddn113

New tag 2.14.0-ddn113

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I73eab3dc06a0488b7e68c7434cb8a6af2c590a2f

19 months agoRM-620 build: New tag lipe-2.36
Andreas Dilger [Thu, 9 Nov 2023 09:38:11 +0000 (02:38 -0700)]
RM-620 build: New tag lipe-2.36

New tag lipe-2.36

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6c986a17d42f4bd95009d9d0f03acc601c9ee2dd

19 months agoEX-7601 tests: improve/skip sanity test_460a
Andreas Dilger [Wed, 8 Nov 2023 22:39:28 +0000 (15:39 -0700)]
EX-7601 tests: improve/skip sanity test_460a

Skip sanity test_460a for el9.2 clients, since they appear to be
failing that test regularly, but no other distro client is.
Improve the log messages to see what stage is currently running.
Limit the "cmp --verbose" messages to one chunk, otherwise it
may print the entire 14MB test file (about 80 MiB of ASCII).

Move enable_compression() and disable_compression() functions
into test-framework.sh so that they can be used for all tests.

Set LFS_SETSTRIPE_COMPR_OK=y in enable_compression() since we
already know this is a preview and don't need it printed.

Allow sanity-compr.sh to specify SANITY_ONLY and/or SANITYN_ONLY,
and skip the other test script run if only one of them is set.

Test-Parameters: trivial
Test-Parameters: testlist=sanity env=ONLY=460,HONOR_EXCEPT=y clientdistro=el9.2
Test-Parameters: testlist=sanity-compr env=SANITY_ONLY=460 clientdistro=ubuntu2204
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8cb2f67689824513335f3fa65e9ea7519e3ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53043
Tested-by: jenkins <devops@whamcloud.com>
19 months agoEX-8570 lipe: add lpcc sub command to trigger purge scan
Lei Feng [Wed, 8 Nov 2023 08:01:05 +0000 (16:01 +0800)]
EX-8570 lipe: add lpcc sub command to trigger purge scan

Add a sub command 'lpcc purge-scan' to trigger purge
scanning by sending SIGUSR2 to matching lpcc_purge
process.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I976621fe787daa15b8206eed97efdebe75cd7425
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53036
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoEX-8569 lipe: trigger lpcc_purge scan by SIGUSR2
Lei Feng [Wed, 8 Nov 2023 04:46:45 +0000 (12:46 +0800)]
EX-8569 lipe: trigger lpcc_purge scan by SIGUSR2

Send SIGUSR2 to lpcc_purge to trigger a scan
immediately.
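A common way to implement "scan on signal" is for the handler to only set a `volatile sig_atomic_t` flag. The main loop then notices the flag and starts the scan. The sketch below assumes that design, which is not taken from lpcc_purge itself, and its function names are hypothetical. Sending the signal from a shell is `kill -USR2 <pid>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t scan_requested;

static void sigusr2_handler(int signo)
{
	(void)signo;
	scan_requested = 1;   /* only async-signal-safe work in the handler */
}

static int install_scan_trigger(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigusr2_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR2, &sa, NULL);
}

/* Called from the main loop: returns 1 (and clears the flag) if a scan
 * was requested since the last call, 0 otherwise. */
static int take_scan_request(void)
{
	if (!scan_requested)
		return 0;
	scan_requested = 0;
	return 1;
}
```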

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I2811c90ac75c93167e8104e90b424ac31c8cc50c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53034
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoEX-8568 lipe: lpcc_purge can disable force scanning
Lei Feng [Wed, 8 Nov 2023 02:07:32 +0000 (10:07 +0800)]
EX-8568 lipe: lpcc_purge can disable force scanning

When force_scan_interval is set to -1, lpcc_purge will never
start a forced scan.
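A sketch of the check, assuming the interval is in seconds and using a hypothetical function name. A value of -1 short-circuits the comparison, so a forced scan is never due.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Return true if a forced scan is due; an interval of -1 disables it. */
static bool force_scan_due(long force_scan_interval, time_t last_scan,
			   time_t now)
{
	if (force_scan_interval == -1)
		return false;
	return now - last_scan >= force_scan_interval;
}
```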

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I21bcadb97f09622eae08af73082196e816b2c9ae
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoEX-4125 lipe: adjust atime in lpcc_purge
Lei Feng [Mon, 6 Nov 2023 06:40:13 +0000 (14:40 +0800)]
EX-4125 lipe: adjust atime in lpcc_purge

Sometimes atime < mtime. In this case, adjust atime to mtime.
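A one-line sketch of the adjustment. It assumes that "adjust" means using mtime in place of an atime that is older than mtime, and the helper name is hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: use mtime when atime is older than mtime. */
static time_t effective_atime(time_t atime, time_t mtime)
{
	return atime < mtime ? mtime : atime;
}
```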

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc env=ONLY="200-203"
Change-Id: I35b3da543b57265b09ef65f4e810761aa727f483
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53002
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoEX-8551 lipe: build arch-specific lipe-lpcc package
Lei Feng [Tue, 7 Nov 2023 04:06:24 +0000 (12:06 +0800)]
EX-8551 lipe: build arch-specific lipe-lpcc package

lpcc_purge in the lipe-lpcc package is an executable binary,
so the lipe-lpcc package needs to be arch-specific.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I0387e258eaec6e39156f823d3a38b5dc3fb9a4cd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53007
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Raphael Druon <rdruon@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-15576 osp: Interop skip sanity test 823 for MDS < 2.14.0-ddn112
Shaun Tancheff [Tue, 22 Feb 2022 07:28:50 +0000 (01:28 -0600)]
LU-15576 osp: Interop skip sanity test 823 for MDS < 2.14.0-ddn112

Prior to v2_14_55-29-g06e586016d, setting create_count greater
than the maximum returned -ERANGE.

During interop testing, skip sanity/823 for MDSes older than 2.14.0-ddn112.

Lustre-change: https://review.whamcloud.com/46567
Lustre-commit: 5da859e262dd5e93bfeb2bfa1366a9e20395d3f4

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=823
Fixes: 06e586016d3a ("LU-13941 osp: Silently lower requested create_count to maximum")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie79617deea047b2a846f696473b9c2b5681953be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53022
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-10465 osd-ldiskfs: 8MiB IOs should bypass cache
Andreas Dilger [Fri, 3 Nov 2023 23:49:29 +0000 (17:49 -0600)]
LU-10465 osd-ldiskfs: 8MiB IOs should bypass cache

Change the writethrough_max_io_mb and readcache_max_io_mb
params to check for IO size >= max_io_mb instead of > max_io_mb
when deciding to bypass the cache.

Read/write IOs that are 8MiB in size should bypass the pagecache
on the OSTs, rather than requiring IOs that are slightly larger
than this.  8MiB is enough to submit 1MiB to each HDD spindle in
an 8+2 RAID6, and caching these writes on the OSS is not helping.
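The boundary change and the RAID arithmetic can be sketched as below. The function names are hypothetical, and the limit is assumed to be set to 8 (MiB). With the old `>` test, an IO of exactly 8 MiB was still cached. With an 8+2 RAID6, an 8 MiB IO spreads as 8 MiB / 8 data disks = 1 MiB per spindle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Old check: only IOs strictly larger than the limit bypassed the cache. */
static bool bypass_cache_old(uint64_t io_bytes, uint64_t max_io_mb)
{
	return io_bytes > max_io_mb * MIB;
}

/* New check: IOs of at least max_io_mb bypass the cache. */
static bool bypass_cache_new(uint64_t io_bytes, uint64_t max_io_mb)
{
	return io_bytes >= max_io_mb * MIB;
}

/* Bytes written to each data disk of a RAID stripe for one IO. */
static uint64_t per_spindle_bytes(uint64_t io_bytes, unsigned int data_disks)
{
	return io_bytes / data_disks;
}
```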

Lustre-change: https://review.whamcloud.com/52989
Lustre-commit: TBD (from dcdc4748f1443981a170bc2945b178226e64a6d4)

Test-Parameters: trivial
Fixes: 3043c6f189 ("LU-12071 osd-ldiskfs: bypass pagecache if requested")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iae435f5b99e2e8bc6a9458fedad65a81c2853350
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53033
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
19 months agoLU-16958 llite: migrate deadlock on not responding lock cancel
Bobi Jam [Thu, 21 Sep 2023 14:24:32 +0000 (22:24 +0800)]
LU-16958 llite: migrate deadlock on not responding lock cancel

A race in lfs migrate makes the MDS hang with the following backtrace:

[ 3683.248584] [<0>] ldlm_completion_ast+0x78d/0x8e0 [ptlrpc]
[ 3683.250122] [<0>] ldlm_cli_enqueue_local+0x2fd/0x840 [ptlrpc]
[ 3683.251363] [<0>] mdt_object_local_lock+0x50e/0xb10 [mdt]
[ 3683.252615] [<0>] mdt_object_lock_internal+0x187/0x430 [mdt]
[ 3683.253793] [<0>] mdt_object_lock_try+0x22/0xa0 [mdt]
[ 3683.254857] [<0>] mdt_getattr_name_lock+0x1317/0x1dc0 [mdt]
[ 3683.256016] [<0>] mdt_intent_getattr+0x264/0x440 [mdt]
[ 3683.257105] [<0>] mdt_intent_opc+0x452/0xa80 [mdt]
[ 3683.258126] [<0>] mdt_intent_policy+0x1fd/0x390 [mdt]
[ 3683.259191] [<0>] ldlm_lock_enqueue+0x469/0xa90 [ptlrpc]
[ 3683.260350] [<0>] ldlm_handle_enqueue0+0x61a/0x16c0 [ptlrpc]
[ 3683.261596] [<0>] tgt_enqueue+0xa4/0x200 [ptlrpc]
[ 3683.262662] [<0>] tgt_request_handle+0xc9c/0x1a40 [ptlrpc]
[ 3683.263860] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
[ 3683.265220] [<0>] ptlrpc_main+0xbf3/0x1540 [ptlrpc]
[ 3683.266303] [<0>] kthread+0x134/0x150
[ 3683.267111] [<0>] ret_from_fork+0x35/0x40

The deadlock happens as follows:
T1:
vvp_io_init()
->ll_layout_refresh() <= take lli_layout_mutex
->ll_layout_intent()
->ll_take_md_lock()  <= take the CR layout lock ref
->ll_layout_conf()
->vvp_prune()
->vvp_inode_ops() <= release lli_layout_mutex
->vvp_inode_ops() <= try to acquire lli_layout_mutex
-> racer wait here for T2
T2:
->ll_file_write_iter()
->vvp_io_init()
->ll_layout_refresh() <= take lli_layout_mutex
->ll_layout_intent() <= Request layout from MDT
-> racer wait from server...

The server wants to cancel the CR layout lock held by T1, but that
never happens. Also, T1 could have taken an extent ldlm lock while
waiting for the lli_layout_mutex held by T2, and ofd_destroy_hdl
does not get the lock cancellation response from T1.

lli_layout_mutex is only needed for enqueuing the layout lock from
the server, so ll_layout_conf() is no longer done under
lli_layout_mutex.
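The direction of the fix can be sketched with a pthread mutex. The mutex is held only around the layout enqueue, and the configuration step runs after it has been dropped. All names are hypothetical stand-ins for `ll_layout_refresh()`, `ll_layout_intent()` and `ll_layout_conf()`, not the actual llite code. The shared counter is unprotected and exists only to make the single-threaded example observable.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t layout_mutex = PTHREAD_MUTEX_INITIALIZER;
static int layout_gen;        /* stand-in for the cached layout */

/* Stand-in for enqueuing the layout lock from the MDT. */
static int layout_enqueue(int *gen)
{
	*gen = layout_gen + 1;
	return 0;
}

/* Stand-in for applying the layout; may block on other locks. */
static int layout_conf(int gen)
{
	layout_gen = gen;
	return 0;
}

static int layout_refresh(void)
{
	int gen, rc;

	/* Serialize only the enqueue... */
	pthread_mutex_lock(&layout_mutex);
	rc = layout_enqueue(&gen);
	pthread_mutex_unlock(&layout_mutex);
	if (rc)
		return rc;

	/* ...so the configuration step never runs with the mutex held. */
	return layout_conf(gen);
}
```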

Lustre-commit: TBD (from 7de620b53bea8a2fc252ceea4787f1226ce63a02)
Lustre-change: https://review.whamcloud.com/52388

Fixes: 8f2c1592c3 ("LU-16958 llite: migrate vs regular ops deadlock")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib94de2c63544c3a962199aa0537418255980ae8c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52451
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-16043 osc: allow error for write on CL_FSYNC_DISCARD
Vladimir Saveliev [Wed, 26 Jul 2023 13:09:18 +0000 (16:09 +0300)]
LU-16043 osc: allow error for write on CL_FSYNC_DISCARD

In the case of CL_FSYNC_DISCARD, an error is allowed for a write of an osc object.

Otherwise, the included test fails in rm with:
  (osc_page.c:174:osc_page_delete()) Trying to teardown failed: -16
  (osc_page.c:175:osc_page_delete()) ASSERTION( 0 ) failed:
  (osc_page.c:175:osc_page_delete()) LBUG

Lustre-change: https://review.whamcloud.com/48032
Lustre-commit: 050c2fb23b1f98745305a3dfe3062ea5a66dfdb4

Test-Parameters: trivial testlist=sanity env=ONLY=907
HPE-bug-id: LUS-10410
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I0aae0dc470ba0371964e7643a6d84b19a1b4e106
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53009
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-16609 target: top_trans_create cannot alloc memory
Andrew Perepechko [Tue, 10 Jan 2023 21:53:38 +0000 (16:53 -0500)]
LU-16609 target: top_trans_create cannot alloc memory

top_trans_create() requests __GFP_IO memory allocation,
which does not allow direct reclaim. However, if the
memory shortage is temporary, direct reclaim is reasonable.
GFP_NOFS is __GFP_IO with additional reclaim bits.
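The relationship between the flags can be illustrated with bitmasks. In the Linux kernel, `GFP_NOFS` is `__GFP_RECLAIM | __GFP_IO`, and `__GFP_RECLAIM` combines `__GFP_DIRECT_RECLAIM` and `__GFP_KSWAPD_RECLAIM`. The bit values below are placeholders, since the real ones differ between kernel versions. This userspace sketch only shows why a bare `__GFP_IO` request cannot enter direct reclaim while `GFP_NOFS` can.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Placeholder bit values; only the relationships mirror the kernel. */
#define __GFP_IO              0x01u
#define __GFP_FS              0x02u
#define __GFP_DIRECT_RECLAIM  0x04u
#define __GFP_KSWAPD_RECLAIM  0x08u
#define __GFP_RECLAIM         (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define GFP_NOFS              (__GFP_RECLAIM | __GFP_IO)

/* Same test as the kernel's gfpflags_allow_blocking(). */
static bool allows_direct_reclaim(gfp_t flags)
{
	return flags & __GFP_DIRECT_RECLAIM;
}
```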

Lustre-change: https://review.whamcloud.com/50176
Lustre-commit: 9d1f8f1e3557ee3349c623f4f5596df44f60b082

Change-Id: I2c84d9d74188660063c948573780745a2b59a688
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-11293
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53031
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-17197 obdclass: preserve fairness when waiting for rpc slot
Shaun Tancheff [Wed, 18 Oct 2023 03:54:59 +0000 (22:54 -0500)]
LU-17197 obdclass: preserve fairness when waiting for rpc slot

When obd_get_mod_rpc_slot() waits for an available slot it places the
waiting thread at the HEAD of the queue, so it will be woken before
anything else that is already queued.  This is clearly unfair and can
hurt performance.

So change to always add to the tail to ensure a FIFO ordering (except
that CLOSE might sometimes be woken a bit early).

This regression was introduced in a rewrite that was supposed to make
waiting more fair - by avoiding a broadcast wakeup for "close"
requests.
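The fairness difference can be shown with a minimal singly linked wait queue. The types are hypothetical, not the kernel's `wait_queue_head`. If each new waiter is added at the head, the most recent arrival is woken first (LIFO). Adding at the tail, as `__add_wait_queue_entry_tail()` does, wakes waiters in arrival order (FIFO).

```c
#include <assert.h>
#include <stddef.h>

struct waiter {
	int id;
	struct waiter *next;
};

struct wait_queue {
	struct waiter *head;
	struct waiter *tail;
};

/* Old behavior: the newest waiter is woken first. */
static void add_head(struct wait_queue *q, struct waiter *w)
{
	w->next = q->head;
	q->head = w;
	if (q->tail == NULL)
		q->tail = w;
}

/* New behavior: waiters are woken in the order they arrived. */
static void add_tail(struct wait_queue *q, struct waiter *w)
{
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Wake (remove) the first waiter, or return NULL if the queue is empty. */
static struct waiter *wake_one(struct wait_queue *q)
{
	struct waiter *w = q->head;

	if (w) {
		q->head = w->next;
		if (q->head == NULL)
			q->tail = NULL;
	}
	return w;
}
```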

Also fix some stale comments and expose __add_wait_queue_entry_tail.

Running mdtest with the patch applied shows about a 3% improvement:

                             master            patched
  mdtest-easy-write      350.585906 kIOPS   353.783545 kIOPS
   mdtest-easy-stat     1320.329353 kIOPS  1408.320419 kIOPS
 mdtest-easy-delete      285.084103 kIOPS   289.625900 kIOPS
            [SCORE]      509.115803 kiops   524.516113 kiops

Lustre-change: https://review.whamcloud.com/52738
Lustre-commit: TBD (from 7e28964085a4d98111b926fe125abc7f815e70be)

Fixes: 5243630b09d2 ("LU-15947 obdclass: improve precision of wakeups for mod_rpcs")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If767c4299bcbab71589b0f3c01e85bf461686ca5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-17251 test: improve parallel-scale rr_alloc test
Alex Deiter [Wed, 1 Nov 2023 22:54:32 +0000 (02:54 +0400)]
LU-17251 test: improve parallel-scale rr_alloc test

Add a check for precreated OST objects, and wait (for a maximum
of 60 seconds) before executing the rr_alloc test.

Lustre-change: https://review.whamcloud.com/52940
Lustre-commit: TBD (from 3f1f70264e1ed9ba77094435fc598bc9abbbc044)

Test-Parameters: trivial
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=8
Signed-off-by: Alex Deiter <adeiter@tintri.com>
Change-Id: Ib604b99138ceccf384476ad2876d9df7cd7d524b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52999
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-17251 osp: force precreate if create_count grows
Andreas Dilger [Fri, 3 Nov 2023 00:32:44 +0000 (18:32 -0600)]
LU-17251 osp: force precreate if create_count grows

Force the MDS to precreate OST objects if "osp.*.create_count" is
written and the OSP does not have at least that many precreated
objects locally.  This avoids doing complex operations in test
scripts to force precreation to run, which slows down the tests
and increases the chance that a test might fail.
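The trigger condition can be sketched as follows. The field names are hypothetical and loosely modeled on the OSP precreate state; only `opd_precreate_force` comes from the text. Writing `create_count` sets the force flag when fewer objects than the new count are already precreated.

```c
#include <assert.h>
#include <stdbool.h>

struct osp_precreate_state {
	int  create_count;        /* objects to precreate per batch */
	int  precreated;          /* objects currently available locally */
	bool opd_precreate_force; /* ask the precreate thread to run now */
};

/* Handle a write to "osp.*.create_count". */
static void set_create_count(struct osp_precreate_state *s, int count)
{
	s->create_count = count;
	if (s->precreated < count)
		s->opd_precreate_force = true;
}
```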

Previously opd_precreate_force was only used for handling OSTs
that were reformatted and this reset "create_count" to minimum, so
move that to the reformat case rather than in the precreate code
path so it does not reset "create_count" when it was just set.

Remove the "env" argument from several precreate-related functions,
since it wasn't used in those functions, and that made it difficult
to call them from the "create_count" parameter handling.

Lustre-change: https://review.whamcloud.com/52968
Lustre-commit: TBD (from 0206ef4d765aca3f298e24dd630f155114781986)

Test-Parameters: testlist=parallel-scale env=ONLY=test_rr_alloc
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iac35c1b981fcd6ab2d1ea5abc9ffe2e4563ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52998
Tested-by: jenkins <devops@whamcloud.com>
19 months agoLU-17249 ptlrpc: protect scp_rqbd_idle list operations
Mikhail Pershin [Wed, 1 Nov 2023 14:55:39 +0000 (17:55 +0300)]
LU-17249 ptlrpc: protect scp_rqbd_idle list operations

Protect getting an entry from the scp_rqbd_idle list with a spinlock
in ptlrpc_service_purge_all(), as is done in all other places where
the rqbd_list linkage is managed.
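The pattern is the usual one: unlink an entry from a shared list only while holding the lock that protects that list. The sketch below uses a POSIX spinlock and a hypothetical singly linked idle list. These stand in for the service partition's lock and `scp_rqbd_idle`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct rqbd {
	struct rqbd *next;
};

struct svc_part {
	pthread_spinlock_t lock;    /* protects idle */
	struct rqbd *idle;          /* stand-in for scp_rqbd_idle */
};

/* Detach the first idle buffer under the lock; NULL if the list is empty. */
static struct rqbd *get_idle_rqbd(struct svc_part *svcpt)
{
	struct rqbd *rqbd;

	pthread_spin_lock(&svcpt->lock);
	rqbd = svcpt->idle;
	if (rqbd)
		svcpt->idle = rqbd->next;
	pthread_spin_unlock(&svcpt->lock);
	return rqbd;
}
```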

Lustre-change: https://review.whamcloud.com/52931
Lustre-commit: 9ba375983d498690f5caa29c289c137470a76505

Test-Parameters: testgroup=full-part-1 env="SLOW=yes,ENABLE_QUOTA=yes"
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Iace37b1ee79bfd0c3a54a35722952e17d860a91c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-17103 lnet: use workqueue for lnd ping buffer updates
Serguei Smirnov [Tue, 26 Sep 2023 23:57:46 +0000 (16:57 -0700)]
LU-17103 lnet: use workqueue for lnd ping buffer updates

Introduce workqueue for handling lnd-initiated ping buffer
update requests.

This is done to avoid the possibility of the monitor thread
locking up while waiting for the "old" ping buffer refcount to
get decremented during the update, while the message which
triggers the decrement is on the monitor thread's own queue
waiting to be processed.

Lustre-change: https://review.whamcloud.com/52522/
Lustre-commit: TBD (from 1200e9ce1b8272f4affb20386570a9a6e79ceeb4)

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY="207 500",ONLY_REPEAT=50
Fixes: 7ac399c5 ("LU-16949 lnet: get monitor thread to update ping buffer")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I5176581703e52f4adbfff417040bebcc2489b79e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52936
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-17207 lnet: race b/w monitor thr stop and discovery push
Serguei Smirnov [Tue, 17 Oct 2023 18:43:14 +0000 (11:43 -0700)]
LU-17207 lnet: race b/w monitor thr stop and discovery push

As a result of a race, the discovery thread may attempt to dereference
a message on ln_mt_resendqs which was just freed by the stopping
monitor thread. Make sure the discovery thread is stopped first.

Lustre-change: https://review.whamcloud.com/52734/
Lustre-commit: TBD (from 5c6ca4991382a805da6e824c1dbfab931987dda6)

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I0dfcf3bc5bb3c8df195388599f571bdd3caaa3d7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoEX-8543 tools: remove laudit/laudit-report
Sebastien Buisson [Tue, 7 Nov 2023 16:00:37 +0000 (17:00 +0100)]
EX-8543 tools: remove laudit/laudit-report

laudit/laudit-report is a demonstration tool for what is possible in
terms of Lustre audit. It is not meant to be used in production
because it stores the audit data as plaintext flat files, which is
neither secure nor scalable, and it is largely untested at scale.

So remove laudit/laudit-report from lipe sources, and fix build and
packaging mechanisms accordingly.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I36fbd50cd4485f2cc7b0ee91922e58f92e008255
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53015
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-17248 kernel: wait for pages under writeback for bdev
Li Dongyang [Mon, 6 Nov 2023 04:22:47 +0000 (15:22 +1100)]
LU-17248 kernel: wait for pages under writeback for bdev

Use a better version of the kernel patch instead of
just adding the SB_I_STABLE_WRITES flag to the bdev superblock.
We don't need to wait for page writeback on all block devices,
in particular those that don't require stable_page.

Test-Parameters: trivial
Fixes: 5968bc3954 ("LU-17248 kernel: add SB_I_STABLE_WRITES to bdev sb flag")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I20cfa33c4ef45b10e6a732e325698c6b1b00bc79
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53001
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months agoLU-16843 ldiskfs: merge extent blocks
Alex Zhuravlev [Tue, 7 Nov 2023 14:56:16 +0000 (17:56 +0300)]
LU-16843 ldiskfs: merge extent blocks

There are cases (e.g. file written synchronously with discontiguous
blocks that are later filled in) when a lot of extents are created
initially, then the extents get merged over time, but there is no
way to merge the index blocks.  This can cause a very deep extent
index tree (above 5 levels) and cause problems like:

inode has invalid extent depth: 6

Merge leaf/index blocks (one at each level at most) to the right/left
when extents are removed from the index.

Submitted to the linux-ext4 mailing list:
https://lore.kernel.org/linux-ext4/7A2B8861-96AA-4815-BB58-180F63F62436@whamcloud.com/

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I09dfab193d82e7c99620ddb95aff2015023f73aa
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52301
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoEX-8369 ldiskfs: fix dense writes
Alex Zhuravlev [Mon, 30 Oct 2023 10:16:49 +0000 (13:16 +0300)]
EX-8369 ldiskfs: fix dense writes

Don't mix "dense" and regular writes, as regular writes are
bound to logical offsets.

Fixes: f36eda6a1e ("LU-10026 osd-ldiskfs: use preallocation for dense writes")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9f6b2c600f2132dcad23726f2fb3848ab02cc117
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52888
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-17254 lnet: Fix ofed detection with specific kernel version
Aurelien Degremont [Tue, 7 Nov 2023 19:38:53 +0000 (11:38 -0800)]
LU-17254 lnet: Fix ofed detection with specific kernel version

Improve the OFED configure step with LNET when the kernel version
contains special characters that could be interpreted in regexp
mode.

It is not uncommon in the Debian world to have '+' in the kernel version.
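The underlying problem is that a version string containing `+` or `.` is not matched literally when used as an extended regular expression. The actual fix is in the configure (autoconf/shell) code. The C sketch below, with a hypothetical helper name and a made-up version string, shows the same idea: escape the metacharacters before building the pattern.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy src to dst, backslash-escaping ERE metacharacters. */
static int ere_escape(const char *src, char *dst, size_t dstlen)
{
	static const char meta[] = ".[]{}()\\*+?^$|";
	size_t j = 0;

	for (; *src != '\0'; src++) {
		if (strchr(meta, *src) != NULL) {
			if (j + 1 >= dstlen)
				return -1;
			dst[j++] = '\\';
		}
		if (j + 1 >= dstlen)
			return -1;
		dst[j++] = *src;
	}
	dst[j] = '\0';
	return 0;
}

/* Return 1 if text contains version literally, 0 if not, -1 on error. */
static int contains_version(const char *text, const char *version)
{
	char pattern[256];
	regex_t re;
	int rc;

	if (ere_escape(version, pattern, sizeof(pattern)) != 0)
		return -1;
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

Without escaping, the pattern `5.10.0+x` would fail to match the literal string `5.10.0+x`, because `+` is read as "one or more `0`". The same unescaped pattern would, however, match `5.10.00x`.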

Lustre-change: https://review.whamcloud.com/52949
Lustre-commit: b83156304df2d418aadb5d3dfd5f570ef72a7e2e

Test-parameters: trivial
Change-Id: Ia3da59c74d8c2e59e16525dd50c7b83c2b5fada8
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53021
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months agoLU-17257 build: use pkg-config to find krb5 libdir
Jian Yu [Tue, 7 Nov 2023 19:47:01 +0000 (11:47 -0800)]
LU-17257 build: use pkg-config to find krb5 libdir

This patch fixes kerberos5.m4 to use pkg-config to
find the krb5 libdir instead of looking for the krb5
libraries in a static list of paths.

Lustre-change: https://review.whamcloud.com/53010
Lustre-commit: TBD (from 9cccb643173acf536f542103d47e4af7057c0ff9)

Test-Parameters: trivial kerberos=true testlist=sanity-krb5

Change-Id: Ia15812932942171b019f3e73034a78f9185c16ce
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53024
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-16518 utils: fix clang build errors
Timothy Day [Wed, 8 Nov 2023 20:22:54 +0000 (12:22 -0800)]
LU-16518 utils: fix clang build errors

This patch fixes a number of small clang build
errors in Lustre utils. Many errors are related
to nuances in typing or to statements which appear
to be tautologies. These are resolved.

Some unneeded parentheses are removed. A variable
that could potentially be left uninitialized is
now initialized. And a comparison that seemed to
be missing was added.

Lustre-change: https://review.whamcloud.com/50161
Lustre-commit: 632dc6729abcaf83aeaef8167a73ce18b9a41a67

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id3f40b033e640f8d2ae6386f66a88de06fc89666
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53042
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-17256 debian: allow building client dkms on arm64
Aurelien Degremont [Tue, 7 Nov 2023 19:43:48 +0000 (11:43 -0800)]
LU-17256 debian: allow building client dkms on arm64

Just add 'arm64' on the supported architecture list
for 'lustre-client-modules-dkms' debian package.

Lustre-change: https://review.whamcloud.com/52951
Lustre-commit: c4c9a8eea31cf9aa02f75ca3f119f90d67c70cc5

Test-Parameters: trivial
Change-Id: I2af307ee87448faeec81f6e0e27573ae980710f1
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53023
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months agoRM-620 build: New tag 2.14.0-ddn112
Andreas Dilger [Sun, 5 Nov 2023 10:52:56 +0000 (03:52 -0700)]
RM-620 build: New tag 2.14.0-ddn112

New tag 2.14.0-ddn112

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ibd5e877813d29da337ac1343dcdd3223ef2e7355

19 months agoRM-620 build: New tag lipe-2.35
Andreas Dilger [Sun, 5 Nov 2023 10:52:20 +0000 (03:52 -0700)]
RM-620 build: New tag lipe-2.35

New tag lipe-2.35

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I85314a2e67b809e0ebe40d3428db6ab19a5c554a

19 months agoLU-16667 build: fix extra errors related to struct mnt_idmap
Jian Yu [Thu, 2 Nov 2023 18:54:42 +0000 (11:54 -0700)]
LU-16667 build: fix extra errors related to struct mnt_idmap

This patch fixes the following extra build errors related to
struct mnt_idmap:

lustre/llite/pcc.c:2440:40: error: passing argument 1 of
'inode_owner_or_capable' from incompatible pointer type
[-Werror=incompatible-pointer-types]
 2440 |                 inode_owner_or_capable(&init_user_ns, inode)) ||
      |                                        ^~~~~~~~~~~~~
      |                                        |
      |                                        struct user_namespace *
include/linux/fs.h:1624:47: note: expected 'struct mnt_idmap *'
but argument is of type 'struct user_namespace *'
 1624 | bool inode_owner_or_capable(struct mnt_idmap *idmap,
      |                             ~~~~~~~~~~~~~~~~~~^~~~~
lustre/llite/pcc.c:3656:40: error: passing argument 1 of
'inode->i_op->fileattr_set' from incompatible pointer type
[-Werror=incompatible-pointer-types]
 3656 |         rc = inode->i_op->fileattr_set(&init_user_ns, dentry, &fa);
      |                                        ^~~~~~~~~~~~~
      |                                        |
      |                                        struct user_namespace *
lustre/llite/pcc.c:3656:40: note: expected 'struct mnt_idmap *'
but argument is of type 'struct user_namespace *'

Change-Id: Ia310f6f9053228b38b41243912dfe7818cfef33a
Test-Parameters: trivial
Fixes: 3011aa5 ("LU-16667 build: struct mnt_idmap, linux/filelock.h")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52955
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago LU-16802 build: compatibility for 6.4 kernels
Shaun Tancheff [Thu, 2 Nov 2023 18:50:25 +0000 (11:50 -0700)]
LU-16802 build: compatibility for 6.4 kernels

linux kernel v6.3-rc4-32-g6eb203e1a868
  iov_iter: remove iov_iter_iovec()

Provide a replacement iov_iter_iovec() when one is not provided.

linux kernel v6.3-rc4-34-g747b1f65d39a
  iov_iter: overlay struct iovec and ubuf/len

This renames iov_iter member iov to __iov and provides the
iov_iter() accessor.
Define __iov as iov when __iov not present.
Provide an iov_iter() for older kernels.

linux kernel v6.3-rc1-13-g1aaba11da9aa
  driver core: class: remove module * from class_create()

Provide an ll_class_create() to pass THIS_MODULE, or not,
as needed by class_create().

Linux commit v6.2-rc1-20-gf861646a6562
  quota: port to mnt_idmap

Update osd_dquot_transfer to use mnt_idmap, falling back to
user_ns if dquot_transfer() requires it.

Linux commit v6.3-rc7-2433-gcf64b9bce950
  SUNRPC: return proper error from get_expiry()

The updated get_expiry() takes a time64_t pointer that receives the
expiry time; a non-zero return value indicates an error, nominally
-EINVAL. Provide a wrapper for older kernels, whose get_expiry()
returns a time64_t, so that it returns -EINVAL on error.
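A userspace sketch of the get_expiry() compatibility idea described
above. legacy_get_expiry() and compat_get_expiry() are hypothetical
names, not the actual Lustre helpers; the legacy stand-in assumes the
older API signalled failure by returning 0, and it skips the
boot-time adjustment the kernel helper performs.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t time64_t; /* stand-in for the kernel type */

/*
 * Hypothetical stand-in for the older get_expiry(): parses a decimal
 * timestamp from *bpp and returns it, or 0 if it cannot be parsed.
 */
static time64_t legacy_get_expiry(char **bpp)
{
	char *end;
	long long v = strtoll(*bpp, &end, 10);

	if (end == *bpp || v < 0)
		return 0;
	*bpp = end;
	return (time64_t)v;
}

/*
 * Wrapper with the newer calling convention: the expiry is returned
 * through *rvp and a non-zero return value signals an error.
 */
static int compat_get_expiry(char **bpp, time64_t *rvp)
{
	time64_t rv = legacy_get_expiry(bpp);

	if (rv == 0)
		return -EINVAL;
	*rvp = rv;
	return 0;
}
```

Callers can then be written once against the newer signature; a side
effect of this convention is that a literal timestamp of 0 is treated
as an error, as it was with the older API.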

Lustre-change: https://review.whamcloud.com/50875
Lustre-commit: TBD (from 1bd4e67d1f635e0a5f94280c4bab85668ce677ca)

Test-Parameters: trivial
HPE-bug-id: LUS-11614
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I765d6257eec8b5a9bf1bd5947f03370eb9df1625
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52954
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago LU-17006 lnet: set up routes for going across subnets
Serguei Smirnov [Fri, 11 Aug 2023 00:58:11 +0000 (17:58 -0700)]
LU-17006 lnet: set up routes for going across subnets

Modify ksocklnd-config to set up a route via the subnet's default
gateway when one is defined, for example:
        ip route add default via <gw_for_eth0> dev eth0 table eth0
which results in a route similar to the following added to
the eth0 route table:
        default via <gw_for_eth0> dev eth0

If no gateway is found for the eth0 subnet, keep the old
behaviour, which adds the following to the eth0 route table:
        <eth0_subnet> dev eth0 proto kernel scope link src <eth0_ip>

This ensures that multi-rail (MR) traffic leaves through the
interface selected by LNet, whether or not it crosses subnets.
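A C model of the route choice described above. ksocklnd-config itself
is a shell script, and format_route() is a hypothetical helper: it only
builds the per-interface route table entry text shown in this message,
not the exact ip(8) command the script runs.

```c
#include <stdio.h>

/*
 * If a default gateway is known for the interface's subnet, the
 * per-interface table gets a default route via that gateway;
 * otherwise it keeps the old link-scope subnet route.
 */
static int format_route(char *buf, size_t len, const char *dev,
			const char *gw, const char *subnet, const char *src)
{
	if (gw != NULL && gw[0] != '\0')
		return snprintf(buf, len, "default via %s dev %s", gw, dev);

	return snprintf(buf, len,
			"%s dev %s proto kernel scope link src %s",
			subnet, dev, src);
}
```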

Lustre-change: https://review.whamcloud.com/51921
Lustre-commit: 7f60b2b5580f67ca55e53a78dbaf7d50b5b7ab47

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I84a299c8b7eb4cdb4fc24408a1e42ad0283d9219
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52190
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months ago LU-17103 lnet: Avoid deadlock when updating ping buffer
Chris Horn [Mon, 25 Sep 2023 19:03:20 +0000 (14:03 -0500)]
LU-17103 lnet: Avoid deadlock when updating ping buffer

lnet_peer_send_push() adds a reference to the the_lnet.ln_ping_target
lnet_ping_buffer. This reference is dropped by
lnet_discovery_event_handler. When the LNet configuration is modified
the ln_api_mutex is held and lnet_ping_target_update() is called to
update the ln_ping_target to reflect the new configuration.
While holding the ln_api_mutex, lnet_ping_target_update() will wait
until all refs on the old ping buffer are dropped. This can result
in a deadlock if the ln_api_mutex is required to complete the push.

Here is one scenario where this can happen:

1. PUSH is sent by the discovery thread.
2. LNet configuration is modified. The lnetctl process is holding
   ln_api_mutex and waiting in lnet_ping_target_update().
3. Local NI goes into recovery.
4. Monitor thread wakes and attempts to send a ping to the local NI.
   If this is the first ping sent to this NI, the monitor thread needs
   ln_api_mutex to create a peer NI object for the local NI:
     LNetGet ->
       lnet_send ->
         lnet_select_pathway ->
           lnet_peerni_by_nid_locked ->
             mutex_lock(&the_lnet.ln_api_mutex)
5. PUSH (1) fails with a local timeout. It is placed on the monitor
   thread's resend queue.
6. The monitor thread cannot process the resend queue until it
   acquires ln_api_mutex. ln_api_mutex cannot be acquired until the
   monitor thread processes the resend queue. Deadlock.

Fix is to drop ln_api_mutex before calling lnet_ping_md_unlink() in
lnet_ping_target_update(). This should ensure that updates to the
ping target are still synchronized via ln_api_mutex as intended, but
we're able to clear refs on the old ping buffer as needed.
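A userspace pthread model of the fix, not LNet code: api_mutex stands
in for ln_api_mutex, and struct ping_buf, monitor_drop_ref() and
ping_target_update() are hypothetical. The target is swapped while the
mutex is held, but the mutex is released while waiting for the old
buffer's references to drain, so a reference holder that needs the
mutex can still finish.

```c
#include <pthread.h>

/* Hypothetical ping buffer with a reference count. */
struct ping_buf {
	int refs;
};

static pthread_mutex_t api_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ref_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ref_cv = PTHREAD_COND_INITIALIZER;

/* Models the monitor thread: it needs api_mutex before dropping its ref. */
static void *monitor_drop_ref(void *arg)
{
	struct ping_buf *buf = arg;

	pthread_mutex_lock(&api_mutex);
	pthread_mutex_lock(&ref_lock);
	buf->refs--;
	pthread_cond_broadcast(&ref_cv);
	pthread_mutex_unlock(&ref_lock);
	pthread_mutex_unlock(&api_mutex);
	return NULL;
}

/* Called with api_mutex held; returns with api_mutex held again. */
static void ping_target_update(struct ping_buf **target,
			       struct ping_buf *new_buf)
{
	struct ping_buf *old = *target;

	*target = new_buf;		/* update serialized by api_mutex */

	pthread_mutex_unlock(&api_mutex);	/* the fix: drop it ... */
	pthread_mutex_lock(&ref_lock);
	while (old->refs > 0)			/* ... while refs drain */
		pthread_cond_wait(&ref_cv, &ref_lock);
	pthread_mutex_unlock(&ref_lock);
	pthread_mutex_lock(&api_mutex);
}
```

Keeping api_mutex held across the wait loop reproduces the deadlock in
this model: monitor_drop_ref() blocks on api_mutex and the reference
count never reaches zero.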

Lustre-change: https://review.whamcloud.com/52479/
Lustre-commit: 3ca6ba39a21cfebc81bbe7f889c486bb82bb563a

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=207,ONLY_REPEAT=50
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I20cda185a865192f1ad162eaef1b8b4e5d751b2c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52934
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months ago LU-17248 kernel: add SB_I_STABLE_WRITES to bdev sb flag
Li Dongyang [Wed, 1 Nov 2023 11:36:10 +0000 (22:36 +1100)]
LU-17248 kernel: add SB_I_STABLE_WRITES to bdev sb flag

Since RHEL 8.6, wait_for_stable_page() is controlled by
a new superblock flag, SB_I_STABLE_WRITES.

However, the new flag is not set on the bdev pseudo superblock,
which means that writes issued directly to the block device
do not wait for page writeback. This can trigger false block
integrity errors: a page can be modified again while it is
under writeback, so the integrity checksum no longer matches
the new data.

Lustre-change: https://review.whamcloud.com/52922
Lustre-commit: TBD (from 5aeffdbec699abad07ed2326723c7743faadbf8a)

Change-Id: Ie088abf29f40b294c31f993bcfad56d6081a3fce
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52969
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago LU-17235 o2iblnd: adding alias ib interface causes crash
Serguei Smirnov [Mon, 30 Oct 2023 19:13:45 +0000 (12:13 -0700)]
LU-17235 o2iblnd: adding alias ib interface causes crash

Commit 09c6e2b872 (LU-16836) causes the o2iblnd startup routine to
crash when an alias IB interface is used:

        ifconfig ib0:0 10.1.0.52 up
        modprobe lnet
        lnetctl lnet configure
        lnetctl net add --net o2ib --if ib0:0

Fix the code that sets the NI status on startup so that it gracefully
handles the case where the corresponding net_device is not found.

Lustre-change: https://review.whamcloud.com/52894/
Lustre-commit: TBD (from 26a00e20fad0cd7871c30fe65653415566b498dc)

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 09c6e2b872 ("LU-16836 lnet: ensure dev notification")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iaef9280a10f27ac28b872d9f4bc119c4d459ef40
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52910
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago EX-7584 ptlrpc: define nrs_orr_object.oo_ref atomic_t
Lei Feng [Fri, 9 Jun 2023 03:04:37 +0000 (11:04 +0800)]
EX-7584 ptlrpc: define nrs_orr_object.oo_ref atomic_t

nrs_orr_object.oo_ref is a reference counter but is not an atomic
type. nrs_trr_hash_ops.hs_put() is set to nrs_orr_hop_put(), which
decrements oo_ref without any protection. Change it to atomic_t to
eliminate this potential race condition.
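A userspace analogue of the atomic reference count, using C11
<stdatomic.h> in place of the kernel's atomic_t (where
atomic_dec_and_test() plays the role of orr_put() below). struct
orr_object, orr_get() and orr_put() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object carrying a reference count. */
struct orr_object {
	atomic_int oo_ref;
};

static void orr_get(struct orr_object *obj)
{
	atomic_fetch_add(&obj->oo_ref, 1);
}

/*
 * Drop a reference. atomic_fetch_sub() returns the previous value, so
 * exactly one caller observes the 1 -> 0 transition and may free the
 * object, even when several threads drop references concurrently.
 */
static bool orr_put(struct orr_object *obj)
{
	return atomic_fetch_sub(&obj->oo_ref, 1) == 1;
}
```

With a plain int, two concurrent "oo_ref--" operations can both read
the same old value, losing one decrement or letting two callers both
believe they released the last reference.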

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=77d,ONLY_REPEAT=100
Change-Id: I69d27eebdddab79d7dd7e279391cd841e438b5d3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52948
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago LU-13941 osp: Silently lower requested create_count to maximum
Shaun Tancheff [Mon, 23 Aug 2021 14:40:39 +0000 (09:40 -0500)]
LU-13941 osp: Silently lower requested create_count to maximum

When create_count is set, silently accept a larger value and
lower it to the current maximum.

This avoids problems if that limit changes in the future.
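A minimal sketch of "clamp instead of reject". CREATE_COUNT_MAX and
set_create_count() are hypothetical, and rejecting zero or negative
values is an assumption of this sketch rather than something stated in
the commit.

```c
#include <errno.h>

/* Hypothetical limit; the real maximum lives in the OSP code. */
#define CREATE_COUNT_MAX 20000

static int set_create_count(unsigned int *create_count, long requested)
{
	if (requested <= 0)
		return -EINVAL;	/* assumption: nonsensical values still fail */

	/* Silently lower oversized requests instead of returning an error. */
	if (requested > CREATE_COUNT_MAX)
		requested = CREATE_COUNT_MAX;

	*create_count = (unsigned int)requested;
	return 0;
}
```

Because oversized values are accepted, a tuning script written for a
larger future limit keeps working against a server with a smaller one.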

Lustre-change: https://review.whamcloud.com/39967
Lustre-commit: 06e586016d3acc490f922e43e3aee6b8112a2803

HPE-bug-id: LUS-5960
Test-Parameters: trivial testlist=parallel-scale,sanity
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4727ba6fca747e1ae9850188ef63c7abb89904be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52967
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago EX-7600 osc: save compressed object size
Artem Blagodarenko [Wed, 11 Oct 2023 16:23:56 +0000 (12:23 -0400)]
EX-7600 osc: save compressed object size

CSDC relies on sparse files. A client writes each compressed data
chunk at its original (uncompressed) offset, so the same data is
expected to be read back from the same offsets.

Nothing is written after the last compressed chunk, so there is no
"hole" after it.

As a result, the compressed file size (based on the OST object sizes)
is smaller than the original by the delta "original last chunk size -
compressed last chunk size".

The object size should be set to the uncompressed size. This size is
used to calculate the file size, and it allows removing the workaround
of not compressing the last chunk in the file.
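A worked example of the size delta, as a small C sketch. Because each
chunk is written at its original offset, only the last chunk
determines how far the object extends. compressed_object_end() is a
hypothetical helper and assumes file_size > 0.

```c
#include <stdint.h>

/*
 * End of the data actually stored for a file of file_size bytes whose
 * last (possibly partial) chunk compressed to last_chunk_compressed_len.
 */
static uint64_t compressed_object_end(uint64_t file_size, uint64_t chunk_size,
				      uint64_t last_chunk_compressed_len)
{
	/* Start of the last chunk, rounded down to a chunk boundary. */
	uint64_t last_start = ((file_size - 1) / chunk_size) * chunk_size;

	return last_start + last_chunk_compressed_len;
}
```

For a 200 KiB file with 64 KiB chunks, the last chunk starts at 192 KiB
and holds 8 KiB. If it compresses to 3 KiB, the stored data ends at
195 KiB, so a size derived from the objects is 5 KiB (8 KiB - 3 KiB)
short of the real 200 KiB.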

Fixes: caee1c5 ("EX-7818 osc: don't check for start inside the chunk")
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I387c282e1cf788c3b8f6230ef555d73ffffe49c1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51595
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago EX-7849 quota: extra debug messages
Sergey Cheremencev [Tue, 24 Oct 2023 23:55:20 +0000 (03:55 +0400)]
EX-7849 quota: extra debug messages

Add extra debug messages to qmt to help find the
root cause of this panic:

  qmt_id_lock_glimpse()) ASSERTION( lqe->lqe_gl )

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I05377222e1887b660f759ed11de53cd9e4023ed1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52906
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago LU-16868 tests: skip conf-sanity/66 in interop
Andreas Dilger [Tue, 31 Oct 2023 07:36:49 +0000 (01:36 -0600)]
LU-16868 tests: skip conf-sanity/66 in interop

Do not run conf-sanity.sh test_66* in interop testing.  Otherwise,
it is possible that the version of the test script running on the
client does not perform the upgrades with the right steps needed
for remote servers that are running a different version.

Lustre-change: https://review.whamcloud.com/52899
Lustre-commit: TBD (from 774e626146ddcbeb527c0939e0210f92bab4c6c3)

Test-Parameters: trivial testlist=conf-sanity env=ONLY=66
Test-Parameters: testlist=conf-sanity env=ONLY=66 serverversion=2.12.9
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7b28b5f123a7348f87d43c54c806eaf6173ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52900
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
19 months ago EX-5258 lipe: add hidden option -show_counters
Vitaliy Kuznetsov [Fri, 3 Nov 2023 15:39:14 +0000 (16:39 +0100)]
EX-5258 lipe: add hidden option -show_counters

This patch adds a hidden output policy option
to lipe_find3 that shows the total number of
inodes scanned and the number of inodes that
matched the filters.

Usage example:
lipe_find3 /dev/nvme1n1 -path dir-1/* -show-counters
Output:
scanned: 1460
matched: 200

Test-Parameters: trivial
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Ibcd22a94e01ea6322cd38fd414e6058314aac8ef
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52937
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago EX-8441 lipe: lamigo fix compatibility with older lustre
Alexandre Ioffe [Tue, 31 Oct 2023 05:49:23 +0000 (22:49 -0700)]
EX-8441 lipe: lamigo fix compatibility with older lustre

- lfs mirror extend may print its help text to either stdout or
  stderr; lamigo now handles both cases
- Exit the loop correctly when a remote ssh session fails
- Skip hot-pools tests 75a, 75b, 75c if lfs mirror extend
  does not support --stats-interval
- Minor code fixes

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iba035043bc4868e7898f3739d03607d5d3e21574
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52898
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago RM-620 build: New tag 2.14.0-ddn111
Andreas Dilger [Fri, 27 Oct 2023 22:03:50 +0000 (16:03 -0600)]
RM-620 build: New tag 2.14.0-ddn111

New tag 2.14.0-ddn111

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I95b2ba340e1e65cee4b661b8d986d365863fa633

19 months ago EX-8421 llite: remove LBUG() from ll_readpage()
Patrick Farrell [Fri, 27 Oct 2023 20:39:22 +0000 (16:39 -0400)]
EX-8421 llite: remove LBUG() from ll_readpage()

This LBUG() is sometimes hit in sanity-PCC, which
means EX-8421 is not completely fixed.

Until we can fully sort out EX-8421, we don't want to have
this LBUG enabled on customer systems.  The underlying bug
has been present for some time and the first attempt at an
EX-8421 fix improves the situation.

So, remove the LBUG(), with the intent of putting it back
later once EX-8421 is fixed for real.

Fixes: 3a701bf587 ("EX-8421 llite: disable kernel readahead for pcc mmap")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I21f951d38f67b37626f33068d2a4b64377f4c46a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52858
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago EX-8236 pcc: abort in-progress attach by PCC detach command
Qian Yingjin [Thu, 14 Sep 2023 09:23:41 +0000 (05:23 -0400)]
EX-8236 pcc: abort in-progress attach by PCC detach command

A user may want to abort an in-progress attach, for example to
free space on the PCC backend.
To support this, add an "abort" option to the PCC detach command
that aborts the in-progress attach.

Change-Id: I49fb1c42838f8d7e9728a5c4c6f3d60e959b233b
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52375
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago EX-8027 pcc: add --wait option for PCC detach command
Qian Yingjin [Thu, 14 Sep 2023 08:18:23 +0000 (04:18 -0400)]
EX-8027 pcc: add --wait option for PCC detach command

This patch adds a "--wait" option to the PCC detach command.
With this option, PCC detach waits for any in-progress attach on
the file to finish.
Add sanity-pcc/test_107 to verify it.

Change-Id: I63d52d514884b15a7b534d0f03deee441a12d3f1
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52374
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago EX-8027 pcc: wait for attach finished for detach command
Qian Yingjin [Thu, 14 Sep 2023 06:43:32 +0000 (02:43 -0400)]
EX-8027 pcc: wait for attach finished for detach command

When detaching a file from a PCC backend, the file may still be in
the attaching state. In that case, add a flag
(PCC_DEATCH_FL_ATTACHING_WAIT) to wait for the attach to finish, then
retry the detach.

Change-Id: If85d95be744e3f7d6a07f880e78de5b68b579ed6
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52373
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago EX-7849 tests: add "+quota" to racer
Sergey Cheremencev [Wed, 25 Oct 2023 15:26:16 +0000 (18:26 +0300)]
EX-7849 tests: add "+quota" to racer

Temporarily set the "quota" debug level in racer.sh. This
should be removed once debugging is finished.

Test-Parameters: trivial
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ie03b8f51bd3298d272d78447f5e6ff6969901886
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52831
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago LU-17138 enc: prefer specific crypto engines
Sebastien Buisson [Fri, 22 Sep 2023 16:19:34 +0000 (18:19 +0200)]
LU-17138 enc: prefer specific crypto engines

Some ciphers provided by external accelerators might register under
the generic cipher name. To avoid using them with Lustre, prefer the
AES-NI variant implemented directly in the CPU, and fall back to the
generic cipher if AES-NI is not available.

Introduce a new libcfs kernel module parameter named
'client_encryption_engine' to allow choosing the cipher.
By default its value is 'aes-ni', which makes Lustre look for the
AES-NI cipher first. This parameter can be set to 'system-default',
which makes Lustre pick the generic cipher.
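A sketch of the selection order described above. pick_cipher(), the
is_available() callback and the implementation names are hypothetical
placeholders for the kernel crypto API lookup; only the two parameter
values 'aes-ni' and 'system-default' come from this commit.

```c
#include <stdbool.h>
#include <string.h>

/*
 * "aes-ni" (the default) tries the CPU-backed implementation first and
 * falls back to the generic cipher; any other value, such as
 * "system-default", goes straight to the generic cipher.
 */
static const char *pick_cipher(const char *engine_param,
			       bool (*is_available)(const char *name),
			       const char *aesni_name,
			       const char *generic_name)
{
	if (strcmp(engine_param, "aes-ni") == 0 && is_available(aesni_name))
		return aesni_name;

	return generic_name;
}
```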

Lustre-change: https://review.whamcloud.com/52477
Lustre-commit: 056eb9dcc0d5f80451c400342d54037f6de24bd9

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8b00f1c3c8dcf11c58e9f40a410b57b2f255e642
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52828
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago LU-16623 lod: handle object allocation consistently
Andreas Dilger [Wed, 8 Mar 2023 23:40:21 +0000 (16:40 -0700)]
LU-16623 lod: handle object allocation consistently

Consistently handle the various OS_STATFS_* flags that indicate
an OST or MDT is full or otherwise marked ineligible for use.

Fix lod_statfs_check() so it skips MDTs with OS_STATFS_ENOINO
for allocating dir stripes instead of only checking OST targets.

In the LOD code, ltd_active=0 indicates that the device is not
usable for new object allocations for a variety of reasons. That
includes out of space or inodes, read-only, max_create_count=0,
or disconnected export, not *only* that the OSP is disconnected
from the OST as with imp_deactive.  Targets marked ltd_active=0
will not be counted in ld_active_tgt_count, so these OSTs will
not count toward stripe_count for stripe_count=-1 files.

Set flags = LOD_USES_DEFAULT_STRIPE in lod_qos_prep_create() for
stripe_count = -1 layouts and pass it to lod_stripe_count_min()
to avoid requiring *all* OSTs when free space is imbalanced or OSTs
are not available, and accept allocations on 3/4 of the OSTs.
It looks like this functionality was missed when object allocations
transitioned from the LOV to LOD module.  Put the LOV_USES_* into
an enum and rename to LOD_USES_* for consistency with current code.
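A sketch of the "3/4 of the OSTs" minimum for default-striped files.
The enum name and flag value are hypothetical, and the rounding,
stripe_count - stripe_count / 4, which rounds the 3/4 up, is an
assumption of this sketch rather than something stated in the commit.

```c
/* Hypothetical enum; the real LOD_USES_* values live in the LOD code. */
enum lod_uses_hint {
	LOD_USES_DEFAULT_STRIPE = 0x1,
};

/*
 * Minimum number of stripes an allocation may settle for. Layouts that
 * came from stripe_count = -1 accept about 3/4 of the requested OSTs;
 * explicitly assigned stripe counts must be met in full.
 */
static unsigned int stripe_count_min(unsigned int stripe_count,
				     enum lod_uses_hint flags)
{
	if (flags & LOD_USES_DEFAULT_STRIPE)
		return stripe_count - stripe_count / 4;

	return stripe_count;
}
```

With 8 usable OSTs a default-striped file may therefore be created with
6 stripes, while a file that explicitly asked for 8 still needs all 8.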

Apply the lod.*.max_stripe_count limits to PFL components as well
as plain file layouts in lod_comp_entry_stripe_count().

Rename ltd_connecting to ltd_discon, since there is no guarantee
that this target is actually *connecting*, only that it is currently
disconnected.  Use ltd_discon in places that checked ltd_active to
decide if the OSP was disconnected from the OST, which shouldn't be
skipped just because the OST is full or has creates disabled.

Lustre-change: https://review.whamcloud.com/50250
Lustre-commit: ced540165ef573570b8a8cba6e43f79e5fc6539f

LU-16981 lod: update llc_stripe_count after ost inactive

If an OST gets deactivated while lod_ost_alloc_qos() is trying to
allocate stripes for a file create, then normally this is caught and
EAGAIN is returned, which causes lod_comp->llc_stripe_count to be
updated to accurately reflect the stripe count. However, there is a
race: if the OST is deactivated after the call to
ltd_qos_is_usable() but before the stripes are allocated, then the
stripe count is never updated.

This causes an LBUG later in lod_striped_create() because fewer
stripes are allocated than the number in llc_stripe_count so it
finds a stripe that is NULL.

The solution is to properly update lod_comp->llc_stripe_count when
the number of stripes created is less than expected.

Lustre-change: https://review.whamcloud.com/51759
Lustre-commit: 78336aa166f4a7a0128a5891c747eecf26ff9565

Test-Parameters: testlist=sanity env=ONLY=27V,ONLY_REPEAT=100
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Fixes: 7b124fef76 ("LU-4277 lod: handle os_state as a flag, check READONLY")
Fixes: 5b147e47de ("LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms")
Fixes: c7f2e70a27 ("LU-1303 lod: QoS allocation policy")
Fixes: c1d0a355a6 ("LU-12624 lod: alloc dir stripes by QoS")
Fixes: 3c9580931d ("LU-9162 lod: option to set max stripe count per filesystem")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ifb9443fe6c80b4d7f82b442060db7ac8423ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52729
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months ago LU-16334 llite: update statx size/ctime for fallocate
Qian Yingjin [Wed, 23 Nov 2022 07:44:47 +0000 (02:44 -0500)]
LU-16334 llite: update statx size/ctime for fallocate

The VFS ->fallocate() method should update i_size and i_ctime, as
returned by statx(), when the file size grows.

Add sanity/150h.

A fallocate() call does not update the attributes on the MDT.
The test uses STATX in cached-always mode to verify this, because that
mode does not send glimpse lock RPCs to the OSTs to obtain the file
size, and uses the attributes (size) cached on the client side as much
as possible.
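A sketch of the attribute update described above. struct sketch_inode,
SKETCH_FALLOC_FL_KEEP_SIZE and fallocate_update_attrs() are
hypothetical stand-ins for the inode fields and the FALLOC_FL_KEEP_SIZE
mode bit, not the llite code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for FALLOC_FL_KEEP_SIZE. */
#define SKETCH_FALLOC_FL_KEEP_SIZE 0x01

struct sketch_inode {
	int64_t i_size;
	time_t i_ctime;
};

/*
 * After fallocate() of [offset, offset + len) succeeds, grow the cached
 * size (unless the caller asked to keep the size) and bump ctime, so a
 * statx() served from cached attributes sees the new values.
 * Returns true if the attributes changed.
 */
static bool fallocate_update_attrs(struct sketch_inode *inode, int mode,
				   int64_t offset, int64_t len, time_t now)
{
	int64_t end = offset + len;

	if ((mode & SKETCH_FALLOC_FL_KEEP_SIZE) || end <= inode->i_size)
		return false;

	inode->i_size = end;
	inode->i_ctime = now;
	return true;
}
```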

Lustre-change: https://review.whamcloud.com/49221
Lustre-commit: 51851705e936b2dbc9cf141ecf7ab4e3be04333a

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ib8128892222a01cd00250c704328bd13cfb12e2d
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52736
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months ago EX-8353 csdc: remove holes from struct ll_compr_hdr
Jian Yu [Thu, 19 Oct 2023 18:01:59 +0000 (11:01 -0700)]
EX-8353 csdc: remove holes from struct ll_compr_hdr

This patch reorganizes struct ll_compr_hdr to remove
alignment holes.
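An illustration of alignment holes using a hypothetical three-field
layout. This is not the real struct ll_compr_hdr, whose fields are not
listed here. Tools such as pahole report these holes. The padding
noted in the comments is for typical 64-bit ABIs; the reordered struct
is smaller on common 32-bit ABIs as well.

```c
#include <stdint.h>

/* Small members around a large one leave padding on both sides. */
struct hdr_with_holes {
	uint8_t  type;		/* 1 byte, then a 7-byte hole */
	uint64_t offset;
	uint8_t  level;		/* 1 byte, then 7 bytes of tail padding */
};

/* Same members, largest first: the small members share one slot. */
struct hdr_reordered {
	uint64_t offset;
	uint8_t  type;
	uint8_t  level;		/* 6 bytes of tail padding */
};
```

On x86_64 this shrinks the struct from 24 to 16 bytes, which matters
for a header stored with every compressed chunk.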

Test-Parameters: trivial
Change-Id: I59800b00e3a17972d621bae21ba06509a39b1036
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago EX-7331 sec: disable compression for encrypted files
Sebastien Buisson [Wed, 18 Oct 2023 15:34:02 +0000 (17:34 +0200)]
EX-7331 sec: disable compression for encrypted files

When a read-modify-write I/O pattern is carried out on a compressed
file, it has to be handled on the server side.
But because encryption cannot be done on the server side for security
reasons, this kind of I/O pattern cannot be handled if the file is
both encrypted and compressed.
So simply disable compression for all encrypted files.

Fixes: eb70ba19e9 ("EX-7331 sec: add support for encryption plus compression")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I84881fb1235f015d022751d4cce2d43a7231c2b4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52746
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months ago LU-15404 ldiskfs: fix truncate during setxattr for el7.9
Andreas Dilger [Thu, 15 Jun 2023 19:32:05 +0000 (13:32 -0600)]
LU-15404 ldiskfs: fix truncate during setxattr for el7.9

Backport the ext4-delayed-iput.patch to rhel7.9 kernels so the
delayed osd-ldiskfs truncate can use s_misc_wq consistently.

This moves the final iput() call into a separate thread, so
setxattr transactions are never split in two. Since the setxattr
code adds xattr inodes with nlink=0 to the orphan list, old xattr
inodes are properly cleaned up in any case.

Lustre-change: https://review.whamcloud.com/51335
Lustre-commit: 471ce3d95651ca06209a76973cae3bbdb5b6aa2f

Test-Parameters: trivial serverdistro=el7.9
Fixes: e239a14001 ("LU-15404 ldiskfs: truncate during setxattr leads to kernel panic")
Change-Id: Idd70befa6a83818ece06daccf9bb6256813ebbe5
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52809
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
19 months ago LU-11912 tests: fix racing in force_new_seq_all
Li Dongyang [Mon, 23 Oct 2023 11:49:55 +0000 (22:49 +1100)]
LU-11912 tests: fix racing in force_new_seq_all

We run force_new_seq in parallel to reduce the time spent
consuming precreated objects.

However, this can race when multiple MDTs are on the
same MDS: a task may finish early for one MDT and
reset fail_loc to 0 on the MDS while other tasks
are still working on other MDTs.

Replace OBD_FAIL_OSP_FORCE_NEW_SEQ with a new param
prealloc_force_new_seq for osp, so we can control
the seq rollover individually for each osp device.

Lustre-change: https://review.whamcloud.com/52801
Lustre-commit: TBD (from af6dcd597d7f5134de553349c05091e51e0f3dd6)

Change-Id: I52dbd550564ca628a8a85c42951694d58b2b93a9
Fixes: 656fc937cf ("LU-11912 tests: consume precreated objects in parallel")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52802
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months ago LU-16966 osd: take trunc_lock for fallocate
Alex Zhuravlev [Mon, 16 Oct 2023 12:52:58 +0000 (15:52 +0300)]
LU-16966 osd: take trunc_lock for fallocate

As fallocate may need several transactions (or a restarted
transaction), we have to prevent any concurrent writes/truncates on
this object until fallocate supports 'restart-from-beginning': first
stop the transaction, then release the lock, then repeat (as the
write path does).

Lustre-change: https://review.whamcloud.com/52264
Lustre-commit: 51529fb57f85210e292a15c882cf25a4689ea77d

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0bf38b1886fbf24656b45fe0f87fcbad2227672a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>