Whamcloud - gitweb
fs/lustre-release.git
4 months ago LU-18387 kernel: new kernel [RHEL 9.5 5.14.0-503.14.1.el9_5] 48/56748/6
Jian Yu [Mon, 18 Nov 2024 07:41:16 +0000 (23:41 -0800)]
LU-18387 kernel: new kernel [RHEL 9.5 5.14.0-503.14.1.el9_5]

This patch makes changes to support the new RHEL 9.5 release
for the Lustre client.

Test-Parameters: trivial \
  mdtcount=4 mdscount=2 clientdistro=el9.5 testlist=sanity
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-3

Change-Id: I1bce12b2b7190bcbd880916049667630aba700c8
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56748
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-16907 ptlrpc: correct the reply buffer size for batch RPC 45/56645/4
Qian Yingjin [Thu, 10 Oct 2024 09:48:19 +0000 (17:48 +0800)]
LU-16907 ptlrpc: correct the reply buffer size for batch RPC

The calculation that grows the reply buffer size for a batch RPC is
incorrect: it adds the sub-request size wrongly.
This may result in the following panic:
"Max IOV exceeded: 257 should be < 256"
Fix it accordingly.

Fixes: 5a2dfd36f9c ("LU-14139 ptlrpc: grow PtlRPC properly when prepare sub request")
Test-Parameters: testlist=sanity env=ONLY=123f,ONLY_REPEAT=10
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I3c5151a485cac7f3fb9384cd9fb022143ca3389d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56645
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17660 tests: test symlink file to existing dir 39/56639/9
Feng Lei [Thu, 10 Oct 2024 01:18:59 +0000 (09:18 +0800)]
LU-17660 tests: test symlink file to existing dir

Create a test case to verify LU-17660.

Symlink a file to an existing directory and expect the symlink entry
to be created under that directory. Report an error if the command
fails with "cannot overwrite directory".

This is fixed in kernels newer than v6.9-rc4-39-gbb32cded3be2, or in
kernels that carry a backport of the fix.

Signed-off-by: Feng Lei <flei@whamcloud.com>
Test-Parameters: trivial
Test-Parameters: clientdistro=el8.10 testlist=sanity env=ONLY=17p
Test-Parameters: clientdistro=el9.4 testlist=sanity env=ONLY=17p
Test-Parameters: clientdistro=sles15sp5 testlist=sanity env=ONLY=17p
Test-Parameters: clientdistro=sles15sp6 testlist=sanity env=ONLY=17p
Test-Parameters: clientdistro=ubuntu2204 testlist=sanity env=ONLY=17p
Test-Parameters: clientdistro=ubuntu2404 testlist=sanity env=ONLY=17p
Change-Id: I905813c26e78ae3e6df4f88af10ab3f0c596a59b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56639
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-18279 obdclass: fix class_add_nids_to_uuid 41/56541/2
Sergey Cheremencev [Mon, 30 Sep 2024 13:52:05 +0000 (16:52 +0300)]
LU-18279 obdclass: fix class_add_nids_to_uuid

Fix class_add_nids_to_uuid() to use all 32 (MTI_NIDS_MAX)
elements of the lnet_nid array instead of only
31 (MTI_NIDS_MAX - 1).
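The off-by-one is easiest to see in a stripped-down copy loop. This is an illustrative sketch only: `struct nid_slot` and `copy_nids()` are hypothetical and do not match the real class_add_nids_to_uuid() code; only the value MTI_NIDS_MAX = 32 comes from the text above.

```c
#include <stddef.h>

#define MTI_NIDS_MAX 32	/* array size given in the commit message */

/* Hypothetical stand-in for struct lnet_nid. */
struct nid_slot {
	unsigned long long addr;
};

/*
 * Copy up to MTI_NIDS_MAX NIDs from src into dst and return the count.
 * Bounding the loop with MTI_NIDS_MAX - 1 (the old behaviour described
 * above) would silently leave the last slot unused.
 */
static size_t copy_nids(struct nid_slot dst[MTI_NIDS_MAX],
			const struct nid_slot *src, size_t nsrc)
{
	size_t i;

	for (i = 0; i < nsrc && i < MTI_NIDS_MAX; i++)
		dst[i] = src[i];
	return i;
}
```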

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ibbedda6e28e6c26b11ae95d89ad31afd812c559f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56541
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-18273 tests: do reboot and failback in several attempts 14/56514/2
Elena Gryaznova [Fri, 27 Sep 2024 10:44:01 +0000 (13:44 +0300)]
LU-18273 tests: do reboot and failback in several attempts

Instead of tuning ha_failback_delay to different values on
different clusters, simply retry "failover/failback" several
times. This avoids test failures that are unrelated to
Lustre.
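The retry idea amounts to a bounded loop. The real change is expected to live in Lustre's shell test scripts; this C version is only a language-neutral illustration, and `retry_failback()` and `succeed_on_call()` are hypothetical names.

```c
#include <stdbool.h>

/*
 * Run a failover/failback step up to max_attempts times.  step() is a
 * hypothetical callback returning true on success; ctx is passed through.
 * Returns the 1-based attempt that succeeded, or 0 if every attempt failed.
 */
static int retry_failback(bool (*step)(void *ctx), void *ctx,
			  int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (step(ctx))
			return attempt;
	}
	return 0;
}

/* Illustrative step: fails until it has been called *ctx times. */
static bool succeed_on_call(void *ctx)
{
	int *remaining = ctx;

	return --(*remaining) <= 0;
}
```

Retrying a fixed number of times trades a slightly longer worst case for not having to pick a per-cluster delay.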

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11816
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I748e4e580408f7662b2c64576af36637cfe46ef3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56514
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18078 quota: check version to force_reint if needed 13/56513/6
Hongchao Zhang [Thu, 3 Oct 2024 12:02:21 +0000 (20:02 +0800)]
LU-18078 quota: check version to force_reint if needed

If the quota setting versions on the QMT and QSD do not match,
a new quota request is deferred, and it is never applied if
the missing update carrying that version was dropped.
This patch checks the version while processing incoming
quota LDLM glimpse requests and inside the QSD writeback thread,
and triggers force_reint if the missing version has not
arrived in time.
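One way to picture the version check: the slave tracks the last version it applied, notices a gap when a newer-than-next version arrives, and forces reintegration if the gap persists past a timeout. This is a hypothetical sketch; `struct qsd_version_state`, `need_force_reint()` and the timeout policy are assumptions, not the QSD code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-slave state; not the real QSD structures. */
struct qsd_version_state {
	uint64_t local_ver;	/* last quota setting version applied */
	time_t   gap_since;	/* when a version gap was first seen, 0 if none */
};

/*
 * Called with the version carried by an incoming update (e.g. a glimpse).
 * The caller is assumed to apply in-order updates itself and advance
 * local_ver.  Returns true when the missing version(s) between local_ver
 * and remote_ver have not arrived within `timeout` seconds.
 */
static bool need_force_reint(struct qsd_version_state *st,
			     uint64_t remote_ver, time_t now, time_t timeout)
{
	if (remote_ver <= st->local_ver + 1) {
		st->gap_since = 0;	/* next (or old) version: no gap */
		return false;
	}
	if (st->gap_since == 0) {
		st->gap_since = now;	/* first time the gap is noticed */
		return false;
	}
	return now - st->gap_since >= timeout;
}
```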

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I1b33cfbb594e3f1595580d4190fd77efb55bc627
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56513
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18249 o2iblnd: external mofed driver requires 46/56446/4
Shaun Tancheff [Sat, 21 Sep 2024 06:13:08 +0000 (13:13 +0700)]
LU-18249 o2iblnd: external mofed driver requires

Only the o2iblnd, when built for MOFED, should have an external
Requires on the external kernel module package.

Remove the Requires from the lustre-* packages and apply it only to the
package that includes the o2ib module when built with MOFED.

HPE-bug-id: LUS-12355
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I33133c112066f9dc07fa568594b93ccb7fccf746
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56446
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
4 months ago LU-18241 osc: skip repmsg processing in case of error 38/56438/2
Hongchao Zhang [Mon, 26 Aug 2024 00:06:22 +0000 (08:06 +0800)]
LU-18241 osc: skip repmsg processing in case of error

If the request has failed, the repmsg processing in osc_quotactl
should be skipped.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I0f4c6db93888b1000e6993411aeb3a147f05f5db
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56438
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17190 llite: release locks after all read-ahead pages submit 24/56324/3
Qian Yingjin [Thu, 5 Sep 2024 16:57:05 +0000 (00:57 +0800)]
LU-17190 llite: release locks after all read-ahead pages submit

All DLM extent locks acquired for read-ahead are put in a list.
Only after all read-ahead pages have been submitted does the client
release the DLM extent locks acquired during that read-ahead.
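A simplified model of the new ordering, assuming a singly linked list of held locks and a submit callback. `struct ra_lock`, `ra_lock_hold()`, `ra_submit_then_release()` and `count_submit()` are hypothetical and only stand in for the real DLM lock handling in llite/OSC.

```c
#include <stddef.h>

/* Hypothetical lock handle; the real code uses DLM lock handles. */
struct ra_lock {
	int id;
	int released;
	struct ra_lock *next;
};

/* Remember a lock matched for read-ahead instead of dropping it at once. */
static void ra_lock_hold(struct ra_lock **held, struct ra_lock *lk)
{
	lk->next = *held;
	*held = lk;
}

/*
 * Submit all queued read-ahead pages first, then release every held lock,
 * so the submitted extents are already visible on the reading list before
 * any lock can be cancelled.  Returns the number of locks released.
 */
static int ra_submit_then_release(struct ra_lock **held,
				  void (*submit)(void *io), void *io)
{
	int n = 0;

	submit(io);
	while (*held) {
		struct ra_lock *lk = *held;

		*held = lk->next;
		lk->released = 1;
		n++;
	}
	return n;
}

/* Illustrative submit callback: counts submissions in an int. */
static void count_submit(void *io)
{
	(*(int *)io)++;
}
```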

This way, by the time an extent lock blocking AST runs, all reading
extents have already been submitted and put into the @oo_reading_exts
list of the OSC object. The client can then check this list to find
conflicting outstanding extents when all I/O RPC slots (limited by
osc.*.max_rpcs_in_flight) are used up by direct I/Os that take
server-side locking.

In the original approach, the client matches a DLM extent lock, adds
read-ahead pages to the queue list, and releases the previously matched
lock, repeating this for the whole read-ahead before finally submitting
the I/O that contains all read-ahead pages (@osc_io_submit). Conflicting
extents in the OES_LOCK_DONE state may therefore be added to the
@oo_reading_exts list after the check in the blocking AST.
On the client side, the blocking AST triggered by server-side locking
for DIO tries to lock the pages in these lock-done extents in order to
write back or discard the cached pages covered by the lock. However, all
pages in a lock-done extent are already locked (PG_locked), and these
extents are waiting for RPC slots while all RPC slots are used up
by DIO.
This can cause a deadlock.

This patch prepares for a follow-up patch that adds high-priority I/O
for the blocking AST. The client can check the @oo_reading_exts list
to find conflicting outstanding extents and put them into the HP list,
so that they are sent to the OSTs and handled as soon as possible,
avoiding the possible deadlock.

Change-Id: I5661607ecba3b6cbd6e29ae3fa14566a5ec045f1
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56324
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-15808 ptlrpc: ptlrpc_set_wait() use wait_woken 17/56317/9
Shaun Tancheff [Fri, 13 Sep 2024 02:31:08 +0000 (09:31 +0700)]
LU-15808 ptlrpc: ptlrpc_set_wait() use wait_woken

ptlrpc_set_wait() uses a potentially long-running wait condition,
ptlrpc_check_set(), which can itself block.

If it blocks during ptl_send_rpc(), it can trigger the
warning:
   do not call blocking ops when !TASK_RUNNING

NeilBrown <neilb@suse.de> suggested using wait_woken() instead.

Convert ptlrpc_set_wait() to use wait_woken(), similar to the
wait_woken() approach already used in ptlrpcd.

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I544550db58fa2e89ce18a8a43a64fdea7ed57206
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56317
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months ago LU-12515 ofd: allow setting a target to readonly 04/56304/10
Ronnie Sahlberg [Fri, 6 Sep 2024 04:29:30 +0000 (00:29 -0400)]
LU-12515 ofd: allow setting a target to readonly

Add a parameter to toggle read-only mode for an OFD,
making it reject all mutating commands with -EROFS.

This can be used to temporarily put a device into read-only mode
while identifying and correcting a misbehaving client.
Because this prevents clients from destaging data, a target should
not be kept in read-only mode for too long; otherwise clients will
eventually run out of kernel memory.

Example:
   lctl set_param obdfilter.lustre-OST0000.readonly=1
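A minimal sketch of the read-only gate: classify each request as mutating or not, and fail mutating ones with -EROFS while the flag is set. The opcode enum and helper names are hypothetical and are not taken from the OFD source.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of OFD request types; not the real opcode list. */
enum ofd_op {
	OFD_OP_READ, OFD_OP_GETATTR,
	OFD_OP_WRITE, OFD_OP_PUNCH, OFD_OP_CREATE,
	OFD_OP_DESTROY, OFD_OP_SETATTR,
};

/* Anything that is not a pure read is treated as mutating. */
static bool ofd_op_mutates(enum ofd_op op)
{
	switch (op) {
	case OFD_OP_READ:
	case OFD_OP_GETATTR:
		return false;
	default:
		return true;
	}
}

/* Return 0 if the request may proceed, -EROFS if the target is read-only. */
static int ofd_check_readonly(bool readonly, enum ofd_op op)
{
	return (readonly && ofd_op_mutates(op)) ? -EROFS : 0;
}
```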

Test-Parameters: trivial
Signed-off-by: Ronnie Sahlberg <rsahlberg@whamcloud.com>
Change-Id: Ia6658fb58aea17624d5c2ef2528696c4355e7b05
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56304
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18086 sec: do not allocate empty pools 71/56271/3
Sebastien Buisson [Fri, 30 Aug 2024 13:53:50 +0000 (15:53 +0200)]
LU-18086 sec: do not allocate empty pools

If the node does not have enough memory, we might end up allocating
empty pools for some orders. Prevent empty pools, as they are useless
and the rest of the code expects valid pools.
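A sketch of the guard, assuming the pool for order N holds 2^N-page chunks carved from the pages available to it; `struct page_pool` and `pool_create()` are hypothetical. Returning NULL lets the caller skip that order instead of keeping a useless empty pool.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical page pool for one allocation order. */
struct page_pool {
	unsigned int order;
	size_t nchunks;		/* number of 2^order-page chunks */
};

/*
 * Create a pool for `order` from avail_pages pages.  Returns NULL rather
 * than an empty pool, so every pool the rest of the code sees is usable.
 */
static struct page_pool *pool_create(unsigned int order, size_t avail_pages)
{
	size_t nchunks = avail_pages >> order;	/* chunks that fit */
	struct page_pool *pool;

	if (nchunks == 0)
		return NULL;
	pool = malloc(sizeof(*pool));
	if (!pool)
		return NULL;
	pool->order = order;
	pool->nchunks = nchunks;
	return pool;
}
```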

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9c5d9eeb557df0c91efecd6871654d1e48e5662a
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56271
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18170 obdclass: fix llog_print_cb() 56/56256/6
Etienne AUJAMES [Wed, 4 Sep 2024 12:22:07 +0000 (14:22 +0200)]
LU-18170 obdclass: fix llog_print_cb()

This patch gets rid of the static variables in llog_print_cb().
This makes it possible to run several instances of "lctl llog_print"
in parallel without memory corruption.

It also fixes the following:
- buffer overflow checks
- llog EOF detection
- ref leak in lod_iocontrol()
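The pattern behind the first two points can be sketched as moving callback state out of `static` variables into a caller-owned context, with an explicit bounds check before each append. `struct print_ctx` and `print_cb()` are hypothetical simplifications, not the real llog_print_cb() interface.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical per-invocation state.  Because each caller owns its own
 * struct, two concurrent runs no longer share one static buffer.
 */
struct print_ctx {
	char   buf[64];
	size_t used;	/* bytes written to buf so far */
	int    records;
};

/* Append one record, refusing with -EOVERFLOW instead of overrunning buf. */
static int print_cb(struct print_ctx *ctx, const char *rec)
{
	size_t left = sizeof(ctx->buf) - ctx->used;
	int n = snprintf(ctx->buf + ctx->used, left, "%s\n", rec);

	if (n < 0 || (size_t)n >= left) {
		ctx->buf[ctx->used] = '\0';	/* drop the truncated tail */
		return -EOVERFLOW;
	}
	ctx->used += (size_t)n;
	ctx->records++;
	return 0;
}
```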

Add regression test conf-sanity 123H.
conf-sanity 123ad is modified to support records with variable sizes.

Test-Parameters: testlist=conf-sanity env=ONLY=123
Test-Parameters: testlist=conf-sanity env=ONLY=123ad,ONLY_REPEAT=300
Test-Parameters: testlist=conf-sanity env=ONLY=123H,ONLY_REPEAT=20
Test-Parameters: testlist=replay-single env=ONLY=100d,ONLY_REPEAT=10
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ibd971e38392d01b2069d29cb4799fbc922d31684
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56256
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-4315 doc: updating l[c-h] man page style 02/56202/5
Frederick Dilger [Thu, 29 Aug 2024 20:49:37 +0000 (14:49 -0600)]
LU-4315 doc: updating l[c-h] man page style

Updating files to match the new code style for Lustre manual pages as
enforced by 'contrib/scripts/checkpatch-man.pl'.

This also includes other changes, such as removing < > or { } around
singular required arguments and placing [ ] around optional ones, as
well as making all arguments CAPITAL and italicized and making literal
arguments bold. Lines over 80 characters should be split at a natural
break, such as the end of a sentence, rather than at the word that goes
over the limit, since fewer lines need to be modified when making
changes if each sentence is on its own line.

Only features that appear in groff 1.22.3 are used, as this is the
version available in CentOS 8.

Checked files:
- lctl.8
- ldev.8
- ldev.conf.5
- lfs.1
- l_getidentity.8
- l_getsepol.8
- lgss_sk.8
- lhbadm.8

Test-Parameters: trivial
Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I77f4c6dd0e28cf09c076052a022885e30c2bbe6f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56202
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-4315 doc: updating llapi-[o-u] man page style 88/56188/4
Frederick Dilger [Wed, 28 Aug 2024 15:52:49 +0000 (09:52 -0600)]
LU-4315 doc: updating llapi-[o-u] man page style

Updating files to match the new code style for Lustre manual pages as
enforced by 'contrib/scripts/checkpatch-man.pl'.

This also includes other changes, such as removing < > or { } around
singular required arguments and placing [ ] around optional ones, as
well as making all arguments CAPITAL and italicized and making literal
arguments bold. Lines over 80 characters should be split at a natural
break, such as the end of a sentence, rather than at the word that goes
over the limit, since fewer lines need to be modified when making
changes if each sentence is on its own line.

Only features that appear in groff 1.22.3 are used, as this is the
version available in CentOS 8.

Checked files:
- llapi_open_by_fid.3
- llapi_open_by_fid_at.3
- llapi_param_get_paths.3
- llapi_param_get_value.3
- llapi_path2fid.3
- llapi_path2parent.3
- llapi_pcc_attach.3
- llapi_pcc_attach_fid.3
- llapi_pcc_attach_fid_str.3
- llapi_pcc_clear.3
- llapi_pcc_del.3
- llapi_pcc_detach_fid.3
- llapi_pcc_detach_fid_fd.3
- llapi_pcc_detach_fid_str.3
- llapi_pcc_detach_file.3
- llapi_pccdev_get.3
- llapi_pccdev_set.3
- llapi_pcc_state_get.3
- llapi_pcc_state_get_fd.3
- llapi_quotactl.3
- llapi_rmfid.3
- llapi_rmfid_at.3
- llapi_root_path_open.3
- llapi_search_mdt.3
- llapi_search_ost.3
- llapi_search_rootpath.3
- llapi_search_rootpath_by_dev.3
- llapi_search_tgt.3
- llapi_unlink_foreign.3

Test-Parameters: trivial
Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: Ia31335f543b306c4b2081401133555c5a55401e0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56188
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-4315 doc: update llapi-layout-[f-s] man page style 73/56173/4
Frederick Dilger [Tue, 27 Aug 2024 20:35:44 +0000 (14:35 -0600)]
LU-4315 doc: update llapi-layout-[f-s] man page style

Updating files to match the new code style for Lustre manual pages as
enforced by 'contrib/scripts/checkpatch-man.pl'.

This also includes other changes, such as removing < > or { } around
singular required arguments and placing [ ] around optional ones, as
well as making all arguments CAPITAL and italicized and making literal
arguments bold. Lines over 80 characters should be split at a natural
break, such as the end of a sentence, rather than at the word that goes
over the limit, since fewer lines need to be modified when making
changes if each sentence is on its own line.

Only features that appear in groff 1.22.3 are used, as this is the
version available in CentOS 8.

Checked files:
- llapi_layout_free.3
- llapi_layout_get_by_fd.3
- llapi_layout_get_by_fid.3
- llapi_layout_get_by_path.3
- llapi_layout_get_by_xattr.3
- llapi_layout_ost_index_get.3
- llapi_layout_ost_index_reset.3
- llapi_layout_ost_index_set.3
- llapi_layout_pattern_get.3
- llapi_layout_pattern_set.3
- llapi_layout_pool_name_get.3
- llapi_layout_pool_name_set.3
- llapi_layout_stripe_count_get.3
- llapi_layout_stripe_count_set.3
- llapi_layout_stripe_size_get.3
- llapi_layout_stripe_size_set.3

Test-Parameters: trivial
Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: Ifdf0d86e76898bf3b4e7febb40acecb8f12ad702
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56173
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-4315 doc: updating llapi-[cha] man page style 31/56131/3
Frederick Dilger [Thu, 22 Aug 2024 21:20:29 +0000 (15:20 -0600)]
LU-4315 doc: updating llapi-[cha] man page style

Updating files to match the new code style for Lustre manual pages as
enforced by 'contrib/scripts/checkpatch-man.pl'.

This also includes other changes, such as removing < > or { } around
singular required arguments and placing [ ] around optional ones, as
well as making all arguments CAPITAL and italicized and making literal
arguments bold. Lines over 80 characters should be split at a natural
break, such as the end of a sentence, rather than at the word that goes
over the limit, since fewer lines need to be modified when making
changes if each sentence is on its own line.

Only features that appear in groff 1.22.3 are used, as this is the
version available in CentOS 8.

Checked files:
- llapi_changelog_clear.3
- llapi_changelog_fini.3
- llapi_changelog_free.3
- llapi_changelog_get_fd.3
- llapi_changelog_in_buf.3
- llapi_changelog_recv.3
- llapi_changelog_set_xflags.3
- llapi_changelog_start.3

Test-Parameters: trivial
Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: Id2b6f92811b8ffbe23fc7a902a14360b8cf583c5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56131
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18155 lnet: use LASSERT/F instead of if () LBUG() 16/56116/3
Timothy Day [Thu, 22 Aug 2024 01:28:30 +0000 (21:28 -0400)]
LU-18155 lnet: use LASSERT/F instead of if () LBUG()

We should use a proper LASSERT statement rather than
more verbose if/LBUG blocks.

The patch has been generated with the coccinelle script below.
I manually inverted the conditional logic.

@@
expression L;
expression list F;
@@

- if (L) {
(
- CDEBUG(F);
|
- CWARN(F);
|
- CERROR(F);
|
- CEMERG(F);
|
- CNETERR(F);
|
- LCONSOLE(F);
|
- LCONSOLE_INFO(F);
|
- LCONSOLE_WARN(F);
|
- LCONSOLE_ERROR(F);
|
- LCONSOLE_EMERG(F);
)
- LBUG();
- }
+ LASSERTF(L, F);

@@
expression L;
@@

-if (L) LBUG();
+LASSERT(L);
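Why the conditions had to be inverted by hand: the original code crashes when L is true, whereas LASSERTF(cond, ...) crashes when cond is false, so a faithful conversion is LASSERTF(!L, F) or an equivalent positive condition. The sketch below uses userspace stand-ins for LBUG() and LASSERTF(); they are assumptions modeled on the "ASSERTION( ... ) failed:" message format seen in kernel logs, not the libcfs definitions, and `check_before()`/`check_after()` are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins for the libcfs macros, for illustration only. */
#define LBUG() abort()
#define LASSERTF(cond, fmt, ...)					\
	do {								\
		if (!(cond)) {						\
			fprintf(stderr, "ASSERTION( %s ) failed: " fmt,	\
				#cond, ##__VA_ARGS__);			\
			LBUG();						\
		}							\
	} while (0)

/* Before: crash when the refcount is negative. */
static int check_before(int refcount)
{
	if (refcount < 0) {
		fprintf(stderr, "bad refcount %d\n", refcount);
		LBUG();
	}
	return 0;
}

/* After: LASSERTF states what must hold, so "< 0" becomes ">= 0". */
static int check_after(int refcount)
{
	LASSERTF(refcount >= 0, "bad refcount %d\n", refcount);
	return 0;
}
```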

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I73eff22d6fd4199b02d3ae2cc7740aa754d6945d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56116
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18155 ptlrpc: use LASSERT/F instead of if () LBUG() 14/56114/5
Timothy Day [Thu, 22 Aug 2024 01:16:28 +0000 (21:16 -0400)]
LU-18155 ptlrpc: use LASSERT/F instead of if () LBUG()

We should use a proper LASSERT statement rather than
more verbose if/LBUG blocks.

The patch has been generated with the coccinelle script below.
I manually inverted the logic in the asserts.

@@
expression L;
expression list F;
@@

- if (L) {
(
- CDEBUG(F);
|
- CWARN(F);
|
- CERROR(F);
|
- CEMERG(F);
|
- CNETERR(F);
|
- LCONSOLE(F);
|
- LCONSOLE_INFO(F);
|
- LCONSOLE_WARN(F);
|
- LCONSOLE_ERROR(F);
|
- LCONSOLE_EMERG(F);
)
- LBUG();
- }
+ LASSERTF(L, F);

@@
expression L;
@@

-if (L) LBUG();
+LASSERT(L);

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If3b2d519239ec2c86bc940135346b37ebe1d050e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56114
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-4315 doc: updating lctl-[p-s] man page style 71/56071/8
Frederick Dilger [Thu, 15 Aug 2024 06:59:20 +0000 (00:59 -0600)]
LU-4315 doc: updating lctl-[p-s] man page style

Updating files to match the new code style for Lustre manual pages as
enforced by 'contrib/scripts/checkpatch-man.pl'.

This also includes other changes, such as removing < > or { } around
singular required arguments and placing [ ] around optional ones, as
well as making all arguments CAPITAL and italicized and making literal
arguments bold. Lines over 80 characters should be split at a natural
break, such as the end of a sentence, rather than at the word that goes
over the limit, since fewer lines need to be modified when making
changes if each sentence is on its own line.

Only features that appear in groff 1.22.3 are used, as this is the
version available in CentOS 8.

Checked files:
- lctl-pcc.8
- lctl-pool_add.8
- lctl-pool_new.8
- lctl-set_param.8
- lctl-snapshot-create.8
- lctl-snapshot-destroy.8
- lctl-snapshot-list.8
- lctl-snapshot-modify.8
- lctl-snapshot-mount.8
- lctl-snapshot-umount.8

Test-Parameters: trivial
Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: Icd30b49a3025aede26524b09cefa438bdca2ccc4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56071
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-16622 utils: 'lfs find --ost' supports index range 02/55902/5
Lei Feng [Tue, 30 Jul 2024 04:14:18 +0000 (12:14 +0800)]
LU-16622 utils: 'lfs find --ost' supports index range

The 'lfs find --ost' command can accept an index range. For example:
    lfs find --ost 0,1-3
is equivalent to:
    lfs find --ost 0,1,2,3
'lfs find --mdt' accepts index ranges as well.
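A sketch of how such a list could be expanded, assuming comma-separated decimal indices and inclusive `lo-hi` ranges; `parse_index_list()` is hypothetical and is not the parser used by lfs.

```c
#include <stdlib.h>

/*
 * Expand an index list such as "0,1-3" into idx[] (at most max entries).
 * Returns the number of indices, or -1 on a malformed list or overflow.
 */
static int parse_index_list(const char *s, int *idx, int max)
{
	int n = 0;

	while (*s) {
		char *end;
		long lo = strtol(s, &end, 10), hi;

		if (end == s || lo < 0)
			return -1;
		hi = lo;
		if (*end == '-') {		/* inclusive range lo-hi */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
		}
		for (long i = lo; i <= hi; i++) {
			if (n >= max)
				return -1;
			idx[n++] = (int)i;
		}
		if (*end == ',') {
			if (end[1] == '\0')	/* reject a trailing comma */
				return -1;
			end++;
		} else if (*end != '\0') {
			return -1;
		}
		s = end;
	}
	return n;
}
```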

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I094aa480c97440dad8ffaac8af902682b5916d6c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55902
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-16446 utils: specify total count for mirror extend 67/55867/11
Frederick Dilger [Wed, 24 Jul 2024 21:59:36 +0000 (15:59 -0600)]
LU-16446 utils: specify total count for mirror extend

The 'lfs mirror extend -N' option can now be used to specify the total
number of mirrors to create on a file by putting '=' in front of the
COUNT:
    'lfs mirror extend -N=TOTAL_COUNT'
or
    'lfs mirror extend --mirror-count==TOTAL_COUNT'

This is a no-op if the specified number of mirrors already exists on
that file.
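A sketch of the count logic. With getopt-style parsing, both `-N=3` and `--mirror-count==3` hand the parser the option argument `=3`, so a leading '=' is what marks a total. `mirrors_to_add()` is a hypothetical helper, and treating a count below 1 as invalid is an assumption.

```c
#include <stdlib.h>

/*
 * How many mirrors should be added for an -N argument?
 *   "3"  : add 3 mirrors.
 *   "=3" : make the total 3; add only the missing ones (0 if the file
 *          already has 3 or more, i.e. a no-op).
 * Returns the number to add, or -1 for a malformed count.
 */
static int mirrors_to_add(const char *arg, int existing)
{
	int is_total = (arg[0] == '=');
	const char *num = arg + is_total;
	char *end;
	long count = strtol(num, &end, 10);

	if (end == num || *end != '\0' || count < 1)
		return -1;
	if (!is_total)
		return (int)count;
	return count > existing ? (int)(count - existing) : 0;
}
```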

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I76fc416b4dc2c37edf99926bae9a8d42167a49ab
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55867
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-18035 tests: enhance root_prj_enable tests 57/55757/5
Sergey Cheremencev [Mon, 15 Jul 2024 15:29:24 +0000 (18:29 +0300)]
LU-18035 tests: enhance root_prj_enable tests

Add more test cases to sanity-quota_1i
"Enable project quota enforcement for root":
1. root can't write after enabling root_prj_enable
   if this project has hit its limit
2. root is still able to write at a different
   project ID directory with larger limit

Test-Parameters: testlist=sanity-quota env=ONLY=1i trivial
Fixes: b53438df70e ("LU-15109 tests: different quota and usage relations")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I58ce986221b2c970957a05408b20a70301e4f51e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55757
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17770 quota: don't panic in qmt_map_lge_idx 76/55476/6
Sergey Cheremencev [Mon, 15 Apr 2024 20:35:56 +0000 (23:35 +0300)]
LU-17770 quota: don't panic in qmt_map_lge_idx

There is a valid case in which an OST index cannot be
mapped to an index in the lqe global array
(lqe_gblb_array). This can happen when newly added
OSTs have not yet connected to the QMT, so there are
no corresponding index files in the quota_master/dt-0x0
directory. If these OSTs already exist in OST pools,
this can cause the following panic:

    qmt_map_lge_idx()) ASSERTION( k < lgd->lqeg_num_used )
        failed: Cannot map ostidx 32 for 000000000505fcbe
    qmt_map_lge_idx()) LBUG
    ...
    Call Trace TBD:
    libcfs_call_trace+0x6f/0xa0 [libcfs]
    lbug_with_loc+0x3f/0x70 [libcfs]
    qmt_map_lge_idx+0x7f/0x90 [lquota]
    qmt_seed_glbe_all+0x17f/0x770 [lquota]
    qmt_revalidate_lqes+0x213/0x360 [lquota]
    qmt_dqacq0+0x7d5/0x2320 [lquota]
    qmt_intent_policy+0x8d2/0xf10 [lquota]
    mdt_intent_opc+0x9a9/0xa80 [mdt]
    mdt_intent_policy+0x1fd/0x390 [mdt]
    ldlm_lock_enqueue+0x469/0xa90 [ptlrpc]
    ldlm_handle_enqueue0+0x61a/0x16c0 [ptlrpc]
    tgt_enqueue+0xa4/0x200 [ptlrpc]
    tgt_request_handle+0xc9c/0x1950 [ptlrpc]
    ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
    ptlrpc_main+0xbf1/0x1510 [ptlrpc]
    kthread+0x134/0x150
    ret_from_fork+0x1f/0x40
    Kernel panic - not syncing: LBUG

Add sanity-quota_91. It removes and creates quota slave
index files in quota_master/dt-0x0 to simulate adding
new OSTs in a system.
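A sketch of the non-fatal lookup, assuming a flat array of OST indices with `num_used` valid entries; `map_ost_idx()` and the -ENOENT return value are assumptions, not the real qmt_map_lge_idx() signature. The caller is expected to skip OSTs that have no slot yet instead of hitting the assertion shown in the trace above.

```c
#include <errno.h>

/*
 * idx[k] is the OST index stored in slot k; num_used slots are valid.
 * Return the slot for ostidx, or -ENOENT when the OST has no slot yet
 * (e.g. a new OST that has not connected to the QMT).
 */
static int map_ost_idx(const unsigned int *idx, int num_used,
		       unsigned int ostidx)
{
	for (int k = 0; k < num_used; k++)
		if (idx[k] == ostidx)
			return k;
	return -ENOENT;
}
```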

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I747366af736d408a8965812b48660cca1367becb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55476
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17831 llite: remove refcount assert 54/55054/2
Patrick Farrell [Wed, 8 May 2024 14:36:55 +0000 (10:36 -0400)]
LU-17831 llite: remove refcount assert

This refcount assert costs a few percent in page freeing,
and it comes immediately after a dec_and_test.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I8de3369cbbaf484e1e0fce27ae3e62cad1ae5282
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55054
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17831 llite: remove PageLocked assert 53/55053/2
Patrick Farrell [Wed, 8 May 2024 14:36:33 +0000 (10:36 -0400)]
LU-17831 llite: remove PageLocked assert

This PageLocked assert touches the spinlock and costs
about 5% of performance when dropping pages.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: Ie67f45fc0131d3fa94b70e3734ade2139c481301
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55053
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17431 nodemap: allow handling a nidrange for a dynamic nm 08/54508/18
Sebastien Buisson [Wed, 20 Mar 2024 14:45:41 +0000 (15:45 +0100)]
LU-17431 nodemap: allow handling a nidrange for a dynamic nm

Allow adding or deleting a NID range for a dynamic nodemap by sending
the appropriate ioctl to the MDS or OSS device.
On kernel side, we need to allow handling a range for a nodemap even
if the nodemap config file is not accessible (i.e. we are not on the
MGS). This dynamic nodemap is not stored on disk.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I34d6bb720cb700e23a3567a47def71c2b1ca7343
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54508
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17431 nodemap: allow add/del of a dynamic nodemap 07/54507/17
Sebastien Buisson [Wed, 20 Mar 2024 09:05:22 +0000 (10:05 +0100)]
LU-17431 nodemap: allow add/del of a dynamic nodemap

Adding a dynamic nodemap requires specifying the '-d' option to
'lctl nodemap_add'. This in turn sends the appropriate ioctl to
the MDS or OSS device.
Deleting a dynamic nodemap does not require this flag, as we can
figure out if the nodemap to be deleted is dynamic or not.
On kernel side, we need to allow handling a nodemap even if the
nodemap config file is not accessible (i.e. we are not on the MGS).
This dynamic nodemap is not stored on disk.
We also prevent modification of static, on-disk nodemaps on non-MGS
servers. On these servers only dynamic nodemaps can be
modified.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I10019ad131825c3db520689e01b4c931d90c1c89
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54507
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-10499 pcc: add some statistics data 58/54458/7
Lei Feng [Thu, 6 Jan 2022 01:34:15 +0000 (20:34 -0500)]
LU-10499 pcc: add some statistics data

Add statistics for the number and total size of PCC-attached files
and PCC hit files.

EX-bug-id: EX-4433
Change-Id: Ib0e429c636298d4c6ff06d84a416073895b86184
Signed-off-by: Lei Feng <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54458
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months ago LU-10499 pcc: command to remove PCC mirror component 57/54457/7
Qian Yingjin [Wed, 20 Oct 2021 04:07:14 +0000 (12:07 +0800)]
LU-10499 pcc: command to remove PCC mirror component

This patch adds a command "lfs pcc delete $FILE" to delete the
PCC foreign mirror layout component.

Describe the fields in the lfs-pcc-state.1 and lfs-pcc-delete.1 man pages.

EX-bug-id: EX-4055
Signed-off-by: Qian Yingjin <qian@ddn.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3f56fb8134bd1e7673ef8e04dff9b8482f0e32c3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54457
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-10499 pcc: Check if PCC copy is unlinked for state output 51/54451/8
Qian Yingjin [Wed, 29 Sep 2021 03:35:30 +0000 (11:35 +0800)]
LU-10499 pcc: Check if PCC copy is unlinked for state output

This patch adds support to the "lfs pcc state" command for
checking whether the PCC copy is in the local client cache or
was improperly unlinked.

Do not print an error message if "lfs pcc detach" tries to detach
a file that is already removed from the cache.  This might happen
for a wide variety of reasons (external cache cleanup process, etc).
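
As a rough illustration of tolerating a cache file that is already gone, here is a user-space sketch that assumes removal boils down to unlink(2). remove_cached_copy() is a hypothetical helper, not the actual "lfs pcc detach" code:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: remove a cached copy of a file. A copy that is
 * already gone (ENOENT) is treated as success and is not reported. */
static int remove_cached_copy(const char *path)
{
	int err;

	if (unlink(path) == 0)
		return 0;
	err = errno;	/* save before fprintf() can clobber errno */
	if (err == ENOENT)
		return 0;
	fprintf(stderr, "cannot remove '%s': %s\n", path, strerror(err));
	return -err;
}
```

Calling it twice on the same path succeeds both times and prints nothing.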

EX-bug-id: EX-3825
Test-Parameters: testlist=sanity-pcc env=ONLY=48
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ic50c901df78adfaf5b56990120f832e5d74a117c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54451
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months ago LU-18442 mdc: remove PageChecked usage in mdc 04/52504/3
Patrick Farrell [Tue, 12 Nov 2024 22:35:49 +0000 (17:35 -0500)]
LU-18442 mdc: remove PageChecked usage in mdc

PageChecked is set in mdc_readpage, and it's not clear why.
This was added in an early merge of a bunch of code, but
was never checked anywhere.

It's unclear what this ever did, but it's not doing
anything now.  Remove.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I59d40cc5cb9b437182db04b468f7dc222c275424
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52504
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17139 utils: l_getidentity remove 'files' => 'lustre' alias 87/52487/8
Shaun Tancheff [Sat, 23 Sep 2023 07:02:30 +0000 (02:02 -0500)]
LU-17139 utils: l_getidentity remove 'files' => 'lustre' alias

Fully remove the old 'files' alias for 'lustre' and deprecate
'nss_files' alias for 'files' now that it can be used directly.

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ied7cf87e9be8d474f9b65a9b5b14870578806151
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52487
Reviewed-by: corey tesdahl <corey.tesdahl@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-16235 hsm: get a valid cookie for RAoLU request 50/51850/11
Etienne AUJAMES [Wed, 2 Aug 2023 09:27:41 +0000 (11:27 +0200)]
LU-16235 hsm: get a valid cookie for RAoLU request

Add a way to get a valid cookie when nobody initializes
cdt_last_cookie.

The RAoLU policy is allowed to queue a remove request while the
coordinator is stopped. In that case, cdt_last_cookie may not be
initialized yet, and the remove request can be queued with a
conflicting cookie.

This patch adds cdt_update_last_cookie(), which processes the HSM
llog in reverse and stops at the first non-cancel action to determine
the last cookie.
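
The reverse scan can be sketched as follows. The record layout, the ACT_* action names and sketch_last_cookie() are hypothetical simplifications; the real code walks the HSM llog rather than an in-memory array:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified HSM action types and llog records. */
enum sketch_action { ACT_ARCHIVE, ACT_RESTORE, ACT_REMOVE, ACT_CANCEL };

struct sketch_rec {
	enum sketch_action action;
	uint64_t cookie;
};

/* Scan the records from newest to oldest and return the cookie of the
 * first record that is not a cancel. Returns 0 if there is none. */
static uint64_t sketch_last_cookie(const struct sketch_rec *recs, size_t n)
{
	size_t i;

	for (i = n; i > 0; i--) {
		if (recs[i - 1].action != ACT_CANCEL)
			return recs[i - 1].cookie;
	}
	return 0;
}
```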

Add the regression test sanity-hsm 26e.

Test-Parameters: testlist=sanity-hsm
Test-Parameters: testlist=sanity-hsm
Test-Parameters: testlist=sanity-hsm
Test-Parameters: testlist=sanity-hsm env=ONLY=26e,ONLY_REPEAT=30
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I6468a24b95fcb8768e12f40edfcea3ce8407281f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51850
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14361 statahead: add tunable for fname pattern detection 92/51592/39
Qian Yingjin [Thu, 6 Jul 2023 13:21:40 +0000 (09:21 -0400)]
LU-14361 statahead: add tunable for fname pattern detection

This patch adds two tunable parameters for the detection of the
fname pattern statahead:
- llite.*.statahead_fname_predict_hit: when the names of files
  stat()ed under a directory roughly follow a naming rule more than
  this many times, the directory is considered to meet the first
  requirement for statahead. For example, if the file naming rule is
  mdtest.$rank.$i, the suffix of each stat() dentry name is a number,
  and the requirement is met once more than this many dentries whose
  names end with a number have been stat()ed.
- llite.*.statahead_fname_match_hit: after the first requirement is
  met, if the names of files stat()ed under the directory strictly
  and continuously satisfy the naming rule more than this many times,
  a statahead thread is started to prefetch attributes under the
  directory.
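
One way to picture the two thresholds is the sketch below. It assumes that "follows a naming rule" means "ends in a decimal number" and that "strictly and continuously" means each numeric suffix is exactly one greater than the previous one. The struct and functions are hypothetical and do not mirror the llite implementation:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-directory detector; the two thresholds stand in for
 * statahead_fname_predict_hit and statahead_fname_match_hit. */
struct fname_detector {
	unsigned int predict_thresh;
	unsigned int match_thresh;
	unsigned int predict_hit;	/* names seen with a numeric suffix */
	unsigned int match_hit;		/* consecutive suffixes n, n+1, ... */
	long last_index;
	bool have_last;
};

/* Parse the trailing run of digits in @name, e.g. "mdtest.3.42" -> 42. */
static bool numeric_suffix(const char *name, long *out)
{
	size_t len = strlen(name);
	size_t i = len;

	while (i > 0 && isdigit((unsigned char)name[i - 1]))
		i--;
	if (i == len)
		return false;
	*out = strtol(name + i, NULL, 10);
	return true;
}

/* Feed one stat()ed name; returns true when statahead should start. */
static bool fname_detector_observe(struct fname_detector *d, const char *name)
{
	long idx;

	if (!numeric_suffix(name, &idx)) {
		/* a non-matching name breaks the continuous match */
		d->match_hit = 0;
		d->have_last = false;
		return false;
	}

	d->predict_hit++;
	/* strict matching only counts once the first requirement is met */
	if (d->predict_hit > d->predict_thresh &&
	    d->have_last && idx == d->last_index + 1)
		d->match_hit++;
	else
		d->match_hit = 0;

	d->last_index = idx;
	d->have_last = true;
	return d->match_hit > d->match_thresh;
}
```

With both thresholds set to 2, stat()ing mdtest.0.1 through mdtest.0.5 in order triggers statahead on the fifth name.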

This patch also fixes the following panic:
IP: _atomic_dec_and_lock+0xc/0x70
->ll_sax_put [lustre]
->ll_statahead_thread [lustre]
->kthread

The reason is that @lli_sax is set to NULL by putting the wrong
statahead context (sax).

This patch also fixes the possible deadlock between hardlink and
batch stat-ahead operations.
Fix the test failure on lustre-rsync-test/test_6.

During umount, it must wait for all inodes in use by statahead to be
released.
Otherwise, it may cause the following panic:
BUG: Dentry 0000000033ca4f3e{i=280001b840002c7,n=l4}  still in use
(1) [unmount of lustre lustre]
RIP: 0010:umount_check.cold.52+0x2f/0x3b
d_walk+0xe7/0x290
do_one_tree+0x20/0x40
shrink_dcache_for_umount+0x28/0x90
generic_shutdown_super+0x1a/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70

RIP: 0010:ll_prep_md_op_data+0x73/0x870 [lustre]
sa_prep_data+0xde/0x350 [lustre]
sa_statahead+0x3b9/0xd20 [lustre]
ll_statahead_thread+0x1507/0x21f0 [lustre]
kthread+0x134/0x150

Test-Parameters: clientdistro=el8.10 testlist=sanity
Test-Parameters: clientdistro=el8.10 testlist=sanity
Test-Parameters: clientdistro=el8.10 testlist=sanity
Test-Parameters: clientdistro=el8.10 testlist=sanity
Test-Parameters: clientdistro=el8.10 testlist=sanity
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I42d9478e796918d9f2498ab64cf7c20b61334144
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51592
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-11850 obd: use netlink to get lustre stats 56/34256/28
James Simmons [Sun, 18 Aug 2024 00:18:18 +0000 (20:18 -0400)]
LU-11850 obd: use netlink to get lustre stats

This adds the ability to collect performance metrics from lustre
in another way than from proc/debugfs files. The move to debugfs
has limited access to root only. Additionally, accessing many
virtual file system files to collect that data is expensive.
Netlink will scale much better in this case, as well as offer a
much more flexible API.

The new ldebugfs_stats_alloc() replaces lprocfs_stats_alloc() and
registers the stats to be accessible BOTH through debugfs AND
through Netlink. The new global "lstats_list" contains a list of
all registered sets of statistics, so it mirrors a subset of
debugfs. Netlink access can report on any statistics registered in
lstats_list.

Test-Parameters: trivial
Change-Id: If2d662baa62348fe6f0dd5c8d77344650c2a27d8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/34256
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-11073 tests: enable DNE in recovery-mds-scale 26/32626/52
Elena Gryaznova [Fri, 27 Sep 2024 12:13:27 +0000 (15:13 +0300)]
LU-11073 tests: enable DNE in recovery-mds-scale

In recovery-mds-scale.sh, recovery-random-scale.sh,
recovery-double-scale.sh add environmental variables
RECOVERY_SCALE_ENABLE_REMOTE_DIRS,
RECOVERY_SCALE_ENABLE_STRIPED_DIRS
to enable or disable the corresponding operations.
RECOVERY_SCALE_ENABLE_REMOTE_DIRS defaults to true if
the MDS version is at least 2.5, while
RECOVERY_SCALE_ENABLE_STRIPED_DIRS defaults to true if
the MDS version is at least 2.8.

Patch also:
- simplifies the run_*.sh logic: it removes the never-used
  BREAK_ON_ERROR and the useless CONTINUE parameters;
- adds the possibility to set the directory striping
  via client_load_SETSTRIPEPARAMS;
- ignores failures caused by ENOSPC: instead of adding complex
  free space calculation logic for various layouts, the loads are
  allowed to fail with ENOSPC. This does not affect the fofb
  testing that the recovery scale tests are for;
- forces lustre-rsync-test test_4 to run iozone on a non-striped
  directory with offset=0.

Test-Parameters: trivial env=SLOW=no,FAILURE_MODE=HARD,RACER_ENABLE_REMOTE_DIRS=true,RACER_ENABLE_STRIPED_DIRS=true         clientcount=4 mdtcount=1 mdscount=2 osscount=2         austeroptions=-R failover=true iscsi=1         testlist=recovery-mds-scale,recovery-oss-scale,recovery-random-scale,lustre-rsync-test,racer
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
HPE-bug-id: LUS-5973, LUS-6386, LUS-7540, LUS-9399
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Icd42ae9b4e5ac403ba76a9e3909616977dbd6a72
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/32626
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-11610 target: Fix correct return value in out_handler.c 60/56260/5
Ronnie Sahlberg [Wed, 4 Sep 2024 04:48:48 +0000 (00:48 -0400)]
LU-11610 target: Fix correct return value in out_handler.c

out_handler.c has a pattern:

    if (IS_ERR(x) || expression) {
        ...
        RETURN(PTR_ERR(x));
    }

If x is a valid pointer but the expression indicates an error, the
function returns PTR_ERR(x), i.e. the pointer value itself, instead
of an error code. Fix this by returning either PTR_ERR(x) or -EPROTO,
depending on which error occurred.
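
A self-contained sketch of the corrected pattern, using user-space stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers (with MAX_ERRNO of 4095, as in the kernel). struct update_buf and check_update_buf() are hypothetical:

```c
#include <errno.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO	4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct update_buf {		/* hypothetical request buffer */
	int count;
};

/* Corrected shape of the check: each failure returns its own error. */
static long check_update_buf(const struct update_buf *buf, int min_count)
{
	if (IS_ERR(buf))
		return PTR_ERR(buf);	/* a real error pointer */
	if (buf->count < min_count)
		return -EPROTO;		/* valid pointer, malformed content */
	return 0;
}
```

Splitting the combined condition is what guarantees PTR_ERR() is only ever applied to an actual error pointer.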

Signed-off-by: Ronnie Sahlberg <rsahlberg@whamcloud.com>
Change-Id: Ic9f3427be31a9ba84b46f349ba90cc6aea379845
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56260
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-18471 ldiskfs: large directory causes htree corruption 94/57094/3
Shaun Tancheff [Thu, 21 Nov 2024 01:55:54 +0000 (08:55 +0700)]
LU-18471 ldiskfs: large directory causes htree corruption

When a large number of files are created in a single directory, the
directory can become corrupted because of a typo in ext4-kill-dx-root.patch.

Lustre-change: https://review.whamcloud.com/46526
Lustre-commit: ea3ee9337f9bcd42360e4523f1e34bcd04d3bf41

Test-Parameters: trivial
HPE-bug-id: LUS-12617
Fixes: 8da23f070c ("LU-15544 ldiskfs: SUSE 15 SP4 kernel 5.14.21 SUSE")
Fixes: fc87b01f96 ("LU-12477 ldiskfs: remove obsolete ext4 patches")
Fixes: 89075044b3 ("LU-12477 ldiskfs: drop SUSE kernel 4.4 and earlier")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iacbaf9840db76ea7e2e017835a14b476ca9be391
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/57094
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-15913 tests: fix set_params_xxx 43/55143/4
Sergey Cheremencev [Fri, 17 May 2024 18:25:28 +0000 (21:25 +0300)]
LU-15913 tests: fix set_params_xxx

Move argument number check from set_params_(clients|mdts|osts)
to set_params_nodes. Without that fix these functions
didn't set (MDS|OSS|CLIENT)_LCTL_SETPARAM_PARAM when
the number of nodes was < 2.

Test-Parameters: trivial
Fixes: 9a1d68f9b8 ("LU-15913 tests: add rename stress test via racer")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I0733109c53cdea435b4461da7ac44e54fa49498c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55143
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18435 lod: recover layout generation from replay 50/56950/5
Alex Zhuravlev [Sun, 10 Nov 2024 07:17:37 +0000 (10:17 +0300)]
LU-18435 lod: recover layout generation from replay

The offset of the layout generation is different between struct
lov_mds_md_v1/v3.lmm_layout_gen and lov_comp_md.lcm_layout_gen.
When checking/setting the layout generation, we must use the
layout-specific field.

Otherwise, the layout generation can be set to 0 (or some other random
value) after replay, and the client cannot apply a new layout during a
later update.
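
The idea of reading the generation from the field that matches the layout magic can be sketched like this. The structs are simplified stand-ins whose leading fields loosely follow lov_mds_md_v1 and lov_comp_md_v1, the SKETCH_MAGIC_* values are placeholders, and byte-order conversion is omitted:

```c
#include <stdint.h>
#include <string.h>

/* Placeholder magic values for this sketch. */
#define SKETCH_MAGIC_V1		0x0BD10BD0u
#define SKETCH_MAGIC_V3		0x0BD30BD0u
#define SKETCH_MAGIC_COMP_V1	0x0BD60BD0u

struct sketch_lmm {		/* plain layout: v1 and v3 share this prefix */
	uint32_t magic;
	uint32_t pattern;
	uint64_t oi[2];
	uint32_t stripe_size;
	uint16_t stripe_count;
	uint16_t layout_gen;	/* 16 bits, at byte offset 30 */
};

struct sketch_comp_md {		/* composite layout header */
	uint32_t magic;
	uint32_t size;
	uint32_t layout_gen;	/* 32 bits, at byte offset 8 */
	uint16_t flags;
	uint16_t entry_count;
};

/* Read the generation from the field that matches the layout type. */
static uint32_t sketch_layout_gen(const void *buf)
{
	uint32_t magic;

	memcpy(&magic, buf, sizeof(magic));
	switch (magic) {
	case SKETCH_MAGIC_V1:
	case SKETCH_MAGIC_V3:
		return ((const struct sketch_lmm *)buf)->layout_gen;
	case SKETCH_MAGIC_COMP_V1:
		return ((const struct sketch_comp_md *)buf)->layout_gen;
	default:
		return 0;	/* unknown layout */
	}
}
```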

Fixes: 13557aa86904 ("LU-15300 mdt: refresh LOVEA with LL granted")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5e4a63cd097d157317e0e8d1a0fca4a46817d118
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56950
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-15535 lmv: fix lmv_stripe_object_dump() CDEBUG 47/56647/4
Etienne AUJAMES [Thu, 10 Oct 2024 12:06:10 +0000 (14:06 +0200)]
LU-15535 lmv: fix lmv_stripe_object_dump() CDEBUG

"refs" and "magic" are reversed in CDEBUG() arguments of
lmv_stripe_object_dump().

Test-Parameters: trivial
Fixes: 3ebc8e0528e3 ("LU-15535 llite: deadlock on lli_lsm_sem")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ibb4dc6ea38b506f7262da92699cf5ed97a362f3d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56647
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18260 o2iblnd: fix race between REJ vs kiblnd_connd 18/56518/3
Etienne AUJAMES [Fri, 27 Sep 2024 14:50:15 +0000 (16:50 +0200)]
LU-18260 o2iblnd: fix race between REJ vs kiblnd_connd

This patch fixes a possible race between CM_EVENT_REJECTED and
kiblnd_connd().

kiblnd_connd() sets the connection state to IBLND_CONN_DISCONNECTED
before removing the QP. So if CM_EVENT_REJECTED is received in this
time window, it causes the following crash:

Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
panic+0x100/0x2d2
lbug_with_loc.cold+0x18/0x18 [libcfs]
kiblnd_cm_callback+0x108d/0x10b0 [ko2iblnd]
cma_cm_event_handler+0x1e/0xb0 [rdma_cm]
cma_ib_handler+0x8d/0x2e0 [rdma_cm]
cm_process_work+0x22/0x190 [ib_cm]
cm_rej_handler+0xdf/0x260 [ib_cm]
cm_work_handler+0x47f/0x4d0 [ib_cm]
process_one_work+0x1e8/0x390
worker_thread+0x53/0x3d0
kthread+0x124/0x150
ret_from_fork+0x1f/0x30
</TASK>

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 0b8c18d ("LU-17480 o2iblnd: add a timeout for rdma_connect")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I2d04433eb51e1a6862b788a89e127d8abb24b8a9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56518
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-2525 ldlm: add asynchronous flocks 89/4889/79
Andriy Skulysh [Tue, 5 Feb 2019 13:37:48 +0000 (15:37 +0200)]
LU-2525 ldlm: add asynchronous flocks

Add support for asynchronous flocks.
They are used only by Linux nfsd for now.

HPE-bug-id: LUS-3210, LUS-7034,LUS-7031,LUS-8832, LUS-8313
HPE-bug-id: LUS-8592
Change-Id: Iefafaf014fd06d569dc5d1dd22ebb3518d04e99a
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/4889
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18390 tests: dump debug logs during module unload correctly 67/56767/5
Emoly Liu [Sun, 3 Nov 2024 10:31:12 +0000 (18:31 +0800)]
LU-18390 tests: dump debug logs during module unload correctly

If DEBUG is enabled, debug logs will be dumped to:
- $TMP/debug by default if DEBUG_RMMOD is unset. If a memory
  leak is found, $TMP/debug will be renamed to $TMP/debug-leak.xxx;
- $DEBUG_RMMOD if $DEBUG_RMMOD is a full path name;
- $TMP/$DEBUG_RMMOD if $DEBUG_RMMOD is a file name;
- standard output if DEBUG_RMMOD=- .
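
The destination rules can be restated as a small C function for clarity. This is a hypothetical helper, not the shell code in the test framework; it does not model the rename to $TMP/debug-leak.xxx, and it treats an empty DEBUG_RMMOD as unset, which is an assumption:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical restatement of the dump-destination rules.
 * Returns "-" for standard output; otherwise a file path. */
static const char *debug_dump_target(const char *tmp, const char *debug_rmmod,
				     char *buf, size_t len)
{
	if (debug_rmmod == NULL || debug_rmmod[0] == '\0') {
		snprintf(buf, len, "%s/debug", tmp);	/* default */
		return buf;
	}
	if (strcmp(debug_rmmod, "-") == 0)
		return "-";				/* standard output */
	if (debug_rmmod[0] == '/')
		return debug_rmmod;			/* full path, as-is */
	snprintf(buf, len, "%s/%s", tmp, debug_rmmod);	/* bare file name */
	return buf;
}
```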

The memory leak in conf-sanity.sh test_29 in interop testing (LU-17962)
is used to verify this patch.

Test-Parameters: env=DEBUG=true,IGNORE_LEAK=yes,ONLY=29 \
  testlist=conf-sanity serverversion=2.15 clientdistro=el8.10 \
  serverdistro=el8.10 mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: env=DEBUG_RMMOD=lu18390,IGNORE_LEAK=yes,ONLY=29 \
  testlist=conf-sanity serverversion=2.15 clientdistro=el8.10 \
  serverdistro=el8.10 mdscount=2 mdtcount=4 ostcount=8
Test-Parameters:trivial

Fixes: 255102e84e ("LU-16384 tests: dump lustre log if DEBUG_RMMOD set")
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I23a584541d0f9b313cf00e56f63dc4ac356c3cbc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56767
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-18352 mgc: explicitly create sptlrpc local copy 09/56609/10
Mikhail Pershin [Tue, 8 Oct 2024 13:10:37 +0000 (16:10 +0300)]
LU-18352 mgc: explicitly create sptlrpc local copy

The sptlrpc config has a single instance per MGC and is shared
by targets, so it is processed only once, and its local copy
is likewise created only during that first processing, by the
first target doing it. All other targets just find the config
in memory.

Therefore, a local copy for the other targets needs to be
created explicitly when an already-processed config is found.

This patch introduces mgc_get_local_copy(), which just copies
the llog from the MGS, if possible, for a target that finds
the sptlrpc config already processed.

Test-Parameters: testlist=sanity-sec env=ONLY=70,SHARED_KEY=true
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I922f92a950b9a07172f36f42b94da854c7702a80
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56609
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18338 build: fix Ubuntu kernel signature check 13/56613/2
Jian Yu [Tue, 8 Oct 2024 19:42:49 +0000 (12:42 -0700)]
LU-18338 build: fix Ubuntu kernel signature check

The Ubuntu kernel signature was checked by searching for
CONFIG_VERSION_SIGNATURE in autoconf.h. However, in
Ubuntu kernel 6.11.0-061100-generic, CONFIG_VERSION_SIGNATURE
is not defined. We can search for UTS_UBUNTU_RELEASE_ABI
in utsrelease.h.

Test-Parameters: trivial mdtcount=4 mdscount=2 \
  clientdistro=ubuntu2404 testlist=sanity

Test-Parameters: trivial mdtcount=4 mdscount=2 \
  env=SANITY_EXCEPT="255c" \
  clientdistro=ubuntu2204 testlist=sanity

Change-Id: I62e74ef936bbbf4e85130965cfff35aa7aa3be5e
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56613
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-18334 obdclass: wait for RCU completion 93/56593/5
Alex Zhuravlev [Sun, 6 Oct 2024 14:47:02 +0000 (17:47 +0300)]
LU-18334 obdclass: wait for RCU completion

Wait for RCU completion in lu_kmem_fini(); otherwise, the RCU callbacks
doing kmem_cache_free() can race with kmem_cache_destroy():
kmem_cache_destroy echo_object_kmem: Slab cache still has objects
WARNING: CPU: 1 PID: 7991 at mm/slab_common.c:523 kmem_cache_destroy

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I21dfc034b9fc9368bf22d269d6986297a6812a5c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56593
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18277 tests: Add missing env variable for acceptance-small 38/56538/3
Arshad Hussain [Mon, 30 Sep 2024 07:35:45 +0000 (03:35 -0400)]
LU-18277 tests: Add missing env variable for acceptance-small

Although LUSTRE and NAME are defined through the test-framework.sh
inclusion once auster is called, they are still undefined at this
point, which makes a standalone run of acceptance-small.sh fail.
This patch defines the missing environment variables LUSTRE and
NAME when acceptance-small.sh is run standalone.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ib540f4ce7d56f6205f4d9e43bbc3b9e6db94511b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56538
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18256 gss: deprecate insecure enctypes 12/56512/5
Sebastien Buisson [Fri, 27 Sep 2024 07:48:44 +0000 (09:48 +0200)]
LU-18256 gss: deprecate insecure enctypes

A number of encryption types declared in the GSS code are deprecated
for security reasons, and should not be used. So remove support for
them in the Lustre code:
- des-cbc-crc
- des-cbc-md4
- des-cbc-md5
- des-cbc-raw
- des-hmac-sha1
- des3-cbc-sha
- des3-cbc-raw
- des3-cbc-sha1
- arcfour-hmac
- arcfour-hmac-exp
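
A deny-list check over the removed types might look like the sketch below. The numeric values are the ENCTYPE_* constants from MIT krb5's krb5.h as we understand them (verify against your installed headers); enctype_is_deprecated() is hypothetical and not part of the Lustre GSS code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Enctype numbers of the removed types (MIT krb5 ENCTYPE_* values). */
static const int deprecated_enctypes[] = {
	0x0001,	/* des-cbc-crc */
	0x0002,	/* des-cbc-md4 */
	0x0003,	/* des-cbc-md5 */
	0x0004,	/* des-cbc-raw */
	0x0005,	/* des3-cbc-sha */
	0x0006,	/* des3-cbc-raw */
	0x0008,	/* des-hmac-sha1 */
	0x0010,	/* des3-cbc-sha1 */
	0x0017,	/* arcfour-hmac */
	0x0018,	/* arcfour-hmac-exp */
};

/* Hypothetical check: reject an enctype if it is on the deny list. */
static bool enctype_is_deprecated(int enctype)
{
	size_t i;
	size_t n = sizeof(deprecated_enctypes) / sizeof(deprecated_enctypes[0]);

	for (i = 0; i < n; i++)
		if (deprecated_enctypes[i] == enctype)
			return true;
	return false;
}
```

AES types such as aes256-cts-hmac-sha1-96 (0x0012) are not on the list and pass the check.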

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic8dd2470339323be88a416796c8d420ecd2f55e4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56512
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18272 build: remove Summary line from osd-zfs 06/56506/2
Shaun Tancheff [Thu, 26 Sep 2024 10:46:42 +0000 (17:46 +0700)]
LU-18272 build: remove Summary line from osd-zfs

Resolve the spurious warning
   warning: line 390: second Summary
when building the src rpm.

Test-Parameters: trivial
HPE-bug-id: LUS-12538
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I6aa591aae3ae4dc07a36740e12ef3520cea035ef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56506
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: corey tesdahl <corey.tesdahl@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-18256 gss: support SHA2 enctypes 89/56489/4
Sebastien Buisson [Mon, 23 Sep 2024 08:31:07 +0000 (10:31 +0200)]
LU-18256 gss: support SHA2 enctypes

Introduce support for ENCTYPE_AES128_CTS_HMAC_SHA256_128 and
ENCTYPE_AES256_CTS_HMAC_SHA384_192 encryption types that are used by
GSS code for authentication context.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I42ab758b42b24c64647cd771887a2fd26bc55394
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56489
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-17190 osc: account DIO in flight using server-side locking 25/56325/3
Qian Yingjin [Wed, 11 Sep 2024 03:06:40 +0000 (11:06 +0800)]
LU-17190 osc: account DIO in flight using server-side locking

Add accounting for in-flight DIO that uses server-side DLM extent
locking.
osc.*.rpc_stats can then be used to judge whether all I/O RPC slots,
which are limited by @max_rpcs_in_flight, are used up by DIOs.

Change-Id: I3d2c6e607d19037bb399c2bceb50e64826263469
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56325
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-16350 ldiskfs: update for kernel 6.11 84/56284/5
Shaun Tancheff [Tue, 10 Sep 2024 08:58:56 +0000 (15:58 +0700)]
LU-16350 ldiskfs: update for kernel 6.11

Update ext4-kill-dx-root.patch and ext4-lookup-dotdot.patch
for kernel 6.11.

These updates are also applicable to 6.10 stable, tested with
6.10.6 stable.

HPE-bug-id: LUS-11376
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I94f95eeff65b80f879a8b34aea05dc5fa289aa73
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56284
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
4 months ago LU-18190 build: compatibility updates for kernel 6.11 83/56283/10
Shaun Tancheff [Sat, 21 Sep 2024 04:01:42 +0000 (11:01 +0700)]
LU-18190 build: compatibility updates for kernel 6.11

Linux commit v6.10-12269-g78eb4ea25cd5
  sysctl: treewide: constify the ctl_table argument of proc_handlers

Constify ctl_table and cast away const for older kernels

Linux commit v5.15-rc3-13-g9257e1567738
  mm/filemap: Add folio_index(), folio_file_page() and
  folio_contains()
Linux commit v6.10-rc6-27-g05b0c7edad9b
  mm: drop page_index and simplify folio_index

page_index() was removed in favor of folio_index(); provide a wrapper,
folio_index_page(), to call the correct function with a page.

Linux commit v6.1-rc4-186-gcb67f4282bf9
  mm,thp,rmap: simplify compound page mapcount handling
Adds folio_mapcount()
Linux commit v6.10-rc6-100-gcdd9a571b7d8
  fs/proc: move page_mapcount() to fs/proc/internal.h

page_mapcount() was removed in favor of folio_mapcount(); provide
folio_mapcount_page() to either call page_mapcount() or calculate the
mapcount as done by folio_precise_page_mapcount().

Linux commit v6.10-rc3-19-ge9f5f44ad372
  block: remove the blk_integrity_profile structure
Linux commit v6.10-rc3-25-g9f4aa46f2a74
  block: invert the BLK_INTEGRITY_{GENERATE,VERIFY} flags

Invert the checks for BLK_INTEGRITY_NO{GENERATE,VERIFY} when
BLK_INTEGRITY_NOVERIFY is present and remove checks that
require bi->profile->{verify_fn,generate_fn}

Also resolve a gcc-14 issue with -Werror=stringop-truncation
in lustre/utils/obd.c

HPE-bug-id: LUS-12519
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ifda4da9716108129bb59634612940d61abe69aa2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56283
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
4 months ago LU-18213 o2ib: dec connection count before peer 75/56275/2
Shaun Tancheff [Fri, 6 Sep 2024 05:51:12 +0000 (12:51 +0700)]
LU-18213 o2ib: dec connection count before peer

BUG: KFENCE: use-after-free write in \
kiblnd_destroy_conn+0x356/0x660 [ko2iblnd]

In kiblnd_destroy_conn() calling kiblnd_peer_decref()
could result in freeing the peer_ni.

Drop the connection counters before calling
kiblnd_peer_decref() to avoid a use-after-free.
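
The ordering fix can be illustrated with a user-space sketch. struct sketch_peer, struct sketch_conn and the helper functions are hypothetical simplifications of the o2iblnd objects; the point is only that the peer's counters are updated before the reference that keeps the peer alive is dropped:

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the peer_ni and connection. */
struct sketch_peer {
	int refcount;
	int conn_count;		/* connections still attached to this peer */
};

struct sketch_conn {
	struct sketch_peer *peer;	/* holds one peer reference */
};

static int sketch_peers_freed;

static void sketch_peer_decref(struct sketch_peer *peer)
{
	if (--peer->refcount == 0) {
		free(peer);
		sketch_peers_freed++;
	}
}

static void sketch_destroy_conn(struct sketch_conn *conn)
{
	struct sketch_peer *peer = conn->peer;

	/* Touch the peer's counters while our reference keeps it alive... */
	peer->conn_count--;
	conn->peer = NULL;
	/* ...and only then drop the reference, which may free the peer. */
	sketch_peer_decref(peer);
	free(conn);
}
```

Swapping the two steps would make peer->conn_count-- a write to freed memory whenever this connection held the last reference.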

HPE-bug-id: LUS-12513
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I85fcfc399ca38e9b85d9eff72314f0363e2a0666
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18192 nrs: move nid/uid/gid/opcode/jobid into generic key 27/56227/4
Qian Yingjin [Mon, 2 Sep 2024 08:48:12 +0000 (16:48 +0800)]
LU-18192 nrs: move nid/uid/gid/opcode/jobid into generic key

A more generic data structure ("nrs_tbf_key") is defined to store the
nid/uid/gid/opcode/jobid information of a TBF bucket (struct
nrs_tbf_client).

Remove the old tc_key for the generic NRS TBF type, and replace it
with this new data structure (nrs_tbf_key).

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I0f62f02eb358d1ef697bbacd2e8225956deff0ec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56227
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17000 lnet: lnet_inet_enumerate krealloc and kfree 50/56150/6
Shaun Tancheff [Mon, 26 Aug 2024 04:02:17 +0000 (11:02 +0700)]
LU-17000 lnet: lnet_inet_enumerate krealloc and kfree

CoverityID: 442369 ("Memory - corruptions, double free")
On realloc() failure, free 'ifaces', which may already differ from
*dev_list due to a previous realloc(). Also, nalloc is now zero.

CoverityID: 442378 ("Resource Leak")
Ensure lnet_inet_enumerate() ifaces are freed.
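A minimal sketch of the realloc() failure path described above,
assuming a hypothetical append_ifaces() helper and struct iface (not
the real lnet_inet_enumerate() code): on failure the latest buffer is
freed, not the stale caller pointer, and the count is reset to zero.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interface record; not the real LNet structure. */
struct iface {
	char name[16];
};

/*
 * Grow *list by n_new zeroed entries. *list is only updated on success,
 * so after one successful realloc() it is stale: on a later failure we
 * must free the local 'ifaces' pointer, never *list (which could be a
 * double free), and reset the count.
 */
static int append_ifaces(struct iface **list, int *nalloc, int n_new)
{
	struct iface *ifaces = *list;
	int i;

	for (i = 0; i < n_new; i++) {
		struct iface *tmp = realloc(ifaces,
					    (*nalloc + 1) * sizeof(*ifaces));

		if (!tmp) {
			free(ifaces);	/* latest buffer, not the stale one */
			*list = NULL;
			*nalloc = 0;
			return -1;
		}
		ifaces = tmp;
		memset(&ifaces[*nalloc], 0, sizeof(*ifaces));
		(*nalloc)++;
	}
	*list = ifaces;
	return 0;
}
```

On success the caller owns *list and must free() it when done, which
is the kind of leak the 442378 fix addresses.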

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2d2762d86fcf070387b100115ad3a50bd2b2840b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56150
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
4 months agoLU-17000 utils: Coverity leaks and races 49/56149/4
Shaun Tancheff [Tue, 27 Aug 2024 02:59:35 +0000 (09:59 +0700)]
LU-17000 utils: Coverity leaks and races

CoverityID: 442365 ("time-of-check, time-of-use")
CoverityID: 442366 ("time-of-check, time-of-use")
Use fstat() on the open file descriptor, and use fcntl() to add
the desired O_NONBLOCK flag for pipes.
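A hedged sketch of the fstat()/fcntl() pattern, assuming a
hypothetical helper set_nonblock_if_pipe() (not the lfs code): the
type check is done on the already-open descriptor, so there is no
window between checking a path and using it.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if @fd is a pipe/FIFO and O_NONBLOCK was added, 0 if @fd is
 * not a pipe (flags untouched), -1 on error.
 */
static int set_nonblock_if_pipe(int fd)
{
	struct stat st;
	int flags;

	/* inspect the open descriptor, not a path that could change */
	if (fstat(fd, &st) < 0)
		return -1;
	if (!S_ISFIFO(st.st_mode))
		return 0;

	/* add O_NONBLOCK while preserving the existing status flags */
	flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -1;
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;
	return 1;
}
```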

CoverityID: 442113 ("Resource leaks")
Ensure the allocated qctl object is freed in lfs_quota().

CoverityID: 442364 ("Null pointer dereferences")
In lfs_fid2path() ensure `path_or_fsname` is not null before
checking if it starts with a forward slash ('/')

CoverityID: 442116 ("Resource leaks")
In get_projid() close dir_fd as soon as it is no longer needed.

CoverityID: 442373 ("Memory - corruptions, overrun allocated")
Fix lov_forge_comp_v1(), which also fails to clone the FID.

Test-Parameters: trivial
Test-Parameters: testlist=sanity env=ONLY="56ebb"
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2868d5ded322cd9cc890c463a494d296206d4be9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56149
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-18163 obdclass: fix sysfs_memparse()/string_to_size() 35/56135/8
Etienne AUJAMES [Thu, 22 Aug 2024 15:46:19 +0000 (17:46 +0200)]
LU-18163 obdclass: fix sysfs_memparse()/string_to_size()

The patch reworks string_to_size() to avoid string copies and to
handle the default unit directly in __string_to_size().
It also handles fractional values with huge units (like 0.5EiB)
more gracefully.

This implementation fixes the parsing of the following invalid
strings:
- "10.badbadMib"
- "10MiBbadbad"
- "10MBAD"
- "10.123badMib"
- "1024.KG"

The patch changes how the decimal fractional part is handled: a
maximum of 9 digits is supported.

It fixes test_string_to_size_err() to actually return an error if a
test fails, which then prevents the module from loading.
It fixes obdclass_init() to avoid a crash if obd_init_checks()
fails.

Add regression tests in obd_init_checks.
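To illustrate the stricter parsing rules, here is a minimal C sketch
of such a parser. parse_size() is hypothetical and is not the Lustre
string_to_size(); the accepted grammar (binary units K..E with an
optional "iB", at most 9 fractional digits, no trailing characters)
is an assumption based on the description above. The fractional part
is scaled with a binary long division, so values such as 0.5EiB do
not overflow.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical strict size parser: "<digits>[.<digits>][K|M|G|T|P|E[iB]]".
 * @def_shift is the binary shift used when no unit is given
 * (0 = bytes, 20 = MiB, ...).
 */
static int parse_size(const char *s, unsigned int def_shift, uint64_t *out)
{
	static const char units[] = "KMGTPE";
	uint64_t whole = 0, frac = 0, div = 1, q = 0, r, mult;
	unsigned int shift = def_shift, ndigits = 0, i;
	const char *u;

	if (!isdigit((unsigned char)*s))
		return -EINVAL;
	for (; isdigit((unsigned char)*s); s++) {
		uint64_t d = (uint64_t)(*s - '0');

		if (whole > (UINT64_MAX - d) / 10)
			return -EOVERFLOW;
		whole = whole * 10 + d;
	}
	if (*s == '.') {
		for (s++; isdigit((unsigned char)*s); s++) {
			if (++ndigits > 9)
				return -EINVAL;	/* more than 9 fractional digits */
			frac = frac * 10 + (uint64_t)(*s - '0');
			div *= 10;
		}
	}
	if (*s != '\0') {
		u = strchr(units, toupper((unsigned char)*s));
		if (!u)
			return -EINVAL;		/* e.g. "10.badbadMib" */
		shift = 10 * (unsigned int)(u - units + 1);
		s++;
		if (s[0] == 'i' && s[1] == 'B')
			s += 2;
	}
	if (*s != '\0' || shift >= 64)
		return -EINVAL;			/* e.g. "10MBAD", "1024.KG" */

	/*
	 * q = frac * 2^shift / div by binary long division: the remainder r
	 * stays below div (<= 10^9), so nothing overflows even for "0.5EiB".
	 */
	r = frac;
	for (i = 0; i < shift; i++) {
		r <<= 1;
		q <<= 1;
		if (r >= div) {
			r -= div;
			q |= 1;
		}
	}
	mult = (uint64_t)1 << shift;
	if (whole > (UINT64_MAX - q) / mult)
		return -EOVERFLOW;
	*out = whole * mult + q;
	return 0;
}
```

The fractional part is truncated toward zero; rounding is a separate
design choice that this sketch does not attempt.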

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ic20d11368fc7608637e8123d7c6c5a2ab2cf4a4b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56135
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10026 ldiskfs: mballoc to preserve preallocation's start 67/55467/2
Alex Zhuravlev [Tue, 4 Jun 2024 12:25:35 +0000 (15:25 +0300)]
LU-10026 ldiskfs: mballoc to preserve preallocation's start

.. used in dense preallocation. Otherwise, preallocated space can be
lost when the corresponding bitmap is dropped from the cache, and
ldiskfs then prints error messages about a block counter mismatch.

Fixes: 686dee707f ("LU-10026 osd-ldiskfs: use preallocation for dense writes")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I93177510af959e849dba7a9c35d81bc27809a31b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55467
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17432 libcfs: new CDEBUG_SLOW message type 39/55439/19
Frederick Dilger [Fri, 14 Jun 2024 00:39:28 +0000 (20:39 -0400)]
LU-17432 libcfs: new CDEBUG_SLOW message type

Create a new CDEBUG_SLOW message type that skips the first
SECONDS of messages, then continues printing to the console as normal.

Specifically, CWARN_SLOW and CERROR_SLOW have been created, which
mirror the functionality of CWARN and CERROR respectively, but with
the additional SLOW behavior.

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I905fdff795488ff937faf4d04d5d3d6fec24950a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55439
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-17939 ofd: validate FID in MDT/OFD 01/55401/11
Lai Siyao [Mon, 20 May 2024 21:20:34 +0000 (17:20 -0400)]
LU-17939 ofd: validate FID in MDT/OFD

An OST object FID from other nodes can only be a normal FID, IDIF, ECHO
or OST_MDT0, and an MDT object FID can only be a namespace-visible FID
or the local ROOT. Return an error otherwise, to avoid an assertion later.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0c518c67acad44e90159fff71ff4fa9b893e8f3d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55401
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-14510 dom: fiemap support for DoM files 21/55221/6
Mikhail Pershin [Tue, 28 May 2024 11:02:23 +0000 (14:02 +0300)]
LU-14510 dom: fiemap support for DoM files

This patch adds fiemap support for DoM files.
Server part:
- modify the MDS_GET_INFO handler to return FIEMAP like
  OST_GET_INFO does
- mdt_fiemap_get() to process fiemap request
Client part:
- rewrite lov_object_fiemap() to support DoM component
- rework fiemap_for_stripe() to work with both DoM and
  RAID0 layouts
- use initialized layout entries to get subobject and
  get rid of lov_find_subobj() used by fiemap only
- fix issue with wrong resume entry/stripe count
- mdc_object_fiemap() as implementation of .coo_fiemap
  cl_object_operations to send and receive fiemap request
- treat LOV subdev errors as UNKNOWN extent
- rework FID2PATH layout description to be compatible with
  other GET_INFO keys (no protocol changes)
- add sanity.sh test_130h for DoM fiemap with resuming

To indicate an MDT device, an extra bit is taken from the stripe
number bits in favor of the device number, so the absolute limit on
the stripe count in a fiemap report is 32768.
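The 32768 limit follows from the bit arithmetic: 16 bits minus one
flag bit leaves 15 bits, and 2^15 = 32768. A sketch with hypothetical
macro names, assuming a 16-bit stripe field (the real fiemap field
layout is not shown in this log):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical encoding for illustration only (not the Lustre fiemap
 * layout): a 16-bit stripe field whose top bit flags an extent that
 * lives on an MDT (DoM), leaving 15 bits for the stripe number.
 */
#define EX_STRIPE_BITS	16
#define EX_MDT_FLAG	(1u << (EX_STRIPE_BITS - 1))	/* 0x8000 */
#define EX_STRIPE_MASK	(EX_MDT_FLAG - 1)		/* 0x7fff */

static uint16_t ex_encode(unsigned int stripe, int on_mdt)
{
	/* stripe numbers >= 32768 do not fit and wrap around */
	return (uint16_t)((stripe & EX_STRIPE_MASK) |
			  (on_mdt ? EX_MDT_FLAG : 0u));
}

static unsigned int ex_stripe(uint16_t v)
{
	return v & EX_STRIPE_MASK;
}

static int ex_on_mdt(uint16_t v)
{
	return (v & EX_MDT_FLAG) != 0;
}
```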

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9b6df04fd62d773aec2d916440ba08dfea06faa4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55221
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-15536 utils: add lfs somsync utility 98/49498/5
Jian Yu [Tue, 24 Sep 2024 07:29:19 +0000 (00:29 -0700)]
LU-15536 utils: add lfs somsync utility

This patch adds the lfs somsync utility to synchronize
the SOM xattr(s) for the given FILE(s) or FID(s).

lfs somsync FILE ...
lfs somsync --by-fid MOUNT FID ...

Test-Parameters: trivial env=ONLY=807 testlist=sanity

Change-Id: Ie9ee39625d56ec026c89dcc0f27025904ca354e3
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49498
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-18228 build: specify kmp-moddir 86/56386/3
Shaun Tancheff [Tue, 13 Aug 2024 14:10:34 +0000 (21:10 +0700)]
LU-18228 build: specify kmp-moddir

On Debian 12, when packaging the kernel integration test modules,
the installed location needs to be well known.

Update debian/rules to specify kmp-moddir, and adjust
the expected location of the kernel module.

HPE-bug-id: LUS-12444
Test-Parameters: trivial
Fixes: bac5e458ee ("LU-17096 debian: add obd_test.ko, llog_test.ko to lustre-tests")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ic431a78f372c6388907d1cc5db3ab39973cf31f0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56386
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
4 months agoLU-13062 llite: return stripe_offset -1 in trusted.lov 52/45252/10
Andreas Dilger [Fri, 15 Oct 2021 02:31:48 +0000 (20:31 -0600)]
LU-13062 llite: return stripe_offset -1 in trusted.lov

If the trusted.lov xattr is copied by userspace to be restored later,
this results in PFL files always being restored onto OST0000 because
the kernel replaces the lmm_layout_gen field with 0 to avoid confusion
in userspace between the layout generation and the stripe offset (both
use the same field, one for input, one for output).

Instead of always returning 0 for the layout generation, return -1
for PFL layouts, so that restoring the xattr will direct the MDS to
select an appropriate OST.

For setxattr, the patch prevents setting specific offsets if the
PFL layout has an initialized entry, since that is considered a
layout copy attempt.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia1af2bfcfa41cf1593aab44fe2fa792c3d254035
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45252
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-18199 scripts: fix to always add local route 19/56819/2
Serguei Smirnov [Tue, 29 Oct 2024 20:36:14 +0000 (13:36 -0700)]
LU-18199 scripts: fix to always add local route

To avoid disruption of network connectivity during LNet start-up,
fix the gateway selection logic in the ksocklnd-config script so that
the local route is added along with the default route to the
custom routing table created by this script.

Another change in this patch makes sure that link-local IPv6
addresses are ignored, because LNet ignores them anyway.
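The script itself is shell; purely as an illustration, the same
link-local test (fe80::/10) can be expressed in C with the standard
IN6_IS_ADDR_LINKLOCAL() macro. ipv6_is_link_local() is a hypothetical
helper name, not part of LNet.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: returns 1 if @addr is an IPv6 link-local address
 * (fe80::/10) that should be skipped, 0 if it is usable, -1 if @addr
 * does not parse as IPv6.
 */
static int ipv6_is_link_local(const char *addr)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, addr, &a) != 1)
		return -1;
	return IN6_IS_ADDR_LINKLOCAL(&a) ? 1 : 0;
}
```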

Fixes: 7f60b2b55 ("LU-17006 lnet: set up routes for going across subnets")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I341a40268298709c772bad84ddf4a4fae645b48f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56819
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17906 ptlrpc: don't use non-uptodate peer at connect 55/56755/3
Mikhail Pershin [Sun, 8 Sep 2024 08:10:55 +0000 (11:10 +0300)]
LU-17906 ptlrpc: don't use non-uptodate peer at connect

If a peer is not yet discovered, LNet puts messages into a
pending queue until discovery is done. That pins the ptlrpc
request as well, so a connect RPC to a peer that is not alive is
stuck until peer discovery times out, regardless of the RPC timeout.
Moreover, this means no connect attempts to other peers are
made during that time:

nids_stats:
   "192.168.252.112@tcp": { connects: 1, ... sec_ago: 31 }
   "192.168.252.113@tcp": { connects: 0, ... sec_ago: never }
   "192.168.252.115@tcp": { connects: 0, ... sec_ago: never }

After 30s it is still stuck on the first NID and has never tried
any other, even though the connect RPC timeout in ptlrpc is
about 5-10s.

The patch prevents an RPC from getting stuck on a non-uptodate peer
simply by dropping such a request in ptl_send_rpc(). That lets ptlrpc
keep control over connection request expiration and new
connect attempts, so all peers are tried one by one until
one is ready.

Results with patch:
nids_stats:
   "192.168.252.112@tcp": { connects: 4, ... sec_ago: 9 }
   "192.168.252.113@tcp": { connects: 4, ... sec_ago: 4 }
   "192.168.255.115@tcp": { connects: 3, ... sec_ago: 14 }

After the same 30s there were 11 connect attempts, with all
failover NIDs tried.

The patch also modifies LNetPeerDiscovered() to consider
a local peer as uptodate and to return an error code instead of
a boolean.

The import uptodate state is also no longer a boolean; it now
shows the discovery status.

Was-Change-Id: I51d8973aa8475ce1930f292c42aa22c70cfc13db

Test-Parameters: env=ONLY=153a,ONLY_REPEAT=10 testlist=conf-sanity
Test-Parameters: testlist=recovery-small,sanity-flr
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I468faf6f4b0cf8ba0f4f810fe09a5f1165432d28
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56755
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-18383 ksocklnd: Avoid TCP socket orphans in racy LNet 37/56737/5
Josh Samuelson [Sat, 26 Oct 2024 20:06:37 +0000 (15:06 -0500)]
LU-18383 ksocklnd: Avoid TCP socket orphans in racy LNet

LU-18137 3367ef3eb2 introduced code to handle upstream kernel changes
with regard to releasing kernel sockets.  Under heavy/racy LNet
hello transactions between servers and clients, client TCP sockets
can transition to an orphan state that is never reaped.  This commit
moves LU-18137's fix closer to the kernel socket allocation to
avoid this racy connection setup condition.

This commit also provides a backport of the kernel's socket
accounting function sock_inuse_add() for compatibility when vendor
kernels lack the static function definition in their kernel headers.
The lnet_compat.h file has been introduced as a place to put such
compatibility code.

Fixes: 3367ef3eb2 ("LU-18137 ksocklnd: Fix TCP socket cleanup")
Signed-off-by: Josh Samuelson <josh@1up.unl.edu>
Change-Id: I11fc0b53e051871da26c079e9aeb50976b20bbd3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56737
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Mark Roper <ropermar@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17962 mgc: free nidlist correctly 09/56709/7
Emoly Liu [Wed, 23 Oct 2024 00:53:00 +0000 (08:53 +0800)]
LU-17962 mgc: free nidlist correctly

A memory leak was found during interop testing because
the nidlist was not freed correctly in function
mgc_apply_recover_logs().

Test-Parameters: testlist=conf-sanity env=ONLY=29 \
  serverversion=2.15 clientdistro=el8.10 serverdistro=el8.10 \
  mdscount=2 mdtcount=4 ostcount=8

Fixes: e4d2d4ff74 ("LU-13306 mgc: handle large NID formats")
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I7a7a1b4b4f3b8c65c7608537e4dca1b9f1b68e77
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56709
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months agoLU-14288 lnet: Introduce nidmasks 22/55922/13
Chris Horn [Fri, 26 Jul 2024 20:36:51 +0000 (14:36 -0600)]
LU-14288 lnet: Introduce nidmasks

A nidmask is like a netmask, except it applies to IPv4 or IPv6 LNet
NIDs.

Nidmasks use the existing nidlist infrastructure so any caller of
cfs_parse_nidlist() can include a nidmask in the argument and match
NIDs against it using cfs_match_nid(), or convert it back to a string
with cfs_print_nidlist().

For example, "192.168.1.1@tcp/24" is equivalent to the nidrange
"192.168.1.[1-254]@tcp", and "2001::1@tcp/126" is equivalent to
"2001::@tcp 2001::1@tcp 2001::2@tcp 2001::3@tcp".
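A minimal sketch of IPv4 nidmask matching, covering only the address
part (the "@tcp" network is ignored). nidmask_match_ipv4() is
hypothetical and is not the libcfs cfs_match_nid() API; excluding the
all-zeros and all-ones host addresses mirrors the /24 example above,
and applying that exclusion only to prefixes of /30 or shorter is an
assumption of this sketch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: does IPv4 @addr fall inside @net/@prefix?
 * Returns 1 on match, 0 if not, -1 on bad input.
 */
static int nidmask_match_ipv4(const char *net, unsigned int prefix,
			      const char *addr)
{
	struct in_addr n, a;
	uint32_t mask, host_n, host_a;

	if (prefix > 32 || inet_pton(AF_INET, net, &n) != 1 ||
	    inet_pton(AF_INET, addr, &a) != 1)
		return -1;

	/* netmask in host byte order; avoids the undefined shift by 32 */
	mask = prefix ? ~(uint32_t)0 << (32 - prefix) : 0;
	host_n = ntohl(n.s_addr);
	host_a = ntohl(a.s_addr);

	if ((host_a & mask) != (host_n & mask))
		return 0;
	/* assumption: skip network/broadcast host parts, as in the /24 example */
	if (prefix <= 30 &&
	    ((host_a & ~mask) == 0 || (host_a & ~mask) == ~mask))
		return 0;
	return 1;
}
```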

cfs_parse_nidrange() is modified to treat an IPv6 address as
equivalent to a netmask with prefix length of /128. Thus,
cfs_parse_nidlist() can now be used with lists of IPv6 addresses.

The user and kernel space implementations of cfs_parse_nidlist(),
et al., have been modified to more closely match each other. Namely,
char * is used instead of struct cfs_lstr, and a length argument
is added to the kernel space cfs_parse_nidlist(). Callers are adjusted
accordingly.

conf-sanity.sh/test_43a is modified to generate nidmasks that contain
the client's NID and verify that this is handled correctly when
nosquash_nids is set to the nidmask.

An "lnetctl debug nidlist" command is added to facilitate testing of
the userspace code.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=43a
Test-Parameters: testlist=conf-sanity env=ONLY=43a,FORCE_LARGE_NID=true,LOAD_MODULES_REMOTE=true
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Id9d0bc6f4f8b977591f0b6f88bda46ae03cb58d5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55922
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
4 months agoLU-18120 ldiskfs: improve a quota speedup patch 79/55979/2
Alexey Lyashkov [Fri, 9 Aug 2024 08:33:42 +0000 (11:33 +0300)]
LU-18120 ldiskfs: improve a quota speedup patch

The current patch might leave a small window in which a write is
duplicated; avoid it.

HPE-bug-id: LUS-12432
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I8de2e675d3bf5d5cbb41b25b04affe1d5e0d6411
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-18176 idl: reserve OBD_CONNECT2_UPDATE_LAYOUT flags 81/56181/8
Bobi Jam [Wed, 28 Aug 2024 14:02:42 +0000 (22:02 +0800)]
LU-18176 idl: reserve OBD_CONNECT2_UPDATE_LAYOUT flags

Reserve some bits to be used in CSDC:
- OBD_CONNECT2_UPDATE_LAYOUT to negotiate new intent flag
- LAYOUT_INTENT_CHANGE to allow changing layout flags
- LAIF_INCOMPRESSIBLE to set LCME_FL_NOCOMPR flag in layout

Test-Parameters: trivial
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ifabeb6ce2669f9c88e1bba176b9ecc75e9b2b935
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56181
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-18214 ldlm: change flock deadlock detection 19/56319/7
Yang Sheng [Tue, 10 Sep 2024 16:14:21 +0000 (00:14 +0800)]
LU-18214 ldlm: change flock deadlock detection

The flock deadlock detection code treated a request lock that is the
same as the blocking lock as a bug. In fact, this is a cyclic
chain, so it should be treated as a deadlock.
Also clean up the reprocess code.
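The cycle check can be sketched as a walk over a wait-for chain. The
waits_on[] table and flock_would_deadlock() are hypothetical, not the
ldlm structures; the case where the blocker is the requester itself
is reported as the shortest possible cycle rather than as an error.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OWNERS 16

/*
 * Hypothetical wait-for table: waits_on[o] is the owner that owner o
 * is blocked on, or -1 if o is not waiting.
 */
static int waits_on[MAX_OWNERS];

/*
 * Would letting @req wait on @blocker close a cycle? Follow the chain of
 * blockers from @blocker; reaching @req means deadlock. blocker == req is
 * a cycle of length one. The step bound guards against unrelated loops.
 */
static bool flock_would_deadlock(int req, int blocker)
{
	int cur = blocker, steps;

	for (steps = 0; cur != -1 && steps <= MAX_OWNERS; steps++) {
		if (cur == req)
			return true;
		cur = waits_on[cur];
	}
	return false;
}
```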

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Icf0df4ac281c2cdb6cc57cb79db137d39ecef9e6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56319
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoPost-branch-off of 2.16 master is now 2.17 2.16.50 v2_16_50
Oleg Drokin [Tue, 12 Nov 2024 05:16:33 +0000 (00:16 -0500)]
Post-branch-off of 2.16 master is now 2.17

Change-Id: I15deb804e67969e7fd97fb3620f78ec835cfd51a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
5 months agoNew release 2.16.0 2.16.0 v2_16_0
Oleg Drokin [Fri, 8 Nov 2024 20:45:12 +0000 (15:45 -0500)]
New release 2.16.0

Change-Id: I8059e5e445f200fb77600213c0d0b4c2046ee2ae
Signed-off-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17251 tests: force test_rr_alloc new seq 13/56913/2
Andreas Dilger [Thu, 7 Nov 2024 04:49:52 +0000 (21:49 -0700)]
LU-17251 tests: force test_rr_alloc new seq

Force parallel-scale test_rr_alloc to create a new OST sequence
if the number of precreated objects is low.  This should ensure
that the number of available objects is enough to complete the
test, and avoids OST object creation being blocked in the middle
of the test while the SEQ number rolls over.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=50
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I636a488e575d27ac235749911f171d5e1e3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56913
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17251 tests: try to fix test_rr_alloc again 53/56853/10
Andreas Dilger [Mon, 4 Nov 2024 20:01:56 +0000 (12:01 -0800)]
LU-17251 tests: try to fix test_rr_alloc again

Try to fix parallel-scale test_rr_alloc again:
- ensure that the test directory is striped over all MDTs to
  maximize the number of precreated objects available
- ensure the FID SEQ has enough OIDs not to run out during this test;
  running out caused some OSPs to run out of objects during creation

Add debugging to better understand the issue if it continues to fail:
- print lfs df and lfs df -i at start and on error to show imbalance
- delete files only in cleanup_rr_alloc() so distribution can be shown

Clean up test script to ensure SEQ is large enough to run test.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testgroup=review-dne-part-9 env=RACER_EXCEPT="1 2"
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=50
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I636a488e575d27ac235749911f171d5e1e33e310
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56853
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17712 tests: allow multiop to return 0 in recovery-small/157 84/56884/2
Andreas Dilger [Tue, 5 Nov 2024 00:42:38 +0000 (16:42 -0800)]
LU-17712 tests: allow multiop to return 0 in recovery-small/157

This patch allows multiop to return 0 in recovery-small/157,
which is considered success because the mmapped I/O
did not hang after the client was evicted.

The patch also skips recovery-small/157 for an older MDS, since
the test depends on changes made to the MDS code on the server.

Test-Parameters: trivial testlist=recovery-small serverversion=2.15.5

Fixes: 71f8e5d650 ("LU-14708 ptlrpc: skip unnecessary client eviction")
Change-Id: I19d0362ffbccb44c81ec2e09d9b4eccba40b9dcf
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56884
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18394 test: adjust step for different OS Arch 20/56820/5
Hongchao Zhang [Sat, 5 Oct 2024 16:12:08 +0000 (00:12 +0800)]
LU-18394 test: adjust step for different OS Arch

In sanity-quota test_49, bash can fail on ppc64le if the data to be
processed by "eval" is large, so decrease the data size to be
suitable for different architectures.

Test-Parameters: trivial testlist=sanity-quota env=SLOW=yes,ONLY=49,ONLY_REPEAT=10 clientdistro=el8.9 clientarch=ppc64le serverdistro=el8.8
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I6fef841211662c3518caf97835598dc26beaea1f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56820
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18288 tests: lru_resize_disable sets lru_max_age 42/56642/14
Andreas Dilger [Thu, 10 Oct 2024 02:01:25 +0000 (20:01 -0600)]
LU-18288 tests: lru_resize_disable sets lru_max_age

sanity test_120* has started failing much more regularly since
lru_max_age=600s was made the default.  This is caused by DLM locks
being aged out of the LRU during the test, which confuses the result.

Set lru_max_age to the old (65 min) limit in lru_resize_disable()
for tests that don't want locks to be cancelled during the subtest.
Register a stack_trap in lru_resize_disable() to reset lru_max_age
to the old value, so that the caller does not need to remember this.

Resetting lru_size to the "old" value cannot be done directly, since
it returns the number of locks in the LRU, and not the LRU size limit.
Instead, register lru_resize_enable() with stack_trap() to reset it.

Add debugging to test cases that were failing with earlier versions
of this patch, to help understand similar failures in the future.

Script code style fixes in modified subtests.

Fixes: 357cae970c ("LU-17428 ldlm: reduce default lru_max_age")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I82ee177eae14f3030a9e92e3aca86e4c47401ff5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56642
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18402 tests: skip recovery-small/{153,154a} for older MDS 29/56829/5
Jian Yu [Fri, 1 Nov 2024 16:57:38 +0000 (09:57 -0700)]
LU-18402 tests: skip recovery-small/{153,154a} for older MDS

Skip recovery-small/{153,154a} for older MDS since
the test depends on changes made to the MDS code on the server.

Test-Parameters: trivial testlist=recovery-small serverversion=2.15.5

Fixes: 654d5f3fa4 ("LU-16478 target: disconnected export")
Fixes: e818052444 ("LU-17365 lod: handle llog errors gracefuly")

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I3f26fae9d8d3d338f0e0860ea10f5fb762d70640
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56829
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-18320 tests: add skip option to sanity-lnet test_226 39/56839/3
Serguei Smirnov [Thu, 31 Oct 2024 00:04:51 +0000 (17:04 -0700)]
LU-18320 tests: add skip option to sanity-lnet test_226

Update sanity-lnet test_226 to be skipped if the peer doesn't
have the refcount fix from LU-17440.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 2b210f3905 ("LU-17440 lnet: prevent errorneous decref for asym route")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I274668048f3df27da1b226e9fb1966dbbd877713
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56839
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-17527 tests: fix sanity/255a decimal comparison 36/56836/2
Andreas Dilger [Wed, 30 Oct 2024 22:10:56 +0000 (16:10 -0600)]
LU-17527 tests: fix sanity/255a decimal comparison

Don't use fractional percentages in the sanity test_255a performance
comparison, since bash (( ... )) cannot compare numbers with decimal
points properly.  Instead, just compute the percentage speedup with
whole numbers, since the test discards anything less than 20% speedup
and a fraction of a percent will not make much difference here.

Test-Parameters: trivial
Fixes: bdd470ff97 ("LU-9069 tests: improve output of sanity test_255a")
Fixes: 395f3e1a55 ("LU-15316 tests: use integers in sanity test_255a")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id3b37e07168ee2590e52d01f66336027254ced55
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56836
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18286 tests: auster node.yml labels rocky9 as RHEL 80/56580/3
Charlie Olmstead [Tue, 1 Oct 2024 17:02:05 +0000 (11:02 -0600)]
LU-18286 tests: auster node.yml labels rocky9 as RHEL

release() assumes a node with /etc/redhat-release is RHEL.
This patch removes reading this file in favor of os-release.
Reading centos-release (if present) is still required, since
os-release on CentOS distros doesn't include the minor version.

Test-Parameters: trivial
Signed-off-by: Charlie Olmstead <charlie@whamcloud.com>
Change-Id: I888f8eeacaf843120b2beb134292047b3907a9a6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56580
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18310 tests: add debugging to test_metabench 82/56582/4
Andreas Dilger [Fri, 4 Oct 2024 07:15:03 +0000 (01:15 -0600)]
LU-18310 tests: add debugging to test_metabench

Both parallel-scale and parallel-scale-nfs are intermittently
failing test_metabench with "No space left on device" (ENOSPC)
or "Disk quota exceeded" (EDQUOT), even though this test is
creating only about 10-20k files.

Add some debugging to see where all of the space has gone, and
what quota limits are being set.  It may be that some earlier
test (e.g. compilebench) is leaving too much junk behind.

The failure rate is very low (only 2/637 runs in the past 4 weeks),
so it likely needs to be landed to catch a failure.

Test-Parameters: trivial testlist=parallel-scale
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testgroup=full-part-1
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie35ae677032ccc8113cbad5dc5a7b0504149717f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56582
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Elena <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18298 tests: skip sanity-pcc/test_1{c,d} on SLES15 SP5 09/56809/2
Qian Yingjin [Tue, 29 Oct 2024 03:25:33 +0000 (11:25 +0800)]
LU-18298 tests: skip sanity-pcc/test_1{c,d} on SLES15 SP5

Skip sanity-pcc test_1{c,d}, which fail on SLES15 SP3 through SP5.
They pass on SLES15 SP6.

Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I08fd0a192307f0072cc82033958dd8239ea507d5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56809
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18393 tests: $num_files should be multiple of $num_entries 85/56785/3
Emoly Liu [Fri, 25 Oct 2024 07:05:33 +0000 (15:05 +0800)]
LU-18393 tests: $num_files should be multiple of $num_entries

To fix the performance-sanity.sh test_4 failure
"md_validate_tests, items must be a multiple of items per directory",
set $num_files to a multiple of $num_entries.
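A sketch of the adjustment, assuming rounding down is acceptable. The helper name round_to_multiple and the floor of one full directory are illustrative choices, not necessarily what performance-sanity.sh does.

```sh
# Hypothetical sketch: round NUM_FILES down to a multiple of NUM_ENTRIES,
# but never below one full directory's worth of entries.
round_to_multiple() {
	local files=$1 entries=$2
	local r=$(( files / entries * entries ))
	[ "$r" -lt "$entries" ] && r=$entries
	echo "$r"
}

num_entries=300
num_files=$(round_to_multiple 1000 "$num_entries")    # 900
```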

Test-Parameters: trivial testlist=performance-sanity
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I8c635649ef016389d1bd22f8318a55f8d0f77962
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56785
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18379 tests: wait for stats to expire in conf-sanity/127 26/56726/8
Alex Zhuravlev [Thu, 17 Oct 2024 18:22:11 +0000 (21:22 +0300)]
LU-18379 tests: wait for stats to expire in conf-sanity/127

Filesystem stats are not updated immediately on the client,
so the test needs to wait for the cached stats to expire.

Test-Parameters: env=ONLY=127,ONLY_REPEAT=20 testlist=conf-sanity
Test-Parameters: env=ONLY=127,ONLY_REPEAT=20 testlist=conf-sanity
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I28cf407f9fe4df1f46af8cd88f50670bb8f0d93f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56726
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18402 tests: skip recovery-small/155 for older MDS 95/56795/2
Jian Yu [Sat, 26 Oct 2024 03:56:45 +0000 (20:56 -0700)]
LU-18402 tests: skip recovery-small/155 for older MDS

Skip recovery-small test 155 for MDS < 2.15.58.110, since the
test depends on server-side MDS changes that older versions lack.
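Skipping by version requires a numeric comparison of dotted version strings. The following is a minimal sketch. It assumes each component is below 256 and has no leading zeros, because shell arithmetic would treat 08 as invalid octal. ver_code is a hypothetical helper for illustration and is not the version code used by test-framework.sh.

```sh
# Hypothetical sketch: pack a dotted version (up to four components) into a
# single integer so versions can be compared arithmetically.
# Assumes every component is < 256 with no leading zeros; missing ones are 0.
ver_code() {
	local IFS=.
	set -- $1
	echo $(( (${1:-0} << 24) | (${2:-0} << 16) | (${3:-0} << 8) | ${4:-0} ))
}

mds_version=2.15.5    # hypothetical value; a real test would query the MDS
if [ "$(ver_code "$mds_version")" -lt "$(ver_code 2.15.58.110)" ]; then
	echo "skip: MDS $mds_version < 2.15.58.110"
fi
```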

Test-Parameters: trivial testlist=recovery-small env=ONLY="155 157" \
  serverversion=2.15.5

Fixes: 71f8e5d650 ("LU-14708 ptlrpc: skip unnecessary client eviction")
Change-Id: I44137ebbfb1ec0f9a6a1cf1b42cd211caa146009
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56795
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoNew RC 2.16.0-RC5 2.16.0-RC5 v2_16_0-RC5
Oleg Drokin [Sat, 26 Oct 2024 23:58:55 +0000 (19:58 -0400)]
New RC 2.16.0-RC5

Change-Id: I677f9f232df161eb37fc917469a6c23f48716b55
Signed-off-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17525 llite: soft fail unaligned dio 75/56775/6
Shaun Tancheff [Thu, 24 Oct 2024 06:58:41 +0000 (13:58 +0700)]
LU-17525 llite: soft fail unaligned dio

Rather than hard-failing unaligned DIO, skip DIO for unaligned
requests and perform them as buffered I/O instead.

Test-Parameters: testlist=sanity serverversion=2.14 env=SANITY_EXCEPT="65n 211 413"
Fixes: ff018bb77a ("LU-18284 llite: disallow udio exceptions")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I638f62ec96abc3032da5fbcf895cd835022fd759
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56775
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-18389 tests: skip added sanity/65n checks 86/56786/4
Andreas Dilger [Fri, 25 Oct 2024 07:54:01 +0000 (01:54 -0600)]
LU-18389 tests: skip added sanity/65n checks

Skip checks added at the end of sanity test_65n for MDS versions
that do not have the new layout inheritance behavior.

Test-Parameters: trivial env=ONLY=65n testlist=sanity serverversion=2.14
Fixes: 6e59408f1a ("LU-12130 lod: make pool inheritance policy more consistent")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic7c36801ec6e906d631bc4fc234f1f2b77e9f7dc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56786
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
5 months agoLU-17893 tests: wait destroys before replay-dual:test_28 12/56712/4
Vladimir Saveliev [Wed, 16 Oct 2024 14:45:37 +0000 (17:45 +0300)]
LU-17893 tests: wait destroys before replay-dual:test_28

replay-dual.sh test_28() should make sure that it drops only its
own blocking ASTs. If test_26() ran before it, there may be pending
destroys when test_28() runs. Dropping the blocking ASTs for those
destroys causes replay-dual.sh to produce:

  watchdog stack traces:
  [169376.453554] Lustre: ll_ost00_057: service thread pid 236757 was
  inactive for 40.816 seconds. Watchdog stack traces are limited to 3
  per 300 seconds, skipping this one.
  [169376.461659] [<0>] ldlm_completion_ast+0x99b/0xc00 [ptlrpc]
  [169376.461782] [<0>] ldlm_cli_enqueue_local+0x302/0x890 [ptlrpc]
  [169376.461888] [<0>] ofd_destroy_by_fid+0x29c/0x570 [ofd]
  [169376.461906] [<0>] ofd_destroy_hdl+0x22c/0x960 [ofd]

  lock timeouts:
  [169638.155933] LustreError:
  236757:0:(ldlm_request.c:104:ldlm_expired_completion_wait()) ###
  lock timed out (enqueued at 1729087746, 303s ago); not entering
  recovery in server code, just going back to sleep ns..

  and system overload indications:
  [169852.021044] Lustre: ll_ost00_052: service thread pid 236555
  completed after 516.964s. This likely indicates the system was
  overloaded (too many service threads, or not enough hardware
  resources).

Wait for completion of destroys before starting test_28().

Test-Parameters: trivial testlist=replay-dual
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I837579a428d8c2383fe884961d356ff417fc3f2e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56712
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
5 months agoLU-14330 tests: wait for orphan thread to exit 59/56559/3
Andreas Dilger [Tue, 1 Oct 2024 21:06:06 +0000 (15:06 -0600)]
LU-14330 tests: wait for orphan thread to exit

It may take a few seconds for the orphan cleanup thread to finish.
Wait for the thread to exit rather than failing the test.
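The wait-instead-of-fail pattern can be sketched as a generic polling loop. wait_until and thread_gone are hypothetical helpers, not the test-framework.sh code, and the one-second poll interval is an arbitrary choice.

```sh
# Hypothetical sketch: rerun CMD [ARGS...] once per second until it succeeds,
# giving up after TIMEOUT seconds. Returns 0 on success, 1 on timeout.
wait_until() {
	local timeout=$1 elapsed=0
	shift
	while ! "$@"; do
		[ "$elapsed" -ge "$timeout" ] && return 1
		sleep 1
		elapsed=$((elapsed + 1))
	done
	return 0
}

# Succeeds once no process named $1 exists (pgrep -x matches the exact name).
thread_gone() {
	! pgrep -x "$1" > /dev/null
}
```

For example, `wait_until 30 thread_gone NAME || echo "NAME still running"` polls for up to 30 seconds. Here NAME is a placeholder for the orphan cleanup thread's name, which is not given in the commit.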

Test-Parameters: trivial
Fixes: a1e6e75a82 ("LU-12846 tests: verify orphan upgrade compatibilty")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I604be1b9f8f460d9183ba1aaddd3b77e153ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56559
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15553 test: mkdir_on_mdt0 in replay-dual 65/51665/6
Lai Siyao [Sat, 8 Jul 2023 22:32:29 +0000 (18:32 -0400)]
LU-15553 test: mkdir_on_mdt0 in replay-dual

Several subtests in replay-dual require the test directory to be
created on MDT0, so replace mkdir with mkdir_on_mdt0. These subtests
were found with the following command:
grep -C 10 -n "do_facet.*SINGLEMDS" lustre/tests/*.sh | grep -w mkdir

Fixes: b9c4dc3c33 ("LU-14792 llite: enable filesystem-wide default LMV")
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-dual,replay-dual,replay-dual
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib28cf35575546c61bb7fa1b2c8a87ac31bd1ad4e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51665
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15553 test: mkdir_on_mdt0 in conf-sanity.sh 39/56539/3
Lai Siyao [Mon, 23 Sep 2024 01:10:20 +0000 (21:10 -0400)]
LU-15553 test: mkdir_on_mdt0 in conf-sanity.sh

Change mkdir to mkdir_on_mdt0 in several conf-sanity.sh subtests.

Fixes: b9c4dc3c33 ("LU-14792 llite: enable filesystem-wide default LMV")
Test-Parameters: trivial testlist=conf-sanity mdtcount=4
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I5ace9df10e725802ba502ca20c60afc708b857cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56539
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15553 test: mkdir_on_mdt0 in replay-vbr.sh 40/56540/5
Lai Siyao [Mon, 23 Sep 2024 01:17:40 +0000 (21:17 -0400)]
LU-15553 test: mkdir_on_mdt0 in replay-vbr.sh

Change mkdir to mkdir_on_mdt0 in several replay-vbr.sh subtests.

Fixes: b9c4dc3c33 ("LU-14792 llite: enable filesystem-wide default LMV")
Test-Parameters: trivial testlist=replay-vbr mdtcount=4
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I7457c155bbadb86adf8272113a4e4202b98c20a5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56540
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-16870 tests: Make 413a/b work on server 2.14 with client master 68/56768/4
Arshad Hussain [Wed, 23 Oct 2024 11:08:52 +0000 (07:08 -0400)]
LU-16870 tests: Make 413a/b work on server 2.14 with client master

This patch makes 413a/b interop work on server 2.14
with client master.

First, remove the blanket redirection of command output
(fallocate/dd) to /dev/null from generate_uneven_mdts(). The 'dd'
output may be a little verbose, but on failure it is dumped to
stdout and gives more information.

Second, add a check in check_fallocate_supported() for whether
fallocate is issued on the MDS. If so, check whether the MDS version
includes this feature, and fall back to the 'dd' command if it does
not.

Third, in unload_modules(), check for the version in which
unload_modules_local() was added. On older versions, fall back to
unloading the modules without unload_modules_local().

Test-Parameters: trivial testlist=sanity serverversion=2.14 env=ONLY=413a,413b,ONLY_REPEAT=10
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8a8843dd32f7e88d6d0938b67ce24353c9f9cb65
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56768
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>