Whamcloud - gitweb
fs/lustre-release.git
19 months ago LU-16037 build: remove quotes from %{mkconf_options} 44/48044/4
Jian Yu [Wed, 3 Aug 2022 21:15:06 +0000 (14:15 -0700)]
LU-16037 build: remove quotes from %{mkconf_options}

This patch fixes lustre-dkms.spec.in to remove quotes
from %{mkconf_options} passed to dkms.mkconf, so as to
resolve the following build issue:

dkms.conf: Error! Directive 'DEST_MODULE_LOCATION'
does not begin with '/kernel', '/updates', or '/extra'
in record #0.

Test-Parameters: trivial

Change-Id: I0b365d7a96cb632680bc2321e87b28a3bf076e47
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48044
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago New tag 2.15.51 2.15.51 v2_15_51
Oleg Drokin [Mon, 8 Aug 2022 19:57:22 +0000 (15:57 -0400)]
New tag 2.15.51

Change-Id: I2fc9e9fae7a975f047528c4670276b239d77ac26
Signed-off-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15994 llite: use fatal_signal_pending in range_lock 06/48106/2
Qian Yingjin [Tue, 2 Aug 2022 09:14:48 +0000 (05:14 -0400)]
LU-15994 llite: use fatal_signal_pending in range_lock

FIO io_uring failed with one file shared by two FIO processes
on the Ubuntu 22.04 kernel.
Analysis showed that the range_lock() function returns
-ERESTARTSYS when there is a pending signal on the current
process during Lustre I/O. This causes -EINTR to be returned to
the application.

We solve this bug by replacing @signal_pending(current) with
@fatal_signal_pending(current) in range_lock(). The range_lock()
function now returns -ERESTARTSYS only when the current process
has a fatal pending signal such as SIGKILL.
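
The difference between the two checks can be modeled in userspace.
This is a minimal sketch and not the Lustre code: wait_result(), the
pending-signal list and the "only SIGKILL is fatal" rule are
simplifying assumptions made for illustration.

```python
import signal

# Assumption for this model: only SIGKILL counts as fatal. The kernel's
# fatal_signal_pending() works on the task's real signal state.
FATAL_SIGNALS = {signal.SIGKILL}
ERESTARTSYS = 512  # kernel-internal errno value


def wait_result(pending, fatal_only):
    """Return 0 to keep waiting for the lock, or -ERESTARTSYS to abort."""
    if fatal_only:
        # Models fatal_signal_pending(current)
        interrupted = bool(set(pending) & FATAL_SIGNALS)
    else:
        # Models signal_pending(current): any pending signal interrupts
        interrupted = bool(pending)
    return -ERESTARTSYS if interrupted else 0
```

With fatal_only=False any pending signal aborts the wait, which is
the behavior that surfaced as -EINTR; with fatal_only=True only
SIGKILL does.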

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I0a0be8fa3b4ba5c89f7866286b2bdc6595f18026
Reviewed-on: https://review.whamcloud.com/48106
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-16042 tests: can not get cache size on Arm64 30/48030/2
Kevin Zhao [Mon, 25 Jul 2022 07:53:44 +0000 (15:53 +0800)]
LU-16042 tests: can not get cache size on Arm64

This fixes the test failure on Arm64, where the cache size is
not displayed in /proc/cpuinfo. In VMs and on some older Arm64
CPUs the cache size cannot be obtained at all, so it is better
to fall back to a preset value when the cache size is not
available.
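
The fallback idea can be sketched as below. The test itself is a
shell script; this Python model only illustrates the logic. The
function name cache_size_kb() and the 512 KB default are hypothetical
and are not taken from the patch.

```python
import re

DEFAULT_CACHE_KB = 512  # hypothetical fallback, not the value used by the patch


def cache_size_kb(cpuinfo_text, default_kb=DEFAULT_CACHE_KB):
    """Return the first 'cache size' value in KB, or default_kb if absent."""
    match = re.search(r"^cache size\s*:\s*(\d+)\s*KB", cpuinfo_text,
                      re.MULTILINE)
    return int(match.group(1)) if match else default_kb
```

On a real system the input would be the contents of /proc/cpuinfo;
passing the text in as an argument keeps the sketch testable.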

Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Change-Id: I17ce1d8accc69d1489db2071a2741b3927fff302
Reviewed-on: https://review.whamcloud.com/48030
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-16019 llite: fully disable readahead in kernel I/O path 93/47993/4
Qian Yingjin [Wed, 20 Jul 2022 02:22:35 +0000 (22:22 -0400)]
LU-16019 llite: fully disable readahead in kernel I/O path

In newer kernels (RHEL9 or Ubuntu 22.04), the readahead path may
be out of the control of the Lustre CLIO engine:

generic_file_read_iter()
  ->filemap_read()
    ->filemap_get_pages()
      ->page_cache_sync_readahead()
        ->page_cache_sync_ra()

void page_cache_sync_ra()
{
        if (!ractl->ra->ra_pages || blk_cgroup_congested()) {
                if (!ractl->file)
                        return;
                req_count = 1;
                do_forced_ra = true;
        }

        /* be dumb */
        if (do_forced_ra) {
                force_page_cache_ra(ractl, req_count);
                return;
        }
        ...
}

From the kernel readahead code, even if read-ahead is disabled
(via @ra_pages == 0), the kernel still issues the request as
read-ahead, because it is needed to satisfy the requested range.
The forced read-ahead limits the read to just the requested
range, which is set to 1 page in this case.

Thus setting @ra_pages to 0 alone cannot completely avoid
read-ahead in the kernel I/O path.
To fully disable read-ahead in the Linux kernel I/O path, we
also need to set @io_pages to 0, which sets the I/O range to 0
in @force_page_cache_ra():
void force_page_cache_ra()
{
        ...
        max_pages = max_t(unsigned long, bdi->io_pages,
                          ra->ra_pages);
        nr_to_read = min_t(unsigned long, nr_to_read, max_pages);
        while (nr_to_read) {
                ...
        }
        ...
}

After setting bdi->io_pages to 0, sanity/101j passes.
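
The clamp in force_page_cache_ra() quoted above can be checked with a
small model. This is a sketch of the arithmetic only; the function
name forced_readahead_pages() is an assumption made for illustration.

```python
def forced_readahead_pages(nr_to_read, io_pages, ra_pages):
    """Model the clamp in force_page_cache_ra(): pages that will be read."""
    max_pages = max(io_pages, ra_pages)
    return min(nr_to_read, max_pages)
```

With ra_pages == 0 but a non-zero io_pages, the forced request of one
page is still read. Only when both values are 0 does the count drop
to 0, so the while (nr_to_read) loop is never entered.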

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I859a6404abb9116d9acfa03de91e61d3536d3554
Reviewed-on: https://review.whamcloud.com/47993
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15938 llog: llog_reader to detect more corruptions 34/47934/6
Mikhail Pershin [Tue, 12 Jul 2022 06:40:38 +0000 (09:40 +0300)]
LU-15938 llog: llog_reader to detect more corruptions

Improve llog_reader to detect more corruptions and report
errors:
 - notify if llog bitmap has bits set with no records in llog
 - compare header records count with the number of records
   really found
 - fix the number of records to output, preventing wrong output
   of NOT SET record
 - list missing records in gap if found
 - count all errors found, add prefix 'error:' in output for
   better output processing by third-party scripts
 - don't exit immediately in case of error but continue if
   possible and output all valid data read
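
Two of the checks listed above can be sketched on a simplified llog
model. This is not the llog_reader code: check_llog(), its arguments
and the message texts are hypothetical, and the real header record
and on-disk layout are ignored here.

```python
def check_llog(header_count, bitmap_bits, found_indices):
    """Return a list of 'error: ...' strings for a simplified llog model.

    header_count  : record count stored in the header
    bitmap_bits   : set of record indices marked as used in the bitmap
    found_indices : set of record indices actually read from the file
    """
    errors = []
    # Bits set in the bitmap with no matching record in the log
    for idx in sorted(bitmap_bits - found_indices):
        errors.append(f"error: bit {idx} set in bitmap but record not found")
    # Header count compared with the number of records really found
    if header_count != len(found_indices):
        errors.append(
            f"error: header count {header_count} != {len(found_indices)} found"
        )
    return errors
```

All problems are collected instead of stopping at the first one,
which mirrors the "count all errors, continue if possible" items.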

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic47dc6bb6cbdd9db6f888a0b892254403a628912
Reviewed-on: https://review.whamcloud.com/47934
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15996 quota: change 'none' to 'expired' for grace 12/47912/7
Hongchao Zhang [Thu, 28 Jul 2022 13:56:55 +0000 (21:56 +0800)]
LU-15996 quota: change 'none' to 'expired' for grace

If the grace time has expired, it is better for 'lfs quota' to
show 'expired' than 'none' in the grace output.
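
A minimal sketch of the display rule, under stated assumptions:
format_grace(), the "-" shown when no grace period is running, and
the plain-seconds rendering of the remaining time are illustrative
and do not reproduce the exact 'lfs quota' output format.

```python
def format_grace(grace_end, now):
    """Grace column for a simplified model; timestamps are in seconds."""
    if grace_end == 0:
        return "-"          # assumed: no grace period running
    if now >= grace_end:
        return "expired"    # previously shown as "none"
    return f"{grace_end - now}s"
```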

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Honghao Zhang <hongchao@whamcloud.com>
Change-Id: I7a3fac77ca6e16ad406bef0bd7642d6d50feb4b2
Reviewed-on: https://review.whamcloud.com/47912
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15896 gss: support OpenSSLv3 17/47717/11
Sebastien Buisson [Mon, 13 Jun 2022 12:41:11 +0000 (14:41 +0200)]
LU-15896 gss: support OpenSSLv3

Lustre GSS code makes use of some OpenSSL API that has been
deprecated in v3, namely all the functions in the DH_* family.
So replace them with their EVP_PKEY_* counterparts if Lustre is
built on a system with OpenSSLv3.

Fixes: ee60c14360 ("LU-15896 gss: ignore OpenSSLv3 deprecated API")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I78a4ca18b25aca3c34fe84e41413a33caddc01b6
Reviewed-on: https://review.whamcloud.com/47717
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-9859 lfsck: use Linux bitmap API 79/47579/11
James Simmons [Mon, 11 Jul 2022 16:41:58 +0000 (12:41 -0400)]
LU-9859 lfsck: use Linux bitmap API

Replace the use of the libcfs specific bitmap API used by lfsck
with the standard Linux bitmap API.

Change-Id: Icc0d9d2ceb9ca7b4b94dd728d9b9c499cf4d2414
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47579
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-14975 utils: non-recursive dir migration fix 12/47012/2
Lai Siyao [Wed, 30 Mar 2022 06:28:04 +0000 (02:28 -0400)]
LU-14975 utils: non-recursive dir migration fix

If sem_init() doesn't return 0, llapi_semantic_traverse() won't
call sem_fini() in directory traverse, therefore
cb_migrate_mdt_init() shouldn't increase param->fp_depth if it
reaches max depth in non-recursive mode.

Update sanity 230w.

Fixes: 5604a6d270b ("LU-14975 dne: dir migration in non-recursive mode")
Signed-off-by: Lai Syao <lai.siyao@whamcloud.com>
Change-Id: I8814aaae7c267cec51654175f9fa0708f7685a5a
Reviewed-on: https://review.whamcloud.com/47012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15486 lod: mirroring a plain file in mirrored-layout dir 17/46517/2
Bobi Jam [Mon, 14 Feb 2022 11:50:04 +0000 (19:50 +0800)]
LU-15486 lod: mirroring a plain file in mirrored-layout dir

If a file does not have a mirror in a directory with a default FLR
mirror, then "lfs mirror extend" on the file fails with
"cannot create volatile file: Invalid argument".

This comes from the non-striped file layout generated by LOD
inheriting its FLR state from the default FLR while it contains
no mirror, and lov_init_composite() complains about it.

 if (equi(flr_state == LCM_FL_NONE, comp->lo_mirror_count > 1))
         RETURN(-EINVAL);
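
The quoted check reads more easily as a truth table. The sketch below
models it in Python; equi() is taken to mean logical equivalence, and
the constant values LCM_FL_NONE = 0 and LCM_FL_RDONLY = 1 are
assumptions for this illustration.

```python
LCM_FL_NONE = 0    # assumed value: file has no FLR state
LCM_FL_RDONLY = 1  # assumed value: one example of a non-NONE FLR state


def equi(a, b):
    """Logical equivalence: true when both are true or both are false."""
    return bool(a) == bool(b)


def composite_layout_invalid(flr_state, mirror_count):
    """Model of the quoted check: True means the -EINVAL branch is taken."""
    return equi(flr_state == LCM_FL_NONE, mirror_count > 1)
```

A plain file that inherited a non-NONE FLR state but has only one
mirror gives equi(False, False), which is true, so the layout is
rejected. That is the failure described above.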

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I5e849acb2327ce735d0008271bfd48fa7293161c
Reviewed-on: https://review.whamcloud.com/46517
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-8837 lustre: make uapi...lustre_disk.h unnecessary on client 94/41994/14
Mr NeilBrown [Sun, 8 May 2022 22:09:31 +0000 (18:09 -0400)]
LU-8837 lustre: make uapi...lustre_disk.h unnecessary on client

uapi/linux/lustre/lustre_disk.h doesn't contain anything that is
needed for client-only code, but that code doesn't compile with the
file excluded, largely due to dependency on IS_SERVER() and related
macros.

So for client-only code provide stubs for IS_SERVER() and related
macros, and don't include the uapi...lustre_disk.h file.

This will cause some code to now be compiled-out on client-only, and
allows some #ifdefs to be removed.

A few functions need to be protected with #ifdef HAVE_SERVER_SUPPORT,
and llog_server.o needs to be disabled for client-only compiles.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I19c5b1612108e448f8b6a1fe3d3a448aa4abdd2a
Reviewed-on: https://review.whamcloud.com/41994
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-8238 ldlm: rid of obsolete param of ldlm_resource_get() 31/20631/10
Bobi Jam [Thu, 2 Sep 2021 15:44:57 +0000 (23:44 +0800)]
LU-8238 ldlm: rid of obsolete param of ldlm_resource_get()

The second parameter @parent of ldlm_resource_get() is obsolete, just
remove it.

Test-Parameters: trivial
Change-Id: I88af99c6984eda50a21da4d516ce7dea8cba60f5
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/20631
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-12511 utils: fix regression for UAPI headers for native client 03/47803/6
James Simmons [Mon, 27 Jun 2022 15:24:19 +0000 (11:24 -0400)]
LU-12511 utils: fix regression for UAPI headers for native client

A patch landed to add a wiretest for the GSS wire protocol, which
is lacking for the native client. Disable the new test code
for the native Linux client build.

Test-Parameters: trivial
Fixes: 7dfbc71350 ("LU-9243 gss: fix GSS struct definition badness")
Change-Id: I31c387b757a77f4503b923c784911afc16c878a0
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47803
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-9680 utils: fix Netlink / YAML library handling 02/47802/11
James Simmons [Sat, 16 Jul 2022 19:34:48 +0000 (15:34 -0400)]
LU-9680 utils: fix Netlink / YAML library handling

Testing the implementation of Netlink with Lustre with early
code revealed some userland lnetconfig library bugs. Several bugs
fixed are:

1) First issue was that the YAML parser and emitter can share a
   netlink socket. This means the netlink callbacks will expect
   that same void argument passed in. In the error handling case
   we were expecting a struct yaml_netlink_output but all other
   callbacks were using struct yaml_netlink_input. This mismatch
   can cause the application to segfault. So move all netlink
   callback handling to use just the yaml_parser. The yaml
   emitter is now used only to send Netlink packets to the
   kernel.

   Also fix the Netlink ext_ack error message handling.

2) In my board testing I found various bugs related to the
   parsing of the YAML to create the Netlink packet to send to
   the kernel. This patch resolves all the known issues. Most
   are related to the complex layering of sequences, mappings
   and flows in a YAML block.

3) Fix up nla_strlcpy autoconf test which always fails with
   Oleg's special setup.

4) Add a Netlink protocol version YAML function.

Test-Parameters: trivial
Change-Id: I7e7c755ceaa969dffff8c6f771c2ac048dc55720
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47802
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Joe Atzinger <joseph.atzinger@microsoft.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-8367 osp: enable replay for precreation request 89/46889/21
Alexander Boyko [Tue, 22 Mar 2022 12:09:01 +0000 (08:09 -0400)]
LU-8367 osp: enable replay for precreation request

Lustre has a kind of deadlock between osp_precreate_thread()
and stripe allocation in osp_precreate_reserve(). The stripe
allocation thread has allocated objects and sleeps waiting for
more objects in osp_precreate_reserve() in case of OST failover.
After reconnection, osp_precreate_thread() calls
osp_precreate_cleanup_orphans() to synchronize the last id and
clean up unused objects, but it waits for the object reservation
count (d->opd_pre_reserved) to reach zero. So no more objects can
be created on the OST and no reserved objects can be freed.
This produces "slow creates" messages and MDT creation threads hang:
osp_precreate_reserve()) kjcf05-OST0003-osc-MDT0001: slow creates,
 last=[0x340000400:0x23a4f483:0x0], next=[0x340000400:0x23a4f378:0x0],
 reserved=267, sync_changes=0, sync_rpcs_in_progress=0, status=0
The issue reproduces more often with the overstriping feature.

There is no need for the orphan clean-up phase when the MDT
supports resend/replay for precreation requests. This behaviour
resolves the osp_precreate_cleanup_orphans() hang and unblocks
object creation.

Force creation logic is added to support a reformatted OST with
the same index. Previously this was done during the orphan
clean-up phase.

Sanity tests 27S and 822 become invalid. 27S is based on orphan
clean-up after reconnection, 822 is based on a non-resendable
OST_CREATE request. These tests are removed.

HPE-bug-id: LUS-10793
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I21287b51252e573e796fac69ee3df6ac90e28c10
Reviewed-on: https://review.whamcloud.com/46889
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15850 lmv: always space-balance r-r directories 78/47578/6
Lai Siyao [Thu, 9 Jun 2022 11:44:41 +0000 (07:44 -0400)]
LU-15850 lmv: always space-balance r-r directories

If the MDT free space is imbalanced, use QOS space balancing for
round-robin subdirectory creation, regardless of the depth
of the directory tree.  Otherwise, new subdirectories created
in parents with round-robin default layout may suddenly become
"sticky" on the parent MDT and upset the space balancing and
load distribution.

Add sanity/test_413h to check that round-robin dirs always balance.

Test-Parameters: testlist=sanity env=ONLY=413h,ONLY_REPEAT=100
Fixes: 38c4c538f5 ("LU-15216 lmv: improve MDT QOS space balance")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia1d0b5b1a027cf14236f93ae34b5cf4929e76d23
Reviewed-on: https://review.whamcloud.com/47578
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15846 tests: don't use comma-separated debug flags 08/47308/7
Andreas Dilger [Thu, 12 May 2022 04:45:45 +0000 (22:45 -0600)]
LU-15846 tests: don't use comma-separated debug flags

To avoid test interop issues between 2.15 clients and 2.12/2.14
servers, don't use comma-separated debug flags in sanity-quota.sh
quota_init() and quota_fini().

Test-Parameters: trivial testlist=sanity-quota.sh env=ONLY=0 serverversion=2.14.0
Fixes: 6b6fde1026 ("LU-13055 libcfs: allow comma-separated masks")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifca39054d14292bca8bcff9b8e03ae58fd5cc3a8
Reviewed-on: https://review.whamcloud.com/47308
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-14651 ldiskfs: add 5.11 kernel support 00/47900/3
James Simmons [Wed, 6 Jul 2022 17:38:17 +0000 (11:38 -0600)]
LU-14651 ldiskfs: add 5.11 kernel support

The Ubuntu 20.04.3 LTS moved to the 5.11 kernel. Support for this
kernel is a small step from the 5.10 kernel support. This patch
adds these small changes to support ldiskfs.

Test-Parameters: trivial
Change-Id: I3055736658b628fe79a6a9fc20ac01e7e1597630
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47900
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15821 ldlm: Prioritize blocking callbacks 15/47215/9
Patrick Farrell [Thu, 5 May 2022 00:50:57 +0000 (20:50 -0400)]
LU-15821 ldlm: Prioritize blocking callbacks

The current code places bl_ast lock callbacks at the end of
the global BL callback queue.  This is bad because it
causes urgent requests from the server to wait behind
non-urgent cleanup tasks to keep lru_size at the right
level.

This can lead to evictions if there is a large queue of
items in the global queue so the callback is not serviced
in a timely manner.

Put bl_ast callbacks on the priority queue so they do not
wait behind the background traffic.
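
The queueing change can be illustrated with a two-queue model. This
is only a sketch: the class BlCallbackQueues and its methods are
hypothetical and do not correspond to the ldlm data structures.

```python
from collections import deque


class BlCallbackQueues:
    """Two FIFO queues; the priority queue is always drained first."""

    def __init__(self):
        self.priority = deque()
        self.normal = deque()

    def add(self, item, urgent=False):
        # Urgent items (e.g. bl_ast callbacks) go on the priority queue
        (self.priority if urgent else self.normal).append(item)

    def next_item(self):
        if self.priority:
            return self.priority.popleft()
        if self.normal:
            return self.normal.popleft()
        return None
```

An urgent callback added after a backlog of background items is still
served first, so it does not wait behind the LRU cleanup work.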

Add some additional debug in this area.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic6eb65819a4a93e9d30e807d386ca18380b30c7d
Reviewed-on: https://review.whamcloud.com/47215
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15850 llite: pass dmv inherit depth instead of dir depth 77/47577/6
Lai Siyao [Thu, 9 Jun 2022 11:40:42 +0000 (07:40 -0400)]
LU-15850 llite: pass dmv inherit depth instead of dir depth

In directory creation, once its ancestor has a default LMV, pass
the inherit depth; otherwise pass the directory depth to ROOT.

This depth will be used in QoS allocation.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id480f32c1718e9f62314c2dfe8905be5db94d1f2
Reviewed-on: https://review.whamcloud.com/47577
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-16010 kernel: kernel update RHEL8.6 [4.18.0-372.16.1.el8_6] 48/47948/4
Jian Yu [Fri, 15 Jul 2022 16:50:08 +0000 (09:50 -0700)]
LU-16010 kernel: kernel update RHEL8.6 [4.18.0-372.16.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.16.1.el8_6.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I08db577f31a1d686b88804384a05d5b418e634d5
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47948
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15910 llite: use max default EA size to get default LMV 37/47937/4
Lai Siyao [Mon, 11 Jul 2022 14:27:32 +0000 (10:27 -0400)]
LU-15910 llite: use max default EA size to get default LMV

Subdir mount will fetch ROOT default LMV and set it, but the default
EA size cl_default_mds_easize may not be set for MDT0 yet, because
it's updated upon getattr/enqueue, and if subdir mount is not on MDT0,
it may not be initialized yet. Use max EA size to fetch default
layout in ll_dir_get_default_layout().

Fixes: a162e24d2d ("LU-15910 llite: enforce ROOT default on subdir mount")
Fixes: 716ac65ef6 ("LU-15910 tests: skip sanity/413g for SSK")
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3c762cd371a80c2bea12d7fdbc16c6b14b3214e6
Reviewed-on: https://review.whamcloud.com/47937
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-12189 ec: code to add support for M to N parity 28/47628/5
James Simmons [Wed, 29 Jun 2022 19:32:13 +0000 (15:32 -0400)]
LU-12189 ec: code to add support for M to N parity

This code adds basic functionality for calculating N parities
for M data units. This allows much more than just working with
raid6 calculations. The code is derived from the Intel isa-l
userland library. Keep the code in a separate module for easy
merger upstream at a later time.
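
The idea of N parities over M data units can be sketched with
GF(2^8) arithmetic. This is an illustrative model, not the code added
by the patch: the Vandermonde-style coefficients built from powers of
2 are an assumption, and the function names are hypothetical. The
reduction polynomial 0x11D is the one commonly used for RAID6.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8) using the given reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(base, exp):
    """Raise base to a non-negative integer power in GF(2^8)."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def encode_parity(data_units, n_parity):
    """Return n_parity parity units for equally sized data units (bytes)."""
    size = len(data_units[0])
    parities = []
    for j in range(n_parity):
        parity = bytearray(size)
        for i, unit in enumerate(data_units):
            coef = gf_pow(2, i * j)  # assumed coefficient for row j, column i
            for k in range(size):
                parity[k] ^= gf_mul(coef, unit[k])
        parities.append(bytes(parity))
    return parities
```

Row 0 uses coefficient 1 everywhere and reduces to plain XOR parity;
rows 0 and 1 together correspond to the familiar RAID6 P and Q
calculation. Whether larger N stays recoverable depends on the
coefficient matrix, which this sketch does not address.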

Test-Parameters: trivial
Change-Id: Ie0bb5af2514c213db40de33139e03e16f9605ce8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Adam Disney <disneyaw@ornl.gov>
Reviewed-on: https://review.whamcloud.com/47628
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15851 lnet: Adjust niov checks for large MD 19/47319/3
Chris Horn [Sat, 16 Apr 2022 16:01:57 +0000 (10:01 -0600)]
LU-15851 lnet: Adjust niov checks for large MD

An LNet user can allocate a large contiguous MD. That MD can have >
LNET_MAX_IOV pages which causes some LNDs to assert on either niov
argument passed to lnd_recv() or the value stored in
lnet_msg::msg_niov. This is true even in cases where the actual
transfer size is <= LNET_MTU and will not exceed limits in the LNDs.

Adjust ksocklnd_send()/ksocklnd_recv() to assert on the return value
of lnet_extract_kiov().

Remove the assert on msg_niov (payload_niov) from kiblnd_send().
kiblnd_setup_rd_kiov() will already fail if we exceed ko2iblnd's
available scatter gather entries.

HPE-bug-id: LUS-10878
Test-Parameters: trivial
Fixes: 857f11169f ("LU-13004 lnet: always put a page list into struct lnet_libmd")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaa851d90f735d04e5167bb9c07235625759245b2
Reviewed-on: https://review.whamcloud.com/47319
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-16036 test: make sanity-lfsck 15d more robust 48/48048/3
Lai Siyao [Fri, 15 Jul 2022 08:30:32 +0000 (04:30 -0400)]
LU-16036 test: make sanity-lfsck 15d more robust

Currently LFSCK of a migrating directory is not fully supported,
thus accessing such a directory may fail. To make sanity-lfsck 15d
more robust, reformat servers after the test.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity-lfsck env=ONLY=15d,ONLY_REPEAT=100
Fixes: 54a2d4662b58 ("LU-15868 lfsck: don't crash upon dir migration failure")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie613737ab3e8d294e1f9e5709ceb35baa75790ad
Reviewed-on: https://review.whamcloud.com/48048
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-16023 tests: sanity-quota/8 should return success 66/47966/4
Alex Zhuravlev [Mon, 18 Jul 2022 10:28:49 +0000 (13:28 +0300)]
LU-16023 tests: sanity-quota/8 should return success

sanity-quota/8 should return success explicitly

Test-Parameters: trivial testlist=sanity-quota
Fixes: bc69a8d058 ("LU-8621 utils: cmd help to stdout or short cmd error")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id30bfd3e0bafedb6516471accbc0519cc640d2bd
Reviewed-on: https://review.whamcloud.com/47966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15282 tests: relax test_51d thresholds somewhat 78/47978/3
Andreas Dilger [Mon, 18 Jul 2022 22:12:12 +0000 (16:12 -0600)]
LU-15282 tests: relax test_51d thresholds somewhat

Added combinations for sanity.sh test_51d are failing some fraction
of the time.  Relax thresholds somewhat to avoid spurious failures,
while keeping added configs to detect major regressions.

Test-Parameters: trivial
Fixes: fd5c915eff ("LU-15282 tests: improve sanity test_51d coverage")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2d28f77fa377a1eb96a00cf35827b9ebc5af806b
Reviewed-on: https://review.whamcloud.com/47978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15983 lnet: Define KFILND network type 30/47830/5
Chris Horn [Wed, 29 Jun 2022 00:24:32 +0000 (19:24 -0500)]
LU-15983 lnet: Define KFILND network type

Define the KFILND network type. This reserves the network type number
for future implementation and allows creation of kfi peers and
adding routes to kfi peers.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11060
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9111645f1290c8af4937d1b2689a068df81922a4
Reviewed-on: https://review.whamcloud.com/47830
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-15850 mdt: pack default LMV in open reply 76/47576/7
Lai Siyao [Thu, 9 Jun 2022 11:26:40 +0000 (07:26 -0400)]
LU-15850 mdt: pack default LMV in open reply

Add flag MDS_OPEN_DEFAULT_LMV to indicate that default LMV should be
packed in open reply, otherwise if open fetches LOOKUP lock, client
won't know directory has default LMV, and in subdir creation default
LMV won't take effect.

Test-Parameters: clientversion=2.14 testlist=sanity mdtcount=4 mdscount=2 env=SANITY_EXCEPT="39l 134b 150b 160g 205a 208 230e 230p 270a 300g 807"
Test-Parameters: serverversion=2.14 testlist=sanity mdtcount=4 mdscount=2 env=SANITY_EXCEPT="65n 247f"
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If2300ca39f406169eff9eab8f973ca1c2bfc8202
Reviewed-on: https://review.whamcloud.com/47576
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-14472 quota: skip non-exist or inact tgt for lfs_quota 71/41771/19
Hongchao Zhang [Wed, 15 Dec 2021 12:11:17 +0000 (20:11 +0800)]
LU-14472 quota: skip non-exist or inact tgt for lfs_quota

The nonexistent or inactive targets (MDC or OSC) should be skipped
for "lfs quota".

Change-Id: I25eece413715e4e05dd94ccbfd101220da7477f9
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41771
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng, Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11407 tgt: cleanup job_stats output printing 64/37764/24
Andreas Dilger [Fri, 10 Jan 2020 21:18:53 +0000 (14:18 -0700)]
LU-11407 tgt: cleanup job_stats output printing

Escape non-printable and other special characters in the JobID
name, which may be passed from the client environment, to avoid
breaking YAML format parsing.  We can't use the kernel "%*pE"
escape format, because that doesn't have any option to escape
printable characters like quotes or regular spaces.
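
A minimal sketch of this kind of escaping, for illustration only. It is
not the code from this patch: the function name jobid_escape(), the set
of characters treated as safe, and the "\xHH" output form are all
assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not the Lustre implementation): copy `src` into
 * `dst`, replacing every byte that is not alphanumeric or one of a few
 * safe punctuation characters with a "\xHH" escape, so that quotes,
 * spaces and control characters cannot break a YAML consumer.
 * Returns the length written (without the trailing NUL), or -1 if
 * `dst` is too small.
 */
static int jobid_escape(const char *src, char *dst, size_t dst_len)
{
	size_t out = 0;

	for (; *src != '\0'; src++) {
		unsigned char c = (unsigned char)*src;
		int safe = isalnum(c) || c == '_' || c == '-' ||
			   c == '.' || c == '@';
		size_t need = safe ? 1 : 4;

		/* keep room for this character plus the trailing NUL */
		if (out + need + 1 > dst_len)
			return -1;
		if (safe)
			dst[out++] = (char)c;
		else
			out += (size_t)snprintf(dst + out, dst_len - out,
						"\\x%02x", c);
	}
	if (out + 1 > dst_len)
		return -1;
	dst[out] = '\0';
	return (int)out;
}
```

With this sketch, a space becomes \x20 and a double quote becomes \x22,
while ordinary letters and digits are copied unchanged.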

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If6a0bc4276fae03305f94e8c85d8f109913ebbe5
Reviewed-on: https://review.whamcloud.com/37764
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-6612 utils: strengthen llog_reader vs wrong format/header 54/15654/5
Bruno Faccini [Mon, 20 Jul 2015 14:30:11 +0000 (16:30 +0200)]
LU-6612 utils: strengthen llog_reader vs wrong format/header

The following snippet shows that llog_reader can be confused by an
invalid record count of 0 when parsing what it expects to be an
LLOG file header:
root# dd if=/dev/zero bs=4096 count=1 of=/tmp/zeroes
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000263962 s, 15.5 MB/s
root# llog_reader /tmp/zeroes
Memory Alloc for recs_buf error.
Could not pack buffer; rc=-12
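
The kind of header sanity check that avoids this failure can be sketched
as below. This is an illustration under assumptions, not the llog_reader
code: struct demo_llog_hdr, its two fields, and the magic value are
hypothetical stand-ins for the real LLOG header.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified header; the real LLOG header has more fields. */
struct demo_llog_hdr {
	uint32_t record_type;	/* expected to carry the header magic */
	uint32_t record_count;	/* number of records in the file */
};

/* Made-up magic value, used only for this sketch. */
#define DEMO_LLOG_HDR_MAGIC 0x4c4c4f47u

/*
 * Validate the header before sizing any buffer from it. An all-zero
 * block (as produced by dd from /dev/zero) fails both checks and is
 * rejected up front instead of leading to a zero-sized allocation.
 */
static int demo_llog_hdr_check(const struct demo_llog_hdr *hdr)
{
	if (hdr->record_type != DEMO_LLOG_HDR_MAGIC)
		return -EINVAL;
	if (hdr->record_count == 0)
		return -EINVAL;
	return 0;
}
```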

Test-Parameters: trivial testlist=sanity,sanity-hsm
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I12be79e6c6a5da384a5fd81878a76a7ea8aa5834
Reviewed-on: https://review.whamcloud.com/15654
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15984 o2iblnd: debug message is missing a newline 33/47933/2
Serguei Smirnov [Mon, 11 Jul 2022 22:49:04 +0000 (15:49 -0700)]
LU-15984 o2iblnd: debug message is missing a newline

Add missing newline to one of the debug messages in
kiblnd_pool_alloc_node.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I541622322ea6166892270dbfd1567cc64f8c314c
Reviewed-on: https://review.whamcloud.com/47933
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15938 lod: prevent endless retry in recovery thread 98/47698/3
Mikhail Pershin [Wed, 22 Jun 2022 10:27:48 +0000 (13:27 +0300)]
LU-15938 lod: prevent endless retry in recovery thread

- abort lod_sub_recovery_thread() by obd_abort_recov_mdt in
  addition to obd_abort_recovery
- handle 'short llog' situation gracefully, when remote llog
  is shorter than local copy header expects, trust remote llog
  data and consider llog processing as finished
- on other errors during remote llog read, set obd_abort_recov_mdt
  but not obd_abort_recovery in attempt to skip MDT-MDT recovery
  only and continue with client recovery while possible
- fix parsing problem with 'abort_recov' and 'abort_recov_mdt' in
  lmd_parse() causing no MDT recovery abort but client recovery
  abort always. Allow also 'abort_recovery_mdt' mount option name

The original endless retry is caused by a de-sync between the
local llog structures and the remote llog. The local llog header
says there is a record with some ID, so the recovery thread tries
to get that record from the remote llog. Meanwhile there is no
such record on the remote server, so it reads the whole llog and
returns it properly, but llog processing considers that to be an
incomplete llog due to network issues and retries endlessly.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib127fd0d1abd5289d90c7b4b3ca74ab6fc78bc71
Reviewed-on: https://review.whamcloud.com/47698
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-10994 clio: Remove cl_2queue_add wrapper 51/47651/4
Shivani Bhardwaj [Mon, 27 Jun 2022 15:13:25 +0000 (11:13 -0400)]
LU-10994 clio: Remove cl_2queue_add wrapper

Remove the wrapper function cl_2queue_add() and replace all its calls in
different files with the function it wrapped. Also, comments are added
wherever necessary to make the working of function clear. Prototype of
the function is also removed from the header file as it is no longer
needed.

Linux-commit: 53f1a12768a55e53b2c40e00a8804b1edfa739b3

Change-Id: Ic746c45e3dda9fdf3f1d2f8c6204d80fec5c058f
Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/47651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15925 lnet: add debug messages for IB 83/47583/4
Cyril Bordage [Thu, 9 Jun 2022 21:41:54 +0000 (23:41 +0200)]
LU-15925 lnet: add debug messages for IB

If net debug is enabled, information about the connection is
collected when the tx status is ECONNABORTED (IB only).

Test-Parameters: trivial
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I44a33703931630b85cc0e847e2a038217b7967c6
Reviewed-on: https://review.whamcloud.com/47583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15902 obdclass: dt_try_as_dir() check dir exists 83/47483/7
Lai Siyao [Thu, 19 May 2022 22:31:07 +0000 (18:31 -0400)]
LU-15902 obdclass: dt_try_as_dir() check dir exists

If an object is not a directory, but dt_lookup() is called on it, it
may crash because .do_lookup is NULL for a non-directory file.

Add an argument to check object existence and type in dt_try_as_dir(),
and skip this check for an object that is about to be created.
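
A simplified userspace sketch of the idea, under assumptions: struct
demo_obj and the demo_* functions are hypothetical and only model the
"check existence and type before calling through a possibly NULL
lookup method" pattern, not the actual dt_object API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object model, not the Lustre dt_object. */
struct demo_obj {
	bool exists;
	bool is_dir;
	int (*do_lookup)(const char *name);	/* NULL for non-directories */
};

/* Trivial lookup method used by directory objects in this sketch. */
static int demo_lookup_ok(const char *name)
{
	(void)name;
	return 0;
}

/*
 * Return true if the object can be used as a directory. With
 * `check` set, the object must exist and be a directory; callers
 * that are about to create the object pass check = false.
 */
static bool demo_try_as_dir(const struct demo_obj *obj, bool check)
{
	if (check && (!obj->exists || !obj->is_dir))
		return false;
	return obj->do_lookup != NULL;
}

/* Lookup that fails cleanly instead of calling a NULL method. */
static int demo_lookup(const struct demo_obj *obj, const char *name)
{
	if (!demo_try_as_dir(obj, true))
		return -ENOTDIR;
	return obj->do_lookup(name);
}
```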

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I51df0cbb5a4e7abca370ee27dac678f995b76159
Reviewed-on: https://review.whamcloud.com/47483
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15880 quota: fix issues in reserving quota 25/47425/4
Hongchao Zhang [Fri, 24 Jun 2022 14:06:08 +0000 (22:06 +0800)]
LU-15880 quota: fix issues in reserving quota

Calling "chgrp" as an unprivileged user reserves quota space
before changing the GID of the file, and the reserved quota space
is freed after its transaction is committed. There are some
issues in the current implementation:
1. the reserved quota isn't freed in case of error in "mdd_attr_set"
   and "tgt_cb_last_committed".
2. when freeing the reserved quota, the quota space to free is
   set to the same parameter used for reserving the quota, which
   could be wrong. For instance, the reserved quota space will be 0
   if the corresponding quota ID isn't enforced, but the call will
   return without error.

Like "qsd_op_begin/qsd_op_end", the patch also takes a reference on
the lquota_entry obtained while reserving quota and releases it when
freeing the reserved quota, to prevent potential issues.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I098cde7d5e89fe8b9eaab0ae4bc285a4ac6c2281
Reviewed-on: https://review.whamcloud.com/47425
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15991 kernel: kernel update RHEL7.9 [3.10.0-1160.71.1.el7] 65/47865/2
Jian Yu [Tue, 5 Jul 2022 06:08:40 +0000 (23:08 -0700)]
LU-15991 kernel: kernel update RHEL7.9 [3.10.0-1160.71.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.71.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I89215145ea8da2925e5c8c01cdf963ba8a087877
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-16000 utils: align updatelog parameters in llog_reader 13/47913/2
Etienne AUJAMES [Fri, 8 Jul 2022 10:36:10 +0000 (12:36 +0200)]
LU-16000 utils: align updatelog parameters in llog_reader

Parameters in update log records are aligned on 64 bits. llog_reader
does not align these parameters: if a parameter size is not a multiple
of 8, the next parameter size will be read incorrectly.
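
The alignment arithmetic can be sketched as follows. The helper names
and the idea of a fixed per-parameter header length are assumptions made
for illustration; the real record layout is defined by the Lustre update
log structures.

```c
#include <stddef.h>

/* Round `size` up to the next multiple of 8 (64-bit alignment). */
static size_t demo_round_up8(size_t size)
{
	return (size + 7) & ~(size_t)7;
}

/*
 * Hypothetical: space taken by one parameter, given the length of
 * its header and payload. The next parameter starts at the current
 * offset plus this aligned size, not plus the raw size.
 */
static size_t demo_param_size(size_t header_len, size_t payload_len)
{
	return demo_round_up8(header_len + payload_len);
}
```

For example, a 5-byte payload behind an 8-byte header occupies 16 bytes,
so a reader that advances by only 13 bytes would misread the size of the
following parameter.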

Test-Parameters: trivial
Fixes: 9962d6f ("LU-14617 utils: llog_reader updatelog support")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I6871614ab4ea79d59c3c3b4644b377de395bad56
Reviewed-on: https://review.whamcloud.com/47913
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15993 ofd: don't leak pages if nodemap fails 73/47873/4
Alex Zhuravlev [Tue, 5 Jul 2022 11:24:03 +0000 (14:24 +0300)]
LU-15993 ofd: don't leak pages if nodemap fails

ofd_commitrw() shouldn't exit w/o calling ofd_commitrw_write(),
otherwise the pages taken in ofd_preprw() are leaked.

same in mdt_obd_commitrw()

Fixes: bbfdc7c167 ("LU-14739 quota: fix quota with root squash enabled")

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Icd60c7ab80c5a7b65603d7da0d2e83872dc6b97f
Reviewed-on: https://review.whamcloud.com/47873
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-8582 target: send error reply on wrong opcode 61/47761/6
Li Xi [Tue, 21 Jun 2022 12:06:22 +0000 (20:06 +0800)]
LU-8582 target: send error reply on wrong opcode

An unknown opcode does not necessarily mean an insane client. A new
client might send RPCs with new opcodes to an old server. The client
might then be stuck waiting for a reply. So, send an error back
when an RPC has a wrong opcode.

This patch returns EOPNOTSUPP to the client instead of blocking. ENOTSUPP
is not used here since strerror() does not understand ENOTSUPP.
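
As a rough illustration of replying with an error rather than dropping
the request, consider the dispatch sketch below. The handler table and
all demo_* names are hypothetical and do not reflect the target request
handling code.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*demo_handler_t)(void);

/* Hypothetical handler for a single known opcode. */
static int demo_handle_ping(void)
{
	return 0;
}

/* Opcode-indexed table; a NULL slot means "opcode not supported". */
static const demo_handler_t demo_handlers[] = {
	NULL,			/* opcode 0: unused */
	demo_handle_ping,	/* opcode 1 */
};

/*
 * Dispatch a request. Unknown opcodes yield -EOPNOTSUPP, which the
 * caller would send back as the reply status so that the client is
 * not left waiting.
 */
static int demo_dispatch(unsigned int opcode)
{
	size_t n = sizeof(demo_handlers) / sizeof(demo_handlers[0]);

	if (opcode >= n || demo_handlers[opcode] == NULL)
		return -EOPNOTSUPP;
	return demo_handlers[opcode]();
}
```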

OBD_FAIL_OST_OPCODE=0x253 is added for fault injection test of opcode.
To test whether an invalid opcode is handled properly on OST, use the
following command:

lctl set_param fail_val=${opcode} fail_loc=0x253

Change-Id: I46ca62bc532b92368e06a4f883b102c7186c453c
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/47761
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15888 build: Debian dkms-debs requires ed and libkeyutils 55/47455/7
Shaun Tancheff [Fri, 8 Jul 2022 13:42:00 +0000 (08:42 -0500)]
LU-15888 build: Debian dkms-debs requires ed and libkeyutils

dkms install/build needs dependencies on libmount-dev,
libkeyutils1, and libkeyutils-dev

Debian does not install the 'ed' package by default.
Without the 'ed' package the version is not correctly added
to the changelog and parsed to the package names.

Debian does not have linux-image or linux-headers pseudo
packages, so require the arch-specific ones, e.g.:
   linux-image | linux-image-amd64 | linux-image-arm64
and:
   linux-headers | linux-headers-amd64 | linux-headers-arm64
respectively.

o2ib fails to find Debian in-kernel Module.symvers and
should check $LINUX_OBJ/Module.symvers before failing.

HPE-bug-id: LUS-10984
Test-Parameters: trivial
Fixes: 85a6eebeca1 ("LU-15652 build: On Debian detect -common kernel headers")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I82e2689f3af4b9ce106ee3ab6b4109d2709c8872
Reviewed-on: https://review.whamcloud.com/47455
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15886 lfsck: remove unreasonable assertions 47/47447/6
Lai Siyao [Wed, 18 May 2022 22:21:52 +0000 (18:21 -0400)]
LU-15886 lfsck: remove unreasonable assertions

Remove unreasonable assertions in LFSCK code:
* lfsck->li_obj_dir and lfsck->li_lmv may be NULL if the object wasn't
  initialized successfully.
* orphan objects under ldiskfs /lost+found may not exist.
* object may not be directory in lfsck_verify_lpf()->
  lfsck_verify_linkea().
* for corner case
  (leh->leh_reccount == 0 && leh->leh_overflow_time != 0),
  LASSERT(ldata->ld_leh->leh_reccount > 0) will be triggered in
  lfsck_namespace_linkea_clear_overflow(), remove this assertion,
  and this corner case can be handled correctly in current lfsck
  code.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If114c7882a2c083e83fcfac5981eddfa526d1426
Reviewed-on: https://review.whamcloud.com/47447
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15868 lfsck: don't crash upon dir migration failure 81/47381/7
Lai Siyao [Tue, 17 May 2022 11:11:25 +0000 (07:11 -0400)]
LU-15868 lfsck: don't crash upon dir migration failure

LFSCK against directories whose migration failed may crash, because
the lost+found directory may not be initialized correctly and this
error is skipped on purpose. Add checks in the code that
dereferences it.

lfsck_verify_lpf() may dereference NULL "child2".

lmv_name_to_stripe_index() should support stripe LMV, which is used
by LFSCK to verify name hash.

Add OBD_FAIL_OUT_EIO to simulate sub transaction failure.

Add sanity-lfsck 15d to verify LFSCK won't crash upon directory
migration failure.

Update sanity-lfsck 4 and 5 to start mds1 with OI scrub enabled, and
wait for mds1 OI scrub finish, otherwise LFSCK may fail to verify
lost+found later.

Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity-lfsck \
env=ONLY=15d,ONLY_REPEAT=100
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I1b1872da2b4ef8f7403effc4d1d3e298c6a0b7e6
Reviewed-on: https://review.whamcloud.com/47381
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15626 tests: Fix "error" reported by shellcheck for recovery-random-scale 66/46866/8
Arshad Hussain [Fri, 18 Mar 2022 07:19:24 +0000 (12:49 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for recovery-random-scale

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/recovery-random-scale.sh. This
patch also moves spaces to tabs.

Test-Parameters: trivial clientcount=6 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=FAILOVER_PERIOD=180,SLOW=no testlist=recovery-random-scale
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I9e7c6fcd726d5fc2141a7ced73ea78ee6a43ec22
Reviewed-on: https://review.whamcloud.com/46866
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15626 tests: Fix "error" reported by shellcheck for recovery-mds-scale 65/46865/12
Arshad Hussain [Fri, 18 Mar 2022 07:25:25 +0000 (12:55 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for recovery-mds-scale

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/recovery-mds-scale.sh. This patch
also moves spaces to tabs.

Test-Parameters: trivial clientcount=6 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=FAILOVER_PERIOD=180,DURATION=600,SLOW=no testlist=recovery-mds-scale
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I6c098809835950e1f781e04a6898895592407948
Reviewed-on: https://review.whamcloud.com/46865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15132 hsm: Protect against parallel HSM restore requests 67/45367/15
Etienne AUJAMES [Thu, 21 Oct 2021 14:31:01 +0000 (16:31 +0200)]
LU-15132 hsm: Protect against parallel HSM restore requests

Multiple parallel accesses (read/write) to the same released file
could cause multiple HSM restore requests to be sent.
On the MDT side, each restore request waits for the first one to complete
before grabbing the MDS_INODELOCK_LAYOUT LCK_EX and registering the
llog record.

This could cause several MDT threads to hang for the same restore
request sent in parallel. In the worst case, all MDT threads can
hang and the MDS is no longer able to handle requests.

This patch checks if an HSM restore handle exists before taking the
lock.

Test-Parameters: testlist=sanity-hsm,sanity-hsm
Test-Parameters: testlist=sanity-hsm env=ONLY=12s,ONLY_REPEAT=50
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I9584edc2c7411aa41b2e318e55f57c117d1c3dfb
Reviewed-on: https://review.whamcloud.com/45367
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15913 mdt: disable parallel rename for striped dirs 93/47593/9
Andreas Dilger [Sat, 11 Jun 2022 01:47:00 +0000 (19:47 -0600)]
LU-15913 mdt: disable parallel rename for striped dirs

Parallel rename should not be done within striped directories to
avoid remote updates.  These are like cross-directory renames.

Add tunables for parallel directory rename in case of problems.
These can be configured separately for files and directories.

    mdt.*.enable_parallel_rename_dir
    mdt.*.enable_parallel_rename_file

Fixes: 90979ab390 ("LU-12125 mds: allow parallel directory rename")
Fixes: d76cc65d5d ("LU-12125 mds: allow parallel regular file rename")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I384976cd1c9f401169336ee7a479ba0e3dd9f4ee
Reviewed-on: https://review.whamcloud.com/47593
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15521 spec: fix more bare words error with rpm 4.16 33/47833/2
Jian Yu [Wed, 29 Jun 2022 21:23:59 +0000 (14:23 -0700)]
LU-15521 spec: fix more bare words error with rpm 4.16

This patch fixes more bare words errors and extra tokens
warnings with rpm 4.16.

Test-Parameters: trivial

Change-Id: Ic40b5763d1cb362d5aa77b06e9a5768b2abbc708
Fixes: 597a6bf9e085 ("LU-15521 spec: fix bare words error with rpm 4.16")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47833
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15653 client: able to cleanup devices manually 59/46859/7
Mikhail Pershin [Tue, 22 Feb 2022 17:34:37 +0000 (20:34 +0300)]
LU-15653 client: able to cleanup devices manually

Using 'lctl cleanup/detach' could be needed in situations
with unclean umount. Currently that doesn't work for
LMV and could also cause a panic.

Patch restores ability to cleanup/detach client devices
manually.
- debugfs and lprocfs cleanup in lmv_precleanup() is moved to
  lmv_cleanup() so that it is not done too early. This prevents
  a hang on 'lctl cleanup' for the LMV device
- test 172 is added in sanity. It skips device cleanup during
  normal umount, keeping devices alive without a client mount,
  then manually cleans up and detaches them
- prevent negative lov_connections in lov_disconnect() and
  handle it gracefully
- remove obd_cleanup_client_import() in mdc_precleanup(),
  it is called already inside osc_precleanup_common()

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I8a3868fabd1d805e827d04852d1614a3fe57ce35
Reviewed-on: https://review.whamcloud.com/46859
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15894 ofd: revert range locking in ofd 66/47466/4
Andrew Perepechko [Sun, 24 Apr 2022 14:48:09 +0000 (17:48 +0300)]
LU-15894 ofd: revert range locking in ofd

After commit 301d76a711 (LU-14876), range locking is no longer needed
in ofd, because 301d76a711 itself prevents the original data corruption
fixed by range locking. At the same time, range locking in ofd adds
unnecessary overhead; we can even see serialization under specific load.

This patch reverts range locking but keeps recovery-small test 148
to test the original corruption scenario case.

Change-Id: Ic795bcfb1e249c4927f66b6bad456f5511819861
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-9890
Fixes: 35679a730 ("LU-10958 ofd: data corruption due to RPC reordering")
Reviewed-on: https://review.whamcloud.com/47466
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15759 libcfs: debugfs file_operation should have an owner 35/47335/7
Mr NeilBrown [Fri, 13 May 2022 02:16:12 +0000 (12:16 +1000)]
LU-15759 libcfs: debugfs file_operation should have an owner

If a debugfs file is open when unloading the libcfs/lnet module, it
produces a kernel Oops (the debugfs file_operations callbacks no longer
exist).

Crash generated with routerstat (/sys/kernel/debug/lnet/stats):
[ 1449.750396] IP: [<ffffffffab24e093>] SyS_lseek+0x83/0x100
[ 1449.750412] PGD 9fa14067 PUD 9fa16067 PMD d4e5d067 PTE 0
[ 1449.750428] Oops: 0000 [#1] SMP
[ 1449.750883]  [<ffffffffab7aaf92>] system_call_fastpath+0x25/0x2a
[ 1449.750897]  [<ffffffffab7aaed5>] ? system_call_after_swapgs+0xa2/0x13a

This patch adds an owner to debugfs file_operation for libcfs and
lnet_router entries (/sys/kernel/debug/lnet/*).

The following behavior is expected:
$ modprobe lustre
$ routerstat 10 > /dev/null &
$ lustre_rmmod
rmmod: ERROR: Module lnet is in use
Can't read statfile (ENODEV)
[1]+  Exit 1                  routerstat 10 > /dev/null
$ lustre_rmmod

Note that the allocated 'struct file_operations' cannot be freed until
the module_exit() function is called, as files could still be open
until then.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ia0920313e0c2a4b6cdc875fed08221e174a12a73
Reviewed-on: https://review.whamcloud.com/47335
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15779 ofd: don't hold read lock over bulk 26/47126/5
Alex Zhuravlev [Sun, 24 Apr 2022 11:27:18 +0000 (14:27 +0300)]
LU-15779 ofd: don't hold read lock over bulk

as this can block all operations on OST:

1) ofd_preprw_read() takes a shared object lock and initiates BULK
2) OUT needs an exclusive object lock on the same object
3) ofd_commitrw_write() starts transaction and now has to wait
   for OUT to get and release that exclusive object lock (step 2)
4) a number of threads can get stuck waiting for ofd_commit_write()
   to stop its transaction

this patch drops a shared object lock before BULK transfer.
at the moment it's not clear how such read would race with
object removal on ZFS - this should be investigated.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I131493abd90283e9ca897f904e00c25d26e3d8d3
Reviewed-on: https://review.whamcloud.com/47126
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15727 lod: honor append_pool with default composite layouts 14/47014/3
John L. Hammond [Thu, 7 Apr 2022 16:31:07 +0000 (11:31 -0500)]
LU-15727 lod: honor append_pool with default composite layouts

In lod_get_default_lov_striping(), correct the handling of composite
default layouts in the case where append_stripe_count is nonzero.
Align the names of the append members of struct dt_allocation_hint
with the mdd params. Remove the unused dah_mode member of struct
dt_allocation_hint.

Add sanity test_27U() to verify.

Fixes: e2ac6e1eaa ("LU-9341 lod: Add special O_APPEND striping")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I66b426d24d6476fb483397f290229983f3da4be5
Reviewed-on: https://review.whamcloud.com/47014
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15922 sec: new connect flag for name encryption 74/47574/5
Sebastien Buisson [Thu, 9 Jun 2022 08:00:52 +0000 (10:00 +0200)]
LU-15922 sec: new connect flag for name encryption

Introduce OBD_CONNECT2_ENCRYPT_NAME connection flag for compatibility
with older versions that do not support name encryption.
When server side does not have this flag, client side is forced to
null encryption for file names. And client needs to use old xattr to
store encryption context.

Also update tests in sanity-sec to exercise name encryption only if
server side supports it.

Fixes: ed4a625d88 ("LU-13717 sec: filename encryption - digest support")
Test-Parameters: serverversion=2.14 testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=ldiskfs
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I446a4caba8e45821d701628a14c96f03cb6c4525
Reviewed-on: https://review.whamcloud.com/47574
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15952 doc: improvement on lfs-project doc 41/47641/2
Lei Feng [Thu, 16 Jun 2022 02:45:51 +0000 (10:45 +0800)]
LU-15952 doc: improvement on lfs-project doc

Describe 'lfs project -C' clearly.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I1e0f70a9116265edac993b12b775bd57c8587d40
Reviewed-on: https://review.whamcloud.com/47641
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15942 utils: ofd_access_log_reader exit status 25/47625/2
John L. Hammond [Tue, 14 Jun 2022 13:32:54 +0000 (08:32 -0500)]
LU-15942 utils: ofd_access_log_reader exit status

If no OSTs are mounted then the ofd module may not be loaded and hence
/dev/lustre-access-log/control may not exist. In
ofd_access_log_reader, if --exit-on-close is used then we should
handle this consistently with the case of no access logs by exiting
with status 0.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I91b059bee8941501f2d207d2a48d1ea5ad40ae99
Reviewed-on: https://review.whamcloud.com/47625
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15931 tests: Escape * in log() 97/47597/3
Oleg Drokin [Sat, 11 Jun 2022 04:28:19 +0000 (00:28 -0400)]
LU-15931 tests: Escape * in log()

So it does not print every file name in test description in dmesg

Test-Parameters: trivial testlist=sanity env=ONLY=160s
Fixes: f60b307c50 ("LU-14699 mdd: proactive changelog garbage collection")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Change-Id: I300fde0f71ef15a5c6573a67324944ba8d53f8e3
Reviewed-on: https://review.whamcloud.com/47597
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-8621 utils: cmd help to stdout or short cmd error 62/47162/3
Aleksei Alyaev [Thu, 23 Dec 2021 08:48:22 +0000 (11:48 +0300)]
LU-8621 utils: cmd help to stdout or short cmd error

- Changed to print command help to stdout
- Changed to output short error message for an unrecognized command

Test-Parameters: trivial
Signed-off-by: Aleksei Alyaev <aalyaev@ddn.com>
Change-Id: I67616ddb576e3347a2da130b3a731a6bf8730185
Reviewed-on: https://review.whamcloud.com/47162
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-14555 lnet: asym route inconsistency warning 18/46918/6
Gian-Carlo DeFazio [Wed, 23 Mar 2022 23:25:56 +0000 (16:25 -0700)]
LU-14555 lnet: asym route inconsistency warning

lnet_check_route_inconsistency() checks for inconsistency between
the lr_hops and lr_single_hop values of a route.

A warning is currently emitted if the route is not single hop
and the hop count is either 1 or LNET_UNDEFINED_HOPS.

To emit the warning, add the requirement that
avoid_asym_router_failure is enabled.
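
The resulting condition can be written out as a small predicate. This is
a sketch under assumptions: struct demo_route and DEMO_UNDEFINED_HOPS are
stand-ins for the LNet route fields and LNET_UNDEFINED_HOPS, whose actual
definitions are not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for LNET_UNDEFINED_HOPS; the value here is an assumption. */
#define DEMO_UNDEFINED_HOPS UINT32_MAX

/* Hypothetical subset of a route: hop count and single-hop flag. */
struct demo_route {
	uint32_t hops;
	bool single_hop;
};

/*
 * Warn only if avoid_asym_router_failure is enabled, the route is not
 * single hop, and the hop count is 1 or undefined.
 */
static bool demo_should_warn(const struct demo_route *route,
			     bool avoid_asym_router_failure)
{
	return avoid_asym_router_failure && !route->single_hop &&
	       (route->hops == 1 || route->hops == DEMO_UNDEFINED_HOPS);
}
```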

Test-Parameters: trivial
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: Iaa26d25492e49b569ae5e81da9f00f162be3da59
Reviewed-on: https://review.whamcloud.com/46918
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-13335 ldiskfs: add projid to debug logs 69/46369/8
Andreas Dilger [Fri, 28 Jan 2022 05:14:49 +0000 (22:14 -0700)]
LU-13335 ldiskfs: add projid to debug logs

There is virtually no tracking of projid changes in osd-ldiskfs,
which makes it very difficult to debug operations therein.

Add some minimal debugging on the client and servers to log
the projid when it is changed, along with the affected FID.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ibcf3f09ee243ebe052c8f9119383897072ce7057
Reviewed-on: https://review.whamcloud.com/46369
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15481 llog: Add LLOG_SKIP_PLAIN to skip llog plain 10/46310/5
Etienne AUJAMES [Tue, 25 Jan 2022 21:38:26 +0000 (22:38 +0100)]
LU-15481 llog: Add LLOG_SKIP_PLAIN to skip llog plain

Add the catalog callback return LLOG_SKIP_PLAIN to conditionally skip
an entire llog plain.

This can speed up catalog processing for specific use cases where a
record needs to be accessed in the "middle" of the catalog. This could
be useful for changelogs with several users, or for HSM.

This patch modifies chlg_read_cat_process_cb() to use LLOG_SKIP_PLAIN.
The main idea came from: d813c75d ("LU-14688 mdt: changelog purge
deletes plain llog")
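
The idea can be sketched with a toy catalog walker. This is a simplified model, not the llog API: struct plain_log, the enum values and both functions are hypothetical, and the real callback works on llog handles and records rather than index ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback results; CB_SKIP_PLAIN plays the role of
 * LLOG_SKIP_PLAIN in this sketch. */
enum cb_result { CB_CONTINUE = 0, CB_SKIP_PLAIN = 1 };

/* Toy model of one plain llog: the range of record indices it holds. */
struct plain_log {
	unsigned long first_idx;
	unsigned long last_idx;
};

/* Catalog callback: skip a plain llog that ends before the start index. */
static enum cb_result catalog_cb(const struct plain_log *pl,
				 unsigned long start_idx)
{
	return pl->last_idx < start_idx ? CB_SKIP_PLAIN : CB_CONTINUE;
}

/* Walk the catalog and return how many records were actually read. */
static unsigned long process_catalog(const struct plain_log *logs, size_t n,
				     unsigned long start_idx)
{
	unsigned long visited = 0;

	assert(logs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (catalog_cb(&logs[i], start_idx) == CB_SKIP_PLAIN)
			continue; /* whole plain llog skipped, nothing read */
		for (unsigned long idx = logs[i].first_idx;
		     idx <= logs[i].last_idx; idx++)
			visited++;
	}
	return visited;
}
```

In this model the saving is proportional to the number of plain llogs that end before the requested index.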

**Performance test:**

* Environment:
2474195 changelog records stored on mds0 (40 plain llogs):
mds# lctl get_param -n mdd.lustrefs-MDT0000.changelog_users
current index: 2474195
ID    index (idle seconds)
cl1   0 (3509)

* Test
Access to records at the end of the catalog (offset: 2474194):
client# time lfs changelog lustrefs-MDT0000 2474194 >/dev/null

* Results
- with the patch:  real    0m0.592s
- without the patch: real    0m17.835s (x30)

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I887d5bef1f3a6a31c46bc58959e0f508266c53d2
Reviewed-on: https://review.whamcloud.com/46310
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15451 sec: read-only nodemap flag 49/46149/5
Sebastien Buisson [Fri, 14 Jan 2022 09:16:31 +0000 (10:16 +0100)]
LU-15451 sec: read-only nodemap flag

Add a new 'readonly_mount' property to nodemaps. When set, we return
-EROFS from server side if the client is not mounting read-only.
So the client will have to specify the read-only mount option to be
allowed to mount.
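
A minimal sketch of the server-side decision, assuming a reduced nodemap view. struct nodemap_view and check_mount_allowed() are hypothetical names used only for illustration; only the 'readonly_mount' property and the -EROFS result come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a nodemap: only the new property. */
struct nodemap_view {
	bool readonly_mount;
};

/* Return 0 if the mount may proceed, -EROFS if the nodemap requires a
 * read-only mount and the client did not ask for one. */
static int check_mount_allowed(const struct nodemap_view *nm,
			       bool client_mounts_readonly)
{
	assert(nm != NULL);
	if (nm->readonly_mount && !client_mounts_readonly)
		return -EROFS;
	return 0;
}
```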

Fixes: 928714dddabb ("LU-5092 nodemap: save id maps to targets in new index file")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9931844ae46dfd5d724f592f8dfacc4a8011c7e3
Reviewed-on: https://review.whamcloud.com/46149
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15399 llite: dont restart directIO with IOCB_NOWAIT 47/46147/4
Qian Yingjin [Mon, 27 Dec 2021 03:23:45 +0000 (11:23 +0800)]
LU-15399 llite: dont restart directIO with IOCB_NOWAIT

FLR mirror retry and io_uring with the IOCB_NOWAIT flag should be
handled differently.

int cl_io_loop(const struct lu_env *env, struct cl_io *io)
{
    ...
    if (result == -EAGAIN && io->ci_ndelay) {
        io->ci_need_restart = 1;
        result = 0;
    }
    ...
}

ssize_t
generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
    ...
    if (iocb->ki_flags & IOCB_NOWAIT) {
        if (filemap_range_has_page(mapping, iocb->ki_pos,
                                   iocb->ki_pos +
                                   count - 1))
            return -EAGAIN;
    ...
}

In the current code, the I/O engine is restarted for a read when it
gets the -EAGAIN code.
However, for io_uring direct I/O with IOCB_NOWAIT, if there are cached
pages in the current I/O range, -EAGAIN should be returned to the upper
layer immediately. Otherwise, the I/O gets stuck in an endless loop.
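
The distinction between the two cases can be sketched as follows. This is an assumption about the shape of the fix, not the patch itself: struct io_view and handle_eagain() are hypothetical, and the 'nowait' field stands for whatever the real code uses to record that the request carried IOCB_NOWAIT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of the I/O state. */
struct io_view {
	bool ndelay;       /* FLR mirror retry is allowed (like ci_ndelay) */
	bool nowait;       /* request was issued with IOCB_NOWAIT (assumed) */
	bool need_restart; /* set when the I/O loop should run again */
};

/* Decide what to do with the result of one I/O iteration. */
static int handle_eagain(struct io_view *io, int result)
{
	assert(io != NULL);

	/* Restart only for the FLR retry case, never for NOWAIT requests:
	 * those must see -EAGAIN so the caller can retry on its own. */
	if (result == -EAGAIN && io->ndelay && !io->nowait) {
		io->need_restart = true;
		return 0;
	}
	return result;
}
```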

This patch also adds a tool "io_uring_probe" to check whether
the kernel supports io_uring fully.
This check is needed because the rhel8.5 kernel has
backported io_uring:
cat /proc/kallsyms |grep io_uring
ffffffffa8510e10 W __x64_sys_io_uring_enter
ffffffffa8510e10 W __x64_sys_io_uring_register
ffffffffa8510e10 W __x64_sys_io_uring_setup
but the io_uring syscalls return -ENOSYS.

Test-Parameters: clientdistro=ubuntu2004 testlist=sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Id4374382e56e90d02349676891aa57b216b3deff
Reviewed-on: https://review.whamcloud.com/46147
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-7668 utils: add lctl del_ost 49/41449/12
Stephane Thiell [Tue, 9 Feb 2021 06:47:31 +0000 (22:47 -0800)]
LU-7668 utils: add lctl del_ost

Add helper command:

   lctl del_ost [--dryrun] --target fsname-OSTxxxx

Permanently disable an OST by altering the client and MDT llog
catalogs on MGS. The command finds all catalog records related to the
specified OST and cancels them. A --dryrun option is provided so that
the system administrator can see which records would have been
cancelled, but without actually cancelling them.
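
The dry-run behaviour can be illustrated with a toy record list. This is not how lctl or the MGS llog code is implemented: struct cfg_record and del_target_records() are hypothetical, and real configuration records carry far more than a target name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of one configuration record in a catalog. */
struct cfg_record {
	const char *target; /* e.g. "fsname-OST0001" */
	bool cancelled;
};

/* Find all live records for 'target'. With dryrun set, only count them;
 * otherwise mark them cancelled. Returns the number of matches. */
static size_t del_target_records(struct cfg_record *recs, size_t n,
				 const char *target, bool dryrun)
{
	size_t matched = 0;

	assert(target != NULL);
	for (size_t i = 0; i < n; i++) {
		if (recs[i].cancelled || strcmp(recs[i].target, target) != 0)
			continue;
		matched++;
		if (!dryrun)
			recs[i].cancelled = true;
	}
	return matched;
}
```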

Signed-off-by: Stephane Thiell <sthiell@stanford.edu>
Change-Id: I58c4f10fa0f7164a40231e807698eb224cccf062
Reviewed-on: https://review.whamcloud.com/41449
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-13189 osd-zfs: add project id for old objects without ZFS_PROJID 09/47709/3
Li Dongyang [Thu, 23 Jun 2022 06:36:19 +0000 (16:36 +1000)]
LU-13189 osd-zfs: add project id for old objects without ZFS_PROJID

After project quota zpool upgrade, the ZFS_PROJID
flag could still be missing on some old objects.

We used to check for this and return ENXIO in
osd_declare_attr_set(), however the check is changed
by "LU-12309 osd-zfs: Support disabled project quotas".

Now if the target project id is the default
project id 0, we will pass the check in
osd_declare_attr_set() and trigger the assert in
osd_attr_set() later.

Instead of returning ENXIO, we could adjust the
attribute layout of the old objects to accommodate
project id.

Also add back the logic from "LU-14740 quota: reject
invalid project id on server side", which got removed
by "LU-14927 quota: move qsd_transfer to lquota module",
due to using GPL symbols.

Change-Id: Ib62fdd2a0e07f15ae12daf564273a249a54dd8ea
Fixes: 291e7196d3 ("LU-12309 osd-zfs: Support disabled project quotas")
Fixes: d2e8208e22 ("LU-14927 quota: move qsd_transfer to lquota module")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/47709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15926 nrs: fix tbf realtime rules 85/47585/3
Etienne AUJAMES [Thu, 9 Jun 2022 20:50:06 +0000 (22:50 +0200)]
LU-15926 nrs: fix tbf realtime rules

tc_nsecs_resid should be reset to 0 when changing a rule otherwise
this could lead to mds crashes for realtime policies.

nrs_tbf_req_get(): ASSERTION( cli->tc_nsecs_resid < cli->tc_nsecs )
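
Why the reset matters can be seen in a reduced model. The sketch below is an assumption about the relationship between the fields, not the NRS TBF code: struct tbf_client_view and client_apply_rule() are hypothetical, and the per-token interval is assumed to be one second divided by the rule's rate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical, reduced view of a TBF client. */
struct tbf_client_view {
	uint64_t nsecs;       /* interval per token (like tc_nsecs) */
	uint64_t nsecs_resid; /* leftover time (like tc_nsecs_resid) */
};

/* Apply a new rule with the given rate (tokens per second). */
static void client_apply_rule(struct tbf_client_view *cli, uint64_t rate)
{
	assert(cli != NULL);
	assert(rate > 0);

	cli->nsecs = NSEC_PER_SEC / rate;
	/* A residue computed for the old, longer interval could exceed the
	 * new interval and break "resid < nsecs", so start from zero. */
	cli->nsecs_resid = 0;
}
```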

Fixes: d11fa2c27959 ("LU-9228 nrs: TBF realtime policies under congestion")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I280acb42e104088c6b8750a0bb7bf9c50cf96e73
Reviewed-on: https://review.whamcloud.com/47585
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-6142 lnet: use list_first_entry() in lnet/lnet subdirectory. 88/47488/3
Mr NeilBrown [Tue, 31 May 2022 00:43:12 +0000 (20:43 -0400)]
LU-6142 lnet: use list_first_entry() in lnet/lnet subdirectory.

Convert
  list_entry(foo->next .....)
to
  list_first_entry(foo, ....)

in 'lnet/lnet'

In several cases the call is combined with
a list_empty() test and list_first_entry_or_null() is used

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I45e1bdfe41854c88af98ebf24797f72a68b11dc3
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/47488
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15713 lnet: Ensure round robin across nets 76/46976/7
Chris Horn [Thu, 31 Mar 2022 01:49:46 +0000 (20:49 -0500)]
LU-15713 lnet: Ensure round robin across nets

Introduce a global net sequence number and a peer sequence number.
These sequence numbers are used to ensure round robin selection of
local NIs and peer NIs across nets.
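
How a sequence number yields round robin can be sketched in isolation. This is a deliberately reduced model, not the LNet selection code: the real selection weighs other criteria first, while here the sequence number is the only one. struct net_view, struct rr_state and pick_net() are hypothetical, and counter wrap-around is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-net state: sequence number of its last selection. */
struct net_view {
	uint32_t seq;
};

/* Hypothetical global state: the last sequence number handed out. */
struct rr_state {
	uint32_t last_seq;
};

/* Pick the net that was used least recently (lowest sequence number),
 * then stamp it with a fresh global sequence number. */
static size_t pick_net(struct rr_state *st, struct net_view *nets, size_t n)
{
	size_t best = 0;

	assert(st != NULL && nets != NULL && n > 0);
	for (size_t i = 1; i < n; i++)
		if (nets[i].seq < nets[best].seq)
			best = i;

	nets[best].seq = ++st->last_seq;
	return best;
}
```

Stamping the winner with a globally increasing number makes it the most recently used candidate, so repeated calls cycle through all nets.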

Also consolidate the sequence number accounting under
lnet_handle_send(). Previously the sequence number increment for
the final destination peer net/peer NI on a routed send was done
in lnet_handle_find_routed_path().

Some cleanup that is also in this patch:
 - Redundant check of null src_nid is removed from
   lnet_handle_find_routed_path() (LNET_NID_IS_ANY handles null arg)
 - Avoid comparing best_lpn with itself in
   lnet_handle_find_routed_path() on the first loop iteration
 - In lnet_find_best_ni_on_local_net() check whether we have
   a specified lp_disc_net_id outside of the loop to avoid doing
   that work on each loop iteration.

Added some debug statements to print information used when selecting
peer net/local net.

HPE-bug-id: LUS-10871
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ide07e832deda85735042835e3097b9bf92e1e4b0
Reviewed-on: https://review.whamcloud.com/46976
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-10391 lnet: change lnet_*_peer_ni to take struct lnet_nid 25/44625/6
Mr NeilBrown [Tue, 18 Jan 2022 17:14:59 +0000 (12:14 -0500)]
LU-10391 lnet: change lnet_*_peer_ni to take struct lnet_nid

lnet_add_peer_ni() and lnet_del_peer_ni() now take
struct lnet_nid rather than lnet_nid_t.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I48bbed96fddea509b27dee131e134aa7b35ae68c
Reviewed-on: https://review.whamcloud.com/44625
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-10391 lnet: discard some peer_ni lookup functions 24/44624/7
Mr NeilBrown [Thu, 16 Jun 2022 18:32:21 +0000 (14:32 -0400)]
LU-10391 lnet: discard some peer_ni lookup functions

lnet_nid2peerni_locked(), lnet_peer_get_ni_locked(),
lnet_find_peer4(), and lnet_find_peer_ni_locked() each have few users
left and those can all be changed to use alternate versions which take
'struct lnet_nid' rather than 'lnet_nid_t'.

So convert all those callers over, and discard the older functions.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9f0ebd0631c2e4160c3198aa37f16b45027bce3d
Reviewed-on: https://review.whamcloud.com/44624
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-6142 obdclass: checkpatch cleanup of obd_mount_server.c 59/47259/6
James Simmons [Mon, 6 Jun 2022 14:30:05 +0000 (10:30 -0400)]
LU-6142 obdclass: checkpatch cleanup of obd_mount_server.c

Address the many issues reported by checkpatch.pl. Replace __u**
with u** since it is kernel only code. Move lustre_tgt
registration from obd_mount.c to obd_mount_server.c.

Change-Id: If06d120434ecb200f7a265167a537b2c98519670
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47259
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-11695 som: disabling xattr cache for LSOM on client 11/33711/3
Qian Yingjin [Fri, 23 Nov 2018 08:10:54 +0000 (16:10 +0800)]
LU-11695 som: disabling xattr cache for LSOM on client

To obtain up-to-date LSOM data, a client currently needs to set
llite.*.xattr_cache=0 to disable the xattr cache on the client
completely. As a result, other kinds of xattrs cannot be cached
on the client either.
This patch introduces a heavy-weight solution to disable caching
only for LSOM xattr data ("trusted.som") on client.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Iab5ef3030b05ac09184d01f2a3a8ed92ff1cf26b
Reviewed-on: https://review.whamcloud.com/33711
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-14642 tests: skip sanity-flr/100 for old servers 67/47567/2
Andreas Dilger [Wed, 8 Jun 2022 14:40:20 +0000 (08:40 -0600)]
LU-14642 tests: skip sanity-flr/100 for old servers

The new FLR FSX mode in sanity-flr test_100 triggers LU-13730 when
run on old servers.  Skip it in this case.

Test-Parameters: trivial testlist=sanity-flr env=ONLY=100
Test-Parameters: serverversion=2.14 testlist=sanity-flr env=ONLY=100
Fixes: 90ba8b4ac360 ("LU-14642 test: add fsx mirror file test mode")
Fixes: 571f3cf11159 ("LU-13730 lod: don't confuse stale with primary flag")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I595bba6d5ab318ae591d80704c70998af9166de1
Reviewed-on: https://review.whamcloud.com/47567
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15282 tests: improve sanity test_51d coverage 54/46154/11
Andreas Dilger [Mon, 17 Jan 2022 23:24:48 +0000 (16:24 -0700)]
LU-15282 tests: improve sanity test_51d coverage

Improve sanity test_51d to test all different stripe counts, rather
than only striping over all OSTs.  With the current default test
config there are 7 OSTs, and this does not cover some test cases.

Test-Parameters: trivial testlist=sanity env=ONLY=51d ostcount=3
Test-Parameters: testlist=sanity env=ONLY=51d ostcount=4
Test-Parameters: testlist=sanity env=ONLY=51d ostcount=5
Test-Parameters: testlist=sanity env=ONLY=51d ostcount=6
Test-Parameters: testlist=sanity env=ONLY=51d ostcount=7
Test-Parameters: testlist=sanity env=ONLY=51d ostcount=8
Test-Parameters: testlist=sanity env=ONLY=51d ostcount=9
Test-Parameters: testlist=sanity env=ONLY=51d ostcount=10
Test-Parameters: testlist=sanity env=ONLY=51d ostcount=11
Test-Parameters: testlist=sanity env=ONLY=51d ostcount=12
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icc694d037e53f7bf966aff8ca1070d42ac3ebbe5
Reviewed-on: https://review.whamcloud.com/46154
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15720 dne: add crush2 hash type 15/47015/8
Andreas Dilger [Tue, 12 Apr 2022 23:18:10 +0000 (17:18 -0600)]
LU-15720 dne: add crush2 hash type

The original "crush" hash type has a significant error with files
that have all-number suffixes, or suffixes that have non-alpha
characters in them.  These files will all be placed on the same
MDT as the base filename, which causes MDT imbalance.

Add a "crush2" hash type that has more stringent checks for the
suffix, so that it doesn't consider all-digit suffixes, or files
that only have a '.' at the right offset, as temporary files.
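
The kind of stricter check described here can be sketched as below. This is not the crush2 implementation: the suffix length of 6 (as in mkstemp-style "name.XXXXXX" files), the exact character rules and the function name are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed length of a temporary-file suffix after the final '.'. */
#define TEMP_SUFFIX_LEN 6

/* Return true only if 'name' ends in ".XXXXXX" where the suffix is purely
 * alphanumeric and contains at least one letter. An all-digit suffix, or
 * a '.' at the right offset followed by other characters, is rejected. */
static bool looks_like_temp_suffix(const char *name)
{
	size_t len;
	const char *dot;
	bool has_alpha = false;

	assert(name != NULL);
	len = strlen(name);
	if (len <= TEMP_SUFFIX_LEN + 1)
		return false; /* no room for a base name plus suffix */

	dot = name + len - TEMP_SUFFIX_LEN - 1;
	if (*dot != '.')
		return false;

	for (const char *p = dot + 1; *p != '\0'; p++) {
		if (!isalnum((unsigned char)*p))
			return false;
		if (isalpha((unsigned char)*p))
			has_alpha = true;
	}
	return has_alpha;
}
```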

Test that the "broken" all-digit or extra-'.' filenames are hashed
properly with "crush2".  We also need to confirm that the old "crush"
hash has not changed (for name lookup compatibility) and still has
the original "bad hashing" bug that puts all files on the same MDT.

Fix handling of types beyond MDT_HASH_TYPE_CRUSH when creating dirs.

Fix debug layout printing of hash_type in more parts of the code.
Don't flood console if hash type is unrecognized in the future.

Fixes: 0a1cf8da8069 ("LU-11025 dne: introduce new directory hash type 'crush'")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1ce34b8f3af44432f55307ebc6906677c6179d1d
Reviewed-on: https://review.whamcloud.com/47015
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15956 gss: allow build without ssk 81/47681/6
Sebastien Buisson [Wed, 6 Jul 2022 04:22:20 +0000 (21:22 -0700)]
LU-15956 gss: allow build without ssk

The GSS part of Lustre should be able to build without SSK, in case
some SSK requirements are not met at configure time.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ieacbcb5db77fcc12cc13579785e640857ce7fb02
Reviewed-on: https://review.whamcloud.com/47681
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15973 build: remove AC_DEFINE(__state, state, ...) 99/47799/2
Jian Yu [Mon, 27 Jun 2022 17:30:16 +0000 (10:30 -0700)]
LU-15973 build: remove AC_DEFINE(__state, state, ...)

RHEL 8.6 build failed with MLNX_OFED 5.6-2.0.9.0 as follows:

error: 'struct task_struct' has no member named 'state'; did you mean '__state'?
 #define __state state
                 ^~~~~

The failure was introduced by
commit bb7c82f13e7a01891edcdf0626c6fb91a240e56e and a proper
way to resolve the original issue was in
commit c04adbcd76725a360f411f09c63df785bf7db426. So, let's
remove the improper way AC_DEFINE(__state, state, ...).

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: Icbc897cf5870352311262af8bd59ec24ea9d7301
Fixes: bb7c82f13e7 ("LU-15795 kernel: RHEL 8.6 server support")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47799
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15962 build: add in-kernel Module.symvers to symbol path 99/47699/2
Jian Yu [Wed, 22 Jun 2022 18:52:45 +0000 (11:52 -0700)]
LU-15962 build: add in-kernel Module.symvers to symbol path

After building Lustre with in-kernel OFED, installing
ko2iblnd module hit the following errors:

ko2iblnd: disagrees about version of symbol __ib_alloc_pd
ko2iblnd: Unknown symbol __ib_alloc_pd (err -22)
ko2iblnd: disagrees about version of symbol rdma_resolve_addr
ko2iblnd: Unknown symbol rdma_resolve_addr (err -22)

Those exported symbols are contained in in-kernel Module.symvers,
which should be added to the symbol path KBUILD_EXTRA_SYMBOLS.

Test-Parameters: trivial

Change-Id: Ic30caa7079af00a452ea24e7e982a856874af702
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47699
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15940 build: add a required dependency for libmount 19/47619/5
Jian Yu [Sun, 19 Jun 2022 07:47:01 +0000 (00:47 -0700)]
LU-15940 build: add a required dependency for libmount

The Lustre client utilities (mount/umount) have
an optional dependency on libmount to update utab.
However, libmount has been part of util-linux
since 2.18, released in 2010, so there is no need
to keep the dependency optional.

Test-Parameters: trivial clientdistro=ubuntu2004
Test-Parameters: trivial clientdistro=el8.5

Change-Id: I4b965a5ce6cb6fc5d2061a53c44ef9b709ebab49
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47619
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15652 build: On Debian detect -common kernel headers 36/46836/6
Shaun Tancheff [Wed, 25 May 2022 16:46:50 +0000 (23:46 +0700)]
LU-15652 build: On Debian detect -common kernel headers

Check for a matching /usr/src/linux-headers-<ver>-common/
and update the --with-linux argument accordingly.

Also move LC_GLIBC_SUPPORT_COPY_FILE_RANGE outside
of utils as this also breaks the dkms build on Debian
with 'static' follows 'non static' declaration of
copy_file_range.

Fixes: e6d1968fbbad ("LU-13903 build: Move GLIBC/openssl checks to where needed")
HPE-bug-id: LUS-10826
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I6e4f0b27eba6c5b07cda14f064e57aa9c93ae3cc
Reviewed-on: https://review.whamcloud.com/46836
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15703 ldiskfs: Disable unused fast commit buffer 43/46943/8
Shaun Tancheff [Tue, 31 May 2022 00:27:52 +0000 (20:27 -0400)]
LU-15703 ldiskfs: Disable unused fast commit buffer

Linux commit v5.9-rc7-39-g6866d7b3f2bb
    ext4 / jbd2: add fast commit initialization

Disable journal fast commit buffer via a mount option because it is
not used by lustre's ldiskfs since it will break recovery.

Linux commit v5.10-rc2-9-gede7dc7fa0af
    jbd2: rename j_maxlen to j_total_len and add
          jbd2_journal_max_txn_bufs

Change osd_transaction_size to use jbd2_journal_get_max_txn_bufs
and provide a fallback jbd2_journal_get_max_txn_bufs when the
kernel does not provide one.

Test-Parameters: trivial
HPE-bug-id: LUS-10858
Fixes: c93a3e5b15 ("LU-14195 ldiskfs: update patches for Linux 5.10")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I9bffc3559a8bbce9d4c1c2b6692cb8518f3f991a
Reviewed-on: https://review.whamcloud.com/46943
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
21 months agoLU-13562 build: get correct kernel flavor for SLES 94/47594/2
Jian Yu [Sat, 11 Jun 2022 03:57:54 +0000 (20:57 -0700)]
LU-13562 build: get correct kernel flavor for SLES

This patch fixes lustre.spec.in to get correct kernel flavor
for SLES when kobjdir is detected as /lib/modules/%{_kver}/build.

Test-Parameters: trivial clientdistro=sles15sp3

Change-Id: I350032af383ea8b7f48accd93e5cd11c571e6620
Fixes: d746e64fe1 ("LU-13562 build: SUSE build support for azure, cray_ari_s")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47594
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15838 autoconf: fix use of obsolete macros 88/47288/3
Jian Yu [Mon, 30 May 2022 21:02:50 +0000 (14:02 -0700)]
LU-15838 autoconf: fix use of obsolete macros

This patch fixes the following warnings when using autoconf 2.71:

configure.ac:2: warning: AC_INIT: not a literal:
                "m4_esyscmd(sh -c "./LUSTRE-VERSION-GEN | tr -d '\n'")"
configure.ac:10: warning: The macro `AC_CANONICAL_SYSTEM' is obsolete.
configure.ac:16: warning: The macro `AC_PROG_LIBTOOL' is obsolete.
configure.ac:24: warning: The macro `AC_HELP_STRING' is obsolete.

Like m4_esyscmd, macro m4_esyscmd_s (introduced in autoconf 2.64)
expands to the result of running command in a shell. The difference
is that any trailing newlines are removed.

Since autoconf 2.50, macro 'AC_CANONICAL_TARGET' has been the new name
of 'AC_CANONICAL_SYSTEM':
AU_ALIAS([AC_CANONICAL_SYSTEM], [AC_CANONICAL_TARGET])

Since autoconf 2.58, macro 'AS_HELP_STRING' has been added to replace
'AC_HELP_STRING'.

Since libtool 2.0, new 'LT_INIT' interface has been added to replace
'AC_PROG_LIBTOOL'.

Change-Id: I3c06c21460d7a2cf643fe825e72a26a5416609cf
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47288
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15968 build: update libssl3 51/47751/2
Minh Diep [Fri, 24 Jun 2022 06:19:03 +0000 (23:19 -0700)]
LU-15968 build: update libssl3

In Ubuntu 22.04 libssl1.1 has been superseded by libssl3

Test-Parameters: trivial clientdistro=ubuntu2004

Change-Id: Ic2504c3a40d5c756c2da7b81edb1501fb6b44712
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47751
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15005 build: wrong dependencies for lustre-client-modules deb package 47/47747/2
Alex Deiter [Fri, 24 Jun 2022 06:04:55 +0000 (23:04 -0700)]
LU-15005 build: wrong dependencies for lustre-client-modules deb package

Fixed dependencies for DKMS deb package:
- added autoconf, automake and libtool
- added bison and flex
- added required dev packages
- added linux-base and linux-image
- added python3-distutils-extra to fix build on Ubuntu 16.04

Change-Id: Ic1d05155cd8ad056dece1d22d0f040695d038652
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/47747
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15967 build: configure script does not check for required build tools 44/47744/2
Alex Deiter [Fri, 24 Jun 2022 05:55:58 +0000 (22:55 -0700)]
LU-15967 build: configure script does not check for required build tools

- added check for flex and bison
- added requirement for building kernel modules

Change-Id: I4f4f19ea44f3cd8f69482d950970bf701e81f7ec
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/47744
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15939 build: configure script changes system header and config files 04/47604/2
Alex Deiter [Sun, 12 Jun 2022 19:26:26 +0000 (12:26 -0700)]
LU-15939 build: configure script changes system header and config files

Remove the SUBDIRS target from configure tests.
Since Linux kernel version 2.6.x we can use target M.

Test-Parameters: trivial clientdistro=ubuntu2004
Test-Parameters: trivial clientdistro=el8.5

Change-Id: I8e59bdaf2d0e4e08a659e08f63a14472fba72eb2
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/47604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15914 lnet: Fix null md deref for finalized message 46/47546/4
Chris Horn [Mon, 6 Jun 2022 18:09:03 +0000 (13:09 -0500)]
LU-15914 lnet: Fix null md deref for finalized message

When a message is finalized the lnet_msg:msg_md field may be cleared
(see lnet_finalize() -> lnet_msg_detach_md()).
When an LNet router is forwarding such message, or if an ACK has been
requested for such a message, then the NULL msg_md may be deref'd in
lnet_get_best_ni(). Check for this in lnet_get_best_ni() before
dereferencing the MD.

It may also be dereferenced in kiblnd_send(), so check for this
situation there, too. Some style cleanup is included in
kiblnd_send().
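
The defensive pattern can be sketched with reduced types. This is an illustration only: struct md_view, struct msg_view, CPT_ANY and msg_md_cpt() are hypothetical, and what the real functions read from the MD and fall back to when it is missing is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed "no preference" value used when there is no MD to consult. */
#define CPT_ANY (-1)

/* Hypothetical, reduced views of the MD and of the message. */
struct md_view {
	int cpt; /* CPU partition associated with the MD's memory */
};

struct msg_view {
	struct md_view *md; /* may be NULL once the message is finalized */
};

/* Return the CPT hint for a message without dereferencing a NULL MD. */
static int msg_md_cpt(const struct msg_view *msg)
{
	assert(msg != NULL);
	if (msg->md == NULL)
		return CPT_ANY; /* finalized message: MD already detached */
	return msg->md->cpt;
}
```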

Test-Parameters: trivial
Fixes: 959304eac7 ("LU-15189 lnet: fix memory mapping.")
HPE-bug-id: LUS-10997
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3cfdc8d342bd3b49a61d1ce6c31a245848accf8f
Reviewed-on: https://review.whamcloud.com/47546
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15910 tests: skip sanity/413g for SSK 00/47800/3
Andreas Dilger [Mon, 27 Jun 2022 18:24:37 +0000 (12:24 -0600)]
LU-15910 tests: skip sanity/413g for SSK

When running sanity test_413g under review-dne-selinux-ssk-part-1
it intermittently fails.  Temporarily disable this subtest for
this config until the problem is understood and fixed.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iacda6add8baabd479ea05dff06867f1a7afb23ce
Reviewed-on: https://review.whamcloud.com/47800
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15910 llite: enforce ROOT default on subdir mount 18/47518/4
Lai Siyao [Sat, 21 May 2022 02:21:38 +0000 (22:21 -0400)]
LU-15910 llite: enforce ROOT default on subdir mount

In subdirectory mount, the filesystem-wide default LMV doesn't take
effect. This fix includes the following changes:
* enforce the filesystem-wide default LMV on subdirectory mount if
  it's not set separately.
* "lfs getdirstripe -D <subdir_mount>" should print the
  filesystem-wide default LMV.

Add sanity test_413g.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I26de9d02872f0df8918b4ef0765b6b18b84794e6
Reviewed-on: https://review.whamcloud.com/47518
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15900 hsm: don't return error on state change during mount 73/47473/2
Mikhail Pershin [Sat, 28 May 2022 06:59:03 +0000 (09:59 +0300)]
LU-15900 hsm: don't return error on state change during mount

The HSM coordinator always starts in the stopped state, but the
mount may carry an hsm_control parameter of 'disabled'. Such a
parameter causes an invalid state change, so the mount would fail
with an error.

Treat a parameter change from 'stopping/stopped' to 'disabled'
as a non-critical error: keep the state unchanged and report
no error back to the caller.
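
The rule described above can be modelled roughly as follows. This is a
minimal userspace sketch, not the coordinator code: the enum values, the
sk_set_state() helper, and the -EBUSY result for other changes while
stopping are all hypothetical names and assumptions made for illustration.

```c
#include <errno.h>

/* Hypothetical state names; the real coordinator states may differ. */
enum cdt_state_sketch {
	SK_STOPPED,
	SK_STOPPING,
	SK_RUNNING,
	SK_DISABLED,
};

/*
 * Apply a requested state change.  Returns 0 when the change is applied
 * or harmlessly ignored, and a negative errno when it is rejected.
 */
static int sk_set_state(enum cdt_state_sketch *cur, enum cdt_state_sketch req)
{
	int stopped = (*cur == SK_STOPPED || *cur == SK_STOPPING);

	/* 'stopping/stopped' -> 'disabled': keep state, report no error. */
	if (stopped && req == SK_DISABLED)
		return 0;

	/* Assumption: any other change is refused while still stopping. */
	if (*cur == SK_STOPPING)
		return -EBUSY;

	*cur = req;
	return 0;
}
```

With this model, a mount-time request for 'disabled' against a stopped
coordinator returns 0 and leaves the state at stopped, so the mount can
proceed.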

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9d1366423391971b9511c46b6aed39d21ebf637c
Reviewed-on: https://review.whamcloud.com/47473
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15860 socklnd: Duplicate ksock_conn_cb 61/47361/3
Chris Horn [Thu, 12 May 2022 18:16:10 +0000 (13:16 -0500)]
LU-15860 socklnd: Duplicate ksock_conn_cb

If two threads enter ksocknal_add_peer(), the first one to acquire
the ksnd_global_lock will create a ksock_peer_ni and associate a
ksock_conn_cb with it.

When the second thread acquires the ksnd_global_lock it will find the
existing ksock_peer_ni, but it does not check for an existing
ksock_conn_cb. As a result, it overwrites the existing ksock_conn_cb
(ksock_peer_ni::ksnp_conn_cb) and the ksock_conn_cb from the first
thread becomes stranded.

Modify ksocknal_add_peer() to check whether the peer_ni has an
existing ksock_conn_cb associated with it.
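
The check-before-attach pattern described above can be sketched in
userspace as follows. Everything here is an illustrative assumption: the
struct and function names are hypothetical, a pthread mutex stands in for
ksnd_global_lock, and freeing the unused object is just one possible way
to avoid stranding it.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ksock_conn_cb and ksock_peer_ni. */
struct conn_cb_sketch {
	int id;
};

struct peer_ni_sketch {
	struct conn_cb_sketch *conn_cb;
};

/* Stand-in for the global lock that serialises peer updates. */
static pthread_mutex_t global_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/*
 * Attach new_cb to peer unless the peer already has a conn_cb.
 * Returns 1 if new_cb was attached, 0 if an existing conn_cb was kept
 * (in which case new_cb is freed rather than left stranded).
 */
static int add_conn_cb_sketch(struct peer_ni_sketch *peer,
			      struct conn_cb_sketch *new_cb)
{
	int attached = 0;

	pthread_mutex_lock(&global_lock_sketch);
	if (peer->conn_cb == NULL) {
		peer->conn_cb = new_cb;
		attached = 1;
	}
	pthread_mutex_unlock(&global_lock_sketch);

	if (!attached)
		free(new_cb);
	return attached;
}
```

The second caller finds the existing conn_cb under the lock and leaves it
in place instead of overwriting it.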

Fixes: 7766f01e89 ("LU-13641 socklnd: replace route construct")
HPE-bug-id: LUS-10956
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6c0190a0c1d3321ddd85c763b86ad1f0d32cf2b9
Reviewed-on: https://review.whamcloud.com/47361
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15855 enc: enc-unaware clients get ENOKEY if file not found 49/47349/2
Sebastien Buisson [Fri, 13 May 2022 15:03:22 +0000 (17:03 +0200)]
LU-15855 enc: enc-unaware clients get ENOKEY if file not found

To reduce issues with applications running on clients without keys
or without fscrypt support that check for the existence of a file in
an encrypted directory, return -ENOKEY instead of -ENOENT.
For encryption-unaware clients, this is done on server side in the
mdt layer, by checking if clients have the OBD_CONNECT2_ENCRYPT
connection flag.
For clients without the key, this is done in llite when the searched
filename is not in encoded form.
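
As a rough illustration of the error mapping described above, the sketch
below collapses the server-side (mdt) and client-side (llite) checks into
one hypothetical helper. The function name and its boolean parameters are
assumptions for illustration only; the real code tests the
OBD_CONNECT2_ENCRYPT flag and the filename encoding instead.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Map a lookup result for a file in a possibly encrypted directory.
 * Only -ENOENT in an encrypted directory is rewritten, and only for
 * clients that are encryption-unaware or do not hold the key.
 */
static int lookup_rc_sketch(int rc, bool dir_encrypted,
			    bool client_enc_aware, bool client_has_key)
{
	if (rc != -ENOENT || !dir_encrypted)
		return rc;

	if (!client_enc_aware || !client_has_key)
		return -ENOKEY;

	return rc;
}
```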

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9a3b7af3a856b7fc7222c61a308ad23168869d57
Reviewed-on: https://review.whamcloud.com/47349
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15839 tests: correct the ZFS grace time for sanity-quota 4a 89/47289/3
Etienne AUJAMES [Wed, 11 May 2022 07:26:01 +0000 (09:26 +0200)]
LU-15839 tests: correct the ZFS grace time for sanity-quota 4a

For sanity-quota 4a, the grace time was increased from 12s to 20s but
was never actually set on the filesystem.

Fixes: 3e4c3fdc ("LU-6836 test: re-add test 4a to sanity-quota for ZFS")
Test-Parameters: fstype=zfs testlist=sanity-quota env=ONLY=4a,ONLY_REPEAT=100
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I2324e818a42a19bc9928f127b1622f1e5274db1f
Reviewed-on: https://review.whamcloud.com/47289
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
21 months agoLU-15420 build: fixes to support building on Ubuntu 22.04 LTS 33/47133/5
James Simmons [Mon, 25 Apr 2022 17:42:32 +0000 (13:42 -0400)]
LU-15420 build: fixes to support building on Ubuntu 22.04 LTS

Lustre uses the glibc stdarg.h instead of the kernel's version which
causes the following build issue.

lustre/include/lu_object.h:35,
/usr/lib/gcc/x86_64-linux-gnu/11/include/stdarg.h:52: note: this is the
location of the previous definition
        #define va_copy(d,s)    __builtin_va_copy(d,s)

The solution is to use the kernel's version of stdarg.h.

The second build issue:
update_trans.c:1608:30: error: ‘struct task_struct’ has no member named
                        ‘state’; did you mean ‘__state’?

is due to Linux commit 2f064a59a11ff9bc22e52e9678bc601404c7cb34
(sched: Change task_struct::state). The state field was renamed
to __state, and the barrier macros READ_ONCE()/WRITE_ONCE() are
now used to access it, which is the proper thing to do.
Since the check in update_trans.c is equivalent to testing
whether the kernel thread is not running (TASK_RUNNING == 0),
we can just change the code to use task_is_running().
task_is_running() was introduced in 5.13.
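
To make the equivalence above concrete, here is a small userspace model
of the check. The struct, the READ_ONCE() stand-in, and the local
definition of task_is_running() are assumptions written for this sketch
so that it compiles outside the kernel; they are not copied from the
patch, and how the patch handles older kernels is not shown here.

```c
/* Userspace stand-ins so the sketch builds without kernel headers. */
#define TASK_RUNNING	0
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

/* Hypothetical miniature of task_struct with the renamed field. */
struct task_struct_sketch {
	unsigned int __state;
};

/* Model of the helper: true when the task state equals TASK_RUNNING. */
#ifndef task_is_running
#define task_is_running(task) \
	(READ_ONCE((task)->__state) == TASK_RUNNING)
#endif
```

Because TASK_RUNNING is 0, a test for a non-zero state is the same as
!task_is_running(task).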

Test-Parameters: trivial
Change-Id: Ib5985b187c3013fbc513e9962a5f27bed4996f5b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47133
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15055 lod: run qmt_pool_* only from the MDT0000 config 59/47059/6
Etienne AUJAMES [Wed, 13 Apr 2022 14:43:12 +0000 (16:43 +0200)]
LU-15055 lod: run qmt_pool_* only from the MDT0000 config

On the first MDS (with MDT0000/QMT0000), if there is more than one MDT
target, the qmt_pool_{new/del/rem/add} functions will be called several
times on QMT0000 for the same pool.

This results in the following error in dmesg:
LustreError: 5659:0:(qmt_pool.c:1390:qmt_pool_add_rem()) add to: can't
scratch-QMT0000 scratch-OST0000_UUID pool pool1: rc = -17

This patch runs qmt_pool_* only for config records from MDT0000.
sanity-quota test_1b now checks for the qmt_pool_add_rem() dmesg error.
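
One way to picture the filtering described above is a name check on the
config log being processed. This is only a sketch under assumptions: the
helper name is hypothetical, and it assumes the config log name ends in
the target suffix (for example "scratch-MDT0000"), which the patch may
determine differently.

```c
#include <stdbool.h>
#include <string.h>

/* True if the (assumed) config log name belongs to MDT0000. */
static bool is_mdt0_config_sketch(const char *logname)
{
	static const char suffix[] = "-MDT0000";
	size_t len = strlen(logname);
	size_t slen = sizeof(suffix) - 1;

	return len >= slen && strcmp(logname + len - slen, suffix) == 0;
}
```

A caller would invoke the pool update only when this returns true, so
each pool record is applied to QMT0000 once rather than once per MDT.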

Test-Parameters: mdtcount=2 mdscount=1 testlist=sanity-quota
Fixes: 09f9fb32 ("LU-11023 quota: quota pools for OSTs")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ia6b712abe25a4d68770753e3408c3321181db1aa
Reviewed-on: https://review.whamcloud.com/47059
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoLU-15117 ofd: don't take lock for dt_bufs_get() 29/47029/5
Alex Zhuravlev [Mon, 11 Apr 2022 08:30:44 +0000 (11:30 +0300)]
LU-15117 ofd: don't take lock for dt_bufs_get()

osd_bufs_get() allocates pages and can trigger new transactions
as part of the memory release procedure. This would break Lustre's
"start a transaction, then do locking" rule.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I58a52af8e2fbbc4823aafc133893e1defedf99b7
Reviewed-on: https://review.whamcloud.com/47029
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>