Whamcloud - gitweb
fs/lustre-release.git
3 months ago LU-15632 tests: Typo in check_node_health 46/46746/3
Chris Horn [Tue, 8 Mar 2022 02:59:05 +0000 (20:59 -0600)]
LU-15632 tests: Typo in check_node_health

A typo in test-framework.sh::check_node_health() causes test failures
if "lctl get_param catastrophe" doesn't return any output.

Test-Parameters: trivial
Fixes: 67752f6db2 ("LU-14773 tests: skip check_network() on working node")
HPE-bug-id: LUS-10800
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I5dbb46578a62f8fc13f39be6fbebe87c75bc2a7d
Reviewed-on: https://review.whamcloud.com/46746
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
3 months ago LU-15636 test: add iabf 44/46744/7
John L. Hammond [Wed, 16 Mar 2022 17:00:36 +0000 (01:00 +0800)]
LU-15636 test: add iabf

Usage: [IABF_OPTIONS...] iabf [INIT] --- A --- B --- [FINI] ---

Initialize, run tasks A and B with various overlaps, and Finalize.

Command lines for INIT, A, B, and FINI are terminated by ---.
If INIT or FINI is empty then it will be skipped.
If INIT or FINI fail then we exit immediately with status 1.

For delay = $IABF_DELAY_BEGIN_NS; delay < $IABF_DELAY_END_NS;
delay += $IABF_DELAY_STEP_NS
  Run initializer (INIT).
  In parallel: Fork, delay *, and exec processes A and B.
    If delay is negative then delay A by abs(delay) ns.
    Otherwise delay B by delay ns.
  Wait for A and B to terminate.
  Run finalizer (FINI).
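
The delay bookkeeping in the loop above can be sketched in C as follows. This is a hedged illustration, not code from lustre/tests/iabf: the helpers iabf_split_delay() and iabf_count_runs() are hypothetical, and the fork/exec/wait part is left out.

```c
#include <assert.h>

/*
 * Hypothetical helper: split one signed delay (ns) into the delays applied
 * to A and B. A negative delay postpones A by abs(delay); otherwise B is
 * postponed by delay.
 */
static void iabf_split_delay(long long delay_ns,
			     long long *a_delay_ns, long long *b_delay_ns)
{
	if (delay_ns < 0) {
		*a_delay_ns = -delay_ns;
		*b_delay_ns = 0;
	} else {
		*a_delay_ns = 0;
		*b_delay_ns = delay_ns;
	}
}

/*
 * Hypothetical helper: number of INIT/A/B/FINI rounds run for
 * delay = begin; delay < end; delay += step (step must be positive).
 */
static long iabf_count_runs(long long begin_ns, long long end_ns,
			    long long step_ns)
{
	long runs = 0;
	long long delay;

	for (delay = begin_ns; delay < end_ns; delay += step_ns)
		runs++;
	return runs;
}
```

For example, with IABF_DELAY_BEGIN_NS=-2000, IABF_DELAY_END_NS=3000 and IABF_DELAY_STEP_NS=1000 the loop runs five rounds: A is delayed in the first two, neither process is delayed in the third, and B is delayed in the last two.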

See lustre/tests/iabf/README for more information.

Test-Parameters: trivial
Change-Id: I97920e082a7a5bec458c805c507b4fefb448427b
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46744
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10994 lov: remove lov_page 21/47221/6
John L. Hammond [Thu, 5 May 2022 16:14:55 +0000 (11:14 -0500)]
LU-10994 lov: remove lov_page

Remove the lov page layer, since it does nothing but cost 24 bytes
per page plus pointer chasing.
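
To put the 24 bytes per page in perspective, the sketch below computes the memory previously spent on lov_page structures for a given amount of cached data. It assumes 4 KiB pages (PAGE_SIZE varies by architecture) and ignores the separate pointer-chasing cost.

```c
#include <assert.h>

/* Bytes spent on lov_page structures, assuming 24 bytes per cached page. */
static unsigned long long lov_page_overhead(unsigned long long cached_bytes,
					    unsigned long long page_size)
{
	const unsigned long long per_page = 24;

	return (cached_bytes / page_size) * per_page;
}
```

With 4 KiB pages, 1 GiB of page cache is 262144 pages, so lov_page_overhead(1ULL << 30, 4096) is 6291456 bytes, i.e. 6 MiB per GiB cached.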

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Icd7b4b0041e0fe414a3a4143179f45845177960e
Reviewed-on: https://review.whamcloud.com/47221
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15618 lnet: Return ESHUTDOWN in lnet_parse() 11/46711/2
Chris Horn [Thu, 3 Mar 2022 07:12:32 +0000 (01:12 -0600)]
LU-15618 lnet: Return ESHUTDOWN in lnet_parse()

If the peer NI lookup in lnet_parse() fails with ESHUTDOWN then we
should return that value back to the LNDs so that they can treat the
failed call the same way as other lnet_parse() failures.

Returning zero causes at least one bug in socklnd, where a reference
on a ksock_conn can be leaked, which prevents socklnd from shutting
down.
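
The reference leak can be modelled in a few lines of C. This is a simplified sketch, not LNet or socklnd code: struct model_conn, model_lnet_parse() and model_lnd_rx() are hypothetical stand-ins. It assumes, as described above, that an LND drops its own connection reference when lnet_parse() fails, while on success the reference is released later in the receive path.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a ksock_conn with a reference count. */
struct model_conn {
	int refcount;
};

/*
 * Hypothetical lnet_parse(): on success the message is handed to the
 * receive path, which eventually drops the connection reference.
 */
static int model_lnet_parse(struct model_conn *conn, int shutting_down)
{
	if (shutting_down)
		return -ESHUTDOWN;	/* before the fix this returned 0 */

	conn->refcount--;		/* receive path completes and drops the ref */
	return 0;
}

/* Hypothetical LND receive handler. */
static int model_lnd_rx(struct model_conn *conn, int shutting_down)
{
	int rc;

	conn->refcount++;		/* ref held for the incoming message */
	rc = model_lnet_parse(conn, shutting_down);
	if (rc < 0)
		conn->refcount--;	/* LND error path drops the ref */
	return rc;
}
```

If model_lnet_parse() returned 0 on the shutdown path, neither branch would drop the reference and refcount would stay at 1, which is the kind of leak that keeps socklnd from shutting down.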

Fixes: 47b7b31978 ("LU-8106 lnet: Do not drop message when shutting down LNet")
Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-15794
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic403619c6dccf3921c46a674808c404adad7a30e
Reviewed-on: https://review.whamcloud.com/46711
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15617 contrib: Add shellcheck to prepare-commit-msg 05/46705/6
Arshad Hussain [Fri, 4 Mar 2022 06:41:06 +0000 (12:11 +0530)]
LU-15617 contrib: Add shellcheck to prepare-commit-msg

Add shellcheck verification to prepare-commit-msg.
Additional shellcheck options can be passed by exporting
SHELLCHECK_OPT. shellcheck is enabled by default
(SHELLCHECK_RUN="yes"), but it can be switched off by setting
SHELLCHECK_RUN to an empty value (SHELLCHECK_RUN=).
Tested with shellcheck version 0.8.0.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Id0603b26202b92f68d43b82a3b2d846dd634c20e
Reviewed-on: https://review.whamcloud.com/46705
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
3 months ago LU-15598 tgt: free allocated page on error 59/46659/2
Andreas Dilger [Tue, 1 Mar 2022 04:49:53 +0000 (21:49 -0700)]
LU-15598 tgt: free allocated page on error

Free allocated page if cfs_crypto_hash_init() fails.
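
The fix follows the usual kernel goto-cleanup pattern. Below is a userspace sketch of that pattern; alloc_page_model(), hash_init_model(), checksum_model() and the live_pages counter are hypothetical stand-ins, not the actual target code or the cfs_crypto API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_pages;	/* pages currently allocated, to detect leaks */

static void *alloc_page_model(void)
{
	void *page = malloc(4096);

	if (page)
		live_pages++;
	return page;
}

static void free_page_model(void *page)
{
	free(page);
	live_pages--;
}

/* Hypothetical hash setup that fails when asked to. */
static int hash_init_model(int fail)
{
	return fail ? -ENOMEM : 0;
}

static int checksum_model(int fail_hash_init)
{
	void *page;
	int rc;

	page = alloc_page_model();
	if (!page)
		return -ENOMEM;

	rc = hash_init_model(fail_hash_init);
	if (rc)
		goto out_free;	/* returning directly here would leak the page */

	/* ... hash the data through the page ... */

out_free:
	free_page_model(page);
	return rc;
}
```

Both the success and the failure path leave live_pages at zero, because every exit after the allocation goes through the single out_free label.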

Fixes: b1e7be00cb6e ("LU-10472 osc: add T10PI support for RPC checksum")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0a45a82a57a98ad2517ccf50a2be1e8d65550bb5
Reviewed-on: https://review.whamcloud.com/46659
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
3 months ago LU-15596 build: set TARBALL with m-a build helper for debs 39/46639/12
Shaun Tancheff [Fri, 25 Mar 2022 18:31:06 +0000 (13:31 -0500)]
LU-15596 build: set TARBALL with m-a build helper for debs

Debs built with the module-assistant tool scan for packages by
default and may find an arbitrary lustre tarball.

Specify the correct tarball with the TARBALL environment variable so
that the default search does not pick an incorrect tarball source.

HPE-bug-id: LUS-10783
Test-Parameters: trivial
Fixes: 1d97ac16d8 ("LU-14948 build: Warn about /usr/src/lustre.tar.bz2")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iaa5a31aaa81e11ee97ae2ea27811c7a4399a0efa
Reviewed-on: https://review.whamcloud.com/46639
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15573 build: remove mpi, CC deps from config cache 62/46562/2
Shaun Tancheff [Sun, 20 Feb 2022 17:49:39 +0000 (12:49 -0500)]
LU-15573 build: remove mpi, CC deps from config cache

Also drop the mpi and CC (compiler type) dependencies from the
initial values in the configure cache, as these may be changed
during setup in the rpm spec or debian build rules.

Test-Parameters: trivial
Fixes: a5084c2f2e ("LU-14937 build: re-use config cache in 'make rpms/debs'")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I986c2ae3653deae08b9da8d64d0b3c02fdc8fa2b
Reviewed-on: https://review.whamcloud.com/46562
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-9243 gss: fix GSS struct definition badness 43/46543/6
Sebastien Buisson [Thu, 17 Feb 2022 15:40:20 +0000 (16:40 +0100)]
LU-9243 gss: fix GSS struct definition badness

struct lgssd_ioctl_param should not be defined in multiple places, so
move it to a new header file, lgss.h, that can be included from both
kernel space and user space.

struct gss_header, struct gss_rep_header, struct gss_err_header and
struct gss_wire_ctx are sent over the wire, so they need to be moved
to lustre_idl.h and be wire-checked.
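
Wire checking pins the size and field offsets of on-wire structures so that an accidental layout change is caught instead of silently breaking the protocol. A minimal sketch using C11 static assertions follows; struct example_wire_header and its fields are hypothetical and do not reproduce struct gss_header, and Lustre's own wirecheck tooling generates its checks differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-wire header: fixed-width fields, explicit padding. */
struct example_wire_header {
	uint8_t  wh_major;
	uint8_t  wh_minor;
	uint16_t wh_pad;
	uint32_t wh_flags;
	uint64_t wh_handle;
};

/* Any change to the layout now fails at compile time. */
_Static_assert(sizeof(struct example_wire_header) == 16,
	       "example_wire_header size changed");
_Static_assert(offsetof(struct example_wire_header, wh_flags) == 4,
	       "wh_flags moved");
_Static_assert(offsetof(struct example_wire_header, wh_handle) == 8,
	       "wh_handle moved");
```

Using only fixed-width types with explicit padding keeps the layout identical on every architecture that talks to the server, which is what the checks then enforce.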

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I97c4a8322e6bb7627c6dff5f068931278f4567d7
Reviewed-on: https://review.whamcloud.com/46543
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14875 import: fix bad CPT read 41/46541/2
Cyril Bordage [Thu, 17 Feb 2022 11:49:16 +0000 (12:49 +0100)]
LU-14875 import: fix bad CPT read

When importing, CPT was read from the tunables field, but in the YAML
file generated during export it is actually at the same level as the
tunables field.

Test-parameters: trivial testlist=sanity-lnet

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: Iea7b6189ad1a25b95ae6416d75ee2cbe4dca2fbf
Reviewed-on: https://review.whamcloud.com/46541
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15559 tests: add do_node_vp() and do_facet_vp() 35/46535/4
John L. Hammond [Wed, 16 Feb 2022 17:54:26 +0000 (11:54 -0600)]
LU-15559 tests: add do_node_vp() and do_facet_vp()

Add new test-framework functions (do_node_vp() and do_facet_vp())
which carefully escape and quote command lines for execution on the
local or remote node. Add sanityn test_0 to verify.
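
One common way to pass an argument safely through a remote shell is POSIX single-quoting, where each embedded ' becomes '\''. The C sketch below shows that technique only; shell_quote() is a hypothetical helper, and this is not how do_node_vp() is implemented (the test framework itself is written in shell).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, single-quoted copy of in (caller frees). */
static char *shell_quote(const char *in)
{
	size_t len = strlen(in);
	/* worst case: every char is a quote (4 bytes each) + 2 quotes + NUL */
	char *out = malloc(len * 4 + 3);
	char *p = out;

	if (!out)
		return NULL;

	*p++ = '\'';
	for (; *in != '\0'; in++) {
		if (*in == '\'') {
			/* close quote, escaped quote, reopen quote */
			memcpy(p, "'\\''", 4);
			p += 4;
		} else {
			*p++ = *in;
		}
	}
	*p++ = '\'';
	*p = '\0';
	return out;
}
```

Quoting every argument this way means spaces, $, * and quotes reach the remote command unchanged instead of being re-interpreted by the second shell.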

Test-Parameters: trivial env=ONLY="0" testlist=sanityn
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ic491b0148e6ef11ecd0b3ccce983afcf4d1300e5
Reviewed-on: https://review.whamcloud.com/46535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15635 ldiskfs: Interface change fix server v5.10 75/46775/6
Shaun Tancheff [Fri, 11 Mar 2022 17:14:39 +0000 (00:14 +0700)]
LU-15635 ldiskfs: Interface change fix server v5.10

Linux v5.9-rc7-8-g15ed2851b0f4
    ext4: remove unused argument from ext4_(inc|dec)_count

Test for ext4_inc_count() with 2 arguments and provide a compat
wrapper for ldiskfs_inc|dec_count that discards the handle as
needed.
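
A compat wrapper of this kind is typically a configure-selected macro. The sketch below is hypothetical: HAVE_INC_COUNT_HANDLE_ARG stands in for whatever configure test result the patch defines, and the stub functions only model the counter helper before and after the upstream commit quoted above.

```c
#include <assert.h>

struct inode_model {
	unsigned int i_nlink;
};
typedef struct handle_model handle_model_t;	/* opaque journal handle */

/* Older kernels: the counter helper still takes the (unused) handle. */
static inline void inc_count_old(handle_model_t *handle,
				 struct inode_model *inode)
{
	(void)handle;
	inode->i_nlink++;
}

/* Newer kernels: the handle argument has been removed. */
static inline void inc_count_new(struct inode_model *inode)
{
	inode->i_nlink++;
}

/*
 * Hypothetical configure result: defined when the kernel still has the
 * two-argument form. Callers always pass a handle; the new form drops it.
 */
#ifdef HAVE_INC_COUNT_HANDLE_ARG
#define compat_inc_count(handle, inode)	inc_count_old((handle), (inode))
#else
#define compat_inc_count(handle, inode)	((void)(handle), inc_count_new(inode))
#endif
```

Callers keep a single spelling, compat_inc_count(handle, inode), and the preprocessor picks the form that matches the kernel being built against.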

HPE-bug-id: LUS-10808
Fixes: c93a3e5b15 ("LU-14195 ldiskfs: update patches for Linux 5.10")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I48fb52e67ad334e2fc0c045e96fc5dffd3243842
Reviewed-on: https://review.whamcloud.com/46775
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15538 lnet: DLC sets map_on_demand incorrectly 92/46492/2
Chris Horn [Sat, 5 Feb 2022 23:15:30 +0000 (23:15 +0000)]
LU-15538 lnet: DLC sets map_on_demand incorrectly

When any NET or LND tunable is specified via CLI or yaml, then the
whole tunables struct gets memset to 0, or in the case of yaml config,
0 gets assigned to any tunable that isn't specified in the yaml. This
causes a problem for map_on_demand because 0 is a valid value for that
parameter, and ko2iblnd cannot know whether the user specified that 0
should be used or if DLC is specifying that the parameter was unset.

Rather than setting this parameter to 0 in the LND tunables struct,
have DLC set it to UINT_MAX to indicate that ko2iblnd should use the
value of the kernel module parameter.
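
The sentinel approach can be sketched as follows. The names and the module-parameter value are hypothetical; only the idea comes from the description above: UINT_MAX means "not set, use the kernel module parameter", so that 0 remains a legitimate user choice.

```c
#include <assert.h>
#include <limits.h>

/* Sentinel meaning "tunable not specified by DLC". */
#define MAP_ON_DEMAND_UNSET UINT_MAX

/* Hypothetical stand-in for the ko2iblnd map_on_demand module parameter. */
static unsigned int mod_param_map_on_demand = 256;

static unsigned int effective_map_on_demand(unsigned int tunable)
{
	if (tunable == MAP_ON_DEMAND_UNSET)
		return mod_param_map_on_demand;
	return tunable;		/* includes an explicit 0 from the user */
}
```

Because no user would specify UINT_MAX as a fragment count, the LND can tell "unset" apart from every value a user could reasonably request.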

Test-Parameters: trivial
HPE-bug-id: LUS-10740
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I303e64d4d402ba61b5ae3e3910873f192a4a2845
Reviewed-on: https://review.whamcloud.com/46492
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15521 spec: fix bare words error with rpm 4.16 71/46471/3
Jian Yu [Mon, 7 Feb 2022 18:21:47 +0000 (10:21 -0800)]
LU-15521 spec: fix bare words error with rpm 4.16

RPM 4.16 removed support for bare words in expressions
(e.g. a == b now needs to be "a" == "b"). The change is
backward compatible. More changes are listed in:
https://rpm.org/wiki/Releases/4.16.0

This patch accommodates the above change and fixes more
errors/warnings:
- E: specfile-error error: bare words are no longer supported,
     please use "...":  redhat=="redhat" || redhat=="fedora"
- E: specfile-error warning: extra tokens at the end of %else
     directive in line 140:  %else #for Suse
- W: macro-in-comment %optflags
- W: macro-in-comment %{name}

Test-Parameters: trivial

Change-Id: I725c47f62be7762a89e5919fd2865e2fb2ced407
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46471
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15522 util: make readline() static 65/46465/2
John L. Hammond [Sun, 6 Feb 2022 17:45:02 +0000 (11:45 -0600)]
LU-15522 util: make readline() static

In libcfs/libcfs/util/parser.c make readline() static to avoid
conflicting with the real readline().

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I4b90fd5d22c9fd5d193f5d18588afd3d97ca591c
Reviewed-on: https://review.whamcloud.com/46465
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15508 gss: protect from arbitrary write to init channel 61/46461/2
Sebastien Buisson [Fri, 4 Feb 2022 14:53:35 +0000 (15:53 +0100)]
LU-15508 gss: protect from arbitrary write to init channel

If arbitrary data is written to the gss init channel,
directly return -EINVAL. This protects against unsolicited
authentication requests, and avoids leaving a dangling entry
in the auth init cache.
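
A sketch of the check: a write to the init channel is accepted only if it answers a request that was actually issued, and anything else is rejected with -EINVAL before any cache entry is created. The pending-request table, the ID-based matching and all names here are hypothetical and do not reflect the real upcall message format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical table of init requests sent to userspace, awaiting a reply. */
static unsigned long long pending_ids[MAX_PENDING];
static int pending_used[MAX_PENDING];

static int add_pending(unsigned long long id)
{
	for (int i = 0; i < MAX_PENDING; i++) {
		if (!pending_used[i]) {
			pending_used[i] = 1;
			pending_ids[i] = id;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Returns bytes consumed, or -EINVAL for unsolicited or malformed data. */
static long init_channel_write(unsigned long long id, size_t len)
{
	if (len == 0)
		return -EINVAL;

	for (int i = 0; i < MAX_PENDING; i++) {
		if (pending_used[i] && pending_ids[i] == id) {
			pending_used[i] = 0;	/* reply consumed */
			return (long)len;
		}
	}
	return -EINVAL;		/* nobody asked: reject, cache nothing */
}
```

A second write with the same ID is rejected as well, since the matching request was consumed by the first reply.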

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iadde630012e4ded83f9609fbb3e10b2e092deb57
Reviewed-on: https://review.whamcloud.com/46461
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15509 lnet: Ping buffer ref leak in lnet_peer_data_present 31/46431/2
Chris Horn [Wed, 2 Feb 2022 22:05:15 +0000 (22:05 +0000)]
LU-15509 lnet: Ping buffer ref leak in lnet_peer_data_present

lnet_peer_merge_data() and lnet_peer_set_primary_data() are
responsible for dropping the reference on the ping buffer that is
taken by lnet_peer_push_event() and lnet_discovery_event_reply().
However, there are some error paths in lnet_peer_data_present()
where we do not call either lnet_peer_merge_data() or
lnet_peer_set_primary_data(). In these cases, we need to drop
the reference on the ping buffer otherwise it will leak.
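
The ownership rule can be sketched as: whichever path consumes the data drops the reference, and every early exit must drop it too. The types and functions below are hypothetical stand-ins, not the LNet peer discovery code.

```c
#include <assert.h>
#include <errno.h>

struct ping_buf_model {
	int refs;
};

static void pbuf_decref(struct ping_buf_model *pbuf)
{
	pbuf->refs--;
}

/* Stand-in for lnet_peer_merge_data(): consumes the data and the ref. */
static int merge_data_model(struct ping_buf_model *pbuf)
{
	pbuf_decref(pbuf);
	return 0;
}

/* Stand-in for lnet_peer_data_present(). */
static int data_present_model(struct ping_buf_model *pbuf, int early_error)
{
	if (early_error) {
		pbuf_decref(pbuf);	/* the fix: error path drops the ref */
		return early_error;
	}
	return merge_data_model(pbuf);
}
```

Starting from the one reference taken by the event handler, both paths end with refs at zero.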

HPE-bug-id: LUS-10715
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I36ba0017caa9d6ce139f94090912496f14eda626
Reviewed-on: https://review.whamcloud.com/46431
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15511 misc: allow Lustre-change and Lustre-commit 20/46420/3
Andreas Dilger [Wed, 2 Feb 2022 02:34:34 +0000 (19:34 -0700)]
LU-15511 misc: allow Lustre-change and Lustre-commit

Allow the Lustre-change:, Lustre-commit:, Linux-commit: labels in
the signoff section.  Verify that Lustre-change: has a valid-looking
Gerrit URL in "permalink" format "https://review.whamcloud.com/nnnnn".
Verify that -commit: labels have valid-looking Git hashes (40 hex chars),
though the actual hash cannot be verified because it may be from a
different Git repository.
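
The two format checks can be illustrated in C. The real checks live in the commit-msg hook script; this is an independent sketch of the rules as stated (a Gerrit permalink of the form https://review.whamcloud.com/nnnnn, and exactly 40 hexadecimal characters for a Git hash), with hypothetical function names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Exactly 40 hexadecimal characters and nothing else. */
static bool is_git_hash(const char *s)
{
	for (size_t i = 0; i < 40; i++) {
		/* stops at the terminating NUL, since '\0' is not a hex digit */
		if (!isxdigit((unsigned char)s[i]))
			return false;
	}
	return s[40] == '\0';
}

/* "https://review.whamcloud.com/" followed by one or more digits. */
static bool is_gerrit_permalink(const char *s)
{
	static const char prefix[] = "https://review.whamcloud.com/";
	const size_t plen = sizeof(prefix) - 1;

	if (strncmp(s, prefix, plen) != 0)
		return false;
	s += plen;
	if (!isdigit((unsigned char)*s))
		return false;
	while (isdigit((unsigned char)*s))
		s++;
	return *s == '\0';
}
```

Note that isxdigit() also accepts uppercase A-F, while Git prints hashes in lowercase; a stricter check could reject uppercase.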

Shorten the help message in case of error, and add "commit-msg --help"
to print the full commit format help.

Allow lines over 70 chars that contain URLs, which cannot be shortened.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I866ab91b33f6e52ff893344df74af243903ebbe5
Reviewed-on: https://review.whamcloud.com/46420
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15393 lod: skip qos for qos_threshold_rr=100 88/46388/3
Alexander Boyko [Mon, 31 Jan 2022 14:04:08 +0000 (09:04 -0500)]
LU-15393 lod: skip qos for qos_threshold_rr=100

The current QoS allocation code is called on every statfs update.
It takes lq_rw_sem for write and recalculates penalties even when
qos_threshold_rr=100 is set, which means round-robin allocation is
always used. Skip the unnecessary locking and calculation for 100%
round-robin allocation.
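
The shortcut amounts to an early return before the write lock is taken. In this sketch the lock is modelled by a counter so the effect is observable; struct qos_model and its fields are hypothetical, not the LOD data structures.

```c
#include <assert.h>

struct qos_model {
	unsigned int threshold_rr;	/* percent; 100 means pure round-robin */
	unsigned int write_locks;	/* times lq_rw_sem would be taken for write */
	unsigned int penalty_recalcs;
};

/* Called on every statfs update. */
static void qos_statfs_update(struct qos_model *q)
{
	/* 100% round-robin never uses QoS weights: skip lock and recalculation. */
	if (q->threshold_rr == 100)
		return;

	q->write_locks++;	/* lq_rw_sem would be taken for write here */
	q->penalty_recalcs++;
	/* ... and released here */
}
```

Since statfs updates are frequent, avoiding the write lock also avoids blocking allocation threads that only need the lock for read.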

HPE-bug-id: LUS-10388
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2fcc272d00a988ca4ba0f745b1d5809d65b28654
Reviewed-on: https://review.whamcloud.com/46388
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15493 tests: facet_failover() improvements 59/46359/10
Elena Gryaznova [Mon, 7 Feb 2022 14:14:25 +0000 (17:14 +0300)]
LU-15493 tests: facet_failover() improvements

Fix template matching in affected facets accounting
(when the == and != operators are used, the string to the
right of the operator is considered a pattern).

Reduce the failover duration of facet_failover(): a long failover
requires increasing ldlm_enqueue_min to avoid evictions with striped
objects, so reboot the node and mount on the failover node in
parallel.

Make wait_clients_import_state() work with a facet list.

Test-Parameters: env=CONF_SANITY_EXCEPT=32a
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
HPE-bug-id: LUS-7112, LUS-9901, LUS-10718
Change-Id: Ibbeeea49632acce590219da53f322afb44fa4ffa
Reviewed-on: https://review.whamcloud.com/46359
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15492 build: fallthrough macro for pre/post gcc-7 57/46357/4
Shaun Tancheff [Fri, 28 Jan 2022 07:56:33 +0000 (01:56 -0600)]
LU-15492 build: fallthrough macro for pre/post gcc-7

gcc-7.5 on openSUSE 15:
   error: this statement may fall through [-Werror=implicit-fallthrough=]

Use __attribute__((fallthrough)) for gcc-7 and later, and use a
no-op statement for earlier gcc versions, where the fallthrough
attribute is not available.
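
A minimal version of such a macro might look like the sketch below. The exact spelling and version test in the Lustre headers may differ; this only illustrates the gcc-7 split described above. Since gcc before 7 has no -Wimplicit-fallthrough warning at all, the no-op branch only needs to compile.

```c
#include <assert.h>

#if defined(__GNUC__) && __GNUC__ >= 7
#define fallthrough	__attribute__((__fallthrough__))
#else
#define fallthrough	do {} while (0)	/* no attribute: plain no-op */
#endif

/* Count how many case bodies execute when entering the switch at n. */
static int count_cases(int n)
{
	int hits = 0;

	switch (n) {
	case 3:
		hits++;
		fallthrough;
	case 2:
		hits++;
		fallthrough;
	case 1:
		hits++;
		break;
	default:
		break;
	}
	return hits;
}
```

With gcc-7 or later and -Werror=implicit-fallthrough, removing either fallthrough; line turns the intentional fall-through into a build error.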

Test-Parameters: trivial
Fixes: 5549b1b9e0 ("LU-15220 lustre: use 'fallthrough' pseudo keyword for switch")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib72f5996149c738805f15e354e1e1606d981ce29
Reviewed-on: https://review.whamcloud.com/46357
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13906 build: Fix %{name}-osd-ldiskfs-mount 41/46341/3
Shaun Tancheff [Fri, 28 Jan 2022 08:15:00 +0000 (15:15 +0700)]
LU-13906 build: Fix %{name}-osd-ldiskfs-mount

The following error occurs during installation:

error: Failed dependencies:
    lustre-osd-ldiskfs-mount = 2.14.57_56_g4d94d2f is needed by
    kmod-lustre_ib-osd-ldiskfs-2.14.57_56_g4d94d2f-1.el8.x86_64

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Idbcdbd5a1e7793c1359f90c1035ded9fe8e90576
Reviewed-on: https://review.whamcloud.com/46341
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13906 build: Enable using ldiskfsprogs without wc tag 40/46340/5
Shaun Tancheff [Tue, 1 Feb 2022 16:14:43 +0000 (10:14 -0600)]
LU-13906 build: Enable using ldiskfsprogs without wc tag

HPE ldiskfsprogs internal versions are tagged with 'cr', while
Whamcloud ldiskfsprogs internal versions are tagged with 'wc'.

Instead of requiring the 'wc' tag, depend on ldiskfsprogs > 1.45.6.

Test-Parameters: trivial
HPE-bug-id: LUS-10712
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ic1c6037aaf4d9877ee9a3cf54c9285dd36abf64d
Reviewed-on: https://review.whamcloud.com/46340
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15420 uapi: avoid gcc-11 -Werror=stringop-overread 19/46319/4
James Simmons [Sat, 5 Feb 2022 16:34:22 +0000 (09:34 -0700)]
LU-15420 uapi: avoid gcc-11 -Werror=stringop-overread

GCC 11 warns about string and memory operations on fixed addresses:

In function ‘strlen’,
    inlined from ‘changelog_rec_sname’ at include/uapi/linux/lustre/lustre_user.h:1981:19,
    inlined from ‘mdd_changelog_rec_ext_rename’ at lustre/mdd/mdd_dir.c:932:2,
    inlined from ‘mdd_changelog_ns_store’ at lustre/mdd/mdd_dir.c:1061:3:
include/linux/fortify-string.h:25:33: error: ‘__builtin_strlen’
reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
   25 | #define __underlying_strlen     __builtin_strlen

The reason is that we are looking for an address right after the end
of the changelog record header, which gcc thinks is an overrun. Rework
the code to allow us to index the memory right after the changelog
record header.
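
One way to let the compiler see that a record header is followed by more data is a flexible array member, instead of computing an address one past the end of the header struct. The sketch below uses a hypothetical struct rec_model; it illustrates the idea and is not the actual layout of the changelog record structures or the code in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: fixed header followed by a variable-length name. */
struct rec_model {
	uint16_t rm_namelen;
	uint16_t rm_flags;
	uint32_t rm_type;
	char	 rm_name[];	/* flexible array member: data after the header */
};

static struct rec_model *rec_model_alloc(const char *name)
{
	size_t len = strlen(name);
	struct rec_model *rec = malloc(sizeof(*rec) + len + 1);

	if (!rec)
		return NULL;
	rec->rm_namelen = (uint16_t)len;
	rec->rm_flags = 0;
	rec->rm_type = 0;
	memcpy(rec->rm_name, name, len + 1);	/* copy including the NUL */
	return rec;
}

/*
 * Reading rec->rm_name is well defined. Something like
 * strlen((char *)(rec + 1)) computes the same address here, but it can be
 * flagged by gcc 11 as reading past the end of the header object.
 */
static size_t rec_model_name_len(const struct rec_model *rec)
{
	return strlen(rec->rm_name);
}
```

The allocation size accounts for the trailing data explicitly, so the compiler's object-size tracking and the actual buffer agree.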

Also fix a long hidden bug in the lustre snmp code.

Change-Id: I13479b9074a392330d39f01656b26f9e9a91a8ec
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46319
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months ago LU-15480 build: style cleanup lustre/tests/aiocp.c 97/46297/6
Shaun Tancheff [Wed, 16 Feb 2022 09:19:00 +0000 (03:19 -0600)]
LU-15480 build: style cleanup lustre/tests/aiocp.c

Add an implicit-fallthrough decorator.
Also make some minor style cleanups.

Test-Parameters: trivial
HPE-bug-id: LUS-10201
Fixes: 689714eb51 ("LU-13846 llite: move iov iter forward by ourself")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I3be3b27c1c006d13820ea55a84c083fbfc8c4b0f
Reviewed-on: https://review.whamcloud.com/46297
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15474 test: facet_failover() should mount all facets concurrently 66/46266/2
Andriy Skulysh [Fri, 24 Jul 2020 09:07:27 +0000 (12:07 +0300)]
LU-15474 test: facet_failover() should mount all facets concurrently

A single host can hold multiple facets, and all of them need to be
mounted after a reboot. Mounting the facets one by one increases the
total recovery time and leads to evictions when striped files are
used.

Change-Id: Ifa76dfd080cd89f1316ccb1d5552bb12d070168a
HPE-bug-id: LUS-9141
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/46266
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-9859 libcfs: add "default" keyword for debug mask 51/46251/3
Andreas Dilger [Fri, 21 Jan 2022 07:20:56 +0000 (00:20 -0700)]
LU-9859 libcfs: add "default" keyword for debug mask

Allow "lctl set_param debug=default" to reset the debug mask to
the default value.  This is useful if the debug needs to be set
to a higher value temporarily, but should be easily reset back
to the original value afterward.
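
Handling the keyword amounts to one extra branch in the mask parser. The sketch below is hypothetical: DEBUG_MASK_DEFAULT is a made-up value, and the real libcfs parser also accepts symbolic flag names and +/- modifiers, which are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEBUG_MASK_DEFAULT 0x3f0u	/* hypothetical default mask */

static unsigned int debug_mask = DEBUG_MASK_DEFAULT;

/* Accept "default" or a numeric mask (decimal, 0x hex or 0 octal). */
static int debug_mask_set(const char *val)
{
	char *end;
	unsigned long mask;

	if (strcmp(val, "default") == 0) {
		debug_mask = DEBUG_MASK_DEFAULT;
		return 0;
	}

	errno = 0;
	mask = strtoul(val, &end, 0);
	if (errno != 0 || end == val || *end != '\0')
		return -EINVAL;	/* leave the current mask untouched */
	debug_mask = (unsigned int)mask;
	return 0;
}
```

In practice this supports the workflow from the message: raise the mask temporarily (for example with lctl set_param debug=-1) and later restore it with lctl set_param debug=default, without having to remember the original value.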

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7d0d8fb81e51afb5ea6f29abea0d0814de3ebbe5
Reviewed-on: https://review.whamcloud.com/46251
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15420 llite: add rcu argument to ->get_acl() callback 86/46086/5
Jian Yu [Tue, 25 Jan 2022 05:51:02 +0000 (21:51 -0800)]
LU-15420 llite: add rcu argument to ->get_acl() callback

Kernel 5.15 commit 0cad6246621b5887d5b33fea84219d2a71f2f99a
added an rcu argument to the ->get_acl() callback.

Test-Parameters: trivial
Change-Id: Icd711b38dda1a5a3c56bd631fa2edd94eab3572c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46086
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15412 tests: Let init_clients_lists() export client vars 94/45994/7
Xinliang Liu [Fri, 7 Jan 2022 02:41:00 +0000 (02:41 +0000)]
LU-15412 tests: Let init_clients_lists() export client vars

init_clients_lists() already computes the client-related variables
correctly, so let it define and export these variables.

This patch fixes sanity test 807 getting stuck when run on a
multi-node Lustre cluster with CLIENTS empty.

Also clean up client count checking, since CLIENTS is now always set.

Change-Id: I9a5d4b9bde401e14e1d7f6f88b04c8d1c6aea11a
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/45994
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15393 lod: use killable semaphore for creation path 21/45921/7
Alexander Boyko [Wed, 22 Dec 2021 12:06:16 +0000 (07:06 -0500)]
LU-15393 lod: use killable semaphore for creation path

The lod_ost_alloc_qos() function sleeps during OST failover, even
though object allocation could use different OSTs. This patch changes
the down_write() call to down_write_killable() and adds a timer for
wakeup.

The main idea of this fix is as follows. When an OST is lost during
lod_ost_alloc_rr() and the MDT does not have precreated objects for it,
lod_ost_alloc_rr()->..->lod_qos_declare_object_on() would sleep while
holding lq_rw_sem for read. After a statfs update, any creation thread
would get stuck in lod_ost_alloc_qos() waiting for lq_rw_sem for write.
With the fix, the sleep is limited and allocation goes through
lod_ost_alloc_rr(). lq_rw_sem is shared for readers, and stripe
allocation skips OSTs without objects.

lod_ost_alloc_rr() refills the OST pool while holding lq_rw_sem for
write when lq_rr.lqr_flags has LQ_DIRTY set. This should only happen
when an OST is added or removed. There is no need to set LQ_DIRTY for
lq_rr when statfs returns an error, since this flag does not lead to
any change in the pool list in lod_qos_calc_rr().

Change the behaviour of lod_check_and_reserve_ost() so that it sleeps
during object allocation only for speed 2.

HPE-bug-id: LUS-10388
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I4768c4cf7d2f9f02f0a9e0dfb6d15e02932cb5fe
Reviewed-on: https://review.whamcloud.com/45921
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13783 sec: support of native Ubuntu 20.04 HWE 5.8 kernel 38/46238/10
James Simmons [Thu, 17 Feb 2022 13:44:22 +0000 (06:44 -0700)]
LU-13783 sec: support of native Ubuntu 20.04 HWE 5.8 kernel

For the Linux 5.5 kernel, a patch to improve nokey names landed that
removed several variables used by Lustre's llite and mdt layers.
Rework the code to use other defines that exist.
The second change for Ubuntu is several backports that handle a few
variables changing across kernel versions: one is the name change of
DCACHE_ENCRYPTED_NAME and the other is is_ciphertext_name. The simpler
approach Ubuntu took was to use fscrypt_prepare_lookup() and
fscrypt_is_nokey_name() to work around these changes. Lastly, the
good news is that for 5.12 ll_d_ops no longer gets stomped on and the
special revalidate_dentry fscrypto function is exported.

Change-Id: I7f70fe9abddf34798e2e01b35099c9a032d92b91
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/46238
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15317 osc: Add RPC to iotrace 94/45894/8
Patrick Farrell [Wed, 9 Feb 2022 21:25:10 +0000 (16:25 -0500)]
LU-15317 osc: Add RPC to iotrace

Add RPCs to iotrace debugging.

To avoid creating too much debug output, this debug
ignores the possibility that an RPC contains non-contiguous
extents.  Thus the eventual visualization will act as
though the RPC is a continuous whole.  I judge this to be
superior to the amount of log data and complexity of
capturing each extent separately.  If that level of detail
is needed, a higher debug level can be used.
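
Treating an RPC as one continuous whole means logging the span from its lowest start offset to its highest end offset. A small sketch of that reduction follows, with a hypothetical struct extent_model using half-open byte ranges; it is not the osc extent structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical half-open byte range [start, end). */
struct extent_model {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Collapse the extents of one RPC into the single span that would be
 * logged. Gaps between non-contiguous extents are deliberately ignored.
 * n must be at least 1.
 */
static struct extent_model rpc_span(const struct extent_model *ext, size_t n)
{
	struct extent_model span = ext[0];

	for (size_t i = 1; i < n; i++) {
		if (ext[i].start < span.start)
			span.start = ext[i].start;
		if (ext[i].end > span.end)
			span.end = ext[i].end;
	}
	return span;
}
```

For extents [8192, 12288) and [0, 4096), the logged span is [0, 12288): the hole at [4096, 8192) is reported as part of the RPC, which is the precision traded away for less log volume.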

Test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I6fe416ba44be5572f130704ba9d3e9b85d09c656
Reviewed-on: https://review.whamcloud.com/45894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15317 llite: Add COMPLETED iotrace messages 84/46484/3
Patrick Farrell [Wed, 9 Feb 2022 21:14:23 +0000 (16:14 -0500)]
LU-15317 llite: Add COMPLETED iotrace messages

It's very useful to see how long an I/O call took.  There
are other ways to do this, but the goal is for iotrace to
provide all necessary information for basic I/O performance
analysis, so we add COMPLETED messages to iotrace.

Test-Parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I17f52ebc87a31d5ba34f63dc8b6a279e83cd10ef
Reviewed-on: https://review.whamcloud.com/46484
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15317 llite: Add FID to async ra iotrace 12/45912/7
Patrick Farrell [Tue, 21 Dec 2021 16:34:53 +0000 (11:34 -0500)]
LU-15317 llite: Add FID to async ra iotrace

IOtrace log entries need to include the FID of the file
concerned.  Add this to async readahead.

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8d788969f29412ce88f1cafa229977f6efa20962
Reviewed-on: https://review.whamcloud.com/45912
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15317 llite: Add strided readahead to iotrace 88/45888/4
Patrick Farrell [Sat, 18 Dec 2021 21:27:59 +0000 (16:27 -0500)]
LU-15317 llite: Add strided readahead to iotrace

We need to capture some additional parameters to correctly
understand the behavior of strided readahead.  Add these
parameters to the existing iotrace message.

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I7caddf9dfaf9ba5f2645d045d5a4a50562cc1b54
Reviewed-on: https://review.whamcloud.com/45888
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15317 llite: Make iotrace logging quieter 87/45887/4
Patrick Farrell [Sat, 18 Dec 2021 21:23:52 +0000 (16:23 -0500)]
LU-15317 llite: Make iotrace logging quieter

Most of the time, we don't read any pages with readahead,
since we're moving through the window and aren't ready to
read more yet.  That's important for readahead debug, but
there's no need to log it for iotrace.  (This matters
because without this change, this message makes up the large
majority of iotrace messages.)

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I58197acd1ef97c903320a2433ec1d5dcb0e46bd0
Reviewed-on: https://review.whamcloud.com/45887
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-11787 test: Fix checkfilemap tests for 64K page 29/45629/11
James Simmons [Mon, 31 Jan 2022 17:44:46 +0000 (12:44 -0500)]
LU-11787 test: Fix checkfilemap tests for 64K page

File mappings are page-size aligned. Modify the tests to handle 64K
pages.
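Because mappings are page-size aligned, expected offsets have to be derived from the runtime page size rather than a hard-coded 4 KiB. A minimal C sketch of that rounding, using POSIX `sysconf(_SC_PAGESIZE)`; the helper names are hypothetical, and the actual tests are shell scripts.

```c
#include <stdint.h>
#include <unistd.h>

/* Round an offset down to a multiple of page_size (a power of two). */
static uint64_t page_round_down(uint64_t off, uint64_t page_size)
{
	return off & ~(page_size - 1);
}

/* Round an offset up to a multiple of page_size (a power of two). */
static uint64_t page_round_up(uint64_t off, uint64_t page_size)
{
	return (off + page_size - 1) & ~(page_size - 1);
}

/* Runtime page size: 4096 on x86_64, but can be 65536 on aarch64/ppc64le. */
static uint64_t runtime_page_size(void)
{
	long ps = sysconf(_SC_PAGESIZE);

	return ps > 0 ? (uint64_t)ps : 4096;
}
```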

Test-Parameters: trivial clientdistro=el8.5 clientarch=aarch64 testlist=sanityn env=ONLY="71a 71b"
Test-Parameters: trivial clientdistro=el8.5 clientarch=ppc64le testlist=sanityn env=ONLY="71a 71b"
Change-Id: I316a197db8cdd0f9064431f8c572b43adf6110b8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/45629
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15583 build: Update ZFS version to 2.1.2 91/45591/6
Jian Yu [Wed, 11 May 2022 23:26:41 +0000 (16:26 -0700)]
LU-15583 build: Update ZFS version to 2.1.2

Update ZFS version to 2.1.2. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.1.2

Change-Id: If7c81a4b1fe13e29eea1c277b896223f5c06b31a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45591
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15207 libcfs: reset hs_rehash_bits 33/45533/8
Alex Zhuravlev [Thu, 11 Nov 2021 08:19:46 +0000 (11:19 +0300)]
LU-15207 libcfs: reset hs_rehash_bits

If rehash work is cancelled, nobody resets
hs_rehash_bits, and the first iterator then hits
LASSERT(!cfs_hash_is_rehashing(hs)) in
cfs_hash_for_each_relax().
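The failure mode can be shown with a heavily simplified userspace model in which "rehashing" simply means "a non-zero target size is recorded". All names below are hypothetical; they mirror the idea, not the libcfs API.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a hash with deferred rehash. */
struct model_hash {
	unsigned int cur_bits;    /* current table size, log2 */
	unsigned int rehash_bits; /* target size while a rehash is pending, else 0 */
};

static bool model_is_rehashing(const struct model_hash *hs)
{
	return hs->rehash_bits != 0;
}

/* Schedule a rehash: record the target size (the work is queued elsewhere). */
static void model_schedule_rehash(struct model_hash *hs, unsigned int bits)
{
	hs->rehash_bits = bits;
}

/* The rehash work normally clears rehash_bits when it completes. */
static void model_rehash_done(struct model_hash *hs)
{
	hs->cur_bits = hs->rehash_bits;
	hs->rehash_bits = 0;
}

/*
 * If the queued work is cancelled before it runs, the canceller must reset
 * rehash_bits itself; otherwise the table looks "rehashing" forever and an
 * iterator that asserts !is_rehashing() trips.
 */
static void model_cancel_rehash(struct model_hash *hs)
{
	hs->rehash_bits = 0;
}
```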

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I1a567f6be77ca6c45e5d4f256722206b12588554
Reviewed-on: https://review.whamcloud.com/45533
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15193 quota: expand QUOTA_MAX_TRANSIDS to 12 56/45456/6
Lei Feng [Thu, 4 Nov 2021 11:41:06 +0000 (19:41 +0800)]
LU-15193 quota: expand QUOTA_MAX_TRANSIDS to 12

In some rare cases 12 quota IDs are needed.
Usually (user, group) * (block, inode) * (inode, parent) = 8 qids
are needed, but with project IDs,
(user, group, project) * (block, inode) * (inode, parent) = 12 qids
are needed.
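The arithmetic above, written out in C. Only QUOTA_MAX_TRANSIDS comes from the patch; the enum names and `max_transids()` are illustrative.

```c
/* Illustrative factors behind the number of quota IDs per transaction. */
enum {
	QID_TYPES_NO_PROJ = 2, /* user, group */
	QID_TYPES_PROJ    = 3, /* user, group, project */
	QID_RESOURCES     = 2, /* block, inode */
	QID_OBJECTS       = 2, /* inode, parent */
};

/* ID types x resource types x objects touched by the operation. */
static int max_transids(int id_types)
{
	return id_types * QID_RESOURCES * QID_OBJECTS;
}
```

With project quota enabled, `max_transids(QID_TYPES_PROJ)` gives 3 * 2 * 2 = 12, which is why the limit was raised from 8.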

Change-Id: I4b3ee197f6e274abda06edf60b246f089fe28d10
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-quota
Reviewed-on: https://review.whamcloud.com/45456
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15082 osp: invalidate statfs data from the timer callback 99/45199/11
Alex Zhuravlev [Tue, 12 Oct 2021 05:26:21 +0000 (08:26 +0300)]
LU-15082 osp: invalidate statfs data from the timer callback

osp_statfs_timer_cb() can be called just before statfs data becomes
stale. This in turn may cause an early wakeup of the precreate thread,
which would find the statfs data still up-to-date and go back to sleep.
If no precreate happens on this OSP (e.g. due to current space
usage), then the precreate thread will stay asleep for a long time,
statfs data won't get refreshed, and this may block the creation of new
objects on the corresponding OST.
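A minimal model of the race and of the fix described in the title: cached data carries a freshness deadline, and the timer callback invalidates it explicitly instead of relying on the clock having passed the deadline. The struct and field names are hypothetical and not the osp fields.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of cached statfs data with a freshness deadline. */
struct statfs_cache {
	time_t fresh_till; /* data considered valid while now < fresh_till */
};

static bool statfs_is_stale(const struct statfs_cache *c, time_t now)
{
	return now >= c->fresh_till;
}

/*
 * Timer callback. If the timer fires slightly before fresh_till, a thread
 * woken here would still see "fresh" data and go back to sleep.
 * Invalidating the deadline makes the wakeup effective.
 */
static void statfs_timer_cb(struct statfs_cache *c)
{
	c->fresh_till = 0;
}
```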

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I86e16eed6f1068702db696a9ddec7a22994180b7
Reviewed-on: https://review.whamcloud.com/45199
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14979 lnet: set max recovery interval duration 27/44927/8
Cyril Bordage [Wed, 15 Sep 2021 16:15:08 +0000 (18:15 +0200)]
LU-14979 lnet: set max recovery interval duration

Add a tunable parameter to limit the maximum recovery ping interval,
which was previously fixed at 900.
This can be done by using:

  lnetctl set max_recovery_ping_interval <value>

Modify sanity-lnet tests 210 and 211 to validate this new functionality.
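A minimal sketch of how such a cap could bound the recovery ping interval. It assumes, which this commit message does not state, that the interval doubles with each consecutive failed ping. The function and parameter names are hypothetical; 900 is the previous fixed limit mentioned above.

```c
#include <stdint.h>

/*
 * Next recovery ping delay: base * 2^failures, clamped to max_interval.
 * The clamp is checked before doubling, so the value cannot overflow.
 */
static uint32_t next_ping_interval(uint32_t base, unsigned int failures,
				   uint32_t max_interval)
{
	uint32_t interval = base;

	while (failures-- > 0) {
		if (interval >= max_interval / 2)
			return max_interval;
		interval *= 2;
	}
	return interval < max_interval ? interval : max_interval;
}
```

With a configurable `max_interval`, a peer that stays unreachable is pinged at most once per `max_interval` instead of once per hard-coded 900.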

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I766ceb6e03ffdab125005e16472b6f9eeadfb9d5
Reviewed-on: https://review.whamcloud.com/44927
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14264 tests: make PARALLEL available to all suites 63/44763/5
James Nunez [Thu, 26 Aug 2021 23:27:16 +0000 (17:27 -0600)]
LU-14264 tests: make PARALLEL available to all suites

The PARALLEL environment variable is checked and a default
value is set in the sanity test suite, but recovery-small
also uses PARALLEL and does not check or initialize it.

Move the check of the PARALLEL environment variable, and the
setting of its default value, into the init_test_env() routine
in test-framework.sh.

Fixes: 688d5da6a89 (“LU-12846 mdd: return error while delete failed”)
Fixes: 26e8f1137b8 (“LU-13116 mgc: do not lose sptlrpc config lock”)

Test-Parameters: trivial testlist=sanity,recovery-small
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: If773e2ea7300056a0ef00a2cb24e13e20a971bd6
Reviewed-on: https://review.whamcloud.com/44763
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15483 tests: Improve test 398b 21/44321/15
Patrick Farrell [Thu, 15 Jul 2021 19:15:12 +0000 (15:15 -0400)]
LU-15483 tests: Improve test 398b

Test 398b is currently only writing 3 MiB (12 MiB/4) of
data from each thread in the racing set (4 DIO, 4 BIO).
This is such a small amount that it's probably finishing
too quickly to generate the intended overlap & races most
of the time.

It also currently writes only at PAGE_SIZE, which excludes
a lot of possible races.

This is currently a pretty fast test - ~ 2 seconds on my
local machine, similar on Maloo - so let's have it write 4
times more data and use 4 different block sizes, so it will
hit more races.

This is 16x more I/O, but the test still only takes around
16 seconds on my local machine.

This is motivated by difficulties dealing with rare
failures while developing the LU-13799 patch series and
hoping to make them easier to catch.

Test-Parameters: trivial testlist=sanity env=ONLY=398b,ONLY_REPEAT=50
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1092023a526c085dfacd1bd112c2ebb714da63e5
Reviewed-on: https://review.whamcloud.com/44321
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14832 llite: Correct cl_env comments 91/44191/6
Patrick Farrell [Thu, 8 Jul 2021 19:34:33 +0000 (15:34 -0400)]
LU-14832 llite: Correct cl_env comments

The comments related to cl_env caching behavior are
dangerously out of date and misleading, describing an
old caching mechanism which was linked to threads.

This has not been present for some time, and we cannot use
cl_env_get to get the environment for a thread as they
describe.

Correct the various comments and remove a now extraneous
include.

Test-Parameters: trivial
Fixes: 4533271278 ("LU-4257 obdclass: Get rid of cl_env hash table")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0a855094c66281bfb35e234ecad1af3b923e8538
Reviewed-on: https://review.whamcloud.com/44191
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 months agoLU-14771 ptlrpc: Rearrange version mismatch message 29/44029/3
Andrew Elwell [Thu, 17 Jun 2021 23:38:40 +0000 (09:38 +1000)]
LU-14771 ptlrpc: Rearrange version mismatch message

Minor change to reposition the client version string in the
console warning message, and merge the split string onto one line.

Test-Parameters: trivial
Signed-off-by: Andrew Elwell <Andrew.Elwell@gmail.com>
Change-Id: I89242cff873028f4f76bafdd5a14f169a98f7875
Reviewed-on: https://review.whamcloud.com/44029
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-8137 utils: fix llverdev for use on regular files 30/43230/14
Andreas Dilger [Wed, 7 Apr 2021 20:59:45 +0000 (14:59 -0600)]
LU-8137 utils: fix llverdev for use on regular files

Allow llverdev to work with regular files to verify data contents.

Add a parameter to allow specifying the maximum file size to use.
Allow the use of unit suffix when specifying sizes as arguments.
If the file size is zero, write until the filesystem is full.
Use pread() and pwrite() to avoid having to deal with file offsets.

Add sanityn test_56a/56b to verify proper operation on a large file.

Update llverdev.8 man page to describe options properly.

Test-Parameters: trivial testlist=sanityn env=ONLY=56a,56b ostsizegb=12
Test-Parameters: fstype=zfs testlist=sanityn env=ONLY=56a,56b ostsizegb=12
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib06dc989fe692fd6aae388a79d9aa28f702540e5
Reviewed-on: https://review.whamcloud.com/43230
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Anjus George <georgea@ornl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15460 test: wait quota pool to be prepared 53/46853/4
Hongchao Zhang [Fri, 25 Mar 2022 02:34:56 +0000 (10:34 +0800)]
LU-15460 test: wait quota pool to be prepared

When an OST pool is created, the corresponding quota pool
may need more time to be prepared, so an immediate check of
the quota pool info can fail. Wait for the quota pool to be ready.

Test-Parameters: trivial
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ibea33403639087f27e438d71c0e87fea5367bc3e
Reviewed-on: https://review.whamcloud.com/46853
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14503 o2iblnd: clean up zombie connections on shutdown 68/42068/6
Serguei Smirnov [Thu, 18 Mar 2021 03:52:16 +0000 (20:52 -0700)]
LU-14503 o2iblnd: clean up zombie connections on shutdown

Clean up zombie connections on net shutdown in o2iblnd.
Wake up connd threads and wait for them to do the clean-up
before proceeding.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iff8424d9be7401987046fe9aef6e7a787f5efe83
Reviewed-on: https://review.whamcloud.com/42068
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14422 utils: check empty stripe-count/offset setting 93/41793/6
Bobi Jam [Tue, 23 Mar 2021 03:18:25 +0000 (11:18 +0800)]
LU-14422 utils: check empty stripe-count/offset setting

Prohibit empty ("") stripe count and stripe offset settings in lfs setstripe.
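The reason an explicit check is needed: `strtol("")` returns 0 without setting `errno`, so an empty argument would otherwise be silently accepted as 0. A minimal sketch of a strict parser; `parse_stripe_arg()` is hypothetical and is not the lfs implementation.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a numeric option argument, rejecting "" and trailing garbage.
 * Returns 0 and stores the value on success, -EINVAL otherwise.
 */
static int parse_stripe_arg(const char *arg, long *out)
{
	char *end;
	long val;

	if (arg == NULL || *arg == '\0')
		return -EINVAL; /* strtol would return 0 here with no error */

	errno = 0;
	val = strtol(arg, &end, 0);
	if (errno != 0 || *end != '\0')
		return -EINVAL;

	*out = val;
	return 0;
}
```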

Test-Parameters: trivial
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I4d0cc2354c4249a31222e0f90de39e16ec4694a5
Reviewed-on: https://review.whamcloud.com/41793
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13970 llite: add option to disable Lustre inode cache 73/39973/14
Lai Siyao [Fri, 18 Sep 2020 09:53:17 +0000 (17:53 +0800)]
LU-13970 llite: add option to disable Lustre inode cache

A tunable option is added to disable the Lustre inode cache:
"llite.*.inode_cache=0" (default: 1).

When it is turned off, ll_drop_inode() always returns 1, so
the last iput() releases the inode.
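A userspace model of the last-reference path, to show why returning 1 from the drop-inode hook means "release instead of cache". The names are hypothetical, and the real cached case also depends on VFS state (link count, hashing) that is omitted here.

```c
#include <stdbool.h>

/* Userspace model of the VFS last-reference path; not kernel code. */
struct model_inode {
	int refcount;
	bool cached; /* kept in the inode cache after the last put */
};

static bool inode_cache_enabled = true; /* models llite.*.inode_cache */

/* Models ->drop_inode(): nonzero means "evict now, don't cache". */
static int model_drop_inode(const struct model_inode *inode)
{
	(void)inode;
	return inode_cache_enabled ? 0 : 1;
}

/* Models iput(). Returns true if the inode is released on this call. */
static bool model_iput(struct model_inode *inode)
{
	if (--inode->refcount > 0)
		return false;
	if (model_drop_inode(inode))
		return true; /* caller evicts and frees */
	inode->cached = true;
	return false;
}
```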

Add sanity test_433.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0642bdc694dc365a05395c3fae98131e1e7723c6
Reviewed-on: https://review.whamcloud.com/39973
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
3 months agoLU-13906 build: consistent use of %{name} 62/39662/8
Shaun Tancheff [Fri, 28 Jan 2022 08:13:10 +0000 (15:13 +0700)]
LU-13906 build: consistent use of %{name}

Make the use of the %{name} macro consistent across lustre packages.

Fixes: cfaf0eb92005 ("LU-12214 build: fixes if the name is not just 'lustre'")
Test-Parameters: trivial
HPE-bug-id: LUS-5914, LUS-5915
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I71c0a0e77a0fc7319a166311e33d3ca9de60e499
Reviewed-on: https://review.whamcloud.com/39662
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-10973 lutf: Fix library path 02/46502/4
Amir Shehata [Thu, 10 Feb 2022 23:54:10 +0000 (15:54 -0800)]
LU-10973 lutf: Fix library path

Correct the library path pointing to the lnetconfig library.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I60089ba0e86d2c3406e283c44530793d84065674
Reviewed-on: https://review.whamcloud.com/46502
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13547 tests: remove ea_inode from mkfs MDT options 82/38582/9
James Nunez [Tue, 12 May 2020 18:38:24 +0000 (12:38 -0600)]
LU-13547 tests: remove ea_inode from mkfs MDT options

The large_dir and ea_inode features are enabled by default when MDTs
are formatted.  Thus, remove 'ea_inode' and 'large_dir'
from the mkfs options in the test-framework.

Add a test to make sure that these MDT features are enabled
by default.

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I1b3f1bee7e0c091f8fffc9da0e769eed921dac8f
Reviewed-on: https://review.whamcloud.com/38582
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12186 ec: add necessary structure member for EC file 19/38319/5
Bobi Jam [Mon, 2 May 2022 14:47:47 +0000 (10:47 -0400)]
LU-12186 ec: add necessary structure member for EC file

Add basic structure members for the erasure-coding layout.

Test-Parameters: trivial
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I697dc144877d6c5fbe9335dc721200e43749f5d9
Reviewed-on: https://review.whamcloud.com/38319
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14195 libcfs: test for nla_strscpy 76/46876/5
James Simmons [Wed, 25 May 2022 14:20:23 +0000 (10:20 -0400)]
LU-14195 libcfs: test for nla_strscpy

During the development of the Linux 5.10 kernel the function
nla_strlcpy() was replaced by nla_strscpy(). Handle this
change for Lustre.

Test-parameters: trivial
Change-Id: I47f12add619cfd88a3692f0760b8bcc35b7877d9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15788 lmv: try another MDT if statfs failed 52/47152/6
Alexander Boyko [Wed, 27 Apr 2022 08:36:49 +0000 (04:36 -0400)]
LU-15788 lmv: try another MDT if statfs failed

With the lazystatfs option, statfs could fail if MDT0 is offline.
This leads to MPICH->IOR failures during FOFB tests. In a DNE setup,
a client can get statfs data from a different MDT instead.

HPE-bug-id: LUS-10581
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Icec83aba0c3ddbc749b782787b1b52faadf34a3e
Reviewed-on: https://review.whamcloud.com/47152
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12756 lnet: Avoid redundant peer NI lookups 23/36623/5
Chris Horn [Sun, 27 Oct 2019 17:44:23 +0000 (12:44 -0500)]
LU-12756 lnet: Avoid redundant peer NI lookups

Each caller of lnet_peer_ni_traffic_add() performs a subsequent call
to lnet_peer_ni_find_locked(). We can avoid the extra lookup by having
lnet_peer_ni_traffic_add() return a peer NI pointer (or ERR_PTR as
appropriate).

lnet_peer_ni_traffic_add() now takes a ref on the peer NI to mimic
the behavior of lnet_peer_ni_find_locked().
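The "pointer or ERR_PTR" convention encodes a small negative errno in the top page of the address space, so one return value carries either a referenced object or an error. Below are userspace copies of the kernel helpers from include/linux/err.h plus a hypothetical `peer_ni_add()` that returns a referenced object, as the traffic_add change does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace copies of the kernel's ERR_PTR convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct peer_ni {
	int refcount;
};

/* Hypothetical add-or-lookup: returns a referenced object or an error. */
static struct peer_ni *peer_ni_add(struct peer_ni *slot, bool fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	slot->refcount++; /* caller now owns a reference */
	return slot;
}
```

Returning the referenced pointer directly is what removes the caller's second lookup: the caller checks `IS_ERR()` once and then uses the object.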

lnet_nid2peerni_ex() only has a single caller that always passes
LNET_LOCK_EX for the cpt argument, so this function argument is
removed.

Some duplicate code dealing with ln_state handling is removed from
lnet_peerni_by_nid_locked().

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8e9e2449ef2b958b53abd59cd2c122e5492fbb34
Reviewed-on: https://review.whamcloud.com/36623
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-6864 osp: manage number of modify RPCs in flight 75/14375/23
Gregoire Pichon [Thu, 5 May 2022 21:42:56 +0000 (17:42 -0400)]
LU-6864 osp: manage number of modify RPCs in flight

Currently we use an rpc_lock to ensure concurrent in-flight
requests are handled serially, preventing the execution status
from being overwritten. This patch changes the osp component
to send multiple modify RPCs in parallel to the MDT. This will
improve metadata performance of cross-MDT operations.
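One common way to allow several modify RPCs in flight while keeping each one's state separate is a bounded slot table, where each request takes a distinct slot (tag). The sketch below assumes that approach for illustration; the struct, functions, and 64-slot limit are hypothetical and not the osp/obd code.

```c
#include <stdint.h>

/*
 * Hypothetical slot table bounding how many modify RPCs may be in flight.
 * Each RPC holds a distinct slot, so per-request state can be tracked by
 * slot instead of serializing all requests behind a single lock.
 */
struct mod_rpc_slots {
	uint64_t busy;    /* bit i set => slot i in use */
	unsigned int max; /* configured max in flight, 1..64 */
};

/* Returns a free slot index (tag) >= 0, or -1 if all slots are busy. */
static int mod_rpc_get_slot(struct mod_rpc_slots *s)
{
	unsigned int i;

	for (i = 0; i < s->max; i++) {
		if (!(s->busy & (UINT64_C(1) << i))) {
			s->busy |= UINT64_C(1) << i;
			return (int)i;
		}
	}
	return -1; /* a real caller would wait for a slot to be released */
}

static void mod_rpc_put_slot(struct mod_rpc_slots *s, int tag)
{
	s->busy &= ~(UINT64_C(1) << tag);
}
```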

For testing, replace mkdirmany with createmany -d, which does the
same thing.

Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Icb601afabd6767463634a4c7943ec4206bc758ec
Reviewed-on: https://review.whamcloud.com/14375
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15841 lod: iterate component to collect avoid array 93/47293/3
Bobi Jam [Wed, 11 May 2022 10:36:15 +0000 (18:36 +0800)]
LU-15841 lod: iterate component to collect avoid array

In a newly created file, the mirror information hasn't been established
yet while LOD is trying to allocate OSTs for its components, so we need
to iterate over components instead of mirrors to collect the avoid
guidance information.

Test-Parameters: testlist=sanity-flr env=ONLY=47,ONLY_REPEAT=40
Fixes: fabf3fe7 ("LU-9007 lod: improve obj alloc for FLR file")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I6bbe5f6b6dfea06c5213b77b7ebb6a5d28aa0d17
Reviewed-on: https://review.whamcloud.com/47293
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15786 tests: get maxage param on mds1 properly 45/47145/2
Elena Gryaznova [Tue, 26 Apr 2022 14:25:41 +0000 (17:25 +0300)]
LU-15786 tests: get maxage param on mds1 properly

The correct maxage parameters on mds1:
  osp.${FSNAME}-MDT000[1-N]-osp-MDT0000.maxage

To reproduce the failure just run the following on
failover setup where mds1_HOST != mds1failover_HOST:
  sh llmount.sh
  ONLY="100b 100c" sh replay-single.sh

  error: get_param: param_path
   'osp/*MDT0000*MDT0001/maxage': No such file or directory
  sleep: missing operand
  Try 'sleep --help' for more information.

Fixes: 436cd4fd21 ("LU-14938 tests: fail_abort() in t-f to take care of MDTs")
Test-Parameters: trivial testlist=replay-single,recovery-small
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10804
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Icbedb044c4a008868bd3a99d44aa1c350e7c9eaa
Reviewed-on: https://review.whamcloud.com/47145
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15754 lfsck: skip an inode if iget() returns -ENOMEM 79/47079/2
Artem Blagodarenko [Wed, 13 Apr 2022 20:01:58 +0000 (16:01 -0400)]
LU-15754 lfsck: skip an inode if iget() returns -ENOMEM

After the change
commit c2b6d621c4ffe9936adf7a55c8b1c769672c306f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Thu Jun 28 15:53:17 2018 -0400 new primitive: discard_new_inode()

find_inode_fast() returns -ESTALE, but iget_locked() converts
it to NULL, and finally ldiskfs_inode_attach_jinode()
returns -ENOMEM.

So this check in osd_iit_iget() no longer works:

if (rc == -ENOENT || rc == -ESTALE)
    RETURN(SCRUB_NEXT_CONTINUE);

As a solution, skip the inode if -ENOMEM is returned.
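A sketch of the widened check as a small classification helper. The enum values are hypothetical stand-ins for the scrub iterator's decisions, not Lustre's SCRUB_* constants; only the set of error codes treated as "skip" comes from the text above.

```c
#include <errno.h>

/* Hypothetical iterator decisions; these are not Lustre's values. */
enum iget_action {
	IGET_USE,  /* inode loaded, process it */
	IGET_SKIP, /* inode gone or stale, move on */
	IGET_FAIL, /* real error, abort */
};

/*
 * Classify the result of an iget-style lookup. Since the kernel change
 * described above, a stale inode can surface as -ENOMEM rather than
 * -ESTALE, so -ENOMEM is treated as "skip" along with -ENOENT/-ESTALE.
 */
static enum iget_action classify_iget_rc(int rc)
{
	if (rc == 0)
		return IGET_USE;
	if (rc == -ENOENT || rc == -ESTALE || rc == -ENOMEM)
		return IGET_SKIP;
	return IGET_FAIL;
}
```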

Hpe-bug-id: LUS-10833
Change-Id: Icb30610e46e2ab899a512761b63aea248c4f2ada
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://review.whamcloud.com/47079
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15724 tests: MDT failover hang reproducer 06/47006/8
Alexander Boyko [Wed, 6 Apr 2022 09:39:27 +0000 (05:39 -0400)]
LU-15724 tests: MDT failover hang reproducer

This patch adds recovery-small test 144a to reproduce an MDT
failover hang when precreate threads are blocked waiting for objects.

LustreError: 0-0: Forced cleanup waiting for mdt-kjcf05-MDT0001_UUID
namespace with 46 resources in use, (rc=-110)

Test-Parameters: trivial testlist=recovery-small env=ONLY=144a
HPE-bug-id: LUS-10750
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2743a1b5c8911d6982b527f7e7b7bbbaf310cd04
Reviewed-on: https://review.whamcloud.com/47006
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15724 osp: wakeup all precreate threads 05/47005/5
Alexander Boyko [Wed, 6 Apr 2022 09:06:46 +0000 (05:06 -0400)]
LU-15724 osp: wakeup all precreate threads

A number of threads could sleep in osp_precreate_reserve() waiting
for objects from the OST. When the MDT stops, Lustre should wake up
all of these threads. When opd_pre_recovering is set, any wakeup of
opd_pre_user_waitq is useless. Failover of an MDT does not produce a
disconnect event, only an inactive one, so
osp_precreate_cleanup_orphans() cannot be woken up.

LustreError: 0-0: Forced cleanup waiting for mdt-kjcf05-MDT0001_UUID
namespace with 46 resources in use, (rc=-110)

 schedule_timeout at ffffffff8e551cd3
 osp_precreate_reserve at ffffffffc17d2d83 [osp]
 osp_declare_create at ffffffffc17c7eb9 [osp]
 lod_sub_declare_create at ffffffffc156415b [lod]
 lod_qos_declare_object_on at ffffffffc155bf42 [lod]
 lod_ost_alloc_rr.constprop.23 at ffffffffc155db2f [lod]
 lod_qos_prep_create at ffffffffc15630a6 [lod]
 lod_declare_instantiate_components at ffffffffc154b237 [lod]

HPE-bug-id: LUS-10750
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: If0164cfbecb1e358d9857421cb234559dc8cecbc
Reviewed-on: https://review.whamcloud.com/47005
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15132 mdc: Use early cancels for hsm requests 81/47181/5
Etienne AUJAMES [Mon, 2 May 2022 12:27:17 +0000 (14:27 +0200)]
LU-15132 mdc: Use early cancels for hsm requests

HSM RELEASE and RESTORE requests take an EX layout lock on the MDT side,
so the client can use early cancel for its local lock on the resource
to limit contention on the MDT side.

This patch does not pack an ldlm request inside the HSM request because
the field (RMF_DLM_REQ) does not exist in that request. Adding this
field to the request would break compatibility with _old_ servers.

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I30a57b4855c28eef9c55a9645d3b6c491f962b13
Reviewed-on: https://review.whamcloud.com/47181
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13363 lod: do object allocation in OST pool 36/38136/17
Emoly Liu [Thu, 14 Apr 2022 16:04:24 +0000 (12:04 -0400)]
LU-13363 lod: do object allocation in OST pool

Currently, the ltd->ltd_qos.lq_same_space boolean that decides
whether the LOD QOS allocator is active for an allocation or not
is tracked for the entire LOV. But when a pool is specified, this
judgement should be tracked on a per-pool basis.
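Why per-pool tracking matters: the same OSTs can look imbalanced as a whole LOV yet balanced within a pool, so one LOV-wide flag gives the wrong answer for pool allocations. The "within threshold_pct of the largest free space" rule below is an assumption for illustration, not LOD's exact QOS computation, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical "same space" test over a subset of OSTs (given by idx):
 * balanced when (max_free - min_free) is within threshold_pct percent of
 * max_free. The *100 assumes free values well below UINT64_MAX / 100.
 */
static bool osts_same_space(const uint64_t *free_bytes, const size_t *idx,
			    size_t n, unsigned int threshold_pct)
{
	uint64_t min, max;
	size_t i;

	if (n == 0)
		return true;
	min = max = free_bytes[idx[0]];
	for (i = 1; i < n; i++) {
		uint64_t f = free_bytes[idx[i]];

		if (f < min)
			min = f;
		if (f > max)
			max = f;
	}
	return (max - min) * 100 <= (uint64_t)threshold_pct * max;
}
```

For example, with free space {100, 100, 10, 10}, the full set is imbalanced while a pool made of the first two OSTs is balanced.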

sanity.sh test_116c is added to verify this patch.

Test-Parameters: ostcount=6 testlist=sanity env=ONLY=116c
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I463d5927c7a9c9171483615d2cec629ec10dc666
Reviewed-on: https://review.whamcloud.com/38136
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15800 ofd: take a read lock for fallocate 68/47268/5
Alex Zhuravlev [Tue, 10 May 2022 07:48:55 +0000 (10:48 +0300)]
LU-15800 ofd: take a read lock for fallocate

There is no need to take a write (exclusive) lock on the object
for fallocate: we just need to serialize fallocate vs destroy, and
all internal structures should be protected by the OSD and disk
filesystem, as in the write path.

Fixes: cdaaa87f6b ("LU-14214 ofd: fix locking in ofd_object_fallocate()")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I65986745865ee329c5257a7efca5e79403830608
Reviewed-on: https://review.whamcloud.com/47268
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15718 lnet: improve lnet_selftest speed 02/47002/4
Alexey Lyashkov [Wed, 6 Apr 2022 05:13:46 +0000 (08:13 +0300)]
LU-15718 lnet: improve lnet_selftest speed

Replace a global spinlock with atomic variables, so that
contention on the lock does not limit CPU throughput during testing.
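The general pattern, shown with C11 atomics in userspace: each counter is updated with its own lock-free add instead of taking one global lock around all updates. The kernel equivalent would use atomic_t/atomic64_t; the struct and function names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical statistics counters, each updated independently. */
struct sel_counters {
	atomic_uint_fast64_t rpcs_sent;
	atomic_uint_fast64_t bytes_sent;
};

/*
 * Relaxed ordering is enough for independent statistics. Note that the two
 * fields are no longer updated together, so a reader may briefly see a
 * mismatched pair, which is normally acceptable for counters.
 */
static void sel_account_send(struct sel_counters *c, uint64_t bytes)
{
	atomic_fetch_add_explicit(&c->rpcs_sent, 1, memory_order_relaxed);
	atomic_fetch_add_explicit(&c->bytes_sent, bytes, memory_order_relaxed);
}
```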

Test-Parameters: trivial testlist=lnet-selftest
HPE-bug-id: LUS-10812
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ib12a3b3fe2fe300354e5a7c502bc6f5165e7d05c
Reviewed-on: https://review.whamcloud.com/47002
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-15832 lod: clear .do_index_ops in striping free 56/47256/2
Lai Siyao [Thu, 28 Apr 2022 14:48:01 +0000 (10:48 -0400)]
LU-15832 lod: clear .do_index_ops in striping free

An LDLM lock can guarantee that LOD object directory striping is safe
to access, but lod_striping_free_nolock() should also clear
.do_index_ops. Otherwise, upon some failure the directory striping is
freed, while a subsequent dt_try_as_dir() skips striping initialization
and calls .do_index_ops directly, which causes a crash.
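A simplified model of the invariant: "ops set" must imply "striping valid", because the try-as-dir step uses the ops pointer as its "already initialized" test. All types and functions here are hypothetical stand-ins, not the dt/lod API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified object with directory striping and index ops. */
struct index_ops {
	int (*lookup)(void *stripes, int key);
};

struct dir_obj {
	void *stripes;                     /* striping data, may be freed */
	const struct index_ops *index_ops; /* NULL => not initialized as dir */
};

static int striped_lookup(void *stripes, int key)
{
	return stripes != NULL ? key : -1;
}

static const struct index_ops striped_ops = { .lookup = striped_lookup };

/* Like a try-as-dir step: only (re)initialize when ops are not set yet. */
static int try_as_dir(struct dir_obj *o)
{
	if (o->index_ops != NULL)
		return 1; /* trusts that the striping behind the ops is valid */
	o->stripes = malloc(16);
	if (o->stripes == NULL)
		return 0;
	o->index_ops = &striped_ops;
	return 1;
}

/* Freeing striping must also clear index_ops, or stale ops get reused. */
static void striping_free(struct dir_obj *o)
{
	free(o->stripes);
	o->stripes = NULL;
	o->index_ops = NULL;
}
```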

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib94a4ef2f8bf5f0d34521abff77d8be46ecbf428
Reviewed-on: https://review.whamcloud.com/47256
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months ago LU-15776 tgt: fix transaction handling in tgt_brw_write() 71/47371/3
Mikhail Pershin [Tue, 17 May 2022 09:57:28 +0000 (12:57 +0300)]
LU-15776 tgt: fix transaction handling in tgt_brw_write()

Hotfix to prevent possible data loss during WRITE replay.
Since commit f0f92773ee18 from LU-14187, obd_commitrw()
may restart the write transaction in OFD and MDT. If such a
restart happens, a transaction number is assigned multiple
times. Without the tti_mult_trans flag only the first
transaction number is stored, so a later one could remain
unapplied, causing data loss after recovery.

This patch sets tti_mult_trans for tgt_brw_write() so the latest
transaction number is used as the request transno.
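
A minimal sketch of the transno bookkeeping described above; the struct
and helper are hypothetical and only mirror the role of tti_mult_trans:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-request transaction info. */
struct sketch_trans_info {
	uint64_t transno;   /* transno reported back for the request */
	bool mult_trans;    /* request may commit more than one transaction */
};

/* Record the transno of a transaction that just committed for the request. */
static void sketch_record_transno(struct sketch_trans_info *ti, uint64_t transno)
{
	assert(transno != 0);
	/* Without mult_trans only the first transno is kept, so a restarted
	 * transaction's later transno would be lost for replay purposes. */
	if (ti->transno == 0 || ti->mult_trans)
		ti->transno = transno;
}
```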

Fixes: f0f92773ee ("LU-14187 osd-ldiskfs: fix locking in write commit")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I364b478591942be5562c3e98ee6e6aa487f3e0c5
Reviewed-on: https://review.whamcloud.com/47371
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15887 test: add always_except() 52/47452/3
John L. Hammond [Wed, 25 May 2022 14:07:37 +0000 (09:07 -0500)]
LU-15887 test: add always_except()

In test-framework.sh, add a new function (always_except()) to replace
manual manipulation of $ALWAYS_EXCEPT. Convert sanity.sh to use
always_except() and add a line to contrib/scripts/spelling.txt to
suggest its use.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I1b39fe9555bab59e70db00cef73d13102668500a
Reviewed-on: https://review.whamcloud.com/47452
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15722 osd-ldiskfs: fix IO write gets stuck for 64K PAGE_SIZE 04/47004/2
Xinliang Liu [Wed, 6 Apr 2022 08:06:33 +0000 (08:06 +0000)]
LU-15722 osd-ldiskfs: fix IO write gets stuck for 64K PAGE_SIZE

This fixes the following I/O write hang:
-----
[606895.151765] LustreError: 334886:0:(ofd_io.c:1389:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 10000
[606895.207345] LustreError: 334886:0:(ofd_io.c:1389:ofd_commitrw_write()) Skipped 8 previous similar messages
-------

The write path then goes into an infinite loop:
ofd_commitrw_write()->osd_write_commit()->osd_ldiskfs_map_inode_pages()
    ->ldiskfs_map_blocks()->ofd_commitrw_write()

The cause is as follows:
for block allocation/mapping with a 64K PAGE_SIZE, m_lblk should be the
first unallocated block. If m_lblk points at an already allocated
block when create = 1, ldiskfs_map_blocks() just returns the
already allocated blocks without allocating any of the newly requested
blocks for the extent.

This hang does not happen with a 4K PAGE_SIZE: in the PAGE_SIZE ==
blocksize case, if m_lblk points at an already allocated block, it will
point at an unallocated block in the next restarted transaction, because
the already mapped block/page is filtered out in that transaction via
the OBD_BRW_DONE flag in osd_declare_write_commit().
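
With a 64K PAGE_SIZE and a 4K blocksize one page spans 16 blocks, so a
restarted transaction can present a page whose first blocks are already
allocated. A sketch of advancing m_lblk past those blocks before mapping
with create = 1, assuming a hypothetical per-block allocation map (this is
not the ldiskfs or osd-ldiskfs API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of filesystem blocks backing one page, e.g. 65536 / 4096 = 16. */
static size_t sketch_blocks_per_page(size_t page_size, size_t blocksize)
{
	assert(blocksize != 0 && page_size % blocksize == 0);
	return page_size / blocksize;
}

/* Return the first logical block at or after 'start' that is not yet
 * allocated, or 'nblocks' if every remaining block is allocated.
 * 'allocated' is a hypothetical per-block map for the extent. */
static size_t sketch_first_unallocated(const bool *allocated, size_t nblocks,
				       size_t start)
{
	size_t lblk = start;

	while (lblk < nblocks && allocated[lblk])
		lblk++;
	return lblk;
}
```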

Change-Id: Iadba0be8875a15a2e2f158ec9571f5ece5637ae0
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/47004
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15658 lod: ost list and pool name conflict 67/46967/3
Vitaly Fertman [Wed, 30 Mar 2022 19:11:42 +0000 (22:11 +0300)]
LU-15658 lod: ost list and pool name conflict

If the OST list is given to setstripe with the -o option, the pool is
unconditionally dropped even if all the OSTs are in the pool. Keep the
pool in this case.

Also, if the start index given to setstripe with the -i option is not
in the pool, treat it like the -o option and drop the pool.
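
A sketch of the pool decision described above, using plain arrays of OST
indexes; the helper names are hypothetical and not the LOD implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Is 'ost' a member of the pool? */
static bool sketch_pool_has(const unsigned *pool, size_t npool, unsigned ost)
{
	assert(pool != NULL || npool == 0);
	for (size_t i = 0; i < npool; i++)
		if (pool[i] == ost)
			return true;
	return false;
}

/* "-o" case: keep the pool only if every listed OST belongs to it. */
static bool sketch_keep_pool_for_list(const unsigned *pool, size_t npool,
				      const unsigned *osts, size_t nosts)
{
	for (size_t i = 0; i < nosts; i++)
		if (!sketch_pool_has(pool, npool, osts[i]))
			return false;
	return true;
}

/* "-i" case: drop the pool if the start index is outside it. */
static bool sketch_keep_pool_for_start(const unsigned *pool, size_t npool,
				       unsigned start_idx)
{
	return sketch_pool_has(pool, npool, start_idx);
}
```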

HPE-bug-id: LUS-10868
Fixes: b384ea39e5 ("LU-14480 pool: wrong usage with ost list")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I718c237e273689048eb74044eea73de6c212395e
Reviewed-on: https://review.whamcloud.com/46967
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15683 ofd: proper initialize filter_fid in ofd fallocate 19/46919/3
Bobi Jam [Thu, 24 Mar 2022 08:19:31 +0000 (16:19 +0800)]
LU-15683 ofd: proper initialize filter_fid in ofd fallocate

Initialize the filter_fid buffer and set the XATTR_NAME_FID xattr
properly in ofd_object_fallocate().

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ied573c39dde77f935622e9fbedb2d71eb3bd8f5d
Reviewed-on: https://review.whamcloud.com/46919
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15519 quota: fallocate does not increase projectid usage 76/46676/7
Arshad Hussain [Mon, 14 Feb 2022 08:36:47 +0000 (14:06 +0530)]
LU-15519 quota: fallocate does not increase projectid usage

fallocate() did not account for project ID quota usage, for two
reasons: 1) the project ID was not properly passed to md_op_data in
ll_set_project(), and 2) the OBD_MD_FLPROJID flag was not set to
receive the project ID.

This patch fixes both issues.

Test-case: sanity-quota/78a added

Fixes: 48457868a02a ("LU-3606 fallocate: Implement fallocate preallocate operation")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I3ed44e7ef7ca8fe49a08133449c33b62b1eff500
Reviewed-on: https://review.whamcloud.com/46676
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15487 mdd: print FID in mdd_dir_page_build() error 68/46368/4
Andreas Dilger [Wed, 26 Jan 2022 23:33:27 +0000 (16:33 -0700)]
LU-15487 mdd: print FID in mdd_dir_page_build() error

Print the MDT name and FID in mdd_dir_page_build() when an error
is hit.  Because this changes the callback function signature,
also update dt_index_page_build() to print a more useful message.

Add OBD_FAIL_MDS_DIR_PAGE_WALK to allow triggering this codepath
to see if this is the source of problems in error handling.

Minor code style and whitespace cleanups in related functions.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic475f4a2c775871ff5af59a47e0966ba3eed7013
Reviewed-on: https://review.whamcloud.com/46368
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15284 llite: access lli_lsm_md with lock in all places 55/46355/4
Lai Siyao [Fri, 28 Jan 2022 05:08:41 +0000 (00:08 -0500)]
LU-15284 llite: access lli_lsm_md with lock in all places

lli_lsm_md should be accessed under the lock in all places. Of all
the changed callers, ll_release_page() is already called under the
lock, except in the statahead code.

Test-Parameters: mdscount=2 mdtcount=4 testlist=racer,racer,racer
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I1e09402812ce51ce7ab80d9529d488cb5b2879ee
Reviewed-on: https://review.whamcloud.com/46355
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15278 lod: distinguish DIR/REGULAR lod_object members 10/45710/11
Bobi Jam [Thu, 2 Dec 2021 10:34:42 +0000 (18:34 +0800)]
LU-15278 lod: distinguish DIR/REGULAR lod_object members

In lod_striping_free_nolock(), we need to distinguish the lod_object
type: since DIR and REGULAR lod_object structures share the same memory
region, it could accidentally free unintended memory if it treated a
DIR lod_object as a REGULAR one, or vice versa.
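
A sketch of the type-tagged free, assuming the two layouts live in a union
inside the object; the struct, enum, and function are hypothetical, not
struct lod_object:

```c
#include <assert.h>
#include <stdlib.h>

enum sketch_obj_type { SKETCH_REG, SKETCH_DIR };

/* Regular-file and directory members overlap in the same memory. */
struct sketch_obj {
	enum sketch_obj_type type;
	union {
		struct {            /* regular file: component array */
			int *comps;
			int comp_count;
		} reg;
		struct {            /* directory: stripe array */
			void **stripes;
			int stripe_count;
		} dir;
	} u;
};

static void sketch_obj_striping_free(struct sketch_obj *o)
{
	/* Check the type before touching the union: the two layouts need not
	 * line up, so the wrong member's fields may hold unrelated data. */
	switch (o->type) {
	case SKETCH_DIR:
		free(o->u.dir.stripes);
		o->u.dir.stripes = NULL;
		o->u.dir.stripe_count = 0;
		break;
	case SKETCH_REG:
		free(o->u.reg.comps);
		o->u.reg.comps = NULL;
		o->u.reg.comp_count = 0;
		break;
	default:
		assert(0 && "unknown object type");
	}
}
```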

Fixes: 6a20bdcc608b ("LU-11376 lov: new foreign LOV format")
Fixes: fdad38781ccc ("LU-11376 lmv: new foreign LMV format")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I2d4c563725b35f7a75f0f1fbf9c1d35b1799eff4
Reviewed-on: https://review.whamcloud.com/45710
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 socklnd: switch ksocknal_del_peer to lnet_processid 23/44623/4
Mr NeilBrown [Wed, 9 Jun 2021 00:38:48 +0000 (10:38 +1000)]
LU-10391 socklnd: switch ksocknal_del_peer to lnet_processid

ksocknal_del_peer() now takes a pointer to a struct lnet_processid,
which has room for a large address.
A NULL pointer means "any NID and any PID".
The "ip" argument was completely unused, so it has been removed.
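
A sketch of the NULL-means-any matching rule, using a hypothetical
simplified process ID with a 16-byte address field (this is not the real
struct lnet_processid layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical process ID with room for an IPv6-sized address. */
struct sketch_processid {
	uint8_t addr[16];
	uint32_t pid;
};

/* A NULL filter matches any NID and any PID. */
static bool sketch_peer_matches(const struct sketch_processid *filter,
				const struct sketch_processid *peer)
{
	assert(peer != NULL);
	if (filter == NULL)
		return true;
	return memcmp(filter->addr, peer->addr, sizeof(peer->addr)) == 0 &&
	       filter->pid == peer->pid;
}
```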

This was the last use of 'struct lnet_process_id' in ksocklnd.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2f3879fdafc9b1effd540af3644febf0d06eb5e2
Reviewed-on: https://review.whamcloud.com/44623
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 socklnd: large processid for ksocknal_get_peer_info 22/44622/4
Mr NeilBrown [Wed, 9 Jun 2021 00:27:38 +0000 (10:27 +1000)]
LU-10391 socklnd: large processid for ksocknal_get_peer_info

Have ksocknal_launch_packet() report a 'struct lnet_processid'
with a large address.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I6dd4b11e0361d893dea519287448028ca0a1ab97
Reviewed-on: https://review.whamcloud.com/44622
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 socklnd: pass large processid to ksocknal_add_peer 21/44621/4
Mr NeilBrown [Tue, 8 Jun 2021 00:52:25 +0000 (10:52 +1000)]
LU-10391 socklnd: pass large processid to ksocknal_add_peer

Teach ksocknal_add_peer() to handle a large-address processid, so
ksocknal_launch_packet() can now support IPv6 addresses as well as IPv4.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I90655b8d6e1a2e9fc38a7bf9d492542f76086c70
Reviewed-on: https://review.whamcloud.com/44621
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 lnet: switch LNetIsPeerLocal() to take 16-byte addr 21/43621/11
Mr NeilBrown [Fri, 10 Jul 2020 05:58:29 +0000 (15:58 +1000)]
LU-10391 lnet: switch LNetIsPeerLocal() to take 16-byte addr

LNetIsPeerLocal() now takes a 'struct lnet_nid'.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id72a92cda510de6864c82d88912c582745cc9727
Reviewed-on: https://review.whamcloud.com/43621
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 lnet: change LNetGet to take 16byte nid and pid. 20/43620/12
Mr NeilBrown [Tue, 30 Nov 2021 15:46:00 +0000 (10:46 -0500)]
LU-10391 lnet: change LNetGet to take 16byte nid and pid.

"self" is now passed to LNetGet as a pointer to a 16-byte-addr nid, or
NULL for "ANY".  "target" is passed as a 16-byte-addr process_id.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I8e0fcd442d5195991b799a8db3ec8030c81f9400
Reviewed-on: https://review.whamcloud.com/43620
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 lnet: convert LNetPut to take 16byte nid and pid. 19/43619/12
Mr NeilBrown [Fri, 10 Jul 2020 05:19:38 +0000 (15:19 +1000)]
LU-10391 lnet: convert LNetPut to take 16byte nid and pid.

LNetPut() now takes a 16-byte nid for self and a similar process_id for
target.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I240caf6fb8b02b1814b9d4883aceda33894786a4
Reviewed-on: https://review.whamcloud.com/43619
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 lnet: Change LNetDist to work with struct lnet_nid 18/43618/12
Mr NeilBrown [Fri, 10 Jul 2020 05:04:03 +0000 (15:04 +1000)]
LU-10391 lnet: Change LNetDist to work with struct lnet_nid

LNetDist now takes and returns 'struct lnet_nid'.
lustre_uuid_to_peer() is also updated.

The 'dst' and 'src' parameters to LNetDist are now both pointers, and
they can point to the same 'struct lnet_nid'.  Code needs to be
careful not to set *src until after the last use of *dst.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I267f3333e3164778a7df39d6b90f0a9a913fcdcf
Reviewed-on: https://review.whamcloud.com/43618
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 lnet: alter lnet_drop_rule_match() to take lnet_nid 17/43617/11
Mr NeilBrown [Fri, 10 Jul 2020 02:39:17 +0000 (12:39 +1000)]
LU-10391 lnet: alter lnet_drop_rule_match() to take lnet_nid

The local nid passed to lnet_drop_rule_match() is now a 16-byte nid.
Various support functions are also changed to embrace 'struct
lnet_nid'.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I738bff9cfc8c5a70c736639fdd64a66d2aded186
Reviewed-on: https://review.whamcloud.com/43617
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 lnet: change LNetPrimaryNID to use struct lnet_nid 16/43616/12
Mr NeilBrown [Thu, 27 Jan 2022 00:38:21 +0000 (19:38 -0500)]
LU-10391 lnet: change LNetPrimaryNID to use struct lnet_nid

Rather than taking and returning a 4-byte-addr nid, LNetPrimaryNID now
takes a pointer to a struct lnet_nid, and updates it in-place.

Test-Parameters: trivial
Test-Parameters: env=SHARED_KEY=true testlist=sanity,sanity-sec
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Change-Id: I74f193e5c533125c282f230d272a506129baa365
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43616
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14627 utils: quiet spurious lustre_rmmod message 16/47116/3
Andreas Dilger [Fri, 22 Apr 2022 03:05:13 +0000 (21:05 -0600)]
LU-14627 utils: quiet spurious lustre_rmmod message

Quiet unnecessary "LNET ready to unload" message from lustre_rmmod.

Lustre-change: https://review.whamcloud.com/47116
Lustre-commit: TBD (from ee7aba2beabbf983ccffe8e4881e792943a15b09)

Fixes: 32304d863ae98 ("LU-14627 tests: Create unload_modules_local")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I02efa36bbc1a3f4ab63767176ef53956dcafa589
Reviewed-on: https://review.whamcloud.com/47116
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15000 llog: read canceled records in llog_backup 52/46552/3
Etienne AUJAMES [Fri, 18 Feb 2022 12:26:00 +0000 (13:26 +0100)]
LU-15000 llog: read canceled records in llog_backup

llog_backup() does not reproduce index "holes" in the generated copy.
This can result in an llog copy whose indexes differ from the source,
which might confuse the configuration update mechanism that relies on
matching indexes between the MGS source and the target copy.

These index gaps can be caused by "lctl --device MGS llog_cancel".

This patch adds a "raw" read mode to llog_process* to read canceled
records, so llog_backup is now able to reproduce an exact copy of
the original.
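
A sketch of why skipping canceled records shifts indexes, assuming a
writer that assigns consecutive indexes starting at 1; the record type and
function are hypothetical, not the llog API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical llog record: an index plus a canceled marker. */
struct sketch_rec {
	int index;
	bool canceled;
};

/* Copy records from src to dst. In "raw" mode canceled records are copied
 * too (still marked canceled), so every index matches the source; otherwise
 * the copy is compacted and later indexes shift down. Returns the number of
 * records written. */
static size_t sketch_backup(const struct sketch_rec *src, size_t n,
			    struct sketch_rec *dst, bool raw)
{
	size_t out = 0;

	assert(dst != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (src[i].canceled && !raw)
			continue;
		dst[out] = src[i];
		dst[out].index = (int)out + 1;   /* writer assigns the next index */
		out++;
	}
	return out;
}
```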

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I811e23de8f4545bed36a44fedc2638d7418830dd
Reviewed-on: https://review.whamcloud.com/46552
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: DELBARY Gael <gael.delbary@cea.fr>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15542 osd-ldiskfs: exclude EA inode from processing 86/46486/4
Alexey Lyashkov [Thu, 10 Feb 2022 12:46:00 +0000 (15:46 +0300)]
LU-15542 osd-ldiskfs: exclude EA inode from processing

The EA inode is an ldiskfs-internal object and should be
excluded from any OSD access.
This change fixes a panic caused by the iterator returning
an error when it tries to access the EA inode:

osd_ea_fid_get()) Process leaving (rc=18446744073709551614 : -2
osd_it_ea_rec()) Process leaving (rc=18446744073709551614 : -2)
osd_it_ea_next()) Process entered
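
A sketch of skipping internal entries during iteration, assuming a
per-entry flag marks the EA inode; SKETCH_EA_INODE_FL and the types are
hypothetical, not the osd-ldiskfs iterator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EA_INODE_FL 0x1u   /* hypothetical "internal EA inode" flag */

struct sketch_dirent {
	const char *name;
	uint32_t inode_flags;
};

/* Advance to the next entry that is not an internal EA inode, so callers
 * never try to look up a FID for it. Returns NULL at the end. */
static const struct sketch_dirent *
sketch_it_next(const struct sketch_dirent *ents, size_t n, size_t *pos)
{
	assert(pos != NULL);
	while (*pos < n) {
		const struct sketch_dirent *e = &ents[(*pos)++];

		if (!(e->inode_flags & SKETCH_EA_INODE_FL))
			return e;
	}
	return NULL;
}
```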

HPe-bug-id: LUS-10683
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I73e5325721ae28ea380e9d216c6c6cf7fa0ac4ea
Reviewed-on: https://review.whamcloud.com/46486
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15502 mdt: cleanup whitespace in mdt_open.c 85/46385/3
Andreas Dilger [Sun, 30 Jan 2022 11:58:10 +0000 (04:58 -0700)]
LU-15502 mdt: cleanup whitespace in mdt_open.c

Change spaces to tabs for indent and do not align local variables.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1b89ffd54d41ce42738b061738f990d0326bf9af
Reviewed-on: https://review.whamcloud.com/46385
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15189 lnet: fix memory mapping. 82/45482/12
Alexey Lyashkov [Mon, 7 Feb 2022 15:02:14 +0000 (18:02 +0300)]
LU-15189 lnet: fix memory mapping.

Nvidia GDS has a bug which causes incorrect page type detection:
it may return a GPU flag for a kmalloc'd buffer (a ptlrpc_message
in my case).
To work around this, Whamcloud makes both mapping calls, but this is
costly and causes extra RDMA operations, as ko2iblnd trusts
the msg_rdma_force flag.
Let's drop the extra Nvidia calls and check only for real "user" pages
or the GPU flag.

HPe-bug-id: LUS-10520
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I5d70c5e0630b0f16e130a7db0385de2443c11a63
Reviewed-on: https://review.whamcloud.com/45482
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15189 build: add GDS configure options 80/45480/10
Alexey Lyashkov [Thu, 3 Mar 2022 16:52:41 +0000 (19:52 +0300)]
LU-15189 build: add GDS configure options

Add the ability to specify the GDS path and pick up the GDS API
directly from the GDS sources.

Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Iccc52765540cc90c6a69af34d230ed24b2eb996a
Reviewed-on: https://review.whamcloud.com/45480
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14642 test: add fsx mirror file test mode 73/43473/10
Bobi Jam [Wed, 28 Apr 2021 05:26:18 +0000 (13:26 +0800)]
LU-14642 test: add fsx mirror file test mode

- add an fsx mirror file test mode with the "-M" option so that fsx can
  perform its I/O on an FLR file as well as extend/split/resync the FLR file.

- add sanity-flr test_70b() to test fsx with flrmode.

- fix a bug in "lfs mirror verify" so that it accommodates the max mirror
  count instead of (max - 1) mirrors.

- improve "lfs mirror verify -v" to print the proper data range of its
  CRC-32 checksum values.

Test-Parameters: trivial testlist=sanity-flr env=ONLY=100,FSXNUM=10000
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib55c7b25dcd82fa0b197ad21268b16c82aab5da9
Reviewed-on: https://review.whamcloud.com/43473
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10073 tests: re-enable lnet selftest smoke test for PPC + ARM 57/38857/15
James Simmons [Mon, 10 Jan 2022 23:43:56 +0000 (18:43 -0500)]
LU-10073 tests: re-enable lnet selftest smoke test for PPC + ARM

LNet selftest has failed on platforms with 64K pages. Recent work
by Alexey Lyashkov should have resolved these issues. Let's enable
the LNet selftest to validate these changes.

Test-Parameters: trivial clientdistro=el8.5 clientarch=aarch64 testlist=lnet-selftest
Test-Parameters: trivial clientdistro=el8.5 clientarch=ppc64le testlist=lnet-selftest

Change-Id: I97684101ea2f90bb41f452955d5f864ca35d61ec
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/38857
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-15404 ldiskfs: port truncate fix to Ubuntu 20 HWE 80/46480/2
James Simmons [Tue, 8 Feb 2022 17:56:18 +0000 (10:56 -0700)]
LU-15404 ldiskfs: port truncate fix to Ubuntu 20 HWE

A fix landed for the LU-15404 bug, in which a truncate vs setxattr
race caused a kernel crash. This fix was not added for the newer
Ubuntu 20 HWE kernels or the 5.10 kernel.

Change-Id: I43b33cd28335d1bc5796aa6651ecf847b4ea24e6
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46480
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-15858 sec: reinstate null encryption for file names 55/47355/4
Sebastien Buisson [Mon, 2 May 2022 16:00:37 +0000 (18:00 +0200)]
LU-15858 sec: reinstate null encryption for file names

Reinstate null encryption for file names by adding a new llite
parameter named 'enable_filename_encryption', set to 0 by default.
When this parameter is 0, new empty directories configured as
encrypted ignore the filenames_encryption_mode and use
LLCRYPT_MODE_NULL instead, which is a no-op. This LLCRYPT_MODE_NULL
mode is inherited for all subdirectories and files.
When this parameter is 1, new empty directories configured as
encrypted use the normal encryption mode.
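
A sketch of the mode choice described above; the enum, the function, and
the parent-mode pointer used to model inheritance are hypothetical
illustrations, not the llite or llcrypt API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for LLCRYPT_MODE_NULL and the configured mode. */
enum sketch_fname_mode { SKETCH_MODE_NULL, SKETCH_MODE_NORMAL };

/* Pick the filename encryption mode for a new encrypted directory or file.
 * 'parent_mode' is NULL when there is no encrypted parent to inherit from. */
static enum sketch_fname_mode
sketch_pick_mode(bool enable_filename_encryption,
		 const enum sketch_fname_mode *parent_mode)
{
	if (parent_mode != NULL)
		return *parent_mode;         /* inherited by subdirectories and files */
	if (!enable_filename_encryption)
		return SKETCH_MODE_NULL;     /* no-op: names are not encrypted */
	return SKETCH_MODE_NORMAL;           /* configured filenames_encryption_mode */
}
```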

To set this parameter globally for all clients, run on the MGS:
mgs# lctl set_param -P llite.*.enable_filename_encryption=0

Also update sanity-sec test_54 to exercise the new parameter
'enable_filename_encryption'.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9d726ba26cc91a51690d59a81efe3eb98ee2995c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47355
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-15848 ldiskfs: escape encrypted file names 09/47309/8
Sebastien Buisson [Wed, 11 May 2022 13:42:29 +0000 (15:42 +0200)]
LU-15848 ldiskfs: escape encrypted file names

When a Lustre MDT is mounted as ldiskfs, the names of encrypted
files have to be escaped to avoid breaking the shell.
On CentOS 7, the LDISKFS_ENCRYPT_FL flag does not exist, so we add it;
when the target is mounted as ldiskfs (LDISKFS_MOUNT_DIRDATA flag
not present), we critical-encode encrypted file names before
presentation and, conversely, critical-decode names upon lookup.
On CentOS 8, the LDISKFS_ENCRYPT_FL flag exists. The fscrypt functions
from kernel 4.18 are also wired up, but they all return -EOPNOTSUPP
or equivalent, so they cannot be used to present usable names. So when
the target is mounted as ldiskfs (LDISKFS_MOUNT_DIRDATA flag not
present), we critical-encode encrypted file names before
presentation and critical-decode them upon lookup.
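
The critical-encoding format itself is not described here, so this sketch
swaps in a generic percent-style escape to illustrate the round trip of
encoding on presentation and decoding on lookup. It is NOT the llcrypt
critical-encoding scheme, and the function names are hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Escape every byte other than [A-Za-z0-9._-] (including '%') as "%XX".
 * Returns the encoded length; 'out' needs 3 * len + 1 bytes. */
static size_t sketch_escape(const unsigned char *in, size_t len, char *out)
{
	size_t o = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = in[i];

		if (isalnum(c) || c == '.' || c == '_' || c == '-') {
			out[o++] = (char)c;
		} else {
			sprintf(out + o, "%%%02X", c);
			o += 3;
		}
	}
	out[o] = '\0';
	return o;
}

static int sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Inverse of sketch_escape(); a malformed "%" sequence is copied as-is.
 * Returns the decoded length. */
static size_t sketch_unescape(const char *in, unsigned char *out)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);
	while (*in != '\0') {
		int hi, lo;

		if (in[0] == '%' && (hi = sketch_hexval(in[1])) >= 0 &&
		    (lo = sketch_hexval(in[2])) >= 0) {
			out[o++] = (unsigned char)(hi * 16 + lo);
			in += 3;
		} else {
			out[o++] = (unsigned char)*in++;
		}
	}
	return o;
}
```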

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaca467eaa233be8142356efa822962953754c2ce
Reviewed-on: https://review.whamcloud.com/47309
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14541 llite: Check vmpage in releasepage 62/47262/12
Patrick Farrell [Thu, 12 May 2022 18:53:08 +0000 (14:53 -0400)]
LU-14541 llite: Check vmpage in releasepage

We cannot release a page if the vmpage reference count is
>1, otherwise we will detach a vmpage from Lustre when the
page is still referenced in the VM.

This creates a situation where page discard for lock
cancellation will not find the page, so we can get stale
data reads.

This re-introduces the LU-12587 issue where direct I/O on
a client falls back to buffered I/O if there are pages in
cache, since it cannot flush them.  This is annoying but
not a huge problem.

Fixes: e59f0c9a245f ("LU-12587 llite: don't check vmpage refcount in ll_releasepage()")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3aa1cd7330f5e7d1ba2ddb0c12779aa22f3d70b7
Reviewed-on: https://review.whamcloud.com/47262
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
4 months ago LU-14541 llite: add rw_seq_cst_vs_drop_caches 43/47243/6
John L. Hammond [Fri, 6 May 2022 18:54:13 +0000 (13:54 -0500)]
LU-14541 llite: add rw_seq_cst_vs_drop_caches

Add a reproducer (rw_seq_cst_vs_drop_caches) for the read/write vs
drop_caches sequential consistency violation described in
LU-14541. Add an always-excepted test (sanityn test_16f) to run
rw_seq_cst_vs_drop_caches.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I557ae7386b38214110a4d85ba0515e95fed7a11e
Reviewed-on: https://review.whamcloud.com/47243
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>