Whamcloud - gitweb
fs/lustre-release.git
8 months agoLU-15189 osc: don't have extra nvidia call 81/45481/5
Alexey Lyashkov [Mon, 8 Nov 2021 06:36:08 +0000 (09:36 +0300)]
LU-15189 osc: don't have extra nvidia call

osc don't needs to call nvidia to check an GPU page,
this is in the oap_flags

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I124c328838ad9823361afef33d0732fa4ebbb696
Reviewed-on: https://review.whamcloud.com/45481
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15358 tests: Variable incorrectly defined under sanityn 19/45819/2
Arshad Hussain [Fri, 10 Dec 2021 04:34:26 +0000 (10:04 +0530)]
LU-15358 tests: Variable incorrectly defined under sanityn

Under sanityn.sh/print_jbd_stat() local variable
was incorrectly defined. This was exposed using
shellcheck.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In lustre/tests/sanityn.sh line 950:
local varcvs
      ^-- SC2034: varcvs appears unused. Verify it or export it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test-Parameters: trivial testlist=sanityn
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7b2f62c15e420a4c6f5d71445a2e940816e20098
Reviewed-on: https://review.whamcloud.com/45819
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
9 months agoLU-15360 tests: Use saved value on EXIT/Restore 21/45821/2
Arshad Hussain [Fri, 10 Dec 2021 09:46:54 +0000 (15:16 +0530)]
LU-15360 tests: Use saved value on EXIT/Restore

This was originally reported by shellcheck as
unused variable. However, on closer inspection
it appears that the restore on "EXIT" was
hard-coded to 0 (mostly this should be correct)
instead of using the original value of $old

This patch resets 'enable_chprojid_gid' value
to original value captured in $old instead of
hard-coded value of 0

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In ./lustre/tests/sanity-quota.sh line 4150:
local old=$(do_facet mds1 $LCTL get_param -n \
      ^-- SC2034: old appears unused. Verify it or export it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I31e7a8a931d53a1fcb9d77ecf1759fce572bd52c
Reviewed-on: https://review.whamcloud.com/45821
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15220 utils: fix gcc-11 -Werror=format-truncation= error 15/45815/2
Jian Yu [Thu, 9 Dec 2021 20:00:36 +0000 (12:00 -0800)]
LU-15220 utils: fix gcc-11 -Werror=format-truncation= error

This patch fixes the following -Werror=format-truncation= error in
liblustreapi.c:

liblustreapi.c: In function ‘lov_dump_comp_v1’:
liblustreapi.c:3673:57: error: ‘snprintf’ output may be truncated
before the last format character [-Werror=format-truncation=]
 3673 |                 snprintf(pool_name, LOV_MAXPOOLNAME, "%s",
      |                                                         ^

Change-Id: I55c3e05a933ff3d2c33a71ed269fffe63797b528
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45815
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15356 tests: get rid of extra spaces in PERM_CMD 13/45813/2
Elena Gryaznova [Thu, 9 Dec 2021 18:50:37 +0000 (21:50 +0300)]
LU-15356 tests: get rid of extra spaces in PERM_CMD

The tests use the PERM_CMD set to "set_param  -P" with
extra space before -P" fail because they do not expect
these allowable extra spaces:
   [[ $PERM_CMD = *"set_param -P"* ]]

Fixes: b9c359a70d ("LU-7004 tests: move from lctl conf_param to lctl set_param -P")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: Ia18e32baa56b7dac1f4e15777bfcc4b9ab1048fb
Reviewed-on: https://review.whamcloud.com/45813
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14776 ldiskfs: support Ubuntu 5.8.0-63 94/45794/2
James Simmons [Sat, 4 Dec 2021 22:51:23 +0000 (15:51 -0700)]
LU-14776 ldiskfs: support Ubuntu 5.8.0-63

Handle small changes in ext4 for Ubuntu 5.8.0-63 release.

Change-Id: Ie81b64909a49e66af17b4dfc1b8fbaf538f9f29e
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45794
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15342 tests: escape "|" 88/45788/3
Elena Gryaznova [Wed, 8 Dec 2021 11:19:44 +0000 (14:19 +0300)]
LU-15342 tests: escape "|"

escape "|" on want="FULL|IDLE" to protect interpretation
by shell:
  sh: IDLE: command not found

Fixes: af666bef05 ("LU-12857 tests: allow clients to be IDLE after recovery")
Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I2f885ea225ba43537f37b8dad1c2e0cd8f652a79
Reviewed-on: https://review.whamcloud.com/45788
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15339 tests: Increase timeout in sanity 208 79/45779/2
Patrick Farrell [Tue, 7 Dec 2021 21:54:20 +0000 (16:54 -0500)]
LU-15339 tests: Increase timeout in sanity 208

It's been observed that occasionally the initial request in
sanity 208 does not complete in 1 second, which invalidates
the test.  (And sometimes causes it to fail - but even if
it passes, the test is invalid.)

Increase the time to 2 seconds.

Using trivial testing because this just modifies sanity and
it's such a simple change.

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I70cf32813a9a2ced0cc388eb25eba29918ba7d03
Reviewed-on: https://review.whamcloud.com/45779
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
9 months agoLU-15338 tests: check whole jobid in sanity 205a 74/45774/3
Andreas Dilger [Tue, 7 Dec 2021 19:12:00 +0000 (12:12 -0700)]
LU-15338 tests: check whole jobid in sanity 205a

Check the whole jobid string in sanity test_205a to avoid matching
a substring of the jobid twice.  This could only currently happen
for the second "dd" test, at a rate about 1/8192, but might also
fail in the future if other tests are added.

Test-Parameters: trivial testlist=sanity env=ONLY=205a,ONLY_REPEAT=200
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I34b7ed1a7825e3fbad9ea8666fccb2bdc53ebbe5
Reviewed-on: https://review.whamcloud.com/45774
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15331 kernel: kernel update SLES15 SP2 [5.3.18-24.96.1] 64/45764/2
Jian Yu [Tue, 7 Dec 2021 07:11:51 +0000 (23:11 -0800)]
LU-15331 kernel: kernel update SLES15 SP2 [5.3.18-24.96.1]

Update SLES15 SP2 kernel to 5.3.18-24.96.1 for Lustre client.

Test-Parameters: trivial \
env=SANITY_EXCEPT="100 103 125 130 136 154 255 817" \
clientdistro=sles15sp2 \
testlist=sanity

Change-Id: Ia457af76060a96f574cb501af6456afdc7de6411
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45764
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15244 llite: set ra_pages of backing_dev_info with 0 12/45712/4
Qian Yingjin [Thu, 2 Dec 2021 13:16:18 +0000 (08:16 -0500)]
LU-15244 llite: set ra_pages of backing_dev_info with 0

The latest RHEL8.5 kernel sets initial @ra_pages of
backing_dev_info with VM_READAHEAD_PAGES:
struct backing_dev_info *bdi_alloc(int node_id)
{
...
bdi->ra_pages = VM_READAHEAD_PAGES;
bdi->io_pages = VM_READAHEAD_PAGES;
...
}

This will cause that @ra_pages of file readahead state is set
with @bdi->ra_pages, make the readahead is out of Lustre control
and trigger the readahead logic in Linux kernel wrongly. And it
results in the failure sanity 101j.

In this patch, we force to set @ra_pages of backing_dev_info with
0 after setup the backing device info. By this way, it disables
kernel readahead in the super block.

This patch also cleanups the unnecessary setting of @ra_pages in
llite "file.c" and "vvp_io.c".

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: If6468109620269c1e76abe3a1cd73c3b40a417a8
Reviewed-on: https://review.whamcloud.com/45712
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15293 build: Add build support for arm64 centos8 91/45691/4
Xinliang Liu [Mon, 22 Nov 2021 01:34:34 +0000 (01:34 +0000)]
LU-15293 build: Add build support for arm64 centos8

This patch adds lbuid support for latest Arm64 CentOS 8.4, 8.5.

Also fix build doesn't use Lustre provided kernel config file
on CentOS8.

Test-Parameters: trivial

Test-Parameters: env=SANITY_EXCEPT="101j" \
 clientdistro=el8.5 serverdistro=el8.5 testlist=sanity

Change-Id: I95c7aa7e77ea1cc7a99fdaacc2220e14d2db6185
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/45691
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-12678 o2iblnd: convert ibp_refcount to a kref 85/45685/4
James Simmons [Tue, 30 Nov 2021 18:21:49 +0000 (13:21 -0500)]
LU-12678 o2iblnd: convert ibp_refcount to a kref

This refcount is used exactly like a kref.  So change it to one.
kref uses refcount_t which will warn on increment-from-zero and
similar problems (which enabled with CONFIG option), so we don't
need the LASSERT calls.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I23ade8c2f768c70a1fd330e8c173e0d18f5ff976
Reviewed-on: https://review.whamcloud.com/45685
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15234 lnet: Race on discovery queue 70/45670/8
Chris Horn [Mon, 29 Nov 2021 17:38:48 +0000 (11:38 -0600)]
LU-15234 lnet: Race on discovery queue

If the discovery thread clears the LNET_PEER_DISCOVERING bit then a
race window opens when the discovery thread drops the
lnet_peer.lp_lock spinlock and closes when the discovery thread
acquires the lnet_net_lock. If another thread queues the peer for
discovery during this window then the LNET_PEER_DISCOVERING bit is
added back to the peer state, but since the peer is already on the
lnet.ln_dc_working queue, it does not get added to the
lnet.ln_dc_request queue.

When the discovery thread acquires the lnet_net_lock/EX, it sees that
the LNET_PEER_DISCOVERING bit has not been cleared, so it does not
call lnet_peer_discovery_complete() which is responsible for sending
messages on the peer's discovery pending queue.

At this point, the peer is stuck on the lnet.ln_dc_working queue, and
messages may continue to accumulate on the peer's
lnet_peer.lp_dc_pendq.

Fix the issue by re-working the main discovery thread loop so that we
do not release the lnet_peer.lp_lock until after we've determined
whether we need to call lnet_peer_discovery_complete().
This ensures that the lnet_peer is correctly removed from the
discovery work queue and any messages on the peer's
lnet_peer.lp_dc_pendq are sent or finalized.

It is also possible for the lnet_peer.lp_dc_error to be cleared
during the aforementioned window, as well as during the time when
lnet_peer_discovery_complete() is processing the contents of the
lnet_peer.lp_dc_pendq. This could prevent messages on the
lnet_peer.lp_dc_pendq from being correctly finalized. To fix this
issue, the responsibilities of lnet_peer_discovery_error() were
incorporated into lnet_peer_discovery_complete().

HPE-bug-id: LUS-10615
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3779a342de7108105c2fd2bc41373560e8e5ef14
Reviewed-on: https://review.whamcloud.com/45670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15279 ptlrpc: use a cached value 61/45661/5
Alexey Lyashkov [Thu, 25 Nov 2021 18:12:21 +0000 (21:12 +0300)]
LU-15279 ptlrpc: use a cached value

Don't calculate a early reply size - use a cached,
as it don't changed after start

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I3a6bd5013d0646b6165db52d6a7fb38b263756e6
Reviewed-on: https://review.whamcloud.com/45661
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15175 tests: fix ldev tests 13/45613/6
Elena Gryaznova [Thu, 2 Dec 2021 11:31:04 +0000 (14:31 +0300)]
LU-15175 tests: fix ldev tests

generate_ldev_conf() and tests use this fn do not
work on setup with ost targets located not on 1 oss
and do not work on failover setup where
  <facet>_HOST != <facet>failover_HOST.

Fixes: 0f17fc82a89a ("LU-7060 ldev: Added MGS NID substitution to ldev")
Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-2495
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ief38df0b1a0ffa37a8e7a4545a69a453d6dba7bd
Reviewed-on: https://review.whamcloud.com/45613
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15095 target: lbug_on_grant_miscount module parameter 21/45521/5
Vladimir Saveliev [Wed, 10 Nov 2021 08:40:50 +0000 (11:40 +0300)]
LU-15095 target: lbug_on_grant_miscount module parameter

Some tests have hit "lctl: error invoking upcall" when setting the
lbug_on_grant_miscount tunable parameter.  Instead, define a module
parameter lbug_on_grant_miscount flag as ptlrpc module parameter,
similar to how it is done for ldiskfs_track_declares_assert.

Change-Id: I9cd0f9fa75b37539b23443bbcbb3445c87318ab1
Fixes: bb5d81ea95 ("LU-14543 target: prevent overflowing of tgd->tgd_tot_granted")
Test-Parameters: trivial
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45521
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15195 ofd: missing OST object 59/45459/6
Vitaly Fertman [Thu, 4 Nov 2021 14:28:49 +0000 (17:28 +0300)]
LU-15195 ofd: missing OST object

as the OST-MDT resync may be not finished by the end of the recovery
it may happen new enqueue for a write op may fail due to an absent
object. Return EINPROGRESS so that the enqueue was resent until get
resynced.

to not get stuck forever in case of disappeared MDT or a double
failure, return EINPROGRESS during hard failover timeout only.

also, cleanup replay-ost-single test 12:
- eliminate a need in the hard failover
- no need in a special obd_fail_loc, just use replay_barrier
- createmany is able to create files with unique names,
  no need in special steps

HPE-bug-id: LUS-10267
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I5f16b63454c51ad8d112770c15c7e6e7f41f3c40
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/45459
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15137 socklnd: expect two control connections maximum 61/45461/4
Serguei Smirnov [Thu, 4 Nov 2021 18:35:43 +0000 (11:35 -0700)]
LU-15137 socklnd: expect two control connections maximum

As a result of connecting to ourselves, e.g. pinging own nid,
two control type connections are established vs. just one
in case of connecting externally.
Fix the control connection counter to be able to handle that.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 71b2476e ("LU-12815 socklnd: add conns_per_peer parameter")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Idce01d81e3924226b5b163d2472cbcd4f6eb5819
Reviewed-on: https://review.whamcloud.com/45461
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
9 months agoLU-15109 tests: different quota and usage relations 57/45257/6
Sergey Cheremencev [Fri, 15 Oct 2021 12:23:53 +0000 (15:23 +0300)]
LU-15109 tests: different quota and usage relations

Add sanity-quota_1i that following cases:
- User is above PQ limit and the quota limit is cleared.
  User should now be able to write.
- User is below PQ limit and the quota limit is lowered
  below current usage. User should not be able to write.
- User is above PQ limit and the quota limit is raised
  above current usage. Should now be able to write.

Change-Id: Iad81c706aaf838cacfdf2971ee100950c47d1585
HPE-bug-id: LUS-9935
Test-Parameters: testlist=sanity-quota env=ONLY=1i
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45257
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-9516 tests: fix sanity test_24v 72/45172/3
Andreas Dilger [Fri, 8 Oct 2021 23:01:00 +0000 (17:01 -0600)]
LU-9516 tests: fix sanity test_24v

The "lfs getdirstripe -c" command will return stripes=0 for
unstriped directories.  Handle this when calculating free_inodes
to avoid creating zero files for this test.

Speed cleanup of test_24v and other users of simple_cleanup_common()
by using unlinkmany to delete files if the file count is provided.

Use stack_trap consistently and don't do both manual and exit cleanup.

Test-Parameters: trivial testlist=sanity env=ONLY=24v
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I25105a5d0ab719d41bf41cff0aaea6d00a9c4059
Reviewed-on: https://review.whamcloud.com/45172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14965 ldiskfs: rhel7.6 inode mutex for ldiskfs_orphan_add 65/45165/2
Bobi Jam [Fri, 8 Oct 2021 09:21:22 +0000 (16:21 +0700)]
LU-14965 ldiskfs: rhel7.6 inode mutex for ldiskfs_orphan_add

See following warning:

ldiskfs/namei.c:3331 ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
Call Trace:
dump_stack+0x19/0x1b
__warn+0xd8/0x100
warn_slowpath_null+0x1d/0x20
ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
ldiskfs_xattr_inode_orphan_add+0xbb/0x110 [ldiskfs]
ldiskfs_xattr_delete_inode+0x5c/0x350 [ldiskfs]
ldiskfs_evict_inode+0x1a8/0x630 [ldiskfs]
evict+0xb4/0x180
iput+0xfc/0x190
osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
lu_object_free.isra.27+0xb8/0x1c0 [obdclass]
lu_object_put+0xa5/0x460 [obdclass]
mdt_object_put+0x30/0x110 [mdt]
mdt_reint_unlink+0x8e0/0x1890 [mdt]
mdt_reint_rec+0x83/0x210 [mdt]
mdt_reint_internal+0x720/0xaf0 [mdt]
mdt_reint+0x67/0x140 [mdt]
tgt_request_handle+0x7ea/0x1750 [ptlrpc]
ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21

Need to hold inode mutex on the external EA for ldiskfs_orphan_add()
to soothe the warning.

This is a port of:

Lustre-commit: 7d3b5d9fdc766411eacaed27fb2fd9250800f096
Lustre-change: https://review.whamcloud.com/44754

Test-Parameters: trivial
Fixes: f64e9f19f68e ("LU-12977 ldiskfs: properly take inode_lock() for truncates")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I47a01862793afaac1d7c311f1b6d65d2cf4bb93f
Reviewed-on: https://review.whamcloud.com/45165
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13717 sec: fix handling of encrypted file with long name 63/45163/3
Sebastien Buisson [Tue, 5 Oct 2021 14:51:52 +0000 (16:51 +0200)]
LU-13717 sec: fix handling of encrypted file with long name

The ciphertext representation of the name of an encrypted file or
directory can be up to 256 bytes of binary data, if the cleartext
name is up to NAME_MAX. But then this ciphertext is encoded via
critical_encode() before being sent to servers. Once encoded, the
length can exceed NAME_MAX because of the escaped critical
characters.
So make sure ll_prep_md_op_data() accepts those too long encoded names
if it is called for lookup or create of an encrypted file or
directory. In the other cases, the 'name' taken as input is the plain
text version, so it must conform to the NAME_MAX limit.

When carrying out operations on an encrypted file with long name, we
manipulate a digested form whose hash needs to be matched against the
content of the LinkEA. The name found in the LinkEA is not NUL
terminated, so this aspect must be taken care of.

Fixes: 4d38566a00 ("LU-13717 sec: filename encryption")
Fixes: ed4a625d88 ("LU-13717 sec: filename encryption - digest support")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4b0e51eee5e549ab56292fe0fec3c1be1b487fc7
Reviewed-on: https://review.whamcloud.com/45163
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15009 ofd: continue precreate if LAST_ID is less on MDT 84/44984/7
Lai Siyao [Thu, 16 Sep 2021 21:49:33 +0000 (17:49 -0400)]
LU-15009 ofd: continue precreate if LAST_ID is less on MDT

It's possible that precreate succeeded on OST, but MDT didn't get the
reply, and assumed failure. In this case, the LAST_ID on MDT is
smaller than that on OST, instead of report error and stop precreate,
it's better to move precreate window forward.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia6ca418ec0ea6797b7eccc1610879331307fad07
Reviewed-on: https://review.whamcloud.com/44984
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14677 sec: remove MIGRATION_ compatibility defines 57/44957/11
Sebastien Buisson [Fri, 10 Sep 2021 12:03:03 +0000 (14:03 +0200)]
LU-14677 sec: remove MIGRATION_ compatibility defines

Remove the MIGRATION_* compatibility flags and use
LLAPI_MIGRATION_* everywhere.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iab2a2f6dfc435377e9db0d4963547841b2cbc403
Reviewed-on: https://review.whamcloud.com/44957
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14677 sec: no encryption key migrate/extend/resync/split 24/44024/24
Sebastien Buisson [Thu, 17 Jun 2021 13:31:44 +0000 (15:31 +0200)]
LU-14677 sec: no encryption key migrate/extend/resync/split

Allow some layout operations on encrypted files, even when the
encryption key is not available:
- lfs migrate
- lfs mirror extend
- lfs mirror resync
- lfs mirror verify
- lfs mirror split
We allow these access patterns to applications that know what they are
doing, by using the specific flag O_FILE_ENC and O_DIRECT.

Also add sanity-sec test_59a,b,c to exercise these access patterns.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ieaeee0e5bf7643f18d775fe6daa5e31c2f349f8c
Reviewed-on: https://review.whamcloud.com/44024
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13783 osd-ldiskfs: use alloc_file_pseudo to create fake files 76/43876/20
James Simmons [Wed, 8 Dec 2021 22:13:40 +0000 (17:13 -0500)]
LU-13783 osd-ldiskfs: use alloc_file_pseudo to create fake files

With kallsyms_lookup_name() no longer exported with 5.8+ kernels
this means the work around to setup the security handling broke.
Currently osd-ldiskfs will crash due to security_alloc() never
being called. The solution is to use alloc_file_pseudo() instead
to create our fake file.

Change-Id: Ib417ebdda7d9829a231c568022618154c273f3e6
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43876
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14704 tests: disable opencache for sanity/29 77/43777/14
Alex Zhuravlev [Tue, 25 May 2021 04:13:55 +0000 (07:13 +0300)]
LU-14704 tests: disable opencache for sanity/29

otherwise lock counting is not quite correct

Fixes: 41d99c4902 ("LU-10948 llite: Introduce inode open heat counter")

Test-Parameters: trivial

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia73e8aa4a16b7ced29490d41c8eac4ee839a3406
Reviewed-on: https://review.whamcloud.com/43777
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-11388 test: enable replay-single test_131b 21/40421/7
Vikentsi Lapa [Tue, 27 Oct 2020 14:39:58 +0000 (14:39 +0000)]
LU-11388 test: enable replay-single test_131b

Issue is fixed, so this commit verifies fix.

Test-Parameters: trivial env=ONLY=131 testlist=replay-single fstype=zfs
Signed-off-by: Vikentsi Lapa <vlapa@whamcloud.com>
Change-Id: I609146172c1fee2a955d5c41f623c8b8c2ffaeaa
Reviewed-on: https://review.whamcloud.com/40421
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
9 months agoLU-15357 mdd: fix changelog context leak 31/45831/4
Mikhail Pershin [Sat, 11 Dec 2021 12:49:47 +0000 (15:49 +0300)]
LU-15357 mdd: fix changelog context leak

The mdd_changelog_clear() shouldn't skip llog_ctxt_put()
in case of error.

Fixes: 6b183927e1 (LU-14553 changelog: eliminate mdd_changelog_clear warning)
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9c9aa3ce0d11e8f67470b450d007f2a1081644c6
Reviewed-on: https://review.whamcloud.com/45831
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15252 mdt: reduce contention at mdt_lsom_update 09/45709/5
Alexander Boyko [Thu, 2 Dec 2021 09:43:54 +0000 (04:43 -0500)]
LU-15252 mdt: reduce contention at mdt_lsom_update

mot_som_mutex serialize all close requests with lsom updates for
a same mdt_object. For a massive open/read/close single shared
file load, it leads to high load avarage cause many threads sleep
on mutex.
This patch introduces a cached lsom size, and uses a mutex at update
part only. Close requests with lsom size less or equal to cached size
would not take a mutex at all.

Test results MPI open/flock/funlock/close SSF
10 iterations 10 node 100 thread each, 1000 file ops per thread
close time secs master patch MDT load avarage master patch
avg             0.142  0.086                  47.05  38.89
max             0.164  0.129                  49.39  44.77
min             0.097  0.041                  44.44  34.7

Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I807b468b128295df9391b0467e74d4f10240662e
Reviewed-on: https://review.whamcloud.com/45709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-7372 tests: re-enable replay-dual test_26 82/43982/2
Andreas Dilger [Fri, 11 Jun 2021 07:19:52 +0000 (01:19 -0600)]
LU-7372 tests: re-enable replay-dual test_26

Re-enable test_26 since it was just the unfortunate victim of
either test_24 or test_25 causing MDS unmount to hang.

Test-Parameters: trivial testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib944028e798488c425501f0c48bf812fc13ebbe5
Reviewed-on: https://review.whamcloud.com/43982
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15262 osd: bio_integrity_prep_fn return value processing 46/45646/3
Alexey Lyashkov [Mon, 22 Nov 2021 13:32:23 +0000 (16:32 +0300)]
LU-15262 osd: bio_integrity_prep_fn return value processing

There is osd_bio_integrity_handle() fn in lustre/osd-ldiskfs/osd_io.c
It checks the returned code of bio_integrity_prep_fn() but between
mainstream Linux 4.12 and 4.13 kernel integrity API has changed and
in 4.13+ (as well as for any RHEL8 including first beta)

bio_integrity_prep() returns boolean true on success.

HPe-bug-id: LUS-10443
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I973aa8ccae024157ad863d26afc7b1264a5c7149
Reviewed-on: https://review.whamcloud.com/45646
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoNew tag 2.14.56 2.14.56 v2_14_56
Oleg Drokin [Mon, 13 Dec 2021 20:16:40 +0000 (15:16 -0500)]
New tag 2.14.56

Change-Id: I2491f69b4d4e4a7ae8ed39bef8c9806127c93d79
Signed-off-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15292 kernel: kernel update RHEL7.9 [3.10.0-1160.49.1.el7] 87/45687/2
Jian Yu [Tue, 30 Nov 2021 21:43:10 +0000 (13:43 -0800)]
LU-15292 kernel: kernel update RHEL7.9 [3.10.0-1160.49.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.49.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I356b8a8345a4a91d6d1c1a4a9b4eab4bb5afe75b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15260 tests: numfailovers() fix 33/45633/6
Elena Gryaznova [Mon, 22 Nov 2021 15:13:07 +0000 (18:13 +0300)]
LU-15260 tests: numfailovers() fix

Patch fixes numfailovers() to use comma
separated MDTS list correctly. Without this fix
in newer bash version we see the following error:
  line 69: mds1,mds2,mds3,mds4_nums: bad substitution

Fixes: a7a2133bfa ("b=18696 new RECOVERY_RANDOM_SCALE test")
Fixes: b594948509 ("TT-59 remove . and - from the node name")
Test-Parameters: trivial testlist=recovery-random-scale
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10619
Change-Id: I4c28e3c62cada60dc1241948dc4e969e0e10ce9a
Reviewed-on: https://review.whamcloud.com/45633
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15263 quota: fix bug in qmt_pool_recalc 32/45632/2
Sergey Cheremencev [Thu, 21 Oct 2021 20:28:01 +0000 (23:28 +0300)]
LU-15263 quota: fix bug in qmt_pool_recalc

env should be freed at the end of qmt_pool_recalc,
as it is needed in qpi_putref. It causes a panic,
if pool has the last reference:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffffc08de2d7>] lu_context_key_get+0x17/0x30 [obdclass]
...
Call Trace:
 [<ffffffffc08de358>] lu_object_free.isra.30+0x68/0x170 [obdclass]
 [<ffffffffc08e1a35>] lu_object_put+0xc5/0x3e0 [obdclass]
 [<ffffffffc100e56c>] qmt_pool_free+0x30c/0x590 [lquota]
 [<ffffffffc10100b5>] qmt_pool_recalc+0x365/0x1260 [lquota]
 [<ffffffff8bac1c31>] kthread+0xd1/0xe0
 [<ffffffff8c176c37>] ret_from_fork_nospec_begin+0x21/0x21

HPE-bug-id: LUS-10426
Change-Id: Ic23dcb858ff811757f38948aa572c936c076e21e
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/45632
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15208 ldiskfs: add support for Ubuntu20 kernel 5.4.0.90 47/45547/4
Li Dongyang [Fri, 12 Nov 2021 12:30:43 +0000 (23:30 +1100)]
LU-15208 ldiskfs: add support for Ubuntu20 kernel 5.4.0.90

Also fix the lustre-build-ldiskfs.m4 to select correct series file.
We use -ge to check the kernel release version, so greater version
should come on top.

Change-Id: Id6b599ef5b2ea823e203aaa6a40917e49f98f4d9
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/45547
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-930 doc: update lustre.7 man page 93/45493/2
Andreas Dilger [Mon, 8 Nov 2021 21:03:24 +0000 (14:03 -0700)]
LU-930 doc: update lustre.7 man page

Update the lustre.7 man page to better describe current functionality.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I979841e597fcfa8448c708dd66d4d89d3018b1cc
Reviewed-on: https://review.whamcloud.com/45493
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Rick Mohr <mohrrf@ornl.gov>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15196 kernel: kernel update RHEL8.4 [4.18.0-305.25.1.el8_4] 60/45460/4
Jian Yu [Tue, 30 Nov 2021 22:07:40 +0000 (14:07 -0800)]
LU-15196 kernel: kernel update RHEL8.4 [4.18.0-305.25.1.el8_4]

Update RHEL8.4 kernel to 4.18.0-305.25.1.el8_4.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity

Change-Id: Ic70f7330f90a36646bb36e0c6015ea22882b20b9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45460
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15190 ptlrpc: fix duplication check 45/45445/5
Alex Zhuravlev [Wed, 3 Nov 2021 06:31:06 +0000 (09:31 +0300)]
LU-15190 ptlrpc: fix duplication check

ptlrpc_server_check_for_resend() skips duplication check if
current exp_rpc_count == 0 which is wrong as exp_rpc_count
is incremented for RPCs in progress.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4ba1600341d916871f66aceb4d6a1043dd015e55
Reviewed-on: https://review.whamcloud.com/45445
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-12784 tests: fix large_xattr_enabled() for ZFS 64/45264/2
Andreas Dilger [Fri, 15 Oct 2021 17:31:57 +0000 (11:31 -0600)]
LU-12784 tests: fix large_xattr_enabled() for ZFS

Fix large_xattr_enabled() check for ZFS filesystems, since bash
functions return "0" for true.  Otherwise, all ZFS tests that
check large_xattr_enabled() will be skipped.

Fixes: 84097792f56c ("LU-12784 llite: limit max xattr size by kernel value")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie566244c6b1f46b947a96331e7623b9b863ebbe5
Reviewed-on: https://review.whamcloud.com/45264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15057 utils: pool quota man 21/45121/4
Sergey Cheremencev [Wed, 31 Mar 2021 12:13:53 +0000 (15:13 +0300)]
LU-15057 utils: pool quota man

Adding pool quota man for setquota and
quota commands.
Remove [-o <obd_uuid>|-i <mdt_idx>|-I <ost_idx>]
from the case "lfs quota -t". Grace period
is stored only at quota master. Furthermore,
command lfs quota -t -I 0 /mnt/testfs fails
with EOPNOTSUPP.

Test-Parameters: trivial
HPE-bug-id: LUS-9869
Change-Id: I368e22b782bd3626f64907059ea329e94986535b
Reviewed-on: https://es-gerrit.dev.cray.com/158556
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45121
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13756 quota: up_read leak in qmt_pool_lookup 06/45106/7
Sergey Cheremencev [Thu, 30 Sep 2021 15:58:16 +0000 (18:58 +0300)]
LU-13756 quota: up_read leak in qmt_pool_lookup

qmt_pool_lock is not released if qti_pools_add fails in
qmt_pool_lookup.

Change-Id: Ic2adb44468d51af7aefcbb91279260ae6f85d67a
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45106
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14975 dne: dir migration in non-recursive mode 02/44802/10
Lai Siyao [Thu, 26 Aug 2021 11:37:09 +0000 (07:37 -0400)]
LU-14975 dne: dir migration in non-recursive mode

Add an option "-d|--directory" option for "lfs migrate -m" to
migrate specified directory only, which is similar to "ls -d".

Add sanity 230w.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib97949e3840a3b49f7074b16e259582a9bf16e3b
Reviewed-on: https://review.whamcloud.com/44802
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14956 fld: repeat failed FLDB lookup 23/44723/13
Alex Zhuravlev [Mon, 23 Aug 2021 07:29:18 +0000 (10:29 +0300)]
LU-14956 fld: repeat failed FLDB lookup

it's possible that LWP reconnection is in progress after remote
MDS restart. if FLDB misses an entry, then FLDB lookup can fail
with EAGAIN and whole RPC processing (like MDS_REINT) can fail
as well. try to lookup few times in cases of EAGAIN.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib6aeaf7706a6465b0c8bee696d985bb440ed192e
Reviewed-on: https://review.whamcloud.com/44723
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-9859 mdd: unwind md_capable() 80/44580/8
James Simmons [Tue, 17 Aug 2021 14:16:41 +0000 (10:16 -0400)]
LU-9859 mdd: unwind md_capable()

The inline function md_capable() is just a wrapper
around cap_raised() which adds little benefit. Lets
just remove the use of this wrapper.

Change-Id: I1a5f4b2e34b4cf358b52b3fc4bdeff17fdab50c9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44580
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14754 tests: add Overstripe support to racer 64/43964/5
Elena Gryaznova [Thu, 10 Jun 2021 09:19:44 +0000 (12:19 +0300)]
LU-14754 tests: add Overstripe support to racer

The files are created with a overstripe layout if
RACER_ENABLE_OVERSTRIPE=true is set.

We would like to have the ability to use the "real"
layouts, i.e. to limit the number of stripes per OST
instead of allowing racer to achieve the max LOV_MAX_STRIPE_COUNT
value. Patch adds RACER_LOV_MAX_STRIPECOUNT equal to
LOV_MAX_STRIPE_COUNT by default.

Test-Parameters: trivial testlist=racer
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9466, LUS-9608
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I550922938438afa121af275fd1d6f60082db9b54
Reviewed-on: https://review.whamcloud.com/43964
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14392 gnilnd: re-enable large I/o buffers 73/41373/2
Shaun Tancheff [Sun, 31 Jan 2021 16:20:54 +0000 (10:20 -0600)]
LU-14392 gnilnd: re-enable large I/o buffers

DVS on gni breaks the LNet 1M handshake of LNET_MAX_IOV.

Introduce GNILND_MAX_IOV with a 4M i/o maximum and a hint
LNET_MD_GNILND so LNet can accept the large buffer w/o complaint.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4e78c0022fdece0d6945bbcc47e2e64d4d181dca
Reviewed-on: https://review.whamcloud.com/41373
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-11915 tests: fix conf-sanity 115 test 49/38849/9
Artem Blagodarenko [Fri, 5 Jun 2020 15:45:45 +0000 (11:45 -0400)]
LU-11915 tests: fix conf-sanity 115 test

Not enough xattrs added to move outside inode.
Add one additional xattr. The test works only with FLAKEY=false.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Test-Parameters: testlist=conf-sanity env="ONLY=115"
HPE-bug-id: LUS-6966
Change-Id: Iab13ed3434effb03e1209755ac51eba2debe7387
Reviewed-on: https://review.whamcloud.com/38849
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13542 osd: brw stats are initialized too late 54/38554/6
Andrew Perepechko [Thu, 25 Nov 2021 17:28:01 +0000 (20:28 +0300)]
LU-13542 osd: brw stats are initialized too late

Lustre crashes with the following stack trace:

 [<ffffffffc113cbac>] lprocfs_oh_tally+0x2c/0x40 [obdclass]
 [<ffffffffc169719b>] record_start_io.part.14+0x2b/0x40 [osd_zfs]
 [<ffffffffc1698322>] osd_read+0xa2/0x180 [osd_zfs]
 [<ffffffffc1167dee>] dt_record_read+0x1e/0x70 [obdclass]
 [<ffffffffc1190997>] lustre_index_restore+0x527/0x1720 [obdclass]
 [<ffffffffc16b2564>] osd_initial_OI_scrub+0xa34/0xd50 [osd_zfs]
 [<ffffffffc16b34fd>] osd_scrub_setup+0x9ed/0xb90 [osd_zfs]
 [<ffffffffc168a97b>] osd_mount+0xf4b/0x1380 [osd_zfs]

osd_procfs_init()/osd_stats_init() are called *after*
osd_initial_OI_scrub(), so osd stats are not yet initialized
when osd_read() first tries to update them.

This patch separates osd stats initialization from procfs
initialization so that osd stats should become initialized
by the time scrub starts its own initialization.

Change-Id: I15ab03e77eaab76e3dea8067b849c891e89aa9a8
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-8173
Reviewed-on: https://review.whamcloud.com/38554
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-10640 tests: ha.sh script improvements 29/31229/9
Elena Gryaznova [Wed, 17 Nov 2021 10:13:26 +0000 (13:13 +0300)]
LU-10640 tests: ha.sh script improvements

In each load iteration check for all created directories
that ls' long format output does not contain question
marks ('?').
'?'s may be reported if
  stat(2)->getattr()->ll_glimpse_size()
fails, which is expected in case of failover. Then the test
is to wait until recovery is completed, repeat the check
and exit with error if '?' still exists after second check.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-4894
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Change-Id: I88495511797aaad53c923c90f88f92f1412380ce
Reviewed-on: https://review.whamcloud.com/31229
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15252 mdc: add client tunable to disable LSOM update 19/45619/3
Alexander Boyko [Fri, 19 Nov 2021 08:08:16 +0000 (03:08 -0500)]
LU-15252 mdc: add client tunable to disable LSOM update

It seems that mdt_lsom_update() has a serious issue with a single
shared file because of its mdt-level mutex for every close request.
The patch adds mdc_lsom parameter to mdc, base on it state client
sends or not LSOM updates to MDT. By default LSOM is on.

lctl set_param mdc.*.mdc_lsom=[on|off]

For a configuration when LSOM is not used the patch helps
MDT with load avarage with a specific load when many threads
open/read/close for a single file.

HPE-bug-id: LUS-10604
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iba0e745a94825641da6b0a1c09488b1e2f54658b
Reviewed-on: https://review.whamcloud.com/45619
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15167 quota: fallocate send UID/GID for quota 75/45475/5
Arshad Hussain [Sun, 7 Nov 2021 14:46:29 +0000 (09:46 -0500)]
LU-15167 quota: fallocate send UID/GID for quota

Calling fallocate() on a newly created file did not account quota
usage properly because the OST object did not have a UID/GID
assigned yet. Update the fallocate code in the OSC to always send
the file UID/GID/PROJID to the OST so that the object ownership
can be updated before space is allocated.

Test-case: sanity-quota/78 added

Fixes: 48457868a02a ("LU-3606 fallocate: Implement fallocate preallocate operation")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I86d80a7f415a80100f7d2fb5f417cf47bf5b2900
Reviewed-on: https://review.whamcloud.com/45475
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14514 flr: mirror split should not make stale file 24/42024/23
Bobi Jam [Thu, 2 Sep 2021 16:27:34 +0000 (00:27 +0800)]
LU-14514 flr: mirror split should not make stale file

Mirror split could leave an all stale mirrors file, this patch
prevent removing the last non-stale mirror from the file.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I63007784929a2cd18d2823e2250f7307ca7d8d45
Reviewed-on: https://review.whamcloud.com/42024
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-11952 mdt: fix reconstruct open 12/35112/19
Andriy Skulysh [Sun, 3 Mar 2019 18:10:31 +0000 (20:10 +0200)]
LU-11952 mdt: fix reconstruct open

We shouldn't start a new transaction on resend.

Store fid of an opened object and use it during
reconstruction of the resend.

Change-Id: I8c21e9661903d3d4090ad29e43480e2ba7e35c39
Cray-bug-id: LUS-6957, LUS-7286
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/35112
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-15169 Revert "LU-14668 lnet: Lock primary NID logic" 86/45386/3
Chris Horn [Tue, 26 Oct 2021 20:23:37 +0000 (15:23 -0500)]
LU-15169 Revert "LU-14668 lnet: Lock primary NID logic"

This patch breaks client mounts under certain LNet configurations.

This reverts commit 024f9303bc6f32a3113357c864765c4f9c93ed03.

Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic1f1d07694fe49df14c803a9434d673e61c7dd67
Reviewed-on: https://review.whamcloud.com/45386
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13601 llite: avoid needless large stats alloc 01/40901/13
Andreas Dilger [Tue, 8 Dec 2020 06:54:58 +0000 (23:54 -0700)]
LU-13601 llite: avoid needless large stats alloc

Allocate the ll_rw_extents_info (5896 bytes), ll_rw_offset_info
(6400 bytes), and ll_rw_process_info (640 bytes) structs only
when these stats are enabled, which is very rarely.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I59bbfce8d7f2422d810617d5fa712a67333ebbe5
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/40901
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15181 osd-ldiskfs: a typo in osd_declare_write_commit() 23/45423/3
Andrew Perepechko [Sun, 31 Oct 2021 19:03:48 +0000 (22:03 +0300)]
LU-15181 osd-ldiskfs: a typo in osd_declare_write_commit()

A typo in osd_declare_write_commit() makes ASAN emit warnings like:
UBSAN: Undefined behaviour in /lustre/osd-ldiskfs/osd_io.c:1404:2
shift exponent 4096 is too large for 32-bit type 'int'

Change-Id: Ie612d9c6655211445c00bd17fa1bf7a836af3542
Fixes: f0f92773e ("LU-14187 osd-ldiskfs: fix locking in write commit")
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/45423
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
10 months agoLU-15182 git: Add .gitreview file 07/44907/4
Xinliang Liu [Tue, 14 Sep 2021 01:20:32 +0000 (01:20 +0000)]
LU-15182 git: Add .gitreview file

Add .gitreview file, so that we can use "git review -s" to setup
remote push url and use "git review" to send patch for review.

Git review cmd is a very convenient tool to push, review for
Gerrit review system. See more details here:
https://docs.opendev.org/opendev/git-review/latest/usage.html

Test-Parameters: trivial
Change-Id: Ic8223bdfcb7a696328f921159a63d625359e45a6
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/44907
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-9859 gss: replace cfs_size_roundXX macros. 84/45584/3
James Simmons [Tue, 16 Nov 2021 16:07:13 +0000 (11:07 -0500)]
LU-9859 gss: replace cfs_size_roundXX macros.

Many of the cfs_size_roundX() macros are not even used so delete
them. Replace cfs_size_round4() uses in the GSS layer with
round_up(var, 4);

Change-Id: Id35f0f7b60f8d00f425d9b15d2a76aa4d0fa5f2f
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45584
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
10 months agoLU-15217 pcc: disable PCC for encrypted files 45/45545/4
Qian Yingjin [Fri, 30 Jul 2021 08:47:55 +0000 (16:47 +0800)]
LU-15217 pcc: disable PCC for encrypted files

When files are encrypted in Lustre using fscrypt, they should
normally not be accessible to users without the proper encyrption
key. However, if a user has then encryption key loaded when they
read a file, it may be decrypted in memory and saved to the PCC
backend in unencrypted form.

Due to the above reason, we just disable PCC caching for encrypted
files.

DDN-bug-id: EX-3571
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I6c363dcad7a6bc8520350c0295f6e221bec3abb0
Reviewed-on: https://review.whamcloud.com/45545
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14325 tests: skip replay-single 134 for older servers 50/45450/3
James Nunez [Wed, 3 Nov 2021 19:39:06 +0000 (13:39 -0600)]
LU-14325 tests: skip replay-single 134 for older servers

The fix for a PFL file lost during recovery was landed to
Lustre 2.13.53.  Servers prior to 2.13.53 will fail the
replay-single test, test_134, added to the original patch
to check that PFL files are not lost.  Thus, we need to
skip this test for Lustre servers less than 2.13.53.

Fixes: 72d45e1d344c ("LU-13809 mdc: fix lovea for replay")
Test-Parameters: trivial env=ONLY=134 testlist=replay-single
Test-Parameters: serverdistro=el7.9 serverversion=2.12.7 env=ONLY=134 testlist=replay-single
Test-Parameters: serverdistro=el7.7 serverversion=2.13.0 env=ONLY=134 testlist=replay-single
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Id70f9e06f6221f88a54d696afce9de70cbcf1efa
Reviewed-on: https://review.whamcloud.com/45450
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
10 months agoLU-15186 o2iblnd: Default map_on_demand to 1 31/45431/3
Chris Horn [Mon, 1 Nov 2021 20:06:31 +0000 (15:06 -0500)]
LU-15186 o2iblnd: Default map_on_demand to 1

On kernels that provide global MR we default to using that exclusively
even if FMR/FastReg is available. This causes an interop issue if the
active side of a connection request has a higher fragment count than
the passive side  because FMR/FastReg may be needed to map the higher
fragment count. We should change the default map_on_demand to 1 so
that FMR/FastReg is used by default. map_on)demand can still be set
to 0 if needed.

Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I76010a905f151efbb0b109ae6f5fba6fb7ce1956
Reviewed-on: https://review.whamcloud.com/45431
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15166 tests: restore osp-syn threads after test_818 75/45375/4
Vladimir Saveliev [Tue, 26 Oct 2021 16:53:01 +0000 (19:53 +0300)]
LU-15166 tests: restore osp-syn threads after test_818

test_818() is supposed to leave osp-syn threads up after the test end,
otherwise, following tests get "logging isn't available, run LFSCK".

Use fail $SINGLEMDS for that.

Test-Parameters: trivial testlist=sanity
HPE-bug-id: LUS-10495
Change-Id: Ib4876f4c4d39fc87f86788d8611838b8078e4aac
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45375
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-12857 tests: allow clients to be IDLE after recovery 18/45318/3
Andreas Dilger [Thu, 21 Oct 2021 01:47:25 +0000 (19:47 -0600)]
LU-12857 tests: allow clients to be IDLE after recovery

If clients are not connected to an OST when it fails (connection
is IDLE), they do not need to be involved in recovery, so this
should not be considered an error when checking the client state.

Test-Parameters: trivial testlist=recovery-mds-scale env=SLOW=no
Test-Parameters: testlist=conf-sanity
Test-Parameters: testlist=replay-dual,replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6cfeb718acd233378ed1608f22061bc15c3ebbe5
Reviewed-on: https://review.whamcloud.com/45318
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15126 kernel: RHEL 8.5 server support 06/45306/9
Jian Yu [Wed, 17 Nov 2021 20:11:43 +0000 (12:11 -0800)]
LU-15126 kernel: RHEL 8.5 server support

This patch makes changes to support RHEL 8.5 release
with kernel 4.18.0-348.2.1.el8_5 for Lustre server.

Test-Parameters: trivial fstype=ldiskfs \
env=SANITY_EXCEPT="101j" \
clientdistro=el8.5 serverdistro=el8.5 testlist=sanity

Test-Parameters: trivial fstype=zfs \
env=SANITY_EXCEPT="101j" \
clientdistro=el8.5 serverdistro=el8.5 testlist=sanity

Change-Id: Ie976d8fd3e6fcf8a564eff8a41ad0fd51b2c858c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45306
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15222 build: Update ZFS version to 2.0.6 67/45567/4
Jian Yu [Mon, 15 Nov 2021 17:01:18 +0000 (09:01 -0800)]
LU-15222 build: Update ZFS version to 2.0.6

Update ZFS version to 2.0.6. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.0.6

Change-Id: I2a7df45b79f402c3d3bce8b137edd11b5224b576
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45567
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15126 kernel: new kernel [RHEL 8.5 4.18.0-348.2.1.el8_5] 85/45285/8
Jian Yu [Wed, 17 Nov 2021 19:58:14 +0000 (11:58 -0800)]
LU-15126 kernel: new kernel [RHEL 8.5 4.18.0-348.2.1.el8_5]

This patch makes changes to support new RHEL 8.5 release
for Lustre client.

Test-Parameters: trivial env=SANITY_EXCEPT="101j" \
clientdistro=el8.5

Change-Id: I068f091817126fffc14402254f45dcd75ba7f3fc
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45285
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
10 months agoLU-15184 llite: properly detect SELinux disabled case 01/45501/4
Sebastien Buisson [Tue, 9 Nov 2021 16:03:19 +0000 (17:03 +0100)]
LU-15184 llite: properly detect SELinux disabled case

Usually, security_dentry_init_security() returns -EOPNOTSUPP when
SELinux is disabled. But on some kernels (e.g. rhel 8.5) it returns
0 when SELinux is disabled, and in this case the security context is
empty.
So in both cases make sure the security context name is not set, which
means "SELinux is disabled" for the rest of the code.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I3b9608f9768288de89570c158e8429560fa0213f
Reviewed-on: https://review.whamcloud.com/45501
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15119 tgt: BUG at tgt_brw_read+0x16bf/0x1d80 73/45273/3
Andriy Skulysh [Wed, 6 Oct 2021 10:25:12 +0000 (13:25 +0300)]
LU-15119 tgt: BUG at tgt_brw_read+0x16bf/0x1d80

struct tgt_thread_big_cache {
  local = {{
      lnb_file_offset = 0,
      lnb_page_offset = 0,
      lnb_len = 0,
      lnb_rc = 0,
      lnb_page = 0xffffddee74fae100,
so npages_read becomes 0

Change-Id: Ie2201c9fc6f0350b1c6dcb480cff52f44d5413db
HPE-bug-id: LUS-10510
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-on: https://review.whamcloud.com/45273
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15110 quota: cosmetic changes in PQ 58/45258/3
Sergey Cheremencev [Fri, 15 Oct 2021 14:11:47 +0000 (17:11 +0300)]
LU-15110 quota: cosmetic changes in PQ

cosmetic changes in PQ:
- make tgt_pool_free and qmt_sarr_pool_free void
- remove outdated comment from qmt_pool_lqes_lookup
- replace tabs with spaces

HPE-bug-id: LUS-9547
Change-Id: If4918b647eed1d971d00c521d010d0c72d349207
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45258
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15079 quota: include qsd_thread_info into mgs thread context 81/45181/2
Vladimir Saveliev [Tue, 24 Aug 2021 14:57:37 +0000 (17:57 +0300)]
LU-15079 quota: include qsd_thread_info into mgs thread context

mgs service thread envs do not get supplied with qsd_thread_info, which
may lead to the failure shown below:
(lu_object.h:1274:lu_env_info()) ASSERTION( info ) failed:
(lu_object.h:1274:lu_env_info()) LBUG
Pid: 146951, comm: ll_mgs_0003 3.10.0-957.1.3957.1.3.x4.3.25.x86_64 #1 SMP
Call Trace:
 libcfs_call_trace+0x8e/0xf0 [libcfs]
 lbug_with_loc+0x4c/0xa0 [libcfs]
 qsd_refresh_usage+0x25e/0x2f0 [lquota]
 qsd_op_adjust+0x2f1/0x730 [lquota]
 osd_object_delete+0x2b2/0x360 [osd_ldiskfs]
 lu_object_free.isra.32+0x68/0x170 [obdclass]
 lu_site_purge_objects+0x2fe/0x530 [obdclass]
 lu_object_find_at+0x371/0xa60 [obdclass]
 dt_locate_at+0x1d/0xb0 [obdclass]
 llog_osd_open+0x50e/0xf30 [obdclass]
 llog_open+0x15a/0x3e0 [obdclass]
 llog_origin_handle_open+0x334/0x720 [ptlrpc]
 tgt_llog_open+0x33/0xe0 [ptlrpc]
 mgs_llog_open+0x46/0x460 [mgs]
 tgt_request_handle+0x96a/0x1680 [ptlrpc]

Supply msg service context with qsd_thread_info.

Change-Id: If8664b81e1f64df015dad46ba26c9c1d1e3f54bf
HPE-bug-id: LUS-10334
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45181
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15059 nrs: do not overwrite "cmd" in nrs_tbf_rule 42/45142/4
Etienne AUJAMES [Wed, 6 Oct 2021 20:11:17 +0000 (22:11 +0200)]
LU-15059 nrs: do not overwrite "cmd" in nrs_tbf_rule

"cmd" pointer inside ptlrpc_lprocfs_nrs_tbf_rule_seq_write() and
nrs_tbf_parse_cmd are static. This could cause a double kfree call
because "cmd" could be overwriten by another "nrs_tbf_rule" write
instance.

Let's try to remove the "static" definition.

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I8cd7d9dd0483778c82bbf8711c07e49255983f4b
Reviewed-on: https://review.whamcloud.com/45142
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
10 months agoLU-14699 mdd: proactive changelog garbage collection 68/45068/9
Mikhail Pershin [Fri, 24 Sep 2021 15:47:44 +0000 (18:47 +0300)]
LU-14699 mdd: proactive changelog garbage collection

Currently changelog starts garbage collection when user
exceeds maximum idle timeout, there is also limit by amount
of idle records but it is used only for old changelog users
which have no cur_time field, therefore it is not used at
all nowadays. Another problem is that garbage collection is
started only when changelog is almost full. That causes
often situations when changelog might have very old users
staying much longer than idle timeout and having idle
records above maximum limit consuming space for nothing.

Patch reworks changelog GC in the following way:
- GC starts when changelog is almost full (old way) or
  either idle time or idle records limits are exceeded or
  when (idle_time * idle_records) exceeds its limit as well.
  The latest limit is calculated as:
  (idle_time * idle_records) / 84600 > (1 << 32) which is a
  reasonable heuristic for deciding if a user is "too idle"
  in both cases when lots records being created quickly vs
  user is idle a very long time.
- to avoid the processing of changelog users each time GC is
  checking all conditions both least user record and time
  are tracked when changelog users are initialized or
  purged/canceled. Both values are stored as mdd_changelog
  fields mc_minrec and mc_mintime
- test 160g is changed to test the new approach when idle
  indexes are checked always along with idle time checks
- test 160s is added in sanity.sh to check heuristic approach
  with (idle_time * idle_records) value checking

Fixes: 3442db6faf68 ("LU-7340 mdd: changelogs garbage collection")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6028f3164212a2377a4fc45b60a826c64f859099
Reviewed-on: https://review.whamcloud.com/45068
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14957 mdd: prepare xattrs before migration 41/44741/2
Lai Siyao [Thu, 5 Aug 2021 15:30:22 +0000 (11:30 -0400)]
LU-14957 mdd: prepare xattrs before migration

In directory migration, the xattrs should be prepared before starting
transaction, otherwise if remote MDT is down, which will cause local
MDT stuck as well.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I79279e7b0c051a7542a71066fffd4ad70f559368
Reviewed-on: https://review.whamcloud.com/44741
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14941 lnet: Fix source specified to routed destination 30/44730/5
Chris Horn [Thu, 12 Aug 2021 21:16:05 +0000 (16:16 -0500)]
LU-14941 lnet: Fix source specified to routed destination

If a source NI is specified for a send then we should not modify the
destination NID that was passed to lnet_send().

HPE-bug-id: LUS-10301
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie47558d5bce97a0dca30ff7d072dcd39eb903324
Reviewed-on: https://review.whamcloud.com/44730
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14940 lnet: Fix source specified send to different net 28/44728/3
Chris Horn [Thu, 12 Aug 2021 21:08:44 +0000 (16:08 -0500)]
LU-14940 lnet: Fix source specified send to different net

The destination NI is fixed for all source-specified sends. Thus, in
order for a source-specified send to be considered "local", i.e. a
send that does not require a route, the destination NID must be on
the same net as the specified source.

HPE-bug-id: LUS-10303
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4847db1d393bbc36def65123f260b928ebbf944e
Reviewed-on: https://review.whamcloud.com/44728
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14939 lnet: Allow specifying a source NID for lnetctl ping 27/44727/5
Chris Horn [Thu, 12 Aug 2021 16:26:07 +0000 (11:26 -0500)]
LU-14939 lnet: Allow specifying a source NID for lnetctl ping

Add a new --source option for lnetctl ping command. This allows the
user to specify a local NI from which to send the ping. This also
ensures that the specified destination NID is also used. Otherwise,
pings to multi-rail peers may end up going to a different peer NI
based on the multi-rail selection algorithm. The ability to specify
a source NI, and thus fix the destination NI, is a great help in
troubleshooting communication issues between multi-rail peers.

Add test to exercise lnetctl ping --source option.

HPE-bug-id: LUS-10296
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I454217b30a92414de537880f076a11a693b1f0b3
Reviewed-on: https://review.whamcloud.com/44727
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14867 doc: a few words about an asterisk in lfs quota 57/44357/4
Sergey Cheremencev [Thu, 10 Jun 2021 10:08:28 +0000 (13:08 +0300)]
LU-14867 doc: a few words about an asterisk in lfs quota

Clarify the difference between an asterisk printed per OST
in verbouse lfs quota output and an asterisk near the whole
filesystem usage.

Change-Id: I778fe1f7b1f6f8d55c311d81bd2b311d82463390
Test-Parameters: trivial
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/44357
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14437 gnilnd: use ktime_get_seconds() to get time 79/41679/4
Shaun Tancheff [Sat, 9 Oct 2021 04:33:23 +0000 (11:33 +0700)]
LU-14437 gnilnd: use ktime_get_seconds() to get time

Use ktime_get_seconds() to directly get the time inatead of
getting a timespec and converting it.

Fixes: 4b0e495e3c ("LU-14080 gnilnd: updates for SUSE 15 SP2")
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I256855ceb9e038a9960fa76fe6e3bfe63fb16580
Reviewed-on: https://review.whamcloud.com/41679
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14188 osd: remove not used iam_container lock 90/40890/7
Artem Blagodarenko [Sat, 5 Dec 2020 01:40:52 +0000 (20:40 -0500)]
LU-14188 osd: remove not used iam_container lock

There is an rw_semaphore
struct iam_container {
    ...
        /*
         * read-write lock protecting index consistency.
         */
        struct rw_semaphore     ic_sem;   <<<<<<
        struct dynlock       ic_tree_lock;
        /* Protect ic_idle_bh */
        struct mutex         ic_idle_mutex;
     ...
};

There is initialization
 2    234  lustre/osd-ldiskfs/osd_iam.c <<iam_container_init>>
             init_rwsem(&c->ic_sem);

There are wrappers
   3    622  lustre/osd-ldiskfs/osd_iam.c <<iam_container_write_lock>>
             down_write(&ic->ic_sem);
   4    627  lustre/osd-ldiskfs/osd_iam.c <<iam_container_write_unlock>>
             up_write(&ic->ic_sem);
   5    632  lustre/osd-ldiskfs/osd_iam.c <<iam_container_read_lock>>
             down_read(&ic->ic_sem);
   6    637  lustre/osd-ldiskfs/osd_iam.c <<iam_container_read_unlock>>
             up_read(&ic->ic_sem);

But this wrappers are not used. And, based on the git history, never been used.
Let's delete this useless code.

Change-Id: Ied1122f034e53fee08888e1091f700bda4507f00
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
HPE-bug-id: LUS-9545
Reviewed-on: https://review.whamcloud.com/40890
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14073 ldiskfs: update ldiskfs patches for Linux 5.9 97/40397/4
Mr NeilBrown [Tue, 9 Nov 2021 15:55:33 +0000 (10:55 -0500)]
LU-14073 ldiskfs: update ldiskfs patches for Linux 5.9

Minor conflict changes only.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If0b58fc278f6721e44e26c6052509c31068f8e78
Reviewed-on: https://review.whamcloud.com/40397
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-9325 obdclass: make niduuid for lustre_stop_mgc() static 17/33617/8
James Simmons [Mon, 1 Nov 2021 18:31:24 +0000 (14:31 -0400)]
LU-9325 obdclass: make niduuid for lustre_stop_mgc() static

The process to create a proper string for niduuid can be made
simpler and avoid a memory allocation.

Change-Id: I52cb01117e41cbcf2756477e91934a42d31fd157
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/33617
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-12511 ldlm: free resource when ldlm_lock_create() fails. 85/45585/2
Mr. NeilBrown [Tue, 16 Nov 2021 16:11:22 +0000 (11:11 -0500)]
LU-12511 ldlm: free resource when ldlm_lock_create() fails.

ldlm_lock_create() gets a resource, but don't put it on
all failure paths. It should.

Change-Id: Ib49bcafdeac834c412adad9db135034d1ea06a04
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/45585
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-12511 llite: fix misuse of current->parent. 80/45580/2
Mr. NeilBrown [Mon, 15 Nov 2021 19:16:54 +0000 (14:16 -0500)]
LU-12511 llite: fix misuse of current->parent.

current->parent is used by ptrace to redirect some signal delivery
to the ptracer.  It should only be used by 'ptrace' or 'signal' code.
All other users should  use current->real_parent, which is the real
parent.

Change-Id: I3212fd9f3db9935b8144dba8af82eb2f387a5c45
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/45580
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15083 ldiskfs: disable xattr credits check 46/45546/2
Li Dongyang [Fri, 12 Nov 2021 12:24:05 +0000 (23:24 +1100)]
LU-15083 ldiskfs: disable xattr credits check

Add ext4-xattr-disable-credits-check.patch to the
ldiskfs series from kernel 4.15+ except the
rhel8 ones. They already have this fix.

Change-Id: I1f9818c394377cdba6b11d2b24c26d2354f41ac0
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/45546
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-9162 lod: option to set max stripe count per filesystem 32/45532/7
Lei Feng [Thu, 11 Nov 2021 06:06:39 +0000 (14:06 +0800)]
LU-9162 lod: option to set max stripe count per filesystem

Add an option to set max default stripe count when the stripe count
is set to -1.

Change-Id: I02634a02a6f6579750fe964662b7e644af1689d6
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/45532
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14781 osp: osp_object_free access NULL pointer 42/45442/3
Bobi Jam [Tue, 2 Nov 2021 07:27:59 +0000 (15:27 +0800)]
LU-14781 osp: osp_object_free access NULL pointer

If a osp_object is created by multiple threads at the same time,
lu_object_find_at() could allocate an osp_object without object
initialization, before hash inserting of the object, it find another
object has been created and inserted by another thread, it will free
the unintialized osp_object, and osp_object_free() will access
an uninitialized list_head (opo_xattr_list).

This patch initialize osp_object::opo_xattr_list in its allocation
function.

Call trace:
            lu_object_free.isra.30+0xf2/0x170 [obdclass]
            lu_object_find_at+0x496/0x930 [obdclass]
            lod_initialize_objects+0x3e4/0xba0 [lod]
            lod_parse_striping+0x693/0xc20 [lod]
            lod_striping_load+0x2b2/0x660 [lod]
            lod_declare_destroy+0x12b/0x600 [lod]
            mdd_declare_finish_unlink+0x91/0x210 [mdd]
            mdd_unlink+0x48f/0xab0 [mdd]
            mdt_reint_unlink+0xc32/0x1550 [mdt]
            mdt_reint_rec+0x83/0x210 [mdt]
            mdt_reint_internal+0x6e1/0xb00 [mdt]
            mdt_reint+0x67/0x140 [mdt]
            tgt_request_handle+0xaee/0x15f0 [ptlrpc]
            ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
            ptlrpc_main+0xb34/0x1470 [ptlrpc]
            kthread+0xd1/0xe0

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib86aca5b41e94a1758f177655ea3a0f680335e0f
Reviewed-on: https://review.whamcloud.com/45442
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
10 months agoLU-15114 osp: changes queuing throttle 65/45265/8
Alexander Zarochentsev [Tue, 31 Aug 2021 05:36:30 +0000 (08:36 +0300)]
LU-15114 osp: changes queuing throttle

Prevent queue of sync changes from growing too much
by adding resends when queue size reaches
some (tunable) limit.

HPE-bug-id: LUS-10345
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I5efabb91d3700c58d9451f81c5fed9a22ae404fb
Reviewed-on: https://review.whamcloud.com/45265
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14491 ldiskfs: do not corrupt journal with bh change rh7.6 64/45164/2
Andrew Perepechko [Fri, 8 Oct 2021 10:05:50 +0000 (17:05 +0700)]
LU-14491 ldiskfs: do not corrupt journal with bh change rh7.6

Currently, ldiskfs_xattr_delete_inode() zeroes xattr inode
references in cached buffers that haven't been prepared by
get_write_access().

When using journal checksums, it is possible that these buffers
are modified after the checksum is calculated but before the
buffer has been written to journal. Journal replay will fail
with a journal checksum error message if this transaction needs
to be replayed.

This is a port of:

Lustre-commit: d563f5140ffa183d0854cf7cd493ad884e314e3d
Lustre-change: https://review.whamcloud.com/41896

Test-Parameters: trivial
HPE-bug-id: LUS-9483
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I5ce1016737a16ddf74811c43ae74296d4f3e03b0
Reviewed-on: https://review.whamcloud.com/45164
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15068 ptlrpc: Do not unlink difficult reply until sent 38/45138/3
Chris Horn [Tue, 5 Oct 2021 19:11:29 +0000 (14:11 -0500)]
LU-15068 ptlrpc: Do not unlink difficult reply until sent

If a difficult reply is queued in LNet, or the PUT for it is
otherwise delayed, then it is possible for the commit callback
to unlink the reply MD which will abort the send. This results in
client hitting "slow reply" timeout for the associated RPC and
an unnecessary reconnect (and possibly resend).

This patch replaces the rs_on_net flag with rs_sent and rs_unlinked.
These flags indicate whether the send event for the reply MD has
been generated, and whether the MD has been unlinked, respectively.

If rs_sent is set, but rs_unlinked has not been set, then ptlrpc_hr
is free to unlink the reply MD as a result of the commit callback.
The reply-ack will simply be dropped by the server.

If ptlrpc_hr is processing the reply because of commit callback, and
rs_sent has not been set, then ptlrpc_hr will not unlink the reply
MD. This means that the reply_out_callback must also be modified to
check for this case when the send event occurs. Otherwise, if the ACK
never arrives from the client, then the MD would never be unlinked.
Thus when the send event occurs, and rs_handled is set, the
reply_out_callback will schedule the reply for handling by ptlrpc_hr.

HPE-bug-id: LUS-10505
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib8f4853c7ab35d72624fce7ee3fba9e59a746e1f
Reviewed-on: https://review.whamcloud.com/45138
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15065 quota: fix BIO write performance drop 33/45133/7
Sergey Cheremencev [Wed, 15 Sep 2021 15:05:45 +0000 (18:05 +0300)]
LU-15065 quota: fix BIO write performance drop

Before the patch qti_lqes_qunit_min used int to store qunit
value, while lqe_qunit type is _u64. lqe_qunit > 2G caused
an overflow in a local integer argument. For example, when
block hard limit was set to 500TB(i.e. lqe_qunit was about
64TB in a system with 2 OSTs), qti_lqes_qunit_min returned
0 instead of 64TB in a qmt_lvbo_fill. Thus new qunit was not
set on OSTs(qsd_set_qunit wasn't called). Without the qunit,
OST began to send release request after each acquire. For
example, to write 10MB at the OST were sent 2 acquire and
2 release reuests(as qunit was not set on OST). With the
fix, i.e. in a normal case, OST needs just one acquire
request. The issue caused performance drop in a bufferred
write up to 15%-20% if compare with a baseline without PQ
patches.

Note, the issue exists only when a hard limit is set to some
high value(>100GB). The exact hard limit value depends on OSTs
number in a system and on amount of used space, but let's think
that issue doesn't exist on a clean system with 2 OSTs and hard
block limit 100G(this case was checked).

Remove qmt_pool_hash - it is not used anywhere since
"LU-11023 quota: remove quota pool ID".

HPE-bug-id: LUS-10250
Change-Id: I2c4ce38f5b9395ed1f4868d4c8efc00751116b15
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45133
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15179 tests: add trap cleanup_quota_test 18/45418/5
Sergey Cheremencev [Fri, 29 Oct 2021 19:00:46 +0000 (22:00 +0300)]
LU-15179 tests: add trap cleanup_quota_test

Add stack_trap cleanup_quota_test to the tests that
use setup_quota_test. If a test fails without calling
cleanup_quota_test, it may cause later tests to fail
due to used space > 0.

Remove ${tdir}_dom, if exists, in cleanup_quota_test.
sanity-quota_75 doesn't remove test_dom directory.

Test-Parameters: trivial  testlist=sanity-quota
Fixes: a4fbe734("LU-14739 quota: nodemap squashed root cannot bypass quota")
Change-Id: Ife4fd499b427bee79f74a5e172d233fe6a83e240
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45418
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15025 quota: stale edquot after clearing limits 00/45000/10
Sergey Cheremencev [Fri, 18 Jun 2021 16:08:16 +0000 (19:08 +0300)]
LU-15025 quota: stale edquot after clearing limits

When hard and soft limit set to 0, lqe enforced flag is also
set to false. As qmt_adjust_qunit does not handle not enforced
lqes, edquot set to the pool continues to be true and a user
gets -EDQUOT even if all pool limits are cleared. This was ok
for global pool lqe as since it turned off, zero limits are
sent to OSTs causing OSTs to release all granted space and
avoid EDQUOT. Fix this for PQ - set edquot and qunit to zero,
since appropriate lqe becomes "not enforced".

HPE-bug-id: LUS-10146
Change-Id: I1e08929bae7e1b37b1e8cbbc44859a786b5fb090
Reviewed-on: https://es-gerrit.dev.cray.com/158915/
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45000
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15191 quota: set correct revoke_time 47/45447/4
Sergey Cheremencev [Tue, 12 Oct 2021 15:21:49 +0000 (18:21 +0300)]
LU-15191 quota: set correct revoke_time

When we do qmt_adjust_qunit, there are several lqes
and lqe_revoke_time is set for some of them, it means
appropriate OSTs have been already notified with the
least qunit and there is no chance to free more space.
If a qunit of the current lqe becomes equal to the least
qunit, find an lqe with the minimum(earliest) revoke_time
and set this revoke_time to the current one.

This patch fixes the following case. For example, we have
8 OSTs and 4 MDTs(i.e. 12 slaves) and a pool with just one
OST. Global hard block limit for the user is 50M, and 10M
for this user in a pool. User's usage is 0. As global pool
has 12 slaves it's initial qunit value is 1M, i.e. equal to
the least qunit. At the same time initial qunit value for the
pool with one OST is 4M. When user begins to write, pool's
qunit is decreased to 1M, but lqe_revoke is not set - it
should be set only after sending new qunit to OSTs in
qmt_lvbo_update. However, it won't be send because appropriate
lge_qunit in lqe global array already has the same value.
This problem caused sanity-quota_72 to hang instead of fail
with EDQUOT in test_1_check_write.

HPE-bug-id: LUS-10516
Change-Id: I5878c1e719ae83a69ad5dbc3653717bb1b4de632
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45447
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-10391 uapi: move out kernel only code. 44/44844/4
James Simmons [Fri, 3 Sep 2021 23:22:18 +0000 (19:22 -0400)]
LU-10391 uapi: move out kernel only code.

Userland doesn't use the new IPv6 function in the UAPI nidstr.h.
So move this to the kernel header lib-types.h. Normally this
wouldn't matter but the python wrappers break with the LUTF
project. With the move to Netlink in the future for the user land
API we shouldn't need these functions anyways.

Test-parameters: trivial
Change-Id: I2ac6d102d4575f639573253c21723d62ca08abc2
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44844
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-12347 llite: do not take mod rpc slot for getxattr 51/44151/10
Vladimir Saveliev [Thu, 9 Sep 2021 12:05:24 +0000 (15:05 +0300)]
LU-12347 llite: do not take mod rpc slot for getxattr

The following scenario may lead to client eviction:
clientA                clientB                  MDS
threadA1: write to file F1, get
and hold DoM MDC LDLM lock L1:
   ->cl_io_loop()
    ->cl_io_lock()
     :
     ->mdc_lock_granted()
      ->lock->l_writers++
      [hold ref until write done]

threadA2-A8: create files F2-F8:
   ->ll_file_open()
    ->mdc_enqueue_base()
     ->ldlm_cli_enqueue()
      ->ptlrpc_get_mod_rpc_slot()
      ->ptlrpc_queue_wait()
      [hold RPC slot until create done]

                                              OST(s) in recovery.
                                              MDS waiting on OST(s) to
                                              precreate new objects.

threadA1:
   -> cl_io_start()
    -> __generic_file_aio_write()
     -> file_remove_suid()
      -> ll_xattr_cache_refill()
       -> mdc_xattr_common()
        -> ptlrpc_get_mod_rpc_slot()
        [blocked waiting for RPC slot]

                       threadB1: write file F1,
       enqueue DoM MDC lock L1

                                              MDS sends blocking AST
                                              to clientA for lock L1

ldlm_threadA3: cannot cancel busy lock L1:
   -> ldlm_handle_bl_callback()
   ["Lock L1 referenced, will be cancelled later"]

                                              MDS evicts clientA for
                                              not cancelling lock L1

threadA1: never completes write:
  ->cl_io_end()
   ->cl_io_unlock()
    ->osc_lock_cancel()
     ->lock->l_writers--;

The fix is to add IT_GETXATTR to list of operations which do not
need mod rpc slot.

Tests to illustrate the issue is added.

wait_for_function(): total sleep time (wait) is to be equal to max
when 1 is returned.

Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
HPE-bug-id: LUS-7271
Change-Id: I1b80677df084bda141b9ac58a78b765bd0b14a41
Reviewed-on: https://review.whamcloud.com/44151
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14428 libcfs: separate daemon_list from cfs_trace_data 93/41493/10
Mr NeilBrown [Wed, 10 Feb 2021 22:49:54 +0000 (09:49 +1100)]
LU-14428 libcfs: separate daemon_list from cfs_trace_data

cfs_trace_data provides a fifo for trace messages.  To minimize
locking, there is a separate fifo for each CPU, and even for different
interrupt levels per-cpu.

When a page is remove from the fifo to br written to a file, the page
is added to a "daemon_list".  Trace message on the daemon_list have
already been logged to a file, but can be easily dumped to the console
when a bug occurs.

The daemon_list is always accessed from a single thread at a time, so
the per-CPU facilities for cfs_trace_data are not needed.  However
daemon_list is currently managed per-cpu as part of cfs_trace_data.

This patch moves the daemon_list of pages out to a separate structure
- a simple linked list, protected by cfs_tracefile_sem.

Rather than using a 'cfs_trace_page' to hold linkage information and
content size, we use page->lru for linkage and page->private for
the size of the content in each page.

This is a step towards replacing cfs_trace_data with the Linux
ring_buffer which provides similar functionality with even less
locking.

In the current code, if the daemon which writes trace data to a file
cannot keep up with load, excess pages are moved to the daemon_list
temporarily before being discarded.  With the patch, these page are
simply discarded immediately.
If the daemon thread cannot keep up, that is a configuration problem
and temporarily preserving a few pages is unlikely to really help.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie894f7751cadacb515215f18182163ea5d26e969
Reviewed-on: https://review.whamcloud.com/41493
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15107 mdt: Exclusive create isn't replayed 54/45254/6
Andriy Skulysh [Wed, 16 Sep 2020 11:38:41 +0000 (14:38 +0300)]
LU-15107 mdt: Exclusive create isn't replayed

mdt_finish_open() fails on exclusive create check as
there isn't an infrormation that the file was created.

Set DISP_OPEN_CREATE for exclusive open replay,
as we know that the original request has succeeded.

Change-Id: Idc71db76b757670b91b3bb1718870a018a805ed2
HPE-bug-id: LUS-9358
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/45254
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>