Whamcloud - gitweb
fs/lustre-release.git
2 years agoLU-10391 socklnd: factor out key calculation for ksnd_peers 03/42103/13
Mr NeilBrown [Mon, 9 Mar 2020 03:13:05 +0000 (14:13 +1100)]
LU-10391 socklnd: factor out key calculation for ksnd_peers

The hash_table library requires a "long" to be used as a key.  We
currently provide the nid, which at 64bits is a suitable long on 64bit
hosts, but isn't really correct on 32bit hosts.

When we change to an extend nid (which is 160bits) it will be even
less appropriate.

So create a separate function to compute a 'long' key, and implement
by simply xoring 'long'-sized parts of the nid together.  On a 64bit
machine, this is currently optimized away for lnet_nid_t, but that
will change when we convert to struct lnet_nid.

This new function is placed in lnet-types.h as it will be more
generally useful later.

The hash_table library calls hash_long() on the key, so we don't need
to do anything more interesting than xoring.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I22c59a87c9872bb59a2f47c2a8c57b287ed53ed3
Reviewed-on: https://review.whamcloud.com/42103
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lp_disc_*_nid to struct lnet_nid 20/44620/5
Mr NeilBrown [Tue, 22 Jun 2021 05:24:42 +0000 (15:24 +1000)]
LU-10391 lnet: change lp_disc_*_nid to struct lnet_nid

Change lp_disc_src_nid and lp_disc_dst_nid in struct lnet_peer to
struct lnet_nid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0f127fbc790c0821900d7b8abfa56c1a7de8f944
Reviewed-on: https://review.whamcloud.com/44620
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lp_primary_nid to struct lnet_nid 02/42102/13
Mr NeilBrown [Wed, 18 Aug 2021 20:31:46 +0000 (16:31 -0400)]
LU-10391 lnet: change lp_primary_nid to struct lnet_nid

Change lp_primary_nid in struct lnet_peer to struct lnet_nid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I386e85062257f6f8832ffdf4b9603c0e1c072dae
Reviewed-on: https://review.whamcloud.com/42102
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lpni_nid in lnet_peer_ni to lnet_nid 01/42101/13
Mr NeilBrown [Mon, 6 Apr 2020 06:31:55 +0000 (16:31 +1000)]
LU-10391 lnet: change lpni_nid in lnet_peer_ni to lnet_nid

lpni_nid in 'struct lnet_peer_ni' is converted to 'struct lnet_nid'
and various supporting functions updated.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7a99f758b600a0dd0668edd368663ff65f603486
Reviewed-on: https://review.whamcloud.com/42101
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: add string formating/parsing for IPv6 nids 42/43942/8
Mr NeilBrown [Tue, 8 Jun 2021 04:02:14 +0000 (14:02 +1000)]
LU-10391 lnet: add string formating/parsing for IPv6 nids

New entries for struct netstrfns:
  nf_addr2str_size
  nf_str2addr_size
which accept or report the size of the address in bytes.
New matching functions that can report or parse IPv4 and IPv6
addresses.

New interface - currently unused - libcfs_strnid() which takes a str
and provides a 'struct lnet_nid' with appropriate nid_size.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Idbfc8bb9502192e1dc6a217750e7f4431e3eca4a
Reviewed-on: https://review.whamcloud.com/43942
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: introduce struct lnet_nid 00/42100/26
Mr NeilBrown [Fri, 20 Aug 2021 14:17:35 +0000 (10:17 -0400)]
LU-10391 lnet: introduce struct lnet_nid

LNet nids are currently limited to 4-bytes for addresses.
This excludes the use of IPv6.

In order to support IPv6, introduce 'struct lnet_nid' which can hold
up to 128bit address and is extensible, and deprecate 'lnet_nid_t'.
lnet_nid_it will eventually be removed.  Where lnet_nid_t is often
passed around by value, 'struct lnet_nid' will normally be passed
around by reference as it is over twice as large.

The net_type field, which currently has value up to 16, is now limited
to 0-254 with 255 being used as a wildcard.  The most significant byte
is now a size field which gives the size of the whole nid minus 8.  So
zero is correct for current nids with 4-byte addresses.

Where we still need to use 4-byte-address nids, we will use names
containing "nid4".  So "nid4" is a lnet_nid_t when "nid" is a struct
lnet_nid.  lnet_nid_to_nid4 converts a 'struct lnet_nid' to an
lnet_nid_t.

While lnet_nid_t is stored and often transmitted in host-endian format
(and possibly byte-swapped on receipt), 'struct lnet_nid' is always
stored in network-byte-order (i.e.  big-endian).  This is more common
approach for network addresses.

In this first instance, 'struct lnet_nid' is used for ni_nid in
'struct lnet_ni', and related support functions.

In particular libcfs_nidstr() is introduced which parallels
libcfs_nid2str(), but takes 'struct lnet_nid'.

In cases were we need to have similar functions for old and new style
nid, the new function is introduced with a slightly different name,
such as libcfs_nid2str above, or LNET_NID_NET (like LNET_NIDNET).
It will be confusing having both, but the plan is to remove the old
names as soon as practical.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4dcf1bab856621915b6535958d77cdde89105d96
Reviewed-on: https://review.whamcloud.com/42100
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14930 mdt: abort_recov_mdt shouldn't abort client recovery 10/44610/2
Mikhail Pershin [Wed, 11 Aug 2021 14:30:48 +0000 (17:30 +0300)]
LU-14930 mdt: abort_recov_mdt shouldn't abort client recovery

When abort_recov_mdt is set to abort MDT-MDT recovery then
abort_recovery flag is set too inside target_stop_recovery_thread()
call, that causes not just MDT-MDT recovery abort but aborts
also clients/MDT recovery.

Fixes: dd9e79b64d ("LU-12546 mdt: abort recovery between MDTs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ibda05e91a2da90156e2b6c9fdcb2169cdbd50fe4
Reviewed-on: https://review.whamcloud.com/44610
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14093 tests: silence gcc10 error for badarea_io 70/44670/4
James Simmons [Mon, 16 Aug 2021 17:02:15 +0000 (13:02 -0400)]
LU-14093 tests: silence gcc10 error for badarea_io

With gcc10 badarea_io will fail to build with the following error.

badarea_io.c: In function ‘main’:
badarea_io.c:59:7: error: ‘write’ reading 2097152 bytes from a
                           region of size 4 [-Werror=stringop-overflow=]
   59 |  rc = write(fd, &fd, 2UL*1024*1024);
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Talking to Oleg see stated this is the done this way on purpose.
So instead of 'fixing' the issue in this case we silence the gcc
warning.

Test-Parameters: trivial
Test-Parameters: env=ONLY=133f,133g testlist=sanity
Change-Id: Iee79c7988cc209fd099c23c38a8bd7df96015b05
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14912 obdclass: prefer T10 checksum if the target supports it 57/44657/2
Li Dongyang [Fri, 13 Aug 2021 08:58:46 +0000 (18:58 +1000)]
LU-14912 obdclass: prefer T10 checksum if the target supports it

If the target actually has T10PI support, we prefer to use that
T10 checksum even it's not the fastest on the client, given
checksum_type is not explicitly set.

Change-Id: If91217881fcadbc84d1e360e65648344f5ac2447
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/44657
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
2 years agoLU-14928 mgs: allow md target re-register 94/44594/5
Alexander Zarochentsev [Sun, 30 May 2021 13:43:05 +0000 (16:43 +0300)]
LU-14928 mgs: allow md target re-register

In a DNE system, it is not safe to do writeconf of
a MD target and attempt to mount (and re-register) it again,
as it creates a weird MDT-MDT osp devices like
fsname-MDT0001-osp-MDT0001" and makes the system non-functioning.
The fix doesn't allow creation of illegal devices.

HPE-bug-id: LUS-10098
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I698ee6d70ac96f54eaec57b5c5fe553d130ba011
Reviewed-on: https://review.whamcloud.com/44594
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14926 utils: print unlink and setattr recs in llog_reader 91/44591/5
Alexander Zarochentsev [Fri, 16 Jul 2021 19:16:29 +0000 (22:16 +0300)]
LU-14926 utils: print unlink and setattr recs in llog_reader

Enhance llog_reader to print unlink and setattr llog records
correctly.

HPE-bug-id: LUS-10220
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I7b44f65c976459d143521185a807939524f67fa2
Reviewed-on: https://review.whamcloud.com/44591
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14924 osd-ldiskfs: fix T10PI verify/generate_fn 48/44548/4
Li Dongyang [Tue, 10 Aug 2021 13:20:13 +0000 (23:20 +1000)]
LU-14924 osd-ldiskfs: fix T10PI verify/generate_fn

We are making wrong assumptions in the verify/generate_fn
of T10PI.

Consider this case: we have 4 pages lnb[0-3] in osd_iobuf.
lnb[2] is mapped to a hole, so it won't be added to bio.
If lnb[3] happens to be contiguous after lnb[1], lnb[3] will
be added to bio, with a bi_idx of 2.
In the verify/generate_fn, we work out which niobuf_local
to feed the guard tags to using bi_idx and obp_start_page_idx
and we will end up with wrong niobuf and set the guard tags
for lnb[2].

Contiguous blocks in bio doesn't necessarily mean we are looking
at contiguous niobuf_local/lnb in osd_iobuf->dr_lnbs

Test-Parameters: env=ONLY=77n testlist=sanity
Change-Id: I1ea1b6498692044e680c8754cd31e2c2b7bc9539
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/44548
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14160 tests: fix fsx logdump to fit in 80 chars 10/44510/3
Andreas Dilger [Thu, 5 Aug 2021 20:45:46 +0000 (14:45 -0600)]
LU-14160 tests: fix fsx logdump to fit in 80 chars

Fix fsx logdump fallocate/truncate lines to fit within 80 columns.
Remove spurious leading 0 for every operation length.

Test-Parameters: trivial testlist=sanityn env=ONLY=16
Fixes: cb037f305c64 ("LU-14160 fallocate: Add punch mode to fallocate")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I93460b62be8611926e620241232d886dee3ebbe5
Reviewed-on: https://review.whamcloud.com/44510
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14773 tests: quiet down some verbose messages 34/44034/5
Andreas Dilger [Fri, 18 Jun 2021 21:35:29 +0000 (15:35 -0600)]
LU-14773 tests: quiet down some verbose messages

Don't print anything into the test logs for normal background
operations that are run as part of run_one(), so that they
don't clutter the test output with repeated/useless messages.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib6a49fc268e4cd0ad92c71a391865ce2d73ebbe5
Reviewed-on: https://review.whamcloud.com/44034
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14321 tests: create PFL file in sanityn 51b 27/44027/4
James Nunez [Thu, 17 Jun 2021 23:25:55 +0000 (17:25 -0600)]
LU-14321 tests: create PFL file in sanityn 51b

sanityn test 51b was modified to integrate statx API in
Lustre version 2.13.54.  When we run version interop testing
with servers less than 2.13.54 and later clients, the test
will fail.

We should modify the test to create a PFL file without the
'extension-size' lfs setstripe option which will allow this
test to run with servers less than 2.13.54.

Fixes: 3f7853b31ef6 ("LU-10934 llite: integrate statx() API with Lustre")
Test-Parameters: trivial
Test-Parameters: serverversion=2.12.7 serverdistro=el7.9 env=ONLY=51b testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ic3feb72771aa2db050b792159175624260e71f5b
Reviewed-on: https://review.whamcloud.com/44027
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
2 years agoLU-12848 tests: link succeded to an ophan remote object 91/35991/14
Alexander Zarochentsev [Mon, 12 Aug 2019 20:59:05 +0000 (23:59 +0300)]
LU-12848 tests: link succeded to an ophan remote object

An open file gets unlinked by rename,
at the same time a cross-mdt link is able to create a name
for a dying object. That causes a file system corruption
seeing as a failed attempt to remove the test dir, also
e2fsck would see an unconnected inode.

Cray-bug-id: LUS-6208
Test-Parameters: mdtcount=2 envdefinitions=ONLY=111 testlist=sanityn
Change-Id: Ic1fde278e5f4b53eaf5560ab50fe460d8c7f7dc3
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/35991
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11303 quota: enforce block quota for chgrp 96/33996/17
Hongchao Zhang [Fri, 2 Apr 2021 06:53:59 +0000 (14:53 +0800)]
LU-11303 quota: enforce block quota for chgrp

In patch https://review.whamcloud.com/30146 "LU-5152 quota: enforce
block quota for chgrp", problems were introduced due to synchronous
requests from the MDS to the OSS to change the quota assignment of
files during chgrp operations. However, in some cases, the OSTs are
themselves out of grant and may send a quota request to the MDS,
which may result in a deadlock. Another issue is the slow performance
caused by the synchronous operation between MDT and OSTs.

This patch drops the synchronous RPC requirement of the original
patch #30146 to avoid this problem.

Previously, problems in quota tracking related to chgrp were introduced
due to synchronous RPCs from the MDS to the OSS when changing the group
ownership of objects for quota tracking since
Fixes: 8a71fd5061b ("LU-5152 quota: enforce block quota for chgrp")

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I40556b9e8a0628eb18aa806d2f6b3dfb9b53e874
Reviewed-on: https://review.whamcloud.com/33996
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoNew tag 2.14.54 2.14.54 v2_14_54
Oleg Drokin [Wed, 18 Aug 2021 14:31:36 +0000 (10:31 -0400)]
New tag 2.14.54

Change-Id: I062c9dc76585f42edfa78108f286824e75badf8c
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14093 lutf: fix build with gcc10 84/44484/7
James Simmons [Wed, 4 Aug 2021 13:39:19 +0000 (09:39 -0400)]
LU-14093 lutf: fix build with gcc10

The new LUTF code has build issues with gcc10. I see the following
build errors.

ld: lutf-lutf_listener.o:lutf.h:88: multiple definition of `g_lutf_cfg'
ld: lutf-lutf_listener.o:lutf.h:22: multiple definition of `debugtimestr'
ld: lutf-lutf_listener.o:lutf.h:21: multiple definition of `di'
ld: lutf-lutf_listener.o:lutf.h:20: multiple definition of `debugnow'

In function ‘snprintf’,
    inlined from ‘python_run_interactive_shell’ at lutf_python.c:45:2:
stdio2.h:71:10: error: ‘%s’ directive argument is null [-Werror=format-truncation=]
   71 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   72 |        __glibc_objsize (__s), __fmt,
      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   73 |        __va_arg_pack ());
      |        ~~~~~~~~~~~~~~~~~

This patch resolves these warnings. Without this patch LUTF will
not build on Ubuntu 20 LTS.

Test-Parameters: trivial
Change-Id: Ie3c99f8c6cf2f5de583dc95a0dc63fcde1aa6ffd
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44484
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14903 doc: update lfs-setdirstripe man page 81/44481/2
Lai Siyao [Mon, 2 Aug 2021 11:55:12 +0000 (07:55 -0400)]
LU-14903 doc: update lfs-setdirstripe man page

Update lfs-setdirstripe man page to reflect the change of
filesystem-wide default directory layout.

Test-parameters: trivial

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I1e7818679e057add4747565a2fc850e1857cd7b0
Reviewed-on: https://review.whamcloud.com/44481
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoLU-14899 ldiskfs: Add 5.4.136 mainline kernel support 50/44450/2
Oleg Drokin [Sat, 31 Jul 2021 04:55:40 +0000 (00:55 -0400)]
LU-14899 ldiskfs: Add 5.4.136 mainline kernel support

The changes likely appeared in an earlier release
that we may also track down and update to.

Test-Parameters: trivial
Change-Id: I92125087650109b8cc8a968b2fd95ba5f8e7f998
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44450
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
2 years agoLU-12815 socklnd: set conns_per_peer based on link speed 17/44417/4
Serguei Smirnov [Wed, 28 Jul 2021 21:47:39 +0000 (14:47 -0700)]
LU-12815 socklnd: set conns_per_peer based on link speed

Specifying conns_per_peer=0 for a ni is now used to set
the conns_per_peer as a function of the corresponding link speed
as follows:
conns_per_peer = (ilog2(Gbps) / 2 + 1)

Listed below are the resulting defaults for common link speeds:
100Gbps, 200Gbps -> 4
        50Gbps  -> 3
        5Gbps, 10Gbps  -> 2
        less than 4Gbps  -> 1

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ief2b33a796c180d8669bd5796b3e35ec748423a5
Reviewed-on: https://review.whamcloud.com/44417
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7] 76/44376/3
Jian Yu [Thu, 22 Jul 2021 07:26:50 +0000 (00:26 -0700)]
LU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7]

Update RHEL7.9 kernel to 3.10.0-1160.36.2.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: Ie2898b1df28c8b99ea4099e94baafe388c6aa626
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44376
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14865 utils: llog_reader.c printf type mismatch 46/44346/5
Gian-Carlo DeFazio [Tue, 20 Jul 2021 00:30:36 +0000 (17:30 -0700)]
LU-14865 utils: llog_reader.c printf type mismatch

Add (unsigned long long) cast to results of
__le64_to_cpu so that it matches the formatting (%llu)
of the enclosing printf call.

Build log message:
"llog_reader.c:887:9: error: format '%llu' expects
argument of type 'long long unsigned int', but
argument 3 has type '__u64' [-Werror=format=]"

Test-Parameters: trivial
Fixes: 9962d6f84db5 LU-14617 utils: llog_reader updatelog support
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I9549e0a0bd21727dfcc42992b693bc39a779e1a1
Reviewed-on: https://review.whamcloud.com/44346
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-9859 lnet: fold lprocfs_call_handler functionality into lnet_debugfs_* 09/44309/4
Mr. NeilBrown [Wed, 4 Aug 2021 17:27:29 +0000 (13:27 -0400)]
LU-9859 lnet: fold lprocfs_call_handler functionality into lnet_debugfs_*

The calling convention for ->proc_handler is rather clumsy,
as a comment in fs/procfs/proc_sysctl.c confirms.
lustre has copied this convention to lnet_debugfs_{read,write},
and then provided a wrapper for handlers - lprocfs_call_handler -
to work around the clumsiness.

It is cleaner to just fold the functionality of lprocfs_call_handler()
into lnet_debugfs_* and let them call the final handler directly.

If these files were ever moved to /proc/sys (which seems unlikely) the
handling in fs/procfs/proc_sysctl.c would need to be fixed to, but
that would not be a bad thing.

So modify all the functions that did use the wrapper to not need it
now that a more sane calling convention is available.

Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Change-Id: I548ed6a3179cdb7cd5c024febd3fee4709285a82
Reviewed-on: https://review.whamcloud.com/44309
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14787 libcfs: Proved an abstraction for AS_EXITING 70/44070/6
Shaun Tancheff [Thu, 22 Jul 2021 07:31:30 +0000 (02:31 -0500)]
LU-14787 libcfs: Proved an abstraction for AS_EXITING

Linux kernel v3.14-7405-g91b0abe36a7b added AS_EXITING flag
AS_EXITING flag is set while address_space mapping is exiting.

Provide an abstraction mapping_clear_exiting() to clear
the AS_EXITING flag. This balances the kernel mapping_set_existing()
and is used for older kernels when enum mapping_flags does
not include AS_EXITING.

HPE-bug-id: LUS-9977
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib3101b7e3eb8a7fcfd0012ac27367f1e65537f5d
Reviewed-on: https://review.whamcloud.com/44070
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1] 37/44037/2
Jian Yu [Sat, 19 Jun 2021 00:26:07 +0000 (17:26 -0700)]
LU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1]

Update SLES12 SP5 kernel to 4.12.14-122.74.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: I98952c097b14c68f744a570e5558fb21d9392ad2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44037
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14773 tests: skip check_network() on working node 33/44033/4
Andreas Dilger [Fri, 18 Jun 2021 20:55:51 +0000 (14:55 -0600)]
LU-14773 tests: skip check_network() on working node

Don't call check_network() (which can take several seconds per node)
if the get_param command ran successfully on all of the nodes.  The
get_param success implies the connection to the remote nodes works
properly, and completes more quickly.

For consistency with previous behavior, still call check_network() if
get_param didn't return any output, since the modules may be unloaded.

Remove some extra visual clutter from every subtest.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6a11cf8a1a6b43bebc3ff8f5506e1faac13ebbe5
Reviewed-on: https://review.whamcloud.com/44033
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14668 lnet: Lock primary NID logic 63/43563/5
Amir Shehata [Wed, 5 May 2021 18:35:06 +0000 (11:35 -0700)]
LU-14668 lnet: Lock primary NID logic

If a peer is created by Lustre make sure to lock that peer's
primary NID. This peer can be discovered in the background.
There is no need to block until discovery is complete, as Lustre
can continue on with the primary NID it provided.

Discovery will populate the peer with other interfaces the peer has
but will not change the peer's primary NID. It can also delete
peer's NIDs which Lustre told it about (not the Primary NID).

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I677b8e01fc89a42128327645861ca6cfba4c1b1a
Reviewed-on: https://review.whamcloud.com/43563
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14668 lnet: peer state to lock primary nid 62/43562/5
Amir Shehata [Wed, 5 May 2021 01:20:54 +0000 (18:20 -0700)]
LU-14668 lnet: peer state to lock primary nid

Introduce the following two peer states:

LNET_PEER_LOCK_PRIMARY, set by Lustre to lock the primary NID
of a peer to the NID Lustre is configured with

LNET_PEER_BAD_CONFIG, set by LNet if Lustre attempts to set
a peer's Primary NID to a NID used as the primary NID of another
peer

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I8c55e90ad2abd083c2fc902a04d4cd06a3412bfa
Reviewed-on: https://review.whamcloud.com/43562
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14661 obdclass: Add peer/peer NI when processing llog 10/43510/6
Chris Horn [Thu, 3 Sep 2020 20:06:08 +0000 (15:06 -0500)]
LU-14661 obdclass: Add peer/peer NI when processing llog

Construct peers when processing the config log so that LNet has
complete information about peer info stored in the config log.

These are "temporary" peers which can be overwritten by discovery.

In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to take
the LNet API mutex when adding/modifying peers. We don't want to take
the mutex while a spin lock is already being held, so drop the spin
lock prior to calling class_add_nids_to_uuid().

HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie0e35434c9b76f917c1448064c5217c821b1ad87
Reviewed-on: https://review.whamcloud.com/43510
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14661 lnet: Provide kernel API for adding peers 09/43509/5
Chris Horn [Wed, 2 Sep 2020 20:07:25 +0000 (15:07 -0500)]
LU-14661 lnet: Provide kernel API for adding peers

Implement LNetAddPeer() API to allow other kernel modules to add
peers to LNet.

Peers created via this API are not marked as having been configured
by DLC. As such, they can be overwritten by discovery.

Test-Parameters: trivial
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibb057f702ea29d60233fbd1680d8caec98064d5d
Reviewed-on: https://review.whamcloud.com/43509
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14531 osd: serialize access to object vs object destroy 33/43233/18
Alex Zhuravlev [Thu, 18 Mar 2021 08:43:06 +0000 (11:43 +0300)]
LU-14531 osd: serialize access to object vs object destroy

in osd-zfs as ZFS doesn't provide an internal mechanism for this.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5f25710a5cf1568f124733a15e77a37ffcb55434
Reviewed-on: https://review.whamcloud.com/43233
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12815 socklnd: allow dynamic setting of conns_per_peer 63/41463/13
Serguei Smirnov [Mon, 2 Aug 2021 14:48:35 +0000 (10:48 -0400)]
LU-12815 socklnd: allow dynamic setting of conns_per_peer

Modify lnetctl and associated code to allow dynamic setting
of conns_per_peer lnd parameter per ni.

The parameter can be set for a specific active nid:
        lnetctl net set --nid 192.168.122.10@tcp --conns-per-peer=4

Or when adding a new net, taking effect on the new nid:
        lnetctl net add --net tcp --if eth0 --conns-per-peer=1

By default, conns_per_peer value specified as the module parameter
shall be used.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I11625b9ad61f0311c294001a38b7855465491aaf
Reviewed-on: https://review.whamcloud.com/41463
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14093 mgc: rework mgc_apply_recover_logs() for gcc10 84/40484/8
Alex Zhuravlev [Tue, 3 Aug 2021 14:15:10 +0000 (10:15 -0400)]
LU-14093 mgc: rework mgc_apply_recover_logs() for gcc10

rework mgc_apply_recover_logs() to use a separate buffer of
appropriate size so that gcc10 doesn't complain:
mgc_request.c:1506:24: error: argument 4 may overlap destination
        object [-Werror=restrict]
 1506 |        pos += sprintf(obdname + pos, "-%s-%s", cname, inst);

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ice863b412475e53705dc6523ab30ba613244bd90
Reviewed-on: https://review.whamcloud.com/40484
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-6142 tests: remove iam_ut binary 09/44509/3
Andreas Dilger [Thu, 5 Aug 2021 20:21:44 +0000 (14:21 -0600)]
LU-6142 tests: remove iam_ut binary

Remove iam_ut binary that was incorrectly committed many years ago.

Test-Parameters: trivial
Fixes: 6e679230f2f5 ("LU-6142 tests: Remove file iam_ut.c")
Fixes: d2d56f38da01 ("make HEAD from b_post_cmd3")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2c254d990a3f07cad4feb7969e646d856b3ebbe5
Reviewed-on: https://review.whamcloud.com/44509
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14876 out: don't connect to busy MDS-MDS export 90/44390/5
Mikhail Pershin [Wed, 21 Jul 2021 15:14:01 +0000 (18:14 +0300)]
LU-14876 out: don't connect to busy MDS-MDS export

MDS-MDS connection is missing check for busy requests upon
reconnect, so resent can be executed concurrently with
original request.

- in ptlrpc_server_check_resend_in_progress() remove exception
  for bulk requests, they can be compared by XID nowadays.
  This prevents OUT requests vs resent execution as well.
- fix messages in target_handle_connect() to report correct
  information about connection details
- in out_handle() check for last_xid only once per OUT_UPDATE
- test 110m is added to recovery-small to reproduce the issue

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I2ad183674d59a2cdeab0037bd8551c607b10ffeb
Reviewed-on: https://review.whamcloud.com/44390
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14798 lustre: Support RDMA only pages 11/44111/2
Amir Shehata [Thu, 6 Feb 2020 04:23:20 +0000 (20:23 -0800)]
LU-14798 lustre: Support RDMA only pages

Some memory architectures and CPU-offload cards with
on-board memory do not map data pages into the CPU
address space. Allow RDMA of data directly into those
pages without accessing contents.

Therefore, made changes to prevent doing checksum on
these type of pages.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I189c34893ffa500ed275f2a1f79e8fb817a2489d
lustre-change: https://review.whamcloud.com/37454
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44111
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14798 lnet: add LNet GPU Direct Support 10/44110/2
Amir Shehata [Thu, 6 Feb 2020 03:14:17 +0000 (19:14 -0800)]
LU-14798 lnet: add LNet GPU Direct Support

This patch exports registration/unregistration functions
which are called by the NVFS module to let the LND know
that it can call into the NVFS module to do RDMA mapping
of GPU shadow pages.

GPU priority is considered during NI selection.

Less than 4K writes are always RDMAed if the rdma source is
the gpu device

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I2bfdbdd5fe3b8536e616ab442d18deace6756d57
lustre-change: https://review.whamcloud.com/37368
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44110
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14893 lctl: check user for changelog_deregister 32/44432/3
Emoly Liu [Fri, 30 Jul 2021 08:13:12 +0000 (16:13 +0800)]
LU-14893 lctl: check user for changelog_deregister

If no user is specified for "lctl changelog_deregister", usage
should be printed correctly.
Also, sanity.sh test_106e is modified to verify this fix.

Fixes: a15eb4f13224e ("LU-13055 mdd: per-user changelog names and mask")
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ia7f1b18e82f6b4174b9435cd67aba5f591d43ce1
Reviewed-on: https://review.whamcloud.com/44432
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14881 libcfs: Complete testing for tcp_sock_set_* 74/44374/3
Shaun Tancheff [Thu, 22 Jul 2021 08:58:44 +0000 (03:58 -0500)]
LU-14881 libcfs: Complete testing for tcp_sock_set_*

Linux commits:
  v5.7-rc6-2504-gddd061b8daed
  tcp: add tcp_sock_set_quickack

  v5.7-rc6-2508-gd41ecaac903c
  tcp: add tcp_sock_set_keepintvl

  v5.7-rc6-2509-g480aeb9639d6
  tcp: add tcp_sock_set_keepcnt

Introduced a series of helper functions that may be
back ported individually.

Test-Parameters: trivial
Fixes: 99d9638d6c ("LU-13783 libcfs: support removal of kernel_setsockopt()")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4fce67b801979ec7857265b6bd0370c05737e268
Reviewed-on: https://review.whamcloud.com/44374
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14413 test: test for overstriping for sanity 27M 40/44340/8
James Simmons [Wed, 28 Jul 2021 00:10:29 +0000 (20:10 -0400)]
LU-14413 test: test for overstriping for sanity 27M

The introduction of sanity 27M broke interop with 2.12 LTS since
over striping doesn't exist in that version. Adjust the test to
use over striping if the client supports it, otherwise just use
traditional striping.

Test-Parameters: trivial testlist=sanity env=ONLY=27M
Change-Id: I2d788a116cbb749a83d6cec36f97d06533b32421
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44340
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14740 quota: reject invalid project id on server side 39/44339/2
Wang Shilong [Mon, 19 Jul 2021 07:14:43 +0000 (15:14 +0800)]
LU-14740 quota: reject invalid project id on server side

do sanity check before transfer project ID, reject invalid
project id if it comes from some older clients.

Test-parameters: trivial testlist=sanity-quota

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: If89e320c7808d188e615f5f0923c2322774b2ceb
Reviewed-on: https://review.whamcloud.com/44339
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-8066 obdclass: move lu_ref to debugfs 11/44311/6
James Simmons [Tue, 20 Jul 2021 20:14:09 +0000 (16:14 -0400)]
LU-8066 obdclass: move lu_ref to debugfs

A special procfs file is created for lu_ref debugging. Lets move
this to debugfs where it belongs.

Also fixed a missed USE_LU_REF due to landing order as well as
a build fix.

Fixes: dfe2d225b86 ("LU-13799 clio: Implement real list splice")
Change-Id: I33646a87adfcabc5a5f214832953b2444e7aaf0a
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44311
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14790 lnet: Reflect ni_fatal in NI status 72/44072/2
Chris Horn [Thu, 24 Jun 2021 17:16:46 +0000 (12:16 -0500)]
LU-14790 lnet: Reflect ni_fatal in NI status

If the ni_fatal_error_on flag is set on an NI then that NI should be
considered down.

HPE-bug-id: LUS-10167
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I201bda7e06da1fb1cc23db70ce0cfa3118635d0f
Reviewed-on: https://review.whamcloud.com/44072
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14694 mdt: do not remove orphans at umount 83/43783/20
Alex Zhuravlev [Tue, 25 May 2021 15:39:14 +0000 (18:39 +0300)]
LU-14694 mdt: do not remove orphans at umount

as it's very likely that another MDT is being umounted as well
and such a removal can get stuck if the object being removed
is a striped directory.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0417b1b4447887e166c144605bbfa3249126eacd
Reviewed-on: https://review.whamcloud.com/43783
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9859 libcfs: discard cfs_cap_t, use kernel_cap_t 71/43171/9
Mr. NeilBrown [Wed, 14 Jul 2021 16:15:26 +0000 (12:15 -0400)]
LU-9859 libcfs: discard cfs_cap_t, use kernel_cap_t

lustre only sends 32bits of capabilities in on-the-wire RPC calls.
It current strips off higher bits and uses a 32bit cfs_cap_t
throughout.
Though there is a small memory cost, it is cleaner to use
kernel_cap_t throughout and only truncate when marshalling
data for RPC calls.

So this patch replaces cfs_cap_t with kernel_cap_t throughout,
and where a cfs_cap_t was previous stored in a __u32, we now
store cap.cap[0] instead.

With this, we can remove include/linux/libcfs/curproc.h

Linux-commit: 18f92a6e3d6bd00941ddfb5837835348f72d39dc

Change-Id: If7dd7a16c218dfc0d520e189f021ed6bda3b93fd
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/43171
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10973 lnet: LUTF Python infra 87/38087/50
Amir Shehata [Wed, 25 Mar 2020 02:23:43 +0000 (19:23 -0700)]
LU-10973 lnet: LUTF Python infra

Added the python LUTF infrastructure. The python infrastructure
provides the core LUTF feature set. The tests-infra is lnet
specific infrastructure to be used by LUTF test suites.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I1d0336606625424880f1b64b1dd296d4c7ed85ea
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38087
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10973 lnet: LUTF infrastructure updates 77/44177/3
Amir Shehata [Mon, 5 Jul 2021 18:17:16 +0000 (11:17 -0700)]
LU-10973 lnet: LUTF infrastructure updates

Fix Agent management
Handle python failures properly.
Change default location for temporary files to be in /tmp/lutf

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4e37b6226dfa12de4b7a1f5bfd87f84e91ee1dda
Reviewed-on: https://review.whamcloud.com/44177
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-6142 lustre: use list_first_entry() in lustre subdirectory. 38/44338/2
Mr. NeilBrown [Sun, 18 Jul 2021 12:57:33 +0000 (08:57 -0400)]
LU-6142 lustre: use list_first_entry() in lustre subdirectory.

Convert
  list_entry(foo->next .....)
to
  list_first_entry(foo, ....)

in 'lustre'

In several cases the call is combined with
a list_empty() test and list_first_entry_or_null() is used

Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Change-Id: I27b8b55cac2cfeaf95bb66930958c49ad422156e
Reviewed-on: https://review.whamcloud.com/44338
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14382 mdt: implement fallocate in MDC/MDT 18/41418/11
Mikhail Pershin [Mon, 1 Feb 2021 21:16:17 +0000 (00:16 +0300)]
LU-14382 mdt: implement fallocate in MDC/MDT

- add CLIO fallocate() handling in MDC
- implement FALLOCATE RPC handling at MDT side
- update test group 150 in sanity to work with
  sanity-dom.sh test

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I46c25c6c7fd72fe14d558b594ecc63c2a3ad81b2
Reviewed-on: https://review.whamcloud.com/41418
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14844 tests: make sure mgc_requeue_timeout_min exist. 15/44215/13
James Simmons [Wed, 14 Jul 2021 17:07:59 +0000 (13:07 -0400)]
LU-14844 tests: make sure mgc_requeue_timeout_min exist.

The module parameter mgc_requeue_timeout_min was introduced to reduce
testing times. Currently the test framework always attempts to set this
value but it doesn't exist in earlier Lustre versions which breaks
interop testing. Set the module parameter only if it exist.

Change-Id: I64f62e3d6e2faeba99ced98363d241083f95d92e
Fixes: 04b2da6180d ("LU-14516 mgc: configurable wait-to-reprocess time")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44215
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14900 tests: do not fail if kmemleak tunable is not writable 51/44451/3
Oleg Drokin [Sat, 31 Jul 2021 05:43:58 +0000 (01:43 -0400)]
LU-14900 tests: do not fail if kmemleak tunable is not writable

Change-Id: Id77430f1e8ff7a8fda6538211a0d36bbe973a889
Test-Parameters: trivial
Fixes: 15c0a21ea9 ("tests: Add kmemleak awareness to test-framework")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44451
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 years agoLU-14880 libcfs: Use crypto/sha2.h if available 73/44373/4
Shaun Tancheff [Thu, 22 Jul 2021 07:45:37 +0000 (02:45 -0500)]
LU-14880 libcfs: Use crypto/sha2.h if available

As of Linux commit a24d22b225ce158651378869a6b88105c4bdb887
   crypto: sha - split sha.h into sha1.h and sha2.h

sha.h is removed and sha2.h or sha3.h is preferred.

Test-Parameters: trivial
Fixes: a813e81870 ("LU-12275 sec: add llcrypt as file encryption library")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iead5e1cb23e79a400da3cbfeb4c35c834e821d62
Reviewed-on: https://review.whamcloud.com/44373
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14093 gss: gcc10 fixes for GSS 63/44363/4
James Simmons [Wed, 21 Jul 2021 16:41:10 +0000 (12:41 -0400)]
LU-14093 gss: gcc10 fixes for GSS

Building with gcc10 reports the following issues when building
GSS:

gss_util.h:37: multiple definition of `this_realm';
gssd.h:73: multiple definition of `clnt_list';
svcgssd.h:38: multiple definition of `krb_enabled';

Properly scope these variables.

Change-Id: I05fc298fb90d67314c6963273688c2577099188a
Test-Parameters: env=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44363
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13299 lnet: add "stats reset" to lnetctl 50/44150/2
Cyril Bordage [Tue, 6 Jul 2021 12:26:09 +0000 (14:26 +0200)]
LU-13299 lnet: add "stats reset" to lnetctl

This new command resets stats shown by "lnetctl stats show". It could
be useful when debugging connectivity issues, by making easier the
process to detect the changes in stats from the clean state rather
than on top of historical values.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I4195a862fa5e04d96ac4c2b1509b625c90fbb579
Reviewed-on: https://review.whamcloud.com/44150
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14806 o2iblnd: clear fatal error on successful failover 39/44139/4
Serguei Smirnov [Mon, 5 Jul 2021 18:23:33 +0000 (11:23 -0700)]
LU-14806 o2iblnd: clear fatal error on successful failover

In IB bonding configuration link down event causes fatal error
flag to be set on the bonded interface so it is not selected by
LNet for tx, e.g. when just one of the two cables is pulled.
This change allows for the interface status to be restored on
successful failover.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ifd55b141e73d01a187c02ede3f021f0eab18e0bb
Reviewed-on: https://review.whamcloud.com/44139
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14792 llite: enable filesystem-wide default LMV 90/44090/17
Lai Siyao [Mon, 21 Jun 2021 03:52:01 +0000 (11:52 +0800)]
LU-14792 llite: enable filesystem-wide default LMV

This change includes three parts:
1. save dir depth to ROOT after lookup on client side.
2. once space balanced default LMV is set on ROOT, and
   max-inherit/max-inherit-rr is unlimited or not less than directory
   depth, new directory will be created in QOS or roundrobin mode.
3. set ROOT default LMV max-inherit unlimited, and max-inherit-rr to
   3, and increase the ratio to create subdirectory on local MDT with
   the directory depth to ROOT, so that new directories will be
   created by space usage, and the deeper it's located it's more
   likely to create on local MDTs; and the top 3 layer will be created
   in roundrobin mode if system is balanced.

Set default LMV in mkdir_on_mdt() to make sure its subdirectories are
created on the same MDT. Add sanity 413d.

Create a test directory on MDT0 for pjdfstest, because cross-MDT
rename of symlink will migrate symlink to target MDT, which will cause
inode change (LU-11631).

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib3a133ac99655ca04443b9498e6618033f6b88b9
Reviewed-on: https://review.whamcloud.com/44090
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14621 mdd: fix lock-tx order in mdd_xattr_merge() 66/43366/15
Alex Zhuravlev [Mon, 19 Apr 2021 05:42:26 +0000 (08:42 +0300)]
LU-14621 mdd: fix lock-tx order in mdd_xattr_merge()

to follow common transaction-first-then-locks rule.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I437c919e498781b8d5dc653bf90aac799df4882a
Reviewed-on: https://review.whamcloud.com/43366
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13417 mdd: set default LMV on ROOT 53/38553/34
Lai Siyao [Wed, 28 Apr 2021 15:02:23 +0000 (23:02 +0800)]
LU-13417 mdd: set default LMV on ROOT

To balance MDT usage, set default LMV on ROOT if it's not set. The
default stripe offset is "-1", and default stripe count is "1". Then
directory created by "mkdir" under ROOT will be scattered on all MDTs
by usage.

Add sanity 0e.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I7a6c752256225b8d065b2c304c4725268df28045
Reviewed-on: https://review.whamcloud.com/38553
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoNew tag 2.14.53 2.14.53 v2_14_53
Oleg Drokin [Fri, 30 Jul 2021 19:20:13 +0000 (15:20 -0400)]
New tag 2.14.53

Change-Id: Idff781cb6333d4f0af90c0e729f3cd0231022a5a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag 13/40813/4
Qian Yingjin [Tue, 1 Dec 2020 08:22:08 +0000 (16:22 +0800)]
LU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag

The upcoming new feature PCC-RO is combined with FLR and extend
the on-disk data strucutre 'enum lov_comp_md_flags' for layout
components. It adds a new layout flag: LCM_FL_PCC_RDONLY.

enum lov_comp_md_flags {
LCM_FL_NONE = 0x0,
LCM_FL_RDONLY = 0x1,
LCM_FL_WRITE_PENDING = 0x2,
LCM_FL_SYNC_PENDING = 0x3,
LCM_FL_PCC_RDONLY = 0x8,
LCM_FL_FLR_MASK         = 0xB,
};

The LCM_FL_PCC_RDONLY flag, which is dedicated for PCC-RO, is
different from LCM_FL_RDONLY.
A PCC-RO cached file could be in the state:
- LCM_FL_PCC_RDONLY | LCM_FL_RDONLY: it means that all FLR
  components are synced and in up-to-date state. The replicated
  file is on read-only state. And then one client attaches the
  file into the PCC backend with PCC-RO mode.
- LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING: it means the file was
  once modified, the data content of layout components are not
  synced. MDT has already picked a promary replica and marked
  other components as STALE. At this time, a client can still
  PCC-RO attach the file. On this client, the primary component
  and the PCC copy are both in up-to-date state.

As a new LCM_FL_PCC_RDONLY flag is added, the old client may not
understand this new FLR layout flag, and may result in
inconsistent data access.

This patch adds this new flag for the purpose of compatibility and
interoperability.

Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I2e96f413c0b35355503c20dfea0a39d39a216d90
Reviewed-on: https://review.whamcloud.com/40813
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14814 osc: osc: Do not flush on lockless cancel 52/44152/7
Patrick Farrell [Tue, 6 Jul 2021 15:20:56 +0000 (11:20 -0400)]
LU-14814 osc: osc: Do not flush on lockless cancel

The cancellation of a an OSC lock without an LDLM lock
(a 'lockless' OSC lock) should not flush pages.  Only
direct i/o is allowed to use a lockless OSC lock, and
direct i/o does not create flushable pages.

DIO pages are not flushable because:
A) all synced ASAP, and
B) the OSC extents created for them are not added to the
extent tree which is used to track these pages.

Instead, this has the effect of trying to flush pages from
ongoing buffered i/o.  This can lead to crashes like the
following:

osc_cache_writeback_range()) ASSERTION(hp == 0 && discard == 0) failed

This assert essentially says the lock cancellation
(hp == 1) found an active i/o (an extent in the OES_ACTIVE
state).

This is not allowed because the flushing code assumes an
LDLM lock is being cancelled, which will only start once
there is no active i/o.  Because the OSC lock being
cancelled is not associated with an LDLM lock, this is not
true, and nothing prevents active i/o under a different
lock, leading to this assert.

The solution is simply to not flush pages when cancelling a
no-LDLM-lock OSC lock.

Additional note:
New lockless OSC locks cannot be created if they are
blocked by a regular OSC lock, but a new regular lock can
be created if there is a lockless lock present.

Thus, the sequence is something like this:
Direct i/o creates lockless OSC lock
Buffered i/o creates OSC and LDLM lock on the same range
Direct i/o finishes, starts cancelling its OSC lock
Buffered i/o is still ongoing, with extents in OES_ACTIVE

This results in the above crash during the OSC lock
cancellation.

Note it would be possible to resolve this issue by not
allowing lockless OSC locks to match regular OSC locks, but
this is not necessary, since there's no reason for lockless
locks to flush pages on cancellation.

Test-Parameters: env=ONLY=398b,ONLY_REPEAT=200 testlist=sanity
Test-Parameters: env=ONLY=77,ONLY_REPEAT=100 testlist=sanityn
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iceb1747b66232cad3f7e90ec271310a13a687a33
Reviewed-on: https://review.whamcloud.com/44152
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14838 osc: Remove client contention support 05/44205/5
Patrick Farrell [Fri, 9 Jul 2021 20:13:36 +0000 (16:13 -0400)]
LU-14838 osc: Remove client contention support

Lockless buffered i/o and contention detection don't work,
lockless bufferd i/o is unfixable and contention detection
is broken enough that it will have to be rewritten.

Let's remove both.  This patch starts the removal by
pulling the client side support.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If8583eff176bddb33e197befb967d229f8ca5688
Reviewed-on: https://review.whamcloud.com/44205
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14838 osc: Remove lockless truncate 04/44204/5
Patrick Farrell [Fri, 9 Jul 2021 20:13:09 +0000 (16:13 -0400)]
LU-14838 osc: Remove lockless truncate

Lockless truncate does not work and cannot be made to work.

Fundamentally, it has no means of ensuring consistency
across clients because it can't force them all to drop
cached data without locking.

It's been off for years - let's just get rid of it.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia2979fb6b31a61da6d4833e9f463fcd5b6dbd718
Reviewed-on: https://review.whamcloud.com/44204
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9859 libcfs: make lnet_debugfs_symlink_def local to libcfs/modules.c 32/44332/2
Mr. NeilBrown [Fri, 16 Jul 2021 12:24:10 +0000 (08:24 -0400)]
LU-9859 libcfs: make lnet_debugfs_symlink_def local to libcfs/modules.c

This type is only used in libcfs/module.c, so make it local to there.
If any other module ever wanted to add its own symlinks,
it would probably be easiest to export lnet_debugfs_root
and just call debugfs_create_symlink as required.

Linux-commit: c4f907719736b720aa831447828809840e533371

Test-Parameters: trivial
Change-Id: I08221cfc781735451a292ba20cd35b9172fc20f2
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/44332
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
2 years agoLU-14637 flr: get rid of excluding dom+flr support test 85/44185/8
Bobi Jam [Thu, 8 Jul 2021 14:34:18 +0000 (22:34 +0800)]
LU-14637 flr: get rid of excluding dom+flr support test

Now that DoM+FLR are supported, fix the tests that expect this
combination of features on a file to fail.

Fixes: 0bff64be320fd ("LU-9771 flr: to not support dom+flr for phase 1")
Fixes: 44a721b8c1063 ("LU-11421 dom: manual OST-to-DOM migration via mirroring)
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9fc76e797e469744107e5d0453b78729226be0ee
Reviewed-on: https://review.whamcloud.com/44185
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14789 tests: make sanity 133f and 133g working 84/44184/4
Cyril Bordage [Thu, 8 Jul 2021 14:35:44 +0000 (16:35 +0200)]
LU-14789 tests: make sanity 133f and 133g working

Tests sanity 133f and 133g were doing nothing after change 38567.
Because the argument given to badarea_io was not a path, 0 was always
returned. This patch finds the complete path of the parameters
returned by "lctl get_param" and gives them to badarea_io.

Fixes: 1c54733894f8 ("LU-10401 tests: add -F so list_param prints entry type")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I7a8914e2950d5a8b2a93948c4fbe889520a3198c
Reviewed-on: https://review.whamcloud.com/44184
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 years agoLU-14788 lnet: check memdup_user_nul using IS_ERR 91/44091/4
Cyril Bordage [Mon, 28 Jun 2021 13:13:21 +0000 (15:13 +0200)]
LU-14788 lnet: check memdup_user_nul using IS_ERR

Crash in __proc_lnet_portal_rotor. memdup_user_nul returns an ERR_PTR
on error, not a NULL pointer. IS_ERR and PTR_ERR functions have to be
used to check and return the correct error code. The fix has been
applied in other locations having the wrong check.

Fixes: 67af976c806 ("LU-14428 libcfs: discard cfs_trace_copyin_string()"
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I1fabf2499b6bbee7b94a2f802fbcbd9270d901b3
Reviewed-on: https://review.whamcloud.com/44091
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13055 doc: update changelog manpages 22/44022/5
Mikhail Pershin [Thu, 17 Jun 2021 14:11:51 +0000 (17:11 +0300)]
LU-13055 doc: update changelog manpages

Add lctl-changelog_register.8 and lctl-changelog_deregister.8
manpages and update lctl.8 manpage to refer to them.

Fixes: 15305c3c3fe7 ("LU-12214 build: fix build without lustre_utils")
Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ie41db630c72f61a884cd8000e0a4aeeb42ca60eb
Reviewed-on: https://review.whamcloud.com/44022
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14748 build: gcc9 fix address of packed member warning 61/43961/4
Shaun Tancheff [Wed, 9 Jun 2021 16:25:21 +0000 (11:25 -0500)]
LU-14748 build: gcc9 fix address of packed member warning

gcc9 complains about use of __swabXXs() with some packed
structures.
Use __swabXX() for these cases.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I7437c6f21d38c209ef452b41760aad6d1d3d6034
Reviewed-on: https://review.whamcloud.com/43961
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14740 llite: avoid project quota overflow 39/43939/8
Wang Shilong [Mon, 7 Jun 2021 15:40:22 +0000 (23:40 +0800)]
LU-14740 llite: avoid project quota overflow

Currently, project ID is stored as u32, max possible
value for it is 4294967295.

However, VFS reserve max value for special usage, see
following function:

  static inline bool
  qid_has_mapping(struct user_namespace *ns, struct kqid qid)
  {
          return from_kqid(ns, qid) != (qid_t) -1;
  }

So qid_has_mapping() could return 0 for id 4294967295.
A further try on chown test:

  $ chown 4294967295:4294967295 c.sh
  chown: invalid user: ‘4294967295:4294967295
  $ chown 4294967294:4294967294 c.sh

Fix to check max possible value for project ID in the
client kernel side, and add a test case for this.

Test-parameters: trivial testlist=sanity-quota
Fixes: 7b5c1f1404c3 ("LU-13845 utils: Quota id 0xFFFFFFFF is invalid")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ide8b9cc79d9b7f2a8b9860a0c0f683ec903b8f91
Reviewed-on: https://review.whamcloud.com/43939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14430 mdd: rename mti_fid to mdi_fid and friends 40/43740/5
Andreas Dilger [Tue, 18 May 2021 17:56:32 +0000 (11:56 -0600)]
LU-14430 mdd: rename mti_fid to mdi_fid and friends

Rename mdd_thread_info fields to avoid confusion with mdt_thread_info.
The final patch to rename mdd_thread_info fields to a unique prefix:

  mti_cattr->mdi_cattr
  mti_fid->mdi_fid
  mti_fid2->mdi_fid2
  MTI_KEEP_KEY->MDI_KEEP_KEY
  mti_la_for_fix->mdi_la_for_fix
  mti_la_for_start->mdi_la_for_start
  mti_pattr->mdi_pattr
  mti_tattr->mdi_tattr
  mti_tpattr->mdi_tpattr

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I17bcc3ddfae400a5ca76e4f654c696da6d3ebbe5
Reviewed-on: https://review.whamcloud.com/43740
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13717 sec: handle null algo for filename encryption 88/43388/6
Sebastien Buisson [Thu, 25 Mar 2021 16:55:35 +0000 (17:55 +0100)]
LU-13717 sec: handle null algo for filename encryption

Encrypted files created with Lustre 2.14 have clear text file names.
With new code implementing filename encryption, newly created files
will have cipher text names, unless they are in an encrypted directory
created in Lustre 2.14.

So we need to make sure llcrypt library can properly handle the "null"
algorithm for client side filename encryption, which is basically a
no-op.
Handling this "null" algo for filename encryption will not be possible
with the in-kernel fscrypt library, so modify the behaviour of
configure to build with embedded llcrypt by default, and only build
against in-kernel fscrypt if explicitly specified via
--enable-crypto=in-kernel configure option.

The objective is to urge users to convert their encrypted directories
to the new fashion that encrypts filenames.
However, with the new code some operations on encrypted files created
with 2.14 might not be possible, like migrate, so expressly forbid
migrate on files that use the "null" algorithm for client side
filename encryption.

Finally, we revert commit 11fcbfa9de4a5170abc2c5df2a6e4e02f0f84268
("LU-12275 sec: force file name encryption policy to null") so that
new encrypted directories will enforce filename encryption.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I393945adc9b720a56544b5da0669cb2848507457
Reviewed-on: https://review.whamcloud.com/43388
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 osc: Improve osc_queue_sync_pages 82/39482/23
Patrick Farrell [Tue, 15 Jun 2021 14:23:04 +0000 (10:23 -0400)]
LU-13799 osc: Improve osc_queue_sync_pages

This patch was split and partially done in:
https://review.whamcloud.com/38214

So the text below refers to the combination of this patch
and that one.  This patch now just improves a looped atomic
add by replacing with a single one.  The rest of the grant
calcuation change is in
https://review.whamcloud.com/38214

(I am retaining the text below to show the performance
improvement)
----------
osc_queue_sync_pages now has a grant calculation component,
this has a pretty painful impact on the new faster DIO
performance.  Specifically, per page ktime_get() and the
per-page atomic_add cost close to 10% of total CPU time in
the DIO path.

We can make this per batch of pages rather than for each
page, which reduces this cost from 10% of CPU to almost
nothing.

This improves write performance by about 10% (but has no
effect on reads, since they don't use grant).

This patch reduces i/o time in ms/GiB by:
Write: 10 ms/GiB
Read: 0 ms/GiB

Totals:
Write: 158 ms/GiB
Read: 161 ms/GiB

mpirun -np 1 $IOR -w -t 1G -b 64G -o $FILE --posix.odirect

Before patch:
write     6071

After patch:
write     6470

(Read is similar.)

This also fixes a mistake in c24c25dc1b / LU-13419 where it
removed the shrink interval update entirely from the direct
i/o path.

Fixes: c24c25dc1b ("LU-13419 osc: Move shrink update to per-write")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Ic606e03be58239c291ec0382fa89eba64560da53
Reviewed-on: https://review.whamcloud.com/39482
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 clio: Skip prep for transients 48/39448/17
Patrick Farrell [Fri, 7 May 2021 19:51:32 +0000 (15:51 -0400)]
LU-13799 clio: Skip prep for transients

The work done by cpo_prep() (etc) is unnecessary for
transient pages.  This gives only a minimal performance
boost and is better seen as a step towards removing the
cl_page abstraction for transient pages.

But, it does consistently give around 1% better
performance.

This patch reduces i/o time in ms/GiB by:
Write: 1 ms/GiB
Read: 1 ms/GiB

Totals:
Write: 169 ms/GiB
Read: 161 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        6028 MiB/s
read         6305 MiB/s

Plus this patch:
write        6071 MiB/s
read         6355 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Ib94f57cde468c9aaea952e1bb89db8fcf4b35e07
Reviewed-on: https://review.whamcloud.com/39448
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 llite: Adjust dio refcounting 47/39447/16
Patrick Farrell [Fri, 7 May 2021 19:50:15 +0000 (15:50 -0400)]
LU-13799 llite: Adjust dio refcounting

We get a page reference in cl_page_find, then immediately
add another for cl_2queue_add and remove the first
reference.  This is pretty silly, since the life cycle is
the same on these.

This improves DIO/AIO page submission by around 2%.

This patch reduces i/o time in ms/GiB by:
Write: 2 ms/GiB
Read: 2 ms/GiB

Totals:
Write: 170 ms/GiB
Read: 162 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous pa5ches in series:
write        5955 MiB/s
read         6218 MiB/s

Plus this patch:
write        6028 MiB/s
read         6305 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I228eca6d48c6007bbf2c8caae5e477b7d40521d1
Reviewed-on: https://review.whamcloud.com/39447
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 lov: Improve DIO submit 46/39446/16
Patrick Farrell [Fri, 7 May 2021 19:42:20 +0000 (15:42 -0400)]
LU-13799 lov: Improve DIO submit

Skip some unnecessary looping in page submission for the
DIO case.

This gives about a 2% improvement for AIO/DIO page
submission.

This patch reduces i/o time in ms/GiB by:
Write: 2 ms/GiB
Read: 2 ms/GiB

Totals:
Write: 172 ms/GiB
Read: 165 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        7726 MiB/s
read         5899 MiB/s

Plus this patch:
write        5954 MiB/s
read         6217 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Iedad978438ee3f1f3290d990311532626cba9e2d
Reviewed-on: https://review.whamcloud.com/39446
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 llite: Remove transient page counting 41/39441/15
Patrick Farrell [Sat, 29 May 2021 01:32:43 +0000 (21:32 -0400)]
LU-13799 llite: Remove transient page counting

Transient page counting is not used for anything, as
already noted in the commit message, but costs something
like 4% of the time in DIO page submission.

Remove it.

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

This patch reduces i/o time in ms/GiB by:
Write: 6 ms/GiB
Read: 11 ms/GiB

Totals:
Write: 174 ms/GiB
Read: 167 ms/GiB

With previous patches in series:
write     5703 MiB/s
read      5756 MiB/s

Plus this patch:
write     5900 MiB/s
read      6136 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I825de4f1b5d1dd1476a4a711bfa51e7d24b5027a
Reviewed-on: https://review.whamcloud.com/39441
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-13799 llite: Modify AIO/DIO reference counting 42/39442/14
Patrick Farrell [Fri, 7 May 2021 15:50:51 +0000 (11:50 -0400)]
LU-13799 llite: Modify AIO/DIO reference counting

For DIO pages, it's enough to have a reference on the
cl_object associated with the AIO.  This saves taking a
reference on the cl_object for each page, which saves about
5% of the time when doing DIO/AIO.

This is possible because the lifecycle of the aio struct is
always greater than that of the associated pages.

This patch reduces i/o time in ms/GiB by:
Write: 6 ms/GiB
Read: 1 ms/GiB

Totals:
Write: 198 ms/GiB
Read: 197 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     5030 MiB/s
read      5174 MiB/s

Plus this patch:
write     5183 MiB/s
read      5200 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I970cda20417265b4b66a8eed6e74440e5d3373b8
Reviewed-on: https://review.whamcloud.com/39442
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13326 mds: remove MDS_SETATTR_PORTAL and service 98/37798/7
Andreas Dilger [Wed, 4 Mar 2020 20:28:26 +0000 (12:28 -0800)]
LU-13326 mds: remove MDS_SETATTR_PORTAL and service

Remove the MDS_SETATTR_PORTAL and the service threads listening on
this portal since they are unused since Lustre 2.1 and are no longer
needed.

Remove module tunables related to the mds_attr service threads:
- mds_attr_num_threads
- mds_attr_cpu_bind
- mds_attr_num_cpts

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I64f4f3f0004e1895ef7b49b31a4ad687a1abcca2
Reviewed-on: https://review.whamcloud.com/37798
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13417 test: mkdir_on_mdt0() in more tests 15/44315/8
Lai Siyao [Thu, 8 Jul 2021 08:09:01 +0000 (16:09 +0800)]
LU-13417 test: mkdir_on_mdt0() in more tests

Replace mkdir with mkdir_on_mdt0() in several tests.

Update recovery-small test_110k() in case there are opened files on
MDT1 which would cause umount stall.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Iebc32568b7fc146b658f47c5f5053fd3db24432f
Reviewed-on: https://review.whamcloud.com/44315
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14655 lnet: Protect lpni deref in lnet_health_check 03/43503/3
Chris Horn [Wed, 28 Apr 2021 01:10:16 +0000 (20:10 -0500)]
LU-14655 lnet: Protect lpni deref in lnet_health_check

Discovery thread can modify peer NI/peer net/peer relationship
so we need to be careful when dereferencing the peer NI pointer in
lnet_health_check(). Discovery thread operations under net lock, so
move the peer NI dereference under the net lock which is taken for
incrementing the health stats.

Move some of the other code that is only relevant for messages with a
health status != LNET_MSG_STATUS_OK under the appropriate condition.

HPE-bug-id: LUS-9962
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3e6763b71bcdc9281f46b79c59e40f939190d468
Reviewed-on: https://review.whamcloud.com/43503
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13417 test: generate uneven MDTs early for sanity 413 84/44384/6
Lai Siyao [Tue, 20 Jul 2021 01:24:36 +0000 (09:24 +0800)]
LU-13417 test: generate uneven MDTs early for sanity 413

Fill MDT early to generate uneven MDTs for sanity test_413, and
add test_413z to unlink these directories.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-1
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I84e3670bb40c3666488139d6a272f29188b0dfae
Reviewed-on: https://review.whamcloud.com/44384
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14868 llite: revert 'simplify callback handling for async getattr' 71/44371/6
Andreas Dilger [Wed, 21 Jul 2021 23:38:37 +0000 (23:38 +0000)]
LU-14868 llite: revert 'simplify callback handling for async getattr'

This reverts commit cbaaa7cde45f59372c75577d7274f7e2e38acd24.

This is causing process hangs and timeouts during file removal.

Test-Parameters: trivial
Fixes: cbaaa7cde4 ("LU-14139 llite: simplify callback handling for async getattr")
Change-Id: I77f5bc460850bfe7a5143e22b0c5f3e14a40474a
Reviewed-on: https://review.whamcloud.com/44371
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14826 mdt: getattr_name("..") under striped directory 68/44168/6
Lai Siyao [Thu, 8 Jul 2021 14:25:51 +0000 (10:25 -0400)]
LU-14826 mdt: getattr_name("..") under striped directory

For getattr_name(".."), it should return FID of the master object for
striped directories. This includes changes on both client and server:
* lmv_getattr_name() should use master object FID if it's looking up
  "..".
* mdt_raw_lookup() should check parent object is sub stripe, if so
  it needs to lookup again to get master object FID. For old client
  without above change this needs to be checked twice.

This is needed by NFS export, because ll_get_parent() find parent by
getattr_name("..").

Reenable check_fhandle_syscall and update sanityn test_102.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I72c951293e41656ce3778750147402d7f8ca4cec
Reviewed-on: https://review.whamcloud.com/44168
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 years agoLU-14833 sec: quiet spurious gss_init_svc_upcall() message 97/44197/2
Sebastien Buisson [Fri, 9 Jul 2021 12:52:40 +0000 (14:52 +0200)]
LU-14833 sec: quiet spurious gss_init_svc_upcall() message

Switch from CWARN to CDEBUG(D_SEC) for message printed by
gss_init_svc_upcall():
Init channel is not opened by lsvcgssd, following request might be
dropped until lsvcgssd is active
Indeed, this message is printed no matter GSS is enabled or not, and
we do not have any way to check this by the time the kernel module
is loaded.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I66c8c2a16e58ca75973226c80e0f4a92c90b4025
Reviewed-on: https://review.whamcloud.com/44197
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14114 lnet: print device status in net show command 69/44169/2
Cyril Bordage [Wed, 7 Jul 2021 13:27:54 +0000 (15:27 +0200)]
LU-14114 lnet: print device status in net show command

A device can be in fatal state, if the cable was disconnected, or the
port brought down on the switch side. In these cases, the LND (o2iblnd
for now), will flag the device in fatal state. That device will not be
used any further. However, it's health will not be decremented. This
causes some confusion when examining the state of the node.
It is better to print the device status in the output of the lnetctl
net show command.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I7c635ab1062f6153449fcec1bc07585065818a72
Reviewed-on: https://review.whamcloud.com/44169
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14804 nodemap: do not return error for improper ACL 27/44127/3
Sebastien Buisson [Thu, 1 Jul 2021 15:20:39 +0000 (00:20 +0900)]
LU-14804 nodemap: do not return error for improper ACL

In nodemap_map_acl(), in case the ACL is incorrect, do nothing
and just return initial size to caller.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I26aba9ce43e4a8878bfa47e145b1b44cfff89403
Reviewed-on: https://review.whamcloud.com/44127
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14300 quota: avoid nested lqe lookup 26/43326/5
Sergey Cheremencev [Mon, 12 Apr 2021 23:44:34 +0000 (02:44 +0300)]
LU-14300 quota: avoid nested lqe lookup

lqe_locate called from qmt_pool_lqes_lookup for lqe
that hasn't an entry on a disk calls qmt_lqe_set_default.
This may call qmt_set_id_notify->qmt_pool_lqes_spec
and rewrite already added lqes in a qti. Rewritten
lqes may trigger an assertion:

LustreError: 5072:0:(qmt_pool.c:838:qmt_pool_lqes_lookup()) ASSERTION( (((qmt_info(env)->qti_lqes_num) > 16 ? qmt_info(env)->qti_lqes : qmt_info(env)->qti_lqes_small)[(qmt_info(env)->qti_glbl_lqe_idx)])->lqe_is_global ) failed:
LustreError: 5072:0:(qmt_pool.c:838:qmt_pool_lqes_lookup()) LBUG
Pid: 5072, comm: mdt_rdpg00_003 3.10.0-957.1.3957.1.3.x4.1.15.x86_64 #1 SMP Mon Nov 18 14:47:03 PST 2019
Call Trace:
 [<ffffffffc046f62c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
 [<ffffffffc046f94c>] lbug_with_loc+0x4c/0xa0 [libcfs]
 [<ffffffffc0e4ae38>] qmt_pool_lqes_lookup+0x798/0x8f0 [lquota]
 [<ffffffffc0e3b0ce>] qmt_intent_policy+0x86e/0xe00 [lquota]
 [<ffffffffc109d53d>] mdt_intent_opc+0x3bd/0xb40 [mdt]
 [<ffffffffc10a5134>] mdt_intent_policy+0x1a4/0x360 [mdt]
 [<ffffffffc0a7bedb>] ldlm_lock_enqueue+0x3cb/0xad0 [ptlrpc]
 [<ffffffffc0aa4a46>] ldlm_handle_enqueue0+0xa56/0x1610 [ptlrpc]
 [<ffffffffc0b304b2>] tgt_enqueue+0x62/0x210 [ptlrpc]
 [<ffffffffc0b3753a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]

or a deadlock(2 same lqes qti_lqes array):

 call_rwsem_down_write_failed+0x17/0x30
 qti_lqes_write_lock+0xb1/0x1b0 [lquota]
 qmt_dqacq0+0x2ee/0x1ac0 [lquota]
 qmt_intent_policy+0xbfe/0xe00 [lquota]
 mdt_intent_opc+0x3ba/0xb50 [mdt]
 mdt_intent_policy+0x1a1/0x360 [mdt]
 ldlm_lock_enqueue+0x3d6/0xaa0 [ptlrpc]
 ldlm_handle_enqueue0+0xa76/0x1620 [ptlrpc]
 tgt_enqueue+0x62/0x210 [ptlrpc]
 tgt_request_handle+0x96a/0x1680 [ptlrpc]
 kthread+0xd1/0xe0

Patch adds a sanity-quota_73b to check that the isssue
doesn't exist anymore.

Change-Id: Ib1ebe82c3b6e819b2538f30af08930060bd659ae
HPE-bug-id: LUS-9902
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/158581
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/43326
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14508 lfs: make mirror operations preserve timestamps 09/42009/17
John L. Hammond [Thu, 11 Mar 2021 16:02:54 +0000 (10:02 -0600)]
LU-14508 lfs: make mirror operations preserve timestamps

Save and try to restore the file timestamps around the various mirror
operations. Add sanity-flr tests 61[abc] to verify this.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I5ef754e46cfbe82c731a709209576bbfcc73af3d
Reviewed-on: https://review.whamcloud.com/42009
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12214 build: fix SLES build/install 72/39972/20
Alexey Lyashkov [Fri, 15 Jan 2021 15:21:15 +0000 (10:21 -0500)]
LU-12214 build: fix SLES build/install

Redhat and SuSe can have different library name for same devel,
lets drop a strong requrement to the library package name and
ask rpm to use an autoprovide option.

Test-Parameters: trivial
Test-Parameters: clientdistro=sles15sp1 ossdistro=el7.7 mdsdistro=el8.2
HPE-bug-id: LUS-7204
Fixes: e1bf37870d LU-12214 build: fix build with gss enabled
Fixes: d746e64fe1 LU-13562 build: SUSE build support for azure
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I7e0fe83f9090e7616ab156fa75fed4821099406e
Reviewed-on: https://review.whamcloud.com/39972
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12022 tests: error on resync failure sanity-flr 54/35754/7
James Nunez [Tue, 15 Jun 2021 17:14:49 +0000 (11:14 -0600)]
LU-12022 tests: error on resync failure sanity-flr

In sanity-flr test 200, we should error if the final resync
fails.  Replace all calls to 'mirror_io resync' that does
not inject an error to  '$LFS mirror resync'.

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I9b2ec1beb7060086808b7529467bef80c8e9659f
Reviewed-on: https://review.whamcloud.com/35754
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
2 years agoLU-6142 libcfs: checkpatch cleanup of libcfs fail.c 07/44207/4
James Simmons [Sat, 10 Jul 2021 14:54:23 +0000 (10:54 -0400)]
LU-6142 libcfs: checkpatch cleanup of libcfs fail.c

Resolve several checkpatch issues reported for fail.c. This brings
us into aligment with the native Linux client version.

Test-Parameters: trivial
Change-Id: I71e71f48a94fa20756f7696b5fbf115c919d05d3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44207
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-6142 lnet: convert kiblnd/ksocknal_thread_start to vararg 22/44122/3
Mr NeilBrown [Thu, 1 Jul 2021 03:19:29 +0000 (13:19 +1000)]
LU-6142 lnet: convert kiblnd/ksocknal_thread_start to vararg

Rather than requiring the called to format a thread name into a temp
buffer, change these thread_start function to accept a format and
args, and to hand them directly to kthread_run().

This is done with a macro rather than a function as the functions are
trivial and varargs is slightly easier with macros.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I73926ef38a9e84061d1a3f9acf5c0be4a247f957
Reviewed-on: https://review.whamcloud.com/44122
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-6142 lnet: discard lnet_current_net_count 89/44089/2
Mr NeilBrown [Mon, 28 Jun 2021 06:22:02 +0000 (16:22 +1000)]
LU-6142 lnet: discard lnet_current_net_count

The variable lnet_current_net_count is never used.  So remove it.
The function lnet_get_net_count() is only used to update thar
variable, so remove it too.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id61f381f6220356c5b96c8a5822d8748a8ba43a4
Reviewed-on: https://review.whamcloud.com/44089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14217 osd-zfs: allow SEEK_HOLE/DATA only with sync 70/40970/2
Mikhail Pershin [Tue, 15 Dec 2020 11:47:20 +0000 (14:47 +0300)]
LU-14217 osd-zfs: allow SEEK_HOLE/DATA only with sync

ZFS doesn't report valid offset for SEEK_DATA if there are dirty
data, but may report SEEK_HOLE correctly that cause unreliable
results when same offset can be reported as HOLE (correctly) and
also as DATA, incorrectly but because switching to generic approach,
assuming all file is data and hole beyond end of file.

To avoid that we have to sync dirty data when dmu_offset_next()
reports EBUSY and repeat lseek call. Considering that this can
cause slowdown this behavior is controlled via new 'sync_on_lseek'
option. With this option turned off osd-zfs reports that it doesn't
support SEEK_DATA/HOLE because we cannot use unrealiable results
in our tools to copy sparse data

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic92c127628ce517a9c2f79f595a1d16116930383
Reviewed-on: https://review.whamcloud.com/40970
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14805 llite: No locked parallel DIO 31/44131/3
Patrick Farrell [Fri, 2 Jul 2021 17:24:48 +0000 (13:24 -0400)]
LU-14805 llite: No locked parallel DIO

If we are doing locked DIO, the OSC & LDLM locks are
released at the end of cl_io_loop, ie, before we wait for
parallel DIO at the llite layer.

This is problematic because the locks are released before
i/o done using them is complete; this can lead to data
inconsistencies.  (And at least one LBUG, see LU-14805.)

The easiest solution for now is only do parallel DIO when
working lockless (which is the default; DIO only switches
to locked to manage conflicts with buffered i/o).

This problem & fix apply to AIO as well as parallel DIO.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If98a0551d6dde54220b406b26e978e284a6b1ebf
Reviewed-on: https://review.whamcloud.com/44131
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13440 utils: fix handling of lsa_stripe_off = -1 30/43530/11
Andreas Dilger [Tue, 4 May 2021 01:25:23 +0000 (19:25 -0600)]
LU-13440 utils: fix handling of lsa_stripe_off = -1

Use LMV_OFFSET_DEFAULT instead of "-1" for parsing lfs_setdirstripe()
since parse_targets() will return "(__u32)-1" to the caller for the
stripe index, but lsa_stripe_off is a signed long long so it is
interpreted as 4294967295.  This causes the parsing to fail when
"lfs setdirstripe -i -1 --max-inherit-rr 1" is used.

Update sanity test_413a/413c to also specify "-i -1" to verify this.

In sanity test 413a,413b and 413c, create "qos" directory on most
full directory, so that its subdirectories won't be created on the
same MDT.

Fixes: f167f78e3bfd ("LU-13440 lmv: add default LMV inherit depth")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic934f859173155b1b2df56fcd315c8da633ebbe5
Reviewed-on: https://review.whamcloud.com/43530
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14541 llite: avoid stale data reading 76/43476/5
Wang Shilong [Wed, 28 Apr 2021 14:26:10 +0000 (22:26 +0800)]
LU-14541 llite: avoid stale data reading

remove_mapping() can prohibit to kill page from page cache due page
refcount!=2, in vvp_page_delete() clear uptodate flag in case
stale data reading later.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I322debec951b1a342246475456c0f40e10b0e578
Reviewed-on: https://review.whamcloud.com/43476
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>