Whamcloud - gitweb
fs/lustre-release.git
4 years ago LU-11213 uapi: Remove unused CONNECT flag 08/36008/3
Patrick Farrell [Fri, 30 Aug 2019 13:39:57 +0000 (09:39 -0400)]
LU-11213 uapi: Remove unused CONNECT flag

The plain layout connect flag was added as part of an
earlier implementation of LU-11213, but the design was
improved before landing and the flag was not needed.

Let's remove it.  Since it was never actually marked as
supported in any client/server version, we can just remove
it entirely, leaving the flag bit open for future use.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I30946a50299268d00cbfee2081607effd0e4fb47
Reviewed-on: https://review.whamcloud.com/36008
Reviewed-by: Shilong Wang <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10467 llite: fix indentation 67/35967/6
Mr NeilBrown [Mon, 26 Aug 2019 04:51:00 +0000 (14:51 +1000)]
LU-10467 llite: fix indentation

Prior to making code changes, fix up the indentation in
ll_umount_begin().

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Iac84c5e0b97e6c07d6a591f3526e7bbc66f3726a
Reviewed-on: https://review.whamcloud.com/35967
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-6142 contrib: fix typo in spelling.txt file 65/35965/4
Mr NeilBrown [Mon, 26 Aug 2019 04:46:36 +0000 (14:46 +1000)]
LU-6142 contrib: fix typo in spelling.txt file

Any patch that mentions cfs_time_seconds() gets a
weird warning from checkpatch due to this typo.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Ifeb54992c991f7f6cbcd0a9fd5aca5158adb2a4f
Reviewed-on: https://review.whamcloud.com/35965
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
4 years ago LU-12694 quota: display correct group quota information 17/35917/3
Tatsushi Takamura [Mon, 26 Aug 2019 01:27:10 +0000 (10:27 +0900)]
LU-12694 quota: display correct group quota information

GETQUOTA is not executed when oqctl->qc_dqblk.dqb_curspace
is not 0, so qctl.qc_dqblk needs to be cleared for the
correct values to be displayed.

Signed-off-by: Tatsushi Takamura <takamr.tatsushi@jp.fujitsu.com>
Change-Id: I5163c00fbc58ec1dd04c912e36abd01a7d2239ad
Reviewed-on: https://review.whamcloud.com/35917
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shilong Wang <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11549 mdd: set LUSTRE_ORPHAN_FL for non-dirs 76/35776/7
Alexander Zarochentsev [Mon, 12 Aug 2019 18:42:07 +0000 (21:42 +0300)]
LU-11549 mdd: set LUSTRE_ORPHAN_FL for non-dirs

mdd_mark_orphan_object() sets LUSTRE_ORPHAN_FL only for
directories, which is not correct. As a result, this important
bit of orphan object state is not transferred across the OSP link,
and a distributed link operation can succeed for an orphan
source object, leaving a dangling reference on one MDT and
an unconnected inode on another MDT.
The mdd_open_sanity_check() conditions had to be relaxed in
case of replay.

Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: If0d868b3de4d68406e1a3b371827f354566d3e42
Reviewed-on: https://review.whamcloud.com/35776
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-9019 llite: fix timeout to not be zero 92/35992/2
Andreas Dilger [Thu, 29 Aug 2019 14:24:10 +0000 (08:24 -0600)]
LU-9019 llite: fix timeout to not be zero

The timeout in ll_kill_super() is intended to be
1/8 of a second, while it loops waiting on other
threads to finish.  However, this was incorrectly
calculated to always be 0 (infinite wait) due to
integer division.

Instead, convert seconds to jiffies before division.
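
The truncation can be reproduced in userspace. The following is a minimal sketch, not the Lustre code: SKETCH_HZ and sketch_seconds_to_jiffies() are hypothetical stand-ins for the kernel tick rate and a seconds-to-jiffies helper.

```c
#include <assert.h>

/* Assumed tick rate for the sketch; the kernel's HZ depends on configuration. */
#define SKETCH_HZ 1000L

/* Hypothetical stand-in for a seconds-to-jiffies helper. */
static long sketch_seconds_to_jiffies(long seconds)
{
	assert(seconds >= 0);
	return seconds * SKETCH_HZ;
}

/* Divide first: 1 / 8 is integer division and yields 0 seconds. */
static long sketch_timeout_divide_first(void)
{
	return sketch_seconds_to_jiffies(1 / 8);
}

/* Convert first: one second worth of jiffies, then divided by 8. */
static long sketch_timeout_convert_first(void)
{
	return sketch_seconds_to_jiffies(1) / 8;
}
```

With the assumed tick rate of 1000, the first form gives a timeout of 0 jiffies and the second gives 125.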

Fixes: 0c2cc920370e ("LU-9019 libcfs: avoid using HZ and msecs_to_jiffies()")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I33efbfc2d6c0a9c4ae6404c1974c7593a72540e5
Reviewed-on: https://review.whamcloud.com/35992
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11675 hsm: don't allow new HSM requests during CDT_INIT 71/33671/6
Nikitas Angelinas [Wed, 24 Jul 2019 09:43:53 +0000 (02:43 -0700)]
LU-11675 hsm: don't allow new HSM requests during CDT_INIT

When the HSM CDT is shut down and restarted, it resets cdt_last_cookie
using ktime_get_real_seconds() and examines the CDT llog for existing
requests, in order to set cdt_last_cookie to the highest known value,
so that newly-assigned cookies are unique. There is a window between
CDT_INIT and CDT_RUNNING during which new requests can arrive, and if
the CDT llog has not been fully examined, cookies can be reused. This
can cause the following two assertions to be triggered in
cdt_agent_record_hash_add():

LASSERT(carl0->carl_cat_idx == carl1->carl_cat_idx);
LASSERT(carl0->carl_rec_idx == carl1->carl_rec_idx);

Fix this by not allowing new HSM requests during CDT_INIT.

Also, cookie values are incremented on a separate line, which causes
one value to be skipped at CDT startup time. This is not an issue, but
there does not seem to be a need for it; fix this by post-incrementing
and assigning cookie values on the same line.
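
As a rough illustration of both changes, here is a userspace sketch under stated assumptions: the struct, the state names and the -EAGAIN return value are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical coordinator states, loosely modelled on CDT_INIT/CDT_RUNNING. */
enum sketch_cdt_state {
	SKETCH_CDT_INIT,
	SKETCH_CDT_RUNNING,
};

struct sketch_cdt {
	enum sketch_cdt_state state;
	uint64_t next_cookie;	/* next cookie value to hand out */
};

/*
 * Refuse new requests while the coordinator is still initializing, so a
 * cookie cannot be assigned before the existing records have been scanned.
 * The error code is an assumption made for this sketch.
 */
static int sketch_assign_cookie(struct sketch_cdt *cdt, uint64_t *cookie)
{
	assert(cdt != NULL && cookie != NULL);
	if (cdt->state == SKETCH_CDT_INIT)
		return -EAGAIN;

	/* Assign and post-increment in one statement: no value is skipped. */
	*cookie = cdt->next_cookie++;
	return 0;
}
```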

Signed-off-by: Nikitas Angelinas <nangelinas@cray.com>
Cray-bug-id: LUS-6589
Test-Parameters: trivial testlist=sanity-hsm
Change-Id: I18a1c3e85de6c50a9bf1ce598e21d83d893ad0ca
Reviewed-on: https://review.whamcloud.com/33671
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11239 lfs: fix mirror resync error handling 37/33537/9
Bobi Jam [Wed, 10 Oct 2018 06:23:55 +0000 (14:23 +0800)]
LU-11239 lfs: fix mirror resync error handling

This patch returns error for partially successful mirror resync.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9d6c9ef5aca1674ceb7a9cbc6b790f3f7276ff5d
Reviewed-on: https://review.whamcloud.com/33537
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12603 ldlm: Check cancel lock count for correctness 06/35806/2
Oleg Drokin [Sat, 17 Aug 2019 05:36:07 +0000 (01:36 -0400)]
LU-12603 ldlm: Check cancel lock count for correctness

Make sure the number of locks we are going to cancel fits into
the supplied buffer first.
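
A check of this kind might look like the sketch below. The request layout is hypothetical (a small header followed by an array of handles) and is not the real wire format; the point is to bound the count by the received buffer size before walking the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock handle and cancel request layout, for illustration only. */
struct sketch_handle {
	uint64_t cookie;
};

struct sketch_cancel_req {
	uint32_t lock_count;
	uint32_t padding;
	struct sketch_handle handles[];
};

static_assert(offsetof(struct sketch_cancel_req, handles) == 8,
	      "unexpected header layout");

/* Return 1 if lock_count handles fit in a buffer of buflen bytes, else 0. */
static int sketch_cancel_count_fits(uint32_t lock_count, size_t buflen)
{
	size_t hdr = offsetof(struct sketch_cancel_req, handles);

	if (buflen < hdr)
		return 0;
	/* Divide instead of multiplying so the comparison cannot overflow. */
	return lock_count <= (buflen - hdr) / sizeof(struct sketch_handle);
}
```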

Change-Id: I93887133532bf7ee2be27114b1972aa64e06623c
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Reviewed-on: https://review.whamcloud.com/35806
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yunye Ry <yunye.ry@alibaba-inc.com>
4 years ago LU-12590 ptlrpc: check lm_bufcount and lm_buflen 83/35783/7
Emoly Liu [Thu, 29 Aug 2019 02:55:13 +0000 (10:55 +0800)]
LU-12590 ptlrpc: check lm_bufcount and lm_buflen

Check lm_bufcount to be used by lustre_msg_hdr_size_v2() and
validate individual and total buffer lengths in
lustre_unpack_msg_v2() in case of any out-of-bound read.
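
The shape of such validation can be sketched as follows. The message layout here is deliberately simplified and hypothetical (a 32-bit buffer count followed by one 32-bit length per buffer); it is not the real lustre_msg_v2 header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 0 if the counts and lengths are consistent with len, else -1. */
static int sketch_unpack_check(const unsigned char *msg, size_t len)
{
	uint32_t bufcount, buflen;
	size_t required, i;

	assert(msg != NULL);
	if (len < sizeof(bufcount))
		return -1;
	memcpy(&bufcount, msg, sizeof(bufcount));

	/* Bound the count before using it to compute the header size. */
	if (bufcount > (len - sizeof(bufcount)) / sizeof(buflen))
		return -1;
	required = sizeof(bufcount) + (size_t)bufcount * sizeof(buflen);

	for (i = 0; i < bufcount; i++) {
		memcpy(&buflen, msg + sizeof(bufcount) + i * sizeof(buflen),
		       sizeof(buflen));
		/* Reject a single length or a running total that overruns. */
		if (buflen > len - required)
			return -1;
		required += buflen;
	}
	return 0;
}
```

Comparing each length against the space that remains, rather than summing first, keeps the running total from wrapping around.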

Change-Id: I4905e0665c7770443684cffe504935d27473d7c6
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35783
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yunye Ry <yunye.ry@alibaba-inc.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11546 tests: enable large_dir support for tests 58/35358/7
Andreas Dilger [Fri, 28 Jun 2019 09:58:32 +0000 (03:58 -0600)]
LU-11546 tests: enable large_dir support for tests

Enable the ldiskfs large_dir feature by default for all test
filesystems, so that we can verify it is not causing any issues
in regular testing.  If this is successful, it can be enabled
in mkfs.lustre permanently.

Existing conf-sanity.sh test_110() and test_111() exercise this
feature specifically; this patch ensures that other tests have
no problems with this feature enabled.  There are other problems
with test_110() so it cannot be enabled at this time.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifff578551b57b05753fc10abb2d5294730254035
Reviewed-on: https://review.whamcloud.com/35358
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12614 ldlm: ldlm_cancel_hpreq_check should check lock count 07/35807/2
Oleg Drokin [Sat, 17 Aug 2019 05:43:36 +0000 (01:43 -0400)]
LU-12614 ldlm: ldlm_cancel_hpreq_check should check lock count

Make sure the number of locks we are going to cancel fits into
the supplied buffer first.
This is similar to LU-12603, just in a different place.

Change-Id: Ifa2aa976ce8613217c739ef609de54538c57b5e9
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Reviewed-on: https://review.whamcloud.com/35807
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yunye Ry <yunye.ry@alibaba-inc.com>
4 years ago LU-12685 llite: fix check for mem-alloc failure. 73/35873/3
Mr NeilBrown [Thu, 22 Aug 2019 00:07:38 +0000 (10:07 +1000)]
LU-12685 llite: fix check for mem-alloc failure.

A change to allocation of op_data in ll_statahead_thread()
appears to have assumed that OBD_ALLOC_PTR() would set the
pointer to ERR_PTR(-ENOMEM) on failure.  It actually sets
it to NULL, so the test needs to be changed accordingly.
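
The difference between the two conventions can be shown with a userspace sketch. The sketch_* helpers are hypothetical imitations of the kernel's ERR_PTR()/IS_ERR() idiom, written only to show that an error-pointer test does not catch a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, in the style of ERR_PTR(). */
static void *sketch_err_ptr(long err)
{
	assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True only for pointers in the top SKETCH_MAX_ERRNO addresses. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* The test that matches an allocator which reports failure as NULL. */
static int sketch_alloc_failed(const void *ptr)
{
	return ptr == NULL;
}
```

Since sketch_is_err(NULL) is false, code that only tests for an error pointer would go on to dereference NULL after a failed allocation.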

Test-Parameters: trivial
Fixes: ae828cd3b092 ("LU-4684 llite: add lock for dir layout data")
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I37521d2da50d71ed6fa0f9f05b7cfb848f0d47d9
Reviewed-on: https://review.whamcloud.com/35873
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12635 lnet: Fix style issues for module.c conctl.c 02/35802/3
Shaun Tancheff [Mon, 26 Aug 2019 17:19:32 +0000 (12:19 -0500)]
LU-12635 lnet: Fix style issues for module.c conctl.c

This patch fixes issues reported by checkpatch for the file
selftest/module.c and selftest/conctl.c.
Linux 5.3 enforces the use of 'fallthrough', which is also
suggested by checkpatch.

Test-Parameters: trivial
Cray-bug-id: LUS-7690
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: If650375f63f27c01c40e251059fa242f919854be
Reviewed-on: https://review.whamcloud.com/35802
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12635 lnet: Fix style issues for selftest/rpc.c 00/35800/2
Shaun Tancheff [Wed, 14 Aug 2019 20:26:09 +0000 (15:26 -0500)]
LU-12635 lnet: Fix style issues for selftest/rpc.c

This patch fixes issues reported by checkpatch for the file
selftest/rpc.c.
Linux 5.3 enforces the use of 'fallthrough', which is also
suggested by checkpatch.

Test-Parameters: trivial
Cray-bug-id: LUS-7690
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I049e32c0b0cf1002166445a89ac39110442d28bd
Reviewed-on: https://review.whamcloud.com/35800
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12634 build: kbuild changes in 5.3 drop subdir-m 86/35786/2
Shaun Tancheff [Tue, 13 Aug 2019 20:36:47 +0000 (15:36 -0500)]
LU-12634 build: kbuild changes in 5.3 drop subdir-m

Several changes in kbuild affect the way external modules
can be built. In the Linux 5.3-rc4 series subdir-m has
been removed.

Linux commit: c07d8d47bca1b325102fa2be3a463075f7b051d9

Test-Parameters: trivial
Cray-bug-id: LUS-7689
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: Id1f248ac4ccdee8d2a2d177b4fdff4444d2084d1
Reviewed-on: https://review.whamcloud.com/35786
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11607 tests: replace version/fstype calls in replay-single 24/35724/3
James Nunez [Wed, 7 Aug 2019 22:29:20 +0000 (16:29 -0600)]
LU-11607 tests: replace version/fstype calls in replay-single

The routine get_lustre_env() is available to all Lustre test
suites and sets an environment variable for the file system
type for MDS1 and OST1 and sets a variable for the Lustre
version of servers.

In replay-single, replace the calls to facet_fstype() and
lustre_version_code() for all server types with definitions
in get_lustre_env().

While doing this, replace facet_fstype $SINGLEMDS and
lustre_version_code $SINGLEMDS with facet_fstype mds1
and lustre_version_code mds1, respectively.

Clean up around any modifications by removing calls to
return after skip() or skip_env().

Test-Parameters: trivial testlist=replay-single
Test-Parameters: fstype=zfs testlist=replay-single
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I14d90f934401c9f0504d42adc9f6e59b12149b0b
Reviewed-on: https://review.whamcloud.com/35724
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12635 build: Support for gcc -Wimplicit-fallthrough 08/35708/4
Shaun Tancheff [Thu, 15 Aug 2019 18:50:01 +0000 (13:50 -0500)]
LU-12635 build: Support for gcc -Wimplicit-fallthrough

Linux 5.3 enables -Wimplicit-fallthrough.
Add decorators for implicit-fallthrough compiler checks.
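
One possible form of such a decorator is sketched below. The sketch_fallthrough macro and the example function are hypothetical and may differ from what the patch adds; the attribute is only used when the compiler reports support for it.

```c
#include <assert.h>

/* Use the statement attribute where available, else an empty statement. */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define sketch_fallthrough __attribute__((fallthrough))
# endif
#endif
#ifndef sketch_fallthrough
# define sketch_fallthrough do { } while (0)
#endif

/* Number of teardown steps run when starting from the given stage (0..2). */
static int sketch_teardown_steps(int stage)
{
	int steps = 0;

	assert(stage >= 0 && stage <= 2);
	switch (stage) {
	case 2:
		steps++;
		sketch_fallthrough;	/* deliberate: continue with stage 1 */
	case 1:
		steps++;
		sketch_fallthrough;	/* deliberate: continue with stage 0 */
	default:
		steps++;
		break;
	}
	return steps;
}
```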

Test-Parameters: trivial
Cray-bug-id: LUS-7690
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I5bccb2cfd6b5900ff7f0e21b5546eec9ffa83c19
Reviewed-on: https://review.whamcloud.com/35708
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
4 years ago LU-12523 ptlrpc: Stop sending ptlrpc_body_v2 83/35583/3
Patrick Farrell [Mon, 22 Jul 2019 18:02:28 +0000 (14:02 -0400)]
LU-12523 ptlrpc: Stop sending ptlrpc_body_v2

ptlrpc_body_v2 does not include space for jobids, so
when we added jobid to the RPC debug messages,
we started getting errors like this:

LustreError: 6817:0:(pack_generic.c:425:lustre_msg_buf_v2()) msg
000000005c83b7a2 buffer[0] size 152 too small (required 184, opc=-1)

This happened every time we tried to print a ptlrpc_body_v2
message.

body_v2 is still sent on some RPCs for compatibility with
very old versions of Lustre, but we no longer support
interop with those versions (latest reported is 2.3).

So, stop sending ptlrpc_body_v2 on any RPCs.

Note that we need to retain the ptlrpc_body_v2 definitions
and parsing capability for interop with servers which still
use them for some messages, which is all servers prior to
this patch.

One further note:
This does *not* fix the case of newer clients collecting
rpctrace with older servers.  They will still see the
error message for some RPCs.  That could be fixed with
tweaks to the debug printing code.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I54fc9a174788235c43fe9101a1b42adc7f547847
Reviewed-on: https://review.whamcloud.com/35583
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-9859 libcfs: double copy bug 74/35574/3
Dan Carpenter [Fri, 19 Jul 2019 18:48:12 +0000 (14:48 -0400)]
LU-9859 libcfs: double copy bug

The problem is that we copy hdr.ioc_len, we verify it, then we copy it
again without checking to see if it has changed in between the two
copies.

This could result in an information leak.
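
The pattern can be simulated in userspace. In this sketch all names are hypothetical, plain memcpy() stands in for the copy from user memory, and the race callback plays the role of another thread changing the buffer between the two copies. The re-check after the second copy is one way to close the window.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_LEN 64u

struct sketch_hdr {
	uint32_t len;	/* total length of the request, header included */
};

/* Simulated concurrent writer; called between the two fetches. */
typedef void (*sketch_race_fn)(unsigned char *user_buf);

static void sketch_grow_len(unsigned char *user_buf)
{
	struct sketch_hdr hdr = { SKETCH_MAX_LEN * 4 };

	memcpy(user_buf, &hdr, sizeof(hdr));
}

static int sketch_get_ioctl(unsigned char *user_buf, unsigned char *kbuf,
			    sketch_race_fn race)
{
	struct sketch_hdr first, second;

	assert(user_buf != NULL && kbuf != NULL);
	memcpy(&first, user_buf, sizeof(first));	/* first fetch */
	if (first.len < sizeof(first) || first.len > SKETCH_MAX_LEN)
		return -EINVAL;

	if (race != NULL)
		race(user_buf);

	memcpy(kbuf, user_buf, first.len);		/* second fetch */
	memcpy(&second, kbuf, sizeof(second));
	/* The copied header must still carry the length that was verified. */
	if (second.len != first.len)
		return -EINVAL;
	return 0;
}
```

Without the final comparison, later code that trusts the length stored in kbuf would act on a value that was never validated.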

Linux-commit: 76bdaa161cd93d9c033bf6fe2b0a5661c8204441

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: Ic9ae8c19d90a5547600f3775ed337394717b94e3
Reviewed-on: https://review.whamcloud.com/35574
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-4423 ptlrpc: incorporate BUILD_BUG_ON into ptlrpc_req_async_args() 09/35509/5
NeilBrown [Sat, 27 Jul 2019 15:44:06 +0000 (11:44 -0400)]
LU-4423 ptlrpc: incorporate BUILD_BUG_ON into ptlrpc_req_async_args()

Every call to ptlrpc_req_async_args() should be preceded by a
BUILD_BUG_ON() (aka CLASSERT()), though a few aren't.

To improve maintainability, include the BUILD_BUG_ON into the
ptlrpc_req_async_args() macro.
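
The idea of folding the size check into the accessor can be sketched in C11 as below. The request struct, the argument struct and the macro are hypothetical; the real macro may be shaped differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request with a scratch area that callers reinterpret. */
struct sketch_request {
	int opcode;
	union {
		void *pointer_arg[4];
		uint64_t space[4];
	} async_args;
};

struct sketch_small_args {
	int flags;
	uint64_t offset;
};

/*
 * Point var at the scratch area. The build fails if the type var points
 * to is larger than the scratch area, so callers cannot forget the check.
 */
#define sketch_req_async_args(var, req)					\
	do {								\
		static_assert(sizeof(*(var)) <= sizeof((req)->async_args), \
			      "argument struct does not fit in async_args"); \
		(var) = (void *)&(req)->async_args;			\
	} while (0)
```

A caller declares a pointer to its own argument struct and passes it as var, together with the request; no separate assertion is needed at the call site.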

Change-Id: I07481921379930b4b2d9329aefb47068fd0e07f0
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35509
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12075 mdt: commit migrate transaction with locks held 31/34431/9
Alex Zhuravlev [Fri, 15 Mar 2019 12:39:10 +0000 (15:39 +0300)]
LU-12075 mdt: commit migrate transaction with locks held

Under normal conditions migration (being a distributed transaction)
saves LDLM locks in the reply to implement CoS semantics. But if
the migrate process has taken too many LDLM locks to save
in the reply, then we should commit the transaction first and
only then release the locks.

Change-Id: I5e5f0516bdca973a72e43d63ecde79c792558abd
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34431
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-9859 mdt: replace CLASSERT with BUILD_BUG_ON 19/32219/4
James Simmons [Sat, 17 Aug 2019 14:47:42 +0000 (10:47 -0400)]
LU-9859 mdt: replace CLASSERT with BUILD_BUG_ON

Replace the Lustre-specific CLASSERT in the mdt layer with what the
Linux kernel provides. Note that the logic is reversed.
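
The reversal means that the condition has to be negated when converting: CLASSERT(cond) breaks the build when cond is false, while BUILD_BUG_ON(cond) breaks it when cond is true. A C11 sketch with hypothetical SKETCH_* equivalents of both macros:

```c
#include <assert.h>
#include <stdint.h>

/* Fails to compile when cond is false. */
#define SKETCH_CLASSERT(cond) \
	static_assert((cond), "CLASSERT: condition is false")
/* Fails to compile when cond is true. */
#define SKETCH_BUILD_BUG_ON(cond) \
	static_assert(!(cond), "BUILD_BUG_ON: condition is true")

/* Hypothetical structure whose size is being pinned. */
struct sketch_body {
	uint64_t fid[2];
	uint32_t mode;
	uint32_t padding;
};

static int sketch_layout_checks(void)
{
	/* Old style: state what must hold. */
	SKETCH_CLASSERT(sizeof(struct sketch_body) == 24);
	/* New style: state what must never happen. */
	SKETCH_BUILD_BUG_ON(sizeof(struct sketch_body) != 24);
	return 1;
}
```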

Test-Parameters: trivial

Change-Id: I1e4d75a6643204c55c888a5c2e95a8b12251f5b7
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32219
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12616 obclass: fix MDS start/stop race 52/35652/5
Alexander Boyko [Tue, 30 Jul 2019 11:33:15 +0000 (07:33 -0400)]
LU-12616 obclass: fix MDS start/stop race

The MDS is unloaded when the MDT type has no remaining references.
Each MDT drops its reference during obd_cleanup, so the race window
lies between obd_cleanup and server_stop_servers.
Lustre can lose the MDS obd_device during server_start_targets,
between checking for the MDS and taking the type reference, if
another MDT stops.

The patch takes one more reference on the MDT type in
server_start_targets, and puts it in server_stop_servers.

This patch adds sanity test 278. It reproduces the following race:
   started cleanup of MDT01
   started cleanup of MDT00
   finished cleanup of MDT00
   started MDT00 mount, checked that the MDS exists
   finished cleanup of MDT01, and cleanup of MDS also
   asserted during MDT00 initialization

Cray-bug-id: LUS-7275
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I9ae3bc2ec1d23c8d436f143d12e26209fdb6b083
Reviewed-on: https://review.whamcloud.com/35652
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12015 build: update changelog for ubuntu kernel 31/34331/7
Minh Diep [Tue, 26 Feb 2019 16:06:23 +0000 (08:06 -0800)]
LU-12015 build: update changelog for ubuntu kernel

Update Ubuntu18.04 client build kernel

Test-Parameters: trivial

Change-Id: I02137fd813239d5d19a9dc0d74be49f302652e8e
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34331
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Joseph Gmitter <jgmitter@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12671 mdd: rename mdd/sync_perm to sync_permissions 51/35851/4 pcc
James Simmons [Wed, 21 Aug 2019 20:16:13 +0000 (16:16 -0400)]
LU-12671 mdd: rename mdd/sync_perm to sync_permissions

Commit e783bbff accidentally renamed a sysfs variable when moving.
Change the sysfs file to its proper name.

Test-Parameters: trivial testlist=replay-vbr

Change-Id: I56e0534506271cf6760f775a9c8fa99b12683861
Fixes: e783bbff ("LU-8066 mdd: migrate from proc to sysfs")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/35851
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-9859 libcfs: switch to cpumask_var_t 09/35809/3
NeilBrown [Mon, 19 Aug 2019 00:17:55 +0000 (20:17 -0400)]
LU-9859 libcfs: switch to cpumask_var_t

So that we can use the common cpumask allocation functions,
switch to cpumask_var_t.
We need to be careful not to free a cpumask_var_t until the
variable has been initialized, and it cannot be initialized
directly.
So we must be sure either that it is filled with zeros, or
that zalloc_cpumask_var() has been called on it.

Linux-commit: 3872fb73cabdd47fd4abf7b6eff21d06e57297eb

Change-Id: I58d3f2e1fb1c71e1bd094a60ec4eb49b477e69e3
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/35809
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12457 kernel: RHEL 7.7 server support 27/35727/4
Jian Yu [Tue, 13 Aug 2019 19:20:05 +0000 (12:20 -0700)]
LU-12457 kernel: RHEL 7.7 server support

This patch makes changes to support new RHEL 7.7 release
for Lustre server.

Test-Parameters: trivial clientdistro=el7.7 serverdistro=el7.7

Change-Id: Ic56e087e6c89f1bbd1ab247c44b2e979828f34f9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35727
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11768 test: limit at_max to timeout in time 51/35651/2
Hongchao Zhang [Thu, 11 Jul 2019 08:20:33 +0000 (04:20 -0400)]
LU-11768 test: limit at_max to timeout in time

In test_6 of sanity-quota, if AT is enabled, the timeout of
the QUOTA_DQACQ request could be longer than OBD_TIMEOUT*2, which
causes the watchdog to be triggered.

Change-Id: I7e3a976a004259f5c956fc48f4d8d63c751ee2c0
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12400 osd-ldiskfs: support multi-page bvec 98/35498/2
Shaun Tancheff [Sun, 14 Jul 2019 11:33:35 +0000 (06:33 -0500)]
LU-12400 osd-ldiskfs: support multi-page bvec

Multi-page bvec support in bio_for_each_segment_all
requires users to supply a bvec_iter_all as an iterator.

Abstract the old (int) and new (bvec_iter_all) iterator types.

Linux-commit: 6dc4f100c175dd0511ae8674786e7c9006cdfbfa

Test-Parameters: trivial
Cray-bug-id: LUS-7600
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I5a6decae9a4d470e268e20e86c29623b98e97205
Reviewed-on: https://review.whamcloud.com/35498
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years ago LU-12455 osd: Correct readcache_max_filesize proc 66/35266/5
Patrick Farrell [Fri, 21 Jun 2019 16:44:14 +0000 (12:44 -0400)]
LU-12455 osd: Correct readcache_max_filesize proc

Readcache max file size is displayed as unsigned, but
parsed as signed.  This causes confusion when users try to
set it back to the default, which is 2^64-1 (or -1), and
they can't use 2^64-1.

Easy enough to add an unsigned parser and use it instead.
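
The mismatch is easy to reproduce in userspace with the C standard library. This sketch uses strtoll() and strtoull() as stand-ins for the kernel-side parsers, which are not shown here; the sketch_* function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a decimal string as a signed 64-bit value; -1 on any error. */
static int sketch_parse_signed(const char *str, long long *out)
{
	char *end;

	assert(str != NULL && out != NULL);
	errno = 0;
	*out = strtoll(str, &end, 10);
	if (errno == ERANGE || end == str || *end != '\0')
		return -1;
	return 0;
}

/* Parse a decimal string as an unsigned 64-bit value; -1 on any error. */
static int sketch_parse_unsigned(const char *str, unsigned long long *out)
{
	char *end;

	assert(str != NULL && out != NULL);
	errno = 0;
	*out = strtoull(str, &end, 10);
	if (errno == ERANGE || end == str || *end != '\0')
		return -1;
	return 0;
}
```

The string "18446744073709551615" (2^64-1) is out of range for the signed parser but is accepted by the unsigned one, which is the behaviour a user expects when writing back the displayed default.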

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I15f5677c61f4f12a8448a665c5d52a8c94d062f3
Reviewed-on: https://review.whamcloud.com/35266
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11883 nodemap: make deny_unknown visible on default nodemap 90/34090/10
Sebastien Buisson [Tue, 22 Jan 2019 16:26:18 +0000 (17:26 +0100)]
LU-11883 nodemap: make deny_unknown visible on default nodemap

deny_unknown can be set on the 'default' nodemap, but its value
cannot be read, neither with 'lctl get_param' nor by reading the
file /proc/fs/lustre/nodemap/default/deny_unknown directly.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idc8db01a8d32f0ae071f92307843379f4c02571c
Reviewed-on: https://review.whamcloud.com/34090
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years ago LU-12017 ldlm: DoM truncate deadlock 57/35057/10
Andriy Skulysh [Thu, 6 Jun 2019 12:22:00 +0000 (15:22 +0300)]
LU-12017 ldlm: DoM truncate deadlock

setxattr takes inode lock and sends reint to MDS.
truncate takes MDS_INODELOCK_DOM lock and wants
to acquire inode lock.

MDS locks are for different bits
MDS_INODELOCK_UPDATE|MDS_INODELOCK_XATTR vs
MDS_INODELOCK_DOM but they block each other if
some blocking lock was present earlier.

If a waiting IBITS lock has no conflicts with any lock in the
granted queue or any lock ahead of it in the waiting queue, then
it can be granted.

Use separate waiting lists for each ibit to eliminate full
lr_waiting list scan.
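
The grant rule above can be sketched as follows. This is a simplification under stated assumptions: lock modes are ignored (every pair of locks with overlapping bits is treated as conflicting), the bit values are hypothetical, and plain arrays stand in for the granted and waiting queues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inode bit values, for illustration only. */
#define SKETCH_BIT_UPDATE 0x1u
#define SKETCH_BIT_XATTR  0x2u
#define SKETCH_BIT_DOM    0x4u

struct sketch_lock {
	uint64_t bits;
};

/* Simplified conflict test: two locks conflict if they share any bit. */
static int sketch_conflicts(const struct sketch_lock *a,
			    const struct sketch_lock *b)
{
	return (a->bits & b->bits) != 0;
}

/* Can waiting[idx] be granted without passing a conflicting lock? */
static int sketch_can_grant(const struct sketch_lock *granted, size_t ngranted,
			    const struct sketch_lock *waiting, size_t nwaiting,
			    size_t idx)
{
	size_t i;

	assert(idx < nwaiting);
	for (i = 0; i < ngranted; i++)
		if (sketch_conflicts(&granted[i], &waiting[idx]))
			return 0;
	/* Only waiters ahead of idx in the queue are considered. */
	for (i = 0; i < idx; i++)
		if (sketch_conflicts(&waiting[i], &waiting[idx]))
			return 0;
	return 1;
}
```

With an UPDATE|XATTR lock granted and an UPDATE lock waiting at the head of the queue, a DOM lock queued behind it can still be granted, because it shares no bits with either.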

Cray-bug-id: LUS-6970
Change-Id: I95b2ed0b1a0063b7ece5277a5ee06e2511d44e5f
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/35057
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10048 ofd: take local locks within transaction 93/31293/63
Alex Zhuravlev [Tue, 13 Feb 2018 13:10:37 +0000 (16:10 +0300)]
LU-10048 ofd: take local locks within transaction

The patch (with companion patch LU-10048 osd: async truncate)
addresses long-outstanding technical debt resulting in different
locking order on MDT and OST. With the introduction of OUT this mismatch
led to deadlocks, as OUT is used by both sides and can't easily
support two different locking models simultaneously.

The unified locking rules are:
- open transaction (dt_trans_start()),
- then take a lock (dt_{read|write}_lock()).

Change-Id: I1eaeb58ae3d61869914293a9fea2d0a1faefe76b
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/31293
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12602 mdt: check EA size in mdt_getxattr_pack_reply() 68/35768/3
Emoly Liu [Wed, 14 Aug 2019 07:52:58 +0000 (15:52 +0800)]
LU-12602 mdt: check EA size in mdt_getxattr_pack_reply()

Check EA data size (non-positive or excessively large) in case of
any corruption.

Change-Id: I8ccea214f8d7c0403a9df180acf487ee381b8d77
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35768
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12605 tgt: check client data size in target_handle_connect() 11/35711/4
Emoly Liu [Fri, 9 Aug 2019 07:29:30 +0000 (15:29 +0800)]
LU-12605 tgt: check client data size in target_handle_connect()

Check client data size (negative or excessively large) in case of
memcpy corruption.
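
A guarded copy of this kind could be sketched as below. The function name, the signed length parameter and the -EINVAL return value are assumptions made for illustration, not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src_len bytes of client-supplied data into dst. A negative length
 * would turn into a huge size_t if passed to memcpy() unchecked, so it is
 * rejected together with lengths larger than the destination.
 */
static int sketch_copy_client_data(void *dst, size_t dst_size,
				   const void *src, int src_len)
{
	assert(dst != NULL && src != NULL);
	if (src_len < 0 || (size_t)src_len > dst_size)
		return -EINVAL;
	memcpy(dst, src, (size_t)src_len);
	return 0;
}
```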

Change-Id: Ided26dea0e2bbb79e607c626810834ca947497d4
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35711
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12675 mdt: release object reference upon error 45/35845/3
Bruno Faccini [Wed, 21 Aug 2019 13:32:54 +0000 (15:32 +0200)]
LU-12675 mdt: release object reference upon error

LBUG ("(lu_object.c:1196:lu_device_fini()) ASSERTION(
atomic_read(&d->ld_ref) == 0) failed: Refcount is <x>") can
intermittently occur during umount of MDT0000, upon specific
use cases (playing with file/dir having foreign LOV/LMV), and
due to object reference set/leaked on server side.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ic49b2bb0402b1a6e51d7ba656f9957eeda1bd0fb
Reviewed-on: https://review.whamcloud.com/35845
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9859 libcfs: fix cfs_print_to_console() 47/35847/2
James Simmons [Wed, 21 Aug 2019 15:52:03 +0000 (11:52 -0400)]
LU-9859 libcfs: fix cfs_print_to_console()

The original code for cfs_print_to_console() used printk() and
used tricks to select which printk level to use. Later a cleanup
patch landed that just converted printk directly to pr_info, which
is not exactly correct. Instead of converting back to printk, let's
move everything to pr_* type functions, which simplifies the code.
This allows us to fold both dbghdr_to_err_string() and the
function dbghdr_to_info_string() into cfs_print_to_console().

Linux-commit: f030d88558e77bbf07fab388c341af1cf86135c9

Fixes: 003096c7e3e ("LU-6142 libcfs: Fix style issues for linux-tracefile.c")

Change-Id: I44646b1ff41505faa05eeea7bfcb6911e893fb73
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35847
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11011 osc: add preferred checksum type support 49/32349/11
Li Xi [Thu, 10 May 2018 04:25:05 +0000 (00:25 -0400)]
LU-11011 osc: add preferred checksum type support

Some checksum types might not work correctly even though they are
available options and have the best speeds during test. In these
circumstances, users might want to use a certain checksum type which
is known to be functional. However, "lctl conf_param XXX-YYY.osc.
checksum_type=ZZZ" won't help to enforce a certain checksum type,
because the selected checksum type is determined during OSC
connection, which will overwrite the LLOG parameter.

To solve this problem, whenever a valid checksum type is set by "lctl
conf_param" or "lctl set_param", it is remembered as the preferred
checksum type for the OSC. During connection process, if that
checksum type is available, that checksum type will be selected as
the RPC checksum type regardless of its speed.

The semantics of the interface /proc/fs/lustre/osc/*/checksum_type
change slightly. If an invalid checksum name is written
into this entry, -EINVAL will be returned as before. If the written
string is a valid checksum name, even though the checksum type is
not supported by this OSC/OST pair, the checksum type will still be
remembered as the preferred checksum type, and the return value will be
-ENOTSUPP. Whenever connecting/reconnecting happens, if the preferred
checksum type is available, it will be used for the RPC checksum.
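
The behaviour described in this message can be sketched roughly as below. This is a userspace illustration under stated assumptions: the flag names, their values, and both function names are hypothetical, and DEMO_ENOTSUPP stands in for the kernel-internal ENOTSUPP, which userspace <errno.h> does not provide.

```c
#include <errno.h>

/* Hypothetical checksum type bits; the real flag names and values differ. */
enum {
	DEMO_CKSUM_CRC32  = 1 << 0,
	DEMO_CKSUM_ADLER  = 1 << 1,
	DEMO_CKSUM_CRC32C = 1 << 2,
	DEMO_CKSUM_ALL    = (1 << 3) - 1
};

/* Stand-in for the kernel-internal ENOTSUPP value. */
#define DEMO_ENOTSUPP 524

/*
 * Handle a write to the tunable: an unknown name (modelled here as a
 * value that is not exactly one known bit) fails with -EINVAL and
 * changes nothing.  A known type is always remembered; if the current
 * OSC/OST pair does not support it, the caller still gets -ENOTSUPP.
 */
static int demo_set_preferred(unsigned int *preferred,
			      unsigned int supported,
			      unsigned int requested)
{
	if (requested == 0 || (requested & ~DEMO_CKSUM_ALL) ||
	    (requested & (requested - 1)))
		return -EINVAL;

	*preferred = requested;
	if (!(requested & supported))
		return -DEMO_ENOTSUPP;
	return 0;
}

/*
 * Pick the RPC checksum type at (re)connect time: the preferred type
 * wins whenever it is supported, otherwise fall back to the fastest.
 */
static unsigned int demo_select_cksum(unsigned int supported,
				      unsigned int preferred,
				      unsigned int fastest)
{
	if (preferred & supported)
		return preferred;
	return fastest;
}
```

Remembering the type even when it is unsupported is what lets a later reconnect to a capable server pick it up without another write.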

Change-Id: Ie6fdc1d8ed6c55531ad6b7c926659d644fefccaf
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/32349
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12313 llite: Mark lustre_inode_cache as reclaimable 90/35790/4
Jacek Tomaka [Sat, 18 May 2019 02:17:30 +0000 (10:17 +0800)]
LU-12313 llite: Mark lustre_inode_cache as reclaimable

This is required for proper accounting of available kernel memory.
Without it, memory allocated to lustre_inode_cache appears as
SUnreclaim where in reality it should appear as SReclaimable.
This affects MemAvailable as well (it is lower than it should be).

Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Change-Id: Iac526a62d0e063b82eea451d1fafa42f2bb4d3b8
Reviewed-on: https://review.whamcloud.com/35790
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12660 kernel: kernel update SLES12 SP4 [4.12.14-95.29.1] 74/35774/3
Jian Yu [Mon, 12 Aug 2019 17:32:17 +0000 (10:32 -0700)]
LU-12660 kernel: kernel update SLES12 SP4 [4.12.14-95.29.1]

Update SLES12 SP4 kernel to 4.12.14-95.29.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT="103a 817"

Change-Id: I4cc8564c4044f551a075ee0d41fd393844f4b760
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35774
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10070 lod: SEL inheritance fix 04/35704/3
Vitaly Fertman [Wed, 7 Aug 2019 13:48:17 +0000 (16:48 +0300)]
LU-10070 lod: SEL inheritance fix

A sub-dir cannot be created in a dir with a SEL layout template
due to a problem in layout verification.

Fixes: 94e9699878e ("LU-10070 lod: SEL: interoperability support")

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Cray-bug-id: LUS-2528
Change-Id: I2c439e44d8f183f4057f548ee38e530c578371e4
Reviewed-on: https://review.whamcloud.com/35704
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12611 lnet: continue adding routes 41/35641/2
Amir Shehata [Mon, 29 Jul 2019 21:53:06 +0000 (14:53 -0700)]
LU-12611 lnet: continue adding routes

Continue adding routes specified even if the current route
exists or the gateway specified in the current route is unreachable.
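
A loop with this "keep going" behaviour might look like the sketch below. It is a userspace illustration only: the struct, the function names, and the choice of -EEXIST and -EHOSTUNREACH as the tolerated errors are assumptions, not taken from the LNet source.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical route description used only for this sketch. */
struct demo_route {
	int already_exists;	/* route is already configured */
	int gateway_up;		/* gateway is currently reachable */
	int added;		/* set when the route was added */
};

static int demo_add_route(struct demo_route *r)
{
	if (r->already_exists)
		return -EEXIST;
	if (!r->gateway_up)
		return -EHOSTUNREACH;
	r->added = 1;
	return 0;
}

/*
 * Add every route in the list.  Duplicates and unreachable gateways
 * are skipped rather than aborting the whole list; any other error
 * still stops processing.  Returns the number of routes added.
 */
static size_t demo_add_routes(struct demo_route *routes, size_t n)
{
	size_t i, added = 0;
	int rc;

	for (i = 0; i < n; i++) {
		rc = demo_add_route(&routes[i]);
		if (rc == -EEXIST || rc == -EHOSTUNREACH)
			continue;
		if (rc != 0)
			break;
		added++;
	}
	return added;
}
```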

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I71257cd444c29d4641d9d27f05d9160871316b02
Reviewed-on: https://review.whamcloud.com/35641
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12589 llite: swab LOV EA data in ll_getxattr_lov() 26/35626/4
Jian Yu [Sat, 3 Aug 2019 06:58:34 +0000 (23:58 -0700)]
LU-12589 llite: swab LOV EA data in ll_getxattr_lov()

On PPC client, the LOV EA data returned by getfattr from x86_64 server
was not swabbed to the host endian. While running setfattr, the data was
swabbed in ll_lov_setstripe_ea_info(), which caused magic mis-match in
ll_lov_user_md_size() and then ll_setstripe_ea() returned -ERANGE.

This patch fixed the above issue by swabbing LOV EA data in ll_getxattr_lov().

Test-Parameters: clientarch=ppc64 \
envdefinitions=ONLY="24D 102a" testlist=sanity

Change-Id: I8069df0c8f07c0bedba2e27db7c3a5553f11afb4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35626
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
4 years agoLU-11873 tests: Increase barrier freeze time 61/35361/2
Patrick Farrell [Fri, 28 Jun 2019 15:32:29 +0000 (11:32 -0400)]
LU-11873 tests: Increase barrier freeze time

Barrier freeze times of 10 seconds or less are roughly the
same length as ZFS commit intervals, and because barriers
generate sync ops, they have to wait for those.  This means
that a 10 second barrier will occasionally expire before
the commit has finished.

Switch to barriers of at least 20 seconds.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I50fc8315c791ed444ccf39755441fdbe3aa1db6c
Reviewed-on: https://review.whamcloud.com/35361
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
4 years agoLU-12043 llite: don't miss every first stride page 16/35216/6
Wang Shilong [Thu, 8 Aug 2019 16:49:24 +0000 (12:49 -0400)]
LU-12043 llite: don't miss every first stride page

Whenever we need to skip some pages for a stride I/O read, we
calculate the next start page index. However, this page
index is skipped every time, because the loop starts from index + 1.
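
The off-by-one can be modelled with a small userspace sketch. This is not the llite readahead code: the function, its parameters, and the fixed-period stride layout are hypothetical, and the skip_bug switch merely imitates a loop that resumes from index + 1 after the jump.

```c
/*
 * Count the pages that get read out of npages, where each stride
 * period of `period` pages contains `len` wanted pages at its start.
 * With skip_bug set, the loop resumes one page past the computed
 * stride start, so the first page of every later stride is missed.
 */
static unsigned int demo_stride_read(unsigned long npages,
				     unsigned long period,
				     unsigned long len, int skip_bug)
{
	unsigned int nread = 0;
	unsigned long idx = 0;

	while (idx < npages) {
		if (idx % period < len) {
			nread++;	/* page is inside a stride window */
			idx++;
			continue;
		}
		/* Outside the window: jump to the next stride start. */
		idx = (idx / period + 1) * period;
		if (skip_bug)
			idx++;		/* models resuming at index + 1 */
	}
	return nread;
}
```

With 16 pages, a period of 4, and 2 wanted pages per stride, the correct loop reads 8 pages while the buggy variant reads only 5, because pages 4, 8, and 12 are skipped.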

Testing command: iozone -w -c -i 5 -t1 -j 2 -s 100m -r 1m -F data
Without patch: 587384.69 kB/sec

                        read                    write
pages per rpc         rpcs   % cum % |       rpcs   % cum %
1:                      16  19  19   |          0   0   0
2:                       0   0  19   |          0   0   0
4:                       0   0  19   |          0   0   0
8:                       0   0  19   |          0   0   0
16:                      0   0  19   |          0   0   0
32:                      0   0  19   |          0   0   0
64:                      0   0  19   |          0   0   0
128:                     0   0  19   |          0   0   0
256:                     0   0  19   |          0   0   0
512:                    22  26  46   |          0   0   0
1024:                   44  53 100   |          0   0   0

With patch: 744635.56 kB/sec
                        read                    write
pages per rpc         rpcs   % cum % |       rpcs   % cum %
1:                       0   0   0   |          0   0   0
2:                       0   0   0   |          0   0   0
4:                       0   0   0   |          0   0   0
8:                       0   0   0   |          0   0   0
16:                      0   0   0   |          0   0   0
32:                      0   0   0   |          0   0   0
64:                      0   0   0   |          0   0   0
128:                     0   0   0   |          0   0   0
256:                     0   0   0   |          0   0   0
512:                     8  13  13   |          0   0   0
1024:                   50  86 100   |          0   0   0

Performance improves by about 27% here, and all 1-page RPCs
disappear.

Change-Id: I126674cbe15197f0abdff256fdde3fc0c49c6898
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35216
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12394 llite: Fix extents_stats 75/35075/8
Patrick Farrell [Tue, 11 Jun 2019 18:54:20 +0000 (14:54 -0400)]
LU-12394 llite: Fix extents_stats

Patch 32517 from LU-8066 changed:
        (1 << LL_HIST_START << i)

To:

        BIT(LL_HIST_START << i)

But these are not equivalent because this changes the order
of operations.  The earlier one does the operations in this
order:
        (1 << LL_HIST_START) << i

The new one is this order:
        1 << (LL_HIST_START << i)

Which is quite different, as it's left shifting
LL_HIST_START directly, and LL_HIST_START is a number of
bits.

The goal is really just to start with BIT(LL_HIST_START)
and left shift by one (going from 4K, to 8K, etc) each
time, so just use:
        BIT(LL_HIST_START + i)

The result of this was that all i/os over 8K were placed in
the 4K-8K stat bucket, because the loop exited early.
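
The difference between the three forms can be checked with a short userspace sketch. The value 12 for LL_HIST_START is an assumption (it matches a first bucket of 4K), and DEMO_BIT is a local stand-in for the kernel's BIT() macro.

```c
/* Assumed for illustration: 1 << 12 == 4K, the first bucket boundary. */
#define LL_HIST_START 12

/* Local stand-in for the kernel BIT() macro. */
#define DEMO_BIT(nr) (1ULL << (nr))

/* Original, correct form: (1 << LL_HIST_START) << i */
static unsigned long long demo_bucket_original(unsigned int i)
{
	return (1ULL << LL_HIST_START) << i;
}

/* Fixed form from this patch: BIT(LL_HIST_START + i) */
static unsigned long long demo_bucket_fixed(unsigned int i)
{
	return DEMO_BIT(LL_HIST_START + i);
}

/*
 * Shift count the broken form BIT(LL_HIST_START << i) would use.
 * Only the count is returned, because shifting a 64-bit value by
 * 64 or more is undefined behaviour in C.
 */
static unsigned int demo_broken_shift_count(unsigned int i)
{
	return LL_HIST_START << i;
}
```

Under these assumptions the broken shift count is 12, 24, 48, 96 for i = 0..3, so the boundary jumps from 4K straight to 16M and then past the width of the type, whereas the fixed form doubles on each step.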

Also add mmap'ed reads & writes to extents_stats.

Add test for extents_stats.

Fixes: adb5aca3d673 ("LU-8066 llite: Move all remaining procfs entries
                     to debugfs")

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iab4dc097234d411601a18d501075df45791d1138
Reviewed-on: https://review.whamcloud.com/35075
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12343 osc: Fix dom handling in weight_ast 66/34966/5
Patrick Farrell [Wed, 29 May 2019 15:02:19 +0000 (11:02 -0400)]
LU-12343 osc: Fix dom handling in weight_ast

The DOM bit can be cancelled at any time during calls to
weigh_ast, so:

1. We cannot assert that it is present
2. We cannot use it to identify the !LDLM_EXTENT case when
calling osc_lock_weight

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic3e7370580e35d8ae06b8330971959e0d55a4e81
Reviewed-on: https://review.whamcloud.com/34966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12336 build: Update ZFS version to 0.8.1 51/34951/16
Nathaniel Clark [Thu, 1 Aug 2019 16:53:47 +0000 (12:53 -0400)]
LU-12336 build: Update ZFS version to 0.8.1

New Features 0.8.0

* Native encryption #5769 - The encryption property enables the
creation of encrypted filesystems and volumes. The aes-256-ccm
algorithm is used by default. Per-dataset keys are managed with zfs
load-key and associated subcommands.

* Raw encrypted 'zfs send/receive' #5769 - The zfs send -w option
allows an encrypted dataset to be sent and received to another pool
without decryption. The received dataset is protected by the original
user key from the sending side. This allows datasets to be efficiently
backed up to an untrusted system without fear of the data being
compromised.

* Device removal #6900 - This feature allows single and mirrored
top-level devices to be removed from the storage pool with zpool
remove. All data is copied in the background to the remaining
top-level devices and the pool capacity is reduced accordingly.

* Pool checkpoints #7570 - The zpool checkpoint subcommand allows
you to preserve the entire state of a pool and optionally revert back
to that exact state. It can be thought of as a pool wide snapshot.
This is useful when performing complex administrative actions which
are otherwise irreversible (e.g. enabling a new feature flag,
destroying a dataset, etc).

* Pool TRIM #8419 - The zpool trim subcommand provides a way to
notify the underlying devices which sectors are no longer allocated.
This allows an SSD to more efficiently manage itself and helps prevent
performance from degrading. Continuous background trimming can be
enabled via the new autotrim pool property.

* Pool initialization #8230 - The zpool initialize subcommand writes
a pattern to all the unallocated space. This eliminates the first
access performance penalty, which may exist on some virtualized
storage (e.g. VMware VMDKs).

* Project accounting and quota #6290 - This feature adds project
based usage accounting and quota enforcement to the existing space
accounting and quota functionality. Project quotas add an additional
dimension to traditional user/group quotas. The zfs project and zfs
projectspace subcommands have been added to manage projects, set quota
limits and report on usage.

* Channel programs #6558 - The zpool program subcommand can be used
to perform compound ZFS administrative actions via Lua scripts in a
sandboxed environment (with time and memory limits).

* Pyzfs #7230 - The new pyzfs library is intended to provide a
stable interface for the programmatic administration of ZFS. This
wrapper provides a one-to-one mapping for the libzfs_core API
functions, but the signatures and types are more natural to Python.

* Python 3 compatibility #8096 - The arcstat, arcsummary, and
dbufstat utilities have been updated to be compatible with Python 3.

* Direct IO #7823 - Adds support for Linux's direct IO interface.

Performance

* Sequential scrub and resilver #6256 - When scrubbing or
resilvering a pool the process has been split into two phases. The
first phase scans the pool metadata in order to determine where the
data blocks are stored on disk. This allows the second phase to issue
scrub I/O as sequentially as possible, greatly improving performance.

* Allocation classes #5182 - Allows a pool to include a small number
of high-performance SSD devices that are dedicated to storing specific
types of frequently accessed blocks (e.g. metadata, DDT data, or small
file blocks). A pool can opt-in to this feature by adding a special or
dedup top-level device.

* Administrative commands #7668 - Improved performance due to
targeted caching of the metadata required for administrative commands
like zfs list and zfs get.

* Parallel allocation #7682 - The allocation process has been
parallelized by creating multiple "allocators" per-metaslab group.
This results in improved allocation performance on high-end systems.

* Deferred resilvers #7732 - This feature allows new resilvers to be
postponed if an existing one is already in progress. By waiting for
the running resilver to complete redundancy is restored as quickly as
possible.

* ZFS Intent Log (ZIL) #6566 - New log blocks are created and issued
while there are still outstanding blocks being serviced by the
storage, effectively reducing the overall latency observed by the
application.

* Volumes #8615 - When a pool contains a large number of volumes
they are more promptly registered with the system and made available
for use after a zpool import.

* QAT #7295 #7282 #6767 - Support for accelerated SHA256 checksums,
AES-GCM encryption, and the new QAT Intel(R) C62x Chipset / Atom(R)
C3000 Processor Product Family SoC.

Changes in Behavior

* Relaxed (ref)reservation constraints on volumes, they may now be
set larger than the volume size.

* The arcstat.py, arc_summary.py, and dbufstat.py commands have been
renamed arcstat, arc_summary, and dbufstat respectively.

* The SPL source is now included in the ZFS repository removing the
need for separate packages.

* The dedupditto pool property and zfs send -D option have been
deprecated and will be removed in a future release.

Changes for 0.8.1

* Fix comparison signedness in arc_is_overflowing() #8873
* Fix incorrect error message for raw receive #8863
* arc_summary: prefer python3 version and install when there is no
python #8851
* Fix %post and %postun generation in kmodtool #8866
* Reinstate raw receive check when truncating #8852 #8857
* If $ZFS_BOOTFS contains guid, replace the guid portion with $pool
* Fix integer overflow of ZTOI(zp)->i_generation #8858
* hkdf_test binary should only have one icp instance #8850
* Fixed a small typo in man/man1/raidz_test.1 #8855
* Allow TRIM_UNUSED_KSYM when build as a builtin-module #8820
* Make Python detection optional and more portable #8809 #8731
* Wait in 'S' state when send/recv pipe is blocking #8733 #8752
* Make zfs_async_block_max_blocks handle zero correctly #8829 #8289
* Revert "Report holes when there are only metadata changes" #8816
* Add link count test for root inode #8732
* Exclude log device ashift from normal class #8735
* Fix integer overflow in get_next_chunk() #8778 #8797
* Double-free of encryption wrapping key due to invalid pool
properties #8791
* tests: fix cosmetic permission issues during 'make install' #8803
* test-runner.py: change shebang to python3 #8803
* Endless loop in zpool_do_remove() on platforms with unsigned char
* Fix embedded bp accounting in count_block() #8800 #8766
* Disable parallel processing for 'zfs mount -l' #8762 #8811
* Linux 5.2 compat: Directly call wait_on_page_bit() #8794
* Linux 5.2 compat: Fix config/kernel-shrink.m4 test failure #8776
* Linux 5.2 compat: Remove config/kernel-set-fs-pwd.m4 #8777
* zpool: status -t is not documented in help message #8782
* zfs-tests: fix warnings when packaging some .shlib files #8787
* VERIFY3P() message is missing a space character #8786
* zfs-tests: verify zfs(8) and zpool(8) help message is under 80
columns #8785
* zfs: don't pretty-print objsetid property #8784
* zfs: missing newline character in zfs_do_channel_program() error
message #8783
* Fix ksh-path for random_readwrite_fixed.ksh #8779
* Linux 2.6.39 compat: Test if kstrtoul() exists #8760 #8761
* Device removal panics on 32-bit systems #8790
* zpool: trim -p is not a valid option #8781
* Fix coverity defects: CID 186143 #8788
* Rename reservation tests from *.sh to *.ksh #8729
* Fix kstat state update during pool transition #8746
* Linux 5.2 compat: rw_tryupgrade() #8730

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I32973aa5e553eba40abb27fb36461b02674efcd8
Reviewed-on: https://review.whamcloud.com/34951
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12100 tests: Use minimum soft qunit limits 67/35667/7
Nathaniel Clark [Thu, 1 Aug 2019 15:36:59 +0000 (11:36 -0400)]
LU-12100 tests: Use minimum soft qunit limits

Ensure that we don't create limits that are too small, which would
cause all writes to fail.
Wait for grace period to timeout.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-quota
Test-Parameters: testlist=sanity-quota fstype=zfs
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I9342272615ca9c252fcc7f77ed8a61030fc9672a
Reviewed-on: https://review.whamcloud.com/35667
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12622 tests: skip sanity test_816 with SSK 64/35664/2
Sebastien Buisson [Thu, 1 Aug 2019 06:58:44 +0000 (08:58 +0200)]
LU-12622 tests: skip sanity test_816 with SSK

sanity test_816 is incompatible with SSK, so skip it
in case SHARED_KEY is true.

Whamcloud-bug-id: ATM-1283
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I019cf6a4bab8b0cf9825b7c49f364225d5156dfa
Reviewed-on: https://review.whamcloud.com/35664
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 years agoLU-12657 llite: forget cached ACLs properly 56/35756/5
Alex Zhuravlev [Fri, 9 Aug 2019 19:43:45 +0000 (23:43 +0400)]
LU-12657 llite: forget cached ACLs properly

Lustre with linux-4.* fails ACL tests (e.g. sanity/103 and sanityn/25)
because ll_lock_cancel_bits() does not reset i_acl and i_default_acl
to their initial state.  Use the kernel's forget_all_cached_acls() to do so.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I468b775e13ba0f7279a6aa320983705f5e79187a
Reviewed-on: https://review.whamcloud.com/35756
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Tested-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-6142 libcfs: Fix style issues for linux-tracefile.c 73/35773/2
James Simmons [Mon, 12 Aug 2019 15:39:27 +0000 (11:39 -0400)]
LU-6142 libcfs: Fix style issues for linux-tracefile.c

This patch fixes issues reported by checkpatch for the file
linux-tracefile.c. It brings us into sync with linux client
code.

Change-Id: Ie058bb837f4b53746b0334f08a1091e5c5fe856d
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35773
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12402 lnet: handle recursion in resend 31/35431/6
Amir Shehata [Sat, 6 Jul 2019 16:02:33 +0000 (09:02 -0700)]
LU-12402 lnet: handle recursion in resend

When we're resending a message we have to decommit it first. This
could potentially result in another message being picked up from the
queue and sent, which could fail immediately and be finalized, causing
recursion. This problem was observed when a router was being shut down.

This patch uses the same mechanism used in lnet_finalize() to limit
recursion. If a thread is already finalizing a message and it gets
into a path where it starts finalizing a second, then that message
is queued and handled later.
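
The general "queue instead of recursing" idea can be sketched as below. This is a single-threaded userspace model, not lnet_finalize() itself: all names are hypothetical, the fixed-size array stands in for what would be a list in real code, and the rule that finishing message N triggers message N - 1 exists only to provoke nesting.

```c
#include <stddef.h>

#define DEMO_QLEN 8

struct demo_finalizer {
	int busy;		/* set while a finalize loop is running */
	int pending[DEMO_QLEN];	/* messages deferred by nested calls */
	size_t npending;
	int depth, max_depth;	/* instrumentation for the sketch */
	int ndone;
};

static void demo_finalize(struct demo_finalizer *f, int msg);

/* Hypothetical work: finishing message N makes message N - 1 fail too. */
static void demo_complete(struct demo_finalizer *f, int msg)
{
	f->depth++;
	if (f->depth > f->max_depth)
		f->max_depth = f->depth;
	if (msg > 0)
		demo_finalize(f, msg - 1);	/* would recurse without the guard */
	f->ndone++;
	f->depth--;
}

static void demo_finalize(struct demo_finalizer *f, int msg)
{
	if (f->busy) {
		/* Already finalizing: defer instead of recursing.  The
		 * sketch silently drops on overflow; real code must not. */
		if (f->npending < DEMO_QLEN)
			f->pending[f->npending++] = msg;
		return;
	}

	f->busy = 1;
	demo_complete(f, msg);
	while (f->npending > 0)
		demo_complete(f, f->pending[--f->npending]);
	f->busy = 0;
}
```

Finalizing message 3 on a zero-initialized demo_finalizer completes four messages while the completion depth never exceeds one.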

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0cb943473fc8c22573d98da63a99cf7d678d4f42
Reviewed-on: https://review.whamcloud.com/35431
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11594 test: re-enable sanity test 103a for ARM 19/34819/8 19/34819/9
James Simmons [Mon, 12 Aug 2019 01:54:46 +0000 (21:54 -0400)]
LU-11594 test: re-enable sanity test 103a for ARM

The fix for LU-12657 resolves why sanity test 103a fails with newer
kernels. The ARM platform has been failing for this reason, so let's
re-enable the test.

Test-Parameters: trivial testgroup=review-ldiskfs-arm

Change-Id: I8b56b7926de7a155ea1429a47d6cbab7a7836bf1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34819
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12503 vvp_dev: increment *pos in .next 65/35765/2
NeilBrown [Sun, 11 Aug 2019 15:43:40 +0000 (11:43 -0400)]
LU-12503 vvp_dev: increment *pos in .next

As described in

Commit ec2e9995e4c5 ("lustre: llite: change how "dump_page_cache" walks a hash table")

The .next function should increment *pos. For some reason it
didn't, and this can trigger the warning in that function.
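
The iterator contract can be illustrated with a userspace model. This is not the kernel seq_file API or the vvp_dev code: the item table and function names are hypothetical, and long long stands in for loff_t.

```c
#include <stddef.h>

#define DEMO_NITEMS 3

static const int demo_items[DEMO_NITEMS] = { 10, 20, 30 };

/* Return the item at *pos, or NULL once the position is past the end. */
static const int *demo_start(long long *pos)
{
	if (*pos < 0 || *pos >= DEMO_NITEMS)
		return NULL;
	return &demo_items[*pos];
}

/*
 * Advance to the following item.  The position is incremented
 * unconditionally, even when the end is reached, so a caller that
 * restarts from *pos never sees the same item twice.
 */
static const int *demo_next(const int *v, long long *pos)
{
	(void)v;
	++*pos;
	return demo_start(pos);
}
```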

Change-Id: If4ac748f455750d82712299b7915eb541a3ddc7e
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35765
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12615 mdt: check mdt_object 64/35764/2
Hongchao Zhang [Sat, 13 Jul 2019 12:07:07 +0000 (08:07 -0400)]
LU-12615 mdt: check mdt_object

In processing RPC of getattr, getxattr, swap_layouts and sync,
the mdt_object should be checked to verify there is a valid
RMF_MDT_BODY field and OBD_MD_FLID is set properly.

Change-Id: Ibb6aaa5ec5eb4b7284f4d5567a618a908d66920c
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Reviewed-on: https://review.whamcloud.com/35764
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12462 osc: layout and chunkbits alignment mismatch 33/35733/6
Vitaly Fertman [Thu, 8 Aug 2019 15:46:06 +0000 (18:46 +0300)]
LU-12462 osc: layout and chunkbits alignment mismatch

In the discard case, the OSC fsync/writeback code asserts
that each OSC extent is fully covered by the fsync request.

It may happen that the start (or the end) of a component does not match
the start (end) of the first (last) OSC object extent, which is aligned
by cl_chunkbits, which depends on the OST block size.

The requirement for the component alignment is LOV_MIN_STRIPE_SIZE
which is 64K, whereas the ZFS block size could be in MBs.

Use the fsync region aligned by chunk size in the assertion.
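
The alignment issue can be shown with a small userspace sketch. It works on inclusive byte offsets and uses hypothetical helper names; the real OSC code may operate on different units and structures.

```c
#include <stdint.h>

static uint64_t demo_chunk_mask(unsigned int chunkbits)
{
	return (UINT64_C(1) << chunkbits) - 1;
}

/* Does the range [start, end] cover the extent exactly as given? */
static int demo_covers_exact(uint64_t start, uint64_t end,
			     uint64_t ext_start, uint64_t ext_end)
{
	return start <= ext_start && ext_end <= end;
}

/*
 * Same check after widening the range to chunk boundaries: the start
 * is rounded down and the end is rounded up to the last byte of its
 * chunk, matching how the extents themselves are aligned.
 */
static int demo_covers_aligned(uint64_t start, uint64_t end,
			       uint64_t ext_start, uint64_t ext_end,
			       unsigned int chunkbits)
{
	uint64_t mask = demo_chunk_mask(chunkbits);

	return demo_covers_exact(start & ~mask, end | mask,
				 ext_start, ext_end);
}
```

For example, with 1 MiB chunks (chunkbits of 20) a range starting at 64K does not cover an extent that starts at offset 0 until the range is rounded down to the chunk boundary.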

Fixes: 092ecd6612 ("LU-12462 osc: Do not assert for first extent")

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I2ff47fc87c838239142ffc63bebafce3e9403f4e
Cray-bug-id: LUS-7498
Reviewed-on: https://review.whamcloud.com/35733
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10931 lnet: handle unlink before send completes 44/35444/4
Amir Shehata [Mon, 8 Jul 2019 19:33:31 +0000 (12:33 -0700)]
LU-10931 lnet: handle unlink before send completes

If LNetMDUnlink() is called on an md with md->md_refcount > 0 then
the eq callback isn't called.
There is a scenario where the response times out before the send
completes. So we have a refcount on the MD. The Unlink callback gets
dropped on the floor. Send completes, but because we've already timed
out, the REPLY for the GET is dropped. Now we're left with a peer
that is in the following state:
LNET_PEER_MULTI_RAIL
LNET_PEER_DISCOVERING
LNET_PEER_PING_SENT
But no more events are coming to it, and the discovery never
completes.

This scenario can get RPCs stuck as well if the response times out
before the send completes.

The solution is to set the event status to -ETIMEDOUT to inform
the send event handler that it should not expect a reply.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ica0e1a823d0d1200bb8cc42a6e058785da1d4fa4
Reviewed-on: https://review.whamcloud.com/35444
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11760 ofd: limit num of objects to create in 1 transaction 73/35373/5
Sergey Cheremencev [Fri, 28 Jun 2019 20:42:28 +0000 (23:42 +0300)]
LU-11760 ofd: limit num of objects to create in 1 transaction

Set flag th_sync when the number of objects to create per
sequence reaches OST_MAX_PRECREATE in one transaction.
It is needed to avoid gaps after OST failover.
See details in LU-11760.

Change-Id: Ie29de5a42e757b07561749982359c01df999e798
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-on: https://review.whamcloud.com/35373
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
4 years agoNew tag 2.12.57 2.12.57 v2_12_57
Oleg Drokin [Tue, 20 Aug 2019 16:25:10 +0000 (12:25 -0400)]
New tag 2.12.57

Change-Id: Ib7df9478e2fcd5ab9221c09af9d1ebf34208f09a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10094 mdc: dir page ldp_hash_end mistakenly adjusted 17/35517/6
Lai Siyao [Mon, 1 Jul 2019 15:02:07 +0000 (23:02 +0800)]
LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted

On systems with PAGE_SIZE > 4k, mdc_adjust_dirpages() adjusts dir page
end hash with le64_to_cpu() value, but it should be little endian.
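
The rule that a little-endian field must be converted back before it is stored can be sketched in portable userspace C. The struct and helpers below are hypothetical and use explicit byte arrays instead of the kernel's __le64 type and conversion macros.

```c
#include <stdint.h>

/* Hypothetical page header whose end hash is kept little endian. */
struct demo_dirpage {
	unsigned char hash_end_le[8];
};

static void demo_put_le64(unsigned char out[8], uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}

static uint64_t demo_get_le64(const unsigned char in[8])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}

/*
 * Propagate the end hash from the last sub-page to the first one.
 * The CPU-order value is fine for comparisons, but it is converted
 * back to little endian before being written into the page.
 */
static void demo_adjust_end(struct demo_dirpage *first,
			    const struct demo_dirpage *last)
{
	uint64_t end = demo_get_le64(last->hash_end_le);

	demo_put_le64(first->hash_end_le, end);
}
```

On a little-endian host the missing conversion is invisible, which is presumably why the test parameters target a ppc64 client.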

Fixes: 9d087dfd0fd ("LU-4516 mdc: missing lexxx_to_cpu in
mdc_read_entry")

Test-Parameters: clientarch=ppc64 envdefinitions=ONLY="18 22 32 48" \
testlist=sanity

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I89bb8b93f1fe5f7962f0b80d122ef9965cf15c63
Reviewed-on: https://review.whamcloud.com/35517
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12562 tests: sanity/256 to use /tmp for the temporary file 54/35554/4
Alex Zhuravlev [Thu, 18 Jul 2019 06:22:59 +0000 (10:22 +0400)]
LU-12562 tests: sanity/256 to use /tmp for the temporary file

Otherwise the test fails when run from a read-only build tree.

Test-Parameters: trivial

Change-Id: Ia61202ee92d39e6b0bab21e09bb4fd50fa8e6749
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35554
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-930 utils: fix 'lfs find' error message 40/35740/2
Andreas Dilger [Thu, 8 Aug 2019 22:07:26 +0000 (16:07 -0600)]
LU-930 utils: fix 'lfs find' error message

Print out the actual filename supplied to "lfs find" rather than the
last argument in the list.  Otherwise we get error messages like:

    $ lfs find /myth/tmp -name fasdfasf -size 12
    error: find failed for 12.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I56eef89ab3c04b193f1c7dec3852c4038b3ebbe5
Reviewed-on: https://review.whamcloud.com/35740
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12542 handles: discard h_owner in favour of h_ops 39/35739/3
NeilBrown [Fri, 9 Aug 2019 00:02:43 +0000 (20:02 -0400)]
LU-12542 handles: discard h_owner in favour of h_ops

lustre_handles assigns a 64-bit unique identifier (a 'cookie') to
objects of various types and stores them in a hash table, allowing
them to be accessed by the cookie.

There is a facility for type checking by recording an 'owner' for each
object, and checking the owner on lookup. Unfortunately this is not
used - owner is always zero for the client.

Each object also contains an h_ops pointer which can be used to
reliably identify an owner.

So discard h_owner, pass an 'ops' pointer to class_handle2object(),
and only return objects for which the h_ops matches.
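
A minimal sketch of the idea, assuming a toy one-slot-per-bucket
table: every demo_* name is hypothetical and this is not the
lustre_handles implementation. It only shows how comparing the ops
pointer on lookup doubles as a type check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-type operations table; its address identifies the type. */
struct demo_handle_ops {
	const char *hop_type;
};

struct demo_handle {
	uint64_t h_cookie;
	const struct demo_handle_ops *h_ops;
};

#define DEMO_TABLE_SIZE 8

/* Toy table: one slot per bucket, a colliding insert overwrites it. */
static struct demo_handle *demo_table[DEMO_TABLE_SIZE];

static void demo_handle_hash(struct demo_handle *h, uint64_t cookie,
			     const struct demo_handle_ops *ops)
{
	assert(h != NULL && ops != NULL);
	h->h_cookie = cookie;
	h->h_ops = ops;
	demo_table[cookie % DEMO_TABLE_SIZE] = h;
}

/* Return the object only if both the cookie and the ops pointer match. */
static struct demo_handle *demo_handle2object(uint64_t cookie,
					      const struct demo_handle_ops *ops)
{
	struct demo_handle *h = demo_table[cookie % DEMO_TABLE_SIZE];

	if (h == NULL || h->h_cookie != cookie || h->h_ops != ops)
		return NULL;
	return h;
}
```

A caller that expects a lock passes the lock ops table, so a cookie
that resolves to an object of another type yields NULL instead of a
mistyped pointer.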

Note: this h_owner is now quite different from the similar h_owner
in the server code.  When the server code is merged the "med" pointer
should be stored in the "mfd" and validated separately.

This reduces the size of the portals_handle by one pointer, which
benefits various other structures including struct ldlm_lock which can
be very populous and so is best kept small.

Change-Id: I9cf2b32f8b0ea7c188888301fb6130818b3d5ae9
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35739
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12542 obdclass: Remove unused function class_handle_hash_back() 38/35738/2
Oleg Drokin [Thu, 8 Aug 2019 23:44:40 +0000 (19:44 -0400)]
LU-12542 obdclass: Remove unused function class_handle_hash_back()

No callers left.

Linux-commit: 3a459a79cea24cf0f6def24a16ce7b308d93d8a2

Test-Parameters: trivial

Change-Id: Idb7033d418758bfa3b1239cfca46d9317d168cce
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/35738
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.com>
4 years agoLU-11607 tests: replace version/fstype calls in conf-sanity 21/35721/2
James Nunez [Wed, 7 Aug 2019 21:12:51 +0000 (15:12 -0600)]
LU-11607 tests: replace version/fstype calls in conf-sanity

The routine get_lustre_env() is available to all Lustre test
suites and sets an environment variable for the file system
type for MDS1 and OST1 and sets a variable for the Lustre
version of servers.

In conf-sanity, replace the calls to facet_fstype() and
lustre_version_code() for all server types defined in
get_lustre_env().  While doing this, replace SINGLEMDS with
mds1 in these calls.

Clean up around any modifications with
- converting spaces to tabs
- removing calls to return after skip() or skip_env()

Test-Parameters: trivial testlist=conf-sanity
Test-Parameters: fstype=zfs testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I17707883d46aa66c32e1229107646bc7a9df5e4e
Reviewed-on: https://review.whamcloud.com/35721
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
4 years agoLU-12639 tests: initialize variable sanity 317 16/35716/2
James Nunez [Wed, 7 Aug 2019 15:56:50 +0000 (09:56 -0600)]
LU-12639 tests: initialize variable sanity 317

sanity test 317 checks the file system type of $facet,
but facet is not initialized in the test.

Replace the check of the facet file system type with the
variable ost1_FSTYPE.  The call to return after skip is
removed.

Fixes: 6115eb7fd55a ("LU-10370 ofd: truncate does not update blocks count on client")
Test-Parameters: trivial
Test-Parameters: fstype=zfs envdefinitions=ONLY=317 testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: If67c4d786e4d23752effd1aaebc82bb1be8aceb5
Reviewed-on: https://review.whamcloud.com/35716
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12457 kernel: new kernel [RHEL 7.7 3.10.0-1062.el7] 09/35709/2
Jian Yu [Wed, 7 Aug 2019 06:45:39 +0000 (23:45 -0700)]
LU-12457 kernel: new kernel [RHEL 7.7 3.10.0-1062.el7]

This patch makes changes to support new RHEL 7.7 release
for Lustre client.

Test-Parameters: trivial clientdistro=el7.7

Change-Id: I1fd68b56340c8674c9fae607e05faca04ba99a5a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35709
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12578 obdecho: reuse an cl env cache for obdecho survey 00/35700/3
Alexey Lyashkov [Tue, 6 Aug 2019 06:55:46 +0000 (09:55 +0300)]
LU-12578 obdecho: reuse an cl env cache for obdecho survey

The obdecho environment is already of the CL_thread type, so it is
easy to reuse the cl_env cache instead of allocating an env on each
ioctl call. This reduces CPU usage dramatically.

Cray-bug-id: LUS-7552
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: Id4b36626233be6b65efc4daef649bf0ef97c2e60
Reviewed-on: https://review.whamcloud.com/35700
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-4423 utils: Remove variables qualified by 'extern' under mount_utils.c 88/35688/2
Arshad Hussain [Sat, 3 Aug 2019 07:18:10 +0000 (12:48 +0530)]
LU-4423 utils: Remove variables qualified by 'extern' under mount_utils.c

Removal of corresponding 'extern' declaration is already done
under commit b43c7cc6cba514687ebcbeaa1a7dcc8cf7ffa694.

This patch is an extension to the above commit and covers
file mount_utils.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I2362be0f7ec32fba9d9e680a3d7c416e5ec40083
Reviewed-on: https://review.whamcloud.com/35688
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-12626 lnet: create existing net returns EEXIST 81/35681/2
Olaf Faaland [Fri, 2 Aug 2019 16:38:50 +0000 (09:38 -0700)]
LU-12626 lnet: create existing net returns EEXIST

When "lnetctl net add" is called for an interface/net pair that
already exists, the error returned should be EEXIST, so the
user knows that the net is already configured.
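
The expected behaviour can be sketched as follows. This is an
illustration only: demo_net_add() and its fixed-size table are
hypothetical and do not reflect how LNet stores its configuration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_MAX_NETS 4
#define DEMO_NAME_LEN 16

/* Hypothetical record of one configured interface/net pair. */
struct demo_net {
	char dn_net[DEMO_NAME_LEN];
	char dn_ifname[DEMO_NAME_LEN];
};

static struct demo_net demo_nets[DEMO_MAX_NETS];
static int demo_net_count;

/* Add a pair; a duplicate is reported as -EEXIST, not a generic error. */
static int demo_net_add(const char *net, const char *ifname)
{
	int i;

	assert(net != NULL && ifname != NULL);
	if (strlen(net) >= DEMO_NAME_LEN || strlen(ifname) >= DEMO_NAME_LEN)
		return -EINVAL;

	for (i = 0; i < demo_net_count; i++) {
		if (strcmp(demo_nets[i].dn_net, net) == 0 &&
		    strcmp(demo_nets[i].dn_ifname, ifname) == 0)
			return -EEXIST;
	}

	if (demo_net_count == DEMO_MAX_NETS)
		return -ENOSPC;

	strcpy(demo_nets[demo_net_count].dn_net, net);
	strcpy(demo_nets[demo_net_count].dn_ifname, ifname);
	demo_net_count++;
	return 0;
}
```

A distinct errno lets a caller treat "already configured" as a
harmless outcome while still failing on real errors.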

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Idab79ab288a11a2920793f27df235b4dfab497fe
Reviewed-on: https://review.whamcloud.com/35681
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11673 tests: quote argument of -n conf-sanity 69/35669/3
James Nunez [Thu, 1 Aug 2019 21:23:14 +0000 (15:23 -0600)]
LU-11673 tests: quote argument of -n conf-sanity

Inside the single bracket test function '[', the argument
of the ‘-n’ flag should be quoted.  If the -n
argument is not quoted, a blank value will cause the
variable to disappear and this causes issues.  Quote the
argument or use [[ ]].

conf-sanity test 79 has two cases where the ‘-n’ argument
is not quoted.  Let's correct this.

Test-Parameters: trivial envdefinitions=ONLY=79 testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I4b3a43de064d1992439dc25ecc7b0682520f74c9
Reviewed-on: https://review.whamcloud.com/35669
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
4 years agoLU-12137 osd-ldiskfs: shared common code for osd lookup 81/35581/4
James Simmons [Sat, 3 Aug 2019 19:25:33 +0000 (15:25 -0400)]
LU-12137 osd-ldiskfs: shared common code for osd lookup

For osd_lookup_one_len_unlocked() the only time we don't return
the found dentry is if is_bad_inode() is true. The test for
is_bad_inode() can only be run when d_inode is not NULL so
we can move the d_inode test. The other change drops the
debug message, which would just fill the logs with too much
chatter.

Change-Id: I82cf73ca842a45d906ffc21c9c5397a61c2679d8
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35581
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12137 osd-ldiskfs: implement proper error handling in osd_ios_OBJECTS_scan 44/35644/3
James Simmons [Sat, 3 Aug 2019 13:59:16 +0000 (09:59 -0400)]
LU-12137 osd-ldiskfs: implement proper error handling in osd_ios_OBJECTS_scan

In the linux kernel it is considered proper coding style to
handle error codes returned by functions instead of successful
completions. Reverse the error handling of the return values from
osd_ios_lookup_one_len(). This also reduces the code indentation
and makes it easier to follow.

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: Idd3b91218f80a88d69206d3c2570bea22ff1fbb1
Reviewed-on: https://review.whamcloud.com/35644
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12137 osd-ldiskfs: have scrub code handle NULL dentry inode 80/35580/5
James Simmons [Sat, 3 Aug 2019 13:46:31 +0000 (09:46 -0400)]
LU-12137 osd-ldiskfs: have scrub code handle NULL dentry inode

With the goal of making what is currently osd_ios_lookup_one_len()
usable outside of the scrub code, move the special handling of the
inode of the dentry being NULL out of the function. This also
makes the scrub code clearer.

Most calls to osd_ios_lookup_one_len() are followed by a call to
osd_ios_scan_one() passing child->d_inode as the 4th arg.
To handle those cases, osd_ios_scan_one() is changed to
return -ENOENT when the fourth arg is NULL.  The remaining
cases require
    child->d_inode==NULL
to be handled explicitly.

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Ib52d7e5ee9b1e88af830cd85f8c8841b2e7ac16f
Reviewed-on: https://review.whamcloud.com/35580
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12565 obdecho: use bit-locking in echo_client. 63/35563/4
NeilBrown [Mon, 22 Jul 2019 03:04:11 +0000 (23:04 -0400)]
LU-12565 obdecho: use bit-locking in echo_client.

The ep_lock used by echo client causes lockdep to complain.
Multiple locks of the same class are taken concurrently which
appear to lockdep to be prone to deadlocking, and can fill up
lockdep's fixed size stack for locks.

As ep_lock is taken on multiple pages always in ascending page order,
deadlocks don't happen, so this is a false-positive.

The function of the ep_lock is the same as that of page_lock(),
which is implemented as a bit-lock using wait_on_bit().  lockdep
cannot see these locks, and doesn't really need to.

So convert ep_lock to a simple bit-lock using wait_on_bit for
waiting. This provides similar functionality, matches how page_lock()
works, and avoids lockdep problems.
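
A userspace sketch of a bit-lock, assuming C11 atomics in place of
the kernel bit operations. The demo_* names are hypothetical, and
the waiter yields the CPU where the kernel code would sleep in
wait_on_bit().

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_PG_LOCKED_MASK (1UL << 0)

/* Hypothetical echo page: one flag bit serves as the lock. */
struct demo_echo_page {
	atomic_ulong dp_flags;
};

/* Atomically set the bit; we own the lock only if it was clear before. */
static bool demo_bit_trylock(struct demo_echo_page *pg)
{
	unsigned long old;

	old = atomic_fetch_or_explicit(&pg->dp_flags, DEMO_PG_LOCKED_MASK,
				       memory_order_acquire);
	return !(old & DEMO_PG_LOCKED_MASK);
}

/* Retry until the bit is ours; yielding stands in for sleeping. */
static void demo_bit_lock(struct demo_echo_page *pg)
{
	while (!demo_bit_trylock(pg))
		sched_yield();
}

/* Clear the bit; the release ordering publishes the protected writes. */
static void demo_bit_unlock(struct demo_echo_page *pg)
{
	unsigned long old;

	old = atomic_fetch_and_explicit(&pg->dp_flags, ~DEMO_PG_LOCKED_MASK,
					memory_order_release);
	assert(old & DEMO_PG_LOCKED_MASK);
	(void)old;
}
```

Because the lock is just a bit in an existing word, it adds no
per-page lock object, and a lock-class checker has nothing to track.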

Linux-commit: f017f3ff7eb704ea1fc125a90a39b693ee84bd0a

Change-Id: I97050e1c88ee27ca4e05b4b39a65e1850f42534b
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35563
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12477 kernel: eliminate lustre_patchless_compat.h 39/35539/7
James Simmons [Tue, 30 Jul 2019 00:30:39 +0000 (20:30 -0400)]
LU-12477 kernel: eliminate lustre_patchless_compat.h

Now that kernels earlier than RHEL7 3.11 series are no longer
supported, we can remove the wrapper related to the header
lustre_patchless_compat.h. With that gone the header can also
be removed.

Change-Id: I496983d3065fdd0718871548aeed6bf50fd0747b
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35539
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12527 utils: Make lustre_user.h c++-legal 71/35471/9
Rob Latham [Wed, 10 Jul 2019 22:04:27 +0000 (17:04 -0500)]
LU-12527 utils: Make lustre_user.h c++-legal

Recent C++ compilers did not like some of the C idioms used in this
header:
- C++ checks the types of enums more forcefully than is done in C
- signed vs unsigned comparisons will generate a warning under g++
- "invalid suffix on literal" warning: Lustre is not trying to
  generate a new literal identifier.

Signed-off-by: Rob Latham <robl@mcs.anl.gov>
Change-Id: I6aa8ba18407c14e071a7e2943b5a1a3f5be27bac
Reviewed-on: https://review.whamcloud.com/35471
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12520 osc: reserve lru pages for read in batch 40/35440/4
Wang Shilong [Mon, 5 Aug 2019 02:10:09 +0000 (10:10 +0800)]
LU-12520 osc: reserve lru pages for read in batch

The benefit of doing this is to reduce contention
against atomic counter cl_lru_left by changing it from
per-page access to per-IO access.

We have done this optimization for write, do it for read
too.
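
A sketch of per-IO reservation, assuming a C11 atomic in place of
the kernel atomic type. demo_lru_reserve() and demo_lru_unreserve()
are hypothetical names, not the osc functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Reserve npages in one successful update instead of one per page. */
static bool demo_lru_reserve(atomic_long *lru_left, long npages)
{
	long old = atomic_load(lru_left);

	assert(npages > 0);
	while (old >= npages) {
		/* On failure 'old' is reloaded and the bound rechecked. */
		if (atomic_compare_exchange_weak(lru_left, &old,
						 old - npages))
			return true;
	}
	return false;
}

/* Give back pages that were reserved but not used by the IO. */
static void demo_lru_unreserve(atomic_long *lru_left, long npages)
{
	assert(npages > 0);
	atomic_fetch_add(lru_left, npages);
}
```

An IO of N pages then touches the shared counter once or twice
rather than N times, which is where the reduced contention comes from.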

Change-Id: Ifd15d0a59eda265dce43876d953e32f27b07b6a0
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35440
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12428 tests: wait for nodemaps to be synchronized 21/35421/4
Sebastien Buisson [Fri, 5 Jul 2019 12:56:23 +0000 (21:56 +0900)]
LU-12428 tests: wait for nodemaps to be synchronized

In sanity-sec, create_nodemaps and delete_nodemaps functions
must make sure nodemaps they create or delete are actually synchronized
on other servers before continuing.

Test-Parameters: trivial
Test-Parameters: combinedmdsmgs=true mdscount=2 mdtcount=4 osscount=2 ostcount=8 testlist=sanity-sec,sanity-sec,sanity-sec
Test-Parameters: combinedmdsmgs=true mdscount=2 mdtcount=4 osscount=2 ostcount=8 testlist=sanity-sec,sanity-sec,sanity-sec
Test-Parameters: combinedmdsmgs=true mdscount=2 mdtcount=4 osscount=2 ostcount=8 testlist=sanity-sec,sanity-sec,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: If11e81a64cbcd94d9381833959892a26153cc800
Reviewed-on: https://review.whamcloud.com/35421
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12485 obdclass: 0-nlink race in lu_object_find_at() 60/35360/11
Lai Siyao [Fri, 28 Jun 2019 12:19:56 +0000 (20:19 +0800)]
LU-12485 obdclass: 0-nlink race in lu_object_find_at()

There is a race in lu_object_find_at: in the gap between
lu_object_alloc() and hash insertion, another thread may
have allocated another object for the same file and unlinked
it, so we may get an object with 0-nlink, which will trigger
assertion in osd_object_release().

To avoid this race, initialize the object after hash insertion.
But this means an uninitialized object may be found in the cache; if
so, wait for the object to be initialized by the allocator.

To reproduce the race, introduce cfs_race_wait() and
cfs_race_wakeup(): cfs_race_wait() will cause the thread that
calls it to wait on the race, while cfs_race_wakeup() will wake
up the waiting thread. As with cfs_race(), CFS_FAIL_ONCE
should be set together with fail_loc.

Add sanityn test_84.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0869f254544256987b73f0ff92f75e4d1562e566
Reviewed-on: https://review.whamcloud.com/35360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11673 tests: quote argument of -n and test fix 80/35080/5
James Nunez [Fri, 2 Aug 2019 19:49:59 +0000 (13:49 -0600)]
LU-11673 tests: quote argument of -n and test fix

Inside the single bracket test function '[', problems arise when
the argument of the ‘-n’ flag is unquoted.  The -n argument
should be quoted, or double brackets should be used for the test.

Quote the ‘-n’ argument in test-framework.sh functions.
This simple correction caused a few tests to fail.
Fix sanity test 65k to use the correct facets and check
for the mgs facet in convert_facet2label() to fix
replay-single test 58b.

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I9655d2138c56c007207434f04b487b518bb3392e
Reviewed-on: https://review.whamcloud.com/35080
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12236 lnet: support non-default network namespace 68/34768/14
Aurelien Degremont [Thu, 25 Apr 2019 13:15:56 +0000 (13:15 +0000)]
LU-12236 lnet: support non-default network namespace

Replace hard coded references to default root network namespace
(&init_net) in LNET code (LNET, socklnd and o2iblnd).

When a network interface is created, Lustre records the current
network namespace. This patch improves the LNET code to use
this reference namespace most of the time instead of the root
network namespace. When using lctl, lnetctl or insmod, we
use the current process network namespace.
When starting the listening acceptor, we use the namespace of the
process that triggers this start.

An additional patch is needed for RPCSEC GSS support.

Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: I56877ddcd7a27883662c86f245b196153211e7b2
Reviewed-on: https://review.whamcloud.com/34768
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12202 misc: remove obsolete contrib files 19/34719/3
Andreas Dilger [Fri, 19 Apr 2019 00:28:46 +0000 (18:28 -0600)]
LU-12202 misc: remove obsolete contrib files

nn-final-symbol-list.txt: originally used for Solaris port
mptlinux.spec.patch: for ancient SCSI drivers
01-dont-include-openib-initscript-rhel5.ed: old RHEL5 patch
rdac_spec: old driver file

I only wish there was a "really_trivial" keyword...

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If66db22caf604736949e18289af74cc4f0fa6613
Reviewed-on: https://review.whamcloud.com/34719
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11681 lfsck: misc fixes in inserting shard 28/33928/14
Lai Siyao [Wed, 12 Dec 2018 16:14:13 +0000 (00:14 +0800)]
LU-11681 lfsck: misc fixes in inserting shard

* To insert a shard, we need to convert the parent to a plain
directory by removing its LMV, and set it back after insertion.
* If the parent lost its LMV, after inserting the shard, also copy
the shard LMV to the parent.
* Currently, removing a striped directory's LMV will remove its shard
LMV too, but if the directory LMV is corrupt, removal will fail.
As a workaround, don't remove the shard LMV when removing the striped
directory LMV; normally when we do this, we will remove the shards.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ic0f6c2eadb607e82b281ffaf9eaa75284ab90a16
Reviewed-on: https://review.whamcloud.com/33928
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11681 lfsck: misc fixes for dangling entry repair 27/33927/11
Lai Siyao [Wed, 12 Dec 2018 16:04:12 +0000 (00:04 +0800)]
LU-11681 lfsck: misc fixes for dangling entry repair

* lock child in dangling entry repair.
* if child is directory, create it as plain directory, because
  it may have shards already, which will be inserted later, besides,
  it may be remote, and creating striped directory remotely is not
  supported.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ide0d90f99e6d24f4a9507d98bbc7ad7f876a907d
Reviewed-on: https://review.whamcloud.com/33927
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11681 lfsck: read LMV from bottom object 26/33926/13
Lai Siyao [Wed, 26 Dec 2018 17:42:52 +0000 (01:42 +0800)]
LU-11681 lfsck: read LMV from bottom object

LFSCK should read directory LMV from bottom object because it
doesn't want to read shard FIDs, and the allocated buffer size
can only contain struct lmv_mds_md_v1.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2160b029ea4d19c5c056a0d06c340194a4811b00
Reviewed-on: https://review.whamcloud.com/33926
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11542 import: fix race between imp_state & imp_invalid 95/33395/13
Yang Sheng [Mon, 15 Oct 2018 09:37:21 +0000 (17:37 +0800)]
LU-11542 import: fix race between imp_state & imp_invalid

We set the import to LUSTRE_IMP_DISCON and then deactivate it when
it is unreplayable. Someone may set this import up between
those two operations, so we would get an invalid import in the
FULL state.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ib4cec0bcaf6f4b221ba260edb94749a4e523f5e6
Reviewed-on: https://review.whamcloud.com/33395
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10756 osp: properly order sysfs registration 64/35464/4
James Simmons [Thu, 11 Jul 2019 01:08:48 +0000 (21:08 -0400)]
LU-10756 osp: properly order sysfs registration

When generating udev events for import states I found that the
order of setting up the sysfs tree for osp was important. Set up
the osp obd device sysfs tree before the lwp sysfs device. Always
call client_obd_cleanup() first on shutdown.

Change-Id: I29257f3e91f10f8266509b535e36cc8b62ce2362
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35464
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12627 ofd: reset fti_attr in ofd_lvbo_update() 85/35685/3
Wang Shilong [Sat, 3 Aug 2019 06:27:22 +0000 (14:27 +0800)]
LU-12627 ofd: reset fti_attr in ofd_lvbo_update()

This patch tries to fix the following panic:

(ofd_internal.h:440:tsi2ofd_info()) ASSERTION( info->fti_attr.la_valid == 0 ) failed:
(ofd_internal.h:440:tsi2ofd_info()) LBUG
[ 5321.108598] Call Trace:
[ 5321.109347]  [<ffffffffc06fc8bc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 5321.111342]  [<ffffffffc06fc96c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 5321.113026]  [<ffffffffc147631a>] ofd_preprw+0xcfa/0x1160 [ofd]
[ 5321.114643]  [<ffffffffc0bb934c>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc]
[ 5321.116373]  [<ffffffffc0bbc50a>] tgt_request_handle+0x91a/0x15c0 [ptlrpc]
[ 5321.118230]  [<ffffffffc0b61636>] ptlrpc_server_handle_request+0x256/0xb00 [ptlrpc]
[ 5321.120318]  [<ffffffffc0b6516c>] ptlrpc_main+0xbac/0x1560 [ptlrpc]
[ 5321.122001]  [<ffffffff84cc1c31>] kthread+0xd1/0xe0
[ 5321.123023]  [<ffffffff85374c37>] ret_from_fork_nospec_end+0x0/0x39
[ 5321.124066]  [<ffffffffffffffff>] 0xffffffffffffffff

If this is a server lock, tgt_brw_lock() will eventually call
ofd_lvbo_update() upon lock cancellation, which uses @fti_attr
and pollutes its value:

|->ptlrpc_main
 |->lu_context_enter(le_ctx)
  |->tgt_brw_write
   |->tgt_brw_lock
    |->tgt_extent_lock
     |->ldlm_cli_enqueue_local
      |->ldlm_lock_enqueue
       |->ldlm_run_ast_work
        |->ptlrpc_check_set
          |->ldlm_cb_interpret
           |->ldlm_handle_ast_error
            |->ofd_lvbo_update
             |->ofd_attr_get polluted @info->fti_attr

  |->tgt_brw_write
   |->ofd_preprw
    |->tsi2ofd_info
      |->ASSERTION(info->fti_attr.la_valid == 0)

 |->lu_context_exit(le_ctx)--->memset @fti_attr

To fix this problem, reset fti_attr->la_valid before
ofd_lvbo_update() returns, just like ofd_lvbo_init() does.
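
The pattern can be sketched as below, assuming a scratch structure
that is reused across calls within one request. The demo_* types and
functions are hypothetical simplifications of the ofd ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute buffer; la_valid flags which fields are set. */
struct demo_attr {
	unsigned long la_valid;
	unsigned long la_size;
};

/* Hypothetical per-thread scratch area shared by several code paths. */
struct demo_thread_info {
	struct demo_attr fti_attr;
};

/* Entry point of a request handler: the scratch area must be clean. */
static struct demo_thread_info *demo_info_get(struct demo_thread_info *info)
{
	assert(info->fti_attr.la_valid == 0);
	return info;
}

/* A nested helper that borrows the scratch attr and cleans up after. */
static unsigned long demo_lvbo_update(struct demo_thread_info *info)
{
	unsigned long size;

	info->fti_attr.la_valid = 0x1;	/* pretend an attr_get filled it */
	info->fti_attr.la_size = 4096;
	size = info->fti_attr.la_size;

	/* The fix: leave la_valid clear for the next user of the area. */
	info->fti_attr.la_valid = 0;
	return size;
}
```

Without the final reset, the later demo_info_get() call in the same
context would trip its assertion, mirroring the LBUG shown above.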

Change-Id: Ib6b448dd21603cfe0305d8425862a96ef3f7fee8
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35685
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12566 mdc: hold lock while walking changelog dev list 68/35668/3
Andreas Dilger [Thu, 1 Aug 2019 20:55:58 +0000 (14:55 -0600)]
LU-12566 mdc: hold lock while walking changelog dev list

In mdc_changelog_cdev_finish() we need chlg_registered_dev_lock
while walking and changing entries on the chlog_registered_devs
and ced_obds lists in chlg_registered_dev_find_by_obd().

Move the calling of chlg_registered_dev_find_by_obd() under the
mutex, and add assertions to the places where the lists are walked
and changed that the mutex is held.

Fixes: 1d40214d96dd ("LU-7659 mdc: expose changelog through char devices")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib62fdff87cde6a4bcfb9bea24a2ea72a933ebbe5
Reviewed-on: https://review.whamcloud.com/35668
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-6142 tests: Fix style issues for parallel_grouplock.c 36/35436/2
Arshad Hussain [Sat, 6 Jul 2019 01:29:21 +0000 (06:59 +0530)]
LU-6142 tests: Fix style issues for parallel_grouplock.c

This patch fixes issues reported by checkpatch
for file lustre/tests/mpi/parallel_grouplock.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I746b77db6fe27ee7b9306014f57887ee7504baf3
Reviewed-on: https://review.whamcloud.com/35436
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-6142 tests: Fix style issues for createmany-mpi.c 35/35435/2
Arshad Hussain [Sat, 6 Jul 2019 00:06:07 +0000 (05:36 +0530)]
LU-6142 tests: Fix style issues for createmany-mpi.c

This patch fixes issues reported by checkpatch
for file lustre/tests/mpi/createmany-mpi.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I365f533bbd68a256be971486cd1232bddca4342f
Reviewed-on: https://review.whamcloud.com/35435
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-6142 tests: Fix style issues for mdsrate.c 34/35434/3
Arshad Hussain [Fri, 5 Jul 2019 21:37:28 +0000 (03:07 +0530)]
LU-6142 tests: Fix style issues for mdsrate.c

This patch fixes issues reported by checkpatch
for file lustre/tests/mpi/mdsrate.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: Ibfbadca6af5c915193363a772bc8f6c9a7107ffd
Reviewed-on: https://review.whamcloud.com/35434
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-6142 util: Fix style issues for parser.c 84/35384/4
Arshad Hussain [Tue, 25 Jun 2019 03:15:26 +0000 (08:45 +0530)]
LU-6142 util: Fix style issues for parser.c

This patch fixes issues reported by checkpatch
for file libcfs/libcfs/util/parser.c

Change-Id: I6e5f4d40634e4b3a86640fb2793a70cfd646c725
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/35384
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-6142 util: Fix style issues for l_ioctl.c 82/35382/2
Arshad Hussain [Tue, 25 Jun 2019 00:45:41 +0000 (06:15 +0530)]
LU-6142 util: Fix style issues for l_ioctl.c

This patch fixes issues reported by checkpatch
for file libcfs/libcfs/util/l_ioctl.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I75a8dad667e62bf128af79d64f0ce001ffc9f12c
Reviewed-on: https://review.whamcloud.com/35382
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
4 years ago LU-11743 utils: allow lctl pool commands on separate MGS 10/34110/12
Andreas Dilger [Wed, 12 Dec 2018 08:49:00 +0000 (16:49 +0800)]
LU-11743 utils: allow lctl pool commands on separate MGS

The current lctl code checks for the presence of configured pools on
the client and MDS via /proc or /sys files.  However, the MGS does
not parse the client/MDS configuration logs, so it does not create
the various files for the pools, which causes the pool commands to
fail verification.

Change lctl pool_new, pool_add, pool_remove and pool_destroy commands
to parse the configuration log directly when run on a standalone MGS
node.  This also allows the pool commands to be run when only the MGS
is started.

Test-Parameters: standalonemgs=true testlist=ost-pools.sh
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ib6fdb367c919f7b726fbf551dcfa6015593ebbe5
Reviewed-on: https://review.whamcloud.com/34110
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12604 mdt: check field size of sec context name 55/35655/6
Sebastien Buisson [Wed, 31 Jul 2019 16:12:40 +0000 (18:12 +0200)]
LU-12604 mdt: check field size of sec context name

In a request received from a client, check that the claimed size of
the RMF_FILE_SECCTX_NAME field is consistent with its expected content,
which is supposed to be an extended attribute name.
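
A minimal sketch of the kind of consistency check described above, not the
actual MDT code. It assumes the field is expected to hold exactly one
NUL-terminated, non-empty name. The helper secctx_name_size_ok() and the
limit SKETCH_XATTR_NAME_MAX are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumed limit for illustration; Linux defines XATTR_NAME_MAX as 255. */
#define SKETCH_XATTR_NAME_MAX 255

/*
 * Hypothetical helper: return 1 if a buffer of claimed_size bytes holds
 * exactly one non-empty, NUL-terminated name, and 0 otherwise.
 */
static int secctx_name_size_ok(const char *buf, size_t claimed_size)
{
	const char *nul;

	/* Need at least one character plus the terminating NUL. */
	if (buf == NULL || claimed_size < 2 ||
	    claimed_size > SKETCH_XATTR_NAME_MAX + 1)
		return 0;

	/* Bounded search: never read past the claimed size. */
	nul = memchr(buf, '\0', claimed_size);
	if (nul == NULL)
		return 0;

	/* The first NUL must be the last byte of the field. */
	return (size_t)(nul - buf) + 1 == claimed_size;
}
```

With this sketch, a 17-byte field holding "security.selinux" plus its NUL is
accepted, while a field whose claimed size is shorter or longer than the
string it carries is rejected.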

Test-Parameters: clientselinux testlist=sanity,recovery-small,sanity-selinux envdefinitions=SANITY_EXCEPT="271f"
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ice96f0e03f790b334fcdf64ae4becef2e39738f4
Reviewed-on: https://review.whamcloud.com/35655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12600 tgt: shortio size should be unsigned 53/35653/4
Patrick Farrell [Tue, 30 Jul 2019 18:10:32 +0000 (14:10 -0400)]
LU-12600 tgt: shortio size should be unsigned

The short_io_size variable receives unsigned values from
req_capsule_get_size(), so it needs to be unsigned as well.

If it is not, the memcpy() that uses short_io_size can act on an
incorrect value and cause memory corruption.
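
A minimal sketch of one way a signed size can go wrong, not the actual
target code. It assumes the size arrives as a 32-bit unsigned value and is
compared against a buffer limit before the copy. SKETCH_BUF_SIZE and the
three helper functions are hypothetical names used for illustration.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_BUF_SIZE 64U	/* hypothetical destination buffer size */

/*
 * Problematic shape: a large unsigned size stored in a signed int
 * typically becomes negative, so it passes an upper-bound check.
 * (The conversion is implementation-defined; common compilers wrap.)
 */
static int signed_check_passes(uint32_t wire_size)
{
	int size = (int)wire_size;

	return size <= (int)SKETCH_BUF_SIZE;
}

/* Fixed shape: keeping the size unsigned makes the bound check reliable. */
static int unsigned_check_passes(uint32_t wire_size)
{
	unsigned int size = wire_size;

	return size <= SKETCH_BUF_SIZE;
}

/* Copy only after the unsigned check; return 0 on success, -1 if rejected. */
static int copy_short_io(char *dst, const char *src, uint32_t wire_size)
{
	if (!unsigned_check_passes(wire_size))
		return -1;

	memcpy(dst, src, wire_size);
	return 0;
}
```

In the signed variant, an oversized value that slips past the check is
converted back to a very large size_t when handed to memcpy(), which is
the kind of corruption the commit message warns about.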

Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I043e314cd43a7b40519f951a605fa5a36ff91dcf
Reviewed-on: https://review.whamcloud.com/35653
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>