Whamcloud - gitweb
fs/lustre-release.git
3 years ago LU-13437 lmv: check stripe FID sanity 00/39600/2
Lai Siyao [Fri, 8 May 2020 14:53:47 +0000 (22:53 +0800)]
LU-13437 lmv: check stripe FID sanity

A striped directory layout may be broken. If any stripe FID is insane,
return -ENODEV.

Lustre-change: https://review.whamcloud.com/38560
Lustre-commit: 698a496aac51e11791717a9cbd0a86b3525f4557

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I7ed8c7c561e34625e2cb29bfd14bc0ecf3fce46c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39600
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
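The check described in LU-13437 can be pictured as a loop over the stripe FIDs that bails out on the first insane one. The following is only a sketch: `struct sketch_fid`, `sketch_fid_is_sane()` and `sketch_check_stripes()` are hypothetical names, and the sanity rule is a simplified assumption (the real Lustre predicate accepts more FID classes).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a Lustre FID (hypothetical layout). */
struct sketch_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Assumed sanity rule for this sketch: non-zero sequence, version 0. */
static int sketch_fid_is_sane(const struct sketch_fid *fid)
{
	return fid != NULL && fid->f_seq != 0 && fid->f_ver == 0;
}

/* Return 0 if every stripe FID is sane, -ENODEV otherwise. */
static int sketch_check_stripes(const struct sketch_fid *fids, size_t count)
{
	size_t i;

	assert(fids != NULL || count == 0);
	for (i = 0; i < count; i++)
		if (!sketch_fid_is_sane(&fids[i]))
			return -ENODEV;
	return 0;
}
```

Failing the whole layout on the first bad stripe keeps a corrupted directory from being used partially.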
3 years ago LU-13471 lnet: use the same src nid for discovery 76/39576/4
Amir Shehata [Thu, 23 Apr 2020 00:06:23 +0000 (17:06 -0700)]
LU-13471 lnet: use the same src nid for discovery

When discovering a remote peer (not on the same network) a GET is
sent to the peer to retrieve the peer's interfaces.  This is followed
by a PUSH, if discovery is on, to push the node's interfaces. However,
if both node and peer have multiple interfaces it is likely that the
GET and the PUSH will originate on different interfaces. When the
peer receives the PUSH it will not be able to connect the two NIDs
and will not be able to consolidate the node's NIDs.  This issue is
specific to remote peers because at the time the push handler is
invoked the remote lpni has not been created yet. lnet_parse()
creates the lpni of the gateway.

Similar to the strategy already in place of using the same source NID
for all the messages of an RPC, discovery should use the same source
NID for both the GET and PUSH.

This patch stores the source NID that the GET was sent on and
uses it for the PUSH.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I5a13ab7799b2ddc47714202bcbed786b0d3940b7
Reviewed-on: https://review.whamcloud.com/38320
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39576

3 years ago LU-13907 llite: don't set FS_REQUIRES_DEV on client 74/39674/4
Andreas Dilger [Thu, 13 Aug 2020 22:18:52 +0000 (16:18 -0600)]
LU-13907 llite: don't set FS_REQUIRES_DEV on client

If doing a client-only build, do not set the FS_REQUIRES_DEV flag
for the 'lustre' filesystem type.  This is only needed on the server,
but the filesystem type declaration is shared between both.

In master, this was fixed by declaring a new 'lustre_tgt' filesystem
type and using that for server filesystem mounts.  However, for 2.12
this is overkill, and it is possible to get a 95% fix by dropping
the FS_REQUIRES_DEV flag for the common case of client-only builds.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Change-Id: Iab2e78515aba018e2a6bceb324ad1b8a313ebbe5
Reviewed-on: https://review.whamcloud.com/39674
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13187 osd-ldiskfs: don't enforce max dir size limit on IAM objects 82/39882/3
Li Dongyang [Thu, 3 Sep 2020 23:34:34 +0000 (09:34 +1000)]
LU-13187 osd-ldiskfs: don't enforce max dir size limit on IAM objects

Add ext4-no-max-dir-size-limit-for-iam-objects.patch to introduce new
inode state EXT4_STATE_IAM and use it to mark IAM objects.

Lustre-change: https://review.whamcloud.com/39823
Lustre-commit: 03e6db505be90d35ccacb3af7e15277784e5d448

Change-Id: I3bcc5435ea07edb9fa265dcd8e3261d849495f00
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13763 osc: don't allow negative grants 80/39380/4
Mikhail Pershin [Wed, 15 Jul 2020 05:42:49 +0000 (08:42 +0300)]
LU-13763 osc: don't allow negative grants

Add a check in osc_init_grant() to prevent a possible
underflow of cl_avail_grant and report an error if it happens.

Lustre-change: https://review.whamcloud.com/#/c/39827
Lustre-commit: e05ccafd6ee214895d01efbb13a3757e3625a859

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Idcd25ed427c23735e1cdc70359bace43b5b9d886
Reviewed-on: https://review.whamcloud.com/39380
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12687 osc: consume grants for direct I/O 86/39386/11
Vladimir Saveliev [Mon, 29 Jun 2020 11:26:57 +0000 (14:26 +0300)]
LU-12687 osc: consume grants for direct I/O

The new IO engine implementation stopped consuming grants for direct
I/O writes. That led to a premature out-of-space condition during
direct I/O. The following illustrates the problem:
  # OSTSIZE=100000 sh llmount.sh
  # dd if=/dev/zero of=/mnt/lustre/file bs=4k count=100 oflag=direct
  dd: error writing ‘/mnt/lustre/file’: No space left on device

Consume grants for direct I/O.

Try to consume grants in osc_queue_sync_pages() when it is called for
pages which are being written in direct I/O.

Tests are added to verify grant consumption in buffered and direct i/o
and to verify direct i/o overwrite when ost is full.
The overwrite test is for ldiskfs only as zfs is unable to overwrite
when it is full.

Lustre-change: https://review.whamcloud.com/35896
Lustre-commit: 05f326a7988a7a0d6954d1b0d318315526209ae6

Fixes: 9fe4b52ad2 ("LU-1030 osc: new IO engine implementation")
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I9a199452c564e8e8ad02f79231e8481166f3666e
Cray-bug-id: LUS-7036
Reviewed-on: https://review.whamcloud.com/39386
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13761 o2ib: Fix compilation with MOFED 5.1 81/39781/2
Sergey Gorenko [Tue, 1 Sep 2020 06:53:06 +0000 (23:53 -0700)]
LU-13761 o2ib: Fix compilation with MOFED 5.1

A new argument was added to rdma_reject() in MOFED 5.1 and
Linux 5.8.

Add a configure check and support both versions of rdma_reject().

Lustre-commit: 956deb0fe8195c7a0c38c66a5a8cc1e95c2c245e
Lustre-change: https://review.whamcloud.com/39323

Test-Parameters: trivial
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Change-Id: I2b28991f335658b651b21a09899b7b17ab2a9d57
Reviewed-on: https://review.whamcloud.com/39781
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
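Supporting both rdma_reject() variants typically means letting configure define a macro and hiding the difference behind one wrapper. The sketch below models that pattern with stubs only: `SKETCH_HAVE_RDMA_REJECT_4ARGS`, `struct sketch_cm_id`, `sketch_rdma_reject()` and the reason value are all hypothetical stand-ins, not the kernel API or the macro name the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the connection ID; records the last reject reason. */
struct sketch_cm_id {
	int last_reason;
};

#define SKETCH_REJ_CONSUMER_DEFINED 28	/* placeholder reason code */

/* Define SKETCH_HAVE_RDMA_REJECT_4ARGS to model the newer 4-argument API. */
#ifdef SKETCH_HAVE_RDMA_REJECT_4ARGS
static int sketch_rdma_reject(struct sketch_cm_id *id, const void *priv,
			      uint8_t len, uint8_t reason)
{
	assert(id != NULL);
	(void)priv;
	(void)len;
	id->last_reason = reason;
	return 0;
}
#define sketch_reject(id, priv, len) \
	sketch_rdma_reject(id, priv, len, SKETCH_REJ_CONSUMER_DEFINED)
#else
static int sketch_rdma_reject(struct sketch_cm_id *id, const void *priv,
			      uint8_t len)
{
	assert(id != NULL);
	(void)priv;
	(void)len;
	id->last_reason = 0;	/* older API carries no reason argument */
	return 0;
}
#define sketch_reject(id, priv, len) sketch_rdma_reject(id, priv, len)
#endif
```

Callers use only `sketch_reject()`, so the rest of the code compiles unchanged against either variant.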
3 years ago LU-13742 llite: do not bypass selinux xattr handling 71/39671/3
Shaun Tancheff [Wed, 5 Aug 2020 14:17:03 +0000 (09:17 -0500)]
LU-13742 llite: do not bypass selinux xattr handling

Without the hint from selinux_is_enabled() to determine whether selinux
is running at boot, the performance fix from LU-549 that skips handling
of selinux xattrs cannot be applied correctly.

The correct path is to act as if selinux is enabled.

This fixes a bug introduced by LU-12355 that now exists in
RHEL 8.2 kernels where clients have enabled selinux.

Lustre-change: https://review.whamcloud.com/39569
Lustre-commit: 994287bd47819ebd8badb716da4232cdff97d324

Fixes: 39e5bfa734 ("LU-12355 llite: include file linux/selinux.h removed")
Test-Parameters: clientdistro=el8.2 serverdistro=el8.2 clientselinux testlist=sanity-selinux
Test-Parameters: clientdistro=el8.1 serverdistro=el8.1 clientselinux testlist=sanity-selinux
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I6fb5ed9ecdb79545225b5586b90509eb157a355b
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39671
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-13580 tests: fix retrieval of SELinux context 13/39713/2
Sebastien Buisson [Mon, 18 May 2020 09:43:22 +0000 (11:43 +0200)]
LU-13580 tests: fix retrieval of SELinux context

Use 'stat' command instead of 'ls -lZ' to retrieve SELinux security
context, to make it more portable.

Lustre-change: https://review.whamcloud.com/38648
Lustre-commit: ca09fda138b6d72588f40e4cf79c5f2de832d2dd

Test-Parameters: trivial clientselinux testlist=sanity-selinux mdtcount=2 clientcount=2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I61bc0efb1e8ae0427d05827e2933eb0b848fb442
Reviewed-on: https://review.whamcloud.com/39713
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13278 lnet: Reconcile discovery push and reply handling 75/39575/2
Chris Horn [Mon, 10 Feb 2020 20:11:49 +0000 (14:11 -0600)]
LU-13278 lnet: Reconcile discovery push and reply handling

Reconcile the logic for updating the multi-rail flag of a peer when
processing a discovery PUSH with the logic used when processing a
discovery REPLY.

Cray-bug-id: LUS-8516
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Idfb4c3729822d03b71f9440ac66176ae6b886022
Reviewed-on: https://review.whamcloud.com/37674
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Stephen Champion <stephen.champion@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39575
Reviewed-by: Chris Horn <chris.horn@hpe.com>
3 years ago LU-13818 build: use libsnmp-dev instead of libsnmp30 79/39679/2
Minh Diep [Fri, 24 Jul 2020 17:38:04 +0000 (10:38 -0700)]
LU-13818 build: use libsnmp-dev instead of libsnmp30

Installing libsnmp-dev will pull in the correct libsnmpXX.
By depending on libsnmp-dev we can install on
Ubuntu 20.04, which uses libsnmp35.

Lustre-change: https://review.whamcloud.com/39506
Lustre-commit: af2f77633bf7b12d6ca1ab606ff90cf1ee58107a

Change-Id: Ib921ac35e06149ba88fa8e39b9a0980deb94acf2
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39679
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13599 mdt: fix mti_big_lmm buffer usage 21/39521/2
Mikhail Pershin [Tue, 28 Jul 2020 11:33:18 +0000 (14:33 +0300)]
LU-13599 mdt: fix mti_big_lmm buffer usage

The mti_big_lmm buffer can be used just as a temporary buffer
in some cases. The mti_big_lmm_used flag should be dropped after
that to avoid an assertion in mdt_big_attr_get().

This fix is extracted from the bigger LU-11025 patch in the
master branch.

Lustre-change: https://review.whamcloud.com/37284
Lustre-commit: a336d7c7c1cd62a5a5213835aa85b8eaa87b076a

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3718d6c413ef1d5f8242e548868602ef6476006e
Reviewed-on: https://review.whamcloud.com/39521
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-9971 lnet: use after free in lnet_discover_peer_locked() 91/38891/6
Olaf Weber [Tue, 12 Sep 2017 12:07:50 +0000 (14:07 +0200)]
LU-9971 lnet: use after free in lnet_discover_peer_locked()

When the lnet_net_lock is unlocked, the peer attached to an
lnet_peer_ni (found via lnet_peer_ni::lpni_peer_net->lpn_peer)
can change, and the old peer deallocated. If we are really
unlucky, then all the churn could give us a new, different,
peer at the same address in memory.

Change the reference counting on the lnet_peer lp so that it
is guaranteed to be alive when we relock the lnet_net_lock for
the cpt. When the reference count is dropped lp may go away if
it was unlinked, but the new peer is guaranteed to have a
different address, so we can still correctly determine whether
the peer changed and discovery should be redone.

LU-9971 lnet: fix peer ref counting

Exit from the loop after peer ref count has been incremented
to avoid wrong ref count.

The code makes sure that a peer is queued for discovery at most
once if discovery is disabled. This is done to use discovery
as a standard ping for gateways which do not have the discovery
feature or have discovery disabled.

Signed-off-by: Olaf Weber <olaf.weber@hpe.com>
Change-Id: Ia44dce20074b27ec0e77d7c1908c6a44ec73d326
Reviewed-on: https://review.whamcloud.com/28944
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38891
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 years ago LU-13609 llog: list all the log files correctly on MGS/MDT 30/39330/4
Emoly Liu [Fri, 10 Jul 2020 05:05:00 +0000 (13:05 +0800)]
LU-13609 llog: list all the log files correctly on MGS/MDT

"lctl --device xxx llog_catlist" should list all the config log on
MGS and catalog on MDT correctly without any buffer size limit.
If the data can't be fetched in one call, data->ioc_count is used to
save the number of logs fetched so far, and then the listing continues.

conf-sanity.sh test_123af is added to verify this patch. And the
minor style issue in LU-13757 is fixed as well.

Lustre-change: https://review.whamcloud.com/38917
Lustre-commit: 1d97a8b4cd3de9074f323332c7b736367a70d419

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I364d563446833751b1f017fa2bef0351dab56235
Reviewed-on: https://review.whamcloud.com/39330
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
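The chunked listing in LU-13609 can be modelled as a loop that passes the running count back as a resume offset. This is a sketch with hypothetical functions; the counter `fetched` only plays the role the message assigns to data->ioc_count, and the real ioctl interface is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical server side: copy up to `cap` log IDs into `buf`,
 * skipping the first `skip` logs. Returns the number copied.
 */
static size_t sketch_catlist_chunk(const int *logs, size_t total, size_t skip,
				   int *buf, size_t cap)
{
	size_t n = 0;

	assert(logs != NULL && buf != NULL);
	while (skip + n < total && n < cap) {
		buf[n] = logs[skip + n];
		n++;
	}
	return n;
}

/*
 * Client side: keep asking until a chunk comes back short.
 * `out` must have room for `total` entries. Returns the number listed.
 */
static size_t sketch_catlist_all(const int *logs, size_t total, int *out,
				 size_t cap)
{
	size_t fetched = 0;	/* running count, used as the resume offset */
	size_t n;

	assert(cap > 0);
	do {
		n = sketch_catlist_chunk(logs, total, fetched,
					 out + fetched, cap);
		fetched += n;
	} while (n == cap);
	return fetched;
}
```

A short chunk (including an empty one) signals the end, so the caller never needs to know the total in advance.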
3 years ago LU-13667 ptlrpc: fix endless loop issue 44/39344/2
Hongchao Zhang [Fri, 19 Jun 2020 02:53:12 +0000 (10:53 +0800)]
LU-13667 ptlrpc: fix endless loop issue

In ptlrpc_pinger_main, if pinging the recoverable
clients or obd_update_maxusage takes too long, the thread can
get stuck in an endless loop because of the negative value returned
by pinger_check_timeout.

Lustre-change: https://review.whamcloud.com/38915
Lustre-commit: 6be2dbb2595121fabceda86c5f7bdcb45e10b320

Change-Id: Ib7fc22b3cc31255223bc2be60224ced1a3585f87
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39344
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
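The failure mode in LU-13667 is a "time until next ping" that goes negative when a round overruns its interval. A sketch of the arithmetic follows; the function names are hypothetical, and clamping to one second is an assumption, since the commit message does not state how the patch handles the negative value.

```c
#include <assert.h>

/* Seconds until the next ping is due; negative when the round overran. */
static long sketch_time_to_next_ping(long round_start, long interval,
				     long now)
{
	assert(interval > 0);
	return round_start + interval - now;
}

/* Clamp so that an overrun round still yields a valid, positive sleep. */
static long sketch_sleep_time(long round_start, long interval, long now)
{
	long t = sketch_time_to_next_ping(round_start, interval, now);

	return t > 0 ? t : 1;
}
```

Without some handling of the negative case, a loop that waits "until the timeout is positive" may never make progress.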
3 years ago LU-12222 ptlrpc: Check if NID is local, not just lolnd NID 65/38865/2
Chris Horn [Mon, 27 Apr 2020 15:07:21 +0000 (10:07 -0500)]
LU-12222 ptlrpc: Check if NID is local, not just lolnd NID

There are a couple of places where we check whether a NID is the lolnd NID
but we really want to know whether the NID is local. Use
LNetIsPeerLocal() to accomplish this.

Lustre-change: https://review.whamcloud.com/38388
Lustre-commit: 95bcc24642c4b95d093407fef0947ee2f5a2c01a

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ia17b9b4b54fd1063c42a6f8bdd0e593be1086683
Reviewed-on: https://review.whamcloud.com/38865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12222 lnet: Primary NID of lolnd NID is the lolnd NID 64/38864/2
Chris Horn [Wed, 22 Apr 2020 16:42:27 +0000 (11:42 -0500)]
LU-12222 lnet: Primary NID of lolnd NID is the lolnd NID

We want Lustre traffic that is intended for the local peer to be sent
and received over the lolnd. The function ptlrpc_uuid_to_peer() will
currently resolve a NID to the lolnd NID, but ptlrpc_connection_get()
will overwrite this selection with the result from LNetPrimaryNID().

Have LNetPrimaryNID return the lolnd NID when it is passed the lolnd
NID.

Lustre-change: https://review.whamcloud.com/38313
Lustre-commit: 33d2e44e5026f1e9162dd5e6b931085fdc035a34

HPE-bug-id: LUS-8457
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I02708bb45f8440091782ca7886bac7656efb0223
Reviewed-on: https://review.whamcloud.com/38864
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-12222 lnet: Introduce constant for the lolnd NID 63/38863/2
Chris Horn [Wed, 22 Apr 2020 16:39:46 +0000 (11:39 -0500)]
LU-12222 lnet: Introduce constant for the lolnd NID

This patch adds a new constant, LNET_NID_LO_0, to represent the lolnd
NID 0@lo.

Lustre-change: https://review.whamcloud.com/38312
Lustre-commit: 56203e4ba0a64789e42ea45946e8c51f1db351fb

HPE-bug-id: LUS-8457
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I3e57637f297b8de306905a447af8f025e31d1fcf
Reviewed-on: https://review.whamcloud.com/38863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
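The two LU-12222 lnet patches above can be illustrated together: a named constant for 0@lo, and a primary-NID lookup that returns the loopback NID unchanged. The packing macros, the LND type number 9 and `sketch_primary_nid()` are assumptions made for this sketch and may not match the LNet headers exactly.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LOLND 9U	/* assumed loopback LND type number */

/* Assumed packing: net = type << 16 | number; NID = net << 32 | address. */
#define SKETCH_MKNET(type, num) (((uint32_t)(type) << 16) | (uint32_t)(num))
#define SKETCH_MKNID(net, addr) (((uint64_t)(net) << 32) | (uint64_t)(addr))

/* Named constant for the loopback NID 0@lo. */
#define SKETCH_NID_LO_0 SKETCH_MKNID(SKETCH_MKNET(SKETCH_LOLND, 0), 0)

/* Hypothetical primary-NID lookup: the loopback NID maps to itself. */
static uint64_t sketch_primary_nid(uint64_t nid, uint64_t discovered_primary)
{
	/* Sketch-only precondition: discovery never yields the loopback NID. */
	assert(discovered_primary != SKETCH_NID_LO_0);
	if (nid == SKETCH_NID_LO_0)
		return nid;
	return discovered_primary;
}
```

Returning the loopback NID as its own primary keeps a later lookup from overwriting an earlier decision to use loopback.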
3 years ago LU-12758 quota: clear default flag for new ID 08/38808/2
Hongchao Zhang [Tue, 2 Jun 2020 16:20:47 +0000 (09:20 -0700)]
LU-12758 quota: clear default flag for new ID

When setting the quota limits to 0 with "lfs setquota", the default
flag won't be cleared if the lquota_entry has just been created for
the quota ID for the first time, because the quota limits are the same.

This patch is back-ported from the following one:
Lustre-commit: ce86e23b21ccffc395089578c0ca356de219ac88
Lustre-change: https://review.whamcloud.com/36236

Change-Id: I7f44ce0cb13783ca5bede2f55cd0707f1ccbc8ca
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38808
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13659 kernel: kernel update SLES12 SP4 [4.12.14-95.54.1] 39/39239/3
Jian Yu [Thu, 2 Jul 2020 04:14:25 +0000 (21:14 -0700)]
LU-13659 kernel: kernel update SLES12 SP4 [4.12.14-95.54.1]

Update SLES12 SP4 kernel to 4.12.14-95.54.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT="103a 817"

Change-Id: If7b9143bec6d9c526bd65e96a771c83f2530e608
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39239
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13599 mdt: fix logic of skipping local locks in reply_state 91/39191/4
Mikhail Pershin [Fri, 26 Jun 2020 15:17:06 +0000 (18:17 +0300)]
LU-13599 mdt: fix logic of skipping local locks in reply_state

mdt_reint_migrate() controls the number of local locks taken and
prevents saving too many locks in reply_state by doing a local
sync instead. However, there is a flaw in that logic, so the locks
are always saved, causing an assertion in ptlrpc_save_lock().

The patch takes the 'do_sync' local parameter into consideration when
deciding whether to save a local lock.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I98cca84825ce5789094fbceb5d1f7975410d134b
Reviewed-on: https://review.whamcloud.com/39191
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12424 lnet: prevent loop in LNetPrimaryNID() 90/38890/4
Amir Shehata [Tue, 11 Jun 2019 18:25:27 +0000 (11:25 -0700)]
LU-12424 lnet: prevent loop in LNetPrimaryNID()

If discovery is disabled locally or at the remote end, then attempt
discovery only once. Do not update the internal database when
discovery is disabled and do not repeat discovery.

This change prevents LNet from getting hung waiting for
discovery to complete.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4543b0f71e6cf297a1a5f058ebcc6bf74b8ac328
Reviewed-on: https://review.whamcloud.com/35191
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38890
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
3 years ago LU-13149 tests: change sanityn 103 facet value 47/38847/3
James Nunez [Fri, 5 Jun 2020 15:15:01 +0000 (09:15 -0600)]
LU-13149 tests: change sanityn 103 facet value

The facet name input to lustre_version_code() in sanityn
test 103 should be 'ost1' not a variable '$ost1'.  Let's
replace this call with the $OST1_VERSION variable.

Fixes: 2548cb9e32bfca ("LU-11670 osc: glimpse - search for active lock")
Test-Parameters: trivial
Test-Parameters: serverversion=2.10.8 serverdistro=el7.6 env=ONLY=103 testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ib7426f78210c9b32ba53c46ba5f08faeb3ea8ec5
Reviewed-on: https://review.whamcloud.com/38847
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
3 years ago LU-11782 tests: add version check to conf-sanity 117 51/38851/2
James Nunez [Fri, 5 Jun 2020 17:21:09 +0000 (11:21 -0600)]
LU-11782 tests: add version check to conf-sanity 117

conf-sanity test 117 was added to check error returns from
read_param().  This test will fail when run with servers
with Lustre version less than 2.12.0 and, thus, should be
skipped for all Lustre servers earlier than 2.12.0.

Fixes: 6ca2425ccf6b ("LU-11198 utils: propagate errors for read_param")
Test-Parameters: trivial
Test-Parameters: serverversion=2.10.8 serverdistro=el7.6 env=ONLY=117 testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ia0889584d9c1a6c09ea2a99fa11c7abfd1474de4
Reviewed-on: https://review.whamcloud.com/38851
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
3 years ago LU-13640 tests: add version check to conf-sanity 125 50/38850/2
James Nunez [Fri, 5 Jun 2020 16:55:44 +0000 (10:55 -0600)]
LU-13640 tests: add version check to conf-sanity 125

In Lustre 2.12.3, the l_tunedisk utility was modified to
skip tuning devices on the MDS and MGS and conf-sanity
test 125 was added to check this functionality.  Thus, this
test should be skipped for all Lustre server versions prior
to 2.12.3.

Fixes: bab0570ce3081 ("LU-12387 tests: Validate l_tunedisk max_sectors_kb tuning")
Test-Parameters: trivial
Test-Parameters: serverversion=2.10.8 serverdistro=el7.6 env=ONLY=125 testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I89c2900c2430ff3e76bee297809957380404aa31
Reviewed-on: https://review.whamcloud.com/38850
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
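The version gates added by LU-11782 and LU-13640 boil down to comparing a packed version number against a threshold. The real tests are shell scripts; the following is only a C rendering of the comparison, with a hypothetical one-byte-per-component encoding and hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack major.minor.patch.fix, one byte each (an assumed encoding). */
static uint32_t sketch_version_code(unsigned int major, unsigned int minor,
				    unsigned int patch, unsigned int fix)
{
	assert(major < 256 && minor < 256 && patch < 256 && fix < 256);
	return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
	       ((uint32_t)patch << 8) | (uint32_t)fix;
}

/* conf-sanity 125 should be skipped on servers older than 2.12.3. */
static int sketch_skip_test_125(uint32_t server_version)
{
	return server_version < sketch_version_code(2, 12, 3, 0);
}
```

Packing the components into one integer lets a single `<` comparison order versions correctly, as long as each component fits in its byte.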
3 years ago LU-13088 ldlm: Fix sleeping function called in atomic 83/39283/3
Mr NeilBrown [Thu, 19 Dec 2019 05:55:35 +0000 (16:55 +1100)]
LU-13088 ldlm: Fix sleeping function called in atomic

target_recovery_overseer() can sleep while holding a spinlock, which
triggers a BUG warning.

It is easily fixed by dropping the spinlock before waiting.  In the
case where the task waits, no useful information that could be
protected by the spinlock is held, so nothing can be lost by dropping
it.

Lustre-change: https://review.whamcloud.com/#/c/37063/
Lustre-commit: b29b9310dafe17ba78e1db490b79b89d2d6fdcd1

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I8bb3d02523b5dcfadac19f01ccb736d7b7f28239
Reviewed-on: https://review.whamcloud.com/37063
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39283

3 years ago LU-13653 mdt: ignore quota when creating slave stripe 82/39282/2
Hongchao Zhang [Wed, 24 Jun 2020 09:53:55 +0000 (17:53 +0800)]
LU-13653 mdt: ignore quota when creating slave stripe

When creating a striped directory, the quota limit has already been
checked on the master MDT, so the quota should be ignored when
creating the slave stripe object.

Lustre-change: https://review.whamcloud.com/#/c/38875/
Lustre-commit: f762acebfcc6a88c3f4ba6296cbd6f1696bff530

Change-Id: Ia53b1975a8d66c78725feb313659f7a9b889e735
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38875
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39282
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
3 years ago LU-13709 utils: 'lfs mkdir -i -1' doesn't work 65/39165/3
Lai Siyao [Wed, 24 Jun 2020 12:01:08 +0000 (20:01 +0800)]
LU-13709 utils: 'lfs mkdir -i -1' doesn't work

'lfs mkdir -i -1 -c...' creates a directory on MDTs chosen by space
usage. When the stripe count is more than 1, the target MDT list is not
correctly initialized, which causes the command to fail.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id4584940cec390a9245e888c96c7873f5afa209e
Reviewed-on: https://review.whamcloud.com/39165
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13600 ptlrpc: limit rate of lock replays 11/39111/5
Mikhail Pershin [Fri, 12 Jun 2020 14:14:50 +0000 (17:14 +0300)]
LU-13600 ptlrpc: limit rate of lock replays

Clients send all lock replays at once, and that may overwhelm the
server with a huge number of replays in the recovery queue, causing
OOM effects.

The patch adds rate control for lock replays on the client.

The patch also includes a later fix for the signal_completed_replay()
race.

Lustre-change: https://review.whamcloud.com/38920
Lustre-commit: 3b613a442b8698596096b23ce82e157c158a5874

Lustre-change: https://review.whamcloud.com/39140
Lustre-commit: dc654756af63bd30802ebd86074019d1533a4d8f

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ie557f8481c5facb690468d7136cf5feebe4e8f11
Reviewed-on: https://review.whamcloud.com/39111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-13657 kernel: kernel update RHEL8.2 [4.18.0-193.6.3.el8_2] 03/38903/4
Jian Yu [Tue, 7 Jul 2020 18:13:05 +0000 (11:13 -0700)]
LU-13657 kernel: kernel update RHEL8.2 [4.18.0-193.6.3.el8_2]

Update RHEL8.2 kernel to 4.18.0-193.6.3.el8_2 for Lustre client.

Test-Parameters: trivial clientdistro=el8.2

Change-Id: Id9eb16b9277bf2157905eb38a23a3250a0033560
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38903
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13503 mdc: allow setting max_mod_rpcs_in_flight larger 93/38893/3
Andreas Dilger [Wed, 10 Jun 2020 21:34:03 +0000 (14:34 -0700)]
LU-13503 mdc: allow setting max_mod_rpcs_in_flight larger

Allow setting mdc.*.max_mod_rpcs_in_flight > mdc.*.max_rpcs_in_flight
by increasing the latter value, rather than returning an error and
telling the user to do that.  This matches the similar behavior if
mdc.*.max_rpcs_in_flight is reduced lower than max_mod_rpcs_in_flight.

If there are multiple MDTs, the "mdc.*.max_mod_rpcs_in_flight" param
may be set from e.g. the MDT0000 config log before MDT0001 is fully
configured, catching MDT0001 with ocd_maxmodrpcs = 0 before the OCD
from the MDT has been filled in, and incorrectly trigger an error.
If seen during setup, allow ocd_maxmodrpcs = (max_rpcs_in_flight - 1),
since this will be fixed up later if mdc.*.max_rpcs_in_flight is set
smaller in the config log (if set larger it doesn't matter).

Test-Parameters: env=ONLY=90 testlist=conf-sanity

This patch is back-ported from the following one:
Lustre-commit: 6d314902e6d19229379577aab60d4b20a5b4d2ea
Lustre-change: https://review.whamcloud.com/38455

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I4b20163e9e212db451738169ebdc361ab8c1c15e
Reviewed-on: https://review.whamcloud.com/38893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12100 tests: Use least qunit to set limit 69/38769/4
Nathaniel Clark [Tue, 19 Nov 2019 14:52:45 +0000 (09:52 -0500)]
LU-12100 tests: Use least qunit to set limit

Use the least qunit to set the lower limit for inodes in sanity-quota/2.
This ensures that the limit is set at or above the minimum size.

Lustre-change: https://review.whamcloud.com/36797
Lustre-commit: 33e500cfb33406b8dddac46e1dfb5a3d59ff01c5

Test-Parameters: trivial
Test-Parameters: env=ONLY=2 testlist=sanity-quota
Test-Parameters: env=ONLY=2 testlist=sanity-quota fstype=zfs
Test-Parameters: env=ONLY=2,ONLY_REPEAT=20 fstype=zfs testlist=sanity-quota
Test-Parameters: mdtcount=2 mdscount=4 env=ONLY=2,ONLY_REPEAT=20 fstype=zfs testlist=sanity-quota

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I80e2c3cb66870d11f74f34c435e266a46630479b
Reviewed-on: https://review.whamcloud.com/36797
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/38769
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-13473 llite: don't check mirror info for page discard 56/38856/2
Bobi Jam [Wed, 22 Apr 2020 05:28:54 +0000 (13:28 +0800)]
LU-13473 llite: don't check mirror info for page discard

CIT_MISC is used for lock/page manipulation and does not go through
the full IO procedure, i.e. cl_io_loop() will not be called
for it. So don't check it for plain files, since the mirror info
is not initialized/set in this case.

Lustre-change: https://review.whamcloud.com/38307
Lustre-commit: d0dd744ed6ae002f34bdade993428b635b23d072

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I723d18260629b8f7c470d350d6d899d3bb88018a
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38856
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12865 tests: fix sanity 160f to be more robust 33/38833/2
Andreas Dilger [Thu, 17 Oct 2019 07:19:26 +0000 (16:19 +0900)]
LU-12865 tests: fix sanity 160f to be more robust

The sanity test_160f test was failing intermittently because the first
Changelog user ("cl6") was being unregistered in some cases when it
set changelog_max_idle_time=10, but the test slept for 9s and then did
some operations that could be slow.  In rare cases the test runs too
long and the MDS evicts the "good" user along with the bad user:

   MDD0000: Force deregister of ChangeLog user cl7 idle more than 35s
   MDD0000: Force deregister of ChangeLog user cl6 idle more than 11s

Change the test sleep interval to be half of the max_idle limit so
that there is no risk of the "good" Changelog user being evicted.

Add some logging to the test so that it is easier to correlate test
script actions with events in the MDS debug log.

Lustre-change: https://review.whamcloud.com/36468
Lustre-commit: 4b0f0164c6ed761897409186376e9edc989323c9

Fixes: 31fef6845e8b ("LU-10680 mdd: create gc thread when no current transaction")
Test-Parameters: trivial envdefinitions=ONLY=160 testlist=sanity,sanity
Test-Parameters: envdefinitions=ONLY=160 mdscount=2 testlist=sanity,sanity

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0e4c9c271d98a2716f848e75676780b0383ebbe5
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38833
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13421 kernel: kernel update RHEL8.1 [4.18.0-147.8.1.el8_1] 27/38227/7
Jian Yu [Thu, 28 May 2020 07:44:34 +0000 (00:44 -0700)]
LU-13421 kernel: kernel update RHEL8.1 [4.18.0-147.8.1.el8_1]

Update RHEL8.1 kernel to 4.18.0-147.8.1.el8_1 for Lustre client.

Test-Parameters: trivial clientdistro=el8.1

Change-Id: I4c8d925f295927ed7b7fd8fd5d17754d720bfc4d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38227
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12761: tests: make version_code() accept two number versions too 15/38715/2
Oleg Drokin [Mon, 23 Sep 2019 12:39:48 +0000 (08:39 -0400)]
LU-12761: tests: make version_code() accept two number versions too

There's now a user in sanity test 103a that calls version_code with
2.6.  Andreas rightly points out that instead of fixing the caller we
can just update the code to accept this usage.
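
A minimal sketch of the idea, written in Python rather than the test
framework's shell. It assumes the version code packs the fields as
(major << 16) | (minor << 8) | patch and that missing trailing fields
default to 0; the body is illustrative only and is not the code from
this change.

```python
import re

def version_code(version: str) -> int:
    """Pack a dotted version string into one comparable integer.

    Assumption: fields are packed as (major << 16) | (minor << 8) | patch.
    Missing trailing fields (e.g. "2.6") are treated as 0.
    """
    # Keep only the leading numeric fields: "2.12.5-RC1" -> [2, 12, 5]
    fields = []
    for part in re.split(r"[^0-9A-Za-z]+", version):
        if not part.isdigit():
            break
        fields.append(int(part))
    # Pad with zeros so two-number versions work like three-number ones.
    major, minor, patch = (fields + [0, 0, 0])[:3]
    return (major << 16) | (minor << 8) | patch
```

With this padding, "2.6" and "2.6.0" produce the same code, so a caller
can pass either form.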

Lustre-change: https://review.whamcloud.com/36275
Lustre-commit: 6521dda6f4377c9c688ce4905cd94adf9f99013f

Change-Id: I5915cd08a36946c6d26f2e231aa7a820a3eef46a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38715
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13553 lnd: gracefully handle unexpected events 52/38752/2
Amir Shehata [Wed, 20 May 2020 05:21:10 +0000 (22:21 -0700)]
LU-13553 lnd: gracefully handle unexpected events

When a tx completes, the kiblnd_tx_complete() callback is invoked.
We ensure:
LASSERT (tx->tx_sending > 0);
However this assert is being triggered in some rare scenarios.
tx_sending could be 0 at this point because:
 1. ib_post_send() failed but the OFED stack is still sending
    a tx complete event.
 2. We're getting two different events for the same tx.
Instead of asserting, ignore that tx_complete event and print
the tx pointer and its status.

Lustre-change: https://review.whamcloud.com/38669
Lustre-commit: 60f9f539e686fc19b080a3cda15ade7111bbd4a7

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I8cd192538c0c80abaef23a4b6e6906936043060b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38752
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13225 utils: fix install path for bash-completion 70/38670/2
Andreas Dilger [Fri, 8 May 2020 23:28:39 +0000 (17:28 -0600)]
LU-13225 utils: fix install path for bash-completion

Fix the default install path for bash-completion if the package is
not installed at build time.  This avoids BASH_COMPLETION_DIR being
badly formatted in the lustre.spec file.

Lustre-change: https://review.whamcloud.com/38548

Fixes: dfb4afc24102 ("LU-13225 utils: bash completion for lfs and lctl")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie50071c4ff86f57bc9dd53409ae339da2a3ebbe5
Reviewed-on: https://review.whamcloud.com/38670
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13225 utils: bash completion for lfs and lctl 19/38519/5
Andreas Dilger [Sat, 8 Feb 2020 08:25:29 +0000 (01:25 -0700)]
LU-13225 utils: bash completion for lfs and lctl

Add a bash completion for "lfs" and improve completion for "lctl".
Rename the "lctl" completion script to "lustre" since the two
commands share helper routines for fsnames, pools, etc. and install
"lfs" and "lctl" symlinks to the common command file.

The completion prints available sub-commands and their options,
and for some sub-commands it completes available arguments
(e.g. mount points, pool names, and MDT/OST names).

Make a couple of minor changes to the "lfs" and "lctl" usage messages
to make the sub-command options easier to parse.  More needs to be
done to make all sub-commands have proper long options.

There is definitely more that could be added to the completions,
but this is a good start and provides a framework for the future.

Lustre-change: https://review.whamcloud.com/37483
Lustre-commit: dfb4afc24102ee305d4901dc76944f4c91887633

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie989b2ef4c0d6d8565e5c6753205bb6ed83ebbe5
Reviewed-by: Dominique Martinet <dominique.martinet@cea.fr>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/38519
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-11986 libcfs: lnet_remove_debugfs() compat for RHEL6 16/38716/3
Jian Yu [Mon, 25 May 2020 18:27:22 +0000 (11:27 -0700)]
LU-11986 libcfs: lnet_remove_debugfs() compat for RHEL6

Unloading the libcfs module on a RHEL 6.10 Lustre client with
kernel 2.6.32-754.24.3 hit a kernel panic. The issue
doesn't exist in Lustre b2_10 where RHEL 6.10 is supported
and debugfs_remove_recursive() is called directly from
lnet_remove_debugfs(). This patch adds compat changes to
lnet_remove_debugfs() to resolve the issue.

Fixes: 9d42660e173e ("LU-11986 lnet: properly cleanup lnet debugfs files")
Fixes: ae93a9f21752 ("LU-11986 libcfs: add compat for d_hash_and_lookup()")
Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ib63a40afe8926f56cd1d2873975855c226098418
Reviewed-on: https://review.whamcloud.com/38716
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-10395 osd: stop OI at device shutdown 53/38153/3
Alex Zhuravlev [Tue, 18 Feb 2020 15:04:44 +0000 (18:04 +0300)]
LU-10395 osd: stop OI at device shutdown

Stop OI at device shutdown and not at obd_cleanup(). Otherwise a race
is possible: umount <MDT> stopping OI vs. MGS accessing the same OSD,
which results in the assertion:
ASSERTION( osd->od_oi_table != NULL && osd->od_oi_count >= 1 )

Lustre-change: https://review.whamcloud.com/37615
Lustre-commit: 2789978e1192dbf6d90399c96b5594e0dc049cd9

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I24fccea718f2e2663166cfb0ff26571039357535
Reviewed-on: https://review.whamcloud.com/38153
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-11669 tests: add project in yml_test_group() 84/38684/2
Jian Yu [Wed, 20 May 2020 23:56:37 +0000 (16:56 -0700)]
LU-11669 tests: add project in yml_test_group()

This patch fixes yml_test_group() in yaml.sh to
add test project name, which is required in
results.yml for Maloo to parse.

This patch is back-ported from the following one:
Lustre-commit: 054bb02880d26bacc4e7080869955c2039bbf986
Lustre-change: https://review.whamcloud.com/33658

Test-Parameters: trivial

Change-Id: I0ae563d855dc2d28eaea85e86b1cb23d2428988b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Leonel Ochoa <lochoa@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38684
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-11269 ptlrpc: request's counter in import 40/38340/2
Alex Zhuravlev [Tue, 25 Feb 2020 16:44:18 +0000 (19:44 +0300)]
LU-11269 ptlrpc: request's counter in import

Add a request counter to the import that is separate from
imp_refcount, as the latter can be used for other purposes and
is hard to use for tracking requests.

This is to verify the theory that imp_refcount should be checked.

Lustre-change: https://review.whamcloud.com/37722
Lustre-commit: b09afdf57643cbc1c437a42b4babb0837dd19e65

Change-Id: I7c273a73e2b1bb43059c7ed003ee2b7d09273bfe
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38340
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-11623 llite: hash just created files if lock allows 63/38763/3
Oleg Drokin [Tue, 6 Nov 2018 00:26:44 +0000 (19:26 -0500)]
LU-11623 llite: hash just created files if lock allows

If open|creat (and other intent operations later) returned a lookup
bit as part of the lock, hash the resultant dentry under this lock,
so that subsequent lookups do not trigger further RPCs.

Lustre-change: https://review.whamcloud.com/33584
Lustre-commit: fc42cbe0e2e5d1d87d0edca73986b831ac718301

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Change-Id: Id5140d1042af7f5ab9052922e11a7eda8f92a29a
Reviewed-on: https://review.whamcloud.com/38763
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-13357 lod: implement striped directory .dio_lookup 91/38691/6
Lai Siyao [Thu, 12 Mar 2020 00:35:20 +0000 (08:35 +0800)]
LU-13357 lod: implement striped directory .dio_lookup

Add function lod_striped_lookup() for
lod_striped_index_ops.dio_lookup to allow name lookup under striped
directory.

Currently this is used by subdir mount, which needs to look up the FID
of the subdir on the server side.

Function lfsck_namespace_repair_dirent() should call dt_lookup() with
the bottom object, because the child may be a shard.

Add sanity 247f.

Lustre-change: https://review.whamcloud.com/37903
Lustre-commit: 42b0304e2571a80effe5bc4ab6fb58acfabb361d

Change-Id: Iba844d1a34a318bcbd42b00186ed6fa9d165effc
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38691
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoNew release 2.12.5 2.12.5 v2_12_5
Oleg Drokin [Mon, 8 Jun 2020 13:18:56 +0000 (09:18 -0400)]
New release 2.12.5

Change-Id: I2bd2c42ba57730856fe454999f67b870b41330e8
Signed-off-by: Oleg Drokin <green@whamcloud.com>
3 years agoNew Rc 2.12.5-RC1 2.12.5-RC1 v2_12_5-RC1
Oleg Drokin [Wed, 27 May 2020 22:50:20 +0000 (18:50 -0400)]
New Rc 2.12.5-RC1

Change-Id: I2db2a133d8d8fc8479cc36e3714f4f62b2ea2dd5

3 years agoLU-13416 ldiskfs: don't corrupt data on journal replay 05/38705/4
Alexey Lyashkov [Mon, 20 Apr 2020 09:45:52 +0000 (12:45 +0300)]
LU-13416 ldiskfs: don't corrupt data on journal replay

Journalled writes need special attention on block release:
revoke records must be added to avoid replacing newly written blocks
with stale data. Mark the inode as "journal write" to generate
valid revoke records. Large EA inode updates are also affected
by this bug.

The large EA fix is:

Linux-commit: ddfa17e4adc4bd19c32216aaa6250dc38b0579df
Author: Tahsin Erdogan <tahsin@google.com>
Date:   Wed Jun 21 21:36:51 2017 -0400
ext4: call journal revoke when freeing ea_inode blocks

Lustre-change: https://review.whamcloud.com/38281
Lustre-commit: a23aac2219047cb04ed1fa555f31fa39e5c499dc

Change-Id: I605128c4ba70331a48715dc95546430909efb893
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38705
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13589 utils: fix lfs setstripe unit parsing 80/38680/2
Andreas Dilger [Wed, 20 May 2020 18:19:32 +0000 (12:19 -0600)]
LU-13589 utils: fix lfs setstripe unit parsing

The "size_units" variable was not being reset while parsing different
"lfs setstripe" arguments, so e.g. "lfs setstripe -E 1M -S 65536 ..."
ended up using the 'M' unit for the stripe size, giving a
stripe_size of 65536MiB = 64GiB, which resulted in an error.

This only appeared with PFL or other layout patterns which had more
than one unit being parsed, and was already fixed in master via SEL.
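
The stale-unit problem can be modelled in a few lines. This is a
sketch in Python, not the lfs source: parse_size(), parse_buggy() and
parse_fixed() are hypothetical helpers that only mimic a parser which
takes a default unit and reports the unit it used.

```python
UNIT_BYTES = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_size(arg, default_unit):
    """Parse "<number>[K|M|G]"; fall back to default_unit if no suffix.

    Returns (size_in_bytes, unit_used). Hypothetical helper, not llapi.
    """
    suffix = arg[-1].upper() if arg[-1].isalpha() else ""
    number = int(arg[:-1] if suffix else arg)
    unit = suffix or default_unit
    return number * UNIT_BYTES[unit], unit

def parse_buggy(extent_end, stripe_size):
    # The unit from the -E argument leaks into the -S argument.
    _, size_units = parse_size(extent_end, "")
    stripe_bytes, _ = parse_size(stripe_size, size_units)
    return stripe_bytes

def parse_fixed(extent_end, stripe_size):
    # The unit is reset to "bytes" before each argument is parsed.
    parse_size(extent_end, "")
    stripe_bytes, _ = parse_size(stripe_size, "")
    return stripe_bytes
```

For the arguments "1M" and "65536", parse_buggy() returns 64GiB while
parse_fixed() returns 65536 bytes, matching the behaviour described
above.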

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib3f9be86f5104aaea4f77d87853255a518cbc3a0
Reviewed-on: https://review.whamcloud.com/38680
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13535 lfsck: fix possible PFL layout corruption 85/38585/5
Mikhail Pershin [Tue, 12 May 2020 20:32:22 +0000 (23:32 +0300)]
LU-13535 lfsck: fix possible PFL layout corruption

While checking lmm_oi in a composite layout, the 'lmm' pointer
is re-assigned to the component entry, but the same pointer is used
for the LOV EA buffer to update the EA. Therefore, if lmm_oi was fixed
in some component, only the current entry is saved as the new layout.

Lustre-change: https://review.whamcloud.com/38584
Lustre-commit: be009cb4a73b3bef7302083bec7d1d6289d515b7

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ifbd984a71b383ab4ca35ad59ed9cd8cf57b6d7cc
Reviewed-on: https://review.whamcloud.com/38585
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
3 years agoLU-13111 kernel: new kernel [SLES12 SP5 4.12.14-122.20.1] 40/38640/3
Jian Yu [Sun, 17 May 2020 07:39:39 +0000 (00:39 -0700)]
LU-13111 kernel: new kernel [SLES12 SP5 4.12.14-122.20.1]

This patch makes changes to support new SLES12 SP5 release
for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 817" testlist=sanity

Change-Id: Ia4b856b03801e02da9a2e584efeb8759b4dd30c3
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13556 kernel: kernel update RHEL7.8 [3.10.0-1127.8.2.el7] 35/38635/2
Jian Yu [Sat, 16 May 2020 23:56:34 +0000 (16:56 -0700)]
LU-13556 kernel: kernel update RHEL7.8 [3.10.0-1127.8.2.el7]

Update RHEL7.8 kernel to 3.10.0-1127.8.2.el7.

Test-Parameters: trivial clientdistro=el7.8 serverdistro=el7.8

Change-Id: If7ac6f4b5f1fe32a15c63f51589a2e320001b4a5
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38635
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-13488 kernel: new kernel [RHEL 8.2 4.18.0-193.1.2.el8] 61/38461/5
Jian Yu [Fri, 22 May 2020 18:06:34 +0000 (11:06 -0700)]
LU-13488 kernel: new kernel [RHEL 8.2 4.18.0-193.1.2.el8]

This patch makes changes to support new RHEL 8.2 release
for Lustre client.

Test-Parameters: trivial clientdistro=el8.2 \
env=SANITY_EXCEPT="130" testlist=sanity

Change-Id: Icb1db3afd2e94423a45354acfdd559f8f1e294cb
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38461
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12904 o2ib: ib_destroy_cq() returns void 89/38489/3
Shaun Tancheff [Mon, 4 May 2020 22:03:38 +0000 (15:03 -0700)]
LU-12904 o2ib: ib_destroy_cq() returns void

Kernel destroy CQ flows can't fail and the returned value of
ib_destroy_cq() is of no interest in those flows.

kernel-commit: 890ac8d97e6722a9e4a66a0bd836d1b028d075fe

This patch is back-ported from the following one:
Lustre-commit: 7d2ea1e5bbd80f23e6935174c969b34b58048443
Lustre-change: https://review.whamcloud.com/36578

Test-Parameters: trivial
Cray-bug-id: LUS-8042
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I873bf76a33bd80d5e6de4d1b16a79ff5ea930f3a
Reviewed-on: https://review.whamcloud.com/38489
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12634 lnet: for_ifa removed. Use in_dev_for_each_ifa_rtnl 87/38487/3
Shaun Tancheff [Mon, 4 May 2020 21:59:47 +0000 (14:59 -0700)]
LU-12634 lnet: for_ifa removed. Use in_dev_for_each_ifa_rtnl

Linux 5.3 removed for_ifa and replaced it with _rtnl and _rcu
versions for use with their respective locking primitives.

kernel-commit: ef11db3310e272d3d8dbe8739e0770820dd20e52

This patch is back-ported from the following one:
Lustre-commit: 6e0d0146276353559c821916e193c90d167b14e0
Lustre-change: https://review.whamcloud.com/35744

Test-Parameters: trivial
Cray-bug-id: LUS-7689
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: Iea07222b9abb3f9c219d28fe2c660d9eaf21af80
Reviewed-on: https://review.whamcloud.com/38487
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12400 ptlrpc: Sun RPC changes for RCU locking 83/38483/3
Shaun Tancheff [Mon, 4 May 2020 20:16:00 +0000 (13:16 -0700)]
LU-12400 ptlrpc: Sun RPC changes for RCU locking

In kernel 4.20 SUNRPC cache_detail->hash_lock changed to spinlock_t

  Now that the reader functions are all RCU protected, use a regular
  spinlock rather than a reader/writer lock.

Linux-commit: 1863d77f15da0addcd293a1719fa5d3ef8cde3ca

This patch is back-ported from the following one:
Lustre-commit: 77d53777e32c80047cb75293d5f9a4c0d23bbea8
Lustre-change: https://review.whamcloud.com/35499

Test-Parameters: trivial
Cray-bug-id: LUS-7600
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: If0df38337d5a2bb0ac4b8cb645dbe89f65e0f352
Reviewed-on: https://review.whamcloud.com/38483
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12355 llite: include file linux/selinux.h removed 80/38480/3
Shaun Tancheff [Mon, 4 May 2020 20:01:34 +0000 (13:01 -0700)]
LU-12355 llite: include file linux/selinux.h removed

In kernel 5.1 linux/selinux.h was removed with
SELinux: Remove unused selinux_is_enabled

Linux-commit: 3d252529480c68bfd6a6774652df7c8968b28e41

This patch is back-ported from the following one:
Lustre-commit: 39e5bfa73414d18738001761b42ea0e3264c2983
Lustre-change: https://review.whamcloud.com/35035

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: If963e6b22b7b07899de5b970f934bb157c5f7cec
Reviewed-on: https://review.whamcloud.com/38480
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12355 llite: Lustre specific iov_for_each broken (removed) 79/38479/3
Shaun Tancheff [Fri, 22 May 2020 17:59:40 +0000 (10:59 -0700)]
LU-12355 llite: Lustre specific iov_for_each broken (removed)

Kernel 4.20 introduced iov_iter_type and broke iov_for_each

As iov_for_each is only used once, drop the macro entirely.
When iov_iter_type is available ignore invalid iter types.

Linux-commit: 8a363970d1dc38c4ec4ad575c862f776f468d057

Kernel 3.15 added type to iov_iter. Use the type to provide
a sensible replacement for iov_iter_type when it is available.

Linux-commit: 71d8e532b1549a478e6a6a8a44f309d050294d00

This patch is back-ported from the following one:
Lustre-commit: d93aa0171a25f8ffca51bed35a2d477a45fda0f3
Lustre-change: https://review.whamcloud.com/35024

Cray-bug-id: LUS-6962
Change-Id: I97cdce1c85803ac2d4436d4eedf67a545ea2cdb8
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-on: https://review.whamcloud.com/38479
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12400 llite: Use the new vm_fault_t type 78/38478/3
Shaun Tancheff [Fri, 22 May 2020 17:49:45 +0000 (10:49 -0700)]
LU-12400 llite: Use the new vm_fault_t type

Linux 4.17 created the new vm_fault_t type

Linux-commit: 1c8f422059ae5da07db7406ab916203f9417e396

Linux 5.1 changed the vm_fault_t type to bitwise unsigned int
which changes the interfaces registered to struct vm_operations_struct

Linux-commit: 3d3539018d2cbd12e5af4a132636ee7fd8d43ef0

Prefer to match the upstream API and fallback to 'int'
where vm_fault_t is not available.

This patch is back-ported from the following one:
Lustre-commit: f2b224a48cb00f885b9df2cc56e349dae5f27f9e
Lustre-change: https://review.whamcloud.com/35500

Test-Parameters: trivial
Cray-bug-id: LUS-7600
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I7122fb0d4af3ee9a19c1a5d0b77c4f13f6850181
Reviewed-on: https://review.whamcloud.com/38478
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13168 tests: verify truncated xattr is handled 04/38604/4
Andreas Dilger [Thu, 30 Apr 2020 22:20:01 +0000 (16:20 -0600)]
LU-13168 tests: verify truncated xattr is handled

Verify that a truncated trusted.lov xattr is handled properly,
for both plain and PFL layouts.

Add a test case that verifies this is fixed for both layout types.

Lustre-change: https://review.whamcloud.com/38434
Lustre-commit: cb74546354201434a6fd3d53acd1a0808fbfcb1c

Fixes: f2d06d3c76 ("LU-12911 llite: Don't access lov_md fields before size check")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I11d420c7fdc2362f64689a545b95c76e893ebbe5
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13131 osc: Ensure immediate departure of sync write pages 10/38610/4
Oleg Drokin [Wed, 1 Apr 2020 05:38:16 +0000 (01:38 -0400)]
LU-13131 osc: Ensure immediate departure of sync write pages

Except for the case of direct-io and server-lock, we are
potentially holding multiple locks that are next to impossible
to find and cross reference.
So instead just send it all right away. This should only
be a factor in rare cases of out of quota or close to out
of space.

Lustre-change: https://review.whamcloud.com/38453
Lustre-commit: 13b7cf4fabdd55977b68eb856bfdc82f0a349e73

Change-Id: I961cd9ba7f3266d22dfc5eff758c2f4ebbe148a4
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-13131 osc: Do not wait for grants for too long 72/38672/2
Oleg Drokin [Thu, 19 Mar 2020 19:24:28 +0000 (15:24 -0400)]
LU-13131 osc: Do not wait for grants for too long

obd_timeout is way too long considering we are holding a lock
that might be contended. If OST is slow to respond, we might
get evicted, so limit us to a half of the shortest possible
max wait a server might have before switching to synchronous IO.

Lustre-change: https://review.whamcloud.com/38283
Lustre-commit: 1eee11c75ca13745d083410e1ced3a1a8b088ee9

Change-Id: I36653194c1b8b95ba3cc2ed9240df7b0888cf7ed
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38672
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-9679 osc: convert cl_cache_waiters to a wait_queue. 75/38575/4
Oleg Drokin [Tue, 12 May 2020 00:05:32 +0000 (20:05 -0400)]
LU-9679 osc: convert cl_cache_waiters to a wait_queue.

cli->cl_cache_waiters is a list of tasks that need
to be woken when grant-space becomes available.  This
means it is acting much like a wait queue.
So let's change it to really be a wait queue.

The current implementation adds new waiters to the end of the list,
and calls osc_enter_cache_try() on each in order.  We can provide the
same behaviour by using an exclusive wait, and having each waiter wake
the next task when it succeeds.

If a waiter notices that success has become impossible, it wakes all
other waiters.

If a waiter times out, it doesn't wake the others; it just leaves
them to time out themselves.
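
The wake-up chain described above can be illustrated with a small
single-threaded simulation. This is only a model in Python, not kernel
code: wake_chain() and its arguments are hypothetical, and timeouts
and the "success has become impossible" case are left out.

```python
from collections import deque

def wake_chain(waiters, grant_available):
    """Simulate exclusive wake-ups on a FIFO wait queue.

    waiters: iterable of (name, bytes_needed), oldest first.
    Returns (names granted in order, names still waiting, grant left).
    """
    queue = deque(waiters)
    granted = []
    # An exclusive wake-up wakes only the task at the head of the queue.
    while queue:
        name, needed = queue[0]
        if needed > grant_available:
            # The head waiter cannot proceed: it goes back to sleep and
            # nobody further down the queue is woken.
            break
        grant_available -= needed
        granted.append(name)
        queue.popleft()
        # On success the waiter wakes the next task (next loop pass).
    return granted, [n for n, _ in queue], grant_available
```

Waiters are served in arrival order, as with the old list, and the
chain stops at the first waiter that cannot be satisfied.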

Note that the old code handled -EINTR from the wait function.  That is
not a possible return value when wait_event_idle* is used, so that
case is discarded.

As we need wait_event_idle_exclusive_timeout_cmd(), we should fix the
bug in that macro - the "might_sleep()" is wrong, as a spinlock might
be held at that point.

Linux-Commit: 31f45f56ecdf ("lustre: osc_cache: convert
cl_cache_waiters to a wait_queue.")

Lustre-change: https://review.whamcloud.com/37605
Lustre-commit: b2ede01d1ed77ddc512c013220f6ea8b509e9541

Change-Id: Ib7622ea2daea8f6e59bef95d3b6c5a80d209b81e
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/38575
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13131 osc: Always send all HP RPCs requests 68/37468/8
Oleg Drokin [Fri, 7 Feb 2020 01:21:43 +0000 (20:21 -0500)]
LU-13131 osc: Always send all HP RPCs requests

that are resulting from lock cancel activity.

Lustre-change: https://review.whamcloud.com/38057
Lustre-commit: 0f1743916be6605fcd8f57993d6ce7d8d06ce12c

Change-Id: I4007167f0f39b0699e977c14bd160f475d8288ad
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37468
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-9859 libcfs: initialize bit_wait_table 74/38574/4
James Simmons [Fri, 19 Jul 2019 12:38:44 +0000 (08:38 -0400)]
LU-9859 libcfs: initialize bit_wait_table

With older platforms wait_event_var() is missing so we needed to
provide our own version. This included creating a local
bit_wait_table like the kernel has. That table was never properly
initialized so it can cause failures under the right conditions.

Test-Parameters: trivial

Lustre-change: https://review.whamcloud.com/35567
Lustre-commit: 916643c4b8d136c9c96afa5cd21d1eb6030d148c

Fixes: 372ef85512 ("LU-11089 obd: use wait_event_var() in lu_context_key_degister()")
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: I310d37da7c1b54166224b367446cc905c02ab8bc
Reviewed-on: https://review.whamcloud.com/38574
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11089 obd: use wait_event_var() in lu_context_key_degister() 73/38573/4
NeilBrown [Tue, 21 May 2019 14:22:39 +0000 (10:22 -0400)]
LU-11089 obd: use wait_event_var() in lu_context_key_degister()

lu_context_key_degister() has an open coded loop which calls
schedule() without setting a new task state.  This is generally
a bad idea - it could easily just spin.

Instead, use wait_event_var() to wait for ->lct_used to be zero,
and arrange to get a wakeup when that happens.
Previously ->lct_used would only fall down to 1.  Now we decrement
it an extra time so that wake_up, which only happens when the
count reaches zero, will only happen when lu_context_key_degister()
is actually waiting for it.

Note that this patch removes key_fini() from protection by
lu_keys_guard.  key_fini() calls are not always protected
by a lock, and there seems to be no need here.  Nothing else
can be acting on the given key in that context at this point,
so no race is possible.

Linux-commit: ef84c07364211bb4e398a9de45d1c13a32059cee
Lustre-commit: 372ef85512dd2a722415fba9a3df66f81029508b
Lustre-change: https://review.whamcloud.com/33667

Change-Id: I9514bd21916f75fce00e393612967fb197e3a1c4
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/38573
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12911 llite: Don't access lov_md fields before size check 33/38433/3
Mr NeilBrown [Mon, 28 Oct 2019 01:24:26 +0000 (12:24 +1100)]
LU-12911 llite: Don't access lov_md fields before size check

When 'struct lov_user_md' is passed in via setxattr, it comes with
a size.  If that size is too small, some functions that check exactly
what version is present might access beyond the end of allocated
memory, which can have undesirable effects, such as triggering
a KASAN warning (and possibly worse).

So check that the size is sane before looking inside the structure
at all.
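
The ordering described above can be sketched in plain userspace C.
This is only an illustration of the size-before-access pattern, not
the llite code: struct demo_md, DEMO_MAGIC_V1 and demo_md_magic() are
hypothetical names, and the magic value is made up.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a user-supplied layout header. */
struct demo_md {
	uint32_t dm_magic;	/* identifies the layout version */
	uint32_t dm_pattern;
};

#define DEMO_MAGIC_V1 0x12345678u	/* made-up value for the sketch */

/*
 * Store the magic in *magic and return 0 if the buffer is large enough
 * to hold a header.  Return -EINVAL without reading the buffer if it
 * is too small.
 */
static int demo_md_magic(const void *buf, size_t size, uint32_t *magic)
{
	struct demo_md hdr;

	/* The size check comes first; no field is read before it. */
	if (buf == NULL || size < sizeof(hdr))
		return -EINVAL;

	memcpy(&hdr, buf, sizeof(hdr));
	*magic = hdr.dm_magic;
	return 0;
}
```

A caller would only dispatch on the returned magic after
demo_md_magic() succeeds; a 2-byte buffer is rejected before any
field is inspected.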

Lustre-change: https://review.whamcloud.com/36589
Lustre-commit: f2d06d3c76a1d69447e7bd6fd29d8165be558d73

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ib3f071a3ff77a039fdfa38c903d87999108b3322
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/38433
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13102 llog: fix processing of a wrapped catalog 41/38541/2
Alexander Boyko [Mon, 16 Dec 2019 13:24:16 +0000 (08:24 -0500)]
LU-13102 llog: fix processing of a wrapped catalog

The logic for rereading an llog buffer had an exception
for a full catalog, when lgh_last_idx = llh_cat_idx and the first
index to process is llh_cat_idx+1. This check is based on
the value lh_last_idx, which stays unchanged between buffer reads.
But llh_cat_idx could move forward, and this led to a wrong check
where the reread doesn't happen. As a result Lustre got ENOENT for
a record and stopped osp processing.

llog_cat_set_first_idx())
catlog [0x6:0x1:0x0] first idx 34730, last_idx 34731
osp_sync_process_queues()) 1 changes, 0 in progress, 0 in flight
llog_process_thread())
stop processing plain 0x76941:1:0 index 64767 count 1
llog_process_thread())
index: 34731, lh_last_idx: 34730 synced_idx: 34730 lgh_last_idx: 34731
llog_cat_process_common()) processing log [0x2780f:0x1:0x0]:0
at index 34731 of catalog [0x6:0x1:0x0]
llog_cat_id2handle()) snx11281-OST0001-osc-MDT0001:
error opening log id [0x2780f:0x1:0x0]:0:rc = -2

The patch fixes logic and also adds/changes debugging for
llog and osp.

Lustre-change: https://review.whamcloud.com/37102
Lustre-commit: a4f049b96562fd502b1948fb082767351e040a1c

Cray-bug-id: LUS-8193
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I9463223a1ea904b96643b19e1580f5894142c12b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38541
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13274 uapi: fix build on older kernels 20/38520/3
Andreas Dilger [Thu, 7 May 2020 00:08:14 +0000 (18:08 -0600)]
LU-13274 uapi: fix build on older kernels

The recent changes to lustre_user.h broke building on older kernels,
because it resulted in the user tools including both <linux/quota.h>
and <sys/quota.h>, which results in conflicts for the declaration of
the quotactl() function.

Also, restore the compat declaration of __ALIGN_KERNEL(), though
only in lustre_user.h, since it is included in other headers anyway.

Test-Parameters: trivial
Fixes: 0417dce9fc75 ("LU-13274 uapi: make lustre UAPI headers C99 compliant")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9b31cabaf2732eb5872e88686e6af7d12e7d3564
Reviewed-on: https://review.whamcloud.com/38520
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12030 tests: Properly detect debug kernel use on rhel7.6 72/38572/4
Oleg Drokin [Thu, 28 Feb 2019 04:40:37 +0000 (23:40 -0500)]
LU-12030 tests: Properly detect debug kernel use on rhel7.6

kmalloc-128 slab seems to be gone, so let's use dma-kmalloc-128
instead.

Lustre-commit: 673e8b0b3c404b14a02e7e874c3aed991bc52eb3
Lustre-change: https://review.whamcloud.com/34342

Test-Parameters: trivial testlist=conf-sanity env=ONLY=63
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Change-Id: Ice7f350ba2bc6cc733c0a98b0037e6f0980216c9
Reviewed-on: https://review.whamcloud.com/38572
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12846 mdd: return error while delete failed 70/37570/3
Yang Sheng [Thu, 16 Apr 2020 15:22:54 +0000 (23:22 +0800)]
LU-12846 mdd: return error while delete failed

Since we use a global buffer, avoid replacing the
index name while iterating over the orphan directory. Also
return an error code from mdd_orphan_destroy when dt_delete
fails. Otherwise this causes an endless loop.

Fixes: e1ace3751f ("LU-8514 mdd: transaction failure should be checked")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I6fc3e992333ffa61900074309223555264cfe66b
Reviewed-on: https://review.whamcloud.com/37570
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <c17828@cray.com>
4 years agoLU-11986 libcfs: add compat for d_hash_and_lookup() 29/38529/4
Andreas Dilger [Thu, 7 May 2020 11:19:58 +0000 (05:19 -0600)]
LU-11986 libcfs: add compat for d_hash_and_lookup()

Add compatibility for older kernels that do not export the
d_hash_and_lookup() function, even though it has existed for a long time.

Fixes: 9d42660e173e ("LU-11986 lnet: properly cleanup lnet debugfs files")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I10d6547f6b17665880cacfbf87d4dbfd386b92ff
Reviewed-on: https://review.whamcloud.com/38529
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12236 gss: remove unused code in gss_svc_upcall.c 76/38476/2
Aurelien Degremont [Mon, 4 May 2020 18:58:40 +0000 (11:58 -0700)]
LU-12236 gss: remove unused code in gss_svc_upcall.c

Delete rsc_flush() related functions which are never
used.

This patch is back-ported from the following one:
Lustre-commit: 25b0bf5a23032394055b7b94b3169f5cf4068570
Lustre-change: https://review.whamcloud.com/34794

Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Change-Id: Iedd6339b5fafdea81147c83e5f0499fa3ad60251
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/38476
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13228 clio: mmap write when overquota 92/38292/2
Alexander Zarochentsev [Fri, 20 Dec 2019 23:19:44 +0000 (02:19 +0300)]
LU-13228 clio: mmap write when overquota

Flagging a client with the overquota flag should not
cause mmap write access to deliver SIGBUS to the application.

Lustre-change: https://review.whamcloud.com/37495
Lustre-commit: b651089f859e8269af7272b91f5e60aa25f24226

Cray-bug-id: LUS-8221
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: I29d5901fa5078b5cfca40391a02531cf27efce93
Reviewed-on: https://review.whamcloud.com/38292
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12402 lnet: handle recursion in resend 67/38367/2
Amir Shehata [Sat, 25 Apr 2020 17:19:32 +0000 (10:19 -0700)]
LU-12402 lnet: handle recursion in resend

When we're resending a message we have to decommit it first. This
could potentially result in another message being picked up from the
queue and sent, which could fail immediately and be finalized, causing
recursion. This problem was observed when a router was being shut down.

This patch uses the same mechanism used in lnet_finalize() to limit
recursion. If a thread is already finalizing a message and it gets
into a path where it starts finalizing a second, then that message
is queued and handled later.
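
The deferral idea can be sketched in userspace C as below. This is a
simplified illustration under stated assumptions, not the LNet code:
struct demo_ctx, demo_handle() and demo_finalize() are hypothetical,
a single context stands in for whatever per-thread tracking the real
code uses, and the fixed-size queue silently drops overflow.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUE_MAX 16

struct demo_ctx {
	bool   finalizing;		/* a finalize call is already active */
	int    queue[DEMO_QUEUE_MAX];	/* message ids waiting to be handled */
	size_t queued;
	int    depth;			/* current nesting of demo_handle() */
	int    max_depth;		/* deepest nesting observed */
	int    handled;			/* number of messages handled */
};

static void demo_finalize(struct demo_ctx *ctx, int msg);

/* Handling message N triggers finalization of message N - 1, if any. */
static void demo_handle(struct demo_ctx *ctx, int msg)
{
	ctx->depth++;
	if (ctx->depth > ctx->max_depth)
		ctx->max_depth = ctx->depth;
	ctx->handled++;
	if (msg > 0)
		demo_finalize(ctx, msg - 1);
	ctx->depth--;
}

static void demo_finalize(struct demo_ctx *ctx, int msg)
{
	if (ctx->finalizing) {
		/* Nested call: queue the message instead of recursing. */
		if (ctx->queued < DEMO_QUEUE_MAX)
			ctx->queue[ctx->queued++] = msg;
		return;
	}

	ctx->finalizing = true;
	demo_handle(ctx, msg);
	/* Drain whatever the nested calls queued while we were busy. */
	while (ctx->queued > 0)
		demo_handle(ctx, ctx->queue[--ctx->queued]);
	ctx->finalizing = false;
}
```

Starting with message 5, all six messages (5 down to 0) are handled,
but demo_handle() never nests deeper than one level because every
nested finalize is queued and picked up by the outer loop.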

Lustre-change: https://review.whamcloud.com/35431
Lustre-commit: ad9243693c9a5a5b2c34165ad853ddf5ceec4617

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Iace64c7ddb1f56a0a63b030df6a5ab103ae6c645
Reviewed-on: https://review.whamcloud.com/38367
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12436 lov: return error if cl_env_get fails 60/38360/2
Shaun Tancheff [Thu, 13 Jun 2019 22:51:38 +0000 (17:51 -0500)]
LU-12436 lov: return error if cl_env_get fails

When cl_env_get() fails with an error, return that error.

Lustre-change: https://review.whamcloud.com/35229
Lustre-commit: a7997c836bbfe2a0674007f1c23b9593e596e0ba

Test-Parameters: trivial
Cray-bug-id: LUS-7310
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: Ia065aeb142a772f4d620b84111af423e27c06b90
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38360
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13369 kernel: kernel update RHEL7.7 [3.10.0-1062.18.1.el7] 41/38241/3
Jian Yu [Tue, 21 Apr 2020 01:08:02 +0000 (18:08 -0700)]
LU-13369 kernel: kernel update RHEL7.7 [3.10.0-1062.18.1.el7]

Update RHEL7.7 kernel to 3.10.0-1062.18.1.el7.

Test-Parameters: trivial clientdistro=el7.7 serverdistro=el7.7

Change-Id: Ifc00fca35a0ad28ba8326e56e693ea39360a8114
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38241
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13347 kernel: RHEL 7.8 server support 44/38144/6
Jian Yu [Wed, 15 Apr 2020 05:23:14 +0000 (22:23 -0700)]
LU-13347 kernel: RHEL 7.8 server support

This patch makes changes to support new RHEL 7.8 release
for Lustre server (kernel 3.10.0-1127.el7).

Test-Parameters: trivial clientdistro=el7.8 serverdistro=el7.8

Change-Id: I4817fd2f9512111aa7d26109454104945fd2778f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38144
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12816 ptlrpc: ptlrpc_register_bulk LBUG on ENOMEM 66/38266/2
Ann Koehler [Mon, 14 Oct 2019 16:30:56 +0000 (11:30 -0500)]
LU-12816 ptlrpc: ptlrpc_register_bulk LBUG on ENOMEM

Another path through ptl_send_rpc() can cause the assert reported
in LU-10643. The assertion in ptlrpc_register_bulk() on
!desc->bd_registered fails when an rpc is resent and the first
send attempt failed to successfully attach the reply buffer. The
bulk error cleanup in ptl_send_rpc() does not reset the
bd_registered flag.

Lustre-change: https://review.whamcloud.com/36309
Lustre-commit: e6225c07ce4c0037a127a41b2bc539364dfd1f4d

Cray-bug-id: LUS-7946
Signed-off-by: Ann Koehler <amk@cray.com>
Change-Id: I474211f196ea9bd83a036747e25c91c37c85ffbb
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38266
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13063 tests: stop running sanity test 411 60/38260/3
James Nunez [Sat, 18 Apr 2020 09:11:14 +0000 (02:11 -0700)]
LU-13063 tests: stop running sanity test 411

sanity test 411 hits a kernel bug for RHEL 8.1.  Since this
is an issue with the kernel and not Lustre, let's stop
running this test until the kernel is patched.  Thus, we
need to add sanity test 411 to the ALWAYS_EXCEPT list.

Also change the ALWAYS_EXCEPT condition for test smoke for
lnet-selftest to be based on kernel version and not
architecture, so that the custom test for this patch can
pass.

This patch is back-ported from the following one:
Lustre-commit: 34e4c37474b3d9328aac1dd9228a018ac7d4f47e
Lustre-change: https://review.whamcloud.com/37270

Test-Parameters: trivial clientdistro=el8.1
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I60174dcd4776b53ac5b44be6c208d40e1f022445
Reviewed-on: https://review.whamcloud.com/38260
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12661 tests: skip sanity 817 if kernel version >= 4.14 59/38259/3
Li Dongyang [Sat, 18 Apr 2020 08:42:06 +0000 (01:42 -0700)]
LU-12661 tests: skip sanity 817 if kernel version >= 4.14

sanity test_817 is in the ALWAYS_EXCEPT list for aarch64.
However, it fails because the test was run on kernel-alt,
which is 4.14.x; the failure is not related to the architecture.

On new kernels nfsd does not release the file after write, so
the test fails with ETXTBSY regardless of whether the nfs export
is backed by a lustre mount or not.

Skip the test on new kernels for now.

This patch is back-ported from the following one:
Lustre-commit: 4fed33473ca2964ff19f61fdb8501b2210f923de
Lustre-change: https://review.whamcloud.com/36712

Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Ie18ceb961eee2313fca7d60a35159a7496075029
Reviewed-on: https://review.whamcloud.com/38259
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12852 pfl: restrict the stripe count correctly 12/37512/5
Emoly Liu [Fri, 6 Dec 2019 02:08:07 +0000 (10:08 +0800)]
LU-12852 pfl: restrict the stripe count correctly

In function lod_get_stripe_count(), when restricting the stripe
count to the maximum xattr size, the xattr overhead should be
taken into account correctly.
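
As a rough illustration of why the overhead matters, the limit can be
written as a small calculation. This is a sketch with hypothetical
names and made-up sizes; demo_max_stripes() is not the
lod_get_stripe_count() logic, and the real header, per-stripe and
overhead sizes are not taken from this patch.

```c
#include <stddef.h>

/*
 * Largest stripe count whose layout still fits in an xattr of max_ea
 * bytes, once the xattr overhead and the fixed layout header have
 * been subtracted.  All sizes are in bytes.
 */
static size_t demo_max_stripes(size_t max_ea, size_t ea_overhead,
			       size_t hdr_size, size_t per_stripe)
{
	if (per_stripe == 0 || max_ea < ea_overhead + hdr_size)
		return 0;

	return (max_ea - ea_overhead - hdr_size) / per_stripe;
}
```

With the made-up values max_ea = 4096, hdr_size = 32 and
per_stripe = 24, ignoring a 64-byte overhead yields 169 stripes while
accounting for it yields 166, so a limit computed without the
overhead could exceed the space actually available.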

Lustre-change: https://review.whamcloud.com/36947
Lustre-commit: 6dc37759cfb22727ac5d776c38b72e8638563fd8

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ief548e47ce4d375f2e189860ccfe05d0f3c7e890
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37512
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-13224 utils: expose llapi_param* functions 39/38239/2
Gian-Carlo DeFazio [Tue, 11 Feb 2020 21:08:18 +0000 (13:08 -0800)]
LU-13224 utils: expose llapi_param* functions

Added an interface to find files in sysfs, procfs, etc.,
and read them into a file or buffer.
The interface is meant to return the same results as
the "lctl list_param" and "lctl get_param" commands,
but doesn't require starting an lctl process.

The interface is being used by lctl for parts of
its "list_param" and "get_param" commands.

The output of lctl get_param is altered slightly.
Previously, for some multi-line files lctl would
print the param name and first line of the file on
the same line such as:

param=line 1
line 2
line 3

Now multi-line files consistently put a newline
after the param name and before the content such as:

param=
line 1
line 2
line 3

Added debug output to failing test sanity 300d.

Lustre-commit: 9b44cf70a95b2ee97a17697dc97fbe462ad1f5b9
Lustre-change: https://review.whamcloud.com/37545

Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I2726b27b0042d58c97284f8348970deb6efc43d1
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38239
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-13324 llite: page delete race 11/38111/3
Bobi Jam [Tue, 10 Mar 2020 13:12:53 +0000 (21:12 +0800)]
LU-13324 llite: page delete race

A page could be truncated while another thread is deleting
it, so check the vmpage mapping before accessing it.

Lustre-change: https://review.whamcloud.com/37793
Lustre-commit: 8ccaed513006f0582578ff42c9917b5aa76f2f4b

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I3172f5ffd3928926b16ab6fd7362b05da0c7cfd5
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12904 gss: struct cache_detail readers changed to writers 93/38193/3
Shaun Tancheff [Fri, 10 Apr 2020 05:50:25 +0000 (22:50 -0700)]
LU-12904 gss: struct cache_detail readers changed to writers

Linux 5.3 changed struct cache_detail readers to writers
SUNRPC: Track writers of the 'channel' file to improve ...

kernel-commit: 64a38e840ce5940253208eaba40265c73decc4ee

This patch is back-ported from the following one:
Lustre-commit: 81fb8bc7d214394bbc504379a5a84441e871b60a
Lustre-change: https://review.whamcloud.com/36580

Test-Parameters: trivial
Cray-bug-id: LUS-8042
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I7750303937cd6fc560e458efa79f25e521fefec7
Reviewed-on: https://review.whamcloud.com/38193
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13355 crypto: crypto engine wrappers in libcfs 05/38205/2
Sebastien Buisson [Fri, 10 Apr 2020 16:20:17 +0000 (18:20 +0200)]
LU-13355 crypto: crypto engine wrappers in libcfs

libcfs has wrappers in order to be able to use crypto engines as
standard crypto modules. But libcfs should not be considered as the
owner of these implementations.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1e22907c10837042606095ed6089c3c675fffe27
Reviewed-on: https://review.whamcloud.com/38205
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13347 kernel: new kernel [RHEL 7.8 3.10.0-1127.el7] 43/38143/6
Jian Yu [Fri, 10 Apr 2020 05:54:56 +0000 (22:54 -0700)]
LU-13347 kernel: new kernel [RHEL 7.8 3.10.0-1127.el7]

This patch makes changes to support new RHEL 7.8 release
for Lustre client.

Test-Parameters: trivial clientdistro=el7.8

Change-Id: I82f89495d5ab1d46a539a016a899307d7c8f37b7
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38143
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13101 llite: eviction during ll_open_cleanup() 47/38147/2
Andriy Skulysh [Tue, 25 Feb 2020 16:04:32 +0000 (11:04 -0500)]
LU-13101 llite: eviction during ll_open_cleanup()

On error, ll_open_cleanup() is called while the
intent lock remains pinned, so eviction can
happen while the close request waits for a mod rpc slot.

Release the intent lock before ll_open_cleanup().

Lustre-change: https://review.whamcloud.com/37096
Lustre-commit: 6d5d7c6bdb4f19f9db485a774d9259d452cf220e

Change-Id: Ia422351f3f54fc652078f742f2ead0bf278c9d17
Cray-bug-id: LUS-8055
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/37096
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38147
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
4 years agoLU-13136 dom: check read-on-open buffer presents in reply 88/38188/2
Mikhail Pershin [Wed, 15 Jan 2020 11:55:18 +0000 (14:55 +0300)]
LU-13136 dom: check read-on-open buffer presents in reply

ll_dom_finish_open() uses req_capsule_has_field() wrongly:
it checks only the format but not buffer presence in the reply, which
causes unneeded console errors about a missing buffer later in
req_capsule_server_get().

The patch replaces that with req_capsule_field_present() to check
whether the server packed that field in the reply, and properly skips
responses from an old server.

Lustre-change: https://review.whamcloud.com/37249
Lustre-commit: 58bea527100b50abf3df2dbab0ed6d6b42b69d86

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia6114879c90e3e6b8c5020c4912e988cad90df30
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38188
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-13131 osc: Make sure we don't accidentally deprioritize hp requests 18/37918/3
Oleg Drokin [Sat, 14 Mar 2020 07:11:29 +0000 (03:11 -0400)]
LU-13131 osc: Make sure we don't accidentally deprioritize hp requests

It looks like in some cases HP requests could migrate off the HP list
back onto the urgent list.

Lustre-change: https://review.whamcloud.com/37967
Lustre-commit: f1c55f97d2d4b21fe87031b2bddfb9813f7df488

Change-Id: I96bf0a3a005b166f34dba215463c0806dfe2526a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37918
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
4 years agoLU-12773 tests: sanity test_805 Use do_facet 01/37901/6
Oleg Drokin [Tue, 17 Sep 2019 05:23:26 +0000 (01:23 -0400)]
LU-12773 tests: sanity test_805 Use do_facet

do_node cannot really work with $SINGLEMDS, that's the
facet name.

This fixes error message below (and a following syntax error):
mds1: ssh: Could not resolve hostname mds1: Name or service not known

Lustre-change: https://review.whamcloud.com/36204
Lustre-commit: fa8b581aea27acc00f6ac48e76ec261dddf8631a

Fixes: 106abc184d8b ("LU-8856 osd: mark specific transactions netfree")
Test-Parameters: trivial fstype=zfs
Change-Id: I0d842dbccbfd934c524ae01cca7399dd52158064
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37901
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-13377 llite: fix dead loop for short write 63/38163/2
Wang Shilong [Sat, 21 Mar 2020 01:58:09 +0000 (09:58 +0800)]
LU-13377 llite: fix dead loop for short write

|->vvp_io_write_start()
 |->__generic_file_write_iter()
    |->iov_iter_advance() if write succeed()
  |->vvp_io_write_commit()
     |->update ci_nob

The problem is that the iov iter is moved forward inside
__generic_file_write_iter(), but @ci_nob is only updated
after vvp_io_write_commit().

If the client runs out of quota or some other problem happens, this
can cause a mismatch between @ci_nob and @vui_iter.

@vui_iter->count is then reset using @ci_nob in
iov_iter_reexpand(), which makes @vui_iter->count
larger than what is really left, and we could loop forever
in vvp_mmap_locks() if the IO needs to be retried or restarted:

vvp_io_write_lock+0x45/0x80 [lustre]
cl_io_lock+0x5f/0x3d0 [obdclass]
cl_io_loop+0x92/0x190 [obdclass]
ll_file_io_generic+0x7b3/0xc90 [lustre]
ll_file_aio_write+0x12d/0x1f0 [lustre]
ll_file_write+0xce/0x1e0 [lustre]
vfs_write+0xc0/0x1f0
SyS_write+0x7f/0xf0
system_call_fastpath+0x22/0x27

Lustre-change: https://review.whamcloud.com/38018
Lustre-commit: 13dfe0df4956afb50b323a11615b0b34ed014e53

Change-Id: I5fb4c18cf02fb17bf50122b63decacef678caa01
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/38163
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12988 ldiskfs: skip non-loaded groups at cr=0/1 39/37539/5
Alex Zhuravlev [Thu, 28 Nov 2019 12:04:25 +0000 (15:04 +0300)]
LU-12988 ldiskfs: skip non-loaded groups at cr=0/1

cr=0 is supposed to be an optimization to save CPU cycles,
but if buddy data (in memory) is not initialized then all
this makes no sense as we have to do sync IO taking a lot
of cycles.  Also, at cr=0 mballoc doesn't store any available
chunk. cr=1 also skips groups using a heuristic based on avg.
fragment size.
It's more useful to skip such groups and switch to cr=2 where
groups will be scanned for available chunks.

A 120TB device was simulated using a sparse image and a dm-slow
virtual device. The image was then formatted as an OST and filled
using debugfs to mark ~85% of available space as busy.
Mounting as an OST without the patch couldn't complete in half an hour
(according to vmstat it would take ~10-11 hours). With the
patch applied the mount took ~20 seconds.

Lustre-change: https://review.whamcloud.com/36891
Lustre-commit: 6a7a700a1490dfde6b60c2fb36df92a052059866

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I88c8c1b01b386af0fa438bfeb97acb6110bd00ec
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Artem Blagodarenko <c17828@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37539
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13157 mdd: migrate readlink from wrong place 12/38112/2
Lai Siyao [Mon, 20 Jan 2020 04:23:45 +0000 (12:23 +0800)]
LU-13157 mdd: migrate readlink from wrong place

In osd_read(), if symlink name buf length is smaller than i_size,
return -EOVERFLOW, and compare inline data length with i_size
instead of buflen.

Updated sanity.sh 230b.

Lustre-change: https://review.whamcloud.com/37285
Lustre-commit: a3b30423c6067b3e8644ecfb3269f8837af7e4cd

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia13b1f1079efc4ebd22ec400f1a909ff7ec2095d
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38112
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12345 ldiskfs: optimize nodelalloc mode 38/37538/5
Artem Blagodarenko [Tue, 28 May 2019 16:51:21 +0000 (19:51 +0300)]
LU-12345 ldiskfs: optimize nodelalloc mode

We found a performance regression when using bigalloc with "nodelalloc"
(1MB cluster size):

1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024

The "dd" takes about 2 seconds to finish, but if we run mke2fs without
"bigalloc", "dd" takes less than 1 second.

The reason is: when using ext4 with "nodelalloc", it will call
ext4_find_delalloc_cluster() nearly every time it calls
ext4_ext_map_blocks(), and ext4_find_delalloc_range() will also scan
all pages in the cluster because no buffer is "delayed".  A cluster has
256 pages (1MB cluster), so it will scan 256 * 256k pages when creating
a 1G file. That severely hurts performance.
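
The 256 * 256k figure can be reproduced with a small calculation.
The sketch assumes a 4 KiB page size and one cluster scan per page
written, neither of which is stated explicitly above, and
demo_pages_scanned() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Number of page lookups when every page written triggers a scan of
 * all pages in its cluster.  All sizes are in bytes.
 */
static uint64_t demo_pages_scanned(uint64_t file_bytes,
				   uint64_t cluster_bytes,
				   uint64_t page_bytes)
{
	uint64_t pages_in_file = file_bytes / page_bytes;	 /* 256k for 1 GiB */
	uint64_t pages_per_cluster = cluster_bytes / page_bytes; /* 256 for 1 MiB */

	return pages_in_file * pages_per_cluster;
}
```

For a 1 GiB file with 1 MiB clusters this gives 262144 * 256 =
67108864 page lookups, compared with 262144 pages actually written.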

Therefore, we return immediately from ext4_find_delalloc_range() in
nodelalloc mode, since by definition there can't be any delalloc
pages.

The same optimization is also added to the ldiskfs_find_delayed_extent()
function, which improves performance dramatically.

Here are the results of testing on a two-node system.
Without the patch:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   56.30    0.06    0.00   43.63

Device:  rrqm/s  wrqm/s   r/s       w/s  rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sds        0.00    0.00  0.00   1174.00   0.00    4.59      8.00      0.84   0.71     0.00     0.71   0.01   1.20

With patch:
08/29/2018 01:13:22 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    4.13   30.37    0.00   65.50

Device:  rrqm/s  wrqm/s   r/s       w/s  rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sds        0.00    0.00  0.00  54117.82   0.00  211.43      8.00    152.59   2.82     0.00     2.82   0.02  99.01

Lustre-change: https://review.whamcloud.com/34982
Lustre-commit: af48ae8bff289b2bc083a888efeafa3c48df91e2

Cray-bug-id: LUS-5835
Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Change-Id: Ie33410d4481778ee4f76a054ab8cfc11cc19a0ed
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37538
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11114 llite: Update mdc and lite stats on open|creat 58/38158/2
Olaf Faaland [Tue, 26 Nov 2019 23:20:11 +0000 (15:20 -0800)]
LU-11114 llite: Update mdc and lite stats on open|creat

Increment "create" counter in mdc/<instance>/md_stats, and
"mknod" counter in llite/<instance>/stats when an open with
the CREAT flag results in a newly created file.

The mknod counter is chosen for consistency with
patch http://review.whamcloud.com/20246
 "LU-8150 mdt: Track open+create as mknod"
but the mdc counter set does not include mknod.

Lustre-change: https://review.whamcloud.com/36948
Lustre-commit: 4b8518ee4fa542f45fcdaeaec580d858dfcaf137

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: If082b911e415c0bc46248728e47ce0f37b9efa83
Reviewed-on: https://review.whamcloud.com/38158
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
4 years agoLU-12580 lov: fix out of bound usercopy 51/38051/2
Li Dongyang [Fri, 7 Feb 2020 12:16:26 +0000 (23:16 +1100)]
LU-12580 lov: fix out of bound usercopy

When handling ioctl LL_IOC_LOV_GETSTRIPE, the user
could pass a buffer which is bigger than
lov_comp_md_size(). This crashes the client because
the usercopy is done with the user-provided buffer
size.

Make sure the copy works. Also, for a PFL file,
we should only copy the chosen component.
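
The length rule can be sketched in userspace C, with memcpy()
standing in for the usercopy. demo_copy_out() is a hypothetical
helper, and returning -EOVERFLOW for a too-small buffer is an
assumption of the sketch rather than the behaviour of the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an object of obj_size bytes into a caller buffer of user_size
 * bytes.  Returns the number of bytes copied, or -EOVERFLOW if the
 * caller buffer cannot hold the object.
 */
static long demo_copy_out(void *user_buf, size_t user_size,
			  const void *obj, size_t obj_size)
{
	if (user_size < obj_size)
		return -EOVERFLOW;

	/*
	 * The copy length is the object size, never user_size: a larger
	 * user buffer must not make us read past the end of the object.
	 */
	memcpy(user_buf, obj, obj_size);
	return (long)obj_size;
}
```

With an 8-byte object and a 32-byte caller buffer, only 8 bytes are
copied and the remaining 24 bytes of the caller buffer are left
untouched.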

Lustre-change: https://review.whamcloud.com/37469
Lustre-commit: 2f1beb33144523467b596f4b6fab882b0a839187

Change-Id: I92bcf6d7b7f7a4387a9936a0b58332e50a88e542
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38051
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12198 libcfs: always copy ioctl header back to user 20/37720/3
Dominique Martinet [Thu, 13 Feb 2020 13:36:32 +0000 (13:36 +0000)]
LU-12198 libcfs: always copy ioctl header back to user

lnetctl_get_peer_list writes the required size back into the header
if the given buffer was too small. Userspace needs that information
to grow the buffer and try again.

Note that err is replaced on failure only if it was not previously set.
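
The grow-and-retry pattern that userspace relies on can be sketched as
follows. Everything here is hypothetical: struct hdr, PEER_LIST_BYTES,
fake_get_peer_list() and the use of -E2BIG are stand-ins chosen for the
illustration, not the actual libcfs ioctl interface. The sketch only
shows why the header must come back even when the call fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct hdr {
	size_t len;	/* in: buffer size, out: required size */
};

#define PEER_LIST_BYTES 64	/* made-up size of the full reply */

/* Stand-in for the ioctl: the header is written back on every path. */
static int fake_get_peer_list(struct hdr *h, char *buf)
{
	size_t given = h->len;

	h->len = PEER_LIST_BYTES;	/* always report the required size */
	if (given < PEER_LIST_BYTES)
		return -E2BIG;		/* assumed error code */
	memset(buf, 'p', PEER_LIST_BYTES);
	return 0;
}

/* Caller side: grow the buffer to the reported size and try again. */
static char *get_peer_list(size_t initial, size_t *out_len)
{
	struct hdr h = { .len = initial };
	char *buf = malloc(initial ? initial : 1);
	int rc;

	assert(out_len != NULL);
	if (!buf)
		return NULL;
	while ((rc = fake_get_peer_list(&h, buf)) == -E2BIG) {
		char *tmp = realloc(buf, h.len);

		if (!tmp) {
			free(buf);
			return NULL;
		}
		buf = tmp;
	}
	if (rc) {
		free(buf);
		return NULL;
	}
	*out_len = h.len;
	return buf;
}
```

If the header were not copied back on failure, h.len would still hold
the too-small size and the loop could not make progress.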

Lustre-change: https://review.whamcloud.com/37559
Lustre-commit: 9e02ef474f8caa833d6a1b5e0068d5323a57e8c4

Fixes: fba98579efc4 ("LU-6202 libcfs: replace libcfs_register_ioctl with a blocking notifier_chain")
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Change-Id: I2b6e319aceeb00d488572053d27023891afe1928
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37720
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-13294 libcfs: incorrect rotor behaviour 49/38049/2
Andrew Perepechko [Fri, 14 Feb 2020 02:20:09 +0000 (05:20 +0300)]
LU-13294 libcfs: incorrect rotor behaviour

Signed int cpt rotor is set to -1 on initialization.
cfs_cpt_spread_node() improperly handles this value
via "if (!rotor--)" check. The condition is never true
with negative rotor values, so for_each_node_mask()
only exits with node = MAX_NUMNODES.

kmalloc_node() attempts to determine the zonelist based
on the passed node id and maps MAX_NUMNODES to some
random pointer. Crash.

BUG: unable to handle kernel paging request at 0000000100002007
IP: [<ffffffff847c0da7>] __alloc_pages_nodemask+0x97/0x420
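
The failure mode of the "if (!rotor--)" check can be reproduced in a few
lines of plain C. This is a simplified model, not the libcfs code:
NR_NODES is a hypothetical stand-in for MAX_NUMNODES and the for loop
stands in for for_each_node_mask().

```c
#include <assert.h>

#define NR_NODES 4	/* hypothetical stand-in for MAX_NUMNODES */

/*
 * Returns the node selected by the rotor, or NR_NODES if the loop
 * runs off the end without the condition ever being true.
 */
static int pick_node(int rotor)
{
	int node;

	for (node = 0; node < NR_NODES; node++) {
		/* True only when rotor is exactly 0 before the decrement. */
		if (!rotor--)
			return node;
	}
	return NR_NODES;
}

/* Non-negative rotors select a node; -1 falls through the loop. */
void check_rotor_behaviour(void)
{
	assert(pick_node(0) == 0);
	assert(pick_node(2) == 2);
	assert(pick_node(-1) == NR_NODES);
}
```

Starting from -1, the rotor only moves further from zero, so the
out-of-range node id is what reaches the allocator.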

Lustre-change: https://review.whamcloud.com/37709
Lustre-commit: f8aa86dd1622804d81020a7dbb1116f276b340f3

Change-Id: I4df74e394bdfc2a918d66aa12e6852ff0f6738ab
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Cray-bug-id: LUS-8492
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38049
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11986 libcfs: provide QSTR_INIT compat macro 90/38090/2
Andreas Dilger [Fri, 27 Mar 2020 08:08:26 +0000 (02:08 -0600)]
LU-11986 libcfs: provide QSTR_INIT compat macro

Provide a compat macro for QSTR_INIT() for older kernels.
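
A compat macro of this kind usually takes the following shape. This is
a sketch under assumptions, not the definition added by the patch: the
struct qstr below is a simplified userspace stand-in (the kernel version
packs hash and len into a union), and the field order in the initializer
is illustrative.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct qstr. */
struct qstr {
	unsigned int hash;
	unsigned int len;
	const unsigned char *name;
};

/* Define the macro only when the kernel headers do not provide it. */
#ifndef QSTR_INIT
#define QSTR_INIT(n, l) { .len = (l), .name = (const unsigned char *)(n) }
#endif

/* Example use: a statically initialized name. */
static const struct qstr dotdot = QSTR_INIT("..", 2);

/* The initializer sets len and name and leaves hash zeroed. */
void check_qstr_init(void)
{
	assert(dotdot.len == 2);
	assert(dotdot.hash == 0);
	assert(memcmp(dotdot.name, "..", 2) == 0);
}
```

The #ifndef guard keeps the fallback out of the way on kernels whose
headers already define QSTR_INIT.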

Fixes: 9d42660e173e ("LU-11986 lnet: properly cleanup lnet debugfs files")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ice19a4dad8456551ba398034a8d3942068006512
Reviewed-on: https://review.whamcloud.com/38090
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>