Whamcloud - gitweb
fs/lustre-release.git
3 years ago LU-14024 ofd: Avoid use after free in ofd_inconsistency_verification_main 81/40881/3
Oleg Drokin [Mon, 12 Oct 2020 20:12:55 +0000 (16:12 -0400)]
LU-14024 ofd: Avoid use after free in ofd_inconsistency_verification_main

The ofd_inconsistency_lock should not be unlocked after we have woken up
a different thread that is going to free the structure containing
said lock.
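
The ordering rule above (release the lock first, wake the other thread second) can be sketched as follows. This is a simplified Python model, not the OFD code: `VerificationState`, `owner`, `worker`, and `run` are hypothetical names, and "freeing" is modelled by dropping the lock reference.

```python
import threading


class VerificationState:
    """Hypothetical stand-in for the structure that contains the lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.wakeup = threading.Event()
        self.stopped = False
        self.freed = False


def owner(state):
    # Waits to be woken up, then tears the structure down.
    state.wakeup.wait()
    state.lock = None  # models freeing the memory that holds the lock
    state.freed = True


def worker(state):
    # Update the shared state under the lock and release the lock first.
    with state.lock:
        state.stopped = True
    # Only after the unlock is the owner woken up. Nothing below this
    # line may touch state.lock, because the owner may already have
    # freed it.
    state.wakeup.set()


def run():
    state = VerificationState()
    thread = threading.Thread(target=owner, args=(state,))
    thread.start()
    worker(state)
    thread.join()
    return state.stopped, state.freed
```

Swapping the two steps in `worker` (waking the owner while still holding the lock) would let the owner drop the lock before the unlock runs, which is the pattern the commit message warns against.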

Lustre-commit: 1123bbd3fc4d5abeb111ddc6bd762d1fb2c1ce82
Lustre-change: https://review.whamcloud.com/40222

Change-Id: I913e7470664e1128a250597b0a803f791d99099e
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40881
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-10262 mdt: mdt_reint_open: check EEXIST without lock 72/41172/2
Dominique Martinet [Fri, 31 Aug 2018 09:03:36 +0000 (18:03 +0900)]
LU-10262 mdt: mdt_reint_open: check EEXIST without lock

Many applications blindly open files with O_CREAT, and the mds gets a
write lock to the parent directory for these even if the file already
exists.
Checking for file existence first lets us take a PR lock if file
already existed even if O_CREAT was specified.

This opens up multiple races between the first lookup and the actual
locking. In each of them, drop the resources we acquired and retry from
scratch to keep things as simple as possible, with mixed
success.
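
The lookup-then-lock flow described above can be sketched as follows. This is a simplified Python model, not the MDT code: `Directory`, `open_with_create`, and the lock-mode strings are hypothetical, and "WRITE" simply stands for the write lock on the parent mentioned above.

```python
class Directory:
    """Hypothetical in-memory stand-in for a parent directory on the MDS."""

    def __init__(self, names=()):
        self.names = set(names)
        self.locks_taken = []  # lock modes requested, kept for illustration

    def lookup(self, name):
        # Existence check that takes no lock.
        return name in self.names

    def lock(self, mode):
        self.locks_taken.append(mode)


def open_with_create(parent, name, max_retries=3):
    """Choose the parent lock mode from a lockless lookup, then re-check."""
    for _ in range(max_retries):
        existed = parent.lookup(name)
        mode = "PR" if existed else "WRITE"
        parent.lock(mode)
        # The state may have changed between the lookup and the lock.
        if parent.lookup(name) != existed:
            continue  # lost a race: drop everything and retry from scratch
        if not existed:
            parent.names.add(name)
        return mode
    raise RuntimeError("too many retries")
```

In this single-threaded model the re-check never fails; it is only there to show where the retry described above would happen.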

Update (eaujames):
 - rebase the patch
 - update tests

Performance tests results:

The table below presents the average "open" syscall latency for 20
files in a single directory accessed by 400 different clients.
 _______________________________________________________________
| Test cases        | without patch | with patch | %improvement |
|___________________|_______________|____________|______________|
| readonly          | 0.960s        | 0.973s     | -1.40%       |
|___________________|_______________|____________|______________|
| readonly cached   | 0.372s        | 0.372s     | +0.01%       |
|___________________|_______________|____________|______________|
| O_CREAT+precreate | 1.645s        | 0.968s     | +41.13%      |
|___________________|_______________|____________|______________|
| O_CREAT cached    | 0.632s        | 0.623s     | +1.34%       |
|___________________|_______________|____________|______________|
| O_CREAT           | 1.261s        | 1.093s     | +13.32%      |
|___________________|_______________|____________|______________|
(for more detail, see the ticket comments section)
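
As a worked check of the last column, the sketch below recomputes the improvement as the relative latency reduction. The formula is an assumption (the commit message does not state it), and `improvement_pct` is a hypothetical helper. Because the latencies in the table are rounded to milliseconds, some recomputed values differ slightly from the published ones.

```python
def improvement_pct(before_s, after_s):
    """Relative latency reduction in percent (positive means faster)."""
    return (before_s - after_s) / before_s * 100.0


# (without patch, with patch) latencies in seconds, copied from the table.
latencies = {
    "readonly": (0.960, 0.973),
    "readonly cached": (0.372, 0.372),
    "O_CREAT+precreate": (1.645, 0.968),
    "O_CREAT cached": (0.632, 0.623),
    "O_CREAT": (1.261, 1.093),
}

for name, (before, after) in latencies.items():
    print(f"{name:18s} {improvement_pct(before, after):+.2f}%")
```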

This patch optimizes concurrent opens with the O_CREAT flag when dentries are
not cached by clients.

Lustre-change: https://review.whamcloud.com/33098
Lustre-commit: 33dc40d58ef6eb8b384fce1da9f8d21cad4ef6d8

Change-Id: I247b579d14d20036f89033c99ece457d70ba19e7
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/41172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-13641 socklnd: announce deprecation of 'use_tcp_bonding' 02/41102/2
Serguei Smirnov [Thu, 24 Dec 2020 01:43:21 +0000 (17:43 -0800)]
LU-13641 socklnd: announce deprecation of 'use_tcp_bonding'

Add a warning to be printed if the 'use_tcp_bonding' option is used,
notifying the user that the feature is being deprecated.
It is suggested to use MR configuration with dynamic discovery
instead.

The Multi-Rail feature doesn't need to be explicitly enabled.
To use MR instead of tcp bonding, group the interfaces
on the same network using the lnetctl utility:

        lnetctl net add --net tcp --if eth2,eth3

or via the modprobe configuration file (/etc/modprobe.d/lnet.conf
or /etc/modprobe.d/lustre.conf):

        options lnet networks="tcp(eth2,eth3)"

and make sure dynamic discovery is enabled:

        lnetctl set discovery 1

MR will aggregate the throughput of all configured and available
networks/interfaces shared between peer nodes.

Lustre-change: https://review.whamcloud.com/41088

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I34288ae1c8a1c4092f88b45a571312691f145218
Reviewed-on: https://review.whamcloud.com/41102
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13709 tests: test lfs mkdir -c without -i 01/40801/3
Olaf Faaland [Thu, 16 Jul 2020 22:50:29 +0000 (15:50 -0700)]
LU-13709 tests: test lfs mkdir -c without -i

Almost every test with lfs mkdir -c in the test suite also
uses option -i, so lfs mkdir -c (same as -i -1, where lustre
chooses the MDTs) is poorly tested.  Add a test for that
case, sanity test_300s.

Lustre-change: https://review.whamcloud.com/39457
Lustre-commit: 2c89cc6c25549cb6748c7c9f5a209c7e38387eb4

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Iede537d52cf445c9c9a6353338670e55a11364da
Reviewed-on: https://review.whamcloud.com/40801
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-14075 kernel: kernel update RHEL8.2 [4.18.0-193.28.1.el8_2] 63/40663/3
Jian Yu [Tue, 17 Nov 2020 01:03:18 +0000 (17:03 -0800)]
LU-14075 kernel: kernel update RHEL8.2 [4.18.0-193.28.1.el8_2]

Update RHEL8.2 kernel to 4.18.0-193.28.1.el8_2 for Lustre client.

Test-Parameters: trivial clientdistro=el8.2

Change-Id: I34e1e51241c3090d1041dedef8379c2e212f58a5
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40663
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12003 osd: take reference to object in osd_trunc_lock() 47/40547/2
Alex Zhuravlev [Thu, 9 Jan 2020 13:28:54 +0000 (16:28 +0300)]
LU-12003 osd: take reference to object in osd_trunc_lock()

Normally the references to objects are held until a transaction
is over, but in a few cases the reference is released earlier, and then
such an object can be released. The OSD should therefore hold its own
reference to prevent early release.

Lustre-change: https://review.whamcloud.com/37170
Lustre-commit: 4fcb9081378f6ad0b7d3cf4105cf5fb2d506966f

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I81647fdec8d42f123e990553edb5e371636f45c0
Reviewed-on: https://review.whamcloud.com/37170
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40547

3 years ago LU-10753 osd-zfs: initialize obj attr correctly 85/40585/3
Lai Siyao [Thu, 24 Sep 2020 16:05:15 +0000 (00:05 +0800)]
LU-10753 osd-zfs: initialize obj attr correctly

mdt_thread_info.mti_attr is used to initialize the object attr in create.
Currently it is copied to object.oo_attr directly, but some fields
in mti_attr may contain bogus data because it is not cleared on each
use. la_valid is set correctly, but la_flags is used without
checking la_valid in __mdd_permission_internal().

Another minor fix in osd_create(): set size/nlink to zero since they
are set in valid.

Lustre-change: https://review.whamcloud.com/40062
Lustre-commit: cf395c2507e80717e7468456e9959d432b6accc8

Test-Parameters: testlist=sanity env=ONLY=300,ONLY_REPEAT=100
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I64816b66a0b3c7aa50e62680d5251141697a8e0f
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40585
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-11597 tests: skip sanityn tests for PPC 60/40660/7
James Nunez [Tue, 17 Nov 2020 00:00:03 +0000 (17:00 -0700)]
LU-11597 tests: skip sanityn tests for PPC

Several sanityn test suite tests fail consistently when
testing PPC clients.  These tests should be skipped, by
adding them to the ALWAYS_EXCEPT list, until the failures are
understood and fixed.

Tests to skip in sanityn are
16a (LU-11597)
71a (LU-11787)

Lustre-change: https://review.whamcloud.com/37561
Lustre-commit: c27e5fe50ca3de4c9d3dbb024a0704ee3cc4e15c

Test-Parameters: trivial clientdistro=el7.8 clientarch=ppc64 testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I39cc9d22e8a47eb8ef59ce8d30e1b6e9aa616a9a
Reviewed-on: https://review.whamcloud.com/40660
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-11409 osc: grant shrink shouldn't account skipped OSC 64/40564/5
Alex Zhuravlev [Thu, 20 Sep 2018 14:15:42 +0000 (17:15 +0300)]
LU-11409 osc: grant shrink shouldn't account skipped OSC

Otherwise only the first 100 OSCs are subject to the grant shrink procedure.

Lustre-commit: 2b215d3763a8a37ff9d65bf1a250fcdaa27c4bdf
Lustre-change: https://review.whamcloud.com/33206

Change-Id: I65ed247b91422effb8f278d1991d4a5ba1c24814
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40564
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12866 tests: skip sanity-hsm test 113 45/40845/3
James Nunez [Wed, 2 Dec 2020 21:23:47 +0000 (14:23 -0700)]
LU-12866 tests: skip sanity-hsm test 113

sanity-hsm test 113 landed to the b2_12 branch with
Lustre version 2.12.3.  The code changes that landed
with the test require that sanity-hsm test 113 be skipped
for all versions of Lustre less than 2.12.3.
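
The version gate described above amounts to an ordered comparison of version components. The sketch below is a Python illustration only, not the test-framework code, and `version_tuple` and `should_skip_test_113` are hypothetical names. Comparing integer tuples instead of strings matters because "2.9.0" sorts after "2.12.3" as a string.

```python
def version_tuple(version):
    """Turn '2.12.3' into (2, 12, 3) so components compare numerically."""
    return tuple(int(part) for part in version.split("."))


def should_skip_test_113(lustre_version, minimum="2.12.3"):
    """Skip when the given Lustre version predates the one with the fix."""
    return version_tuple(lustre_version) < version_tuple(minimum)
```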

Test-Parameters: trivial testlist=sanity-hsm

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ia218ead24a0fd95200cbce6b3380a9ced3430c92
Reviewed-on: https://review.whamcloud.com/40845
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
3 years ago LU-11665 tests: check number of pages correctly 02/40702/3
Andreas Dilger [Tue, 4 Dec 2018 01:30:23 +0000 (18:30 -0700)]
LU-11665 tests: check number of pages correctly

On ARM or other 64KB PAGE_SIZE systems, check the read cache size
against the actual PAGE_SIZE instead of just checking the number
of pages being read.
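
To see why a fixed page count breaks on 64KB-page systems, the sketch below converts a byte size into pages for a given page size. It is an illustration only: `pages_for_bytes` and `local_page_size` are hypothetical helpers, and the 1 MiB figure in the note after the code is an arbitrary example, not a value taken from the test.

```python
import os


def local_page_size():
    """Page size of the running system in bytes (Unix only)."""
    return os.sysconf("SC_PAGE_SIZE")


def pages_for_bytes(nbytes, page_size):
    """Number of whole pages needed to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)
```

For example, 1 MiB of cached data is 256 pages with 4KB pages but only 16 pages with 64KB pages, so a check written in bytes (pages times PAGE_SIZE) holds on both kinds of systems, whereas a hard-coded page count does not.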

Lustre-change: https://review.whamcloud.com/33772
Lustre-commit: df0bcc96ee578d6661fcc63e82b94e1e569f9efe

Test-Parameters: trivial clientdistro=el8.2 clientarch=aarch64 testlist=sanity

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8fb90325a18b343c5f5af01df603a25fe33ebbe5
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/40702
Tested-by: jenkins <devops@whamcloud.com>
3 years ago LU-11835 mdt: return DOM size on open resend 23/40223/4
Mikhail Pershin [Wed, 16 Jan 2019 13:24:58 +0000 (16:24 +0300)]
LU-11835 mdt: return DOM size on open resend

DOM size is always returned along with the DOM lock, but this is
not the case on open resend.

The patch fixes that issue and adds a test case.

Lustre-change: https://review.whamcloud.com/34044
Lustre-commit: bc3ef43d36b51d346f22a4c32214c2945c04dbe5

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I73d43933f781f192e9aa8c6ee388a043dab5bde9
Reviewed-on: https://review.whamcloud.com/40223
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12681 osc: wrong cache of LVB attrs 39/40739/2
Vitaly Fertman [Mon, 16 Sep 2019 13:46:40 +0000 (16:46 +0300)]
LU-12681 osc: wrong cache of LVB attrs

The osc object keeps a cache of the LVB, obtained on lock enqueue, in
lov_oinfo. This cache gets all the modifications happening on
the client, whereas the original LVB in locks does not get them.
At the same time, this cache is lost on object destroy, which
may occur on layout change in particular.

The ldlm locks are left in the LRU and can be matched by later operations.
The first enqueue does not match a lock in the LRU due to @kms_ignore in
enqueue_base. However, if the lock is obtained on a small offset
while locks on larger offsets exist in the LRU, the obtained size
is cut by the policy region when set to KMS.

The second enqueue can already match and add stale data to oinfo. Thus the
OSC cache is left with a small KMS. However, the code that prepares
a partial page checks the KMS to decide whether to read the page, and
as the KMS is small, the page is not read and therefore the non-read part
of the page is zeroed.

The object destroy detaches the dlm locks from the osc object and offloads
the current osc oinfo cache to all the locks, so that it can be
reconstructed for the next osc oinfo. Introduce a per-lock flag to
control the cached attribute status and drop the re-enqueue after osc
object replacement.

This patch also fixes the handling of KMS_IGNORE added in LU-11964.
It is used only to skip the self lock in a search; there is no other
logic for it and it is not needed for DOM locks at all. All the
relevant semantics are supposed to be accomplished by the cbpending flag.

Lustre-change: https://review.whamcloud.com/36199
Lustre-commit: 8ac020df4592fc6e85edd75d54cb3795a4e50f8e

Change-Id: Iba45bb3e5ee181c82c2f22deb299228b1519cddb
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40739
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13759 dom: lock cancel to drop pages 02/40302/4
Mikhail Pershin [Wed, 15 Jul 2020 05:12:55 +0000 (08:12 +0300)]
LU-13759 dom: lock cancel to drop pages

Prevent stale pages after lock cancel by creating
cl_page connection for read-on-open pages.

Since VM pages are connected to cl_object they can be
found and discarded by CLIO properly.

Lustre-change: https://review.whamcloud.com/39401
Lustre-commit: e95eca236471cf23083ef281ef204a5920e4db9b

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Iba8c87c934c442b4c0b45d7d3821ceede1a6e68f
Reviewed-on: https://review.whamcloud.com/40302
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-10948 llite: Revalidate dentries in ll_intent_file_open 91/41091/3
Oleg Drokin [Wed, 25 Apr 2018 19:04:48 +0000 (15:04 -0400)]
LU-10948 llite: Revalidate dentries in ll_intent_file_open

We might get a lookup lock in response to our open request and we
definitely want to ensure that our dentry is valid, so it could
actually be matched by dcache code in future operations.

Benchmark results:

This patch can significantly improve open-create + stat on the same
client.

This patch in combination with two others:

https://review.whamcloud.com/33584
https://review.whamcloud.com/33585

Improves the 'stat' side of open-create + stat by >10x.

Without patches (master branch commit 26a7abe):

mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k

   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       3838.205       3838.204       3838.204          0.000
   File stat         :      33459.289      33459.249      33459.271          0.011
   File read         :          0.000          0.000          0.000          0.000
   File removal      :          0.000          0.000          0.000          0.000
   Tree creation     :       3146.841       3146.841       3146.841          0.000
   Tree removal      :          0.000          0.000          0.000          0.000

With the three patches:

mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k
SUMMARY rate: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       3822.440       3822.439       3822.440          0.000
   File stat         :     350620.140     350615.980     350617.193          1.051
   File read         :          0.000          0.000          0.000          0.000
   File removal      :          0.000          0.000          0.000          0.000
   Tree creation     :       2076.727       2076.727       2076.727          0.000
   Tree removal      :          0.000          0.000          0.000          0.000

Note 33K stats/second vs 350K stats/second.

ls -l time of the mdtest directory is also reduced from 23.5 seconds to
5.8 seconds.
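
As a quick worked check of the figures quoted above, the ratios can be recomputed from the mean "File stat" rates and the two ls -l timings. The helper name `ratio` is hypothetical, and the numbers are copied from the mdtest output and the sentence above.

```python
def ratio(larger, smaller):
    """How many times larger the first value is than the second."""
    return larger / smaller


# Mean "File stat" rates (operations per second) from the two mdtest runs.
stat_speedup = ratio(350617.193, 33459.271)

# ls -l wall-clock time of the mdtest directory, in seconds.
ls_speedup = ratio(23.5, 5.8)

print(f"stat rate improved about {stat_speedup:.1f}x")
print(f"ls -l time improved about {ls_speedup:.1f}x")
```

Both results are consistent with the ">10x" claim for stat and with the roughly fourfold reduction in ls -l time.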

Lustre-change: https://review.whamcloud.com/32157
Lustre-commit: 14ca3157b21d8bd22be29c9578819b72fd39a1e5

Change-Id: I2cb4f94c0300897adb90cc89425e5cfb1c6fe7af
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41091
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-9193 security: return security context for metadata ops 87/41387/3
Bruno Faccini [Wed, 26 Apr 2017 10:35:28 +0000 (12:35 +0200)]
LU-9193 security: return security context for metadata ops

Security layer needs to fetch security context of files/dirs
upon metadata ops like lookup, getattr, open, truncate, and
layout, for its own purpose and control checks.
Retrieving the security context consists in a getxattr operation
at the file system level. The fact that the requested metadata
operation and the getxattr are not atomic can create a window
for a dead-lock situation where, based on some access patterns,
all MDT service threads can become stuck waiting for lookup lock
to be released and thus unable to serve getxattr for security context.
Another problem is that sending an additional getxattr request for
every metadata op hurts performance.

This patch introduces a way to get atomicity by having
the MDT return security context upon granted lock reply,
sparing the client an additional getxattr request.

LU-12212 mdt: fix SECCTX reply buffer handling

LU-9193 changes for inline SECCTX in reply may cause frequent
resends and reconnects under some loads, e.g. dbench runs.
That is caused by a missed buffer shrink when SECCTX is not
used.

Patch shrinks the SECCTX buffer if it is not used.

Lustre-change: https://review.whamcloud.com/26831
Lustre-commit: fca35f74f9ec5c5ed77e774f3e3209d9df057a01

Lustre-change: https://review.whamcloud.com/34734
Lustre-commit: cb61ed93f8563c26b6a6db396478fe54f8dc42cb

Test-Parameters: clientselinux testlist=sanity envdefinitions=EXCEPT=103a
Test-Parameters: mdscount=2 mdtcount=4 clientselinux testlist=recovery-small,sanity-selinux
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Sebastien Piechurski <sebastien.piechurski@atos.net>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I45659ffcb911a9d62e6d7e92bcdc251ae641b24b
Reviewed-on: https://review.whamcloud.com/41387
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-13608 out: don't return einprogress error 83/41183/2
Alexander Boyko [Thu, 30 Jul 2020 12:04:27 +0000 (08:04 -0400)]
LU-13608 out: don't return einprogress error

When out_handle processes an update request, it can happen
that the file doesn't exist; osd_fid_lookup then triggers scrub and
returns EINPROGRESS. The remote MDT would process EINPROGRESS at
the ptlrpc layer and resend the request in a loop, and MDT recovery
would be blocked.

The fix adds the fid to OI for ENOENT, as it was before LU-7782.
So the second attempt with the same fid will return ENOENT.

Lustre-change: https://review.whamcloud.com/39538
Lustre-commit: 865aa3f692bccdd9cf7ff6cafeee350e06bb8d76

Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
HPE-bug-id: LUS-9062
Change-Id: Ib9a1753234ccc773e9b9529195ebfa6e5a8c101c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41183
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-13181 o2ib: fix page mapping error 03/41303/2
Alexey Lyashkov [Sat, 23 Jan 2021 01:18:39 +0000 (17:18 -0800)]
LU-13181 o2ib: fix page mapping error

IB DMA mapping can merge a physically contiguous page region into
a single one.
This confuses the kiblnd_fmr_pool_map function, which expects to see all
fragments mapped.
It generates an error:
 (o2iblnd.c:1926:kiblnd_fmr_pool_map()) Failed to map mr 1/16 elements

From studying the IB code, it looks like the ib_map_mr_sg return code
should be checked against the result of ib_dma_map_sg instead of the
original fragment count, and the same value should be passed as the
argument to ib_map_mr_sg.
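
The mismatch described above can be modelled in a few lines. This is a Python sketch, not kernel code: `dma_map_sg` and `map_mr_sg` are hypothetical stand-ins that only imitate the counting behaviour of ib_dma_map_sg and ib_map_mr_sg as described in this message, and `map_ok` is an illustrative helper.

```python
def dma_map_sg(fragments):
    """Merge physically contiguous (addr, length) fragments into one entry."""
    merged = []
    for addr, length in fragments:
        if merged and merged[-1][0] + merged[-1][1] == addr:
            last_addr, last_len = merged[-1]
            merged[-1] = (last_addr, last_len + length)
        else:
            merged.append((addr, length))
    return merged


def map_mr_sg(entries):
    """Pretend MR mapping: maps every entry given and returns that count."""
    return len(entries)


def map_ok(fragments):
    """Check the mapped count against the DMA-mapped count, not the input."""
    mapped = dma_map_sg(fragments)
    return map_mr_sg(mapped) == len(mapped)
```

With 16 contiguous 4KB pages, the DMA step yields a single entry, so comparing the mapped count (1) with the original fragment count (16) reproduces the "1/16 elements" failure, while comparing it with the DMA-mapped count succeeds.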

Lustre-commit: 40385cda7afbd62faf7de2e956f0c7f4fa1a3fed
Lustre-change: https://review.whamcloud.com/37388

Test-Parameters: trivial
Cray-bug-id: LUS-8139
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: I3b845ae54d8659d4045921f519effcf0a4428e49
Reviewed-on: https://review.whamcloud.com/41303
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 years ago LU-14036 build: fix lbuild for MOFED 5.1 41/41041/2
Minh Diep [Wed, 14 Oct 2020 23:57:51 +0000 (16:57 -0700)]
LU-14036 build: fix lbuild for MOFED 5.1

Starting with MOFED 5.1, rdma-core is required for libib*mad.

Test-Parameters: trivial

Lustre-change: https://review.whamcloud.com/40254
Lustre-commit: 279721a9c7ea076bb8eab1f59ede53b09b9a5d07

Change-Id: Id26f3cdb0552933577e1b27384ac82f9f48e2b3a
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41041
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14121 nodemap: do not force fsuid/fsgid squashing 61/40961/4
Sebastien Buisson [Fri, 13 Nov 2020 10:36:14 +0000 (19:36 +0900)]
LU-14121 nodemap: do not force fsuid/fsgid squashing

In the current implementation, if the real uid is squashed, then the
fsuid is similarly squashed, no matter what the value of the
effective uid is.
This squashing is a little bit too strict, and we should instead trust
mapped fsuid and fsgid values.

Also add euid_access test program and sanity-sec test_55 to verify
the issue is fixed.

Lustre-change: https://review.whamcloud.com/40645
Lustre-commit: 355787745f21b22bb36210bb1c8e41fb34e7b665

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iecaecac5054b105cd42206b0a9a3868cde0269b4
Reviewed-on: https://review.whamcloud.com/40961
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13498 sec: fix credentials with nodemap and SSK 60/40960/4
Sebastien Buisson [Mon, 5 Oct 2020 12:14:09 +0000 (21:14 +0900)]
LU-13498 sec: fix credentials with nodemap and SSK

When SSK is enabled, credentials are evaluated in new_init_ucred().
In case a nodemap entry is defined with squash UID/GID, it must
prevail over normally mapped UID/GID.

Lustre-change: https://review.whamcloud.com/40140
Lustre-commit: 2bf6442d7d9bd452153e6b1ea08ddaae3dfb3716

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1adfd98759e5b98ec78f0477846e1820fed5d8b3
Reviewed-on: https://review.whamcloud.com/40960
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13783 o2iblnd: make FMR-pool support optional. 52/41152/4
Mr NeilBrown [Thu, 7 Jan 2021 00:19:28 +0000 (16:19 -0800)]
LU-13783 o2iblnd: make FMR-pool support optional.

Linux 5.8 removes the FMR-pool API.  This patch makes
all use of this API optional, selected only if the
support exists in the kernel.

Lustre-commit: 14b20ca66b2b6c5a735d39b753ec77fa6a574a6b
Lustre-change: https://review.whamcloud.com/40287

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4c40f3a766f5b46ae4f26d7d3ecf8434a6e5a0cb
Reviewed-on: https://review.whamcloud.com/41152
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago New release 2.12.6 2.12.6 v2_12_6
Oleg Drokin [Wed, 9 Dec 2020 20:27:21 +0000 (15:27 -0500)]
New release 2.12.6

Change-Id: Ic0d9d56f6af7fe5bff9a3f1b2c5fd7610343b7f6
Signed-off-by: Oleg Drokin <green@whamcloud.com>
3 years ago New RC 2.12.6-RC2 2.12.6-RC2 v2_12_6-RC2
Oleg Drokin [Mon, 7 Dec 2020 06:05:52 +0000 (01:05 -0500)]
New RC 2.12.6-RC2

Change-Id: Ida99071d8c4ece77fafea34d7f74a9aeb351b55d
Signed-off-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14172 lmv: optimize dir shard revalidate 75/40875/3
Lai Siyao [Thu, 3 Dec 2020 21:07:01 +0000 (05:07 +0800)]
LU-14172 lmv: optimize dir shard revalidate

mdt_is_remote_object() checks whether the child is a directory shard
if parent and child are on different MDTs, which requires reading the LMV
from disk and hurts striped directory stat performance.

This can be optimized: the client can just set the CROSS_REF flag to do a
cross reference getattr, which avoids lots of checks.

Lustre-change: https://review.whamcloud.com/40863/

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib2d5a510b27c90a26f979f9cccfd40948e32d91a
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40875
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
3 years ago LU-14140 osd: don't panic meeting OI dups 44/40744/5
Alex Zhuravlev [Tue, 24 Nov 2020 09:34:00 +0000 (12:34 +0300)]
LU-14140 osd: don't panic meeting OI dups

Instead, dump all info (FID->ino, LMAs) and return an error.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2774f945464748e0c03505c092ceb8520a613c53
Reviewed-on: https://review.whamcloud.com/40744
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-13514 tests: replace nid in conf-sanity test_32 37/40537/6
Yang Sheng [Wed, 4 Nov 2020 18:36:43 +0000 (02:36 +0800)]
LU-13514 tests: replace nid in conf-sanity test_32

Use replace_nid in test_32a. Otherwise the mdc cannot
be initialized, which causes the client mount to hang.

Test-Parameters: trivial
Test-Parameters: env=ONLY=32a,ONLY_REPEAT=20 fstype=ldiskfs testlist=conf-sanity
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I651f5728ad4ff96a309ed599490c9dd6ed9c5274
Reviewed-on: https://review.whamcloud.com/40537
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago New RC 2.12.6-RC1 2.12.6-RC1 v2_12_6-RC1
Oleg Drokin [Fri, 13 Nov 2020 23:08:12 +0000 (18:08 -0500)]
New RC 2.12.6-RC1

Change-Id: Ie881983730549b47c21668caf43f478fc92667a7
Signed-off-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13839 kernel: new kernel [RHEL 8.3 4.18.0-240.1.1.el8_3] 58/40558/3
Jian Yu [Sat, 7 Nov 2020 00:11:42 +0000 (16:11 -0800)]
LU-13839 kernel: new kernel [RHEL 8.3 4.18.0-240.1.1.el8_3]

This patch makes changes to support new RHEL 8.3 release
for Lustre client.

Test-Parameters: trivial clientdistro=el8.3

Change-Id: I06a46735b42ac258e576b1dd5c0beb17f4fd3e47
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40558
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14116 autoconf: check if DES3 enctype is supported 60/40560/2
Jian Yu [Fri, 6 Nov 2020 09:31:27 +0000 (01:31 -0800)]
LU-14116 autoconf: check if DES3 enctype is supported

krb5 releases 1.18 and later completely remove support for
all DES3 enctypes (des3-cbc-raw, des3-hmac-sha1, des3-cbc-sha1-kd).

This patch adds HAVE_DES3_SUPPORT to check if DES3 enctype
is supported.

Change-Id: Ibb51ec7961e8c775ea92dec6119f4de01e2d9b1d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40560
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
3 years ago LU-13519 osd-ldiskfs: expand inode project quota for upgrading 04/40404/10
Wang Shilong [Wed, 6 May 2020 04:45:25 +0000 (12:45 +0800)]
LU-13519 osd-ldiskfs: expand inode project quota for upgrading

When upgrading a filesystem, it is possible that the inode
is not big enough to hold the project ID field, and in that case
setting the project ID will return an EOVERFLOW error.

Since ldiskfs has the logic to expand inode size automatically,
we could add similar logic for project quota.

Considering this a rare case, we just call
ldiskfs_mark_inode_dirty(), which will try to expand, instead
of exporting more functions.

Lustre-change: https://review.whamcloud.com/38505
Lustre-commit: 57108489a3eb2ff6fc3994dbda0649ae445d6cb7

Change-Id: I941f33ce8f45d2015acc0a33c5b54cf3a771a452
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40404
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13969 tests: Updates to lustre-release yaml.sh 02/40402/2
Lee Ochoa [Mon, 26 Oct 2020 16:58:16 +0000 (10:58 -0600)]
LU-13969 tests: Updates to lustre-release yaml.sh

Updated output of the release() function to standardize the node.yml
file os_distribution parameter. Changes as follows:

RHEL   - use redhat-release first and os-release as backup
         as the latter may not include the full version
         (major/minor)
CENTOS - use centos-release first and os-release as backup,
         same as RHEL
SUSE   - use os-release instead of suse-release as the latter
         is deprecated
UBUNTU - use os-release

Removed parsing of system-release and *-release, as neither
option correctly outputs the desired info.

Removed "lustre_" references in node.yml file attributes;
the default in Maloo is to look for non-lustre prefixes
first.

Lustre-commit: f90199b104984da5f2157e39a286d433b725ed57
Lustre-change: https://review.whamcloud.com/39952

Change-Id: Ia011f944aae53f31fcd3a539e846ea5aba7ec7c4
Signed-off-by: Lee Ochoa <lochoa@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13687 llite: return -ENODATA if no default layout 99/40499/2
Andreas Dilger [Sat, 27 Jun 2020 11:14:02 +0000 (05:14 -0600)]
LU-13687 llite: return -ENODATA if no default layout

Don't return -ENOENT if fetching the default layout from the root
directory fails.  Otherwise, "lfs find" will print an error message
for every directory scanned in the filesystem:

     lfs find: /myth/tmp does not exist: No such file or directory

Lustre-change: https://review.whamcloud.com/39200
Lustre-commit: 7fb17eb7b7e6035931987ae1e9589639114d210e

Fixes: 3e8fa8a7396c ("LU-11656 llite: fetch default layout for a directory")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5e082c5d425c44ca7770d3b24cbb13bb7d2540e5
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40499
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-12662 tests: Add new pjdfstest into tests 53/38653/8
Wei Liu [Tue, 20 Aug 2019 18:59:36 +0000 (11:59 -0700)]
LU-12662 tests: Add new pjdfstest into tests

Create a new POSIX test suite based on pjdfstest.

This is a back port from
Lustre-change: https://review.whamcloud.com/35841
Lustre-commit: 414e613c2da55e6b8d2b3b20cbfb340cd84c9854

Test-Parameters: trivial
Test-Parameters: fstype=ldiskfs testlist=pjdfstest
Test-Parameters: fstype=zfs testlist=pjdfstest

Change-Id: Iec37e2248ce5ccf89319aaffb3ead9b407ad1931
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38653
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13949 build: add autogen.sh into distribution tarball 66/40466/2
Jian Yu [Thu, 29 Oct 2020 18:07:03 +0000 (11:07 -0700)]
LU-13949 build: add autogen.sh into distribution tarball

This patch adds autogen.sh and config/lustre-version.m4 into
Lustre distribution tarball so that customers can regenerate
aclocal.m4, config.h.in, autoMakefile.in and configure in
their build environments.

Change-Id: Ic6c5430b9a8b504ebc6a7618e141f1ea23b046a2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40466
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13514 tests: remove upgrade images for conf-sanity 92/40492/2
James Nunez [Fri, 19 Jun 2020 18:01:42 +0000 (12:01 -0600)]
LU-13514 tests: remove upgrade images for conf-sanity

conf-sanity test 32a is hanging at a high rate.  We need to
explore whether the issue is that old images are having problems
upgrading to the latest version of Lustre.

Test-Parameters: trivial
Test-Parameters: env=ONLY=32a,ONLY_REPEAT=20 fstype=ldiskfs testlist=conf-sanity
Test-Parameters: env=ONLY=32 fstype=zfs testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I0ff1e9e1304192b1008551b82133d95a0010c86a
Reviewed-on: https://review.whamcloud.com/39109
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40492
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13437 llite: pass name in getattr by FID 82/40482/2
Lai Siyao [Mon, 12 Oct 2020 14:22:07 +0000 (22:22 +0800)]
LU-13437 llite: pass name in getattr by FID

Now that the parent FID is packed in the getattr_by_FID request
(see https://review.whamcloud.com/39290), llite should also pass in
the name, so that lmv can replace fid1 with the stripe FID; otherwise
MDS may treat sub files under a striped directory as remote objects.

Note, the name is not packed in the request, because if it's packed,
MDS will getattr by name instead of FID.

Lustre-change: https://review.whamcloud.com/40219
Lustre-commit: 90ebab5833007defd91e86f5878f356ae5304a1b

Fixes: 5f2c44bf6 ("LU-13437 llite: pack parent FID in getattr")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If8215667bcb10ea3c4c5cd2c9034d81fd1cda3b5
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40482
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13437 mdc: remote object support getattr from cache 51/40451/2
Lai Siyao [Sat, 10 Oct 2020 14:34:19 +0000 (22:34 +0800)]
LU-13437 mdc: remote object support getattr from cache

For historical reasons, IT_GETATTR lock revalidate matches
LOOKUP|UPDATE|PERM lock bits because for MDS < 2.4, permission is
protected by the LOOKUP lock, but this prevents a remote object from
matching the cached lock because the LOOKUP and UPDATE locks are
fetched separately.
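
The mismatch can be illustrated with a small sketch. It assumes a
cached lock matches only when it covers every requested bit, and that
the PERM bit sits on the same lock as UPDATE; the bit values, names and
match_cached_lock() are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values and names only (assumptions, not Lustre's). */
#define BIT_LOOKUP 0x01u
#define BIT_UPDATE 0x02u
#define BIT_PERM   0x10u

/*
 * Return the index of the first cached lock whose bits cover all of the
 * 'wanted' bits, or -1 if no single cached lock covers them.
 */
int match_cached_lock(const uint32_t *cached, size_t count, uint32_t wanted)
{
    size_t i;

    assert(cached != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if ((cached[i] & wanted) == wanted)
            return (int)i;
    }
    return -1;
}
```

With two separately fetched locks, one holding only the LOOKUP bit and
one holding UPDATE|PERM, a request for LOOKUP|UPDATE|PERM finds no
match, while a request for UPDATE|PERM alone matches the second lock.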

Add sanity 803b, and rename 803 to 803a.

Lustre-change: https://review.whamcloud.com/40218
Lustre-commit: 72a1ca996e3a35ce3e4b7e517f77ff7ac83ccdd5

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3ac38fe34472736849307bb7f1eebb5de9343a5c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40451
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13692 ldlm: Ensure we reprocess the resource on ast error 12/40412/5
Oleg Drokin [Fri, 7 Aug 2020 07:38:51 +0000 (03:38 -0400)]
LU-13692 ldlm: Ensure we reprocess the resource on ast error

When we are trying to grant a lock and hit an AST error, rerunning
the policy is pointless since it cannot grant a potentially now eligible
lock and our lock is already in all the queues. Behave like all the other
handlers for ERESTART return and run a full resource reprocess instead.

Lustre-change: https://review.whamcloud.com/#/c/39598/
Lustre-commit: 24e3b5395bc61333a32b1e9725a0d7273925ef05

Change-Id: I3edb37bf084b2e26ba03cf2079d3358779c84b6e
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40412
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-11719 ldlm: Adjust search_* functions 99/40399/2
Patrick Farrell [Mon, 3 Dec 2018 16:36:08 +0000 (10:36 -0600)]
LU-11719 ldlm: Adjust search_* functions

The search_itree and search_queue functions should both
return either a pointer to a found lock or NULL.

Currently, search_itree just returns the contents of
data->lmd_lock, whether or not a lock was found.

search_queue will do the same under certain circumstances.

Zero lmd_lock in both search_* functions, and also stop
searching in search_itree once a lock is found.
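
A minimal sketch of the intended contract, assuming simplified
stand-in types: struct lock, struct match_data and
search_queue_sketch() are hypothetical and only borrow the lmd_lock
field name from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real ldlm structures. */
struct lock {
    int mode;
};

struct match_data {
    int          lmd_mode;  /* mode being searched for */
    struct lock *lmd_lock;  /* result slot: found lock or NULL */
};

/*
 * Return the first lock in 'queue' whose mode equals data->lmd_mode,
 * or NULL. The result slot is zeroed first, so a stale pointer left by
 * an earlier search is never returned, and the scan stops at the first
 * hit.
 */
struct lock *search_queue_sketch(struct lock *queue, size_t count,
                                 struct match_data *data)
{
    size_t i;

    assert(data != NULL);
    data->lmd_lock = NULL;
    for (i = 0; i < count; i++) {
        if (queue[i].mode == data->lmd_mode) {
            data->lmd_lock = &queue[i];
            break;
        }
    }
    return data->lmd_lock;
}
```

Callers can then rely on the return value alone: non-NULL means a lock
was found by this call, NULL means it was not.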

cray-bug-id: LUS-6783
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: Ie231166756e60c228370f8f1a019ccfe14dfda6a
Reviewed-on: https://review.whamcloud.com/33754
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40399
Tested-by: jenkins <devops@whamcloud.com>
3 years agoLU-12014 llite: check correct size in ll_dom_finish_open() 01/40301/2
Mikhail Pershin [Wed, 19 Dec 2018 19:28:53 +0000 (22:28 +0300)]
LU-12014 llite: check correct size in ll_dom_finish_open()

The check in ll_dom_finish_open() for data end shouldn't
use i_size for comparison because it may not yet be updated
with the data just returned from the server. Use the size value
in mdt_body from the reply for that check.

Lustre-change: https://review.whamcloud.com/33895
Lustre-commit: 7b9fd576f7de7d4bfa40c85d06bb224e7a29c829

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I1104fbbb0eb4633869b9bf2d1803ac3e84e3853d
Reviewed-on: https://review.whamcloud.com/40301
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12296 llite: improve ll_dom_lock_cancel 96/40296/3
Vladimir Saveliev [Wed, 5 Jun 2019 01:46:42 +0000 (04:46 +0300)]
LU-12296 llite: improve ll_dom_lock_cancel

ll_dom_lock_cancel() should zero kms attribute similar to
mdc_ldlm_blocking_ast0().

In order to avoid code duplication between mdc_ldlm_blocking_ast0()
and ll_dom_lock_cancel() - add new cl_object_operations method -
coo_object_flush() to reach mdc's blocking ast from llite level.

Tests illustrating the issue are added.

Lustre-change: https://review.whamcloud.com/34858
Lustre-commit: 707bab62f5d6c704b30e4ee9e769b5c9f026e1e7

LU-12704 lov: check all entries in lov_flush_composite

Check all layout entries for DOM layout and exit with
-ENODATA if none exists. The caller considers that a valid
case due to layout change.

Define llo_flush methods for all layouts as required
by lov_dispatch().

Patch also cleans up the cl_dom_size field in cl_layout, which
was used in the previous ll_dom_lock_cancel() implementation.

Run lov_flush_composite under down_read lov->lo_type_guard to avoid
race with layout change.

Lustre-change: https://review.whamcloud.com/36368
Lustre-commit: 44460570fd21a91002190c8a0620923125135b52

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I2b100ead6d420dbf561bc61be973d64dad317214
Reviewed-on: https://review.whamcloud.com/40296
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14069 ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait 11/40411/3
Oleg Drokin [Fri, 23 Oct 2020 06:56:04 +0000 (02:56 -0400)]
LU-14069 ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait

In ldlm_handle_cp_callback() the while loop is clearly supposed
to be limited by the "to" value of 1 second, but is not.
This seems to have been broken by all the Solaris porting in HEAD
all the way back in 2008.
Restore the "to" assignment to make it not hang indefinitely.

Lustre-change: https://review.whamcloud.com/#/c/40375/
Lustre-commit: 5da99051e58b9e9079b66a275d6c47e1e109eee5

Change-Id: I449bfd7f585ab7db475fb3fd4cbbd876126ff789
Fixes: adde80ffef ("Land b_head_libcfs onto HEAD")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40411
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13719 lov: doesn't check lov_refcount 52/40452/2
Hongchao Zhang [Fri, 21 Aug 2020 10:17:12 +0000 (18:17 +0800)]
LU-13719 lov: doesn't check lov_refcount

In lov_cleanup, the check of each OSC is protected by
lov_tgt_getrefs, which will increment the "lov_refcount",
so the "lov_refcount" shouldn't be checked inside because
it is always larger than 0.

Change-Id: I21423d4345190b3e02eb00734c127e35cbc9b1af
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39702
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40452

3 years agoLU-13636 osd: create agent inode with explicit owner 03/40403/2
Alex Zhuravlev [Fri, 5 Jun 2020 05:16:32 +0000 (08:16 +0300)]
LU-13636 osd: create agent inode with explicit owner

to avoid quota misaccounting.

Lustre-change: https://review.whamcloud.com/38842
Lustre-commit: 7805b45f1182ed21198c0cd2000ffe93b7de5340

Test-Parameters: fstype=ldiskfs
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5a02e6e7de71821a10704ac3516ee087998c9c21
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40403
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13919 kernel: kernel update RHEL7.8 [3.10.0-1127.19.1.el7] 93/39993/5
Jian Yu [Mon, 26 Oct 2020 18:22:54 +0000 (11:22 -0700)]
LU-13919 kernel: kernel update RHEL7.8 [3.10.0-1127.19.1.el7]

Update RHEL7.8 kernel to 3.10.0-1127.19.1.el7.

Test-Parameters: trivial clientdistro=el7.8 serverdistro=el7.8

Change-Id: I7d0cbdb32b33f2f8121fec707924c35fa086f965
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13477 lnet: Force full discovery cycle 77/39577/9
Amir Shehata [Wed, 5 Aug 2020 19:34:10 +0000 (12:34 -0700)]
LU-13477 lnet: Force full discovery cycle

There are scenarios where there could be a discrepancy between
cached peer information and reality. In these cases, incomplete
interface information might end up being cached because one side
determined that the peer didn't require a PUSH. This will lead to
undesired MR behavior, where not all the interfaces are used for
a period of time.

Therefore, it is safer to always force a full discovery cycle:
GET/PUSH to ensure both sides are up-to-date.

In the NMR case, when discovery is turned off, make sure to flag
discovery as complete to avoid stalling the state machine.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ie49ad11e8ff874206baa268a4ef2d58ebb536ed5
Lustre-change: https://review.whamcloud.com/38322
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39577
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-10756 ptlrpc: fix IMP_CLOSED state is being never set 21/38621/5
Mikhail Pershin [Mon, 3 Feb 2020 09:03:59 +0000 (12:03 +0300)]
LU-10756 ptlrpc: fix IMP_CLOSED state is being never set

Commit cf78502e48d checks the new state for the IMP_CLOSED value
instead of the import's current state, so instead of keeping the
import closed it prevents the import state from being set to
IMP_CLOSED.

Patch restores the original check to keep the import closed by
checking its current state.

Fixes: cf78502e48d ("LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7df2798f09ce7023381c03957adf530da4149c2d
Reviewed-on: https://review.whamcloud.com/37405
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
(cherry picked from commit 43dddbd0785d4da14714390d802bf6ec65567350)
Reviewed-on: https://review.whamcloud.com/38621

3 years agoLU-13464 target: abort recovery if timer fail 03/40303/2
Hongchao Zhang [Mon, 19 Oct 2020 18:52:56 +0000 (11:52 -0700)]
LU-13464 target: abort recovery if timer fail

During target recovery, the recovery timer should be kept armed
to ensure the recovery doesn't take too long. If the deadline of
the recovery timer has passed and the recovery is not yet completed,
something has gone wrong, and the recovery should be aborted in
this case.

Lustre-commit: 87443d9c27e8535c3e17d6bf142ad68d4449b93f
Lustre-change: https://review.whamcloud.com/38277

Change-Id: Id44f2a2d1a3183ad8dd13f4d34392713c55a2cb3
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40303
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14012 lod: properly initialize lcm in lod_layout_convert() 06/40306/2
John L. Hammond [Tue, 20 Oct 2020 00:40:18 +0000 (17:40 -0700)]
LU-14012 lod: properly initialize lcm in lod_layout_convert()

In lod_layout_convert() zero out lcm and lcme before constructing the
converted layout.

Lustre-commit: 6f2a1c911f0a326765e6d11f35bb602daf057948
Lustre-change: https://review.whamcloud.com/40153

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I40f96d51cb63816a9bfc34217f02ff7c450de974
Reviewed-on: https://review.whamcloud.com/40306
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13511 obdclass: don't initialize obj for zero FID 04/40304/2
Lai Siyao [Mon, 19 Oct 2020 19:09:45 +0000 (12:09 -0700)]
LU-13511 obdclass: don't initialize obj for zero FID

An object with a zero FID is used in stripe allocation, and it's
meaningless to initialize such an object via lu_object_find_at().
Return an error early to avoid an assertion in lu_object_put().

Lustre-commit: 22ea9767956c89aa08ef6d80ad04aaccde647755
Lustre-change: https://review.whamcloud.com/39792

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia1bda3d01ff7552e94f31a9c928868652937d559
Reviewed-on: https://review.whamcloud.com/40304
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12233 lnet: deadlock on LNet shutdown 71/40171/2
Serguei Smirnov [Wed, 7 Oct 2020 22:13:31 +0000 (18:13 -0400)]
LU-12233 lnet: deadlock on LNet shutdown

Release ln_api_mutex during LNet shutdown while waiting
for zombie LNI to allow other threads to read the LNet
state updated by the shutdown and fall through, avoiding
the deadlock.

Lustre-change: https://review.whamcloud.com/39933
Lustre-commit: e0c445648a38fb72cc426ac0c16c33f5183cda08

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: If0886f1bc4412dd9cacb08a0f06fa69aeeed1c5b
Reviewed-on: https://review.whamcloud.com/40171
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13892 lnet: lock-up during router check 72/40172/2
Serguei Smirnov [Wed, 7 Oct 2020 22:51:06 +0000 (18:51 -0400)]
LU-13892 lnet: lock-up during router check

This is a fix for the issue with LNet lock-up while waiting
for routers to become active with check_routers_before_use
option. Release ln_api_mutex while waiting to allow
incoming connections to be handled.

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I63b1d1ce5ee2b27a3bd2cea78713fc6fc7502cf7
Reviewed-on: https://review.whamcloud.com/40172
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-10949 mdt: lost reference on mdt_md_root 76/39976/5
Andriy Skulysh [Wed, 20 Feb 2019 10:48:03 +0000 (12:48 +0200)]
LU-10949 mdt: lost reference on mdt_md_root

mdt_remote_object_lock_try() drops the object
reference in case of an error, but if the
request was sent to a server it is decreased
again via failed_lock_cleanup().

Add ldlm_created_callback. It is called after
lock creation, so we can safely add a reference
to l_ast_data and drop it only in BL AST handler.

Lustre-commit: b2368774a01eb89981e2ceb92be9673e4b403d62
Lustre-change: https://review.whamcloud.com/34181

Cray-bug-id: LUS-7013
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Change-Id: I49c946278f379390634642370d15c7fe89441d86
Reviewed-on: https://review.whamcloud.com/39976
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-11276 ldlm: fix lock convert races 54/39854/2
Vitaly Fertman [Wed, 16 Oct 2019 16:07:56 +0000 (19:07 +0300)]
LU-11276 ldlm: fix lock convert races

The blocking cb may be triggered in parallel and the convert logic
of the DOM lock must be prepared for the cancel_bits to have already
been zeroed by the first executor.

As there may be several blocking cb parallel executors and several
conversion callers, each requesting for different inode bits, setup
the following logic:
- the lock keeps the aggregated set of bits requested for cancelling
  by different parties, where 0 means the whole lock is to be
  cancelled, and where the CBPENDING flag means there is a canceling
  job pending;
- once completed, the cancel_bits are zeroed and the CBPENDING flag
  is dropped, meaning the next request will be a part of the next job;
- once a local lock is converted, its state is changed appropriately
  and no cleanup is left for the interpret time as the lock is ready
  for the next usage;
- as the lock is unlocked in a process of conversion and more bits
  may appear, check it and repeat appropriately;
- let just 1 conversion executor work at a time; others wait,
  similar to ldlm_cli_cancel();
- there are others who may want to cancel unused locks (cancel_lru,
  cancel_resource_local), consider CANCELING as a request to cancel
  the full lock independently of the cancel_bits;

Some cleanups are done:
- move the cache drop logic to the CANCELING part of the blocking cb
  from the BLOCKING one;
- remove the convert RPC interpret, as the lock cleanups are already
  done in advance; the convert RPC is re-sendable and an error means
  there is a serious net problem;

Test-Parameters: testlist=racer,racer,racer
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I901de34241704ed801152f071cb7f610fe6f4bfe
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39854
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13590 kernel: RHEL 7.9 server support 24/40224/2
Jian Yu [Mon, 12 Oct 2020 23:58:27 +0000 (16:58 -0700)]
LU-13590 kernel: RHEL 7.9 server support

This patch makes changes to support new RHEL 7.9 release
for Lustre server (kernel 3.10.0-1160.2.1.el7).

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I7653091f2bd6a579447edb12045984d2829a8235
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13922 osd-ldiskfs: no need to add OI cache in readdir 35/40135/3
Lai Siyao [Sat, 29 Aug 2020 21:53:18 +0000 (05:53 +0800)]
LU-13922 osd-ldiskfs: no need to add OI cache in readdir

It's a waste of time to call osd_add_oi_cache() in osd_it_ea_rec(),
because each dirent read will override it.

Lustre-change: https://review.whamcloud.com/39782
Lustre-commit: bc5934632df10aaa02b32b8254a473c14c6f8104

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Iec701bf66153fdf2ba7a3f3b89565381215abf33
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40135
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12870 build: sanity-hsm test depends on libtool 22/38822/4
Minh Diep [Thu, 17 Oct 2019 14:11:09 +0000 (07:11 -0700)]
LU-12870 build: sanity-hsm test depends on libtool

Adding Ubuntu libtool-bin requirement

Lustre-change: https://review.whamcloud.com/36471
Lustre-commit: dbce727a3633ce03d24c28defce9a0ed6d1ef106

Test-Parameters: trivial clientdistro=ubuntu1804 testlist=sanity-hsm

Change-Id: I04cfffc880259e4cf1c2cba142eddd47a95a736e
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38822
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
3 years agoLU-12352 libcfs: crashes with certain cpu part numbers 94/37994/4
Andrew Perepechko [Thu, 17 Jan 2019 21:58:10 +0000 (00:58 +0300)]
LU-12352 libcfs: crashes with certain cpu part numbers

Due to a bug in the code, libcfs will crash if the
number of online cpus is not evenly divisible by the
number of cpu partitions.

Based on the checks in cfs_cpt_table_create(), it
appears that the original intent was to push the
remaining cpus into the initial partitions.

So let's do that properly.
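
The arithmetic behind "push the remaining cpus into the initial
partitions" can be sketched like this. cpus_in_partition() is a
hypothetical helper and does not reproduce cfs_cpt_table_create().

```c
#include <assert.h>

/*
 * Number of CPUs given to partition 'cpt' when 'ncpu' CPUs are spread
 * over 'ncpt' partitions: every partition gets ncpu / ncpt, and the
 * first ncpu % ncpt partitions get one extra.
 */
int cpus_in_partition(int ncpu, int ncpt, int cpt)
{
    assert(ncpu >= 0 && ncpt > 0 && cpt >= 0 && cpt < ncpt);
    return ncpu / ncpt + (cpt < ncpu % ncpt ? 1 : 0);
}
```

For example, 10 CPUs over 4 partitions gives 3, 3, 2 and 2, which adds
up to 10 with no CPU left unassigned.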

Lustre-commit: e33e3da58972a811e6eafc479f95f6df2baf4b9b
Lustre-change: https://review.whamcloud.com/34991

Change-Id: I3c5e2aa1fdfca4c07e7afce143c984973373f009
Cray-bug-id: LUS-6455
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/37994
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13960 tests: correct usage of _var variable 85/39985/2
James Nunez [Sat, 12 Sep 2020 18:04:02 +0000 (12:04 -0600)]
LU-13960 tests: correct usage of _var variable

In the setmodopts() function in functions.sh, the '_var'
variable is set and used.  There is one use of the variable
'var' which should be '_var'.  Change the use of 'var' to
'_var'.

Reviewed-on: https://review.whamcloud.com/39891
(cherry picked from commit ff29ed8fe9c58bd2caa4d63bcbe7556e1c320703)

Test-Parameters: trivial
Test-Parameters: testlist=conf-sanity env=ONLY=53 clientdistro=ubuntu1804 fstype=ldiskfs
Test-Parameters: testlist=conf-sanity env=ONLY=53 clientdistro=el7.8 fstype=ldiskfs
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: If524be1f0b4b2170a514a558256a5308c9a5e586
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39985
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13590 kernel: new kernel [RHEL 7.9 3.10.0-1160.2.1.el7] 77/40177/2
Jian Yu [Thu, 8 Oct 2020 18:13:45 +0000 (11:13 -0700)]
LU-13590 kernel: new kernel [RHEL 7.9 3.10.0-1160.2.1.el7]

This patch makes changes to support new RHEL 7.9 release
for Lustre client.

Test-Parameters: trivial clientdistro=el7.9

Change-Id: I7a2846de48a6710d6d720d6ccc3176dba4afc6bb
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40177
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-12820 osc: remove 'transient' arg from osc_enter_cache_try 18/39518/7
Mr NeilBrown [Sun, 29 Sep 2019 23:09:54 +0000 (09:09 +1000)]
LU-12820 osc: remove 'transient' arg from osc_enter_cache_try

This arg is always '0', so remove it.
Consequently, OBD_BRW_NOCACHE is never set, and
cl_dirty_transit and obd_dirty_transit_pages
are never non-zero, so they can be removed as well.

Lustre-change: https://review.whamcloud.com/36319
Lustre-commit: 524deb6f985beb512a4499501fd7275ecb77f815

Patch also includes changes for atomic ops optimization
to keep in sync with master branch:

Lustre-change: https://review.whamcloud.com/33859
Lustre-commit: 8b364fbd6bd9e0088440e6d6837861a641b923a0

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia047affc33fb9277e6c28a8f6d7d088c385b51a8
Reviewed-on: https://review.whamcloud.com/39518
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13608 tgt: abort recovery while reading update llog 84/39284/6
Hongchao Zhang [Tue, 30 Jun 2020 11:22:10 +0000 (19:22 +0800)]
LU-13608 tgt: abort recovery while reading update llog

Abort reading the update LLOG from other MDTs when the recovery
is aborted, so that the recovery process can be aborted in time.

This patch also adds a watchdog for the processing of replay requests
to detect a possible stale process.

Lustre-change: https://review.whamcloud.com/38746
Lustre-commit: 0496cdf20451f07befebd1cb8a770544ec0f57df

Change-Id: Ie2de041360c9eba95ef9bfd14b00ac2709e6eace
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38746
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39284
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13437 llite: pack parent FID in getattr 71/39771/2
Lai Siyao [Mon, 6 Jul 2020 13:52:45 +0000 (21:52 +0800)]
LU-13437 llite: pack parent FID in getattr

Pack parent FID in getattr request if OBD_CONNECT2_GETATTR_PFID is
enabled, otherwise fill it with target FID for backward compatibility.

Lustre-change: https://review.whamcloud.com/39290
Lustre-commit: 5f2c44bf626b178503c1c4d2d85c40bae087ff4f

Fixes: f9a2da63 ("LU-13437 mdt: don't fetch LOOKUP lock for remot...")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Change-Id: I91bace23e67b548feb92fd885fb5e64e92c96408
Reviewed-on: https://review.whamcloud.com/39771
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13437 uapi: add OBD_CONNECT2_GETATTR_PFID 70/39770/2
Lai Siyao [Mon, 6 Jul 2020 13:03:59 +0000 (21:03 +0800)]
LU-13437 uapi: add OBD_CONNECT2_GETATTR_PFID

Add OBD_CONNECT2_GETATTR_PFID connect flag to pack the parent FID in
the getattr request. It will be used to check whether the target is
a remote object; if so, the LOOKUP lock is not taken, because
otherwise the client may see stale directory entries.

Lustre-change: https://review.whamcloud.com/39289
Lustre-commit: f384a8733c41e43ebc2db3c542287a700ace8cbb
Test-parameters: trivial

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Change-Id: Ibdf880934456f255f83cd4bac9d61ab5e1ed7330
Reviewed-on: https://review.whamcloud.com/39770
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13437 mdt: rename misses remote LOOKUP lock revoke 01/39601/3
Lai Siyao [Wed, 8 Apr 2020 14:55:22 +0000 (22:55 +0800)]
LU-13437 mdt: rename misses remote LOOKUP lock revoke

In rename, all objects but the target may be remote, so to check
whether the source is a remote object on the source parent, we need
to compare the MDTs on which they are located if both are remote. Add
a helper function mdt_rename_source_lock() to handle all possible
combinations. If the target parent is remote, take the remote LOOKUP
lock for the target on the MDT where the target parent is located.

Add sanityn.sh 81c.

Lustre-change: https://review.whamcloud.com/38181
Lustre-commit: 4918fe40db262b19093436caca688c75eb632496

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2c134970d6abc8761528d01950b23495292cdf93
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39601
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13437 mdt: don't fetch LOOKUP lock for remote object 69/39769/2
Lai Siyao [Sun, 10 May 2020 07:22:36 +0000 (15:22 +0800)]
LU-13437 mdt: don't fetch LOOKUP lock for remote object

Pack parent FID in getattr by FID, which will be used to check whether
the child is a remote object on the parent. The helper function is
called mdt_is_remote_object(). NB, a directory shard is not treated as
a remote object, because if it were, the client would need to
revalidate shards whenever the dir is accessed, which would hurt
performance considerably.

For getattr by FID, if the object is a remote file on the parent, don't
fetch the LOOKUP lock, otherwise the client may see stale dir entries.

Lustre-change: https://review.whamcloud.com/38561
Lustre-commit: f9a2da63abab5b8b687842166a0b5b5e434ad441

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Change-Id: I37b36983735eca63da37f190456b5cc1b861b29e
Reviewed-on: https://review.whamcloud.com/39769
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13437 lmv: check stripe FID sanity 00/39600/2
Lai Siyao [Fri, 8 May 2020 14:53:47 +0000 (22:53 +0800)]
LU-13437 lmv: check stripe FID sanity

A striped directory layout may be broken; if any stripe FID is insane,
return -ENODEV.

Lustre-change: https://review.whamcloud.com/38560
Lustre-commit: 698a496aac51e11791717a9cbd0a86b3525f4557

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I7ed8c7c561e34625e2cb29bfd14bc0ecf3fce46c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39600
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13471 lnet: use the same src nid for discovery 76/39576/4
Amir Shehata [Thu, 23 Apr 2020 00:06:23 +0000 (17:06 -0700)]
LU-13471 lnet: use the same src nid for discovery

When discovering a remote peer (not on the same network) a GET is
sent to the peer to retrieve the peer's interfaces.  This is followed
by a PUSH, if discovery is on, to push the node's interfaces. However,
if both node and peer have multiple interfaces it is likely that the
GET and the PUSH will originate on different interfaces. When the
peer receives the PUSH it will not be able to connect the two NIDs
and will not be able to consolidate the node's NIDs.  This issue is
specific for remote peers because at the time the push handler is
invoked the remote lpni has not been created yet. lnet_parse()
creates the lpni of the gateway.

Similar to the strategy already in place of using the same source NID
for all the messages of an RPC, discovery should use the same source
NID for both the GET and PUSH.

This patch stores the source NID the GET was sent on and
uses it for the PUSH.
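
The idea can be sketched roughly as below. Everything here is hypothetical
(struct disc_peer, its disc_src_nid field, the helper names and
SKETCH_NID_ANY); it only illustrates remembering the source NID of the GET
and reusing it for the PUSH, not how LNet actually stores it.

```c
#include <stdint.h>

typedef uint64_t nid_t;

/* Hypothetical "no source chosen yet" marker. */
#define SKETCH_NID_ANY ((nid_t)-1)

/* Hypothetical per-peer discovery state. */
struct disc_peer {
	nid_t disc_src_nid;	/* source NID used for the discovery GET */
};

static void disc_peer_init(struct disc_peer *p)
{
	p->disc_src_nid = SKETCH_NID_ANY;
}

/* Record which local NID the GET went out on. */
static void disc_record_get_src(struct disc_peer *p, nid_t src)
{
	p->disc_src_nid = src;
}

/* Pick the source for the PUSH: reuse the GET's NID when one is recorded. */
static nid_t disc_push_src(const struct disc_peer *p, nid_t default_src)
{
	return p->disc_src_nid != SKETCH_NID_ANY ? p->disc_src_nid : default_src;
}
```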

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I5a13ab7799b2ddc47714202bcbed786b0d3940b7
Reviewed-on: https://review.whamcloud.com/38320
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39576

3 years agoLU-13907 llite: don't set FS_REQUIRES_DEV on client 74/39674/4
Andreas Dilger [Thu, 13 Aug 2020 22:18:52 +0000 (16:18 -0600)]
LU-13907 llite: don't set FS_REQUIRES_DEV on client

If doing a client-only build, do not set the FS_REQUIRES_DEV flag
for the 'lustre' filesystem type.  This is only needed on the server,
but the filesystem type declaration is shared between both.

In master, this was fixed by declaring a new 'lustre_tgt' filesystem
type and using that for server filesystem mounts.  However, for 2.12
this is overkill, and it is possible to get a 95% fix by dropping
the FS_REQUIRES_DEV flag for the common case of client-only builds.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Change-Id: Iab2e78515aba018e2a6bceb324ad1b8a313ebbe5
Reviewed-on: https://review.whamcloud.com/39674
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13187 osd-ldiskfs: don't enforce max dir size limit on IAM objects 82/39882/3
Li Dongyang [Thu, 3 Sep 2020 23:34:34 +0000 (09:34 +1000)]
LU-13187 osd-ldiskfs: don't enforce max dir size limit on IAM objects

Add ext4-no-max-dir-size-limit-for-iam-objects.patch to introduce new
inode state EXT4_STATE_IAM and use it to mark IAM objects.

Lustre-change: https://review.whamcloud.com/39823
Lustre-commit: 03e6db505be90d35ccacb3af7e15277784e5d448

Change-Id: I3bcc5435ea07edb9fa265dcd8e3261d849495f00
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13763 osc: don't allow negative grants 80/39380/4
Mikhail Pershin [Wed, 15 Jul 2020 05:42:49 +0000 (08:42 +0300)]
LU-13763 osc: don't allow negative grants

Add a check in osc_init_grant() to prevent a possible
underflow of cl_avail_grant, and report an error if it happens.
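
A guard of this kind can be sketched as below. This is an assumption-laden
illustration, not the body of osc_init_grant(): the helper name, its
parameters and the choice of -ERANGE are hypothetical. It shows why an
unsigned subtraction needs the check before it is performed.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Compute the available grant as granted - consumed.
 * Both values are unsigned, so subtracting a larger "consumed" would wrap
 * around to a huge number; detect that first and report it instead.
 */
static int init_avail_grant(uint64_t granted, uint64_t consumed,
			    uint64_t *avail)
{
	if (consumed > granted) {
		*avail = 0;		/* never expose a wrapped value */
		return -ERANGE;		/* hypothetical error code */
	}
	*avail = granted - consumed;
	return 0;
}
```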

Lustre-change: https://review.whamcloud.com/#/c/39827
Lustre-commit: e05ccafd6ee214895d01efbb13a3757e3625a859

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Idcd25ed427c23735e1cdc70359bace43b5b9d886
Reviewed-on: https://review.whamcloud.com/39380
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12687 osc: consume grants for direct I/O 86/39386/11
Vladimir Saveliev [Mon, 29 Jun 2020 11:26:57 +0000 (14:26 +0300)]
LU-12687 osc: consume grants for direct I/O

The new IO engine implementation stopped consuming grants for direct
I/O writes. That led to a premature out-of-space condition during
direct I/O. The following illustrates the problem:
  # OSTSIZE=100000 sh llmount.sh
  # dd if=/dev/zero of=/mnt/lustre/file bs=4k count=100 oflag=direct
  dd: error writing ‘/mnt/lustre/file’: No space left on device

Consume grants for direct I/O.

Try to consume grants in osc_queue_sync_pages() when it is called for
pages which are being written in direct i/o.

Tests are added to verify grant consumption in buffered and direct i/o
and to verify direct i/o overwrite when ost is full.
The overwrite test is for ldiskfs only as zfs is unable to overwrite
when it is full.

Lustre-change: https://review.whamcloud.com/35896
Lustre-commit: 05f326a7988a7a0d6954d1b0d318315526209ae6

Fixes: 9fe4b52ad2 ("LU-1030 osc: new IO engine implementation")
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I9a199452c564e8e8ad02f79231e8481166f3666e
Cray-bug-id: LUS-7036
Reviewed-on: https://review.whamcloud.com/39386
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13761 o2ib: Fix compilation with MOFED 5.1 81/39781/2
Sergey Gorenko [Tue, 1 Sep 2020 06:53:06 +0000 (23:53 -0700)]
LU-13761 o2ib: Fix compilation with MOFED 5.1

A new argument was added to rdma_reject() in MOFED 5.1 and
Linux 5.8.

Add a configure check and support both versions of rdma_reject().

Lustre-commit: 956deb0fe8195c7a0c38c66a5a8cc1e95c2c245e
Lustre-change: https://review.whamcloud.com/39323

Test-Parameters: trivial
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Change-Id: I2b28991f335658b651b21a09899b7b17ab2a9d57
Reviewed-on: https://review.whamcloud.com/39781
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13742 llite: do not bypass selinux xattr handling 71/39671/3
Shaun Tancheff [Wed, 5 Aug 2020 14:17:03 +0000 (09:17 -0500)]
LU-13742 llite: do not bypass selinux xattr handling

Without the hint from selinux_is_enabled() to determine whether selinux
is running at boot, the performance fix from LU-549 that skips handling
of selinux xattrs cannot be applied correctly.

The correct path is to act as if selinux is enabled.

This fixes a bug introduced by LU-12355 that now exists in
RHEL 8.2 kernels where clients have enabled selinux.

Lustre-change: https://review.whamcloud.com/39569
Lustre-commit: 994287bd47819ebd8badb716da4232cdff97d324

Fixes: 39e5bfa734 ("LU-12355 llite: include file linux/selinux.h removed")
Test-Parameters: clientdistro=el8.2 serverdistro=el8.2 clientselinux testlist=sanity-selinux
Test-Parameters: clientdistro=el8.1 serverdistro=el8.1 clientselinux testlist=sanity-selinux
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I6fb5ed9ecdb79545225b5586b90509eb157a355b
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39671
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13580 tests: fix retrieval of SELinux context 13/39713/2
Sebastien Buisson [Mon, 18 May 2020 09:43:22 +0000 (11:43 +0200)]
LU-13580 tests: fix retrieval of SELinux context

Use 'stat' command instead of 'ls -lZ' to retrieve SELinux security
context, to make it more portable.

Lustre-change: https://review.whamcloud.com/38648
Lustre-commit: ca09fda138b6d72588f40e4cf79c5f2de832d2dd

Test-Parameters: trivial clientselinux testlist=sanity-selinux mdtcount=2 clientcount=2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I61bc0efb1e8ae0427d05827e2933eb0b848fb442
Reviewed-on: https://review.whamcloud.com/39713
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13278 lnet: Reconcile discovery push and reply handling 75/39575/2
Chris Horn [Mon, 10 Feb 2020 20:11:49 +0000 (14:11 -0600)]
LU-13278 lnet: Reconcile discovery push and reply handling

Reconcile the logic for updating the multi-rail flag of a peer when
processing a discovery PUSH with the logic used when processing a
discovery REPLY.

Cray-bug-id: LUS-8516
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Idfb4c3729822d03b71f9440ac66176ae6b886022
Reviewed-on: https://review.whamcloud.com/37674
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Stephen Champion <stephen.champion@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39575
Reviewed-by: Chris Horn <chris.horn@hpe.com>
3 years agoLU-13818 build: use libsnmp-dev instead of libsnmp30 79/39679/2
Minh Diep [Fri, 24 Jul 2020 17:38:04 +0000 (10:38 -0700)]
LU-13818 build: use libsnmp-dev instead of libsnmp30

Installing libsnmp-dev will pull in the correct libsnmpXX.
By depending on libsnmp-dev we can install on
Ubuntu 20.04, which provides libsnmp35.

Lustre-change: https://review.whamcloud.com/39506
Lustre-commit: af2f77633bf7b12d6ca1ab606ff90cf1ee58107a

Change-Id: Ib921ac35e06149ba88fa8e39b9a0980deb94acf2
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39679
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13599 mdt: fix mti_big_lmm buffer usage 21/39521/2
Mikhail Pershin [Tue, 28 Jul 2020 11:33:18 +0000 (14:33 +0300)]
LU-13599 mdt: fix mti_big_lmm buffer usage

The mti_big_lmm buffer can be used just as a temporary buffer
in some cases. The mti_big_lmm_used flag should be dropped after
that to avoid an assertion in mdt_big_attr_get().

This fix is extracted from the bigger LU-11025 patch in the
master branch.

Lustre-change: https://review.whamcloud.com/37284
Lustre-commit: a336d7c7c1cd62a5a5213835aa85b8eaa87b076a

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3718d6c413ef1d5f8242e548868602ef6476006e
Reviewed-on: https://review.whamcloud.com/39521
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-9971 lnet: use after free in lnet_discover_peer_locked() 91/38891/6
Olaf Weber [Tue, 12 Sep 2017 12:07:50 +0000 (14:07 +0200)]
LU-9971 lnet: use after free in lnet_discover_peer_locked()

When the lnet_net_lock is unlocked, the peer attached to an
lnet_peer_ni (found via lnet_peer_ni::lpni_peer_net->lpn_peer)
can change, and the old peer deallocated. If we are really
unlucky, then all the churn could give us a new, different,
peer at the same address in memory.

Change the reference counting on the lnet_peer lp so that it
is guaranteed to be alive when we relock the lnet_net_lock for
the cpt. When the reference count is dropped lp may go away if
it was unlinked, but the new peer is guaranteed to have a
different address, so we can still correctly determine whether
the peer changed and discovery should be redone.

LU-9971 lnet: fix peer ref counting

Exit from the loop after peer ref count has been incremented
to avoid wrong ref count.

The code makes sure that a peer is queued for discovery at most
once if discovery is disabled. This is done to use discovery
as a standard ping for gateways which do not have the discovery
feature or have discovery disabled.

Signed-off-by: Olaf Weber <olaf.weber@hpe.com>
Change-Id: Ia44dce20074b27ec0e77d7c1908c6a44ec73d326
Reviewed-on: https://review.whamcloud.com/28944
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38891
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 years agoLU-13609 llog: list all the log files correctly on MGS/MDT 30/39330/4
Emoly Liu [Fri, 10 Jul 2020 05:05:00 +0000 (13:05 +0800)]
LU-13609 llog: list all the log files correctly on MGS/MDT

"lctl --device xxx llog_catlist" should list all the config log on
MGS and catalog on MDT correctly without any buffer size limit.
If the data can't be fetched in one pass, data->ioc_count is used to
save the number of logs fetched so far and then continue.

conf-sanity.sh test_123af is added to verify this patch. And the
minor style issue in LU-13757 is fixed as well.

Lustre-change: https://review.whamcloud.com/38917
Lustre-commit: 1d97a8b4cd3de9074f323332c7b736367a70d419

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I364d563446833751b1f017fa2bef0351dab56235
Reviewed-on: https://review.whamcloud.com/39330
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13667 ptlrpc: fix endless loop issue 44/39344/2
Hongchao Zhang [Fri, 19 Jun 2020 02:53:12 +0000 (10:53 +0800)]
LU-13667 ptlrpc: fix endless loop issue

In ptlrpc_pinger_main, if pinging the recoverable clients or
obd_update_maxusage takes too long, the thread could be stuck
in an endless loop because of the negative value returned
by pinger_check_timeout.
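
One way to guard against such a negative value is sketched below. This is
only an assumption about the general shape of a fix, not the actual change
in ptlrpc_pinger_main; both helper names and the one-second floor are
hypothetical.

```c
#include <stdint.h>

/* Seconds left until the next ping is due; negative when already overdue. */
static int64_t time_until_next_ping(int64_t last_ping, int64_t interval,
				    int64_t now)
{
	return last_ping + interval - now;
}

/*
 * Sleep time for the pinger loop. When the previous pass overran the
 * interval, the raw value is <= 0; clamp it so the loop always waits
 * (and can notice a stop request) instead of spinning.
 */
static int64_t pinger_sleep_time(int64_t last_ping, int64_t interval,
				 int64_t now)
{
	int64_t left = time_until_next_ping(last_ping, interval, now);

	return left > 0 ? left : 1;
}
```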

Lustre-change: https://review.whamcloud.com/38915
Lustre-commit: 6be2dbb2595121fabceda86c5f7bdcb45e10b320

Change-Id: Ib7fc22b3cc31255223bc2be60224ced1a3585f87
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39344
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-12222 ptlrpc: Check if NID is local, not just lolnd NID 65/38865/2
Chris Horn [Mon, 27 Apr 2020 15:07:21 +0000 (10:07 -0500)]
LU-12222 ptlrpc: Check if NID is local, not just lolnd NID

There are a couple of places where we check whether a NID is the lolnd NID
but we really want to know whether the NID is local. Use
LNetIsPeerLocal() to accomplish this.

Lustre-change: https://review.whamcloud.com/38388
Lustre-commit: 95bcc24642c4b95d093407fef0947ee2f5a2c01a

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ia17b9b4b54fd1063c42a6f8bdd0e593be1086683
Reviewed-on: https://review.whamcloud.com/38865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12222 lnet: Primary NID of lolnd NID is the lolnd NID 64/38864/2
Chris Horn [Wed, 22 Apr 2020 16:42:27 +0000 (11:42 -0500)]
LU-12222 lnet: Primary NID of lolnd NID is the lolnd NID

We want Lustre traffic that is intended for the local peer to be sent
and received over the lolnd. The function ptlrpc_uuid_to_peer() will
currently resolve a NID to the lolnd NID, but ptlrpc_connection_get()
will overwrite this selection with the result from LNetPrimaryNID().

Have LNetPrimaryNID return the lolnd NID when it is passed the lolnd
NID.
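
The short-circuit can be sketched as below. The NID encoding (network type
in the upper 16 bits of the network number, network number in the upper 32
bits of the NID) and the LOLND type value of 9 are assumptions made for this
sketch, and peer_table_primary() is a hypothetical stand-in for the real
peer lookup.

```c
#include <stdint.h>

typedef uint64_t nid_t;

/* Assumed encoding and values, named SKETCH_* to mark them as illustrative. */
#define SKETCH_LOLND		9
#define SKETCH_MKNET(type, num)	((((uint32_t)(type)) << 16) | ((uint32_t)(num)))
#define SKETCH_MKNID(net, addr)	((((uint64_t)(net)) << 32) | ((uint64_t)(addr)))
#define SKETCH_NID_LO_0		SKETCH_MKNID(SKETCH_MKNET(SKETCH_LOLND, 0), 0)

/* Hypothetical primary NID that the peer table would return. */
#define SKETCH_PEER_PRIMARY	SKETCH_MKNID(SKETCH_MKNET(2, 0), 1)

/* Hypothetical peer table lookup: always resolves to one primary NID. */
static nid_t peer_table_primary(nid_t nid)
{
	(void)nid;
	return SKETCH_PEER_PRIMARY;
}

/* Return the lolnd NID unchanged; resolve every other NID via the table. */
static nid_t primary_nid(nid_t nid)
{
	if (nid == SKETCH_NID_LO_0)
		return nid;
	return peer_table_primary(nid);
}
```

Without the early return, a connection set up for 0@lo would be rewritten to
a network NID and local traffic would no longer go over the lolnd.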

Lustre-change: https://review.whamcloud.com/38313
Lustre-commit: 33d2e44e5026f1e9162dd5e6b931085fdc035a34

HPE-bug-id: LUS-8457
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I02708bb45f8440091782ca7886bac7656efb0223
Reviewed-on: https://review.whamcloud.com/38864
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-12222 lnet: Introduce constant for the lolnd NID 63/38863/2
Chris Horn [Wed, 22 Apr 2020 16:39:46 +0000 (11:39 -0500)]
LU-12222 lnet: Introduce constant for the lolnd NID

This patch adds a new constant, LNET_NID_LO_0, to represent the lolnd
NID 0@lo.

Lustre-change: https://review.whamcloud.com/38312
Lustre-commit: 56203e4ba0a64789e42ea45946e8c51f1db351fb

HPE-bug-id: LUS-8457
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I3e57637f297b8de306905a447af8f025e31d1fcf
Reviewed-on: https://review.whamcloud.com/38863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-12758 quota: clear default flag for new ID 08/38808/2
Hongchao Zhang [Tue, 2 Jun 2020 16:20:47 +0000 (09:20 -0700)]
LU-12758 quota: clear default flag for new ID

When setting the quota limits to 0 with "lfs setquota", the default
flag won't be cleared if the lquota_entry has just been created for
the quota ID for the first time, because the quota limits are the same.

This patch is back-ported from the following one:
Lustre-commit: ce86e23b21ccffc395089578c0ca356de219ac88
Lustre-change: https://review.whamcloud.com/36236

Change-Id: I7f44ce0cb13783ca5bede2f55cd0707f1ccbc8ca
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38808
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13659 kernel: kernel update SLES12 SP4 [4.12.14-95.54.1] 39/39239/3
Jian Yu [Thu, 2 Jul 2020 04:14:25 +0000 (21:14 -0700)]
LU-13659 kernel: kernel update SLES12 SP4 [4.12.14-95.54.1]

Update SLES12 SP4 kernel to 4.12.14-95.54.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT="103a 817"

Change-Id: If7b9143bec6d9c526bd65e96a771c83f2530e608
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39239
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13599 mdt: fix logic of skipping local locks in reply_state 91/39191/4
Mikhail Pershin [Fri, 26 Jun 2020 15:17:06 +0000 (18:17 +0300)]
LU-13599 mdt: fix logic of skipping local locks in reply_state

mdt_reint_migrate() controls the number of local locks taken and
prevents saving too many locks in reply_state by doing a local
sync instead. However, there is a flaw in that logic, so the locks
are always saved, causing an assertion in ptlrpc_save_lock().

The patch takes the local 'do_sync' parameter into consideration when
deciding whether to save a local lock.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I98cca84825ce5789094fbceb5d1f7975410d134b
Reviewed-on: https://review.whamcloud.com/39191
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12424 lnet: prevent loop in LNetPrimaryNID() 90/38890/4
Amir Shehata [Tue, 11 Jun 2019 18:25:27 +0000 (11:25 -0700)]
LU-12424 lnet: prevent loop in LNetPrimaryNID()

If discovery is disabled locally or at the remote end, then attempt
discovery only once. Do not update the internal database when
discovery is disabled and do not repeat discovery.

This change prevents LNet from getting hung waiting for
discovery to complete.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4543b0f71e6cf297a1a5f058ebcc6bf74b8ac328
Reviewed-on: https://review.whamcloud.com/35191
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38890
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
3 years agoLU-13149 tests: change sanityn 103 facet value 47/38847/3
James Nunez [Fri, 5 Jun 2020 15:15:01 +0000 (09:15 -0600)]
LU-13149 tests: change sanityn 103 facet value

The facet name input to lustre_version_code() in sanityn
test 103 should be 'ost1' not a variable '$ost1'.  Let's
replace this call with the $OST1_VERSION variable.

Fixes: 2548cb9e32bfca ("LU-11670 osc: glimpse - search for active lock")
Test-Parameters: trivial
Test-Parameters: serverversion=2.10.8 serverdistro=el7.6 env=ONLY=103 testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ib7426f78210c9b32ba53c46ba5f08faeb3ea8ec5
Reviewed-on: https://review.whamcloud.com/38847
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
3 years agoLU-11782 tests: add version check to conf-sanity 117 51/38851/2
James Nunez [Fri, 5 Jun 2020 17:21:09 +0000 (11:21 -0600)]
LU-11782 tests: add version check to conf-sanity 117

conf-sanity test 117 was added to check error returns from
read_param().  This test will fail when run with servers
with Lustre version less than 2.12.0 and, thus, should be
skipped for all Lustre servers earlier than 2.12.0.

Fixes: 6ca2425ccf6b ("LU-11198 utils: propagate errors for read_param")
Test-Parameters: trivial
Test-Parameters: serverversion=2.10.8 serverdistro=el7.6 env=ONLY=117 testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ia0889584d9c1a6c09ea2a99fa11c7abfd1474de4
Reviewed-on: https://review.whamcloud.com/38851
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
3 years agoLU-13640 tests: add version check to conf-sanity 125 50/38850/2
James Nunez [Fri, 5 Jun 2020 16:55:44 +0000 (10:55 -0600)]
LU-13640 tests: add version check to conf-sanity 125

In Lustre 2.12.3, the l_tunedisk utility was modified to
skip tuning devices on the MDS and MGS and conf-sanity
test 125 was added to check this functionality.  Thus, this
test should be skipped for all Lustre server versions prior
to 2.12.3.

Fixes: bab0570ce3081 ("LU-12387 tests: Validate l_tunedisk max_sectors_kb tuning")
Test-Parameters: trivial
Test-Parameters: serverversion=2.10.8 serverdistro=el7.6 env=ONLY=125 testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I89c2900c2430ff3e76bee297809957380404aa31
Reviewed-on: https://review.whamcloud.com/38850
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13088 ldlm: Fix sleeping function called in atomic 83/39283/3
Mr NeilBrown [Thu, 19 Dec 2019 05:55:35 +0000 (16:55 +1100)]
LU-13088 ldlm: Fix sleeping function called in atomic

target_recovery_overseer() can sleep while holding a spinlock, which
triggers a BUG warning.

It is easily fixed by dropping the spinlock before waiting.  In the
case where the task waits, no useful information that could be
protected by the spinlock is held, so nothing can be lost by dropping
it.

Lustre-change: https://review.whamcloud.com/#/c/37063/
Lustre-commit: b29b9310dafe17ba78e1db490b79b89d2d6fdcd1

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I8bb3d02523b5dcfadac19f01ccb736d7b7f28239
Reviewed-on: https://review.whamcloud.com/37063
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39283

3 years agoLU-13653 mdt: ignore quota when creating slave stripe 82/39282/2
Hongchao Zhang [Wed, 24 Jun 2020 09:53:55 +0000 (17:53 +0800)]
LU-13653 mdt: ignore quota when creating slave stripe

When creating a striped directory, the quota limit has already been
checked on the master MDT, so the quota should be ignored when
creating the slave stripe object.

Lustre-change: https://review.whamcloud.com/#/c/38875/
Lustre-commit: f762acebfcc6a88c3f4ba6296cbd6f1696bff530

Change-Id: Ia53b1975a8d66c78725feb313659f7a9b889e735
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38875
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39282
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
3 years agoLU-13709 utils: 'lfs mkdir -i -1' doesn't work 65/39165/3
Lai Siyao [Wed, 24 Jun 2020 12:01:08 +0000 (20:01 +0800)]
LU-13709 utils: 'lfs mkdir -i -1' doesn't work

'lfs mkdir -i -1 -c...' creates a directory on MDTs chosen by space
usage. When the stripe count is more than 1, the target MDT list is not
correctly initialized, which causes the command to fail.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id4584940cec390a9245e888c96c7873f5afa209e
Reviewed-on: https://review.whamcloud.com/39165
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13600 ptlrpc: limit rate of lock replays 11/39111/5
Mikhail Pershin [Fri, 12 Jun 2020 14:14:50 +0000 (17:14 +0300)]
LU-13600 ptlrpc: limit rate of lock replays

Clients send all lock replays at once, and that may overwhelm
the server with a huge number of replays in the recovery queue,
causing OOM effects.

The patch adds rate control for lock replays on the client.

The patch also includes a later fix for the signal_completed_replay()
race.
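
A simple form of such rate control is a cap on the number of replays in
flight, sketched below. The struct and function names are hypothetical and
the sketch is single-threaded; it does not reproduce the ptlrpc
implementation or its locking.

```c
#include <stdbool.h>

/* Hypothetical limiter: at most max_in_flight replays outstanding. */
struct replay_limiter {
	unsigned int in_flight;
	unsigned int max_in_flight;
};

/* Claim a slot before sending a replay; false means "wait and retry". */
static bool replay_try_start(struct replay_limiter *rl)
{
	if (rl->in_flight >= rl->max_in_flight)
		return false;
	rl->in_flight++;
	return true;
}

/* Release the slot once the server has answered the replay. */
static void replay_done(struct replay_limiter *rl)
{
	if (rl->in_flight > 0)
		rl->in_flight--;
}
```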

Lustre-change: https://review.whamcloud.com/38920
Lustre-commit: 3b613a442b8698596096b23ce82e157c158a5874

Lustre-change: https://review.whamcloud.com/39140
Lustre-commit: dc654756af63bd30802ebd86074019d1533a4d8f

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ie557f8481c5facb690468d7136cf5feebe4e8f11
Reviewed-on: https://review.whamcloud.com/39111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-13657 kernel: kernel update RHEL8.2 [4.18.0-193.6.3.el8_2] 03/38903/4
Jian Yu [Tue, 7 Jul 2020 18:13:05 +0000 (11:13 -0700)]
LU-13657 kernel: kernel update RHEL8.2 [4.18.0-193.6.3.el8_2]

Update RHEL8.2 kernel to 4.18.0-193.6.3.el8_2 for Lustre client.

Test-Parameters: trivial clientdistro=el8.2

Change-Id: Id9eb16b9277bf2157905eb38a23a3250a0033560
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38903
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13503 mdc: allow setting max_mod_rpcs_in_flight larger 93/38893/3
Andreas Dilger [Wed, 10 Jun 2020 21:34:03 +0000 (14:34 -0700)]
LU-13503 mdc: allow setting max_mod_rpcs_in_flight larger

Allow setting mdc.*.max_mod_rpcs_in_flight > mdc.*.max_rpcs_in_flight
by increasing the latter value, rather than returning an error and
telling the user to do that.  This matches the similar behavior if
mdc.*.max_rpcs_in_flight is reduced lower than max_mod_rpcs_in_flight.

If there are multiple MDTs, the "mdc.*.max_mod_rpcs_in_flight" param
may be set from e.g. the MDT0000 config log before MDT0001 is fully
configured, catching MDT0001 with ocd_maxmodrpcs = 0 before the OCD
from the MDT has been filled in, and incorrectly triggering an error.
If seen during setup, allow ocd_maxmodrpcs = (max_rpcs_in_flight - 1),
since this will be fixed up later if mdc.*.max_rpcs_in_flight is set
smaller in the config log (if set larger it doesn't matter).
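
The adjustment rules described above can be modelled roughly as
follows. This is a simplified sketch, not the function changed by the
patch: struct mdc_limits, set_max_mod_rpcs() and set_max_rpcs() are
hypothetical names, errors are reduced to -1, and a server limit of 0
is taken to mean "not filled in yet".

```c
#include <assert.h>

/* Hypothetical model of the per-MDC tunables; not a Lustre structure. */
struct mdc_limits {
	unsigned int max_rpcs;     /* models mdc.*.max_rpcs_in_flight */
	unsigned int max_mod_rpcs; /* models mdc.*.max_mod_rpcs_in_flight */
	unsigned int srv_max_mod;  /* models ocd_maxmodrpcs; 0 = unknown */
};

/* Set the modifying-RPC limit; returns 0 on success, -1 on error. */
int set_max_mod_rpcs(struct mdc_limits *l, unsigned int val)
{
	unsigned int srv_limit;

	if (val < 1)
		return -1;
	/* The modifying limit must stay strictly below the total, so
	 * raise the total instead of rejecting the request. */
	if (val >= l->max_rpcs)
		l->max_rpcs = val + 1;
	/* Server limit unknown during setup: allow max_rpcs - 1. */
	srv_limit = l->srv_max_mod ? l->srv_max_mod : l->max_rpcs - 1;
	if (val > srv_limit)
		return -1; /* in this model the raised total is kept */
	l->max_mod_rpcs = val;
	return 0;
}

/* Set the total limit; lowers the modifying limit if needed. */
int set_max_rpcs(struct mdc_limits *l, unsigned int val)
{
	if (val < 2)
		return -1; /* keep room for at least one modifying RPC */
	l->max_rpcs = val;
	if (l->max_mod_rpcs >= val)
		l->max_mod_rpcs = val - 1;
	return 0;
}
```

Starting from max_rpcs = 8 with an unknown server limit, asking for
16 modifying RPCs raises max_rpcs to 17 in this model; lowering
max_rpcs to 4 afterwards pulls max_mod_rpcs down to 3.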

Test-Parameters: env=ONLY=90 testlist=conf-sanity

This patch is back-ported from the following one:
Lustre-commit: 6d314902e6d19229379577aab60d4b20a5b4d2ea
Lustre-change: https://review.whamcloud.com/38455

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I4b20163e9e212db451738169ebdc361ab8c1c15e
Reviewed-on: https://review.whamcloud.com/38893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12100 tests: Use least qunit to set limit 69/38769/4
Nathaniel Clark [Tue, 19 Nov 2019 14:52:45 +0000 (09:52 -0500)]
LU-12100 tests: Use least qunit to set limit

Use the least qunit to set the lower limit for inodes in sanity-quota/2.
This ensures that the limit is set at or above the minimum size.

Lustre-change: https://review.whamcloud.com/36797
Lustre-commit: 33e500cfb33406b8dddac46e1dfb5a3d59ff01c5

Test-Parameters: trivial
Test-Parameters: env=ONLY=2 testlist=sanity-quota
Test-Parameters: env=ONLY=2 testlist=sanity-quota fstype=zfs
Test-Parameters: env=ONLY=2,ONLY_REPEAT=20 fstype=zfs testlist=sanity-quota
Test-Parameters: mdtcount=2 mdscount=4 env=ONLY=2,ONLY_REPEAT=20 fstype=zfs testlist=sanity-quota

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I80e2c3cb66870d11f74f34c435e266a46630479b
Reviewed-on: https://review.whamcloud.com/36797
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/38769
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-13473 llite: don't check mirror info for page discard 56/38856/2
Bobi Jam [Wed, 22 Apr 2020 05:28:54 +0000 (13:28 +0800)]
LU-13473 llite: don't check mirror info for page discard

CIT_MISC is used for lock/page manipulation and does not go
through the full IO procedure, i.e. cl_io_loop() will not be called
for it. So don't check it for a plain file, since the mirror info
is not initialized/set in this case.

Lustre-change: https://review.whamcloud.com/38307
Lustre-commit: d0dd744ed6ae002f34bdade993428b635b23d072

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I723d18260629b8f7c470d350d6d899d3bb88018a
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38856
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>