Whamcloud - gitweb
fs/lustre-release.git
3 years agoLU-14739 quota: nodemap squashed root cannot bypass quota
Sebastien Buisson [Fri, 11 Jun 2021 14:49:47 +0000 (16:49 +0200)]
LU-14739 quota: nodemap squashed root cannot bypass quota

When root on client is squashed via a nodemap's squash_uid/squash_gid,
its IOs must not bypass quota enforcement as it normally does without
squashing.
So on client side, do not set OBD_BRW_FROM_GRANT for every page being
used by root. And on server side, check if root is squashed via a
nodemap and remove OBD_BRW_NOQUOTA.

Lustre-change: https://review.whamcloud.com/43988
Lustre-commit: a4fbe7341baf12c00c6048bb290f8aa26c05cbac

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I95b31277273589e363193cba8b84870f008bb079
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44292
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
3 years agoLU-14647 flr: mmap write/punch does not stale other mirrors
Bobi Jam [Wed, 28 Apr 2021 05:07:36 +0000 (13:07 +0800)]
LU-14647 flr: mmap write/punch does not stale other mirrors

mmap write and punch/fallocate do not stale other mirrors and makes
FLR file contains different content in different mirrors.

Lustre-change: https://review.whamcloud.com/43470/
Lustre-commit: 03511484c668355c77e54e4b01600183236d8673

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I93a3eb5ba898e3bf0ce108718506b742ed485da5
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44258
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7]
Jian Yu [Thu, 22 Jul 2021 07:39:45 +0000 (00:39 -0700)]
LU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7]

Update RHEL7.9 kernel to 3.10.0-1160.36.2.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: Ie2898b1df28c8b99ea4099e94baafe388c6aa626
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44379
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-3054 lipe: unlink pcc cache file if it's needed
Lei Feng [Thu, 22 Apr 2021 09:51:26 +0000 (17:51 +0800)]
EX-3054 lipe: unlink pcc cache file if it's needed

Cache file is not guaranteed to be deleted after detach
operation returns success. So we unlink the cache file
if it's needed.

Change --force_clear option to --clear_hashdir option.
It tries to remove empty hash dir recursively up to the cache
dir root. The option is off by default.

Change-Id: I49911c688faaf6c7baa814b260a4ff492de077fc
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/43403
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-3041 lipe: include reserved space into disk usage
Lei Feng [Tue, 20 Apr 2021 00:08:59 +0000 (08:08 +0800)]
EX-3041 lipe: include reserved space into disk usage

Typically ext4 fs reserves 5% disk space for root. It's
calculated as used space when df command shows the Use%.
So we calculate the usage in lpcc_purge in the same way
to be consistent with df.

Change-Id: I1cdee6ea66ad27bf7501cecb4a4a9495e09647a6
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/43376
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-2988 lipe: double confirm atime before detach it
Lei Feng [Wed, 14 Apr 2021 09:30:29 +0000 (17:30 +0800)]
EX-2988 lipe: double confirm atime before detach it

For lpcc_purge, double confirm the atime before detach it.
If the atime has been changed, don't detach it.

Change-Id: Ib08fc684f1c815cc8477bd8041e007541c18b6c4
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/43312
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-2989 lipe: collect lpcc_purge stats
Lei Feng [Tue, 13 Apr 2021 00:19:52 +0000 (08:19 +0800)]
EX-2989 lipe: collect lpcc_purge stats

Collect stats data for lpcc_purge and dump them by sigusr1.

Change-Id: Ifbc5502e53efd3a40846e4ea8c551f05f6b0ce09
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/43287
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-2975 lipe: implement approximate LRU in lpcc_purge
Lei Feng [Wed, 7 Apr 2021 08:08:55 +0000 (16:08 +0800)]
EX-2975 lipe: implement approximate LRU in lpcc_purge

LRU is actually FIFO based on the atime of cache file. But we
can only use limited memory to sort the atime of all files.
So we implement an approximate LRU in lpcc_purge.

Change-Id: Ic636d97b08ee6424f54a1088dfe07359df1f8f79
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/43224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-2818 lipe: Parse arguments and config file
Lei Feng [Tue, 27 Apr 2021 10:50:10 +0000 (18:50 +0800)]
EX-2818 lipe: Parse arguments and config file

Add the code to parse arguments and config file, and some arguments.
Add the framework of wait-scan-free.

Change-Id: I792715badcb5a1fb1aa9062f04c2afab3229b747
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/43194
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-2818 lipe: Create lpcc_purge daemon
Lei Feng [Thu, 1 Apr 2021 07:02:31 +0000 (15:02 +0800)]
EX-2818 lipe: Create lpcc_purge daemon

Create an empty lpcc_purge daemon.

Change-Id: I393797cfc328e44fa89b6fc89363a4e079a26993
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/43192
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-3468 build: only include mlnx-tools for MOFED 5.4+
Minh Diep [Wed, 21 Jul 2021 23:17:48 +0000 (16:17 -0700)]
EX-3468 build: only include mlnx-tools for MOFED 5.4+

mlnx-tools is required starting 5.4+

Test-Parameters: trivial

Change-Id: I8fdd4d0b4ac86335d4d7fe349c89f502d608940c
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44370
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-3516 tests: skip hot-pools.sh during interop testing
Jian Yu [Wed, 21 Jul 2021 18:48:01 +0000 (11:48 -0700)]
EX-3516 tests: skip hot-pools.sh during interop testing

Skip hot pools for interop testing. It is really a server-side
functionality, and using old/new test scripts doesn't make sense.

Test-Parameters: trivial \
serverjob=lustre-b_es5_2 serverbuildno=305 \
testlist=hot-pools

Change-Id: I26fee51cf284b4f4a75ddd4ef715dd9ff417d283
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44366
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years agoLU-13124 scrub: check for multiple linked file
Hongchao Zhang [Wed, 21 Jul 2021 08:44:07 +0000 (16:44 +0800)]
LU-13124 scrub: check for multiple linked file

The files on OSTs should have only one link, but it could
have more than one link when there are some disk failures
"multiply claimed block(s)" and fixed by e2fsck to clone
these conflicted blocks. This patch adds the check of these
multiple linked files in Scrub on OST.

The name of the objects in "O" depends on the object's FID,
the directory pattern is O/[FID_SEQ]/[SUB_DIR]/[FID_OID],
the inodes of these multiple linked files are normal, but
there is only one directroy entry compatible with the object,
this patch scans all files under "O" to check whether its name
is matched with its FID.

Lustre-change: https://review.whamcloud.com/37194
Lustre-commit: 0c1ae1cb9c19f8a4f6c5a7ff6a1fd54807430795
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I280a725939b037006935d47e9ef426a4a6a7b317
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44299
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14831 osd-ldiskfs: uninited osd_inode_id is used
Hongchao Zhang [Wed, 30 Jun 2021 11:15:03 +0000 (19:15 +0800)]
LU-14831 osd-ldiskfs: uninited osd_inode_id is used

In osd_fid_lookup, the "osd_inode_id" could be used uninitializedly
if the FID doesn't exist in OI, which cause some faked FID/inode
pair to be inserted into OI file and the OI scrub thread could be
triggered repeatedly.

Lustre-change: https://review.whamcloud.com/44349
Lustre-commit: TBD (from fd0330f95dc2b1d4f10dd137dc0bfaa7ebbf0dfb)

Change-Id: I9100dece9e94d3e590f17fb4498601876aa1edaa
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44350
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14729 osd-ldiskfs: fix to declare write commits
Wang Shilong [Mon, 14 Jun 2021 01:28:51 +0000 (09:28 +0800)]
LU-14729 osd-ldiskfs: fix to declare write commits

Fallocation might introduce unwritten extents, writting
data will trigger extents split, so we should reserve
credits for this case, to avoid complicated calculation,
we just use normal credits calculation if extent is mapped
as unwritten.

See comments in ext4:
If we add a single extent, then in the worse case, each tree
level index/leaf need to be changed in case of the tree split.
If more extents are inserted, they could cause the whole tree
split more than once, but this is really rare.

Lustre always reserve extents in 1 extent case, this is wrong.
Also fix indirect blocks calculation.

Lustre-change: https://review.whamcloud.com/43994
Lustre-commit: 9810341a839c27b7a53cdc047e0395f8f906c4bf

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I9b67ec7b002711f040f46d0c77a645bb6f57a7de
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44280
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14729 osd-ldiskfs: declare dirty block groups correctly
Wang Shilong [Wed, 2 Jun 2021 01:52:39 +0000 (09:52 +0800)]
LU-14729 osd-ldiskfs: declare dirty block groups correctly

Calculate dirty block groups only include estimated extents,
indirect blocks and extent node/leaf blocks are missed, this
could make us short of credits.

Lustre-change: https://review.whamcloud.com/43890
Lustre-commit: 42cda8781f94ad1138afac2d23180ea48f3c3450

Fixes: 0271b17b80a82 ("LU-14134 osd-ldiskfs: reduce credits for new writing")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Iec8525823b04e909c030f94bf75b8eca60d31c50
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44279
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-10948 mdt: New connect flag for non-open-by-fid lock
Oleg Drokin [Thu, 3 Jun 2021 00:10:47 +0000 (20:10 -0400)]
LU-10948 mdt: New connect flag for non-open-by-fid lock

While we removed the 2.1 check for open by fid when open
lock is requested, when you talk to old servers that don't
have that patch - they get an open error, so introduce a compat
flag.

Lustre-commit: 72c9a6e5fb6e11fca1b1438ac18f58ff7849ed7d
Lustre-change: https://review.whamcloud.com/43907/

Change-Id: I94d50ad98a2828519853a35fa90c5063adf2feab
Fixes: 41d99c4902 ("LU-10948 llite: Introduce inode open heat counter")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44260
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-10948 llite: Introduce inode open heat counter
Oleg Drokin [Tue, 13 Apr 2021 07:46:41 +0000 (03:46 -0400)]
LU-10948 llite: Introduce inode open heat counter

Initial framework to support detection of naive apps that
assume open-closes are "free" and proceed to open/close
same files between minute operations.

We will track number of file opens per inode and last time inode
was closed.

Initially we'll expose these controls:
llite/opencache_threshold_count - enables functionality and controls after
                                  how many opens open lock is requested
llite/opencache_threshold_ms    - if any reopen happens within this time (in
                                  ms), open would trigger open lock request
llite/opencache_max_ms          - If last close was longer than this many ms
                                  ago - start counting opens from zero again

Once enough useful data is collected we can look into adding a heatmap
or another similar mechanism to better manage it and enable it
by default with sensible settings.

Currently it's disabled by default

Lustre-change: https://review.whamcloud.com/32158/
Lustre-commit: 41d99c4902836b7265db946dfa49cf99381f0db4

Change-Id: I1aa5455b458840acad651f651c883a7a7a67ab4c
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/44301
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14641 osd-ldiskfs: write commit declaring improvement
Wang Shilong [Mon, 26 Apr 2021 03:23:26 +0000 (11:23 +0800)]
LU-14641 osd-ldiskfs: write commit declaring improvement

This patch try to:

1)extent bytes could be missed to increase with less than
1M, fix to to compare it with current value, and decay
it for every allocation.

2)with system space usage growing up, mballoc codes won't
try best to scan block group to align best free extent as
we can. So extent bytes per extent could be decayed to a
very small value, this could make us reserve too many credits.
We could be more optimistic in the credit reservations, even
in a case where the filesystem is nearly full, it is extremely
unlikely that the worst case would ever be hit.

3)Add extent bytes stats and debug ability to analysis
over reservation problem.

Lustre-commit: 0f81c5ae973bf7fba45b6ba7f9c5f4fb1f6eadcb
Lustre-change: https://review.whamcloud.com/43446

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I357c4a855147ba26a9e9bbe9ab1269bcfd44e5f3
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44244
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoRM-620 build: New tag 2.14.0-ddn6
Andreas Dilger [Tue, 20 Jul 2021 23:23:49 +0000 (17:23 -0600)]
RM-620 build: New tag 2.14.0-ddn6

New tag 2.14.0-ddn6

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1e04c0d5f8e79eec8d48cbec37fe576108dded45

3 years agoLU-13717 sec: rework includes for client encryption
Sebastien Buisson [Tue, 23 Mar 2021 14:19:01 +0000 (14:19 +0000)]
LU-13717 sec: rework includes for client encryption

Simplify includes for crypto, by not repeating stubs in case
HAVE_LUSTRE_CRYPTO is not defined.
Expose encoding routines that are going to be used in the Lustre
code (both client and server sides) with filename encryption.

Lustre-change: https://review.whamcloud.com/43386
Lustre-commit: 028281ae195927e97518c573e7be3e326d9e6bd3

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6c5853d6da7120edd2bec3a12494251d873151a8
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44187
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14629 sec: forbid file rename from enc to unencrypted dir
Sebastien Buisson [Thu, 22 Apr 2021 09:26:51 +0000 (11:26 +0200)]
LU-14629 sec: forbid file rename from enc to unencrypted dir

fscrypt allows renaming an encrypted file from an encrypted directory
into an unencrypted directory. But it leaves the file encrypted,
sitting in an unencrypted directory, which can lead to unexpected
issues.
So just prevent this kind of rename, and adapt sanity-sec test_47
accordingly.

Lustre-change: https://review.whamcloud.com/43404
Lustre-commit: 1158386ac9c6a638f791f62e47a7513b2322772c

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I38e17caa4786c1c8d80a363a826a5aa298eb0980
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43908
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14742 socklnd: detect link state to set fatal error on ni
Serguei Smirnov [Tue, 8 Jun 2021 21:11:41 +0000 (14:11 -0700)]
LU-14742 socklnd: detect link state to set fatal error on ni

To help avoid selecting lnet ni which corresponds to a downed
ethernet link for sending, add a mechanism for detecting link
events in socklnd. On link up/down events, find corresponding
ni and toggle ni_fatal_error_on flag, similar to o2iblnd way.

Lustre-change: https://review.whamcloud.com/43952
Lustre-commit: fc2df80e96dc5db9f3fb710893ccf6f442664471

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ie9f4f02fcb8b988c77bf63f751d5a621e79e9f58
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44329
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-12815 socklnd: add conns_per_peer parameter
Serguei Smirnov [Tue, 30 Mar 2021 16:58:57 +0000 (12:58 -0400)]
LU-12815 socklnd: add conns_per_peer parameter

Introduce conns_per_peer ksocklnd module parameter.
In typed mode, this parameter shall control
the number of BULK_IN and BULK_OUT tcp connections,
while the number of CONTROL connections shall stay
at 1. In untyped mode, this parameter shall control
the number of untyped connections.
The default conns_per_peer is 1. Max is 127.

Lustre-change: https://review.whamcloud.com/41056
Lustre-commit: 71b2476e4ddb95aa42f4a0ea3f23b1826017bfa5

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I70bbaf7899ae1fbc41de34553c8c4ad1c7d55f7e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44328
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
3 years agoLU-13641 socklnd: replace route construct
Serguei Smirnov [Tue, 30 Mar 2021 16:48:04 +0000 (12:48 -0400)]
LU-13641 socklnd: replace route construct

With TCP bonding removed, it's no longer necessary to
maintain multiple route constructs per peer_ni in socklnd.
Replace the route construct with connection control block,
conn_cb, and make sure there's a single conn_cb per peer_ni.

Lustre-change: https://review.whamcloud.com/40774
Lustre-commit: 7766f01e891c378d1bf099e475f128ea612488f0

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I1de683429af5f93b3197b6d536e80b5ac1e67a22
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44327
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
3 years agoLU-13641 socklnd: remove tcp bonding
Serguei Smirnov [Tue, 16 Mar 2021 21:34:26 +0000 (17:34 -0400)]
LU-13641 socklnd: remove tcp bonding

TCP bonding in the socklnd has become obsolete with LNet
Multi-Rail and there's no evidence it's being used anywhere.
Remove it to keep the code simple.

Lustre-change: https://review.whamcloud.com/40000
Lustre-commit: d123c47a18adbf5665ed63d99c53117b84db9ec8

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ib456f951b8ccd59112c460085632a2cb3c982004
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44326
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
3 years agoEX-3409 pcc: add owner capacity check for open attach
Qian Yingjin [Thu, 1 Jul 2021 07:49:58 +0000 (15:49 +0800)]
EX-3409 pcc: add owner capacity check for open attach

This patch adds owner and capacity check when try to auto attach
at the open() time.
Add sanity-pcc test_43.

For the command "lfs pcc attach_fid", make it more convenient by
the way that the parameter "-m" is not mandatory required when
specifying the mount point.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Icb8cbddb64c5712e2db970120b06fc1a0216c332
Reviewed-on: https://review.whamcloud.com/44123
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng, Lei <flei@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-14627 tests: Create unload_modules_local
Chris Horn [Fri, 23 Apr 2021 19:05:02 +0000 (14:05 -0500)]
LU-14627 tests: Create unload_modules_local

t-f allows for loading modules on single node via load_modules_local.
However, there is no corresponding unload_modules_local that can be
called to cleanup after call to load_modules_local, so we create it.
unload_modules() refactored to use unload_modules_local.

Also address a potential issue that can prevent LND modules from
unloading. Some LNet setup (particularly those in sanity-lnet) may
require that we call lnetctl lnet unconfigure (or lctl net down)
to drop a ref on the module before it can be unloaded.

Lustre-change: https://review.whamcloud.com/43425
Lustre-commit: 32304d863ae98c641f541362f54e7b1f24b350a6

HPE-bug-id: LUS-9031
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6458a7728f5f559f8641c5a9e29dd775c8445c38
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44255
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-12125 tests: allow racer to specify extra tasks
Andreas Dilger [Fri, 15 Jan 2021 01:21:11 +0000 (18:21 -0700)]
LU-12125 tests: allow racer to specify extra tasks

Add the RACER_EXTRA environment variable to allow racer.sh to run
extra tasks.

Lustre-change: https://review.whamcloud.com/41231
Lustre-commit: fe2663f18e50023ad5cfe5e07b695378dd27a68e

Test-Parameters: trivial testlist=racer
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic810a248a2dd665a163e0efea8c9af0e4461e09b
Reviewed-on: https://review.whamcloud.com/41961
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14526 flr: mirror split downgrade SOM
Bobi Jam [Tue, 30 Mar 2021 11:20:26 +0000 (19:20 +0800)]
LU-14526 flr: mirror split downgrade SOM

After mirror split, the file's blocks on SoM is not accurate, this
patch downgrade the SoM from STRICT so that size glimpse does not
trust the SoM from the MDS.

Lustre-change: https://review.whamcloud.com/43168
Lustre-commit: a30750ad2cc5f10d9d1cc0e30199073091c06f2b

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I02350c24190d96af93fed8c1b8a0bc6beb2c4bc2
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44253
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13055 mdd: per-user changelog names and mask
Mikhail Pershin [Tue, 22 Jun 2021 18:16:26 +0000 (21:16 +0300)]
LU-13055 mdd: per-user changelog names and mask

Allow specifying a name for newly-registered changelog users,
rather than the default "clNNN" that is otherwise used. This
allows services to register a "well-known" changelog user,
rather than having to store the changelog username in HA storage
outside of the filesystem.

Each changelog user still has a unique ID appended to it, to allow
the changelog_clear and changelog_deregister commands to be run
using only the ID if necessary/desired. User name can be used to
deregister. User name is also unique per server.

If no name is given, then default "cl" format is used.

With this new functionality, it is possible to specify the name like:
 # lctl --device testfs-MDT0000 changelog_register --user watcher
   testfs-MDT0000: Registered changelog userid 'cl13-watcher'

Per-user mask is also added to allow specific operation logging on
per-user basis. Mask can be set only during registration. Resulting
mask from per-server mask and all user masks is used for current
changelog operations.

Lustre-change: https://review.whamcloud.com/43380
Lustre-commit: a15eb4f13224e148810015896101b2950c85adff

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I56028f54cc97bbc9af03fd6559c19ef854f759d8
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/44283
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14731 mdd: clear orphans changelog entries
John L. Hammond [Wed, 2 Jun 2021 17:05:01 +0000 (12:05 -0500)]
LU-14731 mdd: clear orphans changelog entries

In mdd_changelog_llog_init(), adjust the orphan changelog index logic
to account for the case when no users are registered. Add sanity
test_160n() to verify this.

Lustre-change: https://review.whamcloud.com/43901
Lustre-commit: c7d8fe31064990d7053436ccd720c531bc78a2dc

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I03b0c1002a0e16f26af8ec23bf06c9a07dec858a
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44282
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14688 mdt: changelog purge deletes plain llog
Alexander Boyko [Mon, 17 May 2021 13:29:01 +0000 (09:29 -0400)]
LU-14688 mdt: changelog purge deletes plain llog

With a massive cancel records changelog could delete a plain
llog file and skip one by one record cancelling.
Also patch fixes the race between llog_destroy and llog_next_block.

Lustre-change: https://review.whamcloud.com/43719
Lustre-commit: d813c75df6798efbf3228347628c0d671ca7269c

HPE-bug-id: LUS-9950
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I47c2ed97945e979745255381f83b6a417d7ba8b1
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44262
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-12142 readahead: limit over reservation
Wang Shilong [Wed, 17 Mar 2021 09:58:00 +0000 (17:58 +0800)]
LU-12142 readahead: limit over reservation

For performance reason, exceeding @ra_max_pages are allowed to
cover current read window, but this should be limited with RPC
size in case a large block size read issued. Trim to RPC boundary.

Otherwise, too many read ahead pages might be issued and
make client short of LRU pages.

Lustre-commit: 1058867c004bf19774218945631a691e8210b502
Lustre-change: https://review.whamcloud.com/42060

Fixes: 777b04a093 ("LU-13386 llite: allow current readahead to exceed reservation"
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Icf74b5fbc75cf836fedcad5184fcdf45c7b037b4
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/43455
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14504 lod: lod_xattr_del() check obj existence
Lai Siyao [Wed, 10 Mar 2021 10:13:18 +0000 (18:13 +0800)]
LU-14504 lod: lod_xattr_del() check obj existence

lod_declare_xattr_del() skips object if it doesn't exist, but
lod_xattr_del() doesn't, which may trigger assertion in
osp_xattr_del() if a stripe doesn't exist.

Lustre-change: https://review.whamcloud.com/41976
Lustre-commit: c8d81a1d1d82fede40ae95924aca12bc5e55426d

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I00723d3b0243efd1357107c59dd86967e076e2af
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/42047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14606 llog: hide ENOENT for cancelling record
Alexander Boyko [Mon, 12 Apr 2021 12:19:47 +0000 (08:19 -0400)]
LU-14606 llog: hide ENOENT for cancelling record

Llog allows parallel records processing. A record could be cancelled
at callback. If two threads processing and cancelling the same record,
one thread would get ENOENT.
The error was observed during purging changlog records.The patch
adds reproducer test sanity 160m.

This is a valid case, let's hide ENOENT error from a caller.

Lustre-change: https://review.whamcloud.com/43264
Lustre-commit: 0b60647c0382426e3b4105d82d04862d2e4831cb

HPE-bug-id: LUS-9826
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Id00b959e6f329c2ad34966f8a17a52f71680f24c
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44333
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14778 readahead: fix to reserve min pages
Wang Shilong [Tue, 22 Jun 2021 01:26:40 +0000 (09:26 +0800)]
LU-14778 readahead: fix to reserve min pages

@pages_min might be larger than @pages which indicate
more pages should be read, and it will cause a warning
later.

Lustre-change: https://review.whamcloud.com/44050
Lustre-commit: 4fc127428f00d6a3b179a143a61ddc78e5d8ca7c

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ifd82f709c3877172f08b87ab0551da735a0613e0
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/44287
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14549 llite: refresh layout after mirror merge/split
Bobi Jam [Mon, 17 May 2021 09:14:33 +0000 (17:14 +0800)]
LU-14549 llite: refresh layout after mirror merge/split

mirror merge/split updates file's LOVEA and revokes client's layout
lock, but the client issuing the layout change needs to refresh its
layout (lov->lsm) as well.

Lustre-change: https://review.whamcloud.com/43716
Lustre-commit: bd7a20f8be4644ebce6a7225560ae933204f543d

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I7671efe2fe5354ba0e1503b146045165608e042c
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44252
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14119 lfsck: check linkea if it's newly added
Lai Siyao [Thu, 14 Jan 2021 09:14:01 +0000 (17:14 +0800)]
LU-14119 lfsck: check linkea if it's newly added

In LFSCK phase one, if new linkea entry is added, and final linkea
entry count is more than one, add file in trace file, so that the
linkea sanity will be checked in phase two.

And in phase two check, if link parent FID can't be mapped to valid
inode, remove it from linkea.

Add sanity-lfsck 1d, which changed parent directory FID in LMA,
therefore the FID in LMA mismatches with parent FID in child linkea,
verify LFSCK can fix such inconsistency.

Lustre-change: https://review.whamcloud.com/41261
Lustre-commit: afd00cacd0b6ef87282887b4e965350a9c1a6821

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I315983d262110c1e36c3893fa2e51925d96c51a7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44237
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14734 osd-ldiskfs: enable large_dir automatically
Andreas Dilger [Sat, 5 Jun 2021 08:34:15 +0000 (02:34 -0600)]
LU-14734 osd-ldiskfs: enable large_dir automatically

Enable the large_dir feature automatically at mount time for
filesystems that do not have it enabled already.  Otherwise,
the REMOTE_PARENT_DIR may overflow if there are many remote
entries created, or for object directories on very large OSTs.
It isn't really needed on a dedicated MGS filesystem.

Lustre-change: https://review.whamcloud.com/43931
Lustre-commit: 0f6ace6e8edef1c08c8ef3785e9c08d21a72b34a

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1c4ead26b09d60567ad12945d7b366b53475cebb
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14648 lod: protect lod_object layout info
Bobi Jam [Wed, 12 May 2021 08:18:00 +0000 (16:18 +0800)]
LU-14648 lod: protect lod_object layout info

Need to protect lod_object's layout access with ldo_layout_mutex.

Lustre-commit: 25aa8527374f8120c113dc12adb1366a1ab98152
Lustre-change: https://review.whamcloud.com/43671

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I2c4a2078bdce64d15485d3ff18f6670d42ca90ba
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44256
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14663 mdc: start changelog thread upon first access
Alex Zhuravlev [Sun, 2 May 2021 09:16:01 +0000 (12:16 +0300)]
LU-14663 mdc: start changelog thread upon first access

thus leaving the caller a chance to set CHANGELOG_FLAG_FOLLOW,
otherwise the thread (started from open()) can reach the end
of the changelog and exit early.

Lustre-change: https://review.whamcloud.com/43513
Lustre-commit: 72a08ea547dceb542d554e9057e0ed257138bd48

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ic14b6c991010bbe5197b5a8b0fedf0f4007e98c1
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44263
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14762 lmv: compare space to mkdir on parent MDT
Lai Siyao [Mon, 14 Jun 2021 07:26:47 +0000 (15:26 +0800)]
LU-14762 lmv: compare space to mkdir on parent MDT

In QOS subdirectory creation, subdirectories are kept on parent MDT
if it is less full than average, however it checks weight other than
free space, while "weight = free space - penalty", if MDTs have
different penalties, the result is not accurate, therefore this may
not work.

Check free space instead, and loosen the critirion to allow the
free space within the range of QOS threshold.

Lustre-change: https://review.whamcloud.com/43997
Lustre-commit: 002c2a80266b23c1df02d554fbdc7e5817c42d13

Fixes: 3f6fc483013d ("LU-13439 lmv: qos stay on current MDT if less full")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id34cf8f3f58fee9d329f0d05c2f7a6463b67dfe1
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44314
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-7853 lod: fixes bitfield in lod qos code
Rahul Deshmkuh [Thu, 14 Jul 2016 06:02:45 +0000 (23:02 -0700)]
LU-7853 lod: fixes bitfield in lod qos code

Updating bitfields in struct lod_qos struct is protected
by lq_rw_sem in most places but an update can be lost
due unprotected bitfield access from
lod_qos_thresholdrr_seq_write() and qos_prio_free_store().
This patch fixes it by replacing bitfields with named bits
and atomic bitops.

Lustre-change: https://review.whamcloud.com/18812
Lustre-commit: 3bae39f0a5b98a279fb5f7b8d00211ac0d09366f

Cray-bug-id: LUS-4651
Signed-off-by: Rahul Deshmukh <rahul.deshmukh@seagate.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: I28299ce4960e91be551d7f6e43a3b598daf4d7a2
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/44313
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14031 ptlrpc: decrease time between reconnection
Alexander Boyko [Wed, 14 Oct 2020 08:20:58 +0000 (04:20 -0400)]
LU-14031 ptlrpc: decrease time between reconnection

When a connection get a timeout or get an error reply from a sever,
the next attempt happens after PING_INTERVAL. It is equal to
obd_timeout/4. When a first reconnection fails, a second go to
failover pair. And a third connection go to a original server.
Only 3 reconnection before server evicts client base on blocking
ast timeout. Some times a first failed and the last is a bit late,
so client is evicted. It is better to try reconnect with a timeout
equal to a connection request deadline, it would increase a number
of attempts in 5 times for a large obd_timeout. For example,
    obd_timeout=200
     - [ 1597902357, CONNECTING ]
     - [ 1597902357, FULL ]
     - [ 1597902422, DISCONN ]
     - [ 1597902422, CONNECTING ]
     - [ 1597902433, DISCONN ]
     - [ 1597902473, CONNECTING ]
     - [ 1597902473, DISCONN ] <- ENODEV from a failover pair
     - [ 1597902523, CONNECTING ]
     - [ 1597902539, DISCONN ]

The patch adds a logic to wakeup pinger for failed connection request
with ETIMEDOUT or ENODEV. It adds imp_next_ping processing for
ptlrpc_pinger_main() time_to_next_wake calculation, and fixes setting
of imp_next_ping value.

Lustre-commit: de8ed5f19f04136a4addcb3f91496f26478d03e7
Lustre-change: https://review.whamcloud.com/40244

HPE-bug-id: LUS-8520
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ia0891a8ead1922810037f7d71092cd57c061dab9
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-on: https://review.whamcloud.com/44251
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14603 ptlrpc: quiet messages for unsupported opcodes
Andreas Dilger [Sun, 11 Apr 2021 02:04:30 +0000 (20:04 -0600)]
LU-14603 ptlrpc: quiet messages for unsupported opcodes

Reduce message spew for unhandled RPC opcodes.

Lustre-change: https://review.whamcloud.com/43257
Lustre-commit: a23767580aebfab7f093df562ac7598e85b71b3e

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I35496168e3aa29ecb06076654ef0aa97ba2540e5
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-on: https://review.whamcloud.com/44249
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14541 llite: avoid stale data reading
Wang Shilong [Wed, 28 Apr 2021 14:26:10 +0000 (22:26 +0800)]
LU-14541 llite: avoid stale data reading

remove_mapping() can prohibit to kill page from page cache due page
refcount!=2, in vvp_page_delete() clear uptodate flag in case
stale data reading later.

Lustre-change: https://review.whamcloud.com/43476
Lustre-commit: TBD (from e6033b193e8d35e689b7c2860374c8b2d2b7a5ee)

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I322debec951b1a342246475456c0f40e10b0e578
Reviewed-on: https://review.whamcloud.com/44291
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14616 readahead: export pages directly without RA
Wang Shilong [Fri, 16 Apr 2021 02:04:17 +0000 (10:04 +0800)]
LU-14616 readahead: export pages directly without RA

With Readahead disabled, @vpg_defer_uptodate should not
be set as we don't reserve credits for such read.
In vvp_page_completion_read() we will call ll_ra_count_put()
which makes @ra_cur_pages negative.

Lustre-change: https://review.whamcloud.com/43338
Lustre-commit: 9f1c0bfd10d619a3755c3b22b1dd95a593720ce9

Fixes: 7e8efb339b ("LU-12043 llite: fix to submit complete read block with ra disabled")
Change-Id: I1c9134f5972aa0d0e7aac998f02c690cc55b433b
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/44246
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14618 lov: correctly handling sub-lock init failure
Bobi Jam [Fri, 16 Apr 2021 15:56:01 +0000 (23:56 +0800)]
LU-14618 lov: correctly handling sub-lock init failure

In lov_lock_sub_init(), if a sublock initialization fails, it needs to
bail out of the outer loop as well as the inner one.

Lustre-change: https://review.whamcloud.com/43345
Lustre-commit: 1a5169f9962e254ed4225fe35e8ee6cb6ff7a7f6

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ic4e16f484a0a64c670eea5d47054bac19bc95144
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44245
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14031 ptlrpc: remove unused code at pinger
Alexander Boyko [Wed, 14 Oct 2020 07:45:21 +0000 (03:45 -0400)]
LU-14031 ptlrpc: remove unused code at pinger

The timeout_list was previously used for grant shrinking,
but right now is dead code.

Lustre-change: https://review.whamcloud.com/40243
Lustre-commit: f02266305941423a10e8e6ec33a5865e24c18432

HPE-bug-id: LUS-8520
Fixes: fc915a43786e ("LU-8708 osc: depart grant shrinking from pinger")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ia7a77b4ac19da768ebe1b0879d7123941f4490b5
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44250
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14627 lnet: Allow delayed sends
Chris Horn [Wed, 21 Apr 2021 19:22:46 +0000 (14:22 -0500)]
LU-14627 lnet: Allow delayed sends

The net_delay_add has some code related to delaying sends, but it
isn't fully implemented. Modify lnet_post_send_locked() to check
whether the message being sent matches a rule and should be delayed.

Fix some bugs with how the delay timers were set and checked.

Lustre-change: https://review.whamcloud.com/43416
Lustre-commit: ab14f3bc852e708100d21770c00235f95841708a

HPE-bug-id: LUS-7651
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Icbd9ee81d2ff0162a01a4187807ea2114a42276d
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44254
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14540 o2iblnd: Use REMOTE_DROPPED for ECONNREFUSED
Chris Horn [Fri, 19 Mar 2021 18:22:26 +0000 (13:22 -0500)]
LU-14540 o2iblnd: Use REMOTE_DROPPED for ECONNREFUSED

ECONNREFUSED means that we received a response from the remote end,
so setting the LNet health status to REMOTE_DROPPED is more
appropriate than setting LOCAL_DROPPED. Using REMOTE_DROPPED will
decrement the peer NI health and allow us to try other peer NIs for
future sends.

Decrementing the peer NI health will also result in routes being
marked down, as appropriate, for cases where a router has refused the
connection request.

Lustre-commit: f9d837b479232bfc4f271f23cd3729ca67cb6c1d
Lustre-change: https://review.whamcloud.com/42114

Test-Parameters: trivial
HPE-bug-id: LUS-9853
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8190f5d78a76ec25553908c4f215362c0c2051fc
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/44248
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14537 mdd: directory migrate skips project ID check
Lai Siyao [Sat, 13 Mar 2021 15:48:54 +0000 (23:48 +0800)]
LU-14537 mdd: directory migrate skips project ID check

mdd_migrate_sanity_check() used to call mdd_rename_sanity_check(),
while the latter checks parent and sub file project ID, which is
redundant for migration because it's an internal layout change.

Add sanity 230t.

Lustre-change: https://review.whamcloud.com/42110
Lustre-commit: d32beef5ddde49362a849efabf514206a1adf6c4

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If5ac2131acb1dfb30a312dc34052287776f581c7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44247
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-2921 lipe: merge lipe changes from b_es5_2
John L. Hammond [Thu, 15 Jul 2021 14:34:59 +0000 (09:34 -0500)]
EX-2921 lipe: merge lipe changes from b_es5_2

Merge commit 'b6fee2b803e7ae817f52911c9475e21227260d1d' into b_es6_0

$ git checkout b_es5_2
$ git subtree split --prefix=lipe
39ab2a09b51500ad850472110b31fa42576e1551
$ git checkout b_es6_0
$ git subtree merge --prefix=lipe --squash 39ab2a09b51500ad850472110b31fa42576e1551

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ie1cde4bb85d6c80882e0479e85990b9e71c2da35

3 years agoSquashed 'lipe/' changes from 8251fae..39ab2a0
John L. Hammond [Thu, 15 Jul 2021 14:25:52 +0000 (09:25 -0500)]
Squashed 'lipe/' changes from 8251fae..39ab2a0

39ab2a0 EX-3379 lipe: update EXT2_ET_EA_NAME_NOT_FOUND kluge
7c8d939 EX-3388 lipe: use correct timestamps
e1ec215 Update lipe version to 1.18.
38a55f7 EX-3058 lamigo: drop NO_ACCT tag upon replication completion
79c9922 EX-3138 lipe: lpurge slot mutex handling
d0ddcc9 EX-3100 lamigo: check rj_agent pointer
587b1ef EX-3030 lamigo: more stats
87fd5d6 EX-3046 lipe: remove IML sockets
954971d EX-2659 tests: add sanity-lipe.sh to test LiPE utilities

git-subtree-dir: lipe
git-subtree-split: 39ab2a09b51500ad850472110b31fa42576e1551

3 years agoLU-13799 osc: Improve osc_queue_sync_pages
Patrick Farrell [Tue, 15 Jun 2021 14:23:04 +0000 (10:23 -0400)]
LU-13799 osc: Improve osc_queue_sync_pages

This patch was split and partially done in:
https://review.whamcloud.com/38214

So the text below refers to the combination of this patch
and that one.  This patch now just improves a looped atomic
add by replacing with a single one.  The rest of the grant
calcuation change is in
https://review.whamcloud.com/38214

(I am retaining the text below to show the performance
improvement)
----------
osc_queue_sync_pages now has a grant calculation component,
this has a pretty painful impact on the new faster DIO
performance.  Specifically, per page ktime_get() and the
per-page atomic_add cost close to 10% of total CPU time in
the DIO path.

We can make this per batch of pages rather than for each
page, which reduces this cost from 10% of CPU to almost
nothing.

This improves write performance by about 10% (but has no
effect on reads, since they don't use grant).

This patch reduces i/o time in ms/GiB by:
Write: 10 ms/GiB
Read: 0 ms/GiB

Totals:
Write: 158 ms/GiB
Read: 161 ms/GiB

mpirun -np 1 $IOR -w -t 1G -b 64G -o $FILE --posix.odirect

Before patch:
write     6071

After patch:
write     6470

(Read is similar.)

This also fixes a mistake in c24c25dc1b / LU-13419 where it
removed the shrink interval update entirely from the direct
i/o path.

Lustre-change: https://review.whamcloud.com/39482
Lustre-commit: TBD (from ad6f5a41a14d6017e657d5c337fa24e96252bbeb)

Fixes: c24c25dc1b ("LU-13419 osc: Move shrink update to per-write")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Ic606e03be58239c291ec0382fa89eba64560da53
Reviewed-on: https://review.whamcloud.com/44270
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-3480 pcc: set return code with errno
Qian Yingjin [Wed, 14 Jul 2021 08:41:02 +0000 (16:41 +0800)]
EX-3480 pcc: set return code with errno

In liblustreapi_pcc.c, it should set errno on error return.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ibc80ea7a593f153744f10483720db9cb79ef060a
Reviewed-on: https://review.whamcloud.com/44302
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13419 osc: Move shrink update to per-write
Patrick Farrell [Mon, 13 Apr 2020 16:23:42 +0000 (11:23 -0500)]
LU-13419 osc: Move shrink update to per-write

Updating the grant shrink interval is currently done for
each page submitted, rather than once per write.  Since
the grant shrink interval is in seconds, this is
unnecessary.

This came up because this function showed up in the perf
traces for https://review.whamcloud.com/#/c/38151/, and
it is called with the cl_loi_list_lock held.

Note that this change makes this access to the grant shrink
interval a 'dirty' access, without locking, but the grant
shrink interval is:
A) Already accessed like this in various places, and
B) can safely be out of date or suffer a lost update
without affecting correctness or performance.

IOR performance testing with this test:
mpirun -np 36 $IOR -o $LUSTRE -w -t 1M -b 2G -i 1 -F

No patches:
5942 MiB/s
With 38151:
14950 MiB/s
With 38151+this:
15320 MiB/s

Lustre-change: https://review.whamcloud.com/38214
Lustre-commit: c24c25dc1b84912063f79e44602526c482ca0479

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I8110b3c2570c183d58be2bccdbf76813ea3e373a
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44266
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-10350 lod: adjust stripe count to available ost count
Bobi Jam [Fri, 28 May 2021 08:25:52 +0000 (16:25 +0800)]
LU-10350 lod: adjust stripe count to available ost count

* When user specifies -1 stripe count or more stripe count than the
  ost count of a pool, we'd adjust the stripe count otherwise we
  cannot alloc enough stripe objects, as LOD reports as follows:

  lod_alloc_specific() can't lstripe objid [obj_fid]: have %d want %u

  where %d is the ost count of a pool, and %u is the total ost count
  if user specifies -1 stripe count of a bigger stripe count value
  than %d as user specifies.

* In ost-pool.sh, reset $MOUNT's stripe offset, so that the created
  diretory will not inherit it from root directory.

* Preserve the root directory layout in replay-single (run before
  ost-pools) to avoid leaving a bad layout on the root dir.
  Lustre-change: https://review.whamcloud.com/43872

Lustre-change: https://review.whamcloud.com/43882
Lustre-commit: f430ec079bf882744729d7aabc2021dfd26aba0c

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idf6884faf1271a3864710aeab0ba0eca154bf492
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/44259
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-14119 mdc: set fid2path RPC interruptible
Lai Siyao [Wed, 13 Jan 2021 09:29:50 +0000 (17:29 +0800)]
LU-14119 mdc: set fid2path RPC interruptible

Sometimes OI scrub can't fix the inconsistency in FID and name, and
server will return -EINPROGRESS for fid2path request. Upon such
failure, client will keep resending the request. Set such request
to be interruptible to avoid deadlock.

Lustre-change: https://review.whamcloud.com/41219
Lustre-commit: bf475262610671534b1b1a33cebb49d8380b74f7

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I82192cb8a8256064ca632cabfe5581b12e86423b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44229
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14741 obdclass: Wake up queue of reqs on close completion
Oleg Drokin [Mon, 7 Jun 2021 19:17:27 +0000 (15:17 -0400)]
LU-14741 obdclass: Wake up queue of reqs on close completion

Origin title:
LU-14741 obdclass: Wake up entire queue of requests on close
completion

Since close requests could be stuck behind normal requests and get
more slots we need to wake up entire accumulated queue waiting
for the next modrpc slot or have additional waitqueue just for
close requests.

This patch goes with the former approach.

Lustre-change: https://review.whamcloud.com/43941
Lustre-commit: a4e1567d67559b797a5c24ee0bfbca4a52649c47

Fixes: 1fc013f901 ("LU-5319 mdc: manage number of modify RPCs in flight")
Change-Id: Ib4333c7f6731dd435364d5e5f529577a1600a235
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/44288
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14711 tests: Ensure no eviction with long cache discard
Oleg Drokin [Sat, 29 May 2021 02:42:49 +0000 (22:42 -0400)]
LU-14711 tests: Ensure no eviction with long cache discard

Origin title:

LU-14711 tests: Ensure there's no eviction with long cache discard

Just pause execution while doing page processing
for discard if appropriate failloc is set.

Lustre-change: https://review.whamcloud.com/43869
Lustre-commit: TBD (from 3323b40668cddaa1ac6f6644436bd305c189c5ac)

Change-Id: If0d04f3cad267cbeeab63040d63e048dcf03cd6b
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Test-Parameters: trivial testlist=sanity env=ONLY=903
Reviewed-on: https://review.whamcloud.com/44286
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoDDN-2042 bio: allow BIO integrity to run on any core
Andreas Dilger [Fri, 7 May 2021 21:03:18 +0000 (15:03 -0600)]
DDN-2042 bio: allow BIO integrity to run on any core

Unbind the bio integrity workqueue so that it can run on any available
core in the system, to improve concurrency for this CPU-bound task if
there are multiple requests being submitted from a single thread.

This is done in the same way in dm-verity-target, which has a similar
CPU profile to bio-integrity.

Update kernel build version to -ddn14.

Test-Parameters: trivial
Reported-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia299e52db9b0995c8f48372782882324ba3ebbe5
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/44240
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14616 readahead: fix reserving for unaliged read
Wang Shilong [Tue, 20 Apr 2021 01:47:25 +0000 (09:47 +0800)]
LU-14616 readahead: fix reserving for unaliged read

If read is [2K, 3K] on x86 platform, we only need
read one page, but it was calculated as 2 pages.

This could be problem, as we need reserve more
pages credits, vvp_page_completion_read() will only
free actual reading pages, which cause @ra_cur_pages
leaked.

Lustre-change: https://review.whamcloud.com/43377/
Lustre-commit: 5e7e9240d27a4b74127ea7a26d910ae41a6e1cb1

Fixes: d4a54de84c0 ("LU-12367 llite: Fix page count for unaligned reads")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I3cf03965196c1af833184d9cfc16779f79f5722c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/44239
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14119 lfsck: replace dt_lookup() with dt_lookup_dir()
Lai Siyao [Wed, 13 Jan 2021 09:16:55 +0000 (17:16 +0800)]
LU-14119 lfsck: replace dt_lookup() with dt_lookup_dir()

Lfsck code calls dt_lookup() to lookup sub file under directory in
many places, but this function needs to to initialize directory with
dt_try_as_dir() first, while it's missing in several places, since
the overhead is trivial, call dt_lookup_dir() instead.

Lustre-change: https://review.whamcloud.com/41218
Lustre-commit: d525ad4bd0d5d851405e4249859a1c77378f0ee3

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I40bd8d51edece50353af1729cf867572a0abea78
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44228
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14322 tests: skip sanityn 51e for old servers
James Nunez [Thu, 10 Jun 2021 18:54:51 +0000 (12:54 -0600)]
LU-14322 tests: skip sanityn 51e for old servers

sanityn test 51e was added to Lustre version 2.13.54.148.
When we run version interop testing with servers less than
this version, the test will fail.

We should skip sanityn test 51e if the server version is
less than 2.13.54.148.

Lustre-change: https://review.whamcloud.com/43969
Lustre-commit: decdd03cdccdbfdd35f31317c617698198e5ea42

Fixes: 3ea729fe82 ("LU-13693 lfs: check early for MDS_OPEN_DIRECTORY")
Test-Parameters: trivial
Test-Parameters: serverversion=2.12.6 serverdistro=el7.9 env=ONLY=51e testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Id2f165b275c97c3a1396a0da18a3f254dbe5efa7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14622 osd: mark pages accessed on reads
Alex Zhuravlev [Mon, 19 Apr 2021 06:10:17 +0000 (09:10 +0300)]
LU-14622 osd: mark pages accessed on reads

to improve cache hit ratio

Lustre-change: https://review.whamcloud.com/43367
Lustre-commit: 7a2011a4ecee773a5f8064e1e00d441f73aa5b15

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If4850465d118ed62e9da105dc0cf144ff5041fd3
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44265
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14711 osc: Notify server if cache discard takes a long time
Oleg Drokin [Fri, 28 May 2021 02:34:44 +0000 (22:34 -0400)]
LU-14711 osc: Notify server if cache discard takes a long time

Discarding a large number of pages from a mapping under a
single lock can take a really long time (750GB is over 170s).
Since there is no stream of RPCs sent to the server as with
read or write to prolong the DLM lock timeout, the server
may evict the client as it does not see progress is being made.

As such send periodic "empty" RPCs to the server to show the
client is still alive and working on the pages under the lock.

For compatibility reasons the RPC is formed as a one-byte
OST_READ request with a special flag set to avoid doing
actual IO, but older servers actually do the one-byte read

Lustre-change: https://review.whamcloud.com/43857
Lustre-commit: 564070343ac4ccf4f97843009e1c36f5130ac19c

Change-Id: I4603c83e92c328d93e29adce8cbfac3d561b25d5
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/44285
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14721 tests: wait_destroy_complete should check MDTs
Oleg Drokin [Sat, 29 May 2021 03:45:20 +0000 (23:45 -0400)]
LU-14721 tests: wait_destroy_complete should check MDTs

Ever since destroys handling was moved to MDTs we need to
move waiting for destroys completion to MDTs as well.

Lustre-change: https://review.whamcloud.com/43870
Lustre-commit: 3f8e14163a57ddf51047efc1c0a1b9c15631e2b4

Change-Id: I31440ec048b960206a903387d7050aa13e45008d
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44284
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-3478 pcc: avoid uninitialized pcc mutext lock in cleanup
Qian Yingjin [Wed, 14 Jul 2021 07:27:19 +0000 (15:27 +0800)]
EX-3478 pcc: avoid uninitialized pcc mutext lock in cleanup

Running racer concurrently crashed in the following way:
  RIP: 0010:[...]  [...] __list_add+0x1b/0xc0
  __mutex_lock_slowpath+0xa6/0x1d0
  mutex_lock+0x1f/0x2f
  pcc_inode_free+0x1e/0x60 [lustre]
  ll_clear_inode+0x64/0x6a0 [lustre]
  ll_delete_inode+0x5d/0x220 [lustre]
  evict+0xb4/0x180
  iput+0xfc/0x190
  ll_iget+0x156/0x350 [lustre]
  ll_prep_inode+0x212/0x9b0 [lustre]

After analysis, we found that the mutex @lli_pcc_lock is not
initialized. The reason is that ll_lli_init() is not called to
initialize @lli.
When call pcc_inode_free(), it will call mutex_lock() on the
uniniitialized @lli_pcc_lock, thus crash the kernel.

Test-Parameters: testlist=racer env=DURATION=3600
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I612c79a5b8eb4fa9daeb9e446a457e95c666c04a
Reviewed-on: https://review.whamcloud.com/44300
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14826 mdt: getattr_name("..") under striped directory
Lai Siyao [Thu, 8 Jul 2021 14:25:51 +0000 (10:25 -0400)]
LU-14826 mdt: getattr_name("..") under striped directory

For getattr_name(".."), it should return FID of the master object for
striped directories. This includes changes on both client and server:
* lmv_getattr_name() should use master object FID if it's looking up
  "..".
* mdt_raw_lookup() should check parent object is sub stripe, if so
  it needs to lookup again to get master object FID. For old client
  without above change this needs to be checked twice.

This is needed by NFS export, because ll_get_parent() find parent by
getattr_name("..").

Reenable check_fhandle_syscall and update sanityn test_102.

Lustre-change: https://review.whamcloud.com/44168
Lustre-commit: TBD (from 1c4ab69260220be049645b4a38d06a671d21d752)

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I72c951293e41656ce3778750147402d7f8ca4cec
Reviewed-on: https://review.whamcloud.com/44289
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14627 lnet: Ensure ref taken when queueing for discovery
Chris Horn [Thu, 22 Apr 2021 19:51:44 +0000 (14:51 -0500)]
LU-14627 lnet: Ensure ref taken when queueing for discovery

Call lnet_peer_queue_for_discovery() in
lnet_discovery_event_handler() to ensure that we take a ref on
the peer when forcing it onto the discovery queue. This also ensures
that the peer state has LNET_PEER_DISCOVERING.

Add a test to sanity-lnet.sh that can trigger the refcount loss bug
in discovery.

Lustre-change: https://review.whamcloud.com/43418
Lustre-commit: 2ce6957b69370b0ce75725d1d91866bf55c07fa8

HPE-bug-id: LUS-7651
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie2908668c4ffde0f993b5b7ea9aa58acd1d6fa9c
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-on: https://review.whamcloud.com/44272
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14327 tests: skip sanity-sec test 55 for older servers
James Nunez [Tue, 8 Jun 2021 16:34:29 +0000 (10:34 -0600)]
LU-14327 tests: skip sanity-sec test 55 for older servers

sanity-sec test 55 was added to lustre-master version
2.13.57.12 and to lustre-b2_12 version 2.12.6.3.  When
we run version interop testing with Lustre servers less
than these versions, the test will fail.  Thus, skip
sanity-sec test 55 for Lustre severs less than 2.12.6.3.

Lustre-change: https://review.whamcloud.com/43950
Lustre-commit: 34c5a9e1ec2a82b10f9e85bc54cb2a48da0d5037

Fixes: 355787745f21 (“LU-14121 nodemap: do not force fsuid/fsgid squashing”)
Test-Parameters: trivial
Test-Parameters: serverversion=2.12.6 serverdistro=el7.9 env=ONLY=55 testlist=sanity-sec
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ie002c921e853897105396185b38485799df31b7a
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44278
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14533 tests: skip sanity-pfl 0d for older servers
James Nunez [Thu, 10 Jun 2021 21:05:18 +0000 (15:05 -0600)]
LU-14533 tests: skip sanity-pfl 0d for older servers

sanity-pfl test 0d was added in commit v2_14_0-39-gd645373541.
When we run version interop testing with servers with
version less than this, the test will fail.

We should skip sanity-pfl test 0d if the Lustre server
version is less than 2.14.0.1.

Lustre-change: https://review.whamcloud.com/43971
Lustre-commit: 06d2128b3bd0a3ec45e5d54cad29c310ae3ded7c

Fixes: 83e38bba62 ("LU-14180 utils: verify setstripe comp_end is valid")
Test-Parameters: trivial testlist=sanity-pfl
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I49b45c7a1e4804fece33d53a4fb946b49254de2b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44274
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
3 years agoLU-13716 tests: skip sanity 205b for older servers
James Nunez [Sat, 12 Jun 2021 00:05:08 +0000 (18:05 -0600)]
LU-13716 tests: skip sanity 205b for older servers

Lustre job stats and sanity test 205b were modified in Lustre
version 2.13.54.91.  When we run version intop testing with
servers less than this version and clients that are greater,
the test will fail.

Skip sanity test 205b for Lustre servers with version less than
2.13.54.91 and client greater than that version.

Lustre-change: https://review.whamcloud.com/43993
Lustre-commit: 1ce4d064801f30d3498d7e2c55ef3e699e4ef585

Test-Parameters: trivial
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Icc5d6a6adcf03e5bd16b678596f28590fe31516e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44271
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14119 osd: add mount option "resetoi"
Lai Siyao [Wed, 3 Feb 2021 03:44:15 +0000 (11:44 +0800)]
LU-14119 osd: add mount option "resetoi"

OI files on zfs are special, and they can't be deleted by user space
tools like rm. Sometimes the OI files may contain stale OI mappings,
and they needed to be removed for namespace consistency. Add a mount
option 'resetoi' to recreate OI files on mount time, and it will
support both ldiskfs and zfs. This should be the standard way to
recreate OI files, other than mount as backend filesystem and unlink
them manually.

Add sanity-scrub 18.

Lustre-change: https://review.whamcloud.com/41402
Lustre-commit: f37bce8a573dfc5aac1b9f51f4d5c8314ba05d30

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Idc0e4c2f3b81675c49c6c005bc30b61d8fd04503
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44232
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14779 utils: no DNS lookups for NID in get_param
Andreas Dilger [Wed, 23 Jun 2021 08:20:24 +0000 (02:20 -0600)]
LU-14779 utils: no DNS lookups for NID in get_param

Calling libcfs_str2nid() speculatively in "lctl get_param" to see
if there is a NID in the parameter name results in multiple DNS
lookups for invalid hostnames (e.g. "exports.192.168.0.10"). That
may take a very long time if there are a large number of connected
clients, and if the DNS server overloaded or is having problems.

Instead of doing these speculative NID conversions, skip the whole
NID string in the parameter name for the two known parameters that
may contain a NID ("*.exports.<NID>.*" and "*.MGC<NID>.*").  This
is considerably faster since it is only working on a local string.

If new parameters are added that contain a NID (unlikely, but
possible), then "clean_path()" would need to be updated as part
of that change.

Lustre-change: https://review.whamcloud.com/44056
Lustre-commit: f21c507fbca2afab5a5d97d4e816696a69d1c593

Test-Parameters: trivial
Fixes: 85cbe1a3ee69 ("LU-5030 util: migrate lctl params functions to use cfs_get_paths()")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I51f865e4ce3a7bc4879f9d688c4b3a68d731810f
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44281
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-2952 utils: add libzpool
Alex Zhuravlev [Thu, 1 Apr 2021 07:43:17 +0000 (10:43 +0300)]
EX-2952 utils: add libzpool

as libmount_utils uses get_system_hostid() provided by libzpool

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id07de07c9e0bb0efb598ccf8e7abcb389a875318
Reviewed-on: https://review.whamcloud.com/43193
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44236

3 years agoLU-14175 osd: print inode number with FID in OI scrub
Andreas Dilger [Sat, 27 Mar 2021 19:50:33 +0000 (13:50 -0600)]
LU-14175 osd: print inode number with FID in OI scrub

When debugging OI Scrub problems, also print the inode number
with the FID so that it is easier to find the problematic inode.
Otherwise, if the OI is broken it is not easy to find the inode
in question without a full filesystem scan.

Lustre-change: https://review.whamcloud.com/43153
Lustre-commit: 5bab4acf8320b46076c81f32f7954f91dae21bc9

Test-Parameters: trivial testlist=sanity-scrub,sanity-lfsck
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I217624ff2116326f86e053bcfacc6f19873ebbe5
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44235
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-6142 mdc: include linux/idr.h for referenced code
Andreas Dilger [Fri, 16 Apr 2021 06:03:00 +0000 (00:03 -0600)]
LU-6142 mdc: include linux/idr.h for referenced code

Include the <linux/idr.h> header in files that references IDR
functionality.  Don't depend on its indirect inclusion elsewhere.

Lustre-change: https://review.whamcloud.com/43346
Lustre-commit: 3589a3141a4b9f94887b3ac5d6202233b06b8996

Test-Parameters: trivial
Fixes: 66172e3274ca ("LU-13238 ofd: add OFD access logs")
Fixes: d0423abc1adc ("LU-12506 changelog: support large number of MDT")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icdd03e15d31eabc4a1363d1757fc4db7723ebbe5
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/44233
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14119 osd: delete stale OI mapping entry
Lai Siyao [Wed, 24 Feb 2021 03:31:06 +0000 (11:31 +0800)]
LU-14119 osd: delete stale OI mapping entry

Once LMA check shows OI mapping entry is stale, delete it from
OI table, as can avoid removing whole OI files.

Don't add OI mapping into cache until osd_fid_lookup(), because
the mapping in OI is not trustable until FID in LMA is checked,
otherwise it may mislead LFSCK.

Lustre-change: https://review.whamcloud.com/41741
Lustre-commit: 99d00b97ef5f209a002f250e7772055ff1a6d6d6

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I4b50dcc02149d485e4bf4a361ca2994daa280feb
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44231
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14119 osd-zfs: enable LUDA_VERIFY
Lai Siyao [Tue, 19 Jan 2021 13:37:50 +0000 (21:37 +0800)]
LU-14119 osd-zfs: enable LUDA_VERIFY

In osd_dir_it_rec(), if dirent is successfully got, and the FID in
dirent is sane, it returns right away, however if
LUDA_VERIFY|LUDA_VERIFY_DRYRUN is set, the FID in dirent should be
compared with the FID in LMA, and replaced with the latter one if
they are differet.

Lustre-change: https://review.whamcloud.com/41274
Lustre-commit: f5136e81957e4b67ae6ed7764d378b817fac5ee2

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I35e2a4d4606044cd37cc5847cffc577740918988
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44230
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14204 tests: make sure we have a single import
Sebastien Buisson [Wed, 9 Dec 2020 17:53:12 +0000 (18:53 +0100)]
LU-14204 tests: make sure we have a single import

In sanity, retrieve the exact name of the import being used on the
client, in order to properly get information such as lock_count
or lru_size.

Lustre-change: https://review.whamcloud.com/41758
Lustre-commit: 9bbc45d3f48acf79a1ad0a1161af832e040ee52f

Lustre-change: https://review.whamcloud.com/42019
Lustre-commit: 4541d5424ebcac028864f454af0e650f8ee9468b

Change-Id: I065b7da7990c7171d5baa24f3400c5f8ffc12fc9
Test-Parameters: trivial
Test-Parameters: env=SHARED_KEY=true testlist=sanity
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41855
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14048 obd: fix race between connect vs disconnect
Yang Sheng [Thu, 18 Feb 2021 15:22:09 +0000 (23:22 +0800)]
LU-14048 obd: fix race between connect vs disconnect

The export nid hash would be removed in class_disconnect, But
still a race window exists in target_handle_connect to add it back.
Then the process of cleanup will wait infinity.

Lustre-commit: 8081979e76b3c07f629e0943fcd6d8b0285719e3
Lustre-change: https://review.whamcloud.com/41687

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I9ad3edbd040b81e2aef7ae22494302d9a478d65b
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44226
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14575 ofd: suppress errors on missing parent FID
John L. Hammond [Wed, 31 Mar 2021 17:45:05 +0000 (12:45 -0500)]
LU-14575 ofd: suppress errors on missing parent FID

In ofd_access(), if the parent FID is zero then skip adding an entry
to the OFD access log.

Lustre-change: https://review.whamcloud.com/43184
Lustre-commit:4a6ed7d6351e4fffd8af5745bfe7cbb161c46858

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib518dc1f181a820d99021dd58ab548916e16f29d
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44225
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14566 lnet: Skip discovery in LNetPrimaryNID if DD disabled
Chris Horn [Fri, 26 Mar 2021 16:28:18 +0000 (11:28 -0500)]
LU-14566 lnet: Skip discovery in LNetPrimaryNID if DD disabled

If discovery is disabled locally then the discovery thread will not
modify any peer objects as a result of the discovery process. Thus,
the primary NID of any peer we're asked to discover will not change
as a result of discovery. Therefore, we do not need to actually
perform discovery in LNetPrimaryNID() if discovery is disabled
locally. Since this routine can result in long client mount times
when a Lustre server is down we should avoid this unnecessary
discovery.

Lustre-change: https://review.whamcloud.com/43141
Lustre-commit: 16264da9e3c43a6368a25b6ded4113e8cfa57427

Test-Parameters: trivial
HPE-bug-id: LUS-9887
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6d188e16422ad47a146d52bb24cdd1b77a30aa71
Reviewed-on: https://review.whamcloud.com/43141
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14536 obi2lnd: don't try to reconnect if there's no listener
Li Dongyang [Fri, 19 Mar 2021 10:21:58 +0000 (21:21 +1100)]
LU-14536 obi2lnd: don't try to reconnect if there's no listener

For each discovery we try to reconnect up to retry_count times,
default to 5. during MDT mount process conf log, there will be
multiple discovery made for each OST.
If the OSTs are not up, the mount will have a long time out.

Lustre-change: https://review.whamcloud.com/42111
Lustre-commit: 0ab06eb9d865a47ea3e09880a41a9e8f0a78b6a6

Change-Id: If1d854216d2f26089c52d3fb501092b7f48a444d
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44223
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14536 o2iblnd: don't resend if there's no listener
Li Xi [Tue, 13 Jul 2021 09:22:39 +0000 (17:22 +0800)]
LU-14536 o2iblnd: don't resend if there's no listener

If there's no listener at remote peer, we will
get IB_CM_REJ_INVALID_SERVICE_ID, currently we
will try to resend which makes the discovery longer
than necessary when connecting to a node which is
not up.
Use -EHOSTUNREACH instead of -ECONNREFUSED,
so we don't end up queued for resend.

Lustre-change: https://review.whamcloud.com/42109
Lustre-commit: 65e3e4050ec5bb371c1c343fca49a605286a086e

Change-Id: Ifaf14bc3ada2e2469669285917e366af669817e2
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44222
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-3046 lipe: remove IML sockets
John L. Hammond [Wed, 21 Apr 2021 15:28:26 +0000 (10:28 -0500)]
EX-3046 lipe: remove IML sockets

Remove the untested and unmaintained IML sockets from lamigo and
lpurge.

Lustre-change: https://review.whamcloud.com/43374
Lustre-commit: 1e58e57300484952da3e4e7801c8f0d1a9aa5b84

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I75346b7a1478b9f5902e4b139f2ee9e2fca67ffc
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44221
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-3468: Include mlnx-tools when we build MoFED
Gaurang Tapase [Tue, 13 Jul 2021 07:19:18 +0000 (12:49 +0530)]
EX-3468: Include mlnx-tools when we build MoFED

Test-Parameters: trivial

Change-Id: I388810611435ed080b951ff3abae966116bafdf8
Signed-off-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-on: https://review.whamcloud.com/44220
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoRM-620 build: New tag 2.14.0-ddn5
Andreas Dilger [Tue, 13 Jul 2021 21:42:25 +0000 (15:42 -0600)]
RM-620 build: New tag 2.14.0-ddn5

New tag 2.14.0-ddn5

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I53a25184ebf79bb984fa4ae4e40266d4634c4cb1

3 years agoEX-2723 kernel: fix potential infinite loop
Wang Shilong [Mon, 12 Apr 2021 08:43:22 +0000 (16:43 +0800)]
EX-2723 kernel: fix potential infinite loop

In dquot_writeback_dquots(), we write back dquot from dirty dquots
list. There is a potential infinite loop if ->write_dquot() failure
and forget remove dquot from the list. This patch clear dirty bit
anyway to avoid it.

Snapshot destroy might dirty quota list, and umount will hang if
filesystem has been mounted as RO because of corrupted image.

Linux-commit: dd5f6279732e8885061d7455b9d86fdcfdf7f183

Change-Id: If5e9db82eacc3a6a621566fb612b55071e51da25
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/43732
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-14430 mdd: use own buffer for changelog
Sebastien Buisson [Wed, 12 May 2021 06:55:29 +0000 (09:55 +0300)]
LU-14430 mdd: use own buffer for changelog

Use own persistent buffer for changelog needs to don't
interfere with generic big_buf in MDD thread info which
can be in use.

Lustre-change: https://review.whamcloud.com/43672
Lustre-commit: c352b89dc981e5ebe73c8bd2d9e0949094c828b2

Fixes: f3d03bc38a ("LU-14430 mdd: fix inheritance of big default ACLs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I4f4692b5556eaa98e2e23d7b58c925e33401e4e5
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/43731
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14673 sec: annotate algorithms taking optional key
Sebastien Buisson [Tue, 11 May 2021 08:59:03 +0000 (10:59 +0200)]
LU-14673 sec: annotate algorithms taking optional key

Crypto algorithms implementing a ->setkey() method but that can also
be used without a key must set the CRYPTO_ALG_OPTIONAL_KEY flag if
defined in the kernel.
In Lustre, adler32 implementation defines a ->setkey() method, but
its "key" is not actually a cryptographic key.

Linux-commit: a208fa8f33031b9e0aba44c7d1b7e68eb0cbd29e

Lustre-change: https://review.whamcloud.com/43656
Lustre-commit: b161e7b777e63bb4328aeab9e50560f919fedc31

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I362211d1b1aa3763fe1481cebb3629b255f29e41
Reviewed-on: https://review.whamcloud.com/43860
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14747 kernel: kernel update RHEL7.9 [3.10.0-1160.31.1.el7]
Jian Yu [Mon, 21 Jun 2021 18:43:24 +0000 (11:43 -0700)]
LU-14747 kernel: kernel update RHEL7.9 [3.10.0-1160.31.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.31.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I19c82c5b323ae5f097bc891f75beb69ecfea706d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44045
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1]
Jian Yu [Mon, 21 Jun 2021 18:53:02 +0000 (11:53 -0700)]
LU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1]

Update SLES12 SP5 kernel to 4.12.14-122.74.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: I98952c097b14c68f744a570e5558fb21d9392ad2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44046
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-14195 llite: add force_uaccess_{begin,end} helpers
Jian Yu [Tue, 6 Jul 2021 23:00:20 +0000 (16:00 -0700)]
LU-14195 llite: add force_uaccess_{begin,end} helpers

Linux kernel version 5.10 adds force_uaccess_begin() and
force_uaccess_end() helpers to wrap get_fs() and set_fs()
for undoing any damange done by set_fs(KERNEL_DS).

Lustre-change: https://review.whamcloud.com/44165
Lustre-commit: TBD (from 597a8a1e0c4c09b86b7d4e860cdcd6a3fedcb6dc)

Fixes: a84af70dcab ("LU-12358 pcc: add project quota support on PCC backend")
Change-Id: I68745a8a1e26312ffe6ee8388f962b9c834df97b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44160
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-14195 libcfs: switch to kfree_sensitive
Mr NeilBrown [Tue, 6 Jul 2021 00:15:47 +0000 (17:15 -0700)]
LU-14195 libcfs: switch to kfree_sensitive

In Linux 5.10, kzfree() has been renamed kfree_sensitive().

So switch to the new name and provide back-compat support for older
kernels.

Lustre-change: https://review.whamcloud.com/40908
Lustre-commit: 67d17dd590f913643f5adc8aced369221faccf05

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If665168477a0b6241a8ddf31a111cd465fe97783
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-on: https://review.whamcloud.com/44144
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-14195 lustre: remove 'fs' from 'struct lvfs_run_ctxt'
Mr NeilBrown [Tue, 6 Jul 2021 00:10:15 +0000 (17:10 -0700)]
LU-14195 lustre: remove 'fs' from 'struct lvfs_run_ctxt'

The code protected by push_ctxt() and pop_ctx() never tries to access
any user-space data, so call set_fs() to KERNEL_DS is not needed.

So remove the 'fs' field and related code.

In linux-5.10 this code fails to compile as set_fs() is deprecated.

Lustre-change: https://review.whamcloud.com/40910
Lustre-commit: de60e7767c0e3ba38f4de37e46328012780b6d19

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Idb2744d656dc4228375b6da54673e38cc1c112f5
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44143
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 years agoLU-14195 osd: don't use set_fs() for ->fiemap() calls.
Mr NeilBrown [Tue, 6 Jul 2021 00:06:18 +0000 (17:06 -0700)]
LU-14195 osd: don't use set_fs() for ->fiemap() calls.

->fiemap() only accesses kernel-space data, so does not need, and
never has needed, set_fs() calls.
In Linux 5.10, these calls are deprecated.
So remove the unnecessary code.

Lustre-change: https://review.whamcloud.com/40909
Lustre-commit: d0337cab8e845efcdbfb9e26e573feb18f28e303

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id336855b4787ddbf656dfa3b8d0b12f663564795
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44142
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>