fs/lustre-release.git
19 months ago LU-14441 mdc: check/grab import before access 81/41681/19
Alex Zhuravlev [Mon, 13 Dec 2021 08:27:42 +0000 (11:27 +0300)]
LU-14441 mdc: check/grab import before access

Check/grab the import before access to ensure it doesn't disappear
while being accessed via procfs.
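
The check-and-grab pattern described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: every name here (demo_import, demo_device, demo_import_get, and so on) is hypothetical and is not Lustre's API, and a real implementation would need locking or atomic reference counts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not Lustre's real structures. */
struct demo_import {
	int refcount;	/* number of active users (not atomic in this sketch) */
	int state;	/* a value a procfs handler might report */
};

struct demo_device {
	struct demo_import *imp;	/* NULL once the import is torn down */
};

/* Take a reference if the import still exists; NULL means it is gone. */
static struct demo_import *demo_import_get(struct demo_device *dev)
{
	struct demo_import *imp = dev->imp;

	if (imp != NULL)
		imp->refcount++;
	return imp;
}

static void demo_import_put(struct demo_import *imp)
{
	imp->refcount--;
}

/* A procfs-style read: check and grab first, access, then release. */
static bool demo_show_state(struct demo_device *dev, int *out)
{
	struct demo_import *imp = demo_import_get(dev);

	if (imp == NULL)
		return false;	/* import already gone, nothing to read */
	*out = imp->state;
	demo_import_put(imp);
	return true;
}
```

The point of the sketch is the ordering: the handler never dereferences the import without first confirming it exists and pinning it.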

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I005c96b349e55646996fd0d265ab4dd1e2b9a1fa
Reviewed-on: https://review.whamcloud.com/41681
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15829 llite: don't use a kms if it invalid. 95/47395/6
Alexey Lyashkov [Thu, 19 May 2022 17:35:18 +0000 (20:35 +0300)]
LU-15829 llite: don't use a kms if it invalid.

Lockless DIO doesn't update the KMS as other IO types do, which
caused a situation where the next read doesn't know the real file size
to read. Let's avoid using an invalid KMS.

Fixes: 6bce5367 (LU-4198 clio: turn on lockless for some kind of IO)
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ie71d3f3cc24fc16c03ed07f9f5a3a17c7fdfa684
Reviewed-on: https://review.whamcloud.com/47395
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-10391 ptlrpc: lprocfs_exp_setup() to take struct lnet_nid 42/44642/5
Mr NeilBrown [Thu, 8 Jul 2021 01:32:48 +0000 (11:32 +1000)]
LU-10391 ptlrpc: lprocfs_exp_setup() to take struct lnet_nid

lprocfs_exp_setup() now takes 'struct lnet_nid *' as peer_nid.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If779893d8b1c7b650d39182c121c1f611d058f0d
Reviewed-on: https://review.whamcloud.com/44642
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-10391 ptlrpc: pass lnet_nid for self to ptl_send_buf() 41/44641/5
Mr NeilBrown [Thu, 8 Jul 2021 00:53:30 +0000 (10:53 +1000)]
LU-10391 ptlrpc: pass lnet_nid for self to ptl_send_buf()

The 'self' arg to ptl_send_buf() is now a pointer to a
'struct lnet_nid', and can be NULL meaning "ANY NID".

LNetPut() already accepts NULL as the self pointer.
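
The "NULL means ANY NID" convention can be sketched as follows. The struct and function names are hypothetical and greatly simplified; the real struct lnet_nid and the matching logic inside LNet differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified NID; the real struct lnet_nid differs. */
struct demo_nid {
	unsigned char addr[16];
};

/*
 * Return 1 if the local interface 'local' may be used for sending.
 * A NULL 'self' means "any NID", so every interface matches.
 */
static int demo_self_matches(const struct demo_nid *self,
			     const struct demo_nid *local)
{
	if (self == NULL)
		return 1;
	return memcmp(self->addr, local->addr, sizeof(self->addr)) == 0;
}
```

Using a nullable pointer avoids reserving a special "any" value inside the address space of the NID itself.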

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I859dfa10e2f5e50c029c6926fe25ac036fb4f494
Reviewed-on: https://review.whamcloud.com/44641
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-10391 ptlrpc: change bd_sender in ptlrpc_bulk_frag_ops 40/44640/5
Mr NeilBrown [Tue, 18 Jan 2022 18:12:50 +0000 (13:12 -0500)]
LU-10391 ptlrpc: change bd_sender in ptlrpc_bulk_frag_ops

bd_sender in struct ptlrpc_bulk_frag_ops is now 'struct lnet_nid'.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I43a6600dcc814a6a46b3a793641545123efaa6ab
Reviewed-on: https://review.whamcloud.com/44640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-10391 ptlrpc: change rq_source to struct lnet_nid 39/44639/5
Mr NeilBrown [Sat, 20 Aug 2022 17:30:25 +0000 (13:30 -0400)]
LU-10391 ptlrpc: change rq_source to struct lnet_nid

rq_source in struct ptlrpc_request can now store large NIDs.
ptl_send_buf() now takes a struct lnet_processid for the peer.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2fe7da2332955c69f6252d44fb3ae28d2ef4e517
Reviewed-on: https://review.whamcloud.com/44639
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-10391 ptlrpc: change rq_peer to struct lnet_nid 38/44638/4
Mr NeilBrown [Thu, 4 Aug 2022 01:43:26 +0000 (21:43 -0400)]
LU-10391 ptlrpc: change rq_peer to struct lnet_nid

rq_peer in struct ptlrpc_request can now store large NIDs.
ptlrpc_connection_get() and others now take a
struct lnet_processid.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I3bb419720434714301946d278413ce6090aa2cdd
Reviewed-on: https://review.whamcloud.com/44638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-10391 ptlrpc: pass net num to ptlrpc_uuid_to_connection 37/44637/4
Mr NeilBrown [Thu, 8 Jul 2021 00:34:36 +0000 (10:34 +1000)]
LU-10391 ptlrpc: pass net num to ptlrpc_uuid_to_connection

Rather than passing a nid to indicate which net to choose,
pass just the net number.  This will make it easier to convert to
'struct lnet_nid'.

Also change ptlrpc_uuid_to_peer() to take the refnet as an explicit
argument, rather than embedding it in the peer pid.

This makes the refnet test more obvious, and removes the (strange)
need to test the address part against zero.
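
A minimal sketch of matching on an explicit net number follows. It assumes, for illustration only, a 64-bit NID with the net number in the high 32 bits and the address in the low 32 bits; the demo_ names are hypothetical and are not the functions changed by this patch.

```c
#include <stdint.h>

/*
 * Assumed layout for this sketch: net number in the high 32 bits,
 * address in the low 32 bits of a 64-bit NID.
 */
static uint32_t demo_nid_net(uint64_t nid)
{
	return (uint32_t)(nid >> 32);
}

/*
 * The caller states the wanted net directly, so the callee only
 * compares net numbers and never has to inspect the address part.
 */
static int demo_match_by_net(uint64_t candidate, uint32_t refnet)
{
	return demo_nid_net(candidate) == refnet;
}
```

Because only a net number crosses the interface, the same signature keeps working when the NID representation itself changes.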

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0650760a59342f5ac245cc14011452e436ef8e4c
Reviewed-on: https://review.whamcloud.com/44637
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-10391 ptlrpc: change rq_self to struct lnet_nid 36/44636/4
Mr NeilBrown [Wed, 7 Jul 2021 05:55:06 +0000 (15:55 +1000)]
LU-10391 ptlrpc: change rq_self to struct lnet_nid

rq_self in struct ptlrpc_request can now store large NIDs.
ptlrpc_connection_get() is also changed to receive a
'struct lnet_nid'.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If2ea7770e967e2f044f2b2300950b612463e130c
Reviewed-on: https://review.whamcloud.com/44636
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-8367 tests: cleanup_orphans hang reproducer 39/46939/7
Alexander Boyko [Thu, 12 May 2022 13:49:34 +0000 (09:49 -0400)]
LU-8367 tests: cleanup_orphans hang reproducer

The patch adds recovery-small test 144 to reproduce a hang in
osp_precreate_cleanup_orphans().

PID: 49938  TASK: ffff98c63a248000  CPU: 30  COMMAND: "osp-pre-3-1"
__schedule at ffffffff8e54e1d4
schedule at ffffffff8e54e648
osp_precreate_cleanup_orphans at ffffffffc17d00e9 [osp]
osp_precreate_thread at ffffffffc17d18da [osp]

Test-Parameters: trivial testlist=recovery-small env=ONLY=144b
HPE-bug-id: LUS-10793
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I463c75e63043c71ed0de0c6d08294098099c67e5
Reviewed-on: https://review.whamcloud.com/46939
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16075 kernel: kernel update RHEL8.6 [4.18.0-372.19.1.el8_6] 16/48116/5
Jian Yu [Tue, 23 Aug 2022 01:37:06 +0000 (18:37 -0700)]
LU-16075 kernel: kernel update RHEL8.6 [4.18.0-372.19.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.19.1.el8_6.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I8e0fbdab54d36512c4c4cbdbc97c580994ebcbd3
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48116
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16090 build: Module.symvers lookup by flavor on SUSE 95/48195/2
Shaun Tancheff [Thu, 11 Aug 2022 11:48:40 +0000 (18:48 +0700)]
LU-16090 build: Module.symvers lookup by flavor on SUSE

When multiple kernel flavors are found we need to select only
the Module.symvers for the flavor that is being built.

HPE-bug-id: LUS-11149
Test-Parameters: trivial
Fixes: 1f4aaefe1aae ("LU-15962 build: add in-kernel Module.symvers to symbol path")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I1c9af91108534d3a67f816077756fded4cd0b653
Reviewed-on: https://review.whamcloud.com/48195
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16085 tests: fix sanityn test_106c 35/48435/2
Sebastien Buisson [Tue, 6 Sep 2022 06:57:04 +0000 (08:57 +0200)]
LU-16085 tests: fix sanityn test_106c

Fix sanityn test_106c after modification introduced when fixing
stat attributes_mask.

Test-Parameters: trivial testlist=sanityn env=ONLY=106c
Fixes: 0e48653c27 ("LU-16085 llite: fix stat attributes_mask")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I370813b9b1c22450577c390964a0cc410735b989
Reviewed-on: https://review.whamcloud.com/48435
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16100 tests: fix sanity/51d divide-by-zero 93/48393/2
Andreas Dilger [Tue, 30 Aug 2022 19:14:16 +0000 (13:14 -0600)]
LU-16100 tests: fix sanity/51d divide-by-zero

Fix dirstripe count when testing on non-DNE configs.

Test-Parameters: trivial
Fixes: cf35c54224b3 ("LU-14745 tests: ensure sanity/51d has enough objects")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I787df4cfda9e62673e5f89d2b899154f636777fe
Reviewed-on: https://review.whamcloud.com/48393
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Feng, Lei <flei@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-9859 libcfs: remove Lustre specific bitmap handling 22/48222/3
James Simmons [Tue, 16 Aug 2022 13:46:36 +0000 (09:46 -0400)]
LU-9859 libcfs: remove Lustre specific bitmap handling

Only the NRS TBF handling uses the Lustre specific bitmap
handling. Convert to the Linux bitmap API and remove the
Lustre specific bitmap handling.
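
For readers unfamiliar with the style of the Linux bitmap API (set_bit(), clear_bit(), test_bit(), find_first_zero_bit()), the following userspace sketch mimics its semantics on an array of unsigned long. The demo_ functions are hypothetical re-implementations written for illustration; they are neither the kernel's code nor the code touched by this patch.

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

static void demo_set_bit(size_t nr, unsigned long *map)
{
	map[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

static void demo_clear_bit(size_t nr, unsigned long *map)
{
	map[nr / DEMO_BITS_PER_LONG] &= ~(1UL << (nr % DEMO_BITS_PER_LONG));
}

static int demo_test_bit(size_t nr, const unsigned long *map)
{
	return (int)((map[nr / DEMO_BITS_PER_LONG] >>
		      (nr % DEMO_BITS_PER_LONG)) & 1UL);
}

/* Index of the first clear bit, or nbits if every bit is set. */
static size_t demo_find_first_zero_bit(const unsigned long *map,
				       size_t nbits)
{
	for (size_t i = 0; i < nbits; i++)
		if (!demo_test_bit(i, map))
			return i;
	return nbits;
}
```

A bitmap is just an array of words, which is why a generic API can replace a subsystem-specific wrapper structure.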

Test-Parameters: trivial testlist=sanityn env=ONLY=77
Change-Id: I58dcf869778d6cf6349c16e73d75e53735ffb97d
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/48222
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16085 llite: fix stat attributes_mask 08/48208/3
Sebastien Buisson [Fri, 12 Aug 2022 07:59:02 +0000 (09:59 +0200)]
LU-16085 llite: fix stat attributes_mask

Fix stat attributes_mask to return STATX_ATTR_ENCRYPTED whenever it is
possible. Also fix sanityn test_106c to expect at least the 0x30 flag
for attributes_mask.
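
An "at least these bits" check on an attributes mask can be sketched as below. The DEMO_ constants mirror the values of STATX_ATTR_IMMUTABLE, STATX_ATTR_APPEND and STATX_ATTR_ENCRYPTED from the Linux UAPI headers; the helper name is hypothetical, and the actual test is a shell script rather than C.

```c
#include <stdint.h>

/* Values mirror the STATX_ATTR_* definitions in <linux/stat.h>. */
#define DEMO_STATX_ATTR_IMMUTABLE	0x00000010u
#define DEMO_STATX_ATTR_APPEND		0x00000020u
#define DEMO_STATX_ATTR_ENCRYPTED	0x00000800u

/* True when every bit in 'required' is present in 'mask'. */
static int demo_mask_has_at_least(uint64_t mask, uint64_t required)
{
	return (mask & required) == required;
}
```

With these values, 0x30 is IMMUTABLE | APPEND, so a mask of 0x830 (which also reports ENCRYPTED) still satisfies a check for "at least 0x30", while an exact comparison against 0x30 would not.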

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Icd16beff058c42d77e9b04ad1a287ec2ac04dfed
Reviewed-on: https://review.whamcloud.com/48208
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16093 kernel: kernel update SLES12 SP5 [4.12.14-122.130.1] 04/48204/2
Jian Yu [Fri, 12 Aug 2022 01:41:35 +0000 (18:41 -0700)]
LU-16093 kernel: kernel update SLES12 SP5 [4.12.14-122.130.1]

Update SLES12 SP5 kernel to 4.12.14-122.130.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: Ib2180a056889d481a7b55c41cbcd98c8e0e272d8
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48204
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16084 tests: fix lustre-patched filefrag check 88/48188/3
Andreas Dilger [Wed, 10 Aug 2022 18:27:56 +0000 (12:27 -0600)]
LU-16084 tests: fix lustre-patched filefrag check

Fix sanity test_130b thru test_130g to check for "filefrag -l"
instead of "filefrag -e", since the "-e" option has been in
upstream e2fsprogs since commit v1.42.6-50-g2508eaa7.  The "-l"
option (logical extent ordering) is really what is needed to
handle Lustre-striped files anyway.

While there, fix the code style in these subtests:
- use "local" and lower-case names for local variables
- use $(...) for subshells
- use (( ... )) for numeric comparisons
- use preferred "check || action" style checks
- use "skip_env" for environment configuration checks (e2fsprogs)
- use "skip" for test-related checks that can't be "fixed"
- use pre-defined $ost1_FSTYPE for checking OST filesystem type

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8eb7f17a9532796ab0274247194dd52cbc8a141c
Reviewed-on: https://review.whamcloud.com/48188
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15994 tests: add testing for io_uring via fio 67/48167/3
Qian Yingjin [Tue, 9 Aug 2022 07:56:23 +0000 (03:56 -0400)]
LU-15994 tests: add testing for io_uring via fio

This patch adds a test case for the io_uring I/O engine via fio.

Test-Parameters: trivial testlist=sanity
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I0f2e371f91c02dc76644f42e5d1055ec200597c6
Reviewed-on: https://review.whamcloud.com/48167
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15548 tests: skip conf-sanity/131 for older servers 51/48151/3
Andreas Dilger [Fri, 5 Aug 2022 20:19:41 +0000 (14:19 -0600)]
LU-15548 tests: skip conf-sanity/131 for older servers

Skip conf-sanity.sh test_131 when running against older servers that
do not support the trusted.projid xattr.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=131
Test-Parameters: testlist=conf-sanity env=ONLY=131 serverversion=2.14.0
Fixes: e4d07f2c30 ("LU-12056 ldiskfs: add trusted.projid virtual xattr")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If1858502ab50ffd10e494eab793e3bc0f883fe9e
Reviewed-on: https://review.whamcloud.com/48151
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15873 obd: skip checking read-only fs health 95/48095/5
Bobi Jam [Tue, 3 Nov 2020 09:04:01 +0000 (17:04 +0800)]
LU-15873 obd: skip checking read-only fs health

A health check on a read-only file system would fail, and STONITH
ensues.

Add obd_device::obd_read_only to record the read-only flag of the
obd_device, and skip checking the health of read-only devices.
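
The skip logic can be sketched as a loop over devices. The struct and function below are hypothetical simplifications; only the idea of a per-device read-only flag comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced device record for the sketch. */
struct demo_obd {
	bool read_only;	/* plays the role of the recorded read-only flag */
	bool healthy;	/* result an individual health probe would give */
};

/* Count unhealthy devices, ignoring those known to be read-only. */
static size_t demo_count_unhealthy(const struct demo_obd *devs, size_t n)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].read_only)
			continue;	/* skip: a probe would fail anyway */
		if (!devs[i].healthy)
			bad++;
	}
	return bad;
}
```

Skipping the probe keeps an expected condition (a read-only target) from being reported as a node failure.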

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ica83b9c871f7bee62cef6504deb0dcc32dd20afb
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/48095
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-1904 idl: add checks for OBD_CONNECT flags 53/48053/2
Andreas Dilger [Fri, 28 May 2021 08:49:19 +0000 (02:49 -0600)]
LU-1904 idl: add checks for OBD_CONNECT flags

Make it harder to accidentally declare OBD_CONNECT flags without
properly defining their names.  Otherwise, this can cause serious
compatibility problems if two features are using the same flag.

Add the definition lines into spelling.txt so there is *always*
a warning generated, since this always needs proper attention.

Make it clear whom to contact when reserving a new feature flag.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9a5e2c97c40c39ea57d20979d4b130854edc785a
Reviewed-on: https://review.whamcloud.com/48053
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16012 sec: fix detection of SELinux enforcement 49/48049/4
Sebastien Buisson [Wed, 27 Jul 2022 12:39:26 +0000 (12:39 +0000)]
LU-16012 sec: fix detection of SELinux enforcement

On newer distros (e.g. RHEL 9.0), on which selinux_is_enabled() does
not exist anymore, the only way to find out if SELinux is enforced
when initializing the security context is to fetch the length of the
security attribute name. If it is 0, we conclude SELinux is disabled.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ifcdcb8ffbb7f9ad50d16d7d3317e94d0d212fa42
Reviewed-on: https://review.whamcloud.com/48049
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16048 build: Update ZFS version to 2.1.5 40/48040/3
Jian Yu [Tue, 26 Jul 2022 07:15:49 +0000 (00:15 -0700)]
LU-16048 build: Update ZFS version to 2.1.5

Update ZFS version to 2.1.5. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.1.5

Change-Id: I9f25aafe889f87fb80677e59dbe4679932d8b920
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16045 enc: force use of new enc xattr on new servers 35/48035/3
Sebastien Buisson [Mon, 25 Jul 2022 14:39:56 +0000 (16:39 +0200)]
LU-16045 enc: force use of new enc xattr on new servers

When an older client uses encryption with a newer server, the client
wants to see the encryption context in the security.c xattr. But
internally on the server side, we force use of the newer encryption.c
xattr for consistency. When required, the encryption context is put
in the request to the client as usual, which interprets it as desired.

Fixes: 4231fab66e ("LU-13717 sec: make client encryption compatible with ext4")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I667e123bdff912acc270666e8c74ebda6f0534e7
Reviewed-on: https://review.whamcloud.com/48035
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16035 kfilnd: Initial kfilnd implementation 09/48009/8
Doug Oucharek [Tue, 16 Oct 2018 22:51:21 +0000 (15:51 -0700)]
LU-16035 kfilnd: Initial kfilnd implementation

Initial implementation of the kfabric Lustre Network Driver.

Test-Parameters: trivial
HPE-bug-id: LUS-6565
Signed-off-by: Doug Oucharek <dougso@me.com>
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I48a070ca0ba37e4923cd6dcb3327676ae6ddaae1
Reviewed-on: https://review.whamcloud.com/48009
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16027 tests: sanity:test_66: specify blocksize explicitly 83/47983/3
Elena Gryaznova [Tue, 19 Jul 2022 13:15:42 +0000 (16:15 +0300)]
LU-16027 tests: sanity:test_66: specify blocksize explicitly

Fix test_66() to be independent from BLOCKSIZE environment
variable.

To reproduce the failure, just run:
  llmount.sh
  export BLOCKSIZE=4096; ONLY=66 sh sanity.sh
  == sanity test 66: update inode blocks count on client ==
  8+0 records in
  8+0 records out
  8192 bytes (8.2 kB, 8.0 KiB) copied, 0.00589935 s, 1.4 MB/s
  sanity test_66: @@@@@@ FAIL: /mnt/lustre/f66 blocks 2 < 8

Test-Parameters: trivial testlist=sanity env=ONLY=66
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11014
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I0adca724518cb955e3664d33a36628ae19a1712d
Reviewed-on: https://review.whamcloud.com/47983
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15999 tests: format journal with correct block size 30/47930/5
Elena Gryaznova [Mon, 11 Jul 2022 09:04:53 +0000 (12:04 +0300)]
LU-15999 tests: format journal with correct block size

Without "-b block-size", mke2fs calculates the block size itself based
on the device size. As a result, the filesystem and the journal may be
formatted with different block sizes. For example, a 32M device gets
formatted as a journal with block size 1K, and llmount.sh fails to
create Lustre with an external journal:
   mke2fs: Filesystem has unexpected block size while trying
           to open journal device /dev/vdb
because the target device itself is created with default
"Block size: 4096".
Let's make sure that journal gets formatted with correct block size.

Patch also adds the ability to create fs with different block
sizes if BLCKSIZE or <facet>_BLOCKSIZE are set.

Fixes: d01d4c697a ("LU-957 scrub: Proc interfaces and tests for OI scrub")
Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11008
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I0a82e34efc23d318bbd52946916ae8f2b7cd94eb
Reviewed-on: https://review.whamcloud.com/47930
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15393 tests: check QoS hang with OST failover 15/47715/4
Alexander Boyko [Thu, 23 Jun 2022 13:33:47 +0000 (09:33 -0400)]
LU-15393 tests: check QoS hang with OST failover

The patch adds recovery-small test 152 to reproduce a situation where
MDT object allocation sleeps on OST failover in lod_ost_alloc_rr while
holding lq_rw_sem for read, and all other creation threads hang in
lod_ost_alloc_qos at down_write(lq_rw_sem).

HPE-bug-id: LUS-10388
Test-Parameters: trivial testlist=recovery-small env=ONLY=152
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I7b9c5a5c9870a559e673a5fd253dcaea40d9fe63
Reviewed-on: https://review.whamcloud.com/47715
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16081 lnet: Memory leak on adding existing interface 73/48173/7
Frank Sehr [Tue, 9 Aug 2022 17:10:54 +0000 (10:10 -0700)]
LU-16081 lnet: Memory leak on adding existing interface

The function lnet_dyn_add_ni() allocates an lnet_ni structure.
In case of an error, the function returns without freeing the memory
of the structure.
Also add handling of possible lnet_net structure memory leaks.
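
The class of bug described above, and the usual C remedy of a single cleanup label on error paths, can be sketched as follows. All names are hypothetical, the single-slot "registry" is a deliberate simplification, and demo_live_allocs exists only so the sketch can show that nothing leaks.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_ni {
	int id;
};

static int demo_live_allocs;		/* outstanding allocations */
static struct demo_ni *demo_registered;	/* single-slot registry */

static struct demo_ni *demo_ni_alloc(int id)
{
	struct demo_ni *ni = malloc(sizeof(*ni));

	if (ni != NULL) {
		ni->id = id;
		demo_live_allocs++;
	}
	return ni;
}

static void demo_ni_free(struct demo_ni *ni)
{
	free(ni);
	demo_live_allocs--;
}

/* Register an interface; every error path releases the allocation. */
static int demo_add_ni(int id)
{
	struct demo_ni *ni = demo_ni_alloc(id);
	int rc;

	if (ni == NULL)
		return -ENOMEM;
	if (demo_registered != NULL) {
		rc = demo_registered->id == id ? -EEXIST : -ENOSPC;
		goto out_free;
	}
	demo_registered = ni;	/* ownership passes to the registry */
	return 0;

out_free:
	demo_ni_free(ni);	/* without this, the error return would leak */
	return rc;
}
```

Routing all failures through one label makes it harder to add a new early return that forgets the free.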

Test-parameters: trivial testlist=sanity-lnet
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: I7544a9379093b99f77aaddb8d021b4a5bf221082
Reviewed-on: https://review.whamcloud.com/48173
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15694 quota: keep grace time while setting default limits 35/46935/11
Hongchao Zhang [Thu, 28 Jul 2022 13:54:00 +0000 (21:54 +0800)]
LU-15694 quota: keep grace time while setting default limits

The quota grace time should only be changed by "lfs setquota -t",
and it should be kept while setting default quota limits.

This patch also fixes an issue of not saving the grace time while
writing global quota record.

Signed-off-by: Hongchao Zhag <hongchao@whamcloud.com>
Change-Id: I89ca49d09dc41deffe4bc77e53721b5bb4f4be37
Reviewed-on: https://review.whamcloud.com/46935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15642 obdclass: use consistent stats units 33/46833/9
Andreas Dilger [Wed, 16 Mar 2022 04:51:55 +0000 (22:51 -0600)]
LU-15642 obdclass: use consistent stats units

Use consistent stats units, since some were "usec" and others "usecs".
Most stats already use LPROCFS_TYPE_* to encode the stats type, so
use this to provide units for those stats, and only explicitly provide
strings for the few stats that don't match the commonly-used units.
This also reduces the number of repeat static strings in the modules.
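
Deriving the unit string from the type, with an explicit override for the exceptions, might look like the sketch below. The enum, the unit strings and the function name are illustrative assumptions and do not reproduce the real LPROCFS_TYPE_* definitions.

```c
#include <stddef.h>

/* Hypothetical type codes standing in for LPROCFS_TYPE_* values. */
enum demo_stat_type {
	DEMO_TYPE_REQS,
	DEMO_TYPE_BYTES,
	DEMO_TYPE_PAGES,
	DEMO_TYPE_USECS,
	DEMO_TYPE_MAX
};

/*
 * Return the unit string for a counter. One shared table serves all
 * counters of a type; 'override' covers the few that do not fit.
 */
static const char *demo_units(enum demo_stat_type type, const char *override)
{
	static const char *const units[DEMO_TYPE_MAX] = {
		[DEMO_TYPE_REQS]  = "reqs",
		[DEMO_TYPE_BYTES] = "bytes",
		[DEMO_TYPE_PAGES] = "pages",
		[DEMO_TYPE_USECS] = "usecs",
	};

	if (override != NULL)
		return override;
	if ((unsigned int)type >= DEMO_TYPE_MAX)
		return "";
	return units[type];
}
```

A single table means a spelling such as "usec" versus "usecs" can only be chosen once, which is the consistency the commit is after.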

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I25f31478f238072ddbf9a3918cd43bb08c3ebbe5
Reviewed-on: https://review.whamcloud.com/46833
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15930 lnet: Remove duplicate checks for peer sensitivity 26/46626/12
Chris Horn [Thu, 24 Feb 2022 20:30:59 +0000 (14:30 -0600)]
LU-15930 lnet: Remove duplicate checks for peer sensitivity

Callers of lnet_inc_lpni_healthv_locked() and
lnet_dec_healthv_locked() currently check whether the parent peer
has a peer specific sensitivity defined. To remove this code
duplication, this logic is rolled into
lnet_inc_lpni_healthv_locked() and lnet_dec_lpni_healthv_locked().
The latter is a new wrapper around lnet_dec_healthv_locked().

lnet_dec_healthv_locked() is changed to return a bool indicating
whether the health value was actually modified so that the peer
net health is only updated when the peer NI health actually changes.
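
The shape of this refactoring, a decrement helper that reports whether it changed anything plus a wrapper that picks the sensitivity, can be sketched as follows. The names, the clamping at zero and the demo_net_updates counter are assumptions made for illustration, not the LNet implementation.

```c
#include <stdbool.h>

struct demo_peer_ni {
	int healthv;
};

static int demo_net_updates;	/* how often the net health was refreshed */

/* Lower the health value, clamped at 0; report whether it changed. */
static bool demo_dec_healthv(struct demo_peer_ni *ni, int sensitivity)
{
	int old = ni->healthv;

	ni->healthv = old > sensitivity ? old - sensitivity : 0;
	return ni->healthv != old;
}

/*
 * Wrapper: prefer the peer-specific sensitivity when one is set, and
 * refresh the net-level health only if the NI health really changed.
 */
static void demo_dec_lpni_healthv(struct demo_peer_ni *ni,
				  int peer_sens, int default_sens)
{
	int sens = peer_sens != 0 ? peer_sens : default_sens;

	if (demo_dec_healthv(ni, sens))
		demo_net_updates++;
}
```

Callers no longer repeat the sensitivity lookup, and the boolean return avoids needless net-level updates once the value is already at its floor.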

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11018
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I624561167392ad625ea7478689e9c5975cec3f2e
Reviewed-on: https://review.whamcloud.com/46626
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15929 lnet: Correct net selection for router ping 27/47527/4
Chris Horn [Wed, 1 Jun 2022 02:19:07 +0000 (21:19 -0500)]
LU-15929 lnet: Correct net selection for router ping

lnet_find_best_ni_on_local_net() contains logic for restricting
the NI selection to a net specified by lnet_peer::lp_disc_net_id. The
purpose of this is to ensure that LNet peers ping every interface on
a router at a regular interval as part of the LNet router health
feature. However, this logic is flawed because lnet_msg_discovery()
is used to determine whether the message being sent is a discovery
message, but that function actually determines whether a given message
can _trigger_ discovery.

Introduce a new function, lnet_msg_is_ping(), which determines whether
a given lnet_msg is a GET on the LNET_RESERVED_PORTAL.
Modify lnet_find_best_ni_on_local_net() to restrict NI selection to
lp_disc_net_id iff:
1. lp_disc_net_id is non-zero
2. The peer has the LNET_PEER_RTR_DISCOVERY flag set.
3. lnet_msg_is_ping() returns true
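
The three conditions listed above combine into a single predicate, sketched here with hypothetical types. The numeric values chosen for DEMO_RESERVED_PORTAL and DEMO_PEER_RTR_DISCOVERY are assumptions for the sketch, as are the struct layouts.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_msg_type { DEMO_MSG_PUT, DEMO_MSG_GET };

#define DEMO_RESERVED_PORTAL	0u		/* assumed value */
#define DEMO_PEER_RTR_DISCOVERY	(1u << 0)	/* hypothetical flag bit */

struct demo_msg {
	enum demo_msg_type type;
	uint32_t portal;
};

struct demo_peer {
	uint32_t disc_net_id;	/* 0 means "no net recorded" */
	uint32_t state;		/* bit flags */
};

/* A ping is modelled as a GET on the reserved portal. */
static bool demo_msg_is_ping(const struct demo_msg *msg)
{
	return msg->type == DEMO_MSG_GET &&
	       msg->portal == DEMO_RESERVED_PORTAL;
}

/* Restrict NI selection to disc_net_id only if all three tests pass. */
static bool demo_restrict_to_disc_net(const struct demo_peer *lp,
				      const struct demo_msg *msg)
{
	return lp->disc_net_id != 0 &&
	       (lp->state & DEMO_PEER_RTR_DISCOVERY) != 0 &&
	       demo_msg_is_ping(msg);
}
```

Testing what the message is, rather than what it could trigger, is the distinction the commit message draws between the new helper and lnet_msg_discovery().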

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11017
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3dbdfd5c44b6167d24b7b6e0b1097db0b3c5cb76
Reviewed-on: https://review.whamcloud.com/47527
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15595 lnet: LNet peer aliveness broken 23/46623/11
Chris Horn [Mon, 14 Feb 2022 21:48:15 +0000 (21:48 +0000)]
LU-15595 lnet: LNet peer aliveness broken

The peer health feature used on LNet routers is intended to detect if
a peer is dead or alive by keeping track of the last time it received
a message from the peer. If the last alive value is outside of a
configurable interval then the peer is considered dead and the router
will drop messages to that peer rather than attempt to send to it.

This feature no longer works as intended because even if the
last alive value is outside the interval the router will still
consider the peer NI to be alive if the health value of the NI and
the cached status both indicate the peer NI is alive.

So even if a router has not received any messages from the client in
days, as long as the router thinks the peer's interfaces are healthy
then it will consider the peer alive. This doesn't make any sense as
peers are supposed to regularly ping the router, and if they don't do
so then they should not be considered alive.

Fix the issue by relying solely on the last alive value to determine
peer aliveness. Do not consider the health value or cached status
when determining whether to drop the message.

lnet_peer_alive_locked() has a single caller that only checks whether
zero was returned. We can convert lnet_peer_alive_locked() to return
bool rather than int.

Rename lnet_peer_alive_locked() to lnet_check_message_drop() to
better reflect the purpose of the function. The return value is
inverted to reflect the name change.
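
Reduced to its core, a drop decision based solely on the last-alive timestamp could be sketched as below. The function name, the time unit and the strict "greater than" boundary are assumptions; the real function may apply further conditions that the commit message does not describe.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when a message to the peer should be dropped, i.e. when
 * the peer was last heard from longer ago than the allowed interval.
 * Health values and cached status are deliberately not consulted.
 */
static bool demo_check_message_drop(int64_t now, int64_t last_alive,
				    int64_t alive_interval)
{
	return now - last_alive > alive_interval;
}
```

Note the inverted sense compared with an "is alive" predicate: true now means "drop", matching the rename described above.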

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaabdf5109676ffd18bdba9627afea7e041ddc1e1
Reviewed-on: https://review.whamcloud.com/46623
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15595 tests: Add various router tests 22/46622/12
Chris Horn [Mon, 7 Feb 2022 23:20:37 +0000 (23:20 +0000)]
LU-15595 tests: Add various router tests

Add test cases to exercise LNet routing.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4a077937b3e3b8b07707afeb0c5c23ec1c9074f4
Reviewed-on: https://review.whamcloud.com/46622
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-14955 lnet: Use fatal NI if none other available 46/44746/6
Serguei Smirnov [Tue, 24 Aug 2021 20:48:41 +0000 (13:48 -0700)]
LU-14955 lnet: Use fatal NI if none other available

Allow NI in fatal state to be selected for sending if there are no
NIs in non-fatal state.
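
A minimal sketch of this fallback, assuming a simplified NI model: the `fatal` flag and the `credits` tie-breaker are hypothetical and only stand in for whatever criteria the real selection code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a network interface. */
struct ni_model {
	int  id;
	bool fatal;	/* NI is in a fatal state */
	int  credits;	/* illustrative tie-breaker: more is better */
};

/*
 * Pick an NI for sending. A non-fatal NI always wins over a fatal one;
 * a fatal NI is returned only when no non-fatal NI exists.
 * Returns NULL when the list is empty.
 */
static const struct ni_model *select_ni(const struct ni_model *nis, size_t n)
{
	const struct ni_model *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct ni_model *ni = &nis[i];

		if (best == NULL) {
			best = ni;
			continue;
		}
		if (best->fatal != ni->fatal) {
			/* exactly one of the two is fatal: keep the other */
			if (best->fatal)
				best = ni;
			continue;
		}
		if (ni->credits > best->credits)
			best = ni;
	}
	return best;
}
```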

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11019
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iab8ef6ee5c5f45896196dbd88a2f61e004278297
Reviewed-on: https://review.whamcloud.com/44746
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16058 build: proc_ops check fails with SUBARCH undefined 01/48101/3
Shaun Tancheff [Mon, 1 Aug 2022 13:58:46 +0000 (20:58 +0700)]
LU-16058 build: proc_ops check fails with SUBARCH undefined

During configure with config.cache enabled SUBARCH may not
be defined.

Move the definition to a location that must be traversed.

Test-Parameters: trivial
Fixes: a5084c2f2e ("LU-14937 build: re-use config cache in 'make rpms/debs'")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I0a7b4de3ecccd41b1c55e8b2df29039517e0c416
Reviewed-on: https://review.whamcloud.com/48101
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-12514 target: move server mount code to target layer 60/47160/6
James Simmons [Mon, 6 Jun 2022 14:48:24 +0000 (10:48 -0400)]
LU-12514 target: move server mount code to target layer

Currently the server mount code for lustre_tgt is all in obdclass. If
we change the stack to initialize the LNet / ptlrpc layer after mounting
then we will end up with modular circular dependencies. To avoid this
move all the lustre_tgt mounting code to the target layer. This way the
mounting code can use both ptlrpc and obdclass module routines. Also include
MODULE_ALIAS("lustre_tgt") so mount -t lustre_tgt will load ptlrpc which
contains the target layer.

Change-Id: I392602e8fd18d001cb97b05b909c366ba5b8fa82
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47160
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15811 llite: Refactor DIO/AIO free code 15/48115/5
Patrick Farrell [Wed, 3 Aug 2022 16:48:13 +0000 (12:48 -0400)]
LU-15811 llite: Refactor DIO/AIO free code

Refactor the DIO/AIO free code and add some asserts.

This removes a potential use-after-free in the freeing
code.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I335b18fc7a28fc426a25675e2449d3d192cba596
Reviewed-on: https://review.whamcloud.com/48115
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15811 llite: Unify range unlock 00/48000/5
Patrick Farrell [Wed, 3 Aug 2022 16:45:37 +0000 (12:45 -0400)]
LU-15811 llite: Unify range unlock

Correct parallel_dio condition and unify range unlock code
block.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ib66e8def571054df5117c279e238894bc3b58bce
Reviewed-on: https://review.whamcloud.com/48000
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15003 sec: use enc pool for bounce pages 49/47149/11
Sebastien Buisson [Fri, 25 Mar 2022 08:24:32 +0000 (09:24 +0100)]
LU-15003 sec: use enc pool for bounce pages

Take pages from the enc pool so that they can be used for
encryption, instead of letting llcrypt allocate a bounce page
for every call to the encryption primitives.
Pages are taken from the enc pool a whole array at a time.

This requires modifying the llcrypt API, so that new functions
llcrypt_encrypt_page() and llcrypt_decrypt_page() are exported.
These functions take a destination page parameter.
Until this change is pushed in upstream fscrypt, this performance
optimization is not available when Lustre is built and run against
the in-kernel fscrypt lib.

Using enc pool for bounce pages is a worthwhile performance win. Here
are performance penalties incurred by encryption, without this patch,
and with this patch:

                     ||=====================|=====================||
                     || Performance penalty | Performance penalty ||
                     ||    without patch    |     with patch      ||
||==========================================|=====================||
|| Bandwidth – write |        30%-35%       |   5%-10% large IOs  ||
||                   |                      |    15% small IOs    ||
||------------------------------------------|---------------------||
|| Bandwidth – read  |         20%          |    less than 10%    ||
||------------------------------------------|---------------------||
||      Metadata     |         N/A          |         5%          ||
|| creat,stat,remove |                      |                     ||
||==========================================|=====================||

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I3078d0a3349b3d24acc5e61ab53ac434b5f9d0e3
Reviewed-on: https://review.whamcloud.com/47149
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-14719 utils: dir migration stop on error 40/47040/3
Lai Siyao [Tue, 29 Mar 2022 23:41:23 +0000 (19:41 -0400)]
LU-14719 utils: dir migration stop on error

Once directory migration fails, it should stop immediately since
current migration won't succeed, and subsequent migration may
fail on the same error.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I96c1693d1b1da0856c925b9b22c1ab7f3181f0d8
Reviewed-on: https://review.whamcloud.com/47040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15357 iokit: fix the obsolete usage of cfg_device 72/45872/9
Hongchao Zhang [Wed, 14 Oct 2020 01:46:42 +0000 (09:46 +0800)]
LU-15357 iokit: fix the obsolete usage of cfg_device

The LCTL command "cfg_device" is obsolete and some operations
(such as "cleanup", "detach") don't support it anymore.
In mds_survey and lfsck-performance it causes the echo client
device not to be destroyed and causes LBUG when umounting the
related Lustre device.

Change-Id: If7f6eff080906e395023289652fcd2a78dfb6fb7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45872
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15811 llite: Rework upper/lower DIO/AIO 87/47187/17
Patrick Farrell [Mon, 2 May 2022 20:17:02 +0000 (16:17 -0400)]
LU-15811 llite: Rework upper/lower DIO/AIO

One of the patches for LU-13799,
"Implement lower/upper aio"
(https://review.whamcloud.com/44209/) created a
complicated setup where the cl_dio_aio struct was used
both for the top level DIO or AIO and for the lower level
sub I/Os (corresponding to stripes).

This is quite complicated and hard to follow, so this
rewrites these two uses to be separate structs.  This
incidentally fixes at least one possible memory leak, but
is mostly a cleanup.

Fixes: 46ff761371 "LU-13799 Implement lower/upper aio"
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ide4a2b84f48624ee97dfb57fe80d201fbb7fe8d0
Reviewed-on: https://review.whamcloud.com/47187
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10391 lnet: change ni_status in lnet_ni to u32* 26/44626/7
Mr NeilBrown [Mon, 28 Jun 2021 05:29:38 +0000 (15:29 +1000)]
LU-10391 lnet: change ni_status in lnet_ni to u32*

struct lnet_ni.ni_status points to a 'struct lnet_ni_status', but only
the ns_status field of that structure is ever accessed.

Change ni_status to point directly to just the ns_status field.
This will provide flexibility for introducing a variant for 'struct
lnet_ni_status' which holds a large-address nid.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I5570608e98bc2aa1156b8d885df2a56f8ae7b6f7
Reviewed-on: https://review.whamcloud.com/44626
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16056 libcfs: restore umask handling in kernel threads 33/48233/3
Andreas Dilger [Tue, 16 Aug 2022 15:52:26 +0000 (15:52 +0000)]
LU-16056 libcfs: restore umask handling in kernel threads

This reverts commit 9013eb2bb5 which incorrectly assumes that Lustre
service threads do not modify umask.  A quick grep shows that umask
is modified in osd-ldiskfs __osd_create().

If some other thread sharing the same fs context is modifying umask
in an incompatible way (which includes all Lustre threads after
this patch) then it will occasionally break created file access
permissions for Lustre.

Change-Id: I589b72e4286dc84f4e3f1a0c54fe31aa988e6c18
Fixes: 9013eb2bb5 ("LU-9859 libcfs: don't call unshare_fs_struct()")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48233
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16082 ldiskfs: old-style EA inode handling fix 74/48174/2
Alexander Zarochentsev [Tue, 9 Aug 2022 07:55:48 +0000 (10:55 +0300)]
LU-16082 ldiskfs: old-style EA inode handling fix

The upstream EA inode support that comes with RHEL8 (Linux
kernel 4.18+) has a slightly different implementation and
also includes compatibility code to recognize old-style
Lustre-only EAs.
Unfortunately the compatibility code is broken and makes
old xattr data inaccessible due to a wrong hash value check.

HPE-bug-id: LUS-11133
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Icd6f93d4ebb33dcd03b58f9eb364905c18ae81dc
Reviewed-on: https://review.whamcloud.com/48174
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16011 lnet: use preallocate bulk for server 52/47952/6
Alexey Lyashkov [Thu, 14 Jul 2022 13:39:39 +0000 (16:39 +0300)]
LU-16011 lnet: use preallocate bulk for server

The server side wants preallocated bulk buffers to avoid heavy lock
contention on the page cache.
Without them, LST is limited to 35Gb/s on a host with 3 rails (HDR
each) due to high CPU usage.
Preallocated bulks increase memory consumption for small bulks,
but performance improves dramatically, up to 74Gb/s, with very low
CPU usage.

Test-Parameters: trivial testlist=sanity-lnet,lnet-selftest
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: If1eaf5addf6c9d9f695a892dc66023b3bc293208
Reviewed-on: https://review.whamcloud.com/47952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15959 kernel: new kernel [SLES15 SP4 5.14.21-150400.24.18.1] 96/47696/16
Jian Yu [Wed, 24 Aug 2022 04:09:56 +0000 (21:09 -0700)]
LU-15959 kernel: new kernel [SLES15 SP4 5.14.21-150400.24.18.1]

This patch makes changes to support new SLES15 SP4 release
with kernel 5.14.21-150400.24.18.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 244a" testlist=sanity
Test-Parameters: trivial clientdistro=sles15sp3

Change-Id: I0bf548835578163767d2f6a2a5e5bd2b33154871
Co-Authored-By: Minh Diep <mdiep@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47696
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months agoLU-14745 tests: ensure sanity/51d has enough objects 86/48086/13
Andreas Dilger [Fri, 29 Jul 2022 22:02:04 +0000 (16:02 -0600)]
LU-14745 tests: ensure sanity/51d has enough objects

Ensure that sanity test_51d has precreated enough objects on the
OSTs for the tests to finish without running out.  Otherwise,
some OSTs may be skipped and skew the results.

Test-Parameters: trivial testlist=sanity env=ONLY=51d,ONLY_REPEAT=5 mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity env=ONLY=51d,ONLY_REPEAT=5 mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity env=ONLY=51d,ONLY_REPEAT=5 mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity env=ONLY=51d,ONLY_REPEAT=5 mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity env=ONLY=51d,ONLY_REPEAT=5 mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity env=ONLY=51d,ONLY_REPEAT=5 mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity env=ONLY=51d,ONLY_REPEAT=5 mdscount=2 mdtcount=4
Fixes: fd5c915eff ("LU-15282 tests: improve sanity test_51d coverage")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifaf79173aa34fc7f3b3b1ad3d2876b65c16d1474
Reviewed-on: https://review.whamcloud.com/48086
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
19 months agoLU-15874 kernel: new kernel [RHEL 9.0 5.14.0-70.22.1.el9_0] 47/47847/11
Jian Yu [Mon, 15 Aug 2022 15:50:52 +0000 (08:50 -0700)]
LU-15874 kernel: new kernel [RHEL 9.0 5.14.0-70.22.1.el9_0]

This patch makes changes to support new RHEL 9.0 release
for Lustre client.

fix lbuild to include modified find-requires.ksyms

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="130 244a" testlist=sanity

Change-Id: Ib7fdf9d3946df626759d395b5000b375391da344
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47847
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16078 o2iblnd: Salt comp_vector 48/48148/2
Ian Ziemba [Thu, 23 Jun 2022 21:30:37 +0000 (16:30 -0500)]
LU-16078 o2iblnd: Salt comp_vector

If conns_per_peer is greater than 1, all the connections targeting
the same peer are assigned the same comp_vector. This results in
multiple IB CQs targeting the same peer to be serialized on a single
comp_vector.

Help spread out the IB CQ work to multiple cores by salting
comp_vector based on number of connections.

1 client to 1 server LST 1M write results with 4 conns_per_peer and
RXE configured to spread out work based on comp_vector.

Before: 1377.92 MB/s
After: 3828.48 MB/s
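
One way to picture the salting, as a sketch only: the formula below (peer NID plus connection index, modulo the number of vectors) is an assumption for illustration and is not taken from the patch itself. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick a completion vector for one connection to a peer.
 * Without the conn_idx term every connection to the same peer lands
 * on the same vector; adding it spreads them across vectors.
 */
static unsigned int pick_comp_vector(uint64_t peer_nid,
				     unsigned int conn_idx,
				     unsigned int num_vectors)
{
	assert(num_vectors > 0);
	/* reduce each term first so the sum cannot overflow */
	return (unsigned int)((peer_nid % num_vectors +
			       conn_idx % num_vectors) % num_vectors);
}
```

With four connections per peer and at least four vectors, each connection in this model gets a distinct vector.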

Test-Parameters: trivial
HPE-bug-id: LUS-11043
Change-Id: I4e3e2056947ee54d6d65f17e238163c9dc38cd61
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-on: https://review.whamcloud.com/48148
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-13813 tests: fix stack_trap in conf-sanity test 110/111 22/48022/4
Jian Yu [Sat, 23 Jul 2022 07:27:43 +0000 (00:27 -0700)]
LU-13813 tests: fix stack_trap in conf-sanity test 110/111

This patch fixes stack_trap in conf-sanity test 110 and 111
to restore test environment.

Test-Parameters: trivial env=SLOW=yes,ENABLE_QUOTA=yes,CONF_SANITY_EXCEPT=107 \
clientdistro=el8.5 serverdistro=el8.5 testlist=conf-sanity

Test-Parameters: trivial env=SLOW=yes,ENABLE_QUOTA=yes \
fstype=zfs \
clientdistro=el8.5 serverdistro=el8.5 testlist=conf-sanity

Test-Parameters: trivial env=SLOW=yes,ENABLE_QUOTA=yes,CONF_SANITY_EXCEPT=107 \
mdscount=2 mdtcount=4 \
clientdistro=el8.5 serverdistro=el8.5 testlist=conf-sanity

Change-Id: I540d96e8ad2c4990e7da18fe22256b44e9a19c72
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48022
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-14139 statahead: add total hit/miss count stats 09/46309/7
Qian Yingjin [Wed, 26 Jan 2022 02:50:37 +0000 (21:50 -0500)]
LU-14139 statahead: add total hit/miss count stats

This patch adds total hit/miss count stats for statahead.
These statistics are updated when the statahead thread terminates.

This patch also adds support to clear all statahead stats:
$LCTL set_param llite.*.statahead_stats=0

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I8b11d26385234305631c232a15711224dcfb0668
Reviewed-on: https://review.whamcloud.com/46309
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-14139 llite: simplify callback handling for async getattr 48/45648/14
Qian Yingjin [Thu, 19 Nov 2020 15:15:37 +0000 (23:15 +0800)]
LU-14139 llite: simplify callback handling for async getattr

This patch prepares the inode and sets the lock data directly in
the interpret callback of the intent async getattr RPC request (in
ptlrpcd context), simplifying the old implementation that deferred
this work to the statahead thread.

If the statahead entry is a striped directory, it may generate
new RPCs in the ptlrpcd interpret context to obtain the
attributes for slaves of the striped directory:
@ll_prep_inode()->@lmv_revalidate_slaves()
This is dangerous and may result in deadlock in ptlrpcd interpret
context, thus we use work queue to handle these extra RPCs.
Add sanity 123d to verify that it works correctly.

In a benchmark running "ls -l" on a large directory containing
1M files (47001 bytes), with no caching on either the server or
the client, the measured elapsed times were:
- w/o patch: 180 seconds;
- w patch: 181 seconds;

There is no obvious performance regression.

Test-Parameters: testlist=racer,racer,racer
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I43aba0f609243f34f7e7b674c7fff5fa417b1c02
Reviewed-on: https://review.whamcloud.com/45648
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode 93/48093/2
Alexander Zarochentsev [Fri, 29 Jul 2022 19:38:09 +0000 (22:38 +0300)]
LU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode

The flag should be cleared for "fast" symlinks otherwise
e2fsck complains about inode correctness.
New agent inodes of symlink type may have EXT4_EXTENT_FL flag
set if the fs has "extent" feature and it is not cleared as in
other places where "fast" symlinks are created.

HPE-bug-id: LUS-10237

Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib7b807bb1298cc3a9fd4fdba35747b4bda6fe034
Reviewed-on: https://review.whamcloud.com/48093
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16060 osd-ldiskfs: copy nul byte terminator in writelink 92/48092/2
Alexander Zarochentsev [Wed, 20 Jul 2022 16:05:53 +0000 (19:05 +0300)]
LU-16060 osd-ldiskfs: copy nul byte terminator in writelink

The memcpy() call in osd_ldiskfs_writelink() doesn't copy the nul
terminator byte from the source buffer, leaving the space
after the target link name uninitialized, which is ok for the kernel
code and debugfs but not e2fsck.
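
The idea can be sketched in user-space C as below. The function name and signature are hypothetical and do not mirror osd_ldiskfs_writelink(); this variant writes the terminator explicitly instead of copying one extra byte, so it does not depend on the source being terminated at that offset.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy a symlink target of length len (not counting the terminator)
 * into dst and terminate it with a nul byte.
 * Returns 0 on success or -1 if dst cannot hold len + 1 bytes.
 */
static int write_link_target(char *dst, size_t dst_size,
			     const char *target, size_t len)
{
	if (len >= dst_size)
		return -1;
	memcpy(dst, target, len);
	dst[len] = '\0';	/* the byte the original code left unset */
	return 0;
}
```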

HPE-bug-id: LUS-11103

Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I914f2c78e1a6571bf360a23b0ede8c70502bf0df
Reviewed-on: https://review.whamcloud.com/48092
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16037 build: remove quotes from %{mkconf_options} 44/48044/4
Jian Yu [Wed, 3 Aug 2022 21:15:06 +0000 (14:15 -0700)]
LU-16037 build: remove quotes from %{mkconf_options}

This patch fixes lustre-dkms.spec.in to remove quotes
from %{mkconf_options} passed to dkms.mkconf, so as to
resolve the following build issue:

dkms.conf: Error! Directive 'DEST_MODULE_LOCATION'
does not begin with '/kernel', '/updates', or '/extra'
in record #0.

Test-Parameters: trivial

Change-Id: I0b365d7a96cb632680bc2321e87b28a3bf076e47
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48044
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoNew tag 2.15.51 2.15.51 v2_15_51
Oleg Drokin [Mon, 8 Aug 2022 19:57:22 +0000 (15:57 -0400)]
New tag 2.15.51

Change-Id: I2fc9e9fae7a975f047528c4670276b239d77ac26
Signed-off-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15994 llite: use fatal_signal_pending in range_lock 06/48106/2
Qian Yingjin [Tue, 2 Aug 2022 09:14:48 +0000 (05:14 -0400)]
LU-15994 llite: use fatal_signal_pending in range_lock

FIO io_uring failed with one file shared by two FIO processes
under the Ubuntu 22.04 kernel.
Analysis showed that range_lock() returns -ERESTARTSYS when
there is a pending signal on the current process during
Lustre I/O. This causes -EINTR to be returned to the application.

Solve this bug by replacing @signal_pending(current) with
@fatal_signal_pending(current) in range_lock(). The range_lock()
function now only returns -ERESTARTSYS when the current process has
a fatal pending signal such as SIGKILL.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I0a0be8fa3b4ba5c89f7866286b2bdc6595f18026
Reviewed-on: https://review.whamcloud.com/48106
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16042 tests: can not get cache size on Arm64 30/48030/2
Kevin Zhao [Mon, 25 Jul 2022 07:53:44 +0000 (15:53 +0800)]
LU-16042 tests: can not get cache size on Arm64

This fixes the test failure on Arm64, where the cache size is not
displayed in /proc/cpuinfo. Even in a VM and on some
older Arm64 CPUs, the cache size cannot be obtained. So it is
better to fall back to a preset value when the cache size is
not available.

Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Change-Id: I17ce1d8accc69d1489db2071a2741b3927fff302
Reviewed-on: https://review.whamcloud.com/48030
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16019 llite: fully disable readahead in kernel I/O path 93/47993/4
Qian Yingjin [Wed, 20 Jul 2022 02:22:35 +0000 (22:22 -0400)]
LU-16019 llite: fully disable readahead in kernel I/O path

In newer kernels (RHEL9 or Ubuntu 22.04), the readahead path may
be outside the control of the Lustre CLIO engine:

generic_file_read_iter()
  ->filemap_read()
    ->filemap_get_pages()
      ->page_cache_sync_readahead()
        ->page_cache_sync_ra()

void page_cache_sync_ra()
{
if (!ractl->ra->ra_pages || blk_cgroup_congested()) {
if (!ractl->file)
return;
req_count = 1;
do_forced_ra = true;
}

/* be dumb */
if (do_forced_ra) {
force_page_cache_ra(ractl, req_count);
return;
}
...
}

From the kernel readahead code, even if read-ahead is disabled
(via @ra_pages == 0), it still issues this request as read-ahead
as we will need it to satisfy the requested range. The forced
read-ahead will do the right thing and limit the read to just
the requested range, which we will set to 1 page for this case.

Thus setting @ra_pages to 0 alone cannot fully avoid read-ahead
in the kernel I/O path.
To fully disable read-ahead in the Linux kernel I/O path,
@io_pages must also be set to 0, which limits the I/O range to 0 in
@force_page_cache_ra():
void force_page_cache_ra()
{
...
max_pages = max_t(unsigned long, bdi->io_pages,
    ra->ra_pages);
nr_to_read = min_t(unsigned long, nr_to_read, max_pages);
while (nr_to_read) {
...
}
...
}

After setting bdi->io_pages to 0, sanity/101j passes.
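
The clamp quoted from force_page_cache_ra() can be modelled in a few lines of user-space C. This is only a model of the two quoted statements, with hypothetical helper names; it is not kernel code.

```c
#include <assert.h>

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Number of pages a forced read-ahead would still read, given the
 * bdi io_pages limit, the file's ra_pages and the requested count.
 */
static unsigned long forced_ra_pages(unsigned long io_pages,
				     unsigned long ra_pages,
				     unsigned long nr_to_read)
{
	unsigned long max_pages = max_ul(io_pages, ra_pages);

	return min_ul(nr_to_read, max_pages);
}
```

In this model, a one-page forced request with ra_pages == 0 still reads one page while io_pages is non-zero, and reads nothing once io_pages is also 0.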

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I859a6404abb9116d9acfa03de91e61d3536d3554
Reviewed-on: https://review.whamcloud.com/47993
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15938 llog: llog_reader to detect more corruptions 34/47934/6
Mikhail Pershin [Tue, 12 Jul 2022 06:40:38 +0000 (09:40 +0300)]
LU-15938 llog: llog_reader to detect more corruptions

Improve llog_reader to detect more corruptions and report
errors:
 - notify if llog bitmap has bits set with no records in llog
 - compare header records count with amount of records really
   found
 - fix amount of records to output, preventing wrong output of
   NOT SET record
 - list missing records in gap if found
 - count all errors found, add prefix 'error:' in output for
   better output processing by third-party scripts
 - don't exit immediately in case of error but continue if
   possible and output all read valid data
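
Two of the checks listed above, bitmap bits without records and a header count that disagrees with the records found, can be sketched as follows. The structure is a hypothetical 32-record model and does not reflect the real llog on-disk format or the llog_reader source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a tiny llog with at most 32 records. */
struct llog_model {
	uint32_t     bitmap;	/* bit i set: header claims record i exists */
	uint32_t     found;	/* bit i set: record i was actually read */
	unsigned int hdr_count;	/* record count stored in the header */
};

static unsigned int popcount32(uint32_t v)
{
	unsigned int c = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		c++;
	}
	return c;
}

/* Count errors instead of stopping at the first one. */
static unsigned int llog_model_errors(const struct llog_model *l)
{
	unsigned int errors = 0;

	/* bits set in the bitmap with no matching record */
	errors += popcount32(l->bitmap & ~l->found);
	/* header count differs from the number of records found */
	if (l->hdr_count != popcount32(l->found))
		errors++;
	return errors;
}
```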

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic47dc6bb6cbdd9db6f888a0b892254403a628912
Reviewed-on: https://review.whamcloud.com/47934
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15996 quota: change 'none' to 'expired' for grace 12/47912/7
Hongchao Zhang [Thu, 28 Jul 2022 13:56:55 +0000 (21:56 +0800)]
LU-15996 quota: change 'none' to 'expired' for grace

If the grace time has expired, it is better for the grace field in
'lfs quota' output to show 'expired' than 'none'.

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Honghao Zhang <hongchao@whamcloud.com>
Change-Id: I7a3fac77ca6e16ad406bef0bd7642d6d50feb4b2
Reviewed-on: https://review.whamcloud.com/47912
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15896 gss: support OpenSSLv3 17/47717/11
Sebastien Buisson [Mon, 13 Jun 2022 12:41:11 +0000 (14:41 +0200)]
LU-15896 gss: support OpenSSLv3

Lustre GSS code makes use of some OpenSSL API that has been
deprecated in v3, namely all the functions in the DH_* family.
So replace them with their EVP_PKEY_* counterparts if Lustre is
built on a system with OpenSSLv3.

Fixes: ee60c14360 ("LU-15896 gss: ignore OpenSSLv3 deprecated API")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I78a4ca18b25aca3c34fe84e41413a33caddc01b6
Reviewed-on: https://review.whamcloud.com/47717
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-9859 lfsck: use Linux bitmap API 79/47579/11
James Simmons [Mon, 11 Jul 2022 16:41:58 +0000 (12:41 -0400)]
LU-9859 lfsck: use Linux bitmap API

Replace the use of the libcfs specific bitmap API used by lfsck
with the standard Linux bitmap API.

Change-Id: Icc0d9d2ceb9ca7b4b94dd728d9b9c499cf4d2414
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47579
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-14975 utils: non-recursive dir migration fix 12/47012/2
Lai Siyao [Wed, 30 Mar 2022 06:28:04 +0000 (02:28 -0400)]
LU-14975 utils: non-recursive dir migration fix

If sem_init() doesn't return 0, llapi_semantic_traverse() won't
call sem_fini() in directory traverse, therefore
cb_migrate_mdt_init() shouldn't increase param->fp_depth if it
reaches max depth in non-recursive mode.

Update sanity 230w.

Fixes: 5604a6d270b ("LU-14975 dne: dir migration in non-recursive mode")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8814aaae7c267cec51654175f9fa0708f7685a5a
Reviewed-on: https://review.whamcloud.com/47012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15486 lod: mirroring a plain file in mirrored-layout dir 17/46517/2
Bobi Jam [Mon, 14 Feb 2022 11:50:04 +0000 (19:50 +0800)]
LU-15486 lod: mirroring a plain file in mirrored-layout dir

If a file does not have a mirror but lives in a directory with a
default FLR mirror, then "lfs mirror extend" on the file fails with
"cannot create volatile file: Invalid argument".

This happens because the non-striped file layout generated by LOD
inherits its FLR state from the default FLR even though it contains
no mirror, and lov_init_composite() rejects the mismatch:

 if (equi(flr_state == LCM_FL_NONE, comp->lo_mirror_count > 1))
         RETURN(-EINVAL);
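
A minimal sketch of why the check above fires, assuming equi() is the
usual logical-equivalence helper. The enum and function names are
hypothetical stand-ins, not the real LOV definitions:

```c
#include <assert.h>
#include <errno.h>

/* Logical equivalence: true when both conditions have the same truth value. */
#define equi(a, b) (!!(a) == !!(b))

/* Hypothetical stand-ins for the FLR state values. */
enum flr_state_sketch { FLR_NONE = 0, FLR_RDONLY = 1 };

/*
 * Returns 0 when the FLR state and the mirror count agree, -EINVAL
 * otherwise. "No FLR state" must go together with "at most one mirror".
 */
int check_flr_layout(enum flr_state_sketch state, unsigned int mirror_count)
{
	if (equi(state == FLR_NONE, mirror_count > 1))
		return -EINVAL;
	return 0;
}
```

With these definitions, a plain file that inherited a non-NONE state but
has a single mirror is rejected, which matches the failure described.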

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I5e849acb2327ce735d0008271bfd48fa7293161c
Reviewed-on: https://review.whamcloud.com/46517
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-8837 lustre: make uapi...lustre_disk.h unnecessary on client 94/41994/14
Mr NeilBrown [Sun, 8 May 2022 22:09:31 +0000 (18:09 -0400)]
LU-8837 lustre: make uapi...lustre_disk.h unnecessary on client

uapi/linux/lustre/lustre_disk.h doesn't contain anything that is
needed for client-only code, but that code doesn't compile with the
file excluded, largely due to dependency on IS_SERVER() and related
macros.

So for client-only code provide stubs for IS_SERVER() and related
macros, and don't include the uapi...lustre_disk.h file.

This will cause some code to now be compiled-out on client-only, and
allows some #ifdefs to be removed.

A few functions need to be protected with #ifdef HAVE_SERVER_SUPPORT,
and llog_server.o needs to be disabled for client-only compiles.
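
The stub approach can be sketched roughly as follows. The structure,
flag, and macro names ending in _sketch/_SKETCH are hypothetical and only
illustrate the pattern, not the real lustre_disk.h definitions:

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the superblock info structure. */
struct sb_info_sketch {
	unsigned int flags;
};

#define SKETCH_FLAG_SERVER 0x1u

#ifdef HAVE_SERVER_SUPPORT
/* Server-capable build: consult the real flag. */
# define IS_SERVER_SKETCH(sbi) (((sbi)->flags & SKETCH_FLAG_SERVER) != 0)
#else
/* Client-only build: constant false, so guarded code is compiled out. */
# define IS_SERVER_SKETCH(sbi) ((void)(sbi), 0)
#endif

/* Returns 1 for the server path and 0 for the client path. */
int mount_kind(const struct sb_info_sketch *sbi)
{
	if (IS_SERVER_SKETCH(sbi))
		return 1;	/* dead code in a client-only build */
	return 0;
}
```

Because the stub is a compile-time constant, callers need no #ifdef of
their own and the compiler can drop the server branch.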

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I19c5b1612108e448f8b6a1fe3d3a448aa4abdd2a
Reviewed-on: https://review.whamcloud.com/41994
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-8238 ldlm: rid of obsolete param of ldlm_resource_get() 31/20631/10
Bobi Jam [Thu, 2 Sep 2021 15:44:57 +0000 (23:44 +0800)]
LU-8238 ldlm: rid of obsolete param of ldlm_resource_get()

The second parameter @parent of ldlm_resource_get() is obsolete, just
remove it.

Test-Parameters: trivial
Change-Id: I88af99c6984eda50a21da4d516ce7dea8cba60f5
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/20631
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-12511 utils: fix regression for UAPI headers for native client 03/47803/6
James Simmons [Mon, 27 Jun 2022 15:24:19 +0000 (11:24 -0400)]
LU-12511 utils: fix regression for UAPI headers for native client

A patch landed that adds a wiretest for the GSS wire protocol, which
is lacking for the native client. Disable the new test code for the
native Linux client build.

Test-Parameters: trivial
Fixes: 7dfbc71350 ("LU-9243 gss: fix GSS struct definition badness")
Change-Id: I31c387b757a77f4503b923c784911afc16c878a0
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47803
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-9680 utils: fix Netlink / YAML library handling 02/47802/11
James Simmons [Sat, 16 Jul 2022 19:34:48 +0000 (15:34 -0400)]
LU-9680 utils: fix Netlink / YAML library handling

Testing the implementation of Netlink with Lustre with early
code revealed some userland lnetconfig library bugs. The bugs
fixed are:

1) The first issue was that the YAML parser and emitter can share a
   netlink socket. This means the netlink callbacks will expect
   the same void argument to be passed in. In the error handling
   case we expected a struct yaml_netlink_output, but all other
   callbacks were using struct yaml_netlink_input. This mismatch
   can cause the application to segfault. So move all netlink
   callback handling to use just the yaml_parser. The yaml
   emitter is now used only to send Netlink packets to the
   kernel.

   Also fix the Netlink ext_ack error message handling.

2) In my board testing I found various bugs related to the
   parsing of the YAML to create the Netlink packet to send to the
   kernel. This patch resolves all the known issues. Most are
   related to the complex layering of sequences, mappings and
   flows in a YAML block.

3) Fix up nla_strlcpy autoconf test which always fails with
   Oleg's special setup.

4) Add a Netlink protocol version YAML function.

Test-Parameters: trivial
Change-Id: I7e7c755ceaa969dffff8c6f771c2ac048dc55720
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47802
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Joe Atzinger <joseph.atzinger@microsoft.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-8367 osp: enable replay for precreation request 89/46889/21
Alexander Boyko [Tue, 22 Mar 2022 12:09:01 +0000 (08:09 -0400)]
LU-8367 osp: enable replay for precreation request

Lustre has a kind of deadlock between osp_precreate_thread()
and stripe allocation in osp_precreate_reserve(). In case of OST
failover, the stripe allocation thread has allocated objects and
sleeps in osp_precreate_reserve() waiting for more. After reconnection,
osp_precreate_thread() calls osp_precreate_cleanup_orphans() to
synchronize the last id and clean up unused objects, but it waits for
the object reservation count (d->opd_pre_reserved) to reach zero. So no
more objects can be created on the OST and no reserved objects can be
freed. This produces "slow creates" messages and MDT creation threads hang:
osp_precreate_reserve()) kjcf05-OST0003-osc-MDT0001: slow creates,
 last=[0x340000400:0x23a4f483:0x0], next=[0x340000400:0x23a4f378:0x0],
 reserved=267, sync_changes=0, sync_rpcs_in_progress=0, status=0
The issue reproduces more often with the overstriping feature.

The orphan clean-up phase is not needed when the MDT supports
resend/replay for precreation requests. This behaviour resolves the
osp_precreate_cleanup_orphans() hang and unblocks object creation.

Force creation logic is added to support a reformatted OST with the
same index. This was previously done during the orphan clean-up phase.

Sanity tests 27S and 822 become invalid. 27S relies on orphan
clean-up after reconnection, and 822 relies on the OST_CREATE request
not being resendable. These tests are removed.

HPE-bug-id: LUS-10793
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I21287b51252e573e796fac69ee3df6ac90e28c10
Reviewed-on: https://review.whamcloud.com/46889
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15850 lmv: always space-balance r-r directories 78/47578/6
Lai Siyao [Thu, 9 Jun 2022 11:44:41 +0000 (07:44 -0400)]
LU-15850 lmv: always space-balance r-r directories

If the MDT free space is imbalanced, use QOS space balancing for
round-robin subdirectory creation, regardless of the depth
of the directory tree.  Otherwise, new subdirectories created
in parents with round-robin default layout may suddenly become
"sticky" on the parent MDT and upset the space balancing and
load distribution.

Add sanity/test_413h to check that round-robin dirs always balance.

Test-Parameters: testlist=sanity env=ONLY=413h,ONLY_REPEAT=100
Fixes: 38c4c538f5 ("LU-15216 lmv: improve MDT QOS space balance")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia1d0b5b1a027cf14236f93ae34b5cf4929e76d23
Reviewed-on: https://review.whamcloud.com/47578
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15846 tests: don't use comma-separated debug flags 08/47308/7
Andreas Dilger [Thu, 12 May 2022 04:45:45 +0000 (22:45 -0600)]
LU-15846 tests: don't use comma-separated debug flags

To avoid test interop issues between 2.15 clients and 2.12/2.14
servers, don't use comma-separated debug flags in sanity-quota.sh
quota_init() and quota_fini().

Test-Parameters: trivial testlist=sanity-quota.sh env=ONLY=0 serverversion=2.14.0
Fixes: 6b6fde1026 ("LU-13055 libcfs: allow comma-separated masks")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifca39054d14292bca8bcff9b8e03ae58fd5cc3a8
Reviewed-on: https://review.whamcloud.com/47308
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-14651 ldiskfs: add 5.11 kernel support 00/47900/3
James Simmons [Wed, 6 Jul 2022 17:38:17 +0000 (11:38 -0600)]
LU-14651 ldiskfs: add 5.11 kernel support

Ubuntu 20.04.3 LTS moved to the 5.11 kernel. Support for this
kernel is a small step from the 5.10 kernel support. This patch
adds these small changes to support ldiskfs.

Test-Parameters: trivial
Change-Id: I3055736658b628fe79a6a9fc20ac01e7e1597630
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47900
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15821 ldlm: Prioritize blocking callbacks 15/47215/9
Patrick Farrell [Thu, 5 May 2022 00:50:57 +0000 (20:50 -0400)]
LU-15821 ldlm: Prioritize blocking callbacks

The current code places bl_ast lock callbacks at the end of
the global BL callback queue.  This is bad because it
causes urgent requests from the server to wait behind
non-urgent cleanup tasks to keep lru_size at the right
level.

This can lead to evictions if there is a large queue of
items in the global queue so the callback is not serviced
in a timely manner.

Put bl_ast callbacks on the priority queue so they do not
wait behind the background traffic.
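
A rough sketch of the two-queue idea, under the assumption that workers
always drain the priority queue first. All type and function names here
are hypothetical and much simpler than the real ldlm blocking-callback
pool:

```c
#include <assert.h>
#include <stddef.h>

struct work_item {
	int id;
	struct work_item *next;
};

struct fifo {
	struct work_item *head;
	struct work_item *tail;
};

struct bl_pool_sketch {
	struct fifo prio;	/* urgent: blocking ASTs from the server */
	struct fifo normal;	/* background: LRU cleanup and the like */
};

/* Append an item to the tail of a queue. */
void fifo_push(struct fifo *q, struct work_item *w)
{
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Remove and return the head of a queue, or NULL if it is empty. */
struct work_item *fifo_pop(struct fifo *q)
{
	struct work_item *w = q->head;

	if (w) {
		q->head = w->next;
		if (!q->head)
			q->tail = NULL;
	}
	return w;
}

/* Workers take urgent items before any background item. */
struct work_item *pool_next(struct bl_pool_sketch *p)
{
	struct work_item *w = fifo_pop(&p->prio);

	return w ? w : fifo_pop(&p->normal);
}
```

An urgent item queued after many background items is still served first,
which is the property that avoids the eviction described above.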

Add some additional debug in this area.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic6eb65819a4a93e9d30e807d386ca18380b30c7d
Reviewed-on: https://review.whamcloud.com/47215
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15850 llite: pass dmv inherit depth instead of dir depth 77/47577/6
Lai Siyao [Thu, 9 Jun 2022 11:40:42 +0000 (07:40 -0400)]
LU-15850 llite: pass dmv inherit depth instead of dir depth

In directory creation, once its ancestor has a default LMV, pass
the inherit depth; otherwise pass the directory depth to ROOT.

This depth will be used in QoS allocation.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id480f32c1718e9f62314c2dfe8905be5db94d1f2
Reviewed-on: https://review.whamcloud.com/47577
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16010 kernel: kernel update RHEL8.6 [4.18.0-372.16.1.el8_6] 48/47948/4
Jian Yu [Fri, 15 Jul 2022 16:50:08 +0000 (09:50 -0700)]
LU-16010 kernel: kernel update RHEL8.6 [4.18.0-372.16.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.16.1.el8_6.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I08db577f31a1d686b88804384a05d5b418e634d5
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47948
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15910 llite: use max default EA size to get default LMV 37/47937/4
Lai Siyao [Mon, 11 Jul 2022 14:27:32 +0000 (10:27 -0400)]
LU-15910 llite: use max default EA size to get default LMV

Subdir mount will fetch ROOT default LMV and set it, but the default
EA size cl_default_mds_easize may not be set for MDT0 yet, because
it's updated upon getattr/enqueue, and if subdir mount is not on MDT0,
it may not be initialized yet. Use max EA size to fetch default
layout in ll_dir_get_default_layout().

Fixes: a162e24d2d ("LU-15910 llite: enforce ROOT default on subdir mount")
Fixes: 716ac65ef6 ("LU-15910 tests: skip sanity/413g for SSK")
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Test-Parameters: env=SHARED_KEY=true,ONLY="413b 413c 413g" testlist=sanity mdscount=1 mdtcount=2
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3c762cd371a80c2bea12d7fdbc16c6b14b3214e6
Reviewed-on: https://review.whamcloud.com/47937
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-12189 ec: code to add support for M to N parity 28/47628/5
James Simmons [Wed, 29 Jun 2022 19:32:13 +0000 (15:32 -0400)]
LU-12189 ec: code to add support for M to N parity

This code adds basic functionality for calculating N parities
for M data units. This allows much more than just working with
raid6 calculations. The code is derived from the Intel isa-l
userland library. Keep the code in a separate module for easy
merger upstream at a later time.

Test-Parameters: trivial
Change-Id: Ie0bb5af2514c213db40de33139e03e16f9605ce8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Adam Disney <disneyaw@ornl.gov>
Reviewed-on: https://review.whamcloud.com/47628
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15851 lnet: Adjust niov checks for large MD 19/47319/3
Chris Horn [Sat, 16 Apr 2022 16:01:57 +0000 (10:01 -0600)]
LU-15851 lnet: Adjust niov checks for large MD

An LNet user can allocate a large contiguous MD. That MD can have >
LNET_MAX_IOV pages which causes some LNDs to assert on either niov
argument passed to lnd_recv() or the value stored in
lnet_msg::msg_niov. This is true even in cases where the actual
transfer size is <= LNET_MTU and will not exceed limits in the LNDs.

Adjust ksocklnd_send()/ksocklnd_recv() to assert on the return value
of lnet_extract_kiov().

Remove the assert on msg_niov (payload_niov) from kiblnd_send().
kiblnd_setup_rd_kiov() will already fail if we exceed ko2iblnd's
available scatter gather entries.

HPE-bug-id: LUS-10878
Test-Parameters: trivial
Fixes: 857f11169f ("LU-13004 lnet: always put a page list into struct lnet_libmd")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaa851d90f735d04e5167bb9c07235625759245b2
Reviewed-on: https://review.whamcloud.com/47319
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16036 test: make sanity-lfsck 15d more robust 48/48048/3
Lai Siyao [Fri, 15 Jul 2022 08:30:32 +0000 (04:30 -0400)]
LU-16036 test: make sanity-lfsck 15d more robust

LFSCK of a migrating directory is not fully supported yet, so accessing
such a directory may fail. To make sanity-lfsck 15d more robust,
reformat the servers after the test.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity-lfsck env=ONLY=15d,ONLY_REPEAT=100
Fixes: 54a2d4662b58 ("LU-15868 lfsck: don't crash upon dir migration failure")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie613737ab3e8d294e1f9e5709ceb35baa75790ad
Reviewed-on: https://review.whamcloud.com/48048
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16023 tests: sanity-quota/8 should return success 66/47966/4
Alex Zhuravlev [Mon, 18 Jul 2022 10:28:49 +0000 (13:28 +0300)]
LU-16023 tests: sanity-quota/8 should return success

sanity-quota/8 should return success explicitly

Test-Parameters: trivial testlist=sanity-quota
Fixes: bc69a8d058 ("LU-8621 utils: cmd help to stdout or short cmd error")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id30bfd3e0bafedb6516471accbc0519cc640d2bd
Reviewed-on: https://review.whamcloud.com/47966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15282 tests: relax test_51d thresholds somewhat 78/47978/3
Andreas Dilger [Mon, 18 Jul 2022 22:12:12 +0000 (16:12 -0600)]
LU-15282 tests: relax test_51d thresholds somewhat

Added combinations for sanity.sh test_51d are failing some fraction
of the time.  Relax thresholds somewhat to avoid spurious failures,
while keeping added configs to detect major regressions.

Test-Parameters: trivial
Fixes: fd5c915eff ("LU-15282 tests: improve sanity test_51d coverage")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2d28f77fa377a1eb96a00cf35827b9ebc5af806b
Reviewed-on: https://review.whamcloud.com/47978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15983 lnet: Define KFILND network type 30/47830/5
Chris Horn [Wed, 29 Jun 2022 00:24:32 +0000 (19:24 -0500)]
LU-15983 lnet: Define KFILND network type

Define the KFILND network type. This reserves the network type number
for future implementation and allows creation of kfi peers and
adding routes to kfi peers.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11060
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9111645f1290c8af4937d1b2689a068df81922a4
Reviewed-on: https://review.whamcloud.com/47830
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15850 mdt: pack default LMV in open reply 76/47576/7
Lai Siyao [Thu, 9 Jun 2022 11:26:40 +0000 (07:26 -0400)]
LU-15850 mdt: pack default LMV in open reply

Add flag MDS_OPEN_DEFAULT_LMV to indicate that the default LMV should
be packed in the open reply. Otherwise, if open fetches the LOOKUP
lock, the client won't know the directory has a default LMV, and the
default LMV won't take effect on subdir creation.

Test-Parameters: clientversion=2.14 testlist=sanity mdtcount=4 mdscount=2 env=SANITY_EXCEPT="39l 134b 150b 160g 205a 208 230e 230p 270a 300g 807"
Test-Parameters: serverversion=2.14 testlist=sanity mdtcount=4 mdscount=2 env=SANITY_EXCEPT="65n 247f"
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If2300ca39f406169eff9eab8f973ca1c2bfc8202
Reviewed-on: https://review.whamcloud.com/47576
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-14472 quota: skip non-exist or inact tgt for lfs_quota 71/41771/19
Hongchao Zhang [Wed, 15 Dec 2021 12:11:17 +0000 (20:11 +0800)]
LU-14472 quota: skip non-exist or inact tgt for lfs_quota

The nonexistent or inactive targets (MDC or OSC) should be skipped
for "lfs quota".

Change-Id: I25eece413715e4e05dd94ccbfd101220da7477f9
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41771
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng, Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11407 tgt: cleanup job_stats output printing 64/37764/24
Andreas Dilger [Fri, 10 Jan 2020 21:18:53 +0000 (14:18 -0700)]
LU-11407 tgt: cleanup job_stats output printing

Escape non-printable and other special characters in the JobID
name, which may be passed from the client environment, to avoid
breaking YAML format parsing.  We can't use the kernel "%*pE"
escape format, because that doesn't have any option to escape
printable characters like quotes or regular spaces.
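
A minimal sketch of such an escaping helper. The \xHH output format, the
set of characters passed through unchanged, and the function name are
assumptions made for illustration, not the format the patch emits:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy src into dst, replacing every byte that is not alphanumeric or
 * one of a small set of safe punctuation characters with a \xHH escape.
 * Returns the length written (excluding the NUL), or -1 if dst is too
 * small.
 */
int escape_jobid(const char *src, char *dst, size_t dstlen)
{
	size_t used = 0;

	if (dstlen == 0)
		return -1;

	for (; *src != '\0'; src++) {
		unsigned char c = (unsigned char)*src;

		if (isalnum(c) || strchr("._-@:/", c) != NULL) {
			/* need room for this byte plus the final NUL */
			if (used + 1 >= dstlen)
				return -1;
			dst[used++] = (char)c;
		} else {
			/* need room for four bytes plus the final NUL */
			if (used + 4 >= dstlen)
				return -1;
			snprintf(dst + used, 5, "\\x%02x", c);
			used += 4;
		}
	}
	dst[used] = '\0';
	return (int)used;
}
```

Spaces and quotes are printable, so a "non-printable only" escape would
leave them alone; listing the safe characters explicitly avoids that.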

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If6a0bc4276fae03305f94e8c85d8f109913ebbe5
Reviewed-on: https://review.whamcloud.com/37764
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-6612 utils: strengthen llog_reader vs wrong format/header 54/15654/5
Bruno Faccini [Mon, 20 Jul 2015 14:30:11 +0000 (16:30 +0200)]
LU-6612 utils: strengthen llog_reader vs wrong format/header

The following snippet shows that llog_reader can be confused by an
invalid record count of 0 when parsing what is expected to be a
LLOG file header:
root# dd if=/dev/zero bs=4096 count=1 of=/tmp/zeroes
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000263962 s, 15.5 MB/s
root# llog_reader /tmp/zeroes
Memory Alloc for recs_buf error.
Could not pack buffer; rc=-12
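
One plausible guard, sketched under assumptions: the header structure,
the magic value, and the function name below are hypothetical, and the
real llog header layout differs. The idea is to reject a bad header
before sizing any allocation from it:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down header; not the real on-disk layout. */
struct llog_hdr_sketch {
	uint32_t magic;
	uint32_t rec_count;
};

/* Arbitrary value chosen for this sketch only. */
#define SKETCH_MAGIC 0x4c4c4f47u

/*
 * Validate the header before allocating the record-pointer array.
 * Returns 0 and sets *recs on success, a negative errno otherwise.
 */
int alloc_recs(const struct llog_hdr_sketch *hdr, void ***recs)
{
	if (hdr->magic != SKETCH_MAGIC)
		return -EINVAL;		/* not a llog file at all */
	if (hdr->rec_count == 0)
		return -EINVAL;		/* treated as corrupt in this sketch */

	*recs = calloc(hdr->rec_count, sizeof(**recs));
	return *recs ? 0 : -ENOMEM;
}
```

With a check like this, the zero-filled file above would be reported as
an invalid log rather than as a memory allocation failure.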

Test-Parameters: trivial testlist=sanity,sanity-hsm
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I12be79e6c6a5da384a5fd81878a76a7ea8aa5834
Reviewed-on: https://review.whamcloud.com/15654
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15984 o2iblnd: debug message is missing a newline 33/47933/2
Serguei Smirnov [Mon, 11 Jul 2022 22:49:04 +0000 (15:49 -0700)]
LU-15984 o2iblnd: debug message is missing a newline

Add missing newline to one of the debug messages in
kiblnd_pool_alloc_node.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I541622322ea6166892270dbfd1567cc64f8c314c
Reviewed-on: https://review.whamcloud.com/47933
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15938 lod: prevent endless retry in recovery thread 98/47698/3
Mikhail Pershin [Wed, 22 Jun 2022 10:27:48 +0000 (13:27 +0300)]
LU-15938 lod: prevent endless retry in recovery thread

- abort lod_sub_recovery_thread() by obd_abort_recov_mdt in
  addition to obd_abort_recovery
- handle 'short llog' situation gracefully, when remote llog
  is shorter than local copy header expects, trust remote llog
  data and consider llog processing as finished
- on other errors during remote llog read, set obd_abort_recov_mdt
  but not obd_abort_recovery in an attempt to skip MDT-MDT recovery
  only and continue with client recovery while possible
- fix a parsing problem with 'abort_recov' and 'abort_recov_mdt' in
  lmd_parse() which never aborted MDT recovery but always aborted
  client recovery. Also allow the 'abort_recovery_mdt' mount option name

The original case of endless retry is caused by such a de-sync
between the local llog structures and the remote llog. The local llog
header says there is a record with some ID, so the recovery thread
tries to get that record from the remote llog. Meanwhile there
is no such record on the remote server, so it reads the whole llog and
returns it properly, but llog processing considers that an
incomplete llog due to network issues and retries endlessly.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib127fd0d1abd5289d90c7b4b3ca74ab6fc78bc71
Reviewed-on: https://review.whamcloud.com/47698
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-10994 clio: Remove cl_2queue_add wrapper 51/47651/4
Shivani Bhardwaj [Mon, 27 Jun 2022 15:13:25 +0000 (11:13 -0400)]
LU-10994 clio: Remove cl_2queue_add wrapper

Remove the wrapper function cl_2queue_add() and replace all its calls in
different files with the function it wrapped. Also, comments are added
wherever necessary to make the working of function clear. Prototype of
the function is also removed from the header file as it is no longer
needed.

Linux-commit: 53f1a12768a55e53b2c40e00a8804b1edfa739b3

Change-Id: Ic746c45e3dda9fdf3f1d2f8c6204d80fec5c058f
Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/47651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15925 lnet: add debug messages for IB 83/47583/4
Cyril Bordage [Thu, 9 Jun 2022 21:41:54 +0000 (23:41 +0200)]
LU-15925 lnet: add debug messages for IB

If net debug is enabled, information about the connection is
collected when the tx status is ECONNABORTED (IB only).

Test-Parameters: trivial
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I44a33703931630b85cc0e847e2a038217b7967c6
Reviewed-on: https://review.whamcloud.com/47583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15902 obdclass: dt_try_as_dir() check dir exists 83/47483/7
Lai Siyao [Thu, 19 May 2022 22:31:07 +0000 (18:31 -0400)]
LU-15902 obdclass: dt_try_as_dir() check dir exists

If an object is not a directory but dt_lookup() is called on it, it
may crash because .do_lookup is NULL for a non-directory file.

Add an argument to dt_try_as_dir() to check object existence and type,
and skip this check for an object that is about to be created.
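
The shape of such a check can be sketched as below. The structures, the
error codes returned, and the function names are hypothetical and only
illustrate guarding a NULL method pointer, not the real dt_object API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct obj_sketch;

struct index_ops_sketch {
	int (*lookup)(const struct obj_sketch *o, const char *name);
};

struct obj_sketch {
	bool exists;
	bool is_dir;
	const struct index_ops_sketch *index_ops; /* NULL until set up as a dir */
};

/* Placeholder lookup that always succeeds. */
int dir_lookup(const struct obj_sketch *o, const char *name)
{
	(void)o;
	(void)name;
	return 0;
}

const struct index_ops_sketch dir_ops = { .lookup = dir_lookup };

/*
 * Prepare an object for directory operations. With check set, refuse
 * objects that do not exist or are not directories; callers that are
 * about to create the object pass check = false.
 */
int try_as_dir(struct obj_sketch *o, bool check)
{
	if (check) {
		if (!o->exists)
			return -ENOENT;
		if (!o->is_dir)
			return -ENOTDIR;
	}
	o->index_ops = &dir_ops;
	return 0;
}
```

A caller that honours the return value never reaches a lookup through a
NULL method table on a regular file.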

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I51df0cbb5a4e7abca370ee27dac678f995b76159
Reviewed-on: https://review.whamcloud.com/47483
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15880 quota: fix issues in reserving quota 25/47425/4
Hongchao Zhang [Fri, 24 Jun 2022 14:06:08 +0000 (22:06 +0800)]
LU-15880 quota: fix issues in reserving quota

Calling "chgrp" as an unprivileged user reserves quota space
before changing the GID of the file, and the reserved quota space
is freed after its transaction is committed. There are some
issues in the current implementation:
1, the reserved quota isn't freed in case of an error in "mdd_attr_set"
   and "tgt_cb_last_committed".
2, when freeing the reserved quota, the quota space to free is
   set from the same parameter used when reserving the quota, which
   could be wrong. For instance, the reserved quota space will be 0 if
   the corresponding quota ID isn't enforced, but the call will
   return without error.

Like "qsd_op_begin/qsd_op_end", the patch also takes a reference on
the lquota_entry obtained when reserving quota and releases it when
freeing the reserved quota, to prevent potential issues.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I098cde7d5e89fe8b9eaab0ae4bc285a4ac6c2281
Reviewed-on: https://review.whamcloud.com/47425
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15991 kernel: kernel update RHEL7.9 [3.10.0-1160.71.1.el7] 65/47865/2
Jian Yu [Tue, 5 Jul 2022 06:08:40 +0000 (23:08 -0700)]
LU-15991 kernel: kernel update RHEL7.9 [3.10.0-1160.71.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.71.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I89215145ea8da2925e5c8c01cdf963ba8a087877
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16000 utils: align updatelog parameters in llog_reader 13/47913/2
Etienne AUJAMES [Fri, 8 Jul 2022 10:36:10 +0000 (12:36 +0200)]
LU-16000 utils: align updatelog parameters in llog_reader

Parameters in update log records are aligned on 64 bits. llog_reader
does not align these parameters: if a parameter's size is not a multiple
of 8, the next parameter's size is read incorrectly.
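
The alignment arithmetic can be sketched as follows. The helper names are
hypothetical, and the sketch assumes each parameter's payload is simply
padded up to the next 8-byte boundary:

```c
#include <assert.h>
#include <stddef.h>

/* Round a size up to the next multiple of 8 bytes (64 bits). */
size_t round_up8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/*
 * Offset of the next parameter, given the offset of the current
 * parameter's payload and its unpadded size.
 */
size_t next_param_offset(size_t off, size_t param_size)
{
	return off + round_up8(param_size);
}
```

Advancing by the raw size instead of the rounded size lands inside the
padding, which is how the next size field ends up misread.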

Test-Parameters: trivial
Fixes: 9962d6f ("LU-14617 utils: llog_reader updatelog support")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I6871614ab4ea79d59c3c3b4644b377de395bad56
Reviewed-on: https://review.whamcloud.com/47913
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15993 ofd: don't leak pages if nodemap fails 73/47873/4
Alex Zhuravlev [Tue, 5 Jul 2022 11:24:03 +0000 (14:24 +0300)]
LU-15993 ofd: don't leak pages if nodemap fails

ofd_commitrw() shouldn't exit without calling ofd_commitrw_write(),
otherwise the pages taken in ofd_preprw() are leaked.

The same applies to mdt_obd_commitrw().
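
A simplified sketch of the pattern, assuming the earlier error is passed
down so that the write-commit step can still release the pages. The
counter and function names are hypothetical:

```c
#include <assert.h>
#include <errno.h>

int pages_held;	/* pages pinned by the prepare step, not yet released */

void prep_pages(int n)
{
	pages_held += n;
}

void release_pages(int n)
{
	pages_held -= n;
}

/* Commit step: releases the pages whether or not old_rc is an error. */
int commit_write_sketch(int npages, int old_rc)
{
	int rc = old_rc;

	if (rc == 0) {
		/* the real data commit would happen here */
	}
	release_pages(npages);
	return rc;
}

/*
 * Entry point: do not return early on a nodemap error, since that
 * would skip the release done in the commit step.
 */
int commit_sketch(int npages, int nodemap_rc)
{
	return commit_write_sketch(npages, nodemap_rc);
}
```

The error is still reported to the caller, but the pages taken in the
prepare step are always given back.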

Fixes: bbfdc7c167 ("LU-14739 quota: fix quota with root squash enabled")

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Icd60c7ab80c5a7b65603d7da0d2e83872dc6b97f
Reviewed-on: https://review.whamcloud.com/47873
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>