fs/lustre-release.git
Serguei Smirnov [Mon, 16 Aug 2021 23:37:30 +0000 (16:37 -0700)]
LU-14945 lnet: don't use hops to determine the route state

NodeA <-tcp1-> GW1 <-tcp2-> GW2 <-tcp3-> NodeB

Assuming GW1 knows how to reach tcp3 network and GW2 knows
how to reach tcp1 network, it should be possible to add routes
without specifying hop=2 on nodes A and B to reach tcp3 and tcp1
respectively and then be able to lnetctl ping between them.
Changes introduced by LU-13785 interpret the default hops setting as
equivalent to an explicitly set hop=1 for the purpose of determining
route aliveness, which causes the routes created as described
above to be considered "down".

Fix it so that the default hop setting doesn't prevent
the multi-hop scenario from working.
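A minimal C sketch of the aliveness rule described above. The `struct route` fields, the `UNDEFINED_HOPS` sentinel and both `route_is_alive_*()` helpers are hypothetical simplifications, not LNet's actual code. The sketch assumes that treating a route as single-hop makes its aliveness depend on the gateway having a healthy NI on the remote network, which GW1 lacks for tcp3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "hops not specified"; illustrative only. */
#define UNDEFINED_HOPS UINT32_MAX

/* Hypothetical, simplified view of a route as seen from NodeA. */
struct route {
	uint32_t hops;		/* explicit hop count or UNDEFINED_HOPS */
	bool gw_alive;		/* the gateway itself responds */
	bool gw_has_remote_ni;	/* gateway has a healthy NI on the remote net */
};

/* Pre-fix behaviour: default hops is treated like an explicit hop=1. */
static bool route_is_alive_old(const struct route *r)
{
	if (!r->gw_alive)
		return false;
	if (r->hops == 1 || r->hops == UNDEFINED_HOPS)
		return r->gw_has_remote_ni;
	return true;
}

/* Fixed behaviour: only an explicit hop=1 requires a remote-net NI. */
static bool route_is_alive_new(const struct route *r)
{
	if (!r->gw_alive)
		return false;
	if (r->hops == 1)
		return r->gw_has_remote_ni;
	return true;
}
```

For NodeA's route to tcp3 via GW1 (hops left at the default, GW1 up, no tcp3 NI on GW1), the old rule reports the route as down while the fixed rule reports it as up.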

Test-Parameters: trivial
Fixes: 2e07619477 ("LU-13785 lnet: Use lr_hops for avoid_asym_router_failure")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I341ccdfe156434b0cb306359acc91a9193b44f7b
Reviewed-on: https://review.whamcloud.com/44674
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Wed, 4 Aug 2021 09:42:37 +0000 (03:42 -0600)]
LU-14895 osd-ldiskfs: combine checksum functions

Reduce code duplication for nearly-identical checksum calculations.
The osd_dif_type1_generate() and osd_dif_type3_generate() functions
were nearly identical, as were osd_dif_type1_verify() and
osd_dif_type3_verify().
Combine these functions to share the code, and handle the difference
between T10-PI type 1 and type 3 with an argument.
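A minimal C sketch of the idea: one generate and one verify routine that take the DIF type as an argument, with the reference tag as the only per-type difference. The names (`enum dif_type`, `struct pi_tuple`, `dif_generate()`, `dif_verify()`) are hypothetical, and the guard tag here is always a bitwise CRC-16/T10-DIF; the osd-ldiskfs code works on kernel pages and big-endian tuples, and Lustre also supports an IP-checksum guard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type selector; not the osd-ldiskfs names. */
enum dif_type { DIF_TYPE1 = 1, DIF_TYPE3 = 3 };

/* Simplified T10 protection-information tuple (host byte order). */
struct pi_tuple {
	uint16_t guard_tag;	/* CRC of the sector data */
	uint16_t app_tag;
	uint32_t ref_tag;	/* type 1: low 32 bits of the sector number */
};

/* Bitwise CRC-16/T10-DIF: poly 0x8BB7, init 0, no reflection, no xorout. */
static uint16_t crc_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* One generator for both types; only the ref tag handling differs. */
static void dif_generate(const uint8_t *data, size_t nsect, size_t sect_size,
			 uint32_t start_sector, enum dif_type type,
			 struct pi_tuple *pi)
{
	for (size_t i = 0; i < nsect; i++) {
		pi[i].guard_tag = crc_t10dif(data + i * sect_size, sect_size);
		pi[i].app_tag = 0;
		pi[i].ref_tag = (type == DIF_TYPE1) ?
				start_sector + (uint32_t)i : 0;
	}
}

/* Returns 0 if every tuple matches, -1 on the first mismatch. */
static int dif_verify(const uint8_t *data, size_t nsect, size_t sect_size,
		      uint32_t start_sector, enum dif_type type,
		      const struct pi_tuple *pi)
{
	for (size_t i = 0; i < nsect; i++) {
		/* Type 3 defines no reference tag, so it is not checked. */
		if (type == DIF_TYPE1 &&
		    pi[i].ref_tag != start_sector + (uint32_t)i)
			return -1;
		if (pi[i].guard_tag !=
		    crc_t10dif(data + i * sect_size, sect_size))
			return -1;
	}
	return 0;
}
```

Passing the type down keeps the sector loop and guard computation in one place, so a fix to either applies to both protection types.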

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I40afb15fd80577ef6de918c90e4111e775ce7057
Reviewed-on: https://review.whamcloud.com/44656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Wed, 4 Aug 2021 08:08:12 +0000 (02:08 -0600)]
LU-14895 brw: log T10 GRD tags during checksum calcs

Log the T10 guard tags during checksum calculation on the client and
target to help identify where checksum errors are being introduced.
The added debugging is only active on RPC resend, so it will not add
overhead to the normal IO path.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia4f14f2f2296da096acf629c74558386e7ce7057
Reviewed-on: https://review.whamcloud.com/44655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Wed, 11 Aug 2021 20:59:06 +0000 (14:59 -0600)]
LU-14816 tests: mark sanity test_230d SLOW

Running sanity test_230d takes an average of 15 minutes to finish,
and up to an hour in some cases, but it has almost never failed.
Move it over to run only with SLOW=yes.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I02ec35c1533a6a97b5400d4419664b43ab49c502
Reviewed-on: https://review.whamcloud.com/44616
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Mon, 28 Jun 2021 18:32:16 +0000 (20:32 +0200)]
LU-14677 sec: do not expose security.c to listxattr/getxattr

The security.c xattr, which contains the encryption context, should
not be exposed by xattr-related system calls such as listxattr() and
getxattr() because of its special semantics.
Update sanity-sec test_57 to test this.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I919f5cbafc53f5745fbfb5b9d2d7316e892d8c9f
Reviewed-on: https://review.whamcloud.com/44101
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Fri, 9 Jul 2021 13:41:34 +0000 (15:41 +0200)]
LU-14677 llite: move env contexts to ll_inode_info level

Unlike the file, the inode is always available, so move the list of
env contexts from the file data to the ll_inode_info level.
This is needed because we will have to handle env properties in
ll_get_context() and ll_xattr_list()/ll_listxattr().
This also requires changing lli_lock from a spinlock to an rwlock.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I478d2a8eabfcb09074ba52601f05840d047a6da2
Reviewed-on: https://review.whamcloud.com/44198
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Chris Horn [Thu, 24 Jun 2021 18:37:42 +0000 (13:37 -0500)]
LU-14790 tests: Check NI status when link is downed

Add test to check that NI status is set to down when the
ni_fatal_error_on flag is set (i.e. when a link is down).

HPE-bug-id: LUS-10167
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: If98a899b0ee8dd9637c08774109668ad06244c60
Reviewed-on: https://review.whamcloud.com/44073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Wed, 7 Jul 2021 08:15:27 +0000 (11:15 +0300)]
LU-14825 lod: pool spilling

To avoid the problem of the fast pool becoming full, this patch
introduces so-called pool spilling: for every OST pool, a target
pool can be assigned which will be used instead of the original one
if the original pool's usage is over the specified threshold:

  lctl set_param lod.*.pool.pool1.spill_target=pool2
  lctl set_param lod.*.pool.pool1.spill_threshold_pct=80

i.e. once pool1 is 80% or more used, new files will be created on
pool2.

A chain of pools (up to 10 at the moment) can be configured using
settings like the above, in which case the OST pools are considered
one by one.
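A minimal C sketch of how a spill chain could be followed when choosing a pool for a new file. `struct ost_pool`, `select_pool()` and `MAX_SPILL_CHAIN` are hypothetical names, not the lod implementation; the sketch assumes a pool spills once its usage reaches its threshold (matching "80+% used") and bounds the walk at the 10-pool chain limit mentioned above.

```c
#include <stddef.h>

#define MAX_SPILL_CHAIN 10	/* chain limit stated in the commit */

/* Hypothetical in-memory pool description; not the lod structures. */
struct ost_pool {
	const char *name;
	int used_pct;			/* current space usage, percent */
	int spill_threshold_pct;	/* 0 means spilling is disabled */
	int spill_target;		/* index into the pool table, or -1 */
};

/*
 * Follow the spill chain starting at pool @start: while the current pool
 * is at or above its threshold and has a target, move on to the target.
 * At most MAX_SPILL_CHAIN moves are made, so a cycle still terminates.
 */
static int select_pool(const struct ost_pool *pools, int start)
{
	int cur = start;

	for (int hop = 0; hop < MAX_SPILL_CHAIN; hop++) {
		const struct ost_pool *p = &pools[cur];

		if (p->spill_threshold_pct == 0 || p->spill_target < 0 ||
		    p->used_pct < p->spill_threshold_pct)
			break;
		cur = p->spill_target;
	}
	return cur;
}
```

Bounding the walk by the chain limit also keeps a misconfigured cycle (pool1 spilling to pool2 and back) from looping forever.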

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7f6dd4931ba64f3db8a7ae6a3b185f942a629ed7
Reviewed-on: https://review.whamcloud.com/43989
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Arshad Hussain [Mon, 22 Mar 2021 11:43:15 +0000 (17:13 +0530)]
LU-8962 lfs: Handle non-lustre and multiple args

This patch addresses:

01: Handle multiple filesystems provided to 'lfs df'
02: Correctly report 'EOPNOTSUPP' for filesystems which
    are non-Lustre.
03: Update test-framework.sh to handle the modified return
    value from 'lfs df'. For compatibility reasons, it ignores
    EOPNOTSUPP and treats it as success.

The final return value is 0 if _all_ arguments succeed, or the
value of the first failure if even a single failure is seen
during argument processing.

Add test case sanity/56e.
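A minimal C sketch of the return-value rule described above. `combine_rc()` and `compat_rc()` are hypothetical helpers, and the sketch assumes failures are reported as negative errno values; the real compatibility handling lives in the shell code of test-framework.sh.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Combine per-filesystem results of an 'lfs df'-style loop that keeps
 * processing every argument: 0 if all succeeded, else the first error.
 */
static int combine_rc(const int *rcs, size_t n)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++)
		if (rcs[i] != 0 && rc == 0)
			rc = rcs[i];	/* remember only the first failure */
	return rc;
}

/* Compatibility view: a non-Lustre filesystem is not treated as an error. */
static int compat_rc(int rc)
{
	return (rc == -EOPNOTSUPP) ? 0 : rc;
}
```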

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I73287d21792d89b8cde672acdaf9c9caf829522f
Reviewed-on: https://review.whamcloud.com/42126
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Wed, 13 May 2020 20:16:41 +0000 (15:16 -0500)]
LU-13550 osd-zfs: snapshot with incompatible clients

snapshot_create fails when connected clients do not support
barrier requests.

Log some information to help the administrator track down the
connections that prevent snapshot_create from succeeding.

Test-Parameters: fstype=zfs
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia59ea3c4c1a885e2591464cd4f8f77a1071b4786
Reviewed-on: https://review.whamcloud.com/38593
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jadhav Vikram [Tue, 25 Jul 2017 07:01:37 +0000 (12:31 +0530)]
LU-9699 osp: don't assert on OSP duplicating

Writeconf on an MDT with index > 0000 will cause
"add mdc" to be added to $FSNAME-client config
and "add osp" to be added to $FSNAME-MDTXXXX configs.

However, the configs may already contain these
directives. Duplicating the OSP device will
cause an assertion failure in osp_obd_connect():
ASSERTION( osp->opd_connects == 1 ) failed

Duplicating the MDC just returns -EEXIST in a similar
situation.

A possible solution is to check configs for duplicates
before writing to them. However, sometimes we
would like to change NIDs which are part of
"add mdc" and "add osp".

Another solution is to mark previous entries with
SKIP flags. This patch implements this approach.
Since, after the config lock is revoked, the clients
and the MDTs will receive the updated log and
apply its newer entries, we still have to handle
OSP duplication, but this is only an issue
immediately after writeconf processing.

Seagate-bug-id: MRP-2634, MRP-3865
Change-Id: Idd7ad43c78d50e6bbe715850503aa0b01fcbf071
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/27753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jadhav Vikram [Wed, 3 Feb 2021 15:24:31 +0000 (23:24 +0800)]
LU-9510 ldiskfs: to not verify preallocation in umount path

At umount time, while discarding inode preallocation space, a panic
occurred due to a mismatch between the preallocation space free
block count (pa->pa_free) and the free blocks calculated by reading
the on-disk block bitmap within the preallocation space length. A
similar crash will occur when the user sets errors=panic as a mount
option and there is a mismatch in the pa space free blocks.

Do not verify mismatches between the on-disk and in-memory counts
of unused preallocated blocks if the file system is being
unmounted.

Seagate-bug-id: MRP-3741
Signed-off-by: Jadhav Vikram <vikramjadhav87@yahoo.co.in>
Signed-off-by: Jadhav Vikram <jadhav.vikram@seagate.com>
Reviewed-by: Alexey Leonidovich Lyashkov <alexey.lyashkov@seagate.com>
Reviewed-by: Ashish Purkar <ashish.purkar@seagate.com>
Tested-by: Elena V. Gryaznova <elena.gryaznova@seagate.com>
Change-Id: I6d43905d49a219d1a5b966ab405e974a1f29b2f3
Reviewed-on: https://review.whamcloud.com/27130
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Bobi Jam [Fri, 3 Sep 2021 04:03:18 +0000 (12:03 +0800)]
LU-14781 osp: osp object header could be NULL

Don't call lu_object_header_fini() on a NULL header in
osp_object_free().

Call trace:
lu_object_free.isra.30+0xf2/0x170 [obdclass]
lu_object_find_at+0x496/0x930 [obdclass]
lod_initialize_objects+0x3e4/0xba0 [lod]
lod_parse_striping+0x693/0xc20 [lod]
lod_striping_load+0x2b2/0x660 [lod]
lod_declare_destroy+0x12b/0x600 [lod]
mdd_declare_finish_unlink+0x91/0x210 [mdd]
mdd_unlink+0x48f/0xab0 [mdd]
mdt_reint_unlink+0xc32/0x1550 [mdt]
mdt_reint_rec+0x83/0x210 [mdt]
mdt_reint_internal+0x6e1/0xb00 [mdt]
mdt_reint+0x67/0x140 [mdt]
tgt_request_handle+0xaee/0x15f0 [ptlrpc]
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
ptlrpc_main+0xb34/0x1470 [ptlrpc]
kthread+0xd1/0xe0

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Iec23cf06dffaa64c6f5853c28382ba930ee1076b
Reviewed-on: https://review.whamcloud.com/44055
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Mikhail Pershin [Sun, 22 Aug 2021 19:41:33 +0000 (22:41 +0300)]
LU-13397 llite: support fallocate() on selected mirror

- add the ability to do fallocate() on a designated mirror in
  an FLR file
- add the missing FALLOC_FL_KEEP_SIZE flag to the fallocate() call
  in llapi_hole_punch(); without that flag it silently did not
  work
- add corresponding test_50d in sanity-flr.sh

Fixes: 4126fbb30c ("LU-13397 lfs: mirror resync to keep sparseness")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I8d700fce904c84458a50650f1d3cb09d23989eba
Reviewed-on: https://review.whamcloud.com/44721
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Yang Sheng [Mon, 13 Sep 2021 21:04:00 +0000 (05:04 +0800)]
LU-5369 mdt: check lock handle instead assert

The lock handle could be NULL in some corner cases.
We should check it instead of calling LBUG.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I1afa7f8c129c104b012ae23141318365c388c503
Reviewed-on: https://review.whamcloud.com/44905
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Tue, 31 Aug 2021 15:30:48 +0000 (17:30 +0200)]
LU-13717 sec: filename encryption - symlink support

On the client side, call the appropriate llcrypt primitives from
llite to encrypt symlinks before sending requests to servers, and to
decrypt them upon receipt.
The tricky part is that llcrypt needs an inode to encrypt the target
name. But by the time we prepare the symlink creation request to be
sent to the server with the target name (in ll_new_node), we do not
have an inode yet (it will be obtained only after we get the server
reply). So we create a fake inode and associate the right encryption
context to it, so that the symlink gets encrypted properly.

In order to report the correct size for an encrypted symlink (which
ought to be the length of the symlink target), we need to read the
symlink target and decrypt or decode it in ->getattr(). This has a
performance hit, but given that the symlink target is cached in
->i_link (when the key is available), the symlink will not have to be
read and decrypted again later when it is actually followed,
readlink() is called, or lstat() is called again.
This part of the patch is adapted from kernel commit
d18760560593e5af921f51a8c9b64b6109d634c2
"fscrypt: add fscrypt_symlink_getattr() for computing st_size"

With encrypted file names, a symlink target is binary. So make sure
the server side can handle that by switching sp_symname to a
struct lu_name in struct md_op_spec.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic6892fca8926a35001697c54aaf05d15563b139d
Reviewed-on: https://review.whamcloud.com/43394
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Tue, 21 Sep 2021 22:58:30 +0000 (22:58 +0000)]
LU-15022 revert: "LU-14997 tests: Register stack_trap for sanity/104c"

This reverts commit 59b32113313c3566e5f3797bca404a5b19d5e305
since it caused constant test failures for ZFS backends.

Change-Id: I195cc483166294dbf97be50a9b747c8a2b534799
Test-Parameters: trivial fstype=zfs testlist=sanity env=ONLY=104,ONLY_REPEAT=20
Reviewed-on: https://review.whamcloud.com/45008
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mikhail Pershin [Wed, 25 Aug 2021 17:03:47 +0000 (20:03 +0300)]
EX-3687 osp: do force disconnect if import is not ready

Send OSP_DISCONNECT only on a healthy import. Otherwise,
force a local disconnect for unhealthy imports.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Icd9f171271f4e17a65503fcc710ad3aaa2b84e1e
Reviewed-on: https://review.whamcloud.com/44753
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Thu, 9 Sep 2021 09:18:42 +0000 (05:18 -0400)]
LU-14997 tests: Register "stack_trap" for sanity/104c

This patch is a minor improvement: cleanup is now registered
through 'stack_trap' instead of being done right at the end of
the script.

Fixes: 8ee6e1c8825c ("LU-14565 ofd: Do not rely on tgd_blockbit")
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iae2ca81091e0119f2117f4cd57b5cc2f6ac38c6c
Reviewed-on: https://review.whamcloud.com/44882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Chris Horn [Tue, 7 Sep 2021 15:24:14 +0000 (10:24 -0500)]
LU-14990 tests: Detect correct LNet interface for sanity-lnet

Determine the names of the interfaces used for LNet by parsing the
NIDs configured after calling load_modules(). Tests which reference
eth0 are modified to use the interface associated with the primary
NID (i.e. first NID output by lctl list_nids).

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-10385
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Id715aa3e5470d9c110f6248620b1a83920875e7b
Reviewed-on: https://review.whamcloud.com/44857
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Mon, 6 Sep 2021 02:19:07 +0000 (19:19 -0700)]
LU-14782 kernel: new kernel [SLES15 SP3 5.3.18-59.19.1]

This patch makes changes to support the new SLES15 SP3 release
with kernel 5.3.18-59.19.1 for the Lustre client.

Test-Parameters: trivial

Change-Id: Idf6fad9773dd242c02859a5c7b14401675c4ecf4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44062
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Chris Horn [Tue, 7 Sep 2021 15:47:06 +0000 (10:47 -0500)]
LU-14991 tests: Correct whitespace in sanity-lnet test_101/102

sanity-lnet.sh test_100 and test_101 use tab characters in the
expected YAML output, but YAML does not allow tab characters for
indentation.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: a5cbe7883d ("LU-12815 socklnd: allow dynamic setting of conns_per_peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0814f1965414f82cdc696cfe9996b33e863df982
Reviewed-on: https://review.whamcloud.com/44856
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Mon, 6 Sep 2021 01:47:38 +0000 (18:47 -0700)]
LU-14934 kernel: kernel update SLES12 SP5 [4.12.14-122.83.1]

Update the SLES12 SP5 kernel to 4.12.14-122.83.1 for the Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: I2b35d129550b895324bb3e2e61910ad10e846f03
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44848
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Bobi Jam [Thu, 26 Aug 2021 10:19:11 +0000 (18:19 +0800)]
LU-14965 ldiskfs: hold inode mutex for ldiskfs_orphan_add()

See the following warning:

ldiskfs/namei.c:3331 ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
Call Trace:
dump_stack+0x19/0x1b
__warn+0xd8/0x100
warn_slowpath_null+0x1d/0x20
ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
ldiskfs_xattr_inode_orphan_add+0xbb/0x110 [ldiskfs]
ldiskfs_xattr_delete_inode+0x5c/0x350 [ldiskfs]
ldiskfs_evict_inode+0x1a8/0x630 [ldiskfs]
evict+0xb4/0x180
iput+0xfc/0x190
osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
lu_object_free.isra.27+0xb8/0x1c0 [obdclass]
lu_object_put+0xa5/0x460 [obdclass]
mdt_object_put+0x30/0x110 [mdt]
mdt_reint_unlink+0x8e0/0x1890 [mdt]
mdt_reint_rec+0x83/0x210 [mdt]
mdt_reint_internal+0x720/0xaf0 [mdt]
mdt_reint+0x67/0x140 [mdt]
tgt_request_handle+0x7ea/0x1750 [ptlrpc]
ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21

Hold the inode mutex on the external EA inode for
ldiskfs_orphan_add() to silence the warning.

Fixes: f64e9f19f68e ("LU-12977 ldiskfs: properly take inode_lock() for truncates")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I3a1abfde3289c0bbd46e0d5a5b9d2ff7d7cf9273
Reviewed-on: https://review.whamcloud.com/44754
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
James Nunez [Wed, 4 Aug 2021 14:47:50 +0000 (08:47 -0600)]
LU-14323 tests: skip sanity-flr/pfl tests for older servers

sanity-flr test 46 subtests 7, 8, 9 and 10 and sanity-pfl
test 16c were added in lustre-master version 2.13.53.205.
When we run version interop testing, these sanity-flr and
sanity-pfl tests will fail. Thus, skip sanity-flr test 46
subtests 7, 8, 9 and 10 and sanity-pfl test 16c when running
with servers older than 2.13.53.205 and clients with a
later version.

Fixes: ee916af10de2 ("LU-13366 utils: SEL yaml and copy file support")
Test-Parameters: trivial
Test-Parameters: env=ONLY=46 testlist=sanity-flr
Test-Parameters: env=ONLY=16 testlist=sanity-pfl
Test-Parameters: serverversion=2.12.7 serverdistro=el7.9 env=ONLY=46 testlist=sanity-flr
Test-Parameters: serverversion=2.12.7 serverdistro=el7.9 env=ONLY=16 testlist=sanity-pfl
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I09b88351a10891f63dceb9a2a74c92e4fffc13c5
Reviewed-on: https://review.whamcloud.com/44494
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Qian Yingjin [Sat, 31 Jul 2021 07:45:56 +0000 (15:45 +0800)]
LU-14709 pcc: VM_WRITE should not trigger layout write

A VM area marked with VM_WRITE means that pages may be written, but
an mmap page write may never happen.
The layout write should be delayed until the actual modification of
the file happens in ->page_mkwrite().
Otherwise, it will trigger a panic in PCC-RO sanity-pcc test_21f().

Fixes: f2d1c4ee4 ("LU-14647 flr: mmap write/punch does not stale other mirrors")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I1cbfef8a4ed7e2c718324fd8a21bafd6157b5f0c
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44483
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Etienne AUJAMES [Mon, 2 Aug 2021 10:26:58 +0000 (12:26 +0200)]
LU-14896 utils: migrate file with only '--pool' option

"lfs migrate -p pool_name test_file" initiates a migration without
changing the layout pools (the layout is copied from the source
file).

This patch implements the same behavior as:
"lfs setstripe -p pool_name test_file"
i.e. it sets the pool name and uses the default parameters for the
plain layout.

Add sanity test 56xg to check file migrations with pool.

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I1645eaca028974337218411d6a033f3acf9b9d6a
Reviewed-on: https://review.whamcloud.com/44465
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mikhail Pershin [Tue, 27 Jul 2021 10:37:01 +0000 (13:37 +0300)]
LU-13055 changelog: use default mask if server has no mask

When registering a new user without a mask, if the server has no
specific mask set, set the effective mask to the DEFAULT value.
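A minimal C sketch of the mask selection described above. `effective_mask()` and `CL_DEFAULT_MASK` are hypothetical, and the mask value is a placeholder because the commit does not state it. The sketch also assumes that a user-supplied mask takes precedence, and that a server-wide mask, when one is set, applies to a maskless user.

```c
#include <stdint.h>

/* Placeholder value: the actual DEFAULT changelog mask is not given here. */
#define CL_DEFAULT_MASK 0x1u

/* 0 means "no mask specified" for both the user and the server. */
static uint32_t effective_mask(uint32_t user_mask, uint32_t server_mask)
{
	if (user_mask != 0)
		return user_mask;	/* explicit per-user mask */
	if (server_mask != 0)
		return server_mask;	/* server-wide mask */
	return CL_DEFAULT_MASK;		/* maskless user on a maskless server */
}
```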

Fixes: a15eb4f132 ("LU-13055 mdd: per-user changelog names and mask")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: If799cb5cc29c60cce6ef6c987f2e493145e00e31
Reviewed-on: https://review.whamcloud.com/44404
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Sat, 21 Aug 2021 17:54:42 +0000 (13:54 -0400)]
LU-13903 utils: separate out server code for wiretest

Both the kernel and userland wiretest utilities are used by
client and server to validate data being sent over the network.
Make the userland wiretest buildable on the native Linux client,
which lacks server-specific data structures. Use the UAPI
values to harden testing of userland data passed to the
kernel.

Change-Id: I30efc8bf42ac461bab5a4371e940a027a23d12c9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43873
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Sat, 4 Sep 2021 12:33:53 +0000 (08:33 -0400)]
LU-13903 uapi: fixup UAPI headers for native Linux client.

This covers all the UAPI problems outside of the userland
wiretest utility. One set of problems is build-related, and the
second is that UAPI header definitions are either userland-only
or never used to validate data going to or from userland.

1) Use UAPI header definitions to validate data sent to or from
   kernel space. We check lum_hash_type using LMV_HASH_TYPE_MASK.
   This avoids a round trip to the server, which would report back
   an error. The other case is checking the values returned for
   LL_IOC_HSM_ACTION. We keep the original behavior of passing
   unknown data to the userland application, but add debug
   logging if the data looks corrupt to help track down bugs.

2) We can use QIF_DQBLKSIZE* instead of Lustre specific values
   for our quota handling. QIF_DQBLKSIZE* is a Linux UAPI quota
   value.

3) The NOTIFY_GRACE_* macros are used only by userland. Move
   them to lustreapi.h.

4) A few of the UAPI definitions are used by utility code
   present on the client and by the Lustre kernel server code,
   but are not sent over the wire. Handle these special cases.
   This covers the missing LCM_USER_MIRROR_FLAGS,
   LCME_TEMPLATE_FLAGS, and LQUOTA_* values. Once the server code
   is merged upstream, we can clean this up.

5) lcfg_cmd2data() is server specific so in case of a client build
   we can have get_llog_event_name() just always return NULL.

6) Don't package OpenSFS UAPI headers when building for native
   Linux client.

Change-Id: I258ee917b005e438eb7c15fa6e0c4b72e9ea9d56
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44664
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Fri, 22 Jan 2021 12:06:50 +0000 (21:06 +0900)]
LU-13717 sec: filename encryption - digest support

A number of operations are allowed on encrypted files without the key:
- read file metadata (stat);
- list directories;
- remove files and directories.
In order to present valid names to users, cipher text names are
base64 encoded if they are short. Otherwise, we compute a digested
form of the cipher text, made of the FID (16 bytes) followed by the
second-to-last cipher block (16 bytes), and base64 encode this
digested form for presentation to the user.
These transformations are carried out in the specific overlay
functions, which now need to know the FID of the file.

As the digested form does not contain the whole cipher text name,
the server side needs to perform operations by FID for requests
such as lookup and getattr. It also relies on the content of the
LinkEA to verify the digested form received from the client side.
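A minimal C sketch of how the presented name could be built. `present_name()`, `build_digest()` and the `short_max` cut-off are hypothetical (the commit does not give the length at which names stop being encoded directly), and standard RFC 4648 base64 is assumed even though llcrypt may use a different alphabet. It only illustrates the layout: the 16-byte FID followed by the second-to-last 16-byte cipher block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FID_SIZE	16	/* FID size stated in the commit */
#define CIPHER_BLOCK	16	/* cipher block size stated in the commit */
#define DIGEST_SIZE	(FID_SIZE + CIPHER_BLOCK)

static const char b64_alphabet[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* RFC 4648 base64 with '=' padding; @out needs 4 * ((len + 2) / 3) + 1 bytes. */
static size_t base64_encode(const uint8_t *in, size_t len, char *out)
{
	size_t o = 0;

	for (size_t i = 0; i < len; i += 3) {
		uint32_t v = (uint32_t)in[i] << 16;

		if (i + 1 < len)
			v |= (uint32_t)in[i + 1] << 8;
		if (i + 2 < len)
			v |= in[i + 2];
		out[o++] = b64_alphabet[(v >> 18) & 0x3f];
		out[o++] = b64_alphabet[(v >> 12) & 0x3f];
		out[o++] = (i + 1 < len) ? b64_alphabet[(v >> 6) & 0x3f] : '=';
		out[o++] = (i + 2 < len) ? b64_alphabet[v & 0x3f] : '=';
	}
	out[o] = '\0';
	return o;
}

/*
 * Digested form: the FID followed by the second-to-last cipher block.
 * Requires at least two whole cipher blocks of cipher text.
 */
static int build_digest(const uint8_t fid[FID_SIZE], const uint8_t *ct,
			size_t ct_len, uint8_t digest[DIGEST_SIZE])
{
	if (ct_len < 2 * CIPHER_BLOCK || ct_len % CIPHER_BLOCK != 0)
		return -1;
	memcpy(digest, fid, FID_SIZE);
	memcpy(digest + FID_SIZE, ct + ct_len - 2 * CIPHER_BLOCK, CIPHER_BLOCK);
	return 0;
}

/* Hypothetical policy: encode short names directly, digest long ones. */
static size_t present_name(const uint8_t fid[FID_SIZE], const uint8_t *ct,
			   size_t ct_len, size_t short_max, char *out)
{
	uint8_t digest[DIGEST_SIZE];

	if (ct_len <= short_max)
		return base64_encode(ct, ct_len, out);
	if (build_digest(fid, ct, ct_len, digest) != 0)
		return 0;	/* cannot build a digest: nothing presented */
	return base64_encode(digest, DIGEST_SIZE, out);
}
```

A 32-byte digest always encodes to 44 characters, so long names get a fixed-size presentation regardless of their cipher text length.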

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I45d10a426373c2cfe0b92a58c351da452d085d7d
Reviewed-on: https://review.whamcloud.com/43392
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Elena Gryaznova [Thu, 21 May 2020 10:13:41 +0000 (13:13 +0300)]
LU-13086 tests: restore compatibility with mpich

The --oversubscribe MPI option added to mpi_run() is specific to
Open MPI. This patch moves --oversubscribe to MPIRUN_OPTIONS in
local.sh to restore compatibility with MPICH.

Test-Parameters: trivial clientdistro=el8.3 serverdistro=el7.7 testlist=parallel-scale,large-scale,performance-sanity
Test-Parameters: clientdistro=el8.4 serverdistro=el7.7 testlist=parallel-scale,large-scale,performance-sanity
Fixes: 3c7aca7472 ("LU-12395 build: build mpitests for el8")
Cray-bug-id: LUS-8006
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I0a6fab072212781d12877d2503ae8600cfdc8c7a
Reviewed-on: https://review.whamcloud.com/38689
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-13997 tests: sanity/418 to cancel all client locks 03/44803/4
Alex Zhuravlev [Wed, 1 Sep 2021 08:54:04 +0000 (11:54 +0300)]
LU-13997 tests: sanity/418 to cancel all client locks

Verify a theory about dirty client data.

Test-Parameters: trivial
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ifef58a98b26c7790274d2a57aa52e4475e923dd0
Reviewed-on: https://review.whamcloud.com/44803
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14959 ldlm: Check return value of ldlm_resource_get() 38/44738/4
Oleg Drokin [Tue, 24 Aug 2021 03:44:45 +0000 (23:44 -0400)]
LU-14959 ldlm: Check return value of ldlm_resource_get()

Fix the comment to properly indicate it returns ERR_PTR on
error and fix osc_req_attr_set() and mdc_get_lock_handle()
to actually check the return value before passing it on and
causing an unintended crash.

Change-Id: Ib85a62140a39744e85989c9a9c8aa2ed771d70d1
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44738
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
2 years agoLU-14951 llite: protect fd_{lease_}och 00/44700/2
Bobi Jam [Wed, 18 Aug 2021 13:24:50 +0000 (21:24 +0800)]
LU-14951 llite: protect fd_{lease_}och

Accesses to ll_file_data::fd_och and fd_lease_och need
lli_och_mutex protection.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie9136aa345c6bf015aa73067acdaecf1a765b9f6
Reviewed-on: https://review.whamcloud.com/44700
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13195 osp: track destroyed OSP object 85/38385/11
Alex Zhuravlev [Mon, 27 Apr 2020 04:52:01 +0000 (07:52 +0300)]
LU-13195 osp: track destroyed OSP object

Retain destroyed OSP objects in memory to prevent races where an
in-flight destroy is overtaken by a read or attr_get, leading to
incorrect local state.
Also fail operations on such an object with -ENOENT.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ied59f1a95458e8890249b92d4efc38e258a7e3cf
Reviewed-on: https://review.whamcloud.com/38385
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14729 osd-ldiskfs: declare should consider concurrency 16/44316/13
Wang Shilong [Thu, 15 Jul 2021 08:15:37 +0000 (16:15 +0800)]
LU-14729 osd-ldiskfs: declare should consider concurrency

Writes in the Lustre OSD differ from ext4: in the local filesystem,
writes are serialized, but on the OSD side many concurrent threads
may grow the extent tree before the transaction starts.

Also use @dirty_groups rather than @extents, and remove an
unnecessary @depth assignment.

Fixes: 9810341a8 ("LU-14729 osd-ldiskfs: fix to declare write commits")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I1e0fc9069a579736a74b0ba2607056fe980574c3
Reviewed-on: https://review.whamcloud.com/44316
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14724 nrs: TBF rule list broken when change rule rank 25/43925/6
Qian Yingjin [Fri, 28 May 2021 03:56:12 +0000 (11:56 +0800)]
LU-14724 nrs: TBF rule list broken when change rule rank

When changing the rank of two adjacent rules in the TBF rule list,
@nrs_tbf_rule_change_rank() calls:
    list_move(&rule->tr_linkage, next_rule->tr_linkage.prev);

If the previous pointer of @next_rule is @rule itself, calling
list_move() directly breaks the rule list. This patch uses
list_del() + list_add() instead of list_move() to avoid corrupting
the TBF rule list. It also adds a test case, sanityn test_77o, for
this bug.
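The failure can be reproduced with a minimal userspace copy of the kernel list helpers. The key point is that next_rule->tr_linkage.prev is evaluated before list_move() unlinks the entry, so when it equals &rule->tr_linkage the call becomes list_move(entry, entry): the neighbours skip the entry and the entry ends up pointing at itself. In this sketch, move_before_buggy() and move_before_fixed() are hypothetical names, and the fixed variant is one plausible reading of "list_del + list_add", not necessarily the exact patch.

```c
#include <stddef.h>

/* Minimal userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline void __list_add(struct list_head *new, struct list_head *prev,
			      struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

static inline void list_add(struct list_head *new, struct list_head *head)
{
	__list_add(new, head, head->next);
}

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
	__list_add(new, head->prev, head);
}

static inline void __list_del_entry(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

static inline void list_del(struct list_head *e)
{
	__list_del_entry(e);
	e->next = e->prev = NULL;	/* the kernel poisons these instead */
}

static inline void list_move(struct list_head *e, struct list_head *head)
{
	__list_del_entry(e);
	list_add(e, head);
}

/* Buggy pattern: when next->prev == rule this is list_move(rule, rule). */
static inline void move_before_buggy(struct list_head *rule,
				     struct list_head *next)
{
	list_move(rule, next->prev);
}

/* Unlink first, so next->prev is rule's old predecessor when read. */
static inline void move_before_fixed(struct list_head *rule,
				     struct list_head *next)
{
	list_del(rule);
	list_add(rule, next->prev);
}

/* Number of entries reached walking forward from head (bounded). */
static inline int list_count(const struct list_head *head, int limit)
{
	const struct list_head *p;
	int n = 0;

	for (p = head->next; p != head && n < limit; p = p->next)
		n++;
	return n;
}

/* 1 if every node's next->prev points back to it on the forward walk. */
static inline int list_consistent(const struct list_head *head, int limit)
{
	const struct list_head *p = head;

	do {
		if (p->next->prev != p)
			return 0;
		p = p->next;
	} while (p != head && --limit > 0);
	return p == head;
}
```

With a list head -> A -> R -> N, the buggy move leaves only A and N reachable and N->prev still pointing at R, while the fixed move keeps all three entries in order.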

Fixes: aa14b0b9a152 ("LU-8006 ptlrpc: specify ordering of TBF policy rules")
Change-Id: Ica30d3329f07914657ac2c4089d66f934021b763
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/43925
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14711 tests: Ensure there's no eviction with long cache discard 69/43869/9
Oleg Drokin [Sat, 29 May 2021 02:42:49 +0000 (22:42 -0400)]
LU-14711 tests: Ensure there's no eviction with long cache discard

Pause execution while processing pages for cache discard
if the appropriate fail_loc is set.

Change-Id: If0d04f3cad267cbeeab63040d63e048dcf03cd6b
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Test-Parameters: trivial testlist=sanity env=ONLY=903
Reviewed-on: https://review.whamcloud.com/43869
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-13717 sec: filename encryption 90/43390/15
Sebastien Buisson [Tue, 23 Mar 2021 13:58:50 +0000 (22:58 +0900)]
LU-13717 sec: filename encryption

On the client side, call the appropriate llcrypt primitives from llite
to perform filename encryption before sending requests to servers,
and filename decryption upon request receipt.
Note that we need specific overlay functions to handle encoding and
decoding of encrypted filenames, as we do not want the server side to
deal with binary names before they reach the backend file system layer.

On the server side, mainly in the OSD layer, we need to know the
encryption status of files being processed.
If an object belongs to an encrypted file, the filename has been
encoded by the client because it is binary, so it needs to be decoded
before being handed over to the backend file system layer.
Conversely, the filename of an encrypted file has to be encoded
before being sent over the wire.
Note that server-side support is limited to osd-ldiskfs for now.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7ac9047f5a046b8bc63afdbbb1f28e78aa5c8c7e
Reviewed-on: https://review.whamcloud.com/43390
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 years agoLU-14854 mdd: proper handle error in mdd_swap_layouts() 19/44319/5
Bobi Jam [Thu, 15 Jul 2021 18:20:54 +0000 (02:20 +0800)]
LU-14854 mdd: proper handle error in mdd_swap_layouts()

Only restore the object's HSM xattr on error if the swap is for
SWAP_LAYOUTS_MDS_HSM.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9d4c58cd3107c3900e72a0946d0ec7d7286dd43f
Reviewed-on: https://review.whamcloud.com/44319
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9897 tests: add generated files to .gitignore 78/44778/3
James Simmons [Sat, 28 Aug 2021 23:55:49 +0000 (19:55 -0400)]
LU-9897 tests: add generated files to .gitignore

Several binaries and wrappers created during the build show up
as candidates for git add, although they should never be committed.
Add these files to .gitignore to avoid adding them accidentally.

Test-Parameters: trivial
Change-Id: If693ba7933c0329a333dec71ed6fb521a90435f4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44778
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14967 obdclass: EAGAIN after rhashtable_walk_next() 66/44766/3
Alex Zhuravlev [Fri, 27 Aug 2021 05:42:56 +0000 (08:42 +0300)]
LU-14967 obdclass: EAGAIN after rhashtable_walk_next()

rhashtable_walk_next() can return ERR_PTR(-EAGAIN) when a concurrent
resize has happened, so callers should check for this error and simply
call rhashtable_walk_next() again.
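A userspace sketch of the retry pattern, assuming the kernel behavior that rhashtable_walk_next() returns ERR_PTR(-EAGAIN) after a resize and rewinds the iterator to the start of the table. The ERR_PTR helpers are re-implemented here, and mock_walk_next() and walk_count() are hypothetical stand-ins for the real iterator and caller, so the loop structure is the point rather than the table.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical mock of a table walk that reports one resize. */
struct mock_walk {
	int *items;
	int nr;
	int pos;
	int resize_at;	/* position where a resize is reported, -1 for none */
};

static void *mock_walk_next(struct mock_walk *w)
{
	if (w->pos == w->resize_at) {
		w->resize_at = -1;	/* report the resize only once */
		w->pos = 0;		/* rewind, like the kernel iterator */
		return ERR_PTR(-EAGAIN);
	}
	if (w->pos >= w->nr)
		return NULL;		/* end of table */
	return &w->items[w->pos++];
}

/* Count visited objects, retrying on -EAGAIN, failing on other errors. */
long walk_count(struct mock_walk *w)
{
	long visited = 0;
	void *obj;

	while ((obj = mock_walk_next(w)) != NULL) {
		if (IS_ERR(obj)) {
			if (PTR_ERR(obj) == -EAGAIN)
				continue;	/* concurrent resize: call next again */
			return PTR_ERR(obj);
		}
		visited++;
	}
	return visited;
}
```

Because the walk restarts after a resize, objects seen before it can be visited again (six visits for four items when the resize is reported at position 2), so callers that must not double-count need to tolerate duplicates.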

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I15ba2cdf16c2678e18836b4f16b56a3b8bfdacd0
Reviewed-on: https://review.whamcloud.com/44766
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14776 zfs: fix Ubuntu 20 HWE build issues 49/44749/3
James Simmons [Wed, 25 Aug 2021 17:17:51 +0000 (11:17 -0600)]
LU-14776 zfs: fix Ubuntu 20 HWE build issues

Newer Ubuntu systems using ZFS DKMS hit the following build
errors:

    In file included from zfs/2.0.2/source/include/sys/arc.h:32,
                 from lustre/osd-zfs/osd_internal.h:50,
                 from lustre/osd-zfs/osd_handler.c:51:
    zfs/2.0.2/source/include/sys/zfs_context.h:45:10:
                 fatal error: sys/types.h: No such file or directory
    45 | #include <sys/types.h>
       |          ^~~~~~~~~~~~~
    compilation terminated.

This is due to the layout of the tree containing the needed headers.
Add those include paths to the build system.

Test-Parameters: trivial
Change-Id: I453830c4111ad88ec655d3d7d0ee51627331cb0b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44749
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14776 build: Ubuntu 20.04.2 and 20.04.3 HWE client support 48/44748/2
James Simmons [Wed, 25 Aug 2021 13:26:52 +0000 (09:26 -0400)]
LU-14776 build: Ubuntu 20.04.2 and 20.04.3 HWE client support

We now support Lustre clients on both Ubuntu 20.04.2 and
20.04.3 HWE platforms.

Change-Id: I772af876ffa8beeabb8a2002f80aa776fa373996
Test-Parameters: trivial
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44748
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14962 lnet: Check for -ESHUTDOWN in lnet_parse 43/44743/3
Chris Horn [Tue, 24 Aug 2021 16:16:17 +0000 (11:16 -0500)]
LU-14962 lnet: Check for -ESHUTDOWN in lnet_parse

The fix for LU-8106, http://review.whamcloud.com/19993, no longer
works because rc does not hold the return value of
lnet_nid2peerni_locked(). Use PTR_ERR() to get the return value and
restore the LU-8106 fix.
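A simplified sketch of the corrected error path, with userspace stand-ins for ERR_PTR()/PTR_ERR()/IS_ERR(). lookup_peer_ni() and parse_sketch() are hypothetical, and the "drop" branch is reduced to returning the error; the point is that rc must be taken from the returned pointer before it is compared with -ESHUTDOWN, because a stale rc never matches.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct peer_ni_sketch {
	int id;
};

static struct peer_ni_sketch the_peer = { 42 };

/* Hypothetical stand-in for lnet_nid2peerni_locked(): peer or ERR_PTR(err). */
static struct peer_ni_sketch *lookup_peer_ni(int err)
{
	return err ? ERR_PTR(err) : &the_peer;
}

/* Simplified lnet_parse()-style error path with the fix applied. */
int parse_sketch(int lookup_err)
{
	struct peer_ni_sketch *lpni;
	int rc;

	lpni = lookup_peer_ni(lookup_err);
	if (IS_ERR(lpni)) {
		rc = PTR_ERR(lpni);	/* rc now reflects the lookup result */
		if (rc == -ESHUTDOWN)
			return 0;	/* shutting down: do nothing more */
		return rc;		/* other errors: drop the message */
	}
	return lpni->id;
}
```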

HPE-bug-id: LUS-10333
Fixes: fa8b4e6357 ("LU-7734 lnet: peer/peer_ni handling adjustments")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9cc2bc2d6e675d38cf06d99c524bdd95110bf0e9
Reviewed-on: https://review.whamcloud.com/44743
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14961 tests: set Pool Quotas 40/44740/4
Elena Gryaznova [Tue, 24 Aug 2021 11:23:22 +0000 (14:23 +0300)]
LU-14961 tests: set Pool Quotas

We want to run some tests on a filesystem with pool
quotas set for some users. For instance, setting
pool quota limits for mpiuser allows stressing the pool
quota code with MPI tests.
This patch adds the ability to set pool quota block hard
limits for specific users via POOLS_QUOTA_USERS_SET.
Example:
  POOLS_QUOTA_USERS_SET="quota15_1:20M
                quota15_2:1G:gpool0
                quota15_4:200M:gpool0
                quota15_4:200M:gpool1"
For quota15_1, the 20M limit will be set for all
existing pools.
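The POOLS_QUOTA_USERS_SET value appears to be a whitespace-separated list of user:limit[:pool] entries, where a missing pool means all existing pools. The test framework itself is shell; this C sketch only illustrates that reading of the format, and pool_quota_entry, parse_pool_quota() and parse_pool_quota_list() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one POOLS_QUOTA_USERS_SET entry. */
struct pool_quota_entry {
	char user[64];
	char limit[32];
	char pool[64];	/* empty string: apply to all existing pools */
};

/* Parse "user:limit" or "user:limit:pool"; 0 on success, -1 if malformed. */
int parse_pool_quota(const char *tok, struct pool_quota_entry *e)
{
	memset(e, 0, sizeof(*e));
	return sscanf(tok, "%63[^:]:%31[^:]:%63s",
		      e->user, e->limit, e->pool) >= 2 ? 0 : -1;
}

/* Split a whitespace-separated list; returns the number of valid entries. */
int parse_pool_quota_list(const char *value, struct pool_quota_entry *out,
			  int max)
{
	const char *sep = " \t\n";
	char tok[192];
	int n = 0;

	while (*value && n < max) {
		size_t len;

		value += strspn(value, sep);	/* skip separators */
		len = strcspn(value, sep);	/* length of the next token */
		if (len == 0)
			break;
		if (len < sizeof(tok)) {	/* overlong tokens are skipped */
			memcpy(tok, value, len);
			tok[len] = '\0';
			if (parse_pool_quota(tok, &out[n]) == 0)
				n++;
		}
		value += len;
	}
	return n;
}
```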

Test-Parameters: env=FS_POOL="glo",POOLS_QUOTA_USERS_SET="mpiuser:200M quota15_1:2000M:glo1",FS_NPOOLS="2",ENABLE_QUOTA="yes"
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10059
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Change-Id: Ia9ee540ca77e70f37aa849e5e555e3c057e2052d
Reviewed-on: https://review.whamcloud.com/44740
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14960 tests: enhance ha.sh to work with several test dirs 39/44739/3
Elena Gryaznova [Tue, 24 Aug 2021 10:56:58 +0000 (13:56 +0300)]
LU-14960 tests: enhance ha.sh to work with several test dirs

This patch adds the ability to work with several test directories
set via the ha_test_dirs variable. This is useful for emulating more
Lustre clients.
Example:
  before the test, mount Lustre on:
    /mnt/lustre, /mnt/lustre1, /mnt/lustre3
  Run ha.sh with:
  ha_test_dirs="/mnt/lustre /mnt/lustre1 /mnt/lustre3"
  The clients' test directories will be created in the listed
  test directories:
  client0 works in a /mnt/lustre subdirectory,
  client1 works in a /mnt/lustre1 subdirectory,
  etc.

This patch also adds the ability to keep the test directories
if CLEANUP is set to false.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9705
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Change-Id: I1d04b7deeda693c9ca1c86411b0a66c6a2315923
Reviewed-on: https://review.whamcloud.com/44739
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9859 libcfs: change libcfs_log_* functions to inline 81/44581/3
James Simmons [Tue, 10 Aug 2021 18:05:31 +0000 (14:05 -0400)]
LU-9859 libcfs: change libcfs_log_* functions to inline

The functions libcfs_log_return() and libcfs_log_goto() don't
exist in the native Linux client. We still need them for the
special OpenSFS debugging, but we can change them into simple
inline routines since they are just wrappers around
libcfs_debug_msg().

Test-Parameters: trivial
Change-Id: I0e2b40feb18f9f1a1ffbda39756ab64308ea6439
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44581
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14021 llite: don't touch vma after filemap_fault 58/44558/2
Alexander Boyko [Tue, 10 Aug 2021 14:20:42 +0000 (10:20 -0400)]
LU-14021 llite: don't touch vma after filemap_fault

On error, filemap_fault() can release vma->vm_mm->mmap_sem, so
touching the vma afterwards is dangerous: it could be reused or freed.
This patch uses a local file variable to avoid touching the vma.

HPE-bug-id: LUS-10240
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I72cd086645061819fab5b8595a880db64cfb9ff7
Reviewed-on: https://review.whamcloud.com/44558
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14807 lfsck: fix race in lfsck_pos_fill 30/44130/7
Hongchao Zhang [Sun, 27 Jun 2021 21:00:20 +0000 (05:00 +0800)]
LU-14807 lfsck: fix race in lfsck_pos_fill

There is a race on lfsck->li_di_dir between lfsck_di_dir_put() and
lfsck_pos_fill(), which could cause lfsck_pos_fill() to use a freed
lfsck->li_di_dir (struct osd_it_ea) and trigger a GPF.

Change-Id: Iedadf03ac15d128bb051aea8aafa24dbcd2704fb
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44130
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14696 llite: check read only mount for setquota 65/43765/6
Hongchao Zhang [Thu, 12 Aug 2021 11:06:45 +0000 (19:06 +0800)]
LU-14696 llite: check read only mount for setquota

Setting quota should fail if the mount is read-only.

Change-Id: I966ac71d0a4a72dcb998f09ffc0f99ae28498e27
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43765
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13668 mdt: change lock mode for lease 64/38964/23
Alex Zhuravlev [Wed, 17 Jun 2020 14:05:28 +0000 (17:05 +0300)]
LU-13668 mdt: change lock mode for lease

Make the lease lock PW so that lfs getstripe and open-for-read
do not interrupt replication.
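A sketch of the standard DLM lock-mode compatibility matrix, which Lustre's LCK_COMPAT_* masks appear to follow (GROUP and COS modes are omitted). It shows that a granted PW lock is compatible with NL and CR requests, while EX is compatible only with NL. Which modes lfs getstripe and read-only opens actually request (CR, for example) and which mode the lease used before this change are assumptions here; the enum and function names are hypothetical.

```c
/* Lock modes as bit flags, so a compatibility set is a simple mask. */
enum lock_mode_sketch {
	MODE_EX = 1 << 0,
	MODE_PW = 1 << 1,
	MODE_PR = 1 << 2,
	MODE_CW = 1 << 3,
	MODE_CR = 1 << 4,
	MODE_NL = 1 << 5,
};

/* Standard DLM compatibility sets, built up the way LCK_COMPAT_* are. */
#define COMPAT_EX (MODE_NL)
#define COMPAT_PW (COMPAT_EX | MODE_CR)
#define COMPAT_PR (COMPAT_PW | MODE_PR)
#define COMPAT_CW (COMPAT_PW | MODE_CW)
#define COMPAT_CR (COMPAT_CW | MODE_PR | MODE_PW)
#define COMPAT_NL (COMPAT_CR | MODE_EX)

static int compat_mask(enum lock_mode_sketch granted)
{
	switch (granted) {
	case MODE_EX: return COMPAT_EX;
	case MODE_PW: return COMPAT_PW;
	case MODE_PR: return COMPAT_PR;
	case MODE_CW: return COMPAT_CW;
	case MODE_CR: return COMPAT_CR;
	case MODE_NL: return COMPAT_NL;
	}
	return 0;
}

/* Nonzero if a request in mode req can be granted alongside granted. */
int modes_compatible(enum lock_mode_sketch granted, enum lock_mode_sketch req)
{
	return (compat_mask(granted) & req) != 0;
}
```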

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I20f4bbbc4e7bf9055333aba1b8cca80aa899c664
Reviewed-on: https://review.whamcloud.com/38964
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9868 lustre: switch to use of ->d_init() 35/44135/21
Al Viro [Mon, 16 Aug 2021 18:41:22 +0000 (14:41 -0400)]
LU-9868 lustre: switch to use of ->d_init()

Starting with 4.7 kernels, the initialization of dentries
is managed by the VFS layer at allocation time. Any
time the VFS creates a dentry, ll_d_init() will be
called.

Linux-commit: 7126bc2e8d60c2a00539bf96b1005f3015be87a5

Change-Id: I02f9b83afd5007658ce88c1010c669d642665d39
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44135
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14954 socklnd: fix link state detection 32/44732/5
Serguei Smirnov [Mon, 23 Aug 2021 19:58:51 +0000 (12:58 -0700)]
LU-14954 socklnd: fix link state detection

Because it matches only the device index, the link detection
implemented in LU-14742 confuses the link events for virtual
interfaces with the link events for the interface that LNet was
actually configured to use. Fix this by improving the identification
of the event source: use both the device name and the device index.

Also, to make sure the link fatal state is cleared only when
the device is bound to the IP address used at NI creation,
subscribe to inetaddr events in addition to the netdev events.

Test-Parameters: trivial
Fixes: fc2df80e96dc ("LU-14742: detect link state to set fatal error")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ib1996c66a8ae2596970d66e3d920702190851e3f
Reviewed-on: https://review.whamcloud.com/44732
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14877 llite: Remove inode locking in ll_fsync 68/44368/6
Oleg Drokin [Wed, 21 Jul 2021 20:03:10 +0000 (16:03 -0400)]
LU-14877 llite: Remove inode locking in ll_fsync

The inode locking in ll_fsync() does not appear to be necessary.

Change-Id: I0142a9dca4ecc6893521275b69a0a46012eab0b0
Fixes: 8f3ef1e961 ("LU-812 llite: 3.0+ kernel fsync should call write")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44368
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14904 ldiskfs: add support for Ubuntu20 kernel 5.4.0.80 03/44703/3
James Simmons [Wed, 18 Aug 2021 16:15:29 +0000 (12:15 -0400)]
LU-14904 ldiskfs: add support for Ubuntu20 kernel 5.4.0.80

Changes from newer 5.4.0 kernel version have been backported to
Ubuntu20. Test for Ubuntu 5.4.0.80 kernels so we use the correct
series file with the updated ext-simple-blockalloc.patch.

Test-Parameters: trivial
Change-Id: I73ad558a306ec50fb1ba45e6ab2c59aaec047197
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44703
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14927 scrub: share osd_scrub[prep|post] code 05/44705/2
James Simmons [Wed, 18 Aug 2021 18:04:44 +0000 (14:04 -0400)]
LU-14927 scrub: share osd_scrub[prep|post] code

The osd_scrub_prep() and osd_scrub_post() functions in osd-zfs and
osd-ldiskfs are nearly identical. Additionally, the code uses
internal kernel code that can only be used by non-tainted modules.
To avoid inheriting these taint issues, create common functions
scrub_thread_prep() and scrub_thread_post() in scrub.c in obdclass.
These can serve as kthread helpers for OSD drivers.

Change-Id: Ia4875eafc053c1e07f437ba55dbdcf58029a7fc6
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44705
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14949 llite: Always do lookup on ENOENT in open 75/44675/5
Patrick Farrell [Tue, 17 Aug 2021 02:54:59 +0000 (22:54 -0400)]
LU-14949 llite: Always do lookup on ENOENT in open

When there is no valid dentry found for a file we want to
open, we perform a full lookup, which goes to the server
and looks up the file by name. When we find an existing
dentry in cache *but the file is not open on the node*, we
do not do a full lookup.  We move directly to opening the
file.

When we open files, we use the FID of the file.  The
problem occurs when a new file is renamed *over* the file
we were trying to open.  This removes the FID we are
trying to open, but the file *name* userspace called open()
on is still present.  In this case, we will return ENOENT,
even though there is a file matching the name used in the
open() call.

The solution is that when we get an ENOENT on open (indicating
our open raced with an unlink), we always send ESTALE back
to the VFS, which restarts the open and forces a lookup to
the server (by forcing Lustre to consider the dentry
invalid, see comments in ll_intent_file_open and code in
ll_revalidate_dentry).

This causes a lookup by name, which will correctly handle
the rename, allowing the open to proceed normally.

This should only generate extra retries in the case where a
positive dentry exists on the client but the file has been
removed on the server, i.e., open racing with unlink.

This should hopefully be rare.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Change-Id: If9157cac901c81d6ad3f15997d419d3907fe88b8
Reviewed-on: https://review.whamcloud.com/44675
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14847 ptlrpc: two replay lock threads 94/44294/5
Vitaly Fertman [Tue, 13 Jul 2021 16:07:14 +0000 (19:07 +0300)]
LU-14847 ptlrpc: two replay lock threads

Two replay lock threads conflict with each other, which leads to:
        ASSERTION( atomic_read(&imp->imp_replay_inflight) == 1 )

replay_lock_interpret() calls ptlrpc_connect_import() on error, so one
thread starts from the connect reply interpret callback.

replay_lock_interpret() also wakes up ldlm_lock_replay_thread(), which
calls ptlrpc_import_recovery_state_machine().

Both threads may reach ldlm_replay_locks() at the same time on the
next round; both increment imp_replay_inflight, and the second one
hits the assertion.

The problem appeared in LU-13600 which added ldlm_lock_replay_thread()
with the ptlrpc_import_recovery_state_machine() call.

HPE-bug-id: LUS-10147
Fixes: 3b613a442b ("LU-13600 ptlrpc: limit rate of lock replays")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: Ia9aafb631e3ba5f850504cc58b4826acec2813bd
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/158931
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/44294
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14785 build: changelog updates should not dirty version 69/44069/5
Shaun Tancheff [Thu, 24 Jun 2021 13:08:24 +0000 (08:08 -0500)]
LU-14785 build: changelog updates should not dirty version

When building Lustre debs, the final version should not
include 'dirty' merely because the changelog was updated.

HPE-bug-id: LUS-10152
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I59f4da4b3006302e3598cfa56a0364b052f885ef
Reviewed-on: https://review.whamcloud.com/44069
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14640 osd: ASSERTION(!PageDirty(lnb[i].lnb_page) 62/43462/9
Arshad Hussain [Wed, 19 May 2021 11:04:30 +0000 (16:34 +0530)]
LU-14640 osd: ASSERTION(!PageDirty(lnb[i].lnb_page)

fallocate(PUNCH_HOLE) was leaving the partially zeroed
page in the buffer cache. This caused an assertion failure
during large direct read/write operations. It was seen
when running fsx with the following options:

$ fsx -c 50 -p 1000 -S 7919 -P /tmp -l 5407677 -N 100000 <file>

Lustre: DEBUG MARKER: GENERIC DEBUG start start
LustreError: 15768:0:(osd_io.c:1563:osd_write_commit())
ASSERTION( !PageDirty(lnb[i].lnb_page) ) failed:
LustreError: 15768:0:(osd_io.c:1563:osd_write_commit()) LBUG
Pid: 15768, comm: ll_ost_io00_000 3.10.0-957.el7_lustre.x86_64
Call Trace:
[<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[<0>] osd_write_commit+0x52c/0x870 [osd_ldiskfs]
[<0>] ofd_commitrw_write+0xe79/0x1510 [ofd]
[<0>] ofd_commitrw+0x2ad/0x9a0 [ofd]
[<0>] tgt_brw_write+0xfd0/0x1cb0 [ptlrpc]
[<0>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
[<0>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[<0>] ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
[<0>] kthread+0xd1/0xe0
[<0>] ret_from_fork_nospec_begin+0xe/0x21
[<0>] 0xfffffffffffffffe
Kernel panic - not syncing: LBUG

Test-case: sanity-benchmark/fsx_partial_punch added

Test-Parameters: testlist=sanity-benchmark
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I89fcbc6af0cbf4b544b8d149703053909ecb6cad
Reviewed-on: https://review.whamcloud.com/43462
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10973 lutf: Fix crash and other updates 26/44726/5
Amir Shehata [Mon, 23 Aug 2021 18:45:18 +0000 (11:45 -0700)]
LU-10973 lutf: Fix crash and other updates

Fix a crash in wait_for_agents, which was misusing
cYAML_get_next_seq_item().

Update lustre_lnet_config_ni() with a newly added parameter
for conns_per_peer. Later on, tests can be added to explicitly
test setting conns_per_peer from the C API.

Remove auth_timeout from the paramiko file to be backwards
compatible with older versions of the paramiko python API.

Only delete the progress file if this node is the LUTF master
node. This is to avoid other nodes trampling over each other
if they are using the same directory to dump temporary files.

Test-parameters: trivial

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifb5ef0e16c6bc859c3893919a9242b64fd049ebe
Reviewed-on: https://review.whamcloud.com/44726
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: extend rspt_next_hop_nid in lnet_rsp_tracker 94/43594/8
Mr NeilBrown [Mon, 6 Apr 2020 03:03:36 +0000 (13:03 +1000)]
LU-10391 lnet: extend rspt_next_hop_nid in lnet_rsp_tracker

rspt_next_hop_nid in 'struct lnet_rsp_tracker' is now
a 'struct lnet_nid'.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1348a05e572782383a2e68eb7a6be514a53b28b8
Reviewed-on: https://review.whamcloud.com/43594
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lr_nid to struct lnet_nid 93/43593/8
Mr NeilBrown [Wed, 18 Aug 2021 21:01:48 +0000 (17:01 -0400)]
LU-10391 lnet: change lr_nid to struct lnet_nid

The nid in 'struct lnet_route' is now a 'struct lnet_nid'.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2e2f2e9c8d2cbdbc87b408ee4589952f2df02880
Reviewed-on: https://review.whamcloud.com/43593
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: enhance connect/accept to support large addr 05/42105/11
Mr NeilBrown [Fri, 3 Apr 2020 05:37:26 +0000 (16:37 +1100)]
LU-10391 lnet: enhance connect/accept to support large addr

This patch introduces version 2 of the acceptor protocol.  This
version uses a 'struct lnet_nid' rather than an 'lnet_nid_t'.

lnet_connect() now accepts a struct lnet_nid and uses version 2 if
necessary.  lnet_accept() accepts either v1 or v2.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I523be0d217b6239c9791ff4fa536b9255c029ae7
Reviewed-on: https://review.whamcloud.com/42105
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: introduce lnet_processid for ksock_peer_ni 04/42104/13
Mr NeilBrown [Fri, 8 May 2020 00:53:53 +0000 (10:53 +1000)]
LU-10391 lnet: introduce lnet_processid for ksock_peer_ni

struct lnet_processid (without the '_') is like lnet_process_id, but
contains a 'struct lnet_nid' rather than lnet_nid_t.

So far it is only used for ksnp_id in struct ksock_peer_ni, and
related functions.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1fea693b1c84ca4c3ac1821f55874ad11519a33b
Reviewed-on: https://review.whamcloud.com/42104
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 socklnd: factor out key calculation for ksnd_peers 03/42103/13
Mr NeilBrown [Mon, 9 Mar 2020 03:13:05 +0000 (14:13 +1100)]
LU-10391 socklnd: factor out key calculation for ksnd_peers

The hash_table library requires a "long" to be used as a key.  We
currently provide the nid, which at 64 bits is a suitable long on 64-bit
hosts, but isn't really correct on 32-bit hosts.

When we change to an extended nid (which is 160 bits) it will be even
less appropriate.

So create a separate function to compute a 'long' key, and implement it
by simply XORing 'long'-sized parts of the nid together.  On a 64-bit
machine, this is currently optimized away for lnet_nid_t, but that
will change when we convert to struct lnet_nid.

This new function is placed in lnet-types.h as it will be more
generally useful later.

The hash_table library calls hash_long() on the key, so we don't need
to do anything more interesting than XORing.
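A minimal user-space sketch of the XOR folding described above. The function name sketch_fold_key() and the 20-byte struct sketch_nid are hypothetical stand-ins, not the helper added to lnet-types.h; the sketch only assumes that the key is built by XORing long-sized pieces of the nid and that further mixing is left to hash_long().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 160-bit extended nid: 4-byte header + 16-byte address. */
struct sketch_nid {
	uint8_t  nid_size;
	uint8_t  nid_type;
	uint16_t nid_num;
	uint32_t nid_addr[4];
};

/*
 * Fold an arbitrary object into one unsigned long by XORing
 * long-sized chunks together. A trailing chunk shorter than a long
 * is zero-padded. No extra mixing is done here: the hash table is
 * assumed to call hash_long() on the result.
 */
static unsigned long sketch_fold_key(const void *obj, size_t len)
{
	const unsigned char *p = obj;
	unsigned long key = 0;

	while (len > 0) {
		unsigned long chunk = 0;
		size_t n = len < sizeof(chunk) ? len : sizeof(chunk);

		memcpy(&chunk, p, n);	/* memcpy avoids unaligned loads */
		key ^= chunk;
		p += n;
		len -= n;
	}
	return key;
}
```

On a 64-bit host, folding an 8-byte value returns it unchanged, which is why the calculation costs nothing for lnet_nid_t today; a 20-byte struct sketch_nid is folded from three chunks.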

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I22c59a87c9872bb59a2f47c2a8c57b287ed53ed3
Reviewed-on: https://review.whamcloud.com/42103
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lp_disc_*_nid to struct lnet_nid 20/44620/5
Mr NeilBrown [Tue, 22 Jun 2021 05:24:42 +0000 (15:24 +1000)]
LU-10391 lnet: change lp_disc_*_nid to struct lnet_nid

Change lp_disc_src_nid and lp_disc_dst_nid in struct lnet_peer to
struct lnet_nid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0f127fbc790c0821900d7b8abfa56c1a7de8f944
Reviewed-on: https://review.whamcloud.com/44620
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lp_primary_nid to struct lnet_nid 02/42102/13
Mr NeilBrown [Wed, 18 Aug 2021 20:31:46 +0000 (16:31 -0400)]
LU-10391 lnet: change lp_primary_nid to struct lnet_nid

Change lp_primary_nid in struct lnet_peer to struct lnet_nid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I386e85062257f6f8832ffdf4b9603c0e1c072dae
Reviewed-on: https://review.whamcloud.com/42102
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lpni_nid in lnet_peer_ni to lnet_nid 01/42101/13
Mr NeilBrown [Mon, 6 Apr 2020 06:31:55 +0000 (16:31 +1000)]
LU-10391 lnet: change lpni_nid in lnet_peer_ni to lnet_nid

lpni_nid in 'struct lnet_peer_ni' is converted to 'struct lnet_nid'
and various supporting functions updated.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7a99f758b600a0dd0668edd368663ff65f603486
Reviewed-on: https://review.whamcloud.com/42101
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: add string formatting/parsing for IPv6 nids 42/43942/8
Mr NeilBrown [Tue, 8 Jun 2021 04:02:14 +0000 (14:02 +1000)]
LU-10391 lnet: add string formatting/parsing for IPv6 nids

New entries for struct netstrfns:
  nf_addr2str_size
  nf_str2addr_size
which accept or report the size of the address in bytes.
New matching functions that can format or parse IPv4 and IPv6
addresses.

New interface - currently unused - libcfs_strnid(), which takes a string
and provides a 'struct lnet_nid' with the appropriate nid_size.
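To illustrate an address parser that also reports the address size, here is a user-space sketch built on the standard inet_pton()/inet_ntop() calls. The names sketch_str2addr_size() and sketch_addr2str_size() are hypothetical; they are not the nf_str2addr_size/nf_addr2str_size implementations in LNet, only an analogue of the same interface idea.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Parse an IPv4 or IPv6 address string into buf. Returns the address
 * size in bytes (4 or 16), or -1 if the string is neither or buf is
 * too small. Bytes are stored in network order, as inet_pton() does.
 */
static int sketch_str2addr_size(const char *str, unsigned char *buf,
				size_t buflen)
{
	struct in_addr v4;
	struct in6_addr v6;

	if (inet_pton(AF_INET, str, &v4) == 1) {
		if (buflen < sizeof(v4))
			return -1;
		memcpy(buf, &v4, sizeof(v4));
		return (int)sizeof(v4);
	}
	if (inet_pton(AF_INET6, str, &v6) == 1) {
		if (buflen < sizeof(v6))
			return -1;
		memcpy(buf, &v6, sizeof(v6));
		return (int)sizeof(v6);
	}
	return -1;
}

/* Format an address whose size (4 or 16 bytes) is passed explicitly. */
static const char *sketch_addr2str_size(const unsigned char *addr, int size,
					char *out, socklen_t outlen)
{
	int af = size == 4 ? AF_INET : size == 16 ? AF_INET6 : -1;

	if (af < 0)
		return NULL;
	return inet_ntop(af, addr, out, outlen);
}
```

Passing the size alongside the bytes is what lets one formatter handle both families without guessing from the buffer contents.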

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Idbfc8bb9502192e1dc6a217750e7f4431e3eca4a
Reviewed-on: https://review.whamcloud.com/43942
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: introduce struct lnet_nid 00/42100/26
Mr NeilBrown [Fri, 20 Aug 2021 14:17:35 +0000 (10:17 -0400)]
LU-10391 lnet: introduce struct lnet_nid

LNet nids are currently limited to 4-byte addresses.
This excludes the use of IPv6.

In order to support IPv6, introduce 'struct lnet_nid', which can hold
an address of up to 128 bits and is extensible, and deprecate
'lnet_nid_t'.  lnet_nid_t will eventually be removed.  Whereas
lnet_nid_t is often passed around by value, 'struct lnet_nid' will
normally be passed around by reference, as it is over twice as large.

The net_type field, which currently has values up to 16, is now limited
to 0-254, with 255 being used as a wildcard.  The most significant byte
is now a size field which gives the size of the whole nid minus 8, so
zero is correct for current nids with 4-byte addresses.
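The size encoding can be checked with a little arithmetic. The sketch below assumes a 4-byte header (size, type and a 16-bit net number) followed by up to four 32-bit address words; that layout is inferred from the description above, and struct sketch_nid is a hypothetical mirror rather than the real struct lnet_nid.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the layout; multi-byte fields are big-endian. */
struct sketch_nid {
	uint8_t  nid_size;	/* whole nid size in bytes, minus 8 */
	uint8_t  nid_type;	/* net type 0-254; 255 is the wildcard */
	uint16_t nid_num;	/* net number */
	uint32_t nid_addr[4];	/* up to 128 bits of address */
};

#define SKETCH_NID_HDR_BYTES	4	/* size + type + num */
#define SKETCH_NET_TYPE_ANY	255	/* wildcard net type */

/* Total bytes used by a nid with an addr_bytes-long address. */
static size_t sketch_nid_bytes(size_t addr_bytes)
{
	return SKETCH_NID_HDR_BYTES + addr_bytes;
}

/* Value to store in nid_size; valid for addr_bytes >= 4. */
static uint8_t sketch_nid_size_field(size_t addr_bytes)
{
	return (uint8_t)(sketch_nid_bytes(addr_bytes) - 8);
}
```

A 4-byte IPv4 address gives 4 + 4 - 8 = 0, matching existing nids, while a 16-byte IPv6 address gives 12 and a 20-byte (160-bit) nid.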

Where we still need to use 4-byte-address nids, we will use names
containing "nid4".  So a "nid4" is an lnet_nid_t, whereas a "nid" is a
struct lnet_nid.  lnet_nid_to_nid4() converts a 'struct lnet_nid' to an
lnet_nid_t.

While lnet_nid_t is stored and often transmitted in host-endian format
(and possibly byte-swapped on receipt), 'struct lnet_nid' is always
stored in network byte order (i.e. big-endian).  This is the more common
approach for network addresses.
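Converting between the two forms mostly means splitting the 64-bit lnet_nid_t into its net (upper 32 bits: type in the high 16, number in the low 16) and address (lower 32 bits), then storing them big-endian. The sketch assumes that bit layout; sketch_nid4_to_nid() and sketch_nid_to_nid4() are hypothetical counterparts to helpers such as lnet_nid_to_nid4(), not their actual code.

```c
#include <arpa/inet.h>	/* htons, htonl, ntohs, ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct lnet_nid; multi-byte fields big-endian. */
struct sketch_nid {
	uint8_t  nid_size;
	uint8_t  nid_type;
	uint16_t nid_num;
	uint32_t nid_addr[4];
};

/* Convert a host-endian 64-bit nid4 into the network-order struct form. */
static void sketch_nid4_to_nid(uint64_t nid4, struct sketch_nid *nid)
{
	uint32_t net = (uint32_t)(nid4 >> 32);
	uint32_t addr = (uint32_t)nid4;

	memset(nid, 0, sizeof(*nid));
	nid->nid_size = 0;			/* 4-byte address: 8 - 8 */
	nid->nid_type = (uint8_t)(net >> 16);	/* types now fit in 0-254 */
	nid->nid_num = htons((uint16_t)(net & 0xffff));
	nid->nid_addr[0] = htonl(addr);
}

/* Reverse conversion; only meaningful when nid_size == 0. */
static uint64_t sketch_nid_to_nid4(const struct sketch_nid *nid)
{
	uint32_t net = ((uint32_t)nid->nid_type << 16) | ntohs(nid->nid_num);

	return ((uint64_t)net << 32) | ntohl(nid->nid_addr[0]);
}
```

Because the struct is always big-endian, its address bytes read in the same order as the dotted-quad string on every host, regardless of the host's native byte order.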

In this first instance, 'struct lnet_nid' is used for ni_nid in
'struct lnet_ni', and related support functions.

In particular libcfs_nidstr() is introduced which parallels
libcfs_nid2str(), but takes 'struct lnet_nid'.

In cases where we need similar functions for old- and new-style
nids, the new function is introduced with a slightly different name,
such as libcfs_nidstr() above (cf. libcfs_nid2str()), or LNET_NID_NET
(cf. LNET_NIDNET).  It will be confusing having both, but the plan is
to remove the old names as soon as practical.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4dcf1bab856621915b6535958d77cdde89105d96
Reviewed-on: https://review.whamcloud.com/42100
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14930 mdt: abort_recov_mdt shouldn't abort client recovery 10/44610/2
Mikhail Pershin [Wed, 11 Aug 2021 14:30:48 +0000 (17:30 +0300)]
LU-14930 mdt: abort_recov_mdt shouldn't abort client recovery

When abort_recov_mdt is set to abort MDT-MDT recovery, the
abort_recovery flag is also set inside target_stop_recovery_thread().
As a result, not just MDT-MDT recovery but also client-MDT recovery
is aborted.

Fixes: dd9e79b64d ("LU-12546 mdt: abort recovery between MDTs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ibda05e91a2da90156e2b6c9fdcb2169cdbd50fe4
Reviewed-on: https://review.whamcloud.com/44610
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14093 tests: silence gcc10 error for badarea_io 70/44670/4
James Simmons [Mon, 16 Aug 2021 17:02:15 +0000 (13:02 -0400)]
LU-14093 tests: silence gcc10 error for badarea_io

With gcc10, badarea_io fails to build with the following error:

badarea_io.c: In function ‘main’:
badarea_io.c:59:7: error: ‘write’ reading 2097152 bytes from a
                           region of size 4 [-Werror=stringop-overflow=]
   59 |  rc = write(fd, &fd, 2UL*1024*1024);
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Oleg stated that this is done this way on purpose, so instead of
'fixing' the issue in this case we silence the gcc warning.
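One common way to silence a single diagnostic without disabling it project-wide is GCC's `#pragma GCC diagnostic` push/ignored/pop around the offending call. The commit message does not say whether badarea_io uses this or a compiler flag, so treat this as an illustration only. sketch_oversized_write() and sketch_badarea_to_devnull() are hypothetical, and /dev/null stands in for a sink that, on Linux, accepts the write without reading the buffer.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Deliberately pass a length far larger than the object, as the test does. */
static ssize_t sketch_oversized_write(int fd)
{
	ssize_t rc;

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 7
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overflow"
#endif
	rc = write(fd, &fd, 2UL * 1024 * 1024);
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 7
#pragma GCC diagnostic pop
#endif
	return rc;
}

/* Run the oversized write against /dev/null, which discards the data. */
static ssize_t sketch_badarea_to_devnull(void)
{
	int fd = open("/dev/null", O_WRONLY);
	ssize_t rc;

	if (fd < 0)
		return -1;
	rc = sketch_oversized_write(fd);
	close(fd);
	return rc;
}
```

The push/pop pair keeps the suppression scoped to one statement, so the same warning still fires for accidental overflows elsewhere in the file.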

Test-Parameters: trivial
Test-Parameters: env=ONLY=133f,133g testlist=sanity
Change-Id: Iee79c7988cc209fd099c23c38a8bd7df96015b05
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14912 obdclass: prefer T10 checksum if the target supports it 57/44657/2
Li Dongyang [Fri, 13 Aug 2021 08:58:46 +0000 (18:58 +1000)]
LU-14912 obdclass: prefer T10 checksum if the target supports it

If the target actually has T10-PI support and checksum_type is not
explicitly set, prefer the T10 checksum even if it is not the fastest
one on the client.
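The selection rule can be written as a three-step priority. The bit values and sketch_pick_cksum() below are hypothetical and do not match Lustre's OBD_CKSUM_* flags; the sketch only encodes the order described above.

```c
#include <stdint.h>

/* Hypothetical checksum-type bits, for illustration only. */
enum {
	SK_CKSUM_CRC32C   = 1u << 0,
	SK_CKSUM_ADLER    = 1u << 1,
	SK_CKSUM_T10IP512 = 1u << 2,
	SK_CKSUM_T10CRC4K = 1u << 3,
};

/*
 * explicit_type: type set via checksum_type, or 0 if unset.
 * target_t10:    T10 type supported by the target, or 0 if none.
 * fastest:       fastest type measured on this client.
 */
static uint32_t sketch_pick_cksum(uint32_t explicit_type,
				  uint32_t target_t10, uint32_t fastest)
{
	if (explicit_type != 0)
		return explicit_type;	/* an explicit setting always wins */
	if (target_t10 != 0)
		return target_t10;	/* prefer T10 even if slower here */
	return fastest;			/* otherwise use the fastest type */
}
```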

Change-Id: If91217881fcadbc84d1e360e65648344f5ac2447
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/44657
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
2 years agoLU-14928 mgs: allow md target re-register 94/44594/5
Alexander Zarochentsev [Sun, 30 May 2021 13:43:05 +0000 (16:43 +0300)]
LU-14928 mgs: allow md target re-register

In a DNE system, it is not safe to run writeconf on an MD target and
then mount (and re-register) it again, as this creates bogus MDT-MDT
OSP devices such as "fsname-MDT0001-osp-MDT0001" and leaves the system
non-functional.  The fix prevents the creation of such illegal devices.
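The illegal devices described above carry the same MDT index on both sides of "-osp-". The hypothetical name check sketch_is_self_osp() below shows one way such a device could be detected; it is an assumption about how the rule might be enforced, not the MGS code from this patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if name looks like "<fsname>-MDTxxxx-osp-MDTyyyy" with
 * xxxx == yyyy (hex indices), i.e. an OSP device from an MDT to itself.
 */
static int sketch_is_self_osp(const char *name)
{
	const char *osp = strstr(name, "-osp-MDT");
	unsigned int tgt, src;

	/* "MDTxxxx" (7 characters) must sit right before "-osp-". */
	if (osp == NULL || osp - name < 7)
		return 0;
	if (sscanf(osp - 7, "MDT%4x", &tgt) != 1)
		return 0;
	if (sscanf(osp + strlen("-osp-"), "MDT%4x", &src) != 1)
		return 0;
	return tgt == src;
}
```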

HPE-bug-id: LUS-10098
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I698ee6d70ac96f54eaec57b5c5fe553d130ba011
Reviewed-on: https://review.whamcloud.com/44594
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14926 utils: print unlink and setattr recs in llog_reader 91/44591/5
Alexander Zarochentsev [Fri, 16 Jul 2021 19:16:29 +0000 (22:16 +0300)]
LU-14926 utils: print unlink and setattr recs in llog_reader

Enhance llog_reader to print unlink and setattr llog records
correctly.

HPE-bug-id: LUS-10220
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I7b44f65c976459d143521185a807939524f67fa2
Reviewed-on: https://review.whamcloud.com/44591
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14924 osd-ldiskfs: fix T10PI verify/generate_fn 48/44548/4
Li Dongyang [Tue, 10 Aug 2021 13:20:13 +0000 (23:20 +1000)]
LU-14924 osd-ldiskfs: fix T10PI verify/generate_fn

The T10-PI verify/generate_fn functions make wrong assumptions.

Consider this case: we have 4 pages lnb[0-3] in osd_iobuf.
lnb[2] is mapped to a hole, so it won't be added to the bio.
If lnb[3] happens to be contiguous with lnb[1], lnb[3] will
be added to the bio with a bi_idx of 2.
In verify/generate_fn, we work out which niobuf_local to feed
the guard tags to using bi_idx and obp_start_page_idx, so we
end up with the wrong niobuf and set the guard tags for lnb[2]
instead of lnb[3].

Contiguous blocks in a bio don't necessarily mean we are looking
at contiguous niobuf_local/lnb entries in osd_iobuf->dr_lnbs.
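The index bug is easiest to see with the four-page example above. In this sketch, struct sketch_lnb and both lookup functions are hypothetical: the naive version models the "start index + bi_idx" assumption, while the corrected version skips holes when counting bio pages. The actual patch may track the mapping differently.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct niobuf_local: only the hole flag. */
struct sketch_lnb {
	int is_hole;	/* 1 if mapped to a hole and never added to the bio */
};

/* Buggy mapping: assumes bio slots and lnb slots advance together. */
static size_t sketch_lnb_naive(size_t start, size_t bi_idx)
{
	return start + bi_idx;
}

/*
 * Correct mapping: the bi_idx-th page in the bio is the bi_idx-th
 * non-hole lnb counted from start. Returns count if there is none.
 */
static size_t sketch_lnb_for_bio_idx(const struct sketch_lnb *lnb,
				     size_t count, size_t start, size_t bi_idx)
{
	size_t i;

	for (i = start; i < count; i++) {
		if (lnb[i].is_hole)
			continue;
		if (bi_idx == 0)
			return i;
		bi_idx--;
	}
	return count;
}
```

With lnb[2] a hole, bio index 2 maps to lnb[2] under the naive rule but to lnb[3] once holes are skipped.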

Test-Parameters: env=ONLY=77n testlist=sanity
Change-Id: I1ea1b6498692044e680c8754cd31e2c2b7bc9539
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/44548
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14160 tests: fix fsx logdump to fit in 80 chars 10/44510/3
Andreas Dilger [Thu, 5 Aug 2021 20:45:46 +0000 (14:45 -0600)]
LU-14160 tests: fix fsx logdump to fit in 80 chars

Fix fsx logdump fallocate/truncate lines to fit within 80 columns.
Remove the spurious leading 0 from every operation length.

Test-Parameters: trivial testlist=sanityn env=ONLY=16
Fixes: cb037f305c64 ("LU-14160 fallocate: Add punch mode to fallocate")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I93460b62be8611926e620241232d886dee3ebbe5
Reviewed-on: https://review.whamcloud.com/44510
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14773 tests: quiet down some verbose messages 34/44034/5
Andreas Dilger [Fri, 18 Jun 2021 21:35:29 +0000 (15:35 -0600)]
LU-14773 tests: quiet down some verbose messages

Don't print anything into the test logs for normal background
operations that are run as part of run_one(), so that they
don't clutter the test output with repeated/useless messages.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib6a49fc268e4cd0ad92c71a391865ce2d73ebbe5
Reviewed-on: https://review.whamcloud.com/44034
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14321 tests: create PFL file in sanityn 51b 27/44027/4
James Nunez [Thu, 17 Jun 2021 23:25:55 +0000 (17:25 -0600)]
LU-14321 tests: create PFL file in sanityn 51b

sanityn test 51b was modified to integrate the statx API in
Lustre version 2.13.54.  When we run version interop testing
with servers older than 2.13.54 and newer clients, the test
fails.

Modify the test to create a PFL file without the
'extension-size' lfs setstripe option, which allows this
test to run with servers older than 2.13.54.

Fixes: 3f7853b31ef6 ("LU-10934 llite: integrate statx() API with Lustre")
Test-Parameters: trivial
Test-Parameters: serverversion=2.12.7 serverdistro=el7.9 env=ONLY=51b testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ic3feb72771aa2db050b792159175624260e71f5b
Reviewed-on: https://review.whamcloud.com/44027
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
2 years agoLU-12848 tests: link succeeded to an orphan remote object 91/35991/14
Alexander Zarochentsev [Mon, 12 Aug 2019 20:59:05 +0000 (23:59 +0300)]
LU-12848 tests: link succeeded to an orphan remote object

An open file gets unlinked by rename while, at the same time,
a cross-MDT link is able to create a name for the dying object.
This causes file system corruption, seen as a failed attempt to
remove the test directory; e2fsck would also report an
unconnected inode.

Cray-bug-id: LUS-6208
Test-Parameters: mdtcount=2 envdefinitions=ONLY=111 testlist=sanityn
Change-Id: Ic1fde278e5f4b53eaf5560ab50fe460d8c7f7dc3
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/35991
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11303 quota: enforce block quota for chgrp 96/33996/17
Hongchao Zhang [Fri, 2 Apr 2021 06:53:59 +0000 (14:53 +0800)]
LU-11303 quota: enforce block quota for chgrp

In patch https://review.whamcloud.com/30146 "LU-5152 quota: enforce
block quota for chgrp", problems were introduced due to synchronous
requests from the MDS to the OSS to change the quota assignment of
files during chgrp operations. However, in some cases, the OSTs are
themselves out of grant and may send a quota request to the MDS,
which may result in a deadlock. Another issue is the slow performance
caused by the synchronous operation between MDT and OSTs.

This patch drops the synchronous RPC requirement of the original
patch #30146 to avoid these problems.

Previously, problems in quota tracking related to chgrp were introduced
by synchronous RPCs from the MDS to the OSS when changing the group
ownership of objects for quota tracking, in:
Fixes: 8a71fd5061b ("LU-5152 quota: enforce block quota for chgrp")

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I40556b9e8a0628eb18aa806d2f6b3dfb9b53e874
Reviewed-on: https://review.whamcloud.com/33996
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoNew tag 2.14.54 2.14.54 v2_14_54
Oleg Drokin [Wed, 18 Aug 2021 14:31:36 +0000 (10:31 -0400)]
New tag 2.14.54

Change-Id: I062c9dc76585f42edfa78108f286824e75badf8c
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14093 lutf: fix build with gcc10 84/44484/7
James Simmons [Wed, 4 Aug 2021 13:39:19 +0000 (09:39 -0400)]
LU-14093 lutf: fix build with gcc10

The new LUTF code fails to build with gcc10, with the following
build errors:

ld: lutf-lutf_listener.o:lutf.h:88: multiple definition of `g_lutf_cfg'
ld: lutf-lutf_listener.o:lutf.h:22: multiple definition of `debugtimestr'
ld: lutf-lutf_listener.o:lutf.h:21: multiple definition of `di'
ld: lutf-lutf_listener.o:lutf.h:20: multiple definition of `debugnow'

In function ‘snprintf’,
    inlined from ‘python_run_interactive_shell’ at lutf_python.c:45:2:
stdio2.h:71:10: error: ‘%s’ directive argument is null [-Werror=format-truncation=]
   71 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   72 |        __glibc_objsize (__s), __fmt,
      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   73 |        __va_arg_pack ());
      |        ~~~~~~~~~~~~~~~~~

This patch resolves these errors. Without this patch, LUTF does
not build on Ubuntu 20.04 LTS.
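Both error classes have well-known generic fixes. GCC 10 defaults to -fno-common, so a global defined (not just declared) in a header included by several .c files becomes a multiple definition at link time; the usual cure is an extern declaration in the header plus one definition in a single source file. The null %s error can be avoided by never passing snprintf() a pointer that may be NULL. The names sketch_debugnow and sketch_fmt_cmd() are hypothetical, and this does not claim to reproduce the LUTF patch.

```c
#include <stddef.h>
#include <stdio.h>

/* In a shared header (e.g. lutf.h): a declaration only, no storage. */
extern int sketch_debugnow;

/* In exactly one .c file: the single definition. */
int sketch_debugnow;

/*
 * Hypothetical command formatter: substitute "" when script is NULL
 * so the %s argument can never be a null pointer.
 */
static int sketch_fmt_cmd(char *buf, size_t len, const char *script)
{
	return snprintf(buf, len, "run %s", script ? script : "");
}
```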

Test-Parameters: trivial
Change-Id: Ie3c99f8c6cf2f5de583dc95a0dc63fcde1aa6ffd
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44484
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14903 doc: update lfs-setdirstripe man page 81/44481/2
Lai Siyao [Mon, 2 Aug 2021 11:55:12 +0000 (07:55 -0400)]
LU-14903 doc: update lfs-setdirstripe man page

Update lfs-setdirstripe man page to reflect the change of
filesystem-wide default directory layout.

Test-parameters: trivial

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I1e7818679e057add4747565a2fc850e1857cd7b0
Reviewed-on: https://review.whamcloud.com/44481
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoLU-14899 ldiskfs: Add 5.4.136 mainline kernel support 50/44450/2
Oleg Drokin [Sat, 31 Jul 2021 04:55:40 +0000 (00:55 -0400)]
LU-14899 ldiskfs: Add 5.4.136 mainline kernel support

The changes likely appeared in an earlier release
that we may also track down and update to.

Test-Parameters: trivial
Change-Id: I92125087650109b8cc8a968b2fd95ba5f8e7f998
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44450
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
2 years agoLU-12815 socklnd: set conns_per_peer based on link speed 17/44417/4
Serguei Smirnov [Wed, 28 Jul 2021 21:47:39 +0000 (14:47 -0700)]
LU-12815 socklnd: set conns_per_peer based on link speed

Specifying conns_per_peer=0 for an NI now sets conns_per_peer
as a function of the corresponding link speed, as follows:
conns_per_peer = (ilog2(Gbps) / 2 + 1)

Listed below are the resulting defaults for common link speeds:
  100Gbps, 200Gbps -> 4
  50Gbps           -> 3
  5Gbps, 10Gbps    -> 2
  less than 4Gbps  -> 1
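The formula can be checked against the table with integer arithmetic. This sketch takes the speed in whole Gbps and clamps anything below 1 Gbps to 1 so that ilog2(0) is never evaluated; the clamp and the function names are assumptions, not taken from socklnd.

```c
/* floor(log2(x)) for x >= 1, like the kernel's ilog2(). */
static unsigned int sketch_ilog2(unsigned int x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* conns_per_peer = ilog2(Gbps) / 2 + 1 */
static unsigned int sketch_speed2cpp(unsigned int gbps)
{
	if (gbps < 1)
		gbps = 1;	/* assumption: avoid ilog2(0) */
	return sketch_ilog2(gbps) / 2 + 1;
}
```

For example, 25Gbps gives ilog2(25) = 4, so 4 / 2 + 1 = 3 connections, and 4Gbps gives 2, a case the table does not list.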

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ief2b33a796c180d8669bd5796b3e35ec748423a5
Reviewed-on: https://review.whamcloud.com/44417
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7] 76/44376/3
Jian Yu [Thu, 22 Jul 2021 07:26:50 +0000 (00:26 -0700)]
LU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7]

Update RHEL7.9 kernel to 3.10.0-1160.36.2.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: Ie2898b1df28c8b99ea4099e94baafe388c6aa626
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44376
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14865 utils: llog_reader.c printf type mismatch 46/44346/5
Gian-Carlo DeFazio [Tue, 20 Jul 2021 00:30:36 +0000 (17:30 -0700)]
LU-14865 utils: llog_reader.c printf type mismatch

Add an (unsigned long long) cast to the results of
__le64_to_cpu() so that they match the %llu format
of the enclosing printf() call.

Build log message:
"llog_reader.c:887:9: error: format '%llu' expects
argument of type 'long long unsigned int', but
argument 3 has type '__u64' [-Werror=format=]"
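The cast matters because __u64 is not unsigned long long on every architecture (on some 64-bit platforms the userspace headers define it as unsigned long), so %llu needs an explicitly matching argument. Here is a small sketch of both the cast and the PRIu64 alternative from <inttypes.h>; the function names are hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Cast so the argument type matches %llu on every platform. */
static int sketch_fmt_u64(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%llu", (unsigned long long)v);
}

/* Alternative without a cast: let PRIu64 pick the right conversion. */
static int sketch_fmt_u64_pri(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%" PRIu64, v);
}
```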

Test-Parameters: trivial
Fixes: 9962d6f84db5 ("LU-14617 utils: llog_reader updatelog support")
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I9549e0a0bd21727dfcc42992b693bc39a779e1a1
Reviewed-on: https://review.whamcloud.com/44346
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-9859 lnet: fold lprocfs_call_handler functionality into lnet_debugfs_* 09/44309/4
Mr. NeilBrown [Wed, 4 Aug 2021 17:27:29 +0000 (13:27 -0400)]
LU-9859 lnet: fold lprocfs_call_handler functionality into lnet_debugfs_*

The calling convention for ->proc_handler is rather clumsy,
as a comment in fs/proc/proc_sysctl.c confirms.
Lustre has copied this convention to lnet_debugfs_{read,write},
and then provided a wrapper for handlers - lprocfs_call_handler -
to work around the clumsiness.

It is cleaner to just fold the functionality of lprocfs_call_handler()
into lnet_debugfs_* and let them call the final handler directly.

If these files were ever moved to /proc/sys (which seems unlikely), the
handling in fs/proc/proc_sysctl.c would need to be fixed too, but
that would not be a bad thing.

So modify all the functions that used the wrapper so they no longer
need it, now that a saner calling convention is available.

Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Change-Id: I548ed6a3179cdb7cd5c024febd3fee4709285a82
Reviewed-on: https://review.whamcloud.com/44309
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14787 libcfs: Provide an abstraction for AS_EXITING 70/44070/6
Shaun Tancheff [Thu, 22 Jul 2021 07:31:30 +0000 (02:31 -0500)]
LU-14787 libcfs: Provide an abstraction for AS_EXITING

Linux kernel v3.14-7405-g91b0abe36a7b added the AS_EXITING flag,
which is set while an address_space mapping is exiting.

Provide an abstraction, mapping_clear_exiting(), to clear
the AS_EXITING flag. This balances the kernel's mapping_set_exiting()
and is used for older kernels whose enum mapping_flags does
not include AS_EXITING.

HPE-bug-id: LUS-9977
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib3101b7e3eb8a7fcfd0012ac27367f1e65537f5d
Reviewed-on: https://review.whamcloud.com/44070
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1] 37/44037/2
Jian Yu [Sat, 19 Jun 2021 00:26:07 +0000 (17:26 -0700)]
LU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1]

Update SLES12 SP5 kernel to 4.12.14-122.74.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: I98952c097b14c68f744a570e5558fb21d9392ad2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44037
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14773 tests: skip check_network() on working node 33/44033/4
Andreas Dilger [Fri, 18 Jun 2021 20:55:51 +0000 (14:55 -0600)]
LU-14773 tests: skip check_network() on working node

Don't call check_network() (which can take several seconds per node)
if the get_param command ran successfully on all of the nodes.  A
successful get_param implies that the connection to the remote nodes
works properly, and it completes more quickly.

For consistency with previous behavior, still call check_network() if
get_param didn't return any output, since the modules may be unloaded.

Remove some extra visual clutter from every subtest.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6a11cf8a1a6b43bebc3ff8f5506e1faac13ebbe5
Reviewed-on: https://review.whamcloud.com/44033
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14668 lnet: Lock primary NID logic 63/43563/5
Amir Shehata [Wed, 5 May 2021 18:35:06 +0000 (11:35 -0700)]
LU-14668 lnet: Lock primary NID logic

If a peer is created by Lustre, make sure to lock that peer's
primary NID. This peer can be discovered in the background.
There is no need to block until discovery is complete, as Lustre
can continue on with the primary NID it provided.

Discovery will populate the peer with other interfaces the peer has
but will not change the peer's primary NID. It can also delete
NIDs that Lustre told it about, but not the primary NID.
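The locking rule amounts to one state check before discovery updates the primary NID. This sketch models it with a hypothetical SKETCH_PEER_LOCK_PRIMARY bit, loosely modeled on the LNET_PEER_LOCK_PRIMARY state described in the companion LU-14668 patch; the structure and function names are invented for illustration.

```c
#include <stdint.h>

#define SKETCH_PEER_LOCK_PRIMARY 0x1u	/* hypothetical state bit */

struct sketch_peer {
	uint64_t primary_nid;
	unsigned int state;
};

/* Lustre creates the peer with the NID it was configured with. */
static void sketch_peer_create_by_lustre(struct sketch_peer *p, uint64_t nid)
{
	p->primary_nid = nid;
	p->state |= SKETCH_PEER_LOCK_PRIMARY;
}

/*
 * Background discovery reports the peer's own primary NID. A locked
 * primary NID is kept; otherwise discovery may replace it.
 */
static void sketch_peer_discovered(struct sketch_peer *p, uint64_t reported)
{
	if (p->state & SKETCH_PEER_LOCK_PRIMARY)
		return;
	p->primary_nid = reported;
}
```

Because the lock is set at creation time, Lustre can keep using its NID immediately instead of waiting for discovery to finish.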

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I677b8e01fc89a42128327645861ca6cfba4c1b1a
Reviewed-on: https://review.whamcloud.com/43563
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14668 lnet: peer state to lock primary nid 62/43562/5
Amir Shehata [Wed, 5 May 2021 01:20:54 +0000 (18:20 -0700)]
LU-14668 lnet: peer state to lock primary nid

Introduce the following two peer states:

LNET_PEER_LOCK_PRIMARY, set by Lustre to lock the primary NID
of a peer to the NID Lustre is configured with

LNET_PEER_BAD_CONFIG, set by LNet if Lustre attempts to set
a peer's primary NID to a NID that is already the primary NID of
another peer
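The interaction between the two states can be sketched as state bits on a peer. The bit values, struct state_peer, and lock_primary() below are illustrative assumptions; the flag names only mirror LNET_PEER_LOCK_PRIMARY and LNET_PEER_BAD_CONFIG, and their real values and handling in LNet may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values; the real LNet flag values may differ. */
enum {
	DEMO_PEER_LOCK_PRIMARY = 1u << 0, /* models LNET_PEER_LOCK_PRIMARY */
	DEMO_PEER_BAD_CONFIG   = 1u << 1, /* models LNET_PEER_BAD_CONFIG */
};

/* Hypothetical, simplified peer record. */
struct state_peer {
	uint64_t     primary_nid;
	unsigned int state;
};

/*
 * Lustre asks to lock peer @p to @nid. If another peer in @all already
 * uses @nid as its primary NID, mark @p as badly configured and leave
 * its primary NID unchanged. Returns true if the lock was applied.
 */
static bool lock_primary(struct state_peer *p, uint64_t nid,
			 const struct state_peer *all, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (&all[i] != p && all[i].primary_nid == nid) {
			p->state |= DEMO_PEER_BAD_CONFIG;
			return false;
		}
	}
	p->primary_nid = nid;
	p->state |= DEMO_PEER_LOCK_PRIMARY;
	return true;
}
```

Keeping the conflict as a separate state bit, rather than failing silently, lets the misconfiguration be reported later.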

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I8c55e90ad2abd083c2fc902a04d4cd06a3412bfa
Reviewed-on: https://review.whamcloud.com/43562
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14661 obdclass: Add peer/peer NI when processing llog 10/43510/6
Chris Horn [Thu, 3 Sep 2020 20:06:08 +0000 (15:06 -0500)]
LU-14661 obdclass: Add peer/peer NI when processing llog

Construct peers when processing the config log so that LNet has
complete information about the peers recorded in the config log.

These are "temporary" peers which can be overwritten by discovery.

In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to take
the LNet API mutex when adding or modifying peers. Since acquiring a
mutex may sleep, we must not take it while a spin lock is held, so
drop the spin lock prior to calling class_add_nids_to_uuid().
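The lock ordering above can be shown with a userspace analogue using POSIX threads. This is a sketch, not the kernel code: imp_lock and api_mutex stand in for the import spin lock and the LNet API mutex, and add_nids_to_conn() and lnet_add_peer_model() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the import spin lock and LNet API mutex. */
static pthread_spinlock_t imp_lock;
static pthread_mutex_t api_mutex = PTHREAD_MUTEX_INITIALIZER;
static int conn_generation;  /* state guarded by imp_lock */
static int peers_added;      /* state guarded by api_mutex */

static void demo_init(void)
{
	pthread_spin_init(&imp_lock, PTHREAD_PROCESS_PRIVATE);
}

/* Models LNet adding a peer: takes the (possibly sleeping) mutex. */
static void lnet_add_peer_model(void)
{
	pthread_mutex_lock(&api_mutex);
	peers_added++;
	pthread_mutex_unlock(&api_mutex);
}

/*
 * Update import state under the spin lock, release it, and only then
 * call into code that takes the mutex, so the mutex is never acquired
 * while the spin lock is held.
 */
static void add_nids_to_conn(void)
{
	pthread_spin_lock(&imp_lock);
	conn_generation++;
	pthread_spin_unlock(&imp_lock);

	lnet_add_peer_model();  /* called with imp_lock released */
}
```

In the kernel the same ordering matters more strongly, because sleeping while holding a spin lock is not allowed at all.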

HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie0e35434c9b76f917c1448064c5217c821b1ad87
Reviewed-on: https://review.whamcloud.com/43510
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14661 lnet: Provide kernel API for adding peers 09/43509/5
Chris Horn [Wed, 2 Sep 2020 20:07:25 +0000 (15:07 -0500)]
LU-14661 lnet: Provide kernel API for adding peers

Implement the LNetAddPeer() API to allow other kernel modules to add
peers to LNet.

Peers created via this API are not marked as having been configured
by DLC. As such, they can be overwritten by discovery.
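The distinction between DLC-configured peers and peers added through the new API can be modeled with a single flag. This sketch does not reproduce the real LNetAddPeer() signature; struct api_peer, add_peer_from_module(), and discovery_update() are hypothetical, and the NID list is a fixed-size array of plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define API_MAX_NIDS 4

/* Hypothetical, simplified peer record. */
struct api_peer {
	uint64_t nids[API_MAX_NIDS];
	size_t   nnids;
	bool     dlc_configured;  /* true only for peers configured via DLC */
};

static void copy_nids(struct api_peer *p, const uint64_t *nids, size_t n)
{
	size_t k = 0;

	for (; k < n && k < API_MAX_NIDS; k++)
		p->nids[k] = nids[k];
	p->nnids = k;
}

/* Models a peer added by another kernel module: not marked as DLC. */
static void add_peer_from_module(struct api_peer *p,
				 const uint64_t *nids, size_t n)
{
	copy_nids(p, nids, n);
	p->dlc_configured = false;
}

/* Discovery replaces the NID list unless the peer was DLC-configured. */
static bool discovery_update(struct api_peer *p,
			     const uint64_t *nids, size_t n)
{
	if (p->dlc_configured)
		return false;
	copy_nids(p, nids, n);
	return true;
}
```

Leaving the flag clear is what makes API-added peers provisional: discovery has the final word on their NIDs.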

Test-Parameters: trivial
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibb057f702ea29d60233fbd1680d8caec98064d5d
Reviewed-on: https://review.whamcloud.com/43509
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14531 osd: serialize access to object vs object destroy 33/43233/18
Alex Zhuravlev [Thu, 18 Mar 2021 08:43:06 +0000 (11:43 +0300)]
LU-14531 osd: serialize access to object vs object destroy

Add this serialization in osd-zfs, as ZFS doesn't provide an internal
mechanism for it.
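One common way to serialize object access against destruction is a per-object reader/writer lock plus a "destroyed" flag checked under the lock. The sketch below uses POSIX rwlocks in userspace; it is not necessarily how the osd-zfs patch does it, and struct demo_obj, obj_read(), and obj_destroy() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object with a per-object lock and a destroyed flag. */
struct demo_obj {
	pthread_rwlock_t lock;
	bool destroyed;
	int  data;
};

static void obj_init(struct demo_obj *o, int data)
{
	pthread_rwlock_init(&o->lock, NULL);
	o->destroyed = false;
	o->data = data;
}

/* Accessors hold the lock shared and re-check the flag under it. */
static int obj_read(struct demo_obj *o, int *out)
{
	int rc = 0;

	pthread_rwlock_rdlock(&o->lock);
	if (o->destroyed)
		rc = -ENOENT;
	else
		*out = o->data;
	pthread_rwlock_unlock(&o->lock);
	return rc;
}

/* Destroy takes the lock exclusively, so it waits for in-flight access. */
static void obj_destroy(struct demo_obj *o)
{
	pthread_rwlock_wrlock(&o->lock);
	o->destroyed = true;
	o->data = 0;
	pthread_rwlock_unlock(&o->lock);
}
```

Concurrent readers do not block each other; only destroy needs exclusive access, and any access that starts afterwards sees the object as gone.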

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5f25710a5cf1568f124733a15e77a37ffcb55434
Reviewed-on: https://review.whamcloud.com/43233
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>