Whamcloud - gitweb
fs/lustre-release.git
3 years ago LU-15613 utils: add lfs_migrate --cp option
Andreas Dilger [Thu, 3 Mar 2022 04:51:25 +0000 (21:51 -0700)]
LU-15613 utils: add lfs_migrate --cp option

Since "lfs_migrate" is only copying a single file at a time, there is
no real benefit from using "rsync" as the only copytool.  Add a short
"--cp" option with suitable arguments to complement "--rsync".

Also allow the caller to replace the copy command arbitrarily by
setting the RSYNC and RSYNC_OPTS environment variables.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I66c664948d5cc5a7baa5a65550abfaa8d73ebbe5
Reviewed-on: https://review.whamcloud.com/46691
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years ago DDN-1956 audit: laudit.conf groks mount point
Sebastien Buisson [Mon, 15 Mar 2021 14:38:06 +0000 (23:38 +0900)]
DDN-1956 audit: laudit.conf groks mount point

Add a new field to laudit.conf, named "mount", that represents
the mount path of the file system to retrieve audit information
from.

Change-Id: Ic02dc39e9a4e23286916bae92e9ee7a963406e3f
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/46906
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years ago DDN-1956 audit: scandir's filter returns non-zero to include
Sebastien Buisson [Mon, 15 Mar 2021 13:48:29 +0000 (22:48 +0900)]
DDN-1956 audit: scandir's filter returns non-zero to include

When a filter function is used for scandir to select which entries
are to be included, the select routine should return a non-zero
value if the directory entry is to be included.

Fixes: dd3ba5e ("EX-274 lipe: add style check of C language code")
Change-Id: Ic02dc39e9a4e23286916bae92e9ee7a963406e3e
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/46905
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years ago EX-3612 lamigo: use system(2) instead of libssh for local agents
Alex Zhuravlev [Fri, 6 Aug 2021 05:59:51 +0000 (08:59 +0300)]
EX-3612 lamigo: use system(2) instead of libssh for local agents

In lamigo, interpret an agent host of '-' to mean the localhost using
system(2) instead of libssh.
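
The dispatch described above could look roughly like the following sketch. All function names are hypothetical, and the remote path is a stub standing in for the libssh code, which is not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: an agent host of "-" means the local host. */
static int agent_is_local(const char *host)
{
    return host != NULL && strcmp(host, "-") == 0;
}

/* Hypothetical stand-in for the libssh path; always fails in this sketch. */
static int run_remote(const char *host, const char *cmd)
{
    (void)host;
    (void)cmd;
    return -1;
}

/* Run cmd locally through system(3) for "-", otherwise go remote. */
static int run_on_agent(const char *host, const char *cmd)
{
    if (agent_is_local(host))
        return system(cmd);
    return run_remote(host, cmd);
}
```

Note that system() returns a wait status rather than a plain exit code, so a real caller would likely decode it with WIFEXITED() and WEXITSTATUS().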

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I1211f2e43af06659c660fdd57496d09c5d5413fa
Reviewed-on: https://review.whamcloud.com/46902
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago EX-4140 lamigo: ignore preallocate status
Alex Zhuravlev [Mon, 8 Nov 2021 08:37:00 +0000 (11:37 +0300)]
EX-4140 lamigo: ignore preallocate status

The preallocate status shouldn't be taken into account when space
usage is calculated; otherwise transient errors may cause
incorrect space utilization figures and unexpected mirroring to that
pool.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7cefd3bc7366bd5b08e1e08cec2eaa4d91e16a90
Reviewed-on: https://review.whamcloud.com/46901
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years ago LU-15593 mdt: Add option to disable use of SOM
Patrick Farrell [Fri, 4 Mar 2022 19:28:27 +0000 (14:28 -0500)]
LU-15593 mdt: Add option to disable use of SOM

There's currently no way to disable use of strict SOM,
which is a problem if there's ever a SOM bug.  This is
tricky to do from the client, but easy on the MDT.

The test just verifies that the size stays the same with
strict SOM disabled, because there's no easy way to check
SOM is disabled unless SOM is broken.  (Since it gives the
same value for stat.)

Note this patch requires LU-15609 to be fixed on the client
or the client will see size as 0 when SOM is disabled.

Lustre-change: https://review.whamcloud.com/46683/
Lustre-commit: 2867d497b92f43f06edd7c378828f1fb8c3b300c (tbd)

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I338fa7e4dd423b07df0f3c5bad4ec6f02e935fea
Reviewed-on: https://review.whamcloud.com/46709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years ago EX-4965 lipe: use new e2fsprogs xattr API in lipe_scan3
John L. Hammond [Mon, 21 Mar 2022 15:40:43 +0000 (10:40 -0500)]
EX-4965 lipe: use new e2fsprogs xattr API in lipe_scan3

In lipe_scan3, switch to the ext2fs_xattr_handle API since this
handles xattrs stored in a separate file. Update sanity-lipe-scan3
test_108() to verify.

Test-Parameters: trivial testlist=sanity-lipe-find3
Test-Parameters: trivial testlist=sanity-lipe-scan3
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I307d78ba03c6612e5787f71a4682c0af5b23f1c5
Reviewed-on: https://review.whamcloud.com/46875
Tested-by: jenkins <devops@whamcloud.com>
3 years ago EX-5011 lipe: fixup lipe.spec.in python macros
John L. Hammond [Tue, 22 Mar 2022 13:56:44 +0000 (08:56 -0500)]
EX-5011 lipe: fixup lipe.spec.in python macros

Update the python macros in lipe.spec.in to support CentOS 8 builds of
the lipe RPMs.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Id910f00a672c4256089fe71d6a98171c1bbb3dd4
Reviewed-on: https://review.whamcloud.com/46891
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago EX-5000 lipe: lamigo ssh_channel_read on CentOS 8
Alexandre Ioffe [Thu, 17 Mar 2022 23:53:47 +0000 (16:53 -0700)]
EX-5000 lipe: lamigo ssh_channel_read on CentOS 8

Restart ssh_channel_read() after timeout (return 0).
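
A restart loop of this kind might be sketched as follows. It does not call libssh: the reader is a hypothetical callback that returns 0 on timeout, and the bounded restart count is an assumption added here so the sketch cannot spin forever.

```c
#include <stddef.h>

/* Hypothetical reader type: >0 bytes read, 0 on timeout, <0 on error. */
typedef int (*read_fn)(void *buf, size_t len);

/* Call rd() again after a timeout, at most max_restarts extra times. */
static int read_with_restart(read_fn rd, void *buf, size_t len,
                             int max_restarts)
{
    int rc;

    do {
        rc = rd(buf, len);
    } while (rc == 0 && max_restarts-- > 0);
    return rc;
}

/* Demo reader: times out twice, then reports 5 bytes. */
static int demo_calls;

static int demo_reader(void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return ++demo_calls <= 2 ? 0 : 5;
}
```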

Test-Parameters: env=ONLY=72 trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I6ce5b6e941e432831452a31f8ea840abd4940821
Reviewed-on: https://review.whamcloud.com/46861
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
3 years ago EX-4999 lamigo: fix argument order in lamigo_show_progress()
John L. Hammond [Thu, 17 Mar 2022 17:22:17 +0000 (12:22 -0500)]
EX-4999 lamigo: fix argument order in lamigo_show_progress()

Fix argument order in lamigo_show_progress().

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I0f765c149f29b802c365a44d61d6521f05e7504f
Reviewed-on: https://review.whamcloud.com/46860
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
3 years ago LU-13717 llite: fix crypto patch merge errors
Andreas Dilger [Mon, 21 Mar 2022 21:19:17 +0000 (15:19 -0600)]
LU-13717 llite: fix crypto patch merge errors

Due to delayed landing of crypto patches, there was a semantic
conflict with another patch that changed the ll_getattr()
function prototype. The conflict did not block the patch from landing.

Test-Parameters: trivial testlist=sanity-sec
Fixes: 3256c354a6 ("LU-13717 sec: filename encryption - symlink support")
Fixes: afec2fef71 ("LU-13717 sec: filename encryption")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia07d6e60f1840889a371538ce5414315e65f6e11
Reviewed-on: https://review.whamcloud.com/46877
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
3 years ago LU-13587 quota: protect qpi in proc
Sergey Cheremencev [Thu, 15 Apr 2021 14:14:51 +0000 (17:14 +0300)]
LU-13587 quota: protect qpi in proc

Access pool info only when the pool is fully initialized.
This patch protects against the following panic:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffffc0e55e46>] qpi_state_seq_show+0x86/0xe0 [lquota]
    ...
    Call Trace:
    [<ffffffffbbc68b50>] seq_read+0x130/0x440
    [<ffffffffbbcb8380>] proc_reg_read+0x40/0x80
    [<ffffffffbbc4118f>] vfs_read+0x9f/0x170
    [<ffffffffbbc4204f>] SyS_read+0x7f/0xf0
    [<ffffffffbc176ddb>] system_call_fastpath+0x22/0x27

Add test 79 to sanity-quota to check that a race between
access to /proc/.../dt-pool_name/info of a non-existent pool
and the creation of this pool doesn't cause a panic.

Lustre-change: https://review.whamcloud.com/43987
Lustre-commit: a11168f46cf01c45a431a804da01cf67e18ecca9

HPE-bug-id: LUS-9938
Change-Id: I8eff846c6c3881a8431a98efb54e660ecb9155bf
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46658
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
3 years ago LU-13974 llog: check stale osp object
Alexander Boyko [Tue, 24 Nov 2020 05:34:11 +0000 (00:34 -0500)]
LU-13974 llog: check stale osp object

The logic of osp_attr_get has two paths:
1) return attributes from a cache for a healthy osp object
2) make an out update request and return attributes for a stale
osp object, after which the object loses its stale state.

When an out update request with llog writes fails, the osp object
becomes stale. But the llog handle stays inconsistent (bitmap, count,
last_index), and the next llog_add->llog_osd_write_rec calls dt_attr_get,
gets attributes, makes the osp object valid, and uses the wrong llog
handle data. The result is an index jump in the llog file: recX, recX+2.
This causes an error during update log processing if a failover takes
place.
The fix adds a dt_object_stale function to check the osp object.
llog_osd_write_rec checks it and returns ESTALE. llog_add then fails
with an ESTALE error and doesn't corrupt the update log.
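
The guard described above can be illustrated with a small sketch. The demo_* structures and function are hypothetical simplifications, not the Lustre dt/llog types; only the idea of refusing to write through a stale object, so the handle's index is never advanced from inconsistent state, comes from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the osp object and llog handle. */
struct demo_object {
    bool stale;
};

struct demo_log_handle {
    struct demo_object *obj;
    int last_index;
};

/* Refuse to append when the object is stale; otherwise bump the index. */
static int demo_write_rec(struct demo_log_handle *h)
{
    if (h->obj->stale)
        return -ESTALE;
    return ++h->last_index;
}
```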

Lustre-change: https://review.whamcloud.com/40742
Lustre-commit: 82c6e42d6137f39a1f2394b7bc6e8d600eb36181

HPE-bug-id: LUS-9030
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iadf53fd816e1c5bde0a19d4c537f0408796c864a
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46802
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-15661 nodemap: fix map mode value for 'both'
Sebastien Buisson [Fri, 18 Mar 2022 16:43:31 +0000 (01:43 +0900)]
LU-15661 nodemap: fix map mode value for 'both'

The patch that introduced the ability to map project IDs with
nodemap changed the value used for the "map both uid and gid"
case, from 0 to 3.
This poses a problem in case of upgrade from a previous Lustre
version, so re-introduce the value 0 as NODEMAP_MAP_BOTH_LEGACY.
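
A sketch of how the legacy value might be handled follows. Only NODEMAP_MAP_BOTH_LEGACY and the values 0 and 3 come from the commit message; the DEMO_* names, the UID/GID bit values, and the normalizing helper are assumptions made for illustration.

```c
/* Values 0 and 3 are from the commit message; 1 and 2 are assumed. */
enum demo_map_mode {
    NODEMAP_MAP_BOTH_LEGACY = 0x0, /* "both" as stored by older versions */
    DEMO_MAP_UID = 0x1,            /* assumed bit for uid mapping */
    DEMO_MAP_GID = 0x2,            /* assumed bit for gid mapping */
    DEMO_MAP_BOTH = 0x3            /* "both" after the projid patch */
};

/* Hypothetical helper: treat the legacy 0 as "map both uid and gid". */
static enum demo_map_mode demo_normalize_map_mode(unsigned int stored)
{
    if (stored == NODEMAP_MAP_BOTH_LEGACY)
        return DEMO_MAP_BOTH;
    return (enum demo_map_mode)stored;
}
```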

Lustre-change: https://review.whamcloud.com/46870
Lustre-commit: TBD (a774ea01a7ffdf177a0229d9794376bddcb9ab57)

Change-Id: I1f605de9c97faff32411da5052e8782a60645767
Fixes: 491f101042 ("LU-14797 sec: add projid to nodemap")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/46869
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago EX-4182 sec: support of PCC-RO for encrypted files
Sebastien Buisson [Mon, 20 Dec 2021 16:41:56 +0000 (17:41 +0100)]
EX-4182 sec: support of PCC-RO for encrypted files

In order to support PCC-RO for encrypted files, we decide to store
in PCC the ciphertext version of the Lustre files. We proceed to
decryption of PCC files only in the page cache, so cleartext is just
in memory. When a Lustre file is detached from PCC, or when the
encryption key is removed, we trash those PCC page cache pages.

As PCC files contain ciphertext, their sizes are aligned on
LUSTRE_ENCRYPTION_UNIT_SIZE instead of matching the Lustre inode's
clear text size. In order to keep track of Lustre files' actual sizes,
we use a dedicated xattr on the PCC files. Its value is set at PCC
attach time, which is fine for PCC-RO.
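
To make the size difference concrete, the sketch below rounds a clear text size up to the encryption unit. The 4096-byte unit is an assumed value for LUSTRE_ENCRYPTION_UNIT_SIZE, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed value; the commit message does not state the unit size. */
#define DEMO_ENC_UNIT_SIZE 4096ULL

/* Size of the ciphertext copy for a given clear text size. */
static uint64_t demo_ciphertext_size(uint64_t clear_size)
{
    return (clear_size + DEMO_ENC_UNIT_SIZE - 1) /
           DEMO_ENC_UNIT_SIZE * DEMO_ENC_UNIT_SIZE;
}
```

Under that assumption a 4097-byte file occupies 8192 bytes in PCC, which is why the real size has to be recorded separately in the xattr.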

Also add sanity-pcc test_21j to exercise this.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id7d96adb4eb4770c813a042acf7ed6c42224b9bf
Reviewed-on: https://review.whamcloud.com/45910
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-15407 test: remove dummy enc key at cleanup
Sebastien Buisson [Tue, 11 Jan 2022 07:27:42 +0000 (08:27 +0100)]
LU-15407 test: remove dummy enc key at cleanup

Make sure to remove the dummy encryption key from session keyring
when cleaning up encryption tests.

Lustre-change: https://review.whamcloud.com/46038
Lustre-commit: ec0b308614a2bad18a7a1fd805f36eb8ed6ea5eb

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I840490fca0a485110d077fe85254ced817fd55e3
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-15406 sec: fix in-kernel fscrypt support
Sebastien Buisson [Thu, 6 Jan 2022 09:18:20 +0000 (10:18 +0100)]
LU-15406 sec: fix in-kernel fscrypt support

When using in-kernel fscrypt provided by Linux 5.4, the encryption
context can be retrieved by calling the .get_context function defined
in the struct fscrypt_operations of the super_block.
llite needs to retrieve the encryption context explicitly in case of
migration via volatile files.

Lustre-change: https://review.whamcloud.com/45987
Lustre-commit: 2169aed82a32df47be9aef2f249178ede6c7fadd

Fixes: 09c558d16f ("LU-14677 sec: migrate/extend/split on encrypted file")
Fixes: fdbf2ffd41 ("LU-14677 sec: no encryption key migrate/extend/resync/split")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I76dbd21f0dc95920519ea375c583bc378d7c9f53
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-15176 sec: allow subdir mount of encrypted dir
Sebastien Buisson [Fri, 29 Oct 2021 11:29:25 +0000 (13:29 +0200)]
LU-15176 sec: allow subdir mount of encrypted dir

In case of sub-directory mount of an encrypted directory, we need to
retrieve the encryption context of the root inode of the filesystem.
This is done by making the MDT return this upon getattr reply.

Also add sanity-sec test_60 to exercise this capability.

Lustre-change: https://review.whamcloud.com/45407
Lustre-commit: faf057b46bc770a1a69cacd59e65a40a4b18b9fd

Fixes: 40d91eafe2 ("LU-12275 sec: atomicity of encryption context getting/setting")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic7a273813533f2904225011b247cdfe995ce9be8
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45909
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-15408 sec: confirm encrypted file's hash
Sebastien Buisson [Tue, 4 Jan 2022 17:16:47 +0000 (18:16 +0100)]
LU-15408 sec: confirm encrypted file's hash

It is a good practice to always confirm on server side the encrypted
file's hash included in the digested form sent by the client.

Lustre-change: https://review.whamcloud.com/45964
Lustre-commit: 9d98d1f7739e05bb4decf2614899ccb99b34826c

Fixes: ed4a625d88 ("LU-13717 sec: filename encryption - digest support")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I42212a36b23e4e6e41184a78fa8244c5e2d8dd1f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45965
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-14677 sec: remove MIGRATION_ compatibility defines
Sebastien Buisson [Fri, 10 Sep 2021 12:03:03 +0000 (14:03 +0200)]
LU-14677 sec: remove MIGRATION_ compatibility defines

Remove the MIGRATION_* compatibility flags and use
LLAPI_MIGRATION_* everywhere.

Lustre-change: https://review.whamcloud.com/44957
Lustre-commit: e42d2d67d3a0dcc726d1424d3158b6f649b5abd7

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iab2a2f6dfc435377e9db0d4963547841b2cbc403
Reviewed-on: https://review.whamcloud.com/45908
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-14677 sec: no encryption key migrate/extend/resync/split
Sebastien Buisson [Thu, 17 Jun 2021 13:31:44 +0000 (15:31 +0200)]
LU-14677 sec: no encryption key migrate/extend/resync/split

Allow some layout operations on encrypted files, even when the
encryption key is not available:
- lfs migrate
- lfs mirror extend
- lfs mirror resync
- lfs mirror verify
- lfs mirror split
We allow these access patterns to applications that know what they are
doing, by using the specific flags O_FILE_ENC and O_DIRECT.

Also add sanity-sec test_59a,b,c to exercise these access patterns.

Lustre-change: https://review.whamcloud.com/44024
Lustre-commit: fdbf2ffd41fa5660782d5ca8489ec2eb644c8113

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ieaeee0e5bf7643f18d775fe6daa5e31c2f349f8c
Reviewed-on: https://review.whamcloud.com/44182
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-13717 sec: missing defs in includes for client encryption
Sebastien Buisson [Wed, 13 Oct 2021 09:35:01 +0000 (09:35 +0000)]
LU-13717 sec: missing defs in includes for client encryption

Add a few missing definitions in lustre_crypto.h that are required
in case Lustre client encryption is built against the in-kernel
fscrypt library.

Lustre-change: https://review.whamcloud.com/45221
Lustre-commit: 79ac7c144539e0d964db329c341ebf30a8472f5c

Fixes: d08ae042d8 ("LU-13717 sec: rework includes for client encryption")
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1965503554dcf660770d201444cfafd54aa84dce
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45734
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-15027 sec: initialize ll_inode_info for fake inode
Sebastien Buisson [Wed, 22 Sep 2021 15:35:49 +0000 (17:35 +0200)]
LU-15027 sec: initialize ll_inode_info for fake inode

When creating an encrypted symlink, we need to make use of a fake
inode in order to be able to encrypt the target name before sending
the create request to the MDS.
This fake inode needs minimal initialization, but it is at least
required to properly initialize the ll_inode_info associated with this
fake inode.

Lustre-change: https://review.whamcloud.com/45023
Lustre-commit: 3fb7b6271855c0b12c5a560c7f6287cdda3d1cd6

Fixes: e735298935 ("LU-13717 sec: filename encryption - symlink support")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I20c30d873f9bffdbdc8b5f272cb8b80e5be7fbfb
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45733
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-13717 sec: filename encryption - symlink support
Sebastien Buisson [Tue, 31 Aug 2021 15:30:48 +0000 (17:30 +0200)]
LU-13717 sec: filename encryption - symlink support

On client side, call the appropriate llcrypt primitives from llite,
to proceed with symlink encryption before sending requests to servers
and symlink decryption upon request receipt.
The tricky part is that llcrypt needs an inode to encrypt the target
name. But by the time we prepare the symlink creation request to be
sent to the server with the target name (in ll_new_node), we do not
have an inode yet (it will be obtained only after we get the server
reply). So we create a fake inode and associate the right encryption
context to it, so that the symlink gets encrypted properly.

In order to report the correct size for an encrypted symlink (which
ought to be the length of the symlink target), we need to read the
symlink target and decrypt or decode it in ->getattr(). This has a
performance hit, but given that the symlink target is cached in
->i_link (when the key is available), the symlink will not have to be
read and decrypted again later when it is actually followed,
readlink() is called, or lstat() is called again.
This part of the patch is adapted from kernel commit
d18760560593e5af921f51a8c9b64b6109d634c2
"fscrypt: add fscrypt_symlink_getattr() for computing st_size"

With encrypted file names, a symlink target is binary. So make sure
server side can handle that, by switching sp_symname to a
struct lu_name in struct md_op_spec.

Lustre-change: https://review.whamcloud.com/43394
Lustre-commit: e735298935b64541fc561bd9e978cd7af48c503e

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic6892fca8926a35001697c54aaf05d15563b139d
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45732
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-13717 sec: filename encryption - digest support
Sebastien Buisson [Fri, 22 Jan 2021 12:06:50 +0000 (21:06 +0900)]
LU-13717 sec: filename encryption - digest support

A number of operations are allowed on encrypted files without the key:
- read file metadata (stat);
- list directories;
- remove files and directories.
In order to present valid names to users, cipher text names are base64
encoded if they are short. Otherwise we compute a digested form of the
cipher text, made of the FID (16 bytes) followed by the second-to-last
cipher block (16 bytes), and we base64 encode this digested form for
presentation to user.
These transformations are carried out in the specific overlay
functions, that now need to know the fid of the file.
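
The layout of the digested form described above can be sketched as follows. The function and macro names are hypothetical, the base64 step is omitted, and the sketch assumes the cipher text is a whole number of 16-byte blocks, which is a simplification.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_FID_LEN      16
#define DEMO_CIPHER_BLOCK 16
#define DEMO_DIGEST_LEN   (DEMO_FID_LEN + DEMO_CIPHER_BLOCK)

/*
 * Build FID (16 bytes) followed by the second-to-last cipher block
 * (16 bytes). Returns 0 on success, -1 if the cipher text is shorter
 * than two blocks or not block-aligned (assumption of this sketch).
 */
static int demo_build_digest(const unsigned char fid[DEMO_FID_LEN],
                             const unsigned char *cipher, size_t cipher_len,
                             unsigned char out[DEMO_DIGEST_LEN])
{
    if (cipher_len < 2 * DEMO_CIPHER_BLOCK ||
        cipher_len % DEMO_CIPHER_BLOCK != 0)
        return -1;
    memcpy(out, fid, DEMO_FID_LEN);
    memcpy(out + DEMO_FID_LEN,
           cipher + cipher_len - 2 * DEMO_CIPHER_BLOCK, DEMO_CIPHER_BLOCK);
    return 0;
}
```

The resulting 32 bytes would then be base64 encoded for presentation, as the message describes.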

As the digested form does not contain the whole cipher text name,
server side needs to proceed to an operation by FID for requests such
as lookup and getattr. It also relies on the content of the LinkEA to
verify the digested form as received from client side.

Lustre-change: https://review.whamcloud.com/43392
Lustre-commit: ed4a625d88567a2498c3fe32fd340ae7985e6ad0

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I45d10a426373c2cfe0b92a58c351da452d085d7d
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45731
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-13717 sec: filename encryption
Sebastien Buisson [Tue, 23 Mar 2021 13:58:50 +0000 (22:58 +0900)]
LU-13717 sec: filename encryption

On client side, call the appropriate llcrypt primitives from llite,
to proceed with filename encryption before sending requests to servers
and filename decryption upon request receipt.
Note we need specific overlay functions to handle encoding and
decoding of encrypted filenames, as we do not want server side to deal
with binary names before they reach the backend file system layer.

On server side, mainly the OSD layer, we need to know the encryption
status of files being processed.
If an object belongs to an encrypted file, the filename has been
encoded by the client because it is binary, so it needs to be decoded
before being handed over to the backend file system layer.
And conversely, the filename of an encrypted file has to be encoded
before being sent over the wire.
Note server side is osd-ldiskfs only for now.

Lustre-change: https://review.whamcloud.com/43390
Lustre-commit: 4d38566a004f6a636c37ec0c86f053be9b905bd7

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7ac9047f5a046b8bc63afdbbb1f28e78aa5c8c7e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45730
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-13717 sec: make client encryption compatible with ext4
Sebastien Buisson [Thu, 6 Jan 2022 21:19:02 +0000 (14:19 -0700)]
LU-13717 sec: make client encryption compatible with ext4

In order to benefit from encrypted file handling implemented in
e2fsprogs, we need to adjust the way Lustre deals with encryption
context of files.

First, the encryption context needs to be stored in an xattr named
"encryption.c" instead of "security.c". But neither llite nor ldiskfs
has an xattr handler for this "encryption." xattr type. So we need
to export ldiskfs_xattr_get and ldiskfs_xattr_set_handle symbols for
this to work.

Second, we set the LDISKFS_ENCRYPT_FL flag on files for which we set
the 'encryption.c' xattr. But we just keep this flag for on-disk
inodes, and make sure the flag is cleared for in-memory inodes.
The purpose is to help e2fsprogs with encrypted files handling, while
not disturbing Lustre server side with the encryption flag (servers
are not supposed to know about it for Lustre client-side encryption).

To maintain compatibility with 2.14 in which encryption context is
stored in "security.c" xattr, we try to fetch enc context from this
xattr if getting it from "encryption.c" fails. On client side, in all
cases everything looks like encryption context is stored in
"encryption.c".

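The compatibility fallback described above (try "encryption.c" first, then "security.c") might be sketched like this. The callback type and the demo backends are hypothetical; the real code goes through the ldiskfs xattr functions named in the message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical xattr getter: returns the value length, or <0 on failure. */
typedef int (*xattr_get_fn)(const char *name, void *buf, size_t size);

/* Prefer "encryption.c"; fall back to the 2.14 location "security.c". */
static int demo_get_enc_context(xattr_get_fn get, void *buf, size_t size)
{
    int rc = get("encryption.c", buf, size);

    if (rc < 0)
        rc = get("security.c", buf, size);
    return rc;
}

/* Demo backend for a file written by 2.14: only "security.c" exists. */
static int demo_legacy_get(const char *name, void *buf, size_t size)
{
    if (strcmp(name, "security.c") != 0 || size < 3)
        return -1;
    memcpy(buf, "old", 3);
    return 3;
}

/* Demo backend for a newer file: only "encryption.c" exists. */
static int demo_new_get(const char *name, void *buf, size_t size)
{
    if (strcmp(name, "encryption.c") != 0 || size < 4)
        return -1;
    memcpy(buf, "new!", 4);
    return 4;
}
```
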
Lustre-change: https://review.whamcloud.com/45211
Lustre-commit: 4231fab66eab3e984499bf0c6bd4514692a409fa

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I784ec530f0dfdd2743169ba2326ff6c5cdd4e85a
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/45766
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago EX-4811 lipe: Allows nfs/nfs4 targets for loris
Shuichi Ihara [Mon, 14 Feb 2022 03:37:48 +0000 (12:37 +0900)]
EX-4811 lipe: Allows nfs/nfs4 targets for loris

Loris only supports local filesystems (e.g. ext4, xfs and btrfs) as
backup targets.

The embedded MDS (ESx00X) doesn't have enough space on the local disk,
so the filesystem type check needs to be removed to allow NFS/NFS4 as
Loris backup targets.

Test-Parameters: trivial
Signed-off-by: Shuichi Ihara <sihara@ddn.com>
Change-Id: I010793402065cd553c18a7f408240841e4a22565
Reviewed-on: https://review.whamcloud.com/46515
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years ago EX-4865 lipe: add timestamps to lamigo and lpurge log messages
Alexandre Ioffe [Wed, 23 Feb 2022 21:09:34 +0000 (13:09 -0800)]
EX-4865 lipe: add timestamps to lamigo and lpurge log messages

Add an optional timestamp to log messages. Add the command line
option --timestamps to lamigo/lpurge and the 'timestamps' keyword
to the lpurge config. Enable timestamps in log files in the
hot-pools test framework by default.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: If58f20eed31c6df30a70f0cb1c93430040f8fe62
Reviewed-on: https://review.whamcloud.com/46599
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
3 years ago EX-4863 lipe: initialize cmd in lamigo_dump_jobs()
John L. Hammond [Fri, 18 Feb 2022 16:18:09 +0000 (10:18 -0600)]
EX-4863 lipe: initialize cmd in lamigo_dump_jobs()

Fix an uninitialized use of cmd in lamigo_dump_jobs().

Test-Parameters: trivial testlist=hot-pools
Fixes: 8a29c986eb ("EX-4103 lamigo: rename some "check" functions")
Change-Id: I79c32d9b32a0a06fbd72c0b6dd21d8c6919aebc6
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46554
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago EX-4963 lipe: rescan after registering a new changelog user
John L. Hammond [Thu, 10 Mar 2022 17:00:04 +0000 (11:00 -0600)]
EX-4963 lipe: rescan after registering a new changelog user

In lamigo, if we register a new changelog user then force a rescan of
the device. Add hot-pools test_6c() to verify. Adjust hot-pools to be
more precise about param matching.

Test-Parameters: trivial testlist=hot-pools
Test-Parameters: trivial testlist=hot-pools env=ONLY=6c,ONLY_REPEAT=20
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I9576d50f87ffe0eb5e44c62087fca317ddce50c7
Reviewed-on: https://review.whamcloud.com/46781
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
3 years ago EX-4965 lipe: use new e2fsprogs xattr API
John L. Hammond [Thu, 10 Mar 2022 23:20:27 +0000 (17:20 -0600)]
EX-4965 lipe: use new e2fsprogs xattr API

Switch to the ext2fs_xattr_handle API since this handles xattrs stored
in a separate file. Add sanity-lipe test_100() to verify.

Test-Parameters: testlist=sanity-lipe
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ibc39d01ecfb04307755f44751fc69ffbd79b5d69
Reviewed-on: https://review.whamcloud.com/46784
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago EX-4103 lamigo: rename some "check" functions
John L. Hammond [Fri, 29 Oct 2021 16:09:26 +0000 (11:09 -0500)]
EX-4103 lamigo: rename some "check" functions

Despite the name of the feature, files are "hot" or "cold", while
pools are "fast" or "slow". Rename and reorganize some functions with
this in mind:

  lamigo_check_hot -> lamigo_sync_hot_files
  lamigo_new_job_for_hot -> lamigo_submit_sync
  lamigo_check_hot_one -> lamigo_sync_hot_to_fast
  lamigo_check_hot_on_cold -> lamigo_sync_hot_to_slow
  lamigo_get_hot -> lamigo_get_hot_files

Test-Parameters: trivial testlist=hot-pools
Change-Id: I2833c8828d73e50a72db8a19aae16d1400eccd66
Reviewed-on: https://review.whamcloud.com/45410
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years ago EX-4539 lipe: add lipe_find3 manpage
John L. Hammond [Mon, 7 Mar 2022 20:47:18 +0000 (14:47 -0600)]
EX-4539 lipe: add lipe_find3 manpage

Add a manpage for lipe_find3. Add some undistributed internal notes
for posterity.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I2291b7b8e8d4b8706d4dc5a3aec8b4815acfff10
Reviewed-on: https://review.whamcloud.com/46731
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-14587 ptlrpc: remove LASSERT in nrs_polices proc handler
Lei Feng [Wed, 9 Mar 2022 20:08:26 +0000 (12:08 -0800)]
LU-14587 ptlrpc: remove LASSERT in nrs_polices proc handler

It's not necessary to LASSERT() in the nrs_polices proc handler.
Calling CERROR() and returning an error is good enough.

Lustre-change: https://review.whamcloud.com/45200
Lustre-commit: 9997f94d4b6ee335d2bf86f94bd43464d5b8f061

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I09f06dc4ab90e49b2df66a9b47a74678c64cdd2f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/46769
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-15615 target: Free t10pi crypto state on error
Oleg Drokin [Fri, 4 Mar 2022 22:10:25 +0000 (17:10 -0500)]
LU-15615 target: Free t10pi crypto state on error

When an error occurs, the crypto state is never released. This not
only leaks memory directly, but can potentially pin in-memory
pages too.
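
A minimal sketch of the usual remedy, a single cleanup label that
every exit path passes through. All names here (crypto_state,
state_alloc(), state_free(), verify_pages(), live_states) are
hypothetical and are not taken from the patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-request crypto state. */
struct crypto_state {
	unsigned char *scratch;
};

static int live_states;	/* states allocated but not yet released */

static struct crypto_state *state_alloc(void)
{
	struct crypto_state *st = calloc(1, sizeof(*st));

	if (st == NULL)
		return NULL;
	st->scratch = malloc(64);
	if (st->scratch == NULL) {
		free(st);
		return NULL;
	}
	live_states++;
	return st;
}

static void state_free(struct crypto_state *st)
{
	free(st->scratch);
	free(st);
	live_states--;
}

/* Every exit after a successful allocation goes through "out". */
static int verify_pages(int npages)
{
	struct crypto_state *st = state_alloc();
	int rc = 0;

	if (st == NULL)
		return -ENOMEM;
	if (npages <= 0) {
		rc = -EINVAL;	/* error path still releases the state */
		goto out;
	}
	/* ... per-page work would happen here ... */
out:
	state_free(st);
	return rc;
}
```

After any mix of successful and failing calls, live_states is back
to zero, which is the property the fix restores.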

Change-Id: Ia0870ccbb194e4e9ca8701e1c01d519745c236df
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46761
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
3 years agoLU-15263 quota: fix bug in qmt_pool_recalc
Sergey Cheremencev [Thu, 21 Oct 2021 20:28:01 +0000 (23:28 +0300)]
LU-15263 quota: fix bug in qmt_pool_recalc

env should be freed only at the end of qmt_pool_recalc,
as it is still needed in qpi_putref. Freeing it earlier causes
a panic if this is the last reference to the pool:
    BUG: unable to handle NULL pointer dereference at 000000000000a0
    IP: lu_context_key_get+0x17/0x30 [obdclass]
    Call Trace:
      lu_object_free.isra.30+0x68/0x170 [obdclass]
      lu_object_put+0xc5/0x3e0 [obdclass]
      qmt_pool_free+0x30c/0x590 [lquota]
      qmt_pool_recalc+0x365/0x1260 [lquota]
      kthread+0xd1/0xe0
      ret_from_fork_nospec_begin+0x21/0x21

Lustre-change: https://review.whamcloud.com/45632
Lustre-commit: 57d88137e12472cf5ea08aa28957b4767abd475c

HPE-bug-id: LUS-10426
Change-Id: Ic23dcb858ff811757f38948aa572c936c076e21e
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/46794
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-13756 quota: up_read leak in qmt_pool_lookup
Sergey Cheremencev [Thu, 30 Sep 2021 15:58:16 +0000 (18:58 +0300)]
LU-13756 quota: up_read leak in qmt_pool_lookup

qmt_pool_lock is not released if qti_pools_add fails in
qmt_pool_lookup.

Lustre-change: https://review.whamcloud.com/45106
Lustre-commit: d16b3141119a3b75276914ad3601e0dd27579b2b

Change-Id: Ic2adb44468d51af7aefcbb91279260ae6f85d67a
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/46793
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-15079 quota: include qsd_thread_info into mgs thread
Vladimir Saveliev [Tue, 24 Aug 2021 14:57:37 +0000 (17:57 +0300)]
LU-15079 quota: include qsd_thread_info into mgs thread

mgs service thread envs do not get supplied with qsd_thread_info,
which may lead to the failure shown below:
    (lu_object.h:1274:lu_env_info()) ASSERTION( info ) failed:
    Pid: 146951, comm: ll_mgs_0003 3.10.0-957.1.3957.1.3.x4.3.25
    Call Trace:
      libcfs_call_trace+0x8e/0xf0 [libcfs]
      lbug_with_loc+0x4c/0xa0 [libcfs]
      qsd_refresh_usage+0x25e/0x2f0 [lquota]
      qsd_op_adjust+0x2f1/0x730 [lquota]
      osd_object_delete+0x2b2/0x360 [osd_ldiskfs]
      lu_object_free.isra.32+0x68/0x170 [obdclass]
      lu_site_purge_objects+0x2fe/0x530 [obdclass]
      lu_object_find_at+0x371/0xa60 [obdclass]
      dt_locate_at+0x1d/0xb0 [obdclass]
      llog_osd_open+0x50e/0xf30 [obdclass]
      llog_open+0x15a/0x3e0 [obdclass]
      llog_origin_handle_open+0x334/0x720 [ptlrpc]
      tgt_llog_open+0x33/0xe0 [ptlrpc]
      mgs_llog_open+0x46/0x460 [mgs]
      tgt_request_handle+0x96a/0x1680 [ptlrpc]

Supply msg service context with qsd_thread_info.

Lustre-change: https://review.whamcloud.com/45181
Lustre-commit: 69a9042f26fa22b1d5b2ad7b3cb8024d508268dd

Change-Id: If8664b81e1f64df015dad46ba26c9c1d1e3f54bf
HPE-bug-id: LUS-10334
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/46792
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-15065 quota: fix BIO write performance drop
Sergey Cheremencev [Wed, 15 Sep 2021 15:05:45 +0000 (18:05 +0300)]
LU-15065 quota: fix BIO write performance drop

Before the patch, qti_lqes_qunit_min used an int to store the qunit
value, while the lqe_qunit type is __u64. An lqe_qunit > 2G caused
an overflow in a local integer argument. For example, when the
block hard limit was set to 500TB (i.e. lqe_qunit was about
64TB in a system with 2 OSTs), qti_lqes_qunit_min returned
0 instead of 64TB in qmt_lvbo_fill. Thus the new qunit was not
set on the OSTs (qsd_set_qunit wasn't called). Without the qunit,
an OST began to send a release request after each acquire. For
example, writing 10MB to the OST took 2 acquire and
2 release requests (as qunit was not set on the OST). With the
fix, i.e. in the normal case, the OST needs just one acquire
request. The issue caused a performance drop in buffered
writes of up to 15%-20% compared with a baseline without the PQ
patches.
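
The truncation can be reproduced with a small sketch. The function
names are hypothetical, and uint32_t is used in place of int so that
the loss of the upper bits is well defined in standard C; this is not
the actual qti_lqes_qunit_min code.

```c
#include <stdint.h>

/*
 * Buggy shape: the running minimum lives in a 32-bit variable, so a
 * qunit of 4GiB or more loses its upper bits.
 */
static uint64_t qunit_min_narrow(const uint64_t *qunits, int count)
{
	uint32_t min = (uint32_t)qunits[0];	/* upper 32 bits dropped */
	int i;

	for (i = 1; i < count; i++) {
		uint32_t q = (uint32_t)qunits[i];

		if (q < min)
			min = q;
	}
	return min;
}

/* Fixed shape: keep the minimum in the same 64-bit type as the input. */
static uint64_t qunit_min_wide(const uint64_t *qunits, int count)
{
	uint64_t min = qunits[0];
	int i;

	for (i = 1; i < count; i++)
		if (qunits[i] < min)
			min = qunits[i];
	return min;
}
```

With two qunits of 64TiB (64ULL << 40, i.e. 2^46), the narrow
variant yields 0 because the low 32 bits of 2^46 are all zero, while
the wide variant returns the full value.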

Note, the issue exists only when the hard limit is set to some
high value (>100GB). The exact hard limit value depends on the number
of OSTs in the system and on the amount of used space, but assume
that the issue doesn't exist on a clean system with 2 OSTs and a hard
block limit of 100G (this case was checked).

Remove qmt_pool_hash - it is not used anywhere since
"LU-11023 quota: remove quota pool ID".

Lustre-change: https://review.whamcloud.com/45133
Lustre-commit: 7b8c6cd976c584b4e965b24bf4369ded86cda811

HPE-bug-id: LUS-10250
Change-Id: I2c4ce38f5b9395ed1f4868d4c8efc00751116b15
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46791
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-15191 quota: set correct revoke_time
Sergey Cheremencev [Tue, 12 Oct 2021 15:21:49 +0000 (18:21 +0300)]
LU-15191 quota: set correct revoke_time

If, during qmt_adjust_qunit, there are several lqes
and lqe_revoke_time is set for some of them, it means the
appropriate OSTs have already been notified with the
least qunit and there is no chance to free more space.
If the qunit of the current lqe becomes equal to the least
qunit, find the lqe with the minimum (earliest) revoke_time
and set this revoke_time on the current one.

This patch fixes the following case. Suppose we have
8 OSTs and 4 MDTs (i.e. 12 slaves) and a pool with just one
OST. The global hard block limit for the user is 50M, and 10M
for this user in the pool. The user's usage is 0. As the global pool
has 12 slaves, its initial qunit value is 1M, i.e. equal to
the least qunit. At the same time, the initial qunit value for the
pool with one OST is 4M. When the user begins to write, the pool's
qunit is decreased to 1M, but lqe_revoke is not set - it
should be set only after sending the new qunit to the OSTs in
qmt_lvbo_update. However, it won't be sent because the appropriate
lqe_qunit in the lqe global array already has the same value.
This problem caused sanity-quota_72 to hang instead of failing
with EDQUOT in test_1_check_write.

Lustre-change: https://review.whamcloud.com/45447
Lustre-commit: e8ecb8775389fb7febd2d0c659f0e80440f0b620

HPE-bug-id: LUS-10516
Change-Id: I5878c1e719ae83a69ad5dbc3653717bb1b4de632
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/46790
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-15049 quota: fix a panic with pool number > 16
Sergey Cheremencev [Thu, 17 Jun 2021 10:45:42 +0000 (13:45 +0300)]
LU-15049 quota: fix a panic with pool number > 16

Fix a panic that may occur when there are more than 16
pools in a system:
    qti_pools_add()) ASSERTION(qti->qti_pools_num >= QMT_MAX_POOL_NUM)
    Forgot init? ffff91a5f9625800

Lustre-change: https://review.whamcloud.com/45105
Lustre-commit: d2e8208e22f21bb7354a9207f381217c222d3df3

HPE-bug-id: LUS-10116
Change-Id: I4f73b74d2fd3e85a51cf3c30e2eec29645f164be
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46789
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-15048 quota: check that qti_lqes has been inited
Sergey Cheremencev [Thu, 22 Jul 2021 10:56:24 +0000 (13:56 +0300)]
LU-15048 quota: check that qti_lqes has been inited

qti_lqes_restore_{init,fini}() should check that qti_lqes
has been inited before accessing qti_lqes_count.

The fix prevents the following panic:
    qti_lqes_restore_fini() ASSERTION(qmt_info(env)->qti_lqes_rstr)

Lustre-change: https://review.whamcloud.com/45102
Lustre-commit: d2e8208e22f21bb7354a9207f381217c222d3df3

HPE-bug-id: LUS-10239
Change-Id: Ic93d87535f615fe419b2c3a2453506c515837031
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/46788
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14631 quota: fix qunit sort
Sergey Cheremencev [Mon, 5 Apr 2021 12:27:34 +0000 (15:27 +0300)]
LU-14631 quota: fix qunit sort

Fix lqes_cmp, which is used to sort lqes by qunit. As lqes_cmp returns
an integer, it returns incorrect values if the difference between
qunits is greater than 4GB, causing a write to hang instead of failing
with -EDQUOT:

[<ffffffffc0701945>] cl_sync_io_wait+0x295/0x3c0 [obdclass]
[<ffffffffc07026f8>] cl_io_submit_sync+0x1c8/0x360 [obdclass]
[<ffffffffc128dc0a>] vvp_io_commit_sync+0x12a/0x460 [lustre]
[<ffffffffc128f5ee>] vvp_io_write_commit+0x4de/0x620 [lustre]
[<ffffffffc128fa39>] vvp_io_write_start+0x309/0x990 [lustre]
[<ffffffffc0700a18>] cl_io_start+0x68/0x130 [obdclass]
[<ffffffffc0702e8c>] cl_io_loop+0xcc/0x1c0 [obdclass]
[<ffffffffc1243514>] ll_file_io_generic+0x5c4/0xdc0 [lustre]
[<ffffffffc12441b9>] ll_file_aio_write+0x289/0x730 [lustre]
[<ffffffffc1244760>] ll_file_write+0x100/0x1c0 [lustre]
[<ffffffffa0241320>] vfs_write+0xc0/0x1f0
[<ffffffffa024213f>] SyS_write+0x7f/0xf0

The issue occurs if a user hits the block hard limit in a pool (pool
limit 6GB), while the global limit is some huge value (53T in my case).

Change the global limit in sanity-quota_1e to check that the system
doesn't hang anymore.
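
The comparator problem can be sketched as follows. Both functions are
hypothetical illustrations that take plain 64-bit qunit values; the
real lqes_cmp operates on lqe structures.

```c
#include <stdint.h>

/*
 * Buggy shape: the 64-bit difference is squeezed into a 32-bit result.
 * A difference that is a multiple of 4GiB comes out as 0 ("equal"),
 * and other large differences can come out with the wrong sign.
 */
static int qunit_cmp_narrow(uint64_t a, uint64_t b)
{
	return (int)(uint32_t)(a - b);
}

/* Fixed shape: compare directly, never form the difference. */
static int qunit_cmp(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}
```

For qunits of 5GiB and 1GiB the narrow comparator reports equality,
so a sort based on it can leave the entries in the wrong order, while
the direct comparison returns 1 or -1 as expected.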

Lustre-change: https://review.whamcloud.com/43410
Lustre-commit: 9d3ce2985efc315529bf5faf6f3b970cd9949107

HPE-bug-id: LUS-9891
Change-Id: I5a16fd3a40172187bbf35d9a9c9bfeef2ef3a108
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/46787
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
3 years agoLU-15186 o2iblnd: Default map_on_demand to 1
Chris Horn [Mon, 1 Nov 2021 20:06:31 +0000 (15:06 -0500)]
LU-15186 o2iblnd: Default map_on_demand to 1

On kernels that provide global MR we default to using that exclusively
even if FMR/FastReg is available. This causes an interop issue if the
active side of a connection request has a higher fragment count than
the passive side because FMR/FastReg may be needed to map the higher
fragment count. We should change the default map_on_demand to 1 so
that FMR/FastReg is used by default. map_on_demand can still be set
to 0 if needed.

Lustre-change: https://review.whamcloud.com/45431
Lustre-commit: 21fdd616bd4784e4e3571294ba39f00b24a25806

Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I76010a905f151efbb0b109ae6f5fba6fb7ce1956
Reviewed-on: https://review.whamcloud.com/46807
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-15092 o2iblnd: Fix logic for unaligned transfer
Chris Horn [Thu, 16 Sep 2021 17:12:38 +0000 (12:12 -0500)]
LU-15092 o2iblnd: Fix logic for unaligned transfer

It's possible for there to be an offset for the first page of a
transfer. However, there are two bugs with this code in o2iblnd.

The first is that this use-case will require LNET_MAX_IOV + 1 local
RDMA fragments, but we do not specify the correct corresponding values
for the max page list to ib_alloc_fast_reg_page_list(),
ib_alloc_fast_reg_mr(), etc.

The second issue is that the logic in kiblnd_setup_rd_kiov() attempts
to obtain one more scatterlist entry than is actually needed. This
causes the transfer to fail with -EFAULT.
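
The extra fragment follows from simple page arithmetic, sketched
below. pages_spanned() is a hypothetical helper, and the figures in
the note (4KiB pages, a 256-page maximum transfer) are assumptions
chosen for illustration rather than values taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of pages touched by "len" bytes that start "offset" bytes
 * into the first page.
 */
static size_t pages_spanned(size_t offset, size_t len, size_t page_size)
{
	assert(page_size > 0 && offset < page_size);
	if (len == 0)
		return 0;
	return (offset + len + page_size - 1) / page_size;
}
```

An aligned 1MiB transfer covers exactly 256 pages, but the same
transfer starting 512 bytes into its first page touches 257, i.e. one
more than the maximum page count.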

Lustre-change: https://review.whamcloud.com/45216
Lustre-commit: 23a2c92f203ff2f39bcc083e6b6220968c17b475

Test-Parameters: trivial
HPE-bug-id: LUS-10407
Fixes: d226464aca ("LU-8057 ko2iblnd: Replace sg++ with sg = sg_next(sg)")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ifb843f11ae34a99b7d8f93d94966e3dfa1ce90e5
Reviewed-on: https://review.whamcloud.com/46474
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-15094 o2iblnd: map_on_demand not needed for frag interop
Chris Horn [Wed, 29 Sep 2021 17:42:26 +0000 (12:42 -0500)]
LU-15094 o2iblnd: map_on_demand not needed for frag interop

The map_on_demand tunable is not used for setting max frags so don't
require that it be set in order to negotiate max frags.

Lustre-change: https://review.whamcloud.com/45215
Lustre-commit: 4e61a4aacdbc2376069d52d0f803a9f05315080f

HPE-bug-id: LUS-10488
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie89f1f035f4b05244feffb848c14582a8c7cf0e6
Reviewed-on: https://review.whamcloud.com/46453
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-15095 tests: skip lbug_on_grant_miscount on client
Vladimir Saveliev [Thu, 10 Mar 2022 20:00:25 +0000 (12:00 -0800)]
LU-15095 tests: skip lbug_on_grant_miscount on client

Do not try to specify the lbug_on_grant_miscount=1 module parameter
on client-only builds (el7.9, ppc64le, aarch64) as this is a server
parameter and will not be present if the client is built without
HAVE_SERVER_SUPPORT.  Otherwise, loading ptlrpc.ko will fail.

Lustre-change: https://review.whamcloud.com/46185
Lustre-commit: 49e29f38343ce0389df0aecf308b0986de94c029

Test-Parameters: trivial testlist=sanityn clientdistro=el7.9
Fixes: 2c787065441e ("LU-15095 target: lbug_on_grant_miscount module parameter")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
LU-15095 target: lbug_on_grant_miscount module parameter

Some tests have hit "lctl: error invoking upcall" when setting the
lbug_on_grant_miscount tunable parameter.  Instead, define the
lbug_on_grant_miscount flag as a ptlrpc module parameter,
similar to how it is done for ldiskfs_track_declares_assert.

Lustre-change: https://review.whamcloud.com/45521
Lustre-commit: 2c787065441ee60c6c163dc77851d0964f81a89c

Change-Id: I9cd0f9fa75b37539b23443bbcbb3445c87318ab1
Fixes: bb5d81ea95 ("LU-14543 target: prevent overflowing of tgd->tgd_tot_granted")
Test-Parameters: trivial
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/46768
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14543 target: prevent overflowing of tgd->tgd_tot_granted
Vladimir Saveliev [Wed, 9 Mar 2022 19:16:19 +0000 (11:16 -0800)]
LU-14543 target: prevent overflowing of tgd->tgd_tot_granted

If tgd->tgd_tot_granted < ted->ted_grant then there should not be:
   tgd->tgd_tot_granted -= ted->ted_grant;
which breaks tgd->tgd_tot_granted.
In case of obvious ted->ted_grant damage, recalculate
tgd->tgd_tot_granted using list of exports.

The same change is made for tgd->tgd_tot_dirty.
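
A minimal sketch of the guarded subtraction with a recalculation
fallback. export_grant and total_after_release() are hypothetical,
simplified stand-ins; the real code walks the target's export list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-export grant counter. */
struct export_grant {
	uint64_t grant;
};

/*
 * Remove export "idx" from the running total. If the export claims
 * more than the total, a plain subtraction would wrap around, so the
 * total is rebuilt from the remaining exports instead.
 */
static uint64_t total_after_release(uint64_t total,
				    const struct export_grant *exps,
				    size_t n, size_t idx)
{
	uint64_t sum = 0;
	size_t i;

	if (total >= exps[idx].grant)
		return total - exps[idx].grant;

	for (i = 0; i < n; i++)
		if (i != idx)
			sum += exps[i].grant;
	return sum;
}
```

With exports holding 100, 50 and 1000 and a damaged total of 150,
releasing the third export recalculates the total as 150 instead of
wrapping to a huge unsigned value.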

This patch also adds a sanity check for the exp->exp_target_data.ted_grant
increase in tgt_grant_alloc() to catch grant counting corruption as
soon as it happens. By default, detected corruption is reported
with CERROR(); if needed, this can be switched to LBUG() using lctl
set_param *.*.lbug_on_grant_miscount.
test-framework.sh:init_param_vars() enables LBUG().

Lustre-change: https://review.whamcloud.com/42129
Lustre-commit: bb5d81ea95502fb5709e176b561b70aa5280ee07

Fixes: af2d3ac30e ("LU-11939 tgt: Do not assert during grant cleanup")
Change-Id: I36ba7496f7b72b4881e98c06ec254a8eefd4c13f
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Cray-bug-id: LUS-9875
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46767
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-9704 grant: ignore grant info on read resend
Vladimir Saveliev [Wed, 9 Mar 2022 20:14:13 +0000 (12:14 -0800)]
LU-9704 grant: ignore grant info on read resend

The following scenario causes a message like "claims 28672 GRANT, real
grant 0" to appear:

 1. the client owns X grants and runs rpcs to shrink part of those.
 2. the server fails over so that the shrink rpc is to be resent.
 3. on the client reconnect, server and client sync on the initial
 amount of grants for the client.
 4. the shrink rpc is resent; if server disk space is sufficient, the
 shrink does not happen and the client adds the amount of grants it
 was going to shrink to its new initial amount of grants. Now the
 client thinks that it owns more grants than it does from the server's
 point of view.
 5. the client consumes grants and sends rpcs to the server. The server
 avoids allocating new grants for the client if the current amount of
 grant is big enough:
static long tgt_grant_alloc(struct obd_export *exp, u64 curgrant,
...
        if (curgrant >= want || curgrant >= ted->ted_grant + chunk)
                RETURN(0);
 6. the client continues consuming grants, which eventually leads to
 complaints like "claims 28672 GRANT, real grant 0".

In case of a resend of read and set_info:shrink RPCs, grant info should
be ignored as it was reset on reconnect.

Tests illustrating the issue are added.

Lustre-change: https://review.whamcloud.com/45371
Lustre-commit: 38c78ac2e390b30106f3e185d8c4d92b8cb19c2b

HPE-bug-id: LUS-7666
Change-Id: I8af1db287dc61c713e5439f4cf6bd652ce02c12c
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46770
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-15010 tests: skip sanity test_64g/64h for interop
Andreas Dilger [Sun, 20 Feb 2022 18:43:33 +0000 (11:43 -0700)]
LU-15010 tests: skip sanity test_64g/64h for interop

Sanity test_64g checks code that was only added in 2.14.56.

Lustre-change: https://review.whamcloud.com/46565
Lustre-commit: a57f7708c9e8ecfeca874cda9cebc6b7ced3a9bb

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=64
Fixes: 6e116213e3fd ("LU-15010 mdc: add support for grant shrink")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I339231f1b7890e8fffe7e079a052b15f54d4a050
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-on: https://review.whamcloud.com/46832

3 years agoLU-13717 sec: handle null algo for filename encryption
Sebastien Buisson [Thu, 25 Mar 2021 16:55:35 +0000 (17:55 +0100)]
LU-13717 sec: handle null algo for filename encryption

Encrypted files created with Lustre 2.14 have clear text file names.
With new code implementing filename encryption, newly created files
will have cipher text names, unless they are in an encrypted directory
created in Lustre 2.14.

So we need to make sure llcrypt library can properly handle the "null"
algorithm for client side filename encryption, which is basically a
no-op.
Handling this "null" algo for filename encryption will not be possible
with the in-kernel fscrypt library, so modify the behaviour of
configure to build with embedded llcrypt by default, and only build
against in-kernel fscrypt if explicitly specified via
--enable-crypto=in-kernel configure option.

The objective is to urge users to convert their encrypted directories
to the new fashion that encrypts filenames.
However, with the new code some operations on encrypted files created
with 2.14 might not be possible, like migrate, so expressly forbid
migrate on files that use the "null" algorithm for client side
filename encryption.

Finally, we revert commit 11fcbfa9de4a5170abc2c5df2a6e4e02f0f84268
("LU-12275 sec: force file name encryption policy to null") so that
new encrypted directories will enforce filename encryption.

Lustre-change: https://review.whamcloud.com/43388
Lustre-commit: f18c87cb5362496a4baadaa14265471c992ca06a

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I393945adc9b720a56544b5da0669cb2848507457
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45729
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-13717 sec: limit hard links to linkEA size for enc files
Sebastien Buisson [Mon, 19 Oct 2020 14:23:05 +0000 (23:23 +0900)]
LU-13717 sec: limit hard links to linkEA size for enc files

Some operations on encrypted files require identifying all names for
files having the same FID. For instance, for lookup, getattr or unlink
on encrypted files without the encryption key, we need to perform an
operation by FID instead of the actual name.
In order to make operations by FID unambiguous on server side, we
decide to limit the number of possible hard links for encrypted files,
to what the linkEA can contain.
Currently linkEA stores 4KiB of links, that is 14 NAME_MAX links, or
119 16-byte names.
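
The two figures can be reproduced with a short calculation. The
24-byte header and the 18-byte per-link overhead (a 2-byte record
length plus a 16-byte parent FID) are assumptions about the linkEA
layout and are not stated in this commit message; linkea_max_links()
is a hypothetical helper.

```c
#include <stddef.h>

enum {
	LINKEA_SIZE  = 4096,	/* total linkEA budget from the text above */
	LINKEA_HDR   = 24,	/* assumed header size */
	LINKEA_ENTRY = 18,	/* assumed per-link overhead before the name */
};

/* How many links with names of "name_len" bytes fit in one linkEA. */
static size_t linkea_max_links(size_t name_len)
{
	return (LINKEA_SIZE - LINKEA_HDR) / (LINKEA_ENTRY + name_len);
}
```

Under these assumptions, 255-byte (NAME_MAX) names give
4072 / 273 = 14 links and 16-byte names give 4072 / 34 = 119, which
matches the numbers quoted above.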

Lustre-change: https://review.whamcloud.com/43387
Lustre-commit: 2ffb8f5726d27e7c2324a3e833491231fdaa3306

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I20a01874899f95b2ff61e05b2aa6851d135633e8
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45728
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4866 lipe: don't unmount an empty client list
John L. Hammond [Fri, 11 Mar 2022 15:14:52 +0000 (09:14 -0600)]
EX-4866 lipe: don't unmount an empty client list

In hot-pools.sh, if the node list is empty then don't try to unmount
it since that will only confuse things.

Fixes: e1da905b3884 EX-4866 lipe: don't unmount the local client
Test-Parameters: trivial testlist=hot-pools
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I1bf057beffd025a549524e85f02609be9611cccc
Reviewed-on: https://review.whamcloud.com/46800
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
3 years agoLU-15357 mdd: fix changelog context leak
Mikhail Pershin [Wed, 9 Mar 2022 21:17:37 +0000 (13:17 -0800)]
LU-15357 mdd: fix changelog context leak

The mdd_changelog_clear() shouldn't skip llog_ctxt_put()
in case of error.

Lustre-change: https://review.whamcloud.com/45831
Lustre-commit: d083c93c6fd9251d6637d33029049b1d27d2a20a

Fixes: 6b183927e1 (LU-14553 changelog: eliminate mdd_changelog_clear warning)
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9c9aa3ce0d11e8f67470b450d007f2a1081644c6
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46773
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years agoLU-14699 mdd: proactive changelog garbage collection
Mikhail Pershin [Wed, 9 Mar 2022 21:11:33 +0000 (13:11 -0800)]
LU-14699 mdd: proactive changelog garbage collection

Currently changelog starts garbage collection when a user
exceeds the maximum idle timeout. There is also a limit on the
number of idle records, but it is used only for old changelog users
which have no cur_time field, so it is effectively not used
nowadays. Another problem is that garbage collection is
started only when the changelog is almost full. This often leads
to situations where the changelog has very old users
staying much longer than the idle timeout and holding idle
records above the maximum limit, consuming space for nothing.

The patch reworks changelog GC in the following way:
- GC starts when the changelog is almost full (old way), or when
  either the idle time or the idle records limit is exceeded, or
  when (idle_time * idle_records) exceeds its limit as well.
  The last limit is calculated as:
  (idle_time * idle_records) / 84600 > (1 << 32) which is a
  reasonable heuristic for deciding if a user is "too idle"
  both when lots of records are being created quickly and when
  a user is idle for a very long time.
- to avoid processing all changelog users each time GC
  checks these conditions, both the least user record and time
  are tracked when changelog users are initialized or
  purged/canceled. Both values are stored as mdd_changelog
  fields mc_minrec and mc_mintime
- test 160g is changed to test the new approach when idle
  indexes are checked always along with idle time checks
- test 160s is added in sanity.sh to check heuristic approach
  with (idle_time * idle_records) value checking
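
The trigger conditions listed above can be sketched as one predicate.
gc_limits and gc_needed() are hypothetical names, the constants are
copied verbatim from the formula in this message, and the sketch
assumes the product fits in 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-changelog limits. */
struct gc_limits {
	uint64_t max_idle_time;		/* seconds */
	uint64_t max_idle_records;
};

/*
 * GC is needed when the changelog is almost full, when either
 * individual limit is exceeded, or when the combined
 * (idle_time * idle_records) heuristic fires.
 */
static bool gc_needed(bool almost_full, uint64_t idle_time,
		      uint64_t idle_records, const struct gc_limits *lim)
{
	if (almost_full)
		return true;
	if (idle_time > lim->max_idle_time ||
	    idle_records > lim->max_idle_records)
		return true;
	/* 1ULL keeps the shift in 64-bit arithmetic. */
	return (idle_time * idle_records) / 84600 > (1ULL << 32);
}
```

With both individual limits out of reach, a user idle for 84600
seconds triggers GC once it holds more than 2^32 idle records, and a
user with fewer records triggers it only after a proportionally
longer idle time.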

Lustre-change: https://review.whamcloud.com/45068
Lustre-commit: f60b307c5001e1d9035af61d2344af33d3ea0f85

Fixes: 3442db6faf68 ("LU-7340 mdd: changelogs garbage collection")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6028f3164212a2377a4fc45b60a826c64f859099
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14058 tests: handle more MDTs in sanity.sh
Andreas Dilger [Wed, 9 Mar 2022 20:58:08 +0000 (12:58 -0800)]
LU-14058 tests: handle more MDTs in sanity.sh

Fix up sanity.sh test_160 to handle configurations with more MDTs.
The "fnv_1a_64" hash is _relatively_ uniform and harder to break
under normal (ab)use, but it doesn't leave entries totally balanced.
Even the "all_chars" hash has a repeat MDT every handful of entries.
Since we need perfect balance across MDTs, use "lfs mkdir -i".

Fix a bug in test_160g that wasn't setting changelog_max_idle_indexes
properly for test systems with more than 4 MDTs.

Lustre-change: https://review.whamcloud.com/41485
Lustre-commit: 173bccd140adf69ce08c20810a69e783c8c12595

Test-Parameters: trivial testlist=sanity env=ONLY=160,230 mdtcount=8
Fixes: 489afbe69d5b ("LU-13321 tests: force even DNE file distribution")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I08bf2274a00fe1c6e52ec1a55f50dc8662d354a9
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46772
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years agoEX-4015 lipe: implement lazy size and blocks
John L. Hammond [Fri, 4 Mar 2022 14:17:48 +0000 (08:17 -0600)]
EX-4015 lipe: implement lazy size and blocks

If the current file is an OST object or not a regular file then we use
the size and blocks values from the inode. (But this is wrong for
striped directories.) If the current file is a regular MDT inode then
we check for strict or lazy SOM, followed by HSM released, followed by
unstriped.

Rename loa_attr_bits to loa_valid. Add new fields loa_noattr and
loa_error to distinguish among the cases of xattrs we haven't tried to
read, xattrs which are not set, and xattrs which could not be read (or
parsed).

Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I5b197dd7989a3f618c97c9025a4bd534dfe86152
Reviewed-on: https://review.whamcloud.com/46698
Tested-by: jenkins <devops@whamcloud.com>
3 years agoEX-4015 lipe: use direct IO
John L. Hammond [Tue, 1 Mar 2022 20:16:52 +0000 (14:16 -0600)]
EX-4015 lipe: use direct IO

Use direct IO by default in lipe_scan3. Retry ext2fs_open() without
EXT2_FLAG_DIRECT_IO if it fails. Add a --direct-io=0|1 option to
explicitly disable or enable direct IO.
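
The retry logic can be sketched independently of libext2fs. Here
OPEN_FLAG_DIRECT_IO, open_fn, open_rejecting_direct() and
open_with_fallback() are all hypothetical; the real code passes
EXT2_FLAG_DIRECT_IO to ext2fs_open().

```c
#include <errno.h>

/* Hypothetical flag standing in for EXT2_FLAG_DIRECT_IO. */
#define OPEN_FLAG_DIRECT_IO 0x1

/* Hypothetical opener type; a real caller would wrap ext2fs_open(). */
typedef int (*open_fn)(const char *device, int flags);

/* Fake opener for illustration: refuses any request for direct IO. */
static int open_rejecting_direct(const char *device, int flags)
{
	(void)device;
	return (flags & OPEN_FLAG_DIRECT_IO) ? -EINVAL : 0;
}

/*
 * Try with direct IO when requested; if that fails, retry once
 * without it. The flags that were finally used are reported back.
 */
static int open_with_fallback(open_fn do_open, const char *device,
			      int want_direct, int *used_flags)
{
	int flags = want_direct ? OPEN_FLAG_DIRECT_IO : 0;
	int rc = do_open(device, flags);

	if (rc != 0 && (flags & OPEN_FLAG_DIRECT_IO)) {
		flags &= ~OPEN_FLAG_DIRECT_IO;
		rc = do_open(device, flags);
	}
	*used_flags = flags;
	return rc;
}
```

Against the fake opener, a request for direct IO succeeds on the
second attempt and reports that the flag was dropped.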

Add an --io-options option to pass down ext2 io_manager options.

Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I25347949bbff9e697da26431807daf37cfb720fa
Reviewed-on: https://review.whamcloud.com/46682
Tested-by: jenkins <devops@whamcloud.com>
3 years agoEX-4539 lipe: add -xattr and -xattr-match
John L. Hammond [Mon, 28 Feb 2022 14:05:18 +0000 (08:05 -0600)]
EX-4539 lipe: add -xattr and -xattr-match

Add '-xattr NAME' and '-xattr-match NAME VALUE' tests to
lipe_find3. Add sanity-lipe-find3 test_111() to verify.

Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I32c077f99d495cd79e670efef59e4a2939af753f
Reviewed-on: https://review.whamcloud.com/46681
Tested-by: jenkins <devops@whamcloud.com>
3 years agoEX-4166 lipe: lamigo test coverage for OSS
Alexandre Ioffe [Thu, 3 Mar 2022 00:39:45 +0000 (16:39 -0800)]
EX-4166 lipe: lamigo test coverage for OSS

Add a test for lamigo ALR with multiple OSSes.
Add a debug trace point to report the update message
from ofd_access_log_reader.

Test-Parameters: trivial testlist=hot-pools
Test-Parameters: trivial testlist=hot-pools mdscount=2 osscount=2 mdtcount=2 ostcount=8
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iaae847190426ff34d8991e8a571b3e38616bc4c9
Reviewed-on: https://review.whamcloud.com/46686
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years agoEX-4866 lipe: don't unmount the local client
Alexandre Ioffe [Fri, 25 Feb 2022 22:00:41 +0000 (14:00 -0800)]
EX-4866 lipe: don't unmount the local client

Exclude local client from unmount list when test is
completed in hot-pools test framework.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I6b12269b6af3d3b5465645cbc007c9a5302f64a1
Reviewed-on: https://review.whamcloud.com/46671
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
3 years agoEX-4539 lipe: lipe_find3 print updates
John L. Hammond [Fri, 25 Feb 2022 23:12:54 +0000 (17:12 -0600)]
EX-4539 lipe: lipe_find3 print updates

Change -print-json to accept a comma separated list of
attributes. Optional attributes may be specified by placing them
inside brackets. For example, "lipe_find3 DEVICE -print-json
'uid,gid,som,[size,blocks]'" will only print JSON for inodes with a
valid UID, GID, and SoM attribute. If in addition the size and blocks
attributes of the inode are valid then they will be included in the
object as well. Support a pass-through --list-json-attrs option.
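
A minimal sketch of how such an attribute list could be split into
required and optional names. This is an illustration only, not the
lipe_find3 parser: the function name parse_json_attrs is hypothetical,
and the sketch assumes a single, non-nested bracket group.

```bash
# Hypothetical helper: split "a,b,[c,d]" into required and optional names.
parse_json_attrs() {
    local spec=$1 required="" optional="" in_opt=0 closes tok
    local -a toks
    # Split on commas only; read does not glob-expand the '[' tokens.
    IFS=, read -r -a toks <<< "$spec"
    for tok in "${toks[@]}"; do
        closes=0
        # A leading '[' opens the optional group.
        if [[ $tok == "["* ]]; then
            in_opt=1
            tok=${tok#"["}
        fi
        # A trailing ']' closes the group after this name.
        if [[ $tok == *"]" ]]; then
            closes=1
            tok=${tok%"]"}
        fi
        if (( in_opt )); then
            optional+="${optional:+ }$tok"
        else
            required+="${required:+ }$tok"
        fi
        if (( closes )); then
            in_opt=0
        fi
    done
    echo "required: $required"
    echo "optional: $optional"
}

parse_json_attrs 'uid,gid,som,[size,blocks]'
```

For the example list this separates uid, gid, and som (required) from
size and blocks (optional).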

Change the default action to print to relative path. Adjust
sanity-lipe-find3 accordingly.

Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Id380ca21e2b1aabf30f65fd3e14b7e2f7808d0a6
Reviewed-on: https://review.whamcloud.com/46630
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4015 lipe: add -blocks, -crtime, -mirror-count, -stripe-count
John L. Hammond [Thu, 24 Feb 2022 23:17:38 +0000 (17:17 -0600)]
EX-4015 lipe: add -blocks, -crtime, -mirror-count, -stripe-count

Add -blocks, -crtime, -mirror-count, and -stripe-count to
lipe_find3. Add (crtime), (lov-mirror-count), and (lov-stripe-count)
to lipe_scan3.

Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I5b4314a9621309b00453fea637329d3de442544a
Reviewed-on: https://review.whamcloud.com/46607
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4015 lipe: batching and threading improvements
John L. Hammond [Thu, 24 Feb 2022 16:56:59 +0000 (10:56 -0600)]
EX-4015 lipe: batching and threading improvements

Remove the current group descriptor mutex from struct
scan_control. Use __ATOMIC_RELAXED fetch and add to allocate the next
batch of groups. Report the correct start group of the batch in
debugging output and remove a redundant batch debug message.

Use atomic loads and stores for the ti_should_stop member which is
responsible for lipe-scan-break. Check if the current thread should
stop in the outer loop of ls3_scan_thread_start_scm() as well as the
inner loop of ldiskfs_scan_groups().

Reduce the default scanning thread count from _SC_NPROCESSORS_ONLN / 2
to _SC_NPROCESSORS_ONLN / 4.
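
The new default can be approximated from the shell as below. This is
a sketch under assumptions: default_scan_threads is a hypothetical
name, and the floor of one thread is an assumption that the commit
message does not state.

```bash
# Hypothetical helper: default thread count from the online CPU count.
default_scan_threads() {
    # Use the first argument if given, else ask the system.
    local ncpu=${1:-$(getconf _NPROCESSORS_ONLN)}
    local n=$(( ncpu / 4 ))
    # Assumption: never go below one thread (not stated in the commit).
    if (( n < 1 )); then
        n=1
    fi
    echo "$n"
}

default_scan_threads 16    # 16 online CPUs give 4 threads
default_scan_threads       # uses the CPU count of this machine
```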

Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ic99c27504333f1d63a689e091d857e44062ef584
Reviewed-on: https://review.whamcloud.com/46605
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4015 lipe: add lipe-scan RPM
John L. Hammond [Mon, 21 Feb 2022 14:31:20 +0000 (08:31 -0600)]
EX-4015 lipe: add lipe-scan RPM

Adding new dependencies to existing EXAScaler RPMs may create
headaches when distributing hotfixes to existing installs. So move
lipe_find3 and lipe_scan3 to a new RPM (lipe-scan). This also has the
benefit of explicitly severing the new scanning tools from any python2
RPM or pip dependencies.

Compile fid.scm and find.scm to (%site-ccache-dir)/lipe/.

Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ifecb5ab1f399ba9be8cb395ded29d6394b13dc86
Reviewed-on: https://review.whamcloud.com/46572
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4539 lipe: remove -coverage
John L. Hammond [Tue, 15 Feb 2022 17:39:23 +0000 (11:39 -0600)]
EX-4539 lipe: remove -coverage

Remove -coverage from CFLAGS for lipe_scan3 and lipe_find3.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8be003650574104d5eaa8298043ba789e6464fde
Reviewed-on: https://review.whamcloud.com/46532
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4539 lipe: add -pool to lipe_find3
John L. Hammond [Tue, 22 Feb 2022 15:43:28 +0000 (09:43 -0600)]
EX-4539 lipe: add -pool to lipe_find3

Add a -pool test to lipe_find3. Add sanity-lipe-find3 test_110() to
verify.

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I9649d7f80431d22223da17372ec4d64fa6ca2f37
Reviewed-on: https://review.whamcloud.com/46584
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4539 lipe: add -perm to lipe_find3
John L. Hammond [Mon, 21 Feb 2022 20:53:42 +0000 (14:53 -0600)]
EX-4539 lipe: add -perm to lipe_find3

Fill in the -perm test in lipe_find3. Populate sanity-lipe-find3
test_101() to verify.

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib201503247101619416c39ae97f5068230441863
Reviewed-on: https://review.whamcloud.com/46576
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4539 lipe: add lipe_find3
John L. Hammond [Tue, 8 Feb 2022 14:20:11 +0000 (08:20 -0600)]
EX-4539 lipe: add lipe_find3

Add a lipe_find3 wrapper around the lipe_scan3 scanner and test script
sanity-lipe-find3.sh.

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I2259170e8b71a94394009aeaf9878a17c2a3fa6d
Reviewed-on: https://review.whamcloud.com/46417
Tested-by: jenkins <devops@whamcloud.com>
3 years agoEX-4015 lipe: add make-prompt-tag hack
John L. Hammond [Mon, 21 Feb 2022 14:20:27 +0000 (08:20 -0600)]
EX-4015 lipe: add make-prompt-tag hack

Hack guile (make-prompt-tag ...) to return a list instead of a
gensym. This reduces catch overhead in multi-threaded code and is what
guile does eventually (see guile commit 283ab48d3f). Add this along
with some comments to ls3_scm_init() and rename that function to
ls3_module_init().

Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I2ddd324b290ee4e985bba171681927b9434bbcc5
Reviewed-on: https://review.whamcloud.com/46571
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-15559 tests: add do_node_vp() and do_facet_vp()
John L. Hammond [Wed, 16 Feb 2022 17:54:26 +0000 (11:54 -0600)]
LU-15559 tests: add do_node_vp() and do_facet_vp()

Add new test-framework functions (do_node_vp() and do_facet_vp())
which carefully escape and quote command lines for execution on the
local or remote node. Add sanityn test_0 to verify.
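
One way to get this kind of quoting in bash is printf '%q'. The sketch
below is not the test-framework implementation of do_node_vp();
quote_cmdline and run_quoted are hypothetical names, and a local
"bash -c" stands in for the remote shell.

```bash
# Hypothetical helper: quote every argument so that a second shell
# re-parses the command line into exactly the same words.
quote_cmdline() {
    local out
    printf -v out '%q ' "$@"
    echo "${out% }"
}

# Stand-in for remote execution: hand the quoted line to a new shell.
run_quoted() {
    bash -c "$(quote_cmdline "$@")"
}

# Spaces, '$', and quotes inside arguments survive the extra parse.
run_quoted printf '%s|' 'a b' 'c$d' "e'f"
echo
```

Without the quoting step, the second shell would split 'a b' into two
words and try to expand $d.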

Lustre-change: https://review.whamcloud.com/46535

Test-Parameters: trivial env=ONLY="0" testlist=sanityn
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ic491b0148e6ef11ecd0b3ccce983afcf4d1300e5
Reviewed-on: https://review.whamcloud.com/46537
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4015 lipe: add lma, filter_fid, and hsm json attributes
John L. Hammond [Thu, 17 Feb 2022 20:56:36 +0000 (14:56 -0600)]
EX-4015 lipe: add lma, filter_fid, and hsm json attributes

Add "lma", "filter_fid", and "hsm" json attributes. Use struct
lustre_mdt_attrs rather than broken out fields in loa. Add
sanity-lipe-scan3 tests 111, 112, 113 to "verify".

Combine init_lipe_scan3_env and init_lipe_scan3_env_file.

Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I1856a896d192d08f9e16b9ac764030907256f79c
Reviewed-on: https://review.whamcloud.com/46547
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4015 lipe: cache layout in loa
John L. Hammond [Thu, 17 Feb 2022 19:51:32 +0000 (13:51 -0600)]
EX-4015 lipe: cache layout in loa

In ldiskfs_read_attr_lov(), cache the decoded llapi_layout() in the
current object attrs. Then reuse this layout in lov-pools. Add
lov-ost-indexes to return a list of all object OST indexes for the
current file.

Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I3451857fc25f1f9507b6e185bad39fcb3f0e6f22
Reviewed-on: https://review.whamcloud.com/46546
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4015 lipe: reduce ls3_object_attrs size
John L. Hammond [Thu, 17 Feb 2022 15:52:59 +0000 (09:52 -0600)]
EX-4015 lipe: reduce ls3_object_attrs size

Reduce the size of struct ls3_object_attrs from 131KB to 360B by heap
allocating the link and lmv xattr buffers.

Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I5b210cb46eb027dbc922675deb1231a544b93d6a
Reviewed-on: https://review.whamcloud.com/46544
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4015 lipe: define lipe scheme module
John L. Hammond [Mon, 14 Feb 2022 18:14:41 +0000 (12:14 -0600)]
EX-4015 lipe: define lipe scheme module

Avoid spurious "possibly undefined symbol" warnings from the guile
compiler by placing all of the snarfed definitions into a "lipe"
module. Add the fid accessors to a "lipe fid" module.

Test-Parameters: testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ifbfee81422b1a3df22ee23f1945577c29e485aec
Reviewed-on: https://review.whamcloud.com/46525
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4015 lipe: handle trusted xattrs uniformly
John L. Hammond [Fri, 11 Feb 2022 18:11:28 +0000 (12:11 -0600)]
EX-4015 lipe: handle trusted xattrs uniformly

Add a wrapper (ldiskfs_trusted_xattr_get()) around ext2fs_attr_get()
to do uniform error messages and error handling.

Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I5ad82a56b7729354364afa594b3d8d9ee83a4b7f
Reviewed-on: https://review.whamcloud.com/46513
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4015 lipe: add fid2path cache to lipe_scan3
John L. Hammond [Fri, 11 Feb 2022 17:05:15 +0000 (11:05 -0600)]
EX-4015 lipe: add fid2path cache to lipe_scan3

Add a thread local directory fid2path cache to lipe_scan3. Without the
cache, a single scanning thread could expect to do about 3K fid2path
operations per second. After the cache the rate improves to about
70K. We set the max cache size to 1024 FIDs and use LRU to reclaim
slots. Based on this a full cache will use about 4MB of memory per
thread.
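
The figures above can be checked with a little arithmetic. The slot
size of 4096 bytes is an assumption (one PATH_MAX-sized path buffer
per cached FID) chosen because it reproduces the quoted 4MB; the
commit message does not give the per-slot size, and cache_bytes is a
hypothetical name.

```bash
# Per-thread cache footprint: slots * bytes per slot.
cache_bytes() {
    local slots=$1 slot_bytes=$2
    echo $(( slots * slot_bytes ))
}

slots=1024          # max cached FIDs per thread (from the commit message)
slot_bytes=4096     # assumption: one PATH_MAX-sized buffer per slot
echo "$(( $(cache_bytes "$slots" "$slot_bytes") / 1024 / 1024 )) MiB per thread"

# Rate improvement quoted above: about 3K -> about 70K fid2path ops/s.
echo "speedup: about $(( 70000 / 3000 ))x"
```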

Test-Parameters: testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8a022665de78e6b599f2b4c4f1e2b7400d4d8ffe
Reviewed-on: https://review.whamcloud.com/46509
Tested-by: jenkins <devops@whamcloud.com>
3 years agoEX-4015 lipe: add lipe_scan3
John L. Hammond [Thu, 3 Feb 2022 18:21:58 +0000 (12:21 -0600)]
EX-4015 lipe: add lipe_scan3

Add a guile embedded lipe scanner (lipe_scan3) and test script
sanity-lipe-scan3.sh.

Test-Parameters: testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I059fb4044db5baff76a04247fb8e3cbec82e5448
Reviewed-on: https://review.whamcloud.com/46416
Tested-by: jenkins <devops@whamcloud.com>
3 years agoLU-15068 ptlrpc: Do not unlink difficult reply until sent
Chris Horn [Tue, 5 Oct 2021 19:11:29 +0000 (14:11 -0500)]
LU-15068 ptlrpc: Do not unlink difficult reply until sent

If a difficult reply is queued in LNet, or the PUT for it is
otherwise delayed, then it is possible for the commit callback
to unlink the reply MD which will abort the send. This results in
client hitting "slow reply" timeout for the associated RPC and
an unnecessary reconnect (and possibly resend).

This patch replaces the rs_on_net flag with rs_sent and rs_unlinked.
These flags indicate whether the send event for the reply MD has
been generated, and whether the MD has been unlinked, respectively.

If rs_sent is set, but rs_unlinked has not been set, then ptlrpc_hr
is free to unlink the reply MD as a result of the commit callback.
The reply-ack will simply be dropped by the server.

If ptlrpc_hr is processing the reply because of commit callback, and
rs_sent has not been set, then ptlrpc_hr will not unlink the reply
MD. This means that the reply_out_callback must also be modified to
check for this case when the send event occurs. Otherwise, if the ACK
never arrives from the client, then the MD would never be unlinked.
Thus when the send event occurs, and rs_handled is set, the
reply_out_callback will schedule the reply for handling by ptlrpc_hr.

Lustre-change: https://review.whamcloud.com/45138
Lustre-commit: 5c156b48425aae245537aaf10229734166463347

HPE-bug-id: LUS-10505
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib8f4853c7ab35d72624fce7ee3fba9e59a746e1f
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46751
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14865 utils: llog_reader.c printf type mismatch
Gian-Carlo DeFazio [Wed, 9 Mar 2022 07:23:30 +0000 (23:23 -0800)]
LU-14865 utils: llog_reader.c printf type mismatch

Add (unsigned long long) cast to results of
__le64_to_cpu so that it matches the formatting (%llu)
of the enclosing printf call.

Build log message:
"llog_reader.c:887:9: error: format '%llu' expects
argument of type 'long long unsigned int', but
argument 3 has type '__u64' [-Werror=format=]"

Lustre-change: https://review.whamcloud.com/44346
Lustre-commit: 14b8276e06d6f4e3bfe785df1165458555e406f3

Test-Parameters: trivial
Fixes: 9962d6f84db5 LU-14617 utils: llog_reader updatelog support
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I9549e0a0bd21727dfcc42992b693bc39a779e1a1
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46757
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14617 utils: llog_reader updatelog support
Alexander Boyko [Wed, 9 Mar 2022 07:14:44 +0000 (23:14 -0800)]
LU-14617 utils: llog_reader updatelog support

The patch adds printing of UPDATE_REC records to llog_reader. It is
useful for updatelog analysis. Here is an example of a record:

 [0x50001a21b:0x1233d:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
 [0x50001a211:0x475:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
 [0x3800182e3:0x475:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
 [0x200032c9a:0x245:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
 [0x200000001:0x15:0x0] type:write/12 params:2 p_0:3 p_1:4
 p_0 - 12/trusted.lov
 p_1 - 0/
 p_2 - 25972/\x0100000000000000000000000000000000000000000002000...
 p_3 - 25974/\x0800000000000000P\xD1AB006x0000000400EC^\x000000...
 p_4 - 1/
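
For post-processing output like the above, a record line can be picked
apart with a bash regex. This is a sketch only: parse_update_rec and
the output labels (fid, op, opcode, params) are hypothetical, and
reading the number after the slash as a numeric operation code is an
assumption based on the sample.

```bash
# Hypothetical helper: extract fields from one UPDATE_REC line.
parse_update_rec() {
    local line=$1
    local re='^ *\[([^]]+)\] type:([a-z_]+)/([0-9]+) params:([0-9]+)'
    if [[ $line =~ $re ]]; then
        echo "fid=${BASH_REMATCH[1]} op=${BASH_REMATCH[2]}" \
             "opcode=${BASH_REMATCH[3]} params=${BASH_REMATCH[4]}"
    else
        # Not a record line (for example a "p_N - ..." parameter line).
        return 1
    fi
}

parse_update_rec ' [0x200000001:0x15:0x0] type:write/12 params:2 p_0:3 p_1:4'
```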

llog processing logic is based on an incrementing record index;
the fix adds checks for it. It also prints more info from the header
and drops the useless "Bit X not set" messages.

Lustre-change: https://review.whamcloud.com/43343
Lustre-commit: 9962d6f84db5fd587bbe13640a9361c2872f3728

Test-Parameters: trivial
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Id50de15040526dc07ae708ac5db046832706be31
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46756
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14876 out: don't connect to busy MDS-MDS export
Mikhail Pershin [Wed, 9 Mar 2022 08:45:38 +0000 (00:45 -0800)]
LU-14876 out: don't connect to busy MDS-MDS export

The MDS-MDS connection is missing a check for busy requests upon
reconnect, so a resend can be executed concurrently with the
original request.

- in ptlrpc_server_check_resend_in_progress() remove exception
  for bulk requests, they can be compared by XID nowadays.
  This prevents OUT requests vs resent execution as well.
- fix messages in target_handle_connect() to report correct
  information about connection details
- in out_handle() check for last_xid only once per OUT_UPDATE
- test 110m is added to recovery-small to reproduce the issue

Lustre-change: https://review.whamcloud.com/44390
Lustre-commit: 301d76a71176c186129231ddd1323bae21100165

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I2ad183674d59a2cdeab0037bd8551c607b10ffeb
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46762
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4583 lipe: show all lpcc information in 'lpcc status' command
Lei Feng [Wed, 19 Jan 2022 02:02:20 +0000 (21:02 -0500)]
EX-4583 lipe: show all lpcc information in 'lpcc status' command

Collect lpcc config, lpcc_purge stats, and lustre stats related to
lpcc together and show them in the 'lpcc status' command. 'lpcc status-all'
is an alias of 'lpcc status'. The output of 'lpcc status' looks like:
{
    "/mnt/lustre": {
        "pcc": [
            {
                "mount": "/mnt/lustre",
                "cache": "/mnt/pcc",
                ...
            },
            {
                "mount": "/mnt/lustre",
                "cache": "/mnt/pcc2",
                ...
            }
        ],
        "fs_stats": {
        }
    }
}

Change-Id: I032763fb3b45646330b13f5cef34ce8658bddfe4
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc
Reviewed-on: https://review.whamcloud.com/46191
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-4433 pcc: add some statistics data
Lei Feng [Thu, 6 Jan 2022 01:34:15 +0000 (20:34 -0500)]
EX-4433 pcc: add some statistics data

Add statistics of the number and total size of pcc attached files
and pcc hit files.

Change-Id: Ib0e429c636298d4c6ff06d84a416073895b86184
Signed-off-by: Lei Feng <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45976
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoEX-4893 wrong dependencies for lustre-client-modules deb package
Alex Deiter [Thu, 24 Feb 2022 14:21:01 +0000 (14:21 +0000)]
EX-4893 wrong dependencies for lustre-client-modules deb package

Fixed dependencies for DKMS deb package:
- added autoconf, automake and libtool
- added bison and flex
- added required dev packages
- added linux-base and linux-image
- added python3-distutils-extra to fix build on Ubuntu 16.04

Change-Id: Ic1d05155cd8ad056dece1d22d0f040695d038652
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/46604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-15142 lctl: fixes for set_param -P and llog_print
Mikhail Pershin [Thu, 14 Oct 2021 14:16:21 +0000 (17:16 +0300)]
LU-15142 lctl: fixes for set_param -P and llog_print

- properly handle permanent param deletion
- don't print skipped parameters in llog_print output
- add --raw option to llog_print to output all entries
  including markers

Lustre-change: https://review.whamcloud.com/45332
Lustre-commit: 2a5b50d207173ca1ac71be8dfc39f98a2773bc3a

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id93a206a255dc885343efa293e1ee2672493e5e5
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13107 utils: clean up lctl command usage
Andreas Dilger [Sat, 28 Dec 2019 09:42:54 +0000 (02:42 -0700)]
LU-13107 utils: clean up lctl command usage

The lctl usage is confusing because it lists a number of valid
commands after "testing (DANGEROUS)", such as LFSCK and llog.

Move the useful commands before the "testing" section so it is
not mis-interpreted as all following commands are dangerous.
Group some other commands together with more related commands,
rather than whatever order they happened to be implemented in.

Remove function prototypes for commands that no longer exist.

Lustre-change: https://review.whamcloud.com/37108
Lustre-commit: b0efebdaef52d8ac9b02857166ceb00079612ebc

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I469f9c92953762cc46a68e44238c4b67ebacab07
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-on: https://review.whamcloud.com/46637
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
3 years agoLU-15316 tests: use integers in sanity test_255a
Andreas Dilger [Fri, 28 Jan 2022 05:51:24 +0000 (22:51 -0700)]
LU-15316 tests: use integers in sanity test_255a

The [[ ... > ... ]] operator doesn't really compare floats, it
compares strings.  That works as expected if the strings are
the same length, but fails for comparisons like [[ 32 > 123 ]].
Use (( ... > ... )) for comparisons, and only use integer values.
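
The difference is easy to reproduce in bash. The helper name
compare_both is hypothetical and exists only for this illustration:

```bash
# Hypothetical helper: compare two values both ways and report.
compare_both() {
    local a=$1 b=$2 s=no i=no
    # [[ a > b ]] is a lexicographic string comparison.
    if [[ $a > $b ]]; then
        s=yes
    fi
    # (( a > b )) is an integer comparison.
    if (( a > b )); then
        i=yes
    fi
    echo "string:$s integer:$i"
}

compare_both 45 12     # same length: both comparisons agree
compare_both 32 123    # different length: "32" sorts after "123"
```

Note that (( ... )) only handles integers, so a value such as 1.5 has
to be scaled or truncated to an integer before it is compared.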

This test has been failing intermittently forever, but the error
was ignored because of running in a VM.

Lustre-change: https://review.whamcloud.com/46350
Lustre-commit: TBD (from a96a4a5894bef714b19086fa09918080f05a7674)

Test-Parameters: trivial
Fixes: f3b8f3fad502 ("tests: fix float comparison in sanity test_255a")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6787082cd579ae3f1bdd43222a739c939d3ebbe5
Reviewed-on: https://review.whamcloud.com/46618
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4889 configure script does not check for required build tools
Alex Deiter [Thu, 24 Feb 2022 12:15:32 +0000 (12:15 +0000)]
EX-4889 configure script does not check for required build tools

- added check for flex and bison
- added requirement for build kernel modules

Change-Id: I4f4f19ea44f3cd8f69482d950970bf701e81f7ec
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/46602
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoRM-620 build: New tag 2.14.0-ddn38
Andreas Dilger [Wed, 9 Mar 2022 18:04:23 +0000 (11:04 -0700)]
RM-620 build: New tag 2.14.0-ddn38

New tag 2.14.0-ddn38

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id8a05c166302a48b4553ec76922b70c665763277

3 years agoLU-15340 llite: Delay dput in ll_dirty_page_discard_warn
Oleg Drokin [Wed, 8 Dec 2021 04:30:06 +0000 (23:30 -0500)]
LU-15340 llite: Delay dput in ll_dirty_page_discard_warn

Otherwise ours can be the final dput, and we then need to wait for
pages to clear. That is bad because this is called from ptlrpcd,
which is not supposed to block, especially for network traffic:
it can cause livelocks if ptlrpcd happens to be needed to kill
the very same RPC we are waiting on.

Additionally pass in the inode from IO since the page
we are using might come from directio and that is
probably not even a valid inode.

Lustre-change: https://review.whamcloud.com/45784
Lustre-commit: a1d75780ba19cfca53cbacf0d38e8d7df540b209

Fixes: 624a3ac23393 ("LU-921 llite: warning in case of discarding dirty pages")
Change-Id: Ie2f1a34047145202c11a4e1a0b18b2e01d9e4601
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/46635
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoEX-4408 lipe: collect some statistic data with lpcc_purge
Lei Feng [Fri, 24 Dec 2021 06:10:47 +0000 (01:10 -0500)]
EX-4408 lipe: collect some statistic data with lpcc_purge

Collect the number of cached files, min/max/avg file size,
min/max/avg age of files in LPCC. Scan cache device forcefully
even if it is not full enough to collect the statistics data
in time.

Change-Id: Id716d4689c83ecc5754e41734e44e7c051d36a8e
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc
Reviewed-on: https://review.whamcloud.com/45937
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-15381 hsm: update size upon completion of data version
Qian Yingjin [Fri, 17 Dec 2021 08:53:37 +0000 (16:53 +0800)]
LU-15381 hsm: update size upon completion of data version

During testing we found that an HSM restore followed by an HSM
release wrongly sets the file size to 0.
The reason is that the file size and blocks information obtained
via @ll_merger_attr() is incorrect.
The data version operation flushes dirty pages from all clients,
so the size and blocks information returned from the Lustre OST
is correct.
In this patch, we update the size and blocks attributes of a file
upon completion of the data version operation.
This way, HSM release sets the size and blocks information
correctly after the data version ioctl operation.

Add sanity-hsm test_261.

lustre-change: https://review.whamcloud.com/45935
lustre-commit: dd3b5601ec6905b00d07cbcb8c139c46dd555b3b

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ifdbf6b58ecd00dc9677a2328438ef68529b72882
Reviewed-on: https://review.whamcloud.com/45935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46336
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14124 target: set OBD_MD_FLGRANT in read's reply
Vladimir Saveliev [Wed, 20 Oct 2021 10:32:11 +0000 (13:32 +0300)]
LU-14124 target: set OBD_MD_FLGRANT in read's reply

If tgt_grant_shrink() decides to not shrink grants - a client is
supposed to restore its cl_grant_avail in osc_update_grant(). In case
of read OBD_MD_FLGRANT is not set on reply's body->oa.o_valid, so
osc_update_grant() misses the cl_grant_avail update. As a result, the
server keeps thinking that the client has a lot of grants while the
client thinks that it is badly short of grants. That may lead to
performance degradation.

A test to illustrate the issue is included.

Lustre-change: https://review.whamcloud.com/43375
Lustre-commit: 4894683342d77964daeded9fbc608fc46aa479ee

Change-Id: Ibe7ce0af5701226c8be3ae3f9ad57c354791fa0f
HPE-bug-id: LUS-9943
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46468
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-12807 tests: fix intermittent runtests failure
Andreas Dilger [Wed, 11 Aug 2021 20:49:19 +0000 (14:49 -0600)]
LU-12807 tests: fix intermittent runtests failure

Occasional runtests failures are seen in full testing on ldiskfs.
Increase the llog space limit to 72KB from 50KB due to seeing
regular failures in the 52/64KB range.

Lustre-change: https://review.whamcloud.com/44614
Lustre-commit: 14d07b623731233a62a8acd021c8ccdcb2705371

Test-Parameters: trivial testlist=runtests
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6e272fe9fec21a650110a42efe31a1dc48e35854
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46463
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
3 years agoLU-12752 mdt: commitrw_write() - check dying object under lock
Vladimir Saveliev [Mon, 1 Mar 2021 08:52:51 +0000 (11:52 +0300)]
LU-12752 mdt: commitrw_write() - check dying object under lock

If a process writes to an unlinked file, the following race between
mdt_commitrw_write() and mdd_close() may occur because
mdt_commitrw_write() checks whether an object is dying without a lock:

mdt_commitrw_write() checks lu_object_is_dying(&mo->mot_header) and it
is not dying yet

mdd_close() interposes and destroys the object via
  mdo_destroy()
    lod_destroy()
      lod_sub_destroy()
        osd_destroy()
          obj->oo_destroyed = 1;

mdt_commitrw_write() continues, locks the object and returns ENOENT
from

  dt_attr_get()
    osd_attr_get()
      if (unlikely(obj->oo_destroyed))
        return -ENOENT;

If the file is built of DoM and raid components, ll_delete_inode() calls
cl_sync_file_range() which is to iterate over both mdt and raid
components via mdc_io_fsync_start() and osc_io_fsync_start().  As
mdc_io_fsync_start() fails with -ENOENT due to failed write rpc,
osc_io_fsync_start() does not get called. Then
truncate_inode_pages_final() finds not-discarded pages and fails with:

  (osc_page.c:183:osc_page_delete()) Trying to teardown failed: -16
  (osc_page.c:184:osc_page_delete()) ASSERTION( 0 ) failed:
  (osc_page.c:184:osc_page_delete()) LBUG

Test to illustrate the issue is added.

The fix is to call lu_object_is_dying() under object lock.

Lustre-change: https://review.whamcloud.com/41797
Lustre-commit: d48a0ebb5a8d5d49684325434b503e8aab085397

Change-Id: I463c8a6f85d4f5fd934b167c6194f50ae9d4b7d4
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46612
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13055 libcfs: allow comma-separated masks
Andreas Dilger [Wed, 9 Mar 2022 08:34:43 +0000 (00:34 -0800)]
LU-13055 libcfs: allow comma-separated masks

For debug and changelog mask names, allow a comma-separated list
of names to be given, so that the space-separated list does not
need to be quoted for use.
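
A sketch of a splitter that accepts either separator. It is not the
libcfs parser, which is C code; split_masks is a hypothetical name,
and "quota", "trace", and "inode" are used here only as sample mask
names.

```bash
# Hypothetical helper: split a mask list on commas and/or spaces,
# printing one name per line and skipping empty fields.
split_masks() {
    local list=$1 name
    local -a names
    IFS=', ' read -r -a names <<< "$list"
    for name in "${names[@]}"; do
        if [[ -n $name ]]; then
            printf '%s\n' "$name"
        fi
    done
}

split_masks 'quota trace'    # space-separated: must be quoted by the caller
split_masks quota,trace      # comma-separated: a single word, no quoting
```

The second call shows the benefit described above: the comma form is
one shell word, so it passes through command lines unquoted.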

Change sanity-quota to use a comma-separated list to verify it works.

Fix a couple of test cases where the debug parameter is set and
printed overly verbosely during tests.

Lustre-change: https://review.whamcloud.com/43741
Lustre-commit: 6b6fde1026311a28595ea43af56392ca6ad24d79

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icf1e3ebc74f0e48b38a65486b2275ec4c33ebbe5
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46759
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-15218 quota: delete unused quota ID
Hongchao Zhang [Fri, 21 Jan 2022 00:43:56 +0000 (08:43 +0800)]
LU-15218 quota: delete unused quota ID

Add lfs option '--delete' to delete unused quota ID.

Lustre-change: https://review.whamcloud.com/45548
Lustre-commit: 78be823f33396819724330d7154f054c52e11944

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I0d8e6b61dc23c7b22b6054bcced087b8dc94a277
Reviewed-on: https://review.whamcloud.com/46610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-13799 llite: Move free user pages
Patrick Farrell [Wed, 19 Jan 2022 15:46:47 +0000 (10:46 -0500)]
LU-13799 llite: Move free user pages

It is incorrect to release our reference on the user pages
before we're done with them.  We need to keep it until the
i/o is complete; otherwise we access them after releasing
our reference.  This has not caused any known bugs so far,
but it's still wrong.

So only drop these references when we free the aio struct,
which is only freed once i/o is complete.
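
The lifetime rule can be pictured with a small reference-counting model.
This is a sketch under stated assumptions, not the llite code: UserPage,
Aio and their methods are hypothetical, and only the name
release_user_pages is taken from this commit message.

```python
class UserPage:
    """Hypothetical user page holding one reference taken for the i/o."""

    def __init__(self):
        self.refs = 1

    def read(self):
        # Touching a page whose reference is gone is the bug being avoided.
        if self.refs <= 0:
            raise RuntimeError("page accessed after reference was released")
        return b"data"


def release_user_pages(pages):
    """Drop our reference on each page; this does not free the pages."""
    for page in pages:
        page.refs -= 1


class Aio:
    """Hypothetical aio struct that owns the page references."""

    def __init__(self, pages):
        self.pages = pages
        self.completed = False

    def complete(self):
        # The i/o still touches the pages here, so references must be held.
        for page in self.pages:
            page.read()
        self.completed = True

    def free(self):
        # References are dropped only here, after the i/o has completed.
        if not self.completed:
            raise RuntimeError("aio freed before i/o completed")
        release_user_pages(self.pages)
```

Tying the release to Aio.free() means no code path can drop the
references while complete() may still touch the pages.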

Also rename free_user_pages to release_user_pages, because
it does not free them - it just releases our reference.

This also helps performance by moving free_user_pages to
the daemon threads.  This is a 5-10% boost.

This patch reduces i/o time in ms/GiB by:
Write: 18 ms/GiB
Read: 19 ms/GiB

Totals:
Write: 180 ms/GiB
Read: 178 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     5183 MiB/s
read      5201 MiB/s

Plus this patch:
write     5702 MiB/s
read      5756 MiB/s
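
The ms/GiB figures quoted earlier follow from these bandwidth numbers. A
short worked check, assuming 1 GiB = 1024 MiB (ms_per_gib is a
hypothetical helper name used only for this example):

```python
def ms_per_gib(mib_per_s):
    """Convert a bandwidth in MiB/s to a transfer time in ms per GiB."""
    return 1024 / mib_per_s * 1000


# Bandwidths copied from the IOR results above.
write_before, write_after = 5183, 5702
read_before, read_after = 5201, 5756

write_total = ms_per_gib(write_after)
read_total = ms_per_gib(read_after)
write_saved = ms_per_gib(write_before) - write_total
read_saved = ms_per_gib(read_before) - read_total

print(f"write: {write_total:.0f} ms/GiB total, {write_saved:.0f} ms/GiB saved")
print(f"read:  {read_total:.0f} ms/GiB total, {read_saved:.0f} ms/GiB saved")
```

Rounded to whole milliseconds, this reproduces the totals of 180 and 178
ms/GiB and the reductions of 18 and 19 ms/GiB stated in the message.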

Lustre-change: https://review.whamcloud.com/39443
Lustre-commit: 7f9b8465bc1125e51aa29cdc27db9a9d6fdc0b89

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5cf2201e5fd4eeee5b4c996de51d3a6a5394ae34
Reviewed-on: https://review.whamcloud.com/44685
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>