Whamcloud - gitweb
fs/lustre-release.git
2 years agoEX-3687 osp: do force disconnect if import is not ready 53/44753/4
Mikhail Pershin [Wed, 25 Aug 2021 17:03:47 +0000 (20:03 +0300)]
EX-3687 osp: do force disconnect if import is not ready

Send OSP_DISCONNECT only on a healthy import. Otherwise,
force a local disconnect for unhealthy imports.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Icd9f171271f4e17a65503fcc710ad3aaa2b84e1e
Reviewed-on: https://review.whamcloud.com/44753
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14997 tests: Register "stack_trap" for sanity/104c 82/44882/3
Arshad Hussain [Thu, 9 Sep 2021 09:18:42 +0000 (05:18 -0400)]
LU-14997 tests: Register "stack_trap" for sanity/104c

This patch is a minor improvement: cleanup is registered
through 'stack_trap' instead of being done right at the end of
the script.

Fixes: 8ee6e1c8825c ("LU-14565 ofd: Do not rely on tgd_blockbit")
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iae2ca81091e0119f2117f4cd57b5cc2f6ac38c6c
Reviewed-on: https://review.whamcloud.com/44882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-14990 tests: Detect correct LNet interface for sanity-lnet 57/44857/2
Chris Horn [Tue, 7 Sep 2021 15:24:14 +0000 (10:24 -0500)]
LU-14990 tests: Detect correct LNet interface for sanity-lnet

Determine the names of the interfaces used for LNet by parsing the
NIDs configured after calling load_modules(). Tests which reference
eth0 are modified to use the interface associated with the primary
NID (i.e. first NID output by lctl list_nids).

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-10385
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Id715aa3e5470d9c110f6248620b1a83920875e7b
Reviewed-on: https://review.whamcloud.com/44857
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14782 kernel: new kernel [SLES15 SP3 5.3.18-59.19.1] 62/44062/5
Jian Yu [Mon, 6 Sep 2021 02:19:07 +0000 (19:19 -0700)]
LU-14782 kernel: new kernel [SLES15 SP3 5.3.18-59.19.1]

This patch makes changes to support new SLES15 SP3 release
with kernel 5.3.18-59.19.1 for Lustre client.

Test-Parameters: trivial

Change-Id: Idf6fad9773dd242c02859a5c7b14401675c4ecf4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44062
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14991 tests: Correct whitespace in sanity-lnet test_101/102 56/44856/2
Chris Horn [Tue, 7 Sep 2021 15:47:06 +0000 (10:47 -0500)]
LU-14991 tests: Correct whitespace in sanity-lnet test_101/102

sanity-lnet.sh test_100 and test_101 use tab characters in the
expected yaml output, but yaml syntax does not allow tab characters.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: a5cbe7883d ("LU-12815 socklnd: allow dynamic setting of conns_per_peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0814f1965414f82cdc696cfe9996b33e863df982
Reviewed-on: https://review.whamcloud.com/44856
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14934 kernel: kernel update SLES12 SP5 [4.12.14-122.83.1] 48/44848/2
Jian Yu [Mon, 6 Sep 2021 01:47:38 +0000 (18:47 -0700)]
LU-14934 kernel: kernel update SLES12 SP5 [4.12.14-122.83.1]

Update SLES12 SP5 kernel to 4.12.14-122.83.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: I2b35d129550b895324bb3e2e61910ad10e846f03
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44848
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14965 ldiskfs: hold inode mutex for ldiskfs_orphan_add() 54/44754/3
Bobi Jam [Thu, 26 Aug 2021 10:19:11 +0000 (18:19 +0800)]
LU-14965 ldiskfs: hold inode mutex for ldiskfs_orphan_add()

See following warning:

ldiskfs/namei.c:3331 ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
Call Trace:
dump_stack+0x19/0x1b
__warn+0xd8/0x100
warn_slowpath_null+0x1d/0x20
ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
ldiskfs_xattr_inode_orphan_add+0xbb/0x110 [ldiskfs]
ldiskfs_xattr_delete_inode+0x5c/0x350 [ldiskfs]
ldiskfs_evict_inode+0x1a8/0x630 [ldiskfs]
evict+0xb4/0x180
iput+0xfc/0x190
osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
lu_object_free.isra.27+0xb8/0x1c0 [obdclass]
lu_object_put+0xa5/0x460 [obdclass]
mdt_object_put+0x30/0x110 [mdt]
mdt_reint_unlink+0x8e0/0x1890 [mdt]
mdt_reint_rec+0x83/0x210 [mdt]
mdt_reint_internal+0x720/0xaf0 [mdt]
mdt_reint+0x67/0x140 [mdt]
tgt_request_handle+0x7ea/0x1750 [ptlrpc]
ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21

Need to hold inode mutex on the external EA for ldiskfs_orphan_add()
to soothe the warning.

Fixes: f64e9f19f68e ("LU-12977 ldiskfs: properly take inode_lock() for truncates")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I3a1abfde3289c0bbd46e0d5a5b9d2ff7d7cf9273
Reviewed-on: https://review.whamcloud.com/44754
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 years agoLU-14323 tests: skip sanity-flr/pfl tests for older servers 94/44494/5
James Nunez [Wed, 4 Aug 2021 14:47:50 +0000 (08:47 -0600)]
LU-14323 tests: skip sanity-flr/pfl tests for older servers

sanity-flr test 46 sub tests 7, 8, 9 and 10 and sanity-pfl
test 16c were added to lustre-master version 2.13.53.205.
When we run version interop testing, these sanity-flr and
sanity-pfl tests will fail.  Thus skip sanity-flr test 46
subtests 7, 8, 9, and 10 and sanity-pfl test 16c when run
with servers with version less than 2.13.53.205 and clients
with later version.
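
The version gate described above can be sketched in C. This is only an illustration: the real check lives in the shell test scripts, and the helper names pack_version() and should_skip_new_subtests() are hypothetical. The sketch assumes each of the four version components fits in 8 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a four-part version such as 2.13.53.205 into one comparable
 * integer. Assumes every component fits in 8 bits. */
uint32_t pack_version(unsigned major, unsigned minor,
		      unsigned patch, unsigned fix)
{
	return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
	       ((uint32_t)patch << 8) | (uint32_t)fix;
}

/* Hypothetical helper: true when the server predates 2.13.53.205,
 * the version in which the new subtests were added. */
bool should_skip_new_subtests(uint32_t server_version)
{
	return server_version < pack_version(2, 13, 53, 205);
}
```

With a 2.12.7 server, as used in the Test-Parameters lines, should_skip_new_subtests() returns true.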

Fixes: ee916af10de2 ("LU-13366 utils: SEL yaml and copy file support")
Test-Parameters: trivial
Test-Parameters: env=ONLY=46 testlist=sanity-flr
Test-Parameters: env=ONLY=16 testlist=sanity-pfl
Test-Parameters: serverversion=2.12.7 serverdistro=el7.9 env=ONLY=46 testlist=sanity-flr
Test-Parameters: serverversion=2.12.7 serverdistro=el7.9 env=ONLY=16 testlist=sanity-pfl
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I09b88351a10891f63dceb9a2a74c92e4fffc13c5
Reviewed-on: https://review.whamcloud.com/44494
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
2 years agoLU-14709 pcc: VM_WRITE should not trigger layout write 83/44483/7
Qian Yingjin [Sat, 31 Jul 2021 07:45:56 +0000 (15:45 +0800)]
LU-14709 pcc: VM_WRITE should not trigger layout write

A VM area marked with VM_WRITE means that pages may be written, but
an mmap page write may never happen.
The layout write should be delayed until the actual modification of the
file happens in ->page_mkwrite().
Otherwise, it triggers a panic in PCC-RO sanity-pcc test_21f().

Fixes: f2d1c4ee4 ("LU-14647 flr: mmap write/punch does not stale other mirrors")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I1cbfef8a4ed7e2c718324fd8a21bafd6157b5f0c
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44483
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14896 utils: migrate file with only '--pool' option 65/44465/4
Etienne AUJAMES [Mon, 2 Aug 2021 10:26:58 +0000 (12:26 +0200)]
LU-14896 utils: migrate file with only '--pool' option

"lfs migrate -p pool_name test_file" initiates a migration but without
changing the layout pools (migrate from layout copy).

This patch implements the same behavior as:
"lfs setstripe -p pool_name test_file"
It sets the pool name and uses the default parameters for the plain
layout.

Add sanity test 56xg to check file migrations with pool.

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I1645eaca028974337218411d6a033f3acf9b9d6a
Reviewed-on: https://review.whamcloud.com/44465
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13055 changelog: use default mask if server has no mask 04/44404/3
Mikhail Pershin [Tue, 27 Jul 2021 10:37:01 +0000 (13:37 +0300)]
LU-13055 changelog: use default mask if server has no mask

When registering a new maskless user and the server has no specific
mask set, the effective mask is set to the DEFAULT value.

Fixes: a15eb4f132 ("LU-13055 mdd: per-user changelog names and mask")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: If799cb5cc29c60cce6ef6c987f2e493145e00e31
Reviewed-on: https://review.whamcloud.com/44404
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13903 utils: separate out server code for wiretest 73/43873/9
James Simmons [Sat, 21 Aug 2021 17:54:42 +0000 (13:54 -0400)]
LU-13903 utils: separate out server code for wiretest

The wiretest, both in the kernel and as a userland utility, is used by
both client and server to validate data being sent over the network.
Make the userland wiretest buildable on the native Linux client,
which lacks server-specific data structures. Use the UAPI
values to harden testing of userland data passed to the
kernel.

Change-Id: I30efc8bf42ac461bab5a4371e940a027a23d12c9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43873
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13903 uapi: fixup UAPI headers for native Linux client. 64/44664/5
James Simmons [Sat, 4 Sep 2021 12:33:53 +0000 (08:33 -0400)]
LU-13903 uapi: fixup UAPI headers for native Linux client.

This covers all the UAPI problems outside of the user land
wiretest utility. One set of problems is build related and the second is
that UAPI header definitions are either user land only or never
used to validate data going to or from user land.

1) Use UAPI header definitions to validate data sent to or from
   kernel space. We check lum_hash_type using LMV_HASH_TYPE_MASK.
   This avoids a round trip to the server which will report back
   an error. The other case is we check the values returned for
   LL_IOC_HSM_ACTION. We keep the original behavior of passing
   unknown data to the user land application but add debug
   logging if the data looks corrupt to help track down bug
   issues.

2) We can use QIF_DQBLKSIZE* instead of Lustre specific values
   for our quota handling. QIF_DQBLKSIZE* is a Linux UAPI quota
   value.

3) The NOTIFY_GRACE_* macros are used only by user land. Move
   to lustreapi.h

4) A few of the UAPI definitions are used by utility code
   present on the client and the Lustre kernel server code; which
   are not sent over the wire. Handle these special cases. This
   covers the missing LCM_USER_MIRROR_FLAGS, LCME_TEMPLATE_FLAGS,
   and LQUOTA_* values. Once server code merges upstream we can
   clean this up.

5) lcfg_cmd2data() is server specific so in case of a client build
   we can have get_llog_event_name() just always return NULL.

6) Don't package OpenSFS UAPI headers when building for native
   Linux client.

Change-Id: I258ee917b005e438eb7c15fa6e0c4b72e9ea9d56
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44664
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13717 sec: filename encryption - digest support 92/43392/16
Sebastien Buisson [Fri, 22 Jan 2021 12:06:50 +0000 (21:06 +0900)]
LU-13717 sec: filename encryption - digest support

A number of operations are allowed on encrypted files without the key:
- read file metadata (stat);
- list directories;
- remove files and directories.
In order to present valid names to users, cipher text names are base64
encoded if they are short. Otherwise we compute a digested form of the
cipher text, made of the FID (16 bytes) followed by the second-to-last
cipher block (16 bytes), and we base64 encode this digested form for
presentation to user.
These transformations are carried out in the specific overlay
functions, that now need to know the fid of the file.
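
The layout of the digested form can be sketched as follows. This is a minimal illustration, not the patch's code: the function name build_name_digest() is hypothetical, base64 encoding is left out, and the sketch assumes the cipher text is a whole number of 16-byte blocks and at least two blocks long.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FID_LEN    16	/* FID size stated in the commit message */
#define BLOCK_LEN  16	/* cipher block size stated in the commit message */
#define DIGEST_LEN (FID_LEN + BLOCK_LEN)

/*
 * Build the 32-byte digested form: the FID followed by the
 * second-to-last cipher block. Returns 0 on success, -1 when the cipher
 * text is shorter than two blocks or not a whole number of blocks
 * (both are simplifying assumptions of this sketch).
 */
int build_name_digest(const uint8_t fid[FID_LEN],
		      const uint8_t *cipher, size_t cipher_len,
		      uint8_t out[DIGEST_LEN])
{
	if (cipher_len < 2 * BLOCK_LEN || cipher_len % BLOCK_LEN != 0)
		return -1;

	memcpy(out, fid, FID_LEN);
	/* The second-to-last block starts two blocks before the end. */
	memcpy(out + FID_LEN, cipher + cipher_len - 2 * BLOCK_LEN, BLOCK_LEN);
	return 0;
}
```

The resulting 32 bytes are what would then be base64 encoded for presentation to the user.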

As the digested form does not contain the whole cipher text name,
server side needs to proceed to an operation by FID for requests such
as lookup and getattr. It also relies on the content of the LinkEA to
verify the digested form as received from client side.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I45d10a426373c2cfe0b92a58c351da452d085d7d
Reviewed-on: https://review.whamcloud.com/43392
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
2 years agoLU-13086 tests: restore compatibility with mpich 89/38689/8
Elena Gryaznova [Thu, 21 May 2020 10:13:41 +0000 (13:13 +0300)]
LU-13086 tests: restore compatibility with mpich

The addition of the --oversubscribe MPI option to mpi_run() is
OpenMPI specific.  Patch moves --oversubscribe to MPIRUN_OPTIONS
in local.sh to restore the compatibility with MPICH.

Test-Parameters: trivial clientdistro=el8.3 serverdistro=el7.7 testlist=parallel-scale,large-scale,performance-sanity
Test-Parameters: clientdistro=el8.4 serverdistro=el7.7 testlist=parallel-scale,large-scale,performance-sanity
Fixes: 3c7aca7472 ("LU-12395 build: build mpitests for el8")
Cray-bug-id: LUS-8006
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I0a6fab072212781d12877d2503ae8600cfdc8c7a
Reviewed-on: https://review.whamcloud.com/38689
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-13997 tests: sanity/418 to cancel all client locks 03/44803/4
Alex Zhuravlev [Wed, 1 Sep 2021 08:54:04 +0000 (11:54 +0300)]
LU-13997 tests: sanity/418 to cancel all client locks

verify idea about dirty client's data

Test-Parameters: trivial
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Test-Parameters: testlist=sanity env=ONLY=0-418 fstype=ldiskfs
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ifef58a98b26c7790274d2a57aa52e4475e923dd0
Reviewed-on: https://review.whamcloud.com/44803
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14959 ldlm: Check return value of ldlm_resource_get() 38/44738/4
Oleg Drokin [Tue, 24 Aug 2021 03:44:45 +0000 (23:44 -0400)]
LU-14959 ldlm: Check return value of ldlm_resource_get()

Fix the comment to properly indicate it returns ERR_PTR on
error and fix osc_req_attr_set() and mdc_get_lock_handle()
to actually check the return value before passing it on and
causing an unintended crash.
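
The checking pattern can be sketched in userspace C. The helpers err_ptr(), is_err() and ptr_err() below are stand-ins for the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(); resource_get() and use_resource() are hypothetical and only mimic a lookup that may fail, they are not the ldlm code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR/PTR_ERR/IS_ERR. */
static inline void *err_ptr(long err) { return (void *)err; }
static inline long ptr_err(const void *p) { return (long)p; }
static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct resource {
	int id;
};

/* Hypothetical lookup: returns an error pointer, never NULL, on failure. */
struct resource *resource_get(int id)
{
	static struct resource table[8];

	if (id < 0 || id >= 8)
		return err_ptr(-ENOMEM);
	table[id].id = id;
	return &table[id];
}

/* Caller checks the result before dereferencing or passing it on. */
int use_resource(int id)
{
	struct resource *res = resource_get(id);

	if (is_err(res))
		return (int)ptr_err(res);
	return res->id;
}
```

A NULL check alone would not catch the failure here, because an error pointer is a non-NULL value.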

Change-Id: Ib85a62140a39744e85989c9a9c8aa2ed771d70d1
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44738
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
2 years agoLU-14951 llite: protect fd_{lease_}och 00/44700/2
Bobi Jam [Wed, 18 Aug 2021 13:24:50 +0000 (21:24 +0800)]
LU-14951 llite: protect fd_{lease_}och

Access to ll_file_data::fd_och and fd_lease_och needs lli_och_mutex
protection.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie9136aa345c6bf015aa73067acdaecf1a765b9f6
Reviewed-on: https://review.whamcloud.com/44700
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13195 osp: track destroyed OSP object 85/38385/11
Alex Zhuravlev [Mon, 27 Apr 2020 04:52:01 +0000 (07:52 +0300)]
LU-13195 osp: track destroyed OSP object

Retain destroyed OSP objects in memory to prevent races when an
in-flight destroy is passed by a read or attr_get, leading to
incorrect local state.
Also block operations on such an object with -ENOENT.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ied59f1a95458e8890249b92d4efc38e258a7e3cf
Reviewed-on: https://review.whamcloud.com/38385
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14729 osd-ldiskfs: declare should consider concurrency 16/44316/13
Wang Shilong [Thu, 15 Jul 2021 08:15:37 +0000 (16:15 +0800)]
LU-14729 osd-ldiskfs: declare should consider concurrency

Write in the Lustre OSD differs from Ext4: writes are
serialized in the local filesystem, whereas on the OSD side
many concurrent threads may grow the tree before the transaction starts.

Also fix to use @dirty_groups rather than @extents, remove
unnecessary @depth assignment.

Fixes: 9810341a8 ("LU-14729 osd-ldiskfs: fix to declare write commits")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I1e0fc9069a579736a74b0ba2607056fe980574c3
Reviewed-on: https://review.whamcloud.com/44316
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14724 nrs: TBF rule list broken when change rule rank 25/43925/6
Qian Yingjin [Fri, 28 May 2021 03:56:12 +0000 (11:56 +0800)]
LU-14724 nrs: TBF rule list broken when change rule rank

When changing the rank of two adjacent rules in the TBF rule list in
@nrs_tbf_rule_change_rank():
list_move(&rule->tr_linkage, next_rule->tr_linkage.prev);

The previous pointer of @next_rule is @rule, so using list_move
directly will break the rule list.
This patch uses list_del + list_add to replace list_move and
avoid breaking the TBF rule list.
It also adds a test case, sanityn test_77o, for this bug.
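
The failure mode can be reproduced with a small userspace list. This is a sketch under assumptions: struct list_node and the helpers below only imitate the kernel's list primitives, and move_before_buggy() / move_before_fixed() are hypothetical names, not the functions touched by the patch.

```c
#include <stddef.h>

struct list_node {
	struct list_node *prev, *next;
};

void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Unlink entry from its neighbours; the entry's own pointers are left
 * untouched, as with the kernel's __list_del_entry(). */
void list_unlink(struct list_node *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Insert entry right after pos. */
void list_insert_after(struct list_node *entry, struct list_node *pos)
{
	struct list_node *next = pos->next;

	next->prev = entry;
	entry->next = next;
	entry->prev = pos;
	pos->next = entry;
}

/* Buggy: the insertion point is sampled before unlinking. When rule and
 * next_rule are adjacent, pos is rule itself, so rule ends up linked
 * only to itself. */
void move_before_buggy(struct list_node *rule, struct list_node *next_rule)
{
	struct list_node *pos = next_rule->prev;

	list_unlink(rule);
	list_insert_after(rule, pos);
}

/* Fixed: unlink first, then read next_rule->prev, which now points at
 * the node that really precedes next_rule. */
void move_before_fixed(struct list_node *rule, struct list_node *next_rule)
{
	list_unlink(rule);
	list_insert_after(rule, next_rule->prev);
}
```

For a list head, a, b, calling move_before_fixed(&a, &b) leaves the order intact, while move_before_buggy(&a, &b) drops a from the forward chain and leaves a.next pointing at a.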

Fixes: aa14b0b9a152 ("LU-8006 ptlrpc: specify ordering of TBF policy rules")
Change-Id: Ica30d3329f07914657ac2c4089d66f934021b763
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/43925
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14711 tests: Ensure there's no eviction with long cache discard 69/43869/9
Oleg Drokin [Sat, 29 May 2021 02:42:49 +0000 (22:42 -0400)]
LU-14711 tests: Ensure there's no eviction with long cache discard

Just pause execution while doing page processing
for discard if appropriate failloc is set.

Change-Id: If0d04f3cad267cbeeab63040d63e048dcf03cd6b
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Test-Parameters: trivial testlist=sanity env=ONLY=903
Reviewed-on: https://review.whamcloud.com/43869
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-13717 sec: filename encryption 90/43390/15
Sebastien Buisson [Tue, 23 Mar 2021 13:58:50 +0000 (22:58 +0900)]
LU-13717 sec: filename encryption

On client side, call the appropriate llcrypt primitives from llite,
to proceed with filename encryption before sending requests to servers
and filename decryption upon request receipt.
Note we need specific overlay functions to handle encoding and
decoding of encrypted filenames, as we do not want server side to deal
with binary names before they reach the backend file system layer.

On server side, mainly the OSD layer, we need to know the encryption
status of files being processed.
If an object belongs to an encrypted file, the filename has been
encoded by the client because it is binary, so it needs to be decoded
before being handed over to the backend file system layer.
And conversely, the filename of an encrypted file has to be encoded
before being sent over the wire.
Note server side is osd-ldiskfs only for now.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7ac9047f5a046b8bc63afdbbb1f28e78aa5c8c7e
Reviewed-on: https://review.whamcloud.com/43390
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 years agoLU-14854 mdd: proper handle error in mdd_swap_layouts() 19/44319/5
Bobi Jam [Thu, 15 Jul 2021 18:20:54 +0000 (02:20 +0800)]
LU-14854 mdd: proper handle error in mdd_swap_layouts()

Only restore object's HSM xattr on error if it's for
SWAP_LAYOUTS_MDS_HSM.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9d4c58cd3107c3900e72a0946d0ec7d7286dd43f
Reviewed-on: https://review.whamcloud.com/44319
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9897 tests: add generated files to .gitignore 78/44778/3
James Simmons [Sat, 28 Aug 2021 23:55:49 +0000 (19:55 -0400)]
LU-9897 tests: add generated files to .gitignore

Several binaries and wrappers created in the build process
show up as candidates for git add, which they should not be. Add
these files to .gitignore to avoid an accidental git addition.

Test-Parameters: trivial
Change-Id: If693ba7933c0329a333dec71ed6fb521a90435f4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44778
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14967 obdclass: EAGAIN after rhashtable_walk_next() 66/44766/3
Alex Zhuravlev [Fri, 27 Aug 2021 05:42:56 +0000 (08:42 +0300)]
LU-14967 obdclass: EAGAIN after rhashtable_walk_next()

rhashtable_walk_next() can return -EAGAIN when concurrent resizing
has happened, so the callers should check for this error and just
repeat rhashtable_walk_next().
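
The retry loop can be sketched with a mock iterator in userspace C. Everything here is an assumption made for illustration: struct walker and walk_next() merely stand in for the rhashtable walker and rhashtable_walk_next(), and the mock does not model the fact that the real iterator may revisit entries after a resize.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator over an int array that can simulate resizes. */
struct walker {
	const int *items;
	size_t count;
	size_t pos;
	int pending_resizes;	/* how many times to report -EAGAIN */
};

/* Stand-in for rhashtable_walk_next(): returns the next item, NULL at
 * the end, or an error pointer encoding -EAGAIN after a simulated
 * resize. */
const int *walk_next(struct walker *w)
{
	if (w->pending_resizes > 0) {
		w->pending_resizes--;
		return (const int *)(intptr_t)-EAGAIN;
	}
	if (w->pos >= w->count)
		return NULL;
	return &w->items[w->pos++];
}

/* Sum all items, repeating the call whenever -EAGAIN is reported. */
long sum_all(struct walker *w)
{
	long sum = 0;
	const int *item;

	while ((item = walk_next(w)) != NULL) {
		if ((intptr_t)item == -EAGAIN)
			continue;	/* resized: just call walk_next() again */
		sum += *item;
	}
	return sum;
}
```

Without the -EAGAIN check, the loop would dereference the error pointer as if it were a valid item.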

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I15ba2cdf16c2678e18836b4f16b56a3b8bfdacd0
Reviewed-on: https://review.whamcloud.com/44766
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14776 zfs: fix Ubuntu 20 HWE build issues 49/44749/3
James Simmons [Wed, 25 Aug 2021 17:17:51 +0000 (11:17 -0600)]
LU-14776 zfs: fix Ubuntu 20 HWE build issues

Newer Ubuntu systems using ZFS dkms have the following build
errors:

    In file included from zfs/2.0.2/source/include/sys/arc.h:32,
                 from lustre/osd-zfs/osd_internal.h:50,
                 from lustre/osd-zfs/osd_handler.c:51:
    zfs/2.0.2/source/include/sys/zfs_context.h:45:10:
                 fatal error: sys/types.h: No such file or directory
    45 | #include <sys/types.h>
       |          ^~~~~~~~~~~~~
    compilation terminated.

This is due to the layout of the tree containing the needed headers.
Include those paths in the build system.

Test-Parameters: trivial
Change-Id: I453830c4111ad88ec655d3d7d0ee51627331cb0b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44749
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14776 build: Ubuntu 20.04.2 and 20.04.3 HWE client support 48/44748/2
James Simmons [Wed, 25 Aug 2021 13:26:52 +0000 (09:26 -0400)]
LU-14776 build: Ubuntu 20.04.2 and 20.04.3 HWE client support

We now support Lustre clients on both Ubuntu 20.04.2 and
20.04.3 HWE platforms.

Change-Id: I772af876ffa8beeabb8a2002f80aa776fa373996
Test-Parameters: trivial
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44748
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14962 lnet: Check for -ESHUTDOWN in lnet_parse 43/44743/3
Chris Horn [Tue, 24 Aug 2021 16:16:17 +0000 (11:16 -0500)]
LU-14962 lnet: Check for -ESHUTDOWN in lnet_parse

The fix for LU-8106, http://review.whamcloud.com/19993, no longer
works because rc does not have the return value from
lnet_nid2peerni_locked(). Use PTR_ERR to get the return value and
restore the LU-8106 fix.
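
The stale-rc problem can be sketched in userspace C. The names lookup_peer() and parse_message() are hypothetical, IS_ERR_PTR is a stand-in for the kernel's IS_ERR(), and the return value 1 for "shutdown detected" is an invention of this sketch rather than the behavior of lnet_parse().

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095
/* Userspace stand-in for the kernel's IS_ERR(). */
#define IS_ERR_PTR(p) ((uintptr_t)(p) >= (uintptr_t)-MAX_ERRNO)

struct peer_ni {
	int unused;
};

/* Hypothetical lookup that fails with -ESHUTDOWN during shutdown. */
struct peer_ni *lookup_peer(int shutting_down)
{
	static struct peer_ni ni;

	if (shutting_down)
		return (struct peer_ni *)(intptr_t)-ESHUTDOWN;
	return &ni;
}

/* Returns 0 on success, 1 when shutdown was detected, or a negative
 * errno for any other lookup failure. */
int parse_message(int shutting_down)
{
	int rc = 0;
	struct peer_ni *ni = lookup_peer(shutting_down);

	if (IS_ERR_PTR(ni)) {
		/* Without this assignment rc keeps its old value and the
		 * -ESHUTDOWN comparison below can never match. */
		rc = (int)(intptr_t)ni;
		if (rc == -ESHUTDOWN)
			return 1;
		return rc;
	}
	return 0;
}
```

The key line is the assignment from the error pointer into rc, which corresponds to the PTR_ERR call the patch introduces.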

HPE-bug-id: LUS-10333
Fixes: fa8b4e6357 ("LU-7734 lnet: peer/peer_ni handling adjustments")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9cc2bc2d6e675d38cf06d99c524bdd95110bf0e9
Reviewed-on: https://review.whamcloud.com/44743
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14961 tests: set Pool Quotas 40/44740/4
Elena Gryaznova [Tue, 24 Aug 2021 11:23:22 +0000 (14:23 +0300)]
LU-14961 tests: set Pool Quotas

We are interested in running some tests on a filesystem with
pool quotas set for some users. For instance, setting
pool quota limits for mpiuser allows stressing the pool
quota code with mpi tests.
The patch adds the ability to set pool quota block hard limits
for specific users via POOLS_QUOTA_USERS_SET.
Example:
  POOLS_QUOTA_USERS_SET="quota15_1:20M
                quota15_2:1G:gpool0
                quota15_4:200M:gpool0
                quota15_4:200M:gpool1"
For quota15_1, a 20M limit will be set for all existing
pools.
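
The entry format in the example, user:limit with an optional :pool suffix, can be parsed as sketched below. This is an illustration only: the test framework handles the variable in shell, and struct pool_quota, parse_pool_quota() and the buffer sizes are assumptions of this sketch.

```c
#include <stdio.h>
#include <string.h>

struct pool_quota {
	char user[32];
	char limit[16];
	char pool[32];	/* empty string means "all existing pools" */
};

/* Parse one "user:limit[:pool]" entry. Returns 0 on success and -1 when
 * the user or the limit is missing. */
int parse_pool_quota(const char *entry, struct pool_quota *out)
{
	int n;

	memset(out, 0, sizeof(*out));
	n = sscanf(entry, "%31[^:]:%15[^:]:%31[^:]",
		   out->user, out->limit, out->pool);
	return (n == 2 || n == 3) ? 0 : -1;
}
```

Parsing "quota15_1:20M" leaves the pool field empty, matching the rule that such a limit applies to all existing pools, while "quota15_2:1G:gpool0" fills in gpool0.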

Test-Parameters: env=FS_POOL="glo",POOLS_QUOTA_USERS_SET="mpiuser:200M quota15_1:2000M:glo1",FS_NPOOLS="2",ENABLE_QUOTA="yes"
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10059
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Change-Id: Ia9ee540ca77e70f37aa849e5e555e3c057e2052d
Reviewed-on: https://review.whamcloud.com/44740
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14960 tests: enhance ha.sh to work with several test dirs 39/44739/3
Elena Gryaznova [Tue, 24 Aug 2021 10:56:58 +0000 (13:56 +0300)]
LU-14960 tests: enhance ha.sh to work with several test dirs

Patch adds the ability to work with several test directories
set via ha_test_dirs variable.
Useful for emulating more Lustre clients.
Example:
  before the test mount Lustre on:
    /mnt/lustre, /mnt/lustre1 /mnt/lustre3
  Run ha.sh with:
  ha_test_dirs="/mnt/lustre /mnt/lustre1 /mnt/lustre3"
  The client's test directories will be created in the listed
  test directories:
  client0 works in /mnt/lustre subdirectory
  client1 works in /mnt/lustre1 subdirectory,
  etc.

Patch also adds the ability to not remove the test directories
if CLEANUP is set to false.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9705
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Change-Id: I1d04b7deeda693c9ca1c86411b0a66c6a2315923
Reviewed-on: https://review.whamcloud.com/44739
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9859 libcfs: change libcfs_log_* functions to inline 81/44581/3
James Simmons [Tue, 10 Aug 2021 18:05:31 +0000 (14:05 -0400)]
LU-9859 libcfs: change libcfs_log_* functions to inline

The functions libcfs_log_return() and libcfs_log_goto() don't
exist in the native Linux client. We still need them for the
special OpenSFS debugging but we can change those functions
to simple inline routines since they are just wrappers
around libcfs_debug_msg().

Test-Parameters: trivial
Change-Id: I0e2b40feb18f9f1a1ffbda39756ab64308ea6439
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44581
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14021 llite: don't touch vma after filemap_fault 58/44558/2
Alexander Boyko [Tue, 10 Aug 2021 14:20:42 +0000 (10:20 -0400)]
LU-14021 llite: don't touch vma after filemap_fault

On error, filemap_fault unlocks vma->vm_mm->mmap_sem,
so touching vma is dangerous: it could be reused or freed.
The patch uses a local file variable to avoid touching vma.

HPE-bug-id: LUS-10240
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I72cd086645061819fab5b8595a880db64cfb9ff7
Reviewed-on: https://review.whamcloud.com/44558
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14807 lfsck: fix race in lfsck_pos_fill 30/44130/7
Hongchao Zhang [Sun, 27 Jun 2021 21:00:20 +0000 (05:00 +0800)]
LU-14807 lfsck: fix race in lfsck_pos_fill

There is a race for lfsck->li_di_dir between lfsck_di_dir_put and
lfsck_pos_fill, which could cause lfsck_pos_fill to use freed
lfsck->li_di_dir (struct osd_it_ea) and trigger GPF.

Change-Id: Iedadf03ac15d128bb051aea8aafa24dbcd2704fb
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44130
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14696 llite: check read only mount for setquota 65/43765/6
Hongchao Zhang [Thu, 12 Aug 2021 11:06:45 +0000 (19:06 +0800)]
LU-14696 llite: check read only mount for setquota

Setting quota should fail if the mount is read-only.

Change-Id: I966ac71d0a4a72dcb998f09ffc0f99ae28498e27
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43765
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13668 mdt: change lock mode for lease 64/38964/23
Alex Zhuravlev [Wed, 17 Jun 2020 14:05:28 +0000 (17:05 +0300)]
LU-13668 mdt: change lock mode for lease

Make the lease lock mode PW so that lfs getstripe and
open-for-read do not interrupt replication.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I20f4bbbc4e7bf9055333aba1b8cca80aa899c664
Reviewed-on: https://review.whamcloud.com/38964
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9868 lustre: switch to use of ->d_init() 35/44135/21
Al Viro [Mon, 16 Aug 2021 18:41:22 +0000 (14:41 -0400)]
LU-9868 lustre: switch to use of ->d_init()

Starting with 4.7 kernels the initialization of dentries
is now managed by the VFS layer at allocation time. Any
time a dentry is created by the VFS ll_d_init will be
called.

Linux-commit: 7126bc2e8d60c2a00539bf96b1005f3015be87a5

Change-Id: I02f9b83afd5007658ce88c1010c669d642665d39
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44135
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14954 socklnd: fix link state detection 32/44732/5
Serguei Smirnov [Mon, 23 Aug 2021 19:58:51 +0000 (12:58 -0700)]
LU-14954 socklnd: fix link state detection

Due to matching only the device index, link detection implemented
in LU-14742 has issues with confusing the link events for the
virtual interfaces with the link events for the interface that
LNet was actually configured to use. Fix this by improving
the identification of the event source: use both device name and
device index.

Also, to make sure the link fatal state is cleared only when
the device is bound to the IP address used at NI creation,
subscribe to inetaddr events in addition to the netdev events.
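
A minimal sketch of the stricter event matching described above,
assuming a hypothetical ni_iface record that remembers both the index
and the name of the configured interface. The struct, the function name
and SKETCH_IFNAMSIZ are illustrative and not taken from socklnd.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16 /* illustrative name buffer size */

/* Hypothetical record of the interface an NI was configured on. */
struct ni_iface {
	int  index;
	char name[SKETCH_IFNAMSIZ];
};

/* An event belongs to the configured interface only if both the device
 * index and the device name match. */
static bool event_matches_iface(const struct ni_iface *ni,
				int ev_index, const char *ev_name)
{
	assert(ni != NULL && ev_name != NULL);
	return ni->index == ev_index &&
	       strncmp(ni->name, ev_name, SKETCH_IFNAMSIZ) == 0;
}
```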

Test-Parameters: trivial
Fixes: fc2df80e96dc ("LU-14742: detect link state to set fatal error")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ib1996c66a8ae2596970d66e3d920702190851e3f
Reviewed-on: https://review.whamcloud.com/44732
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14877 llite: Remove inode locking in ll_fsync 68/44368/6
Oleg Drokin [Wed, 21 Jul 2021 20:03:10 +0000 (16:03 -0400)]
LU-14877 llite: Remove inode locking in ll_fsync

It does not appear to be necessary.

Change-Id: I0142a9dca4ecc6893521275b69a0a46012eab0b0
Fixes: 8f3ef1e961 ("LU-812 llite: 3.0+ kernel fsync should call write")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44368
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14904 ldiskfs: add support for Ubuntu20 kernel 5.4.0.80 03/44703/3
James Simmons [Wed, 18 Aug 2021 16:15:29 +0000 (12:15 -0400)]
LU-14904 ldiskfs: add support for Ubuntu20 kernel 5.4.0.80

Changes from newer 5.4.0 kernel version have been backported to
Ubuntu20. Test for Ubuntu 5.4.0.80 kernels so we use the correct
series file with the updated ext-simple-blockalloc.patch.

Test-Parameters: trivial
Change-Id: I73ad558a306ec50fb1ba45e6ab2c59aaec047197
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44703
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14927 scrub: share osd_scrub[prep|post] code 05/44705/2
James Simmons [Wed, 18 Aug 2021 18:04:44 +0000 (14:04 -0400)]
LU-14927 scrub: share osd_scrub[prep|post] code

The osd-zfs and osd-ldiskfs versions of osd_scrub_prep() and
osd_scrub_post() are nearly identical. Additionally the code
contains internal kernel code that can only be used with
non-tainted modules. To avoid the inherited taint issues, create
common code scrub_thread_prep() and scrub_thread_post() and place
it in scrub.c in obdclass. These can be handled as kthread helpers
for OSD drivers.

Change-Id: Ia4875eafc053c1e07f437ba55dbdcf58029a7fc6
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44705
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14949 llite: Always do lookup on ENOENT in open 75/44675/5
Patrick Farrell [Tue, 17 Aug 2021 02:54:59 +0000 (22:54 -0400)]
LU-14949 llite: Always do lookup on ENOENT in open

When there is no valid dentry found for a file we want to
open, we perform a full lookup, which goes to the server
and looks up the file by name. When we find an existing
dentry in cache *but the file is not open on the node*, we
do not do a full lookup.  We move directly to opening the
file.

When we open files, we use the FID of the file.  The
problem occurs when a new file is renamed *over* the file
we were trying to open.  This removes the FID we are
trying to open, but the file *name* userspace called open()
on is still present.  In this case, we will return ENOENT,
even though there is a file matching the name used in the
open() call.

The solution is when we get an ENOENT on open (indicating
our open raced with an unlink), we always send ESTALE back
to the VFS, which restarts the open and forces a lookup to
the server (by forcing Lustre to consider the dentry
invalid, see comments in ll_intent_file_open and code in
ll_revalidate_dentry).

This causes a lookup by name, which will correctly handle
the rename, allowing the open to proceed normally.

This should only generate extra retries in the case where a
positive dentry exists on the client but the file has been
removed on the server, i.e. open racing with unlink.

This should hopefully be rare.
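
The error translation described above can be sketched as follows. The
helper name open_rc_for_vfs() is hypothetical; it only illustrates the
mapping of ENOENT to ESTALE and is not the code in ll_intent_file_open.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: maps the result of an open-by-FID attempt to the
 * value handed back to the VFS. */
static int open_rc_for_vfs(int rc)
{
	assert(rc <= 0); /* 0 or a negative errno, kernel style */

	/* ENOENT means the FID is gone (open raced with unlink or rename).
	 * ESTALE makes the VFS restart the open with a lookup by name. */
	if (rc == -ENOENT)
		return -ESTALE;
	return rc;
}
```

All other results, including success, pass through unchanged.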

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Change-Id: If9157cac901c81d6ad3f15997d419d3907fe88b8
Reviewed-on: https://review.whamcloud.com/44675
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14847 ptlrpc: two replay lock threads 94/44294/5
Vitaly Fertman [Tue, 13 Jul 2021 16:07:14 +0000 (19:07 +0300)]
LU-14847 ptlrpc: two replay lock threads

Two replay lock threads conflict with each other, which leads to:
        ASSERTION( atomic_read(&imp->imp_replay_inflight) == 1 )

replay_lock_interpret() does ptlrpc_connect_import() on error, and one
thread will appear starting with connect reply interpret.

replay_lock_interpret() also wakes up ldlm_lock_replay_thread() which
does ptlrpc_import_recovery_state_machine().

It may happen that both threads will get to ldlm_replay_locks() on the
next round at the same time, both increment imp_replay_inflight and
the second one will assert.

The problem appeared in LU-13600 which added ldlm_lock_replay_thread()
with the ptlrpc_import_recovery_state_machine() call.
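
For illustration only, one generic way to keep a second thread from
entering a single-flight section is an atomic 0 -> 1 transition. This is
an assumed technique shown with C11 atomics, not necessarily what this
patch does; replay_inflight, replay_try_start() and replay_finish() are
hypothetical names modelled on imp_replay_inflight.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter modelled on imp_replay_inflight. */
static atomic_int replay_inflight;

/* Succeeds for exactly one caller at a time: the 0 -> 1 transition is
 * atomic, so a second thread sees a non-zero value and backs off
 * instead of incrementing past 1. */
static bool replay_try_start(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&replay_inflight, &expected, 1);
}

/* Ends the single-flight section; mirrors the "== 1" invariant. */
static void replay_finish(void)
{
	assert(atomic_load(&replay_inflight) == 1);
	atomic_store(&replay_inflight, 0);
}
```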

HPE-bug-id: LUS-10147
Fixes: 3b613a442b ("LU-13600 ptlrpc: limit rate of lock replays")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: Ia9aafb631e3ba5f850504cc58b4826acec2813bd
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/158931
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/44294
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14785 build: changelog updates should not dirty version 69/44069/5
Shaun Tancheff [Thu, 24 Jun 2021 13:08:24 +0000 (08:08 -0500)]
LU-14785 build: changelog updates should not dirty version

When building Lustre debs, the final version should not
include 'dirty' due to an update of the changelog.

HPE-bug-id: LUS-10152
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I59f4da4b3006302e3598cfa56a0364b052f885ef
Reviewed-on: https://review.whamcloud.com/44069
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14640 osd: ASSERTION(!PageDirty(lnb[i].lnb_page) 62/43462/9
Arshad Hussain [Wed, 19 May 2021 11:04:30 +0000 (16:34 +0530)]
LU-14640 osd: ASSERTION(!PageDirty(lnb[i].lnb_page)

fallocate(PUNCH_HOLE) was leaving the partially-zeroed
page in the buffer cache. This was causing an ASSERT when
doing large direct read/write operations. This was seen
when executing an fsx run with the options:

$ fsx -c 50 -p 1000 -S 7919 -P /tmp -l 5407677 -N 100000 <file>

Lustre: DEBUG MARKER: GENERIC DEBUG start start
LustreError: 15768:0:(osd_io.c:1563:osd_write_commit())
ASSERTION( !PageDirty(lnb[i].lnb_page) ) failed:
LustreError: 15768:0:(osd_io.c:1563:osd_write_commit()) LBUG
Pid: 15768, comm: ll_ost_io00_000 3.10.0-957.el7_lustre.x86_64
Call Trace:
[<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[<0>] osd_write_commit+0x52c/0x870 [osd_ldiskfs]
[<0>] ofd_commitrw_write+0xe79/0x1510 [ofd]
[<0>] ofd_commitrw+0x2ad/0x9a0 [ofd]
[<0>] tgt_brw_write+0xfd0/0x1cb0 [ptlrpc]
[<0>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
[<0>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[<0>] ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
[<0>] kthread+0xd1/0xe0
[<0>] ret_from_fork_nospec_begin+0xe/0x21
[<0>] 0xfffffffffffffffe
Kernel panic - not syncing: LBUG

Test-case: sanity-benchmark/fsx_partial_punch added

Test-Parameters: testlist=sanity-benchmark
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I89fcbc6af0cbf4b544b8d149703053909ecb6cad
Reviewed-on: https://review.whamcloud.com/43462
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10973 lutf: Fix crash and other updates 26/44726/5
Amir Shehata [Mon, 23 Aug 2021 18:45:18 +0000 (11:45 -0700)]
LU-10973 lutf: Fix crash and other updates

Fix crash in wait_for_agents, which was misusing
cYAML_get_next_seq_item().

Update the lustre_lnet_config_ni() with a newly added parameter
for conns_per_peer. Later on tests can be added to explicitly
test setting the conns_per_peer from the C API.

Remove auth_timeout from the paramiko file to be backwards
compatible with older versions of the paramiko python API.

Only delete the progress file if this node is the LUTF master
node. This is to avoid other nodes trampling over each other
if they are using the same directory to dump temporary files.

Test-parameters: trivial

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifb5ef0e16c6bc859c3893919a9242b64fd049ebe
Reviewed-on: https://review.whamcloud.com/44726
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: extend rspt_next_hop_nid in lnet_rsp_tracker 94/43594/8
Mr NeilBrown [Mon, 6 Apr 2020 03:03:36 +0000 (13:03 +1000)]
LU-10391 lnet: extend rspt_next_hop_nid in lnet_rsp_tracker

rspt_next_hop_nid in 'struct lnet_rsp_tracker' is now
a 'struct lnet_nid'.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1348a05e572782383a2e68eb7a6be514a53b28b8
Reviewed-on: https://review.whamcloud.com/43594
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lr_nid to struct lnet_nid 93/43593/8
Mr NeilBrown [Wed, 18 Aug 2021 21:01:48 +0000 (17:01 -0400)]
LU-10391 lnet: change lr_nid to struct lnet_nid

The nid in 'struct lnet_route' is now a 'struct lnet_nid'.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2e2f2e9c8d2cbdbc87b408ee4589952f2df02880
Reviewed-on: https://review.whamcloud.com/43593
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: enhance connect/accept to support large addr 05/42105/11
Mr NeilBrown [Fri, 3 Apr 2020 05:37:26 +0000 (16:37 +1100)]
LU-10391 lnet: enhance connect/accept to support large addr

This patch introduces a version-2 of the acceptor protocol.  This
version uses a 'struct lnet_nid' rather than 'lnet_nid_t'

lnet_connect() now accepts a struct lnet_nid and uses version 2 if
necessary.  lnet_accept() accepts either v1 or v2.
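
A rough sketch of the version handling described above. It assumes that
version 2 is "necessary" exactly when the address does not fit in the
4 bytes a legacy nid can carry; that rule, the constants and the
function names are assumptions for illustration, not the LNet API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ACCEPTOR_V1 = 1, ACCEPTOR_V2 = 2 }; /* illustrative constants */

/* Connect side: use v1 whenever the address fits in 4 bytes (assumed
 * meaning of "if necessary"), otherwise v2. */
static int connect_pick_version(size_t addr_bytes)
{
	assert(addr_bytes >= 4 && addr_bytes <= 16);
	return addr_bytes > 4 ? ACCEPTOR_V2 : ACCEPTOR_V1;
}

/* Accept side: either known version is fine, anything else is rejected. */
static bool accept_version_ok(int version)
{
	return version == ACCEPTOR_V1 || version == ACCEPTOR_V2;
}
```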

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I523be0d217b6239c9791ff4fa536b9255c029ae7
Reviewed-on: https://review.whamcloud.com/42105
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: introduce lnet_processid for ksock_peer_ni 04/42104/13
Mr NeilBrown [Fri, 8 May 2020 00:53:53 +0000 (10:53 +1000)]
LU-10391 lnet: introduce lnet_processid for ksock_peer_ni

struct lnet_processid (without the '_') is like lnet_process_id, but
contains a 'struct lnet_nid' rather than lnet_nid_t.

So far it is only used for ksnp_id in struct ksock_peer_ni, and
related functions.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1fea693b1c84ca4c3ac1821f55874ad11519a33b
Reviewed-on: https://review.whamcloud.com/42104
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 socklnd: factor out key calculation for ksnd_peers 03/42103/13
Mr NeilBrown [Mon, 9 Mar 2020 03:13:05 +0000 (14:13 +1100)]
LU-10391 socklnd: factor out key calculation for ksnd_peers

The hash_table library requires a "long" to be used as a key.  We
currently provide the nid, which at 64bits is a suitable long on 64bit
hosts, but isn't really correct on 32bit hosts.

When we change to an extended nid (which is 160bits) it will be even
less appropriate.

So create a separate function to compute a 'long' key, and implement
by simply xoring 'long'-sized parts of the nid together.  On a 64bit
machine, this is currently optimized away for lnet_nid_t, but that
will change when we convert to struct lnet_nid.

This new function is placed in lnet-types.h as it will be more
generally useful later.

The hash_table library calls hash_long() on the key, so we don't need
to do anything more interesting than xoring.
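
The folding described above can be sketched in user space as follows.
fold_to_long() is a hypothetical name, the sketch uses unsigned long to
avoid sign issues, and zero-padding a trailing partial chunk is an
assumption made here for sizes that are not a multiple of sizeof(long).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Folds an arbitrary-length address into one unsigned long by xoring
 * long-sized chunks together. A trailing partial chunk is zero-padded. */
static unsigned long fold_to_long(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	unsigned long key = 0;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		unsigned long chunk = 0;
		size_t n = len < sizeof(chunk) ? len : sizeof(chunk);

		memcpy(&chunk, p, n); /* avoids unaligned loads */
		key ^= chunk;
		p += n;
		len -= n;
	}
	return key;
}
```

When the input is exactly one long, the loop runs once and returns the
value unchanged, which matches the remark that the computation is
optimized away for a 64bit nid on a 64bit machine.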

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I22c59a87c9872bb59a2f47c2a8c57b287ed53ed3
Reviewed-on: https://review.whamcloud.com/42103
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lp_disc_*_nid to struct lnet_nid 20/44620/5
Mr NeilBrown [Tue, 22 Jun 2021 05:24:42 +0000 (15:24 +1000)]
LU-10391 lnet: change lp_disc_*_nid to struct lnet_nid

Change lp_disc_src_nid and lp_disc_dst_nid in struct lnet_peer to
struct lnet_nid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0f127fbc790c0821900d7b8abfa56c1a7de8f944
Reviewed-on: https://review.whamcloud.com/44620
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lp_primary_nid to struct lnet_nid 02/42102/13
Mr NeilBrown [Wed, 18 Aug 2021 20:31:46 +0000 (16:31 -0400)]
LU-10391 lnet: change lp_primary_nid to struct lnet_nid

Change lp_primary_nid in struct lnet_peer to struct lnet_nid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I386e85062257f6f8832ffdf4b9603c0e1c072dae
Reviewed-on: https://review.whamcloud.com/42102
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lpni_nid in lnet_peer_ni to lnet_nid 01/42101/13
Mr NeilBrown [Mon, 6 Apr 2020 06:31:55 +0000 (16:31 +1000)]
LU-10391 lnet: change lpni_nid in lnet_peer_ni to lnet_nid

lpni_nid in 'struct lnet_peer_ni' is converted to 'struct lnet_nid'
and various supporting functions updated.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7a99f758b600a0dd0668edd368663ff65f603486
Reviewed-on: https://review.whamcloud.com/42101
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: add string formating/parsing for IPv6 nids 42/43942/8
Mr NeilBrown [Tue, 8 Jun 2021 04:02:14 +0000 (14:02 +1000)]
LU-10391 lnet: add string formating/parsing for IPv6 nids

New entries for struct netstrfns:
  nf_addr2str_size
  nf_str2addr_size
which accept or report the size of the address in bytes.
New matching functions that can report or parse IPv4 and IPv6
addresses.

New interface - currently unused - libcfs_strnid() which takes a str
and provides a 'struct lnet_nid' with appropriate nid_size.
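
As an illustration of size-aware formatting, the sketch below picks the
address family from the address size and formats with the standard
inet_ntop(). The helper name addr_to_str() is hypothetical and is not
one of the netstrfns entries added by this patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Formats a raw network-byte-order address of addr_len bytes into buf.
 * Returns buf on success, NULL for an unsupported size or a buffer
 * that is too small. */
static const char *addr_to_str(const void *addr, size_t addr_len,
			       char *buf, size_t buf_len)
{
	int family;

	assert(addr != NULL && buf != NULL);
	if (addr_len == 4)
		family = AF_INET;
	else if (addr_len == 16)
		family = AF_INET6;
	else
		return NULL;

	return inet_ntop(family, addr, buf, (socklen_t)buf_len);
}
```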

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Idbfc8bb9502192e1dc6a217750e7f4431e3eca4a
Reviewed-on: https://review.whamcloud.com/43942
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: introduce struct lnet_nid 00/42100/26
Mr NeilBrown [Fri, 20 Aug 2021 14:17:35 +0000 (10:17 -0400)]
LU-10391 lnet: introduce struct lnet_nid

LNet nids are currently limited to 4-bytes for addresses.
This excludes the use of IPv6.

In order to support IPv6, introduce 'struct lnet_nid' which can hold
an address of up to 128 bits and is extensible, and deprecate
'lnet_nid_t'.  lnet_nid_t will eventually be removed.  Where lnet_nid_t
is often passed around by value, 'struct lnet_nid' will normally be
passed around by reference as it is over twice as large.

The net_type field, which currently has value up to 16, is now limited
to 0-254 with 255 being used as a wildcard.  The most significant byte
is now a size field which gives the size of the whole nid minus 8.  So
zero is correct for current nids with 4-byte addresses.

Where we still need to use 4-byte-address nids, we will use names
containing "nid4".  So "nid4" is a lnet_nid_t when "nid" is a struct
lnet_nid.  lnet_nid_to_nid4 converts a 'struct lnet_nid' to an
lnet_nid_t.

While lnet_nid_t is stored and often transmitted in host-endian format
(and possibly byte-swapped on receipt), 'struct lnet_nid' is always
stored in network-byte-order (i.e.  big-endian).  This is the more
common approach for network addresses.
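
A sketch of a layout consistent with the sizes stated above: 8 bytes in
total for a 4-byte address, a size field holding "whole nid minus 8",
and big-endian storage. The struct name, field names and exact field
order are assumptions for illustration, not the definition added by
this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout: 4 header bytes followed by up to 16 address
 * bytes. Multi-byte values are kept as raw big-endian bytes so the
 * sketch does not depend on host byte order. */
struct sketch_nid {
	uint8_t size;     /* whole-nid size minus 8; 0 for 4-byte addresses */
	uint8_t type;     /* 0-254, with 255 reserved as the wildcard */
	uint8_t num[2];   /* big-endian net number */
	uint8_t addr[16]; /* address bytes in network order */
};

/* Builds a nid from addr_len address bytes (4 for IPv4, 16 for IPv6). */
static struct sketch_nid sketch_nid_make(uint8_t type, uint16_t num,
					 const uint8_t *addr, size_t addr_len)
{
	struct sketch_nid nid;

	assert(addr != NULL && addr_len >= 4 && addr_len <= sizeof(nid.addr));
	memset(&nid, 0, sizeof(nid));
	nid.size = (uint8_t)(4 + addr_len - 8); /* header is 4 bytes */
	nid.type = type;
	nid.num[0] = (uint8_t)(num >> 8);
	nid.num[1] = (uint8_t)(num & 0xff);
	memcpy(nid.addr, addr, addr_len);
	return nid;
}
```

With this layout a 4-byte address gives a size field of 0 and a 16-byte
address gives 12, and the full structure is 20 bytes, which is over
twice the 8 bytes of a 64bit nid.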

In this first instance, 'struct lnet_nid' is used for ni_nid in
'struct lnet_ni', and related support functions.

In particular libcfs_nidstr() is introduced which parallels
libcfs_nid2str(), but takes 'struct lnet_nid'.

In cases where we need to have similar functions for old and new style
nid, the new function is introduced with a slightly different name,
such as libcfs_nidstr above, or LNET_NID_NET (like LNET_NIDNET).
It will be confusing having both, but the plan is to remove the old
names as soon as practical.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4dcf1bab856621915b6535958d77cdde89105d96
Reviewed-on: https://review.whamcloud.com/42100
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14930 mdt: abort_recov_mdt shouldn't abort client recovery 10/44610/2
Mikhail Pershin [Wed, 11 Aug 2021 14:30:48 +0000 (17:30 +0300)]
LU-14930 mdt: abort_recov_mdt shouldn't abort client recovery

When abort_recov_mdt is set to abort MDT-MDT recovery, the
abort_recovery flag is also set inside the
target_stop_recovery_thread() call. That aborts not just MDT-MDT
recovery but client/MDT recovery as well.

Fixes: dd9e79b64d ("LU-12546 mdt: abort recovery between MDTs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ibda05e91a2da90156e2b6c9fdcb2169cdbd50fe4
Reviewed-on: https://review.whamcloud.com/44610
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14093 tests: silence gcc10 error for badarea_io 70/44670/4
James Simmons [Mon, 16 Aug 2021 17:02:15 +0000 (13:02 -0400)]
LU-14093 tests: silence gcc10 error for badarea_io

With gcc10 badarea_io will fail to build with the following error.

badarea_io.c: In function ‘main’:
badarea_io.c:59:7: error: ‘write’ reading 2097152 bytes from a
                           region of size 4 [-Werror=stringop-overflow=]
   59 |  rc = write(fd, &fd, 2UL*1024*1024);
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Talking to Oleg, he stated this is done this way on purpose.
So instead of 'fixing' the issue, in this case we silence the gcc
warning.

Test-Parameters: trivial
Test-Parameters: env=ONLY=133f,133g testlist=sanity
Change-Id: Iee79c7988cc209fd099c23c38a8bd7df96015b05
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14912 obdclass: prefer T10 checksum if the target supports it 57/44657/2
Li Dongyang [Fri, 13 Aug 2021 08:58:46 +0000 (18:58 +1000)]
LU-14912 obdclass: prefer T10 checksum if the target supports it

If the target actually has T10PI support, we prefer to use that
T10 checksum even if it's not the fastest on the client, provided
checksum_type is not explicitly set.
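
The preference order described above can be sketched as a small
selection function. The CK_* identifiers and pick_checksum() are
hypothetical and do not correspond to the obdclass checksum flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative checksum identifiers; CK_NONE means "not explicitly set". */
enum { CK_NONE = 0, CK_CRC32C = 1, CK_T10 = 2 };

/* Order of preference: explicit setting, then T10 when the target has
 * T10PI support, then whatever is fastest on the client. */
static int pick_checksum(int explicit_type, bool target_has_t10pi,
			 int fastest)
{
	assert(fastest != CK_NONE);
	if (explicit_type != CK_NONE)
		return explicit_type; /* an explicit checksum_type wins */
	if (target_has_t10pi)
		return CK_T10;        /* preferred even if not the fastest */
	return fastest;
}
```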

Change-Id: If91217881fcadbc84d1e360e65648344f5ac2447
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/44657
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
2 years agoLU-14928 mgs: allow md target re-register 94/44594/5
Alexander Zarochentsev [Sun, 30 May 2021 13:43:05 +0000 (16:43 +0300)]
LU-14928 mgs: allow md target re-register

In a DNE system, it is not safe to do writeconf of
an MD target and attempt to mount (and re-register) it again,
as it creates weird MDT-MDT osp devices like
"fsname-MDT0001-osp-MDT0001" and makes the system non-functional.
The fix doesn't allow creation of such illegal devices.
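
To illustrate what makes such a device illegal, the sketch below builds
an osp device name in the style quoted above and refuses the case where
both MDT indices are the same. build_osp_name() and its error codes are
hypothetical and are not the MGS code changed by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes "<fsname>-MDTxxxx-osp-MDTyyyy" into buf, where xxxx is the
 * target MDT index and yyyy the local MDT index. Returns 0 on success,
 * -EINVAL when both indices are equal, -ENAMETOOLONG on truncation. */
static int build_osp_name(char *buf, size_t len, const char *fsname,
			  unsigned int target_idx, unsigned int local_idx)
{
	int n;

	assert(buf != NULL && fsname != NULL && strlen(fsname) > 0);
	if (target_idx == local_idx)
		return -EINVAL; /* an MDT must not get an osp to itself */

	n = snprintf(buf, len, "%s-MDT%04x-osp-MDT%04x",
		     fsname, target_idx, local_idx);
	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}
```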

HPE-bug-id: LUS-10098
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I698ee6d70ac96f54eaec57b5c5fe553d130ba011
Reviewed-on: https://review.whamcloud.com/44594
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14926 utils: print unlink and setattr recs in llog_reader 91/44591/5
Alexander Zarochentsev [Fri, 16 Jul 2021 19:16:29 +0000 (22:16 +0300)]
LU-14926 utils: print unlink and setattr recs in llog_reader

Enhance llog_reader to print unlink and setattr llog records
correctly.

HPE-bug-id: LUS-10220
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I7b44f65c976459d143521185a807939524f67fa2
Reviewed-on: https://review.whamcloud.com/44591
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14924 osd-ldiskfs: fix T10PI verify/generate_fn 48/44548/4
Li Dongyang [Tue, 10 Aug 2021 13:20:13 +0000 (23:20 +1000)]
LU-14924 osd-ldiskfs: fix T10PI verify/generate_fn

We are making wrong assumptions in the verify/generate_fn
of T10PI.

Consider this case: we have 4 pages lnb[0-3] in osd_iobuf.
lnb[2] is mapped to a hole, so it won't be added to bio.
If lnb[3] happens to be contiguous after lnb[1], lnb[3] will
be added to bio, with a bi_idx of 2.
In the verify/generate_fn, we work out which niobuf_local
to feed the guard tags to using bi_idx and obp_start_page_idx
and we will end up with wrong niobuf and set the guard tags
for lnb[2].

Contiguous blocks in bio don't necessarily mean we are looking
at contiguous niobuf_local/lnb in osd_iobuf->dr_lnbs.

Test-Parameters: env=ONLY=77n testlist=sanity
Change-Id: I1ea1b6498692044e680c8754cd31e2c2b7bc9539
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/44548
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14160 tests: fix fsx logdump to fit in 80 chars 10/44510/3
Andreas Dilger [Thu, 5 Aug 2021 20:45:46 +0000 (14:45 -0600)]
LU-14160 tests: fix fsx logdump to fit in 80 chars

Fix fsx logdump fallocate/truncate lines to fit within 80 columns.
Remove spurious leading 0 for every operation length.

Test-Parameters: trivial testlist=sanityn env=ONLY=16
Fixes: cb037f305c64 ("LU-14160 fallocate: Add punch mode to fallocate")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I93460b62be8611926e620241232d886dee3ebbe5
Reviewed-on: https://review.whamcloud.com/44510
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14773 tests: quiet down some verbose messages 34/44034/5
Andreas Dilger [Fri, 18 Jun 2021 21:35:29 +0000 (15:35 -0600)]
LU-14773 tests: quiet down some verbose messages

Don't print anything into the test logs for normal background
operations that are run as part of run_one(), so that they
don't clutter the test output with repeated/useless messages.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib6a49fc268e4cd0ad92c71a391865ce2d73ebbe5
Reviewed-on: https://review.whamcloud.com/44034
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14321 tests: create PFL file in sanityn 51b 27/44027/4
James Nunez [Thu, 17 Jun 2021 23:25:55 +0000 (17:25 -0600)]
LU-14321 tests: create PFL file in sanityn 51b

sanityn test 51b was modified to integrate the statx API in
Lustre version 2.13.54.  When we run version interop testing
with servers older than 2.13.54 and later clients, the test
will fail.

Modify the test to create a PFL file without the
'extension-size' lfs setstripe option, which allows this
test to run with servers older than 2.13.54.

Fixes: 3f7853b31ef6 ("LU-10934 llite: integrate statx() API with Lustre")
Test-Parameters: trivial
Test-Parameters: serverversion=2.12.7 serverdistro=el7.9 env=ONLY=51b testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ic3feb72771aa2db050b792159175624260e71f5b
Reviewed-on: https://review.whamcloud.com/44027
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
2 years agoLU-12848 tests: link succeeded to an orphan remote object 91/35991/14
Alexander Zarochentsev [Mon, 12 Aug 2019 20:59:05 +0000 (23:59 +0300)]
LU-12848 tests: link succeeded to an orphan remote object

An open file gets unlinked by rename while,
at the same time, a cross-MDT link is able to create a name
for the dying object. That causes file system corruption,
seen as a failed attempt to remove the test dir; e2fsck
would also report an unconnected inode.

Cray-bug-id: LUS-6208
Test-Parameters: mdtcount=2 envdefinitions=ONLY=111 testlist=sanityn
Change-Id: Ic1fde278e5f4b53eaf5560ab50fe460d8c7f7dc3
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/35991
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11303 quota: enforce block quota for chgrp 96/33996/17
Hongchao Zhang [Fri, 2 Apr 2021 06:53:59 +0000 (14:53 +0800)]
LU-11303 quota: enforce block quota for chgrp

In patch https://review.whamcloud.com/30146 "LU-5152 quota: enforce
block quota for chgrp", problems were introduced due to synchronous
requests from the MDS to the OSS to change the quota assignment of
files during chgrp operations. However, in some cases, the OSTs are
themselves out of grant and may send a quota request to the MDS,
which may result in a deadlock. Another issue is the slow performance
caused by the synchronous operation between MDT and OSTs.

This patch drops the synchronous RPC requirement of the original
patch #30146 to avoid this problem.

Previously, problems in quota tracking related to chgrp were introduced
due to synchronous RPCs from the MDS to the OSS when changing the group
ownership of objects for quota tracking, since the following commit:
Fixes: 8a71fd5061b ("LU-5152 quota: enforce block quota for chgrp")

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I40556b9e8a0628eb18aa806d2f6b3dfb9b53e874
Reviewed-on: https://review.whamcloud.com/33996
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoNew tag 2.14.54 2.14.54 v2_14_54
Oleg Drokin [Wed, 18 Aug 2021 14:31:36 +0000 (10:31 -0400)]
New tag 2.14.54

Change-Id: I062c9dc76585f42edfa78108f286824e75badf8c
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14093 lutf: fix build with gcc10 84/44484/7
James Simmons [Wed, 4 Aug 2021 13:39:19 +0000 (09:39 -0400)]
LU-14093 lutf: fix build with gcc10

The new LUTF code has build issues with gcc10. I see the following
build errors.

ld: lutf-lutf_listener.o:lutf.h:88: multiple definition of `g_lutf_cfg'
ld: lutf-lutf_listener.o:lutf.h:22: multiple definition of `debugtimestr'
ld: lutf-lutf_listener.o:lutf.h:21: multiple definition of `di'
ld: lutf-lutf_listener.o:lutf.h:20: multiple definition of `debugnow'

In function ‘snprintf’,
    inlined from ‘python_run_interactive_shell’ at lutf_python.c:45:2:
stdio2.h:71:10: error: ‘%s’ directive argument is null [-Werror=format-truncation=]
   71 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   72 |        __glibc_objsize (__s), __fmt,
      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   73 |        __va_arg_pack ());
      |        ~~~~~~~~~~~~~~~~~

This patch resolves these warnings. Without this patch LUTF will
not build on Ubuntu 20 LTS.
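
The "multiple definition" errors are typical of GCC 10 defaulting to
-fno-common, where a variable defined in a header is emitted by every
file that includes it. A minimal sketch of the usual remedy follows; it
is not necessarily what this patch does. The struct layout and the two
accessor functions are hypothetical, and the header and source file are
collapsed into one listing:

```c
#include <assert.h>

/* Would live in the shared header: declaration only, no storage. */
struct lutf_cfg_sketch {	/* hypothetical layout, for illustration */
	int verbosity;
};
extern struct lutf_cfg_sketch g_lutf_cfg;

/* Would live in exactly one .c file: the single definition. */
struct lutf_cfg_sketch g_lutf_cfg = { .verbosity = 1 };

/* Any file that includes the header can read the shared object. */
int lutf_verbosity(void)
{
	return g_lutf_cfg.verbosity;
}

/* Hypothetical setter, showing that all users see the same storage. */
void lutf_set_verbosity(int level)
{
	assert(level >= 0);
	g_lutf_cfg.verbosity = level;
}
```

With the definition in a single translation unit, the linker sees one
copy of the symbol regardless of how many files include the header.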

Test-Parameters: trivial
Change-Id: Ie3c99f8c6cf2f5de583dc95a0dc63fcde1aa6ffd
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44484
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14903 doc: update lfs-setdirstripe man page 81/44481/2
Lai Siyao [Mon, 2 Aug 2021 11:55:12 +0000 (07:55 -0400)]
LU-14903 doc: update lfs-setdirstripe man page

Update lfs-setdirstripe man page to reflect the change of
filesystem-wide default directory layout.

Test-parameters: trivial

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I1e7818679e057add4747565a2fc850e1857cd7b0
Reviewed-on: https://review.whamcloud.com/44481
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoLU-14899 ldiskfs: Add 5.4.136 mainline kernel support 50/44450/2
Oleg Drokin [Sat, 31 Jul 2021 04:55:40 +0000 (00:55 -0400)]
LU-14899 ldiskfs: Add 5.4.136 mainline kernel support

The changes likely appeared in an earlier release
that we may also track down and update to.

Test-Parameters: trivial
Change-Id: I92125087650109b8cc8a968b2fd95ba5f8e7f998
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44450
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
2 years agoLU-12815 socklnd: set conns_per_peer based on link speed 17/44417/4
Serguei Smirnov [Wed, 28 Jul 2021 21:47:39 +0000 (14:47 -0700)]
LU-12815 socklnd: set conns_per_peer based on link speed

Specifying conns_per_peer=0 for an NI now sets
conns_per_peer as a function of the corresponding link speed,
as follows:
conns_per_peer = (ilog2(Gbps) / 2 + 1)

Listed below are the resulting defaults for common link speeds:
        100Gbps, 200Gbps -> 4
        50Gbps           -> 3
        5Gbps, 10Gbps    -> 2
        less than 4Gbps  -> 1
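
The formula and the defaults listed above can be checked with a small
userspace sketch. The function names are hypothetical, ilog2_u32() is a
stand-in for the kernel's ilog2(), and the treatment of a zero link
speed is an assumption rather than something the patch states:

```c
#include <assert.h>

/* floor(log2(v)) for v >= 1; userspace stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int log = 0;

	assert(v > 0);
	while (v >>= 1)
		log++;
	return log;
}

/* Hypothetical helper: map a link speed in Gbps to conns_per_peer. */
unsigned int conns_per_peer_for_speed(unsigned int gbps)
{
	/* Assumption: an unknown (zero) speed falls back to one connection. */
	if (gbps == 0)
		return 1;
	/* Integer division, as in the formula: ilog2(Gbps) / 2 + 1. */
	return ilog2_u32(gbps) / 2 + 1;
}
```

For example, 100Gbps gives ilog2(100) = 6, and 6 / 2 + 1 = 4, which
matches the first row of the list.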

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ief2b33a796c180d8669bd5796b3e35ec748423a5
Reviewed-on: https://review.whamcloud.com/44417
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7] 76/44376/3
Jian Yu [Thu, 22 Jul 2021 07:26:50 +0000 (00:26 -0700)]
LU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7]

Update RHEL7.9 kernel to 3.10.0-1160.36.2.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: Ie2898b1df28c8b99ea4099e94baafe388c6aa626
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44376
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14865 utils: llog_reader.c printf type mismatch 46/44346/5
Gian-Carlo DeFazio [Tue, 20 Jul 2021 00:30:36 +0000 (17:30 -0700)]
LU-14865 utils: llog_reader.c printf type mismatch

Add (unsigned long long) cast to results of
__le64_to_cpu so that it matches the formatting (%llu)
of the enclosing printf call.

Build log message:
"llog_reader.c:887:9: error: format '%llu' expects
argument of type 'long long unsigned int', but
argument 3 has type '__u64' [-Werror=format=]"
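
The cast described above can be sketched in userspace as follows.
uint64_t stands in for __u64 here, and format_u64() is a hypothetical
helper, not a function from llog_reader.c:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value with %llu. On some platforms a 64-bit type is
 * "unsigned long" rather than "unsigned long long", so the explicit
 * cast keeps the argument type matched to the format specifier.
 */
int format_u64(char *buf, size_t len, uint64_t value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```

The cast does not change the value, since both types are at least
64 bits wide; it only removes the type mismatch the compiler reports.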

Test-Parameters: trivial
Fixes: 9962d6f84db5 ("LU-14617 utils: llog_reader updatelog support")
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I9549e0a0bd21727dfcc42992b693bc39a779e1a1
Reviewed-on: https://review.whamcloud.com/44346
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-9859 lnet: fold lprocfs_call_handler functionality into lnet_debugfs_* 09/44309/4
Mr. NeilBrown [Wed, 4 Aug 2021 17:27:29 +0000 (13:27 -0400)]
LU-9859 lnet: fold lprocfs_call_handler functionality into lnet_debugfs_*

The calling convention for ->proc_handler is rather clumsy,
as a comment in fs/proc/proc_sysctl.c confirms.
Lustre has copied this convention to lnet_debugfs_{read,write},
and then provided a wrapper for handlers - lprocfs_call_handler -
to work around the clumsiness.

It is cleaner to just fold the functionality of lprocfs_call_handler()
into lnet_debugfs_* and let them call the final handler directly.

If these files were ever moved to /proc/sys (which seems unlikely) the
handling in fs/proc/proc_sysctl.c would need to be fixed too, but
that would not be a bad thing.

So modify all the functions that did use the wrapper to not need it
now that a more sane calling convention is available.

Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Change-Id: I548ed6a3179cdb7cd5c024febd3fee4709285a82
Reviewed-on: https://review.whamcloud.com/44309
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14787 libcfs: Provide an abstraction for AS_EXITING 70/44070/6
Shaun Tancheff [Thu, 22 Jul 2021 07:31:30 +0000 (02:31 -0500)]
LU-14787 libcfs: Provide an abstraction for AS_EXITING

Linux kernel v3.14-7405-g91b0abe36a7b added the AS_EXITING flag.
The AS_EXITING flag is set while an address_space mapping is exiting.

Provide an abstraction mapping_clear_exiting() to clear
the AS_EXITING flag. This balances the kernel's mapping_set_exiting()
and is used for older kernels when enum mapping_flags does
not include AS_EXITING.

HPE-bug-id: LUS-9977
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib3101b7e3eb8a7fcfd0012ac27367f1e65537f5d
Reviewed-on: https://review.whamcloud.com/44070
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1] 37/44037/2
Jian Yu [Sat, 19 Jun 2021 00:26:07 +0000 (17:26 -0700)]
LU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1]

Update SLES12 SP5 kernel to 4.12.14-122.74.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: I98952c097b14c68f744a570e5558fb21d9392ad2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44037
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14773 tests: skip check_network() on working node 33/44033/4
Andreas Dilger [Fri, 18 Jun 2021 20:55:51 +0000 (14:55 -0600)]
LU-14773 tests: skip check_network() on working node

Don't call check_network() (which can take several seconds per node)
if the get_param command ran successfully on all of the nodes.  The
get_param success implies the connection to the remote nodes works
properly, and completes more quickly.

For consistency with previous behavior, still call check_network() if
get_param didn't return any output, since the modules may be unloaded.

Remove some extra visual clutter from every subtest.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6a11cf8a1a6b43bebc3ff8f5506e1faac13ebbe5
Reviewed-on: https://review.whamcloud.com/44033
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14668 lnet: Lock primary NID logic 63/43563/5
Amir Shehata [Wed, 5 May 2021 18:35:06 +0000 (11:35 -0700)]
LU-14668 lnet: Lock primary NID logic

If a peer is created by Lustre, make sure to lock that peer's
primary NID. This peer can be discovered in the background.
There is no need to block until discovery is complete, as Lustre
can continue on with the primary NID it provided.

Discovery will populate the peer with other interfaces the peer has
but will not change the peer's primary NID. It can also delete
a peer's NIDs that Lustre told it about (but not the primary NID).

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I677b8e01fc89a42128327645861ca6cfba4c1b1a
Reviewed-on: https://review.whamcloud.com/43563
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14668 lnet: peer state to lock primary nid 62/43562/5
Amir Shehata [Wed, 5 May 2021 01:20:54 +0000 (18:20 -0700)]
LU-14668 lnet: peer state to lock primary nid

Introduce the following two peer states:

LNET_PEER_LOCK_PRIMARY, set by Lustre to lock the primary NID
of a peer to the NID Lustre is configured with

LNET_PEER_BAD_CONFIG, set by LNet if Lustre attempts to set
a peer's Primary NID to a NID used as the primary NID of another
peer

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I8c55e90ad2abd083c2fc902a04d4cd06a3412bfa
Reviewed-on: https://review.whamcloud.com/43562
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14661 obdclass: Add peer/peer NI when processing llog 10/43510/6
Chris Horn [Thu, 3 Sep 2020 20:06:08 +0000 (15:06 -0500)]
LU-14661 obdclass: Add peer/peer NI when processing llog

Construct peers when processing the config log so that LNet has
complete information about peer info stored in the config log.

These are "temporary" peers which can be overwritten by discovery.

In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to take
the LNet API mutex when adding/modifying peers. We don't want to take
the mutex while a spin lock is already being held, so drop the spin
lock prior to calling class_add_nids_to_uuid().

HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie0e35434c9b76f917c1448064c5217c821b1ad87
Reviewed-on: https://review.whamcloud.com/43510
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14661 lnet: Provide kernel API for adding peers 09/43509/5
Chris Horn [Wed, 2 Sep 2020 20:07:25 +0000 (15:07 -0500)]
LU-14661 lnet: Provide kernel API for adding peers

Implement LNetAddPeer() API to allow other kernel modules to add
peers to LNet.

Peers created via this API are not marked as having been configured
by DLC. As such, they can be overwritten by discovery.

Test-Parameters: trivial
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibb057f702ea29d60233fbd1680d8caec98064d5d
Reviewed-on: https://review.whamcloud.com/43509
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14531 osd: serialize access to object vs object destroy 33/43233/18
Alex Zhuravlev [Thu, 18 Mar 2021 08:43:06 +0000 (11:43 +0300)]
LU-14531 osd: serialize access to object vs object destroy

in osd-zfs as ZFS doesn't provide an internal mechanism for this.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5f25710a5cf1568f124733a15e77a37ffcb55434
Reviewed-on: https://review.whamcloud.com/43233
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12815 socklnd: allow dynamic setting of conns_per_peer 63/41463/13
Serguei Smirnov [Mon, 2 Aug 2021 14:48:35 +0000 (10:48 -0400)]
LU-12815 socklnd: allow dynamic setting of conns_per_peer

Modify lnetctl and associated code to allow dynamic setting
of conns_per_peer lnd parameter per ni.

The parameter can be set for a specific active nid:
        lnetctl net set --nid 192.168.122.10@tcp --conns-per-peer=4

Or when adding a new net, taking effect on the new nid:
        lnetctl net add --net tcp --if eth0 --conns-per-peer=1

By default, the conns_per_peer value specified as the module parameter
is used.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I11625b9ad61f0311c294001a38b7855465491aaf
Reviewed-on: https://review.whamcloud.com/41463
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14093 mgc: rework mgc_apply_recover_logs() for gcc10 84/40484/8
Alex Zhuravlev [Tue, 3 Aug 2021 14:15:10 +0000 (10:15 -0400)]
LU-14093 mgc: rework mgc_apply_recover_logs() for gcc10

rework mgc_apply_recover_logs() to use a separate buffer of
appropriate size so that gcc10 doesn't complain:
mgc_request.c:1506:24: error: argument 4 may overlap destination
        object [-Werror=restrict]
 1506 |        pos += sprintf(obdname + pos, "-%s-%s", cname, inst);
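
The idea of formatting into a separate, adequately sized buffer can be
sketched as follows. build_obd_name() and its parameters are
hypothetical and do not reproduce the actual code in
mgc_apply_recover_logs(); the point is only that the destination is
distinct from every source string and that the length is checked:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<fsname>-<cname>-<inst>" in a buffer that
 * is separate from all three source strings, so that source and
 * destination can never overlap.
 * Returns the string length, or -1 if the buffer is too small.
 */
int build_obd_name(char *dst, size_t dst_len, const char *fsname,
		   const char *cname, const char *inst)
{
	int len;

	assert(dst != NULL && dst_len > 0);
	len = snprintf(dst, dst_len, "%s-%s-%s", fsname, cname, inst);
	if (len < 0 || (size_t)len >= dst_len)
		return -1;
	return len;
}
```

Using snprintf() with an explicit size also turns a potential overflow
into a detectable error instead of silent memory corruption.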

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ice863b412475e53705dc6523ab30ba613244bd90
Reviewed-on: https://review.whamcloud.com/40484
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-6142 tests: remove iam_ut binary 09/44509/3
Andreas Dilger [Thu, 5 Aug 2021 20:21:44 +0000 (14:21 -0600)]
LU-6142 tests: remove iam_ut binary

Remove iam_ut binary that was incorrectly committed many years ago.

Test-Parameters: trivial
Fixes: 6e679230f2f5 ("LU-6142 tests: Remove file iam_ut.c")
Fixes: d2d56f38da01 ("make HEAD from b_post_cmd3")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2c254d990a3f07cad4feb7969e646d856b3ebbe5
Reviewed-on: https://review.whamcloud.com/44509
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14876 out: don't connect to busy MDS-MDS export 90/44390/5
Mikhail Pershin [Wed, 21 Jul 2021 15:14:01 +0000 (18:14 +0300)]
LU-14876 out: don't connect to busy MDS-MDS export

The MDS-MDS connection is missing a check for busy requests upon
reconnect, so a resent request can be executed concurrently with the
original request.

- in ptlrpc_server_check_resend_in_progress() remove the exception
  for bulk requests, since they can be compared by XID nowadays.
  This also prevents an OUT request from executing concurrently
  with its resend.
- fix messages in target_handle_connect() to report correct
  information about connection details
- in out_handle() check for last_xid only once per OUT_UPDATE
- test 110m is added to recovery-small to reproduce the issue

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I2ad183674d59a2cdeab0037bd8551c607b10ffeb
Reviewed-on: https://review.whamcloud.com/44390
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14798 lustre: Support RDMA only pages 11/44111/2
Amir Shehata [Thu, 6 Feb 2020 04:23:20 +0000 (20:23 -0800)]
LU-14798 lustre: Support RDMA only pages

Some memory architectures and CPU-offload cards with
on-board memory do not map data pages into the CPU
address space. Allow RDMA of data directly into those
pages without accessing contents.

Therefore, skip checksum calculation on
these types of pages.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I189c34893ffa500ed275f2a1f79e8fb817a2489d
lustre-change: https://review.whamcloud.com/37454
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44111
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14798 lnet: add LNet GPU Direct Support 10/44110/2
Amir Shehata [Thu, 6 Feb 2020 03:14:17 +0000 (19:14 -0800)]
LU-14798 lnet: add LNet GPU Direct Support

This patch exports registration/unregistration functions
which are called by the NVFS module to let the LND know
that it can call into the NVFS module to do RDMA mapping
of GPU shadow pages.

GPU priority is considered during NI selection.

Writes smaller than 4K are always RDMAed if the RDMA source is
the GPU device.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I2bfdbdd5fe3b8536e616ab442d18deace6756d57
lustre-change: https://review.whamcloud.com/37368
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44110
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14893 lctl: check user for changelog_deregister 32/44432/3
Emoly Liu [Fri, 30 Jul 2021 08:13:12 +0000 (16:13 +0800)]
LU-14893 lctl: check user for changelog_deregister

If no user is specified for "lctl changelog_deregister", usage
should be printed correctly.
Also, sanity.sh test_106e is modified to verify this fix.

Fixes: a15eb4f13224e ("LU-13055 mdd: per-user changelog names and mask")
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ia7f1b18e82f6b4174b9435cd67aba5f591d43ce1
Reviewed-on: https://review.whamcloud.com/44432
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14881 libcfs: Complete testing for tcp_sock_set_* 74/44374/3
Shaun Tancheff [Thu, 22 Jul 2021 08:58:44 +0000 (03:58 -0500)]
LU-14881 libcfs: Complete testing for tcp_sock_set_*

Linux commits:
  v5.7-rc6-2504-gddd061b8daed
  tcp: add tcp_sock_set_quickack

  v5.7-rc6-2508-gd41ecaac903c
  tcp: add tcp_sock_set_keepintvl

  v5.7-rc6-2509-g480aeb9639d6
  tcp: add tcp_sock_set_keepcnt

These commits introduced a series of helper functions that may be
backported individually.

Test-Parameters: trivial
Fixes: 99d9638d6c ("LU-13783 libcfs: support removal of kernel_setsockopt()")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4fce67b801979ec7857265b6bd0370c05737e268
Reviewed-on: https://review.whamcloud.com/44374
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14413 test: test for overstriping for sanity 27M 40/44340/8
James Simmons [Wed, 28 Jul 2021 00:10:29 +0000 (20:10 -0400)]
LU-14413 test: test for overstriping for sanity 27M

The introduction of sanity 27M broke interop with 2.12 LTS since
overstriping doesn't exist in that version. Adjust the test to
use overstriping if the client supports it, otherwise just use
traditional striping.

Test-Parameters: trivial testlist=sanity env=ONLY=27M
Change-Id: I2d788a116cbb749a83d6cec36f97d06533b32421
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44340
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14740 quota: reject invalid project id on server side 39/44339/2
Wang Shilong [Mon, 19 Jul 2021 07:14:43 +0000 (15:14 +0800)]
LU-14740 quota: reject invalid project id on server side

Do a sanity check before transferring the project ID, and reject an
invalid project ID if it comes from an older client.

Test-parameters: trivial testlist=sanity-quota

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: If89e320c7808d188e615f5f0923c2322774b2ceb
Reviewed-on: https://review.whamcloud.com/44339
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-8066 obdclass: move lu_ref to debugfs 11/44311/6
James Simmons [Tue, 20 Jul 2021 20:14:09 +0000 (16:14 -0400)]
LU-8066 obdclass: move lu_ref to debugfs

A special procfs file is created for lu_ref debugging. Let's move
this to debugfs where it belongs.

Also fixed a missed USE_LU_REF due to landing order as well as
a build fix.

Fixes: dfe2d225b86 ("LU-13799 clio: Implement real list splice")
Change-Id: I33646a87adfcabc5a5f214832953b2444e7aaf0a
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44311
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14790 lnet: Reflect ni_fatal in NI status 72/44072/2
Chris Horn [Thu, 24 Jun 2021 17:16:46 +0000 (12:16 -0500)]
LU-14790 lnet: Reflect ni_fatal in NI status

If the ni_fatal_error_on flag is set on an NI then that NI should be
considered down.

HPE-bug-id: LUS-10167
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I201bda7e06da1fb1cc23db70ce0cfa3118635d0f
Reviewed-on: https://review.whamcloud.com/44072
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14694 mdt: do not remove orphans at umount 83/43783/20
Alex Zhuravlev [Tue, 25 May 2021 15:39:14 +0000 (18:39 +0300)]
LU-14694 mdt: do not remove orphans at umount

as it's very likely that another MDT is being umounted as well
and such a removal can get stuck if the object being removed
is a striped directory.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0417b1b4447887e166c144605bbfa3249126eacd
Reviewed-on: https://review.whamcloud.com/43783
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9859 libcfs: discard cfs_cap_t, use kernel_cap_t 71/43171/9
Mr. NeilBrown [Wed, 14 Jul 2021 16:15:26 +0000 (12:15 -0400)]
LU-9859 libcfs: discard cfs_cap_t, use kernel_cap_t

Lustre only sends 32 bits of capabilities in on-the-wire RPC calls.
It currently strips off higher bits and uses a 32-bit cfs_cap_t
throughout.
Though there is a small memory cost, it is cleaner to use
kernel_cap_t throughout and only truncate when marshalling
data for RPC calls.

So this patch replaces cfs_cap_t with kernel_cap_t throughout,
and where a cfs_cap_t was previously stored in a __u32, we now
store cap.cap[0] instead.

With this, we can remove include/linux/libcfs/curproc.h

Linux-commit: 18f92a6e3d6bd00941ddfb5837835348f72d39dc

Change-Id: If7dd7a16c218dfc0d520e189f021ed6bda3b93fd
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/43171
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10973 lnet: LUTF Python infra 87/38087/50
Amir Shehata [Wed, 25 Mar 2020 02:23:43 +0000 (19:23 -0700)]
LU-10973 lnet: LUTF Python infra

Added the python LUTF infrastructure. The python infrastructure
provides the core LUTF feature set. The tests-infra is lnet
specific infrastructure to be used by LUTF test suites.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I1d0336606625424880f1b64b1dd296d4c7ed85ea
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38087
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10973 lnet: LUTF infrastructure updates 77/44177/3
Amir Shehata [Mon, 5 Jul 2021 18:17:16 +0000 (11:17 -0700)]
LU-10973 lnet: LUTF infrastructure updates

Fix Agent management
Handle python failures properly.
Change default location for temporary files to be in /tmp/lutf

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4e37b6226dfa12de4b7a1f5bfd87f84e91ee1dda
Reviewed-on: https://review.whamcloud.com/44177
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-6142 lustre: use list_first_entry() in lustre subdirectory. 38/44338/2
Mr. NeilBrown [Sun, 18 Jul 2021 12:57:33 +0000 (08:57 -0400)]
LU-6142 lustre: use list_first_entry() in lustre subdirectory.

Convert
  list_entry(foo->next .....)
to
  list_first_entry(foo, ....)

in 'lustre'

In several cases the call is combined with
a list_empty() test, and list_first_entry_or_null() is used instead.
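
The conversion can be illustrated with a userspace sketch. The macros
below are simplified stand-ins for the kernel's <linux/list.h> helpers
(the kernel versions add READ_ONCE() and type checking), and
struct request, list_init() and first_request_id() are hypothetical
names used only for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? list_first_entry(head, type, member) : NULL)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical item type used only for this illustration. */
struct request {
	int id;
	struct list_head link;
};

/* Return the id of the first queued request, or -1 if the queue is empty. */
int first_request_id(struct list_head *queue)
{
	struct request *req;

	assert(queue != NULL);
	/* Replaces: if (!list_empty(queue)) req = list_entry(queue->next, ...) */
	req = list_first_entry_or_null(queue, struct request, link);
	return req ? req->id : -1;
}
```

list_first_entry() states the intent directly and hides the ->next
dereference, while the _or_null variant folds the emptiness test into
the same expression.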

Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Change-Id: I27b8b55cac2cfeaf95bb66930958c49ad422156e
Reviewed-on: https://review.whamcloud.com/44338
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>