Whamcloud - gitweb
fs/lustre-release.git
2 years ago LU-14662 lnet: set eth routes needed for multi rail 65/44065/15
Serguei Smirnov [Wed, 23 Jun 2021 22:51:21 +0000 (15:51 -0700)]
LU-14662 lnet: set eth routes needed for multi rail

When ksocklnd is initialized or new ethernet interfaces
are added via lnetctl, set the routing rules using a common
shell script, ksocklnd-config. This ensures control over
the source interface when sending traffic.

For example, for eth0 with ip 192.168.122.142/24:
   the output of "ip route show table eth0" should be
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.142

This step can be omitted by specifying
   options ksocklnd skip_mr_route_setup=1
in the conf file, or by using switch
   --skip-mr-route-setup
when adding NI with lnetctl. Note that the module parameter
takes priority over the lnetctl switch: if skip-mr-route-setup
is not specified when adding NI with lnetctl, the route still
won't get created if the conf file has skip_mr_route_setup=1.

The route also won't be created if any route already exists
for the given interface, assuming advanced users who manage
routing on their own will want to continue doing so.
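
The rules above can be summarized as a small decision function. This is
only a sketch of the described behaviour, not code from the patch: the
function name and its parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: decide whether the routing rule for an
 * interface should be created when an NI is added. */
bool mr_route_setup_needed(bool modparam_skip, bool lnetctl_skip,
                           int existing_routes)
{
        /* skip_mr_route_setup=1 in the conf file takes priority
         * over whatever lnetctl was told. */
        if (modparam_skip)
                return false;
        /* --skip-mr-route-setup was given for this NI. */
        if (lnetctl_skip)
                return false;
        /* Leave interfaces with user-managed routes alone. */
        return existing_routes == 0;
}
```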

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ia14e637bd29d4bbce5dd93daad9992336b2e6b15
Reviewed-on: https://review.whamcloud.com/44065
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14448 lod: verify LOV early in lod_get_default_striping 70/45370/3
Lai Siyao [Sat, 23 Oct 2021 14:44:36 +0000 (10:44 -0400)]
LU-14448 lod: verify LOV early in lod_get_default_striping

lod_get_default_striping() gets both the default LOV and the default LMV
and parses them into struct lod_default_striping one by one. However, the
LOV and LMV data are both stored in lod_thread_info.lti_ea_store, so
lod_verify_striping() should verify the LOV as soon as it is fetched.
Otherwise, if both exist, it is the LMV that gets verified, which
returns -EINVAL.
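
A toy model of the ordering problem, assuming a single buffer that is
reused for both attributes. The types and function names below are
illustrative stand-ins and do not come from the Lustre sources.

```c
#include <errno.h>

enum ea_kind { EA_LOV = 1, EA_LMV = 2 };   /* hypothetical tags */

struct ea_store { enum ea_kind kind; };    /* stands in for lti_ea_store */

/* Stand-in for a LOV verifier: rejects anything that is not LOV. */
int verify_lov(const struct ea_store *buf)
{
        return buf->kind == EA_LOV ? 0 : -EINVAL;
}

/* verify_early != 0: verify LOV right after loading it.
 * verify_early == 0: verify only after LMV reused the buffer. */
int get_default_striping(int verify_early)
{
        struct ea_store buf;
        int rc = 0;

        buf.kind = EA_LOV;              /* load default LOV */
        if (verify_early)
                rc = verify_lov(&buf);
        buf.kind = EA_LMV;              /* load default LMV, same buffer */
        if (!verify_early)
                rc = verify_lov(&buf);
        return rc;
}
```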

Fixes: 6a08df2d0effc7a ("LU-14448 lod: verify LOV before set/inherit")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I9763d35bdbc74101fa8515d5096ec457a4cb3524
Reviewed-on: https://review.whamcloud.com/45370
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-9704 grant: ignore grant info on read resend 71/45371/5
Vladimir Saveliev [Wed, 3 Nov 2021 10:52:14 +0000 (13:52 +0300)]
LU-9704 grant: ignore grant info on read resend

The following scenario causes a message like "claims 28672 GRANT, real
grant 0" to appear:

 1. the client owns X grants and sends RPCs to shrink part of them.
 2. the server fails over, so the shrink RPC has to be resent.
 3. on client reconnect, the server and client sync on the initial
 amount of grants for the client.
 4. the shrink RPC is resent. If server disk space is sufficient, the
 shrink does not happen and the client adds the amount of grants it
 was going to shrink to its new initial amount of grants. Now the
 client thinks it owns more grants than it does from the server's
 point of view.
 5. the client consumes grants and sends RPCs to the server. The server
 avoids allocating new grants for the client if the current amount of
 grant is big enough:
static long tgt_grant_alloc(struct obd_export *exp, u64 curgrant,
...
        if (curgrant >= want || curgrant >= ted->ted_grant + chunk)
                RETURN(0);
 6. the client continues consuming grants, which eventually leads to
 complaints like "claims 28672 GRANT, real grant 0".

When read and set_info:shrink RPCs are resent, grant info should
be ignored because it was reset on reconnect.
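
The accounting drift in steps 3 and 4 can be modelled with two counters.
This is a simplified sketch with hypothetical names, not the client or
server code: it only shows how adding the unshrunk amount back after a
reconnect makes the two views diverge.

```c
struct grant_view {
        long client;    /* grants the client believes it owns */
        long server;    /* grants the server has recorded for it */
};

/* Model of a shrink RPC that is resent after a reconnect.
 * initial: amount both sides agreed on at reconnect (step 3)
 * shrink:  amount the original shrink RPC tried to give back
 * ignore_on_resend: non-zero applies the rule "ignore grant info
 *                   on resend" */
struct grant_view grants_after_resend(long initial, long shrink,
                                      int ignore_on_resend)
{
        struct grant_view v = { initial, initial };

        /* Step 4: the shrink did not happen, so without the rule
         * the client adds the amount back on top of the new
         * initial value. */
        if (!ignore_on_resend)
                v.client += shrink;
        return v;
}
```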

Tests illustrating the issue are added.

HPE-bug-id: LUS-7666
Change-Id: I8af1db287dc61c713e5439f4cf6bd652ce02c12c
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45371
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-930 doc: update lfs migrate usage and man page 78/43378/11
Andreas Dilger [Mon, 19 Apr 2021 21:32:31 +0000 (15:32 -0600)]
LU-930 doc: update lfs migrate usage and man page

Update the usage and man page for "lfs migrate -m", noting that
this command will recursively migrate an entire directory tree.

It is not currently possible to migrate files with DoM components
between MDTs, so provide an example of how to work around this.

Only print the command-line options for commands in the usage
message instead of the full usage, since it is otherwise much
too verbose to see the actual error message being printed.  The
user should read the lfs-migrate.1 man page for full usage.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9e34a33fcc3f0e2b90bc499fb7b946c53e6111d1
Reviewed-on: https://review.whamcloud.com/43378
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15121 llite: skip request slot for lmv_revalidate_slaves() 75/45275/2
Andriy Skulysh [Fri, 30 Aug 2019 11:43:29 +0000 (14:43 +0300)]
LU-15121 llite: skip request slot for lmv_revalidate_slaves()

Some syscalls need lmv_revalidate_slaves(). It requires a
second lock enqueue, which can be blocked by a
lack of RPC slots.

Don't acquire an RPC slot for the second lock enqueue.

Change-Id: Ida23c648c2bd169c4d238543731796232aa490dc
HPE-bug-id: LUS-8416
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/45275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15141 quota: optimize capability check for root squash 22/45322/2
Sebastien Buisson [Thu, 21 Oct 2021 06:56:44 +0000 (08:56 +0200)]
LU-15141 quota: optimize capability check for root squash

On client side, checking for owner/group quota can be directly
bypassed if this is for root and there is no root squash.

Change-Id: If29eca428d8748df412a717615e4d0a4886ddd04
Fixes: a4fbe7341b ("LU-14739 quota: nodemap squashed root cannot bypass quota")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/45322
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-12678 ptlrpc: remove bogus LASSERT 21/45421/3
Andreas Dilger [Sat, 30 Oct 2021 00:40:40 +0000 (18:40 -0600)]
LU-12678 ptlrpc: remove bogus LASSERT

In the error case, it isn't possible for rc to be both -ENOMEM and
0 at the same time, so remove the incorrect LASSERT(rc == 0) to
avoid crashing the system on an allocation failure.

Improve error messages to conform to code style.

Fixes: ceeeae4271fd ("LU-12678 lnet: me: discard struct lnet_handle_me")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I61ac5d735d7b2658dae76213a2d40cbfd2bb8bb9
Reviewed-on: https://review.whamcloud.com/45421
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-11667 tests: Fix sanity test 317 for 64K PAGE_SIZE OST 95/45395/6
Xinliang Liu [Thu, 28 Oct 2021 09:48:38 +0000 (09:48 +0000)]
LU-11667 tests: Fix sanity test 317 for 64K PAGE_SIZE OST

When a file is created, blocks are allocated aligned to PAGE_SIZE;
see osd_ldiskfs_map_inode_pages(). E.g. on an Arm64 OST server with 64K
PAGE_SIZE, creating a file smaller than 64K actually allocates 128
blocks of 512 bytes each.

Adjust the test for OST servers with 64K PAGE_SIZE.
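
The 128-block figure follows from rounding the file size up to a whole
page and counting 512-byte units. A minimal sketch of that arithmetic,
with a hypothetical function name and assuming allocation is always
rounded up to full pages:

```c
#include <stdint.h>

#define BLOCK_UNIT 512ULL       /* size of one reported block in bytes */

/* Number of 512-byte blocks allocated for a file of file_size bytes
 * when space is allocated in whole pages of page_size bytes. */
uint64_t allocated_blocks(uint64_t file_size, uint64_t page_size)
{
        uint64_t pages = (file_size + page_size - 1) / page_size;

        return pages * page_size / BLOCK_UNIT;
}
```

Under this model a 1-byte file occupies 128 blocks with 64K pages but
only 8 blocks with 4K pages.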

Test-Parameters: trivial
Test-Parameters: clientarch=aarch64 fstype=ldiskfs testlist=sanity \
env=PTLDEBUG=-1,ONLY=317
Test-Parameters: fstype=ldiskfs testlist=sanity \
env=PTLDEBUG=-1,ONLY=317

Change-Id: Iada701f4f424093e847fc70aa843873b75fe6b06
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/45395
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15156 kernel: back port patch for rwsem issue 83/45383/3
Yang Sheng [Tue, 26 Oct 2021 08:09:20 +0000 (16:09 +0800)]
LU-15156 kernel: back port patch for rwsem issue

RHEL7 included a defect in rwsem that can cause a
thread to hang waiting on an rwsem indefinitely. Backport
commit 5c1ec49b60cdb31e51010f8a647f3189b774bddf
to fix this issue.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ic5c469ce744ad5882c13163a9bfe14faef8fd446
Reviewed-on: https://review.whamcloud.com/45383
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15168 osd: use large allocation for idc cache 82/45382/2
Alex Zhuravlev [Wed, 27 Oct 2021 05:48:03 +0000 (08:48 +0300)]
LU-15168 osd: use large allocation for idc cache

as in some cases (e.g. ofd precreate) the cache can grow to dozens
of kilobytes (sizeof(struct idc_map_cache)=40 * 1024).

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id9e0996a7a1d07065f4a50c1d5be5051e756559a
Reviewed-on: https://review.whamcloud.com/45382
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
2 years ago LU-15154 kernel: kernel update SLES15 SP3 [5.3.18-59.27.1] 49/45349/4
Jian Yu [Mon, 25 Oct 2021 17:33:03 +0000 (10:33 -0700)]
LU-15154 kernel: kernel update SLES15 SP3 [5.3.18-59.27.1]

Update SLES15 SP3 kernel to 5.3.18-59.27.1 for Lustre client.

Test-Parameters: trivial

Change-Id: Ie3c369a8e93a75b4afbde55489bd3819bb39e1de
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45349
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15138 lnet: Fail peer add for existing gw peer 37/45337/5
Chris Horn [Fri, 22 Oct 2021 00:13:19 +0000 (00:13 +0000)]
LU-15138 lnet: Fail peer add for existing gw peer

If there's an existing peer entry for a peer that is being added
via CLI, and that existing peer was not created via the CLI, then
DLC will attempt to delete the existing peer before creating a new
one. The exit status of the peer deletion was not being checked.
This results in the ability to add duplicate peers for gateways,
because gateways cannot be deleted via lnetctl unless the routes
for that gateway have been removed first.
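
The missing check can be illustrated with a small model. The struct,
the function names and the -EBUSY error code are assumptions made for
this sketch and are not taken from the lnetctl or DLC sources.

```c
#include <errno.h>
#include <stdbool.h>

struct peer {
        bool exists;
        bool created_by_cli;
        bool gateway_with_routes;   /* cannot be deleted in this state */
};

/* Deleting a gateway fails while routes still use it. */
int peer_del(struct peer *p)
{
        if (p->gateway_with_routes)
                return -EBUSY;      /* hypothetical error code */
        p->exists = false;
        return 0;
}

/* Replace a non-CLI peer entry, but only if the delete succeeded. */
int peer_add(struct peer *p)
{
        if (p->exists && !p->created_by_cli) {
                int rc = peer_del(p);

                if (rc)             /* the status that was not checked */
                        return rc;
        }
        p->exists = true;
        p->created_by_cli = true;
        return 0;
}
```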

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9b7864a2bd21540336f72d96e180c89bd0aae8dc
Reviewed-on: https://review.whamcloud.com/45337
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15086 ptlrpc: fix timeout after spurious wakeup 08/45308/5
Alex Zhuravlev [Wed, 20 Oct 2021 11:10:57 +0000 (14:10 +0300)]
LU-15086 ptlrpc: fix timeout after spurious wakeup

so that the final timeout doesn't exceed the requested one
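
The commit message gives no detail, so the following only shows the
general technique for this kind of bug: after a spurious wakeup, wait
for the time remaining until the original deadline rather than for the
full timeout again. The function name and the time units are
hypothetical.

```c
/* Time still allowed for waiting, given when the wait started, the
 * current time and the requested timeout (all in the same unit).
 * Returns 0 once the requested timeout has been used up. */
long remaining_timeout(long start, long now, long requested)
{
        long elapsed = now - start;

        if (elapsed >= requested)
                return 0;
        return requested - elapsed;
}
```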

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iff5e08c589cbbc3c483915002f3f9df7a6f2678a
Reviewed-on: https://review.whamcloud.com/45308
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15122 osd-ldiskfs: Fix ASSERTION( iobuf->dr_rw == 0 ) with 64KB PAGE_SIZE 88/45288/7
Xinliang Liu [Tue, 19 Oct 2021 08:15:59 +0000 (08:15 +0000)]
LU-15122 osd-ldiskfs: Fix ASSERTION( iobuf->dr_rw == 0 ) with 64KB PAGE_SIZE

During a write, if a page cannot be mapped to blocks
all at once, the mapped blocks array is accessed out of bounds,
which leads to an "ASSERTION( iobuf->dr_rw == 0 )" crash.

This happens easily on Arm platforms with 64KB PAGE_SIZE
and does not happen on x86_64 platforms with 4KB PAGE_SIZE:
with a 4KB block size and 4KB PAGE_SIZE, i == 0
and blocks_left_page == 1, so the inner loop handles one
block each time. Thus the outer loop condition "block_idx <
block_idx_end;" ensures that blocks[] array accesses do not overflow.

Check the actual mapped count so that mapped blocks array
access will not overflow.
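
A simplified sketch of bounding the per-page inner loop by the number
of blocks that were actually mapped. The function and parameter names
are hypothetical and the loop is far simpler than the osd-ldiskfs code.

```c
#include <stddef.h>

/* Copy the block numbers belonging to one page into out[].
 * blocks[]         block numbers returned by the mapping call
 * mapped           how many entries of blocks[] are valid
 * block_idx        index of the first block of this page
 * blocks_per_page  e.g. 16 for 64KB pages with 4KB blocks
 * Returns the number of blocks consumed, never reading past mapped. */
size_t consume_page_blocks(const unsigned long *blocks, size_t mapped,
                           size_t block_idx, size_t blocks_per_page,
                           unsigned long *out)
{
        size_t n = 0;

        while (n < blocks_per_page && block_idx + n < mapped) {
                out[n] = blocks[block_idx + n];
                n++;
        }
        return n;
}
```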

Fixes: 0271b17b80a8 ("LU-14134 osd-ldiskfs: reduce credits for new writing")
Change-Id: Icd46c04bea2d7930456840694d422758eebb4186
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/45288
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15102 lnet: Reset ni_ping_count only on receive 35/45235/3
Chris Horn [Wed, 13 Oct 2021 23:30:01 +0000 (18:30 -0500)]
LU-15102 lnet: Reset ni_ping_count only on receive

The lnet_ni:ni_ping_count is currently reset on every (healthy) tx.
We should only reset it when receiving a message over the NI. Taking
net_lock 0 on every tx results in a performance loss for certain
workloads.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 8fdf2bc62a ("LU-13569 lnet: Recover local NI w/exponential backoff interval")
HPE-bug-id: LUS-10427
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I67ea3aa977cb5d67b04f7957120c29e9985c83e6
Reviewed-on: https://review.whamcloud.com/45235
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15092 o2iblnd: Fix logic for unaligned transfer 16/45216/4
Chris Horn [Thu, 16 Sep 2021 17:12:38 +0000 (12:12 -0500)]
LU-15092 o2iblnd: Fix logic for unaligned transfer

It's possible for there to be an offset for the first page of a
transfer. However, there are two bugs with this code in o2iblnd.

The first is that this use-case will require LNET_MAX_IOV + 1 local
RDMA fragments, but we do not specify the correct corresponding values
for the max page list to ib_alloc_fast_reg_page_list(),
ib_alloc_fast_reg_mr(), etc.

The second issue is that the logic in kiblnd_setup_rd_kiov() attempts
to obtain one more scatterlist entry than is actually needed. This
causes the transfer to fail with -EFAULT.
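
The extra fragment comes from the transfer straddling one more page
when it does not start on a page boundary. A sketch of the count, with
a hypothetical function name; the example figures assume 4 KiB pages
and LNET_MAX_IOV equal to 256, which the commit message does not state.

```c
#include <stddef.h>

/* Number of page-sized fragments touched by a transfer of nob bytes
 * that starts first_page_offset bytes into its first page. */
size_t rdma_frags_needed(size_t first_page_offset, size_t nob,
                         size_t page_size)
{
        if (nob == 0)
                return 0;
        return (first_page_offset + nob + page_size - 1) / page_size;
}
```

With these assumptions, a 1 MiB transfer needs 256 fragments when
aligned and 257 when it starts at any non-zero offset in the first page.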

Test-Parameters: trivial
HPE-bug-id: LUS-10407
Fixes: d226464aca ("LU-8057 ko2iblnd: Replace sg++ with sg = sg_next(sg)")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ifb843f11ae34a99b7d8f93d94966e3dfa1ce90e5
Reviewed-on: https://review.whamcloud.com/45216
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15094 o2iblnd: map_on_demand not needed for frag interop 15/45215/2
Chris Horn [Wed, 29 Sep 2021 17:42:26 +0000 (12:42 -0500)]
LU-15094 o2iblnd: map_on_demand not needed for frag interop

The map_on_demand tunable is not used for setting max frags so don't
require that it be set in order to negotiate max frags.

HPE-bug-id: LUS-10488
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie89f1f035f4b05244feffb848c14582a8c7cf0e6
Reviewed-on: https://review.whamcloud.com/45215
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15049 quota: fix a panic with pool number > 16 05/45105/3
Sergey Cheremencev [Thu, 17 Jun 2021 10:45:42 +0000 (13:45 +0300)]
LU-15049 quota: fix a panic with pool number > 16

Fix a panic that may occur when there are more than 16
pools in a system:
qti_pools_add()) ASSERTION( qti->qti_pools_num >= QMT_MAX_POOL_NUM ) failed: Forgot init? ffff91a5f9625800

HPE-bug-id: LUS-10116
Change-Id: I4f73b74d2fd3e85a51cf3c30e2eec29645f164be
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45105
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15048 quota: check that qti_lqes has been inited 02/45102/3
Sergey Cheremencev [Thu, 22 Jul 2021 10:56:24 +0000 (13:56 +0300)]
LU-15048 quota: check that qti_lqes has been inited

qti_lqes_restore_init/fini should check that qti_lqes
has been initialized before accessing qti_lqes_count.

This fix prevents the following panic:
qti_lqes_restore_fini()) ASSERTION( qmt_info(env)->qti_lqes_rstr ) failed:

HPE-bug-id: LUS-10239
Change-Id: Ic93d87535f615fe419b2c3a2453506c515837031
Reviewed-on: https://es-gerrit.dev.cray.com/159116
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45102
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14713 llite: tighten condition for fault not drop mmap_sem 15/44715/7
Bobi Jam [Thu, 2 Sep 2021 16:38:43 +0000 (00:38 +0800)]
LU-14713 llite: tighten condition for fault not drop mmap_sem

As __lock_page_or_retry() indicates, filemap_fault() will return
VM_FAULT_RETRY without releasing mmap_sem iff flags contains
FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT.

So ll_fault0() should pass in FAULT_FLAG_ALLOW_RETRY |
FAULT_FLAG_RETRY_NOWAIT in ll_filemap_fault() so that when it
returns VM_FAULT_RETRY, we can pass on trying normal fault
under DLM lock as mmap_sem is still being held.

In Linux 5.1 (6b4c9f4469819), however, FAULT_FLAG_RETRY_NOWAIT alone is
enough to keep mmap_sem held when locking the page fails.
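
The two conditions described above can be written as one predicate.
This is a sketch only: the flag values are illustrative rather than the
kernel's, and the function name is hypothetical.

```c
#include <stdbool.h>

#define FLAG_ALLOW_RETRY   0x1u    /* illustrative values only */
#define FLAG_RETRY_NOWAIT  0x2u

/* True if mmap_sem is still held after the fault handler returned
 * VM_FAULT_RETRY because the page could not be locked. */
bool mmap_sem_still_held(unsigned int flags, bool linux_5_1_or_later)
{
        bool nowait = (flags & FLAG_RETRY_NOWAIT) != 0;
        bool allow = (flags & FLAG_ALLOW_RETRY) != 0;

        if (linux_5_1_or_later)
                return nowait;          /* NOWAIT alone is enough */
        return allow && nowait;         /* both flags are required */
}
```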

Fixes: 87865e4ae9 ("LU-13182 llite: Avoid eternel retry loops with MAP_POPULATE")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9420c587301722b597155558657577349a8141e4
Reviewed-on: https://review.whamcloud.com/44715
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14927 osd: share brw_stats code between OSD back ends. 90/44690/8
James Simmons [Mon, 25 Oct 2021 20:56:31 +0000 (16:56 -0400)]
LU-14927 osd: share brw_stats code between OSD back ends.

Both the ldiskfs and ZFS OSD back ends handle brw_stats. With the
stricter GPL requirement, ZFS can no longer carry the brw_stats
code, so move the common code to lprocfs_status_server.c and
move brw_stats to debugfs as well.

Change-Id: I294e5df3557552266dd3a02d3bc9844c42c01f60
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44690
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14793 hsm: record index for further HSM action scanning 77/44077/11
Qian Yingjin [Fri, 25 Jun 2021 08:22:35 +0000 (16:22 +0800)]
LU-14793 hsm: record index for further HSM action scanning

There is contention between the HSM archive request and the "hsm_cdtr"
kernel thread:
->mdt_hsm_request()
  ->mdt_hsm_add_actions()
    ->mdt_hsm_register_hal()
      ->mdt_agent_record_add()
        ->down_write(&cdt->cdt_llog_lock)
        ->llog_cat_add()
        ->up_write(&cdt->cdt_llog_lock)

->mdt_coordinator()
  ->cdt_llog_process()
    ->down_write(&cdt->cdt_llog_lock);
    ->llog_cat_process()
    ->up_write(&cdt->cdt_llog_lock);

The HSM archive request and the HSM cat llog scanning in the kernel
daemon "hsm_cdtr" both contend for the write llog lock to add to or
update the "hsm_actions" llog.

In testing, max_requests = 1000000 is used.
In the current implementation, this means the kernel daemon thread
"hsm_cdtr" needs to scan nearly the whole "hsm_actions" llog from the
beginning with the write llog lock held.
This slows down the HSM archive requests, which contend
for the write llog lock.

As the llog is append-only, we record the latest handled position in
the llog, so the next scan can start from the previously recorded
position (llog index) and does not need to start from the beginning.

Another way to mitigate this problem is:
when the llog scanner finds that other processes are
contending for the llog lock, it stops the llog scanning and
releases the write llog lock for incoming HSM archive requests.

With this patch applied and 200000 HSM actions in the llog, the time
to queue 10000 HSM archive requests drops from 10 seconds to 4
seconds.
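
The idea of resuming from a recorded index can be sketched on a plain
array standing in for the append-only llog. The struct and function
names are hypothetical, and the model ignores locking and records that
need to be revisited.

```c
#include <stddef.h>

struct scanner {
        size_t next_idx;    /* first record not yet handled */
};

/* Visit only the records appended since the previous scan and add
 * their values to *sum. Returns how many records were visited. */
size_t scan_new_records(struct scanner *s, const int *log,
                        size_t log_len, long *sum)
{
        size_t visited = 0;

        for (size_t i = s->next_idx; i < log_len; i++) {
                *sum += log[i];
                visited++;
        }
        s->next_idx = log_len;  /* remember where this scan ended */
        return visited;
}
```

The second scan of a log that grew from 3 to 5 records touches only the
2 new records instead of all 5.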

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I2e92daf34844605ee648787daf859143335c68bf
Reviewed-on: https://review.whamcloud.com/44077
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14706 libcfs: ___wait_event() update for older kernels 85/43785/7
Shaun Tancheff [Fri, 30 Apr 2021 11:43:46 +0000 (06:43 -0500)]
LU-14706 libcfs: ___wait_event() update for older kernels

A couple of wait issues fail to build correctly on older
3.0 kernels

Fixes: f6df31a163 ("LU-10467 lustre: add wait_event macros suitable for upstream")
Fixes: d05427a785 ("LU-10428 lnet: call event handlers without res_lock")
HPE-bug-id: LUS-9975
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I275fbf79c4472575d867075a3e3ebd3d6ec1cdfa
Reviewed-on: https://review.whamcloud.com/43785
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13358 libcfs: add timeout to cfs_race() to fix race 61/43161/25
Alex Zhuravlev [Tue, 30 Mar 2021 05:57:14 +0000 (08:57 +0300)]
LU-13358 libcfs: add timeout to cfs_race() to fix race

There is no guarantee that the branches in cfs_race() are executed
in strict order, so it's possible that the second branch (with
cfs_race_state=1) is executed before the first branch, and then another
thread executing the first branch gets stuck.

This construction is used for testing only, so as a
workaround it's enough to time out.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie1cc0accedb3e1a198d4b17d1ab00ce298c560f2
Reviewed-on: https://review.whamcloud.com/43161
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15171 osd-ldiskfs: xattr_sem locking is missing for dquot_transfer 24/45424/6
Andrew Perepechko [Sun, 31 Oct 2021 20:03:30 +0000 (23:03 +0300)]
LU-15171 osd-ldiskfs: xattr_sem locking is missing for dquot_transfer

Kernel commit 7a9ca53ae (~v4.13) added the requirement for xattr_sem locking
when calling *dquot_transfer. As of now, in rare cases, it is possible that
we can modify inode xattrs and perform their consistency checks in parallel,
which can fail.

Change-Id: I041694e30ce6c8398864c0ad57671df0bffd2f52
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-10549
Reviewed-on: https://review.whamcloud.com/45424
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15143 osd-ldiskfs: osd_declare_write() underestimates credits 28/45328/3
Andrew Perepechko [Thu, 21 Oct 2021 19:40:39 +0000 (22:40 +0300)]
LU-15143 osd-ldiskfs: osd_declare_write() underestimates credits

osd_declare_write() seems to underestimate journal credits for
the extent case. It does not properly account credits for
a new extent tree block. 3 credits (bitmap, gd, self) should be
accounted for a new data block and for a new extent tree block.

Change-Id: Iad463cac3a2a8c2b2a6b1a634e7502039bb1b7e5
HPE-bug-id: LUS-10514
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/45328
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10391 lnet: switch to large lnet_processid for matching 97/43597/9
Mr NeilBrown [Fri, 17 Apr 2020 01:27:54 +0000 (11:27 +1000)]
LU-10391 lnet: switch to large lnet_processid for matching

Change lnet_libhandle.me_match_id and lnet_match_info.mi_id to
struct lnet_processid, so they support large nids.

This requires changing
  LNetMEAttach(), lnet_mt_match_head(), lnet_mt_of_attach(),
  lnet_ptl_match_type(), lnet_match2mt()
to accept a pointer to lnet_processid rather than an
lnet_process_id.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I6957b467bb9af84e20a4525db6351694f4d2a7af
Reviewed-on: https://review.whamcloud.com/43597
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10391 lnet: extend prefered nids in struct lnet_peer_ni 96/43596/9
Mr NeilBrown [Sat, 11 Sep 2021 14:16:49 +0000 (10:16 -0400)]
LU-10391 lnet: extend prefered nids in struct lnet_peer_ni

union lpni_pref in struct lnet_peer_ni now has 'struct lnet_nid'
rather than lnet_nid_t.

Also, lnet_peer_ni_set_no_mr_pref_nid() allows the pref nid to be NULL
and is a no-op in that case.

Rather than updating the user-space cfs_match_nid_net() in
libcfs/utils/nidstrings.c, remove it as it is unused.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9a2453185aa5d708e6939dadc1e954c9dbd24efc
Reviewed-on: https://review.whamcloud.com/43596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10391 lnet: change tp_nid to 16byte in lnet_test_peer. 95/43595/10
Mr NeilBrown [Mon, 6 Apr 2020 03:11:33 +0000 (13:11 +1000)]
LU-10391 lnet: change tp_nid to 16byte in lnet_test_peer.

This updates 'struct lnet_test_peer' to store a large address nid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id2f97a841bb0738503b0e87e7a9e2f8bebc4c3ec
Reviewed-on: https://review.whamcloud.com/43595
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15210 tests: fix sanity-lnet to handle duplicate IP 51/45551/5
Serguei Smirnov [Fri, 12 Nov 2021 17:08:53 +0000 (09:08 -0800)]
LU-15210 tests: fix sanity-lnet to handle duplicate IP

The same IP may be added on the same interface with different netmasks
specified.
sanity-lnet is not expecting this and fails. Fix it.

Test-parameters: trivial testlist=sanity-lnet

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I70c2df5f14c88362ea5af6f06410823c56535dee
Reviewed-on: https://review.whamcloud.com/45551
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-13456 ldlm: fix reprocessing of locks with more bits 44/38244/5
Andriy Skulysh [Mon, 16 Dec 2019 20:09:37 +0000 (22:09 +0200)]
LU-13456 ldlm: fix reprocessing of locks with more bits

Reprocessing check queues should be extended
with just granted lock inodebits.

ldlm_reprocess_all() should be called on BL AST race.

Change-Id: Ifd232062068481c1c62fa2f2a14c7778d4402260
Fixes: 2250e072c3785 ("LU-12017 ldlm: DoM truncate deadlock")
HPE-bug-id: LUS-8224
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/38244
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15070 llite: update default LMV upon any change 37/45237/4
Lai Siyao [Tue, 12 Oct 2021 22:15:37 +0000 (18:15 -0400)]
LU-15070 llite: update default LMV upon any change

max_inherit and max_inherit_rr were newly added, but they are missing
from lsm_md_eq(), so the client may not update the default LMV when
either of these two fields is changed.
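
An equality helper that forgets newly added fields reports "unchanged"
when only those fields differ. The sketch below uses a simplified,
hypothetical struct: only max_inherit and max_inherit_rr are named in
the commit message, the other fields and the function name are
illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

struct default_lmv {
        uint32_t stripe_count;      /* illustrative field */
        int32_t  master_mdt_index;  /* illustrative field */
        uint8_t  max_inherit;
        uint8_t  max_inherit_rr;
};

/* Compare every field, including the two inherit limits, so that a
 * change to either one is seen as a change of the default LMV. */
bool default_lmv_eq(const struct default_lmv *a,
                    const struct default_lmv *b)
{
        return a->stripe_count == b->stripe_count &&
               a->master_mdt_index == b->master_mdt_index &&
               a->max_inherit == b->max_inherit &&
               a->max_inherit_rr == b->max_inherit_rr;
}
```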

Add sanityn 112.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Iac71b530b3702105c4213715826b1782c6aba7ca
Reviewed-on: https://review.whamcloud.com/45237
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15070 mdt: revoke remote LOOKUP lock for default LMV 36/45236/4
Lai Siyao [Tue, 12 Oct 2021 22:20:21 +0000 (18:20 -0400)]
LU-15070 mdt: revoke remote LOOKUP lock for default LMV

Setting the default LMV revokes the LOOKUP lock, but if the directory is
a remote directory, its LOOKUP lock is on the MDT where its parent is
located.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I9f079a0bcff530603725ce72cd89c14935ba913b
Reviewed-on: https://review.whamcloud.com/45236
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15151 tests: use facet check instead of node check 69/45369/3
Elena Gryaznova [Tue, 26 Oct 2021 11:34:30 +0000 (14:34 +0300)]
LU-15151 tests: use facet check instead of node check

Change wait_update_cond() call to wait_update_facet_cond()
call in test_119.

Fixes: 98f107b53e4d ("LU-9699 osp: don't assert on OSP duplicating")
Test-Parameters: trivial testlist=conf-sanity env=ONLY=119
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10557
Change-Id: Ia1ff7921026212814ec71e0c3aa60f23fbd7278f
Reviewed-on: https://review.whamcloud.com/45369
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
2 years agoLU-15160 kernel: kernel update SLES12 SP5 [4.12.14-122.91.2] 58/45358/2
Jian Yu [Mon, 25 Oct 2021 20:04:53 +0000 (13:04 -0700)]
LU-15160 kernel: kernel update SLES12 SP5 [4.12.14-122.91.2]

Update SLES12 SP5 kernel to 4.12.14-122.91.2 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: Ia6620869fa84d72f8d22c4a8a039600037ddb2d9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45358
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15149 lnet: Missing newline in lnet_add_route 40/45340/2
Chris Horn [Fri, 22 Oct 2021 01:07:58 +0000 (01:07 +0000)]
LU-15149 lnet: Missing newline in lnet_add_route

CWARN string is missing a newline character.

Test-Parameters: trivial
Fixes: 3f2844dc93 ("LU-14945 lnet: don't use hops to determine the route state")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I06370c36e9d88b7e02e000bfb573297ff281aef1
Reviewed-on: https://review.whamcloud.com/45340
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15136 socklnd: default conns_per_peer to 0 19/45319/7
Serguei Smirnov [Thu, 21 Oct 2021 02:09:06 +0000 (19:09 -0700)]
LU-15136 socklnd: default conns_per_peer to 0

Setting conns_per_peer to 0 triggers socklnd to choose the
(heuristically) optimal setting for the interface given its speed.
Make 0 the default for socklnd conns_per_peer.

Test-parameters: trivial testlist=sanity-lnet

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Fixes: c44afcfb72 ("LU-12815 socklnd: set conns_per_peer based on link speed")
Change-Id: Ie6e76eaee8693472384cce362b394b216142884e
Reviewed-on: https://review.whamcloud.com/45319
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15150 tests: sanity-lnet removes testsuite log on failure 42/45342/6
Chris Horn [Fri, 22 Oct 2021 01:34:23 +0000 (01:34 +0000)]
LU-15150 tests: sanity-lnet removes testsuite log on failure

cleanup_testsuite() needs to be more selective when removing files
created by sub-tests.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: aa739144551 ("LU-13569 tests: Check LNet Health recovery logic")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic17a68ff2aa552594a0f1ea470c39177abe985fc
Reviewed-on: https://review.whamcloud.com/45342
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15131 tests: check parameter correctly 00/45300/3
Elena Gryaznova [Tue, 19 Oct 2021 18:10:31 +0000 (21:10 +0300)]
LU-15131 tests: check parameter correctly

sanity test_254() is always skipped due to a wrong
parameter path: MDT0 is not initialized before get_param.

Fixes: 33aad7829de5 ("LU-10461 tests: call exit in the skip routine")
Test-Parameters: trivial testlist=sanity
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9969
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I02e38a27ca4656128e62339d63425f85386fa905
Reviewed-on: https://review.whamcloud.com/45300
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15061 tests: fix sanity-dom exit status 99/45299/2
Elena Gryaznova [Tue, 19 Oct 2021 17:36:06 +0000 (20:36 +0300)]
LU-15061 tests: fix sanity-dom exit status

If any of the sanity tests fails during "ONLY=sanity sh sanity-dom.sh"
execution, the next "ONLY=sanityn sh sanity-dom.sh" is always
detected as failed even if all sanityn tests pass. This is
because ${TMP}/sanity.log exists and is also checked for
"FAIL", while only ${TMP}/sanityn.log should be processed.

Fixes: e2cb43c409b9 ("LU-13773 tests: subscript failure propagation")

Test-Parameters: trivial testlist=sanity-dom
HPE-bug-id: LUS-10480
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I764d1d6df08da13acfecd445e67ced1b455ddce8
Reviewed-on: https://review.whamcloud.com/45299
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15098 tests: sanity-sec 27a exec commands on right node 93/45293/3
Sebastien Buisson [Tue, 19 Oct 2021 15:59:33 +0000 (17:59 +0200)]
LU-15098 tests: sanity-sec 27a exec commands on right node

In nodemap_exercise_fileset called from sanity-sec test 27a,
make sure all commands are executed on first client, as we are
testing properties of nodemaps 'default' and 'c0'.
And make sure 'default' nodemap has admin and trusted properties
set to 1, as we are carrying operations as root.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec clientcount=2 env=ONLY=27a
Fixes: 0daeebcbdc ("LU-14797 nodemap: map project id")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idd9f391db60475721f3a3856b5e3bee1a18bbbca
Reviewed-on: https://review.whamcloud.com/45293
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15103 tests: clean up busy cifs mount 38/45238/3
gaurav mahajan [Thu, 14 Oct 2021 10:12:38 +0000 (13:12 +0300)]
LU-15103 tests: clean up busy cifs mount

A busy cifs mount point makes cleanup_cifs fail, which
in turn makes the lustre unmount fail, because the cifs mount point
is a lustre mount.

Test-Parameters: trivial
Signed-off-by: gaurav mahajan <gaurav.mahajan@seagate.com>
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-4146
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I4b7ec7e0a6a706198e328dad337648bf3cb2c3be
Reviewed-on: https://review.whamcloud.com/45238
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15099 kernel: kernel update RHEL7.9 [3.10.0-1160.45.1.el7] 26/45226/2
Jian Yu [Wed, 13 Oct 2021 16:43:11 +0000 (09:43 -0700)]
LU-15099 kernel: kernel update RHEL7.9 [3.10.0-1160.45.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.45.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I11c307bfd6a6b353bc7b6fe40bb5d604bc9b3fdc
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45226
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13717 sec: missing defs in includes for client encryption 21/45221/2
Sebastien Buisson [Wed, 13 Oct 2021 09:35:01 +0000 (09:35 +0000)]
LU-13717 sec: missing defs in includes for client encryption

Add a few missing definitions in lustre_crypto.h that are required
in case Lustre client encryption is built against the in-kernel
fscrypt library.

Fixes: 028281ae19 ("LU-13717 sec: rework includes for client encryption")
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1965503554dcf660770d201444cfafd54aa84dce
Reviewed-on: https://review.whamcloud.com/45221
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15093 libcfs: Check if param_set_uint_minmax is provided 14/45214/3
Chris Horn [Mon, 27 Sep 2021 20:48:02 +0000 (15:48 -0500)]
LU-15093 libcfs: Check if param_set_uint_minmax is provided

Linux kernel v5.15 commit 2a14c9ae15a38148484a128b84bff7e9ffd90d68
moved param_set_uint_minmax to common code.
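
For readers unfamiliar with this kind of helper, a user-space sketch of
a range-checked unsigned setter follows. It is only an analogy:
demo_set_uint_minmax() and its signature are hypothetical, it is not the
kernel function, and rejecting anything that does not start with a digit
is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical analog: parse val as an unsigned integer and store it
 * in *out only if it lies within [min, max]; otherwise return -EINVAL
 * and leave *out untouched.
 */
static int demo_set_uint_minmax(const char *val, unsigned int *out,
				unsigned int min, unsigned int max)
{
	char *end;
	unsigned long num;

	assert(out != NULL);
	/* Require a leading digit: no sign, no leading whitespace. */
	if (val == NULL || !isdigit((unsigned char)*val))
		return -EINVAL;

	errno = 0;
	num = strtoul(val, &end, 0);
	if (errno != 0 || *end != '\0')
		return -EINVAL;
	if (num < min || num > max)
		return -EINVAL;

	*out = (unsigned int)num;
	return 0;
}
```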

HPE-bug-id: LUS-10469
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ifd1d72ae531f0f6c7cd96cc28fbc07c8a8b70886
Reviewed-on: https://review.whamcloud.com/45214
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14999 mdt: Deadlock on parent during resend 85/44885/9
Andriy Skulysh [Sun, 6 Sep 2020 09:21:14 +0000 (12:21 +0300)]
LU-14999 mdt: Deadlock on parent during resend

Parent-child lock order gets broken during resend, as
the child lock already exists but the parent lock does not,
and the MDS tries to lock it again.

Don't look up the child on resend; extract the fid from the lock instead.

Change-Id: I443648a8162e770c63fd087dd534c27e7c637c40
HPE-bug-id: LUS-9306
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/44885
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14929 gss: detect libkeyutils dependency 97/44597/3
Sebastien Buisson [Wed, 11 Aug 2021 15:44:08 +0000 (17:44 +0200)]
LU-14929 gss: detect libkeyutils dependency

When building GSS support, gss_keyring requires libkeyutils.
So make sure this dependency is properly detected at configure time,
and include keyutils.h only when required.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9fa5750f4609250ecdc1c47f68b97bff9be13ace
Reviewed-on: https://review.whamcloud.com/44597
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14124 target: set OBD_MD_FLGRANT in read's reply 75/43375/14
Vladimir Saveliev [Wed, 20 Oct 2021 10:32:11 +0000 (13:32 +0300)]
LU-14124 target: set OBD_MD_FLGRANT in read's reply

If tgt_grant_shrink() decides not to shrink grants, a client is
supposed to restore its cl_grant_avail in osc_update_grant(). In the
case of a read, OBD_MD_FLGRANT is not set in the reply's
body->oa.o_valid, so osc_update_grant() misses the cl_grant_avail
update. As a result, the server keeps thinking that the client has a
lot of grants while the client thinks that it is badly missing grants.
That may lead to performance degradation.

A test to illustrate the issue is included.

Test-Parameters: trivial testlist=sanity
Change-Id: Ibe7ce0af5701226c8be3ae3f9ad57c354791fa0f
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
HPE-bug-id: LUS-9943
Reviewed-on: https://review.whamcloud.com/43375
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15146 mdt: mdt_lvb2reply crash fix 34/45334/3
Alexander Zarochentsev [Thu, 12 Nov 2020 16:47:36 +0000 (19:47 +0300)]
LU-15146 mdt: mdt_lvb2reply crash fix

mdt_lvb2body may crash if res->lr_lvb_data is
not allocated; make it tolerant of an uninitialized
lvb_data pointer.

HPE-bug-id: LUS-9549
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ie31cbba9187f9b04b3d1f8d2abc59e0612a44b41
Reviewed-on: https://review.whamcloud.com/45334
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15047 gss: gss integrity check with multi-rail 77/45277/2
Sebastien Buisson [Mon, 18 Oct 2021 11:26:40 +0000 (13:26 +0200)]
LU-15047 gss: gss integrity check with multi-rail

With multi-rail, a primary NID is used as node identifier, but LNet
decides which NID is actually used for sending/receiving data, on a
per request basis.
For the integrity check mechanism implemented as part of GSS, the
primary NID must be used in order to compute HMAC with the correct
key, independently of the actual NID for the current request.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2bf3974d3aa0e8365a9413dca56c69ee3734c12b
Reviewed-on: https://review.whamcloud.com/45277
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14587 ptlrpc: remove LASSERT in nrs_polices proc handler 00/45200/3
Lei Feng [Tue, 12 Oct 2021 06:33:22 +0000 (14:33 +0800)]
LU-14587 ptlrpc: remove LASSERT in nrs_polices proc handler

It's not necessary to LASSERT() in nrs_polices proc handler.
CERROR() and returning error is good enough.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I09f06dc4ab90e49b2df66a9b47a74678c64cdd2f
Reviewed-on: https://review.whamcloud.com/45200
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15089 tests: allow enough time to create tcp connections 31/45331/10
Serguei Smirnov [Thu, 21 Oct 2021 21:03:34 +0000 (14:03 -0700)]
LU-15089 tests: allow enough time to create tcp connections

Allow enough time to create tcp connections before counting them
when testing socklnd conns_per_peer setting in sanity-lnet test_230

Test-Parameters: trivial testlist=sanity-lnet
Fixes: a5cbe7883db6 ("LU-12815 socklnd: allow dynamic setting of conns_per_peer")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ia3e25de157da03d6129b108c1af9632a8faf8efd
Reviewed-on: https://review.whamcloud.com/45331
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15125 o2iblnd: wrong list used for kib_connd_waits 16/45316/2
Chris Horn [Wed, 20 Oct 2021 19:58:52 +0000 (14:58 -0500)]
LU-15125 o2iblnd: wrong list used for kib_connd_waits

The ibc_list field of struct kib_conn is used for the kib_connd_waits
list, but kiblnd_connd() uses ibc_sched_list in the
list_first_entry_or_null macro.
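
Why the member name matters can be shown with a user-space sketch. The
list helpers below are minimal re-implementations for illustration, and
struct demo_conn with its c_list/c_sched_list members is a hypothetical
stand-in for struct kib_conn.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-ins for the kernel list helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? \
	 container_of((head)->next, type, member) : NULL)

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	assert(item != NULL && head != NULL);
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical connection with two independent list linkages. */
struct demo_conn {
	int id;
	struct list_head c_sched_list;	/* linkage for some other list */
	struct list_head c_list;	/* linkage used by the wait list */
};

/* Correct: the member matches the linkage that was queued. */
static struct demo_conn *first_waiting(struct list_head *waits)
{
	return list_first_entry_or_null(waits, struct demo_conn, c_list);
}

/* Buggy: wrong member, so the computed struct address is shifted. */
static struct demo_conn *first_waiting_buggy(struct list_head *waits)
{
	return list_first_entry_or_null(waits, struct demo_conn,
					c_sched_list);
}
```

The buggy variant still returns a non-NULL pointer, which is what makes
this class of mistake easy to miss: the pointer simply does not refer to
the start of the queued object and must not be dereferenced.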

Test-Parameters: trivial
Fixes: 34b57a6f8f ("LU-12678 lnet: use list_first_entry() in lnet/klnds subdirectory.")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0cc8a94636a5129956c53e48ae96b27ece5f0228
Reviewed-on: https://review.whamcloud.com/45316
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15118 ldlm: no free thread to process resend request 72/45272/2
Andriy Skulysh [Wed, 14 Oct 2020 09:01:51 +0000 (12:01 +0300)]
LU-15118 ldlm: no free thread to process resend request

The MDS grants a lock request and sends a reply, and the
input request queue can be filled immediately
with conflicting lock enqueue requests.
So there isn't any free thread to process a resend of the
first lock enqueue request if the client has failed
to receive the reply.

Process lock enqueue resends with an existing lock on the MDS
in the high priority queue.

Change-Id: If7b94690100b44c774dc14231ed4902f701ed807
HPE-bug-id: LUS-9444
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/45272
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15106 ofd: quiet deprecated param warning 46/45246/2
Andreas Dilger [Thu, 14 Oct 2021 20:01:30 +0000 (14:01 -0600)]
LU-15106 ofd: quiet deprecated param warning

There are a number of obdfilter parameter files that report a
warning even when they are read, which is confusing for users
if there is a tool that is scraping all available parameters:

    # lctl get_param obdfilter.*.*
    ofd: 'obdfilter.*.read_cache_enabled' is deprecated,
         use 'osd-*.read_cache_enabled' instead
    ofd: 'obdfilter.*.readcache_max_filesize' is deprecated,
         use 'osd-*.readcache_max_filesize' instead
    ofd: 'obdfilter.*.sync_on_lock_cancel' is deprecated,
         use 'obdfilter.*.sync_lock_cancel' instead
    ofd: 'obdfilter.*.writethrough_cache_enabled' is deprecated,
         use 'osd-*.writethrough_cache_enabled' instead

It should only print a message if the parameters are actually written.
Also fix the messages to reference the correct parameter names.

Most of these parameter links were added in 2.4 with the addition of
osd-ldiskfs.  However, the deprecation warnings were only added in
2.12.53 and slated for removal in 2.15, but were not backported to
2.12 LTS, and there hasn't been an LTS release since then, so it is
better to bump removal so the upcoming 2.15 LTS release includes them.

Fix the test scripts to only use the new parameter names, to avoid
spurious warning messages.  We don't test interop with 2.3 anymore.

Test-Parameters: trivial
Fixes: 7df7347b7b18 ("LU-12967 ofd: restore sync_on_lock_cancel tunable")
Fixes: 493cd8088388 ("LU-8066 osd: migrate from proc to sysfs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie548e5b6af5463959fb4774e31996097373ebbe5
Reviewed-on: https://review.whamcloud.com/45246
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15104 tests: create client dirs with custom stripe count 40/45240/2
Elena Gryaznova [Thu, 14 Oct 2021 11:03:37 +0000 (14:03 +0300)]
LU-15104 tests: create client dirs with custom stripe count

Random ha_dir_stripe_count allows clients to create directories
with different stripe counts in one ha.sh test session.
Please use
  DSTRIPECOUNT=3 DSTRIPECOUNTRAND=true
to create the directories with stripe counts 1,2 or 3.
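
The selection rule can be sketched in C as below. ha.sh itself is a
shell script, so this is only a model: demo_dir_stripe_count() is a
hypothetical name, and returning the maximum when randomization is off
is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Pick a directory stripe count. With randomize set, the result is
 * drawn from 1..max_count; otherwise max_count is used as-is.
 */
static int demo_dir_stripe_count(int max_count, bool randomize)
{
	assert(max_count >= 1);
	if (!randomize)
		return max_count;
	return 1 + rand() % max_count;
}
```

With max_count = 3 and randomize = true this yields 1, 2 or 3 on each
call, matching the DSTRIPECOUNT=3 DSTRIPECOUNTRAND=true example.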

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10435
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I0092958a9bcf1991adc9f45ac1fbed9340a06c57
Reviewed-on: https://review.whamcloud.com/45240
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14195 obdclass: change list_sort to use const pointers 19/45219/2
Jian Yu [Wed, 13 Oct 2021 07:12:03 +0000 (00:12 -0700)]
LU-14195 obdclass: change list_sort to use const pointers

Kernel 5.10.70 commit 4f0f586bf0c898233d8f316f471a21db2abd522d
defines the list_cmp_func_t type and changes the comparison
function types of all list_sort() callers to use const pointers
to avoid type mismatches.
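
A comparison callback in the const-pointer style might look like the
sketch below. It is user-space code for illustration: the typedef
mirrors the shape of list_cmp_func_t under a hypothetical name, and
struct demo_item, demo_cmp() and demo_in_order() are made up for the
example.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Same shape as the comparison type: both list nodes are const. */
typedef int (*demo_list_cmp_func_t)(void *priv,
				    const struct list_head *a,
				    const struct list_head *b);

struct demo_item {
	int key;
	struct list_head link;
};

/* Map a const list node back to its (const) containing item. */
static const struct demo_item *demo_item_of(const struct list_head *node)
{
	return (const struct demo_item *)
		((const char *)node - offsetof(struct demo_item, link));
}

/* Comparator: negative, zero or positive, never modifies the items. */
static int demo_cmp(void *priv, const struct list_head *a,
		    const struct list_head *b)
{
	int ka = demo_item_of(a)->key;
	int kb = demo_item_of(b)->key;

	(void)priv;
	return (ka > kb) - (ka < kb);
}

/* A sort routine would call the callback like this for each pair. */
static int demo_in_order(demo_list_cmp_func_t cmp,
			 const struct list_head *a,
			 const struct list_head *b)
{
	assert(cmp != NULL);
	return cmp(NULL, a, b) <= 0;
}
```

A comparator declared with non-const struct list_head pointers would not
match this function pointer type, which is the mismatch being avoided.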

Change-Id: I40d37ec0f0d485d0deebaa9dc3493f2865f76ec9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45219
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15081 vfs: set_nlink() is not race-safe 91/45191/2
Andrew Perepechko [Mon, 11 Oct 2021 19:11:05 +0000 (22:11 +0300)]
LU-15081 vfs: set_nlink() is not race-safe

set_nlink() is not atomic wrt race with itself and
the following warning may be triggered by VFS:

WARNING: CPU: 5 PID: 195090 at fs/inode.c:241 __destroy_inode+0xdb/0xf0

It does not seem important what exact nlink value is the result
of the race. However, we need to protect the superblock remove
counter.

Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-9825
Change-Id: I67bc345b9a9e43fb88d919a83246759d11604b03
Reviewed-on: https://review.whamcloud.com/45191
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15045 utils: fix lfs_migrate for files with spaces 73/45173/6
Andreas Dilger [Thu, 30 Sep 2021 02:51:51 +0000 (20:51 -0600)]
LU-15045 utils: fix lfs_migrate for files with spaces

Fix the lfs_migrate script to properly quote "$OLDNAME" so that
it works for filenames with spaces and other characters in them.

Test-Parameters: trivial
Fixes: 8bedfa377fbd ("LU-11510 lfs: migrate a composite layout file correctly")
Fixes: 128137adfc53 ("LU-13090 utils: fix lfs_migrate -p for file with pool")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic00f41f3a91ad9dfa491ff57768a3da0c6300c1e
Reviewed-on: https://review.whamcloud.com/45173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15071 utils: tunefs erase-params for zfs 45/45145/2
Alexander Boyko [Thu, 7 Oct 2021 09:57:41 +0000 (05:57 -0400)]
LU-15071 utils: tunefs erase-params for zfs

The patch excludes special zfs params from tunefs --erase-params
and skips nvlist modification. It also fixes conf-sanity test_89.

With the old code, tunefs --erase-params produced a segmentation fault.

Test-Parameters: trivial fstype=zfs testlist=conf-sanity
HPE-bug-id: LUS-10314
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ic8385a99ca896ce6d855692b3f77e198bf583d94
Reviewed-on: https://review.whamcloud.com/45145
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15067 lod: fix error handling in lod_new_pool 37/45137/4
Sergey Cheremencev [Wed, 23 Jun 2021 15:08:00 +0000 (18:08 +0300)]
LU-15067 lod: fix error handling in lod_new_pool

- correct error handling in lod_new_pool - ENOMEM
 from tgt_pool_init may cause incorrect pool_count.
- optimisation in lu_tgt_pool_add. Do not extend
 a pool if the target already exists.

HPE-bug-id: LUS-6995
Change-Id: I0ea00d335217f3994334bd02ae36c34653e7da98
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45137
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15011 lod: count all spilling events 47/44947/5
Andreas Dilger [Wed, 15 Sep 2021 23:30:23 +0000 (17:30 -0600)]
LU-15011 lod: count all spilling events

Count every event where the target pool is used because the original
pool does not have enough space.
The counter can be read with lctl from lod.*.pool.<poolname>.spill_hit.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I6d54a2b910705da182b5f4118e535d196cdab004
Reviewed-on: https://review.whamcloud.com/44947
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13076 dne: dir migrate in QOS mode 86/44886/3
Lai Siyao [Tue, 7 Sep 2021 09:33:21 +0000 (05:33 -0400)]
LU-13076 dne: dir migrate in QOS mode

Support "lfs migrate -m -1 ..." to migrate a directory to MDTs chosen
by space and inode usage. If the system is balanced, the target MDT is
chosen in round-robin mode; otherwise the less full MDTs are chosen,
and the most full MDT is avoided.

Another minor change: if a directory is migrated to specific MDTs
and the target stripe count is more than 1, its subdirs may not be
migrated to the MDT specified in the command, but to the MDT where
the parent stripe is located (the subdir will be striped too),
which avoids unnecessary remote directories. NB, for a command like
"lfs migrate -m 0,1,2 ...", though the subdir may be located on
any of MDT0, MDT1 or MDT2, its stripes will be striped over these
three MDTs, but for a command like "lfs migrate -m 0 -c 3...", the
subdir may be striped on other MDTs if the subdir is not located on
MDT0.

Add sanity 230u.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I6e9c3d75bfc240b21c65ba27cd5e4bcca7058325
Reviewed-on: https://review.whamcloud.com/44886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14927 quota: move qsd_transfer to lquota module 35/44735/7
James Simmons [Mon, 11 Oct 2021 15:10:04 +0000 (11:10 -0400)]
LU-14927 quota: move qsd_transfer to lquota module

With osd-zfs inheriting the tainted state of ZFS we can no longer
directly use kernel internals that are exported GPL-only. The osd-zfs
module's quota code uses some of these internal functions. The
osd-zfs qsd_transfer() function is generic enough that we can
move it to the lquota module.

Change-Id: I735db0266306477cd5558968f1f3ed1f6e1b32da
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44735
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14938 tests: fail_abort() in t-f to take care of MDTs 71/44671/15
Alex Zhuravlev [Mon, 16 Aug 2021 17:22:00 +0000 (20:22 +0300)]
LU-14938 tests: fail_abort() in t-f to take care of MDTs

fail_abort() in test-framework ensures that the clients
are back after evictions. The same should be done for
MDTs, as otherwise any subsequent test may fail due to
another MDT observing an eviction and interrupting the current
request with -EIO.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0a00ece52d28c6d28eef029a4f87a348efaa041c
Reviewed-on: https://review.whamcloud.com/44671
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14937 build: re-use config cache in 'make rpms/debs' 59/44659/7
Sebastien Buisson [Fri, 13 Aug 2021 15:16:08 +0000 (17:16 +0200)]
LU-14937 build: re-use config cache in 'make rpms/debs'

The idea is to pass along the value of the -C or --cache-file option
from the initial ./configure to the one launched as part of the rpm or
deb build.
However, all cache info related to environment variables must be
removed, otherwise the config cache file cannot be reused.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iab4ae2815961ba10132d9cb44f82ca58d313e908
Reviewed-on: https://review.whamcloud.com/44659
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14909 test: mkdir_on_mdt0 to mkdir on MDT0 44/44544/4
Lai Siyao [Tue, 3 Aug 2021 15:56:47 +0000 (11:56 -0400)]
LU-14909 test: mkdir_on_mdt0 to mkdir on MDT0

Many sub tests in recovery-small.sh and replay-single.sh need to mkdir
on MDT0, use mkdir_on_mdt0() to create such directories.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=recovery-small.sh,replay-single.sh
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id4a44da062350ea284f51c8c821302aebbfe9dee
Reviewed-on: https://review.whamcloud.com/44544
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14831 osd-ldiskfs: uninited osd_inode_id is used 49/44349/3
Hongchao Zhang [Wed, 30 Jun 2021 11:15:03 +0000 (19:15 +0800)]
LU-14831 osd-ldiskfs: uninited osd_inode_id is used

In osd_fid_lookup, the "osd_inode_id" could be used uninitialized
if the FID doesn't exist in OI, which causes a fake FID/inode
pair to be inserted into the OI file and the OI scrub thread to be
triggered repeatedly.

Change-Id: I9100dece9e94d3e590f17fb4498601876aa1edaa
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44349
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14174 lfs: llapi_mirror_find() signed return 57/41757/7
Mikhail Pershin [Fri, 4 Dec 2020 12:06:37 +0000 (15:06 +0300)]
LU-14174 lfs: llapi_mirror_find() signed return

Check return codes from llapi_mirror_find() as signed values
so that errors are not counted as valid mirror IDs.
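
A minimal sketch of why the signedness matters, written in Python with
ctypes purely for illustration (the actual fix is in C). It assumes the
failure is reported as a negated errno value, which this message does
not state, and the helper is_valid_mirror_id() is hypothetical:

```python
import ctypes
import errno

# Assumed failure value for this sketch: a negated errno.
rc = -errno.EINVAL

# Stored in an unsigned 32-bit variable, the negative value wraps
# around to a large positive number and the error is hidden.
as_unsigned = ctypes.c_uint32(rc).value

# Stored in a signed 32-bit variable, the sign survives, so a
# "value < 0" check still identifies the failure.
as_signed = ctypes.c_int32(rc).value


def is_valid_mirror_id(value):
    """Treat only positive values as mirror IDs (assumption of this sketch)."""
    return value > 0
```

With the unsigned variable the error passes the validity check; with the
signed one it is rejected.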

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9754eaa063c9a2d07d8b815a86e7597922201f9c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41757
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
2 years agoLU-14409 ldiskfs: Add support for SUSE 5.3.18-24.46.1 73/41473/7
Shaun Tancheff [Tue, 24 Aug 2021 08:25:32 +0000 (03:25 -0500)]
LU-14409 ldiskfs: Add support for SUSE 5.3.18-24.46.1

Linux-commit: f902b216501094495ff75834035656e8119c537f
ext4: fix bogus warning in ext4_update_dx_flag()

The update breaks the ldiskfs pdirop patch which disables
ext4_update_dx_flag.

SUSE 5.3.18-24.46 can directly use the 5.4.0-66-ubuntu20.series

Test-Parameters: trivial
HPE-bug-id: LUS-9684, LUS-9758
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I9271ee80c05715d7dcec78535cfde1e384ba40e9
Reviewed-on: https://review.whamcloud.com/41473
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12391 tests: mdsrate tests improvements 69/35069/5
Elena Gryaznova [Thu, 14 Oct 2021 16:18:31 +0000 (19:18 +0300)]
LU-12391 tests: mdsrate tests improvements

Patch improves mdsrate tests to work with
striped directories which are created if
MDSRATE_ENABLE_DNE=true.

mdsrate.c is fixed to not fail if the --mdtcount
option is used and ndirs=1. Without this fix,
  mdsrate --mdtcount
fails with:
cannot create stripe dir: File exists
when the ranks run system(mkdir_cmd) on a
directory that was already created by the
first rank to execute.

Patch sets mdt.*.enable_remote_dir_gid to "-1"
to allow mpiuser to create remote directories.

Patch makes the mdsrate based tests a bit more
verbose: mdsrate create/mknod is called with the
--debug option if VERBOSE is set to "true".

Test-Parameters: trivial testlist=performance-sanity
Fixes: f31c60c97328 ("LU-1187 tests: Add mntfmt/mntcount/mdtcount to mdsrate")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-7262
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ieb32ad7dfad838fc9124740236889a5fe47cb901
Reviewed-on: https://review.whamcloud.com/35069
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11407 obdclass: add start time to stats files 01/33201/10
Andreas Dilger [Wed, 19 Sep 2018 21:08:47 +0000 (17:08 -0400)]
LU-11407 obdclass: add start time to stats files

When the stats files are initialized or reset, store the current
timestamp with the stats.  That allows computing average IO and
RPC rates over the accumulated stats lifetime, in addition to the
normal incremental operation rates found by comparing successive
values read from the stats file with the read interval.
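
The two kinds of rate mentioned above can be sketched as follows. This is
a minimal Python illustration, not code from the patch: the function
names are hypothetical, and timestamps are assumed to be in seconds.

```python
def lifetime_rate(count, start_time, snapshot_time):
    """Average operations per second since the stats were started or reset."""
    elapsed = snapshot_time - start_time
    if elapsed <= 0:
        return 0.0  # no time has passed yet, so no meaningful rate
    return count / elapsed


def incremental_rate(prev_count, prev_time, cur_count, cur_time):
    """Operations per second between two successive reads of the stats."""
    interval = cur_time - prev_time
    if interval <= 0:
        return 0.0
    return (cur_count - prev_count) / interval
```

The lifetime rate needs only a single read of the stats once the start
timestamp is stored with them, while the incremental rate needs two reads
and the interval between them.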

Any stats that currently print the "snapshot_time:" header will
now also print "start_time:" and "elapsed_time:" fields.
Consolidate this printing into a helper function instead of
duplicating very similar code in many different functions.  Output
can't be exactly the same for all callers, because these fields are
embedded into different types of output files, but it is very close.

Change struct rename_stats and brw_stats to use a common name prefix.

Change the obd_job_stats timestamps to ktime_t so that we can use the
common helper function for printing the header.  It is easier to store
ojs_cleanup_interval internally as 1/2 of the maximum stats age, since
division is more easily done when the value is initially set as
seconds compared to when it is ktime_t.  This may also be a tiny bit
more efficient since we don't do a divide/shift on each access.
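
The idea of halving the value once, when it is set, can be sketched in
Python. The class and method names below are hypothetical and only model
the arithmetic; they are not the kernel data structures.

```python
class JobStatsConfig:
    """Hypothetical stand-in for the job stats cleanup settings."""

    def __init__(self, max_age_seconds):
        self.set_max_age(max_age_seconds)

    def set_max_age(self, max_age_seconds):
        # Divide once, while the value is still in whole seconds.
        self.cleanup_interval = max_age_seconds // 2

    def get_max_age(self):
        # Multiply back only when the setting is read.
        return self.cleanup_interval * 2

    def cleanup_due(self, now, last_cleanup):
        # The frequent check compares against the stored interval
        # directly, with no division per call.
        return now - last_cleanup >= self.cleanup_interval
```

One consequence of this layout in the sketch is that an odd maximum age
reads back one second smaller than the value that was set.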

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iacefa17def455ef53a28fd14b6d4c670463ebbe5
Reviewed-on: https://review.whamcloud.com/33201
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15115 ptlrpc: recalc timer on EINPROGRESS reply 66/45266/2
Alexander Zarochentsev [Fri, 15 Oct 2021 18:27:29 +0000 (21:27 +0300)]
LU-15115 ptlrpc: recalc timer on EINPROGRESS reply

ptlrpcd doesn't recalculate the wait queue timer after
getting an -EINPROGRESS reply. This may delay the request
resend until the request times out.

HPE-bug-id: LUS-10366
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Idc76c688a0f7ff8e110446fd1fe13dd83f636f3b
Reviewed-on: https://review.whamcloud.com/45266
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-2084 lnet: don't retry allocating router buffers 74/45174/4
Andreas Dilger [Sat, 9 Oct 2021 01:20:49 +0000 (19:20 -0600)]
LU-2084 lnet: don't retry allocating router buffers

Don't loop indefinitely trying to allocate router buffer pools if
the number of requested buffers is too large for the system.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic0f2ccf0f7b38dfa254e46e268b27092342efdb5
Reviewed-on: https://review.whamcloud.com/45174
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14976 ptlrpc: align function names with param names 17/44817/3
Andreas Dilger [Thu, 2 Sep 2021 05:50:34 +0000 (23:50 -0600)]
LU-14976 ptlrpc: align function names with param names

Change the internal function names for the ptlrpc proc tunables
to match the parameter names exposed to userspace.  Otherwise it
is needlessly complex to find the function that implements the
"nrs_policies" parameter, since the parameter use itself is wrapped
in a macro that generates the proc handling structure.

Clean up code style in related functions.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9079766cccade963f1510cfcce228da9be3ebbe5
Reviewed-on: https://review.whamcloud.com/44817
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15076 socklnd: lock ksnc_tx_queue list processing 79/45179/2
Artem Blagodarenko [Sat, 9 Oct 2021 04:35:19 +0000 (00:35 -0400)]
LU-15076 socklnd: lock ksnc_tx_queue list processing

A GPF (general protection fault) occurred in
ksocknal_find_timed_out_conn() while processing the
ksnc_tx_queue list.

Add locking to this list.

Change-Id: I1f76683e5798c5015f11e3fa285db9613b1af906
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
HPE-bug-id: LUS-10248
Fixes: 25c1cb2c4d ("LU-9120 lnet: handle socklnd tx failure")
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/45179
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13997 tests: fix sanity test_418 lock cancellation 31/45231/4
Andreas Dilger [Wed, 13 Oct 2021 20:34:14 +0000 (14:34 -0600)]
LU-13997 tests: fix sanity test_418 lock cancellation

Use "do_nodes" directly to cancel DLM locks, rather than
"do_rpc_nodes", since that is very heavy to use in a loop
(each call takes 3s, but the loop delay is only 0.2s).

Due to DoM reserving grant space for the DoM files, the "avail"
space shown by "df" may be smaller in the aggregate returned by
the MDT compared to the individual values from "lfs df".

Skip this part of the check until MDC grant cancel is fixed.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I83d989688ce671f0ff9c62ebdf3144746a3ebbe5
Reviewed-on: https://review.whamcloud.com/45231
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14448 lod: verify LOV before set/inherit 39/45039/6
Lai Siyao [Thu, 23 Sep 2021 10:31:06 +0000 (06:31 -0400)]
LU-14448 lod: verify LOV before set/inherit

A DoM layout can only be set as an entry in a composite layout, and
its stripe count should always be 0.

Verify LOV before setting and inheriting.

Add sanity 270i.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I016d1a202960bfebc72dd808de5f80e09051a01e
Reviewed-on: https://review.whamcloud.com/45039
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13195 osp: osp_send_update_req() should check generation 42/45042/18
Alex Zhuravlev [Mon, 27 Sep 2021 13:28:50 +0000 (16:28 +0300)]
LU-13195 osp: osp_send_update_req() should check generation

and don't send requests that depend on one that has just failed

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I27a2b21130e33287168204ad829c0a53002b517e
Reviewed-on: https://review.whamcloud.com/45042
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12807 tests: fix intermittent runtests failure 14/44614/2
Andreas Dilger [Wed, 11 Aug 2021 20:49:19 +0000 (14:49 -0600)]
LU-12807 tests: fix intermittent runtests failure

Occasional runtests failures are seen in full testing on ldiskfs.
Increase the llog space limit from 50KB to 72KB, since failures are
regularly seen in the 52/64KB range.

Test-Parameters: trivial testlist=runtests
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6e272fe9fec21a650110a42efe31a1dc48e35854
Reviewed-on: https://review.whamcloud.com/44614
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15074 build: Use strlcpy if strscpy is not available 75/45175/5
Shaun Tancheff [Sun, 10 Oct 2021 04:47:29 +0000 (11:47 +0700)]
LU-15074 build: Use strlcpy if strscpy is not available

Linux commit v4.2-rc1-2-g30035e45753b
    string: provide strscpy()

    The strscpy() API is intended to be used instead of strlcpy(),
    and instead of most uses of strncpy().

Unfortunately, strscpy() is not always available.

Test for strscpy() and fall back to strlcpy() when strscpy() is
not available.

Test-Parameters: trivial
Fixes: b77a6d86936 ("LU-14665 lnet: simplify lnet_ni_add_interface")
HPE-bug-id: LUS-9546
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I43038e4a6260dafb57195ec3417ce009f5a3fad4
Reviewed-on: https://review.whamcloud.com/45175
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15064 tests: sanity-sec test_58 must read its own dir 50/45150/2
Sebastien Buisson [Thu, 7 Oct 2021 15:23:05 +0000 (17:23 +0200)]
LU-15064 tests: sanity-sec test_58 must read its own dir

sanity-sec test_58 should restrict file listing to its own
directory, and not try to list the content of the entire Lustre tree.
Listing the whole tree is useless for the purpose of the test, and
exposes it to unclean remnants from previous tests.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec env=ONLY=58
Fixes: 1faf54e8bf ("LU-14989 sec: access to enc file's xattrs")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic9ab0860da0ab86355a207ad9d50feb3975adf68
Reviewed-on: https://review.whamcloud.com/45150
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15060 tests: sanity-flr test_208[a,b] fix 30/45130/3
Elena Gryaznova [Tue, 5 Oct 2021 17:08:59 +0000 (20:08 +0300)]
LU-15060 tests: sanity-flr test_208[a,b] fix

sanity-flr test_208a and test_208b failed with:
  test_208a returned 2
  test_208b returned 2
on a Lustre setup where the OSTs are not all located on one host,
because stack_trap "do_nodes $osts $LCTL set_param $old"
returns 2. Use save_lustre_params() instead of trying to
set nonexistent parameters.

Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Test-Parameters: trivial testlist=sanity-flr env=ONLY=208
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I19cedc0a9745d0d112ac05fe3a800347ab4c40d3
Reviewed-on: https://review.whamcloud.com/45130
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15011 tests: ost-pools to remove big files 20/45120/2
Alex Zhuravlev [Sun, 3 Oct 2021 06:52:27 +0000 (09:52 +0300)]
LU-15011 tests: ost-pools to remove big files

otherwise those files can affect subtest 29

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia07c10882aba97758a3d11965693134eb2238e9a
Reviewed-on: https://review.whamcloud.com/45120
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-15050 tests: do not ignore SANITY_ONLY and SANITYN_ONLY 07/45107/2
Elena Gryaznova [Thu, 30 Sep 2021 18:57:14 +0000 (21:57 +0300)]
LU-15050 tests: do not ignore SANITY_ONLY and SANITYN_ONLY

sanity 150[b,bb,c,d,f,g] and sanityn 107 tests are to be included
in the SANITY_ONLY and SANITYN_ONLY lists only if these lists are
not set by the user.
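
The rule amounts to "default only when unset". A minimal Python sketch of
that logic follows; the test scripts themselves are shell, the function
name is hypothetical, and the default list shown is illustrative rather
than copied from the script.

```python
# Illustrative default, built from the subtest names in this message.
DEFAULT_SANITY_ONLY = "150b 150bb 150c 150d 150f 150g"


def effective_only(user_value, default):
    """Use the caller's list when one is set; otherwise use the default."""
    if user_value:  # a non-empty string means the user made a choice
        return user_value
    return default
```

A user-supplied value such as "1" is passed through untouched, and the
default list applies only when nothing was provided.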

Fixes: 3c75d2522786 ("LU-10664 dom: non-blocking enqueue for DOM locks")
Fixes: 163870abfb7c ("LU-14382 mdt: implement fallocate in MDC/MDT")
Test-Parameters: trivial testlist=sanity-dom env=SANITY_ONLY=1,SANITYN_ONLY=1
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I686731df3dfbfe1b7d4ae2e8621d1b0c10c48a22
Reviewed-on: https://review.whamcloud.com/45107
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
2 years agoLU-15039 lnet: Fix reference leak in lnet_parse 67/45067/2
Chris Horn [Wed, 5 Aug 2020 16:19:35 +0000 (11:19 -0500)]
LU-15039 lnet: Fix reference leak in lnet_parse

We need to drop the reference taken by lnet_nid2peerni_locked() if we
determine that we need to drop the message because of an asymmetric
route.

Test-Parameters: trivial
HPE-bug-id: LUS-9186
Fixes: 955080c3ae ("LU-13779 lnet: Correct asymmetric route detection")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I799c9522b1ce5f4caffc5848a829995e5b5484e7
Reviewed-on: https://review.whamcloud.com/45067
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15010 mdc: add support for grant shrink 56/44956/5
Alex Zhuravlev [Thu, 16 Sep 2021 08:20:18 +0000 (11:20 +0300)]
LU-15010 mdc: add support for grant shrink

Just reuse the existing mechanism used in OSC.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4cdca057d35eaff6493d047127f1fe5eee9e9620
Reviewed-on: https://review.whamcloud.com/44956
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14659 test: improve generate_uneven_mdts() in sanity.sh 49/44649/4
Lai Siyao [Wed, 4 Aug 2021 04:37:29 +0000 (00:37 -0400)]
LU-14659 test: improve generate_uneven_mdts() in sanity.sh

Improve generate_uneven_mdts() in several places:
1. set qos maxage to 1, so the result is up to date, and avoid filling
   up MDT.
2. fill MDT with files of size 64K rather than 1M, so MDT imbalance is
   quicker to achieve.
3. when checking minimum imbalance after test, look up the max value
   from the result, rather than by the index stored before directory
   creation, because the result is dynamic if several MDTs have almost
   the same free space and inodes.
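
Point 3 can be illustrated with a small Python sketch. The function names
are hypothetical, and "max minus min" is used here only as a stand-in
imbalance measure; the test itself is a shell function.

```python
def imbalance_by_stored_index(free_after, max_index_before):
    """Stale approach: trust the index recorded before directory creation."""
    return free_after[max_index_before] - min(free_after)


def imbalance_by_current_max(free_after):
    """Look up the maximum from the current result instead."""
    return max(free_after) - min(free_after)
```

When several MDTs are nearly tied, the MDT that held the maximum before
the test may no longer hold it afterwards, so the stored index can
understate the imbalance.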

Test-Parameter: trivial mdscount=2 mdtcount=4 testlist=sanity
Fixes: 233344d451e ("LU-13417 test: generate uneven MDTs early for sanity 413")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2807101ff632404e25fdb640840d83d1991c88d9
Reviewed-on: https://review.whamcloud.com/44649
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-14905 lfsck: linkEA overflow handling fix 69/44469/3
Vitaly Fertman [Mon, 2 Aug 2021 17:04:44 +0000 (20:04 +0300)]
LU-14905 lfsck: linkEA overflow handling fix

An absent link in the EA is not an issue and should not be fixed if
the EA has overflowed. lfsck should not report it as an issue if there
is no space for this link, and should not report it as fixed when it
is not (in that case linkea_add_buf() returns 0 without adding a new
entry to the EA, and lfsck_namespace_assistant_handler_p1() later
reports it as repaired).

HPE-bug-id: LUS-8810
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: Iba1549045c8c3889adf55c99cdd88756e5643073
Reviewed-on: https://es-gerrit.dev.cray.com/158706
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/44469
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14866 lod: remove duplicate OST_TGT 51/44351/4
Sergey Cheremencev [Tue, 20 Jul 2021 09:24:33 +0000 (12:24 +0300)]
LU-14866 lod: remove duplicate OST_TGT

Remove duplicate OST_TGT from lod_ost_alloc_qos.

Change-Id: I4fbe2daa057f23a60e31e59d7c0db592945a5363
Fixes: 2112ccb3c4 ("LU-13073 osp: don't block waiting for new objects")
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/44351
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14989 sec: keep encryption context in xattr cache 48/45148/2
Sebastien Buisson [Thu, 7 Oct 2021 14:04:34 +0000 (16:04 +0200)]
LU-14989 sec: keep encryption context in xattr cache

When an inode is being cleared, its xattr cache must be completely
wiped. But in case of lock cancel, we want to keep the encryption
context, as further processing might need to check it.

Fixes: 1faf54e8bf ("LU-14989 sec: access to enc file's xattrs")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8a2f4497129353a7fbf86cdaaa13fae6e0988790
Reviewed-on: https://review.whamcloud.com/45148
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13941 osp: Silently lower requested create_count to maximum 67/39967/9
Shaun Tancheff [Mon, 23 Aug 2021 14:40:39 +0000 (09:40 -0500)]
LU-13941 osp: Silently lower requested create_count to maximum

When setting create_count it should silently accept a larger value
and truncate it to the current maximum.

This would avoid issues if that limit is changed in the future.
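
The behaviour described above is a clamp to the upper limit. A minimal
Python sketch, with a hypothetical function name and limit; it covers
only the upper bound, which is all this message discusses:

```python
def set_create_count(requested, current_max):
    """Return the value to store: the request, capped at the current maximum."""
    if requested > current_max:
        return current_max  # accept silently instead of returning an error
    return requested
```

A caller that asks for more than the limit gets the limit, so a script
written against one maximum keeps working if the maximum later changes.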

HPE-bug-id: LUS-5960
Test-Parameters: trivial testlist=parallel-scale,sanity
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4727ba6fca747e1ae9850188ef63c7abb89904be
Reviewed-on: https://review.whamcloud.com/39967
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14797 nodemap: map project id 19/44119/9
Sebastien Buisson [Wed, 30 Jun 2021 16:30:57 +0000 (18:30 +0200)]
LU-14797 nodemap: map project id

Add calls to nodemap_map_id() in order to map project IDs from
client ID to server ID and conversely.
Also extend nodemap_can_setquota() to allow setquota on project
only if ID is not squashed or deny_unknown is not set.
Update sanity-sec test_27a to exercise the feature.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id66458550d312404b1993ead8940c3d12eaadccd
Reviewed-on: https://review.whamcloud.com/44119
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15052 lnet: include linux/ethtool.h 09/45109/5
Jian Yu [Fri, 1 Oct 2021 06:27:07 +0000 (23:27 -0700)]
LU-15052 lnet: include linux/ethtool.h

Kernel 5.11.0-36 no longer includes linux/ethtool.h from
linux/netdevice.h, which causes the following build error:

dereferencing pointer to incomplete type 'const struct ethtool_ops'

This patch fixes the above issue by adding the include into
the file that uses the structure.

Test-Parameters: trivial

Change-Id: Ifc25de5acaebf2b5fd5bb6f1c303366ab9ea6ef6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45109
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-8066 obdclass: Remove lprocfs_obd_short_io_bytes_* declarations 96/45096/3
Oleg Drokin [Thu, 30 Sep 2021 00:17:33 +0000 (20:17 -0400)]
LU-8066 obdclass: Remove lprocfs_obd_short_io_bytes_* declarations

The functions themselves were renamed long ago.

Change-Id: Ic97d83a56d065ff1dadfc9a01d878e246e06a847
Test-Parameters: trivial
Fixes: 32fb31f3bf ("LU-8066 osc: move suitable values from procfs to sysfs")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45096
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
2 years agoLU-15040 mdc: update max_easize on reconnect 73/45073/3
Sergey Cheremencev [Wed, 11 Nov 2020 08:19:29 +0000 (11:19 +0300)]
LU-15040 mdc: update max_easize on reconnect

If the MDS was restarted to enable ea_inode, clients should get the
new max_easize value. However, cl_max_mds_easize is not updated. This
may cause lfs getstripe to fail if a file has a huge stripe count
(2000 for example):

*** Error in `lfs': free(): invalid pointer: 0x0000000000de09d0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x7f0623c03299]
/lib64/libc.so.6(closedir+0xd)[0x7f0623c42ddd]
/lib/liblustreapi.so.1(+0xa557)[0x7f06248b5557]
/lib/liblustreapi.so.1(+0xad74)[0x7f06248b5d74]
lfs[0x4105b3]
/lib/liblustreapi.so.1(Parser_execarg+0x51)[0x7f06248c88e1]
lfs[0x40448e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f0623ba4555]
lfs[0x4044fc]

HPE-bug-id: LUS-9478
Change-Id: If155a63e2f07536c6500b37b5e6191cb8b0d0607
Reviewed-on: https://es-gerrit.dev.cray.com/158100
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Nikitas Angelinas <nangelinas@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45073
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12268 osd: BUG_ON for IAM corruption 72/45072/2
Alexander Boyko [Tue, 28 Sep 2021 13:27:12 +0000 (09:27 -0400)]
LU-12268 osd: BUG_ON for IAM corruption

The patch adds strict checks of buffer head overflow
for IAM dx blocks.

HPE-bug-id: LUS-10178
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I1608f6cbf00b5120fbc36d0c65fcfe37c43e375f
Reviewed-on: https://review.whamcloud.com/45072
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15038 mgc: release cl_mgc_mutex on error 63/45063/3
Andreas Dilger [Mon, 27 Sep 2021 18:29:58 +0000 (12:29 -0600)]
LU-15038 mgc: release cl_mgc_mutex on error

If local_oid_storage_init() returns an error, the cl_mgc_mutex
should be released.
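
The error-path pattern can be sketched in Python with threading.Lock
standing in for the mutex. Everything here is hypothetical: storage_init()
merely simulates a call that can fail, and the sketch assumes the lock
stays held after a successful setup until a later cleanup step.

```python
import threading

mgc_mutex = threading.Lock()  # stands in for cl_mgc_mutex


def storage_init(fail):
    """Simulated init step; returns 0 or a negative error code."""
    return -5 if fail else 0


def fs_setup(fail=False):
    """Take the mutex; keep it on success, release it on the error path."""
    mgc_mutex.acquire()
    rc = storage_init(fail)
    if rc < 0:
        mgc_mutex.release()  # the step that must not be skipped
        return rc
    return 0


def fs_cleanup():
    """Release the mutex taken by a successful fs_setup()."""
    mgc_mutex.release()
```

Without the release on the error path, the next caller of fs_setup()
would block forever on a mutex that nobody is going to unlock.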

Fixes: 3e38436dc09 ("LU-2059 llog: MGC to use OSD API for backup logs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I921dde4e9202733874d8e7f980e95af23739a655
Reviewed-on: https://review.whamcloud.com/45063
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15011 tests: pool spill test modifications 56/45056/3
James Nunez [Mon, 27 Sep 2021 16:59:07 +0000 (10:59 -0600)]
LU-15011 tests: pool spill test modifications

Make the following modifications to the ost-pools
test suite:
test 29 - change check for 'when striping is specified
          explicitly' file from 'file-2' to 'file-3'

test 30 - Add bad parameter check for setting the threshold
          below zero

test 31 - 'do_nodes $mdts $LCTL get_param lod.*.pool.*'
          doesn’t print anything. Change to
          'do_nodes $mdts $LCTL get_param lod.*.pool.*.spill*'

Fixes: 0a998f4723 ("LU-14825 lod: pool spilling")
Test-Parameters: trivial testlist=ost-pools
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Icbdc3d42b7f7609bc57cc37830975d831125d659
Reviewed-on: https://review.whamcloud.com/45056
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoLU-15028 tests: improve ha.sh to be more verbose 25/45025/2
Elena Gryaznova [Wed, 22 Sep 2021 17:47:36 +0000 (20:47 +0300)]
LU-15028 tests: improve ha.sh to be more verbose

Patch adds some informational messages to make it simpler to
detect the reason for a failure.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10286
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I3bef165f497d745c3e8ee3c8a91532096100bb99
Reviewed-on: https://review.whamcloud.com/45025
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>