Oleg Drokin [Fri, 4 Mar 2022 22:10:25 +0000 (17:10 -0500)]
LU-15615 target: Free t10pi crypto state on error
When an error occurs, the crypto state is not released. This not
only leaks memory directly, but can potentially tie up in-memory
pages too.
Change-Id: Ia0870ccbb194e4e9ca8701e1c01d519745c236df
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46761
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Sergey Cheremencev [Thu, 21 Oct 2021 20:28:01 +0000 (23:28 +0300)]
LU-15263 quota: fix bug in qmt_pool_recalc
env should be freed at the end of qmt_pool_recalc,
as it is still needed in qpi_putref. Freeing it earlier
causes a panic if the pool has the last reference:
BUG: unable to handle NULL pointer dereference at 000000000000a0
IP: lu_context_key_get+0x17/0x30 [obdclass]
Call Trace:
lu_object_free.isra.30+0x68/0x170 [obdclass]
lu_object_put+0xc5/0x3e0 [obdclass]
qmt_pool_free+0x30c/0x590 [lquota]
qmt_pool_recalc+0x365/0x1260 [lquota]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21
Lustre-change: https://review.whamcloud.com/45632
Lustre-commit: 57d88137e12472cf5ea08aa28957b4767abd475c
HPE-bug-id: LUS-10426
Change-Id: Ic23dcb858ff811757f38948aa572c936c076e21e
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/46794
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Thu, 30 Sep 2021 15:58:16 +0000 (18:58 +0300)]
LU-13756 quota: up_read leak in qmt_pool_lookup
qmt_pool_lock is not released if qti_pools_add fails in
qmt_pool_lookup.
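The leak pattern described above can be sketched as follows. This is only an illustration: `struct pool_lock`, `pools_add()` and `pool_lookup()` are hypothetical stand-ins (a reader counter instead of the real rw-semaphore), not the Lustre functions, and the actual fix may be structured differently.

```c
#include <errno.h>

/* Hypothetical stand-in for a rw-semaphore: counts outstanding read holds. */
struct pool_lock {
	int readers;
};

static void lock_down_read(struct pool_lock *l) { l->readers++; }
static void lock_up_read(struct pool_lock *l)   { l->readers--; }

/* Hypothetical helper that can fail, standing in for qti_pools_add(). */
static int pools_add(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Every exit path taken after lock_down_read() passes through out_unlock. */
static int pool_lookup(struct pool_lock *l, int should_fail)
{
	int rc;

	lock_down_read(l);
	rc = pools_add(should_fail);
	if (rc)
		goto out_unlock;	/* a leaky version would return here directly */

	/* ... the pool would be used here while the read lock is held ... */
out_unlock:
	lock_up_read(l);
	return rc;
}
```

After a failed call the reader count is back to zero, which is the property the leaky error path violated.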
Lustre-change: https://review.whamcloud.com/45106
Lustre-commit: d16b3141119a3b75276914ad3601e0dd27579b2b
Change-Id: Ic2adb44468d51af7aefcbb91279260ae6f85d67a
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/46793
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vladimir Saveliev [Tue, 24 Aug 2021 14:57:37 +0000 (17:57 +0300)]
LU-15079 quota: include qsd_thread_info into mgs thread
mgs service thread envs do not get supplied with qsd_thread_info,
which may lead to the failure shown below:
(lu_object.h:1274:lu_env_info()) ASSERTION( info ) failed:
Pid: 146951, comm: ll_mgs_0003 3.10.0-957.1.3957.1.3.x4.3.25
Call Trace:
libcfs_call_trace+0x8e/0xf0 [libcfs]
lbug_with_loc+0x4c/0xa0 [libcfs]
qsd_refresh_usage+0x25e/0x2f0 [lquota]
qsd_op_adjust+0x2f1/0x730 [lquota]
osd_object_delete+0x2b2/0x360 [osd_ldiskfs]
lu_object_free.isra.32+0x68/0x170 [obdclass]
lu_site_purge_objects+0x2fe/0x530 [obdclass]
lu_object_find_at+0x371/0xa60 [obdclass]
dt_locate_at+0x1d/0xb0 [obdclass]
llog_osd_open+0x50e/0xf30 [obdclass]
llog_open+0x15a/0x3e0 [obdclass]
llog_origin_handle_open+0x334/0x720 [ptlrpc]
tgt_llog_open+0x33/0xe0 [ptlrpc]
mgs_llog_open+0x46/0x460 [mgs]
tgt_request_handle+0x96a/0x1680 [ptlrpc]
Supply mgs service context with qsd_thread_info.
Lustre-change: https://review.whamcloud.com/45181
Lustre-commit: 69a9042f26fa22b1d5b2ad7b3cb8024d508268dd
Change-Id: If8664b81e1f64df015dad46ba26c9c1d1e3f54bf
HPE-bug-id: LUS-10334
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/46792
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sergey Cheremencev [Wed, 15 Sep 2021 15:05:45 +0000 (18:05 +0300)]
LU-15065 quota: fix BIO write performance drop
Before the patch qti_lqes_qunit_min used int to store the qunit
value, while the lqe_qunit type is __u64. An lqe_qunit > 2G caused
an overflow in a local integer argument. For example, when the
block hard limit was set to 500TB (i.e. lqe_qunit was about
64TB in a system with 2 OSTs), qti_lqes_qunit_min returned
0 instead of 64TB in qmt_lvbo_fill. Thus the new qunit was not
set on OSTs (qsd_set_qunit wasn't called). Without the qunit,
the OST began to send a release request after each acquire. For
example, writing 10MB at the OST caused 2 acquire and
2 release requests to be sent (as qunit was not set on the OST).
With the fix, i.e. in the normal case, the OST needs just one
acquire request. The issue caused a performance drop in buffered
writes of up to 15%-20% compared with a baseline without the PQ
patches.
Note, the issue exists only when a hard limit is set to some
high value (>100GB). The exact hard limit value depends on the
number of OSTs in the system and on the amount of used space, but
the issue is assumed not to exist on a clean system with 2 OSTs and
a hard block limit of 100G (this case was checked).
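The truncation described above can be reproduced in isolation. The sketch below is not the Lustre code: `qunit_as_int()`, `qunit_min_narrow()` and `qunit_min_wide()` are hypothetical helpers, and the narrowing is written as an explicit low-32-bit truncation, which is what common compilers do when a 64-bit value is stored in an int.

```c
#include <stdint.h>

/* Keep only the low 32 bits, then view them as int (models the narrowing). */
static int qunit_as_int(uint64_t qunit)
{
	return (int)(uint32_t)qunit;
}

/* Pre-fix shape (hypothetical): the minimum is tracked in an int. */
static int qunit_min_narrow(const uint64_t *qunits, int count)
{
	int min = qunit_as_int(qunits[0]);

	for (int i = 1; i < count; i++) {
		int q = qunit_as_int(qunits[i]);

		if (q < min)
			min = q;
	}
	return min;
}

/* Post-fix shape (hypothetical): the minimum keeps the full 64-bit width. */
static uint64_t qunit_min_wide(const uint64_t *qunits, int count)
{
	uint64_t min = qunits[0];

	for (int i = 1; i < count; i++)
		if (qunits[i] < min)
			min = qunits[i];
	return min;
}
```

With a qunit of 64TB (2^46 bytes) the low 32 bits are all zero, so the narrow variant reports 0 while the wide variant reports the real value.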
Remove qmt_pool_hash - it is not used anywhere since
"LU-11023 quota: remove quota pool ID".
Lustre-change: https://review.whamcloud.com/45133
Lustre-commit: 7b8c6cd976c584b4e965b24bf4369ded86cda811
HPE-bug-id: LUS-10250
Change-Id: I2c4ce38f5b9395ed1f4868d4c8efc00751116b15
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46791
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sergey Cheremencev [Tue, 12 Oct 2021 15:21:49 +0000 (18:21 +0300)]
LU-15191 quota: set correct revoke_time
If qmt_adjust_qunit handles several lqes and lqe_revoke_time
is set for some of them, the appropriate OSTs have already
been notified with the least qunit and there is no chance
to free more space.
If the qunit of the current lqe becomes equal to the least
qunit, find an lqe with the minimum (earliest) revoke_time
and set this revoke_time on the current one.
This patch fixes the following case. For example, we have
8 OSTs and 4 MDTs (i.e. 12 slaves) and a pool with just one
OST. The global hard block limit for the user is 50M, and 10M
for this user in a pool. The user's usage is 0. As the global pool
has 12 slaves, its initial qunit value is 1M, i.e. equal to
the least qunit. At the same time the initial qunit value for the
pool with one OST is 4M. When the user begins to write, the pool's
qunit is decreased to 1M, but lqe_revoke is not set - it
should be set only after sending the new qunit to OSTs in
qmt_lvbo_update. However, it won't be sent because the appropriate
lqe_qunit in the lqe global array already has the same value.
This problem caused sanity-quota_72 to hang instead of failing
with EDQUOT in test_1_check_write.
Lustre-change: https://review.whamcloud.com/45447
Lustre-commit: e8ecb8775389fb7febd2d0c659f0e80440f0b620
HPE-bug-id: LUS-10516
Change-Id: I5878c1e719ae83a69ad5dbc3653717bb1b4de632
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/46790
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Thu, 17 Jun 2021 10:45:42 +0000 (13:45 +0300)]
LU-15049 quota: fix a panic with pool number > 16
Fix a panic that may occur when there are more than 16
pools in a system:
qti_pools_add()) ASSERTION(qti->qti_pools_num >= QMT_MAX_POOL_NUM) Forgot init? ffff91a5f9625800
Lustre-change: https://review.whamcloud.com/45105
Lustre-commit: d2e8208e22f21bb7354a9207f381217c222d3df3
HPE-bug-id: LUS-10116
Change-Id: I4f73b74d2fd3e85a51cf3c30e2eec29645f164be
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46789
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sergey Cheremencev [Thu, 22 Jul 2021 10:56:24 +0000 (13:56 +0300)]
LU-15048 quota: check that qti_lqes has been inited
qti_lqes_restore_{init,fini}() should check that qti_lqes
has been inited before accessing qti_lqes_count.
The fix protects against the following panic:
qti_lqes_restore_fini() ASSERTION(qmt_info(env)->qti_lqes_rstr)
Lustre-change: https://review.whamcloud.com/45102
Lustre-commit: d2e8208e22f21bb7354a9207f381217c222d3df3
HPE-bug-id: LUS-10239
Change-Id: Ic93d87535f615fe419b2c3a2453506c515837031
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/46788
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Mon, 5 Apr 2021 12:27:34 +0000 (15:27 +0300)]
LU-14631 quota: fix qunit sort
Fix lqes_cmp that is used to sort lqes by qunit. As lqes_cmp returns an
integer, it returns incorrect values if the difference between qunits is
greater than 4GB, causing a write to hang instead of failing with -EDQUOT:
[<ffffffffc0701945>] cl_sync_io_wait+0x295/0x3c0 [obdclass]
[<ffffffffc07026f8>] cl_io_submit_sync+0x1c8/0x360 [obdclass]
[<ffffffffc128dc0a>] vvp_io_commit_sync+0x12a/0x460 [lustre]
[<ffffffffc128f5ee>] vvp_io_write_commit+0x4de/0x620 [lustre]
[<ffffffffc128fa39>] vvp_io_write_start+0x309/0x990 [lustre]
[<ffffffffc0700a18>] cl_io_start+0x68/0x130 [obdclass]
[<ffffffffc0702e8c>] cl_io_loop+0xcc/0x1c0 [obdclass]
[<ffffffffc1243514>] ll_file_io_generic+0x5c4/0xdc0 [lustre]
[<ffffffffc12441b9>] ll_file_aio_write+0x289/0x730 [lustre]
[<ffffffffc1244760>] ll_file_write+0x100/0x1c0 [lustre]
[<ffffffffa0241320>] vfs_write+0xc0/0x1f0
[<ffffffffa024213f>] SyS_write+0x7f/0xf0
The issue occurs if a user hits the block hard limit in a pool (pool
limit 6GB), while the global limit is some huge value (53T in my case).
Change the global limit in sanity-quota_1e to check that the system no
longer hangs.
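The comparator problem described in this commit can be sketched as below. Both functions are hypothetical illustrations rather than the actual lqes_cmp: the narrow variant models squeezing a 64-bit difference into an int (written as an explicit low-32-bit truncation), and the wide variant is one common way to write a three-way compare that never narrows.

```c
#include <stdint.h>

/* Buggy shape: the 64-bit difference is truncated to 32 bits before use. */
static int qunit_cmp_narrow(uint64_t a, uint64_t b)
{
	return (int)(uint32_t)(a - b);
}

/* Three-way compare on the full-width values: returns -1, 0 or 1. */
static int qunit_cmp_wide(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}
```

For example, with qunits of 8GiB and 4GiB the difference is exactly 2^32, so the narrow variant reports the two as equal while the wide variant orders them correctly.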
Lustre-change: https://review.whamcloud.com/43410
Lustre-commit: 9d3ce2985efc315529bf5faf6f3b970cd9949107
HPE-bug-id: LUS-9891
Change-Id: I5a16fd3a40172187bbf35d9a9c9bfeef2ef3a108
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/46787
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Chris Horn [Mon, 1 Nov 2021 20:06:31 +0000 (15:06 -0500)]
LU-15186 o2iblnd: Default map_on_demand to 1
On kernels that provide global MR we default to using that exclusively
even if FMR/FastReg is available. This causes an interop issue if the
active side of a connection request has a higher fragment count than
the passive side because FMR/FastReg may be needed to map the higher
fragment count. We should change the default map_on_demand to 1 so
that FMR/FastReg is used by default. map_on_demand can still be set
to 0 if needed.
Lustre-change: https://review.whamcloud.com/45431
Lustre-commit: 21fdd616bd4784e4e3571294ba39f00b24a25806
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I76010a905f151efbb0b109ae6f5fba6fb7ce1956
Reviewed-on: https://review.whamcloud.com/46807
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Chris Horn [Thu, 16 Sep 2021 17:12:38 +0000 (12:12 -0500)]
LU-15092 o2iblnd: Fix logic for unaligned transfer
It's possible for there to be an offset for the first page of a
transfer. However, there are two bugs with this code in o2iblnd.
The first is that this use-case will require LNET_MAX_IOV + 1 local
RDMA fragments, but we do not specify the correct corresponding values
for the max page list to ib_alloc_fast_reg_page_list(),
ib_alloc_fast_reg_mr(), etc.
The second issue is that the logic in kiblnd_setup_rd_kiov() attempts
to obtain one more scatterlist entry than is actually needed. This
causes the transfer to fail with -EFAULT.
Lustre-change: https://review.whamcloud.com/45216
Lustre-commit: 23a2c92f203ff2f39bcc083e6b6220968c17b475
Test-Parameters: trivial
HPE-bug-id: LUS-10407
Fixes: d226464aca ("LU-8057 ko2iblnd: Replace sg++ with sg = sg_next(sg)")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ifb843f11ae34a99b7d8f93d94966e3dfa1ce90e5
Reviewed-on: https://review.whamcloud.com/46474
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Chris Horn [Wed, 29 Sep 2021 17:42:26 +0000 (12:42 -0500)]
LU-15094 o2iblnd: map_on_demand not needed for frag interop
The map_on_demand tunable is not used for setting max frags so don't
require that it be set in order to negotiate max frags.
Lustre-change: https://review.whamcloud.com/45215
Lustre-commit: 4e61a4aacdbc2376069d52d0f803a9f05315080f
HPE-bug-id: LUS-10488
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie89f1f035f4b05244feffb848c14582a8c7cf0e6
Reviewed-on: https://review.whamcloud.com/46453
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vladimir Saveliev [Thu, 10 Mar 2022 20:00:25 +0000 (12:00 -0800)]
LU-15095 tests: skip lbug_on_grant_miscount on client
Do not try to specify the lbug_on_grant_miscount=1 module parameter
on client-only builds (el7.9, ppc64le, aarch64) as this is a server
parameter and will not be present if the client is built without
HAVE_SERVER_SUPPORT. Otherwise, loading ptlrpc.ko will fail.
Lustre-change: https://review.whamcloud.com/46185
Lustre-commit: 49e29f38343ce0389df0aecf308b0986de94c029
Test-Parameters: trivial testlist=sanityn clientdistro=el7.9
Fixes: 2c787065441e ("LU-15095 target: lbug_on_grant_miscount module parameter")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
LU-15095 target: lbug_on_grant_miscount module parameter
Some tests have hit "lctl: error invoking upcall" when setting the
lbug_on_grant_miscount tunable parameter. Instead, define the
lbug_on_grant_miscount flag as a ptlrpc module parameter,
similar to how it is done for ldiskfs_track_declares_assert.
Lustre-change: https://review.whamcloud.com/45521
Lustre-commit: 2c787065441ee60c6c163dc77851d0964f81a89c
Change-Id: I9cd0f9fa75b37539b23443bbcbb3445c87318ab1
Fixes: bb5d81ea95 ("LU-14543 target: prevent overflowing of tgd->tgd_tot_granted")
Test-Parameters: trivial
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/46768
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vladimir Saveliev [Wed, 9 Mar 2022 19:16:19 +0000 (11:16 -0800)]
LU-14543 target: prevent overflowing of tgd->tgd_tot_granted
If tgd->tgd_tot_granted < ted->ted_grant then there should not be:
tgd->tgd_tot_granted -= ted->ted_grant;
which breaks tgd->tgd_tot_granted.
In case of obvious ted->ted_grant damage, recalculate
tgd->tgd_tot_granted using list of exports.
The same change is made for tgd->tgd_tot_dirty.
This patch also adds a sanity check for the exp->exp_target_data.ted_grant
increase in tgt_grant_alloc() to catch grant counting corruption as
soon as it happens. By default, the detected corruption is reported
with CERROR(); if needed, that can be switched to LBUG() using lctl
set_param *.*.lbug_on_grant_miscount.
test-framework.sh:init_param_vars() enables LBUG().
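The guarded subtraction described in this commit can be sketched as follows. This is an illustration under assumptions: the per-export grants are modelled as a plain array, and `total_from_others()` and `total_release()` are hypothetical names, not the functions touched by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: rebuild the total from every export except the one at skip. */
static uint64_t total_from_others(const uint64_t *grants, size_t n, size_t skip)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		if (i != skip)
			sum += grants[i];
	return sum;
}

/* Remove export idx from the running total without letting it wrap. */
static uint64_t total_release(uint64_t total, const uint64_t *grants,
			      size_t n, size_t idx)
{
	uint64_t grant = grants[idx];

	/* An unsigned "total -= grant" would wrap around to a huge value here. */
	if (total < grant)
		return total_from_others(grants, n, idx);

	return total - grant;
}
```

When the counters are consistent this is an ordinary subtraction; only a damaged total takes the slower recalculation path.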
Lustre-change: https://review.whamcloud.com/42129
Lustre-commit: bb5d81ea95502fb5709e176b561b70aa5280ee07
Fixes: af2d3ac30e ("LU-11939 tgt: Do not assert during grant cleanup")
Change-Id: I36ba7496f7b72b4881e98c06ec254a8eefd4c13f
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Cray-bug-id: LUS-9875
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46767
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Vladimir Saveliev [Wed, 9 Mar 2022 20:14:13 +0000 (12:14 -0800)]
LU-9704 grant: ignore grant info on read resend
The following scenario causes a message like "claims 28672 GRANT, real
grant 0" to appear:
1. the client owns X grants and runs rpcs to shrink part of those
2. the server fails over so that the shrink rpc is to be resent.
3. on the client reconnect, server and client sync on the initial amount
of grants for the client.
4. the shrink rpc is resent; if server disk space is sufficient, the shrink
does not happen and the client adds the amount of grants it was going to
shrink to its new initial amount of grants. Now the client thinks that
it owns more grants than it does from the server's point of view.
5. the client consumes grants and sends rpcs to the server. The server avoids
allocating new grants for the client if the current amount of grant
is big enough:
static long tgt_grant_alloc(struct obd_export *exp, u64 curgrant,
...
if (curgrant >= want || curgrant >= ted->ted_grant + chunk)
RETURN(0);
6. the client continues consuming grants, which eventually leads to
complaints like "claims 28672 GRANT, real grant 0".
When read and set_info:shrink RPCs are resent, grant info should
be ignored as it was reset on reconnect.
A test to illustrate the issue is added.
Lustre-change: https://review.whamcloud.com/45371
Lustre-commit: 38c78ac2e390b30106f3e185d8c4d92b8cb19c2b
HPE-bug-id: LUS-7666
Change-Id: I8af1db287dc61c713e5439f4cf6bd652ce02c12c
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46770
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Sun, 20 Feb 2022 18:43:33 +0000 (11:43 -0700)]
LU-15010 tests: skip sanity test_64g/64h for interop
Sanity test_64g checks code that was only added in 2.14.56.
Lustre-change: https://review.whamcloud.com/46565
Lustre-commit: a57f7708c9e8ecfeca874cda9cebc6b7ced3a9bb
Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=64
Fixes: 6e116213e3fd ("LU-15010 mdc: add support for grant shrink")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I339231f1b7890e8fffe7e079a052b15f54d4a050
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-on: https://review.whamcloud.com/46832
Sebastien Buisson [Thu, 25 Mar 2021 16:55:35 +0000 (17:55 +0100)]
LU-13717 sec: handle null algo for filename encryption
Encrypted files created with Lustre 2.14 have clear text file names.
With new code implementing filename encryption, newly created files
will have cipher text names, unless they are in an encrypted directory
created in Lustre 2.14.
So we need to make sure llcrypt library can properly handle the "null"
algorithm for client side filename encryption, which is basically a
no-op.
Handling this "null" algo for filename encryption will not be possible
with the in-kernel fscrypt library, so modify the behaviour of
configure to build with embedded llcrypt by default, and only build
against in-kernel fscrypt if explicitly specified via
--enable-crypto=in-kernel configure option.
The objective is to urge users to convert their encrypted directories
to the new fashion that encrypts filenames.
However, with the new code some operations on encrypted files created
with 2.14 might not be possible, like migrate, so expressly forbid
migrate on files that use the "null" algorithm for client side
filename encryption.
Finally, we revert commit 11fcbfa9de4a5170abc2c5df2a6e4e02f0f84268
("LU-12275 sec: force file name encryption policy to null") so that
new encrypted directories will enforce filename encryption.
Lustre-change: https://review.whamcloud.com/43388
Lustre-commit: f18c87cb5362496a4baadaa14265471c992ca06a
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I393945adc9b720a56544b5da0669cb2848507457
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45729
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Mon, 19 Oct 2020 14:23:05 +0000 (23:23 +0900)]
LU-13717 sec: limit hard links to linkEA size for enc files
Some operations on encrypted files require identifying all names for
files having the same FID. For instance, for lookup, getattr or unlink
on encrypted files without the encryption key, we need to perform an
operation by FID instead of the actual name.
In order to make operations by FID unambiguous on server side, we
decide to limit the number of possible hard links for encrypted files,
to what the linkEA can contain.
Currently linkEA stores 4KiB of links, that is 14 NAME_MAX links, or
119 16-byte names.
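The two figures above can be reproduced with simple arithmetic. The sketch assumes a 24-byte linkEA header and 18 bytes of per-entry overhead (a 2-byte record length plus a 16-byte parent FID); these sizes are assumptions for illustration and are not stated in this commit message, and `linkea_max_links()` is a hypothetical helper.

```c
#include <stddef.h>

enum {
	LINKEA_HEADER_SIZE    = 24,	/* assumed size of the linkEA header */
	LINKEA_ENTRY_OVERHEAD = 18,	/* assumed: 2-byte reclen + 16-byte parent FID */
};

/* Number of links with names of name_len bytes that fit in ea_size bytes. */
static size_t linkea_max_links(size_t ea_size, size_t name_len)
{
	if (ea_size < LINKEA_HEADER_SIZE)
		return 0;

	return (ea_size - LINKEA_HEADER_SIZE) /
	       (LINKEA_ENTRY_OVERHEAD + name_len);
}
```

Under these assumptions, 4096 bytes hold (4096 - 24) / (18 + 255) = 14 links with NAME_MAX (255-byte) names, or (4096 - 24) / (18 + 16) = 119 links with 16-byte names, matching the numbers quoted above.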
Lustre-change: https://review.whamcloud.com/43387
Lustre-commit: 2ffb8f5726d27e7c2324a3e833491231fdaa3306
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I20a01874899f95b2ff61e05b2aa6851d135633e8
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45728
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Fri, 11 Mar 2022 15:14:52 +0000 (09:14 -0600)]
EX-4866 lipe: don't unmount an empty client list
In hot-pools.sh, if the node list is empty then don't try to unmount
it since that will only confuse things.
Fixes: e1da905b3884 EX-4866 lipe: don't unmount the local client
Test-Parameters: trivial testlist=hot-pools
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I1bf057beffd025a549524e85f02609be9611cccc
Reviewed-on: https://review.whamcloud.com/46800
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Mikhail Pershin [Wed, 9 Mar 2022 21:17:37 +0000 (13:17 -0800)]
LU-15357 mdd: fix changelog context leak
The mdd_changelog_clear() shouldn't skip llog_ctxt_put()
in case of error.
Lustre-change: https://review.whamcloud.com/45831
Lustre-commit: d083c93c6fd9251d6637d33029049b1d27d2a20a
Fixes: 6b183927e1 (LU-14553 changelog: eliminate mdd_changelog_clear warning)
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9c9aa3ce0d11e8f67470b450d007f2a1081644c6
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46773
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Mikhail Pershin [Wed, 9 Mar 2022 21:11:33 +0000 (13:11 -0800)]
LU-14699 mdd: proactive changelog garbage collection
Currently changelog garbage collection starts when a user
exceeds the maximum idle timeout. There is also a limit on the
number of idle records, but it is used only for old changelog users
which have no cur_time field, so it is effectively not used at
all nowadays. Another problem is that garbage collection is
started only when the changelog is almost full. This often leads
to situations where the changelog has very old users
staying much longer than the idle timeout and holding idle
records above the maximum limit, consuming space for nothing.
Patch reworks changelog GC in the following way:
- GC starts when the changelog is almost full (old way), or when
either the idle time or idle records limit is exceeded, or
when (idle_time * idle_records) exceeds its limit as well.
The last limit is calculated as:
(idle_time * idle_records) / 84600 > (1 << 32) which is a
reasonable heuristic for deciding if a user is "too idle"
both when lots of records are being created quickly and when
the user is idle for a very long time.
- to avoid processing all changelog users each time GC
checks its conditions, both the least user record and time
are tracked when changelog users are initialized or
purged/canceled. Both values are stored as mdd_changelog
fields mc_minrec and mc_mintime
- test 160g is changed to test the new approach when idle
indexes are checked always along with idle time checks
- test 160s is added in sanity.sh to check heuristic approach
with (idle_time * idle_records) value checking
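The combined GC trigger from the list above can be sketched as a pair of predicates. This is a hedged illustration: `struct gc_limits`, `user_too_idle()` and `gc_needed()` are hypothetical names, the divisor is copied as quoted in this commit message, and the shift is written as 1ULL << 32 because shifting a plain int by 32 is not valid C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two independent limits. */
struct gc_limits {
	uint64_t max_idle_time;		/* seconds */
	uint64_t max_idle_records;
};

/* Product heuristic, with the constants as quoted in the commit message. */
static bool user_too_idle(uint64_t idle_time, uint64_t idle_records)
{
	return (idle_time * idle_records) / 84600 > (1ULL << 32);
}

/* GC is needed if any one of the four conditions holds. */
static bool gc_needed(bool almost_full, uint64_t idle_time,
		      uint64_t idle_records, const struct gc_limits *lim)
{
	return almost_full ||
	       idle_time > lim->max_idle_time ||
	       idle_records > lim->max_idle_records ||
	       user_too_idle(idle_time, idle_records);
}
```

Note that the 64-bit product can overflow for extreme inputs; the sketch does not guard against that.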
Lustre-change: https://review.whamcloud.com/45068
Lustre-commit: f60b307c5001e1d9035af61d2344af33d3ea0f85
Fixes: 3442db6faf68 ("LU-7340 mdd: changelogs garbage collection")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6028f3164212a2377a4fc45b60a826c64f859099
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Wed, 9 Mar 2022 20:58:08 +0000 (12:58 -0800)]
LU-14058 tests: handle more MDTs in sanity.sh
Fix up sanity.sh test_160 to handle configurations with more MDTs.
The "fnv_1a_64" hash is _relatively_ uniform and harder to break
under normal (ab)use, but it doesn't leave entries totally balanced.
Even "all_chars" hash has a repeat MDT every handful of entries.
Since we need perfect balance across MDTs, use "lfs mkdir -i".
Fix a bug in test_160g that wasn't setting changelog_max_idle_indexes
properly for test systems with more than 4 MDTs.
Lustre-change: https://review.whamcloud.com/41485
Lustre-commit: 173bccd140adf69ce08c20810a69e783c8c12595
Test-Parameters: trivial testlist=sanity env=ONLY=160,230 mdtcount=8
Fixes: 489afbe69d5b ("LU-13321 tests: force even DNE file distribution")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I08bf2274a00fe1c6e52ec1a55f50dc8662d354a9
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46772
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
John L. Hammond [Fri, 4 Mar 2022 14:17:48 +0000 (08:17 -0600)]
EX-4015 lipe: implement lazy size and blocks
If the current file is an OST object or not a regular file then we use
the size and blocks values from the inode. (But this is wrong for
striped directories.) If the current file is a regular MDT inode then
we check for strict or lazy SOM, followed by HSM released, followed by
unstriped.
Rename loa_attr_bits to loa_valid. Add new fields loa_noattr and
loa_error to distinguish among the cases of xattrs we haven't tried to
read, xattrs which are not set, and xattrs which could not be read (or
parsed).
Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I5b197dd7989a3f618c97c9025a4bd534dfe86152
Reviewed-on: https://review.whamcloud.com/46698
Tested-by: jenkins <devops@whamcloud.com>
John L. Hammond [Tue, 1 Mar 2022 20:16:52 +0000 (14:16 -0600)]
EX-4015 lipe: use direct IO
Use direct IO by default in lipe_scan3. Retry ext2fs_open() without
EXT2_FLAG_DIRECT_IO if it fails. Add a --direct-io=0|1 option to
explicitly disable or enable direct IO.
Add an --io-options option to pass down ext2 io_manager options.
Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I25347949bbff9e697da26431807daf37cfb720fa
Reviewed-on: https://review.whamcloud.com/46682
Tested-by: jenkins <devops@whamcloud.com>
John L. Hammond [Mon, 28 Feb 2022 14:05:18 +0000 (08:05 -0600)]
EX-4539 lipe: add -xattr and -xattr-match
Add '-xattr NAME' and '-xattr-match NAME VALUE' tests to
lipe_find3. Add sanity-lipe-find3 test_111() to verify.
Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I32c077f99d495cd79e670efef59e4a2939af753f
Reviewed-on: https://review.whamcloud.com/46681
Tested-by: jenkins <devops@whamcloud.com>
Alexandre Ioffe [Thu, 3 Mar 2022 00:39:45 +0000 (16:39 -0800)]
EX-4166 lipe: lamigo test coverage for OSS
Add test for lamigo ALR with multiple OSS's
Add debug trace point to report update message
from ofd_access_log_reader
Test-Parameters: trivial testlist=hot-pools
Test-Parameters: trivial testlist=hot-pools mdscount=2 osscount=2 mdtcount=2 ostcount=8
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iaae847190426ff34d8991e8a571b3e38616bc4c9
Reviewed-on: https://review.whamcloud.com/46686
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Alexandre Ioffe [Fri, 25 Feb 2022 22:00:41 +0000 (14:00 -0800)]
EX-4866 lipe: don't unmount the local client
Exclude local client from unmount list when test is
completed in hot-pools test framework.
Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I6b12269b6af3d3b5465645cbc007c9a5302f64a1
Reviewed-on: https://review.whamcloud.com/46671
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
John L. Hammond [Fri, 25 Feb 2022 23:12:54 +0000 (17:12 -0600)]
EX-4539 lipe: lipe_find3 print updates
Change -print-json to accept a comma separated list of
attributes. Optional attributes may be specified by placing them
inside brackets. For example "lipe_find3 DEVICE -print-json
'uid,gid,som,[size,blocks]'" will only print JSON for inodes with a
valid UID, GID, and SoM attribute. If in addition the size and blocks
attributes of the inode are valid then they will be included in the
object as well. Support a pass-through --list-json-attrs option.
Change the default action to print the relative path. Adjust
sanity-lipe-find3 accordingly.
Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Id380ca21e2b1aabf30f65fd3e14b7e2f7808d0a6
Reviewed-on: https://review.whamcloud.com/46630
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Thu, 24 Feb 2022 23:17:38 +0000 (17:17 -0600)]
EX-4015 lipe: add -blocks, -crtime, -mirror-count, -stripe-count
Add -blocks, -crtime, -mirror-count, and -stripe-count to
lipe_find3. Add (crtime), (lov-mirror-count), and (lov-stripe-count)
to lipe_scan3.
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I5b4314a9621309b00453fea637329d3de442544a
Reviewed-on: https://review.whamcloud.com/46607
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Thu, 24 Feb 2022 16:56:59 +0000 (10:56 -0600)]
EX-4015 lipe: batching and threading improvements
Remove the current group descriptor mutex from struct
scan_control. Use __ATOMIC_RELAXED fetch and add to allocate the next
batch of groups. Report the correct start group of the batch in
debugging output and remove a redundant batch debug message.
Use atomic loads and stores for the ti_should_stop member which is
responsible for lipe-scan-break. Check if the current thread should
stop in the outer loop of ls3_scan_thread_start_scm() as well as the
inner loop of ldiskfs_scan_groups().
Reduce the default scanning thread count from _SC_NPROCESSORS_ONLN / 2
to _SC_NPROCESSORS_ONLN / 4.
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ic99c27504333f1d63a689e091d857e44062ef584
Reviewed-on: https://review.whamcloud.com/46605
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Mon, 21 Feb 2022 14:31:20 +0000 (08:31 -0600)]
EX-4015 lipe: add lipe-scan RPM
Adding new dependencies to existing EXAScaler RPMs may create
headaches when distributing hotfixes to existing installs. So move
lipe_find3 and lipe_scan3 to a new RPM (lipe-scan). This also has the
benefit of explicitly severing the new scanning tools from any python2
RPM or pip dependencies.
Compile fid.scm and find.scm to (%site-ccache-dir)/lipe/.
Test-Parameters: trivial testlist=sanity-lipe-find3 serverextra_install_params="--packages lipe-scan"
Test-Parameters: trivial testlist=sanity-lipe-scan3 serverextra_install_params="--packages lipe-scan" facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ifecb5ab1f399ba9be8cb395ded29d6394b13dc86
Reviewed-on: https://review.whamcloud.com/46572
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Tue, 15 Feb 2022 17:39:23 +0000 (11:39 -0600)]
EX-4539 lipe: remove -coverage
Remove -coverage from CFLAGS for lipe_scan3 and lipe_find3.
Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8be003650574104d5eaa8298043ba789e6464fde
Reviewed-on: https://review.whamcloud.com/46532
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Tue, 22 Feb 2022 15:43:28 +0000 (09:43 -0600)]
EX-4539 lipe: add -pool to lipe_find3
Add a -pool test to lipe_find3. Add sanity-lipe-find3 test_110() to
verify.
Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I9649d7f80431d22223da17372ec4d64fa6ca2f37
Reviewed-on: https://review.whamcloud.com/46584
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Mon, 21 Feb 2022 20:53:42 +0000 (14:53 -0600)]
EX-4539 lipe: add -perm to lipe_find3
Fill in the -perm test in lipe_find3. Populate sanity-lipe-find3
test_101() to verify.
Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib201503247101619416c39ae97f5068230441863
Reviewed-on: https://review.whamcloud.com/46576
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Tue, 8 Feb 2022 14:20:11 +0000 (08:20 -0600)]
EX-4539 lipe: add lipe_find3
Add a lipe_find3 wrapper around the lipe_scan3 scanner and test script
sanity-lipe-find3.sh.
Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I2259170e8b71a94394009aeaf9878a17c2a3fa6d
Reviewed-on: https://review.whamcloud.com/46417
Tested-by: jenkins <devops@whamcloud.com>
John L. Hammond [Mon, 21 Feb 2022 14:20:27 +0000 (08:20 -0600)]
EX-4015 lipe: add make-prompt-tag hack
Hack guile (make-prompt-tag ...) to return a list instead of a
gensym. This reduces catch overhead in multi-threaded code and is what
guile does eventually (see guile commit 283ab48d3f). Add this along
with some comments to ls3_scm_init() and rename that function to
ls3_module_init().
Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I2ddd324b290ee4e985bba171681927b9434bbcc5
Reviewed-on: https://review.whamcloud.com/46571
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Wed, 16 Feb 2022 17:54:26 +0000 (11:54 -0600)]
LU-15559 tests: add do_node_vp() and do_facet_vp()
Add new test-framework functions (do_node_vp() and do_facet_vp())
which carefully escape and quote command lines for execution on the
local or remote node. Add sanityn test_0 to verify.
Lustre-change: https://review.whamcloud.com/46535
Test-Parameters: trivial env=ONLY="0" testlist=sanityn
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ic491b0148e6ef11ecd0b3ccce983afcf4d1300e5
Reviewed-on: https://review.whamcloud.com/46537
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Thu, 17 Feb 2022 20:56:36 +0000 (14:56 -0600)]
EX-4015 lipe: add lma, filter_fid, and hsm json attributes
Add "lma", "filter_fid", and "hsm" json attributes. Use struct
lustre_mdt_attrs rather than broken out fields in loa. Add
sanity-lipe-scan3 tests 111, 112, 113 to "verify".
Combine init_lipe_scan3_env and init_lipe_scan3_env_file.
Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I1856a896d192d08f9e16b9ac764030907256f79c
Reviewed-on: https://review.whamcloud.com/46547
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Thu, 17 Feb 2022 19:51:32 +0000 (13:51 -0600)]
EX-4015 lipe: cache layout in loa
In ldiskfs_read_attr_lov(), cache the decoded llapi_layout() in the
current object attrs. Then reuse this layout in lov-pools. Add
lov-ost-indexes to return a list of all object OST indexes for the
current file.
Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I3451857fc25f1f9507b6e185bad39fcb3f0e6f22
Reviewed-on: https://review.whamcloud.com/46546
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Thu, 17 Feb 2022 15:52:59 +0000 (09:52 -0600)]
EX-4015 lipe: reduce ls3_object_attrs size
Reduce the size of struct ls3_object_attrs from 131KB to 360B by heap
allocating the link and lmv xattr buffers.
Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I5b210cb46eb027dbc922675deb1231a544b93d6a
Reviewed-on: https://review.whamcloud.com/46544
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Mon, 14 Feb 2022 18:14:41 +0000 (12:14 -0600)]
EX-4015 lipe: define lipe scheme module
Avoid spurious "possibly undefined symbol" warnings from the guile
compiler by placing all of the snarfed definitions into a "lipe"
module. Add the fid accessors to a "lipe fid" module.
Test-Parameters: testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ifbfee81422b1a3df22ee23f1945577c29e485aec
Reviewed-on: https://review.whamcloud.com/46525
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Fri, 11 Feb 2022 18:11:28 +0000 (12:11 -0600)]
EX-4015 lipe: handle trusted xattrs uniformly
Add a wrapper (ldiskfs_trusted_xattr_get()) around ext2fs_attr_get()
to do uniform error messages and error handling.
Test-Parameters: trivial testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I5ad82a56b7729354364afa594b3d8d9ee83a4b7f
Reviewed-on: https://review.whamcloud.com/46513
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
John L. Hammond [Fri, 11 Feb 2022 17:05:15 +0000 (11:05 -0600)]
EX-4015 lipe: add fid2path cache to lipe_scan3
Add a thread local directory fid2path cache to lipe_scan3. Without the
cache, a single scanning thread could expect to do about 3K fid2path
operations per second. After the cache the rate improves to about
70K. We set the max cache size to 1024 FIDs and use LRU to reclaim
slots. Based on this a full cache will use about 4MB of memory per
thread.
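As a rough illustration of the caching scheme described above, here is a minimal Python sketch of a bounded FID-to-path cache with LRU reclaim. It is not the lipe_scan3 code (which is C); the names Fid2PathCache, cached_fid2path, and slow_lookup are hypothetical, and only the 1024-entry limit is taken from the message.

```python
from collections import OrderedDict


class Fid2PathCache:
    """Bounded FID -> path cache with least-recently-used eviction."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, fid):
        path = self._entries.get(fid)
        if path is not None:
            # A hit makes this entry the most recently used one.
            self._entries.move_to_end(fid)
        return path

    def put(self, fid, path):
        self._entries[fid] = path
        self._entries.move_to_end(fid)
        if len(self._entries) > self.max_entries:
            # Reclaim the least recently used slot.
            self._entries.popitem(last=False)


def cached_fid2path(cache, fid, slow_lookup):
    """Resolve fid via the cache, falling back to the expensive lookup."""
    path = cache.get(fid)
    if path is None:
        path = slow_lookup(fid)
        cache.put(fid, path)
    return path
```

Keeping one such cache per scanning thread (for example via threading.local in Python) avoids any locking on the lookup path. The quoted figures work out to about 4 KB per slot (4MB / 1024 entries).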
Test-Parameters: testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8a022665de78e6b599f2b4c4f1e2b7400d4d8ffe
Reviewed-on: https://review.whamcloud.com/46509
Tested-by: jenkins <devops@whamcloud.com>
John L. Hammond [Thu, 3 Feb 2022 18:21:58 +0000 (12:21 -0600)]
EX-4015 lipe: add lipe_scan3
Add a guile embedded lipe scanner (lipe_scan3) and test script
sanity-lipe-scan3.sh.
Test-Parameters: testlist=sanity-lipe-scan3 facet=mds1
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I059fb4044db5baff76a04247fb8e3cbec82e5448
Reviewed-on: https://review.whamcloud.com/46416
Tested-by: jenkins <devops@whamcloud.com>
Chris Horn [Tue, 5 Oct 2021 19:11:29 +0000 (14:11 -0500)]
LU-15068 ptlrpc: Do not unlink difficult reply until sent
If a difficult reply is queued in LNet, or the PUT for it is
otherwise delayed, then it is possible for the commit callback
to unlink the reply MD, which will abort the send. This results in the
client hitting a "slow reply" timeout for the associated RPC and
an unnecessary reconnect (and possibly resend).
This patch replaces the rs_on_net flag with rs_sent and rs_unlinked.
These flags indicate whether the send event for the reply MD has
been generated, and whether the MD has been unlinked, respectively.
If rs_sent is set, but rs_unlinked has not been set, then ptlrpc_hr
is free to unlink the reply MD as a result of the commit callback.
The reply-ack will simply be dropped by the server.
If ptlrpc_hr is processing the reply because of commit callback, and
rs_sent has not been set, then ptlrpc_hr will not unlink the reply
MD. This means that the reply_out_callback must also be modified to
check for this case when the send event occurs. Otherwise, if the ACK
never arrives from the client, then the MD would never be unlinked.
Thus when the send event occurs, and rs_handled is set, the
reply_out_callback will schedule the reply for handling by ptlrpc_hr.
Lustre-change: https://review.whamcloud.com/45138
Lustre-commit:
5c156b48425aae245537aaf10229734166463347
HPE-bug-id: LUS-10505
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib8f4853c7ab35d72624fce7ee3fba9e59a746e1f
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46751
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Gian-Carlo DeFazio [Wed, 9 Mar 2022 07:23:30 +0000 (23:23 -0800)]
LU-14865 utils: llog_reader.c printf type mismatch
Add (unsigned long long) cast to results of
__le64_to_cpu so that it matches the formatting (%llu)
of the enclosing printf call.
Build log message:
"llog_reader.c:887:9: error: format '%llu' expects
argument of type 'long long unsigned int', but
argument 3 has type '__u64' [-Werror=format=]"
Lustre-change: https://review.whamcloud.com/44346
Lustre-commit:
14b8276e06d6f4e3bfe785df1165458555e406f3
Test-Parameters: trivial
Fixes:
9962d6f84db5 LU-14617 utils: llog_reader updatelog support
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I9549e0a0bd21727dfcc42992b693bc39a779e1a1
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46757
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alexander Boyko [Wed, 9 Mar 2022 07:14:44 +0000 (23:14 -0800)]
LU-14617 utils: llog_reader updatelog support
The patch adds printing UPDATE_REC for llog_reader. It is useful
for updatelog analysis. Here is an example of a record:
[0x50001a21b:0x1233d:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
[0x50001a211:0x475:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
[0x3800182e3:0x475:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
[0x200032c9a:0x245:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
[0x200000001:0x15:0x0] type:write/12 params:2 p_0:3 p_1:4
p_0 - 12/trusted.lov
p_1 - 0/
p_2 - 25972/\x0100000000000000000000000000000000000000000002000...
p_3 - 25974/\x0800000000000000P\xD1AB006x0000000400EC^\x000000...
p_4 - 1/
llog processing logic is based on an incrementing record index;
the fix adds checks for it. It also adds more info from the header
and drops the useless "Bit X not set" output.
Lustre-change: https://review.whamcloud.com/43343
Lustre-commit:
9962d6f84db5fd587bbe13640a9361c2872f3728
Test-Parameters: trivial
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Id50de15040526dc07ae708ac5db046832706be31
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46756
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Mikhail Pershin [Wed, 9 Mar 2022 08:45:38 +0000 (00:45 -0800)]
LU-14876 out: don't connect to busy MDS-MDS export
The MDS-MDS connection is missing a check for busy requests upon
reconnect, so a resend can be executed concurrently with the
original request.
- in ptlrpc_server_check_resend_in_progress() remove the exception
for bulk requests; they can be compared by XID nowadays.
This prevents concurrent execution of OUT requests and their resends as well.
- fix messages in target_handle_connect() to report correct
information about connection details
- in out_handle() check for last_xid only once per OUT_UPDATE
- test 110m is added to recovery-small to reproduce the issue
Lustre-change: https://review.whamcloud.com/44390
Lustre-commit:
301d76a71176c186129231ddd1323bae21100165
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I2ad183674d59a2cdeab0037bd8551c607b10ffeb
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46762
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lei Feng [Wed, 19 Jan 2022 02:02:20 +0000 (21:02 -0500)]
EX-4583 lipe: show all lpcc information in 'lpcc status' command
Collect lpcc config, lpcc_purge stats, and lustre stats related to
lpcc together and show them in the 'lpcc status' command. 'lpcc status-all'
is an alias of 'lpcc status'. The output of 'lpcc status' looks like:
{
  "/mnt/lustre": {
    "pcc": [
      {
        "mount": "/mnt/lustre",
        "cache": "/mnt/pcc",
        ...
      },
      {
        "mount": "/mnt/lustre",
        "cache": "/mnt/pcc2",
        ...
      }
    ],
    "fs_stats": {
    }
  }
}
Change-Id: I032763fb3b45646330b13f5cef34ce8658bddfe4
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc
Reviewed-on: https://review.whamcloud.com/46191
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Thu, 6 Jan 2022 01:34:15 +0000 (20:34 -0500)]
EX-4433 pcc: add some statistics data
Add statistics of the number and total size of pcc attached files
and pcc hit files.
Change-Id: Ib0e429c636298d4c6ff06d84a416073895b86184
Signed-off-by: Lei Feng <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45976
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Deiter [Thu, 24 Feb 2022 14:21:01 +0000 (14:21 +0000)]
EX-4893 wrong dependencies for lustre-client-modules deb package
Fixed dependencies for DKMS deb package:
- added autoconf, automake and libtool
- added bison and flex
- added required dev packages
- added linux-base and linux-image
- added python3-distutils-extra to fix build on Ubuntu 16.04
Change-Id: Ic1d05155cd8ad056dece1d22d0f040695d038652
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/46604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Thu, 14 Oct 2021 14:16:21 +0000 (17:16 +0300)]
LU-15142 lctl: fixes for set_param -P and llog_print
- properly handle permanent param deletion
- don't print skipped parameters in llog_print output
- add --raw option to llog_print to output all entries
including markers
Lustre-change: https://review.whamcloud.com/45332
Lustre-commit:
2a5b50d207173ca1ac71be8dfc39f98a2773bc3a
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id93a206a255dc885343efa293e1ee2672493e5e5
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Sat, 28 Dec 2019 09:42:54 +0000 (02:42 -0700)]
LU-13107 utils: clean up lctl command usage
The lctl usage is confusing because it lists a number of valid
commands after "testing (DANGEROUS)", such as LFSCK and llog.
Move the useful commands before the "testing" section so it is
not mis-interpreted as all following commands are dangerous.
Group some other commands together with more related commands,
rather than whatever order they happened to be implemented in.
Remove function prototypes for commands that no longer exist.
Lustre-change: https://review.whamcloud.com/37108
Lustre-commit:
b0efebdaef52d8ac9b02857166ceb00079612ebc
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I469f9c92953762cc46a68e44238c4b67ebacab07
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-on: https://review.whamcloud.com/46637
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Andreas Dilger [Fri, 28 Jan 2022 05:51:24 +0000 (22:51 -0700)]
LU-15316 tests: use integers in sanity test_255a
The [[ ... > ... ]] operator doesn't really compare floats, it
compares strings. That works as expected if the strings are
the same length, but fails for comparisons like [[ 32 > 123 ]].
Use (( ... > ... )) for comparisons, and only use integer values.
This test has been failing intermittently forever, but the error
was ignored because of running in a VM.
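The pitfall described above is not specific to bash. As a hedged analogue (Python rather than the shell code the test actually uses), comparing the values as strings orders them character by character, while converting to integers gives the numeric order. The helper names string_greater and integer_greater are hypothetical.

```python
def string_greater(a, b):
    """Compare like [[ a > b ]]: lexicographically, character by character."""
    return str(a) > str(b)


def integer_greater(a, b):
    """Compare like (( a > b )): numerically, on integer values."""
    return int(a) > int(b)


# "3" sorts after "1", so the string comparison claims 32 > 123.
print(string_greater(32, 123))   # True
print(integer_greater(32, 123))  # False

# With equal-length operands both comparisons agree.
print(string_greater(45, 32), integer_greater(45, 32))  # True True
```

Bash's [[ ... > ... ]] uses the current locale's collation rather than raw code points, but for plain digit strings the outcome is the same as shown here.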
Lustre-change: https://review.whamcloud.com/46350
Lustre-commit: TBD (from
a96a4a5894bef714b19086fa09918080f05a7674)
Test-Parameters: trivial
Fixes:
f3b8f3fad502 ("tests: fix float comparison in sanity test_255a")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6787082cd579ae3f1bdd43222a739c939d3ebbe5
Reviewed-on: https://review.whamcloud.com/46618
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Deiter [Thu, 24 Feb 2022 12:15:32 +0000 (12:15 +0000)]
EX-4889 configure script does not check for required build tools
- added check for flex and bison
- added requirement for building kernel modules
Change-Id: I4f4f19ea44f3cd8f69482d950970bf701e81f7ec
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/46602
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Wed, 9 Mar 2022 18:04:23 +0000 (11:04 -0700)]
RM-620 build: New tag 2.14.0-ddn38
New tag 2.14.0-ddn38
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id8a05c166302a48b4553ec76922b70c665763277
Oleg Drokin [Wed, 8 Dec 2021 04:30:06 +0000 (23:30 -0500)]
LU-15340 llite: Delay dput in ll_dirty_page_discard_warn
Otherwise ours can be the final dput, and we would need to wait for pages
to clear. That is bad because this is called from ptlrpcd,
which is not supposed to block, especially for network traffic, as
it can cause livelocks if ptlrpcd happens to be needed to kill
the very same RPC we are waiting on.
Additionally pass in the inode from IO since the page
we are using might come from directio and that is
probably not even a valid inode.
Lustre-change: https://review.whamcloud.com/45784
Lustre-commit:
a1d75780ba19cfca53cbacf0d38e8d7df540b209
Fixes:
624a3ac23393 ("LU-921 llite: warning in case of discarding dirty pages")
Change-Id: Ie2f1a34047145202c11a4e1a0b18b2e01d9e4601
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/46635
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lei Feng [Fri, 24 Dec 2021 06:10:47 +0000 (01:10 -0500)]
EX-4408 lipe: collect some statistic data with lpcc_purge
Collect the number of cached files, min/max/avg file size,
min/max/avg age of files in LPCC. Scan cache device forcefully
even if it is not full enough to collect the statistics data
in time.
Change-Id: Id716d4689c83ecc5754e41734e44e7c051d36a8e
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc
Reviewed-on: https://review.whamcloud.com/45937
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Fri, 17 Dec 2021 08:53:37 +0000 (16:53 +0800)]
LU-15381 hsm: update size upon completion of data version
We found during testing that an HSM restore followed by an HSM release
wrongly sets the file size to 0.
The reason is that the file size and blocks information
obtained via @ll_merger_attr() is incorrect.
The data version operation flushes dirty pages from all
clients, so the size and blocks information returned from the Lustre
OST is correct.
In this patch, we update the size and block attributes for a file
upon the completion of the data version operation accordingly.
This way, HSM release will set the size and blocks information
correctly after the data version ioctl operation.
Add sanity-hsm test_261.
lustre-change: https://review.whamcloud.com/45935
lustre-commit:
dd3b5601ec6905b00d07cbcb8c139c46dd555b3b
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ifdbf6b58ecd00dc9677a2328438ef68529b72882
Reviewed-on: https://review.whamcloud.com/45935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46336
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vladimir Saveliev [Wed, 20 Oct 2021 10:32:11 +0000 (13:32 +0300)]
LU-14124 target: set OBD_MD_FLGRANT in read's reply
If tgt_grant_shrink() decides not to shrink grants, a client is
supposed to restore its cl_grant_avail in osc_update_grant(). In the case
of a read, OBD_MD_FLGRANT is not set in the reply's body->oa.o_valid, so
osc_update_grant() misses the cl_grant_avail update. As a result, the server
keeps thinking that the client has a lot of grants while the client thinks
that it is badly short of grants. That may lead to performance
degradation.
A test to illustrate the issue is included.
Lustre-change: https://review.whamcloud.com/43375
Lustre-commit:
4894683342d77964daeded9fbc608fc46aa479ee
Change-Id: Ibe7ce0af5701226c8be3ae3f9ad57c354791fa0f
HPE-bug-id: LUS-9943
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46468
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Wed, 11 Aug 2021 20:49:19 +0000 (14:49 -0600)]
LU-12807 tests: fix intermittent runtests failure
Occasional runtests failures are seen in full testing on ldiskfs.
Increase the llog space limit to 72KB from 50KB due to seeing
regular failures in the 52/64KB range.
Lustre-change: https://review.whamcloud.com/44614
Lustre-commit:
14d07b623731233a62a8acd021c8ccdcb2705371
Test-Parameters: trivial testlist=runtests
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6e272fe9fec21a650110a42efe31a1dc48e35854
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46463
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Vladimir Saveliev [Mon, 1 Mar 2021 08:52:51 +0000 (11:52 +0300)]
LU-12752 mdt: commitrw_write() - check dying object under lock
If a process writes to an unlinked file, the following race between
mdt_commitrw_write() and mdd_close() may occur because
mdt_commitrw_write() checks whether an object is dying without a lock:
mdt_commitrw_write() checks lu_object_is_dying(&mo->mot_header) and the
object is not dying yet
mdd_close() interposes and destroys the object via
  mdo_destroy()
    lod_destroy()
      lod_sub_destroy()
        osd_destroy()
          obj->oo_destroyed = 1;
mdt_commitrw_write() continues, locks the object and returns ENOENT
from
  dt_attr_get()
    osd_attr_get()
      if (unlikely(obj->oo_destroyed))
        return -ENOENT;
If the file is built of a DoM and a raid component, ll_delete_inode() calls
cl_sync_file_range() which is to iterate over both mdt and raid
components via mdc_io_fsync_start() and osc_io_fsync_start(). As
mdc_io_fsync_start() fails with -ENOENT due to failed write rpc,
osc_io_fsync_start() does not get called. Then
truncate_inode_pages_final() finds not-discarded pages and fails with:
(osc_page.c:183:osc_page_delete()) Trying to teardown failed: -16
(osc_page.c:184:osc_page_delete()) ASSERTION( 0 ) failed:
(osc_page.c:184:osc_page_delete()) LBUG
Test to illustrate the issue is added.
The fix is to call lu_object_is_dying() under object lock.
Lustre-change: https://review.whamcloud.com/41797
Lustre-commit:
d48a0ebb5a8d5d49684325434b503e8aab085397
Change-Id: I463c8a6f85d4f5fd934b167c6194f50ae9d4b7d4
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46612
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Wed, 9 Mar 2022 08:34:43 +0000 (00:34 -0800)]
LU-13055 libcfs: allow comma-separated masks
For debug and changelog mask names, allow a comma-separated list
of names to be given, so that the space-separated list does not
need to be quoted for use.
Change sanity-quota to use a comma-separated list to verify it works.
Fix a couple of test cases where the debug parameter is set and
printed overly verbosely during tests.
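The accepted input forms can be sketched as follows. This is only a Python illustration of splitting a mask list on commas or whitespace, not the libcfs parser (which is C); the function name split_mask_names is hypothetical, and the mask names used are merely examples.

```python
import re


def split_mask_names(value):
    """Split a mask list on commas and/or whitespace, dropping empty items."""
    return [name for name in re.split(r"[,\s]+", value.strip()) if name]


# Both spellings yield the same list; only the first needs no shell quoting.
print(split_mask_names("quota,trace,dlmtrace"))
print(split_mask_names("quota trace dlmtrace"))
```

The practical benefit is on the command line: a comma-separated value is a single shell word, so it can be passed as one argument without quoting.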
Lustre-change: https://review.whamcloud.com/43741
Lustre-commit:
6b6fde1026311a28595ea43af56392ca6ad24d79
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icf1e3ebc74f0e48b38a65486b2275ec4c33ebbe5
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46759
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Hongchao Zhang [Fri, 21 Jan 2022 00:43:56 +0000 (08:43 +0800)]
LU-15218 quota: delete unused quota ID
Add lfs option '--delete' to delete unused quota ID.
Lustre-change: https://review.whamcloud.com/45548
Lustre-commit:
78be823f33396819724330d7154f054c52e11944
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I0d8e6b61dc23c7b22b6054bcced087b8dc94a277
Reviewed-on: https://review.whamcloud.com/46610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Wed, 19 Jan 2022 15:46:47 +0000 (10:46 -0500)]
LU-13799 llite: Move free user pages
It is incorrect to release our reference on the user pages
before we're done with them - We need to keep it until the
i/o is complete, otherwise we access them after releasing
our reference. This has not caused any known bugs so far,
but it's still wrong.
So only drop these references when we free the aio struct,
which is only freed once i/o is complete.
Also rename free_user_pages to release_user_pages, because
it does not free them - it just releases our reference.
This also helps performance by moving free_user_pages to
the daemon threads. This is a 5-10% boost.
This patch reduces i/o time in ms/GiB by:
Write: 18 ms/GiB
Read: 19 ms/GiB
Totals:
Write: 180 ms/GiB
Read: 178 ms/GiB
mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect
With previous patches in series:
write 5183 MiB/s
read 5201 MiB/s
Plus this patch:
write 5702 MiB/s
read 5756 MiB/s
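The ms/GiB figures above follow directly from the IOR throughput numbers. A short Python check of the arithmetic, assuming 1 GiB = 1024 MiB (the helper name ms_per_gib is hypothetical):

```python
MIB_PER_GIB = 1024


def ms_per_gib(mib_per_s):
    """Convert a throughput in MiB/s into time per GiB in milliseconds."""
    return MIB_PER_GIB / mib_per_s * 1000.0


before = {"write": 5183, "read": 5201}  # previous patches in series
after = {"write": 5702, "read": 5756}   # plus this patch

for op in ("write", "read"):
    total = ms_per_gib(after[op])
    saved = ms_per_gib(before[op]) - total
    print(f"{op}: {total:.0f} ms/GiB total, {saved:.0f} ms/GiB saved")
# write: 180 ms/GiB total, 18 ms/GiB saved
# read: 178 ms/GiB total, 19 ms/GiB saved
```

The same conversion applies to the other patches in this series that quote both MiB/s and ms/GiB.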
Lustre-change: https://review.whamcloud.com/39443
Lustre-commit:
7f9b8465bc1125e51aa29cdc27db9a9d6fdc0b89
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5cf2201e5fd4eeee5b4c996de51d3a6a5394ae34
Reviewed-on: https://review.whamcloud.com/44685
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Wed, 19 Jan 2022 15:49:36 +0000 (10:49 -0500)]
LU-13799 llite: Remove unnecessary page get/put
Part of the aio cleanup code has the slightly strange
behavior of doing get on every page before calling page
cleanup, then doing a put after.
This was required because we call cl_page_list_del before
calling cl_page_delete, and cl_page_list_del was holding
the last reference on the page struct.
If we reverse the order, then we don't need the extra
get/put to keep the pages live. This should save
significant CPU time in the ptlrpcd threads when finishing
i/o, since this removes a get/put on every page.
Lustre-change: https://review.whamcloud.com/44293
Lustre-commit:
c2e94f08cf3ff000b350faf61b6d25ebbad7970e
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia9e7bf4a331a5220bfb1eab43493ac5fda59c611
Reviewed-on: https://review.whamcloud.com/44688
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Wed, 19 Jan 2022 15:49:04 +0000 (10:49 -0500)]
LU-13799 llite: Do not get/put DIO pages
We've already told the kernel we're working with these pages
using the get/put_user_pages functions, and userspace must
hold references on them throughout the i/o anyway.
So getting/putting these vmpages is unnecessary. This
saves around 7% of the time in DIO page submission, netting
about that much of a performance improvement.
This patch reduces i/o time in ms/GiB by:
Write: 22 ms/GiB
Read: 19 ms/GiB
Totals:
Write: 135 ms/GiB
Read: 143 ms/GiB
mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect
With previous patches in series:
write 6470 MiB/s
read 6354 MiB/s
Plus this patch:
write 7531 MiB/s
read 7179 MiB/s
Lustre-change: https://review.whamcloud.com/39438
Lustre-commit:
881b4c722296ff7ac22c6fd7988363f2cdad9f1e
Signed-off-by: Patrick Farrel <pfarrell@whamcloud.com>
Change-Id: Icfd5bc73ba4254898d6051f11ab6aea624948763
Reviewed-on: https://review.whamcloud.com/44686
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Wed, 19 Jan 2022 15:46:24 +0000 (10:46 -0500)]
LU-13799 llite: Implement lower/upper aio
This patch creates a lower level aio struct for each set of
pages submitted, and attaches that to the llite level aio.
That means the completion of i/o (in the sense of
successful RPC/page completion) is associated with the
lower level aio struct, and the higher level aio waits for
the completion of these lower level structs. Previously,
all pages were associated with the upper level (and only)
aio struct.
This patch is a reorganization/cleanup, which is necessary
for the next patch, which moves release pages to aio_end.
The justification for this (correctness and performance)
will be provided in that patch.
Lustre-change: https://review.whamcloud.com/44209
Lustre-commit: 46ff76137160b66f1d4437b3443859027faae9c4
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I2d1c01875c5478c54d7cdd93af7ff12bb4928d94
Reviewed-on: https://review.whamcloud.com/44684
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Wed, 19 Jan 2022 15:46:03 +0000 (10:46 -0500)]
LU-13799 osc: Always set aio in anchor
We currently do not set csi_aio for DIO and use this to
control when we free the aio struct. (For AIO, we must
free it in cl_sync_io_note, but for other users, we have to
wait until after cl_sync_io_wait has been called.)
The lack of csi_aio causes trouble for the implementation
of the next patch, so instead we always set it and control
freeing by checking at that time if we are doing DIO.
Lustre-change: https://review.whamcloud.com/44153
Lustre-commit: eadccb33ac4bbe54a01da5168f6170702f9b2629
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3590709c1cb891f1c111d611a438fd348e833787
Reviewed-on: https://review.whamcloud.com/44683
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Wed, 19 Jan 2022 15:45:26 +0000 (10:45 -0500)]
LU-13799 llite: Simplify cda_no_aio_complete use
It is better to handle AIO and DIO the same as much as
possible, limiting the difference to setup if possible.
In this spirit, move the check for DIO (is_sync_kiocb()) to
the setup function rather than cleanup and just use
no_aio_complete.
Lustre-change: https://review.whamcloud.com/44154
Lustre-commit: b60bd21ec5d5f34ed79c63158860b6f5e948dba2
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iefc0865aee20958960f48539ad73c80c997ff0b4
Reviewed-on: https://review.whamcloud.com/44682
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Wed, 19 Jan 2022 15:45:04 +0000 (10:45 -0500)]
LU-13799 lov: Cache stripe offset calculation
Calculating the page offset relative to the stripe (etc)
in a file is surprisingly expensive. Because i/o has
already been split up to stripes by the cl_io code,
calculating the stripe each time is unnecessary.
We cache most of the values requiring calculation.
This improves AIO/DIO page submission significantly,
improving performance by a bit over 10%.
Also remove lpg_generation, which isn't doing anything
useful. This suggests the possibility of removing
lov_page, but that's for another patch.
This patch reduces i/o time in ms/GiB by:
Write: 17 ms/GiB
Read: 22 ms/GiB
Totals:
Write: 119 ms/GiB
Read: 121 ms/GiB
mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect
With previous patches in series:
write 7531 MiB/s
read 7179 MiB/s
Plus this patch:
write 8637 MiB/s
read 8488 MiB/s
Lustre-change: https://review.whamcloud.com/39445
Lustre-commit: 14db1faa0fbe813fed616435303753d390f45827
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I046af5eda9560da0fb8daa8ca025484851538a50
Reviewed-on: https://review.whamcloud.com/44681
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Tue, 8 Mar 2022 17:58:42 +0000 (12:58 -0500)]
LU-15317 llite: Fix iotrace name
The ES6 port of the iotrace patch missed the name of the
debug flag, leaving it as tty. Correct this.
Fixes: 32142909 ("LU-15317 llite: Add D_IOTRACE")
test-parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0674fa54d6bf13ac5d436f8fd94c6559a3eef27d
Reviewed-on: https://review.whamcloud.com/46743
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Fri, 4 Mar 2022 15:34:43 +0000 (10:34 -0500)]
LU-15317 osc: Add RPC to iotrace
Add RPCs to iotrace debugging.
To avoid creating too much debug output, this debug
ignores the possibility that an RPC contains non-contiguous
extents. Thus the eventual visualization will act as
though the RPC is a continuous whole. I judge this to be
superior to the amount of log data and complexity of
capturing each extent separately. If that level of detail
is needed, a higher debug level can be used.
Lustre-change: https://review.whamcloud.com/45894/
Lustre-commit: 711182c0188b66c87c696fa11165de41c6e3675f (tbd)
Test-parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I6fe416ba44be5572f130704ba9d3e9b85d09c656
Reviewed-on: https://review.whamcloud.com/46005
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Fri, 4 Mar 2022 15:34:01 +0000 (10:34 -0500)]
LU-15317 llite: Add COMPLETED iotrace messages
It's very useful to see how long an I/O call took. There
are other ways to do this, but the goal is for iotrace to
provide all necessary information for basic I/O performance
analysis, so we add COMPLETED messages to iotrace.
Lustre-change: https://review.whamcloud.com/46484/
Lustre-commit: 1c90d638a8d7993f5cbd70680d33052e888da6c3 (tbd)
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I17f52ebc87a31d5ba34f63dc8b6a279e83cd10ef
Reviewed-on: https://review.whamcloud.com/46703
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Fri, 4 Mar 2022 15:33:20 +0000 (10:33 -0500)]
LU-15317 llite: Add FID to async ra iotrace
IOtrace log entries need to include the FID of the file
concerned. Add this to async readahead.
test-parameters: trivial
Lustre-change: https://review.whamcloud.com/45912/
Lustre-commit: 4bd78fa2145efa1e73e42a0c650de8da3fb3887c (tbd)
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8d788969f29412ce88f1cafa229977f6efa20962
Reviewed-on: https://review.whamcloud.com/46702
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Fri, 4 Mar 2022 15:32:35 +0000 (10:32 -0500)]
LU-15317 llite: Add strided readahead to iotrace
We need to capture some additional parameters to correctly
understand the behavior of strided readahead. Add these
parameters to the existing iotrace message.
test-parameters: trivial
Lustre-change: https://review.whamcloud.com/45888/
Lustre-commit: d09c4ccd93175470088835a414a7be19638b1e4c (tbd)
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I7caddf9dfaf9ba5f2645d045d5a4a50562cc1b54
Reviewed-on: https://review.whamcloud.com/46701
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Fri, 4 Mar 2022 15:31:02 +0000 (10:31 -0500)]
LU-15317 llite: Make iotrace logging quieter
Most of the time, we don't read any pages with readahead,
since we're moving through the window and aren't ready to
read more yet. That's important for readahead debug, but
there's no need to log it for iotrace. (This matters
because without this change, this message is the large
majority of iotrace messages.)
test-parameters: trivial
Lustre-change: https://review.whamcloud.com/45887/
Lustre-commit: c8e04d958073e15040691c03606c1eb7631b5aff (tbd)
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I58197acd1ef97c903320a2433ec1d5dcb0e46bd0
Reviewed-on: https://review.whamcloud.com/46700
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
John L. Hammond [Fri, 14 Jan 2022 16:58:59 +0000 (10:58 -0600)]
LU-15452 utils: support lctl getattr for osc
In lctl:jt_obd_getattr(), support FIDs in addition to OIDs and print
whatever valid attributes were returned. Add a supporting
OBD_IOC_GETATTR case to osc_iocontrol().
# function lctl_osc_device() {
    # Find osc device name for file and index.
    # lctl_osc_device /mnt/lustre/... 42 => lustre-OST002a-osc-ffff89cca1555000
    local path="$1"
    local index="$2"
    local fsname=$(lfs getname --fsname "$path")
    local instance=$(lfs getname --instance "$path")
    printf '%s-OST%04x-osc-%s\n' "$fsname" "$index" "$instance"
}
# lfs getstripe /mnt/lustre/f0 | grep l_ost_idx
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
- 1: { l_ost_idx: 2, l_fid: [0x100020000:0x2:0x0] }
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0x2:0x0] }
- 1: { l_ost_idx: 0, l_fid: [0x100000000:0x2:0x0] }
# lctl --device $(lctl_osc_device /mnt/lustre 1) getattr '[0x100010000:0x2:0x0]'
valid: 0x110000001008fff
oi.oi.oi_id: 0x100020000
oi.oi.oi_seq: 0x2
oi.oi_fid: [0x100020000:0x2:0x0]
atime: 0
mtime: 1642178551
ctime: 1642178551
size: 0
blocks: 0
blksize: 4194304
mode: 0107666
uid: 0
gid: 0
flags: 2097152
layout_version: 3
projid: 0
data_version: 4294967298
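The device name used with lctl --device encodes the OST index as four hex digits, which is why index 42 becomes OST002a. A small sketch of just that formatting step, with the filesystem name and instance passed in directly so it runs without a Lustre mount (osc_device_name is an illustrative name, not an lctl or lfs command):

```shell
# Build an osc device name from fsname, OST index (decimal) and instance.
osc_device_name() {
	printf '%s-OST%04x-osc-%s\n' "$1" "$2" "$3"
}

osc_device_name lustre 42 ffff89cca1555000
```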
Lustre-change: https://review.whamcloud.com/46131
Lustre-commit: 4143c3bdec2a87319a6c9dee4a589cd41e85dad3
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I57d5778e9ac39030ae9477a0979f20b7f7460fc8
Reviewed-on: https://review.whamcloud.com/46390
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Amir Shehata [Fri, 19 Jun 2020 23:31:36 +0000 (16:31 -0700)]
LU-13621 lnet: utility to print cpt number
Added a command to lnetctl to print the cpt of the NID.
lnetctl cpt-of-nid --nid <nid> --ncpt <number of cpts>
ex:
lnetctl cpt-of-nid --nid 192.28.12.35@tcp9 --ncpt 7
This will return what cpt the NID will hash to within the 0-6 range.
If the NI is bound to a specific set of CPTs, then ncpt refers
to the number of CPTs the NI is bound to. The cpt value returned
will be an index into the list of bound CPTs.
For example, if an NI is bound to [0,4,5,7], then ncpt should be
4, and the returned value will be an index into that array:
ex:
lnetctl cpt-of-nid --nid 192.28.12.35@tcp9 --ncpt 4
cpt:
value: 1
Therefore, the actual CPT the NID will be bound to is 4.
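The last step, turning the returned index back into a real CPT number, can be sketched as a simple list lookup. This is only an illustration of the mapping described above; cpt_from_index is a hypothetical helper and not an lnetctl subcommand:

```shell
# Map the index reported by "lnetctl cpt-of-nid" to an actual CPT.
# $1 is the returned index; the remaining arguments are the bound CPTs in order.
cpt_from_index() {
	local index="$1"
	shift             # drop the index, leaving only the CPT list
	shift "$index"    # skip the first $index entries of the list
	echo "$1"
}

cpt_from_index 1 0 4 5 7
```

For the bound set [0,4,5,7] and a returned value of 1, this prints 4.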
Lustre-change: https://review.whamcloud.com/39113
Lustre-commit: df6f17ee97ac47c949c1963ff8d57fb2d4becd06
Test-parameters: trivial testlist=sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3cb562842448bfb663c2d41007be65299a919300
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46397
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Sat, 4 Dec 2021 05:14:09 +0000 (22:14 -0700)]
EX-3637 osd-ldiskfs: revalidate nonrotational state
Until the nonrotational state of the device is correctly exported by
the underlying storage, periodically recheck the nonrotational state
in case it is changed by the udev/tune_devices.sh script after the
OST is mounted. It would be possible to check less often (e.g. after
a time limit or some number of operations), but directly checking the
rotational state each time is not more expensive and is only checked
on statfs RPCs (sent about once per 5s from the MDS).
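On Linux the rotational flag is also visible from userspace in sysfs, which is a convenient way to see what value a tuning script has left behind. A minimal sketch, assuming the usual /sys/block/<dev>/queue/rotational attribute; the helper name is illustrative and this is not the osd-ldiskfs code path itself:

```shell
# Succeed if the given sysfs "rotational" attribute reads 0 (non-rotational).
# $1 is a path such as /sys/block/sda/queue/rotational.
is_nonrotational() {
	[ "$(cat "$1" 2>/dev/null)" = "0" ]
}

# Report the state of every block device that exposes the attribute.
for attr in /sys/block/*/queue/rotational; do
	[ -r "$attr" ] || continue
	if is_nonrotational "$attr"; then
		echo "$attr: non-rotational"
	else
		echo "$attr: rotational"
	fi
done
```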
If the nonrotational state is changed and the flash-related cache
parameters have not been explicitly set, then tune them appropriately.
Rename the parameter functions and variables for read_cache_enable,
writethrough_cache_enable, and read_cache_max_filesize to match the
parameter names to make them easier to find in the code.
Lustre-change: https://review.whamcloud.com/45745
Lustre-commit: TBD (from cbc74314532632fbcab88531d43deed7555e125e)
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iec78a5d5c22c0474eda84a5a793fbb00623ebbe5
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Tested-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-on: https://review.whamcloud.com/46514
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Andreas Dilger [Thu, 30 Sep 2021 02:51:51 +0000 (20:51 -0600)]
LU-15045 utils: fix lfs_migrate for files with spaces
Fix the lfs_migrate script to properly quote "$OLDNAME" so that
it works for filenames with spaces and other characters in them.
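The effect of the missing quotes is easy to reproduce outside lfs_migrate. A minimal sketch, using a made-up filename and a hypothetical count_args helper, showing how an unquoted variable is split into several arguments:

```shell
# Print how many arguments the function actually received.
count_args() {
	echo "$#"
}

OLDNAME="my file.dat"      # hypothetical filename containing a space

count_args $OLDNAME        # unquoted: word splitting yields two arguments
count_args "$OLDNAME"      # quoted: the name stays a single argument
```

Any command given the unquoted form sees two separate paths, neither of which exists.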
Lustre-change: https://review.whamcloud.com/45173
Lustre-commit: 0549031b991ec0bbf570a41c7d28b8a7fd4b41c3
Test-Parameters: trivial
Fixes: 8bedfa377fbd ("LU-11510 lfs: migrate a composite layout file correctly")
Fixes: 128137adfc53 ("LU-13090 utils: fix lfs_migrate -p for file with pool")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic00f41f3a91ad9dfa491ff57768a3da0c6300c1e
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/46685
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Jian Yu [Tue, 1 Mar 2022 09:28:42 +0000 (01:28 -0800)]
EX-4572 tests: fix setfattr stack_trap in init_hot_pools_env()
This patch fixes setfattr stack_trap in init_hot_pools_env() to
resolve the following error:
/usr/lib64/lustre/tests/test-framework.sh: line 2:
trusted.dmv=0xd00c......: command not found
Test-Parameters: trivial testlist=hot-pools
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=hot-pools
Fixes: 7ff7b9d8cf0 ("EX-3640 test: mkdir on MDT0 in hot-pools.sh")
Change-Id: Ia406a066455bd83c2199f96507a11b36aa80fea2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46662
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Mon, 28 Feb 2022 08:30:19 +0000 (00:30 -0800)]
LU-15492 build: fallthrough macro for pre/post gcc-7
gcc-7.5 on openSUSE 15:
error: this statement may fall through [-Werror=implicit-fallthrough=]
Use __attribute__((fallthrough)) for gcc-7 and later, and use a
no-op statement for earlier gcc versions where the fallthrough
attribute is not available.
Lustre-change: https://review.whamcloud.com/46357
Lustre-commit: TBD (from 48145f2eaed1537506bfffb1c0a44a8cfdb38254)
Test-Parameters: trivial
Fixes: 5549b1b9e0 ("LU-15220 lustre: use 'fallthrough' pseudo keyword for switch")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib72f5996149c738805f15e354e1e1606d981ce29
Reviewed-on: https://review.whamcloud.com/46643
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Olaf Faaland [Wed, 9 Mar 2022 08:01:20 +0000 (00:01 -0800)]
LU-14553 changelog: eliminate mdd_changelog_clear warning
When handling a changelog_clear request, the user may specify a
range of indices which do not exist. Similarly, the user may
specify a changelog user which does not exist. Neither indicates
a problem within Lustre that justifies a console warning.
Change those cases to CDEBUG.
Lustre-change: https://review.whamcloud.com/43125
Lustre-commit: 6b183927e19715d093c80a35ebc42a1cda5e70e2
Test-Parameters: trivial
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: I64bab12ef4978c4bf7139f5f36a39f9b109616fb
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46758
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Wed, 9 Mar 2022 08:39:21 +0000 (00:39 -0800)]
LU-14413 test: test for overstriping for sanity 27M
The introduction of sanity 27M broke interop with 2.12 LTS since
overstriping doesn't exist in that version. Adjust the test to
use overstriping if the client supports it, otherwise just use
traditional striping.
Lustre-change: https://review.whamcloud.com/44340
Lustre-commit: 4e1f9c4bd1d96063a1fbb2dfaab41b15836167ab
Test-Parameters: trivial testlist=sanity env=ONLY=27M
Change-Id: I2d788a116cbb749a83d6cec36f97d06533b32421
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46760
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alena Nikitenko [Wed, 12 Jan 2022 16:08:39 +0000 (19:08 +0300)]
EX-3342 tests: correct Lustre version checks in sanity
Many patches land to the EXAScaler branches as ports from
other branches. Sometimes the tests that are included with
the ported patches check the version of Lustre to ensure
that the feature it tests exists in this version of Lustre.
These version values are not always changed when patches
are ported from one branch to another.
Modified b_es6_0 version of these tests for interop
testing.
Sanity tests 65n and 311 were modified.
Performing test steps on versions >= 2.12.59 or between
versions 2.12.3 and 2.12.50 makes no sense for b_es5_2
branch, because all Lustre tags on this branch are lower
than 2.12.50.
Checks in test 65n will be changed to >= 2.12.3.6 which
is a tag for the patch that added this test to b_es5_2
branch.
Version check in test 311 is not needed at all, because
it includes all possible versions on b_es5_2 branch.
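Version checks of this kind compare dotted version strings numerically. A minimal sketch of one way to do that, packing each component into one byte of a single integer; ver_code is a hypothetical helper written for this illustration, not a quote of the function the test framework uses:

```shell
# Hypothetical helper: pack "major.minor.patch.fix" into one integer so
# versions can be compared with ordinary numeric tests.
ver_code() {
	local IFS=.
	set -- $1
	echo $(( (${1:-0} << 24) | (${2:-0} << 16) | (${3:-0} << 8) | ${4:-0} ))
}

# A tag such as 2.12.3.6 sorts at or above 2.12.3 but below 2.12.50.
if [ "$(ver_code 2.12.3.6)" -ge "$(ver_code 2.12.3)" ] &&
   [ "$(ver_code 2.12.3.6)" -lt "$(ver_code 2.12.50)" ]; then
	echo "2.12.3.6 is within [2.12.3, 2.12.50)"
fi
```

Missing components default to 0, so 2.14 compares as 2.14.0.0.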
Lustre-change: https://review.whamcloud.com/46068
Lustre-commit: 2d92fc888e12fb7b58d496952ccd69dd10299c09
Test-Parameters: env=ONLY="65n 311" serverversion=2.10.8 serverdistro=el7.6
Test-Parameters: env=ONLY="65n 311" serverversion=2.14
Test-Parameters: trivial env=ONLY="65n 311"
Test-Parameters: env=ONLY="65n 311" clientversion=2.12.6-ddn42
Test-Parameters: env=ONLY="65n 311" serverversion=2.12.6-ddn42
Fixes: 550e41edbcb ("LU-11739 lod: subdir under ROOT should honor default layout")
Fixes: 37a9077f103 ("LU-11208 tests: add version check to sanity tests")
Signed-off-by: Alena Nikitenko <anikitenko@ddn.com>
Change-Id: Ic6383fe5585e18646a365f49f501da52848c765a
Reviewed-on: https://review.whamcloud.com/46421
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alena Nikitenko [Tue, 4 Jan 2022 22:59:39 +0000 (01:59 +0300)]
EX-3342 tests: fix Lustre version in test skip checks in sanity
Many patches land to the EXAScaler branches as ports from
other branches. Sometimes the tests that are included with
the ported patches check the version of Lustre to ensure
that the feature it tests exists in this version of Lustre.
These version values are not always changed when patches
are ported from one branch to another.
Change Lustre test suite version checks to be relative to
this branch. Sanity tests 160j,432,810,812 were modified.
Lustre-change: https://review.whamcloud.com/45967
Lustre-commit: 60186cbf11d7cfa24e94721b8bf77e0dc5af0c50
Test-Parameters: env=ONLY="160j 432 810 812" serverversion=2.10.8 \
serverdistro=el7.6
Test-Parameters: env=ONLY="160j 432 810 812" clientversion=2.12.6-ddn42 \
mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Test-Parameters: env=ONLY="160j 432 810 812" serverversion=2.12.6-ddn42 \
mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Test-Parameters: env=ONLY="160j 432 810 812",SHARED_KEY=true mdscount=2 \
mdtcount=4 osscount=1 ostcount=8 clientcount=2 serverversion=2.12.6-ddn42
Test-Parameters: env=ONLY="160j 432 810 812",SHARED_KEY=true mdscount=2 \
mdtcount=4 osscount=1 ostcount=8 clientcount=2 clientversion=2.12.6-ddn42
Test-Parameters: trivial env=ONLY="160j 432 810 812"
Fixes: 9d88030b8a5 ("LU-11626 mdc: hold obd while processing changelog")
Fixes: 6a70c5ec116 ("LU-14804 nodemap: do not return error for improper ACL")
Fixes: bc38e0f6eaa ("LU-12784 llite: limit max xattr size by kernel value")
Fixes: 78ddbc59530 ("LU-11951 ptlrpc: reset generation for old requests")
Signed-off-by: Alena Nikitenko <anikitenko@ddn.com>
Change-Id: I91b45cc9a9d78ab96877edfef59cf6e8fe259452
Reviewed-on: https://review.whamcloud.com/46432
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Thu, 12 Aug 2021 20:28:29 +0000 (16:28 -0400)]
LU-14935 tests: Use FAIL_CHECK_QUIET for fake i/o
Logging to the console is relatively expensive and doing it
for fake i/o is very expensive in terms of CPU time.
If we use FAIL_CHECK_QUIET, a debug message is logged only once
to the console, and the rest at D_INFO level (probably not at all).
This should hugely reduce the CPU cost of the debugging.
Lustre-change: https://review.whamcloud.com/44651
Lustre-commit: 890466a32d3e8683a96d5a18e76c0cc704810f5f
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I46a5042efd116a4f5c80eaf0d5dae7fe132f6a79
Reviewed-on: https://review.whamcloud.com/46372
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
James Simmons [Thu, 10 Feb 2022 19:06:25 +0000 (11:06 -0800)]
LU-15420 uapi: avoid gcc-11 -Werror=stringop-overread
GCC 11 warns about string and memory operations on fixed address:
In function ‘strlen’,
inlined from ‘changelog_rec_sname’ at include/uapi/linux/lustre/lustre_user.h:1981:19,
inlined from ‘mdd_changelog_rec_ext_rename’ at lustre/mdd/mdd_dir.c:932:2,
inlined from ‘mdd_changelog_ns_store’ at lustre/mdd/mdd_dir.c:1061:3:
include/linux/fortify-string.h:25:33: error: ‘__builtin_strlen’
reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
25 | #define __underlying_strlen __builtin_strlen
The reason is that we are looking for an address right after the end
of the changelog record header which gcc thinks is an overrun. Rework
the code to allow us to index the memory right after the changelog
record header.
Also fix a long hidden bug in the lustre snmp code.
Lustre-change: https://review.whamcloud.com/46319
Lustre-commit: TBD (from 0ae7f1196ba14df9a2354f0803eb75eea92cb1ee)
Change-Id: I13479b9074a392330d39f01656b26f9e9a91a8ec
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46498
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Mon, 28 Feb 2022 09:05:22 +0000 (01:05 -0800)]
LU-15587 kernel: kernel update RHEL7.9 [3.10.0-1160.59.1.el7]
Update RHEL7.9 kernel to 3.10.0-1160.59.1.el7.
Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9
Change-Id: I4f445be6c7ed341e6e912c7df97775af7e46c1e1
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46644
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Thu, 27 Jan 2022 19:16:42 +0000 (11:16 -0800)]
LU-15196 kernel: kernel update RHEL8.4 [4.18.0-305.25.1.el8_4]
Update RHEL8.4 kernel to 4.18.0-305.25.1.el8_4 for server support.
Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity
Test-Parameters: trivial fstype=zfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity
Change-Id: If05ec88e37faa8d61881b3987a3364544e39f8a9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46343
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Minh Diep [Wed, 12 Jan 2022 03:06:10 +0000 (19:06 -0800)]
LU-15286 build: only use baseonly option on el7
The baseonly option allows building the perf package on el7,
but not on el8.
Lustre-change: https://review.whamcloud.com/45677
Lustre-commit: 6eb5d3ae4aaab1662afc8ae50f56b0a0eaa6556a
Change-Id: Ie973c5cc816b4b98ef71ab7080bd11286bcd644a
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46056
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Wang Shilong [Thu, 6 Jan 2022 14:02:24 +0000 (09:02 -0500)]
LU-11597 test: Fix sanityn 16a failed on arm
O_DIRECT now expects I/O to be aligned to the page size. On x86_64
that is 4K, but on some other platforms it can be 64K, so use
PAGE_SIZE here instead of a hardcoded value.
According to the open(2) man page[1], the O_DIRECT macro is defined
when _GNU_SOURCE is defined, and _GNU_SOURCE is already defined at
the top of fsx.c. So set OP_DIRECT to O_DIRECT instead of hardcoding
its value, since O_DIRECT can have a different value on other
platforms such as Arm64[2].
[1] https://man7.org/linux/man-pages/man2/open.2.html
"The O_DIRECT, O_NOATIME, O_PATH, and O_TMPFILE flags are Linux-
specific. One must define _GNU_SOURCE to obtain their definitions."
[2] https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/aarch64/bits/fcntl.h.html#_M/__O_DIRECT
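The alignment requirement can be checked from the shell without hardcoding 4K. A minimal sketch, assuming getconf is available; the helper names are illustrative and unrelated to fsx.c:

```shell
# Page size of the running system in bytes (4096 on typical x86_64,
# 65536 on some Arm64 configurations).
page_size() {
	getconf PAGESIZE
}

# Succeed if an I/O size in bytes is a whole multiple of the page size.
is_page_aligned() {
	[ $(( $1 % $(page_size) )) -eq 0 ]
}

echo "page size: $(page_size)"
```

A test that derives its I/O size from page_size rather than a literal 4096 behaves the same on both kinds of platform.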
Lustre-change: https://review.whamcloud.com/37589
Lustre-commit: eb704aecfaad2a6256d1e2e48cdfadbabb07e5cb
Fixes: 853d180121a6 ("LU-3606 fsx: Add fallocate operation to fsx")
Test-Parameters: testlist=sanityn envdefinitions=ONLY=16a
Change-Id: If72d434adaf91a960dfc50c557d8b50793fda575
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46430
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Wed, 26 Jan 2022 23:33:27 +0000 (16:33 -0700)]
LU-15487 mdd: print FID in mdd_dir_page_build() error
Print the MDT name and FID in mdd_dir_page_build() when an error
is hit. Because this changes the callback function signature,
also update dt_index_page_build() to print a more useful message.
Add OBD_FAIL_MDS_DIR_PAGE_WALK to allow triggering this codepath
to see if this is the source of problems in error handling.
Lustre-change: https://review.whamcloud.com/46368
Lustre-commit: TBD
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic475f4a2c775871ff5af59a47e0966ba3eed7013
Reviewed-on: https://review.whamcloud.com/46335
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46370
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Xinliang Liu [Thu, 28 Oct 2021 09:48:38 +0000 (09:48 +0000)]
LU-11667 tests: Fix sanity test 317 for 64K PAGE_SIZE OST
When a file is created, blocks are allocated aligned to PAGE_SIZE;
see osd_ldiskfs_map_inode_pages(). For example, on an Arm64 OST
server with 64K PAGE_SIZE, creating a file smaller than 64K
actually allocates 128 blocks of 512 bytes each.
Adjust the test for OST servers with 64K PAGE_SIZE.
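The 128-block figure is just the page size divided by the 512-byte block unit. A minimal sketch of that arithmetic, under the simplifying assumption that allocation is rounded up to whole pages and nothing else is charged (allocated_blocks is a hypothetical helper, not code from the test):

```shell
# Number of 512-byte blocks charged for a file of $1 bytes when allocation
# is rounded up to whole pages of $2 bytes.
allocated_blocks() {
	local size="$1" page="$2"
	local pages=$(( (size + page - 1) / page ))
	echo $(( pages * page / 512 ))
}

allocated_blocks 1024 65536    # small file on a 64K PAGE_SIZE server
allocated_blocks 1024 4096     # same file on a 4K PAGE_SIZE server
```

The first call prints 128 and the second prints 8, which is why a block count tuned for 4K pages fails on a 64K server.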
Lustre-change: https://review.whamcloud.com/45395
Lustre-commit: 63d4d9ff2f5c8cc992ca6b2f698bb43a3257bfb3
Test-Parameters: trivial
Test-Parameters: clientdistro=el8.5 clientarch=aarch64 fstype=ldiskfs testlist=sanity \
env=PTLDEBUG=-1,ONLY=317
Test-Parameters: fstype=ldiskfs testlist=sanity \
env=PTLDEBUG=-1,ONLY=317
Change-Id: Iada701f4f424093e847fc70aa843873b75fe6b06
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46482
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Wed, 2 Feb 2022 22:05:18 +0000 (15:05 -0700)]
LU-15513 lod: skip uninit component in lod_fill_mirrors
Do not iterate over the "objects" in lod_fill_mirrors() to check
for non-rotational OSTs if the component is uninitialized. In
cases where an OST is not present (e.g. sparse OST indexes used)
the lod_tgt_desc[] array has holes and OST_TGT() returns NULL.
Skip the loop entirely if the component is not initialized, but
also add some sanity checks to verify that the OST index values
are sane in case there are other problems in the future (e.g.
corrupt/invalid layout on disk).
Lustre-change: https://review.whamcloud.com/46435
Lustre-commit: 591a990c617f9b953d2e838427d45fa1de061a83
Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8ec23367059a4ec9e483adb768095b24f03ebbe5
Reviewed-on: https://review.whamcloud.com/46437
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Hongchao Zhang [Wed, 13 Oct 2021 09:02:23 +0000 (17:02 +0800)]
LU-15021 quota: protect lqe_glbl_data in lqe
The lqe_glbl_data in "struct lquota_entry" is allocated in
qmt_lvbo_init and freed in qmt_lvbo_free. It could be freed
while qmt_seed_glbe, called by qmt_set_id_notify, is still
using it, causing a panic from use of freed memory.
Lustre-change: https://review.whamcloud.com/45098
Lustre-commit: TBD (from 124c60bd3ff7705a63a572f4619ecbc5cff13f4c)
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I274f07ee8609c83852572be51625cc929a9130ec
Reviewed-on: https://review.whamcloud.com/46351
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Fri, 4 Mar 2022 19:26:21 +0000 (14:26 -0500)]
LU-15609 flr: Don't assume RDONLY implies SOM
In lov_io_slice_mirror_init, the client code assumes that
the LCM_FL_RDONLY flag in the layout implies SOM and skips
glimpse if it sees one. The RDONLY flag means the mirrors
are in sync, which has historically implied SOM is valid.
To start with, using LCM_FL_RDONLY to imply SOM is sort of
a layering violation. SOM is only communicated from the
MDS when it is valid, and the client already skips glimpse
in that case, so this duplicates functionality from the
higher layers.
More seriously, the patch
"LU-14526 flr: mirror split downgrade SOM"
(https://review.whamcloud.com/43168/)
made it possible to have LCM_FL_RDONLY but not strict SOM,
so this assumption is no longer correct.
The fix is to not look at LCM_FL_RDONLY when deciding
whether to glimpse a file for size.
Lustre-change: https://review.whamcloud.com/46666/
Lustre-commit: 91db60601be1e3ac1f06dc79185f67884598acd2 (tbd)
Fixes: d87b6f4d1c ("LU-14526 flr: mirror split downgrade SOM")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5ed0a2124005bc58c1ed11681c9bd642cffcd1b5
Reviewed-on: https://review.whamcloud.com/46668
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Mon, 28 Feb 2022 18:26:40 +0000 (11:26 -0700)]
RM-620 build: New tag 2.14.0-ddn37
New tag 2.14.0-ddn37
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia37f6b3a8348ab8e92ba73cfa2c0e56aa95de38d
Alex Deiter [Thu, 24 Feb 2022 14:10:40 +0000 (14:10 +0000)]
EX-4890 configure script changes system header and config files
Remove the SUBDIRS target from configure tests.
Since Linux kernel version 2.6.x we can use target M.
Change-Id: I8e59bdaf2d0e4e08a659e08f63a14472fba72eb2
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/46603
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>