Whamcloud - gitweb
fs/lustre-release.git
4 years ago New RC 2.12.4-RC1 2.12.4-RC1 v2_12_4-RC1
Oleg Drokin [Tue, 28 Jan 2020 22:39:51 +0000 (17:39 -0500)]
New RC 2.12.4-RC1

Change-Id: Ia0ed234bd5b7ffb74f1c1ec73190a34504f05496
Signed-off-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11385 obdclass: Handle gracefully if nsproxy is NULL 14/37314/2
Serguei Smirnov [Tue, 19 Nov 2019 22:18:17 +0000 (14:18 -0800)]
LU-11385 obdclass: Handle gracefully if nsproxy is NULL

Gracefully handle the case where current->nsproxy is NULL:
check for the condition and return an error instead of
dereferencing the pointer.

Lustre-change: https://review.whamcloud.com/36802
Lustre-commit: 15278c6d32a5a9a7a2b8ac9e08c8702383e0c2ff

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ia102d2bacdb0e54b0339985396447e6d25465c56
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37314
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12637 kernel: new kernel [RHEL 8.1 4.18.0-147.3.1.el8_1] 86/37186/4
Jian Yu [Fri, 3 Jan 2020 07:28:20 +0000 (23:28 -0800)]
LU-12637 kernel: new kernel [RHEL 8.1 4.18.0-147.3.1.el8_1]

This patch makes changes to support new RHEL 8.1 release
for Lustre client.

Test-Parameters: trivial clientdistro=el8.1 \
envdefinitions=SANITY_EXCEPT="411 817" \
testlist=sanity

Lustre-change: https://review.whamcloud.com/36946
Lustre-commit: 97e93c8f267a7d9fb9ee6d96b040236172a7f247

Change-Id: Ifcc0a15c3ad9afa99b670641f91b23c1a5c0668e
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37186
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11385 lnet: check if current->nsproxy is NULL before using 13/37313/2
Sonia Sharma [Sat, 30 Mar 2019 08:32:34 +0000 (01:32 -0700)]
LU-11385 lnet: check if current->nsproxy is NULL before using

A crash has been seen at a few sites in the call
rdma_create_id(current->nsproxy->net_ns, cb, dev, ps, qpt).
The problem is the first parameter of this
function, current->nsproxy->net_ns: current or
current->nsproxy may be NULL, and evaluating the
expression then results in a
"kernel NULL pointer dereference" crash.

Handle the case of NULL value gracefully by adding
a check and using init_net if current or
current->nsproxy is NULL.
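
A minimal userspace sketch of the fallback described above. The struct
definitions and the helper name task_net_or_init() are hypothetical
stand-ins for illustration; they are not the kernel types or the code
in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct net { int id; };
struct nsproxy { struct net *net_ns; };
struct task { struct nsproxy *nsproxy; };

/* Models the kernel's initial network namespace. */
struct net init_net = { 0 };

/* Return the task's network namespace, or init_net when the task or
 * its nsproxy is missing, so the caller never dereferences NULL. */
struct net *task_net_or_init(const struct task *t)
{
	if (t == NULL || t->nsproxy == NULL)
		return &init_net;
	return t->nsproxy->net_ns;
}
```

The result of such a helper could then be passed as the first argument
where current->nsproxy->net_ns was used directly.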

Lustre-change: https://review.whamcloud.com/34577
Lustre-commit: ef1783e282f6eba9d69b0957f1b5fed00be0cbd6

Change-Id: I06349e081f2c4ba0480b3924fc304f94ca765891
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37313
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12853 ptlrpc: zero session environment 05/37305/2
Alexander Boyko [Mon, 14 Oct 2019 07:31:35 +0000 (03:31 -0400)]
LU-12853 ptlrpc: zero session environment

handle_recovery_req() sets le_ses for request processing
and does not zero it afterwards. This leads to accessing freed
memory in keys_fill() later.
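
A simplified sketch of the pattern, assuming a session that lives only
for the duration of one request. Only le_ses is named in this commit;
the types and handle_req_with_session() are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified types; only le_ses is named in the commit. */
struct session { int keys; };
struct env { struct session *le_ses; };

/* Process one request with a temporary session attached to env.
 * The pointer is cleared before the session is freed, so later users
 * of env see NULL rather than a dangling pointer. */
int handle_req_with_session(struct env *env)
{
	struct session *ses = calloc(1, sizeof(*ses));

	if (ses == NULL)
		return -1;

	env->le_ses = ses;
	ses->keys = 1;          /* stand-in for real request processing */

	env->le_ses = NULL;     /* zero the session pointer after use */
	free(ses);
	return 0;
}
```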

The patch also cleans up the xxx_env_info functions, makes them
identical, and combines them into a single function.

Lustre-change: https://review.whamcloud.com/36443
Lustre-commit: 2a620f07e23b3b044f429f049bcc5ffa96f6d844

Cray-bug-id: LUS-7676
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: Ifad95c1177258b6f71effe5fa815f68c8426c516
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37305
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-13098 ptlrpc: suppress connection restored message 15/37315/2
Alex Zhuravlev [Sat, 21 Dec 2019 15:40:20 +0000 (18:40 +0300)]
LU-13098 ptlrpc: suppress connection restored message

if that happens on an idle connection.

Lustre-change: https://review.whamcloud.com/37086
Lustre-commit: 7aa58847b94d0ebb2796774a2de2183ba7f8cc4b

Fixes: 5a6ceb664f07 ("LU-7236 ptlrpc: idle connections can disconnect")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I506665d427f3e77477f53e2d3059bcb1daaf0318
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37315
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12799 ptlrpc: return proper error code 64/37164/3
Alex Zhuravlev [Tue, 24 Sep 2019 20:29:01 +0000 (23:29 +0300)]
LU-12799 ptlrpc: return proper error code

from ptlrpc_disconnect_prep_req() using ERR_PTR()
as the callers expect.
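
For readers unfamiliar with the convention, here is a userspace
re-creation of the ERR_PTR()/IS_ERR()/PTR_ERR() helpers. The prep_req()
function and struct request are hypothetical; the sketch only shows why
a caller that tests with IS_ERR() would miss a plain NULL return.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creation of the kernel helpers, for illustration only. */
void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct request { int opcode; };
struct request the_request = { 9 };

/* Hypothetical constructor: on failure it returns ERR_PTR(-ENOMEM)
 * rather than NULL, because callers test the result with IS_ERR(). */
struct request *prep_req(int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	return &the_request;
}
```

Note that IS_ERR(NULL) is false, so a function that returns NULL on
failure looks like success to a caller written against this convention.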

Lustre-change: https://review.whamcloud.com/36282
Lustre-commit: 9e2620d75cce1e1b4855704ddd9a994ce8e8d650

Fixes: 5a6ceb664f07 ("LU-7236 ptlrpc: idle connections can disconnect")
Change-Id: I5493194a1f18f3d0b559921b7859bf835585ba58
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-on: https://review.whamcloud.com/37164
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-13092 lbuild: include lbuild-{fc,rhel,sles} to SIGNATURE 12/37312/2
Wang Shilong [Thu, 9 Jan 2020 01:34:28 +0000 (09:34 +0800)]
LU-13092 lbuild: include lbuild-{fc,rhel,sles} to SIGNATURE

These files should be included when calculating SIGNATURE, because
changes such as a bump of the kernel extra tags can happen there.

Lustre-change: https://review.whamcloud.com/37076
Lustre-commit: b39e1e6e3e4ea396ad842ec3695f45cfd5dfb79e

Test-Parameters: trivial
Change-Id: I2c62ad765d3c6a1b9e99affe3be95a404d6140c5
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37312
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12988 osd: do not use preallocation during mount 55/37155/3
Alex Zhuravlev [Thu, 14 Nov 2019 15:13:16 +0000 (18:13 +0300)]
LU-12988 osd: do not use preallocation during mount

as a cold mballoc cache can cause a very lengthy search.

Lustre-commit: ae21fce625ec6cd134fa4764683f00bc692132cb
Lustre-change: https://review.whamcloud.com/36704

Change-Id: I821b023d392336f0085a96e821dc22e92dbf23b7
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/37155
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-13087 target: init lcd last transno from reply data 87/37187/2
Mikhail Pershin [Thu, 5 Dec 2019 21:23:01 +0000 (00:23 +0300)]
LU-13087 target: init lcd last transno from reply data

Init lcd_last_transno value from reply data to keep it
valid so tgt_release_reply_data() will keep a slot with
the highest transno and on-disk data is not lost.

Lustre-change: https://review.whamcloud.com/37060
Lustre-commit: 52c1cbaa7db7505642b64b2d85448d506a444661

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id31b3b250616fb6afd3d145c31b12af30ac86be8
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37187
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11911 lov: fix lov_iocontrol for inactive OST case 26/37226/2
Vladimir Saveliev [Fri, 1 Feb 2019 00:16:29 +0000 (03:16 +0300)]
LU-11911 lov: fix lov_iocontrol for inactive OST case

For inactive OSTs, lov->lov_tgts[index]->ltd_exp is
NULL. lov_iocontrol() must check for that before dereferencing
lov->lov_tgts[index]->ltd_exp->exp_obd.

Lustre-change: https://review.whamcloud.com/34148
Lustre-commit: 0facd12afa33c61e4123f6e793d232d8c814fbec

Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Cray-bug-id: LUS-6937
Test-Parameters: trivial
Change-Id: I4bb332ee2c50b07a1471035556f4d77a3559847f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37226
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-13061 osp: check catlog FID after reading in 85/37185/3
Hongchao Zhang [Thu, 19 Dec 2019 02:52:29 +0000 (21:52 -0500)]
LU-13061 osp: check catlog FID after reading in

In osp_sync_llog_init, the catlog FID read from "CATALOGS"
should be checked for sanity.

Lustre-change: https://review.whamcloud.com/36998
Lustre-commit: 4597fa7d884de0f1a1b030052d4d34983fed6109

Change-Id: I4342b21b7d5c6d408a9ab52a1e30815ae1d5f563
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37185
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-11770 misc: fix bdev_integrity_enabled definition 67/37167/2
Li Dongyang [Thu, 9 Jan 2020 10:33:24 +0000 (21:33 +1100)]
LU-11770 misc: fix bdev_integrity_enabled definition

Part of the patch was missed when it was backported
to b2_12; as a result, bdev_integrity_enabled is
always defined as a function that just returns false.

Change-Id: I9c9a83f3011f939e7f6d72140c08943d82a5416d
Fixes: b14e6617b9 ("LU-11770 osc: allow build without blk_integrity or crc-t10pi")
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/37167
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-13059 kernel: kernel update RHEL7.7 [3.10.0-1062.9.1.el7] 61/36961/5
Jian Yu [Mon, 6 Jan 2020 07:47:24 +0000 (23:47 -0800)]
LU-13059 kernel: kernel update RHEL7.7 [3.10.0-1062.9.1.el7]

Update RHEL7.7 kernel to 3.10.0-1062.9.1.el7.

Test-Parameters: trivial clientdistro=el7.7 serverdistro=el7.7

Change-Id: I65a65db8cf044b1b91d5b116746efda9383fcf48
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36961
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11867 osd-ldiskfs: FID in LMA mismatch won't block create 35/37135/3
Lai Siyao [Mon, 7 Jan 2019 03:37:48 +0000 (11:37 +0800)]
LU-11867 osd-ldiskfs: FID in LMA mismatch won't block create

Sometimes two OST objects may be mapped to the same inode, so the
second object's FID does not match the FID in the inode LMA. In this
case, if the inode has not been written yet, it is safe to set the
object inode to NULL and let it create a new inode.

Another case is when the mapped inode doesn't exist; it is also safe
to not initialize the inode and return 0, so that create can succeed.

Add sanity-scrub.sh 4d for this.

Lustre-change: https://review.whamcloud.com/34052
Lustre-commit: cbf59ba6a56086c53a15622db7fa9f95d9798b7f

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ic84cdeaca2ea202ab0c01a0075a2f9ee8627f508
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37135
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12759 osc: don't re-enable grant shrink on reconnect 52/37152/2
Alexander Zarochentsev [Wed, 10 Jul 2019 18:37:33 +0000 (21:37 +0300)]
LU-12759 osc: don't re-enable grant shrink on reconnect

The client requests grant shrinking support on each
reconnect and re-enables the capability even if it was
explicitly disabled by lctl set_param.

Lustre-change: https://review.whamcloud.com/36177
Lustre-commit: efa3425c5f5a6763ea834408b982e4df5a90c914

Cray-bug-id: LUS-7585
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: I87b1718022ee3346c9b177890a118410c5757458
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37152
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12026 mdt: MDS stores atime|mtime|ctime during close 69/36869/3
Qian Yingjin [Wed, 25 Sep 2019 09:14:12 +0000 (17:14 +0800)]
LU-12026 mdt: MDS stores atime|mtime|ctime during close

In order to make direct inode scanning on the MDT useful, in
addition to storing the file size/blocks via LSOM on the MDT, we
also need to store the atime/mtime/ctime on the MDT inodes.

Currently the atime is already lazily updated on the MDS (at
close time). With this patch, the final mtime/ctime are also sent to
the MDS at close time and updated on the MDT inode, making MDT-only
scanning workable.

Lustre-change: https://review.whamcloud.com/36286
Lustre-commit: d2f7cb7934a0b38fa9503e8257f2b70ed656c11d

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I4465281a03d70919c388cb241c16eebcb03e850f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36869
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-13070 mdd: try old format for orphan names during recovery 29/37129/3
Artem Blagodarenko [Tue, 17 Dec 2019 09:12:36 +0000 (12:12 +0300)]
LU-13070 mdd: try old format for orphan names during recovery

mdd_orphan_destroy() can loop because of a compatibility issue on upgrade to
2.11 or later. The format for names of orphans in the PENDING directory
was changed in Lustre 2.11. The old format names are not recognized by
mdd_orphan_destroy() in Lustre 2.11, but compatibility code added to
handle this was incomplete, leading to an endless loop. There's a check
for the old format name, used in mdd_orphan_delete(), but that check
was not included in mdd_orphan_destroy().

This patch adds the compatibility check to mdd_orphan_destroy().

Lustre-change: https://review.whamcloud.com/37049
Lustre-commit: 05fca4be33067f24a02e527c88cff5b60a20bb39

Fixes: a02fd4573fe ("LU-7787 mdd: clean up orphan object handling")
Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Cray-bug-id: LUS-8270
Change-Id: I9f42188dcb00f9d536996c14771de7df02502b40
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37129
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-13043 quota: remove annoying message in osd_declare_inode_qid() 31/37131/3
Wang Shilong [Tue, 3 Dec 2019 06:32:22 +0000 (14:32 +0800)]
LU-13043 quota: remove annoying message in osd_declare_inode_qid()

The admin shouldn't be getting console error messages when a user goes
over quota (this would be happening continuously at some sites).

In some call paths, the "*flags" parameter may be NULL; don't try to
access it in that case.

As a general cleanup, move the QUOTA_FL_* flags over to a named enum
"enum osd_quota_local_flags" so that it is easier to see what this field
actually holds, rather than a totally generic "int *flags" argument that
has to be hunted through the code.

Lustre-change: https://review.whamcloud.com/36906
Lustre-commit: b3005155317b27e19c8029e6a9f92e69d0dd905e

Fixes: d30f9e6b6c5d ("LU-11425 quota: support quota for DoM")
Change-Id: Id5686ecdb8a943e48a2888067e321f83b8569188
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37131
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-13077 pfl: cleanup xattr checking 32/37132/2
Sebastien Buisson [Fri, 13 Dec 2019 16:39:08 +0000 (01:39 +0900)]
LU-13077 pfl: cleanup xattr checking

Cleanup xattr checking in mdd and lod layers for PFL.

Lustre-change: https://review.whamcloud.com/37010
Lustre-commit: f765c6ceb8a4a2415a7956498f7fdaefa477ba55

Reported-by: Clement Barthelemy <clement.barthelemy@nextino.eu>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2841b615ee304785fbf316b829d8280eefc3878a
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37132
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12898 utils: %llu mismatch with type __u64 on ppc64le 30/37130/2
Olaf Faaland [Tue, 22 Oct 2019 16:44:51 +0000 (09:44 -0700)]
LU-12898 utils: %llu mismatch with type __u64 on ppc64le

Fix build errors like this one on ppc64le:

BUILDSTDERR: libmount_utils_zfs.c: In function 'zfs_mkfs_opts':
BUILDSTDERR: libmount_utils_zfs.c:573:5: error: format '%llu' expects
argument of type 'long long unsigned int', but argument 4 has type
'__u64' [-Werror=format=]
BUILDSTDERR:      mop->mo_device_kb * 1024);

__u64 was treated as an unsigned long long, which breaks the build on
ppc64le, where they are not the same type.

In printf cases, cast to unsigned long long to match the printf format
so the format is compatible with the type and it is guaranteed
not to lose any data.

In the case of sscanf(), replace the call with strtoull() to eliminate
the issue.
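
Both techniques can be sketched in portable C. This is an illustration
under assumptions, not the code from the patch: uint64_t stands in for
__u64, and format_bytes() and parse_u64() are hypothetical helpers.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a 64-bit size with an explicit cast so "%llu" matches the
 * argument type whatever uint64_t is typedef'd to. */
int format_bytes(char *buf, size_t len, uint64_t device_kb)
{
	return snprintf(buf, len, "%llu",
			(unsigned long long)(device_kb * 1024));
}

/* Parse a decimal 64-bit value with strtoull() instead of sscanf(),
 * so no format specifier has to match the destination type.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_u64(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0')
		return -1;
	*out = (uint64_t)v;
	return 0;
}
```

The cast cannot lose data because unsigned long long is at least 64
bits wide on every conforming C implementation.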

Lustre-change: https://review.whamcloud.com/36558
Lustre-commit: 56b4b112a497661de8dbf5a851c7a045d470deff

Test-Parameters: trivial
Change-Id: I02fd82e0be4d756881c15aa9faedb9b40961661a
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37130
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12791 kernel: kernel update RHEL 8.0 [4.18.0-80.11.2.el8_0] 28/36528/4
Jian Yu [Fri, 13 Dec 2019 07:04:15 +0000 (23:04 -0800)]
LU-12791 kernel: kernel update RHEL 8.0 [4.18.0-80.11.2.el8_0]

Update RHEL 8.0 kernel to 4.18.0-80.11.2.el8_0 for Lustre client.

Test-Parameters: trivial clientdistro=el8 \
testlist=sanity

Change-Id: I4081719fa9a8c83ea0e8bff46dc9d54774cabb56
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36528
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11656 llite: fetch default layout for a directory 72/37072/3
Jian Yu [Tue, 19 Nov 2019 22:19:24 +0000 (14:19 -0800)]
LU-11656 llite: fetch default layout for a directory

For a directory that does not have trusted.lov xattr, the current
"lfs getstripe" will only print the stripe_count, stripe_size,
and stripe_index that are fetched from the /sys/fs/lustre/lov values.
It doesn't show the actual default layout that will be used when
new files are created in that directory.

This patch fixes the above issue in ll_dir_getstripe_default() by
fetching the layout from root FID after ll_dir_get_default_layout()
returns -ENODATA from a directory that does not have trusted.lov xattr.
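
The fallback logic can be modelled in a few lines. All types and
function names below are hypothetical simplifications; only the
-ENODATA return and the idea of falling back to the root come from the
description above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model: a directory either carries its own
 * default layout or has none (has_layout == 0). */
struct layout { int stripe_count; };
struct dir { int has_layout; struct layout layout; };

/* Stand-in for the per-directory lookup: -ENODATA means "no xattr". */
int dir_get_default_layout(const struct dir *d, struct layout *out)
{
	if (!d->has_layout)
		return -ENODATA;
	*out = d->layout;
	return 0;
}

/* Try the directory first and fall back to the filesystem root only
 * when the directory reports -ENODATA; other errors are passed up. */
int dir_getstripe_default(const struct dir *d, const struct dir *root,
			  struct layout *out)
{
	int rc = dir_get_default_layout(d, out);

	if (rc == -ENODATA)
		rc = dir_get_default_layout(root, out);
	return rc;
}
```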

Lustre-change: https://review.whamcloud.com/36609
Lustre-commit: 3e8fa8a7396cd029cb0d7714a324343eed7f535e

Change-Id: Icbf1f8f4fa5e5b8788217fcb0cfd24a3b80a27d9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37072
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-11907 dne: allow access to striped dir with broken layout 39/36939/4
Lai Siyao [Sun, 14 Apr 2019 20:12:54 +0000 (04:12 +0800)]
LU-11907 dne: allow access to striped dir with broken layout

Sometimes the layout of striped directories may become broken:
* creation/unlink is partially executed on some MDT.
* disk failure or a stopped MDS makes some stripes inaccessible.
* software bugs.

In this situation, the directory should still be accessible,
and in particular it should be possible to migrate it to other
active MDTs.

This patch adds this support on both server and client: don't
assume the stripe FID is sane, and when a stripe doesn't exist, skip
it.

Add OBD_FAIL_MDS_STRIPE_FID to simulate insane stripe FID, and
OBD_FAIL_MDS_STRIPE_CREATE to simulate stripe creation failure.

Add sanity 60h.

Lustre-change: https://review.whamcloud.com/34750
Lustre-commit: d2725563e7afa17a41a53aa65255a31380606d23

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8a05a0e0cef8b051a935b3fa3d3e26c0b6ef3b4a
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11673 tests: replace obsolete '-o' with '||' 29/36929/3
James Nunez [Thu, 5 Dec 2019 16:32:46 +0000 (09:32 -0700)]
LU-11673 tests: replace obsolete '-o' with '||'

Since use of -o and -a is marked as obsolete in the shell
test command ([), we need to switch from [ expr1 -o expr2 ]
to [ expr1 ] || [ expr2 ].

Make this change for sanity tests.

This is a partial backport of:
Lustre-change: https://review.whamcloud.com/33670
Lustre-commit: 6d277f126df7605d402255333180b0ca03991613

Test-Parameters: trivial
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Id87580d0280a716a6939a1203ae5b370e762d6ec
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36929
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12895 mdt: check if object exists first 32/37032/3
Sebastien Buisson [Thu, 31 Oct 2019 11:33:45 +0000 (20:33 +0900)]
LU-12895 mdt: check if object exists first

Make sure object exists before trying to get its attr.

Lustre-change: https://review.whamcloud.com/36629
Lustre-commit: ca68e3d677a371497586167a2318268db1d94cab

Test-Parameters: clientselinux mdtcount=4 envdefinitions=ONLY=185a testlist=sanity,sanity,sanity,sanity
Test-Parameters: clientselinux mdtcount=4 testlist=sanity,recovery-small,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idb2cd5d6e3fdf7998040b933be54a001a0e5391b
Reviewed-on: https://review.whamcloud.com/37032
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12469 mdd: handle migrate case with SELinux 31/37031/2
Sebastien Buisson [Wed, 6 Nov 2019 12:51:55 +0000 (21:51 +0900)]
LU-12469 mdd: handle migrate case with SELinux

In case a metadata object is created for migration purpose,
its security context should not be initialized. The
security.selinux xattr will be copied after creation, just like
any other xattr, so that the migrated object has the right security
context.

Lustre-change: https://review.whamcloud.com/36684
Lustre-commit: 8a60fa2e2fcd28c2772d90e76d36430d30b01905

Test-Parameters: clientselinux mdtcount=4 envdefinitions=ONLY=230 testlist=sanity,sanity,sanity,sanity
Test-Parameters: clientselinux mdtcount=4 testlist=sanity,recovery-small,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I0bc274426c003f8081da2f4d1e8e6c12a70b9930
Reviewed-on: https://review.whamcloud.com/37031
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years ago LU-12944 mdd: pass correct xattr size to lower layers 30/37030/2
Sebastien Buisson [Wed, 6 Nov 2019 17:31:08 +0000 (02:31 +0900)]
LU-12944 mdd: pass correct xattr size to lower layers

In mdd_iterate_xattrs(), struct lu_buf allocated to store xattr value
can be reused for multiple xattrs, because it is only reallocated if
it happens to be too small for one xattr.
As a consequence, lb_len field does not represent actual xattr's size.
It has to be adjusted when passed to lower layers.

Lustre-change: https://review.whamcloud.com/36689
Lustre-commit: e5e584fd386a2229809bc64d440c3255cf50c1bd

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I26b54759b4e69fbac17a1032bbc724b796d78108
Reviewed-on: https://review.whamcloud.com/37030
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11956 mdd: do not reset original lu_buf.lb_len 29/37029/2
Li Dongyang [Thu, 27 Jun 2019 03:25:45 +0000 (13:25 +1000)]
LU-11956 mdd: do not reset original lu_buf.lb_len

In mdd_iterate_xattrs(), we are resetting the xbuf.lb_len
to a smaller value returned by linkea_overflow_shrink().

If that's the last xattr we are going to process, we could deduct
less than the originally allocated size from obd_memory stats,
failing the memleak check later.
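
A toy model of why the allocated length must be preserved. The
allocator, the mem_in_use counter, and buf_view() are hypothetical;
only lb_len and the idea of an obd_memory counter come from this
commit. Passing a shorter view is one possible way to keep lb_len
intact, not necessarily what the patch does.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a counted allocator. */
struct buf { void *lb_buf; size_t lb_len; };

/* Stand-in for the obd_memory statistic. */
long mem_in_use;

int buf_alloc(struct buf *b, size_t len)
{
	b->lb_buf = malloc(len);
	if (b->lb_buf == NULL)
		return -1;
	b->lb_len = len;
	mem_in_use += (long)len;
	return 0;
}

/* The free path deducts lb_len, so lb_len must still hold the size
 * that was allocated. */
void buf_free(struct buf *b)
{
	free(b->lb_buf);
	mem_in_use -= (long)b->lb_len;
	b->lb_buf = NULL;
	b->lb_len = 0;
}

/* Hand a shorter view to the consumer without touching b->lb_len. */
struct buf buf_view(const struct buf *b, size_t used)
{
	struct buf v = { b->lb_buf, used };

	return v;
}
```

Had the shrunken length been written back into the original buffer,
buf_free() would deduct too little and the counter would end up
nonzero, which is the failure mode the memleak check reports.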

Lustre-change: https://review.whamcloud.com/35333
Lustre-commit: 94a5bc1bcb6c6373ead5b091ff5915dfe452377b

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I6175a91c61ceb0e37ab889d0cfd904f4993ab5cc
Reviewed-on: https://review.whamcloud.com/37029
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years ago LU-12965 obdclass: remove assertion for imp_refcount 66/37066/2
Li Dongyang [Wed, 13 Nov 2019 04:01:25 +0000 (15:01 +1100)]
LU-12965 obdclass: remove assertion for imp_refcount

After calling obd_zombie_import_add(), obd_import could
be freed by obd_zombie before we check imp_refcount with
LASSERT_ATOMIC_GE_LT. It's a use after free and could
crash the box.

Lustre-change: https://review.whamcloud.com/36743
Lustre-commit: dd71e74fecf45b81daa27c89c0b8065a58cac5c1

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I3d63acf2bff543924ca0e74a35d24c507d68f6aa
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37066
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-8207 scripts: add auto-stripe option to lfs_migrate 58/36958/2
Nathan Dauchy [Mon, 2 Jul 2018 14:21:35 +0000 (10:21 -0400)]
LU-8207 scripts: add auto-stripe option to lfs_migrate

Add a "-A" flag to lfs_migrate, which will automatically select the
stripe count as the file is rewritten. Initial algorithm to
determine stripe count is sqrt(size_in_GB)+1, with an additional cap
on object size, though the algorithm or thresholds could conceivably
change in the future.  The primary intent for this feature is to be
able to give users a tool to fix stripe settings on existing files
based on file size.
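
As a worked illustration of the initial formula, here is a small C
sketch. lfs_migrate itself is a shell script; the rounding down, the
truncation to whole GB, and both function names are assumptions of
this example, and the object size cap is left out.

```c
#include <stdint.h>

/* Integer square root (floor) by simple search; adequate for the
 * small values a file size in GB produces. The division form avoids
 * overflow in the comparison. */
uint64_t isqrt_u64(uint64_t n)
{
	uint64_t r = 0;

	while (r + 1 <= n / (r + 1))
		r++;
	return r;
}

/* Stripe count as floor(sqrt(size_in_GB)) + 1. */
unsigned auto_stripe_count(uint64_t size_bytes)
{
	uint64_t size_gb = size_bytes >> 30;

	return (unsigned)isqrt_u64(size_gb) + 1;
}
```

Under these assumptions a file smaller than 1 GB gets one stripe, a
4 GB file gets three, and a 100 GB file gets eleven.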

A new "-C" flag specifies the object size cap.  On each OST, the
amount of space available for migration is capped by dividing the
free space of the smallest OST by the specified value.

A new "-M" flag allows OSTs with free space less than the specified
value to be considered unavailable for migration.

A new "-v" flag increases verbosity to help debug what is being done.

A new "-X" flag limits the amount of free space on each OST that
can be used for migration to the specified value.  This flag is
useful for testing by simulating OSTs that are nearly full.

A new sanity test verifies the operation of the new "-A" flag.

Lustre-change: https://review.whamcloud.com/20552
Lustre-commit: 99d7a8ed43be126b2769ad8bb0b5350cd328ed7f

Test-Parameters: trivial
Signed-off-by: Nathan Dauchy <nathan.dauchy@nasa.gov>
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I9ce8b64e028d9abb66b6b49cf7675263fd7202f0
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36958
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12826 mdt: limit root to change project state by default 56/37056/2
Wang Shilong [Tue, 22 Oct 2019 06:15:02 +0000 (14:15 +0800)]
LU-12826 mdt: limit root to change project state by default

The current project quota implementation allows users to
change the Project ID of files for which they have write
permission to any value. This is not useful if the project
quota is intended to be enforced instead of only being used
for quota accounting.

Change it so that by default only root can change the projid
of a file. Setting "mdt.*.enable_chprojid_gid" will allow
users with the specified numeric Group ID (eg. 1 = "admin") to
also change the projid of a file. Use "-1" to return the previous
behavior where all users can change the projid of their files.
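
The policy can be summarized as a small decision function. This is a
sketch, not the MDT code: the function name and signature are
hypothetical, treating 0 as the root-only default is an assumption,
and the separate write-permission check is omitted.

```c
#include <stdbool.h>

/* `setting` models the value of "mdt.*.enable_chprojid_gid". */
bool may_change_projid(unsigned uid, const unsigned *gids, int ngids,
		       long setting)
{
	int i;

	if (uid == 0)           /* root may always change the projid */
		return true;
	if (setting == -1)      /* previous behavior: any user */
		return true;
	if (setting == 0)       /* assumed default: root only */
		return false;
	for (i = 0; i < ngids; i++)
		if ((long)gids[i] == setting)
			return true;    /* member of the allowed group */
	return false;
}
```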

Lustre-change: https://review.whamcloud.com/36544
Lustre-commit: 8fad70c0872ba13133024e4abf53a0bbee7ba1e9

Change-Id: I91c138d29f4d0b9bc607528d86893451904c9892
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37056
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12928 gss: crash in sec2target_str() 99/36999/2
Yang Sheng [Thu, 7 Nov 2019 18:48:43 +0000 (02:48 +0800)]
LU-12928 gss: crash in sec2target_str()

The timer_setup() API has been used since the 3.10.0-957.x
kernel. So change gck_timer to an embedded struct to avoid
crashes with the new timer API.
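
Why embedding matters can be shown with a userspace model of the
callback style. Everything here is a stand-in: the real struct
timer_list and timer_setup() live in the kernel, struct keyring_ctx and
timer_setup_sketch() are hypothetical, and gck_timer is the only name
taken from this commit.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel timer type. */
struct timer_list { void (*function)(struct timer_list *); };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct keyring_ctx {
	int fired;
	struct timer_list gck_timer;   /* embedded, not a pointer */
};

/* New-style callback: it receives the timer itself and recovers the
 * enclosing object, which only works when the timer is embedded. */
void ctx_timer_cb(struct timer_list *t)
{
	struct keyring_ctx *ctx =
		container_of(t, struct keyring_ctx, gck_timer);

	ctx->fired = 1;
}

/* Minimal model of registering the callback on a timer. */
void timer_setup_sketch(struct timer_list *t,
			void (*fn)(struct timer_list *))
{
	t->function = fn;
}
```

If gck_timer were a pointer to a separately allocated timer, the
container_of() arithmetic would yield an address unrelated to the
context, which is consistent with the crash described above.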

Lustre-change: https://review.whamcloud.com/36708
Lustre-commit: 5b40c9b90b44ddd0b042c12c10c65c9965a9856f

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ie12e21bca4169746016c8ac0e3ee4a125893ebf6
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36999
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12920 build: replace ed with sed 14/37014/2
Minh Diep [Thu, 31 Oct 2019 14:26:03 +0000 (07:26 -0700)]
LU-12920 build: replace ed with sed

The ed command is very old; use sed instead.

Test-Parameters: trivial

Lustre-change: https://review.whamcloud.com/36630
Lustre-commit: 9e11ac388bd85967222dd5cb5ecade1d9b8f67a8

Change-Id: I18ffe50c3fb006182e68460c03a4d34d5011e62a
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37014
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12967 tgt: clean up sync_on_cancel references 38/37038/2
Andreas Dilger [Thu, 14 Nov 2019 02:49:23 +0000 (19:49 -0700)]
LU-12967 tgt: clean up sync_on_cancel references

Clean up the use of "sync_on_cancel" in the code, since the tunable
parameter is named "sync_lock_cancel" and using the same name in
the code makes it easier to find the related parts.

Rename constants to be more consistent:
  NEVER_SYNC_ON_CANCEL    -> SYNC_LOCK_CANCEL_NEVER
  BLOCKING_SYNC_ON_CANCEL -> SYNC_LOCK_CANCEL_BLOCKING
  ALWAYS_SYNC_ON_CANCEL   -> SYNC_LOCK_CANCEL_ALWAYS

Initialize sync_lock_cancel_states[] with designated initializers
so that the state names always match the declared values.

Use ARRAY_SIZE() instead of needing NUM_SYNC_ON_CANCEL_STATES.
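
A minimal userspace sketch of the designated-initializer pattern
follows. The constant names come from this message; the numeric
values, the state strings, the lookup helper and the local
ARRAY_SIZE() definition are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Names from the commit message; the numeric values are assumed. */
enum {
	SYNC_LOCK_CANCEL_NEVER    = 0,
	SYNC_LOCK_CANCEL_BLOCKING = 1,
	SYNC_LOCK_CANCEL_ALWAYS   = 2,
};

/*
 * Designated initializers tie each string to its enum value, so the
 * table stays correct even if the enum is reordered.
 */
const char *const sync_lock_cancel_states[] = {
	[SYNC_LOCK_CANCEL_NEVER]    = "never",
	[SYNC_LOCK_CANCEL_BLOCKING] = "blocking",
	[SYNC_LOCK_CANCEL_ALWAYS]   = "always",
};

/* The array size is derived from the table, no separate count needed. */
static_assert(ARRAY_SIZE(sync_lock_cancel_states) == 3,
	      "one name per state");

/* Hypothetical helper: map a state value to its name, NULL if invalid. */
const char *sync_lock_cancel_name(unsigned int state)
{
	if (state >= ARRAY_SIZE(sync_lock_cancel_states))
		return NULL;
	return sync_lock_cancel_states[state];
}
```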

Lustre-change: https://review.whamcloud.com/36754
Lustre-commit: 52a5981be4df863088168b3ea41fac9e29ddf060

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If7c6015420a5c3266a13798fd8b96539323ebbe5
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37038
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12967 ofd: restore sync_on_lock_cancel tunable 37/37037/2
Andreas Dilger [Thu, 14 Nov 2019 00:56:35 +0000 (17:56 -0700)]
LU-12967 ofd: restore sync_on_lock_cancel tunable

The "ofd.*.sync_on_lock_cancel" tunable was inadvertently replaced
during procfs->sysfs changes in 2.12 with "sync_lock_cancel".  Restore
the "sync_on_lock_cancel" tunable since it has existed since the 2.0
release and is definitely in use with several systems.

It isn't just a matter of restoring the old tunable name, since the
"mdt.*.sync_lock_cancel" name is also used since 2.8 and the code for
the two tunables was recently consolidated in the server target code.

Instead, keep the common "sync_lock_cancel" tunable name, add backward
compatibility for "sync_on_lock_cancel" for a number of releases, and
print a deprecation warning if the old name is used.

Fix up sanity.sh test_80 to check for both the old and new names,
but only if we actually need to change this tunable for ZFS, along
with minor test script style cleanups.

Fixes: 7059644e9ad3 ("LU-8066 ofd: migrate from proc to sysfs")

Lustre-change: https://review.whamcloud.com/36748
Lustre-commit: 7df7347b7b188e7168e094304fd6d2d985f7f274

Change-Id: Iffe65f6268d94075c71b96d42fe60ef11ac39448
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37037
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12856 target: check FLFLAGS are valid while accessing them 00/37000/2
Mikhail Pershin [Thu, 31 Oct 2019 20:44:38 +0000 (23:44 +0300)]
LU-12856 target: check FLFLAGS are valid while accessing them

Before checking the OBD_FL_SHORT_IO flag, first check that
OBD_MD_FLFLAGS is set, i.e. that the flags are valid.

Lustre-change: https://review.whamcloud.com/36632
Lustre-commit: 707f5a982e895c9a484dcdb8d1644e3f63c7c5cc

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I04ac61141d70883c29a113fac3985ac81cc878af
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37000
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-9341 lod: Add special O_APPEND striping 07/37007/2
Patrick Farrell [Wed, 28 Aug 2019 16:54:37 +0000 (12:54 -0400)]
LU-9341 lod: Add special O_APPEND striping

Files opened with O_APPEND are almost always log files,
which generally stay small and do not benefit from being
striped widely.  Additionally, PFL files accessed with
O_APPEND are fully instantiated, so because the files usually
stay small, the instantiated objects are usually wasted.

This patch adds special striping for files created with
O_APPEND.  This is controlled on the MDS by two new proc
variables:
mdd_append_stripe_count
mdd_append_pool

If the stripe count is set to 0 and the pool is not set,
this functionality is disabled and files created with
O_APPEND will be striped like any other file.

Lustre-change: https://review.whamcloud.com/35617
Lustre-commit: e2ac6e1eaa108eef3493837e9bd881629582ea1d

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I433d1b8c80488a851b8eb26c78cf5519a6cd75bf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37007
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12671 mdd: rename mdd/sync_perm to sync_permissions 06/37006/2
James Simmons [Wed, 21 Aug 2019 20:16:13 +0000 (16:16 -0400)]
LU-12671 mdd: rename mdd/sync_perm to sync_permissions

Commit e783bbff accidentally renamed a sysfs variable when moving.
Change the sysfs file to its proper name.

Test-Parameters: trivial testlist=replay-vbr

Lustre-change: https://review.whamcloud.com/35851
Lustre-commit: 55a7e2dcecaf482c40840840db2b0b795bad2bb9

Change-Id: I56e0534506271cf6760f775a9c8fa99b12683861
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37006
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-8066 mdd: migrate from proc to sysfs 05/37005/2
James Simmons [Thu, 15 Nov 2018 18:20:30 +0000 (13:20 -0500)]
LU-8066 mdd: migrate from proc to sysfs

Move the mdd module from using proc for most single value files
to sysfs. The more complex proc entries are moved to debugfs.

Lustre-change: https://review.whamcloud.com/33632
Lustre-commit: e783bbffe35b2b8ebebde5bc70abf288d07df5a3

Change-Id: I01eebf1c58f1a13c2f5e8c599a1363c80468b0bd
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37005
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-1957 tests: remove sanity test 180 from ALWAYS_EXCEPT 30/36930/2
Andreas Dilger [Mon, 26 Aug 2019 23:00:44 +0000 (17:00 -0600)]
LU-1957 tests: remove sanity test 180 from ALWAYS_EXCEPT

Remove test_180 from sanity ALWAYS_EXCEPT, since it should have been
fixed by landing LU-2803.

Lustre-change: https://review.whamcloud.com/35930
Lustre-commit: 72b59b85a253e508ec1b192fbf8cad840ca6ff2c

Fixes: e99f38594d2b ("LU-2803 osd: osd-zfs to handle echo sequence (2) properly")
Test-Parameters: trivial testlist=sanity fstype=zfs
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7601164865baba8fe2db3ce7bb33fd4c81eb0291
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36930
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12769 recovery: use monotonic timer 37/36937/2
Alex Zhuravlev [Mon, 23 Sep 2019 08:26:19 +0000 (11:26 +0300)]
LU-12769 recovery: use monotonic timer

Use the monotonic timer instead of the real one. Also use absolute
values for the timer.

One of the reasons for the move from a jiffies-based timer
to an hrtimer was to avoid the issue of time drift.
It was discovered due to test failures with recovery on
VMs that the high resolution wall clock can drift as well.
Moving to the monotonic clock for the hrtimer avoids this
drift completely and it is safe to use since the recovery
timestamp is not shared between nodes.
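
The idea of an absolute deadline on the monotonic clock can be shown
with a userspace analogue. The kernel patch uses an hrtimer; the
sketch below uses POSIX clock_gettime() instead, and all function
names in it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the given clock in nanoseconds. */
int64_t clock_ns(clockid_t id)
{
	struct timespec ts;
	int rc = clock_gettime(id, &ts);

	assert(rc == 0);
	(void)rc;
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Absolute deadline on the monotonic clock, timeout_sec from now. */
int64_t deadline_after(int64_t timeout_sec)
{
	return clock_ns(CLOCK_MONOTONIC) + timeout_sec * 1000000000LL;
}

/* Non-zero once the monotonic clock has reached the deadline. */
int deadline_expired(int64_t deadline_ns)
{
	return clock_ns(CLOCK_MONOTONIC) >= deadline_ns;
}
```

Because CLOCK_MONOTONIC is not affected by wall clock adjustments, a
deadline computed this way cannot move when the system time is
stepped. It is only meaningful on the local node, which matches the
remark above that the timestamp is not shared between nodes.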

Lustre-change: https://review.whamcloud.com/36274
Lustre-commit: 06408a4ef381121fa58783026a0cf0a6b0fa479c

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8b75121934c229dec8df7be0a4e69c1cda940d3f
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36937
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11762 ldlm: don't exceed hard timeout 36/36936/2
James Simmons [Thu, 4 Jul 2019 16:47:09 +0000 (12:47 -0400)]
LU-11762 ldlm: don't exceed hard timeout

For recovery, Lustre has both a soft timeout, obd_recovery_timeout,
and a hard timeout, obd_recovery_time_hard. When the recovery
timer is adjusted with the function extend_recovery_timer(), you
can control whether it takes into consideration what is left of
the timer. The current code is not very clear on its intent, so
this patch attempts to make the code understandable. No functional
change should happen with this patch.

Lustre-change: https://review.whamcloud.com/34408
Lustre-commit: 8bfe8939d810f5ac16484d3d4b81f829c7d7d0d7

Change-Id: I5701a6cd813ad64b6b4422863767af135eb8e94b
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36936
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11673 tests: quote argument of -n conf-sanity 28/36928/2
James Nunez [Thu, 1 Aug 2019 21:23:14 +0000 (15:23 -0600)]
LU-11673 tests: quote argument of -n conf-sanity

Inside the single bracket test command '[', the argument
of the ‘-n’ flag should be quoted.  If the -n argument is
not quoted, an empty value expands to nothing, the test
collapses to '[ -n ]', and that always evaluates to true.
Quote the argument or use [[ ]].

conf-sanity test 79 has two cases where the ‘-n’ argument
is not quoted.  Let's correct this.

Lustre-change: https://review.whamcloud.com/35669
Lustre-commit: 443cc6e51f0202b9bc40c256259c4fc14ae3f7af

Test-Parameters: trivial envdefinitions=ONLY=79 testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I4b3a43de064d1992439dc25ecc7b0682520f74c9
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36928
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11673 tests: quote argument of -n and test fix 27/36927/3
James Nunez [Fri, 2 Aug 2019 19:49:59 +0000 (13:49 -0600)]
LU-11673 tests: quote argument of -n and test fix

Inside the single bracket test command '[', problems arise
when the argument of the ‘-n’ flag is unquoted.  Either quote
the -n argument or use double brackets for the test.

Quote the ‘-n’ argument in test-framework.sh functions.
This simple correction caused a few tests to fail.
Fix sanity test 65k to use the correct facets and check
for the mgs facet in convert_facet2label() to fix
replay-single test 58b.

Lustre-change: https://review.whamcloud.com/35080
Lustre-commit: 7e0cba246a7f2408c8266574a657e4459f691570

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I9655d2138c56c007207434f04b487b518bb3392e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36927
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12462 osc: layout and chunkbits alignment mismatch 77/36877/2
Vitaly Fertman [Thu, 8 Aug 2019 15:46:06 +0000 (18:46 +0300)]
LU-12462 osc: layout and chunkbits alignment mismatch

In the discard case, the OSC fsync/writeback code asserts
that each OSC extent is fully covered by the fsync request.

It may happen that the start (or the end) of a component does not
match the start (end) of the first (last) OSC object extent, which
is aligned by cl_chunkbits, which in turn depends on the OST block
size.

The requirement for component alignment is LOV_MIN_STRIPE_SIZE,
which is 64K, whereas the ZFS block size can be in the MBs.

Use the fsync region aligned to the chunk size in the assertion.
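
The alignment arithmetic can be sketched as below. This is not the
OSC code: the helpers are hypothetical, they work on byte offsets
with an exclusive end, and overflow near the top of the 64-bit range
is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Round an offset down to its chunk start (chunk size = 1 << chunkbits). */
uint64_t chunk_align_down(uint64_t off, unsigned int chunkbits)
{
	assert(chunkbits < 64);
	return off & ~((UINT64_C(1) << chunkbits) - 1);
}

/* Round an exclusive end offset up to the next chunk boundary. */
uint64_t chunk_align_up(uint64_t off, unsigned int chunkbits)
{
	uint64_t mask;

	assert(chunkbits < 64);
	mask = (UINT64_C(1) << chunkbits) - 1;
	return (off + mask) & ~mask;
}

/*
 * Non-zero if extent [ext_start, ext_end) lies inside the fsync range
 * once that range is widened to chunk boundaries.
 */
int extent_covered(uint64_t ext_start, uint64_t ext_end,
		   uint64_t sync_start, uint64_t sync_end,
		   unsigned int chunkbits)
{
	return chunk_align_down(sync_start, chunkbits) <= ext_start &&
	       ext_end <= chunk_align_up(sync_end, chunkbits);
}
```

For example, with a 1 MiB chunk (chunkbits = 20) a component that
starts at 64K has its first extent starting at offset 0. The
unaligned fsync start of 65536 does not cover that extent, while the
aligned start of 0 does.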

Fixes: 092ecd6612 ("LU-12462 osc: Do not assert for first extent")

Lustre-change: https://review.whamcloud.com/35733
Lustre-commit: 7a9f7dec700c5c553396475daad272475f1b20be

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I2ff47fc87c838239142ffc63bebafce3e9403f4e
Cray-bug-id: LUS-7498
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36877
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12462 osc: Do not assert for first extent 76/36876/2
Patrick Farrell [Tue, 16 Jul 2019 16:28:25 +0000 (12:28 -0400)]
LU-12462 osc: Do not assert for first extent

In the discard case, the OSC fsync/writeback code asserts
that each OSC extent is fully covered by the fsync request.

This is not valid for the DOM case, because OSC extent
alignment requirements can create OSC extents which start
before the OST region of the layout (i.e., they cross into
the DOM region).  This is OK because the layout prevents
them from ever being used for i/o, but this same behavior
means that the OSC fsync start/end is aligned with the
layout, and so does not necessarily cover that first
extent.

The simplest solution is just to not assert on the first
extent.  (There is no way at the OSC layer to recognize the
DOM case.)

Lustre-change: https://review.whamcloud.com/35525
Lustre-commit: 092ecd66127eade284550b83192fa004ff55501b

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If66f8d81fb9dd4546a5647a10f6ca551e2cf98e3
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12899 build: rhel8 not install kernel-rpm-macros 39/37039/2
Qian Yingjin [Wed, 23 Oct 2019 01:43:24 +0000 (09:43 +0800)]
LU-12899 build: rhel8 not install kernel-rpm-macros

On RHEL8 kmodtool and kernel_module_package_buildreqs are not
installed with kernel-devel.

kernel_module_package_buildreqs is defined in kernel-rpm-macros.
If kernel-rpm-macros is not installed, the Lustre RPM build will
report:
"Dependency tokens must begin with alpha-numeric, '_' or '/':
BuildRequires: %kernel_module_package_buildreqs"

This patch gives the developer detailed information about the
required packages when kernel-rpm-macros is not installed.

Lustre-change: https://review.whamcloud.com/36557
Lustre-commit: 037840fb6b86d6083d55f3da5ad70d19d34cc5a5

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Id9b855eeac97d780d9c572d306da3c3a1fa95ea6
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37039
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12503 vvp_dev: increment *pos in .next 35/37035/2
NeilBrown [Sun, 11 Aug 2019 15:43:40 +0000 (11:43 -0400)]
LU-12503 vvp_dev: increment *pos in .next

As described in commit ec2e9995e4c5 ("lustre: llite: change how
"dump_page_cache" walks a hash table"), the .next function should
increment *pos.  For some reason it didn't, and this can trigger
the warning in that function.
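
A userspace sketch of the iterator contract follows. It only mimics
the start/next shape of a seq_file iterator; the table and the
function names are hypothetical and this is not the vvp_dev code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table standing in for the objects being dumped. */
const char *const sketch_entries[] = { "page0", "page1", "page2" };
#define NR_SKETCH_ENTRIES \
	(sizeof(sketch_entries) / sizeof(sketch_entries[0]))

/* start(): return the entry at *pos, or NULL once past the end. */
const char *sketch_start(long long *pos)
{
	assert(pos != NULL);
	if (*pos < 0 || (size_t)*pos >= NR_SKETCH_ENTRIES)
		return NULL;
	return sketch_entries[*pos];
}

/*
 * next(): always advance *pos first, even when the result is NULL,
 * so that a later start() call resumes after the last entry shown.
 */
const char *sketch_next(long long *pos)
{
	assert(pos != NULL);
	++*pos;
	return sketch_start(pos);
}
```

If sketch_next() returned the following entry without advancing
*pos, a restart from the saved position would visit an entry that
had already been shown.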

Lustre-change: https://review.whamcloud.com/35765
Lustre-commit: 02336a9a5d096dc9a603ed0e77e0c7cf7b41ffb3

Change-Id: If4ac748f455750d82712299b7915eb541a3ddc7e
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37035
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12503 llite: file write pos mismatch 34/37034/2
Bobi Jam [Wed, 27 Nov 2019 08:48:49 +0000 (16:48 +0800)]
LU-12503 llite: file write pos mismatch

In vvp_io_write_start(), the data may be written successfully but,
for some reason (e.g. out of quota), not committed or only partially
committed. In that case the file's write position (kiocb->ki_pos)
is wrongly pushed forward, and the next iteration of the write
loop fails the assertion

ASSERTION( io->u.ci_rw.rw_iocb.ki_pos == range->cir_pos )

This patch corrects ki_pos if this scenario happens.

Lustre-change: https://review.whamcloud.com/36021
Lustre-commit: 1d2aa1513dc4e65813ad0bea138966a55244dbde

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib85b1a777da24cc935e5976beab2390052b4cec3
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37034
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12741 ptlrpc: do lu_env_refill for new request 36/37036/2
Mikhail Pershin [Fri, 8 Nov 2019 06:26:06 +0000 (09:26 +0300)]
LU-12741 ptlrpc: do lu_env_refill for new request

Perform lu_env_refill() prior to handling any new request.
That was done already in tgt_request_handle() and is moved
now to ptlrpc_main() to work for any handler as well,
e.g. ldlm_cancel_handler()

Lustre-change: https://review.whamcloud.com/36714
Lustre-commit: 3f304b75d24aea0075415affa0c0bef004ef012c

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic5d8bfbd845f7e131849078c016f7e13b91d072f
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37036
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12894 sec: fix checksum for skpi 28/37028/2
Sebastien Buisson [Tue, 29 Oct 2019 09:32:22 +0000 (18:32 +0900)]
LU-12894 sec: fix checksum for skpi

Compute the checksum on the message before actually comparing
it to the HMAC value.

Add test to exercise all SSK flavors.
Make sure zconf_mount does include skpath mount option if SSK or
Kerberos is in use.

Lustre-change: https://review.whamcloud.com/36604
Lustre-commit: dcdf060342e7d69b64171840cf9475bf65d036ea

Fixes: a21c13d4df ("LU-8602 gss: Properly port gss to newer crypto api.")
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity-sec
Test-Parameters: envdefinitions=SHARED_KEY=true,SK_FLAVOR=skn testlist=sanity,recovery-small
Test-Parameters: envdefinitions=SHARED_KEY=true,SK_FLAVOR=ska testlist=sanity,recovery-small
Test-Parameters: envdefinitions=SHARED_KEY=true,SK_FLAVOR=ski testlist=sanity,recovery-small
Test-Parameters: envdefinitions=SHARED_KEY=true,SK_FLAVOR=skpi testlist=sanity,recovery-small
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7bcc3618c1824a0f0ca73219c7ac0ccc8405b946
Reviewed-on: https://review.whamcloud.com/37028
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-11981 lnet: clean up error message 01/37001/3
Amir Shehata [Thu, 12 Dec 2019 18:11:34 +0000 (10:11 -0800)]
LU-11981 lnet: clean up error message

There are instances when the message can be canceled. In this
case we do not want that to impact the interface health or output
an error message for it, as it could be noisy. Therefore, reduce
the message which logs this case from error to debug.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I586dbfcdcfa38994db99dc5983240b38c9ee2770
Reviewed-on: https://review.whamcloud.com/37001
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11997 ptlrpc: Properly swab ll_fiemap_info_key 81/36481/5
Oleg Drokin [Fri, 27 Sep 2019 14:23:18 +0000 (10:23 -0400)]
LU-11997 ptlrpc: Properly swab ll_fiemap_info_key

It was using lustre_swab_fiemap which is incorrect since the
structures don't match.

Added lustre_swab_fiemap_info_key that swabs embedded
obdo and ll_fiemap_info_key structures.
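
The reason a dedicated swab helper is needed can be seen in a small
sketch: each field has to be swapped according to its own width and
position, so a helper written for one layout cannot be reused for
another. The structures below are reduced, hypothetical stand-ins,
not the real obdo or ll_fiemap_info_key layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t sketch_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Reverse the byte order of a 64-bit value. */
uint64_t sketch_swab64(uint64_t v)
{
	return ((uint64_t)sketch_swab32((uint32_t)v) << 32) |
	       sketch_swab32((uint32_t)(v >> 32));
}

/* Hypothetical embedded structure (stand-in for the obdo). */
struct oa_sketch {
	uint64_t id;
	uint32_t flags;
	uint32_t pad;
};

/* Hypothetical key: an embedded structure followed by a range. */
struct key_sketch {
	struct oa_sketch oa;
	uint64_t start;
	uint64_t length;
};

/* Swab every field of the key in place, including the embedded one. */
void sketch_swab_key(struct key_sketch *k)
{
	assert(k != NULL);
	k->oa.id = sketch_swab64(k->oa.id);
	k->oa.flags = sketch_swab32(k->oa.flags);
	k->start = sketch_swab64(k->start);
	k->length = sketch_swab64(k->length);
}
```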

Lustre-change: https://review.whamcloud.com/36308
Lustre-commit: 2b905746ee3b5d9dbafcdb1af5930aea18120a7b

Change-Id: Ie701163bd4c2072a0461b2d9485bc184c6548f8f
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36481
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12935 obdclass: fix import connect flag printing 81/36881/2
Andreas Dilger [Tue, 5 Nov 2019 03:25:22 +0000 (20:25 -0700)]
LU-12935 obdclass: fix import connect flag printing

The obd_connect_names[] array holds strings for the OBD_CONNECT_*
and OBD_CONNECT2_* flag names.  It is positional, so every flag
bit needs a corresponding field in the array.
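
A small sketch of such a positional table is shown below. The entry
names are invented for illustration and are not real connect flag
names; the point is that an unused bit still needs a placeholder so
that later names stay at the right index.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical positional table: index N names flag bit N. */
const char *const connect_names_sketch[] = {
	"feature_a",	/* bit 0 */
	"unused_1",	/* bit 1: placeholder keeps later entries aligned */
	"feature_c",	/* bit 2 */
};

/* Map a bit number to its name, "unknown" past the end of the table. */
const char *connect_bit_name(unsigned int bit)
{
	assert(bit < 64);
	if (bit >= ARRAY_SIZE(connect_names_sketch))
		return "unknown";
	return connect_names_sketch[bit];
}
```

Dropping the "unused_1" placeholder would make bit 1 print as
"feature_c" and leave bit 2 without a name.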

The "async_discard" feature was backported to b2_12, but the two
earlier features "pcc" and (now removed) "plain_layout" were not
backported.  Add in strings for those features, and fill in some
earlier "unknown" flag names as well.

Fixes: e5810126b3fb ("LU-11359 mdt: fix mdt_dom_discard_data() timeouts")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I883d236262805361be3f48c533d781878f9494fa
Reviewed-on: https://review.whamcloud.com/36881
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12946 kernel: fix to handle BLK_MQ_RQ_QUEUE_DEV_BUSY event 68/36868/3
Wang Shilong [Thu, 7 Nov 2019 02:18:15 +0000 (10:18 +0800)]
LU-12946 kernel: fix to handle BLK_MQ_RQ_QUEUE_DEV_BUSY event

It looks like what's happening is that when dm_dispatch_clone_request
dispatches the "clone" I/O request from the multipath device to the
underlying (real) device, the SCSI driver can (often under load)
return BLK_MQ_RQ_QUEUE_DEV_BUSY. dm_dispatch_clone_request doesn't
treat that as an exception the way it does BLK_MQ_RQ_QUEUE_BUSY, so
it calls dm_complete_request, which propagates the
BLK_MQ_RQ_QUEUE_DEV_BUSY error code up the stack. As a result,
multipath_end_io calls fail_path and fails the path because an
error value is set.

Lustre-change: https://review.whamcloud.com/36699
Lustre-commit: 5c8b1e87a97bbe7b05f0b8325e98c16a0de1ff4c

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: If17ea5b3ab33a89a17d49e5dfb2e9f9f19371564
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9341 utils: fix lfs find for composite files 35/36935/3
Andreas Dilger [Thu, 25 Jul 2019 05:29:26 +0000 (23:29 -0600)]
LU-9341 utils: fix lfs find for composite files

Running "lfs getstripe -c" on a composite file returns the stripe
count of the last initialized component, but "lfs find -c N" does
not find this file because it was adding the total stripe_count
of all components.  "lfs find" should also check the stripe_count
of the last initialized component, as described in the man page.
Also use the last component stripe_size instead of any component.
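
The difference between the two behaviours can be sketched as
follows. The structure and both helpers are hypothetical and much
simpler than the real layout handling in lfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced view of one component of a composite file. */
struct comp_sketch {
	bool initialized;
	unsigned int stripe_count;
};

/* Intended behaviour: stripe count of the last initialized component. */
unsigned int last_init_stripe_count(const struct comp_sketch *comps, size_t n)
{
	unsigned int count = 0;
	size_t i;

	assert(comps != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (comps[i].initialized)
			count = comps[i].stripe_count;
	return count;
}

/* Old behaviour: total stripe count over all components. */
unsigned int total_stripe_count(const struct comp_sketch *comps, size_t n)
{
	unsigned int count = 0;
	size_t i;

	assert(comps != NULL || n == 0);
	for (i = 0; i < n; i++)
		count += comps[i].stripe_count;
	return count;
}
```

For a file with components of 1, 4 and 8 stripes where only the
first two are initialized, the first helper yields 4 and the second
yields 13, so a search for a stripe count of 4 matches only with the
first.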

Add a test case for the correct usage.

Lustre-change: https://review.whamcloud.com/35611
Lustre-commit: 72479a52be5f77f601d8234d957f5d6176edf6e8

Fixes: 5a76aee24476 ("LU-8998 lfs: user space tools for PFL")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1f0097aa002b29febcbf183cab02519b202540e5
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11673 tests: add space before ']' in test-framework 26/36926/2
James Nunez [Thu, 6 Jun 2019 13:48:13 +0000 (07:48 -0600)]
LU-11673 tests: add space before ']' in test-framework

The test command '[' expects spaces before all arguments
including the closing ']'.

Add a space before the closing ']' in the function
print_summary() in test-framework.sh.

Lustre-change: https://review.whamcloud.com/35079
Lustre-commit: 54e011a729fd656ae8568192763afe12425cd05e

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: If2365cb5f2b9c003949c6224997644c61341fe35
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36926
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12842 utils: llog_print with snapshot name 33/36933/2
Andreas Dilger [Wed, 9 Oct 2019 17:26:24 +0000 (11:26 -0600)]
LU-12842 utils: llog_print with snapshot name

The lsnapshot utility creates filesystems named with generated
hexadecimal strings.  In some cases the filesystem name may start
with a number instead of a character, which causes "lctl llog_print"
(via llog_ioctl()) to consider the filesystem name invalid.

Allow filesystem names in llog_ioctl() to start with a digit.
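
A rough sketch of the relaxed check is given below. The exact
character set accepted by llog_ioctl() is not stated here, so the
set used in the sketch (letters, digits, '_' and '-') and the
function name are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical name check: the first character may be a letter or a
 * digit (a letter-only rule would be isalpha() here); the remaining
 * characters may also be '_' or '-'.
 */
bool fsname_valid_sketch(const char *name)
{
	size_t i;

	assert(name != NULL);
	if (!isalnum((unsigned char)name[0]))
		return false;
	for (i = 1; name[i] != '\0'; i++)
		if (!isalnum((unsigned char)name[i]) &&
		    name[i] != '_' && name[i] != '-')
			return false;
	return true;
}
```

A generated hexadecimal name such as "0a1b2c3d" passes this check,
whereas it would be rejected by a rule that requires a leading
letter.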

Lustre-change: https://review.whamcloud.com/36414
Lustre-commit: 6e73b5705c6f1cda391b3a9cec8825eb9f914d38

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib2054d5afbeaa3f661148fff834c29f83f5d98ad
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36933
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12595 lnet: Return EHOSTUNREACH for unreachable gateway 71/36871/2
Chris Horn [Fri, 26 Jul 2019 21:08:00 +0000 (16:08 -0500)]
LU-12595 lnet: Return EHOSTUNREACH for unreachable gateway

Commit 43b35351e9ca258773e89c2d68047e939fb822fb contains a flaw in
that it shouldn't be a fatal error to encounter an unreachable
gateway when parsing routes. Parsing should continue in case there
are any valid, reachable routes that are being added. Returning EINVAL
here will cause a failure to load the LNet module. lnet_parse_route()
explicitly allows for lnet_add_route() to return EHOSTUNREACH for
just this purpose.
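
The intended control flow can be sketched as below. The structure
and both functions are hypothetical stand-ins and do not reproduce
lnet_parse_route() or lnet_add_route(); only the handling of the two
error codes is the point.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical route description. */
struct route_sketch {
	bool well_formed;
	bool gw_reachable;
};

/* Stand-in for adding one route: 0 on success or a negative errno. */
int add_route_sketch(const struct route_sketch *r)
{
	if (!r->well_formed)
		return -EINVAL;
	if (!r->gw_reachable)
		return -EHOSTUNREACH;
	return 0;
}

/*
 * Add every route. An unreachable gateway is skipped, any other
 * error is fatal. *added counts the routes actually installed.
 */
int parse_routes_sketch(const struct route_sketch *routes, size_t n,
			size_t *added)
{
	size_t i;

	assert(added != NULL);
	assert(routes != NULL || n == 0);
	*added = 0;
	for (i = 0; i < n; i++) {
		int rc = add_route_sketch(&routes[i]);

		if (rc == -EHOSTUNREACH)
			continue;
		if (rc != 0)
			return rc;
		(*added)++;
	}
	return 0;
}
```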

Lustre-change: https://review.whamcloud.com/35630
Lustre-commit: 7c12c24c8a10be2d0ad005d6d99d97cee6bcde18

Test-parameters: trivial
Fixes: 43b35351e9 ("LU-12411 lnet: Do not allow gateways on remote nets")
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ia0f28779a3505eff02dafdc23a6e01c1d0cbc84b
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36871
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12859 llite: clear flock when using localflock 34/36934/2
Andreas Dilger [Mon, 14 Oct 2019 22:29:30 +0000 (06:29 +0800)]
LU-12859 llite: clear flock when using localflock

When mounting a client with "-o localflock" or equivalent option in
/etc/fstab, it does not clear out the "flock" mount option flag from
the superblock.  This results in "flock" still being the option used
and it displays both options in the /proc/mounts output:

  10.0.0.1@o2ib:/lfs on /mnt/lfs type lustre (rw,flock,localflock)

Mount a client with both "flock,localflock" as mount options and
verify that the "flock" option is cleared by "localflock", and
vice versa.  Verify that "noflock" clears both options.
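
The expected option handling can be sketched as follows. The flag
names and values are hypothetical and this is not the llite mount
option parser; it only shows each option clearing the other one.

```c
#include <assert.h>

/* Hypothetical superblock flag bits. */
#define SKETCH_FLOCK      0x1u
#define SKETCH_LOCALFLOCK 0x2u

enum flock_opt { OPT_FLOCK, OPT_LOCALFLOCK, OPT_NOFLOCK };

/* Apply one mount option: clear both bits, then set at most one. */
unsigned int apply_flock_opt(unsigned int flags, enum flock_opt opt)
{
	const unsigned int both = SKETCH_FLOCK | SKETCH_LOCALFLOCK;

	flags &= ~both;
	switch (opt) {
	case OPT_FLOCK:
		flags |= SKETCH_FLOCK;
		break;
	case OPT_LOCALFLOCK:
		flags |= SKETCH_LOCALFLOCK;
		break;
	case OPT_NOFLOCK:
		break;
	}
	/* The two options are mutually exclusive. */
	assert((flags & both) != both);
	return flags;
}
```

Applying "flock" and then "localflock" leaves only the localflock
bit set, and "noflock" leaves neither.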

Remove the "remount_client()" helper in conf-sanity.sh, since this
shadows a helper function of the same name in test-framework.sh and
is confusing.  Instead, use "mount_client()" now that it can accept
mount options, and just pass "remount" explicitly in a few places.

Lustre-change: https://review.whamcloud.com/36452
Lustre-commit: 22ee4a1f64eca526ef34a3fd89dc4e95bb307732

Fixes: 3613af3e15cb ("LU-10885 llite: enable flock mount option by default")
Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie31b0c4f6674c99d3ed5b73caa39cfc23d3ebbe5
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36934
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12760 tests: stack_trap defaults to sigspec=EXIT 38/36938/2
Andreas Dilger [Sat, 14 Sep 2019 07:46:44 +0000 (01:46 -0600)]
LU-12760 tests: stack_trap defaults to sigspec=EXIT

If the "sigspec" argument is not specified for stack_trap(), default
to "EXIT" as the signal, since this is what we use for all callers
of stack_trap() today anyway.

Lustre-change: https://review.whamcloud.com/36186
Lustre-commit: 5a911faae25784a91fc085debeb6dbe8512d80b6

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2c8d986cdf8743e1d956cd7941a47bd4cd772592
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36938
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12411 lnet: Do not allow gateways on remote nets 70/36870/2
Chris Horn [Tue, 11 Jun 2019 19:59:31 +0000 (14:59 -0500)]
LU-12411 lnet: Do not allow gateways on remote nets

A gateway needs to be reachable over some local interface.

Lustre-change: https://review.whamcloud.com/35198
Lustre-commit: 43b35351e9ca258773e89c2d68047e939fb822fb

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ib66d4f8fd48d8863097280c480648ab8e29d2767
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36870
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12691 ldlm: obd_max_recoverable_clients is not atomic 78/36878/2
Tatsushi Takamura [Mon, 26 Aug 2019 00:12:37 +0000 (09:12 +0900)]
LU-12691 ldlm: obd_max_recoverable_clients is not atomic

Originally, obd_max_recoverable_clients was never incremented
concurrently. Since LU-3540, however, it can be incremented
by multiple processes at the same time.

The type of obd_max_recoverable_clients should be
atomic_t and be handled by atomic operations.
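
As an illustration of why a counter shared by several updaters needs
atomic handling, here is a small Python model. It is only a sketch:
the class RecoverableClients and the helper connect_clients() are
hypothetical names, and a threading.Lock stands in for the kernel's
atomic_t operations.

```python
import threading


class RecoverableClients:
    """Counter whose increments are serialized, standing in for atomic_t."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def inc(self):
        # The read-modify-write happens under the lock, so no update is lost.
        with self._lock:
            self._value += 1
            return self._value

    def read(self):
        with self._lock:
            return self._value


def connect_clients(counter, workers=8, per_worker=1000):
    """Increment the counter from several threads and return the final value."""
    def run():
        for _ in range(per_worker):
            counter.inc()

    threads = [threading.Thread(target=run) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read()
```

With serialized increments the final value always equals
workers * per_worker, regardless of how the threads interleave.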

Lustre-change: https://review.whamcloud.com/35914
Lustre-commit: 01261e7b563adc97899d962f0ba2d1b430894bf7

Signed-off-by: Tatsushi Takamura <takamr.tatsushi@jp.fujitsu.com>
Change-Id: I9a67bbbfacab2e05858243f649e3a4e0d4b5d7f7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36878
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12025 osp: allow OS_STATE_* flags from OSTs 72/36872/3
Andreas Dilger [Thu, 28 Feb 2019 00:37:08 +0000 (17:37 -0700)]
LU-12025 osp: allow OS_STATE_* flags from OSTs

Allow OS_STATE_* flags to be sent from the OST, so that the
OS_STATE_NOPRECREATE can be used to prevent a newly-added OST
from being used until it is ready.  Add the "no_precreate"
parameter on the OFD that can be set from userspace.

Close a race in the cached opd_statfs.os_state handling in
osp_pre_update_statfs().  It was being overwritten by the
new statfs data from the OST, but was globally visible for a
short time to the precreate threads before the OS_STATE_*
flags were set on the cached statfs data again.

Similarly, there was a race with updating the opd_pre_status
if the OST was out of space, where it would be cleared after
a successful statfs, and wouldn't be set to -ENOSPC until a
short time later.

Split osp_pre_update_status() into osp_pre_update_msfs() that
only copies the statfs data into the cache after all of the
flags are set.  Don't clear flags from the cache, they will
only be cleared when new statfs data is sent.

Add a test that the 'N'OPRECREATE flag appears in "lfs df".

Lustre-change: https://review.whamcloud.com/35029
Lustre-commit: 9b0ebf78f7919a144673edadc4a95bad84fae2d3

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9c1c7a097f3de8edfdeef2b437f40936e73ebbe5
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36872
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11575 build: install systemd stuff only for debian with systemd enabled 05/36405/5
Gu Zheng [Wed, 28 Nov 2018 15:23:06 +0000 (10:23 -0500)]
LU-11575 build: install systemd stuff only for debian with systemd enabled

Add a precheck for systemd, to avoid trying to package systemd
files into the lustre-client/server-utils deb when building on
Debian-series distributions without systemd support.

Test-Parameters: clientdistro=ubuntu1804 trivial

Lustre-change: https://review.whamcloud.com/33492
Lustre-commit: 02b097440db37fe5e8054f983f8382dfa85f8e25

Change-Id: If58b64acc035e621594ab420a8b900b18a34a211
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36405
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12236 tests: add tests for LNET network namespace 70/36770/3
Aurelien Degremont [Thu, 1 Aug 2019 12:48:05 +0000 (12:48 +0000)]
LU-12236 tests: add tests for LNET network namespace

This patch adds LNET tests for the network namespace feature.

Lustre-change: https://review.whamcloud.com/35666
Lustre-commit: b20704d5f63a07c54cfbea331df90e6ca765e79b

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: I2320e5da1beef30be5dcca9529fa838fc9304876
Reviewed-on: https://review.whamcloud.com/36770
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12236 lnet: support non-default network namespace 69/36769/3
Aurelien Degremont [Thu, 25 Apr 2019 13:15:56 +0000 (13:15 +0000)]
LU-12236 lnet: support non-default network namespace

Replace hard coded references to default root network namespace
(&init_net) in LNET code (LNET, socklnd and o2iblnd).

When a network interface is created, Lustre records the current
network namespace. This patch improves the LNET code to use
this reference namespace most of the time instead of the root
network namespace. When using lctl, lnetctl or insmod, we
use the current process network namespace.
When starting the listening acceptor, we use the namespace of the
process that triggers this start.

An additional patch is needed for RPCSEC GSS support.

Lustre-change: https://review.whamcloud.com/34768
Lustre-commit: 93b08edfb1c6ae8aec7e1009d3aca450416358d7

Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: I56877ddcd7a27883662c86f245b196153211e7b2
Reviewed-on: https://review.whamcloud.com/36769
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11670 tests: do not fail the first half in sanityn test 103 07/36407/3
Jian Yu [Thu, 26 Sep 2019 17:51:22 +0000 (10:51 -0700)]
LU-11670 tests: do not fail the first half in sanityn test 103

There are two halves in sanityn test 103. The first half is to
reproduce the problem of incorrect size when using lockahead
and the second half is to verify that the fix works. Sometimes,
the problem cannot be reproduced in the first half test, so we
should not fail the whole test.

Test-Parameters: trivial fstype=zfs \
mdscount=2 mdtcount=4 testlist=sanityn,sanityn,sanityn

Test-Parameters: trivial fstype=ldiskfs \
mdscount=2 mdtcount=4 testlist=sanityn,sanityn,sanityn

Lustre-change: https://review.whamcloud.com/36303
Lustre-commit: a88d0aa76c62e3074a05e869c9f0ba7ac128300f

Change-Id: Ib6c82bfe512ac072104abfcb406e2ef1bd6a6a02
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36407
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11670 osc: glimpse - search for active lock 06/36406/4
Patrick Farrell [Mon, 9 Sep 2019 15:56:07 +0000 (11:56 -0400)]
LU-11670 osc: glimpse - search for active lock

When there are lock-ahead write locks on a file, the server
sends one glimpse AST RPC to each client having such (it
may have many) locks. This callback is sent to the lock
having the highest offset.

Client's glimpse callback goes up to the clio layers and
gets the global (not lock-specific) view of size.  The clio
layers are connected to the extent lock through the
l_ast_data (which points to the OSC object).

Speculative locks (AGL, lockahead) do not have l_ast_data
initialised until an IO happens under the lock. Thus, some
speculative locks may not have l_ast_data initialized.

It is possible for the client to do a write using one lock
(changing file size), but for the glimpse AST to be sent to
another lock without l_ast_data initialized.  Currently, a
lock with no l_ast_data set returns ELDLM_NO_LOCK_DATA to
the server.  In this case, this means we do not return the
updated size.

The solution is to search the granted lock tree for any lock
with initialized l_ast_data (it points to the OSC object
which is the same for all the extent locks) and to reach the
clio layers for the size through this lock instead.
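
The lookup described above can be sketched as follows. This is a
simplified Python model under stated assumptions, not the osc code:
ExtentLock, find_lock_with_ast_data() and glimpse_size() are
hypothetical names, the granted lock tree is flattened to a list, and
the ast_data field stands in for l_ast_data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtentLock:
    start: int
    end: int
    ast_data: Optional[object] = None  # stands in for l_ast_data


def find_lock_with_ast_data(granted):
    """Return any granted lock whose ast_data is initialized, else None."""
    for lock in granted:
        if lock.ast_data is not None:
            return lock
    return None


def glimpse_size(target, granted, size_of):
    """Answer a glimpse sent to `target`, borrowing ast_data from another
    granted lock on the same object when the target has none."""
    lock = target if target.ast_data is not None else find_lock_with_ast_data(granted)
    if lock is None:
        # No lock on this object has ast_data; the caller would report
        # "no lock data" in this case.
        return None
    return size_of(lock.ast_data)
```

Because every extent lock on the file points at the same object, any
lock with initialized data gives the same view of the size.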

Lustre-change: https://review.whamcloud.com/33660
Lustre-commit: b3461d11dcb04670cc2e1bfbb99306cfd3f645ef

cray-bug-id: LUS-6747
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I6c60f4133154a3d6652315f155af24bbc5752dd2
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36406
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12103 ldiskfs: don't search large block range if disk full 81/36681/3
Artem Blagodarenko [Thu, 6 Jun 2019 13:50:11 +0000 (16:50 +0300)]
LU-12103 ldiskfs: don't search large block range if disk full

Block allocator tries to find:
1) group with the same range as required
2) group with the same average range as required
3) group with required amount of space
4) any group

On a nearly full disk, step 1 fails with high
probability, but takes a lot of time.

Skip the 1st step if free disk space < 25%
Skip the 2nd step if free disk space < 15%
Skip the 3rd step if free disk space < 5%
Also check if group has any free space on step 4.
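
The threshold rule can be written down as a small Python sketch. It
only models which of the four steps listed above would still be
attempted; the function name allocator_steps() and its parameter
names are hypothetical, and the percentage is assumed to be the free
space remaining on the disk.

```python
def allocator_steps(free_percent, skip1=25, skip2=15, skip3=5):
    """Return the allocator steps (1-4) that are still tried for a disk
    with `free_percent` percent of its space free."""
    steps = []
    if free_percent >= skip1:
        steps.append(1)  # group with the same range as required
    if free_percent >= skip2:
        steps.append(2)  # group with the same average range as required
    if free_percent >= skip3:
        steps.append(3)  # group with the required amount of space
    steps.append(4)      # any group is always tried last
    return steps
```

For example, allocator_steps(20) gives [2, 3, 4]: the expensive first
step is skipped while the remaining ones are still tried.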

These three thresholds can be adjusted through the added interface.

Variables are added that count unsuccessful group processing loops.
These can show allocator effectiveness in different circumstances.

These statistics are output through the mb_alloc file. This file is
useful for tracking allocator activity.

Lustre-change: https://review.whamcloud.com/35180
Lustre-commit: 95f8ae5677491508ae7182b4f61ead3d413434ae

Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Change-Id: I18c7147e32951c49e12a2444803aa2995bb4ae2d
Cray-bug-id: LUS-6746
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36681
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12568 lnet: Defer rspt cleanup when MD queued for unlink 35/36635/3
Amir Shehata [Thu, 31 Oct 2019 22:03:46 +0000 (15:03 -0700)]
LU-12568 lnet: Defer rspt cleanup when MD queued for unlink

When an MD is queued for unlink its lnet_libhandle is invalidated so
that future lookups of the MD fail. As a result, the monitor thread
cannot detach the response tracker from such an MD, and instead must
wait for the remaining operations on the MD to complete before it can
safely free the response tracker and remove it from the list. Freeing
the memory while there are pending operations on the MD can result
in a use after free situation when the final operation on the MD
completes and we attempt to remove the response tracker from the MD
via the lnet_msg_detach_md()->lnet_detach_rsp_tracker() call chain.

Here we introduce zombie lists for such response trackers. This will
allow us to also handle the case where there are response trackers
on the monitor queue during LNet shutdown. In this instance the
zombie response trackers will be freed when either all the operations
on the MD have completed (this free'ing is performed by
lnet_detach_rsp_tracker()) or after the LND Nets have shutdown since
we are ensured there will not be any more operations on the
associated MDs (this free'ing is performed by
lnet_clean_zombie_rstqs()).

Three other small changes are included in this patch:
 - When deleting the response tracker from the monitor's list we
   should use list_del() rather than list_del_init() since we'll
   be freeing the response tracker after removing it from the list.
 - Perform a single ktime_get() call for each local queue.
 - Move the check of whether the local queue is empty outside of
   the net lock.

Signed-off-by: Chris Horn <hornc@cray.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If7e3fa53ae8585fb3e0e4aed29f0e1d97e85d356
Reviewed-on: https://review.whamcloud.com/36635
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12441 lnet: Detach rspt when md_threshold is infinite 34/36634/3
Chris Horn [Thu, 11 Jul 2019 20:08:30 +0000 (15:08 -0500)]
LU-12441 lnet: Detach rspt when md_threshold is infinite

MDs for pings use the infinite threshold on MD operations.
As such they aren't normally unlinkable as determined by
lnet_md_unlinkable(). We can cover this case by checking whether the
refcount is zero and threshold is LNET_MD_THRESH_INF.

Signed-off-by: Chris Horn <hornc@cray.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ib51f84d85005dd2d13dadca059a1d6c42ff3bf59
Reviewed-on: https://review.whamcloud.com/36634
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12530 utils: narrow l_tunedisk udev rule 76/36776/3
Olaf Faaland [Mon, 28 Oct 2019 20:34:53 +0000 (13:34 -0700)]
LU-12530 utils: narrow l_tunedisk udev rule

Narrow the udev rule so that it runs l_tunedisk only for ext4 block
devices formatted for Lustre.

Devices which are members of ZFS pools do not need such tunings to
be provided by lustre - they are handled by ZFS.

There are currently no other OSD types in the tree.  Sites/Vendors which
support other OSDs will need to adjust the rule appropriately.

Lustre-change: https://review.whamcloud.com/36599
Lustre-commit: 7b2cb54858daa60d560fd6374c4ecba552a10d27

Change-Id: Iba8b20fc705da0259ab71ee33b92193cae7e8eae
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36776
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12703 utils: reset rootpath in llapi_search_rootpath() 82/36482/3
Alex Zhuravlev [Mon, 30 Sep 2019 20:50:49 +0000 (23:50 +0300)]
LU-12703 utils: reset rootpath in llapi_search_rootpath()

as get_root_path() can use it as a source and fail if
passed pathname contains garbage (on stack);

Lustre-change: https://review.whamcloud.com/36335
Lustre-commit: 3e2e0025d1e929763f9d4de48746c3433d3684d5

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9f628353c872afc82a582b0a6ca960cd0e8cffcb
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36482
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-1365 utils: allow set block size for ldiskfs backend 78/36778/2
Artem Blagodarenko [Wed, 28 Nov 2018 20:37:53 +0000 (23:37 +0300)]
LU-1365 utils: allow set block size for ldiskfs backend

Add a “-b” option to mkfs.lustre that allows setting the backend block size.

Lustre-change: https://review.whamcloud.com/33757
Lustre-commit: 5f674667bfd1ab9a0e47d9f03f3e7eab37eb8e17

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Change-Id: I83fc76f64ce2a0b4bf500841b695d64d3dea78de
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36778
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12593 osd: zeroing a freshly allocated block buffer 09/36709/2
Alexander Boyko [Fri, 26 Jul 2019 14:13:21 +0000 (10:13 -0400)]
LU-12593 osd: zeroing a freshly allocated block buffer

Ldiskfs zeroes a new buffer only when it is not uptodate.
In rare cases we can get a new buffer head with the uptodate flag set.
This may cause file corruption for writes at a non-zero offset,
especially for internal Lustre files like update_log, CATALOGS,
and lov_objid.

od_fld_lookup()) lustre-MDT0001-mdtlov: invalid FID [0x0:0x50:0x0]

The patch adds zeroing under i_mutex for unmapped blocks.

Performance results are given below, since the patch adds a mutex to
a creation path (lov_objid file).
40 tasks, 2000000 files
SUMMARY: (of 5 iterations)
Operation       Max           Min           Mean    Std Dev
---------       ---           ---           ----    -------
without fix
File creation: 39990.601   19020.238     27443.823  6909.605
With fix
File creation: 37958.809   21708.187     27065.855  5900.961
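
To put the two rows in proportion, the relative change can be computed
directly from the figures above. The helper relative_change() is a
hypothetical name used only for this arithmetic.

```python
def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Mean and standard deviation of file creation, without and with the fix.
mean_change = relative_change(27443.823, 27065.855)
stddev_change = relative_change(6909.605, 5900.961)
```

Here mean_change is about -1.4 percent, which is small compared with
the spread between the minimum and maximum of either run.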

Lustre-change: https://review.whamcloud.com/35629
Lustre-commit: f832a7dc33c69fae9af199f0317e6385deeaeccf

Cray-bug-id: LUS-6132
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: Ica8fbe29b5a7253d553b41a41ffe5d8d8b4b2e55
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12893 lnet: fix peer_ni selection 43/36643/2
Amir Shehata [Tue, 22 Oct 2019 18:27:24 +0000 (11:27 -0700)]
LU-12893 lnet: fix peer_ni selection

When selecting a peer-ni we must use the same peer NID
through all the messages which belong to the same RPC.
This is necessary in order to ensure we do the RDMA over
the optimal interface.

Lustre-change: https://review.whamcloud.com/36552
Lustre-commit: 94ee26738884e3f5b241698bc2e7a8da9702d264

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0391537da32bc6ac7a8a3d92e207bf172d111981
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36643
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-4398 llite: do not cache write open lock for exec file 80/36680/3
Jinshan Xiong [Tue, 1 May 2018 18:35:53 +0000 (11:35 -0700)]
LU-4398 llite: do not cache write open lock for exec file

This avoids the problem where the MDT needs an extra lock
revocation before the file can be executed.

Lustre-change: https://review.whamcloud.com/32265
Lustre-commit: 6dd9d57bc006a37731d34409ce43de13c192e0cc

Signed-off-by: Jinshan Xiong <jinshan.xiong@uber.com>
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Change-Id: Ibb42a9a8cb56a9bf48a6e972b72d3d71ed7fbaf5
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36680
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12533 llite: Improve readahead RPC issuance 42/36342/2
Patrick Farrell [Thu, 8 Aug 2019 17:13:29 +0000 (13:13 -0400)]
LU-12533 llite: Improve readahead RPC issuance

lov_io_submit receives a range of pages, then adds pages in
to a batch until it hits a page which is not in the stripe
associated with this lov object.  This means that if a
readahead page range hits the same stripe more than once,
we will issue multiple I/Os, even if the pages would fit in
one RPC.

This is unnecessary - Just submit all these pages at once.
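
The effect can be seen in a small Python model. It is a sketch under
assumptions, not the lov_io_submit() code: pages are assumed to map to
stripes round-robin in fixed-size chunks, and the names stripe_of(),
batches_by_run() and batches_by_stripe() are hypothetical.

```python
from itertools import groupby


def stripe_of(page, pages_per_stripe, stripe_count):
    """Stripe index holding `page` under a round-robin layout."""
    return (page // pages_per_stripe) % stripe_count


def batches_by_run(pages, pages_per_stripe, stripe_count):
    """Old behaviour: start a new batch whenever the stripe changes."""
    def key(p):
        return stripe_of(p, pages_per_stripe, stripe_count)
    return [(stripe, list(run)) for stripe, run in groupby(pages, key=key)]


def batches_by_stripe(pages, pages_per_stripe, stripe_count):
    """New behaviour: collect every page of a stripe into one batch."""
    batches = {}
    for p in pages:
        stripe = stripe_of(p, pages_per_stripe, stripe_count)
        batches.setdefault(stripe, []).append(p)
    return batches
```

For twelve consecutive pages over two stripes of two pages each, the
first function produces six batches and the second produces two.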

mpirun -n 2 $IOR -s 2000 -t 47K -b 47K -k -r -E -o $FILE

Without patch:
osc.lustre-OST0001-osc-ffff8fe82c952000.rpc_stats=

                        read                    write
pages per rpc         rpcs   % cum % |       rpcs   % cum %
1:                     118  56  56   |          0   0   0
2:                       0   0  56   |          0   0   0
4:                       0   0  56   |          0   0   0
8:                       0   0  56   |          0   0   0
16:                      5   2  58   |          0   0   0
32:                      0   0  58   |          0   0   0
64:                      0   0  58   |          0   0   0
128:                    21  10  68   |          0   0   0
256:                    25  11  80   |          0   0   0
512:                    10   4  85   |          0   0   0
1024:                   31  14 100   |          0   0   0

osc.lustre-OST0002-osc-ffff8fe82c952000.rpc_stats=
                        read                    write
pages per rpc         rpcs   % cum % |       rpcs   % cum %
1:                       5   6   6   |          0   0   0
2:                       0   0   6   |          0   0   0
4:                       0   0   6   |          0   0   0
8:                       0   0   6   |          0   0   0
16:                      0   0   6   |          0   0   0
32:                      0   0   6   |          0   0   0
64:                      0   0   6   |          0   0   0
128:                    19  23  29   |          0   0   0
256:                    19  23  52   |          0   0   0
512:                     5   6  58   |          0   0   0
1024:                   34  41 100   |          0   0   0

With patch:
osc.lustre-OST0001-osc-ffff8fe7a7227800.rpc_stats=
                        read                    write
pages per rpc         rpcs   % cum % |       rpcs   % cum %
1:                      12  17  17   |          0   0   0
2:                       0   0  17   |          0   0   0
4:                       0   0  17   |          0   0   0
8:                       0   0  17   |          0   0   0
16:                      5   7  24   |          0   0   0
32:                      0   0  24   |          0   0   0
64:                      5   7  31   |          0   0   0
128:                     6   8  40   |          0   0   0
256:                     1   1  42   |          0   0   0
512:                     2   2  44   |          0   0   0
1024:                   38  55 100   |          0   0   0

osc.lustre-OST0002-osc-ffff8fe7a7227800.rpc_stats=
                        read                    write
pages per rpc         rpcs   % cum % |       rpcs   % cum %
1:                       0   0   0   |          0   0   0
2:                       0   0   0   |          0   0   0
4:                       0   0   0   |          0   0   0
8:                       0   0   0   |          0   0   0
16:                      0   0   0   |          0   0   0
32:                      0   0   0   |          0   0   0
64:                      4   7   7   |          0   0   0
128:                     7  13  21   |          0   0   0
256:                     0   0  21   |          0   0   0
512:                     3   5  26   |          0   0   0
1024:                   38  73 100   |          0   0   0

Note the much larger number of small RPCs issued without the patch.

Lustre-change: https://review.whamcloud.com/35458
Lustre-commit: 05b9da4fd124c61fd41d4b560773c0552a1ee5d7

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic10c138628c269afe57fbc57ec8c91ce990717f9
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36342
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12844 lnet: fix strncpy bound error 18/36418/3
Jian Yu [Wed, 9 Oct 2019 21:41:54 +0000 (14:41 -0700)]
LU-12844 lnet: fix strncpy bound error

This patch fixes the following error while using gcc 8:

liblnetconfig.c: In function ‘lustre_lnet_parse_nids’:
liblnetconfig.c:320:3: error: ‘strncpy’ specified bound depends on
the length of the source argument [-Werror=stringop-overflow=]
   strncpy(entry, cur, len - 1);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
liblnetconfig.c:310:10: note: length computed here
    len = strlen(cur) + 1;
          ^~~~~~~~~~~
cc1: all warnings being treated as errors

This patch is back-ported from the following one:
Lustre-commit: cebda7a478f9943f10b9a3388377c61a54957a87
Lustre-change: https://review.whamcloud.com/36417

Change-Id: I2d5840fd58c7b7d27ef1b2aa12f1f187d30abbfd
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36418
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12745 build: Account for optional SPL for ZFS 0.8+ 08/36408/3
Nathaniel Clark [Wed, 11 Sep 2019 15:10:58 +0000 (11:10 -0400)]
LU-12745 build: Account for optional SPL for ZFS 0.8+

With ZFS 0.8.0 and later, SPL is no longer present.
Some zfs packages provide vestigial spl package contents, but zfs-dkms
does not.  This makes testing SPL directories optional depending on the
version of ZFS, and also accounts for the new location of the spl
include directory under the zfs include directory.

Lustre-change: https://review.whamcloud.com/36161
Lustre-commit: a245dde23a9fdbdff7d09a783bcbe3349f68a908

Test-Parameters: trivial
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I8afcff079f25543a3c86df0c404146a859b226aa
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36408
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12925 test: assign right initial value for test_61 60/36660/2
Yang Sheng [Mon, 4 Nov 2019 03:49:41 +0000 (11:49 +0800)]
LU-12925 test: assign right initial value for test_61

This patch is snipped from commit 591a9b4cebc510ff5. test_62
would fail because test_61 leaves a failover state behind in some
cases.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: If46a6d435bcaafb9000abb032ac561c5453776ee
Reviewed-on: https://review.whamcloud.com/36660
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12803 libcfs: bump module version 42/36642/2
James Simmons [Fri, 18 Oct 2019 13:31:00 +0000 (09:31 -0400)]
LU-12803 libcfs: bump module version

The linux client version of libcfs is further ahead in its
cleanup so its module version is higher. While this is the
case it prevents the OpenSFS version of libcfs from
loading, and since OpenSFS is currently ahead of the linux
client we prefer to use it at this time. Let's just increase
the OpenSFS libcfs module version to be slightly ahead of the
linux client.

Test-Parameters: trivial

Lustre-change: https://review.whamcloud.com/36488
Lustre-commit: 4b25d733342bc6f3a424ecfb0db80f1c175a8986

Change-Id: Ie57d93529bf25d908099f7dab06d2960f9923d58
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36642
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12328 flr: avoid reading unhealthy mirror 50/36550/2
Bobi Jam [Fri, 24 May 2019 17:40:25 +0000 (01:40 +0800)]
LU-12328 flr: avoid reading unhealthy mirror

* Fix an error in lov_io_mirror_init() which would wait unnecessarily
  if we're retrying the last mirror of the file.

* In osc_io_iter_init(), check the OSC import status so that the
  read path can quickly switch to another mirror.
  sanity-flr test_33b is added to test this case.

* Once all mirrors have been tried, turn off the quick switch
  so that when all mirrors contain bad OSTs, the read will still try
  its best to get partial data from a component before trying another
  mirror.
  sanity-flr test_33c is added to test this case.

Lustre-change: https://review.whamcloud.com/34952
Lustre-commit: 39da3c06275e04e2a6e7f055cb27ee9dff1ea576

Test-Parameters: envdefinitions=ONLY="33" testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr
Fixes: 5a6ceb664f07 ("LU-7236 ptlrpc: idle connections can disconnect")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I5621a834e58ee1bfccf6c407d2c68357b5c3eb3b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36550
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11221 osd: allow concurrent bulks from pagecache 70/36570/2
Alex Zhuravlev [Wed, 31 Oct 2018 09:54:59 +0000 (12:54 +0300)]
LU-11221 osd: allow concurrent bulks from pagecache

Drop the page lock earlier, once IO is complete, so that the page can
be read by several clients simultaneously.

Lustre-change: https://review.whamcloud.com/33521
Lustre-commit: 0a92632538d8c985e024def73512d18d1570d5ca

Change-Id: Iee28f578e937744f07f7c5be7eae99e59e625e6e
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36570
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11367 som: integrate LSOM with lfs find 53/36553/3
Qian Yingjin [Thu, 1 Nov 2018 08:49:53 +0000 (16:49 +0800)]
LU-11367 som: integrate LSOM with lfs find

The patch integrates LSOM functionality with lfs find so that it
is possible to use LSOM functionality directly on the client. The
MDS fills in the mbo_size and mbo_blocks fields from the LSOM
xattr, if the actual size/blocks are not available, and then sets the
new OBD_MD_FLLSIZE and OBD_MD_FLLBLOCKS flags in the reply so that
the client knows these fields are valid.

The lfs find command adds "-l|--lazy" option to allow the use of
LSOM data from the MDS.

Add a new version of ioctl(LL_IOC_MDC_GETINFO) call that also returns
valid flags from the MDS RPC to userspace in struct lov_user_mds_data
so that it is possible to determine whether the size and blocks are
returned by the call.  The old LL_IOC_MDC_GETINFO ioctl number is
renamed to LL_IOC_MDC_GETINFO_OLD and is binary compatible, but
newly-compiled applications will use the new struct lov_user_mds_data.

New llapi interfaces llapi_get_lum_file(), llapi_get_lum_dir(),
llapi_get_lum_file_fd(), llapi_get_lum_dir_fd() are added to fetch
valid stat() attributes and LOV info to the user.

Lustre-change: https://review.whamcloud.com/35167
Lustre-commit: 11aa7f8704c490b011f60f234c3ac9929ce76948

Signed-off-by: Qian Yingjin <qian@ddn.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I21dfae7c2633dead5d83b438ec340fea4d3ebbe5
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36553
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12824 o2ib: Record rc in debug log on startup failure 47/36547/2
Chris Horn [Mon, 30 Sep 2019 15:03:06 +0000 (10:03 -0500)]
LU-12824 o2ib: Record rc in debug log on startup failure

Since kiblnd_startup() returns -ENETDOWN on failure, let's record the
rc value for the failure case in the debug log.

Lustre-change: https://review.whamcloud.com/36325
Lustre-commit: 99f85541a685df82265f18167e91c161c523ce50

Cray-bug-id: LUS-7935
Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ied934642bc567b8d3f51293d7dd095d47ff134df
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36547
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12824 o2ib: Fix whitespace in kiblnd_startup 46/36546/2
Chris Horn [Mon, 30 Sep 2019 15:01:28 +0000 (10:01 -0500)]
LU-12824 o2ib: Fix whitespace in kiblnd_startup

Convert whitespace to tabs where appropriate in kiblnd_startup()

Lustre-change: https://review.whamcloud.com/36324
Lustre-commit: 50300e83e4cab3157149107eb735825cc4c3aff1

Cray-bug-id: LUS-7935
Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I11aaaa8e47d754b219fb773d74e37190476e4eeb
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36546
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12824 o2ib: Reintroduce kiblnd_dev_search 45/36545/2
Chris Horn [Mon, 30 Sep 2019 15:04:10 +0000 (10:04 -0500)]
LU-12824 o2ib: Reintroduce kiblnd_dev_search

If we add an interface to multiple nets then we need to re-use the
struct ib_dev object for each of the nets.

Lustre-change: https://review.whamcloud.com/36326
Lustre-commit: e25e45c612a061031e8b4b5233137fbb57b50cc4

Cray-bug-id: LUS-7935
Fixes: 75ab841 ("LU-11893 lnet: consoldate secondary IP address handling")
Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I1790e24458f47d632fd137b78de076d408fe5260
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36545
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12355 llite: vfs atomic_open change with FMODE_CREATED 15/36415/4
Shaun Tancheff [Thu, 10 Oct 2019 04:49:36 +0000 (21:49 -0700)]
LU-12355 llite: vfs atomic_open change with FMODE_CREATED

Kernel 4.19 introduced FMODE_CREATED and switched to it: the last
argument to vfs atomic_open was removed, and the f_mode flags are
now used to indicate the created state on return.

Linux-commit: 73a09dd94377e4b186b300bd5461920710c7c3d5

Lustre-change: https://review.whamcloud.com/35020
Lustre-commit: 4decb4c2da6053066f10cbe419e2db212de8e4aa

Test-Parameters: trivial
Change-Id: I26d4aadb123bb1d1bc0aa1d78a64a75b94276ffb
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Thomas Stibor <t.stibor@gsi.de>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36415
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12734 misc: add bash completion for lctl set/get_param 83/36483/2
Dominique Martinet [Mon, 9 Sep 2019 14:46:45 +0000 (16:46 +0200)]
LU-12734 misc: add bash completion for lctl set/get_param

Add initial bash completion for lctl, mainly for set_param and
get_param, and modify the build system to install it.

Lustre-change: https://review.whamcloud.com/36105
Lustre-commit: f87a7f2656ceff174a00933a170032f093ecc72d

Test-Parameters: trivial
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Change-Id: I16d2698e782702375c7fa3edf3bfde2e3b197297
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36483
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-11526 rpc: support maximum 64MB I/O RPC 69/35369/6
Qian Yingjin [Wed, 16 Jan 2019 02:13:20 +0000 (10:13 +0800)]
LU-11526 rpc: support maximum 64MB I/O RPC

On newer systems, some block drivers allow max_hw_sectors_kb to
be up to 65536KB (64MB) to the underlying storage. To maximize
driver efficiency, Lustre should also bump up the maximum I/O
RPC size to 64MB.
Clamp max_read_ahead_whole_mb so that it does not exceed
max_read_ahead_per_file_mb.
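
The sizes and the clamp described above can be checked with a few lines
of arithmetic. This is only a sketch: the 4KB page size is an
assumption, and clamp_whole_mb() is a hypothetical helper that models
the clamp rather than the actual tunable handler.

```python
KIB = 1024
MIB = 1024 * KIB

# 65536KB expressed in bytes; this is the 64MB limit from the text.
max_rpc_bytes = 65536 * KIB

# Assumed page size of 4KB, used only to show the page count per RPC.
PAGE_SIZE = 4 * KIB
pages_per_rpc = max_rpc_bytes // PAGE_SIZE


def clamp_whole_mb(whole_mb, per_file_mb):
    """Keep the whole-file readahead limit at or below the per-file limit."""
    return min(whole_mb, per_file_mb)


print(max_rpc_bytes // MIB, pages_per_rpc, clamp_whole_mb(128, 64))
```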

Lustre-change: https://review.whamcloud.com/34042
Lustre-commit: 1a9be0046b1f1772d3f934c2146dc5233c391377

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Icbf78742f8210d82dc310af7d05b7c32b93af34f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35369
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12043 llite,readahead: don't always use max RPC size 59/35559/6
Wang Shilong [Sun, 2 Jun 2019 15:17:26 +0000 (23:17 +0800)]
LU-12043 llite,readahead: don't always use max RPC size

Since the 64M RPC support landed, @PTLRPC_MAX_BRW_PAGES corresponds to
64M, and we always use this maximum possible RPC size to check
whether we should avoid fast IO and trigger real context IO.

This is not good for the following reasons:

(1) Since the current default RPC size is still 4M,
most systems won't use 64M most of the time.

(2) The current default readahead size per file is still 64M,
which makes fast IO always run out of readahead pages
before the next IO. This defeats what users really want from
readahead: grabbing pages in advance.

To fix this problem, use 16M as a balance value if the RPC size is
smaller than 16M. The patch also fixes the problem that @ras_rpc_size
could not grow, which is possible in the following case:

1) Set RPC to 16M
2) Set RPC to 64M

With the current logic, ras->ras_rpc_size is kept at 16M, which is wrong.
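
The two fixes described above can be modelled roughly as follows. This
is a sketch based only on the description in this message, not on the
patch itself: fast_io_threshold() and the ReadaheadState class are
hypothetical names, and sizes are kept in bytes for simplicity.

```python
MIB = 1 << 20
BALANCE_BYTES = 16 * MIB  # the 16M balance value mentioned in the text


def fast_io_threshold(rpc_bytes):
    """Use the configured RPC size, but never less than the 16M balance value."""
    return max(rpc_bytes, BALANCE_BYTES)


class ReadaheadState:
    """Minimal stand-in for the per-file readahead state (ras)."""

    def __init__(self, rpc_bytes):
        self.ras_rpc_size = rpc_bytes

    def update_rpc_size(self, rpc_bytes):
        # Follow the configured RPC size in both directions, so that a
        # later increase (16M -> 64M) is no longer ignored.
        if self.ras_rpc_size != rpc_bytes:
            self.ras_rpc_size = rpc_bytes


ras = ReadaheadState(16 * MIB)   # 1) Set RPC to 16M
ras.update_rpc_size(64 * MIB)    # 2) Set RPC to 64M
print(ras.ras_rpc_size // MIB, fast_io_threshold(4 * MIB) // MIB)
```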

Lustre-change: https://review.whamcloud.com/35033
Lustre-commit: 7864a6854c3dfe3319dcf6809e728eed9a37b9b4

Change-Id: Ida9f839f7c692cd88d32dc0909503f6ae991d909
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35559
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11933 mdt: clear sp_cr_flags in migrate unpack 99/36399/4
Lai Siyao [Thu, 15 Aug 2019 14:31:17 +0000 (22:31 +0800)]
LU-11933 mdt: clear sp_cr_flags in migrate unpack

mdt_thread_info.mti_spec is not cleared after operation handling, so
mdt_migrate_unpack() should clear it so that stale values are not used.

Lustre-change: https://review.whamcloud.com/36154
Lustre-commit: d4da3b55a8303d937828e74341b3ab5c4dfd52b2

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib3d5d39a4a072621c8da8b6ef7869cb4d8178aac
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/36399
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11626 mdc: hold obd while processing changelog 38/36338/4
Hongchao Zhang [Tue, 8 Oct 2019 17:10:34 +0000 (13:10 -0400)]
LU-11626 mdc: hold obd while processing changelog

While reading or writing the changelog, the corresponding obd_device
should be held to protect it from being released by umount.

Lustre-change: https://review.whamcloud.com/35784
Lustre-commit: d7bb6647cd4dd26949bceb6a099cd606623aff2b

Change-Id: Ib5b528f178edcf73425587ea60335df640c1696d
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36338
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-11743 utils: allow lctl pool_list on separate MGS 14/36314/4
Emoly Liu [Sun, 22 Sep 2019 12:06:07 +0000 (20:06 +0800)]
LU-11743 utils: allow lctl pool_list on separate MGS

Change lctl pool_list command to parse the configuration log directly
when run on a standalone MGS node.  This also allows the pool commands
to be run when only the MGS is started.

Also, remove the test script changes from the LU-9899 patch that mounted
a client on the standalone MGS to allow OST pools to work properly.

Lustre-change: https://review.whamcloud.com/35895
Lustre-commit: d908fe9686bc1e583da7434856d9c06e6cbbc4fd

Change-Id: Ic25931d49c2cf747da2a3f2ac3c25a21f6878991
Test-Parameters: standalonemgs=true testlist=ost-pools.sh,sanity.sh,conf-sanity.sh
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36314
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12131 tests: properly handle GSS in server failover 34/35534/10
Sebastien Buisson [Mon, 3 Jun 2019 14:30:50 +0000 (23:30 +0900)]
LU-12131 tests: properly handle GSS in server failover

In case of server failover, a number of aspects must be handled when
GSS based features (SSK or Kerberos) are activated:
- the lsvcgssd daemon must be restarted;
- targets must be mounted with the proper skpath option;
- permissions on keys must be adjusted.
When the service is initially started, all of that is managed in
setupall(). fail() and facet_failover() have to be improved to take
GSS aspects into account.

Lustre-change: https://review.whamcloud.com/35041
Lustre-commit: 1cbfb44fb59945da62acbb672330fde5c75ddc98

Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8db686f406629c7eec655496cf83c0539c1bfb33
Reviewed-on: https://review.whamcloud.com/35534
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11204 obdclass: remove unprotected access to lu_object 17/36217/3
Mikhail Pershin [Sun, 26 May 2019 17:46:43 +0000 (20:46 +0300)]
LU-11204 obdclass: remove unprotected access to lu_object

The check of lu_object_is_dying() is done after the reference
drop and without a lock, so it can access a freed object if a
concurrent thread did the final put.

The patch saves the object state right before atomic_dec_and_lock()
and checks the saved state afterwards, so the object is not accessed
after the reference is dropped.
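
The ordering described above can be modelled in user space as shown
below. This is a sketch of the pattern only, not the kernel code: the
Obj class and put() are hypothetical, and a plain lock stands in for
atomic_dec_and_lock().

```python
import threading


class Obj:
    """Toy reference-counted object standing in for a lu_object."""

    def __init__(self, refcount, dying):
        self.refcount = refcount
        self.dying = dying
        self.freed = False
        self.lock = threading.Lock()


def put(obj):
    """Drop one reference; return (was_last_put, was_dying)."""
    # Save the state while our own reference still pins the object.
    was_dying = obj.dying
    with obj.lock:
        obj.refcount -= 1
        last = obj.refcount == 0
    if not last:
        # Another thread may do the final put and free the object at any
        # time now, so only the saved copy of the state is used here.
        return (False, was_dying)
    obj.freed = True  # final put: this caller is the one releasing the object
    return (True, was_dying)


o = Obj(refcount=2, dying=True)
print(put(o), put(o), o.freed)
```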

Lustre-change: https://review.whamcloud.com/34960
Lustre-commit: 336cf0f2f3a9ce5b11a34aeaeec062a5d5144213

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I926991f465e7913e5fc150095425bfb5bf07f57f
Reviewed-on: https://review.whamcloud.com/36217
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12719 obdclass: serialize lwp list access 49/36349/2
Lai Siyao [Sun, 11 Aug 2019 06:34:04 +0000 (14:34 +0800)]
LU-12719 obdclass: serialize lwp list access

lustre_sb_info.lsi_lwp_list should be accessed under a lock, and some
of the places that access it may sleep, so change lsi_lwp_lock from a
spinlock to a mutex.

Lustre-change: https://review.whamcloud.com/36003
Lustre-commit: 875252d59924ad09db8de9f0fbb611788a0b9c78

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ifc3622eb28cd6cf49661b14fc10e98aa689a58dc
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36349
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>