Whamcloud - gitweb
fs/lustre-release.git
9 months ago LU-14195 build: Adjust Makefile for Linux build changes. 07/40907/3
Mr NeilBrown [Wed, 9 Dec 2020 00:28:16 +0000 (11:28 +1100)]
LU-14195 build: Adjust Makefile for Linux build changes.

Since v5.10-rc1~51^2~19, "KBUILD_BUILTIN" has been unset
for module builds.  This means that "targets-for-builtin"
isn't built, and that is how "extra-y" is built.

So we need another way to force LUSTRE_KERNEL_TEST to be built.

Since v5.6-rc1~1^2~5 any target listed in "always-y" will always get
built.  So we can assign LUSTRE_KERNEL_TEST to this macro.

Assigning both macros is safe, even for those kernels which include
both in the list of targets.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I508b3710579c068dec93baf81ee383f3f03bd370
Reviewed-on: https://review.whamcloud.com/40907
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14178 ldlm: return error from ldlm_namespace_new() 51/40851/3
Andreas Dilger [Thu, 3 Dec 2020 09:40:35 +0000 (02:40 -0700)]
LU-14178 ldlm: return error from ldlm_namespace_new()

Return the underlying error in ldlm_namespace_new() from
ldlm_namespace_sysfs_register() to the caller instead of NULL.
Otherwise, the callers convert the NULL to -ENOMEM and this
is incorrectly reported as an allocation error to the user.

  sysfs: cannot create duplicate filename
     '/fs/lustre/ldlm/namespaces/lustre-OST0002-osc-ffff89f33be70000'
  mount.lustre: mount mgs:/lfs at /lfs failed: Cannot allocate memory

Change ldlm_namespace_new() to return errors via ERR_PTR() and
change the callers to use IS_ERR().
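
A minimal userspace sketch of the error-pointer convention described above. The three helpers are simplified re-implementations of the kernel macros of the same name, and demo_namespace_new() and demo_setup() are hypothetical stand-ins, not the Lustre functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct demo_namespace {
	int id;
};

static struct demo_namespace demo_ns;

/* Hypothetical constructor: passes a registration failure through
 * to the caller instead of collapsing it to NULL. */
static struct demo_namespace *demo_namespace_new(int register_rc)
{
	if (register_rc < 0)
		return ERR_PTR(register_rc);
	return &demo_ns;
}

/* Hypothetical caller: reports the real error, not a guessed -ENOMEM. */
static int demo_setup(int register_rc)
{
	struct demo_namespace *ns = demo_namespace_new(register_rc);

	if (IS_ERR(ns))
		return (int)PTR_ERR(ns);
	return 0;
}
```

With this shape a duplicate sysfs name would surface as -EEXIST rather than as an allocation failure.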

Fix associated CERROR() messages to follow proper code style.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7d590c7242399549b32b1c4189e46ff8748c8096
Reviewed-on: https://review.whamcloud.com/40851
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14073 ofd: remove use of smp_read_barrier_depends() 94/40394/2
Mr NeilBrown [Mon, 26 Oct 2020 02:30:22 +0000 (13:30 +1100)]
LU-14073 ofd: remove use of smp_read_barrier_depends()

Linux 5.9 removes smp_read_barrier_depends(), so lustre must stop
using it.

There is only one use: in ofd_access_log.c.
This use is unnecessary and can simply be removed.

The code is based on "Documentation/core-api/circular-buffers.rst"
which gives no indication that this barrier is needed.

The comment says its purpose is to ensure the index is read before the
data is read. This is unnecessary.
The data is written in osl_write_entry(), then a barrier is issued
(smp_store_release()) before the ->head is written.
oal_read_entry() issues a barrier (smp_load_acquire()) before reading
that head.
'tail' is read without a barrier, but it is then compared against ->head
in CIRC_CNT().  Even if reading ->tail was racy, the fact that
comparing it with ->head succeeded means that the data written at
->tail must have been safely written, and we can now read it without
any further barrier.
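
The release/acquire pairing argued above can be sketched in userspace with C11 atomics. This is an illustrative single-producer, single-consumer ring, not the ofd_access_log.c code; struct demo_ring and its helpers are hypothetical names, and memory_order_release/acquire stand in for smp_store_release()/smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_RING_SIZE 8u	/* must be a power of two */

struct demo_ring {
	int data[DEMO_RING_SIZE];
	atomic_uint head;	/* advanced only by the producer */
	atomic_uint tail;	/* advanced only by the consumer */
};

static void demo_ring_init(struct demo_ring *r)
{
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

static int demo_ring_write(struct demo_ring *r, int value)
{
	unsigned int head = atomic_load_explicit(&r->head,
						 memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&r->tail,
						 memory_order_acquire);

	if (head - tail >= DEMO_RING_SIZE)
		return -1;	/* full */

	r->data[head & (DEMO_RING_SIZE - 1)] = value;
	/* Publish the entry: the data store is ordered before the
	 * head update. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return 0;
}

static int demo_ring_read(struct demo_ring *r, int *value)
{
	/* Acquire pairs with the producer's release on head. */
	unsigned int head = atomic_load_explicit(&r->head,
						 memory_order_acquire);
	/* tail is owned by this side, so a plain (relaxed) load is enough. */
	unsigned int tail = atomic_load_explicit(&r->tail,
						 memory_order_relaxed);

	if (head == tail)
		return -1;	/* empty */

	/* No extra barrier: head != tail implies the slot was published. */
	*value = r->data[tail & (DEMO_RING_SIZE - 1)];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return 0;
}
```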

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9d0f0aeb67e1188d2012f4ae2e14b3656211c3e2
Reviewed-on: https://review.whamcloud.com/40394
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14047 lustre: change EWOULDBLOCK to EAGAIN 07/40307/4
John L. Hammond [Tue, 20 Oct 2020 14:20:35 +0000 (09:20 -0500)]
LU-14047 lustre: change EWOULDBLOCK to EAGAIN

On linux, EWOULDBLOCK has always been defined as an alias for
EAGAIN. In the interest of readability we should not use two names for
the same thing. So change the remaining uses of EWOULDBLOCK to EAGAIN
and add EWOULDBLOCK||EAGAIN to spelling.txt.
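
A small sketch of why one name suffices on Linux: a check written against EAGAIN also matches a value spelled EWOULDBLOCK. demo_should_retry() is a hypothetical helper, and the equality is only assumed for Linux, as the message states.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: decide whether a negative errno-style return
 * code means "try again later". Only EAGAIN is named here. */
static int demo_should_retry(int rc)
{
	return rc == -EAGAIN;
}
```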

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib48b8a1e58bfa961d2a4ba411c038c476bfc300d
Reviewed-on: https://review.whamcloud.com/40307
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-13783 libcfs: switch from ->mmap_sem to mmap_lock() 88/40288/5
Mr NeilBrown [Fri, 16 Oct 2020 06:18:29 +0000 (17:18 +1100)]
LU-13783 libcfs: switch from ->mmap_sem to mmap_lock()

In Linux 5.8, ->mmap_sem is gone and the preferred interface
for locking the mmap is the suite of mmap*lock() functions.

So provide those functions when not available, and use them
as needed in Lustre.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4ce3959f9e93eae10a7b7db03e2b0a1525723138
Reviewed-on: https://review.whamcloud.com/40288
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
9 months ago LU-9325 osd-ldisk: replace simple_strto* with kstr* functions 19/40119/8
James Simmons [Tue, 3 Nov 2020 14:27:43 +0000 (09:27 -0500)]
LU-9325 osd-ldisk: replace simple_strto* with kstr* functions

The parsing of mount parameters from the config llog is done in
some cases with simple_strto* which is considered obsolete.
Replace simple_strto* with the kstrto* equivalent functions.
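
To illustrate the difference in a form that runs outside the kernel: simple_strto* stops at the first invalid character and cannot report an error, while the kstrto* family rejects trailing garbage and overflow. The sketch below is a userspace approximation built on strtoul(); demo_strict_strtouint() is a hypothetical name and only mimics the strict behaviour, it is not the kernel implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Strict parse: returns 0 on success, -EINVAL on malformed input,
 * -ERANGE on overflow. One trailing newline is tolerated. */
static int demo_strict_strtouint(const char *s, unsigned int base,
				 unsigned int *res)
{
	unsigned long val;
	char *end;

	/* strtoul() would silently accept these, so reject them first. */
	if (s == NULL || *s == '\0' || *s == '-' ||
	    isspace((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, (int)base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (errno == ERANGE || val > UINT_MAX)
		return -ERANGE;

	*res = (unsigned int)val;
	return 0;
}
```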

Change-Id: I7c26d14d02828c9f9a96f31a086a65bb39f3ea87
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/40119
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-9859 lod: use linux kernel bitmap API 76/39876/2
James Simmons [Thu, 10 Sep 2020 14:35:55 +0000 (10:35 -0400)]
LU-9859 lod: use linux kernel bitmap API

Now that modern Linux kernels support a bitmap API we can move
away from the libcfs specific bitmap API. This patch changes
the last bitmap in the lod module to use the Linux kernel
bitmap API.

Change-Id: I92c494bf2af62e31d7b9527b3f44580322e48fd3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-9859 libcfs: replace all CFS_CAP_* macros with CAP_* 75/39875/3
Mr. NeilBrown [Thu, 10 Sep 2020 13:49:30 +0000 (09:49 -0400)]
LU-9859 libcfs: replace all CFS_CAP_* macros with CAP_*

Lustre defines a few CFS_CAP_* macros which are exactly the
same as the corresponding CAP_* macro, with one exception.

CFS_CAP_SYS_BOOT is 23
CAP_SYS_BOOT is 22.

CFS_CAP_SYS_BOOT is only used through CFS_CAP_FS_MASK and
causes capability 23 (CAP_SYS_NICE) to be dropped in certain
circumstances.
It is probable that the intention was to drop CAP_SYS_BOOT,
and this is what is now done.
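
The effect of the off-by-one value can be sketched with plain bit masks. The two constants repeat the numbers quoted above; the DEMO_ names and demo_drop_cap() are hypothetical and do not model the kernel's capability types.

```c
#include <assert.h>

#define DEMO_CAP_SYS_BOOT	22
#define DEMO_CAP_SYS_NICE	23
#define DEMO_CAP_TO_MASK(cap)	(1U << (cap))

/* Clear one capability bit from a mask. */
static unsigned int demo_drop_cap(unsigned int caps, int cap)
{
	return caps & ~DEMO_CAP_TO_MASK(cap);
}
```

Dropping bit 23 from a mask leaves bit 22 set, which is the old behaviour; dropping bit 22 is the intended one.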

CFS_CAP_CHOWN_MASK and CFS_CAP_SYS_RESOURCE_MASK are never
used, so they have been removed.

Linux-commit: 5ebaa2d14850205e44757c4d5fdd4097712d01ef

Change-Id: Ifb90c0a36e204c76b90ff23ac609345d11b878da
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/39875
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-13100 lov: grant deadlock if same OSC in two components 95/37095/8
Andriy Skulysh [Thu, 24 Oct 2019 12:11:25 +0000 (15:11 +0300)]
LU-13100 lov: grant deadlock if same OSC in two components

The same osc can be involved in several components but osc layer
leaves active last used extent, so an RPC can't be sent if grants
are required from the same OST for another component.

Add cl_io_extent_release() to release active extent before
switching to the next component.

Change-Id: Idadda6eaecd1d47b78880c81e1fb7513d5be2419
Cray-bug-id: LUS-8038
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/37095
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-7853 lod: fixes bitfield in lod qos code 12/18812/15
Rahul Deshmukh [Thu, 14 Jul 2016 06:02:45 +0000 (23:02 -0700)]
LU-7853 lod: fixes bitfield in lod qos code

Updating bitfields in struct lod_qos is protected
by lq_rw_sem in most places but an update can be lost
due to unprotected bitfield access from
lod_qos_thresholdrr_seq_write() and qos_prio_free_store().
This patch fixes it by replacing bitfields with named bits
and atomic bitops.
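
A userspace sketch of the "named bits plus atomic bitops" idea, using C11 atomics in place of the kernel's set_bit()/clear_bit()/test_bit(). Adjacent bitfields share one storage word, so an unlocked read-modify-write of one field can overwrite a concurrent update to its neighbour; atomic operations on a single flags word avoid that. The struct, enum and helper names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Named bit numbers replacing individual bitfields (names assumed). */
enum {
	DEMO_QOS_DIRTY = 0,
	DEMO_QOS_SAME_SPACE = 1,
	DEMO_QOS_RESET = 2,
};

struct demo_qos {
	atomic_ulong flags;
};

static void demo_qos_init(struct demo_qos *q)
{
	atomic_init(&q->flags, 0);
}

/* Each helper is one atomic read-modify-write, so concurrent updates
 * to different bits cannot overwrite each other. */
static void demo_set_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

static void demo_clear_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

static int demo_test_bit(int nr, atomic_ulong *addr)
{
	return (int)((atomic_load(addr) >> nr) & 1UL);
}
```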

Cray-bug-id: LUS-4651
Signed-off-by: Rahul Deshmukh <rahul.deshmukh@seagate.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: I28299ce4960e91be551d7f6e43a3b598daf4d7a2
Reviewed-on: https://review.whamcloud.com/18812
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14180 utils: verify setstripe comp_end is valid 39/41239/3
Andreas Dilger [Tue, 26 Jan 2021 02:14:10 +0000 (18:14 -0800)]
LU-14180 utils: verify setstripe comp_end is valid

Verify that the "lfs setstripe -E <component_end>" value is valid.
Otherwise, if "-S" is not specified at the same time, then an
invalid file layout can be created and the file cannot be deleted
normally, only via "lfs rmfid <FID>".

Allow values < 4096 (e.g. '64' or '128' which would all be invalid
anyway) to be interpreted as KiB units.
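
A rough sketch of the unit handling described above. The 64 KiB alignment requirement is an assumption made for illustration (the message only says that small values "would all be invalid anyway"), special values such as an end-of-file marker are not modelled, and demo_check_comp_end() is a hypothetical helper rather than the lfs code.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_KIB	1024ULL
/* Assumed alignment for a component end; not taken from the patch. */
#define DEMO_MIN_ALIGN	(64 * DEMO_KIB)

/* Returns 0 and stores the end offset in bytes, or -EINVAL. */
static int demo_check_comp_end(unsigned long long value,
			       unsigned long long *end_bytes)
{
	/* Values below 4096 cannot be valid byte offsets, so read
	 * them as KiB instead. */
	if (value < 4096)
		value *= DEMO_KIB;

	if (value == 0 || value % DEMO_MIN_ALIGN != 0)
		return -EINVAL;

	*end_bytes = value;
	return 0;
}
```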

Update usage messages and man pages to match.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I47fe7729ffd447c1c1cc098e5117e456263ebbe5
Reviewed-on: https://review.whamcloud.com/41239
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14207 mgs: delete "add failnid" sections on replace_nids 30/40930/3
Artem Blagodarenko [Wed, 14 Oct 2020 01:54:36 +0000 (21:54 -0400)]
LU-14207 mgs: delete "add failnid" sections on replace_nids

Replace_nids left old nids in add_conn field of failnid
section of client llog. This leads to connection errors.

Let's delete such sections. New failnids, if any, are
added by replace_nids.

Change-Id: I7fab00827035bd864aeb95fb4852a59c458bb2ba
HPE-bug-id: LUS-9440
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/40930
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-13974 llog: check stale osp object 42/40742/3
Alexander Boyko [Tue, 24 Nov 2020 05:34:11 +0000 (00:34 -0500)]
LU-13974 llog: check stale osp object

The logic of osp_attr_get() has two paths:
1) return attributes from the cache for a healthy osp object
2) make an out update request and return attributes for a stale
osp object, after which the object loses its stale state.

When an out update request with llog writes fails, the osp object
becomes stale. But the llog handle stays inconsistent (bitmap, count,
last_index), and the next llog_add->llog_osd_write_rec does dt_attr_get,
gets attributes and makes the osp object valid, and uses the wrong llog
handle data. The result is an index jump in the llog file - recX, recX+2.
This causes an error during update log processing if failover takes
place.
The fix adds a dt_object_stale() function to check the osp object.
llog_osd_write_rec checks it and returns ESTALE. llog_add then fails
with an ESTALE error and doesn't corrupt the update log.

HPE-bug-id: LUS-9030
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iadf53fd816e1c5bde0a19d4c537f0408796c864a
Reviewed-on: https://review.whamcloud.com/40742
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-12961 mdd: avoid double call to mdd_changelog_fini() 37/36737/11
Alex Zhuravlev [Tue, 12 Nov 2019 17:45:50 +0000 (20:45 +0300)]
LU-12961 mdd: avoid double call to mdd_changelog_fini()

The first call is done from mdd_prepare() as part of error
handling, another call is done from mdd_device_shutdown().

In similar cases the fini routines check whether the state
is initialized, e.g. mdd_orphan_index_fini() releases the
object and sets mdd_orphans to NULL, then all subsequent
calls to mdd_orphan_index_fini() return immediately.
mdd_changelog_fini() can't work this way, so an explicit
state has been introduced.
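
The explicit-state approach can be sketched as follows. struct demo_changelog and its functions are hypothetical; the point is only that a flag makes the fini routine safe to call from both the error path and the shutdown path.

```c
#include <assert.h>

struct demo_changelog {
	int initialized;	/* explicit state flag */
	int teardowns;		/* counts real teardown work, for the demo */
};

static int demo_changelog_init(struct demo_changelog *cl)
{
	cl->initialized = 1;
	return 0;
}

static void demo_changelog_fini(struct demo_changelog *cl)
{
	/* A second call finds the flag cleared and returns at once. */
	if (!cl->initialized)
		return;

	cl->initialized = 0;
	cl->teardowns++;
}
```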

Change-Id: Ifd21569e68c836f44bb59adea4e8fed6ccef1c7b
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36737
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
9 months ago LU-14439 build: require a newer version of e2fsprogs 82/41682/4
Andreas Dilger [Wed, 17 Feb 2021 23:50:59 +0000 (16:50 -0700)]
LU-14439 build: require a newer version of e2fsprogs

Require a build version of e2fsprogs-1.44.3 for osd-ldiskfs builds to
get EXT2_FLAG_IGNORE_SB_ERRORS and ext2fs_has_feature_{mmp,quota}()
functions.

Require a minimum install version of ldiskfsprogs-1.45.6.wc1.

Test-Parameters: trivial
Fixes: 7dc8aa7e7848 ("LU-13241 utils: use libext2fs for ldiskfs operations")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I240823266c7342338c72126011485120543ebbe5
Reviewed-on: https://review.whamcloud.com/41682
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-13609 mgs: fix config_log buffer handling 78/41478/5
Stephane Thiell [Thu, 11 Feb 2021 00:15:02 +0000 (16:15 -0800)]
LU-13609 mgs: fix config_log buffer handling

Fix buffer handling in mgs_list_logs() to list all MGS config_logs
using multiple ioctl calls when we have a large number of targets.

Fixes: 1d97a8b4cd3d ("LU-13609 llog: list all the log files correctly on MGS/MDT")
Signed-off-by: Stephane Thiell <sthiell@stanford.edu>
Change-Id: I1bf32e918e242f4da83c3d1624b7285a18a88d01
Reviewed-on: https://review.whamcloud.com/41478
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-10391 socklnd: use sockaddr instead of __u32 addresses. 04/37704/17
Mr NeilBrown [Thu, 14 May 2020 22:51:54 +0000 (08:51 +1000)]
LU-10391 socklnd: use sockaddr instead of __u32 addresses.

LNet/socklnd often uses __u32 to hold an ipv4 address.
As we want to extend socklnd to work with IPv6 addresses too,
this needs to change.

This patch changes many __u32s to variants of 'struct sockaddr'.

Library code from sunrpc is used for copying and comparing addresses
   rpc_copy_addr() and rpc_cmp_addr()
and for extracting or setting the port:
   rpc_get_port() and rpc_set_port().

The "%pIS" printf format is used for printing a sockaddr (works for
both IPv4 and IPv6), and "%pISp" for printing the address with the
port.

The __u32 is in host-byte-order, while addresses in sockaddr are
always network-order, so htonl and ntohl are used as needed.

When storing an address (e.g. in a structure), 'struct
sockaddr_storage' is used.  When passing an address to a function,
'struct sockaddr' is used.  When an address is known to be IPv4 (i.e.
when converting to or from __u32), 'struct sockaddr_in' is used.
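
A userspace sketch of the conversion rules in the two paragraphs above: store in 'struct sockaddr_storage', pass as 'struct sockaddr', and use 'struct sockaddr_in' with htonl()/ntohl() at the boundary with a host-order 32-bit address. The demo_ function names are hypothetical and this is not the socklnd code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill a sockaddr_storage from a host-order IPv4 address and port. */
static void demo_ipv4_to_sockaddr(uint32_t ip_host, uint16_t port_host,
				  struct sockaddr_storage *out)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)out;

	memset(out, 0, sizeof(*out));
	sin->sin_family = AF_INET;
	sin->sin_addr.s_addr = htonl(ip_host);	/* network order inside */
	sin->sin_port = htons(port_host);
}

/* Extract a host-order IPv4 address; fails for other address families. */
static int demo_sockaddr_to_ipv4(const struct sockaddr *sa,
				 uint32_t *ip_host)
{
	if (sa->sa_family != AF_INET)
		return -1;

	*ip_host = ntohl(((const struct sockaddr_in *)sa)->sin_addr.s_addr);
	return 0;
}
```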

The following functions are changed to take a 'struct sockaddr*'
argument:

 lnet_connect()
 lnet_connect_console_error()
 lnet_sock_getaddr()
 ksocknal_ip2iface()
 ksocknal_ip2index()
 ksocknal_create_route()
 ksocknal_connecting()
 ksocknal_close_peer_conns_locked()
 ksocknal_peer_del_interface_locked()

The following structures have had fields changed to 'struct
sockaddr_storage'

 struct ksock_interface:
      ksni_ipaddr -> ksni_addr
 struct ksock_conn
      ksnc_myipaddr -> ksnc_myaddr
      ksnc_ipaddr and ksnc_port -> ksnc_peeraddr
 struct ksock_route
      ksnr_myipaddr -> ksnr_myaddr
      ksnr_ipaddr and ksnr_port -> ksnr_addr

Assorted strings have been joined onto a single line, and various
indents have been changed from spaces to tabs.

There should be no behaviour changes, though the structures mentioned
above will now be a little larger.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I42d12260185638407b5b611391fc69bfd9f91754
Reviewed-on: https://review.whamcloud.com/37704
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-13929 lnet: modify assertion in lnet_post_send_locked 49/40749/4
Serguei Smirnov [Wed, 25 Nov 2020 00:05:48 +0000 (16:05 -0800)]
LU-13929 lnet: modify assertion in lnet_post_send_locked

Check that the pointer to the local interface is not NULL
before asserting. While checking if local ni is the destination,
the assertion may attempt to dereference pointer to local
interface after it has already been cleaned up on shutdown.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I0f4be04a728a7243823bec70f9efbe52bcb104b3
Reviewed-on: https://review.whamcloud.com/40749
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14362 tests: sanity-flr to prepare stuff before checks 07/41307/6
Alex Zhuravlev [Mon, 25 Jan 2021 05:37:12 +0000 (08:37 +0300)]
LU-14362 tests: sanity-flr to prepare stuff before checks

otherwise /mnt/lustre may be missing and sanity-flr fails

Test-Parameters: trivial testlist=sanity-flr
Change-Id: Ic5502ee07bab557162fbf16718ca5fb5beed45e9
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41307
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months ago LU-14423 osd: recognize holes in osd_is_mapped() 81/41481/3
Alex Zhuravlev [Thu, 11 Feb 2021 14:33:01 +0000 (17:33 +0300)]
LU-14423 osd: recognize holes in osd_is_mapped()

ldiskfs_fiemap() can return {0,0,0} for last non-allocated
region.  osd_is_mapped() should be able to recognize and
cache this state.

Fixes: 144b5a65c1 ("LU-7132 osd-ldiskfs: speedup rewrites")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I03883038c2c0ec84754377a442c4947c7e3021a9
Reviewed-on: https://review.whamcloud.com/41481
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14398 llapi: add llapi_fid2path_at() 06/41406/2
John L. Hammond [Wed, 3 Feb 2021 19:06:16 +0000 (13:06 -0600)]
LU-14398 llapi: add llapi_fid2path_at()

Add llapi_fid2path_at() which works like llapi_fid2path() but takes an
open FD on the mount point instead of a 'fsname or directory path'
and a const struct lu_fid * instead of a const char *.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I76234bc28de231587b65c5d866954441e0893aac
Reviewed-on: https://review.whamcloud.com/41406
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14398 llapi: simplify llapi_fid2path() 05/41405/2
John L. Hammond [Wed, 3 Feb 2021 18:33:26 +0000 (12:33 -0600)]
LU-14398 llapi: simplify llapi_fid2path()

Simplify llapi_fid2path(). Remove the fid_is_sane() check. Remove the
call to root_ioctl() and use get_root_path() directly.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib70b8c9e239c77da8b46408de8341fc8aaf4d1c3
Reviewed-on: https://review.whamcloud.com/41405
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
9 months ago LU-14390 gnilnd: Use DIV_ROUND_UP to calculate niov 71/41371/2
Shaun Tancheff [Sun, 31 Jan 2021 14:47:21 +0000 (08:47 -0600)]
LU-14390 gnilnd: Use DIV_ROUND_UP to calculate niov

Use DIV_ROUND_UP to calculate niov, also remove 'unlikely'
as this is the common case.
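
For reference, a small sketch of the rounding involved. DEMO_DIV_ROUND_UP mirrors the usual definition of the kernel macro; the 4096-byte page size and the demo_niov() helper (counting the pages spanned by a buffer starting at some offset into the first page) are assumptions for illustration, not the gnilnd code.

```c
#include <assert.h>

#define DEMO_DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define DEMO_PAGE_SIZE		4096u	/* assumed page size */

/* Number of page-sized fragments needed for nob bytes that start
 * at the given offset into the first page. */
static unsigned int demo_niov(unsigned int offset, unsigned int nob)
{
	return DEMO_DIV_ROUND_UP(offset + nob, DEMO_PAGE_SIZE);
}
```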

Fixes: 7a74d382 ("LU-13004 modules: replace lnet_kiov_t with struct bio_vec")
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iaf62326743ebe5e92bb3aa4b5780b47a5cfdfb18
Reviewed-on: https://review.whamcloud.com/41371
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14301 lustre: add ENOTSUPP to spelling.txt 80/41280/2
John L. Hammond [Wed, 20 Jan 2021 15:22:54 +0000 (09:22 -0600)]
LU-14301 lustre: add ENOTSUPP to spelling.txt

Add a spelling check for ENOTSUPP to suggest use of EOPNOTSUPP
instead. Note:

ENOTSUPP (524) is defined only in the kernel errno.h and is an NFSv3
specific errno. If ENOTSUPP is returned to userspace then strerror()
will print "Unknown error 524".

EOPNOTSUPP (95) is defined in kernel and userspace errno.h.

ENOTSUP is defined in userspace errno.h as an alias for EOPNOTSUPP.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I13b0389c9ec0853f43d8ab4a8f6538eb24c8a2ad
Reviewed-on: https://review.whamcloud.com/41280
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
9 months ago LU-12766 test: convert time to seconds properly 59/41259/4
Wang Shilong [Mon, 18 Jan 2021 02:26:31 +0000 (10:26 +0800)]
LU-12766 test: convert time to seconds properly

According to test logs, grace time could be 3m56s,
which was wrongly converted to 21305s and
caused a timeout of the test system.

Actually, grace time could be something like 1w2d3h4m5s.
Fix calculations for this.
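
The intended conversion can be sketched as below. The test framework itself is shell; this C version only illustrates the arithmetic. demo_grace_to_seconds() is a hypothetical helper, and treating a bare number as seconds is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Convert a string such as "1w2d3h4m5s" to seconds; -1 on bad input. */
static long demo_grace_to_seconds(const char *s)
{
	long total = 0;

	if (*s == '\0')
		return -1;

	while (*s != '\0') {
		char *end;
		long n, mult;

		if (!isdigit((unsigned char)*s))
			return -1;
		n = strtol(s, &end, 10);

		switch (*end) {
		case 'w': mult = 7L * 24 * 3600; break;
		case 'd': mult = 24L * 3600; break;
		case 'h': mult = 3600; break;
		case 'm': mult = 60; break;
		case 's': mult = 1; break;
		case '\0': return total + n;	/* bare number: seconds */
		default: return -1;
		}
		total += n * mult;
		s = end + 1;
	}
	return total;
}
```

With this arithmetic "3m56s" is 3 * 60 + 56 = 236 seconds.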

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I1c02e83f5ec15de2a4a6b312dd6e36b55dd4a7bc
Reviewed-on: https://review.whamcloud.com/41259
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-13903 build: make lustre-devel buildable for Linux client 88/41188/2
James Simmons [Sat, 9 Jan 2021 15:24:16 +0000 (10:24 -0500)]
LU-13903 build: make lustre-devel buildable for Linux client

Recently a new lustre-devel rpm was created which always has a
dependency on the kernel detected by the kmod macros. This doesn't
work in the case of running a custom Linux kernel. Address this
by not making use of the kmod macros when only building the Lustre
utilities, i.e. configure --disable-modules.

Test-Parameters: trivial
Fixes: 16af4e5ed634 ("LU-9215 build: Re-add the lustre-devel package")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I324fbaa0f5b03e2095b493f3d8e00b74ca64298a
Reviewed-on: https://review.whamcloud.com/41188
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14313 utils: mount error when no server support 82/41182/4
Olaf Faaland [Wed, 6 Jan 2021 09:57:26 +0000 (01:57 -0800)]
LU-14313 utils: mount error when no server support

When the mount utility was built without server support,
and this causes the mount to fail, an error message should
say so.

The error values returned by main() should be positive as
they are exposed to the user.

Upon mount() failure, errno must be checked to determine
the reason for the failure.  The return value is -1.

Test-Parameters: trivial
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: I49906340f6ddbe16b7503663cedabe0085daf113
Reviewed-on: https://review.whamcloud.com/41182
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14268 lod: fix layout generation inc for mirror split 68/41068/4
Bobi Jam [Tue, 22 Dec 2020 05:58:41 +0000 (13:58 +0800)]
LU-14268 lod: fix layout generation inc for mirror split

Mirror split does not increase the layout generation properly.

Mirror split does not change FLR state of the file, even when it
contains 1 mirror afterwards, and FLR state should be LCM_FL_NONE
instead.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9c9621d67d901f2e9ca6ed3e0684cd308c396076
Reviewed-on: https://review.whamcloud.com/41068
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14098 obdclass: try to skip corrupted llog records 54/40754/10
Alex Zhuravlev [Thu, 26 Nov 2020 12:07:24 +0000 (15:07 +0300)]
LU-14098 obdclass: try to skip corrupted llog records

if llog's header or record is found corrupted, then
ignore the remaining records and try with the next one.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If47ec1fc1e2eaf64be7ba08d3aa9c2b93903c0cf
Reviewed-on: https://review.whamcloud.com/40754
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-9820 osd-ldiskfs: OI scrub speed limit fix 91/40591/3
Lai Siyao [Tue, 10 Nov 2020 02:44:17 +0000 (10:44 +0800)]
LU-9820 osd-ldiskfs: OI scrub speed limit fix

The OI scrub speed limit is set at start time, and shouldn't be
changed in the middle, otherwise lfsck is run with speed control,
while OI scrub may finish before it, thus sanity-scrub 9 failed.

Allow wider margin in speed control since the speed control is
heuristic.

Test-Parameters: testlist=sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I995679db085a193c71fc7914f961b538b5930c69
Reviewed-on: https://review.whamcloud.com/40591
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14099 build: Fix for unconfigured arch_stackwalk 03/40503/7
Shaun Tancheff [Sat, 23 Jan 2021 15:22:48 +0000 (09:22 -0600)]
LU-14099 build: Fix for unconfigured arch_stackwalk

On aarch64 CONFIG_ARCH_STACKWALK is not defined and
print_stack_trace is not available.

Replace print_stack_trace with an open-coded variant
using %pB introduced in Linux v2.6.38-6557-g0f77a8d37825

This also fixes the symbols lookup of stack_trace_save_tsk
using kallsyms at module init time over the use of
symbol_get.

HPE-bug-id: LUS-9518
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I04c3a0a84bb1a05d813a90502d1ed0f5bb2e33ab
Reviewed-on: https://review.whamcloud.com/40503
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-14044 llog: check fid after convert 94/40294/6
Yang Sheng [Sat, 17 Oct 2020 15:28:43 +0000 (23:28 +0800)]
LU-14044 llog: check fid after convert

We should convert from llog_id and then check fid. Also
change fid-lookup to error check instead of LASSERT.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I673d8f16ff9e57a0482d6a3ec3ee3db33699f57f
Reviewed-on: https://review.whamcloud.com/40294
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-13620 tests: pool_add_targets() fix 03/38803/4
Sergey Cheremencev [Tue, 2 Jun 2020 08:03:08 +0000 (11:03 +0300)]
LU-13620 tests: pool_add_targets() fix

Fix pool_add_targets() so it does not fail if the number of
OSTs is >= 10 - lctl expects them in a hex view.
Check result of "lctl pool_add". Pass only if
the result is either 0 or 17 (EEXIST).
Finally, make pool_add_targets() check that only the
requested OSTs were added, i.e. don't fail to add
OST0 if OST1 is already in a pool.
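
A sketch of the two checks, written in C although the test framework is shell. The "<fsname>-OST%04x" name layout is an assumption about how target names are spelled, and both helpers are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build an OST target name with the index in hex (format assumed). */
static int demo_ost_name(char *buf, size_t len, const char *fsname,
			 unsigned int index)
{
	int n = snprintf(buf, len, "%s-OST%04x", fsname, index);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Accept success, or "already a pool member", as a pass. */
static int demo_pool_add_ok(int rc)
{
	return rc == 0 || rc == EEXIST;
}
```

Index 10 is the first value where decimal and hex spellings differ, which matches the ">= 10" condition in the message.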

Test-Parameters: trivial testlist=ost-pools
HPE-bug-id: LUS-8723
HPE-bug-id: LUS-8941
Change-Id: I841b3db3a89dbc86075cd23b7d71764ffb849181
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/38803
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-13584 tests: gather_logs() fix 60/38660/6
Elena Gryaznova [Tue, 19 May 2020 10:29:32 +0000 (13:29 +0300)]
LU-13584 tests: gather_logs() fix

Fix gather_logs() to work on real HW where server nodes
do not have the access to clients.

Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-8888
Reviewed-by: Alexander Lezhoev <alexander.lezhoev@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: Ifeea54e20d3123ee64582e32b92a4573e60ff33e
Reviewed-on: https://review.whamcloud.com/38660
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13513 osp: make neterr not fatal for precreate_reserve 72/38472/10
Vladimir Saveliev [Mon, 2 Nov 2020 10:10:42 +0000 (13:10 +0300)]
LU-13513 osp: make neterr not fatal for precreate_reserve

When OST_CREATE (a non-resendable RPC) sent by the precreate thread fails
with a network error, osp_pre_update_status() sets d->opd_pre_status to
EIO. osp_precreate_reserve() treats EIO as fatal and does not wait
for another attempt from the precreate thread. That may cause
mdt_intent_open() to return ENOSPC, confusing the caller.  The ENOSPC comes
from lod_alloc_rr().

osp_precreate_send(): in case of network error switch EIO to ENOTCONN.
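
A simplified Python model of the error mapping, under the assumption that
only EIO is treated as fatal by the reserve path. Both helper names are
hypothetical and do not exist in the osp code.

```python
import errno


def precreate_send_status(rc: int, network_error: bool) -> int:
    """Map EIO caused by a network failure to ENOTCONN (hypothetical helper)."""
    if network_error and rc == -errno.EIO:
        return -errno.ENOTCONN
    return rc


def reserve_should_wait(status: int) -> bool:
    """In this model EIO is fatal; any other status waits for a new attempt."""
    return status != -errno.EIO
```

After the mapping, a network failure no longer looks fatal to the reserve
path, so it waits for the precreate thread to try again.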

A test illustrating the issue is added.

Cray-bug-id: LUS-8811
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: Iffaad9bd16f216f758c784b708e21b525c999b14
Reviewed-on: https://review.whamcloud.com/38472
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13453 osd-ldiskfs: do not leak inode if OI insertion fails 35/38235/16
Alex Zhuravlev [Wed, 15 Apr 2020 14:54:07 +0000 (17:54 +0300)]
LU-13453 osd-ldiskfs: do not leak inode if OI insertion fails

osd_create() should destroy the just-created inode if OI insertion
fails.

Also fix lustre_index_restore() to drop nlink for the object to
be removed.

The patch adds two tests:
 - ENOSPC on OI insertion
 - ENOSPC on .. insertion, i.e. directory block allocation

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2a5db657c7dab54b8dc2c50bc29365d5ee754a2e
Reviewed-on: https://review.whamcloud.com/38235
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoMerge "LU-9121 lnet: User Defined Selection Policy (UDSP)"
Gerrit Code Review [Fri, 26 Feb 2021 07:21:34 +0000 (07:21 +0000)]
Merge "LU-9121 lnet: User Defined Selection Policy (UDSP)"

9 months agoLU-12682 llite: fake symlink type of foreign file/dir 56/35856/52
Bruno Faccini [Thu, 22 Aug 2019 08:22:53 +0000 (10:22 +0200)]
LU-12682 llite: fake symlink type of foreign file/dir

This patch implements a "fake symlink" specific usage of the
"foreign" LOV/LMV format. It allows this particular
type of foreign file/dir to behave as a symlink from
the VFS point of view: a relative path is constructed
from the LOV/LMV foreign content, complemented with
a prefix, and then exposed to the VFS as the symlink
destination. The default/internal mechanism simply takes
the full foreign free string as the relative path. For
more complex internal formats, an upcall has been
implemented to provide the format's details to the llite
layer (presently just in terms of constant strings and
substring positions in the EA, but this can be enhanced).
Using this feature instead of real symlinks or a user EA
makes it possible to benefit from the special features (lock,
prefetch, caches) already implemented to handle both
LOV/LMV EAs.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Id3c262e3b042447aa09aad25f682ff02787b350d
Reviewed-on: https://review.whamcloud.com/35856
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14444 gss: handle empty reqmsg in sptlrpc_req_ctx_switch 85/41685/2
Sebastien Buisson [Thu, 18 Feb 2021 11:03:31 +0000 (20:03 +0900)]
LU-14444 gss: handle empty reqmsg in sptlrpc_req_ctx_switch

In sptlrpc_req_ctx_switch(), everything is already there to handle
the case of a ptlrpc_request that has an empty rq_reqmsg.
But assertions were left over at the beginning of the function, so
just remove them from here.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6ae1f8b9da9600d3b57b9efc9018c2461114f2fe
Reviewed-on: https://review.whamcloud.com/41685
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14455 mdt: fix DoM lock prolong logic 01/41701/3
Mikhail Pershin [Fri, 19 Feb 2021 20:50:54 +0000 (23:50 +0300)]
LU-14455 mdt: fix DoM lock prolong logic

- don't stop at the first found lock if it is not PW or EX lock
- add LCK_GROUP lock as valid mode of lock to check

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: If947f8565008953cc34146b6f0ac1e0f0a038bb5
Reviewed-on: https://review.whamcloud.com/41701
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14436 tgt: only use T10PI guard when doing full sector read 77/41677/2
Li Dongyang [Tue, 16 Feb 2021 12:40:05 +0000 (23:40 +1100)]
LU-14436 tgt: only use T10PI guard when doing full sector read

The T10PI guard is generated on full sectors. If we
do a partial read and still use the guard, the RPC
checksum won't match.
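
As a rough illustration of the condition (not the actual tgt code), a read
can only reuse per-sector guards when it covers whole sectors. The helper
name and the 512-byte sector size are assumptions made for this sketch.

```python
SECTOR_SIZE = 512  # assumed sector size for the illustration


def covers_full_sectors(offset: int, length: int,
                        sector_size: int = SECTOR_SIZE) -> bool:
    """Return True only if the read starts and ends on sector boundaries."""
    return (length > 0
            and offset % sector_size == 0
            and length % sector_size == 0)
```

A caller would fall back to computing the checksum from the data whenever
this returns False.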

Test-Parameters: trivial
Change-Id: I40d481d703a46b9711021a162208b86a956bd8d1
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/41677
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14435 doc: include lfs-flushctx manpage inside packages 76/41676/4
Sebastien Buisson [Tue, 16 Feb 2021 08:58:25 +0000 (17:58 +0900)]
LU-14435 doc: include lfs-flushctx manpage inside packages

lfs manpage redirects to lfs-flushctx(1), so it has to be
included in the Lustre packages.

Test-Parameters: trivial
Fixes: c246a9ba04 ("LU-14263 gss: unlink revoked key")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I5c55f8a74eb6dac20fa85b6ea0663ad701341006
Reviewed-on: https://review.whamcloud.com/41676
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
9 months agoLU-14430 mdd: fix inheritance of big default ACLs 94/41494/6
Mikhail Pershin [Fri, 12 Feb 2021 07:16:24 +0000 (10:16 +0300)]
LU-14430 mdd: fix inheritance of big default ACLs

If the number of default ACLs in a directory is more than 31, then
mdd_acl_init() fails to inherit them for a newly created file.
This limitation is caused by using a fixed-size def_acl_buf buffer
in the mdd_create()->mdd_acl_init() call chain. Instead, the
default ACL buffer should be increased when it is needed.

The patch adds a check for -ERANGE after mdd_acl_init(), reallocates
the default ACL buffer with the required size and calls mdd_acl_init()
again. Thus big default ACLs are processed as expected.
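
The retry-on-ERANGE pattern can be modelled in Python as below. This is a
sketch of the control flow only: acl_init() and inherit_default_acl() are
stand-ins, not the mdd functions, and the buffer is modelled as an entry
count rather than a byte size.

```python
import errno

DEFAULT_BUF_ENTRIES = 31  # capacity of the fixed-size buffer in this model


def acl_init(acl_entries, buf_capacity):
    """Model of an init call that fails with -ERANGE on a too-small buffer."""
    if len(acl_entries) > buf_capacity:
        return -errno.ERANGE, None
    return 0, list(acl_entries)


def inherit_default_acl(acl_entries):
    """Try the fixed-size buffer first, then retry with the required size."""
    rc, buf = acl_init(acl_entries, DEFAULT_BUF_ENTRIES)
    if rc == -errno.ERANGE:
        # Reallocate with the required size and call the init again.
        rc, buf = acl_init(acl_entries, len(acl_entries))
    return rc, buf
```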

Fixes: 6350af100c20 ("LU-3437 mdd: Fix ACL/def_ACL during object creation")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I700da90c09f824955fcb8dc7ca0bc2f581f916a0
Reviewed-on: https://review.whamcloud.com/41494
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoNew tag 2.14.50 2.14.50 v2_14_50
Oleg Drokin [Mon, 22 Feb 2021 19:24:50 +0000 (14:24 -0500)]
New tag 2.14.50

Change-Id: I760c8197bafa0f7eb8d1f5d78863900d647ccddc
Signed-off-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-9121 lnet: User Defined Selection Policy (UDSP) 18/41718/1
Amir Shehata [Mon, 22 Feb 2021 17:20:31 +0000 (09:20 -0800)]
LU-9121 lnet: User Defined Selection Policy (UDSP)

User Defined Selection Policies (UDSP) are introduced to add
the ability of fine-grained traffic control. The policies are
instantiated on LNet constructs and allow preference of some
constructs over others as an extension of the selection algorithm.
The order of operation is defined by the selection
algorithm's logical flow:
   1. Iterate over all the networks that a peer can be reached on
      and select the best local network
      - The remote network with the highest priority is examined
        (Network Rule)
      - The local network with the highest priority is selected
        (Network Rule)
      - The local NI with the highest priority is selected
        (NID Rule)
   2. If the peer is a remote peer and has no local networks,
      - then select the remote peer network with the highest
        priority (Network Rule)
      - Select the highest priority remote peer_ni on the network
        selected (NID Rule)
      - Now that the peer's network and NI are decided, select
        the router in round robin from the peer NI's preferred
        router list. (Router Rule)
      - Select the highest priority local NI on the local net of the
        selected route.
        (NID Rule)
   3. Otherwise for local peers, select the peer_ni from the peer.
      - highest priority peer NI is selected
        (NID Rule)
      - Select the peer NI which has the local NI selected on its
        preferred list.
        (NID Pair Rule)

   Accordingly, the User Interface allows for the following:
   - Adding a local network udsp: if multiple local networks are
     available, each one can have a priority.
   - Adding a local NID udsp: after a local network is chosen,
     if there are multiple NIs, each one can have a priority.
   - Adding a remote NID udsp: assign priority to a peer NID.
   - Adding a NID pair udsp: allows specifying local NIDs
     to be added to the list on the specified peer NIs.
     When selecting a peer NI, the one with the
     local NID being used on its list is preferred.
   - Adding a Router udsp: similar to the NID pair udsp.
     Specified router NIDs are added on the list on the specified
     peer NIs. When sending to a remote peer, remote net is selected
     and the peer NID is selected. The router which has its nid on
     the peer NI list is preferred.
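
   The NID pair preference described above can be illustrated with a small
   Python model. It is not the LNet implementation: the data layout, the
   function name and the fallback to the first peer NI when nothing matches
   are assumptions for this sketch.

```python
def pick_peer_ni(peer_nis, local_nid):
    """Prefer the peer NI whose preferred list contains the local NID in use.

    peer_nis is a list of (peer_nid, preferred_local_nids) tuples.
    """
    for peer_nid, preferred in peer_nis:
        if local_nid in preferred:
            return peer_nid
    # Assumed fallback: no preference matched, take the first candidate.
    return peer_nis[0][0] if peer_nis else None
```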

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0d94d160bdb8c09c1cf467f1c155e14ab761a47e

9 months agoLU-9121 lnet: Add info on udsp to lnetctl man page multi-rail
Serguei Smirnov [Thu, 21 May 2020 23:28:39 +0000 (19:28 -0400)]
LU-9121 lnet: Add info on udsp to lnetctl man page

Adding description of UDSP commands to lnetctl man page.
Listing each UDSP rule type with specific parameters.
Adding some examples of UDSP commands.

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I1bde8a4da217c9ba89f31b5d9a1e7d26658bfb40
Reviewed-on: https://review.whamcloud.com/38698
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
9 months agoLU-9121 lnet: add show udsp command
Amir Shehata [Tue, 2 Apr 2019 23:16:59 +0000 (16:16 -0700)]
LU-9121 lnet: add show udsp command

Add the show udsp command in liblnetconfig and lnetctl

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ibddfa60a7b257b136a6be403e94d6f73fb444222
Reviewed-on: https://review.whamcloud.com/34580
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-9121 lnet: Delete a selection policy
Sonia Sharma [Fri, 29 Mar 2019 10:51:01 +0000 (06:51 -0400)]
LU-9121 lnet: Delete a selection policy

This patch adds the function lustre_lnet_del_udsp()
which takes index of the udsp as input and calls
the IOC_LIBCFS_DEL_UDSP ioctl handler to
delete the udsp with that index value.

Change-Id: Ib81c79fd4e76f4e8fbd84709e08cb5d29a059e63
Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34553
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-9121 lnet: Add a selection policy
Sonia Sharma [Wed, 20 Mar 2019 05:20:12 +0000 (01:20 -0400)]
LU-9121 lnet: Add a selection policy

This patch adds the function lustre_lnet_add_udsp()
which marshals the input rules and makes an ioctl
call to add those rules in the lnet structures.

Change-Id: I5fa1e34d401b9eca58381e23cac2eb49c4dd3575
Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34529
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-9121 lnet: Add the userspace Marshalling API
Sonia Sharma [Tue, 19 Mar 2019 18:56:35 +0000 (14:56 -0400)]
LU-9121 lnet: Add the userspace Marshalling API

Given a UDSP and a memory block large
enough to hold a marshalled UDSP, marshal
the UDSP pointed to by udsp into the memory block.

Change-Id: I67e9c2cc0d7f3dab1e968019d5ee546e6fefeba3
Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34513
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
9 months agoLU-9121 lnet: Add the userspace De-Marshalling API
Sonia Sharma [Tue, 19 Mar 2019 10:00:39 +0000 (06:00 -0400)]
LU-9121 lnet: Add the userspace De-Marshalling API

Given a bulk containing a single UDSP, the De-Marshalling
API demarshals it and populates a udsp structure.

Change-Id: Id2502dffc6fd259ad552c018b0cab415a12a8cfe
Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34501
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-9121 lnet: ioctl handler for get policy info
Amir Shehata [Wed, 3 Apr 2019 00:01:06 +0000 (17:01 -0700)]
LU-9121 lnet: ioctl handler for get policy info

Add ioctl handler for GET_UDSP_SIZE and GET_UDSP

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4e188cac92ca98c26b26e00b8917ad3632d99d06
Reviewed-on: https://review.whamcloud.com/34579
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
9 months agoLU-9121 lnet: ioctl handler for "delete policy"
Sonia Sharma [Fri, 29 Mar 2019 08:19:35 +0000 (04:19 -0400)]
LU-9121 lnet: ioctl handler for "delete policy"

The ioctl handler for "delete policy" deletes
the policy with the given index value. It
returns 0 if a policy with that index is found;
otherwise it returns -EINVAL.
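
The return convention can be sketched in Python. This models the policy list
as a plain list and is not the kernel handler; del_policy() is a hypothetical
name.

```python
import errno


def del_policy(policies: list, idx: int) -> int:
    """Delete the policy at idx; return 0 on success, -EINVAL if not found."""
    if 0 <= idx < len(policies):
        del policies[idx]
        return 0
    return -errno.EINVAL
```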

Change-Id: I037c6808e71e5c13ccf46c243145b0ce6f1229cb
Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34552
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
9 months agoLU-9121 lnet: Add the ioctl handler for "add policy"
Sonia Sharma [Tue, 19 Mar 2019 23:14:13 +0000 (19:14 -0400)]
LU-9121 lnet: Add the ioctl handler for "add policy"

The ioctl handler for "add policy" de-marshals the
udsp rules passed from userspace and then adds the
rules if there is no copy of the same rules already
added. It then applies the rules to the existing LNet
constructs.

Change-Id: Ia76ecdea6de94e1dbcfe71bfcb5a1753fb3c874d
Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34514
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
9 months agoLU-9121 lnet: Add the kernel level De-Marshalling API
Sonia Sharma [Sat, 16 Mar 2019 09:23:02 +0000 (05:23 -0400)]
LU-9121 lnet: Add the kernel level De-Marshalling API

Given a bulk allocated from userspace containing a
single UDSP, the De-Marshalling API demarshals it
and populates the provided udsp structure.

Change-Id: I0a5a9148f8b2abc64284cb7780c37cbc8063b828
Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34488
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
9 months agoLU-9121 lnet: Add the kernel level Marshalling API
Sonia Sharma [Sat, 23 Feb 2019 04:07:45 +0000 (23:07 -0500)]
LU-9121 lnet: Add the kernel level Marshalling API

Given a UDSP, Marshal the UDSP pointed to by udsp
into the memory block that is allocated from userspace.

Change-Id: I325c977fd9c902f7ee31fceaaf07abf27ef7391e
Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34403
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
9 months agoLU-9121 lnet: Apply UDSP on local and remote NIs
Amir Shehata [Fri, 1 Mar 2019 00:57:30 +0000 (16:57 -0800)]
LU-9121 lnet: Apply UDSP on local and remote NIs

When a peer net, peer ni, local net or local ni is created,
apply the UDSPs in the system on these constructs.

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0ee427f1bc0d1c8d1ddc17072dcbd9442403fa0f
Reviewed-on: https://review.whamcloud.com/34355
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-9121 lnet: UDSP handling
Amir Shehata [Fri, 22 Feb 2019 01:39:06 +0000 (17:39 -0800)]
LU-9121 lnet: UDSP handling

This patch adds the following functionality:
1. Add UDSPs
2. Delete UDSPs
3. Apply UDSPs

- Adding a local network udsp: if multiple local networks are
available, each one can have a priority.
- Adding a local NID udsp: after a local network is chosen,
if there are multiple NIs, each one can have a priority.
- Adding a remote NID udsp: assign priority to peer NIDs.
- Adding a NID pair udsp: allows specifying local NIDs
to be added to the list on the specified peer NIs. When
selecting a peer NI, the one with the local NID being used
on its list is preferred.
- Adding a Router udsp: similar to the NID pair udsp.
Specified router NIDs are added on the list on the specified
peer NIs. When sending to the remote peer, remote net is
selected and the peer NID is selected. The router which has
its nid on the peer NI list is preferred.
- Deleting a udsp: use the specified policy index to remove it
from the policy list.

Generally, the syntax is as follows
 lnetctl policy <add | del | show>
  --src: ip2nets syntax specifying the local NID to match
  --dst: ip2nets syntax specifying the remote NID to match
  --rte: ip2nets syntax specifying the router NID to match
  --priority: Priority to apply to rule matches
  --idx: Index of where to insert the rule. By default it appends
 to the end of the rule list
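
The --idx behaviour (insert at an index, append by default) can be modelled
in Python as below. This is only a sketch: add_policy() and the rule
dictionaries are hypothetical, and treating an out-of-range index as an
append is an assumption.

```python
def add_policy(policies: list, rule: dict, idx=None) -> list:
    """Insert rule at idx, or append when idx is omitted or out of range."""
    if idx is None or idx < 0 or idx >= len(policies):
        policies.append(rule)
    else:
        policies.insert(idx, rule)
    return policies
```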

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: If588b78f3d5fce286270afb02e0e7683185d6898
Reviewed-on: https://review.whamcloud.com/34354
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
9 months agoLU-9121 lnet: select best peer and local net
Amir Shehata [Sat, 16 Feb 2019 01:59:40 +0000 (17:59 -0800)]
LU-9121 lnet: select best peer and local net

Select the healthiest and highest priority peer and local net when
sending a message.

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I42717e7fdc3226c6faa7c59c713f18422e27f2e5
Reviewed-on: https://review.whamcloud.com/34352
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
9 months agoLU-9121 lnet: Select NI/peer NI with highest prio
Amir Shehata [Fri, 15 Feb 2019 23:09:51 +0000 (15:09 -0800)]
LU-9121 lnet: Select NI/peer NI with highest prio

Modify the selection algorithm to select the highest priority
local and peer NI. Health always trumps all other selection
criteria.
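
One way to picture "health first, then priority" is a two-part sort key, as
in the Python sketch below. This is not the LNet code: the field names are
hypothetical, and it assumes that a larger health value is better and that a
lower priority number means a higher priority.

```python
def select_ni(nis):
    """Pick the healthiest NI; break ties by priority (lower number wins)."""
    best = min(nis, key=lambda ni: (-ni["health"], ni["priority"]))
    return best["nid"]
```

With this key a less healthy NI never wins, whatever its priority is.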

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I487a706f4da30311d0bd59fe03f72dbe68a52425
Reviewed-on: https://review.whamcloud.com/34351
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
9 months agoLU-9121 lnet: Preferred gateway selection
Amir Shehata [Wed, 20 Feb 2019 02:13:40 +0000 (18:13 -0800)]
LU-9121 lnet: Preferred gateway selection

Add mechanism for managing preferred gateway lists.
When selecting a route through a gateway, if there exists
a preferred gateway list for the destination peer, then choose
the preferred gateway. If there are multiple preferred
gateways, to make the selection, use in order of decreasing
importance: route priority, number of hops, number of available
tx credits on the associated lpni and route sequence counters.
If there are no preferred routes, select the best route
available using the same criteria.
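
The ordering of the criteria can be sketched in Python as a tuple comparison.
This is an illustration rather than the LNet routing code: the Route class is
hypothetical, and the direction of each comparison (lower priority value,
fewer hops, more tx credits, lower sequence counter) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Route:
    gateway: str
    priority: int
    hops: int
    tx_credits: int
    seq: int


def route_key(route: Route):
    """Criteria in order of decreasing importance."""
    return (route.priority, route.hops, -route.tx_credits, route.seq)


def select_route(routes, preferred_gateways=()):
    """Choose among preferred gateways if any exist, else among all routes."""
    preferred = [r for r in routes if r.gateway in preferred_gateways]
    candidates = preferred or routes
    return min(candidates, key=route_key)
```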

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If46920cf7b79aa8b211d6c0a35995edce9b1699a
Reviewed-on: https://review.whamcloud.com/34353
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
9 months agoLU-9121 lnet: foundation patch for selection mod
Amir Shehata [Fri, 15 Feb 2019 20:45:28 +0000 (12:45 -0800)]
LU-9121 lnet: foundation patch for selection mod

Add the priority and preferred NIDs fields to lnet_ni,
lnet_net, lnet_peer_net and lnet_peer_ni. Switch
the implementation of the preferred NIDs list to list_head
instead of an array, because the code is more straightforward.
There is more memory overhead due to list_head, but these lists
are expected to be small, so I chose code simplicity over memory.

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I0c75855b736345c25e1604083eee2b65d38ef28d
Reviewed-on: https://review.whamcloud.com/34350
Reviewed-by: Chris Horn <chris.horn@hpe.com>
9 months agoLU-9121 lnet: UDSP liblnetconfig structure def
Amir Shehata [Thu, 14 Feb 2019 01:22:57 +0000 (17:22 -0800)]
LU-9121 lnet: UDSP liblnetconfig structure def

Definition of the UDSP structures used to store parsed UDSPs

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifc61a191de85d7904c70917d0c45648aecdd0310
Reviewed-on: https://review.whamcloud.com/34254
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
9 months agoLU-9121 lnet: UDSP storage and marshalled structs
Amir Shehata [Wed, 13 Feb 2019 23:21:54 +0000 (15:21 -0800)]
LU-9121 lnet: UDSP storage and marshalled structs

Commit the structures which will be used by kernel space
to store the UDSPs. This commit also adds the IOCTL structures
which are used for marshalling the UDSPs between user and
kernel space.

Test-Parameters: trivial testlist=lnet-selftest,sanity-lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I490fa9a050cb1f8debc381be773cda4ce8abe29b
Reviewed-on: https://review.whamcloud.com/34253
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
9 months agoNew release 2.14.0 2.14.0 v2_14_0
Oleg Drokin [Fri, 19 Feb 2021 19:28:17 +0000 (14:28 -0500)]
New release 2.14.0

Change-Id: I2eb99af8fbeaab80b6614e427b77949b1225b406
Signed-off-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14345 misc: update e2fsprogs to 1.45.6.wc5 33/41433/3
Andreas Dilger [Mon, 8 Feb 2021 11:56:25 +0000 (04:56 -0700)]
LU-14345 misc: update e2fsprogs to 1.45.6.wc5

Update Changelog to reference new e2fsprogs release.

4aea203f LU-5949 e2fsck: call delete_inode() properly
8725134d LU-5949 e2fsck: simplify inode badness handling
71b74579 LU-14345 e2fsck: fix check of directories over 4GB

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie0845eeed410f6f9f8ef985342fc19d160aa8cb0
Reviewed-on: https://review.whamcloud.com/41433
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoNew RC 2.14.0-RC3 2.14.0-RC3 v2_14_0-RC3
Oleg Drokin [Sat, 13 Feb 2021 00:52:28 +0000 (19:52 -0500)]
New RC 2.14.0-RC3

Change-Id: I594b5c6d0da7f067bef69fa7a7027374d4434dd8
Signed-off-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14424 Revert "LU-9679 osc: simplify osc_extent_find()" 98/41498/2
Oleg Drokin [Fri, 12 Feb 2021 15:55:42 +0000 (10:55 -0500)]
LU-14424 Revert "LU-9679 osc: simplify osc_extent_find()"

It looks like there are performance regressions attributed to this patch.

This reverts commit 80e21cce3dd6748fd760786cafe9c26d502fd74f.

Change-Id: I55e0abd50573dd82a9d216f9c3b01483f99c3223
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41498
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
9 months agoNew release candidate 2.14.0 RC2 2.14.0-RC2 v2_14_0-RC2
Oleg Drokin [Mon, 8 Feb 2021 22:13:56 +0000 (17:13 -0500)]
New release candidate 2.14.0 RC2

Change-Id: Iad3d71e7dcf96173d192717ef4fef3f0dc12b051
Signed-off-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13751 tests: remove read of changelog sanity 160j 17/41317/7
James Nunez [Tue, 26 Jan 2021 01:15:49 +0000 (18:15 -0700)]
LU-13751 tests: remove read of changelog sanity 160j

sanity test 160j tries to read the changelog after one of two
client mounts is unmounted.  In this case, we can fail to read
the changelog and get a "Cannot send after transport endpoint
shutdown" error.

The intention of sanity test 160j is to check that
there is no LBUG due to missed obd device.  So, do not try to
read from the changelog after file system unmount.

Test-Parameters: trivial testlist=sanity env=ONLY=160j,ONLY_REPEAT=200
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I1746a422b25d546b9aae38ae8438d9c08bce8827
Reviewed-on: https://review.whamcloud.com/41317
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
9 months agoLU-14355 ptlrpc: do not output error when imp_sec is freed 10/41310/2
Sebastien Buisson [Mon, 25 Jan 2021 08:24:19 +0000 (17:24 +0900)]
LU-14355 ptlrpc: do not output error when imp_sec is freed

There is a race condition on client reconnect when the import is being
destroyed.  Some outstanding client bound requests are being processed
when the imp_sec has already been freed.
Ensure to output the error message in import_sec_validate_get() only
if import is not already in the zombie work queue.

Fixes: 135fea8fa9 ("LU-4423 obdclass: use workqueue for zombie management")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4b431128e04f11b1e3ee7de47090af87538c3558
Reviewed-on: https://review.whamcloud.com/41310
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14299 test: sleep to enable quota acquire again 89/41389/3
Hongchao Zhang [Fri, 29 Jan 2021 20:51:43 +0000 (04:51 +0800)]
LU-14299 test: sleep to enable quota acquire again

sanity-quota test_61 fails with incorrect quota exceeded
errors because quota acquire is disabled for 5 seconds
after the edquot flag is set.  The test should introduce some
delay between the over-quota test and the normal one.

Test-Parameters: trivial fstype=zfs testlist=sanity-quota env=ONLY=61,ONLY_REPEAT=20
Fixes: 530881fe4ee20 ("LU-7816 quota: add default quota setting support")
Change-Id: I8040ba960f32cf01cb7cee3a77c06ad4bd732f0e
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41389
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13449 tests: fix recovery-small test_140b check 09/39909/2
Andreas Dilger [Mon, 14 Sep 2020 23:07:17 +0000 (17:07 -0600)]
LU-13449 tests: fix recovery-small test_140b check

The recovery timer is printed in MM:SS format, but the current test
is unhappy if the SS part is printed as "08" or "09" since that is
interpreted by bash as an invalid octal number.  Also, the current
check does not handle the case if recovery is longer than a minute.

Change the code to convert MM:SS back to seconds for the comparison.
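
The conversion itself is simple; a Python equivalent is shown here only to
illustrate the arithmetic (the test script does this in bash, and the
function name is hypothetical). Parsing each field explicitly as base 10
avoids treating a leading zero as an octal prefix.

```python
def mmss_to_seconds(text: str) -> int:
    """Convert an 'MM:SS' timer string to a number of seconds."""
    minutes, seconds = text.split(":")
    # Parse both fields as base 10 so "08" and "09" are plain decimals.
    return int(minutes, 10) * 60 + int(seconds, 10)
```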

Test-Parameters: trivial testlist=recovery-small env=ONLY=140
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie1dc77a88bb0e8fd5025f2b5ca57d4a61d3ebbe5
Reviewed-on: https://review.whamcloud.com/39909
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14316 llite: quiet spurious ioctl warning 27/41427/2
Andreas Dilger [Fri, 5 Feb 2021 20:13:10 +0000 (13:13 -0700)]
LU-14316 llite: quiet spurious ioctl warning

Calling "lfs setstripe" prints a spurious warning about using the old
ioctl(LL_IOC_LOV_GETSTRIPE) when that is not actually the case.

Remove the ioctl warning for now and deal with related issues later.

Fixes: 364ec95f3688 ("LU-9367 llite: restore ll_file_getstripe in ll_lov_setstripe")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I20f5a7adb60a30fce27e49827bd46229e2ce7057
Reviewed-on: https://review.whamcloud.com/41427
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-11607 tests: replace lustre_version/fstype in sanity-scrub 29/35929/10
James Nunez [Mon, 26 Aug 2019 22:45:40 +0000 (16:45 -0600)]
LU-11607 tests: replace lustre_version/fstype in sanity-scrub

The routine get_lustre_env() is available to all Lustre test
suites and sets an environment variable for the Lustre
version and file system types of servers.

In sanity-scrub, sanity-hsm and sanity-lfsck, replace calls
to lustre_version_code() and facet_fstype() for all server
types with definitions from get_lustre_env().

Clean up around any modifications by removing calls to
return after skip() or skip_env().

Fixes: c8790ae52393 (LU-1538 tests: standardize test script init - dne-part-2)
Fixes: c54b6ca2bdb5 (LU-13718 tests: add LU numbers to skipped tests)
Test-Parameters: trivial fstype=ldiskfs testlist=sanity-scrub,sanity-hsm,sanity-lfsck
Test-Parameters: fstype=zfs testlist=sanity-scrub,sanity-hsm,sanity-lfsck
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I5aa2898f6efef127cf2d7f4a2f08838f503c51ab
Reviewed-on: https://review.whamcloud.com/35929
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
9 months agoLU-14389 lov: avoid NULL dereference in cleanup 98/41398/2
Andreas Dilger [Sun, 31 Jan 2021 07:20:47 +0000 (00:20 -0700)]
LU-14389 lov: avoid NULL dereference in cleanup

Running racer concurrently with file migration crashes easily
when the layout changes for a file in an unexpected way:

  lov_init_composite() lustre-clilov: DOM entries with different sizes
  lov_layout_change() lustre-clilov: cannot apply new layout on
      [0x200000402:0x3e6a:0x0] : rc = -22
  BUG: unable to handle kernel NULL pointer dereference at 0x00000014
  IP: [<ffffffffa08baef4>] lov_delete_composite+0x104/0x540 [lov]
  Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
  CPU: 1 PID: 20227 Comm: ln

Avoid the NULL dereference if the entry is not fully initialized
during cleanup.

Test-Parameters: testlist=racer env=DURATION=3600
Fixes: 61a002cd863 ("LU-13602 flr: skip unknown FLR component types")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8fe17f1b49ca2bccc7a285febe47032d023ebbe5
Reviewed-on: https://review.whamcloud.com/41398
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13669 llite: make readahead aware of hints 28/41228/3
Wang Shilong [Sat, 9 Jan 2021 10:28:43 +0000 (18:28 +0800)]
LU-13669 llite: make readahead aware of hints

Calling madvise(MADV_SEQUENTIAL) and madvise(MADV_RANDOM) sets the
VM_SEQ_READ and VM_RAND_READ hints in vma->vm_flags.  These should
be used to guide the Lustre readahead for better performance.

Disable the kernel readahead for mmap() pages and use the llite
readahead instead.  There was also a bug in ll_fault0() that would
set both VM_SEQ_READ and VM_RAND_READ at the same time, which was
confusing the detection of the VM_SEQ_READ case, since VM_RAND_READ
was being checked first.

This changes the readahead for mmap from submitting mostly 4KB RPCs
to a large number of 1MB RPCs for the application profiled:

  llite.*.read_ahead_stats     before        patched
  ------------------------     ------        -------
  hits                           2408         135924 samples [pages]
  misses                        34160           2384 samples [pages]

  osc.*.rpc_stats           read before    read patched
  ---------------          -------------  --------------
  pages per rpc            rpcs   % cum%   rpcs   % cum%
     1:                    6542  95  95     351  55  55
     2:                     224   3  99      76  12  67
     4:                      32   0  99      28   4  72
     8:                       2   0  99       9   1  73
    16:                      25   0  99      32   5  78
    32:                       0   0  99       8   1  80
    64:                       0   0  99       5   0  80
   128:                       0   0  99      15   2  83
   256:                       2   0  99     102  16  99
   512:                       0   0  99       0   0  99
  1024:                       1   0 100       3   0 100

Readahead hit rate improved from 6% to 98%, and 4KB RPCs dropped from
95% to 55% and 1MB+ RPCs increased from 0% to 16% (79% of all pages).
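
The quoted percentages can be rechecked from the tables above with
integer shell arithmetic.  The helper name pct is hypothetical; the
counts are copied from the read_ahead_stats and rpc_stats columns, and
results are truncated rather than rounded.

```shell
# pct PART TOTAL: integer percentage, truncated (hypothetical helper).
pct() {
	echo $(( $1 * 100 / $2 ))
}

# Readahead hit rate = hits / (hits + misses).
pct 2408 $((2408 + 34160))        # before:  prints 6
pct 135924 $((135924 + 2384))     # patched: prints 98

# Share of pages moved by 256-page (1MB) RPCs in the patched run:
# total pages = sum of (pages per rpc * rpcs) over every row.
total=$((351*1 + 76*2 + 28*4 + 9*8 + 32*16 + 8*32 + 5*64 + 15*128 + 102*256 + 3*1024))
pct $((102 * 256)) "$total"       # prints 79
```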

Add debug to ll_file_mmap(), ll_fault() and ll_fault_io_init() to
allow tracing VMA state functions for future IO optimizations.

Fixes: 62ef9c949753 ("add 2.6.27 kernel support")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I4bbb028db05b21ae01dafe6a7bea398e9b74d8a4
Reviewed-on: https://review.whamcloud.com/41228
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-10958 ofd: data corruption due to RPC reordering 81/32281/18
Andrew Perepechko [Mon, 9 Dec 2019 17:13:50 +0000 (20:13 +0300)]
LU-10958 ofd: data corruption due to RPC reordering

Without read-only cache, it is possible that a client
resends a BRW RPC, receives a reply from the original
BRW RPC, modifies the same data and sends a new BRW
RPC, however, because of RPC reordering stale data
gets to disk.

Let's use range locking to protect against this race.

Change-Id: I35cbf95594601eacfc5f108b14e4c447962b0bbf
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Cray-bug-id: LUS-5578,LUS-8943
Reviewed-on: https://review.whamcloud.com/32281
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoNew release candidate 2.14.0-RC1 2.14.0-RC1 v2_14_0-RC1
Oleg Drokin [Mon, 1 Feb 2021 19:11:10 +0000 (14:11 -0500)]
New release candidate 2.14.0-RC1

Change-Id: I54bced5067f605bae67faffce46d89383dc69a39
Signed-off-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13184 tests: wait for OST startup in test_112 93/37393/8
Andreas Dilger [Fri, 31 Jan 2020 21:09:14 +0000 (14:09 -0700)]
LU-13184 tests: wait for OST startup in test_112

Wait for OST0000 to finish mounting and connecting to the MDS before
forcing any files to be created there.  Otherwise, intermittent failures
are seen because the OST has no objects and returns -ERANGE:

    lfs setstripe: 'f112.conf-sanity.0': Numerical result out of range
    problem creating f112.conf-sanity.0 on OST0000

Test-Parameters: trivial testlist=conf-sanity envdefinitions=ONLY=112,ONLY_REPEAT=100
Fixes: 416e67222b76 ("LU-12036 ofd: add 'no_precreate' mount option")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If2d547a42c51a7028803ec25680931a7593ebbe5
Reviewed-on: https://review.whamcloud.com/37393
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-14307 quota: fix NULL pointer dereference 64/41264/7
Sergey Cheremencev [Mon, 18 Jan 2021 22:11:45 +0000 (01:11 +0300)]
LU-14307 quota: fix NULL pointer dereference

Fix NULL pointer dereference at 0x20 in
qmt_trans_start_with_slv->lquota_lqe_debug.

HPE-bug-id: LUS-9662
Change-Id: Iead0df053ae0dcb7453c1910a4b4b7a3728da829
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/41264
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14356 utils: change position of the -lext2fs option 05/41305/2
Jian Yu [Sat, 23 Jan 2021 20:47:29 +0000 (12:47 -0800)]
LU-14356 utils: change position of the -lext2fs option

This patch changes the position of the -lext2fs option
in the gcc command line so as to resolve the following
issue:

mount_osd_ldiskfs.so: undefined symbol: unix_io_manager

Change-Id: I9ceaca867697c132b8d4a7800169101a024d17b8
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41305
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-13578 test: use a single read() in sanity test_39r 73/40973/3
John L. Hammond [Tue, 15 Dec 2020 15:34:34 +0000 (09:34 -0600)]
LU-13578 test: use a single read() in sanity test_39r

In sanity test_39r() ensure that we only call read() once on the
file to update its atime. Otherwise the file atime may be greater
than the OST object atime due to a final read() done by dd which
returns no bytes and does not generate a BRW RPC to the OST. Even
though it returns 0 bytes, it requested a non-zero number of bytes and
is therefore required to update the file access time.

Fixes: 7c9ce8aac ("LU-13383 ofd: lazy atime update")
Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=400
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I1dd4e93ac150b34f4fc943fe25b92bf9119b0461
Reviewed-on: https://review.whamcloud.com/40973
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
10 months agoLU-14198 build: Use pax tar format for build of dist-targets 15/40915/2
Daniel Ahlin [Tue, 8 Dec 2020 23:30:34 +0000 (00:30 +0100)]
LU-14198 build: Use pax tar format for build of dist-targets

   The tar ustar format used during build of dist-* targets (which is
   used when building e.g. rpms and debs) has several limitations and
   will prevent e.g. users with uid > 2097151 (2^21 - 1) from building
   these targets (or from passing the autoconf test for tar).

   This commit changes format from ustar to POSIX.1-2001/pax which
   will remove this and several other limits (with path-length being
   one relevant example, see LU-12078).
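
The uid limit comes from the ustar header, which stores the uid as an
octal string; the sketch below assumes the usual 8-byte field holding
7 octal digits plus a terminator.  The helper name fits_ustar_id is
hypothetical.

```shell
# Largest value that fits in 7 octal digits.
printf '%o\n' 2097151      # prints 7777777
printf '%o\n' 2097152      # prints 10000000 (8 digits, too wide)

# Hypothetical check: can this uid/gid be stored in a ustar header?
fits_ustar_id() {
	[ "$1" -le 2097151 ]
}

fits_ustar_id "$(id -u)" || echo "uid too large for ustar, pax format needed"
```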

Signed-off-by: Daniel Ahlin <ahlin@google.com>
Change-Id: Ic66ca696ede2e359a04c179c6d630baacaa9bcb1
Reviewed-on: https://review.whamcloud.com/40915
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
10 months agoLU-14286 osd-ldiskfs: enable fallocate by default 15/41315/3
Andreas Dilger [Mon, 25 Jan 2021 23:18:09 +0000 (16:18 -0700)]
LU-14286 osd-ldiskfs: enable fallocate by default

Enable fallocate on ldiskfs OSTs by default now that the known
problems have been resolved.  The default mode=0 is the standard
"allocate unwritten extents" behavior used by ext4.  This is by
far the fastest for space allocation, but requires the unwritten
extents to be split and/or zeroed when they are overwritten.

The OST fallocate mode=1 can also be set to use "zeroed extents",
which may be handled by "WRITE SAME", "TRIM zeroes data", or
other low-level functionality in the underlying block device.
This is somewhat slower at fallocate() time (especially for very
large allocations), but still avoids sending any data over the
network and avoids runtime overhead from managing the extents.  There
is not yet an FALLOC_FL_* flag to request this behavior from the
client on a per-file basis.

If problems are hit in the field, fallocate can also be disabled
with mode=-1 at runtime or persistently.

   lctl set_param [-P] osd-ldiskfs.*.fallocate_zero_blocks=<mode>
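
As a rough sketch, the three modes named above map to commands like the
following.  The helper fallocate_mode_cmd is hypothetical and only
prints the command line, so it can be tried on a machine without lctl;
only the values -1, 0 and 1 described in this message are accepted.

```shell
# Print the lctl command for a given fallocate mode (hypothetical helper).
#   -1: fallocate disabled
#    0: allocate unwritten extents (default)
#    1: allocate zeroed extents
# Pass "-P" as the second argument to make the setting persistent.
fallocate_mode_cmd() {
	case "$1" in
	-1|0|1)
		echo "lctl set_param ${2:+$2 }osd-ldiskfs.*.fallocate_zero_blocks=$1"
		;;
	*)
		echo "unsupported fallocate mode: $1" >&2
		return 1
		;;
	esac
}

fallocate_mode_cmd 1        # prints: lctl set_param osd-ldiskfs.*.fallocate_zero_blocks=1
fallocate_mode_cmd -1 -P    # prints: lctl set_param -P osd-ldiskfs.*.fallocate_zero_blocks=-1
```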

Ensure that all of the tests which currently use fallocate() are
enabling it for test runs, even if the default changes again.

Fixes: 4f18e08099e5 ("LU-14286 osd-ldiskfs: fallocate with unwritten extents")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iefa71c525597d54fc82a3d6de27a50d4d2ce7057
Reviewed-on: https://review.whamcloud.com/41315
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14337 lov: return stripe_count=1 instead of 0 for DoM files 65/41265/2
Emoly Liu [Tue, 19 Jan 2021 04:06:06 +0000 (12:06 +0800)]
LU-14337 lov: return stripe_count=1 instead of 0 for DoM files

Return stripe_count=1 instead of 0 for DoM files to avoid
divide-by-zero for older userspace that calls this ioctl,
e.g. lustre ADIO driver.

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I43c5e01bdee834f9a05a669a3e6f3d5cd926cb87
Reviewed-on: https://review.whamcloud.com/41265
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14283 osc: avoid crash if ocd reset 25/41225/4
Andreas Dilger [Thu, 14 Jan 2021 16:01:21 +0000 (09:01 -0700)]
LU-14283 osc: avoid crash if ocd reset

Avoid divide-by-zero if OSC obd_connect_data is not fully initialized.
cl_ocd_grant_param is only set after cl_max_extent_pages is OK to use.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ibcee30b46e24ca3d4c2b571b27f3c0bb43f4bf71
Reviewed-on: https://review.whamcloud.com/41225
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14339 obdclass: add option %H for jobid 62/41262/3
Yang Sheng [Mon, 18 Jan 2021 17:46:05 +0000 (01:46 +0800)]
LU-14339 obdclass: add option %H for jobid

Add an option %H to avoid the jobid being too long in some cases.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Iaf70da5de25fd321a21e6e6cd7f7d211dca1adf3
Reviewed-on: https://review.whamcloud.com/41262
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
10 months agoLU-14306 sec: get rid of bad rss-counter state messages 99/41199/12
Sebastien Buisson [Mon, 11 Jan 2021 09:36:06 +0000 (09:36 +0000)]
LU-14306 sec: get rid of bad rss-counter state messages

When doing O_DIRECT IOs on encrypted files, messages about bad
rss-counter state can be seen in the console. The mm gets confused
because we twist the Lustre pages used for RPCs so that they are
suitable for llcrypt API.
In order to do this properly, the original mapping on these pages
must be preserved outside of the encryption/decryption needs.

Fixes: 728036f256 ("LU-12275 sec: O_DIRECT for encrypted file")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I80ebcd3f96c51a3d158d7ef66f23b8da13904c52
Reviewed-on: https://review.whamcloud.com/41199
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14286 osd-ldiskfs: fallocate with unwritten extents 04/41204/10
Andreas Dilger [Tue, 12 Jan 2021 04:38:31 +0000 (21:38 -0700)]
LU-14286 osd-ldiskfs: fallocate with unwritten extents

The osd_fallocate() code should typically be allocating unwritten
extents with LDISKFS_GET_BLOCKS_CREATE_UNWRIT_EXT instead of actually
zeroing the blocks on disk with LDISKFS_GET_BLOCKS_CREATE_ZERO.

Writing zeroes during fallocate() is typically slower initially, and
is causing timeouts in sanity test_150e, which is trying to fill up
all OSTs to 90%.  In some cases, zeroing the underlying blocks can
use the underlying storage support for efficient zeroing (WRITE_SAME),
so it may be faster for later use than uninitialized extents that have
to be converted to initialized extents by (possibly) splitting them
into smaller extents and/or zero filling them when they are partially
being overwritten.

Add a tunable parameter osd-ldiskfs.*.fallocate_zero_blocks to allow
selecting this behavior at runtime.  The default is -1, to disable
fallocate completely (return -EOPNOTSUPP) due to current bugs.

Test-Parameters: testlist=sanityn env=ONLY=16,ONLY_REPEAT=10
Fixes: 72617588ac8c ("LU-14286 osd-ldiskfs: fallocate() should zero new blocks")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ida3692c487fdc8918863fc5c99459caaba17d92e
Reviewed-on: https://review.whamcloud.com/41204
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14312 ldlm: don't change GROUP lock GID on client 68/41268/5
Mikhail Pershin [Tue, 19 Jan 2021 14:21:57 +0000 (17:21 +0300)]
LU-14312 ldlm: don't change GROUP lock GID on client

GROUP lock GID is part of inodebits policy and is passed
to the server from client in policy li_gid field.
Meanwhile the ldlm_ibits_policy_wire_to_local() is used on
client also when server reply or completion AST is processed,
so client original GID can be overwritten by server value.
This is not a problem if both server and client have the same
Lustre version, but if the server is older then it can have garbage
in the li_gid field and the client lock policy is updated with it.

Considering that GROUP lock GID is never changed and server should
not do that, the solution is to ignore returned li_gid from server
and never update original GID of GROUP lock on client from server
response.

Test-Parameters: testlist=sanity serverversion=2.12.6 env=ONLY=272b
Test-Parameters: testlist=sanity serverversion=2.13.0 serverdistro=el7.7 env=ONLY=272b

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9a82f4b3513fd93d63b92a9527cb7b89c635e61b
Reviewed-on: https://review.whamcloud.com/41268
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
10 months agoLU-14034 hsm: add PID file handling to lhsmtool_posix 53/40253/9
John L. Hammond [Tue, 29 Dec 2020 02:10:31 +0000 (21:10 -0500)]
LU-14034 hsm: add PID file handling to lhsmtool_posix

Add pid-file handling to lhsmtool_posix to prevent accidentally
running concurrent instances of the copytool. (Multiple instances are
still allowed if you do not use PID files or use separate files.)

Use the PID file to avoid needing libtool when stopping, continuing,
or killing the copytool from test scripts.
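
Signalling through a PID file can be sketched as below.  This is not
the code from the test scripts: the helper name and the PID file path
are hypothetical, and the way lhsmtool_posix is told where to write its
PID file is not shown here.

```shell
# Send a signal to the process recorded in a PID file (hypothetical helper).
signal_from_pidfile() {
	sig=$1
	pidfile=$2
	[ -r "$pidfile" ] || return 1
	pid=$(cat "$pidfile")
	kill -s "$sig" "$pid"
}

# Usage, assuming the copytool wrote its PID to /tmp/copytool.pid:
#   signal_from_pidfile STOP /tmp/copytool.pid    # pause
#   signal_from_pidfile CONT /tmp/copytool.pid    # resume
#   signal_from_pidfile TERM /tmp/copytool.pid    # stop
```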

Remove unnecessary libtool usage from test scripts.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I4f012a88c58f3a86a731df3b7d35ff32db047c2d
Reviewed-on: https://review.whamcloud.com/40253
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14326 osc: correctly update size/kms for fallocate 72/41272/2
Bobi Jam [Fri, 15 Jan 2021 05:33:17 +0000 (13:33 +0800)]
LU-14326 osc: correctly update size/kms for fallocate

* fallocate chose oa->o_size for falloc_offset and oa->o_blocks for
  falloc_end, but forgot to change attr->cat_size and attr->cat_kms
  to use sa_attr.lvb_size to update osc's lvb and kms if it expands
  the file's size.

  Other setattr IO uses @size (sa_falloc_offset in fallocate case) to
  update the lvb and kms.

* lock request extent for fallocate should be
  [sa_falloc_offset, sa_falloc_end)

* calculate sa_attr.lvb_size correctly for osc objects
  (lov_io_sub_inherit())

Test-Parameters: testlist=sanityn env=ONLY=16,COUNT=50000,ONLY_REPEAT=10
Test-Parameters: testlist=sanity-benchmark env=ONLY=fsx,COUNT=50000,ONLY_REPEAT=10
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Fixes: 48457868a02a ("LU-3606 fallocate: Implement fallocate preallocate operation")
Change-Id: I7dbed3bc6899a3db53284c8aac3cb9476e7958f5
Reviewed-on: https://review.whamcloud.com/41272
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14303 tests: parallel-scale test rr_alloc fails 92/41192/2
Yang Sheng [Sun, 10 Jan 2021 15:50:44 +0000 (23:50 +0800)]
LU-14303 tests: parallel-scale test rr_alloc fails

Correct the parameter for the DNE environment.  Otherwise the
test case will fail with 'No such file or directory'.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ib94c5b17a3b49153ac229bfc4dfcee39bd9f60d4
Reviewed-on: https://review.whamcloud.com/41192
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14334 lnet: update changelog 63/41263/4
Serguei Smirnov [Mon, 18 Jan 2021 19:20:47 +0000 (11:20 -0800)]
LU-14334 lnet: update changelog

Updated changelog to indicate changes in OFED/MOFED support
as well as new features added in this version of LNet.

Test-Parameters: trivial
Change-Id: I264f6566324da42fd51a8e159d172cbf0ae1a28b
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41263
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14283 obdclass: connect vs disconnect race 56/41256/3
Wang Shilong [Sat, 16 Jan 2021 12:46:13 +0000 (20:46 +0800)]
LU-14283 obdclass: connect vs disconnect race

There is a possible race if setup (connect) and cleanup
(disconnect) are interleaved (see the similar comments
in osc_disconnect()):

  Thread1:                        Thread2:
   connecting                      class_cleanup
     ptlrpc_connect_interpret
                                     obd->obd_setup = 0
       obd_import_event
         if (obd->obd_set_up)
           osc_init_grant() /*skipped*/
       ptlrpc_activate_import..

If an RPC is woken up and sent out before
class_disconnect_exports(), it might hit a divide-by-zero crash
in osc_announce_cached() because @cl_max_extent_pages is zero.

The problem is that @obd_setup is cleared too early.  It should be
cleared only when the OBD is really shut down.

Fixes: 45900a ("LU-4134 obdclass: obd_device improvement")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I898b6f53602c05221a3154a61615a0e270167ac6
Reviewed-on: https://review.whamcloud.com/41256
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-13602 flr: skip unknown FLR component types 13/39513/8
Qian Yingjin [Mon, 27 Jul 2020 03:56:22 +0000 (11:56 +0800)]
LU-13602 flr: skip unknown FLR component types

Currently, lov_init_composite() quits with an error when it
reads an unknown LOV pattern.
Since FLR will be used for upcoming new features, like PCC-RO,
an old client should still be able to read the old format of the
component types, and ignore and skip the new types of FLR component.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ica3fe98203d44b52cf25b085c34c83b1a4702464
Reviewed-on: https://review.whamcloud.com/39513
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-12125 llite: send file mode with rename RPC 84/41184/5
Andreas Dilger [Sat, 9 Jan 2021 07:23:39 +0000 (00:23 -0700)]
LU-12125 llite: send file mode with rename RPC

In preparation for parallel rename operations, send renamed file mode
to the MDS in order to allow the rename locking to be more efficient.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4691f30d151a8ff81e443d24109234341b3ebbe5
Reviewed-on: https://review.whamcloud.com/41184
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-14286 osd-ldiskfs: don't read unwritten blocks 16/41216/4
Alex Zhuravlev [Wed, 13 Jan 2021 15:48:54 +0000 (18:48 +0300)]
LU-14286 osd-ldiskfs: don't read unwritten blocks

Don't read unwritten blocks which were allocated using
fallocate(2); instead fill the pages with zeroes.

Add a test to verify that fallocated blocks read as zeroes.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9c0c90f93fd33f26f834144e225b2643cf9fffb7
Reviewed-on: https://review.whamcloud.com/41216
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
10 months agoLU-14194 cksum: add lprocfs checksum support in MDC/MDT 71/40971/5
Mikhail Pershin [Tue, 15 Dec 2020 13:56:21 +0000 (16:56 +0300)]
LU-14194 cksum: add lprocfs checksum support in MDC/MDT

Add missing support for checksum parameters in MDC and MDT.
Handle T10-PI parameters in MDT similar to OFD, move all
functionality to target code and unify its usage in both
targets.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7d397067304e028bf597d5c3ab16250731ccba9d
Reviewed-on: https://review.whamcloud.com/40971
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>