Whamcloud - gitweb
fs/lustre-release.git
2 months agoLU-14585 tests: use striped ha_test_dir 16/43216/2
Elena Gryaznova [Tue, 6 Apr 2021 07:13:41 +0000 (10:13 +0300)]
LU-14585 tests: use striped ha_test_dir

Currently ha.sh can create client striped directories with
different stripe parameters. We also need the ability to create
a striped working directory (ha_test_dir) so that we can reproduce
issues hit on directory trees where child directory stripe
settings differ from those of the parent.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9621
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I77612c360f0bf407bd0298f827b409b4a288540f
Reviewed-on: https://review.whamcloud.com/43216
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-8066 lov: move stripesize to sysfs 12/43212/3
James Simmons [Mon, 5 Apr 2021 21:04:27 +0000 (17:04 -0400)]
LU-8066 lov: move stripesize to sysfs

Only one simple proc file still exists for lov. Move stripesize to
sysfs.

Change-Id: I4db660f0e7af4d69c697f8c73547a245108adb9b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43212
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-13783 lnet: remove kernel_setsockopt() from lnet_sock_listen() 07/43207/2
Jian Yu [Mon, 5 Apr 2021 08:16:54 +0000 (01:16 -0700)]
LU-13783 lnet: remove kernel_setsockopt() from lnet_sock_listen()

Linux 5.8 removes kernel_setsockopt(). In Lustre commit
99d9638d6c0, kernel_setsockopt() calls were removed from the Lustre
code by using direct access or helper calls. However, the one in
lnet_sock_listen() was not removed. This patch removes it by
using the code from the previously defined kernel_setsockopt()
directly.

Fixes: 99d9638d6c0 ("LU-13783 libcfs: support removal of kernel_setsockopt()")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I5b74bf113185c0c9af5c81ca6cd346f1be7a4720
Reviewed-on: https://review.whamcloud.com/43207
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14546 tests: set smbpasswd correctly 32/42132/3
Elena Gryaznova [Mon, 22 Mar 2021 17:34:50 +0000 (20:34 +0300)]
LU-14546 tests: set smbpasswd correctly

For RHEL 8.2 the smbpasswd tool requires password
confirmation.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9761
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I367fca4253f1b06dae220921106d321e52d10bc7
Reviewed-on: https://review.whamcloud.com/42132
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14545 tests: erase failover.node parameter 31/42131/2
Elena Gryaznova [Mon, 22 Mar 2021 17:16:18 +0000 (20:16 +0300)]
LU-14545 tests: erase failover.node parameter

In a "failover pair" Lustre configuration such as
    mds1_HOST=host1
    mds2_HOST=host2
    mds1failover_HOST=host2
    mds2failover_HOST=host1
fs2mdsdev cannot be mounted on mds1_HOST because fs2mdsdev
has failover.node=<mds1_HOST>. The mount fails with:
    Denying initial registration attempt from nid <mds1_HOST>@tcp,
    specified as failover.

This patch erases the fs2mdsdev failover.node parameter.

Test-Parameters: trivial testlist=conf-sanity envdefinitions=ONLY=32
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9821
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Id96cb12704fc79f32ea998702bb457d2f683a7c8
Reviewed-on: https://review.whamcloud.com/42131
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14536 o2iblnd: don't try to reconnect if there's no listener 11/42111/2
Li Dongyang [Fri, 19 Mar 2021 10:21:58 +0000 (21:21 +1100)]
LU-14536 o2iblnd: don't try to reconnect if there's no listener

For each discovery we try to reconnect up to retry_count times
(default 5). While the MDT mount processes the config log, multiple
discoveries are made for each OST. If the OSTs are not up, the
mount takes a long time to time out.

Change-Id: If1d854216d2f26089c52d3fb501092b7f48a444d
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/42111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14536 o2iblnd: don't resend if there's no listener 09/42109/2
Li Dongyang [Fri, 19 Mar 2021 09:26:28 +0000 (20:26 +1100)]
LU-14536 o2iblnd: don't resend if there's no listener

If there's no listener on the remote peer, we
get IB_CM_REJ_INVALID_SERVICE_ID. Currently we
try to resend, which makes discovery take longer
than necessary when connecting to a node that is
not up.
Use -EHOSTUNREACH instead of -ECONNREFUSED,
so we don't end up queued for resend.
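A minimal sketch of the reject handling described above, assuming simplified stand-ins for the IB CM reject reasons (the SKETCH_* names and both helpers are hypothetical, not the o2iblnd code): a missing listener maps to -EHOSTUNREACH, which the caller does not queue for resend.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for connection-manager reject reasons; the
 * real IB_CM_REJ_* values live in the kernel's rdma/ib_cm.h. */
enum sketch_rej_reason {
	SKETCH_REJ_CONSUMER_DEFINED,	/* peer refused for its own reasons */
	SKETCH_REJ_INVALID_SERVICE_ID,	/* nothing listening on the peer */
};

/* Map a reject to an errno.  "No listener" becomes -EHOSTUNREACH so the
 * caller does not treat it as a refused-but-retryable connection. */
static int sketch_reject_to_errno(enum sketch_rej_reason reason)
{
	if (reason == SKETCH_REJ_INVALID_SERVICE_ID)
		return -EHOSTUNREACH;
	return -ECONNREFUSED;
}

/* In this sketch only -ECONNREFUSED is queued for a resend. */
static bool sketch_should_resend(int rc)
{
	return rc == -ECONNREFUSED;
}
```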

Change-Id: Ifaf14bc3ada2e2469669285917e366af669817e2
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/42109
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14366 mdt: lfs mkdir should return -EEXIST if exists 29/41329/6
Lai Siyao [Sat, 23 Jan 2021 10:28:26 +0000 (18:28 +0800)]
LU-14366 mdt: lfs mkdir should return -EEXIST if exists

'lfs setdirstripe' tries to restripe the directory if the target
exists. However, it is confusing to get -ENOTSUPP or -EALREADY from
'lfs mkdir', which invokes the same function as 'lfs setdirstripe'.

Pack the MDS_OPEN_CREAT flag in the request for 'lfs mkdir', and have
the MDT not try to restripe if it is set.
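The decision on the MDT can be sketched as below. SKETCH_OPEN_CREAT is a hypothetical stand-in for MDS_OPEN_CREAT, and existing_dir_action() illustrates the described behavior rather than reproducing the actual MDT handler.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for MDS_OPEN_CREAT; the real value
 * is defined in the Lustre uapi headers. */
#define SKETCH_OPEN_CREAT 0x1u

enum dir_action {
	DIR_CREATE,	/* target absent: ordinary striped mkdir */
	DIR_EEXIST,	/* 'lfs mkdir' on an existing name */
	DIR_RESTRIPE,	/* 'lfs setdirstripe' on an existing name */
};

/* Choose what to do for a "create striped directory" request. */
static enum dir_action existing_dir_action(unsigned int req_flags, bool exists)
{
	if (!exists)
		return DIR_CREATE;
	if (req_flags & SKETCH_OPEN_CREAT)
		return DIR_EEXIST;	/* mkdir semantics: never restripe */
	return DIR_RESTRIPE;
}

/* Errno for the early-exit case; the other paths are not modelled. */
static int dir_action_rc(enum dir_action act)
{
	return act == DIR_EEXIST ? -EEXIST : 0;
}
```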

Add sanity 230s.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I7b7ed04ee0b150253ff4d13bbdf1fe847d8f577c
Reviewed-on: https://review.whamcloud.com/41329
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13780 lnet: Leverage peer aliveness more efficiently 50/39350/6
Chris Horn [Fri, 10 Jul 2020 18:52:01 +0000 (13:52 -0500)]
LU-13780 lnet: Leverage peer aliveness more efficiently

When an LNet router is revived after going down, remote peers may
discover it is alive before we do. Thus, remote peers may use it
as a next-hop, and we may start receiving messages from it while we
still consider it to be dead. We should mark router peers as alive
when we receive a message from them.

If an LNet router does not respond to a discovery ping, then we
currently mark all of its NIs as DOWN. This can actually slow down
the process of returning a route to service. If we receive a message
from a router, in the manner described above, then we can safely
return the router to service. We already set the status of the router
NI we received the message from to UP, but the remote NIs will still
be DOWN and thus the route will be considered down until we get a
reply to the next discovery ping.

When selecting a route, we only consider the aliveness of a gateway's
remote NIs if avoid_asym_router_failure is enabled and the route is
single-hop. In this case, as long as the gateway has at least one
alive NI on the remote network then the route is considered UP. In
the situation described above, we know the router has at least one
NI alive because it was used to forward a message from a remote peer.
Thus, when we receive a forwarded message from a router, we can
reasonably set the NI status of all of its NIs that are on the same
peer net as the message originator to UP. This does not impact the
route status of any multi-hop routes because we do not consider the
aliveness of remote NIs for multi-hop routes.

Similarly, we can set the cached lr_alive value to up for any routes
whose lr_net matches the net ID of the message originator NID. This
variable is converted to an atomic_t to get rid of the need for
global locking when updating it.
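A userspace sketch of the lock-free update, using C11 atomics in place of the kernel's atomic_t. The route struct and helpers are hypothetical simplifications, and taking the net from the upper 32 bits of a 64-bit NID reflects our understanding of the classic lnet_nid_t layout (an assumption to verify against the LNet headers).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed route record for routes via one gateway. */
struct sk_route {
	uint32_t lr_net;	/* remote net reached via the gateway */
	atomic_int lr_alive;	/* cached aliveness, no global lock needed */
};

/* Net part of a NID, assuming the net occupies the upper 32 bits. */
static uint32_t sk_nid_net(uint64_t nid)
{
	return (uint32_t)(nid >> 32);
}

/* On receiving a message forwarded by the gateway, mark every route
 * whose lr_net matches the originator's net as alive. */
static void sk_mark_routes_alive(struct sk_route *routes, size_t n,
				 uint64_t originator_nid)
{
	uint32_t net = sk_nid_net(originator_nid);

	for (size_t i = 0; i < n; i++)
		if (routes[i].lr_net == net)
			atomic_store(&routes[i].lr_alive, 1);
}
```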

Test-Parameters: trivial
HPE-bug-id: LUS-9088
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0170762d78d80e4b70724799cd1ee1301118f25c
Reviewed-on: https://review.whamcloud.com/39350
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14175 osd: print inode number with FID in OI scrub 53/43153/4
Andreas Dilger [Sat, 27 Mar 2021 19:50:33 +0000 (13:50 -0600)]
LU-14175 osd: print inode number with FID in OI scrub

When debugging OI Scrub problems, also print the inode number
with the FID so that it is easier to find the problematic inode.
Otherwise, if the OI is broken it is not easy to find the inode
in question without a full filesystem scan.
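For illustration, a diagnostic carrying both identifiers might be formatted as below. The bracketed [seq:oid:ver] form follows the usual Lustre FID notation; the message text and format_scrub_msg() are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Minimal FID layout: sequence, object id, version. */
struct fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Format a scrub diagnostic with both the FID and the inode number, so
 * the inode can be located even if the OI mapping is broken. */
static int format_scrub_msg(char *buf, size_t len, const struct fid *fid,
			    unsigned long ino)
{
	return snprintf(buf, len, "OI scrub: fid [0x%llx:0x%x:0x%x] ino %lu",
			(unsigned long long)fid->f_seq, (unsigned)fid->f_oid,
			(unsigned)fid->f_ver, ino);
}
```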

Test-Parameters: trivial testlist=sanity-scrub,sanity-lfsck
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I217624ff2116326f86e053bcfacc6f19873ebbe5
Reviewed-on: https://review.whamcloud.com/43153
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13785 lnet: Use lr_hops for avoid_asym_router_failure 62/39362/7
Chris Horn [Tue, 14 Jul 2020 04:08:28 +0000 (23:08 -0500)]
LU-13785 lnet: Use lr_hops for avoid_asym_router_failure

In order for the asymmetric route failure avoidance feature to work
properly it needs to know what the hop count of a route should be.
This information is defined by the lr_hops field of the lnet_route.
The lr_single_hop field records what discovery was able to determine
the hop count actually is (single or multi) based on the last ping reply.
If a remote interface on a router goes missing, the route may be
classified as multi-hop by discovery, but it should be considered
single-hop for the purposes of avoiding asymmetric route failure.
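A sketch of the aliveness check, following the rules in this and the previous commit: remote NIs matter only when avoid_asym_router_failure is set and the configured hop count says single-hop. route_view is a hypothetical simplification, and treating only lr_hops == 1 as single-hop (ignoring how an unset hop count is handled) is an assumption.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a route through one gateway. */
struct route_view {
	int lr_hops;		/* configured hop count */
	bool lr_single_hop;	/* discovery's last verdict (unused below) */
	bool gateway_alive;
	int alive_remote_nis;	/* gateway NIs UP on the remote net */
};

static bool route_is_alive(const struct route_view *r,
			   bool avoid_asym_router_failure)
{
	if (!r->gateway_alive)
		return false;
	/* Single-hop is decided from the configured lr_hops, not from
	 * lr_single_hop: a vanished remote interface can make discovery
	 * report multi-hop for a route that should be single-hop. */
	if (avoid_asym_router_failure && r->lr_hops == 1)
		return r->alive_remote_nis > 0;
	/* Remote NIs are not consulted for multi-hop routes. */
	return true;
}
```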

HPE-bug-id: LUS-9099
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9c255f9a2175d964661850277808dae96ff7735c
Reviewed-on: https://review.whamcloud.com/39362
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14579 flr: GPF in lod_sub_declare_destroy 99/43199/5
Bobi Jam [Fri, 2 Apr 2021 04:47:32 +0000 (12:47 +0800)]
LU-14579 flr: GPF in lod_sub_declare_destroy

A race between mirror split and unlink revealed some problems:

- in mdd_unlink(), protect mdd_declare_unlink() with mdd_write_lock.

- lod_parse_striping() needs to free the lod's in-memory striping
  before parsing the on-disk LOVEA.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I44182eb9139c35f57d711ef5f7db65c0ccfca56c
Reviewed-on: https://review.whamcloud.com/43199
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14119 lfsck: check linkea if it's newly added 61/41261/10
Lai Siyao [Thu, 14 Jan 2021 09:14:01 +0000 (17:14 +0800)]
LU-14119 lfsck: check linkea if it's newly added

In LFSCK phase one, if a new linkea entry is added and the final linkea
entry count is more than one, add the file to the trace file so that
the linkea sanity will be checked in phase two.

In phase two, if a link parent FID can't be mapped to a valid
inode, remove it from the linkea.

Add sanity-lfsck 1d, which changes the parent directory FID in LMA so
that it mismatches the parent FID in the child's linkea, and verifies
that LFSCK can fix such an inconsistency.
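The two phases can be sketched as follows, with parent FIDs reduced to integers and the OI lookup replaced by a list of known-valid parents (both hypothetical simplifications, not the LFSCK code).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical linkEA entry; the parent FID is reduced to an integer. */
struct link_entry {
	unsigned long parent;
};

/* Phase one: re-check in phase two only if an entry was added and the
 * object ends up with more than one link. */
static bool needs_phase_two(bool added_new_entry, size_t entry_count)
{
	return added_new_entry && entry_count > 1;
}

/* Stand-in for "parent FID maps to a valid inode". */
static bool parent_is_valid(unsigned long parent,
			    const unsigned long *valid, size_t nvalid)
{
	for (size_t i = 0; i < nvalid; i++)
		if (valid[i] == parent)
			return true;
	return false;
}

/* Phase two: drop entries with an unmappable parent, keeping the order
 * of the survivors.  Returns the new entry count. */
static size_t prune_linkea(struct link_entry *ents, size_t n,
			   const unsigned long *valid, size_t nvalid)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (parent_is_valid(ents[i].parent, valid, nvalid))
			ents[kept++] = ents[i];
	return kept;
}
```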

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I315983d262110c1e36c3893fa2e51925d96c51a7
Reviewed-on: https://review.whamcloud.com/41261
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14577 ldiskfs: support Ubuntu 20.04 kernel 5.4.0-1007 87/43187/2
Jian Yu [Thu, 1 Apr 2021 01:50:31 +0000 (18:50 -0700)]
LU-14577 ldiskfs: support Ubuntu 20.04 kernel 5.4.0-1007

While applying the 5.4.0-66-ubuntu20.series ldiskfs patches
to kernel 5.4.0-1007, there is a conflict in
ext4_update_dx_flag() in ubuntu2004/ext4-pdirop.patch.
It turns out the ext4_update_dx_flag() code in kernel
5.4.0-1007 is the same as in kernel versions
older than 5.4.0-66, so 5.4.0-42-ubuntu20.series works.

This patch fixes lustre-build-ldiskfs.m4 to detect
5.4.0-42-ubuntu20.series for kernel 5.4.0-1007.

Test-Parameters: trivial

Change-Id: I3cd932b8ae2d7c7f4f900b8b18647a4252d100b2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43187
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14575 ofd: suppress errors on missing parent FID 84/43184/2
John L. Hammond [Wed, 31 Mar 2021 17:45:05 +0000 (12:45 -0500)]
LU-14575 ofd: suppress errors on missing parent FID

In ofd_access(), if the parent FID is zero then skip adding an entry
to the OFD access log.
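A minimal sketch of the check, assuming a standalone copy of the FID layout. Lustre has its own FID helpers; fid_all_zero() and should_log_access() here are illustrative stand-ins for the gate in ofd_access().

```c
#include <stdbool.h>
#include <stdint.h>

/* Standalone copy of the FID layout: sequence, object id, version. */
struct fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

static bool fid_all_zero(const struct fid *fid)
{
	return fid->f_seq == 0 && fid->f_oid == 0 && fid->f_ver == 0;
}

/* Hypothetical gate in front of the access-log append: objects with no
 * parent FID recorded are skipped instead of reporting an error. */
static bool should_log_access(const struct fid *parent_fid)
{
	return !fid_all_zero(parent_fid);
}
```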

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib518dc1f181a820d99021dd58ab548916e16f29d
Reviewed-on: https://review.whamcloud.com/43184
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14550 libcfs: fix setting of debug_path 09/43109/3
Andreas Dilger [Thu, 25 Mar 2021 06:39:07 +0000 (00:39 -0600)]
LU-14550 libcfs: fix setting of debug_path

While it was possible to set "lctl set_param debug_path=path" or
"echo path > /sys/module/libcfs/parameters/libcfs_debug_file_path",
this did not change the path used to dump debug logs.

Connect these parameters to the pathname used for the debug log.
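The idea is that a single buffer backs both tunables and the dump routine. In this sketch the default path "/tmp/lustre-log" and the "<path>.<time>.<pid>" naming are assumptions, and set_debug_path() and debug_dump_name() are hypothetical helpers, not libcfs functions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* One shared buffer backs both tunables and the dump path.  The default
 * value is an assumption for this sketch. */
static char debug_file_path[256] = "/tmp/lustre-log";

/* Hypothetical setter that both "lctl set_param debug_path=..." and the
 * libcfs_debug_file_path module parameter would route through. */
static int set_debug_path(const char *path)
{
	size_t len = strlen(path);

	if (len == 0 || len >= sizeof(debug_file_path))
		return -EINVAL;
	memcpy(debug_file_path, path, len + 1);
	return 0;
}

/* Hypothetical dump-name builder: it reads the same buffer, so a changed
 * setting takes effect for the next dump. */
static int debug_dump_name(char *buf, size_t len, long stamp, int pid)
{
	return snprintf(buf, len, "%s.%ld.%d", debug_file_path, stamp, pid);
}
```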

Test-Parameters: testlist=sanity env=ONLY=60f,ONLY_REPEAT=30
Fixes: 7092309f325 ("LU-8066 libcfs: migrate to debugfs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic18b5b24d1ac939c09637e66a342f5e3622367c3
Reviewed-on: https://review.whamcloud.com/43109
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14521 flr: delete mirror without volatile file 16/42116/4
Bobi Jam [Fri, 19 Mar 2021 10:22:10 +0000 (18:22 +0800)]
LU-14521 flr: delete mirror without volatile file

Rather than opening a volatile file to delete an FLR mirror, this
patch deletes the sub-objects of the specified mirror directly during
the mirror deletion handling.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I7a5e7488dbc820fdfa312218f363955a35752034
Reviewed-on: https://review.whamcloud.com/42116
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14530 kernel: kernel update SLES12 SP5 [4.12.14-122.63.1] 58/42058/2
Jian Yu [Wed, 17 Mar 2021 01:03:09 +0000 (18:03 -0700)]
LU-14530 kernel: kernel update SLES12 SP5 [4.12.14-122.63.1]

Update SLES12 SP5 kernel to 4.12.14-122.63.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: I67ab524ff2dc94c649bc970c7bb1d83009828880
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/42058
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14529 kernel: kernel update SLES15 SP2 [5.3.18-24.52.1] 55/42055/2
Jian Yu [Wed, 17 Mar 2021 00:50:00 +0000 (17:50 -0700)]
LU-14529 kernel: kernel update SLES15 SP2 [5.3.18-24.52.1]

Update SLES15 SP2 kernel to 5.3.18-24.52.1 for Lustre client.

Test-Parameters: trivial \
env=SANITY_EXCEPT="100 130 136 817" \
clientdistro=sles15sp2 serverdistro=el7.9 \
testlist=sanity

Change-Id: Ifbcfdac3e7dedeb5bde9f4a31575ad5008518c80
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/42055
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-9859 libcfs: simplify capability dropping. 57/41957/3
NeilBrown [Mon, 1 Mar 2021 14:15:45 +0000 (09:15 -0500)]
LU-9859 libcfs: simplify capability dropping.

Lustre has a 'squash credentials' concept similar to the "anon_uid"
for nfsd.  When accessing a file with squashed credentials, we
need to also drop capabilities.
Linux has cap_drop_fs_set() and cap_drop_nfsd_set().  Rather than
taking a completely different approach, this patch changes lustre
to use this same cap_drop_*_set() approach.

With this change we also drop CAP_MKNOD and CAP_MAC_OVERRIDE,
which is probably appropriate, and don't drop
CAP_SYS_ADMIN or CAP_SYS_BOOT, which should be irrelevant for
file permission checking.

Calling both cap_drop_*_set() seems a bit clumsy, but gets
the job done.
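A bitmask sketch of what dropping both sets does to an effective capability mask. The bit numbers match <linux/capability.h>; the set contents are our reading of the kernel's CAP_FS_SET and CAP_NFSD_SET and should be checked against the kernel headers. The real helpers operate on kernel_cap_t rather than a uint64_t, and the SK_* names are hypothetical.

```c
#include <stdint.h>

/* Capability numbers as in <linux/capability.h>, renamed to avoid
 * clashing with system headers. */
enum {
	SK_CAP_CHOWN = 0,
	SK_CAP_DAC_OVERRIDE = 1,
	SK_CAP_DAC_READ_SEARCH = 2,
	SK_CAP_FOWNER = 3,
	SK_CAP_FSETID = 4,
	SK_CAP_SYS_ADMIN = 21,
	SK_CAP_SYS_BOOT = 22,
	SK_CAP_SYS_RESOURCE = 24,
	SK_CAP_MKNOD = 27,
	SK_CAP_MAC_OVERRIDE = 32,
};

#define SK_CAPBIT(n) (UINT64_C(1) << (n))

/* Assumed contents of the kernel's filesystem-related set. */
#define SK_CAP_FS_SET							\
	(SK_CAPBIT(SK_CAP_CHOWN) | SK_CAPBIT(SK_CAP_DAC_OVERRIDE) |	\
	 SK_CAPBIT(SK_CAP_DAC_READ_SEARCH) | SK_CAPBIT(SK_CAP_FOWNER) |	\
	 SK_CAPBIT(SK_CAP_FSETID) | SK_CAPBIT(SK_CAP_MKNOD) |		\
	 SK_CAPBIT(SK_CAP_MAC_OVERRIDE))
/* Assumed nfsd set: the fs set plus CAP_SYS_RESOURCE. */
#define SK_CAP_NFSD_SET (SK_CAP_FS_SET | SK_CAPBIT(SK_CAP_SYS_RESOURCE))

/* Drop both sets, in the spirit of calling cap_drop_fs_set() and then
 * cap_drop_nfsd_set() on the squashed credentials. */
static uint64_t drop_squash_caps(uint64_t effective)
{
	effective &= ~SK_CAP_FS_SET;
	effective &= ~SK_CAP_NFSD_SET;
	return effective;
}
```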

Linux-commit: f497115d4cf8a430c5d9902ce35716ba5f9c21ef

Change-Id: I2f4f691bc4ad090f6abaa4e13eb473bf8d904b23
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/41957
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-8837 lmv: don't include struct lu_qos_rr in client 50/41950/6
Mr NeilBrown [Mon, 8 Mar 2021 23:05:23 +0000 (10:05 +1100)]
LU-8837 lmv: don't include struct lu_qos_rr in client

The 'lq_rr' field in 'struct lu_qos' is not used on the client (lmv),
so make it server-only for use in lod.

- move 'struct lu_qos_rr' into lu_target.h
- protect lq_rr with HAVE_SERVER_SUPPORT
- make lu_qos_rr_init() a static-inline in lu_target,
  and call it explicitly from lod instead of from lu_tgt_descs_init()
- protect the setting of LQ_DIRTY in lu_tgt_descs.c with
  HAVE_SERVER_SUPPORT

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I019cac560d68688042a02a53bd96b605909acdcd
Reviewed-on: https://review.whamcloud.com/41950
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14428 libcfs: discard cfs_trace_allocate_string_buffer 91/41491/6
Mr NeilBrown [Tue, 9 Feb 2021 01:06:24 +0000 (12:06 +1100)]
LU-14428 libcfs: discard cfs_trace_allocate_string_buffer

cfs_trace_allocate_string_buffer() is now only used once, and it is a
simple wrapper for kmalloc().  So discard it and use kmalloc directly.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4658f58d3e073be2092e0af1de0d6ecec15da6a6
Reviewed-on: https://review.whamcloud.com/41491
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13641 socklnd: replace route construct 74/40774/12
Serguei Smirnov [Tue, 30 Mar 2021 16:48:04 +0000 (12:48 -0400)]
LU-13641 socklnd: replace route construct

With TCP bonding removed, it's no longer necessary to
maintain multiple route constructs per peer_ni in socklnd.
Replace the route construct with connection control block,
conn_cb, and make sure there's a single conn_cb per peer_ni.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I1de683429af5f93b3197b6d536e80b5ac1e67a22
Reviewed-on: https://review.whamcloud.com/40774
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14093 utils: trivial changes to support gcc10 85/40485/6
Alex Zhuravlev [Fri, 30 Oct 2020 06:48:27 +0000 (09:48 +0300)]
LU-14093 utils: trivial changes to support gcc10

Just to fix gcc10 complaints.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8390d28fe78c9dad15a41301cc2b6d6184fdc330
Reviewed-on: https://review.whamcloud.com/40485
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-13903 utils: have liblustreapi support Linux client 07/39207/4
James Simmons [Thu, 11 Mar 2021 03:20:58 +0000 (22:20 -0500)]
LU-13903 utils: have liblustreapi support Linux client

Handle two differences between the Linux client and the OpenSFS
branch. The Linux client doesn't support LL_IOC_REMOVE_ENTRY
since it's considered a security risk by the VFS maintainers. The
other difference is that struct getinfo_fid2path doesn't name its
union, which is typical in the Linux kernel.
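One way userspace code can cope with both layouts is to hide the member access behind a macro. The struct shapes are trimmed to a fixed path buffer, and SKETCH_LINUX_CLIENT_UAPI, GF_PATH() and fid2path_set() are hypothetical names, not liblustreapi's actual approach.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_PATH_MAX 64

/* OpenSFS-style layout: the union has a name (gf_u). */
struct fid2path_named {
	uint32_t gf_pathlen;		/* size of the path buffer */
	union {
		char gf_path[SKETCH_PATH_MAX];
	} gf_u;
};

/* Linux-client-style layout: anonymous union (C11), so members are
 * reached directly. */
struct fid2path_anon {
	uint32_t gf_pathlen;		/* size of the path buffer */
	union {
		char gf_path[SKETCH_PATH_MAX];
	};
};

/* Hypothetical switch selecting the Linux-client header layout. */
#ifdef SKETCH_LINUX_CLIENT_UAPI
typedef struct fid2path_anon fid2path_t;
#define GF_PATH(gf) ((gf)->gf_path)
#else
typedef struct fid2path_named fid2path_t;
#define GF_PATH(gf) ((gf)->gf_u.gf_path)
#endif

/* Library code only touches the path through GF_PATH(), so it builds
 * against either layout.  Long paths are truncated. */
static void fid2path_set(fid2path_t *gf, const char *path)
{
	size_t len = strlen(path);

	if (len >= SKETCH_PATH_MAX)
		len = SKETCH_PATH_MAX - 1;
	memcpy(GF_PATH(gf), path, len);
	GF_PATH(gf)[len] = '\0';
	gf->gf_pathlen = SKETCH_PATH_MAX;
}
```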

Test-Parameters: trivial
Change-Id: Ieb9ea749e8728f33621fd6ec28f78892466ec7a4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39207
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13107 uapi: remove obsolete ioctls 07/37107/6
Andreas Dilger [Mon, 29 Mar 2021 13:12:50 +0000 (09:12 -0400)]
LU-13107 uapi: remove obsolete ioctls

Remove some very obsolete ioctl number definitions.

Highlight that some of the ioctl codes conflict with upstream
FSVerity ioctls that we may need to implement in the future.
This does not necessarily mean the actual ioctl numbers will
conflict (the "size" field can disambiguate them), but we should
avoid this range when adding new ioctls.

Migrate llapi_file_fget_lov_uuid() over to a new OBD_IOC_GETDTNAME
ioctl number to fix one of the conflicting ioctl numbers.
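To see why the size field can disambiguate otherwise identical codes, the sketch below re-implements the asm-generic ioctl layout (number in bits 0-7, type in 8-15, size in 16-29, direction in 30-31). The SK_* macros and structs are local, hypothetical copies for illustration; some architectures such as powerpc use a different layout, so real code should use _IOW() and _IOC_SIZE() from the system headers.

```c
#include <stdint.h>

#define SK_IOC_NRSHIFT   0
#define SK_IOC_TYPESHIFT 8
#define SK_IOC_SIZESHIFT 16
#define SK_IOC_DIRSHIFT  30
#define SK_IOC_WRITE     1u

/* Pack direction, type, number and argument size into one code. */
#define SK_IOC(dir, type, nr, size)			\
	(((uint32_t)(dir) << SK_IOC_DIRSHIFT) |		\
	 ((uint32_t)(type) << SK_IOC_TYPESHIFT) |	\
	 ((uint32_t)(nr) << SK_IOC_NRSHIFT) |		\
	 ((uint32_t)(size) << SK_IOC_SIZESHIFT))
#define SK_IOW(type, nr, argtype) \
	SK_IOC(SK_IOC_WRITE, (type), (nr), sizeof(argtype))

/* Decode individual fields back out of a code. */
#define SK_IOC_NR(code)   ((code) & 0xffu)
#define SK_IOC_TYPE(code) (((code) >> SK_IOC_TYPESHIFT) & 0xffu)
#define SK_IOC_SIZE(code) (((code) >> SK_IOC_SIZESHIFT) & 0x3fffu)

/* Two hypothetical argument structs of different sizes. */
struct sk_arg_small { uint32_t a; };
struct sk_arg_large { uint64_t a, b; };
```

Two definitions that share a type character and number still produce different codes when their argument sizes differ, which is why such a clash is not automatically fatal, though it is best avoided.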

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic330085b868dd6e284344b3fccd72dd958ecab07
Reviewed-on: https://review.whamcloud.com/37107
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14055 lmv: reduce struct lmv_obd size 62/41162/6
Andreas Dilger [Thu, 7 Jan 2021 12:26:28 +0000 (05:26 -0700)]
LU-14055 lmv: reduce struct lmv_obd size

The lmv_obd struct contains lmv_mdt_descs which is large enough
to reference 512 * 512 = 262144 targets, but there can be only
65536 OSTs or MDTs in a single filesystem today.

Shrink the allocation size to match the current limits, reducing
the size of obd_device.u since this is the largest union member.

This reduces the size of each obd_device from 6752 to 4568 bytes.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I752b021bdb5d02e3ead3bb266121be5dbf3ebbe5
Reviewed-on: https://review.whamcloud.com/41162
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14588 o2ib: make config script aware of the ofed symbols 23/43223/2
Serguei Smirnov [Tue, 6 Apr 2021 22:54:01 +0000 (15:54 -0700)]
LU-14588 o2ib: make config script aware of the ofed symbols

LNet o2ib configuration script needs to be aware of the external
ofed dkms symbols when testing for availability of o2ib features
by building "conftest" kernel objects. If this is not done,
symbols from the core kernel are used by default which is
different from what is used when actually building LNet,
at least on Ubuntu. This patch adds the check for external symbols.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iea566f8a3feb86b8bef2f4501a3abc968d76451a
Reviewed-on: https://review.whamcloud.com/43223
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14547 test: skip sanityn 109 for local setup 42/43142/2
Etienne AUJAMES [Fri, 26 Mar 2021 19:24:07 +0000 (20:24 +0100)]
LU-14547 test: skip sanityn 109 for local setup

Test 109 of sanityn needs to unload the obdclass module to run, so
it is incompatible with a local test setup.

Fixes: 881551f ("LU-14110 obdclass: Protect cl_env_percpu[]")
Test-Parameters: trivial testlist=sanityn env=ONLY=109
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Icf3eff28282c771cc14aa48fefeadea55882e082
Reviewed-on: https://review.whamcloud.com/43142
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14552 ptlrpc: NULL pointer dereference in ptlrpc_watchdog_fire 15/43115/4
Andriy Skulysh [Mon, 1 Mar 2021 21:41:33 +0000 (23:41 +0200)]
LU-14552 ptlrpc: NULL pointer dereference in ptlrpc_watchdog_fire

thread->t_task isn't initialized by target_recovery_thread().

Change-Id: Ia38d5ccaab6b9332a1fd60ebe5ed2461f7d5db84
HPE-bug-id: LUS-9748
Fixes: 0496cdf20 ("LU-13608 tgt: abort recovery while reading update llog")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/43115
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-14540 o2iblnd: Use REMOTE_DROPPED for ECONNREFUSED 14/42114/3
Chris Horn [Fri, 19 Mar 2021 18:22:26 +0000 (13:22 -0500)]
LU-14540 o2iblnd: Use REMOTE_DROPPED for ECONNREFUSED

ECONNREFUSED means that we received a response from the remote end,
so setting the LNet health status to REMOTE_DROPPED is more
appropriate than setting LOCAL_DROPPED. Using REMOTE_DROPPED will
decrement the peer NI health and allow us to try other peer NIs for
future sends.

Decrementing the peer NI health will also result in routes being
marked down, as appropriate, for cases where a router has refused the
connection request.
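The status choice can be sketched as a small mapping. The HS_* names echo the statuses in the message, but the enum, its values and both helpers are hypothetical, and the fixed decrement of one is an arbitrary simplification of how LNet lowers peer NI health.

```c
#include <errno.h>

/* Hypothetical subset of LNet message health states. */
enum sk_health_status {
	HS_OK,
	HS_LOCAL_DROPPED,
	HS_REMOTE_DROPPED,
};

static enum sk_health_status conn_err_to_health(int rc)
{
	switch (rc) {
	case 0:
		return HS_OK;
	case -ECONNREFUSED:
		/* The peer answered and refused: blame the remote side. */
		return HS_REMOTE_DROPPED;
	default:
		/* Other errors are treated as local here (an assumption). */
		return HS_LOCAL_DROPPED;
	}
}

/* Only remote failures lower the peer NI health in this sketch, which
 * steers later sends (and route status) toward other peer NIs. */
static int peer_ni_health_after(enum sk_health_status hs, int health)
{
	if (hs == HS_REMOTE_DROPPED && health > 0)
		return health - 1;
	return health;
}
```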

Test-Parameters: trivial
HPE-bug-id: LUS-9853
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8190f5d78a76ec25553908c4f215362c0c2051fc
Reviewed-on: https://review.whamcloud.com/42114
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14538 gss: make namespace optional in lgss_keyring 12/42112/2
Sebastien Buisson [Fri, 19 Mar 2021 14:46:58 +0000 (15:46 +0100)]
LU-14538 gss: make namespace optional in lgss_keyring

Introduce a new tunable 'sptlrpc.gss.gss_check_upcall_ns' to
make namespace support optional in lgss_keyring.
By default it is set to 1, which means the standard behavior:
checking the caller's namespace and switching namespace
if necessary.
When the tunable is set to 0, lgss_keyring sticks to the current
namespace.
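A sketch of the namespace comparison such a check could rely on: two /proc/<pid>/ns/* files refer to the same namespace when their device and inode numbers match, as described in namespaces(7). same_ns() and need_ns_switch() are hypothetical helpers, and check_ns stands in for the tunable.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* 1 if both paths name the same object (same device and inode), 0 if
 * not, -1 on error.  For namespaces these would be, e.g.,
 * /proc/self/ns/mnt and /proc/<pid>/ns/mnt. */
static int same_ns(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Switch only when checking is enabled and the namespaces differ; on a
 * lookup error we stay put (an arbitrary choice for this sketch). */
static bool need_ns_switch(int check_ns, const char *self_ns,
			   const char *caller_ns)
{
	if (!check_ns)
		return false;	/* tunable = 0: keep the current namespace */
	return same_ns(self_ns, caller_ns) == 0;
}
```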

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ib9d4e47935a718d4aae31fbb0d13f6bc8a4005a5
Reviewed-on: https://review.whamcloud.com/42112
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14522 ldlm: reprocess locks if enqueue failed 31/42031/7
Alex Zhuravlev [Sun, 14 Mar 2021 04:29:11 +0000 (07:29 +0300)]
LU-14522 ldlm: reprocess locks if enqueue failed

If the export gets disconnected during enqueue, ldlm_handle_enqueue0()
drops the lock but can skip reprocessing, so all subsequent
waiting locks conflicting with the dropped one may get stuck.

With the patch most racer runs succeed; without it, 1/4 of runs get stuck.

Fixes: 37932c4beb ("LU-10175 ldlm: IBITS lock convert instead of cancel")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I584b0de2656840da5dfa86a894fe02f138e1389d
Reviewed-on: https://review.whamcloud.com/42031
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14487 lustre: remove references to Sun Trademark. 80/41880/3
Mr NeilBrown [Thu, 4 Mar 2021 02:51:23 +0000 (13:51 +1100)]
LU-14487 lustre: remove references to Sun Trademark.

"lustre" is no longer a Trademark of Sun Microsystems.  There is no
need to acknowledge the trademark in every file, so just remove all
these claims.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I214670b39c5718f2b691193f268a64856e0cd743
Reviewed-on: https://review.whamcloud.com/41880
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 months agoLU-14450 kernel: kernel update RHEL8.3 [4.18.0-240.15.1.el8_3] 04/41704/5
Jian Yu [Wed, 24 Feb 2021 18:43:59 +0000 (10:43 -0800)]
LU-14450 kernel: kernel update RHEL8.3 [4.18.0-240.15.1.el8_3]

Update RHEL8.3 kernel to 4.18.0-240.15.1.el8_3.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.3 serverdistro=el8.3 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.3 serverdistro=el8.3 testlist=sanity

Change-Id: I92ca7769fac17221da376788cfe79887ecc4c19c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41704
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14119 osd: add mount option "resetoi" 02/41402/8
Lai Siyao [Wed, 3 Feb 2021 03:44:15 +0000 (11:44 +0800)]
LU-14119 osd: add mount option "resetoi"

OI files on zfs are special, and they can't be deleted by user space
tools like rm. Sometimes the OI files may contain stale OI mappings,
which need to be removed for namespace consistency. Add a mount
option 'resetoi' to recreate OI files at mount time; it will
support both ldiskfs and zfs. This should be the standard way to
recreate OI files, rather than mounting the backend filesystem and
unlinking them manually.

Add sanity-scrub 17.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Idc0e4c2f3b81675c49c6c005bc30b61d8fd04503
Reviewed-on: https://review.whamcloud.com/41402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14119 osd: delete stale OI mapping entry 41/41741/4
Lai Siyao [Wed, 24 Feb 2021 03:31:06 +0000 (11:31 +0800)]
LU-14119 osd: delete stale OI mapping entry

Once the LMA check shows that an OI mapping entry is stale, delete it
from the OI table; this avoids having to remove the whole OI files.

Don't add the OI mapping into the cache until osd_fid_lookup(), because
the mapping in the OI is not trustworthy until the FID in LMA is checked;
otherwise it may mislead LFSCK.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I4b50dcc02149d485e4bf4a361ca2994daa280feb
Reviewed-on: https://review.whamcloud.com/41741
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14119 osd-zfs: enable LUDA_VERIFY 74/41274/3
Lai Siyao [Tue, 19 Jan 2021 13:37:50 +0000 (21:37 +0800)]
LU-14119 osd-zfs: enable LUDA_VERIFY

In osd_dir_it_rec(), if the dirent is successfully retrieved and the
FID in the dirent is sane, it returns right away. However, if
LUDA_VERIFY|LUDA_VERIFY_DRYRUN is set, the FID in the dirent should be
compared with the FID in LMA, and replaced with the latter if they
are different.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I35e2a4d4606044cd37cc5847cffc577740918988
Reviewed-on: https://review.whamcloud.com/41274
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14119 mdc: set fid2path RPC interruptible 19/41219/3
Lai Siyao [Wed, 13 Jan 2021 09:29:50 +0000 (17:29 +0800)]
LU-14119 mdc: set fid2path RPC interruptible

Sometimes OI scrub can't fix an inconsistency between FID and name, and
the server will return -EINPROGRESS for a fid2path request. Upon such a
failure, the client keeps resending the request. Make such requests
interruptible to avoid a deadlock.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I82192cb8a8256064ca632cabfe5581b12e86423b
Reviewed-on: https://review.whamcloud.com/41219
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 months agoLU-14291 ptlrpc: format UPDATE messages in server-only code 25/41125/7
Mr NeilBrown [Fri, 30 Oct 2020 04:01:00 +0000 (15:01 +1100)]
LU-14291 ptlrpc: format UPDATE messages in server-only code

There are some ptlrpc messages that are only used for targets to
communicate with each other: Object Updates between Targets (OUT).

These are never needed by the client, so the code for handling them
can be compiled conditionally on HAVE_SERVER_SUPPORT.

The code in layout.c needs struct declarations from elsewhere in that
file, so group it at the end of the file and wrap it in #ifdef.
The code in pack_generic.c can stand alone, so move it to a new
pack_server.c and compile that only when server code is requested.

For simplicity, also make req_check_sepol() completely server-side
and provide an inline stub for client-only code.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I788352575a2109df389760fff45207ad6de3391b
Reviewed-on: https://review.whamcloud.com/41125
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14195 libcfs: switch to kfree_sensitive 08/40908/5
Mr NeilBrown [Wed, 9 Dec 2020 01:49:13 +0000 (12:49 +1100)]
LU-14195 libcfs: switch to kfree_sensitive

In Linux 5.10, kzfree() has been renamed kfree_sensitive().

So switch to the new name and provide back-compat support for older
kernels.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If665168477a0b6241a8ddf31a111cd465fe97783
Reviewed-on: https://review.whamcloud.com/40908
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13783 libcfs: provide fallback kallsyms_lookup_name() 26/40826/6
Mr NeilBrown [Tue, 2 Mar 2021 00:49:01 +0000 (11:49 +1100)]
LU-13783 libcfs: provide fallback kallsyms_lookup_name()

Since Linux 5.7, kallsyms_lookup_name() is no longer exported, so we
cannot rely on it.

So test for this, and when not available provide a fallback which just
returns NULL.

As this was the only way to access apply_workqueue_attrs() in recent
kernels, we need to cope with the absence of that function.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I09cc00047ec163a9395c5acd415505a8586e4e99
Reviewed-on: https://review.whamcloud.com/40826
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-14132 lod: do not initialize sub llogs twice 05/40605/12
Alex Zhuravlev [Wed, 11 Nov 2020 08:00:23 +0000 (11:00 +0300)]
LU-14132 lod: do not initialize sub llogs twice

this can happen during MDT re-activation and then result in leaked
objects:
lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 )

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0afb335ffb20532f9171dd2e514100b12f4d9a76
Reviewed-on: https://review.whamcloud.com/40605
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-11776 utils: add support lfs find with mdt hash flag 40/39340/7
Yang Sheng [Fri, 10 Jul 2020 15:31:17 +0000 (23:31 +0800)]
LU-11776 utils: add support lfs find with mdt hash flag

Allow lfs find to use the MDT hash flag as a search condition. Also
allow it to match one or more MDT hash types.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I599bb1a3cc2c9ea2a523f50f119bd93a5520d213
Reviewed-on: https://review.whamcloud.com/39340
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13397 lfs: mirror resync to keep sparseness 73/40773/11
Mikhail Pershin [Wed, 25 Nov 2020 16:05:05 +0000 (19:05 +0300)]
LU-13397 lfs: mirror resync to keep sparseness

Use SEEK_HOLE/SEEK_DATA in llapi_mirror_resync_many() to
copy just the data chunks between components. Holes in the last
component are handled with truncate(); holes in other components
are handled with fallocate(FALLOC_FL_PUNCH_HOLE). If punch() fails
for any reason, the hole is simply copied via read(), i.e. as zeroes.

fallocate(FALLOC_FL_PUNCH_HOLE) is not supported yet, so resync
currently preserves sparseness only for the last component.
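The extent walk described above can be sketched in userspace C with lseek(SEEK_DATA/SEEK_HOLE). This is a minimal illustration, not the llapi_mirror_resync_many() code: copy_data_extents() is a hypothetical helper that copies a whole file rather than per-component ranges, and it assumes the destination starts out empty, so skipped ranges stay holes and the final ftruncate() plays the role of the last-component truncate().

```c
#define _GNU_SOURCE /* SEEK_DATA and SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy only the data extents of @src into @dst,
 * skipping holes, then size @dst with ftruncate() so that a trailing
 * hole stays sparse. @dst is assumed to start out empty, so unwritten
 * ranges remain holes. Returns 0 on success or a negative errno value.
 */
static int copy_data_extents(int src, int dst)
{
	char buf[65536];
	off_t size, pos = 0;

	assert(src >= 0 && dst >= 0);
	size = lseek(src, 0, SEEK_END);
	if (size < 0)
		return -errno;

	while (pos < size) {
		off_t data = lseek(src, pos, SEEK_DATA);
		off_t hole;

		if (data < 0) {
			if (errno == ENXIO) /* only a hole remains */
				break;
			return -errno;
		}
		hole = lseek(src, data, SEEK_HOLE); /* end of this extent */
		if (hole < 0)
			return -errno;

		while (data < hole) {
			size_t want = (size_t)(hole - data) < sizeof(buf) ?
				      (size_t)(hole - data) : sizeof(buf);
			ssize_t n = pread(src, buf, want, data);

			if (n < 0)
				return -errno;
			if (n == 0) /* file shrank underneath us */
				break;
			if (pwrite(dst, buf, n, data) != n)
				return -EIO; /* failed or short write */
			data += n;
		}
		pos = hole;
	}

	/* Set dst to the source size without writing zeroes. */
	if (ftruncate(dst, size) < 0)
		return -errno;
	return 0;
}
```

On a filesystem without SEEK_DATA support, Linux reports the whole file as a single data extent, so this helper degrades to a full copy rather than failing.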

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id249739c5cd2d1c8a998da3341d326de1a8b8d32
Reviewed-on: https://review.whamcloud.com/40773
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 lustre: convert IFTODT to S_DT 41/40641/3
Mr NeilBrown [Thu, 12 Nov 2020 23:01:06 +0000 (10:01 +1100)]
LU-6142 lustre: convert IFTODT to S_DT

Linux commit v5.1-rc1~141^2~1 introduced include/linux/fs_types.h, which
adds macros for manipulating file types, including S_DT(), which
does what the userspace IFTODT() macro does.

So change kernel code to use S_DT() instead of IFTODT(), and provide
definitions for kernels which don't yet have this file.

fs_types.h is included by fs.h, so we don't need to explicitly include
it anywhere.
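In userspace the same conversion is available as IFTODT() from <dirent.h>. The sketch below shows the kind of fallback definition a compat header could provide for older kernels. The S_DT_SHIFT/S_DT shape mirrors the upstream Linux header, but the #ifndef guard and the mode_to_dtype() helper are illustrative assumptions. The sketch is compiled in userspace so that its result can be checked against IFTODT().

```c
#define _DEFAULT_SOURCE /* expose DT_* and IFTODT() in glibc's <dirent.h> */
#include <assert.h>
#include <dirent.h>
#include <sys/stat.h>

/* Fallback when <linux/fs_types.h> is absent (assumed shape): take the
 * file-type bits of st_mode and shift them down into the DT_* range. */
#ifndef S_DT
#define S_DT_SHIFT	12
#define S_DT(mode)	(((mode) & S_IFMT) >> S_DT_SHIFT)
#endif

/* Hypothetical helper: map a mode to a d_type value; must agree with
 * the userspace IFTODT() macro. */
static unsigned int mode_to_dtype(mode_t mode)
{
	unsigned int dt = S_DT(mode);

	assert(dt == (unsigned int)IFTODT(mode));
	return dt;
}
```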

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If001f7e7a97992af690222b7524770c5e4b7003d
Reviewed-on: https://review.whamcloud.com/40641
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14090 mgs: no local logs flag 48/40448/7
Artem Blagodarenko [Thu, 16 Jul 2020 08:37:51 +0000 (04:37 -0400)]
LU-14090 mgs: no local logs flag

There is a feature that starts a target with a local copy of
the config log in order to avoid a delay in communicating with
an MGS, and loads MGS log updates later on. However, that
feature is not always useful.

When replace_nids adds records with new NIDs, it does not
append to the remote config logs but overwrites the corresponding
records in place. If a target starts using its local config
log, it gets confused by the outdated NIDs.

This patch adds a tunefs.lustre --nolocallogs option that
sets the nolocallogs flag, which tells the target to ignore its local
copy of the config logs. The flag is reset once new logs are fetched
from the MGS.

It is recommended to run tunefs.lustre --nolocallogs on the
targets together with replace_nids on the MGS.
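The life cycle of the flag can be modelled in a few lines of C. This sketch assumes a single flag bit. SKETCH_NO_LOCAL_LOGS and the three helpers are hypothetical names, not the actual tunefs.lustre or target code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real name and value in Lustre may differ. */
#define SKETCH_NO_LOCAL_LOGS 0x1u

/* tunefs.lustre --nolocallogs: mark the target's local logs untrusted. */
static void set_nolocallogs(unsigned int *flags)
{
	*flags |= SKETCH_NO_LOCAL_LOGS;
}

/* Target startup: may it start from its local copy of the config log? */
static bool may_use_local_logs(unsigned int flags, bool have_local_copy)
{
	return have_local_copy && !(flags & SKETCH_NO_LOCAL_LOGS);
}

/* Fresh logs received from the MGS: the local copy is current again. */
static void on_mgs_logs_fetched(unsigned int *flags)
{
	assert(flags != 0);
	*flags &= ~SKETCH_NO_LOCAL_LOGS;
}
```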

HPE-bug-id: LUS-2510
Change-Id: I949c19ac701d287e1c1199bc12445989476a707b
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157574
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Nikitas Angelinas <nangelinas@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/40448
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-12142 clio: fix hang on urgent cached pages 37/40237/12
Wang Shilong [Wed, 14 Oct 2020 02:49:49 +0000 (10:49 +0800)]
LU-12142 clio: fix hang on urgent cached pages

A few problems are addressed by this patch:

1) We try to reserve cl_pages in batch, but we don't do
that for append IO; there is no reason to skip it.

2) IO might not be page aligned; calculate reserved pages
correctly for this case.

3) If we issue a single IO with a block size larger
than max_cached_mb, the IO will never finish, because
we don't have enough cl pages to complete it; split the IO
in this case.

4) Readahead should fail if we are short of LRU page
slots, to avoid deadlock.

After the above adjustments, LRU slots are guaranteed for normal
buffered writes before IO starts; if the block size is too large
for the maximum number of LRU slots, the IO will be split.
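The page accounting behind points 2) and 3) can be sketched as follows, assuming 4 KiB pages. io_pages() and io_chunk() are hypothetical helpers, not the CLIO functions. An unaligned IO can touch one more page than its byte count suggests, and an IO whose reservation would exceed the LRU limit is cut at a page boundary and submitted in pieces:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB pages; kernel code would use PAGE_SHIFT instead. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1ULL << SKETCH_PAGE_SHIFT)

/* Pages touched by @count bytes at @pos. An unaligned IO can span one
 * more page than count / PAGE_SIZE suggests. */
static unsigned long long io_pages(unsigned long long pos, size_t count)
{
	unsigned long long first = pos >> SKETCH_PAGE_SHIFT;
	unsigned long long last = (pos + count + SKETCH_PAGE_SIZE - 1) >>
				  SKETCH_PAGE_SHIFT;

	assert(count > 0);
	return last - first;
}

/* Bytes to submit in the next chunk so that its page reservation never
 * exceeds @max_pages (the LRU limit); the caller loops over the rest. */
static size_t io_chunk(unsigned long long pos, size_t count,
		       unsigned long long max_pages)
{
	unsigned long long end;

	assert(max_pages > 0);
	if (io_pages(pos, count) <= max_pages)
		return count;
	/* stop at the boundary max_pages pages after pos's page */
	end = ((pos >> SKETCH_PAGE_SHIFT) + max_pages) << SKETCH_PAGE_SHIFT;
	return (size_t)(end - pos);
}
```

In this sketch, a page-aligned 1 GiB write against a 16384-page limit (64 MiB of 4 KiB pages) is issued as sixteen 64 MiB chunks.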

For extra readahead, don't try hard; quit if we are
short of LRU pages. Since readahead can tolerate
errors, applications won't be aware of it.

Besides the newly added tests, the following command was run with
max_cached_mb set to 64M, and the client no longer hangs:

/usr/lib64/openmpi/bin/mpirun --allow-run-as-root -np 12 \
-wd /mnt/lustre ior -g -e -w -r -b 1g -T 10 -F -C -t 64m

Todo:
Performance benchmark for readahead

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I5c85454a40daeefb4fb97609d6aa28df2eafb99c
Reviewed-on: https://review.whamcloud.com/40237
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-12142 readahead: limit over reservation 60/42060/5
Wang Shilong [Wed, 17 Mar 2021 09:58:00 +0000 (17:58 +0800)]
LU-12142 readahead: limit over reservation

For performance reasons, readahead is allowed to exceed @ra_max_pages
to cover the current read window, but this should be limited to the
RPC size in case a large-block read is issued. Trim to the RPC boundary.

Otherwise, too many readahead pages might be issued, leaving
the client short of LRU pages.

Fixes: 777b04a093 ("LU-13386 llite: allow current readahead to exceed reservation")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Icf74b5fbc75cf836fedcad5184fcdf45c7b037b4
Reviewed-on: https://review.whamcloud.com/42060
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10632 tests: recovery-small test_26 idle_timeout 06/42006/3
Andreas Dilger [Thu, 11 Mar 2021 09:39:57 +0000 (02:39 -0700)]
LU-10632 tests: recovery-small test_26 idle_timeout

In recovery-small test_26() use "lfs df" instead of plain "df"
since statfs may be fetched from the MDS cache and will not
ensure that the client->OST connections are currently active.

Also, check a few entries further back in the OSC state log for an
EVICTED message, in case the client idle disconnects from the server
again while checking all of the imports.

Test-Parameters: trivial testlist=recovery-small env=ONLY=26a,ONLY_REPEAT=100
Fixes: 5a6ceb664f07 ("LU-7236 ptlrpc: idle connections can disconnect")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8c370cb75f4e06258ef3c032630fc20354a15dcc
Reviewed-on: https://review.whamcloud.com/42006
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14534 gss: do not refresh context for LDLM callback 76/42076/2
Sebastien Buisson [Thu, 18 Mar 2021 16:17:31 +0000 (17:17 +0100)]
LU-14534 gss: do not refresh context for LDLM callback

If the request to be sent is an LDLM callback, do not try to
refresh context.
An LDLM callback is sent by a server to a client in order to make
it release a lock, on a communication channel that uses a reverse
context. It cannot be refreshed on its own, as it is the 'reverse'
(server-side) representation of a client context.
We do not care if the reverse context is expired, and want to send
the LDLM callback anyway. Once the client receives the AST, it is
its job to refresh its own context if it has expired, hence
refreshing the associated reverse context on server side, before
being able to send the LDLM_CANCEL requested by the server.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic8f4fe203f16ed5cfafd3da355c78cf58d96c3eb
Reviewed-on: https://review.whamcloud.com/42076
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14527 kernel: kernel update RHEL7.9 [3.10.0-1160.21.1.el7] 50/42050/4
Jian Yu [Tue, 16 Mar 2021 18:42:56 +0000 (11:42 -0700)]
LU-14527 kernel: kernel update RHEL7.9 [3.10.0-1160.21.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.21.1.el7.

Test-Parameters: clientdistro=el7.9 serverdistro=el7.9

Change-Id: I1a46fe492d280b19c0f93458aaac975a4c873caf
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/42050
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14204 tests: use first available import 19/42019/3
Sebastien Buisson [Fri, 12 Mar 2021 08:48:09 +0000 (09:48 +0100)]
LU-14204 tests: use first available import

In the test suite, be careful to use the first available import in case
there are multiple mount points.

Test-Parameters: trivial
Test-Parameters: env=SHARED_KEY=true testlist=sanity
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ib099cd5c9666e9d4faf9445846c91a225f4a8f57
Reviewed-on: https://review.whamcloud.com/42019
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-8837 lod: move lod-specifc pool config code into lod_dev 93/41993/4
Mr NeilBrown [Fri, 8 Jan 2021 03:23:01 +0000 (14:23 +1100)]
LU-8837 lod: move lod-specifc pool config code into lod_dev

obd_config.c contains code that only applies to lod devices, for
managing a QMT pool alongside each normal pool.

As this code is specific to lod, it is best to move it into the lod
module.  This is particularly helpful as it removes it from
client-only builds.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9e0d014a299c28b73e48ce2e06581cb011acce47
Reviewed-on: https://review.whamcloud.com/41993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-14289 libcfs: discard cfs_array_alloc() 92/41992/2
Mr NeilBrown [Wed, 24 Feb 2021 23:57:19 +0000 (10:57 +1100)]
LU-14289 libcfs: discard cfs_array_alloc()

cfs_array_alloc() and _free() are used for precisely one array, and
provide little value beyond open-coding the alloc and free.

So discard these functions and alloc/free in the loops that already
exist for setup and cleanup.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2a66be311dbba269b0b43c3a75f17ccc8e946538
Reviewed-on: https://review.whamcloud.com/41992
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14507 mdt: handle default stripe_count=-1 properly 83/41983/6
Andreas Dilger [Wed, 10 Mar 2021 16:57:44 +0000 (09:57 -0700)]
LU-14507 mdt: handle default stripe_count=-1 properly

If the default LMV stripe_count is -1, print it as a signed value
instead of unsigned, to better match how it is set with "-c -1".
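The effect can be shown with snprintf(), assuming the count is held in an unsigned 32-bit field. format_stripe_count() is a hypothetical helper, not the mdt code:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a default LMV stripe count held in an
 * unsigned 32-bit field. "-c -1" stores 0xffffffff; printing it through
 * %d after a cast to int shows "-1" rather than "4294967295". The
 * out-of-range conversion to int is implementation-defined in C; GCC
 * and Clang reduce it modulo 2^32, giving -1.
 */
static int format_stripe_count(char *buf, size_t len, unsigned int count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%d", (int)count);
}
```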

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I106f266c33e2c2cf0f5bcc1491e4bc5ac93ebbe5
Reviewed-on: https://review.whamcloud.com/41983
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14506 hsm: correct default stripe offset in import 78/41978/2
John L. Hammond [Wed, 10 Mar 2021 15:20:29 +0000 (09:20 -0600)]
LU-14506 hsm: correct default stripe offset in import

In lhsmtool_posix, when calling llapi_hsm_import(), pass a stripe
offset of -1 rather than 0 to select the default. Add sanity-hsm
test_11c() to check that a file may be imported to a directory with a
default striping specifying a pool that does not include OST0000.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I40636c0620b2f9314eb13bf23a8cf6d02990f851
Reviewed-on: https://review.whamcloud.com/41978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
2 months agoLU-14462 gss: remove HAVE_SETNS from lgss_keyring 67/41967/3
Sebastien Buisson [Tue, 9 Mar 2021 16:11:44 +0000 (17:11 +0100)]
LU-14462 gss: remove HAVE_SETNS from lgss_keyring

For the sake of simplification, a previous patch removed the config
check that sets HAVE_SETNS, because setns() necessarily exists in
kernels 3.10+.
As a result, every #ifdef on HAVE_SETNS is now wrong: the macro is
never set even though the function is available.
So remove all references to HAVE_SETNS in the code.

Fixes: 8e88bbfef5 ("LU-12477 lustre: remove obsolete config checks")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iab0726c3e847a210185cc8c9353a79976acb1381
Reviewed-on: https://review.whamcloud.com/41967
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-12678 lnet: convert lpni_refcount to a kref 41/41941/5
Mr. NeilBrown [Thu, 11 Mar 2021 22:43:41 +0000 (17:43 -0500)]
LU-12678 lnet: convert lpni_refcount to a kref

This refcount is used exactly like a kref, so change it to one.
kref uses refcount_t, which will warn on increment-from-zero and
similar problems (when enabled with a CONFIG option), so we don't
need the LASSERT calls.
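A userspace analogue built on C11 atomics shows the two properties the message relies on. It warns when a reference is taken on an object whose count has already reached zero, and exactly one caller learns that it dropped the last reference. struct kref_sketch and its helpers are illustrative only. The kernel's kref API (kref_init(), kref_get(), and kref_put() with a release callback) differs in detail:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace analogue of struct kref; not the kernel implementation. */
struct kref_sketch {
	atomic_uint refcount;
};

static void kref_sketch_init(struct kref_sketch *k)
{
	atomic_init(&k->refcount, 1);
}

/* Taking a reference on an object whose count already hit zero means
 * a use-after-free; refcount_t warns about this, so do the same here. */
static void kref_sketch_get(struct kref_sketch *k)
{
	unsigned int old = atomic_fetch_add(&k->refcount, 1);

	if (old == 0)
		fprintf(stderr, "refcount: increment on 0; use-after-free?\n");
}

/* Drop a reference. Returns true when the caller held the last one and
 * must now release the object (kref_put() calls a callback instead). */
static bool kref_sketch_put(struct kref_sketch *k)
{
	unsigned int old = atomic_fetch_sub(&k->refcount, 1);

	assert(old != 0); /* underflow */
	return old == 1;
}
```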

Change-Id: I857dff2c9838cb7d8f4b5f023f75f2d66119344f
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41941
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-9859 libcfs: remove linux-curproc.c 38/41938/10
Mr. NeilBrown [Thu, 18 Mar 2021 21:57:03 +0000 (17:57 -0400)]
LU-9859 libcfs: remove linux-curproc.c

The only real functionality remaining here is cfs_curproc_cap_pack(),
and it can be trivially implemented as an inline in curproc.h.
So do that and remove the file.

The rest can be moved to jobid.c.

Linux-commit: 37d3b407dc14a13ec8bba3a4d7737c92f996e9c0

Change-Id: I3546841fa44accb19d0867099c17b16ede48228e
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/41938
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14479 ssk: explicitly set perm on key 29/41929/3
Sebastien Buisson [Mon, 8 Mar 2021 14:20:00 +0000 (15:20 +0100)]
LU-14479 ssk: explicitly set perm on key

When an SSK key is loaded, either via the lgss_sk command or via the
skpath mount option, try to set permissions on the key.
This is to avoid a 'Permission denied' error when a Lustre client or
server wants to make use of the key later on.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1ed712ae4d07be306cc76b4e59fab303437558bb
Reviewed-on: https://review.whamcloud.com/41929
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14494 mdt: check object exists in mdt_close_handle_layouts() 05/41905/7
John L. Hammond [Fri, 5 Mar 2021 18:47:43 +0000 (12:47 -0600)]
LU-14494 mdt: check object exists in mdt_close_handle_layouts()

In mdt_close_handle_layouts() the client supplied FID may not identify
an existing object. So check for this before calling lu_object_attr().

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib1710ca4bf7587e0496b3a37a2afb65f81250455
Reviewed-on: https://review.whamcloud.com/41905
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
2 months agoLU-14337 lov: return valid stripe_count/size for PFL files 03/41803/6
Emoly Liu [Tue, 16 Mar 2021 03:04:46 +0000 (11:04 +0800)]
LU-14337 lov: return valid stripe_count/size for PFL files

Dump struct lov_comp_md_v1 in function ll_lov_getstripe_ea_info()
correctly to avoid stripe_count=0 or stripe_size=0 returned by
old interface llapi_file_get_stripe(), which will cause
divide-by-zero for older userspace that calls this ioctl,
e.g. lustre ADIO driver.
The rule is:
- if stripe_count=0, return stripe_count=1;
- if stripe_size=0,
  -- for DoM files, return the stripe size of the second component,
     since the first component of DoM file data is placed on the
     MDT for faster access;
  -- else, return the stripe size of the last component.
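The rule above can be expressed as a small C function. This is a sketch over a hypothetical, simplified component array (struct comp_sketch is not struct lov_comp_md_v1), and it assumes the DoM component is always the first one, as the message implies:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-component view; not lov_comp_md_v1. */
struct comp_sketch {
	uint32_t stripe_count;
	uint32_t stripe_size;
	bool	 is_dom;	/* component data lives on the MDT */
};

/* Patch up the legacy (count, size) pair so that old callers never see
 * zero and never divide by zero. */
static void fix_legacy_stripe_info(const struct comp_sketch *comps, int n,
				   uint32_t *count, uint32_t *size)
{
	assert(comps != 0 && n > 0 && count != 0 && size != 0);

	if (*count == 0)
		*count = 1;
	if (*size == 0) {
		if (n > 1 && comps[0].is_dom)
			*size = comps[1].stripe_size;	  /* DoM: second component */
		else
			*size = comps[n - 1].stripe_size; /* otherwise: last one */
	}
}
```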

Also, lov_getstripe_old.c and sanity-pfl.sh test_25 are added to
verify this patch.

Test-parameters: testlist=sanity-pfl env=ONLY=25

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I4023ca4baff1b1ad2a439aa497baaabc56e891d2
Reviewed-on: https://review.whamcloud.com/41803
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13641 socklnd: remove tcp bonding 00/40000/12
Serguei Smirnov [Tue, 16 Mar 2021 21:34:26 +0000 (17:34 -0400)]
LU-13641 socklnd: remove tcp bonding

TCP bonding in the socklnd has become obsolete with LNet
Multi-Rail and there's no evidence it's being used anywhere.
Remove it to keep the code simple.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ib456f951b8ccd59112c460085632a2cb3c982004
Reviewed-on: https://review.whamcloud.com/40000
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13569 lnet: Recover peer NI w/exponential backoff interval 20/39720/15
Chris Horn [Sun, 23 Aug 2020 15:16:18 +0000 (10:16 -0500)]
LU-13569 lnet: Recover peer NI w/exponential backoff interval

Perform LNet recovery pings of peer NIs with an exponential backoff
interval.
 - The interval is equal to 2^(number failed pings) up to a maximum
   of 900 seconds (15 minutes).
 - When a message is received the count of failed pings for the
   associated peer NI is reset to 0 so that recovery can happen more
   quickly.
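The interval computation can be sketched as below. struct peer_ni_sketch, the helper names and the cap macro are hypothetical. Only the 2^n growth, the 900-second cap and the reset on message receipt come from the commit message:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RECOVERY_INTERVAL_MAX 900 /* seconds (15 minutes) */

/* Hypothetical stand-in for the per-peer-NI state involved. */
struct peer_ni_sketch {
	unsigned int failed_pings;
};

/* Delay in seconds before the next recovery ping: 2^failed, capped. */
static uint32_t recovery_interval(const struct peer_ni_sketch *lpni)
{
	assert(lpni != 0);
	/* 2^10 = 1024 already exceeds the cap; this also avoids huge shifts */
	if (lpni->failed_pings >= 10)
		return SKETCH_RECOVERY_INTERVAL_MAX;
	return UINT32_C(1) << lpni->failed_pings;
}

/* A failed recovery ping lengthens the next interval. */
static void peer_ni_ping_failed(struct peer_ni_sketch *lpni)
{
	lpni->failed_pings++;
}

/* Any message from the peer NI resets the backoff. */
static void peer_ni_msg_received(struct peer_ni_sketch *lpni)
{
	lpni->failed_pings = 0;
}
```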

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic7e60455015a0236a96010c07fc0ddd02078cf92
Reviewed-on: https://review.whamcloud.com/39720
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13569 lnet: Only recover known good peer NIs 19/39719/15
Chris Horn [Thu, 16 Jul 2020 03:38:52 +0000 (22:38 -0500)]
LU-13569 lnet: Only recover known good peer NIs

A peer NI should not be eligible for recovery if we've never
received a message from it.

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iec2fd015f6410ab91c6ef7c222cbed0204243106
Reviewed-on: https://review.whamcloud.com/39719
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13569 lnet: Age peer NI out of recovery 18/39718/15
Chris Horn [Sun, 23 Aug 2020 15:14:22 +0000 (10:14 -0500)]
LU-13569 lnet: Age peer NI out of recovery

No longer send recovery pings to a peer NI that has been in recovery
for the recovery time limit. A peer NI will become eligible for
recovery again once we receive a message from it.

The existing lpni_last_alive field is utilized for this new purpose.
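A sketch of the eligibility test follows. It assumes the time limit is measured from lpni_last_alive; the message says that field is reused but does not give the exact comparison. The struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in; last_alive plays the role of lpni_last_alive. */
struct peer_ni_age_sketch {
	time_t last_alive; /* when we last received a message from the NI */
};

/* Keep sending recovery pings only while the NI has been silent for less
 * than @limit seconds. Once aged out, it becomes eligible again only when
 * an incoming message refreshes last_alive. */
static bool peer_ni_keep_recovering(const struct peer_ni_age_sketch *lpni,
				    time_t now, time_t limit)
{
	assert(lpni != 0 && limit > 0);
	return now - lpni->last_alive < limit;
}
```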

A check for NULL lpni is removed from
lnet_handle_remote_failure_locked() because all callers of that
function already ensure the lpni is non-NULL.

lnet_peer_ni_add_to_recoveryq_locked() now takes the recovery queue
as an argument rather than using the_lnet.ln_mt_peerNIRecovq. This
allows the function to be used by lnet_recover_peer_nis().
lnet_peer_ni_add_to_recoveryq_locked() is also modified to take a ref
on the peer NI if it is added to the recovery queue. Previously, it
was the responsibility of callers to take this ref.

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib4676540ac4bb040690a4fb047236c54eea0e752
Reviewed-on: https://review.whamcloud.com/39718
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 checkpatch: treat CNETERR and CEMERG as log function 96/41996/2
Mr NeilBrown [Wed, 10 Mar 2021 23:22:54 +0000 (10:22 +1100)]
LU-6142 checkpatch: treat CNETERR and CEMERG as log function

CNETERR and CEMERG are log functions and should be treated as such by
checkpatch.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I295f0de9244578ebdc925e0e0783d3b436fc6fb0
Reviewed-on: https://review.whamcloud.com/41996
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 months agoLU-6142 lustre: ptlrpc: don't use list_for_each_entry_safe unnecessarily. 39/41939/3
Neil Brown [Mon, 1 Mar 2021 15:13:56 +0000 (10:13 -0500)]
LU-6142 lustre: ptlrpc: don't use list_for_each_entry_safe unnecessarily.

list_for_each_entry_safe() is only needed if the body of the
loop might change the list, or if it might drop a lock that would
otherwise prevent the list from being changed.

When the body does neither of these, list_for_each_entry() should be
preferred as it makes the behaviour of the loop more clear to readers.

In each of the cases changed here, the list cannot change while the
loop proceeds.
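The distinction can be illustrated in userspace with a hand-rolled singly linked list instead of the kernel's <linux/list.h>. A read-only walk needs no saved cursor, while a walk that frees nodes must fetch the next pointer before the current node goes away:

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

static struct node *push(struct node *head, int val)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->val = val;
	n->next = head;
	return n;
}

/* Read-only walk: the body never unlinks or frees, so the plain form
 * (the analogue of list_for_each_entry()) is enough. */
static int sum_list(const struct node *head)
{
	int sum = 0;

	for (const struct node *n = head; n != NULL; n = n->next)
		sum += n->val;
	return sum;
}

/* Destructive walk: the body frees the current node, so the next pointer
 * is saved first (the analogue of list_for_each_entry_safe()). */
static void free_list(struct node *head)
{
	struct node *n, *tmp;

	for (n = head; n != NULL; n = tmp) {
		tmp = n->next;
		free(n);
	}
}
```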

Change-Id: Ib0f08c5d4d7959b80a7a1490fb606e40e1cf5f85
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13397 lfs: mirror extend/copy keeps sparseness 72/40772/10
Mikhail Pershin [Mon, 23 Nov 2020 11:06:12 +0000 (14:06 +0300)]
LU-13397 lfs: mirror extend/copy keeps sparseness

- make ll_lseek() to work under group lock and on designated
  mirror
- enhance lfs mirror copy functions migrate_copy_data() and
  llapi_mirror_copy_many() with lseek() to find holes and copy
  only data chunks.

Both 'migrate' and 'copy' lfs functionality rewrite the designated
mirror fully, so holes are not punched in the destination file;
instead, truncate is called first to make sure old data is erased.
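The hole-skipping copy can be sketched with the Linux lseek() SEEK_DATA/SEEK_HOLE whence values. This is not the code of migrate_copy_data() or llapi_mirror_copy_many(). It is a generic sketch of the same technique on ordinary file descriptors, and copy_data_extents() is a hypothetical name. As in the description above, the destination is truncated first, and only data chunks are then written.

```c
#define _GNU_SOURCE		/* SEEK_DATA / SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy only the data extents of src_fd into dst_fd at the same offsets.
 * Holes are never written, so they stay holes in the destination.
 * Returns 0 on success or a negative errno. */
static int copy_data_extents(int src_fd, int dst_fd)
{
	char buf[65536];
	off_t size, pos = 0;

	assert(src_fd >= 0 && dst_fd >= 0);
	size = lseek(src_fd, 0, SEEK_END);
	if (size < 0)
		return -errno;
	/* Erase old destination contents first, then extend to the final
	 * size so that a trailing hole is preserved as well. */
	if (ftruncate(dst_fd, 0) < 0 || ftruncate(dst_fd, size) < 0)
		return -errno;

	while (pos < size) {
		off_t hole;

		pos = lseek(src_fd, pos, SEEK_DATA);
		if (pos < 0)
			return errno == ENXIO ? 0 : -errno; /* ENXIO: no data left */
		hole = lseek(src_fd, pos, SEEK_HOLE);
		if (hole < 0)
			return -errno;

		/* Copy the data chunk [pos, hole). */
		while (pos < hole) {
			off_t left = hole - pos;
			size_t want = left < (off_t)sizeof(buf) ?
				      (size_t)left : sizeof(buf);
			ssize_t n = pread(src_fd, buf, want, pos);
			ssize_t w;

			if (n <= 0)
				return n < 0 ? -errno : -EIO;
			w = pwrite(dst_fd, buf, (size_t)n, pos);
			if (w != n)
				return w < 0 ? -errno : -EIO;
			pos += n;
		}
	}
	return 0;
}
```

On a filesystem without hole reporting, SEEK_DATA/SEEK_HOLE fall back to treating the whole file as one data extent, so the sketch still produces a correct (if non-sparse) copy.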

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic4a8768b816c921acd7f0adb3311138caac05a7c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/40772
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoNew tag 2.14.51 2.14.51 v2_14_51
Oleg Drokin [Tue, 23 Mar 2021 05:41:16 +0000 (01:41 -0400)]
New tag 2.14.51

Change-Id: Iaab01cccdeb761183879a9baf42d5106e0e880ce

2 months agoLU-14502 lov: fault page update cp_lov_index 54/41954/4
Bobi Jam [Tue, 9 Mar 2021 09:15:20 +0000 (17:15 +0800)]
LU-14502 lov: fault page update cp_lov_index

In fault IO, vvp_io_fault_start() could find an existing cl_page
associated with the vmpage covering the fault index, and the page
may still refer to another mirror of an old IO.

This patch updates the fault page's cp_lov_index in lov_io_fault_start().

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I50639700159a76061437fd2f1a09dadf25cfd33f
Reviewed-on: https://review.whamcloud.com/41954
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14473 test: check RUNAS and RUNAS_ID 85/41785/5
Olaf Faaland [Sat, 27 Feb 2021 00:53:38 +0000 (16:53 -0800)]
LU-14473 test: check RUNAS and RUNAS_ID

Validate RUNAS and RUNAS_ID before testing a file create, so
that the error messages can be more specific.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: I87b2c279f981b34ab979cca42a8ae06128a294cc
Reviewed-on: https://review.whamcloud.com/41785
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14291 lustre: further cleanup of acl code. 32/42032/3
Mr NeilBrown [Sun, 14 Mar 2021 22:34:55 +0000 (09:34 +1100)]
LU-14291 lustre: further cleanup of acl code.

Code in lustre/obdclass/acl.c is only used in lustre/mdd/, so move the
file there, renaming to mdd_acl.c and removing EXPORT_SYMBOL()
declarations.

The function prototypes in lustre_eacl.h are moved to mdd_internal.h,
and the remainder of that file is discarded.  The
HAVE_STRUCT_ACL_XATTR stanza, in particular, is unnecessary as it
exists in lustre_compat.h.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Idb0978758640c5ad527d2c68c4fdf6dee32a731c
Reviewed-on: https://review.whamcloud.com/42032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 months agoLU-8837 lmv: don't use lqr_alloc spinlock in lmv 49/41949/6
Mr NeilBrown [Mon, 8 Mar 2021 22:28:48 +0000 (09:28 +1100)]
LU-8837 lmv: don't use lqr_alloc spinlock in lmv

The only place the lqr_alloc spinlock is used in lmv is in
lmv_locate_tgt_rr().  The purpose here is presumably to protect
lmv_qos_rr_index from concurrent updates.  This is a field that is
only tangentially related to the structure that holds the spinlock.

lmv_qos_rr_index is directly in 'struct lmv_obd' while lqr_alloc
is in struct lu_qos_rr which is in struct lu_qos, which is in lmv_obd.

As there is a spinlock in 'struct lmv_obd' (lmv_lock) it makes more
sense to use that to protect lmv_qos_rr_index.  Then the entire
lu_qos_rr structure will be unused on the client and can be made
server-only.
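The idea of protecting a round-robin cursor with the lock of the structure that actually contains it can be sketched in userspace. struct lmv_sketch and locate_tgt_rr() are hypothetical simplifications, not the lmv code, and a pthread mutex stands in for the lmv_lock spinlock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, heavily simplified stand-in for struct lmv_obd: the
 * round-robin cursor lives next to the lock that protects it. */
struct lmv_sketch {
	pthread_mutex_t lock;		/* plays the role of lmv_lock */
	unsigned int rr_index;		/* plays the role of lmv_qos_rr_index */
	unsigned int tgt_count;
};

/* Return the next target in round-robin order.  The read-modify-write of
 * rr_index happens entirely under the lock, so concurrent callers never
 * observe or store a stale index. */
static unsigned int locate_tgt_rr(struct lmv_sketch *lmv)
{
	unsigned int idx;

	assert(lmv->tgt_count > 0);
	pthread_mutex_lock(&lmv->lock);
	idx = lmv->rr_index % lmv->tgt_count;
	lmv->rr_index = idx + 1;
	pthread_mutex_unlock(&lmv->lock);
	return idx;
}
```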

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I926e6d31ca0ee1cbfff9905192428e28485ed448
Reviewed-on: https://review.whamcloud.com/41949
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14385 utils: add range check to strtol() in lfs.c 56/41756/4
Jian Yu [Thu, 11 Mar 2021 23:10:29 +0000 (15:10 -0800)]
LU-14385 utils: add range check to strtol() in lfs.c

Most of the strtol() and strtoll() calls in lfs.c
did not check the range of the return value.
This patch fixes those issues.
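A range-checked strtol() call generally needs three checks: that some digits were consumed, that nothing trails them, and that the value is within both the type's range (errno == ERANGE) and the caller's bounds. The helper below is a hypothetical sketch of that pattern, not the code added to lfs.c.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a whole string as a base-10 long and check it lies in [min, max].
 * Returns 0 on success, -EINVAL for malformed input, -ERANGE if the value
 * overflows a long or falls outside the caller's bounds. */
static int parse_long_in_range(const char *str, long min, long max, long *out)
{
	char *end;
	long val;

	assert(str && out);
	errno = 0;			/* strtol() only sets errno on error */
	val = strtol(str, &end, 10);
	if (end == str || *end != '\0')
		return -EINVAL;		/* no digits, or trailing garbage */
	if (errno == ERANGE || val < min || val > max)
		return -ERANGE;
	*out = val;
	return 0;
}
```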

Change-Id: I9ff51662bf0d2320961a7838da08f09552e9ef1e
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41756
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14428 libcfs: discard cfs_trace_copyin_string() 90/41490/4
Mr NeilBrown [Tue, 9 Feb 2021 00:49:30 +0000 (11:49 +1100)]
LU-14428 libcfs: discard cfs_trace_copyin_string()

Instead of cfs_trace_copyin_string(), use memdup_user_nul().
This combines the allocation with the copyin, and nul-terminates.

The resulting code is a lot simpler.
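What memdup_user_nul() combines can be illustrated with a userspace analogue. The kernel function copies from a __user pointer with copy_from_user() and returns an ERR_PTR() on failure. The hypothetical memdup_nul() below only mirrors the allocate, copy and NUL-terminate steps, and returns NULL on failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of memdup_user_nul(): allocate len + 1 bytes, copy
 * len bytes and NUL-terminate, so callers get a usable C string in one
 * call instead of a separate allocate/copy/terminate sequence. */
static char *memdup_nul(const void *src, size_t len)
{
	char *buf;

	assert(src || len == 0);
	buf = malloc(len + 1);
	if (!buf)
		return NULL;
	memcpy(buf, src, len);
	buf[len] = '\0';
	return buf;
}
```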

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I089c5da96b59ec62d177aea2f3d170bf751c6fec
Reviewed-on: https://review.whamcloud.com/41490
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14428 libcfs: discard cfs_trace_console_buffers[] 89/41489/4
Mr NeilBrown [Tue, 9 Feb 2021 00:28:45 +0000 (11:28 +1100)]
LU-14428 libcfs: discard cfs_trace_console_buffers[]

cfs_trace_console_buffers[] is a collection of buffers into which
various messages are formatted - with vscnprintf or similar - and
which are then passed to cfs_print_to_console which adds more
formatted information.

The two levels of formatting can instead be achieved using the "%pV"
format which takes a format-and-args.  If we do this, we don't need
cfs_trace_console_buffers[] any more.

One minor drawback is that cfs_tty_write_message() requires a final
string to print, not a format plus arguments.  This is only minor
because there is precisely one message that is ever sent to
cfs_tty_write_message(), and it contains no formatting.  So we now
generate a warning if the string passed with D_TTY ever contains
formatting, and just print that string ignoring any formatting.
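In the kernel, "%pV" takes a struct va_format (a format string plus a va_list pointer), so an outer printk() can expand a caller's format and arguments without formatting them into an intermediate buffer first. Userspace printf() has no %pV, so the sketch below shows the same two-level idea by passing a va_list down instead. print_to_console() and console_msg() are hypothetical names, not the libcfs functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Lower level: writes its own prefix, then formats the caller's fmt/args
 * directly after it, with no intermediate message buffer. */
static int print_to_console(char *out, size_t len, const char *prefix,
			    const char *fmt, va_list args)
{
	int n, m;

	assert(len > 0);
	n = snprintf(out, len, "%s: ", prefix);
	if (n < 0 || (size_t)n >= len)
		return n;	/* error, or the prefix alone filled the buffer */
	m = vsnprintf(out + n, len - (size_t)n, fmt, args);
	return m < 0 ? m : n + m;
}

/* Upper level: a variadic entry point that forwards fmt and args as-is. */
static int console_msg(char *out, size_t len, const char *prefix,
		       const char *fmt, ...)
{
	va_list args;
	int rc;

	va_start(args, fmt);
	rc = print_to_console(out, len, prefix, fmt, args);
	va_end(args);
	return rc;
}
```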

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic78ac3703e5b6321dade8c367753c0aec1cae60b
Reviewed-on: https://review.whamcloud.com/41489
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14398 hsm: use llapi_fid2path_at() in the copytool 08/41408/2
John L. Hammond [Wed, 3 Feb 2021 20:19:05 +0000 (14:19 -0600)]
LU-14398 hsm: use llapi_fid2path_at() in the copytool

In lhsmtool_posix.c and liblustreapi_hsm.c, convert several uses of
llapi_fid2path() to llapi_fid2path_at().

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ice64d02010b4260287be4d4e26c6b75b178bc81b
Reviewed-on: https://review.whamcloud.com/41408
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14179 lfs: avoid lfs find error with long paths 37/41337/8
Stephane Thiell [Fri, 26 Feb 2021 20:33:04 +0000 (12:33 -0800)]
LU-14179 lfs: avoid lfs find error with long paths

Test that files created in a directory having an absolute path length
of up to PATH_MAX-1 are properly found with lfs find. This change
might not cover even deeper directory trees (beyond PATH_MAX).
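The PATH_MAX-1 boundary comes from the trailing NUL: a path buffer of PATH_MAX bytes holds at most PATH_MAX-1 characters. The hypothetical join_path() below is not the lfs find code; it only sketches how a "dir/name" join can report -ENAMETOOLONG instead of silently truncating at that limit.

```c
#define _GNU_SOURCE		/* make sure glibc exposes PATH_MAX */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096		/* fallback; the usual Linux value */
#endif

/* Build "dir/name" into buf, which must have room for PATH_MAX bytes.
 * A result of up to PATH_MAX - 1 characters (plus the NUL) fits. */
static int join_path(char *buf, const char *dir, const char *name)
{
	int n;

	assert(buf && dir && name);
	n = snprintf(buf, PATH_MAX, "%s/%s", dir, name);
	if (n < 0)
		return -EINVAL;
	if (n >= PATH_MAX)
		return -ENAMETOOLONG;	/* would not fit with its NUL */
	return 0;
}
```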

Signed-off-by: Stephane Thiell <sthiell@stanford.edu>
Change-Id: I44726efd5053c593094587e5c8a4652a3a876641
Reviewed-on: https://review.whamcloud.com/41337
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14119 lfsck: replace dt_lookup() with dt_lookup_dir() 18/41218/3
Lai Siyao [Wed, 13 Jan 2021 09:16:55 +0000 (17:16 +0800)]
LU-14119 lfsck: replace dt_lookup() with dt_lookup_dir()

Lfsck code calls dt_lookup() to look up a file under a directory in
many places, but this function needs the directory to be initialized
with dt_try_as_dir() first, which is missing in several places. Since
the overhead is trivial, call dt_lookup_dir() instead.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I40bd8d51edece50353af1729cf867572a0abea78
Reviewed-on: https://review.whamcloud.com/41218
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14110 obdclass: Protect cl_env_percpu[] 65/40565/11
Etienne AUJAMES [Tue, 3 Nov 2020 14:35:17 +0000 (15:35 +0100)]
LU-14110 obdclass: Protect cl_env_percpu[]

cl_env_percpu is not protected against multiple client mounts on the
same node: "keys_fill" could be called with the same cl_env_percpu
context by several mount processes (race on lu_context.lc_value).

This patch adds a mutex for cl_env_percpu to protect the contexts'
"refill".

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Icfd6f3715899fa4ac5279e932f462e7cf29d98bd
Reviewed-on: https://review.whamcloud.com/40565
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 llite: use d_is_symlink to test if dentry is a symlink 70/41770/5
Mr NeilBrown [Fri, 16 Oct 2020 00:07:21 +0000 (11:07 +1100)]
LU-6142 llite: use d_is_symlink to test if dentry is a symlink

Using d_is_symlink() is preferred to testing ->get_link or
->follow_link.

A recent patch made this work for foreign files/dirs by making sure
the entry type in d_flags is correct, so we can simplify the code in
ll_revalidate_dentry().

Fixes: 15d44e787e17 ("LU-12682 llite: fake symlink type of foreign file/dir")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie4c33ae1fb9a660ccbd50e2c70b6cde65cc9b990
Reviewed-on: https://review.whamcloud.com/41770
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14480 pool: wrong usage with ost list 15/41815/3
Vitaly Fertman [Wed, 16 Dec 2020 22:02:32 +0000 (01:02 +0300)]
LU-14480 pool: wrong usage with ost list

When the OST list is given on setstripe, it should have priority
over the pool. Also, at creation time we only check whether the 1st OST
is in the pool, which worked well in the past with -c and
works even with -C, but not with the OST list when some of the OSTs
are out of the pool.

Make the --pool and --ost options mutually exclusive.
Drop the pool inheritance if the OST list is given.
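The two rules above can be expressed as a small validation step. struct stripe_opts and resolve_pool() are hypothetical simplifications, not the lfs or lod code, and where inheritance is actually resolved (client or server) is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified view of setstripe options. */
struct stripe_opts {
	const char *pool;		/* given with --pool, or NULL */
	const char *inherited_pool;	/* inherited from the parent, or NULL */
	int nr_osts;			/* OSTs given with --ost, 0 if none */
};

/* An explicit OST list and --pool are mutually exclusive, and an explicit
 * OST list drops any inherited pool.  On success *pool_out is the pool to
 * use (possibly NULL); otherwise -EINVAL is returned. */
static int resolve_pool(const struct stripe_opts *o, const char **pool_out)
{
	assert(o && pool_out);
	if (o->nr_osts > 0 && o->pool)
		return -EINVAL;
	if (o->nr_osts > 0)
		*pool_out = NULL;
	else
		*pool_out = o->pool ? o->pool : o->inherited_pool;
	return 0;
}
```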

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I94a7fe97391f1185392f986f78ab1a372238972a
Reviewed-on: https://es-gerrit.dev.cray.com/158198
HPE-bug-id: LUS-9579
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/41815
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14182 lov: cancel layout lock on replay deadlock 67/40867/2
Vitaly Fertman [Fri, 4 Dec 2020 16:35:19 +0000 (19:35 +0300)]
LU-14182 lov: cancel layout lock on replay deadlock

Layout locks are not replayed; instead they are cancelled as unused,
which requires taking lov_conf_lock. That semaphore may already be held
by cl_lock_flush(), which prepares a new IO that cannot be sent to the
MDS while it is in recovery, so the replay deadlocks.

HPE-bug-id: LUS-9232
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I1a1a91a81c19ad4deca9ff581107512642f0b666
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/40867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14291 build: use tgt_pool for lov layer 83/39683/8
James Simmons [Fri, 26 Feb 2021 21:41:09 +0000 (16:41 -0500)]
LU-14291 build: use tgt_pool for lov layer

New general code was created for target pool handling. We can
use this new code in the lov layer. Place tgt_pool.c in
obdclass instead of having a special target directory just to
build this code for the client.

Change-Id: I05542c1d654d79647f5e0853bb1d587ff265fdf9
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/39683
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 lustre: remove ll_file_*_flag wrappers. 92/40292/6
Mr NeilBrown [Thu, 15 Oct 2020 23:34:04 +0000 (10:34 +1100)]
LU-6142 lustre: remove ll_file_*_flag wrappers.

ll_file_{test,set,clear,test_and_set}_flag are simple wrappers around
the various *_bit() functions.  They don't aid readability and the
convention in the kernel is to use the *_bit() functions directly.
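For readers less familiar with the kernel bitops, the userspace sketch below mimics set_bit(), clear_bit(), test_bit() and test_and_set_bit() with C11 atomics. These are illustrative stand-ins with simplified signatures, not the kernel implementations, and the flag numbers are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-ins for the kernel's atomic bit operations. */
static void set_bit(unsigned int nr, _Atomic unsigned long *addr)
{
	atomic_fetch_or(&addr[nr / BITS_PER_LONG], 1UL << (nr % BITS_PER_LONG));
}

static void clear_bit(unsigned int nr, _Atomic unsigned long *addr)
{
	atomic_fetch_and(&addr[nr / BITS_PER_LONG],
			 ~(1UL << (nr % BITS_PER_LONG)));
}

static bool test_bit(unsigned int nr, _Atomic unsigned long *addr)
{
	return (atomic_load(&addr[nr / BITS_PER_LONG]) >>
		(nr % BITS_PER_LONG)) & 1UL;
}

/* Atomically set the bit and report whether it was already set. */
static bool test_and_set_bit(unsigned int nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);

	return atomic_fetch_or(&addr[nr / BITS_PER_LONG], mask) & mask;
}

/* Hypothetical per-file flag numbers, used with the bitops directly
 * rather than through ll_file_*_flag()-style wrappers. */
enum { SKETCH_FLAG_A = 0, SKETCH_FLAG_B = 1 };
```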

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0d50f8936ad9f97882f4771dd3210cc05fe43989
Reviewed-on: https://review.whamcloud.com/40292
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-8837 ptlrpc: mark some functions as static 47/41947/5
Mr NeilBrown [Mon, 1 Mar 2021 04:38:42 +0000 (15:38 +1100)]
LU-8837 ptlrpc: mark some functions as static

The functions
 ptlrpc_start_threads,
 ptlrpc_start_thread,
 ptlrpc_stop_all_threads,
 ptlrpc_nrs_policy_register
and
 ptlrpc_nrs_policy_register

are only used in the same file that defines them, so mark them as
'static' and remove the declarations from include files.

 ptlrpc_nrs_policy_unregister

is never used at all, so remove it completely.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id7862b9da3c58ab980c0fcd4d07c1f119fbf7581
Reviewed-on: https://review.whamcloud.com/41947
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 months agoLU-14289 ptlrpc: rename cfs_binheap to simply binheap 75/41375/3
Mr NeilBrown [Mon, 1 Feb 2021 02:16:12 +0000 (13:16 +1100)]
LU-14289 ptlrpc: rename cfs_binheap to simply binheap

As the binheap code is no longer part of libcfs, the cfs_ prefix is
misleading.  As this code is local to one module and doesn't conflict
with anything global, there is no need for a prefix at all.  So change
cfs_binheap to binheap.

This patch was prepared using 'sed', followed by fixing a few
text-alignment issues caused by the loss of those 4 characters.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I168bec50898ec7b9ab72dc91b080af4852ddb3a4
Reviewed-on: https://review.whamcloud.com/41375
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-14291 lustre: clean up lustre_eacl.h and make server-only 26/41126/4
Mr NeilBrown [Thu, 29 Oct 2020 05:13:36 +0000 (16:13 +1100)]
LU-14291 lustre: clean up lustre_eacl.h and make server-only

lustre_eacl.h contains a number of declarations that are never used:
remove them.

The declarations which are used are only needed on server-side files,
so remove the #include from elsewhere.

As obdclass/acl.c is only built server-side, remove the
 #ifdef HAVE_SERVER_SUPPORT
in the file.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If1a3d908bf8357041c38ab9d335efa1e051cef16
Reviewed-on: https://review.whamcloud.com/41126
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-13783 libcfs: don't depend on sysctl support for debugfs 32/40832/4
Mr NeilBrown [Thu, 12 Nov 2020 00:16:28 +0000 (11:16 +1100)]
LU-13783 libcfs: don't depend on sysctl support for debugfs

Since Linux v5.8-rc1~55^2~6 sysctl support routines like
proc_dointvec() expect a pointer to kernel-space, not userspace.

So stop using these functions for debugfs files, and instead
provide bespoke functions.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I340a748bbfbd066054a73299ce32698aa39a0e2d
Reviewed-on: https://review.whamcloud.com/40832
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13783 libcfs: support __vmalloc with only 2 args. 28/40328/7
Mr NeilBrown [Wed, 21 Oct 2020 04:26:35 +0000 (15:26 +1100)]
LU-13783 libcfs: support __vmalloc with only 2 args.

Since v5.8-rc1~201^2~19 Commit 88dca4ca5a93 ("mm: remove the pgprot
argument to __vmalloc") __vmalloc only takes 2 arguments.

So introduce __ll_vmalloc, which takes 2 args and calls
__vmalloc with the correct number of args.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2c89512a12e28b27544a891620e448a9b752b089
Reviewed-on: https://review.whamcloud.com/40328
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13903 utils: move userland only nidstr.h handling 15/39115/5
James Simmons [Mon, 8 Mar 2021 14:09:40 +0000 (09:09 -0500)]
LU-13903 utils: move userland only nidstr.h handling

The function cfs_expand_nidlist() no longer exists for kernel
internals. We can move the function prototype from the UAPI
header to string.h, which is a libcfs userland header.
The structure netstrfns, which is defined in a UAPI header,
has accumulated userland-only handling. Additionally, its
use of struct list_head will confuse reviewers, since
kernel developers see this as a kernel-only thing.

Test-Parameters: trivial

Change-Id: Ifc3c87f6d3237a94d282d009455ff389278e73ea
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39115
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10391 socklnd: convert ksocknal_add_peer to take sockaddr 08/38408/7
Mr NeilBrown [Tue, 28 Jan 2020 01:15:13 +0000 (12:15 +1100)]
LU-10391 socklnd: convert ksocknal_add_peer to take sockaddr

ksocknal_add_peer() now takes a 'struct sockaddr' which is currently
always an IPv4 address.  ksocknal_launch_packet() is the main place
where the nid is converted to an IP address.
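Converting the IPv4 part of a nid into a socket address can be sketched with the standard socket structures. ipv4_to_sockaddr() is a hypothetical helper, not the socklnd code, and it assumes the address is held as a 32-bit value in host byte order. Storing the result in a struct sockaddr_storage lets a caller pass it on as a struct sockaddr * and later carry IPv6 addresses without changing the signature.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *ss with an AF_INET address for ip_host:port.  ip_host is assumed
 * to be in host byte order; the socket structures want network order. */
static void ipv4_to_sockaddr(uint32_t ip_host, uint16_t port,
			     struct sockaddr_storage *ss)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)ss;

	assert(ss);
	memset(ss, 0, sizeof(*ss));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(port);
	sin->sin_addr.s_addr = htonl(ip_host);
}
```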

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I194248662798542096e5cc9af985e6c0063a038a
Reviewed-on: https://review.whamcloud.com/38408
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-12752 mdt: commitrw_write() - check dying object under lock 97/41797/5
Vladimir Saveliev [Mon, 1 Mar 2021 08:52:51 +0000 (11:52 +0300)]
LU-12752 mdt: commitrw_write() - check dying object under lock

If a process writes to an unlinked file, the following race between
mdt_commitrw_write() and mdd_close() may occur, because
mdt_commitrw_write() checks whether an object is dying without a lock:

mdt_commitrw_write() checks lu_object_is_dying(&mo->mot_header) and it
is not dying yet

mdd_close() interposes and destroys the object via
  mdo_destroy()
    lod_destroy()
      lod_sub_destroy()
        osd_destroy()
          obj->oo_destroyed = 1;

mdt_commitrw_write() continues, locks the object and returns -ENOENT
from

  dt_attr_get()
    osd_attr_get()
      if (unlikely(obj->oo_destroyed))
        return -ENOENT;

If the file consists of a DoM component and a raid component,
ll_delete_inode() calls cl_sync_file_range(), which iterates over both
the mdt and raid components via mdc_io_fsync_start() and
osc_io_fsync_start().  As mdc_io_fsync_start() fails with -ENOENT due
to the failed write RPC, osc_io_fsync_start() does not get called. Then
truncate_inode_pages_final() finds not-discarded pages and fails with:

  (osc_page.c:183:osc_page_delete()) Trying to teardown failed: -16
  (osc_page.c:184:osc_page_delete()) ASSERTION( 0 ) failed:
  (osc_page.c:184:osc_page_delete()) LBUG

A test illustrating the issue is added.

The fix is to call lu_object_is_dying() under the object lock.
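The general check-under-lock pattern can be sketched in userspace. struct obj_sketch, obj_destroy() and commit_write() are hypothetical, not the mdt code. Because the destroy path sets 'dying' while holding the object lock, a writer that tests 'dying' only after taking the same lock cannot be overtaken between its check and its write. How the real mdt_commitrw_write() then handles a dying object is not stated above, so the sketch just reports it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum write_result { WRITE_DONE, WRITE_SKIPPED_DYING };

struct obj_sketch {
	pthread_mutex_t lock;
	bool dying;		/* only changed with 'lock' held */
	long size;
};

static void obj_destroy(struct obj_sketch *o)
{
	pthread_mutex_lock(&o->lock);
	o->dying = true;
	pthread_mutex_unlock(&o->lock);
}

/* Take the lock first, then check for a dying object: a concurrent
 * obj_destroy() cannot slip in between the check and the update. */
static enum write_result commit_write(struct obj_sketch *o, long new_size)
{
	enum write_result res;

	assert(o);
	pthread_mutex_lock(&o->lock);
	if (o->dying) {
		res = WRITE_SKIPPED_DYING;
	} else {
		o->size = new_size;
		res = WRITE_DONE;
	}
	pthread_mutex_unlock(&o->lock);
	return res;
}
```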

Change-Id: I463c8a6f85d4f5fd934b167c6194f50ae9d4b7d4
HPE-bug-id: LUS-7189
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-on: https://review.whamcloud.com/41797
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14184 tests: component-add/del tests for DOM 70/40870/3
Vitaly Fertman [Fri, 4 Dec 2020 18:55:41 +0000 (21:55 +0300)]
LU-14184 tests: component-add/del tests for DOM

Make duplicates of sanity-pfl tests 2 and 3 for the DOM layout.

HPE-bug-id: LUS-8282
Test-parameters: testlist="sanity-pfl/2.* sanity-pfl/3.*"
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: If73d7a436b2fc6b6b564cc6eec14ec9e7e4d6937
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/40870
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-12828 ldlm: not freed req on enqueue 18/41818/2
Vitaly Fertman [Tue, 2 Mar 2021 20:43:08 +0000 (23:43 +0300)]
LU-12828 ldlm: not freed req on enqueue

ldlm_cli_enqueue() may allocate a req, then fail to allocate a req
slot and return an error without freeing the req.
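The usual remedy for this kind of leak is a single cleanup label that every error path after the allocation goes through. The sketch below is hypothetical (struct req_sketch, enqueue() and the live_reqs counter are not ldlm code, and -ENOMEM is only a placeholder error); it shows that pattern and lets the test confirm nothing is left allocated when the slot cannot be obtained.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct req_sketch {
	int slot;
};

static int live_reqs;		/* allocated but not yet freed requests */

static struct req_sketch *req_alloc(void)
{
	struct req_sketch *req = calloc(1, sizeof(*req));

	if (req)
		live_reqs++;
	return req;
}

static void req_free(struct req_sketch *req)
{
	if (req) {
		assert(live_reqs > 0);
		live_reqs--;
		free(req);
	}
}

/* slot_ok simulates whether reserving a request slot succeeds.  Every
 * failure after req_alloc() goes through out_free, so req is not leaked. */
static int enqueue(int slot_ok, struct req_sketch **reqp)
{
	struct req_sketch *req = req_alloc();
	int rc;

	if (!req)
		return -ENOMEM;
	if (!slot_ok) {
		rc = -ENOMEM;		/* placeholder error code */
		goto out_free;
	}
	req->slot = 1;
	*reqp = req;
	return 0;

out_free:
	req_free(req);
	return rc;
}
```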

Fixes: 85a12c6c8d ("LU-12828 ldlm: FLOCK request can be processed twice")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I9663528bbf2bf64f6439fed6c27d0bc3f274b867
HPE-bug-id: LUS-9337
Reviewed-on: https://es-gerrit.dev.cray.com/158433
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/41818
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14183 ldlm: wrong ldlm_add_waiting_lock usage 68/40868/2
Vitaly Fertman [Fri, 4 Dec 2020 17:22:55 +0000 (20:22 +0300)]
LU-14183 ldlm: wrong ldlm_add_waiting_lock usage

exp_bl_lock_at originally accounted for the period from the BLAST send
until the cancel RPC came to the server. LU-6032 started to update
l_blast_sent for expired locks which are still busy, i.e. locks that
were prolonged when the timeout expired. In fact, it is a good idea to
cover not the whole period but only until any involved RPC comes: it
avoids excessively large lock callback timeouts, and the IO which
prolongs the lock is also able to restart the AT cycle by updating
l_blast_sent.

Unfortunately, the change seems to have been made unintentionally, as
the main prolong code was not adjusted accordingly.

Fixes: 292aa42e08 ("LU-6032 ldlm: don't disable softirq for exp_rpc_lock")
HPE-bug-id: LUS-9278
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Idc598508fc13aa33ac9fce56f13310ca6fc819d4
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/40868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted 36/41936/3
Yang Sheng [Mon, 8 Mar 2021 14:53:13 +0000 (22:53 +0800)]
LU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted

A request may still be referenced by another target even after the
service threads have stopped. This is caused by some portals being
shared among different services. As a workaround, just wait for the
request to be released.

LustreError: (service.c::ptlrpc_service_purge_all())
ASSERTION( list_empty(&svcpt->scp_rqbd_posted) ) failed:
LustreError: (service.c::ptlrpc_service_purge_all()) LBUG
Pid: 21, comm: umount 3.10.0 #1 SMP
Call Trace:
 [<a01c47dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
 [<a01c488c>] lbug_with_loc+0x4c/0xa0 [libcfs]
 [<a0b534dd>] ptlrpc_unregister_service+0xced/0xd90 [ptlrpc]
 [<a005e122>] ost_cleanup+0x82/0x1b0 [ost]
 [<a08e0bfa>] class_free_dev+0x1ca/0x630 [obdclass]
 [<a08e1240>] class_export_put+0x1e0/0x2b0 [obdclass]
 [<a08e2cc5>] class_unlink_export+0x135/0x170 [obdclass]
 [<a08f8030>] class_decref+0x80/0x160 [obdclass]
 [<a08f8481>] class_detach+0x1b1/0x2e0 [obdclass]
 [<a08fef21>] class_process_config+0x1a91/0x2820 [obdclass]
 [<a08ffe90>] class_manual_cleanup+0x1e0/0x6d0 [obdclass]
 [<a092a115>] server_stop_servers+0xd5/0x160 [obdclass]
 [<a092f6c6>] server_put_super+0x126/0xca0 [obdclass]
 [<8121068a>] generic_shutdown_super+0x6a/0xf0
 [<81210a62>] kill_anon_super+0x12/0x20
 [<a09027e2>] lustre_kill_super+0x32/0x50 [obdclass]
 [<81210e59>] deactivate_locked_super+0x49/0x60
 [<812115a6>] deactivate_super+0x46/0x60
 [<8123019f>] cleanup_mnt+0x3f/0x80
 [<81230232>] __cleanup_mnt+0x12/0x20
 [<810ab085>] task_work_run+0xb5/0xf0
 [<8102ac12>] do_notify_resume+0x92/0xb0
 [<81783c83>] int_signal+0x12/0x17
 Kernel panic - not syncing: LBUG
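The workaround of waiting rather than asserting can be sketched with a condition variable. struct svcpt_sketch, rqbd_release() and purge_wait() are hypothetical, not the ptlrpc code. Instead of asserting that nothing is still posted, the purge blocks until the last outstanding reference is dropped.

```c
#include <assert.h>
#include <pthread.h>

/* 'posted' counts request buffers still referenced, possibly by another
 * service sharing the same portal. */
struct svcpt_sketch {
	pthread_mutex_t lock;
	pthread_cond_t released;
	int posted;
};

static void rqbd_release(struct svcpt_sketch *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->posted > 0);
	if (--s->posted == 0)
		pthread_cond_broadcast(&s->released);
	pthread_mutex_unlock(&s->lock);
}

/* Wait until every posted buffer has been released, then return so the
 * purge can continue, instead of failing an assertion on a non-empty list. */
static void purge_wait(struct svcpt_sketch *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->posted > 0)
		pthread_cond_wait(&s->released, &s->lock);
	pthread_mutex_unlock(&s->lock);
}
```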

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Idfb19df123ceae177a0e447e9344bac6861166bf
Reviewed-on: https://review.whamcloud.com/41936
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14492 tests: sanity 27Cb skip condition 03/41903/3
Alexander Zarochentsev [Thu, 15 Oct 2020 08:09:09 +0000 (11:09 +0300)]
LU-14492 tests: sanity 27Cb skip condition

The test skip condition is wrong and causes the
test to be skipped if large xattrs are not supported.
Fix other tests as well.

Test-Parameters: trivial
Fixes: 591a9b4ce ("LU-9846 lod: Add overstriping support")
HPE-bug-id: LUS-9454
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I7b9d96abb5e4cf2a3955e20828e57a64978e6229
Reviewed-on: https://review.whamcloud.com/41903
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>