Whamcloud - gitweb
fs/lustre-release.git
3 months ago LU-9859 libcfs: rename cfs_cpt_table to cfs_cpt_tab 19/37519/2
NeilBrown [Mon, 10 Feb 2020 16:22:02 +0000 (11:22 -0500)]
LU-9859 libcfs: rename cfs_cpt_table to cfs_cpt_tab

The variable "cfs_cpt_table" has the same name as
the structure "struct cfs_cpt_table".
This makes it hard to use #define to make one disappear
on a uni-processor build, but keep the other.
So rename the variable to cfs_cpt_tab.
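
As a rough illustration of why the distinct name helps, the sketch below defines the variable away on a uni-processor build while leaving the structure type usable. The SKETCH_CONFIG_SMP switch, the single-field structure body, and the sketch_cpt_number() helper are assumptions made for this example, not code from the patch.

```c
#include <stddef.h>

/* Simplified stand-in for the real structure; one field is enough here. */
struct cfs_cpt_table {
	int ctb_nparts;
};

#ifdef SKETCH_CONFIG_SMP	/* hypothetical build switch */
extern struct cfs_cpt_table *cfs_cpt_tab;
#else
/*
 * Uni-processor build: the variable disappears.  Because the macro is
 * named cfs_cpt_tab, later uses of "struct cfs_cpt_table" (such as the
 * helper below) are left intact.  A macro named cfs_cpt_table would be
 * expanded inside those uses and break them.
 */
#define cfs_cpt_tab ((struct cfs_cpt_table *)NULL)
#endif

/* Hypothetical helper: number of CPU partitions, 1 when there is no table. */
static inline int sketch_cpt_number(const struct cfs_cpt_table *tab)
{
	return tab ? tab->ctb_nparts : 1;
}
```

With the macro in place, sketch_cpt_number(cfs_cpt_tab) evaluates against a NULL table and falls back to a single partition.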

Linux-commit: 457d63ea5c1aa81fe0b9a66a77a2282856b88983

Test-Parameters: trivial

Change-Id: I77cc6694183df2485974c8a962a5766a905fb5f9
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/37519
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10756 ptlrpc: fix IMP_CLOSED state is being never set 05/37405/4
Mikhail Pershin [Mon, 3 Feb 2020 09:03:59 +0000 (12:03 +0300)]
LU-10756 ptlrpc: fix IMP_CLOSED state is being never set

Commit cf78502e48d checks the new state for the IMP_CLOSED value
instead of the import's current state, so instead of keeping the
import closed it prevents the import state from ever being set to
IMP_CLOSED.

This patch restores the original check, which keeps the import closed
by checking its current state.
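
The difference between the two checks can be sketched as follows. The enum, the structure, and both function names are hypothetical reductions chosen for this example; the real import state handling carries more states and locking.

```c
/* Hypothetical, reduced set of import states. */
enum sketch_imp_state {
	SKETCH_IMP_CLOSED = 1,
	SKETCH_IMP_NEW,
	SKETCH_IMP_FULL,
};

struct sketch_import {
	enum sketch_imp_state imp_state;
};

/* Regressed behaviour: tests the NEW state, so CLOSED can never be set. */
static inline void sketch_set_state_regressed(struct sketch_import *imp,
					      enum sketch_imp_state new_state)
{
	if (new_state == SKETCH_IMP_CLOSED)
		return;
	imp->imp_state = new_state;
}

/* Restored behaviour: tests the CURRENT state, so a closed import stays closed. */
static inline void sketch_set_state_restored(struct sketch_import *imp,
					     enum sketch_imp_state new_state)
{
	if (imp->imp_state == SKETCH_IMP_CLOSED)
		return;
	imp->imp_state = new_state;
}
```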

Fixes: cf78502e48d ("LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7df2798f09ce7023381c03957adf530da4149c2d
Reviewed-on: https://review.whamcloud.com/37405
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13190 mds: send mbo_max_mdsize in open intent reply 00/37400/6
Alex Zhuravlev [Sun, 2 Feb 2020 13:45:29 +0000 (16:45 +0300)]
LU-13190 mds: send mbo_max_mdsize in open intent reply

 - client sends open|create intent before a connection to OST;
   cl_default_mds_easize is 0 since initialization
 - MDS replies back without UPDATE bit in LDLM lock, but with EA
   (MDS doesn't send OBD_MD_FLMODEASIZE and mbo_max_mdsize back)
 - client's cl_default_mds_easize is still 0
 - client sends getattr intent with 0-size buffer for EA
 - MDS replies LAYOUT lock, but empty EA due to 0-size buffer
 - client sets local layout to EMPTY
 - all subsequent I/O fails with -EBADF
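
The sequence above can be modelled in a few lines. This is only a sketch under stated assumptions: the structures, the has_max_mdsize field (standing in for the OBD_MD_FLMODEASIZE flag), and both helpers are hypothetical and do not mirror the actual client code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client state: only the default EA size matters here. */
struct sketch_client {
	uint32_t cl_default_mds_easize;
};

/* Hypothetical open intent reply; has_max_mdsize models OBD_MD_FLMODEASIZE. */
struct sketch_open_reply {
	bool     has_max_mdsize;
	uint32_t mbo_max_mdsize;
};

/* Grow the client's default EA size when the reply carries a larger one. */
static inline void sketch_process_open_reply(struct sketch_client *cli,
					     const struct sketch_open_reply *rep)
{
	if (rep->has_max_mdsize &&
	    rep->mbo_max_mdsize > cli->cl_default_mds_easize)
		cli->cl_default_mds_easize = rep->mbo_max_mdsize;
}

/* EA buffer size the client would offer in a later getattr intent. */
static inline uint32_t sketch_getattr_ea_bufsize(const struct sketch_client *cli)
{
	return cli->cl_default_mds_easize;
}
```

If the reply omits the size, the buffer offered for the next getattr stays at 0, which is the failing case described in the list.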

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iadd5595d956f0469e3916cdc1cca2ac8f802a149
Reviewed-on: https://review.whamcloud.com/37400
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-12722 target: disable recovery for local clients 25/36025/38
Alexey Zhuravlev [Mon, 9 Sep 2019 14:00:05 +0000 (17:00 +0300)]
LU-12722 target: disable recovery for local clients

When a client is running on a server node, the local
services can't rely on that client in the context of
recovery: such a client dies with the node and can't replay
requests and states, so the restarting server has to
wait until recovery expires, which doesn't make any sense.

So the servers should recognize local clients and exclude
them from recovery (i.e. don't make them part of last_rcvd).

For the purpose of local testing, a special mount option
"local_recov" has been added to {MDS|OST}_MOUNT_OPTS in
tests/cfg/local.sh to preserve local testing when everything
is running within a single node.

Signed-off-by: Alexey Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4cb906c44c1192933f7d77dc782160e426e9efde
Reviewed-on: https://review.whamcloud.com/36025
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-12280 quota: add notify grace 17/36017/8
Hongchao Zhang [Thu, 10 Oct 2019 21:06:05 +0000 (17:06 -0400)]
LU-12280 quota: add notify grace

Add an option to be notified when quota usage is over the soft limit,
while preventing the soft limit from turning into a hard limit.

Change-Id: I01ae1266c3683198b82af7bad119db280c1e3a07
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36017
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-9859 libcfs: remove unnecessary cfs_block_allsigs() calls 50/35350/15
NeilBrown [Mon, 10 Feb 2020 14:06:31 +0000 (09:06 -0500)]
LU-9859 libcfs: remove unnecessary cfs_block_allsigs() calls

Threads started by kthread_run() ignore all signals,
as kthreadd() calls ignore_signals(), and this is
inherited by all children.
So there is no need to call cfs_block_allsigs() in functions
that are only run from kthread_run().

For the case of lnet_ping_md_unlink() it is not from a kernel
thread but nothing in that function should be affected by
signals so it is safe to remove.

For lnet_ping() we need to manually block signals since
LNetEQPoll() can unconditionally abort when a signal is
received.

Linux-commit: 1b2dad1459e480028a2714439048d8a634132857

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I124dccf78a3187d5f4a31c7b76db5369aaafc369
Reviewed-on: https://review.whamcloud.com/35350
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-12477 lustre: remove obsolete config checks 85/37085/16
James Simmons [Sat, 8 Feb 2020 13:39:30 +0000 (08:39 -0500)]
LU-12477 lustre: remove obsolete config checks

Remove from the lustre kernel code all the support for kernels
earlier than the RHEL7 3.10+. This greatly simplifies the code
and makes build times much better.

Change-Id: If52091ac5249b2719b992032040ccf30cc5bf0e4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37085
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10235 mdt: mdt_create: check EEXIST without lock 80/30880/18
Dominique Martinet [Wed, 10 Jan 2018 13:08:06 +0000 (14:08 +0100)]
LU-10235 mdt: mdt_create: check EEXIST without lock

mkdir() currently gets a write lock on the parent even if the new
directory already exists.

This patch adds an initial lookup of the new directory without a DLM
lock so that other clients do not need to cancel their DLM lock if the
"new" directory already exists; creation continues as usual if the
directory does not exist.

There is a small race window in which the child may be created by
another client after our check and before the parent is locked, but
this is detected later during index insert.
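
The control flow described above can be sketched in userspace C. Everything here is a simplified assumption for illustration: the sketch_dir structure, the write_locks_taken counter standing in for the DLM write lock, and the helper names are not taken from mdt_create().

```c
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 8

/* Hypothetical parent directory with a counter in place of a real lock. */
struct sketch_dir {
	const char *names[SKETCH_MAX_ENTRIES];
	int nr_names;
	int write_locks_taken;	/* how often the parent was write-locked */
};

static inline int sketch_lookup(const struct sketch_dir *dir, const char *name)
{
	for (int i = 0; i < dir->nr_names; i++)
		if (strcmp(dir->names[i], name) == 0)
			return 1;
	return 0;
}

/* The insert re-checks for the name, which catches a lost race. */
static inline int sketch_index_insert(struct sketch_dir *dir, const char *name)
{
	if (sketch_lookup(dir, name))
		return -EEXIST;
	if (dir->nr_names == SKETCH_MAX_ENTRIES)
		return -ENOSPC;
	dir->names[dir->nr_names++] = name;
	return 0;
}

static inline int sketch_mkdir(struct sketch_dir *dir, const char *name)
{
	int rc;

	/* Cheap lookup first: no lock is taken if the name already exists. */
	if (sketch_lookup(dir, name))
		return -EEXIST;

	dir->write_locks_taken++;	/* stand-in for locking the parent */
	rc = sketch_index_insert(dir, name);
	/* the parent lock would be released here */
	return rc;
}
```

A second sketch_mkdir() of the same name returns -EEXIST and leaves write_locks_taken unchanged, which is the case the patch is meant to make cheap.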

Performance change on two haswell 16-core VMs with ib, mean values of
mpirun -n 8 ./mdtest -D -i 8 -I 1000

test environment | directory creation | tree creation
local, no patch  | 1725/s             | 769/s
local, patch     | 1821/s             | 788/s
remote, no patch | 1729/s             | 772/s
remote, patch    | 1687/s             | 787/s

The differences are of the order of the noise here, with all mkdirs
being effective.

If directories exist, some simple stress on four nodes shows intended
improvements:
clush -w vm[0-3] 'seq 0 10000 |
    xargs -P 7 -I{} sh -c "(({}%3==0)) &&
        mkdir /mnt/lustre/testdir/foo 2>/dev/null ||
        stat /mnt/lustre/testdir > /dev/null"'

with patch: 10s
without patch: 19s
(the difference grows exponentially with number of clients and hangs
with over 60 clients without the patch; exact time was not re-measured
with patch)

Updated sanityn.sh 43a 45a to avoid race conditions.

Add sanityn.sh test_43j to verify above scenario.

Test-Parameters: envdefinitions=SLOW=yes testlist=replay-vbr,replay-vbr
Change-Id: I37fc9c8ffc7ab334c0645042beda5bef01284564
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/30880
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months ago LU-11597 tests: skip sanityn tests for PPC 61/37561/2
James Nunez [Thu, 13 Feb 2020 20:10:53 +0000 (13:10 -0700)]
LU-11597 tests: skip sanityn tests for PPC

Several sanityn test suite tests fail consistently when
testing PPC clients.  These tests should be skipped, by
adding them to the ALWAYS_EXCEPT list, until the failures are
understood and fixed.

Tests to skip in sanityn are
16a (LU-11597)
71a (LU-11787)

Test-Parameters: trivial clientarch=ppc64 testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I39cc9d22e8a47eb8ef59ce8d30e1b6e9aa616a9a
Reviewed-on: https://review.whamcloud.com/37561
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-11269 ptlrpc: do not expose transient IDLE state 23/37523/4
Alex Zhuravlev [Mon, 10 Feb 2020 21:06:07 +0000 (00:06 +0300)]
LU-11269 ptlrpc: do not expose transient IDLE state

Do not expose the transient IDLE state, to avoid cases where anyone
sending an RPC observes the connection in this state while it is going
to reconnect right away.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9ca89051c4176fe321262f8b2f52969c382e401e
Reviewed-on: https://review.whamcloud.com/37523
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13228 clio: mmap write when overquota 95/37495/4
Alexander Zarochentsev [Fri, 20 Dec 2019 23:19:44 +0000 (02:19 +0300)]
LU-13228 clio: mmap write when overquota

Flagging a client with the overquota flag should not
cause an mmap write access to raise SIGBUS in the application.

Cray-bug-id: LUS-8221
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: I29d5901fa5078b5cfca40391a02531cf27efce93
Reviewed-on: https://review.whamcloud.com/37495
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13131 osc: remove redundant osc_list() helper 79/37479/4
Andreas Dilger [Fri, 7 Feb 2020 22:01:49 +0000 (15:01 -0700)]
LU-13131 osc: remove redundant osc_list() helper

The osc_list() helper function is the same as list_empty_marker(),
and we don't need both.  Remove osc_list() from the code.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I07d2a519906f52fca8c95613a14ad7389a3ebbe5
Reviewed-on: https://review.whamcloud.com/37479
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-9679 various: use list_splice and list_splice_init 57/37457/2
Mr NeilBrown [Wed, 13 Nov 2019 03:03:12 +0000 (14:03 +1100)]
LU-9679 various: use list_splice and list_splice_init

The construct
   list_add(to, from);
   list_del(from);
is equivalent to
   list_splice(from, to);
providing 'to' has been initialized.
Similarly with list_del_init and list_splice_init.
There is no need to check if list_empty(from) first.

Also, looping over a list and moving individual entries to
another list can more easily be done with list_splice.

These changes improve code clarity.
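
The equivalence can be checked with a small userspace re-implementation of the list primitives. The sketch below is written for this example and is not the kernel's <linux/list.h>; sketch_count() is a hypothetical helper added only to inspect the result.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'entry' right after 'head'. */
static inline void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink 'entry' from whatever list it is on. */
static inline void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Join all entries of 'list' onto the front of 'head'. */
static inline void list_splice(struct list_head *list, struct list_head *head)
{
	if (list->next != list) {
		struct list_head *first = list->next;
		struct list_head *last = list->prev;

		first->prev = head;
		last->next = head->next;
		head->next->prev = last;
		head->next = first;
	}
}

/* Hypothetical helper: number of entries on the list, excluding the head. */
static inline int sketch_count(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Running list_add(to, from); list_del(from); on one list and list_splice(from, to); on an identically built list with an initialized, empty 'to' leaves 'to' holding the same entries in the same order.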

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I710eb3bbd83c75e6c8f00b8d0a4c256ad28f9082
Reviewed-on: https://review.whamcloud.com/37457
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 lnet: discard lnet_sock_accept() 03/37303/4
Mr NeilBrown [Thu, 7 Nov 2019 04:02:54 +0000 (15:02 +1100)]
LU-10391 lnet: discard lnet_sock_accept()

There is no longer any important difference between
lnet_sock_accept(), and kernel_accept(..., O_NONBLOCK).
So remove lnet_sock_accept().

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Iad7c91abe2359758982e3740a21c91232c919aa0
Reviewed-on: https://review.whamcloud.com/37303
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10391 lnet: use data_ready callback to trigger accept() 02/37302/6
Mr NeilBrown [Wed, 22 Jan 2020 06:16:12 +0000 (17:16 +1100)]
LU-10391 lnet: use data_ready callback to trigger accept()

Rather than blocking in lnet_sock_accept(), set up a data_ready
callback, and use that to find out when to call lnet_sock_accept()
again.

This simplifies lnet_sock_accept() (which will be removed in
next patch), and means that we could listen on multiple
sockets, which will be useful for IPv6 support.

The code design is based on that in net/sunrpc/svcsock.c.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I3015f2f6b6d420af5c8454b6c1a99611b48e7702
Reviewed-on: https://review.whamcloud.com/37302
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-9679 llite: Discard LUSTRE_FPRIVATE() 52/36652/11
Mr NeilBrown [Sun, 3 Nov 2019 23:09:25 +0000 (10:09 +1100)]
LU-9679 llite: Discard LUSTRE_FPRIVATE()

The LUSTRE_FPRIVATE() macro adds no value.
Instead of
  LUSTRE_FPRIVATE(file)
use
  file->private_data

which is shorter and more familiar, and widely used
elsewhere in lustre.

Also re-indent several functions where this was changed, to
use TABs.
Also join together some strings that were split across 2
lines.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I811aea8069b22beed15fd96d8c6bef8eca42defd
Reviewed-on: https://review.whamcloud.com/36652
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10467 ptlrpc: convert final users of LWI_TIMEOUT_INTERVAL 79/35979/11
Mr NeilBrown [Tue, 27 Aug 2019 22:47:09 +0000 (08:47 +1000)]
LU-10467 ptlrpc: convert final users of LWI_TIMEOUT_INTERVAL

LWI_TIMEOUT_INTERVAL causes l_wait_event to perform a slow
poll loop.  This is only needed if the event can happen without
triggering a wakeup on the wait-queue.

In this case, the event is a counter reaching zero, and we can
easily ensure a wakeup is sent whenever that counter becomes
zero.
So let's add those wake_ups, and change this to a simple
wait_event_idle_timeout().

At the same time, change all wake_up_all() calls on this wait queue to
simple wake_up().  wake_up_all() is only needed where there are
exclusive waiters, and this queue has no exclusive waiters.

Change-Id: I2bea069150f21b725025bacc7a4fa0cf4d95ab20
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/35979
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10467 lustre: use l_wait_event_abortable where appropriate. 75/35975/8
Mr NeilBrown [Mon, 26 Aug 2019 05:59:07 +0000 (15:59 +1000)]
LU-10467 lustre: use l_wait_event_abortable where appropriate.

If the lwi passed to l_wait_event() was created with

    lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);

the effect is to wait with no timeout while blocking all
non-fatal signals.
For this, we now have l_wait_event_abortable(), or for one
case l_wait_event_abortable_exclusive();
So use those.

l_wait_event_abortable() will return -ERESTARTSYS if a signal was
received, while l_wait_event() returns -EINTR.  We need to be
careful to handle this difference.
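
One way a caller might handle the difference is to map the new return code back to the one it already expects. The helper name below is hypothetical, and the fallback definition of ERESTARTSYS is needed only because that kernel-internal code is not exported to userspace headers; this is a sketch, not the conversion used in the patch.

```c
#include <errno.h>

/* ERESTARTSYS is kernel-internal and absent from userspace <errno.h>. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/*
 * Hypothetical helper: callers written against l_wait_event() expect
 * -EINTR on a signal, so translate -ERESTARTSYS and pass everything
 * else through unchanged.
 */
static inline int sketch_wait_rc_to_eintr(int rc)
{
	return rc == -ERESTARTSYS ? -EINTR : rc;
}
```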

Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Iadf0fab92fcfd46802766198dcbe6b6b349214fa
Reviewed-on: https://review.whamcloud.com/35975
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-12988 ldiskfs: revert prefetch patch 19/37619/2
Alex Zhuravlev [Wed, 19 Feb 2020 08:12:31 +0000 (11:12 +0300)]
LU-12988 ldiskfs: revert prefetch patch

Revert the prefetch patch, as a problem leading to I/O errors was found.
Also, the patch for the 4.18 kernel needs fixes.

Revert "LU-12988 ldiskfs: mballoc to prefetch groups"

This reverts commit 05f31782be20fc4c46082dba02c10bcea59539e3.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I478a011e561633516524697f3a4aa03734791790
Reviewed-on: https://review.whamcloud.com/37619
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13166 osd-ldiskfs: fix to allow to get system inode 21/37421/4
Wang Shilong [Wed, 5 Feb 2020 12:33:38 +0000 (20:33 +0800)]
LU-13166 osd-ldiskfs: fix to allow to get system inode

Lustre needs to load ldiskfs system inodes for quota accounting,
so pass the LDISKFS_IGET_SPECIAL flag to ldiskfs_iget(); otherwise,
quota support on CentOS 8 will be broken.

Fixes: 8ab3aa50a14 ("LU-12355 ldiskfs: Added ext4_iget_flags to ext4_iget")
Change-Id: I3a30ec540444b149bc3398a62951d2826eb7b9ce
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37421
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-9859 libcfs: move tracefile locking from linux-tracefile.c 08/37408/4
NeilBrown [Tue, 4 Feb 2020 02:28:36 +0000 (21:28 -0500)]
LU-9859 libcfs: move tracefile locking from linux-tracefile.c
 to tracefile.c

There is no value in keeping it separate.

Linux-commit: 49209c598d93289ca077575615e98f242b1d8156

Test-Parameters: trivial

Change-Id: I24ee7545a40fd6d2ac15018f089d51142736fa27
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/37408
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-12747 tests: wait properly for orphan thread stop 95/37395/2
Andreas Dilger [Sat, 1 Feb 2020 00:55:23 +0000 (17:55 -0700)]
LU-12747 tests: wait properly for orphan thread stop

Use wait_update_facet() to check if the MDD orphan cleanup thread has
exited, rather than a fixed 5s timeout.  We can hope that most cases
will finish faster than 5s, but don't gratuitously fail if it takes
somewhat longer.  We clearly aren't having a fatal problem here, or
there would be serious failures at cleanup time.

Fixes: fffef5c29e3b ("LU-11418 mdd: delete name if orphan doesn't exist")
Test-Parameters: trivial testlist=sanity envdefinitions=ONLY=811,ONLY_REPEAT=100
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I16b0281a519d47b5b98d495bf17040153c3ebbe5
Reviewed-on: https://review.whamcloud.com/37395
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13169 tests: add ONLY_REPEAT parameter to repeat subtests 21/37321/8
Andreas Dilger [Fri, 24 Jan 2020 09:20:38 +0000 (02:20 -0700)]
LU-13169 tests: add ONLY_REPEAT parameter to repeat subtests

Add the ONLY_REPEAT environment variable, to allow tests specified
by ONLY to be run multiple times, to ensure that the test is passing
consistently (or that a fix for an intermittent bug works).  This is
faster than restarting the test session multiple times for only a few
subtests.

Have the iteration around the subshell started for run_one() so that
any registered stack_trap EXIT calls are triggered between iterations,
the fail_loc is reset, grant/health/error checks are done, and so on.

Remove $tdir and $tfile files after each iteration to avoid failures
with the subsequent subtest runs.  For tests that do not follow the
standard naming convention for test directories and files, they need
to be updated to use $tdir and $tfile, which is good in any case.

YAML output splits each iteration into a separate subtest for Maloo.
The output from run_one() is appended to a single output file for all
iterations so all output is captured instead of just the last one.

The iterations will continue until $ONLY_REPEAT loops pass, or until
the subtest hits an error.  Trying to continue for all iterations in
the face of errors would likely end up with all of the later iterations
also failing due to leftover state from the previous failure, and the
goal is for the subtests to pass consistently.  If we are trying to
determine rates of intermittent failures, this can be estimated as
1/num_passes, which is about the same as num_failures/ONLY_REPEAT
iterations.
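
The stop-at-first-failure policy can be sketched as a small loop. The test framework itself is shell; the C below is only a model of the policy, and the callback type, function names, and the two sample subtests are assumptions made for this example.

```c
/* Hypothetical subtest callback: returns 0 on pass, non-zero on failure. */
typedef int (*sketch_subtest_fn)(int iteration);

/*
 * Run 'subtest' up to 'only_repeat' times, stopping at the first failure
 * so that leftover state does not poison later iterations.  The number
 * of passing iterations is reported through 'num_passes'.
 */
static inline int sketch_run_repeated(sketch_subtest_fn subtest,
				      int only_repeat, int *num_passes)
{
	*num_passes = 0;
	for (int i = 1; i <= only_repeat; i++) {
		int rc = subtest(i);

		if (rc != 0)
			return rc;
		(*num_passes)++;
	}
	return 0;
}

/* Sample subtests used only to exercise the loop. */
static inline int sketch_always_passes(int iteration)
{
	(void)iteration;
	return 0;
}

static inline int sketch_fails_on_third(int iteration)
{
	return iteration == 3 ? -1 : 0;
}
```

The num_passes value reported at the first failure is the quantity the 1/num_passes estimate above refers to.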

Rename variables in subtests to avoid clash with testnum, testname,
and TESTNAME, and use them consistently in functions and subtests.

Test-Parameters: testlist=sanity envdefinitions=ONLY=27l,ONLY_REPEAT=100
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Change-Id: I5449590dc3e25c113b059974fb7b96c892434380
Reviewed-on: https://review.whamcloud.com/37321
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Charlie Olmstead <charlie@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-12637 kernel: RHEL 8.1 server support 68/36968/19
Jian Yu [Tue, 4 Feb 2020 00:20:35 +0000 (16:20 -0800)]
LU-12637 kernel: RHEL 8.1 server support

This patch makes changes to support RHEL 8.1 release with
kernel 4.18.0-147.3.1.el8_1 for Lustre server.

Test-Parameters: trivial \
envdefinitions=SANITY_EXCEPT="411 812b" \
clientdistro=el8.1 serverdistro=el8.1 \
testlist=sanity

Change-Id: Iee6ae3dc20c62caaac1d740b14c5877ff7bfb4d5
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36968
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13178 build: Update ZFS version to 0.8.3 73/37373/6
Nathaniel Clark [Thu, 30 Jan 2020 18:18:09 +0000 (10:18 -0800)]
LU-13178 build: Update ZFS version to 0.8.3

New Features 0.8.0

* Native encryption #5769 - The encryption property enables the
creation of encrypted filesystems and volumes. The aes-256-ccm
algorithm is used by default. Per-dataset keys are managed with zfs
load-key and associated subcommands.

* Raw encrypted 'zfs send/receive' #5769 - The zfs send -w option
allows an encrypted dataset to be sent and received to another pool
without decryption. The received dataset is protected by the original
user key from the sending side. This allows datasets to be efficiently
backed up to an untrusted system without fear of the data being
compromised.

* Device removal #6900 - This feature allows single and mirrored
top-level devices to be removed from the storage pool with zpool
remove. All data is copied in the background to the remaining
top-level devices and the pool capacity is reduced accordingly.

* Pool checkpoints #7570 - The zpool checkpoint subcommand allows
you to preserve the entire state of a pool and optionally revert back
to that exact state. It can be thought of as a pool wide snapshot.
This is useful when performing complex administrative actions which
are otherwise irreversible (e.g. enabling a new feature flag,
destroying a dataset, etc).

* Pool TRIM #8419 - The zpool trim subcommand provides a way to
notify the underlying devices which sectors are no longer allocated.
This allows an SSD to more efficiently manage itself and helps prevent
performance from degrading. Continuous background trimming can be
enabled via the new autotrim pool property.

* Pool initialization #8230 - The zpool initialize subcommand writes
a pattern to all the unallocated space. This eliminates the first
access performance penalty, which may exist on some virtualized
storage (e.g. VMware VMDKs).

* Project accounting and quota #6290 - This feature adds project
based usage accounting and quota enforcement to the existing space
accounting and quota functionality. Project quotas add an additional
dimension to traditional user/group quotas. The zfs project and zfs
projectspace subcommands have been added to manage projects, set quota
limits and report on usage.

* Channel programs #6558 - The zpool program subcommand can be used
to perform compound ZFS administrative actions via Lua scripts in a
sandboxed environment (with time and memory limits).

* Pyzfs #7230 - The new pyzfs library is intended to provide a
stable interface for the programmatic administration of ZFS. This
wrapper provides a one-to-one mapping for the libzfs_core API
functions, but the signatures and types are more natural to Python.

* Python 3 compatibility #8096 - The arcstat, arcsummary, and
dbufstat utilities have been updated to be compatible with Python 3.

* Direct IO #7823 - Adds support for Linux's direct IO interface.

Performance

* Sequential scrub and resilver #6256 - When scrubbing or
resilvering a pool the process has been split into two phases. The
first phase scans the pool metadata in order to determine where the
data blocks are stored on disk. This allows the second phase to issue
scrub I/O as sequentially as possible, greatly improving performance.

* Allocation classes #5182 - Allows a pool to include a small number
of high-performance SSD devices that are dedicated to storing specific
types of frequently accessed blocks (e.g. metadata, DDT data, or small
file blocks). A pool can opt-in to this feature by adding a special or
dedup top-level device.

* Administrative commands #7668 - Improved performance due to
targeted caching of the metadata required for administrative commands
like zfs list and zfs get.

* Parallel allocation #7682 - The allocation process has been
parallelized by creating multiple "allocators" per-metaslab group.
This results in improved allocation performance on high-end systems.

* Deferred resilvers #7732 - This feature allows new resilvers to be
postponed if an existing one is already in progress. By waiting for
the running resilver to complete, redundancy is restored as quickly as
possible.

* ZFS Intent Log (ZIL) #6566 - New log blocks are created and issued
while there are still outstanding blocks being serviced by the
storage, effectively reducing the overall latency observed by the
application.

* Volumes #8615 - When a pool contains a large number of volumes
they are more promptly registered with the system and made available
for use after a zpool import.

* QAT #7295 #7282 #6767 - Support for accelerated SHA256 checksums,
AES-GCM encryption, and the new QAT Intel(R) C62x Chipset / Atom(R)
C3000 Processor Product Family SoC.

Changes in Behavior

* Relaxed (ref)reservation constraints on volumes, they may now be
set larger than the volume size.

* The arcstat.py, arc_summary.py, and dbufstat.py commands have been
renamed arcstat, arc_summary, and dbufstat respectively.

* The SPL source is now included in the ZFS repository removing the
need for separate packages.

* The dedupditto pool property and zfs send -D option have been
deprecated and will be removed in a future release.

Changes for 0.8.1

* Fix comparison signedness in arc_is_overflowing() #8873
* Fix incorrect error message for raw receive #8863
* arc_summary: prefer python3 version and install when there is no
python #8851
* Fix %post and %postun generation in kmodtool #8866
* Reinstate raw receive check when truncating #8852 #8857
* If $ZFS_BOOTFS contains guid, replace the guid portion with $pool
* Fix integer overflow of ZTOI(zp)->i_generation #8858
* hkdf_test binary should only have one icp instance #8850
* Fixed a small typo in man/man1/raidz_test.1 #8855
* Allow TRIM_UNUSED_KSYM when build as a builtin-module #8820
* Make Python detection optional and more portable #8809 #8731
* Wait in 'S' state when send/recv pipe is blocking #8733 #8752
* Make zfs_async_block_max_blocks handle zero correctly #8829 #8289
* Revert "Report holes when there are only metadata changes" #8816
* Exclude log device ashift from normal class #8735
* Fix integer overflow in get_next_chunk() #8778 #8797
* Double-free of encryption wrapping key due to invalid pool
properties #8791
* Endless loop in zpool_do_remove() on platforms with unsigned char
* Fix embedded bp accounting in count_block() #8800 #8766
* Disable parallel processing for 'zfs mount -l' #8762 #8811
* Linux 5.2 compat: Directly call wait_on_page_bit() #8794
* Linux 5.2 compat: Fix config/kernel-shrink.m4 test failure #8776
* Linux 5.2 compat: Remove config/kernel-set-fs-pwd.m4 #8777
* zpool: status -t is not documented in help message #8782
* VERIFY3P() message is missing a space character #8786
columns #8785
* zfs: don't pretty-print objsetid property #8784
* zfs: missing newline character in zfs_do_channel_program() error
message #8783
* Fix ksh-path for random_readwrite_fixed.ksh #8779
* Linux 2.6.39 compat: Test if kstrtoul() exists #8760 #8761
* Device removal panics on 32-bit systems #8790
* zpool: trim -p is not a valid option #8781
* Fix coverity defects: CID 186143 #8788
* Fix kstat state update during pool transition #8746
* Linux 5.2 compat: rw_tryupgrade() #8730

Changes for 0.8.2

* Disabled resilver_defer feature leads to looping resilvers #9299
  #9338
* Fix dsl_scan_ds_clone_swapped logic #9140 #9163
* Scrubbing root pools may deadlock on kernels without
  elevator_change() (#9321) #9321
* QAT related bug fixes #9276 #9303
* kmodtool: depmod path #8724 #9310
* Fix /etc/hostid on root pool deadlock #9256 #9285
* BuildRequires libtirpc-devel needed for RHEL 8 #9289
* Fix zpool subcommands error message with some unsupported options
  #9270
* Fix zfs-dkms .deb package warning in prerm script #9271
* zvol_wait script should ignore partially received zvols #9260
* New service that waits on zvol links to be created #8975
* Always refuse receiving non-resume stream when resume state exists
  #9252
* Fix Intel QAT / ZFS compatibility on v4.7.1+ kernels #9268 #9269
* etc/init.d/zfs-functions.in: remove arch warning
* zfs_handle used after being closed/freed in change_one callback
  #9165
* Fix zil replay panic when TX_REMOVE followed by TX_CREATE #7151
  #8910 #9123 #9145
* zfs_ioc_snapshot: check user-prop permissions on snapshotted
  datasets #9179 #9180
* Fix Plymouth passphrase prompt in initramfs script #9202
* Fix deadlock in 'zfs rollback' #9203
* Make slog test setup more robust #9194
* zfs-mount-generator: dependencies should be space-separated #9174
* Linux 5.3: Fix switch() fall through compiler errors #9170
* Linux 5.3 compat: Makefile subdir-m no longer supported #9169
* Fix out-of-order ZIL txtype lost on hardlinked files #8769 #9061
* Increase default zcmd allocation to 256K #9084
* Improve performance by using dmu_tx_hold_*_by_dnode() #9081
* Fix channel programs on s390x #8992 #9080
* Race between zfs-share and zfs-mount services #9083
* Implement secpolicy_vnode_setid_retain() #9035 #9043
* zed crashes when devid not present #9054 #9060
* Don't directly cast unsigned long to void* #9065
* Fix module_param() type for zfs_read_chunk_size #9051
* Move some tests to cli_user/zpool_status #9057
* Race condition between spa async threads and export #9015 #9044
* hdr_recl calls zthr_wakeup() on destroyed zthr #9047
* Fix wrong comment on zcr_blksz_{min,max} #9052
* Retire unused spl_{mutex,rwlock}_{init_fini} #9029
* Linux 5.3 compat: retire rw_tryupgrade() #9029
* Linux 5.3 compat: rw_semaphore owner #9029
* Fix lockdep recursive locking false positive in dbuf_destroy #8984
* Add missing __GFP_HIGHMEM flag to vmalloc #9031
* Use zfsctl_snapshot_hold() wrapper #9039
* Minor style cleanup #9030
* Fix get_special_prop() build failure #9020
* systemd encryption key support #8750 #8848
* Drop redundant POSIX ACL check in zpl_init_acl() #9009
* Export dnode symbols #9027
* Ensure dsl_destroy_head() decrypts objsets #9021
* Disable unused pathname::pn_path* (unneeded in Linux) #9025
* Fixes: #8934 Large kmem_alloc #8934 #9011
* Fix ZTS killed processes detection #9003
* pkg-utils python sitelib for SLES15 #8969
* Fix race in parallel mount's thread dispatching algorithm #8450
  #8833 #8878
* Fix dracut Debian/Ubuntu packaging #8990 #8991
* Remove VERIFY from dsl_dataset_crypt_stats() #8976
* Improve "Unable to automount" error message. #8959
* Check b_freeze_cksum under ZFS_DEBUG_MODIFY conditional #8979
* Fix error text for EINVAL in zfs_receive_one() #8977
* Don't use d_path() for automount mount point for chroot'd process
  #8903 #8966
* nopwrites on dmu_sync-ed blocks can result in a panic #8957
* Avoid extra taskq_dispatch() calls by DMU #8909
* -Y option for zdb is valid #8926
* Fix error message on promoting encrypted dataset #8905 #8935
* Fix out-of-tree build failures #8921 #8943
* dn_struct_rwlock can not be held in dmu_tx_try_assign() #8929
* Remove arch and relax version dependency #8914
* Add libnvpair to libzfs pkg-config #8919
* Let zfs mount all tolerate in-progress mounts #8881
* zstreamdump: add per-record-type counters and an overhead counter
  #8432
* Fix comments on zfs_bookmark_phys #8945
* Add SCSI_PASSTHROUGH to zvols to enable UNMAP support #8933
* Prevent pointer to an out-of-scope local variable #8924 #8940
* dedup=verify doesn't clear the blkptr's dedup flag #8936
* Update vdev_ops_t from illumos #8925
* Allow unencrypted children of encrypted datasets #8737 #8870
* Replace whereis with type in zfs-lib.sh #8920 #8938
* Use ZFS_DEV macro instead of literals #8912
* Fix memory leak in check_disk() #8897 #8911
* kmod-zfs-devel rpm should provide kmod-spl-devel #8930
* ZTS: Fix mmp_interval failure #8906
* Minimize aggsum_compare(&arc_size, arc_c) calls. #8901
* Python config cleanup #8895
* lz4_decompress_abd declared but not defined #8894
* panic in removal_remap test on 4K devices #8893
* compress metadata in later sync passes #8892
* Move write aggregation memory copy out of vq_lock #8890
* Restrict filesystem creation if name referred either '.' or '..'
  #8842 #8564
* ztest: dmu_tx_assign() gets ENOSPC in spa_vdev_remove_thread()
  #8889
* Fix lockdep warning on insmod #8868 #8884
* fat zap should prefetch when iterating #8862
* Target ARC size can get reduced to arc_c_min #8864
* Fix typo in vdev_raidz_math.c #8875 #8880
* Improve ZTS block_device_wait debugging #8839
* Block_device_wait does not return an error code #8839
* Remove redundant redundant remove #8839
* Fix logic error in setpartition function #8839
* Allow metaslab to be unloaded even when not freed from #8837
* Avoid updating zfs_gitrev.h when rev is unchanged #8860
* l2arc_apply_transforms: Fix typo in comment #8822
* Reduced IOPS when all vdevs are in the
  zfs_mg_fragmentation_threshold #8859
* Drop objid argument in zfs_znode_alloc() (sync with OpenZFS) #8841
* Remove vn_set_fs_pwd()/vn_set_pwd() (no need to be at / during
  insmod) #8826
* grammar: it is / plural agreement #8818
* Refactor parent dataset handling in libzfs zfs_rename() #8815
* Update comments to match code #8759
* Update descriptions for vnops #8767
* Drop local definition of MOUNT_BUSY #8765
* kernel timer API rework #8647

Changes for 0.8.3

* Fix zfs-0.8.3 "qat.h"
* Prevent unnecessary resilver restarts #9155 #9378 #9551 #9588
* Fix QAT allocation failure return value #9784 #9788
* Fix zfs-0.8.3 zfs_receive_raw test case
* zdb: print block checksums with 6 d's of verbosity
* zfs-load-key.sh: ${ZFS} is not the zfs binary #9780
* Avoid some crashes when importing a pool with corrupt metadata #9022
* In initramfs, do not prompt if keylocation is "file://" #9764
* libspl: declare aok extern in header #9752
* Cancel initialize and TRIM before vdev_metaslab_fini() #8602 #9751
* Update maximum kernel version to 5.4 #9754 #9759
* Fix for ARC sysctls ignored at runtime
* cppcheck: (warning) Possible null pointer dereference: nvh #9732
* cppcheck: (error) Address of local auto-variable assigned #9732
* cppcheck: (error) Null pointer dereference: who_perm #9732
* cppcheck: (warning) Possible null pointer dereference: dnp #9732
* cppcheck: (error) Memory leak: vtoc #9732
* cppcheck: (error) Shifting signed 64-bit value by 63 bits #9732
* cppcheck: (error) Uninitialized variable #9732
* Exchanged two "${ZFS} get -H -o value" commands #9736
* Create symbolic links in /dev/disk/by-vdev for nvme disk devices
  #9730
* Force systems with kernel option "quiet" to display prompt for
  password #9731
* initramfs: setup keymapping and video for prompts #9723
* Don't fail to apply umask for O_TMPFILE files #8997 #8998
* Allow empty ds_props_obj to be destroyed #9704
* Fix use-after-free of vd_path in spa_vdev_remove() #9706
* zio_decompress_data always ASSERTs successful decompression #9612
  #9630
* Exclude data from cores unconditionally and metadata conditionally
  #9691
* Set send_realloc_files.ksh to use properties.shlib #9679
* Fix reporting of L2ARC hits/misses in arc_summary3 #9669
* Fix zdb_read_block using zio after it is destroyed #9644 #9657
* Fix use-after-free in case of L2ARC prefetch failure #9648
* Increase allowed 'special_small_blocks' maximum value #9131 #9355
* Adapt gitignore for modules #9656
* Fix encryption logic in systemd mount generator #9611
* Fix non-absolute path in systemd mount generator #9611
* Fix small typo in systemd mount generator #9611
* Implement -A (ignore ASSERTs) for zdb #9610
* Remove zfs_vdev_elevator module option #9417 #9609
* Add display of checksums to zdb -R #9607
* Check for unlinked znodes after igrab() #9602
* Remove requirement for -d 1 for zfs list and zfs get with bookmarks
  #9589
* Break out of zfs_zget early if unlinked znode #9583
* Remove inappropriate error message suggesting to use '-r' #9574
* Change zed.service to zfs-zed.service in man page #9581
* Prevent NULL pointer dereference in blkg_tryget() on EL8 kernels
  #9546 #9577
* Add missing documentation for some KMC flags #9034
* Fix zpool create -o <property> error message #9550 #9568
* Improve logging of 128KB writes #9409
* Skip loading already loaded key #9495 #9529
* Add a notice in /etc/defaults/zfs for systemd users #9544
* Include prototypes for vdev_initialize #9535
* dracut/zfs-load-key.sh: properly remove prefixes #9520
* Fix contrib/zcp/Makefile.am #9527
* Fix 'zfs change-key' with unencrypted child #9524
* Fix zpool history unbounded memory usage #9516
* Fix incremental recursive encrypted receive #9494
* Use correct format string when printing int8 #9486
* Name anonymous enum of KMC_BIT constants #9478
* Update skc_obj_alloc for spl kmem caches that are backed by Linux
  #9474
* Modify sharenfs=on default behavior #9397 #9425
* Implement ZPOOL_IMPORT_UDEV_TIMEOUT_MS #9436
* Clarify loop variable name in zfs copies test #9445
* Fix pool creation with feature@allocation_classes disabled #9427
  #9429
* Update zfs program command usage #9056 #9428
* Fix automount for root filesystems #9381 #9384
* Rename rangelock_ functions to zfs_rangelock_ #9402
* Workaround to avoid a race when /var/lib is a persistent dataset
  #9360
* Fix for zfs-dracut regression #8913 #9379
* Perform KABI checks in parallel #8547 #9132 #9341
* SIMD: Use alloc_pages_node to force alignment #9608 #9674
* Linux 5.0 compat: SIMD compatibility
* Add warning for zfs_vdev_elevator option removal #9317
* diff_cb() does not handle large dnodes #7678 #8931 #9343
* Use signed types to prevent subtraction overflow #9355
* Refactor libzfs_error_init newlines #9330
* Device removal of indirect vdev panics the kernel #9327
* Fix clone handling with encryption roots #9267 #9294
* Canonicalize Python shebangs #9314
* Fix stalled txg with repeated noop scans #9300
* Clean up do_vol_test in zfs_copies tests #9286
* Fix noop receive of raw send stream #9221 #9173
* Clean up zfs_clone_010_pos #9284
* Refactor checksum operations in tests #9280
* Use the right booleans #9264
* Fix panic on DilOS with kstat per dataset statistics #9254 #9151
* maxinflight can overflow in spa_load_verify_cb() #9272
* Fix typos #9251
* Fix typos in module/zfs/ #9240
* Fix typos in lib/ #9237
* Fix typos in module/ #9241
* Fix typos in modules/icp/ #9239
* Fix typos in include/ #9238
* Fix typos in etc/ #9236
* Fix typos in contrib/ #9235
* Fix typos in cmd/ #9234
* Fix typos in man/ #9233
* Fix typos in config/ #9232
* Fix refquota_007_neg.ksh #9257
* Prevent metaslab_sync panic due to spa_final_dirty_txg #9185 #9186
  #9231 #9253
* Simplify deleting partitions in libtest #9224
* Use compatible arg order in tests #9228
* Use smaller default slack/delta value for schedule_hrtimeout_range()
  #9217
* Prefer for(;;) to while (TRUE) #9219
* Add regression test for "zpool list -p" #9134
* Split argument list, satisfy shellcheck SC2086 #9212
* Fix install error introduced by #9089
* Document ZFS_DKMS_ENABLE_DEBUGINFO in userland configuration #9191
* Dedup IOC enum values in libzfs_input_check #9188
* Enhance ioctl number checks #9187
* Minor cleanup in Makefile.am #9189
* zfs-functions.in: in_mtab() always returns 1 #9168
* Fix lockdep circular locking false positive involving sa_lock #9110
* Set "none" scheduler if available (initramfs) #9042
* Add more refquota tests #9139
* initramfs: fixes for (debian) initramfs #7904 #9089
* dmu_tx_wait() hang likely due to cv_signal() in
  dsl_pool_dirty_delta() #9137
* Improve write performance by using dmu_read_by_dnode() #9156
* Assert that a dnode's bonuslen never exceeds its recorded size #8348
* Make txg_wait_synced conditional in zfsvfs_teardown #9115
* Prevent race in blkptr_verify against device removal #9112
* Fix device expansion when VM is powered off #9111
* spa_load_verify() may consume too much memory #9146
* Change boolean-like uint8_t fields in znode_t to boolean_t #9092
* Drop KMC_NOEMERGENCY #9119
* Don't wakeup unnecessarily in 'zpool events -f' #9091
* Test cancelling a removal in ZTS #9101
* lockdep false positive - move txg_kick() outside of ->dp_lock #9094
* Add channel program for property based snapshots #8443 #9050
* install path fixes #9087
* Don't activate metaslabs with weight 0 #8968
* OpenZFS 9318 - vol_volsize_to_reservation does not account for raidz
  skip blocks #8973
* Concurrent small allocation defeats large allocation #8843
* Fix bp_embedded_type enum definition #8951
* OpenZFS 9425 - channel programs can be interrupted #8904
* looping in metaslab_block_picker impacts performance on fragmented
  pools #8877
* single-chunk scatter ABDs can be treated as linear #8580
* make zil max block size tunable #8865

Test-Parameters: testlist=sanity,sanity-hsm,sanityn,sanity-quota fstype=zfs ostcount=2 mdscount=2
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I28e9cd70e56a2d73fc9b8347a9ddfe28e0a85090
Reviewed-on: https://review.whamcloud.com/37373
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9855 lustre: replace LPROCFS_CLIMP_CHECK() 56/36956/6
Mr NeilBrown [Mon, 9 Dec 2019 05:33:46 +0000 (16:33 +1100)]
LU-9855 lustre: replace LPROCFS_CLIMP_CHECK()

The usage pattern for LPROCFS_CLIMP_CHECK() is clumsy.
It must be paired with LPROCFS_CLIMP_EXIT(), but
not doing this does not produce a compile-time error.
The 'import' should not be dereferenced before the CHECK, or
used after the EXIT, but sometimes it is.

Replace it with a structured macro/statement:

 with_obd_imp_lock(obd, imp, rc) {
     statements;
 }

statements are protected by the semaphore and only run if imp can be
set to a non-NULL pointer.
rc can be changed by the statements, and should be returned
afterwards as it may have been set to -ENODEV.
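
A user-space sketch of how a for-loop based macro of this shape can
work. Everything prefixed demo_ is hypothetical: the counter-based
"semaphore" and the structures only stand in for the kernel and
Lustre types, and this is not the macro from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel rw_semaphore and the Lustre types. */
struct demo_sem { int readers; };
struct demo_import { int state; };
struct demo_obd {
	struct demo_sem sem;
	struct demo_import *import;   /* may be NULL */
};

static int demo_down_read(struct demo_sem *s) { s->readers++; return 0; }
static int demo_up_read(struct demo_sem *s)
{
	assert(s->readers > 0);
	s->readers--;
	return 0;
}

/*
 * Take the lock, load the import and set rc.  The body runs once if the
 * import is non-NULL; the lock is dropped when the loop condition is
 * evaluated with imp == NULL (either at once, or after the body ran).
 */
#define with_demo_imp_lock(obd, imp, rc)                           \
	for (demo_down_read(&(obd)->sem),                          \
	     (imp) = (obd)->import,                                \
	     (rc) = (imp) ? 0 : -ENODEV;                           \
	     (imp) ? 1 : (demo_up_read(&(obd)->sem), 0);           \
	     (imp) = NULL)

/* Example caller: read the import state under the lock. */
static int demo_read_state(struct demo_obd *obd, int *state)
{
	struct demo_import *imp;
	int rc;

	with_demo_imp_lock(obd, imp, rc) {
		*state = imp->state;
	}
	return rc;
}
```

In this sketch a 'break' or 'return' inside the body would skip the
unlock, so the statements must run to the end of the block.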

Errors fixed with this patch:
- some code tested u.cli.cl_import non-NULL even after
  LPROCFS_CLIMP_CHECK()
- some code dereferences cl_import before calling
  LPROCFS_CLIMP_CHECK()
- short_io_bytes_store() and max_procs_in_flight_store() don't access
  the import, so don't need LPROCFS_CLIMP_CHECK
- lprocfs_import_seq_write() set count to an error before 'goto out'
  which would free memory of length "count+1".
- lprocfs_import_seq_write() also called ptlrpc_recover_import()
  on the imp *after* dropping the semaphore.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If9d5eb452157d7f76796f690569ef13fec111d76
Reviewed-on: https://review.whamcloud.com/36956
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12678 lnet: socklnd: mark all ksock_proto struct 'const'. 94/36894/7
Mr NeilBrown [Wed, 13 Nov 2019 01:30:24 +0000 (12:30 +1100)]
LU-12678 lnet: socklnd: mark all ksock_proto struct 'const'.

These structs are always read-only, so tell the compiler.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Icc7c3209135a2ab0d04a822b7053231fd2d9ff0c
Reviewed-on: https://review.whamcloud.com/36894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13004 osp: use KIOV in osp_prep_update_req 28/36828/8
Mr NeilBrown [Tue, 28 Jan 2020 13:43:49 +0000 (08:43 -0500)]
LU-13004 osp: use KIOV in osp_prep_update_req

Convert osp_prep_update_req to use a BULK_BUF_KIOV
rather than a BULK_BUF_KVEC descriptor.

This is a step towards removing KVEC support.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2fdf84d73ba2d34c678b6eb6a8bbd323a761dfe4
Reviewed-on: https://review.whamcloud.com/36828
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13004 target: use KIOV for out_handle 26/36826/6
Mr NeilBrown [Sat, 18 Jan 2020 13:51:51 +0000 (08:51 -0500)]
LU-13004 target: use KIOV for out_handle

Convert out_handle() to use a BULK_BUF_KIOV rather than
a BULK_BUF_KVEC.

This is a step towards removing KVEC support and standardizing
on KIOV.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I3f5b1b06183a716ba57d6f7f2a28bf5aa0f76dfe
Reviewed-on: https://review.whamcloud.com/36826
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13004 osp: break read request into pages. 25/36825/8
Mr NeilBrown [Tue, 28 Jan 2020 13:46:51 +0000 (08:46 -0500)]
LU-13004 osp: break read request into pages.

Rather than breaking up a read request into arbitrarily
sized (4K) pieces of memory in virtual address space,
break it up into pages (which might be 64K) and
use a kiov rather than kvec to manage them.

This is a step towards removing kvec support and
standardizing on kiov.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If688764c53066a9c4db212682085fa899d4dde1b
Reviewed-on: https://review.whamcloud.com/36825
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12930 various: use schedule_timeout_*interruptible 56/36656/6
Mr NeilBrown [Mon, 4 Nov 2019 01:05:32 +0000 (12:05 +1100)]
LU-12930 various: use schedule_timeout_*interruptible

The construct:

  set_current_state(TASK_UNINTERRUPTIBLE);
  schedule_timeout(time);

Is more clearly expressed as

  schedule_timeout_uninterruptible(time);

And similarly with TASK_INTERRUPTIBLE /
schedule_timeout_interruptible()

Establishing this practice makes it harder to forget to call
set_current_state() as has happened a couple of times - in
lnet_peer_discovery and mdd_changelog_fini().

Also, there is no need to set_current_state(TASK_RUNNING) after
calling schedule*().  That state is guaranteed to have been set.

In mdd_changelog_fini() there was an attempt to sleep for
10 microseconds.  This will always round up to 1 jiffy, so
just make it schedule_timeout_uninterruptible(1).
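
The rounding can be sketched as plain arithmetic. The helper below is
hypothetical (it is not a kernel function) and assumes a tick rate
'hz' no higher than 100000, which covers common configurations; it
only shows why a 10 microsecond delay becomes one tick.

```c
#include <assert.h>

/*
 * Round a microsecond delay up to whole timer ticks ("jiffies").
 * hz is the tick rate per second; a hypothetical parameter here.
 */
static unsigned long demo_usecs_to_ticks(unsigned long usecs, unsigned long hz)
{
	unsigned long usecs_per_tick;

	assert(hz > 0 && hz <= 1000000UL);
	usecs_per_tick = 1000000UL / hz;
	/* Ceiling division: any non-zero remainder costs a full tick. */
	return (usecs + usecs_per_tick - 1) / usecs_per_tick;
}
```

With hz = 100, 250 or 1000 a 10 microsecond request yields 1 tick in
each case.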

Finally a few places where the number of seconds was multiplied
by 1, have had the '1 *' removed.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I01b37039de0bf7e07480de372c1a4cfe78a8cdd8
Reviewed-on: https://review.whamcloud.com/36656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12911 llite: Don't access lov_md fields before size check 89/36589/9
Mr NeilBrown [Mon, 28 Oct 2019 01:24:26 +0000 (12:24 +1100)]
LU-12911 llite: Don't access lov_md fields before size check

When 'struct lov_user_md' is passed in via setxattr, it comes with
a size.  If that size is too small, functions that check exactly
which version is present might access beyond the end of the allocated
memory, which can have undesirable effects, such as triggering
a KASAN warning (and possibly worse).

So check that the size is sane before looking inside the structure
at all.
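
A minimal sketch of the check-size-first pattern. The header layout,
the magic value and the demo_ names are made up for illustration and
do not match the real struct lov_user_md.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layout header; not the real struct lov_user_md. */
struct demo_layout_hdr {
	uint32_t magic;
	uint32_t pattern;
};

#define DEMO_MAGIC_V1 0x12345678u   /* illustrative value only */

/* Validate the caller-supplied size before touching any field. */
static int demo_check_layout(const void *buf, size_t size)
{
	struct demo_layout_hdr hdr;

	assert(buf != NULL);             /* a NULL buffer is a caller bug */
	if (size < sizeof(hdr))
		return -EINVAL;          /* too short to contain a header */

	memcpy(&hdr, buf, sizeof(hdr));  /* safe: size was checked above */
	if (hdr.magic != DEMO_MAGIC_V1)
		return -EINVAL;
	return 0;
}
```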

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ib3f071a3ff77a039fdfa38c903d87999108b3322
Reviewed-on: https://review.whamcloud.com/36589
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-12461 contrib: Add epython scripts for crash dump analysis 82/35282/4
Ann Koehler [Thu, 20 Jun 2019 18:25:02 +0000 (13:25 -0500)]
LU-12461 contrib: Add epython scripts for crash dump analysis

This mod creates a new subdirectory, debug_tools/epython_scripts,
in ./contrib to contain PyKdump scripts. These scripts, written in
an extended version of Python, aid in memory dump analysis by
extracting and formatting the content of Lustre data structures.

The scripts are written using Python 2.7 and tested on Lustre 2.11
client dumps.

Test-Parameters: trivial

Cray-bug-id: LUS-7501
Signed-off-by: Ann Koehler <amk@cray.com>
Change-Id: I0a15eb9025fb604742f4ae99508a080ce04163dc
Reviewed-on: https://review.whamcloud.com/35282
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12518 llite: fix stride window increase 93/35893/5
Wang Shilong [Fri, 23 Aug 2019 06:17:41 +0000 (14:17 +0800)]
LU-12518 llite: fix stride window increase

Fix the following problems:

1. stride_byte_count() argument @off should be @windows_start
rather than @stride_offset to calculate stride bytes.

2. On a client with limited memory (for testing etc.), ra_rpc_size (64M)
could initially be larger than ra_max_pages_per_file, which can
make @window_len 0 after ras_increase_window().

3. @window_len in ras_stride_increase_window() could be negative,
so be careful to avoid overflow.

Change-Id: Ied00bec834d4bb0ad04b688c10a03bbcd667f39b
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35893
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12644 llite: try fast io for stride io correctly 66/35466/13
Wang Shilong [Thu, 8 Aug 2019 17:14:19 +0000 (13:14 -0400)]
LU-12644 llite: try fast io for stride io correctly

The stride gap can be very large.  Calculate the pages to skip
correctly; otherwise we will see many small RPCs when the stride
gap is large.

Change-Id: Id72405c11234a2075f3cce4733d23544fe15eb17
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35466
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12756 lnet: Remove unnecessary rtr_nid argument 40/36540/7
Chris Horn [Tue, 22 Oct 2019 02:10:57 +0000 (21:10 -0500)]
LU-12756 lnet: Remove unnecessary rtr_nid argument

Cache the rtr_nid argument in lnet_select_pathway() the same way we
cache the src_nid argument.

Also remove the unnecessary lnet_nid_t variable that stores the
lp_primary_nid solely for the purposes of printing a debug message.

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I0a265bbb1c57eba0373a38fbacacceb64faf4614
Reviewed-on: https://review.whamcloud.com/36540
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12756 lnet: Introduce lnet_msg_is_response 39/36539/7
Chris Horn [Tue, 22 Oct 2019 01:38:21 +0000 (20:38 -0500)]
LU-12756 lnet: Introduce lnet_msg_is_response

Implement function to determine if an lnet_msg is a response
(ACK or REPLY).

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I8ba2d92866f8bb2caba120d9f23218bb7761143a
Reviewed-on: https://review.whamcloud.com/36539
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12756 lnet: Refactor lnet_find_existing_preferred_best_ni 38/36538/7
Chris Horn [Tue, 22 Oct 2019 01:14:04 +0000 (20:14 -0500)]
LU-12756 lnet: Refactor lnet_find_existing_preferred_best_ni

Replace lnet_send_data argument.

Get rid of unnecessary lookup for lnet_peer_net.

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I69e733d4a0af55ec480df4a13e9153757212333e
Reviewed-on: https://review.whamcloud.com/36538
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12756 lnet: Refactor lnet_set_non_mr_pref_nid 37/36537/7
Chris Horn [Tue, 22 Oct 2019 01:04:08 +0000 (20:04 -0500)]
LU-12756 lnet: Refactor lnet_set_non_mr_pref_nid

Replace lnet_send_data argument.

The sd_send_case check can be removed because all call paths already
satisfy this condition.

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I95707c7457edef44eec7d00bde93f731545f8c4e
Reviewed-on: https://review.whamcloud.com/36537
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-8130 lov: convert lo[v|d]_pool to use rhashtable 62/32662/13
NeilBrown [Fri, 24 Jan 2020 15:26:34 +0000 (10:26 -0500)]
LU-8130 lov: convert lo[v|d]_pool to use rhashtable

The pools hashtable can be implemented using
the rhashtable implementation in lib.
This has the benefit that lookups are lock-free.

We need to use kfree_rcu() to free a pool so
that a lookup racing with a deletion will not access
freed memory.

rhashtable has no combined lookup-and-delete interface,
but as the lookup is lockless and the chains are short,
this brings little cost.  Even if a lookup finds a pool,
we must be prepared for the delete to fail to find it,
as we might race with another thread doing a delete.

We use atomic_inc_not_zero() after finding a pool in the
hash table and if that fails, we must have raced with a
deletion, so we treat the lookup as a failure.
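
The "take a reference unless the count is zero" step can be sketched
in user space with C11 atomics. This only illustrates the idea behind
atomic_inc_not_zero(); the demo_ names are hypothetical and this is
not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool object with a reference count. */
struct demo_pool {
	atomic_int refcount;   /* 0 means the pool is being torn down */
};

/* Take a reference only if the count is still non-zero. */
static bool demo_get_unless_zero(struct demo_pool *pool)
{
	int old;

	assert(pool != NULL);
	old = atomic_load(&pool->refcount);
	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&pool->refcount, &old, old + 1))
			return true;
	}
	return false;   /* raced with deletion: treat the lookup as a miss */
}
```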

Use hashlen_string() rather than a hand-crafted hash
function.
Note that the pool_name, and the search key, are
guaranteed to be nul terminated.

Based on

Linux-commit: 055ed193b190edac539f37a66699b02eae3a19a9

with the port of server side pool handling to rhashtables.

Change-Id: Ia5b4cbbd17515ea43a473e91719b3665f46b0d0a
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/32662
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9679 lnet: discard lnet_ping_buffer_numref() 58/37458/2
Mr NeilBrown [Thu, 7 Nov 2019 04:10:16 +0000 (15:10 +1100)]
LU-9679 lnet: discard lnet_ping_buffer_numref()

This inline function simply reads an atomic_t.  Having
it doesn't make the code any more readable and would
make a subsequent patch a little more awkward.
So remove it.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I21a1d2187a654f139a02c0045601086fe612e5bd
Reviewed-on: https://review.whamcloud.com/37458
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9679 modules: convert MIN/MAX to kernel style 56/37456/2
Mr NeilBrown [Wed, 4 Dec 2019 01:26:46 +0000 (12:26 +1100)]
LU-9679 modules: convert MIN/MAX to kernel style

The linux kernel provides a variety of min/max style macros which
ensure type correctness - not risking signed vs unsigned comparisons
etc.

min_t and max_t can be given a type, but if the type of the
args is identical, min/max can be used.

We also have min3() and max3() to compare three values of identical
type, and clamp_t() to restrict a value to a given range
(min(max(...)).
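
A simplified sketch of the idea behind min_t(), max_t() and clamp_t():
cast both operands to one named type before comparing. The DEMO_
macros are hypothetical and, unlike the kernel macros, evaluate their
arguments more than once. The second check shows the hazard: forcing
-1 to unsigned int makes it larger than 16.

```c
#include <assert.h>

/* Compare in one explicitly named type instead of relying on promotion. */
#define DEMO_MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define DEMO_MAX_T(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* Restrict val to the range [lo, hi]. */
#define DEMO_CLAMP_T(type, val, lo, hi) \
	DEMO_MIN_T(type, DEMO_MAX_T(type, val, lo), hi)

/* As signed ints, -1 is the smaller value. */
static_assert(DEMO_MIN_T(int, -1, 16u) == -1, "signed comparison");

/* As unsigned ints, -1 becomes UINT_MAX, so 16 is the "minimum". */
static_assert(DEMO_MIN_T(unsigned int, -1, 16u) == 16u, "unsigned comparison");

/* Clamping keeps values inside the requested range. */
static_assert(DEMO_CLAMP_T(int, 300, 0, 255) == 255, "clamp upper bound");
static_assert(DEMO_CLAMP_T(int, -5, 0, 255) == 0, "clamp lower bound");
```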

Use these as appropriate throughout the lustre/lnet kernel code.

The variables (rlength and mlength) have their type changed from int
to unsigned int as this makes more sense in the context, and allows
min() to be used.

Similarly the return type of kiblnd_rd_frag_size() is changed from
__u32 to unsigned int as the return value is *only* used in a min3()
comparison with another unsigned int.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9f0cdd23b78d2f9dd04ba58e9b9c7df8d1ee3ca1
Reviewed-on: https://review.whamcloud.com/37456
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12651 osc: always call update_next_shrink 29/37429/4
Alexander Zarochentsev [Tue, 4 Feb 2020 17:47:06 +0000 (20:47 +0300)]
LU-12651 osc: always call update_next_shrink

Call update_next_shrink in case of clients not
supporting grant shrinking or clients with grant
shrinking explicitly disabled. Otherwise
osc_grant_work_handler() schedules itself immediately
after its completion causing excessive CPU consumption.

Fixes: 3e070e30a98d ("LU-8708 osc: enable/disable OSC grant shrink")

Cray-bug-id: LUS-8460
Change-Id: I507b3d10dd5374772456853098bc26053cbd140d
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/37429
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9679 lustre: avoid cast of file->private_data 51/36651/4
Mr NeilBrown [Sun, 3 Nov 2019 23:02:58 +0000 (10:02 +1100)]
LU-9679 lustre: avoid cast of file->private_data

Instead of
  foo = ((struct seq_file*)file->private_data)->private;
use
  struct seq_file *m = file->private_data;
  foo = m->private;

Many places in lustre use this second style already.
It is much less noisy and preferred for upstream Linux.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9a7adb102687496f43bab099b1ca584955f040c9
Reviewed-on: https://review.whamcloud.com/36651
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
3 months agoLU-12321 mdc: allow ELC for DOM file unlink 42/36442/10
Mikhail Pershin [Fri, 27 Sep 2019 18:29:00 +0000 (21:29 +0300)]
LU-12321 mdc: allow ELC for DOM file unlink

ELC skips the DOM bit to prevent a data flush when it
is not really needed. However, when lock bits are combined,
this slows down unlink because ELC is disabled for the
whole lock if the DOM bit is set.

This patch takes a simple approach: it determines whether the inode
has dirty pages and allows ELC for DOM unlink if there are none.

mdtest_easy_delete results on DoM show a 28% performance
improvement for unlinking zero-byte files.

1 x AI400(4 x MDS/MDT) on 10 node challenges:
Without patch:
mdtest_easy_delete  96.564 kiops : time 649.36 seconds
With patch:
mdtest_easy_delete 123.630 kiops : time 454.82 seconds

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic5b2aed8c8c0884ee518a587a0c45ad54915f4fa
Reviewed-on: https://review.whamcloud.com/36442
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
3 months agoLU-11961 nodemap: nodemap_create() handles default nodemap 45/34245/8
Sebastien Buisson [Wed, 13 Feb 2019 15:41:47 +0000 (00:41 +0900)]
LU-11961 nodemap: nodemap_create() handles default nodemap

nodemap_create() is responsible for assigning nmc_default_nodemap
so it should not be done outside of this function.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8d0615196e32fb8e6c59ddedd421323a7d6eff7f
Reviewed-on: https://review.whamcloud.com/34245
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13235 lnet: copy the correct amount of CPTs to lnet_cpts 36/36636/4
Mr NeilBrown [Tue, 4 Feb 2020 15:52:22 +0000 (10:52 -0500)]
LU-13235 lnet: copy the correct amount of CPTs to lnet_cpts

A previous patch fixed one of three memcpy() calls in
lnet_net_append_cpts() to copy the correct number of bytes.
This patch fixes the other two.
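
The class of bug can be sketched in userspace as follows. This is not the
code of lnet_net_append_cpts(); append_cpts() is a hypothetical helper that
only illustrates that memcpy() takes a byte count, so an element count has
to be multiplied by the element size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Append 'nadd' CPT ids to a copy of the 'ncur' existing ids.
 * Returns a newly allocated array (caller frees) or NULL on failure.
 */
static uint32_t *append_cpts(const uint32_t *cur, size_t ncur,
			     const uint32_t *add, size_t nadd)
{
	uint32_t *out;

	assert(cur != NULL && add != NULL);
	assert(ncur + nadd > 0);

	out = malloc((ncur + nadd) * sizeof(*out));
	if (!out)
		return NULL;

	/* memcpy() counts bytes: number of elements * sizeof(element). */
	memcpy(out, cur, ncur * sizeof(*cur));
	memcpy(out + ncur, add, nadd * sizeof(*add));
	return out;
}
```

Passing ncur or nadd alone as the length would copy only a quarter of the
32-bit entries.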

Test-Parameters: trivial testlist=sanity-lnet

Fixes: 8cbb8cd3e771 ("LU-7734 lnet: Multi-Rail local NI split")

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I5a3450b0043c60b6c432db5be47f1e27ecc1fc94
Reviewed-on: https://review.whamcloud.com/36636
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months agoLU-13210 lnet: gcc8 add implicit-fallthrough decorator 66/37466/3
Shaun Tancheff [Thu, 6 Feb 2020 22:44:13 +0000 (16:44 -0600)]
LU-13210 lnet: gcc8 add implicit-fallthrough decorator

With newer compilers and newer kernels -Werror=implicit-fallthrough
is enabled.

This adds the missing decorator.
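
For illustration, a fall-through annotation can look like the sketch below.
The exact annotation used by the patch is not shown in this message;
SKETCH_FALLTHROUGH and count_steps() are hypothetical names, and the macro
relies on the GCC/Clang fallthrough statement attribute where available.

```c
#include <assert.h>

/* Hypothetical helper macro; recent Linux kernels provide 'fallthrough'. */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define SKETCH_FALLTHROUGH __attribute__((fallthrough))
# endif
#endif
#ifndef SKETCH_FALLTHROUGH
# define SKETCH_FALLTHROUGH do { } while (0) /* compilers without it */
#endif

/* Each level performs its own step plus all the steps below it. */
static int count_steps(int level)
{
	int steps = 0;

	assert(level >= 0);
	switch (level) {
	case 2:
		steps++;
		SKETCH_FALLTHROUGH;	/* intentional: continue into case 1 */
	case 1:
		steps++;
		SKETCH_FALLTHROUGH;	/* intentional: continue into case 0 */
	case 0:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```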

Test-Parameters: trivial
Cray-bug-id: LUS-8476
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I47334d5a8d0bcf17489c1b15af29cd553fa01a09
Reviewed-on: https://review.whamcloud.com/37466
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoNew tag 2.13.52 2.13.52 v2_13_52
Oleg Drokin [Wed, 12 Feb 2020 06:18:35 +0000 (01:18 -0500)]
New tag 2.13.52

Change-Id: Iafa9279dd716bac93851412e64ef7b7e85945353
Signed-off-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12988 ldiskfs: mballoc to prefetch groups 93/36893/15
Alex Zhuravlev [Mon, 2 Dec 2019 08:23:30 +0000 (11:23 +0300)]
LU-12988 ldiskfs: mballoc to prefetch groups

ahead of scanning. prefetching is done in 8 * flex_bg groups, so
it should be 8 read-ahead reads for a single allocating thread.
at the end of allocation the allocating thread waits for read-ahead
completion and initializes buddy information so that read-aheads
are not lost in case of memory pressure.
at cr=0 the number of prefetching IOs is limited per allocation
context to prevent a situation when mballoc loads thousands of
bitmaps looking for a perfect group and ignoring groups with
good chunks.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If86e3aff75379e064f70c0a66e2d65bdc5593651
Reviewed-on: https://review.whamcloud.com/36893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13180 lustre: reserve bit for RDMA-only memory RPC 83/37383/3
Wang Shilong [Fri, 31 Jan 2020 07:21:30 +0000 (15:21 +0800)]
LU-13180 lustre: reserve bit for RDMA-only memory RPC

This is reserved for RDMA-only memory integrated with Lustre.
The purpose of this bit is to:

1) disable short IO if memory is not directly addressable by CPU.
2) prevent CPU memory pages and RDMA memory pages merging into one RPC.

Test-Parameters: trivial
Change-Id: I148b269c5e7d7c52e760b20a6482c259407e0898
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37383
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
3 months agoLU-13134 obdclass: use slab allocation for cl_dio_aio 27/37227/6
Wang Shilong [Tue, 14 Jan 2020 15:00:03 +0000 (23:00 +0800)]
LU-13134 obdclass: use slab allocation for cl_dio_aio

cl_dio_aio is used frequently for dio/aio, try to use
a private slab pool for it.

This could help improve aio performance.

Change-Id: Ic06523ae59eed04e55c17ac03af9187af8f695c5
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37227
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-4198 clio: AIO support for direct IO 16/32416/28
Jinshan Xiong [Mon, 29 Apr 2019 08:30:05 +0000 (16:30 +0800)]
LU-4198 clio: AIO support for direct IO

This patch adds AIO support to Lustre. AIO issues IO like
DIO, but does not wait for the IO to finish before
returning. Instead it returns EIOCBQUEUED to the VFS to indicate
that the IO has been issued, and aio_complete() is called from the
callback once the IO is done.

  fio AIO/DIO bandwidth results:
  # numjob=4, bs=512k

  MB/s      write       read
  master      832       1806
  patched    6591      11800

  fio AIO/DIO IOPS results:
  # 32 clients, 8192 threads
  # ioengine=libaio rw=randread blocksize=4096 iodepth=128 direct=1
  # size=1g runtime=300 group_reporting numjobs=256 create_serialize=0

  IOPS      write       read
  master      99K      1239K
  patched    265K      3498K

Test-Parameters: testgroup=review-ldiskfs-arm
Signed-off-by: Jinshan Xiong <jinshan.xiong@uber.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: If2ac9283612514e10fe342fc43e95b4081347168
Reviewed-on: https://review.whamcloud.com/32416
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-4198 clio: turn on lockless for some kind of IO 01/8201/46
Jinshan Xiong [Thu, 9 Mar 2017 19:30:00 +0000 (11:30 -0800)]
LU-4198 clio: turn on lockless for some kind of IO

We can safely turn on lockless for Direct IO
and no lock.

Direct IO will still enqueue the lock on the server side,
and we cannot use lockless IO in the following cases:

1) If a group lock is held before DIO, using lockless IO would
deadlock, so we use the group lock instead and trust
it to protect consistency.

2) Direct IO might fall back to Buffered IO in some cases,
and we will restart Direct IO with the normal lock held.

The main motivation for this patch is to support AIO.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ia004d6b39272df8159c9df3cc76662e198230b55
Reviewed-on: https://review.whamcloud.com/8201
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13145 lnet: use conservative health timeouts 30/37430/2
Andreas Dilger [Fri, 31 Jan 2020 20:00:00 +0000 (13:00 -0700)]
LU-13145 lnet: use conservative health timeouts

Use more conservative lnet_transaction_timeout and lnet_retry_count
values by default.  Currently with timeout=10 and retry=3 there is
only a 3s window for the RPC to be sent before it is timed out.
This has caused fault injection rather than fault tolerance.
Increase the default timeout to 50s with retry=2, which is hopefully
long enough to cover virtually all uses, but still allows LNet Health
to be enabled by default and resend before Lustre times out itself.
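
The arithmetic behind these numbers can be sketched as below, under the
assumption that the send window for one attempt is the transaction timeout
divided by the retry count (integer division). That assumption reproduces
the 3s figure quoted above but is not taken from the LNet source;
per_attempt_window() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Assumed model: window for one attempt = timeout / retry count,
 * using integer division.
 */
static unsigned int per_attempt_window(unsigned int timeout,
				       unsigned int retry)
{
	assert(timeout > 0);
	if (retry == 0)
		return timeout;	/* no retries: one window spans the timeout */
	return timeout / retry;
}
```

Under this model the old defaults (timeout=10, retry=3) give a 3s window
and the new ones (timeout=50, retry=2) give 25s.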

Fixes: 8632e94aeb7e ("LU-11816 lnet: setup health timeout defaults")

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6bfc4d61cebab38c1554e1b42834b1f38fc34ba8
Reviewed-on: https://review.whamcloud.com/37430
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12593 osd: up i_append_sem during errors 06/37406/3
Alexander Boyko [Mon, 3 Feb 2020 09:24:40 +0000 (04:24 -0500)]
LU-12593 osd: up i_append_sem during errors

There is a potential leak of i_append_sem on errors from the
buffer head read and from ldiskfs_journal_get_write_access() in
osd_ldiskfs_write_record().
The patch adds up(i_append_sem) on the error paths.
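
The shape of such a fix can be sketched in userspace with a toy semaphore.
Nothing here is the osd code: toy_sem, toy_write_record() and its two error
parameters are hypothetical, and the single exit label is just one common
way to make sure every path releases the semaphore.

```c
#include <assert.h>

/* Toy counting semaphore; single-threaded, for illustration only. */
struct toy_sem {
	int count;
};

static void toy_down(struct toy_sem *s)
{
	assert(s->count > 0);	/* a real down() would sleep instead */
	s->count--;
}

static void toy_up(struct toy_sem *s)
{
	s->count++;
}

/*
 * 'read_err' and 'journal_err' stand in for the results of the buffer
 * head read and of the journal write-access call.
 */
static int toy_write_record(struct toy_sem *append_sem, int read_err,
			    int journal_err)
{
	int rc;

	toy_down(append_sem);

	rc = read_err;
	if (rc)
		goto out;

	rc = journal_err;
	if (rc)
		goto out;

	/* ... the record would be copied into the block here ... */
out:
	toy_up(append_sem);	/* released on success and on every error */
	return rc;
}
```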

Fixes: f832a7dc33c6 ("LU-12593 osd: zeroing a freshly allocated block buffer")
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I245d0c45af03519c66b75731e5d57f42de41fe95
Reviewed-on: https://review.whamcloud.com/37406
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13191 osp: handle -EROFS in osp_sync_interpret() 04/37404/2
Lai Siyao [Sat, 25 Jan 2020 21:23:28 +0000 (05:23 +0800)]
LU-13191 osp: handle -EROFS in osp_sync_interpret()

Upon OST disk failure, osp_sync_interpret() may get -EROFS,
which is a valid errno.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I5c3cff3019aa47c6d5803f0f0b373bc704f18118
Reviewed-on: https://review.whamcloud.com/37404
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13163 mdc: new kernel function xa_is_value() 99/37399/3
Lai Siyao [Sat, 25 Jan 2020 00:30:44 +0000 (08:30 +0800)]
LU-13163 mdc: new kernel function xa_is_value()

xa_is_value() is added in kernel 4.19-rc6 to replace
radix_tree_entry_exceptional().
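
As background, XArray value entries are integers stored in a pointer slot
with the low bit set. The userspace sketch below imitates that encoding;
the sketch_* names are hypothetical and merely mirror what xa_mk_value(),
xa_is_value() and xa_to_value() do in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Imitation of xa_mk_value(): shift left by one and set bit 0. */
static void *sketch_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* Imitation of xa_is_value(): value entries have bit 0 set. */
static bool sketch_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Imitation of xa_to_value(): undo the shift. */
static unsigned long sketch_to_value(const void *entry)
{
	assert(sketch_is_value(entry));
	return (unsigned long)((uintptr_t)entry >> 1);
}
```

Ordinary pointers to aligned objects have bit 0 clear, which is how the two
kinds of entry are told apart.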

Test-Parameters: trivial clientdistro=el8.1 envdefinitions=ONLY=65i testlist=sanity,sanity,sanity,sanity,sanity
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If89aa19c37af8a67debe782d1c77f4ef4dc6f923
Reviewed-on: https://review.whamcloud.com/37399
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-8304 libcfs: convert debug_ctlwq to a completion. 98/37398/3
NeilBrown [Sun, 2 Feb 2020 02:15:17 +0000 (21:15 -0500)]
LU-8304 libcfs: convert debug_ctlwq to a completion.

kthread_run might sleep during an allocation, and so
it's considered unsafe to call with a state that's not
RUNNABLE.
Rather than move the state setting to after kthread_run, which
introduces a small race, replace the waitqueue with a completion.
This has clean semantics which perfectly match the need here.

Change-Id: Ic3bcf21dc747d73ce482e2d50bffd6c43fc04fbc
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/37398
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13183 ldiskfs: Drop remove truncate warning patch 89/37389/3
Shaun Tancheff [Fri, 31 Jan 2020 18:28:28 +0000 (12:28 -0600)]
LU-13183 ldiskfs: Drop remove truncate warning patch

Drop the ext4-remove-truncate-warning.patch as it was
removed as part of
    f64e9f19f68e ("LU-12977 ldiskfs: properly take inode_lock ...")
and is not needed.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I78667ba380e9e78d4972377e59fa56bc27f15bb5
Reviewed-on: https://review.whamcloud.com/37389
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-11300 lnet: remove lnd_query interface. 37/37337/4
Mr NeilBrown [Tue, 28 Jan 2020 00:31:31 +0000 (11:31 +1100)]
LU-11300 lnet: remove lnd_query interface.

The ->lnd_query interface is completely unused, and has been since
commit 8e498d3f23ea ("LU-11300 lnet: peer aliveness")

So remove all mention of it.

Fixes: 8e498d3f23ea ("LU-11300 lnet: peer aliveness")
Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Iff11652283b371519cf31bf66b9ba08e024d3193
Reviewed-on: https://review.whamcloud.com/37337
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12988 ldiskfs: skip non-loaded groups at cr=0/1 91/36891/7
Alex Zhuravlev [Thu, 28 Nov 2019 12:04:25 +0000 (15:04 +0300)]
LU-12988 ldiskfs: skip non-loaded groups at cr=0/1

cr=0 is supposed to be an optimization to save CPU cycles,
but if buddy data (in memory) is not initialized then all
this makes no sense as we have to do sync IO taking a lot
of cycles.  also, at cr=0 mballoc doesn't store any available
chunk. cr=1 also skips groups using heuristic based on avg.
fragment size.
it's more useful to skip such groups and switch to cr=2 where
groups will be scanned for available chunks.

using sparse image and dm-slow virtual device of 120TB was
simulated. then the image was formatted as OST and filled
using debugfs to mark ~85% of available space as busy.
mount as OST w/o the patch couldn't complete in half an hour
(according to vmstat it would take ~10-11 hours). with the
patch applied mount took ~20 seconds.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I88c8c1b01b386af0fa438bfeb97acb6110bd00ec
Reviewed-on: https://review.whamcloud.com/36891
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Artem Blagodarenko <c17828@cray.com>
3 months agoLU-13165 mdt: MSG_RESENT can be improperly cleared. 96/37296/2
Andriy Skulysh [Wed, 9 Oct 2019 19:53:14 +0000 (22:53 +0300)]
LU-13165 mdt: MSG_RESENT can be improperly cleared.

req_can_reconstruct() can return -EPROTO, it means that
original request was processed and reply was received.

Change-Id: I06ba9aa24821f414777d38e9ca606652b172e92c
Fixes: 23773b32bf ("LU-11444 ptlrpc: resend may corrupt the data")
Cray-bug-id: LUS-7972
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/37296
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12542 handle: discard h_lock. 63/35863/7
NeilBrown [Fri, 13 Dec 2019 15:48:18 +0000 (10:48 -0500)]
LU-12542 handle: discard h_lock.

The h_lock spinlock is now only taken while bucket->lock
is held.  As a handle is associated with precisely one bucket,
this means that h_lock can never be contended, so it isn't needed.

So discard h_lock.

Also discard an increasingly irrelevant comment in the declaration
of struct portals_handle.

Change-Id: Ib5231fb43d1bf5031d5c2426c4e1d1865544bcf5
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35863
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-11607 tests: replace version/fstype calls in sanity/n 19/35719/8
James Nunez [Wed, 7 Aug 2019 19:27:13 +0000 (13:27 -0600)]
LU-11607 tests: replace version/fstype calls in sanity/n

The routine get_lustre_env() is available to all Lustre
test suites and sets an environment variable for the file
system type for MDS1 and OST1 and sets a variable for the
Lustre version of servers.

Replace the calls to facet_fstype() and lustre_version_code()
for all server types defined in get_lustre_env().  While
doing this, replace SINGLEMDS with mds1 in these calls.

Clean up around any modifications with
- converting spaces to tabs
- removing calls to return after skip() or skip_env()

Test-Parameters: trivial testlist=sanityn
Test-Parameters: fstype=zfs testlist=sanityn,sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ibc66220ae3b57cf22395d13f5d35feceeb61adfe
Reviewed-on: https://review.whamcloud.com/35719
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
3 months agoLU-10447 tests: deprecate use of $SETSTRIPE/$GETSTRIPE 25/33925/3
James Nunez [Thu, 27 Dec 2018 16:50:48 +0000 (09:50 -0700)]
LU-10447 tests: deprecate use of $SETSTRIPE/$GETSTRIPE

$SETSTRIPE and $GETSTRIPE were needed when we used the
standalone 'lstripe' utility. 'lstripe' hasn't been used
for years and we need to clean up all remnants of it.

Remove the definition and replace all instances of
$SETSTRIPE with '$LFS setstripe' and $GETSTRIPE with
'$LFS getstripe' in test-framework library.

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ibd78b2d75b0b8fc7ff686c1b0a73ce51fe9452e2
Reviewed-on: https://review.whamcloud.com/33925
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months agoLU-9679 lustre: use LIST_HEAD() for local lists. 55/36955/4
Mr NeilBrown [Thu, 5 Dec 2019 06:09:19 +0000 (17:09 +1100)]
LU-9679 lustre: use LIST_HEAD() for local lists.

When declaring a local list head, instead of

   struct list_head list;
   INIT_LIST_HEAD(&list);

use
   LIST_HEAD(list);

which does both steps.
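
A userspace sketch of why the two forms are equivalent is given below. The
list definitions are simplified copies of the kernel's list primitives;
two_step_is_empty() and one_step_is_empty() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace copies of the kernel's list primitives. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Two-step form: declare, then initialise. */
static bool two_step_is_empty(void)
{
	struct list_head list;

	INIT_LIST_HEAD(&list);
	assert(list.prev == &list);
	return list_empty(&list);
}

/* One-step form: LIST_HEAD() declares and initialises together. */
static bool one_step_is_empty(void)
{
	LIST_HEAD(list);

	assert(list.prev == &list);
	return list_empty(&list);
}
```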

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I67bda77c04479e9b2b8c84f02bfb86d9c2ef5671
Reviewed-on: https://review.whamcloud.com/36955
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9679 lnet: use LIST_HEAD() for local lists. 54/36954/4
Mr NeilBrown [Thu, 5 Dec 2019 05:56:16 +0000 (16:56 +1100)]
LU-9679 lnet: use LIST_HEAD() for local lists.

When declaring a local list head, instead of

   struct list_head list;
   INIT_LIST_HEAD(&list);

use
   LIST_HEAD(list);

which does both steps.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ia1f1f1abf1b8a9f50e3033976990010b1d2100db
Reviewed-on: https://review.whamcloud.com/36954
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9679 lnet: discard lnet_print_text_bufs() 59/36659/2
Mr NeilBrown [Mon, 4 Nov 2019 04:20:58 +0000 (15:20 +1100)]
LU-9679 lnet: discard lnet_print_text_bufs()

lnet_print_text_bufs() is unused and has
never been used since it was introduced in
Commit ed88907a96ba ("Landing b_hd_newconfig on HEAD")
So let's remove it.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic412cdef4981a043e94060e5de5646b836bb0e36
Reviewed-on: https://review.whamcloud.com/36659
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9679 general: add missing spaces to folded strings. 53/36653/5
Mr NeilBrown [Sun, 3 Nov 2019 23:40:56 +0000 (10:40 +1100)]
LU-9679 general: add missing spaces to folded strings.

Many places in lustre fold a long string onto multiple lines,
usually at word breaks.  Sometimes the space between those words
got lost.
In a couple of places, a newline (\n) rather than a space was lost.

This patch adds those spaces (and newlines) back in.

Where a space was added, the whole string is joined onto a
single line as this is current policy - encouraged by checkpatch.

In a couple of places neighbouring strings are also joined
into a single line, and some code has been re-indented to use
TABs.

Where the missing space was in a .diff file, the string hasn't
been joined into a line, as it doesn't seem worth the churn.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I6882bb957df566da0794f4ee85133dbf8c3debc1
Reviewed-on: https://review.whamcloud.com/36653
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 months agoLU-10467 ptlrpc: convert use of l_wait_event_exclusive_head() 86/35986/11
Mr NeilBrown [Sat, 18 Jan 2020 14:46:59 +0000 (09:46 -0500)]
LU-10467 ptlrpc: convert use of l_wait_event_exclusive_head()

Only one place uses l_wait_event_exclusive_head().
It uses an on_timeout function that returns non-zero, so
the wait aborts after timeout.

Change this to wait_event_idle_exclusive_lifo_timeout(),
and if it times out, perform the same action as the
on_timeout handler - a simple assignment.

Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I11bee6aa1eceb6564fb72e41528f2f6a80b0d207
Reviewed-on: https://review.whamcloud.com/35986
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-10467 ldlm: convert waiting in ldlm_completion_ast() 85/35985/14
Mr NeilBrown [Sat, 18 Jan 2020 14:17:51 +0000 (09:17 -0500)]
LU-10467 ldlm: convert waiting in ldlm_completion_ast()

ldlm_completion_ast() calls l_wait_event() in two slightly different
ways depending on whether a timeout is defined.

As a non-NULL _on_signal handler is passed, the non-timed-out portion
of the wait allows signals (abortable).  As the on_timeout handler
returns zero, the timed-out portion of the wait is always followed by a
non-timed-out portion.

So if no timeout is defined, we can simply wait with
l_wait_event_abortable().

If there is a timeout, we first wait with wait_event_idle_timeout()
and if that times out, we call ldlm_expired_completion_wait(), then
wait with l_wait_event_abortable().

Change-Id: I6874010085864764f2fc0e294dc0c67152cb2ad2
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35985
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-10467 ldlm: convert waiting in ldlm_flock_completion_ast() 84/35984/11
Mr NeilBrown [Sat, 18 Jan 2020 14:19:29 +0000 (09:19 -0500)]
LU-10467 ldlm: convert waiting in ldlm_flock_completion_ast()

The l_wait_event() call in ldlm_flock_completion_ast() sets no
timeout, and so always enables fatal signals.  So it can be converted
to l_wait_event_abortable().

It is passed an on_signal handler, so that needs to be called if
l_wait_event_abortable() returns a negative result.  As this is the
only place the handler is called, it can be inlined.  We already have an
'if' which captures the 'wait was interrupted' condition, so place the
signal handler code in there.

This makes struct ldlm_flock_wait_data redundant.  In fact the
fwd_generation field in there was already unused.

Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I9cc3a3e8b593a66f46183584382dc13169ff9adf
Reviewed-on: https://review.whamcloud.com/35984
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-10467 ptlrpc: convert waiters on set->set_waitq 82/35982/13
Mr NeilBrown [Fri, 24 Jan 2020 14:45:44 +0000 (09:45 -0500)]
LU-10467 ptlrpc: convert waiters on set->set_waitq

There are a couple of interesting aspects of waiters on ->set_waitq.

One is the only usage of LWI_TIMEOUT_INTR_ALL().  This causes
l_wait_event() to enable "fatal" signals during the timeout
part of the wait. (normally signals are completely blocked when
there is a timeout).
This can be converted to l_wait_event_abortable_timeout().

Another is that ptlrpc_expired_set() is passed as the on_timeout
handler.  As this always returns true, it causes l_wait_event()
to quit after the timeout, and not go "back to sleep".
We can instead call this explicitly after the wait_event_timeout
returns 0 - which means that it timed out.
Due to this change in call pattern, we can change the function to
take a ptlrpc_request_set* instead of a void*, and to not return
anything.

Also, ptlrpc_interrupted_set() is sometimes passed as the on_signal
function.  Instead we can explicitly call this when we get a negative
return from wait_event_abortable.  Again, we can declare it as
taking the real type and not a void*.

The wait on set_waitq in ptlrpcd() might be a timed-out wait or,
if timeout == 0, it is an indefinite wait.  We make that explicit
with 2 separate cases.

So this patch:
  - changes to wait_event_idle_timeout and
    l_wait_event_abortable_timeout,
  - calls ptlrpc_*_set explicitly based on return code
  - changes signatures for ptlrpc_*_set()

Change-Id: Ieb97aa3ba9b1f988a30bb7a424588f87f75e8023
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/35982
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-10467 lustre: convert users of back_to_sleep() 80/35980/11
Mr NeilBrown [Sat, 18 Jan 2020 14:39:47 +0000 (09:39 -0500)]
LU-10467 lustre: convert users of back_to_sleep()

When back_to_sleep() is passed to l_wait_event as
the on_timeout handler, the effect is to potentially wait twice.
The first wait ignores all signals and has a timeout.
If the timeout fires without the event occurring, the l_wait_event()
goes "back to sleep" indefinitely, but this time with fatal
signals unblocked.

This pattern can be made more clear with two separate wait calls:
  wait_event_idle_timeout() followed by l_wait_event_abortable().

Change-Id: I3536e33b4d982f37c960f31df1ea0d9808f9ced7
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35980
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-10467 lustre: convert most users of LWI_TIMEOUT_INTERVAL() 73/35973/16
Mr NeilBrown [Sat, 18 Jan 2020 14:38:03 +0000 (09:38 -0500)]
LU-10467 lustre: convert most users of LWI_TIMEOUT_INTERVAL()

When l_wait_event() is called with an lwi initialised with
LWI_TIMEOUT_INTERVAL(t1, t2, NULL, NULL),
it waits for a total of t1 jiffies, but wakes up every t2 jiffies
to check the condition - in case the condition changed without
triggering a wakeup.
In (nearly) every case, t2 is one second.
So this is effectively a poll loop around wait_event_timeout.
So replace it with

 seconds = t1;
 while (seconds > 0 &&
        wait_event_timeout(q, cond, cfs_time_seconds(1)) == 0)
     seconds -= 1;

Then if seconds is zero at the end, the whole loop timed out.
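
The loop can be exercised in userspace with a stub in place of
wait_event_timeout(). In this sketch poll_wait(), wait_one_second_fn and
the two stub functions are hypothetical; the stub returns 0 when its
one-second wait elapsed without the condition becoming true, and non-zero
otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one wait_event_timeout() call lasting one second. */
typedef int (*wait_one_second_fn)(void *arg);

/* Returns the seconds left; 0 means the whole wait timed out. */
static int poll_wait(int t1, wait_one_second_fn wait_one_second, void *arg)
{
	int seconds = t1;

	assert(t1 >= 0 && wait_one_second != NULL);
	while (seconds > 0 && wait_one_second(arg) == 0)
		seconds -= 1;

	return seconds;
}

/* Stub: the condition becomes true on the Nth call (*arg holds N). */
static int becomes_true_after(void *arg)
{
	int *calls_left = arg;

	return --(*calls_left) <= 0;
}

/* Stub: the condition never becomes true. */
static int never_true(void *arg)
{
	(void)arg;
	return 0;
}
```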

In the one exception ("nearly" above) if t1 is small, t2 is set to one
jiffy, so we always wait a little bit and check the condition.  For
that case, we count to "seconds >= 0" and adjust the timeout
accordingly when seconds == 0.

Note that in one case, the on_timeout function is
target_bulk_timeout() instead of NULL.  As this always returns '1', it
behaves exactly like passing NULL.

Change-Id: I4cddbd2c28f07012cce7915489eedcb668c7e808
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/35973
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12287 lnet: handling device failure by IB event handler 37/35037/7
Tatsushi Takamura [Mon, 3 Jun 2019 01:11:24 +0000 (10:11 +0900)]
LU-12287 lnet: handling device failure by IB event handler

The following IB events cannot be handled by QP event handler
- IB_EVENT_DEVICE_FATAL
- IB_EVENT_PORT_ERR
- IB_EVENT_PORT_ACTIVE

IB event handler handles device errors such as hardware errors
and link down.

Test-Parameters: trivial
Signed-off-by: Tatsushi Takamura <takamr.tatsushi@jp.fujitsu.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I9869fb1cd1172040e0dd34828318017a0f30df81
Reviewed-on: https://review.whamcloud.com/35037
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-10198 llog: keep llog handle alive until last reference 67/37367/2
Mikhail Pershin [Wed, 29 Jan 2020 21:22:07 +0000 (00:22 +0300)]
LU-10198 llog: keep llog handle alive until last reference

An llog handle keeps the related dt_object pinned until the
llog_close() call. Meanwhile, the llog handle can still have other
users that took it via llog_cat_id2handle().

Change llog_handle_put() to call lop_close() when the last
reference is dropped, so llog_osd_close() will put the dt_object
only when the llog_handle has no more references.
llog_handle_get() now checks and reports if the llog_handle has
zero references.
Also modify the checks for destroyed llogs: the llog handle has a
new lgh_destroyed flag which is set when the llog is destroyed, and
llog_osd_exist() checks both dt_object_exist() and the
lgh_destroyed flag, so destroyed llogs are considered non-existent
too. Previously it used the lu_object_is_dying() check, which is
not reliable because it only means that the object is not to be
kept in cache.
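
The reference-counting behaviour described above can be sketched as a
small Python model. This is not the Lustre C code: the class name
LlogHandle, its fields, and its methods are hypothetical stand-ins for
llog_handle, the pinned dt_object, and the lgh_destroyed flag.

```python
class LlogHandle:
    """Toy model of a handle whose backing object is released on last put."""

    def __init__(self):
        self.refcount = 1          # the creator holds the first reference
        self.object_pinned = True  # stands in for the pinned dt_object
        self.destroyed = False     # stands in for the lgh_destroyed flag

    def get(self):
        # Taking a reference on a handle with zero references is a bug;
        # report it instead of silently reviving the handle.
        if self.refcount == 0:
            raise RuntimeError("get() on handle with zero references")
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self._close()

    def _close(self):
        # Only the last put releases the backing object.
        self.object_pinned = False

    def exists(self):
        # A destroyed log counts as non-existent even while still pinned.
        return self.object_pinned and not self.destroyed


handle = LlogHandle()
handle.get()                                  # a second user takes the handle
handle.put()                                  # the first user is done
still_pinned_with_user = handle.object_pinned
handle.put()                                  # last reference dropped
pinned_after_last_put = handle.object_pinned
```

In this model the object stays pinned while any user still holds the
handle, and is released only by the final put.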

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: If7df41646c243c0d40b20a30a33e86c688d24508
Reviewed-on: https://review.whamcloud.com/37367
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13128 osc: glimpse and lock cancel race 15/37215/5
Alexander Zarochentsev [Thu, 9 Jan 2020 17:45:56 +0000 (20:45 +0300)]
LU-13128 osc: glimpse and lock cancel race

osc_dlm_blocking_ast0 clears l_ast_data before writing
file data to OST and opens a race window. Neither a glimpse
AST nor ldlm_cb_interpret can find correct file attributes at
that moment.

Cray-bug-id: LUS-8344
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: Iadac4f7da94b71639430c9a7cdd77d55e7ba2849
Reviewed-on: https://review.whamcloud.com/37215
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12852 pfl: restrict the stripe count correctly 47/36947/5
Emoly Liu [Fri, 6 Dec 2019 02:08:07 +0000 (10:08 +0800)]
LU-12852 pfl: restrict the stripe count correctly

In function lod_get_stripe_count(), when restricting the stripe
count to the maximum xattr size, the xattr overhead should be
taken into account correctly.

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ief548e47ce4d375f2e189860ccfe05d0f3c7e890
Reviewed-on: https://review.whamcloud.com/36947
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-11939 tgt: Do not assert during grant cleanup 15/34215/7
Patrick Farrell [Fri, 8 Feb 2019 17:14:06 +0000 (12:14 -0500)]
LU-11939 tgt: Do not assert during grant cleanup

Client/server grant inconsistencies discovered during
cleanup are indicative of a bug, but any problems they
would cause have already occurred at this point.

So do not assert during this cleanup.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic9b827b1005bc321a290505a368349699ddf2f38
Reviewed-on: https://review.whamcloud.com/34215
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13194 tests: check server version sanityn 104 61/37461/2
James Nunez [Tue, 4 Feb 2020 04:15:10 +0000 (21:15 -0700)]
LU-13194 tests: check server version sanityn 104

Check the server version before running sanityn test 104.
If the server version is less than 2.12.4, skip the test.
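
A minimal sketch of the version gate described above, written as a
Python model rather than the actual test-framework shell code. The
helper names parse_version and should_skip_test_104 are hypothetical.
Versions are compared as integer tuples so that "2.9.0" sorts before
"2.12.4", which a plain string comparison would get wrong.

```python
def parse_version(version):
    """Turn a dotted version such as '2.12.4' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def should_skip_test_104(server_version, minimum="2.12.4"):
    """Return True when the server is older than the minimum version."""
    return parse_version(server_version) < parse_version(minimum)


skip_old_server = should_skip_test_104("2.11.0")
skip_exact_minimum = should_skip_test_104("2.12.4")
skip_single_digit_minor = should_skip_test_104("2.9.0")
```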

Fixes: d2f7cb7934a0 ("LU-12026 mdt: MDS stores atime|mtime|ctime")

Test-Parameters: trivial serverversion=2.11.0 serverdistro=el7 envdefinitions=ONLY=104 testlist=sanityn
Test-Parameters: envdefinitions=ONLY=104 testlist=sanityn

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I625fb0163c078dc95ed670d169dc5744bc16d4e8
Reviewed-on: https://review.whamcloud.com/37461
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
3 months agoLU-12598 osd-ldiskfs: always return errors for osd_ios_lf_fill 23/37323/2
James Simmons [Fri, 24 Jan 2020 16:45:48 +0000 (11:45 -0500)]
LU-12598 osd-ldiskfs: always return errors for osd_ios_lf_fill

While working on ARM ldiskfs support it was noticed that
osd_ios_lf_fill() behaves differently than the other olm_filldir
handlers. On failure of osd_lookup_one_len(), osd_ios_lf_fill()
silently returns zero when it should return an error code. Change
it to return proper error codes and update the CDEBUG messages.

Change-Id: I528b18aaa7277133875cba5db3150ce34cc6431a
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37323
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13063 tests: remove checks for old RHEL versions 72/37272/4
Andreas Dilger [Fri, 17 Jan 2020 23:38:02 +0000 (16:38 -0700)]
LU-13063 tests: remove checks for old RHEL versions

There was a check in sanity test_17g for RHEL6.5, but we haven't been
testing that client version for some time, and 6.5 no longer works on
master.  Remove this check entirely.

Similarly, is_project_quota_supported() was trying to check for RHEL7,
but the lfs check was being done on the client.  It was also wrong for
RHEL8 kernels, and would incorrectly match any version with a "7" in
it.  Move lfs check to the MDS, and don't check the kernel version.
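
The "any version with a 7 in it" failure mode can be illustrated with
a short Python sketch. The exact form of the old check is not shown
here, so naive_is_rhel7 is only an assumed approximation of it, and
both kernel release strings are illustrative inputs. The patch itself
drops the kernel version check; el_major is shown purely for contrast.

```python
import re


def naive_is_rhel7(kernel_release):
    # Substring test: true for any release string that contains a "7".
    return "7" in kernel_release


def el_major(kernel_release):
    # Extract the distro major number from an ".elN" tag, if present.
    match = re.search(r"\.el(\d+)", kernel_release)
    return int(match.group(1)) if match else None


rhel7_release = "3.10.0-1062.el7.x86_64"
rhel8_release = "4.18.0-147.el8.x86_64"   # contains a "7" in "147"

naive_matches_rhel8 = naive_is_rhel7(rhel8_release)
parsed_major_rhel8 = el_major(rhel8_release)
```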

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If115b65ba8bc09b6c292ec9cf2e949c8153ebbe5
Reviewed-on: https://review.whamcloud.com/37272
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13142 lod: cleanup layout checking 67/37267/4
Sebastien Buisson [Fri, 17 Jan 2020 13:15:25 +0000 (22:15 +0900)]
LU-13142 lod: cleanup layout checking

Cleanup layout checking in lod layer and lfs command-line utility,
for DoM components.

Reported-by: Clement Barthelemy <clement.barthelemy@nextino.eu>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ib8b184a31d26442ed10241dc12a0452e5243d0e8
Reviewed-on: https://review.whamcloud.com/37267
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12133 osd-zfs: set blocksize to 8K for llog objects 92/37192/4
Alex Zhuravlev [Fri, 10 Jan 2020 20:15:12 +0000 (23:15 +0300)]
LU-12133 osd-zfs: set blocksize to 8K for llog objects

With ZFS 0.8+ the default blocksize is 512 bytes. As many llog
operations use 8K chunks, each one turns into 16 dbuf lookups,
which is quite expensive.

For example, sanity/60a takes 104s with blocksize=512 and
90s with blocksize=8K.
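
The arithmetic behind these numbers, as a small Python sketch. It
assumes one dbuf lookup per block touched and a block-aligned chunk;
the helper name blocks_touched is hypothetical.

```python
def blocks_touched(chunk_size, block_size):
    """Number of blocks an aligned chunk spans (ceiling division)."""
    return -(-chunk_size // block_size)


lookups_512 = blocks_touched(8 * 1024, 512)        # 8K chunk, 512-byte blocks
lookups_8k = blocks_touched(8 * 1024, 8 * 1024)    # 8K chunk, 8K blocks

# Relative runtime reduction for the sanity/60a timings quoted above.
runtime_saving = (104 - 90) / 104
```

With these inputs the chunk spans 16 blocks at blocksize=512 and a
single block at blocksize=8K, and the quoted timings correspond to
roughly a 13% shorter run.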

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I86e6e598899e5d09a550dff7dcb9edd5ee56abd5
Reviewed-on: https://review.whamcloud.com/37192
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-11114 llite: Update mdc and lite stats on open|creat 48/36948/7
Olaf Faaland [Tue, 26 Nov 2019 23:20:11 +0000 (15:20 -0800)]
LU-11114 llite: Update mdc and lite stats on open|creat

Increment the "create" counter in mdc/<instance>/md_stats, and the
"mknod" counter in llite/<instance>/stats, when an open with
the CREAT flag results in a newly created file.

The mknod counter is chosen for consistency with
patch http://review.whamcloud.com/20246
 "LU-8150 mdt: Track open+create as mknod"
but the mdc counter set does not include mknod.

Change-Id: Ib32d828dac35924b929f44f161cff13c99810540
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/36948
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12678 socklnd: convert peers hash table to hashtable.h 37/36837/6
Mr NeilBrown [Wed, 15 Jan 2020 15:36:42 +0000 (10:36 -0500)]
LU-12678 socklnd: convert peers hash table to hashtable.h

Using a hashtable.h hashtable, rather than bespoke code, has several
advantages:

 - the table is comprised of hlist_head, rather than list_head, so
   it consumes less memory (though we need to make it a little bigger
   as it must be a power-of-2)
 - there are existing macros for easily walking the whole table
 - it uses a "real" hash function rather than "mod a prime number".

In some ways, rhashtable might be even better, but it can change the
ordering of objects in the table at arbitrary moments, and that could
hurt the user-space API.  It also does not support the partitioned
walking that ksocknal_check_peer_timeouts() depends on.

Note that new peers are inserted at the top of a hash chain, rather
than appended at the end.  I don't think that should be a problem.
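
The properties listed above (power-of-2 sizing, a multiplicative hash
instead of "mod a prime number", whole-table walking, and insertion at
the head of a chain) can be sketched with a small Python model. This
is not the socklnd code: PeerTable and its methods are hypothetical,
and the multiplier is assumed to match the one used by the kernel's
hash_32().

```python
GOLDEN_RATIO_32 = 0x61C88647  # assumed to match the kernel's hash_32()


def hash_32(value, bits):
    """Multiplicative hash: keep the top `bits` bits of a 32-bit product."""
    product = ((value & 0xFFFFFFFF) * GOLDEN_RATIO_32) & 0xFFFFFFFF
    return product >> (32 - bits)


class PeerTable:
    def __init__(self, bits):
        self.bits = bits
        # The table size is always a power of two: 2**bits buckets.
        self.buckets = [[] for _ in range(1 << bits)]

    def add(self, key, peer):
        # New entries go to the head of the chain, not the tail.
        self.buckets[hash_32(key, self.bits)].insert(0, (key, peer))

    def find(self, key):
        for k, peer in self.buckets[hash_32(key, self.bits)]:
            if k == key:
                return peer
        return None

    def walk(self):
        # Whole-table walk, bucket by bucket.
        for bucket in self.buckets:
            for _, peer in bucket:
                yield peer


table = PeerTable(bits=7)
table.add(42, "first")
table.add(42, "second")
head_of_chain = table.find(42)
```

Because entries are pushed onto the head of a chain, a lookup for a
duplicated key returns the most recently added entry in this model.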

Test-Parameters: trivial testlist=sanity-lnet

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I70fe64df0dd0db73666ff6fb2d2888b1d64f4be5
Reviewed-on: https://review.whamcloud.com/36837
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12634 gss: uid_keyring and session_keyring moved 43/35743/17
Shaun Tancheff [Fri, 3 Jan 2020 20:10:58 +0000 (15:10 -0500)]
LU-12634 gss: uid_keyring and session_keyring moved

Linux 5.3 removed uid_keyring and session_keyring from user_struct.
Prefer the lookup_user_key() API when it is available (~5.0).
Prefer get_request_key_auth() when it is available (~5.0).

kernel-commit: 0f44e4d976f96c6439da0d6717238efa4b91196e
kernel-commit: 822ad64d7e46a8e2c8b8a796738d7b657cbb146d

Remove LC_HAVE_CRED_TGCRED which is no longer used.

Test-Parameters: envdefinitions=SHARED_KEY=true,SANITY_SEC_EXCEPT=30b testlist=sanity,recovery-small,sanity-sec
Cray-bug-id: LUS-7689
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I6d551cd8a9e317b717a43cba9be57f184a281c0a
Reviewed-on: https://review.whamcloud.com/35743
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12889 lnet: Do not assume peers are MR capable 12/36512/8
Chris Horn [Fri, 18 Oct 2019 19:16:53 +0000 (14:16 -0500)]
LU-12889 lnet: Do not assume peers are MR capable

If a peer has discovery disabled then it will not consolidate peer
NI information. This means we need to use a consistent source NI
when sending to it just like we do for non-MR peers.

A comment in lnet_discovery_event_reply() indicates that this was a
known issue, but the situation is not handled properly.

Do not assume peers are multi-rail capable when peer objects are
allocated and initialized.

Do not mark a peer as multi-rail capable unless all of the following
conditions are satisfied:
1. The peer has the MR feature flag set.
2. The peer has discovery enabled.
3. We have discovery enabled locally.

Note: 1, 2, and 3 above are implemented in the code for
lnet_discovery_event_reply(), but code earlier in the function breaks
this behavior. Remove the offending code.
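
The three conditions can be sketched as a single predicate. This is a
Python model, not the LNet code: the Peer class and the function
on_discovery_reply are hypothetical names, and the default of False
reflects that peers are no longer assumed to be multi-rail when they
are allocated.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    multi_rail: bool = False   # not assumed at allocation time


def on_discovery_reply(peer, peer_mr_flag, peer_discovery_on,
                       local_discovery_on):
    """Mark the peer multi-rail only if all three conditions hold."""
    peer.multi_rail = (peer_mr_flag and peer_discovery_on
                       and local_discovery_on)
    return peer.multi_rail


new_peer = Peer()
default_is_mr = new_peer.multi_rail
mr_when_all_true = on_discovery_reply(Peer(), True, True, True)
mr_when_peer_discovery_off = on_discovery_reply(Peer(), True, False, True)
```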

Update sanity-lnet tests 100 and 101 to reflect the fact that peers
added via the traffic path no longer have multi-rail by default.

Cray-bug-id: LUS-7918
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ia02bd446f4b2143fb490f56c1ff6103198316da3
Reviewed-on: https://review.whamcloud.com/36512
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoRevert "LU-12222 lnet: Check if we're sending to ourselves" 59/37259/3
Chris Horn [Thu, 16 Jan 2020 19:25:29 +0000 (13:25 -0600)]
Revert "LU-12222 lnet: Check if we're sending to ourselves"

This reverts commit e4af756e1f428a9f7883bf883f66941defb1447f.

Commit e4af756 causes an assert when combined with patch
    https://review.whamcloud.com/36512
    LU-12889 lnet: Do not assume peers are MR capable
Since the 36512 patch is fixing a more serious bug, this
patch is reverted to allow that fix to land.

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I6f3c1e7f7b2858f4aa330b53880fbcc815c1e2c7
Reviewed-on: https://review.whamcloud.com/37259
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12923 libcfs: Remove CLASSERT() for libcfs_private.h 88/37188/3
Arshad Hussain [Mon, 30 Dec 2019 23:28:56 +0000 (04:58 +0530)]
LU-12923 libcfs: Remove CLASSERT() for libcfs_private.h

This patch removes the final CLASSERT() define from the file
libcfs/include/libcfs/libcfs_private.h. For compile-time
assertions the kernel-defined BUILD_BUG_ON() is preferred.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I6d7dd55489824631ae61393413598fe6dc4365a2
Reviewed-on: https://review.whamcloud.com/37188
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12861 libcfs: Cleanup use of bare printk 46/37046/5
Shaun Tancheff [Fri, 3 Jan 2020 16:21:19 +0000 (10:21 -0600)]
LU-12861 libcfs: Cleanup use of bare printk

Some users of printk(<LEVEL> "fmt") can be converted to the
pr_<level>("fmt") equivalents.

Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I5bb13dfa3538839cfaf81137f3cffd937ce55a92
Reviewed-on: https://review.whamcloud.com/37046
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-3606 lustre: Reserve OST_FALLOCATE(fallocate) opcode 77/37277/4
Swapnil Pimpale [Wed, 1 Jan 2020 00:42:57 +0000 (06:12 +0530)]
LU-3606 lustre: Reserve OST_FALLOCATE(fallocate) opcode

A new RPC, OST_FALLOCATE, has been added for
space preallocation. This patch reserves the
OST_FALLOCATE opcode for the fallocate syscall.
Reserving the opcode up front ensures consistency
and avoids protocol interoperability issues
in the future.

Test-Parameters: trivial testlist=sanity,sanityn,sanity-dom
Signed-off-by: Swapnil Pimpale <spimpale@ddn.com>
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Abrarahmed Momin <abrar.momin@gmail.com>
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: Ie109f8f5720dec6d34c5ce4f7732fe49ccb47cd9
Reviewed-on: https://review.whamcloud.com/37277
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
4 months agoLU-3606 fsx: Add fallocate operation to fsx 73/37273/2
Swapnil Pimpale [Tue, 31 Dec 2019 19:39:25 +0000 (01:09 +0530)]
LU-3606 fsx: Add fallocate operation to fsx

This patch updates Lustre fsx (File system exerciser)
to handle fallocate calls. There is no need to change
any existing test case using the "fsx" binary, as with this
fsx version the 'fallocate' call will simply be skipped
as "Operation not supported".

Test-Parameters: trivial testlist=sanity,sanityn,sanity-dom
Signed-off-by: Swapnil Pimpale <spimpale@ddn.com>
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Abrarahmed Momin <abrar.momin@gmail.com>
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I81649d00984257b1785e763ab5c00d570eb412f9
Reviewed-on: https://review.whamcloud.com/37273
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
4 months agoLU-13164 uapi: remove unused LUSTRE_DIRECTIO_FL 95/37295/2
Andreas Dilger [Tue, 21 Jan 2020 09:32:54 +0000 (02:32 -0700)]
LU-13164 uapi: remove unused LUSTRE_DIRECTIO_FL

The LUSTRE_DIRECTIO_FL was added based on the upstream FS_DIRECTIO_FL
flag in the hopes that it might be useful, but it has since been
removed from the upstream in kernel commit v4.4-rc4-22-g68ce7bfcd995
and replaced by FS_VERITY_FL using the same value in kernel commit
v5.3-rc2-4-gfe9918d3b228, which we are much more likely to use.

Since LUSTRE_DIRECTIO_FL was unused, there is no risk to remove it.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I49e915612636a674a86d25be5d91a042693ebbe5
Reviewed-on: https://review.whamcloud.com/37295
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13130 tests: sanity-scrub to use full device size with ZFS 17/37217/4
Alex Zhuravlev [Mon, 13 Jan 2020 17:29:55 +0000 (20:29 +0300)]
LU-13130 tests: sanity-scrub to use full device size with ZFS

Use the full device size because on tiny devices ZFS falls back to
non-cached writes (grants are consumed too quickly), while formatting
time doesn't depend on device size with ZFS (which was the original
reasoning for the limits).

Also increase the OST size, as local testing with ldiskfs sometimes
fails due to lack of space.

Test-Parameters: trivial testlist=sanity-scrub mdscount=2 mdtcount=4
Test-Parameters: fstype=zfs testlist=sanity-scrub mdscount=2 mdtcount=4

Change-Id: I8aad6c39d23a1d4c8db07b76e9de7fa2a664b1e5
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37217
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-9859 libcfs: move files out of libcfs/linux 91/37191/6
James Simmons [Sat, 18 Jan 2020 14:53:35 +0000 (09:53 -0500)]
LU-9859 libcfs: move files out of libcfs/linux

Files that are not used to handle various kernel versions are
promoted out of the linux directory. Loosely based on

Linux-commit: f72c3ab791ac0b2b75b5b5d4d51d8eb89ea1e515

This brings us more into sync with the Linux Lustre client.

Change-Id: I4aad42671de14b4e5ca0743d2126363c829b0d74
Test-Parameters: trivial
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37191
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12977 ldiskfs: properly take inode_lock() for truncates 16/37116/5
James Simmons [Mon, 30 Dec 2019 17:48:05 +0000 (12:48 -0500)]
LU-12977 ldiskfs: properly take inode_lock() for truncates

Originally Lustre grabbed the inode_lock() but this led to
deadlocks as described in LU-6446 and LU-4252. The recent work
of LU-10048 changed the truncate code so that it is called
asynchronously from the main transactions. This should avoid
lock ordering issues. It should be safe to take the
inode_lock() around ldiskfs_truncate() and remove the WARN().

Test-Parameters: fstype=ldiskfs testlist=racer

Change-Id: Id7b6d05d054ab041980e946989aa1effae5c7111
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37116
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13036 lnet: avoid extra memory consumption 97/36897/3
Alexey Lyashkov [Fri, 20 Dec 2019 12:40:37 +0000 (15:40 +0300)]
LU-13036 lnet: avoid extra memory consumption

Use slab allocation for the rsp_tracker and lnet_message
structs to avoid memory fragmentation.

Test-parameters: trivial

Cray-bug-id: LUS-8190
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: I67ec8f8fe4da4c646241d551e0a23745cae8ed00
Reviewed-on: https://review.whamcloud.com/36897
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>