Whamcloud - gitweb
fs/lustre-release.git
5 months ago LU-9679 osc: use overlapped() consistently. 02/37602/2
NeilBrown [Wed, 12 Dec 2018 07:20:36 +0000 (18:20 +1100)]
LU-9679 osc: use overlapped() consistently.

osc_extent_is_overlapped() open-codes exactly the test that
overlapped() performs.
So use overlapped() instead, to make the code more obviously
consistent.
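
As a rough illustration of the kind of test involved, the sketch below
checks whether two inclusive index ranges intersect. The struct ext type
and its field names are hypothetical stand-ins, not the actual
osc_extent definition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an extent: an inclusive range of indices. */
struct ext {
	unsigned long start;
	unsigned long end;
};

/* Two inclusive ranges overlap unless one ends before the other starts. */
bool overlapped(const struct ext *a, const struct ext *b)
{
	return !(a->end < b->start || b->end < a->start);
}
```

Keeping the test in one helper means every caller agrees on whether
touching end points count as an overlap.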

Linux-Commit: 270995b08634 ("lustre: osc: use overlapped()
consistently.")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I3a3ed2ee04343a294ae94f205f5d12be98f99bf3
Reviewed-on: https://review.whamcloud.com/37602
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
5 months ago LU-9679 osc: remove cl_io_cancel() 97/37597/3
NeilBrown [Mon, 17 Dec 2018 02:39:10 +0000 (13:39 +1100)]
LU-9679 osc: remove cl_io_cancel()

cl_io_cancel() is never used, so remove it and various
other things that it is the only user of.

Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I6cf9b53aa7fc3379e57fa0ac0ea236ccda4ff6b7
Reviewed-on: https://review.whamcloud.com/37597
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-9679 osc: use assert_spin_locked() 96/37596/2
NeilBrown [Wed, 12 Dec 2018 03:25:30 +0000 (14:25 +1100)]
LU-9679 osc: use assert_spin_locked()

assert_spin_locked() is preferred to spin_is_locked() for affirming
that a spinlock is locked.

__osc_extent_sanity_check() is only ever called with obj already
locked, so change the check into an assertion.

Linux-Commit: a12d8284b574 ("lustre: osc_cache: use
assert_spin_locked()")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Iaae6deb5af4dec4d31893749924f211ba0489c47
Reviewed-on: https://review.whamcloud.com/37596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-9679 general: add missing spaces near punctuation 02/37402/5
Mr NeilBrown [Mon, 3 Feb 2020 03:24:19 +0000 (14:24 +1100)]
LU-9679 general: add missing spaces near punctuation

Many places in lustre fold a long string onto multiple lines, usually
at word breaks.  Sometimes the word-break has punctuation, such as
comma, colon, or period, but needs a space as well to be properly
readable.  Because the string is folded, the missing space isn't
immediately obvious in the code.

This patch adds those spaces after punctuation where it seems to be
needed, and joins the affected strings onto a single line, in accord
with current policy.
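
A small sketch of the problem, using a made-up message rather than any
string from the Lustre tree: adjacent string literals are concatenated
by the compiler, so a fold right after punctuation drops the space
unless it is written explicitly.

```c
/* Hypothetical message folded across two lines: the compiler joins the
 * two literals, so the result reads "cannot open file,giving up". */
const char *folded_bad = "cannot open file,"
			 "giving up";

/* Same message joined onto one line, with the space made explicit. */
const char *joined_good = "cannot open file, giving up";
```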

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9e76778b1e9687bc26e85500006b4b9d9ae6f93a
Reviewed-on: https://review.whamcloud.com/37402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
5 months ago LU-12914 mdt: mdt_prep_ma_buf_from_rep() is called twice 11/36611/3
Bruno Faccini [Wed, 30 Oct 2019 10:38:48 +0000 (11:38 +0100)]
LU-12914 mdt: mdt_prep_ma_buf_from_rep() is called twice

In some rare cases (replay of file open with O_LOV_DELAY_CREATE
when the object is found dead on the MDT) mdt_prep_ma_buf_from_rep()
can be called twice (in both mdt_reint_open() and mdt_open_by_fid())
during the same request handling.
So remove the assertion checking whether LMV or LOV has already been
found and set in ma.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I78e0456ea59c37cab4276383c75c4fa5cc9f4829
Reviewed-on: https://review.whamcloud.com/36611
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-11668 mdd: use mdd_object_fid() instead of mdo2fid() 47/35047/5
Andreas Dilger [Wed, 21 Nov 2018 02:32:32 +0000 (19:32 -0700)]
LU-11668 mdd: use mdd_object_fid() instead of mdo2fid()

Both mdd_object_fid() and mdo2fid() helper functions are the same.
Replace mdo2fid() with the better-named mdd_object_fid(mdd_obj)
function everywhere.  Use mdd_obj_dev_name(mdd_obj) for console
error messages instead of mdd2obd_dev(mdd)->obd_name for consistency.

It would be nice to consistently use "mdd_obj" for objects (instead of
"o" or "mo" or "obj", ...) and "mdd" for devices (instead of "m"), but
that is too big to include in this patch.  Just replace them in the
few wrapper functions already affected by this patch.

Fix up whitespace and string formatting style in affected code.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6de748bada06f0f66123e4567115deb2633ebbe5
Reviewed-on: https://review.whamcloud.com/35047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-12931 gnilnd: use time_after() to compare jiffies 02/36702/5
Andreas Dilger [Thu, 7 Nov 2019 06:33:55 +0000 (23:33 -0700)]
LU-12931 gnilnd: use time_after() to compare jiffies

Fix a potential bug in gnilnd, which directly compares a timeout
against jiffies instead of using time_after() to handle jiffies wrap.
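
The difference can be sketched in user space with a 32-bit counter. The
function names below are illustrative only; the second one follows the
signed-difference idea behind time_after() and assumes the usual
two's-complement conversion to int32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive comparison: gives the wrong answer once the counter wraps. */
bool naive_after(uint32_t a, uint32_t b)
{
	return a > b;
}

/* Wrap-safe comparison in the style of time_after(a, b): true if a is
 * later than b, provided the two values are less than half the counter
 * range apart. The sign of the wrapped difference carries the order. */
bool wrap_safe_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}
```

With a timeout set shortly before the counter wraps, the naive form
reports that the later time has not yet arrived, while the
signed-difference form still orders the two values correctly.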

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie4d190e9c04e807f2152b71dc28ef0b0463ebbe5
Reviewed-on: https://review.whamcloud.com/36702
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13264 osc: ensure lu_ref work atomic from osc_lock_upcall() 29/37629/2
Bruno Faccini [Wed, 19 Feb 2020 16:25:26 +0000 (17:25 +0100)]
LU-13264 osc: ensure lu_ref work atomic from osc_lock_upcall()

Since osc_lock_upcall() uses per-cpu env via
cl_env_percpu_[get,put](), all underlying work must execute on the
same CPU, meaning that no sleep()/scheduling may occur.
This implies that all lu_ref related work must no longer use
lu_ref_add(), which calls might_sleep() (likely to cause a
scheduling/cpu-switch...), but lu_ref_add_atomic() instead.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ide33d4c415e9e382f0bc344e2114182a1f122de6
Reviewed-on: https://review.whamcloud.com/37629
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13263 osc: use LDLM_LOCK_RELEASE() if no lu_ref added 25/37625/2
Bruno Faccini [Wed, 19 Feb 2020 14:39:19 +0000 (15:39 +0100)]
LU-13263 osc: use LDLM_LOCK_RELEASE() if no lu_ref added

In osc_ldlm_glimpse_ast(), LDLM_LOCK_PUT() is used after
LDLM_LOCK_GET() when no lu_ref has been added.
This causes an LBUG when USE_LU_REF is configured, so
change LDLM_LOCK_PUT() to LDLM_LOCK_RELEASE().

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Id522a02878f01ae565e6c2418fe8cd85c945dde9
Reviewed-on: https://review.whamcloud.com/37625
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13258 libcfs: make apply_workqueue_attrs() available for Lustre 13/37613/2
James Simmons [Mon, 17 Feb 2020 19:34:41 +0000 (14:34 -0500)]
LU-13258 libcfs: make apply_workqueue_attrs() available for Lustre

Currently Lustre work queues can run on any core, which introduces
noise on the system. The Linux kernel has a function called
apply_workqueue_attrs() which allows you to control which cores
a work queue will execute on. Manually export this function so
Lustre can use it.

Test-Parameters: trivial

Change-Id: I467539cb8def7029fa9dfff2386234de5e0fe133
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37613
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13004 target: take offset into account in tgt_send_buffer 71/37571/4
Mikhail Pershin [Fri, 14 Feb 2020 09:59:05 +0000 (12:59 +0300)]
LU-13004 target: take offset into account in tgt_send_buffer

When calculating the number of pages needed, take the buffer offset
into account, because it can be non-aligned after allocations in
out_read().
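
The arithmetic can be sketched as follows. The 4096-byte page size and
the helper name are assumptions made for the example, not values taken
from tgt_send_buffer().

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the example */

/* Number of pages touched by len bytes that begin at address start.
 * The offset within the first page is added before rounding up. */
unsigned long pages_spanned(unsigned long start, size_t len)
{
	unsigned long offset = start & (SKETCH_PAGE_SIZE - 1);

	if (len == 0)
		return 0;
	return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

A page-sized buffer that starts 100 bytes into a page spans two pages,
which a calculation based on the length alone would miss.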

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib7c9b35d328d366a27cc553ffe2f2c5930949cf4
Reviewed-on: https://review.whamcloud.com/37571
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-12251 tests: stop running sanity-flr for PPC 63/37563/3
James Nunez [Thu, 13 Feb 2020 20:29:50 +0000 (13:29 -0700)]
LU-12251 tests: stop running sanity-flr for PPC

Stop running all sanity-flr tests for PPC client
testing until we understand the failures and the sanity-pfl
test suite passes all testing for PPC clients.

Test-Parameters: trivial clientarch=ppc64 testlist=sanity-flr
Test-Parameters: testlist=sanity-flr

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Iee044e6995ed4f6cca5f6b7f92eee6b59cb7018b
Reviewed-on: https://review.whamcloud.com/37563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago LU-12198 libcfs: always copy ioctl header back to user 59/37559/3
Dominique Martinet [Thu, 13 Feb 2020 13:36:32 +0000 (13:36 +0000)]
LU-12198 libcfs: always copy ioctl header back to user

lnetctl_get_peer_list fills in the required size in the header if the
given buffer was too small. Userspace needs the info back to grow
the buffer and try again.

Note we only replace err on failure if err was not previously set.

Fixes: fba98579efc4 ("LU-6202 libcfs: replace libcfs_register_ioctl with a blocking notifier_chain")
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Change-Id: I2b6e319aceeb00d488572053d27023891afe1928
Reviewed-on: https://review.whamcloud.com/37559
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13225 utils: bash completion for lfs and lctl 83/37483/12
Andreas Dilger [Sat, 8 Feb 2020 08:25:29 +0000 (01:25 -0700)]
LU-13225 utils: bash completion for lfs and lctl

Add a bash completion for "lfs" and improve completion for "lctl".
Rename the "lctl" completion script to "lustre" since the two
commands share helper routines for fsnames, pools, etc. and install
"lfs" and "lctl" symlinks to the common command file.

The completion prints available sub-commands and their options,
and for some sub-commands it completes available arguments
(e.g. mount points, pool names, and MDT/OST names).

A couple of minor changes to "lfs" and "lctl" usage messages to make
the sub-command options easier to parse.  More needs to be done to
make all sub-commands have proper long options.

There is definitely a lot more that could be added to the completions,
but this is a good start and provides a framework for the future.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie989b2ef4c0d6d8565e5c6753205bb6ed83ebbe5
Reviewed-on: https://review.whamcloud.com/37483
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Dominique Martinet <dominique.martinet@cea.fr>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13133 tests: sanity-selinux test_21{a,b} sepol update 24/37224/3
Sebastien Buisson [Tue, 14 Jan 2020 11:51:55 +0000 (20:51 +0900)]
LU-13133 tests: sanity-selinux test_21{a,b} sepol update

We need to make sure MDS receives updated sepol info from MGS.
In case of combined MGT/MDT, directly setting fileset on the node
will mask llog-based info retrieval mechanism. So always use
'lctl set_param -P' to set sepol value.

Test-Parameters: trivial
Test-Parameters: clientselinux testlist=sanity-selinux
Test-Parameters: clientselinux testlist=sanity-selinux
Test-Parameters: clientselinux testlist=sanity-selinux
Test-Parameters: clientselinux testlist=sanity-selinux
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaf8ff13364b9ba5f5d8b733be0247d79e05a6b3d
Reviewed-on: https://review.whamcloud.com/37224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13071 lnet: reduce log severity for health events 02/37002/2
Amir Shehata [Thu, 12 Dec 2019 19:19:48 +0000 (11:19 -0800)]
LU-13071 lnet: reduce log severity for health events

No need to print an error when the health of an interface is
reduced. Changed it to debug level.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia60ade12efab732ea4b0388a3803976bf65938ab
Reviewed-on: https://review.whamcloud.com/37002
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago LU-13004 osp: use correct page count in osp_prep_update_req 87/37587/3
Mr NeilBrown [Thu, 21 Nov 2019 03:53:59 +0000 (14:53 +1100)]
LU-13004 osp: use correct page count in osp_prep_update_req

A fix that went into patchset 3 of
 https://review.whamcloud.com/#/c/36828/3
disappeared in patchset 5.
We should restore it.

Specifically, 'page_count' should be a count of pages,
but it is currently a count of the bytes in all the pages.

Fixes: f32fbf189fab ("LU-13004 osp: use KIOV in osp_prep_update_req")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic8dcdac414d16b4f2c1c6e0367d496de7e0a8cff
Reviewed-on: https://review.whamcloud.com/37587
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-12861 libcfs: Cleanup libcfs_debug_msg use of snprintf 01/36901/8
Shaun Tancheff [Fri, 31 Jan 2020 20:09:34 +0000 (14:09 -0600)]
LU-12861 libcfs: Cleanup libcfs_debug_msg use of snprintf

scnprintf returns the number of bytes written to the buffer.
snprintf returns the size of the buffer needed to satisfy
the request.

Prefer scnprintf when result is being used as the number
of bytes in a buffer.

Use snprintf when the result is used for sizing or resizing
a buffer.
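
The distinction can be shown in user space. The kernel's scnprintf() is
not available there, so sketch_scnprintf() below is a hypothetical
approximation built on vsnprintf(), not the kernel implementation.

```c
#include <stdarg.h>
#include <stdio.h>

/* User-space approximation of scnprintf(): returns the number of
 * characters actually stored in buf, excluding the trailing NUL. */
int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int needed;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	needed = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (needed < 0) {
		/* Formatting error: leave an empty string behind. */
		buf[0] = '\0';
		return 0;
	}
	/* vsnprintf() reports the length it wanted; clamp to what fit. */
	return (size_t)needed < size ? needed : (int)(size - 1);
}
```

Advancing a write position by the snprintf() result can step past the
end of the buffer after truncation, whereas the clamped result always
stays inside it.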

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8c4b8f7dcc081f8b9dfffc35059011172be2e091
Reviewed-on: https://review.whamcloud.com/36901
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-12191 utils: Make "lctl list_param" read exact path under sysfs tree 52/36852/5
Sonia Sharma [Tue, 4 Feb 2020 18:34:02 +0000 (13:34 -0500)]
LU-12191 utils: Make "lctl list_param" read exact path under sysfs tree

"lctl list_param -R" currently checks for the param_name
in the path and reads the sysfs tree under that. But it can
give erroneous results, as in the following example:

For a path like /sys/fs/lnet/net/o2ib1/ib0, the command
"lctl list_param -R" doesn't go down the "net" tree
because it matches "net" with "lnet" and just stops
there.

This patch changes how param_name is checked for
in the path. In the above example, instead
of checking for "net", it should check for
"/net". So, this patch makes that change in
param_display() in lustre/utils/lustre_cfg.c
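
The mismatch is easy to reproduce with a plain substring search. The
helper below is a hypothetical illustration and is not taken from
param_display().

```c
#include <stddef.h>
#include <string.h>

/* Offset of the first occurrence of name in path, or -1 if absent. */
long match_offset(const char *path, const char *name)
{
	const char *hit = strstr(path, name);

	return hit ? (long)(hit - path) : -1;
}
```

For the path in the example above, searching for "net" first matches
inside "lnet", while searching for "/net" matches the start of the
intended path component.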

Change-Id: Ieb3ad0d1248eee2192246ff5c4d77a71d87dc446
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36852
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13005 lnet: eq: discard struct lnet_handle_eq 41/36841/5
Mr NeilBrown [Wed, 20 Nov 2019 00:16:34 +0000 (11:16 +1100)]
LU-13005 lnet: eq: discard struct lnet_handle_eq

The Portals API uses a cookie 'handle' to identify an EQ.  This is
appropriate for a user-space API for objects maintained by the kernel,
but it brings no value when the API client and implementation are both
in the kernel, as is the case with Lustre and LNet.

Instead of using a 'handle', a pointer to the 'struct lnet_eq' can be
used.  This object is not reference counted and is always freed
correctly, so there can be no case where the cookie becomes invalid
while it is still held.

So use 'struct lnet_eq *' directly instead of having indirection
through a 'struct lnet_handle_eq'.
Also:
 - have LNetEQAttach() return the pointer, using ERR_PTR() to return
   errors.
 - discard ln_eq_containers and don't store the eq therein.
   This means we don't free any eq that have not already been freed,
   but all eq that are allocated are properly freed, so that is not
   a problem.
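
The pointer-or-error convention mentioned above can be imitated in user
space roughly as follows. All sketch_* names, including the event queue
type, are hypothetical; the real helpers are the kernel's ERR_PTR(),
PTR_ERR() and IS_ERR().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in the pointer itself. */
void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top SKETCH_MAX_ERRNO addresses. */
int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical event queue object. */
struct sketch_eq {
	unsigned int size;
};

/* Return the object directly, or an encoded error on failure. */
struct sketch_eq *sketch_eq_alloc(unsigned int size)
{
	struct sketch_eq *eq;

	if (size == 0)
		return sketch_err_ptr(-EINVAL);
	eq = calloc(1, sizeof(*eq));
	if (!eq)
		return sketch_err_ptr(-ENOMEM);
	eq->size = size;
	return eq;
}
```

A caller checks the result with sketch_is_err() before dereferencing it,
so no separate output parameter or handle lookup is needed.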

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0d6e5b654e39e749b39d46f68d0fb3e47a3256e9
Reviewed-on: https://review.whamcloud.com/36841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-12780 llite: avoid ptlrpc_thread for ll_statahead_thread 59/36259/6
Mr NeilBrown [Wed, 23 Oct 2019 00:30:49 +0000 (11:30 +1100)]
LU-12780 llite: avoid ptlrpc_thread for ll_statahead_thread

Instead of ptlrpc_thread use more direct interfaces.

- Instead of waiting for thread startup to complete, perform
  the startup before starting the thread.
- as nothing waits for the thread to finish we cannot use
  kthread_should_stop().  Instead, set the task pointer
  sai_task to NULL when the thread is finishing up.
- As we don't use kthread_should_stop(), we can safely do cleanup
  in the thread, because it is sure to run.
- use wake_up_process to signal the thread that there
  is work to do.
- the wake_up that is currently at the end of sa_put() becomes
  a little more complicated and is moved to after the one place
  where sa_put() is called.

Change-Id: If694dafc6864348fe5203a4935f4c128ce5ff255
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/36259
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-12780 llite: don't use ptlrpc_thread for sai_agl_thread 58/36258/6
Mr NeilBrown [Wed, 23 Oct 2019 00:30:48 +0000 (11:30 +1100)]
LU-12780 llite: don't use ptlrpc_thread for sai_agl_thread

Instead of ptlrpc_thread use native kthread functionality.

- instead of waiting for the thread to start-up, perform
  all early initialization before starting the thread,
  and cleanup happens after thread is stopped.
- use kthread_stop()/kthread_should_stop() to negotiate
  shutdown.
- use wake_up_process() to tell the thread if there is more work
  to do.  The thread sets current->state to TASK_IDLE before
  checking, so that if it gets the wakeup, the 'schedule()'
  call won't block.
  We clear ->sai_agl_task under a spinlock, from which it is
  also woken, to avoid races.

Linux-commit c044fb0f835c

Change-Id: I73294dd2f28087f56c3463ecfad1a8b32a44b2d7
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/36258
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-10467 lfsck: use wait_event_idle() 10/37610/3
Mr NeilBrown [Mon, 17 Feb 2020 03:46:54 +0000 (14:46 +1100)]
LU-10467 lfsck: use wait_event_idle()

This l_wait_event() call is equivalent to the more standard
wait_event_idle().
So switch over to wait_event_idle().

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I8e13360a40dd1eec740f597d649c0f230533eb3d
Reviewed-on: https://review.whamcloud.com/37610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-10467 ldlm: use wait_event_idle() instead of l_wait_event 09/37609/2
Mr NeilBrown [Mon, 17 Feb 2020 03:45:31 +0000 (14:45 +1100)]
LU-10467 ldlm: use wait_event_idle() instead of l_wait_event

This l_wait_event() is equivalent to wait_event_idle() which is now
supported in lustre.  So switch over to it.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If1ee81a0d562516534665d049fb24c1f39b59b95
Reviewed-on: https://review.whamcloud.com/37609
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13254 mdt: clear mti_mdt in mdt_thread_info_fini() 92/37592/2
Mikhail Pershin [Sat, 15 Feb 2020 17:09:31 +0000 (20:09 +0300)]
LU-13254 mdt: clear mti_mdt in mdt_thread_info_fini()

Clear mti_mdt when finalizing mdt_thread_info to prevent
its reuse by another handler later. Usually that may happen
in mdt_lvbo_fill/update(), which takes the thread info as is,
without initialization, because at this point it is not
clear whether it was already initialized. So mti_mdt may be
used there after being initialized by some other handler from
a different MDT, or even with garbage at an old pointer.
Meanwhile there is no need to use any mdt_thread_info values
like mti_mdt in mdt_lvbo_fill() because the MDT device is
taken from the namespace, and the only fields used from
mdt_thread_info are temporary storage for FID and lu_buffer.

The patch zeros mti_mdt upon thread finalization and also removes
usages of info->mti_mdt from mdt_lvbo_fill/update(), replacing
them with the MDT taken from the lock namespace.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib350093f0b70c777932c056b34cb56a9702b650d
Reviewed-on: https://review.whamcloud.com/37592
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-10467 mdc: change ssleep to msleep_interruptible 88/37488/3
James Simmons [Mon, 17 Feb 2020 16:31:47 +0000 (11:31 -0500)]
LU-10467 mdc: change ssleep to msleep_interruptible

During review of the mdc wait_idle* changes for mdc_getpage()
it was pointed out that the use of ssleep() prevents the code
from being interruptible. Change ssleep to msleep_interruptible()
to allow breaking out of the sleep if an application sends
an INTR signal.

Change-Id: I2fcb90ecdd6f2c4f2ee6fbc54d253622e8beee29
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37488
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-12811 ptlrpc: pass buffer size to the swabbing functions 35/36435/9
Emoly Liu [Mon, 23 Dec 2019 02:32:31 +0000 (10:32 +0800)]
LU-12811 ptlrpc: pass buffer size to the swabbing functions

By adding a separate rmf_swab_len() function pointer to
req_msg_field, the buffer size can be passed to the swabbing
functions, e.g. lustre_swab_fiemap() in this patch, to avoid
invalid access, especially for small buffers.
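
A minimal sketch of a length-aware swab routine is shown below. The
record layout and every sketch_* name are invented for the example and
do not reflect the fiemap structure or the req_msg_field interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire record: a count followed by that many entries. */
struct sketch_rec {
	uint32_t count;
	uint32_t entries[];
};

/* Reverse the byte order of a 32-bit value. */
static uint32_t sketch_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Byte-swap a record in place, using buf_len to bound every access.
 * Returns 0 on success, or -1 if the buffer is too small, in which
 * case the buffer is left unmodified. */
int sketch_swab_rec(struct sketch_rec *rec, size_t buf_len)
{
	uint32_t i, count;

	if (buf_len < sizeof(*rec))
		return -1;

	count = sketch_swab32(rec->count);
	if (count > (buf_len - sizeof(*rec)) / sizeof(rec->entries[0]))
		return -1;

	rec->count = count;
	for (i = 0; i < count; i++)
		rec->entries[i] = sketch_swab32(rec->entries[i]);
	return 0;
}
```

Without the length argument, the loop would have to trust the count
field from the wire and could run past the end of a short buffer.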

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I997e6a828f2f1cdfdb8a5fa241fa43ca0ae3677e
Reviewed-on: https://review.whamcloud.com/36435
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
5 months ago LU-11915 tests: add debugging to conf-sanity test_115 48/37548/6
Andreas Dilger [Wed, 12 Feb 2020 06:48:36 +0000 (23:48 -0700)]
LU-11915 tests: add debugging to conf-sanity test_115

After updating the e2fsprogs build version to 1.45.2.wc2, this
test is no longer being skipped, and is failing to mount the
newly-formatted OST0000 due to errors registering with the MGS
(target index already in use).  Since the MDS+MGS was just
reformatted, that doesn't make sense.

Continue to skip this test until we understand why it is failing,
but use ALWAYS_EXCEPT instead of blaming the e2fsprogs version.

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I59c689763481c4fc3677ca1807101de09599bb77
Reviewed-on: https://review.whamcloud.com/37548
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-10073 tests: skip test smoke for PPC 50/37450/2
James Nunez [Wed, 5 Feb 2020 22:49:38 +0000 (15:49 -0700)]
LU-10073 tests: skip test smoke for PPC

The lnet-selftest test smoke fails consistently for
PPC client testing.  Thus, stop running this test until
we understand the failure; add smoke to the ALWAYS_EXCEPT
list.

Test-Parameters: trivial
Test-Parameters: clientarch=ppc64 testlist=lnet-selftest
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I090ec05d7ad934bb4c68e976572adb29eb29a676
Reviewed-on: https://review.whamcloud.com/37450
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13186 tests: stop running tests for PPC clients 97/37397/10
James Nunez [Sun, 2 Feb 2020 01:24:56 +0000 (18:24 -0700)]
LU-13186 tests: stop running tests for PPC clients

Stop running tests that fail consistently when testing PPC clients
by putting them on the ALWAYS_EXCEPT list, including:

sanity-hsm tests
(LU-12251) 1a 1b 1d 1e 12c 12f 12g 12h 12m 12n 12o 12p 12q
21 22 23 24a 24b 24d 24e 24f 25b 30c 37 57 58 90 110b 111b
113 222b 222d 228 260a 260b 260c
(LU-12252) 220A 220a 221 222a 222c 223a 223b 224A 224a 226
227 600 601 602 603 604 605

sanity-pfl tests
(LU-13186) 14
(LU-13205) 16a
(LU-13207) 16b
(LU-13215) 17

Test-Parameters: trivial
Test-Parameters: clientarch=ppc64 testlist=sanity-pfl,sanity-hsm
Test-Parameters: clientarch=ppc64 testlist=sanity-pfl,sanity-hsm
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I847a8121d2675b9671bc9a39c4f6ccff67b208fa
Reviewed-on: https://review.whamcloud.com/37397
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13226 ldiskfs: Add support for Ubuntu eoan 5.3 54/37554/2
Shaun Tancheff [Wed, 12 Feb 2020 20:19:09 +0000 (14:19 -0600)]
LU-13226 ldiskfs: Add support for Ubuntu eoan 5.3

Ubuntu eoan kernel is close enough to 5.4.7+ mainline to
use the patch series directly.
Update the configure script to select it.

Test-Parameters: trivial
Cray-bug-id: LUS-8485
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iadb9b87a153a88846399d91699c972c72a5e1e7a
Reviewed-on: https://review.whamcloud.com/37554
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13232 tests: add stack_trap to clean up sanity 160j 50/37550/4
James Nunez [Wed, 12 Feb 2020 19:00:56 +0000 (12:00 -0700)]
LU-13232 tests: add stack_trap to clean up sanity 160j

When sanity test 160j fails at any point in the test before
clean up, a client can be left with no file system mounted
or the second file system mount could be left mounted.  We
need to call stack_trap after each of these commands to
clean up the mount points in case of the test failing.

Test-Parameters: trivial
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I631cc2bb2d664a0cdcfe5942d16cd1d011a822ef
Reviewed-on: https://review.whamcloud.com/37550
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-9859 libcfs: rename cfs_cpt_table to cfs_cpt_tab 19/37519/2
NeilBrown [Mon, 10 Feb 2020 16:22:02 +0000 (11:22 -0500)]
LU-9859 libcfs: rename cfs_cpt_table to cfs_cpt_tab

The variable "cfs_cpt_table" has the same name as
the structure "struct cfs_cpt_table".
This makes it hard to use #define to make one disappear
on a uni-processor build, but keep the other.
So rename the variable to cfs_cpt_tab.

Linux-commit: 457d63ea5c1aa81fe0b9a66a77a2282856b88983

Test-Parameters: trivial

Change-Id: I77cc6694183df2485974c8a962a5766a905fb5f9
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/37519
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-10756 ptlrpc: fix IMP_CLOSED state is being never set 05/37405/4
Mikhail Pershin [Mon, 3 Feb 2020 09:03:59 +0000 (12:03 +0300)]
LU-10756 ptlrpc: fix IMP_CLOSED state is being never set

Commit cf78502e48d checks the new state for the IMP_CLOSED value
instead of the import's current state, so instead of keeping the
import closed it prevents the import state from being set to
IMP_CLOSED.

This patch restores the original check to keep the import closed by
checking its current state.
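
The two forms of the check can be contrasted in a small sketch. The
types and function names are hypothetical, and the shape of the bug is
inferred from the description above rather than copied from the ptlrpc
code.

```c
/* Hypothetical import states for the example. */
enum sketch_imp_state {
	SKETCH_IMP_CLOSED = 1,
	SKETCH_IMP_NEW,
	SKETCH_IMP_FULL,
};

struct sketch_import {
	enum sketch_imp_state state;
};

/* Buggy form: tests the requested state, so CLOSED can never be set. */
void set_state_buggy(struct sketch_import *imp,
		     enum sketch_imp_state new_state)
{
	if (new_state != SKETCH_IMP_CLOSED)
		imp->state = new_state;
}

/* Intended form: tests the current state, so a closed import stays
 * closed but any other import can still be moved to CLOSED. */
void set_state_fixed(struct sketch_import *imp,
		     enum sketch_imp_state new_state)
{
	if (imp->state != SKETCH_IMP_CLOSED)
		imp->state = new_state;
}
```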

Fixes: cf78502e48d ("LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7df2798f09ce7023381c03957adf530da4149c2d
Reviewed-on: https://review.whamcloud.com/37405
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13190 mds: send mbo_max_mdsize in open intent reply 00/37400/6
Alex Zhuravlev [Sun, 2 Feb 2020 13:45:29 +0000 (16:45 +0300)]
LU-13190 mds: send mbo_max_mdsize in open intent reply

 - client sends open|create intent before a connection to OST
   cl_default_mds_easize is 0 since initialization
 - MDS replies back without UPDATE bit in LDLM lock, but with EA
   (MDS doesn't send OBD_MD_FLMODEASIZE and mbo_max_mdsize back)
 - client's cl_default_mds_easize is still 0
 - client sends getattr intent with 0-size buffer for EA
 - MDS replies LAYOUT lock, but empty EA due to 0-size buffer
 - client sets local layout to EMPTY
 - all subsequent I/O fails with -EBADF

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iadd5595d956f0469e3916cdc1cca2ac8f802a149
Reviewed-on: https://review.whamcloud.com/37400
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12722 target: disable recovery for local clients 25/36025/38
Alexey Zhuravlev [Mon, 9 Sep 2019 14:00:05 +0000 (17:00 +0300)]
LU-12722 target: disable recovery for local clients

When a client is running on a server node, the local
services can't rely on that client in the context of
recovery: such a client dies with the node and can't replay
requests and states, so the restarting server has to
wait until recovery expires, which doesn't make any sense.

So the servers should recognize local clients and exclude
them from recovery (i.e. don't make them part of last_rcvd).

For the purpose of local testing, a special mount option
"local_recov" has been added to {MDS|OST}_MOUNT_OPTS in
tests/cfg/local.sh to keep local testing working when everything
is running within a single node.

Signed-off-by: Alexey Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4cb906c44c1192933f7d77dc782160e426e9efde
Reviewed-on: https://review.whamcloud.com/36025
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12280 quota: add notify grace 17/36017/8
Hongchao Zhang [Thu, 10 Oct 2019 21:06:05 +0000 (17:06 -0400)]
LU-12280 quota: add notify grace

Add an option to get notified when the quota is over the soft limit
while preventing the soft limit from becoming a hard limit.

Change-Id: I01ae1266c3683198b82af7bad119db280c1e3a07
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36017
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9859 libcfs: remove unnecessary cfs_block_allsigs() calls 50/35350/15
NeilBrown [Mon, 10 Feb 2020 14:06:31 +0000 (09:06 -0500)]
LU-9859 libcfs: remove unnecessary cfs_block_allsigs() calls

Threads started by kthread_run() ignore all signals,
as kthreadd() calls ignore_signals(), and this is
inherited by all children.
So there is no need to call cfs_block_allsigs() in functions
that are only run from kthread_run().

For the case of lnet_ping_md_unlink() it is not from a kernel
thread but nothing in that function should be affected by
signals so it is safe to remove.

For lnet_ping() we need to manually block signals since
LNetEQPoll() can unconditionally abort when a signal is
received.

Linux-commit: 1b2dad1459e480028a2714439048d8a634132857

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I124dccf78a3187d5f4a31c7b76db5369aaafc369
Reviewed-on: https://review.whamcloud.com/35350
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12477 lustre: remove obsolete config checks 85/37085/16
James Simmons [Sat, 8 Feb 2020 13:39:30 +0000 (08:39 -0500)]
LU-12477 lustre: remove obsolete config checks

Remove from the lustre kernel code all the support for kernels
earlier than the RHEL7 3.10+. This greatly simplifies the code
and makes build times much better.

Change-Id: If52091ac5249b2719b992032040ccf30cc5bf0e4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37085
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10235 mdt: mdt_create: check EEXIST without lock 80/30880/18
Dominique Martinet [Wed, 10 Jan 2018 13:08:06 +0000 (14:08 +0100)]
LU-10235 mdt: mdt_create: check EEXIST without lock

mkdir() currently gets a write lock on the parent even if the new
directory already exists.

This patch adds an initial lookup of the new directory without a DLM
lock so that other clients do not need to cancel their DLM lock if the
"new" directory already exists, but will continue as usual if directory
did not exist.

There is a small race window in which the child could be created by
another client after our check and before the parent is locked, but
this can be detected later during index insert.
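
The flow described above can be modelled in a few lines of userspace
C. This is a sketch under stated assumptions: struct dir_sketch,
lookup_sketch() and mkdir_sketch() are hypothetical, and the DLM write
lock is reduced to a counter so the saved lock traffic is visible:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DIR_SLOTS_SKETCH 8

/* Hypothetical parent directory: a name table plus a lock counter. */
struct dir_sketch {
	const char *names[DIR_SLOTS_SKETCH];
	int nr;
	int write_locks;	/* times the parent write lock was taken */
};

static int lookup_sketch(const struct dir_sketch *dir, const char *name)
{
	for (int i = 0; i < dir->nr; i++)
		if (strcmp(dir->names[i], name) == 0)
			return 1;
	return 0;
}

static int mkdir_sketch(struct dir_sketch *dir, const char *name)
{
	int rc = 0;

	assert(dir && name);

	/* Step 1: unlocked lookup; an existing name costs no lock. */
	if (lookup_sketch(dir, name))
		return -EEXIST;

	/* Step 2: take the parent write lock (modelled as a counter). */
	dir->write_locks++;

	/* Step 3: the insert rechecks, catching a racing creation. */
	if (lookup_sketch(dir, name))
		rc = -EEXIST;
	else if (dir->nr == DIR_SLOTS_SKETCH)
		rc = -ENOSPC;
	else
		dir->names[dir->nr++] = name;

	return rc;
}
```

Calling mkdir_sketch() twice with the same name takes the lock only
once; the second call returns -EEXIST from the unlocked lookup.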

Performance change on two haswell 16-core VMs with ib, mean values of
mpirun -n 8 ./mdtest -D -i 8 -I 1000

test environment | directory creation | tree creation
local, no patch  | 1725/s             | 769/s
local, patch     | 1821/s             | 788/s
remote, no patch | 1729/s             | 772/s
remote, patch    | 1687/s             | 787/s

The differences are of the order of the noise here, with all mkdirs
being effective.

If directories exist, some simple stress on four nodes shows intended
improvements:
clush -w vm[0-3] 'seq 0 10000 |
    xargs -P 7 -I{} sh -c "(({}%3==0)) &&
        mkdir /mnt/lustre/testdir/foo 2>/dev/null ||
        stat /mnt/lustre/testdir > /dev/null"'

with patch: 10s
without patch: 19s
(the difference grows exponentially with number of clients and hangs
with over 60 clients without the patch; exact time was not re-measured
with patch)

Updated sanityn.sh 43a 45a to avoid race conditions.

Add sanityn.sh test_43j to verify above scenario.

Test-Parameters: envdefinitions=SLOW=yes testlist=replay-vbr,replay-vbr
Change-Id: I37fc9c8ffc7ab334c0645042beda5bef01284564
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/30880
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-11597 tests: skip sanityn tests for PPC 61/37561/2
James Nunez [Thu, 13 Feb 2020 20:10:53 +0000 (13:10 -0700)]
LU-11597 tests: skip sanityn tests for PPC

Several sanityn test suite tests fail consistently when
testing PPC clients.  These tests should be skipped,
added to the ALWAYS_EXCEPT list, until the failures are
understood and fixed.

Tests to skip in sanityn are
16a (LU-11597)
71a (LU-11787)

Test-Parameters: trivial clientarch=ppc64 testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I39cc9d22e8a47eb8ef59ce8d30e1b6e9aa616a9a
Reviewed-on: https://review.whamcloud.com/37561
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-11269 ptlrpc: do not expose transient IDLE state 23/37523/4
Alex Zhuravlev [Mon, 10 Feb 2020 21:06:07 +0000 (00:06 +0300)]
LU-11269 ptlrpc: do not expose transient IDLE state

to avoid cases when anyone sending an RPC observes the connection
in this state while it's going to reconnect right away.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9ca89051c4176fe321262f8b2f52969c382e401e
Reviewed-on: https://review.whamcloud.com/37523
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13228 clio: mmap write when overquota 95/37495/4
Alexander Zarochentsev [Fri, 20 Dec 2019 23:19:44 +0000 (02:19 +0300)]
LU-13228 clio: mmap write when overquota

Flagging a client with the overquota flag should not
cause mmap write access to SIGBUS the application.

Cray-bug-id: LUS-8221
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: I29d5901fa5078b5cfca40391a02531cf27efce93
Reviewed-on: https://review.whamcloud.com/37495
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13131 osc: remove redundant osc_list() helper 79/37479/4
Andreas Dilger [Fri, 7 Feb 2020 22:01:49 +0000 (15:01 -0700)]
LU-13131 osc: remove redundant osc_list() helper

The osc_list() helper function is the same as list_empty_marker(),
and we don't need both.  Remove osc_list() from the code.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I07d2a519906f52fca8c95613a14ad7389a3ebbe5
Reviewed-on: https://review.whamcloud.com/37479
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9679 various: use list_splice and list_splice_init 57/37457/2
Mr NeilBrown [Wed, 13 Nov 2019 03:03:12 +0000 (14:03 +1100)]
LU-9679 various: use list_splice and list_splice_init

The construct
   list_add(to, from);
   list_del(from);
is equivalent to
   list_splice(from, to);
providing 'to' has been initialized.
Similarly with list_del_init and list_splice_init.
There is no need to check if list_empty(from) first.

Also, looping over a list and moving individual entries to
another list can more easily be done with list_splice.

These changes improve code clarity.
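
The equivalence can be checked with a minimal userspace
re-implementation of the list helpers. The functions below only mirror
the semantics of the kernel's <linux/list.h> for illustration; they
are a sketch, not the kernel code:

```c
#include <assert.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Insert entry right before head, i.e. at the tail. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	list_add(entry, head->prev);
}

/* Unlink entry from whatever list it is on. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Join the entries of list onto the front of head; list is left stale. */
static void list_splice(const struct list_head *list, struct list_head *head)
{
	assert(head->next && head->prev);	/* head must be initialized */
	if (!list_empty(list)) {
		struct list_head *first = list->next;
		struct list_head *last = list->prev;
		struct list_head *at = head->next;

		first->prev = head;
		head->next = first;
		last->next = at;
		at->prev = last;
	}
}
```

Building the same two-entry list twice and moving it once with
list_add() plus list_del() and once with list_splice() leaves the
destination heads with identical contents.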

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I710eb3bbd83c75e6c8f00b8d0a4c256ad28f9082
Reviewed-on: https://review.whamcloud.com/37457
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10391 lnet: discard lnet_sock_accept() 03/37303/4
Mr NeilBrown [Thu, 7 Nov 2019 04:02:54 +0000 (15:02 +1100)]
LU-10391 lnet: discard lnet_sock_accept()

There is no longer any important difference between
lnet_sock_accept(), and kernel_accept(..., O_NONBLOCK).
So remove lnet_sock_accept().

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Iad7c91abe2359758982e3740a21c91232c919aa0
Reviewed-on: https://review.whamcloud.com/37303
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10391 lnet: use data_ready callback to trigger accept() 02/37302/6
Mr NeilBrown [Wed, 22 Jan 2020 06:16:12 +0000 (17:16 +1100)]
LU-10391 lnet: use data_ready callback to trigger accept()

Rather than blocking in lnet_sock_accept(), set up a data_ready
callback, and use that to find out when to call lnet_sock_accept()
again.

This simplifies lnet_sock_accept() (which will be removed in
next patch), and means that we could listen on multiple
sockets, which will be useful for IPv6 support.

The code design is based on that in net/sunrpc/svcsock.c.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I3015f2f6b6d420af5c8454b6c1a99611b48e7702
Reviewed-on: https://review.whamcloud.com/37302
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9679 llite: Discard LUSTRE_FPRIVATE() 52/36652/11
Mr NeilBrown [Sun, 3 Nov 2019 23:09:25 +0000 (10:09 +1100)]
LU-9679 llite: Discard LUSTRE_FPRIVATE()

The LUSTRE_FPRIVATE() macro adds no value.
Instead of
  LUSTRE_FPRIVATE(file)
use
  file->private_data

which is shorter and more familiar, and widely used
elsewhere in lustre.

Also re-indent several functions where this was changed, to
use TABs.
Also join together some strings that were split across 2
lines.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I811aea8069b22beed15fd96d8c6bef8eca42defd
Reviewed-on: https://review.whamcloud.com/36652
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10467 ptlrpc: convert final users of LWI_TIMEOUT_INTERVAL 79/35979/11
Mr NeilBrown [Tue, 27 Aug 2019 22:47:09 +0000 (08:47 +1000)]
LU-10467 ptlrpc: convert final users of LWI_TIMEOUT_INTERVAL

LWI_TIMEOUT_INTERVAL causes l_wait_event to perform a slow
poll loop.  This is only needed if the event can happen without
triggering a wakeup on the wait-queue.

In this case, the event is a counter reaching zero, and we can
easily ensure a wakeup is sent whenever that counter becomes
zero.
So let's add those wake_ups, and change this to a simple
wait_event_idle_timeout().

At the same time, change all wake_up_all() calls on this wait queue to
simple wake_up().  wake_up_all() is only needed where there are
exclusive waiters, and this queue has no exclusive waiters.

Change-Id: I2bea069150f21b725025bacc7a4fa0cf4d95ab20
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/35979
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10467 lustre: use l_wait_event_abortable where appropriate. 75/35975/8
Mr NeilBrown [Mon, 26 Aug 2019 05:59:07 +0000 (15:59 +1000)]
LU-10467 lustre: use l_wait_event_abortable where appropriate.

If the lwi passed to l_wait_event() was created with

    lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);

the effect is to wait with no timeout and blocking any
non-fatal signals.
For this, we now have l_wait_event_abortable(), or for one
case l_wait_event_abortable_exclusive();
So use those.

l_wait_event_abortable() will return -ERESTARTSYS if a signal was
received, while l_wait_event() returns -EINTR.  We need to be
careful to handle this difference.
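
One possible way to handle the difference, sketched in userspace C.
wait_rc_to_eintr_sketch() is a hypothetical helper and is not claimed
to be what the patch does at each call site; ERESTARTSYS is
kernel-internal, so its Linux value is defined locally here:

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal code (512 in Linux), absent from userspace errno.h. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/*
 * Hypothetical helper: callers that used to see -EINTR from
 * l_wait_event() keep seeing it after the conversion.
 */
static int wait_rc_to_eintr_sketch(int rc)
{
	assert(rc <= 0);
	return rc == -ERESTARTSYS ? -EINTR : rc;
}
```

Callers that compare the result against -EINTR are the ones that need
attention; all other return values pass through unchanged.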

Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Iadf0fab92fcfd46802766198dcbe6b6b349214fa
Reviewed-on: https://review.whamcloud.com/35975
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12988 ldiskfs: revert prefetch patch 19/37619/2
Alex Zhuravlev [Wed, 19 Feb 2020 08:12:31 +0000 (11:12 +0300)]
LU-12988 ldiskfs: revert prefetch patch

as a problem leading to IO errors was found.
also, the patch for 4.18 kernel needs fixes.

Revert "LU-12988 ldiskfs: mballoc to prefetch groups"

This reverts commit 05f31782be20fc4c46082dba02c10bcea59539e3.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I478a011e561633516524697f3a4aa03734791790
Reviewed-on: https://review.whamcloud.com/37619
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13166 osd-ldiskfs: fix to allow to get system inode 21/37421/4
Wang Shilong [Wed, 5 Feb 2020 12:33:38 +0000 (20:33 +0800)]
LU-13166 osd-ldiskfs: fix to allow to get system inode

Lustre needs to load the ldiskfs system inode for quota accounting
purposes, so pass the LDISKFS_IGET_SPECIAL flag to ldiskfs_iget();
otherwise, support for CentOS 8 quota will be broken.

Fixes: 8ab3aa50a14 ("LU-12355 ldiskfs: Added ext4_iget_flags to ext4_iget")
Change-Id: I3a30ec540444b149bc3398a62951d2826eb7b9ce
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37421
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9859 libcfs: move tracefile locking from linux-tracefile.c 08/37408/4
NeilBrown [Tue, 4 Feb 2020 02:28:36 +0000 (21:28 -0500)]
LU-9859 libcfs: move tracefile locking from linux-tracefile.c
 to tracefile.c

There is no value in keeping it separate.

Linux-commit: 49209c598d93289ca077575615e98f242b1d8156

Test-Parameters: trivial

Change-Id: I24ee7545a40fd6d2ac15018f089d51142736fa27
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/37408
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12747 tests: wait properly for orphan thread stop 95/37395/2
Andreas Dilger [Sat, 1 Feb 2020 00:55:23 +0000 (17:55 -0700)]
LU-12747 tests: wait properly for orphan thread stop

Use wait_update_facet() to check if the MDD orphan cleanup thread has
exited, rather than a fixed 5s timeout.  We can hope that most cases
will finish faster than 5s, but don't gratuitously fail if it takes
somewhat longer.  We clearly aren't having a fatal problem here, or
there would be serious failures at cleanup time.

Fixes: fffef5c29e3b ("LU-11418 mdd: delete name if orphan doesn't exist")
Test-Parameters: trivial testlist=sanity envdefinitions=ONLY=811,ONLY_REPEAT=100
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I16b0281a519d47b5b98d495bf17040153c3ebbe5
Reviewed-on: https://review.whamcloud.com/37395
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13169 tests: add ONLY_REPEAT parameter to repeat subtests 21/37321/8
Andreas Dilger [Fri, 24 Jan 2020 09:20:38 +0000 (02:20 -0700)]
LU-13169 tests: add ONLY_REPEAT parameter to repeat subtests

Add the ONLY_REPEAT environment variable, to allow tests specified
by ONLY to be run multiple times, to ensure that the test is passing
consistently (or fixing an intermittent bug).  This is faster than
restarting the test session multiple times for only a few subtests.

Have the iteration around the subshell started for run_one() so that
any registered stack_trap EXIT calls are triggered between iterations,
the fail_loc is reset, grant/health/error checks are done, and so on.

Remove $tdir and $tfile files after each iteration to avoid failures
with the subsequent subtest runs.  For tests that do not follow the
standard naming convention for test directories and files, they need
to be updated to use $tdir and $tfile, which is good in any case.

YAML output splits each iteration into a separate subtest for Maloo.
The output from run_one() is appended to a single output file for all
iterations so all output is captured instead of just the last one.

The iterations will continue until $ONLY_REPEAT loops pass, or until
the subtest hits an error.  Trying to continue for all iterations in
the face of errors would likely end up with all of later iterations
failing also due to leftover state from the previous failure, and the
goal is for the subtests to pass consistently.  If we are trying to
determine rates of intermittent failures, this can be estimated as
1/num_passes, which is about the same as num_failures/ONLY_REPEAT
iterations.
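
The stop-at-first-failure loop can be modelled in C as below. The real
logic lives in the shell test framework; repeat_subtest_sketch() and
fails_on_fourth_sketch() are hypothetical names used only to
illustrate the control flow:

```c
#include <assert.h>

/* A subtest returns 0 on success and non-zero on failure. */
typedef int (*subtest_fn_sketch)(int iteration);

/* Returns how many iterations passed; stops at the first failure. */
static int repeat_subtest_sketch(subtest_fn_sketch run, int only_repeat)
{
	int passes = 0;

	assert(run && only_repeat > 0);
	for (int i = 0; i < only_repeat; i++) {
		if (run(i) != 0)
			break;
		passes++;
	}
	return passes;
}

/* Hypothetical subtest that fails on its fourth run. */
static int fails_on_fourth_sketch(int iteration)
{
	return iteration == 3 ? -1 : 0;
}
```

With a limit of 100 iterations and a subtest that fails on its fourth
run, the loop reports three passes and never runs the remaining
iterations.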

Rename variables in subtests to avoid clash with testnum, testname,
and TESTNAME, and use them consistently in functions and subtests.

Test-Parameters: testlist=sanity envdefinitions=ONLY=27l,ONLY_REPEAT=100
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Change-Id: I5449590dc3e25c113b059974fb7b96c892434380
Reviewed-on: https://review.whamcloud.com/37321
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Charlie Olmstead <charlie@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12637 kernel: RHEL 8.1 server support 68/36968/19
Jian Yu [Tue, 4 Feb 2020 00:20:35 +0000 (16:20 -0800)]
LU-12637 kernel: RHEL 8.1 server support

This patch makes changes to support RHEL 8.1 release with
kernel 4.18.0-147.3.1.el8_1 for Lustre server.

Test-Parameters: trivial \
envdefinitions=SANITY_EXCEPT="411 812b" \
clientdistro=el8.1 serverdistro=el8.1 \
testlist=sanity

Change-Id: Iee6ae3dc20c62caaac1d740b14c5877ff7bfb4d5
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36968
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13178 build: Update ZFS version to 0.8.3 73/37373/6
Nathaniel Clark [Thu, 30 Jan 2020 18:18:09 +0000 (10:18 -0800)]
LU-13178 build: Update ZFS version to 0.8.3

New Features 0.8.0

* Native encryption #5769 - The encryption property enables the
creation of encrypted filesystems and volumes. The aes-256-ccm
algorithm is used by default. Per-dataset keys are managed with zfs
load-key and associated subcommands.

* Raw encrypted 'zfs send/receive' #5769 - The zfs send -w option
allows an encrypted dataset to be sent and received to another pool
without decryption. The received dataset is protected by the original
user key from the sending side. This allows datasets to be efficiently
backed up to an untrusted system without fear of the data being
compromised.

* Device removal #6900 - This feature allows single and mirrored
top-level devices to be removed from the storage pool with zpool
remove. All data is copied in the background to the remaining
top-level devices and the pool capacity is reduced accordingly.

* Pool checkpoints #7570 - The zpool checkpoint subcommand allows
you to preserve the entire state of a pool and optionally revert back
to that exact state. It can be thought of as a pool wide snapshot.
This is useful when performing complex administrative actions which
are otherwise irreversible (e.g. enabling a new feature flag,
destroying a dataset, etc).

* Pool TRIM #8419 - The zpool trim subcommand provides a way to
notify the underlying devices which sectors are no longer allocated.
This allows an SSD to more efficiently manage itself and helps prevent
performance from degrading. Continuous background trimming can be
enabled via the new autotrim pool property.

* Pool initialization #8230 - The zpool initialize subcommand writes
a pattern to all the unallocated space. This eliminates the first
access performance penalty, which may exist on some virtualized
storage (e.g. VMware VMDKs).

* Project accounting and quota #6290 - This feature adds project
based usage accounting and quota enforcement to the existing space
accounting and quota functionality. Project quotas add an additional
dimension to traditional user/group quotas. The zfs project and zfs
projectspace subcommands have been added to manage projects, set quota
limits and report on usage.

* Channel programs #6558 - The zpool program subcommand can be used
to perform compound ZFS administrative actions via Lua scripts in a
sandboxed environment (with time and memory limits).

* Pyzfs #7230 - The new pyzfs library is intended to provide a
stable interface for the programmatic administration of ZFS. This
wrapper provides a one-to-one mapping for the libzfs_core API
functions, but the signatures and types are more natural to Python.

* Python 3 compatibility #8096 - The arcstat, arcsummary, and
dbufstat utilities have been updated to be compatible with Python 3.

* Direct IO #7823 - Adds support for Linux's direct IO interface.

Performance

* Sequential scrub and resilver #6256 - When scrubbing or
resilvering a pool the process has been split into two phases. The
first phase scans the pool metadata in order to determine where the
data blocks are stored on disk. This allows the second phase to issue
scrub I/O as sequentially as possible, greatly improving performance.

* Allocation classes #5182 - Allows a pool to include a small number
of high-performance SSD devices that are dedicated to storing specific
types of frequently accessed blocks (e.g. metadata, DDT data, or small
file blocks). A pool can opt-in to this feature by adding a special or
dedup top-level device.

* Administrative commands #7668 - Improved performance due to
targeted caching of the metadata required for administrative commands
like zfs list and zfs get.

* Parallel allocation #7682 - The allocation process has been
parallelized by creating multiple "allocators" per-metaslab group.
This results in improved allocation performance on high-end systems.

* Deferred resilvers #7732 - This feature allows new resilvers to be
postponed if an existing one is already in progress. By waiting for
the running resilver to complete redundancy is restored as quickly as
possible.

* ZFS Intent Log (ZIL) #6566 - New log blocks are created and issued
while there are still outstanding blocks being serviced by the
storage, effectively reducing the overall latency observed by the
application.

* Volumes #8615 - When a pool contains a large number of volumes
they are more promptly registered with the system and made available
for use after a zpool import.

* QAT #7295 #7282 #6767 - Support for accelerated SHA256 checksums,
AES-GCM encryption, and the new QAT Intel(R) C62x Chipset / Atom(R)
C3000 Processor Product Family SoC.

Changes in Behavior

* Relaxed (ref)reservation constraints on volumes, they may now be
set larger than the volume size.

* The arcstat.py, arc_summary.py, and dbufstat.py commands have been
renamed arcstat, arc_summary, and dbufstat respectively.

* The SPL source is now included in the ZFS repository removing the
need for separate packages.

* The dedupditto pool property and zfs send -D option have been
deprecated and will be removed in a future release.

Changes for 0.8.1

* Fix comparison signedness in arc_is_overflowing() #8873
* Fix incorrect error message for raw receive #8863
* arc_summary: prefer python3 version and install when there is no
python #8851
* Fix %post and %postun generation in kmodtool #8866
* Reinstate raw receive check when truncating #8852 #8857
* If $ZFS_BOOTFS contains guid, replace the guid portion with $pool
* Fix integer overflow of ZTOI(zp)->i_generation #8858
* hkdf_test binary should only have one icp instance #8850
* Fixed a small typo in man/man1/raidz_test.1 #8855
* Allow TRIM_UNUSED_KSYM when build as a builtin-module #8820
* Make Python detection optional and more portable #8809 #8731
* Wait in 'S' state when send/recv pipe is blocking #8733 #8752
* Make zfs_async_block_max_blocks handle zero correctly #8829 #8289
* Revert "Report holes when there are only metadata changes" #8816
* Exclude log device ashift from normal class #8735
* Fix integer overflow in get_next_chunk() #8778 #8797
* Double-free of encryption wrapping key due to invalid pool
properties #8791
* Endless loop in zpool_do_remove() on platforms with unsigned char
* Fix embedded bp accounting in count_block() #8800 #8766
* Disable parallel processing for 'zfs mount -l' #8762 #8811
* Linux 5.2 compat: Directly call wait_on_page_bit() #8794
* Linux 5.2 compat: Fix config/kernel-shrink.m4 test failure #8776
* Linux 5.2 compat: Remove config/kernel-set-fs-pwd.m4 #8777
* zpool: status -t is not documented in help message #8782
* VERIFY3P() message is missing a space character #8786
columns #8785
* zfs: don't pretty-print objsetid property #8784
* zfs: missing newline character in zfs_do_channel_program() error
message #8783
* Fix ksh-path for random_readwrite_fixed.ksh #8779
* Linux 2.6.39 compat: Test if kstrtoul() exists #8760 #8761
* Device removal panics on 32-bit systems #8790
* zpool: trim -p is not a valid option #8781
* Fix coverity defects: CID 186143 #8788
* Fix kstat state update during pool transition #8746
* Linux 5.2 compat: rw_tryupgrade() #8730

Changes for 0.8.2

* Disabled resilver_defer feature leads to looping resilvers #9299
  #9338
* Fix dsl_scan_ds_clone_swapped logic #9140 #9163
* Scrubbing root pools may deadlock on kernels without
  elevator_change() (#9321) #9321
* QAT related bug fixes #9276 #9303
* kmodtool: depmod path #8724 #9310
* Fix /etc/hostid on root pool deadlock #9256 #9285
* BuildRequires libtirpc-devel needed for RHEL 8 #9289
* Fix zpool subcommands error message with some unsupported options
  #9270
* Fix zfs-dkms .deb package warning in prerm script #9271
* zvol_wait script should ignore partially received zvols #9260
* New service that waits on zvol links to be created #8975
* Always refuse receiving non-resume stream when resume state exists
  #9252
* Fix Intel QAT / ZFS compatibility on v4.7.1+ kernels #9268 #9269
* etc/init.d/zfs-functions.in: remove arch warning
* zfs_handle used after being closed/freed in change_one callback
  #9165
* Fix zil replay panic when TX_REMOVE followed by TX_CREATE #7151
  #8910 #9123 #9145
* zfs_ioc_snapshot: check user-prop permissions on snapshotted
  datasets #9179 #9180
* Fix Plymouth passphrase prompt in initramfs script #9202
* Fix deadlock in 'zfs rollback' #9203
* Make slog test setup more robust #9194
* zfs-mount-generator: dependencies should be space-separated #9174
* Linux 5.3: Fix switch() fall through compiler errors #9170
* Linux 5.3 compat: Makefile subdir-m no longer supported #9169
* Fix out-of-order ZIL txtype lost on hardlinked files #8769 #9061
* Increase default zcmd allocation to 256K #9084
* Improve performance by using dmu_tx_hold_*_by_dnode() #9081
* Fix channel programs on s390x #8992 #9080
* Race between zfs-share and zfs-mount services #9083
* Implement secpolicy_vnode_setid_retain() #9035 #9043
* zed crashes when devid not present #9054 #9060
* Don't directly cast unsigned long to void* #9065
* Fix module_param() type for zfs_read_chunk_size #9051
* Move some tests to cli_user/zpool_status #9057
* Race condition between spa async threads and export #9015 #9044
* hdr_recl calls zthr_wakeup() on destroyed zthr #9047
* Fix wrong comment on zcr_blksz_{min,max} #9052
* Retire unused spl_{mutex,rwlock}_{init_fini} #9029
* Linux 5.3 compat: retire rw_tryupgrade() #9029
* Linux 5.3 compat: rw_semaphore owner #9029
* Fix lockdep recursive locking false positive in dbuf_destroy #8984
* Add missing __GFP_HIGHMEM flag to vmalloc #9031
* Use zfsctl_snapshot_hold() wrapper #9039
* Minor style cleanup #9030
* Fix get_special_prop() build failure #9020
* systemd encryption key support #8750 #8848
* Drop redundant POSIX ACL check in zpl_init_acl() #9009
* Export dnode symbols #9027
* Ensure dsl_destroy_head() decrypts objsets #9021
* Disable unused pathname::pn_path* (unneeded in Linux) #9025
* Fixes: #8934 Large kmem_alloc #8934 #9011
* Fix ZTS killed processes detection #9003
* pkg-utils python sitelib for SLES15 #8969
* Fix race in parallel mount's thread dispatching algorithm #8450
  #8833 #8878
* Fix dracut Debian/Ubuntu packaging #8990 #8991
* Remove VERIFY from dsl_dataset_crypt_stats() #8976
* Improve "Unable to automount" error message. #8959
* Check b_freeze_cksum under ZFS_DEBUG_MODIFY conditional #8979
* Fix error text for EINVAL in zfs_receive_one() #8977
* Don't use d_path() for automount mount point for chroot'd process
  #8903 #8966
* nopwrites on dmu_sync-ed blocks can result in a panic #8957
* Avoid extra taskq_dispatch() calls by DMU #8909
* -Y option for zdb is valid #8926
* Fix error message on promoting encrypted dataset #8905 #8935
* Fix out-of-tree build failures #8921 #8943
* dn_struct_rwlock can not be held in dmu_tx_try_assign() #8929
* Remove arch and relax version dependency #8914
* Add libnvpair to libzfs pkg-config #8919
* Let zfs mount all tolerate in-progress mounts #8881
* zstreamdump: add per-record-type counters and an overhead counter
  #8432
* Fix comments on zfs_bookmark_phys #8945
* Add SCSI_PASSTHROUGH to zvols to enable UNMAP support #8933
* Prevent pointer to an out-of-scope local variable #8924 #8940
* dedup=verify doesn't clear the blkptr's dedup flag #8936
* Update vdev_ops_t from illumos #8925
* Allow unencrypted children of encrypted datasets #8737 #8870
* Replace whereis with type in zfs-lib.sh #8920 #8938
* Use ZFS_DEV macro instead of literals #8912
* Fix memory leak in check_disk() #8897 #8911
* kmod-zfs-devel rpm should provide kmod-spl-devel #8930
* ZTS: Fix mmp_interval failure #8906
* Minimize aggsum_compare(&arc_size, arc_c) calls. #8901
* Python config cleanup #8895
* lz4_decompress_abd declared but not defined #8894
* panic in removal_remap test on 4K devices #8893
* compress metadata in later sync passes #8892
* Move write aggregation memory copy out of vq_lock #8890
* Restrict filesystem creation if name referred either '.' or '..'
  #8842 #8564
* ztest: dmu_tx_assign() gets ENOSPC in spa_vdev_remove_thread()
  #8889
* Fix lockdep warning on insmod #8868 #8884
* fat zap should prefetch when iterating #8862
* Target ARC size can get reduced to arc_c_min #8864
* Fix typo in vdev_raidz_math.c #8875 #8880
* Improve ZTS block_device_wait debugging #8839
* Block_device_wait does not return an error code #8839
* Remove redundant redundant remove #8839
* Fix logic error in setpartition function #8839
* Allow metaslab to be unloaded even when not freed from #8837
* Avoid updating zfs_gitrev.h when rev is unchanged #8860
* l2arc_apply_transforms: Fix typo in comment #8822
* Reduced IOPS when all vdevs are in the
  zfs_mg_fragmentation_threshold #8859
* Drop objid argument in zfs_znode_alloc() (sync with OpenZFS) #8841
* Remove vn_set_fs_pwd()/vn_set_pwd() (no need to be at / during
  insmod) #8826
* grammar: it is / plural agreement #8818
* Refactor parent dataset handling in libzfs zfs_rename() #8815
* Update comments to match code #8759
* Update descriptions for vnops #8767
* Drop local definition of MOUNT_BUSY #8765
* kernel timer API rework #8647

Changes for 0.8.3

* Fix zfs-0.8.3 "qat.h"
* Prevent unnecessary resilver restarts #9155 #9378 #9551 #9588
* Fix QAT allocation failure return value #9784 #9788
* Fix zfs-0.8.3 zfs_receive_raw test case
* zdb: print block checksums with 6 d's of verbosity
* zfs-load-key.sh: ${ZFS} is not the zfs binary #9780
* Avoid some crashes when importing a pool with corrupt metadata #9022
* In initramfs, do not prompt if keylocation is "file://" #9764
* libspl: declare aok extern in header #9752
* Cancel initialize and TRIM before vdev_metaslab_fini() #8602 #9751
* Update maximum kernel version to 5.4 #9754 #9759
* Fix for ARC sysctls ignored at runtime
* cppcheck: (warning) Possible null pointer dereference: nvh #9732
* cppcheck: (error) Address of local auto-variable assigned #9732
* cppcheck: (error) Null pointer dereference: who_perm #9732
* cppcheck: (warning) Possible null pointer dereference: dnp #9732
* cppcheck: (error) Memory leak: vtoc #9732
* cppcheck: (error) Shifting signed 64-bit value by 63 bits #9732
* cppcheck: (error) Uninitialized variable #9732
* Exchanged two "${ZFS} get -H -o value" commands #9736
* Create symbolic links in /dev/disk/by-vdev for nvme disk devices
  #9730
* Force systems with kernel option "quiet" to display prompt for
  password #9731
* initramfs: setup keymapping and video for prompts #9723
* Don't fail to apply umask for O_TMPFILE files #8997 #8998
* Allow empty ds_props_obj to be destroyed #9704
* Fix use-after-free of vd_path in spa_vdev_remove() #9706
* zio_decompress_data always ASSERTs successful decompression #9612
  #9630
* Exclude data from cores unconditionally and metadata conditionally
  #9691
* Set send_realloc_files.ksh to use properties.shlib #9679
* Fix reporting of L2ARC hits/misses in arc_summary3 #9669
* Fix zdb_read_block using zio after it is destroyed #9644 #9657
* Fix use-after-free in case of L2ARC prefetch failure #9648
* Increase allowed 'special_small_blocks' maximum value #9131 #9355
* Adapt gitignore for modules #9656
* Fix encryption logic in systemd mount generator #9611
* Fix non-absolute path in systemd mount generator #9611
* Fix small typo in systemd mount generator #9611
* Implement -A (ignore ASSERTs) for zdb #9610
* Remove zfs_vdev_elevator module option #9417 #9609
* Add display of checksums to zdb -R #9607
* Check for unlinked znodes after igrab() #9602
* Remove requirement for -d 1 for zfs list and zfs get with bookmarks
  #9589
* Break out of zfs_zget early if unlinked znode #9583
* Remove inappropriate error message suggesting to use '-r' #9574
* Change zed.service to zfs-zed.service in man page #9581
* Prevent NULL pointer dereference in blkg_tryget() on EL8 kernels
  #9546 #9577
* Add missing documentation for some KMC flags #9034
* Fix zpool create -o <property> error message #9550 #9568
* Improve logging of 128KB writes #9409
* Skip loading already loaded key #9495 #9529
* Add a notice in /etc/defaults/zfs for systemd users #9544
* Include prototypes for vdev_initialize #9535
* dracut/zfs-load-key.sh: properly remove prefixes #9520
* Fix contrib/zcp/Makefile.am #9527
* Fix 'zfs change-key' with unencrypted child #9524
* Fix zpool history unbounded memory usage #9516
* Fix incremental recursive encrypted receive #9494
* Use correct format string when printing int8 #9486
* Name anonymous enum of KMC_BIT constants #9478
* Update skc_obj_alloc for spl kmem caches that are backed by Linux
  #9474
* Modify sharenfs=on default behavior #9397 #9425
* Implement ZPOOL_IMPORT_UDEV_TIMEOUT_MS #9436
* Clarify loop variable name in zfs copies test #9445
* Fix pool creation with feature@allocation_classes disabled #9427
  #9429
* Update zfs program command usage #9056 #9428
* Fix automount for root filesystems #9381 #9384
* Rename rangelock_ functions to zfs_rangelock_ #9402
* Workaround to avoid a race when /var/lib is a persistent dataset
  #9360
* Fix for zfs-dracut regression #8913 #9379
* Perform KABI checks in parallel #8547 #9132 #9341
* SIMD: Use alloc_pages_node to force alignment #9608 #9674
* Linux 5.0 compat: SIMD compatibility
* Add warning for zfs_vdev_elevator option removal #9317
* diff_cb() does not handle large dnodes #7678 #8931 #9343
* Use signed types to prevent subtraction overflow #9355
* Refactor libzfs_error_init newlines #9330
* Device removal of indirect vdev panics the kernel #9327
* Fix clone handling with encryption roots #9267 #9294
* Canonicalize Python shebangs #9314
* Fix stalled txg with repeated noop scans #9300
* Clean up do_vol_test in zfs_copies tests #9286
* Fix noop receive of raw send stream #9221 #9173
* Clean up zfs_clone_010_pos #9284
* Refactor checksum operations in tests #9280
* Use the right booleans #9264
* Fix panic on DilOS with kstat per dataset statistics #9254 #9151
* maxinflight can overflow in spa_load_verify_cb() #9272
* Fix typos #9251
* Fix typos in module/zfs/ #9240
* Fix typos in lib/ #9237
* Fix typos in module/ #9241
* Fix typos in modules/icp/ #9239
* Fix typos in include/ #9238
* Fix typos in etc/ #9236
* Fix typos in contrib/ #9235
* Fix typos in cmd/ #9234
* Fix typos in man/ #9233
* Fix typos in config/ #9232
* Fix refquota_007_neg.ksh #9257
* Prevent metaslab_sync panic due to spa_final_dirty_txg #9185 #9186
  #9231 #9253
* Simplify deleting partitions in libtest #9224
* Use compatible arg order in tests #9228
* Use smaller default slack/delta value for schedule_hrtimeout_range()
  #9217
* Prefer for(;;) to while (TRUE) #9219
* Add regression test for "zpool list -p" #9134
* Split argument list, satisfy shellcheck SC2086 #9212
* Fix install error introduced by #9089
* Document ZFS_DKMS_ENABLE_DEBUGINFO in userland configuration #9191
* Dedup IOC enum values in libzfs_input_check #9188
* Enhance ioctl number checks #9187
* Minor cleanup in Makefile.am #9189
* zfs-functions.in: in_mtab() always returns 1 #9168
* Fix lockdep circular locking false positive involving sa_lock #9110
* Set "none" scheduler if available (initramfs) #9042
* Add more refquota tests #9139
* initramfs: fixes for (debian) initramfs #7904 #9089
* dmu_tx_wait() hang likely due to cv_signal() in
  dsl_pool_dirty_delta() #9137
* Improve write performance by using dmu_read_by_dnode() #9156
* Assert that a dnode's bonuslen never exceeds its recorded size #8348
* Make txg_wait_synced conditional in zfsvfs_teardown #9115
* Prevent race in blkptr_verify against device removal #9112
* Fix device expansion when VM is powered off #9111
* spa_load_verify() may consume too much memory #9146
* Change boolean-like uint8_t fields in znode_t to boolean_t #9092
* Drop KMC_NOEMERGENCY #9119
* Don't wakeup unnecessarily in 'zpool events -f' #9091
* Test cancelling a removal in ZTS #9101
* lockdep false positive - move txg_kick() outside of ->dp_lock #9094
* Add channel program for property based snapshots #8443 #9050
* install path fixes #9087
* Don't activate metaslabs with weight 0 #8968
* OpenZFS 9318 - vol_volsize_to_reservation does not account for raidz
  skip blocks #8973
* Concurrent small allocation defeats large allocation #8843
* Fix bp_embedded_type enum definition #8951
* OpenZFS 9425 - channel programs can be interrupted #8904
* looping in metaslab_block_picker impacts performance on fragmented
  pools #8877
* single-chunk scatter ABDs can be treated as linear #8580
* make zil max block size tunable #8865

Test-Parameters: testlist=sanity,sanity-hsm,sanityn,sanity-quota fstype=zfs ostcount=2 mdscount=2
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I28e9cd70e56a2d73fc9b8347a9ddfe28e0a85090
Reviewed-on: https://review.whamcloud.com/37373
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9855 lustre: replace LPROCFS_CLIMP_CHECK() 56/36956/6
Mr NeilBrown [Mon, 9 Dec 2019 05:33:46 +0000 (16:33 +1100)]
LU-9855 lustre: replace LPROCFS_CLIMP_CHECK()

The usage pattern for LPROCFS_CLIMP_CHECK() is clumsy.
It must be paired with LPROCFS_CLIMP_EXIT(), but
not doing this does not produce a compile-time error.
The 'import' should not be dereferenced before the CHECK, or
used after the EXIT, but sometimes it is.

Replace it with a structured statement macro:

 with_obd_imp_lock(obd, imp, rc) {
     statements;
 }

statements are protected by the semaphore and only run if imp can be
set to a non-NULL pointer.
rc can be changed by the statements, and should be returned
afterwards as it may have been set to -ENODEV.
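The shape of such a scoped statement macro can be sketched in user space. This is only an illustration of the pattern, not the Lustre definition: with_imp_lock(), struct device, dev_lock() and dev_unlock() are hypothetical names, and a plain counter stands in for the semaphore.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the import and the object that owns it. */
struct import { int state; };
struct device {
	int lock_held;		/* counter standing in for the semaphore */
	struct import *imp;	/* may be NULL */
};

static void dev_lock(struct device *d)   { d->lock_held++; }
static void dev_unlock(struct device *d) { d->lock_held--; }

/*
 * Run the following statement at most once, with the lock held, and only
 * if the import pointer is non-NULL.  Otherwise set rc to -ENODEV.
 * The lock is dropped when the loop condition is evaluated and fails,
 * so the body must not leave with break, goto or return.
 */
#define with_imp_lock(d, imp_var, rc)					\
	for (int once_ = (dev_lock(d),					\
			  ((imp_var) = (d)->imp) != NULL ?		\
			  1 : ((rc) = -ENODEV, 0));			\
	     once_ || (dev_unlock(d), 0);				\
	     once_ = 0)

/* Example caller: rc is returned afterwards, as the text describes. */
static int read_state(struct device *d, int *out)
{
	struct import *imp;
	int rc = 0;

	with_imp_lock(d, imp, rc) {
		*out = imp->state;
	}
	return rc;
}
```

Pairing the lock and unlock inside one macro means a caller cannot forget the second half, which is the compile-time safety the old CHECK/EXIT pair lacked.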

Errors fixed with this patch:
- some code tested u.cli.cl_import for non-NULL even after
  LPROCFS_CLIMP_CHECK()
- some code dereferences cl_import before calling
  LPROCFS_CLIMP_CHECK()
- short_io_bytes_store() and max_procs_in_flight_store() don't access
  the import, so don't need LPROCFS_CLIMP_CHECK
- lprocfs_import_seq_write() set count to an error before 'goto out'
  which would free memory of length "count+1".
- lprocfs_import_seq_write() also called ptlrpc_recover_import()
  on the imp *after* dropping the semaphore.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If9d5eb452157d7f76796f690569ef13fec111d76
Reviewed-on: https://review.whamcloud.com/36956
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12678 lnet: socklnd: mark all ksock_proto struct 'const'. 94/36894/7
Mr NeilBrown [Wed, 13 Nov 2019 01:30:24 +0000 (12:30 +1100)]
LU-12678 lnet: socklnd: mark all ksock_proto struct 'const'.

These structs are always read-only, so tell the compiler.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Icc7c3209135a2ab0d04a822b7053231fd2d9ff0c
Reviewed-on: https://review.whamcloud.com/36894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13004 osp: use KIOV in osp_prep_update_req 28/36828/8
Mr NeilBrown [Tue, 28 Jan 2020 13:43:49 +0000 (08:43 -0500)]
LU-13004 osp: use KIOV in osp_prep_update_req

Convert osp_prep_update_req to use a BULK_BUF_KIOV
rather than a BULK_BUF_KVEC descriptor.

This is a step towards removing KVEC support.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2fdf84d73ba2d34c678b6eb6a8bbd323a761dfe4
Reviewed-on: https://review.whamcloud.com/36828
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13004 target: use KIOV for out_handle 26/36826/6
Mr NeilBrown [Sat, 18 Jan 2020 13:51:51 +0000 (08:51 -0500)]
LU-13004 target: use KIOV for out_handle

Convert out_handle() to use a BULK_BUF_KIOV rather than
a BULK_BUF_KVEC.

This is a step towards removing KVEC support and standardizing
on KIOV.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I3f5b1b06183a716ba57d6f7f2a28bf5aa0f76dfe
Reviewed-on: https://review.whamcloud.com/36826
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13004 osp: break read request into pages. 25/36825/8
Mr NeilBrown [Tue, 28 Jan 2020 13:46:51 +0000 (08:46 -0500)]
LU-13004 osp: break read request into pages.

Rather than breaking up a read request into arbitrarily
sized (4K) pieces of memory in virtual address space,
break it up into pages (which might be 64K) and
use a kiov rather than kvec to manage them.
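How many pieces a request breaks into depends on the page size, which a small user-space sketch can make concrete. pages_spanned() is a hypothetical helper written for this illustration; it is not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of pages touched by a buffer of 'len' bytes that starts
 * 'offset' bytes into the address space, for a given page size.
 */
static size_t pages_spanned(size_t offset, size_t len, size_t page_size)
{
	assert(page_size > 0);
	if (len == 0)
		return 0;
	/* Offset within the first page, plus length, rounded up to pages. */
	return (offset % page_size + len + page_size - 1) / page_size;
}
```

For example, an aligned 64K read needs 16 entries with 4K pages but a single entry with 64K pages.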

This is a step towards removing kvec support and
standardizing on kiov.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If688764c53066a9c4db212682085fa899d4dde1b
Reviewed-on: https://review.whamcloud.com/36825
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12930 various: use schedule_timeout_*interruptible 56/36656/6
Mr NeilBrown [Mon, 4 Nov 2019 01:05:32 +0000 (12:05 +1100)]
LU-12930 various: use schedule_timeout_*interruptible

The construct:

  set_current_state(TASK_UNINTERRUPTIBLE);
  schedule_timeout(time);

Is more clearly expressed as

  schedule_timeout_uninterruptible(time);

And similarly with TASK_INTERRUPTIBLE /
schedule_timeout_interruptible()

Establishing this practice makes it harder to forget to call
set_current_state() as has happened a couple of times - in
lnet_peer_discovery and mdd_changelog_fini().

Also, there is no need to set_current_state(TASK_RUNNING) after
calling schedule*().  That state is guaranteed to have been set.

In mdd_changelog_fini() there was an attempt to sleep for
10 microseconds.  This will always round up to 1 jiffy, so
just make it schedule_timeout_uninterruptible(1).
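The rounding can be checked with a short user-space sketch. usecs_to_ticks() is a hypothetical helper that rounds a delay up to whole ticks; it mirrors the idea only and is not the kernel's conversion routine. The tick rates used in the example are assumptions.

```c
#include <assert.h>

/*
 * Round a delay in microseconds up to whole timer ticks, for a tick
 * rate of 'hz' ticks per second.
 */
static unsigned long usecs_to_ticks(unsigned long usecs, unsigned long hz)
{
	unsigned long usecs_per_tick;

	assert(hz > 0 && hz <= 1000000UL);
	usecs_per_tick = 1000000UL / hz;
	return (usecs + usecs_per_tick - 1) / usecs_per_tick;
}
```

With any tick rate up to 100000 ticks per second, a 10 microsecond delay is shorter than one tick and so rounds up to exactly one.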

Finally a few places where the number of seconds was multiplied
by 1, have had the '1 *' removed.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I01b37039de0bf7e07480de372c1a4cfe78a8cdd8
Reviewed-on: https://review.whamcloud.com/36656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12911 llite: Don't access lov_md fields before size check 89/36589/9
Mr NeilBrown [Mon, 28 Oct 2019 01:24:26 +0000 (12:24 +1100)]
LU-12911 llite: Don't access lov_md fields before size check

When 'struct lov_user_md' is passed in via setxattr, it comes with
a size.  If that size is too small, some functions that check exactly
what version is present might access beyond the end of the allocated
memory, which can have undesirable effects, such as triggering
a KASAN warning (and possibly worse).

So check that the size is sane before looking inside the structure
at all.
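The ordering described above (validate the size, then look inside) can be sketched in user space as follows. struct user_md_hdr, md_get_magic() and the magic value used in testing are hypothetical; they only stand in for the real layout structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: the first fields of a user-supplied layout. */
struct user_md_hdr {
	uint32_t magic;
	uint32_t pattern;
};

/*
 * Read the magic field from a caller-supplied buffer.  The size is
 * checked before any field is touched, so a short buffer is rejected
 * instead of being read past its end.
 */
static int md_get_magic(const void *buf, size_t size, uint32_t *magic)
{
	struct user_md_hdr hdr;

	if (buf == NULL || size < sizeof(hdr))
		return -EINVAL;

	/* Copy out rather than cast, to avoid alignment assumptions. */
	memcpy(&hdr, buf, sizeof(hdr));
	*magic = hdr.magic;
	return 0;
}
```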

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ib3f071a3ff77a039fdfa38c903d87999108b3322
Reviewed-on: https://review.whamcloud.com/36589
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-12461 contrib: Add epython scripts for crash dump analysis 82/35282/4
Ann Koehler [Thu, 20 Jun 2019 18:25:02 +0000 (13:25 -0500)]
LU-12461 contrib: Add epython scripts for crash dump analysis

This mod creates a new subdirectory, debug_tools/epython_scripts,
in ./contrib to contain PyKdump scripts. These scripts written in
an extended version of Python aid in memory dump analysis by
extracting and formatting the content of Lustre data structures.

The scripts are written using Python 2.7 and tested on Lustre 2.11
client dumps.

Test-Parameters: trivial

Cray-bug-id: LUS-7501
Signed-off-by: Ann Koehler <amk@cray.com>
Change-Id: I0a15eb9025fb604742f4ae99508a080ce04163dc
Reviewed-on: https://review.whamcloud.com/35282
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12518 llite: fix stride window increase 93/35893/5
Wang Shilong [Fri, 23 Aug 2019 06:17:41 +0000 (14:17 +0800)]
LU-12518 llite: fix stride window increase

Fix the following problems:

1. stride_byte_count() argument @off should be @windows_start
rather than @stride_offset to calculate stride bytes.

2. On a memory-limited client (for testing etc.), ra_rpc_size (64M)
could initially be larger than ra_max_pages_per_file, which could
make @window_len 0 after ras_increase_window().

3. @window_len in ras_stride_increase_window() could be negative,
so be careful to avoid overflow.

Change-Id: Ied00bec834d4bb0ad04b688c10a03bbcd667f39b
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35893
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12644 llite: try fast io for stride io correctly 66/35466/13
Wang Shilong [Thu, 8 Aug 2019 17:14:19 +0000 (13:14 -0400)]
LU-12644 llite: try fast io for stride io correctly

The stride gap can be really large, so calculate the pages
to skip correctly; otherwise we will see many small
RPCs when the stride gap is large.

Change-Id: Id72405c11234a2075f3cce4733d23544fe15eb17
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35466
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12756 lnet: Remove unnecessary rtr_nid argument 40/36540/7
Chris Horn [Tue, 22 Oct 2019 02:10:57 +0000 (21:10 -0500)]
LU-12756 lnet: Remove unnecessary rtr_nid argument

Cache the rtr_nid argument in lnet_select_pathway() the same way we
cache the src_nid argument.

Also remove the unnecessary lnet_nid_t variable that stores the
lp_primary_nid solely for the purposes of printing a debug message.

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I0a265bbb1c57eba0373a38fbacacceb64faf4614
Reviewed-on: https://review.whamcloud.com/36540
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12756 lnet: Introduce lnet_msg_is_response 39/36539/7
Chris Horn [Tue, 22 Oct 2019 01:38:21 +0000 (20:38 -0500)]
LU-12756 lnet: Introduce lnet_msg_is_response

Implement function to determine if an lnet_msg is a response
(ACK or REPLY).

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I8ba2d92866f8bb2caba120d9f23218bb7761143a
Reviewed-on: https://review.whamcloud.com/36539
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12756 lnet: Refactor lnet_find_existing_preferred_best_ni 38/36538/7
Chris Horn [Tue, 22 Oct 2019 01:14:04 +0000 (20:14 -0500)]
LU-12756 lnet: Refactor lnet_find_existing_preferred_best_ni

Replace lnet_send_data argument.

Get rid of unnecessary lookup for lnet_peer_net.

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I69e733d4a0af55ec480df4a13e9153757212333e
Reviewed-on: https://review.whamcloud.com/36538
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12756 lnet: Refactor lnet_set_non_mr_pref_nid 37/36537/7
Chris Horn [Tue, 22 Oct 2019 01:04:08 +0000 (20:04 -0500)]
LU-12756 lnet: Refactor lnet_set_non_mr_pref_nid

Replace lnet_send_data argument.

The sd_send_case check can be removed because all call paths already
satisfy this condition.

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I95707c7457edef44eec7d00bde93f731545f8c4e
Reviewed-on: https://review.whamcloud.com/36537
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-8130 lov: convert lo[v|d]_pool to use rhashtable 62/32662/13
NeilBrown [Fri, 24 Jan 2020 15:26:34 +0000 (10:26 -0500)]
LU-8130 lov: convert lo[v|d]_pool to use rhashtable

The pools hashtable can be implemented using
the rhashtable implementation in lib.
This has the benefit that lookups are lock-free.

We need to use kfree_rcu() to free a pool so
that a lookup racing with a deletion will not access
freed memory.

rhashtable has no combined lookup-and-delete interface,
but as the lookup is lockless and the chains are short,
this brings little cost.  Even if a lookup finds a pool,
we must be prepared for the delete to fail to find it,
as we might race with another thread doing a delete.

We use atomic_inc_not_zero() after finding a pool in the
hash table and if that fails, we must have raced with a
deletion, so we treat the lookup as a failure.
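The "take a reference only if the count is still non-zero" rule can be sketched in user space with C11 atomics. ref_inc_not_zero() is a hypothetical name for this illustration; the patch itself relies on the kernel's atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment *ref unless it is zero.  Returns true if a reference was
 * taken, false if the object was already on its way to being freed.
 */
static bool ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}
```

A lookup that gets false back treats the entry as not found, which is safe because the memory itself stays valid until the RCU-deferred free.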

Use hashlen_string() rather than a hand-crafted hash
function.
Note that the pool_name, and the search key, are
guaranteed to be nul terminated.

Based on

Linux-commit: 055ed193b190edac539f37a66699b02eae3a19a9

with the port of server side pool handling to rhashtables.

Change-Id: Ia5b4cbbd17515ea43a473e91719b3665f46b0d0a
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/32662
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9679 lnet: discard lnet_ping_buffer_numref() 58/37458/2
Mr NeilBrown [Thu, 7 Nov 2019 04:10:16 +0000 (15:10 +1100)]
LU-9679 lnet: discard lnet_ping_buffer_numref()

This inline function simply reads an atomic_t.  Having
it doesn't make the code any more readable and would
make a subsequent patch a little more awkward.
So remove it.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I21a1d2187a654f139a02c0045601086fe612e5bd
Reviewed-on: https://review.whamcloud.com/37458
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9679 modules: convert MIN/MAX to kernel style 56/37456/2
Mr NeilBrown [Wed, 4 Dec 2019 01:26:46 +0000 (12:26 +1100)]
LU-9679 modules: convert MIN/MAX to kernel style

The linux kernel provides a variety of min/max style macros which
ensure type correctness - not risking signed vs unsigned comparisons
etc.

min_t and max_t can be given a type, but if the types of the
args are identical min/max can be used.

We also have min3() and max3() to compare three values of identical
type, and clamp_t() to restrict a value to a given range
(min(max(...))).
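The signed versus unsigned risk mentioned above can be shown with a small user-space sketch. The helpers here (less_after_conversion(), min_ll(), max_ll(), clamp_ll()) are hypothetical and only imitate the idea of comparing in one explicitly chosen type; they are not the kernel macros.

```c
#include <assert.h>

/*
 * What "a < b" means when a is int and b is unsigned int: a is first
 * converted to unsigned, so -1 becomes a very large value.
 */
static int less_after_conversion(int a, unsigned int b)
{
	return (unsigned int)a < b;
}

/* Compare in one explicit type, in the spirit of min_t()/max_t(). */
static long long min_ll(long long a, long long b) { return a < b ? a : b; }
static long long max_ll(long long a, long long b) { return a > b ? a : b; }

/* Restrict v to [lo, hi], written as min(max(...)) like the text says. */
static long long clamp_ll(long long v, long long lo, long long hi)
{
	assert(lo <= hi);
	return min_ll(max_ll(v, lo), hi);
}
```

Here less_after_conversion(-1, 1u) is false even though -1 is mathematically smaller than 1, while min_ll(-1, 1u) gives -1 as expected.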

Use these as appropriate throughout the lustre/lnet kernel code.

The variables (rlength and mlength) have their type changed from int
to unsigned int as this makes more sense in the context, and allows
min() to be used.

Similarly the return type of kiblnd_rd_frag_size() is changed from
__u32 to unsigned int as the return value is *only* used in a min3()
comparison with another unsigned int.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9f0cdd23b78d2f9dd04ba58e9b9c7df8d1ee3ca1
Reviewed-on: https://review.whamcloud.com/37456
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12651 osc: always call update_next_shrink 29/37429/4
Alexander Zarochentsev [Tue, 4 Feb 2020 17:47:06 +0000 (20:47 +0300)]
LU-12651 osc: always call update_next_shrink

Call update_next_shrink in case of clients not
supporting grant shrinking or clients with grant
shrinking explicitly disabled. Otherwise
osc_grant_work_handler() schedules itself immediately
after its completion causing excessive CPU consumption.

Fixes: 3e070e30a98d ("LU-8708 osc: enable/disable OSC grant shrink")

Cray-bug-id: LUS-8460
Change-Id: I507b3d10dd5374772456853098bc26053cbd140d
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/37429
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9679 lustre: avoid cast of file->private_data 51/36651/4
Mr NeilBrown [Sun, 3 Nov 2019 23:02:58 +0000 (10:02 +1100)]
LU-9679 lustre: avoid cast of file->private_data

Instead of
  foo = ((struct seq_file*)file->private_data)->private;
use
  struct seq_file *m = file->private_data;
  foo = m->private;

Many places in lustre use this second style already.
It is much less noisy and preferred for upstream Linux.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9a7adb102687496f43bab099b1ca584955f040c9
Reviewed-on: https://review.whamcloud.com/36651
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
5 months agoLU-12321 mdc: allow ELC for DOM file unlink 42/36442/10
Mikhail Pershin [Fri, 27 Sep 2019 18:29:00 +0000 (21:29 +0300)]
LU-12321 mdc: allow ELC for DOM file unlink

ELC skips the DOM bit to prevent a data flush when it
is not really needed. However, when lock bits are combined
this slows down unlink, because ELC is disabled for the
whole lock if the DOM bit is set.

This patch is a simple approach that determines if the inode has
dirty pages and allows ELC for DOM unlink if there are none.

Test results of mdtest_easy_delete on DoM show a 28% performance
improvement for unlink of zero-byte files.

1 x AI400(4 x MDS/MDT) on 10 node challenges:
Without patch:
mdtest_easy_delete  96.564 kiops : time 649.36 seconds
With patch:
mdtest_easy_delete 123.630 kiops : time 454.82 seconds

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic5b2aed8c8c0884ee518a587a0c45ad54915f4fa
Reviewed-on: https://review.whamcloud.com/36442
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
5 months agoLU-11961 nodemap: nodemap_create() handles default nodemap 45/34245/8
Sebastien Buisson [Wed, 13 Feb 2019 15:41:47 +0000 (00:41 +0900)]
LU-11961 nodemap: nodemap_create() handles default nodemap

nodemap_create() is responsible for assigning nmc_default_nodemap
so it should not be done outside of this function.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8d0615196e32fb8e6c59ddedd421323a7d6eff7f
Reviewed-on: https://review.whamcloud.com/34245
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13235 lnet: copy the correct amount of CPTs to lnet_cpts 36/36636/4
Mr NeilBrown [Tue, 4 Feb 2020 15:52:22 +0000 (10:52 -0500)]
LU-13235 lnet: copy the correct amount of CPTs to lnet_cpts

A previous patch fixed one of three memcpy() calls in
lnet_net_append_cpts() to copy the correct number of bytes.
This patch fixes the other two.
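One common way a memcpy() length goes wrong is passing an element count where a byte count is required. The sketch below assumes that is the kind of error involved here, which the message does not spell out; append_ids() is a hypothetical helper, not the lnet function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Append 'nsrc' 32-bit ids to an array that already holds 'ndst' ids.
 * The caller must have room for ndst + nsrc entries in dst.
 * memcpy() takes a byte count, so the element count is scaled by the
 * element size; passing 'nsrc' alone would copy only a quarter of it.
 */
static void append_ids(uint32_t *dst, size_t ndst,
		       const uint32_t *src, size_t nsrc)
{
	memcpy(dst + ndst, src, nsrc * sizeof(*src));
}
```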

Test-Parameters: trivial testlist=sanity-lnet

Fixes: 8cbb8cd3e771 ("LU-7734 lnet: Multi-Rail local NI split")

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I5a3450b0043c60b6c432db5be47f1e27ecc1fc94
Reviewed-on: https://review.whamcloud.com/36636
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months agoLU-13210 lnet: gcc8 add implicit-fallthrough decorator 66/37466/3
Shaun Tancheff [Thu, 6 Feb 2020 22:44:13 +0000 (16:44 -0600)]
LU-13210 lnet: gcc8 add implicit-fallthrough decorator

With newer compilers and newer kernels -Werror=implicit-fallthrough
is enabled.

This adds the missing decorator.
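
As an illustration of what such an annotation looks like, here is a
minimal userspace sketch. The macro name my_fallthrough and the
function steps_for_level() are hypothetical and are not taken from the
patch; the sketch assumes a compiler that either supports the
fallthrough statement attribute or ignores an empty statement.

```c
/*
 * Hypothetical annotation macro.  Compilers that report support for
 * the statement attribute get it; others get an empty statement.
 */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define my_fallthrough __attribute__((fallthrough))
# endif
#endif
#ifndef my_fallthrough
# define my_fallthrough do { } while (0)
#endif

/* Each level includes the work of all lower levels. */
static int steps_for_level(int level)
{
	int steps = 0;

	switch (level) {
	case 3:
		steps++;
		my_fallthrough;	/* deliberate: level 3 includes level 2 */
	case 2:
		steps++;
		my_fallthrough;	/* deliberate: level 2 includes level 1 */
	case 1:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Without the annotation, building this with -Werror=implicit-fallthrough
would fail at the "case 2:" and "case 1:" labels.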

Test-Parameters: trivial
Cray-bug-id: LUS-8476
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I47334d5a8d0bcf17489c1b15af29cd553fa01a09
Reviewed-on: https://review.whamcloud.com/37466
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoNew tag 2.13.52 2.13.52 v2_13_52
Oleg Drokin [Wed, 12 Feb 2020 06:18:35 +0000 (01:18 -0500)]
New tag 2.13.52

Change-Id: Iafa9279dd716bac93851412e64ef7b7e85945353
Signed-off-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12988 ldiskfs: mballoc to prefetch groups 93/36893/15
Alex Zhuravlev [Mon, 2 Dec 2019 08:23:30 +0000 (11:23 +0300)]
LU-12988 ldiskfs: mballoc to prefetch groups

ahead of scanning. Prefetching is done in 8 * flex_bg groups, so
it should be 8 read-ahead reads for a single allocating thread.
At the end of allocation the allocating thread waits for read-ahead
completion and initializes buddy information so that read-aheads
are not lost in case of memory pressure.
At cr=0 the number of prefetching IOs is limited per allocation
context to prevent a situation where mballoc loads thousands of
bitmaps looking for a perfect group while ignoring groups with
good chunks.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If86e3aff75379e064f70c0a66e2d65bdc5593651
Reviewed-on: https://review.whamcloud.com/36893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13180 lustre: reserve bit for RDMA-only memory RPC 83/37383/3
Wang Shilong [Fri, 31 Jan 2020 07:21:30 +0000 (15:21 +0800)]
LU-13180 lustre: reserve bit for RDMA-only memory RPC

This is reserved for RDMA-only memory integrated with Lustre.
The purpose of this bit is to:

1) disable short IO if memory is not directly addressable by the CPU.
2) prevent CPU memory pages and RDMA memory pages merging into one RPC.

Test-Parameters: trivial
Change-Id: I148b269c5e7d7c52e760b20a6482c259407e0898
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37383
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
5 months agoLU-13134 obdclass: use slab allocation for cl_dio_aio 27/37227/6
Wang Shilong [Tue, 14 Jan 2020 15:00:03 +0000 (23:00 +0800)]
LU-13134 obdclass: use slab allocation for cl_dio_aio

cl_dio_aio is used frequently for DIO/AIO, so use
a private slab pool for it.

This could help improve aio performance.

Change-Id: Ic06523ae59eed04e55c17ac03af9187af8f695c5
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37227
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-4198 clio: AIO support for direct IO 16/32416/28
Jinshan Xiong [Mon, 29 Apr 2019 08:30:05 +0000 (16:30 +0800)]
LU-4198 clio: AIO support for direct IO

This patch adds AIO support to Lustre. AIO issues IO like
DIO, but does not wait for the IO to finish before
returning; instead it returns EIOCBQUEUED to the VFS to indicate
that the IO has been issued. aio_complete() is called in the
callback once the IO is done.

  fio AIO/DIO bandwidth results:
  # numjob=4, bs=512k

  MB/s      write       read
  master      832       1806
  patched    6591      11800

  fio AIO/DIO IOPS results:
  # 32 clients, 8192 threads
  # ioengine=libaio rw=randread blocksize=4096 iodepth=128 direct=1
  # size=1g runtime=300 group_reporting numjobs=256 create_serialize=0

  IOPS      write       read
  master      99K      1239K
  patched    265K      3498K

Test-Parameters: testgroup=review-ldiskfs-arm
Signed-off-by: Jinshan Xiong <jinshan.xiong@uber.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: If2ac9283612514e10fe342fc43e95b4081347168
Reviewed-on: https://review.whamcloud.com/32416
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-4198 clio: turn on lockless for some kind of IO 01/8201/46
Jinshan Xiong [Thu, 9 Mar 2017 19:30:00 +0000 (11:30 -0800)]
LU-4198 clio: turn on lockless for some kind of IO

We can safely turn on lockless for Direct IO
and no lock.

Direct IO will still enqueue the lock on the server side,
and we cannot use lockless IO in the following cases:

1) If a group lock is held before DIO, using lockless IO would
deadlock, so we use the group lock instead and trust
it to protect consistency.

2) Direct IO might fall back to buffered IO in some cases,
in which case we restart Direct IO holding the normal lock.

The main motivation for this patch is to support AIO.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ia004d6b39272df8159c9df3cc76662e198230b55
Reviewed-on: https://review.whamcloud.com/8201
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13145 lnet: use conservative health timeouts 30/37430/2
Andreas Dilger [Fri, 31 Jan 2020 20:00:00 +0000 (13:00 -0700)]
LU-13145 lnet: use conservative health timeouts

Use more conservative lnet_transaction_timeout and lnet_retry_count
values by default.  Currently with timeout=10 and retry=3 there is
only a 3s window for the RPC to be sent before it is timed out.
This has caused fault injection rather than fault tolerance.
Increase the default timeout to 50s with retry=2, which is hopefully
long enough to cover virtually all uses, but still allows LNet Health
to be enabled by default and resend before Lustre times out itself.
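
The 3s figure is consistent with the transaction timeout being divided
evenly among the send attempts. The sketch below makes that assumption
explicit; per_attempt_window() is a hypothetical helper and the formula
is inferred from the numbers in this message, not copied from LNet.

```c
/*
 * Assumed model: the transaction timeout (in seconds) is split evenly
 * across 'retry_count' attempts.  A retry count of 0 leaves the whole
 * timeout to a single attempt.
 */
static unsigned int per_attempt_window(unsigned int timeout_sec,
				       unsigned int retry_count)
{
	if (retry_count == 0)
		return timeout_sec;
	return timeout_sec / retry_count;
}
```

Under this model the old defaults give per_attempt_window(10, 3), which
is 3 seconds, and the new defaults give per_attempt_window(50, 2),
which is 25 seconds.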

Fixes: 8632e94aeb7e ("LU-11816 lnet: setup health timeout defaults")

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6bfc4d61cebab38c1554e1b42834b1f38fc34ba8
Reviewed-on: https://review.whamcloud.com/37430
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12593 osd: up i_append_sem during errors 06/37406/3
Alexander Boyko [Mon, 3 Feb 2020 09:24:40 +0000 (04:24 -0500)]
LU-12593 osd: up i_append_sem during errors

There is a potential leak of i_append_sem on errors from the
buffer head read and ldiskfs_journal_get_write_access() in
osd_ldiskfs_write_record().
The patch adds up(i_append_sem) on the error paths.
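
The general shape of the fix can be sketched in userspace C. Everything
here is a stand-in: struct toy_sem only counts and never blocks, and
write_record() with its fail_step parameter is hypothetical, not the
code of osd_ldiskfs_write_record().

```c
#include <errno.h>

/* Toy stand-in for a kernel semaphore: it only counts, never blocks. */
struct toy_sem {
	int count;
};

static void toy_down(struct toy_sem *sem) { sem->count--; }
static void toy_up(struct toy_sem *sem)   { sem->count++; }

/*
 * Hypothetical write path.  'fail_step' injects an error at step 1
 * (buffer read) or step 2 (journal access); 0 means success.
 * Every exit goes through the 'out' label, so the semaphore taken at
 * the top is always released.
 */
static int write_record(struct toy_sem *append_sem, int fail_step)
{
	int rc = 0;

	toy_down(append_sem);

	if (fail_step == 1) {		/* e.g. buffer head read failed */
		rc = -EIO;
		goto out;
	}
	if (fail_step == 2) {		/* e.g. journal access failed */
		rc = -ENOMEM;
		goto out;
	}
	/* ... the actual write would happen here ... */
out:
	toy_up(append_sem);
	return rc;
}
```

Returning directly from either error branch instead of jumping to
"out" would leave the count one lower on every failure, which is the
leak described above.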

Fixes: f832a7dc33c6 ("LU-12593 osd: zeroing a freshly allocated block buffer")
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I245d0c45af03519c66b75731e5d57f42de41fe95
Reviewed-on: https://review.whamcloud.com/37406
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13191 osp: handle -EROFS in osp_sync_interpret() 04/37404/2
Lai Siyao [Sat, 25 Jan 2020 21:23:28 +0000 (05:23 +0800)]
LU-13191 osp: handle -EROFS in osp_sync_interpret()

Upon OST disk failure, osp_sync_interpret() may get -EROFS,
which is a valid errno.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I5c3cff3019aa47c6d5803f0f0b373bc704f18118
Reviewed-on: https://review.whamcloud.com/37404
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13163 mdc: new kernel function xa_is_value() 99/37399/3
Lai Siyao [Sat, 25 Jan 2020 00:30:44 +0000 (08:30 +0800)]
LU-13163 mdc: new kernel function xa_is_value()

xa_is_value() was added in kernel 4.19-rc6 to replace
radix_tree_exceptional_entry().
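
For readers unfamiliar with value entries: they are small integers
stored directly in the tree and told apart from pointers by a tag in
the low bit. The userspace sketch below mirrors that encoding as I
understand it; the toy_ names are hypothetical and this is not the
kernel header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode an integer as a tagged entry: shift left, set the low bit. */
static void *toy_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* An entry is a value (not a pointer) when its low bit is set. */
static bool toy_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Recover the integer from a tagged entry. */
static unsigned long toy_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}
```

Ordinary object pointers are at least 2-byte aligned, so their low bit
is clear and toy_is_value() returns false for them.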

Test-Parameters: trivial clientdistro=el8.1 envdefinitions=ONLY=65i testlist=sanity,sanity,sanity,sanity,sanity
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If89aa19c37af8a67debe782d1c77f4ef4dc6f923
Reviewed-on: https://review.whamcloud.com/37399
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-8304 libcfs: convert debug_ctlwq to a completion. 98/37398/3
NeilBrown [Sun, 2 Feb 2020 02:15:17 +0000 (21:15 -0500)]
LU-8304 libcfs: convert debug_ctlwq to a completion.

kthread_run might sleep during an allocation, and so
it's considered unsafe to call with a state that's not
RUNNABLE.
Rather than move the state setting to after kthread_run, which
introduces a small race, replace the waitqueue with a completion.
This has clean semantics which perfectly match the need here.

Change-Id: Ic3bcf21dc747d73ce482e2d50bffd6c43fc04fbc
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/37398
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13183 ldiskfs: Drop remove truncate warning patch 89/37389/3
Shaun Tancheff [Fri, 31 Jan 2020 18:28:28 +0000 (12:28 -0600)]
LU-13183 ldiskfs: Drop remove truncate warning patch

Drop the ext4-remove-truncate-warning.patch as it was
removed as part of
    f64e9f19f68e ("LU-12977 ldiskfs: properly take inode_lock ...")
and is not needed.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I78667ba380e9e78d4972377e59fa56bc27f15bb5
Reviewed-on: https://review.whamcloud.com/37389
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-11300 lnet: remove lnd_query interface. 37/37337/4
Mr NeilBrown [Tue, 28 Jan 2020 00:31:31 +0000 (11:31 +1100)]
LU-11300 lnet: remove lnd_query interface.

The ->lnd_query interface is completely unused, and has been since
commit 8e498d3f23ea ("LU-11300 lnet: peer aliveness")

So remove all mention of it.

Fixes: 8e498d3f23ea ("LU-11300 lnet: peer aliveness")
Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Iff11652283b371519cf31bf66b9ba08e024d3193
Reviewed-on: https://review.whamcloud.com/37337
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12988 ldiskfs: skip non-loaded groups at cr=0/1 91/36891/7
Alex Zhuravlev [Thu, 28 Nov 2019 12:04:25 +0000 (15:04 +0300)]
LU-12988 ldiskfs: skip non-loaded groups at cr=0/1

cr=0 is supposed to be an optimization to save CPU cycles,
but if buddy data (in memory) is not initialized then all
this makes no sense as we have to do sync IO taking a lot
of cycles.  Also, at cr=0 mballoc doesn't store any available
chunk. cr=1 also skips groups using a heuristic based on avg.
fragment size.
It's more useful to skip such groups and switch to cr=2 where
groups will be scanned for available chunks.

Using a sparse image and dm-slow, a virtual device of 120TB was
simulated. Then the image was formatted as an OST and filled
using debugfs to mark ~85% of available space as busy.
Mounting as an OST w/o the patch couldn't complete in half an hour
(according to vmstat it would take ~10-11 hours). With the
patch applied the mount took ~20 seconds.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I88c8c1b01b386af0fa438bfeb97acb6110bd00ec
Reviewed-on: https://review.whamcloud.com/36891
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Artem Blagodarenko <c17828@cray.com>
5 months agoLU-13165 mdt: MSG_RESENT can be improperly cleared. 96/37296/2
Andriy Skulysh [Wed, 9 Oct 2019 19:53:14 +0000 (22:53 +0300)]
LU-13165 mdt: MSG_RESENT can be improperly cleared.

req_can_reconstruct() can return -EPROTO, which means that the
original request was processed and the reply was received.

Change-Id: I06ba9aa24821f414777d38e9ca606652b172e92c
Fixes: 23773b32bf ("LU-11444 ptlrpc: resend may corrupt the data")
Cray-bug-id: LUS-7972
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/37296
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12542 handle: discard h_lock. 63/35863/7
NeilBrown [Fri, 13 Dec 2019 15:48:18 +0000 (10:48 -0500)]
LU-12542 handle: discard h_lock.

The h_lock spinlock is now only taken while bucket->lock
is held.  As a handle is associated with precisely one bucket,
this means that h_lock can never be contended, so it isn't needed.

So discard h_lock.

Also discard an increasingly irrelevant comment in the declaration
of struct portals_handle.

Change-Id: Ib5231fb43d1bf5031d5c2426c4e1d1865544bcf5
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35863
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-11607 tests: replace version/fstype calls in sanity/n 19/35719/8
James Nunez [Wed, 7 Aug 2019 19:27:13 +0000 (13:27 -0600)]
LU-11607 tests: replace version/fstype calls in sanity/n

The routine get_lustre_env() is available to all Lustre
test suites and sets an environment variable for the file
system type for MDS1 and OST1 and sets a variable for the
Lustre version of servers.

Replace the calls to facet_fstype() and lustre_version_code()
for all server types defined in get_lustre_env().  While
doing this, replace SINGLEMDS with mds1 in these calls.

Clean up around any modifications with
- converting spaces to tabs
- removing calls to return after skip() or skip_env()

Test-Parameters: trivial testlist=sanityn
Test-Parameters: fstype=zfs testlist=sanityn,sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ibc66220ae3b57cf22395d13f5d35feceeb61adfe
Reviewed-on: https://review.whamcloud.com/35719
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
5 months agoLU-10447 tests: deprecate use of $SETSTRIPE/$GETSTRIPE 25/33925/3
James Nunez [Thu, 27 Dec 2018 16:50:48 +0000 (09:50 -0700)]
LU-10447 tests: deprecate use of $SETSTRIPE/$GETSTRIPE

$SETSTRIPE and $GETSTRIPE were needed when we used the
standalone 'lstripe' utility. 'lstripe' hasn't been used
for years and we need to clean up all remnants of it.

Remove the definition and replace all instances of
$SETSTRIPE with '$LFS setstripe' and $GETSTRIPE with
'$LFS getstripe' in test-framework library.

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ibd78b2d75b0b8fc7ff686c1b0a73ce51fe9452e2
Reviewed-on: https://review.whamcloud.com/33925
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months agoLU-9679 lustre: use LIST_HEAD() for local lists. 55/36955/4
Mr NeilBrown [Thu, 5 Dec 2019 06:09:19 +0000 (17:09 +1100)]
LU-9679 lustre: use LIST_HEAD() for local lists.

When declaring a local list head, instead of

   struct list_head list;
   INIT_LIST_HEAD(&list);

use
   LIST_HEAD(list);

which does both steps.
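
To see why the two forms are equivalent, here is a self-contained
userspace sketch of the macros. It is modelled on the definitions in
the kernel's list headers but is written out here for illustration
only.

```c
/* Doubly linked list node; an empty list points at itself. */
struct list_head {
	struct list_head *next, *prev;
};

/* Static initializer: both links point back at the head. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Declare and initialize a list head in one statement. */
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Run-time initializer for an already declared list head. */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* A list is empty when the head's next pointer is the head itself. */
static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

After either "struct list_head a; INIT_LIST_HEAD(&a);" or
"LIST_HEAD(b);", list_empty() returns true for the new head. The
single-statement form also removes the window in which the head is
declared but not yet initialized.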

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I67bda77c04479e9b2b8c84f02bfb86d9c2ef5671
Reviewed-on: https://review.whamcloud.com/36955
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9679 lnet: use LIST_HEAD() for local lists. 54/36954/4
Mr NeilBrown [Thu, 5 Dec 2019 05:56:16 +0000 (16:56 +1100)]
LU-9679 lnet: use LIST_HEAD() for local lists.

When declaring a local list head, instead of

   struct list_head list;
   INIT_LIST_HEAD(&list);

use
   LIST_HEAD(list);

which does both steps.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ia1f1f1abf1b8a9f50e3033976990010b1d2100db
Reviewed-on: https://review.whamcloud.com/36954
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>