Whamcloud - gitweb
fs/lustre-release.git
4 months ago LU-12214 build: fix build with MPI 27/36427/7
Alexey Lyashkov [Fri, 8 Nov 2019 07:58:29 +0000 (10:58 +0300)]
LU-12214 build: fix build with MPI

Without a dependency on the MPI library, the MPI tests are
lost from the build.

Cray-bug-id: LUS-6033, LUS-7204
Test-parameters: trivial
Change-Id: I88b8ad67a9a2863fdcd02e2df5289fdd480c2a74
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/36427
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-12214 build: fix build with mofed 26/36426/7
Alexey Lyashkov [Fri, 8 Nov 2019 07:58:29 +0000 (10:58 +0300)]
LU-12214 build: fix build with mofed

Add a dependency on the MOFED / OPA modules so that the correct
package is produced.

Test-parameters: trivial
Cray-bug-id: LUS-6033, LUS-7204
Change-Id: I843762945c5dbb59d53e1913b7382813bfba86ad
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/36426
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-12214 build: fixes if the name is not just 'lustre' 24/36424/7
Alexey Lyashkov [Wed, 22 Jan 2020 11:02:27 +0000 (14:02 +0300)]
LU-12214 build: fixes if the name is not just 'lustre'

Fix the use of the %{name} macro in the spec file.
This gives the server packages the correct names.

Cray-bug-id: LUS-5915
Test-parameters: trivial
Change-Id: I2ae271b5344fb899bb053f82d2534355ce60aa3a
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/36424
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-12780 osp: don't use ptlrpc_thread for opd_update_thread 64/36264/5
Mr NeilBrown [Wed, 23 Oct 2019 00:30:50 +0000 (11:30 +1100)]
LU-12780 osp: don't use ptlrpc_thread for opd_update_thread

Rather than ptlrpc_thread, use the native kthread functionality.

- There is no need to synchronize on startup: do all the startup work
  that can fail before starting the thread.  This involves putting the
  lu_env in the per-thread struct osp_updates.

- Synchronization on shutdown is done with kthread_stop() and
  kthread_should_stop().  The lu_env_put() call is moved to after the
  kthread_stop() call, as it is theoretically possible that the
  thread function never gets called, so it isn't safe for it to
  be responsible for cleanup.

- opd_update_thread is replaced with ou_update_task in struct
  osp_updates.
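The lifecycle can be sketched in userspace. This sketch assumes POSIX threads as a stand-in for kthreads. struct updates, update_thread(), updates_start() and updates_stop() are hypothetical names, not the osp code.

- All fallible setup happens before the thread is created, so no startup handshake is needed.
- A stop flag plays the role of kthread_should_stop().
- The owned environment is released only after the join, mirroring lu_env_put() after kthread_stop().

With pthreads the thread function always runs. In the kernel it may not, which is why cleanup belongs in the stop path.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-thread state, standing in for struct osp_updates. */
struct updates {
	pthread_t   task;        /* stands in for ou_update_task */
	atomic_bool should_stop; /* stands in for kthread_should_stop() */
	atomic_int  iterations;  /* work done by the thread */
	int        *env;         /* stands in for the lu_env owned by this struct */
};

static void *update_thread(void *arg)
{
	struct updates *u = arg;

	/* The thread uses resources prepared for it but never frees them. */
	while (!atomic_load(&u->should_stop))
		atomic_fetch_add(&u->iterations, 1);
	return NULL;
}

/* All setup that can fail is done before the thread exists. */
static int updates_start(struct updates *u)
{
	u->env = malloc(sizeof(*u->env));
	if (u->env == NULL)
		return -1;
	atomic_init(&u->should_stop, false);
	atomic_init(&u->iterations, 0);
	if (pthread_create(&u->task, NULL, update_thread, u) != 0) {
		free(u->env);
		u->env = NULL;
		return -1;
	}
	return 0;
}

/* Analog of kthread_stop(): signal, wait, and only then clean up. */
static void updates_stop(struct updates *u)
{
	atomic_store(&u->should_stop, true);
	pthread_join(u->task, NULL);
	free(u->env);   /* analog of lu_env_put() after kthread_stop() */
	u->env = NULL;
}
```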

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I78dafbadf419ee9b80a9bd0046152fb6f293f191
Reviewed-on: https://review.whamcloud.com/36264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-11510 lfs: migrate a composite layout file correctly 82/36082/10
Emoly Liu [Wed, 26 Feb 2020 14:12:59 +0000 (22:12 +0800)]
LU-11510 lfs: migrate a composite layout file correctly

The patch fixes the following issues:
- in migrate_open_files(), use the "layout" pointer instead of the
  "param" pointer to decide whether a composite file should be
  created; "param" is never NULL, so the composite layout file was
  never created;
- make the --copy and --yaml options work correctly in the lfs_migrate
  tool;
- when a composite layout file is migrated, add the "--copy" option
  to preserve its layout in both lfs_migrate and "lfs migrate"; in
  this case the pool name is preserved as well;
- when lfs_migrate restripes a file with the -R option, give the file
  its parent's striping by default, by adding the "--copy $parent_dir"
  option;
- clean up code in lfs_migrate and sanity.sh test_56wb/c.

sanity.sh test_56xd/xe are added to verify this patch.
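The first fix above comes down to which pointer drives the decision. This is a hypothetical sketch, not the lfs source. The structures and function names are stand-ins, and it assumes the original branch created a plain file whenever "param" was non-NULL.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the striping parameters and a composite layout. */
struct stripe_param { int stripe_count; };
struct comp_layout  { int comp_count; };

enum create_kind { CREATE_PLAIN, CREATE_COMPOSITE };

/* Buggy: param is always a valid pointer here, so the composite
 * branch can never be taken. */
static enum create_kind choose_buggy(const struct stripe_param *param,
				     const struct comp_layout *layout)
{
	(void)layout;
	return param != NULL ? CREATE_PLAIN : CREATE_COMPOSITE;
}

/* Fixed: the presence of a layout decides whether the file is composite. */
static enum create_kind choose_fixed(const struct stripe_param *param,
				     const struct comp_layout *layout)
{
	(void)param;
	return layout != NULL ? CREATE_COMPOSITE : CREATE_PLAIN;
}
```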

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I85779c69e74444eb869f28add4363ad3a6835b97
Reviewed-on: https://review.whamcloud.com/36082
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-12542 ldlm: don't access l_resource when not locked. 84/35484/5
Mr NeilBrown [Thu, 27 Feb 2020 15:15:16 +0000 (10:15 -0500)]
LU-12542 ldlm: don't access l_resource when not locked.

lock->l_resource can (sometimes) change when the resource
isn't locked, so dereferencing lock->l_resource and then locking
the resource looks wrong.
As lock_res_and_lock() returns the locked resource, this code
can easily be made more obviously correct by using that return
value.
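The pattern is shown here as a simplified userspace sketch. It assumes pthread mutexes in place of Lustre spinlocks. The structures and the sketch_* functions are hypothetical and do not mirror the real signatures.

The point is that the caller uses the pointer returned by the lock helper. That pointer was read while the lock's own mutex was held, unlike a copy of l_resource taken beforehand.

```c
#include <pthread.h>

/* Simplified stand-ins; not the real struct ldlm_resource / ldlm_lock. */
struct sketch_resource {
	pthread_mutex_t lr_lock;
	int             lr_refcount;
};

struct sketch_lock {
	pthread_mutex_t         l_lock;      /* protects l_resource */
	struct sketch_resource *l_resource;  /* may change while l_lock is not held */
};

/* Take the lock's own mutex so l_resource cannot change, then lock that
 * resource and hand it back to the caller. */
static struct sketch_resource *sketch_lock_res_and_lock(struct sketch_lock *lk)
{
	struct sketch_resource *res;

	pthread_mutex_lock(&lk->l_lock);
	res = lk->l_resource;
	pthread_mutex_lock(&res->lr_lock);
	return res;
}

static void sketch_unlock_res_and_lock(struct sketch_lock *lk,
				       struct sketch_resource *res)
{
	pthread_mutex_unlock(&res->lr_lock);
	pthread_mutex_unlock(&lk->l_lock);
}

/* Correct usage: operate on the returned pointer. Reading lk->l_resource
 * before taking the locks could yield a resource the lock no longer uses. */
static int sketch_bump_refcount(struct sketch_lock *lk)
{
	struct sketch_resource *res = sketch_lock_res_and_lock(lk);
	int count = ++res->lr_refcount;

	sketch_unlock_res_and_lock(lk, res);
	return count;
}
```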

Change-Id: Iced0bf1af4fa8ddedffa817e00f1c6a02b035d76
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/35484
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-11505 tests: customise run_*() functions 52/33352/12
Elena Gryaznova [Thu, 20 Feb 2020 16:50:21 +0000 (19:50 +0300)]
LU-11505 tests: customise run_*() functions

For PFL testing we need the ability to run
parallel-scale tests with composite files.
This patch adds per-subtest *_STRIPEPARAMS variables,
which allow any striping or composite layout to be set
on the test directory.

Test-Parameters: trivial testlist=parallel-scale,parallel-scale-nfsv3
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Cray-bug-id: LUS-5936
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I767fa17523f5c40d50260892901dc947a1da8a24
Reviewed-on: https://review.whamcloud.com/33352
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-9868 llite: remove lld_it field of ll_dentry_data 41/37741/2
Mr NeilBrown [Thu, 27 Feb 2020 03:08:30 +0000 (14:08 +1100)]
LU-9868 llite: remove lld_it field of ll_dentry_data

This field is neither set nor used.  Let's remove it.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If01ba4deec34a6528b3c36c45a378b9f3824ca2f
Reviewed-on: https://review.whamcloud.com/37741
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-9679 osc: fix to return right weight in osc_lock_weight() 35/37735/2
Wang Shilong [Thu, 27 Feb 2020 01:36:05 +0000 (09:36 +0800)]
LU-9679 osc: fix to return right weight in osc_lock_weight()

cl_io_init() might return a negative value, which should indicate
that the lock must not be eliminated. Since osc_lock_weight() is
expected to return a positive value, return 1 in this case.

This is not a problem for now, as osc_io_init() only returns 0,
but it is better to fix it to avoid potential issues in the future.
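Returning the error code directly would be wrong if the weight is an unsigned long, which this hypothetical sketch assumes. A negative errno converted to unsigned long becomes a very large weight rather than an error. weight_after_io_init() is an illustrative name, not the osc code.

```c
/* Hypothetical: compute a lock weight after an io-init step that may fail. */
static unsigned long weight_after_io_init(int rc, unsigned long pages)
{
	/*
	 * Returning rc directly would turn e.g. -22 into ULONG_MAX - 21.
	 * A failed init means "do not eliminate this lock", so report a
	 * small positive weight instead of a wrapped-around error code.
	 */
	if (rc < 0)
		return 1;
	return pages;
}
```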

Test-Parameters: trivial
Change-Id: I657e5dcc4bb7691bf3b4ca06df7cb87945008a93
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37735
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-9679 osc: use rb_entry_safe 04/37604/2
Geliang Tang [Tue, 20 Dec 2016 13:56:55 +0000 (21:56 +0800)]
LU-9679 osc: use rb_entry_safe

Use rb_entry_safe() instead of container_of() to simplify the code.
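For readers outside the kernel, rb_entry_safe() is container_of() plus a NULL check. This userspace sketch assumes an offsetof()-based container_of() and uses hypothetical struct names. The kernel macro uses a GNU statement expression, but this portable version evaluates ptr twice.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* NULL in, NULL out, instead of a bogus pointer computed from NULL minus
 * the member offset. Note: evaluates ptr twice. */
#define entry_safe(ptr, type, member) \
	((ptr) ? container_of(ptr, type, member) : NULL)

struct node { int dummy; };   /* hypothetical stand-in for struct rb_node */

struct extent {
	unsigned long start;
	struct node   node;       /* embedded tree link */
};

static struct extent *next_extent(struct node *n)
{
	return entry_safe(n, struct extent, node);
}
```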

Linux-Commit: e3e0293ca9b9 ("staging: lustre: osc: use rb_entry_safe")

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9f8e19d45d859a6c7b7aa01093a1c2211063874c
Reviewed-on: https://review.whamcloud.com/37604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-9679 osc: simplify osc_page_gang_lookup() 99/37599/2
NeilBrown [Mon, 17 Dec 2018 04:06:47 +0000 (15:06 +1100)]
LU-9679 osc: simplify osc_page_gang_lookup()

osc_page_gang_lookup() has 4 values that it can receive from a
callback, and that it can return to the caller:
CLP_GANG_OKAY,
CLP_GANG_RESCHED,
CLP_GANG_AGAIN,
CLP_GANG_ABORT

"AGAIN" is never used.
"RESCHED" is not needed as a cond_resched() can safely be called at
the point this is returned, rather than returning it.
That leaves "OKAY" and "ABORT", which can simply be "true" and "false"
boolean values.

Internalizing the RESCHED case means the callers don't need to loop
themselves.  This simplifies calling patterns.
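This hypothetical userspace sketch shows the simplified calling convention.

- The callback returns true to continue and false to abort.
- The lookup yields internally, with sched_yield() standing in for cond_resched(), instead of returning a RESCHED code to its caller.

gang_lookup() and sum_until_negative() are illustrative names, and the yield interval of 16 is arbitrary.

```c
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Callback returns true to continue, false to abort (was OKAY/ABORT). */
typedef bool (*page_cb_t)(int page, void *data);

/* The "RESCHED" case is handled inside the loop by yielding, so callers
 * no longer need their own retry loop. */
static bool gang_lookup(const int *pages, size_t n, page_cb_t cb, void *data)
{
	for (size_t i = 0; i < n; i++) {
		if (!cb(pages[i], data))
			return false;          /* ABORT */
		if ((i + 1) % 16 == 0)
			sched_yield();         /* userspace stand-in for cond_resched() */
	}
	return true;                           /* OKAY */
}

/* Example callback: sum pages until a negative value is seen. */
static bool sum_until_negative(int page, void *data)
{
	int *sum = data;

	if (page < 0)
		return false;
	*sum += page;
	return true;
}
```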

Linux-Commit: 1e8fd6f9806c ("lustre: osc_cache: simplify
osc_page_gang_lookup()")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I603963b72e4299bebcdaf4064428d6281fd12def
Reviewed-on: https://review.whamcloud.com/37599
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13102 llog: fix processing of a wrapped catalog 02/37102/4
Alexander Boyko [Mon, 16 Dec 2019 13:24:16 +0000 (08:24 -0500)]
LU-13102 llog: fix processing of a wrapped catalog

The logic for rereading a llog buffer had an exception
for a full catalog, when lgh_last_idx == llh_cat_idx and the first
index to process is llh_cat_idx + 1. This check is based on
the value of lh_last_idx, which stays unchanged between buffer reads.
But llh_cat_idx could move forward, which led to a wrong check
where the reread doesn't happen. As a result, Lustre got ENOENT for
a record and stopped OSP processing.

llog_cat_set_first_idx())
catlog [0x6:0x1:0x0] first idx 34730, last_idx 34731
osp_sync_process_queues()) 1 changes, 0 in progress, 0 in flight
llog_process_thread())
stop processing plain 0x76941:1:0 index 64767 count 1
llog_process_thread())
index: 34731, lh_last_idx: 34730 synced_idx: 34730 lgh_last_idx: 34731
llog_cat_process_common()) processing log [0x2780f:0x1:0x0]:0
at index 34731 of catalog [0x6:0x1:0x0]
llog_cat_id2handle()) snx11281-OST0001-osc-MDT0001:
error opening log id [0x2780f:0x1:0x0]:0:rc = -2

The patch fixes the logic and also adds/changes debugging for
llog and osp.

Cray-bug-id: LUS-8193
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I9463223a1ea904b96643b19e1580f5894142c12b
Reviewed-on: https://review.whamcloud.com/37102
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
4 months ago LU-9679 libcfs: Add CFS_ALLOC_PTR_ARRAY and free 75/36975/6
Mr NeilBrown [Thu, 14 Nov 2019 02:59:39 +0000 (13:59 +1100)]
LU-9679 libcfs: Add CFS_ALLOC_PTR_ARRAY and free

Following the pattern of CFS_ALLOC_PTR() and the kernel
pattern of kmalloc_array(), add
  CFS_ALLOC_PTR_ARRAY()
and
  CFS_FREE_PTR_ARRAY()

which allocate and free arrays given a pointer to
the array and a number of elements.

This makes code easier to read and could be a step
toward using the kernel's hardened alloc_array interfaces,
which ensure the multiplication doesn't overflow.
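This is a hypothetical userspace analog of the idea, not the libcfs macros. The element size comes from the pointer's type, and the count * size multiplication is checked before allocating, in the style of kmalloc_array(). alloc_array(), ALLOC_PTR_ARRAY() and FREE_PTR_ARRAY() are illustrative names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Refuse the allocation if count * size would overflow (calloc also checks,
 * but the explicit test mirrors a kmalloc_array()-style helper). */
static void *alloc_array(size_t count, size_t size)
{
	if (size != 0 && count > SIZE_MAX / size)
		return NULL;
	return calloc(count, size);   /* zeroed memory */
}

/* The element size is derived from the pointer, so callers cannot get it wrong. */
#define ALLOC_PTR_ARRAY(ptr, count) \
	((ptr) = alloc_array((count), sizeof(*(ptr))))

/* count is unused here; a kernel-side free wrapper may want it for accounting. */
#define FREE_PTR_ARRAY(ptr, count) \
	do { (void)(count); free(ptr); (ptr) = NULL; } while (0)
```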

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I54c919bd9ce22fbc72715c3da16f8c29e7135ccc
Reviewed-on: https://review.whamcloud.com/36975
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months ago LU-12950 osd-ldiskfs: increase supported size to 1024tb 05/36705/6
Artem Blagodarenko [Thu, 7 Nov 2019 15:56:26 +0000 (18:56 +0300)]
LU-12950 osd-ldiskfs: increase supported size to 1024tb

Currently, attempts to create an ldiskfs file system larger than 512TB
fail with the message:

    LDISKFS-fs does not support file systems greater than 512TB
    and can cause data corruption. Use "force_over_512tb" mount
    option to override.

Change the "force_over_512tb" mount option to "force_over_1024tb", as
testing of these large filesystems has not shown any serious
functional problems (though there are performance issues to address).

Test-Parameters: trivial fstype=ldiskfs testlist=conf-sanity
Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Cray-bug-id: lus-6815
Change-Id: I0f84fe85aaab05ab0bafa5f0e6074e1690d79899
Reviewed-on: https://review.whamcloud.com/36705
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
4 months ago LU-13229 ldlm: unlock request memory leak 70/37670/2
Alexey Lyashkov [Fri, 21 Feb 2020 15:36:42 +0000 (18:36 +0300)]
LU-13229 ldlm: unlock request memory leak

Resending an unlock request can cause a memory leak, as a new request
is allocated on resend. Finish the old request before resending.
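Here is a hypothetical sketch of the resend rule. A live-request counter stands in for leak detection, and request_alloc(), request_finish() and send_unlock() are illustrative names, not the ldlm API. Each attempt allocates a fresh request, so the previous one is finished before the next attempt starts.

```c
#include <stdlib.h>

/* Hypothetical request object, for illustration only. */
struct request { int attempt; };

static int live_requests;   /* allocations not yet finished */

static struct request *request_alloc(int attempt)
{
	struct request *req = malloc(sizeof(*req));

	if (req != NULL) {
		req->attempt = attempt;
		live_requests++;
	}
	return req;
}

static void request_finish(struct request *req)
{
	free(req);
	live_requests--;
}

/* Simulated send that fails the first `failures` attempts. */
static int send_unlock(int failures)
{
	for (int attempt = 0; ; attempt++) {
		struct request *req = request_alloc(attempt);
		int ok;

		if (req == NULL)
			return -1;
		ok = attempt >= failures;   /* simulated reply */
		request_finish(req);        /* always before any resend */
		if (ok)
			return attempt;
	}
}
```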

Fixes: 85a12c6c8d7 ("LU-12828 ldlm: FLOCK request can be processed twice")
Cray-bug-id: LUS-8447
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: I15dfe0388d1305c8eb5be18b19b0ffc687454ef1
Reviewed-on: https://review.whamcloud.com/37670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13296 obd: make statfs cache working again 53/37753/3
Alexey Lyashkov [Thu, 27 Feb 2020 14:48:48 +0000 (17:48 +0300)]
LU-13296 obd: make statfs cache working again

When a statfs call loses the race on the mutex, return the cached
data instead of garbage.
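This hypothetical sketch shows the intended behaviour, assuming a mutex-protected cache with a timestamp. A caller may wait on the mutex while another caller refreshes the data. In that case it still copies the cached values out rather than returning an uninitialized result. The names, the fake RPC, and the freshness window are assumptions.

```c
#include <pthread.h>
#include <time.h>

/* Hypothetical cache of statfs results guarded by a mutex. */
struct statfs_cache {
	pthread_mutex_t lock;
	time_t          age;      /* when the cached data was filled in */
	long            free_kb;  /* the cached data */
	int             rpcs;     /* how many "RPCs" were actually sent */
};

static long fake_statfs_rpc(struct statfs_cache *c)
{
	c->rpcs++;
	return 1024;
}

static void cached_statfs(struct statfs_cache *c, time_t max_age, long *out)
{
	pthread_mutex_lock(&c->lock);
	if (c->age == 0 || time(NULL) - c->age > max_age) {
		c->free_kb = fake_statfs_rpc(c);   /* stale: refresh */
		c->age = time(NULL);
	}
	/* Whether or not this caller refreshed, copy the cached result out;
	 * skipping this when another caller refreshed it leaves *out as garbage. */
	*out = c->free_kb;
	pthread_mutex_unlock(&c->lock);
}
```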

Test-Parameters: testlist=sanity envdefinitions=ONLY=423,ONLY_REPEAT=500
Fixes: 1c41a6ac390b ("LU-12368 obdclass: don't send multiple statfs RPCs")
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: I268782875c30c078f239c194f69cdf7506d66169
Reviewed-on: https://review.whamcloud.com/37753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
4 months ago LU-13209 build: Fix vvp_account_page_dirtied 78/37778/3
Shaun Tancheff [Tue, 3 Mar 2020 01:17:18 +0000 (19:17 -0600)]
LU-13209 build: Fix vvp_account_page_dirtied

Fix the compile error: vvp_account_page_dirtied undefined.

Fixes: 788e464a72 ("LU-13288 llite: Find account_page_dirtied on module init")
Cray-bug-id: LUS-8472
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iae0a3af1b091165423a2bb24a9159a7af8ab3cbe
Reviewed-on: https://review.whamcloud.com/37778
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13241 utils: use libext2fs for ldiskfs operations 56/37656/7
Li Dongyang [Fri, 21 Feb 2020 02:36:26 +0000 (13:36 +1100)]
LU-13241 utils: use libext2fs for ldiskfs operations

Instead of using debugfs to stat and read the CONFIGS/mountdata,
we can switch to libext2fs and control the flags used when
opening the ldiskfs, to reduce the mount time on huge targets.

Change-Id: I9da8fc1c77d149fe5cf3bd19b0ff76d892620101
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/37656
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13314 test: add 56ob to ALWAYS_EXCEPT for now 67/37767/3
Wang Shilong [Mon, 2 Mar 2020 04:55:27 +0000 (12:55 +0800)]
LU-13314 test: add 56ob to ALWAYS_EXCEPT for now

There is a problem in that 'lfs find' assumes a year is
365 days long, which is wrong for a leap year.

The test or the code needs to be fixed; until then, disable this
test so that other patches can land.
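The calendar arithmetic can be sketched as follows. On 2 March 2020, the date of this commit, the same date one year earlier is 366 days back, because 29 February 2020 falls in between. days_in_year_after() is a hypothetical helper, not lfs code.

```c
#include <stdbool.h>

/* Gregorian leap-year rule. */
static bool is_leap(int year)
{
	return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days from a date in (year, month) to the same date one year later.
 * Starting in Jan/Feb, the window contains Feb 29 of `year`; otherwise
 * Feb 29 of `year + 1`. (A start date of Feb 29 itself is ignored.) */
static int days_in_year_after(int year, int month)
{
	int leap_candidate = month <= 2 ? year : year + 1;

	return is_leap(leap_candidate) ? 366 : 365;
}
```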

Change-Id: I79d34ce29657b4d149a0fbe82220cfefa0c60378
Test-Parameters: trivial testlist=sanity envdefinitions=ONLY=56ob
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37767
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
4 months ago LU-12988 ldiskfs: mballoc to prefetch groups 33/37633/3
Alex Zhuravlev [Mon, 2 Dec 2019 08:23:30 +0000 (11:23 +0300)]
LU-12988 ldiskfs: mballoc to prefetch groups

Groups are prefetched ahead of scanning. Prefetching is done in
batches of 8 * flex_bg groups, so there should be 8 read-ahead reads
for a single allocating thread.
At the end of the allocation, the allocating thread waits for
read-ahead completion and initializes the buddy information, so that
read-aheads are not lost in case of memory pressure.
At cr=0, the number of prefetching IOs is limited per allocation
context, to prevent mballoc from loading thousands of bitmaps while
looking for a perfect group and ignoring groups with good chunks.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ibce9a060900544abe994f4d7a20ac9c2979ccc56
Reviewed-on: https://review.whamcloud.com/37633
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
4 months ago LU-12043 llite: move tunable params to sysfs_memparse() 49/34849/16
Andreas Dilger [Fri, 10 May 2019 21:09:41 +0000 (15:09 -0600)]
LU-12043 llite: move tunable params to sysfs_memparse()

Move the max_read_ahead_* tunables from debugfs to sysfs, since
they follow the one-value-per-file rule and should be visible to
regular users.

Rename the functions and constants from *readahead* to *read_ahead*
or *READ_AHEAD* to match the tunable names from procfs.

Deprecate usage of llprocfs_str_with_units_to_s64(), lu_str_to_s64(),
llprocfs_str_with_units_to_u64(), and lu_str_to_u64(), and instead
use sysfs_memparse() to parse sizes in the few remaining places
where they are used.  A separate patch will remove those functions.
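sysfs_memparse() itself is not shown here. To illustrate the kind of parsing involved, here is a hypothetical userspace parser for a number with an optional binary-unit suffix and a default unit, for example 'M' for a *_mb tunable. parse_size() and its exact rules are assumptions, not the Lustre implementation.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical memparse-style helper: "<number>[BKMGTP]" with binary units.
 * If no suffix is given, default_unit (uppercase) is applied.
 * A single trailing newline, as written by `echo`, is accepted.
 */
static int parse_size(const char *buf, char default_unit, uint64_t *val)
{
	unsigned long long num;
	char unit = default_unit;
	char *end;
	int shift;

	if (!isdigit((unsigned char)*buf))
		return -EINVAL;            /* rejects signs and leading blanks */
	errno = 0;
	num = strtoull(buf, &end, 10);
	if (errno == ERANGE)
		return -EINVAL;

	if (*end != '\0' && *end != '\n')
		unit = (char)toupper((unsigned char)*end++);
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;            /* trailing junk */

	switch (unit) {
	case 'B': shift = 0;  break;
	case 'K': shift = 10; break;
	case 'M': shift = 20; break;
	case 'G': shift = 30; break;
	case 'T': shift = 40; break;
	case 'P': shift = 50; break;
	default:  return -EINVAL;
	}
	if (num > (UINT64_MAX >> shift))
		return -EINVAL;            /* num << shift would overflow */
	*val = (uint64_t)num << shift;
	return 0;
}
```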

Minor fix to the "lctl set_param" man page.

Fixes: adb5aca3d673 ("LU-8066 llite: Move all remaining procfs entries to debugfs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2cdf5f8f0aeca458ed1989366102c33ae83ebbe5
Reviewed-on: https://review.whamcloud.com/34849
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13096 llite: fix potential overflow in ll_max_cached_mb_seq_write() 07/37707/4
Wang Shilong [Tue, 25 Feb 2020 07:58:55 +0000 (15:58 +0800)]
LU-13096 llite: fix potential overflow in ll_max_cached_mb_seq_write()

atomic_long_cmpxchg() returns a long, but @rc is an int. With a large
amount of memory, e.g. 24T, the value overflows, @diff never changes,
and the loop hangs.
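The arithmetic, assuming 4 KiB pages: 24 TiB is 24 * 2^40 / 2^12 = 6,442,450,944 pages. That is well above INT_MAX (2,147,483,647), so a page count stored in an int cannot compare equal to the original. The exchange succeeds, but the check reports failure, so a retry loop built on it never exits.

This hypothetical userspace sketch uses C11 atomics to emulate a cmpxchg that returns the old value. It assumes a 64-bit long (LP64).

```c
#include <stdatomic.h>

/* 24 TiB expressed in 4 KiB pages; assumes a 64-bit long (LP64). */
#define PAGES_24TIB (24L << 28)

/* Emulates atomic_long_cmpxchg(): returns the value seen before the attempt. */
static long cmpxchg_long(atomic_long *v, long old, long desired)
{
	/* On failure C11 stores the current value into old; on success old
	 * already equals the previous value. Either way, old is what was seen. */
	atomic_compare_exchange_strong(v, &old, desired);
	return old;
}

/* Buggy pattern: the returned long is narrowed into an int, so the
 * success check fails even though the exchange happened. */
static int exchanged_buggy(atomic_long *v, long cur, long desired)
{
	int rc = cmpxchg_long(v, cur, desired);

	return rc == cur;
}

/* Fixed pattern: keep the full-width value. */
static int exchanged_fixed(atomic_long *v, long cur, long desired)
{
	long rc = cmpxchg_long(v, cur, desired);

	return rc == cur;
}
```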

Test-Parameters: trivial
Fixes: adb5aca3d673 ("LU-8066 llite: Move all remaining procfs entries to debugfs")
Change-Id: I20d6feff9797ba10a089587bee0d8691bee460df
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37707
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13283 tests: add racer to nonmpi load 71/37671/5
Elena Gryaznova [Fri, 21 Feb 2020 15:40:38 +0000 (18:40 +0300)]
LU-13283 tests: add racer to nonmpi load

This patch adds the ability to run racer as one of the
non-MPI loads. All racer parameters can be set via
RACERP, for example:
  RACERP="MDSCOUNT=3 DURATION=600 RACER_ENABLE_STRIPED_DIRS=false"

ha_start_mpi_loads() is improved so that it does not create
machinefiles if no MPI load is set.

Test-Parameters: trivial
Cray-bug-id: LUS-8297
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I974838ce199897284396ef53d1a487d1a1ae1774
Reviewed-on: https://review.whamcloud.com/37671
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13281 tests: ha.sh improvements 66/37666/2
Elena Gryaznova [Fri, 21 Feb 2020 14:28:55 +0000 (17:28 +0300)]
LU-13281 tests: ha.sh improvements

This patch adds the ability to set different layouts
for client directories. This allows running the client loads
on directories with DoM, PFL, SEL, etc. layouts, including the simple
layouts set on old clients, which is useful for interoperability
testing.

The patch also adds the ability to split the full client list into
separate subsets by setting NCLIENTSSET (1 by default). With
NCLIENTSSET=2, two subsets are created, and each one is passed to
the corresponding machinefile. This allows splitting the loads,
which is also useful for interoperability testing: old clients can
operate on directories with old layouts while the new clients
operate on directories with new DoM, PFL, SEL, etc. layouts.
For NCLIENTSSET=2, ${#ha_clients[@]}=4 and
CLIENTSSTRIPE='"-E $((64*1024)) -L mdt -E EOF" "-c -1 "',
layout "-E $((64*1024)) -L mdt -E EOF" is applied on the
even clients ${ha_clients[0]} and ${ha_clients[2]}, and
layout "-c -1 " is applied on the odd clients ${ha_clients[1]}
and ${ha_clients[3]}.

Test-Parameters: trivial
Cray-bug-id: LUS-6906
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: Ia6f8c07e3936ea773706c3e79217672c68d77bd0
Reviewed-on: https://review.whamcloud.com/37666
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13219 tests: add nfs-server service in setup-nfs.sh 63/37663/2
Jian Yu [Fri, 21 Feb 2020 08:50:38 +0000 (00:50 -0800)]
LU-13219 tests: add nfs-server service in setup-nfs.sh

For RHEL 8.1, the NFS server service is nfs-server
instead of nfs.

Test-Parameters: trivial \
clientdistro=el8.1 serverdistro=el8.1 \
testlist=parallel-scale-nfsv3,parallel-scale-nfsv4

Change-Id: I5a97c8fe419187412dfc02047ed66141f567d7fb
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37663
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13270 tests: dom-performance fixes 44/37644/3
Elena Gryaznova [Thu, 20 Feb 2020 13:29:24 +0000 (16:29 +0300)]
LU-13270 tests: dom-performance fixes

This patch makes the fs cleanup at the start of dom-performance
optional, depending on the value of the t-f global DO_CLEANUP
option. For DO_CLEANUP=false, the script removes only the
directories created by the previous dom-performance session.
Without this fix, a second test run fails with:
  lfs setdirstripe: dirstripe error on '/mnt/testfs/dp_dne':
      stripe already set
  lfs setdirstripe: cannot create stripe dir '/mnt/testfs/dp_dne':
      File exists

The patch removes the check for the createmany, statmany
and smalliomany files, which are part of lustre/tests. With
the existing check, test_smallio() is always skipped
if run from a PWD other than lustre/tests.

The patch replaces the comparison with MDSSIZE by a comparison
with the real MDS size. Without this fix, the test works incorrectly
if run on an existing fs (formatall was not done during this
session, and Lustre was created with an MDS size different from
MDSSIZE).

Add skip_env() to report missing mdtest/dbench/IOR/etc.
Without this fix, the tests are skipped silently.

Test-Parameters: trivial mdssizegb=20 testlist=dom-performance
Cray-bug-id: LUS-7349
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I5fdf1ad480edb17598cbe427bc550396ebe97808
Reviewed-on: https://review.whamcloud.com/37644
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13269 tests: make lnet-selftest.sh more flexible 40/37640/2
Elena Gryaznova [Thu, 20 Feb 2020 12:50:32 +0000 (15:50 +0300)]
LU-13269 tests: make lnet-selftest.sh more flexible

lnet-selftest.sh adds both LST tests, from client to server and
from server to client, but sometimes we are interested in executing
the tests only from the server or only from the client.

This patch allows setting the required "from" parameter via the
lst_FROM variable. The following values are used:
lst_FROM=c : add the tests from clients to server only
lst_FROM=s : add the tests from server to clients only

Test-Parameters: testlist=lnet-selftest envdefinitions=lst_FROM=c
Cray-bug-id: LUS-8178
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I0111a4386b3d8acb022db11f11a0be970864696d
Reviewed-on: https://review.whamcloud.com/37640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13268 tests: customize lnet-selftest for performance 39/37639/2
Elena Gryaznova [Thu, 20 Feb 2020 12:32:54 +0000 (15:32 +0300)]
LU-13268 tests: customize lnet-selftest for performance

Sometimes we need to run read/write/ping tests separately
with different lst check flags.

This patch makes it possible to:
  set the required list of tests via the new lst_TESTS variable;
  set the lst check to LST_BRW_CHECK_NONE, LST_BRW_CHECK_FULL
      or LST_BRW_CHECK_SIMPLE.

Test-Parameters: trivial testlist=lnet-selftest
Cray-bug-id: LUS-8005
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I502028f645f39f391c19b2028fc6ba5b7bdb1a96
Reviewed-on: https://review.whamcloud.com/37639
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13267 tests: improve racer cleanup 38/37638/2
Elena Gryaznova [Thu, 20 Feb 2020 10:06:24 +0000 (13:06 +0300)]
LU-13267 tests: improve racer cleanup

Add a RACER_MAX_CLEANUP_WAIT parameter to specify a
timeout for racer cleanup, to avoid waiting a long time in case
racer processes become unkillable.

The loop in racer_cleanup() contains an inaccuracy that made it
sleep less than it was supposed to. Fix it.

Test-Parameters: trivial testlist=racer
Cray-bug-id: LUS-8498
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I32dac8bc11ef2041a1a580054c2782780bb5980e
Reviewed-on: https://review.whamcloud.com/37638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13227 LDLM: update LVB if it is server lock 11/37611/6
Wang Shilong [Wed, 19 Feb 2020 08:29:16 +0000 (16:29 +0800)]
LU-13227 LDLM: update LVB if it is server lock

ldlm_glimpse_ast() is registered for server locks, which means that
when a client sends a glimpse request, it just returns a special error
for such a lock. The local object may have had its size expanded
under this PW lock, so we should try to update the LVB upon that error.

Originally, ldlm_cb_interpret() had code to handle this error,
but it only handled the case of a race between clients; it doesn't
cover server lock cases, especially after lockless I/O was turned on
for DIO.

Fixes: 6bce536725 ("LU-4198 clio: turn on lockless for some kind of IO")
Change-Id: Ic84fd19d9eaf7f8245b8f7a2165ee5913849ac01
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37611
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13160 tests: fix sanity-hsm monitor setup 95/37595/2
Li Dongyang [Mon, 17 Feb 2020 02:53:16 +0000 (13:53 +1100)]
LU-13160 tests: fix sanity-hsm monitor setup

On RHEL8, even when using pdsh -R ssh,
ssh still waits for the remote cat process
to finish.
Use a subshell to avoid the timeout.

Change-Id: Id5b8d492b5ce9a235da73448ade475ade145bbed
Test-Parameters: trivial clientdistro=el8.1 testlist=sanity-hsm
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/37595
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13211 ldiskfs: rework data-in-dirent for linux 5.4.7+ 67/37467/6
Shaun Tancheff [Thu, 20 Feb 2020 00:41:26 +0000 (18:41 -0600)]
LU-13211 ldiskfs: rework data-in-dirent for linux 5.4.7+

Linux commit v5.4-rc3-92-g109ba779d6cc
ext4: check for directory entries too close to block end

This impacts ext4-data-in-dirent.patch due to its usage
of EXT4_DIR_REC_LEN.

Invert the original patch from:
  EXT4_DIR_REC_LEN(<int>)        => __EXT4_DIR_REC_LEN(<int>)
  EXT4_DIR_REC_LEN(de->name_len) => EXT4_DIR_REC_LEN(de)
to:
  EXT4_DIR_REC_LEN(<int>)        => EXT4_DIR_REC_LEN(<int>)
  EXT4_DIR_REC_LEN(de->name_len) => EXT4_DIR_ENTRY_LEN(de)

Doing this allows the patch to apply cleanly over a wider
range of kernel releases.
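For reference, the integer form of the macro computes an 8-byte dirent header plus the name length, rounded up to a multiple of 4. This is how ext4 defines it in kernels of this era, around 5.4. The sketch reproduces that arithmetic.

struct dirent_sketch and dir_entry_len_sketch() are hypothetical stand-ins for a dirent-based variant such as EXT4_DIR_ENTRY_LEN(de). They assume such a variant simply adds the length of any extra dirent data to the name length.

```c
#include <stdint.h>

/* Integer form of the ext4 directory record length: 8-byte header
 * (inode, rec_len, name_len, file_type) + name, rounded up to 4 bytes. */
#define EXT4_DIR_PAD		4
#define EXT4_DIR_ROUND		(EXT4_DIR_PAD - 1)
#define EXT4_DIR_REC_LEN(name_len) \
	(((name_len) + 8 + EXT4_DIR_ROUND) & ~EXT4_DIR_ROUND)

/* Hypothetical dirent view carrying extra "data in dirent" bytes. */
struct dirent_sketch {
	uint8_t name_len;
	uint8_t data_len;   /* assumed: total length of extra data after the name */
};

/* Hypothetical dirent-based length, in the spirit of EXT4_DIR_ENTRY_LEN(de). */
static unsigned int dir_entry_len_sketch(const struct dirent_sketch *de)
{
	return EXT4_DIR_REC_LEN(de->name_len + de->data_len);
}
```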

Test-Parameters: trivial
Cray-bug-id: LUS-8478
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I95349743f323bb150854ff4541bd2c88f01662a6
Reviewed-on: https://review.whamcloud.com/37467
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-12945 lnet: Disable zero copy when running on VM 00/37300/9
Shaun Tancheff [Wed, 19 Feb 2020 16:17:06 +0000 (10:17 -0600)]
LU-12945 lnet: Disable zero copy when running on VM

When running on a hypervisor platform, zero-copy buffers
may still be referenced even when the write queue size is zero.

So when running on a hypervisor, push the zero-copy size limit
above the maximum payload size of 16M.

Use the hypervisor test added to linux v4.14-119-g79cc74155218
and provide a replacement for earlier kernels.

kernel-commit: 79cc74155218316b9a5d28577c7077b2adba8e58

Cray-bug-id: LUS-8072
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I5582a6aa8da6f48deafaf13d60cf67a09d7a7231
Reviewed-on: https://review.whamcloud.com/37300
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13110 kernel: kernel update SLES12 SP4 [4.12.14-95.45.1] 23/37123/3
Jian Yu [Fri, 14 Feb 2020 19:10:09 +0000 (11:10 -0800)]
LU-13110 kernel: kernel update SLES12 SP4 [4.12.14-95.45.1]

Update SLES12 SP4 kernel to 4.12.14-95.45.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT="103a 817"

Change-Id: I1f7024465b4b6334488b7314f1073fafa10331d6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37123
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13082 tests: enable lgss_keyring debug traces 42/37042/8
Sebastien Buisson [Mon, 16 Dec 2019 18:50:10 +0000 (03:50 +0900)]
LU-13082 tests: enable lgss_keyring debug traces

Enable lgss_keyring debug traces in test framework, and collect
systemd journal in case of test failure.
Also, if needed, dump SSK keys, keyring and nodemap definitions.

Test-Parameters: trivial
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I201d97b640045e3cbc8dcd4cd4b25e0e4e644037
Reviewed-on: https://review.whamcloud.com/37042
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-13097 tests: set fail_loc on all MDS nodes for pdir tests 45/37145/4
Sebastien Buisson [Mon, 6 Jan 2020 15:21:35 +0000 (00:21 +0900)]
LU-13097 tests: set fail_loc on all MDS nodes for pdir tests

Set fail_loc on all MDS nodes for pdir tests, not only $SINGLEMDS.

Test-Parameters: trivial testlist=sanityn,sanityn,sanityn,sanityn
Test-Parameters: testlist=sanityn,sanityn,sanityn,sanityn,sanityn
Test-Parameters: testlist=sanityn,sanityn,sanityn,sanityn,sanityn
Test-Parameters: testlist=sanityn,sanityn,sanityn,sanityn,sanityn
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I80d935eae0b06e39712abfe48a56b8ce08537926
Reviewed-on: https://review.whamcloud.com/37145
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 months agoLU-13004 ptlrpc: remove *GET*KVEC macros and fields. 72/36972/3
Mr NeilBrown [Wed, 4 Dec 2019 02:22:29 +0000 (13:22 +1100)]
LU-13004 ptlrpc: remove *GET*KVEC macros and fields.

GET_KVEC, BD_GET_KVEC, GET_ENC_KVEC, BD_GET_ENC_KVEC
are no longer used.
These are the only users of the bd_kvec field of bd_u,
so that field can be removed, and bd_u can be discarded
and its other field (bd_kiov) promoted.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I31d19f6867c907029aa8d9ceee27c5ac9c9225a1
Reviewed-on: https://review.whamcloud.com/36972
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13293 llite: don't abort readahead too aggressively 97/37697/2
Wang Shilong [Tue, 25 Feb 2020 01:49:04 +0000 (09:49 +0800)]
LU-13293 llite: don't abort readahead too aggressively

We stop readahead once @ria_end_idx_min has been covered
and lock contention occurs. This can cause a problem
for small reads in SSF mode.

Even when lock contention happens, every client may still
have a large range of consecutive pages to read; if the
application issues 4K reads, this ends up as many small reads.

To fix this problem, allow readahead to continue for at least
one stripe size, which is exactly the behavior before commit cfbeae9.

Without Patch:
Max Write: 13082.37 MiB/sec (13717.85 MB/sec)
Max Read:  854.17 MiB/sec (895.67 MB/sec)

With Patch:
Max Write: 12448.90 MiB/sec (13053.61 MB/sec)
Max Read: 23921.73 MiB/sec (25083.75 MB/sec)

Fixes: cfbeae9 ("LU-12043 llite: extend readahead locks for striped file")
Change-Id: I59963592f6dbe6babd746cd01441f4a99a8cafcb
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37697
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13288 llite: Find account_page_dirtied on module init 86/37686/3
Shaun Tancheff [Mon, 24 Feb 2020 19:11:03 +0000 (13:11 -0600)]
LU-13288 llite: Find account_page_dirtied on module init

Kernel v5.2-5678-gac1c3e4 no longer exports account_page_dirtied.
Use kallsyms_lookup_name() on module init to find it and store it
as vvp_account_page_dirtied, avoiding the performance regressions
that using symbol_get() could cause.

Test-Parameters: clientdistro=el8.1
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie27abb07ffbf9e5be67fe64601ebc62409f829fd
Reviewed-on: https://review.whamcloud.com/37686
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13291 ldiskfs: mballoc don't skip uninit-on-disk groups 87/37687/3
Alex Zhuravlev [Mon, 24 Feb 2020 03:57:24 +0000 (06:57 +0300)]
LU-13291 ldiskfs: mballoc don't skip uninit-on-disk groups

Those groups need no IO to initialize their buddy structures
and are the best candidates for new blocks.

Fixes: 6a7a700a1490 ("LU-12988 ldiskfs: skip non-loaded groups at cr=0/1")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ic3c5a238d8825024d7a0fec6a25e842b7ba1f100
Reviewed-on: https://review.whamcloud.com/37687
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13004 lustre: remove support for KVEC bulk descriptors 71/36971/3
Mr NeilBrown [Thu, 21 Nov 2019 04:05:56 +0000 (15:05 +1100)]
LU-13004 lustre: remove support for KVEC bulk descriptors

KVEC descriptors are no longer used or needed.
KIOV are sufficient for all needs.

This allows us to remove
  PTLRPC_BULK_BUF_KVEC
and
  PTLRPC_BULK_BUF_KIOV
flags - the distinction no longer exists.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic3a6ec942b60a05c7ce6c5b05659700e1399d0b9
Reviewed-on: https://review.whamcloud.com/36971
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13026 lnet: MR selection of gateway ni 13/36913/6
Amir Shehata [Fri, 7 Feb 2020 19:16:23 +0000 (11:16 -0800)]
LU-13026 lnet: MR selection of gateway ni

Multi-Rail gateways can only have 1 route pointing to them. Use
the MR selection algorithm to get the best lpni on the MR
gateway to use.

Using the selection algorithm to find the gateway NI allows us
to use the standard MR criteria: health, preference and credits.
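As a rough illustration of selecting by "health, preference and credits", here is a hypothetical comparator plus a linear scan that keeps the best candidate. The struct, its fields, and the assumed priority order (health first, then preference, then credits) are illustrative assumptions, not LNet's actual peer NI selection code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a gateway peer NI; not LNet's struct. */
struct demo_lpni {
	int  health;     /* higher is healthier */
	bool preferred;  /* configured as a preferred NI */
	int  credits;    /* available transmit credits */
};

/* True if a beats b. Assumed priority: health, preference, credits. */
static bool lpni_better(const struct demo_lpni *a, const struct demo_lpni *b)
{
	if (a->health != b->health)
		return a->health > b->health;
	if (a->preferred != b->preferred)
		return a->preferred;
	return a->credits > b->credits;
}

/* Return the best of n candidates, or NULL when n == 0. */
static const struct demo_lpni *select_lpni(const struct demo_lpni *nis,
					   size_t n)
{
	const struct demo_lpni *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (best == NULL || lpni_better(&nis[i], best))
			best = &nis[i];
	return best;
}
```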

Test-parameters: trivial

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I65526c6b64a5702734949c9a583a7558614ceae2
Reviewed-on: https://review.whamcloud.com/36913
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13025 lnet: pick healthiest peer net 12/36912/6
Amir Shehata [Wed, 20 Nov 2019 03:40:34 +0000 (19:40 -0800)]
LU-13025 lnet: pick healthiest peer net

When iterating over the peer nets, select the healthiest one.
A node might be able to reach a peer over multiple nets, and therefore
the health of these peer nets must be considered.

Test-parameters: trivial

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I155888dca358627fcb63c2ed0e51114bc49a9ff1
Reviewed-on: https://review.whamcloud.com/36912
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12678 lnet: remove lnet_me_alloc/lnet_me_free 10/36910/5
Mr NeilBrown [Tue, 3 Dec 2019 23:35:02 +0000 (10:35 +1100)]
LU-12678 lnet: remove lnet_me_alloc/lnet_me_free

These functions are simple wrappers that do not improve
readability, so remove them.

Move the DEBUG messages to the places where allocation happens.  This
introduces only a tiny amount of code duplication.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ieb40f8a7547cba30b05c1e5e526c762e354f3c47
Reviewed-on: https://review.whamcloud.com/36910
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13040 lmv: Pool name string handling 08/36908/13
Shaun Tancheff [Thu, 20 Feb 2020 17:11:14 +0000 (11:11 -0600)]
LU-13040 lmv: Pool name string handling

KASAN picked up a case where pool_name may not be null
terminated.

There are a few cases where strncpy is used; however, strncpy does
not guarantee that the target string is null terminated. The preference
is to use strlcpy.
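The difference is easy to demonstrate in userspace. Because strlcpy() is not available in every C library, the sketch below uses a hypothetical `demo_strlcpy()` written to the same contract as the kernel's strlcpy(): copy at most size - 1 bytes, always terminate when size > 0, and return strlen(src) so that truncation can be detected.

```c
#include <string.h>

/* Same contract as the kernel's strlcpy(): always NUL-terminates
 * (when size > 0) and returns strlen(src), so a return value >= size
 * means the copy was truncated. */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len >= size ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

With a source at least as long as the destination, strncpy() fills every byte and leaves no terminator, while demo_strlcpy() truncates and still terminates.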

Cray-bug-id: LUS-8229
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I414e955b5964c24b061edd76ed9e64a8985c537d
Reviewed-on: https://review.whamcloud.com/36908
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-12775 test: reorder 'tar' command options 07/36907/7
Lai Siyao [Tue, 3 Dec 2019 11:43:28 +0000 (19:43 +0800)]
LU-12775 test: reorder 'tar' command options

'tar' in RHEL8 is stricter in command option order.

Test-Parameters: trivial \
envdefinitions=ONLY="32c" \
clientdistro=el8.1 serverdistro=el8.1 \
mdscount=2 mdtcount=4 testlist=conf-sanity

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I814203808efae4a746166abd3ba08f2bc5fce8f7
Reviewed-on: https://review.whamcloud.com/36907
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12981 lnet: Check MTU accurately 96/36796/4
Alexey Lyashkov [Tue, 19 Nov 2019 14:00:42 +0000 (17:00 +0300)]
LU-12981 lnet: Check MTU accurately

The existing MTU check does not handle the KIOV/IOV cases,
and a false positive is triggered for a very large incoming
buffer.

Cray-bug-id: LUS-7948
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: Id3497e5f63470c24b2e51703fc564b02c9516aa6
Reviewed-on: https://review.whamcloud.com/36796
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12761: tests: make version_code() accept two number versions too 75/36275/7
Oleg Drokin [Mon, 23 Sep 2019 12:39:48 +0000 (08:39 -0400)]
LU-12761: tests: make version_code() accept two number versions too

There's now a user in sanity test 103a that calls version_code with
2.6.  Andreas rightfully points out that instead of fixing the caller
we can just update the code to accept this usage.
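The idea can be sketched in C: parse a dotted version and treat missing trailing components as zero, so that "2.6" and "2.6.0" encode to the same value. The real version_code() is a shell helper in the test framework; this C function, its name reuse, and the 8-bits-per-component packing are assumptions for illustration.

```c
#include <stdlib.h>

/* Encode up to three dot-separated components as
 * (major << 16) | (minor << 8) | patch. Missing trailing
 * components count as 0, so "2.6" == "2.6.0". */
static unsigned long version_code(const char *ver)
{
	unsigned long parts[3] = { 0, 0, 0 };
	const char *p = ver;
	char *end;

	for (int i = 0; i < 3 && *p != '\0'; i++) {
		parts[i] = strtoul(p, &end, 10);
		if (end == p)		/* no digits here: stop parsing */
			break;
		p = (*end == '.') ? end + 1 : end;
	}
	return (parts[0] << 16) | (parts[1] << 8) | parts[2];
}
```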

Change-Id: I5915cd08a36946c6d26f2e231aa7a820a3eef46a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36275
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 months agoLU-12780 osp: use native kthreads for opd_pre_thread 63/36263/4
Mr NeilBrown [Wed, 23 Oct 2019 00:30:50 +0000 (11:30 +1100)]
LU-12780 osp: use native kthreads for opd_pre_thread

Rather than ptlrpc_thread, use native kthread functionality.

- provide an opt_args structure which is allocated
  and initialized before the thread is started so errors
  cannot happen in the thread.
- include a completion to synchronize startup so we can be sure
  the thread function actually runs, and so will clean up.
- use kthread_stop and kthread_should_stop to
  synchronize shutdown.

The ptlrpc_thread was not used for signaling the thread about
work-to-do, so no change is needed there.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ib0e2041da3fa4d613b17f743b18700c84a79fac2
Reviewed-on: https://review.whamcloud.com/36263
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-9859 libcfs: simplify linux-prim.c 10/35410/6
NeilBrown [Thu, 12 Dec 2019 17:28:40 +0000 (12:28 -0500)]
LU-9859 libcfs: simplify linux-prim.c

cfs_block_sigs() is never used.
cfs_clear_sigpending() is only used in lustre_lib.h so move it
to that header. Based on

Linux-commit: 99c1ffc99a570c68cef906d9763edb47b316ea1a

Change-Id: Ia0d5ecb736c4107c5a7b666bda85714d6819fbca
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/35410
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10467 ptlrpc: convert waiting in ptlrpc_hr_main() 96/37696/4
James Simmons [Tue, 25 Feb 2020 13:41:05 +0000 (08:41 -0500)]
LU-10467 ptlrpc: convert waiting in ptlrpc_hr_main()

This is a basic conditional wait. Instead of using
l_wait_condition(), use the Linux standard wait_event_idle().

Change-Id: I5c81914de003468ac20b6c65f1b6bee2d4cf6891
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37696
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-9679 osc: convert a while loop to for 06/37606/2
NeilBrown [Thu, 13 Dec 2018 00:00:38 +0000 (11:00 +1100)]
LU-9679 osc: convert a while loop to for

This loop uses 'continue' in several places,
and each one is preceded by
   ext = next_extent(ext)
which also appears at the end.
This is exactly the pattern that a 'for' loop
simplifies.  So change to a for loop.
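The shape of the change, shown on a hypothetical singly linked list of extents (`struct demo_extent`, its fields, and the "skip busy extents" condition are made up for illustration): in the while form every `continue` must advance the cursor itself, while the for form advances it in one place.

```c
#include <stddef.h>

struct demo_extent {
	int start, end;
	int busy;
	struct demo_extent *next;
};

static struct demo_extent *next_extent(struct demo_extent *ext)
{
	return ext->next;
}

/* Before: each 'continue' must repeat ext = next_extent(ext). */
static int count_idle_while(struct demo_extent *ext)
{
	int n = 0;

	while (ext != NULL) {
		if (ext->busy) {
			ext = next_extent(ext);
			continue;
		}
		n++;
		ext = next_extent(ext);
	}
	return n;
}

/* After: the advance lives in the loop header, so a 'continue'
 * can no longer forget it. */
static int count_idle_for(struct demo_extent *ext)
{
	int n = 0;

	for (; ext != NULL; ext = next_extent(ext)) {
		if (ext->busy)
			continue;
		n++;
	}
	return n;
}
```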

Linux-Commit 9083b739197b ("lustre: osc: convert a while loop to for")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Ie622690134f9b3ee829255bcf997d06289abd6e6
Reviewed-on: https://review.whamcloud.com/37606
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-9679 osc: centralize handling of PTLRPCD_SET 03/37603/2
NeilBrown [Thu, 20 Dec 2018 04:57:34 +0000 (15:57 +1100)]
LU-9679 osc: centralize handling of PTLRPCD_SET

Various places test if a given rqset is PTLRPCD_SET
and call either ptlrpcd_add_req() or ptlrpc_set_add_req()
depending on the result.

This can be unified by putting the test of PTLRPCD_SET in
ptlrpc_set_add_req(), and always calling that function.

This results in there being only one place that tests PTLRPCD_SET.
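A minimal sketch of the pattern, using hypothetical types: callers always call `set_add_req()`, and only that function checks for the sentinel value. `DEMO_PTLRPCD_SET`, the request/set structs, and the helper names are illustrative stand-ins, not the ptlrpc API.

```c
#include <stddef.h>

struct demo_req { struct demo_req *next; };
struct demo_set { struct demo_req *head; int count; };

/* Queue owned by the (hypothetical) daemon. */
static struct demo_set daemon_set;

/* Sentinel meaning "hand this request to the daemon": a special
 * pointer value, not a real set. */
#define DEMO_PTLRPCD_SET ((struct demo_set *)1)

static void queue_req(struct demo_set *set, struct demo_req *req)
{
	req->next = set->head;
	set->head = req;
	set->count++;
}

static void daemon_add_req(struct demo_req *req)
{
	queue_req(&daemon_set, req);
}

/* The single place that tests for the sentinel. */
static void set_add_req(struct demo_set *set, struct demo_req *req)
{
	if (set == DEMO_PTLRPCD_SET)
		daemon_add_req(req);
	else
		queue_req(set, req);
}
```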

Linux-Commit: 6a587cd4c705 ("lustre: centralize handling of
PTLRPCD_SET")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I879aa9ebb7e841dc2d1240a32d1c5d07e582e0b2
Reviewed-on: https://review.whamcloud.com/37603
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months agoLU-9679 osc: use overlapped() consistently. 02/37602/2
NeilBrown [Wed, 12 Dec 2018 07:20:36 +0000 (18:20 +1100)]
LU-9679 osc: use overlapped() consistently.

osc_extent_is_overlapped() open-codes exactly the test that
overlapped() performs.
So use overlapped() instead, to make the code more obviously
consistent.
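For reference, an overlap test on inclusive [start, end] ranges reduces to two comparisons. The struct and field names below are assumptions for illustration, not copied from osc_cache.c.

```c
#include <stdbool.h>

/* Hypothetical extent covering pages [start, end], inclusive. */
struct demo_ext {
	unsigned long start;
	unsigned long end;
};

/* Two inclusive ranges overlap iff each one starts no later
 * than the other one ends. */
static bool overlapped(const struct demo_ext *a, const struct demo_ext *b)
{
	return a->start <= b->end && b->start <= a->end;
}
```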

Linux-Commit: 270995b08634 ("lustre: osc: use overlapped()
consistently.")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I3a3ed2ee04343a294ae94f205f5d12be98f99bf3
Reviewed-on: https://review.whamcloud.com/37602
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months agoLU-9679 osc: remove cl_io_cancel() 97/37597/3
NeilBrown [Mon, 17 Dec 2018 02:39:10 +0000 (13:39 +1100)]
LU-9679 osc: remove cl_io_cancel()

cl_io_cancel() is never used, so remove it and various
other things that it is the only user of.

Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I6cf9b53aa7fc3379e57fa0ac0ea236ccda4ff6b7
Reviewed-on: https://review.whamcloud.com/37597
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-9679 osc: use assert_spin_locked() 96/37596/2
NeilBrown [Wed, 12 Dec 2018 03:25:30 +0000 (14:25 +1100)]
LU-9679 osc: use assert_spin_locked()

assert_spin_locked() is preferred to spin_is_locked() for affirming
that a spinlock is locked.

__osc_extent_sanity_check() is only ever called with obj already
locked, so change the check into an assertion.

Linux-Commit: a12d8284b574 ("lustre: osc_cache: use
assert_spin_locked()")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Iaae6deb5af4dec4d31893749924f211ba0489c47
Reviewed-on: https://review.whamcloud.com/37596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-9679 general: add missing spaces near punctuation 02/37402/5
Mr NeilBrown [Mon, 3 Feb 2020 03:24:19 +0000 (14:24 +1100)]
LU-9679 general: add missing spaces near punctuation

Many places in lustre fold a long string onto multiple lines, usually
at word breaks.  Sometimes the word-break has punctuation, such as
comma, colon, or period, but needs a space as well to be properly
readable.  Because the string is folded, the missing space isn't
immediately obvious in the code.

This patch adds those spaces after punctuation where it seems to be
needed, and joins the affected strings onto a single line, in accord
with current policy.
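The problem is easy to reproduce: the compiler concatenates adjacent string literals with nothing in between, so folding a message right after punctuation silently drops the space. The message text below is made up for illustration.

```c
/* Folded across two lines: the comma runs straight into "retrying". */
static const char folded_bad[] = "connection lost,"
				 "retrying";

/* Joined onto one line: the spacing is visible at a glance. */
static const char joined_good[] = "connection lost, retrying";
```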

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9e76778b1e9687bc26e85500006b4b9d9ae6f93a
Reviewed-on: https://review.whamcloud.com/37402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months agoLU-12914 mdt: mdt_prep_ma_buf_from_rep() is called twice 11/36611/3
Bruno Faccini [Wed, 30 Oct 2019 10:38:48 +0000 (11:38 +0100)]
LU-12914 mdt: mdt_prep_ma_buf_from_rep() is called twice

In some rare cases (replay of file open with O_LOV_DELAY_CREATE
when the object is found dead on the MDT), mdt_prep_ma_buf_from_rep()
can be called twice (from both mdt_reint_open() and mdt_open_by_fid())
during the same request handling.
So remove the assertion that checks whether LMV or LOV has already
been found and set in ma.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I78e0456ea59c37cab4276383c75c4fa5cc9f4829
Reviewed-on: https://review.whamcloud.com/36611
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-11668 mdd: use mdd_object_fid() instead of mdo2fid() 47/35047/5
Andreas Dilger [Wed, 21 Nov 2018 02:32:32 +0000 (19:32 -0700)]
LU-11668 mdd: use mdd_object_fid() instead of mdo2fid()

Both mdd_object_fid() and mdo2fid() helper functions are the same.
Replace mdo2fid() with the better-named mdd_object_fid(mdd_obj)
function everywhere.  Use mdd_obj_dev_name(mdd_obj) for console
error messages instead of mdd2obd_dev(mdd)->obd_name for consistency.

It would be nice to consistently use "mdd_obj" for objects (instead of
"o" or "mo" or "obj", ...) and "mdd" for devices (instead of "m"), but
that is too big to include in this patch.  Just replace them in the
few wrapper functions already affected by this patch.

Fix up whitespace and string formatting style in affected code.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6de748bada06f0f66123e4567115deb2633ebbe5
Reviewed-on: https://review.whamcloud.com/35047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12931 gnilnd: use time_after() to compare jiffies 02/36702/5
Andreas Dilger [Thu, 7 Nov 2019 06:33:55 +0000 (23:33 -0700)]
LU-12931 gnilnd: use time_after() to compare jiffies

Fix a potential bug in gnilnd where it directly compares a timeout
against jiffies instead of using time_after() to handle jiffies wrap.
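Why the direct comparison is a bug: once the counter wraps, a deadline just below the maximum compares as "later" than a current time that has already passed it. The macro below is modeled on the kernel's time_after() (which also type-checks its arguments); the `demo_` names are hypothetical.

```c
/* Wrap-safe comparison modeled on the kernel's time_after(): the
 * unsigned difference is reinterpreted as signed (relying on
 * two's-complement conversion, as the kernel does), so the result
 * stays correct across a wrap while the two values are less than
 * half the counter range apart. */
#define demo_time_after(a, b)  ((long)((b) - (a)) < 0)

/* Naive form: gives the wrong answer once the counter wraps. */
#define naive_time_after(a, b) ((a) > (b))
```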

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie4d190e9c04e807f2152b71dc28ef0b0463ebbe5
Reviewed-on: https://review.whamcloud.com/36702
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13264 osc: ensure lu_ref work atomic from osc_lock_upcall() 29/37629/2
Bruno Faccini [Wed, 19 Feb 2020 16:25:26 +0000 (17:25 +0100)]
LU-13264 osc: ensure lu_ref work atomic from osc_lock_upcall()

Since osc_lock_upcall() uses a per-cpu env via
cl_env_percpu_[get,put](), all underlying work must execute on the
same CPU, meaning that no sleep or rescheduling may occur.
So all lu_ref-related work must no longer use lu_ref_add(),
which calls might_sleep() (likely to cause rescheduling or a
CPU switch), but lu_ref_add_atomic() instead.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ide33d4c415e9e382f0bc344e2114182a1f122de6
Reviewed-on: https://review.whamcloud.com/37629
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13263 osc: use LDLM_LOCK_RELEASE() if no lu_ref added 25/37625/2
Bruno Faccini [Wed, 19 Feb 2020 14:39:19 +0000 (15:39 +0100)]
LU-13263 osc: use LDLM_LOCK_RELEASE() if no lu_ref added

In osc_ldlm_glimpse_ast(), LDLM_LOCK_PUT() is used after
LDLM_LOCK_GET() when no lu_ref has been added.
This causes an LBUG when USE_LU_REF is configured, so
change LDLM_LOCK_PUT() to LDLM_LOCK_RELEASE().

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Id522a02878f01ae565e6c2418fe8cd85c945dde9
Reviewed-on: https://review.whamcloud.com/37625
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13258 libcfs: make apply_workqueue_attrs() available for Lustre 13/37613/2
James Simmons [Mon, 17 Feb 2020 19:34:41 +0000 (14:34 -0500)]
LU-13258 libcfs: make apply_workqueue_attrs() available for Lustre

Currently Lustre work queues can run on any core, which introduces
noise on the system. The linux kernel has a function called
apply_workqueue_attrs() which allows you to control which cores
a work queue will execute on. Manually export this function so
Lustre can use it.

Test-Parameters: trivial

Change-Id: I467539cb8def7029fa9dfff2386234de5e0fe133
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37613
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13004 target: take offset into account in tgt_send_buffer 71/37571/4
Mikhail Pershin [Fri, 14 Feb 2020 09:59:05 +0000 (12:59 +0300)]
LU-13004 target: take offset into account in tgt_send_buffer

When calculating the number of pages needed, take the buffer offset
into account, because the buffer can be non-aligned after allocations
in out_read().
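The arithmetic behind the fix: a buffer that starts partway into a page can span one more page than its length alone suggests. A minimal sketch assuming a 4096-byte page; the function and macro names are hypothetical, not tgt_send_buffer()'s variables.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL

/* Pages touched by len bytes starting at byte offset 'off'.
 * Only the offset within the first page matters. */
static size_t pages_spanned(size_t off, size_t len)
{
	size_t in_page = off & (DEMO_PAGE_SIZE - 1);

	if (len == 0)
		return 0;
	return (in_page + len + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Offset-blind count: undercounts when the buffer is not aligned. */
static size_t pages_naive(size_t len)
{
	return (len + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```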

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib7c9b35d328d366a27cc553ffe2f2c5930949cf4
Reviewed-on: https://review.whamcloud.com/37571
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12251 tests: stop running sanity-flr for PPC 63/37563/3
James Nunez [Thu, 13 Feb 2020 20:29:50 +0000 (13:29 -0700)]
LU-12251 tests: stop running sanity-flr for PPC

Stop running all sanity-flr tests for PPC client
testing until we understand the failures and the sanity-pfl
test suite passes all testing for PPC clients.

Test-Parameters: trivial clientarch=ppc64 testlist=sanity-flr
Test-Parameters: testlist=sanity-flr

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Iee044e6995ed4f6cca5f6b7f92eee6b59cb7018b
Reviewed-on: https://review.whamcloud.com/37563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-12198 libcfs: always copy ioctl header back to user 59/37559/3
Dominique Martinet [Thu, 13 Feb 2020 13:36:32 +0000 (13:36 +0000)]
LU-12198 libcfs: always copy ioctl header back to user

lnetctl_get_peer_list fills in the required size in the header if the
given buffer was too small. Userspace needs this info back to grow
the buffer and try again.

Note that we only replace err on failure if err was not previously set.
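A userspace model of the two points above, with hypothetical names throughout: the handler copies the header back even when the operation fails, so the caller can read the required size, and a copy-back failure only overrides the result when nothing had failed yet. The struct layout and the use of -E2BIG as the "buffer too small" code are assumptions for illustration.

```c
#include <errno.h>
#include <string.h>

struct demo_hdr {
	size_t buf_size;   /* in: caller's buffer size; out: size needed */
};

/* Hypothetical list operation that needs 'needed' bytes. */
static int demo_get_list(struct demo_hdr *hdr, char *buf, size_t needed)
{
	if (hdr->buf_size < needed) {
		hdr->buf_size = needed;	/* tell the caller what to allocate */
		return -E2BIG;
	}
	memset(buf, 'x', needed);
	hdr->buf_size = needed;
	return 0;
}

/* Stand-in for copy_to_user(); 'fail' simulates a fault. */
static int demo_copy_out(struct demo_hdr *dst, const struct demo_hdr *src,
			 int fail)
{
	if (fail)
		return -EFAULT;
	memcpy(dst, src, sizeof(*dst));
	return 0;
}

/* Always copy the header back; keep the first error seen. */
static int demo_ioctl(struct demo_hdr *user_hdr, char *buf, size_t needed,
		      int copy_fails)
{
	struct demo_hdr khdr = *user_hdr;
	int err = demo_get_list(&khdr, buf, needed);
	int rc = demo_copy_out(user_hdr, &khdr, copy_fails);

	if (rc != 0 && err == 0)
		err = rc;
	return err;
}
```

A caller that gets -E2BIG reads the updated buf_size, grows its buffer, and retries.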

Fixes: fba98579efc4 ("LU-6202 libcfs: replace libcfs_register_ioctl with a blocking notifier_chain")
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Change-Id: I2b6e319aceeb00d488572053d27023891afe1928
Reviewed-on: https://review.whamcloud.com/37559
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13225 utils: bash completion for lfs and lctl 83/37483/12
Andreas Dilger [Sat, 8 Feb 2020 08:25:29 +0000 (01:25 -0700)]
LU-13225 utils: bash completion for lfs and lctl

Add a bash completion for "lfs" and improve completion for "lctl".
Rename the "lctl" completion script to "lustre" since the two
commands share helper routines for fsnames, pools, etc. and install
"lfs" and "lctl" symlinks to the common command file.

The completion prints available sub-commands and their options,
and for some sub-commands it completes available arguments
(e.g. mount points, pool names, and MDT/OST names).

A couple of minor changes to "lfs" and "lctl" usage messages to make
the sub-command options easier to parse.  More needs to be done to
make all sub-commands have proper long options.

There is definitely a lot more that could be added to the completions,
but this is a good start and provides a framework for the future.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie989b2ef4c0d6d8565e5c6753205bb6ed83ebbe5
Reviewed-on: https://review.whamcloud.com/37483
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Dominique Martinet <dominique.martinet@cea.fr>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13133 tests: sanity-selinux test_21{a,b} sepol update 24/37224/3
Sebastien Buisson [Tue, 14 Jan 2020 11:51:55 +0000 (20:51 +0900)]
LU-13133 tests: sanity-selinux test_21{a,b} sepol update

We need to make sure MDS receives updated sepol info from MGS.
In case of a combined MGT/MDT, directly setting fileset on the node
will mask the llog-based info retrieval mechanism. So always use
'lctl set_param -P' to set sepol value.

Test-Parameters: trivial
Test-Parameters: clientselinux testlist=sanity-selinux
Test-Parameters: clientselinux testlist=sanity-selinux
Test-Parameters: clientselinux testlist=sanity-selinux
Test-Parameters: clientselinux testlist=sanity-selinux
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaf8ff13364b9ba5f5d8b733be0247d79e05a6b3d
Reviewed-on: https://review.whamcloud.com/37224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13071 lnet: reduce log severity for health events 02/37002/2
Amir Shehata [Thu, 12 Dec 2019 19:19:48 +0000 (11:19 -0800)]
LU-13071 lnet: reduce log severity for health events

No need to print an error when the health of an interface is
reduced. Change it to debug level.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia60ade12efab732ea4b0388a3803976bf65938ab
Reviewed-on: https://review.whamcloud.com/37002
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-13004 osp: use correct page count in osp_prep_update_req 87/37587/3
Mr NeilBrown [Thu, 21 Nov 2019 03:53:59 +0000 (14:53 +1100)]
LU-13004 osp: use correct page count in osp_prep_update_req

A fix that went into patchset 3 of
 https://review.whamcloud.com/#/c/36828/3
disappeared in patchset 5.
We should restore it.

Specifically, 'page_count' should be a count of pages,
but it is currently a count of the bytes in all the pages.

Fixes: f32fbf189fab ("LU-13004 osp: use KIOV in osp_prep_update_req")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic8dcdac414d16b4f2c1c6e0367d496de7e0a8cff
Reviewed-on: https://review.whamcloud.com/37587
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12861 libcfs: Cleanup libcfs_debug_msg use of snprintf 01/36901/8
Shaun Tancheff [Fri, 31 Jan 2020 20:09:34 +0000 (14:09 -0600)]
LU-12861 libcfs: Cleanup libcfs_debug_msg use of snprintf

scnprintf returns the number of characters actually written to
the buffer (excluding the trailing NUL). snprintf returns the
number of characters that would have been written had the buffer
been large enough, which can exceed the buffer size.

Prefer scnprintf when the result is used as the number
of bytes in a buffer.

Use snprintf when the result is used for sizing or resizing
a buffer.
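
The difference matters when return values are accumulated as an
offset. The sketch below is a userspace illustration only:
demo_scnprintf() is a hypothetical stand-in for the kernel's
scnprintf(), built on vsnprintf(), and demo_append_fields() is an
invented caller. Because each call adds only what was really stored,
'size - used' never underflows even when the output is truncated.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): returns the
 * number of characters actually stored in buf, excluding the trailing NUL.
 */
static int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	/* vsnprintf() reports the length it wanted; clamp to what fit. */
	return (size_t)n < size ? n : (int)(size - 1);
}

/*
 * Hypothetical caller: append two fields. Each return value is the number
 * of bytes really written, so 'used' never exceeds size - 1.
 */
static size_t demo_append_fields(char *buf, size_t size)
{
	size_t used = 0;

	used += demo_scnprintf(buf + used, size - used, "%s", "first ");
	used += demo_scnprintf(buf + used, size - used, "%s", "second");
	return used;
}
```

With an 8-byte buffer, plain snprintf() would report 6 + 6 = 12 for
the same two calls, and the second 'size - used' would wrap around.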

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8c4b8f7dcc081f8b9dfffc35059011172be2e091
Reviewed-on: https://review.whamcloud.com/36901
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12191 utils: Make "lctl list_param" read exact path under sysfs tree 52/36852/5
Sonia Sharma [Tue, 4 Feb 2020 18:34:02 +0000 (13:34 -0500)]
LU-12191 utils: Make "lctl list_param" read exact path under sysfs tree

"lctl list_param -R" currently checks for the param_name
in the path and reads the sysfs tree under that. But this can
give erroneous results, as in the following example:

For a path like /sys/fs/lnet/net/o2ib1/ib0, the command
"lctl list_param -R" doesn't go down the "net" tree
because it matches "net" with "lnet" and just stops
there.

This patch changes how param_name is matched in the
path: in the example above, instead of checking for
"net", it checks for "/net". The change is made in
param_display() in lustre/utils/lustre_cfg.c.
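
The matching change can be sketched in isolation. find_component() is
a hypothetical helper, not the code in param_display(): it simply
searches for "/" followed by the parameter name, so "net" no longer
matches inside "lnet".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: locate name in path only where it starts a path
 * component, i.e. where it is preceded by '/'. Returns NULL if absent.
 */
static const char *find_component(const char *path, const char *name)
{
	char pattern[256];
	int n = snprintf(pattern, sizeof(pattern), "/%s", name);

	if (n < 0 || (size_t)n >= sizeof(pattern))
		return NULL;	/* name too long for this sketch */
	return strstr(path, pattern);
}
```

Note that "/net" would still match a component such as "/network";
a stricter variant would also require the match to be followed by
'/' or the end of the string.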

Change-Id: Ieb3ad0d1248eee2192246ff5c4d77a71d87dc446
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36852
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13005 lnet: eq: discard struct lnet_handle_eq 41/36841/5
Mr NeilBrown [Wed, 20 Nov 2019 00:16:34 +0000 (11:16 +1100)]
LU-13005 lnet: eq: discard struct lnet_handle_eq

The Portals API uses a cookie 'handle' to identify an EQ.  This is
appropriate for a user-space API for objects maintained by the kernel,
but it brings no value when the API client and implementation are both
in the kernel, as is the case with Lustre and LNet.

Instead of using a 'handle', a pointer to the 'struct lnet_eq' can be
used.  This object is not reference counted and is always freed
correctly, so there can be no case where the cookie becomes invalid
while it is still held.

So use 'struct lnet_eq *' directly instead of having indirection
through a 'struct lnet_handle_eq'.
Also:
 - have LNetEQAttach() return the pointer, using ERR_PTR() to return
   errors.
 - discard ln_eq_containers and don't store the eq therein.
   This means we don't free any eq that has not already been freed,
   but every eq that is allocated is properly freed, so that is not
   a problem.
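
Returning the object pointer with errors encoded via ERR_PTR() is a
standard kernel idiom (<linux/err.h>). The sketch below re-implements
ERR_PTR()/IS_ERR()/PTR_ERR() in userspace so it can run standalone;
struct demo_eq and demo_eq_alloc() are hypothetical stand-ins, not
the LNet API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace re-implementation of the kernel's ERR_PTR idiom: small
 * negative errno values map to the top page of the address space,
 * which never holds a valid object.
 */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct lnet_eq. */
struct demo_eq {
	unsigned int count;
};

/* Hypothetical allocator: return the object itself, or an encoded errno. */
static struct demo_eq *demo_eq_alloc(unsigned int count)
{
	struct demo_eq *eq;

	if (count == 0)
		return ERR_PTR(-EINVAL);
	eq = malloc(sizeof(*eq));
	if (!eq)
		return ERR_PTR(-ENOMEM);
	eq->count = count;
	return eq;
}
```

Callers then test IS_ERR() once instead of looking up a handle, and
no cookie can go stale because the caller holds the object itself.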

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0d6e5b654e39e749b39d46f68d0fb3e47a3256e9
Reviewed-on: https://review.whamcloud.com/36841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12780 llite: avoid ptlrpc_thread for ll_statahead_thread 59/36259/6
Mr NeilBrown [Wed, 23 Oct 2019 00:30:49 +0000 (11:30 +1100)]
LU-12780 llite: avoid ptlrpc_thread for ll_statahead_thread

Instead of ptlrpc_thread use more direct interfaces.

- Instead of waiting for thread startup to complete, perform
  the startup before starting the thread.
- as nothing waits for the thread to finish we cannot use
  kthread_should_stop().  Instead, set the task pointer
  sai_task to NULL when the thread is finishing up.
- As we don't use kthread_should_stop(), we can safely do cleanup
  in the thread, because it is sure to run.
- use wake_up_process to signal the thread that there
  is work to do.
- the wake_up that is currently at the end of sa_put() becomes
  a little more complicated and is moved to after the one place
  where sa_put() is called.

Change-Id: If694dafc6864348fe5203a4935f4c128ce5ff255
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/36259
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12780 llite: don't use ptlrpc_thread for sai_agl_thread 58/36258/6
Mr NeilBrown [Wed, 23 Oct 2019 00:30:48 +0000 (11:30 +1100)]
LU-12780 llite: don't use ptlrpc_thread for sai_agl_thread

Instead of ptlrpc_thread use native kthread functionality.

- instead of waiting for the thread to start up, perform
  all early initialization before starting the thread,
  and clean up after the thread is stopped.
- use kthread_stop()/ kthread_should_stop() to negotiate
  shutdown.
- wake_up_process to tell the thread if there is more work
  to do.  The thread sets current->state to TASK_IDLE before
  checking, so that if it gets the wakeup, the 'schedule()'
  call won't block.
  We clear ->sai_agl_task under a spinlock, from which it is
  also woken, to avoid races.

Linux-commit: c044fb0f835c

Change-Id: I73294dd2f28087f56c3463ecfad1a8b32a44b2d7
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/36258
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10467 lfsck: use wait_event_idle() 10/37610/3
Mr NeilBrown [Mon, 17 Feb 2020 03:46:54 +0000 (14:46 +1100)]
LU-10467 lfsck: use wait_event_idle()

This l_wait_event() call is equivalent to the more standard
wait_event_idle().
So switch over to wait_event_idle().

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I8e13360a40dd1eec740f597d649c0f230533eb3d
Reviewed-on: https://review.whamcloud.com/37610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10467 ldlm: use wait_event_idle() instead of l_wait_event 09/37609/2
Mr NeilBrown [Mon, 17 Feb 2020 03:45:31 +0000 (14:45 +1100)]
LU-10467 ldlm: use wait_event_idle() instead of l_wait_event

This l_wait_event() is equivalent to wait_event_idle() which is now
supported in lustre.  So switch over to it.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If1ee81a0d562516534665d049fb24c1f39b59b95
Reviewed-on: https://review.whamcloud.com/37609
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13254 mdt: clear mti_mdt in mdt_thread_info_fini() 92/37592/2
Mikhail Pershin [Sat, 15 Feb 2020 17:09:31 +0000 (20:09 +0300)]
LU-13254 mdt: clear mti_mdt in mdt_thread_info_fini()

Clear mti_mdt when finalizing mdt_thread_info to prevent
its reuse by another handler later. This usually happens
in mdt_lvbo_fill/update(), which takes the thread info as is,
without initialization, because at that point it is not
clear whether it was already initialized or not. So mti_mdt may
be used there while still set by some other handler for a
different MDT, or even hold a stale pointer.
Meanwhile, there is no need to use any mdt_thread_info values
like mti_mdt in mdt_lvbo_fill(), because the MDT device can be
taken from the lock namespace, and the only mdt_thread_info
fields used there are temporary storage for a FID and a lu_buffer.

This patch zeroes mti_mdt when the thread info is finalized, and
also removes the uses of info->mti_mdt from mdt_lvbo_fill/update(),
replacing them with the MDT taken from the lock namespace.
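
The two halves of the fix can be sketched with heavily simplified,
hypothetical structures (not the real MDT types): fini clears the
device pointer, and the lvbo-style callback derives the device from
the lock namespace, using the thread info only as scratch space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not the real MDT structures. */
struct demo_mdt {
	int index;
};

struct demo_namespace {
	struct demo_mdt *ns_mdt;	/* device that owns the lock namespace */
};

struct demo_thread_info {
	struct demo_mdt *mti_mdt;	/* set by the request handler */
	int mti_scratch;		/* temporary storage, e.g. for a FID */
};

static void demo_info_init(struct demo_thread_info *info, struct demo_mdt *mdt)
{
	info->mti_mdt = mdt;
}

/* Clear the pointer so a later user cannot pick up a stale device. */
static void demo_info_fini(struct demo_thread_info *info)
{
	info->mti_mdt = NULL;
}

/*
 * lvbo_fill-style callback: the info may never have been initialized for
 * this request, so take the device from the namespace, not info->mti_mdt.
 */
static int demo_lvbo_fill(struct demo_thread_info *info,
			  const struct demo_namespace *ns)
{
	info->mti_scratch = ns->ns_mdt->index;
	return info->mti_scratch;
}
```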

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib350093f0b70c777932c056b34cb56a9702b650d
Reviewed-on: https://review.whamcloud.com/37592
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10467 mdc: change ssleep to msleep_interruptible 88/37488/3
James Simmons [Mon, 17 Feb 2020 16:31:47 +0000 (11:31 -0500)]
LU-10467 mdc: change ssleep to msleep_interruptible

During review of the mdc wait_idle* changes for mdc_getpage()
it was pointed out that the use of ssleep() prevents the code
from being interruptible. Change ssleep to msleep_interruptible()
to allow breaking out of the sleep if the application receives
a signal such as SIGINT.

Change-Id: I2fcb90ecdd6f2c4f2ee6fbc54d253622e8beee29
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37488
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12811 ptlrpc: pass buffer size to the swabbing functions 35/36435/9
Emoly Liu [Mon, 23 Dec 2019 02:32:31 +0000 (10:32 +0800)]
LU-12811 ptlrpc: pass buffer size to the swabbing functions

By adding a separate rmf_swab_len() function pointer to
req_msg_field, the buffer size can be passed to the swabbing
functions, e.g. lustre_swab_fiemap() in this patch, to avoid
invalid memory access, especially for small buffers.
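
The idea of bounding a swab by the received buffer size can be
sketched as follows. struct demo_map is a hypothetical layout only
loosely modelled on struct fiemap (not the Lustre wire format), and
__builtin_bswap32/64 are GCC/Clang builtins. The extent count is
capped by what actually fits in buf_len, whatever the sender claims.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout loosely modelled on struct fiemap; not the real ABI. */
struct demo_extent {
	uint64_t logical;
	uint64_t length;
};

struct demo_map {
	uint32_t mapped_extents;	/* count claimed by the sender */
	uint32_t pad;
	struct demo_extent extents[];
};

/*
 * Swab in place, never touching more extents than fit in buf_len.
 * Returns the number of extents swabbed, or -1 if even the header
 * does not fit.
 */
static int demo_swab_map(struct demo_map *map, size_t buf_len)
{
	size_t fit;
	uint32_t i, count;

	if (buf_len < sizeof(*map))
		return -1;
	map->mapped_extents = __builtin_bswap32(map->mapped_extents);
	fit = (buf_len - sizeof(*map)) / sizeof(struct demo_extent);
	count = map->mapped_extents < fit ? map->mapped_extents : (uint32_t)fit;
	for (i = 0; i < count; i++) {
		map->extents[i].logical = __builtin_bswap64(map->extents[i].logical);
		map->extents[i].length = __builtin_bswap64(map->extents[i].length);
	}
	return (int)count;
}
```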

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I997e6a828f2f1cdfdb8a5fa241fa43ca0ae3677e
Reviewed-on: https://review.whamcloud.com/36435
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
4 months agoLU-11915 tests: add debugging to conf-sanity test_115 48/37548/6
Andreas Dilger [Wed, 12 Feb 2020 06:48:36 +0000 (23:48 -0700)]
LU-11915 tests: add debugging to conf-sanity test_115

After updating the e2fsprogs build version to 1.45.2.wc2, this
test is no longer being skipped, and it fails to mount the
newly-formatted OST0000 due to errors registering with the MGS
(target index already in use).  Since the MDS+MGS was just
reformatted, that doesn't make sense.

Continue to skip this test until we understand why it is failing,
but use ALWAYS_EXCEPT instead of blaming the e2fsprogs version.

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I59c689763481c4fc3677ca1807101de09599bb77
Reviewed-on: https://review.whamcloud.com/37548
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10073 tests: skip test smoke for PPC 50/37450/2
James Nunez [Wed, 5 Feb 2020 22:49:38 +0000 (15:49 -0700)]
LU-10073 tests: skip test smoke for PPC

The lnet-selftest test smoke fails consistently for
PPC client testing.  Thus, stop running this test until
we understand the failure; add smoke to the ALWAYS_EXCEPT
list.

Test-Parameters: trivial
Test-Parameters: clientarch=ppc64 testlist=lnet-selftest
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I090ec05d7ad934bb4c68e976572adb29eb29a676
Reviewed-on: https://review.whamcloud.com/37450
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13186 tests: stop running tests for PPC clients 97/37397/10
James Nunez [Sun, 2 Feb 2020 01:24:56 +0000 (18:24 -0700)]
LU-13186 tests: stop running tests for PPC clients

Stop running tests that fail consistently when testing PPC
clients by adding them to the ALWAYS_EXCEPT list. These include:

sanity-hsm tests
(LU-12251) 1a 1b 1d 1e 12c 12f 12g 12h 12m 12n 12o 12p 12q
21 22 23 24a 24b 24d 24e 24f 25b 30c 37 57 58 90 110b 111b
113 222b 222d 228 260a 260b 260c
(LU-12252) 220A 220a 221 222a 222c 223a 223b 224A 224a 226
227 600 601 602 603 604 605

sanity-pfl tests
(LU-13186) 14
(LU-13205) 16a
(LU-13207) 16b
(LU-13215) 17

Test-Parameters: trivial
Test-Parameters: clientarch=ppc64 testlist=sanity-pfl,sanity-hsm
Test-Parameters: clientarch=ppc64 testlist=sanity-pfl,sanity-hsm
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I847a8121d2675b9671bc9a39c4f6ccff67b208fa
Reviewed-on: https://review.whamcloud.com/37397
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13226 ldiskfs: Add support for Ubuntu eoan 5.3 54/37554/2
Shaun Tancheff [Wed, 12 Feb 2020 20:19:09 +0000 (14:19 -0600)]
LU-13226 ldiskfs: Add support for Ubuntu eoan 5.3

Ubuntu eoan kernel is close enough to 5.4.7+ mainline to
use the patch series directly.
Update the configure script to select it.

Test-Parameters: trivial
Cray-bug-id: LUS-8485
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iadb9b87a153a88846399d91699c972c72a5e1e7a
Reviewed-on: https://review.whamcloud.com/37554
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13232 tests: add stack_trap to clean up sanity 160j 50/37550/4
James Nunez [Wed, 12 Feb 2020 19:00:56 +0000 (12:00 -0700)]
LU-13232 tests: add stack_trap to clean up sanity 160j

When sanity test 160j fails at any point in the test before
clean up, a client can be left with no file system mounted
or the second file system mount could be left mounted.  We
need to call stack_trap after each of these commands to
clean up the mount points in case of the test failing.

Test-Parameters: trivial
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I631cc2bb2d664a0cdcfe5942d16cd1d011a822ef
Reviewed-on: https://review.whamcloud.com/37550
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-9859 libcfs: rename cfs_cpt_table to cfs_cpt_tab 19/37519/2
NeilBrown [Mon, 10 Feb 2020 16:22:02 +0000 (11:22 -0500)]
LU-9859 libcfs: rename cfs_cpt_table to cfs_cpt_tab

The variable "cfs_cpt_table" has the same name as
the structure "struct cfs_cpt_table".
This makes it hard to use #define to make one disappear
on a uni-processor build, but keep the other.
So rename the variable to cfs_cpt_tab.

Linux-commit: 457d63ea5c1aa81fe0b9a66a77a2282856b88983

Test-Parameters: trivial

Change-Id: I77cc6694183df2485974c8a962a5766a905fb5f9
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/37519
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10756 ptlrpc: fix IMP_CLOSED state is being never set 05/37405/4
Mikhail Pershin [Mon, 3 Feb 2020 09:03:59 +0000 (12:03 +0300)]
LU-10756 ptlrpc: fix IMP_CLOSED state is being never set

Commit cf78502e48d checks the new state for the IMP_CLOSED value
instead of the import's current state, so instead of keeping the
import closed, it prevents the import state from ever being set
to IMP_CLOSED.

This patch restores the original check, which keeps the import
closed by checking its current state.
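
The difference between the two checks can be sketched with a
hypothetical state enum and setter (demo names, not the ptlrpc
functions): testing the requested state refuses to close the
import, while testing the current state keeps a closed import
closed.

```c
#include <assert.h>

/* Hypothetical stand-ins for the import states and structure. */
enum demo_imp_state {
	DEMO_IMP_CLOSED = 1,
	DEMO_IMP_NEW,
	DEMO_IMP_DISCON,
	DEMO_IMP_FULL,
};

struct demo_import {
	enum demo_imp_state imp_state;
};

/* Buggy variant: tests the *new* state, so a close request is ignored. */
static void demo_set_state_buggy(struct demo_import *imp,
				 enum demo_imp_state state)
{
	if (state != DEMO_IMP_CLOSED)
		imp->imp_state = state;
}

/* Fixed variant: tests the *current* state; once closed, stays closed. */
static void demo_set_state(struct demo_import *imp, enum demo_imp_state state)
{
	if (imp->imp_state != DEMO_IMP_CLOSED)
		imp->imp_state = state;
}
```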

Fixes: cf78502e48d ("LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7df2798f09ce7023381c03957adf530da4149c2d
Reviewed-on: https://review.whamcloud.com/37405
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13190 mds: send mbo_max_mdsize in open intent reply 00/37400/6
Alex Zhuravlev [Sun, 2 Feb 2020 13:45:29 +0000 (16:45 +0300)]
LU-13190 mds: send mbo_max_mdsize in open intent reply

 - the client sends an open|create intent before it has connected
   to an OST, so cl_default_mds_easize is still 0 from initialization
 - the MDS replies without the UPDATE bit in the LDLM lock, but with EA
   (the MDS doesn't send OBD_MD_FLMODEASIZE and mbo_max_mdsize back)
 - the client's cl_default_mds_easize is still 0
 - the client sends a getattr intent with a 0-size buffer for EA
 - the MDS replies with a LAYOUT lock, but an empty EA due to the
   0-size buffer
 - the client sets its local layout to EMPTY
 - all subsequent I/O fails with -EBADF

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iadd5595d956f0469e3916cdc1cca2ac8f802a149
Reviewed-on: https://review.whamcloud.com/37400
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12722 target: disable recovery for local clients 25/36025/38
Alexey Zhuravlev [Mon, 9 Sep 2019 14:00:05 +0000 (17:00 +0300)]
LU-12722 target: disable recovery for local clients

when a client is running on a server node, the local
services can't rely on that client in the context of
recovery: such a client dies with the node, can't replay
requests and states, and the restarting server then has to
wait until recovery expires, which doesn't make any sense.

so the servers should recognize local clients and exclude
them from recovery (i.e. don't make them part of last_rcvd).

for the purpose of local testing, a special mount option
"local_recov" has been added to {MDS|OST}_MOUNT_OPTS in
tests/cfg/local.sh to keep recovery testing working when
everything is running within a single node.

Signed-off-by: Alexey Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4cb906c44c1192933f7d77dc782160e426e9efde
Reviewed-on: https://review.whamcloud.com/36025
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12280 quota: add notify grace 17/36017/8
Hongchao Zhang [Thu, 10 Oct 2019 21:06:05 +0000 (17:06 -0400)]
LU-12280 quota: add notify grace

Add an option to be notified when the quota is over the soft
limit, while preventing the soft limit from becoming a hard limit.

Change-Id: I01ae1266c3683198b82af7bad119db280c1e3a07
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36017
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-9859 libcfs: remove unnecessary cfs_block_allsigs() calls 50/35350/15
NeilBrown [Mon, 10 Feb 2020 14:06:31 +0000 (09:06 -0500)]
LU-9859 libcfs: remove unnecessary cfs_block_allsigs() calls

Threads started by kthread_run() ignore all signals,
as kthreadd() calls ignore_signals(), and this is
inherited by all children.
So there is no need to call cfs_block_allsigs() in functions
that are only run from kthread_run().

lnet_ping_md_unlink() is not run from a kernel thread, but
nothing in that function should be affected by signals, so
it is safe to remove the call there.

For lnet_ping() we need to manually block signals, since
LNetEQPoll() can unconditionally abort when a signal is
received.
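
Blocking signals around a call that would otherwise abort on a
signal can be illustrated in userspace. This is an analogue, not the
libcfs API: demo_with_signals_blocked() and demo_sigint_blocked() are
hypothetical, and use POSIX sigprocmask() (a multithreaded program
would use pthread_sigmask() instead). The caller's mask is restored
afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Userspace analogue: run fn() with every blockable signal blocked, then
 * restore the caller's original mask. SIGKILL and SIGSTOP cannot be
 * blocked and are silently ignored by sigprocmask().
 */
static int demo_with_signals_blocked(int (*fn)(void *), void *arg)
{
	sigset_t all, old;
	int rc;

	sigfillset(&all);
	sigprocmask(SIG_BLOCK, &all, &old);
	rc = fn(arg);
	sigprocmask(SIG_SETMASK, &old, NULL);
	return rc;
}

/* Example callback: report whether SIGINT is currently blocked. */
static int demo_sigint_blocked(void *unused)
{
	sigset_t cur;

	(void)unused;
	sigprocmask(SIG_BLOCK, NULL, &cur);	/* NULL set: query only */
	return sigismember(&cur, SIGINT);
}
```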

Linux-commit: 1b2dad1459e480028a2714439048d8a634132857

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I124dccf78a3187d5f4a31c7b76db5369aaafc369
Reviewed-on: https://review.whamcloud.com/35350
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12477 lustre: remove obsolete config checks 85/37085/16
James Simmons [Sat, 8 Feb 2020 13:39:30 +0000 (08:39 -0500)]
LU-12477 lustre: remove obsolete config checks

Remove from the Lustre kernel code all support for kernels
older than the RHEL7 3.10 kernel. This greatly simplifies the code
and makes build times much better.

Change-Id: If52091ac5249b2719b992032040ccf30cc5bf0e4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37085
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10235 mdt: mdt_create: check EEXIST without lock 80/30880/18
Dominique Martinet [Wed, 10 Jan 2018 13:08:06 +0000 (14:08 +0100)]
LU-10235 mdt: mdt_create: check EEXIST without lock

mkdir() currently gets a write lock on the parent even if the new
directory already exists.

This patch adds an initial lookup of the new directory without a DLM
lock, so that other clients do not need to cancel their DLM lock if the
"new" directory already exists; if the directory does not exist, mkdir
continues as usual.

There is a small race window in which the child could be created by
another client after our check and before the parent is locked, but
this can be detected later during the index insert.
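
The check-then-lock pattern can be sketched with a hypothetical
in-memory directory (demo names, not MDT code). A pthread rwlock
stands in for the DLM: the read lock plays the role of the cheap
lookup that does not revoke other clients' locks, and the write lock
plays the role of the exclusive parent lock. The re-check under the
write lock covers the race window described above. Build with
-pthread on older glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define DEMO_MAX_NAMES 16

/* Hypothetical in-memory directory; not the MDT data structures. */
struct demo_dir {
	pthread_rwlock_t lock;
	const char *names[DEMO_MAX_NAMES];
	int count;
	int write_locks;	/* how often the expensive lock was taken */
};

/* Caller must hold dir->lock (read or write). */
static int demo_lookup(const struct demo_dir *dir, const char *name)
{
	for (int i = 0; i < dir->count; i++)
		if (strcmp(dir->names[i], name) == 0)
			return 1;
	return 0;
}

static int demo_mkdir(struct demo_dir *dir, const char *name)
{
	int exists, rc = 0;

	/* Fast path: an existing name fails without the exclusive lock. */
	pthread_rwlock_rdlock(&dir->lock);
	exists = demo_lookup(dir, name);
	pthread_rwlock_unlock(&dir->lock);
	if (exists)
		return -EEXIST;

	pthread_rwlock_wrlock(&dir->lock);
	dir->write_locks++;
	/* Re-check: another creator may have won the race in between. */
	if (demo_lookup(dir, name))
		rc = -EEXIST;
	else if (dir->count == DEMO_MAX_NAMES)
		rc = -ENOSPC;
	else
		dir->names[dir->count++] = name;
	pthread_rwlock_unlock(&dir->lock);
	return rc;
}
```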

Performance change on two Haswell 16-core VMs with InfiniBand, mean values of
mpirun -n 8 ./mdtest -D -i 8 -I 1000

test environment | directory creation | tree creation
local, no patch  | 1725/s             | 769/s
local, patch     | 1821/s             | 788/s
remote, no patch | 1729/s             | 772/s
remote, patch    | 1687/s             | 787/s

The differences are within the noise here, where every mkdir
actually creates a new directory.

If the directories already exist, a simple stress test on four nodes
shows the intended improvement:
clush -w vm[0-3] 'seq 0 10000 |
    xargs -P 7 -I{} sh -c "(({}%3==0)) &&
        mkdir /mnt/lustre/testdir/foo 2>/dev/null ||
        stat /mnt/lustre/testdir > /dev/null"'

with patch: 10s
without patch: 19s
(the difference grows exponentially with the number of clients; without
the patch the test hangs with over 60 clients, and the exact time with
the patch was not re-measured)

Update sanityn.sh tests 43a and 45a to avoid race conditions.

Add sanityn.sh test_43j to verify the above scenario.

Test-Parameters: envdefinitions=SLOW=yes testlist=replay-vbr,replay-vbr
Change-Id: I37fc9c8ffc7ab334c0645042beda5bef01284564
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/30880
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-11597 tests: skip sanityn tests for PPC 61/37561/2
James Nunez [Thu, 13 Feb 2020 20:10:53 +0000 (13:10 -0700)]
LU-11597 tests: skip sanityn tests for PPC

Several sanityn test suite tests fail consistently when
testing PPC clients.  These tests should be skipped, by
adding them to the ALWAYS_EXCEPT list, until the failures
are understood and fixed.

Tests to skip in sanityn are
16a (LU-11597)
71a (LU-11787)

Test-Parameters: trivial clientarch=ppc64 testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I39cc9d22e8a47eb8ef59ce8d30e1b6e9aa616a9a
Reviewed-on: https://review.whamcloud.com/37561
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-11269 ptlrpc: do not expose transient IDLE state 23/37523/4
Alex Zhuravlev [Mon, 10 Feb 2020 21:06:07 +0000 (00:06 +0300)]
LU-11269 ptlrpc: do not expose transient IDLE state

Do not expose the transient IDLE state, to avoid cases where
anyone sending an RPC observes the connection in this state while
it is about to reconnect right away.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9ca89051c4176fe321262f8b2f52969c382e401e
Reviewed-on: https://review.whamcloud.com/37523
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13228 clio: mmap write when overquota 95/37495/4
Alexander Zarochentsev [Fri, 20 Dec 2019 23:19:44 +0000 (02:19 +0300)]
LU-13228 clio: mmap write when overquota

Flagging a client with the overquota flag should not
cause an mmap write access to deliver SIGBUS to the application.

Cray-bug-id: LUS-8221
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: I29d5901fa5078b5cfca40391a02531cf27efce93
Reviewed-on: https://review.whamcloud.com/37495
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13131 osc: remove redundant osc_list() helper 79/37479/4
Andreas Dilger [Fri, 7 Feb 2020 22:01:49 +0000 (15:01 -0700)]
LU-13131 osc: remove redundant osc_list() helper

The osc_list() helper function is the same as list_empty_marker(),
and we don't need both.  Remove osc_list() from the code.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I07d2a519906f52fca8c95613a14ad7389a3ebbe5
Reviewed-on: https://review.whamcloud.com/37479
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-9679 various: use list_splice and list_splice_init 57/37457/2
Mr NeilBrown [Wed, 13 Nov 2019 03:03:12 +0000 (14:03 +1100)]
LU-9679 various: use list_splice and list_splice_init

The construct
   list_add(to, from);
   list_del(from);
is equivalent to
   list_splice(from, to);
provided that 'to' has been initialized (as an empty list).
Similarly with list_del_init and list_splice_init.
There is no need to check if list_empty(from) first.

Also, looping over a list moving individual entries to
another list can more easily be done with list_splice.
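
The equivalence can be checked with a minimal userspace copy of the
kernel's circular doubly-linked list (simplified: list_del() here sets
NULL instead of the kernel's poison values). struct demo_item and
demo_collect() are hypothetical helpers added only to inspect the
result.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copy of the kernel's circular doubly-linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static inline void __list_add(struct list_head *new, struct list_head *prev,
			      struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Insert 'new' right after 'head'. */
static inline void list_add(struct list_head *new, struct list_head *head)
{
	__list_add(new, head, head->next);
}

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
	__list_add(new, head->prev, head);
}

static inline void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;	/* the kernel uses poison values */
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Join 'list' onto the front of 'head'; 'list' itself is left stale. */
static inline void list_splice(const struct list_head *list,
			       struct list_head *head)
{
	if (!list_empty(list)) {
		struct list_head *first = list->next;
		struct list_head *last = list->prev;
		struct list_head *at = head->next;

		first->prev = head;
		head->next = first;
		last->next = at;
		at->prev = last;
	}
}

/* Hypothetical list element and inspection helper. */
struct demo_item {
	int value;
	struct list_head link;
};

static int demo_collect(const struct list_head *head, int *out, int max)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head && n < max;
	     p = p->next)
		out[n++] = ((const struct demo_item *)
			    ((const char *)p - offsetof(struct demo_item, link)))->value;
	return n;
}
```

In both cases the entries end up on 'to' in their original order;
list_splice() simply avoids threading the 'to' head through the
source list first.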

These changes improve code clarity.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I710eb3bbd83c75e6c8f00b8d0a4c256ad28f9082
Reviewed-on: https://review.whamcloud.com/37457
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10391 lnet: discard lnet_sock_accept() 03/37303/4
Mr NeilBrown [Thu, 7 Nov 2019 04:02:54 +0000 (15:02 +1100)]
LU-10391 lnet: discard lnet_sock_accept()

There is no longer any important difference between
lnet_sock_accept() and kernel_accept(..., O_NONBLOCK).
So remove lnet_sock_accept().

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Iad7c91abe2359758982e3740a21c91232c919aa0
Reviewed-on: https://review.whamcloud.com/37303
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10391 lnet: use data_ready callback to trigger accept() 02/37302/6
Mr NeilBrown [Wed, 22 Jan 2020 06:16:12 +0000 (17:16 +1100)]
LU-10391 lnet: use data_ready callback to trigger accept()

Rather than blocking in lnet_sock_accept(), set up a data_ready
callback, and use that to find out when to call lnet_sock_accept()
again.

This simplifies lnet_sock_accept() (which will be removed in the
next patch), and means that we could listen on multiple
sockets, which will be useful for IPv6 support.

The code design is based on that in net/sunrpc/svcsock.c.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I3015f2f6b6d420af5c8454b6c1a99611b48e7702
Reviewed-on: https://review.whamcloud.com/37302
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>