Whamcloud - gitweb
fs/lustre-release.git
11 months ago LU-10934 tests: increase timeout for sanityn test_51b 47/38947/2
Andreas Dilger [Tue, 16 Jun 2020 06:48:01 +0000 (00:48 -0600)]
LU-10934 tests: increase timeout for sanityn test_51b

Increase the timeout for sanityn test_51b, since this is causing
intermittent test failures when stat() is run before dd finishes.

Test-Parameters: trivial testlist=sanityn env=ONLY=51b,ONLY_REPEAT=100
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieb3d50e6b534b535e8255cbbc566f053f33ebbe5
Reviewed-on: https://review.whamcloud.com/38947
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-11518 ldlm: cancel LRU improvement 61/39561/2
Vitaly Fertman [Fri, 31 Jul 2020 18:16:43 +0000 (21:16 +0300)]
LU-11518 ldlm: cancel LRU improvement

Add a @batch parameter to the LRU cancel: if at least 1 lock is
cancelled, try to cancel at least @batch locks. This functionality
will be used in later patches.

Limit the LRU cancel to 1 thread only, however, not for callers that
have the @max limit given (ELC), as the LRU may be left not fully
cleaned up.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Ide21c4a2b2209b8a721249466ea1e651c8532c8a
HPE-bug-id: LUS-8678
Reviewed-on: https://es-gerrit.dev.cray.com/157067
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/39561
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-11518 ldlm: lru code cleanup 60/39560/2
Vitaly Fertman [Fri, 31 Jul 2020 18:07:04 +0000 (21:07 +0300)]
LU-11518 ldlm: lru code cleanup

cleanup includes:
- no need for the unused locks parameter in the lru policy, better to
  take the current value right in the policy if needed;
- no need for a special SHRINKER policy, the same as the PASSED one;
- no need for a special DEFAULT policy, the same as the PASSED one;
- no need for a special PASSED policy, LRU is to be cleaned anyway
  according to LRU resize or AGED policy;

bug fixes:
- if the @min amount is given, it should not be increased by the
  number of locks exceeding the limit; the max of the two is to
  be taken instead;
- do not do ELC on enqueue if no LRU limits are reached;
- do not keep lock in LRUR policy once we have cancelled @min locks,
  try to cancel instead until we reach the @max limit if given;
- cancel locks from LRU with the new policy, if changed in sysfs;

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I84369da54f680e5fbddd28089c40d1b90722d42d
HPE-bug-id: LUS-8678
Reviewed-on: https://es-gerrit.dev.cray.com/157066
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/39560
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
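The @min bug fix listed in the LU-11518 cleanup above can be read as simple arithmetic. The following is a minimal sketch of that reading only, written as a shell function for illustration; the real logic lives in the kernel ldlm code, and the function name and its three parameters are hypothetical.

```shell
# Hypothetical illustration of the @min rule; not the ldlm implementation.
# Prints how many locks to cancel, given the requested minimum, the number
# of unused locks in the LRU, and the LRU limit.
lru_cancel_target() {
	local min=$1      # caller-requested minimum (@min)
	local unused=$2   # unused locks currently in the LRU (assumed input)
	local limit=$3    # LRU size limit (assumed input)
	local over=0

	# Number of locks exceeding the limit, never negative.
	if [ "$unused" -gt "$limit" ]; then
		over=$((unused - limit))
	fi

	# Take the larger of the two values rather than their sum.
	if [ "$min" -gt "$over" ]; then
		echo "$min"
	else
		echo "$over"
	fi
}
```

With @min of 10, 130 unused locks and a limit of 100, this prints 30, where adding the two values would have given 40.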
11 months ago LU-13645 ldlm: re-process ldlm lock cleanup 05/39405/3
Vitaly Fertman [Thu, 16 Jul 2020 11:28:03 +0000 (14:28 +0300)]
LU-13645 ldlm: re-process ldlm lock cleanup

For extent locks:
- rescan logic is not needed for group locks, it works well without it
- @err is not needed for ldlm_extent_compat_queue(), it is always set
  to @compat, remove it and set outside
- LDLM_FL_NO_TIMEOUT flag could be set once outside of
  ldlm_extent_compat_queue()
- add ldlm_resource_insert_lock_before();

For inodebits:
- glimpse expects ELDLM_LOCK_ABORTED to fill data properly on client
  side, do not return ELDLM_LOCK_WOULDBLOCK from
  ldlm_process_inodebits_lock()
- regular enqueue also does not have logic for ELDLM_LOCK_WOULDBLOCK,
  restore the original ELDLM_LOCK_ABORTED here as well for simplicity
- check for DOM lock in mdt_dom_client_has_lock() according to open
  flags, not for LCK_PW always;

Also, move sanity 82 to sanityn as after LU-9964 it is to be run on
two mount points.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I6d0b230f04aaa497db5b036b4ed9afe5d7f418b0
HPE-bug-id: LUS-8987
Reviewed-on: https://review.whamcloud.com/39405
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-8522 tests: improve slabinfo accuracy when slub is used. 79/39579/3
Mr NeilBrown [Thu, 6 Aug 2020 04:20:27 +0000 (14:20 +1000)]
LU-8522 tests: improve slabinfo accuracy when slub is used.

The "active_objs" count in slabinfo is never very accurate, but when
CONFIG_SLUB is being used it is even less accurate than with
CONFIG_SLAB.

If CONFIG_SLUB_DEBUG is also enabled, it is possible to shrink the
cache and remove this inaccuracy by writing '1' to
   /sys/kernel/slab/$CACHENAME/shrink

So add appropriate code to sanity.sh so that when the 'shrink' file is
available, it is used.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I36179d4609b5e4bcd1de00f0b5921c9c6bed72b0
Reviewed-on: https://review.whamcloud.com/39579
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
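The approach described in the LU-8522 message above can be sketched as follows. This is not the code added to sanity.sh; the function names are hypothetical, and the optional second argument exists only so the helpers can be tried against a scratch directory instead of the real /sys/kernel/slab and /proc/slabinfo.

```shell
# Hypothetical helpers; not taken from sanity.sh.

# Write '1' to the cache's 'shrink' file when it is available, and
# silently do nothing when it is not (e.g. the kernel does not provide it).
shrink_slab_cache() {
	local cache=$1
	local slab_root=${2:-/sys/kernel/slab}
	local shrink="$slab_root/$cache/shrink"

	[ -w "$shrink" ] || return 0

	echo 1 > "$shrink"
}

# Print the active_objs column (second field) for one cache from a
# slabinfo-format file.
slab_active_objs() {
	local cache=$1
	local slabinfo=${2:-/proc/slabinfo}

	awk -v name="$cache" '$1 == name { print $2 }' "$slabinfo"
}
```

A test would call shrink_slab_cache for the cache of interest before reading its active_objs value.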
11 months ago LU-11310 ldiskfs: Fix suse15/ext4-max-dir-size.patch 71/39571/3
Mr NeilBrown [Wed, 5 Aug 2020 01:34:21 +0000 (11:34 +1000)]
LU-11310 ldiskfs: Fix suse15/ext4-max-dir-size.patch

The ext4-max-dir-size patch for suse15 added a 'max_dir_size' sysfs
attribute with an incorrect implementation.  The implementation is
identical to that for 'max_dir_size_kb', so setting or reading
'max_dir_size' will result in incorrect values.  This causes
sanity test 129 to fail.

So add a suitable implementation for max_dir_size

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I591259ed668bc828c3a7caa6a55d0de2b0d72797
Reviewed-on: https://review.whamcloud.com/39571
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13773 tests: use TESTLOG_PREFIX in run_one_logged 52/39552/2
James Nunez [Thu, 30 Jul 2020 22:33:32 +0000 (16:33 -0600)]
LU-13773 tests: use TESTLOG_PREFIX in run_one_logged

TESTLOG_PREFIX is defined and exported in init_test_env()
in test-framework.sh.  This environment variable is defined
as $LOGDIR/$TESTSUITE. TESTLOG_PREFIX should be used in the
definitions of test_log and zfs_debug_log.  Since the logs
created in run_one_logged() don't use the defined prefix,
this could lead to differences in the naming of the dmesg,
debug_log and test_log logs.

Let's use TESTLOG_PREFIX in the definitions of test_log and
zfs_debug_log so that changes to the prefix are reflected in
all logs.

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I14dbe0469b7a6627d63679103c235b97f0e42b67
Reviewed-on: https://review.whamcloud.com/39552
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
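The idea behind the LU-13773 change above, deriving every log name from one shared prefix, can be sketched as below. Only TESTLOG_PREFIX, LOGDIR and TESTSUITE come from the commit message; the helper name, the default values and the suffix layout are assumptions and are not copied from test-framework.sh.

```shell
# Illustrative only: defaults and suffix layout are assumptions.
LOGDIR=${LOGDIR:-/tmp/test_logs}
TESTSUITE=${TESTSUITE:-sanity}
TESTLOG_PREFIX=${TESTLOG_PREFIX:-$LOGDIR/$TESTSUITE}

# Build a per-test log path from the shared prefix, so that changing
# TESTLOG_PREFIX changes every log name consistently.
testlog_path() {
	local testname=$1
	local kind=$2    # e.g. test_log, debug_log, dmesg

	echo "$TESTLOG_PREFIX.$testname.$kind.log"
}
```

Because each log kind goes through the same helper, a different prefix is reflected in all of the resulting names.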
11 months ago LU-13819 build: Update ZFS version to 0.8.4 07/39507/2
Nathaniel Clark [Fri, 24 Jul 2020 19:01:47 +0000 (15:01 -0400)]
LU-13819 build: Update ZFS version to 0.8.4

Changes from 0.8.3

* Add missing zfs_refcount_destroy() in key_mapping_rele() #10246
* Linux 5.7 compat: blk_alloc_queue() #10181 #10187
* Prefix struct rangelock #9534
* Fix icp include directories for in-tree build #10021
* ICP: gcm-avx: Support architectures lacking the MOVBE
  instruction #10029
* ICP: Improve AES-GCM performance #9749
* Bugfix/fix uio partial copies #8673 #10148
* Prevent deadlock in arc_read in Linux memory reclaim
  callback #9987
* Fix infinite scan on a pool with only special allocations
  #10106 #8694
* Static symbols exported by ICP #9791
* Linux 5.6 compat: struct proc_ops #9961
* Linux 5.6 compat: timestamp_truncate() #9956 #9961
* Linux 5.6 compat: ktime_get_raw_ts64() #10052 #10064
* Linux 5.6 compat: time_t #10052 #10064
* Fix static data to link with -fno-common #9943
* zfs_get: change time format string from %k to %H #10090 #10153
* Deprecate deduplicated send streams #7887 #10117
* Fix zfs-functions packaging bug
* initramfs: Eliminate substitutions
* Delete built init scripts in make clean
* Restore :: in Makefile.am #9210
* Make init scripts depend on Makefile
* Systemd mount generator: don't fail keyload from file if
  already loaded #10103
* Systemd mount generator: Generate noauto units; add control
  properties #9649
* Systemd mount generator: Silence shellcheck warnings #9649
* Fix CONFIG_MODULES=no Linux kernel config #9887 #10063
* Linux 5.5 compat: blkg_tryget() #9745 #10072
* zfs-mount-generator: Fix escaping for / #9970
* Missed wakeup when growing kmem cache #9989
* Order zfs-import-*.service after multipathd #9863
* Avoid here-documents in systemd mount generator #9802

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I3a270376e05d466eeb8e8ba93d7c4aa0d2546ae6
Reviewed-on: https://review.whamcloud.com/39507
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13818 build: use libsnmp-dev instead of libsnmp30 06/39506/2
Minh Diep [Fri, 24 Jul 2020 17:38:04 +0000 (10:38 -0700)]
LU-13818 build: use libsnmp-dev instead of libsnmp30

Installing libsnmp-dev will pull in the correct libsnmpXX.
By depending on libsnmp-dev we can install on
Ubuntu 20.04, which uses libsnmp35.

Change-Id: Ib921ac35e06149ba88fa8e39b9a0980deb94acf2
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39506
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-9325 fld: replace simple_strto* with kstr* functions 98/39498/3
James Simmons [Sat, 25 Jul 2020 21:27:22 +0000 (17:27 -0400)]
LU-9325 fld: replace simple_strto* with kstr* functions

The fldb debugfs files use simple_strto* to parse input from the
user. simple_strto* is considered obsolete so replace it with
the equivalent kstrto* functions.

Change-Id: I6d32939152ee0d65df4ec45937d7d0be03b8274e
Test-Parameters: trivial env=ONLY=68 testlist=conf-sanity
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39498
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13791 sec: enable FS capabilities 99/39399/4
Lai Siyao [Sun, 12 Jul 2020 01:15:16 +0000 (09:15 +0800)]
LU-13791 sec: enable FS capabilities

FS capabilities are not effective because they are dropped for
non-root users for historical reasons: they used to be enforced
before operations, but now they are checked in the MDD layer only
(see mdd_fix_attr()).

Add sanity-sec 51.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3e355f5df5eab5509b5e6774dbc8b82281a34039
Reviewed-on: https://review.whamcloud.com/39399
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-9859 libcfs: don't save journal_info in dumplog thread. 94/39294/4
Mr NeilBrown [Mon, 6 Jul 2020 12:34:44 +0000 (08:34 -0400)]
LU-9859 libcfs: don't save journal_info in dumplog thread.

As this thread is started by kthread, it must have
a clean environment and cannot possibly be in a
filesystem transaction.  So current->journal_info
must be NULL, and preserving it serves no purpose.

Also change libcfs_debug_dumplog_internal() to 'static'
to make it clear that it shouldn't be called from
anywhere but this thread.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie863f792b36792600bef4fe778c46e97ebf046c3
Reviewed-on: https://review.whamcloud.com/39294
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-9859 libcfs: rename CFS_TCD_TYPE_MAX to CFS_TCD_TYPE_CNT 76/39276/2
Mr NeilBrown [Tue, 30 Jun 2020 14:37:14 +0000 (08:37 -0600)]
LU-9859 libcfs: rename CFS_TCD_TYPE_MAX to CFS_TCD_TYPE_CNT

The possible TCD types are 0, 1, 2.
So the MAX is 2.
The count of the number of types is 3.

CFS_TCD_TYPE_MAX is 3 - obviously wrong.

So rename it to CFS_TCD_TYPE_CNT.

Also there are 2 places where "3" is used rather
than the macro - fix them.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ia4ce5fdb3225494f93d1eebd9fddfc15eb2b8316
Reviewed-on: https://review.whamcloud.com/39276
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13687 llite: return -ENODATA if no default layout 00/39200/3
Andreas Dilger [Sat, 27 Jun 2020 11:14:02 +0000 (05:14 -0600)]
LU-13687 llite: return -ENODATA if no default layout

Don't return -ENOENT if fetching the default layout from the root
directory fails.  Otherwise, "lfs find" will print an error message
for every directory scanned in the filesystem:

     lfs find: /myth/tmp does not exist: No such file or directory

Fixes: 3e8fa8a7396c ("LU-11656 llite: fetch default layout for a directory")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5e082c5d425c44ca7770d3b24cbb13bb7d2540e5
Reviewed-on: https://review.whamcloud.com/39200
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13700 test: increase sanity 230o/230p wait time 19/39119/3
Lai Siyao [Mon, 22 Jun 2020 02:17:06 +0000 (10:17 +0800)]
LU-13700 test: increase sanity 230o/230p wait time

ZFS may be slow to finish dir split/merge in time, triple wait time
to avoid failure.

Test-parameters: trivial fstype=zfs testlist=sanity mdscount=2 \
mdtcount=4 env=ONLY=230,ONLY_REPEAT=30

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3d28c942ac925ea201936b53d0487d9a6bf9376c
Reviewed-on: https://review.whamcloud.com/39119
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
11 months ago LU-13688 hsm: handle in-tree executed copytools correctly 62/38962/6
Nikitas Angelinas [Wed, 17 Jun 2020 11:17:07 +0000 (04:17 -0700)]
LU-13688 hsm: handle in-tree executed copytools correctly

The Lustre test suite and HSM copytools can be invoked from either
within /usr/lib{,64}/ if they have been installed from source or from
packages, or from within the Lustre source tree, usually for
development purposes; in the latter case, the copytool process name is
prepended with an "lt-", due to being invoked via a libtool wrapper
script. The Lustre test framework relies on "libtool execute" to
distinguish between these two cases, parse the command parameters and
pass the correct process name as a parameter to utilities such as
pgrep(1), pkill(1), ps(1) and killall(1). Unfortunately, this doesn't
seem to work unless the libtool script for the copytool and the test
framework test file are in the same directory; e.g. this doesn't work
for lhsmtool_posix as its libtool script is in lustre/utils/, but the
Lustre test suite is in lustre/tests/, which doesn't allow the
"libtool execute" parsing and parameter replacing to succeed.

Fix this by determining the process name of the executed copytool
based on whether it was invoked from within the source tree or not and
using it in commands that either search for copytool processes or send
them signals by process name.

Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Cray-bug-id: LUS-8931
Change-Id: Ief7b224b793401b1a24bf9780d1df6e029f5c0d7
Reviewed-on: https://review.whamcloud.com/38962
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: nathan r <nrutman@gmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
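The fix described in the LU-13688 hsm message above amounts to picking the process name from how the copytool was invoked. A minimal sketch follows, assuming that an in-tree build can be recognised by a libtool ".libs" directory next to the wrapper script; that detection rule and the function name are assumptions, not the test framework's actual check.

```shell
# Hypothetical helper; the in-tree detection rule is an assumption.
# Prints the process name to look for: "lt-<name>" when the command is a
# libtool wrapper in a build tree, plain "<name>" otherwise.
copytool_process_name() {
	local cmd=$1
	local name=${cmd##*/}
	local dir=${cmd%/*}

	# No slash in the command: it is resolved via PATH, treat as installed.
	if [ "$dir" = "$cmd" ]; then
		echo "$name"
		return 0
	fi

	if [ -e "$dir/.libs/$name" ]; then
		echo "lt-$name"
	else
		echo "$name"
	fi
}
```

The printed name is what would then be handed to tools that search for, or signal, processes by name.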
11 months ago LU-13688 tests: remove duplicate HSM functions 61/38961/5
Nikitas Angelinas [Wed, 17 Jun 2020 11:04:45 +0000 (04:04 -0700)]
LU-13688 tests: remove duplicate HSM functions

Some HSM test framework functions exist in both test-framework.sh and
sanity-hsm.sh. Some of these are also used in PCC tests, so the
sanity-hsm.sh copy can be removed and some are used only in HSM tests,
so the test-framework.sh copy can be removed. The test-framework.sh
copies were introduced by LU-10092 which seems to have used versions
of the functions before they were updated by LU-11742, so update
kill_copytools() to the latest version.

Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Cray-bug-id: LUS-8913
Change-Id: I8101f748bfcfffb81598f7a5d2d82f2a16696e5c
Reviewed-on: https://review.whamcloud.com/38961
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13686 utils: pool_add/remove error code fix 60/38960/6
Sergey Cheremencev [Wed, 17 Jun 2020 11:01:13 +0000 (14:01 +0300)]
LU-13686 utils: pool_add/remove error code fix

jt_pool_cmd should always return an error code, even
if it failed to add/remove just one of the OSTs from the list.
Before this patch it returned the latest command result,
ignoring previous failures.

Change-Id: Ife6cefc006f061b47a1b00daf826d0d1d34fd66c
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/38960
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
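The behaviour change in the LU-13686 message above is a general pattern: when looping over several items, remember a failure instead of letting a later success overwrite it. The sketch below shows that pattern in shell; jt_pool_cmd itself is C code, the function name here is hypothetical, and keeping the first failure (rather than, say, the last one) is an assumption.

```shell
# Hypothetical illustration of the pattern; not the jt_pool_cmd code.
# run_for_each CMD ITEM...: run CMD on every item and return the first
# non-zero status seen, or 0 if every call succeeded.
run_for_each() {
	local cmd=$1
	shift
	local rc=0
	local status item

	for item in "$@"; do
		status=0
		"$cmd" "$item" || status=$?
		# Keep the first failure; a later success must not hide it.
		if [ "$rc" -eq 0 ]; then
			rc=$status
		fi
	done

	return "$rc"
}
```

All items are still processed after a failure; only the returned status changes.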
11 months ago LU-13676 tools: awk script to find unique backtraces 36/38936/2
Alex Zhuravlev [Mon, 15 Jun 2020 06:09:53 +0000 (09:09 +0300)]
LU-13676 tools: awk script to find unique backtraces

looking at backtraces from the crash utility, it's not trivial to find
the interesting ones. this simple awk script can help:

1) dump backtraces in crash using "foreach bt" command
2) cat <file-with-backtraces> | crash-find-unique-traces.awk

the output will be like:

schedule,schedule_timeout,osc_extent_wait,osc_cache_wait_range,
  osc_io_fsync_end,cl_io_end,lov_io_end_wrapper,lov_io_fsync_end,
  cl_io_end,cl_io_loop,cl_sync_file_range,ll_writepages,do_writepages,
  __writeback_single_inode,writeback_sb_inodes,wb_writeback,wb_workfn,
  process_one_work,worker_thread,kthread PIDs: 7
schedule,schedule_hrtimeout_range_clock,poll_schedule_timeout,
  do_sys_poll,__se_sys_poll PIDs: 2130
schedule,schedule_hrtimeout_range_clock,do_sigtimedwait,
  __se_sys_rt_sigtimedwait PIDs: 2251
schedule,osd_trans_stop,ofd_commitrw,tgt_brw_write,tgt_request_handle,
  ptlrpc_main,kthread PIDs: 11720
schedule,mdt_restriper_main PIDs: 12859
schedule,wb_wait_for_completion,sync_inodes_sb,sync_filesystem,
  generic_shutdown_super,kill_anon_super,deactivate_locked_super,
  cleanup_mnt,task_work_run,exit_to_usermode_loop PIDs: 15097

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I94514189f15cf559336217fddf7b665dde0c8f77
Reviewed-on: https://review.whamcloud.com/38936
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13742 llite: do not bypass selinux xattr handling 69/39569/3
Shaun Tancheff [Wed, 5 Aug 2020 14:17:03 +0000 (09:17 -0500)]
LU-13742 llite: do not bypass selinux xattr handling

Without the hint from selinux_is_enabled() to determine if selinux
is running at boot the performance fix from LU-549 to skip handling
of selinux xattrs cannot be correctly handled.

The correct path is to act as if selinux is enabled.

This fixes a bug introduced by LU-12355 that now exists in
RHEL 8.2 kernels where clients have enabled selinux.

Fixes: 39e5bfa734 ("LU-12355 llite: include file linux/selinux.h removed")
Test-Parameters: clientdistro=el8.2 serverdistro=el8.2 clientselinux testlist=sanity-selinux
Test-Parameters: clientdistro=el8.1 serverdistro=el8.1 clientselinux testlist=sanity-selinux
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I6fb5ed9ecdb79545225b5586b90509eb157a355b
Reviewed-on: https://review.whamcloud.com/39569
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-6142 lmv: make various functions static. 02/39402/2
Mr NeilBrown [Thu, 16 Jul 2020 06:30:47 +0000 (16:30 +1000)]
LU-6142 lmv: make various functions static.

Multiple functions in lmv_obd.c are only used in the same file, so they
can be made static.

Also make a few style cleanups and fix a spelling error.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I84bf4e8d485dc6a7f8811035b8a689e5e0d91455
Reviewed-on: https://review.whamcloud.com/39402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-6142 lov: make various lov_object.c function static. 85/39385/2
Mr NeilBrown [Wed, 15 Jul 2020 07:14:08 +0000 (17:14 +1000)]
LU-6142 lov: make various lov_object.c function static.

These functions in lov_object.c and lovsub_object.c are only
used in the file in which they are defined.
So mark them as static.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I958f4e850d13d2ced32772e0e66627eb40a1bf36
Reviewed-on: https://review.whamcloud.com/39385
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-6142 obdclass: make obd_psdev static 95/39395/2
Mr NeilBrown [Thu, 16 Jul 2020 03:53:39 +0000 (13:53 +1000)]
LU-6142 obdclass: make obd_psdev static

obd_psdev is only used in one file, so it can be local to that file.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ib74d78e0e72e054d5f998d54f1476216926b293b
Reviewed-on: https://review.whamcloud.com/39395
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-6142 lustre: use init_wait(), not init_waitqueue_entry() 00/39300/2
Mr NeilBrown [Tue, 7 Jul 2020 22:14:21 +0000 (08:14 +1000)]
LU-6142 lustre: use init_wait(), not init_waitqueue_entry()

init_waitqueue_entry(foo, current)

is equivalent to

    init_wait(foo)

So use the shorter version - in lustre and libcfs.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I621364d8f6b155df3f2159dfca39f252abc81c76
Reviewed-on: https://review.whamcloud.com/39300
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13740 build: update changelog for ubuntu kernel 31/39231/4
James Simmons [Mon, 27 Jul 2020 17:11:20 +0000 (13:11 -0400)]
LU-13740 build: update changelog for ubuntu kernel

With support for all the latest Linux kernels added to Lustre,
Ubuntu 20.04 LTS support for clients is already there.

Test-Parameters: trivial
Change-Id: I35916f3205bff62e2c6ef01f03725aa890c5c8e7
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39231
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
11 months ago LU-13617 tests: check client deadlock selinux 93/38793/4
Alexander Boyko [Mon, 1 Jun 2020 12:38:07 +0000 (08:38 -0400)]
LU-13617 tests: check client deadlock selinux

The patch adds test_20e to sanity-selinux. It checks client deadlock
and MDS eviction for it.

Test-Parameters: trivial testlist=sanity-selinux env=ONLY=20e
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Cray-bug-id: LUS-8924
Change-Id: If7707fa14f7307fb3a3fb2228fbd1983b55cbe6b
Reviewed-on: https://review.whamcloud.com/38793
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13759 test: make sanityn test_20 repeatable 40/39540/5
Mikhail Pershin [Thu, 30 Jul 2020 11:56:46 +0000 (14:56 +0300)]
LU-13759 test: make sanityn test_20 repeatable

- make sanityn.sh test_20 able to work with ONLY_REPEAT
  parameter by using $tdir and $tfile variable which are
  cleaned up by test framework and test can be repeated
- change sanity-dom.sh way to define sanity.sh and sanityn.sh
  parameters to allow selected tests run by using SANITY_ONLY,
  SANITYN_ONLY to choose specific tests. It supports also
  SANITY_REPEAT/SANITYN_REPEAT to repeat those tests.

Test-Parameters: trivial testlist=sanity-dom env=SANITY_ONLY=36,SANITYN_ONLY=20,SANITYN_REPEAT=50
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I314703006a8f53092daf1359f4c4694c704354d2
Reviewed-on: https://review.whamcloud.com/39540
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
11 months ago LU-13809 mdc: fix lovea for replay 68/39468/6
Alexander Zarochentsev [Thu, 18 Jun 2020 06:18:05 +0000 (09:18 +0300)]
LU-13809 mdc: fix lovea for replay

lmm->lmm_stripe_offset gets overwritten by
layout generation at server reply,
so MDT does not recognize such LOVEA as
a valid striping at open request replay.
This patch extends the LU-7008 fix by supporting
PFL layouts.

HPE-bug-id: LUS-8820
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: If28836c2fcb08620dd3dc869ddfe35147c69e711
Reviewed-on: https://review.whamcloud.com/39468
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-12275 sec: use memchr_inv() to check if page is zero. 59/39459/3
Mr NeilBrown [Tue, 21 Jul 2020 01:09:37 +0000 (11:09 +1000)]
LU-12275 sec: use memchr_inv() to check if page is zero.

memchr_inv() is the preferred way to check if a memory region is all
zeros.  It is likely faster than memcmp() as it doesn't need to read the
ZERO_PAGE into cache, or into the CPU.  It was introduced in Linux
3.2.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0a5c3d30d5db43a3f5ebb270ea66b9db2b200a9a
Reviewed-on: https://review.whamcloud.com/39459
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
11 months ago LU-13759 dom: lock cancel to drop pages 01/39401/7
Mikhail Pershin [Wed, 15 Jul 2020 05:12:55 +0000 (08:12 +0300)]
LU-13759 dom: lock cancel to drop pages

Prevent stale pages after lock cancel by creating
cl_page connection for read-on-open pages.

This reverts 02e766f5ed to fix the problem.
Since VM pages are connected to cl_object they can be
found and discarded by CLIO properly.

Fixes: 02e766f5ed ("LU-11427 llite: optimize read on open pages")
Test-Parameters: mdssizegb=20 testlist=dom-performance
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Iba8c87c934c442b4c0b45d7d3821ceede1a6e68f
Reviewed-on: https://review.whamcloud.com/39401
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13359 quota: make used for pool correct 98/39298/7
Sergey Cheremencev [Tue, 7 Jul 2020 20:19:48 +0000 (23:19 +0300)]
LU-13359 quota: make used for pool correct

Before this patch, the used space for a quota pool
was the sum of the space used by the user on all OSTs
in the system. Now it is fixed and lfs quota --pool
takes into account only OSTs from the pool.
With option -v it also shows only OSTs from the pool.

Change-Id: Idf1c8ed66fca7caec70246ea4182df883bcef23c
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/39298
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-13127 ptlrpc: don't require CONFIG_CRYPTO_CRC32 01/39201/5
Andreas Dilger [Sat, 27 Jun 2020 11:32:37 +0000 (05:32 -0600)]
LU-13127 ptlrpc: don't require CONFIG_CRYPTO_CRC32

Don't require CONFIG_CRYPTO_CRC32 to build if not configured,
as it may not be available for all kernels and is easily fixed.

Consolidate the early reply code in sec_plain.c to also call
lustre_msg_calc_cksum() to reduce code duplication.

Fixes: e1a0f602a608 ("LU-13127 libcfs: make noise to console if CRC32 is missing")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I00511df418ddfbd8522936cf2bc0f3193d2540e5
Reviewed-on: https://review.whamcloud.com/39201
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13718 tests: add LU numbers to skipped tests 17/39217/6
Vikentsi Lapa [Mon, 29 Jun 2020 08:06:14 +0000 (08:06 +0000)]
LU-13718 tests: add LU numbers to skipped tests

Some tests in the ALWAYS_EXCEPT variable do not have LU- numbers in
their description. Also, the formatting of the tests is inconsistent
(mixed tabs and spaces in one line).

This patch adds the missing numbers and corrects the formatting. This
improvement reduces the time needed to find why a test was skipped and
lets code parse the lists to build a table of skipped tests.

Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=8 osscount=1 ostcount=4
Signed-off-by: Vikentsi Lapa <vlapa@whamcloud.com>
Change-Id: Iada59d1e01b8ecd07af91157abef483df7715178
Reviewed-on: https://review.whamcloud.com/39217
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13790 socklnd: NID to interface mapping issues 08/39408/3
Serguei Smirnov [Thu, 16 Jul 2020 18:16:48 +0000 (14:16 -0400)]
LU-13790 socklnd: NID to interface mapping issues

Fix the NID to interface mapping in ksocknal_startup to make sure
the messages go out the interface assigned by LNet on a system
with multiple interfaces configured.

Test-Parameters: trivial
Fixes: b770d7117f35 ("LU-11893 lnet: consoldate secondary IP address handling")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I22a47fcf17dc0b8b2bf2abebb6b295f4b0550c00
Reviewed-on: https://review.whamcloud.com/39408
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13787 build: fix snmp / libcfs build order 88/39388/3
James Simmons [Wed, 15 Jul 2020 18:33:24 +0000 (14:33 -0400)]
LU-13787 build: fix snmp / libcfs build order

The Lustre snmp code is dependent on libcfs so make
sure libcfs is built first. This only shows up in
parallel builds.

Fixes: 742897a967cf ("LU-13274 uapi: make lnet UAPI headers C99 compliant")
Test-Parameters: trivial
Change-Id: Ibe1669e8586eb54129d3b9dd74b0287766af0bf3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39388
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13784 tests: allow QUOTA_TYPE to be set 59/39359/2
Alexander Zarochentsev [Mon, 15 Jun 2020 15:07:20 +0000 (18:07 +0300)]
LU-13784 tests: allow QUOTA_TYPE to be set

QUOTA_TYPE is unconditionally set to "ug3" in
lustre/test/cfg/local.sh, which makes enabling
project quota support a non-trivial task.
Fix conf-sanity test 86 to accept more than
one -O option.

HPE-bug-id: LUS-8983
Test-Parameters: envdefinitions="ENABLE_QUOTA=yes QUOTA_TYPE=p"
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ie9bfa536b5ea704e0637afb10a8bb82c64b2bdf6
Reviewed-on: https://review.whamcloud.com/39359
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13782 lnet: Have LNet routers monitor the ni_fatal flag 53/39353/3
Chris Horn [Thu, 9 Jul 2020 18:33:49 +0000 (13:33 -0500)]
LU-13782 lnet: Have LNet routers monitor the ni_fatal flag

Have the LNet monitor thread on LNet routers check the
ni_fatal_error_on flag to set local NI status appropriately. When
this results in a status change, perform a discovery push to all
peers. This allows peers to update their route status appropriately.
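
A rough Python model of the monitor-thread step described above. It is an illustration under assumptions, not the LNet implementation: the dictionary keys, the 'up'/'down' values and the push_to_peers callback are hypothetical stand-ins for the local NI state and the discovery push.

```python
def check_local_ni(ni, push_to_peers):
    """Model one monitor-thread pass over a local NI on a router.

    ni:            dict with hypothetical keys 'fatal_error_on' (bool)
                   and 'status' ('up' or 'down')
    push_to_peers: callable invoked once when the status changes;
                   stands in for the discovery push to all peers
    Returns True if the status changed.
    """
    new_status = "down" if ni["fatal_error_on"] else "up"
    if new_status == ni["status"]:
        return False          # nothing changed, so no push is needed
    ni["status"] = new_status
    push_to_peers(ni)
    return True
```

The push happens only on a status transition, so repeated passes over an NI that is already marked down do not generate extra traffic in this model.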

Test-Parameters: trivial
HPE-bug-id: LUS-9068
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic4f8f33c6377f4b95f6ab95f9714414c6b9ab5e6
Reviewed-on: https://review.whamcloud.com/39353
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13764 lnet: Clear lp_dc_error when discovery completes 48/39348/2
Chris Horn [Wed, 8 Jul 2020 21:03:48 +0000 (16:03 -0500)]
LU-13764 lnet: Clear lp_dc_error when discovery completes

If discovery completes successfully then we can clear the
lp_dc_error.

Test-Parameters: trivial
HPE-bug-id: LUS-9081
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: If709022c5c4ba0ab8f01b3f4b508ed464fd0b6ff
Reviewed-on: https://review.whamcloud.com/39348
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13761 o2ib: Fix compilation with MOFED 5.1 23/39323/5
Sergey Gorenko [Tue, 7 Jul 2020 11:31:31 +0000 (14:31 +0300)]
LU-13761 o2ib: Fix compilation with MOFED 5.1

A new argument was added to rdma_reject() in MOFED 5.1 and
Linux 5.8.

Add a configure check and support both versions of rdma_reject().

Test-Parameters: trivial
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Change-Id: I2b28991f335658b651b21a09899b7b17ab2a9d57
Reviewed-on: https://review.whamcloud.com/39323
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13775 target: fix memory copy in tgt_pages2shortio() 33/39333/4
Wang Shilong [Fri, 10 Jul 2020 07:29:39 +0000 (15:29 +0800)]
LU-13775 target: fix memory copy in tgt_pages2shortio()

tgt_pages2shortio() tries to copy local page memory to the ptlrpc
inline buf.

The right logic should move the page @ptr to offset + count; however,
the current code does this wrongly. This has not caused any problem so
far, because @lnb_page_offset is normally 0. When I tried to
play with unaligned DIO, we could hit the problem.

Anyway, fix it to use the right logic to handle memory.
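
One reading of the intended copy, as a Python model. This is a sketch under assumptions and does not reproduce the original bug or the kernel code: the tuple layout is hypothetical, with `offset` playing the role of @lnb_page_offset and `length` the number of valid bytes in the page.

```python
def pages_to_inline_buf(pages):
    """Copy the valid region of each page into one contiguous buffer.

    pages: list of (data, offset, length) tuples, where `data` is the
           page contents as bytes and the valid bytes are
           data[offset:offset + length] (hypothetical layout)
    """
    buf = bytearray()
    for data, offset, length in pages:
        # Read from the in-page offset, not from the start of the page;
        # the two only coincide when the offset is 0.
        buf += data[offset:offset + length]
    return bytes(buf)
```

With a zero offset the result is the same whether or not the offset is honoured, which matches the note that the problem only shows up with unaligned DIO.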

Fixes: 70f092a ("LU-1757 brw: add short io osc/ost transfer.")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I0a2e05732c0f425043af8393eb41f6bec178da6f
Reviewed-on: https://review.whamcloud.com/39333
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months agoLU-13437 llite: pack parent FID in getattr 90/39290/5
Lai Siyao [Mon, 6 Jul 2020 13:52:45 +0000 (21:52 +0800)]
LU-13437 llite: pack parent FID in getattr

Pack parent FID in getattr request if OBD_CONNECT2_GETATTR_PFID is
enabled, otherwise fill it with target FID for backward compatibility.

Fixes: f9a2da63 ("LU-13437 mdt: don't fetch LOOKUP lock for remot...")
Test-Parameters: clientversion=2.12 testlist=sanity env=SANITY_EXCEPT="27M 151 156"
Test-Parameters: serverversion=2.12 testlist=sanity env=SANITY_EXCEPT="56 165 205b"
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Idcf8388b65dee1f0a09a53b240ce8303f3c6ff75
Reviewed-on: https://review.whamcloud.com/39290
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13734 lnet: Allow duplicate nets in ip2nets syntax 27/39227/10
Chris Horn [Mon, 29 Jun 2020 18:44:07 +0000 (13:44 -0500)]
LU-13734 lnet: Allow duplicate nets in ip2nets syntax

Before the MR feature was implemented, it was not possible to
configure multiple interfaces on the same LNet, so the ip2nets
syntax did not allow for this. Now that we have the MR feature, we should
allow it to be configured via ip2nets syntax, e.g.

o2ib(ib0) 10.10.10.1
o2ib(ib1) 10.10.10.2
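
To show what accepting duplicate nets means, here is a toy Python parser. It is an assumption-laden sketch, not the LNet parser: it handles only the simple 'net(iface) ip-pattern' form of the two lines above, and parse_ip2nets is a hypothetical name.

```python
from collections import defaultdict


def parse_ip2nets(text):
    """Group interfaces by net, allowing the same net on several lines.

    Only the simple 'net(iface) ip-pattern' form is handled here.
    """
    nets = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        spec, pattern = line.split(None, 1)
        net, _, iface = spec.partition("(")
        # A repeated net appends another interface instead of being rejected.
        nets[net].append((iface.rstrip(")"), pattern.strip()))
    return dict(nets)
```

For the example above, both ib0 and ib1 end up listed under the single net o2ib.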

A test is added for configuring LNet with the kernel ip2nets parameter.

setup_netns() is refactored to facilitate the new test.

cleanup_lnet() is modified to check whether the lnet module is loaded
before attempting lnetctl lnet unconfigure, otherwise sanity-lnet.sh
could exit with rc 234 on cleanup.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-9046
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iafc3882035269073fd7e4abb53d138d9267f6e21
Reviewed-on: https://review.whamcloud.com/39227
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-12678 lnet: clarify initialization of lpni_refcount 20/39120/4
Mr NeilBrown [Mon, 22 Jun 2020 03:57:02 +0000 (13:57 +1000)]
LU-12678 lnet: clarify initialization of lpni_refcount

This refcount is not explicitly initialized, so is implicitly
initialized to zero.  This prohibits the use of
lnet_peer_ni_addref_locked() for taking the first reference,
so a couple of places open-code the atomic_inc just in case.

There is code in lnet_peer_add_nid() which drops a reference before
accessing the structure.  This isn't actually wrong, but it looks
weird.

lnet_destroy_peer_ni_locked() makes assumptions about the content of
the structure, so it cannot be used on a partially initialized
structure.

All these special cases make the code harder to understand.  This
patch cleans this up:

- lpni_refcount is now initialized to one, so the caller of
  lnet_peer_ni_alloc() now owns a reference and must be sure
  to release it.
- lnet_peer_attach_peer_ni() now consumes a reference to
  the lpni.  A pointer returned by lnet_peer_ni_alloc()
  is most often passed to lnet_peer_attach_peer_ni() so
  these two changes largely cancel each other out, though not completely.
- The two 'atomic_inc' calls are changed to
  'lnet_peer_ni_addref_locked()'.
- A LIBCFS_FREE() is replaced by lnet_peer_ni_decref_locked(),
  and that function is improved to cope with lpni_hashlist
  being empty, or ->lpni_net being NULL.
- lnet_peer_add_nid() now holds a reference on the lpni until
  it doesn't need it any more, then explicitly drops it.
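
The ownership rules in the list above can be modelled in a few lines of Python. This is a toy illustration, not the LNet code: PeerNI, Peer and add_nid are hypothetical stand-ins for the lpni, the peer and lnet_peer_add_nid(), and only the reference count is modelled.

```python
class PeerNI:
    """Toy stand-in for an lpni; only the reference count is modelled."""

    def __init__(self):
        self.refcount = 1      # allocation hands one reference to the caller
        self.freed = False

    def addref(self):
        assert self.refcount > 0, "cannot take a reference on a freed object"
        self.refcount += 1

    def decref(self):
        assert self.refcount > 0
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # last reference dropped


class Peer:
    def __init__(self):
        self.nis = []

    def attach(self, lpni):
        """Consumes the caller's reference: ownership moves to the peer."""
        self.nis.append(lpni)


def add_nid(peer):
    lpni = PeerNI()    # refcount == 1, owned by this function
    lpni.addref()      # keep a reference of our own while lpni is in use
    peer.attach(lpni)  # hands one reference over to the peer
    # ... lpni can still be accessed safely here ...
    lpni.decref()      # explicitly drop our reference once done
    return lpni
```

Because the count starts at one, the first reference never needs an open-coded increment, and the object is freed only when the peer's reference is finally dropped.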

This should make no functional change, but should make the code a
little less confusing.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Iec312e637d1e7b6eb14f2c363843403dd5cf8e8f
Reviewed-on: https://review.whamcloud.com/39120
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-6142 lnet: discard unused lnet_print_hdr() 58/39358/2
Mr NeilBrown [Tue, 14 Jul 2020 04:12:54 +0000 (14:12 +1000)]
LU-6142 lnet: discard unused lnet_print_hdr()

lnet_print_hdr() is unused, and has not been used in git history since
at least 2004.
So remove it.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0d72726b69f205a8c62ae4dbf1423f6e745db5fe
Reviewed-on: https://review.whamcloud.com/39358
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-6142 socklnd: remove declarations of missing functions. 26/39326/2
Mr NeilBrown [Mon, 6 Jul 2020 01:53:39 +0000 (11:53 +1000)]
LU-6142 socklnd: remove declarations of missing functions.

None of ksocknal_query(), ksocknal_notify(), and
ksocknal_lib_bind_thread_to_cpu() exist, so don't declare them.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ice87c317f116bda9c04dcaa285bc7ba47be219ca
Reviewed-on: https://review.whamcloud.com/39326
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13712 lnet: Preferred NI logic breaks MR routing 68/39168/3
Chris Horn [Wed, 24 Jun 2020 16:17:45 +0000 (11:17 -0500)]
LU-13712 lnet: Preferred NI logic breaks MR routing

Edge (final-hop) routers typically use the non-multi-rail destination
(NMR_DST) send case, i.e. they treat the destination as
non-multi-rail. The reason for this is that we do not want routers to
modify the destination peer interface selected by the message
originator. As a result of using the NMR_DST send case, edge routers
set a preferred NI, and then continue to use that NI, because it's
preferred, even if the NI goes down and the router has other healthy
interfaces available to it. Routers do not need to use the preferred
NI selection logic when they are forwarding a message, so modify the
NMR_DST algorithm to allow routers to select any suitable local NI.
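
A simplified Python model of the selection change. It is a sketch under assumptions, not the LNet send path: the function name, the dictionary of NI health and the `forwarding` flag are hypothetical, and the non-router branch simply keeps the preferred NI as the message describes for the old behaviour.

```python
def select_local_ni(local_nis, preferred, forwarding):
    """Pick a local NI name for the NMR_DST case (toy model).

    local_nis:  dict mapping NI name -> healthy (bool), in selection order
    preferred:  previously recorded preferred NI name, or None
    forwarding: True when this node is a router forwarding a message
    """
    if not forwarding and preferred is not None:
        # Non-forwarding case in this model: stick with the preferred NI.
        return preferred
    # A router forwarding a message may use any suitable (healthy) local NI.
    for name, healthy in local_nis.items():
        if healthy:
            return name
    return None
```

In this model a router whose preferred NI has gone down moves to another healthy interface instead of continuing to use the failed one.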

Test-Parameters: trivial
HPE-bug-id: LUS-9045
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iae0fb47d58a70f640d316a8c85cf3058ca2f82eb
Reviewed-on: https://review.whamcloud.com/39168
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13502 lnet: Ensure LNet pings and pushes are always tracked 51/38451/8
Chris Horn [Fri, 1 May 2020 20:50:57 +0000 (15:50 -0500)]
LU-13502 lnet: Ensure LNet pings and pushes are always tracked

Add the appropriate option to the MD used for LNet pings and pushes
to ensure that these are always tracked via LNet's response tracking
mechanism, regardless of the value of lnet_response_tracking
variable.

Test-Parameters: trivial
HPE-bug-id: LUS-8827
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I13d8ee42ccbb00c85843f64314b1f953d679a0dc
Reviewed-on: https://review.whamcloud.com/38451
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-13502 lnet: Add param to control response tracking 49/38449/11
Chris Horn [Fri, 1 May 2020 20:47:06 +0000 (15:47 -0500)]
LU-13502 lnet: Add param to control response tracking

Add lnet_response_tracking parameter which will be used to control
the behavior of LNet response tracking.

Test-Parameters: trivial
HPE-bug-id: LUS-8827
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I9c5be488673bbaa3c3cb983fe099d2203c1d9fa7
Reviewed-on: https://review.whamcloud.com/38449
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoNew tag 2.13.55 2.13.55 v2_13_55
Oleg Drokin [Wed, 22 Jul 2020 16:55:23 +0000 (12:55 -0400)]
New tag 2.13.55

Change-Id: Iefb108c0dc97b0c69407e07ca08bcfc14f7dbfe2
Signed-off-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13437 uapi: add OBD_CONNECT2_GETATTR_PFID 89/39289/2
Lai Siyao [Mon, 6 Jul 2020 13:03:59 +0000 (21:03 +0800)]
LU-13437 uapi: add OBD_CONNECT2_GETATTR_PFID

Add the OBD_CONNECT2_GETATTR_PFID connect flag to pack the parent FID in
the getattr request, which will be used to check whether the target is
a remote object; if so, don't take the LOOKUP lock, otherwise the client
may see stale directory entries.

Test-parameters: trivial

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibdf880934456f255f83cd4bac9d61ab5e1ed7330
Reviewed-on: https://review.whamcloud.com/39289
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13731 autoconf: check if VM_FAULT_RETRY is defined 81/39281/2
Jian Yu [Sun, 5 Jul 2020 08:16:06 +0000 (01:16 -0700)]
LU-13731 autoconf: check if VM_FAULT_RETRY is defined

In RHEL 8.2 kernel 4.18.0-193.el8, VM_FAULT_RETRY is
defined as an enumeration constant in linux/mm_types.h
instead of a macro in linux/mm.h. This patch adds
autoconf macros to check if VM_FAULT_RETRY is defined
at configure time.

Test-Parameters: clientdistro=el8.2 serverdistro=el8.2 \
testlist=sanity

Test-Parameters: clientdistro=el8.1 serverdistro=el8.1 \
testlist=sanity

Fixes: 2e813f3e2d ("LU-13731 llite: include linux/mm_types.h for VM_FAULT_RETRY")
Change-Id: I2fdae7b62a53e447a7eb979787bdbd79423b787d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39281
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Menadue <ben.menadue@anu.edu.au>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13732 lfs: fid2path should match the root path correctly 25/39225/6
Emoly Liu [Wed, 1 Jul 2020 10:07:00 +0000 (18:07 +0800)]
LU-13732 lfs: fid2path should match the root path correctly

This patch is to match the root path in function get_root_path()
correctly. For example, if the mount point is /mnt/lustre, the
following root path formats are acceptable:
- /mnt/lustre
- /mnt/lustre/*
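
The matching rule can be sketched in Python as follows. This is an illustration, not the get_root_path() code: under_root is a hypothetical helper, and the point it shows is that a plain prefix comparison is not enough, because /mnt/lustre2 must not match the mount point /mnt/lustre.

```python
def under_root(path, mount_point):
    """Return True if path is the mount point itself or lies below it."""
    root = mount_point.rstrip("/") or "/"
    if path == root:
        return True
    # Compare against the root followed by a separator so that a sibling
    # such as /mnt/lustre2 is not mistaken for something under /mnt/lustre.
    prefix = root if root.endswith("/") else root + "/"
    return path.startswith(prefix)
```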

sanity.sh test_154A/247d are modified to verify this patch.

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: If705dd341b273d462aeba280fa27d5608b5f3b7c
Reviewed-on: https://review.whamcloud.com/39225
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12275 sec: check if page is empty with ZERO_PAGE 18/38918/8
Sebastien Buisson [Fri, 12 Jun 2020 10:52:28 +0000 (10:52 +0000)]
LU-12275 sec: check if page is empty with ZERO_PAGE

In osc_brw_fini_request(), page needs decryption only if it
is not empty. To check this, use ZERO_PAGE macro available
for all architectures, and compare with memcmp.
It will likely be faster/more efficient than comparing the
words by hand, as it may use optimized CPU instructions or ASM code.
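
The check amounts to comparing the page with a page of zeroes in a single call. A Python analogue is sketched below; it is not the kernel code, the 4096-byte page size is an assumption, and page_needs_decryption is a hypothetical name.

```python
PAGE_SIZE = 4096              # assumed page size for this sketch
ZERO_PAGE = bytes(PAGE_SIZE)  # a page consisting only of zero bytes


def page_needs_decryption(page):
    """An all-zero page is treated as empty and is not decrypted."""
    # One whole-buffer comparison, analogous to memcmp against ZERO_PAGE.
    return page != ZERO_PAGE
```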

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I5e04b72790e8acbceb1989ba3659e170c0b11192
Reviewed-on: https://review.whamcloud.com/38918
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12275 sec: encryption support for DoM files 02/38702/20
Sebastien Buisson [Fri, 22 May 2020 07:27:48 +0000 (07:27 +0000)]
LU-12275 sec: encryption support for DoM files

On client side, data read from DoM files do not go through the OSC
layer. So implement file decryption in ll_dom_finish_open() right
after file data has been put in cache pages.
On server side, DoM file size needs to be properly set on MDT when
content is encrypted. Pages are full of encrypted data, but inode size
must be apparent, clear text object size.
For reads of DoM encrypted files to work properly, we also need to
make sure we send whole encryption units to client side.
Also add sanity-sec test_50 to exercise encryption of DoM files.
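
The difference between the apparent clear-text size and the amount of encrypted data sent can be shown with a small rounding helper in Python. This is a sketch under assumptions: the 4096-byte encryption unit and the name bytes_to_send are chosen for illustration and are not taken from the patch.

```python
ENC_UNIT = 4096  # assumed encryption unit size in bytes


def bytes_to_send(clear_size):
    """Round the clear-text size up to a whole number of encryption units."""
    return -(-clear_size // ENC_UNIT) * ENC_UNIT
```

A file whose clear-text size is 5000 bytes still reports 5000 as its inode size, while two full units would be sent to the client in this model.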

Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49 50" clientdistro=el8.1 fstype=ldiskfs mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49 50" clientdistro=el8.1 fstype=zfs mdscount=2 mdtcount=4
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7721ca4085373a7a01b2062c37458a7136e646e0
Reviewed-on: https://review.whamcloud.com/38702
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13593 ptlrpc: fix growing message buffer 01/38701/18
Sebastien Buisson [Wed, 20 May 2020 09:17:57 +0000 (18:17 +0900)]
LU-13593 ptlrpc: fix growing message buffer

In case some buffers need to be moved because of segment growth
from req_capsule_server_grow(), just set buflen to old length
before actually calling lustre_grow_msg().

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6707927a0f24c0637dbc79aa91788122a84ab8c4
Reviewed-on: https://review.whamcloud.com/38701
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13586 tests: Quota Pools with PFL and SEL 61/38661/9
Sergey Cheremencev [Tue, 19 May 2020 11:41:15 +0000 (14:41 +0300)]
LU-13586 tests: Quota Pools with PFL and SEL

Add sanity-quota_71a, which writes to a file
consisting of 2 components on different OSTs (each OST
belongs to a unique pool). Check that limits in quota
pools work properly. sanity-quota_71b does the same
but for a file with SEL.

Test-Parameters: envdefinitions=ONLY=71 testlist=sanity-quota
Change-Id: I835bec4c9b21c287142e1df9b4cfe797ec68fbef
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/38661
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12275 sec: atomicity of encryption context getting/setting 30/38430/26
Sebastien Buisson [Thu, 30 Apr 2020 15:23:00 +0000 (15:23 +0000)]
LU-12275 sec: atomicity of encryption context getting/setting

Encryption layer needs to set an encryption context on files and dirs
that are encrypted. This context is stored as an extended attribute,
that then needs to be fetched upon metadata ops like lookup, getattr,
open, truncate, and layout.

With this patch we send encryption context to the MDT along with
create RPCs. This closes the insecure window between creation and
setting of the encryption context, and saves a setxattr request.

This patch also introduces a way to have the MDT return encryption
context upon granted lock reply, making the encryption context
retrieval atomic, and sparing the client an additional getxattr
request.

Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49" clientdistro=el8.1 fstype=ldiskfs mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49" clientdistro=el8.1 fstype=zfs mdscount=2 mdtcount=4
Test-Parameters: clientversion=2.12 env=SANITY_EXCEPT="27M 56ra 151 156 802"
Test-Parameters: serverversion=2.12 env=SANITY_EXCEPT="56oc 56od 165a 165b 165d 205b"
Test-Parameters: serverversion=2.12 clientdistro=el8.1 env=SANITYN_EXCEPT=106,SANITY_EXCEPT="56oc 56od 165a 165b 165d 205b"
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I45599cdff13d5587103aff6edd699abcda6cb8f4
Reviewed-on: https://review.whamcloud.com/38430
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12275 sec: force file name encryption policy to null 82/38882/9
Sebastien Buisson [Tue, 9 Jun 2020 15:27:53 +0000 (15:27 +0000)]
LU-12275 sec: force file name encryption policy to null

Force file/directory name encryption policy to null on newly created
inodes. This is required because first implementation step of client
side encryption only supports content encryption, and not names.
This forces the use of the embedded llcrypt lib in preference to
the in-kernel fscrypt lib, even if the kernel provides it.

This patch will have to be reverted when name encryption is
implemented.

Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48" clientdistro=el8.1 fstype=ldiskfs mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48" clientdistro=el8.1 fstype=zfs mdscount=2 mdtcount=4
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ia697a29006507278c218088d7c3a5e5ade620a15
Reviewed-on: https://review.whamcloud.com/38882
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13776 tests: make sure pjdfstest.sh writes to tmp 38/39338/4
Oleg Drokin [Sat, 11 Jul 2020 04:51:27 +0000 (00:51 -0400)]
LU-13776 tests: make sure pjdfstest.sh writes to tmp

Do not write to random Lustre source locations, as they could be read-only.

Test-Parameters: trivial
Test-Parameters: fstype=ldiskfs testlist=pjdfstest
Test-Parameters: fstype=zfs testlist=pjdfstest
Change-Id: Icd262a698390eadf4b53cd5d311bc6c2a561a79e
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39338
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
12 months agoLU-12275 sec: introduce null algo for filename encryption 81/38881/8
Sebastien Buisson [Tue, 9 Jun 2020 15:11:42 +0000 (15:11 +0000)]
LU-12275 sec: introduce null algo for filename encryption

Introduce a "null" algorithm for client side filename encryption,
which is basically a no-op.
This is needed because first implementation step only supports
content encryption, and not names. So give the ability to support
encryption policies that have a 'filenames_encryption_mode' property
internally set to LLCRYPT_MODE_NULL.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I3470f89f227b3a03e56766fe1ba5e36ae92ec27b
Reviewed-on: https://review.whamcloud.com/38881
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13485 build: Make parallel tests names unique 61/39161/4
Shaun Tancheff [Wed, 24 Jun 2020 18:36:38 +0000 (13:36 -0500)]
LU-13485 build: Make parallel tests names unique

This iteration re-used the serial tests internal name for the
parallel test variant. This name clashing causes builds to fail.

Remedy this by adding a postfix tag to the internal unique identifier
of the converted parallel tests.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia39bf2e0004abd10a0c7b146eb2ef1cf62e6d891
Reviewed-on: https://review.whamcloud.com/39161
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
12 months agoLU-13366 tests: add SEL support to racer 59/37959/6
Vitaly Fertman [Fri, 13 Mar 2020 12:47:00 +0000 (15:47 +0300)]
LU-13366 tests: add SEL support to racer

Some files are created with a SEL layout if RACER_ENABLE_SEL is set.

Test-Parameters: testlist=racer envdefinitions="RACER_ENABLE_SEL=true"
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I1f699988fdd0a0dbb19b71d6ea353d68326d0d5b
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/156106
HPE-bug-id: LUS-7591
Reviewed-on: https://review.whamcloud.com/37959
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13366 lod: check for extension size at instantiation time 61/37961/8
Vitaly Fertman [Fri, 13 Mar 2020 13:42:35 +0000 (16:42 +0300)]
LU-13366 lod: check for extension size at instantiation time

lod_statfs_and_check() may consider an OST as OK, but SEL needs its
extension size to fit the free space and the OST may turn out to be
low on space. Thus an inappropriate OST may be chosen, SEL cannot use
it, and ENOSPC occurs.

Take the extension size into account at object allocation time.
If none of the OSTs is able to allocate an object, try to allocate in
the old manner instead of returning ENOSPC from lod_qos_prep_create().
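
The allocation rule and its fallback can be modelled in Python as follows. This is a sketch under assumptions, not the lod code: pick_ost and the tuple layout are hypothetical, and the `usable` flag stands in for the existing lod_statfs_and_check() result.

```python
def pick_ost(osts, extension_size):
    """Return the index of an OST for a new SEL component, or None.

    osts: list of (index, free_bytes, usable) tuples; `usable` stands in
          for the existing per-OST check (hypothetical layout)
    """
    candidates = [o for o in osts if o[2]]
    # First pass: require enough free space for a full extension.
    for index, free, _ in candidates:
        if free >= extension_size:
            return index
    # No OST can hold a full extension: fall back to the old behaviour
    # (any usable OST) instead of failing with ENOSPC right away.
    return candidates[0][0] if candidates else None
```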

Adjust tests to work on smaller extension sizes to fit into OSTs
used for sanity tests to enable the new logic above.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: If5159200a72a68a7261aa7031e58a1ac6a8e3f24
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/156372
HPE-bug-id: LUS-8157
Reviewed-on: https://review.whamcloud.com/37961
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13344 all: Separate debugfs and procfs handling 34/37834/14
Shaun Tancheff [Thu, 4 Jun 2020 21:27:44 +0000 (16:27 -0500)]
LU-13344 all: Separate debugfs and procfs handling

Linux 5.6 introduces proc_ops with v5.5-8862-gd56c0d45f0e2
proc: decouple proc from VFS with "struct proc_ops"

Separate debugfs usage and procfs usage to prepare for the divergence
of debugfs using file_operations and procfs using proc_ops

HPE-bug-id: LUS-8589
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I1746e563b55a9e89f90ac01843c304fe6b690d8b
Reviewed-on: https://review.whamcloud.com/37834
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13196 llite: Remove mutex on dio read 19/37419/6
Patrick Farrell [Tue, 4 Feb 2020 03:10:23 +0000 (22:10 -0500)]
LU-13196 llite: Remove mutex on dio read

DIO reads in Lustre are protected by Lustre range locking
and do not need the inode mutex.  This code was removed
in LU-1669, the range lock was added for DIO reads in
LU-6227, and then the mutex was accidentally re-introduced
in LU-6260.

Remove it again.

Test-Parameters: envdefinitions=ONLY=16 testlist=sanityn,sanityn,sanityn,sanityn,sanityn,sanityn,sanityn,sanityn
Fixes: 98883bd3e2cc ("LU-6260 llite: add support for direct IO api changes")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I5b3570e83a4b4ff36d9a22bc6bd3be5d5f991924
Reviewed-on: https://review.whamcloud.com/37419
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-9679 mdc: create mdc_acl.c 92/39292/4
Mr NeilBrown [Mon, 6 Jul 2020 12:34:47 +0000 (08:34 -0400)]
LU-9679 mdc: create mdc_acl.c

This new C file contains acl related code so it can be
conditionally compiled.

Also remove conditional code around the call to mdc_unpack_acl(), as
those tests are already performed inside mdc_unpack_acl().

In the Makefile, use CONFIG_FS_POSIX_ACL to control conditional
compilation as CONFIG_LUSTRE_FS_POSIX_ACL is not available, but the
two are semantically equivalent in this context.

This removes all #ifdef of CONFIG_LUSTRE_FS_POSIX_ACL from C files in
'mdc'.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic9060bee4a2ad55580d8879fe32fee01b1cb8884
Reviewed-on: https://review.whamcloud.com/39292
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-9679 lustre: make ptlrpc_connection_put() static inline 91/39291/3
Mr NeilBrown [Mon, 6 Jul 2020 12:34:48 +0000 (08:34 -0400)]
LU-9679 lustre: make ptlrpc_connection_put() static inline

This function needs to be called from the obdclass modules,
but is currently defined in a module that depends on that module.
To get around this interdependence, a global variable
  ptlrpc_put_connection_superhack
is used to make a pointer to the function available.

Rather than this hack, we can make ptlrpc_connection_put()
static-inline.  This does expose some details of ptlrpc to obdclass,
but there is already a fairly tight connection.

Also change the return value to 'void' as it is never used,
and don't bother checking for NULL before calling, as the
function has its own test for NULL.
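
As a rough userspace illustration of the idea (not the Lustre code), a
small reference-dropping helper can live in a shared header as static
inline, so a lower-layer module can call it directly instead of going
through an exported function pointer.  The struct, field and function
names below are hypothetical, and C11 atomics stand in for the kernel's
atomic helpers:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection object with a reference count. */
struct conn_sketch {
	atomic_int refcount;
};

/*
 * Defined static inline in a header, so every module that includes the
 * header gets its own copy and no cross-module symbol is needed.
 * Returns void and tolerates NULL, so callers need no check of their own.
 */
static inline void conn_sketch_put(struct conn_sketch *conn)
{
	if (conn == NULL)
		return;
	atomic_fetch_sub(&conn->refcount, 1);
}
```

The cost of this approach is the one the message names: the layout of
the structure becomes visible to every includer of the header.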

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I37333b78e410e8c46bddc468c31bed61dd9e7b33
Reviewed-on: https://review.whamcloud.com/39291
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-9679 lov: annotate nested locking of obd_dev_mutex 48/39248/2
Mr NeilBrown [Thu, 2 Jul 2020 21:43:45 +0000 (07:43 +1000)]
LU-9679 lov: annotate nested locking of obd_dev_mutex

obd_statfs() can call lmv_statfs() with ->obd_dev_mutex held.
lmv_statfs will then call obd_statfs() on a different device
and ->obd_dev_mutex will be taken again.  This is a *different*
mutex, but lockdep cannot see the difference, so it complains.

We can tell lockdep not to worry in this case using
mutex_lock_interruptible_nested().

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0776407b722dd29ab1321289953b63f76fce7ceb
Reviewed-on: https://review.whamcloud.com/39248
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13754 gss: open sptlrpc init channel in R+W mode 97/39297/3
Sebastien Buisson [Tue, 7 Jul 2020 14:59:08 +0000 (23:59 +0900)]
LU-13754 gss: open sptlrpc init channel in R+W mode

Linux 5.3 changed struct cache_detail readers to writers.
As this mechanism is used by GSS authentication in Lustre via SunRPC,
we need to make sure the lsvcgssd daemon opens
/proc/net/rpc/auth.sptlrpc.init/channel in R+W mode.

It also affects CentOS/RHEL 7.8, as the kernel commit was ported to
these distros.

Test-Parameters: trivial
Test-Parameters: env=SHARED_KEY=true,SANITY_EXCEPT="56w 405" mdscount=2 mdtcount=4 osscount=2 ostcount=8 testlist=sanity,recovery-small,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: If88802d4f2bc3168dda4f79fe57f2f44ac7ef39e
Reviewed-on: https://review.whamcloud.com/39297
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12586 lov: one more fix to write_intent end for trunc 12/38412/8
Bobi Jam [Wed, 29 Apr 2020 09:14:04 +0000 (17:14 +0800)]
LU-12586 lov: one more fix to write_intent end for trunc

This patch fixes another case where the truncate write intent
extent is set incorrectly.  This may cause errors when truncating
PFL files to exactly the boundary between two extents.

Fixes: c32c7401426d ("LU-12586 lov: Correct write_intent end for trunc")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I40f14a48c53bfd1e6442e4414ee30f9eb159fc02
Reviewed-on: https://review.whamcloud.com/38412
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
12 months agoLU-13729 osd-ldiskfs: race access to iam_formats during setup 13/39213/2
Wang Shilong [Tue, 30 Jun 2020 01:12:48 +0000 (09:12 +0800)]
LU-13729 osd-ldiskfs: race access to iam_formats during setup

During OST mounting, two targets may reach iam_format_guess() at the
same time.  If @initialized is 0, both call iam_lxx_format_init(), but
the list operations inside are not protected by any lock, which can
eventually corrupt the list.

Fix this by registering the formats at module init.  Since there are
only two formats, also remove the pointless list.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I6dd5a4d1297792b47fb4b94052465a7e0f9123aa
Reviewed-on: https://review.whamcloud.com/39213
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13713 lnet: check rtr_nid is a gateway 75/39175/2
Amir Shehata [Wed, 24 Jun 2020 23:46:58 +0000 (16:46 -0700)]
LU-13713 lnet: check rtr_nid is a gateway

The rtr_nid is specified for all REPLY/ACK. However it is possible
for the route through the gateway specified by rtr_nid to be removed.
In this case we don't want to use it. We should look up alternative
paths.

This patch checks if the peer looked up is indeed a gateway. If it's
not a gateway then we attempt to find another path. There is no need
to fail right away. It's not a hard requirement to fail if the default
rtr_nid is not valid.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic1c93b7c6c2c8060e2cfeb8fb1cb875dbc3010f7
Reviewed-on: https://review.whamcloud.com/39175
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12678 socklnd: change ksnd_nthreads to atomic_t 21/39121/3
Mr NeilBrown [Sun, 7 Jun 2020 23:24:36 +0000 (19:24 -0400)]
LU-12678 socklnd: change ksnd_nthreads to atomic_t

This variable is treated like an atomic_t, but a global spinlock is
used to protect updates - and also unnecessarily to protect reads.

Change to atomic_t and avoid using the spinlock.
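
The same idea in portable C11, as a sketch only: the counter name is
hypothetical, and <stdatomic.h> stands in for the kernel's atomic_t
with atomic_inc(), atomic_dec() and atomic_read().

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a global thread counter such as ksnd_nthreads. */
static atomic_int nthreads_sketch;

/* Kernel analogue: atomic_inc(); no spinlock is taken around the update. */
static void thread_started(void)
{
	atomic_fetch_add(&nthreads_sketch, 1);
}

/* Kernel analogue: atomic_dec(). */
static void thread_finished(void)
{
	atomic_fetch_sub(&nthreads_sketch, 1);
}

/* Kernel analogue: atomic_read(); a plain read needs no lock either. */
static int threads_running(void)
{
	return atomic_load(&nthreads_sketch);
}
```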

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id94d280875a9e115dc077253c49e97a725dc91e1
Reviewed-on: https://review.whamcloud.com/39121
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13697 llite: fix short io for AIO 04/39104/8
Wang Shilong [Fri, 19 Jun 2020 14:19:18 +0000 (22:19 +0800)]
LU-13697 llite: fix short io for AIO

Currently AIO cannot handle an I/O size larger than the stripe size:

The cl io loop is needed to handle I/O that spans stripes, but since
-EIOCBQUEUED is returned for AIO, the io loop stops and a short I/O
results.

This patch fixes the problem by making the IO engine aware of this
special error, so that it proceeds to finish all IO requests.

Fixes: d1dde ("LU-4198 clio: AIO support for direct IO")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I1885e0fc510571417d888249b381f9c2f130ca5a
Reviewed-on: https://review.whamcloud.com/39104
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13460 lfs: make function print_failed_tgt() work correctly 59/38959/6
Emoly Liu [Wed, 17 Jun 2020 07:43:00 +0000 (15:43 +0800)]
LU-13460 lfs: make function print_failed_tgt() work correctly

param->fp_xxx_index is the number of the specified targets, not
the specific index, so it can't be passed into llapi_obd_statfs()
directly to get the statfs information. Instead, all the targets
listed in param->fp_xxx_indexes should be passed one by one.
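
The distinction between the count and the list can be sketched as
follows.  This is not the lfs code: the struct, its field names, the
callback type and fake_statfs() are all hypothetical, and the callback
merely stands in for a per-target statfs query.

```c
#include <errno.h>

/* Hypothetical mirror of the two fields: a count and the index list. */
struct find_param_sketch {
	int num_indexes;    /* how many targets were specified */
	const int *indexes; /* the target indexes themselves */
};

/* Hypothetical per-target query: 0 on success, negative errno on failure. */
typedef int (*statfs_fn)(int tgt_index);

/* Stand-in for demonstration only: pretend target 3 is unreachable. */
static int fake_statfs(int tgt_index)
{
	return tgt_index == 3 ? -ENODEV : 0;
}

/* Query every listed target one by one and count the failures. */
static int count_failed_targets(const struct find_param_sketch *p,
				statfs_fn statfs)
{
	int failed = 0;
	int i;

	for (i = 0; i < p->num_indexes; i++) {
		/* Pass the listed index, not the count of indexes. */
		if (statfs(p->indexes[i]) != 0)
			failed++;
	}
	return failed;
}
```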

Also, $mdt_idx in sanity.sh test_56rb should be initialized with
the correct mdt index.

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Idffdddac1b5b249aa903b97912c767826f3b146c
Reviewed-on: https://review.whamcloud.com/38959
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12549 utils: Check range of quota ID for "lfs" arguments 38/38938/6
Etienne AUJAMES [Tue, 9 Jun 2020 17:07:37 +0000 (19:07 +0200)]
LU-12549 utils: Check range of quota ID for "lfs" arguments

The strtoul() function returns a 64-bit value on a 64-bit system, so an
overflow occurs when the user-supplied value is stored into a
quota/project structure.

This commit applies the same 32-bit verification to the uid, gid and
project ID arguments of the "lfs" project, quota, setquota and find
commands.
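
A minimal sketch of this kind of range check, assuming a hypothetical
helper named parse_id32() (this is not the function used by lfs):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a uid/gid/project ID argument and reject
 * anything that does not fit in 32 bits.
 * Returns 0 on success, -EINVAL for malformed input, -ERANGE on overflow.
 */
static int parse_id32(const char *arg, uint32_t *id)
{
	char *end = NULL;
	unsigned long val;

	/* strtoul() silently accepts a leading '-', so refuse it up front. */
	if (arg == NULL || *arg == '\0' || *arg == '-')
		return -EINVAL;

	errno = 0;
	val = strtoul(arg, &end, 0);
	if (*end != '\0')
		return -EINVAL;

	/* On 64-bit systems unsigned long is wider than the 32-bit ID. */
	if (errno == ERANGE || val > UINT32_MAX)
		return -ERANGE;

	*id = (uint32_t)val;
	return 0;
}
```

Without the comparison against UINT32_MAX, a value such as 4294967296
would be truncated to 0 when stored in a 32-bit field.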

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I809e9ac55d4bc676c20b18c6c198a69eaba9cff6
Reviewed-on: https://review.whamcloud.com/38938
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
12 months agoLU-13667 ptlrpc: fix endless loop issue 15/38915/4
Hongchao Zhang [Fri, 19 Jun 2020 02:53:12 +0000 (10:53 +0800)]
LU-13667 ptlrpc: fix endless loop issue

In ptlrpc_pinger_main, if pinging the recoverable clients or
obd_update_maxusage takes too long, the thread could get stuck in an
endless loop because of the negative value returned by
pinger_check_timeout.

Change-Id: Ib7fc22b3cc31255223bc2be60224ced1a3585f87
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38915
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13648 lnet: Set remote NI status in lnet_notify 62/38862/2
Chris Horn [Fri, 22 May 2020 01:49:53 +0000 (20:49 -0500)]
LU-13648 lnet: Set remote NI status in lnet_notify

The gnilnd receives node health information asynchronously from any tx
failure, so aliveness of lpni as reported by lnet_is_peer_ni_alive()
may not match what LND is telling us. Use existing reset flag to
set cached NI status down so we can be sure that remote NIs are
correctly set down.

Test-Parameters: trivial
HPE-bug-id: LUS-8897
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I1ab36b63d83fb35803eb13a330d698cfa49f17e9
Reviewed-on: https://review.whamcloud.com/38862
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12678 lnet: remove LNetMEUnlink and clean up related code 46/38646/13
Mr NeilBrown [Mon, 18 May 2020 00:45:05 +0000 (10:45 +1000)]
LU-12678 lnet: remove LNetMEUnlink and clean up related code

LNetMEUnlink is not particularly useful, and exposing it as an LNet
interface only provides the opportunity for it to be misused.

Every successful call to LNetMEAttach() is followed by a call to
LNetMDAttach().  If that call succeeds, the ME is owned by
the MD and the caller mustn't touch it again.
If the call fails, the caller is currently required to call
LNetMEUnlink(), which all callers do, and these are the only places
where LNetMEUnlink() is called.

As LNetMDAttach() knows when it will fail, it can unlink the ME itself
and save the caller the effort.
This allows LNetMEUnlink() to be removed which simplifies
the LNet interface.

LNetMEUnlink() is also used in ptl_send_rpc() in a situation where
ptl_send_buf() fails.  In this case both the ME and the MD need to be
unlinked, and as they are interconnected, LNetMEUnlink() or
LNetMDUnlink() can equally do the job.  So change it to use
LNetMDUnlink().

LNetMEUnlink() is primarily a call to lnet_me_unlink(). It also
 - has some handling if ->me_md is not NULL, but that is never the
   case
 - takes the lnet_res_lock().  However LNetMDAttach() already
   takes that lock.
So none of this functionality is useful to LNetMDAttach().
On failure, it can call lnet_me_unlink() directly while ensuring
it still has the lock.

This patch:
 - moves the calls to lnet_md_validate() into lnet_md_build()
 - changes LNetMDAttach() to always take the lnet_res_lock(),
   and to call lnet_me_unlink() on failure.
 - removes all calls to LNetMEUnlink() and sometimes simplifies
   surrounding code.
 - changes lnet_md_link() to 'void' as it only ever returns
   '0', and thus simplify error handling in LNetMDAttach()
   and LNetMDBind()

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ied4e8fb544dbe1b32df7dc70439161dc74366a1d
Reviewed-on: https://review.whamcloud.com/38646
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12514 utils: try lustre_tgt filesystem for mount 99/38799/7
James Simmons [Sun, 21 Jun 2020 14:36:17 +0000 (10:36 -0400)]
LU-12514 utils: try lustre_tgt filesystem for mount

Now that Lustre supports a separate file system for the server
targets, we can update mount.lustre to try "lustre_tgt" for the
server backend. For backwards compatibility, if using "lustre_tgt"
fails, try the original "lustre" file system type.
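
The try-then-fall-back order can be sketched as below.  This is an
illustration under assumptions, not the mount.lustre code: the callback
type and the two stand-in mount functions are hypothetical, and the
sketch falls back on any failure, whereas the real utility may restrict
which errors trigger the retry.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mount callback: 0 on success, negative errno on failure. */
typedef int (*mount_fn)(const char *source, const char *target,
			const char *fstype);

/* Stand-in for an older kernel that only registers "lustre". */
static int old_kernel_mount(const char *source, const char *target,
			    const char *fstype)
{
	(void)source;
	(void)target;
	return strcmp(fstype, "lustre") == 0 ? 0 : -ENODEV;
}

/* Stand-in for a node with no Lustre file system type registered. */
static int no_lustre_mount(const char *source, const char *target,
			   const char *fstype)
{
	(void)source;
	(void)target;
	(void)fstype;
	return -ENODEV;
}

/* Try the new file system type first, then the original one. */
static int mount_server_target(mount_fn do_mount, const char *source,
			       const char *target)
{
	static const char *const fstypes[] = { "lustre_tgt", "lustre" };
	int rc = -ENODEV;
	size_t i;

	for (i = 0; i < sizeof(fstypes) / sizeof(fstypes[0]); i++) {
		rc = do_mount(source, target, fstypes[i]);
		if (rc == 0)
			break;
	}
	return rc;
}
```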

Change-Id: Ica08ff2abc3ad06c78d6d435db7e2fa3897e037e
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/38799
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
12 months agoLU-12687 osc: consume grants for direct I/O 96/35896/26
Vladimir Saveliev [Mon, 29 Jun 2020 11:26:57 +0000 (14:26 +0300)]
LU-12687 osc: consume grants for direct I/O

The new IO engine implementation no longer consumed grants for direct
I/O writes. That led to a premature out-of-space condition during
direct I/O. The example below illustrates the problem:
  # OSTSIZE=100000 sh llmount.sh
  # dd if=/dev/zero of=/mnt/lustre/file bs=4k count=100 oflag=direct
  dd: error writing ‘/mnt/lustre/file’: No space left on device

Consume grants for direct I/O.

Try to consume grants in osc_queue_sync_pages() when it is called for
pages which are being written in direct i/o.

Tests are added to verify grant consumption in buffered and direct i/o
and to verify direct i/o overwrite when ost is full.
The overwrite test is for ldiskfs only as zfs is unable to overwrite
when it is full.

Fixes: 9fe4b52ad2 ("LU-1030 osc: new IO engine implementation")
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I9a199452c564e8e8ad02f79231e8481166f3666e
Cray-bug-id: LUS-7036
Reviewed-on: https://review.whamcloud.com/35896
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
12 months agoLU-5338 tests: sanity-lfsck 11b allow larger last_id 49/34349/8
Andreas Dilger [Fri, 1 Mar 2019 00:54:41 +0000 (17:54 -0700)]
LU-5338 tests: sanity-lfsck 11b allow larger last_id

In sanity-lfsck test_11b allow the OST to report the last_id
larger than the previously-used OID on the MDS.  This may
happen if the OST preallocates additional objects or skips
some objects during recovery.

Add wait_update_cond() and wait_update_facet_cond() to
allow checking an arbitrary condition against the expected
result rather than only exactly equal to the expected value.

Remove the wait_result() helper function, which is essentially
the same thing as wait_update_facet(), and only has a few users.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity-lfsck
Test-Parameters: fstype=zfs mdscount=2 mdtcount=4 testlist=sanity-lfsck
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1443d16699d8115fce664a331134ca6076ecab07
Reviewed-on: https://review.whamcloud.com/34349
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-9679 llite: annotate non-owner locking 34/39234/3
Mr NeilBrown [Thu, 2 Jul 2020 01:02:02 +0000 (11:02 +1000)]
LU-9679 llite: annotate non-owner locking

The lli_lsm_sem locks taken by ll_prep_md_op_data() are sometimes
released by a different thread.  This confuses lockdep unless we
explain the situation.

So use down_read_non_owner() and up_read_non_owner().

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie6543706c658fc427461ef03448f3fcf90abaab7
Reviewed-on: https://review.whamcloud.com/39234
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-6142 fld: Fix style issues for fld_cache.c 05/39205/2
Arshad Hussain [Tue, 23 Jun 2020 19:20:27 +0000 (00:50 +0530)]
LU-6142 fld: Fix style issues for fld_cache.c

This patch fixes issues reported by checkpatch
for file lustre/fld/fld_cache.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I91377fca9437fa31091ca039afba8f0c2ad8da7b
Reviewed-on: https://review.whamcloud.com/39205
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-6142 libcfs: resolve debug.c checkpatch issues 18/39118/4
James Simmons [Sun, 28 Jun 2020 12:14:01 +0000 (08:14 -0400)]
LU-6142 libcfs: resolve debug.c checkpatch issues

Clean up libcfs debug.c to resolve various checkpatch issues.
This also brings us into alignment with the Linux Lustre client.

Change-Id: I011a11ce7f0d5189186f5f9d8e32954ec0ce1ff7
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39118
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13514 tests: remove upgrade images for conf-sanity 09/39109/4
James Nunez [Fri, 19 Jun 2020 18:01:42 +0000 (12:01 -0600)]
LU-13514 tests: remove upgrade images for conf-sanity

conf-sanity test 32a is hanging at a high rate.  We need to
explore whether the issue involves old images having problems
upgrading to the latest version of Lustre.

Test-Parameters: trivial
Test-Parameters: env=ONLY=32a,ONLY_REPEAT=20 fstype=ldiskfs testlist=conf-sanity
Test-Parameters: env=ONLY=32 fstype=zfs testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I0ff1e9e1304192b1008551b82133d95a0010c86a
Reviewed-on: https://review.whamcloud.com/39109
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
12 months agoLU-6142 osd-ldiskfs: Fix style issues for osd_io.c 88/38788/3
Arshad Hussain [Fri, 29 May 2020 12:13:37 +0000 (17:43 +0530)]
LU-6142 osd-ldiskfs: Fix style issues for osd_io.c

This patch fixes issues reported by checkpatch
for file lustre/osd-ldiskfs/osd_io.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: Iab3fdfa20271f53a794a88d71abfa4f0e72afb07
Reviewed-on: https://review.whamcloud.com/38788
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13677 quota: qunit sorting doesn't work 42/38942/2
Sergey Cheremencev [Mon, 15 Jun 2020 15:11:09 +0000 (18:11 +0300)]
LU-13677 quota: qunit sorting doesn't work

As sorting doesn't work correctly, a newly decreased
qunit is sometimes not sent to the OST.
If the qunit reaches its minimum and is not sent
to the OST, this may cause a client to hang on
write, as the OST gets EINPROGRESS instead of
EDQUOT. This patch solves the issue.

Change-Id: I3ae22cc4d080968132ca762b9c0915a994ac126e
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/38942
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13606 lnet: Allow router to forward to healthier NID 98/38798/6
Chris Horn [Tue, 26 May 2020 18:47:50 +0000 (13:47 -0500)]
LU-13606 lnet: Allow router to forward to healthier NID

When a final-hop router (aka edge router) is forwarding a message,
if both the originator and destination of the message are multi-rail
capable, then allow the router to choose a new destination lpni if
the one selected by the message originator is unhealthy or down.

Test-Parameters: trivial
HPE-bug-id: LUS-8905
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4676edc0395584c9a8c396930f2db3d6ffd99eba
Reviewed-on: https://review.whamcloud.com/38798
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13581 build: xarray and lockdep_is_held const clash 50/39150/4
Shaun Tancheff [Sun, 28 Jun 2020 03:06:28 +0000 (22:06 -0500)]
LU-13581 build: xarray and lockdep_is_held const clash

xarray support added to lustre breaks building with RHEL
debug kernels. The root cause is a change in
the signature of lock_is_held when CONFIG_LOCKDEP is enabled.

Provide a workaround when the const mismatch conditions
exist to enable building RHEL debug kernel packages.

Also narrow the test for xarray support to explicitly require
xa_is_value be defined to protect against relying on
incomplete xarray support.

The same xarray issue is present in MOFED 5, so the same
scheme is used to keep the lock_is_held change, which requires a
const parameter, from breaking the build.

Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Icd51cfb111be6b30adf6f720fb680459ca8cf5b4
Reviewed-on: https://review.whamcloud.com/39150
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13723 lustre: Convert ERR_PTR(PTR_ERR()) to ERR_CAST() 04/39204/3
Arshad Hussain [Sat, 27 Jun 2020 12:25:18 +0000 (17:55 +0530)]
LU-13723 lustre: Convert ERR_PTR(PTR_ERR()) to ERR_CAST()

This patch converts ERR_PTR(PTR_ERR()) to ERR_CAST().
This fixes a warning reported by coccinelle.

It also updates contrib/scripts/spelling.txt to catch
any future misuse of ERR_PTR(PTR_ERR())
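
A userspace sketch of what the conversion looks like.  The ERR_PTR(),
PTR_ERR(), IS_ERR() and ERR_CAST() definitions below are simplified
imitations of the kernel's linux/err.h helpers, and the lookup
functions and struct names are hypothetical:

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified imitations of the kernel helpers, for illustration only. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Re-types an error pointer without a round trip through an integer. */
static inline void *ERR_CAST(const void *ptr)
{
	return (void *)ptr;
}

struct inode_sketch { int ino; };
struct dentry_sketch { int id; };

/* Hypothetical lookup: only inode number 1 exists. */
static struct inode_sketch *lookup_inode(int ino)
{
	static struct inode_sketch the_inode = { 1 };

	if (ino != 1)
		return ERR_PTR(-ENOENT);
	return &the_inode;
}

/* Propagates an error from one pointer type to another. */
static struct dentry_sketch *lookup_dentry(int ino)
{
	static struct dentry_sketch the_dentry = { 1 };
	struct inode_sketch *inode = lookup_inode(ino);

	if (IS_ERR(inode))
		return ERR_CAST(inode); /* was: ERR_PTR(PTR_ERR(inode)) */
	return &the_dentry;
}
```

Both spellings yield the same pointer value; ERR_CAST() only states the
intent more directly.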

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I2214f46c08e5295dd139e7b4b245881cd9f6495a
Reviewed-on: https://review.whamcloud.com/39204
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-13693 lfs: check early for MDS_OPEN_DIRECTORY 59/39159/4
John L. Hammond [Tue, 23 Jun 2020 18:00:55 +0000 (13:00 -0500)]
LU-13693 lfs: check early for MDS_OPEN_DIRECTORY

In mdt_reint_open() check earlier for MDS_OPEN_DIRECTORY/O_DIRECTORY
to avoid breaking leases used by lfs mirror when calling lfs
getstripe. Add a multi client version of sanity test_210 to sanityn.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I1860fc76c8014da3e637d83b487cb28b037ba71b
Reviewed-on: https://review.whamcloud.com/39159
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-9859 libcfs: don't call unshare_fs_struct() 32/39132/3
Mr NeilBrown [Sun, 7 Jun 2020 23:24:25 +0000 (19:24 -0400)]
LU-9859 libcfs: don't call unshare_fs_struct()

A kthread runs with the same fs_struct as init.
It is only helpful to unshare this if the thread
will change one of the fields in the fs_struct:
 root directory
 current working directory
 umask.

No lustre kthread changes any of these, so there is
no need to call unshare_fs_struct().

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7309b6ed184b14a272bad7dc5149ad36281f948e
Reviewed-on: https://review.whamcloud.com/39132
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12678 socklnd: convert various refcounts to refcount_t 30/39130/2
Mr NeilBrown [Sun, 7 Jun 2020 23:24:27 +0000 (19:24 -0400)]
LU-12678 socklnd: convert various refcounts to refcount_t

Each of these refcounts exactly follows the expectations of
refcount_t, so change the atomic_t to refcount_t.

We can remove the LASSERTs on incref/decref as they can now be enabled
at build time with CONFIG_REFCOUNT_FULL

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I50f13465b588c30a70568a5800619cc7ec26293d
Reviewed-on: https://review.whamcloud.com/39130
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12678 socklnd: use list_for_each_entry_safe() 29/39129/2
Mr NeilBrown [Sun, 7 Jun 2020 23:24:28 +0000 (19:24 -0400)]
LU-12678 socklnd: use list_for_each_entry_safe()

Several loops use list_for_each_safe(), then call list_entry() as
the first step.  These can be merged using list_for_each_entry_safe().

In one case, the 'safe' version is clearly not needed, so just use
list_for_each_entry().
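
The shape of the conversion, sketched in userspace.  The list macros
below are cut-down imitations of the kernel's linux/list.h (they rely
on the GNU __typeof__ extension), and struct conn and the drop_odd_*()
functions are hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace imitation of the kernel's circular linked list. */
struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
	     pos = n, n = pos->next)

#define list_for_each_entry_safe(pos, n, head, member) \
	for (pos = list_entry((head)->next, __typeof__(*pos), member), \
	     n = list_entry(pos->member.next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Hypothetical list element. */
struct conn {
	int id;
	struct list_head link;
};

static void conn_add(struct list_head *conns, int id)
{
	struct conn *c = malloc(sizeof(*c));

	if (c == NULL)
		abort();
	c->id = id;
	list_add_tail(&c->link, conns);
}

/* Before: iterate over list_head pointers, then convert by hand. */
static int drop_odd_old(struct list_head *conns)
{
	struct list_head *pos, *n;
	int dropped = 0;

	list_for_each_safe(pos, n, conns) {
		struct conn *c = list_entry(pos, struct conn, link);

		if (c->id & 1) {
			list_del(&c->link);
			free(c);
			dropped++;
		}
	}
	return dropped;
}

/* After: the iterator yields the containing structure directly. */
static int drop_odd_new(struct list_head *conns)
{
	struct conn *c, *n;
	int dropped = 0;

	list_for_each_entry_safe(c, n, conns, link) {
		if (c->id & 1) {
			list_del(&c->link);
			free(c);
			dropped++;
		}
	}
	return dropped;
}
```

The "safe" form is needed here because entries are deleted while the
list is walked; a loop that never removes the current entry can use the
plain list_for_each_entry() form instead.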

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I52eb3598cf4308dfac0aad2a493c4b91c4102e7d
Reviewed-on: https://review.whamcloud.com/39129
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12678 socklnd: use need_resched() 28/39128/2
Mr NeilBrown [Sun, 7 Jun 2020 23:24:29 +0000 (19:24 -0400)]
LU-12678 socklnd: use need_resched()

Rather than using a counter to decide when to drop the lock and see if
we need to reschedule we can use need_resched(), which is a precise
test instead of a guess.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If13871a4a4c57ca87cbb1e22af85cb7fd24ab006
Reviewed-on: https://review.whamcloud.com/39128
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12678 o2iblnd: Use list_for_each_entry_safe 26/39126/2
Mr NeilBrown [Sun, 7 Jun 2020 23:24:31 +0000 (19:24 -0400)]
LU-12678 o2iblnd: Use list_for_each_entry_safe

Several loops use list_for_each_safe(), then call
list_entry() as the first step.
These can be merged using list_for_each_entry_safe().

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I01ba77b98bd4863fa37cbbe6b4072ac2513e5821
Reviewed-on: https://review.whamcloud.com/39126
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12678 o2iblnd: use need_resched() 25/39125/2
Mr NeilBrown [Sun, 7 Jun 2020 23:24:32 +0000 (19:24 -0400)]
LU-12678 o2iblnd: use need_resched()

Rather than using a counter to decide when to drop
the lock and see if we need to reschedule we can
use need_resched(), which is a precise test instead of a guess.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I43a1d9d0963622953761f25e13bc4781c2b02be2
Reviewed-on: https://review.whamcloud.com/39125
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12768 o2iblnd: wait properly for fps->increasing. 24/39124/2
Mr NeilBrown [Sun, 7 Jun 2020 23:24:33 +0000 (19:24 -0400)]
LU-12768 o2iblnd: wait properly for fps->increasing.

If we need to allocate a new fmr_pool and another thread is currently
allocating one, we call schedule() and then try again.  This can spin,
consuming a CPU and wasting power.

Instead, use wait_var_event() and wake_up_var() to
wait for fps_increasing to be cleared.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I16210fc6904d7605f4671f5edfa2f490526c3a16
Reviewed-on: https://review.whamcloud.com/39124
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 months agoLU-12678 o2iblnd: Use ib_mtu_int_to_enum() 23/39123/2
Mr NeilBrown [Sun, 7 Jun 2020 23:24:34 +0000 (19:24 -0400)]
LU-12678 o2iblnd: Use ib_mtu_int_to_enum()

Rather than bespoke code for converting an MTU into the enum,
use ib_mtu_int_to_enum().
This has slightly different behaviour for invalid values,
but those are caught when the parameter is set.
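
For readers unfamiliar with the helper, the sketch below mimics in
userspace the kind of conversion involved: a byte count is mapped to an
MTU enum, and values that are not an exact supported size are rounded
down to the nearest supported one.  The rounding rule is an assumption
about the kernel helper's behaviour and all names are hypothetical;
the kernel's rdma/ib_verbs.h holds the authoritative definition.

```c
/* Enum values follow the InfiniBand MTU encoding (1 = 256 ... 5 = 4096). */
enum mtu_sketch {
	MTU_SKETCH_256 = 1,
	MTU_SKETCH_512 = 2,
	MTU_SKETCH_1024 = 3,
	MTU_SKETCH_2048 = 4,
	MTU_SKETCH_4096 = 5,
};

/* Map a size in bytes to the largest supported MTU not exceeding it. */
static enum mtu_sketch mtu_sketch_from_int(int mtu)
{
	if (mtu >= 4096)
		return MTU_SKETCH_4096;
	if (mtu >= 2048)
		return MTU_SKETCH_2048;
	if (mtu >= 1024)
		return MTU_SKETCH_1024;
	if (mtu >= 512)
		return MTU_SKETCH_512;
	/* Anything smaller, including invalid values, maps to the minimum. */
	return MTU_SKETCH_256;
}
```

Under this rule an out-of-range value is never reported as an error,
which is why such values have to be rejected when the parameter is set.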

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9ba4230fd1aad2c6c59233bac9558870cb1ffeda
Reviewed-on: https://review.whamcloud.com/39123
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>