Whamcloud - gitweb
fs/lustre-release.git
14 months ago LU-11304 misc: update all url links to whamcloud 94/33094/2
Wang Shilong [Thu, 30 Aug 2018 11:46:36 +0000 (19:46 +0800)]
LU-11304 misc: update all url links to whamcloud

Even though old links redirect to whamcloud automatically,
we should update them to the new whamcloud links to
avoid any further confusion.

Test-parameters: trivial
Change-Id: Ida7161a062d822141bf0c1fdf20b2098a21ea9e7
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/33094
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
14 months ago LU-11279 lod: reset ostlist properly in lod_get_default_lov_striping 69/33069/7
Wang Shilong [Fri, 24 Aug 2018 06:49:06 +0000 (14:49 +0800)]
LU-11279 lod: reset ostlist properly in lod_get_default_lov_striping

The ostlist might have been allocated previously, so we should
reset it properly; otherwise it will pollute the new
default setting and cause unexpected behavior.

Test-Parameters: trivial testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity,sanity,sanity
Change-Id: I9b7acb5f05ec4b371da99f68b9647f0b75cd7021
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/33069
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11006 lnet: fix show peer yaml tree with no peer 20/32320/2
Sonia Sharma [Tue, 8 May 2018 04:08:03 +0000 (21:08 -0700)]
LU-11006 lnet: fix show peer yaml tree with no peer

When no peer exists, the root created for the peer
yaml tree should be deleted, and lnetctl peer show
should not display anything.

Currently lnetctl peer show shows the root string "peer"
even when there is no peer. This creates issues when
starting lnet using an /etc/lnet.conf derived from the
existing configuration.

Change-Id: Ie310a49e60386b579b48898b032467b1bc112da9
Test-Parameters: trivial
Signed-off-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-on: https://review.whamcloud.com/32320
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-8130 libcfs: prepare rhashtable support 02/32102/27
James Simmons [Sun, 19 Aug 2018 13:25:52 +0000 (09:25 -0400)]
LU-8130 libcfs: prepare rhashtable support

Linux has a resizable hashtable implementation in lib,
so we should use that instead of having one in libcfs.
In the process we gain lockless lookup, which should be
a performance boost. All modern distributions that Lustre
supports have rhashtable support, but a few pieces are
missing for systems running a 4.4 kernel. The other
target platforms have the full implementation we need.

Test-Parameters: trivial

Change-Id: I63d5b7dae9d52eed12dbefed8ca6062af33efd30
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32102
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months ago LU-10654 lnet: lnetctl doesn't error out on mistyped options 65/31265/3
Sonia Sharma [Sun, 11 Feb 2018 18:02:18 +0000 (10:02 -0800)]
LU-10654 lnet: lnetctl doesn't error out on mistyped options

Running an lnetctl command to add/delete a peer/net/route should
error out on mistyped options.

This patch adds the changes in lnetctl.c to make lnetctl
error out on mistyped options.
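
The behaviour described above can be illustrated with a minimal sketch. It is not the code in lnetctl.c: it only assumes that options are parsed with getopt_long(3), and the helper name demo_parse_peer_opts and the two options it accepts are stand-ins chosen for this example.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical helper: returns 0 when every option is recognised and
 * -1 as soon as getopt_long() reports an unknown or incomplete option. */
int demo_parse_peer_opts(int argc, char *const argv[])
{
	static const struct option long_opts[] = {
		{ "prim_nid", required_argument, NULL, 'k' },
		{ "nid",      required_argument, NULL, 'n' },
		{ NULL, 0, NULL, 0 }
	};
	int opt;

	optind = 1;	/* restart scanning so the helper can be called again */
	opterr = 0;	/* the caller reports errors; keep getopt quiet */

	while ((opt = getopt_long(argc, argv, "k:n:", long_opts, NULL)) != -1) {
		switch (opt) {
		case 'k':
		case 'n':
			break;		/* a real tool would record optarg here */
		default:		/* '?': mistyped or incomplete option */
			return -1;
		}
	}
	return 0;
}
```

In this sketch a command line containing "--nidd" is rejected, whereas silently ignoring the '?' return value would let the typo pass unnoticed.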

Change-Id: Ib8ae54bea919d6ff235b4ca3a23807a809f8962c
Test-Parameters: trivial
Signed-off-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-on: https://review.whamcloud.com/31265
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11273 lnet: update logging 44/33044/2
Amir Shehata [Tue, 21 Aug 2018 19:29:27 +0000 (12:29 -0700)]
LU-11273 lnet: update logging

Add the retry count when logging message sending/resending.
Make timed out responses visible on net error.
Log cases when a message is not resent.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0908d495c8ba54754fa77b0fc3b5df59317bb2e8
Reviewed-on: https://review.whamcloud.com/33044
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11195 lod: Mark comps cached on replay of layout change 04/32904/2
Ann Koehler [Mon, 30 Jul 2018 21:02:59 +0000 (16:02 -0500)]
LU-11195 lod: Mark comps cached on replay of layout change

Replay of a layout change request on a PFL file leaves the object
in an unexpected state: Some components can have llc_stripe set
but ldo_comp_cached is not set in the object. The next layout
change request on the same object will LBUG when it tries to free
the comp entries.

The fix is to set ldo_comp_cached on replay so subsequent layout
change requests will use the in memory components rather than
fetching them from disk.

Signed-off-by: Ann Koehler <amk@cray.com>
Change-Id: I8eaee5614c7f2f6e6a3f2c51de93a65422a3122b
Reviewed-on: https://review.whamcloud.com/32904
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-10818 obdecho: don't set ma_need in echo_attr_get_complex() 97/33097/2
Nikitas Angelinas [Fri, 31 Aug 2018 08:04:18 +0000 (11:04 +0300)]
LU-10818 obdecho: don't set ma_need in echo_attr_get_complex()

echo_attr_get_complex() copies ma_need to a local variable, masks
MA_* values other than MA_INODE if MA_INODE is set in ma_need,
and restores the saved value of ma_need before the function exits.
This does not seem to be useful, and triggers an assertion in
echo_big_lmm_get() when MA_LOV and/or MA_LMV is set in ma_need.

Signed-off-by: Nikitas Angelinas <nangelinas@cray.com>
Cray-bug-id: LUS-6252
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Change-Id: I3f5a01b57bdd83937f19fd1fa392b53f7b316455
Reviewed-on: https://review.whamcloud.com/33097
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11057 obd: check '-o network' and peer discovery conflict 62/32562/5
Sebastien Buisson [Fri, 25 May 2018 16:15:25 +0000 (01:15 +0900)]
LU-11057 obd: check '-o network' and peer discovery conflict

"-o network=net" client mount option is not taken into account
when LNet dynamic peer discovery is active.
Check whether LNet dynamic peer discovery is active on the local node. If it
is, return an error if the "-o network=net" option is specified.

This patch will have to be reverted when the incompatibility between
"-o network=net" client mount option and LNet dynamic peer discovery
is resolved.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I0520e58b22b7adecf797fbd351506c2f8712dc85
Reviewed-on: https://review.whamcloud.com/32562
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11272 lnet: router handling 43/33043/3
Amir Shehata [Tue, 21 Aug 2018 19:23:26 +0000 (12:23 -0700)]
LU-11272 lnet: router handling

Re-create the md and mdh if the router checker ping times out.
When re-transmitting a message do so even if the peer is marked down
to fulfill the message's retry quota.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I7b2a1ec6602dac9a112f4d318b0512f68f923969
Reviewed-on: https://review.whamcloud.com/33043
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11271 lnd: conditionally set health status 42/33042/2
Amir Shehata [Tue, 21 Aug 2018 19:15:30 +0000 (12:15 -0700)]
LU-11271 lnd: conditionally set health status

For specific error scenarios a more accurate health status is set
per transmit. These statuses shouldn't be overwritten in
kiblnd_txlist_done().

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I6c3ad6660aa654d32e823b29ebe3aedb9fc5508e
Reviewed-on: https://review.whamcloud.com/33042
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11275 llite: check truncate race for DOM pages 87/33087/2
Mikhail Pershin [Tue, 28 Aug 2018 10:06:21 +0000 (13:06 +0300)]
LU-11275 llite: check truncate race for DOM pages

In ll_dom_finish_open() check that the vmpage mapping still
exists after locking, and exit otherwise. The mapping can
be gone if the page has been truncated concurrently.
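
The lock-then-recheck pattern described above can be sketched in user space as follows. This is an illustration under stated assumptions, not the code in ll_dom_finish_open(): struct demo_page, struct demo_mapping and the demo_* helpers are hypothetical stand-ins for the kernel page, its address space and the page lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a VM page and its owning mapping. */
struct demo_mapping { int unused; };
struct demo_page {
	struct demo_mapping *mapping;	/* NULL once the page is truncated */
	bool locked;
};

static void demo_lock_page(struct demo_page *p)   { p->locked = true; }
static void demo_unlock_page(struct demo_page *p) { p->locked = false; }

/* Returns true if the page was still attached and could be filled,
 * false if a concurrent truncate had already detached it. */
bool demo_fill_page(struct demo_page *p)
{
	demo_lock_page(p);
	if (p->mapping == NULL) {	/* truncated before we got the lock */
		demo_unlock_page(p);
		return false;
	}
	/* ... a real implementation would fill the page here ... */
	demo_unlock_page(p);
	return true;
}
```

The point of the sketch is the order of operations: the mapping is only trusted after the lock is held, and the lock is dropped on both paths.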

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib6ef551673a40ad99baaa9bd620225c65ce34454
Reviewed-on: https://review.whamcloud.com/33087
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months ago LU-11282 osd-zfs: drop cache immediately 72/33072/7
Alex Zhuravlev [Fri, 24 Aug 2018 10:43:43 +0000 (14:43 +0400)]
LU-11282 osd-zfs: drop cache immediately

Drop the cache immediately if this is requested via:
  lctl set_param osd-zfs.*.readcache_max_filesize=<bytes>

Dropping the cache at read is almost free, but may take a few
cycles at write as we have to find the corresponding dbufs.

Change-Id: I107fc1bf5a8d7655da4054048ff07d3dffa9d4d8
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33072
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-9855 obdclass: simplify md_stats code 22/32822/12
James Simmons [Fri, 10 Aug 2018 16:19:30 +0000 (12:19 -0400)]
LU-9855 obdclass: simplify md_stats code

The md_stats code is layered in many levels of macros that make
the code difficult to read and can introduce undetected
errors. This peels away the macro wrappers by replacing them with
the function lprocfs_exp_count_increment(), which doesn't care
about the order of the function pointers in struct md_ops. The
other change is to the macros used for initializing the counters:
lprocfs_init_mps_stats() is replaced with very simple
handling in lprocfs_alloc_md_stats().
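
One way an order-independent counter can look is sketched below. This is only an illustration of the idea and not the implementation of lprocfs_exp_count_increment(): the enum, the struct and the function name are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical operation identifiers; an explicit enum keeps the counter
 * index independent of how any table of function pointers is laid out. */
enum demo_md_op {
	DEMO_MD_GETATTR,
	DEMO_MD_SETATTR,
	DEMO_MD_UNLINK,
	DEMO_MD_LAST
};

struct demo_md_stats {
	unsigned long count[DEMO_MD_LAST];
};

/* Bump the counter for one operation; silently ignore bad input. */
void demo_count_increment(struct demo_md_stats *stats, enum demo_md_op op)
{
	if (stats == NULL || (unsigned int)op >= DEMO_MD_LAST)
		return;
	stats->count[op]++;
}
```

Because the index is an explicit identifier, reordering the members of an operations table cannot silently shift which counter gets incremented.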

Change-Id: I036ce4518ffb08d53e2d27bcdea564a4c799181d
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32822
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11110 ofd: remove obdfilter.*.* symlinks in few releases 52/32752/5
Emoly Liu [Mon, 2 Jul 2018 06:53:00 +0000 (14:53 +0800)]
LU-11110 ofd: remove obdfilter.*.* symlinks in few releases

Add a #if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(2, 14, 53, 0) check
around the obdfilter.*.* symlinks creation code to keep them in
place for another few releases, so that old test scripts that use
them will not break, then remove them.
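
The effect of such a version guard can be sketched as below. The patch itself uses a compile-time #if; this sketch uses a runtime comparison so it can be exercised directly. The DEMO_VERSION macro and its one-byte-per-field packing are assumptions made for the example, not a reproduction of OBD_OCD_VERSION.

```c
#include <stdint.h>

/* Assumed packing: one byte each for major, minor, patch and fix. */
#define DEMO_VERSION(major, minor, patch, fix) \
	(((uint32_t)(major) << 24) | ((uint32_t)(minor) << 16) | \
	 ((uint32_t)(patch) << 8) | (uint32_t)(fix))

/* Returns 1 while the compatibility symlinks should still be created. */
int demo_keep_symlinks(uint32_t current_version)
{
	return current_version < DEMO_VERSION(2, 14, 53, 0);
}
```

With this packing, versions compare in the natural order, so a 2.11.55.0 build keeps the symlinks and a 2.14.53.0 or later build does not.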

Change-Id: I703c7ec3af8434b0de8b7cbed19c2c32611f6b18
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32752
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11075 ldlm: correct logic in ldlm_prepare_lru_list() 60/32660/2
John L. Hammond [Thu, 7 Jun 2018 17:08:42 +0000 (12:08 -0500)]
LU-11075 ldlm: correct logic in ldlm_prepare_lru_list()

In ldlm_prepare_lru_list() fix an (x != a || x != b) type error and
correct a use after free.
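
For readers unfamiliar with this class of bug, the sketch below shows why an (x != a || x != b) test is always true and what a common correction looks like. The enum and function names are hypothetical and are not taken from ldlm_prepare_lru_list(); whether the actual fix used this exact form is not stated in the message.

```c
#include <stdbool.h>

enum demo_state { DEMO_A, DEMO_B, DEMO_C };

/* Buggy form: no value equals both DEMO_A and DEMO_B,
 * so at least one inequality always holds. */
bool demo_is_other_buggy(enum demo_state x)
{
	return x != DEMO_A || x != DEMO_B;
}

/* Likely intended form: true only when x is neither DEMO_A nor DEMO_B. */
bool demo_is_other_fixed(enum demo_state x)
{
	return x != DEMO_A && x != DEMO_B;
}
```

Here demo_is_other_buggy(DEMO_A) is true even though DEMO_A is one of the excluded values, while demo_is_other_fixed(DEMO_A) is false.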

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I4e34e531260295805c4461e7d8d98675400f1148
Reviewed-on: https://review.whamcloud.com/32660
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-10365 tests: set proper paths headers for sanity test 400a 37/31737/8
James Simmons [Tue, 21 Aug 2018 18:39:57 +0000 (14:39 -0400)]
LU-10365 tests: set proper paths headers for sanity test 400a

For the case when /usr/include/lustre doesn't exist sanity test
400a attempts to use the lustre user land headers located in the
source tree. Some of the lustre user land headers are wrappers
around the UAPI headers so we need to include those paths as well.

A test move was done in the linux kernel that moved the UAPI headers
to their proper place. Errors were reported mainly due to
linux/types.h being missing. This could be the reason Ubuntu18 fails
the sanity 400a test.

Test-Parameters: trivial clientdistro=ubuntu1604 testlist=sanity envdefinitions=ONLY=400a

Change-Id: If17da7d9fc4cedb3b9c18feaafbee47d1f94d49b
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/31737
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-6142 obdclass: Fix style issues for lu_ref.c 81/33081/4
Arshad Hussain [Sat, 25 Aug 2018 22:00:07 +0000 (03:30 +0530)]
LU-6142 obdclass: Fix style issues for lu_ref.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/lu_ref.c

Change-Id: I8733fcac454685704b327219ba4afb096d3943c3
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/33081
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months ago LU-6142 obdclass: Fix style issues for lustre_handles.c 80/33080/2
Arshad Hussain [Sat, 25 Aug 2018 21:39:24 +0000 (03:09 +0530)]
LU-6142 obdclass: Fix style issues for lustre_handles.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/lustre_handles.c

Change-Id: I6e6ad8c56e225dcdd3707bf5f3b233eda3f90320
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/33080
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months ago LU-6142 obdclass: Fix style issues for lustre_peer.c 79/33079/3
Arshad Hussain [Sat, 25 Aug 2018 18:25:59 +0000 (23:55 +0530)]
LU-6142 obdclass: Fix style issues for lustre_peer.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/lustre_peer.c

Change-Id: I6cf95dfdd709974cae62626ac50a3507588f425d
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/33079
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months ago LU-11255 kernel: kernel update [SLES12 SP3 4.4.143-94.47] 65/33065/3
Jian Yu [Fri, 24 Aug 2018 06:36:34 +0000 (23:36 -0700)]
LU-11255 kernel: kernel update [SLES12 SP3 4.4.143-94.47]

Update SLES12 SP3 kernel to 4.4.143-94.47.

Test-Parameters: mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
clientdistro=sles12sp3 ossdistro=sles12sp3 mdsdistro=sles12sp3 \
testgroup=review-ldiskfs

Change-Id: I8b2c99c9a65149f1b149fa91351970034d6f7a47
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33065
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11125 ofd: decrease message level 85/32985/5
Mikhail Pershin [Mon, 13 Aug 2018 14:34:30 +0000 (17:34 +0300)]
LU-11125 ofd: decrease message level

The "destroys_in_progress already cleared" message
in ofd_create_hdl() may be the result of high load on the OST
server prior to failover. It is not an error, so decrease
its level from D_ERROR to D_HA.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id5142672a61244a6362be3778d0769baafc87b86
Reviewed-on: https://review.whamcloud.com/32985
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
14 months ago LU-10509 mdd: don't set size attr for DOM file 08/33008/2
Mikhail Pershin [Wed, 15 Aug 2018 21:15:03 +0000 (00:15 +0300)]
LU-10509 mdd: don't set size attr for DOM file

When a client does a truncate it calls ll_md_setattr() followed by
ll_setattr_ost() to set the size on the OSTs. With a DOM file that causes
a setattr on the MDT first, including the size, then a PUNCH RPC on the
same object. That was considered a non-optimal situation, and
LU-11033 is intended to improve it, but with ZFS there is a
check in the OSD which does no truncate if the size is already the same.
Therefore the real file blocks are not actually truncated, so a sparse
write beyond the end of file will get old data in the hole instead of
zeroes.

This quick patch checks whether mdd_attr_set() is going to set the SIZE
attr for a DOM file and clears the LA_SIZE bit, assuming there will be a
truncate.

The complete solution for this will be implemented under LU-11033.
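
The bit-clearing step described in this message can be sketched as follows. The flag values and the function name are hypothetical; the real LA_* constants and the logic in mdd_attr_set() are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values standing in for the LA_* attribute bits. */
#define DEMO_LA_MODE	(1u << 0)
#define DEMO_LA_SIZE	(1u << 1)
#define DEMO_LA_MTIME	(1u << 2)

/* Returns the attribute mask to apply: for a DOM file the size bit is
 * dropped, on the assumption that a truncate (punch) will follow. */
uint32_t demo_filter_valid(uint32_t valid, bool is_dom_file)
{
	if (is_dom_file)
		valid &= ~DEMO_LA_SIZE;
	return valid;
}
```

All other bits pass through unchanged, so a setattr that also updates the mode or mtime still applies those attributes.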

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I47873dccf4270e5f0338f7b6696aa5969cfb9444
Reviewed-on: https://review.whamcloud.com/33008
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-10018 protocol: MDT as a statfs proxy 36/29136/91
Alex Zhuravlev [Thu, 21 Sep 2017 15:24:18 +0000 (18:24 +0300)]
LU-10018 protocol: MDT as a statfs proxy

The MDT can act as a proxy for statfs data. This should
make df faster (RTT vs RTT*(#MDTs+1)) and enable
idling connections so that clients don't connect to
each OST just to report statfs data. The protocol
has been changed slightly to let the MDT differentiate
its own and aggregated statfs.

Also, obd_statfs has got a new field "granted" where
the OST reports how much space has been granted to the
requesting MDT so that space can be added to the available
space.

The client's NID is used to distribute MDS_STATFS among
the MDTs.
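
The latency comparison and the "granted" adjustment mentioned above can be worked through with a small sketch. It assumes the round trips in the non-proxy case are issued one after another, and every name in it is hypothetical rather than part of the statfs protocol.

```c
/* Latency of df when the client pays RTT*(#MDTs+1):
 * one round trip per MDT plus one more. */
unsigned long demo_df_usec_direct(unsigned long rtt_usec, unsigned int nr_mdts)
{
	return rtt_usec * (nr_mdts + 1UL);
}

/* Latency when a single MDT answers with aggregated data: one round trip. */
unsigned long demo_df_usec_proxy(unsigned long rtt_usec)
{
	return rtt_usec;
}

/* Space already granted to the MDT is added back to the available space. */
unsigned long long demo_available(unsigned long long avail,
				  unsigned long long granted)
{
	return avail + granted;
}
```

For example, with a 500 microsecond round trip and 4 MDTs, the direct case costs 2500 microseconds under these assumptions, while the proxied case costs 500.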

Change-Id: I59e03cb5abf809ae8820f874ec51dd2b74e1806c
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/29136
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-8708 osc: enable/disable OSC grant shrink 03/23203/18
Bobi Jam [Mon, 17 Oct 2016 09:50:41 +0000 (17:50 +0800)]
LU-8708 osc: enable/disable OSC grant shrink

Add an OSC proc interface to enable/disable client's grant shrink
feature.

lctl get_param osc.*.grant_shrink
lctl set_param osc.*.grant_shrink={0,1}

Change-Id: I7974b3bf1c4f9c294dd0d4871d09b1a2e45a8d78
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/23203
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11212 lod: preserve mirror ID on mirror extension 38/32938/7
Bobi Jam [Tue, 12 Jun 2018 11:28:16 +0000 (19:28 +0800)]
LU-11212 lod: preserve mirror ID on mirror extension

When merging/expanding existing mirrors of a FLR file, we need to keep
each existing mirror's mirror ID.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: If139076c37c33bb1a330e1a5e997f8f56015fd9a
Reviewed-on: https://review.whamcloud.com/32938
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11238 lod: refine obj avoid collect for FLR 95/32995/4
Bobi Jam [Tue, 14 Aug 2018 03:12:12 +0000 (11:12 +0800)]
LU-11238 lod: refine obj avoid collect for FLR

When a FLR file is being created, the MDS tries to allocate objects
for the first components of all mirrors. In this declare phase
the objects for those components have been allocated, but each
component's ID and init flag are not set until the exec phase,
lod_create()->lod_striped_create(), so lod_collect_avoidance() should
take heed of this scenario.

This patch also adds some debug messages.

Test-Parameters: testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I16ef2da44f6db06a8e0bc67ae2646cdc3ff3bb63
Reviewed-on: https://review.whamcloud.com/32995
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11259 test: correct fail_loc names in replay-{single,dual} 15/33015/2
John L. Hammond [Thu, 16 Aug 2018 14:37:23 +0000 (09:37 -0500)]
LU-11259 test: correct fail_loc names in replay-{single,dual}

Some comments in replay-single and replay-dual confusingly referred to
OBD_FAIL_OUT_UPDATE_NET_REP as OBD_FAIL_OBJ_UPDATE_NET_REP or
OBD_FAIL_UPDATE_OBJ_NET_REP. Correct these.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib724e8151ba0ea34a5dacf2f148673a52dc37824
Reviewed-on: https://review.whamcloud.com/33015
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
14 months ago LU-8066 tests: don't access /proc/sys/lnet/debug directly 05/33005/2
Andreas Dilger [Wed, 15 Aug 2018 17:39:15 +0000 (11:39 -0600)]
LU-8066 tests: don't access /proc/sys/lnet/debug directly

In replay-single test_70e use "lctl set_param" to set the debug mask
rather than writing into the /proc/sys/lnet/debug file directly, since
this tunable moved to sysfs in commit v2_10_51_0-12-g7092309f32.

Clean up the test code style in test_70e as well.

Test-Parameters: trivial testlist=replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I65c2bd9643fc6fc54a5de7b6404d316c0ff12537
Reviewed-on: https://review.whamcloud.com/33005
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11240 gnilnd: Replace KGNILND_BUILD_REV 90/32990/2
Chris Horn [Tue, 5 Jun 2018 19:32:32 +0000 (14:32 -0500)]
LU-11240 gnilnd: Replace KGNILND_BUILD_REV

The current format of the gnilnd version string causes a compilation
error. Since gnilnd doesn't really need its own version string we just
replace it with LUSTRE_VERSION_STRING.

Cray-bug-id: LUS-6072
Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I6f45df2566853a6f4c2078cf72c7eac7a52f3fad
Reviewed-on: https://review.whamcloud.com/32990
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Chuck Fossen <chuckf@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11135 mdt: LASSERT(lu_object_exists(o)) fails 03/32803/4
Andriy Skulysh [Thu, 3 May 2018 10:07:22 +0000 (13:07 +0300)]
LU-11135 mdt: LASSERT(lu_object_exists(o)) fails

mdt_object_find() can return a valid nonexisting object.
Its return value needs to be checked additionally for existence.

Change-Id: Ib1f5bd5289a69e29437db520706591929bf55830
Cray-bug-id: LUS-6192
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/32803
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11056 lwp: fix lwp reconnection issue 36/32536/6
Hongchao Zhang [Thu, 24 May 2018 20:09:27 +0000 (16:09 -0400)]
LU-11056 lwp: fix lwp reconnection issue

After the OST or MDT is restarted, the lwp reconnection can
fail with -EALREADY because the connect count in the connection
request is less than the value saved in the corresponding export
at MDT0000, which could cause the system to hang.

The patch also changes lustre_lwp_connect to use OBD_CONNECT_MDS_MDS
flag only when the connection is between MDTs.
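
The failure condition described in the first paragraph can be sketched as a simple comparison. This shows only the rejection being described, not the fix, and the function and parameter names are hypothetical rather than taken from the connect handling code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical check: a connect request carrying a connection count lower
 * than the one saved in the export is treated as stale and refused. */
int demo_check_conn_cnt(uint32_t request_cnt, uint32_t export_cnt)
{
	if (request_cnt < export_cnt)
		return -EALREADY;
	return 0;
}
```

A restarted target that begins counting again from a low value would keep hitting the -EALREADY branch in this sketch, which matches the reconnection failure the message describes.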

Change-Id: I9ae7b4faadc65fdaa78458a06315b1739d144feb
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/32536
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-10855 llog: use llog_common_cat_ops 12/31812/3
John L. Hammond [Tue, 27 Mar 2018 17:17:10 +0000 (12:17 -0500)]
LU-10855 llog: use llog_common_cat_ops

Remove changelog_orig_logops, hsm_actions_logops, and
osp_mds_ost_orig_logops, replacing each with llog_common_cat_ops.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ia19337350452f9793b3ea9a56343ef3a065c1f83
Reviewed-on: https://review.whamcloud.com/31812
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-10824 llite: don't use ll_mnt to get fstype name 25/33025/3
James Simmons [Sat, 18 Aug 2018 16:07:12 +0000 (12:07 -0400)]
LU-10824 llite: don't use ll_mnt to get fstype name

Originally lustre would report either 'lustre' or 'llite'
through the fstype proc file. This required us to query struct super_block,
but it's been a very long time since that was the case. This also
removes a direct use of ll_mnt. The fix is simply to report 'lustre'.

Test-Parameters: trivial

Change-Id: Ia766c8e0a027e58a48de8fa6e2756238e20312b2
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33025
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-6142 obdclass: Fix style issues for statfs_pack.c 81/32981/2
Arshad Hussain [Sun, 12 Aug 2018 04:18:54 +0000 (09:48 +0530)]
LU-6142 obdclass: Fix style issues for statfs_pack.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/statfs_pack.c

Change-Id: I7a34dd87875ab049c3339022f3153fb07937021e
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32981
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months ago LU-6142 osd-ldiskfs: Fix style issues for osd_handler.c 18/32818/5
Arshad Hussain [Sat, 14 Jul 2018 15:27:17 +0000 (20:57 +0530)]
LU-6142 osd-ldiskfs: Fix style issues for osd_handler.c

This patch fixes issues reported by checkpatch
for file lustre/osd-ldiskfs/osd_handler.c

Change-Id: Ifd6468acc75b59a4324385c68af1175a74a3c312
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32818
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months ago LU-11266 build: update changelog for Ubuntu 20/33020/3
Minh Diep [Fri, 17 Aug 2018 19:10:08 +0000 (12:10 -0700)]
LU-11266 build: update changelog for Ubuntu

Record the version that we are building

Test-Parameters: trivial

Change-Id: Ib1c2e74774d8a6caa6c3f70814affb53cf8cd22e
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33020
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11009 test: add version check to test_102 40/32340/7
Wei Liu [Wed, 9 May 2018 19:53:11 +0000 (12:53 -0700)]
LU-11009 test: add version check to test_102

Skip test_102 if the server version is less than or equal to 2.9.53.

Test-Parameters:trivial testlist=conf-sanity envdefinitions=ONLY=102 serverjob=lustre-b2_9 serverbuildno=22
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I1964a7a5df8b910652b2fe774703d7b62f953e95
Reviewed-on: https://review.whamcloud.com/32340
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11040 utils: improve mount usage/man page 81/32481/6
Andreas Dilger [Wed, 25 Jul 2018 07:15:40 +0000 (01:15 -0600)]
LU-11040 utils: improve mount usage/man page

Improve the description of the mount.lustre.8 man page and usage:
- provide separate SYNOPSIS for client and server mount commands
- move "acl" option out of general options into server-only options,
  since client option was removed and ACLs are only controlled by MDS
- correct "CLIENT OPTIONS" section to be named "SERVER OPTIONS"
- add checksum, lruresize, lazystatfs, 32bitapi, user_fid2path usage
- mark the default values of the options in the usage message

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I28fe0f13d363e0a26ffcbc1ba9923e4fd35804f0
Reviewed-on: https://review.whamcloud.com/32481
Tested-by: Jenkins
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-11096 osd: wrap new blk integrity stuff 25/32725/9
Alex Zhuravlev [Thu, 9 Aug 2018 22:26:35 +0000 (18:26 -0400)]
LU-11096 osd: wrap new blk integrity stuff

Wrap the new blk integrity code to be able to build Lustre against
kernels with no blk integrity.

Change-Id: I050020e94524f4519fdf46a22f0d847979754291
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32725
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-8066 lov: fix lov.*.stripeoffset printing 19/33019/3
Andreas Dilger [Fri, 17 Aug 2018 18:48:44 +0000 (12:48 -0600)]
LU-8066 lov: fix lov.*.stripeoffset printing

The move of lov.*.stripeoffset from /proc to /sys in commit 3c900918
reverted the printing of stripeoffset from a signed value to an
unsigned value, which is broken for the common value of "-1".  This
was previously fixed in LU-9611 commit f93276d9.
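
The signed-versus-unsigned difference can be illustrated outside the
kernel. The following is a minimal sketch, assuming the offset lives in
a 64-bit field; the helper names are hypothetical and are not Lustre
functions.

```python
import struct


def as_unsigned64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as unsigned (what an unsigned format shows)."""
    (raw,) = struct.unpack("<Q", struct.pack("<q", value))
    return raw


def format_stripe_offset(value: int, signed: bool = True) -> str:
    """Format an offset as signed (readable) or as unsigned (the regression)."""
    return str(value if signed else as_unsigned64(value))


# The common "-1" value is readable only with the signed format.
print(format_stripe_offset(-1, signed=True))
print(format_stripe_offset(-1, signed=False))
```

With the unsigned format the "-1" value shows up as a very large number,
which is hard to recognise as "-1" when read back.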

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib61305ddbf902dd74ac0e16c0c2fe6920052ddf4
Reviewed-on: https://review.whamcloud.com/33019
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11215 tests: replace "large_xattr" with "ea_inode" 12/33012/2
Li Dongyang [Thu, 16 Aug 2018 06:26:12 +0000 (16:26 +1000)]
LU-11215 tests: replace "large_xattr" with "ea_inode"

Change the test scripts over to using the "ea_inode" name, since
this is what the upstream e2fsprogs is using.  The "large_xattr"
feature name was only ever used in the Lustre-patched e2fsprogs.

Don't try to turn off "ea_inode" feature on the targets anymore,
it's not supported by upstream e2fsprogs.

e2fsprogs commit: 5b72578279fe2470e682692a15d70a43d9289e0f

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I83bd303827fa28050d1d6d2416b2d630dc94ec12
Reviewed-on: https://review.whamcloud.com/33012
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months agoLU-4256 test: add lustre-rsync-test 2b to ALWAYS_EXCEPT 06/33006/3
John L. Hammond [Wed, 15 Aug 2018 20:06:16 +0000 (15:06 -0500)]
LU-4256 test: add lustre-rsync-test 2b to ALWAYS_EXCEPT

This test continues to fail at a low rate so disable it.

Test-Parameters: trivial testlist=lustre-rsync-test
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8fe4d039e8edd0552e56ee9451cc05f08cb34c8d
Reviewed-on: https://review.whamcloud.com/33006
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11244 build: apply IB_OPTIONS to debian rules 96/32996/2
Jinshan Xiong [Tue, 14 Aug 2018 03:33:33 +0000 (20:33 -0700)]
LU-11244 build: apply IB_OPTIONS to debian rules

IB_OPTIONS should be honored when building the Debian package.

Signed-off-by: Jinshan Xiong <jinshan.xiong@uber.com>
Change-Id: Ibc16a5428d47f072499c39a62ea457c922ae7352
Reviewed-on: https://review.whamcloud.com/32996
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Thomas Stibor <t.stibor@gsi.de>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Martin Schroeder <martin.h.schroeder@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11227 lod: lod_sync: don't attempt sync to inactive targets 64/32964/5
Robin Humble [Thu, 9 Aug 2018 05:33:04 +0000 (15:33 +1000)]
LU-11227 lod: lod_sync: don't attempt sync to inactive targets

chgrp on a client triggers lod_sync(), which in turn loops over the
OST/MDT targets calling dt_sync(). dt_sync() fails with -ENOTCONN when
targets have been deactivated (i.e. set to active=0). The client retries
indefinitely, causing the client process to hang and generating
considerable MDS network traffic, load, and disk I/O.

The fix is to not attempt dt_sync() on OST/MDT targets that have been
deactivated and also (because of possible races) to ignore connection
errors.

Tested with Lustre 2.10.4.
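
The behaviour described above can be sketched in a few lines. This is an
illustration only, not the lod_sync() code: the Target class and
sync_targets() are hypothetical names, and only the "skip inactive
targets, ignore -ENOTCONN" logic comes from the description.

```python
import errno


class Target:
    """Hypothetical stand-in for an OST/MDT target."""

    def __init__(self, name, active, sync_rc=0):
        self.name = name
        self.active = active
        self.sync_rc = sync_rc
        self.sync_calls = 0

    def sync(self):
        self.sync_calls += 1
        return self.sync_rc


def sync_targets(targets):
    """Sync every active target; return (rc, names of targets synced)."""
    synced = []
    for tgt in targets:
        if not tgt.active:
            continue  # deactivated target: do not attempt a sync at all
        rc = tgt.sync()
        if rc == -errno.ENOTCONN:
            continue  # target went away in a race: ignore the connection error
        if rc != 0:
            return rc, synced  # any other error is still reported
        synced.append(tgt.name)
    return 0, synced
```

Ignoring -ENOTCONN in addition to checking the active flag covers the
case where a target is deactivated between the check and the sync.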

Signed-off-by: Robin Humble <plaguedbypenguins@gmail.com>
Change-Id: I617509cf7944541489f4fd9762c233b771132165
Reviewed-on: https://review.whamcloud.com/32964
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11226 flr: mirror resync regression 68/32968/5
Bobi Jam [Thu, 9 Aug 2018 06:35:49 +0000 (14:35 +0800)]
LU-11226 flr: mirror resync regression

There is a glitch in the lfs mirror resync tool in commit
0e5c12ac29a9622e8ca05d5e39cd5e2a721ace93: the resync write needs to be
restricted to the component's extent.
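
Restricting a write to an extent amounts to intersecting two ranges. A
minimal sketch, assuming the extent is a half-open byte range
[ext_start, ext_end); clamp_write() is a hypothetical helper, not part
of the lfs tool.

```python
from typing import Optional, Tuple


def clamp_write(offset: int, length: int,
                ext_start: int, ext_end: int) -> Optional[Tuple[int, int]]:
    """Return the (offset, length) of the write that falls inside the extent.

    Returns None when the write does not overlap the extent at all.
    """
    start = max(offset, ext_start)
    end = min(offset + length, ext_end)
    if start >= end:
        return None
    return start, end - start
```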

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ifbd3f16b2f621407b31c7fe37ce9745de48fcc99
Reviewed-on: https://review.whamcloud.com/32968
Tested-by: Jenkins
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11146 lustre: fix setstripe for specific osts upon dir 14/32814/16
Wang Shilong [Wed, 11 Jul 2018 14:11:47 +0000 (22:11 +0800)]
LU-11146 lustre: fix setstripe for specific osts upon dir

The LOV_USER_MAGIC_SPECIFIC function is broken and was not
available for setting on a directory.

1) llite doesn't handle the LOV_USER_MAGIC_SPECIFIC case
properly for dir {set,get}_stripe, and the ioctl
LL_IOC_LOV_SETSTRIPE did not allocate a large enough buffer
to copy the OST list from userspace.

2) lod_get_default_lov_striping() did not handle the
LOV_USER_MAGIC_SPECIFIC type, so newly created
files/dirs won't inherit the parent setting properly.

3) there is no test case to cover the lfs setstripe
'-o' interface, which makes it hard to figure out
when this function was broken.

Change-Id: Icc2ee60a474e5e565db12b35a9a38fde65b05bbd
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32814
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-8066 llite: move /proc/fs/lustre/llite/uuid to sysfs 01/32501/9
James Simmons [Sun, 29 Jul 2018 14:34:19 +0000 (10:34 -0400)]
LU-8066 llite: move /proc/fs/lustre/llite/uuid to sysfs

Move uuid file from /proc/fs/lustre/llite/*
to /sys/fs/lustre/llite/*/

This is a modified version of

Linux-commit: ec55a6299990efa969dfc00d95c72444ff1e3461

due to the large amount of changes to the OpenSFS/Intel branch.

Change-Id: I2dc13c248879f554f9f7ed6dc62a6772a59f6f35
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32501
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
14 months agoLU-8215 tests: sanity-benchmark/iozone should wait for space recovery 99/20499/2
Alex Zhuravlev [Mon, 30 May 2016 10:45:51 +0000 (14:45 +0400)]
LU-8215 tests: sanity-benchmark/iozone should wait for space recovery

Otherwise it may fail due to a transient state where the space consumed
by the previous run hasn't been recovered yet. This happens on the tiny
filesystems used in local setups.

Change-Id: I04b3ce096621583629277c1e52c64a1551bc8ace
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/20499
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11201 lfsck: check linkea entry validity 58/32958/2
Lai Siyao [Sun, 22 Jul 2018 21:45:23 +0000 (05:45 +0800)]
LU-11201 lfsck: check linkea entry validity

Invalid linkea data may lead to an infinite loop in linkea iteration.
Check linkea entry validity on unpack and, if the entry is not unpacked,
check the entry length validity.
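
Why a bad length can hang an iterator is easiest to see with a
length-prefixed record walk. The sketch below assumes a simplified
layout (a 2-byte big-endian record length, a 16-byte parent identifier,
then the name); the layout constants and iter_link_entries() are
assumptions for illustration, not the lfsck code.

```python
import struct
from typing import Iterator, Tuple

RECLEN_SIZE = 2  # assumed: big-endian 16-bit record length
FID_SIZE = 16    # assumed: fixed-size parent identifier
MIN_RECLEN = RECLEN_SIZE + FID_SIZE + 1  # header, parent id, one name byte


def iter_link_entries(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (parent_id, name) pairs, validating each record length first."""
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < RECLEN_SIZE:
            raise ValueError("truncated record header at offset %d" % pos)
        (reclen,) = struct.unpack_from(">H", buf, pos)
        # A zero length would never advance pos, and an undersized one
        # cannot hold an entry, so both are rejected.
        if reclen < MIN_RECLEN:
            raise ValueError("record length %d too small at offset %d" % (reclen, pos))
        if pos + reclen > len(buf):
            raise ValueError("record at offset %d overruns the buffer" % pos)
        fid = buf[pos + RECLEN_SIZE:pos + RECLEN_SIZE + FID_SIZE]
        name = buf[pos + RECLEN_SIZE + FID_SIZE:pos + reclen]
        yield fid, name
        pos += reclen
```

Without the length check, a record length of zero leaves the position
unchanged and the loop never terminates.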

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity-lfsck
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8e1890ed64fab38b85149ebbfecce04caaf41e17
Reviewed-on: https://review.whamcloud.com/32958
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11154 llite: use proper flags for FS_IOC_{FSSET,FSGET}XATTR 28/32828/6
Wang Shilong [Wed, 18 Jul 2018 08:30:28 +0000 (16:30 +0800)]
LU-11154 llite: use proper flags for FS_IOC_{FSSET,FSGET}XATTR

Two problems addressed by this patch:

1) struct fsxattr fsx_xflags has its own flag definitions such as
FS_XFLAG_XXX, and we should use the proper conversion macro for them;
here we used the wrong constant for the project inherit flag.

2) FS_XFLAG_PROJINHERIT is not a valid VFS inode flag. Looking at the
current Linux code, local filesystems set the project inherit flag in
their private flags, and we should do a similar thing in Lustre.
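
The two flag namespaces use different bit positions, which is why an
explicit conversion is needed. A minimal sketch covering only the
project inherit flag: FS_XFLAG_PROJINHERIT is the value from
linux/fs.h, while PRIVATE_PROJINHERIT_FL is an assumed name for the
filesystem-private flag (its value here matches FS_PROJINHERIT_FL in
linux/fs.h). The helper names are hypothetical.

```python
FS_XFLAG_PROJINHERIT = 0x00000200    # fsx_xflags bit, from linux/fs.h
PRIVATE_PROJINHERIT_FL = 0x20000000  # assumed filesystem-private flag bit


def xflags_to_private_flags(xflags: int) -> int:
    """Convert fsx_xflags bits to private inode flag bits (this flag only)."""
    flags = 0
    if xflags & FS_XFLAG_PROJINHERIT:
        flags |= PRIVATE_PROJINHERIT_FL
    return flags


def private_flags_to_xflags(flags: int) -> int:
    """Convert private inode flag bits back to fsx_xflags bits (this flag only)."""
    xflags = 0
    if flags & PRIVATE_PROJINHERIT_FL:
        xflags |= FS_XFLAG_PROJINHERIT
    return xflags
```

Passing a value from one namespace where the other is expected silently
tests the wrong bit, which is the kind of mistake described in 1).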

Test-Parameters: trivial testlist=sanity-quota,sanity-quota,sanity-quota
Change-Id: I453db8ed074e8008f0ec145c726d7577121422e6
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32828
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-9120 lnet: LNet Health/Resiliency Feature
Oleg Drokin [Tue, 21 Aug 2018 16:15:26 +0000 (12:15 -0400)]
LU-9120 lnet: LNet Health/Resiliency Feature

The LNet Health/Resiliency feature adds the ability for LNet
to try out different interfaces available to it if message
sending fails. It maintains the health of each remote and local
interface and selects the best interface for sending from and the best
remote interface to send to.

Merge commit '958ef71f33fa925e6657f9902702cd3677e15ec9'

Change-Id: I9ca740654c48d642fe130f98a60c5c59b9b4ebe1
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
14 months agoLU-10686 tests: stop running sanity-pfl test 9 45/32945/2
James Nunez [Mon, 6 Aug 2018 21:26:25 +0000 (15:26 -0600)]
LU-10686 tests: stop running sanity-pfl test 9

sanity-pfl test 9 consistently fails when run on a Lustre
file system with a single MDS. We need to add test 9 to
the ALWAYS_EXCEPT list and, thus, stop running the test
until a fix for the underlying problem can be found.

Test-Parameters: trivial mdscount=1 mdtcount=1 testlist=sanity-pfl
Test-Parameters: mdscount=2 mdtcount=2 testlist=sanity-pfl
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ife4b3c044e2777bb9b9010e0be7c00549a683fdc
Reviewed-on: https://review.whamcloud.com/32945
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11200 libcfs: handle DECLARE_TIMER reduced to two arguments 39/32939/5
James Simmons [Mon, 6 Aug 2018 17:56:55 +0000 (13:56 -0400)]
LU-11200 libcfs: handle DECLARE_TIMER reduced to two arguments

For the Linux kernel there exist two ways to initialize a
struct timer_list. One method is with setup_timer() and the other is
with the DEFINE_TIMER macro. For earlier kernels both methods employed
callbacks with an argument of type unsigned long. In kernels 4.15+
both methods of initialization use a struct timer_list pointer for the
callback argument. During the 4.14 development phase setup_timer()
used struct timer_list as the argument for its callback but
DEFINE_TIMER was still using unsigned long. Additionally, when
DEFINE_TIMER did move to using struct timer_list it reduced the number
of arguments to the macro. This patch handles the 4.14 kernel state of
development for the timer API.

Test-Parameters: trivial

Change-Id: I1c509838153328ed4bbdfa50468a396e13037d50
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32939
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11014 mdc: remove obsolete intent opcodes 61/32361/6
John L. Hammond [Fri, 11 May 2018 17:04:02 +0000 (12:04 -0500)]
LU-11014 mdc: remove obsolete intent opcodes

In enum ldlm_intent_flags, remove the obsolete constants IT_UNLINK,
IT_TRUNC, IT_EXEC, IT_PIN, IT_SETXATTR. Remove any handling code for
these opcodes.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I66f20e4c881cb77a481805a148a33f1c2daa5f0c
Reviewed-on: https://review.whamcloud.com/32361
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-8066 lod: migrate from proc to sysfs 98/32198/6
James Simmons [Sat, 28 Jul 2018 15:54:38 +0000 (11:54 -0400)]
LU-8066 lod: migrate from proc to sysfs

Move the lod module from using proc for most single value files
to sysfs. Create the default attrs for dt_devices which can be
used for other server side devices.

Change-Id: I734f01ef0d9f0c18efc141c835e4cf8ad2365250
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32198
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
14 months agoLU-11121 mdt: take discard lock at cleanup stage 30/29930/21
Mikhal Pershin [Fri, 3 Nov 2017 09:38:04 +0000 (12:38 +0300)]
LU-11121 mdt: take discard lock at cleanup stage

Call mdt_dom_check_and_discard() after mdt_object_unlock() to
avoid possible deadlock if some third lock is conflicting with
both like in the scenario below:
 thread1: mdt_object_lock() with some bits
 thread2: take conflicting lock and wait
 thread1: mdt_dom_check_and_discard() with bits conflicting
          with thread2 causes deadlock.

The patch also enables the DoM layout in racer to test it on a regular
basis. Another minor update uses 'trap' in related tests.

Test-Parameters: mdssizegb=20 mdtcount=1 mdscount=1 testlist=sanity-dom,dom-performance,racer,racer,racer
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I63bedabb4a82cfa2f01e126d35dc8c2a89d64f56
Reviewed-on: https://review.whamcloud.com/29930
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11175 osc: serialize access to idle_timeout vs cleanup 83/32883/4
Alex Zhuravlev [Thu, 26 Jul 2018 07:52:38 +0000 (11:52 +0400)]
LU-11175 osc: serialize access to idle_timeout vs cleanup

Use LPROCFS_CLIMP_CHECK() and LPROCFS_CLIMP_EXIT(), as cl_import
can disappear due to umount.

Change-Id: I2a067f416691f39cde13cfae8f64ed5769d92041
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32883
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-6142 obdclass: Fix style issues for acl.c 51/32851/5
Arshad Hussain [Sun, 22 Jul 2018 03:00:27 +0000 (08:30 +0530)]
LU-6142 obdclass: Fix style issues for acl.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/acl.c

Change-Id: I00d4535123fb6677863bfd10937df5039ee7a339
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32851
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months agoLU-6142 osd-ldiskfs: Fix style issues for osd_iam_lfix.c 49/32849/6
Arshad Hussain [Sat, 21 Jul 2018 19:35:19 +0000 (01:05 +0530)]
LU-6142 osd-ldiskfs: Fix style issues for osd_iam_lfix.c

This patch fixes issues reported by checkpatch
for file lustre/osd-ldiskfs/osd_iam_lfix.c

Change-Id: I9d32231e397689dd3806fecf106bc1ce2f1439a4
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32849
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months agoLU-11116 llog: error handling cleanup 80/32780/2
Alexander Boyko [Wed, 4 Jul 2018 10:41:52 +0000 (06:41 -0400)]
LU-11116 llog: error handling cleanup

llog_cat_new_log() needs some error handling cleanup.
Save and restore the thread's lgi_cookie when using it, to prevent
conflicts/corruption with llog_process_thread().

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I12fdfe1a72e77cfeb5ad464b8582db68a7bcfe16
Cray-bug-id: LUS-4780
Reviewed-on: https://review.whamcloud.com/32780
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11224 obd: use correct ip_compute_csum() version 53/32953/2
James Simmons [Tue, 7 Aug 2018 17:20:54 +0000 (13:20 -0400)]
LU-11224 obd: use correct ip_compute_csum() version

The Linux kernel provides a generic platform-independent version
of ip_compute_csum() as well as platform-optimized versions. Some
platforms will disable the generic version in favor of
the optimized one. If the generic version is disabled and the
checksum.h header from asm-generic is used then we will end up
with an undefined symbol error when loading the obdclass module.
The solution is to use the platform-specific checksum.h header
that will handle using the generic or optimized version for us.
As a bonus we get better performance with the right kernel
configuration.

Test-Parameters: trivial

Change-Id: Ia0cfc9f4363bb61d5e381790655423ff5f91d9be
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32953
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-9325 ptlrpc: replace simple_strtol with kstrtol 85/32785/8
James Simmons [Thu, 5 Jul 2018 03:56:02 +0000 (23:56 -0400)]
LU-9325 ptlrpc: replace simple_strtol with kstrtol

Eventually simple_strtol() will be removed, so replace its use in
ptlrpc with the kstrtoXXX() class of functions.

Change-Id: I41b44c5dc329832a901c1772a9ba0608df30282a
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32785
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@gmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
14 months agoLU-9120 lnet: LNet Health/Resiliency Feature 23/33023/1
Amir Shehata [Sat, 18 Aug 2018 01:23:53 +0000 (18:23 -0700)]
LU-9120 lnet: LNet Health/Resiliency Feature

The LNet Health/Resiliency feature adds the ability for LNet
to try out different interfaces available to it if message
sending fails. It maintains the health of each remote and local
interface and selects the best interface for sending from and the best
remote interface to send to.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibcbbc34f8acfc3afb36ffe73eb27d69c147d02ce

14 months agoLU-9120 lnet: health error simulation 51/32951/13
Amir Shehata [Sun, 5 Aug 2018 21:37:29 +0000 (14:37 -0700)]
LU-9120 lnet: health error simulation

Modified the error simulation code to simulate health errors for
testing purposes. The specific error can be set. If multiple
errors are configured then one at random is chosen from the set.

EX:
lctl net_drop_add -s *@tcp -d *@tcp -m GET -i 1 -e local_interrupt

The -e option can be repeated multiple times to specify different
errors to simulate. The available errors are:
local_interrupt
local_dropped
local_aborted
local_no_route
local_error
local_timeout
remote_error
remote_dropped
remote_timeout
network_timeout
random

A -n option, "--random", has been added to randomize error generation
for drop rules. It relies on an interval value provided via -i. A
random number no bigger than the interval is generated. If the number
is smaller than half of the interval then the rule isn't matched,
otherwise it is.

This is needed because drop matching can happen multiple
times in the path of sending the message, and using a time-based
or rate-based rule will not result in even error generation across the
multiple calls.
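
The random matching rule can be sketched as follows. This is an
illustration under assumptions, not the LNet code: the function names
are hypothetical, and the draw is assumed to be an integer in the
inclusive range 0..interval.

```python
import random


def rule_matches(draw: int, interval: int) -> bool:
    """A draw below half of the interval is a miss, anything else matches."""
    return draw >= interval / 2


def random_drop_match(interval: int, rng: random.Random) -> bool:
    """Draw a number no bigger than interval and apply the matching rule."""
    draw = rng.randint(0, interval)  # assumed range: 0..interval inclusive
    return rule_matches(draw, interval)
```

Because every call draws independently, repeated matching attempts for
the same message do not depend on elapsed time or on a message counter.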

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If070e29f68c3de10100a9d5eaa49d10cdb76a59a
Reviewed-on: https://review.whamcloud.com/32951
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
14 months agoLU-9120 lnet: print recovery queues content 50/32950/12
Amir Shehata [Sun, 5 Aug 2018 21:25:47 +0000 (14:25 -0700)]
LU-9120 lnet: print recovery queues content

Add commands to lnetctl to print recovery queues content from
user space.

Associated code to handle the IOCTL is added in LNet module.

for local NIs:
lnetctl debug recovery --local

for peer NIs:
lnetctl debug recovery --peer

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id136d506772d95381fd5d8346d772177442a84fb
Reviewed-on: https://review.whamcloud.com/32950
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
14 months agoLU-9120 lnet: add global health statistics 49/32949/12
Amir Shehata [Sun, 5 Aug 2018 21:16:49 +0000 (14:16 -0700)]
LU-9120 lnet: add global health statistics

Added global health statistics.

They can be printed from lnetctl:

lnetctl stats show

lnet_selftest passes the statistics block over the wire. This,
unfortunately, creates an unnecessary backwards compatibility link
for lnet_selftest, which shouldn't be there. This patch breaks
this backwards compatibility, which means lnet_selftest will
not work with older selftest modules.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4a171c4f3cf13a1e8ab0d607d3b328352f727380
Reviewed-on: https://review.whamcloud.com/32949
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
14 months agoLU-9120 lnet: set health value from user space 63/32863/14
Amir Shehata [Tue, 24 Jul 2018 00:11:07 +0000 (17:11 -0700)]
LU-9120 lnet: set health value from user space

Add commands to lnetctl to set the health value.

for local NIs:
 lnetctl net set --nid <nid> --health <value>

for peer NIs:
 lnetctl peer set --nid <nid> --health <value>

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I06e1238df54c94bcfecadd84fbaa30cc1ce4dd68
Reviewed-on: https://review.whamcloud.com/32863
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
14 months agoLU-9120 lnet: show peer ni health stats 83/32783/15
Amir Shehata [Wed, 4 Jul 2018 18:49:38 +0000 (11:49 -0700)]
LU-9120 lnet: show peer ni health stats

Added another section in the peer ni show output for the health
statistics.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I7ab3a9343972622d90a984c4f8c0b096b15ecbdc
Reviewed-on: https://review.whamcloud.com/32783
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: show local ni health stats 82/32782/15
Amir Shehata [Wed, 4 Jul 2018 17:42:58 +0000 (10:42 -0700)]
LU-9120 lnet: show local ni health stats

Added another section in the ni show output for the health
statistics.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id57013e510cf1fb4befdd7a4c18af28d1f995ce2
Reviewed-on: https://review.whamcloud.com/32782
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: set health sensitivity from lnetctl 79/32779/16
Amir Shehata [Wed, 4 Jul 2018 00:51:29 +0000 (17:51 -0700)]
LU-9120 lnet: set health sensitivity from lnetctl

Added an lnetctl command to set the health sensitivity
from userspace.

lnetctl set health_sensitivity {>0}

0 - turn off health evaluation
>0 - sensitivity value not more than 1000

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic9289b06c5c9285a69c1819a33b79e954319a01e
Reviewed-on: https://review.whamcloud.com/32779
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: set transaction timeout from lnetctl 78/32778/16
Amir Shehata [Wed, 4 Jul 2018 00:24:31 +0000 (17:24 -0700)]
LU-9120 lnet: set transaction timeout from lnetctl

Added an lnetctl command to set the transaction timeout
from userspace.

lnetctl set transaction_timeout {>0}

>0 - timeout in seconds.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I71274e82fd46bff8017e36c37de449d8a7639ec6
Reviewed-on: https://review.whamcloud.com/32778
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: set retry count from lnetctl 77/32777/16
Amir Shehata [Wed, 4 Jul 2018 00:04:16 +0000 (17:04 -0700)]
LU-9120 lnet: set retry count from lnetctl

Added an lnetctl command to set the retry_count from userspace.

lnetctl set retry_count [0|>0]

0 - turns off retries in the system
>0 - number of retries.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I2fd5c88a91590195cfdad52e6d177619ccbbc840
Reviewed-on: https://review.whamcloud.com/32777
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: remove obsolete health functions 62/32862/14
Amir Shehata [Tue, 17 Jul 2018 18:58:22 +0000 (11:58 -0700)]
LU-9120 lnet: remove obsolete health functions

Removed obsolete health functions that were originally added
during the Multi-Rail project. Some assumptions were made about
the health implementation back then, that are no longer true.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4d4f47a03541d58da6807d9c2b786ecd868b50b0
Reviewed-on: https://review.whamcloud.com/32862
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
14 months agoLU-9120 lnet: Add ioctl to get health stats 76/32776/16
Amir Shehata [Tue, 3 Jul 2018 23:27:10 +0000 (16:27 -0700)]
LU-9120 lnet: Add ioctl to get health stats

At the time of this patch the sysfs statistics feature is
still in development. Therefore, an ioctl is used to get the stats
from LNet.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia216484f9e6ee062c766c1043f456e38a27e4d39
Reviewed-on: https://review.whamcloud.com/32776
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
14 months agoLU-9120 lnet: add health statistics 75/32775/15
Amir Shehata [Tue, 3 Jul 2018 01:24:44 +0000 (18:24 -0700)]
LU-9120 lnet: add health statistics

Add a health statistics block for each local and peer NI.
These statistics will be incremented when processing errors reported
by lnet_finalize().

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia1ec4d5de50c04392605e94ac2f81adef78fc17c
Reviewed-on: https://review.whamcloud.com/32775
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: reset health value 73/32773/15
Amir Shehata [Mon, 2 Jul 2018 21:36:50 +0000 (14:36 -0700)]
LU-9120 lnet: reset health value

Added an IOCTL to set the local or peer ni health value.
This would be useful in debugging where we can test the selection
algorithm and recovery mechanism by reducing the health of an
interface.

If the value specified is -1 then reset the health value to maximum.
This is useful to reset the system once a network issue has been
resolved. There would be no need to wait for the interface to go to
fully healthy on its own. It might be desirable to shortcut the
process.
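
The "-1 resets to maximum" convention can be sketched as below. The
maximum of 1000 and the rejection of other out-of-range values are
assumptions made for the example; apply_health_value() is a
hypothetical helper, not the IOCTL handler.

```python
MAX_HEALTH = 1000  # assumed maximum health value


def apply_health_value(requested: int, max_health: int = MAX_HEALTH) -> int:
    """Return the health value to store for a requested value."""
    if requested == -1:
        return max_health  # -1 means "reset to fully healthy"
    if requested < 0 or requested > max_health:
        # Assumption: other out-of-range values are rejected.
        raise ValueError("health value out of range: %d" % requested)
    return requested
```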

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I45a5844bbaa72f769e37a39526773ef4c71118c0
Reviewed-on: https://review.whamcloud.com/32773
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
14 months agoLU-9120 lnet: handle fatal device error 72/32772/15
Amir Shehata [Fri, 29 Jun 2018 23:54:38 +0000 (16:54 -0700)]
LU-9120 lnet: handle fatal device error

The o2iblnd can receive device status events in the QP event handler.
Three in particular are handled in this patch:
IB_EVENT_DEVICE_FATAL
IB_EVENT_PORT_ERR
IB_EVENT_PORT_ACTIVE
For DEVICE_FATAL and PORT_ERR the NI associated with the QP is set
in fatal error mode. This NI will no longer be selected when sending
messages. When PORT_ACTIVE is received the NI associated with the QP
has the fatal error cleared and future messages can use that NI.
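
The event handling described above is a small state machine. A minimal
sketch, with the NI class and both functions as hypothetical stand-ins;
only the three event names and their effect on the fatal state come
from the description.

```python
class NI:
    """Hypothetical stand-in for a network interface."""

    def __init__(self, name):
        self.name = name
        self.fatal_error = False


def handle_qp_event(ni, event):
    """Set or clear the fatal error state of the NI for the handled events."""
    if event in ("IB_EVENT_DEVICE_FATAL", "IB_EVENT_PORT_ERR"):
        ni.fatal_error = True
    elif event == "IB_EVENT_PORT_ACTIVE":
        ni.fatal_error = False
    # Any other event leaves the state unchanged.


def selectable(nis):
    """Return the NIs that may be chosen for sending."""
    return [ni for ni in nis if not ni.fatal_error]
```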

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I282aa463927f489c46e4e45040e93478c9823a37
Reviewed-on: https://review.whamcloud.com/32772
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: remove duplicate timeout mechanism 92/32992/8
Amir Shehata [Mon, 13 Aug 2018 23:19:00 +0000 (16:19 -0700)]
LU-9120 lnet: remove duplicate timeout mechanism

Remove the duplicate GET/PUT timeout mechanism currently implemented
for discovery, as it has been replaced by a more generic timeout
mechanism for all GET/PUT messages.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I28efae8c1fca6fc07fcaad4bfacf123b00ff887d
Reviewed-on: https://review.whamcloud.com/32992
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: timeout delayed REPLYs and ACKs 71/32771/15
Amir Shehata [Fri, 29 Jun 2018 01:02:42 +0000 (18:02 -0700)]
LU-9120 lnet: timeout delayed REPLYs and ACKs

When a GET, or a PUT that requires an ACK, is sent, add a response
tracker block to a percpt queue. When the REPLY/ACK is received,
remove the block from the percpt queue. The monitor thread
wakes up periodically to check whether any of the blocks have
expired and, if so, sends a timeout event to the ULP,
flags the MD as stale, and then unlinks it.
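
The response tracking can be sketched with a single table of deadlines.
This is a simplification under assumptions: one table instead of percpt
queues, integer timestamps supplied by the caller, and hypothetical
class and method names.

```python
class ResponseTracker:
    """Track messages that still expect a REPLY or an ACK."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadlines = {}  # message id -> time at which it expires

    def track(self, msg_id, now):
        """Called when a GET, or a PUT requiring an ACK, is sent."""
        self.deadlines[msg_id] = now + self.timeout

    def response_received(self, msg_id):
        """Called when the REPLY/ACK arrives; stop tracking the message."""
        self.deadlines.pop(msg_id, None)

    def expire(self, now):
        """Called periodically; return and drop the ids that have timed out."""
        expired = [m for m, deadline in self.deadlines.items() if deadline <= now]
        for m in expired:
            del self.deadlines[m]
        return expired
```

In this sketch the caller of expire() would be the place to deliver the
timeout event and mark the associated MD as stale.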

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia219fca5a578d625819b9f9c8ee2b3aa050dce80
Reviewed-on: https://review.whamcloud.com/32771
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
14 months agoLU-9120 lnet: sysfs functions for module params 61/32861/14
Amir Shehata [Fri, 20 Jul 2018 23:13:55 +0000 (16:13 -0700)]
LU-9120 lnet: sysfs functions for module params

Allow transaction timeout and retry count module parameters to be
set and shown via sysfs.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ica3819f9343a4b45cb0ae322f85f936230fa8138
Reviewed-on: https://review.whamcloud.com/32861
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: calculate the lnd timeout 70/32770/15
Amir Shehata [Tue, 26 Jun 2018 03:59:07 +0000 (20:59 -0700)]
LU-9120 lnet: calculate the lnd timeout

Calculate the LND timeout based on the transaction timeout
and the retry count. Both of these are user defined values. Whenever
they are set the lnd timeout is calculated. The LNDs use these
timeouts instead of the LND timeout module parameter.

Retry count can be set to 0, which means no retries. In that case the
LND timeout will default to 5 seconds, which is the same as the
default transaction timeout.
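
A minimal sketch of such a calculation follows. The commit message
only states that the LND timeout is derived from the transaction
timeout and the retry count, and that it is 5 seconds when the retry
count is 0; splitting the transaction timeout evenly across retries
is an assumption made here for illustration, and calc_lnd_timeout()
is a hypothetical name.

```c
#include <assert.h>

#define DEFAULT_LND_TIMEOUT 5 /* seconds, as stated in the commit message */

/*
 * Assumption: the transaction timeout is divided evenly by the retry
 * count. The exact formula used by the patch is not given in the text.
 */
static unsigned int calc_lnd_timeout(unsigned int transaction_timeout,
				     unsigned int retry_count)
{
	unsigned int t;

	assert(transaction_timeout > 0);

	/* No retries configured: fall back to the default. */
	if (retry_count == 0)
		return DEFAULT_LND_TIMEOUT;

	t = transaction_timeout / retry_count;
	return t > 0 ? t : 1; /* never hand the LND a zero timeout */
}
```

Recomputing the value whenever either parameter is set, as the
message describes, keeps the LNDs from ever seeing a stale timeout.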

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I5a37caa2b69df155211864735ba8b275fc2d34bb
Reviewed-on: https://review.whamcloud.com/32770
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: add retry count 69/32769/15
Amir Shehata [Tue, 26 Jun 2018 02:16:46 +0000 (19:16 -0700)]
LU-9120 lnet: add retry count

Added a module parameter to define the number of retries on a
message. It defaults to 0, which means no retries will be attempted.
Each message will keep track of the number of times it has been
retransmitted. When queuing it on the resend queue, the retry count
will be checked and if it's exceeded, then the message will be
finalized.
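
The check on the resend path can be pictured roughly as below. This
is a sketch under stated assumptions: struct msg, queue_for_resend()
and the result enum are hypothetical names, and "exceeded" is read
here as "the retry budget is already used up".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message state; the real LNet message carries much more. */
struct msg {
	unsigned int retry_count; /* times this message has been retransmitted */
	int finalized;            /* set once the message is given up on */
};

enum resend_result { RESEND_QUEUED, RESEND_FINALIZED };

/*
 * Called when a failed message is about to go on the resend queue.
 * With retry_limit == 0 (the default) the message is finalized at once.
 */
static enum resend_result queue_for_resend(struct msg *m,
					   unsigned int retry_limit)
{
	assert(m != NULL);

	if (m->retry_count >= retry_limit) {
		m->finalized = 1;
		return RESEND_FINALIZED;
	}

	m->retry_count++;
	return RESEND_QUEUED;
}
```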

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3a622c2128ff89f22b0f8bff02f862163c9d007e
Reviewed-on: https://review.whamcloud.com/32769
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
14 months agoLU-9120 lnet: handle remote errors in LNet 67/32767/15
Amir Shehata [Fri, 22 Jun 2018 17:42:23 +0000 (10:42 -0700)]
LU-9120 lnet: handle remote errors in LNet

Add health value in the peer NI structure. Decrement the
value whenever there is an error sending to the peer.
Modify the selection algorithm to look at the peer NI health
value when selecting the best peer NI to send to.

Put the peer NI on the recovery queue whenever there is
an error sending to it. Attempt a resend only on REMOTE
DROPPED, since in that case we're sure the message was never
received by the peer. For other errors, finalize the message.
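
The following sketch shows the shape of that policy. It is
illustrative only: the status enum, struct peer_ni layout and both
function names are hypothetical, the decrement amount is passed in
as a parameter, and the real selection algorithm weighs more than
the health value alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical send outcomes; only the REMOTE DROPPED case is resent. */
enum send_status {
	SEND_OK,
	SEND_REMOTE_DROPPED,
	SEND_REMOTE_ERROR,
	SEND_REMOTE_TIMEOUT
};

struct peer_ni {
	int healthv;           /* higher means healthier */
	int on_recovery_queue; /* set after any send error */
};

/* Prefer the peer NI with the highest health value; first one wins ties. */
static struct peer_ni *select_best_peer_ni(struct peer_ni *nis, size_t count)
{
	struct peer_ni *best = NULL;

	for (size_t i = 0; i < count; i++)
		if (best == NULL || nis[i].healthv > best->healthv)
			best = &nis[i];
	return best;
}

/* Returns 1 if the message should be resent, 0 if it should be finalized. */
static int handle_send_result(struct peer_ni *ni, enum send_status st,
			      int decrement)
{
	assert(ni != NULL && decrement >= 0);

	if (st == SEND_OK)
		return 0; /* nothing to resend */

	ni->healthv -= decrement;
	if (ni->healthv < 0)
		ni->healthv = 0;
	ni->on_recovery_queue = 1;

	/* Only a dropped message is known not to have reached the peer. */
	return st == SEND_REMOTE_DROPPED;
}
```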

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibcb41b3fb538e76b973bcb10fcd07638c118acb9
Reviewed-on: https://review.whamcloud.com/32767
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
14 months agoLU-9120 lnet: handle socklnd tx failure 66/32766/15
Amir Shehata [Fri, 22 Jun 2018 04:06:56 +0000 (21:06 -0700)]
LU-9120 lnet: handle socklnd tx failure

Update the socklnd to propagate the health status up to
LNet for handling.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Iec090ade478acafb976aef7f6eaf5315ccd1fb67
Reviewed-on: https://review.whamcloud.com/32766
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: handle o2iblnd tx failure 65/32765/15
Amir Shehata [Fri, 15 Jun 2018 20:15:27 +0000 (13:15 -0700)]
LU-9120 lnet: handle o2iblnd tx failure

Monitor the different types of failure that can occur on
transmit and flag the failure type so it is propagated to LNet,
which handles it either by attempting a resend or by
finalizing the message and propagating a failure to the ULP.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4e2bb62257cb8bd2a5ed0054c172742c465731be
Reviewed-on: https://review.whamcloud.com/32765
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-9120 lnet: handle local ni failure 64/32764/15
Amir Shehata [Tue, 5 Jun 2018 20:34:52 +0000 (13:34 -0700)]
LU-9120 lnet: handle local ni failure

Added an enumerated type listing the different errors which
the LND can propagate up to LNet for further handling.

All local timeout errors will trigger a resend if the
system is configured for resends. Remote errors will
not trigger a resend, to avoid creating a duplicate message
scenario on the receiving end. If a transmit error is encountered
where we're sure the message wasn't received by the remote end,
we will attempt a resend.

LNet level logic to handle local NI failure: when the LND finalizes
a message, lnet_finalize() checks whether the message completed
successfully. If so, it increments the healthv of the local NI, but
not beyond the max. If it failed, it decrements the healthv, but
not below 0, and puts the message on the resend queue.

On local NI failure the local NI is placed on a recovery queue.

The monitor thread will wake up and resend all the messages pending.
The selection algorithm will properly select the local and remote NIs
based on the new healthv.

The monitor thread will ping each NI on the local recovery queue. On
reply it will check if the NI's healthv is back to maximum. If it is,
it will remove the NI from the recovery queue; otherwise it'll
keep it there until it's fully recovered.
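
A simplified sketch of the healthv bookkeeping described in this
message is given below. The names, the step parameter and the
maximum of 1000 (taken from the related "add health value per ni"
patch) are assumptions for illustration, not the patch's own code.

```c
#include <assert.h>
#include <stddef.h>

#define NI_HEALTH_MAX 1000 /* assumed maximum health value */

/* Hypothetical local NI state. */
struct local_ni {
	int healthv;
	int on_recovery_queue;
};

/*
 * Health update at message finalization.
 * Returns 1 when the message should be put on the resend queue.
 */
static int finalize_update_health(struct local_ni *ni, int msg_ok, int step)
{
	assert(ni != NULL && step >= 0);

	if (msg_ok) {
		ni->healthv += step;
		if (ni->healthv > NI_HEALTH_MAX)
			ni->healthv = NI_HEALTH_MAX; /* not beyond the max */
		return 0;
	}

	ni->healthv -= step;
	if (ni->healthv < 0)
		ni->healthv = 0; /* not below 0 */
	ni->on_recovery_queue = 1;
	return 1;
}

/* After a ping reply: leave the recovery queue only when fully healthy. */
static void recovery_check(struct local_ni *ni)
{
	assert(ni != NULL);
	if (ni->healthv == NI_HEALTH_MAX)
		ni->on_recovery_queue = 0;
}
```

Clamping at both ends keeps a long run of failures or successes from
pushing the value out of range, so recovery time stays bounded.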

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I1cf5c6e74b9c5e5b06b15209f6ac77b49014e270
Reviewed-on: https://review.whamcloud.com/32764
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
14 months agoLU-9120 lnet: add monitor thread 63/32763/11
Amir Shehata [Thu, 31 May 2018 00:20:10 +0000 (17:20 -0700)]
LU-9120 lnet: add monitor thread

Refactored the router checker thread to be the monitor thread.
The monitor thread will check router aliveness, expire messages
on the active list, recover local and remote NIs, and resend messages.

In this patch it only checks router aliveness.

A deadline on the message is also added to keep track of when this
message should expire.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I712cad13d55328400ce61749967979673c4d673f
Reviewed-on: https://review.whamcloud.com/32763
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
14 months agoLU-9120 lnet: add lnet_health_sensitivity 62/32762/10
Amir Shehata [Mon, 19 Feb 2018 23:35:58 +0000 (15:35 -0800)]
LU-9120 lnet: add lnet_health_sensitivity

Add lnet_health_sensitivity value. This value determines the amount
the NI health value is decremented by. The value defaults to 0,
which turns off the health feature by default. The user needs
to explicitly turn on this feature. The assumption is that many sites
will only have one interface in their nodes. In this case the
health feature will not increase the resiliency of their system.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I23f70b00f270803e5d296033e36a3a09986fd3cf
Reviewed-on: https://review.whamcloud.com/32762
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
14 months agoLU-9120 lnet: add health value per ni 61/32761/10
Amir Shehata [Fri, 16 Feb 2018 22:10:33 +0000 (14:10 -0800)]
LU-9120 lnet: add health value per ni

Add a health value per local network interface. The health value
reflects the health of the NI. It is initialized to 1000, a value
chosen so that the health value can be decremented in fine-grained
steps on error.

If the NI is absolutely not healthy that will be indicated by an
LND event, which will flag that the NI is down and should never
be used.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0fb362a84c110f482633fb86a81c4d7b26c3ecba
Reviewed-on: https://review.whamcloud.com/32761
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
14 months agoLU-9120 lnet: refactor lnet_select_pathway() 60/32760/9
Amir Shehata [Tue, 13 Feb 2018 21:11:30 +0000 (13:11 -0800)]
LU-9120 lnet: refactor lnet_select_pathway()

lnet_select_pathway() is a complex monolithic function which handles
many send cases. Break lnet_select_pathway() down into multiple
functions, each handling a different send case. This will
make it easier to add the handling of the different health cases in
future patches.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I6e554c71eaa61f3e1bdfdc60bd9cd38f70df57b5
Reviewed-on: https://review.whamcloud.com/32760
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
14 months agoNew tag 2.11.54 2.11.54 v2_11_54 v2_11_54_0
Oleg Drokin [Fri, 17 Aug 2018 18:43:06 +0000 (14:43 -0400)]
New tag 2.11.54

Change-Id: If0cf2f80cbed8deb946dc57d9e8582c8b1e9b951
Signed-off-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-1895 tests: don't fail mmp test_5 due to race 55/32355/7
Andreas Dilger [Thu, 2 Aug 2018 15:50:42 +0000 (09:50 -0600)]
LU-1895 tests: don't fail mmp test_5 due to race

In the mmp.sh test_5() mount_after_unmount() testing, it is possible
that the first filesystem unmounts successfully before the second
one starts, and there is no contention for the MMP block.

This caused the test to fail on a regular basis.  However, there is
still value in running this test, since non-MMP race conditions have
previously been seen in this area (OBD device refcount, etc).

Make mount_after_unmount() more robust, only failing if the first
filesystem is still mounted at the same time as the second one.

Author: Andreas Dilger <adilger@whamcloud.com>

Test-Parameters: trivial mdtfilesystemtype=ldiskfs failover=true ostfilesystemtype=ldiskfs osscount=2 mdscount=2 mdtcount=1 austeroptions=-R iscsi=1 testlist=mmp
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I186b9ce0a5a0e1ed6f2b46895fec4a32e73ebbe5
Reviewed-on: https://review.whamcloud.com/32355
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-11152 test: work around find bug in sanity 133[fg] 34/32934/5
John L. Hammond [Fri, 3 Aug 2018 16:40:28 +0000 (11:40 -0500)]
LU-11152 test: work around find bug in sanity 133[fg]

Some versions of find do not handle the -ignore_readdir_race option
correctly. Work around this by calling error_ignore() rather than
error() in these cases.

Test-Parameters: trivial

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I0ad9cef3743f1748908dbab9087b0b54e6466d0a
Reviewed-on: https://review.whamcloud.com/32934
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
15 months agoLU-11062 libcfs: use save_stack_trace for stack dump 52/32952/3
Yang Sheng [Tue, 7 Aug 2018 16:24:19 +0000 (00:24 +0800)]
LU-11062 libcfs: use save_stack_trace for stack dump

stacktrace_ops was removed recently, so we
have to use save_stack_trace_tsk for the stack trace
dump.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Icb3d0dbd62c35fdd9b8de925aec9358a2208814f
Reviewed-on: https://review.whamcloud.com/32952
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-11176 systemd: use universal path for modprobe 44/32944/3
James Simmons [Mon, 6 Aug 2018 20:00:58 +0000 (16:00 -0400)]
LU-11176 systemd: use universal path for modprobe

The modprobe program is not located in the same place on all
platforms. On RHEL systems it is located in /usr/sbin. On
Ubuntu/Debian, which is busybox based, /sbin/modprobe is a symlink
to /bin/kmod. To keep some sort of standard, a symlink for modprobe
exists in /sbin on all platforms. Update the lnet.service script to
use the hard path /sbin/modprobe.

Test-Parameters: trivial

Change-Id: I54342971a6ee1aa4ce86a9fae0ac4dcb167b1510
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32944
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-10940 tests: skip sanity test 802 when quota enabled 00/32900/2
James Nunez [Mon, 30 Jul 2018 16:02:17 +0000 (10:02 -0600)]
LU-10940 tests: skip sanity test 802 when quota enabled

If ENABLE_QUOTA is set, sanity test 802 will try to set
the quota type on read-only targets. Setting quota requires
changes to the targets and, thus, does not make sense for
this test. sanity test 802 should be skipped if ENABLE_QUOTA
is set.

Test-Parameters: trivial envdefinitions=ENABLE_QUOTA=yes,ONLY=802 testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ic9c245045961867b7dc93be9268e6f4a4631c1dc
Reviewed-on: https://review.whamcloud.com/32900
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-11171 tests: set parameters for racer_on_nfs 80/32880/3
James Nunez [Wed, 25 Jul 2018 16:59:33 +0000 (10:59 -0600)]
LU-11171 tests: set parameters for racer_on_nfs

The parallel-scale-nfs script calls the racer test without
specifying a directory in which to create files, directories,
etc. In addition, racer needs a few other global
parameters to work properly, including the number of OSTs,
the number of MDTs, and which LFS to use.

Test-Parameters: trivial testlist=parallel-scale-nfsv3,parallel-scale-nfsv4
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ic4f5f08ddec7a8df5cb818b434aa3473f6cd72cb
Reviewed-on: https://review.whamcloud.com/32880
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-9007 lod: get rid of comp ost in use array 13/32813/3
Bobi Jam [Thu, 12 Jul 2018 22:09:56 +0000 (16:09 -0600)]
LU-9007 lod: get rid of comp ost in use array

Use lod_layout_component::llc_ost_indices to serve the same purpose.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I66c89fe6349b48b89593e34e9e985ec6ea5a1758
Reviewed-on: https://review.whamcloud.com/32813
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-11109 mdt: handle zero length xattr values correctly 55/32755/3
John L. Hammond [Mon, 2 Jul 2018 19:52:01 +0000 (14:52 -0500)]
LU-11109 mdt: handle zero length xattr values correctly

In mdt_getxattr(), set OBD_MD_FLXATTR in mbo_valid of the reply's MDT
body so that the client can distinguish between nonexistent extended
attributes and zero length values. In ll_xattr_list() and
ll_getxattr_common() test for OBD_MD_FLXATTR and return 0 rather than
-ENODATA in the appropriate cases. Add sanity test_102t() to test that
zero length values are handled correctly.
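
The client-side decision can be sketched as follows. This is not the
code from the patch: FLXATTR_FLAG is a stand-in bit for
OBD_MD_FLXATTR with a made-up value, and struct reply_body with its
ea_size field is a hypothetical simplification of the MDT reply body.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for OBD_MD_FLXATTR; the real bit value differs. */
#define FLXATTR_FLAG (UINT64_C(1) << 0)

/* Hypothetical, trimmed-down view of the reply's MDT body. */
struct reply_body {
	uint64_t mbo_valid; /* validity flags set by the server */
	uint32_t ea_size;   /* length of the returned xattr value */
};

/*
 * Returns the value length, 0 for an existing zero length value,
 * or -ENODATA when the attribute does not exist.
 */
static int getxattr_result(const struct reply_body *body)
{
	assert(body != NULL);

	if (body->ea_size > 0)
		return (int)body->ea_size;

	/* Zero length: the flag tells "empty value" apart from "no such xattr". */
	return (body->mbo_valid & FLXATTR_FLAG) ? 0 : -ENODATA;
}
```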

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I15649581c26dc52e83ca714b44f8372f29954ed5
Reviewed-on: https://review.whamcloud.com/32755
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>