Whamcloud - gitweb
fs/lustre-release.git
5 years ago LU-6142 obdclass: Fix style issues for lu_ref.c 81/33081/4
Arshad Hussain [Sat, 25 Aug 2018 22:00:07 +0000 (03:30 +0530)]
LU-6142 obdclass: Fix style issues for lu_ref.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/lu_ref.c

Change-Id: I8733fcac454685704b327219ba4afb096d3943c3
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/33081
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years ago LU-6142 obdclass: Fix style issues for lustre_handles.c 80/33080/2
Arshad Hussain [Sat, 25 Aug 2018 21:39:24 +0000 (03:09 +0530)]
LU-6142 obdclass: Fix style issues for lustre_handles.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/lustre_handles.c

Change-Id: I6e6ad8c56e225dcdd3707bf5f3b233eda3f90320
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/33080
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years ago LU-6142 obdclass: Fix style issues for lustre_peer.c 79/33079/3
Arshad Hussain [Sat, 25 Aug 2018 18:25:59 +0000 (23:55 +0530)]
LU-6142 obdclass: Fix style issues for lustre_peer.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/lustre_peer.c

Change-Id: I6cf95dfdd709974cae62626ac50a3507588f425d
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/33079
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years ago LU-11255 kernel: kernel update [SLES12 SP3 4.4.143-94.47] 65/33065/3
Jian Yu [Fri, 24 Aug 2018 06:36:34 +0000 (23:36 -0700)]
LU-11255 kernel: kernel update [SLES12 SP3 4.4.143-94.47]

Update SLES12 SP3 kernel to 4.4.143-94.47.

Test-Parameters: mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
clientdistro=sles12sp3 ossdistro=sles12sp3 mdsdistro=sles12sp3 \
testgroup=review-ldiskfs

Change-Id: I8b2c99c9a65149f1b149fa91351970034d6f7a47
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33065
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11125 ofd: decrease message level 85/32985/5
Mikhail Pershin [Mon, 13 Aug 2018 14:34:30 +0000 (17:34 +0300)]
LU-11125 ofd: decrease message level

The "destroys_in_progress already cleared" message
in ofd_create_hdl() may be the result of high load on the OST
server prior to failover. It is not an error, so decrease
its level from D_ERROR to D_HA.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id5142672a61244a6362be3778d0769baafc87b86
Reviewed-on: https://review.whamcloud.com/32985
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
5 years ago LU-10509 mdd: don't set size attr for DOM file 08/33008/2
Mikhail Pershin [Wed, 15 Aug 2018 21:15:03 +0000 (00:15 +0300)]
LU-10509 mdd: don't set size attr for DOM file

When a client does a truncate, it calls ll_md_setattr() followed by
ll_setattr_ost() to set the size on the OSTs. With a DOM file this
causes a setattr on the MDT first, including the size, and then a PUNCH
RPC on the same object. That was considered a non-optimized situation
and LU-11033 is intended to improve it, but with ZFS there is a check
in the OSD which does not truncate if the size is already the same.
Therefore the real file blocks are not actually truncated, so a sparse
write beyond the end of the file will get old data in the hole instead
of zeroes.

This quick patch checks whether mdd_attr_set() is going to set the SIZE
attr for a DOM file and clears the LA_SIZE bit, assuming a truncate
will follow.

A complete solution for this will be implemented under LU-11033.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I47873dccf4270e5f0338f7b6696aa5969cfb9444
Reviewed-on: https://review.whamcloud.com/33008
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
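
Illustration for LU-10509 (not part of the commit): a minimal Python
model of the bit-clearing step described in the message. The constant
values and the function name filter_setattr_valid are hypothetical and
chosen only for this sketch; the real change is C code in
mdd_attr_set().

```python
# Illustrative bit values only; the real la_valid constants are defined
# in the Lustre headers and may differ.
LA_MTIME = 1 << 1
LA_CTIME = 1 << 2
LA_SIZE = 1 << 3


def filter_setattr_valid(valid: int, is_dom_file: bool) -> int:
    """Return the attribute mask that should actually be applied.

    For a DOM file the size bit is dropped, on the assumption that a
    truncate (PUNCH) on the same object follows the setattr.
    """
    if is_dom_file and valid & LA_SIZE:
        valid &= ~LA_SIZE
    return valid
```

For example, filter_setattr_valid(LA_SIZE | LA_MTIME, True) keeps only
the LA_MTIME bit, while the same mask for a non-DOM file is returned
unchanged.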
5 years ago LU-10018 protocol: MDT as a statfs proxy 36/29136/91
Alex Zhuravlev [Thu, 21 Sep 2017 15:24:18 +0000 (18:24 +0300)]
LU-10018 protocol: MDT as a statfs proxy

The MDT can act as a proxy for statfs data. This should
make df faster (RTT vs RTT*(#MDTs+1)) and enable
idling connections, so that clients don't connect to
each OST just to report statfs data. The protocol
has been changed slightly to let the MDT differentiate
between its own and aggregated statfs.

Also, obd_statfs has a new field "granted" where
the OST reports how much space has been granted to the
requesting MDT, so that this space can be added to the
available space.

The client's NID is used to distribute MDS_STATFS among
the MDTs.

Change-Id: I59e03cb5abf809ae8820f874ec51dd2b74e1806c
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/29136
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
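
Illustration for LU-10018 (not part of the commit): a small Python
sketch of the arithmetic in the message, namely the "RTT vs
RTT*(#MDTs+1)" latency comparison and adding the "granted" space back
to the available space. The function names are hypothetical, and the
latency model simply restates the formula from the message.

```python
def df_latency_direct(rtt: float, n_mdts: int) -> float:
    """Latency model quoted in the commit message: RTT * (#MDTs + 1)."""
    return rtt * (n_mdts + 1)


def df_latency_proxy(rtt: float) -> float:
    """With an MDT acting as the statfs proxy, one round trip is enough."""
    return rtt


def available_with_grant(available: int, granted: int) -> int:
    """Add the space granted to the requesting MDT back to the available space."""
    return available + granted
```

With rtt = 0.5 and three MDTs, the direct model gives 2.0 while the
proxy model gives 0.5, in whatever time unit rtt is expressed in.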
5 years ago LU-8708 osc: enable/disable OSC grant shrink 03/23203/18
Bobi Jam [Mon, 17 Oct 2016 09:50:41 +0000 (17:50 +0800)]
LU-8708 osc: enable/disable OSC grant shrink

Add an OSC proc interface to enable/disable client's grant shrink
feature.

lctl get_param osc.*.grant_shrink
lctl set_param osc.*.grant_shrink={0,1}

Change-Id: I7974b3bf1c4f9c294dd0d4871d09b1a2e45a8d78
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/23203
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11212 lod: preserve mirror ID on mirror extension 38/32938/7
Bobi Jam [Tue, 12 Jun 2018 11:28:16 +0000 (19:28 +0800)]
LU-11212 lod: preserve mirror ID on mirror extension

When merging/expanding existing mirrors of an FLR file, we need to keep
the existing mirror's mirror ID.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: If139076c37c33bb1a330e1a5e997f8f56015fd9a
Reviewed-on: https://review.whamcloud.com/32938
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11238 lod: refine obj avoid collect for FLR 95/32995/4
Bobi Jam [Tue, 14 Aug 2018 03:12:12 +0000 (11:12 +0800)]
LU-11238 lod: refine obj avoid collect for FLR

When an FLR file is being created, the MDS tries to allocate objects
for the first components of all mirrors. In this declare phase the
objects for each component have been allocated, but the component's
ID and init flag are not set until the exec phase,
lod_create()->lod_striped_create(), so lod_collect_avoidance() should
take this scenario into account.

This patch also adds some debug messages.

Test-Parameters: testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I16ef2da44f6db06a8e0bc67ae2646cdc3ff3bb63
Reviewed-on: https://review.whamcloud.com/32995
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11259 test: correct fail_loc names in replay-{single,dual} 15/33015/2
John L. Hammond [Thu, 16 Aug 2018 14:37:23 +0000 (09:37 -0500)]
LU-11259 test: correct fail_loc names in replay-{single,dual}

Some comments in replay-single and replay-dual confusingly referred to
OBD_FAIL_OUT_UPDATE_NET_REP as OBD_FAIL_OBJ_UPDATE_NET_REP or
OBD_FAIL_UPDATE_OBJ_NET_REP. Correct these.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib724e8151ba0ea34a5dacf2f148673a52dc37824
Reviewed-on: https://review.whamcloud.com/33015
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
5 years ago LU-8066 tests: don't access /proc/sys/lnet/debug directly 05/33005/2
Andreas Dilger [Wed, 15 Aug 2018 17:39:15 +0000 (11:39 -0600)]
LU-8066 tests: don't access /proc/sys/lnet/debug directly

In replay-single test_70e use "lctl set_param" to set the debug mask
rather than writing into the /proc/sys/lnet/debug file directly, since
this tunable moved to sysfs in commit v2_10_51_0-12-g7092309f32.

Clean up the test code style in test_70e as well.

Test-Parameters: trivial testlist=replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I65c2bd9643fc6fc54a5de7b6404d316c0ff12537
Reviewed-on: https://review.whamcloud.com/33005
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11240 gnilnd: Replace KGNILND_BUILD_REV 90/32990/2
Chris Horn [Tue, 5 Jun 2018 19:32:32 +0000 (14:32 -0500)]
LU-11240 gnilnd: Replace KGNILND_BUILD_REV

The current format of the gnilnd version string causes a compilation
error. Since gnilnd doesn't really need its own version string we just
replace it with LUSTRE_VERSION_STRING.

Cray-bug-id: LUS-6072
Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I6f45df2566853a6f4c2078cf72c7eac7a52f3fad
Reviewed-on: https://review.whamcloud.com/32990
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Chuck Fossen <chuckf@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11135 mdt: LASSERT(lu_object_exists(o)) fails 03/32803/4
Andriy Skulysh [Thu, 3 May 2018 10:07:22 +0000 (13:07 +0300)]
LU-11135 mdt: LASSERT(lu_object_exists(o)) fails

mdt_object_find() can return a valid but nonexistent object.
Its return value additionally needs to be checked for existence.

Change-Id: Ib1f5bd5289a69e29437db520706591929bf55830
Cray-bug-id: LUS-6192
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/32803
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11056 lwp: fix lwp reconnection issue 36/32536/6
Hongchao Zhang [Thu, 24 May 2018 20:09:27 +0000 (16:09 -0400)]
LU-11056 lwp: fix lwp reconnection issue

After the OST or MDT was restarted, the lwp reconnection can
fail with -EALREADY because the connect count in the connection
request is less than the value saved in the corresponding export
at MDT0000, which could cause the system to hang.

The patch also changes lustre_lwp_connect to use OBD_CONNECT_MDS_MDS
flag only when the connection is between MDTs.

Change-Id: I9ae7b4faadc65fdaa78458a06315b1739d144feb
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/32536
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-10855 llog: use llog_common_cat_ops 12/31812/3
John L. Hammond [Tue, 27 Mar 2018 17:17:10 +0000 (12:17 -0500)]
LU-10855 llog: use llog_common_cat_ops

Remove changelog_orig_logops, hsm_actions_logops, and
osp_mds_ost_orig_logops, replacing each with llog_common_cat_ops.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ia19337350452f9793b3ea9a56343ef3a065c1f83
Reviewed-on: https://review.whamcloud.com/31812
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-10824 llite: don't use ll_mnt to get fstype name 25/33025/3
James Simmons [Sat, 18 Aug 2018 16:07:12 +0000 (12:07 -0400)]
LU-10824 llite: don't use ll_mnt to get fstype name

Originally Lustre would report either 'lustre' or 'llite' through
the fstype proc file. This required us to query struct super_block,
but it's been a very long time since that was the case. This also
removes a direct use of ll_mnt. The fix is to simply report 'lustre'.

Test-Parameters: trivial

Change-Id: Ia766c8e0a027e58a48de8fa6e2756238e20312b2
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33025
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-6142 obdclass: Fix style issues for statfs_pack.c 81/32981/2
Arshad Hussain [Sun, 12 Aug 2018 04:18:54 +0000 (09:48 +0530)]
LU-6142 obdclass: Fix style issues for statfs_pack.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/statfs_pack.c

Change-Id: I7a34dd87875ab049c3339022f3153fb07937021e
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32981
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years ago LU-6142 osd-ldiskfs: Fix style issues for osd_handler.c 18/32818/5
Arshad Hussain [Sat, 14 Jul 2018 15:27:17 +0000 (20:57 +0530)]
LU-6142 osd-ldiskfs: Fix style issues for osd_handler.c

This patch fixes issues reported by checkpatch
for file lustre/osd-ldiskfs/osd_handler.c

Change-Id: Ifd6468acc75b59a4324385c68af1175a74a3c312
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32818
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years ago LU-11266 build: update changelog for Ubuntu 20/33020/3
Minh Diep [Fri, 17 Aug 2018 19:10:08 +0000 (12:10 -0700)]
LU-11266 build: update changelog for Ubuntu

Record the version that we are building

Test-Parameters: trivial

Change-Id: Ib1c2e74774d8a6caa6c3f70814affb53cf8cd22e
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33020
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11009 test: add version check to test_102 40/32340/7
Wei Liu [Wed, 9 May 2018 19:53:11 +0000 (12:53 -0700)]
LU-11009 test: add version check to test_102

Skip test_102 if the server version is less than or equal to 2.9.53.

Test-Parameters:trivial testlist=conf-sanity envdefinitions=ONLY=102 serverjob=lustre-b2_9 serverbuildno=22
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I1964a7a5df8b910652b2fe774703d7b62f953e95
Reviewed-on: https://review.whamcloud.com/32340
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
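
Illustration for LU-11009 (not part of the commit): the skip condition
can be modelled as a component-wise version comparison. This is a
Python sketch only; the real check lives in the shell test scripts, and
the helper names parse_version and should_skip_test_102 are
hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '2.9.53' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def should_skip_test_102(server_version: str) -> bool:
    """Skip when the server version is less than or equal to 2.9.53."""
    return parse_version(server_version) <= (2, 9, 53)
```

Comparing tuples rather than strings matters here: as strings, "2.10.0"
sorts before "2.9.53", which would skip the test on a newer server.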
5 years ago LU-11040 utils: improve mount usage/man page 81/32481/6
Andreas Dilger [Wed, 25 Jul 2018 07:15:40 +0000 (01:15 -0600)]
LU-11040 utils: improve mount usage/man page

Improve the description of the mount.lustre.8 man page and usage:
- provide separate SYNOPSIS for client and server mount commands
- move "acl" option out of general options into server-only options,
  since client option was removed and ACLs are only controlled by MDS
- correct "CLIENT OPTIONS" section to be named "SERVER OPTIONS"
- add checksum, lruresize, lazystatfs, 32bitapi, user_fid2path usage
- mark the default values of the options in the usage message

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I28fe0f13d363e0a26ffcbc1ba9923e4fd35804f0
Reviewed-on: https://review.whamcloud.com/32481
Tested-by: Jenkins
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11096 osd: wrap new blk integrity stuff 25/32725/9
Alex Zhuravlev [Thu, 9 Aug 2018 22:26:35 +0000 (18:26 -0400)]
LU-11096 osd: wrap new blk integrity stuff

Wrap the new blk integrity code so that Lustre can be built against
kernels with no blk integrity.

Change-Id: I050020e94524f4519fdf46a22f0d847979754291
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32725
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-8066 lov: fix lov.*.stripeoffset printing 19/33019/3
Andreas Dilger [Fri, 17 Aug 2018 18:48:44 +0000 (12:48 -0600)]
LU-8066 lov: fix lov.*.stripeoffset printing

The move of lov.*.stripeoffset from /proc to /sys in commit 3c900918
reverted the printing of stripeoffset from a signed value to an
unsigned value, which is broken for the common value of "-1".  This
was previously fixed in LU-9611 commit f93276d9.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib61305ddbf902dd74ac0e16c0c2fe6920052ddf4
Reviewed-on: https://review.whamcloud.com/33019
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
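
Illustration for LU-8066 stripeoffset printing (not part of the
commit): the same stored bits read very differently depending on
whether they are printed as signed or unsigned. The sketch below uses
Python's ctypes to reinterpret a value; the 64-bit width is an
assumption made for the example, and the function names are
hypothetical.

```python
import ctypes


def format_unsigned(raw: int) -> str:
    """Render the value as an unsigned 64-bit integer (the broken output)."""
    return str(ctypes.c_uint64(raw).value)


def format_signed(raw: int) -> str:
    """Render the same bits as a signed 64-bit integer (the intended output)."""
    return str(ctypes.c_int64(raw).value)
```

A stripe offset of -1 printed through format_unsigned() becomes
18446744073709551615, whereas format_signed() on those same bits gives
back -1.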
5 years ago LU-11215 tests: replace "large_xattr" with "ea_inode" 12/33012/2
Li Dongyang [Thu, 16 Aug 2018 06:26:12 +0000 (16:26 +1000)]
LU-11215 tests: replace "large_xattr" with "ea_inode"

Change the test scripts over to using the "ea_inode" name, since
this is what the upstream e2fsprogs is using.  The "large_xattr"
feature name was only ever used in the Lustre-patched e2fsprogs.

Don't try to turn off "ea_inode" feature on the targets anymore,
it's not supported by upstream e2fsprogs.

e2fsprogs commit: 5b72578279fe2470e682692a15d70a43d9289e0f

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I83bd303827fa28050d1d6d2416b2d630dc94ec12
Reviewed-on: https://review.whamcloud.com/33012
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years ago LU-4256 test: add lustre-rsync-test 2b to ALWAYS_EXCEPT 06/33006/3
John L. Hammond [Wed, 15 Aug 2018 20:06:16 +0000 (15:06 -0500)]
LU-4256 test: add lustre-rsync-test 2b to ALWAYS_EXCEPT

This test continues to fail at a low rate so disable it.

Test-Parameters: trivial testlist=lustre-rsync-test
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8fe4d039e8edd0552e56ee9451cc05f08cb34c8d
Reviewed-on: https://review.whamcloud.com/33006
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11244 build: apply IB_OPTIONS to debian rules 96/32996/2
Jinshan Xiong [Tue, 14 Aug 2018 03:33:33 +0000 (20:33 -0700)]
LU-11244 build: apply IB_OPTIONS to debian rules

IB_OPTIONS should be honored when making debian package.

Signed-off-by: Jinshan Xiong <jinshan.xiong@uber.com>
Change-Id: Ibc16a5428d47f072499c39a62ea457c922ae7352
Reviewed-on: https://review.whamcloud.com/32996
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Thomas Stibor <t.stibor@gsi.de>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Martin Schroeder <martin.h.schroeder@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11227 lod: lod_sync: don't attempt sync to inactive targets 64/32964/5
Robin Humble [Thu, 9 Aug 2018 05:33:04 +0000 (15:33 +1000)]
LU-11227 lod: lod_sync: don't attempt sync to inactive targets

chgrp on a client triggers lod_sync(), which in turn loops over OST/MDT
targets with dt_sync(). dt_sync() fails with -ENOTCONN when targets
have been deactivated (i.e. set to active=0). The client retries
infinitely, causing the client process to hang and considerable MDS
network traffic, load, and disk I/O.

The fix is to not attempt dt_sync() to OST/MDT targets that have been
deactivated and also (because of possible races) to ignore connection
errors.

Tested with Lustre 2.10.4.

Signed-off-by: Robin Humble <plaguedbypenguins@gmail.com>
Change-Id: I617509cf7944541489f4fd9762c233b771132165
Reviewed-on: https://review.whamcloud.com/32964
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
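
Illustration for LU-11227 (not part of the commit): a Python model of
the loop described in the message, which skips deactivated targets and
tolerates connection errors. The data layout (a list of (active,
sync_fn) pairs) and the name sync_targets are hypothetical, and
treating only -ENOTCONN as a "connection error" is an assumption of
this sketch.

```python
import errno


def sync_targets(targets) -> int:
    """Sync every active target and return 0 or the first real error.

    targets is an iterable of (active, sync_fn) pairs, where sync_fn()
    returns 0 on success or a negative errno value on failure.
    """
    for active, sync_fn in targets:
        if not active:
            # Deactivated target (active=0): do not attempt the sync at all.
            continue
        rc = sync_fn()
        if rc == -errno.ENOTCONN:
            # The target may have been deactivated after the check above,
            # so a connection error is ignored rather than retried forever.
            continue
        if rc != 0:
            return rc
    return 0
```

Any other failure, such as -EIO, is still returned to the caller.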
5 years ago LU-11226 flr: mirror resync regression 68/32968/5
Bobi Jam [Thu, 9 Aug 2018 06:35:49 +0000 (14:35 +0800)]
LU-11226 flr: mirror resync regression

There is a glitch in the lfs mirror resync tool in commit
0e5c12ac29a9622e8ca05d5e39cd5e2a721ace93: the resync write needs to be
restricted to the component's extent.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ifbd3f16b2f621407b31c7fe37ce9745de48fcc99
Reviewed-on: https://review.whamcloud.com/32968
Tested-by: Jenkins
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
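
Illustration for LU-11226 (not part of the commit): restricting a write
to a component's extent amounts to intersecting two half-open byte
ranges. The Python sketch below shows that intersection; the function
name clamp_to_extent and the half-open [start, end) convention are
assumptions of the example, not taken from the lfs source.

```python
def clamp_to_extent(offset: int, length: int, ext_start: int, ext_end: int):
    """Intersect the write [offset, offset + length) with [ext_start, ext_end).

    Returns (start, length) for the part of the write that falls inside
    the extent, or None if the write lies entirely outside it.
    """
    start = max(offset, ext_start)
    end = min(offset + length, ext_end)
    if end <= start:
        return None
    return start, end - start
```

For example, a 100-byte write at offset 0 against the extent [10, 50)
is reduced to 40 bytes starting at offset 10.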
5 years ago LU-11146 lustre: fix setstripe for specific osts upon dir 14/32814/16
Wang Shilong [Wed, 11 Jul 2018 14:11:47 +0000 (22:11 +0800)]
LU-11146 lustre: fix setstripe for specific osts upon dir

The LOV_USER_MAGIC_SPECIFIC function is broken and it
was not available for setting on a directory.

1) llite doesn't handle the LOV_USER_MAGIC_SPECIFIC case
properly for dir {set,get}_stripe, and the ioctl
LL_IOC_LOV_SETSTRIPE did not allocate enough buffer space
to copy the OST lists from userspace.

2) lod_get_default_lov_striping() did not handle the
LOV_USER_MAGIC_SPECIFIC type, so newly created
files/dirs won't inherit the parent setting properly.

3) There is no test case covering the lfs setstripe
'-o' interface, which makes it hard to figure out
when this function was broken.

Change-Id: Icc2ee60a474e5e565db12b35a9a38fde65b05bbd
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32814
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-8066 llite: move /proc/fs/lustre/llite/uuid to sysfs 01/32501/9
James Simmons [Sun, 29 Jul 2018 14:34:19 +0000 (10:34 -0400)]
LU-8066 llite: move /proc/fs/lustre/llite/uuid to sysfs

Move uuid file from /proc/fs/lustre/llite/*
to /sys/fs/lustre/llite/*/

This is a modified version of

Linux-commit: ec55a6299990efa969dfc00d95c72444ff1e3461

due to the large amount of changes to the OpenSFS/Intel branch.

Change-Id: I2dc13c248879f554f9f7ed6dc62a6772a59f6f35
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32501
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years ago LU-8215 tests: sanity-benchmark/iozone should wait for space recovery 99/20499/2
Alex Zhuravlev [Mon, 30 May 2016 10:45:51 +0000 (14:45 +0400)]
LU-8215 tests: sanity-benchmark/iozone should wait for space recovery

Otherwise it may fail due to a transient state where the space consumed
by the previous run hasn't been recovered yet. This happens with the tiny
filesystems used in local setups.

Change-Id: I04b3ce096621583629277c1e52c64a1551bc8ace
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/20499
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11201 lfsck: check linkea entry validity 58/32958/2
Lai Siyao [Sun, 22 Jul 2018 21:45:23 +0000 (05:45 +0800)]
LU-11201 lfsck: check linkea entry validity

Invalid linkea data may lead to an infinite loop in linkea iteration.
Check linkea entry validity on unpack, and if the entry is not
unpacked, check the entry length validity.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity-lfsck
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8e1890ed64fab38b85149ebbfecce04caaf41e17
Reviewed-on: https://review.whamcloud.com/32958
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
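
Illustration for LU-11201 (not part of the commit): when records are
walked by their own length field, a corrupt length of zero means the
cursor never advances. The Python sketch below validates each length
before stepping. The record layout (a 2-byte big-endian length that
includes the header, followed by the payload) is a simplification made
up for this example and is not the real linkea format; unpack_entries
is a hypothetical name.

```python
import struct

HEADER = 2  # bytes used by the big-endian record length (simplified layout)


def unpack_entries(buf: bytes) -> list:
    """Split buf into record payloads, rejecting invalid record lengths."""
    entries = []
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < HEADER:
            raise ValueError("truncated entry header")
        (reclen,) = struct.unpack_from(">H", buf, pos)
        # A length that does not cover the header plus some payload would
        # stall the loop or produce an empty record; a length that runs
        # past the buffer would read out of bounds.
        if reclen <= HEADER or pos + reclen > len(buf):
            raise ValueError(f"invalid entry length {reclen} at offset {pos}")
        entries.append(buf[pos + HEADER:pos + reclen])
        pos += reclen
    return entries
```

Without the length check, a buffer starting with a zero length would
keep the loop at the same offset forever.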
5 years ago LU-11154 llite: use proper flags for FS_IOC_{FSSET,FSGET}XATTR 28/32828/6
Wang Shilong [Wed, 18 Jul 2018 08:30:28 +0000 (16:30 +0800)]
LU-11154 llite: use proper flags for FS_IOC_{FSSET,FSGET}XATTR

Two problems addressed by this patch:

1) struct fsxattr fsx_xflags has its own flag definitions,
like FS_XFLAG_XXX, and we should use the proper conversion macro for
them; here we used the wrong constant for the project inherit flag.

2) FS_XFLAG_PROJINHERIT is not a valid VFS inode flag. Looking
at the current Linux code, local filesystems set the project inherit
flag in their private flags, and we should do a similar thing in Lustre.

Test-Parameters: trivial testlist=sanity-quota,sanity-quota,sanity-quota
Change-Id: I453db8ed074e8008f0ec145c726d7577121422e6
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32828
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
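
Illustration for LU-11154 (not part of the commit): the xflags
namespace and the inode-flags namespace use different bits for "project
inherit", so a value has to be converted rather than copied across. The
Python sketch below shows such a conversion using the two constants as
defined in the Linux UAPI header linux/fs.h; the function names are
hypothetical, and the actual patch uses C conversion helpers that are
not shown here.

```python
FS_XFLAG_PROJINHERIT = 0x00000200  # bit in struct fsxattr fsx_xflags
FS_PROJINHERIT_FL = 0x20000000     # bit in the inode flags namespace


def xflags_to_inode_flags(xflags: int) -> int:
    """Translate the project-inherit bit from fsx_xflags to inode flags."""
    return FS_PROJINHERIT_FL if xflags & FS_XFLAG_PROJINHERIT else 0


def inode_flags_to_xflags(flags: int) -> int:
    """Translate the project-inherit bit from inode flags to fsx_xflags."""
    return FS_XFLAG_PROJINHERIT if flags & FS_PROJINHERIT_FL else 0
```

Passing an inode-flags value where an xflags value is expected silently
loses the bit, which is the kind of mismatch the message describes.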
5 years ago LU-9120 lnet: LNet Health/Resiliency Feature
Oleg Drokin [Tue, 21 Aug 2018 16:15:26 +0000 (12:15 -0400)]
LU-9120 lnet: LNet Health/Resiliency Feature

The LNet Health/Resiliency feature adds the ability for LNet
to try out different interfaces available to it if message
sending fails. It maintains the health of each remote and local
interface and selects the best local interface to send from and the
best remote interface to send to.

Merge commit '958ef71f33fa925e6657f9902702cd3677e15ec9'

Change-Id: I9ca740654c48d642fe130f98a60c5c59b9b4ebe1
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
5 years ago LU-10686 tests: stop running sanity-pfl test 9 45/32945/2
James Nunez [Mon, 6 Aug 2018 21:26:25 +0000 (15:26 -0600)]
LU-10686 tests: stop running sanity-pfl test 9

sanity-pfl test 9 consistently fails when run on a Lustre
file system with a single MDS. We need to add test 9 to
the ALWAYS_EXCEPT list and, thus, stop running the test
until a fix for the underlying problem can be found.

Test-Parameters: trivial mdscount=1 mdtcount=1 testlist=sanity-pfl
Test-Parameters: mdscount=2 mdtcount=2 testlist=sanity-pfl
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ife4b3c044e2777bb9b9010e0be7c00549a683fdc
Reviewed-on: https://review.whamcloud.com/32945
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11200 libcfs: handle DECLARE_TIMER reduced to two arguments 39/32939/5
James Simmons [Mon, 6 Aug 2018 17:56:55 +0000 (13:56 -0400)]
LU-11200 libcfs: handle DECLARE_TIMER reduced to two arguments

For the Linux kernel there exist two ways to initialize a
struct timer_list. One method is with setup_timer() and the other is
with the DEFINE_TIMER macro. For earlier kernels both methods employed
callbacks with an argument of type unsigned long. In kernels 4.15+
both methods of initialization use a struct timer_list pointer as the
callback argument. During the 4.14 development phase we have
setup_timer() using struct timer_list as the argument for its callback,
but DEFINE_TIMER was still using unsigned long. Additionally, when
DEFINE_TIMER did move to using struct timer_list, it reduced the number
of arguments to the macro. This patch handles the 4.14 kernel state of
development for the timer API.

Test-Parameters: trivial

Change-Id: I1c509838153328ed4bbdfa50468a396e13037d50
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32939
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11014 mdc: remove obsolete intent opcodes 61/32361/6
John L. Hammond [Fri, 11 May 2018 17:04:02 +0000 (12:04 -0500)]
LU-11014 mdc: remove obsolete intent opcodes

In enum ldlm_intent_flags, remove the obsolete constants IT_UNLINK,
IT_TRUNC, IT_EXEC, IT_PIN, IT_SETXATTR. Remove any handling code for
these opcodes.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I66f20e4c881cb77a481805a148a33f1c2daa5f0c
Reviewed-on: https://review.whamcloud.com/32361
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-8066 lod: migrate from proc to sysfs 98/32198/6
James Simmons [Sat, 28 Jul 2018 15:54:38 +0000 (11:54 -0400)]
LU-8066 lod: migrate from proc to sysfs

Move the lod module from using proc for most single value files
to sysfs. Create the default attrs for dt_devices which can be
used for other server side devices.

Change-Id: I734f01ef0d9f0c18efc141c835e4cf8ad2365250
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32198
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
5 years ago LU-11121 mdt: take discard lock at cleanup stage 30/29930/21
Mikhail Pershin [Fri, 3 Nov 2017 09:38:04 +0000 (12:38 +0300)]
LU-11121 mdt: take discard lock at cleanup stage

Call mdt_dom_check_and_discard() after mdt_object_unlock() to
avoid possible deadlock if some third lock is conflicting with
both like in the scenario below:
 thread1: mdt_object_lock() with some bits
 thread2: take conflicting lock and wait
 thread1: mdt_dom_check_and_discard() with bits conflicting
          with thread2 causes deadlock.

The patch enables the DOM layout in racer to test it on a regular basis.
Another minor update uses 'trap' in related tests.

Test-Parameters: mdssizegb=20 mdtcount=1 mdscount=1 testlist=sanity-dom,dom-performance,racer,racer,racer
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I63bedabb4a82cfa2f01e126d35dc8c2a89d64f56
Reviewed-on: https://review.whamcloud.com/29930
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11175 osc: serialize access to idle_timeout vs cleanup 83/32883/4
Alex Zhuravlev [Thu, 26 Jul 2018 07:52:38 +0000 (11:52 +0400)]
LU-11175 osc: serialize access to idle_timeout vs cleanup

Use LPROCFS_CLIMP_CHECK() and LPROCFS_CLIMP_EXIT(), as cl_import
can disappear due to umount.

Change-Id: I2a067f416691f39cde13cfae8f64ed5769d92041
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32883
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 years ago LU-6142 obdclass: Fix style issues for acl.c 51/32851/5
Arshad Hussain [Sun, 22 Jul 2018 03:00:27 +0000 (08:30 +0530)]
LU-6142 obdclass: Fix style issues for acl.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/acl.c

Change-Id: I00d4535123fb6677863bfd10937df5039ee7a339
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32851
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years ago LU-6142 osd-ldiskfs: Fix style issues for osd_iam_lfix.c 49/32849/6
Arshad Hussain [Sat, 21 Jul 2018 19:35:19 +0000 (01:05 +0530)]
LU-6142 osd-ldiskfs: Fix style issues for osd_iam_lfix.c

This patch fixes issues reported by checkpatch
for file lustre/osd-ldiskfs/osd_iam_lfix.c

Change-Id: I9d32231e397689dd3806fecf106bc1ce2f1439a4
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32849
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years ago LU-11116 llog: error handling cleanup 80/32780/2
Alexander Boyko [Wed, 4 Jul 2018 10:41:52 +0000 (06:41 -0400)]
LU-11116 llog: error handling cleanup

llog_cat_new_log() needs some error handling cleanup.
Save and restore the thread's lgi_cookie around its use, to prevent
conflicts or corruption with llog_process_thread().

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I12fdfe1a72e77cfeb5ad464b8582db68a7bcfe16
Cray-bug-id: LUS-4780
Reviewed-on: https://review.whamcloud.com/32780
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11224 obd: use correct ip_compute_csum() version 53/32953/2
James Simmons [Tue, 7 Aug 2018 17:20:54 +0000 (13:20 -0400)]
LU-11224 obd: use correct ip_compute_csum() version

The Linux kernel provides a generic, platform-independent version
of ip_compute_csum() as well as platform-optimized versions. Some
platforms disable the generic version in favor of
the optimized one. If the generic version is disabled and the
checksum.h header from asm-generic is used, then we end up
with an undefined symbol error when loading the obdclass module.
The solution is to use the platform-specific checksum.h header,
which handles using the generic or optimized version for us.
As a bonus we get better performance with the right kernel
configuration.

Test-Parameters: trivial

Change-Id: Ia0cfc9f4363bb61d5e381790655423ff5f91d9be
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32953
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 years ago LU-9325 ptlrpc: replace simple_strtol with kstrtol 85/32785/8
James Simmons [Thu, 5 Jul 2018 03:56:02 +0000 (23:56 -0400)]
LU-9325 ptlrpc: replace simple_strtol with kstrtol

Eventually simple_strtol() will be removed, so replace its use in
ptlrpc with the kstrtoXXX() class of functions.
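
For illustration only, the stricter parsing that the kstrtoXXX() family
brings can be approximated in userspace. This is a sketch, not the ptlrpc
change itself: parse_long_strict() is a hypothetical helper built on
strtol() that rejects trailing garbage and reports overflow, where a
simple_strtol()-style parse would stop at the first non-digit without
signalling an error.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for the strictness of kstrtol(): the
 * whole string must be a number, optionally followed by one newline.
 * Returns 0 on success, -EINVAL or -ERANGE on failure.
 */
static int parse_long_strict(const char *s, int base, long *res)
{
	char *end;
	long val;

	/* strtol() would skip leading whitespace; treat it as an error. */
	if (isspace((unsigned char)s[0]))
		return -EINVAL;

	errno = 0;
	val = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (errno == ERANGE)
		return -ERANGE;		/* value does not fit in a long */
	if (*end == '\n')
		end++;			/* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage such as "12abc" */

	*res = val;
	return 0;
}
```

The trailing newline is tolerated because values written through proc or
sysfs files usually arrive with one.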

Change-Id: I41b44c5dc329832a901c1772a9ba0608df30282a
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32785
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@gmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years ago LU-9120 lnet: LNet Health/Resiliency Feature 23/33023/1
Amir Shehata [Sat, 18 Aug 2018 01:23:53 +0000 (18:23 -0700)]
LU-9120 lnet: LNet Health/Resiliency Feature

The LNet Health/Resiliency feature adds the ability for LNet
to try out the different interfaces available to it if message
sending fails. It maintains the health of each remote and local
interface and selects the best local interface to send from and the
best remote interface to send to.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibcbbc34f8acfc3afb36ffe73eb27d69c147d02ce

5 years ago LU-9120 lnet: health error simulation 51/32951/13
Amir Shehata [Sun, 5 Aug 2018 21:37:29 +0000 (14:37 -0700)]
LU-9120 lnet: health error simulation

Modified the error simulation code to simulate health errors for
testing purposes. The specific error can be set. If multiple
errors are configured then one at random is chosen from the set.

EX:
lctl net_drop_add -s *@tcp -d *@tcp -m GET -i 1 -e local_interrupt

The -e option can be repeated to specify different
errors to simulate. The available errors are:
local_interrupt
local_dropped
local_aborted
local_no_route
local_error
local_timeout
remote_error
remote_dropped
remote_timeout
network_timeout
random

A -n ("--random") option has been added to randomize error generation
for drop rules. It relies on the interval value provided via -i: a
random number no bigger than the interval is generated. If the number
is smaller than half of the interval then the rule isn't matched,
otherwise it is.

This is needed because drop matching can happen multiple
times in the path of sending the message, and using time-based
or rate-based matching will not result in even error generation
across the multiple calls.
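
The random matching rule described above can be sketched as follows. This
is a userspace illustration under stated assumptions, not the LNet code:
the function names are hypothetical, and rand() stands in for whatever
random source the kernel uses.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Decision step only: 'draw' is a number in [0, interval]. The rule is
 * not matched when the draw is smaller than half of the interval.
 * Comparing 2 * draw against interval avoids rounding for odd intervals.
 */
static bool drop_draw_matches(unsigned int draw, unsigned int interval)
{
	return 2ULL * draw >= interval;
}

/* Hypothetical wrapper: pick a draw no bigger than interval, then decide. */
static bool drop_rule_match_random(unsigned int interval)
{
	unsigned long long span = (unsigned long long)interval + 1;
	unsigned int draw = (unsigned int)((unsigned long long)rand() % span);

	return drop_draw_matches(draw, interval);
}
```

Because every call makes an independent draw, each of the multiple match
attempts along the send path has roughly the same chance of hitting.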

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If070e29f68c3de10100a9d5eaa49d10cdb76a59a
Reviewed-on: https://review.whamcloud.com/32951
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years ago LU-9120 lnet: print recovery queues content 50/32950/12
Amir Shehata [Sun, 5 Aug 2018 21:25:47 +0000 (14:25 -0700)]
LU-9120 lnet: print recovery queues content

Add commands to lnetctl to print recovery queues content from
user space.

Associated code to handle the IOCTL is added in LNet module.

for local NIs:
lnetctl debug recovery --local

for peer NIs:
lnetctl debug recovery --peer

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id136d506772d95381fd5d8346d772177442a84fb
Reviewed-on: https://review.whamcloud.com/32950
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
5 years ago LU-9120 lnet: add global health statistics 49/32949/12
Amir Shehata [Sun, 5 Aug 2018 21:16:49 +0000 (14:16 -0700)]
LU-9120 lnet: add global health statistics

Added global health statistics.

These can be printed from lnetctl:

lnetctl stats show

lnet_selftest passes the statistics block over the wire. This,
unfortunately, creates an unnecessary backwards compatibility link
for lnet_selftest, which shouldn't be there. This patch breaks
this backwards compatibility, which means lnet_selftest will
not work with older selftest modules.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4a171c4f3cf13a1e8ab0d607d3b328352f727380
Reviewed-on: https://review.whamcloud.com/32949
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
5 years ago LU-9120 lnet: set health value from user space 63/32863/14
Amir Shehata [Tue, 24 Jul 2018 00:11:07 +0000 (17:11 -0700)]
LU-9120 lnet: set health value from user space

Add commands to lnetctl to set the health value.

for local NIs:
 lnetctl net set --nid <nid> --health <value>

for peer NIs:
 lnetctl peer set --nid <nid> --health <value>

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I06e1238df54c94bcfecadd84fbaa30cc1ce4dd68
Reviewed-on: https://review.whamcloud.com/32863
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
5 years ago LU-9120 lnet: show peer ni health stats 83/32783/15
Amir Shehata [Wed, 4 Jul 2018 18:49:38 +0000 (11:49 -0700)]
LU-9120 lnet: show peer ni health stats

Added another section in the peer ni show output for the health
statistics.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I7ab3a9343972622d90a984c4f8c0b096b15ecbdc
Reviewed-on: https://review.whamcloud.com/32783
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: show local ni health stats 82/32782/15
Amir Shehata [Wed, 4 Jul 2018 17:42:58 +0000 (10:42 -0700)]
LU-9120 lnet: show local ni health stats

Added another section in the ni show output for the health
statistics.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id57013e510cf1fb4befdd7a4c18af28d1f995ce2
Reviewed-on: https://review.whamcloud.com/32782
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: set health sensitivity from lnetctl 79/32779/16
Amir Shehata [Wed, 4 Jul 2018 00:51:29 +0000 (17:51 -0700)]
LU-9120 lnet: set health sensitivity from lnetctl

Added an lnetctl command to set the health sensitivity
from userspace.

lnetctl set health_sensitivity {>0}

0 - turn off health evaluation
>0 - sensitivity value not more than 1000

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic9289b06c5c9285a69c1819a33b79e954319a01e
Reviewed-on: https://review.whamcloud.com/32779
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: set transaction timeout from lnetctl 78/32778/16
Amir Shehata [Wed, 4 Jul 2018 00:24:31 +0000 (17:24 -0700)]
LU-9120 lnet: set transaction timeout from lnetctl

Added an lnetctl command to set the transaction timeout
from userspace.

lnetctl set transaction_timeout {>0}

>0 - timeout in seconds.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I71274e82fd46bff8017e36c37de449d8a7639ec6
Reviewed-on: https://review.whamcloud.com/32778
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: set retry count from lnetctl 77/32777/16
Amir Shehata [Wed, 4 Jul 2018 00:04:16 +0000 (17:04 -0700)]
LU-9120 lnet: set retry count from lnetctl

Added an lnetctl command to set the retry_count from userspace.

lnetctl set retry_count [0|>0]

0 - turns off retries in the system
>0 - number of retries.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I2fd5c88a91590195cfdad52e6d177619ccbbc840
Reviewed-on: https://review.whamcloud.com/32777
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: remove obsolete health functions 62/32862/14
Amir Shehata [Tue, 17 Jul 2018 18:58:22 +0000 (11:58 -0700)]
LU-9120 lnet: remove obsolete health functions

Removed obsolete health functions that were originally added
during the Multi-Rail project. Some assumptions were made about
the health implementation back then, that are no longer true.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4d4f47a03541d58da6807d9c2b786ecd868b50b0
Reviewed-on: https://review.whamcloud.com/32862
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
5 years ago LU-9120 lnet: Add ioctl to get health stats 76/32776/16
Amir Shehata [Tue, 3 Jul 2018 23:27:10 +0000 (16:27 -0700)]
LU-9120 lnet: Add ioctl to get health stats

At the time of this patch the sysfs statistics feature is
still in development. Therefore, an ioctl is used to get the stats
from LNet.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia216484f9e6ee062c766c1043f456e38a27e4d39
Reviewed-on: https://review.whamcloud.com/32776
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
5 years ago LU-9120 lnet: add health statistics 75/32775/15
Amir Shehata [Tue, 3 Jul 2018 01:24:44 +0000 (18:24 -0700)]
LU-9120 lnet: add health statistics

Add a health statistics block for each local and peer NI.
These statistics will be incremented when processing errors reported
by lnet_finalize()

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia1ec4d5de50c04392605e94ac2f81adef78fc17c
Reviewed-on: https://review.whamcloud.com/32775
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: reset health value 73/32773/15
Amir Shehata [Mon, 2 Jul 2018 21:36:50 +0000 (14:36 -0700)]
LU-9120 lnet: reset health value

Added an IOCTL to set the local or peer ni health value.
This would be useful in debugging where we can test the selection
algorithm and recovery mechanism by reducing the health of an
interface.

If the value specified is -1 then reset the health value to maximum.
This is useful to reset the system once a network issue has been
resolved. There would be no need to wait for the interface to go to
fully healthy on its own. It might be desirable to shortcut the
process.
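
A minimal sketch of the "-1 means reset to maximum" convention follows.
The names are hypothetical, the maximum of 1000 is assumed from the
initial health value used elsewhere in this series, and rejecting
out-of-range values with -EINVAL is an assumption about the behavior,
not something this message states.

```c
#include <errno.h>

#define SKETCH_HEALTH_MAX 1000	/* assumed maximum health value */

/*
 * Hypothetical setter: -1 resets the health value to the maximum, any
 * value in [0, SKETCH_HEALTH_MAX] is stored as given, and anything else
 * is rejected (assumed behavior).
 */
static int sketch_set_health(int *healthv, int value)
{
	if (value == -1) {
		*healthv = SKETCH_HEALTH_MAX;
		return 0;
	}
	if (value < 0 || value > SKETCH_HEALTH_MAX)
		return -EINVAL;

	*healthv = value;
	return 0;
}
```

Lowering the value lets the selection algorithm and recovery path be
exercised on demand; passing -1 afterwards restores the interface
without waiting for it to recover on its own.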

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I45a5844bbaa72f769e37a39526773ef4c71118c0
Reviewed-on: https://review.whamcloud.com/32773
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
5 years ago LU-9120 lnet: handle fatal device error 72/32772/15
Amir Shehata [Fri, 29 Jun 2018 23:54:38 +0000 (16:54 -0700)]
LU-9120 lnet: handle fatal device error

The o2iblnd can receive device status events in the QP event handler.
Three in particular are handled in this patch:
IB_EVENT_DEVICE_FATAL
IB_EVENT_PORT_ERR
IB_EVENT_PORT_ACTIVE
For DEVICE_FATAL and PORT_ERR the NI associated with the QP is set
in fatal error mode. This NI will no longer be selected when sending
messages. When PORT_ACTIVE is received the NI associated with the QP
has the fatal error cleared and future messages can use that NI.
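
The event handling described above amounts to a small state change per
event. The sketch below uses stand-in types rather than the real
InfiniBand or o2iblnd definitions; every name in it is hypothetical.

```c
#include <stdbool.h>

/* Stand-ins for the three IB events named above, plus a catch-all. */
enum sketch_ib_event {
	SKETCH_DEVICE_FATAL,
	SKETCH_PORT_ERR,
	SKETCH_PORT_ACTIVE,
	SKETCH_OTHER_EVENT,
};

/* Hypothetical NI state: only the fatal-error flag matters here. */
struct sketch_ni {
	bool fatal_error;
};

static void sketch_qp_event(struct sketch_ni *ni, enum sketch_ib_event ev)
{
	switch (ev) {
	case SKETCH_DEVICE_FATAL:
	case SKETCH_PORT_ERR:
		ni->fatal_error = true;		/* stop selecting this NI */
		break;
	case SKETCH_PORT_ACTIVE:
		ni->fatal_error = false;	/* NI may be used again */
		break;
	default:
		break;				/* other events are ignored */
	}
}

/* The selection code would skip any NI for which this returns false. */
static bool sketch_ni_selectable(const struct sketch_ni *ni)
{
	return !ni->fatal_error;
}
```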

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I282aa463927f489c46e4e45040e93478c9823a37
Reviewed-on: https://review.whamcloud.com/32772
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: remove duplicate timeout mechanism 92/32992/8
Amir Shehata [Mon, 13 Aug 2018 23:19:00 +0000 (16:19 -0700)]
LU-9120 lnet: remove duplicate timeout mechanism

Remove the duplicate GET/PUT timeout mechanism currently implemented
for discovery, as it has been replaced by a more generic timeout
mechanism for all GET/PUT messages.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I28efae8c1fca6fc07fcaad4bfacf123b00ff887d
Reviewed-on: https://review.whamcloud.com/32992
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: timeout delayed REPLYs and ACKs 71/32771/15
Amir Shehata [Fri, 29 Jun 2018 01:02:42 +0000 (18:02 -0700)]
LU-9120 lnet: timeout delayed REPLYs and ACKs

When a GET, or a PUT that requires an ACK, is sent, add a response
tracker block to a percpt queue. When the REPLY/ACK is received,
remove the block from the percpt queue. The monitor thread
wakes up periodically to check whether any of the blocks have
expired and, if so, sends a timeout event to the ULP,
flags the MD as stale, and then unlinks it.
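
As a rough illustration of the response tracking idea, the sketch below
keeps trackers in a plain array and expires them against a caller-supplied
clock. The structure and function names are hypothetical, and the real
code uses per-CPT queues and LNet MDs rather than these stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical response tracker for one outstanding GET or PUT+ACK. */
struct sketch_rsp_tracker {
	long deadline;	/* time at which the response is considered late */
	bool active;	/* still waiting for a REPLY/ACK */
	bool md_stale;	/* set when the wait timed out */
};

/* REPLY/ACK arrived in time: stop tracking. */
static void sketch_rsp_received(struct sketch_rsp_tracker *t)
{
	t->active = false;
}

/*
 * One monitor pass: expire every active tracker whose deadline has
 * passed, flag it stale and drop it. Returns the number of timeouts,
 * each of which would be reported to the upper layer.
 */
static size_t sketch_monitor_expire(struct sketch_rsp_tracker *q, size_t n,
				    long now)
{
	size_t expired = 0;

	for (size_t i = 0; i < n; i++) {
		if (!q[i].active || q[i].deadline > now)
			continue;
		q[i].md_stale = true;
		q[i].active = false;
		expired++;
	}
	return expired;
}
```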

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia219fca5a578d625819b9f9c8ee2b3aa050dce80
Reviewed-on: https://review.whamcloud.com/32771
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
5 years ago LU-9120 lnet: sysfs functions for module params 61/32861/14
Amir Shehata [Fri, 20 Jul 2018 23:13:55 +0000 (16:13 -0700)]
LU-9120 lnet: sysfs functions for module params

Allow transaction timeout and retry count module parameters to be
set and shown via sysfs.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ica3819f9343a4b45cb0ae322f85f936230fa8138
Reviewed-on: https://review.whamcloud.com/32861
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: calculate the lnd timeout 70/32770/15
Amir Shehata [Tue, 26 Jun 2018 03:59:07 +0000 (20:59 -0700)]
LU-9120 lnet: calculate the lnd timeout

Calculate the LND timeout based on the transaction timeout
and the retry count. Both of these are user defined values. Whenever
they are set the lnd timeout is calculated. The LNDs use these
timeouts instead of the LND timeout module parameter.

Retry count can be set to 0, which means no retries. In that case the
LND timeout will default to 5 seconds, which is the same as the
default transaction timeout.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I5a37caa2b69df155211864735ba8b275fc2d34bb
Reviewed-on: https://review.whamcloud.com/32770
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: add retry count 69/32769/15
Amir Shehata [Tue, 26 Jun 2018 02:16:46 +0000 (19:16 -0700)]
LU-9120 lnet: add retry count

Added a module parameter to define the number of retries on a
message. It defaults to 0, which means no retries will be attempted.
Each message will keep track of the number of times it has been
retransmitted. When queuing it on the resend queue, the retry count
will be checked and if it's exceeded, then the message will be
finalized.
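
The retry accounting can be pictured with the sketch below. The types and
names are hypothetical, and treating "exceeded" as "the message has
already used all of its retries" is an assumption about the exact
comparison.

```c
/* Hypothetical per-message state: how often it has been retransmitted. */
struct sketch_msg {
	int retry_count;
};

enum sketch_resend_action {
	SKETCH_RESEND,		/* put the message on the resend queue */
	SKETCH_FINALIZE,	/* give up and finalize the message */
};

/*
 * Decide what to do with a failed message. max_retries plays the role
 * of the module parameter; 0 means no retries are attempted.
 */
static enum sketch_resend_action
sketch_queue_for_resend(struct sketch_msg *msg, int max_retries)
{
	if (msg->retry_count >= max_retries)
		return SKETCH_FINALIZE;

	msg->retry_count++;
	return SKETCH_RESEND;
}
```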

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3a622c2128ff89f22b0f8bff02f862163c9d007e
Reviewed-on: https://review.whamcloud.com/32769
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
5 years ago LU-9120 lnet: handle remote errors in LNet 67/32767/15
Amir Shehata [Fri, 22 Jun 2018 17:42:23 +0000 (10:42 -0700)]
LU-9120 lnet: handle remote errors in LNet

Add health value in the peer NI structure. Decrement the
value whenever there is an error sending to the peer.
Modify the selection algorithm to look at the peer NI health
value when selecting the best peer NI to send to.

Put the peer NI on the recovery queue whenever there is
an error sending to it. Attempt only to resend on REMOTE
DROPPED since we're sure the message was never received by
the peer. For other errors finalize the message.
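
A sketch of the remote error policy described above, with hypothetical
names and a caller-supplied decrement (the amount is not specified in
this message):

```c
#include <stdbool.h>

/* Stand-ins for the send outcomes relevant to a peer NI. */
enum sketch_peer_status {
	SKETCH_PEER_OK,
	SKETCH_REMOTE_DROPPED,
	SKETCH_REMOTE_ERROR,
	SKETCH_REMOTE_TIMEOUT,
};

struct sketch_peer_ni {
	int healthv;
	bool on_recovery_queue;
};

/*
 * Apply the policy to one send result. Returns true when the message
 * should be resent, false when it should be finalized.
 */
static bool sketch_peer_handle_status(struct sketch_peer_ni *p,
				      enum sketch_peer_status st,
				      int decrement)
{
	if (st == SKETCH_PEER_OK)
		return false;

	/* Any error lowers the health (never below 0) and queues recovery. */
	p->healthv = p->healthv > decrement ? p->healthv - decrement : 0;
	p->on_recovery_queue = true;

	/* Only a remote drop guarantees the peer never saw the message. */
	return st == SKETCH_REMOTE_DROPPED;
}
```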

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibcb41b3fb538e76b973bcb10fcd07638c118acb9
Reviewed-on: https://review.whamcloud.com/32767
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
5 years ago LU-9120 lnet: handle socklnd tx failure 66/32766/15
Amir Shehata [Fri, 22 Jun 2018 04:06:56 +0000 (21:06 -0700)]
LU-9120 lnet: handle socklnd tx failure

Update the socklnd to propagate the health status up to
LNet for handling.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Iec090ade478acafb976aef7f6eaf5315ccd1fb67
Reviewed-on: https://review.whamcloud.com/32766
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: handle o2iblnd tx failure 65/32765/15
Amir Shehata [Fri, 15 Jun 2018 20:15:27 +0000 (13:15 -0700)]
LU-9120 lnet: handle o2iblnd tx failure

Monitor the different types of failures that might occur on the
transmit and flag the type of failure to be propagated to LNet
which will handle either by attempting a resend or simply
finalizing the message and propagating a failure to the ULP.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4e2bb62257cb8bd2a5ed0054c172742c465731be
Reviewed-on: https://review.whamcloud.com/32765
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
5 years ago LU-9120 lnet: handle local ni failure 64/32764/15
Amir Shehata [Tue, 5 Jun 2018 20:34:52 +0000 (13:34 -0700)]
LU-9120 lnet: handle local ni failure

Added an enumerated type listing the different errors which
the LND can propagate up to LNet for further handling.

All local timeout errors will trigger a resend if the
system is configured for resends. Remote errors will
not trigger a resend, to avoid creating a duplicate-message
scenario on the receiving end. If a transmit error is encountered
where we're sure the message wasn't received by the remote end,
we will attempt a resend.

Add LNet-level logic to handle local NI failure. When the LND
finalizes a message, lnet_finalize() checks whether the message
completed successfully. If so, it increments the healthv of the local
NI, but not beyond the max. If it failed, it decrements the healthv,
but not below 0, and puts the message on the resend queue.

On local NI failure the local NI is placed on a recovery queue.

The monitor thread will wake up and resend all the messages pending.
The selection algorithm will properly select the local and remote NIs
based on the new healthv.

The monitor thread will ping each NI on the local recovery queue. On
reply it will check whether the NI's healthv is back to maximum. If it
is, the NI is removed from the recovery queue; otherwise it is
kept there until it has fully recovered.
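
The health bookkeeping for a local NI can be sketched as below. All names
are hypothetical, the maximum of 1000 is assumed from the initial health
value used in this series, and the step size is left to the caller
because this message does not specify it.

```c
#include <stdbool.h>

#define SKETCH_NI_HEALTH_MAX 1000	/* assumed maximum health value */

struct sketch_local_ni {
	int healthv;
	bool recovering;	/* on the local recovery queue */
};

/* Called when a message sent from this NI completes. */
static void sketch_local_msg_done(struct sketch_local_ni *ni, bool ok,
				  int step)
{
	if (ok) {
		ni->healthv += step;
		if (ni->healthv > SKETCH_NI_HEALTH_MAX)
			ni->healthv = SKETCH_NI_HEALTH_MAX;
	} else {
		ni->healthv -= step;
		if (ni->healthv < 0)
			ni->healthv = 0;
		ni->recovering = true;	/* failure queues the NI */
	}
}

/* Called on a recovery ping reply: leave the queue once fully healthy. */
static void sketch_local_ping_reply(struct sketch_local_ni *ni)
{
	if (ni->healthv >= SKETCH_NI_HEALTH_MAX)
		ni->recovering = false;
}
```

In this sketch a successful recovery ping is expected to pass through
sketch_local_msg_done() like any other message, so repeated pings raise
the value until the NI leaves the queue.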

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I1cf5c6e74b9c5e5b06b15209f6ac77b49014e270
Reviewed-on: https://review.whamcloud.com/32764
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
5 years ago LU-9120 lnet: add monitor thread 63/32763/11
Amir Shehata [Thu, 31 May 2018 00:20:10 +0000 (17:20 -0700)]
LU-9120 lnet: add monitor thread

Refactored the router checker thread to be the monitor thread.
The monitor thread will check router aliveness, expire messages
on the active list, recover local and remote NIs, and resend messages.

In this patch it only checks router aliveness.

A deadline on the message is also added to keep track of when this
message should expire.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I712cad13d55328400ce61749967979673c4d673f
Reviewed-on: https://review.whamcloud.com/32763
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
5 years ago LU-9120 lnet: add lnet_health_sensitivity 62/32762/10
Amir Shehata [Mon, 19 Feb 2018 23:35:58 +0000 (15:35 -0800)]
LU-9120 lnet: add lnet_health_sensitivity

Add lnet_health_sensitivity value. This value determines the amount
the NI health value is decremented by. The value defaults to 0,
which turns off the health feature by default. The user needs
to explicitly turn on this feature. The assumption is that many sites
will only have one interface in their nodes. In this case the
health feature will not increase the resiliency of their system.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I23f70b00f270803e5d296033e36a3a09986fd3cf
Reviewed-on: https://review.whamcloud.com/32762
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
5 years ago LU-9120 lnet: add health value per ni 61/32761/10
Amir Shehata [Fri, 16 Feb 2018 22:10:33 +0000 (14:10 -0800)]
LU-9120 lnet: add health value per ni

Add a health value per local network interface. The health value
reflects the health of the NI. It is initialized to 1000. 1000 is
chosen to be able to granularly decrement the health value on error.

If the NI is absolutely not healthy that will be indicated by an
LND event, which will flag that the NI is down and should never
be used.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0fb362a84c110f482633fb86a81c4d7b26c3ecba
Reviewed-on: https://review.whamcloud.com/32761
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
5 years ago LU-9120 lnet: refactor lnet_select_pathway() 60/32760/9
Amir Shehata [Tue, 13 Feb 2018 21:11:30 +0000 (13:11 -0800)]
LU-9120 lnet: refactor lnet_select_pathway()

lnet_select_pathway() is a complex monolithic function which handles
many send cases. Break lnet_select_pathway() down into multiple
functions, each handling a different send case. This will
make it easier to add the handling of the different health cases in
future patches.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I6e554c71eaa61f3e1bdfdc60bd9cd38f70df57b5
Reviewed-on: https://review.whamcloud.com/32760
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
5 years ago New tag 2.11.54 2.11.54 v2_11_54 v2_11_54_0
Oleg Drokin [Fri, 17 Aug 2018 18:43:06 +0000 (14:43 -0400)]
New tag 2.11.54

Change-Id: If0cf2f80cbed8deb946dc57d9e8582c8b1e9b951
Signed-off-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-1895 tests: don't fail mmp test_5 due to race 55/32355/7
Andreas Dilger [Thu, 2 Aug 2018 15:50:42 +0000 (09:50 -0600)]
LU-1895 tests: don't fail mmp test_5 due to race

In the mmp.sh test_5() mount_after_unmount() testing, it is possible
that the first filesystem unmounts successfully before the second
one starts, and there is no contention for the MMP block.

This caused the test to fail on a regular basis.  However, there is
still value in running this test, since non-MMP race conditions have
previously been seen in this area (OBD device refcount, etc).

Make mount_after_unmount() more robust, only failing if the first
filesystem is still mounted at the same time as the second one.

Author: Andreas Dilger <adilger@whamcloud.com>

Test-Parameters: trivial mdtfilesystemtype=ldiskfs failover=true ostfilesystemtype=ldiskfs osscount=2 mdscount=2 mdtcount=1 austeroptions=-R iscsi=1 testlist=mmp
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I186b9ce0a5a0e1ed6f2b46895fec4a32e73ebbe5
Reviewed-on: https://review.whamcloud.com/32355
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 years ago LU-11152 test: work around find bug in sanity 133[fg] 34/32934/5
John L. Hammond [Fri, 3 Aug 2018 16:40:28 +0000 (11:40 -0500)]
LU-11152 test: work around find bug in sanity 133[fg]

Some versions of find do not handle the -ignore_readdir_race option
correctly. Work around this by calling error_ignore() rather than
error() in these cases.

Test-Parameters: trivial

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I0ad9cef3743f1748908dbab9087b0b54e6466d0a
Reviewed-on: https://review.whamcloud.com/32934
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years ago LU-11062 libcfs: use save_stack_trace for stack dump 52/32952/3
Yang Sheng [Tue, 7 Aug 2018 16:24:19 +0000 (00:24 +0800)]
LU-11062 libcfs: use save_stack_trace for stack dump

The stacktrace_ops has been removed recently. So we
have to use save_stack_trace_tsk for stack trace
dump.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Icb3d0dbd62c35fdd9b8de925aec9358a2208814f
Reviewed-on: https://review.whamcloud.com/32952
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11176 systemd: use univeral path for modprobe 44/32944/3
James Simmons [Mon, 6 Aug 2018 20:00:58 +0000 (16:00 -0400)]
LU-11176 systemd: use univeral path for modprobe

The location of the modprobe program is not the same on all platforms.
On RHEL systems it is located in /usr/sbin. On Ubuntu/Debian, which is
busybox based, /sbin/modprobe is a symlink to /bin/kmod. To keep some
sort of standard, a symlink for modprobe exists in /sbin on all
platforms. Update the lnet.service script to use the hard-coded
path /sbin/modprobe.

Test-Parameters: trivial

Change-Id: I54342971a6ee1aa4ce86a9fae0ac4dcb167b1510
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32944
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-10940 tests: skip sanity test 802 when quota enabled 00/32900/2
James Nunez [Mon, 30 Jul 2018 16:02:17 +0000 (10:02 -0600)]
LU-10940 tests: skip sanity test 802 when quota enabled

If ENABLE_QUOTA is set, sanity test 802 will try to set
the quota type on read-only targets. Setting quota requires
changes to the targets and, thus, does not make sense for
this test. sanity test 802 should be skipped if ENABLE_QUOTA
is set.

Test-Parameters: trivial envdefinitions=ENABLE_QUOTA=yes,ONLY=802 testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ic9c245045961867b7dc93be9268e6f4a4631c1dc
Reviewed-on: https://review.whamcloud.com/32900
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11171 tests: set parameters for racer_on_nfs 80/32880/3
James Nunez [Wed, 25 Jul 2018 16:59:33 +0000 (10:59 -0600)]
LU-11171 tests: set parameters for racer_on_nfs

The parallel-scale-nfs script calls the racer test without
specifying a directory to create files, create directories,
etc. in. In addition, racer needs a few other global
parameters to work properly, including the number of OSTs,
MDTs and which LFS to use.

Test-Parameters: trivial testlist=parallel-scale-nfsv3,parallel-scale-nfsv4
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ic4f5f08ddec7a8df5cb818b434aa3473f6cd72cb
Reviewed-on: https://review.whamcloud.com/32880
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-9007 lod: get rid of comp ost in use array 13/32813/3
Bobi Jam [Thu, 12 Jul 2018 22:09:56 +0000 (16:09 -0600)]
LU-9007 lod: get rid of comp ost in use array

Use lod_layout_component::llc_ost_indices to serve the same purpose.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I66c89fe6349b48b89593e34e9e985ec6ea5a1758
Reviewed-on: https://review.whamcloud.com/32813
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11109 mdt: handle zero length xattr values correctly 55/32755/3
John L. Hammond [Mon, 2 Jul 2018 19:52:01 +0000 (14:52 -0500)]
LU-11109 mdt: handle zero length xattr values correctly

In mdt_getxattr(), set OBD_MD_FLXATTR in mbo_valid of the reply's MDT
body so that the client can distinguish between nonexistent extended
attributes and zero length values. In ll_xattr_list() and
ll_getxattr_common() test for OBD_MD_FLXATTR and return 0 rather than
-ENODATA in the appropriate cases. Add sanity test_102t() to test that
zero length values are handled correctly.
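
The client-side decision described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: struct sketch_reply, sketch_xattr_result() and the SKETCH_MD_FLXATTR bit value are hypothetical stand-ins for the reply's MDT body, the ll_getxattr_common() logic and OBD_MD_FLXATTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for the OBD_MD_FLXATTR bit; not the real value. */
#define SKETCH_MD_FLXATTR (UINT64_C(1) << 0)

/* Hypothetical, reduced view of a getxattr reply. */
struct sketch_reply {
	uint64_t valid;	/* validity flags (mbo_valid in the commit message) */
	size_t len;	/* length of the returned xattr value in bytes */
};

/*
 * Map a reply to the value handed back to the caller:
 * a positive length, 0 for an existing but empty value,
 * or -ENODATA when the attribute does not exist.
 */
ssize_t sketch_xattr_result(const struct sketch_reply *reply)
{
	assert(reply != NULL);

	if (reply->len > 0)
		return (ssize_t)reply->len;
	if (reply->valid & SKETCH_MD_FLXATTR)
		return 0;	/* attribute exists, value is empty */
	return -ENODATA;	/* attribute is absent */
}
```

Without the flag, both situations arrive as a zero length and cannot be told apart, which is the ambiguity the commit message describes.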

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I15649581c26dc52e83ca714b44f8372f29954ed5
Reviewed-on: https://review.whamcloud.com/32755
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years agoLU-10114 hsm: add OBD_CONNECT2_ARCHIVE_ID_ARRAY to pass archive_id lists in array 06/32806/5
Teddy Zheng [Fri, 27 Jul 2018 05:37:18 +0000 (13:37 +0800)]
LU-10114 hsm: add OBD_CONNECT2_ARCHIVE_ID_ARRAY to pass archive_id lists in array

Clients registered with the MDS using OBD_CONNECT2_ARCHIVE_ID_ARRAY will
use an array to pass archive IDs, while clients without it still
use a bitmap. This flag allows old clients to connect to new MDSs.
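
To illustrate the difference between the two encodings, here is a minimal sketch that expands a bitmap into an array of IDs. It assumes a 32-bit bitmap in which bit (n - 1) stands for archive ID n; that mapping, the function name and the constant are assumptions made for the example, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed width of the legacy bitmap encoding. */
#define SKETCH_BITMAP_IDS 32

/*
 * Expand a bitmap into an array of archive IDs.
 * Assumption: bit (n - 1) set means archive ID n is served.
 * Returns the number of IDs written to ids[].
 */
size_t sketch_bitmap_to_ids(uint32_t bitmap, uint32_t ids[SKETCH_BITMAP_IDS])
{
	size_t count = 0;

	assert(ids != NULL);
	for (uint32_t bit = 0; bit < SKETCH_BITMAP_IDS; bit++) {
		if (bitmap & (UINT32_C(1) << bit))
			ids[count++] = bit + 1;
	}
	return count;
}
```

A fixed-width bitmap caps the largest ID at its bit width, whereas an array can carry any ID value, which is presumably why the array form is negotiated through a connect flag.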

Test-Parameters: trivial
Change-Id: I61a691fc262fdc921d5ff4aa88c1fd623f09d565
Signed-off-by: Teddy Zheng <teddy@ddn.com>
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/32806
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
5 years agoLU-9538 utils: Tool for syncing file LSOM xattr 24/30124/21
Qian Yingjin [Thu, 16 Nov 2017 01:42:57 +0000 (09:42 +0800)]
LU-9538 utils: Tool for syncing file LSOM xattr

Add a helper tool for syncing the file LSOM xattr.
First, register a new changelog user:
lctl --device lustre-MDT0000 changelog_register

After performing some file operations on the Lustre file system, run
this tool to sync the file LSOM xattr:
llsom_sync -u cl1 -m lustre-MDT0000 /mnt/lustre

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ia2878b48f7f665b01b230585921c78ae41846171
Reviewed-on: https://review.whamcloud.com/30124
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-9087 build: add support for DKMS debs 28/25328/8
Michael Kuhn [Tue, 31 Jul 2018 02:12:05 +0000 (22:12 -0400)]
LU-9087 build: add support for DKMS debs

This introduces a new package lustre-client-modules-dkms that uses DKMS
to automatically recompile the client kernel modules on kernel upgrades.
The package is only created if the dkms-debs target is used, otherwise
the traditional kernel-specific package is created.

Test-Parameters: trivial
Change-Id: Ie9aeee29f7fd73938b148299d246c663a783ccd3
Signed-off-by: Michael Kuhn <michael.kuhn@informatik.uni-hamburg.de>
Reviewed-on: https://review.whamcloud.com/25328
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-7372 tests: stop running replay-dual test 26 02/32902/2
James Nunez [Mon, 30 Jul 2018 17:34:36 +0000 (11:34 -0600)]
LU-7372 tests: stop running replay-dual test 26

replay-dual test 26 fails frequently. We need to add
this test to the ALWAYS_EXCEPT list and, thus, stop
running the test until we fix the issue.

Test-Parameters: trivial testlist=replay-dual
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ida58ecc4933dae33d396c258fee64f6d3dbd4978
Reviewed-on: https://review.whamcloud.com/32902
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 years agoLU-4684 migrate: pack lmv ea in migrate rpc 24/31424/14
Lai Siyao [Sat, 20 Jan 2018 07:51:32 +0000 (15:51 +0800)]
LU-4684 migrate: pack lmv ea in migrate rpc

To support striped directory migration, pack lmv_user_md in the migrate
RPC. Add the 'mdt-count' and 'mdt-hash' arguments to 'lfs migrate'.

Temporarily disable the directory migration tests; they will be
re-enabled in the last patch of this set.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I914a9205a1a558da8c4231e7c83334621b5c92c0
Reviewed-on: https://review.whamcloud.com/31424
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-10181 mdt: read on open for DoM files 11/23011/53
Mikhail Pershin [Thu, 18 Aug 2016 06:26:06 +0000 (09:26 +0300)]
LU-10181 mdt: read on open for DoM files

Read file data upon open and return it in the reply. This works
only for files with a Data-on-MDT layout and no OST components
initialized. Three cases may occur:
1) The file data fits in the already allocated reply buffer (~9K)
   and is returned in that buffer in the OPEN reply.
2) The file fits in the maximum reply buffer (128K), so the reply is
   returned to the client with a larger size, causing a resend
   with a re-allocated buffer.
3) The file doesn't fit in the reply buffer, but its tail partially
   fills a page, so that tail is returned. This can be useful
   for the append case.

Test-Parameters: mdssizegb=20 testlist=sanity-dom,dom-performance,racer
Change-Id: I5574ce5f74017fc654715e212b71fc3b905bdcae
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Reviewed-on: https://review.whamcloud.com/23011
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11186 ofd: fix for a final oid at sequence 91/32891/5
Alexander Boyko [Fri, 27 Jul 2018 13:10:23 +0000 (09:10 -0400)]
LU-11186 ofd: fix for a final oid at sequence

There was an error at the end of the sequence range: the last oid,
0xffffffff, could not be created. 0xffffffff is a valid oid, and the
sequence update happens only if it is created.

LustreError: 11756:0:(ofd_objects.c:217:ofd_precreate_objects())
lustre-OST0000:0xfffffffe:10737419264 hit the OBIF_MAX_OID (1<<32)!
LustreError: 11756:0:(ofd_dev.c:1764:ofd_create_hdl())
lustre-OST0000: unable to precreate: rc = -28

The patch fixes this error.

conf-sanity test 122 is added to check the sequence update.
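
The boundary condition can be illustrated with a small helper that treats 0xffffffff as an inclusive upper limit. This is a sketch of the off-by-one only, not the code changed in ofd_precreate_objects(); the names SKETCH_MAX_OID and sketch_precreate_count() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest valid object id in a sequence, inclusive. */
#define SKETCH_MAX_OID UINT64_C(0xffffffff)

/*
 * Return how many of the 'wanted' objects can still be created after
 * last_oid without going past SKETCH_MAX_OID. The final oid itself
 * counts as creatable.
 */
uint64_t sketch_precreate_count(uint64_t last_oid, uint64_t wanted)
{
	assert(last_oid <= SKETCH_MAX_OID);

	uint64_t remaining = SKETCH_MAX_OID - last_oid;

	return wanted < remaining ? wanted : remaining;
}
```

With last_oid at 0xfffffffe the helper still allows one more object, so the final oid is created and the sequence can then be updated.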

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I39ad66c05e8358591ca05fadabb2b46bee638070
Cray-bug-id: LUS-6222
Reviewed-on: https://review.whamcloud.com/32891
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11156 scrub: skip project quota inode 29/32829/7
Alexander Boyko [Wed, 18 Jul 2018 14:17:16 +0000 (10:17 -0400)]
LU-11156 scrub: skip project quota inode

An error occurred when scrub tried to process the project quota inode.
Scrub treats it as IGIF because it has no lma fid, starts
to create O/inum/{LAST_ID,d0-d31}, and fails for lack of credits.
The project quota inode s_prj_quota_inum should be skipped
during scrub iteration.

Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-6197
Change-Id: I38c347377a1c648ac3dd3e3ff4c4d65ee34cde39
Reviewed-on: https://review.whamcloud.com/32829
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11102 test: test fewer files on ZFS system 33/32933/2
Lai Siyao [Sun, 22 Jul 2018 01:44:02 +0000 (09:44 +0800)]
LU-11102 test: test fewer files on ZFS system

sanity test_415 may be slow on ZFS systems, so test with fewer files.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie21e9e146508b395c8196adac1f6ba3e6854a1ef
Reviewed-on: https://review.whamcloud.com/32933
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 years agoLU-11127 test: sanity-flr OST not recovery fast enough 22/32922/3
Bobi Jam [Thu, 2 Aug 2018 04:06:21 +0000 (12:06 +0800)]
LU-11127 test: sanity-flr OST not recovery fast enough

Use wait_recovery_complete() rather than wait_osc_import_state() to be
more patient with OST recovery.

Test-Parameters: trivial mdtcount=2 mdscount=2 testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr

Test-Parameters: mdtcount=2 mdscount=2 testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I2d652d09b0575a720e5ef9701fb7067cbf454079
Reviewed-on: https://review.whamcloud.com/32922
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-9538 utils: fix lfs xattr.h header usage 18/32918/2
Andreas Dilger [Wed, 1 Aug 2018 19:46:35 +0000 (13:46 -0600)]
LU-9538 utils: fix lfs xattr.h header usage

The lfs_getsom() code added the use of lgetxattr() to lfs.c, but
included the <attr/xattr.h> header instead of <sys/xattr.h> as
is used by other code in the tree.  That adds a dependency on
libattr-devel that we don't really need.
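
For reference, a minimal sketch of calling lgetxattr() through <sys/xattr.h>, which is provided by the C library on Linux and needs no libattr development package. The wrapper name sketch_xattr_size() is hypothetical and is not the lfs_getsom() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>	/* from the C library, not <attr/xattr.h> */

/*
 * Return the size in bytes of an extended attribute value without
 * following symlinks, or a negative errno value on failure.
 * Passing a zero-sized buffer asks lgetxattr() for the size only.
 */
ssize_t sketch_xattr_size(const char *path, const char *name)
{
	assert(path != NULL && name != NULL);

	ssize_t rc = lgetxattr(path, name, NULL, 0);

	return rc < 0 ? -errno : rc;
}
```

A caller would pass a file path and an attribute name and then allocate a buffer of the returned size before fetching the value with a second lgetxattr() call.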

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8cccccbdc7186d0ed1bfb1c12d911da763a44bf5
Reviewed-on: https://review.whamcloud.com/32918
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11159 kernel: kernel update RHEL7.5 [3.10.0-862.9.1.el7] 45/32845/2
Jian Yu [Fri, 20 Jul 2018 07:16:12 +0000 (00:16 -0700)]
LU-11159 kernel: kernel update RHEL7.5 [3.10.0-862.9.1.el7]

Update RHEL7.5 kernel to 3.10.0-862.9.1.el7.

Change-Id: I2bb3462efbbdd8ed17803209b9508176ab04be96
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32845
Tested-by: Jenkins
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11166 tests: remove use of /proc/fs/jbd2/*/history file 58/32858/2
James Nunez [Mon, 23 Jul 2018 20:19:11 +0000 (14:19 -0600)]
LU-11166 tests: remove use of /proc/fs/jbd2/*/history file

The /proc/fs/jbd2/*/history file was removed several years
ago with a patch from Theodore Ts’o; commit bf6993276f. We
need to remove all uses of /proc/fs/jbd*/*/history from our
tests and utilities.

In particular, obdfilter-survey.sh and iokit-lstat rely on
/proc/fs/jbd2/*/history to collect data and must be modified.

Test-Parameters: trivial testlist=obdfilter-survey
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ib25dd28a496840199de1e84f597748905bda80d2
Reviewed-on: https://review.whamcloud.com/32858
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years agoLU-11160 build: Fix uuid / blkid dependency 42/32842/4
Nathaniel Clark [Thu, 19 Jul 2018 19:26:27 +0000 (15:26 -0400)]
LU-11160 build: Fix uuid / blkid dependency

UUID dependency stems from libblkid, so only link with uuid if blkid
is present.

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: If1cc293cc48210a065f8910ea655615b11268b5c
Reviewed-on: https://review.whamcloud.com/32842
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-10627 tests: don't use libtool wrapper for applications 35/32835/11
James Simmons [Fri, 27 Jul 2018 23:55:49 +0000 (19:55 -0400)]
LU-10627 tests: don't use libtool wrapper for applications

It is a common practice for Lustre developers to test within the
Lustre tree without actually installing Lustre onto the local
node. For this to work, the test suite needs to use
the binary executables instead of the libtool executable wrappers.
Add the libtool LDFLAG to prevent the creation of the wrappers
for the Lustre utils. Additionally, properly set LD_LIBRARY_PATH
to where libtool caches the dynamic libraries.

Change-Id: I9570fcb65b927463076f28c47ecec924602bef4e
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32835
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11153 quota: initialize ver for default quota 27/32827/3
Hongchao Zhang [Wed, 18 Jul 2018 04:02:42 +0000 (00:02 -0400)]
LU-11153 quota: initialize ver for default quota

In qmt_set_with_lqe, the variable "ver" is not initialized
if the lqe using the default quota is being updated to use
the new default quota setting.

Change-Id: I578543fc69009ef85c667092a66947d3c98a6a7d
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32827
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-10916 lfs: improve lfs mirror resync 08/32808/4
Bobi Jam [Wed, 11 Jul 2018 16:24:27 +0000 (10:24 -0600)]
LU-10916 lfs: improve lfs mirror resync

Make mirror resync use read+write+write+... mode instead of doing the
resync on each stale mirror of a file separately (read+write,
read+write, ...).
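
The read+write+write+... pattern can be sketched with in-memory buffers standing in for the source and the stale mirrors. This is only an illustration of the I/O pattern, not the lfs implementation; sketch_resync() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy 'len' bytes from src to every mirror, reading each chunk from
 * the source once and writing it to all mirrors before moving on.
 * Returns the number of source reads performed.
 */
size_t sketch_resync(const char *src, size_t len,
		     char *const mirrors[], size_t nmirrors, size_t chunk)
{
	char buf[4096];
	size_t reads = 0;

	assert(chunk > 0 && chunk <= sizeof(buf));
	for (size_t off = 0; off < len; off += chunk) {
		size_t n = len - off < chunk ? len - off : chunk;

		memcpy(buf, src + off, n);	/* one read of the source */
		reads++;
		for (size_t i = 0; i < nmirrors; i++)
			memcpy(mirrors[i] + off, buf, n);	/* one write per mirror */
	}
	return reads;
}
```

The number of source reads depends only on the data length and chunk size, not on the number of mirrors, whereas resyncing each mirror separately would repeat every read once per mirror.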

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I627fa53fcfde4811b2cd9c84c8545defe151206c
Reviewed-on: https://review.whamcloud.com/32808
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>