Whamcloud - gitweb
fs/lustre-release.git
23 months ago LU-11244 build: apply IB_OPTIONS to debian rules 96/32996/2
Jinshan Xiong [Tue, 14 Aug 2018 03:33:33 +0000 (20:33 -0700)]
LU-11244 build: apply IB_OPTIONS to debian rules

IB_OPTIONS should be honored when building the Debian package.

Signed-off-by: Jinshan Xiong <jinshan.xiong@uber.com>
Change-Id: Ibc16a5428d47f072499c39a62ea457c922ae7352
Reviewed-on: https://review.whamcloud.com/32996
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Thomas Stibor <t.stibor@gsi.de>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Martin Schroeder <martin.h.schroeder@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11227 lod: lod_sync: don't attempt sync to inactive targets 64/32964/5
Robin Humble [Thu, 9 Aug 2018 05:33:04 +0000 (15:33 +1000)]
LU-11227 lod: lod_sync: don't attempt sync to inactive targets

chgrp on a client triggers lod_sync(), which in turn loops over OST/MDT
targets with dt_sync(). dt_sync() fails with -ENOTCONN when targets
have been deactivated (i.e. set to active=0). The client retries
indefinitely, causing the client process to hang and generating
considerable MDS network traffic, load, and disk I/O.

The fix is to not attempt dt_sync() on OST/MDT targets that have been
deactivated and also (because of possible races) to ignore connection
errors.
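A minimal sketch of the skip-and-ignore logic described above, in plain C. struct sketch_target, sketch_dt_sync() and sketch_lod_sync() are hypothetical stand-ins, not the real lod/dt APIs; this sketch keeps looping after a failure and returns the first error that is not a connection error, which may differ from what the actual patch does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an OST/MDT target entry. */
struct sketch_target {
	bool active;      /* false once the target was set to active=0 */
	int  sync_result; /* what the sync call returns for this target */
	int  sync_calls;  /* number of sync attempts made on this target */
};

/* Hypothetical stand-in for dt_sync() on a single target. */
static int sketch_dt_sync(struct sketch_target *tgt)
{
	tgt->sync_calls++;
	return tgt->sync_result;
}

/*
 * Sync every active target. Inactive targets are skipped entirely, and
 * -ENOTCONN is ignored because a target may be deactivated between the
 * active check and the sync call. Returns the first other error, or 0.
 */
static int sketch_lod_sync(struct sketch_target *tgts, size_t count)
{
	int rc = 0;

	for (size_t i = 0; i < count; i++) {
		int rc2;

		if (!tgts[i].active)
			continue;

		rc2 = sketch_dt_sync(&tgts[i]);
		if (rc2 == -ENOTCONN)
			rc2 = 0;
		if (rc2 != 0 && rc == 0)
			rc = rc2;
	}
	return rc;
}
```

Because deactivated targets are never contacted and racing disconnects are swallowed, the caller no longer sees an error that would make the client retry forever.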

Tested with Lustre 2.10.4.

Signed-off-by: Robin Humble <plaguedbypenguins@gmail.com>
Change-Id: I617509cf7944541489f4fd9762c233b771132165
Reviewed-on: https://review.whamcloud.com/32964
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11226 flr: mirror resync regression 68/32968/5
Bobi Jam [Thu, 9 Aug 2018 06:35:49 +0000 (14:35 +0800)]
LU-11226 flr: mirror resync regression

There is a glitch in the lfs mirror resync tool in commit
0e5c12ac29a9622e8ca05d5e39cd5e2a721ace93: the resync write needs to be
restricted to the component's extent.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ifbd3f16b2f621407b31c7fe37ce9745de48fcc99
Reviewed-on: https://review.whamcloud.com/32968
Tested-by: Jenkins
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11146 lustre: fix setstripe for specific osts upon dir 14/32814/16
Wang Shilong [Wed, 11 Jul 2018 14:11:47 +0000 (22:11 +0800)]
LU-11146 lustre: fix setstripe for specific osts upon dir

The LOV_USER_MAGIC_SPECIFIC functionality is broken and was not
available for setting a directory's striping.

1) llite doesn't handle the LOV_USER_MAGIC_SPECIFIC case
properly for directory {set,get}_stripe, and the
LL_IOC_LOV_SETSTRIPE ioctl did not allocate a large enough
buffer to copy the OST list from userspace.

2) lod_get_default_lov_striping() did not handle the
LOV_USER_MAGIC_SPECIFIC type, so newly created
files/directories did not properly inherit the parent's setting.

3) There was no test case covering the lfs setstripe
'-o' interface, which made it hard to notice
when this functionality broke.

Change-Id: Icc2ee60a474e5e565db12b35a9a38fde65b05bbd
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32814
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-8066 llite: move /proc/fs/lustre/llite/uuid to sysfs 01/32501/9
James Simmons [Sun, 29 Jul 2018 14:34:19 +0000 (10:34 -0400)]
LU-8066 llite: move /proc/fs/lustre/llite/uuid to sysfs

Move uuid file from /proc/fs/lustre/llite/*
to /sys/fs/lustre/llite/*/

This is a modified version of

Linux-commit: ec55a6299990efa969dfc00d95c72444ff1e3461

due to the large amount of changes to the OpenSFS/Intel branch.

Change-Id: I2dc13c248879f554f9f7ed6dc62a6772a59f6f35
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32501
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
23 months ago LU-8215 tests: sanity-benchmark/iozone should wait for space recovery 99/20499/2
Alex Zhuravlev [Mon, 30 May 2016 10:45:51 +0000 (14:45 +0400)]
LU-8215 tests: sanity-benchmark/iozone should wait for space recovery

Otherwise it may fail due to a transient state in which the space consumed
by the previous run hasn't been recovered yet. This happens with the tiny
filesystems used in local setups.

Change-Id: I04b3ce096621583629277c1e52c64a1551bc8ace
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/20499
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11201 lfsck: check linkea entry validity 58/32958/2
Lai Siyao [Sun, 22 Jul 2018 21:45:23 +0000 (05:45 +0800)]
LU-11201 lfsck: check linkea entry validity

Invalid linkea data may lead to an infinite loop during linkea iteration.
Check linkea entry validity on unpack, and if an entry is not unpacked,
check the validity of the entry length.
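A minimal sketch of why the length check matters, assuming a hypothetical record layout (a 2-byte big-endian record length that includes the header, followed by a payload). This is not the real link_ea_entry format; it only shows how rejecting a record length that is too small to advance, or that overruns the buffer, turns a potential infinite loop into an error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 2-byte big-endian length (header included),
 * followed by a non-empty payload. Not the real link_ea_entry. */
#define SKETCH_HDR_LEN 2

static size_t sketch_reclen(const uint8_t *rec)
{
	return ((size_t)rec[0] << 8) | rec[1];
}

/*
 * Walk the records in buf and count them. A record whose length is too
 * small to make progress (e.g. 0) or runs past the end of the buffer is
 * treated as corruption instead of being iterated over.
 * Returns the record count, or -EINVAL.
 */
static long sketch_count_entries(const uint8_t *buf, size_t buflen)
{
	size_t off = 0;
	long count = 0;

	while (off < buflen) {
		size_t reclen;

		if (buflen - off < SKETCH_HDR_LEN)
			return -EINVAL;         /* truncated header */
		reclen = sketch_reclen(buf + off);
		if (reclen <= SKETCH_HDR_LEN || reclen > buflen - off)
			return -EINVAL;         /* would loop or overrun */
		off += reclen;
		count++;
	}
	return count;
}
```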

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity-lfsck
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8e1890ed64fab38b85149ebbfecce04caaf41e17
Reviewed-on: https://review.whamcloud.com/32958
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11154 llite: use proper flags for FS_IOC_{FSSET,FSGET}XATTR 28/32828/6
Wang Shilong [Wed, 18 Jul 2018 08:30:28 +0000 (16:30 +0800)]
LU-11154 llite: use proper flags for FS_IOC_{FSSET,FSGET}XATTR

Two problems addressed by this patch:

1) The fsx_xflags field of struct fsxattr has its own flag definitions
(FS_XFLAG_XXX), so the proper conversion macro must be used for it;
here the wrong constant was used for the project inherit flag.

2) FS_XFLAG_PROJINHERIT is not a valid VFS inode flag. Looking
at current Linux code, local filesystems set the project inherit
flag in their private flags; Lustre should do the same.
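A minimal sketch of keeping the two flag namespaces apart with explicit conversion helpers. The SKETCH_* constants and both helper functions are hypothetical stand-ins chosen for illustration; they are not the real FS_XFLAG_* values or Lustre's private flag bits.

```c
#include <stdint.h>

/* Hypothetical stand-in values; the real constants live in the kernel
 * headers (FS_XFLAG_*) and in the filesystem's private flag set. */
#define SKETCH_XFLAG_PROJINHERIT   0x0200u  /* fsx_xflags namespace */
#define SKETCH_PRIV_PROJINHERIT_FL 0x0001u  /* private inode flags */

/* Translate private per-inode flags into the fsx_xflags namespace. */
static uint32_t sketch_priv_to_xflags(uint32_t priv)
{
	uint32_t xflags = 0;

	if (priv & SKETCH_PRIV_PROJINHERIT_FL)
		xflags |= SKETCH_XFLAG_PROJINHERIT;
	return xflags;
}

/* Translate fsx_xflags back into private flags; unknown bits are dropped. */
static uint32_t sketch_xflags_to_priv(uint32_t xflags)
{
	uint32_t priv = 0;

	if (xflags & SKETCH_XFLAG_PROJINHERIT)
		priv |= SKETCH_PRIV_PROJINHERIT_FL;
	return priv;
}
```

Testing a value from one namespace against a constant from the other (the class of bug described in point 1) silently yields 0, which the round trip below makes visible.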

Test-Parameters: trivial testlist=sanity-quota,sanity-quota,sanity-quota
Change-Id: I453db8ed074e8008f0ec145c726d7577121422e6
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32828
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-9120 lnet: LNet Health/Resiliency Feature
Oleg Drokin [Tue, 21 Aug 2018 16:15:26 +0000 (12:15 -0400)]
LU-9120 lnet: LNet Health/Resiliency Feature

The LNet Health/Resiliency feature adds the ability for LNet
to try the other interfaces available to it if message
sending fails. It maintains the health of each local and remote
interface and selects the best local interface to send from and the
best remote interface to send to.

Merge commit '958ef71f33fa925e6657f9902702cd3677e15ec9'

Change-Id: I9ca740654c48d642fe130f98a60c5c59b9b4ebe1
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
23 months ago LU-10686 tests: stop running sanity-pfl test 9 45/32945/2
James Nunez [Mon, 6 Aug 2018 21:26:25 +0000 (15:26 -0600)]
LU-10686 tests: stop running sanity-pfl test 9

sanity-pfl test 9 consistently fails when run on a Lustre
file system with a single MDS. We need to add test 9 to
the ALWAYS_EXCEPT list and thus stop running the test
until a fix for the underlying problem can be found.

Test-Parameters: trivial mdscount=1 mdtcount=1 testlist=sanity-pfl
Test-Parameters: mdscount=2 mdtcount=2 testlist=sanity-pfl
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ife4b3c044e2777bb9b9010e0be7c00549a683fdc
Reviewed-on: https://review.whamcloud.com/32945
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11200 libcfs: handle DECLARE_TIMER reduced to two arguments 39/32939/5
James Simmons [Mon, 6 Aug 2018 17:56:55 +0000 (13:56 -0400)]
LU-11200 libcfs: handle DECLARE_TIMER reduced to two arguments

In the Linux kernel there are two ways to initialize a
struct timer_list: setup_timer() and the DEFINE_TIMER macro. In earlier
kernels, both methods used callbacks that take an argument of type
unsigned long. In kernels 4.15+, both methods pass a struct timer_list
pointer as the callback argument. During the 4.14 development phase,
setup_timer() already used struct timer_list as its callback argument,
but DEFINE_TIMER still used unsigned long. Additionally, when
DEFINE_TIMER did move to struct timer_list, the number of arguments
to the macro was reduced. This patch handles the 4.14 kernel state of
development for the timer API.
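With the newer API the callback no longer receives an unsigned long cookie; it receives the timer pointer and recovers its enclosing structure, which in the kernel is done with from_timer(), a wrapper around container_of(). The userspace sketch below emulates only that pattern: struct sketch_timer, sketch_expire() and the other names are hypothetical and are not the kernel's struct timer_list API.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical timer whose callback takes the timer itself (new style). */
struct sketch_timer {
	void (*function)(struct sketch_timer *t);
};

/* The owner embeds the timer, so the callback can find the owner. */
struct sketch_device {
	int fired;
	struct sketch_timer timer;
};

/* New-style callback: recover the enclosing device from the timer. */
static void sketch_timeout_cb(struct sketch_timer *t)
{
	struct sketch_device *dev =
		sketch_container_of(t, struct sketch_device, timer);

	dev->fired++;
}

/* Simulate expiry by invoking the callback as a timer core would. */
static void sketch_expire(struct sketch_timer *t)
{
	t->function(t);
}
```

Under the old convention the same callback would have been declared as taking an unsigned long and cast it back to struct sketch_device *, which is the difference a compatibility shim has to paper over.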

Test-Parameters: trivial

Change-Id: I1c509838153328ed4bbdfa50468a396e13037d50
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32939
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11014 mdc: remove obsolete intent opcodes 61/32361/6
John L. Hammond [Fri, 11 May 2018 17:04:02 +0000 (12:04 -0500)]
LU-11014 mdc: remove obsolete intent opcodes

In enum ldlm_intent_flags, remove the obsolete constants IT_UNLINK,
IT_TRUNC, IT_EXEC, IT_PIN, IT_SETXATTR. Remove any handling code for
these opcodes.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I66f20e4c881cb77a481805a148a33f1c2daa5f0c
Reviewed-on: https://review.whamcloud.com/32361
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-8066 lod: migrate from proc to sysfs 98/32198/6
James Simmons [Sat, 28 Jul 2018 15:54:38 +0000 (11:54 -0400)]
LU-8066 lod: migrate from proc to sysfs

Move most of the lod module's single-value files from proc
to sysfs. Create default attrs for dt_devices that can also be
used by other server-side devices.

Change-Id: I734f01ef0d9f0c18efc141c835e4cf8ad2365250
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32198
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
23 months ago LU-11121 mdt: take discard lock at cleanup stage 30/29930/21
Mikhal Pershin [Fri, 3 Nov 2017 09:38:04 +0000 (12:38 +0300)]
LU-11121 mdt: take discard lock at cleanup stage

Call mdt_dom_check_and_discard() after mdt_object_unlock() to
avoid a possible deadlock when a third lock conflicts with
both, as in the scenario below:
 thread1: mdt_object_lock() with some bits
 thread2: take conflicting lock and wait
 thread1: mdt_dom_check_and_discard() with bits conflicting
          with thread2 causes deadlock.

The patch also enables the DoM layout in racer so that it is tested
on a regular basis. Another minor update uses 'trap' in related tests.

Test-Parameters: mdssizegb=20 mdtcount=1 mdscount=1 testlist=sanity-dom,dom-performance,racer,racer,racer
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I63bedabb4a82cfa2f01e126d35dc8c2a89d64f56
Reviewed-on: https://review.whamcloud.com/29930
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11175 osc: serialize access to idle_timeout vs cleanup 83/32883/4
Alex Zhuravlev [Thu, 26 Jul 2018 07:52:38 +0000 (11:52 +0400)]
LU-11175 osc: serialize access to idle_timeout vs cleanup

Use LPROCFS_CLIMP_CHECK() and LPROCFS_CLIMP_EXIT(), as cl_import
can disappear during umount.

Change-Id: I2a067f416691f39cde13cfae8f64ed5769d92041
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32883
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
23 months ago LU-6142 obdclass: Fix style issues for acl.c 51/32851/5
Arshad Hussain [Sun, 22 Jul 2018 03:00:27 +0000 (08:30 +0530)]
LU-6142 obdclass: Fix style issues for acl.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/acl.c

Change-Id: I00d4535123fb6677863bfd10937df5039ee7a339
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32851
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
23 months ago LU-6142 osd-ldiskfs: Fix style issues for osd_iam_lfix.c 49/32849/6
Arshad Hussain [Sat, 21 Jul 2018 19:35:19 +0000 (01:05 +0530)]
LU-6142 osd-ldiskfs: Fix style issues for osd_iam_lfix.c

This patch fixes issues reported by checkpatch
for file lustre/osd-ldiskfs/osd_iam_lfix.c

Change-Id: I9d32231e397689dd3806fecf106bc1ce2f1439a4
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32849
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
23 months ago LU-11116 llog: error handling cleanup 80/32780/2
Alexander Boyko [Wed, 4 Jul 2018 10:41:52 +0000 (06:41 -0400)]
LU-11116 llog: error handling cleanup

llog_cat_new_log() needs some error handling cleanup.
Save and restore the thread's lgi_cookie when using it, to prevent
conflicts/corruption with llog_process_thread().

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I12fdfe1a72e77cfeb5ad464b8582db68a7bcfe16
Cray-bug-id: LUS-4780
Reviewed-on: https://review.whamcloud.com/32780
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11224 obd: use correct ip_compute_csum() version 53/32953/2
James Simmons [Tue, 7 Aug 2018 17:20:54 +0000 (13:20 -0400)]
LU-11224 obd: use correct ip_compute_csum() version

The Linux kernel provides a generic, platform-independent version
of ip_compute_csum() as well as platform-optimized versions. Some
platforms disable the generic version in favor of the
optimized one. If the generic version is disabled and the
checksum.h header from asm-generic is used, we end up
with an undefined symbol error when loading the obdclass module.
The solution is to use the platform-specific checksum.h header,
which handles using the generic or optimized version for us.
As a bonus, we get better performance with the right kernel
configuration.
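For reference, the value ip_compute_csum() produces is the RFC 1071 Internet checksum. The portable sketch below (sketch_inet_csum() is a hypothetical name) reads the data as big-endian 16-bit words and returns the result in host order; the kernel's generic and arch-optimized versions differ in byte-order handling and speed, which is exactly why the choice of header matters.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Portable RFC 1071 Internet checksum: the one's complement of the
 * one's complement sum of 16-bit big-endian words. An odd trailing
 * byte is padded with a zero low byte.
 */
static uint16_t sketch_inet_csum(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;   /* wide accumulator so large buffers cannot overflow */
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint64_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint64_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

With the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7 this yields 0x220d, and appending those two checksum bytes makes the checksum of the whole buffer 0.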

Test-Parameters: trivial

Change-Id: Ia0cfc9f4363bb61d5e381790655423ff5f91d9be
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32953
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
23 months ago LU-9325 ptlrpc: replace simple_strtol with kstrtol 85/32785/8
James Simmons [Thu, 5 Jul 2018 03:56:02 +0000 (23:56 -0400)]
LU-9325 ptlrpc: replace simple_strtol with kstrtol

simple_strtol() will eventually be removed, so replace its use in
ptlrpc with the kstrtoXXX() family of functions.
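The practical difference is error reporting: simple_strtol() silently stops at the first bad character, while kstrtol() requires the whole string (optionally followed by a single newline) to be a number and returns -EINVAL or -ERANGE otherwise. The sketch below is a userspace approximation of that contract built on strtol(); sketch_kstrtol() is a hypothetical name and edge cases may not match the kernel exactly.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace approximation of kstrtol(): parse all of s (one trailing
 * newline allowed) into *res. Returns 0, -EINVAL, or -ERANGE, and only
 * writes *res on success.
 */
static int sketch_kstrtol(const char *s, unsigned int base, long *res)
{
	char *end;
	long val;

	if (*s == '\0' || isspace((unsigned char)*s))
		return -EINVAL;          /* empty, or leading whitespace */

	errno = 0;
	val = strtol(s, &end, (int)base);
	if (end == s)
		return -EINVAL;          /* no digits at all */
	if (errno == ERANGE)
		return -ERANGE;          /* does not fit in a long */
	if (*end == '\n')
		end++;                   /* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;          /* trailing garbage */

	*res = val;
	return 0;
}
```

For example, "42abc" would be accepted as 42 by simple_strtol() but is rejected with -EINVAL here, which is the behavior a sysfs/procfs write handler usually wants.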

Change-Id: I41b44c5dc329832a901c1772a9ba0608df30282a
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32785
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@gmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
23 months ago LU-9120 lnet: LNet Health/Resiliency Feature 23/33023/1
Amir Shehata [Sat, 18 Aug 2018 01:23:53 +0000 (18:23 -0700)]
LU-9120 lnet: LNet Health/Resiliency Feature

The LNet Health/Resiliency feature adds the ability for LNet
to try the other interfaces available to it if message
sending fails. It maintains the health of each local and remote
interface and selects the best local interface to send from and the
best remote interface to send to.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibcbbc34f8acfc3afb36ffe73eb27d69c147d02ce

23 months ago LU-9120 lnet: health error simulation 51/32951/13
Amir Shehata [Sun, 5 Aug 2018 21:37:29 +0000 (14:37 -0700)]
LU-9120 lnet: health error simulation

Modified the error simulation code to simulate health errors for
testing purposes. The specific error to simulate can be set. If multiple
errors are configured, one is chosen at random from the set.

EX:
lctl net_drop_add -s *@tcp -d *@tcp -m GET -i 1 -e local_interrupt

The -e option can be repeated to specify different
errors to simulate. The available errors are:
local_interrupt
local_dropped
local_aborted
local_no_route
local_error
local_timeout
remote_error
remote_dropped
remote_timeout
network_timeout
random

A -n (--random) option has been added to randomize error generation
for drop rules. It relies on the interval value provided via -i: a
random number no bigger than the interval is generated, and if that
number is smaller than half of the interval the rule isn't matched;
otherwise it is.

This is needed because drop matching can happen multiple
times in the path of sending a message, and time-based
or rate-based matching will not generate errors evenly across
those multiple calls.
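A minimal sketch of the two random decisions described above, assuming the set of configured errors is kept as a bitmask and the random value is passed in by the caller so the logic can be exercised deterministically. sketch_random_match() and sketch_pick_error() are hypothetical names; the real LNet fault-injection code may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Random matching: reduce r to a number in [0, interval]. The rule is
 * not matched when that number is below half of the interval and is
 * matched otherwise.
 */
static bool sketch_random_match(uint32_t r, uint32_t interval)
{
	r = (uint32_t)((uint64_t)r % ((uint64_t)interval + 1));
	return r >= interval / 2;
}

/*
 * Pick one error at random from a bitmask of configured errors.
 * Returns the bit index of the chosen error, or -1 if none is set.
 */
static int sketch_pick_error(uint32_t mask, uint32_t r)
{
	int nset = 0;
	int bit;

	for (bit = 0; bit < 32; bit++)
		if (mask & (1u << bit))
			nset++;
	if (nset == 0)
		return -1;

	r %= (uint32_t)nset;            /* which of the set bits to use */
	for (bit = 0; bit < 32; bit++) {
		if (!(mask & (1u << bit)))
			continue;
		if (r-- == 0)
			return bit;
	}
	return -1;                      /* not reached */
}
```

Because each call to sketch_random_match() is an independent coin flip, repeated matching along one send path still produces errors at an even rate, which a time-based or counter-based rule would not.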

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If070e29f68c3de10100a9d5eaa49d10cdb76a59a
Reviewed-on: https://review.whamcloud.com/32951
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
23 months ago LU-9120 lnet: print recovery queues content 50/32950/12
Amir Shehata [Sun, 5 Aug 2018 21:25:47 +0000 (14:25 -0700)]
LU-9120 lnet: print recovery queues content

Add commands to lnetctl to print the contents of the recovery queues
from user space.

The associated code to handle the IOCTL is added to the LNet module.

for local NIs:
lnetctl debug recovery --local

for peer NIs:
lnetctl debug recovery --peer

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id136d506772d95381fd5d8346d772177442a84fb
Reviewed-on: https://review.whamcloud.com/32950
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
23 months ago LU-9120 lnet: add global health statistics 49/32949/12
Amir Shehata [Sun, 5 Aug 2018 21:16:49 +0000 (14:16 -0700)]
LU-9120 lnet: add global health statistics

Added global health statistics.

Print them from lnetctl with:

lnetctl stats show

lnet_selftest passes the statistics block over the wire. This,
unfortunately, creates an unnecessary backwards compatibility link
for lnet_selftest, which shouldn't be there. This patch breaks
this backwards compatibility, which means lnet_selftest will
not work with older selftest modules.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4a171c4f3cf13a1e8ab0d607d3b328352f727380
Reviewed-on: https://review.whamcloud.com/32949
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
23 months ago LU-9120 lnet: set health value from user space 63/32863/14
Amir Shehata [Tue, 24 Jul 2018 00:11:07 +0000 (17:11 -0700)]
LU-9120 lnet: set health value from user space

Add commands to lnetctl to set the health value.

for local NIs:
 lnetctl net set --nid <nid> --health <value>

for peer NIs:
 lnetctl peer set --nid <nid> --health <value>

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I06e1238df54c94bcfecadd84fbaa30cc1ce4dd68
Reviewed-on: https://review.whamcloud.com/32863
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
23 months ago LU-9120 lnet: show peer ni health stats 83/32783/15
Amir Shehata [Wed, 4 Jul 2018 18:49:38 +0000 (11:49 -0700)]
LU-9120 lnet: show peer ni health stats

Added another section in the peer ni show output for the health
statistics.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I7ab3a9343972622d90a984c4f8c0b096b15ecbdc
Reviewed-on: https://review.whamcloud.com/32783
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: show local ni health stats 82/32782/15
Amir Shehata [Wed, 4 Jul 2018 17:42:58 +0000 (10:42 -0700)]
LU-9120 lnet: show local ni health stats

Added another section in the ni show output for the health
statistics.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id57013e510cf1fb4befdd7a4c18af28d1f995ce2
Reviewed-on: https://review.whamcloud.com/32782
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: set health sensitivity from lnetctl 79/32779/16
Amir Shehata [Wed, 4 Jul 2018 00:51:29 +0000 (17:51 -0700)]
LU-9120 lnet: set health sensitivity from lnetctl

Added an lnetctl command to set the health sensitivity
from userspace.

lnetctl set health_sensitivity {>0}

0 - turn off health evaluation
>0 - sensitivity value not more than 1000

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic9289b06c5c9285a69c1819a33b79e954319a01e
Reviewed-on: https://review.whamcloud.com/32779
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: set transaction timeout from lnetctl 78/32778/16
Amir Shehata [Wed, 4 Jul 2018 00:24:31 +0000 (17:24 -0700)]
LU-9120 lnet: set transaction timeout from lnetctl

Added an lnetctl command to set the transaction timeout
from userspace.

lnetctl set transaction_timeout {>0}

>0 - timeout in seconds.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I71274e82fd46bff8017e36c37de449d8a7639ec6
Reviewed-on: https://review.whamcloud.com/32778
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: set retry count from lnetctl 77/32777/16
Amir Shehata [Wed, 4 Jul 2018 00:04:16 +0000 (17:04 -0700)]
LU-9120 lnet: set retry count from lnetctl

Added an lnetctl command to set the retry_count from userspace.

lnetctl set retry_count [0|>0]

0 - turns off retries in the system
>0 - number of retries.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I2fd5c88a91590195cfdad52e6d177619ccbbc840
Reviewed-on: https://review.whamcloud.com/32777
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: remove obsolete health functions 62/32862/14
Amir Shehata [Tue, 17 Jul 2018 18:58:22 +0000 (11:58 -0700)]
LU-9120 lnet: remove obsolete health functions

Removed obsolete health functions that were originally added
during the Multi-Rail project. Some assumptions made back then
about the health implementation are no longer true.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4d4f47a03541d58da6807d9c2b786ecd868b50b0
Reviewed-on: https://review.whamcloud.com/32862
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
23 months ago LU-9120 lnet: Add ioctl to get health stats 76/32776/16
Amir Shehata [Tue, 3 Jul 2018 23:27:10 +0000 (16:27 -0700)]
LU-9120 lnet: Add ioctl to get health stats

At the time of this patch the sysfs statistics feature is
still in development, so an ioctl is used to get the stats
from LNet.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia216484f9e6ee062c766c1043f456e38a27e4d39
Reviewed-on: https://review.whamcloud.com/32776
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
23 months ago LU-9120 lnet: add health statistics 75/32775/15
Amir Shehata [Tue, 3 Jul 2018 01:24:44 +0000 (18:24 -0700)]
LU-9120 lnet: add health statistics

Add a health statistics block for each local and peer NI.
These statistics will be incremented when processing errors reported
by lnet_finalize().

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia1ec4d5de50c04392605e94ac2f81adef78fc17c
Reviewed-on: https://review.whamcloud.com/32775
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: reset health value 73/32773/15
Amir Shehata [Mon, 2 Jul 2018 21:36:50 +0000 (14:36 -0700)]
LU-9120 lnet: reset health value

Added an IOCTL to set the local or peer NI health value.
This is useful for debugging, since the selection algorithm and
recovery mechanism can be tested by reducing the health of an
interface.

If the value specified is -1, the health value is reset to the maximum.
This is useful for resetting the system once a network issue has been
resolved: rather than waiting for the interface to return to fully
healthy on its own, it may be desirable to shortcut the process.
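A minimal sketch of the set/reset semantics described above. SKETCH_MAX_HEALTH (1000) is an assumed maximum, and rejecting out-of-range values with -EINVAL is a choice made for this sketch; the commit does not say whether the real IOCTL rejects or clamps them. struct sketch_ni and sketch_set_health() are hypothetical.

```c
#include <errno.h>

#define SKETCH_MAX_HEALTH 1000   /* assumed maximum health value */

/* Hypothetical local or peer NI carrying a health value. */
struct sketch_ni {
	int health;
};

/*
 * Apply a user-requested health value: -1 resets the NI to fully
 * healthy; anything else must lie within [0, SKETCH_MAX_HEALTH].
 */
static int sketch_set_health(struct sketch_ni *ni, int value)
{
	if (value == -1)
		value = SKETCH_MAX_HEALTH;
	else if (value < 0 || value > SKETCH_MAX_HEALTH)
		return -EINVAL;

	ni->health = value;
	return 0;
}
```

Lowering the value by hand lets the selection and recovery paths be exercised, and -1 puts the interface straight back into service once the network issue is fixed.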

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I45a5844bbaa72f769e37a39526773ef4c71118c0
Reviewed-on: https://review.whamcloud.com/32773
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
23 months ago LU-9120 lnet: handle fatal device error 72/32772/15
Amir Shehata [Fri, 29 Jun 2018 23:54:38 +0000 (16:54 -0700)]
LU-9120 lnet: handle fatal device error

The o2iblnd can receive device status events in its QP event handler.
Three of them in particular are handled in this patch:
IB_EVENT_DEVICE_FATAL
IB_EVENT_PORT_ERR
IB_EVENT_PORT_ACTIVE
For DEVICE_FATAL and PORT_ERR, the NI associated with the QP is put
into fatal error mode and will no longer be selected when sending
messages. When PORT_ACTIVE is received, the fatal error on the NI
associated with the QP is cleared and future messages can use that NI.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I282aa463927f489c46e4e45040e93478c9823a37
Reviewed-on: https://review.whamcloud.com/32772
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: remove duplicate timeout mechanism 92/32992/8
Amir Shehata [Mon, 13 Aug 2018 23:19:00 +0000 (16:19 -0700)]
LU-9120 lnet: remove duplicate timeout mechanism

Remove the duplicate GET/PUT timeout mechanism currently implemented
for discovery, as it has been replaced by a more generic timeout
mechanism for all GET/PUT messages.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I28efae8c1fca6fc07fcaad4bfacf123b00ff887d
Reviewed-on: https://review.whamcloud.com/32992
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: timeout delayed REPLYs and ACKs 71/32771/15
Amir Shehata [Fri, 29 Jun 2018 01:02:42 +0000 (18:02 -0700)]
LU-9120 lnet: timeout delayed REPLYs and ACKs

When a GET, or a PUT that requires an ACK, is sent, add a response
tracker block to a per-CPT queue. When the REPLY/ACK is received,
remove the block from the per-CPT queue. The monitor thread
wakes up periodically to check whether any of the blocks have
expired; if so, it sends a timeout event to the ULP,
flags the MD as stale, and then unlinks it.
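A minimal single-queue sketch of the response-tracker lifecycle described above: add on send, remove on REPLY/ACK, and expire on a periodic monitor pass. All names are hypothetical, time is a plain counter, and locking, per-CPT sharding, ULP events and MD handling are left out; where LNet would notify the ULP and unlink the MD, this sketch only sets a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker: one per GET, or PUT that needs an ACK. */
struct sketch_rspt {
	struct sketch_rspt *next;
	long deadline;     /* time by which the REPLY/ACK is due */
	bool timed_out;    /* set when the monitor expires it */
};

/* Hypothetical queue of outstanding trackers (one per CPT in LNet). */
struct sketch_rspt_queue {
	struct sketch_rspt *head;
};

/* On send: record the deadline and queue the tracker. */
static void sketch_rspt_add(struct sketch_rspt_queue *q,
			    struct sketch_rspt *r, long now, long timeout)
{
	r->deadline = now + timeout;
	r->timed_out = false;
	r->next = q->head;
	q->head = r;
}

/* On REPLY/ACK: unlink the tracker. Returns false if it was not queued. */
static bool sketch_rspt_remove(struct sketch_rspt_queue *q,
			       struct sketch_rspt *r)
{
	for (struct sketch_rspt **pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == r) {
			*pp = r->next;
			return true;
		}
	}
	return false;
}

/* Monitor pass: unlink and flag every expired tracker; return the count. */
static int sketch_rspt_check(struct sketch_rspt_queue *q, long now)
{
	struct sketch_rspt **pp = &q->head;
	int expired = 0;

	while (*pp) {
		struct sketch_rspt *r = *pp;

		if (now >= r->deadline) {
			*pp = r->next;      /* unlink, keep pp in place */
			r->timed_out = true;
			expired++;
		} else {
			pp = &r->next;
		}
	}
	return expired;
}
```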

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia219fca5a578d625819b9f9c8ee2b3aa050dce80
Reviewed-on: https://review.whamcloud.com/32771
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
23 months ago LU-9120 lnet: sysfs functions for module params 61/32861/14
Amir Shehata [Fri, 20 Jul 2018 23:13:55 +0000 (16:13 -0700)]
LU-9120 lnet: sysfs functions for module params

Allow transaction timeout and retry count module parameters to be
set and shown via sysfs.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ica3819f9343a4b45cb0ae322f85f936230fa8138
Reviewed-on: https://review.whamcloud.com/32861
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: calculate the lnd timeout 70/32770/15
Amir Shehata [Tue, 26 Jun 2018 03:59:07 +0000 (20:59 -0700)]
LU-9120 lnet: calculate the lnd timeout

Calculate the LND timeout based on the transaction timeout
and the retry count. Both are user-defined values, and the LND
timeout is recalculated whenever either one is set. The LNDs use this
calculated timeout instead of the LND timeout module parameter.

Retry count can be set to 0, which means no retries. In that case the
LND timeout will default to 5 seconds, which is the same as the
default transaction timeout.
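The commit does not spell out the formula. One plausible reading, assumed here purely for illustration, is to split the transaction timeout evenly across the first attempt and each retry, with a floor of one second. sketch_lnd_timeout() is hypothetical; with retry_count 0 it returns the whole transaction timeout, which matches the 5-second figure above when the transaction timeout is at its default.

```c
/*
 * Assumed formula, for illustration only: divide the transaction
 * timeout (seconds) across the initial send plus each retry, never
 * returning less than 1 second.
 */
static unsigned int sketch_lnd_timeout(unsigned int transaction_timeout,
				       unsigned int retry_count)
{
	/* Widen before adding 1 so retry_count == UINT_MAX cannot wrap to 0. */
	unsigned long long t = transaction_timeout /
			       ((unsigned long long)retry_count + 1);

	return (unsigned int)(t ? t : 1);
}
```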

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I5a37caa2b69df155211864735ba8b275fc2d34bb
Reviewed-on: https://review.whamcloud.com/32770
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
23 months ago LU-9120 lnet: add retry count 69/32769/15
Amir Shehata [Tue, 26 Jun 2018 02:16:46 +0000 (19:16 -0700)]
LU-9120 lnet: add retry count

Added a module parameter to define the number of retries on a
message. It defaults to 0, which means no retries will be attempted.
Each message will keep track of the number of times it has been
retransmitted. When queuing it on the resend queue, the retry count
will be checked and if it's exceeded, then the message will be
finalized.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3a622c2128ff89f22b0f8bff02f862163c9d007e
Reviewed-on: https://review.whamcloud.com/32769
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
23 months ago LU-9120 lnet: handle remote errors in LNet 67/32767/15
Amir Shehata [Fri, 22 Jun 2018 17:42:23 +0000 (10:42 -0700)]
LU-9120 lnet: handle remote errors in LNet

Add a health value to the peer NI structure. Decrement the
value whenever there is an error sending to the peer.
Modify the selection algorithm to consider the peer NI health
value when selecting the best peer NI to send to.

Put the peer NI on the recovery queue whenever there is
an error sending to it. Only attempt a resend on REMOTE
DROPPED, since in that case we are sure the message was never
received by the peer. For other errors, finalize the message.
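A minimal sketch of the remote-error handling described above, combined with the per-message retry limit from the "add retry count" change. The enum values, structures, and the fixed decrement step are hypothetical; the commit only says the health value is decremented, not by how much.

```c
#include <stdbool.h>

/* Hypothetical subset of remote send outcomes. */
enum sketch_remote_status {
	SKETCH_REMOTE_OK,
	SKETCH_REMOTE_DROPPED,   /* message certainly never reached the peer */
	SKETCH_REMOTE_ERROR,
	SKETCH_REMOTE_TIMEOUT,
};

enum sketch_action { SKETCH_DONE, SKETCH_RESEND, SKETCH_FINALIZE };

/* Hypothetical peer NI and message state. */
struct sketch_peer_ni {
	int health;
	bool on_recovery_queue;
};

struct sketch_msg {
	unsigned int retries;    /* times this message was retransmitted */
};

static enum sketch_action
sketch_handle_remote(struct sketch_peer_ni *lpni, struct sketch_msg *msg,
		     enum sketch_remote_status st, unsigned int retry_count,
		     int decrement)
{
	if (st == SKETCH_REMOTE_OK)
		return SKETCH_DONE;

	/* Any send error lowers the peer NI's health (not below 0)
	 * and puts the peer NI on the recovery queue. */
	lpni->health = lpni->health > decrement ? lpni->health - decrement : 0;
	lpni->on_recovery_queue = true;

	/* Only REMOTE_DROPPED is safe to resend, and only while retries
	 * remain; every other error finalizes the message. */
	if (st == SKETCH_REMOTE_DROPPED && msg->retries < retry_count) {
		msg->retries++;
		return SKETCH_RESEND;
	}
	return SKETCH_FINALIZE;
}
```

Resending only on REMOTE_DROPPED avoids delivering the same message twice when the peer may already have received it.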

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibcb41b3fb538e76b973bcb10fcd07638c118acb9
Reviewed-on: https://review.whamcloud.com/32767
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
23 months agoLU-9120 lnet: handle socklnd tx failure 66/32766/15
Amir Shehata [Fri, 22 Jun 2018 04:06:56 +0000 (21:06 -0700)]
LU-9120 lnet: handle socklnd tx failure

Update the socklnd to propagate the health status up to
LNet for handling.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Iec090ade478acafb976aef7f6eaf5315ccd1fb67
Reviewed-on: https://review.whamcloud.com/32766
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
23 months agoLU-9120 lnet: handle o2iblnd tx failure 65/32765/15
Amir Shehata [Fri, 15 Jun 2018 20:15:27 +0000 (13:15 -0700)]
LU-9120 lnet: handle o2iblnd tx failure

Monitor the different types of failures that might occur on
transmit and flag the type of failure to be propagated to LNet,
which will handle it either by attempting a resend or by simply
finalizing the message and propagating a failure to the ULP.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4e2bb62257cb8bd2a5ed0054c172742c465731be
Reviewed-on: https://review.whamcloud.com/32765
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
23 months agoLU-9120 lnet: handle local ni failure 64/32764/15
Amir Shehata [Tue, 5 Jun 2018 20:34:52 +0000 (13:34 -0700)]
LU-9120 lnet: handle local ni failure

Added an enumerated type listing the different errors which
the LND can propagate up to LNet for further handling.

All local timeout errors will trigger a resend if the
system is configured for resends. Remote errors will
not trigger a resend, to avoid delivering duplicate messages
on the receiving end. If a transmit error is encountered
where we're sure the message wasn't received by the remote end,
we will attempt a resend.
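The resend policy above can be sketched as a small decision function. The enum names are hypothetical and only model the categories named in this message (local timeout, remote drop, other remote errors); the actual enumerated type in the patch may contain more values.

```c
#include <stdbool.h>

/* Hypothetical health status an LND might report up to LNet. */
enum msg_hstatus {
	MSG_STATUS_OK,
	MSG_STATUS_LOCAL_TIMEOUT,	/* local send timed out */
	MSG_STATUS_REMOTE_DROPPED,	/* peer certainly never got it */
	MSG_STATUS_REMOTE_ERROR,	/* peer may have received it */
	MSG_STATUS_REMOTE_TIMEOUT,	/* peer may have received it */
};

/* Decide whether a completed message should be resent.
 * resends_enabled models "the system is configured for resends". */
bool should_resend(enum msg_hstatus st, bool resends_enabled)
{
	if (!resends_enabled)
		return false;

	switch (st) {
	case MSG_STATUS_LOCAL_TIMEOUT:
	case MSG_STATUS_REMOTE_DROPPED:
		return true;	/* no risk of a duplicate at the receiver */
	default:
		return false;	/* success, or the peer may already have it */
	}
}
```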

Add LNet-level logic to handle local NI failure. When the LND
finalizes a message, lnet_finalize() checks whether the message
completed successfully. If it did, it increments the healthv of the
local NI, but not beyond the maximum. If it failed, it decrements the
healthv, but not below 0, and puts the message on the resend queue.
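A minimal sketch of the clamped health update performed at finalize time. The maximum of 1000 comes from the per-NI health patch further down this log; the increment and decrement step sizes are hypothetical parameters, and the struct and function names are illustrative only.

```c
#include <stdbool.h>

#define HEALTH_MAX 1000	/* initial and maximum NI health */

/* Hypothetical local NI with a health value in [0, HEALTH_MAX]. */
struct local_ni {
	int healthv;
};

/* Update local NI health when a message completes.
 * Returns true if the message failed and should go on the resend queue. */
bool finalize_update_health(struct local_ni *ni, bool success,
			    int inc, int dec)
{
	int h;

	if (success) {
		h = ni->healthv + inc;
		ni->healthv = h > HEALTH_MAX ? HEALTH_MAX : h;	/* cap at max */
		return false;
	}
	h = ni->healthv - dec;
	ni->healthv = h < 0 ? 0 : h;				/* floor at 0 */
	return true;
}
```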

On local NI failure the local NI is placed on a recovery queue.

The monitor thread will wake up and resend all pending messages.
The selection algorithm will properly select the local and remote NIs
based on the new healthv.

The monitor thread will ping each NI on the local recovery queue. On
reply, it will check whether the NI's healthv is back to maximum. If it
is, it will remove the NI from the recovery queue; otherwise it will
keep it there until it has fully recovered.
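One pass of the recovery loop described above might look like the sketch below. It models the recovery queue as an array, and assumes (hypothetically) that every ping gets a reply and that each reply raises health by a fixed step; an NI leaves the queue only once its health is back at the maximum.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEALTH_MAX 1000

/* Hypothetical local NI with a recovery-queue membership flag. */
struct local_ni {
	int healthv;
	bool in_recovery;
};

/* Stand-in for a ping: this sketch assumes the reply always arrives
 * and raises health by step, capped at HEALTH_MAX. */
static bool ping_ni(struct local_ni *ni, int step)
{
	int h = ni->healthv + step;

	ni->healthv = h > HEALTH_MAX ? HEALTH_MAX : h;
	return true;
}

/* Ping every queued NI once; dequeue those that are fully healthy.
 * Returns how many NIs remain on the recovery queue. */
size_t recovery_pass(struct local_ni *q, size_t n, int step)
{
	size_t i, remaining = 0;

	for (i = 0; i < n; i++) {
		if (!q[i].in_recovery)
			continue;
		if (ping_ni(&q[i], step) && q[i].healthv >= HEALTH_MAX)
			q[i].in_recovery = false;	/* fully recovered */
		else
			remaining++;
	}
	return remaining;
}
```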

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I1cf5c6e74b9c5e5b06b15209f6ac77b49014e270
Reviewed-on: https://review.whamcloud.com/32764
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
23 months agoLU-9120 lnet: add monitor thread 63/32763/11
Amir Shehata [Thu, 31 May 2018 00:20:10 +0000 (17:20 -0700)]
LU-9120 lnet: add monitor thread

Refactored the router checker thread to be the monitor thread.
The monitor thread will check router aliveness, expire messages
on the active list, recover local and remote NIs, and resend messages.

In this patch it only checks router aliveness.

A deadline is also added to each message to keep track of when the
message should expire.
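A sketch of a per-message deadline, assuming it is stamped once at send time as "now plus the transaction timeout" (the transaction timeout is introduced by a later patch in this series; its use here is an assumption). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical message carrying an absolute expiry deadline. */
struct msg {
	time_t deadline;
};

/* Stamp the deadline when the message is first sent. */
void msg_set_deadline(struct msg *m, time_t now, time_t transaction_timeout)
{
	m->deadline = now + transaction_timeout;
}

/* What the monitor thread would check when scanning the active list. */
bool msg_expired(const struct msg *m, time_t now)
{
	return now >= m->deadline;
}
```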

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I712cad13d55328400ce61749967979673c4d673f
Reviewed-on: https://review.whamcloud.com/32763
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
23 months agoLU-9120 lnet: add lnet_health_sensitivity 62/32762/10
Amir Shehata [Mon, 19 Feb 2018 23:35:58 +0000 (15:35 -0800)]
LU-9120 lnet: add lnet_health_sensitivity

Add the lnet_health_sensitivity value. This value determines the amount
by which the NI health value is decremented. The value defaults to 0,
which turns off the health feature by default. The user needs
to explicitly turn on this feature. The assumption is that many sites
will only have one interface in their nodes, in which case the
health feature will not increase the resiliency of their system.
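To make the effect of the sensitivity concrete, the hypothetical helper below counts how many consecutive failures drive an NI from a given health value down to 0 when each failure subtracts the sensitivity. A starting health of 1000 is assumed, as in the per-NI health patch listed below; a sensitivity of 0 means health never drops.

```c
/* Hypothetical: consecutive failures needed to take health to 0 when
 * each failure subtracts `sensitivity`. Returns -1 when sensitivity is
 * 0 (health feature off: health never decreases). */
int failures_to_zero(int healthv, int sensitivity)
{
	if (sensitivity <= 0)
		return -1;
	/* Ceiling division: a final partial step still reaches 0. */
	return (healthv + sensitivity - 1) / sensitivity;
}
```

For example, with health 1000 a sensitivity of 100 takes 10 failures, and a sensitivity of 300 takes 4.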

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I23f70b00f270803e5d296033e36a3a09986fd3cf
Reviewed-on: https://review.whamcloud.com/32762
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
23 months agoLU-9120 lnet: add health value per ni 61/32761/10
Amir Shehata [Fri, 16 Feb 2018 22:10:33 +0000 (14:10 -0800)]
LU-9120 lnet: add health value per ni

Add a health value per local network interface. The health value
reflects the health of the NI. It is initialized to 1000, which was
chosen so that the health value can be decremented in fine-grained
steps on error.

If the NI is completely unhealthy, that will be indicated by an
LND event, which will flag the NI as down so that it is never
used.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0fb362a84c110f482633fb86a81c4d7b26c3ecba
Reviewed-on: https://review.whamcloud.com/32761
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
23 months agoLU-9120 lnet: refactor lnet_select_pathway() 60/32760/9
Amir Shehata [Tue, 13 Feb 2018 21:11:30 +0000 (13:11 -0800)]
LU-9120 lnet: refactor lnet_select_pathway()

lnet_select_pathway() is a complex monolithic function which handles
many send cases. Break lnet_select_pathway() down into multiple
functions, each handling a different send case. This will
make it easier to add handling of the different health cases in
future patches.
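The refactoring pattern (one small handler per send case, selected by a dispatcher) can be sketched as below. The case flags and handler names are hypothetical and cover only two dimensions; the real send cases in lnet_select_pathway() are more numerous.

```c
/* Hypothetical send-case flags; the real cases may differ. */
#define SEND_SRC_SPEC   0x1	/* caller pinned the source NI */
#define SEND_LOCAL_DST  0x2	/* destination is on a local network */

/* One small handler per send case instead of one monolithic function.
 * Each returns a distinct id so the dispatch is easy to verify. */
static int send_remote_any_src(void)  { return 0; }
static int send_remote_spec_src(void) { return 1; }
static int send_local_any_src(void)   { return 2; }
static int send_local_spec_src(void)  { return 3; }

int select_pathway(unsigned int send_case)
{
	switch (send_case & (SEND_SRC_SPEC | SEND_LOCAL_DST)) {
	case SEND_SRC_SPEC:
		return send_remote_spec_src();
	case SEND_LOCAL_DST:
		return send_local_any_src();
	case SEND_LOCAL_DST | SEND_SRC_SPEC:
		return send_local_spec_src();
	default:
		return send_remote_any_src();
	}
}
```

Keeping each case in its own function lets later patches add health handling to one case without touching the others.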

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I6e554c71eaa61f3e1bdfdc60bd9cd38f70df57b5
Reviewed-on: https://review.whamcloud.com/32760
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
23 months agoNew tag 2.11.54 2.11.54 v2_11_54 v2_11_54_0
Oleg Drokin [Fri, 17 Aug 2018 18:43:06 +0000 (14:43 -0400)]
New tag 2.11.54

Change-Id: If0cf2f80cbed8deb946dc57d9e8582c8b1e9b951
Signed-off-by: Oleg Drokin <green@whamcloud.com>
23 months agoLU-1895 tests: don't fail mmp test_5 due to race 55/32355/7
Andreas Dilger [Thu, 2 Aug 2018 15:50:42 +0000 (09:50 -0600)]
LU-1895 tests: don't fail mmp test_5 due to race

In the mmp.sh test_5() mount_after_unmount() testing, it is possible
that the first filesystem unmounts successfully before the second
one starts, and there is no contention for the MMP block.

This caused the test to fail on a regular basis.  However, there is
still value in running this test, since non-MMP race conditions have
previously been seen in this area (OBD device refcount, etc).

Make mount_after_unmount() more robust, only failing if the first
filesystem is still mounted at the same time as the second one.

Author: Andreas Dilger <adilger@whamcloud.com>

Test-Parameters: trivial mdtfilesystemtype=ldiskfs failover=true ostfilesystemtype=ldiskfs osscount=2 mdscount=2 mdtcount=1 austeroptions=-R iscsi=1 testlist=mmp
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I186b9ce0a5a0e1ed6f2b46895fec4a32e73ebbe5
Reviewed-on: https://review.whamcloud.com/32355
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
23 months agoLU-11152 test: work around find bug in sanity 133[fg] 34/32934/5
John L. Hammond [Fri, 3 Aug 2018 16:40:28 +0000 (11:40 -0500)]
LU-11152 test: work around find bug in sanity 133[fg]

Some versions of find do not handle the -ignore_readdir_race option
correctly. Work around this by calling error_ignore() rather than
error() in these cases.

Test-Parameters: trivial

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I0ad9cef3743f1748908dbab9087b0b54e6466d0a
Reviewed-on: https://review.whamcloud.com/32934
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
23 months agoLU-11062 libcfs: use save_stack_trace for stack dump 52/32952/3
Yang Sheng [Tue, 7 Aug 2018 16:24:19 +0000 (00:24 +0800)]
LU-11062 libcfs: use save_stack_trace for stack dump

stacktrace_ops has been removed from recent kernels, so we
have to use save_stack_trace_tsk() for stack trace
dumps.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Icb3d0dbd62c35fdd9b8de925aec9358a2208814f
Reviewed-on: https://review.whamcloud.com/32952
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11176 systemd: use universal path for modprobe 44/32944/3
James Simmons [Mon, 6 Aug 2018 20:00:58 +0000 (16:00 -0400)]
LU-11176 systemd: use universal path for modprobe

The location of modprobe is not the same on all platforms. On RHEL
systems it is located in /usr/sbin. On Ubuntu/Debian, which are
busybox based, /sbin/modprobe is a symlink to /bin/kmod. On all
platforms, to keep some sort of standard, a symlink for modprobe
exists in /sbin. Update the lnet.service script to use the hard-coded
path /sbin/modprobe.

Test-Parameters: trivial

Change-Id: I54342971a6ee1aa4ce86a9fae0ac4dcb167b1510
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32944
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10940 tests: skip sanity test 802 when quota enabled 00/32900/2
James Nunez [Mon, 30 Jul 2018 16:02:17 +0000 (10:02 -0600)]
LU-10940 tests: skip sanity test 802 when quota enabled

If ENABLE_QUOTA is set, sanity test 802 will try to set
the quota type on read-only targets. Setting quota requires
changes to the targets and, thus, does not make sense for
this test. sanity test 802 should be skipped if ENABLE_QUOTA
is set.

Test-Parameters: trivial envdefinitions=ENABLE_QUOTA=yes,ONLY=802 testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ic9c245045961867b7dc93be9268e6f4a4631c1dc
Reviewed-on: https://review.whamcloud.com/32900
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11171 tests: set parameters for racer_on_nfs 80/32880/3
James Nunez [Wed, 25 Jul 2018 16:59:33 +0000 (10:59 -0600)]
LU-11171 tests: set parameters for racer_on_nfs

The parallel-scale-nfs script calls the racer test without
specifying the directory in which to create files, directories,
etc. In addition, racer needs a few other global
parameters to work properly, including the number of OSTs and
MDTs and which LFS to use.

Test-Parameters: trivial testlist=parallel-scale-nfsv3,parallel-scale-nfsv4
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ic4f5f08ddec7a8df5cb818b434aa3473f6cd72cb
Reviewed-on: https://review.whamcloud.com/32880
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9007 lod: get rid of comp ost in use array 13/32813/3
Bobi Jam [Thu, 12 Jul 2018 22:09:56 +0000 (16:09 -0600)]
LU-9007 lod: get rid of comp ost in use array

Use lod_layout_component::llc_ost_indices to serve the same purpose.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I66c89fe6349b48b89593e34e9e985ec6ea5a1758
Reviewed-on: https://review.whamcloud.com/32813
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11109 mdt: handle zero length xattr values correctly 55/32755/3
John L. Hammond [Mon, 2 Jul 2018 19:52:01 +0000 (14:52 -0500)]
LU-11109 mdt: handle zero length xattr values correctly

In mdt_getxattr(), set OBD_MD_FLXATTR in mbo_valid of the reply's MDT
body so that the client can distinguish between nonexistent extended
attributes and zero length values. In ll_xattr_list() and
ll_getxattr_common() test for OBD_MD_FLXATTR and return 0 rather than
-ENODATA in the appropriate cases. Add sanity test_102t() to test that
zero length values are handled correctly.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I15649581c26dc52e83ca714b44f8372f29954ed5
Reviewed-on: https://review.whamcloud.com/32755
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
2 years agoLU-10114 hsm: add OBD_CONNECT2_ARCHIVE_ID_ARRAY to pass archive_id lists in array 06/32806/5
Teddy Zheng [Fri, 27 Jul 2018 05:37:18 +0000 (13:37 +0800)]
LU-10114 hsm: add OBD_CONNECT2_ARCHIVE_ID_ARRAY to pass archive_id lists in array

Clients that register with the MDS using OBD_CONNECT2_ARCHIVE_ID_ARRAY
pass archive IDs in an array, while clients without it still
use a bitmap. This flag allows old clients to connect to new MDSs.
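The difference between the two encodings can be sketched with a conversion helper. It assumes, hypothetically, that in the legacy 32-bit bitmap bit (n - 1) stands for archive ID n; under that assumption the bitmap can only express IDs 1 to 32. The function name is illustrative and not part of the Lustre API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: expand a legacy archive bitmap into an array of
 * archive IDs, assuming bit (n - 1) represents archive ID n.
 * Returns the number of IDs written (at most max). */
size_t archive_bitmap_to_array(uint32_t bitmap, uint32_t *ids, size_t max)
{
	size_t n = 0;
	uint32_t bit;

	for (bit = 0; bit < 32 && n < max; bit++)
		if (bitmap & (UINT32_C(1) << bit))
			ids[n++] = bit + 1;
	return n;
}
```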

Test-Parameters: trivial
Change-Id: I61a691fc262fdc921d5ff4aa88c1fd623f09d565
Signed-off-by: Teddy Zheng <teddy@ddn.com>
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/32806
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
2 years agoLU-9538 utils: Tool for syncing file LSOM xattr 24/30124/21
Qian Yingjin [Thu, 16 Nov 2017 01:42:57 +0000 (09:42 +0800)]
LU-9538 utils: Tool for syncing file LSOM xattr

Add a helper tool for syncing file LSOM xattrs.
First, register a new changelog user:
lctl --device lustre-MDT0000 changelog_register

After performing some file operations on the Lustre file system, run
this tool to sync the file LSOM xattrs:
llsom_sync -u cl1 -m lustre-MDT0000 /mnt/lustre

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ia2878b48f7f665b01b230585921c78ae41846171
Reviewed-on: https://review.whamcloud.com/30124
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9087 build: add support for DKMS debs 28/25328/8
Michael Kuhn [Tue, 31 Jul 2018 02:12:05 +0000 (22:12 -0400)]
LU-9087 build: add support for DKMS debs

This introduces a new package lustre-client-modules-dkms that uses DKMS
to automatically recompile the client kernel modules on kernel upgrades.
The package is only created if the dkms-debs target is used, otherwise
the traditional kernel-specific package is created.

Test-Parameters: trivial
Change-Id: Ie9aeee29f7fd73938b148299d246c663a783ccd3
Signed-off-by: Michael Kuhn <michael.kuhn@informatik.uni-hamburg.de>
Reviewed-on: https://review.whamcloud.com/25328
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-7372 tests: stop running replay-dual test 26 02/32902/2
James Nunez [Mon, 30 Jul 2018 17:34:36 +0000 (11:34 -0600)]
LU-7372 tests: stop running replay-dual test 26

replay-dual test 26 fails frequently. We need to add
this test to the ALWAYS_EXCEPT list and, thus, stop
running the test until we fix the issue.

Test-Parameters: trivial testlist=replay-dual
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ida58ecc4933dae33d396c258fee64f6d3dbd4978
Reviewed-on: https://review.whamcloud.com/32902
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-4684 migrate: pack lmv ea in migrate rpc 24/31424/14
Lai Siyao [Sat, 20 Jan 2018 07:51:32 +0000 (15:51 +0800)]
LU-4684 migrate: pack lmv ea in migrate rpc

To support striped directory migration, pack lmv_user_md in the migrate
RPC. Add 'mdt-count' and 'mdt-hash' arguments to 'lfs migrate'.

Temporarily disable tests related to directory migration; they will be
re-enabled in the last patch of this series.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I914a9205a1a558da8c4231e7c83334621b5c92c0
Reviewed-on: https://review.whamcloud.com/31424
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10181 mdt: read on open for DoM files 11/23011/53
Mikhail Pershin [Thu, 18 Aug 2016 06:26:06 +0000 (09:26 +0300)]
LU-10181 mdt: read on open for DoM files

Read file data upon open and return it in the reply. This works
only for files with a Data-on-MDT layout and no OST components
initialized. Three cases are possible:
1) The file data fits in the already allocated reply buffer (~9K)
   and is returned in that buffer in the OPEN reply.
2) The file fits in the maximum reply buffer (128K), and the reply is
   returned to the client with a larger size, causing a resend
   with a re-allocated buffer.
3) The file doesn't fit in the reply buffer, but its tail partially
   fills a page; then only that tail is returned. This can be useful
   for the append case.
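The three cases can be sketched as a size classifier. The 9K and 128K limits come from the text (9K is approximate, taken here as 9 * 1024); the 4096-byte page size, the extra "nothing returned" outcome for large page-aligned files, and all names are assumptions for illustration.

```c
#include <stddef.h>

#define INITIAL_REPLY_BUF (9 * 1024)	/* "~9K" from the text */
#define MAX_REPLY_BUF     (128 * 1024)
#define PAGE_SZ           4096		/* assumed page size */

enum dom_read_case {
	DOM_READ_INLINE,	/* case 1: fits in the initial buffer */
	DOM_READ_RESEND,	/* case 2: fits in max buffer, client resends */
	DOM_READ_TAIL,		/* case 3: only the partial-page tail */
	DOM_READ_NONE,		/* too big and page aligned: nothing sent */
};

/* Classify a DoM file by size; *len receives the bytes returned. */
enum dom_read_case dom_read_on_open(size_t size, size_t *len)
{
	if (size <= INITIAL_REPLY_BUF) {
		*len = size;
		return DOM_READ_INLINE;
	}
	if (size <= MAX_REPLY_BUF) {
		*len = size;
		return DOM_READ_RESEND;
	}
	*len = size % PAGE_SZ;	/* bytes in the last, partial page */
	return *len ? DOM_READ_TAIL : DOM_READ_NONE;
}
```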

Test-Parameters: mdssizegb=20 testlist=sanity-dom,dom-performance,racer
Change-Id: I5574ce5f74017fc654715e212b71fc3b905bdcae
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Reviewed-on: https://review.whamcloud.com/23011
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11186 ofd: fix for a final oid at sequence 91/32891/5
Alexander Boyko [Fri, 27 Jul 2018 13:10:23 +0000 (09:10 -0400)]
LU-11186 ofd: fix for a final oid at sequence

There was an error at the end of the sequence range: the last OID,
0xffffffff, could not be created. 0xffffffff is a valid OID, and the
sequence update happens only once it has been created.

LustreError: 11756:0:(ofd_objects.c:217:ofd_precreate_objects())
lustre-OST0000:0xfffffffe:10737419264 hit the OBIF_MAX_OID (1<<32)!
LustreError: 11756:0:(ofd_dev.c:1764:ofd_create_hdl())
lustre-OST0000: unable to precreate: rc = -28

This patch fixes the error.

conf-sanity test_122 is added to check the sequence update.
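The fix is about treating the end of the OID range as inclusive. A hypothetical sketch of the intended boundary checks, with illustrative names only: 0xffffffff must still be creatable, and the switch to a new sequence happens only after it has been created.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_OID UINT64_C(0xffffffff)	/* last valid OID in a sequence */

/* Hypothetical: may this OID still be created in the current sequence?
 * The upper bound is inclusive. */
bool oid_creatable(uint64_t oid)
{
	return oid <= MAX_OID;
}

/* Move to a new sequence only once the last OID has been created. */
bool need_new_sequence(uint64_t last_created)
{
	return last_created >= MAX_OID;
}
```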

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I39ad66c05e8358591ca05fadabb2b46bee638070
Cray-bug-id: LUS-6222
Reviewed-on: https://review.whamcloud.com/32891
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11156 scrub: skip project quota inode 29/32829/7
Alexander Boyko [Wed, 18 Jul 2018 14:17:16 +0000 (10:17 -0400)]
LU-11156 scrub: skip project quota inode

An error occurred when scrub tried to process the project quota inode.
Scrub treats it as an IGIF because it has no LMA FID, so it starts
to create O/inum/{LAST_ID,d0-d31} and fails with not enough credits.
The project quota inode (s_prj_quota_inum) should be skipped
during scrub iteration.

Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-6197
Change-Id: I38c347377a1c648ac3dd3e3ff4c4d65ee34cde39
Reviewed-on: https://review.whamcloud.com/32829
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11102 test: test fewer files on ZFS system 33/32933/2
Lai Siyao [Sun, 22 Jul 2018 01:44:02 +0000 (09:44 +0800)]
LU-11102 test: test fewer files on ZFS system

sanity test_415 may be slow on ZFS systems, so use fewer files there.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie21e9e146508b395c8196adac1f6ba3e6854a1ef
Reviewed-on: https://review.whamcloud.com/32933
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-11127 test: sanity-flr OST not recovery fast enough 22/32922/3
Bobi Jam [Thu, 2 Aug 2018 04:06:21 +0000 (12:06 +0800)]
LU-11127 test: sanity-flr OST not recovery fast enough

Use wait_recovery_complete() instead of wait_osc_import_state() to be
more patient with OST recovery.

Test-Parameters: trivial mdtcount=2 mdscount=2 testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr

Test-Parameters: mdtcount=2 mdscount=2 testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I2d652d09b0575a720e5ef9701fb7067cbf454079
Reviewed-on: https://review.whamcloud.com/32922
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9538 utils: fix lfs xattr.h header usage 18/32918/2
Andreas Dilger [Wed, 1 Aug 2018 19:46:35 +0000 (13:46 -0600)]
LU-9538 utils: fix lfs xattr.h header usage

The lfs_getsom() code added the use of lgetxattr() to lfs.c, but
included the <attr/xattr.h> header instead of <sys/xattr.h> as
is used by other code in the tree.  That adds a dependency on
libattr-devel that we don't really need.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8cccccbdc7186d0ed1bfb1c12d911da763a44bf5
Reviewed-on: https://review.whamcloud.com/32918
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11159 kernel: kernel update RHEL7.5 [3.10.0-862.9.1.el7] 45/32845/2
Jian Yu [Fri, 20 Jul 2018 07:16:12 +0000 (00:16 -0700)]
LU-11159 kernel: kernel update RHEL7.5 [3.10.0-862.9.1.el7]

Update RHEL7.5 kernel to 3.10.0-862.9.1.el7.

Change-Id: I2bb3462efbbdd8ed17803209b9508176ab04be96
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32845
Tested-by: Jenkins
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11166 tests: remove use of /proc/fs/jbd2/*/history file 58/32858/2
James Nunez [Mon, 23 Jul 2018 20:19:11 +0000 (14:19 -0600)]
LU-11166 tests: remove use of /proc/fs/jbd2/*/history file

The /proc/fs/jbd2/*/history file was removed several years
ago with a patch from Theodore Ts’o; commit bf6993276f. We
need to remove all uses of /proc/fs/jbd*/*/history from our
tests and utilities.

In particular, obdfilter-survey.sh and iokit-lstat rely on
/proc/fs/jbd2/*/history to collect data and must be modified.

Test-Parameters: trivial testlist=obdfilter-survey
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ib25dd28a496840199de1e84f597748905bda80d2
Reviewed-on: https://review.whamcloud.com/32858
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
2 years agoLU-11160 build: Fix uuid / blkid dependency 42/32842/4
Nathaniel Clark [Thu, 19 Jul 2018 19:26:27 +0000 (15:26 -0400)]
LU-11160 build: Fix uuid / blkid dependency

UUID dependency stems from libblkid, so only link with uuid if blkid
is present.

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: If1cc293cc48210a065f8910ea655615b11268b5c
Reviewed-on: https://review.whamcloud.com/32842
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10627 tests: don't use libtool wrapper for applications 35/32835/11
James Simmons [Fri, 27 Jul 2018 23:55:49 +0000 (19:55 -0400)]
LU-10627 tests: don't use libtool wrapper for applications

It is a common practice of Lustre developers to test within the
Lustre tree without actually installing Lustre onto the local
node. For this to work, the test suite needs to use
the binary executables instead of the libtool executable wrappers.
Add the libtool LDFLAG that prevents the creation of the wrappers
for the Lustre utils. Additionally, properly set LD_LIBRARY_PATH
to where libtool caches the dynamic libraries.

Change-Id: I9570fcb65b927463076f28c47ecec924602bef4e
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32835
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11153 quota: initialize ver for default quota 27/32827/3
Hongchao Zhang [Wed, 18 Jul 2018 04:02:42 +0000 (00:02 -0400)]
LU-11153 quota: initialize ver for default quota

In qmt_set_with_lqe(), the variable "ver" is not initialized
if an lqe using the default quota is being updated to use
a new default quota setting.

Change-Id: I578543fc69009ef85c667092a66947d3c98a6a7d
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32827
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10916 lfs: improve lfs mirror resync 08/32808/4
Bobi Jam [Wed, 11 Jul 2018 16:24:27 +0000 (10:24 -0600)]
LU-10916 lfs: improve lfs mirror resync

Make mirror resync use a read+write+write+... mode instead of doing
the resync on each stale mirror of a file separately (read+write,
read+write, ...).
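The new mode reads each chunk from the good mirror once and writes it to every stale mirror. The sketch below models mirrors as in-memory buffers; the function name, the 4096-byte bounce buffer and the chunking are hypothetical, and real resync goes through file I/O rather than memcpy().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: resync nstale stale mirrors from one good mirror,
 * reading each chunk once and writing it to every stale mirror. */
void resync_mirrors(const char *good, char **stale, size_t nstale,
		    size_t len, size_t chunk)
{
	char buf[4096];
	size_t off, n, i;

	if (chunk == 0 || chunk > sizeof(buf))
		chunk = sizeof(buf);

	for (off = 0; off < len; off += n) {
		n = len - off < chunk ? len - off : chunk;
		memcpy(buf, good + off, n);		/* one read ... */
		for (i = 0; i < nstale; i++)
			memcpy(stale[i] + off, buf, n);	/* ... many writes */
	}
}
```

Compared with resyncing each mirror separately, the source data is read once per chunk instead of once per stale mirror.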

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I627fa53fcfde4811b2cd9c84c8545defe151206c
Reviewed-on: https://review.whamcloud.com/32808
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10947 build: test if xfsprogs are installed 39/32539/4
James Simmons [Fri, 27 Jul 2018 23:59:32 +0000 (19:59 -0400)]
LU-10947 build: test if xfsprogs are installed

conf-sanity test_116 needs xfsprogs because it uses mkfs.xfs. There is
no need to install xfsprogs for a single test, so just skip the
test if mkfs.xfs is not available. Also set $tmpmnt to $TMP/$tdir,
since /mnt is read-only on my diskless setup while $TMP is not.

Test-Parameters: trivial testlist=conf-sanity mdsdistro=sles12sp3 ossdistro=sles12sp3

Change-Id: I1db88afe7e382e1032ed7e2844a1dec1c032530e
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32539
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-8066 llite: move /proc/fs/lustre/llite/fstype to sysfs 96/32896/2
James Simmons [Sun, 29 Jul 2018 14:28:33 +0000 (10:28 -0400)]
LU-8066 llite: move /proc/fs/lustre/llite/fstype to sysfs

Move fstype file from /proc/fs/lustre/llite/*
to /sys/fs/lustre/llite/*/

This is a modified version of

Linux-commit: 0cee667682b55d7c389d77877adbd63360415baa

due to the large amount of changes to the OpenSFS/Intel branch.

Change-Id: Ic06988c1f9ccfa6a32f99f5ea8ddcf4820a62a8e
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32896
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-8066 llite: move /proc/fs/lustre/llite/client_type to sysfs 99/32499/5
James Simmons [Sun, 29 Jul 2018 02:13:43 +0000 (22:13 -0400)]
LU-8066 llite: move /proc/fs/lustre/llite/client_type to sysfs

Move client_type file from /proc/fs/lustre/llite/*
to /sys/fs/lustre/llite/*/

This is a modified version of

Linux-commit: 95e1b6b0cff09292158ecc0701f721315167b64e

due to the large amount of changes to the OpenSFS/Intel branch.

Change-Id: I056b0b3693b0c747d5a45fb6485cb5c4975acb1b
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32499
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-8066 llite: move /proc/fs/lustre/llite/files* to sysfs 98/32498/6
James Simmons [Sat, 28 Jul 2018 16:02:35 +0000 (12:02 -0400)]
LU-8066 llite: move /proc/fs/lustre/llite/files* to sysfs

Move filestotal and filesfree files from /proc/fs/lustre/llite/*
to /sys/fs/lustre/llite/*/

This is a modified version of

Linux-commit: 7267ec0d8726c214aaf24ca9e8baebb443b0da75

due to the large amount of changes to the OpenSFS/Intel branch.

Change-Id: I84b6d6a0868058d60a83a0700f0389d3ba685ddb
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32498
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-8066 llite: move /proc/fs/lustre/llite/kbytes* to sysfs 97/32497/7
James Simmons [Fri, 27 Jul 2018 22:13:42 +0000 (18:13 -0400)]
LU-8066 llite: move /proc/fs/lustre/llite/kbytes* to sysfs

Move kbytestotal, kbytesavail and kbytesfree files from
/proc/fs/lustre/llite/* to /sys/fs/lustre/llite/*/

This is a modified version of

Linux-commit: 5804b11e1487558c6740282a01a08bb4ba0c6d06

due to the large amount of changes to the OpenSFS/Intel branch.

Change-Id: Ifb43c01bb0055051cecb01ed6a183d1797d3870e
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32497
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-8066 llite: move /proc/fs/lustre/llite/blocksize to sysfs 96/32496/9
James Simmons [Thu, 19 Jul 2018 15:57:30 +0000 (11:57 -0400)]
LU-8066 llite: move /proc/fs/lustre/llite/blocksize to sysfs

Move blocksize file from /proc/fs/lustre/llite/*/ to
/sys/fs/lustre/llite/*/blocksize

This is a heavily modified version of

Linux-commit: 364bcfc8634d5625dbb41683b061bddf307a70e8

due to the large number of changes to the OpenSFS/Intel branch.

Change-Id: I0b54890e6c5d5f172c2cc3d081c38ea2307b0f88
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32496
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
2 years agoLU-9474 tests: replace import_file with copytool import 98/30098/21
Quentin Bouget [Wed, 15 Nov 2017 10:05:15 +0000 (10:05 +0000)]
LU-9474 tests: replace import_file with copytool import

In sanity-hsm, replace every call to import_file() with the newer
copytool() interface (copytool import).

Any function that internally uses the HSM_ARCHIVE variable is
modified accordingly.

From now on, tests in sanity-hsm that need to launch a copytool,
import a file, or rebind archived data should do so using:
 - copytool setup
 - copytool import
 - copytool rebind

With this patch, sanity-hsm also completes the transition from
trap() to stack_trap().

Test-Parameters: trivial clientcount=3 mdscount=2 testlist=sanity-hsm,sanity-hsm
Change-Id: I911964a4bafd4d879e08f506cfe33e3db29cff42
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-on: https://review.whamcloud.com/30098
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-6142 obdclass: Fix style issues for obdo_server.c 97/32897/2
Arshad Hussain [Sun, 22 Jul 2018 16:13:25 +0000 (21:43 +0530)]
LU-6142 obdclass: Fix style issues for obdo_server.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/obdo_server.c

Change-Id: If2c46841c39258937a0f64ef9e6d589c6ea41809
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32897
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
2 years agoLU-6142 ko2iblnd: remove typedefs from ko2iblnd 02/32802/3
James Simmons [Thu, 19 Jul 2018 19:32:35 +0000 (15:32 -0400)]
LU-6142 ko2iblnd: remove typedefs from ko2iblnd

Change the typedefs in the ko2iblnd LND to proper structures.
Also make several other style changes to fix checkpatch issues in
code affected by the typedef change.

Test-Parameters: trivial

Change-Id: I55e9c91e392dee804802153bd609afc858a3591b
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32802
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10181 mds: init cpt params for mdt IO service 50/31750/12
Mikhail Pershin [Fri, 23 Mar 2018 14:18:07 +0000 (17:18 +0300)]
LU-10181 mds: init cpt params for mdt IO service

Initialize CPT values for the MDS IO service, similar to
the OST's values.

Test-Parameters: mdtcount=2 mdscount=2 mdssizegb=20 testlist=dom-performance
Signed-off-by: Mike Pershin <mpershin@whamcloud.com>
Change-Id: I96b5f78c7212d31d43ea1b7abd75000fb19beee9
Reviewed-on: https://review.whamcloud.com/31750
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11148 ldlm: enable trybits for PDO lock 20/32820/4
Lai Siyao [Sat, 23 Jun 2018 09:06:15 +0000 (17:06 +0800)]
LU-11148 ldlm: enable trybits for PDO lock

When trybits was added (in LU-9148), it was not enabled for the
PDO lock in mdt_object_local_lock(), which may cause a deadlock in
try_lock.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Icfca639cfbd84e1a3bc25d91de0460d2951c2c2b
Reviewed-on: https://review.whamcloud.com/32820
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10288 lfsck: layout LFSCK for mirrored file 05/32705/5
Fan Yong [Sat, 14 Jul 2018 21:15:21 +0000 (05:15 +0800)]
LU-10288 lfsck: layout LFSCK for mirrored file

This patch makes the layout LFSCK support mirrored files
as follows:

1. Verify the mirrored file's LOV EA and PFID EA, including all
   kinds of inconsistencies that a non-mirrored file may hit.

2. Rebuild the mirrored file's LOV EA from orphan OST-objects,
   recovering the components' status/flags from before the crash:
   init, stale, and so on.

3. For a mirrored file with a dangling reference (OST-object),
   it does NOT rebuild the lost OST-object from another replica;
   instead, it either reports the corruption or re-creates an empty
   OST-object, following the same rules as the non-mirrored case.

Some code cleanup and new test cases for LFSCK against mirrored files.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I560746fc2aae40101dcb0e8513b6c7ed54902ec6
Reviewed-on: https://review.whamcloud.com/32705
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11104 mdt: rename may cause deadlock 01/32701/9
Lai Siyao [Tue, 12 Jun 2018 04:11:36 +0000 (12:11 +0800)]
LU-11104 mdt: rename may cause deadlock

In rename locking, there are two situations in which we need to lock
the target parent before the source parent:
1. the source parent is a subdirectory of the target parent.
2. the source and target parents are both stripes of the same
   directory, and the stripe index of the source parent is after
   that of the target parent.

But the check for the second situation is missing, which may cause
a deadlock if another thread is taking the stripe locks of their
parent.

Clean up mdd_is_subdir().

Add sanityn.sh test_81b.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Ib96fb7b286e7dfdea868ef2fa4919f8d3f1567f9
Reviewed-on: https://review.whamcloud.com/32701
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11102 ldlm: run local lock bl_ast only when necessary 38/32738/9
Lai Siyao [Wed, 20 Jun 2018 18:30:18 +0000 (02:30 +0800)]
LU-11102 ldlm: run local lock bl_ast only when necessary

An LDLM local lock is canceled after use, and it should only run
bl_ast if that will trigger Commit-on-Sharing. Otherwise, if this
bl_ast does nothing, it prevents subsequent operations from running
bl_ast again, so Commit-on-Sharing can't be triggered.

For example, consider a concurrent setattr on a striped directory and
a rename under this directory:
1. setattr takes the UPDATE lock of the directory, but does not unlock
it yet (i.e., this lock is not downgraded to a COS lock).
2. a concurrent 'mv' under this directory first does a getattr of the
file by name; this getattr takes the UPDATE lock of this directory,
racing with setattr. Because this getattr is not a distributed
operation and the lock still has a writer (setattr), bl_ast does
nothing.
3. setattr unlocks this UPDATE lock.
4. rename tries to take the UPDATE lock of this directory, but since
bl_ast was already run on this lock (even though it did nothing), it
won't run again, and rename waits until the setattr transaction
commits.

To fix this, run local lock bl_ast only when it will trigger
Commit-on-Sharing.

Add sanity.sh test_415 to verify this.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Idae241076e7cae8fe06ae6a34481fe19c7dfd2f3
Reviewed-on: https://review.whamcloud.com/32738
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11103 lod: add lock for lod_object layout 89/32589/8
Lai Siyao [Thu, 7 Jun 2018 11:53:14 +0000 (19:53 +0800)]
LU-11103 lod: add lock for lod_object layout

lod_object layout is loaded on demand, and it may be updated
by layout split/merge. To avoid races, add ldo_layout_mutex to
serialize layout load/free/reload.
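
The pattern is a mutex guarding a lazily loaded, replaceable pointer. The sketch below uses a POSIX mutex and a hypothetical struct obj in place of the kernel mutex and struct lod_object; load_layout_locked() only simulates reading a layout. It shows just the serialization of load/free/reload; real code must also keep readers from using a layout that is being freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for lod_object: the layout is loaded on demand. */
struct obj {
	pthread_mutex_t layout_mutex;	/* plays the role of ldo_layout_mutex */
	int *layout;			/* NULL until loaded */
	int  nr_comps;
	int  loads;			/* number of (re)loads, for illustration */
};

/* Simulated load; caller must hold layout_mutex. */
static int load_layout_locked(struct obj *o, int comps)
{
	int *l = calloc((size_t)comps, sizeof(*l));

	if (l == NULL)
		return -1;
	o->layout = l;
	o->nr_comps = comps;
	o->loads++;
	return 0;
}

/* Load the layout once; concurrent callers cannot load it twice. */
static int obj_layout_get(struct obj *o, int comps)
{
	int rc = 0;

	pthread_mutex_lock(&o->layout_mutex);
	if (o->layout == NULL)
		rc = load_layout_locked(o, comps);
	pthread_mutex_unlock(&o->layout_mutex);
	return rc;
}

/* Drop the cached layout (e.g. after split/merge); next get reloads it. */
static void obj_layout_free(struct obj *o)
{
	pthread_mutex_lock(&o->layout_mutex);
	free(o->layout);
	o->layout = NULL;
	o->nr_comps = 0;
	pthread_mutex_unlock(&o->layout_mutex);
}
```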

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I43c15a3b07254eadef95a14b288267904a1cd621
Reviewed-on: https://review.whamcloud.com/32589
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11141 tests: put sanity-quota 61 on slow list 03/32903/3
James Nunez [Wed, 25 Jul 2018 18:41:44 +0000 (12:41 -0600)]
LU-11141 tests: put sanity-quota 61 on slow list

Since the patch for LU-11141 (commit 6316b42a73f8) landed,
sanity-quota test 61 takes between 20 and 50 minutes to run.
Test 61 needs to be added to the slow list so that it will not
be run unless the SLOW variable is true.

Test-Parameters: trivial testlist=sanity-quota
Test-Parameters: envdefinitions="SLOW=yes" testlist=sanity-quota

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I2b6c21996ef2db9472da8838d3f41fed60ba5102
Reviewed-on: https://review.whamcloud.com/32903
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
2 years agoLU-11071 build: Add server build support for Ubuntu 18.04 13/32613/10
Li Dongyang [Thu, 19 Jul 2018 16:24:36 +0000 (12:24 -0400)]
LU-11071 build: Add server build support for Ubuntu 18.04

This enables the server build for Ubuntu 18.04 LTS. The ldiskfs
patches are based on Gael's 4.12 support and apply to kernel
versions 4.15.0-20.21 to 4.15.0-23.25.

There is also a small fix to keep dpkg happy when installing
Lustre packages that require lustre-client-utils.

Test-Parameters: clientdistro=ubuntu1604 trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Gael Delbary <gael.delbary@cea.fr>
Change-Id: I65e1a5ee0d17115f23ba071ff1ab23b4fb22e78f
Reviewed-on: https://review.whamcloud.com/32613
Tested-by: Jenkins
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9538 mdt: Lazy size on MDT 60/29960/43
Qian Yingjin [Tue, 7 Nov 2017 08:27:07 +0000 (16:27 +0800)]
LU-9538 mdt: Lazy size on MDT

The design of Lazy size on MDT (LSOM) does not guarantee
accuracy. A file that stays open for a long time might leave its
LSOM inaccurate for a very long time. Eviction or crash of a
client might also leave the close of a file incomplete, and thus
cause inaccurate LSOM. A precise LSOM can only be read from the
MDT when 1) all possible corruption and inconsistency caused by
client eviction or client/server crash have been fixed by LFSCK,
and 2) the file is not open for write.

In the first step of implementing LSOM, LSOM will not be accessible
from clients. Instead, LSOM values can only be accessed on the MDT.
Thus, no interface or logic code is added on the client side to
enable access to LSOM from clients.

The LSOM is saved as an EA value on the MDT. LSOM includes both the
apparent size and the disk usage of the file.

Whenever a file is truncated, the LSOM of the file on the MDT is
updated. Whenever a client closes a file, ll_prepare_close() sends
the size and blocks to the MDS. The MDS updates the LSOM of the file
if the file size or block count has increased.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: If4032a55f448a65235a6b3db58f857c74222faa3
Reviewed-on: https://review.whamcloud.com/29960
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10928 tests: sanity/133b should wait a bit 69/32069/3
Alex Zhuravlev [Thu, 19 Apr 2018 10:40:57 +0000 (13:40 +0300)]
LU-10928 tests: sanity/133b should wait a bit

Wait a bit in test_133b so that the obd_statfs() cache is invalidated.

Test-Parameters: trivial

Change-Id: I08283542962e4b88ca4b5dcde4dfcc58316c1bba
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/32069
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-11149 build: enable KMP for Mellanox build 33/32833/9
Minh Diep [Thu, 19 Jul 2018 14:27:55 +0000 (07:27 -0700)]
LU-11149 build: enable KMP for Mellanox build

* Build the Mellanox KMP to avoid symbol dependency
errors when installing Lustre
* Remove all Mellanox config parameters and use the
defaults

Test-Parameters: trivial

Change-Id: I4676d01bd5f788581e1be6df98d2d787a5419c07
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/32833
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11083 tests: automatically load external modules 90/32790/4
John L. Hammond [Thu, 5 Jul 2018 16:02:25 +0000 (11:02 -0500)]
LU-11083 tests: automatically load external modules

In the test-framework function load_module(), try to load (using
modprobe) any not yet loaded modules (which are assumed to be
external) that the current module depends on.

Test-Parameters: trivial

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Id1d10519b00854600d095b861670e96f906298fc
Reviewed-on: https://review.whamcloud.com/32790
Tested-by: Jenkins
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11010 tests: remove calls to return after skip() 31/32731/3
James Nunez [Tue, 26 Jun 2018 16:46:14 +0000 (10:46 -0600)]
LU-11010 tests: remove calls to return after skip()

The skip() routine now contains a call to exit. All calls
to skip() and skip_env() should be reviewed and calls to
return that followed skip() should be removed.

This is the second patch in a series that removes calls
to return after skip() in the Lustre test suites.

Calls to return after skip() are removed for:
dne_sanity
insanity
obdfilter-survey
sgpdd-survey

Test-Parameters: trivial testlist=dne-sanity,insanity,obdfilter-survey,sgpdd-survey
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I4b9aeaeddd673dcba371b8340dd635ddeed2b6be
Reviewed-on: https://review.whamcloud.com/32731
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11068 build: remove invalid kernel srpm location 06/32606/2
Minh Diep [Fri, 1 Jun 2018 17:56:56 +0000 (10:56 -0700)]
LU-11068 build: remove invalid kernel srpm location

The location has never existed.

Change-Id: I8958bbdb5c61284c55d6cc337ac92832f91ee08b
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/32606
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10254 tests: fix racer version checks 07/30307/9
Andreas Dilger [Fri, 13 Jul 2018 20:44:39 +0000 (14:44 -0600)]
LU-10254 tests: fix racer version checks

Fix the checks for enabling DOM, PFL, and FLR tests in file_create.sh.
The $LCTL variable was unset in the test script, so the version check
was failing.

Instead of doing the version check inside file_create.sh, do it in the
Lustre-specific racer.sh test script, where other version checks live.
This enables PFL and FLR testing by default, but leaves DOM tests off.

Author: Andreas Dilger <adilger@whamcloud.com>

Test-Parameters: trivial testlist=racer envdefinitions=SLOW=yes
Test-Parameters: testlist=racer mdtfilesystemtype=zfs ostfilesystemtype=zfs
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I2aeab0911f19f9741212925cf9b4aeb70e3ebbe5
Reviewed-on: https://review.whamcloud.com/30307
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms 23/32823/2
Jian Yu [Tue, 17 Jul 2018 00:09:15 +0000 (17:09 -0700)]
LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms

When choosing an OST on which to create an object, both
lod_alloc_qos() and lod_alloc_rr() use lod_statfs_and_check()
to check whether the OST is available for new OST objects.
However, an OST with max_create_count=0 is not checked in that
function and is simply returned as available.

This patch fixes the issue by detecting an OST with
max_create_count=0 in lod_statfs_and_check() and skipping it.

Change-Id: I04476a4b369e99133bd89c00155fd9f51bf0c930
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32823
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11174 quota: use sync io to test quota 74/32874/3
Hongchao Zhang [Fri, 20 Jul 2018 19:45:56 +0000 (15:45 -0400)]
LU-11174 quota: use sync io to test quota

In sanity-quota test_61, the client cache (grant) could affect
the quota behavior; use sync I/O to avoid this effect.

Test-Parameters: trivial testlist=sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota

Change-Id: I08bc19c5e7ac4f9cb679f96a2299c0be772f0330
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32874
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>