Whamcloud - gitweb
fs/lustre-release.git
3 years ago LU-9934 build: address issues raised by gcc7 76/30376/15
James Simmons [Sun, 14 Jan 2018 04:23:22 +0000 (23:23 -0500)]
LU-9934 build: address issues raised by gcc7

Starting with gcc version 7, several platforms have enabled new
flags to report potential problems when compiling code. For Lustre,
many of the reported problems deal with potential buffer overruns.
We also have unused data structures and are not properly
initializing some data structures.

Change-Id: I10243ea88f2c726032d179febdbf26f28de13715
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30376
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9833 utils: resolve buffer over runs in lustre_rsync 73/30373/9
James Simmons [Fri, 12 Jan 2018 19:05:52 +0000 (14:05 -0500)]
LU-9833 utils: resolve buffer over runs in lustre_rsync

Newer versions of gcc will report if snprintf is used in an
incorrect way. In the lustre_rsync application, two buffers of
size PATH_MAX are often written into one buffer of size
PATH_MAX. This can easily lead to a buffer overrun. This patch
resolves those bugs.
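
One way to guard against this kind of overrun is sketched below. This is
only an illustration, not the code from the patch: demo_join_path() is a
hypothetical helper. It relies on the standard behaviour of snprintf(),
which returns the length the complete output would have had.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: join dir and name into dst.
 * Returns 0 on success, or -1 if the joined path (including the
 * terminating NUL) does not fit into dst_size bytes.
 */
static int demo_join_path(char *dst, size_t dst_size,
                          const char *dir, const char *name)
{
    int rc = snprintf(dst, dst_size, "%s/%s", dir, name);

    /* rc is the length the full output needs, excluding the NUL. */
    if (rc < 0 || (size_t)rc >= dst_size)
        return -1;

    return 0;
}
```

Checking the return value turns silent truncation into an error the
caller can handle.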

Test-Parameters: trivial testlist=lustre-rsync-test

Change-Id: I035b4a3b1d9695a16822649c2165e492e9f2879d
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30373
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10290 tests: properly set fileset with combined MGT/MDT 93/30293/5
Sebastien Buisson [Tue, 28 Nov 2017 09:25:31 +0000 (10:25 +0100)]
LU-10290 tests: properly set fileset with combined MGT/MDT

We need to make sure the MDS receives updated fileset info from the MGS.
In the case of a combined MGT/MDT, directly setting the fileset on the
node would mask the llog-based info retrieval mechanism.
This patch also removes sanity-sec test_27 from ALWAYS_EXCEPT.

Test-Parameters: trivial testlist=sanity-sec,sanity-sec,sanity-sec,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7f25f03a213833f15d082a871ac6368a0e11aa82
Reviewed-on: https://review.whamcloud.com/30293
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10526 build: Ubuntu Kernel 4.4.0 lacks symbols used by o2iblnd.c 93/30893/2
Martin Schroeder [Wed, 17 Jan 2018 10:19:09 +0000 (11:19 +0100)]
LU-10526 build: Ubuntu Kernel 4.4.0 lacks symbols used by o2iblnd.c

Recently, a change was merged to "lnet/klnds/o2iblnd/o2iblnd.c" which
introduces the use of IB_DEVICE_SG_GAPS_REG and IB_MR_TYPE_SG_GAPS.

Unfortunately, these symbols are not available in the 4.4.0 kernels used
by Ubuntu 14/16.

Additionally, there seems to be a general warning against their use:
 - https://patchwork.kernel.org/patch/9573483/
 - https://lkml.org/lkml/2017/3/13/206

Also, there is a related performance issue as reported in LU-10394.

The solution is to create a preprocessor guard around their use, so that
kernels lacking these symbols will not use them and will fall back to the
older IB_MR_TYPE_MEM_REG instead.
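
The shape of such a guard can be sketched as follows. This is only an
illustration: DEMO_HAVE_SG_GAPS is a hypothetical macro assumed to be
defined by the build system when the kernel provides the symbols, and
the DEMO_MR_TYPE_* values are stand-ins for the real kernel constants.

```c
/* Stand-in values for illustration; the real ones come from the kernel. */
enum demo_mr_type {
    DEMO_MR_TYPE_MEM_REG,
    DEMO_MR_TYPE_SG_GAPS,
};

/*
 * Pick the memory registration type. The SG_GAPS branch is compiled
 * only when the (hypothetical) DEMO_HAVE_SG_GAPS macro is defined.
 */
static enum demo_mr_type demo_pick_mr_type(int dev_supports_gaps)
{
#ifdef DEMO_HAVE_SG_GAPS
    if (dev_supports_gaps)
        return DEMO_MR_TYPE_SG_GAPS;
#else
    (void)dev_supports_gaps;    /* unused on kernels without the symbols */
#endif
    return DEMO_MR_TYPE_MEM_REG;
}
```

Built without DEMO_HAVE_SG_GAPS, the function always returns the older
registration type, whatever the device reports.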

Test-Parameters: trivial
Signed-off-by: Martin Schroeder <martin.h.schroeder@intel.com>
Change-Id: Ie835d6e04f3859634ba508c24dff1f27f1b24cf6
Reviewed-on: https://review.whamcloud.com/30893
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-8346 obdclass: protect key_set_version 48/27448/10
Hongchao Zhang [Sat, 13 Jan 2018 10:08:29 +0000 (18:08 +0800)]
LU-8346 obdclass: protect key_set_version

In lu_context_refill, the key_set_version should be protected
before comparing it to the version stored in the lu_context.

This patch supplements the previous patch
https://review.whamcloud.com/#/c/28405/, which protects
key_set_version from modification in lu_context_refill
and lu_context_key_degister.

Change-Id: I201f56214382a717cfc31ba573e06fec9fbedae4
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/27448
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9422 tests: generalize SLES version check 06/26906/8
James Nunez [Mon, 6 Nov 2017 16:03:40 +0000 (09:03 -0700)]
LU-9422 tests: generalize SLES version check

Some tests in the Lustre test suites cannot run on all
versions of SuSE Linux and need to be skipped based on
the SuSE version.

Generalize the function that compiles the version of SLES
and skip tests based on this new routine.

Test-Parameters: trivial

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ia61022c4a477da1968210550fb7a628d31c062ce
Reviewed-on: https://review.whamcloud.com/26906
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
3 years ago LU-9228 nrs: TBF realtime policies under congestion 87/26087/8
Qian Yingjin [Mon, 6 Mar 2017 07:05:01 +0000 (15:05 +0800)]
LU-9228 nrs: TBF realtime policies under congestion

During TBF evaluation, we found that when the sum of I/O bandwidth
requirements for all classes exceeds the system capacity, the
classes with the same rate limits get less bandwidth than if
preconfigured evenly.

The reason is as follows: heavy load on a congested server results
in missed deadlines for some classes, so the number of calculated
tokens may be larger than 1 during dequeuing. In the original
implementation, all classes are handled equally and the excess
tokens are simply discarded.

Thus, a Hard Token Compensation (HTC) strategy is proposed. A class
can be configured with the HTC feature by the rule it matches. This
feature means that requests in this kind of class queue have high
real-time requirements and that the bandwidth assignment must be
satisfied as well as possible. When deadline misses happen, the
class keeps the deadline unchanged and the time residue (the
remainder of elapsed time divided by 1/r) is compensated to the
next round. This ensures that the next idle I/O thread will always
select this class to serve until all accumulated excess tokens
are handled or there are no pending requests in the class queue.

A new command format is added to enable the realtime feature for a rule:
start $ruleName jobid={dd.0} rate=100 realtime=1
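
The token accounting described above can be modelled roughly as follows.
This is a simplified sketch, not the NRS TBF code: demo_tbf_refill(),
struct demo_refill and the cap of one token for non-realtime classes are
assumptions made for illustration.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

struct demo_refill {
    uint64_t tokens;    /* tokens available for this round */
    uint64_t resid_ns;  /* leftover time credited to the next round */
};

/*
 * elapsed_ns: time since the class was last served
 * rate:       tokens per second (r), so 1/r is the token interval
 * realtime:   non-zero if the matching rule has realtime=1
 */
static struct demo_refill demo_tbf_refill(uint64_t elapsed_ns, uint64_t rate,
                                          int realtime)
{
    struct demo_refill out = { 0, 0 };
    uint64_t interval_ns;

    if (rate == 0 || rate > DEMO_NSEC_PER_SEC)
        return out;     /* outside the range this sketch models */

    interval_ns = DEMO_NSEC_PER_SEC / rate;     /* 1/r in nanoseconds */
    out.tokens = elapsed_ns / interval_ns;

    if (realtime)
        out.resid_ns = elapsed_ns % interval_ns;    /* keep the residue */
    else if (out.tokens > 1)
        out.tokens = 1;                             /* discard the excess */

    return out;
}
```

With rate=100 as in the command above, 1/r is 10 ms. After 35 ms without
service, a realtime class gets 3 tokens and carries a 5 ms residue into
the next round, while a non-realtime class gets a single token.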

Change-Id: I3c867052c27e57a30ccdfe649e0905d141792663
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/26087
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10460 osd-zfs: Add tunables to disable sync 61/7761/9
Brian Behlendorf [Fri, 12 May 2017 15:05:13 +0000 (08:05 -0700)]
LU-10460 osd-zfs: Add tunables to disable sync

This patch allows replacing the call to txg_wait_synced(),
which blocks waiting for a full pool sync, with a smaller
tunable delay.  This delay is intended to stand in for the time
it would have taken to synchronously write the dirty data to
the intent log.

This allows testing ZFS behaviour as if there were a low-latency
ZIL device enabled to handle sync IO operations.  Setting the
delay to zero disables sync operations on the server completely.
However, be aware that no data is guaranteed to be written to
disk if the tunables are enabled, and this patch is solely for
performance analysis.  By default the tunables are set to -1,
which leaves the system using the normal sync behaviour.

Two new tunables are introduced to control the delay, the
osd_object_sync_delay_us and osd_txg_sync_delay_us module options.
These values default to -1 which preserves the safe full sync
pool behavior.  Setting these values to zero or larger will
replace the pool sync with a delay of N microseconds.
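
The meaning of the tunable values can be sketched as below. This is an
illustration only: demo_pick_sync_mode() is a hypothetical helper, and
treating every negative value like -1 is an assumption of the sketch.

```c
enum demo_sync_mode {
    DEMO_SYNC_FULL,     /* wait for the full pool sync (default, -1) */
    DEMO_SYNC_DELAY,    /* replace the sync with a delay of N usecs */
};

/* Map a *_sync_delay_us value to the behaviour described above. */
static enum demo_sync_mode demo_pick_sync_mode(long delay_us)
{
    if (delay_us < 0)
        return DEMO_SYNC_FULL;

    return DEMO_SYNC_DELAY;     /* 0 means no wait at all */
}
```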

The initial test results obtained by running sanityN test 16
(fsx) are encouraging.  If the zil_commit() time can be kept to
less than 10ms we should see a significant performance improvement.
These tests were run in a pristine CentOS 6.4 VM and the results
are averaged over four runs.

osd_txg_sync_delay_us        -1    -1     -1     -1      -1
osd_object_sync_delay_us     -1     0   1000  10000  100000
-----------------------------------------------------------
SanityN test 16 (secs)     24.3   7.3    7.6   10.1    34.4

Change-Id: Iff9b66888edc79a5e1585fa3ce8377be068748f2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Darby Vicker <darby.vicker-1@nasa.gov>
Reviewed-on: https://review.whamcloud.com/7761
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
3 years ago LU-9019 libcfs: remove cfs_time_XXX_64 wrappers 67/30867/2
James Simmons [Mon, 15 Jan 2018 00:17:01 +0000 (19:17 -0500)]
LU-9019 libcfs: remove cfs_time_XXX_64 wrappers

In an attempt to support 64-bit time handling before the Linux
kernel developed time64_t and ktime, Lustre used 64-bit jiffies
behind a libcfs abstraction. Let's remove these wrappers and
replace them with modern 64-bit time support. The Lustre code
that used these wrappers needs time resolution at the seconds
level, so replace the code with time64_t handling.

Change-Id: I2bd53c4ce83830bedd4448678dffce9f2b2173b1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30867
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
3 years ago LU-9019 lnet: move ping and delay injection to time64_t 58/30658/4
James Simmons [Fri, 12 Jan 2018 16:50:46 +0000 (11:50 -0500)]
LU-9019 lnet: move ping and delay injection to time64_t

Migrate the pinger away from jiffies to time64_t, first to make it
clear that it is for time keeping and second to ensure the behavior
is consistent across platforms. Besides the lnet pinger code,
move the lnet delay injection code to time64_t as well.

Change-Id: If363523893fc1dcce4eaa866501946edd6558751
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30658
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10055 mdt: use max_mdsize in reply for layout intent 04/30004/10
Mikhal Pershin [Mon, 30 Oct 2017 16:45:42 +0000 (19:45 +0300)]
LU-10055 mdt: use max_mdsize in reply for layout intent

The LAYOUT intent reply LVB buffer size is set to the current
file layout size. This does not work when the layout is changed,
so it is better to use mdt_max_mdsize as the reply buffer size.
The buffer will be shrunk to the new layout size afterwards anyway.

Without this change the new layout may be bigger than the buffer,
so the layout is not returned, causing an extra RPC from the client.
mdt_lvbo_fill() is also changed to update mdt_max_mdsize if a
larger layout is found. The related message level is lowered
from D_ERROR to D_INFO.

Signed-off-by: Mikhal Pershin <mike.pershin@intel.com>
Change-Id: Iaac5dcb8b4c5aa2c050dddb5b3fb2662c59f133b
Reviewed-on: https://review.whamcloud.com/30004
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10453 lnet: support gni net configuration 29/30829/5
Amir Shehata [Thu, 11 Jan 2018 00:38:33 +0000 (16:38 -0800)]
LU-10453 lnet: support gni net configuration

GNI interfaces don't have IP addresses, so when configuring GNI
interfaces there is no point in trying to query the IP. There is
also only one GNI interface, therefore the net configuration
command shouldn't enforce an interface name.

This patch also adds more descriptive error commands. It also allows
deleting an entire network without having to specify an interface.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I549647675fe5530db7d86272a7dc79892117847d
Reviewed-on: https://review.whamcloud.com/30829
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10429 lod: LBUG lod_comp_ost_in_use() 89/30889/2
Bobi Jam [Wed, 17 Jan 2018 07:46:23 +0000 (15:46 +0800)]
LU-10429 lod: LBUG lod_comp_ost_in_use()

* print more debug info in lod_comp_ost_in_use().
* lod_alloc_qos() could roll back too many items in the
  inuse array, leading to a negative inuse array count.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ie3787f193468c6b783776e7df2ed4a6d54d8a12b
Reviewed-on: https://review.whamcloud.com/30889
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10419 lfsck: no delay for notify RPC 68/30768/4
Fan Yong [Thu, 18 Jan 2018 02:56:11 +0000 (10:56 +0800)]
LU-10419 lfsck: no delay for notify RPC

It is possible that the current MDT has trouble with its connection
to some other MDT(s) or OST(s). In that case, the LFSCK on the
current MDT should skip the related MDT(s) or OST(s), so that the
whole LFSCK process is not blocked by the troubled connection or
remote targets. This is done by setting rq_no_delay on the LFSCK
notify RPC.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ib35080cedcbe49f4ae8c4b3690a4743d5afe41b1
Reviewed-on: https://review.whamcloud.com/30768
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10422 lfsck: misc fixes to avoid unexpected repairing 12/30612/5
Fan Yong [Wed, 20 Dec 2017 13:57:09 +0000 (21:57 +0800)]
LU-10422 lfsck: misc fixes to avoid unexpected repairing

There are several issues that can mislead LFSCK into
triggering unexpected RPCs or repairs, including:

1) object_update_result_insert() should pack the OUT RPC
   result (not the return value) into the reply buffer via
   object_update_result::our_data, but it wrote it to the
   wrong address.

2) out_xattr_get() used the wrong index to obtain the EA buffer,
   so it may overwrite the results of a former update (such as
   OUT_XATTR_GET).

3) osp_declare_xattr_get() does not count the trailing '\0'
   of the EA name in the length parameter passed to
   osp_insert_async_request().

4) osp_xattr_get_interpterer() did not handle a positive
   value of the given parameter @rc. That causes the PFID
   EA to be read twice when the target OST-object has it.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ibf0e095ae2735c60b9b88e4b0992389c906728f9
Reviewed-on: https://review.whamcloud.com/30612
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9836 osd-ldiskfs: read directory completely 70/30770/8
Fan Yong [Tue, 16 Jan 2018 03:21:05 +0000 (11:21 +0800)]
LU-9836 osd-ldiskfs: read directory completely

For the ldiskfs backend, a return from readdir() does NOT mean
that the whole directory has been read. Instead, it is the caller's
duty to check whether any new items were read by the last
readdir() and from that determine whether the whole directory
has been read.

Unfortunately, some old osd-ldiskfs logic, such as OI scrub,
did not handle that properly, so some directories, such as
lost+found, may be only partly scanned. That is why some orphans
cannot be recovered.
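
The caller-side loop this implies can be sketched as follows. It is a
self-contained model, not osd-ldiskfs code: struct demo_dir and
demo_readdir() are hypothetical stand-ins for a directory that returns
only part of its entries per call.

```c
#include <stddef.h>

/* Hypothetical directory that hands out at most `batch` entries per call. */
struct demo_dir {
    size_t total;   /* entries in the directory */
    size_t pos;     /* entries already returned */
    size_t batch;   /* maximum entries per demo_readdir() call */
};

/* Returns 0 on success even when entries remain; *nread is this call's count. */
static int demo_readdir(struct demo_dir *d, size_t *nread)
{
    size_t left = d->total - d->pos;

    *nread = left < d->batch ? left : d->batch;
    d->pos += *nread;
    return 0;
}

/* Keep calling until a call delivers no new items; returns entries seen. */
static long demo_scan_all(struct demo_dir *d)
{
    long seen = 0;
    size_t nread;

    do {
        if (demo_readdir(d, &nread) != 0)
            return -1;
        seen += (long)nread;
    } while (nread > 0);

    return seen;
}
```

A caller that stopped after the first successful call would see only the
first batch, which matches the partial scan described above.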

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ib328643c4cdcdb14b548807ed05e8835f80bbf6a
Reviewed-on: https://review.whamcloud.com/30770
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-8264 lod: lfs setstripe fix for pool. 49/20849/13
Hongchao Zhang [Sat, 13 Jan 2018 09:01:54 +0000 (17:01 +0800)]
LU-8264 lod: lfs setstripe fix for pool.

If a file is created (with lfs) in a directory associated
with a pool without the -p pool_name option, then limit the stripe
count to the number of OSTs in the pool, as that directory is
associated with the pool. This patch fixes this problem.

Also remove the wrong check from ost-pools.sh test_20, where
we were creating a file in a directory associated with a pool and
checking that it was not part of the pool.

Add test cases in ost-pools.sh test_20.

Signed-off-by: Rahul Deshmukh <rahul.deshmukh@seagate.com>
Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Seagate-bug-id: MRP-3615
Change-Id: Id6dd5126856db7fc773a1fe9c837a214db8d6d70
Reviewed-on: https://review.whamcloud.com/20849
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10514 utils: statically link l_getidentity with libcfs.a 72/30872/3
John L. Hammond [Tue, 16 Jan 2018 00:50:46 +0000 (18:50 -0600)]
LU-10514 utils: statically link l_getidentity with libcfs.a

l_getidentity runs in a restricted environment which is not compatible
with the libtool wrapper script so statically link it with libcfs.a.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I4d3455003d48a11bad4570c3ad23de65c95e5b2c
Reviewed-on: https://review.whamcloud.com/30872
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10480 liblustre: suppress progname prefix in output 19/30819/3
Bobi Jam [Wed, 10 Jan 2018 00:59:32 +0000 (08:59 +0800)]
LU-10480 liblustre: suppress progname prefix in output

Make the liblustre tool not prefix every output with the progname.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ic8d4cfa19e739ae15048152ec63d90f4b2959d20
Reviewed-on: https://review.whamcloud.com/30819
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10052 test: relate fs_log_size to recordsize 16/30916/5
Hongchao Zhang [Thu, 18 Jan 2018 08:58:42 +0000 (16:58 +0800)]
LU-10052 test: relate fs_log_size to recordsize

If the backend filesystem is ZFS, the block usage difference is
related to its recordsize; the maximum difference is 2 blocks.
This affects several different tests that have intermittent failures:

    replay-dual test_14b, replay-single test_20b, test_89

Change-Id: I36b184587306bd2b9221e5771bf1adfe071653ca
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/30916
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10516 doc: recommend e2fsprogs 1.42.13.wc6 71/30871/3
Andreas Dilger [Mon, 15 Jan 2018 23:04:55 +0000 (16:04 -0700)]
LU-10516 doc: recommend e2fsprogs 1.42.13.wc6

Update the recommended e2fsprogs version to 1.42.13.wc6 in
lustre/ChangeLog as this has been released for some time already.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Id4663e02849675f1c8a4b9c13e191ed9d735ab56
Reviewed-on: https://review.whamcloud.com/30871
Reviewed-by: Peter Jones <peter.a.jones@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10488 tests: fix sub-test return value issue in sanity-dom.sh 42/30842/3
Jian Yu [Thu, 11 Jan 2018 20:05:13 +0000 (12:05 -0800)]
LU-10488 tests: fix sub-test return value issue in sanity-dom.sh

This patch fixes test_sanity() and test_sanityn() in sanity-dom.sh
to return the actual exit values of sanity.sh and sanityn.sh.

For bash, variable assignments preceding a command affect only that
command. So, we can just change sh to bash and do not need to save
and restore the value of ONLY.

Test-Parameters: trivial

Change-Id: I1edb1022f856552cb19cb6bd713aa9b6fce37b73
Signed-off-by: Jian Yu <jian.yu@intel.com>
Reviewed-on: https://review.whamcloud.com/30842
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10476 tests: add version check to sanity-dom and sanity-flr 16/30816/3
Jian Yu [Tue, 9 Jan 2018 23:10:58 +0000 (15:10 -0800)]
LU-10476 tests: add version check to sanity-dom and sanity-flr

This patch adds Lustre version checks to sanity-dom.sh
and sanity-flr.sh so that the tests interoperate with servers
that do not support the DOM and FLR features.

Test-Parameters: trivial \
mdsjob=lustre-b2_10 ossjob=lustre-b2_10 serverbuildno=52 \
testlist=sanity-dom,sanity-flr

Change-Id: If36125e84a424976a60b9bcc1e2c94c5fab2ac7d
Signed-off-by: Jian Yu <jian.yu@intel.com>
Reviewed-on: https://review.whamcloud.com/30816
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10437 lod: clear layout header when generating layout 85/30785/2
Jinshan Xiong [Mon, 8 Jan 2018 21:36:35 +0000 (21:36 +0000)]
LU-10437 lod: clear layout header when generating layout

LOD needs to clear the layout header, otherwise lcm_flags and
lcm_padding will contain random data, which will create issues when
those fields are used by a future module.

This already confused FLR because it uses lcm_flags and mirror_count
to do a sanity check.
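
The idea of clearing a header before filling it in can be sketched as
below. The struct is a simplified, hypothetical stand-in and not the real
layout header; only the zero-then-assign pattern is the point.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a composite layout header (hypothetical). */
struct demo_layout_hdr {
    uint32_t magic;
    uint16_t flags;
    uint16_t entry_count;
    uint16_t mirror_count;
    uint16_t padding[3];
};

/* Zero the whole header first so unset fields never carry stale data. */
static void demo_layout_hdr_init(struct demo_layout_hdr *hdr,
                                 uint32_t magic, uint16_t entry_count)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->magic = magic;
    hdr->entry_count = entry_count;
}
```

Fields such as flags, mirror_count and the padding then read as zero
until some later code sets them explicitly.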

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: If9511e6691144debd51ccab575ef4479d0c9b865
Reviewed-on: https://review.whamcloud.com/30785
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
3 years ago LU-10435 tests: add version check to conf-sanity test 32e 66/30766/2
Jian Yu [Mon, 8 Jan 2018 05:25:42 +0000 (21:25 -0800)]
LU-10435 tests: add version check to conf-sanity test 32e

This patch adds a Lustre version check to conf-sanity
test 32e so that the test interoperates with servers that do
not support the DOM feature.

Test-Parameters: trivial envdefinitions=ONLY=32e \
mdsjob=lustre-b2_10 ossjob=lustre-b2_10 serverbuildno=52 \
testlist=conf-sanity

Change-Id: I6a561d2972dfc1071c0722af5cb265de0423626c
Signed-off-by: Jian Yu <jian.yu@intel.com>
Reviewed-on: https://review.whamcloud.com/30766
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10458 kernel: kernel update [SLES12 SP3 4.4.103-6.38] 38/30738/4
Bob Glossman [Tue, 9 Jan 2018 15:45:42 +0000 (07:45 -0800)]
LU-10458 kernel: kernel update [SLES12 SP3 4.4.103-6.38]

Update target and kernel_config files for new version

Test-Parameters: clientdistro=sles12sp3 testgroup=review-ldiskfs \
  mdsdistro=sles12sp3 ossdistro=sles12sp3 \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ib7a308dbce58d94c5f5775cd54f33563cf067e7
Reviewed-on: https://review.whamcloud.com/30738
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-5955 utils: lfs shouldn't skip .lustre directory 63/30463/4
Andreas Dilger [Sat, 9 Dec 2017 08:23:36 +0000 (01:23 -0700)]
LU-5955 utils: lfs shouldn't skip .lustre directory

Before Lustre 2.5.3 the MDS returned the .lustre directory to clients
with readdir in the root directory.  This has always been masked out
for "lfs find" and "lfs getstripe" by llapi_semantic_traverse(), but
had the side-effect of also skipping a real .lustre directory that may
exist in the filesystem (for whatever reason, I'm not sure).

Since 2.5.3-84-g2976f91 the /.lustre directory is no longer returned
by the MDS, so there is no need to exclude it in the tools anymore.
Add a sanity test to confirm that the .lustre directory is not listed
(there already are many tests that verify it can be accessed).

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I7ec6ee94b6012445d3bfd9a8a47497dacdbcab07
Reviewed-on: https://review.whamcloud.com/30463
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10282 flr: comp-flags support when creating mirrors 60/30360/13
Jinshan Xiong [Mon, 4 Dec 2017 19:52:25 +0000 (11:52 -0800)]
LU-10282 flr: comp-flags support when creating mirrors

This patch allows flags to be set when creating mirrors.
The flags are set on individual components, so flags can be
chosen flexibly based on the location of components. Also, the
'stale' and 'prefer' flags can be set on individual components
later on.

This patch also revises the component flags matching rules to allow
flags and inverted flags to be set at the same time in the commands
lfs-find(1) and lfs-getstripe(1).

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ia077ca5454d49eb411bd82bd451c9dfc426d780c
Reviewed-on: https://review.whamcloud.com/30360
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10175 ldlm: remove obsoleted lock convert code 91/30491/3
Mikhail Pershin [Tue, 12 Dec 2017 11:17:21 +0000 (14:17 +0300)]
LU-10175 ldlm: remove obsoleted lock convert code

This patch removes the lock mode conversion mechanism from Lustre.
It is obsolete and not functional at the moment, and there are
no plans to restore and use it again.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I477caf24927768dfcdc15888e59a7d5e62d5b577
Reviewed-on: https://review.whamcloud.com/30491
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9618 clio: Use readahead for partial page write 44/27544/8
Patrick Farrell [Mon, 26 Jun 2017 16:07:38 +0000 (11:07 -0500)]
LU-9618 clio: Use readahead for partial page write

When writing to a region of a file less than file size
(either an existing file or a shared file with multiple
writers), writes of less than one page in size must first
read in that page.

This results in extremely poor performance. For random I/O,
there are no easy improvements available, but the sequential
case can benefit enormously by using readahead to bring in
those pages.

This patch connects ll_prepare_partial_page to the readahead
infrastructure.

This does not affect random I/O or large unaligned writes,
where readahead does not detect I/O.

Benchmarks are from a small VM system, files are NOT in
cache when rewriting.

Write numbers are in MB/s.

File per process:
    access             = file-per-process
    ordering in a file = sequential offsets
    ordering inter file= no tasks offsets
    clients            = 1 (1 per node)
    repetitions        = 1
    blocksize          = 1000 MiB
    aggregate filesize = 1000 MiB

New file (best case):
xfsize  ppr  write
1KiB    n/a  59.44
5KiB    n/a  164.5

Rewrite of existing file:
xfsize  ppr  re-write
1KiB    off  4.65
1KiB    on   48.40
5KiB    off  12.95
5KiB    on   143.3

Shared file writing:
    access             = single-shared-file
    ordering in a file = sequential offsets
    ordering inter file= no tasks offsets
    clients            = 4 (4 per node)
    repetitions        = 1
    blocksize          = 1000 MiB
    aggregate filesize = 4000 MiB

xfsize  ppr  write
1KiB    off  11.26
1KiB    on   58.72
5KiB    off  18.7
5KiB    on   127.3

Cray-bug-id: LUS-188
Signed-off-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Change-Id: I822395995ee23b1c9ca289ae982e5294b69a0cff
Reviewed-on: https://review.whamcloud.com/27544
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10463 osd-zfs: use 1MB RPC size by default 57/30757/3
Andreas Dilger [Sat, 6 Jan 2018 01:39:06 +0000 (18:39 -0700)]
LU-10463 osd-zfs: use 1MB RPC size by default

Revert back to using 1MB RPC size for ZFS back-end storage, if it
is not otherwise specified, and as long as the ZFS recordsize is
1MB or smaller.  Continue to use the ZFS recordsize if it is larger.

For ldiskfs, continue to use 4MB RPC size, unless the bigalloc
feature is enabled and has a larger chunksize.

Testing has shown that while 4MB RPC size is good for ldiskfs, it
does not improve ZFS performance, and increases IO variability in
some cases.
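
The selection rule described above can be sketched as follows. This is
an illustration under assumptions: demo_default_rpc_bytes() and the enum
are hypothetical names, and alloc_unit stands for the ZFS recordsize or
the ldiskfs bigalloc chunksize in bytes.

```c
#define DEMO_MIB (1024UL * 1024UL)

enum demo_backend {
    DEMO_BACKEND_ZFS,
    DEMO_BACKEND_LDISKFS,
};

/*
 * Default RPC size when nothing else is specified: 1 MiB for ZFS and
 * 4 MiB for ldiskfs, unless the backend's allocation unit is larger.
 */
static unsigned long demo_default_rpc_bytes(enum demo_backend be,
                                            unsigned long alloc_unit)
{
    unsigned long base = (be == DEMO_BACKEND_ZFS) ? 1 * DEMO_MIB
                                                  : 4 * DEMO_MIB;

    return alloc_unit > base ? alloc_unit : base;
}
```

For example, a ZFS target with a 128 KiB recordsize gets 1 MiB RPCs,
while one with a 4 MiB recordsize keeps using 4 MiB.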

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I4b306843667bfd960ad07ecc3886a696fd3ebbe5
Reviewed-on: https://review.whamcloud.com/30757
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10003 lnet: deprecate lctl net commands 55/30755/2
Amir Shehata [Fri, 5 Jan 2018 22:20:04 +0000 (14:20 -0800)]
LU-10003 lnet: deprecate lctl net commands

Add a deprecation message for commands which are implemented in
lnetctl. The lctl commands will continue to function.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I3f528d0145f7958106a2fc6842fcd1670c9b9d7c
Reviewed-on: https://review.whamcloud.com/30755
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-10454 mdd: check return value of lu_ucred() 07/30707/4
Sebastien Buisson [Mon, 8 Jan 2018 14:28:27 +0000 (23:28 +0900)]
LU-10454 mdd: check return value of lu_ucred()

In mdd_changelog_data_store_by_fid(), part of the function checked
the return value of lu_ucred() and part of it did not. This led
to a NULL pointer dereference.

Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iefe9d10191e499aec94415fb6fe0d5d2064f86f0
Reviewed-on: https://review.whamcloud.com/30707
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years agoLU-10201 tests: Fix overly greedy grep in conf_sanity test 20 57/29957/4
Oleg Drokin [Tue, 7 Nov 2017 00:59:20 +0000 (19:59 -0500)]
LU-10201 tests: Fix overly greedy grep in conf_sanity test 20

Match the mountpoint more strictly so that only /mnt/lustre is
matched, and not /mnt/lustre-{mds,ost}.
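
The difference between a greedy and a strict match can be sketched as
below. This is a Python model of the idea, not the pattern used in
conf_sanity. The helper name and the sample mount lines are made up
for illustration.

```python
import re


def is_mounted_at(mount_lines, mountpoint):
    """Return True if some line has `mountpoint` as a whole field.

    Requiring whitespace on both sides keeps "/mnt/lustre" from
    matching "/mnt/lustre-mds1" or "/mnt/lustre-ost1".
    """
    pattern = re.compile(r"\s" + re.escape(mountpoint) + r"\s")
    return any(pattern.search(line) for line in mount_lines)


# Hypothetical mount table with only server targets mounted.
servers_only = [
    "/dev/sdb /mnt/lustre-mds1 lustre rw 0 0",
    "/dev/sdc /mnt/lustre-ost1 lustre rw 0 0",
]
```

A plain substring test reports /mnt/lustre as mounted for servers_only,
while is_mounted_at() does not.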

Change-Id: I0ca274a358de3a38542e05bb5682641459fea93d
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/29957
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
3 years agoLU-10459 lnd: throttle tx based on queue depth 51/30751/3
Amir Shehata [Fri, 5 Jan 2018 20:22:45 +0000 (12:22 -0800)]
LU-10459 lnd: throttle tx based on queue depth

Throttle the transmits based on the negotiated conn queue depth
to ensure we keep the number of outstanding transmits below the
negotiated queue depth.
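
A minimal sketch of this kind of throttling, assuming a per-connection
counter and a pending queue. The class and method names are
hypothetical and the real LND data structures are not shown. Whether
the real limit is strict or inclusive is not stated here, so the sketch
simply never lets the counter exceed the queue depth.

```python
from collections import deque


class Connection:
    """Toy model of a connection with a negotiated queue depth."""

    def __init__(self, queue_depth):
        self.queue_depth = queue_depth
        self.outstanding = 0    # transmits posted but not yet completed
        self.pending = deque()  # transmits held back by the throttle

    def send(self, tx):
        """Post tx if there is room, otherwise queue it. True if posted."""
        if self.outstanding < self.queue_depth:
            self.outstanding += 1
            return True
        self.pending.append(tx)
        return False

    def complete(self):
        """Handle one completion and post the next pending tx, if any."""
        if self.outstanding == 0:
            raise RuntimeError("completion without an outstanding tx")
        self.outstanding -= 1
        if self.pending:
            self.outstanding += 1
            return self.pending.popleft()
        return None
```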

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I27190364904d6c79c0cd6d382228f8b8d2b11ba0
Reviewed-on: https://review.whamcloud.com/30751
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoNew tag 2.10.57 2.10.57 v2_10_57 v2_10_57_0
Oleg Drokin [Wed, 17 Jan 2018 07:25:01 +0000 (02:25 -0500)]
New tag 2.10.57

Change-Id: Ic2704bf2256afdf0800a30c1a979e3d99f2c208a
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-7004 obd: make LCFG_SET_PARAM functional 90/28590/26
James Simmons [Wed, 13 Dec 2017 20:05:56 +0000 (15:05 -0500)]
LU-7004 obd: make LCFG_SET_PARAM functional

The LCFG_SET_PARAM infrastructure was meant to replace the
class_process_proc_param() functionality but various software
bugs have prevented its adoption. This patch does the following:

1) Take the better print_lustre_cfg() of the mgs module and use
   that in llog_swab.c instead with the intent of exporting this
   function. I add to process_param2_config() a call to
   print_lustre_cfg() for debugging purposes.

2) Move obdname2fsname to obd_mount.c and make it exportable.
   Expanded the functionality to work for both lctl conf_param
   and lctl set_param -P.

3) Split mgs_setparam() into two functions since the differences
   between LCFG_SET_PARAM and LCFG_PARAM are large enough.

Currently virtual attributes failover.nid, sptlrpc, and quota
are not fully supported. They will be addressed in later patches.

Change-Id: Iced6505f39a3270139c1630270cfe1dc4a2e49ed
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/28590
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10455 kernel: kernel update RHEL7.4 [3.10.0-693.11.6.el7] 34/30734/3
Bob Glossman [Thu, 4 Jan 2018 15:57:37 +0000 (07:57 -0800)]
LU-10455 kernel: kernel update RHEL7.4 [3.10.0-693.11.6.el7]

update RHEL 7.4 kernel to 3.10.0-693.11.6.el7

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Id3428aa00e4b1501b642587db7911b6adafd51ef
Reviewed-on: https://review.whamcloud.com/30734
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8999 test: ignore unrelated quota id 30/30730/3
Hongchao Zhang [Wed, 6 Dec 2017 04:59:23 +0000 (12:59 +0800)]
LU-8999 test: ignore unrelated quota id

In test_38 of sanity_quota, the quota id larger than 9999
should be ignored.

Change-Id: I12e7936c0c1abc2dcaad7646a048c98bb37de254
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/30730
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9859 libcfs: delete libcfs/linux/libcfs.h 06/30706/4
James Simmons [Tue, 9 Jan 2018 05:52:16 +0000 (00:52 -0500)]
LU-9859 libcfs: delete libcfs/linux/libcfs.h

Lustre uses libcfs.h as the header to include all headers. This
approach has drawbacks like colliding with MOFED compat headers
that do the same thing. This patch is the first step to unwind
including libcfs.h everywhere. This starts with eliminating
linux/libcfs.h.

Test-Parameters: trivial

Change-Id: Id2040d4295c16135561c8251e160cb2117ee21b8
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30706
Reviewed-by: Doug Oucharek <dougso@me.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
3 years agoLU-10052 tests: wait for OST objects to be deleted 78/30678/4
Hongchao Zhang [Mon, 4 Dec 2017 19:47:30 +0000 (03:47 +0800)]
LU-10052 tests: wait for OST objects to be deleted

In test_20b of replay-single, the used space difference after
the file creation and deletion shows that a block is not freed,
wait for OST objects to be destroyed after recovery is done.

Test-Parameters: trivial testlist=replay-single.sh ostfilesystemtype=zfs mdtfilesystemtype=zfs
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Change-Id: I6311d8b8fa4cea713a9755cfb6a3d63e693c8344
Reviewed-on: https://review.whamcloud.com/30678
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
3 years agoLU-10383 hsm: flatten mdt_cdt_started_cb() 61/30561/4
John L. Hammond [Fri, 15 Dec 2017 20:19:46 +0000 (14:19 -0600)]
LU-10383 hsm: flatten mdt_cdt_started_cb()

Rewrite mdt_cdt_started_cb() to avoid creating a fake progress kernel
for mdt_hsm_update_request_state() and handle the cleanup from the
timed-out action directly. Cancel cancel actions that have timed out
rather than leaving them in the log indefinitely. The code is improved
in several places to clean up all resources associated with the action
rather than having the clean up depend on unnecessary assumptions.

Since mdt_hsm_coordinator_update() is then only called from the
MDS_HSM_PROGRESS handler, the update_record parameter can be removed
as well as the now useless wrapper function
mdt_hsm_coordinator_update().

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ic6663b29b2a87de0da59085ccbe297b50abd049d
Reviewed-on: https://review.whamcloud.com/30561
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10383 hsm: consolidate CDT restore handle handling 57/30557/4
John L. Hammond [Fri, 15 Dec 2017 19:24:32 +0000 (13:24 -0600)]
LU-10383 hsm: consolidate CDT restore handle handling

Consolidate duplicated HSM coordinator restore handle handling into
new functions cdt_restore_handle_{add,del}(). Rename
mdt_hsm_restore_hdl_find() and some struct members for consistency.

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I9798ed93ea26a9d61d4786540c6dae95cdc38c4b
Reviewed-on: https://review.whamcloud.com/30557
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10383 hsm: refactor mdt_coordinator_cb() 52/30552/3
John L. Hammond [Fri, 15 Dec 2017 16:14:11 +0000 (10:14 -0600)]
LU-10383 hsm: refactor mdt_coordinator_cb()

Split the ARS_WAITING and ARS_STARTED cases of mdt_coordinator_cb()
into subfunctions, mdt_cdt_waiting_cb() and mdt_cdt_started_cb().

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I734e10e4db72f76a6b0de76c383ad0b03efd76d8
Reviewed-on: https://review.whamcloud.com/30552
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-6051 lfs: Update lfs_migrate man page for in-use files 50/29950/2
Steve Guminski [Mon, 6 Nov 2017 17:46:30 +0000 (12:46 -0500)]
LU-6051 lfs: Update lfs_migrate man page for in-use files

Update man page to state that it is safe to use the script on in-use
files for versions at or above 2.5.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I412ee7681db3860ca395c4afc2a30c87f1f49d6d
Reviewed-on: https://review.whamcloud.com/29950
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-5541 build: move libcfs and liblustreapi over to libtool 62/30562/6
James Simmons [Mon, 8 Jan 2018 22:30:23 +0000 (17:30 -0500)]
LU-5541 build: move libcfs and liblustreapi over to libtool

Change libcfs into a convenience library using libtool. This allows
us to embed the libcfs library into both liblnetconfig and liblustreapi
so there is no longer a need to link applications to libcfs.a.
With this change we need to migrate liblustreapi to libtool.

libtool knows how to build both static and dynamic libraries for
liblustreapi, so no need to hack the Makefile. As two added benefits,
the utilities will now use the dynamic version, thus reducing their
footprint, and calling make twice in a row won't rebuild objects
already built.

Test-Parameters: trivial

Change-Id: Icc1e5d42df503b9bf393396fe09f4e4f1f242486
Signed-off-by: frank zago <fzago@cray.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30562
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8358 vvp: Print discarded page warning on -EIO 11/21111/3
Patrick Farrell [Thu, 27 Jul 2017 15:18:08 +0000 (10:18 -0500)]
LU-8358 vvp: Print discarded page warning on -EIO

On client eviction, the client sometimes has dirty pages
outstanding, which are then discarded.  The client is
supposed to print an error when this happens,
from vvp_vmpage_error->ll_dirty_page_discard_warn.

However, the client looks for specific errors, and newer
Lustre clients will sometimes return -EIO to I/O requests
on eviction, instead of -EINTR.  Since they can still
return -EINTR, we must add -EIO as a new condition and
keep -EINTR.
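
The condition can be pictured as a membership test on the returned
error code. This is a Python sketch, not the vvp code. The set below
contains only the two errors named in this message, and the names are
hypothetical.

```python
import errno

# Errors that should trigger the discarded-page warning in this model.
DISCARD_WARN_ERRORS = frozenset({-errno.EINTR, -errno.EIO})


def should_warn_on_discard(ioret):
    """Return True if the I/O result indicates an eviction-style error."""
    return ioret in DISCARD_WARN_ERRORS
```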

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I22ac82570a3840782c3fc6db40281b4a2c1cba1c
Reviewed-on: https://review.whamcloud.com/21111
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-3846 test: Fix sanity test_56* with different layouts 79/7479/13
Andreas Dilger [Fri, 7 Jul 2017 16:51:17 +0000 (10:51 -0600)]
LU-3846 test: Fix sanity test_56* with different layouts

Fix a bug in "lfs getstripe --obd" and "lfs find --obd" where they
tried to access objects of uninitialized components to check if
they were on specified OSTs.  Those components have no objects, so
skip uninitialized components when searching for specific OSTs.
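
A small model of the fix, assuming each component is a dict with an
"initialized" flag and a list of OST indices that is None until the
component is instantiated. This representation is invented for the
sketch and is not the liblustreapi layout structure.

```python
def component_uses_ost(component, ost_idx):
    """Return True if an initialized component has an object on ost_idx."""
    if not component["initialized"]:
        # Uninitialized components have no objects, so there is nothing
        # to compare against and they must be skipped.
        return False
    return ost_idx in component["ost_indices"]


def file_uses_ost(components, ost_idx):
    """Return True if any component of the file uses the given OST."""
    return any(component_uses_ost(c, ost_idx) for c in components)


# Hypothetical two-component file whose second component is not yet
# instantiated.
layout = [
    {"initialized": True, "ost_indices": [0, 1]},
    {"initialized": False, "ost_indices": None},
]
```

Without the "initialized" check, searching layout for OST 2 would try
to look inside the missing object list of the second component.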

Sanity test_56s and test_56u will fail when the default stripe count
is not 1 and the test is run with more than one OST.  Explicitly
set stripe_count=1 for the default directory layout for these tests.

For PFL layout testing, test_56a needs to fix its output parsing, as
it is using this to verify that lfs getstripe is returning valid data
by counting occurrences of "obdidx" and not the new "l_ost_idx".

Do not delete the filesystem default striping in 56a, 56g, 56h, as
this will silently cause failures with default PFL layout testing.

The sanity test_56* subdirectories were allowed to be shared long ago,
but when $tdir became a per-subtest directory this was lost.  Allow
the subdirectories to be shared again, where possible, to avoid
duplicate setup of the test directory for each subtest if not needed.

Pass test directory explicitly to setup_56() and not via global $TDIR.

Clean up test_56* to better match modern test code style.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I1bcdeb80fc6e39227a87365f823879db70eec652
Reviewed-on: https://review.whamcloud.com/7479
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
3 years agoLU-10444 utils: Don't remount debugfs every time 75/30675/3
Oleg Drokin [Sat, 30 Dec 2017 03:16:30 +0000 (22:16 -0500)]
LU-10444 utils: Don't remount debugfs every time

Check if debugfs is mounted at /sys/kernel/debug and only
mount if it is not.
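
One way to express the check, sketched in Python. This is not the code
from the patch. It assumes the caller passes in the text of a mount
table in /proc/mounts format (device, mountpoint, fstype, ...), and the
function name is hypothetical.

```python
def debugfs_mounted(mounts_text, target="/sys/kernel/debug"):
    """Return True if a debugfs filesystem is mounted at `target`."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 1 is the mountpoint and field 2 the filesystem type.
        if len(fields) >= 3 and fields[1] == target and fields[2] == "debugfs":
            return True
    return False


# Hypothetical mount table contents used for illustration.
SAMPLE = (
    "sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0\n"
    "debugfs /sys/kernel/debug debugfs rw,relatime 0 0\n"
)
```

A caller would mount debugfs only when debugfs_mounted() returns False.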

Change-Id: Ib31bd8f7c5c93ab942c6708ed3a4d17a11159e95
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/30675
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
3 years agoLU-9019 osp: migrate to 64 bit time 74/30674/3
James Simmons [Thu, 4 Jan 2018 03:35:41 +0000 (22:35 -0500)]
LU-9019 osp: migrate to 64 bit time

Change opd_statfs_maxage from int to time64_t to make it clear
this field is in units of seconds. Change the last libcfs-specific
cfs_time_t, which maps to jiffies, to ktime_t, since it gives the
sub-second resolution that is needed in this case.

Change-Id: I31baa73d5f6bd53dbcce4fc9f90462b11c6457a3
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30674
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9019 osc: migrate to 64 bit time 07/30607/4
James Simmons [Thu, 4 Jan 2018 03:24:57 +0000 (22:24 -0500)]
LU-9019 osc: migrate to 64 bit time

Change od_contention_time from int to time64_t to make it clear
this field is in units of seconds. Change the *_contention_time
fields from jiffies to ktime_t to make it clear we are dealing
with time and ktime_t is consistent on any platform unlike
jiffies.
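
To illustrate why jiffies differ between platforms: a jiffy is one
kernel tick, and the tick rate HZ is a build-time setting, whereas
ktime_t counts nanoseconds. The Python helpers below are hypothetical
and only show the arithmetic.

```python
NSEC_PER_SEC = 1_000_000_000


def jiffies_to_seconds(jiffies, hz):
    """Convert a tick count to seconds for a kernel built with HZ=hz."""
    return jiffies / hz


def ktime_to_seconds(ktime_ns):
    """Convert a nanosecond timestamp to seconds, independent of HZ."""
    return ktime_ns / NSEC_PER_SEC
```

The same count of 1000 jiffies is 10 seconds at HZ=100 but 1 second at
HZ=1000, while a ktime value means the same duration everywhere.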

Change-Id: Ieb240e40cc4d56050607314db057004db00aae13
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30607
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9892 test: fix SuSe nfsserver setup 76/30476/23
Minh Diep [Mon, 11 Dec 2017 18:12:20 +0000 (10:12 -0800)]
LU-9892 test: fix SuSe nfsserver setup

Check for SuSE-release and use nfsserver.
Add export info to /etc/exports.

Test-Parameters: trivial testlist=parallel-scale-nfsv4

Change-Id: Id12370ae35d878e51bdf6f71a77b1b82b5e82c33
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/30476
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10350 tests: make parsing routines pattern aware 36/30636/2
James Nunez [Thu, 21 Dec 2017 21:23:40 +0000 (14:23 -0700)]
LU-10350 tests: make parsing routines pattern aware

'lfs getstripe' now returns the pattern for each component
of a directory and files. The routines that parse
parameters, parse_layout_param() and parse_plain_param(),
need to look for the component pattern when parsing the output
of 'lfs getstripe'.

Test-Parameters: trivial testlist=sanity-pfl,ost-pools
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Iab605f58e9c8f501fa0889806c511e3310cb6dd7
Reviewed-on: https://review.whamcloud.com/30636
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10237 mdc: interruptable during RPC retry for EINPROGRESS 66/30166/2
Fan Yong [Sun, 19 Nov 2017 05:55:11 +0000 (13:55 +0800)]
LU-10237 mdc: interruptable during RPC retry for EINPROGRESS

Sometimes a system resource may be temporarily inaccessible, for
example when the related OI mapping is broken and has not yet been
rebuilt. In that case, the server replies to the client with
"-EINPROGRESS", and the client retries the RPC some time later.

Currently, the client retries indefinitely until the RPC succeeds
or fails with a different error. But we do not know how long it
will be before the resource becomes available. It may take so long
that the originator of the RPC (the application or the user) no
longer wants to retry, so the retry logic needs to be
interruptible. This patch makes it so.
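
The retry behaviour described above can be sketched as a loop that
checks for an interrupt between attempts. This is a Python model, not
the mdc code. The function name, the use of threading.Event as the
interrupt source, and returning -EINTR on interruption are all
assumptions made for the sketch.

```python
import errno
import threading


def call_with_retry(send_rpc, interrupted, delay=0.0):
    """Retry send_rpc() while it returns -EINPROGRESS.

    `interrupted` is a threading.Event set by whoever wants the caller
    to give up. Returns the final RPC result, or -EINTR if interrupted.
    """
    while True:
        rc = send_rpc()
        if rc != -errno.EINPROGRESS:
            return rc
        # Event.wait() returns True as soon as the event is set, so the
        # sleep between attempts doubles as the interruption point.
        if interrupted.wait(delay):
            return -errno.EINTR
```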

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I4f939f9a350d3a99ce3d3af37d0dea8ab8030fee
Reviewed-on: https://review.whamcloud.com/30166
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10468 tests: sync zfs dataset before reading blocks 28/30828/5
Jinshan Xiong [Thu, 11 Jan 2018 01:21:25 +0000 (17:21 -0800)]
LU-10468 tests: sync zfs dataset before reading blocks

Synchronize the zfs dataset before reading blocks so that the
block count is accurate.

Test-Parameters: trivial envdefinitions=SLOW=yes,ENABLE_QUOTA=yes mdtfilesystemtype=zfs ostfilesystemtype=zfs mdscount=2 mdtcount=4 testlist=sanity-flr,sanity-flr,sanity-flr
Test-Parameters: trivial envdefinitions=SLOW=yes,ENABLE_QUOTA=yes mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs mdscount=2 mdtcount=4 testlist=sanity-flr,sanity-flr,sanity-flr
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I663d689296e5847b460de9a491b551a56bfbc77d
Reviewed-on: https://review.whamcloud.com/30828
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years agoLU-10488 tests: saved and restore layout for dom test 36/30836/2
Jinshan Xiong [Thu, 11 Jan 2018 17:47:07 +0000 (17:47 +0000)]
LU-10488 tests: saved and restore layout for dom test

Some features, such as FLR and quota, are not yet supported by dom.
The corresponding test cases fail if they run while dom is
enabled.

This patch saves the default layout before sanity-dom runs and
restores it afterwards.

Test-Parameters: trivial envdefinitions=SLOW=yes,ENABLE_QUOTA=yes testlist=sanity-dom,sanity-flr,sanity-quota
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I69752c3c48e3edf32ff22399af32b67c718f8e0e
Reviewed-on: https://review.whamcloud.com/30836
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years agoLU-5170 lfs: Standardize error messages in lfs_check() 66/30666/2
Steve Guminski [Wed, 12 Jul 2017 18:56:58 +0000 (14:56 -0400)]
LU-5170 lfs: Standardize error messages in lfs_check()

Error messages in lfs_check() are updated to a standard format.
Messages are prefixed with the name of the utility and the command
that caused the error.  User-provided values are delimited with
single quotes.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: Ife0fa0b5f22f1099757c38d07b1827e26182426e
Reviewed-on: https://review.whamcloud.com/30666
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
3 years agoLU-10425 kernel: kernel update [SLES12 SP3 4.4.103-6.33] 47/30647/2
Bob Glossman [Thu, 21 Dec 2017 18:13:08 +0000 (10:13 -0800)]
LU-10425 kernel: kernel update [SLES12 SP3 4.4.103-6.33]

Update target and kernel_config files for new version

Test-Parameters: clientdistro=sles12sp3 testgroup=review-ldiskfs \
  mdsdistro=sles12sp3 ossdistro=sles12sp3 \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ibdaf6edeb34388d3d9bf39dc7f84639d5fb9992f
Reviewed-on: https://review.whamcloud.com/30647
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10392 fid: improve seq allocation error messages 23/30623/2
Emoly Liu [Wed, 20 Dec 2017 09:32:37 +0000 (17:32 +0800)]
LU-10392 fid: improve seq allocation error messages

When MDTs are waiting for MDT0000 to start the master sequence
server and to be granted a meta sequence for the first time, they
may get "-EINPROGRESS". In that case, a limited number of console
messages is clearer than many error messages.

Change-Id: I64e80508cf9fb837a328b4f8228f29bfb764845d
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/30623
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
3 years agoLU-10327 test: use nogroup instead of nobody 59/30559/8
Minh Diep [Fri, 15 Dec 2017 20:48:58 +0000 (12:48 -0800)]
LU-10327 test: use nogroup instead of nobody

In Ubuntu, there isn't any nobody group

Change-Id: I2d4607b3d1384d2d619dd9640363533f92397356
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/30559
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10316 tests: skip checksum check for sanity 77c 02/30402/7
James Nunez [Wed, 6 Dec 2017 00:17:16 +0000 (17:17 -0700)]
LU-10316 tests: skip checksum check for sanity 77c

sanity test 77c does not need to verify the checksum for
some versions of Lustre servers. The code to skip the
checksum verification is in place, but it is missing a
'return 0;' to exit the test.

Test-Parameters: trivial
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I9ea19f699d1142475f84d59b9c31880b7daf7f52
Reviewed-on: https://review.whamcloud.com/30402
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
3 years agoLU-10308 misc: update Intel copyright messages for 2017 41/30341/4
Andreas Dilger [Fri, 1 Dec 2017 10:10:07 +0000 (03:10 -0700)]
LU-10308 misc: update Intel copyright messages for 2017

Update copyright messages for files updated in 2016, excluding
trivial patches.

Add trivial patches to updatecw.sh script exclude list.

Revert some changes that were incorrectly attributed to the
2016 (d10200a80770f0029d1d665af954187b9ad883df) and
2015 (0754bc8f2623bea184111af216f7567608db35b6) copyright
update patches themselves, since they were not in the exclude
list when the subsequent script was run.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I82f21c30c4dac75792bb49fc139bee2ca51f5545
Reviewed-on: https://review.whamcloud.com/30341
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-5170 utils: Add utility and command name to llapi messages 71/28571/8
Steve Guminski [Thu, 3 Aug 2017 13:33:39 +0000 (09:33 -0400)]
LU-5170 utils: Add utility and command name to llapi messages

Allow error and info messages printed by llapi functions to
include the name of the utility and the command that caused
the error.  The command name is provided by the caller before
invoking any llapi functions through the use of the new function
llapi_set_command_name().  After the command has completed, the
utility should reset the name using llapi_clear_command_name().
If no command name is provided, only the utility name is printed.

The lfs and lctl utilities have been updated with the new functions.
They will print both the utility name and the command name while
in non-interactive mode.  In interactive mode, only the utility
name is printed.
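
The set/clear pattern can be modelled as below. This is a Python
sketch, not liblustreapi. MessagePrefix and its methods are
hypothetical stand-ins for llapi_set_command_name() and
llapi_clear_command_name(), and the exact "utility command: text"
layout is an assumption.

```python
class MessagePrefix:
    """Toy model of a per-utility message prefix with an optional command."""

    def __init__(self, utility):
        self.utility = utility
        self.command = None

    def set_command_name(self, name):
        self.command = name

    def clear_command_name(self):
        self.command = None

    def format(self, text):
        # Only the utility name is used when no command name is set.
        if self.command is None:
            return f"{self.utility}: {text}"
        return f"{self.utility} {self.command}: {text}"
```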

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: Ifaa1549f9bac6fdb25120f5721f69a8e1a7a52e1
Reviewed-on: https://review.whamcloud.com/28571
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-5170 lfs: Standardize error messages in lfs_setquota() 86/28286/4
Steve Guminski [Wed, 12 Jul 2017 16:16:30 +0000 (12:16 -0400)]
LU-5170 lfs: Standardize error messages in lfs_setquota()

Error and warning messages in lfs_setquota() and the llapi functions
it calls are updated to a standard format.  Messages are prefixed
with the name of the utility and the command that caused the error.
User-provided values are delimited with single quotes.

Messages that duplicate information printed elsewhere have been
removed.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I2d23f3dfc897047ac1c0803f7da9b1e5f2e5d719
Reviewed-on: https://review.whamcloud.com/28286
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-5170 lfs: Standardize error messages in lfs_quota() 53/28253/4
Steve Guminski [Wed, 12 Jul 2017 15:05:50 +0000 (11:05 -0400)]
LU-5170 lfs: Standardize error messages in lfs_quota()

Error and warning messages in lfs_quota() and the llapi functions
it calls are updated to a standard format.  Messages are prefixed
with the name of the utility and the command that caused the error.
User-provided values are delimited with single quotes.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I598be2d73a28675c06b77ca6f9fa0544ecaecc7e
Reviewed-on: https://review.whamcloud.com/28253
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9594 tests: remove sanity tests from ALWAYS_EXCEPT 13/27413/19
James Nunez [Wed, 25 Oct 2017 14:01:45 +0000 (08:01 -0600)]
LU-9594 tests: remove sanity tests from ALWAYS_EXCEPT

Remove the following sanity tests from the ALWAYS_EXCEPT list:

45 - bz3561 'osc io page accounting' because the associated bugzilla
ticket contains no activity.

68b - bz5188 test no longer exists.

Signed-off-by: dilip krishnagiri <dilipx.krishnagiri@intel.com>
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ia6aa47872ffa7fe9e54131de7f7d3a6f8a70bd27
Reviewed-on: https://review.whamcloud.com/27413
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9145 nodemap: new_init_ucred doesn't do nodemapping 24/26624/4
Kit Westneat [Fri, 14 Apr 2017 15:06:02 +0000 (11:06 -0400)]
LU-9145 nodemap: new_init_ucred doesn't do nodemapping

The new_init_ucred path was missed in the original nodemap
implementation. This patch adds the mapping calls to new_init_ucred.

WIP: There are some issues/questions:
 - some of the new_init_ucred code should be merged with
old_init_ucred_common.
 - Why does new_init check for setuid/setgid/setgrp, but not old_init?
 - What is ptlrpc_user_desc and should it interact with nodemap?

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Ia0e07078f32c43ec319be5a4cea753667056d645
Reviewed-on: https://review.whamcloud.com/26624
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8235 scripts: pass unrecognized options to lfs migrate 21/20621/6
Steve Guminski [Thu, 19 Oct 2017 14:52:07 +0000 (10:52 -0400)]
LU-8235 scripts: pass unrecognized options to lfs migrate

Pass through any unrecognized options to the "lfs migrate" command,
allowing the script to support migrate options without any special
handling.

Add new options "--rsync" and "--no-rsync" to specify how rsync
is used as a fallback alternative for the "lfs migrate" command.
The "--rsync" option forces the use of rsync instead of lfs migrate,
while the "--no-rsync" option prevents falling back to rsync in
the case where lfs migrate fails.

Add new "--dry-run" option.  The current "-n" option for a dry-run
duplicates the "-n" option to designate non-block for "lfs migrate".
The script's usage of "-n" is therefore deprecated, so that a future
patch can instead pass it through.

Add new "-v" option to increase verbosity, to help test/debug/monitor
what is being done by the script.  This option is also passed
through to "lfs migrate".
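
The pass-through idea can be sketched with Python's argparse, which is
not what the lfs_migrate shell script uses. The function below is
hypothetical, the deprecated "-n" handling is not modelled, and "-c 4"
in the usage note is just an example of an option the sketch does not
know about.

```python
import argparse


def split_options(argv):
    """Split argv into options handled here and options passed through.

    Returns (known, passthrough), where passthrough is the argument
    list that would be handed on to "lfs migrate".
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--rsync", dest="rsync",
                        action="store_const", const="force")
    parser.add_argument("--no-rsync", dest="rsync",
                        action="store_const", const="never")
    parser.add_argument("-v", dest="verbose", action="count", default=0)

    # parse_known_args() returns unrecognized arguments instead of
    # failing on them, which is what makes the pass-through possible.
    known, passthrough = parser.parse_known_args(argv)

    # "-v" is both handled here and forwarded.
    passthrough = ["-v"] * known.verbose + passthrough
    return known, passthrough
```

For ["--dry-run", "-v", "-c", "4", "file1"] the sketch keeps the
dry-run flag for itself and forwards ["-v", "-c", "4", "file1"].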

Test-Parameters: trivial
Signed-off-by: Nathan Dauchy <Nathan.Dauchy@noaa.gov>
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: Ia3f56bf528d2ac8155f08d93d01abd8bb9168cc4
Reviewed-on: https://review.whamcloud.com/20621
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years agoLU-5680 tests: Remove use of /etc/motd from sanity-hsm 21/14021/6
James Nunez [Tue, 5 Dec 2017 17:49:26 +0000 (10:49 -0700)]
LU-5680 tests: Remove use of /etc/motd from sanity-hsm

In sanity-hsm tests, /etc/motd is copied to the file system
and archived. /etc/motd may not exist in some Linux distributions
and, if it exists, may be of size 0, which defeats the use of
bandwidth control for timing. Uses of /etc/motd are replaced
with a generated file of known suitable size.

Test-Parameters: trivial testlist=sanity-hsm

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I2f287db6d422ce20bca037ca69e24179c7e48144
Reviewed-on: https://review.whamcloud.com/14021
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-7372 test: Remove replay-dual test_26 from ALWAYS_EXCEPT 77/30677/2
Jim Casper [Sun, 31 Dec 2017 23:27:04 +0000 (18:27 -0500)]
LU-7372 test: Remove replay-dual test_26 from ALWAYS_EXCEPT

The following patch to fix test 26 was checked into master and b2_10, but test 26 was
not added back to the test set:
  https://review.whamcloud.com/17853/
  https://review.whamcloud.com/28323/

Test-Parameters: trivial testlist=replay-dual

Change-Id: I5c72537b1b62b2a29882c8e03ce18f5a7766301a
Signed-off-by: Jim Casper <jamesx.casper@intel.com>
Reviewed-on: https://review.whamcloud.com/30677
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
3 years agoLU-8649 recovery: print some useful messages in recovery 56/30656/3
Emoly Liu [Mon, 25 Dec 2017 03:34:36 +0000 (11:34 +0800)]
LU-8649 recovery: print some useful messages in recovery

To make it clearer to admins that recovery won't start
until the first client connects, this patch reports the
following during recovery, in case the admins are waiting for
recovery to complete:
- a console message every 10 minutes or so, and
- a WAITING_FOR_CLIENTS status in the /proc recovery_status file

Change-Id: I03d37b4c00a799a1fd651b8d60cdbceed807cea1
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/30656
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
3 years agoLU-10199 tests: Re-enable sanity test 56xb 16/30616/2
Steve Guminski [Fri, 15 Dec 2017 20:18:34 +0000 (15:18 -0500)]
LU-10199 tests: Re-enable sanity test 56xb

Re-enable sanity test 56xb now that LU-10199 has been resolved.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I02502e802be89c3fdfd559e1113193af568bc33c
Reviewed-on: https://review.whamcloud.com/30616
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years agoLU-8616 dne: allow mkdir with specific MDTs 66/30566/4
Lai Siyao [Sat, 16 Dec 2017 08:22:10 +0000 (16:22 +0800)]
LU-8616 dne: allow mkdir with specific MDTs

Similar to creating files on specific OSTs, allow 'lfs mkdir' to create
directories on specific MDTs. This is achieved by allowing
'lfs mkdir -i' to specify multiple MDTs, and by adding
LMV_USER_MAGIC_SPECIFIC.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I3876707103465d1659afc80914ed6f9b58da25eb
Reviewed-on: https://review.whamcloud.com/30566
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
3 years agoLU-9845 test: add failmode=panic to zpool import options 49/30649/4
John L. Hammond [Fri, 22 Dec 2017 16:21:14 +0000 (10:21 -0600)]
LU-9845 test: add failmode=panic to zpool import options

In test-framework.sh add failmode=panic to the zpool import options.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I3b82f54034d57d99ed1f548bd3ac3736ae98d9f2
Reviewed-on: https://review.whamcloud.com/30649
Tested-by: Jenkins
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years agoLU-10406 tests: stop running sanity-lfsck 31c 94/30694/3
James Nunez [Tue, 2 Jan 2018 22:56:11 +0000 (15:56 -0700)]
LU-10406 tests: stop running sanity-lfsck 31c

sanity-lfsck test 31c is failing about half the time it is run.
Add sanity-lfsck test 31c to the ALWAYS_EXCEPT list until
we understand and fix the issue.

Test-Parameters: trivial
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Iace77e1b1ea59971548497063974628be9e70733
Reviewed-on: https://review.whamcloud.com/30694
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
3 years agoLU-10399 test: use /dev/urandom in sanity-hsm test_1b() 56/30556/3
John L. Hammond [Fri, 15 Dec 2017 18:48:21 +0000 (12:48 -0600)]
LU-10399 test: use /dev/urandom in sanity-hsm test_1b()

In sanity-hsm test_1b() use /dev/urandom instead of /dev/random so we
won't have to wait for the second law of thermodynamics to become
true.

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ib64691fd6ee1a172c631ffa55956e1d69f24d349
Reviewed-on: https://review.whamcloud.com/30556
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
3 years agoLU-10357 hsm: open file to be archived before starting 56/30456/2
John L. Hammond [Fri, 8 Dec 2017 19:59:43 +0000 (13:59 -0600)]
LU-10357 hsm: open file to be archived before starting

In the archive case of llapi_hsm_action_begin(), open the file to be
archived before calling the LL_IOC_HSM_COPY_START ioctl. Store the
open FD in struct hsm_copyaction_private, return a dup of it from
llapi_hsm_action_get_fd(), and close it in
llapi_hsm_action_end(). Calling open() first avoids 3 extra RPCs
(MDS_GETATTR, LDLM_ENQUEUE for layout to get data version,
LDLM_ENQUEUE for fstat()) when archiving.
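
The FD lifecycle described above can be modelled in a few lines. This
is a Python sketch under stated assumptions, not the liblustreapi code:
the class CopyAction is a hypothetical stand-in for struct
hsm_copyaction_private, and no ioctl is issued.

```python
import os


class CopyAction:
    """Toy model: open early, hand out dups, close once at the end."""

    def __init__(self, path):
        # Open the file first, before any "copy start" step would run.
        self.fd = os.open(path, os.O_RDONLY)

    def get_fd(self):
        # Callers get their own descriptor and may close it freely.
        return os.dup(self.fd)

    def end(self):
        # Close the stored descriptor exactly once.
        if self.fd >= 0:
            os.close(self.fd)
            self.fd = -1
```

Handing out a dup keeps ownership simple: the caller closes what it
was given, and the action closes what it opened.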

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Idea0a7f4cb63bd9d712ff2ce9fcb59a3b278d0f2
Reviewed-on: https://review.whamcloud.com/30456
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10341 hsm: filter kkuc write by client UUID 19/30419/2
John L. Hammond [Thu, 7 Dec 2017 02:21:48 +0000 (20:21 -0600)]
LU-10341 hsm: filter kkuc write by client UUID

Add a struct obd_uuid kr_uuid member to struct kkuc_reg to hold the
UUID of the client (super block) that owns the kkuc pipe. Modify
libcfs_kkuc_group_{put,rem,foreach}() to accept a UUID pointer which
filters the kkuc pipes operated on. Modify mdc_hsm_copytool_send() to
pass the UUID of the MDC device when calling
libcfs_kkuc_group_put(). The effect of all this is that HALs received
by a given MDC will only be delivered to copytools registered on the
corresponding mount point.
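
A toy model of the filtering, in Python rather than kernel C. The class
and method names below are hypothetical; only the idea of tagging each
registration with the owning client's UUID comes from this patch.

```python
class KkucGroup:
    """Toy model of a kkuc group whose pipes are tagged with an owner UUID."""

    def __init__(self):
        self._regs = []  # list of (uuid, inbox) pairs

    def add(self, uuid):
        """Register a pipe owned by the client with this UUID."""
        inbox = []
        self._regs.append((uuid, inbox))
        return inbox

    def put(self, uuid, message):
        """Deliver message only to pipes registered under uuid."""
        delivered = 0
        for reg_uuid, inbox in self._regs:
            if reg_uuid == uuid:
                inbox.append(message)
                delivered += 1
        return delivered
```

With two mount points registered under different UUIDs, a message put
with one UUID reaches only the copytool on the matching mount point.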

Remove the cluuid member of struct lmv_obd since it is always the same
as the obd_uuid member of the corresponding struct obd_device. Remove
the kcd_uuid member of struct kkuc_ct_data as it is no longer needed.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ibbda253979739a1d56d3e132a51a482a02a0ec27
Reviewed-on: https://review.whamcloud.com/30419
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10310 utils: change l_getidentity log level 38/30338/5
Sergey Cheremencev [Fri, 1 Dec 2017 14:47:35 +0000 (17:47 +0300)]
LU-10310 utils: change l_getidentity log level

With the current log level (NOTICE), l_getidentity error
messages don't appear in the kernel messages. Changing
this level to WARNING makes it easier and faster to see
that an issue relates to l_getidentity.

Change error level for errlog->vsyslog from LOG_NOTICE
to LOG_WARNING.

Test-Parameters: trivial
Cray-bug-id: MRP-2132
Change-Id: If72a5ee52de89b15b4bb7fc0dda77915531dbc39
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-on: https://review.whamcloud.com/30338
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
3 years agoLU-10210 tests: Add lustre_routes_conversion script in PATH 35/30335/3
Sonia Sharma [Fri, 1 Dec 2017 11:25:48 +0000 (03:25 -0800)]
LU-10210 tests: Add lustre_routes_conversion script in PATH

When running from the build tree, test_67 in conf-sanity.sh cannot
find the lustre_routes_conversion script because it's not in PATH.

Add the lustre_routes_conversion script in test-framework.sh
init_test_env() so that it is visible when running from the
build tree.

Test-Parameters: trivial testlist=conf-sanity
Change-Id: Id5ed93e0176f0b42aa704511c72fbe14902ea42f
Signed-off-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-on: https://review.whamcloud.com/30335
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10211 tests: conf_sanity test 102 don't call lustre_rmmod 01/30001/2
Oleg Drokin [Wed, 8 Nov 2017 05:29:15 +0000 (00:29 -0500)]
LU-10211 tests: conf_sanity test 102 don't call lustre_rmmod

When testing from the build tree, lustre/scripts is not in the
PATH, so we need to call $LUSTRE_RMMOD, which knows the actual
path to that script.

Change-Id: I7e8fa2b4ac8c2d03d1a9a6865c50dbae6c139a30
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/30001
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
3 years agoLU-10030 utils: add lfs tool to change/list project of file 90/29190/22
Wang Shilong [Mon, 16 Oct 2017 06:58:04 +0000 (14:58 +0800)]
LU-10030 utils: add lfs tool to change/list project of file

Currently, we are using chattr/lsattr as the project quota
interface, which has some problems:

1) The client side needs a patched e2fsprogs or the latest upstream
   e2fsprogs.
2) Project quota will no longer be osd-ldiskfs only but will cover ZFS
   too, and ZFS users might dislike depending on an ldiskfs tool.
3) Customers argue chattr might be a little dangerous.

So this patch adds native lfs tools for project quota.
usage: project [-p id] [-s] [-r] <file|directory..>
          set project ID and/or inherit flag for specified
          file(s) or directory.
       project [-d|-r [-0]] <file|directory...>
          list project ID and flags on file(s) or directory,
          print outliers
       project -c [-d|-r [-p id] [-0]] <file|directory..>
          check project ID and flags on file(s) or directory,
          print outliers
       project -C [-r] [-k] <file|directory..>
          clear the project inherit flag and ID on the file
          or directory

Test-Parameters: testlist=sanity-quota,sanity-quota,sanity-quota,\
    sanity-quota clientdistro=el7 serverdistro=el7 \
    ostfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I45960fb8fbd12e22a654792fba517896c0447447
Reviewed-on: https://review.whamcloud.com/29190
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10192 lfsck: verify agent entry 85/29985/7
Fan Yong [Wed, 29 Nov 2017 04:27:12 +0000 (12:27 +0800)]
LU-10192 lfsck: verify agent entry

Originally, we only supported agent entries for the ldiskfs backend,
and osd-ldiskfs maintains agent entries only for cross-MDT
directories, NOT for cross-MDT regular files. So if someone creates
a cross-MDT hard link or renames a regular file across MDTs, then
the related object becomes invisible to userspace when the MDT is
mounted directly as 'ldiskfs'.

On the other hand, old ZFS based MDT also did not support
agent entry. When upgraded from the old ZFS based device,
or migrated from old ldiskfs based MDT to new ZFS based MDT,
some (or all) agent entries need to be created.

So we enhance the namespace LFSCK logic to check whether the
agent entry is properly set up or not. If not, the LFSCK will
trigger the lower layer agent entry verification mechanism via a
set xattr operation.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I0aa83aff8b39b894dbde19f573c078faf0ef249c
Reviewed-on: https://review.whamcloud.com/29985
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9796 kernel: improve metadata performance for RHEL7 76/28276/9
Wang Shilong [Thu, 21 Dec 2017 07:02:20 +0000 (15:02 +0800)]
LU-9796 kernel: improve metadata performance for RHEL7

Port following upstream patch for RHEL7:

commit de92c8caf16ca84926fa31b7a5590c0fb9c0d5ca
Author: Jan Kara <jack@suse.cz>
Date:   Mon Jun 8 12:46:37 2015 -0400

    jbd2: speedup jbd2_journal_get_[write|undo]_access()

    jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are
    frequently called for buffers that are already part of the running
    transaction - most frequently it is the case for bitmaps, inode table
    blocks, and superblock. Since in such cases we have nothing to do, it is
    unfortunate we still grab reference to journal head, lock the bh, lock
    bh_state only to find out there's nothing to do.

    Improving this is a bit subtle though since until we find out journal
    head is attached to the running transaction, it can disappear from under
    us because checkpointing / commit decided it's no longer needed. We deal
    with this by protecting journal_head slab with RCU. We still have to be
    careful about journal head being freed & reallocated within slab and
    about exposing journal head in consistent state (in particular
    b_modified and b_frozen_data must be in correct state before we allow
    user to touch the buffer).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
From 087ffd4eae9929afd06f6a709861df3c3508492a Mon Sep 17 00:00:00 2001
Date: Fri, 4 Dec 2015 12:29:28 -0500
Subject: [PATCH] jbd2: fix null committed data return in undo_access

     introduced jbd2_write_access_granted() to improve write|undo_access
     speed, but missed to check the status of b_committed_data which caused
     a kernel panic on ocfs2.
     ...

     Fixes: de92c8caf16c("jbd2: speedup jbd2_journal_get_[write|undo]_access()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
These patches improve Lustre file create performance by 10% and
file unlink performance by 17%.

Change-Id: I8082be396209d8f658e3265cedf32670e15a53f5
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/28276
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9543 ofd: fiemap deadlock 24/27224/4
Andriy Skulysh [Mon, 22 May 2017 12:43:19 +0000 (15:43 +0300)]
LU-9543 ofd: fiemap deadlock

lock_zero_regions() locks all zero regions by acquiring
a set of independent locks.
It can deadlock with a PW lock for the whole file from
a client.

In fact, it isn't required to have all zero regions locked
at once; we only need to force clients to flush data for
these regions.

Change-Id: Ib48e2bd9e6f715eb54a7821acde7b38b0de6650c
Seagate-bug-id: MRP-4393
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: http://es-gerrit.xyus.xyratex.com:8080/15734
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@seagate.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@seagate.com>
Tested-by: Elena V. Gryaznova <elena.gryaznova@seagate.com>
Reviewed-on: https://review.whamcloud.com/27224
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Artem Blagodarenko <c17828@cray.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8895 target: limit grant allocation 96/24096/4
Vladimir Saveliev [Fri, 15 Dec 2017 09:33:17 +0000 (12:33 +0300)]
LU-8895 target: limit grant allocation

tgt_grant_alloc() is missing a check for the amount of space already
granted to a client. If the client submits a number of RPCs
simultaneously while its grant is below its maximum, the server may
give the client substantially more grant than was requested in any
one RPC. With a decent number of clients, that may lead to ENOSPC long
before the disk is really out of space.

Limit grants given to a client to asked amount plus grants for 2 full
write RPCs.
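
One reading of this rule, as a Python sketch. This is not the
tgt_grant_alloc() code: the function name, its parameters, and the way
the cap is applied against the grant the client already holds are all
assumptions made for illustration.

```python
def grant_to_allocate(current_grant, want, rpc_grant, available):
    """Return how much additional grant to hand out for one request.

    current_grant: grant the client already holds
    want:          grant asked for in this RPC
    rpc_grant:     grant needed for one full write RPC (assumed known)
    available:     space the server can still give away
    """
    # Cap the client's total at the asked amount plus 2 full write RPCs.
    ceiling = want + 2 * rpc_grant
    if current_grant >= ceiling:
        return 0
    return min(want, ceiling - current_grant, available)
```

Under this reading, many simultaneous RPCs from one client stop
accumulating grant once the client holds the capped amount.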

A test to illustrate the issue is included.
The test needs to lower the debug level so that dd provides sufficient
I/O throughput.

Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Seagate-bug-id: MRP-4013
Change-Id: Ie6a8abbad28a06bc1d55ff2fd042b9664a29e9e4
Reviewed-on: https://review.whamcloud.com/24096
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-6609 test: wait for import state FULL 43/14843/11
Sergey Cheremencev [Wed, 18 Dec 2013 13:56:38 +0000 (17:56 +0400)]
LU-6609 test: wait for import state FULL

recovery-small 26a sometimes couldn't remove its sub-test dirs.
A decrease in the export count may be caused by network issues,
so the test now passes only when the import state becomes EVICTED.
At the end it waits for state FULL before removing the sub-test dirs.

Test-Parameters: trivial envdefinitions=SLOW=yes,ONLY=26a testlist=recovery-small

Change-Id: Ib6156f4761bc79d89b42654898b51cc86c2ef40a
Reviewed-on: http://es-gerrit.xyus.xyratex.com:8080/1277
Signed-off-by: Sergey Cheremencev <Sergey_Cheremencev@xyratex.com>
Xyratex-bug-id: MRP-1168
Tested-by: Jenkins
Tested-by: Elena Gryaznova <elena_gryaznova@xyratex.com>
Reviewed-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-by: Andrew Perepechko <andrew_perepechko@xyratex.com>
Reviewed-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-on: https://review.whamcloud.com/14843
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9019 llite: change lli_glimpse_time to ktime 01/30601/2
James Simmons [Tue, 19 Dec 2017 18:20:45 +0000 (13:20 -0500)]
LU-9019 llite: change lli_glimpse_time to ktime

Currently lli_glimpse_time is in jiffies, which can vary between
platforms. Migrate to ktime since we need sub-second time
resolution that is consistent across platforms. Replace the
last cfs_time_current_sec() with ktime_get_real_seconds().

Change-Id: I352c3adbd07d9dadb7e5dbe180447a1cb18a48d2
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30601
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-5637 tests: set filefrag blocksize to 1024 91/30391/3
Sergey Cheremencev [Wed, 6 Dec 2017 11:08:21 +0000 (14:08 +0300)]
LU-5637 tests: set filefrag blocksize to 1024

If the blocksize for filefrag is unspecified, it defaults to 1024 bytes.
On SL7, for example, the default blocksize is 4096 instead.
Set it to 1024 explicitly to get the same results everywhere.

Test-Parameters: trivial envdefinitions=ONLY=130 testlist=sanity
Change-Id: I550446c6c7c0b85aa769f0f8a7575a6d33b2dc4b
Cray-bug-id: MRP-2933
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-on: https://review.whamcloud.com/30391
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
3 years agoLU-10269 ldlm: allow trybits in waiting queue 43/30343/6
Mikhal Pershin [Sat, 2 Dec 2017 08:42:11 +0000 (11:42 +0300)]
LU-10269 ldlm: allow trybits in waiting queue

Lock trybits can be kept while a lock is waiting, and each new
lock filters the trybits of locks in the waiting queue. When the
lock is finally granted, the remaining trybits are added to the
granted bits. Therefore trybits can be granted for a blocking lock if
no other locks take these bits while the lock is waiting.
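
The bookkeeping can be pictured with plain bitmasks. This is a
simplified Python model, not the ldlm code: the class and function
names are hypothetical, and lock modes and conflict checks are ignored.

```python
class Lock:
    """A lock with required bits and optional 'try' bits."""

    def __init__(self, bits, trybits=0):
        self.bits = bits          # bits the lock must get
        self.trybits = trybits    # bits it would like to get as well
        self.granted = 0


def filter_trybits(waiting, new_lock_bits):
    """A new lock strips the bits it takes from every waiter's trybits."""
    for lock in waiting:
        lock.trybits &= ~new_lock_bits


def grant(lock):
    """On grant, whatever trybits survived are added to the granted bits."""
    lock.granted = lock.bits | lock.trybits
    lock.trybits = 0
    return lock.granted
```

For example, a waiter asking for bit 0b001 with trybits 0b110 ends up
with 0b011 if another lock took 0b100 while it was waiting.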

Test-Parameters: mdscount=1 mdtcount=1 testlist=racer,racer,racer,racer
Signed-off-by: Mikhal Pershin <mike.pershin@intel.com>
Change-Id: I775f776f4cf8b581e32e4a1585e862e1764b5bed
Reviewed-on: https://review.whamcloud.com/30343
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8626 hsm: expose the number of active hsm requests per type 36/30336/8
Quentin Bouget [Fri, 1 Dec 2017 11:41:44 +0000 (11:41 +0000)]
LU-8626 hsm: expose the number of active hsm requests per type

This patch creates 3 new proc files under the hsm directory:
 - archive_count
 - restore_count
 - remove_count

These should help monitor the coordinator's health and allow
policy engines to adapt their request flow.

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I30c9fb658e8c14a181b094b51408c92df609c3ca
Reviewed-on: https://review.whamcloud.com/30336
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8271 nodemap: wait before getting large conf if changed 81/26781/6
Kit Westneat [Fri, 21 Apr 2017 18:57:42 +0000 (14:57 -0400)]
LU-8271 nodemap: wait before getting large conf if changed

If a nodemap configuration spans multiple RPCs, it's possible for the
nodemap config to change between RPCs. Previously, the MGC would
immediately retry to get the nodemap config. This patch modifies the
behavior so the nodemap config lock is readded to the wait queue,
which will delay retrying by ~10s.

Test-Parameters: envdefinitions=SLOW=yes \
testlist=sanity-sec,sanity-sec,sanity-sec

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Ie4e70def712e5eaa38adecc450e39c0380e34b69
Reviewed-on: https://review.whamcloud.com/26781
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10321 lfsck: not start lfsck during umount 13/30513/2
Fan Yong [Wed, 13 Dec 2017 05:35:39 +0000 (13:35 +0800)]
LU-10321 lfsck: not start lfsck during umount

There is a race condition between lfsck_start and umount:
the LFSCK may be triggered just after the LFSCK was stopped
during umount of the target. Nobody will then stop the newly
started LFSCK, so the umount may be blocked.

This patch sets a flag on the lfsck instance during umount
that prevents any subsequent lfsck_start.
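
The guard can be sketched as below. This is a Python model of the idea
only: the class, attribute, and method names are hypothetical and do
not correspond to fields in the lfsck instance.

```python
import threading


class LfsckInstance:
    """Toy model: once umount has begun, start requests are refused."""

    def __init__(self):
        self._lock = threading.Lock()
        self.stopping = False   # set when umount begins
        self.running = False

    def start(self):
        with self._lock:
            if self.stopping:
                return False    # too late, the target is going away
            self.running = True
            return True

    def prepare_umount(self):
        with self._lock:
            # Set the flag and stop in one step so no start slips between.
            self.stopping = True
            self.running = False
```

Checking and setting the flag under the same lock is what closes the
window between "LFSCK stopped" and "target unmounted".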

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I779f862d4195d4289bb9dd96575cd7746ac4b35b
Reviewed-on: https://review.whamcloud.com/30513
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10321 lfsck: allow to stop the in-starting lfsck 20/30420/2
Fan Yong [Thu, 7 Dec 2017 07:36:47 +0000 (15:36 +0800)]
LU-10321 lfsck: allow to stop the in-starting lfsck

The LFSCK start logic holds li_mutex on the lfsck instance
during LFSCK start processing. The LFSCK stop logic also needs
to take li_mutex on the lfsck instance when stopping the LFSCK.
If someone triggers lfsck_stop (such as when unmounting the target)
before lfsck_start returns, then lfsck_stop will be blocked
on li_mutex. And if the li_mutex holder is itself blocked by
something else, for example because it is waiting for the LFSCK RPC
to be handled by a remote server (MDT/OST) while the connection or
remote server is not ready yet, then lfsck_stop stays blocked too.

To avoid such cascading blocking, the patch lets lfsck_stop
go ahead without taking li_mutex, so it can directly notify the
related LFSCK engines of the stop event even if the earlier
lfsck_start has not completed yet.
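
A small Python model of the approach, assuming the stop request is
signalled through an event that the starter polls. The class and its
members are hypothetical; only li_mutex is a name from the patch.

```python
import threading


class Lfsck:
    """Toy model: stop() never takes the mutex that start() holds."""

    def __init__(self):
        self.mutex = threading.Lock()          # stands in for li_mutex
        self.stop_event = threading.Event()    # set by stop()
        self.remote_ready = threading.Event()  # remote server reachable

    def start(self, poll=0.01):
        with self.mutex:
            # Wait for the remote side, but notice a stop request.
            while not self.remote_ready.is_set():
                if self.stop_event.wait(poll):
                    return "stopped"
            return "started"

    def stop(self):
        # No mutex here, so this cannot block behind a stuck start().
        self.stop_event.set()
```

A start() that is stuck waiting for a remote server returns as soon as
stop() is called, instead of holding up the caller of stop().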

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I6e168d955db33d74778142235a8ed2802d3577d9
Reviewed-on: https://review.whamcloud.com/30420
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10212 ldlm: fix prolong for destroyed lock 92/29992/6
Alexander Boyko [Wed, 8 Nov 2017 19:30:05 +0000 (14:30 -0500)]
LU-10212 ldlm: fix prolong for destroyed lock

For an I/O request, ofd_prolong_extent_locks uses
a fast path if the lock is found by handle. If the lock
has LDLM_FL_DESTROYED set, prolong should try the general path.

Without this, no lock was accounted for an I/O request with a
destroyed lock, and an ESTALE error occurred on the client:

operation ost_read to node x.x.x.x@o2ib failed: rc = -116
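
The intended control flow might look like the following Python sketch.
It is a model only: the Lock type, the DESTROYED constant, and the
overlap test are hypothetical simplifications of the real ldlm
structures.

```python
from dataclasses import dataclass

DESTROYED = 0x1  # hypothetical stand-in for LDLM_FL_DESTROYED


@dataclass
class Lock:
    handle: int
    start: int
    end: int
    flags: int = 0
    prolonged: bool = False


def prolong_extent_locks(locks, handle, start, end):
    """Return how many locks were prolonged for an I/O on [start, end]."""
    by_handle = {lock.handle: lock for lock in locks}
    lock = by_handle.get(handle)
    # Fast path: the lock named by the request is still usable.
    if lock is not None and not lock.flags & DESTROYED:
        lock.prolonged = True
        return 1
    # General path: prolong every live lock overlapping the extent.
    count = 0
    for lock in locks:
        if lock.flags & DESTROYED:
            continue
        if lock.start <= end and lock.end >= start:
            lock.prolonged = True
            count += 1
    return count
```

The key point is the fallback: a destroyed lock found by handle no
longer ends the search with nothing accounted.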

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I63e619d0330279bb2ae678ed98b1c0e899ad4e08
Reviewed-on: https://review.whamcloud.com/29992
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
3 years agoLU-5163 mdd: migrated entry may not exist 20/26620/6
Lai Siyao [Thu, 13 Apr 2017 09:54:53 +0000 (17:54 +0800)]
LU-5163 mdd: migrated entry may not exist

During dirent migration, we shouldn't assert file exists.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I11bbc5556007ec045b7a5d57a250981082ef6d70
Reviewed-on: https://review.whamcloud.com/26620
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-6142 gnilnd: handle LNet core typedef removal 74/30474/3
James Simmons [Fri, 22 Dec 2017 07:17:56 +0000 (02:17 -0500)]
LU-6142 gnilnd: handle LNet core typedef removal

The LNet layer has removed all the typedefs. Update the gnilnd
driver to handle the removals.

Test-parameters: trivial

Change-Id: I96377a977bc9ef689a0c48afb77106aa2a8993a1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30474
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9943 lnd: correct WR fast reg accounting 11/30311/4
Amir Shehata [Tue, 28 Nov 2017 04:18:02 +0000 (20:18 -0800)]
LU-9943 lnd: correct WR fast reg accounting

Ensure that enough WRs are allocated for the fast reg
case which needs two additional WRs per transfer:
the first for memory window registration and the second for
memory window invalidation.

Failure to allocate these causes the following problem:
mlx5_warn:mlx5_0:begin_wqe:4085(pid 9590): work queue overflow
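
The accounting amounts to simple arithmetic, sketched here in Python.
The function and parameter names are hypothetical, and the base number
of WRs per transfer is an input rather than a value taken from the
driver.

```python
def send_queue_wrs(concurrent_transfers, wrs_per_transfer, fast_reg):
    """Work requests to reserve on the send queue."""
    # Fast registration adds one WR to register and one to invalidate,
    # per transfer.
    extra = 2 if fast_reg else 0
    return concurrent_transfers * (wrs_per_transfer + extra)
```

Sizing the queue without the two extra WRs leaves it too small under
load, which is consistent with the overflow warning quoted above.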

Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Icf98b6bbb3d98fb29794173da84412070f13541b
Reviewed-on: https://review.whamcloud.com/30311
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10185 gnilnd: Change gnc_tx_bits to unsigned long 97/29897/4
Chuck Fossen [Wed, 6 Sep 2017 13:18:36 +0000 (08:18 -0500)]
LU-10185 gnilnd: Change gnc_tx_bits to unsigned long

gnc_tx_bits is declared as __u8. Change it to unsigned long.

The goal is to align gnc_tx_bits[] to a 32 bit boundary.
The initial declaration, I believe, was not a good choice.
Using gnc_tx_bits as a long makes more sense than as an 8 bit
value.
It is used by:
static inline int test_and_clear_bit(int nr, unsigned long *addr);
unsigned long find_next_zero_bit(unsigned long *addr, unsigned long
size, unsigned long offset);
static inline int test_and_set_bit(int nr, unsigned long *addr);
all of which take an unsigned long pointer.

Test-parameters: trivial

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I559e2a356182f253716d30f69bf675c485fe1b72
Reviewed-on: https://review.whamcloud.com/29897
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-10185 gnilnd: change default credits 96/29896/4
James Shimek [Fri, 14 Jul 2017 21:48:09 +0000 (16:48 -0500)]
LU-10185 gnilnd: change default credits

It has been found that a credit value of 64 reduces the likelihood of
file I/O induced congestion without degrading performance when a
many-router, single-compute pattern occurs.
Change the default compute credits to 64 based on this finding.

Test-parameters: trivial

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I0a1f46bdb2c327a7e6dc9eeb145fe1418d691da1
Reviewed-on: https://review.whamcloud.com/29896
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Shimek <knathrak@gmail.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>