Whamcloud - gitweb
fs/lustre-release.git
4 years ago LU-12131 tests: properly handle GSS in server failover 41/35041/6
Sebastien Buisson [Mon, 3 Jun 2019 14:30:50 +0000 (23:30 +0900)]
LU-12131 tests: properly handle GSS in server failover

In case of server failover, a number of aspects must be handled when
GSS based features (SSK or Kerberos) are activated:
- lsvcgssd daemon must be restarted;
- targets must be mounted with proper skpath option;
- permissions on keys must be adjusted.
When the service is initially started, all of that is managed in
setupall(). fail() and facet_failover() have to be improved to take
GSS aspects into account.

Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8db686f406629c7eec655496cf83c0539c1bfb33
Reviewed-on: https://review.whamcloud.com/35041
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-6142 tests: Fix style issues for cascading_rw.c 33/35433/3
Arshad Hussain [Fri, 5 Jul 2019 20:56:26 +0000 (02:26 +0530)]
LU-6142 tests: Fix style issues for cascading_rw.c

This patch fixes issues reported by checkpatch
for file lustre/tests/mpi/cascading_rw.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I033db1a7ee23042cc3ec0dedced48830e00d1230
Reviewed-on: https://review.whamcloud.com/35433
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11761 fld: let's caller to retry FLD_QUERY 62/34962/10
Hongchao Zhang [Thu, 4 Jul 2019 13:39:24 +0000 (09:39 -0400)]
LU-11761 fld: let's caller to retry FLD_QUERY

In fld_client_rpc(), if the FLD_QUERY request between MDTs fails
with -EWOULDBLOCK because the connection is lost, return -EAGAIN
to notify the caller to retry.

It also reverts the patch https://review.whamcloud.com/12586/, which
landed as b2_6_90_0-5-g6db07f0 to avoid returning -EAGAIN from
lod_object_init(), because that confused lu_object_find_at() (it
assumed the object was dying when it encountered -EAGAIN). In the
current Lustre version, lu_object_find_at() just returns the found
object and lets the caller check whether it is dying.

Fixes: 6db07f095fba ("LU-5871 lod: Do not return EAGAIN in lod_object_init")
Change-Id: Ie83ebfdae2bd50c96a59a065f7f3c3dcfad04e42
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34962
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12538 lod: Add missed qos_rr_init 90/35490/4
Patrick Farrell [Fri, 12 Jul 2019 19:24:30 +0000 (15:24 -0400)]
LU-12538 lod: Add missed qos_rr_init

The new lmv space hash code uses the lu_qos_rr struct, but
forgot to initialize it fully.  Specifically, the spin lock isn't
initialized, causing failures.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id410a8dc61980b880eab7e151b85c417a8439fd5
Reviewed-on: https://review.whamcloud.com/35490
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago New tag 2.12.56 2.12.56 v2_12_56
Oleg Drokin [Tue, 16 Jul 2019 17:08:26 +0000 (13:08 -0400)]
New tag 2.12.56

Change-Id: Iad3fd72a2720f5dba2b7dae667b088eb73199d6a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-7912 osc: Remove stale comment in osc_page_transfer_add 15/19115/8
Oleg Drokin [Thu, 24 Mar 2016 00:19:46 +0000 (20:19 -0400)]
LU-7912 osc: Remove stale comment in osc_page_transfer_add

Ever since LU-3321 the fields were not shared,
but then ops_inflight was completely deleted too.

Test-Parameters: trivial
Change-Id: I7b235f10ddb26a7ddbd4de7e502d33ee81a4f2e3
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/19115
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12410 tests: ignore return status of removal of 99-lustre-test.rules 98/35398/2
Oleg Drokin [Mon, 1 Jul 2019 17:46:59 +0000 (13:46 -0400)]
LU-12410 tests: ignore return status of removal of 99-lustre-test.rules

We don't care if we cannot remove it on the server side.

Change-Id: Id6833505c1e7cd39df9845a16b01c31c9d65e794
Test-Parameters: trivial
Test-Parameters: testlist=sanity-dlc
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35398
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
4 years ago LU-9862 lov: Correct bounds checking 84/28484/16
Nathaniel Clark [Thu, 4 Jul 2019 15:34:05 +0000 (11:34 -0400)]
LU-9862 lov: Correct bounds checking

When Dan Carpenter ran his smatch tool against the Lustre code
base, he encountered the following static checker warning:

lustre/lov/lov_ea.c:207 lsm_unpackmd_common()
warn: signed overflow undefined. 'min_stripe_maxbytes * stripe_count < min_stripe_maxbytes'

The current code doesn't properly handle the potential overflow
with the min_stripe_maxbytes * stripe_count. This fixes the
overflow detection for maxbytes in lsme_unpack().
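
The kind of check the warning asks for can be sketched as below. This is only an illustration under assumptions, not the code from lov_ea.c: the helper name stripe_maxbytes_total() and the clamp value SKETCH_MAX_BYTES are made up for the example. The point is that the bound is tested with a division before the multiplication, so the product is never computed when it would overflow.

```c
#include <stdint.h>

/* Hypothetical upper bound used when the product would not fit. */
#define SKETCH_MAX_BYTES INT64_MAX

/*
 * Multiply the per-stripe limit by the stripe count without relying on
 * signed overflow, which is undefined behaviour in C. The division test
 * runs before the multiplication, so the product is only computed when
 * it is known to fit.
 */
static int64_t stripe_maxbytes_total(int64_t min_stripe_maxbytes,
				     uint16_t stripe_count)
{
	/* Nothing to scale: return the input unchanged. */
	if (min_stripe_maxbytes <= 0 || stripe_count == 0)
		return min_stripe_maxbytes;

	/* The product would exceed the bound: clamp instead of wrapping. */
	if (min_stripe_maxbytes > SKETCH_MAX_BYTES / stripe_count)
		return SKETCH_MAX_BYTES;

	return min_stripe_maxbytes * stripe_count;
}
```

Testing "a * b < a" after the fact, as in the warning quoted above, cannot work for signed types because the compiler is allowed to assume the overflow never happens.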

Change-Id: I34646df3d59cadcb42a4defb58e16cb840acc99
Fixes: 3ddcf5b4a138 ("LU-7890 lov: Ensure correct operation for large object sizes")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/28484
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12495 test: shorten qos_maxage to update statfs 95/35395/4
Lai Siyao [Fri, 28 Jun 2019 19:07:09 +0000 (03:07 +0800)]
LU-12495 test: shorten qos_maxage to update statfs

sanity test_413b() should shorten lmv->desc.qos_maxage to update
cached statfs in time.

Test-Parameter: trivial envdefinitions=ONLY=413b
Test-Parameter: testlist=sanity,sanity,sanity,sanity,sanity,sanity

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I58672590669be5eaa5c0d679c51cb6cd533bc0d7
Reviewed-on: https://review.whamcloud.com/35395
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
4 years ago LU-12514 obdclass: Drop FS_HAS_FIEMAP compat macro 24/35424/2
Oleg Drokin [Fri, 5 Jul 2019 17:13:26 +0000 (13:13 -0400)]
LU-12514 obdclass: Drop FS_HAS_FIEMAP compat macro

FS_HAS_FIEMAP was some sort of old RHEL5 construct that's not
really important anymore.

Linux-commit: 5c8eae72ff46f0e70d03ae2e86e631d7a1ca4fe6

Change-Id: Ia9941fa32eeb6114f9404014b78c29465d524d07
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/35424
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
4 years ago LU-12368 obdclass: don't send multiple statfs RPCs 80/35380/5
Andreas Dilger [Sat, 29 Jun 2019 01:10:41 +0000 (19:10 -0600)]
LU-12368 obdclass: don't send multiple statfs RPCs

If multiple threads are racing to send a non-cached OST_STATFS or
MDS_STATFS RPC, this can cause a significant RPC storm for systems
with many-core clients and many OSTs due to amplification of the
requests, and the fact that STATFS RPCs are sent asynchronously.
Some logs have shown a few 96-core clients with 20k+ OST_STATFS RPCs
in flight concurrently, which can overload the network if many OSTs
are on the same OSS nodes (osc.*.max_rpcs_in_flight is per OST).

This was not previously a significant issue when core counts were
smaller on the clients, or with fewer OSTs per OSS.

If a thread can't use the cached statfs values, limit statfs to one
thread at a time, since the thread(s) would be blocked waiting for
the RPC replies anyway, which can't finish faster if many are sent.

Also add a llite.*.statfs_max_age parameter that can be tuned
to control the maximum age (in seconds) of the statfs cache.  This
can avoid overhead for workloads that are statfs heavy, given that
the filesystem is _probably_ not running out of space this second,
and even so "statfs" does not guarantee space in parallel workloads.
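
The combination of a maximum cache age and a one-thread-at-a-time refresh can be sketched roughly as follows. All names here (struct statfs_cache, statfs_cached(), fake_statfs_rpc()) are hypothetical and do not mirror the Lustre structures; the RPC is replaced by a stub that counts how often it is issued, and the clock is passed in as a parameter to keep the sketch deterministic.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical cache object; not a Lustre structure. */
struct statfs_cache {
	pthread_mutex_t lock;        /* serialises refreshes of the cache */
	int64_t         fetched_at;  /* time of the last refresh, seconds */
	uint64_t        free_blocks; /* cached value */
	int             valid;       /* non-zero once the cache is filled */
};

/* Stand-in for the expensive statfs RPC. */
typedef uint64_t (*statfs_rpc_fn)(void *arg);

/* Stub RPC for the example: counts calls and returns a fixed value. */
static uint64_t fake_statfs_rpc(void *arg)
{
	int *calls = arg;

	(*calls)++;
	return 1000;
}

static uint64_t statfs_cached(struct statfs_cache *c, int64_t now,
			      int64_t max_age, statfs_rpc_fn rpc, void *arg)
{
	uint64_t val;

	pthread_mutex_lock(&c->lock);
	/*
	 * Only the first thread through the lock finds the cache stale and
	 * issues the RPC. Threads queued behind it see the fresh value and
	 * return without sending anything.
	 */
	if (!c->valid || now - c->fetched_at > max_age) {
		c->free_blocks = rpc(arg);
		c->fetched_at = now;
		c->valid = 1;
	}
	val = c->free_blocks;
	pthread_mutex_unlock(&c->lock);

	return val;
}
```

With max_age set to 5, calls at times 100 and 103 share a single RPC, and a call at time 106 triggers a second one. Holding the lock across the RPC is acceptable in this sketch because, as noted above, the waiting threads would be blocked on their own replies anyway.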

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I95690e37aecbac08ac5768a5e5c6c70ca258a832
Reviewed-on: https://review.whamcloud.com/35380
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12481 osd-ldiskfs: allow full 64KB xattr size 52/35352/2
Andreas Dilger [Fri, 28 Jun 2019 00:53:04 +0000 (18:53 -0600)]
LU-12481 osd-ldiskfs: allow full 64KB xattr size

When the 'ea_inode' feature is enabled, allow the full 64KB xattr
size, since the xattr data is stored directly in the ea_inode data
blocks, while the ext4_xattr_entry and ext4_xattr_hdr structures are
stored separately in the parent inode or external xattr block.

This avoids errors on the client trying to set a full-sized xattr:

    setfattr: /mnt/lustre/f61.conf-sanity: Argument list too long

Fixes: 3ec712bd183a ("LU-11868 osd: Set max ea size to XATTR_SIZE_MAX")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1320c32af98ab0feeeb147d8dbbc66ec7d1b8e1f
Reviewed-on: https://review.whamcloud.com/35352
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-9859 libcfs: remove prng 51/35351/4
NeilBrown [Fri, 28 Jun 2019 00:59:07 +0000 (20:59 -0400)]
LU-9859 libcfs: remove prng

The cfs prng is no longer used, so discard it.

Linux-commit: 508d5e0f4d45a815a0759c6aea69fef62359cf74

Test-Parameters: trivial

Change-Id: If780690dba196c8bc5935be223a952442f6a33ae
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/35351
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12473 llapi: fix pool_list by path 20/35320/7
Dominique Martinet [Tue, 25 Jun 2019 16:11:30 +0000 (18:11 +0200)]
LU-12473 llapi: fix pool_list by path

lfs/lctl pool_list <fs_path> would print the FS path as the pool prefix.
Print the fsname properly instead.

Fixes: 8813fdf2a4f2 ("LU-5030 util: migrate liblustreapi to use cfs_get_paths()")
Change-Id: I016b794fabd3d161d4651b41989637aebdf31f36
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Reviewed-on: https://review.whamcloud.com/35320
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-4423 ptlrpc: remove inline on non-inlined functions. 96/35296/5
NeilBrown [Mon, 1 Jul 2019 17:10:26 +0000 (13:10 -0400)]
LU-4423 ptlrpc: remove inline on non-inlined functions.

These three functions are never inlined.  The only time they
are used, their address is taken, and this forces them to
be compiled as stand-alone functions.  So having the "inline"
declaration is misleading.

Move the functions to the place where their address is used, and
remove the 'inline' tag.
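
A small stand-alone illustration of the situation described above, using made-up names (sk_cmp_int(), sk_sort_ints()) rather than the ptlrpc functions: the helper is marked inline, but its only use is as a function pointer, so in practice the compiler still has to emit an ordinary out-of-line copy.

```c
#include <stdlib.h>

/*
 * Marked "inline", but only ever used through its address below, so the
 * keyword does not buy anything here and is misleading to readers.
 */
static inline int sk_cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (x > y) - (x < y);
}

/* The only use of sk_cmp_int(): handed to qsort() as a callback. */
static void sk_sort_ints(int *v, size_t n)
{
	qsort(v, n, sizeof(*v), sk_cmp_int);
}
```

Dropping the keyword and keeping the callback next to the code that registers it leaves the behaviour unchanged and makes the intent clearer.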

Test-Parameters: trivial

Change-Id: I0824d362f05e7397dd828f06464ad5aa156673d4
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35296
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12400 osd-ldiskfs: get rid of legacy 'get_ds()' function 38/35238/3
Shaun Tancheff [Mon, 10 Jun 2019 20:48:54 +0000 (15:48 -0500)]
LU-12400 osd-ldiskfs: get rid of legacy 'get_ds()' function

Get rid of the legacy 'get_ds()' function.

Every in-kernel use of this function defined it to KERNEL_DS
(either as an actual define, or as an inline function).

Linux-commit: 736706bee3298208343a76096370e4f6a5c55915

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I3aabe74802f1a953b140728f22c83125dae270c3
Reviewed-on: https://review.whamcloud.com/35238
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12423 lnet: honor discovery setting 92/35192/3
Amir Shehata [Tue, 11 Jun 2019 19:02:15 +0000 (12:02 -0700)]
LU-12423 lnet: honor discovery setting

If discovery is off do not push out any updates. This could be
triggered in case of a gateway's interface changing.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ie421318ae85b895327ec170ffb436c9b679f6866
Reviewed-on: https://review.whamcloud.com/35192
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10070 lod: SEL: interoperability support 44/35144/8
Vitaly Fertman [Wed, 5 Jun 2019 14:23:40 +0000 (17:23 +0300)]
LU-10070 lod: SEL: interoperability support

Add a new SEL magic for storing SEL components on disk.
It never gets out of LOD; it is converted to COMP_V1 on read/write.
As a result, an old MDS is not able to open SEL files.
At the same time, old clients are able to work with existing files
seamlessly. Old clients still lack lustre utils support, thus it is
not possible to create new SEL files, etc.

Cray-bug-id: LUS-2528
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Ib3f0b1402cd920e56beaad78a74da485bd7ad342
Reviewed-on: https://review.whamcloud.com/35144
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12355 llite: totalram_pages changed to atomic_long_t 25/35025/5
Shaun Tancheff [Sat, 15 Jun 2019 19:32:26 +0000 (14:32 -0500)]
LU-12355 llite: totalram_pages changed to atomic_long_t

Kernel 5.0 changed totalram_pages to atomic_long_t.
Provide an abstracted accessor now that totalram_pages
is a function.

Linux-commit: ca79b0c211af63fa3276f0e3fd7dd9ada2439839

Test-Parameters: trivial
Change-Id: I558e42074004e2ee5f79deea0d363e5bea332729
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-on: https://review.whamcloud.com/35025
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10070 utils: SEL: lfs find & getstripe support 09/34909/16
Vitaly Fertman [Mon, 3 Jun 2019 16:34:05 +0000 (19:34 +0300)]
LU-10070 utils: SEL: lfs find & getstripe support

The support includes:
- add --extension-size option to lfs find & getstripe along
  with +/- functionality;
- do not take the extension components into account for
  lfs find --stripe-size and --stripe-count;
- add appropriate tests;

Cray-bug-id: LUS-2528
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Ic3ad4c713e8c676998cf7d02b524ba266c992924
Reviewed-on: https://review.whamcloud.com/34909
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-8066 obd_type: use typ_kobj.name as typ_name 17/34717/7
NeilBrown [Thu, 27 Jun 2019 21:55:26 +0000 (17:55 -0400)]
LU-8066 obd_type: use typ_kobj.name as typ_name

As the kobject has a name (after kobject_add has been called),
we don't need to also store it in typ_name.
So use typ_kobj.name instead of typ_name.

This requires changing some "char *" to "const char *" as
typ_kobj.name is const.

Change-Id: Iaf0ef192e91ba1b4bd1c1b124dc1068de632d341
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/34717
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
4 years ago LU-12101 socklnd: fix infinite loop in ksocknal_push() 99/34499/4
NeilBrown [Thu, 27 Jun 2019 15:18:36 +0000 (11:18 -0400)]
LU-12101 socklnd: fix infinite loop in ksocknal_push()

If the list_for_each_entry() loop in ksocknal_push()
ever finds a match, then it will increment 'i', and the outer
loop will continue.

Once peer_off becomes larger than the number of matches
in a given chain, 'peer_ni' will be an invalid pointer, and
ksocknal_push_peer() will probably crash when called on it.

To abort the outer loop properly, we need to test if
"i <= peer_off", which indicates that all matching peers
have been found.

This bug can easily be reproduced by running
  lctl --net tcp push
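
The termination test can be illustrated with a simplified, user-space sketch. It assumes an array in place of the kernel hash chain and uses made-up names (struct sk_peer, sk_push_chain()); locking and reference counting are left out. Each pass of the outer loop rescans the chain looking for the peer_off-th match, and stops once the scan ends without reaching it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a peer on one hash chain. */
struct sk_peer {
	int id;
	int matches; /* non-zero if the peer matches the requested id */
};

/* Push every matching peer on the chain; returns how many were pushed. */
static int sk_push_chain(const struct sk_peer *chain, size_t len,
			 int *pushed_ids)
{
	int pushed = 0;
	int peer_off;

	for (peer_off = 0; ; peer_off++) {
		const struct sk_peer *found = NULL;
		int i = 0;
		size_t k;

		/* Rescan from the start and stop at the peer_off-th match. */
		for (k = 0; k < len; k++) {
			if (!chain[k].matches)
				continue;
			if (i++ == peer_off) {
				found = &chain[k];
				break;
			}
		}

		/*
		 * i exceeds peer_off only when the scan stopped on a match.
		 * Otherwise all matching peers were handled in earlier
		 * passes and 'found' must not be used.
		 */
		if (i <= peer_off)
			break;

		pushed_ids[pushed++] = found->id;
	}

	return pushed;
}
```

Testing "i == 0" instead would only stop on chains with no match at all; on any chain with at least one match, the loop would run past the last match with an invalid peer pointer, as described above.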

Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I9468214c7e1a0154213586cac0deb61afaa1d53d
Reviewed-on: https://review.whamcloud.com/34499
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10070 lod: SEL: Repeated components 86/33786/28
Patrick Farrell [Thu, 6 Jun 2019 16:34:21 +0000 (19:34 +0300)]
LU-10070 lod: SEL: Repeated components

This changes behavior when there is no next component to
spill over to.  Currently, in that case, we just extend
the current component regardless of available space.

Now, if there is no following component, we try repeating
the current component, creating a new component using the
current one as a striping template.  We try assigning
striping for this component.  If there is sufficient free
space on the OSTs chosen for this component, it is
instantiated and i/o continues there.

If there is not sufficient space on the OSTs chosen for the
new component, we remove it & extend the current component.

This is a behavioral improvement, with no implications for
layout sanity checking.

Cray-bug-id: LUS-2528
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: If9f364b4105a4bb892dfe673c724e04781c46336
Reviewed-on: https://review.whamcloud.com/33786
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10070 lod: SEL: Add FLR support 85/33785/24
Patrick Farrell [Tue, 11 Dec 2018 19:34:27 +0000 (13:34 -0600)]
LU-10070 lod: SEL: Add FLR support

Add FLR support for self-extending layouts.

The basic model is that when a layout intent would modify
an FLR replica, we first run the extent_update code to
perform any layout extent changes for self extending
layouts.

This treats the FLR operations (stale, resync, etc)
similarly to i/o, creating initialized layout where those
operations need it.  This makes the interaction between SEL
and FLR fairly simple.

Add FLR tests for self-extending layouts

Cray-bug-id: LUS-2528
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: Ia23df8e226955f64e9b19df993b66d2d4f820f33
Reviewed-on: https://review.whamcloud.com/33785
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10070 lod: SEL: Layout sanity checking 84/33784/25
Patrick Farrell [Mon, 3 Jun 2019 17:43:16 +0000 (20:43 +0300)]
LU-10070 lod: SEL: Layout sanity checking

Add layout sanity checking for self-extending layouts.

This requires a more complex method checking layouts,
checking the entire layout rather than just individual
components against their immediate neighbors.  This is
implemented with a layout sanity callback which walks the
layout.

Incorporate mirror sanity checks from lfs.c.

Cray-bug-id: LUS-2528
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I960a4ce96ace54f7fe4305b9197e27c540f81211
Reviewed-on: https://review.whamcloud.com/33784
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10070 lod: SEL: Implement basic spillover space 83/33783/24
Patrick Farrell [Mon, 3 Jun 2019 16:24:29 +0000 (19:24 +0300)]
LU-10070 lod: SEL: Implement basic spillover space

This is a barebones implementation of spillover space.
This allows the creation of extendable layout components,
which are normal layout components followed by "extension
components".  These extension components are never
initialized; instead, when i/o reaches them, the server
checks if there is sufficient space on the preceding normal
layout component, and if so, it modifies the extent of the
component to give space to the preceding component.

If there is not sufficient space on those OSTs, the special
extension space component can be removed, and the next
component of the layout is moved down to meet the existing
component.  This allows i/o to "spill over" to this new
layout component, which is expected to be on different
OSTs.
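
The two outcomes described above can be sketched with plain byte extents. This is a rough model under assumptions, not the LOD implementation: struct sk_comp, sk_handle_extension(), and the prev_has_space argument (standing in for the server's free-space check) are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout component described by a byte extent. */
struct sk_comp {
	uint64_t start;   /* first byte covered */
	uint64_t end;     /* one past the last byte covered */
	int      removed; /* set when the component is dropped */
};

/*
 * prev: normal component, ext: the extension component that follows it,
 * next: the component after the extension.
 */
static void sk_handle_extension(struct sk_comp *prev, struct sk_comp *ext,
				struct sk_comp *next, uint64_t ext_size,
				int prev_has_space)
{
	if (prev_has_space) {
		uint64_t room = ext->end - ext->start;
		uint64_t grow = ext_size < room ? ext_size : room;

		/* Hand part of the extension space to the preceding component. */
		prev->end += grow;
		ext->start += grow;
		if (ext->start == ext->end)
			ext->removed = 1;
	} else {
		/* Drop the extension and let the next component start earlier. */
		next->start = ext->start;
		ext->removed = 1;
	}
}
```

Starting from extents [0, 64), [64, 1024), and [1024, 4096) with an extension size of 128, the first case grows the normal component to [0, 192). The second case instead moves the start of the following component down to the old start of the extension, so i/o spills over to it.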

For multi-tiered systems, this makes it possible to avoid
the situation where an inner tier is low on space, but
an outer tier has plenty, and PFL files cannot use the
space in the outer tier because the inner is full.

This patch requires the next patch in the series for FLR
support, but does not depend on the other subsequent
patches in this series.

Cray-bug-id: LUS-2528
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I8f6c6df8ee155033d5278535dc456e604552e409
Reviewed-on: https://review.whamcloud.com/33783
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10070 lod: SEL: Add flag & setstripe support 82/33782/22
Patrick Farrell [Mon, 3 Jun 2019 16:22:09 +0000 (19:22 +0300)]
LU-10070 lod: SEL: Add flag & setstripe support

The self-extending layouts feature adds a new layout flag
and also uses the stripe size field differently.

This patch implements this basic functionality, to be used
in subsequent patches.

Cray-bug-id: LUS-2528
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I4392b70266cbab5bc8fa42afc3c360b954d5918a
Reviewed-on: https://review.whamcloud.com/33782
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10070 lod: SEL: split lod_del_layout 80/33780/22
Patrick Farrell [Wed, 19 Jun 2019 21:29:12 +0000 (00:29 +0300)]
LU-10070 lod: SEL: split lod_del_layout

SEL deletes layout components as part of other operations,
rather than only as a separate delete operation.

So we split lod_layout_del into a function that prepares
the layout and one that writes it out.  The prep_layout
function will be used in later patches.

Cray-bug-id:  LUS-2528
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I5d7270db9a8d9bc94f4571906ed9e2d4a17a151b
Reviewed-on: https://review.whamcloud.com/33780
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10070 tests: New test-framework functionality 78/33778/21
Patrick Farrell [Thu, 13 Jun 2019 22:30:24 +0000 (01:30 +0300)]
LU-10070 tests: New test-framework functionality

The self-extending layout tests will make heavy use of
setting OST low & high watermarks to simulate low/out of
space conditions.  To this end, add improved ways of
working with these to the test framework and use them in
sanity 253.

Add a component-count helper in sanity-pfl.

Fix pool_add_targets so it can add only 1 target.

Also move one helper from sanity to test-framework so it
can be used from sanity-pfl.

Cray-bug-id: LUS-2528
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I4e75c7db07b201ff2c410734d5daa991e74bd5c1
Reviewed-on: https://review.whamcloud.com/33778
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11762 ldlm: don't exceed hard timeout 08/34408/9
James Simmons [Thu, 4 Jul 2019 16:47:09 +0000 (12:47 -0400)]
LU-11762 ldlm: don't exceed hard timeout

For recovery, Lustre has both a soft timeout, obd_recovery_timeout,
and a hard timeout, obd_recovery_time_hard. When the recovery
timer is adjusted with the function extend_recovery_timer(), you
can control whether it takes into consideration what is left of the
timer. The current code is not very clear about its intent, so this
patch attempts to make the code understandable. No functional
change should happen with this patch.
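
One way to read the two modes and the hard limit is sketched below. This is an interpretation for illustration only, not the logic of extend_recovery_timer() itself: the function name, its parameters, and the exact rule used in each mode are assumptions.

```c
#include <stdint.h>

/*
 * timeout:    current recovery timeout, measured from recovery start
 * elapsed:    time already spent in recovery
 * dur:        requested extension
 * extend:     non-zero to add dur on top of the current timeout,
 *             zero to only guarantee that at least dur is left
 * hard_limit: the timeout may never exceed this value
 */
static int64_t sk_extend_recovery_timeout(int64_t timeout, int64_t elapsed,
					  int64_t dur, int extend,
					  int64_t hard_limit)
{
	int64_t left = timeout > elapsed ? timeout - elapsed : 0;

	if (extend)
		timeout += dur;          /* ignore what is left */
	else if (dur > left)
		timeout = elapsed + dur; /* top up to dur remaining */

	/* The hard timeout always wins. */
	if (timeout > hard_limit)
		timeout = hard_limit;

	return timeout;
}
```

With a 300 second timeout, 280 seconds elapsed, and a 50 second request, the non-extending mode yields 330 while the extending mode yields 350; either result is capped at hard_limit.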

Change-Id: I5701a6cd813ad64b6b4422863767af135eb8e94b
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34408
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12491 obdclass: add comment for rcu handling in lu_env_remove 47/35447/4
James Simmons [Mon, 8 Jul 2019 20:47:40 +0000 (16:47 -0400)]
LU-12491 obdclass: add comment for rcu handling in lu_env_remove

During the review it was asked why the RCU lock was dropped
in lu_env_remove(); the code itself doesn't explain why. Add
a comment giving the details of why RCU locking is not needed.

Test-parameters: trivial

Change-Id: I4fd761d2e1b4adad8e970904d56cdcd057dfe7d5
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35447
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11518 osc: cancel osc_lock list traversal once found the lock is being used 96/35396/4
Gu Zheng [Mon, 24 Jun 2019 05:51:20 +0000 (13:51 +0800)]
LU-11518 osc: cancel osc_lock list traversal once found the lock is being used

Currently, osc_ldlm_weigh_ast() walks the osc_lock list (oo_ol_list)
to check whether the target dlm lock is being used. Once a match is
found, it should skip the remaining entries and stop the traversal,
but it doesn't. Fix it here.
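
The fix amounts to leaving the loop on the first hit. A minimal sketch, assuming an array in place of the kernel list and hypothetical names (struct sk_osc_lock, sk_dlmlock_in_use()); the visited counter exists only to make the early exit observable:

```c
#include <stddef.h>

/* Hypothetical stand-in for an osc_lock that refers to a dlm lock. */
struct sk_osc_lock {
	const void *dlmlock;
};

/* Returns 1 if any entry uses dlmlock; *visited counts entries examined. */
static int sk_dlmlock_in_use(const struct sk_osc_lock *locks, size_t n,
			     const void *dlmlock, size_t *visited)
{
	int found = 0;
	size_t i;

	*visited = 0;
	for (i = 0; i < n; i++) {
		(*visited)++;
		if (locks[i].dlmlock != dlmlock)
			continue;

		found = 1;
		break; /* one user is enough; skip the remaining entries */
	}

	return found;
}
```

Without the break the result is the same, but every remaining entry is still examined.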

Change-Id: I2e64d2938cdacb6c5baca73647d74c9fb8f54f8c
Fixes: 3f3a24dc5d7d ("LU-3259 clio: cl_lock simplification")
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Reviewed-on: https://review.whamcloud.com/35396
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11213 uapi: change "space" hash type to hash flag 18/35318/3
Lai Siyao [Fri, 21 Jun 2019 06:47:42 +0000 (14:47 +0800)]
LU-11213 uapi: change "space" hash type to hash flag

Change LMV_HASH_TYPE_SPACE to LMV_HASH_FLAG_SPACE to make it flexible
for directory layout inheritance in the future. It is still exposed
to the user as hash type "space" in the "lfs setdirstripe" command to
make it easy to understand.
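
The difference between a hash type and a hash flag can be sketched with a bitmask. The constants below are illustrative values chosen for this example, not a copy of the Lustre uapi definitions, and the helper names are made up. A type occupies the low bits and is exclusive, while a flag lives in the high bits and can be combined with any type.

```c
#include <stdint.h>

/* Illustrative values only; see the Lustre uapi headers for the real ones. */
#define SK_HASH_TYPE_MASK  0x0000ffffu
#define SK_HASH_TYPE_FNV   0x00000002u
#define SK_HASH_FLAG_SPACE 0x08000000u

/* Extract the hash type, ignoring any flags. */
static uint32_t sk_hash_type(uint32_t hash)
{
	return hash & SK_HASH_TYPE_MASK;
}

/* Test for the "space" flag independently of the hash type. */
static int sk_hash_is_space(uint32_t hash)
{
	return (hash & SK_HASH_FLAG_SPACE) != 0;
}
```

A value such as SK_HASH_TYPE_FNV | SK_HASH_FLAG_SPACE keeps a real hash function for name lookup while still marking the directory as space-balanced, which would not be expressible if "space" were itself a type.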

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ifa302204ed62dff8cc9d12fdc1f9ea86f8491d40
Reviewed-on: https://review.whamcloud.com/35318
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11264 llapi: clean up llapi_search_tgt() code 92/35092/5
Andreas Dilger [Fri, 7 Jun 2019 03:39:36 +0000 (21:39 -0600)]
LU-11264 llapi: clean up llapi_search_tgt() code

Clean up llapi_search_tgt() and helper functions llapi_search_ost()
and llapi_search_mdt() to set errno on return.

Add man pages for all of these functions.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieb2e93208fbc1b1492f632d8ce1383ca9fdec5f2
Reviewed-on: https://review.whamcloud.com/35092
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11285 mdt: improve IBITS lock definitions 45/35045/6
Andreas Dilger [Mon, 3 Jun 2019 18:21:53 +0000 (12:21 -0600)]
LU-11285 mdt: improve IBITS lock definitions

Move MDS_INODELOCK_* flags into a named enum, and add the definitions
for the newer flags into wirecheck/wiretest to ensure consistency.

Rename MDS_INODELOCK_MAXSHIFT to MDS_INODELOCK_NUMBITS to hold current
number of lockbits, rather than one less than the number of lockbits,
since the only two places that use it expect it to be one larger than
it is.  Fix uses of MDS_INODELOCK_NUMBITS to be number of locks.  This
does not change the value of MDS_INODELOCK_FULL, which is used in the
protocol to exchange supported lock bits between client and server.
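
The relationship between the old and new macros can be sketched as follows. The enum is an illustrative four-bit subset with made-up SK_ names, not the full set of IBITS lock bits; it only shows why a "number of bits" constant is the natural loop bound and why the full mask is unchanged by the rename.

```c
/* Illustrative subset; not the complete Lustre IBITS lock definitions. */
enum sk_inodelock {
	SK_INODELOCK_LOOKUP = 1 << 0,
	SK_INODELOCK_UPDATE = 1 << 1,
	SK_INODELOCK_OPEN   = 1 << 2,
	SK_INODELOCK_LAYOUT = 1 << 3,
};

/* New style: the number of lock bits. */
#define SK_INODELOCK_NUMBITS  4
/* Old style: the highest shift, one less than the number of bits. */
#define SK_INODELOCK_MAXSHIFT (SK_INODELOCK_NUMBITS - 1)
/* Mask of all supported bits; identical under either definition. */
#define SK_INODELOCK_FULL     ((1u << SK_INODELOCK_NUMBITS) - 1)

/* Count how many of the supported lock bits are set. */
static int sk_count_lockbits(unsigned int bits)
{
	int i, n = 0;

	for (i = 0; i < SK_INODELOCK_NUMBITS; i++)
		if (bits & (1u << i))
			n++;

	return n;
}
```

With the old constant, both the loop bound and the mask would have to be written with "MAXSHIFT + 1", which is the off-by-one burden the rename removes.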

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0c2985bcc602b7182d5db2cf8d590923be2cab07
Reviewed-on: https://review.whamcloud.com/35045
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12491 obdclass: use RCU to release lu_env_item 38/35038/5
Alex Zhuravlev [Mon, 3 Jun 2019 02:52:42 +0000 (05:52 +0300)]
LU-12491 obdclass: use RCU to release lu_env_item

Use RCU to release lu_env_item, as rhashtable_lookup_fast() is
lockless and can find just-released objects.

Fixes: aa82cc8361 ("obdclass: put all service's env on the list")
Change-Id: I6ed8ccc5bb5b192eed90b55103d11b822ec90692
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35038
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11893 lnet: consolidate secondary IP address handling 93/34993/8
James Simmons [Tue, 2 Jul 2019 13:11:15 +0000 (09:11 -0400)]
LU-11893 lnet: consolidate secondary IP address handling

The last piece of code with broken secondary IP address
support is lnet_parse_ip2nets(). We could fix it the same
way o2iblnd and socklnd were fixed, but since the LND drivers
have already resolved those issues, we can instead move the
handling out of the LND drivers into one place in the LNet core.
To do this we introduce struct lnet_inetdev, which is
a collection of data that the current LNet layer requires.
The new function lnet_inet_enumerate() is used to collect
this information.

Change-Id: I0c532caa3cf6b2178eb1ab65e55e5883d408a185
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-1732 utils: allow ldiskfs wide striping with ea_inode 15/4315/16
Patrick Farrell [Sat, 25 May 2019 13:48:06 +0000 (09:48 -0400)]
LU-1732 utils: allow ldiskfs wide striping with ea_inode

Format the MDT filesystem with the "ea_inode" option by default
so that files can have more than 160 stripes, and large xattrs
over one filesystem block in size (normally 4096 bytes).

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I71589c13e59a9d13db3bf075282cf6334b86be30
Reviewed-on: https://review.whamcloud.com/4315
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12462 llite: Remove old fsync versions 39/35339/2
Patrick Farrell [Thu, 27 Jun 2019 15:41:29 +0000 (11:41 -0400)]
LU-12462 llite: Remove old fsync versions

The old two-arg and three-arg versions of fsync were last
used in 2.6.3X kernels, and we no longer support those.

Remove them to clean up our fsync defines, and extend the debug
message to capture all the arguments of the current fsync.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3ab18fa5a7a4a6d3b0714570a8ff3f2ad820e5ad
Reviewed-on: https://review.whamcloud.com/35339
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9971 lnet: fix peer ref counting 46/35446/2
Amir Shehata [Mon, 8 Jul 2019 19:51:05 +0000 (12:51 -0700)]
LU-9971 lnet: fix peer ref counting

Exit from the loop after peer ref count has been incremented
to avoid wrong ref count.

The code makes sure that a peer is queued for discovery at most
once if discovery is disabled. This is done to use discovery
as a standard ping for gateways which do not have the discovery
feature or have discovery disabled.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I2cc4c8f9d780f5c438d9b51bb2d1106fec553f39
Reviewed-on: https://review.whamcloud.com/35446
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoRevert "LU-11760 ofd: formatted OST recognition change" 88/35388/4
Andreas Dilger [Sun, 30 Jun 2019 07:54:13 +0000 (07:54 +0000)]
Revert "LU-11760 ofd: formatted OST recognition change"

This is causing conf-sanity test_69 failures in LU-12404 due to
the increased limit on the number of objects precreated after
recovery.  The issue will be fixed in a different way.

This reverts commit d07d9c5ed0aa1d6614944c7d1e0ca55cba301dc4.

Change-Id: I437889f20699207fa15eff6685b0992292555f19
Reviewed-on: https://review.whamcloud.com/35388
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12494 kernel: kernel update SLES12 SP4 [4.12.14-95.19.1] 91/35391/2
Jian Yu [Sun, 30 Jun 2019 07:01:18 +0000 (00:01 -0700)]
LU-12494 kernel: kernel update SLES12 SP4 [4.12.14-95.19.1]

Update SLES12 SP4 kernel to 4.12.14-95.19.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT=103a

Change-Id: I6a101dc2637945192cf8aca661e23c3bccb47609
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35391
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12458 kernel: kernel update RHEL7.6 [3.10.0-957.21.3.el7] 68/35268/5
Jian Yu [Wed, 19 Jun 2019 20:19:53 +0000 (13:19 -0700)]
LU-12458 kernel: kernel update RHEL7.6 [3.10.0-957.21.3.el7]

Update RHEL7.6 kernel to 3.10.0-957.21.3.el7.

Test-Parameters: clientdistro=el7.6 serverdistro=el7.6

Change-Id: I10c5708a412022e6066ff7ce2801375049f188d9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35268
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12456 kernel: kernel update RHEL 8.0 [4.18.0-80.4.2.el8_0] 67/35267/4
Jian Yu [Wed, 19 Jun 2019 18:42:53 +0000 (11:42 -0700)]
LU-12456 kernel: kernel update RHEL 8.0 [4.18.0-80.4.2.el8_0]

Update RHEL 8.0 kernel to 4.18.0-80.4.2.el8_0 for Lustre client.

Change-Id: I1ff433f6ef3433dae54def0e89bc035d25ff15a4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35267
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12474 tests: Do not run check_progs_installed for racer 27/35327/2
Oleg Drokin [Wed, 26 Jun 2019 02:22:23 +0000 (22:22 -0400)]
LU-12474 tests: Do not run check_progs_installed for racer

it's run from within racer so racer is already there for sure

Change-Id: Ifd78cd051842c9663130b650c6e35d60332250e7
Test-Parameters: testlist=racer
Test-Parameters: trivial
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35327
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 years agoLU-9792 tests: t-f do not create empty files for ZFS 05/34105/7
Alex Zhuravlev [Thu, 24 Jan 2019 13:24:26 +0000 (16:24 +0300)]
LU-9792 tests: t-f do not create empty files for ZFS

as zpool doesn't like empty device files.

Change-Id: I3a224bf2e60c6f20e13013caf827fe29641a8c5c
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34105
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12395 build: build mpitests for el8 74/35374/9
Minh Diep [Fri, 28 Jun 2019 21:28:39 +0000 (14:28 -0700)]
LU-12395 build: build mpitests for el8

RHEL8 has rpm-mpi-hooks, which requires binaries
to be in the specific MPI bin directory to generate the
correct Requires.

See https://fedoraproject.org//wiki/Changes/RpmMPIReqProv
and https://fedoraproject.org/wiki/Packaging:MPI

Test-Parameters: trivial clientdistro=el8 serverdistro=el7.6 testgroup=regression-mpi

Change-Id: Id9fa50e15b48b9da846083b9e9cd894ad1eac967
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35374
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12483 tests: fix sanity test 60h running conditions 55/35355/7
Oleg Drokin [Fri, 28 Jun 2019 01:59:37 +0000 (21:59 -0400)]
LU-12483 tests: fix sanity test 60h running conditions

The test is supposed to run in DNE mode on 2.12.52 or above,
but the conditions are somehow reversed.

Change-Id: I322941a6098b0dbfbabe2f5c70f40f8e81d1bbab
Test-Parameters: trivial
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35355
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoRevert "LU-12328 flr: preserve last read mirror" 50/35450/2
Oleg Drokin [Tue, 9 Jul 2019 17:27:41 +0000 (17:27 +0000)]
Revert "LU-12328 flr: preserve last read mirror"

This is causing somewhat frequent crashes tracked in
LU-12525

This reverts commit 810f2a5fef577b4f0f6a58ab234cf29afd96c748.

Change-Id: If7604ad4ca1d4ddc63a20fa2ec7d9467ee7bb5f9
Reviewed-on: https://review.whamcloud.com/35450
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12495 obdclass: generate random u64 max correctly 94/35394/5
Lai Siyao [Fri, 28 Jun 2019 17:36:27 +0000 (01:36 +0800)]
LU-12495 obdclass: generate random u64 max correctly

Generate random u64 max number correctly, and make it an obdclass
function lu_prandom_u64_max().
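
As a rough illustration of what "a random u64 below a maximum" involves, here is a user-space sketch. It is not the lu_prandom_u64_max() implementation: ex_rand32() is a hypothetical xorshift32 generator standing in for the kernel PRNG, and ex_rand64_max() is an assumed helper name. The sketch only shows one way to cover bounds wider than 32 bits by combining two 32-bit draws.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit PRNG (xorshift32) standing in for the kernel's. */
static uint32_t ex_rand32(uint32_t *state)
{
	uint32_t x = *state;

	assert(x != 0);		/* xorshift32 must not be seeded with 0 */
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Return a pseudo-random value in [0, bound), or 0 when bound is 0. */
static uint64_t ex_rand64_max(uint32_t *state, uint64_t bound)
{
	uint64_t r;

	if (bound == 0)
		return 0;

	if (bound >> 32) {
		/* Bound needs more than 32 bits: combine two 32-bit draws. */
		r = (uint64_t)ex_rand32(state) << 32;
		r |= ex_rand32(state);
		return r % bound;	/* slight modulo bias, fine for a sketch */
	}

	/* Bound fits in 32 bits: scale one draw with multiply-and-shift. */
	return ((uint64_t)ex_rand32(state) * bound) >> 32;
}
```

The two branches matter because a single 32-bit draw can never reach the upper part of a range that is wider than 32 bits.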

Fixes: 7a707d4828 (libcfs: replace cfs_rand() with prandom_u32_max())

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2b94a42b42539be319f358d7af2a82dc8b26117c
Reviewed-on: https://review.whamcloud.com/35394
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9859 libcfs: remove libcfs_debug_vmsg2 25/35225/3
NeilBrown [Thu, 13 Jun 2019 19:10:21 +0000 (15:10 -0400)]
LU-9859 libcfs: remove libcfs_debug_vmsg2

Now that libcfs_debug_vmsg2 has no (external) users, we can remove it.
It is used to implement libcfs_debug_msg(), so simply move
the body of the function (suitably modified) into that one caller.

Linux-commit: d42a3aded317c97594c19995879999428de53c46

Signed-off-by: NeilBrown <neilb@suse.com>
Change-Id: I80d24abcc23a8f6e2f8b995d1337ba5038318d5a
Reviewed-on: https://review.whamcloud.com/35225
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12328 flr: preserve last read mirror 11/35111/2
Jinshan Xiong [Sat, 8 Jun 2019 05:34:03 +0000 (22:34 -0700)]
LU-12328 flr: preserve last read mirror

This patch preserves the mirror that has been read successfully
so that all subsequent I/O can take advantage of it and avoid
trying to read unavailable OSTs.

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Change-Id: I806f936340db7c73228048edf21d5ecbed4b3c6c
Reviewed-on: https://review.whamcloud.com/35111
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9971 lnet: use after free in lnet_discover_peer_locked() 44/28944/5
Olaf Weber [Tue, 12 Sep 2017 12:07:50 +0000 (14:07 +0200)]
LU-9971 lnet: use after free in lnet_discover_peer_locked()

When the lnet_net_lock is unlocked, the peer attached to an
lnet_peer_ni (found via lnet_peer_ni::lpni_peer_net->lpn_peer)
can change, and the old peer deallocated. If we are really
unlucky, then all the churn could give us a new, different,
peer at the same address in memory.

Change the reference counting on the lnet_peer lp so that it
is guaranteed to be alive when we relock the lnet_net_lock for
the cpt. When the reference count is dropped lp may go away if
it was unlinked, but the new peer is guaranteed to have a
different address, so we can still correctly determine whether
the peer changed and discovery should be redone.

Signed-off-by: Olaf Weber <olaf.weber@hpe.com>
Change-Id: Ia44dce20074b27ec0e77d7c1908c6a44ec73d326
Reviewed-on: https://review.whamcloud.com/28944
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-930 doc: update MAINTAINERS file 41/35241/4
Andreas Dilger [Mon, 17 Jun 2019 03:23:50 +0000 (05:23 +0200)]
LU-930 doc: update MAINTAINERS file

Update Patrick's email address, and remove John Hammond from the list.

Add some existing files to existing subsystems, and add a few new
subsystems for recently-landed code.

Re-order a couple of the entries to be in alphabetical order.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I297ec31abf65d54dc363b5d5fd460b7b3e3ebbe5
Reviewed-on: https://review.whamcloud.com/35241
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12045 tests: honor EXCEPT tests when using ONLY list 38/34938/9
James Nunez [Wed, 22 May 2019 16:22:19 +0000 (10:22 -0600)]
LU-12045 tests: honor EXCEPT tests when using ONLY list

The Lustre test framework allows a user to specify a subset
of tests to run using the ONLY parameter or --only flag.
The test framework also allows the user to specify a list of
tests to skip using the EXCEPT or ALWAYS_EXCEPT parameters.
By default, if the ONLY parameter or --only flag is used,
the EXCEPT and ALWAYS_EXCEPT lists are ignored.

Add a flag to auster, -H, and an environment variable,
HONOR_EXCEPT, to skip the tests on the ALWAYS_EXCEPT,
EXCEPT and SLOW lists when using the ONLY/--only parameter.
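
The selection rule described above can be modelled as a small decision function. This is only a model of the behaviour stated in this message, written in C for illustration; the real logic lives in the shell test framework, and the struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one test's selection state. */
struct ex_test_sel {
	bool only_set;		/* ONLY/--only was given at all */
	bool in_only;		/* this test is on the ONLY list */
	bool in_except;		/* on ALWAYS_EXCEPT, EXCEPT or SLOW lists */
	bool honor_except;	/* HONOR_EXCEPT set, or auster -H used */
};

/* Decide whether a test runs under the rules described above. */
static bool ex_should_run(const struct ex_test_sel *t)
{
	assert(t != NULL);

	if (t->only_set) {
		if (!t->in_only)
			return false;
		/* With ONLY, skip lists apply only when honored. */
		return !(t->honor_except && t->in_except);
	}

	/* Without ONLY, the skip lists always apply. */
	return !t->in_except;
}
```

For example, a test that is on both the ONLY and EXCEPT lists runs by default, and is skipped once honor_except is set.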

Test-Parameters: trivial
Test-Parameters: envdefinitions=ONLY="40-43" testlist=sanity
Test-Parameters: envdefinitions=ONLY="40-43" austeroptions=-H testlist=sanity
Test-Parameters: envdefinitions=SLOW="no",ONLY="27" testlist=sanity
Test-Parameters: envdefinitions=SLOW="no",ONLY="27" austeroptions=-H testlist=sanity
Test-Parameters: envdefinitions=SLOW="yes",ONLY="27" testlist=sanity
Test-Parameters: envdefinitions=SLOW="yes",ONLY="27" austeroptions=-H testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I173e48e1d2dc3b404d148146639a13148bc48a3d
Reviewed-on: https://review.whamcloud.com/34938
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11518 ptlrpc: don't reset lru_resize on idle reconnect 85/35285/7
Andriy Skulysh [Tue, 11 Jun 2019 14:44:32 +0000 (17:44 +0300)]
LU-11518 ptlrpc: don't reset lru_resize on idle reconnect

ptlrpc_disconnect_idle_interpret() clears imp_remote_handle,
so reconnect has pcaa_initial_connect set to 1.

Update only changed ns_connect_flags bits.

Fixes: 5a6ceb664f0 ("LU-7236 ptlrpc: idle connections can disconnect")
Change-Id: I2368708b6381c1d772c47dc6e61c8fb39a14a2cc
Cray-bug-id: LUS-7471
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/35285
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11444 ptlrpc: Add increasing XIDs CONNECT2 flag 13/35113/3
Andriy Skulysh [Fri, 28 Sep 2018 11:02:22 +0000 (14:02 +0300)]
LU-11444 ptlrpc: Add increasing XIDs CONNECT2 flag

This patch reserves the OBD_CONNECT2 flag
for increasing XIDs.

Change-Id: Id825ce1c86a345884c9d4cd90cd2e13839a268f0
Cray-bug-id: LUS-6272
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/35113
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
4 years agoLU-4423 lustre: don't declare extern variables in C files. 94/35294/3
NeilBrown [Sun, 23 Jun 2019 13:34:50 +0000 (09:34 -0400)]
LU-4423 lustre: don't declare extern variables in C files.

'extern' declarations should only appear in .h files.
All these names are declared in .h files as needed,
and these duplicate declarations in .c files can
be removed.

Test-Parameters: trivial

Change-Id: Ic563789f350fd21fd033f1d3c49cdac2125b86c5
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35294
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
4 years agoLU-4423 mgc: remove llog_process_lock 93/35293/2
NeilBrown [Sun, 23 Jun 2019 13:28:24 +0000 (09:28 -0400)]
LU-4423 mgc: remove llog_process_lock

This mutex is never used, so remove it.

Test-Parameters: trivial

Change-Id: I53ce68beafcf8b895c1db138a6eff2ade2994832
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35293
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-1538 tests: standardize test script init - lnet 54/35254/2
Andreas Dilger [Mon, 17 Jun 2019 20:59:39 +0000 (14:59 -0600)]
LU-1538 tests: standardize test script init - lnet

Standardize the initial Lustre test script initialization for clarity
and consistency.

The LUSTRE path is already normalized in init_test_env(), so this
doesn't need to be done in the caller.  Use $(...) subshells instead
of `...` in the affected lines.  Remove NAME, PATH, SAVE_PWD,
RLUSTRE, MULTIOP variable initialization, since it is already done in
init_test_env() or not needed in the test script.

Move all definitions of ALWAYS_EXCEPT and SLOW to after
init_test_env() and init_logging() and call build_test_filter()
immediately after the ALWAYS_EXCEPT and SLOW definitions.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-zfs-part-3
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I491cf3e282a1f31fb8c87440554445d708c6da1e
Reviewed-on: https://review.whamcloud.com/35254
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10931 tests: stop running recovery-small 136 25/35325/5
James Nunez [Tue, 25 Jun 2019 21:46:15 +0000 (15:46 -0600)]
LU-10931 tests: stop running recovery-small 136

recovery-small test 136 hangs on MDS mount with multiple
MDSs.  We need to stop running this test until we find a
solution for this problem.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=recovery-small

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I5364efc8b6fe1ea9b3c3920121a8fab6ac03bd05
Reviewed-on: https://review.whamcloud.com/35325
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
4 years agoLU-12349 llite: console message for disabled flock call 86/34986/13
Li Xi [Thu, 13 Jun 2019 03:27:50 +0000 (11:27 +0800)]
LU-12349 llite: console message for disabled flock call

When the flock option is disabled on a Lustre client, any call to
flock() or lockf() returns failure.
For applications that don't print a proper error message, it is
hard to tell that the root cause is the missing flock option on the
Lustre file system. Thus this patch prints the following error message
to the tty that calls flock()/lockf():

"Lustre: flock disabled, mount with '-o [local]flock' to enable"

This message is printed for each file descriptor no more than
once to avoid a message flood.

In order to do so, this patch adds support for CDEBUG_LIMIT(D_TTY),
which prints the message to the tty. When using this macro, please
note that the line must end with "\r\n". Otherwise, a
message like "format at $FILE:$LINO:$FUNC doesn't end in '\r\n'"
will be printed to the system log as a warning.
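
The two rules in this message, "at most once per file descriptor" and "the format must end in \r\n", can be sketched in user space as follows. The names are hypothetical and this is not the CDEBUG_LIMIT(D_TTY) implementation, only an illustration of the checks it describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-descriptor state: remembers whether we warned. */
struct ex_fd_state {
	bool warned;
};

/* True when a format string ends with the "\r\n" a tty line needs. */
static bool ex_ends_with_crlf(const char *fmt)
{
	size_t len;

	assert(fmt != NULL);
	len = strlen(fmt);
	return len >= 2 && fmt[len - 2] == '\r' && fmt[len - 1] == '\n';
}

/* True only the first time it is called for a given descriptor. */
static bool ex_should_warn(struct ex_fd_state *fd)
{
	assert(fd != NULL);
	if (fd->warned)
		return false;
	fd->warned = true;
	return true;
}
```

A caller would check ex_should_warn() before printing, so repeated flock() failures on the same descriptor stay quiet after the first message.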

Change-Id: I4eeb3ea219848ebbbca9d14e3d2b8a23237105b5
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/34986
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10070 lod: layout_del memleak 70/35270/4
Vitaly Fertman [Wed, 19 Jun 2019 21:28:40 +0000 (00:28 +0300)]
LU-10070 lod: layout_del memleak

A component may have been declared on a pool or on a specific set
of OSTs, but only the first component is INIT'ed at the beginning.
lod_layout_del should take care of these allocations independently
of the existing striping.

Cray-bug-id: LUS-2528
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I7069154ec6b3e64cd945231829c19c4c6920c030
Reviewed-on: https://review.whamcloud.com/35270
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11775 obdclass: protect imp_sec using rwlock_t 61/33861/7
Li Dongyang [Fri, 14 Dec 2018 02:16:39 +0000 (13:16 +1100)]
LU-11775 obdclass: protect imp_sec using rwlock_t

We've seen spinlock contention on imp_lock in
sptlrpc_import_sec_ref(), introduce a new rwlock
imp_sec_lock to protect imp_sec instead of using imp_lock.

This patch also removes imp_sec_mutex from obd_import,
which is not needed, to avoid confusion between
imp_sec_lock/mutex.

Test-Parameters: testlist=sanity-sec envdefinitions=SHARED_KEY=true
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I3be3a6d1225888134bf7dd58e4b05b864f8415b4
Reviewed-on: https://review.whamcloud.com/33861
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-6142 ldlm: Fix style issues for ldlm_lib.c 95/34495/10
Arshad Hussain [Thu, 21 Mar 2019 01:10:18 +0000 (06:40 +0530)]
LU-6142 ldlm: Fix style issues for ldlm_lib.c

This patch fixes issues reported by checkpatch for
file lustre/ldlm/ldlm_lib.c

Change-Id: I7555974ea311af4e3d4eb64a24f810e4699a8690
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34495
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-6142 obdecho: Fix style issues for echo.c 90/34490/5
Arshad Hussain [Wed, 20 Mar 2019 15:08:32 +0000 (20:38 +0530)]
LU-6142 obdecho: Fix style issues for echo.c

This patch fixes issues reported by checkpatch
for file lustre/obdecho/echo.c

Change-Id: I4a828de8c0761d5e9058beae38009055f1030c81
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34490
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-6142 utils: Fix style issues for mount_utils.c 39/34439/7
Arshad Hussain [Sat, 9 Mar 2019 20:55:10 +0000 (02:25 +0530)]
LU-6142 utils: Fix style issues for mount_utils.c

This patch fixes issues reported by checkpatch
for file lustre/utils/mount_utils.c

Change-Id: Ic60df860a15d8b66ec37c979eec66b2bc0c52dea
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34439
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
4 years agoLU-11867 osd-ldiskfs: FID in LMA mismatch won't block create 52/34052/5
Lai Siyao [Mon, 7 Jan 2019 03:37:48 +0000 (11:37 +0800)]
LU-11867 osd-ldiskfs: FID in LMA mismatch won't block create

Sometimes two OST objects may be mapped to the same inode, so the
second object's FID mismatches the FID in the inode LMA. In this case,
if the inode has not been written yet, it's safe to set the object
inode to NULL to let it create a new inode.

Another case is when the mapped inode doesn't exist; it's also safe
to not initialize the inode and return 0, so that create can succeed.

Add sanity-scrub.sh 4d for this.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ic84cdeaca2ea202ab0c01a0075a2f9ee8627f508
Reviewed-on: https://review.whamcloud.com/34052
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-4423 ptlrpc: make ptlrpc_bulk_frag_ops always const. 95/35295/2
NeilBrown [Sun, 23 Jun 2019 13:41:25 +0000 (09:41 -0400)]
LU-4423 ptlrpc: make ptlrpc_bulk_frag_ops always const.

There is one place where a non-const pointer to this struct
exists, and that causes a cast to be required.

Make it always const, and discard the cast.

Test-Parameters: trivial

Change-Id: Ic48ff6a1f759abdddc75298feee2b68c60bf3e7b
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35295
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-4423 lprocfs: use log2.h macros instead of shift loop. 74/35274/3
NeilBrown [Thu, 20 Jun 2019 03:43:30 +0000 (23:43 -0400)]
LU-4423 lprocfs: use log2.h macros instead of shift loop.

These shift loops seem to be trying to avoid doing a
multiplication.
The same effect can be achieved more transparently using
rounddown_pow_of_two().  Even though there is a multiplication
in the C code, the resulting machine code just does a single shift.

As rounddown_pow_of_two() is not defined for 0, and as we cannot be
positive that blk_size is non-zero, use blk_size ?: 1.
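
The equivalence claimed here can be checked with a user-space model. The loop form below is an assumed reading of "shift loop" based on this description, not a quotation of the Lustre source, and the ex_* helpers are hypothetical stand-ins for the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that is <= n; as with the kernel macro, n != 0. */
static uint32_t ex_rounddown_pow_of_two(uint32_t n)
{
	assert(n != 0);
	/* Smear the highest set bit into every lower position ... */
	n |= n >> 1;
	n |= n >> 2;
	n |= n >> 4;
	n |= n >> 8;
	n |= n >> 16;
	/* ... then drop everything below the highest bit. */
	return n - (n >> 1);
}

/* Shift-loop form: one left shift of blocks per halving of blk_size. */
static uint64_t ex_scale_loop(uint64_t blocks, uint32_t blk_size)
{
	while ((blk_size >>= 1) != 0)
		blocks <<= 1;
	return blocks;
}

/* Multiplication form, guarding against blk_size == 0. */
static uint64_t ex_scale_mul(uint64_t blocks, uint32_t blk_size)
{
	/* Portable spelling of the GNU C shorthand "blk_size ?: 1". */
	return blocks * ex_rounddown_pow_of_two(blk_size ? blk_size : 1);
}
```

Both forms leave blocks unchanged when blk_size is 0 or 1, which is why substituting 1 for a zero blk_size preserves the old behaviour.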

Change-Id: Ie4cec1ca3f30617df0022a9c0dd80fe7e755ed64
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35274
Tested-by: Jenkins
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-4423 ldlm: discard varname in ldlm_pool. 73/35273/3
NeilBrown [Thu, 20 Jun 2019 03:38:40 +0000 (23:38 -0400)]
LU-4423 ldlm: discard varname in ldlm_pool.

This allocated buffer serves no purpose.
A constant string is copied into it, it is passed to some
function which copies it out again, then the buffer is freed.
Instead, we can pass the constant string to that function.

Change-Id: I83ea8fc839da3933ebc34756fc5c06b86427ce7c
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35273
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
4 years agoLU-9859 libcfs: replace cfs_rand() with prandom_u32_max() 72/35272/3
NeilBrown [Thu, 20 Jun 2019 02:57:13 +0000 (22:57 -0400)]
LU-9859 libcfs: replace cfs_rand() with prandom_u32_max()

All occurrences of
   cfs_rand() % X
are replaced with
   prandom_u32_max(X)

cfs_rand() is a simple Linear Congruential PRNG. prandom_u32_max()
is at least as random, is seeded with more randomness, and uses
cpu-local state to avoid cross-cpu issues.
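
To make the replacement concrete, the sketch below contrasts the two ways of mapping a raw 32-bit draw into [0, X). The multiply-and-shift form is how prandom_u32_max() is generally understood to scale its result; treat that as an assumption here, since this message does not quote it. The ex_* helpers are hypothetical and take the raw draw as a parameter so that the mapping itself is easy to inspect.

```c
#include <assert.h>
#include <stdint.h>

/* Modulo mapping, as in "cfs_rand() % X"; range must not be 0. */
static uint32_t ex_map_modulo(uint32_t raw, uint32_t range)
{
	assert(range != 0);
	return raw % range;
}

/* Multiply-and-shift mapping of a 32-bit draw into [0, range). */
static uint32_t ex_map_scaled(uint32_t raw, uint32_t range)
{
	return (uint32_t)(((uint64_t)raw * range) >> 32);
}
```

The scaled form needs no division and returns 0 for a range of 0 rather than faulting, while the modulo form depends on the low-order bits of the draw.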

This is the first step in replacing the libcfs PRNG with
the standard Linux PRNG.

Linux-commit: bcfa98a50763a0f781a8441d1994ae1456816219

Change-Id: I63679c269b72f4c4860cb3a47225178edf2d7892
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/35272
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-1538 tests: standardize test script init - dne-part-4 55/35255/3
Andreas Dilger [Mon, 17 Jun 2019 22:20:41 +0000 (16:20 -0600)]
LU-1538 tests: standardize test script init - dne-part-4

Standardize the initial Lustre test script initialization for
clarity and consistency.

The LUSTRE path is already normalized in init_test_env(), so
this doesn't need to be done in the caller.  Use $(...) subshells
instead of `...` in the affected lines.  Remove NAME, CHECKSTAT,
TMP, SAVE_PWD, SRCDIR, PATH, MULTIOP, SETUP, CLEANUP variable
initialization, since it is already done in init_test_env() or
not needed in the test script.  Remove all calls to get_lustre_env()
in the test scripts since this is called in init_test_env().

Move all definitions of ALWAYS_EXCEPT and SLOW to after
init_test_env() and init_logging() and call build_test_filter()
immediately after the ALWAYS_EXCEPT and SLOW definitions.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-4
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I9d2a7e6bedd2d66e5ee564405b86b6206226769f
Reviewed-on: https://review.whamcloud.com/35255
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12446 tests: Fix use of undefined variable under sanityn.sh/16b 46/35246/5
Arshad Hussain [Mon, 17 Jun 2019 18:49:02 +0000 (00:19 +0530)]
LU-12446 tests: Fix use of undefined variable under sanityn.sh/16b

In sanityn.sh test 16b, the variable STRIPE_BYTES is used to set
the block size in the 'dd' command. However, STRIPE_BYTES is
never defined, which results in the 'dd' command failing silently.
Although this does not affect the outcome of the test, the 'dd'
command used this way is a no-op.

This patch fixes use of undefined variable under sanityn.sh:16b
by replacing variable STRIPE_BYTES with variable stripe_size.

Test-Parameters: trivial testlist=sanityn
Change-Id: I8bc947d93e339573759d5d37e800aae9bc3b4b18
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/35246
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 years agoLU-12424 lnet: prevent loop in LNetPrimaryNID() 91/35191/6
Amir Shehata [Tue, 11 Jun 2019 18:25:27 +0000 (11:25 -0700)]
LU-12424 lnet: prevent loop in LNetPrimaryNID()

If discovery is disabled locally or at the remote end, then attempt
discovery only once. Do not update the internal database when
discovery is disabled and do not repeat discovery.

This change prevents LNet from getting hung waiting for
discovery to complete.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4543b0f71e6cf297a1a5f058ebcc6bf74b8ac328
Reviewed-on: https://review.whamcloud.com/35191
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11673 tests: add space before ']' in test-framework 79/35079/4
James Nunez [Thu, 6 Jun 2019 13:48:13 +0000 (07:48 -0600)]
LU-11673 tests: add space before ']' in test-framework

The test command '[' expects spaces before all arguments
including the closing ']'.

Add a space before the closing ']' in the function
print_summary() in test-framework.sh.

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: If2365cb5f2b9c003949c6224997644c61341fe35
Reviewed-on: https://review.whamcloud.com/35079
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
4 years agoLU-8066 obd: collect all resource releasing for obj_type. 16/34716/4
NeilBrown [Wed, 5 Jun 2019 16:36:23 +0000 (12:36 -0400)]
LU-8066 obd: collect all resource releasing for obj_type.

Now that obj_type is managed as a kobject, move all
the freeing and deregistering into class_sysfs_release().

Change-Id: I784287ea17e010206b5fa256c7a224d01085be92
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/34716
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 scrub: handle s_uuid change to uuid_t 89/34689/8
James Simmons [Thu, 20 Jun 2019 20:57:01 +0000 (16:57 -0400)]
LU-11838 scrub: handle s_uuid change to uuid_t

The 4.12 kernel changed the s_uuid field in struct super_block from
a character array to a uuid_t. ldiskfs uses its own s_uuid field in
struct ext4_super_block, and that field is still a char array rather
than a uuid_t. There is an ongoing effort in the Linux kernel to move
to uuid_t, so I suspect this will change in the future. Given that,
change all the character arrays used for uuid handling in the
scrubbing code to uuid_t. Change osd-ldiskfs to use the struct
super_block uuid, which is equivalent to the s_es version, to handle
the uuid_t changes now.

Change-Id: I40643d342b5bc17a6ef922e99b3e8524930822de
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34689
Tested-by: Jenkins
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-9859 libcfs: replace cfs_get_random_bytes calls with get_random_byte() 34/35234/4
NeilBrown [Sat, 15 Jun 2019 01:21:15 +0000 (21:21 -0400)]
LU-9859 libcfs: replace cfs_get_random_bytes calls with get_random_byte()

The cfs_get_random_bytes() interface adds nothing of value
to get_random_bytes() (which it uses internally).  So just use the
standard interface.

Linux-commit: e904f839cdb04d1b314753a83a6e58146e315c66

Change-Id: I48e153d7658f0f616afe4e884faeb09c2dbdcd03
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/35234
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9859 libcfs: deprecate libcfs_debug_vmsg2 24/35224/4
NeilBrown [Fri, 14 Jun 2019 01:56:34 +0000 (21:56 -0400)]
LU-9859 libcfs: deprecate libcfs_debug_vmsg2

Since 2.6.36, Linux' vsprintf has supported %pV
which supports "recursive sprintf" - exactly the task
that libcfs_debug_vmsg2 aims to provide.

Instead of calling libcfs_debug_vmsg2(), we can put the fmt and
args in a 'struct va_format', and pass the address of that structure
to the "%pV" format.

So do this to remove all users of libcfs_debug_vmsg2().
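
As a rough userspace analogue of the approach described above, the idea of
carrying a format string together with its va_list can be sketched as
follows. The kernel's struct va_format and "%pV" are not available outside
the kernel, so struct va_format_sketch, emit_with_prefix() and debug_msg()
are hypothetical names used only for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct va_format. */
struct va_format_sketch {
	const char *fmt;
	va_list *va;
};

/* Write "<prefix>: <expanded message>" into buf, expanding the nested format. */
static int emit_with_prefix(char *buf, size_t len, const char *prefix,
			    struct va_format_sketch *vaf)
{
	va_list copy;
	int used, rc;

	used = snprintf(buf, len, "%s: ", prefix);
	if (used < 0 || (size_t)used >= len)
		return used;

	/* Work on a copy so the caller's va_list stays usable. */
	va_copy(copy, *vaf->va);
	rc = vsnprintf(buf + used, len - used, vaf->fmt, copy);
	va_end(copy);

	return rc < 0 ? rc : used + rc;
}

/* Variadic front end: bundle fmt and args, then hand them on as one object. */
int debug_msg(char *buf, size_t len, const char *prefix, const char *fmt, ...)
{
	struct va_format_sketch vaf;
	va_list args;
	int rc;

	va_start(args, fmt);
	vaf.fmt = fmt;
	vaf.va = &args;
	rc = emit_with_prefix(buf, len, prefix, &vaf);
	va_end(args);
	return rc;
}
```

In the kernel the second formatting step is done by vsprintf itself when it
sees "%pV", so no intermediate helper like emit_with_prefix() is needed.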

Linux-commit: 0fe922e1eca8e2850f0e6c535a14ba7414ca73c2

Change-Id: I6952ca8fdb619423639734aab1a30f4635b089cc
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35224
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12226 doc: recommend e2fsprogs 1.45.2.wc1 02/35202/2
Li Dongyang [Wed, 12 Jun 2019 06:17:00 +0000 (16:17 +1000)]
LU-12226 doc: recommend e2fsprogs 1.45.2.wc1

Update the recommended e2fsprogs version to 1.45.2.wc1

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I0eea35a0bcc24a6109d0c90254e9d071f70e8e9d
Reviewed-on: https://review.whamcloud.com/35202
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
4 years agoLU-8066 tests: use lod / osp tunables on servers 85/35185/2
James Simmons [Tue, 11 Jun 2019 15:39:17 +0000 (11:39 -0400)]
LU-8066 tests: use lod / osp tunables on servers

Before the Lustre 2.4 OSD work, the lov and osc code was used on
both servers and clients. The OSD layer work introduced the new
lod and osp layers, which are server specific. To avoid
breakage, symlinks were created that went from the lod / osp to
the lov / osc directories in the proc tree on the server side.

It has been a very long time since that change so we can now
safely start to unwind that handling. The first step taken here
is to migrate the maloo test from using lov / osc for the server
tunables to using lod / osp instead.

Change-Id: I9dd562cd74d68aaa0226d5ab93042b52193604a1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35185
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12420 utils: llog_reader handles uninitialized mountdata 78/35178/2
Li Xi [Tue, 11 Jun 2019 12:28:30 +0000 (20:28 +0800)]
LU-12420 utils: llog_reader handles uninitialized mountdata

When reading a mountdata file that has never been used, the "llog_reader
CONFIGS/mountdata" command crashes with the following output:

Header size : 500170753
Time : Wed Sep  4 00:57:37 6869
Number of records: 65534
Target uuid :
-----------------------
Segmentation fault

After applying this patch, llog_reader prints the following message
and quits in this case:

Header size : 500170753
Time : Wed Sep  4 00:57:37 6869
Number of records: 65534
Target uuid :
-----------------------
uninitialized llog record at index 0

Change-Id: I25147f7fd09c6d59ff0049bdb20ac1979cf43ee4
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/35178
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12420 utils: llog_reader handles uninitialized llog properly 77/35177/3
Li Xi [Tue, 11 Jun 2019 08:26:22 +0000 (16:26 +0800)]
LU-12420 utils: llog_reader handles uninitialized llog properly

When reading an empty LLOG, llog_reader would crash because
the record count is zero. For example, "llog_reader CONFIGS/nodemap"
on an MGS without nodemap configuration fails with:

llog_reader: Error allocating -16 bytes for recs_buf: Cannot allocate memory (12)
llog_reader: Could not pack buffer.: Cannot allocate memory (12)

After applying this patch, llog_reader prints the following message
and quits if the LLOG is uninitialized:

uninitialized llog: zero record number
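
A minimal sketch of this kind of guard, assuming a buffer whose size is
derived from the record count. alloc_rec_index() is a hypothetical helper,
not the actual llog_reader code; it only shows the count being checked
before any size is computed from it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an index of record pointers for a log
 * that claims to hold recs_num records. */
void **alloc_rec_index(uint32_t recs_num)
{
	void **index;

	if (recs_num == 0) {
		/* Nothing to index: report it instead of computing a bogus size. */
		fprintf(stderr, "uninitialized llog: zero record number\n");
		errno = EINVAL;
		return NULL;
	}

	index = calloc(recs_num, sizeof(*index));
	if (index == NULL)
		errno = ENOMEM;
	return index;
}
```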

Change-Id: I87246672e9fc992c99126134236c2e8d304df74b
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/35177
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12382 llite: fix deadloop with tiny write 58/35058/7
Wang Shilong [Tue, 4 Jun 2019 12:54:01 +0000 (20:54 +0800)]
LU-12382 llite: fix deadloop with tiny write

For a small write (<4K), we use the tiny write path, and
__generic_file_write_iter() is called to handle it.

On newer kernels (4.14 etc.), the function is exported and does
something like the following:

|->__generic_file_write_iter
  |->generic_perform_write()

If the iov_iter_count() passed in is 0, generic_perform_write() will
loop forever, because the number of bytes copied is always calculated
as 0.

The problem is that the VFS doesn't always skip zero-count IO before it
reaches the lower layer read/write hook, so we have to do it ourselves.

To fix this problem, always return 0 early if no real IO is
needed.
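
The early-return guard can be sketched in plain userspace C as below.
tiny_write_sketch() is a hypothetical stand-in that copies between ordinary
buffers; it is not the llite write hook and does not use iov_iter.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a write hook; dst and src are plain buffers. */
ssize_t tiny_write_sketch(char *dst, size_t dst_len, const char *src,
			  size_t count)
{
	/* Return early for zero-length IO rather than handing it to a
	 * lower layer that may not expect it. */
	if (count == 0)
		return 0;

	/* Clamp to the destination size, then do the actual copy. */
	if (count > dst_len)
		count = dst_len;
	memcpy(dst, src, count);
	return (ssize_t)count;
}
```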

Change-Id: I765a723da79eb5fd09317c3fad47fe479b1dd4fb
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35058
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12330 obdclass: allow per-session jobids. 95/34995/9
NeilBrown [Thu, 23 May 2019 00:15:43 +0000 (10:15 +1000)]
LU-12330 obdclass: allow per-session jobids.

Lustre includes a jobid in all RPC messages sent to the server.  This
is used to collect per-job statistics, where a "job" can involve
multiple processes on multiple nodes in a cluster.

Nodes in a cluster can be running processes for multiple jobs, so it
is best if different processes can have different jobids, and that
processes on different nodes can have the same job id.

The current mechanism for supporting this is to use an environment
variable which the kernel extracts from the relevant process's address
space.  Some kernel developers consider this an unacceptable design
choice, and the code is not likely to be accepted upstream.

This patch provides an alternate method, leveraging the concept of a
"session id", as set with setsid().  Each login session already gets a
unique sid which is preserved for all processes in that session unless
explicitly changed (with setsid(1)).
When a process in a session writes to
     /sys/fs/lustre/jobid_this_session
the string becomes the name for that session.
If jobid_var is set to "session", then the per-session jobid is used
for the jobid for all requests from processes in that session.

When a session ends, the jobid information will be purged within 5
minutes.
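
As an illustration of how a process might name its session, here is a
minimal sketch that writes a string to a control file. set_session_jobid()
is a hypothetical helper; the path is passed in as a parameter so the same
code can be tried against an ordinary file.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write jobid to the given control file; returns 0 or a negative errno. */
int set_session_jobid(const char *path, const char *jobid)
{
	size_t len = strlen(jobid);
	ssize_t written;
	int fd, rc = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, jobid, len);
	if (written < 0)
		rc = -errno;
	else if ((size_t)written != len)
		rc = -EIO;	/* treat a short write as an error */

	if (close(fd) < 0 && rc == 0)
		rc = -errno;
	return rc;
}
```

A job launcher could call
set_session_jobid("/sys/fs/lustre/jobid_this_session", name) once after
starting a new session, with jobid_var set to "session" as described above.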

Change-Id: I6fb1a75f8f60f824e402706de0b1439464bfa05c
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/34995
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12043 llite: improve single-thread read performance 95/34095/35
Wang Shilong [Mon, 21 Jan 2019 12:23:47 +0000 (20:23 +0800)]
LU-12043 llite: improve single-thread read performance

Here is the whole history:

Currently, for sequential read IO, we grow the window
size very quickly. Once we have cached @max_readahead_per_file
pages, for the following command:

  dd if=/mnt/lustre/file of=/dev/null bs=1M

we will do something like the following:
...
64M bytes cached.
fast io for 16M bytes
readahead extra 16M to fill up window.
fast io for 16M bytes
readahead extra 16M to fill up window.
....

In this way, we can only use fast IO for 16M bytes before
falling back to non-fast IO mode. This is also the reason
why increasing @max_readahead_per_file doesn't improve
performance, since this value only changes how much
data we cache in memory. During my testing, whatever
value I set, I could only get 2GB/s for single-thread
reads.

Actually, we can do better: once we have used more than
16M bytes of readahead pages, submit another readahead
request in the background. Ideally, we can then always
use fast IO.

Test                                                   Patched          Unpatched
dd if=file of=/dev/null bs=1M                          4.0G/s           1.9G/s
ior -np 192 r -t 1m -b 4g -F -e -vv -o /cache1/ior -k  11195.97 MB/sec  10817.02 MB/sec

Tested with OSS and client memory dropped before every run.
max_readahead_per_mb=128M, RPC size is 16M.
The dd file size is 400G, which is roughly double the memory size.

Change-Id: I9b6be078ca24c256198488a9c1635791dafbd7e7
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34095
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12043 llite,readahead: don't always use max RPC size 33/35033/4
Wang Shilong [Sun, 2 Jun 2019 15:17:26 +0000 (23:17 +0800)]
LU-12043 llite,readahead: don't always use max RPC size

Since 64M RPC landed, @PTLRPC_MAX_BRW_PAGES is 64M, and we
always use this maximum possible RPC size to check whether
we should avoid fast IO and trigger real context IO.

This is not good for the following reasons:

(1) Since the current default RPC size is still 4M,
most systems won't use 64M most of the time.

(2) The current default readahead size per file is still 64M,
which makes fast IO always run out of readahead pages
before the next IO. This defeats what users really want from
readahead: grabbing pages in advance.

To fix this problem, use 16M as a balanced value if the RPC size
is smaller than 16M. The patch also fixes the problem that
@ras_rpc_size could not grow, which is possible in the following
case:

1) Set RPC to 16M
2) Set RPC to 64M

With the current logic, ras->ras_rpc_size is kept at 16M, which is wrong.
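
One possible reading of the rule above, sketched in userspace C. Both
function names and the macros are hypothetical and the sizes are in bytes
rather than pages; this is not the llite readahead code, only an
illustration of a 16M floor combined with a cached value that is allowed to
follow the configured RPC size upwards.

```c
#define MIB_SKETCH		(1024UL * 1024UL)
#define RA_MIN_CHUNK_SKETCH	(16UL * MIB_SKETCH)	/* the 16M floor from the text */

/* Hypothetical: pick the readahead chunk from the configured RPC size. */
unsigned long ra_chunk_bytes(unsigned long rpc_bytes)
{
	return rpc_bytes < RA_MIN_CHUNK_SKETCH ? RA_MIN_CHUNK_SKETCH : rpc_bytes;
}

/* Refresh the cached per-file value so it can grow as well as shrink. */
void ras_refresh_rpc_size(unsigned long *ras_rpc_size, unsigned long rpc_bytes)
{
	unsigned long want = ra_chunk_bytes(rpc_bytes);

	if (*ras_rpc_size != want)
		*ras_rpc_size = want;
}
```

With this shape, setting the RPC size to 16M and then to 64M leaves the
cached value at 64M instead of stuck at 16M.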

Change-Id: Ida9f839f7c692cd88d32dc0909503f6ae991d909
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35033
Tested-by: Jenkins
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9010 lnet: Change static defines to use macro for module.c 32/33932/12
Arshad Hussain [Thu, 27 Dec 2018 12:26:17 +0000 (07:26 -0500)]
LU-9010 lnet: Change static defines to use macro for module.c

This patch replaces the statically defined mutexes in
lnet/lnet/module.c with the kernel-provided macro.

Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I59de4514dc332c3c59e0d816720a81394521881c
Reviewed-on: https://review.whamcloud.com/33932
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11775 osc: reduce lock contention in osc_unreserve_grant 58/33858/6
Li Dongyang [Fri, 14 Dec 2018 01:19:09 +0000 (12:19 +1100)]
LU-11775 osc: reduce lock contention in osc_unreserve_grant

In osc_queue_async_io(), the cl_loi_list_lock is acquired to reserve
and consume the grant and then released. Right after we expand the
extent, the same lock is taken again to unreserve the grant.
We can keep holding the spinlock until we are done with the grant to
improve the throughput.

mpirun  -np 32 /root/ior-openmpi/src/ior -w -t 1m -b 8g -F -e -vv
-o /scratch0/file -i 1
master:
Max Write: 13799.70 MiB/sec (14470.04 MB/sec)
master with 33858:
Max Write: 14339.57 MiB/sec (15036.13 MB/sec)

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Ic61af84c7b98b5a189d7adabe33ae687954b2ed4
Reviewed-on: https://review.whamcloud.com/33858
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12411 lnet: Do not allow gateways on remote nets 98/35198/2
Chris Horn [Tue, 11 Jun 2019 19:59:31 +0000 (14:59 -0500)]
LU-12411 lnet: Do not allow gateways on remote nets

A gateway needs to be reachable over some local interface.

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ib66d4f8fd48d8863097280c480648ab8e29d2767
Reviewed-on: https://review.whamcloud.com/35198
Tested-by: Jenkins
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10070 ldlm: layout lock fixes 32/35232/7
Vitaly Fertman [Tue, 11 Jun 2019 23:18:45 +0000 (02:18 +0300)]
LU-10070 ldlm: layout lock fixes

As the intent_layout operation becomes more frequent with SEL,
cancel existing layout locks in advance and reuse ELC to deliver
cancels to the MDS.

As clients are given LCK_EX layout locks, take this mode into
account as well in ldlm_lock_match.

Cray-bug-id: LUS-2528
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I1525153b3a07385fc17ef5416ded7b6d4378b2ec
Reviewed-on: https://review.whamcloud.com/35232
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12447 utils: specify correct size for lfs project buffer 57/35257/2
Wang Shilong [Tue, 18 Jun 2019 00:38:17 +0000 (20:38 -0400)]
LU-12447 utils: specify correct size for lfs project buffer

Environment:
Fedora release 28 (Twenty Eight)

gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20)
Copyright (C) 2018 Free Software Foundation, Inc.

Hit build failure:
lfs_project.c: In function ‘lfs_project_item_alloc’:
lfs_project.c:72:2: error: ‘strncpy’ specified bound 4096
equals destination size [-Werror=stringop-truncation]
  strncpy(lpi->lpi_pathname, pathname, sizeof(lpi->lpi_pathname));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
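
One common way to avoid this class of warning is sketched below; it is not
necessarily how this patch changes lfs_project.c. The struct, the macro and
project_item_set_path() are hypothetical names. The copy bound is one less
than the destination size and the terminator is written explicitly.

```c
#include <errno.h>
#include <string.h>

#define PATHNAME_MAX_SKETCH 4096	/* matches the 4096 bound in the warning */

struct project_item_sketch {
	char pathname[PATHNAME_MAX_SKETCH];
};

/* Copy pathname into item; returns 0 or -ENAMETOOLONG if it would not fit. */
int project_item_set_path(struct project_item_sketch *item,
			  const char *pathname)
{
	if (strlen(pathname) >= sizeof(item->pathname))
		return -ENAMETOOLONG;

	/* Leave room for the terminator and set it explicitly. */
	strncpy(item->pathname, pathname, sizeof(item->pathname) - 1);
	item->pathname[sizeof(item->pathname) - 1] = '\0';
	return 0;
}
```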

Test-Parameters: trivial testlist=sanity-quota
Change-Id: Ia6429c47391bf503546609ec6a262fe24664bdd4
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35257
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12392 utils: specify correct size for the buffer 70/35070/6
Alex Zhuravlev [Wed, 5 Jun 2019 13:38:27 +0000 (16:38 +0300)]
LU-12392 utils: specify correct size for the buffer

Otherwise gcc8 emits a warning that breaks the build.

Change-Id: I6a94c6cd63473df9fc88b1867bbda1353fa10247
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35070
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
4 years agoLU-12355 ldiskfs: bio_phys_segments symbol is not exported 40/35040/3
Shaun Tancheff [Sat, 8 Jun 2019 17:05:04 +0000 (12:05 -0500)]
LU-12355 ldiskfs: bio_phys_segments symbol is not exported

As of kernel 5.0, bio_phys_segments is not exported.
It is only used in one CDEBUG(D_INODE, ...) message, so use
bio->bi_phys_segments directly.

Linux-commit: 6c210aa596d0ecf6f3eea65c02ac807877385a18

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I19cf7cab86ccebe4fccf7a34a945a4150069d18b
Reviewed-on: https://review.whamcloud.com/35040
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-8066 obd: cleanup server sysfs symlinks handling 15/34715/14
James Simmons [Wed, 5 Jun 2019 16:32:38 +0000 (12:32 -0400)]
LU-8066 obd: cleanup server sysfs symlinks handling

Rename class_setup_tunables() to class_add_symlinks(). Move all the
special sysfs and debugfs symlink handling into the function
class_add_symlinks(). Now that the obd_type is created using sysfs
handling, we can complete the initialization if the real obd device is
registered later. For example, if lod is registered first, it
creates the "lov" obd_type. If the lov module is loaded
later, class_register_type() will use the "lov" obd_type created
by the lod module.

Change-Id: I754ec15a88458b170422b988d783efbe20141b87
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34715
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12333 ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro 90/35090/5
Vitaly Fertman [Wed, 5 Jun 2019 21:17:34 +0000 (00:17 +0300)]
LU-12333 ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro

Add the rq_no_reply flag to the DEBUG_REQ_FLAGS macro for debug purposes.
Also, add another debug message to check_write_rcs.

Test-Parameters: trivial
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I39ea7e9359a377ad46f7600edad14375f9935793
Reviewed-on: https://review.whamcloud.com/35090
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11264 llapi: reduce llapi_stripe_limit_check() overhead 91/35091/2
Andreas Dilger [Fri, 7 Jun 2019 03:34:49 +0000 (21:34 -0600)]
LU-11264 llapi: reduce llapi_stripe_limit_check() overhead

There is no need to check PAGE_SIZE in llapi_stripe_limit_check()
every time, since this cannot change between calls.

Always set errno if an error is returned.
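
Both points can be sketched as follows, under stated assumptions:
cached_page_size() and stripe_size_check() are hypothetical names, and the
"non-zero multiple of the page size" rule is only a placeholder for the
real limit checks. The page size is queried once and reused, and every
error return sets errno.

```c
#include <errno.h>
#include <unistd.h>

/* Query the page size once and reuse it on later calls. */
static long cached_page_size(void)
{
	static long page_size;

	if (page_size == 0)
		page_size = sysconf(_SC_PAGESIZE);
	return page_size;
}

/* Hypothetical check. Returns 0 on success; on failure returns -1 and
 * sets errno. */
int stripe_size_check(unsigned long long stripe_size)
{
	long page_size = cached_page_size();

	if (page_size <= 0) {
		errno = EINVAL;
		return -1;
	}
	if (stripe_size == 0 ||
	    stripe_size % (unsigned long long)page_size != 0) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}
```

The unsynchronized static is acceptable in this sketch because every caller
would store the same value; a library with stricter requirements could
initialize it once at load time instead.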

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib377c1cc734c9e683f75eeb509e220c4ea3ebbe5
Reviewed-on: https://review.whamcloud.com/35091
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11089 obd: rename lu_keys_guard to lu_context_remembered_guard 73/33673/8
NeilBrown [Wed, 5 Jun 2019 13:26:36 +0000 (09:26 -0400)]
LU-11089 obd: rename lu_keys_guard to lu_context_remembered_guard

The only remaining use of lu_keys_guard is to protect the
lu_context_remembers linked list, and always write_lock()
is used.
So rename it to reflect this, and change to a spinlock.
We move keys_fini() out of the locked region in
lu_context_fini() - once we have removed the context from
the lc_remembers list, there can no longer be a race.

Linux-commit: dc8aaaca0062878c2fbad9df1b9ac3e85cad8630

Change-Id: Id66930e073d5351a96b139f2fc1a8007841de728
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33673
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11213 lod: default LMV can't be deleted 07/35207/3
Lai Siyao [Wed, 12 Jun 2019 10:33:12 +0000 (18:33 +0800)]
LU-11213 lod: default LMV can't be deleted

When the 'space' hash type was introduced, default LMV deletion added
a check for the hash type, but it only checks whether the type is
'none', while by default it's 'fnv_1a_64', so the default LMV can't
be deleted.

Change check to !LMV_HASH_TYPE_SPACE and update test 413b.

Fixes: a24f6153292 ("LU-11213 dne: add new dir hash type 'space'")

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I88c630dc8d339ddeb9dc03d6f8987d8783062a13
Reviewed-on: https://review.whamcloud.com/35207
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>