Whamcloud - gitweb
fs/lustre-release.git
6 years agoLU-9950 build: add support for Ubuntu(debian) arm64 70/28870/2
Gu Zheng [Wed, 6 Sep 2017 03:14:35 +0000 (21:14 -0600)]
LU-9950 build: add support for Ubuntu(debian) arm64

Add arm64 to the supported architecture list in the Debian control file.

Change-Id: I9c39a4d8c1896c1255432380bd956330c2edf476
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Reviewed-on: https://review.whamcloud.com/28870
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Thomas Stibor <t.stibor@gsi.de>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9921 lnet: resolve unsafe list access 23/28723/6
Amir Shehata [Sat, 26 Aug 2017 04:18:16 +0000 (21:18 -0700)]
LU-9921 lnet: resolve unsafe list access

Use list_for_each_entry_safe() when accessing messages on pending
queue. Remove the message from the list before calling lnet_finalize()
or lnet_send().

When destroying the peer make sure to queue all pending messages on
a global list. We cannot resend them at this point because the
cpt lock is held. Unlocking the cpt lock could lead to an inconsistent
state. Use the discovery thread to check if the global list is not
empty and if so resend all messages on the list. Use a new spin
lock to protect the resend message list. I steered clear of reusing
an existing lock because LNet locking is complex and reusing a lock
will add to this complexity. Using a new lock makes the code easier
to understand.

Verify that all lists are empty before destroying the peer_ni.
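
The removal-while-iterating pattern described above can be sketched in userspace C. This is a minimal illustration, not the LNet code: the list helpers are hand-written stand-ins modelled on the kernel's list_head, and struct msg and drain_pending() are hypothetical names. The loop saves the next pointer before unlinking the current entry, which is what list_for_each_entry_safe() does in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node;
	node->prev = node;
}

static int list_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Hypothetical message type. The link is the first member, so a node
 * pointer can be cast back to the enclosing message. */
struct msg {
	struct list_node link;
	int id;
};

/* Move every message from 'pending' to 'resend'. The next pointer is
 * saved in 'tmp' before the current entry is unlinked, so the walk
 * never follows a pointer that list_del() has just rewritten. */
static int drain_pending(struct list_node *pending, struct list_node *resend)
{
	struct list_node *pos, *tmp;
	int moved = 0;

	for (pos = pending->next, tmp = pos->next; pos != pending;
	     pos = tmp, tmp = pos->next) {
		struct msg *m = (struct msg *)pos;

		list_del(&m->link);		/* unlink first ... */
		list_add_tail(&m->link, resend);	/* ... then hand off */
		moved++;
	}
	assert(list_empty(pending));
	return moved;
}
```

A caller would initialise both heads with list_init(), queue messages with list_add_tail(), and call drain_pending() while holding whatever lock protects the two lists.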

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Ia081419ec5ed2be5823cfbca7e050138a229ab9c
Reviewed-on: https://review.whamcloud.com/28723
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-7746 tests: skip tests for older (upstream) client 18/28718/3
Andreas Dilger [Fri, 25 Aug 2017 19:02:01 +0000 (13:02 -0600)]
LU-7746 tests: skip tests for older (upstream) client

Skip some tests when running newer sanity.sh on an older client.
This typically only happens when testing the upstream client,
since otherwise the tests will always match the client version.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I78e1b0a6ae98879a2039817696c3a0dd15621fcc
Reviewed-on: https://review.whamcloud.com/28718
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Steve Guminski <stephenx.guminski@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9891 tests: Increase space not released for ZFS 82/28682/4
James Nunez [Thu, 24 Aug 2017 14:51:15 +0000 (08:51 -0600)]
LU-9891 tests: Increase space not released for ZFS

Several Lustre tests calculate the free space on the
object storage servers. For servers running ZFS, the amount
of space released by ZFS is not 100% deterministic. Thus,
fs_log_size() will return the buffer size that we allow
the space to be off by. For ZFS, increase this buffer
from 400 to 512 KB.

Test-Parameters: trivial testgroup=review-zfs-part-2
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I32e0ae3752d0ee0e9f0091ea779f8b53ba969a26
Reviewed-on: https://review.whamcloud.com/28682
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years agoLU-9870 build: handle SNMP missing on build box 94/28494/3
James Simmons [Fri, 11 Aug 2017 18:50:52 +0000 (14:50 -0400)]
LU-9870 build: handle SNMP missing on build box

Currently the lustre spec file doesn't handle the case where SNMP
is missing. Even if the user runs configure with --disable-snmp, the
rpm build process ignores this and fails to build rpms. Pass the
missing-SNMP case through to the rpm build process.

Test-Parameters: trivial

Change-Id: Ia6dcfd31b50f4f67afe7a4545fe417c32df6e559
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/28494
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9044 test: remove conf-sanity tests from ALWAYS_EXCEPT 59/25059/8
dilip krishnagiri [Mon, 28 Aug 2017 15:39:57 +0000 (09:39 -0600)]
LU-9044 test: remove conf-sanity tests from ALWAYS_EXCEPT

Remove the following conf-sanity test from the ALWAYS_EXCEPT list:

24b "Multiple MGSs on a single node (should return err)"

Bugzilla ticket 23954 added this test to the ALWAYS_EXCEPT list.
Bugzilla 23954 is resolved.

Test-Parameters: trivial combinedmdsmgs=false testlist=conf-sanity

Signed-off-by: dilip krishnagiri <dilipx.krishnagiri@intel.com>
Change-Id: If379ac75921563412121e96439f49ab49dfb5fbc
Reviewed-on: https://review.whamcloud.com/25059
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-8342 utils: Set dnodesize/recordsize at zfs dataset create 55/21055/11
Giuseppe Di Natale [Tue, 14 Jun 2016 15:29:31 +0000 (08:29 -0700)]
LU-8342 utils: Set dnodesize/recordsize at zfs dataset create

After the zfs dataset is created, attempt to set the
dnodesize and recordsize properties. Moved xattr=sa to be
consistent with the new method of setting dataset properties.

Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Change-Id: I12e5863e4602496b85f8512ea780be4589489d01
Reviewed-on: https://review.whamcloud.com/21055
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9941 lov: lsm_is_composite isn't right 45/28845/2
Bobi Jam [Sat, 2 Sep 2017 08:34:23 +0000 (16:34 +0800)]
LU-9941 lov: lsm_is_composite isn't right

LOVEA magic containing LOV_MAGIC_MAGIC will also be regarded as
a composite magic.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I3ef37ee80364b2a8f27831e3c53fb88b464f2039
Reviewed-on: https://review.whamcloud.com/28845
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9260 test: Use the correct mount device when test against lustre 61/28661/4
Wei Liu [Wed, 23 Aug 2017 16:49:29 +0000 (09:49 -0700)]
LU-9260 test: Use the correct mount device when test against lustre

This change passes MGSNID:/FSNAME into the test instead of
using the default loop device when testing against Lustre.
Corresponding changes to the POSIX test suites are also needed
for the tests to pass. Related changes apply to the toolkit.

Test-Parameters: trivial testlist=posix

Change-Id: I32fc5a401fdc53ed133a78dc4c84b4a7e2a5ad19
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/28661
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9810 lnet: prefer Fast Reg 78/28278/5
Alexey Lyashkov [Mon, 4 Sep 2017 14:25:55 +0000 (17:25 +0300)]
LU-9810 lnet: prefer Fast Reg

The FastReg memory model has less CPU overhead than the default.
Therefore prefer it if the HW supports it.  This
applies in particular to the MLX4 HW which supports both memory
models.

Seagate-bug-id: MRP-4508
Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Change-Id: I09a85a3724d78b61a40fe18c72dbcc4a87da3013
Reviewed-on: https://review.whamcloud.com/28278
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9810 lnet: fix build with M-OFED 4.1 77/28277/4
Alexey Lyashkov [Mon, 4 Sep 2017 14:25:54 +0000 (17:25 +0300)]
LU-9810 lnet: fix build with M-OFED 4.1

Add the uapi path to the include paths to fix the build.

Seagate-bug-id: MRP-4508
Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Change-Id: If9c61a303de24c78261a7b6fdafec77f52efa0d3
Reviewed-on: https://review.whamcloud.com/28277
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-7001 osp: fix llog processing 32/26132/27
Alexander Boyko [Wed, 22 Mar 2017 11:39:48 +0000 (14:39 +0300)]
LU-7001 osp: fix llog processing

The osp_sync_thread relies on the assumption that llog_cat_process
will not end until umount. This is wrong when processing reaches
the bottom of the catalog, or if the catalog has wrapped.
The patch fixes this issue.

For a wrapped catalog, llog_process_thread could process an old
record:
1. Thread 1 (llog_process_thread) reads a chunk and processes the
   first record.
2. Thread 2 adds a record to this catalog in this chunk and
   updates the bitmap.
3. Thread 1 checks the bitmap for the next idx and processes the
   old record.

Test conf-sanity 106 was added.

Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Seagate-bug-id: MRP-4235
Change-Id: Ifc983018e3a325622ef3215bec4b69f5c9ac2ba2
Reviewed-on: https://review.whamcloud.com/26132
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andriy Skulysh
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-7988 hsm: update many cookie status at once 84/19584/39
Bruno Faccini [Tue, 18 Jul 2017 08:21:53 +0000 (10:21 +0200)]
LU-7988 hsm: update many cookie status at once

Instead of calling mdt_agent_record_update, which calls
cdt_llog_process, once for every HAL, build a list of the cookies to
update with their status and call mdt_agent_record_update at most
once per second.

Update mdt_agent_record_update to take a status for every cookie.
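
The batching idea can be sketched as follows. This is an illustration under stated assumptions, not the coordinator code: struct update_batch, batch_add(), batch_flush_if_due() and the BATCH_MAX capacity are all hypothetical, and apply_updates() is a stand-in for the single expensive pass over the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define BATCH_MAX 64	/* hypothetical capacity */

struct update {
	uint64_t cookie;
	int status;	/* each cookie carries its own status */
};

struct update_batch {
	struct update items[BATCH_MAX];
	size_t count;
	time_t last_flush;
	unsigned int flushes;	/* number of expensive passes made */
	size_t flushed_items;	/* total updates applied so far */
};

/* Queue one (cookie, status) pair instead of updating immediately. */
static int batch_add(struct update_batch *b, uint64_t cookie, int status)
{
	if (b->count == BATCH_MAX)
		return -1;
	b->items[b->count].cookie = cookie;
	b->items[b->count].status = status;
	b->count++;
	return 0;
}

/* Stand-in for the single pass that applies every queued update. */
static void apply_updates(struct update_batch *b)
{
	assert(b->count <= BATCH_MAX);
	b->flushes++;
	b->flushed_items += b->count;
	b->count = 0;
}

/* Flush at most once per second; returns 1 if a flush happened. */
static int batch_flush_if_due(struct update_batch *b, time_t now)
{
	if (b->count == 0 || now - b->last_flush < 1)
		return 0;
	apply_updates(b);
	b->last_flush = now;
	return 1;
}
```

With this shape, many status changes arriving within the same second cost one pass rather than one pass each.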

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ie4afd667727e07570ed6a2d51e8dfaea8302b97b
Signed-off-by: Ben Evans <bevans@cray.com>
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Reviewed-on: https://review.whamcloud.com/19584
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9907 build: add patchless server for lbuild 72/28672/21
Minh Diep [Wed, 23 Aug 2017 23:07:28 +0000 (16:07 -0700)]
LU-9907 build: add patchless server for lbuild

Add lbuild support for building a patchless server.
Clean up unused TARGET_ARCHS and BUILD_ARCHS.

Test-Parameters: trivial

Change-Id: I946352fa243c86d5729779406264e6ee37856145
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28672
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9917 lnet: rediscover peer if it changed 72/28772/2
Amir Shehata [Mon, 28 Aug 2017 22:22:57 +0000 (15:22 -0700)]
LU-9917 lnet: rediscover peer if it changed

If the peer has changed after we unlocked the cpt then
we'll need to discover the new peer.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Ib880746d5e67bbea1aa43122fa3aa115261c8664
Reviewed-on: https://review.whamcloud.com/28772
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9918 lnet: decref on peer after use 22/28722/3
Amir Shehata [Sat, 26 Aug 2017 04:26:00 +0000 (21:26 -0700)]
LU-9918 lnet: decref on peer after use

After looking up the peer for both ping and discover
we need to decref the peer so we don't lose a reference
on it. This needs to be done while the mutex_lock is held
to ensure the peer list remains stable.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Ic57e67d21b8afe17a239cc496621bc4abf681077
Reviewed-on: https://review.whamcloud.com/28722
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-6051 utils: Remove incorrect request for getstripe help 97/28597/3
Steve Guminski [Fri, 18 Aug 2017 12:37:29 +0000 (08:37 -0400)]
LU-6051 utils: Remove incorrect request for getstripe help

The option flag for stripe-size in the getstripe command was changed
in Lustre 1.8.  To detect the correct flag to use, the help was
parsed.  However, the help was incorrectly invoked by using the
"--help" option, instead of the correct "lfs help getstripe".
Since interoperability with 1.8 is no longer needed, the incorrect
code is removed and the correct flag is hard-coded.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I29ae644c7d6b2ed247573d83c943cb556cfb6325
Reviewed-on: https://review.whamcloud.com/28597
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-6210 utils: Use C99 struct initializers in lnetctl 23/28423/4
Steve Guminski [Mon, 7 Aug 2017 14:55:35 +0000 (10:55 -0400)]
LU-6210 utils: Use C99 struct initializers in lnetctl

This patch makes no functional changes.  The option struct
initializers in lnetctl are updated to C99 syntax.  The short and
long options are renamed to short_opts and long_opts for consistency.

C89 positional initializers require values to be placed in the
correct order. This will cause errors if the fields of the struct
definition are reordered or fields are added or removed. C99 named
initializers avoid this problem, and also automatically clear any
values that are not explicitly set.
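
The difference between the two styles can be illustrated with struct option from <getopt.h>. This is a sketch, not the lnetctl tables: the "nid" option and the count_opts() helper are hypothetical.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* C89 positional form: the meaning of each value depends on member
 * order (name, has_arg, flag, val). */
static const struct option positional_opts[] = {
	{ "nid", required_argument, NULL, 'n' },
	{ NULL, 0, NULL, 0 }
};

/* C99 designated form: members are named. The .flag member is left
 * out on purpose, so it is zero-initialized (NULL). */
static const struct option long_opts[] = {
	{ .name = "nid", .has_arg = required_argument, .val = 'n' },
	{ .name = NULL }
};

/* Count entries up to the terminator, whose name is NULL. */
static size_t count_opts(const struct option *opts)
{
	size_t n = 0;

	assert(opts != NULL);
	while (opts[n].name != NULL)
		n++;
	return n;
}
```

Both tables describe the same option, but only the designated form keeps working unchanged if the members of the struct are reordered.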

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I1c1483a57aea918dce84afd0c7e94e31324c189e
Reviewed-on: https://review.whamcloud.com/28423
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9679 ldlm: remove flock accessor macros 56/27856/3
Andreas Dilger [Tue, 27 Jun 2017 22:15:18 +0000 (16:15 -0600)]
LU-9679 ldlm: remove flock accessor macros

Remove old flock wrapper functions that were never used in the kernel:
flock_type(), flock_set_type(), flock_pid(), flock_set_pid(),
flock_start(), flock_set_start(), flock_end(), flock_set_end()

so that our code is closer to that in the upstream kernel.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I8709da925f4aa4650088f72d7e26f5e6281cab07
Reviewed-on: https://review.whamcloud.com/27856
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Steve Guminski <stephenx.guminski@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-2776 tests: waiting multiop finished in sanityn:51a 62/27662/3
Yang Sheng [Thu, 15 Jun 2017 15:29:25 +0000 (23:29 +0800)]
LU-2776 tests: waiting multiop finished in sanityn:51a

The test would fail if multiop was delayed, so we
should wait long enough for it to finish.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I9a329857230e3c49a5c78017ed385f20b3554d98
Reviewed-on: https://review.whamcloud.com/27662
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
6 years agoLU-5965 tests: fix parsing for older Lustre versions 93/28793/2
Andreas Dilger [Wed, 30 Aug 2017 06:57:03 +0000 (00:57 -0600)]
LU-5965 tests: fix parsing for older Lustre versions

Fix parsing of the Lustre version generated by "lctl get_param version"
before release 2.7.  The old code generated a valid version number
even for older releases, except when the "build:" line did not start
with a numeric value.  That line was incorrectly being parsed instead
of the "lustre:" line because "$ver" was not double-quoted properly,
so "$ver" was treated as a single line and "head -n 1" did nothing.
This was offset by sed dropping everything before the _last_ ":"
instead of before the _first_ ":", and then using the "build: " line.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ifd7dc95aaf0d6edf3558e18b85a78bea861248d0
Reviewed-on: https://review.whamcloud.com/28793
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9791 obd: always call lprocfs_obd_setup 47/28747/3
James Simmons [Mon, 28 Aug 2017 15:31:01 +0000 (11:31 -0400)]
LU-9791 obd: always call lprocfs_obd_setup

In the case of lustre running on a single node, the function
lprocfs_obd_setup() was not being called for lov/osc. This
was preventing sysfs from being registered. So always call
lprocfs_obd_setup(). Update lprocfs_obd_setup() to see if
obd->obd_proc_entry has already been set and return right
away.
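
The early-return guard can be sketched like this. It is a toy model only: struct toy_obd and toy_obd_setup() are hypothetical stand-ins, and the real function registers procfs/sysfs entries rather than storing a name.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device: proc_entry stays NULL until setup has run once. */
struct toy_obd {
	const char *proc_entry;
	unsigned int registrations;
};

/* Safe to call unconditionally: a second call is a no-op. */
static int toy_obd_setup(struct toy_obd *obd, const char *name)
{
	assert(obd != NULL && name != NULL);

	if (obd->proc_entry != NULL)
		return 0;	/* already set up, return right away */

	obd->proc_entry = name;
	obd->registrations++;
	return 0;
}
```

Making the setup idempotent lets every caller invoke it without first checking whether another path has already done the registration.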

Change-Id: Idbd99ea6a2e59eeee3991048d54c532df7d849ad
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/28747
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoRevert "LU-5541 build: build static and dynamic liblustreapi" 83/28783/3
Oleg Drokin [Tue, 29 Aug 2017 17:54:12 +0000 (17:54 +0000)]
Revert "LU-5541 build: build static and dynamic liblustreapi"

This broke the Ubuntu 16 build, which was not caught by the current review builders.

This reverts commit ab1df50e73ff838053fff62302c3b884e4e19552.

Change-Id: Ie916869267e370791f13c53ceac8e6b1e3de97e9
Reviewed-on: https://review.whamcloud.com/28783
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoRevert "LU-5541 lustreapi: only export the API symbols" 82/28782/3
Oleg Drokin [Tue, 29 Aug 2017 17:33:01 +0000 (17:33 +0000)]
Revert "LU-5541 lustreapi: only export the API symbols"

This commit breaks the Ubuntu 16 build, which was not caught by the review builder.

This reverts commit b36c377ff25c20417c481eab3798e67d042ec3a3.

Change-Id: Ibe9da0d7cd91dbf8a1d51ca3e531af1850af1fab
Reviewed-on: https://review.whamcloud.com/28782
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-8653 lod: use stripe_count instead of stripe_nr 81/26681/6
Andreas Dilger [Sat, 1 Oct 2016 00:07:17 +0000 (18:07 -0600)]
LU-8653 lod: use stripe_count instead of stripe_nr

Replace the use of stripenr and stripecnt in the code with
stripe_count to be consistent with the rest of the code.

Introduce LOV_PATTERN_NONE instead of using "0" around the
code to indicate no layout has been specified.

Introduce LOV_PATTERN_DEFAULT to indicate the entire layout
is unset, instead of using "0xffffffff" in the code.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I6056aebc1a381b09c1a436fb4a7986a51f3ebbe5
Reviewed-on: https://review.whamcloud.com/26681
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Steve Guminski <stephenx.guminski@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9915 build: remove LC_CONFIG_OBD_BUFFER_SIZE 10/28710/2
James Simmons [Fri, 25 Aug 2017 14:22:39 +0000 (10:22 -0400)]
LU-9915 build: remove LC_CONFIG_OBD_BUFFER_SIZE

One last piece of the CONFIG_LUSTRE_OBD_MAX_IOCTL_BUFFER
removal was missed.

Test-Parameters: trivial

Change-Id: I37970459b1d9427edf52938a6c15f36901c8a462
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/28710
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9909 lnet: fix memory leak and lnet_interfaces_max 02/28702/2
Amir Shehata [Fri, 25 Aug 2017 01:06:57 +0000 (18:06 -0700)]
LU-9909 lnet: fix memory leak and lnet_interfaces_max

Free buffer allocated for discover command.

Set lnet_interfaces_max to LNET_INTERFACES_MAX_DEFAULT if
it's not defined or if it's being set to something below
LNET_INTERFACES_MIN.

For lnet_ping() and lnet_discover() if the provided space
can fit more NIDs than lnet_interfaces_max then ensure only
lnet_interfaces_max is copied into the buffer.
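
The two rules above can be sketched as a pair of helpers. This is an illustration, not the LNet code: the SKETCH_* constants are placeholder values chosen for the example rather than taken from the source, and both function names are hypothetical.

```c
#include <assert.h>

/* Placeholder limits for the sketch; the real values live in LNet. */
#define SKETCH_INTERFACES_MIN		16
#define SKETCH_INTERFACES_MAX_DEFAULT	200

/* An unset (0) or too-small request falls back to the default. */
static unsigned int effective_interfaces_max(unsigned int requested)
{
	if (requested < SKETCH_INTERFACES_MIN)
		return SKETCH_INTERFACES_MAX_DEFAULT;
	return requested;
}

/* Never copy more NIDs than are available, than the caller's buffer
 * can hold, or than interfaces_max allows. */
static unsigned int nids_to_copy(unsigned int available,
				 unsigned int buf_slots,
				 unsigned int interfaces_max)
{
	unsigned int n = available;

	assert(interfaces_max >= SKETCH_INTERFACES_MIN);
	if (n > buf_slots)
		n = buf_slots;
	if (n > interfaces_max)
		n = interfaces_max;
	return n;
}
```

The second helper covers the case where the provided space is larger than the limit: the limit wins even though the buffer could hold more.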

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I19aed712f40a8bf44d2fb112588e9ae07257469f
Reviewed-on: https://review.whamcloud.com/28702
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9899 tests: mount client on MGS for ost-pools 38/28638/4
James Nunez [Mon, 21 Aug 2017 22:26:30 +0000 (16:26 -0600)]
LU-9899 tests: mount client on MGS for ost-pools

When a Lustre configuration has the MGS and MDS on separate
nodes, the file system must be mounted on the MGS to allow
OST pools to work properly.

Add the ability to mount the file system on the MGS when
necessary for the Lustre test suite ost-pools.sh.

Test-Parameters: trivial testlist=ost-pools
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Iff0663a38b92bb8e71c313897b12fca98fdae932
Reviewed-on: https://review.whamcloud.com/28638
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Arshad Hussain <arshad.hussain@seagate.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-8066 libcfs: call kernel_param_unlock on error 12/28612/3
Hongchao Zhang [Tue, 22 Aug 2017 21:13:38 +0000 (17:13 -0400)]
LU-8066 libcfs: call kernel_param_unlock on error

In libcfs_param_debug_mb_set, kernel_param_unlock should be
called in case of an error.
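
The general shape of such a fix is a single exit label that releases the lock on both the success and the error path. The sketch below is not the libcfs code: the toy lock, set_debug_mb() and the accepted range are hypothetical, and the lock is a plain flag so the example stays self-contained.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock that only tracks whether it is held. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

static struct toy_lock param_lock;
static int debug_mb;

/* Every path after the acquire leaves through 'out', so the lock is
 * released even when validation fails. */
static int set_debug_mb(long value)
{
	int rc = 0;

	toy_lock_acquire(&param_lock);
	if (value < 1 || value > 4096) {	/* hypothetical valid range */
		rc = -EINVAL;
		goto out;
	}
	debug_mb = (int)value;
out:
	toy_lock_release(&param_lock);
	return rc;
}
```

Returning directly from the validation branch instead of jumping to out would leave the lock held, and the next call would then trip the assertion in toy_lock_acquire().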

Change-Id: Iafeeb21b2d891f4ed7432e4d1ddd3c383fe33d5a
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/28612
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9347 ioctl: Add BLKSSZGET ioctl support 78/28578/5
Emoly Liu [Thu, 17 Aug 2017 07:36:49 +0000 (15:36 +0800)]
LU-9347 ioctl: Add BLKSSZGET ioctl support

Add support for the BLKSSZGET ioctl in ll_file_ioctl(), returning
PAGE_SIZE as the minimum alignment for this call.
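
From userspace, a caller could query the alignment roughly as follows. This is a Linux-only sketch, and query_alignment() is a hypothetical helper. On files that do not implement the ioctl the call fails, so the helper returns -1 in that case.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKSSZGET */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the size reported by BLKSSZGET for 'path', or -1 if the
 * file cannot be opened or does not support the ioctl. */
static int query_alignment(const char *path)
{
	int size = 0;	/* BLKSSZGET stores an int */
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (ioctl(fd, BLKSSZGET, &size) < 0)
		size = -1;
	close(fd);
	return size;
}
```

An application doing direct I/O could use the returned value to align its buffers and offsets, falling back to its own default when the helper returns -1.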

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: Id8a77e77cd7e1807aa90474ca6d3d1fea4d7c269
Reviewed-on: https://review.whamcloud.com/28578
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
6 years agoLU-9042 test: Remove conf-sanity tests from ALWAYS_EXCEPT 39/28539/3
dilip krishnagiri [Mon, 14 Aug 2017 19:03:25 +0000 (13:03 -0600)]
LU-9042 test: Remove conf-sanity tests from ALWAYS_EXCEPT

Remove the following conf-sanity tests from the ALWAYS_EXCEPT list:

23a "interrupt client during recovery mount delay"
34b "force umount with failed mds should be normal"

LU-2181 added these tests to the ALWAYS_EXCEPT list. LU-2181 is resolved.

Test-parameters: trivial testlist=conf-sanity clientdistro=sles11sp4 mdsdistro=sles11sp4 ossdistro=sles11sp4

Change-Id: Iea35039cc1de57bc3109e678c3a52bd2b9fa12f7
Signed-off-by: dilip krishnagiri <dilipx.krishnagiri@intel.com>
Reviewed-on: https://review.whamcloud.com/28539
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-6210 lnet: Use C99 struct initializer in framework.c 36/28436/2
Steve Guminski [Mon, 7 Aug 2017 17:17:31 +0000 (13:17 -0400)]
LU-6210 lnet: Use C99 struct initializer in framework.c

This patch makes no functional changes.  The struct initializer in
framework.c is updated to C99 syntax.

C89 positional initializers require values to be placed in the
correct order. This will cause errors if the fields of the struct
definition are reordered or fields are added or removed. C99 named
initializers avoid this problem, and also automatically clear any
values that are not explicitly set.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: Id54894c6f9476a5bf3b9cb5077ca324703c28da4
Reviewed-on: https://review.whamcloud.com/28436
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years agoLU-5170 lfs: Standardize error messages in lfs_setdirstripe() 86/28086/2
Steve Guminski [Tue, 11 Jul 2017 20:10:52 +0000 (16:10 -0400)]
LU-5170 lfs: Standardize error messages in lfs_setdirstripe()

Error and warning messages in lfs_setdirstripe() are updated to a
standard format.  Messages are prefixed with the name of the utility
and the command that caused the error.  User-provided values are
delimited with single quotes.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I1dcc60aef3eab33610cc5f1e2b2d7e570568aca4
Reviewed-on: https://review.whamcloud.com/28086
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9588 tests: remove replay-ost-single test from ALWAYS_EXCEPT 02/27402/5
dilip krishnagiri [Wed, 9 Aug 2017 19:11:52 +0000 (13:11 -0600)]
LU-9588 tests: remove replay-ost-single test from ALWAYS_EXCEPT

Remove replay-ost-single test
3 "Fail OST during write, with verification"
from the ALWAYS_EXCEPT list for ZFS.

Test-Parameters: trivial testlist=replay-ost-single mdtfilesystemtype=zfs ostfilesystemtype=zfs

Signed-off-by: dilip krishnagiri <dilipx.krishnagiri@intel.com>
Change-Id: I6d928c374adaab47288368c533c2455549d4be17
Reviewed-on: https://review.whamcloud.com/27402
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9580 tests: remove performance-sanity tests from ALWAYS_EXCEPT 75/27375/4
dilip krishnagiri [Wed, 9 Aug 2017 20:14:01 +0000 (14:14 -0600)]
LU-9580 tests: remove performance-sanity tests from ALWAYS_EXCEPT

Remove performance-sanity tests 1 and 2 from the ALWAYS_EXCEPT
list, and remove the test_1 and test_2 functions themselves, because
all they contain are calls to echo.
Tests 1 and 2 are associated with bugzilla ticket 15266, which is
fixed. However, reviewing all comments in that ticket
reveals that tests 1 and 2 were never implemented.

Test-Parameters: trivial testlist=performance-sanity

Signed-off-by: dilip krishnagiri <dilipx.krishnagiri@intel.com>
Change-Id: I402474f9db0d1875bf9c4b5c071e9c27bd47ba28
Reviewed-on: https://review.whamcloud.com/27375
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9519 utils: liblustreapi header cleanup 55/27155/4
Henri Doreau [Wed, 17 May 2017 08:50:50 +0000 (10:50 +0200)]
LU-9519 utils: liblustreapi header cleanup

Remove superfluous 'extern' qualifier from liblustreapi function prototypes.
Remove superfluous 'const' qualifier.

Test-Parameters: trivial
Change-Id: I818d5d2c9ae69d947f72c9306125715547714770
Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-on: https://review.whamcloud.com/27155
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-8691 tests: add mdtest to ha.sh 70/23070/5
Elena Gryaznova [Mon, 29 May 2017 19:19:03 +0000 (22:19 +0300)]
LU-8691 tests: add mdtest to ha.sh

Patch adds:
- mdtest mpi load;
- ha_simultaneous mode, which allows victim nodes to be
  rebooted simultaneously.

Test-Parameters: trivial
Seagate-bug-id: MRP-3896
Signed-off-by: Elena Gryaznova <elena.gryaznova@seagate.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@seagate.com>
Change-Id: I2c37f2a383ce2ed475ae14dcfa50a7f7357cb1bf
Reviewed-on: https://review.whamcloud.com/23070
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-8435 tests: slab alloc error does not LBUG 45/21745/6
Aurelien Degremont [Tue, 30 May 2017 21:56:06 +0000 (23:56 +0200)]
LU-8435 tests: slab alloc error does not LBUG

Under memory pressure, for instance when a memory cgroup
enforces kmem.limit_in_bytes (SLURM does this),
osc_extent_alloc() could fail and the error handling would
hit an LBUG.

Add a test for this.

Test-Parameters: trivial testlist=sanity,sanity,sanity

Signed-off-by: Aurelien Degremont <aurelien.degremont@cea.fr>
Change-Id: I135f05ee4be14521522c949e50bd4c8deb1f099a
Reviewed-on: https://review.whamcloud.com/21745
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-7988 hsm: added coordinator housekeeping flag 82/19582/38
Frank Zago [Fri, 8 Apr 2016 17:59:06 +0000 (13:59 -0400)]
LU-7988 hsm: added coordinator housekeeping flag

When the coordinator is not performing housekeeping, only the requests
in the ARS_WAITING state will be processed as they are new
requests. The other requests, in states ARS_FAILED, ARS_CANCELED,
ARS_SUCCEED and ARS_STARTED can wait a few more seconds until the
housekeeping starts.

Also, when not performing housekeeping, as soon as hsd.request is
full, exit from the loop as there is enough potential work queued;
there's no need to examine all the HSM records, thus shortening the
time spent in cdt_llog_process() holding the critical lock
cdt_llog_lock.
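
The scan policy described above can be sketched as a loop over record states. This is an illustration, not the coordinator code: the REQ_* enum mirrors the ARS_* states by name only, and scan_records() with its parameters is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ARS_* request states. */
enum req_state {
	REQ_WAITING,
	REQ_STARTED,
	REQ_FAILED,
	REQ_CANCELED,
	REQ_SUCCEED
};

/* Walk the records and queue waiting requests, up to queue_cap.
 * Outside housekeeping, stop as soon as the queue is full instead of
 * examining the remaining records. Returns the number queued and
 * stores the number of records examined in *examined. */
static size_t scan_records(const enum req_state *recs, size_t n,
			   int housekeeping, size_t queue_cap,
			   size_t *examined)
{
	size_t queued = 0;
	size_t i;

	assert(examined != NULL);

	for (i = 0; i < n; i++) {
		if (!housekeeping && queued == queue_cap)
			break;	/* enough work queued, leave early */

		if (recs[i] == REQ_WAITING) {
			if (queued < queue_cap)
				queued++;
		} else if (housekeeping) {
			/* Failed, canceled, succeeded and started
			 * requests would be handled here. */
		}
	}
	*examined = i;
	return queued;
}
```

The early break is what shortens the time the lock is held: a non-housekeeping pass touches only as many records as it needs to fill the queue.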

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ib73c97d29ca2f86b912aeb8d055c004cff14d5cf
Reviewed-on: https://review.whamcloud.com/19582
Tested-by: Jenkins
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9890 osd-zfs: dmu_objset_own/disown changes 93/28593/3
Giuseppe Di Natale [Thu, 17 Aug 2017 17:16:49 +0000 (10:16 -0700)]
LU-9890 osd-zfs: dmu_objset_own/disown changes

ZFS 0.8.0 will introduce ZFS encryption. The interfaces
to 'dmu_objset_own' and 'dmu_objset_disown' have changed.
Add configure checks to determine which versions of these
functions are available and call them appropriately.

Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Test-Parameters: trivial ostfilesystemtype=zfs mdtfilesystemtype=zfs testlist=sanity
Change-Id: Ide1a712858770e373404445b06596130a574d85b
Reviewed-on: https://review.whamcloud.com/28593
Tested-by: Jenkins
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
6 years agoLU-9882 kernel: kernel update RHEL7.4 [3.10.0-693.1.1.el7] 55/28555/4
Bob Glossman [Tue, 15 Aug 2017 14:21:36 +0000 (07:21 -0700)]
LU-9882 kernel: kernel update RHEL7.4 [3.10.0-693.1.1.el7]

Update the RHEL 7.4 kernel to 3.10.0-693.1.1.el7.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I48c1907b0db9f97fbebc8b8276cc27124433b482
Reviewed-on: https://review.whamcloud.com/28555
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9857 lmv: stripe dir page may be released mistakenly 48/28548/2
Lai Siyao [Tue, 15 Aug 2017 03:13:30 +0000 (11:13 +0800)]
LU-9857 lmv: stripe dir page may be released mistakenly

stripe_dirent_next() may call put_stripe_dirent() while its dirent
is still in use. For example, after lmv_dirent_next() pops the last
dirent of a stripe, sd_ent cannot be pointed to a next entry, but the
stripe dir page should not be released.

stripe_dirent->sd_ent should be set to NULL when its dir page
is released, which avoids misuse.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I6d0e119d598e468d6a080b2072514a6bf1d4f786
Reviewed-on: https://review.whamcloud.com/28548
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9874 osd-ldiskfs: simplify project transfer codes 10/28510/4
Wang Shilong [Tue, 27 Jun 2017 05:51:09 +0000 (13:51 +0800)]
LU-9874 osd-ldiskfs: simplify project transfer codes

Currently, osd-ldiskfs calls __ldiskfs_ioctl_project()
to transfer project quota. This is the user ioctl path for ext4,
which starts a transaction and reserves credits; that is not
the right logic for Lustre.

Lustre has already started a transaction handle, and credits should be
reserved during the declare phase, so calling _ldiskfs_ioctl_project()
here causes a nested handle start. This is not a problem for
JBD2, because it attaches to the current thread's handle if a
transaction has been started, but in this case it ignores the credits
reservation.

Also, Lustre doesn't need inode mutex protection for
project transfer, and we don't need to write the inode in the transfer
code; that is done when dirty inode is called. Setting attr
has reserved enough credits for project transfer; we need to
fix agent inode transferring.

This patch makes the code logic clear and also fixes credits
reservation for DNE agent inode transferring.

Change-Id: I6ab3c0fdc4cf456b102e49d9326840fd0e12ade0
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/28510
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9866 kernel: kernel update [SLES12 SP2 4.4.74-92.35] 09/28509/2
Bob Glossman [Fri, 11 Aug 2017 15:25:03 +0000 (08:25 -0700)]
LU-9866 kernel: kernel update [SLES12 SP2 4.4.74-92.35]

Update target and kernel_config files for new version

Test-Parameters: clientdistro=sles12sp2 testgroup=review-ldiskfs \
  mdsdistro=sles12sp2 ossdistro=sles12sp2 \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ibd5e7e931a6055c1b0d2a52359d4f4527843dec0
Reviewed-on: https://review.whamcloud.com/28509
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-8066 libcfs: test for both __kernel_param_[un]lock and kernel_param_[un]lock 98/28498/3
James Simmons [Fri, 11 Aug 2017 19:47:19 +0000 (15:47 -0400)]
LU-8066 libcfs: test for both __kernel_param_[un]lock and kernel_param_[un]lock

In earlier kernels like RHEL6 no locking is available. Later the
function __kernel_param_[un]lock() was introduced. In the most recent
kernels, per-module locking was introduced with the functions
kernel_param_[un]lock(), and __kernel_param_[un]lock() is no longer
visible to modules. Since this is the case, we need to make sure
that neither HAVE_MODULE_PARAM_LOCKING nor HAVE_KERNEL_PARAM_LOCK is
set in the case of RHEL6.

Change-Id: I0957a16352c4fb49fb5d96c0ff4d331a8be9703a
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/28498
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9860 tests: Add conf-sanity tests to ALWAYS_EXCEPT list 97/28497/10
James Nunez [Fri, 11 Aug 2017 19:45:52 +0000 (13:45 -0600)]
LU-9860 tests: Add conf-sanity tests to ALWAYS_EXCEPT list

The following tests fail when run with a separate MDS and MGS:
conf-sanity tests 33a, 43b, 53b, 54b, 70e, 80, 84, 87, 100,
102, 103, 104, 105 and 107.
We need to add these tests to the ALWAYS_EXCEPT list
when running with a separate MDS and MGS.

Test-Parameters: trivial combinedmdsmgs=false testlist=conf-sanity envdefinitions=SLOW=yes
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I1b17714216e14ad04eb9a492cb5f1aa4ed82bd1a
Reviewed-on: https://review.whamcloud.com/28497
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Dilip Krishnagiri <dilipx.krishnagiri@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9869 lnet: fix incorrect arguments order calling lstcon_session_new 87/28487/3
Colin Ian King [Fri, 11 Aug 2017 17:17:57 +0000 (13:17 -0400)]
LU-9869 lnet: fix incorrect arguments order calling lstcon_session_new

The arguments args->lstio_ses_force and args->lstio_ses_timeout are
in the incorrect order. Fix this by swapping them around.

Detected by CoverityScan, CID#1226833 ("Arguments in wrong order")

Test-Parameters: trivial testlist=lnet-selftest

Change-Id: If11c574655425db5bbf21ba2264be8d83a7e8bf8
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/28487
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-6210 ptlrpc: Use C99 initializer in ptlrpc_register_rqbd() 79/28479/3
Steve Guminski [Mon, 7 Aug 2017 18:01:32 +0000 (14:01 -0400)]
LU-6210 ptlrpc: Use C99 initializer in ptlrpc_register_rqbd()

This patch makes no functional changes.  The struct initializer in
ptlrpc_register_rqbd() is updated to C99 syntax.

C89 positional initializers require values to be placed in the
correct order. This will cause errors if the fields of the struct
definition are reordered or fields are added or removed. C99 named
initializers avoid this problem, and also automatically clear any
values that are not explicitly set.
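
As a small illustration of the difference, here is a sketch using a hypothetical struct buf_desc (the struct, its fields and both helper names are made up for this example and are not taken from ptlrpc_register_rqbd()):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor; not the real structure used by ptlrpc. */
struct buf_desc {
	void		*start;
	size_t		 length;
	int		 threshold;
	unsigned int	 options;
};

/* C89 positional form: values must follow the field order exactly. */
static struct buf_desc make_positional(void *buf, size_t len)
{
	struct buf_desc d = { buf, len, 1, 0 };

	return d;
}

/* C99 designated form: order-independent; unnamed fields are zeroed. */
static struct buf_desc make_designated(void *buf, size_t len)
{
	struct buf_desc d = {
		.length		= len,
		.start		= buf,
		.threshold	= 1,
		/* .options is not named, so it is initialized to 0 */
	};

	assert(d.options == 0);
	return d;
}
```

If a field were inserted between start and length, make_positional() would silently assign values to the wrong members (or fail to compile), while make_designated() would keep working unchanged.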

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I7c24bac3ba6be6732b206406cd74b0d4f8a1f9c2
Reviewed-on: https://review.whamcloud.com/28479
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9856 mdd: handle NULL buffer in mdd_xattr_list() 69/28469/2
John L. Hammond [Thu, 10 Aug 2017 19:44:24 +0000 (14:44 -0500)]
LU-9856 mdd: handle NULL buffer in mdd_xattr_list()

The upper layer may call mdd_xattr_list() with a NULL buffer to get
the length of the xattr name list. Handle this case safely by skipping
the removal of the link xattr for unlinked objects.
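
The general pattern can be sketched in userspace as follows. This is not the mdd_xattr_list() code: list_visible() and drop_name() are hypothetical helpers, and the only point is that the post-processing step which edits the buffer has to be skipped when the caller passes NULL to ask for the size.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Remove every entry equal to 'hide' from a NUL-separated name list
 * stored in buf[0..len). Returns the new length of the list.
 */
static size_t drop_name(char *buf, size_t len, const char *hide)
{
	size_t in = 0, out = 0;

	while (in < len) {
		size_t n = strlen(buf + in) + 1;	/* name plus its NUL */

		if (strcmp(buf + in, hide) != 0) {
			memmove(buf + out, buf + in, n);
			out += n;
		}
		in += n;
	}
	return out;
}

/*
 * Hypothetical listing helper: with buf == NULL it only reports an
 * upper bound on the space needed and never touches the buffer.
 */
static ssize_t list_visible(const char *names, size_t names_len,
			    char *buf, size_t buf_len, const char *hide)
{
	assert(names != NULL && hide != NULL);

	if (buf == NULL)
		return (ssize_t)names_len;	/* size query: skip filtering */
	if (buf_len < names_len)
		return -ERANGE;

	memcpy(buf, names, names_len);
	return (ssize_t)drop_name(buf, names_len, hide);
}
```

In this sketch the size query may over-report by the length of the hidden name, which is acceptable for callers that only use it to size their buffer.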

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Iae87fba20325b228ef75ee762acfa49353932b1b
Reviewed-on: https://review.whamcloud.com/28469
Tested-by: Jenkins
Reviewed-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-6210 utils: Use C99 struct initializers in lfs_getdirstripe 21/28421/2
Steve Guminski [Tue, 8 Aug 2017 17:46:24 +0000 (13:46 -0400)]
LU-6210 utils: Use C99 struct initializers in lfs_getdirstripe

This patch makes no functional changes.  The option struct
initializer in lfs_getdirstripe() is updated to C99 syntax.

C89 positional initializers require values to be placed in the
correct order. This will cause errors if the fields of the struct
definition are reordered or fields are added or removed. C99 named
initializers avoid this problem, and also automatically clear any
values that are not explicitly set.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I6f2d4a82e5a9ef2c76946746d6c46b1202e8c278
Reviewed-on: https://review.whamcloud.com/28421
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years agoLU-832 test: Add error check when running run-llog.sh 12/28412/2
Wei Liu [Mon, 7 Aug 2017 19:03:15 +0000 (12:03 -0700)]
LU-832 test: Add error check when running run-llog.sh

Add error status check in sanity test_60a when calling
run-llog.sh

Test-Parameters: trivial

Change-Id: I1296907c8892b7dd54dac37045d8a7c4e03b1f52
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/28412
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9803 tests: cast st_blksize for printf 62/28262/4
Chris Horn [Thu, 27 Jul 2017 20:10:15 +0000 (15:10 -0500)]
LU-9803 tests: cast st_blksize for printf

Compilation with -Werror=format complains about this printf. The
format expects unsigned long, but st_blksize has type __blksize_t.
Cast it to unsigned long for printf.
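
A minimal sketch of the cast, using a hypothetical helper format_blksize() rather than the actual test code:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * The underlying type of st_blksize differs between platforms, so cast
 * it to the type that the %lu conversion expects.
 */
static int format_blksize(char *out, size_t out_len, const struct stat *st)
{
	assert(out != NULL && st != NULL);
	return snprintf(out, out_len, "%lu", (unsigned long)st->st_blksize);
}
```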

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I1eeb5613e485132de8f0bce08bd4d89887e52cf6
Reviewed-on: https://review.whamcloud.com/28262
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9781 llog: Improve catalog full warning 93/28093/4
Giuseppe Di Natale [Tue, 18 Jul 2017 21:57:18 +0000 (14:57 -0700)]
LU-9781 llog: Improve catalog full warning

When warning that a catalog file is full, provide the name
of the catalog file. If the name of the catalog file isn't
defined, print its FID.

Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Change-Id: I559e43d08febfd8a1512ceb58fd3030b06372e9f
Reviewed-on: https://review.whamcloud.com/28093
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-6210 utils: Use C99 initializers in lfs_changelog() 22/27522/3
Steve Guminski [Fri, 14 Apr 2017 19:33:23 +0000 (15:33 -0400)]
LU-6210 utils: Use C99 initializers in lfs_changelog()

This patch makes no functional changes.  Struct initializers that
use C89 or GCC-only syntax are updated to C99 syntax.  Variables of
type struct option are renamed to long_opts for consistency.

C89 positional initializers require values to be placed in the
correct order. This will cause errors if the fields of the struct
definition are reordered or fields are added or removed. C99 named
initializers avoid this problem, and also automatically clear any
values that are not explicitly set.

This patch updates lfs_changelog() to use the C99 syntax.

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I4f9d82974f68742d788f00d58c5e3d61449fc5bb
Reviewed-on: https://review.whamcloud.com/27522
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years agoLU-9593 tests: remove sanity-sec tests from ALWAYS_EXCEPT 11/27411/4
dilip krishnagiri [Mon, 7 Aug 2017 17:29:12 +0000 (11:29 -0600)]
LU-9593 tests: remove sanity-sec tests from ALWAYS_EXCEPT

sanity-sec tests 2, 5 and 6 no longer exist. Test 2 was
removed by LU-6971 patch change ID I06f4348b. Tests
5 and 6 were removed by LU-3105 patch change I865a92b57.

Remove sanity-sec tests 2, 5 and 6 from the ALWAYS_EXCEPT
list.

Test-Parameters: trivial testlist=sanity-sec

Signed-off-by: dilip krishnagiri <dilipx.krishnagiri@intel.com>
Change-Id: Ia0377ff0da41c4ba9df6c90bc26f0469cb9de9a6
Reviewed-on: https://review.whamcloud.com/27411
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Chris Hanna <hannac@iu.edu>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9591 tests: remove replay-vbr tests 12a from ALWAYS_EXCEPT 06/27406/3
dilip krishnagiri [Tue, 8 Aug 2017 21:17:42 +0000 (15:17 -0600)]
LU-9591 tests: remove replay-vbr tests 12a from ALWAYS_EXCEPT

Remove replay-vbr test 12a (lock replay with VBR) from
the ALWAYS_EXCEPT list. It is associated with bugzilla
ticket 16356, which is in the NEW state.
This test has not run for years.

Test-Parameters: trivial testlist=replay-vbr

Signed-off-by: dilip krishnagiri <dilipx.krishnagiri@intel.com>
Change-Id: I251bbaeea744a11fdf3e34870a00fc6b53fae3b1
Reviewed-on: https://review.whamcloud.com/27406
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-8276 ldlm: Make lru clear always discard read lock pages 85/20785/7
Patrick Farrell [Mon, 14 Aug 2017 10:09:35 +0000 (05:09 -0500)]
LU-8276 ldlm: Make lru clear always discard read lock pages

A significant amount of time is sometimes spent during
lru clearing (i.e., echo 'clear' > lru_size) checking
pages to see if they are covered by another read lock.
Since all unused read locks will be destroyed by this
operation, the pages will be freed momentarily anyway,
and this is a waste of time.

This patch sets the LDLM_FL_DISCARD_DATA flag on all the PR
locks which are slated for cancellation by
ldlm_prepare_lru_list when it is called from
ldlm_ns_drop_cache.

The case where another lock covers those pages (and is in
use and so does not get cancelled by lru clear) is safe for
a few reasons:

1. When discarding pages, we wait (discard_cb->cl_page_own)
until they are in the cached state before invalidating.
So if they are actively in use, we'll wait until that use
is done.

2. Removal of pages under a read lock is something that can
happen due to memory pressure, since these are VFS cache
pages. If a client reads something which is then removed
from the cache and goes to read it again, this will simply
generate a new read request.

This has a performance cost for that reader, but if anyone
is clearing the ldlm lru while actively doing I/O in that
namespace, then they cannot expect good performance.

In the case of many read locks on a single resource, this
improves cleanup time dramatically.  In internal testing at
Cray with ~80,000 read locks on a single file, this improves
cleanup time from ~60 seconds to ~0.5 seconds.  This also
slightly improves cleanup speed in the case of 1 or a few
read locks on a file.

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I0c076b31ea474bb5f012373ed2033de3e447b62d
Reviewed-on: https://review.whamcloud.com/20785
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5541 lustreapi: only export the API symbols 43/11643/19
frank zago [Sun, 13 Aug 2017 18:17:17 +0000 (14:17 -0400)]
LU-5541 lustreapi: only export the API symbols

By default, all kinds of symbols are exported from the library (dump,
libcfs_ukuc_start, l_ioctl, set_ioctl_dump, ...), which may create
external conflicts. Use the linker version-script options to only
export the API symbols, and prevent the export of internal symbols.

Only the symbols declared in the global section of liblustreapi.map
will be seen by applications.

Fix lshowmount to use libcfs and not an internal liblustreapi symbol.

Change-Id: Ica4226c1ea9b6b159a056ad22bacaa2ffcf4b171
Signed-off-by: frank zago <fzago@cray.com>
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-on: https://review.whamcloud.com/11643
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
6 years agoLU-5541 build: build static and dynamic liblustreapi 25/11625/31
frank zago [Sun, 13 Aug 2017 18:11:26 +0000 (14:11 -0400)]
LU-5541 build: build static and dynamic liblustreapi

libtool knows how to build both, so no need to hack the Makefile. As
two added benefits, the utilities will now use the dynamic version,
thus reducing their footprint, and calling make twice in a row won't
rebuild objects already built.

Test-Parameters: trivial

Change-Id: If4191e1ff1564793c476ffe03f5d4b6ad5295421
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: https://review.whamcloud.com/11625
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9848 llog: check padding size for update reclen 54/28554/2
Lai Siyao [Tue, 15 Aug 2017 11:51:08 +0000 (19:51 +0800)]
LU-9848 llog: check padding size for update reclen

Update log only checks padding size for split case, which should also
be done if it's less than chunk size.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Ie7819f67dd9bcbfb060713bb208c9777420c5178
Reviewed-on: https://review.whamcloud.com/28554
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9828 ptlrpc: Do not assert when bd_nob_transferred != 0 91/28491/2
Doug Oucharek [Wed, 31 May 2017 21:39:12 +0000 (14:39 -0700)]
LU-9828 ptlrpc: Do not assert when bd_nob_transferred != 0

There is a case in the routine ptlrpc_register_bulk() where we were
asserting if bd_nob_transferred != 0 when not resending.  There is
evidence that network errors can create a situation where
this does happen.  So we should not be asserting!

This patch changes that assert to an error return code of -EIO.
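
The shape of that change can be sketched as below. The struct and function here are simplified, hypothetical stand-ins and not the real ptlrpc code: an assertion remains appropriate for caller bugs, while a condition the network can produce is reported as an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a bulk descriptor. */
struct bulk_desc {
	size_t	nob_transferred;	/* bytes already moved */
};

/*
 * Sketch of the check only: leftover transfer state on a fresh
 * (non-resend) registration is returned as -EIO instead of asserting.
 */
static int register_bulk(const struct bulk_desc *desc, int resend)
{
	assert(desc != NULL);		/* programming error, not a runtime event */

	if (!resend && desc->nob_transferred != 0)
		return -EIO;		/* can follow a network error: fail softly */

	return 0;
}
```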

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I6a73ca1b04a86f187744d3b8b5d46df71d95e416
Reviewed-on: https://review.whamcloud.com/28491
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9863 lmv: Off by two in lmv_fid2path() 77/28477/2
Dan Carpenter [Fri, 11 Aug 2017 00:26:39 +0000 (20:26 -0400)]
LU-9863 lmv: Off by two in lmv_fid2path()

We want to concatenate string one, a '/' character, string two and
then a NUL terminator. The destination buffer holds ori_gf->gf_pathlen
characters. The strlen() function returns the number of characters not
counting the NUL terminator. So we should be adding two extra spaces,
one for the forward slash and one for the NUL.
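
The size arithmetic can be sketched with a hypothetical userspace helper, join_path(); the name and the -EOVERFLOW return value are assumptions for this example and are not taken from lmv_fid2path():

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Join 'head' and 'tail' as "head/tail" into dst, which holds dst_len
 * bytes. Space needed: strlen(head) + 1 for '/' + strlen(tail) + 1 for
 * the NUL terminator, i.e. the two string lengths plus two.
 */
static int join_path(char *dst, size_t dst_len,
		     const char *head, const char *tail)
{
	size_t need;

	assert(dst != NULL && head != NULL && tail != NULL);

	need = strlen(head) + strlen(tail) + 2;
	if (need > dst_len)
		return -EOVERFLOW;

	snprintf(dst, dst_len, "%s/%s", head, tail);
	return 0;
}
```

With head "abc" and tail "def" the result "abc/def" needs exactly 8 bytes; a check that added nothing (or only one) to the two lengths would accept a 7-byte buffer and overflow it.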

Change-Id: Ia96461a2d1b3331f44d3791ca0148f6e836caf0d
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/28477
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9019 mdd: migrate from jiffies64 to ktime 07/28407/4
James Simmons [Mon, 14 Aug 2017 18:25:19 +0000 (14:25 -0400)]
LU-9019 mdd: migrate from jiffies64 to ktime

The mdd layer uses cfs_time_xxx_64() for 64-bit time precision.
This was written before ktime_t came into existence, and it uses
the 64-bit version of jiffies, which can vary between nodes due to
HZ being configurable. Moving to ktime provides a consistent format
with nanosecond precision on any node.
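
To see why a tick count is not comparable across nodes, here is a userspace arithmetic sketch (not kernel code; ticks_to_ns() and the constant name are made up for this example):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/*
 * Duration represented by 'ticks' counter units when the tick rate is
 * 'hz' ticks per second. The division is exact only when hz divides
 * one billion evenly (true for 100, 250 and 1000).
 */
static uint64_t ticks_to_ns(uint64_t ticks, unsigned int hz)
{
	assert(hz != 0);
	return ticks * (SKETCH_NSEC_PER_SEC / hz);
}
```

The same count of 1000 ticks means 10 seconds at a rate of 100 per second but only 1 second at 1000 per second, whereas a nanosecond value means the same thing on every node.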

Change-Id: Ibec17227fd70a148c83296e8d1c41668f67e9201
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/28407
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9657 pfl: llapi_layout_comp_use should handle non-pfl file 65/27865/4
Emoly Liu [Tue, 15 Aug 2017 08:41:39 +0000 (16:41 +0800)]
LU-9657 pfl: llapi_layout_comp_use should handle non-pfl file

This patch improves llapi_layout_comp_use() to treat a non-composite
file as a single-component file. When doing the "is composite" check,
"1" is returned when LLAPI_LAYOUT_COMP_USE_NEXT/PREV is specified.

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: I3ba4f07ec843d9b61273af331060d5f8827c2f8b
Reviewed-on: https://review.whamcloud.com/27865
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years agoLU-8993 utils: Use absolute pathname for debug_daemon log file 85/25485/9
Steve Guminski [Mon, 13 Feb 2017 20:24:08 +0000 (15:24 -0500)]
LU-8993 utils: Use absolute pathname for debug_daemon log file

The lctl debug_daemon command is changed to always provide an
absolute pathname to the kernel.  The kernel code will return EINVAL
if the pathname does not begin with '/', leading to the confusing
error "Invalid argument". This patch allows the user to provide a
relative pathname to the command without generating this error.

The absolute_path function has been moved to string.c and renamed to
cfs_abs_path, so that it may be used by all utilities.

Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I35af242bfcfcb9a56135aeabe0423e28e9634bab
Reviewed-on: https://review.whamcloud.com/25485
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9410 ldiskfs: no check mb bitmap if flex_bg enabled 66/28566/4
Fan Yong [Wed, 9 Aug 2017 18:30:02 +0000 (02:30 +0800)]
LU-9410 ldiskfs: no check mb bitmap if flex_bg enabled

When the filesystem is initialized (reformatted), the number of
free blocks in the group descriptor is calculated via
ext2fs_reserve_super_and_bgd() (e2fsprogs). As commented
in that function: "This is not necessarily the case when
the flex_bg feature is enabled, so callers should take care!".

So it is normal that we may find a block group descriptor
that has the LDISKFS_BG_BLOCK_UNINIT flag but 0 free blocks.
ldiskfs_mb_check_ondisk_bitmap() should NOT report an error
for such a block group; instead, it should skip the check directly.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Iba0fb2bf0632a6e54222472bc724a8ea0478e9ae
Reviewed-on: https://review.whamcloud.com/28566
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9841 lov: do not split IO for single striped file 51/28451/2
Jinshan Xiong [Wed, 9 Aug 2017 23:31:17 +0000 (16:31 -0700)]
LU-9841 lov: do not split IO for single striped file

The stripe size of a single-striped file is not reliable, so it
shouldn't be used to split I/O.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I47c31d59b46b07d4a6760b8985e1c19da4765a5c
Reviewed-on: https://review.whamcloud.com/28451
Tested-by: Jenkins
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9842 osd: return ENODATA for XATTR_NAME_FID on MDT 34/28434/2
Fan Yong [Tue, 8 Aug 2017 23:18:21 +0000 (07:18 +0800)]
LU-9842 osd: return ENODATA for XATTR_NAME_FID on MDT

The XATTR_NAME_FID xattr is an OST-side EA. If someone calls
getxattr() for XATTR_NAME_FID on an MDT, return -ENODATA.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I18b1466cf62d10fa28f7ed9731490e963b6274f4
Reviewed-on: https://review.whamcloud.com/28434
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9767 utils: validate filesystem name for mkfs.lustre 70/28070/7
James Simmons [Mon, 21 Aug 2017 03:24:20 +0000 (23:24 -0400)]
LU-9767 utils: validate filesystem name for mkfs.lustre

The patch "LU-6401 uapi: turn lustre_param.h into a proper
UAPI header" removed various user land functions used to
validate poolnames and file system names. The
checks were instead enforced on the kernel side to ensure
that any user land software directly interfacing with the
kernel wouldn't be able to break things badly. When
formatting the backend file system, no kernel interaction
happens until it tries to mount the MDT/OST/MGT, which
is very late in the process. So for this case let's add back
the file system name verification to mkfs.lustre to warn
users long before they try to mount anything.

Secondly, we remove verify_poolname() in lfs.c since
it duplicates extract_fsname_poolname() in obd.c. There
is no need to do the same test twice. The function
pool_cmd() calls the ioctl for pool handling, which in
turn returns an error code. Use this error code to notify
the user what mistake they made in their pool command.
For the MGS kernel code, mgs_extract_fs_pool() was checking
MTI_NAME_MAXLEN instead of LUSTRE_MAXFSNAME. Also use
LUSTRE_MAXFSNAME instead of the raw number in the function
server_name2fsname() located in obd_mount.c.

Change-Id: If094644e56a70b6dd8e6b0378adc8736911aeef1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/28070
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9913 lnet: balance references in lnet_discover_peer_locked() 95/28695/2
John L. Hammond [Thu, 24 Aug 2017 20:01:34 +0000 (15:01 -0500)]
LU-9913 lnet: balance references in lnet_discover_peer_locked()

In lnet_discover_peer_locked() avoid a leaked reference to the peer in
the non-blocking discovery case.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ic48414859c923af1ebb197b0b0f2f8d6752043ac
Reviewed-on: https://review.whamcloud.com/28695
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9480 lnet: Multi-Rail Dynamic Discovery feature
Oleg Drokin [Tue, 22 Aug 2017 16:32:16 +0000 (12:32 -0400)]
LU-9480 lnet: Multi-Rail Dynamic Discovery feature

Merge remote-tracking branch 'origin/multi-rail'

Change-Id: I63d21d1085f4bf665480d29d5d14c065b6a22191
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9480 lnet: cleanup lnetctl and cyaml 49/27349/15
Sonia [Wed, 31 May 2017 08:48:15 +0000 (01:48 -0700)]
LU-9480 lnet: cleanup lnetctl and cyaml

lnetctl set commands result in a segmentation fault
if no values are provided. This patch makes them
show help if no values are provided to the set commands.

Made general cleanups in the lnetctl code to consolidate
where the help is being printed. Created a function
check_cmd() which checks for the expected number of
arguments and for the -h/--help option and prints
the help string if either scenario is encountered.

Fixed the FSM transition in cyaml to allow proper
parsing of empty cyaml documents.

Change-Id: Ia081e9304ba2d6baa804e4c8890fb1988d860c1c
Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Signed-off-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-on: https://review.whamcloud.com/27349
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
6 years agoLU-9480 lnet: show peer state 30/26130/21
Amir Shehata [Wed, 22 Mar 2017 20:34:23 +0000 (13:34 -0700)]
LU-9480 lnet: show peer state

It is important to show the peer state when debugging.
This patch exports the peer state from the kernel to
user space, and is shown when the detail level requested
in the peer show command is >= 3

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I1e169b2b7bf80671ea302f04c6fb948bbcbbb245
Reviewed-on: https://review.whamcloud.com/26130
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
6 years agoLU-9480 lnet: add enhanced statistics 95/25795/27
Amir Shehata [Thu, 2 Feb 2017 22:01:15 +0000 (14:01 -0800)]
LU-9480 lnet: add enhanced statistics

Added statistics to track the different types of
LNet messages which are sent/received/dropped

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I7e1fc991a56df20181f9e55a794765349a4d2cb9
Reviewed-on: https://review.whamcloud.com/25795

6 years agoLU-9480 lnet: add "lnetctl discover" 93/25793/29
Sonia Sharma [Mon, 13 Feb 2017 20:40:19 +0000 (12:40 -0800)]
LU-9480 lnet: add "lnetctl discover"

Add a "discover" subcommand to lnetctl.

jt_discover() in lnetctl.c calls lustre_lnet_discover_nid()
to implement "lnetctl discover". The output is similar to
that of the "lnetctl ping" command.
This patch also does some cleanup in liblnetconfig.c.
For parameters under global settings, the common code
is pulled into the functions ioctl_set_value() and
ioctl_show_global_values().

Test-Parameters: trivial
Signed-off-by: Sonia Sharma <sonia.sharma@intel.com>
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I98ebb0b27de4b32ea07421f7dd71a4a1c96f3e05
Reviewed-on: https://review.whamcloud.com/25793

6 years agoLU-9077 lnet: fix for static analysis issues 92/25792/29
sharmaso [Wed, 8 Feb 2017 22:42:01 +0000 (14:42 -0800)]
LU-9077 lnet: fix for static analysis issues

Fixes the 11 static analysis issues found in
v2_9_52_0-66-gec839d4.

1. lustre_lnet_show_numa_range - fixed
2. lnet_select_pathway - fixed
3. lustre_lnet_show_discovery - fixed
4. lnet_discover_peer_locked - false positive
5. lustre_lnet_ping_nid - fixed
6. lustre_lnet_ping_nid - false positive
7. lustre_lnet_show_discovery - duplicate of 3
8. lustre_lnet_show_max_intf - fixed
9. lustre_lnet_show_max_intf - duplicate of 8
10. lnet_peer_set_primary_data - false positive
11. lustre_lnet_show_numa_range - fixed

Test-Parameters: trivial
Signed-off-by: Sonia Sharma <sonia.sharma@intel.com>
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I4cb03e4f64cd0c743ee3646f4628d34533b2d4ba
Reviewed-on: https://review.whamcloud.com/25792
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: add "lnetctl ping" command 91/25791/31
Olaf Weber [Thu, 6 Apr 2017 09:43:20 +0000 (11:43 +0200)]
LU-9480 lnet: add "lnetctl ping" command

Adds function jt_ping() in lnetctl.c and
lustre_lnet_ping_nid() in liblnetconfig.c file.
The output of "lnetctl ping" is similar to
"lnetctl peer show".

Function jt_ping() in lnetctl.c calls lustre_lnet_ping_nid()
to implement "lnetctl ping". Adds a function infra_ping_nid()
to be reused later by similar ping-like lnetctl commands.
Uses a new ioctl call, IOC_LIBCFS_PING_PEER, for "lnetctl ping".
With "lnetctl ping", multiple NIDs can be pinged. Uses a new
struct (lnet_ioctl_ping_data in lib-dlc.h) to pass the data
from kernel to user space for ping. Also changes the lnet_ping()
function and its input parameters in lnet/lnet/api-ni.c.

Test-Parameters: trivial
Signed-off-by: Sonia Sharma <sonia.sharma@intel.com>
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I67024d87fa5cca6aa7ff7a8099d4400a795f3a83
Reviewed-on: https://review.whamcloud.com/25791
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: add "lnetctl peer list" 90/25790/26
Olaf Weber [Fri, 27 Jan 2017 15:36:47 +0000 (16:36 +0100)]
LU-9480 lnet: add "lnetctl peer list"

Add IOC_LIBCFS_GET_PEER_LIST to obtain a list of the primary
NIDs of all peers known to the system. The list is written
into a userspace buffer by the kernel. The typical usage is
to make a first call to determine the required buffer size,
then a second call to obtain the list.

Extend the "lnetctl peer" set of commands with a "list"
subcommand that uses this interface.

Modify the IOC_LIBCFS_GET_PEER_NI ioctl (which is new in the
Multi-Rail code) to use a NID to indicate the peer to look
up, and then pass out the data for all NIDs of that peer.

Re-implement "lnetctl peer show" to obtain the list of NIDs
using IOC_LIBCFS_GET_PEER_LIST followed by one or more
IOC_LIBCFS_GET_PEER_NI calls to get information for each
peer.

Make sure to copy the structure from kernel space to
user space even if the ioctl handler returns an error.
This is needed because if the buffer passed in by the
user space is not big enough to copy the data, we want
to pass the requested size to user space in the structure
passed in. The return code in this case is -E2BIG.
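
The two-call pattern described above can be sketched in user-space C as
follows. This is only an illustration: struct peer_list_req,
get_peer_list(), fetch_peer_list() and the NID values are hypothetical
stand-ins, not the real IOC_LIBCFS_GET_PEER_LIST interface. The point it
shows is that the handler writes the required size back even when it
returns -E2BIG, so the caller can allocate and retry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request layout; NOT the real LNet ioctl structure. */
struct peer_list_req {
	uint32_t size;   /* in: buffer size in bytes, out: size required */
	uint64_t *nids;  /* caller-supplied buffer for primary NIDs */
};

/* Stand-in for the kernel's table of primary NIDs. */
static const uint64_t known_nids[] = { 0x1001, 0x1002, 0x1003 };

/* Stand-in for the ioctl handler. */
static int get_peer_list(struct peer_list_req *req)
{
	uint32_t need = (uint32_t)sizeof(known_nids);
	int rc = 0;

	if (req->size < need)
		rc = -E2BIG;
	else
		memcpy(req->nids, known_nids, need);

	/* Written back even on error, so the caller learns the size. */
	req->size = need;
	return rc;
}

/* Caller side: probe for the size, allocate, then fetch the list. */
static int fetch_peer_list(uint64_t **out, size_t *count)
{
	struct peer_list_req req = { .size = 0, .nids = NULL };
	int rc;

	assert(out != NULL && count != NULL);
	*out = NULL;
	*count = 0;

	rc = get_peer_list(&req);          /* first call: size probe */
	if (rc != 0 && rc != -E2BIG)
		return rc;
	if (req.size == 0)
		return 0;                  /* no peers known */

	req.nids = malloc(req.size);
	if (req.nids == NULL)
		return -ENOMEM;

	rc = get_peer_list(&req);          /* second call: fetch */
	if (rc != 0) {
		free(req.nids);
		return rc;
	}
	*out = req.nids;
	*count = req.size / sizeof(uint64_t);
	return 0;
}
```

A real caller would probably loop, since the peer list may grow between
the probe and the fetch; the sketch omits that for brevity.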

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I522c11e6ec09bec46121496d526bb258e10295f1
Reviewed-on: https://review.whamcloud.com/25790
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: implement Peer Discovery 89/25789/24
Olaf Weber [Tue, 28 Mar 2017 13:05:03 +0000 (15:05 +0200)]
LU-9480 lnet: implement Peer Discovery

Implement Peer Discovery.

A peer is queued for discovery by lnet_peer_queue_for_discovery().
This sets LNET_PEER_DISCOVERING, to indicate that discovery is in
progress.

The discovery thread lnet_peer_discovery() checks the peer and
updates its state as appropriate.

If LNET_PEER_DATA_PRESENT is set, then a valid Push message or
Ping reply has been received. The peer is updated in accordance
with the data, and LNET_PEER_NIDS_UPTODATE is set.

If LNET_PEER_PING_FAILED is set, then an attempt to send a Ping
message failed, and peer state is updated accordingly. The discovery
thread can do some cleanup like unlinking an MD that cannot be done
from the message event handler.

If LNET_PEER_PUSH_FAILED is set, then an attempt to send a Push
message failed, and peer state is updated accordingly. The discovery
thread can do some cleanup like unlinking an MD that cannot be done
from the message event handler.

If LNET_PEER_PING_REQUIRED is set, we must Ping the peer in order to
correctly update our knowledge of it. This is set, for example, if
we receive a Push message for a peer, but cannot handle it because
the Push target was too small. In such a case we know that the
state of the peer is incorrect, but need to do extra work to obtain
the required information.

If discovery is not enabled, then the discovery process stops here
and the peer is marked with LNET_PEER_UNDISCOVERED. This tells the
discovery process that it doesn't need to revisit the peer while
discovery remains disabled.

If LNET_PEER_NIDS_UPTODATE is not set, then we have reason to think
the lnet_peer is not up to date, and will Ping it.

The peer needs a Push if it is multi-rail and the ping buffer
sequence number for this node is newer than the sequence number it
has acknowledged receiving by sending an Ack of a Push.

If none of the above is true, then discovery has completed its work
on the peer.

Discovery signals that it is done with a peer by clearing the
LNET_PEER_DISCOVERING flag, and setting LNET_PEER_DISCOVERED or
LNET_PEER_UNDISCOVERED as appropriate. It then dequeues the peer
and clears the LNET_PEER_QUEUED flag.
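
As a rough sketch, the order of checks described above could be written
as a single decision function. The flag names come from this message,
but the bit values, enum dc_action and dc_next_action() are assumptions
made for illustration; the actual discovery thread is structured
differently and also has to take locks and handle events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag names from the commit message; bit values are illustrative. */
#define LNET_PEER_MULTI_RAIL     (1u << 0)
#define LNET_PEER_NIDS_UPTODATE  (1u << 1)
#define LNET_PEER_DATA_PRESENT   (1u << 2)
#define LNET_PEER_PING_REQUIRED  (1u << 3)
#define LNET_PEER_PING_FAILED    (1u << 4)
#define LNET_PEER_PUSH_FAILED    (1u << 5)

/* Hypothetical names for what the discovery thread does next. */
enum dc_action {
	DC_MERGE_DATA,          /* apply Push/Ping reply data */
	DC_PING_FAILED,         /* clean up after a failed Ping */
	DC_PUSH_FAILED,         /* clean up after a failed Push */
	DC_SEND_PING,
	DC_MARK_UNDISCOVERED,   /* discovery disabled: stop here */
	DC_SEND_PUSH,
	DC_DONE,
};

/* Checks are made in the order the commit message lists them. */
static enum dc_action dc_next_action(unsigned int state,
				     bool discovery_enabled,
				     uint32_t local_seq, uint32_t acked_seq)
{
	if (state & LNET_PEER_DATA_PRESENT)
		return DC_MERGE_DATA;
	if (state & LNET_PEER_PING_FAILED)
		return DC_PING_FAILED;
	if (state & LNET_PEER_PUSH_FAILED)
		return DC_PUSH_FAILED;
	if (state & LNET_PEER_PING_REQUIRED)
		return DC_SEND_PING;
	if (!discovery_enabled)
		return DC_MARK_UNDISCOVERED;
	if (!(state & LNET_PEER_NIDS_UPTODATE))
		return DC_SEND_PING;
	/* Push only to Multi-Rail peers that have not acked our latest data. */
	if ((state & LNET_PEER_MULTI_RAIL) && local_seq > acked_seq)
		return DC_SEND_PUSH;
	return DC_DONE;
}
```

For example, a Multi-Rail peer with up-to-date NIDs that has acked
sequence 3 while the local ping buffer is at 5 would get DC_SEND_PUSH.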

When the local node is discovered via the loopback network, the
peer structure that is created will have an lnet_peer_ni for the
local loopback interface. Subsequent traffic from this node to
itself will use the loopback net.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I30acd1e046604013025b231b5806be25468a2286
Reviewed-on: https://review.whamcloud.com/25789
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: add the Push target 88/25788/23
Olaf Weber [Tue, 28 Mar 2017 12:48:44 +0000 (14:48 +0200)]
LU-9480 lnet: add the Push target

Peer Discovery will send a Push message (same format as an
LNet Ping) to Multi-Rail capable peers to give the peer the
list of local interfaces.

Set up a target buffer for these pushes in the_lnet. The
size of this buffer defaults to LNET_MIN_INTERFACES, but it
is resized if required.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I09b5ad8ae504ba8368d908539001fb8afc2c2778
Reviewed-on: https://review.whamcloud.com/25788
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: tune lnet_peer_discovery_disabled with lnetctl 87/25787/21
Olaf Weber [Tue, 28 Mar 2017 09:09:32 +0000 (11:09 +0200)]
LU-9480 lnet: tune lnet_peer_discovery_disabled with lnetctl

A new tunable, lnet_peer_discovery_disabled, has been introduced.
Make it tunable with lnetctl. Note that the state of discovery is
reported as 1/enabled, or 0/disabled, which is the inverse of the
module parameter.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I67333d86520c5b6db8ff99c924054c4b487c8029
Reviewed-on: https://review.whamcloud.com/25787
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
6 years agoLU-9480 lnet: add discovery thread 86/25786/23
Olaf Weber [Fri, 27 Jan 2017 15:32:11 +0000 (16:32 +0100)]
LU-9480 lnet: add discovery thread

Add the discovery thread, which will be used to handle peer
discovery. This change adds the thread and the infrastructure
that starts and stops it. The thread itself does trivial work.

Peer Discovery gets its own event queue (ln_dc_eqh), a queue
for peers that are to be discovered (ln_dc_request), a queue
for peers waiting for an event (ln_dc_working), a wait queue
head so the thread can sleep (ln_dc_waitq), and start/stop
state (ln_dc_state).

Peer discovery is started from lnet_select_pathway(), for
GET and PUT messages not sent to the LNET_RESERVED_PORTAL.
This criterion means that discovery will not be triggered by
the messages used in discovery, and neither will an LNet ping
trigger it.
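
The trigger criterion can be illustrated with a small predicate. The
enum, the RESERVED_PORTAL value and msg_triggers_discovery() are
hypothetical names chosen for this sketch; the real check lives inside
lnet_select_pathway() and uses LNet's own message types.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative message types; not the real LNet definitions. */
enum msg_type { MSG_ACK, MSG_PUT, MSG_GET, MSG_REPLY, MSG_HELLO };

/* Assumed stand-in for LNET_RESERVED_PORTAL. */
#define RESERVED_PORTAL 0u

/*
 * Only GET and PUT start discovery, and only when they are not aimed
 * at the reserved portal. Ping and Push traffic uses that portal, so
 * discovery never triggers itself.
 */
static bool msg_triggers_discovery(enum msg_type type, unsigned int portal)
{
	if (type != MSG_GET && type != MSG_PUT)
		return false;
	return portal != RESERVED_PORTAL;
}
```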

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I38a48ab7f61c8ef1b994cd17069729f243912bdf
Reviewed-on: https://review.whamcloud.com/25786
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
6 years agoLU-9480 lnet: add msg_type to lnet_event 85/25785/23
Olaf Weber [Fri, 27 Jan 2017 15:31:57 +0000 (16:31 +0100)]
LU-9480 lnet: add msg_type to lnet_event

Add a msg_type field to the lnet_event structure. This makes
it possible for an event handler to tell whether LNET_EVENT_SEND
corresponds to a GET or a PUT message.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: If9ecc42c26eb078c19697f399a17f80b2e225639
Reviewed-on: https://review.whamcloud.com/25785
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: reference counts on lnet_peer/lnet_peer_net 84/25784/23
Olaf Weber [Fri, 27 Jan 2017 15:25:30 +0000 (16:25 +0100)]
LU-9480 lnet: reference counts on lnet_peer/lnet_peer_net

Peer discovery will be keeping track of lnet_peer structures,
so there will be references to an lnet_peer independent of
the references implied by lnet_peer_ni structures. Manage
this by adding explicit reference counts to lnet_peer_net and
lnet_peer.

Each lnet_peer_net has a hold on the lnet_peer it links to
with its lpn_peer pointer. This hold is only removed when that
pointer is assigned a new value or the lnet_peer_net is freed.
Just removing an lnet_peer_net from the lp_peer_nets list does
not release this hold, it just prevents new lookups of the
lnet_peer_net via the lnet_peer.

Each lnet_peer_ni has a hold on the lnet_peer_net it links to
with its lpni_peer_net pointer. This hold is only removed when
that pointer is assigned a new value or the lnet_peer_ni is
freed. Just removing an lnet_peer_ni from the lpn_peer_nis
list does not release this hold, it just prevents new lookups
of the lnet_peer_ni via the lnet_peer_net.

This ensures that given a lnet_peer_ni *lpni, we can rely on
lpni->lpni_peer_net->lpn_peer pointing to a valid lnet_peer.
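
The chain of holds can be sketched in user-space C as below. The struct
and function names are simplified stand-ins, plain ints replace the
kernel's atomic counters, locking and list linkage are left out, and the
two *_freed counters exist only so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins; locking and list linkage are omitted. */
struct peer     { int lp_refcount; };
struct peer_net { int lpn_refcount; struct peer *lpn_peer; };
struct peer_ni  { struct peer_net *lpni_peer_net; };

static int peers_freed, nets_freed;   /* demo-only bookkeeping */

static struct peer *peer_alloc(void)
{
	struct peer *lp = calloc(1, sizeof(*lp));

	if (lp != NULL)
		lp->lp_refcount = 1;          /* caller's hold */
	return lp;
}

static void peer_decref(struct peer *lp)
{
	assert(lp->lp_refcount > 0);
	if (--lp->lp_refcount == 0) {
		free(lp);
		peers_freed++;
	}
}

static struct peer_net *peer_net_alloc(struct peer *lp)
{
	struct peer_net *lpn = calloc(1, sizeof(*lpn));

	if (lpn != NULL) {
		lpn->lpn_refcount = 1;        /* caller's hold */
		lpn->lpn_peer = lp;
		lp->lp_refcount++;            /* hold via lpn_peer */
	}
	return lpn;
}

static void peer_net_decref(struct peer_net *lpn)
{
	assert(lpn->lpn_refcount > 0);
	if (--lpn->lpn_refcount == 0) {
		peer_decref(lpn->lpn_peer);   /* released only on free */
		free(lpn);
		nets_freed++;
	}
}

static struct peer_ni *peer_ni_alloc(struct peer_net *lpn)
{
	struct peer_ni *lpni = calloc(1, sizeof(*lpni));

	if (lpni != NULL) {
		lpni->lpni_peer_net = lpn;
		lpn->lpn_refcount++;          /* hold via lpni_peer_net */
	}
	return lpni;
}

static void peer_ni_free(struct peer_ni *lpni)
{
	peer_net_decref(lpni->lpni_peer_net);
	free(lpni);
}
```

In this sketch, even after the creator drops its own holds on the
peer_net and peer, both stay allocated for as long as a peer_ni still
points at them, and are freed in cascade when that peer_ni goes away.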

Keep a count of the total number of lnet_peer_ni attached to
an lnet_peer in lp_nnis.

Split the global ln_peers list into per-lnet_peer_table lists.
The CPT of the peer table in which the lnet_peer is linked is
stored in lp_cpt.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I465f9b732964834dad327fbe5177fba0cfb6775f
Reviewed-on: https://review.whamcloud.com/25784
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: add LNET_PEER_CONFIGURED flag 83/25783/23
Olaf Weber [Fri, 27 Jan 2017 15:25:02 +0000 (16:25 +0100)]
LU-9480 lnet: add LNET_PEER_CONFIGURED flag

Add the LNET_PEER_CONFIGURED flag, which indicates that a peer
has been configured by DLC. This is used to enforce that only
DLC can modify such a peer.

This includes some further refactoring of the code that creates
or modifies peers to ensure that the flag is properly passed
through, set, and cleared.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I647116ec19bc2f577732a02bf8efb52dad48a391
Reviewed-on: https://review.whamcloud.com/25783
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: preferred NIs for non-Multi-Rail peers 82/25782/24
Olaf Weber [Fri, 27 Jan 2017 15:24:40 +0000 (16:24 +0100)]
LU-9480 lnet: preferred NIs for non-Multi-Rail peers

When a node sends a message to a peer NI, there may be
a preferred local NI that should be the source of the
message. This is in particular the case for non-Multi-
Rail (NMR) peers, as an NMR peer depends in some cases
on the source address of a message to correctly identify
its origin. (This as opposed to using a UUID provided by
a higher protocol layer.)

Implement this by keeping an array of preferred local
NIDs in the lnet_peer_ni structure. The case where only
a single NID needs to be stored is optimized so that this
can be done without needing to allocate any memory.
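
One way to get the single-NID optimization is a union of an inline NID
and a pointer to an array, sketched below. struct peer_ni_pref and the
pref_*() helpers are hypothetical and not the layout used in
lnet_peer_ni; duplicate checks and locking are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for preferred local NIDs. */
struct peer_ni_pref {
	unsigned int npref;      /* number of preferred NIDs */
	union {
		uint64_t  nid;   /* used when npref == 1: no allocation */
		uint64_t *nids;  /* used when npref > 1: heap array */
	} u;
};

static int pref_add(struct peer_ni_pref *p, uint64_t nid)
{
	uint64_t *nids;

	assert(p != NULL);
	if (p->npref == 0) {
		p->u.nid = nid;          /* common case: store inline */
		p->npref = 1;
		return 0;
	}

	nids = malloc((p->npref + 1) * sizeof(*nids));
	if (nids == NULL)
		return -ENOMEM;

	if (p->npref == 1) {
		nids[0] = p->u.nid;      /* migrate the inline NID */
	} else {
		memcpy(nids, p->u.nids, p->npref * sizeof(*nids));
		free(p->u.nids);
	}
	nids[p->npref++] = nid;
	p->u.nids = nids;
	return 0;
}

static bool pref_has(const struct peer_ni_pref *p, uint64_t nid)
{
	unsigned int i;

	if (p->npref == 1)
		return p->u.nid == nid;
	for (i = 0; i < p->npref; i++)
		if (p->u.nids[i] == nid)
			return true;
	return false;
}
```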

A flag in the lnet_peer_ni, LNET_PEER_NI_NON_MR_PREF,
indicates that the preferred NI was automatically added
for an NMR peer. Note that a peer which has not been
explicitly configured as Multi-Rail will be treated as
non-Multi-Rail until proven otherwise. These automatic
preferences will be cleared if the peer is changed to
Multi-Rail.

- lnet_peer_ni_set_non_mr_pref_nid()
  set NMR preferred NI for peer_ni
- lnet_peer_ni_clr_non_mr_pref_nid()
  clear NMR preferred NI for peer_ni
- lnet_peer_clr_non_mr_pref_nids()
  clear NMR preferred NIs for all peer_ni

- lnet_peer_add_pref_nid()
  add a preferred NID
- lnet_peer_del_pref_nid()
  delete a preferred NID

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: If98501b34e83f099652f3b19aab5bbbf158f8280
Reviewed-on: https://review.whamcloud.com/25782
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: introduce LNET_PEER_MULTI_RAIL flag bit 81/25781/23
Olaf Weber [Fri, 27 Jan 2017 15:24:21 +0000 (16:24 +0100)]
LU-9480 lnet: introduce LNET_PEER_MULTI_RAIL flag bit

Add lp_state as a flag word to lnet_peer, and add lp_lock
to protect it. This lock needs to be taken whenever the
field is updated, because setting or clearing a bit is
a read-modify-write cycle.

The lp_multi_rail is removed, its function is replaced by
the new LNET_PEER_MULTI_RAIL flag bit.

The helper lnet_peer_is_multi_rail() tests the bit.
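
The locking rule can be sketched in user space with a pthread mutex
standing in for lp_lock. The struct, the bit value and the helper names
are assumptions for illustration. Unlike the kernel helper, this sketch
also takes the lock when reading, purely to keep the example free of
data races under the C memory model.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define LNET_PEER_MULTI_RAIL (1u << 0)   /* illustrative bit value */

/* Simplified stand-in for the relevant part of lnet_peer. */
struct lnet_peer_sketch {
	pthread_mutex_t lp_lock;   /* protects lp_state */
	unsigned int    lp_state;  /* flag word */
};

/* Setting a bit is read-modify-write, so it is done under lp_lock. */
static void peer_set_flag(struct lnet_peer_sketch *lp, unsigned int flag)
{
	pthread_mutex_lock(&lp->lp_lock);
	lp->lp_state |= flag;
	pthread_mutex_unlock(&lp->lp_lock);
}

static void peer_clear_flag(struct lnet_peer_sketch *lp, unsigned int flag)
{
	pthread_mutex_lock(&lp->lp_lock);
	lp->lp_state &= ~flag;
	pthread_mutex_unlock(&lp->lp_lock);
}

static bool peer_is_multi_rail(struct lnet_peer_sketch *lp)
{
	bool mr;

	pthread_mutex_lock(&lp->lp_lock);
	mr = (lp->lp_state & LNET_PEER_MULTI_RAIL) != 0;
	pthread_mutex_unlock(&lp->lp_lock);
	return mr;
}
```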

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I15034be7670bcb18460dc709accf675711a48113
Reviewed-on: https://review.whamcloud.com/25781
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: refactor lnet_add_peer_ni() 80/25780/23
Olaf Weber [Fri, 27 Jan 2017 15:24:04 +0000 (16:24 +0100)]
LU-9480 lnet: refactor lnet_add_peer_ni()

Refactor lnet_add_peer_ni() and the functions called by it. In
particular, lnet_peer_add_nid() adds an lnet_peer_ni to an
existing lnet_peer, lnet_peer_add() adds a new lnet_peer.

lnet_find_or_create_peer_locked() is removed.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: Iffcbf9ffc26460afc544f102bd0e0a56e23a83f3
Reviewed-on: https://review.whamcloud.com/25780
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: refactor lnet_del_peer_ni() 79/25779/23
Olaf Weber [Fri, 27 Jan 2017 15:23:51 +0000 (16:23 +0100)]
LU-9480 lnet: refactor lnet_del_peer_ni()

Refactor lnet_del_peer_ni(). In particular break out the code
that removes an lnet_peer_ni from an lnet_peer and put it into
a separate function, lnet_peer_del_nid().

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: Id5988b308afb093f83cc2e7029d3f2961171c906
Reviewed-on: https://review.whamcloud.com/25779
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: rename lnet_add/del_peer_ni_to/from_peer() 78/25778/23
Olaf Weber [Fri, 27 Jan 2017 15:23:35 +0000 (16:23 +0100)]
LU-9480 lnet: rename lnet_add/del_peer_ni_to/from_peer()

Rename lnet_add_peer_ni_to_peer() to lnet_add_peer_ni(), and
lnet_del_peer_ni_from_peer() to lnet_del_peer_ni().  This brings
the function names closer to the ioctls they implement:
IOC_LIBCFS_ADD_PEER_NI and IOC_LIBCFS_DEL_PEER_NI. These
names are also a more accurate description of their effect: adding
an lnet_peer_ni to LNet or deleting one from it.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I0eefb60cbdedb998a659002b48d4c8ddd3b11fb2
Reviewed-on: https://review.whamcloud.com/25778
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: cleanup of lnet_peer_ni_addref/decref_locked() 77/25777/23
Olaf Weber [Fri, 27 Jan 2017 15:23:20 +0000 (16:23 +0100)]
LU-9480 lnet: cleanup of lnet_peer_ni_addref/decref_locked()

Address style issues in lnet_peer_ni_addref_locked() and
lnet_peer_ni_decref_locked(). In the latter routine, replace
a sequence of atomic_dec()/atomic_read() with atomic_dec_and_test().
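
The message does not say why the sequence was replaced. One likely
reason is that a separate decrement and read leave a window in which
another thread can change the counter. A user-space analogue using C11
atomics (the function names here are hypothetical, not kernel API):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Two-step pattern: between the decrement and the load, another
 * thread may also decrement, so both callers (or neither) could see 0.
 */
static bool ref_dec_then_read(atomic_int *ref)
{
	atomic_fetch_sub(ref, 1);
	return atomic_load(ref) == 0;
}

/*
 * Single-step analogue of the kernel's atomic_dec_and_test():
 * atomic_fetch_sub() returns the value before the subtraction, so
 * exactly one caller observes the transition from 1 to 0.
 */
static bool ref_dec_and_test(atomic_int *ref)
{
	return atomic_fetch_sub(ref, 1) == 1;
}
```

Single-threaded, both functions give the same answer; the difference
only shows up under concurrent decrements.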

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I9b7030ac9850b035f8bd80487a7b69b66b1d5858
Reviewed-on: https://review.whamcloud.com/25777
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: add sanity checks on ping-related constants 76/25776/23
Olaf Weber [Mon, 27 Mar 2017 10:22:55 +0000 (12:22 +0200)]
LU-9480 lnet: add sanity checks on ping-related constants

Add sanity checks for LNet ping related data structures and
constants to wirecheck.c, and update the generated code in
lnet_assert_wire_constants().

In order for the structures and macros to be visible to
wirecheck.c, which is a userspace program, they were moved
from kernel-only lnet/lib-types.h to lnet/types.h

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I2949d27445b29ec69cf8c17b7769291f270a5923
Reviewed-on: https://review.whamcloud.com/25776
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: add Multi-Rail and Discovery ping feature bits 75/25775/23
Olaf Weber [Fri, 27 Jan 2017 15:22:40 +0000 (16:22 +0100)]
LU-9480 lnet: add Multi-Rail and Discovery ping feature bits

Claim ping feature bits for Multi-Rail and Discovery.

Assert in lnet_ping_target_update() that no unknown bits will
be sent over the wire.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: Ie84b155f1ae45e3c228a4e49dc898160b81efb94
Reviewed-on: https://review.whamcloud.com/25775
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: automatic sizing of router pinger buffers 74/25774/23
Olaf Weber [Fri, 27 Jan 2017 15:16:34 +0000 (16:16 +0100)]
LU-9480 lnet: automatic sizing of router pinger buffers

The router pinger uses fixed-size buffers to receive the data
returned by a ping. When a router has more than 16 interfaces
(including loopback) this means the data for some interfaces
is dropped.

Detect this situation, and track the number of remote NIs in
the lnet_rc_data_t structure.  lnet_create_rc_data_locked()
becomes lnet_update_rc_data_locked(), and is modified to replace
an existing ping buffer if one is present. It is now also
called by lnet_ping_router_locked() when the existing ping
buffer is too small.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I7286702b8606e25a5c82291ea4138479c4bf010f
Reviewed-on: https://review.whamcloud.com/25774
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: add struct lnet_ping_buffer 73/25773/21
Olaf Weber [Fri, 27 Jan 2017 15:16:16 +0000 (16:16 +0100)]
LU-9480 lnet: add struct lnet_ping_buffer

The Multi-Rail code will use the ping target buffer also as the
source of data to push to other nodes. This means that there
will be multiple MDs referencing the same buffer, and care must
be taken to ensure that the buffer is not freed while any such
reference remains.

Encapsulate the struct lnet_ping_info (aka lnet_ping_info_t) in
a struct lnet_ping_buffer. This adds a reference count and the
number of NIDs for which the encapsulated lnet_ping_info has
been sized.

For sizing the buffer the constant LNET_PINGINFO_SIZE is replaced
with LNET_PING_INFO_SIZE(NNIS).
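
A rough user-space sketch of the idea follows. The struct layouts, field
names, PING_INFO_SIZE() and the ping_buffer_*() helpers are simplified
assumptions, not the actual LNet definitions. Nesting a struct that ends
in a flexible array member as the last member of another struct is a GNU
C extension, common in kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins; names and layout are illustrative only. */
struct ni_status {
	uint64_t ns_nid;
	uint32_t ns_status;
	uint32_t ns_unused;
};

struct ping_info {
	uint32_t pi_magic;
	uint32_t pi_features;
	uint32_t pi_pid;
	uint32_t pi_nnis;
	struct ni_status pi_ni[];        /* one entry per interface */
};

/* Size of a ping_info holding nnis entries (replaces a fixed size). */
#define PING_INFO_SIZE(nnis) \
	(offsetof(struct ping_info, pi_ni) + \
	 (size_t)(nnis) * sizeof(struct ni_status))

struct ping_buffer {
	int              pb_nnis;    /* entries the buffer is sized for */
	atomic_int       pb_refcnt;  /* one per user, e.g. per MD */
	struct ping_info pb_info;    /* must be last (GNU extension) */
};

static struct ping_buffer *ping_buffer_alloc(int nnis)
{
	struct ping_buffer *pb;

	if (nnis <= 0)
		return NULL;
	pb = calloc(1, offsetof(struct ping_buffer, pb_info) +
		       PING_INFO_SIZE(nnis));
	if (pb != NULL) {
		pb->pb_nnis = nnis;
		atomic_init(&pb->pb_refcnt, 1);
	}
	return pb;
}

static void ping_buffer_addref(struct ping_buffer *pb)
{
	atomic_fetch_add(&pb->pb_refcnt, 1);
}

/* Drops one reference; returns 1 if this call freed the buffer. */
static int ping_buffer_decref(struct ping_buffer *pb)
{
	if (atomic_fetch_sub(&pb->pb_refcnt, 1) == 1) {
		free(pb);
		return 1;
	}
	return 0;
}
```

Each user of the buffer takes a reference before use and drops it when
done, so the memory is released only by the last user.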

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: Iae255a7ebd6099c050bddbea84fb1923a586ac66
Reviewed-on: https://review.whamcloud.com/25773
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Amir Shehata <amir.shehata@intel.com>
6 years agoLU-9480 lnet: configure lnet_interfaces_max tunable from dlc 71/25771/18
Olaf Weber [Fri, 27 Jan 2017 15:15:24 +0000 (16:15 +0100)]
LU-9480 lnet: configure lnet_interfaces_max tunable from dlc

Added the ability to configure lnet_interfaces_max from DLC.
Combined the configuration and display of NUMA range and max
interfaces under a "global" YAML element when configuring using YAML.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I6f8babdf7900f963cd86acf92468175a49bbaeee
Reviewed-on: https://review.whamcloud.com/25771

6 years agoLU-9480 lnet: add lnet_interfaces_max tunable 70/25770/16
Olaf Weber [Fri, 27 Jan 2017 15:15:07 +0000 (16:15 +0100)]
LU-9480 lnet: add lnet_interfaces_max tunable

Add an lnet_interfaces_max tunable value, that describes the maximum
number of interfaces per node. This tunable is primarily useful for
sanity checks prior to allocating memory.

Allow lnet_interfaces_max to be set and read through the sysfs interface.

Add LNET_INTERFACES_MIN, value 16, as the minimum value.

Add LNET_INTERFACES_MAX_DEFAULT, value 200, as the default value. This
value was chosen to ensure that the size of an LNet ping message with
any associated LND overhead would fit in 4096 bytes.

(The LNET_INTERFACES_MAX name was not reused to allow for the early
detection of issues when merging code that uses it.)

Rename LNET_NUM_INTERFACES to LNET_INTERFACES_NUM

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I9bdc72cc688a414f7658fed93f84c9885c8342be
Reviewed-on: https://review.whamcloud.com/25770

6 years agoNew tag 2.10.52 2.10.52 v2_10_52 v2_10_52_0
Oleg Drokin [Tue, 22 Aug 2017 02:34:09 +0000 (22:34 -0400)]
New tag 2.10.52

Change-Id: I673949d64dd0067f1f220426ce3389806a886b5b
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9888 tests: Do not run conf-sanity 32b with ZFS 02/28602/2
James Nunez [Fri, 18 Aug 2017 18:42:06 +0000 (12:42 -0600)]
LU-9888 tests: Do not run conf-sanity 32b with ZFS

With recent changes to this test to support ZFS 0.7.1,
conf-sanity test 32b consistently fails in automated testing
with a ZFS file system. Add conf-sanity test 32b to the
ALWAYS_EXCEPT list for ZFS testing while the failure is
investigated.

Test-Parameters: trivial testgroup=review-zfs-part-2
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I1d5f7e5d02f0c318153eab0db01d8ae67ad93f13
Reviewed-on: https://review.whamcloud.com/28602
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years agoLU-9887 tests: ignore error sanity-lfsck test 9a,b 88/28588/7
James Nunez [Thu, 17 Aug 2017 20:04:14 +0000 (14:04 -0600)]
LU-9887 tests: ignore error sanity-lfsck test 9a,b

sanity-lfsck tests 9a and 9b are failing consistently on
checking that speed limiting LFSCK takes less time than the
user defined maximum speed. We should ignore these errors
for now and print the layout or namespace to help understand
this issue.

Test-Parameters: trivial testgroup=review-zfs-part-1
Test-Parameters: testgroup=review-dne-part-2

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I64cac59edd456e6fd519961a4055130c8dbc8a4a
Reviewed-on: https://review.whamcloud.com/28588
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>