fs/lustre-release.git
4 years ago LU-12608 kernel: kernel update RHEL7.6 [3.10.0-957.27.2.el7] 39/35639/4
Jian Yu [Thu, 12 Sep 2019 07:08:25 +0000 (00:08 -0700)]
LU-12608 kernel: kernel update RHEL7.6 [3.10.0-957.27.2.el7]

Update RHEL7.6 kernel to 3.10.0-957.27.2.el7.

Test-Parameters: clientdistro=el7.6 serverdistro=el7.6

Change-Id: I8dd5e24746ccf11467c7a468edf7f9056d5705e3
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35639
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12724 kernel: kernel update RHEL7.7 [3.10.0-1062.1.1.el7] 75/36075/2
Jian Yu [Fri, 6 Sep 2019 07:24:35 +0000 (00:24 -0700)]
LU-12724 kernel: kernel update RHEL7.7 [3.10.0-1062.1.1.el7]

Update RHEL7.7 kernel to 3.10.0-1062.1.1.el7.

Test-Parameters: trivial clientdistro=el7.7 serverdistro=el7.7

Change-Id: Iad40fb93b8a15d875b72749a05666a23e4755fcc
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36075
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12080 lnet: recovery event handling broken 28/36028/6
Amir Shehata [Sun, 17 Mar 2019 15:16:40 +0000 (08:16 -0700)]
LU-12080 lnet: recovery event handling broken

Don't increment health on an unlink event.
If a SEND fails, an unlink will follow, so there is no need to do any
special processing on the SEND event. If the SEND succeeds, we
wait for the reply.
Only queue a message on the NI recovery queue
if the MT thread is still running.

Lustre-change: https://review.whamcloud.com/34445
Lustre-commit: 5409e620e0256dc9b657f1c457541d7411b543cd

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4877caebcac5cdfc35a59a18a3e3451b1f23cb0d
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36028
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago Revert "LU-11816 lnet: setup health timeout defaults" 73/36173/2
Oleg Drokin [Thu, 12 Sep 2019 18:04:55 +0000 (18:04 +0000)]
Revert "LU-11816 lnet: setup health timeout defaults"

This is causing frequent assertion failures like the one below:
LNetError: 1701:0:(lib-move.c:3670:lnet_monitor_thr_stop()) ASSERTION( rc == 0 ) failed:
[  378.662897] LNetError: 1701:0:(lib-move.c:3670:lnet_monitor_thr_stop()) LBUG
[  378.665136] Pid: 1701, comm: rmmod 3.10.0-7.6-debug #1 SMP Fri Jul 12 02:40:17 EDT 2019
[  378.667455] Call Trace:
[  378.668302]  [<ffffffffa01927dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[  378.670463]  [<ffffffffa019288c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[  378.672398]  [<ffffffffa021d036>] lnet_monitor_thr_stop+0xe6/0x120 [lnet]
[  378.674727]  [<ffffffffa01fde8a>] LNetNIFini+0x6a/0x110 [lnet]
[  378.676532]  [<ffffffffa0622b15>] ptlrpc_ni_fini+0x175/0x200 [ptlrpc]
[  378.678598]  [<ffffffffa0622e53>] ptlrpc_exit_portals+0x13/0x20 [ptlrpc]
[  378.680850]  [<ffffffffa06b59aa>] ptlrpc_exit+0x22/0x678 [ptlrpc]
[  378.683338]  [<ffffffff81108aab>] SyS_delete_module+0x19b/0x300
[  378.684809]  [<ffffffff817c8e15>] system_call_fastpath+0x1c/0x21
[  378.686727]  [<ffffffffffffffff>] 0xffffffffffffffff
[  378.688144] Kernel panic - not syncing: LBUG

This reverts commit db81f3f293dbc0c9dba90ea1153f554b33fbb80b.

Change-Id: Id12f9d3ec4af3ab37158b3e6049d2ea971d86913
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36173

4 years ago LU-12603 ldlm: Check cancel lock count for correctness 08/36108/3
Oleg Drokin [Sat, 17 Aug 2019 05:36:07 +0000 (01:36 -0400)]
LU-12603 ldlm: Check cancel lock count for correctness

Make sure the number of locks we are going to cancel fits into
the supplied buffer first.
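
The check described above amounts to validating a count taken from the request against the size of the buffer that came with it. A minimal sketch of that pattern follows; `struct lock_handle_sketch` and `cancel_count_fits()` are hypothetical names chosen for illustration and are not taken from the ldlm code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one lock handle carried in the request buffer. */
struct lock_handle_sketch {
    uint64_t cookie;
};

/*
 * Return true only if 'count' handles fit in a buffer of 'buf_bytes' bytes.
 * Dividing the buffer size (rather than multiplying the count) avoids
 * integer overflow when the count comes from an untrusted peer.
 */
static bool cancel_count_fits(int32_t count, size_t buf_bytes)
{
    if (count < 0)
        return false;
    return (size_t)count <= buf_bytes / sizeof(struct lock_handle_sketch);
}
```

With 8-byte handles, a 16-byte buffer accepts a count of 2 and rejects a count of 3 or any negative count.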

Lustre-change: https://review.whamcloud.com/35806
Lustre-commit: 7cc43aef98f6a759cbc5ae572123b44803c0ccd2

Change-Id: I93887133532bf7ee2be27114b1972aa64e06623c
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yunye Ry <yunye.ry@alibaba-inc.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36108
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12566 mdc: hold lock while walking changelog dev list 35/35835/2
Andreas Dilger [Thu, 1 Aug 2019 20:55:58 +0000 (14:55 -0600)]
LU-12566 mdc: hold lock while walking changelog dev list

In mdc_changelog_cdev_finish() we need chlg_registered_dev_lock
while walking and changing entries on the chlog_registered_devs
and ced_obds lists in chlg_registered_dev_find_by_obd().

Move the calling of chlg_registered_dev_find_by_obd() under the
mutex, and add assertions to the places where the lists are walked
and changed that the mutex is held.

Lustre-change: https://review.whamcloud.com/35668
Lustre-commit: a260c530801db7f58efa93b774f06b0ce72649a3

Fixes: 1d40214d96dd ("LU-7659 mdc: expose changelog through char devices")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib62fdff87cde6a4bcfb9bea24a2ea72a933ebbe5
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35835
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12045 tests: honor EXCEPT tests when using ONLY list 01/35901/2
James Nunez [Wed, 22 May 2019 16:22:19 +0000 (10:22 -0600)]
LU-12045 tests: honor EXCEPT tests when using ONLY list

The Lustre test framework allows a user to specify a subset
of tests to run using the ONLY parameter or --only flag.
The test framework also allows the user to specify a list of
tests to skip using the EXCEPT or ALWAYS_EXCEPT parameters.
By default, if the ONLY parameter or --only flag is used,
the EXCEPT and ALWAYS_EXCEPT lists are ignored.

Add a flag to auster, -H, and an environment variable,
HONOR_EXCEPT, to skip the tests on the ALWAYS_EXCEPT,
EXCEPT and SLOW lists when using the ONLY/--only parameter.
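
The resulting selection rule can be modelled as a small predicate. The sketch below is written in C for illustration only (the test framework itself is shell); `should_run()` and its parameters are hypothetical, and SLOW tests are folded into `in_except` for brevity.

```c
#include <stdbool.h>

/*
 * Decide whether a single test should run.
 *   only_set     - an ONLY/--only list was supplied
 *   in_only      - this test is on the ONLY list
 *   in_except    - this test is on an EXCEPT/ALWAYS_EXCEPT (or SLOW) list
 *   honor_except - HONOR_EXCEPT / "auster -H" was requested
 */
static bool should_run(bool only_set, bool in_only,
                       bool in_except, bool honor_except)
{
    if (only_set) {
        if (!in_only)
            return false;
        /* By default ONLY overrides the except lists; -H restores them. */
        return !(honor_except && in_except);
    }
    /* Without ONLY, excepted tests are always skipped. */
    return !in_except;
}
```

In this model a test that is on both the ONLY and EXCEPT lists runs by default and is skipped only when `honor_except` is set.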

Lustre-commit: e636a709bf5948cd944ca9a42d4b74f07557a2ac
Lustre-change: https://review.whamcloud.com/34938

Test-Parameters: trivial
Test-Parameters: envdefinitions=ONLY="40-43" testlist=sanity
Test-Parameters: envdefinitions=ONLY="40-43" austeroptions=-H testlist=sanity
Test-Parameters: envdefinitions=SLOW="no",ONLY="27" testlist=sanity
Test-Parameters: envdefinitions=SLOW="no",ONLY="27" austeroptions=-H testlist=sanity
Test-Parameters: envdefinitions=SLOW="yes",ONLY="27" testlist=sanity
Test-Parameters: envdefinitions=SLOW="yes",ONLY="27" austeroptions=-H testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I173e48e1d2dc3b404d148146639a13148bc48a3d
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35901
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12017 ldlm: DoM truncate deadlock 37/35937/3
Andriy Skulysh [Thu, 6 Jun 2019 12:22:00 +0000 (15:22 +0300)]
LU-12017 ldlm: DoM truncate deadlock

setxattr takes the inode lock and sends a reint to the MDS.
truncate takes the MDS_INODELOCK_DOM lock and then wants
to acquire the inode lock.

The MDS locks are for different bits,
MDS_INODELOCK_UPDATE|MDS_INODELOCK_XATTR vs
MDS_INODELOCK_DOM, but they block each other if
some blocking lock was present earlier.

If a waiting IBITS lock has no conflicts with any lock in the
granted queue or any lock ahead of it in the waiting queue, then
it can be granted.

Use separate waiting lists for each ibit to eliminate full
lr_waiting list scan.
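
The grant rule can be sketched as a bitmask test. This is a simplified model, not the ldlm implementation: it ignores lock modes and treats any two locks with overlapping bits as conflicting, and the `SK_IBIT_*` values are placeholders rather than the real MDS_INODELOCK_* values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define SK_IBIT_UPDATE (UINT64_C(1) << 0)
#define SK_IBIT_XATTR  (UINT64_C(1) << 1)
#define SK_IBIT_DOM    (UINT64_C(1) << 2)

/* True if 'bits' overlaps any entry of 'queue'. */
static bool conflicts_with_any(uint64_t bits, const uint64_t *queue, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (bits & queue[i])
            return true;
    return false;
}

/*
 * A waiting lock may be granted when it conflicts with nothing in the
 * granted queue and nothing queued ahead of it in the waiting queue.
 */
static bool can_grant(uint64_t bits,
                      const uint64_t *granted, size_t n_granted,
                      const uint64_t *ahead, size_t n_ahead)
{
    return !conflicts_with_any(bits, granted, n_granted) &&
           !conflicts_with_any(bits, ahead, n_ahead);
}
```

Under this model a DOM request is not held up by granted or queued UPDATE|XATTR locks, which is the situation the commit describes.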

Lustre-change: https://review.whamcloud.com/35057
Lustre-commit: 2250e072c37855d611aa64027945981fe2c8f4d7

Cray-bug-id: LUS-6970
Change-Id: I95b2ed0b1a0063b7ece5277a5ee06e2511d44e5f
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35937
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11285 mdt: improve IBITS lock definitions 55/35955/2
Andreas Dilger [Mon, 3 Jun 2019 18:21:53 +0000 (12:21 -0600)]
LU-11285 mdt: improve IBITS lock definitions

Move MDS_INODELOCK_* flags into a named enum, and add the definitions
for the newer flags into wirecheck/wiretest to ensure consistency.

Rename MDS_INODELOCK_MAXSHIFT to MDS_INODELOCK_NUMBITS so that it holds the
current number of lock bits, rather than one less than the number of lock
bits, since the only two places that use it expect it to be one larger than
it is.  Fix uses of MDS_INODELOCK_NUMBITS to be the number of lock bits.  This
does not change the value of MDS_INODELOCK_FULL, which is used in the
protocol to exchange supported lock bits between client and server.
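
The off-by-one that motivates the rename can be illustrated with a toy layout. The sketch below assumes four lock bits at positions 0..3; the `SK_*` names are hypothetical and do not reproduce the real definitions.

```c
/* Hypothetical layout: four lock bits occupying bit positions 0..3. */
#define SK_NUMBITS  4u
#define SK_MAXSHIFT (SK_NUMBITS - 1u)          /* old-style "one less" value */
#define SK_FULL     ((1u << SK_NUMBITS) - 1u)  /* mask of all defined bits */

/* Count the bits of 'mask' that a loop bounded by 'limit' actually visits. */
static unsigned int bits_visited(unsigned int mask, unsigned int limit)
{
    unsigned int i, n = 0;

    for (i = 0; i < limit; i++)
        if (mask & (1u << i))
            n++;
    return n;
}
```

A loop bounded by `SK_NUMBITS` sees all four bits of `SK_FULL`, while one bounded by `SK_MAXSHIFT` silently misses the highest bit.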

Lustre-change: https://review.whamcloud.com/35045
Lustre-commit: 3611352b699ce479779c0ff92ca558d9321e58a2

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0c2985bcc602b7182d5db2cf8d590923be2cab07
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35955
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
4 years ago LU-11761 fld: let's caller to retry FLD_QUERY 61/35661/2
Hongchao Zhang [Thu, 4 Jul 2019 13:39:24 +0000 (09:39 -0400)]
LU-11761 fld: let's caller to retry FLD_QUERY

In fld_client_rpc(), if the FLD_QUERY request between MDTs fails
with -EWOULDBLOCK because the connection is lost, return -EAGAIN
to notify the caller to retry.

It also reverts the patch https://review.whamcloud.com/12586/, which
was landed as b2_6_90_0-5-g6db07f0 to avoid returning -EAGAIN from
lod_object_init(), where it confused lu_object_find_at() (which assumed
the object was dying when it encountered -EAGAIN). In the current Lustre
version, lu_object_find_at() just returns the found object and lets the
caller check whether it is dying.

Fixes: 6db07f095fba ("LU-5871 lod: Do not return EAGAIN in lod_object_init")

Lustre-change: https://review.whamcloud.com/34962
Lustre-commit: e3f6111dfd1c6f2266d0beef67e5a7514a6965d0

Change-Id: Ie83ebfdae2bd50c96a59a065f7f3c3dcfad04e42
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35661
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12163 lnet: fix cpt locking 32/36032/3
Amir Shehata [Sat, 6 Apr 2019 00:38:38 +0000 (17:38 -0700)]
LU-12163 lnet: fix cpt locking

In lnet_select_pathway() the call to lnet_handle_send_case_locked()
can result in sd_cpt being changed. If this function returns
REPEAT_SEND, we'll go back to the again label. It is possible at
this time to initiate discovery, which will unlock the cpt.
If the local cpt isn't updated we could potentially be manipulating
the wrong cpt, resulting in some form of corruption or deadlock.

Lustre-change: https://review.whamcloud.com/34607
Lustre-commit: f6d63067e1ec00009b9da5cdb263fe14e7e503e1

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifd39b0d84f8cce859151f7cc900a082481dd7218
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11816 lnet: setup health timeout defaults 31/36031/3
Amir Shehata [Wed, 19 Dec 2018 23:55:49 +0000 (15:55 -0800)]
LU-11816 lnet: setup health timeout defaults

Enable the health feature by default.
Set the transaction timeout to a default of 10 seconds and the
retry count to 3 when health is enabled. When health
is disabled, set the default transaction timeout to 50.
When toggling between health enabled and disabled, the defaults
will always kick in.

Lustre-change: https://review.whamcloud.com/34252
Lustre-commit: 8632e94aeb7e62da07f342a9897d15dfd8251148

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I153c2822898b44e33871ec827de7e61f153bb1db
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36031
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12626 lnet: create existing net returns EEXIST 53/35953/2
Olaf Faaland [Fri, 2 Aug 2019 16:38:50 +0000 (09:38 -0700)]
LU-12626 lnet: create existing net returns EEXIST

When "lnetctl net add" is called for an interface/net pair that
already exists, the error returned should be EEXIST, so the
user knows that the net is already configured.
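
The intended behaviour can be sketched with a toy table of configured nets. Nothing below is taken from the LNet sources: `struct sk_net_table`, `sk_net_add()` and the table size are hypothetical, and only the use of -EEXIST for a duplicate pair reflects the commit text.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define SK_MAX_NETS 8  /* hypothetical table capacity */

struct sk_net {
    char net[16];
    char intf[16];
};

struct sk_net_table {
    struct sk_net entries[SK_MAX_NETS];
    int count;
};

/* Add an interface/net pair; a pair that is already present yields -EEXIST. */
static int sk_net_add(struct sk_net_table *t, const char *net, const char *intf)
{
    struct sk_net *slot;
    int i;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].net, net) == 0 &&
            strcmp(t->entries[i].intf, intf) == 0)
            return -EEXIST;

    if (t->count == SK_MAX_NETS)
        return -ENOSPC;

    slot = &t->entries[t->count++];
    snprintf(slot->net, sizeof(slot->net), "%s", net);
    snprintf(slot->intf, sizeof(slot->intf), "%s", intf);
    return 0;
}
```

A caller can then distinguish "already configured" from other failures by testing for -EEXIST specifically.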

Lustre-change: https://review.whamcloud.com/35681
Lustre-commit: 4aa71267cc0317e126843509f1c5b237f469414b

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Idab79ab288a11a2920793f27df235b4dfab497fe
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35953
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12485 obdclass: 0-nlink race in lu_object_find_at() 34/35834/2
Lai Siyao [Fri, 28 Jun 2019 12:19:56 +0000 (20:19 +0800)]
LU-12485 obdclass: 0-nlink race in lu_object_find_at()

There is a race in lu_object_find_at(): in the gap between
lu_object_alloc() and hash insertion, another thread may
have allocated another object for the same file and unlinked
it, so we may get an object with 0-nlink, which will trigger
an assertion in osd_object_release().

To avoid this race, initialize the object after hash insertion.
But this may cause an uninitialized object to be found in the cache;
if so, wait for the object to be initialized by the allocator.

To reproduce the race, introduce cfs_race_wait() and
cfs_race_wakeup(): cfs_race_wait() causes the calling thread
to wait on the race, while cfs_race_wakeup() wakes
up the waiting thread. As with cfs_race(), CFS_FAIL_ONCE
should be set together with fail_loc.

Add sanityn test_84.

Lustre-change: https://review.whamcloud.com/35360
Lustre-commit: 2ff420913b9718ee8d80ae51fddc6e5df4a3148a

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0869f254544256987b73f0ff92f75e4d1562e566
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35834
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12267 tests: update filter in acl for SElinux case 57/35957/2
Sebastien Buisson [Tue, 7 May 2019 15:55:04 +0000 (00:55 +0900)]
LU-12267 tests: update filter in acl for SElinux case

With SELinux enforced on the client, sanity.sh test_103a fails because
the "ls -l" command produces an extra '.' at the end to indicate that
extra security attributes are set.

So update the filter by removing this trailing '.' from the output.

Lustre-change: https://review.whamcloud.com/34818
Lustre-commit: 3f6294a482651802fb97175b6e8c6568a371352a

Test-Parameters: trivial testlist=sanity envdefinitions=ONLY=103a
Test-Parameters: clientselinux testlist=sanity envdefinitions=ONLY=103a
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie684a3fe02f0f2821c8059855165a0f9dd585b72
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35957
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-11760 ofd: limit num of objects to create in 1 transaction 51/35951/2
Sergey Cheremencev [Fri, 28 Jun 2019 20:42:28 +0000 (23:42 +0300)]
LU-11760 ofd: limit num of objects to create in 1 transaction

Set the th_sync flag when the number of objects to create per
sequence reaches OST_MAX_PRECREATE in one transaction.
This is needed to avoid gaps after OST failover.
See details in LU-11760.

Lustre-change: https://review.whamcloud.com/35373
Lustre-commit: 4485ee8be4cf224e2543f6344efc6e1cb295a0a7

Change-Id: Ie29de5a42e757b07561749982359c01df999e798
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35951
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12600 tgt: shortio size should be unsigned 67/35867/2
Patrick Farrell [Tue, 30 Jul 2019 18:10:32 +0000 (14:10 -0400)]
LU-12600 tgt: shortio size should be unsigned

The short_io_size value is accepting unsigned values from
req_capsule_get_size, and so needs to be unsigned as well.

If it's not, it's possible for the short_io_size memcopy to
act on an incorrect value and cause memory corruption.
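
One way such a problem can arise is sketched below. This is an assumption for illustration, since the commit does not spell out the exact code path: a signed size that is negative passes an upper-bound check, and would then be converted to a huge length when handed to memcpy(). The names and the 4096-byte limit are hypothetical.

```c
#include <stdbool.h>

#define SK_SHORT_IO_MAX 4096  /* hypothetical buffer capacity in bytes */

/* Buggy pattern: a negative signed size slips past an upper-bound check. */
static bool size_ok_signed(int short_io_size)
{
    return short_io_size <= SK_SHORT_IO_MAX;
}

/* Fixed pattern: as an unsigned value the same bits are seen as too large. */
static bool size_ok_unsigned(unsigned int short_io_size)
{
    return short_io_size <= SK_SHORT_IO_MAX;
}
```

Here `size_ok_signed(-1)` accepts the value, while `size_ok_unsigned((unsigned int)-1)` rejects it, which is why keeping the variable unsigned end to end closes the gap.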

Lustre-change: https://review.whamcloud.com/35653
Lustre-commit: 4c3864cf97711d73b12905fea720570cf814d179

Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I043e314cd43a7b40519f951a605fa5a36ff91dcf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12605 tgt: check client data size in target_handle_connect() 35/35935/2
Emoly Liu [Fri, 9 Aug 2019 07:29:30 +0000 (15:29 +0800)]
LU-12605 tgt: check client data size in target_handle_connect()

Check the client data size (negative or excessively large) to guard
against memcpy corruption.
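
A range check of this kind, placed in front of the copy, might look like the sketch below. The function name, the 64-byte limit and the choice of -EINVAL are assumptions made for illustration; they are not taken from target_handle_connect().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SK_CLIENT_DATA_MAX 64  /* hypothetical size of the destination buffer */

/*
 * Copy 'size' bytes of client-supplied data into 'dst', which must hold at
 * least SK_CLIENT_DATA_MAX bytes. Reject sizes that are negative or larger
 * than the destination before calling memcpy().
 */
static int sk_copy_client_data(char *dst, const char *src, int size)
{
    if (size < 0 || size > SK_CLIENT_DATA_MAX)
        return -EINVAL;
    memcpy(dst, src, (size_t)size);
    return 0;
}
```

Checking both bounds before the cast to `size_t` matters: a negative `int` would otherwise become a very large unsigned length.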

Lustre-change: https://review.whamcloud.com/35711
Lustre-commit: 149f005a3199eee13fe6396671613a0f620ee0cc

Change-Id: Ided26dea0e2bbb79e607c626810834ca947497d4
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12394 llite: Fix extents_stats 66/35866/2
Patrick Farrell [Tue, 11 Jun 2019 18:54:20 +0000 (14:54 -0400)]
LU-12394 llite: Fix extents_stats

Patch 32517 from LU-8066 changed:
        (1 << LL_HIST_START << i)

To:

        BIT(LL_HIST_START << i)

But these are not equivalent because this changes the order
of operations.  The earlier one does the operations in this
order:
        (1 << LL_HIST_START) << i

The new one is this order:
        1 << (LL_HIST_START << i)

Which is quite different, as it's left shifting
LL_HIST_START directly, and LL_HIST_START is a number of
bits.

The goal is really just to start with BIT(LL_HIST_START)
and left shift by one (going from 4K, to 8K, etc) each
time, so just use:
        BIT(LL_HIST_START + i)

The result of this was that all i/os over 8K were placed in
the 4K-8K stat bucket, because the loop exited early.
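
The difference between the three expressions can be checked in isolation. The sketch below is not the llite code; it assumes LL_HIST_START is 12 (so that the first bucket boundary is 4K, as the text implies) and defines its own BIT() macro in the style of the kernel's.

```c
#include <stdint.h>

/* Assumption for illustration: the first bucket boundary is 2^12 = 4K. */
#define LL_HIST_START 12
/* Local stand-in for the kernel's BIT() macro. */
#define BIT(nr) (UINT64_C(1) << (nr))

/* Original expression: (1 << LL_HIST_START) << i */
static uint64_t bucket_original(unsigned int i)
{
    return (UINT64_C(1) << LL_HIST_START) << i;
}

/* Shift count requested by the broken form, BIT(LL_HIST_START << i). */
static unsigned int broken_shift_count(unsigned int i)
{
    return LL_HIST_START << i;
}

/* Fixed expression: BIT(LL_HIST_START + i) */
static uint64_t bucket_fixed(unsigned int i)
{
    return BIT(LL_HIST_START + i);
}
```

For i = 1 the original and fixed forms both give 8192, whereas the broken form asks for bit 24 (16M). By i = 3 the broken shift count is 96, which exceeds the width of a 64-bit value.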

Also add mmap'ed reads & writes to extents_stats.

Add test for extents_stats.

Fixes: adb5aca3d673 ("LU-8066 llite: Move all remaining procfs entries
                     to debugfs")

Lustre-change: https://review.whamcloud.com/35075
Lustre-commit: d31a4dad4e698c537dff3d018fd67f196b2b293f

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iab4dc097234d411601a18d501075df45791d1138
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35866
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years ago LU-12615 mdt: check mdt_object 69/35869/2
Hongchao Zhang [Sat, 13 Jul 2019 12:07:07 +0000 (08:07 -0400)]
LU-12615 mdt: check mdt_object

When processing getattr, getxattr, swap_layouts and sync RPCs,
the mdt_object should be checked to verify that there is a valid
RMF_MDT_BODY field and that OBD_MD_FLID is set properly.

Lustre-change: https://review.whamcloud.com/35764
Lustre-commit: e5e0bdb7a5c2d47ceaa2d1c190806d1be4999129

Change-Id: Ibb6aaa5ec5eb4b7284f4d5567a618a908d66920c
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35869
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-12267 tests: filter trailing '.' for SELinux 58/35958/2
James Nunez [Fri, 31 May 2019 21:28:20 +0000 (15:28 -0600)]
LU-12267 tests: filter trailing '.' for SELinux

When SELinux is enforced, sanity test 420 fails due to
the "ls -n" command producing an extra '.' at the end of
the file/directory permissions to indicate extra security
attributes are set.

We need to filter out the trailing '.' in the 'ls -n'
output for testing to pass when SELinux is enabled.

Lustre-change: https://review.whamcloud.com/35026
Lustre-commit: f000996069acc7d535b7574a9d9a4ab65e753ff0

Test-Parameters: trivial envdefinitions=ONLY=420 testlist=sanity
Test-Parameters: clientselinux envdefinitions=ONLY=420 testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I4a2f199d2ef4a7b1b6a1b381041b384bb0077cc6
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35958
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years ago LU-11873 tests: Increase barrier freeze time 52/35952/2
Patrick Farrell [Fri, 28 Jun 2019 15:32:29 +0000 (11:32 -0400)]
LU-11873 tests: Increase barrier freeze time

Barrier freeze times of 10 seconds or less are roughly the
same length as ZFS commit intervals, and because barriers
generate sync ops, they have to wait for those.  This means
that a 10 second barrier will occasionally expire before
the commit has finished.

Switch to barriers of at least 20 seconds.

Lustre-change: https://review.whamcloud.com/35361
Lustre-commit: 96771280b330af07781326ff8811facd1ca39deb

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I50fc8315c791ed444ccf39755441fdbe3aa1db6c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12657 llite: forget cached ACLs properly 70/35870/2
Alex Zhuravlev [Fri, 9 Aug 2019 19:43:45 +0000 (23:43 +0400)]
LU-12657 llite: forget cached ACLs properly

Lustre with linux-4.* fails ACL tests (e.g. sanity/103 and sanityn/25)
because ll_lock_cancel_bits() does not reset i_acl and i_default_acl
to their initial state.  Use the kernel's forget_all_cached_acls() to do so.

Lustre-change: https://review.whamcloud.com/35756
Lustre-commit: 3df034f8f46b0d22829f7ac83cbf9871823c093c

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I468b775e13ba0f7279a6aa320983705f5e79187a
Reviewed-by: Neil Brown <neilb@suse.com>
Tested-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35870
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12457 kernel: RHEL 7.7 server support 08/35808/2
Jian Yu [Sat, 17 Aug 2019 05:43:49 +0000 (22:43 -0700)]
LU-12457 kernel: RHEL 7.7 server support

This patch makes changes to support the new RHEL 7.7 release
for the Lustre server.

Test-Parameters: trivial clientdistro=el7.7 serverdistro=el7.7

Change-Id: Ic56e087e6c89f1bbd1ab247c44b2e979828f34f9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35808
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12470 tests: clear MDT-MDT locks for pdo tests 65/35865/2
Patrick Farrell [Fri, 5 Jul 2019 16:50:11 +0000 (12:50 -0400)]
LU-12470 tests: clear MDT-MDT locks for pdo tests

It is not sufficient to clear client locks to avoid
spillover from one test to the next in the pdo tests; we
must also clear MDT-MDT locks or we can end up waiting for
one of those.

Lustre-change: https://review.whamcloud.com/35321
Lustre-commit: 43ed7101e10e395839f9406bead6a5ac4fb02997

Test-Parameters: trivial testlist=sanityn
Test-Parameters: fstype=zfs testlist=sanityn
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanityn
Test-Parameters: mdscount=2 mdtcount=4 fstype=zfs testlist=sanityn
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8b6a1a6e9a1268a5d87bcb216f54736118ae7ba0
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11537 osp: avoid nested transaction 32/35832/2
Sergey Cheremencev [Mon, 25 Jun 2018 13:28:25 +0000 (16:28 +0300)]
LU-11537 osp: avoid nested transaction

Don't create and start a new transaction in
osp_precreate_reserve (osp_sync_force)
because one has already been started in mdd_create.
The new transaction rewrites oti_declare_ops_cred,
resulting in an assertion in osd_trans_exec_op:

osd_trans_exec_op()) lustre-MDT0000: opcode 3: credits = 0, rollback = 4
osd_trans_exec_op()) ASSERTION( !ldiskfs_track_declares_assert ) failed:
...
 #2 [ffff88008983f600] panic at ffffffff816a863f
 #3 [ffff88008983f680] lbug_with_loc at ffffffffc0513854 [libcfs]
 #4 [ffff88008983f6a0] osd_create at ffffffffc0dfac32 [osd_ldiskfs]
 #5 [ffff88008983f718] lod_sub_create at ffffffffc101b585 [lod]
 #6 [ffff88008983f7c0] lod_create at ffffffffc100d6e9 [lod]
 #7 [ffff88008983f800] mdd_create_object_internal at ffffffffc0ed8888 [mdd]
 #8 [ffff88008983f838] mdd_create_object at ffffffffc0ec3e05 [mdd]
 #9 [ffff88008983f8b0] mdd_create at ffffffffc0ecc673 [mdd]

Lustre-change: https://es-gerrit.dev.cray.com/153461
Lustre-commit: f9c20f472cb9f500a609bee1db68868cf2ac3c13

Change-Id: Ic2c4589a9a1f640c7a0aa989fc62d81ca08f917f
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Cray-bug-id: LUS-6098
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/33391
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35832
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12034 obdclass: disable server-only code on client 66/35566/3
Andreas Dilger [Thu, 18 Jul 2019 19:52:11 +0000 (13:52 -0600)]
LU-12034 obdclass: disable server-only code on client

The lu_env_add(), lu_env_remove(), and lu_env_find() functions are
only used on the server.  Conditionally remove them when doing a
client-only build.

Fixes: fce8d80624fd ("LU-12034 obdclass: put all service env on list")

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I15f4d4de583bb3f9d16adad3ea16f961853ebbe5
Reviewed-on: https://review.whamcloud.com/35566
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12387 tests: Validate l_tunedisk max_sectors_kb tuning 72/35372/2
Chris Horn [Thu, 6 Jun 2019 15:59:18 +0000 (10:59 -0500)]
LU-12387 tests: Validate l_tunedisk max_sectors_kb tuning

Add test to ensure that l_tunedisk only tunes the max_sectors_kb
value of OST devices, and that it properly tunes any slave devices.

Test-parameters: trivial
Test-parameters: fstype=ldiskfs testlist=conf-sanity \
 envdefinitions=ONLY=125

Lustre-change: https://review.whamcloud.com/35081
Lustre-commit: ac8bbb3ddd646e4aa04b77cb1e7640b5865f2c04

Cray-bug-id: LUS-7358
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I414526e71fd7ac2811d7c0e8a6afd80a50788258
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35372
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12660 kernel: kernel update SLES12 SP4 [4.12.14-95.29.1] 75/35775/2
Jian Yu [Mon, 12 Aug 2019 17:38:31 +0000 (10:38 -0700)]
LU-12660 kernel: kernel update SLES12 SP4 [4.12.14-95.29.1]

Update SLES12 SP4 kernel to 4.12.14-95.29.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT=103a

Change-Id: I93c9a255bfa7f5048cd5acf9efe3af707e08e38c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35775
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted 12/35812/2
Lai Siyao [Sun, 18 Aug 2019 06:09:19 +0000 (23:09 -0700)]
LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted

On systems with PAGE_SIZE > 4k, mdc_adjust_dirpages() adjusts the dir page
end hash using a le64_to_cpu() value, but it should be little endian.

Fixes: 9d087dfd0fd ("LU-4516 mdc: missing lexxx_to_cpu in
mdc_read_entry")

This patch is back-ported from the following one:
Lustre-commit: d8b19ae6617733df003a906aca1791791a5f0eff
Lustre-change: https://review.whamcloud.com/35517

Test-Parameters: clientarch=ppc64 envdefinitions=ONLY="18 22 32 48" \
testlist=sanity

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I89bb8b93f1fe5f7962f0b80d122ef9965cf15c63
Reviewed-on: https://review.whamcloud.com/35812
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12586 lov: Correct write_intent end for trunc 36/35836/2
Patrick Farrell [Wed, 24 Jul 2019 19:50:23 +0000 (15:50 -0400)]
LU-12586 lov: Correct write_intent end for trunc

When instantiating a layout, the server interprets the
write intent from the client as the range [start, end), not
including the last byte.

This is correct for writes because the last byte given for
a write is actually 'endpos', the resulting file pointer
position, and so is not included.

However, truncate is specifying a size, not an endpos, so
truncate is [start, size].  To make this work with the
[start, end) processing for write_intents, we have to add
1 to the size when sending a write intent.

Without this, a truncate operation to the first byte of a
new layout component fails silently because the component
is not instantiated.
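
The off-by-one can be sketched in plain C as below. This is an
illustration under stated assumptions, not the lov code: struct
ex_extent and the ex_* helpers are hypothetical, and the start offset
used for the truncate intent is left to the caller because the message
does not say what it is.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end), as the server reads a write intent. */
struct ex_extent {
	uint64_t start;
	uint64_t end;
};

/* Two half-open ranges overlap if each one starts before the other ends. */
static bool ex_extent_overlaps(struct ex_extent a, struct ex_extent b)
{
	return a.start < b.end && b.start < a.end;
}

/* Write: 'endpos' is the file position after the write, so it is excluded. */
static struct ex_extent ex_write_intent(uint64_t pos, uint64_t count)
{
	struct ex_extent e = { pos, pos + count };

	return e;
}

/* Truncate: 'size' itself must be covered, so the exclusive end is size + 1. */
static struct ex_extent ex_trunc_intent(uint64_t start, uint64_t size)
{
	struct ex_extent e = { start, size + 1 };

	return e;
}
```

For a component covering [1 MiB, 2 MiB), a truncate to exactly 1 MiB
sent as [0, 1 MiB) does not overlap the component, while
ex_trunc_intent(0, 1 MiB) yields [0, 1 MiB + 1) and does.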

Lustre-change: https://review.whamcloud.com/35607
Lustre-commit: c32c7401426d46b371fa993bba17265443fefa1b

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id2b07abe73455bf1f0ed841ad08c5f381a871315
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35836
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11634 tests: sanityn/test_77 improvements 35/35735/2
Vladimir Saveliev [Mon, 9 Apr 2018 09:18:50 +0000 (12:18 +0300)]
LU-11634 tests: sanityn/test_77 improvements

sshd limits number of simultaneous unauthenticated connections via
MaxStartups configuration parameter. By default, 10 connections are
allowed. nrs_write_read() tries to run up to 32 do_nodes() in
parallel, causing sshd to drop some of the connections.

The fix is to have do_nodes() start the required number of dd-s in
parallel.

Minor changes which were probably intended during development:
- Test filenames include $HOSTNAME, apparently so that each client
  works with its own file. Add the missing escaping backslashes so
  that $HOSTNAME works as expected.
- Add conv=notrunc parameter for dd-s which write lustre file at
  different seeks.
- Have reading dd-s read files which were created especially for
  that.
- Use /dev/null instead of /dev/zero to throw read data away.

Lustre-change: https://review.whamcloud.com/33607
Lustre-commit: 43ac1425ad9e8c5fc1e7deff579a443b5c9c7a58

Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I496b0f6b50811351ac8e0e606cf5a20843fab5d4
Cray-bug-id: LUS-2493
Test-Parameters: testlist=sanityn envdefinitions=ONLY=77
Reviewed-on: https://review.whamcloud.com/33607
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/35735
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10756 ptlrpc: change IMPORT_SET_* macros into real functions 95/35795/2
James Simmons [Thu, 11 Jul 2019 00:52:34 +0000 (20:52 -0400)]
LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions

Make the IMPORT_SET_STATE_NOLOCK and IMPORT_SET_STATE macros into
normal functions. Since import_set_state_nolock() is basically a
wrapper around __import_set_state() we can merge both functions.

Lustre-change: https://review.whamcloud.com/35463
Lustre-commit: cf78502e48d6dbbc0d6c113e573ba9c68c5c311e

Change-Id: Idaa6aeb81ff2282e2f83d758a267129e686bd794
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35795
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-12343 osc: Fix dom handling in weight_ast 58/35858/2
Patrick Farrell [Wed, 29 May 2019 15:02:19 +0000 (11:02 -0400)]
LU-12343 osc: Fix dom handling in weight_ast

The DOM bit can be cancelled at any time during calls to
weigh_ast, so:

1. We cannot assert that it is present
2. We cannot use it to identify the !LDLM_EXTENT case when
calling osc_lock_weight

Lustre-change: https://review.whamcloud.com/34966
Lustre-commit: 4f3ce87a06e6ed90373218d3aa1eb34a7675db65

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic3e7370580e35d8ae06b8330971959e0d55a4e81
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35858
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-8130 libcfs: don't include rhashtable if unavailable 65/35565/3
Andreas Dilger [Fri, 14 Dec 2018 22:43:41 +0000 (15:43 -0700)]
LU-8130 libcfs: don't include rhashtable if unavailable

Don't include <linux/rhashtable.h> if it is not available.

Lustre-change: https://review.whamcloud.com/34020
Lustre-commit: 29d627f860bc1963f2103ea441577dbd18d71344

Fixes: ac8d93c9f6f9 ("LU-8130 libcfs: support latest rhashtable API")

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I80b2ee63fb2a438399359f8052a5063429254035
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35565
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12539 build: pass --with-o2ib when building deb packages 28/35828/2
Sebastien Buisson [Fri, 12 Jul 2019 13:23:29 +0000 (15:23 +0200)]
LU-12539 build: pass --with-o2ib when building deb packages

When building deb packages (make debs), '--with-o2ib' option is
not passed to ./configure called by package mechanism.
So Lustre deb packages are possibly built against wrong OFED headers.

Lustre-change: https://review.whamcloud.com/35481
Lustre-commit: 8d7f2674337e4f22e200e08ca1ac001ec24b4496

Test-Parameters: trivial
Test-Parameters: trivial clientdistro=ubuntu1804
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9cd1db54e77b97f46c0e0bdfe35084f1a268b70b
Reviewed-on: https://review.whamcloud.com/35828
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12457 kernel: new kernel [RHEL 7.7 3.10.0-1062.el7] 25/35725/2
Jian Yu [Thu, 8 Aug 2019 00:56:26 +0000 (17:56 -0700)]
LU-12457 kernel: new kernel [RHEL 7.7 3.10.0-1062.el7]

This patch makes changes to support new RHEL 7.7 release
for Lustre client.

Test-Parameters: trivial clientdistro=el7.7

Change-Id: I1fd68b56340c8674c9fae607e05faca04ba99a5a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35725
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12589 llite: swab LOV EA data in ll_getxattr_lov() 36/35736/2
Jian Yu [Thu, 8 Aug 2019 19:13:47 +0000 (12:13 -0700)]
LU-12589 llite: swab LOV EA data in ll_getxattr_lov()

On PPC client, the LOV EA data returned by getfattr from x86_64 server
was not swabbed to the host endian. While running setfattr, the data was
swabbed in ll_lov_setstripe_ea_info(), which caused magic mis-match in
ll_lov_user_md_size() and then ll_setstripe_ea() returned -ERANGE.

This patch fixed the above issue by swabbing LOV EA data in ll_getxattr_lov().

Test-Parameters: clientarch=ppc64 \
envdefinitions=ONLY="24D 102a" testlist=sanity

This patch is back-ported from the following one:
Lustre-commit: f4a5957164bb981c93072bb0a28118bb7207a209
Lustre-change: https://review.whamcloud.com/35626

Change-Id: I8069df0c8f07c0bedba2e27db7c3a5553f11afb4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35736
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11771 ldlm: use hrtimer for recovery to fix timeout messages 76/35276/3
James Simmons [Thu, 18 Apr 2019 23:07:39 +0000 (19:07 -0400)]
LU-11771 ldlm: use hrtimer for recovery to fix timeout messages

Currently the functions target_handle_connect/reconnect show an
incorrect time remaining until the end of recovery:

fs1-OST0000: Recovery already passed deadline 71578:57.
If you do not want to wait more, please abort the recovery by force.
...
fs1-OST0000: Denying connection for new client ...
(1 recovered, 11 in progress, and 1 evicted) to recover in 71578:57

This is due to the assumption that the monotonic clock and jiffies
were initialized at the same time, but that is not the case. So a
comparison between ktime_get_seconds() and jiffies converted to
seconds is invalid.

We solve this by replacing the recovery timer with a hrtimer based
one. There are many benefits to using a hrtimer over jiffies, like
better scaling, power profile, and better handling on tickless
systems. This also makes the code clearer by using just the real wall
clock in all cases.

Change-Id: I9d7e7e92e67ee942bc1dc51fbb0af7d8f53e54e1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34710
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-on: https://review.whamcloud.com/35276
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12516 mdd: support for volatile creation in .lustre 20/35620/3
Alex Zhuravlev [Tue, 18 Jun 2019 09:18:27 +0000 (13:18 +0400)]
LU-12516 mdd: support for volatile creation in .lustre

This is useful to enable striping manipulation by FIDs.

Lustre-change: https://review.whamcloud.com/35258
Lustre-commit: 9a0a864112550047ae7236c7a904dc7a9955880e

Change-Id: I4d5b1b13acdfef21ac46bf3557e9ab6d5ccc796b
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35620
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12501 utils: fix 'lfs df' printing loop 62/35662/2
Andreas Dilger [Wed, 10 Jul 2019 16:53:59 +0000 (10:53 -0600)]
LU-12501 utils: fix 'lfs df' printing loop

If the OS_STATE_NONROT flag is set for a device, the showdf() state
printing loop will spin endlessly because this bit is not printed,
so it is never cleared from the loop's state mask.

Declaring the obd_statfs_state_names[] array indexed by OS_STATE_*
flags also is problematic because the array will double in size as
new binary flags are added (already OS_STATE_NONROT results in an
array size of 0x200 = 512 entries).  Instead, declare a struct that
is indexed linearly and stores the OS_STATE_* flag in a field,
along with the name and whether the flag indicates a problem state.

The flag printing loop can iterate over the array of flags instead
of the os_state bits, which clarifies the for-loop iteration and is
equally efficient.

This also allows printing informational flags with "lfs df -v" so
that OS_STATE_NONROT and similar flags can be visible to users.
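
The table-driven loop described above might look roughly like the
following sketch. Everything here is illustrative: the ex_* names, the
single-letter codes and the flag values other than 0x200 (which the
message gives for OS_STATE_NONROT) are assumptions, not the definitions
used by lfs.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values; only 0x200 is taken from the commit message. */
enum {
	EX_STATE_DEGRADED = 0x001,
	EX_STATE_READONLY = 0x002,
	EX_STATE_NONROT   = 0x200,
};

/* One linear table entry per flag instead of an array indexed by flag. */
struct ex_state_name {
	uint32_t flag;		/* the state bit this entry describes */
	char	 name;		/* hypothetical one-letter code */
	int	 problem;	/* nonzero if the flag marks a problem state */
};

static const struct ex_state_name ex_state_names[] = {
	{ EX_STATE_DEGRADED, 'D', 1 },
	{ EX_STATE_READONLY, 'R', 1 },
	{ EX_STATE_NONROT,   'f', 0 },
};

/*
 * Write the letters for the flags set in 'state' into 'buf'.
 * Informational flags are shown only when 'verbose' is set.
 * The loop walks the table, so it ends even if 'state' has unknown bits.
 * Returns the number of letters written.
 */
static size_t ex_format_state(uint32_t state, int verbose,
			      char *buf, size_t buflen)
{
	size_t n = 0;
	size_t i;

	if (buflen == 0)
		return 0;

	for (i = 0; i < sizeof(ex_state_names) / sizeof(ex_state_names[0]); i++) {
		const struct ex_state_name *s = &ex_state_names[i];

		if (!(state & s->flag))
			continue;
		if (!s->problem && !verbose)
			continue;
		if (n + 1 < buflen)
			buf[n++] = s->name;
	}
	buf[n] = '\0';
	return n;
}
```

Because the iteration count is the table size rather than "until the
state mask is empty", a flag that is skipped or unknown cannot make the
loop spin.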

Fixes: 68635c3d9b3 ("LU-11963 osd: Add nonrotational flag to statfs")

Lustre-change: https://review.whamcloud.com/35456
Lustre-commit: e4d92a8a08acbdca6634decd4deb9fe5678ad7ba

Change-Id: Ib62e949ca56d691c4699d5f2d9439c42643ebbe5
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35662
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11963 osd: Add nonrotational flag to statfs 26/35226/7
Patrick Farrell [Thu, 13 Jun 2019 20:51:44 +0000 (16:51 -0400)]
LU-11963 osd: Add nonrotational flag to statfs

It is potentially useful for the MDS and userspace to
know whether or not an OST is using non-rotational media.

Add a flag to obd_statfs that reflects this.

Users can override this parameter in proc.

ZFS does not currently make this information available to
Lustre, so default to rotational and allow users to
override.

Lustre-Change: https://review.whamcloud.com/34235
Lustre-Commit: 68635c3d9b3113621b93fd989f1a3f8f064385b9

LU-12396 utils: lfs should not output 'nul' char

If lfs prints a nul char, it breaks parsing of the output.

Fixes: 68635c3d9b31 ("LU-11963 osd: Add nonrotational flag to statfs")

Lustre-Change: https://review.whamcloud.com/35137
Lustre-Commit: fd3958b61c5f1c7ed520f07553b999af5522d8e0

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iac2b54c5d8cc1eb79cdace764e93578c7b058661
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35226
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-11672 ldlm: always cancel aged locks regardless enabling or disabling lru resize 60/35660/2
Gu Zheng [Thu, 11 Jul 2019 05:52:38 +0000 (13:52 +0800)]
LU-11672 ldlm: always cancel aged locks regardless enabling or disabling lru resize

Currently cancelling aged locks is handled by the ldlm_pool_recalc
routine, and it only works when lru resize is enabled. This means that
if lru resize is disabled, aged locks stay cached even after they
reach ns_max_age.

But in principle, lru_max_age should behave the same whether lru
resize is enabled or disabled. lru_size is a hard limit on the number
of locks, whereas ns_max_age/lru_max_age is an elimination mechanism:
once a lock reaches lru_max_age it needs to be cancelled, regardless
of whether lru resize is enabled.

So fix it here by changing the lru flags when invoking ldlm_cancel_lru
to do the real cancel work: if lru resize is enabled, set the flag to
LDLM_LRU_FLAG_LRUR, otherwise LDLM_LRU_FLAG_AGED.
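
A rough sketch of that flag selection, for illustration only. The flag
names mirror the ones in this message, but the EX_ prefix, the numeric
values and the exact age comparison are assumptions rather than the
ldlm definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Names follow the commit message; the numeric values are made up. */
enum ex_lru_flag {
	EX_LRU_FLAG_AGED = 1,	/* cancel locks that exceeded the age limit */
	EX_LRU_FLAG_LRUR = 2,	/* use the lru-resize cancel policy */
};

/* Pick the cancel policy so that aged locks are handled in both modes. */
static enum ex_lru_flag ex_recalc_cancel_flag(bool lru_resize_enabled)
{
	return lru_resize_enabled ? EX_LRU_FLAG_LRUR : EX_LRU_FLAG_AGED;
}

/* Assumed age test: unused for at least max_age seconds. */
static bool ex_lock_is_aged(uint64_t now, uint64_t last_used, uint64_t max_age)
{
	return now - last_used >= max_age;
}
```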

Lustre-change: https://review.whamcloud.com/35467
Lustre-commit: e4c490bac7701435cb08ce444d9b23b8fd1dd839

Change-Id: Ic2df2550af87fd7209fdb31ca3730683d727a74d
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/35660
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-8066 tests: use lod / osp tunables on servers 49/35349/4
James Simmons [Tue, 11 Jun 2019 15:39:17 +0000 (11:39 -0400)]
LU-8066 tests: use lod / osp tunables on servers

Before the Lustre 2.4 OSD work, the lov and osc code was used on
both servers and clients. The OSD layer work created the new lod
and osp layers, which are server specific. To avoid breakage,
symlinks were created from the lod / osp to the lov / osc
directories in the proc tree on the server side.

It has been a very long time since that change so we can now
safely start to unwind that handling. The first step taken here
is to migrate the maloo test from using lov / osc for the server
tunables to using lod / osp instead.

Lustre-commit: c2f43d4c7a609def4292c5b9bee63c9a33cb4598
Reviewed-on: https://review.whamcloud.com/35185

Change-Id: I9dd562cd74d68aaa0226d5ab93042b52193604a1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35185
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35349
Tested-by: jenkins <devops@whamcloud.com>
4 years agoLU-11678 quota: make overquota flag for old req 16/34916/4
Hongchao Zhang [Fri, 29 Mar 2019 13:28:06 +0000 (09:28 -0400)]
LU-11678 quota: make overquota flag for old req

For an old request with the over quota flag, the flag should still
be marked at the OSC, because the old request could be processed
after the new request at the OST, so that it won't break the
quota enforcement at the OST.

Lustre-change: https://review.whamcloud.com/34645
Lustre-commit: c59cf862c3c06758c270564dd6e8948e167316b9

Test-Parameters: testlist=replay-single,replay-single,replay-single
Test-Parameters: testlist=replay-single,replay-single,replay-single
Change-Id: Ic34c438fe3f018c3b596b26ad6dc945547c8fada
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shilong Wang <wshilong@ddn.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34916
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10100 llite: swab LOV EA user data 33/35633/2
Jian Yu [Mon, 29 Jul 2019 07:33:52 +0000 (00:33 -0700)]
LU-10100 llite: swab LOV EA user data

Many sub-tests failed with "Invalid argument" failures
on PPC client because of the endianness issue.

This patch fixes the issue by adding a common function
lustre_swab_lov_user_md() to swab the LOV EA user data.

Test-Parameters: clientarch=ppc64 \
envdefinitions=ONLY=27 testlist=sanity

This patch is back-ported from the following one:
Lustre-commit: 9d17996766e0fa93b1029d2422d45d087edde389
Lustre-change: https://review.whamcloud.com/35291

Change-Id: I46bab0788300cd79c4e66e1a4990c3e1f7192391
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35633
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12095 ptlrpc: ocd_connect_flags are wrong during reconnect 35/35635/2
Andriy Skulysh [Wed, 27 Feb 2019 17:37:24 +0000 (19:37 +0200)]
LU-12095 ptlrpc: ocd_connect_flags are wrong during reconnect

Import connect flags are reset to original ones during
reconnect, so a request can be created with unsupported
features.

Use separate obd_connect_data to send connect request.

Lustre-change: https://review.whamcloud.com/34480
Lustre-commit: 1224084c6300d5b15ccb703dfe18209a0f1f12ab

Change-Id: I4cfc48bf7ef66c4f3832613e179030b0eb1d6fdf
Cray-bug-id: LUS-6397
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35635
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12575 build: add ibutils2 for MOFED build 31/35631/2
Minh Diep [Tue, 23 Jul 2019 00:07:02 +0000 (17:07 -0700)]
LU-12575 build: add ibutils2 for MOFED build

MOFED 4.6 includes ibutils2 instead of ibutils.
Remove the ofed rhel5 patch, which we don't need.

Lustre-change: https://review.whamcloud.com/35590
Lustre-commit: 26444d84c74e693e47c9785423e63402d32acb4f

Change-Id: I46c51eb8a194ea86bd8c951944e5c1427d0f37d0
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35631
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12510 osd: osd-zfs to release zrlock quickly 00/35600/3
Alexey Zhuravlev [Mon, 15 Jul 2019 18:01:59 +0000 (21:01 +0300)]
LU-12510 osd: osd-zfs to release zrlock quickly

Otherwise a few threads trying to access the same dnode can get stuck.
This patch is a quick workaround for the issue; it's supposed
to be replaced with a better patch using regular DMU API.

Lustre-commit: 88b329ac2ab568a25251f3f7c3a7e0c7367cb36f
Lustre-change: https://review.whamcloud.com/35524

Change-Id: I24d9ed7f8e68080c6a46409476a80799dbb45230
Signed-off-by: Alexey Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/35600
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12479 utils: cleanup gcc8 string warnings 86/35586/2
Shaun Tancheff [Mon, 22 Jul 2019 21:11:57 +0000 (14:11 -0700)]
LU-12479 utils: cleanup gcc8 string warnings

Clean up some trivial buffer overflows.

This patch is back-ported from the following one:
Lustre-commit: 164434eed2161337000257968e37d2714e9c9599
Lustre-change: https://review.whamcloud.com/35354

Test-Parameters: trivial
Cray-bug-id: LUS-6962
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I17dd1d042c4050e351aca856931ce1107fd5b08f
Reviewed-on: https://review.whamcloud.com/35586
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-8066 lnet: properly isolate kernel_param_ops 42/35542/6
Andreas Dilger [Fri, 14 Dec 2018 22:46:26 +0000 (15:46 -0700)]
LU-8066 lnet: properly isolate kernel_param_ops

Don't reference kernel_param_ops if not available.  There is
already a HAVE_KERNEL_PARAM_OPS configure check for this, but
it is just misplaced.

Fixes: 7092309f32516cbfb95a964c87b8030129edeb27

Lustre-change: https://review.whamcloud.com/33874
Lustre-commit: eb5a6f97af63585bb92ff65ad441a2281041bc47

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I56f86257c6bc8a9c53c7901bc2765e10587cab07
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35542
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12478 build: rhel8 missing module packaging tools 87/35587/2
Shaun Tancheff [Mon, 22 Jul 2019 21:22:16 +0000 (14:22 -0700)]
LU-12478 build: rhel8 missing module packaging tools

On RHEL8, kmodtool and kernel_module_package_buildreqs
are not installed with kernel-devel.

This helps to bootstrap the developer into a working configuration.

This patch is back-ported from the following one:
Lustre-commit: cd282e3e6f1a249546c284d4683bb5bfa5dfdf36
Lustre-change: https://review.whamcloud.com/35356

Test-Parameters: trivial
Cray-bug-id: LUS-7385
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I30cf5c11174b2aa94c663c72e34f8f88c5c90ed8
Reviewed-on: https://review.whamcloud.com/35587
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12401 gss: fix checksum for Kerberos and SSK 36/35536/10
Sebastien Buisson [Fri, 7 Jun 2019 14:45:26 +0000 (23:45 +0900)]
LU-12401 gss: fix checksum for Kerberos and SSK

When computing checksum for Kerberos, krb5 wire token header is
appended to the plain text. Make sure the actual header is appended
in gss_digest_hash().
For interop with older clients, introduce new server side tunable
'sptlrpc.gss.krb5_allow_old_client_csum'. When not set, servers refuse
Kerberos connection from older clients.

In gss_crypt_generic(), protect against undefined behavior by
switching from memcpy to memmove.
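
As a general illustration of why that switch matters (the message does
not say which buffers overlap in gss_crypt_generic(), so this is not
modelled on it): memcpy() is undefined when source and destination
overlap, while memmove() is defined for that case. The helper name
ex_shift_right() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Shift the first 'len' bytes of 'buf' right by 'shift' bytes, in place.
 * Source and destination overlap whenever shift < len, so memmove() is
 * required; memcpy() on overlapping ranges is undefined behavior.
 * The caller must ensure 'buf' holds at least len + shift bytes.
 */
static void ex_shift_right(unsigned char *buf, size_t len, size_t shift)
{
	memmove(buf + shift, buf, len);
}
```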

When computing checksum for SSK, make sure the actual token is used
to store the checksum.

Lustre-change: https://review.whamcloud.com/35099
Lustre-commit: 218fc688c11f081881b2cc1c1632ceaf9ec77a77

Fixes: a21c13d4df ("LU-8602 gss: Properly port gss to newer crypto api.")
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Test-Parameters: envdefinitions=SHARED_KEY=true clientbuildno=7033 clientjob=lustre-reviews-patchless testlist=sanity,recovery-small,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I0233ada481f132af112bf88c065f5421902c942e
Reviewed-on: https://review.whamcloud.com/35536
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12498 kernel: kernel update [SLES12 SP3 4.4.180-94.97] 16/35416/3
Jian Yu [Wed, 3 Jul 2019 23:30:47 +0000 (16:30 -0700)]
LU-12498 kernel: kernel update [SLES12 SP3 4.4.180-94.97]

Update SLES12 SP3 kernel to 4.4.180-94.97.

Test-Parameters: trivial clientdistro=sles12sp3 serverdistro=sles12sp3

Change-Id: Ia545ff0a54d9cf483a38842f0f7bf42b1fda6875
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35416
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12131 tests: fix SSK handling in tests 56/35556/4
Sebastien Buisson [Thu, 28 Mar 2019 07:35:18 +0000 (08:35 +0100)]
LU-12131 tests: fix SSK handling in tests

SSK can be activated for Lustre tests by setting SHARED_KEY env
variable to true.
In setup_all() an additional env variable SK_MOUNTED is used to avoid
mounting an SSK file system twice. But this variable has to be set
back to false in stopall() for consistency.
Some tests are incompatible with SSK, so skip them in case SHARED_KEY
is true. Some other tests playing with nodemaps have to take SSK into
account.

Lustre-change: https://review.whamcloud.com/34521
Lustre-commit: ee6904e312f7d7446f390cff0ec3c6e48b98e32b

Whamcloud-bug-id: ATM-1283
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1016a459c42ffed1ab2b6f67d0a145ed2af9fa40
Reviewed-on: https://review.whamcloud.com/35556
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11453 misc: add compat for INIT_LIST_HEAD_RCU 64/35564/4
Andreas Dilger [Fri, 14 Dec 2018 22:49:36 +0000 (15:49 -0700)]
LU-11453 misc: add compat for INIT_LIST_HEAD_RCU

Add a compat version of INIT_LIST_HEAD_RCU() if unavailable.

Fixes: 68bc3984975bb72f730d8a8ab7aa2d836e50abe5

Lustre-change: https://review.whamcloud.com/33875
Lustre-commit: 773e60669b53b0ca2fb48723a21dcddba592af9a

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9c1296206e002b895bb6bdf55ddd0d8ec70cab07
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35564
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12331 llite: create obd_device with usercopy whitelist 28/35528/2
Li Dongyang [Tue, 16 Jul 2019 07:01:01 +0000 (00:01 -0700)]
LU-12331 llite: create obd_device with usercopy whitelist

Since kernel 4.16 hardened usercopy has been added,
whitelist the struct obd_device to silence the warning.

 Bad or missing usercopy whitelist? Kernel memory exposure attempt
 detected from SLUB object 'll_obd_dev_cache' (offset 1256, size 40)!
 WARNING: CPU: 1 PID: 17534 at mm/usercopy.c:83 usercopy_warn+0x7d/0xa0
 Call Trace:
   __check_object_size+0xfa/0x181
   lmv_iocontrol+0x1146/0x1880 [lmv]
   ll_obd_statfs+0x356/0x860 [lustre]
   ll_dir_ioctl+0x1e37/0x6760 [lustre]
   do_vfs_ioctl+0xa4/0x630

Linux-commit: 8eb8284b412906181357c2b0110d879d5af95e52

This patch is back-ported from the following one:
Lustre-commit: 7f77996b1c4ac3a874a1f9e016e8b0e3cfee6992
Lustre-change: https://review.whamcloud.com/34946

Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Ie863e8a5e2cebd3fd716e7ccc4e0491f83f6fabc
Reviewed-on: https://review.whamcloud.com/35528
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
4 years agoLU-12368 obdclass: don't send multiple statfs RPCs 85/35485/2
Andreas Dilger [Sat, 29 Jun 2019 01:10:41 +0000 (19:10 -0600)]
LU-12368 obdclass: don't send multiple statfs RPCs

If multiple threads are racing to send a non-cached OST_STATFS or
MDS_STATFS RPC, this can cause a significant RPC storm for systems
with many-core clients and many OSTs due to amplification of the
requests, and the fact that STATFS RPCs are sent asynchronously.
Some logs have shown a few 96-core clients with 20k+ OST_STATFS RPCs
in flight concurrently, which can overload the network if many OSTs
are on the same OSS nodes (osc.*.max_rpcs_in_flight is per OST).

This was not previously a significant issue when core counts were
smaller on the clients, or with fewer OSTs per OSS.

If a thread can't use the cached statfs values, limit statfs to one
thread at a time, since the thread(s) would be blocked waiting for
the RPC replies anyway, which can't finish faster if many are sent.

Also add a llite.*.statfs_max_age parameter that can be tuned
to control the maximum age (in seconds) of the statfs cache.  This
can avoid overhead for workloads that are statfs heavy, given that
the filesystem is _probably_ not running out of space this second,
and even so "statfs" does not guarantee space in parallel workloads.
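
The combination of a cache age limit and one refresh at a time can be
sketched in user space as follows. This is a simplified model under
stated assumptions, not the obdclass code: the ex_* names are
hypothetical, a pthread mutex stands in for whatever lock the patch
uses, and ex_fake_fetch() stands in for sending the STATFS RPC.

```c
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, trimmed-down statfs result. */
struct ex_statfs {
	uint64_t blocks_free;
};

struct ex_statfs_cache {
	pthread_mutex_t	 lock;		/* serializes refreshes */
	struct ex_statfs value;		/* last fetched result */
	time_t		 fetched_at;	/* when 'value' was fetched */
	int		 valid;		/* nonzero once 'value' is set */
	unsigned int	 rpcs_sent;	/* refreshes actually performed */
};

typedef int (*ex_fetch_fn)(struct ex_statfs *out);

/* Stand-in for the real RPC; always succeeds with a fixed value. */
static int ex_fake_fetch(struct ex_statfs *out)
{
	out->blocks_free = 42;
	return 0;
}

/*
 * Return statfs data no older than 'max_age' seconds.
 * Only the lock holder refreshes; threads that waited on the lock
 * recheck the cache afterwards and reuse the fresh value instead of
 * sending their own request.
 */
static int ex_statfs_cached(struct ex_statfs_cache *c, time_t now,
			    time_t max_age, ex_fetch_fn fetch,
			    struct ex_statfs *out)
{
	int rc = 0;

	pthread_mutex_lock(&c->lock);
	if (!c->valid || now - c->fetched_at > max_age) {
		rc = fetch(&c->value);
		if (rc == 0) {
			c->fetched_at = now;
			c->valid = 1;
			c->rpcs_sent++;
		}
	}
	if (rc == 0)
		*out = c->value;
	pthread_mutex_unlock(&c->lock);
	return rc;
}
```

With max_age set to 1, two calls within the same second trigger a single
fetch, and a call several seconds later triggers one more. Raising
max_age trades freshness for fewer requests, which is the tuning the
statfs_max_age parameter is described as offering.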

Lustre-change: https://review.whamcloud.com/35380
Lustre-commit: 1c41a6ac390bf74a135861efcd576a3b433d3c49

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I95690e37aecbac08ac5768a5e5c6c70ca258a832
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/35485
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12491 obdclass: add comment for rcu handling in lu_env_remove 88/35488/2
James Simmons [Mon, 8 Jul 2019 20:47:40 +0000 (16:47 -0400)]
LU-12491 obdclass: add comment for rcu handling in lu_env_remove

During review it was pointed out that the RCU lock is dropped
in lu_env_remove() but the code itself doesn't explain why. Add
a comment explaining why RCU locking is not needed.

Test-parameters: trivial

Lustre-change: https://review.whamcloud.com/35447
Lustre-commit: 709fbe6ee54aa2e601237a6981db3d42a8a719cd

Change-Id: I4fd761d2e1b4adad8e970904d56cdcd057dfe7d5
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35488
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12491 obdclass: use RCU to release lu_env_item 87/35487/3
Alex Zhuravlev [Mon, 3 Jun 2019 02:52:42 +0000 (05:52 +0300)]
LU-12491 obdclass: use RCU to release lu_env_item

as rhashtable_lookup_fast() is lockless and can
find just released objects.

Fixes: aa82cc8361 ("obdclass: put all service's env on the list")

Lustre-change: https://review.whamcloud.com/35038
Lustre-commit: 87306c22e4b977356f4857d5f750447639d89c26

Change-Id: I6ed8ccc5bb5b192eed90b55103d11b822ec90692
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35487
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11518 ptlrpc: don't reset lru_resize on idle reconnect 89/35489/2
Andriy Skulysh [Tue, 11 Jun 2019 14:44:32 +0000 (17:44 +0300)]
LU-11518 ptlrpc: don't reset lru_resize on idle reconnect

ptlrpc_disconnect_idle_interpret() clears imp_remote_handle,
so reconnect has pcaa_initial_connect set to 1.

Update only changed ns_connect_flags bits.

Fixes: 5a6ceb664f0 ("LU-7236 ptlrpc: idle connections can disconnect")

Lustre-change: https://review.whamcloud.com/35285
Lustre-commit: acacc9d9b1d0a869f61d7940baa0700b63dcd8f7

Change-Id: I2368708b6381c1d772c47dc6e61c8fb39a14a2cc
Cray-bug-id: LUS-7471
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35489
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12456 kernel: kernel update RHEL 8.0 [4.18.0-80.4.2.el8_0] 12/35412/2
Jian Yu [Wed, 3 Jul 2019 19:09:40 +0000 (12:09 -0700)]
LU-12456 kernel: kernel update RHEL 8.0 [4.18.0-80.4.2.el8_0]

Update RHEL 8.0 kernel to 4.18.0-80.4.2.el8_0 for Lustre client.

Change-Id: I70527c9317c48c61d0f515f35a1a53fc22ffd06b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35412
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11845 osd-zfs: Support encrypted ZFS datasets 76/35476/2
Nathaniel Clark [Wed, 9 Jan 2019 20:43:59 +0000 (15:43 -0500)]
LU-11845 osd-zfs: Support encrypted ZFS datasets

Call zfs::dmu_objset_own and zfs::dmu_objset_disown with
decrypt=B_TRUE

This is called the same way as in zfs modules.

Fixes: 0fedb017c1 ("LU-9890 osd-zfs: dmu_objset_own/disown changes")
Test-Parameters: envdefinitions=ZFS_MKFS_OPTS="encryption=on -o keylocation=file:///etc/adjtime -o keyformat=passphrase" testlist=sanity fstype=zfs

Lustre-change: https://review.whamcloud.com/33999
Lustre-commit: 3d43658fc36e5821d8b094c1c2365e9520dbe9fe

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I1d9bc1a579ac26706a9f6cc5a0d52649ce005228
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35476
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11893 lnet: consolidate secondary IP address handling 42/35442/2
James Simmons [Mon, 8 Jul 2019 17:42:47 +0000 (10:42 -0700)]
LU-11893 lnet: consolidate secondary IP address handling

The last piece of code with broken secondary IP address
support is lnet_parse_ip2nets(). We could fix it the same way
o2iblnd and socklnd were fixed, but since the LND drivers have
already resolved those issues we can instead move the handling
out of the LND drivers into one place in the LNet core.
To do this we introduce struct lnet_inetdev which is
a collection of data that the current LNet layer requires.
The new function lnet_inet_enumerate() is used to collect
this information.

This patch is back-ported from the following one:
Lustre-commit: d6d0194c1969db05a6a60718679750ecfd75739b
Lustre-change: https://review.whamcloud.com/34993

Change-Id: I0c532caa3cf6b2178eb1ab65e55e5883d408a185
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/35442
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11845 zfs: put configure checks in version order 75/35475/2
Andreas Dilger [Wed, 9 Jan 2019 21:35:10 +0000 (14:35 -0700)]
LU-11845 zfs: put configure checks in version order

Put the ZFS feature checks in release version order, so that it is
easier to track when they apply and when they can be removed in
the future.

Make the configure checks use decrypt=B_TRUE just to illustrate
more correct usage.

Lustre-change: https://review.whamcloud.com/34000
Lustre-commit: 55c973ba5cb595d51b4990eea9ad6803e7c0645e

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I22053638d72b41b51b6f56dea5668e78535cab07
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35475
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-10717 tests: tests should not start mgs 68/35368/2
Alexander Boyko [Tue, 6 Nov 2018 12:57:15 +0000 (07:57 -0500)]
LU-10717 tests: tests should not start mgs

The conf-sanity prolog runs reformat_and_config, which leaves the
mgs service started if it is not combined.
So, in general, a test should not start the mgs service if it doesn't
stop the mgs. And a test should start the mgs after a reformat.

The client mount requires start of all MDTs, because of
MDT0000-osp-MDT000X synchronization.

Lustre-change: https://review.whamcloud.com/33589
Lustre-commit: aa9f9344fc741523ee8693569effb5b77204f90b

Test-Parameters: trivial testlist=conf-sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=conf-sanity
Test-Parameters: standalonemgs=true testlist=conf-sanity
Test-Parameters: standalonemgs=true mdscount=2 mdtcount=4 testlist=conf-sanity
Cray-bug-id: LUS-2524
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I226eab5683afc36efe908b200f46b710f6235374
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35368
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12395 build: build mpitests for el8 73/35473/2
Minh Diep [Fri, 28 Jun 2019 21:28:39 +0000 (14:28 -0700)]
LU-12395 build: build mpitests for el8

RHEL8 has rpm-mpi-hooks, which requires binaries
to be in the specific mpi bin directory to generate the correct
requires.

See https://fedoraproject.org//wiki/Changes/RpmMPIReqProv
and https://fedoraproject.org/wiki/Packaging:MPI

Test-Parameters: trivial clientdistro=el8 serverdistro=el7.6 testgroup=regression-mpi

Lustre-change: https://review.whamcloud.com/35374
Lustre-commit: 3c7aca74729edb5339d2b84259ba042bf83c214a

Change-Id: Id9fa50e15b48b9da846083b9e9cd894ad1eac967
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35473
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12494 kernel: kernel update SLES12 SP4 [4.12.14-95.19.1] 13/35413/2
Jian Yu [Wed, 3 Jul 2019 19:28:51 +0000 (12:28 -0700)]
LU-12494 kernel: kernel update SLES12 SP4 [4.12.14-95.19.1]

Update SLES12 SP4 kernel to 4.12.14-95.19.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT=103a

Change-Id: I6a101dc2637945192cf8aca661e23c3bccb47609
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35413
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12383 utils: only check project inherit bit for dir 93/35393/2
Wang Shilong [Thu, 6 Jun 2019 02:36:39 +0000 (10:36 +0800)]
LU-12383 utils: only check project inherit bit for dir

Currently, ZFS won't set the inherit bit on regular files, but
ext4 always sets it. It doesn't make sense for regular files to
have this bit, but having it won't do any harm either.

To make the test happy and give users a consistent view,
let's fix the project check to only complain about errors for directories.

Lustre-change: https://review.whamcloud.com/35076
Lustre-commit: e4ad5c17c99e7ede5deabffe0bacdd851240eb86

Test-Parameters: trivial testlist=sanity-quota
Change-Id: I194f3ed9d6ded69313a683995295ab8c07b4fb3a
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35393
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11872 utils: don't follow link files in default 92/35392/2
Wang Shilong [Fri, 25 Jan 2019 08:54:50 +0000 (16:54 +0800)]
LU-11872 utils: don't follow link files in default

We actually don't support operations on link files themselves for now.
As a first step, let's skip link files by default;
otherwise, it causes unexpected behavior.

Lustre-change: https://review.whamcloud.com/34111
Lustre-commit: 004b80da5c4b2a7cf4f4885b43c9edec76cd2493

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ib0069ed1982e26984c6cf093f0803bf4a2208fe1
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-on: https://review.whamcloud.com/35392
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12387 utils: Avoid passing symlink to tune_block_dev 71/35371/2
Chris Horn [Wed, 5 Jun 2019 00:14:47 +0000 (19:14 -0500)]
LU-12387 utils: Avoid passing symlink to tune_block_dev

In tune_block_dev_slaves we iterate over the directories inside the
slaves subdirectory for the multipath device that is being tuned. For
example:

 $ /usr/sbin/l_tunedisk /dev/mapper/mpathc

Suppose mpathc maps to /dev/dm-2. tune_block_dev will initially set
the value of
/sys/devices/virtual/block/dm-2/queue/max_sectors_kb
equal to the value of
/sys/devices/virtual/block/dm-2/queue/max_hw_sectors_kb

Then it looks at the entries in /sys/devices/virtual/block/dm-2/slaves
Suppose the slave devices are as follows:

 $ ls /sys/devices/virtual/block/dm-2/slaves
 sdc  sdh  sdm  sdr
 $

It then calls tune_block_dev recursively, passing
/sys/devices/virtual/block/dm-2/slaves/sdc,
/sys/devices/virtual/block/dm-2/slaves/sdh, etc. However, these are
symlinks that point to directories and as such tune_block_dev will not
tune them because stat does not identify them as block devices.

Instead we should construct the path argument for these recursive calls
as /dev/<d_name>. In this example, /dev/sdc, /dev/sdh, etc.
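
The path construction can be sketched as follows. The helper names
slave_dev_path() and is_block_dev() are made up for this example and are
not the functions in l_tunedisk.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Build "/dev/<d_name>" into buf; returns 0 on success, -1 if it does not fit. */
int slave_dev_path(const char *d_name, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/dev/%s", d_name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* True only when path resolves to a block device node. */
bool is_block_dev(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISBLK(st.st_mode);
}
```

For the directory entry "sdc" this yields "/dev/sdc", which stat can
identify as a block device when that node exists.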

Lustre-change: https://review.whamcloud.com/35065
Lustre-commit: c632a238e6d0c4a3240959a894d36a8a409d64f8

Cray-bug-id: LUS-7358
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I63bc073a82384d68648ff23a56b7d43d6656159b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35371
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12387 utils: Read existing ldd data in l_tunedisk 70/35370/2
Chris Horn [Tue, 4 Jun 2019 19:34:01 +0000 (14:34 -0500)]
LU-12387 utils: Read existing ldd data in l_tunedisk

Read the lustre_disk_data from the device passed to l_tunedisk, so
we can determine whether the device is an MGT or MDT and thus skip
the tuning of the device.

Lustre-change: https://review.whamcloud.com/35066
Lustre-commit: 9cb4f810164efa5f058dc7605fb1835ea51b0a92

Fixes: 892280742a2b ("LU-9551 utils: add l_tunedisk to fix disk tunings")
Fixes: 2f8d7b4679de ("LU-11736 utils: don't set max_sectors_kb on MDT/MGT")
Cray-bug-id: LUS-7358
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I193fe008d5777b0e83f2be9a500eaffb1d3ca615
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35370
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-12381 ko2iblnd: ignore down interfaces 49/35249/2
James Simmons [Mon, 17 Jun 2019 19:25:48 +0000 (12:25 -0700)]
LU-12381 ko2iblnd: ignore down interfaces

The for_each_netdev() loop in kiblnd_create_dev() scans all
network devices on a system. Currently the code exits when a
network device is down, but the device could be something besides
an IB device. Instead of exiting, just ignore any device that is
down.
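
The difference between exiting and ignoring can be shown with a small
user-space model of the scan. The fake_netdev structure, the
SKETCH_IFF_UP flag, and find_up_dev() are hypothetical stand-ins, not
kernel interfaces.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "interface is up" flag for this sketch only. */
#define SKETCH_IFF_UP 0x1u

/* Hypothetical stand-in for a network device entry. */
struct fake_netdev {
	const char  *name;
	unsigned int flags;
};

/* Return the first device that is up and whose name matches, or NULL. */
const struct fake_netdev *find_up_dev(const struct fake_netdev *devs,
				      size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (!(devs[i].flags & SKETCH_IFF_UP))
			continue;	/* ignore it; do not abort the scan */
		if (strcmp(devs[i].name, name) == 0)
			return &devs[i];
	}
	return NULL;
}
```

A down device earlier in the list no longer prevents a later, working
device from being found.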

This patch is back-ported from the following one:
Lustre-commit: 1dea5aac9d9be99c4b317a491f308872b97bf0e6
Lustre-change: https://review.whamcloud.com/35098

Test-Parameters: trivial

Fixes: c4b39bf56bbc ("LU-11893 o2iblnd: add secondary IP address handling")
Change-Id: I0a3bf808d849cd00711b6ef2e4e5bbd876b64903
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35249
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 osd-ldiskfs: inode times switched to timespec64 47/35247/2
Li Dongyang [Mon, 17 Jun 2019 19:01:36 +0000 (12:01 -0700)]
LU-11838 osd-ldiskfs: inode times switched to timespec64

Since kernel 4.18 inode times switched from struct timespec
to timespec64 to make them y2038 safe.
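
For context, a signed 32-bit seconds counter runs out at 2147483647,
which is 03:14:07 UTC on 19 January 2038. The helper below is only an
illustration of that limit, with a made-up name; it is not part of the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a seconds-since-epoch value fits in a signed 32-bit field. */
bool fits_in_32bit_seconds(int64_t sec)
{
	return sec >= INT32_MIN && sec <= INT32_MAX;
}
```

A 64-bit seconds field, as in timespec64, has no such limit in any
practical time range.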

Linux-commit: 95582b00838837fc07e042979320caf917ce3fe6

This patch is back-ported from the following one:
Lustre-commit: 3af55b3159ac2133dc35eeb2f02825848fb65548
Lustre-change: https://review.whamcloud.com/34675

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Iaddb2f2be27ec348fb97e13371aa3d7e6f6e5c9f
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35247
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10885 llite: enable flock mount option by default 87/34987/2
Andreas Dilger [Tue, 2 Oct 2018 21:52:28 +0000 (15:52 -0600)]
LU-10885 llite: enable flock mount option by default

The "flock" mount option has been optional for many years, initially
because of potential stability issues, and also to provide a choice
for administrators to select between "flock" and "localflock" options.

However, the large number of problems that users report when
trying to use applications that depend on this feature (typically
databases and other cloud stacks) shows that disabling flock by default
causes more problems than it solves.

Enable the "flock" (distributed coherent userspace locking) feature
by default.  If applications do not need this functionality, then it
will not affect them.  If applications *do* need this functionality,
they will get it.  If administrators really know what they are doing,
then they can use the "localflock" feature to enable client-local
flock functionality, possibly only on select nodes that need this.

Users wanting to disable this functionality should mount with the
existing "-o noflock" mount option, or build the client with the
"configure --disable-flock" option.

If clients are already using "-o {flock|localflock|noflock}" then
their existing options will be handled appropriately.

Lustre-change: https://review.whamcloud.com/32091
Lustre-commit: 3613af3e15cbc6091e3a16c8caeb1307be2d91f6

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I182637604fa22573b1da6b6b86d8915e3c3ebbe5
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-on: https://review.whamcloud.com/34987
Tested-by: Jenkins
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-8365 ldiskfs: procfs entries for mballoc 42/34842/3
Lokesh Nagappa Jaliminche [Mon, 4 Jul 2016 09:04:20 +0000 (14:34 +0530)]
LU-8365 ldiskfs: procfs entries for mballoc

Export mballoc streaming block allocator variables
mb_last_group and mb_last_start through procfs.

Lustre-change: https://review.whamcloud.com/21142
Lustre-commit: 75703118588f2b23afd8c8815e5ebb768fc7a8ff

Test-Parameters: testgroup=review-ldiskfs
Change-Id: I5dd00503a81c6819751c9f99b64615b497ef4e28
Cray-bug-id: LUS-3176
Signed-off-by: Lokesh Nagappa Jaliminche <lokesh.jaliminche@seagate.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34842
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12458 kernel: kernel update RHEL7.6 [3.10.0-957.21.3.el7] 69/35269/3
Jian Yu [Fri, 28 Jun 2019 18:07:19 +0000 (11:07 -0700)]
LU-12458 kernel: kernel update RHEL7.6 [3.10.0-957.21.3.el7]

Update RHEL7.6 kernel to 3.10.0-957.21.3.el7.

Test-Parameters: clientdistro=el7.6 serverdistro=el7.6

Change-Id: I78133d5bc7567d8ea56c4b1aebc3e97096495fad
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35269
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12166 test: fix broken detection on ZFS 24/34924/2
Wang Shilong [Sun, 7 Apr 2019 03:44:51 +0000 (11:44 +0800)]
LU-12166 test: fix broken detection on ZFS

We intend to run the command on the MDS, otherwise
project quota will never be tested.

Lustre-change: https://review.whamcloud.com/34609
Lustre-commit: 0f2cd5948b870c0f82a70bdf32f0c5f6d845144d

Test-Parameters:trivial fstype=zfs
Fixes: a046e87 ("LU-7991 quota: project quota against ZFS backend")
Change-Id: I8650a0e1065f0bb465da01556472d3d23b22a530
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34924
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12447 utils: specify correct size for lfs project buffer 84/35284/2
Wang Shilong [Fri, 21 Jun 2019 06:28:10 +0000 (23:28 -0700)]
LU-12447 utils: specify correct size for lfs project buffer

Environment:
Fedora release 28 (Twenty Eight)

gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20)
Copyright (C) 2018 Free Software Foundation, Inc.

Hit build failure:
lfs_project.c: In function ‘lfs_project_item_alloc’:
lfs_project.c:72:2: error: ‘strncpy’ specified bound 4096
equals destination size [-Werror=stringop-truncation]
  strncpy(lpi->lpi_pathname, pathname, sizeof(lpi->lpi_pathname));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This patch is back-ported from the following one:
Lustre-commit: ffef6e3271ad1136d3ab1c2ee229b4690a6722a0
Lustre-change: https://review.whamcloud.com/35257

Test-Parameters: trivial testlist=sanity-quota
Change-Id: Ia6429c47391bf503546609ec6a262fe24664bdd4
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-on: https://review.whamcloud.com/35284
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12399 tests: avoid 'pdsh localhost' in sanity test_420 50/35250/2
Sebastien Buisson [Mon, 17 Jun 2019 19:30:26 +0000 (12:30 -0700)]
LU-12399 tests: avoid 'pdsh localhost' in sanity test_420

sanity test_420 needs a clean env to execute openfile, ie not
inherited from root user.
Replace 'pdsh localhost' with simpler 'su - $uname -c' alternative
to achieve this.

This patch is back-ported from the following one:
Lustre-commit: 1476ac047b449886a0c382b840a7b09dc0cec7eb
Lustre-change: https://review.whamcloud.com/35176

Test-Parameters: trivial envdefinitions=ONLY=420 testlist=sanity
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ifeba7fc1eba86d74a64cca187e286adb23147e2e
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35250
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11893 o2iblnd: add secondary IP address handling 48/35248/2
James Simmons [Mon, 17 Jun 2019 19:21:53 +0000 (12:21 -0700)]
LU-11893 o2iblnd: add secondary IP address handling

Using dev_get_by_name() in kiblnd_create_dev() means we can only
discover primary IP addresses. This breaks using network
aliasing which some people use. Move away from dev_get_by_name()
to using for_ifa() so we can detect any secondary IP addresses.

This patch is back-ported from the following one:
Lustre-commit: c4b39bf56bbcacd49d7f888a0745cd4b5580b36b
Lustre-change: https://review.whamcloud.com/34476

Change-Id: I03f2f8d18118b716a5eb5fb87694000ac06fe242
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/35248
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-8066 sysfs: make ping sysfs file read and writable 13/35313/2
James Simmons [Wed, 12 Dec 2018 16:19:45 +0000 (11:19 -0500)]
LU-8066 sysfs: make ping sysfs file read and writable

Starting with 4.15 kernels any read-only sysfs file is limited to
root access only. To retain the ability for non-root users
to detect if a remote server is alive using the 'ping' sysfs
file we need to change it to writable. Retain the read ability
so older tools will work.

Lustre-change: https://review.whamcloud.com/33776
Lustre-commit: 6bbae72c6900dbd2b853d716bc4d456dc7fd586e

Change-Id: I6560c119328d723a20a2b32e1fa8c68dce5d407a
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33776
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/35313
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
4 years agoLU-12382 llite: fix deadloop with tiny write 12/35312/2
Wang Shilong [Tue, 4 Jun 2019 12:54:01 +0000 (20:54 +0800)]
LU-12382 llite: fix deadloop with tiny write

For a small write(<4K), we will use tiny write and
__generic_file_write_iter() will be called to handle it.

On newer kernels (4.14 etc.), the function is exported and will
do something like the following:

|->__generic_file_write_iter
  |->generic_perform_write()

If the iov_iter_count() passed in is 0, generic_perform_write() will
loop forever, as the bytes copied is always calculated as 0.

The problem is that the VFS doesn't always skip zero-count IO before it
reaches the lower layer read/write hook, so we should do it ourselves.

To fix this problem, always return 0 early if no
real IO is needed.
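
The guard can be sketched in user space as below. The sketch_file
structure, lower_write(), and tiny_write() are hypothetical names for
this illustration and do not mirror the llite code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "file" for this sketch. */
struct sketch_file {
	char   data[64];
	size_t len;
};

/* Stand-in for the lower-level write path; callers must pass count > 0. */
static long lower_write(struct sketch_file *f, const char *buf, size_t count)
{
	if (count > sizeof(f->data) - f->len)
		count = sizeof(f->data) - f->len;
	memcpy(f->data + f->len, buf, count);
	f->len += count;
	return (long)count;
}

long tiny_write(struct sketch_file *f, const char *buf, size_t count)
{
	if (count == 0)
		return 0;	/* nothing to do; never enter the lower layer */
	return lower_write(f, buf, count);
}
```

The zero-count case returns before the lower layer is reached, so a
loop there never sees an empty request.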

Lustre-change: https://review.whamcloud.com/35058
Lustre-commit: e9a543b0d3039027423cb469525015f97caa3a3f

Change-Id: I765a723da79eb5fd09317c3fad47fe479b1dd4fb
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35312
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-8066 utils: have llapi_target_iterate use sysfs tree 81/34781/5
James Simmons [Tue, 25 Jun 2019 13:29:07 +0000 (09:29 -0400)]
LU-8066 utils: have llapi_target_iterate use sysfs tree

Update llapi_target_iterate() to not use 'devices' but collect the
data from the lustre sysfs tree itself.

Lustre-change: https://review.whamcloud.com/33799
Lustre-commit: b24d69492b818457d9da0d6dce3adc0f91f18ec6

Change-Id: If100b4918bdcc8b24e72f37127048a32a808310f
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34781
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
4 years agoLU-12269 build: fix hardened builds in rpm spec file 61/35161/3
Ben Menadue [Tue, 11 Jun 2019 03:38:18 +0000 (20:38 -0700)]
LU-12269 build: fix hardened builds in rpm spec file

The hardened build configure on RHEL8 has a quoted string
with spaces in it, and this breaks the construction of
%eval_configure on lustre.spec.in - the quotes end up in
the wrong place.

Moreover, the hardened build flags are only for user-space
code, and break kernel code compilation on RHEL 8.0 (they
add -fPIE, which isn't valid for kernel code).

This patch stores the %build_cflags and %build_ldflags from
rpmbuild as environment variables before turning hardened
build off to allow the kernel code to build. These
environment variables are used in the lnet/utils and
lustre/utils Makefiles so that the user-space code there
gets the benefit of any system-specific RPM build flag
(such as hardened builds).

For RHEL7 on PPC64 we then also need to define the C macro
__SANE_USERSPACE_TYPES__ so that __s64 and __u64 are long
long instead of the default long - otherwise the build will
fail with a format string error on this platform because
Lustre uses %ll when printing/scanning __s64/__u64.
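
For comparison, the format mismatch can also be avoided at an individual
call site with an explicit cast. This is a generic C idiom and not what
this patch does; format_u64() is a made-up helper that uses uint64_t in
place of __u64.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit unsigned value. The cast makes %llu correct whether
 * the 64-bit type is 'unsigned long' or 'unsigned long long'.
 */
int format_u64(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```

Defining the macro once, as the patch does, avoids having to touch every
such call site.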

The environment variables (UTILS_CFLAGS and UTILS_LDFLAGS)
could also be used for a standalone, non-RPM build to pass
flags to the user-space code, with the usual CFLAGS and
LDFLAGS still used for kernel code.

This patch is back-ported from the following one:
Lustre-commit: 5270583ae6e436e9e7ae0199312e7f50365744af
Lustre-change: https://review.whamcloud.com/34882

Signed-off-by: Ben Menadue <ben.menadue@anu.edu.au>
Change-Id: I9b4ba830bf63838fd88ef1bae5dd10dff2109a1d
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/35161
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11742 test: have libtool execute the test binaries 83/34583/7
James Simmons [Fri, 14 Jun 2019 04:52:06 +0000 (21:52 -0700)]
LU-11742 test: have libtool execute the test binaries

With the move to libtool the ability to run all the lustre
utilities from the source tree was lost. To work around this
the libtool -no-install flag was used to prevent the creation
of the libtool wrappers. While this worked to restore the
source tree sandbox development, new package breakage is showing.
This is due to the rpath being hard coded into the utilities when
-no-install is used and some platforms disable fixed rpaths.

A very similar problem exists for people who want to use gdb to
debug their project's application. gdb does not work on libtool
wrappers either, so the recommended approach to this type of
problem is to use the libtool execute command. This command
allows the execution of an external non-project binary, like
gdb, with the project's real binary application. Apply this
approach to the lustre test suite so commands like kill can
be used to shut down lustre utilities that are not installed into
the testing environment.

Lustre-change: https://review.whamcloud.com/33947
Lustre-commit: f9e5224fbb60bb8b44753b7be10cb06108627f89

Change-Id: I74112f7250f1c43313d868c0edc7c8815d373002
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34583
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12195 tests: use sleep instead of wrapped multiop 55/34955/5
Alex Zhuravlev [Fri, 14 Jun 2019 04:43:47 +0000 (21:43 -0700)]
LU-12195 tests: use sleep instead of wrapped multiop

Use sleep in sanity/43* and sanity/14* tests, as multiop is not a binary
but a libtool-wrapped script. The tests fail when started from a
build tree.

Lustre-commit: 9a1f327a76f72c7713e53d8b354ff7f0e32be870
Lustre-change: https://review.whamcloud.com/34721

LU-12261 tests: Race between exec and truncate

Execing '$tdir/sleep' with & doesn't guarantee the file is
actually open before returning, so it is sometimes losing
the race with truncate, resulting in errors like this:
/usr/lib64/lustre/tests/sanity.sh: line 4172:
/mnt/lustre/d43b.sanity/sleep: Text file busy

Where $tdir/sleep gets ETXTBSY, instead of truncate as
expected.

A 1 second delay should be enough to guarantee exec wins
the race vs truncate.

Test-Parameters: trivial
Test-Parameters: testgroup=review-ldiskfs-arm
Test-Parameters: testgroup=review-ldiskfs
Test-Parameters: testgroup=review-ldiskfs-arm

Lustre-commit: c64855fca1504bddcb0fc7ad7316d8d6b20a9c6f
Lustre-change: https://review.whamcloud.com/34791

Change-Id: Iaec3433f03aab23583052373e5f0252d9eac7f04
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34955
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11893 ksocklnd: add secondary IP address handling 59/35159/4
James Simmons [Tue, 11 Jun 2019 07:49:50 +0000 (00:49 -0700)]
LU-11893 ksocklnd: add secondary IP address handling

With ksocknal_enumerate_interfaces() use of for_primary_ifa() only
primary IP addresses are returned. This disables using network
aliasing which some people use. Change for_primary_ifa() to
for_ifa() so we can detect any secondary IP addresses. Update the
string handling since ifa_device names can be different than the
net_device name. Discard the 'j' counter and instead keep
ksnn_ninterfaces up to date. This means that we return 0 on
success, rather than a count of added interfaces. Update the
too many interfaces test in ksocknal_enumerate_interfaces()
with a better test using ARRAY_SIZE.
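
As a rough userspace illustration of the last two points (not the actual ksocklnd code), the sketch below bounds the interface table with an ARRAY_SIZE-style check and returns 0 on success while keeping the counter in the structure up to date. All demo_* names and the limit of 4 are hypothetical, and ARRAY_SIZE is redefined here as a stand-in for the kernel macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

#define DEMO_MAX_IFACES 4	/* hypothetical limit, for illustration only */
#define DEMO_IFNAMSIZ 16	/* hypothetical name buffer size */

struct demo_iface {
	char name[DEMO_IFNAMSIZ];
	unsigned int ipaddr;
};

struct demo_net {
	int ninterfaces;	/* kept up to date on every add */
	struct demo_iface ifaces[DEMO_MAX_IFACES];
};

/* Returns 0 on success, -1 if the table is already full. */
static int demo_add_iface(struct demo_net *net, const char *name,
			  unsigned int ipaddr)
{
	struct demo_iface *slot;

	assert(net != NULL && name != NULL);

	/* The bound follows the array definition instead of a magic number. */
	if ((size_t)net->ninterfaces >= ARRAY_SIZE(net->ifaces))
		return -1;

	slot = &net->ifaces[net->ninterfaces];
	/* An alias name such as "eth0:1" may differ from the device name. */
	snprintf(slot->name, sizeof(slot->name), "%s", name);
	slot->ipaddr = ipaddr;
	net->ninterfaces++;
	return 0;
}
```

A caller would read the number of interfaces from the structure (here net.ninterfaces) rather than from the return value.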

This patch is back-ported from the following one:
Lustre-commit: 9a2013af0668737dc56424c5c6eaac01621f6c17
Lustre-change: https://review.whamcloud.com/34392

Change-Id: I832df89148def5088502ac92df27b8b3872f3792
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35159
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 socklnd: use for_each_netdev() instead of lnet_ipif_enumerate() 58/35158/3
NeilBrown [Tue, 11 Jun 2019 05:37:58 +0000 (22:37 -0700)]
LU-11838 socklnd:  use for_each_netdev() instead of lnet_ipif_enumerate()

for_each_netdev() is a more direct interface and doesn't require
library support.

Also get the ip address directly from the net_device, rather than
using lnet_ipif_query().

Linux-commit: f703f71afd98e6e7ec70f92ffc52ef3ffffcd849
Linux-commit: 9eb957b98aa6322abde33240bf50dd483c5d1190

This patch is back-ported from the following one:
Lustre-commit: e9d9cbb072956f2582c97263184aecd196bba14a
Lustre-change: https://review.whamcloud.com/33966

Change-Id: I82894991b9a4a250d0560af31325b6c765cc0620
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-on: https://review.whamcloud.com/35158
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 kernel: harden current_time autoconf test 70/35170/2
James Simmons [Tue, 11 Jun 2019 05:26:42 +0000 (22:26 -0700)]
LU-11838 kernel: harden current_time autoconf test

In newer kernels CURRENT_TIME was replaced by current_time(). The
return value of current_time() was struct timespec but to support
time after 2038 the return value was changed to struct timespec64.
This change broke the autoconf test. The solution is to use one
of the struct iattr fields in the autoconf test since it hides
the return value type.
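
A minimal userspace sketch of the idea, assuming mock types in place of the kernel ones: because the iattr time field and the function's return value share one type, the probe never has to name that type and builds with either definition. The demo_* names and the DEMO_NEWER_KERNEL switch are hypothetical; the real test lives in the autoconf macros and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch mimicking the timespec -> timespec64 change. */
#ifdef DEMO_NEWER_KERNEL
typedef struct { long long tv_sec; long tv_nsec; } demo_ts_t;
#else
typedef struct { long tv_sec; long tv_nsec; } demo_ts_t;
#endif

struct demo_inode { int unused; };

/* Mock of struct iattr: its time field tracks the timestamp type. */
struct demo_iattr { demo_ts_t ia_atime; };

/* Mock of current_time(): returns whatever the timestamp type is. */
static demo_ts_t demo_current_time(const struct demo_inode *inode)
{
	demo_ts_t now = { 1560230802, 0 };

	assert(inode != NULL);
	return now;
}

/* The probe body: no explicit timespec/timespec64 type appears here. */
static int demo_probe(void)
{
	struct demo_inode inode = { 0 };
	struct demo_iattr attr;

	attr.ia_atime = demo_current_time(&inode);
	return attr.ia_atime.tv_sec > 0;
}
```

Building the same file with and without -DDEMO_NEWER_KERNEL should succeed in both cases, which is the property the hardened test relies on.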

Test-Parameters: trivial

This patch is back-ported from the following one:
Lustre-commit: 74b3726f42b1f72e289e3c3252030a62646afa7b
Lustre-change: https://review.whamcloud.com/33963

Change-Id: I95abd2cd2b777f99cbf6ab78370ee2171e5fca67
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-on: https://review.whamcloud.com/35170
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11359 mdt: fix mdt_dom_discard_data() timeouts 97/35197/3
Mikhail Pershin [Wed, 31 Oct 2018 13:28:29 +0000 (16:28 +0300)]
LU-11359 mdt: fix mdt_dom_discard_data() timeouts

The mdt_dom_discard_data() issues new lock to cause data
discard for all conflicting client locks. This was done in
context of unlink RPC processing and may cause it to be stuck
waiting for client to cancel their locks leading to cascading
timeouts for any other locks waiting on the same resource and
parent directory.

Patch skips discard lock waiting in the current context by
using own CP callback for that which doesn't wait for blocking
locks. They will be finished later by LDLM and cleaned up in
that completion callback. So current thread just makes sure
discard locks are taken and BL ASTs are sent but doesn't wait
for lock granting and that fixes the original problem.

At the same time that opens window for race with data being
flushed on client, so it is possible that new IO from client
will happen on just unlinked object causing error message and
it is not possible to distinguish that case from other
possibly critical situations. To solve that the unlinked object
is pinned in memory until the discard lock is granted.
Therefore, such objects can be easily distinguished as stale ones
and any IO against them can be just silently ignored.

Older clients are not fully compatible with async DoM discard so
the patch also adds a new connection flag ASYNC_DISCARD to
distinguish old clients and use the old blocking discard for them.

Lustre-change: https://review.whamcloud.com/34071
Lustre-commit: 9c028e74c2202a8a481557c4cb22225734aaf19f

Test-Parameters: testlist=racer,racer,racer
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I419677af43c33e365a246fe12205b506209deace
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35197
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838: lnet: remove lnet_ipif_enumerate() 60/35160/5
NeilBrown [Tue, 11 Jun 2019 07:52:04 +0000 (00:52 -0700)]
LU-11838: lnet: remove lnet_ipif_enumerate()

Also remove lnet_ipif_query() and related functions.

There are no longer any users of these functions, so remove them.

Linux-commit: 6e659fcfab0cdd876a555a752acf9997f98acbcd

This patch is back-ported from the following one:
Lustre-commit: dedd3706945ef759d7d645cde30fa488c8ced4a1
Lustre-change: https://review.whamcloud.com/34234

Change-Id: I8183e505e3dbe12ff71ddf38f5b18a945d8a4a6c
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/35160
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12270 o2iblnd: pci_unmap_addr() removed in 4.19 57/35157/4
Li Dongyang [Tue, 11 Jun 2019 07:47:14 +0000 (00:47 -0700)]
LU-12270 o2iblnd: pci_unmap_addr() removed in 4.19

Since kernel 4.19 the pci_unmap_addr() wrappers have
been removed, along with linux/pci-dma.h.
We can use the good old DEFINE_DMA_UNMAP_ADDR instead
of DECLARE_PCI_UNMAP_ADDR.

Linux-commit: 18b01b16e8bae9cd227909f6e6d2783d74855f65

This patch is back-ported from the following one:
Lustre-commit: 0cae491cc6d3cc949972366a3fdfdf32dfea5912
Lustre-change: https://review.whamcloud.com/34827

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I387bd3d1c4e8c3bc75400ce1be05132fb25f8a50
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35157
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 llite: address_space ->page_tree renamed ->i_pages 56/35156/3
Li Dongyang [Tue, 11 Jun 2019 05:55:30 +0000 (22:55 -0700)]
LU-11838 llite: address_space ->page_tree renamed ->i_pages

Kernel 4.17 renamed address_space ->page_tree to ->i_pages,
and switched to xa_lock on the radix_tree_root.

Linux-commit: b93b016313b3ba8003c3b8bb71f569af91f19fc7

This patch is back-ported from the following one:
Lustre-commit: 2d0c621d21be4e67b6075b76017af6e6fcd18c64
Lustre-change: https://review.whamcloud.com/34673

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Iadbc5eda884dbe8ad0d694e0f88255bc496dea5b
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35156
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-11838 ldlm: struct timespec64.tv_sec type change 75/35175/2
Li Dongyang [Tue, 11 Jun 2019 05:52:12 +0000 (22:52 -0700)]
LU-11838 ldlm: struct timespec64.tv_sec type change

Since kernel 4.18 struct timespec64 is no longer defined
as struct timespec on 64bit systems. This means tv_sec
is no longer __kernel_time_t but now time64_t.

Use %llu as the format specifier and explicitly cast it
to unsigned long long.
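
For illustration only, a userspace sketch of the cast-and-format pattern described above. demo_time64_t is a hypothetical stand-in for the kernel's time64_t and demo_format_secs() is not a Lustre function; the point is that the explicit cast makes "%llu" correct whatever the underlying integer type is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's time64_t. */
typedef int64_t demo_time64_t;

/*
 * Format a seconds value into buf. The cast to unsigned long long
 * matches the "%llu" specifier regardless of how the seconds type
 * is defined on a given kernel or architecture.
 */
static int demo_format_secs(char *buf, size_t len, demo_time64_t secs)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%llu", (unsigned long long)secs);
}
```

The same pattern applies to any value whose exact width differs between kernel versions.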

This patch is back-ported from the following one:
Lustre-commit: f2bf0379a773c8c1659bfe018a22861784a0b9a6
Lustre-change: https://review.whamcloud.com/34677

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Ib4c80c9b20854d45b1b3c04057c45ee20d5413d9
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35175
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 osp: atomic64_read() returns s64 74/35174/2
Li Dongyang [Tue, 11 Jun 2019 05:46:42 +0000 (22:46 -0700)]
LU-11838 osp: atomic64_read() returns s64

Since kernel 4.17 atomic64_read on x86_64 returns s64
instead of long.

Use %llu as the format specifier and explicitly cast it
to unsigned long long.

This patch is back-ported from the following one:
Lustre-commit: dc46952ecd1aa09e738b2de6b1a3076ecbaa740e
Lustre-change: https://review.whamcloud.com/34676

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I805d43251f24417e6405f5d087927c15cf531619
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35174
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 lnet: getname dropping addrlen argument 73/35173/2
Li Dongyang [Tue, 11 Jun 2019 05:43:55 +0000 (22:43 -0700)]
LU-11838 lnet: getname dropping addrlen argument

Since kernel 4.17 ->getname() no longer takes an int *addrlen
argument; instead it returns the length to the caller.
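
One way to hide such a signature change from callers is a small compatibility wrapper selected at build time. The sketch below is a userspace mock under stated assumptions: demo_getname() imitates the two ->getname() shapes, DEMO_GETNAME_HAS_ADDRLEN is a hypothetical configure-time macro, and none of these names come from the Lustre tree.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sockaddr {
	unsigned short family;
	unsigned char data[14];
};

#ifdef DEMO_GETNAME_HAS_ADDRLEN
/* Older shape: length comes back through a pointer, 0 means success. */
static int demo_getname(struct demo_sockaddr *addr, int *addrlen)
{
	addr->family = 2;
	*addrlen = (int)sizeof(*addr);
	return 0;
}
#else
/* Newer shape: the length is the return value, negative means error. */
static int demo_getname(struct demo_sockaddr *addr)
{
	addr->family = 2;
	return (int)sizeof(*addr);
}
#endif

/* Wrapper with one stable contract: 0 on success, length via *lenp. */
static int demo_sock_getaddr(struct demo_sockaddr *addr, int *lenp)
{
	int rc;

	assert(addr != NULL && lenp != NULL);
#ifdef DEMO_GETNAME_HAS_ADDRLEN
	rc = demo_getname(addr, lenp);
	return rc;
#else
	rc = demo_getname(addr);
	if (rc < 0)
		return rc;
	*lenp = rc;
	return 0;
#endif
}
```

Callers only ever see demo_sock_getaddr(), so the version check stays in one place.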

Linux-commit: 9b2c45d479d0fb8647c9e83359df69162b5fbe5f

This patch is back-ported from the following one:
Lustre-commit: dbb81e826290b2db27e24a85869c9d0736726caa
Lustre-change: https://review.whamcloud.com/34672

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I4ad5de4a22f3fb23c07a356650ea7925acf07eed
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/35173
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 llite: remove assert for acl refcount 72/35172/2
James Simmons [Tue, 11 Jun 2019 05:40:39 +0000 (22:40 -0700)]
LU-11838 llite: remove assert for acl refcount

The purpose of this assert was to ensure lustre
was properly managing its posix_acl access. This test
is invalid due to the VFS layer also taking references
on the posix_acl. In reality there is no simple way to
detect this class of mistakes.

* latest kernels remove this refcount *

Linux-commit: 6a42e615a28bad49f2e04829486e94190c066390

This patch is back-ported from the following one:
Lustre-commit: df7bfbb1c7890deed15fd85e75da70d88be2ef7f
Lustre-change: https://review.whamcloud.com/34236

Change-Id: I167f2de449a2e8357517f33c2e81a25b25104d57
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35172
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 o2iblnd: get IP address more directly. 66/35166/3
NeilBrown [Tue, 11 Jun 2019 05:33:43 +0000 (22:33 -0700)]
LU-11838 o2iblnd: get IP address more directly.

Use dev_get_by_name() and for_primary_ifa() to
get IP address for a named device.  This is more
direct.

Linux-commit: 10e138e41a4343fd1a88e4543990205d134e562a
Linux-commit: 9eb957b98aa6322abde33240bf50dd483c5d1190

This patch is back-ported from the following one:
Lustre-commit: 7a40cd2c83d174ae0bb7e22d62fad9fbd247a654
Lustre-change: https://review.whamcloud.com/33970

Change-Id: Ic4562c3948934bacb8613e9f6f57f609ecc04de7
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-on: https://review.whamcloud.com/35166
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 lnet: change lnet_ipaddr_enumerate() to use for_each_netdev() 71/35171/2
NeilBrown [Tue, 11 Jun 2019 05:29:31 +0000 (22:29 -0700)]
LU-11838 lnet: change lnet_ipaddr_enumerate() to use for_each_netdev()

for_each_netdev() is a more direct interface than
lnet_ipif_enumerate(), so use it instead.  Also get
address and 'up' status directly from the device.

This means we need to possibly re-allocate the storage
space if there are lots of IP addresses.

However there is no need to resize the allocation down if we
over-allocated.  This is only used once, and is freed soon
after it is allocated, so that is a false optimization.
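
The grow-only allocation strategy can be sketched in userspace roughly as follows. This is an assumption-laden illustration, not the lnet code: realloc() stands in for the kernel allocator, and the demo_* names, the initial capacity of 4 and the doubling policy are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_ipaddrs {
	uint32_t *addrs;
	size_t count;
	size_t capacity;
};

/* Append one address, growing the buffer when it is full. */
static int demo_ipaddrs_add(struct demo_ipaddrs *list, uint32_t addr)
{
	assert(list != NULL);

	if (list->count == list->capacity) {
		size_t newcap = list->capacity ? list->capacity * 2 : 4;
		uint32_t *tmp = realloc(list->addrs, newcap * sizeof(*tmp));

		if (tmp == NULL)
			return -1;	/* the old buffer is still valid */
		list->addrs = tmp;
		list->capacity = newcap;
	}
	list->addrs[list->count++] = addr;
	return 0;
}

/* No shrink step: the buffer is short-lived, so it is simply freed. */
static void demo_ipaddrs_free(struct demo_ipaddrs *list)
{
	assert(list != NULL);
	free(list->addrs);
	list->addrs = NULL;
	list->count = 0;
	list->capacity = 0;
}
```

Any over-allocation is released when the list is freed, so a separate resize-down pass would add work without a lasting benefit.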

Linux-commit: 0400cf406c32ac3968241cd528747d922b6c55c3

This patch is back-ported from the following one:
Lustre-commit: f5991afd8779fe747778e28e998277a10242a57d
Lustre-change: https://review.whamcloud.com/33969

Change-Id: I1c1e7722c7b2b267dcb8134ae295a54f976d96ad
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35171
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>