Whamcloud - gitweb
fs/lustre-release.git
4 years agoLU-12207 tests: allow some margin for sanity/76 27/37827/2
Andreas Dilger [Fri, 6 Mar 2020 23:58:46 +0000 (16:58 -0700)]
LU-12207 tests: allow some margin for sanity/76

With newer slab allocators it is possible that the kernel may keep
some inodes in the per-cpu cache and not release all of them.  This
was resulting in sanity test_76 failures, with a few (often 12) of the
inodes created by the test not being freed, like:

    inode slab grew from 3224 to 3236
    inode slab grew from 70368 to 70380
    inode slab grew from 68878 to 68890

Allow a small number of inodes (8 per core) to be cached on the client
without considering it a test failure.
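The margin rule can be sketched as a single comparison. This is a minimal C illustration, not the actual shell code of test_76; the helper `inode_slab_growth_ok()` and the constant `INODES_PER_CPU_SLACK` are hypothetical, with the value 8 taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-core slack from the commit message: 8 cached inodes per core. */
#define INODES_PER_CPU_SLACK 8UL

/*
 * Hypothetical helper: true if the inode slab grew by no more than the
 * allowed per-core slack between the "before" and "after" samples.
 */
static bool inode_slab_growth_ok(unsigned long before, unsigned long after,
				 unsigned long ncpus)
{
	if (after <= before)
		return true;	/* slab shrank or stayed the same */
	return after - before <= INODES_PER_CPU_SLACK * ncpus;
}
```

With the first sample from the log (3224 to 3236, a growth of 12), a 2-core client is allowed 16 extra inodes and passes, while a 1-core client (limit 8) would still fail.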

Clean up test_76 code style to current standards.

Test-Parameters: trivial testlist=sanity env=ONLY=76,ONLY_REPEAT=100
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia3f60de2fb471bb32da27d36665a3a0fd43ebbe5
Reviewed-on: https://review.whamcloud.com/37827
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10447 tests: remove use of SETSTRIPE from sanity 31/37831/2
Andreas Dilger [Mon, 9 Mar 2020 09:37:22 +0000 (03:37 -0600)]
LU-10447 tests: remove use of SETSTRIPE from sanity

The recently landed test_136() was using $SETSTRIPE instead of
$LFS setstripe, but $SETSTRIPE was removed from test-framework.sh by a
later patch (which was written earlier and didn't take this test into
account).  This test doesn't fail during normal testing because it is
skipped unless SLOW testing is enabled.  Change it to use $LFS setstripe
instead.

Fixes: cc1092291932 ("LU-13069 obdclass: don't skip records for wrapped catalog")
Fixes: be3b7e772d3a ("LU-10447 tests: deprecate use of $SETSTRIPE/$GETSTRIPE")
Test-Parameters: trivial testlist=sanity env=ONLY=136,ONLY_REPEAT=10
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ibb1e8220e2de711e7034d91baa329bdc04687c72
Reviewed-on: https://review.whamcloud.com/37831
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13343 tests: skip recovery-small test_140 with SSK 32/37832/2
Sebastien Buisson [Mon, 9 Mar 2020 08:47:13 +0000 (17:47 +0900)]
LU-13343 tests: skip recovery-small test_140 with SSK

recovery-small test_140a and test_140b use a 'local client',
i.e. a client mounted on a server node.
This is not compatible with the SSK keys installed by the test framework,
so just skip these tests when running with SSK.

Test-Parameters: trivial
Test-Parameters: env=SHARED_KEY=true,ONLY=140 testlist=recovery-small
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I420deed9dbc50da622766648a51f885ed01203c6
Reviewed-on: https://review.whamcloud.com/37832
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 years agoLU-13309 osd-ldiskfs: remove per-page object_get/put in brw 58/37758/3
Andrew Perepechko [Fri, 28 Feb 2020 13:18:40 +0000 (16:18 +0300)]
LU-13309 osd-ldiskfs: remove per-page object_get/put in brw

According to profiling data, object_get/put calls consume a lot
of CPU ticks:

99.99%  [kernel.vmlinux]
           |
           |--28.82%--_atomic_dec_and_lock
           |          |
           |           --28.82%--lu_object_put
           |                     |
           |                      --28.73%--osd_bufs_put
           |                                ofd_commitrw
           |                                tgt_brw_read
           |                                tgt_request_handle
           |                                ptlrpc_server_handle_request
           |                                ptlrpc_main
           |                                kthread
           |                                ret_from_fork
           |
           |--26.51%--osd_bufs_get
           |          |
           |           --26.51%--ofd_preprw
           |                     tgt_brw_read
           |                     tgt_request_handle
           |                     ptlrpc_server_handle_request
           |                     ptlrpc_main
           |                     kthread
           |                     ret_from_fork
           |
           |--18.09%--lu_object_put
           |          |
           |           --18.01%--osd_bufs_put
           |                     ofd_commitrw
           |                     tgt_brw_read
           |                     tgt_request_handle
           |                     ptlrpc_server_handle_request
           |                     ptlrpc_main
           |                     kthread
           |                     ret_from_fork

ofd_preprw_read() and ofd_preprw_write() pin the corresponding
ofd object, and ofd_commitrw_read() and ofd_commitrw_write() later
unpin it. While the ofd object is pinned, its underlying
osd object cannot go away, so the object_get/object_put calls
in osd_bufs_get()/osd_bufs_put() are effectively no-ops.
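The reasoning above can be illustrated with a small reference-counting sketch: once the request holds one pin for its whole lifetime, per-page get/put pairs only add atomic operations (the `_atomic_dec_and_lock` and `lu_object_put` cost in the profile) without changing the object's lifetime. `struct obj`, `obj_get()`, `obj_put()` and `brw_read()` are hypothetical stand-ins, not Lustre code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object with a reference count (stands in for lu_object). */
struct obj {
	atomic_int refcount;
};

/* Number of atomic refcount operations, to show the cost being removed. */
static unsigned long atomic_ops;

static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
	atomic_ops++;
}

static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->refcount, 1);
	atomic_ops++;
}

/*
 * One BRW read: the object is pinned once for the whole request (the
 * role of ofd_preprw_read()/ofd_commitrw_read()).  With per_page_refs
 * set, each page also takes its own reference, as osd_bufs_get()/put()
 * did before the patch; those references are redundant under the pin.
 */
static void brw_read(struct obj *o, size_t npages, int per_page_refs)
{
	obj_get(o);				/* preprw: pin */
	if (per_page_refs)
		for (size_t i = 0; i < npages; i++)
			obj_get(o);		/* old per-page get */
	/* ... page I/O happens here ... */
	if (per_page_refs)
		for (size_t i = 0; i < npages; i++)
			obj_put(o);		/* old per-page put */
	obj_put(o);				/* commitrw: unpin */
}
```

For a 256-page read the old pattern performs 514 atomic operations against 2 for the pinned-only version, and the final reference count is the same either way.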

Change-Id: Ic48f793de5ef3e62505f44879c91050922160000
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Cray-bug-id: LUS-4388
Reviewed-on: https://review.whamcloud.com/37758
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13258 libcfs: fixes for cfs_arch_init() 27/37727/3
Mr NeilBrown [Tue, 25 Feb 2020 20:59:56 +0000 (07:59 +1100)]
LU-13258 libcfs: fixes for cfs_arch_init()

The introduction of cfs_arch_init() brought two problems.

1/ wait_bit_init() wasn't known due to a missing include file.
2/ cfs_arch_init() was not marked __init, but it called
   a function (wait_bit_init) that was.

This patch fixes both of these.

Fixes: 3453c95f513c ("LU-13258 libcfs: make apply_workqueue_attrs() available for Lustre")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I6f19e11e68f52ca8071332364d369ed3a717d5c9
Reviewed-on: https://review.whamcloud.com/37727
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12477 lustre: restore time_after32() 25/37725/3
Mr NeilBrown [Tue, 25 Feb 2020 20:53:38 +0000 (07:53 +1100)]
LU-12477 lustre: restore time_after32()

time_after32() is needed when compiling against kernels earlier than
v4.14 that haven't had it backported, e.g. SLE12-SP3-LTSS.
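For reference, a compat definition modelled on the kernel's time_after32() might look like the sketch below. It uses <stdint.h> types so that it compiles in userspace; where the real Lustre compat code lives and how it is guarded is not shown in this commit and is not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * True when 32-bit timestamp a is later than b.  Subtracting as unsigned
 * and then reading the result as signed keeps the comparison correct
 * across a 32-bit wrap, provided the two stamps are less than 2^31 apart
 * (this relies on two's-complement conversion, as the kernel does).
 */
#ifndef time_after32
#define time_after32(a, b)	((int32_t)((uint32_t)(b) - (uint32_t)(a)) < 0)
#endif
#ifndef time_before32
#define time_before32(b, a)	time_after32(a, b)
#endif
```

A plain `a > b` would get the wrap case wrong: `1 > 0xFFFFFFFF` is false, while time_after32(1, 0xFFFFFFFF) correctly treats 1 as just after the wrap.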

Fixes: 8e88bbfef579 ("LU-12477 lustre: remove obsolete config checks")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ifb51b9ed0945ad28f8e8aa34dada3860107b95df
Reviewed-on: https://review.whamcloud.com/37725
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-13261 mdt: PFL layout changed while accessing 84/37684/2
Hongchao Zhang [Sun, 19 Jan 2020 06:26:10 +0000 (01:26 -0500)]
LU-13261 mdt: PFL layout changed while accessing

The PFL layout EA can be enlarged when writing starts to an IO range
whose layout component is not yet instantiated. This can cause another
thread to get an outdated layout size in "mdt_intent_layout", and then
"mdt_lvbo_fill" fails its check against the real layout size.

Since "ldlm_handle_enqueue0" already handles the "-ERANGE" error by
retrying after expanding the layout buffer, this patch only needs to
lower the debug level of the message logged in "mdt_lvbo_fill".
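The retry behaviour that makes a quieter message acceptable can be sketched as follows: the fill routine reports the size it needs and returns -ERANGE, and the caller grows its buffer and tries again. `struct layout_src`, `layout_fill()` and `layout_get()` are hypothetical; this is not the ldlm or mdt code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout source whose size may grow between calls. */
struct layout_src {
	const char *data;
	size_t size;
};

/* Copy the layout into buf, or return -ERANGE and report the size needed. */
static int layout_fill(const struct layout_src *src, void *buf, size_t buflen,
		       size_t *needed)
{
	*needed = src->size;
	if (buflen < src->size)
		return -ERANGE;
	memcpy(buf, src->data, src->size);
	return (int)src->size;
}

/*
 * Caller-side retry: -ERANGE is an expected, recoverable result, so the
 * buffer is grown to the reported size and the fill is tried again.
 * On success the caller owns *bufp and must free it.
 */
static int layout_get(const struct layout_src *src, void **bufp, size_t *lenp)
{
	size_t len = 16, needed;	/* deliberately small first guess */
	void *buf = malloc(len);
	int rc;

	if (!buf)
		return -ENOMEM;
	for (;;) {
		rc = layout_fill(src, buf, len, &needed);
		if (rc != -ERANGE)
			break;
		void *bigger = realloc(buf, needed);
		if (!bigger) {
			free(buf);
			return -ENOMEM;
		}
		buf = bigger;
		len = needed;
	}
	if (rc < 0) {
		free(buf);
		return rc;
	}
	*bufp = buf;
	*lenp = len;
	return rc;
}
```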

Change-Id: Iad722d1dac187f57ae77606a4d4587525412cd68
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37684
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
4 years agoLU-13276 lnet: Update nnis to avoid infinite loop 76/37676/2
Chris Horn [Thu, 20 Feb 2020 23:38:11 +0000 (17:38 -0600)]
LU-13276 lnet: Update nnis to avoid infinite loop

The goto loop in lnet_push_target_resize() is infinite because
the loop variable 'nnis' is not updated with the new value from
ln_push_target_nnis.
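The shape of the fix can be sketched as a single-threaded simulation: the target count is re-read at the top of every pass, so a retry after a concurrent increase uses the new value and the loop terminates. `push_target_nnis`, `alloc_push_target()` and `push_target_resize()` are hypothetical stand-ins for the LNet code, and `bumps_left` only simulates another thread growing the target.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the_lnet.ln_push_target_nnis. */
static int push_target_nnis = 4;

/* Simulates another thread raising the target while we allocate. */
static int bumps_left;

static void *alloc_push_target(int nnis)
{
	if (bumps_left > 0) {
		bumps_left--;
		push_target_nnis += 4;	/* the target grew meanwhile */
	}
	return calloc((size_t)nnis, sizeof(int));
}

/*
 * If nnis were read only once, before the label, every retry would
 * allocate the same stale size, find it still too small, and goto again
 * forever.  Refreshing nnis on each pass is the fix.
 */
static int push_target_resize(int *final_nnis)
{
	void *buf;
	int nnis;

again:
	nnis = push_target_nnis;	/* refresh on every pass */
	buf = alloc_push_target(nnis);
	if (!buf)
		return -1;
	if (nnis < push_target_nnis) {	/* target grew: retry with new size */
		free(buf);
		goto again;
	}
	*final_nnis = nnis;
	free(buf);	/* a real implementation would install buf instead */
	return 0;
}
```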

Cray-bug-id: LUS-8526
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I65b0bc0b56393f2296bafa3a964c59840baa0643
Reviewed-on: https://review.whamcloud.com/37676
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13277 lnet: Discovery thread can deadlock on shutdown 75/37675/2
Chris Horn [Fri, 21 Feb 2020 00:01:45 +0000 (18:01 -0600)]
LU-13277 lnet: Discovery thread can deadlock on shutdown

Drop the net_lock/EX before breaking out of the loop to avoid
deadlock.
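The rule is that every exit from a loop iteration that took the lock must release it. A minimal sketch with a pthread mutex standing in for net_lock/EX; `net_lock`, `shutting_down`, `work_items` and `discovery_loop()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the LNet net lock and shutdown state. */
static pthread_mutex_t net_lock = PTHREAD_MUTEX_INITIALIZER;
static bool shutting_down;
static int work_items = 3;

/*
 * Each iteration takes the exclusive lock.  If shutdown is noticed while
 * the lock is held, it must be dropped before breaking out; leaving with
 * it held would deadlock the next taker, such as the shutdown path.
 */
static void discovery_loop(void)
{
	for (;;) {
		pthread_mutex_lock(&net_lock);
		if (shutting_down || work_items == 0) {
			pthread_mutex_unlock(&net_lock);	/* the fix */
			break;
		}
		work_items--;
		pthread_mutex_unlock(&net_lock);
	}
}
```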

Cray-bug-id: LUS-8525
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ie62b55fa45b5795937eb1480a1fcabe295fed0ee
Reviewed-on: https://review.whamcloud.com/37675
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13282 tests: wait $LFSCK_BG_PID properly 68/37668/2
Elena Gryaznova [Fri, 21 Feb 2020 14:40:17 +0000 (17:40 +0300)]
LU-13282 tests: wait $LFSCK_BG_PID properly

ha.sh fails with:
  wait: pid $LFSCK_BG_PID is not a child of this shell
because the 'wait $LFSCK_BG_PID' call duplicates the wait already
done in ha_stop_loads().

This patch fixes the issue.
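A comparable failure can be seen at the system-call level: once a child has been reaped, waiting for it again fails because it is no longer a child of the caller. This C sketch is only an analogy (bash reports the error from its own job bookkeeping, not necessarily from this call); `double_wait_errno()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once, reap it, then wait for it again.
 * Returns the errno of the second wait (0 if it unexpectedly succeeded,
 * -1 on setup failure).
 */
static int double_wait_errno(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);			/* child: exit immediately */

	if (waitpid(pid, NULL, 0) != pid)	/* first wait reaps the child */
		return -1;
	if (waitpid(pid, NULL, 0) == -1)	/* second wait: nothing to reap */
		return errno;
	return 0;
}
```

The second waitpid() fails with ECHILD, the C-level counterpart of "is not a child of this shell".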

Test-Parameters: trivial
Cray-bug-id: LUS-7930
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: Ifdeab0dc570e9da889ccb34c0b47473e1077bfdc
Reviewed-on: https://review.whamcloud.com/37668
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13253 libcfs: protect libcfs_debug_dumplog() 88/37588/3
Alex Zhuravlev [Sat, 15 Feb 2020 06:16:15 +0000 (09:16 +0300)]
LU-13253 libcfs: protect libcfs_debug_dumplog()

libcfs_debug_dumplog() uses global state to wait for the dumping
thread to complete, and it doesn't make sense to dump concurrently,
so protect it against concurrent callers.
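A minimal sketch of the idea, assuming a mutex is used to serialize callers: with a single global completion flag, a second dumper could reset the flag while the first is still waiting on it. `dump_complete`, `run_dump()`, `dumplog_lock` and `debug_dumplog_sketch()` are hypothetical names, and the dump "thread" runs inline to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical global state shared between the caller and the dump thread. */
static bool dump_complete;
static int dumps_active, max_dumps_active;

/* Serializes all callers of debug_dumplog_sketch(). */
static pthread_mutex_t dumplog_lock = PTHREAD_MUTEX_INITIALIZER;

static void run_dump(void)
{
	/* Real code would start a dump thread and sleep until it sets
	 * dump_complete; here the work runs inline. */
	dump_complete = false;
	if (++dumps_active > max_dumps_active)
		max_dumps_active = dumps_active;
	dump_complete = true;
	dumps_active--;
}

/* Without the lock, a second caller could clear dump_complete while the
 * first one is still waiting on it. */
static void debug_dumplog_sketch(void)
{
	pthread_mutex_lock(&dumplog_lock);
	run_dump();
	pthread_mutex_unlock(&dumplog_lock);
}
```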

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie6916a0bb53a0a3aa8992413811fe7a908f86276
Reviewed-on: https://review.whamcloud.com/37588
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13234: osd-ldiskfs: hold inode mutex for ldiskfs_orphan_add() 28/37528/3
Wang Shilong [Tue, 11 Feb 2020 02:32:29 +0000 (10:32 +0800)]
LU-13234: osd-ldiskfs: hold inode mutex for ldiskfs_orphan_add()

See the following warning:

/home/green/git/lustre-release/ldiskfs/namei.c:3310 ldiskfs_orphan_add+0x11e/0x2a0 [ldiskfs]
...
Call Trace:
[<ffffffff817d1711>] dump_stack+0x19/0x1b
[<ffffffff8108ba58>] __warn+0xd8/0x100
[<ffffffff8108bb9d>] warn_slowpath_null+0x1d/0x20
[<ffffffffa0adbfce>] ldiskfs_orphan_add+0x11e/0x2a0 [ldiskfs]
[<ffffffffa0bb3b73>] osd_punch+0x2a3/0x6c0 [osd_ldiskfs]
[<ffffffffa0692541>] ? tgt_fmd_put+0x41/0x120 [ptlrpc]
[<ffffffffa0e8b0a8>] ofd_object_punch+0x7e8/0xce0 [ofd]
[<ffffffffa0e7a5d3>] ofd_punch_hdl+0x4f3/0xa70 [ofd]
[<ffffffffa0673f55>] tgt_request_handle+0x965/0x1620 [ptlrpc]
[<ffffffffa0229dde>] ? libcfs_nid2str_r+0xfe/0x130 [lnet]
[<ffffffffa0616f60>] ptlrpc_server_handle_request+0x250/0xb10 [ptlrpc]
[<ffffffff810c6941>] ? __wake_up_common_lock+0x91/0xc0
[<ffffffff810c6250>] ? sched_feat_set+0xf0/0xf0
[<ffffffffa061b1c0>] ptlrpc_main+0xcb0/0x1cb0 [ptlrpc]
[<ffffffff810c665d>] ? finish_task_switch+0x5d/0x1b0
[<ffffffffa061a510>] ? ptlrpc_register_service+0xff0/0xff0 [ptlrpc]
[<ffffffff810b8254>] kthread+0xe4/0xf0
[<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
[<ffffffff817e5ddd>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140

Hold the inode mutex around ldiskfs_orphan_add() to eliminate the warning.
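The pattern is a callee that warns when its caller does not hold the inode lock, and a caller that takes the lock around the call. This userspace sketch uses a pthread mutex plus an explicit "locked" flag in place of the kernel's inode lock check; `struct inode_sketch`, `orphan_add()` and `punch()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical inode with a mutex and a flag used for the lock check. */
struct inode_sketch {
	pthread_mutex_t lock;
	bool locked;		/* true only while lock is held */
	int on_orphan_list;
};

static int warnings;

static void inode_lock_sketch(struct inode_sketch *inode)
{
	pthread_mutex_lock(&inode->lock);
	inode->locked = true;
}

static void inode_unlock_sketch(struct inode_sketch *inode)
{
	inode->locked = false;
	pthread_mutex_unlock(&inode->lock);
}

/* Like the check behind the warning: complain if the caller lacks the lock. */
static void orphan_add(struct inode_sketch *inode)
{
	if (!inode->locked)
		warnings++;	/* the kernel would emit a WARN here */
	inode->on_orphan_list = 1;
}

/* The fix, in the punch path: hold the inode lock around orphan_add(). */
static void punch(struct inode_sketch *inode)
{
	inode_lock_sketch(inode);
	orphan_add(inode);
	inode_unlock_sketch(inode);
}
```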

Fixes: f64e9f19f68e ("LU-12977 ldiskfs: properly take inode_lock() for truncates")
Change-Id: I810c3c170649b3c96d98f227a480bb1092f2386c
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37528
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13004 ptlrpc: simplify bd_vec access. 73/36973/4
Mr NeilBrown [Wed, 4 Dec 2019 02:38:05 +0000 (13:38 +1100)]
LU-13004 ptlrpc: simplify bd_vec access.

Now that there are no kvecs in ptlrpc_bulk_desc, only bdvecs, we can
simplify the access, discarding the containing struct and the macros,
and just accessing the fields directly.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I068a7a280f130bf0b53b9c572ed47ef0cc999102
Reviewed-on: https://review.whamcloud.com/36973
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13005 lnet: remove the 'queue' from LNetEQ 44/36844/6
Mr NeilBrown [Wed, 20 Nov 2019 01:11:57 +0000 (12:11 +1100)]
LU-13005 lnet: remove the 'queue' from LNetEQ

All calls to LNetEQAlloc pass a size of 0, so no queue
is ever allocated.
So remove the 'size' arg, and all code that depends on
it being non-zero.
Similarly remove eq_size, eq_deq_seq, eq_enq_seq and eq_events,
as they are always 0/NULL.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Icb7bb352fa61cd6ea46847676e583e738eeeda8c
Reviewed-on: https://review.whamcloud.com/36844
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13005 lnet: remove LNetEQPoll 43/36843/7
Mr NeilBrown [Wed, 20 Nov 2019 01:01:56 +0000 (12:01 +1100)]
LU-13005 lnet: remove LNetEQPoll

There are no longer any users for LNetEQPoll, so remove it
and any mention of it.

Also remove the ln_eq_waitq which is no longer used.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If0c1a2634e879c62925314a55b4c2ae0512d1837
Reviewed-on: https://review.whamcloud.com/36843
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13005 lnet: don't use LNetEQPoll for ping replies. 42/36842/7
Mr NeilBrown [Wed, 20 Nov 2019 00:54:28 +0000 (11:54 +1100)]
LU-13005 lnet: don't use LNetEQPoll for ping replies.

lnet_ping() is the only user of LNetEQPoll.  All other event queues
register a callback and don't make use of queuing.
The queuing provided by lib-eq isn't really needed in the kernel where
callbacks are so easy.

So as a step towards simplifying lib-eq, change lnet_ping() to
register a callback which uses a completion to notify when the ping is
finally complete.

Note that the handling of 'rc' in the current code is a little
confused.
If LNetGet() succeeds, but no LNET_EVENT_REPLY is received, then
rc will be set to -EIO.  There is code that tries to set it to -ETIMEDOUT
or to the return value from LNetEQPoll, but this code is ineffectual.
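The callback-plus-completion pattern can be sketched in userspace with a mutex and condition variable standing in for the kernel's struct completion. The event handler signals only on a REPLY event, and the waiter returns -ETIMEDOUT if none arrives in time. `struct completion_sketch`, `ping_event_handler()` and the `PING_EVENT_*` values are hypothetical, not LNet's API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Minimal userspace completion, standing in for struct completion. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void complete_sketch(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Wait up to timeout_ms: 0 if completed, -ETIMEDOUT otherwise. */
static int wait_for_completion_timeout_sketch(struct completion_sketch *c,
					      long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->done && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	rc = c->done ? 0 : -ETIMEDOUT;
	pthread_mutex_unlock(&c->lock);
	return rc;
}

/* Hypothetical event types and callback: signal the waiter on REPLY only. */
enum ping_event { PING_EVENT_SEND, PING_EVENT_REPLY };

static void ping_event_handler(struct completion_sketch *c, enum ping_event ev)
{
	if (ev == PING_EVENT_REPLY)
		complete_sketch(c);
}
```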

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie937887524bfa3ca9c2828d673d431411720c7ae
Reviewed-on: https://review.whamcloud.com/36842
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12214 build: fix build with MPI 27/36427/7
Alexey Lyashkov [Fri, 8 Nov 2019 07:58:29 +0000 (10:58 +0300)]
LU-12214 build: fix build with MPI

Without a build dependency on the MPI library, the MPI tests are
left out of the build.

Cray-bug-id: LUS-6033, LUS-7204
Test-parameters: trivial
Change-Id: I88b8ad67a9a2863fdcd02e2df5289fdd480c2a74
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/36427
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12214 build: fix build with mofed 26/36426/7
Alexey Lyashkov [Fri, 8 Nov 2019 07:58:29 +0000 (10:58 +0300)]
LU-12214 build: fix build with mofed

Add a dependency on the MOFED/OPA modules so that the correct
package is produced.

Test-parameters: trivial
Cray-bug-id: LUS-6033, LUS-7204
Change-Id: I843762945c5dbb59d53e1913b7382813bfba86ad
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/36426
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12214 build: fixes if the name is not just 'lustre' 24/36424/7
Alexey Lyashkov [Wed, 22 Jan 2020 11:02:27 +0000 (14:02 +0300)]
LU-12214 build: fixes if the name is not just 'lustre'

Fix the use of the %{name} macro in the spec file.
This gives server packages the correct names.

Cray-bug-id: LUS-5915
Test-parameters: trivial
Change-Id: I2ae271b5344fb899bb053f82d2534355ce60aa3a
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/36424
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12780 osp: don't use ptlrpc_thread for opd_update_thread 64/36264/5
Mr NeilBrown [Wed, 23 Oct 2019 00:30:50 +0000 (11:30 +1100)]
LU-12780 osp: don't use ptlrpc_thread for opd_update_thread

Rather than ptlrpc_thread, use native kthread functionality.

- There is no need to synchronize on startup: do all the startup
  that can fail before starting the thread.  This involves putting the
  lu_env in the per-thread struct osp_updates.

- Synchronization on shutdown is done with kthread_stop() and
  kthread_should_stop().  The lu_env_put() call is moved to after the
  kthread_stop() call, as it is theoretically possible that the
  thread function never gets called, so it isn't safe for it to
  be responsible for cleanup.

- opd_update_thread is replaced with ou_update_task in struct
  osp_updates.
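The startup and shutdown pattern can be sketched in userspace with POSIX threads, where an atomic stop flag plus pthread_join() stands in for kthread_should_stop()/kthread_stop(). `struct updates`, `update_thread()`, `updates_start()` and `updates_stop()` are hypothetical. One real difference: kthread_stop() can stop a thread before its function ever runs, whereas a pthread always runs. The point kept here is that the fallible setup happens before the thread starts and the cleanup belongs to the stopper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-thread state, standing in for struct osp_updates. */
struct updates {
	pthread_t task;			/* like ou_update_task */
	atomic_bool should_stop;	/* like kthread_should_stop() */
	int *env;			/* like the lu_env, set up before start */
	atomic_int loops;
};

static void *update_thread(void *arg)
{
	struct updates *u = arg;
	struct timespec ms = { 0, 1000000L };

	while (!atomic_load(&u->should_stop)) {
		atomic_fetch_add(&u->loops, 1);
		nanosleep(&ms, NULL);	/* stand-in for waiting for work */
	}
	return NULL;			/* no cleanup here: the stopper owns it */
}

/* Do everything that can fail first, then start the thread. */
static int updates_start(struct updates *u)
{
	u->env = calloc(1, sizeof(*u->env));
	if (!u->env)
		return -1;
	atomic_init(&u->should_stop, false);
	atomic_init(&u->loops, 0);
	if (pthread_create(&u->task, NULL, update_thread, u) != 0) {
		free(u->env);
		return -1;
	}
	return 0;
}

/* Like kthread_stop(): ask the thread to exit, wait for it, then clean up. */
static void updates_stop(struct updates *u)
{
	atomic_store(&u->should_stop, true);
	pthread_join(u->task, NULL);
	free(u->env);
	u->env = NULL;
}
```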

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I78dafbadf419ee9b80a9bd0046152fb6f293f191
Reviewed-on: https://review.whamcloud.com/36264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11510 lfs: migrate a composite layout file correctly 82/36082/10
Emoly Liu [Wed, 26 Feb 2020 14:12:59 +0000 (22:12 +0800)]
LU-11510 lfs: migrate a composite layout file correctly

The patch fixes the following issues:
- in function migrate_open_files(), the "layout" pointer should be used
  instead of the "param" pointer to decide whether a composite file
  should be created, because "param" is never NULL, so the
  composite layout file was never created;
- make the --copy and --yaml options work correctly in the lfs_migrate tool;
- when a composite layout file is migrated, the "--copy" option is
  added to preserve its layout in both lfs_migrate and "lfs migrate",
  and in that case the pool name is saved as well;
- when a file is restriped with the -R option by lfs_migrate, the file
  is given its parent directory's striping by default, by adding the
  "--copy $parent_dir" option;
- do some code cleanup in lfs_migrate and sanity.sh test_56wb/c.

sanity.sh test_56xd/xe are added to verify this patch.

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I85779c69e74444eb869f28add4363ad3a6835b97
Reviewed-on: https://review.whamcloud.com/36082
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12542 ldlm: don't access l_resource when not locked. 84/35484/5
Mr NeilBrown [Thu, 27 Feb 2020 15:15:16 +0000 (10:15 -0500)]
LU-12542 ldlm: don't access l_resource when not locked.

lock->l_resource can (sometimes) change when the resource
isn't locked.
So dereferencing lock->l_resource and then locking the
resource looks wrong.
As lock_res_and_lock() returns the locked resource, this
code can easily be more obviously correct by using
that return value.
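A simplified userspace sketch of the locking pattern, assuming (as in the ldlm code) that the lock's own l_lock protects l_resource: take l_lock first so the pointer cannot change, lock the resource it points to, and hand that pointer back so callers use it instead of re-reading lock->l_resource. The `*_sketch` names are hypothetical, and pthread mutexes stand in for kernel spinlocks.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified ldlm structures. */
struct resource {
	pthread_mutex_t lr_lock;
};

struct lock {
	pthread_mutex_t l_lock;		/* protects l_resource */
	struct resource *l_resource;	/* may change unless l_lock is held */
};

/* Pin l_resource via l_lock, lock that resource, and return it. */
static struct resource *lock_res_and_lock_sketch(struct lock *lk)
{
	struct resource *res;

	pthread_mutex_lock(&lk->l_lock);
	res = lk->l_resource;		/* stable while l_lock is held */
	pthread_mutex_lock(&res->lr_lock);
	return res;
}

/* Release in the reverse order, using the resource that was locked. */
static void unlock_res_and_lock_sketch(struct resource *res, struct lock *lk)
{
	pthread_mutex_unlock(&res->lr_lock);
	pthread_mutex_unlock(&lk->l_lock);
}
```

A caller then works on `res`, the pointer it actually locked, rather than on a value of `lk->l_resource` read before any lock was held.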

Change-Id: Iced0bf1af4fa8ddedffa817e00f1c6a02b035d76
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/35484
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11505 tests: customise run_*() functions 52/33352/12
Elena Gryaznova [Thu, 20 Feb 2020 16:50:21 +0000 (19:50 +0300)]
LU-11505 tests: customise run_*() functions

For PFL testing we need the ability to run
parallel-scale tests with composite files.
This patch adds subtest-specific *_STRIPEPARAMS variables,
which allow any striping or composite layout to be set
on the test directory.

Test-Parameters: trivial testlist=parallel-scale,parallel-scale-nfsv3
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Cray-bug-id: LUS-5936
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I767fa17523f5c40d50260892901dc947a1da8a24
Reviewed-on: https://review.whamcloud.com/33352
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9868 llite: remove lld_it field of ll_dentry_data 41/37741/2
Mr NeilBrown [Thu, 27 Feb 2020 03:08:30 +0000 (14:08 +1100)]
LU-9868 llite: remove lld_it field of ll_dentry_data

This field is never set nor used.  Let's remove it.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If01ba4deec34a6528b3c36c45a378b9f3824ca2f
Reviewed-on: https://review.whamcloud.com/37741
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9679 osc: fix to return right weight in osc_lock_weight() 35/37735/2
Wang Shilong [Thu, 27 Feb 2020 01:36:05 +0000 (09:36 +0800)]
LU-9679 osc: fix to return right weight in osc_lock_weight()

cl_io_init() might return a negative value, in which case the lock
should not be eliminated. Since osc_lock_weight() is expected to
return a positive value, return 1 in this case.

This is not a problem for now, as osc_io_init() only returns 0,
but fix it to avoid potential issues in the future.
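A hypothetical simplification of the rule: a weight of 0 is assumed to let the lock be dropped early, and any non-zero weight keeps it. If IO setup fails, the function cannot tell whether pages are cached, so it keeps the lock. Returning the negative errno itself from an unsigned function would instead turn into a huge weight. `lock_weight_sketch()` and its parameters are hypothetical; `io_init_rc` plays the role of the cl_io_init() result.

```c
#include <assert.h>

/*
 * io_init_rc: result of setting up the IO (cl_io_init() in the real code).
 * cached_pages: how many pages were found under the lock (hypothetical).
 */
static unsigned long lock_weight_sketch(int io_init_rc,
					unsigned long cached_pages)
{
	if (io_init_rc < 0)
		return 1;	/* the fix: can't tell, so keep the lock */
	return cached_pages ? 1 : 0;
}
```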

Test-Parameters: trivial
Change-Id: I657e5dcc4bb7691bf3b4ca06df7cb87945008a93
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37735
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9679 osc: use rb_entry_safe 04/37604/2
Geliang Tang [Tue, 20 Dec 2016 13:56:55 +0000 (21:56 +0800)]
LU-9679 osc: use rb_entry_safe

Use rb_entry_safe() instead of container_of() to simplify the code.
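For illustration, a userspace version of the kernel's rb_entry_safe(): it behaves like container_of() but maps a NULL node pointer to a NULL entry, which removes the explicit NULL check at each call site. It uses GNU C statement expressions and `__typeof__` (as the kernel does) so that `ptr` is evaluated only once. `struct rb_node_sketch`, `struct extent` and the `next_extent_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* container_of() that passes NULL through (GNU C statement expression). */
#define rb_entry_safe(ptr, type, member)				\
	({ __typeof__(ptr) ____ptr = (ptr);				\
	   ____ptr ? container_of(____ptr, type, member) : NULL; })

struct rb_node_sketch { struct rb_node_sketch *left, *right; };

/* Hypothetical entry embedding a tree node, like an extent in a tree. */
struct extent {
	long start;
	struct rb_node_sketch node;
};

/* Before: explicit NULL check around container_of(). */
static struct extent *next_extent_old(struct rb_node_sketch *n)
{
	return n ? container_of(n, struct extent, node) : NULL;
}

/* After: one macro does both. */
static struct extent *next_extent_new(struct rb_node_sketch *n)
{
	return rb_entry_safe(n, struct extent, node);
}
```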

Linux-Commit: e3e0293ca9b9 ("staging: lustre: osc: use rb_entry_safe")

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9f8e19d45d859a6c7b7aa01093a1c2211063874c
Reviewed-on: https://review.whamcloud.com/37604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9679 osc: simplify osc_page_gang_lookup() 99/37599/2
NeilBrown [Mon, 17 Dec 2018 04:06:47 +0000 (15:06 +1100)]
LU-9679 osc: simplify osc_page_gang_lookup()

osc_page_gang_lookup() has 4 values that it can receive from a
callback, and that it can return to the caller:
CLP_GANG_OKAY,
CLP_GANG_RESCHED,
CLP_GANG_AGAIN,
CLP_GANG_ABORT

"AGAIN" is never used.
"RESCHED" is not needed as a cond_resched() can safely be called at
the point this is returned, rather than returning it.
That leaves "OKAY" and "ABORT" which can simply be "true" and "false"
boolean values.

Internalizing the RESCHED case means the callers don't need to loop
themselves.  This simplifies calling patterns.
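The simplified calling convention can be sketched as a walker whose callback returns a bool and which yields the CPU itself between batches, so callers no longer loop on a RESCHED result. `page_gang_lookup()`, `page_cb_t` and `count_until_limit()` are hypothetical, and sched_yield() stands in for the kernel's cond_resched().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: return true to keep going, false to abort. */
typedef bool (*page_cb_t)(int page_index, void *cbdata);

/* Returns true if every page was visited, false if the callback aborted. */
static bool page_gang_lookup(const int *pages, size_t npages, size_t batch,
			     page_cb_t cb, void *cbdata)
{
	for (size_t i = 0; i < npages; i++) {
		if (!cb(pages[i], cbdata))
			return false;		/* was CLP_GANG_ABORT */
		if (batch && (i + 1) % batch == 0)
			sched_yield();		/* was returning CLP_GANG_RESCHED */
	}
	return true;				/* was CLP_GANG_OKAY */
}

/* Example callback: count pages and abort once the limit is reached. */
struct count_ctx { int seen, limit; };

static bool count_until_limit(int page_index, void *cbdata)
{
	struct count_ctx *ctx = cbdata;

	(void)page_index;
	return ++ctx->seen < ctx->limit;
}
```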

Linux-Commit: 1e8fd6f9806c ("lustre: osc_cache: simplify
osc_page_gang_lookup()")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I603963b72e4299bebcdaf4064428d6281fd12def
Reviewed-on: https://review.whamcloud.com/37599
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13102 llog: fix processing of a wrapped catalog 02/37102/4
Alexander Boyko [Mon, 16 Dec 2019 13:24:16 +0000 (08:24 -0500)]
LU-13102 llog: fix processing of a wrapped catalog

The logic for rereading a llog buffer had an exception
for a full catalog, when lgh_last_idx == llh_cat_idx and the first
index to process is llh_cat_idx + 1. This check is based on
the value of lh_last_idx, which stays unchanged between buffer reads.
But llh_cat_idx can move forward, which led to a wrong check
where the reread didn't happen. As a result, Lustre got ENOENT for
a record and stopped osp processing.

llog_cat_set_first_idx())
catlog [0x6:0x1:0x0] first idx 34730, last_idx 34731
osp_sync_process_queues()) 1 changes, 0 in progress, 0 in flight
llog_process_thread())
stop processing plain 0x76941:1:0 index 64767 count 1
llog_process_thread())
index: 34731, lh_last_idx: 34730 synced_idx: 34730 lgh_last_idx: 34731
llog_cat_process_common()) processing log [0x2780f:0x1:0x0]:0
at index 34731 of catalog [0x6:0x1:0x0]
llog_cat_id2handle()) snx11281-OST0001-osc-MDT0001:
error opening log id [0x2780f:0x1:0x0]:0:rc = -2

The patch fixes logic and also adds/changes debugging for
llog and osp.

Cray-bug-id: LUS-8193
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I9463223a1ea904b96643b19e1580f5894142c12b
Reviewed-on: https://review.whamcloud.com/37102
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
4 years agoLU-9679 libcfs: Add CFS_ALLOC_PTR_ARRAY and free 75/36975/6
Mr NeilBrown [Thu, 14 Nov 2019 02:59:39 +0000 (13:59 +1100)]
LU-9679 libcfs: Add CFS_ALLOC_PTR_ARRAY and free

Following the pattern of CFS_ALLOC_PTR() and the kernel
pattern of kmalloc_array(), add
  CFS_ALLOC_PTR_ARRAY()
and
  CFS_FREE_PTR_ARRAY()

which allocate and free arrays given a pointer to
the array and a number of elements.

This makes code easier to read and could be a step
toward using the kernel's hardened alloc_array interfaces,
which ensure the multiplication doesn't overflow.
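A userspace sketch of the idea: the element size comes from the pointer's type, and counts whose byte size would overflow are refused, as the kernel's kmalloc_array() does. `ALLOC_PTR_ARRAY`, `FREE_PTR_ARRAY` and `alloc_array_checked()` are hypothetical analogues, not the libcfs macros. The unused count argument of the free macro only mirrors the shape of the API described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Refuse counts whose total byte size would overflow size_t. */
static void *alloc_array_checked(size_t count, size_t size)
{
	if (size != 0 && count > SIZE_MAX / size)
		return NULL;		/* count * size would overflow */
	return calloc(count, size);	/* zeroed memory */
}

/* Element size is taken from *ptr, so callers cannot pass the wrong one. */
#define ALLOC_PTR_ARRAY(ptr, count) \
	((ptr) = alloc_array_checked((count), sizeof(*(ptr))))
#define FREE_PTR_ARRAY(ptr, count) \
	do { (void)(count); free(ptr); (ptr) = NULL; } while (0)
```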

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I54c919bd9ce22fbc72715c3da16f8c29e7135ccc
Reviewed-on: https://review.whamcloud.com/36975
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12950 osd-ldiskfs: increase supported size to 1024tb 05/36705/6
Artem Blagodarenko [Thu, 7 Nov 2019 15:56:26 +0000 (18:56 +0300)]
LU-12950 osd-ldiskfs: increase supported size to 1024tb

Currently, attempts to create an ldiskfs file system larger than 512TB
fail with the message:

    LDISKFS-fs does not support file systems greater than 512TB
    and can cause data corruption. Use "force_over_512tb" mount
    option to override.

Change the "force_over_512tb" mount option to "force_over_1024tb", as
testing of these large filesystems has not shown any serious
functional problems (though there are performance issues to address).

Test-Parameters: trivial fstype=ldiskfs testlist=conf-sanity
Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Cray-bug-id: lus-6815
Change-Id: I0f84fe85aaab05ab0bafa5f0e6074e1690d79899
Reviewed-on: https://review.whamcloud.com/36705
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
4 years agoLU-13229 ldlm: unlock request memory leak 70/37670/2
Alexey Lyashkov [Fri, 21 Feb 2020 15:36:42 +0000 (18:36 +0300)]
LU-13229 ldlm: unlock request memory leak

Resending an unlock request can cause a memory leak, as a new request
is allocated on resend. Finish the old request before resending.
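The leak and its fix can be sketched with a live-allocation counter: because each resend allocates a fresh request, the previous one must be finished first. `struct request`, `request_alloc()`, `request_finish()`, `send_unlock()` and `unlock_with_resend()` are hypothetical, and `failures` simply simulates how many sends need a resend.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request object and a count of live allocations. */
struct request { int attempt; };
static int live_requests;

static struct request *request_alloc(int attempt)
{
	struct request *req = malloc(sizeof(*req));

	if (req) {
		req->attempt = attempt;
		live_requests++;
	}
	return req;
}

static void request_finish(struct request *req)
{
	free(req);
	live_requests--;
}

/* Pretend the first 'failures' sends need a resend. */
static int send_unlock(const struct request *req, int failures)
{
	return req->attempt < failures ? -1 : 0;
}

/* Every attempt allocates a new request, so finish each one first. */
static int unlock_with_resend(int failures)
{
	for (int attempt = 0;; attempt++) {
		struct request *req = request_alloc(attempt);
		int rc;

		if (!req)
			return -1;
		rc = send_unlock(req, failures);
		request_finish(req);	/* the fix: release before resending */
		if (rc == 0)
			return 0;
	}
}
```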

Fixes: 85a12c6c8d7 ("LU-12828 ldlm: FLOCK request can be processed twice")
Cray-bug-id: LUS-8447
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: I15dfe0388d1305c8eb5be18b19b0ffc687454ef1
Reviewed-on: https://review.whamcloud.com/37670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13296 obd: make statfs cache working again 53/37753/3
Alexey Lyashkov [Thu, 27 Feb 2020 14:48:48 +0000 (17:48 +0300)]
LU-13296 obd: make statfs cache working again

When statfs calls race on the mutex, read the cached data instead
of garbage.
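A minimal sketch of the intended behaviour, assuming one caller sends the RPC while others wait on the mutex: a caller that finds a fresh cache after acquiring the mutex must copy the cached result into its own buffer instead of returning a buffer that was never filled. `struct statfs_sketch`, `send_statfs_rpc()` and `get_statfs()` are hypothetical, and the returned values are made up.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical statfs result and shared cache. */
struct statfs_sketch { unsigned long long blocks, bfree; };

static pthread_mutex_t statfs_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct statfs_sketch cached;
static bool cache_valid;
static time_t cached_at;
static int rpcs_sent;

static void send_statfs_rpc(struct statfs_sketch *out)
{
	rpcs_sent++;
	out->blocks = 1000;		/* made-up values */
	out->bfree = 400;
}

/* Serve from the cache when it is fresh; otherwise refresh it. */
static void get_statfs(struct statfs_sketch *out, time_t max_age)
{
	time_t now = time(NULL);

	pthread_mutex_lock(&statfs_mutex);
	if (!cache_valid || now - cached_at > max_age) {
		send_statfs_rpc(&cached);
		cached_at = now;
		cache_valid = true;
	}
	*out = cached;			/* always hand back the cached copy */
	pthread_mutex_unlock(&statfs_mutex);
}
```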

Test-Parameters: testlist=sanity envdefinitions=ONLY=423,ONLY_REPEAT=500
Fixes: 1c41a6ac390b ("LU-12368 obdclass: don't send multiple statfs RPCs")
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: I268782875c30c078f239c194f69cdf7506d66169
Reviewed-on: https://review.whamcloud.com/37753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
4 years agoLU-13209 build: Fix vvp_account_page_dirtied 78/37778/3
Shaun Tancheff [Tue, 3 Mar 2020 01:17:18 +0000 (19:17 -0600)]
LU-13209 build: Fix vvp_account_page_dirtied

Fix compile error: vvp_account_page_dirtied undefined.

Fixes: 788e464a72 ("LU-13288 llite: Find account_page_dirtied on module init")
Cray-bug-id: LUS-8472
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iae0a3af1b091165423a2bb24a9159a7af8ab3cbe
Reviewed-on: https://review.whamcloud.com/37778
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13241 utils: use libext2fs for ldiskfs operations 56/37656/7
Li Dongyang [Fri, 21 Feb 2020 02:36:26 +0000 (13:36 +1100)]
LU-13241 utils: use libext2fs for ldiskfs operations

Instead of using debugfs to stat and read the CONFIGS/mountdata,
we can switch to libext2fs and control the flags used when
opening the ldiskfs, to reduce the mount time on huge targets.

Change-Id: I9da8fc1c77d149fe5cf3bd19b0ff76d892620101
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/37656
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13314 test: add 56ob to ALWAYS_EXCEPT for now 67/37767/3
Wang Shilong [Mon, 2 Mar 2020 04:55:27 +0000 (12:55 +0800)]
LU-13314 test: add 56ob to ALWAYS_EXCEPT for now

'lfs find' calculates one year as 365 days, which is a problem
in leap years.

The test or the code needs to be fixed; until then, disable this
test so that other patches can land.

Change-Id: I79d34ce29657b4d149a0fbe82220cfefa0c60378
Test-Parameters: trivial testlist=sanity envdefinitions=ONLY=56ob
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37767
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12988 ldiskfs: mballoc to prefetch groups 33/37633/3
Alex Zhuravlev [Mon, 2 Dec 2019 08:23:30 +0000 (11:23 +0300)]
LU-12988 ldiskfs: mballoc to prefetch groups

Prefetch groups ahead of scanning. Prefetching is done in 8 * flex_bg
groups, so it should be 8 read-ahead reads for a single allocating
thread. At the end of allocation the allocating thread waits for
read-ahead completion and initializes buddy information so that
read-aheads are not lost in case of memory pressure.
At cr=0 the number of prefetching IOs is limited per allocation
context to prevent a situation where mballoc loads thousands of
bitmaps looking for a perfect group and ignores groups with
good chunks.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ibce9a060900544abe994f4d7a20ac9c2979ccc56
Reviewed-on: https://review.whamcloud.com/37633
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
4 years agoLU-12043 llite: move tunable params to sysfs_memparse() 49/34849/16
Andreas Dilger [Fri, 10 May 2019 21:09:41 +0000 (15:09 -0600)]
LU-12043 llite: move tunable params to sysfs_memparse()

Move the max_read_ahead_* tunables from debugfs to sysfs, since
they follow the one-value-per-file rule and should be visible to
regular users.

Rename the functions and constants from *readahead* to *read_ahead*
or *READ_AHEAD* to match the tunable names from procfs.

Deprecate usage of llprocfs_str_with_units_to_s64(), lu_str_to_s64(),
llprocfs_str_with_units_to_u64(), and lu_str_to_u64(), and instead
use sysfs_memparse() to parse sizes in the few remaining places
where they are used.  A separate patch will remove those functions.
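
For readers unfamiliar with size parsing of this kind, the sketch below is a userspace model of the kernel's memparse(), which accepts a number followed by an optional binary suffix (K, M, G, T, P, E, either case). It is not sysfs_memparse() itself, whose exact signature is not shown here; demo_memparse() is a hypothetical name used only for illustration.

```c
#include <stdlib.h>

/*
 * Userspace sketch modeled on the kernel's memparse(): parse a number
 * (decimal, hex or octal, as strtoull() base 0 allows) followed by an
 * optional binary suffix. Each suffix step multiplies by 1024.
 * Hypothetical helper, not Lustre's sysfs_memparse().
 */
static unsigned long long demo_memparse(const char *s, char **end)
{
	char *p;
	unsigned long long val = strtoull(s, &p, 0);
	int shift = 0;

	switch (*p) {
	case 'E': case 'e':
		shift += 10;
		/* fall through */
	case 'P': case 'p':
		shift += 10;
		/* fall through */
	case 'T': case 't':
		shift += 10;
		/* fall through */
	case 'G': case 'g':
		shift += 10;
		/* fall through */
	case 'M': case 'm':
		shift += 10;
		/* fall through */
	case 'K': case 'k':
		shift += 10;
		p++;		/* consume the suffix character */
		break;
	default:
		break;		/* no suffix: plain byte count */
	}
	if (end != NULL)
		*end = p;	/* first unparsed character */
	return val << shift;
}
```

With this model, writing "64M" to a tunable would yield 64 << 20 bytes, and a bare "100" stays 100 bytes.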

Minor fix to the "lctl set_param" man page.

Fixes: adb5aca3d673 ("LU-8066 llite: Move all remaining procfs entries to debugfs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2cdf5f8f0aeca458ed1989366102c33ae83ebbe5
Reviewed-on: https://review.whamcloud.com/34849
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13096 llite: fix potential overflow in ll_max_cached_mb_seq_write() 07/37707/4
Wang Shilong [Tue, 25 Feb 2020 07:58:55 +0000 (15:58 +0800)]
LU-13096 llite: fix potential overflow in ll_max_cached_mb_seq_write()

atomic_long_cmpxchg() returns a long, but @rc is an int. With a large
amount of memory (e.g. 24T) the returned value overflows @rc, so @diff
never changes and the loop hangs.
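
A compare-and-exchange returns the old value, and the caller retries until that value equals what it expected. The hypothetical sketch below, assuming an LP64 target (64-bit long, 32-bit int), shows why storing the result in an int makes that comparison fail forever for large values. If the counter holds 4 KiB pages (an assumption here), 24 TiB is 24 * 2^28 = 6,442,450,944 pages, which is larger than INT_MAX (2,147,483,647).

```c
/*
 * Hypothetical illustration of the LU-13096 overflow on an LP64 target.
 * 'returned' plays the role of atomic_long_cmpxchg()'s result.
 */

/* Buggy pattern: the long result is stored in an int before comparing. */
static int cmpxchg_matched_int_rc(long expected, long returned)
{
	int rc = returned;	/* truncates values above INT_MAX */

	return rc == expected;	/* rc is widened back, but high bits are lost */
}

/* Fixed pattern: keep the result in a long. */
static int cmpxchg_matched_long_rc(long expected, long returned)
{
	long rc = returned;

	return rc == expected;
}
```

Because the int version never reports a match for such values, a retry loop built on it never terminates, which matches the hang described in the commit.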

Test-Parameters: trivial
Fixes: adb5aca3d673 ("LU-8066 llite: Move all remaining procfs entries to debugfs")
Change-Id: I20d6feff9797ba10a089587bee0d8691bee460df
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37707
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13283 tests: add racer to nonmpi load 71/37671/5
Elena Gryaznova [Fri, 21 Feb 2020 15:40:38 +0000 (18:40 +0300)]
LU-13283 tests: add racer to nonmpi load

This patch adds the ability to run racer as one of the
non-MPI loads. All racer parameters can be set via
RACERP, for example:
  RACERP="MDSCOUNT=3 DURATION=600 RACER_ENABLE_STRIPED_DIRS=false"

ha_start_mpi_loads() is improved to not create
machinefiles if no MPI load is set.

Test-Parameters: trivial
Cray-bug-id: LUS-8297
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I974838ce199897284396ef53d1a487d1a1ae1774
Reviewed-on: https://review.whamcloud.com/37671
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13281 tests: ha.sh improvements 66/37666/2
Elena Gryaznova [Fri, 21 Feb 2020 14:28:55 +0000 (17:28 +0300)]
LU-13281 tests: ha.sh improvements

This patch adds the possibility to set different layouts for the
client directories. This allows running the client loads on
directories with DoM, PFL, SEL, etc. layouts, including the simple
layouts set on old clients, which is useful for interoperation
testing.

This patch also adds the possibility to split the full client list
into separate subsets by specifying NCLIENTSSET (1 by default). For
NCLIENTSSET=2 two subsets are created, and each one is passed to the
corresponding machinefile. This allows splitting the loads, which is
also useful for interoperation: old clients can operate on
directories with old layouts while the new clients operate on
directories with new DoM, PFL, SEL, etc. layouts.

For example, with NCLIENTSSET=2, ${#ha_clients[@]}=4 and
CLIENTSSTRIPE='"-E $((64*1024)) -L mdt -E EOF" "-c -1 "',
the layout "-E $((64*1024)) -L mdt -E EOF" is applied to the
even clients ${ha_clients[0]} and ${ha_clients[2]}, and the
layout "-c -1 " is applied to the odd clients ${ha_clients[1]}
and ${ha_clients[3]}.
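
The even/odd example generalizes naturally to a round-robin split. The sketch below is a hypothetical C model of that assignment (ha.sh itself is a shell script, and treating it as "client i goes to subset i % NCLIENTSSET" is an assumption that reproduces the NCLIENTSSET=2 example); client_subset() and client_layout() are illustrative names only.

```c
#include <stddef.h>

/*
 * Hypothetical model of splitting ha_clients[] into NCLIENTSSET subsets:
 * client i goes to subset i % nsets, giving the even/odd split for
 * NCLIENTSSET=2.
 */
static size_t client_subset(size_t client_index, size_t nsets)
{
	if (nsets == 0)		/* treat unset as the default of one subset */
		nsets = 1;
	return client_index % nsets;
}

/* Pick the CLIENTSSTRIPE layout string used for a given client. */
static const char *client_layout(size_t client_index,
				 const char *const layouts[], size_t nsets)
{
	return layouts[client_subset(client_index, nsets)];
}
```

With two layouts and four clients, clients 0 and 2 get the first layout and clients 1 and 3 get the second.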

Test-Parameters: trivial
Cray-bug-id: LUS-6906
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: Ia6f8c07e3936ea773706c3e79217672c68d77bd0
Reviewed-on: https://review.whamcloud.com/37666
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13219 tests: add nfs-server service in setup-nfs.sh 63/37663/2
Jian Yu [Fri, 21 Feb 2020 08:50:38 +0000 (00:50 -0800)]
LU-13219 tests: add nfs-server service in setup-nfs.sh

For RHEL 8.1, the NFS server service is nfs-server
instead of nfs.

Test-Parameters: trivial \
clientdistro=el8.1 serverdistro=el8.1 \
testlist=parallel-scale-nfsv3,parallel-scale-nfsv4

Change-Id: I5a97c8fe419187412dfc02047ed66141f567d7fb
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37663
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13270 tests: dom-performance fixes 44/37644/3
Elena Gryaznova [Thu, 20 Feb 2020 13:29:24 +0000 (16:29 +0300)]
LU-13270 tests: dom-performance fixes

This patch makes the fs cleanup at the start of dom-performance
optional, depending on the value of the test-framework global
DO_CLEANUP option. For DO_CLEANUP=false the script removes only the
directories created by the previous dom-performance session.
Without this fix, running the test a second time fails with:
  lfs setdirstripe: dirstripe error on '/mnt/testfs/dp_dne':
      stripe already set
  lfs setdirstripe: cannot create stripe dir '/mnt/testfs/dp_dne':
      File exists

This patch removes the check for the files createmany, statmany
and smalliomany, which are part of lustre/tests. With the
existing check, test_smallio() is always skipped if run with
PWD != lustre/tests.

This patch removes the comparison with MDSSIZE and adds a
comparison with the real MDS size. Without this fix the test
works incorrectly if run on an existing fs (formatall was not
done during this session, so Lustre was created with an MDS size
that differs from MDSSIZE).

Add skip_env() to report missing mdtest/dbench/IOR/etc.
Without this fix the tests are skipped silently.

Test-Parameters: trivial mdssizegb=20 testlist=dom-performance
Cray-bug-id: LUS-7349
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I5fdf1ad480edb17598cbe427bc550396ebe97808
Reviewed-on: https://review.whamcloud.com/37644
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13269 tests: make lnet-selftest.sh more flexible 40/37640/2
Elena Gryaznova [Thu, 20 Feb 2020 12:50:32 +0000 (15:50 +0300)]
LU-13269 tests: make lnet-selftest.sh more flexible

lnet-selftest.sh adds both LST tests, from client to server and
from server to client, but sometimes we are interested in executing
the tests only from the server or only from the client.

This patch allows setting the required "from" direction via the
lst_FROM variable. The following values are used:
lst_FROM=c : add the tests from clients to server only
lst_FROM=s : add the tests from server to clients only
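
The lst_FROM rule can be modeled as a small mapping from the variable's value to the set of test directions. The sketch below is a hypothetical C model (lnet-selftest.sh itself is shell); struct lst_dirs and lst_directions() are illustrative names, and treating any value other than "c" or "s" (including unset) as "both directions" is an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* Which LST test directions to add. */
struct lst_dirs {
	bool client_to_server;
	bool server_to_client;
};

/*
 * Hypothetical model of lst_FROM handling: "c" keeps only client->server
 * tests, "s" keeps only server->client tests, and anything else
 * (including unset) keeps both, as before the patch.
 */
static struct lst_dirs lst_directions(const char *lst_from)
{
	struct lst_dirs d = { .client_to_server = true,
			      .server_to_client = true };

	if (lst_from == NULL)
		return d;
	if (strcmp(lst_from, "c") == 0)
		d.server_to_client = false;
	else if (strcmp(lst_from, "s") == 0)
		d.client_to_server = false;
	return d;
}
```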

Test-Parameters: testlist=lnet-selftest envdefinitions=lst_FROM=c
Cray-bug-id: LUS-8178
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I0111a4386b3d8acb022db11f11a0be970864696d
Reviewed-on: https://review.whamcloud.com/37640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13268 tests: customize lnet-selftest for performance 39/37639/2
Elena Gryaznova [Thu, 20 Feb 2020 12:32:54 +0000 (15:32 +0300)]
LU-13268 tests: customize lnet-selftest for performance

Sometimes we need to run read/write/ping tests separately
with different lst check flags.

This patch allows:
  setting the required list of tests via the new lst_TESTS variable;
  setting the lst check to LST_BRW_CHECK_NONE, LST_BRW_CHECK_FULL
      or LST_BRW_CHECK_SIMPLE.

Test-Parameters: trivial testlist=lnet-selftest
Cray-bug-id: LUS-8005
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I502028f645f39f391c19b2028fc6ba5b7bdb1a96
Reviewed-on: https://review.whamcloud.com/37639
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13267 tests: improve racer cleanup 38/37638/2
Elena Gryaznova [Thu, 20 Feb 2020 10:06:24 +0000 (13:06 +0300)]
LU-13267 tests: improve racer cleanup

Add the RACER_MAX_CLEANUP_WAIT parameter to specify a
timeout for racer cleanup, to avoid waiting a long time in case
racer processes become unkillable.

The loop in racer_cleanup() contained an inaccuracy which made
the loop sleep less than it was supposed to. Fix it.

Test-Parameters: trivial testlist=racer
Cray-bug-id: LUS-8498
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I32dac8bc11ef2041a1a580054c2782780bb5980e
Reviewed-on: https://review.whamcloud.com/37638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13227 LDLM: update LVB if it is server lock 11/37611/6
Wang Shilong [Wed, 19 Feb 2020 08:29:16 +0000 (16:29 +0800)]
LU-13227 LDLM: update LVB if it is server lock

ldlm_glimpse_ast() is registered for server locks, which means that
when a client sends a glimpse request, it just returns a special error
for this lock. The local object may have had its size extended under
this PW lock, so we should try to update the LVB upon this error.

Originally, ldlm_cb_interpret() had code to handle this error, but it
only handled some client race cases; it doesn't cover server lock
cases, especially after lockless IO was turned on for DIO.

Fixes: 6bce536725 ("LU-4198 clio: turn on lockless for some kind of IO")
Change-Id: Ic84fd19d9eaf7f8245b8f7a2165ee5913849ac01
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37611
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13160 tests: fix sanity-hsm monitor setup 95/37595/2
Li Dongyang [Mon, 17 Feb 2020 02:53:16 +0000 (13:53 +1100)]
LU-13160 tests: fix sanity-hsm monitor setup

On RHEL8, even though we are using pdsh -R ssh,
ssh still waits for the remote cat process
to finish.
Use a subshell to avoid the timeout.

Change-Id: Id5b8d492b5ce9a235da73448ade475ade145bbed
Test-Parameters: trivial clientdistro=el8.1 testlist=sanity-hsm
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/37595
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13211 ldiskfs: rework data-in-dirent for linux 5.4.7+ 67/37467/6
Shaun Tancheff [Thu, 20 Feb 2020 00:41:26 +0000 (18:41 -0600)]
LU-13211 ldiskfs: rework data-in-dirent for linux 5.4.7+

Linux commit v5.4-rc3-92-g109ba779d6cc
ext4: check for directory entries too close to block end

This impacts ext4-data-in-dirent.patch due to its usage
of EXT4_DIR_REC_LEN.

Invert the original patch from:
  EXT4_DIR_REC_LEN(<int>)        => __EXT4_DIR_REC_LEN(<int>)
  EXT4_DIR_REC_LEN(de->name_len) => EXT4_DIR_REC_LEN(de)
to:
  EXT4_DIR_REC_LEN(<int>)        => EXT4_DIR_REC_LEN(<int>)
  EXT4_DIR_REC_LEN(de->name_len) => EXT4_DIR_ENTRY_LEN(de)

Doing this allows the patch to apply cleanly over a wider
range of kernel releases.
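
To make the rename concrete: upstream ext4 of that era defines EXT4_DIR_REC_LEN() on a name length, rounding an 8-byte header plus the name up to a multiple of 4 bytes. The sketch below reproduces that macro and adds an entry-based EXT4_DIR_ENTRY_LEN() alongside it, so the int form keeps its upstream meaning. The struct demo_dirent type and its dirdata_len field are hypothetical stand-ins for the extra data the data-in-dirent patch stores after the name; they are not the patch's real code.

```c
/*
 * Upstream-style record length: 8-byte dirent header plus the name,
 * rounded up to a multiple of EXT4_DIR_PAD (4) bytes.
 */
#define EXT4_DIR_PAD		4
#define EXT4_DIR_ROUND		(EXT4_DIR_PAD - 1)
#define EXT4_DIR_REC_LEN(name_len) \
	(((name_len) + 8 + EXT4_DIR_ROUND) & ~EXT4_DIR_ROUND)

/* Hypothetical entry: dirdata_len stands in for data stored after the name. */
struct demo_dirent {
	unsigned char name_len;
	unsigned char dirdata_len;
};

/* Entry-based length; EXT4_DIR_REC_LEN() itself stays int-based. */
#define EXT4_DIR_ENTRY_LEN(de) \
	EXT4_DIR_REC_LEN((de)->name_len + (de)->dirdata_len)
```

Keeping the upstream macro untouched is what lets the patch apply over more kernels: only the call sites that pass an entry change.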

Test-Parameters: trivial
Cray-bug-id: LUS-8478
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I95349743f323bb150854ff4541bd2c88f01662a6
Reviewed-on: https://review.whamcloud.com/37467
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12945 lnet: Disable zero copy when running on VM 00/37300/9
Shaun Tancheff [Wed, 19 Feb 2020 16:17:06 +0000 (10:17 -0600)]
LU-12945 lnet: Disable zero copy when running on VM

When running on a hypervisor platform, zero copy buffers
may still be referenced when the write queue size is zero.

So when running on a hypervisor, push the zero copy size limit
above the max payload size of 16M.
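
Raising the threshold above the largest possible message is a simple way to disable zero copy without adding a separate on/off switch. The hypothetical sketch below models that idea; zc_min_payload(), use_zero_copy() and DEMO_MAX_PAYLOAD are illustrative names, and how the real code detects a hypervisor is not modeled.

```c
#include <stdbool.h>

#define DEMO_MAX_PAYLOAD	(16u << 20)	/* 16M max payload, per the commit */

/*
 * Hypothetical: choose the minimum message size that uses zero copy.
 * On a hypervisor the limit is pushed above the max payload, so no
 * message can ever qualify.
 */
static unsigned int zc_min_payload(bool on_hypervisor, unsigned int default_min)
{
	if (on_hypervisor)
		return DEMO_MAX_PAYLOAD + 1;
	return default_min;
}

/* A message of nob bytes uses zero copy only if it reaches the limit. */
static bool use_zero_copy(unsigned int nob, unsigned int zc_min)
{
	return nob >= zc_min;
}
```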

Use the hypervisor test added to linux v4.14-119-g79cc74155218
and provide a replacement for earlier kernels.

kernel-commit: 79cc74155218316b9a5d28577c7077b2adba8e58

Cray-bug-id: LUS-8072
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I5582a6aa8da6f48deafaf13d60cf67a09d7a7231
Reviewed-on: https://review.whamcloud.com/37300
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13110 kernel: kernel update SLES12 SP4 [4.12.14-95.45.1] 23/37123/3
Jian Yu [Fri, 14 Feb 2020 19:10:09 +0000 (11:10 -0800)]
LU-13110 kernel: kernel update SLES12 SP4 [4.12.14-95.45.1]

Update SLES12 SP4 kernel to 4.12.14-95.45.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT="103a 817"

Change-Id: I1f7024465b4b6334488b7314f1073fafa10331d6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37123
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13082 tests: enable lgss_keyring debug traces 42/37042/8
Sebastien Buisson [Mon, 16 Dec 2019 18:50:10 +0000 (03:50 +0900)]
LU-13082 tests: enable lgss_keyring debug traces

Enable lgss_keyring debug traces in test framework, and collect
systemd journal in case of test failure.
Also, if needed, dump SSK keys, keyring and nodemap definitions.

Test-Parameters: trivial
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I201d97b640045e3cbc8dcd4cd4b25e0e4e644037
Reviewed-on: https://review.whamcloud.com/37042
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-13097 tests: set fail_loc on all MDS nodes for pdir tests 45/37145/4
Sebastien Buisson [Mon, 6 Jan 2020 15:21:35 +0000 (00:21 +0900)]
LU-13097 tests: set fail_loc on all MDS nodes for pdir tests

Set fail_loc on all MDS nodes for pdir tests, not only $SINGLEMDS.

Test-Parameters: trivial testlist=sanityn,sanityn,sanityn,sanityn
Test-Parameters: testlist=sanityn,sanityn,sanityn,sanityn,sanityn
Test-Parameters: testlist=sanityn,sanityn,sanityn,sanityn,sanityn
Test-Parameters: testlist=sanityn,sanityn,sanityn,sanityn,sanityn
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I80d935eae0b06e39712abfe48a56b8ce08537926
Reviewed-on: https://review.whamcloud.com/37145
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 years agoLU-13004 ptlrpc: remove *GET*KVEC macros and fields. 72/36972/3
Mr NeilBrown [Wed, 4 Dec 2019 02:22:29 +0000 (13:22 +1100)]
LU-13004 ptlrpc: remove *GET*KVEC macros and fields.

GET_KVEC, BD_GET_KVEC, GET_ENC_KVEC, BD_GET_ENC_KVEC
are no longer used.
These are the only users of the bd_kvec field of bd_u,
so that field can be removed, and bd_u can be discarded
and its other field (bd_kiov) promoted.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I31d19f6867c907029aa8d9ceee27c5ac9c9225a1
Reviewed-on: https://review.whamcloud.com/36972
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13293 llite: don't abort readahead too aggressively 97/37697/2
Wang Shilong [Tue, 25 Feb 2020 01:49:04 +0000 (09:49 +0800)]
LU-13293 llite: don't abort readahead too aggressively

We stop readahead once we have covered @ria_end_idx_min
if lock contention happens. This can cause a problem
with small reads in SSF mode.

Even when lock contention happens, every client may still
have a large number of consecutive pages to read; if these
are 4K reads, they end up as a lot of small reads.

To fix this problem, allow readahead to continue for at least
one stripe size, which is exactly the behavior before commit
cfbeae9.

Without Patch:
Max Write: 13082.37 MiB/sec (13717.85 MB/sec)
Max Read:  854.17 MiB/sec (895.67 MB/sec)

With Patch:
Max Write: 12448.90 MiB/sec (13053.61 MB/sec)
Max Read: 23921.73 MiB/sec (25083.75 MB/sec)
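
The rule can be summarized as a stop condition with one extra requirement. The sketch below is a hypothetical model, not the llite read-ahead code: ra_should_stop() and its parameters are illustrative, and treating "covered" as cur_idx >= end_idx_min is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the LU-13293 rule: under lock contention, stop
 * read-ahead only once the minimum window (ria_end_idx_min) is covered
 * AND at least one full stripe has already been read ahead.
 */
static bool ra_should_stop(bool lock_contended, unsigned long cur_idx,
			   unsigned long end_idx_min, unsigned long ra_pages,
			   unsigned long stripe_pages)
{
	if (!lock_contended)
		return false;		/* no contention: keep reading ahead */
	if (cur_idx < end_idx_min)
		return false;		/* minimum window not covered yet */
	return ra_pages >= stripe_pages;	/* at least one stripe done */
}
```

Without the last check, a contended 4K reader would stop after a single page each time, which is the small-read pattern the numbers above show.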

Fixes: cfbeae9 ("LU-12043 llite: extend readahead locks for striped file")
Change-Id: I59963592f6dbe6babd746cd01441f4a99a8cafcb
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/37697
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13288 llite: Find account_page_dirtied on module init 86/37686/3
Shaun Tancheff [Mon, 24 Feb 2020 19:11:03 +0000 (13:11 -0600)]
LU-13288 llite: Find account_page_dirtied on module init

Kernel v5.2-5678-gac1c3e4 no longer exports account_page_dirtied.
Use kallsyms_lookup_name to find it on module init and use it as
vvp_account_page_dirtied, to avoid any performance regressions
due to symbol_get.

Test-Parameters: clientdistro=el8.1
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie27abb07ffbf9e5be67fe64601ebc62409f829fd
Reviewed-on: https://review.whamcloud.com/37686
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13291 ldiskfs: mballoc don't skip uninit-on-disk groups 87/37687/3
Alex Zhuravlev [Mon, 24 Feb 2020 03:57:24 +0000 (06:57 +0300)]
LU-13291 ldiskfs: mballoc don't skip uninit-on-disk groups

Don't skip groups that are uninitialized on disk, as they need no IO
to initialize buddy structures and are the best candidates for new
blocks.
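
The change narrows the "skip non-loaded groups" rule from the commit named in Fixes:. The sketch below is a hypothetical predicate, not the ldiskfs mballoc code: struct demo_group and group_worth_scanning() are illustrative, and returning true for every group at cr > 1 is an assumption made to keep the model small.

```c
#include <stdbool.h>

/* Hypothetical per-group state relevant to the cr=0/1 skip rule. */
struct demo_group {
	bool bitmap_loaded;	/* block bitmap already in memory */
	bool uninit_on_disk;	/* group never initialized on disk */
};

/*
 * At cr=0/1, skip groups whose bitmaps are not loaded (to avoid IO),
 * except groups that are uninitialized on disk: building their buddy
 * data needs no IO, so they are cheap and good candidates.
 */
static bool group_worth_scanning(const struct demo_group *grp, int cr)
{
	if (cr > 1)
		return true;	/* assumption: later passes scan everything */
	return grp->bitmap_loaded || grp->uninit_on_disk;
}
```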

Fixes: 6a7a700a1490 ("LU-12988 ldiskfs: skip non-loaded groups at cr=0/1")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ic3c5a238d8825024d7a0fec6a25e842b7ba1f100
Reviewed-on: https://review.whamcloud.com/37687
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13004 lustre: remove support for KVEC bulk descriptors 71/36971/3
Mr NeilBrown [Thu, 21 Nov 2019 04:05:56 +0000 (15:05 +1100)]
LU-13004 lustre: remove support for KVEC bulk descriptors

KVEC descriptors are no longer used or needed.
KIOV descriptors are sufficient for all needs.

This allows us to remove
  PTLRPC_BULK_BUF_KVEC
and
  PTLRPC_BULK_BUF_KIOV
flags - the distinction no longer exists.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic3a6ec942b60a05c7ce6c5b05659700e1399d0b9
Reviewed-on: https://review.whamcloud.com/36971
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13026 lnet: MR selection of gateway ni 13/36913/6
Amir Shehata [Fri, 7 Feb 2020 19:16:23 +0000 (11:16 -0800)]
LU-13026 lnet: MR selection of gateway ni

Multi-Rail gateways can only have 1 route pointing to them. Use
the MR selection algorithm to get the best lpni on the MR
gateway to use.

Using the selection algorithm to find the gateway NI allows us
to use the standard MR criteria: health, preference and credits.
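
A selection over these three criteria is naturally written as a lexicographic comparison. The sketch below is a hypothetical model, not the LNet code: struct demo_lpni and select_best() are illustrative names, and the order of precedence (health first, then preference, then credits) is assumed from the order in which the criteria are listed.

```c
#include <stddef.h>

/* Hypothetical view of one NI on a Multi-Rail gateway. */
struct demo_lpni {
	int health;	/* higher is healthier */
	int preferred;	/* 1 if marked preferred, else 0 */
	int credits;	/* available send credits */
};

/* Return the best NI: health, then preference, then credits. */
static const struct demo_lpni *select_best(const struct demo_lpni *ni,
					   size_t count)
{
	const struct demo_lpni *best = NULL;

	for (size_t i = 0; i < count; i++) {
		const struct demo_lpni *c = &ni[i];

		if (best == NULL || c->health > best->health ||
		    (c->health == best->health &&
		     (c->preferred > best->preferred ||
		      (c->preferred == best->preferred &&
		       c->credits > best->credits))))
			best = c;
	}
	return best;	/* NULL when count == 0 */
}
```

Ties on all three criteria keep the earlier entry, so the result is deterministic for a given list order.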

Test-parameters: trivial

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I65526c6b64a5702734949c9a583a7558614ceae2
Reviewed-on: https://review.whamcloud.com/36913
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13025 lnet: pick healthiest peer net 12/36912/6
Amir Shehata [Wed, 20 Nov 2019 03:40:34 +0000 (19:40 -0800)]
LU-13025 lnet: pick healthiest peer net

When iterating over the peer nets, select the healthiest one.
A node might be able to reach a peer over multiple nets, and therefore
the health of these peer nets must be considered.

Test-parameters: trivial

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I155888dca358627fcb63c2ed0e51114bc49a9ff1
Reviewed-on: https://review.whamcloud.com/36912
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12678 lnet: remove lnet_me_alloc/lnet_me_free 10/36910/5
Mr NeilBrown [Tue, 3 Dec 2019 23:35:02 +0000 (10:35 +1100)]
LU-12678 lnet: remove lnet_me_alloc/lnet_me_free

These functions are simple wrappers that do not improve
readability, so remove them.

Move the DEBUG messages to the places where allocation happens.  This
introduces only a tiny amount of code duplication.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ieb40f8a7547cba30b05c1e5e526c762e354f3c47
Reviewed-on: https://review.whamcloud.com/36910
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13040 lmv: Pool name string handling 08/36908/13
Shaun Tancheff [Thu, 20 Feb 2020 17:11:14 +0000 (11:11 -0600)]
LU-13040 lmv: Pool name string handling

KASAN picked up a case where pool_name may not be null
terminated.

There are a few cases where strncpy is used; however, strncpy does
not guarantee that the target string is null terminated. The
preference is to use strlcpy.
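
The difference is easy to demonstrate in userspace. In the sketch below, strncpy() into a 4-byte buffer copies "pool" with no terminator, while a strlcpy()-style copy always terminates and returns strlen(src) so truncation can be detected. demo_strlcpy() is a local, hypothetical helper because glibc only added strlcpy() in version 2.38; the kernel provides its own.

```c
#include <string.h>

/*
 * strlcpy()-style copy: copies at most size - 1 bytes, always
 * NUL-terminates when size > 0, and returns strlen(src).
 * A return value >= size means the copy was truncated.
 */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

For example, copying "pool1" into char buf[4] with strncpy() leaves buf holding 'p','o','o','l' and no NUL, whereas demo_strlcpy() leaves "poo" and returns 5.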

Cray-bug-id: LUS-8229
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I414e955b5964c24b061edd76ed9e64a8985c537d
Reviewed-on: https://review.whamcloud.com/36908
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12775 test: reorder 'tar' command options 07/36907/7
Lai Siyao [Tue, 3 Dec 2019 11:43:28 +0000 (19:43 +0800)]
LU-12775 test: reorder 'tar' command options

'tar' in RHEL8 is stricter in command option order.

Test-Parameters: trivial \
envdefinitions=ONLY="32c" \
clientdistro=el8.1 serverdistro=el8.1 \
mdscount=2 mdtcount=4 testlist=conf-sanity

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I814203808efae4a746166abd3ba08f2bc5fce8f7
Reviewed-on: https://review.whamcloud.com/36907
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12981 lnet: Check MTU accurately 96/36796/4
Alexey Lyashkov [Tue, 19 Nov 2019 14:00:42 +0000 (17:00 +0300)]
LU-12981 lnet: Check MTU accurately

The existing MTU check does not handle the KIOV/IOV cases,
and a false positive was triggered for a very large incoming
buffer.

Cray-bug-id: LUS-7948
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: Id3497e5f63470c24b2e51703fc564b02c9516aa6
Reviewed-on: https://review.whamcloud.com/36796
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12761: tests: make version_code() accept two number versions too 75/36275/7
Oleg Drokin [Mon, 23 Sep 2019 12:39:48 +0000 (08:39 -0400)]
LU-12761: tests: make version_code() accept two number versions too

There's now a user in sanity test 103a that calls version_code with
2.6.  Andreas rightfully points out that instead of fixing the caller
we can just update the code to accept this usage.
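
The idea is that missing trailing fields default to 0, so "2.6" compares equal to "2.6.0". The sketch below is a hypothetical C model of that behavior (test-framework.sh implements version_code() in shell); the packing (major << 16) | (minor << 8) | patch and the splitting on non-digit characters are assumptions for illustration, not a copy of the script.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical model of version_code(): pick up to three numeric fields
 * from a version string, skipping non-digits, and pack them as
 * (major << 16) | (minor << 8) | patch. Missing fields count as 0.
 */
static unsigned long version_code(const char *ver)
{
	unsigned long field[3] = { 0, 0, 0 };
	const char *p = ver;
	int n = 0;

	while (*p != '\0' && n < 3) {
		if (isdigit((unsigned char)*p)) {
			char *end;

			field[n++] = strtoul(p, &end, 10);
			p = end;	/* continue after the number */
		} else {
			p++;		/* skip '.', '-', letters, ... */
		}
	}
	return (field[0] << 16) | (field[1] << 8) | field[2];
}
```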

Change-Id: I5915cd08a36946c6d26f2e231aa7a820a3eef46a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36275
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 years agoLU-12780 osp: use native kthreads for opd_pre_thread 63/36263/4
Mr NeilBrown [Wed, 23 Oct 2019 00:30:50 +0000 (11:30 +1100)]
LU-12780 osp: use native kthreads for opd_pre_thread

Rather than ptlrpc_thread, use native kthreads functionality.

- provide an opt_args structure which is allocated
  and initialized before the thread is started so errors
  cannot happen in the thread.
- include a completion to synchronize startup so we can be sure
  the thread function actually runs, and so will clean up.
- use kthread_stop and kthread_should_stop to
  synchronize shutdown.

The ptlrpc_thread was not used for signaling the thread about
work-to-do, so no change is needed there.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ib0e2041da3fa4d613b17f743b18700c84a79fac2
Reviewed-on: https://review.whamcloud.com/36263
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9859 libcfs: simplify linux-prim.c 10/35410/6
NeilBrown [Thu, 12 Dec 2019 17:28:40 +0000 (12:28 -0500)]
LU-9859 libcfs: simplify linux-prim.c

cfs_block_sigs() is never used.
cfs_clear_sigpending() is only used in lustre_lib.h so move it
to that header. Based on

Linux-commit: 99c1ffc99a570c68cef906d9763edb47b316ea1a

Change-Id: Ia0d5ecb736c4107c5a7b666bda85714d6819fbca
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/35410
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10467 ptlrpc: convert waiting in ptlrpc_hr_main() 96/37696/4
James Simmons [Tue, 25 Feb 2020 13:41:05 +0000 (08:41 -0500)]
LU-10467 ptlrpc: convert waiting in ptlrpc_hr_main()

This is a basic conditional wait. Instead of using
l_wait_condition(), use the Linux-standard wait_event_idle().

Change-Id: I5c81914de003468ac20b6c65f1b6bee2d4cf6891
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37696
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9679 osc: convert a while loop to for 06/37606/2
NeilBrown [Thu, 13 Dec 2018 00:00:38 +0000 (11:00 +1100)]
LU-9679 osc: convert a while loop to for

This loop uses 'continue' in several places,
and each one is preceded by
   ext = next_extent(ext)
which also appears at the end.
This is exactly the pattern that a 'for' loop
simplifies.  So change to a for loop.
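
A minimal sketch of the pattern, using a hypothetical `struct extent` list and `next_extent()` helper rather than the real osc code. In the `while` form every `continue` must repeat the advance step; the `for` form keeps the advance in the loop header, so `continue` advances automatically. Both functions compute the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent list; not the real struct osc_extent. */
struct extent {
	int skip;		/* extents the loop should ignore */
	int pages;		/* pages covered by this extent */
	struct extent *next;
};

static struct extent *next_extent(struct extent *ext)
{
	assert(ext != NULL);
	return ext->next;
}

/* Before: each 'continue' must be preceded by the advance step. */
static int count_pages_while(struct extent *ext)
{
	int total = 0;

	while (ext != NULL) {
		if (ext->skip) {
			ext = next_extent(ext);
			continue;
		}
		total += ext->pages;
		ext = next_extent(ext);
	}
	return total;
}

/* After: the advance lives in the loop header, so 'continue' is safe. */
static int count_pages_for(struct extent *ext)
{
	int total = 0;

	for (; ext != NULL; ext = next_extent(ext)) {
		if (ext->skip)
			continue;
		total += ext->pages;
	}
	return total;
}
```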

Linux-Commit 9083b739197b ("lustre: osc: convert a while loop to for")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Ie622690134f9b3ee829255bcf997d06289abd6e6
Reviewed-on: https://review.whamcloud.com/37606
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9679 osc: centralize handling of PTLRPCD_SET 03/37603/2
NeilBrown [Thu, 20 Dec 2018 04:57:34 +0000 (15:57 +1100)]
LU-9679 osc: centralize handling of PTLRPCD_SET

Various places test if a given rqset is PTLRPCD_SET
and call either ptlrpcd_add_req() or ptlrpc_set_add_req()
depending on the result.

This can be unified by putting the test of PTLRPCD_SET in
ptlrpc_set_add_req(), and always calling that function.

This results in there being only one place that tests PTLRPCD_SET.
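
The centralization can be sketched as follows. All names here (`struct req`, `struct req_set`, `SKETCH_PTLRPCD_SET`, the `sketch_*` functions) are hypothetical stand-ins, and the sketch assumes, for illustration only, that the sentinel is a special pointer value passed in place of a real set. Callers always call the one function, and only that function tests for the sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real ptlrpc types. */
struct req { int id; };
struct req_set { int nreqs; };

/* Hypothetical sentinel meaning "hand the request to the daemon". */
#define SKETCH_PTLRPCD_SET ((struct req_set *)1)

static int ptlrpcd_queued;	/* requests handed to the "daemon" */

static void sketch_ptlrpcd_add_req(struct req *r)
{
	(void)r;
	ptlrpcd_queued++;
}

/* The single place that checks for the sentinel. */
static void sketch_set_add_req(struct req_set *set, struct req *r)
{
	assert(r != NULL);
	if (set == SKETCH_PTLRPCD_SET) {
		sketch_ptlrpcd_add_req(r);
		return;
	}
	set->nreqs++;
}
```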

Linux-Commit: 6a587cd4c705 ("lustre: centralize handling of
PTLRPCD_SET")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I879aa9ebb7e841dc2d1240a32d1c5d07e582e0b2
Reviewed-on: https://review.whamcloud.com/37603
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-9679 osc: use overlapped() consistently. 02/37602/2
NeilBrown [Wed, 12 Dec 2018 07:20:36 +0000 (18:20 +1100)]
LU-9679 osc: use overlapped() consistently.

osc_extent_is_overlapped() open-codes exactly the test that
overlapped() performs.
So use overlapped() instead, to make the code more obviously
consistent.

Linux-Commit: 270995b08634 ("lustre: osc: use overlapped()
consistently.")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I3a3ed2ee04343a294ae94f205f5d12be98f99bf3
Reviewed-on: https://review.whamcloud.com/37602
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-9679 osc: remove cl_io_cancel() 97/37597/3
NeilBrown [Mon, 17 Dec 2018 02:39:10 +0000 (13:39 +1100)]
LU-9679 osc: remove cl_io_cancel()

cl_io_cancel() is never used, so remove it along with various
other things for which it is the only user.

Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I6cf9b53aa7fc3379e57fa0ac0ea236ccda4ff6b7
Reviewed-on: https://review.whamcloud.com/37597
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9679 osc: use assert_spin_locked() 96/37596/2
NeilBrown [Wed, 12 Dec 2018 03:25:30 +0000 (14:25 +1100)]
LU-9679 osc: use assert_spin_locked()

assert_spin_locked() is preferred to spin_is_locked() for affirming
that a spinlock is locked.

__osc_extent_sanity_check() is only ever called with obj already
locked, so change the check into an assertion.

Linux-Commit: a12d8284b574 ("lustre: osc_cache: use
assert_spin_locked()")

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: Iaae6deb5af4dec4d31893749924f211ba0489c47
Reviewed-on: https://review.whamcloud.com/37596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9679 general: add missing spaces near punctuation 02/37402/5
Mr NeilBrown [Mon, 3 Feb 2020 03:24:19 +0000 (14:24 +1100)]
LU-9679 general: add missing spaces near punctuation

Many places in lustre fold a long string onto multiple lines, usually
at word breaks.  Sometimes the word-break has punctuation, such as
comma, colon, or period, but needs a space as well to be properly
readable.  Because the string is folded, the missing space isn't
immediately obvious in the code.

This patch adds those spaces after punctuation where it seems to be
needed, and joins the affected strings onto a single line, in accord
with current policy.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9e76778b1e9687bc26e85500006b4b9d9ae6f93a
Reviewed-on: https://review.whamcloud.com/37402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 years agoLU-12914 mdt: mdt_prep_ma_buf_from_rep() is called twice 11/36611/3
Bruno Faccini [Wed, 30 Oct 2019 10:38:48 +0000 (11:38 +0100)]
LU-12914 mdt: mdt_prep_ma_buf_from_rep() is called twice

In some rare cases (replay of a file open with O_LOV_DELAY_CREATE
when the object is found dead on the MDT) mdt_prep_ma_buf_from_rep()
can be called twice (in either mdt_reint_open() or mdt_open_by_fid())
while handling the same request.
So remove the assertion that checks whether LMV or LOV has already
been found and set in ma.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I78e0456ea59c37cab4276383c75c4fa5cc9f4829
Reviewed-on: https://review.whamcloud.com/36611
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11668 mdd: use mdd_object_fid() instead of mdo2fid() 47/35047/5
Andreas Dilger [Wed, 21 Nov 2018 02:32:32 +0000 (19:32 -0700)]
LU-11668 mdd: use mdd_object_fid() instead of mdo2fid()

Both mdd_object_fid() and mdo2fid() helper functions are the same.
Replace mdo2fid() with the better-named mdd_object_fid(mdd_obj)
function everywhere.  Use mdd_obj_dev_name(mdd_obj) for console
error messages instead of mdd2obd_dev(mdd)->obd_name for consistency.

It would be nice to consistently use "mdd_obj" for objects (instead of
"o" or "mo" or "obj", ...) and "mdd" for devices (instead of "m"), but
that is too big to include in this patch.  Just replace them in the
few wrapper functions already affected by this patch.

Fix up whitespace and string formatting style in affected code.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6de748bada06f0f66123e4567115deb2633ebbe5
Reviewed-on: https://review.whamcloud.com/35047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12931 gnilnd: use time_after() to compare jiffies 02/36702/5
Andreas Dilger [Thu, 7 Nov 2019 06:33:55 +0000 (23:33 -0700)]
LU-12931 gnilnd: use time_after() to compare jiffies

Fix a potential bug in gnilnd where it compares a timeout directly
against jiffies instead of using time_after() to handle jiffies wrap.
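
A userspace sketch of why the direct comparison is fragile. `sketch_time_after()` re-creates the arithmetic of the kernel's `time_after(a, b)` (the kernel version additionally type-checks its arguments); the helper functions are hypothetical. When a deadline computed as `now + timeout` wraps past zero, a plain `>` reports it as already expired, while the signed difference does not.

```c
#include <assert.h>
#include <limits.h>

/* Same arithmetic as the kernel's time_after(a, b): "a is after b". */
#define sketch_time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical helper: wrong once the counter wraps. */
static int naive_expired(unsigned long now, unsigned long deadline)
{
	return now > deadline;
}

/* Hypothetical helper: wrap-safe via the signed difference. */
static int safe_expired(unsigned long now, unsigned long deadline)
{
	return sketch_time_after(now, deadline);
}
```

The signed-difference trick stays correct as long as the two values are less than half the counter range apart.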

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie4d190e9c04e807f2152b71dc28ef0b0463ebbe5
Reviewed-on: https://review.whamcloud.com/36702
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13264 osc: ensure lu_ref work atomic from osc_lock_upcall() 29/37629/2
Bruno Faccini [Wed, 19 Feb 2020 16:25:26 +0000 (17:25 +0100)]
LU-13264 osc: ensure lu_ref work atomic from osc_lock_upcall()

Since osc_lock_upcall() uses a per-cpu env via
cl_env_percpu_[get,put](), all underlying work must execute on the
same CPU, meaning that no sleeping/rescheduling may occur.
This means lu_ref related work must no longer use lu_ref_add(),
which calls might_sleep() (likely to cause a reschedule or CPU
switch), but lu_ref_add_atomic() instead.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ide33d4c415e9e382f0bc344e2114182a1f122de6
Reviewed-on: https://review.whamcloud.com/37629
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13263 osc: use LDLM_LOCK_RELEASE() if no lu_ref added 25/37625/2
Bruno Faccini [Wed, 19 Feb 2020 14:39:19 +0000 (15:39 +0100)]
LU-13263 osc: use LDLM_LOCK_RELEASE() if no lu_ref added

In osc_ldlm_glimpse_ast(), LDLM_LOCK_PUT() is used after
LDLM_LOCK_GET() when no lu_ref has been added.
This causes an LBUG when USE_LU_REF is configured, so
change LDLM_LOCK_PUT() to LDLM_LOCK_RELEASE().

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Id522a02878f01ae565e6c2418fe8cd85c945dde9
Reviewed-on: https://review.whamcloud.com/37625
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13258 libcfs: make apply_workqueue_attrs() available for Lustre 13/37613/2
James Simmons [Mon, 17 Feb 2020 19:34:41 +0000 (14:34 -0500)]
LU-13258 libcfs: make apply_workqueue_attrs() available for Lustre

Currently Lustre work queues can run on any core, which introduces
noise on the system. The Linux kernel has a function called
apply_workqueue_attrs() which allows you to control which cores
a work queue will execute on. Manually export this function so
Lustre can use it.

Test-Parameters: trivial

Change-Id: I467539cb8def7029fa9dfff2386234de5e0fe133
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37613
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13004 target: take offset into account in tgt_send_buffer 71/37571/4
Mikhail Pershin [Fri, 14 Feb 2020 09:59:05 +0000 (12:59 +0300)]
LU-13004 target: take offset into account in tgt_send_buffer

While calculating the number of pages needed, take the buffer offset
into account, because the buffer can be non-aligned after allocations
in out_read().
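
A sketch of the arithmetic, assuming a 4096-byte page and a hypothetical `pages_spanned()` helper (not the actual tgt_send_buffer() code). Rounding up `len` alone undercounts when the buffer starts partway into a page, so the offset must be added before rounding.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/* Hypothetical helper: pages touched by 'len' bytes that start
 * 'offset' bytes into the first page. */
static size_t pages_spanned(size_t offset, size_t len)
{
	assert(offset < SKETCH_PAGE_SIZE);
	if (len == 0)
		return 0;
	return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

For example, 4096 bytes starting at offset 100 touch two pages, while `DIV_ROUND_UP(4096, 4096)` would give one.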

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib7c9b35d328d366a27cc553ffe2f2c5930949cf4
Reviewed-on: https://review.whamcloud.com/37571
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12251 tests: stop running sanity-flr for PPC 63/37563/3
James Nunez [Thu, 13 Feb 2020 20:29:50 +0000 (13:29 -0700)]
LU-12251 tests: stop running sanity-flr for PPC

Stop running all sanity-flr tests for PPC client
testing until we understand the failures and the sanity-pfl
test suite passes all testing for PPC clients.

Test-Parameters: trivial clientarch=ppc64 testlist=sanity-flr
Test-Parameters: testlist=sanity-flr

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Iee044e6995ed4f6cca5f6b7f92eee6b59cb7018b
Reviewed-on: https://review.whamcloud.com/37563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12198 libcfs: always copy ioctl header back to user 59/37559/3
Dominique Martinet [Thu, 13 Feb 2020 13:36:32 +0000 (13:36 +0000)]
LU-12198 libcfs: always copy ioctl header back to user

lnetctl_get_peer_list fills in the required size in the header if
the given buffer was too small. Userspace needs this information to
grow the buffer and try again, so the header must always be copied back.

Note that we only replace err on failure if err was not previously set.
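
A minimal userspace sketch of the grow-and-retry protocol this relies on. Everything here is hypothetical: `struct sketch_hdr` is not the real libcfs ioctl header, `sketch_get_list()` stands in for the kernel side, and `-E2BIG` is only an illustrative error code. The retry works only because the required size written into the header reaches the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ioctl-style header; not the real libcfs layout. */
struct sketch_hdr {
	size_t buf_len;		/* in: buffer size; out: size required */
};

/* Hypothetical stand-in for the kernel side of the call. */
static int sketch_get_list(struct sketch_hdr *hdr, char *buf)
{
	static const char data[] = "peer0 peer1 peer2";

	assert(hdr != NULL && buf != NULL);
	if (hdr->buf_len < sizeof(data)) {
		hdr->buf_len = sizeof(data);	/* must be copied back */
		return -E2BIG;
	}
	memcpy(buf, data, sizeof(data));
	return 0;
}

/* Userspace: start small, grow to the reported size, try again. */
static char *sketch_fetch_list(void)
{
	struct sketch_hdr hdr = { .buf_len = 4 };
	char *buf = malloc(hdr.buf_len);

	while (buf != NULL) {
		char *bigger;
		int rc = sketch_get_list(&hdr, buf);

		if (rc == 0)
			return buf;
		if (rc != -E2BIG)
			break;
		bigger = realloc(buf, hdr.buf_len);
		if (bigger == NULL)
			break;
		buf = bigger;
	}
	free(buf);
	return NULL;
}
```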

Fixes: fba98579efc4 ("LU-6202 libcfs: replace libcfs_register_ioctl with a blocking notifier_chain")
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Change-Id: I2b6e319aceeb00d488572053d27023891afe1928
Reviewed-on: https://review.whamcloud.com/37559
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13225 utils: bash completion for lfs and lctl 83/37483/12
Andreas Dilger [Sat, 8 Feb 2020 08:25:29 +0000 (01:25 -0700)]
LU-13225 utils: bash completion for lfs and lctl

Add a bash completion for "lfs" and improve completion for "lctl".
Rename the "lctl" completion script to "lustre" since the two
commands share helper routines for fsnames, pools, etc. and install
"lfs" and "lctl" symlinks to the common command file.

The completion prints available sub-commands and their options,
and for some sub-commands it completes available arguments
(e.g. mount points, pool names, and MDT/OST names).

A couple of minor changes to "lfs" and "lctl" usage messages to make
the sub-command options easier to parse.  More needs to be done to
make all sub-commands have proper long options.

There is definitely a lot more that could be added to the completions,
but this is a good start and provides a framework for the future.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie989b2ef4c0d6d8565e5c6753205bb6ed83ebbe5
Reviewed-on: https://review.whamcloud.com/37483
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Dominique Martinet <dominique.martinet@cea.fr>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13133 tests: sanity-selinux test_21{a,b} sepol update 24/37224/3
Sebastien Buisson [Tue, 14 Jan 2020 11:51:55 +0000 (20:51 +0900)]
LU-13133 tests: sanity-selinux test_21{a,b} sepol update

We need to make sure the MDS receives updated sepol info from the MGS.
In the case of a combined MGT/MDT, directly setting fileset on the node
will mask the llog-based info retrieval mechanism. So always use
'lctl set_param -P' to set the sepol value.

Test-Parameters: trivial
Test-Parameters: clientselinux testlist=sanity-selinux
Test-Parameters: clientselinux testlist=sanity-selinux
Test-Parameters: clientselinux testlist=sanity-selinux
Test-Parameters: clientselinux testlist=sanity-selinux
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaf8ff13364b9ba5f5d8b733be0247d79e05a6b3d
Reviewed-on: https://review.whamcloud.com/37224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13071 lnet: reduce log severity for health events 02/37002/2
Amir Shehata [Thu, 12 Dec 2019 19:19:48 +0000 (11:19 -0800)]
LU-13071 lnet: reduce log severity for health events

There is no need to print an error when the health of an interface
is reduced, so change it to debug level.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia60ade12efab732ea4b0388a3803976bf65938ab
Reviewed-on: https://review.whamcloud.com/37002
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-13004 osp: use correct page count in osp_prep_update_req 87/37587/3
Mr NeilBrown [Thu, 21 Nov 2019 03:53:59 +0000 (14:53 +1100)]
LU-13004 osp: use correct page count in osp_prep_update_req

A fix that went into patchset 3 of
 https://review.whamcloud.com/#/c/36828/3
disappeared in patchset 5.
We should restore it.

Specifically, 'page_count' should be a count of pages,
but it is currently a count of the bytes in all the pages.

Fixes: f32fbf189fab ("LU-13004 osp: use KIOV in osp_prep_update_req")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic8dcdac414d16b4f2c1c6e0367d496de7e0a8cff
Reviewed-on: https://review.whamcloud.com/37587
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12861 libcfs: Cleanup libcfs_debug_msg use of snprintf 01/36901/8
Shaun Tancheff [Fri, 31 Jan 2020 20:09:34 +0000 (14:09 -0600)]
LU-12861 libcfs: Cleanup libcfs_debug_msg use of snprintf

scnprintf returns the number of bytes actually written to the buffer
(excluding the trailing NUL).  snprintf returns the length the full
output would have had, which can exceed the size of the buffer.

Prefer scnprintf when result is being used as the number
of bytes in a buffer.

Use snprintf when the result is used for sizing or resizing
a buffer.
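
The difference matters whenever the return value is used to advance a write position. Userspace C has no `scnprintf()`, so `sketch_scnprintf()` below is a hypothetical imitation built on the standard `vsnprintf()`. It clamps the result to what was actually stored, which is the behaviour documented for the kernel's `scnprintf()`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical userspace imitation of the kernel's scnprintf():
 * returns the number of characters stored, excluding the NUL. */
static int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(fmt != NULL);
	if (size == 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	/* Clamp: at most size - 1 characters fit before the NUL. */
	return (size_t)n >= size ? (int)(size - 1) : n;
}
```

For example, formatting "0123456789" into an 8-byte buffer, `snprintf()` returns 10, which would step past the end if added to a write offset, while the clamped count is 7.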

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8c4b8f7dcc081f8b9dfffc35059011172be2e091
Reviewed-on: https://review.whamcloud.com/36901
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12191 utils: Make "lctl list_param" read exact path under sysfs tree 52/36852/5
Sonia Sharma [Tue, 4 Feb 2020 18:34:02 +0000 (13:34 -0500)]
LU-12191 utils: Make "lctl list_param" read exact path under sysfs tree

"lctl list_param -R" currently checks for the param_name
in the path and reads the sysfs tree under that. But it can
give erroneous results, as in the following example:

For a path like /sys/fs/lnet/net/o2ib1/ib0, the command
"lctl list_param -R" doesn't go down the "net" tree
because it matches "net" within "lnet" and just stops
there.

This patch changes how param_name is checked for
in the path. As in the above example, instead
of checking for "net", it should check for
"/net". This change is made in
param_display() in lustre/utils/lustre_cfg.c.
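
A simplified illustration of the matching change, using a hypothetical `path_has_component()` helper rather than the real param_display() logic. Prefixing the name with '/' stops "net" from matching inside "lnet". Like the check described above, it only anchors the start of a component, so "/net" would still match a component such as "/network".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does 'param' occur in 'path' right after '/'?
 * A simplified illustration, not the real param_display() code. */
static int path_has_component(const char *path, const char *param)
{
	char needle[256];
	int n = snprintf(needle, sizeof(needle), "/%s", param);

	assert(n > 0 && (size_t)n < sizeof(needle));
	return strstr(path, needle) != NULL;
}
```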

Change-Id: Ieb3ad0d1248eee2192246ff5c4d77a71d87dc446
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36852
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13005 lnet: eq: discard struct lnet_handle_eq 41/36841/5
Mr NeilBrown [Wed, 20 Nov 2019 00:16:34 +0000 (11:16 +1100)]
LU-13005 lnet: eq: discard struct lnet_handle_eq

The Portals API uses a cookie 'handle' to identify an EQ.  This is
appropriate for a user-space API for objects maintained by the kernel,
but it brings no value when the API client and implementation are both
in the kernel, as is the case with Lustre and LNet.

Instead of using a 'handle', a pointer to the 'struct lnet_eq' can be
used.  This object is not reference counted and is always freed
correctly, so there can be no case where the cookie becomes invalid
while it is still held.

So use 'struct lnet_eq *' directly instead of having indirection
through a 'struct lnet_handle_eq'.
Also:
 - have LNetEQAttach() return the pointer, using ERR_PTR() to return
   errors.
 - discard ln_eq_containers and don't store the eq therein.
   This means we don't free any eqs that have not already been freed,
   but all eqs that are allocated are properly freed, so that is not
   a problem.
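
A userspace sketch of the ERR_PTR() convention. The `sketch_err_ptr()`, `sketch_ptr_err()` and `sketch_is_err()` helpers re-create the idea behind the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() (errno values up to 4095 encoded at the very top of the address space); `struct sketch_eq` and `sketch_eq_alloc()` are hypothetical and do not reflect the real LNet API. The caller receives the object pointer directly, with no handle lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Re-creation of the <linux/err.h> idea: errno values in
 * [-4095, -1] are encoded as pointers into the top page. */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical event queue standing in for struct lnet_eq. */
struct sketch_eq {
	unsigned int size;
};

/* Return the object itself, or an encoded error, instead of a handle. */
static struct sketch_eq *sketch_eq_alloc(unsigned int size)
{
	struct sketch_eq *eq;

	if (size == 0)
		return sketch_err_ptr(-EINVAL);
	eq = malloc(sizeof(*eq));
	if (eq == NULL)
		return sketch_err_ptr(-ENOMEM);
	eq->size = size;
	return eq;
}
```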

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0d6e5b654e39e749b39d46f68d0fb3e47a3256e9
Reviewed-on: https://review.whamcloud.com/36841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12780 llite: avoid ptlrpc_thread for ll_statahead_thread 59/36259/6
Mr NeilBrown [Wed, 23 Oct 2019 00:30:49 +0000 (11:30 +1100)]
LU-12780 llite: avoid ptlrpc_thread for ll_statahead_thread

Instead of ptlrpc_thread use more direct interfaces.

- Instead of waiting for thread startup to complete, perform
  the startup before starting the thread.
- as nothing waits for the thread to finish we cannot use
  kthread_should_stop().  Instead, set the task pointer
  sai_task to NULL when the thread is finishing up.
- As we don't use kthread_should_stop(), we can safely do cleanup
  in the thread, because it is sure to run.
- use wake_up_process to signal the thread that there
  is work to do.
- the wake_up that is currently at the end of sa_put() becomes
  a little more complicated and is moved to after the one place
  where sa_put() is called.

Change-Id: If694dafc6864348fe5203a4935f4c128ce5ff255
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/36259
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12780 llite: don't use ptlrpc_thread for sai_agl_thread 58/36258/6
Mr NeilBrown [Wed, 23 Oct 2019 00:30:48 +0000 (11:30 +1100)]
LU-12780 llite: don't use ptlrpc_thread for sai_agl_thread

Instead of ptlrpc_thread use native kthread functionality.

- instead of waiting for the thread to start-up, perform
  all early initialization before starting the thread,
  and cleanup happens after thread is stopped.
- use kthread_stop()/kthread_should_stop() to negotiate
  shutdown.
- wake_up_process to tell the thread if there is more work
  to do.  The thread sets current->state to TASK_IDLE before
  checking, so that if it gets the wakeup, the 'schedule()'
  call won't block.
  We clear ->sai_agl_task under a spinlock, from which it is
  also woken, to avoid races.

Linux-commit c044fb0f835c

Change-Id: I73294dd2f28087f56c3463ecfad1a8b32a44b2d7
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/36258
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10467 lfsck: use wait_event_idle() 10/37610/3
Mr NeilBrown [Mon, 17 Feb 2020 03:46:54 +0000 (14:46 +1100)]
LU-10467 lfsck: use wait_event_idle()

This l_wait_event() call is equivalent to the more standard
wait_event_idle().
So switch over to wait_event_idle().

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I8e13360a40dd1eec740f597d649c0f230533eb3d
Reviewed-on: https://review.whamcloud.com/37610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10467 ldlm: use wait_event_idle() instead of l_wait_event 09/37609/2
Mr NeilBrown [Mon, 17 Feb 2020 03:45:31 +0000 (14:45 +1100)]
LU-10467 ldlm: use wait_event_idle() instead of l_wait_event

This l_wait_event() is equivalent to wait_event_idle() which is now
supported in lustre.  So switch over to it.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If1ee81a0d562516534665d049fb24c1f39b59b95
Reviewed-on: https://review.whamcloud.com/37609
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13254 mdt: clear mti_mdt in mdt_thread_info_fini() 92/37592/2
Mikhail Pershin [Sat, 15 Feb 2020 17:09:31 +0000 (20:09 +0300)]
LU-13254 mdt: clear mti_mdt in mdt_thread_info_fini()

Clear mti_mdt when finalizing mdt_thread_info to prevent
its reuse by another handler later. This can happen in
mdt_lvbo_fill/update(), which takes the thread info as is,
without initialization, because at that point it is not
clear whether it was already initialized or not. So mti_mdt may be
used there after being set by some other handler for a
different MDT, or may even hold a stale pointer.
Meanwhile there is no need to use any mdt_thread_info values
like mti_mdt in mdt_lvbo_fill(), because the MDT device can be
taken from the namespace, and the only mdt_thread_info fields used
there are temporary storage for the FID and lu_buffer.

This patch zeroes mti_mdt when the thread info is finalized and also
removes uses of info->mti_mdt from mdt_lvbo_fill/update(), replacing
them with the MDT taken from the lock namespace.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib350093f0b70c777932c056b34cb56a9702b650d
Reviewed-on: https://review.whamcloud.com/37592
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10467 mdc: change ssleep to msleep_interruptible 88/37488/3
James Simmons [Mon, 17 Feb 2020 16:31:47 +0000 (11:31 -0500)]
LU-10467 mdc: change ssleep to msleep_interruptible

During review of the mdc wait_idle* changes for mdc_getpage()
it was pointed out that the use of ssleep() prevents the code
from being interruptible. Change ssleep to msleep_interruptible()
to allow breaking out of the sleep if an application sends
an INTR signal.

Change-Id: I2fcb90ecdd6f2c4f2ee6fbc54d253622e8beee29
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37488
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12811 ptlrpc: pass buffer size to the swabbing functions 35/36435/9
Emoly Liu [Mon, 23 Dec 2019 02:32:31 +0000 (10:32 +0800)]
LU-12811 ptlrpc: pass buffer size to the swabbing functions

By adding a separate rmf_swab_len() function pointer to
req_msg_field, the buffer size can be passed to the swabbing
functions, e.g. lustre_swab_fiemap() in this patch, to avoid
invalid access, especially for small buffers.
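
A sketch of why the length matters, using a hypothetical reply layout loosely modelled on FIEMAP (a header whose count says how many extents follow) and a hypothetical descriptor with a length-aware hook; neither is the real req_msg_field nor lustre_swab_fiemap(). Without `len`, a swab function must trust the count in the message and may walk past the end of a short buffer; with it, the loop is bounded by what actually fits. `__builtin_bswap32`/`__builtin_bswap64` are GCC/Clang builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout: a header followed by 'mapped_extents'
 * extents.  Loosely modelled on FIEMAP; not the real struct fiemap. */
struct sketch_map {
	uint32_t mapped_extents;
	uint32_t pad;
};

struct sketch_extent {
	uint64_t logical;
};

/* Hypothetical field descriptor with a length-aware swab hook. */
struct sketch_field {
	void (*swab_len)(void *buf, size_t len);
};

/* Swab the header, then only the extents that really fit in 'len'. */
static void sketch_swab_map(void *buf, size_t len)
{
	struct sketch_map *map = buf;
	struct sketch_extent *ext = (struct sketch_extent *)(map + 1);
	size_t i, room;

	assert(buf != NULL);
	if (len < sizeof(*map))
		return;		/* not even a full header */
	map->mapped_extents = __builtin_bswap32(map->mapped_extents);
	room = (len - sizeof(*map)) / sizeof(*ext);
	for (i = 0; i < room && i < map->mapped_extents; i++)
		ext[i].logical = __builtin_bswap64(ext[i].logical);
}
```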

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I997e6a828f2f1cdfdb8a5fa241fa43ca0ae3677e
Reviewed-on: https://review.whamcloud.com/36435
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
4 years agoLU-11915 tests: add debugging to conf-sanity test_115 48/37548/6
Andreas Dilger [Wed, 12 Feb 2020 06:48:36 +0000 (23:48 -0700)]
LU-11915 tests: add debugging to conf-sanity test_115

After updating the e2fsprogs build version to 1.45.2.wc2, this
test is no longer being skipped, and is failing to mount the
newly-formatted OST0000 due to errors registering with the MGS
(target index already in use).  Since the MDS+MGS was just
reformatted, that doesn't make sense.

Continue to skip this test until we understand why it is failing,
but use ALWAYS_EXCEPT instead of blaming the e2fsprogs version.

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I59c689763481c4fc3677ca1807101de09599bb77
Reviewed-on: https://review.whamcloud.com/37548
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10073 tests: skip test smoke for PPC 50/37450/2
James Nunez [Wed, 5 Feb 2020 22:49:38 +0000 (15:49 -0700)]
LU-10073 tests: skip test smoke for PPC

The lnet-selftest test smoke fails consistently for
PPC client testing.  Thus, stop running this test until
we understand the failure; add smoke to the ALWAYS_EXCEPT
list.

Test-Parameters: trivial
Test-Parameters: clientarch=ppc64 testlist=lnet-selftest
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I090ec05d7ad934bb4c68e976572adb29eb29a676
Reviewed-on: https://review.whamcloud.com/37450
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13186 tests: stop running tests for PPC clients 97/37397/10
James Nunez [Sun, 2 Feb 2020 01:24:56 +0000 (18:24 -0700)]
LU-13186 tests: stop running tests for PPC clients

Stop running the following tests, which fail consistently when
testing PPC clients, by adding them to the ALWAYS_EXCEPT list:

sanity-hsm tests
(LU-12251) 1a 1b 1d 1e 12c 12f 12g 12h 12m 12n 12o 12p 12q
21 22 23 24a 24b 24d 24e 24f 25b 30c 37 57 58 90 110b 111b
113 222b 222d 228 260a 260b 260c
(LU-12252) 220A 220a 221 222a 222c 223a 223b 224A 224a 226
227 600 601 602 603 604 605

sanity-pfl tests
(LU-13186) 14
(LU-13205) 16a
(LU-13207) 16b
(LU-13215) 17

Test-Parameters: trivial
Test-Parameters: clientarch=ppc64 testlist=sanity-pfl,sanity-hsm
Test-Parameters: clientarch=ppc64 testlist=sanity-pfl,sanity-hsm
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I847a8121d2675b9671bc9a39c4f6ccff67b208fa
Reviewed-on: https://review.whamcloud.com/37397
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-13226 ldiskfs: Add support for Ubuntu eoan 5.3 54/37554/2
Shaun Tancheff [Wed, 12 Feb 2020 20:19:09 +0000 (14:19 -0600)]
LU-13226 ldiskfs: Add support for Ubuntu eoan 5.3

Ubuntu eoan kernel is close enough to 5.4.7+ mainline to
use the patch series directly.
Update the configure script to select it.

Test-Parameters: trivial
Cray-bug-id: LUS-8485
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iadb9b87a153a88846399d91699c972c72a5e1e7a
Reviewed-on: https://review.whamcloud.com/37554
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>