Whamcloud - gitweb
fs/lustre-release.git
5 months ago LU-14095 gss: use hlist_unhashed() instead of ->next 14/40514/4
Sebastien Buisson [Mon, 2 Nov 2020 14:27:00 +0000 (23:27 +0900)]
LU-14095 gss: use hlist_unhashed() instead of ->next

In cache_detail list-mutation primitives, verifying the status of an
entry must be done using hlist_unhashed(), in case 'struct cache_head'
has a 'cache_list' field.
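
A minimal userspace sketch of why a bare ->next test is not a membership test for an hlist entry. The hlist types below mirror the kernel's layout but are re-declared here for illustration; sketch_next_check_is_wrong() is a hypothetical helper, not code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace copy of the kernel's hlist layout, for illustration only. */
struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

/* A node is unhashed exactly when pprev is NULL. */
static bool hlist_unhashed(const struct hlist_node *n)
{
    return n->pprev == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

static void hlist_del_init(struct hlist_node *n)
{
    if (hlist_unhashed(n))
        return;
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}

/* Hypothetical demo: returns true if a "->next == NULL" test would
 * misreport a hashed tail entry as unhashed. */
bool sketch_next_check_is_wrong(void)
{
    struct hlist_head head = { NULL };
    struct hlist_node only = { NULL, NULL };
    bool wrong;

    hlist_add_head(&only, &head);
    /* 'only' is the tail, so next is NULL although it is hashed. */
    wrong = (only.next == NULL) && !hlist_unhashed(&only);
    hlist_del_init(&only);
    return wrong && hlist_unhashed(&only);
}
```

The last entry of a chain always has a NULL next pointer, so only the pprev-based test distinguishes "tail of the chain" from "not in any chain".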

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1410eca9a647b74127cf40b8f3d6b68d055f773a
Reviewed-on: https://review.whamcloud.com/40514
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago LU-12960 lod: don't set index for 2nd stripe if specific 35/36735/4
Lai Siyao [Sat, 7 Sep 2019 09:38:46 +0000 (17:38 +0800)]
LU-12960 lod: don't set index for 2nd stripe if specific

When MDTs are specific, don't set index for the second stripe
allocation, otherwise it's not created on specific MDT.

Add sanity 31q.

Fixes: c1d0a355a6 ("LU-12624 lod: alloc dir stripes by QoS")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I415be0fa7c6e702ec72dcaac88bba55290463d44
Reviewed-on: https://review.whamcloud.com/36735
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13571 lnd: Use NETWORK_TIMEOUT for some conn failures 00/39900/15
Chris Horn [Fri, 11 Sep 2020 19:13:32 +0000 (14:13 -0500)]
LU-13571 lnd: Use NETWORK_TIMEOUT for some conn failures

For -EHOSTUNREACH and -ETIMEDOUT we cannot tell whether the
connection failure was due to a problem with the remote or local NI,
so we should return the LNET_MSG_STATUS_NETWORK_TIMEOUT to LNet in
these cases.
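
The mapping described above might look roughly like the following sketch. The enum and function names are hypothetical stand-ins (prefixed sketch_/SKETCH_), and the default branch is an assumption made only to keep the example complete; it is not taken from the o2iblnd code.

```c
#include <errno.h>

/* Hypothetical stand-ins for the LNet health status values. */
enum sketch_health_status {
    SKETCH_STATUS_OK,
    SKETCH_STATUS_LOCAL_ERROR,
    SKETCH_STATUS_NETWORK_TIMEOUT,
};

/* Map a connection failure code to a health status. For the two
 * ambiguous errors we cannot blame either side, so report a network
 * timeout. The default case is an assumption for illustration. */
enum sketch_health_status sketch_conn_failure_status(int rc)
{
    switch (rc) {
    case 0:
        return SKETCH_STATUS_OK;
    case -EHOSTUNREACH:
    case -ETIMEDOUT:
        return SKETCH_STATUS_NETWORK_TIMEOUT;
    default:
        return SKETCH_STATUS_LOCAL_ERROR;
    }
}
```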

HPE-bug-id: LUS-9342
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9036f5698061ce74fe29683b8249b8f9a05f3433
Reviewed-on: https://review.whamcloud.com/39900
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13571 lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue 99/39899/15
Chris Horn [Fri, 11 Sep 2020 18:42:42 +0000 (13:42 -0500)]
LU-13571 lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue

TXs on the ibp_tx_queue are waiting for a connection to be
established. Failure to establish a connection could be due to a
problem with either the local NI or the remote NI, and o2iblnd cannot
currently distinguish between these failures. As such, it should
return LNET_MSG_STATUS_NETWORK_TIMEOUT to LNet so that LNet will
decrement the health value of both the local NI and the remote NI and
future sends can take these health values into account.

Test-Parameters: trivial
HPE-bug-id: LUS-9342
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Idbbbe95483d25ec48b83e33a00685f72fa5292e6
Reviewed-on: https://review.whamcloud.com/39899
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-12564 libcfs: Use vfree_atomic instead of vfree 36/40136/3
Oleg Drokin [Thu, 1 Oct 2020 05:51:55 +0000 (01:51 -0400)]
LU-12564 libcfs: Use vfree_atomic instead of vfree

Since vfree is unsafe to use in atomic context, implement our own
libcfs_vfree_atomic heavily based on code from linux 4.10 commit
bf22e37a641327e34681b7b6959d9646e3886770

We can't use the one in the kernel because it's not exported.

Unconditionally use it in *_FREE_LARGE() macros since in_atomic()
is not recommended to be used outside of core kernel code.

Not everything is present on 3.10 (rhel7) so we also add
llist primitive and a replacement for raw_cpu_ptr there.
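
The general idea of a deferred free can be sketched in userspace as follows: the caller only pushes the buffer onto a lock-free list, and the actual free happens later from a context that may sleep. This is an illustration with C11 atomics, not the libcfs implementation; all sketch_* names are hypothetical, and a real kernel version would drain the list from a work item rather than an explicit call.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* The freed buffer itself is reused as the list node, so it must be
 * at least sizeof(struct sketch_node) bytes long. */
struct sketch_node {
    struct sketch_node *next;
};

static _Atomic(struct sketch_node *) sketch_deferred_head;

/* Safe to call where sleeping is not allowed: just a lock-free push. */
void sketch_free_deferred(void *ptr)
{
    struct sketch_node *node = ptr;
    struct sketch_node *old = atomic_load(&sketch_deferred_head);

    do {
        node->next = old;
    } while (!atomic_compare_exchange_weak(&sketch_deferred_head,
                                           &old, node));
}

/* Called later from a context that may sleep; detaches the whole list
 * at once and frees every entry. Returns the number of buffers freed. */
size_t sketch_drain_deferred(void)
{
    struct sketch_node *node = atomic_exchange(&sketch_deferred_head, NULL);
    size_t freed = 0;

    while (node != NULL) {
        struct sketch_node *next = node->next;

        free(node);
        freed++;
        node = next;
    }
    return freed;
}
```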

Change-Id: I50892f231e54a284f4d8a14d910ea9ab2fbe6a16
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40136
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-14024 ofd: Avoid use after free in ofd_inconsistency_verification_main 22/40222/2
Oleg Drokin [Mon, 12 Oct 2020 20:12:55 +0000 (16:12 -0400)]
LU-14024 ofd: Avoid use after free in ofd_inconsistency_verification_main

The ofd_inconsistency_lock should not be unlocked after we have woken up
a different thread that is going to free the structure containing
said lock.

Change-Id: I913e7470664e1128a250597b0a803f791d99099e
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40222
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
5 months ago LU-14072 llite: fix client eviction with DIO 89/40389/3
Wang Shilong [Sat, 24 Oct 2020 01:47:23 +0000 (09:47 +0800)]
LU-14072 llite: fix client eviction with DIO

We enable lockless IO at file open if the O_DIRECT flag is passed;
however, the O_DIRECT flag can later be cleared by
fcntl(..., F_SETFL, ...).

We then end up doing buffered IO without the lock held properly,
and hit a hang:

[<ffffffffc0d421ed>] osc_extent_wait+0x21d/0x7c0 [osc]
[<ffffffffc0d44897>] osc_cache_wait_range+0x2e7/0x940 [osc]
[<ffffffffc0d4585e>] osc_cache_writeback_range+0x96e/0xff0 [osc]
[<ffffffffc0d31c45>] osc_lock_flush+0x195/0x290 [osc]
[<ffffffffc0d31d7c>] osc_lock_lockless_cancel+0x3c/0xe0 [osc]
[<ffffffffc081f488>] cl_lock_cancel+0x78/0x160 [obdclass]
[<ffffffffc0cd8079>] lov_lock_cancel+0x99/0x190 [lov]
[<ffffffffc081f488>] cl_lock_cancel+0x78/0x160 [obdclass]
[<ffffffffc081f9a2>] cl_lock_release+0x52/0x140 [obdclass]
[<ffffffffc08238a9>] cl_io_unlock+0x139/0x290 [obdclass]
[<ffffffffc08242e8>] cl_io_loop+0xb8/0x200 [obdclass]
[<ffffffffc0e1d36b>] ll_file_io_generic+0x91b/0xdf0 [lustre]
[<ffffffffc0e1dd0c>] ll_file_aio_write+0x29c/0x6e0 [lustre]
[<ffffffffc0e1e250>] ll_file_write+0x100/0x1c0 [lustre]
[<ffffffffa984aa90>] vfs_write+0xc0/0x1f0
[<ffffffffa984b8af>] SyS_write+0x7f/0xf0
[<ffffffffa9d8eede>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff

The lock cancel times out on the server side and the client
is evicted.

Fix this problem by testing O_DIRECT flag to decide if
we could issue lockless IO.

Fixes: 6bce536725 ("LU-4198 clio: turn on lockless for some kind of IO")
Change-Id: Idbf1c748684a6540aee5f6e35c017929fbcc60b9
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/40389
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14071 doc: lfs setquota/quota doc for PQ 84/40384/3
Sergey Cheremencev [Fri, 23 Oct 2020 13:43:07 +0000 (16:43 +0300)]
LU-14071 doc: lfs setquota/quota doc for PQ

Add a description of the --pool option for lfs setquota
and lfs quota.

Change-Id: Ie6bfb3240f7eb8b4239d95aac29989103f009578
Test-Parameters: trivial
HPE-bug-id: LUS-9341
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/40384
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-14036 build: fix lbuild for MOFED 5.1 54/40254/7
Minh Diep [Wed, 14 Oct 2020 23:57:51 +0000 (16:57 -0700)]
LU-14036 build: fix lbuild for MOFED 5.1

Starting with MOFED 5.1, rdma-core is required for libib*mad.

Test-Parameters: trivial

Change-Id: Id26f3cdb0552933577e1b27384ac82f9f48e2b3a
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40254
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago LU-14054 utils: add --exit-on-close to ofd_access_log_reader 12/40312/6
John L. Hammond [Wed, 30 Sep 2020 19:37:26 +0000 (14:37 -0500)]
LU-14054 utils: add --exit-on-close to ofd_access_log_reader

Add an --exit-on-close (-e) option to ofd_access_log_reader.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ifded63318026b0ad3f9f077dc74008469df877d9
Reviewed-on: https://review.whamcloud.com/40312
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-13153 lib: fix llapi_get_version_string 80/40680/3
Mr NeilBrown [Wed, 18 Nov 2020 00:52:15 +0000 (11:52 +1100)]
LU-13153 lib: fix llapi_get_version_string

llapi_get_version_string() has always been broken as it passes args to
get_lustre_param_value in the wrong order.

Test-Parameters: trivial
Fixes: 0c5fbd80f1ba ("LU-5969 lustreapi: replace llapi_get_version()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ib4bdd715295f97fe5b7080f42bfa06883f191234
Reviewed-on: https://review.whamcloud.com/40680
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago LU-14125 osc: prevent overflow of o_dropped 59/40659/3
Olaf Faaland [Wed, 11 Nov 2020 22:38:25 +0000 (14:38 -0800)]
LU-14125 osc: prevent overflow of o_dropped

In osc_announce_cached(), prevent o_dropped from overflowing.
Necessary because o_dropped AKA o_misc is 32 bits, but cl_lost_grant
is 64 bits.
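
A minimal sketch of clamping a 64-bit counter into a 32-bit wire field. The function name is hypothetical, and both the choice of UINT32_MAX as the cap and carrying the remainder over to a later report are assumptions for illustration; the patch may use a different limit or bookkeeping.

```c
#include <stdint.h>

/* Take as much of *lost_grant as fits into a 32-bit field and leave
 * the rest in *lost_grant for a later report. */
uint32_t sketch_take_dropped(uint64_t *lost_grant)
{
    uint64_t take = *lost_grant;

    if (take > UINT32_MAX)
        take = UINT32_MAX; /* would otherwise be truncated silently */

    *lost_grant -= take;
    return (uint32_t)take;
}
```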

Add a CDEBUG call so we can tell whether this happened.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Ia459934c789ae9609f851ae0a2581de583c6fc1c
Reviewed-on: https://review.whamcloud.com/40659
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
5 months ago LU-14136 utils: correct error message in lfs mirror extend 31/40631/2
John L. Hammond [Thu, 12 Nov 2020 16:13:33 +0000 (10:13 -0600)]
LU-14136 utils: correct error message in lfs mirror extend

In migrate_open_files(), use llapi_layout_file_open() instead of
lfs_component_create() to avoid printing a confusing error message and
to ensure that the type of error is correctly propagated.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I31ebf29771b1a3e0b106a0a246a260a82eed92a4
Reviewed-on: https://review.whamcloud.com/40631
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-14131 osd-ldiskfs: reduce credits for overwriting 04/40604/2
Wang Shilong [Wed, 11 Nov 2020 06:51:09 +0000 (14:51 +0800)]
LU-14131 osd-ldiskfs: reduce credits for overwriting

If all blocks are mapped, this is either an overwrite or the
space has already been allocated by fallocate.

There is no need to modify the extent tree, and we only
need 1 credit for the inode.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I907861d4862256a8a23a81812953e2330e1d9925
Reviewed-on: https://review.whamcloud.com/40604
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13390 tests: add debugging to sanity test_65n 00/40600/3
Andreas Dilger [Wed, 11 Nov 2020 00:56:38 +0000 (17:56 -0700)]
LU-13390 tests: add debugging to sanity test_65n

Print out the layouts used in comparisons in case of error, so that
it is possible to debug intermittent test failures.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib81b10ff704998518275f737f028fb16391654ed
Reviewed-on: https://review.whamcloud.com/40600
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14129 kernel: kernel update RHEL7.9 [3.10.0-1160.6.1.el7] 94/40594/2
Jian Yu [Tue, 10 Nov 2020 18:18:44 +0000 (10:18 -0800)]
LU-14129 kernel: kernel update RHEL7.9 [3.10.0-1160.6.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.6.1.el7.

Test-Parameters: clientdistro=el7.9 serverdistro=el7.9

Change-Id: If2d2120082965de67a3b29ade3e4d24a4221b2c2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40594
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-930 utils: document 'lfs getstripe -N' option 45/40545/3
Andreas Dilger [Thu, 5 Nov 2020 07:40:47 +0000 (00:40 -0700)]
LU-930 utils: document 'lfs getstripe -N' option

Add the '--mirror-count|-N' option to the usage message for
"lfs getstripe" and its man page.

Fixes: 818340364d51 ("LU-11124 utils: add 'lfs getstripe -N' option")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I89ffb2661a48336c37b8bdd784e5f9e942cbd798
Reviewed-on: https://review.whamcloud.com/40545
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13783 build: Fix LB_LINUX_VERSION rule for v5.8 71/40371/3
Mr NeilBrown [Fri, 23 Oct 2020 04:07:36 +0000 (15:07 +1100)]
LU-13783 build: Fix LB_LINUX_VERSION rule for v5.8

Since Commit 20b1be595282 ("kbuild: fix single target builds for
external modules") in v5.8-rc7, the LB_LINUX_VERSION autoconf
rule doesn't work.

I don't know exactly why, but it can be fixed by setting "makerule" to
an empty string.

Passing the path to the directory in $makerule is unnecessary as
LB_LINUX_COMPILE_IFELSE, which LB_LINUX_TRY_MAKE eventually calls,
passes "$MODULE_TARGET=$PWD/build" which has the required effect.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2b381d3546aaa0f365328a1319b2d4f145f33eeb
Reviewed-on: https://review.whamcloud.com/40371
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13990 ldlm: ldlm_flock_deadlock() ASSERTION(req != lock) failed 47/40047/4
Andriy Skulysh [Tue, 19 May 2020 12:10:52 +0000 (15:10 +0300)]
LU-13990 ldlm: ldlm_flock_deadlock() ASSERTION(req != lock) failed

A client gets evicted and reconnects, so there can be a window
with 2 exports with flocks from the same client.
In this case during deadlock search from the new export we can go
to the old export and back to the new one.

Failed exports should be excluded from
deadlock search.
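
The effect of excluding failed exports can be illustrated with a simplified wait-for chain walk. This is a generic sketch, not the ldlm_flock_deadlock() code: the struct, its fields, and the function name are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified lock owner: each owner waits for at most
 * one other owner, identified by its index in the array. */
struct sketch_owner {
    bool export_failed; /* export was evicted and is being torn down */
    int blocked_on;     /* index of the owner being waited for, or -1 */
};

/* Follow the wait-for chain starting at 'blocker'. A cycle back to
 * 'requester' is a deadlock. Owners on a failed export are skipped,
 * because their locks are about to be cleaned up. */
bool sketch_would_deadlock(const struct sketch_owner *owners, size_t n,
                           int requester, int blocker)
{
    int cur = blocker;
    size_t steps;

    /* Bound the walk by n so an unrelated cycle cannot loop forever. */
    for (steps = 0; steps < n && cur >= 0; steps++) {
        if (owners[cur].export_failed)
            return false;
        if (cur == requester)
            return true;
        cur = owners[cur].blocked_on;
    }
    return false;
}
```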

Change-Id: I9dec50d4c6694bbbcf13b976b5ebdc29377261ce
HPE-bug-id: LUS-8635
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/40047
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13987 ldlm: Don't re-enqueue glimpse lock on read 44/40044/3
Andriy Skulysh [Wed, 1 Apr 2020 18:20:58 +0000 (21:20 +0300)]
LU-13987 ldlm: Don't re-enqueue glimpse lock on read

cl_glimpse_lock() doesn't match a lock with LDLM_FL_BL_AST
even if this lock is acquired by the same thread earlier.

It only needs the size to check for a sparse file,
so let's add LDLM_FL_CBPENDING to the match flags.

 #1 [ffff9ba7326036f0] schedule at ffffffff87b67c49
 #2 [ffff9ba732603700] obd_get_request_slot at ffffffffc0dbe0a4 [obdclass]
 #3 [ffff9ba7326037b8] ldlm_cli_enqueue at ffffffffc0faedce [ptlrpc]
 #4 [ffff9ba732603878] mdc_enqueue_send at ffffffffc11b38a8 [mdc]
 #5 [ffff9ba732603938] mdc_lock_enqueue at ffffffffc11b3eb2 [mdc]
 #6 [ffff9ba7326039a8] cl_lock_enqueue at ffffffffc0dfee95 [obdclass]
 #7 [ffff9ba7326039e0] lov_lock_enqueue at ffffffffc10ef265 [lov]
 #8 [ffff9ba732603a20] cl_lock_enqueue at ffffffffc0dfee95 [obdclass]
 #9 [ffff9ba732603a58] cl_lock_request at ffffffffc0dff54b [obdclass]

HPE-bug-id: LUS-8690
Change-Id: I4c3820f754ceb502079bdd7d8e1a5389f6696eba
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/40044
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13891 utils: fix memory leak in llapi_ladvise() 10/39610/6
Jean-Yves VET [Fri, 7 Aug 2020 13:49:36 +0000 (15:49 +0200)]
LU-13891 utils: fix memory leak in llapi_ladvise()

A buffer allocated in llapi_ladvise() for ioctl() is never
released. This patch ensures the buffer is properly freed.

Fixes: e14246641c04 ("LU-4931 ladvise: Add feature of giving file access advices")
Signed-off-by: Jean-Yves VET <jyvet@ddn.com>
Change-Id: I0761e161074ae3029218473ec951670fdbbd33bd
Reviewed-on: https://review.whamcloud.com/39610
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-13837 lnet: Introduce constant for net ID of LNET_NID_ANY 44/39544/4
Chris Horn [Thu, 30 Jul 2020 16:29:30 +0000 (11:29 -0500)]
LU-13837 lnet: Introduce constant for net ID of LNET_NID_ANY

This patch adds a new constant, LNET_NET_ANY, to represent the net
ID of the LNET_NID_ANY wildcard NID.
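
As a rough sketch of the relationship between the two constants, assuming a 64-bit NID that carries the net ID in its upper 32 bits and the address in its lower 32 bits. The SKETCH_/sketch_ names are illustrative stand-ins, not the LNet definitions.

```c
#include <stdint.h>

typedef uint64_t sketch_nid_t;

/* Wildcard NID: all bits set. */
#define SKETCH_NID_ANY ((sketch_nid_t)-1)

/* Extract the net ID, assumed here to live in the upper 32 bits. */
static inline uint32_t sketch_nidnet(sketch_nid_t nid)
{
    return (uint32_t)(nid >> 32);
}

/* Net ID of the wildcard NID, so callers need not derive it by hand. */
#define SKETCH_NET_ANY sketch_nidnet(SKETCH_NID_ANY)
```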

Test-Parameters: trivial
HPE-bug-id: LUS-9122
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4b4d9e70ba2826843c6585ad5a9e365799face65
Reviewed-on: https://review.whamcloud.com/39544
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
5 months ago LU-13817 quota: print error when pool is absent 05/39505/8
Sergey Cheremencev [Fri, 24 Jul 2020 10:20:17 +0000 (13:20 +0300)]
LU-13817 quota: print error when pool is absent

Print error for "lfs quota --pool" and
"lfs setquota -o" if pool is absent.

HPE-bug-id: LUS-9112
Test-Parameters: trivial testlist=sanity-quota
Change-Id: I4c2aa41d8c6f27742e68dfdc78c9d9365c760237
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/39505
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13746 utils: check argument logname presence in llog_print 63/39263/3
Etienne AUJAMES [Fri, 3 Jul 2020 08:17:29 +0000 (10:17 +0200)]
LU-13746 utils: check argument logname presence in llog_print

Fix a segfault in llog_print when no logname is specified.
Example: lctl --device MGS llog_print

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I03a6c08dfc73ff8cb5861d162e21ce5aa581e197
Reviewed-on: https://review.whamcloud.com/39263
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
5 months ago LU-13159 osd: osd-zfs to release index back structures 93/37293/12
Alex Zhuravlev [Mon, 20 Jan 2020 17:12:58 +0000 (20:12 +0300)]
LU-13159 osd: osd-zfs to release index back structures

otherwise those can be leaked in case of failed mount

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7dc848a8397c11b1f56d36ef0a7155314ca9afc2
Reviewed-on: https://review.whamcloud.com/37293
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-14125 obdclass: add grant fields to export procfile 63/40563/4
Olaf Faaland [Thu, 2 Jul 2020 21:25:32 +0000 (14:25 -0700)]
LU-14125 obdclass: add grant fields to export procfile

Add ted_{grant,reserved,dirty} to the export
procfile for OSTs, to allow comparison between the
OST's idea of grants allocated to the client with
the client's idea.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Ib34582e2be55fe2007363b52cea4dee211b7f540
Reviewed-on: https://review.whamcloud.com/40563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14031 tests: check client reconnect 97/40297/7
Alexander Boyko [Mon, 19 Oct 2020 13:07:10 +0000 (09:07 -0400)]
LU-14031 tests: check client reconnect

The patch adds test 147 for recovery-small. It checks how often
client reconnects to a server with obd_timeout=200.

HPE-bug-id: LUS-8520
Test-Parameters: trivial testlist=recovery-small env=ONLY=147
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Id5bd732fe4d949cfa45ea0272f197809cca3290d
Reviewed-on: https://review.whamcloud.com/40297
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13571 lnet: Correct handling of NETWORK_TIMEOUT status 98/39898/14
Chris Horn [Fri, 11 Sep 2020 18:41:39 +0000 (13:41 -0500)]
LU-13571 lnet: Correct handling of NETWORK_TIMEOUT status

The original intent of the LNET_MSG_STATUS_NETWORK_TIMEOUT health
status was to handle cases where the LND was unsure whether the
failure was due to the local or remote NI. In this case, we'll want
to decrement both the local and remote NI health and allow recovery
to ascertain which interface is actually healthy.
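
A simplified sketch of the intended handling: an ambiguous network timeout lowers the health of both interfaces, while an unambiguous error lowers only one. The types, names, and the step value passed in are hypothetical and chosen only for illustration.

```c
/* Hypothetical stand-ins for LNet health statuses and interfaces. */
enum sketch_status {
    SKETCH_LOCAL_ERROR,
    SKETCH_REMOTE_ERROR,
    SKETCH_NETWORK_TIMEOUT,
};

struct sketch_ni {
    int health;
};

/* Lower the health value by 'step', never going below zero. */
static void sketch_dec_health(struct sketch_ni *ni, int step)
{
    ni->health = ni->health > step ? ni->health - step : 0;
}

void sketch_handle_status(enum sketch_status status,
                          struct sketch_ni *local,
                          struct sketch_ni *remote, int step)
{
    switch (status) {
    case SKETCH_LOCAL_ERROR:
        sketch_dec_health(local, step);
        break;
    case SKETCH_REMOTE_ERROR:
        sketch_dec_health(remote, step);
        break;
    case SKETCH_NETWORK_TIMEOUT:
        /* Cannot tell which side failed: penalize both and let
         * recovery find out which interface is healthy. */
        sketch_dec_health(local, step);
        sketch_dec_health(remote, step);
        break;
    }
}
```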

Test-Parameters: trivial
HPE-bug-id: LUS-9342
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib00ac260640100123e4e97e9c566289e92fb0b6e
Reviewed-on: https://review.whamcloud.com/39898
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14027 ldlm: Do not hang if recovery restarted during lock replay 38/40238/4
Oleg Drokin [Wed, 14 Oct 2020 03:55:02 +0000 (23:55 -0400)]
LU-14027 ldlm: Do not hang if recovery restarted during lock replay

LU-13600 introduced lock ratelimiting logic, but it did not take
into account that if there's a disconnection in the REPLAY_LOCKS
phase then yet unsent locks get stuck in the sending queue so
the replay locks thread hangs with imp_replay_inflight elevated
above zero.

The direct consequence is that the recovery state machine never
advances from REPLAY to REPLAY_LOCKS status while imp_replay_inflight
is non-zero.

Adjust __ldlm_replay_locks() to check if the import state changed
before attempting to send any more requests.

Add a testcase.

Change-Id: Idbaf5461f33d1884088269d67d01071c7e1bf8a5
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Fixes: 3b613a442b ("LU-13600 ptlrpc: limit rate of lock replays")
Reviewed-on: https://review.whamcloud.com/40238
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago LU-13195 obdclass: show FID for corrupted llog 77/38977/2
Alex Zhuravlev [Thu, 18 Jun 2020 07:00:26 +0000 (10:00 +0300)]
LU-13195 obdclass: show FID for corrupted llog

to be able to remove that easily.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iefead860b5e60e74e0eb445f715508a3b01fac87
Reviewed-on: https://review.whamcloud.com/38977
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14053 utils: detect hangup in ofd_access_log_reader 11/40311/7
John L. Hammond [Thu, 24 Sep 2020 21:05:39 +0000 (16:05 -0500)]
LU-14053 utils: detect hangup in ofd_access_log_reader

In ofd_access_log_reader, when the batch file is a pipe or socket,
detect hangups and exit.
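
One way to detect that the peer of a pipe or socket has gone away is to poll() the descriptor and look for POLLHUP or POLLERR. The sketch below assumes Linux behavior, where the write end of a pipe reports POLLERR once the read end is closed; it is not the ofd_access_log_reader code, and the sketch_* names are hypothetical.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Returns true if the other end of a pipe or socket has gone away.
 * POLLHUP and POLLERR are reported even when no events are requested. */
bool sketch_peer_hung_up(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = 0 };

    if (poll(&pfd, 1, 0) < 0)
        return false; /* treat a poll() failure as "unknown" */

    return (pfd.revents & (POLLHUP | POLLERR)) != 0;
}

/* Small demo on a pipe: returns 1 if no hangup is seen while the
 * reader is open and a hangup is seen after it closes, 0 otherwise,
 * and -1 if the pipe could not be created. */
int sketch_hangup_demo(void)
{
    int fds[2];
    bool before, after;

    if (pipe(fds) < 0)
        return -1;

    before = sketch_peer_hung_up(fds[1]);
    close(fds[0]); /* the reader goes away */
    after = sketch_peer_hung_up(fds[1]);
    close(fds[1]);

    return (!before && after) ? 1 : 0;
}
```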

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: If7509174b992011305d8e4a7aa2766a3a1980831
Reviewed-on: https://review.whamcloud.com/40311
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14052 ofd: support for multiple access readers 06/39906/15
Alex Zhuravlev [Tue, 15 Sep 2020 12:42:17 +0000 (15:42 +0300)]
LU-14052 ofd: support for multiple access readers

ofd_access_log_reader can be passed -I, --mdt-index-filter=INDEX to
print only FIDs located on INDEX.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9a4f09c6b7ca15623d459df17939895301a57a8b
Reviewed-on: https://review.whamcloud.com/39906
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14009 osd: missing ldiskfs_htree_unlock() 37/40137/4
Alex Zhuravlev [Sun, 4 Oct 2020 10:41:26 +0000 (13:41 +0300)]
LU-14009 osd: missing ldiskfs_htree_unlock()

in osd_ldiskfs_it_fill()

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5d2242e0864cbaa72af096b263d8758966a6be22
Reviewed-on: https://review.whamcloud.com/40137
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14027 ldlm: Do not wait for lock replay sending if import disconnected 72/40272/4
Oleg Drokin [Fri, 16 Oct 2020 14:25:58 +0000 (10:25 -0400)]
LU-14027 ldlm: Do not wait for lock replay sending if import disconnected

If import disconnected while we were preparing to send some lock replays
the sending RPC would get stuck on the sending list and would keep
the reconnected import state from progressing from REPLAY
to REPLAY_LOCKS state waiting for the queued replay RPCs to finish.

Set them as no_delay to ensure they don't wait.

LU-13600 exacerbated this issue somewhat, but it certainly existed
before that as well.

Change-Id: Id276a0be7657d9ad6cf40ad8e7a165d5cd341cb8
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40272
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
5 months ago LU-14061 utils: prefer mounting with specified fstype 74/40474/3
Andreas Dilger [Thu, 29 Oct 2020 20:47:40 +0000 (14:47 -0600)]
LU-14061 utils: prefer mounting with specified fstype

If the server filesystem is mounted with "-t lustre" use that type
for mounting instead of automatically selecting "lustre_tgt" for
server mounts, so that it appears in /proc/mounts correctly,
and "mount -t lustre" will list all of these filesystems.

Only if "mount -t lustre_tgt" is used will mount.lustre_tgt be called,
and that will internally use "lustre_tgt" as the filesystem type, and
only fall back to type "lustre" if that does not work.  This will give
userspace plenty of time to transition to using "lustre_tgt" for
server mountpoints.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifd7560c800acdfe14c9564bbf955ecad1224f2e3
Reviewed-on: https://review.whamcloud.com/40474
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-12632 hsm: wait longer in sanity-hsm test_90() 87/40387/6
John L. Hammond [Wed, 28 Oct 2020 15:30:04 +0000 (10:30 -0500)]
LU-12632 hsm: wait longer in sanity-hsm test_90()

Due to the huge (51) number of HSM requests to be completed we must
wait longer for this test to pass when run using an advanced (zfs)
backend filesystem.

Test-Parameters: trivial fstype=zfs env=ONLY=90 testlist=sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ia20086bf5c072c5d120eed5b0937d37d7b4342db
Reviewed-on: https://review.whamcloud.com/40387
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14109 doc: lnetctl manpage missing import and export options 23/40523/3
Olaf Faaland [Tue, 3 Nov 2020 01:38:49 +0000 (17:38 -0800)]
LU-14109 doc: lnetctl manpage missing import and export options

Describe missing options in the lnetctl man page:
lnetctl import "--exec"
lnetctl export "--backup"

Test-Parameters: trivial
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: I47e6c5e264dc8d5673f1229291be873996c02f55
Reviewed-on: https://review.whamcloud.com/40523
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14102 tests: add a pause in open-close in sanity/160l 08/40508/3
Alex Zhuravlev [Sun, 1 Nov 2020 11:04:54 +0000 (14:04 +0300)]
LU-14102 tests: add a pause in open-close in sanity/160l

so that close has to update mtime and generate CL_MTIME
record in the changelog.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I118172229c86ed5c201299de7476678689bf4cab
Reviewed-on: https://review.whamcloud.com/40508
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-13683 lfs: return -ENOENT when invoked on non-existing file 53/38953/6
Sebastien Piechurski [Tue, 16 Jun 2020 16:14:55 +0000 (18:14 +0200)]
LU-13683 lfs: return -ENOENT when invoked on non-existing file

Since the merge of LU-11510, lfs migrate on a non-existing file gives
the following error "lfs migrate: can't create composite layout from
file /some/path/to/file" and exits with code 0, potentially
leaving a calling script unaware of the error.

This patch fixes it by using errno, which is set in the call to
llapi_layout_get_by_path().
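
The error path can be illustrated with a minimal user-space sketch. The
names layout_get_sketch() and migrate_sketch() are hypothetical stand-ins
and not the actual lfs code; the only point shown is that errno is saved
and returned as a negative value instead of falling through to 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for llapi_layout_get_by_path(): returns NULL and
 * leaves errno set when the path cannot be accessed. */
static const char *layout_get_sketch(const char *path)
{
	if (access(path, F_OK) != 0)
		return NULL;	/* errno was set by access() */
	return path;
}

/* Returns 0 on success or a negative errno; never 0 on failure. */
int migrate_sketch(const char *path)
{
	const char *layout;

	assert(path != NULL);
	layout = layout_get_sketch(path);
	if (layout == NULL) {
		int rc = -errno;	/* save before fprintf() can change errno */

		fprintf(stderr,
			"migrate: can't create composite layout from file %s\n",
			path);
		return rc;
	}
	return 0;
}
```

A calling script would then see a non-zero exit status if the tool maps
this return value to its exit code.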

Signed-off-by: Sebastien Piechurski <sebastien.piechurski@atos.net>
Change-Id: I910eae78445f6071ff4e741afd43d140f554ab22
Fixes: 8bedfa377fbd ("LU-11510 lfs: migrate a composite layout file correctly")
Reviewed-on: https://review.whamcloud.com/38953
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months agoLU-14115 lod: fix to set inherit flag for stripe directory 43/40543/3
Wang Shilong [Thu, 5 Nov 2020 02:05:14 +0000 (10:05 +0800)]
LU-14115 lod: fix to set inherit flag for stripe directory

lod_xattr_set_lmv() is called to propagate attributes from the parent
to all stripes of the directory. However, the flags are ignored,
so the project inherit bits are missing on the stripes.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I10a2de303b4b8430e560752cf0d66466c93616a4
Reviewed-on: https://review.whamcloud.com/40543
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12956 ldlm: fix hrtimer using 13/40513/3
Alexander Boyko [Mon, 2 Nov 2020 11:02:47 +0000 (06:02 -0500)]
LU-12956 ldlm: fix hrtimer using

A race can happen between hrtimer_start() and
hrtimer_expires_remaining(), because the latter does not hold a lock
on timer->base, while the former can change it when the timer moves
between CPUs.
The following failure happened:
BUG: unable to handle kernel NULL pointer dereference at 000000000028
IP: [<ffffffffc0fc773f>] target_handle_connect+0x12ff/0x2b50 [ptlrpc]
at remaining = hrtimer_expires_remaining(timer), timer->base was NULL

The fix changes hrtimer_expires_remaining() to hrtimer_get_remaining(),
which holds the lock and prevents the race.
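
The difference between the two calls can be pictured with a user-space
analogy. This is only a sketch built on a pthread mutex, not kernel code:
struct timer_sketch, struct clock_base_sketch and both functions are
hypothetical. The reader takes the same lock as the writer, so it can
never observe the transient NULL base.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct clock_base_sketch {
	long now;			/* current time of this base */
};

struct timer_sketch {
	pthread_mutex_t lock;		/* protects 'base' and 'expires' */
	struct clock_base_sketch *base;	/* may change when the timer restarts */
	long expires;
};

/* Locked read: 'base' cannot change or be NULL underneath the caller. */
long timer_get_remaining_sketch(struct timer_sketch *t)
{
	long remaining;

	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	remaining = t->expires - t->base->now;
	pthread_mutex_unlock(&t->lock);
	return remaining;
}

/* Restart on another base; the intermediate state stays hidden. */
void timer_start_sketch(struct timer_sketch *t,
			struct clock_base_sketch *new_base, long delay)
{
	assert(t != NULL && new_base != NULL);
	pthread_mutex_lock(&t->lock);
	t->base = NULL;			/* transient state during migration */
	t->base = new_base;
	t->expires = new_base->now + delay;
	pthread_mutex_unlock(&t->lock);
}
```

An unlocked reader that dereferenced t->base between the two assignments
would hit the NULL pointer, which mirrors the failure quoted above.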

Fixes: 9334f1d51249 ("LU-11771 ldlm: use hrtimer for recovery to fix timeout messages")
HPE-bug-id: LUS-9514
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2cea1e5e2d523f131f1acb3346cf0324adae624e
Reviewed-on: https://review.whamcloud.com/40513
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14077 kernel: kernel update SLES15 SP1 [4.12.14-197.64.1] 77/40477/3
Jian Yu [Tue, 3 Nov 2020 07:20:01 +0000 (23:20 -0800)]
LU-14077 kernel: kernel update SLES15 SP1 [4.12.14-197.64.1]

Update SLES15 SP1 kernel to 4.12.14-197.64.1 for Lustre client.

Test-Parameters: trivial \
env=SANITY_EXCEPT="56oc 100 130 136 817" \
clientdistro=sles15sp1 serverdistro=el7.8 \
testlist=sanity

Change-Id: If2d8ecd89b307c0e49cf39a7ef33c298a2690f27
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40477
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14083 build: Don't overwrite KBUILD_EXTRA_SYMBOLS 35/40435/3
Chris Horn [Thu, 15 Oct 2020 15:40:04 +0000 (10:40 -0500)]
LU-14083 build: Don't overwrite KBUILD_EXTRA_SYMBOLS

The gnilnd requires that KBUILD_EXTRA_SYMBOLS be populated with the
symbols for some of its dependencies. Don't overwrite
KBUILD_EXTRA_SYMBOLS for the kernel tests.

Test-Parameters: trivial
HPE-bug-id: LUS-9448
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie10d3be0078e163f3e36f1c880d87ebc185c49c7
Reviewed-on: https://review.whamcloud.com/40435
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13988 mdt: ASSERTION(!lustre_handle_is_used(&lh->mlh_reg_lh)) failed 45/40045/3
Andriy Skulysh [Tue, 21 Apr 2020 10:13:15 +0000 (13:13 +0300)]
LU-13988 mdt: ASSERTION(!lustre_handle_is_used(&lh->mlh_reg_lh)) failed

mdt_brw_enqueue() should clear lock handle on error.

HPE-bug-id: LUS-8740
Change-Id: Ic2201189af6bcb2b1a114a599e138f9c22f3d99d
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/40045
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13822 ptlrpc: fixes for RCU-related stalls 14/39514/4
Andrew Perepechko [Mon, 27 Jul 2020 09:31:46 +0000 (12:31 +0300)]
LU-13822 ptlrpc: fixes for RCU-related stalls

ptlrpc_expired_set() may need to process a lot
of requests, so the processing loop needs to
schedule from time to time to avoid RCU-related
stalls.
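
The idea of yielding inside a long loop can be sketched in user space as
follows. This is an analogy only: sched_yield() stands in for whatever
scheduling point the kernel patch uses (cond_resched() would be the usual
choice, but that is an assumption), and the batch size and function name
are hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

#define YIELD_INTERVAL_SKETCH 1000	/* arbitrary batch size, an assumption */

/* Walks 'count' items and yields the CPU after every full batch.
 * Returns how many times it yielded. */
size_t process_with_yield_sketch(const int *items, size_t count, long *sum)
{
	size_t i, yields = 0;

	assert(items != NULL && sum != NULL);
	*sum = 0;
	for (i = 0; i < count; i++) {
		*sum += items[i];	/* placeholder for per-request work */
		if ((i + 1) % YIELD_INTERVAL_SKETCH == 0) {
			sched_yield();	/* let other runnable work proceed */
			yields++;
		}
	}
	return yields;
}
```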

Change-Id: I14b0aaf14ab127805699729adbb26459a2f4f07c
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-8939
Reviewed-on: https://review.whamcloud.com/39514
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13709 tests: test lfs mkdir -c without -i 57/39457/5
Olaf Faaland [Thu, 16 Jul 2020 22:50:29 +0000 (15:50 -0700)]
LU-13709 tests: test lfs mkdir -c without -i

Almost every test with lfs mkdir -c in the test suite also
uses option -i, so lfs mkdir -c (same as -i -1, where lustre
chooses the MDTs) is poorly tested.  Add a test for that
case, sanity test_300s.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Iede537d52cf445c9c9a6353338670e55a11364da
Reviewed-on: https://review.whamcloud.com/39457
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-11290 ldlm: page discard speedup 27/39327/10
Alexander Zarochentsev [Sun, 31 May 2020 16:29:25 +0000 (19:29 +0300)]
LU-11290 ldlm: page discard speedup

Improve check_and_discard_cb() by caching the negative
result of the DLM lock lookup, which avoids
excessive osc_dlm_lock_at_pgoff() calls.

HPE-bug-id: LUS-6432
Change-Id: I253f944bf430b06d0e1a300d22e5d9b2e97412bf
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/39327
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13839 kernel: new kernel [RHEL 8.3 4.18.0-240.1.1.el8_3] 36/40536/3
Jian Yu [Fri, 6 Nov 2020 09:08:08 +0000 (01:08 -0800)]
LU-13839 kernel: new kernel [RHEL 8.3 4.18.0-240.1.1.el8_3]

This patch makes changes to support new RHEL 8.3 release
for Lustre client.

Test-Parameters: trivial clientdistro=el8.3

Change-Id: I06a46735b42ac258e576b1dd5c0beb17f4fd3e47
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40536
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13344 gss: Update crypto to use sync_skcipher 86/38586/13
Shaun Tancheff [Sun, 24 May 2020 19:29:41 +0000 (14:29 -0500)]
LU-13344 gss: Update crypto to use sync_skcipher

As of linux v4.19-rc2-66-gb350bee5ea0f, the change
   crypto: skcipher - Introduce crypto_sync_skcipher

enabled the deprecation of blkcipher, which was dropped
as of linux v5.4-rc1-159-gc65058b7587f
    crypto: skcipher - remove the "blkcipher" algorithm type

Based on the existence of SYNC_SKCIPHER_REQUEST_ON_STACK,
use the sync_skcipher API or provide wrappers for the
blkcipher API.

Test-Parameters: testlist=sanity,recovery-small,sanity-sec mdscount=2 mdtcount=4 ostcount=8 clientcount=2 env=SHARED_KEY=true,SK_FLAVOR=skn
Test-Parameters: testlist=sanity,recovery-small,sanity-sec mdscount=2 mdtcount=4 ostcount=8 clientcount=2 env=SHARED_KEY=true,SK_FLAVOR=ska
Test-Parameters: testlist=sanity,recovery-small,sanity-sec mdscount=2 mdtcount=4 ostcount=8 clientcount=2 env=SHARED_KEY=true,SK_FLAVOR=ski
Test-Parameters: testlist=sanity,recovery-small,sanity-sec mdscount=2 mdtcount=4 ostcount=8 clientcount=2 env=SHARED_KEY=true,SK_FLAVOR=skpi
HPE-bug-id: LUS-8589
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I7683c20957213fd687ef5cf6dea64c842928db5b
Reviewed-on: https://review.whamcloud.com/38586
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-10059 tests: sanityn 32a error messages 96/40496/2
James Nunez [Fri, 30 Oct 2020 17:44:00 +0000 (11:44 -0600)]
LU-10059 tests: sanityn 32a error messages

There are three checks for file size in sanityn test 32a
that, if they fail, have the same error message
"wrong file size".  Let's add additional information to
the error messages to help distinguish the different errors.

Test-Parameters: trivial testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I71aae8de67125d5be93c4fb4728b2c20d26df49c
Reviewed-on: https://review.whamcloud.com/40496
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
6 months agoLU-14093 ptlrpc: redefine uop_params_off to support gcc10 86/40486/2
Alex Zhuravlev [Fri, 30 Oct 2020 07:02:18 +0000 (10:02 +0300)]
LU-14093 ptlrpc: redefine uop_params_off to support gcc10

otherwise gcc10 complains about out-of-bounds access:

llog_swab.c: In function 'lustre_swab_update_ops':
llog_swab.c:137:46: error: array subscript 65534 is outside the bounds
of an interior zero-length array '__u16[0]' {aka 'short unsigned int[]'}
[-Werror=zero-length-bounds]
  137 |    __swab16s(&uops->uops_op[i].uop_params_off[j]);
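
One common way to avoid this class of warning is a C99 flexible array
member in place of a zero-length array. The sketch below uses a reduced,
hypothetical layout (struct op_sketch is not the real struct update_op),
and whether the actual patch redefines the field exactly this way is an
assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced layout; the real structure has more fields. */
struct op_sketch {
	uint16_t op_type;
	uint16_t op_param_count;
	uint16_t op_params_off[];	/* flexible array member instead of [0] */
};

static uint16_t swab16_sketch(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Allocate one header plus room for 'nparams' trailing offsets. */
struct op_sketch *op_alloc_sketch(uint16_t nparams)
{
	return calloc(1, sizeof(struct op_sketch) +
			 nparams * sizeof(uint16_t));
}

/* Byte-swap the header and every trailing offset in place. */
void op_swab_sketch(struct op_sketch *op)
{
	uint16_t i;

	assert(op != NULL);
	op->op_type = swab16_sketch(op->op_type);
	op->op_param_count = swab16_sketch(op->op_param_count);
	/* the count is now in host order, so it can bound the loop */
	for (i = 0; i < op->op_param_count; i++)
		op->op_params_off[i] = swab16_sketch(op->op_params_off[i]);
}
```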

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I27981cbc79991cbd7a79cb90aec97bd1dc8b2f1b
Reviewed-on: https://review.whamcloud.com/40486
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14045 sec: fix O_DIRECT and encrypted files 95/40295/5
Sebastien Buisson [Mon, 19 Oct 2020 08:56:37 +0000 (08:56 +0000)]
LU-14045 sec: fix O_DIRECT and encrypted files

Sometimes, we can end up in a situation where
osc_release_bounce_pages() mistakenly considers pages to be fscrypt
bounce pages, and tries to free them.
Fix the way we identify bounce pages by always setting the PageChecked
flag on them.
Also remove sanity test_426 from the ALWAYS_EXCEPT list.

Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 52 53 54" clientdistro=ubuntu2004 fstype=ldiskfs mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 52 53 54" clientdistro=ubuntu2004 fstype=zfs mdscount=2 mdtcount=4
Fixes: 728036f256 ("LU-12275 sec: O_DIRECT for encrypted file")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic54ae031c3b0baa17ffed8a6b6b90ff44f87ff58
Reviewed-on: https://review.whamcloud.com/40295
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14039 obdclass: set LA_TYPE when update_log init 64/40264/2
Yang Sheng [Thu, 15 Oct 2020 14:57:46 +0000 (22:57 +0800)]
LU-14039 obdclass: set LA_TYPE when update_log init

Fix for LBUG:
osp_md_create()) ASSERTION( attr->la_valid & LA_TYPE ) failed:
osp_md_create()) LBUG
Pid: 6024, comm: lod0002_rec0001 3.10.0-1062.18.1.el7_lustre
Call Trace:
 libcfs_call_trace+0x8c/0xc0 [libcfs]
 lbug_with_loc+0x4c/0xa0 [libcfs]
 osp_md_create+0x42a/0x470 [osp]
 llog_osd_get_cat_list+0x8d4/0xbd0 [obdclass]
 lod_sub_prep_llog+0xb9/0x783 [lod]
 lod_sub_recovery_thread+0x383/0xcf0 [lod]
 kthread+0xd1/0xe0
 ret_from_fork_nospec_begin+0x7/0x21

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I035f3f2da5dba54b86431cec65c48e9c5010224c
Reviewed-on: https://review.whamcloud.com/40264
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14037 osd: track commit cb in flight 58/40258/5
Alex Zhuravlev [Wed, 14 Oct 2020 12:20:18 +0000 (15:20 +0300)]
LU-14037 osd: track commit cb in flight

and wait for completion in osd-ldiskfs, as mntput() is not synchronous
and returns quickly, scheduling the real umount to another thread while
osd is shutting down. As a result, commit callbacks may arrive when
osd has already been released.
The patch replaces lu_device_get() and lu_device_put() for transactions
with a dedicated counter, which is later used to wait for callback completion.
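
The counting scheme can be sketched with C11 atomics. All names here are
hypothetical, and a real implementation would sleep on a wait queue until
the counter drops to zero rather than expose a predicate; the sketch only
shows which events move the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct osd_sketch {
	atomic_int commit_cbs_inflight;	/* callbacks registered, not yet run */
};

/* A transaction registers its commit callback: one more in flight. */
void trans_start_sketch(struct osd_sketch *o)
{
	assert(o != NULL);
	atomic_fetch_add(&o->commit_cbs_inflight, 1);
}

/* The commit callback has run: one fewer in flight. */
void commit_cb_sketch(struct osd_sketch *o)
{
	assert(o != NULL);
	atomic_fetch_sub(&o->commit_cbs_inflight, 1);
}

/* Shutdown may release the device only when nothing is in flight. */
bool shutdown_may_proceed_sketch(struct osd_sketch *o)
{
	assert(o != NULL);
	return atomic_load(&o->commit_cbs_inflight) == 0;
}
```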

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iaa6089b9158bc2a4bea8e33f4cfcce27395689ca
Reviewed-on: https://review.whamcloud.com/40258
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14051 utils: flush alr batch file in thread 10/40310/7
John L. Hammond [Thu, 17 Sep 2020 20:56:55 +0000 (15:56 -0500)]
LU-14051 utils: flush alr batch file in thread

In ofd_access_log_reader, move flushing of the batch file to
the sort and print thread.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Id1e008ede6c05e24ea2e2459520d6585007acc7d
Reviewed-on: https://review.whamcloud.com/40310
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14050 utils: fix fraction output logic 34/39934/6
Alex Zhuravlev [Wed, 16 Sep 2020 19:14:42 +0000 (22:14 +0300)]
LU-14050 utils: fix fraction output logic

so that it doesn't suppress single line reports

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9e374f8e769afcadc5cecbf529fa403deb544544
Reviewed-on: https://review.whamcloud.com/39934
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-12546 mdt: abort recovery between MDTs 27/36027/10
Hongchao Zhang [Sun, 12 Apr 2020 12:40:27 +0000 (20:40 +0800)]
LU-12546 mdt: abort recovery between MDTs

Add an option to abort recovery between MDTs in case there is a
problem during recovery (e.g. MDT is missing or has broken logs),
but don't abort recovery between MDT and clients.

Change-Id: Id88f2b2ebae5cfa722dcac67c087b9b9a448721e
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36027
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
6 months agoLU-13765 osd-ldiskfs: Rename dt_declare_falloc to dt_declare_fallocate 09/40509/2
Arshad Hussain [Sun, 1 Nov 2020 03:33:13 +0000 (09:03 +0530)]
LU-13765 osd-ldiskfs: Rename dt_declare_falloc to dt_declare_fallocate

This patch is the follow up of the patch: 93f700ca24
(LU-13765 osd-ldiskfs: Extend credit correctly for fallocate) and
it makes these changes:

01. Rename dt_declare_falloc() to dt_declare_fallocate()
for better readability.

02. Remove the fallocate mode check in osd_fallocate(),
as the mode is already checked in the declare phase.

03. Minor space/tabs changes

Test-Parameters: trivial testlist=sanity ostsizegb=12 env=ONLY="150e"
Test-Parameters: testlist=sanity-quota
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: If911a59a9c944e660e9926f4c436a4aeb2919284
Reviewed-on: https://review.whamcloud.com/40509
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-10810 test: test lseek support in tools 02/40502/10
Mikhail Pershin [Sat, 31 Oct 2020 09:03:33 +0000 (12:03 +0300)]
LU-10810 test: test lseek support in tools

Check that SEEK_HOLE/SEEK_DATA are performing in external tools
as expected.

This needs 'cp' version 8.33+ and 'tar' version 1.29+, so check the tool
versions and measure the runtime of sparse file handling if applicable.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I1424bf57c88f69d054c1646be66e10dd7fde8a1a
Reviewed-on: https://review.whamcloud.com/40502
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-10810 clio: SEEK_HOLE/SEEK_DATA on client side 08/39708/16
Mikhail Pershin [Fri, 21 Aug 2020 16:22:38 +0000 (19:22 +0300)]
LU-10810 clio: SEEK_HOLE/SEEK_DATA on client side

Patch introduces basic support for lseek SEEK_HOLE/SEEK_DATA
parameters in lustre client.

- introduce new IO type CIT_LSEEK in CLIO stack
- LOV splits request to all stripes involved and merges
  results back.
- OSC sends OST LSEEK RPC asynchronously
- if target doesn't support LSEEK RPC then OSC assumes
  whole related object is data with virtual hole at the end
- lseek restores released files, assuming it is done prior to
  the file copying.
- tool is added to request needed lseek on file
- basic tests are added in sanity, sanityn and sanity-hsm
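
From an application's point of view, the feature is reached through plain
lseek(2). Below is a minimal user-space sketch, assuming a Linux/glibc
system where SEEK_DATA and SEEK_HOLE are exposed under _GNU_SOURCE;
count_data_bytes() is a hypothetical helper and not part of the patch. It
walks the data extents of a file and sums their lengths, relying on the
virtual hole that is always reported at end of file.

```c
#define _GNU_SOURCE		/* exposes SEEK_DATA / SEEK_HOLE in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Return the number of bytes lying in data extents, or -1 on error. */
long long count_data_bytes(const char *path)
{
	long long total = 0;
	off_t data, hole = 0;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	for (;;) {
		data = lseek(fd, hole, SEEK_DATA);
		if (data < 0) {
			/* ENXIO means no data at or after 'hole': done */
			if (errno != ENXIO)
				total = -1;
			break;
		}
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole <= data) {	/* error or no progress */
			total = -1;
			break;
		}
		total += hole - data;
	}
	close(fd);
	return total;
}
```

On a target without hole support the whole object is reported as data,
so the helper simply returns the file size in that case.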

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I0728329d4bce71c441de581a439cde1aa873fd46
Reviewed-on: https://review.whamcloud.com/39708
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-10810 ptlrpc: introduce OST_SEEK RPC 07/39707/9
Mikhail Pershin [Mon, 17 Aug 2020 11:06:30 +0000 (14:06 +0300)]
LU-10810 ptlrpc: introduce OST_SEEK RPC

For the purposes of SEEK_HOLE/SEEK_DATA support introduce
new OST_SEEK RPC.

The patch adds the RPC layout, a unified handler and a connect flag for
compatibility needs.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I1580902b6b773d9a6d6f9beaa1ee1da60fbc20f8
Reviewed-on: https://review.whamcloud.com/39707
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14116 autoconf: check if DES3 enctype is supported 54/40554/3
Jian Yu [Fri, 6 Nov 2020 06:31:28 +0000 (22:31 -0800)]
LU-14116 autoconf: check if DES3 enctype is supported

krb5 releases 1.18 and later completely remove support for
all DES3 enctypes (des3-cbc-raw, des3-hmac-sha1, des3-cbc-sha1-kd).

This patch adds HAVE_DES3_SUPPORT to check if DES3 enctype
is supported.

Change-Id: Ibb51ec7961e8c775ea92dec6119f4de01e2d9b1d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40554
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-13182 llite: Avoid eternal retry loops with MAP_POPULATE 21/40221/6
Oleg Drokin [Mon, 12 Oct 2020 20:10:15 +0000 (16:10 -0400)]
LU-13182 llite: Avoid eternal retry loops with MAP_POPULATE

Kernels 5.4+ have an infinite retry loop from the MAP_POPULATE mmap
option. Use FAULT_FLAG_RETRY_NOWAIT to instruct filemap_fault
not to drop the mmap_sem, so that if the call fails we can use
the slow path and prevent the loop from forming.
(Idea by Neil Brown)

Test-Parameters: testlist=sanity-hsm env=ONLY=1 clientdistro=ubuntu2004
Change-Id: I320ab9ca447282aea15ef2030ef8671c4260d895
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40221
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-13651 hsm: call hsm_find_compatible_cb() only for cancel 67/38867/18
Kirill Malkin [Sun, 17 May 2020 03:17:43 +0000 (20:17 -0700)]
LU-13651 hsm: call hsm_find_compatible_cb() only for cancel

The HSM action queue is scanned linearly in hsm_find_compatible_cb()
for existing requests on the same file so that duplicate or
conflicting requests are not added and cancel requests are assigned
the correct cookie, but this can cause a large delay in adding new
requests when the action queue is very large, as access to it is
locked for the duration of the search. Scanning the queue does not
guarantee that duplicate or conflicting requests are not added as
scanning (in hsm_find_compatible_cb()) and adding requests (in
mdt_agent_record_add()) are distinct operations that are not
serialized by a lock and so a race window exists between these two
function calls within which duplicate or conflicting requests can be
added. This is hopefully not a big problem though, as the CDT thread
will not send duplicate archive requests to a copytool serving a
different HSM backend (and it could probably be prevented from sending
duplicate archive requests to a copytool serving the same backend with
a small change in mdt_hsm_is_action_compat()) and duplicate restore
requests are serialized by taking the layout lock on the file before
being added to the action queue, which effectively serializes
them (although this blocks the caller, e.g. lfs, so it might not be
ideal). Since calling hsm_find_compatible_cb() does not protect completely
against this issue and can cause large delays in adding new requests,
we skip calling it for all requests apart from cancel requests that
don't specify a cookie (which should be all cancel requests in current
code), hopefully safely.

Test-Parameters: testlist=sanity-hsm
Signed-off-by: Kirill Malkin <kirill.malkin@hpe.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Cray-bug-id: LUS-8717
Change-Id: Id82b2a0720e46a9c12c4d9df323ce5a7bd7aff37
Reviewed-on: https://review.whamcloud.com/38867
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathan Rutman <nrutman@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13949 build: add autogen.sh into distribution tarball 25/40425/2
Jian Yu [Tue, 27 Oct 2020 17:33:03 +0000 (10:33 -0700)]
LU-13949 build: add autogen.sh into distribution tarball

This patch adds autogen.sh and config/lustre-version.m4 into
Lustre distribution tarball so that customers can regenerate
aclocal.m4, config.h.in, autoMakefile.in and configure in
their build environments.

Change-Id: Ic6c5430b9a8b504ebc6a7618e141f1ea23b046a2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40425
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
6 months agoLU-14073 build: fix autoconf test for clean_bdev_aliases() 95/40395/3
Mr NeilBrown [Mon, 26 Oct 2020 02:38:21 +0000 (13:38 +1100)]
LU-14073 build: fix autoconf test for clean_bdev_aliases()

From 5.9, buffer_head.h no longer provides a declaration for
'struct block_device' so the code fragment fails because the compiler
doesn't know the size of that structure.

Instead, simply pass NULL rather than the address of a real structure.
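
The underlying C rule can be shown in a few lines. This is a sketch with
hypothetical names (struct bdev_sketch, clean_aliases_sketch()), not the
actual autoconf fragment: an object of an incomplete type cannot be
defined, but a pointer to it, including NULL, is always usable.

```c
#include <assert.h>
#include <stddef.h>

/* Only a forward declaration is visible, so the type is incomplete:
 * "struct bdev_sketch dev;" would fail with "storage size isn't known". */
struct bdev_sketch;

/* Hypothetical stand-in for the function being probed. */
int clean_aliases_sketch(struct bdev_sketch *dev, unsigned long block,
			 unsigned long len)
{
	(void)block;
	(void)len;
	return dev == NULL ? 0 : 1;
}

/* Body of the probe: no object of the incomplete type is needed. */
int probe_sketch(void)
{
	/* the pointer size is known even though the struct size is not */
	assert(sizeof(struct bdev_sketch *) == sizeof(void *));
	return clean_aliases_sketch(NULL, 0, 0);
}
```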

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1775572fbd56d22822b6e440fe95bd105042e7b8
Reviewed-on: https://review.whamcloud.com/40395
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-14070 tgt: check obd_recovering in tgt_brw_unlock() 82/40382/4
Mikhail Pershin [Fri, 23 Oct 2020 09:30:58 +0000 (12:30 +0300)]
LU-14070 tgt: check obd_recovering in tgt_brw_unlock()

Since tgt_brw_lock() never takes a lock during recovery,
the tgt_brw_unlock() should check that also to prevent
false-positive triggering of assertion.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic6b6f6fa16678622460101d26df14f523e56a47a
Reviewed-on: https://review.whamcloud.com/40382
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14057 ptlrpc: don't log connection 'restored' inappropriately 31/40331/3
Aurelien Degremont [Fri, 16 Oct 2020 13:42:42 +0000 (13:42 +0000)]
LU-14057 ptlrpc: don't log connection 'restored' inappropriately

Reverse imports maintain a target->client connection which
does not support recovery, as clients don't run recovery.
At every connection, the reverse import state goes from
NEW to RECOVER to FULL, which triggers a `Connection restored`
log message, even if this is the first connection from
this client.

Suppress this log message for reverse imports to avoid
this misleading logging.

Test-Parameters: trivial
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: I6f35b8d916a4ae535d55ba39b491f57e1553986c
Reviewed-on: https://review.whamcloud.com/40331
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13728 utils: add missing global parameters 98/40298/2
Cyril Bordage [Mon, 19 Oct 2020 16:10:23 +0000 (18:10 +0200)]
LU-13728 utils: add missing global parameters

lnetctl export shows the complete set of global parameters as
with lnetctl global.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I4d864fb4734679106ac6c49ec7f57f5e00ba3434
Reviewed-on: https://review.whamcloud.com/40298
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-10728 utils: fix str length in error string 68/40268/3
Cyril Bordage [Fri, 16 Oct 2020 13:39:06 +0000 (15:39 +0200)]
LU-10728 utils: fix str length in error string

sizeof on pointers was used to get the length of the string. Instead,
use the string length passed in as a function argument. Also remove
useless uses of snprintf and terminating null bytes.
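
The pitfall is easy to reproduce outside of lnetctl. In the sketch below,
format_error_sketch() and its parameters are hypothetical and do not
mirror the real utility code; the buggy form is kept only as a comment
for contrast.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Buggy variant, for contrast: inside the function, sizeof(err_str) is
 * the size of a pointer (typically 8), not of the caller's buffer:
 *	snprintf(err_str, sizeof(err_str), ...);
 */

/* Fixed variant: the caller passes the real buffer length. */
int format_error_sketch(char *err_str, size_t err_len,
			const char *what, int rc)
{
	assert(err_str != NULL && err_len > 0 && what != NULL);
	return snprintf(err_str, err_len, "cannot %s: rc=%d", what, rc);
}
```

snprintf() always terminates the output when the length is non-zero, so
no separate null byte needs to be written afterwards.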

Test-Parameters: @lnet
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I7053f39828ababd5782b360ef5c27c607ddb740d
Reviewed-on: https://review.whamcloud.com/40268
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1] 65/40265/3
Jian Yu [Thu, 15 Oct 2020 19:58:01 +0000 (12:58 -0700)]
LU-14029 kernel: new kernel [SLES15 SP2 5.3.18-24.24.1]

This patch makes changes to support new SLES15 SP2 release
with kernel 5.3.18-24.24.1 for Lustre client.

Test-Parameters: trivial \
env=SANITY_EXCEPT="100 130 136 817" \
clientdistro=sles15sp2 serverdistro=el7.8 \
testlist=sanity

Change-Id: Icf97678ebb0c6495d956f13d57e0cea65a20b108
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40265
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13989 ldlm: BL AST vs failed lock enqueue race 46/40046/3
Andriy Skulysh [Tue, 11 Feb 2020 12:00:18 +0000 (14:00 +0200)]
LU-13989 ldlm: BL AST vs failed lock enqueue race

failed_lock_cleanup() marks the lock with LDLM_FL_LOCAL_ONLY,
so cancel request isn't sent.

Mark failed lock with LDLM_FL_LOCAL_ONLY only
if BL AST wasn't received.
Add server's lock handle to BL AST RPC.
So client will be able to cancel the lock
even if enqueue fails.

Change-Id: I3201bc29abd877cddc334ca27a9d208cb55c5d8f
HPE-bug-id: LUS-8493, LUS-8830
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/40046
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13984 ptlrpc: throttle RPC resend if network error 20/40020/9
Aurelien Degremont [Wed, 23 Sep 2020 19:20:08 +0000 (19:20 +0000)]
LU-13984 ptlrpc: throttle RPC resend if network error

When sending a callback AST to a non-responding client, the server
retries endlessly until the client is eventually evicted. When using
ksocklnd, it will retry after each AST timeout, until the socket is
eventually closed, after sock_timeout sec, where the retry will fail
immediately, returning -110, as no socket could be established.

The thread will spin on retrying and failing, until eventual client
eviction. This will cause high thread CPU usage and possible resource
denial.

To work around that, this patch avoids retrying the callback resend if:
 - the request is flagged with network error and timeout
 - last try was less than 1 sec ago

In the worst case, the retry will happen after a timeout based on
req->rq_deadline. If there is nothing else to handle, the thread will
sleep during that time, removing the CPU overhead.
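
The two conditions above reduce to a small predicate. This is a sketch
under stated assumptions: struct req_sketch and its fields are
hypothetical and only loosely modelled on a ptlrpc request, and
second-granularity timestamps are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct req_sketch {
	bool   net_err;		/* flagged with a network error */
	bool   timedout;	/* flagged as timed out */
	time_t last_sent;	/* when the previous attempt was made */
};

/* True if the callback may be resent right away, false if it must wait. */
bool resend_allowed_sketch(const struct req_sketch *req, time_t now)
{
	assert(req != NULL);
	if (req->net_err && req->timedout && now - req->last_sent < 1)
		return false;	/* failed under a second ago: do not spin */
	return true;
}
```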

Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: Ie5028761c978b26e833fd0a5d30d313addf57984
Reviewed-on: https://review.whamcloud.com/40020
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13961 kernel: kernel update RHEL8.2 [4.18.0-193.19.1.el8_2] 87/39987/4
Jian Yu [Tue, 27 Oct 2020 22:52:32 +0000 (15:52 -0700)]
LU-13961 kernel: kernel update RHEL8.2 [4.18.0-193.19.1.el8_2]

Update RHEL8.2 kernel to 4.18.0-193.19.1.el8_2.

Test-Parameters: trivial \
clientdistro=el8.2 serverdistro=el8.2 \
testlist=sanity

Change-Id: I32d65790adcd5829cdc4447e9b116a83bf1efd63
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39987
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14049 utils: manage thread resources in alr_batch_print() 09/40309/5
John L. Hammond [Mon, 14 Sep 2020 17:58:31 +0000 (12:58 -0500)]
LU-14049 utils: manage thread resources in alr_batch_print()

In alr_batch_print(), create the sort and print thread with the
detached attribute. Protect against concurrent writes to the batch
output file. Ensure that memory is freed in error cases.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ibf1b299bd15f5d189a2302ce476bf2ef986a85b1
Reviewed-on: https://review.whamcloud.com/40309
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13665 tests: skip sanity subtests for new features 90/39890/10
Andreas Dilger [Fri, 11 Sep 2020 23:01:57 +0000 (17:01 -0600)]
LU-13665 tests: skip sanity subtests for new features

Skip sanity.sh test_165 (OAL) and part of test_56oc (btime) during
interop testing for features that were added recently.

Skip test_56oc timestamp parsing test to avoid timezone issues in
test environment.

Fixes: 3f7853b31ef6 ("LU-10934 llite: integrate statx() API with Lustre")
Fixes: 66172e3274ca ("LU-13238 ofd: add OFD access logs")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib09b60dccb563fcedadd1da55eea11ddca6ecde5
Reviewed-on: https://review.whamcloud.com/39890
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14031 ptlrpc: decrease time between reconnection 44/40244/4
Alexander Boyko [Wed, 14 Oct 2020 08:20:58 +0000 (04:20 -0400)]
LU-14031 ptlrpc: decrease time between reconnection

When a connection gets a timeout or an error reply from a server,
the next attempt happens after PING_INTERVAL, which is equal to
obd_timeout/4. When the first reconnection fails, the second goes to
the failover pair, and the third goes back to the original server.
Only 3 reconnections happen before the server evicts the client based
on the blocking AST timeout. Sometimes the first one fails and the
last one is a bit late, so the client is evicted. It is better to try
to reconnect with a timeout equal to the connection request deadline;
this would increase the number of attempts by a factor of 5 for a
large obd_timeout. For example,
    obd_timeout=200
     - [ 1597902357, CONNECTING ]
     - [ 1597902357, FULL ]
     - [ 1597902422, DISCONN ]
     - [ 1597902422, CONNECTING ]
     - [ 1597902433, DISCONN ]
     - [ 1597902473, CONNECTING ]
     - [ 1597902473, DISCONN ] <- ENODEV from a failover pair
     - [ 1597902523, CONNECTING ]
     - [ 1597902539, DISCONN ]

The patch adds logic to wake up the pinger for a connection request
that failed with ETIMEDOUT or ENODEV. It adds imp_next_ping processing
to the ptlrpc_pinger_main() time_to_next_wake calculation, and fixes
the setting of the imp_next_ping value.
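
A minimal sketch of a wake-time calculation that takes a per-import
next-ping deadline into account, assuming the pinger sleeps until the
earliest deadline. struct import_sketch and this standalone
time_to_next_wake() are hypothetical; the 1 second lower bound is an
assumption of the sketch, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified import: only the next-ping deadline. */
struct import_sketch {
	long next_ping;  /* absolute time (sec) of the next ping/reconnect */
};

/* Seconds the pinger may sleep: the default interval, shortened if any
 * import wants attention sooner, and never less than 1 second. */
static long time_to_next_wake(const struct import_sketch *imps, size_t n,
			      long now, long ping_interval)
{
	long wake = now + ping_interval;
	long delta;

	assert(ping_interval > 0);

	for (size_t i = 0; i < n; i++)
		if (imps[i].next_ping < wake)
			wake = imps[i].next_ping;

	delta = wake - now;
	return delta < 1 ? 1 : delta;
}
```

With obd_timeout=200 the default interval is 50 sec, so an import whose
deadline is 11 sec away shortens the sleep to 11 sec.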

HPE-bug-id: LUS-8520
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ia0891a8ead1922810037f7d71092cd57c061dab9
Reviewed-on: https://review.whamcloud.com/40244
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13645 ldlm: extra checks for DOM locks 78/39878/6
Vitaly Fertman [Wed, 2 Sep 2020 17:14:06 +0000 (20:14 +0300)]
LU-13645 ldlm: extra checks for DOM locks

A couple of checks are added:
- only a DOM lock can be a group lock;
- the DOM bit must be the only mandatory one, or optional.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
HPE-bug-id: LUS-8987
Change-Id: Iaf7a14a66eb0f125d2f6f7d06f5de0add387e101
Reviewed-on: https://review.whamcloud.com/39878
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13645 ldlm: group locks for DOM IBIT lock 06/39406/7
Vitaly Fertman [Tue, 4 Aug 2020 12:12:04 +0000 (15:12 +0300)]
LU-13645 ldlm: group locks for DOM IBIT lock

A group lock is supposed to be taken for operations such as layout
swap (used e.g. for HSM), and is to be taken for DOM locks as well.

HPE-bug-id: LUS-8987
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I97888e1aee853d7fe04548681b2ed6805cb494ae
Reviewed-on: https://review.whamcloud.com/39406
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13666 lod: update .do_index_ops on layout detach 26/39226/2
Lai Siyao [Wed, 1 Jul 2020 14:03:19 +0000 (22:03 +0800)]
LU-13666 lod: update .do_index_ops on layout detach

Directory migration detaches stripes from the source, and then attaches
them to the target if the source is a striped directory. This converts
the source from a striped directory to a plain directory, so
.do_index_ops needs to be updated from lod_striped_index_ops to
lod_index_ops to avoid triggering an assertion in the index ops.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia8f66a8a3fd5e96f0dba4d60eb2443107d320418
Reviewed-on: https://review.whamcloud.com/39226
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-11631 mdd: migrate symlink for cross-MDT rename 97/39897/3
Lai Siyao [Wed, 9 Sep 2020 21:50:11 +0000 (05:50 +0800)]
LU-11631 mdd: migrate symlink for cross-MDT rename

If a symlink rename crosses MDTs, it's ineffective to turn this
symlink into a remote object; instead, migrate it to the target
MDT. The following changes are made:
* change migration code to allow source and target to have different
  names.
* if a symlink is renamed to another MDT, has no other hard link,
  and the target doesn't exist, migrate it to the target MDT.
* remove mdd_rename_order(), which is obsolete.

Add sanity 24G.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib5fafe3122172ac582bbcc907c72a9f391baf0e1
Reviewed-on: https://review.whamcloud.com/39897
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
6 months agoLU-13692 ldlm: Ensure we reprocess the resource on ast error 98/39598/8
Oleg Drokin [Fri, 7 Aug 2020 07:38:51 +0000 (03:38 -0400)]
LU-13692 ldlm: Ensure we reprocess the resource on ast error

When we are trying to grant a lock and hit an AST error, rerunning
the policy is pointless, since it cannot grant a potentially now-eligible
lock and our lock is already in all the queues. Instead, behave like all
the other handlers for an ERESTART return and run a full resource
reprocess.

Change-Id: I3edb37bf084b2e26ba03cf2079d3358779c84b6e
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39598
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
6 months agoLU-14069 ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait 75/40375/3
Oleg Drokin [Fri, 23 Oct 2020 06:56:04 +0000 (02:56 -0400)]
LU-14069 ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait

In ldlm_handle_cp_callback() the while loop is clearly supposed
to be limited by the "to" value of 1 second, but is not.
This seems to have been broken by all the Solaris porting in HEAD
all the way back in 2008.
Restore the "to" assignment so that the loop does not hang indefinitely.
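
A userspace sketch of a wait loop that is bounded only because the
remaining budget is assigned back on every iteration. sleep_some() is a
hypothetical stand-in for a sleep primitive that returns the time left;
none of these names come from the Lustre source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sleep primitive: consumes up to 10 ticks of the budget
 * and returns what is left. */
static long sleep_some(long budget)
{
	return budget > 10 ? budget - 10 : 0;
}

/* Poll cond() until it holds or the budget runs out.
 * Returns the number of sleeps that were needed. */
static int bounded_wait(bool (*cond)(void), long budget)
{
	int sleeps = 0;
	long to = budget;

	assert(cond != NULL);

	while (to > 0) {
		/* Without this assignment "to" never shrinks and the
		 * loop can spin forever if cond() stays false. */
		to = sleep_some(to);
		sleeps++;
		if (cond())
			break;
	}
	return sleeps;
}

/* Example predicates for exercising the loop. */
static bool never_true(void)  { return false; }
static bool always_true(void) { return true; }
```

If the return value of sleep_some() is dropped, "to" keeps its initial
value and the loop only ends when cond() becomes true.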

Change-Id: I449bfd7f585ab7db475fb3fd4cbbd876126ff789
Fixes: adde80ff ("Land b_head_libcfs onto HEAD")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40375
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-1742 o2iblnd: 'Timed out tx' error message 22/3622/4
Brian Behlendorf [Mon, 13 Aug 2012 23:58:20 +0000 (16:58 -0700)]
LU-1742 o2iblnd: 'Timed out tx' error message

Trivial fix to report the total RDMA time outstanding rather
than the number of seconds past the deadline.

Change-Id: I0ef9b7b9b31a4d27adf4f3f33da46c503e5ca49e
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/3622
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14010 build: Ensure dkms installs all Lustre modules 25/40225/2
Amin Tootoonchian [Mon, 12 Oct 2020 21:07:50 +0000 (16:07 -0500)]
LU-14010 build: Ensure dkms installs all Lustre modules

Add --force to dkms install in:
  debian/lustre-client-modules-dkms.postinst

Without it, modules that are older than the ones already available
are skipped.

Signed-off-by: Amin Tootoonchian <amint@openai.com>
Change-Id: I1d549e7d48d60294810e11ed2588a512f1527eda
Reviewed-on: https://review.whamcloud.com/40225
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13437 llite: pass name in getattr by FID 19/40219/6
Lai Siyao [Mon, 12 Oct 2020 14:22:07 +0000 (22:22 +0800)]
LU-13437 llite: pass name in getattr by FID

Now that the parent FID is packed in the getattr_by_FID request
(see https://review.whamcloud.com/39290), the name should also be passed
in from llite, so that lmv can replace fid1 with the stripe FID; otherwise
the MDS may treat sub files under a striped directory as remote objects.

Note that the name is not packed in the request, because if it were
packed, the MDS would getattr by name instead of by FID.

Fixes: 5f2c44bf6 ("LU-13437 llite: pack parent FID in getattr")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If8215667bcb10ea3c4c5cd2c9034d81fd1cda3b5
Reviewed-on: https://review.whamcloud.com/40219
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13437 mdc: remote object support getattr from cache 18/40218/4
Lai Siyao [Sat, 10 Oct 2020 14:34:19 +0000 (22:34 +0800)]
LU-13437 mdc: remote object support getattr from cache

For historical reasons, IT_GETATTR lock revalidation matches
LOOKUP|UPDATE|PERM lock bits, because for MDS < 2.4 permission is
protected by the LOOKUP lock. However, this makes a remote object
unable to match the cached lock, because the LOOKUP and UPDATE locks
are fetched separately.

Add sanity 803b, and rename 803 to 803a.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3ac38fe34472736849307bb7f1eebb5de9343a5c
Reviewed-on: https://review.whamcloud.com/40218
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14016 libcfs: use atomic64_t for libcfs_kmem 68/40168/5
Amir Shehata [Wed, 7 Oct 2020 21:27:14 +0000 (14:27 -0700)]
LU-14016 libcfs: use atomic64_t for libcfs_kmem

libcfs_kmem keeps track of LNet's memory usage. It uses an
int type, so it could wrap around if usage grows beyond 2.14 GB.
Use atomic64_t to avoid this issue.
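
A userspace sketch of the same idea, using a C11 64-bit atomic as a
stand-in for the kernel's atomic64_t. The kmem_* helpers and
fits_in_int32() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for a 64-bit atomic byte counter. */
static _Atomic int64_t kmem_bytes;

static void kmem_add(int64_t nob)
{
	assert(nob >= 0);
	atomic_fetch_add(&kmem_bytes, nob);
}

static void kmem_sub(int64_t nob)
{
	assert(nob >= 0);
	atomic_fetch_sub(&kmem_bytes, nob);
}

static int64_t kmem_read(void)
{
	return atomic_load(&kmem_bytes);
}

/* True if a byte count could still be held by a 32-bit signed int. */
static bool fits_in_int32(int64_t bytes)
{
	return bytes >= INT32_MIN && bytes <= INT32_MAX;
}
```

Tracking 3 GiB of allocations stays exact in the 64-bit counter, while
the same value no longer fits in a 32-bit signed int.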

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If96fb8391c6ffb1924e47cef3dfca02eabc5f912
Reviewed-on: https://review.whamcloud.com/40168
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13498 tests: remove tests from ALWAYS_EXCEPT with SSK 61/40161/8
Sebastien Buisson [Wed, 7 Oct 2020 06:36:49 +0000 (08:36 +0200)]
LU-13498 tests: remove tests from ALWAYS_EXCEPT with SSK

A number of tests had previously been added to ALWAYS_EXCEPT
when SHARED_KEY was in use.
These tests are now passing with SSK, so remove them from the
exception list.

Test-Parameters: trivial
Test-Parameters: env=SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity,recovery-small,replay-dual,replay-single,sanity-hsm,conf-sanity
Test-Parameters: env=SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity,recovery-small,replay-dual,replay-single,sanity-hsm,conf-sanity
Test-Parameters: env=SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity,recovery-small,replay-dual,replay-single,sanity-hsm,conf-sanity
Test-Parameters: env=SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity,recovery-small,replay-dual,replay-single,sanity-hsm,conf-sanity
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: If72b212a23b915afdb723acf7254908e1c043e07
Reviewed-on: https://review.whamcloud.com/40161
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
6 months agoLU-13824 test: test sanity 230q with fewer files on ZFS 28/39528/4
Lai Siyao [Tue, 28 Jul 2020 13:44:09 +0000 (21:44 +0800)]
LU-13824 test: test sanity 230q with fewer files on ZFS

Sanity 230q may time out on the ZFS backend, so test with fewer files.

Test-Parameters: trivial fstype=zfs testlist=sanity mdscount=2 mdtcount=4 env=ONLY=230q,ONLY_REPEAT=100
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Iaf9e4e6d68244937819305af72df33e59df19f1f
Reviewed-on: https://review.whamcloud.com/39528
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14031 ptlrpc: remove unused code at pinger 43/40243/3
Alexander Boyko [Wed, 14 Oct 2020 07:45:21 +0000 (03:45 -0400)]
LU-14031 ptlrpc: remove unused code at pinger

The timeout_list was previously used for grant shrinking,
but right now is dead code.

HPE-bug-id: LUS-8520
Fixes: fc915a43786e ("LU-8708 osc: depart grant shrinking from pinger")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ia7a77b4ac19da768ebe1b0879d7123941f4490b5
Reviewed-on: https://review.whamcloud.com/40243
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13740 tests: improve sanity-sec test_45 46/40146/2
Sebastien Buisson [Tue, 6 Oct 2020 09:43:19 +0000 (11:43 +0200)]
LU-13740 tests: improve sanity-sec test_45

Improve sanity-sec test_45 by referencing the entire mmap-ed region
thanks to multiop.
Also make sure encryption tests are passing on newly supported
Ubuntu 20.04 distro.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 52 53 54" clientdistro=ubuntu2004 fstype=ldiskfs mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 52 53 54" clientdistro=ubuntu2004 fstype=zfs mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 52 53 54" clientdistro=el8.1 fstype=ldiskfs mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 52 53 54" clientdistro=el8.1 fstype=zfs mdscount=2 mdtcount=4
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I983b6bf94d9f51486fd6b688267af46ed4188a98
Reviewed-on: https://review.whamcloud.com/40146
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-13765 osd-ldiskfs: Extend credit correctly for fallocate 42/39342/18
Arshad Hussain [Wed, 9 Sep 2020 23:18:13 +0000 (04:48 +0530)]
LU-13765 osd-ldiskfs: Extend credit correctly for fallocate

In the OSD layer, before calling ->fallocate(), Lustre has already
created a journal handle for the fallocate transaction. In
ldiskfs/ext4, a fallocate over a very large range may be split into
multiple transactions, calling journal start/stop multiple times
inside fallocate. However, a nested journal start ignores the
requested credits, which results in running out of credits at the
end.

As we cannot predict the total number of credits needed in advance,
especially for a large fallocate, this patch moves the fallocate
logic into the Lustre OSD so that it can reserve credits correctly.
It extends the credits of the current transaction when the remaining
buffer credits are fewer than needed, and then restarts the
transaction.
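
The chunked approach can be sketched with a toy credit counter. This is
a simplified model, not the OSD code: struct handle_sketch,
ensure_credits() and fallocate_sketch() are hypothetical, and extending
or restarting the handle is modelled as simply topping up the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical journal handle: only tracks remaining buffer credits. */
struct handle_sketch {
	int credits_left;
	int extends;  /* how many times the credits had to be topped up */
};

/* Make sure at least 'needed' credits are available for the next chunk. */
static void ensure_credits(struct handle_sketch *h, int needed)
{
	if (h->credits_left < needed) {
		h->credits_left = needed;  /* models extend/restart */
		h->extends++;
	}
}

/* Allocate 'len' blocks in chunks of at most 'chunk' blocks, spending
 * 'per_chunk' credits on each chunk. Returns the number of chunks. */
static int fallocate_sketch(struct handle_sketch *h, long len, long chunk,
			    int per_chunk)
{
	int chunks = 0;

	assert(h != NULL && chunk > 0 && per_chunk > 0);

	while (len > 0) {
		long n = len < chunk ? len : chunk;

		ensure_credits(h, per_chunk);
		h->credits_left -= per_chunk;
		len -= n;
		chunks++;
	}
	return chunks;
}
```

Checking the remaining credits before each chunk means the total does
not have to be known up front.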

Test cases sanity/150e and sanity-quota/1h are added to verify the
issue.

Test-Parameters: trivial testlist=sanity ostsizegb=12 env=ONLY="150e"
Test-Parameters: testlist=sanity-quota
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ib7565ed2c1ae72eef4832fbcb710e0ee70c53aec
Reviewed-on: https://review.whamcloud.com/39342
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13719 lov: doesn't check lov_refcount 02/39702/2
Hongchao Zhang [Fri, 21 Aug 2020 10:17:12 +0000 (18:17 +0800)]
LU-13719 lov: doesn't check lov_refcount

In lov_cleanup, the check of each OSC is protected by
lov_tgt_getrefs, which will increment the "lov_refcount",
so the "lov_refcount" shouldn't be checked inside because
it is always larger than 0.

Change-Id: I21423d4345190b3e02eb00734c127e35cbc9b1af
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39702
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-12214 build: fix suse require krb5 14/40214/6
Minh Diep [Sun, 11 Oct 2020 22:50:46 +0000 (15:50 -0700)]
LU-12214 build: fix suse require krb5

Test-Parameters: trivial clientdistro=sles15sp1

Change-Id: If5bbe77bda84381b363c733f763cfc81e29aedb7
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40214
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
6 months agoLU-13745 tests: skip sanity test_426 for 4.15+ 66/40366/2
John L. Hammond [Thu, 22 Oct 2020 22:09:03 +0000 (17:09 -0500)]
LU-13745 tests: skip sanity test_426 for 4.15+

Add sanity test_426 to ALWAYS_EXCEPT for newer client kernels because
it has been crashing 100% of the time since "LU-13745 test: add splice
test for lustre" landed.

Test-Parameters: trivial clientdistro=ubuntu1804
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I84a722a27c3e8a572c20b46ca9daaf44e8720b54
Reviewed-on: https://review.whamcloud.com/40366
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
6 months agoLU-14067 test: skip compile tests on aarch64 65/40365/3
John L. Hammond [Thu, 22 Oct 2020 21:57:47 +0000 (16:57 -0500)]
LU-14067 test: skip compile tests on aarch64

aarch64 gcc segfaults trying to compile our headers so skip sanity
400a, 400b and sanity-lnet 300 on aarch64.

Test-Parameters: trivial clientdistro=el8.1 clientarch=aarch64
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8322107919084c86a0cb6fc15730a49f96c03b22
Reviewed-on: https://review.whamcloud.com/40365
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
6 months agoLU-13636 osd: create agent inode with explicit owner 42/38842/5
Alex Zhuravlev [Fri, 5 Jun 2020 05:16:32 +0000 (08:16 +0300)]
LU-13636 osd: create agent inode with explicit owner

to avoid quota misaccounting.

Test-Parameters: fstype=ldiskfs
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5a02e6e7de71821a10704ac3516ee087998c9c21
Reviewed-on: https://review.whamcloud.com/38842
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13498 gss: update sequence in case of target disconnect 22/40122/5
Sebastien Buisson [Fri, 2 Oct 2020 12:05:55 +0000 (21:05 +0900)]
LU-13498 gss: update sequence in case of target disconnect

Client to OST connections can go idle, leading to target disconnect.
In this event, maintaining the correct sequence number ensures that GSS
does not erroneously consider requests as replays.
The sequence is normally updated on export destroy, but this can occur too
late, i.e. after a new target connect request has been processed. So
explicitly update the sec context at disconnect time.

Test-Parameters: env=SHARED_KEY=true,SK_FLAVOR=skn mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity,recovery-small,sanity-sec
Test-Parameters: env=SHARED_KEY=true,SK_FLAVOR=ska mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity,recovery-small,sanity-sec
Test-Parameters: env=SHARED_KEY=true,SK_FLAVOR=ski mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity,recovery-small,sanity-sec
Test-Parameters: env=SHARED_KEY=true,SK_FLAVOR=skpi mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity,recovery-small,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I65c27e1ab459b2a29670580121ef6e1a00f18918
Reviewed-on: https://review.whamcloud.com/40122
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13745 tests: skip sanity test_426 for 4.18+ 26/40326/3
Andreas Dilger [Wed, 21 Oct 2020 02:42:29 +0000 (20:42 -0600)]
LU-13745 tests: skip sanity test_426 for 4.18+

Add sanity test_426 to ALWAYS_EXCEPT for newer client kernels because
it has been crashing 100% of the time since "LU-13745 test: add splice
test for lustre" landed.

 Unable to handle NULL pointer dereference at address 0000000000000004
 user pgtable: 64k pages, 48-bit VAs, pgdp = 000000009f14b2d0
 Internal error: Oops: 96000005 [#1] SMP
 CPU: 1 PID: 11273 Comm: ptlrpcd_01_01 4.18.0-147.8.1.el8_1.aarch64 #1
 Process ptlrpcd_01_01 (pid: 11273, stack limit = 0x00000000f9135a93)
 Call trace:
  mempool_free+0x24/0xe0
  llcrypt_free_bounce_page.part.1+0x38/0x48 [libcfs]
  llcrypt_free_bounce_page+0x24/0x30 [libcfs]
  brw_interpret+0x124/0x10c8 [osc]
  ptlrpc_check_set+0x688/0x3318 [ptlrpc]
  ptlrpcd_check+0x470/0x820 [ptlrpc]
  ptlrpcd+0x3d4/0x5c8 [ptlrpc]
  kthread+0x130/0x138

Test-Parameters: trivial clientdistro=el8.1 clientarch=aarch64
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8f7b1d5e3ee69a3e0a6dfe3944949741a74cb62a
Reviewed-on: https://review.whamcloud.com/40326
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>