Whamcloud - gitweb
fs/lustre-release.git
5 months ago  LU-10391 socklnd: don't deref lnet_hdr in LNDs 02/43602/12
Mr NeilBrown [Mon, 11 May 2020 03:52:34 +0000 (13:52 +1000)]
LU-10391 socklnd: don't deref lnet_hdr in LNDs

The lnet_hdr structure needs to be extended to support larger
addresses.  To assist this we need to minimize the number of places
where its contents are accessed.

Currently the internals of lnet_hdr are largely untouched inside the
various LNDs, but there are some exceptions in socklnd.
These exceptions are not necessary - the same data is available from
elsewhere in the lnet_msg.

So change those accesses to use the lnet_msg info instead.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ia37548323fafc77df7a42a1ac956c926f1b9ebf9
Reviewed-on: https://review.whamcloud.com/43602
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-10391 socklnd: prepare for new KSOCK_MSG type 01/43601/12
Mr NeilBrown [Mon, 11 May 2020 01:06:11 +0000 (11:06 +1000)]
LU-10391 socklnd: prepare for new KSOCK_MSG type

Various places in socklnd assume there are only two message types:
KSOCK_MSG_NOOP and KSOCK_MSG_LNET.  We will soon add another type to
support a new lnet_hdr type with large addresses.
So do some cleanup first:

- get rid of ksock_lnet_msg - it doesn't add anything to lnet_hdr
- separate out 'struct ksock_hdr'.  We often want the size of this
  header, and instead request the offset of a field in ksock_msg.
- introduce switch statements in a couple of places to handle the
  different types of ksock_msg.
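
As a rough illustration of the last two points, the sketch below
separates a common header from the per-type payload and uses a switch
to size each message type.  All names, field layouts and type values
here are hypothetical stand-ins, not the real socklnd definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message types; the real KSOCK_MSG_* values may differ. */
enum demo_msg_type { DEMO_MSG_NOOP = 1, DEMO_MSG_LNET = 2 };

/* Common header shared by every message type (illustrative layout). */
struct demo_hdr {
	uint32_t type;
	uint32_t csum;
	uint64_t zc_cookies[2];
};

/* Opaque stand-in for the LNet header payload. */
struct demo_lnet_hdr {
	uint8_t bytes[72];
};

struct demo_msg {
	struct demo_hdr hdr;
	union {
		struct demo_lnet_hdr lnet;
	} u;
};

/* Bytes occupied by a message of the given type, or 0 if unknown. */
size_t demo_msg_size(uint32_t type)
{
	switch (type) {
	case DEMO_MSG_NOOP:
		return sizeof(struct demo_hdr);
	case DEMO_MSG_LNET:
		return sizeof(struct demo_hdr) + sizeof(struct demo_lnet_hdr);
	default:
		/* a new message type would add its own case here */
		return 0;
	}
}
```

With a dedicated header struct, sizeof(struct demo_hdr) replaces the
offsetof() idiom, and a new message type only needs a new case.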

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ibe484f76757c4100b8532cef659c3cc369b658ba
Reviewed-on: https://review.whamcloud.com/43601
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-10391 lnet: use large nids in struct lnet_event 00/43600/12
Mr NeilBrown [Tue, 30 Nov 2021 14:51:37 +0000 (09:51 -0500)]
LU-10391 lnet: use large nids in struct lnet_event

All nids, including those in process_id, are changed to
struct lnet_nid / struct lnet_processid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I799dbbc22f7cfe403f07eb22f4bfc4e4b5dc23ea
Reviewed-on: https://review.whamcloud.com/43600
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-10391 lnet: Change lnet_send() to take large-addr nids 99/43599/11
Mr NeilBrown [Tue, 30 Nov 2021 14:48:37 +0000 (09:48 -0500)]
LU-10391 lnet: Change lnet_send() to take large-addr nids

The src and rtr nids passed to lnet_send() are now pointers to a
'struct lnet_nid'.  NULL can be passed for the rtr nid, which is
treated the same as ANY.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id216b82ed6e2dcd81114859a7f964e0680057ff1
Reviewed-on: https://review.whamcloud.com/43599
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-10391 lnet: extend nids in struct lnet_msg 98/43598/13
Mr NeilBrown [Fri, 7 Jan 2022 01:07:45 +0000 (20:07 -0500)]
LU-10391 lnet: extend nids in struct lnet_msg

struct lnet_msg contains 3 nids and one process_id (which itself
contains a nid).  Replace each of these with the 'struct lnet_nid'
version.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic6233d36bafda364894d89b2e2b055538a6033f5
Reviewed-on: https://review.whamcloud.com/43598
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-11596 osc: Fix and re-enable sanity grant test for ARM 58/40758/19
James Simmons [Sat, 20 Nov 2021 13:58:53 +0000 (08:58 -0500)]
LU-11596 osc: Fix and re-enable sanity grant test for ARM

If both OST and OSC support OBD_CONNECT_GRANT_PARAM, OST side will not
change client side claimed grant (a.k.a. o_grant_used) regardless of
the client page size. So no grant loss in this case.

Fixes: bd1e41672c97 ("LU-2049 grant: add support for OBD_CONNECT_GRANT_PARAM")
Change-Id: Ia0d3da587cb551400fec0c054dc65b116e6bd95b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/40758
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-11597 test: Fix sanityn 16a failed on arm 89/37589/15
Wang Shilong [Thu, 6 Jan 2022 14:02:24 +0000 (09:02 -0500)]
LU-11597 test: Fix sanityn 16a failed on arm

O_DIRECT now expects I/O to be aligned to the page size.  On x86_64
that is 4K, but on some other platforms it can be 64K, so use
PAGE_SIZE here to make the test pass.

According to the open(2) man page[1], the O_DIRECT macro is defined
when _GNU_SOURCE is defined, and _GNU_SOURCE is already defined at
the top of fsx.c.  So set OP_DIRECT to O_DIRECT instead of hardcoding
its value, because O_DIRECT can have a different value on other
platforms such as Arm64[2].

[1]
https://man7.org/linux/man-pages/man2/open.2.html
"The O_DIRECT, O_NOATIME, O_PATH, and O_TMPFILE flags are Linux-
 specific.  One must define _GNU_SOURCE to obtain their definitions."
[2]
https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/aarch64/bits/fcntl.h.html#_M/__O_DIRECT
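
A minimal sketch of both points, assuming a Linux/glibc userspace
program (this is not the fsx.c code; alloc_page_buffer() and
open_direct() are hypothetical helper names).  The page size is
queried at run time and the flag comes from the system headers rather
than from a hardcoded number.

```c
#define _GNU_SOURCE	/* must precede the includes to expose O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate one page-sized buffer aligned to the page size.
 * Returns NULL on failure; on success *len_out holds the size.
 */
void *alloc_page_buffer(size_t *len_out)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf = NULL;

	if (page <= 0)
		return NULL;
	if (posix_memalign(&buf, (size_t)page, (size_t)page) != 0)
		return NULL;
	*len_out = (size_t)page;
	return buf;
}

/* Open a file for direct I/O using the platform's own flag value. */
int open_direct(const char *path)
{
	return open(path, O_RDWR | O_CREAT | O_DIRECT, 0644);
}
```

The same source then works unchanged on 4K and 64K page systems.  Note
that some filesystems reject O_DIRECT, so open_direct() can fail with
EINVAL even when the code is correct.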

Test-Parameters: testlist=sanityn envdefinitions=ONLY=16a
Fixes: 853d180121a6 ("LU-3606 fsx: Add fallocate operation to fsx")
Change-Id: If72d434adaf91a960dfc50c557d8b50793fda575
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/37589
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15407 test: remove dummy enc key at cleanup 38/46038/3
Sebastien Buisson [Tue, 11 Jan 2022 07:27:42 +0000 (08:27 +0100)]
LU-15407 test: remove dummy enc key at cleanup

Make sure to remove the dummy encryption key from session keyring
when cleaning up encryption tests.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I840490fca0a485110d077fe85254ced817fd55e3
Reviewed-on: https://review.whamcloud.com/46038
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-10073 tests: re-enable lnet selftest smoke test 4.4+ kernels 37/46037/2
James Simmons [Mon, 10 Jan 2022 23:39:37 +0000 (18:39 -0500)]
LU-10073 tests: re-enable lnet selftest smoke test 4.4+ kernels

LNet selftest smoke test was at one time failing for kernels
4.4+. My testing on newer Ubuntu 5.X kernels shows this is now
working so re-enable it in general on the x86 platform.

Test-Parameters: trivial testlist=lnet-selftest

Change-Id: I865ffa868d05c22f2cf53c5e978ab8be9e450e99
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46037
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15364 ldlm: Kernel oops when stripe on Arm64 multiple MDTs 22/45922/5
Kevin Zhao [Wed, 22 Dec 2021 01:53:27 +0000 (09:53 +0800)]
LU-15364 ldlm: Kernel oops when stripe on Arm64 multiple MDTs

When set up with multiple MDTs, the `set_bit` operation must be
atomic.  On the Arm64 platform, atomic operations rely on exclusive
access, which requires an aligned address[1].  That is why the crash
is reported at __ll_sc_atomic64_or+0x4, which is the LDXR
instruction that loads the value from the address exclusively.

atomic64 requires a 64-bit aligned address, but the struct member
ha_map is only 4-byte aligned; that is the root cause.  The error
code of this crash is ESR = 0x96000021, which indicates an
alignment fault[2].

1. https://developer.arm.com/documentation/den0024/a/ch05s01s02
2. https://developer.arm.com/documentation/ddi0595/2021-06/
   AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-
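
The alignment problem can be illustrated in plain C11.  The structures
below are hypothetical and much smaller than the real one, and forcing
the alignment with alignas is only one possible remedy, not necessarily
what this patch does.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* A 32-bit bitmap after a 32-bit field lands at offset 4. */
struct map_unaligned {
	uint32_t count;
	uint32_t map[4];
};

/* Forcing 8-byte alignment makes 64-bit exclusive access safe. */
struct map_aligned {
	uint32_t count;
	alignas(8) uint32_t map[4];
};

/* Return non-zero if p is a multiple of a. */
int is_aligned(const void *p, size_t a)
{
	return ((uintptr_t)p % a) == 0;
}
```

Here offsetof(struct map_unaligned, map) is 4, so a 64-bit atomic on
that member would use a misaligned address, while the aligned variant
places the member at offset 8.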

Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Change-Id: I3cc6d7347f05680ab55f00538e91886f006deb5d
Reviewed-on: https://review.whamcloud.com/45922
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15406 sec: fix in-kernel fscrypt support 87/45987/4
Sebastien Buisson [Thu, 6 Jan 2022 09:18:20 +0000 (10:18 +0100)]
LU-15406 sec: fix in-kernel fscrypt support

When using in-kernel fscrypt provided by Linux 5.4, the encryption
context can be retrieved by calling the .get_context function defined
in the struct fscrypt_operations of the super_block.
llite needs to retrieve the encryption context explicitly in case of
migration via volatile files.

Fixes: 09c558d16f ("LU-14677 sec: migrate/extend/split on encrypted file")
Fixes: fdbf2ffd41 ("LU-14677 sec: no encryption key migrate/extend/resync/split")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I76dbd21f0dc95920519ea375c583bc378d7c9f53
Reviewed-on: https://review.whamcloud.com/45987
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-13799 llite: Implement lower/upper aio 09/44209/15
Patrick Farrell [Fri, 30 Jul 2021 16:12:05 +0000 (12:12 -0400)]
LU-13799 llite: Implement lower/upper aio

This patch creates a lower level aio struct for each set of
pages submitted, and attaches that to the llite level aio.

That means the completion of i/o (in the sense of
successful RPC/page completion) is associated with the
lower level aio struct, and the higher level aio waits for
the completion of these lower level structs.  Previously,
all pages were associated with the upper level (and only)
aio struct.

This patch is a reorganization/cleanup, which is necessary
for the next patch, which moves release pages to aio_end.
The justification for this (correctness and performance)
will be provided in that patch.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I02d6a33a0d9f9bbc1a182bcd539bd836c240bcc5
Reviewed-on: https://review.whamcloud.com/44209
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-13799 osc: Always set aio in anchor 53/44153/9
Patrick Farrell [Fri, 30 Jul 2021 16:11:37 +0000 (12:11 -0400)]
LU-13799 osc: Always set aio in anchor

We currently do not set csi_aio for DIO and use this to
control when we free the aio struct.  (For AIO, we must
free it in cl_sync_io_note, but for other users, we have to
wait until after cl_sync_io_wait has been called.)

The lack of csi_aio causes trouble for the implementation
of the next patch, so instead we always set it and control
freeing by checking at that time if we are doing DIO.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I2122a6a2dad33179e9114494b53c09d0b64f0fa6
Reviewed-on: https://review.whamcloud.com/44153
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-13799 llite: Simplify cda_no_aio_complete use 54/44154/8
Patrick Farrell [Fri, 30 Jul 2021 16:11:03 +0000 (12:11 -0400)]
LU-13799 llite: Simplify cda_no_aio_complete use

It is better to handle AIO and DIO the same as much as
possible, limiting the difference to setup if possible.

In this spirit, move the check for DIO (is_sync_kiocb()) to
the setup function rather than cleanup and just use
no_aio_complete.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1b91e5b8f42971cb37780597402c4ee94f82a963
Reviewed-on: https://review.whamcloud.com/44154
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15417 build: build MOFED 5.5 92/45992/3
Minh Diep [Thu, 6 Jan 2022 21:02:39 +0000 (13:02 -0800)]
LU-15417 build: build MOFED 5.5

The path to the MOFED header files has changed to
/usr/src/ofa_kernel/x86_64/<kernel>
so we cannot assume it's /usr/src/ofa_kernel/default

Test-Parameters: trivial
Change-Id: I10f375b459f04b84003e70951e4e423295001f40
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45992
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15396 osd: include linux/file.h 23/45923/2
Alex Zhuravlev [Thu, 23 Dec 2021 08:10:21 +0000 (11:10 +0300)]
LU-15396 osd: include linux/file.h

in some 4.x kernels we need to include linux/file.h to have
alloc_file() defined.

Fixes: b0f150eba4 ("LU-13783 osd-ldiskfs: use alloc_file_pseudo to create fake files")
Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I279945f70578030bf581fa2afc0ca7b4dfa83653
Reviewed-on: https://review.whamcloud.com/45923
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15358 tests: Variable incorrectly defined in sanity-quota 20/45820/2
Arshad Hussain [Fri, 10 Dec 2021 09:03:58 +0000 (14:33 +0530)]
LU-15358 tests: Variable incorrectly defined in sanity-quota

Under sanity-quota.sh local variable 'accnt_cnt' was
incorrectly defined. This was exposed using
shellcheck.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In ./lustre/tests/sanity-quota.sh line 3344:
local accnt_cnt
      ^-- SC2034: accnt_cnt appears unused. Verify it or export it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Change-Id: Ib5971e7cc95b03c1f57411c6f02156ab236babcd
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/45820
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago  LU-15410 tests: Add MDS Space check for dom-performance 73/45973/2
Arshad Hussain [Wed, 5 Jan 2022 11:01:05 +0000 (06:01 -0500)]
LU-15410 tests: Add MDS Space check for dom-performance

The IOR test within dom-performance requires an MDS of at
least 20GB. This patch adds an MDS space check to
dom-performance/test_IOR so that the test is skipped when an
MDS of the required size is not found.

Test-Parameters: trivial testlist=dom-performance
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I493a6ee5b549539b562aeda418a7418b94060ca9
Reviewed-on: https://review.whamcloud.com/45973
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
5 months ago  LU-15408 sec: confirm encrypted file's hash 64/45964/2
Sebastien Buisson [Tue, 4 Jan 2022 17:16:47 +0000 (18:16 +0100)]
LU-15408 sec: confirm encrypted file's hash

It is a good practice to always confirm on server side the encrypted
file's hash included in the digested form sent by the client.

Fixes: ed4a625d88 ("LU-13717 sec: filename encryption - digest support")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I42212a36b23e4e6e41184a78fa8244c5e2d8dd1f
Reviewed-on: https://review.whamcloud.com/45964
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15220 utils: fix gcc-11 -Werror=mismatched-dealloc error 14/45814/3
Jian Yu [Thu, 9 Dec 2021 19:18:13 +0000 (11:18 -0800)]
LU-15220 utils: fix gcc-11 -Werror=mismatched-dealloc error

This patch fixes the following -Werror=mismatched-dealloc error in
lustre_rsync.c:

lustre_rsync.c: In function ‘lr_locate_rsync’:
lustre_rsync.c:1472:17: error: ‘fclose’ called on pointer returned
from a mismatched allocation function [-Werror=mismatched-dealloc]
 1472 |                 fclose(fp);
      |                 ^~~~~~~~~~
lustre_rsync.c:1467:14: note: returned from ‘popen’
 1467 |         fp = popen(rsync, "r");
      |              ^~~~~~~~~~~~~~~~~
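
The rule behind this warning is that a stream returned by popen() must
be released with pclose(), not fclose().  A minimal sketch of the
correct pairing follows; count_output_lines() is a hypothetical helper
and is not taken from lustre_rsync.c.

```c
#define _POSIX_C_SOURCE 200809L	/* for popen()/pclose() */
#include <stdio.h>

/*
 * Run a shell command and count the lines it prints.
 * Returns the line count, or -1 if the command could not be run.
 */
int count_output_lines(const char *cmd)
{
	FILE *fp = popen(cmd, "r");
	int c;
	int lines = 0;

	if (fp == NULL)
		return -1;
	while ((c = fgetc(fp)) != EOF) {
		if (c == '\n')
			lines++;
	}
	/* pclose() also waits for the child; fclose() would not. */
	if (pclose(fp) == -1)
		return -1;
	return lines;
}
```

Besides silencing the warning, pclose() reaps the child process and
returns its exit status, which fclose() cannot do.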

Test-Parameters: trivial testlist=lustre-rsync-test

Change-Id: I518db394a282c8e6123d878f63312bfb27c59235
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45814
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15283 quota: deadlock between reint & lquota_wb 67/45667/6
Yang Sheng [Mon, 29 Nov 2021 15:00:03 +0000 (23:00 +0800)]
LU-15283 quota: deadlock between reint & lquota_wb

The reintegration thread may still be running while
the lquota_wb thread processes the update record. The
reint thread holds the dynlock and starts a
transaction, while the lquota_wb thread starts a transaction
and then tries to grab the dynlock. So we must avoid
running the reint and writeback threads in parallel. This
issue only occurs in the ldiskfs case.

COMMAND: "qsd_reint_2.wor"
__schedule
schedule
wait_transaction_locked [jbd2]
add_transaction_credits [jbd2]
start_this_handle [jbd2]
jbd2__journal_start [jbd2]
__ldiskfs_journal_start_sb [ldiskfs]
ldiskfs_release_dquot [ldiskfs]
dqput
dquot_get_dqblk
osd_acct_index_lookup [osd_ldiskfs]
lquota_disk_read [lquota]
qsd_refresh_usage [lquota]
qsd_reconciliation [lquota]
qsd_reint_main [lquota]
kthread
ret_from_fork

COMMAND: "lquota_wb_work-"
__schedule
 schedule
 dynlock_lock [osd_ldiskfs]
 __iam_it_get [osd_ldiskfs]
 iam_it_get [osd_ldiskfs]
 osd_index_iam_lookup [osd_ldiskfs]
 lquota_disk_write [lquota]
 qsd_update_index [lquota]
 qsd_upd_thread [lquota]
 kthread
 ret_from_fork

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I8cdd6227d3b0c5d4f67c432c3129da42a83c0ef2
Reviewed-on: https://review.whamcloud.com/45667
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15200 llite: revalidate dentry if LOOKUP lock fetched 99/45599/4
Lai Siyao [Sun, 7 Nov 2021 20:38:49 +0000 (15:38 -0500)]
LU-15200 llite: revalidate dentry if LOOKUP lock fetched

Once ll_inode_revalidate() fetches LOOKUP lock, it should revalidate
dentry, so subsequent lookup can find it in dcache.

It should also update lli_dir_depth.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I847e16d64d605b91efc93925821bc29cbea20fa2
Reviewed-on: https://review.whamcloud.com/45599
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15170 llite: Switch pcc to lookup_one_len 36/45436/15
Patrick Farrell [Wed, 8 Dec 2021 20:19:31 +0000 (15:19 -0500)]
LU-15170 llite: Switch pcc to lookup_one_len

Using kern_path to lookup files in the PCC cache means we
are subject to user namespaces, so the PCC volume must be
mapped in to a container or the cached files cannot be
found.

One solution is to switch to using lookup_one_len - this is
what the code which *creates* PCC files does.  This
manually walks the path from the root, which avoids
namespace issues.

This is appropriate because PCC is kernel functionality -
the user should not be able to directly access the volume,
but it should be accessible as a cache.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Idd15574ace29543bed1a9937cb35404781714791
Reviewed-on: https://review.whamcloud.com/45436
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-13309 osd: use per-cpu counters for brw_stats 15/37915/14
Andrew Perepechko [Thu, 2 Dec 2021 07:26:32 +0000 (10:26 +0300)]
LU-13309 osd: use per-cpu counters for brw_stats

Based on perf reports, oh_lock is highly contended
when running IOR with NVMe storage, so we need to
move to per-cpu counters.

struct brw_stats becomes larger: from 3872 to 18208 bytes.
Also, 4 bytes are allocated per each cpu for every counter.
With an 8-cpu system and 32 4-byte per-cpu counters,
there are 448 per-cpu counters or 1792 bytes per-cpu.
These counters will either reuse already
allocated per-cpu pages or allocate a new page on each cpu
(8 pages total).

Change-Id: I24536a0138067fb868aaf962d9321dea7566d13f
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-8007, LUS-8185
Reviewed-on: https://review.whamcloud.com/37915
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-6142 tests: Fix style issues for statmany.c 86/36386/22
Arshad Hussain [Sun, 29 Sep 2019 17:43:10 +0000 (23:13 +0530)]
LU-6142 tests: Fix style issues for statmany.c

This patch fixes issues reported by checkpatch
for file lustre/tests/statmany.c

Test-Parameters: trivial testlist=sanityn,dom-performance,replay-dual mdssizegb=20
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ieb2e8aff7d45c7de3ff5035c9c00dafe82b27d31
Reviewed-on: https://review.whamcloud.com/36386
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15314 utils: set default max-inherit to 3 74/45874/9
Lei Feng [Fri, 17 Dec 2021 03:15:01 +0000 (11:15 +0800)]
LU-15314 utils: set default max-inherit to 3

Change LMV_INHERIT_DEFAULT from 0 to 3, so that the default stripe
policy of a directory is not inherited without limit, which would
unexpectedly reduce performance.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I67ef540046867ccec7ccc3aab035edbff95874c3
Reviewed-on: https://review.whamcloud.com/45874
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15421 tests: Add missing error() calls on errors 95/45995/2
Oleg Drokin [Fri, 7 Jan 2022 03:06:09 +0000 (22:06 -0500)]
LU-15421 tests: Add missing error() calls on errors

Just a quoted string does not really do what we need

Fixes: 8befc64e5a ("LU-13404 utils: fix lfs mirror duplicate file check")
Fixes: 41bfc1ec78 ("LU-8998 tests: test scripts for PFL")
Fixes: 3afede2b81 ("LU-8900 snapshot: user interface for write barrier on MDT")
Fixes: fdad38781c ("LU-11376 lmv: new foreign LMV format")
Fixes: 6a20bdcc60 ("LU-11376 lov: new foreign LOV format")
Fixes: 4af3ab1945 ("LU-2017 mdc: add layout swap between 2 objects")
Test-Parameters: trivial testlist=sanity,sanity-pfl,sanity-flr,sanity-lfsck
Change-Id: Id0f70512ade1cc93cbd4979dc2925f1e834f6816
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45995
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
5 months ago  LU-15130 nrs: null pointer dereference in nrs_tbf_id_parse 91/45291/4
Etienne AUJAMES [Tue, 19 Oct 2021 14:10:43 +0000 (16:10 +0200)]
LU-15130 nrs: null pointer dereference in nrs_tbf_id_parse

cfs_gettok() sets next->ls_str to NULL if no delimiter is found but
it does not update next->ls_len to 0.
We have to check if next->ls_str is NULL inside nrs_tbf_id_parse()
to verify that the tbf expression is valid.

* Reproducer *
lctl set_param  mds.MDS.mdt.nrs_tbf_rule="start tbf_name gid{500}
rate=100"

This patch fixes cfs_gettok() to set "next->ls_len = 0;" if no
delimiter is found.
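
The invariant can be shown with a simplified tokenizer.  This is a
hypothetical stand-in, not the real cfs_gettok(): struct lstr and
demo_gettok() are illustrative names, and the real function handles
more cases.  The key point is that the pointer and the length are
cleared together, so callers testing either one see "nothing left".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (pointer, length) string slice. */
struct lstr {
	const char *str;
	size_t len;
};

/*
 * Split the text before `delim` off the front of *next into *tok.
 * Returns 1 if a token was produced, 0 if nothing is left.
 */
int demo_gettok(struct lstr *next, char delim, struct lstr *tok)
{
	const char *end;

	if (next->str == NULL || next->len == 0)
		return 0;

	end = memchr(next->str, delim, next->len);
	tok->str = next->str;
	if (end == NULL) {
		/* last token: clear BOTH the pointer and the length */
		tok->len = next->len;
		next->str = NULL;
		next->len = 0;
		return 1;
	}
	tok->len = (size_t)(end - next->str);
	next->len -= tok->len + 1;
	next->str = end + 1;
	return 1;
}
```

After the last token is consumed, a second call returns 0 instead of
dereferencing a NULL pointer with a stale non-zero length.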

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Iaa4eb5085262cee547ea3a944ddb94c6df1f8aa3
Reviewed-on: https://review.whamcloud.com/45291
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15245 mdc: GET(X)ATTR to READPAGE portal 93/45593/7
Andreas Dilger [Wed, 17 Nov 2021 20:01:51 +0000 (15:01 -0500)]
LU-15245 mdc: GET(X)ATTR to READPAGE portal

Send the MDS_GETATTR and MDS_GETXATTR RPCs to the
MDS_READPAGE_PORTAL instead of the default portal to avoid
deadlocks with other MDS_REINT RPCs that may block all of
the MDS service threads on that portal.

This deadlock occurs with MDS_GETXATTR when selinux is
enabled, because getxattr becomes part of lookup, so it
takes a reference on a lock used for lookup.  However, all
of the MDS service threads on the default portal can be
consumed by threads waiting for that lock, resulting in
a deadlock when the getxattr can't be processed.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I4fbae266022ee9fa38f3196acb1443df5056fe5e
Reviewed-on: https://review.whamcloud.com/45593
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-14658 tests: fix conf-sanity 122b test 41/45841/3
Alexander Boyko [Mon, 13 Dec 2021 20:00:48 +0000 (15:00 -0500)]
LU-14658 tests: fix conf-sanity 122b test

Sometimes the test 122b failed with:
dd: failed to open '/mnt/lustre/d122b.conf-sanity/f122b.conf-sanity':
Numerical result out of range

ZFS readonly simulation produces OS_STATFS_READONLY flag.
It leads to zero stripe_count at lod_get_stripe_count(), and
lod_qos_prep_create() returns -34(ERANGE).

The patch fixes it by file creation before replay_barrier.

Test-Parameters: trivial fstype=zfs env=ONLY=122b,ONLY_REPEAT=20 testlist=conf-sanity
Fixes: 747fed818be5 ("LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I7ec04ffe09d0038bcf99e1a571f14d2bfb6a5df5
Reviewed-on: https://review.whamcloud.com/45841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15362 util: fix silent failure of component delete 33/45833/6
Li Xi [Sun, 12 Dec 2021 05:21:32 +0000 (13:21 +0800)]
LU-15362 util: fix silent failure of component delete

When no component ID is specified, no error message is printed
when the component deletion command fails. The failure could thus
easily be overlooked, and its cause is hard to understand
without any error message.

Test-Parameters: trivial
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: Id20b55a3e12a7152198ee475e55f6dd764a55219
Reviewed-on: https://review.whamcloud.com/45833
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago  LU-15140 tests: cleanup of recovery-*-scale tests fails 24/45824/3
Elena Gryaznova [Fri, 10 Dec 2021 15:58:14 +0000 (18:58 +0300)]
LU-15140 tests: cleanup of recovery-*-scale tests fails

A Bash trap handler is executed only after completion of the
current command, so under heavy I/O load it can be executed
after the test and cleanup phases have finished.

Running the I/O load in the background overcomes this Bash limitation.

Test-Parameters: clientcount=6 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=FAILOVER_PERIOD=180 testlist=recovery-double-scale env=SLOW=yes
Test-Parameters: clientcount=5 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=FAILOVER_PERIOD=180 env=DURATION=82800 testlist=recovery-mds-scale env=SLOW=yes
Test-Parameters: clientcount=5 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=DURATION=82800 testlist=recovery-random-scale env=SLOW=yes
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-2649
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I3c91cac4d3f9af9863e8f48ba8a6bae02190ccb4
Reviewed-on: https://review.whamcloud.com/45824
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-930 utils: add --lazy option to 'lfs find' usage 16/45816/2
Andreas Dilger [Thu, 9 Dec 2021 20:26:50 +0000 (13:26 -0700)]
LU-930 utils: add --lazy option to 'lfs find' usage

The usage message for "lfs find" does not show the "--lazy"
option to allow checking LSOM data from the MDT instead of
getting size and blocks from the OSTs.

Also sort the options in the "lfs find" usage to be roughly
in alphabetical order.

Test-Parameters: trivial
Fixes: 11aa7f8704c4 ("LU-11367 som: integrate LSOM with lfs find")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iafea18b2d74d889bb258f2be4b3af0f9203ebbe5
Reviewed-on: https://review.whamcloud.com/45816
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15275 lnet: Skip router discovery on send path 84/45684/2
Chris Horn [Tue, 30 Nov 2021 16:57:34 +0000 (10:57 -0600)]
LU-15275 lnet: Skip router discovery on send path

When the router checker is enabled, routes are regularly marked as out
of date w.r.t. discovery. This can cause upper level messages to be
delayed while the router undergoes discovery. We can avoid delaying
messages by relying on the router checker to initiate discovery of
routers. If we happen to send a message to a router before it has
been discovered then the worst case scenario is that the route is
actually down or we end up utilizing a subset of a multi-rail router's
interfaces. Both situations can be remedied by utilizing the
check_routers_before_use parameter.

Change the logic in lnet_handle_find_routed_path() so that we only
initiate discovery if the alive_router_check_interval is <= 0 (i.e.
router checker pings are disabled).

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: If0332c21f6157117598b7b908fe17f2d2690fc1d
Reviewed-on: https://review.whamcloud.com/45684
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15069 llite: Add start_idx debug 74/45674/10
Patrick Farrell [Wed, 15 Dec 2021 21:27:58 +0000 (16:27 -0500)]
LU-15069 llite: Add start_idx debug

When readahead is triggered, current readahead debug
prints the page the user requested which triggered
readahead and the number of pages read by readahead.

However, readahead does not necessarily start reading from
the user requested page, so it's important to also print
the page where readahead starts.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ie474811f3b0076f4f914fae7f74496e96ddb31da
Reviewed-on: https://review.whamcloud.com/45674
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15317 llite: Add D_IOTRACE 52/45752/6
Patrick Farrell [Mon, 6 Dec 2021 02:50:39 +0000 (21:50 -0500)]
LU-15317 llite: Add D_IOTRACE

In looking in to performance problems, it's very important
to be able to trace the I/O patterns from userspace in to
Lustre, and also understand the key basics of how Lustre
handles that I/O (readahead, RPC generation).

This is best done with a dedicated debug flag - No
userspace tool can provide all this information, and
existing debug flags collect a huge number of unrelated
pieces of, well, debug information.

The goal is for customers to be able to quickly gather log
files of a reasonable size which contain the necessary
information and which can easily be interpreted by
engineering.  This is not possible if the information is
spread out across a number of heavyweight debug flags.

This is a first pass at adding the flag and the debug
required to track basic data I/O.  One significant
omission in the first patch is RPC generation - I have not
decided how best to do that yet.  That will be added in a
future patch.

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0ed003ec1488e1c267b194c871f64b34f6dc6025
Reviewed-on: https://review.whamcloud.com/45752
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15317 libcfs: Remove D_TTY 51/45751/3
Patrick Farrell [Mon, 6 Dec 2021 02:33:02 +0000 (21:33 -0500)]
LU-15317 libcfs: Remove D_TTY

The D_TTY flag is almost entirely unused and certainly not
needed.  Remove it so we have a spare flag to use for
iotrace.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1127cbcf6ee51adc07d560a8827fa1e32d16c90c
Reviewed-on: https://review.whamcloud.com/45751
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9206 llite: access striped directory with missing stripe 31/45631/7
Lai Siyao [Fri, 19 Nov 2021 19:50:34 +0000 (14:50 -0500)]
LU-9206 llite: access striped directory with missing stripe

This patch allows accessing a striped directory with missing stripes:
* lmv_revalidate_slave() skips the error if one stripe returns -ESHUTDOWN.
* add ll_dir_flush(), which returns any error found in reading
  stripe dir pages, so 'ls' can list dirents on other stripes and
  return an error at the end.

Add sanity 33i, update 60g because now ls may fail.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I16efd34e02b9855756cc93556e9e52550178f203
Reviewed-on: https://review.whamcloud.com/45631
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15216 lmv: improve MDT QOS space balance 44/45544/6
Lai Siyao [Sat, 6 Nov 2021 19:16:49 +0000 (15:16 -0400)]
LU-15216 lmv: improve MDT QOS space balance

When MDTs are not balanced, QOS code tries to keep subdirectory
creation local to the same MDT when it is deep in the directory
tree, to avoid creating too many remote directories, but the
existing weight to stay on the parent MDT until 50% of other MDTs
is too radical, and causes mkdirs to be "stuck" on the same MDT.

* remove "lq_threshold_rr" from above calculation because the check
  in ltd_qos_is_usable() handles this, so use only "dir_depth".
* the factor is changed to "16 / (dir_depth + 10)", then it's less
  likely to stick to the parent MDT for top levels, while more
  likely to stay on the parent MDT for low levels:
  depth=0 -> 160%, depth=4 -> 114%, depth=6 -> 100%,
  depth=8 -> 88%, depth=12 -> 72%
* rename lli_depth to lli_dir_depth to make usage more clear.
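
The percentages listed above follow directly from the factor
"16 / (dir_depth + 10)" with integer truncation. A minimal sketch to
reproduce them; qos_factor_percent is a hypothetical helper name and not
a function from the patch, and the kernel code may compute the weight
differently:

```shell
# qos_factor_percent is a hypothetical helper, not a function from the patch.
qos_factor_percent() {
    # 100 * 16 / (dir_depth + 10), truncated to an integer percentage
    echo $(( 1600 / ($1 + 10) ))
}

for depth in 0 4 6 8 12; do
    echo "depth=$depth -> $(qos_factor_percent "$depth")%"
done
```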

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Iec6b77919b630d4baee6d54bee7bdb8ca9fb8574
Reviewed-on: https://review.whamcloud.com/45544
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
5 months agoLU-15137 socklnd: decrement connection counters on close 22/45422/7
Serguei Smirnov [Sat, 30 Oct 2021 18:39:26 +0000 (11:39 -0700)]
LU-15137 socklnd: decrement connection counters on close

To gracefully handle potential race with delayed connection create,
decrement connection counters per type as connections are being
closed.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 71b2476e ("LU-12815 socklnd: add conns_per_peer parameter")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ieb3b44701e4999ea1fe63234162dd5878d65958a
Reviewed-on: https://review.whamcloud.com/45422
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15133 osp: only deactivate OSP on LAST_FID error 09/45309/7
Lai Siyao [Wed, 20 Oct 2021 05:46:17 +0000 (01:46 -0400)]
LU-15133 osp: only deactivate OSP on LAST_FID error

ofd_get_info_hdl() should return -EFAULT upon LAST_FID error, which
is the same as LAST_ID error.

osp_get_lastfid_from_ost() should deactivate OSP only upon -EFAULT,
which means reading LAST_FID on OST failed. This can avoid unnecessary
admin intervention.

Add sanity 27S.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib78c8994c0398dd4b4db32005abd018933ef3a7c
Reviewed-on: https://review.whamcloud.com/45309
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13560 lod: set default LMV for "lfs mkdir -c 1" 90/45290/11
Lai Siyao [Wed, 13 Oct 2021 05:55:15 +0000 (01:55 -0400)]
LU-13560 lod: set default LMV for "lfs mkdir -c 1"

With the introduction of filesystem-wide default LMV, dirs will be
created on MDT by space usage, but if dir is created by
"lfs mkdir -c 1 ...", its subdirs should be kept on the same MDT.
To achieve this, set default LMV on such dirs. NB: a user who doesn't
want this needs to create the dir with
"lfs mkdir -c 1 --max-inherit=0 ...".

The policy to choose MDT in mkdir is as below:
1. is "lfs mkdir -i N"? mkdir on MDT N.
2. is "lfs mkdir -i -1"? mkdir by space usage.
3. is starting MDT specified in default LMV? mkdir on MDT N.
4. is default LMV space balanced? mkdir by space usage.
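
The four-step policy above can be read as an ordered decision. The
sketch below is illustrative only: choose_mdt and its two arguments are
hypothetical, and it assumes that a default LMV starting index of -1
denotes "space balanced", mirroring "lfs mkdir -i -1". The case where
none of the four rules applies is not covered by the list and is
reported as "unspecified".

```shell
# choose_mdt MKDIR_INDEX DEFAULT_INDEX   (hypothetical helper)
#   MKDIR_INDEX:   value given to "lfs mkdir -i", empty if not given
#   DEFAULT_INDEX: starting MDT in the default LMV, empty if none;
#                  -1 is assumed to mean "space balanced"
choose_mdt() {
    if [ -n "$1" ] && [ "$1" -ge 0 ]; then
        echo "mdt:$1"            # 1. explicit "lfs mkdir -i N"
    elif [ "$1" = "-1" ]; then
        echo "space"             # 2. "lfs mkdir -i -1"
    elif [ -n "$2" ] && [ "$2" -ge 0 ]; then
        echo "mdt:$2"            # 3. starting MDT from the default LMV
    elif [ "$2" = "-1" ]; then
        echo "space"             # 4. default LMV is space balanced
    else
        echo "unspecified"       # not covered by the list above
    fi
}

choose_mdt 2 ""      # explicit index
choose_mdt -1 3      # explicit request for space balancing
choose_mdt "" 3      # default LMV with a starting MDT
choose_mdt "" -1     # space-balanced default LMV
```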

Changes on server side:
* Don't inherit default LMV for "lfs mkdir".
* Don't migrate default LMV in dir migration/split.

Remove setting default LMV in mkdir_on_mdt().

Update sanity 412.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0ffdcf7a4a85a31e2df788198aeb5e9a910160d8
Reviewed-on: https://review.whamcloud.com/45290
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15097 quota: stop pool_recalc before killing pool 56/45256/8
Sergey Cheremencev [Mon, 20 Sep 2021 12:08:20 +0000 (15:08 +0300)]
LU-15097 quota: stop pool_recalc before killing pool

qmt_start_pool_recalc holds a reference on a pool while
it is running. This thread should be stopped before
putting the last pool reference in qmt_pool_free, to be
sure that the pool can finally be freed. This patch helps
to avoid the following ASSERTION:
qmt_pool_fini()) ASSERTION( list_empty(&qmt->qmt_pool_list) ) failed

HPE-bug-id: LUS-10294
Change-Id: If72042a620d9ded693fcb669bc9148d1f96126a4
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45256
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15031 quota: reseed glbe in qmt_lvbo_udate 32/45032/13
Sergey Cheremencev [Tue, 25 May 2021 22:44:48 +0000 (01:44 +0300)]
LU-15031 quota: reseed glbe in qmt_lvbo_udate

Reseed the glbe array in qmt_lvbo_update after changing edquot.
Without this fix, the edquot flag wasn't set in the glbe array.
Later, when edquot was cleared, the need_update (nu) flag wasn't set
in the glbe array to notify OSTs of the new edquot.

The patch also adds test 80 to check that OST gets correct
edquot value after failover.

HPE-bug-id: LUS-10029
Change-Id: I5b7e1a553e3351c22649431860d51b5a671c6fd9
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15018 o2iblnd: treat cmid->device == NULL as an error 81/44981/4
Serguei Smirnov [Fri, 17 Sep 2021 21:06:26 +0000 (14:06 -0700)]
LU-15018 o2iblnd: treat cmid->device == NULL as an error

Even if rdma_bind_addr is successful, kiblnd_dev_failover should
treat cmid->device == NULL as an error in order to later avoid
calling kiblnd_set_ni_fatal_on with possibly dev->ibd_hdev == NULL.

Test-Parameters: trivial
Fixes: 4668283cd1 ("LU-14806 o2iblnd: clear fatal error on successful failover")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iefbe030b25d2dc543461cf98afeacd734fd64cf8
Reviewed-on: https://review.whamcloud.com/44981
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14707 tests: Bashify scripts for Ubuntu et. al. 86/43786/8
Shaun Tancheff [Fri, 10 Dec 2021 17:45:46 +0000 (12:45 -0500)]
LU-14707 tests: Bashify scripts for Ubuntu et. al.

Some scripts use bash-isms that are not present in the
Bourne shell (sh) or in Ubuntu's default dash shell.

Be explicit and prefer bash.

HPE-bug-id: LUS-8398
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I02f742e9787e1811b422b619e00911ee52673262
Reviewed-on: https://review.whamcloud.com/43786
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14542 obd: tunable for sanity grant check 28/42128/7
Vladimir Saveliev [Mon, 13 Dec 2021 15:11:58 +0000 (18:11 +0300)]
LU-14542 obd: tunable for sanity grant check

Add control over the sanity grant check via lctl set_param
*.*.grant_check_threshold.  Setting it to 0 unconditionally turns
grant checking on.
By default, as before, grant checking is turned off when the number
of exports exceeds 100.

Change-Id: Ib2505da74f6e3d541bce5def3e90597eda232c58
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
HPE-bug-id: LUS-9827
Reviewed-on: https://review.whamcloud.com/42128
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14195 ldiskfs: update patches for Linux 5.10 13/40913/10
Mr NeilBrown [Tue, 14 Dec 2021 17:43:45 +0000 (12:43 -0500)]
LU-14195 ldiskfs: update patches for Linux 5.10

Mostly simple conflicts due to code movement, however:

ext4-data-in-dirent.patch now also needs to patch
fs/ext4/fast-commit.c, because ext4_init_new_dir() is used in that
file. Since fast commit can break recovery, we prevent mounting with
this option.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I59b10fdb6bb606b193472e3045ab7d9b1d0d36b5
Reviewed-on: https://review.whamcloud.com/40913
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13799 lov: Cache stripe offset calculation 45/39445/25
Patrick Farrell [Thu, 10 Jun 2021 17:23:19 +0000 (13:23 -0400)]
LU-13799 lov: Cache stripe offset calculation

Calculating the page offset relative to the stripe (etc)
in a file is surprisingly expensive.  Because i/o has
already been split up to stripes by the cl_io code,
calculating the stripe each time is unnecessary.

We cache most of the values requiring calculation.

This improves AIO/DIO page submission significantly,
improving performance by a bit over 10%.

Also remove lpg_generation, which isn't doing anything
useful.  This suggests the possibility of removing
lov_page, but that's for another patch.

This patch reduces i/o time in ms/GiB by:
Write: 17 ms/GiB
Read: 22 ms/GiB

Totals:
Write: 119 ms/GiB
Read: 121 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        7531 MiB/s
read         7179 MiB/s

Plus this patch:
write        8637 MiB/s
read         8488 MiB/s
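
The ms/GiB figures above can be cross-checked against the IOR
throughput numbers. A minimal sketch, assuming 1 GiB = 1024 MiB and
rounding to the nearest millisecond; ms_per_gib is a hypothetical helper
used only for this check:

```shell
# ms_per_gib is a hypothetical helper: 1 GiB = 1024 MiB, 1 s = 1000 ms.
ms_per_gib() {
    # Argument is a throughput in MiB/s; the result is rounded to the
    # nearest integer using integer arithmetic.
    echo $(( (1024 * 1000 + $1 / 2) / $1 ))
}

write_before=$(ms_per_gib 7531); write_after=$(ms_per_gib 8637)
read_before=$(ms_per_gib 7179);  read_after=$(ms_per_gib 8488)

echo "write: $write_after ms/GiB, saved $(( write_before - write_after )) ms/GiB"
echo "read:  $read_after ms/GiB, saved $(( read_before - read_after )) ms/GiB"
```

This gives 119 and 121 ms/GiB in total, and savings of 17 and 22 ms/GiB,
in agreement with the figures quoted in the message.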

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I89e994592853d0fe93a034bfe8bdfb459bdaf584
Reviewed-on: https://review.whamcloud.com/39445
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-11558 tests: re-enable conf-sanity t32_verify_quota() 23/33423/11
Andreas Dilger [Mon, 22 Oct 2018 23:19:57 +0000 (07:19 +0800)]
LU-11558 tests: re-enable conf-sanity t32_verify_quota()

Since patch https://review.whamcloud.com/28020 "LU-3285 test: add
Data-on-MDT tests and fixes" landed, the call to t32_verify_quota()
from conf-sanity.sh t32_test() has been removed, and (I'd guess)
we are no longer verifying that quota is still working
correctly after an upgrade.

It is unclear why this check was removed. Restore it so that
we resume testing that quota is working after an
upgrade from an old disk format.

Test-Parameters: trivial testlist=conf-sanity
Test-Parameters: fstype=zfs testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia6bab4907e7cb4e6e5581a6a072cc51ab53ebbe5
Reviewed-on: https://review.whamcloud.com/33423
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-9272 tests: fix for facet_failover mgs 35/26235/27
Elena Gryaznova [Thu, 11 Oct 2018 14:20:32 +0000 (17:20 +0300)]
LU-9272 tests: fix for facet_failover mgs

Make facet_failover work for the mgs facet and include failover
nodes in the list of nodes on which to load modules.

When mgs/mds are combined, assign the failover host of mds to
the failover host of mgs.

Assign mgsfailover_dev with mds1failover_dev when mgs/mds are combined
while mounting facets, as mds1failover_dev is already defined when
mgs/mds are combined.

Fix start() to export mgs_dev and mgsfailover_dev for
combined_mds_mgs.

Do not wait for recovery to complete on mgs.

Test-Parameters: trivial failover=true osscount=2 mdscount=2 mdtcount=1 austeroptions=-R iscsi=1 env="ONLY=121" testlist=conf-sanity
Test-Parameters: testlist=conf-sanity
Change-Id: Ie698814c530c8deb98aa0010f2a0fa8e261b4b69
HPE-bug-id: MRP-3374, LUS-4858, LUS-2361
Signed-off-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Alexander Boyko <c17825@cray.com>
Signed-off-by: Noopur Maheshwari <noopur.maheshwari@seagate.com>
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/26235
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14651 libcfs: use namespace CRYPTO_INTERNAL 05/45805/3
Jian Yu [Thu, 9 Dec 2021 08:13:23 +0000 (00:13 -0800)]
LU-14651 libcfs: use namespace CRYPTO_INTERNAL

In kernel 5.12 commit 0eb76ba29d16df2951d37c54ca279c4e5630b071,
cipher routines are moved into include/crypto/internal/cipher.h,
and the symbol exports are moved into namespace CRYPTO_INTERNAL.

This patch accommodates the above changes and fixes the following
build errors:
ERROR: modpost: module libcfs uses symbol crypto_cipher_encrypt_one
from namespace CRYPTO_INTERNAL, but does not import it.
ERROR: modpost: module libcfs uses symbol crypto_cipher_setkey
from namespace CRYPTO_INTERNAL, but does not import it.

Test-Parameters: trivial

Change-Id: I908006f81ee632c2d02fe3dd6ac41fdd6296a4b0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45805
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15298 tests: set mdt_hash permanently 93/45693/3
Elena Gryaznova [Sun, 26 Dec 2021 09:53:33 +0000 (12:53 +0300)]
LU-15298 tests: set mdt_hash permanently

On failover setup where <mdtN>_HOST != <mdtNfailover>_HOST
"do_nodes $(comma_list $(mdts_nodes)) lctl set_param" fails:
  set_param: param_path 'lod/*/mdt_hash': No such file or directory
if mdtN facet is active and up on <mdtNfailover>_HOST.
Let's set this parameter permanently.

Fixes: 0a1cf8da80 ("LU-11025 dne: introduce new directory hash type: "crush"")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10601
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: Ie36745cdc5fde4a33387baafe146e06ce8812eb4
Reviewed-on: https://review.whamcloud.com/45693
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10824 llite: make foreign symlinks aware of mount namespaces 09/45609/4
James Simmons [Thu, 18 Nov 2021 16:39:08 +0000 (11:39 -0500)]
LU-10824 llite: make foreign symlinks aware of mount namespaces

Currently the foreign symlink code tests whether the mount namespace
is the same namespace as the one related to the sysfs tree. This
doesn't cover all cases. Linux supports limiting which mounts are
visible to a process with mount namespaces. Let's add this support
as well.

Change-Id: Ie87ed45b3c4439e8800c937eb27ed4931989c0f4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45609
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15112 mgc: do not ignore target registration failure 59/45259/15
Alexander Zarochentsev [Wed, 15 Dec 2021 10:26:02 +0000 (13:26 +0300)]
LU-15112 mgc: do not ignore target registration failure

A serious target registration failure with the LDD_F_ERROR
flag set is ignored by the target, which makes it possible
to register a new target with an already used index.
The writeconf flag should be encoded in the fs label regardless
of the "first_time" flag, otherwise the target cannot be
registered after an initial registration failure.

HPE-bug-id: LUS-8752
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: If051199d3dbafc8f8102f3daf086de01bc5c5f98
Reviewed-on: https://review.whamcloud.com/45259
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15112 ptlrpc: make rq_replied flag always correct 71/45871/3
Alexander Zarochentsev [Wed, 15 Dec 2021 12:31:47 +0000 (15:31 +0300)]
LU-15112 ptlrpc: make rq_replied flag always correct

The rq_replied flag is cleared in ptl_rpc_send() only,
so the state of the flag may be incorrect for RPCs that
timed out but were never sent.

HPE-bug-id: LUS-8752
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I0de996a4d775b8f1a1a6b27ff38d21645694f868
Reviewed-on: https://review.whamcloud.com/45871
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15056 nrs: length of a tbf rule should be checked 24/45124/6
Etienne AUJAMES [Mon, 4 Oct 2021 18:42:31 +0000 (20:42 +0200)]
LU-15056 nrs: length of a tbf rule should be checked

The maximum size of a tbf rule name is 16 bytes (MAX_TBF_NAME). This
length is not verified before applying the rule, which causes a buffer
overflow when the name is copied.

This patch adds a string length check inside name_is_invalid().
The test sanityn 77p checks that an error is returned when a user
tries to register a rule with an invalid name.
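
The kind of length check described above can be sketched as follows.
This is not the patch's name_is_invalid(); tbf_name_too_long is a
hypothetical helper that covers only the length rule. It assumes that
the 16 bytes include a terminating NUL, so that at most 15 characters
fit, which the message does not state explicitly.

```shell
MAX_TBF_NAME=16   # limit quoted in the commit message

# Hypothetical helper; returns 0 (true) when the name is too long.
# Assumption: the 16 bytes include a terminating NUL, so at most
# 15 characters fit. ${#1} counts characters, which equals bytes
# only for single-byte (e.g. ASCII) names.
tbf_name_too_long() {
    [ "${#1}" -ge "$MAX_TBF_NAME" ]
}

for name in short_rule a_rule_name_that_is_too_long; do
    if tbf_name_too_long "$name"; then
        echo "$name: rejected"
    else
        echo "$name: accepted"
    fi
done
```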

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I93c73083b6e81ab9070a860e702e56b0cb498352
Reviewed-on: https://review.whamcloud.com/45124
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13587 quota: protect qpi in proc 87/43987/15
Sergey Cheremencev [Thu, 15 Apr 2021 14:14:51 +0000 (17:14 +0300)]
LU-13587 quota: protect qpi in proc

Access pool info only when the pool is fully initialized.
This patch protects against the following panic:

[212010.467347] BUG: unable to handle kernel NULL pointer dereference at           (null)
[212010.468205] IP: [<ffffffffc0e55e46>] qpi_state_seq_show+0x86/0xe0 [lquota]
...
[212010.486786] Call Trace:
[212010.487344]  [<ffffffffbbc68b50>] seq_read+0x130/0x440
[212010.487741]  [<ffffffffbbcb8380>] proc_reg_read+0x40/0x80
[212010.488445]  [<ffffffffbbc4118f>] vfs_read+0x9f/0x170
[212010.489056]  [<ffffffffbbc4204f>] SyS_read+0x7f/0xf0
[212010.489920]  [<ffffffffbc176ddb>] system_call_fastpath+0x22/0x27
[212010.490861] Code: 5c a8 01 00 00 41 8b 8c 1c c0 01 00 00 48 c7 c6 18
[212010.493235] RIP  [<ffffffffc0e55e46>] qpi_state_seq_show+0x86/0xe0 [lquota]
[212010.493672] RSP <ffff908505747e28>
[212010.494161] CR2: 0000000000000000

Add test 79 to sanity-quota to check that a race between
accessing /proc/.../dt-pool_name/info of a non-existent pool
and the creation of this pool doesn't cause a panic.

HPE-bug-id: LUS-9938
Change-Id: I8eff846c6c3881a8431a98efb54e660ecb9155bf
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/43987
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14008 o2iblnd: cleanup 60/40260/8
Alexey Lyashkov [Fri, 7 Aug 2020 11:26:25 +0000 (14:26 +0300)]
LU-14008 o2iblnd: cleanup

Simplify kiblnd_send by avoiding code duplication.
Let's pick up an idle tx first.

Test-Parameters: trivial
HPE-bug-id: LUS-1796
Change-Id: Iaf71a9a3aeb3047a086d4cc0a3cf4f1dbe8944b4
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/40260
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12056 ldiskfs: add trusted.projid virtual xattr 79/45679/10
Li Dongyang [Tue, 30 Nov 2021 01:13:03 +0000 (12:13 +1100)]
LU-12056 ldiskfs: add trusted.projid virtual xattr

Add trusted.projid virtual xattr in ldiskfs to export the
current project id, intended for ldiskfs level MDT backup.

When the project id is EXT4_DEF_PROJID/0,
the virtual xattr is hidden from listxattr(2).

It's also hidden on lustre client when parent has the
project inherit flag and the same project ID,
to stop mv from setting the virtual xattr on the dest with
the project id from src, which could be different from dest.

getxattr(2) on trusted.projid will report the current project id,
setxattr(2) will change the current project id and
removexattr(2) will set the project id back to EXT4_DEF_PROJID/0.

Both get|setxattr(2) will work even when the virtual xattr is
hidden.

Invalidate client xattr cache for the inode when changing its
project id, so the virtual xattr can get the new value
for next getxattr(2)

Add test cases to verify the virtual projid xattr and backup
restore MDT using tar can now preserve the project id.

Change mds_backup_restore in the test framework to use
tar with the "--xattrs --xattrs-include='trusted.*'" options.

Change-Id: I29b1aa922ef72d734cdc87125401fa08fb13d4af
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/45679
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15219 lfs: migration to DoM layout fix 49/45549/4
Mikhail Pershin [Fri, 12 Nov 2021 16:00:22 +0000 (19:00 +0300)]
LU-15219 lfs: migration to DoM layout fix

Migration to a DoM layout from an OST-striped file can skip
data sync beyond the DoM component if it is not initialized.
This patch forces a data copy prior to the layout merge, so the
new layout is initialized and contains the needed data.

Tests 272e/272f in sanity.sh were modified to migrate data
for both MDT and OST parts.

Fixes: 44a721b8c1 ("LU-11421 dom: manual OST-to-DOM migration via mirroring")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I206358e762780ab7cfaa7587888174a31bc7b196
Reviewed-on: https://review.whamcloud.com/45549
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-15189 osc: don't have extra nvidia call 81/45481/5
Alexey Lyashkov [Mon, 8 Nov 2021 06:36:08 +0000 (09:36 +0300)]
LU-15189 osc: don't have extra nvidia call

osc doesn't need to call nvidia to check for a GPU page;
this information is in the oap_flags.

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I124c328838ad9823361afef33d0732fa4ebbb696
Reviewed-on: https://review.whamcloud.com/45481
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15358 tests: Variable incorrectly defined under sanityn 19/45819/2
Arshad Hussain [Fri, 10 Dec 2021 04:34:26 +0000 (10:04 +0530)]
LU-15358 tests: Variable incorrectly defined under sanityn

Under sanityn.sh/print_jbd_stat() local variable
was incorrectly defined. This was exposed using
shellcheck.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In lustre/tests/sanityn.sh line 950:
local varcvs
      ^-- SC2034: varcvs appears unused. Verify it or export it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test-Parameters: trivial testlist=sanityn
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7b2f62c15e420a4c6f5d71445a2e940816e20098
Reviewed-on: https://review.whamcloud.com/45819
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
6 months agoLU-15360 tests: Use saved value on EXIT/Restore 21/45821/2
Arshad Hussain [Fri, 10 Dec 2021 09:46:54 +0000 (15:16 +0530)]
LU-15360 tests: Use saved value on EXIT/Restore

This was originally reported by shellcheck as an
unused variable. However, on closer inspection
it appears that the restore on "EXIT" was
hard-coded to 0 (which is correct in most cases)
instead of using the original value saved in $old.

This patch resets the 'enable_chprojid_gid' value
to the original value captured in $old instead of
the hard-coded value of 0.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In ./lustre/tests/sanity-quota.sh line 4150:
local old=$(do_facet mds1 $LCTL get_param -n \
      ^-- SC2034: old appears unused. Verify it or export it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I31e7a8a931d53a1fcb9d77ecf1759fce572bd52c
Reviewed-on: https://review.whamcloud.com/45821
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15220 utils: fix gcc-11 -Werror=format-truncation= error 15/45815/2
Jian Yu [Thu, 9 Dec 2021 20:00:36 +0000 (12:00 -0800)]
LU-15220 utils: fix gcc-11 -Werror=format-truncation= error

This patch fixes the following -Werror=format-truncation= error in
liblustreapi.c:

liblustreapi.c: In function ‘lov_dump_comp_v1’:
liblustreapi.c:3673:57: error: ‘snprintf’ output may be truncated
before the last format character [-Werror=format-truncation=]
 3673 |                 snprintf(pool_name, LOV_MAXPOOLNAME, "%s",
      |                                                         ^
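
For illustration only, one common way to handle this class of warning is to check the return value of snprintf() and treat truncation as an error. The message above does not say whether the actual patch does this. The helper name copy_name_checked() is hypothetical and is not part of liblustreapi.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the liblustreapi code: copy src into dst and
 * report truncation instead of silently dropping characters. */
static int copy_name_checked(char *dst, size_t dstlen, const char *src)
{
	int rc;

	if (dstlen == 0)
		return -1;

	rc = snprintf(dst, dstlen, "%s", src);
	/* snprintf returns the length it wanted to write, excluding the NUL. */
	if (rc < 0 || (size_t)rc >= dstlen)
		return -1;	/* output error or truncated */

	return 0;
}
```

Even when truncation is reported, dst is still NUL-terminated, so a caller can choose to log the shortened name.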

Change-Id: I55c3e05a933ff3d2c33a71ed269fffe63797b528
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45815
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15356 tests: get rid of extra spaces in PERM_CMD 13/45813/2
Elena Gryaznova [Thu, 9 Dec 2021 18:50:37 +0000 (21:50 +0300)]
LU-15356 tests: get rid of extra spaces in PERM_CMD

Tests that use PERM_CMD set to "set_param  -P" (with an
extra space before "-P") fail because they do not expect
these allowable extra spaces:
   [[ $PERM_CMD = *"set_param -P"* ]]

Fixes: b9c359a70d ("LU-7004 tests: move from lctl conf_param to lctl set_param -P")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: Ia18e32baa56b7dac1f4e15777bfcc4b9ab1048fb
Reviewed-on: https://review.whamcloud.com/45813
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14776 ldiskfs: support Ubuntu 5.8.0-63 94/45794/2
James Simmons [Sat, 4 Dec 2021 22:51:23 +0000 (15:51 -0700)]
LU-14776 ldiskfs: support Ubuntu 5.8.0-63

Handle small changes in ext4 for Ubuntu 5.8.0-63 release.

Change-Id: Ie81b64909a49e66af17b4dfc1b8fbaf538f9f29e
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45794
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15342 tests: escape "|" 88/45788/3
Elena Gryaznova [Wed, 8 Dec 2021 11:19:44 +0000 (14:19 +0300)]
LU-15342 tests: escape "|"

Escape "|" in want="FULL|IDLE" to protect it from
interpretation by the shell:
  sh: IDLE: command not found

Fixes: af666bef05 ("LU-12857 tests: allow clients to be IDLE after recovery")
Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I2f885ea225ba43537f37b8dad1c2e0cd8f652a79
Reviewed-on: https://review.whamcloud.com/45788
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15339 tests: Increase timeout in sanity 208 79/45779/2
Patrick Farrell [Tue, 7 Dec 2021 21:54:20 +0000 (16:54 -0500)]
LU-15339 tests: Increase timeout in sanity 208

It's been observed that occasionally the initial request in
sanity 208 does not complete in 1 second, which invalidates
the test.  (And sometimes causes it to fail - but even if
it passes, the test is invalid.)

Increase the time to 2 seconds.

Using trivial testing because this just modifies sanity and
it's such a simple change.

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I70cf32813a9a2ced0cc388eb25eba29918ba7d03
Reviewed-on: https://review.whamcloud.com/45779
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
6 months agoLU-15338 tests: check whole jobid in sanity 205a 74/45774/3
Andreas Dilger [Tue, 7 Dec 2021 19:12:00 +0000 (12:12 -0700)]
LU-15338 tests: check whole jobid in sanity 205a

Check the whole jobid string in sanity test_205a to avoid matching
a substring of the jobid twice.  This could only currently happen
for the second "dd" test, at a rate of about 1/8192, but might also
fail in the future if other tests are added.
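
The difference between matching a substring and matching the whole jobid can be sketched as follows. The real test is a shell script; this C fragment only illustrates the idea, and the function names and jobid strings used with it are made up.

```c
#include <string.h>

/* Substring match: also succeeds when 'want' is only part of 'jobid'. */
static int jobid_contains(const char *jobid, const char *want)
{
	return strstr(jobid, want) != NULL;
}

/* Whole-string match: succeeds only when the two jobids are identical. */
static int jobid_equals(const char *jobid, const char *want)
{
	return strcmp(jobid, want) == 0;
}
```

With a made-up jobid "dd.1234", jobid_contains() accepts the shorter "dd.123" while jobid_equals() rejects it, which is the kind of false match the change avoids.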

Test-Parameters: trivial testlist=sanity env=ONLY=205a,ONLY_REPEAT=200
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I34b7ed1a7825e3fbad9ea8666fccb2bdc53ebbe5
Reviewed-on: https://review.whamcloud.com/45774
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15331 kernel: kernel update SLES15 SP2 [5.3.18-24.96.1] 64/45764/2
Jian Yu [Tue, 7 Dec 2021 07:11:51 +0000 (23:11 -0800)]
LU-15331 kernel: kernel update SLES15 SP2 [5.3.18-24.96.1]

Update SLES15 SP2 kernel to 5.3.18-24.96.1 for Lustre client.

Test-Parameters: trivial \
env=SANITY_EXCEPT="100 103 125 130 136 154 255 817" \
clientdistro=sles15sp2 \
testlist=sanity

Change-Id: Ia457af76060a96f574cb501af6456afdc7de6411
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45764
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15244 llite: set ra_pages of backing_dev_info with 0 12/45712/4
Qian Yingjin [Thu, 2 Dec 2021 13:16:18 +0000 (08:16 -0500)]
LU-15244 llite: set ra_pages of backing_dev_info with 0

The latest RHEL8.5 kernel sets the initial @ra_pages of
backing_dev_info to VM_READAHEAD_PAGES:
struct backing_dev_info *bdi_alloc(int node_id)
{
	...
	bdi->ra_pages = VM_READAHEAD_PAGES;
	bdi->io_pages = VM_READAHEAD_PAGES;
	...
}

As a result, @ra_pages of the file readahead state is set
from @bdi->ra_pages, which takes readahead out of Lustre's control
and wrongly triggers the readahead logic in the Linux kernel. This
causes sanity 101j to fail.

In this patch, we force @ra_pages of backing_dev_info to
0 after setting up the backing device info. This disables
kernel readahead in the super block.

This patch also cleans up the unnecessary setting of @ra_pages in
llite "file.c" and "vvp_io.c".

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: If6468109620269c1e76abe3a1cd73c3b40a417a8
Reviewed-on: https://review.whamcloud.com/45712
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15293 build: Add build support for arm64 centos8 91/45691/4
Xinliang Liu [Mon, 22 Nov 2021 01:34:34 +0000 (01:34 +0000)]
LU-15293 build: Add build support for arm64 centos8

This patch adds lbuild support for the latest Arm64 CentOS 8.4 and 8.5.

Also fix the build not using the Lustre-provided kernel config file
on CentOS8.

Test-Parameters: trivial

Test-Parameters: env=SANITY_EXCEPT="101j" \
 clientdistro=el8.5 serverdistro=el8.5 testlist=sanity

Change-Id: I95c7aa7e77ea1cc7a99fdaacc2220e14d2db6185
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/45691
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-12678 o2iblnd: convert ibp_refcount to a kref 85/45685/4
James Simmons [Tue, 30 Nov 2021 18:21:49 +0000 (13:21 -0500)]
LU-12678 o2iblnd: convert ibp_refcount to a kref

This refcount is used exactly like a kref, so change it to one.
kref uses refcount_t, which will warn on increment-from-zero and
similar problems (when enabled with a CONFIG option), so we don't
need the LASSERT calls.
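
As a rough userspace analogue of the get/put pattern that a kref provides, the sketch below uses C11 atomics. All names (demo_ref, demo_peer and so on) are illustrative and are not the kernel kref API, and the sketch does not implement the refcount_t warnings mentioned above.

```c
#include <stdatomic.h>

/* Userspace stand-in for a kref; the names here are illustrative only. */
struct demo_ref {
	atomic_int count;
};

static void demo_ref_init(struct demo_ref *r)
{
	atomic_init(&r->count, 1);
}

static void demo_ref_get(struct demo_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Drop one reference; call release() and return 1 when it was the last. */
static int demo_ref_put(struct demo_ref *r, void (*release)(struct demo_ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

struct demo_peer {
	struct demo_ref ref;
	int released;
};

static void demo_peer_release(struct demo_ref *r)
{
	/* ref is the first member, so the cast recovers the container. */
	struct demo_peer *p = (struct demo_peer *)r;

	p->released = 1;
}
```

The release callback runs exactly once, from whichever put drops the count to zero, so callers do not need their own assertions around the counter.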

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I23ade8c2f768c70a1fd330e8c173e0d18f5ff976
Reviewed-on: https://review.whamcloud.com/45685
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15234 lnet: Race on discovery queue 70/45670/8
Chris Horn [Mon, 29 Nov 2021 17:38:48 +0000 (11:38 -0600)]
LU-15234 lnet: Race on discovery queue

If the discovery thread clears the LNET_PEER_DISCOVERING bit then a
race window opens when the discovery thread drops the
lnet_peer.lp_lock spinlock and closes when the discovery thread
acquires the lnet_net_lock. If another thread queues the peer for
discovery during this window then the LNET_PEER_DISCOVERING bit is
added back to the peer state, but since the peer is already on the
lnet.ln_dc_working queue, it does not get added to the
lnet.ln_dc_request queue.

When the discovery thread acquires the lnet_net_lock/EX, it sees that
the LNET_PEER_DISCOVERING bit has not been cleared, so it does not
call lnet_peer_discovery_complete() which is responsible for sending
messages on the peer's discovery pending queue.

At this point, the peer is stuck on the lnet.ln_dc_working queue, and
messages may continue to accumulate on the peer's
lnet_peer.lp_dc_pendq.

Fix the issue by re-working the main discovery thread loop so that we
do not release the lnet_peer.lp_lock until after we've determined
whether we need to call lnet_peer_discovery_complete().
This ensures that the lnet_peer is correctly removed from the
discovery work queue and any messages on the peer's
lnet_peer.lp_dc_pendq are sent or finalized.

It is also possible for the lnet_peer.lp_dc_error to be cleared
during the aforementioned window, as well as during the time when
lnet_peer_discovery_complete() is processing the contents of the
lnet_peer.lp_dc_pendq. This could prevent messages on the
lnet_peer.lp_dc_pendq from being correctly finalized. To fix this
issue, the responsibilities of lnet_peer_discovery_error() were
incorporated into lnet_peer_discovery_complete().

HPE-bug-id: LUS-10615
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3779a342de7108105c2fd2bc41373560e8e5ef14
Reviewed-on: https://review.whamcloud.com/45670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15279 ptlrpc: use a cached value 61/45661/5
Alexey Lyashkov [Thu, 25 Nov 2021 18:12:21 +0000 (21:12 +0300)]
LU-15279 ptlrpc: use a cached value

Don't calculate the early reply size; use the cached value,
as it doesn't change after start.

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I3a6bd5013d0646b6165db52d6a7fb38b263756e6
Reviewed-on: https://review.whamcloud.com/45661
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15175 tests: fix ldev tests 13/45613/6
Elena Gryaznova [Thu, 2 Dec 2021 11:31:04 +0000 (14:31 +0300)]
LU-15175 tests: fix ldev tests

generate_ldev_conf() and the tests that use this function do not
work on setups where the OST targets are not all located on one OSS,
and do not work on failover setups where
  <facet>_HOST != <facet>failover_HOST.

Fixes: 0f17fc82a89a ("LU-7060 ldev: Added MGS NID substitution to ldev")
Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-2495
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ief38df0b1a0ffa37a8e7a4545a69a453d6dba7bd
Reviewed-on: https://review.whamcloud.com/45613
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15095 target: lbug_on_grant_miscount module parameter 21/45521/5
Vladimir Saveliev [Wed, 10 Nov 2021 08:40:50 +0000 (11:40 +0300)]
LU-15095 target: lbug_on_grant_miscount module parameter

Some tests have hit "lctl: error invoking upcall" when setting the
lbug_on_grant_miscount tunable parameter.  Instead, define the
lbug_on_grant_miscount flag as a ptlrpc module parameter,
similar to how it is done for ldiskfs_track_declares_assert.

Change-Id: I9cd0f9fa75b37539b23443bbcbb3445c87318ab1
Fixes: bb5d81ea95 ("LU-14543 target: prevent overflowing of tgd->tgd_tot_granted")
Test-Parameters: trivial
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45521
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15195 ofd: missing OST object 59/45459/6
Vitaly Fertman [Thu, 4 Nov 2021 14:28:49 +0000 (17:28 +0300)]
LU-15195 ofd: missing OST object

As the OST-MDT resync may not be finished by the end of recovery,
a new enqueue for a write op may fail due to an absent
object. Return EINPROGRESS so that the enqueue is resent until the
object is resynced.

To avoid getting stuck forever in case of a disappeared MDT or a double
failure, return EINPROGRESS during the hard failover timeout only.

Also, clean up replay-ost-single test 12:
- eliminate the need for the hard failover
- no need for a special obd_fail_loc, just use replay_barrier
- createmany is able to create files with unique names,
  so no special steps are needed

HPE-bug-id: LUS-10267
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I5f16b63454c51ad8d112770c15c7e6e7f41f3c40
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/45459
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15137 socklnd: expect two control connections maximum 61/45461/4
Serguei Smirnov [Thu, 4 Nov 2021 18:35:43 +0000 (11:35 -0700)]
LU-15137 socklnd: expect two control connections maximum

As a result of connecting to ourselves, e.g. pinging own nid,
two control type connections are established vs. just one
in case of connecting externally.
Fix the control connection counter to be able to handle that.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 71b2476e ("LU-12815 socklnd: add conns_per_peer parameter")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Idce01d81e3924226b5b163d2472cbcd4f6eb5819
Reviewed-on: https://review.whamcloud.com/45461
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
6 months agoLU-15109 tests: different quota and usage relations 57/45257/6
Sergey Cheremencev [Fri, 15 Oct 2021 12:23:53 +0000 (15:23 +0300)]
LU-15109 tests: different quota and usage relations

Add sanity-quota_1i that covers the following cases:
- User is above PQ limit and the quota limit is cleared.
  User should now be able to write.
- User is below PQ limit and the quota limit is lowered
  below current usage. User should not be able to write.
- User is above PQ limit and the quota limit is raised
  above current usage. Should now be able to write.

Change-Id: Iad81c706aaf838cacfdf2971ee100950c47d1585
HPE-bug-id: LUS-9935
Test-Parameters: testlist=sanity-quota env=ONLY=1i
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45257
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-9516 tests: fix sanity test_24v 72/45172/3
Andreas Dilger [Fri, 8 Oct 2021 23:01:00 +0000 (17:01 -0600)]
LU-9516 tests: fix sanity test_24v

The "lfs getdirstripe -c" command will return stripes=0 for
unstriped directories.  Handle this when calculating free_inodes
to avoid creating zero files for this test.

Speed cleanup of test_24v and other users of simple_cleanup_common()
by using unlinkmany to delete files if the file count is provided.

Use stack_trap consistently and don't do both manual and exit cleanup.

Test-Parameters: trivial testlist=sanity env=ONLY=24v
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I25105a5d0ab719d41bf41cff0aaea6d00a9c4059
Reviewed-on: https://review.whamcloud.com/45172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14965 ldiskfs: rhel7.6 inode mutex for ldiskfs_orphan_add 65/45165/2
Bobi Jam [Fri, 8 Oct 2021 09:21:22 +0000 (16:21 +0700)]
LU-14965 ldiskfs: rhel7.6 inode mutex for ldiskfs_orphan_add

See the following warning:

ldiskfs/namei.c:3331 ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
Call Trace:
dump_stack+0x19/0x1b
__warn+0xd8/0x100
warn_slowpath_null+0x1d/0x20
ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
ldiskfs_xattr_inode_orphan_add+0xbb/0x110 [ldiskfs]
ldiskfs_xattr_delete_inode+0x5c/0x350 [ldiskfs]
ldiskfs_evict_inode+0x1a8/0x630 [ldiskfs]
evict+0xb4/0x180
iput+0xfc/0x190
osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
lu_object_free.isra.27+0xb8/0x1c0 [obdclass]
lu_object_put+0xa5/0x460 [obdclass]
mdt_object_put+0x30/0x110 [mdt]
mdt_reint_unlink+0x8e0/0x1890 [mdt]
mdt_reint_rec+0x83/0x210 [mdt]
mdt_reint_internal+0x720/0xaf0 [mdt]
mdt_reint+0x67/0x140 [mdt]
tgt_request_handle+0x7ea/0x1750 [ptlrpc]
ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21

Need to hold inode mutex on the external EA for ldiskfs_orphan_add()
to soothe the warning.

This is a port of:

Lustre-commit: 7d3b5d9fdc766411eacaed27fb2fd9250800f096
Lustre-change: https://review.whamcloud.com/44754

Test-Parameters: trivial
Fixes: f64e9f19f68e ("LU-12977 ldiskfs: properly take inode_lock() for truncates")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I47a01862793afaac1d7c311f1b6d65d2cf4bb93f
Reviewed-on: https://review.whamcloud.com/45165
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13717 sec: fix handling of encrypted file with long name 63/45163/3
Sebastien Buisson [Tue, 5 Oct 2021 14:51:52 +0000 (16:51 +0200)]
LU-13717 sec: fix handling of encrypted file with long name

The ciphertext representation of the name of an encrypted file or
directory can be up to 256 bytes of binary data, if the cleartext
name is up to NAME_MAX. But then this ciphertext is encoded via
critical_encode() before being sent to servers. Once encoded, the
length can exceed NAME_MAX because of the escaped critical
characters.
So make sure ll_prep_md_op_data() accepts these overlong encoded names
if it is called for lookup or create of an encrypted file or
directory. In the other cases, the 'name' taken as input is the plain
text version, so it must conform to the NAME_MAX limit.

When carrying out operations on an encrypted file with long name, we
manipulate a digested form whose hash needs to be matched against the
content of the LinkEA. The name found in the LinkEA is not NUL
terminated, so this aspect must be taken care of.
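
A minimal sketch of comparing a length-delimited name that has no trailing NUL against an ordinary C string is shown below. It illustrates the general idiom only: the actual patch matches a hash of the name, and name_matches() is a hypothetical helper, not a Lustre function.

```c
#include <stddef.h>
#include <string.h>

/* Compare a length-delimited name (no trailing NUL, as described for the
 * LinkEA entry) against an ordinary NUL-terminated string. */
static int name_matches(const char *name, size_t namelen, const char *cstr)
{
	return strlen(cstr) == namelen && memcmp(name, cstr, namelen) == 0;
}
```

The length check comes first so that memcmp() never reads past the end of the shorter buffer, and so that bytes following the name in the record are never interpreted as part of it.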

Fixes: 4d38566a00 ("LU-13717 sec: filename encryption")
Fixes: ed4a625d88 ("LU-13717 sec: filename encryption - digest support")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4b0e51eee5e549ab56292fe0fec3c1be1b487fc7
Reviewed-on: https://review.whamcloud.com/45163
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15009 ofd: continue precreate if LAST_ID is less on MDT 84/44984/7
Lai Siyao [Thu, 16 Sep 2021 21:49:33 +0000 (17:49 -0400)]
LU-15009 ofd: continue precreate if LAST_ID is less on MDT

It's possible that precreate succeeded on the OST, but the MDT didn't get
the reply and assumed failure. In this case, the LAST_ID on the MDT is
smaller than that on the OST. Instead of reporting an error and stopping
precreate, it's better to move the precreate window forward.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia6ca418ec0ea6797b7eccc1610879331307fad07
Reviewed-on: https://review.whamcloud.com/44984
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14677 sec: remove MIGRATION_ compatibility defines 57/44957/11
Sebastien Buisson [Fri, 10 Sep 2021 12:03:03 +0000 (14:03 +0200)]
LU-14677 sec: remove MIGRATION_ compatibility defines

Remove the MIGRATION_* compatibility flags and use
LLAPI_MIGRATION_* everywhere.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iab2a2f6dfc435377e9db0d4963547841b2cbc403
Reviewed-on: https://review.whamcloud.com/44957
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14677 sec: no encryption key migrate/extend/resync/split 24/44024/24
Sebastien Buisson [Thu, 17 Jun 2021 13:31:44 +0000 (15:31 +0200)]
LU-14677 sec: no encryption key migrate/extend/resync/split

Allow some layout operations on encrypted files, even when the
encryption key is not available:
- lfs migrate
- lfs mirror extend
- lfs mirror resync
- lfs mirror verify
- lfs mirror split
We allow these access patterns to applications that know what they are
doing, by using the specific flags O_FILE_ENC and O_DIRECT.

Also add sanity-sec test_59a,b,c to exercise these access patterns.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ieaeee0e5bf7643f18d775fe6daa5e31c2f349f8c
Reviewed-on: https://review.whamcloud.com/44024
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-13783 osd-ldiskfs: use alloc_file_pseudo to create fake files 76/43876/20
James Simmons [Wed, 8 Dec 2021 22:13:40 +0000 (17:13 -0500)]
LU-13783 osd-ldiskfs: use alloc_file_pseudo to create fake files

With kallsyms_lookup_name() no longer exported in 5.8+ kernels,
the workaround used to set up the security handling broke.
Currently osd-ldiskfs will crash due to security_alloc() never
being called. The solution is to use alloc_file_pseudo() instead
to create our fake file.

Change-Id: Ib417ebdda7d9829a231c568022618154c273f3e6
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43876
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-14704 tests: disable opencache for sanity/29 77/43777/14
Alex Zhuravlev [Tue, 25 May 2021 04:13:55 +0000 (07:13 +0300)]
LU-14704 tests: disable opencache for sanity/29

otherwise lock counting is not quite correct

Fixes: 41d99c4902 ("LU-10948 llite: Introduce inode open heat counter")

Test-Parameters: trivial

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia73e8aa4a16b7ced29490d41c8eac4ee839a3406
Reviewed-on: https://review.whamcloud.com/43777
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-11388 test: enable replay-single test_131b 21/40421/7
Vikentsi Lapa [Tue, 27 Oct 2020 14:39:58 +0000 (14:39 +0000)]
LU-11388 test: enable replay-single test_131b

The issue is fixed, so this commit verifies the fix.

Test-Parameters: trivial env=ONLY=131 testlist=replay-single fstype=zfs
Signed-off-by: Vikentsi Lapa <vlapa@whamcloud.com>
Change-Id: I609146172c1fee2a955d5c41f623c8b8c2ffaeaa
Reviewed-on: https://review.whamcloud.com/40421
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-15357 mdd: fix changelog context leak 31/45831/4
Mikhail Pershin [Sat, 11 Dec 2021 12:49:47 +0000 (15:49 +0300)]
LU-15357 mdd: fix changelog context leak

The mdd_changelog_clear() shouldn't skip llog_ctxt_put()
in case of error.
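
The usual shape of such a fix is a single exit label that drops the reference on every path. The sketch below shows that pattern with hypothetical stand-ins (demo_ctxt_get(), demo_ctxt_put(), demo_clear()); it is not the mdd or llog code.

```c
#include <errno.h>

/* Hypothetical stand-ins for a refcounted context; not the llog API. */
struct demo_ctxt {
	int refs;
};

static struct demo_ctxt *demo_ctxt_get(struct demo_ctxt *c)
{
	c->refs++;
	return c;
}

static void demo_ctxt_put(struct demo_ctxt *c)
{
	c->refs--;
}

/* Every exit path after the get goes through 'out', so the reference is
 * dropped on failure as well as on success. */
static int demo_clear(struct demo_ctxt *c, int fail)
{
	struct demo_ctxt *ctxt = demo_ctxt_get(c);
	int rc = 0;

	if (fail) {
		rc = -EINVAL;
		goto out;
	}

	/* ... normal processing would go here ... */
out:
	demo_ctxt_put(ctxt);
	return rc;
}
```

Returning directly from the error branch instead of jumping to out would leave refs at 1, which is the kind of leak described above.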

Fixes: 6b183927e1 (LU-14553 changelog: eliminate mdd_changelog_clear warning)
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9c9aa3ce0d11e8f67470b450d007f2a1081644c6
Reviewed-on: https://review.whamcloud.com/45831
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15252 mdt: reduce contention at mdt_lsom_update 09/45709/5
Alexander Boyko [Thu, 2 Dec 2021 09:43:54 +0000 (04:43 -0500)]
LU-15252 mdt: reduce contention at mdt_lsom_update

mot_som_mutex serializes all close requests with lsom updates for
the same mdt_object. Under a massive open/read/close load on a single
shared file, this leads to a high load average because many threads sleep
on the mutex.
This patch introduces a cached lsom size, and uses the mutex for the
update part only. Close requests with an lsom size less than or equal to
the cached size do not take the mutex at all.
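
The fast path described here can be sketched in userspace with a pthread mutex and a C11 atomic. This assumes the cached size only ever grows; the structure and function names are hypothetical and do not match the mdt_object fields.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical object; field names are illustrative, not the mdt_object ones. */
struct demo_obj {
	pthread_mutex_t lock;		/* serializes the real update */
	_Atomic uint64_t cached_size;	/* last size written under the lock */
};

/* Return 1 if the size was updated, 0 if the fast path skipped the mutex. */
static int demo_size_update(struct demo_obj *o, uint64_t size)
{
	int updated = 0;

	/* Fast path: nothing to do if the cached size already covers it. */
	if (size <= atomic_load(&o->cached_size))
		return 0;

	pthread_mutex_lock(&o->lock);
	/* Re-check under the lock, another thread may have raced ahead. */
	if (size > atomic_load(&o->cached_size)) {
		/* ... the expensive on-disk update would happen here ... */
		atomic_store(&o->cached_size, size);
		updated = 1;
	}
	pthread_mutex_unlock(&o->lock);

	return updated;
}
```

Closes that report a size no larger than the cached one return before touching the mutex, so only callers that actually grow the size contend on it.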

Test results: MPI open/flock/funlock/close SSF,
10 iterations, 10 nodes with 100 threads each, 1000 file ops per thread

      close time (secs)    MDT load average
      master    patch      master    patch
avg   0.142     0.086      47.05     38.89
max   0.164     0.129      49.39     44.77
min   0.097     0.041      44.44     34.7

Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I807b468b128295df9391b0467e74d4f10240662e
Reviewed-on: https://review.whamcloud.com/45709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-7372 tests: re-enable replay-dual test_26 82/43982/2
Andreas Dilger [Fri, 11 Jun 2021 07:19:52 +0000 (01:19 -0600)]
LU-7372 tests: re-enable replay-dual test_26

Re-enable test_26 since it was just the unfortunate victim of
either test_24 or test_25 causing MDS unmount to hang.

Test-Parameters: trivial testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib944028e798488c425501f0c48bf812fc13ebbe5
Reviewed-on: https://review.whamcloud.com/43982
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15262 osd: bio_integrity_prep_fn return value processing 46/45646/3
Alexey Lyashkov [Mon, 22 Nov 2021 13:32:23 +0000 (16:32 +0300)]
LU-15262 osd: bio_integrity_prep_fn return value processing

The osd_bio_integrity_handle() function in lustre/osd-ldiskfs/osd_io.c
checks the return code of bio_integrity_prep_fn(). However, the kernel
integrity API changed between mainstream Linux 4.12 and 4.13, and
in 4.13+ (as well as in any RHEL8, including the first beta)
bio_integrity_prep() returns boolean true on success.
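
The pitfall can be sketched with two mock functions. This assumes the older call returned 0 on success and a negative errno on failure, which the message above does not spell out. All demo_* names are hypothetical; the real code calls the kernel's bio_integrity_prep().

```c
#include <errno.h>
#include <stdbool.h>

/* Mock of the older convention: 0 on success, negative errno on failure. */
static int demo_prep_old(int fail)
{
	return fail ? -EIO : 0;
}

/* Mock of the newer convention: true on success, false on failure. */
static bool demo_prep_new(int fail)
{
	return !fail;
}

/* Normalize both conventions to 0 / negative errno for the caller. */
static int demo_prep_old_rc(int fail)
{
	return demo_prep_old(fail);
}

static int demo_prep_new_rc(int fail)
{
	/* 'true' means success here, so it must not be treated as an error. */
	return demo_prep_new(fail) ? 0 : -EIO;
}
```

A caller that keeps the old "non-zero means error" test against the newer return value would take the error path on every successful call, so the result has to be normalized per kernel version.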

HPE-bug-id: LUS-10443
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I973aa8ccae024157ad863d26afc7b1264a5c7149
Reviewed-on: https://review.whamcloud.com/45646
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoNew tag 2.14.56 2.14.56 v2_14_56
Oleg Drokin [Mon, 13 Dec 2021 20:16:40 +0000 (15:16 -0500)]
New tag 2.14.56

Change-Id: I2491f69b4d4e4a7ae8ed39bef8c9806127c93d79
Signed-off-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15292 kernel: kernel update RHEL7.9 [3.10.0-1160.49.1.el7] 87/45687/2
Jian Yu [Tue, 30 Nov 2021 21:43:10 +0000 (13:43 -0800)]
LU-15292 kernel: kernel update RHEL7.9 [3.10.0-1160.49.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.49.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I356b8a8345a4a91d6d1c1a4a9b4eab4bb5afe75b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-15260 tests: numfailovers() fix 33/45633/6
Elena Gryaznova [Mon, 22 Nov 2021 15:13:07 +0000 (18:13 +0300)]
LU-15260 tests: numfailovers() fix

Fix numfailovers() to handle a comma-separated MDTS list
correctly. Without this fix, newer bash versions
report the following error:
  line 69: mds1,mds2,mds3,mds4_nums: bad substitution

Fixes: a7a2133bfa ("b=18696 new RECOVERY_RANDOM_SCALE test")
Fixes: b594948509 ("TT-59 remove . and - from the node name")
Test-Parameters: trivial testlist=recovery-random-scale
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10619
Change-Id: I4c28e3c62cada60dc1241948dc4e969e0e10ce9a
Reviewed-on: https://review.whamcloud.com/45633
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-15263 quota: fix bug in qmt_pool_recalc 32/45632/2
Sergey Cheremencev [Thu, 21 Oct 2021 20:28:01 +0000 (23:28 +0300)]
LU-15263 quota: fix bug in qmt_pool_recalc

env must not be freed until the end of qmt_pool_recalc,
because qpi_putref still needs it. Freeing it earlier causes
a panic when the last reference to the pool is dropped:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffffc08de2d7>] lu_context_key_get+0x17/0x30 [obdclass]
...
Call Trace:
 [<ffffffffc08de358>] lu_object_free.isra.30+0x68/0x170 [obdclass]
 [<ffffffffc08e1a35>] lu_object_put+0xc5/0x3e0 [obdclass]
 [<ffffffffc100e56c>] qmt_pool_free+0x30c/0x590 [lquota]
 [<ffffffffc10100b5>] qmt_pool_recalc+0x365/0x1260 [lquota]
 [<ffffffff8bac1c31>] kthread+0xd1/0xe0
 [<ffffffff8c176c37>] ret_from_fork_nospec_begin+0x21/0x21
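
The ordering constraint can be sketched in userspace C as follows. This
is an illustration under assumptions, not the Lustre code: struct env,
struct pool, pool_putref() and pool_recalc() are hypothetical stand-ins
for the environment, the pool, qpi_putref and qmt_pool_recalc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the Lustre structures. */
struct env {
	int initialized;
};

struct pool {
	int refcount;
	int freed;
};

/* Drop one reference. The final put tears the pool down,
 * and that teardown needs a live environment. */
static void pool_putref(const struct env *env, struct pool *pool)
{
	if (--pool->refcount == 0) {
		assert(env != NULL && env->initialized);
		pool->freed = 1;
	}
}

int pool_recalc(struct pool *pool)
{
	struct env *env = calloc(1, sizeof(*env));

	if (env == NULL)
		return -1;
	env->initialized = 1;

	/* ... recalculation work that uses env would go here ... */

	/* This may be the last reference, so env must still be valid. */
	pool_putref(env, pool);

	/* Only now is it safe to tear down and free the environment. */
	env->initialized = 0;
	free(env);
	return 0;
}
```

Swapping the last two steps would hand a finalised environment to the
teardown path, which is the kind of failure the trace above shows.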

HPE-bug-id: LUS-10426
Change-Id: Ic23dcb858ff811757f38948aa572c936c076e21e
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/45632
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-15208 ldiskfs: add support for Ubuntu20 kernel 5.4.0.90 47/45547/4
Li Dongyang [Fri, 12 Nov 2021 12:30:43 +0000 (23:30 +1100)]
LU-15208 ldiskfs: add support for Ubuntu20 kernel 5.4.0.90

Also fix lustre-build-ldiskfs.m4 to select the correct series file.
Kernel release versions are checked with -ge, so entries for greater
versions must come first.
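
Why the order matters with a greater-or-equal test can be sketched in C.
This is only an illustration: the real selection lives in the m4 file,
and the release numbers, series names and select_series() below are
hypothetical placeholders.

```c
#include <stddef.h>

/* Hypothetical rule table; thresholds and names are placeholders. */
struct series_rule {
	int min_release;
	const char *series;
};

/* Rules are tried top to bottom and the first match wins,
 * so the greatest threshold has to come first. */
static const struct series_rule rules[] = {
	{ 90, "series-for-90-and-later" },
	{ 80, "series-for-80-and-later" },
	{  0, "series-default" },
};

/* Return the series for a release, or NULL if no rule matches. */
const char *select_series(int release)
{
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		if (release >= rules[i].min_release)
			return rules[i].series;
	}
	return NULL;
}
```

If the entry with threshold 80 were listed above the one with threshold
90, release 90 would match it first and pick the older series.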

Change-Id: Id6b599ef5b2ea823e203aaa6a40917e49f98f4d9
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/45547
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-930 doc: update lustre.7 man page 93/45493/2
Andreas Dilger [Mon, 8 Nov 2021 21:03:24 +0000 (14:03 -0700)]
LU-930 doc: update lustre.7 man page

Update the lustre.7 man page to better describe current functionality.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I979841e597fcfa8448c708dd66d4d89d3018b1cc
Reviewed-on: https://review.whamcloud.com/45493
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Rick Mohr <mohrrf@ornl.gov>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>