Whamcloud - gitweb
fs/lustre-release.git
3 months ago LU-14903 doc: update lfs-setdirstripe man page 81/44481/2
Lai Siyao [Mon, 2 Aug 2021 11:55:12 +0000 (07:55 -0400)]
LU-14903 doc: update lfs-setdirstripe man page

Update lfs-setdirstripe man page to reflect the change of
filesystem-wide default directory layout.

Test-parameters: trivial

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I1e7818679e057add4747565a2fc850e1857cd7b0
Reviewed-on: https://review.whamcloud.com/44481
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
3 months ago LU-14899 ldiskfs: Add 5.4.136 mainline kernel support 50/44450/2
Oleg Drokin [Sat, 31 Jul 2021 04:55:40 +0000 (00:55 -0400)]
LU-14899 ldiskfs: Add 5.4.136 mainline kernel support

The changes likely appeared in an earlier release
that we may also track down and update to.

Test-Parameters: trivial
Change-Id: I92125087650109b8cc8a968b2fd95ba5f8e7f998
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44450
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
3 months ago LU-12815 socklnd: set conns_per_peer based on link speed 17/44417/4
Serguei Smirnov [Wed, 28 Jul 2021 21:47:39 +0000 (14:47 -0700)]
LU-12815 socklnd: set conns_per_peer based on link speed

Specifying conns_per_peer=0 for a ni is now used to set
the conns_per_peer as a function of the corresponding link speed
as follows:
conns_per_peer = (ilog2(Gbps) / 2 + 1)

Listed below are the resulting defaults for common link speeds:
        100Gbps, 200Gbps -> 4
        50Gbps           -> 3
        5Gbps, 10Gbps    -> 2
        less than 4Gbps  -> 1
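
The defaults listed above can be reproduced with a short sketch. This is an illustrative Python model, not the socklnd code: the function name conns_per_peer_for_speed is hypothetical, and clamping speeds below 1Gbps to 1 is an assumption made so that ilog2 is never taken of zero.

```python
def ilog2(n: int) -> int:
    """Floor of the base-2 logarithm, defined for n >= 1."""
    if n < 1:
        raise ValueError("ilog2 is only defined for n >= 1")
    return n.bit_length() - 1


def conns_per_peer_for_speed(gbps: int) -> int:
    """Hypothetical helper: conns_per_peer = ilog2(Gbps) / 2 + 1."""
    # Assumption: anything slower than 1Gbps is treated as 1Gbps.
    gbps = max(gbps, 1)
    # The division is an integer division, as in the formula above.
    return ilog2(gbps) // 2 + 1


if __name__ == "__main__":
    for speed in (1, 5, 10, 50, 100, 200):
        print(f"{speed:>3}Gbps -> {conns_per_peer_for_speed(speed)}")
```

Because the division truncates, 100Gbps (ilog2 = 6) and 200Gbps (ilog2 = 7) land on the same value.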

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ief2b33a796c180d8669bd5796b3e35ec748423a5
Reviewed-on: https://review.whamcloud.com/44417
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7] 76/44376/3
Jian Yu [Thu, 22 Jul 2021 07:26:50 +0000 (00:26 -0700)]
LU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7]

Update RHEL7.9 kernel to 3.10.0-1160.36.2.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: Ie2898b1df28c8b99ea4099e94baafe388c6aa626
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44376
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14865 utils: llog_reader.c printf type mismatch 46/44346/5
Gian-Carlo DeFazio [Tue, 20 Jul 2021 00:30:36 +0000 (17:30 -0700)]
LU-14865 utils: llog_reader.c printf type mismatch

Add (unsigned long long) cast to results of
__le64_to_cpu so that it matches the formatting (%llu)
of the enclosing printf call.

Build log message:
"llog_reader.c:887:9: error: format '%llu' expects
argument of type 'long long unsigned int', but
argument 3 has type '__u64' [-Werror=format=]"

Test-Parameters: trivial
Fixes: 9962d6f84db5 LU-14617 utils: llog_reader updatelog support
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I9549e0a0bd21727dfcc42992b693bc39a779e1a1
Reviewed-on: https://review.whamcloud.com/44346
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months ago LU-9859 lnet: fold lprocfs_call_handler functionality into lnet_debugfs_* 09/44309/4
Mr. NeilBrown [Wed, 4 Aug 2021 17:27:29 +0000 (13:27 -0400)]
LU-9859 lnet: fold lprocfs_call_handler functionality into lnet_debugfs_*

The calling convention for ->proc_handler is rather clumsy,
as a comment in fs/proc/proc_sysctl.c confirms.
Lustre has copied this convention to lnet_debugfs_{read,write},
and then provided a wrapper for handlers - lprocfs_call_handler -
to work around the clumsiness.

It is cleaner to just fold the functionality of lprocfs_call_handler()
into lnet_debugfs_* and let them call the final handler directly.

If these files were ever moved to /proc/sys (which seems unlikely) the
handling in fs/proc/proc_sysctl.c would need to be fixed too, but
that would not be a bad thing.

So modify all the functions that did use the wrapper to not need it
now that a more sane calling convention is available.

Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Change-Id: I548ed6a3179cdb7cd5c024febd3fee4709285a82
Reviewed-on: https://review.whamcloud.com/44309
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14787 libcfs: Proved an abstraction for AS_EXITING 70/44070/6
Shaun Tancheff [Thu, 22 Jul 2021 07:31:30 +0000 (02:31 -0500)]
LU-14787 libcfs: Proved an abstraction for AS_EXITING

Linux kernel v3.14-7405-g91b0abe36a7b added the AS_EXITING flag.
The AS_EXITING flag is set while the address_space mapping is exiting.

Provide an abstraction mapping_clear_exiting() to clear
the AS_EXITING flag. This balances the kernel mapping_set_exiting()
and is used for older kernels when enum mapping_flags does
not include AS_EXITING.

HPE-bug-id: LUS-9977
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib3101b7e3eb8a7fcfd0012ac27367f1e65537f5d
Reviewed-on: https://review.whamcloud.com/44070
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
3 months ago LU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1] 37/44037/2
Jian Yu [Sat, 19 Jun 2021 00:26:07 +0000 (17:26 -0700)]
LU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1]

Update SLES12 SP5 kernel to 4.12.14-122.74.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: I98952c097b14c68f744a570e5558fb21d9392ad2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44037
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months ago LU-14773 tests: skip check_network() on working node 33/44033/4
Andreas Dilger [Fri, 18 Jun 2021 20:55:51 +0000 (14:55 -0600)]
LU-14773 tests: skip check_network() on working node

Don't call check_network() (which can take several seconds per node)
if the get_param command ran successfully on all of the nodes.  The
get_param success implies the connection to the remote nodes works
properly, and completes more quickly.

For consistency with previous behavior, still call check_network() if
get_param didn't return any output, since the modules may be unloaded.
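
The decision described above can be sketched as follows. This is a hypothetical Python model of the logic only (the test framework itself is shell): the function name needs_network_check and the per-node dictionary are assumptions, and treating a single node with failed or empty output as enough to fall back to check_network() is one possible reading of the text.

```python
from typing import Mapping, Optional


def needs_network_check(get_param_results: Mapping[str, Optional[str]]) -> bool:
    """Return True if the slower check_network() step should still run.

    get_param_results maps a node name to the get_param output for that
    node, or to None if the command failed there (hypothetical layout).
    """
    if not get_param_results:
        # Nothing was queried, so nothing proves the network works.
        return True
    for output in get_param_results.values():
        # A failure or empty output (modules may be unloaded) means the
        # get_param run proves nothing about connectivity.
        if output is None or not output.strip():
            return True
    # Every node answered, so the connection to each node works.
    return False
```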

Remove some extra visual clutter from every subtest.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6a11cf8a1a6b43bebc3ff8f5506e1faac13ebbe5
Reviewed-on: https://review.whamcloud.com/44033
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14668 lnet: Lock primary NID logic 63/43563/5
Amir Shehata [Wed, 5 May 2021 18:35:06 +0000 (11:35 -0700)]
LU-14668 lnet: Lock primary NID logic

If a peer is created by Lustre, make sure to lock that peer's
primary NID. This peer can be discovered in the background.
There is no need to block until discovery is complete, as Lustre
can continue on with the primary NID it provided.

Discovery will populate the peer with the other interfaces the peer has
but will not change the peer's primary NID. It can also delete
peer NIDs that Lustre told it about (but not the primary NID).

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I677b8e01fc89a42128327645861ca6cfba4c1b1a
Reviewed-on: https://review.whamcloud.com/43563
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14668 lnet: peer state to lock primary nid 62/43562/5
Amir Shehata [Wed, 5 May 2021 01:20:54 +0000 (18:20 -0700)]
LU-14668 lnet: peer state to lock primary nid

Introduce the following two peer states:

LNET_PEER_LOCK_PRIMARY, set by Lustre to lock the primary NID
of a peer to the NID Lustre is configured with

LNET_PEER_BAD_CONFIG, set by LNet if Lustre attempts to set
a peer's Primary NID to a NID used as the primary NID of another
peer
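
How the two states interact can be illustrated with a small model. This is a sketch under stated assumptions, not the LNet implementation: the bit values, the Peer class, and the lock_primary_nid helper are all hypothetical, and only the two state names come from the text above.

```python
LNET_PEER_LOCK_PRIMARY = 1 << 0  # hypothetical bit value
LNET_PEER_BAD_CONFIG = 1 << 1    # hypothetical bit value


class Peer:
    """Minimal stand-in for an LNet peer (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.primary_nid = None
        self.state = 0


def lock_primary_nid(primary_owners, peer, nid):
    """Try to lock peer's primary NID to nid.

    primary_owners maps each primary NID to the Peer that owns it.
    Returns True on success, False if nid is already the primary NID
    of another peer (the peer is then flagged as badly configured).
    """
    owner = primary_owners.get(nid)
    if owner is not None and owner is not peer:
        peer.state |= LNET_PEER_BAD_CONFIG
        return False
    primary_owners[nid] = peer
    peer.primary_nid = nid
    peer.state |= LNET_PEER_LOCK_PRIMARY
    return True
```

In this model a second peer asking for an already-owned primary NID keeps no primary NID and carries only the bad-config flag.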

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I8c55e90ad2abd083c2fc902a04d4cd06a3412bfa
Reviewed-on: https://review.whamcloud.com/43562
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14661 obdclass: Add peer/peer NI when processing llog 10/43510/6
Chris Horn [Thu, 3 Sep 2020 20:06:08 +0000 (15:06 -0500)]
LU-14661 obdclass: Add peer/peer NI when processing llog

Construct peers when processing the config log so that LNet has
complete information about peer info stored in the config log.

These are "temporary" peers which can be overwritten by discovery.

In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to take
the LNet API mutex when adding/modifying peers. We don't want to take
the mutex while a spin lock is already being held, so drop the spin
lock prior to calling class_add_nids_to_uuid().

HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie0e35434c9b76f917c1448064c5217c821b1ad87
Reviewed-on: https://review.whamcloud.com/43510
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14661 lnet: Provide kernel API for adding peers 09/43509/5
Chris Horn [Wed, 2 Sep 2020 20:07:25 +0000 (15:07 -0500)]
LU-14661 lnet: Provide kernel API for adding peers

Implement LNetAddPeer() API to allow other kernel modules to add
peers to LNet.

Peers created via this API are not marked as having been configured
by DLC. As such, they can be overwritten by discovery.

Test-Parameters: trivial
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibb057f702ea29d60233fbd1680d8caec98064d5d
Reviewed-on: https://review.whamcloud.com/43509
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14531 osd: serialize access to object vs object destroy 33/43233/18
Alex Zhuravlev [Thu, 18 Mar 2021 08:43:06 +0000 (11:43 +0300)]
LU-14531 osd: serialize access to object vs object destroy

in osd-zfs as ZFS doesn't provide an internal mechanism for this.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5f25710a5cf1568f124733a15e77a37ffcb55434
Reviewed-on: https://review.whamcloud.com/43233
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-12815 socklnd: allow dynamic setting of conns_per_peer 63/41463/13
Serguei Smirnov [Mon, 2 Aug 2021 14:48:35 +0000 (10:48 -0400)]
LU-12815 socklnd: allow dynamic setting of conns_per_peer

Modify lnetctl and associated code to allow dynamic setting
of conns_per_peer lnd parameter per ni.

The parameter can be set for a specific active nid:
        lnetctl net set --nid 192.168.122.10@tcp --conns-per-peer=4

Or when adding a new net, taking effect on the new nid:
        lnetctl net add --net tcp --if eth0 --conns-per-peer=1

By default, conns_per_peer value specified as the module parameter
shall be used.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I11625b9ad61f0311c294001a38b7855465491aaf
Reviewed-on: https://review.whamcloud.com/41463
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14093 mgc: rework mgc_apply_recover_logs() for gcc10 84/40484/8
Alex Zhuravlev [Tue, 3 Aug 2021 14:15:10 +0000 (10:15 -0400)]
LU-14093 mgc: rework mgc_apply_recover_logs() for gcc10

rework mgc_apply_recover_logs() to use a separate buffer of
appropriate size so that gcc10 doesn't complain:
mgc_request.c:1506:24: error: argument 4 may overlap destination
        object [-Werror=restrict]
 1506 |        pos += sprintf(obdname + pos, "-%s-%s", cname, inst);

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ice863b412475e53705dc6523ab30ba613244bd90
Reviewed-on: https://review.whamcloud.com/40484
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-6142 tests: remove iam_ut binary 09/44509/3
Andreas Dilger [Thu, 5 Aug 2021 20:21:44 +0000 (14:21 -0600)]
LU-6142 tests: remove iam_ut binary

Remove iam_ut binary that was incorrectly committed many years ago.

Test-Parameters: trivial
Fixes: 6e679230f2f5 ("LU-6142 tests: Remove file iam_ut.c")
Fixes: d2d56f38da01 ("make HEAD from b_post_cmd3")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2c254d990a3f07cad4feb7969e646d856b3ebbe5
Reviewed-on: https://review.whamcloud.com/44509
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14876 out: don't connect to busy MDS-MDS export 90/44390/5
Mikhail Pershin [Wed, 21 Jul 2021 15:14:01 +0000 (18:14 +0300)]
LU-14876 out: don't connect to busy MDS-MDS export

The MDS-MDS connection is missing a check for busy requests upon
reconnect, so a resent request can be executed concurrently with
the original request.

- in ptlrpc_server_check_resend_in_progress() remove exception
  for bulk requests, they can be compared by XID nowadays.
  This prevents OUT requests vs resent execution as well.
- fix messages in target_handle_connect() to report correct
  information about connection details
- in out_handle() check for last_xid only once per OUT_UPDATE
- test 110m is added to recovery-small to reproduce the issue

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I2ad183674d59a2cdeab0037bd8551c607b10ffeb
Reviewed-on: https://review.whamcloud.com/44390
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14798 lustre: Support RDMA only pages 11/44111/2
Amir Shehata [Thu, 6 Feb 2020 04:23:20 +0000 (20:23 -0800)]
LU-14798 lustre: Support RDMA only pages

Some memory architectures and CPU-offload cards with
on-board memory do not map data pages into the CPU
address space. Allow RDMA of data directly into those
pages without accessing contents.

Therefore, skip checksum calculation on
these types of pages.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I189c34893ffa500ed275f2a1f79e8fb817a2489d
lustre-change: https://review.whamcloud.com/37454
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44111
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months ago LU-14798 lnet: add LNet GPU Direct Support 10/44110/2
Amir Shehata [Thu, 6 Feb 2020 03:14:17 +0000 (19:14 -0800)]
LU-14798 lnet: add LNet GPU Direct Support

This patch exports registration/unregistration functions
which are called by the NVFS module to let the LND know
that it can call into the NVFS module to do RDMA mapping
of GPU shadow pages.

GPU priority is considered during NI selection.

Writes smaller than 4K are always RDMAed if the RDMA source is
the GPU device.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I2bfdbdd5fe3b8536e616ab442d18deace6756d57
lustre-change: https://review.whamcloud.com/37368
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44110
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months ago LU-14893 lctl: check user for changelog_deregister 32/44432/3
Emoly Liu [Fri, 30 Jul 2021 08:13:12 +0000 (16:13 +0800)]
LU-14893 lctl: check user for changelog_deregister

If no user is specified for "lctl changelog_deregister", usage
should be printed correctly.
Also, sanity.sh test_106e is modified to verify this fix.

Fixes: a15eb4f13224e ("LU-13055 mdd: per-user changelog names and mask")
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ia7f1b18e82f6b4174b9435cd67aba5f591d43ce1
Reviewed-on: https://review.whamcloud.com/44432
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14881 libcfs: Complete testing for tcp_sock_set_* 74/44374/3
Shaun Tancheff [Thu, 22 Jul 2021 08:58:44 +0000 (03:58 -0500)]
LU-14881 libcfs: Complete testing for tcp_sock_set_*

Linux commits:
  v5.7-rc6-2504-gddd061b8daed
  tcp: add tcp_sock_set_quickack

  v5.7-rc6-2508-gd41ecaac903c
  tcp: add tcp_sock_set_keepintvl

  v5.7-rc6-2509-g480aeb9639d6
  tcp: add tcp_sock_set_keepcnt

Introduced a series of helper functions that may be
back ported individually.

Test-Parameters: trivial
Fixes: 99d9638d6c ("LU-13783 libcfs: support removal of kernel_setsockopt()")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4fce67b801979ec7857265b6bd0370c05737e268
Reviewed-on: https://review.whamcloud.com/44374
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14413 test: test for overstriping for sanity 27M 40/44340/8
James Simmons [Wed, 28 Jul 2021 00:10:29 +0000 (20:10 -0400)]
LU-14413 test: test for overstriping for sanity 27M

The introduction of sanity 27M broke interop with 2.12 LTS since
over striping doesn't exist in that version. Adjust the test to
use over striping if the client supports it, otherwise just use
traditional striping.

Test-Parameters: trivial testlist=sanity env=ONLY=27M
Change-Id: I2d788a116cbb749a83d6cec36f97d06533b32421
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44340
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14740 quota: reject invalid project id on server side 39/44339/2
Wang Shilong [Mon, 19 Jul 2021 07:14:43 +0000 (15:14 +0800)]
LU-14740 quota: reject invalid project id on server side

Do a sanity check before transferring the project ID, and reject an
invalid project ID if it comes from an older client.

Test-parameters: trivial testlist=sanity-quota

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: If89e320c7808d188e615f5f0923c2322774b2ceb
Reviewed-on: https://review.whamcloud.com/44339
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-8066 obdclass: move lu_ref to debugfs 11/44311/6
James Simmons [Tue, 20 Jul 2021 20:14:09 +0000 (16:14 -0400)]
LU-8066 obdclass: move lu_ref to debugfs

A special procfs file is created for lu_ref debugging. Let's move
this to debugfs where it belongs.

Also fixed a missed USE_LU_REF due to landing order as well as
a build fix.

Fixes: dfe2d225b86 ("LU-13799 clio: Implement real list splice")
Change-Id: I33646a87adfcabc5a5f214832953b2444e7aaf0a
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44311
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14790 lnet: Reflect ni_fatal in NI status 72/44072/2
Chris Horn [Thu, 24 Jun 2021 17:16:46 +0000 (12:16 -0500)]
LU-14790 lnet: Reflect ni_fatal in NI status

If the ni_fatal_error_on flag is set on an NI then that NI should be
considered down.

HPE-bug-id: LUS-10167
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I201bda7e06da1fb1cc23db70ce0cfa3118635d0f
Reviewed-on: https://review.whamcloud.com/44072
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14694 mdt: do not remove orphans at umount 83/43783/20
Alex Zhuravlev [Tue, 25 May 2021 15:39:14 +0000 (18:39 +0300)]
LU-14694 mdt: do not remove orphans at umount

as it's very likely that another MDT is being umounted as well
and such a removal can get stuck if the object being removed
is a striped directory.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0417b1b4447887e166c144605bbfa3249126eacd
Reviewed-on: https://review.whamcloud.com/43783
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-9859 libcfs: discard cfs_cap_t, use kernel_cap_t 71/43171/9
Mr. NeilBrown [Wed, 14 Jul 2021 16:15:26 +0000 (12:15 -0400)]
LU-9859 libcfs: discard cfs_cap_t, use kernel_cap_t

Lustre only sends 32 bits of capabilities in on-the-wire RPC calls.
It currently strips off higher bits and uses a 32-bit cfs_cap_t
throughout.
Though there is a small memory cost, it is cleaner to use
kernel_cap_t throughout and only truncate when marshalling
data for RPC calls.

So this patch replaces cfs_cap_t with kernel_cap_t throughout,
and where a cfs_cap_t was previously stored in a __u32, we now
store cap.cap[0] instead.
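
The truncation at marshalling time can be pictured with a small model. This is an illustrative Python sketch, not the Lustre code: it assumes the capability set is held as two 32-bit words with the low word first, and the helper names to_cap_words and caps_for_wire are hypothetical.

```python
U32_MASK = 0xFFFFFFFF


def to_cap_words(cap_bits: int) -> list:
    """Split a 64-bit capability mask into two 32-bit words, low word first."""
    return [cap_bits & U32_MASK, (cap_bits >> 32) & U32_MASK]


def caps_for_wire(cap_words: list) -> int:
    """Keep only the low word, the counterpart of cap.cap[0] in the text."""
    return cap_words[0]


if __name__ == "__main__":
    low_bit, high_bit = 21, 39  # arbitrary example bit numbers
    caps = to_cap_words((1 << low_bit) | (1 << high_bit))
    # Only the bit below 32 survives on the wire.
    print(hex(caps_for_wire(caps)))
```

The full set stays available in memory, and bits numbered 32 or above are dropped only at the point where the value is packed for an RPC.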

With this, we can remove include/linux/libcfs/curproc.h

Linux-commit: 18f92a6e3d6bd00941ddfb5837835348f72d39dc

Change-Id: If7dd7a16c218dfc0d520e189f021ed6bda3b93fd
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/43171
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10973 lnet: LUTF Python infra 87/38087/50
Amir Shehata [Wed, 25 Mar 2020 02:23:43 +0000 (19:23 -0700)]
LU-10973 lnet: LUTF Python infra

Added the python LUTF infrastructure. The python infrastructure
provides the core LUTF feature set. The tests-infra is lnet
specific infrastructure to be used by LUTF test suites.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I1d0336606625424880f1b64b1dd296d4c7ed85ea
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38087
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-10973 lnet: LUTF infrastructure updates 77/44177/3
Amir Shehata [Mon, 5 Jul 2021 18:17:16 +0000 (11:17 -0700)]
LU-10973 lnet: LUTF infrastructure updates

Fix Agent management
Handle python failures properly.
Change default location for temporary files to be in /tmp/lutf

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4e37b6226dfa12de4b7a1f5bfd87f84e91ee1dda
Reviewed-on: https://review.whamcloud.com/44177
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-6142 lustre: use list_first_entry() in lustre subdirectory. 38/44338/2
Mr. NeilBrown [Sun, 18 Jul 2021 12:57:33 +0000 (08:57 -0400)]
LU-6142 lustre: use list_first_entry() in lustre subdirectory.

Convert
  list_entry(foo->next .....)
to
  list_first_entry(foo, ....)

in 'lustre'

In several cases the call is combined with
a list_empty() test and list_first_entry_or_null() is used.

Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Change-Id: I27b8b55cac2cfeaf95bb66930958c49ad422156e
Reviewed-on: https://review.whamcloud.com/44338
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14382 mdt: implement fallocate in MDC/MDT 18/41418/11
Mikhail Pershin [Mon, 1 Feb 2021 21:16:17 +0000 (00:16 +0300)]
LU-14382 mdt: implement fallocate in MDC/MDT

- add CLIO fallocate() handling in MDC
- implement FALLOCATE RPC handling at MDT side
- update test group 150 in sanity to work with
  sanity-dom.sh test

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I46c25c6c7fd72fe14d558b594ecc63c2a3ad81b2
Reviewed-on: https://review.whamcloud.com/41418
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14844 tests: make sure mgc_requeue_timeout_min exist. 15/44215/13
James Simmons [Wed, 14 Jul 2021 17:07:59 +0000 (13:07 -0400)]
LU-14844 tests: make sure mgc_requeue_timeout_min exist.

The module parameter mgc_requeue_timeout_min was introduced to reduce
testing times. Currently the test framework always attempts to set this
value, but it doesn't exist in earlier Lustre versions, which breaks
interop testing. Set the module parameter only if it exists.
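
The guard can be sketched as follows. This is a hypothetical Python illustration of the idea, not the test-framework change (which is shell): the function name set_module_param_if_present and the sysfs_root argument are assumptions, and the parameter is assumed to be exposed under the usual /sys/module/<module>/parameters/ layout.

```python
import os


def set_module_param_if_present(module, param, value, sysfs_root="/sys/module"):
    """Write value to a module parameter only if the parameter exists.

    Returns True if the parameter was written, False if it is absent
    (for example on an older version that never defined it).
    """
    path = os.path.join(sysfs_root, module, "parameters", param)
    if not os.path.exists(path):
        # Older versions lack the parameter: skip it instead of failing.
        return False
    with open(path, "w") as param_file:
        param_file.write(str(value))
    return True
```

A caller would pass "mgc" and "mgc_requeue_timeout_min", and treat a False return as "nothing to tune" rather than as an error.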

Change-Id: I64f62e3d6e2faeba99ced98363d241083f95d92e
Fixes: 04b2da6180d ("LU-14516 mgc: configurable wait-to-reprocess time")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44215
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14900 tests: do not fail if kmemleak tunable is not writable 51/44451/3
Oleg Drokin [Sat, 31 Jul 2021 05:43:58 +0000 (01:43 -0400)]
LU-14900 tests: do not fail if kmemleak tunable is not writable

Change-Id: Id77430f1e8ff7a8fda6538211a0d36bbe973a889
Test-Parameters: trivial
Fixes: 15c0a21ea9 ("tests: Add kmemleak awareness to test-framework")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44451
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 months ago LU-14880 libcfs: Use crypto/sha2.h if available 73/44373/4
Shaun Tancheff [Thu, 22 Jul 2021 07:45:37 +0000 (02:45 -0500)]
LU-14880 libcfs: Use crypto/sha2.h if available

As of Linux commit a24d22b225ce158651378869a6b88105c4bdb887
   crypto: sha - split sha.h into sha1.h and sha2.h

sha.h is removed and sha2.h or sha3.h is preferred.

Test-Parameters: trivial
Fixes: a813e81870 ("LU-12275 sec: add llcrypt as file encryption library")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iead5e1cb23e79a400da3cbfeb4c35c834e821d62
Reviewed-on: https://review.whamcloud.com/44373
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-14093 gss: gcc10 fixes for GSS 63/44363/4
James Simmons [Wed, 21 Jul 2021 16:41:10 +0000 (12:41 -0400)]
LU-14093 gss: gcc10 fixes for GSS

Building with gcc10 reports the following issues when building
GSS:

gss_util.h:37: multiple definition of `this_realm';
gssd.h:73: multiple definition of `clnt_list';
svcgssd.h:38: multiple definition of `krb_enabled';

Properly scope these variables.

Change-Id: I05fc298fb90d67314c6963273688c2577099188a
Test-Parameters: env=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44363
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13299 lnet: add "stats reset" to lnetctl 50/44150/2
Cyril Bordage [Tue, 6 Jul 2021 12:26:09 +0000 (14:26 +0200)]
LU-13299 lnet: add "stats reset" to lnetctl

This new command resets the stats shown by "lnetctl stats show". It can
be useful when debugging connectivity issues, because changes in the
stats are easier to detect from a clean state than on top of
historical values.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I4195a862fa5e04d96ac4c2b1509b625c90fbb579
Reviewed-on: https://review.whamcloud.com/44150
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14806 o2iblnd: clear fatal error on successful failover 39/44139/4
Serguei Smirnov [Mon, 5 Jul 2021 18:23:33 +0000 (11:23 -0700)]
LU-14806 o2iblnd: clear fatal error on successful failover

In an IB bonding configuration, a link down event causes the fatal error
flag to be set on the bonded interface so that it is not selected by
LNet for tx, e.g. when just one of the two cables is pulled.
This change allows the interface status to be restored on
successful failover.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ifd55b141e73d01a187c02ede3f021f0eab18e0bb
Reviewed-on: https://review.whamcloud.com/44139
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14792 llite: enable filesystem-wide default LMV 90/44090/17
Lai Siyao [Mon, 21 Jun 2021 03:52:01 +0000 (11:52 +0800)]
LU-14792 llite: enable filesystem-wide default LMV

This change includes three parts:
1. save dir depth to ROOT after lookup on client side.
2. once space balanced default LMV is set on ROOT, and
   max-inherit/max-inherit-rr is unlimited or not less than directory
   depth, new directories will be created in QOS or roundrobin mode.
3. set ROOT default LMV max-inherit to unlimited and max-inherit-rr to
   3, and increase the ratio of subdirectories created on the local MDT
   with the directory depth to ROOT, so that new directories will be
   created by space usage, and the deeper a directory is located, the
   more likely it is to be created on the local MDT; the top 3 layers
   will be created in roundrobin mode if the system is balanced.
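
The inheritance rules in parts 2 and 3 can be sketched as a small decision
function. This is an illustration only, not the llite/LMV code: the names
UNLIMITED, within_limit and placement_mode, the -1 sentinel, the mode labels,
and the assumption that a depth equal to the limit still qualifies are all
hypothetical, chosen to mirror the wording above.

```python
UNLIMITED = -1  # hypothetical sentinel for "unlimited"


def within_limit(limit, depth):
    """True if the limit is unlimited or not less than the directory depth."""
    return limit == UNLIMITED or limit >= depth


def placement_mode(depth, max_inherit=UNLIMITED, max_inherit_rr=3, balanced=True):
    """Pick a (hypothetical) placement label for a new directory at `depth`."""
    if not within_limit(max_inherit, depth):
        # The default LMV is no longer inherited this deep (assumed label).
        return "not-inherited"
    if balanced and within_limit(max_inherit_rr, depth):
        return "roundrobin"
    return "qos"


print(placement_mode(1), placement_mode(3), placement_mode(4))
```

With the defaults described in part 3 (max-inherit unlimited, max-inherit-rr
of 3), the sketch picks roundrobin for the top three levels of a balanced
system and space-usage (QOS) placement below that.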

Set default LMV in mkdir_on_mdt() to make sure its subdirectories are
created on the same MDT. Add sanity 413d.

Create a test directory on MDT0 for pjdfstest, because cross-MDT
rename of symlink will migrate symlink to target MDT, which will cause
inode change (LU-11631).

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib3a133ac99655ca04443b9498e6618033f6b88b9
Reviewed-on: https://review.whamcloud.com/44090
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14621 mdd: fix lock-tx order in mdd_xattr_merge() 66/43366/15
Alex Zhuravlev [Mon, 19 Apr 2021 05:42:26 +0000 (08:42 +0300)]
LU-14621 mdd: fix lock-tx order in mdd_xattr_merge()

Follow the common transaction-first-then-locks rule.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I437c919e498781b8d5dc653bf90aac799df4882a
Reviewed-on: https://review.whamcloud.com/43366
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13417 mdd: set default LMV on ROOT 53/38553/34
Lai Siyao [Wed, 28 Apr 2021 15:02:23 +0000 (23:02 +0800)]
LU-13417 mdd: set default LMV on ROOT

To balance MDT usage, set default LMV on ROOT if it's not set. The
default stripe offset is "-1", and default stripe count is "1".
Directories created by "mkdir" under ROOT will then be scattered across
all MDTs by usage.

Add sanity 0e.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I7a6c752256225b8d065b2c304c4725268df28045
Reviewed-on: https://review.whamcloud.com/38553
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoNew tag 2.14.53 2.14.53 v2_14_53
Oleg Drokin [Fri, 30 Jul 2021 19:20:13 +0000 (15:20 -0400)]
New tag 2.14.53

Change-Id: Idff781cb6333d4f0af90c0e729f3cd0231022a5a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag 13/40813/4
Qian Yingjin [Tue, 1 Dec 2020 08:22:08 +0000 (16:22 +0800)]
LU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag

The upcoming new feature PCC-RO is combined with FLR and extends
the on-disk data structure 'enum lov_comp_md_flags' for layout
components. It adds a new layout flag: LCM_FL_PCC_RDONLY.

enum lov_comp_md_flags {
        LCM_FL_NONE             = 0x0,
        LCM_FL_RDONLY           = 0x1,
        LCM_FL_WRITE_PENDING    = 0x2,
        LCM_FL_SYNC_PENDING     = 0x3,
        LCM_FL_PCC_RDONLY       = 0x8,
        LCM_FL_FLR_MASK         = 0xB,
};

The LCM_FL_PCC_RDONLY flag, which is dedicated for PCC-RO, is
different from LCM_FL_RDONLY.
A PCC-RO cached file could be in the state:
- LCM_FL_PCC_RDONLY | LCM_FL_RDONLY: it means that all FLR
  components are synced and in up-to-date state. The replicated
  file is in read-only state. And then one client attaches the
  file into the PCC backend with PCC-RO mode.
- LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING: it means the file was
  once modified, and the data content of the layout components is not
  synced. MDT has already picked a primary replica and marked
  other components as STALE. At this time, a client can still
  PCC-RO attach the file. On this client, the primary component
  and the PCC copy are both in up-to-date state.

As a new LCM_FL_PCC_RDONLY flag is added, an old client may not
understand this new FLR layout flag, which may result in
inconsistent data access.

This patch adds this new flag for the purpose of compatibility and
interoperability.
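
As a rough illustration of the two states described above, the flag values
from the enum can be combined and decoded as follows. This is a sketch in
Python rather than the kernel code; FLR_STATE_BITS and describe_flags are
hypothetical names, and treating the low two bits as the FLR state is an
assumption read off the listed values.

```python
# Values copied from enum lov_comp_md_flags above.
LCM_FL_NONE = 0x0
LCM_FL_RDONLY = 0x1
LCM_FL_WRITE_PENDING = 0x2
LCM_FL_SYNC_PENDING = 0x3
LCM_FL_PCC_RDONLY = 0x8
LCM_FL_FLR_MASK = 0xB

# Assumption: the FLR state lives in the low two bits (values 0x0..0x3).
FLR_STATE_BITS = 0x3

_STATE_NAMES = {
    LCM_FL_NONE: "NONE",
    LCM_FL_RDONLY: "RDONLY",
    LCM_FL_WRITE_PENDING: "WRITE_PENDING",
    LCM_FL_SYNC_PENDING: "SYNC_PENDING",
}


def describe_flags(flags):
    """Return (FLR state name, PCC-RO bit set?) for a flags value."""
    state = _STATE_NAMES[flags & FLR_STATE_BITS]
    pcc_ro = bool(flags & LCM_FL_PCC_RDONLY)
    return state, pcc_ro


print(describe_flags(LCM_FL_PCC_RDONLY | LCM_FL_RDONLY))
print(describe_flags(LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING))
```

The mask value 0xB is consistent with this reading: it is the union of the
state bits (0x3) and the new PCC-RO bit (0x8).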

Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I2e96f413c0b35355503c20dfea0a39d39a216d90
Reviewed-on: https://review.whamcloud.com/40813
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14814 osc: osc: Do not flush on lockless cancel 52/44152/7
Patrick Farrell [Tue, 6 Jul 2021 15:20:56 +0000 (11:20 -0400)]
LU-14814 osc: osc: Do not flush on lockless cancel

The cancellation of an OSC lock without an LDLM lock
(a 'lockless' OSC lock) should not flush pages.  Only
direct i/o is allowed to use a lockless OSC lock, and
direct i/o does not create flushable pages.

DIO pages are not flushable because:
A) all synced ASAP, and
B) the OSC extents created for them are not added to the
extent tree which is used to track these pages.

Instead, this has the effect of trying to flush pages from
ongoing buffered i/o.  This can lead to crashes like the
following:

osc_cache_writeback_range()) ASSERTION(hp == 0 && discard == 0) failed

This assert essentially says the lock cancellation
(hp == 1) found an active i/o (an extent in the OES_ACTIVE
state).

This is not allowed because the flushing code assumes an
LDLM lock is being cancelled, which will only start once
there is no active i/o.  Because the OSC lock being
cancelled is not associated with an LDLM lock, this is not
true, and nothing prevents active i/o under a different
lock, leading to this assert.

The solution is simply to not flush pages when cancelling a
no-LDLM-lock OSC lock.

Additional note:
New lockless OSC locks cannot be created if they are
blocked by a regular OSC lock, but a new regular lock can
be created if there is a lockless lock present.

Thus, the sequence is something like this:
Direct i/o creates lockless OSC lock
Buffered i/o creates OSC and LDLM lock on the same range
Direct i/o finishes, starts cancelling its OSC lock
Buffered i/o is still ongoing, with extents in OES_ACTIVE

This results in the above crash during the OSC lock
cancellation.

Note it would be possible to resolve this issue by not
allowing lockless OSC locks to match regular OSC locks, but
this is not necessary, since there's no reason for lockless
locks to flush pages on cancellation.

Test-Parameters: env=ONLY=398b,ONLY_REPEAT=200 testlist=sanity
Test-Parameters: env=ONLY=77,ONLY_REPEAT=100 testlist=sanityn
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iceb1747b66232cad3f7e90ec271310a13a687a33
Reviewed-on: https://review.whamcloud.com/44152
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14838 osc: Remove client contention support 05/44205/5
Patrick Farrell [Fri, 9 Jul 2021 20:13:36 +0000 (16:13 -0400)]
LU-14838 osc: Remove client contention support

Lockless buffered i/o and contention detection don't work:
lockless buffered i/o is unfixable and contention detection
is broken enough that it will have to be rewritten.

Let's remove both.  This patch starts the removal by
pulling the client side support.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If8583eff176bddb33e197befb967d229f8ca5688
Reviewed-on: https://review.whamcloud.com/44205
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14838 osc: Remove lockless truncate 04/44204/5
Patrick Farrell [Fri, 9 Jul 2021 20:13:09 +0000 (16:13 -0400)]
LU-14838 osc: Remove lockless truncate

Lockless truncate does not work and cannot be made to work.

Fundamentally, it has no means of ensuring consistency
across clients because it can't force them all to drop
cached data without locking.

It's been off for years - let's just get rid of it.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia2979fb6b31a61da6d4833e9f463fcd5b6dbd718
Reviewed-on: https://review.whamcloud.com/44204
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-9859 libcfs: make lnet_debugfs_symlink_def local to libcfs/modules.c 32/44332/2
Mr. NeilBrown [Fri, 16 Jul 2021 12:24:10 +0000 (08:24 -0400)]
LU-9859 libcfs: make lnet_debugfs_symlink_def local to libcfs/modules.c

This type is only used in libcfs/module.c, so make it local to there.
If any other module ever wanted to add its own symlinks,
it would probably be easiest to export lnet_debugfs_root
and just call debugfs_create_symlink as required.

Linux-commit: c4f907719736b720aa831447828809840e533371

Test-Parameters: trivial
Change-Id: I08221cfc781735451a292ba20cd35b9172fc20f2
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/44332
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
4 months agoLU-14637 flr: get rid of excluding dom+flr support test 85/44185/8
Bobi Jam [Thu, 8 Jul 2021 14:34:18 +0000 (22:34 +0800)]
LU-14637 flr: get rid of excluding dom+flr support test

Now that DoM+FLR are supported, fix the tests that expect this
combination of features on a file to fail.

Fixes: 0bff64be320fd ("LU-9771 flr: to not support dom+flr for phase 1")
Fixes: 44a721b8c1063 ("LU-11421 dom: manual OST-to-DOM migration via mirroring")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9fc76e797e469744107e5d0453b78729226be0ee
Reviewed-on: https://review.whamcloud.com/44185
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-14789 tests: make sanity 133f and 133g working 84/44184/4
Cyril Bordage [Thu, 8 Jul 2021 14:35:44 +0000 (16:35 +0200)]
LU-14789 tests: make sanity 133f and 133g working

Tests sanity 133f and 133g were doing nothing after change 38567.
Because the argument given to badarea_io was not a path, 0 was always
returned. This patch finds the complete path of the parameters
returned by "lctl get_param" and gives them to badarea_io.

Fixes: 1c54733894f8 ("LU-10401 tests: add -F so list_param prints entry type")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I7a8914e2950d5a8b2a93948c4fbe889520a3198c
Reviewed-on: https://review.whamcloud.com/44184
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months agoLU-14788 lnet: check memdup_user_nul using IS_ERR 91/44091/4
Cyril Bordage [Mon, 28 Jun 2021 13:13:21 +0000 (15:13 +0200)]
LU-14788 lnet: check memdup_user_nul using IS_ERR

Crash in __proc_lnet_portal_rotor. memdup_user_nul returns an ERR_PTR
on error, not a NULL pointer. IS_ERR and PTR_ERR functions have to be
used to check and return the correct error code. The fix has been
applied in other locations having the wrong check.

Fixes: 67af976c806 ("LU-14428 libcfs: discard cfs_trace_copyin_string()")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I1fabf2499b6bbee7b94a2f802fbcbd9270d901b3
Reviewed-on: https://review.whamcloud.com/44091
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13055 doc: update changelog manpages 22/44022/5
Mikhail Pershin [Thu, 17 Jun 2021 14:11:51 +0000 (17:11 +0300)]
LU-13055 doc: update changelog manpages

Add lctl-changelog_register.8 and lctl-changelog_deregister.8
manpages and update lctl.8 manpage to refer to them.

Fixes: 15305c3c3fe7 ("LU-12214 build: fix build without lustre_utils")
Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ie41db630c72f61a884cd8000e0a4aeeb42ca60eb
Reviewed-on: https://review.whamcloud.com/44022
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14748 build: gcc9 fix address of packed member warning 61/43961/4
Shaun Tancheff [Wed, 9 Jun 2021 16:25:21 +0000 (11:25 -0500)]
LU-14748 build: gcc9 fix address of packed member warning

gcc9 complains about use of __swabXXs() with some packed
structures.
Use __swabXX() for these cases.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I7437c6f21d38c209ef452b41760aad6d1d3d6034
Reviewed-on: https://review.whamcloud.com/43961
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14740 llite: avoid project quota overflow 39/43939/8
Wang Shilong [Mon, 7 Jun 2021 15:40:22 +0000 (23:40 +0800)]
LU-14740 llite: avoid project quota overflow

Currently, project ID is stored as u32, max possible
value for it is 4294967295.

However, the VFS reserves the max value for special usage; see the
following function:

  static inline bool
  qid_has_mapping(struct user_namespace *ns, struct kqid qid)
  {
          return from_kqid(ns, qid) != (qid_t) -1;
  }

So qid_has_mapping() could return 0 for id 4294967295.
A chown test confirms this:

  $ chown 4294967295:4294967295 c.sh
  chown: invalid user: ‘4294967295:4294967295’
  $ chown 4294967294:4294967294 c.sh

Fix this by checking the max possible value for project ID on the
client kernel side, and add a test case for it.
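
The range check amounts to rejecting the one u32 value that the VFS treats
as "no mapping". A minimal sketch in Python, not the llite code; the names
INVALID_PROJID and projid_is_valid are hypothetical:

```python
# (qid_t)-1 reinterpreted as an unsigned 32-bit value.
INVALID_PROJID = 0xFFFFFFFF


def projid_is_valid(projid):
    """Accept any u32 project ID except the reserved all-ones value."""
    return 0 <= projid < INVALID_PROJID


print(projid_is_valid(4294967294), projid_is_valid(4294967295))
```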

Test-parameters: trivial testlist=sanity-quota
Fixes: 7b5c1f1404c3 ("LU-13845 utils: Quota id 0xFFFFFFFF is invalid")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ide8b9cc79d9b7f2a8b9860a0c0f683ec903b8f91
Reviewed-on: https://review.whamcloud.com/43939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14430 mdd: rename mti_fid to mdi_fid and friends 40/43740/5
Andreas Dilger [Tue, 18 May 2021 17:56:32 +0000 (11:56 -0600)]
LU-14430 mdd: rename mti_fid to mdi_fid and friends

Rename mdd_thread_info fields to avoid confusion with mdt_thread_info.
The final patch to rename mdd_thread_info fields to a unique prefix:

  mti_cattr->mdi_cattr
  mti_fid->mdi_fid
  mti_fid2->mdi_fid2
  MTI_KEEP_KEY->MDI_KEEP_KEY
  mti_la_for_fix->mdi_la_for_fix
  mti_la_for_start->mdi_la_for_start
  mti_pattr->mdi_pattr
  mti_tattr->mdi_tattr
  mti_tpattr->mdi_tpattr

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I17bcc3ddfae400a5ca76e4f654c696da6d3ebbe5
Reviewed-on: https://review.whamcloud.com/43740
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13717 sec: handle null algo for filename encryption 88/43388/6
Sebastien Buisson [Thu, 25 Mar 2021 16:55:35 +0000 (17:55 +0100)]
LU-13717 sec: handle null algo for filename encryption

Encrypted files created with Lustre 2.14 have clear text file names.
With new code implementing filename encryption, newly created files
will have cipher text names, unless they are in an encrypted directory
created in Lustre 2.14.

So we need to make sure llcrypt library can properly handle the "null"
algorithm for client side filename encryption, which is basically a
no-op.
Handling this "null" algo for filename encryption will not be possible
with the in-kernel fscrypt library, so modify the behaviour of
configure to build with embedded llcrypt by default, and only build
against in-kernel fscrypt if explicitly specified via
--enable-crypto=in-kernel configure option.

The objective is to urge users to convert their encrypted directories
to the new fashion that encrypts filenames.
However, with the new code some operations on encrypted files created
with 2.14 might not be possible, like migrate, so expressly forbid
migrate on files that use the "null" algorithm for client side
filename encryption.

Finally, we revert commit 11fcbfa9de4a5170abc2c5df2a6e4e02f0f84268
("LU-12275 sec: force file name encryption policy to null") so that
new encrypted directories will enforce filename encryption.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I393945adc9b720a56544b5da0669cb2848507457
Reviewed-on: https://review.whamcloud.com/43388
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13799 osc: Improve osc_queue_sync_pages 82/39482/23
Patrick Farrell [Tue, 15 Jun 2021 14:23:04 +0000 (10:23 -0400)]
LU-13799 osc: Improve osc_queue_sync_pages

This patch was split and partially done in:
https://review.whamcloud.com/38214

So the text below refers to the combination of this patch
and that one.  This patch now just improves a looped atomic
add by replacing with a single one.  The rest of the grant
calculation change is in
https://review.whamcloud.com/38214

(I am retaining the text below to show the performance
improvement)
----------
osc_queue_sync_pages now has a grant calculation component,
this has a pretty painful impact on the new faster DIO
performance.  Specifically, per page ktime_get() and the
per-page atomic_add cost close to 10% of total CPU time in
the DIO path.

We can make this per batch of pages rather than for each
page, which reduces this cost from 10% of CPU to almost
nothing.
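
The idea of accounting once per batch instead of once per page can be
sketched as below. This is an illustration in Python, not the osc code: the
Counter class stands in for an atomic variable, and all names here
(Counter, account_per_page, account_per_batch) are hypothetical.

```python
import threading


class Counter:
    """Lock-protected counter standing in for an atomic variable."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0
        self.updates = 0  # number of times the shared counter was touched

    def add(self, n):
        with self._lock:
            self.value += n
            self.updates += 1


def account_per_page(counter, npages):
    # One shared-counter update for every page.
    for _ in range(npages):
        counter.add(1)


def account_per_batch(counter, npages):
    # A single shared-counter update for the whole batch.
    counter.add(npages)


per_page, per_batch = Counter(), Counter()
account_per_page(per_page, 256)
account_per_batch(per_batch, 256)
print(per_page.value == per_batch.value, per_page.updates, per_batch.updates)
```

Both variants end with the same total; only the number of updates to the
shared counter differs.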

This improves write performance by about 10% (but has no
effect on reads, since they don't use grant).

This patch reduces i/o time in ms/GiB by:
Write: 10 ms/GiB
Read: 0 ms/GiB

Totals:
Write: 158 ms/GiB
Read: 161 ms/GiB

mpirun -np 1 $IOR -w -t 1G -b 64G -o $FILE --posix.odirect

Before patch:
write     6071

After patch:
write     6470

(Read is similar.)
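
The ms/GiB figures can be reproduced from the IOR throughput numbers,
assuming the bare "write" values above are in MiB/s like those in the other
patches of this series. The helper name ms_per_gib is hypothetical.

```python
def ms_per_gib(mib_per_s):
    """Milliseconds needed to move 1 GiB (1024 MiB) at the given MiB/s."""
    return 1024.0 / mib_per_s * 1000.0


before = ms_per_gib(6071)  # write throughput before the patch
after = ms_per_gib(6470)   # write throughput after the patch
print(round(before), round(after), round(before - after))  # prints: 169 158 10
```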

This also fixes a mistake in c24c25dc1b / LU-13419 where it
removed the shrink interval update entirely from the direct
i/o path.

Fixes: c24c25dc1b ("LU-13419 osc: Move shrink update to per-write")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Ic606e03be58239c291ec0382fa89eba64560da53
Reviewed-on: https://review.whamcloud.com/39482
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13799 clio: Skip prep for transients 48/39448/17
Patrick Farrell [Fri, 7 May 2021 19:51:32 +0000 (15:51 -0400)]
LU-13799 clio: Skip prep for transients

The work done by cpo_prep() (etc) is unnecessary for
transient pages.  This gives only a minimal performance
boost and is better seen as a step towards removing the
cl_page abstraction for transient pages.

But, it does consistently give around 1% better
performance.

This patch reduces i/o time in ms/GiB by:
Write: 1 ms/GiB
Read: 1 ms/GiB

Totals:
Write: 169 ms/GiB
Read: 161 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        6028 MiB/s
read         6305 MiB/s

Plus this patch:
write        6071 MiB/s
read         6355 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Ib94f57cde468c9aaea952e1bb89db8fcf4b35e07
Reviewed-on: https://review.whamcloud.com/39448
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13799 llite: Adjust dio refcounting 47/39447/16
Patrick Farrell [Fri, 7 May 2021 19:50:15 +0000 (15:50 -0400)]
LU-13799 llite: Adjust dio refcounting

We get a page reference in cl_page_find, then immediately
add another for cl_2queue_add and remove the first
reference.  This is pretty silly, since the life cycle is
the same on these.

This improves DIO/AIO page submission by around 2%.

This patch reduces i/o time in ms/GiB by:
Write: 2 ms/GiB
Read: 2 ms/GiB

Totals:
Write: 170 ms/GiB
Read: 162 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        5955 MiB/s
read         6218 MiB/s

Plus this patch:
write        6028 MiB/s
read         6305 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I228eca6d48c6007bbf2c8caae5e477b7d40521d1
Reviewed-on: https://review.whamcloud.com/39447
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13799 lov: Improve DIO submit 46/39446/16
Patrick Farrell [Fri, 7 May 2021 19:42:20 +0000 (15:42 -0400)]
LU-13799 lov: Improve DIO submit

Skip some unnecessary looping in page submission for the
DIO case.

This gives about a 2% improvement for AIO/DIO page
submission.

This patch reduces i/o time in ms/GiB by:
Write: 2 ms/GiB
Read: 2 ms/GiB

Totals:
Write: 172 ms/GiB
Read: 165 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        7726 MiB/s
read         5899 MiB/s

Plus this patch:
write        5954 MiB/s
read         6217 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Iedad978438ee3f1f3290d990311532626cba9e2d
Reviewed-on: https://review.whamcloud.com/39446
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13799 llite: Remove transient page counting 41/39441/15
Patrick Farrell [Sat, 29 May 2021 01:32:43 +0000 (21:32 -0400)]
LU-13799 llite: Remove transient page counting

Transient page counting is not used for anything, as
already noted in the commit message, but costs something
like 4% of the time in DIO page submission.

Remove it.

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

This patch reduces i/o time in ms/GiB by:
Write: 6 ms/GiB
Read: 11 ms/GiB

Totals:
Write: 174 ms/GiB
Read: 167 ms/GiB

With previous patches in series:
write     5703 MiB/s
read      5756 MiB/s

Plus this patch:
write     5900 MiB/s
read      6136 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I825de4f1b5d1dd1476a4a711bfa51e7d24b5027a
Reviewed-on: https://review.whamcloud.com/39441
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-13799 llite: Modify AIO/DIO reference counting 42/39442/14
Patrick Farrell [Fri, 7 May 2021 15:50:51 +0000 (11:50 -0400)]
LU-13799 llite: Modify AIO/DIO reference counting

For DIO pages, it's enough to have a reference on the
cl_object associated with the AIO.  This saves taking a
reference on the cl_object for each page, which saves about
5% of the time when doing DIO/AIO.

This is possible because the lifecycle of the aio struct is
always greater than that of the associated pages.

This patch reduces i/o time in ms/GiB by:
Write: 6 ms/GiB
Read: 1 ms/GiB

Totals:
Write: 198 ms/GiB
Read: 197 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     5030 MiB/s
read      5174 MiB/s

Plus this patch:
write     5183 MiB/s
read      5200 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I970cda20417265b4b66a8eed6e74440e5d3373b8
Reviewed-on: https://review.whamcloud.com/39442
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13326 mds: remove MDS_SETATTR_PORTAL and service 98/37798/7
Andreas Dilger [Wed, 4 Mar 2020 20:28:26 +0000 (12:28 -0800)]
LU-13326 mds: remove MDS_SETATTR_PORTAL and service

Remove the MDS_SETATTR_PORTAL and the service threads listening on
this portal, since they have been unused since Lustre 2.1 and are no
longer needed.

Remove module tunables related to the mds_attr service threads:
- mds_attr_num_threads
- mds_attr_cpu_bind
- mds_attr_num_cpts

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I64f4f3f0004e1895ef7b49b31a4ad687a1abcca2
Reviewed-on: https://review.whamcloud.com/37798
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13417 test: mkdir_on_mdt0() in more tests 15/44315/8
Lai Siyao [Thu, 8 Jul 2021 08:09:01 +0000 (16:09 +0800)]
LU-13417 test: mkdir_on_mdt0() in more tests

Replace mkdir with mkdir_on_mdt0() in several tests.

Update recovery-small test_110k() in case there are open files on
MDT1, which would cause umount to stall.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Iebc32568b7fc146b658f47c5f5053fd3db24432f
Reviewed-on: https://review.whamcloud.com/44315
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14655 lnet: Protect lpni deref in lnet_health_check 03/43503/3
Chris Horn [Wed, 28 Apr 2021 01:10:16 +0000 (20:10 -0500)]
LU-14655 lnet: Protect lpni deref in lnet_health_check

The discovery thread can modify the peer NI/peer net/peer relationship,
so we need to be careful when dereferencing the peer NI pointer in
lnet_health_check(). The discovery thread operates under the net lock, so
move the peer NI dereference under the net lock which is taken for
incrementing the health stats.

Move some of the other code that is only relevant for messages with a
health status != LNET_MSG_STATUS_OK under the appropriate condition.

HPE-bug-id: LUS-9962
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3e6763b71bcdc9281f46b79c59e40f939190d468
Reviewed-on: https://review.whamcloud.com/43503
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13417 test: generate uneven MDTs early for sanity 413 84/44384/6
Lai Siyao [Tue, 20 Jul 2021 01:24:36 +0000 (09:24 +0800)]
LU-13417 test: generate uneven MDTs early for sanity 413

Fill MDT early to generate uneven MDTs for sanity test_413, and
add test_413z to unlink these directories.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-1
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I84e3670bb40c3666488139d6a272f29188b0dfae
Reviewed-on: https://review.whamcloud.com/44384
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-14868 llite: revert 'simplify callback handling for async getattr' 71/44371/6
Andreas Dilger [Wed, 21 Jul 2021 23:38:37 +0000 (23:38 +0000)]
LU-14868 llite: revert 'simplify callback handling for async getattr'

This reverts commit cbaaa7cde45f59372c75577d7274f7e2e38acd24.

This is causing process hangs and timeouts during file removal.

Test-Parameters: trivial
Fixes: cbaaa7cde4 ("LU-14139 llite: simplify callback handling for async getattr")
Change-Id: I77f5bc460850bfe7a5143e22b0c5f3e14a40474a
Reviewed-on: https://review.whamcloud.com/44371
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-14826 mdt: getattr_name("..") under striped directory 68/44168/6
Lai Siyao [Thu, 8 Jul 2021 14:25:51 +0000 (10:25 -0400)]
LU-14826 mdt: getattr_name("..") under striped directory

For getattr_name(".."), it should return FID of the master object for
striped directories. This includes changes on both client and server:
* lmv_getattr_name() should use master object FID if it's looking up
  "..".
* mdt_raw_lookup() should check whether the parent object is a sub
  stripe; if so, it needs to look up again to get the master object FID.
  For old clients without the above change this needs to be checked twice.

This is needed by NFS export, because ll_get_parent() finds the parent by
getattr_name("..").

Reenable check_fhandle_syscall and update sanityn test_102.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I72c951293e41656ce3778750147402d7f8ca4cec
Reviewed-on: https://review.whamcloud.com/44168
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months agoLU-14833 sec: quiet spurious gss_init_svc_upcall() message 97/44197/2
Sebastien Buisson [Fri, 9 Jul 2021 12:52:40 +0000 (14:52 +0200)]
LU-14833 sec: quiet spurious gss_init_svc_upcall() message

Switch from CWARN to CDEBUG(D_SEC) for message printed by
gss_init_svc_upcall():
Init channel is not opened by lsvcgssd, following request might be
dropped until lsvcgssd is active
This message is printed whether or not GSS is enabled, and we do not
have any way to check this at the time the kernel module is loaded.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I66c8c2a16e58ca75973226c80e0f4a92c90b4025
Reviewed-on: https://review.whamcloud.com/44197
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-14114 lnet: print device status in net show command 69/44169/2
Cyril Bordage [Wed, 7 Jul 2021 13:27:54 +0000 (15:27 +0200)]
LU-14114 lnet: print device status in net show command

A device can be in a fatal state if the cable was disconnected or the
port was brought down on the switch side. In these cases, the LND
(o2iblnd for now) will flag the device as being in a fatal state. That
device will not be used any further. However, its health will not be
decremented. This causes some confusion when examining the state of
the node. It is better to print the device status in the output of the
lnetctl net show command.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I7c635ab1062f6153449fcec1bc07585065818a72
Reviewed-on: https://review.whamcloud.com/44169
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14804 nodemap: do not return error for improper ACL 27/44127/3
Sebastien Buisson [Thu, 1 Jul 2021 15:20:39 +0000 (00:20 +0900)]
LU-14804 nodemap: do not return error for improper ACL

In nodemap_map_acl(), if the ACL is incorrect, do nothing
and just return the initial size to the caller.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I26aba9ce43e4a8878bfa47e145b1b44cfff89403
Reviewed-on: https://review.whamcloud.com/44127
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14300 quota: avoid nested lqe lookup 26/43326/5
Sergey Cheremencev [Mon, 12 Apr 2021 23:44:34 +0000 (02:44 +0300)]
LU-14300 quota: avoid nested lqe lookup

When lqe_locate is called from qmt_pool_lqes_lookup for an lqe
that has no entry on disk, it calls qmt_lqe_set_default.
This may call qmt_set_id_notify->qmt_pool_lqes_spec
and overwrite lqes already added to the qti. The overwritten
lqes may trigger an assertion:

LustreError: 5072:0:(qmt_pool.c:838:qmt_pool_lqes_lookup()) ASSERTION( (((qmt_info(env)->qti_lqes_num) > 16 ? qmt_info(env)->qti_lqes : qmt_info(env)->qti_lqes_small)[(qmt_info(env)->qti_glbl_lqe_idx)])->lqe_is_global ) failed:
LustreError: 5072:0:(qmt_pool.c:838:qmt_pool_lqes_lookup()) LBUG
Pid: 5072, comm: mdt_rdpg00_003 3.10.0-957.1.3957.1.3.x4.1.15.x86_64 #1 SMP Mon Nov 18 14:47:03 PST 2019
Call Trace:
 [<ffffffffc046f62c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
 [<ffffffffc046f94c>] lbug_with_loc+0x4c/0xa0 [libcfs]
 [<ffffffffc0e4ae38>] qmt_pool_lqes_lookup+0x798/0x8f0 [lquota]
 [<ffffffffc0e3b0ce>] qmt_intent_policy+0x86e/0xe00 [lquota]
 [<ffffffffc109d53d>] mdt_intent_opc+0x3bd/0xb40 [mdt]
 [<ffffffffc10a5134>] mdt_intent_policy+0x1a4/0x360 [mdt]
 [<ffffffffc0a7bedb>] ldlm_lock_enqueue+0x3cb/0xad0 [ptlrpc]
 [<ffffffffc0aa4a46>] ldlm_handle_enqueue0+0xa56/0x1610 [ptlrpc]
 [<ffffffffc0b304b2>] tgt_enqueue+0x62/0x210 [ptlrpc]
 [<ffffffffc0b3753a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]

or a deadlock (two identical lqes in the qti_lqes array):

 call_rwsem_down_write_failed+0x17/0x30
 qti_lqes_write_lock+0xb1/0x1b0 [lquota]
 qmt_dqacq0+0x2ee/0x1ac0 [lquota]
 qmt_intent_policy+0xbfe/0xe00 [lquota]
 mdt_intent_opc+0x3ba/0xb50 [mdt]
 mdt_intent_policy+0x1a1/0x360 [mdt]
 ldlm_lock_enqueue+0x3d6/0xaa0 [ptlrpc]
 ldlm_handle_enqueue0+0xa76/0x1620 [ptlrpc]
 tgt_enqueue+0x62/0x210 [ptlrpc]
 tgt_request_handle+0x96a/0x1680 [ptlrpc]
 kthread+0xd1/0xe0

The patch adds sanity-quota test_73b to check that the issue
doesn't exist anymore.

Change-Id: Ib1ebe82c3b6e819b2538f30af08930060bd659ae
HPE-bug-id: LUS-9902
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/158581
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/43326
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14508 lfs: make mirror operations preserve timestamps 09/42009/17
John L. Hammond [Thu, 11 Mar 2021 16:02:54 +0000 (10:02 -0600)]
LU-14508 lfs: make mirror operations preserve timestamps

Save and try to restore the file timestamps around the various mirror
operations. Add sanity-flr tests 61[abc] to verify this.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I5ef754e46cfbe82c731a709209576bbfcc73af3d
Reviewed-on: https://review.whamcloud.com/42009
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-12214 build: fix SLES build/install 72/39972/20
Alexey Lyashkov [Fri, 15 Jan 2021 15:21:15 +0000 (10:21 -0500)]
LU-12214 build: fix SLES build/install

Red Hat and SUSE can use different library package names for the same
devel package, so drop the strict requirement on the library package
name and let rpm use its autoprovide option.

Test-Parameters: trivial
Test-Parameters: clientdistro=sles15sp1 ossdistro=el7.7 mdsdistro=el8.2
HPE-bug-id: LUS-7204
Fixes: e1bf37870d LU-12214 build: fix build with gss enabled
Fixes: d746e64fe1 LU-13562 build: SUSE build support for azure
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I7e0fe83f9090e7616ab156fa75fed4821099406e
Reviewed-on: https://review.whamcloud.com/39972
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-12022 tests: error on resync failure sanity-flr 54/35754/7
James Nunez [Tue, 15 Jun 2021 17:14:49 +0000 (11:14 -0600)]
LU-12022 tests: error on resync failure sanity-flr

In sanity-flr test 200, we should error if the final resync
fails.  Replace all calls to 'mirror_io resync' that do
not inject an error with '$LFS mirror resync'.

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I9b2ec1beb7060086808b7529467bef80c8e9659f
Reviewed-on: https://review.whamcloud.com/35754
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
4 months ago LU-6142 libcfs: checkpatch cleanup of libcfs fail.c 07/44207/4
James Simmons [Sat, 10 Jul 2021 14:54:23 +0000 (10:54 -0400)]
LU-6142 libcfs: checkpatch cleanup of libcfs fail.c

Resolve several checkpatch issues reported for fail.c. This brings
us into alignment with the native Linux client version.

Test-Parameters: trivial
Change-Id: I71e71f48a94fa20756f7696b5fbf115c919d05d3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44207
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-6142 lnet: convert kiblnd/ksocknal_thread_start to vararg 22/44122/3
Mr NeilBrown [Thu, 1 Jul 2021 03:19:29 +0000 (13:19 +1000)]
LU-6142 lnet: convert kiblnd/ksocknal_thread_start to vararg

Rather than requiring the caller to format a thread name into a temp
buffer, change these thread_start functions to accept a format and
args, and to hand them directly to kthread_run().

This is done with a macro rather than a function as the functions are
trivial and varargs is slightly easier with macros.
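
The forwarding idea can be sketched in user space as follows. This is
only an illustration, not the code from the patch: fake_thread_run(),
thread_start() and THREAD_NAME_MAX are hypothetical names, and
fake_thread_run() merely formats the name where kthread_run() would
create a thread.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define THREAD_NAME_MAX 16

/* Hypothetical stand-in for kthread_run(): it only formats the name into
 * @out and returns the formatted length, or -1 if it did not fit. */
static int fake_thread_run(char *out, size_t len, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(out != NULL && len > 0);
	va_start(args, fmt);
	n = vsnprintf(out, len, fmt, args);
	va_end(args);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* The format string travels inside __VA_ARGS__, so the caller no longer
 * needs a temporary buffer and the macro stays plain C99. */
#define thread_start(out, ...) \
	fake_thread_run((out), THREAD_NAME_MAX, __VA_ARGS__)
```

A caller would then write thread_start(name, "worker_%02d_%02d", cpt, i)
instead of an snprintf() into a local buffer followed by the start call.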

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I73926ef38a9e84061d1a3f9acf5c0be4a247f957
Reviewed-on: https://review.whamcloud.com/44122
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-6142 lnet: discard lnet_current_net_count 89/44089/2
Mr NeilBrown [Mon, 28 Jun 2021 06:22:02 +0000 (16:22 +1000)]
LU-6142 lnet: discard lnet_current_net_count

The variable lnet_current_net_count is never used.  So remove it.
The function lnet_get_net_count() is only used to update that
variable, so remove it too.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id61f381f6220356c5b96c8a5822d8748a8ba43a4
Reviewed-on: https://review.whamcloud.com/44089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14217 osd-zfs: allow SEEK_HOLE/DATA only with sync 70/40970/2
Mikhail Pershin [Tue, 15 Dec 2020 11:47:20 +0000 (14:47 +0300)]
LU-14217 osd-zfs: allow SEEK_HOLE/DATA only with sync

ZFS doesn't report a valid offset for SEEK_DATA if there is dirty
data, but may report SEEK_HOLE correctly. This causes unreliable
results, where the same offset can be reported as HOLE (correctly) and
also as DATA (incorrectly, because of the switch to the generic
approach, which assumes the whole file is data with a hole beyond the
end of file).

To avoid that we have to sync dirty data when dmu_offset_next()
reports EBUSY and repeat the lseek call. Considering that this can
cause a slowdown, this behavior is controlled via the new
'sync_on_lseek' option. With this option turned off, osd-zfs reports
that it doesn't support SEEK_DATA/HOLE, because we cannot use
unreliable results in our tools to copy sparse data.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic92c127628ce517a9c2f79f595a1d16116930383
Reviewed-on: https://review.whamcloud.com/40970
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14805 llite: No locked parallel DIO 31/44131/3
Patrick Farrell [Fri, 2 Jul 2021 17:24:48 +0000 (13:24 -0400)]
LU-14805 llite: No locked parallel DIO

If we are doing locked DIO, the OSC & LDLM locks are
released at the end of cl_io_loop, ie, before we wait for
parallel DIO at the llite layer.

This is problematic because the locks are released before
i/o done using them is complete; this can lead to data
inconsistencies.  (And at least one LBUG, see LU-14805.)

The easiest solution for now is only do parallel DIO when
working lockless (which is the default; DIO only switches
to locked to manage conflicts with buffered i/o).

This problem & fix apply to AIO as well as parallel DIO.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If98a0551d6dde54220b406b26e978e284a6b1ebf
Reviewed-on: https://review.whamcloud.com/44131
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13440 utils: fix handling of lsa_stripe_off = -1 30/43530/11
Andreas Dilger [Tue, 4 May 2021 01:25:23 +0000 (19:25 -0600)]
LU-13440 utils: fix handling of lsa_stripe_off = -1

Use LMV_OFFSET_DEFAULT instead of "-1" for parsing lfs_setdirstripe()
since parse_targets() will return "(__u32)-1" to the caller for the
stripe index, but lsa_stripe_off is a signed long long so it is
interpreted as 4294967295.  This causes the parsing to fail when
"lfs setdirstripe -i -1 --max-inherit-rr 1" is used.

Update sanity test_413a/413c to also specify "-i -1" to verify this.
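
The signedness mismatch described above can be reproduced in a few
lines of standalone C. This is an illustrative sketch only:
parse_index() is a hypothetical stand-in for parse_targets(), and
OFFSET_DEFAULT stands in for LMV_OFFSET_DEFAULT, which is assumed here
to be an unsigned 32-bit all-ones value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for LMV_OFFSET_DEFAULT; the real definition lives in
 * the Lustre headers and is not reproduced here. */
#define OFFSET_DEFAULT ((uint32_t)-1)

/* Hypothetical stand-in for parse_targets(): reports "no specific index"
 * as an unsigned 32-bit all-ones value. */
static uint32_t parse_index(void)
{
	return (uint32_t)-1;
}

/* Buggy check: once the unsigned 32-bit value has been stored in a
 * signed 64-bit field it is 4294967295, so it never equals -1. */
static int is_default_buggy(long long stripe_off)
{
	return stripe_off == -1;
}

/* Fixed check: compare against the same unsigned 32-bit constant. */
static int is_default_fixed(long long stripe_off)
{
	assert(stripe_off >= -1);
	return stripe_off == OFFSET_DEFAULT;
}
```

Storing the result of parse_index() in a long long and passing it to
both helpers shows the difference: only the second one recognises the
default.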

In sanity test 413a,413b and 413c, create "qos" directory on most
full directory, so that its subdirectories won't be created on the
same MDT.

Fixes: f167f78e3bfd ("LU-13440 lmv: add default LMV inherit depth")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic934f859173155b1b2df56fcd315c8da633ebbe5
Reviewed-on: https://review.whamcloud.com/43530
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14541 llite: avoid stale data reading 76/43476/5
Wang Shilong [Wed, 28 Apr 2021 14:26:10 +0000 (22:26 +0800)]
LU-14541 llite: avoid stale data reading

remove_mapping() can fail to remove a page from the page cache when
the page refcount != 2, so clear the uptodate flag in
vvp_page_delete() to avoid reading stale data later.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I322debec951b1a342246475456c0f40e10b0e578
Reviewed-on: https://review.whamcloud.com/43476
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14828 tests: Remove extra debug 75/44175/3
Patrick Farrell [Wed, 7 Jul 2021 16:44:52 +0000 (12:44 -0400)]
LU-14828 tests: Remove extra debug

Accidentally committed 398m with extra debug.
This is sometimes causing OOMs in testing, and it's a
mistake anyway.

Fixes: cba07b68f9 ("LU-13798 llite: parallelize direct i/o issuance")
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I734aa3a952d2c085b3fc0014af1bdc0e881000e6
Reviewed-on: https://review.whamcloud.com/44175
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-14817 build: __xa_set_mark is not checked anymore 38/44138/3
Vitaly Fertman [Sat, 3 Jul 2021 09:25:14 +0000 (12:25 +0300)]
LU-14817 build: __xa_set_mark is not checked anymore

LC__XA_SET_MARK does not check for __xa_set_mark anymore after
LU-9859, however the result variable still exists and its value
has changed from 'no' to 'yes'.

Test-Parameters: trivial
Fixes: 84e12028be ("LU-9859 libcfs: add support for Xarray")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I24fffe7f2727b1d892ec3cabfc6e65ae8f68e024
Reviewed-on: https://review.whamcloud.com/44138
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14808 utils: fix YAML support for DOM files 33/44133/3
Vitaly Fertman [Tue, 15 Jun 2021 14:47:25 +0000 (17:47 +0300)]
LU-14808 utils: fix YAML support for DOM files

LFS getstripe never reports LLAPI_LAYOUT_DEFAULT for any stripe
parameter, but 0 or -1, whichever is appropriate.

LU-3285 added extra verification for the DOM parameters; specifically,
the stripe count, size and offset make no sense for DOM and are
expected to be LLAPI_LAYOUT_DEFAULT. However, this breaks the YAML
support, which uses getstripe output as the wanted values.

Also move the sanity-flr test_6 to ALWAYS_EXCEPT due to LU-14818.

Fixes: 6744eb8eeb ("LU-3285 lfs: add parameter for Data-on-MDT file")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
HPE-bug-id: LUS-10090
Change-Id: Ide0c0fc264c7d1bac487306edf896d90153cf768
Reviewed-on: https://es-gerrit.dev.cray.com/158810
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/44133
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14786 lod: create missing debugfs file 13/44113/3
James Simmons [Tue, 29 Jun 2021 17:14:39 +0000 (13:14 -0400)]
LU-14786 lod: create missing debugfs file

While cleaning up debugfs symlinks the needed, but unused lod debugfs
directory was dropped. This results in the broken symlink

/sys/kernel/debug/lustre/lov/lustre-MDT0000-mdtlov

lctl params handling didn't see this due to glob returning only valid
directory entries so the error didn't get reported by stat(). Restore
the debugfs directory and add a new test to conf-sanity to detect any
potential breakage in the future.

Change-Id: I8fe0732d6caeeb83554833205998e24214343f88
Test-Parameters: env=ONLY=10a testlist=conf-sanity
Fixes: 462d476d ("LU-8066 obd: cleanup server sysfs symlinks handling")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44113
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14677 sec: migrate/extend/split on encrypted file 78/43878/6
Sebastien Buisson [Fri, 28 May 2021 16:11:53 +0000 (18:11 +0200)]
LU-14677 sec: migrate/extend/split on encrypted file

lfs migrate/extend/split makes use of volatile files to swap layouts.
When operation is carried out on an encrypted file, the volatile file
must be assigned the same encryption context as the original file, so
that data moved/copied to different OSTs is identical to the original
file's.
Also update sanity-sec test_52 to exercise these commands.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I3878b5e9e6d3738dfee0ce0f89a3646e6a7b976f
Reviewed-on: https://review.whamcloud.com/43878
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14430 mdd: rename mti_oa to mdi_oa and friends 39/43739/5
Andreas Dilger [Thu, 13 May 2021 10:42:20 +0000 (04:42 -0600)]
LU-14430 mdd: rename mti_oa to mdi_oa and friends

Rename fields in mdd_thread_info to avoid confusion with mdt_thread_info.
The second patch of several to rename all mdd_thread_info fields
to use a more unique field prefix:

  mti_dof->mdi_dof
  mti_dt_rec->mdi_dt_rec
  mti_ent->mdi_ent
  mti_flags->mdi_flags
  mti_hint->mdi_hint
  mti_key->mdi_key
  mti_link_data->mdi_link_data
  mti_name->mdi_name
  mti_oa->mdi_oa
  mti_range->mdi_range
  mti_spec->mdi_spec

The mti_lmv and mti_lrl fields are removed since they are unused.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6fd4b7f26b7e9561d8a8585eaa5438d6093ebbe5
Reviewed-on: https://review.whamcloud.com/43739
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14430 mdd: rename mti_big_buf to mdi_big_buf 38/43738/6
Andreas Dilger [Thu, 13 May 2021 10:27:49 +0000 (04:27 -0600)]
LU-14430 mdd: rename mti_big_buf to mdi_big_buf

Avoid serious confusion with the MDT mti_big_buf, and other fields
in mdd_thread_info, since they are two separate buffers completely.

  mti_big_buf->mdi_big_buf
  mti_chlg_buf->mdi_chlg_buf
  mti_link_buf->mdi_link_buf
  mti_xattr_buf->mdi_xattr_buf

The first patch of several to rename all mdd_thread_info fields.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib0ec91c8481e747ed058afe5c08c3f60203ebbe5
Reviewed-on: https://review.whamcloud.com/43738
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-10499 pcc: introducing OBD_CONNECT2_PCCRO flag 91/40791/8
Qian Yingjin [Mon, 30 Nov 2020 02:08:17 +0000 (10:08 +0800)]
LU-10499 pcc: introducing OBD_CONNECT2_PCCRO flag

Add a new connection flag OBD_CONNECT2_PCCRO to keep access
consistent for old clients without PCC-RO support.

By necessity, also include definitions for OBD_CONNECT2_MODE_CONVERT
and OBD_CONNECT2_BATCH_RPC so obd_connect_names[] works.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I19716e94a86e53353c1628d414c92e61e084dfc9
Reviewed-on: https://review.whamcloud.com/40791
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
4 months ago LU-14780 llite: failed ASSERTION(ldlm_has_layout(lock)) 54/44054/2
Bobi Jam [Fri, 4 Jun 2021 03:58:29 +0000 (11:58 +0800)]
LU-14780 llite: failed ASSERTION(ldlm_has_layout(lock))

When setting the layout under the layout lock, the lock could lose its
layout bits, and we'd try to fetch the layout lock again.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I10f96e4cb03cfe228d3c1ea1500b1a8d8e4e5e54
Reviewed-on: https://review.whamcloud.com/44054
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14459 mdt: support fixed directory layout 91/43291/7
Lai Siyao [Wed, 28 Apr 2021 21:30:00 +0000 (05:30 +0800)]
LU-14459 mdt: support fixed directory layout

User may not want directories split automatically in some cases:
* directory migrated.
* directory restriped.

To support this, an LMV flag LMV_HASH_FLAG_FIXED is added, and it will
be set on migrated/restriped directories. NB, if directory is migrated
or restriped to a one-stripe directory, it won't be transformed into a
plain directory, because this flag needs to be kept.

Update sanity 230q.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Icd12b2aa34d391e32c3323a8b9c24449ea3e3d0e
Reviewed-on: https://review.whamcloud.com/43291
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14459 mdt: restripe parent may be a stripe 90/43290/8
Lai Siyao [Mon, 12 Apr 2021 03:30:13 +0000 (11:30 +0800)]
LU-14459 mdt: restripe parent may be a stripe

mdt_restripe() checks parent LMV sanity with lmv_is_sane(), but the
parent may be a stripe, so use lmv_is_sane2() instead.

Clear lmv_migrate_hash/offset in layout shrink/update, though it
won't cause any issue, it's strange to see values set in debug logs.

Add more race checks between directory restripe, auto-split and
migration.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I38950a07a8c9a8b4b20a2fd7aff229d27dbb403c
Reviewed-on: https://review.whamcloud.com/43290
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14459 llite: reset pfid after dir migration 89/43289/7
Lai Siyao [Mon, 12 Apr 2021 03:17:37 +0000 (11:17 +0800)]
LU-14459 llite: reset pfid after dir migration

A plain directory will be turned into a stripe upon
migration/restripe, and conversely, if the target is a plain directory,
the target stripe will be turned into a directory afterwards.

In the first case, set pfid, and in the latter case, clear pfid,
otherwise ll_lock_cancel_bits() will use the wrong master inode.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I01cac0103dc79d493166e6b090508d24f9678a57
Reviewed-on: https://review.whamcloud.com/43289
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14739 quota: nodemap squashed root cannot bypass quota 88/43988/7
Sebastien Buisson [Fri, 11 Jun 2021 14:49:47 +0000 (16:49 +0200)]
LU-14739 quota: nodemap squashed root cannot bypass quota

When root on client is squashed via a nodemap's squash_uid/squash_gid,
its IOs must not bypass quota enforcement as it normally does without
squashing.
So on client side, do not set OBD_BRW_FROM_GRANT for every page being
used by root. And on server side, check if root is squashed via a
nodemap and remove OBD_BRW_NOQUOTA.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I95b31277273589e363193cba8b84870f008bb079
Reviewed-on: https://review.whamcloud.com/43988
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14733 o2iblnd: Avoid double posting invalidate 90/44190/3
Mike Marciniszyn [Wed, 7 Jul 2021 19:16:01 +0000 (15:16 -0400)]
LU-14733 o2iblnd: Avoid double posting invalidate

When the kib_tx is provisioned during kiblnd_fmr_pool_map(), spare
WRs in the kib_fast_reg_descriptor are setup and the mapping of
pages is given to the mr.

kiblnd_post_tx_locked() then posts the spare WRs from the
kib_fast_reg_descriptor.

	if (rc == 0)
		return 0;

The code returns and the kib_fast_reg_descriptor still contains
the spare WRs.  The next time the kib_tx is used, the
now obsolete WRs will be inadvertently posted.   For rdmavt, the
obsolete invalidate will cause an -EINVAL to be returned from
the post send.

Fix by adding a state variable frd_posted to the kib_fast_reg_descriptor.
The variable is set to false in kiblnd_fmr_pool_unmap().
kiblnd_post_tx_locked() is adjusted to avoid prepending the
kib_fast_reg_descriptor WRs when frd_posted is true.   After
the post succeeds, the frd_posted is set to true.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Change-Id: I426dd05e635392e75d1aa48808782a229e83ce5f
Reviewed-on: https://review.whamcloud.com/44190
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14733 o2iblnd: Move racy NULL assignment 89/44189/2
Mike Marciniszyn [Wed, 7 Jul 2021 19:16:00 +0000 (15:16 -0400)]
LU-14733 o2iblnd: Move racy NULL assignment

kiblnd_fmr_pool_unmap() can race map and subsequent processing
because of this flaw in unmap:

	if (frd) {
		frd->frd_valid = false;
		spin_lock(&fps->fps_lock);
		list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
		spin_unlock(&fps->fps_lock);
		fmr->fmr_frd = NULL;
	}

The fmr can be pulled off the list in kiblnd_fmr_pool_unmap() on
another CPU and fmr_frd could be in a state of flux and
potentially be seen incorrectly later on as the kib_tx is processed.

Fix by moving the fmr_frd assignment to before the fmr is added to the
list.
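
The corrected ordering might look like the following user-space
sketch. It is an assumption-laden model rather than the patch itself:
struct frd_node, struct frd_pool, struct fmr_model and
pool_model_unmap() are hypothetical, a pthread mutex stands in for the
spinlock, and the node is pushed on the head of a singly linked list
instead of using list_add_tail().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct frd_node {
	struct frd_node *next;
	int valid;
};

struct frd_pool {
	pthread_mutex_t lock;
	struct frd_node *free_list;
};

struct fmr_model {
	struct frd_node *frd;
};

/* Detach the descriptor from its owner before it becomes visible on the
 * shared free list, so no other thread can pick it up while the owner
 * still holds a pointer to it. */
static void pool_model_unmap(struct frd_pool *pool, struct fmr_model *fmr)
{
	struct frd_node *frd;

	assert(pool != NULL && fmr != NULL);
	frd = fmr->frd;
	if (frd == NULL)
		return;

	frd->valid = 0;
	fmr->frd = NULL;	/* done first, outside the window of reuse */

	pthread_mutex_lock(&pool->lock);
	frd->next = pool->free_list;
	pool->free_list = frd;
	pthread_mutex_unlock(&pool->lock);
}
```

Everything that touches the owner's state happens before the list
insertion, so the insertion is the last point at which this thread
refers to the descriptor.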

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Change-Id: Ibddf132a363ecfe9db3cc06287cec873c021d2fb
Reviewed-on: https://review.whamcloud.com/44189
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14798 lnet: RMDA infrastructure updates 09/44109/2
Amir Shehata [Thu, 6 Feb 2020 01:46:03 +0000 (17:46 -0800)]
LU-14798 lnet: RMDA infrastructure updates

Add infrastructure to force RDMA for payloads < 4K.
Add infrastructure to extract the first page in a
payload. Useful for determining the type of the payload
to be transmitted.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id7dc26c83f00dadd26feca94fc4d8233872650d3
Lustre-change: https://review.whamcloud.com/37453
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44109
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-14779 utils: no DNS lookups for NID in get_param 56/44056/4
Andreas Dilger [Wed, 23 Jun 2021 08:20:24 +0000 (02:20 -0600)]
LU-14779 utils: no DNS lookups for NID in get_param

Calling libcfs_str2nid() speculatively in "lctl get_param" to see
if there is a NID in the parameter name results in multiple DNS
lookups for invalid hostnames (e.g. "exports.192.168.0.10"). That
may take a very long time if there are a large number of connected
clients, and if the DNS server is overloaded or is having problems.

Instead of doing these speculative NID conversions, skip the whole
NID string in the parameter name for the two known parameters that
may contain a NID ("*.exports.<NID>.*" and "*.MGC<NID>.*").  This
is considerably faster since it is only working on a local string.

If new parameters are added that contain a NID (unlikely, but
possible), then "clean_path()" would need to be updated as part
of that change.
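
A purely string-based skip of this kind could look like the sketch
below. It is not the clean_path() implementation: skip_nid() and
dots_to_slashes() are hypothetical, the cleanup is assumed to amount
to turning '.' separators into '/', and a NID is assumed to have the
form <address>@<network> and to end at the next '.'.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer just past a NID that starts at @nid, or @nid itself
 * if no '@' is found.  No name resolution is attempted. */
static char *skip_nid(char *nid)
{
	char *at = strchr(nid, '@');

	if (at == NULL)
		return nid;
	return at + strcspn(at, ".");
}

/* Replace '.' separators with '/' in place, leaving any NID that follows
 * "exports." or "MGC" untouched. */
static void dots_to_slashes(char *path)
{
	char *p = path;

	assert(path != NULL);
	while (*p != '\0') {
		if (strncmp(p, "exports.", 8) == 0) {
			p[7] = '/';
			p = skip_nid(p + 8);
		} else if (strncmp(p, "MGC", 3) == 0) {
			p = skip_nid(p + 3);
		} else {
			if (*p == '.')
				*p = '/';
			p++;
		}
	}
}
```

Because the function only scans the local string, its cost does not
depend on the DNS server or on the number of connected clients.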

Fixes: 85cbe1a3ee69 ("LU-5030 util: migrate lctl params functions to use cfs_get_paths()")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I51f865e4ce3a7bc4879f9d688c4b3a68d731810f
Reviewed-on: https://review.whamcloud.com/44056
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14778 readahead: fix to reserve min pages 50/44050/3
Wang Shilong [Tue, 22 Jun 2021 01:26:40 +0000 (09:26 +0800)]
LU-14778 readahead: fix to reserve min pages

@pages_min might be larger than @pages, which indicates that
more pages should be read, and it will cause a warning
later.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ifd82f709c3877172f08b87ab0551da735a0613e0
Reviewed-on: https://review.whamcloud.com/44050
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14776 ldiskfs: Add Ubuntu 20.04 HWE support 39/44039/4
James Simmons [Mon, 28 Jun 2021 16:15:43 +0000 (10:15 -0600)]
LU-14776 ldiskfs: Add Ubuntu 20.04 HWE support

Use the already landed ldiskfs support for Linux 5.8.0 to enable
support for the Ubuntu 20.04 HWE 5.8.0-53 kernel. Another change
that started with the 5.7 kernel is removal of the flag
EXT4_GET_BLOCKS_KEEP_SIZE. The code was no longer needed with the
removal of EXT4_EOFBLOCKS_FL which happened in 2012. e2fsprogs
support for this flag has been removed since version 1.42.2.

Change-Id: I60db446bab50178a601e1c2c20e782435f9f50f2
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44039
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>