Whamcloud - gitweb
fs/lustre-release.git
2 years ago LU-15176 sec: present .fscrypt in subdir mount 67/46167/3
Sebastien Buisson [Wed, 12 Jan 2022 10:13:44 +0000 (11:13 +0100)]
LU-15176 sec: present .fscrypt in subdir mount

fscrypt userspace tool works with a .fscrypt directory at the root of
the file system. In case of subdirectory mount, we virtually present
this .fscrypt directory at the root of the mount point so that fscrypt
can be used. This makes it possible to even do a subdirectory mount of
an encrypted directory, making clients access encrypted content only.
Internally, the .fscrypt directory is always stored at the root of
Lustre.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2a0ee360f724da1df49b2be0df986d52e06f45fd
Reviewed-on: https://review.whamcloud.com/46167
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15452 utils: support lctl getattr for osc 31/46131/3
John L. Hammond [Fri, 14 Jan 2022 16:58:59 +0000 (10:58 -0600)]
LU-15452 utils: support lctl getattr for osc

In lctl:jt_obd_getattr(), support FIDs in addition to OIDs and print
whatever valid attributes were returned. Add a supporting
OBD_IOC_GETATTR case to osc_iocontrol().

  # function lctl_osc_device() {
    # Find osc device name for file and index.
    # lctl_osc_device /mnt/lustre/... 42 => lustre-OST002a-osc-ffff89cca1555000
    local path="$1"
    local index="$2"
    local fsname=$(lfs getname --fsname "$path")
    local instance=$(lfs getname --instance "$path")

    printf '%s-OST%04x-osc-%s\n' "$fsname" "$index" "$instance"
  }
  # lfs getstripe /mnt/lustre/f0 | grep l_ost_idx
        - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
        - 1: { l_ost_idx: 2, l_fid: [0x100020000:0x2:0x0] }
        - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x2:0x0] }
        - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x2:0x0] }
  # lctl --device $(lctl_osc_device /mnt/lustre 1) getattr '[0x100010000:0x2:0x0]'
  valid: 0x110000001008fff
  oi.oi.oi_id: 0x100020000
  oi.oi.oi_seq: 0x2
  oi.oi_fid: [0x100020000:0x2:0x0]
  atime: 0
  mtime: 1642178551
  ctime: 1642178551
  size: 0
  blocks: 0
  blksize: 4194304
  mode: 0107666
  uid: 0
  gid: 0
  flags: 2097152
  layout_version: 3
  projid: 0
  data_version: 4294967298
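
The device name that lctl_osc_device() builds above follows the pattern <fsname>-OST<index as four hex digits>-osc-<instance>. A minimal sketch of just the formatting step, with the file system name and client instance passed in directly rather than queried through lfs getname (osc_device_name is a hypothetical helper name, not part of the patch):

```shell
# osc_device_name FSNAME OST_INDEX INSTANCE
# Formats an OSC device name; OST_INDEX is given in decimal.
osc_device_name() {
    printf '%s-OST%04x-osc-%s\n' "$1" "$2" "$3"
}

# Index 42 is 0x2a, so this prints lustre-OST002a-osc-ffff89cca1555000
osc_device_name lustre 42 ffff89cca1555000
```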

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I57d5778e9ac39030ae9477a0979f20b7f7460fc8
Reviewed-on: https://review.whamcloud.com/46131
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15398 lnet: Avoid peer NI recovery for local interface 33/45933/10
Chris Horn [Thu, 23 Dec 2021 20:15:27 +0000 (14:15 -0600)]
LU-15398 lnet: Avoid peer NI recovery for local interface

If a MR peer has a MR peer entry for itself (can happen if manually
created or discovery is run on itself for some reason), then it is
possible for it to put its own interfaces into peer recovery. Problems
with local interfaces should be handled via local NI recovery.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-10661
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I5b28195979a6113fa863b5795a4528b072610891
Reviewed-on: https://review.whamcloud.com/45933
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15398 tests: Use remote peers for health tests 75/45975/8
Chris Horn [Tue, 4 Jan 2022 20:42:26 +0000 (14:42 -0600)]
LU-15398 tests: Use remote peers for health tests

LNet health may take different action depending on whether a NID
belongs to the local host or a remote peer. As such, the test cases
need to be careful to use remote or local NIs appropriately.

Introduce helper functions to create and cleanup LNet peers that are
needed for these tests. Convert existing test cases to use the new
helpers.

New function, lnet_if_list(), is added to test-framework.sh to
facilitate configuration of remote interfaces. do_rpc_nodes() modified
to recognize '--quiet' flag to ease parsing of lnet_if_list() output.

Tests 204 and 206 were re-worked to check the health state after each
simulated error. lnet_health_post() modified to reset peer and local
NI health so they are at max value when each error condition is
simulated.

Test 214, 215, and 250 were using hardcoded "eth0" names. These were
switched to use the INTERFACES variable.

The lnet_recovery_limit parameter is deprecated so remove lines that
were setting that parameter.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-10661
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I685fda8a84bcce024a765ddfc81c085acf24607a
Reviewed-on: https://review.whamcloud.com/45975
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14935 tests: Use FAIL_CHECK_QUIET for fake i/o 51/44651/4
Patrick Farrell [Thu, 12 Aug 2021 20:28:29 +0000 (16:28 -0400)]
LU-14935 tests: Use FAIL_CHECK_QUIET for fake i/o

Logging to the console is relatively expensive and doing it
for fake i/o is very expensive in terms of CPU time.

If we use FAIL_CHECK_QUIET, a debug message is logged only once
to the console, and the rest at D_INFO level (probably not at all).

This should hugely reduce the CPU cost of the debugging.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I46a5042efd116a4f5c80eaf0d5dae7fe132f6a79
Reviewed-on: https://review.whamcloud.com/44651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
2 years ago LU-15286 build: only use baseonly option on el7 77/45677/9
Minh Diep [Mon, 29 Nov 2021 23:32:17 +0000 (15:32 -0800)]
LU-15286 build: only use baseonly option on el7

On el7 the baseonly option allows building the perf package,
while on el8 it does not.

Test-Parameters: trivial

Change-Id: Ie973c5cc816b4b98ef71ab7080bd11286bcd644a
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45677
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14879 ldiskfs: Support for SUSE 15 sp3 75/44375/9
Shaun Tancheff [Tue, 18 Jan 2022 16:14:26 +0000 (23:14 +0700)]
LU-14879 ldiskfs: Support for SUSE 15 sp3

Add a configure test and updated series for sles15sp3 for the
updated ext4-data-in-dirent.patch

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie56de51701ae903c515d9184e5e79e4cfaf76606
Reviewed-on: https://review.whamcloud.com/44375
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10973 lutf: use configured tmp directory for tar 43/44843/11
Amir Shehata [Fri, 3 Sep 2021 19:12:33 +0000 (12:12 -0700)]
LU-10973 lutf: use configured tmp directory for tar

After the LUTF run is done all the test results on all
the agent nodes need to be tarred, in order to make them
available for review later on. Don't assume the lutf tmp
files are in /tmp/lutf. Use the tmp-dir directory configured
in the lutf configuration.

Add the master only once to the agent list.

Pass PYTHONPATH to agent.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic3effcb53f7d27bf31b6adfd8a22900767ff9524
Reviewed-on: https://review.whamcloud.com/44843
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13621 lnet: utility to print cpt number 13/39113/9
Amir Shehata [Fri, 19 Jun 2020 23:31:36 +0000 (16:31 -0700)]
LU-13621 lnet: utility to print cpt number

Added a command to lnetctl to print the cpt of the NID.
lnetctl cpt-of-nid --nid <nid> --ncpt <number of cpts>
ex:
lnetctl cpt-of-nid --nid 192.28.12.35@tcp9 --ncpt 7
This returns the CPT that the NID hashes to, within the 0-6 range.
If the NI is bound to a specific set of CPTs, then ncpt refers
to the number of CPTs the NI is bound to. The cpt value returned
is an index into the list of bound CPTs.

For example, if an NI is bound to [0,4,5,7], then ncpt should be
4, and the returned value is an index into that array:
ex:
lnetctl cpt-of-nid --nid 192.28.12.35@tcp9 --ncpt 4
cpt:
    value: 1
therefore, the actual CPT the NID will be bound to is 4.
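
The index-to-CPT lookup described above can be sketched as follows. This only illustrates the arithmetic and is not code from lnetctl; cpt_from_index is a hypothetical helper that takes the value reported by cpt-of-nid followed by the list of CPTs the NI is bound to.

```shell
# cpt_from_index INDEX CPT...
# Prints the CPT found at position INDEX (0-based) in the bound CPT list.
cpt_from_index() {
    idx="$1"
    shift
    # Reject an index that falls outside the bound CPT list.
    [ "$idx" -ge 0 ] && [ "$idx" -lt "$#" ] || return 1
    # Drop the first idx entries; the wanted CPT is now $1.
    shift "$idx"
    printf '%s\n' "$1"
}

# NI bound to [0,4,5,7], cpt-of-nid reported value 1: prints 4
cpt_from_index 1 0 4 5 7
```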

Test-parameters: trivial testlist=sanity-lnet

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3cb562842448bfb663c2d41007be65299a919300
Reviewed-on: https://review.whamcloud.com/39113
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10633 mdt: Convert MDS restoring RPC message to D_WARNING 14/31214/6
Chris Horn [Wed, 7 Feb 2018 21:26:04 +0000 (15:26 -0600)]
LU-10633 mdt: Convert MDS restoring RPC message to D_WARNING

Using D_WARNING instead of D_RPCTRACE causes the message to be both
logged in the Lustre DK logs and on the system console.  This patch
changes the MDS restore/replay debug message to use D_WARNING.

A restored/replayed metadata request indicates some sort of underlying
error, and even when handled correctly, should generate a warning.

Test-Parameters: trivial
Change-Id: Iff98521853323469fc5d6c7d546ca83477b1cb9f
HPE-bug-id: LUS-2578
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/31214
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-12056 ldiskfs: add ext4-projid-xattrs.patch for Linux 5.10 95/46295/2
Li Dongyang [Tue, 25 Jan 2022 02:12:49 +0000 (13:12 +1100)]
LU-12056 ldiskfs: add ext4-projid-xattrs.patch for Linux 5.10

ext4-projid-xattrs.patch was missed during the landing/review
process for ldiskfs-5.10.0-ml.series.
We also need a small change in base/ext4-projid-xattrs.patch to
make it apply on v5.10.

Change-Id: I2b7a6c957bd8b40cf78dbd9f4680b722e8d4418a
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/46295
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15340 llite: Delay dput in ll_dirty_page_discard_warn 84/45784/4
Oleg Drokin [Wed, 8 Dec 2021 04:30:06 +0000 (23:30 -0500)]
LU-15340 llite: Delay dput in ll_dirty_page_discard_warn

Otherwise this can be the final dput, which needs to wait for
pages to clear. That is bad because this is called from ptlrpcd,
which is not supposed to block, especially on network traffic:
it can cause livelocks if ptlrpcd happens to be needed to kill
the very same RPC we are waiting on.

Additionally pass in the inode from IO since the page
we are using might come from directio and that is
probably not even a valid inode.

Fixes: 624a3ac23393 ("LU-921 llite: warning in case of discarding dirty pages")
Change-Id: Ie2f1a34047145202c11a4e1a0b18b2e01d9e4601
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45784
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
2 years ago LU-15282 lod: less spinlock on the alloc rr 94/45694/9
Alexey Lyashkov [Wed, 1 Dec 2021 13:38:45 +0000 (16:38 +0300)]
LU-15282 lod: less spinlock on the alloc rr

There is no need to hold the spinlock for so long; it is released
in the middle of the loop anyway, so RR cannot be perfect in the
multithreaded case.

Fix a small bug in RR precession for stripecount=4 with OSTCOUNT=6.

Fixes: 665e36b780f ("OST pools on HEAD")
HPE-bug-id: LUS-10627
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I66eded451c8256de0e5a9a0eb862af8b306da9e1
Reviewed-on: https://review.whamcloud.com/45694
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15285 mdt: fix same-dir racing rename deadlock 76/45676/9
Oleg Drokin [Mon, 29 Nov 2021 21:45:16 +0000 (16:45 -0500)]
LU-15285 mdt: fix same-dir racing rename deadlock

With LU-12125 lifting the BFL for same directory rename,
a deadlock possibility opens up since we lock source and target
of rename in the source-target order, if there are two renames
racing to rename arguments in reverse order:
mv a b &
mv b a

a lock inversion happens and a deadlock has been observed.

To avert this, impose an additional ordering requirement:
the lower PDO hash value is to be locked ahead of the higher one.
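
The ordering rule above can be illustrated with a small sketch that only computes the acquisition order and takes no real locks. It uses cksum as a stand-in hash, which is an assumption made for illustration; the MDT uses its own PDO hash of the entry names. name_hash and lock_order are hypothetical helpers.

```shell
# Stand-in hash for illustration only: the CRC printed by cksum.
name_hash() {
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# lock_order NAME1 NAME2
# Prints the two names in the order their locks would be taken:
# lower hash value first, regardless of which is source or target.
lock_order() {
    h1=$(name_hash "$1")
    h2=$(name_hash "$2")
    if [ "$h1" -le "$h2" ]; then
        printf '%s %s\n' "$1" "$2"
    else
        printf '%s %s\n' "$2" "$1"
    fi
}

# Both racing renames arrive at the same order, so no inversion occurs.
lock_order a b
lock_order b a
```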

Fixes: d76cc65d5d68 ("LU-12125 mds: allow parallel regular file rename")
Fixes: b50bb830f92e ("LU-3538 dne: Commit-on-Sharing for DNE")
Fixes: 9f1711f3d7d1 ("LU-12081 mdt: rename shouldn't PDO lock if parent is remote")
Change-Id: I88dd3aebb394ea40e97e6029d6dcc161116f982e
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45676
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
2 years ago LU-14645 utils: optimise setstripe 52/46152/5
Vitaly Fertman [Mon, 17 Jan 2022 19:54:12 +0000 (22:54 +0300)]
LU-14645 utils: optimise setstripe

skip some excessive checks:
- do not check the file is on lustre fs, the following ioctl does it;
- do not check the stripe-index is valid, done on MDS side;
- do not check the pool exists for a !PFL file (align with a setstripe
  for PFL files);

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Ia21f85c3ab73a970bad8d11e175c0063ab3a307f
Reviewed-on: https://review.whamcloud.com/46152
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14645 utils: fix API for llapi_sanity_check 51/46151/3
Vitaly Fertman [Mon, 17 Jan 2022 18:49:52 +0000 (21:49 +0300)]
LU-14645 utils: fix API for llapi_sanity_check

fix the previous patch which introduced a change in API.

Fixes: 149934fe28 ("LU-14645 utils: setstripe cleanup")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I43ae6822768ac70c9348af270c17830b13133f8c
Reviewed-on: https://review.whamcloud.com/46151
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13514 tests: replace nid in conf-sanity test_32 54/46354/2
Yang Sheng [Wed, 4 Nov 2020 18:36:43 +0000 (02:36 +0800)]
LU-13514 tests: replace nid in conf-sanity test_32

test_32a needs replace_nid. Otherwise the mdc cannot
be initialized and the client mount hangs.

Test-Parameters: trivial
Test-Parameters: testlist=conf-sanity env=ONLY=32a,ONLY_REPEAT=20
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I651f5728ad4ff96a309ed599490c9dd6ed9c5274
Reviewed-on: https://review.whamcloud.com/40537
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46354
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-15220 utils: use 'fallthrough' pseudo keyword for switch 70/46270/4
Jian Yu [Sun, 23 Jan 2022 02:28:56 +0000 (18:28 -0800)]
LU-15220 utils: use 'fallthrough' pseudo keyword for switch

'/* fallthrough */' hits implicit-fallthrough error with GCC 11.

This patch replaces the existing '/* fallthrough */' comments and
its variants with the 'fallthrough' pseudo keyword, which was added
by Linux kernel commit v5.4-rc2-141-g294f69e662d1.

Test-Parameters: trivial
Change-Id: Icace4c9953950f86d3c48068d8c6bba7dd1160a7
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46270
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
2 years ago LU-15220 lustre: use 'fallthrough' pseudo keyword for switch 69/46269/3
Jian Yu [Sun, 23 Jan 2022 02:21:04 +0000 (18:21 -0800)]
LU-15220 lustre: use 'fallthrough' pseudo keyword for switch

'/* fallthrough */' hits implicit-fallthrough error with GCC 11.

This patch replaces the existing '/* fallthrough */' comments and
its variants with the 'fallthrough' pseudo keyword, which was added
by Linux kernel commit v5.4-rc2-141-g294f69e662d1.

Test-Parameters: trivial
Change-Id: Icace4c9953950f86d3c48068d8c6bba7dd1160a6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46269
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
2 years ago LU-15220 lnet: use 'fallthrough' pseudo keyword for switch 66/45566/12
Jian Yu [Thu, 20 Jan 2022 18:19:34 +0000 (10:19 -0800)]
LU-15220 lnet: use 'fallthrough' pseudo keyword for switch

'/* fallthrough */' hits implicit-fallthrough error with GCC 11.

This patch replaces the existing '/* fallthrough */' comments and
its variants with the 'fallthrough' pseudo keyword, which was added
by Linux kernel commit v5.4-rc2-141-g294f69e662d1.

Test-Parameters: trivial
Change-Id: Icace4c9953950f86d3c48068d8c6bba7dd1160a5
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45566
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15220 tests: avoid gcc-11 -Werror=stringop-overread warning 77/45777/7
Jian Yu [Thu, 20 Jan 2022 19:06:42 +0000 (11:06 -0800)]
LU-15220 tests: avoid gcc-11 -Werror=stringop-overread warning

GCC 11 warns about string and memory operations on fixed address:

In function 'memcpy', inlined from 'obd_uuid2str' at
lustre/include/uapi/linux/lustre/lustre_user.h:1222:3,
include/linux/fortify-string.h:20:33: error: '__builtin_memcpy'
reading 39 bytes from a region of size 0 [-Werror=stringop-overread]
   20 | #define __underlying_memcpy     __builtin_memcpy
      |                                 ^
include/linux/fortify-string.h:191:16: note:
in expansion of macro '__underlying_memcpy'
  191 |         return __underlying_memcpy(p, q, size);
      |                ^~~~~~~~~~~~~~~~~~~

The patch avoids the above warning by not using a fixed address.

badarea_io.c:47:14: error: 'write' reading 5 bytes from a region
of size 0 [-Werror=stringop-overread]
   47 |         rc = write(fd, (void *)0x4096000, 5);
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The patch avoids the above warning by making the pointer volatile
as suggested in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578#c16

Change-Id: I90b936835c6236a0f47e744013e3e480442f682c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45777
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15462 gnilnd: Fix syntax accessor to nid_addr 26/46226/2
Shaun Tancheff [Thu, 20 Jan 2022 03:23:58 +0000 (10:23 +0700)]
LU-15462 gnilnd: Fix syntax accessor to nid_addr

Minor typo breaking build of gnilnd

Test-Parameters: trivial
Fixes: 57c03f307075 ("LU-10391 lnet: extend nids in struct lnet_msg")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia8444d541c5ec175eacb2bf96d72e4b0fd80d19f
Reviewed-on: https://review.whamcloud.com/46226
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15455 tests: fix [: ==: unary operator expected 50/46150/2
Elena Gryaznova [Mon, 17 Jan 2022 17:47:22 +0000 (20:47 +0300)]
LU-15455 tests: fix [: ==: unary operator expected

PARALLEL is not initialized in recovery-small.
Patch fixes the bash syntax error.
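
A minimal sketch of how this class of error arises and one common way to avoid it. The comparison value "yes" and the run_mode helper are assumptions for illustration; the actual change made in recovery-small may differ.

```shell
run_mode() {
    # Unquoted, an unset PARALLEL makes `[ $PARALLEL == "yes" ]` expand to
    # `[ == yes ]`, which bash rejects with "[: ==: unary operator expected".
    # Quoting the expansion and giving it a default keeps the test well formed.
    if [ "${PARALLEL:-no}" = "yes" ]; then
        echo "parallel"
    else
        echo "serial"
    fi
}

unset PARALLEL
run_mode    # prints "serial" without any syntax error
```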

Fixes: 26e8f1137b ("LU-13116 mgc: do not lose sptlrpc config lock")
Fixes: 688d5da6a8 ("LU-12846 mdd: return error while delete failed")
Test-Parameters: trivial env=ONLY="141 143" testlist=recovery-small
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10679
Change-Id: I495bfd077edf3d15f1d47ccb4723e1de46de94e7
Reviewed-on: https://review.whamcloud.com/46150
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15420 libcfs: replace deprecated CPU-hotplug functions 85/46085/2
Jian Yu [Thu, 13 Jan 2022 01:04:12 +0000 (17:04 -0800)]
LU-15420 libcfs: replace deprecated CPU-hotplug functions

Kernel 5.15 commit 8c854303ce0e38e5bbedd725ff39da7e235865d8
removed deprecated CPU-hotplug functions get_online_cpus()
and put_online_cpus(). They map directly to cpus_read_lock()
and cpus_read_unlock().

Change-Id: I09d489cd3ca9a575b20ea25f24210702fbfdd725
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46085
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-15445 tests: sanity test_160p() fix 73/46073/2
Elena Gryaznova [Wed, 12 Jan 2022 17:03:02 +0000 (20:03 +0300)]
LU-15445 tests: sanity test_160p() fix

start() requires 2nd parameter device.
If start() is called without the 2nd parameter - the
empty mds1_dev is exported:
        eval export ${dev_alias}_dev=${device}
and test fails on failover setup with:
  CMD: lm0301 loop_dev=$(losetup -j  | cut -d : -f 1);
  lm0301: losetup: option requires an argument -- 'j'
dm_create_dev()
  local real_dev=<empty>
        -> setup_loop_device $facet <empty>
To reproduce the failure just run:
  ONLY=160p sh sanity.sh
on failover setup where mds1_HOST != mds1failover_HOST.
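
The failure mode above comes from the eval line exporting whatever it is given, including nothing. A minimal sketch of that behaviour and of one possible guard; export_dev and export_dev_checked are hypothetical names, and the guard is an illustration rather than the change made by this patch:

```shell
# Unguarded, mirroring the eval line quoted above:
# an empty device is exported silently.
export_dev() {
    dev_alias="$1"
    device="$2"
    eval export "${dev_alias}_dev=${device}"
}

# Guarded variant (hypothetical): refuse an empty device.
export_dev_checked() {
    if [ -z "$2" ]; then
        echo "export_dev_checked: device is required for $1" >&2
        return 1
    fi
    export_dev "$1" "$2"
}

export_dev mds1                 # second parameter omitted
echo "mds1_dev='${mds1_dev}'"   # prints mds1_dev=''
```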

Fixes: c7d8fe3106 ("LU-14731 mdd: clear orphans changelog entries")
Test-Parameters: trivial env=ONLY="160p" testlist=sanity
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10674
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2661567672aa9c6e23b5f17500d81053cf9c9fdd
Reviewed-on: https://review.whamcloud.com/46073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15429 tests: mount_mds_client() fix 43/46043/2
Elena Gryaznova [Tue, 11 Jan 2022 17:23:30 +0000 (20:23 +0300)]
LU-15429 tests: mount_mds_client() fix

mount/umount client is to be executed on active facet/host,
not on mds1_HOST. Without this fix test_140a() fails on
failover setup:
  CMD: lm0101 umount /mnt/lustre2 2>&1
  CMD: lm0102 rmdir /mnt/lustre2
  lm0102: rmdir: failed to remove '/mnt/lustre2':
                 No such file or directory
  test_140a: FAIL: no clients with recovery disabled

To reproduce the failure just run:
  ONLY="107 140a" sh recovery-small.sh
on failover setup where mds1_HOST != mds1failover_HOST.

Fixes: 8bd04b4e57 ("LU-12722 target: disable recovery for local clients")
Test-Parameters: trivial env=ONLY="140a 140b" testlist=recovery-small
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10669
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ifbdedfda840e8421fa8a969f73131ca23982a28b
Reviewed-on: https://review.whamcloud.com/46043
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13639 tests: increase limit in sanity-quota t_2 43/45943/3
Sergey Cheremencev [Mon, 27 Dec 2021 16:45:51 +0000 (19:45 +0300)]
LU-13639 tests: increase limit in sanity-quota t_2

If the limit is equal to least_qunit, a slave target may
preacquire more quota while creating "limit number" of files,
causing EDQUOT. Change the limit from soft_qunit to 2 soft_qunit.
The patch also changes the test behaviour: createmany is divided
into 2 parts. First, it creates (limit-least_qunit) nodes and
checks that there is no EDQUOT. Then it creates least_qunit
nodes and ignores the result, since hitting the limit at this
point is a valid case. Only after that does it check that we
can't create nodes over quota.

Change-Id: Iad7c1cc05119c8d3e0f1cfc2adffb276d79f18c7
Test-Parameters: testgroup=review-dne-zfs-part-4
Test-Parameters: testlist=sanity-quota env=ONLY=2, ONLY_REPEAT=200
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45943
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15381 hsm: update size upon completion of data version 35/45935/4
Qian Yingjin [Fri, 17 Dec 2021 08:53:37 +0000 (16:53 +0800)]
LU-15381 hsm: update size upon completion of data version

During testing we found that an HSM restore followed by an HSM
release wrongly sets the file size to 0.
The reason is that the file size and blocks information obtained
via ll_merge_attr() is incorrect.
The data version operation flushes dirty pages from all
clients, so the size and blocks information returned from the
Lustre OST is correct.
In this patch, we update the size and block attributes for a file
upon completion of the data version operation accordingly.
This way, HSM release sets the size and blocks information
correctly after the data version ioctl operation.

Add sanity-hsm test_261.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ifdbf6b58ecd00dc9677a2328438ef68529b72882
Reviewed-on: https://review.whamcloud.com/45935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15366 nrs: increase maximum rate for tbf rule 38/45838/3
Etienne AUJAMES [Mon, 13 Dec 2021 15:30:35 +0000 (16:30 +0100)]
LU-15366 nrs: increase maximum rate for tbf rule

The maximum RPC rate for a TBF rule is 65535. This value could be
problematic for clusters with a large number of clients.

This patch consistently uses __u64 to store an RPC rate and
raises the maximum rate for a TBF rule to 1000000 (1 RPC/us).
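
The "1 rpc/us" figure is just the reciprocal of the rate. A small worked example of the mean spacing between RPCs at the old and new maximum rates; rpc_interval_ns is a hypothetical helper used only for this arithmetic, not a function from the NRS code:

```shell
# rpc_interval_ns RATE
# Nanoseconds between RPCs at a steady rate of RATE RPCs per second
# (integer division, so the result is rounded down).
rpc_interval_ns() {
    echo $((1000000000 / $1))
}

rpc_interval_ns 65535      # prints 15259
rpc_interval_ns 1000000    # prints 1000, i.e. one RPC per microsecond
```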

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I91fd416b9d91bbb5d5674c66ec8ceb0d77a9f7e0
Reviewed-on: https://review.whamcloud.com/45838
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-11643 tests: add new images and tests for upgrade tests 27/45827/3
Wei Liu [Fri, 10 Dec 2021 20:41:11 +0000 (12:41 -0800)]
LU-11643 tests: add new images and tests for upgrade tests

Add new images for conf-sanity.sh 32

disk2_10-ldiskfs.tar.bz2
disk2_12-ldiskfs.tar.bz2

Test-Parameters: trivial
Test-Parameters: fstype=ldiskfs envdefinitions=ONLY="32f 32g" testlist=conf-sanity

Change-Id: I6682e247308d7cf3fb57eee595751d6d140a421f
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45827
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15328 lnet: Set rc to ENOMEM in lnet_selftest_init on error 63/45763/2
Oleg Drokin [Tue, 7 Dec 2021 03:49:52 +0000 (22:49 -0500)]
LU-15328 lnet: Set rc to ENOMEM in lnet_selftest_init on error

Test-Parameters: trivial
Change-Id: I9d4eb7b830521ddd50f76544c38ebb0cd939800a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45763
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 years ago LU-15164 tests: remove wrong chattr -h check 97/45697/3
Elena Gryaznova [Wed, 1 Dec 2021 18:23:56 +0000 (21:23 +0300)]
LU-15164 tests: remove wrong chattr -h check

sanity-quota test_62() is always skipped because
of wrong chattr project ID support check.
chattr.c:
Usage: %s [-pRVf] [-+=aAcCdDeijPsStTuFx] [-v version] files...

Let's remove this check: e2fsprogs project ID support for
chattr/lsattr has existed since 2016. Anyone using such an old
chattr will be forced to update by the error "root failed to
clear inherit".

Fixes: 2d3bbce0c9 ("LU-11101 quota: fix setattr project check")
Test-Parameters: trivial testlist=sanity-quota env=ONLY=62
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9967
Change-Id: I0cfd98735e5e0b5956f3dd6385ce626584443bea
Reviewed-on: https://review.whamcloud.com/45697
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15200 llite: "lfs getdirstripe -D" shows inherit layout 70/45570/7
Lai Siyao [Sun, 7 Nov 2021 05:15:56 +0000 (01:15 -0400)]
LU-15200 llite: "lfs getdirstripe -D" shows inherit layout

Once system-wide default LMV is set, "lfs getdirstripe -D subdir"
should show inherited layout from it.

Add sanity 413e.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If7354cb4093c58f6d56a6a4d449fb69a9deec7cc
Reviewed-on: https://review.whamcloud.com/45570
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10378 utils: add formatted printf to lfs find 36/45136/5
Anjus George [Wed, 12 Jan 2022 06:30:03 +0000 (01:30 -0500)]
LU-10378 utils: add formatted printf to lfs find

Introduce a new --printf option for the lfs find utility, along with
support for the backslash escapes and format directives listed below,
which let users obtain file metadata in a formatted style.

List of backslash escapes supported by --printf option:
-------------------------------------
   Description               | Escape
-------------------------------------
   Newline character         | \n
   Tab character             | \t
   Literal backslash         | \\

List of format directives used with --printf option:
----------------------------------------------------------
   Description                                  | Directive
----------------------------------------------------------
   Literal % character                          | %%
   Access time (in ctime format)                | %a
   Access time (in secs since epoch)            | %A@
   File size (in 512B blocks)                   | %b
   Last change time (in ctime format)           | %c
   Last change time (in secs since epoch)       | %C@
   Numeric group ID of file/dir owner           | %G
   File size (in 1K blocks)                     | %k
   File mode (octal)                            | %m
   Path name of file                            | %p
   File size (in bytes)                         | %s
   Modification time (in ctime format)          | %t
   Modification time (in secs since epoch)      | %T@
   Numeric user ID of file/dir owner            | %U
   Birth time (in ctime format)                 | %w
   Birth time (in secs since epoch)             | %W@
   File type                                    | %y
   Stripe count                                 | %Lc
   Lustre FID                                   | %LF
   Directory hash type                          | %Lh
   Starting OST (file) or MDT (dir) index       | %Li
   List of all OST (file) or MDT (dir) indices  | %Lo
   OST pool name                                | %Lp
   Numeric project id assigned to file/dir      | %LP
   Stripe size in bytes                         | %LS
---------------------------------------------------------
Note: Stripe size and OST pool name are not defined for
directories whereas Hash type is not defined for files.
%Li gives starting OST index for files and starting MDT index
for directories. For composite files %Lo provides list of all
OST indices for all components whereas %Lc, %LS, %Li and %Lp
provide details for last initialized component only.

A usage example for --printf option and its output for a composite
file with three components are shown below.

   lfs find --printf '%a | %t | %c | %w | %W@ | %b | %s | %U | %G |
   %A@ | %T@ | %C@ | %LP | %Lc | %LS | %Li | %Lo | %Lp | %p\n'
   /lustre/lustre/composite.txt

   Tue Oct 26 16:06:18 2021 | Tue Oct 26 16:06:50 2021 | Tue Oct 26
   16:06:50 2021 | Tue Oct 26 16:06:18 2021 | 1635278778 | 204800 |
   104857600 | 0 | 0 | 1635278778 | 1635278810 | 1635278810 | 0 | 3 |
   2097152 | 2 | [1][2,0][2,0,1] | pool1 |
   /lustre/lustre/composite.txt
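
As a hedged illustration of how the formatted output can feed other tools,
the sketch below sums file sizes from lines of the form "<bytes> <path>",
such as a format of '%s %p\n' would produce. The helper name sum_sizes is
hypothetical and is not part of lfs; it only post-processes text.

```shell
#!/bin/sh
# sum_sizes: read "<bytes> <path>" lines on stdin and print the total bytes.
# Hypothetical helper for illustration; it does not call lfs itself.
sum_sizes() {
    total=0
    while read -r size path; do
        # Skip blank lines and lines whose first field is not a number.
        case "$size" in
            ''|*[!0-9]*) continue ;;
        esac
        total=$((total + size))
    done
    echo "$total"
}
```

On a Lustre client this could be used as
lfs find --printf '%s %p\n' /lustre/lustre | sum_sizes
where the directory is only an example path.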

Change-Id: I370c0978900a4837b0ea3060e08dabb1fcb6e115
Signed-off-by: Anjus George <georgea@ornl.gov>
Signed-off-by: Rick Mohr <mohrrf@ornl.gov>
Reviewed-on: https://review.whamcloud.com/45136
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14948 build: Warn about /usr/src/lustre.tar.bz2 77/44677/5
Shaun Tancheff [Tue, 4 Jan 2022 15:22:57 +0000 (22:22 +0700)]
LU-14948 build: Warn about /usr/src/lustre.tar.bz2

When /usr/src/lustre.tar.bz2 exists, make debs (and dkms-debs)
will fail with an error like:

  Extracting the package tarball, /usr/src/lustre.tar.bz2, ...
  ../../generic.sh: line 73: debian/rules: Permission denied
  BUILD FAILED!

Add the current git hash to the lustre tarball, and attempt to
remove the conflicting file from /usr/src.  Failing that, warn
the user and ask them to remove the conflicting file.
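
The remove-or-warn behaviour described above can be sketched roughly as
follows. This is an illustration only, not the code from the patch: the
function name clear_stale_tarball and the warning text are assumptions.

```shell
#!/bin/sh
# clear_stale_tarball: hypothetical helper showing the remove-or-warn logic.
# Returns 0 if no conflicting file remains, 1 if the user must remove it.
clear_stale_tarball() {
    tarball="$1"
    # Nothing to do when there is no conflicting file.
    [ -e "$tarball" ] || return 0
    # Try to remove it; this can fail without sufficient permissions.
    if rm -f "$tarball" 2>/dev/null && [ ! -e "$tarball" ]; then
        return 0
    fi
    echo "WARNING: please remove $tarball before building debs" >&2
    return 1
}
```

A build script could call clear_stale_tarball /usr/src/lustre.tar.bz2
before invoking make debs and stop early when it returns non-zero.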

HPE-bug-id: LUS-10308
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4aaa803cb81c2ed8ffc0182bb49ea0bff5064df4
Reviewed-on: https://review.whamcloud.com/44677
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 llite: Remove unnecessary page get/put 93/44293/8
Patrick Farrell [Fri, 30 Jul 2021 16:15:20 +0000 (12:15 -0400)]
LU-13799 llite: Remove unnecessary page get/put

Part of the aio cleanup code has the slightly strange
behavior of doing get on every page before calling page
cleanup, then doing a put after.

This was required because we call cl_page_list_del before
calling cl_page_delete, and cl_page_list_del was holding
the last reference on the page struct.

If we reverse the order, then we don't need the extra
get/put to keep the pages live.  This should save
significant CPU time in the ptlrpcd threads when finishing
i/o, since this removes a get/put on every page.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3b1639061d775faa43c91e2d0a0f209f2d0df10c
Reviewed-on: https://review.whamcloud.com/44293
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13594 obdclass: Add OOM handler for obdclass 21/42121/7
Arshad Hussain [Sun, 21 Mar 2021 01:02:07 +0000 (06:32 +0530)]
LU-13594 obdclass: Add OOM handler for obdclass

This patch adds an OOM handler for obdclass. The handler
currently only prints the maximum memory that was used by
obdclass, along with the memory currently in use, before an
attempt is made to kill the user process.

Currently, when the handler kicks in, the output in dmesg
looks like:

Output:
~~~~~~~~
...
Mar 21 07:02:02 devbox kernel: obd_memory max: 244859953, obd_memory current: 0
...

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0259f800b1f219ff3427f1d2a17b6a874dd456d3
Reviewed-on: https://review.whamcloud.com/42121
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13911 tests: take osc max_rpcs_in_flight limit into account 87/39687/3
Vladimir Saveliev [Wed, 10 Oct 2018 12:08:49 +0000 (08:08 -0400)]
LU-13911 tests: take osc max_rpcs_in_flight limit into account

max_rpcs_in_flight for osc is limited to 256. sanity.sh:test_115()
tries to set it to ost.OSS.ost_io.threads_started * 4. The test should
make sure that it does not exceed 256.
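
A minimal sketch of the clamping described above, assuming the value is
computed in shell as in sanity.sh. The helper name clamp_rpcs_in_flight
and the variable OSC_MAX_RIF_LIMIT are hypothetical; only the limit of 256
and the factor of 4 come from the commit message.

```shell
#!/bin/sh
# Upper bound for osc max_rpcs_in_flight, taken from the commit message.
OSC_MAX_RIF_LIMIT=256

# clamp_rpcs_in_flight: hypothetical helper; prints threads_started * 4,
# capped at OSC_MAX_RIF_LIMIT.
clamp_rpcs_in_flight() {
    threads_started="$1"
    wanted=$((threads_started * 4))
    if [ "$wanted" -gt "$OSC_MAX_RIF_LIMIT" ]; then
        wanted=$OSC_MAX_RIF_LIMIT
    fi
    echo "$wanted"
}
```

With 16 started threads the helper prints 64; with 64 or more it prints 256.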

HPE-bug-id: LUS-5917
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I38929e1ed0fe7855e7e60ea43742740c01ae1bd8
Reviewed-on: https://review.whamcloud.com/39687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 llite: Do not get/put DIO pages 38/39438/27
Patrick Farrell [Fri, 30 Jul 2021 16:14:16 +0000 (12:14 -0400)]
LU-13799 llite: Do not get/put DIO pages

We've already told the kernel we're working with these pages
using the get/put_user_pages functions, and userspace must
hold references on them throughout the i/o anyway.

So getting/putting these vmpages is unnecessary.  This
saves around 7% of the time in DIO page submission, netting
about that much of a performance improvement.

This patch reduces i/o time in ms/GiB by:
Write: 22 ms/GiB
Read: 19 ms/GiB

Totals:
Write: 135 ms/GiB
Read: 143 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     6470 MiB/s
read      6354 MiB/s

Plus this patch:
write     7531 MiB/s
read      7179 MiB/s

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic457c21ebca9624da2422463da453b535dcfd10e
Reviewed-on: https://review.whamcloud.com/39438
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 llite: Move free user pages 43/39443/29
Patrick Farrell [Fri, 30 Jul 2021 16:13:45 +0000 (12:13 -0400)]
LU-13799 llite: Move free user pages

It is incorrect to release our reference on the user pages
before we're done with them - We need to keep it until the
i/o is complete, otherwise we access them after releasing
our reference.  This has not caused any known bugs so far,
but it's still wrong.

So only drop these references when we free the aio struct,
which is only freed once i/o is complete.

Also rename free_user_pages to release_user_pages, because
it does not free them - it just releases our reference.

This also helps performance by moving free_user_pages to
the daemon threads.  This is a 5-10% boost.

This patch reduces i/o time in ms/GiB by:
Write: 18 ms/GiB
Read: 19 ms/GiB

Totals:
Write: 180 ms/GiB
Read: 178 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     5183 MiB/s
read      5201 MiB/s

Plus this patch:
write     5702 MiB/s
read      5756 MiB/s
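
The ms/GiB figures follow from the throughput numbers by simple
arithmetic: 1 GiB is 1024 MiB, so the time per GiB is 1024 / rate seconds.
The helper below is a hypothetical sketch of that conversion, rounding to
the nearest millisecond with integer arithmetic.

```shell
#!/bin/sh
# ms_per_gib: convert a throughput in MiB/s to milliseconds per GiB,
# rounded to the nearest integer (1 GiB = 1024 MiB, 1 s = 1000 ms).
ms_per_gib() {
    rate="$1"
    echo $(( (1024 * 1000 + rate / 2) / rate ))
}
```

For the write case, ms_per_gib 5183 gives 198 and ms_per_gib 5702 gives
180, a difference of 18 ms/GiB; for the read case the results are 197 and
178, a difference of 19 ms/GiB.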

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ibfe808611bbe6743a1b5fe3aa6a8d42691256d22
Reviewed-on: https://review.whamcloud.com/39443
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12130 lod: make pool inheritance policy more consistent 36/34536/7
Vladimir Saveliev [Fri, 24 Dec 2021 21:02:13 +0000 (00:02 +0300)]
LU-12130 lod: make pool inheritance policy more consistent

If a directory's striping includes pool info, setstripe behaves
inconsistently with respect to pool inheritance:
- when setting a non-PFL layout, the pool is inherited
- otherwise, it is not

Make the inheritance policy consistent:
- when the specified PFL layout does not include pool information,
  embed the current pool specification into the new layout

sanity.sh:test_65n is modified to illustrate the case.

HPE-bug-id: LUS-7180
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I92b415e18ba7aadd2059da702878905249dd33c3
Reviewed-on: https://review.whamcloud.com/34536
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14651 build: remove KALLSYMS build requirement 70/46070/2
James Simmons [Wed, 12 Jan 2022 16:55:46 +0000 (11:55 -0500)]
LU-14651 build: remove KALLSYMS build requirement

Now that kallsyms is no longer exported, some distro kernels are
disabling it by default. If kallsyms is disabled, lustre will fail
configure. Remove this hard requirement.

Test-Parameters: trivial
Change-Id: I710433e99afd75eea6a3bf1d77878b97beaed605
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15403 tests: fix some false alarm 58/45958/2
Alexey Lyashkov [Thu, 30 Dec 2021 13:04:59 +0000 (16:04 +0300)]
LU-15403 tests: fix some false alarm

The current test implementation has two bugs.
1) Client mgc llog processing is asynchronous to mount, so the
lock check can start before all locks have been processed, which
causes a false alarm. The same applies with several client mounts.

2) Server lock counting is unsafe, as it includes locks from other
servers, so any server reconnect may cause a false alarm.

Let's fix it.
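
One common way to avoid this kind of race in a test script is to poll
until a condition holds instead of checking once right after mount. The
sketch below shows that general pattern only; it is not necessarily what
the patch does, and the helper name wait_until is hypothetical.

```shell
#!/bin/sh
# wait_until: hypothetical helper; retry a command once per second until it
# succeeds or the timeout (in seconds) expires. Returns 0 on success and
# 1 on timeout.
wait_until() {
    timeout="$1"
    shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A test could wrap its lock-count comparison in a function and call
wait_until 30 with that function, so the check only fails once the
asynchronous processing has had time to finish.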

HPE-bug-id: LUS-8326
Test-Parameters: trivial testlist=sanityn
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I59d6e5deb79ca9f040385231738b8698a3309e8e
Reviewed-on: https://review.whamcloud.com/45958
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lnet_del_route() to take lnet_nid 15/43615/11
Mr NeilBrown [Tue, 31 Aug 2021 13:54:33 +0000 (09:54 -0400)]
LU-10391 lnet: change lnet_del_route() to take lnet_nid

The gateway NID passed to lnet_del_route is now a struct lnet_nid.
Instead of passing LNET_NID_ANY as a wildcard, we pass
a NULL pointer.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1243be20d9f40e4ac3ebc6ec5dd9bbcbae6653c3
Reviewed-on: https://review.whamcloud.com/43615
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: Fix NULL-deref in lnet_nidstr_r() 38/44838/7
Mr NeilBrown [Fri, 3 Sep 2021 03:22:17 +0000 (13:22 +1000)]
LU-10391 lnet: Fix NULL-deref in lnet_nidstr_r()

It is valid to pass NULL as the nid for lnet_nidstr_r() - it indicates
the "any" nid.  LNET_NID_IS_ANY() tests for this and the function exits
early.

However, 'lnd' is assigned from "nid->nid_type" and 'nnum' from
"nid->nid_num", causing a NULL-pointer dereference.

So move these assignments later.

Fixes: 82a17076f880 ("LU-10391 lnet: introduce struct lnet_nid")
Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie29dd4d0ef7fac0f11c1ece714278a7dd9860602
Reviewed-on: https://review.whamcloud.com/44838
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change src_nid arg to lnet_parse() to 16byte 14/43614/9
Mr NeilBrown [Fri, 7 Jan 2022 01:13:58 +0000 (20:13 -0500)]
LU-10391 lnet: change src_nid arg to lnet_parse() to 16byte

lnet_parse() now gets the source nid as 'struct lnet_nid *'.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7afac71c97e4564e544695f057fd0b002d97afc9
Reviewed-on: https://review.whamcloud.com/43614
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: convert nids in lnet_parse to lnet_nid 13/43613/10
Mr NeilBrown [Sat, 11 Sep 2021 14:20:51 +0000 (10:20 -0400)]
LU-10391 lnet: convert nids in lnet_parse to lnet_nid

src_nid and dest_nid in lnet_parse() are changed to
struct lnet_nid, and this change propagates out to
affect a few support functions.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic6922d3f643e493f92f8f64974ad30f66457e842
Reviewed-on: https://review.whamcloud.com/43613
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: Convert ping to support 16-bytes address 12/43612/10
Mr NeilBrown [Thu, 9 Jul 2020 06:35:58 +0000 (16:35 +1000)]
LU-10391 lnet: Convert ping to support 16-bytes address

Now that ksocknal can send hello messages with 16-byte addresses, we can
change lnet_send_ping() to ping hosts with large-address nids.

Note that this doesn't change the addresses in the ping message sent,
only the sending and receiving of the message.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I6f591c2f053698876195575c71da42f64788637e
Reviewed-on: https://review.whamcloud.com/43612
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 socklnd: add hello message version 4 11/43611/10
Mr NeilBrown [Tue, 28 Apr 2020 04:57:09 +0000 (14:57 +1000)]
LU-10391 socklnd: add hello message version 4

KSOCK_PROTO_V4 uses a 'hello' message that contains
lnet_hdr_nid16 with 16 byte addresses

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I52a36739d3a84dc059537059a586ce3dab2b20f0
Reviewed-on: https://review.whamcloud.com/43611
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 socklnd: Change ksock_hello_msg to struct lnet_nid 10/43610/10
Mr NeilBrown [Tue, 7 Jul 2020 04:03:23 +0000 (14:03 +1000)]
LU-10391 socklnd: Change ksock_hello_msg to struct lnet_nid

'struct ksock_hello_msg' now stores 'struct lnet_nid', but it is
converted to 'struct ksock_hello_msg_nid4' - the old format - for
transmit, which is converted back on receive.

This opens the way for a new version of the hello protocol
which will use 16byte addresses.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I22e86f9088f6001f203f24f93ef292fcf2a8e69f
Reviewed-on: https://review.whamcloud.com/43610
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 socklnd: move lnet_hdr unpack into ->pro_unpack 09/43609/10
Mr NeilBrown [Mon, 6 Jul 2020 05:07:24 +0000 (15:07 +1000)]
LU-10391 socklnd: move lnet_hdr unpack into ->pro_unpack

Converting the lnet_hdr from network-format to host-format
is currently done in ksocknal_process_recv().
Move it to ->pro_unpack() so that a different protocol
can send it in a different format.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Icc22f4b52c1391d382c28bad157795f5477f4d7c
Reviewed-on: https://review.whamcloud.com/43609
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: alter lnd_notify_peer_down() to take lnet_nid 08/43608/10
Mr NeilBrown [Mon, 6 Jul 2020 01:47:56 +0000 (11:47 +1000)]
LU-10391 lnet: alter lnd_notify_peer_down() to take lnet_nid

The lnd_notify_peer_down() interface now takes a large nid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9926caf0508ff257e9e64d5537597addbce657d7
Reviewed-on: https://review.whamcloud.com/43608
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: convert LNetGetID to return an large-addr pid 07/43607/11
Mr NeilBrown [Tue, 30 Nov 2021 15:29:17 +0000 (10:29 -0500)]
LU-10391 lnet: convert LNetGetID to return an large-addr pid

LNetGetID now returns a 'struct processid' containing a
large-address nid.

Various places still convert it to a 4-byte-addr nid for use.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id1dfcd33ad11609fc18fe6779dff39bc9f1ff03a
Reviewed-on: https://review.whamcloud.com/43607
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: convert to struct lnet_process_id in lib-move 06/43606/11
Mr NeilBrown [Fri, 26 Jun 2020 06:04:25 +0000 (16:04 +1000)]
LU-10391 lnet: convert to struct lnet_process_id in lib-move

Various functions in lib-move.c create a 'struct lnet_process_id' just
for the purpose of reporting it in error/debug messages.

Change these to 'struct lnet_processid' with larger address support.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1d5d01eb5d42852a1413ad270a040d7698a0c145
Reviewed-on: https://review.whamcloud.com/43606
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lnet_prep_send to take net_processid 05/43605/12
Mr NeilBrown [Fri, 26 Jun 2020 05:37:10 +0000 (15:37 +1000)]
LU-10391 lnet: change lnet_prep_send to take net_processid

Instead of a 'struct lnet_process_id', lnet_prep_send() now takes a
"struct lnet_processid *", which allows larger addresses.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ibf01340663316e24b6c3053166465293c4f761e1
Reviewed-on: https://review.whamcloud.com/43605
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lnet_hdr to store large nids. 04/43604/12
Mr NeilBrown [Thu, 25 Jun 2020 06:29:10 +0000 (16:29 +1000)]
LU-10391 lnet: change lnet_hdr to store large nids.

'struct lnet_hdr' now has large-addr nids.  They are converted to
4-byte-addr on transmit, and converted back on receive.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Icb333e7b62f8151ad103db0a16aa7685a33071e1
Reviewed-on: https://review.whamcloud.com/43604
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: separate lnet_hdr in msg from that in lnd. 03/43603/12
Mr NeilBrown [Fri, 7 Jan 2022 00:09:49 +0000 (19:09 -0500)]
LU-10391 lnet: separate lnet_hdr in msg from that in lnd.

The lnet_hdr stored in an lnet_msg has fields which are sometimes in
le byte order and sometimes in host byte order.

The various lnds need all these fields to be in le byte order for
transmission or reception over the network.

To support larger (IPv6) NIDs, we will need the lnet_hdr in lnet_msg
to store these NIDs, but the lnd will need both 4byte-addr and 16-byte
lnds depending on protocol negotiation.

This patch separates out the two to make the conversion easier to
follow.

'struct lnet_hdr' is now used within common lnet code, and is not
stored in network buffers.

lnd_send will convert from 'struct lnet_hdr' to whatever is required
in the network buffer.  When lnet_parse() is called, the network
buffer will be converted to a 'struct lnet_hdr' first, and that will
be passed to lnet_parse().

The common fields of 'struct lnet_hdr' are always in host byte order.
The command specific fields (now in 'union lnet_cmd_hdr') have not
been changed and are sometimes host-byte-order and sometimes
le-byte-order.

The new 'struct lnet_hdr_nid4' is used in network buffers.  It is
opaque - there are no subfields to access.  Very few places in the lnd
code want to access fields anyway.

New functions lnet_hdr_to_nid4() and lnet_hdr_from_nid4() convert
between lnet_hdr_nid4 and the internal lnet_hdr.

'struct _lnet_hdr_nid4' is provided to access fields inside 'struct
lnet_hdr_nid4' when that is really needed.  It is used by the to/from
functions and a couple of other places.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I55277a49538543376cf0404f749d3357a2950a7c
Reviewed-on: https://review.whamcloud.com/43603
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14008 o2iblnd: avoid static allocation for msg tx 61/40261/10
Alexey Lyashkov [Wed, 12 Aug 2020 13:20:00 +0000 (16:20 +0300)]
LU-14008 o2iblnd: avoid static allocation for msg tx

Simplify tx msg handling: just push the lnet header
message onto the same list as the others.

Cray-bug-id: LUS-1796
Change-Id: I8e5d9b8a4579ff630d4a4fbc57b06a73a662e68c
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/40261
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11036 tests: debug information for sanity-lfsck test_8 26/39526/5
Olaf Faaland [Tue, 28 Jul 2020 18:23:34 +0000 (11:23 -0700)]
LU-11036 tests: debug information for sanity-lfsck test_8

When test_8 fails, report lfsck status or recovery status
depending on the failure, to aid debugging.

Test-Parameters: trivial testlist=sanity-lfsck env=ONLY=8,ONLY_REPEAT=30
Test-Parameters: fstype=zfs testlist=sanity-lfsck env=ONLY=8,ONLY_REPEAT=30
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: I57ca069125f78e5dc10761a7b44b33a48ea4859c
Reviewed-on: https://review.whamcloud.com/39526
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15307 lod: add option to set max dir stripe count 24/45724/7
Lei Feng [Fri, 3 Dec 2021 11:09:13 +0000 (19:09 +0800)]
LU-15307 lod: add option to set max dir stripe count

Add an option to limit max dir stripe count when the default
dir stripe count is set to -1.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I12dd5be15012a28bda30da85f24bc726c37f1ac7
Reviewed-on: https://review.whamcloud.com/45724
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15456 llite: deadlock in ll_new_node() 57/46157/2
Lai Siyao [Tue, 18 Jan 2022 04:29:19 +0000 (23:29 -0500)]
LU-15456 llite: deadlock in ll_new_node()

ll_new_node() will call ll_dir_getstripe() to fetch the parent default
LMV if md_create() returns -EREMOTE. It should call
ll_finish_md_op_data() before calling ll_dir_getstripe(), because
the latter takes lli_lsm_sem again, which would deadlock.

Fixes: 55ca00c3d1cd863 ("LU-11213 ptlrpc: intent_getattr fetches default LMV")
Test-Parameters: mdscount=2 mdtcount=4 testlist=racer,racer,racer
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib858bae19ff88533fe487583c27d544026aafa3f
Reviewed-on: https://review.whamcloud.com/46157
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14277 lod: statfs should not block a create 97/41497/4
Alexey Lyashkov [Fri, 29 Jan 2021 09:29:27 +0000 (12:29 +0300)]
LU-14277 lod: statfs should not block a create

lod_qos_statfs_update() only needs a guarantee that the targets
do not change, so it does not need to take the QoS mutex.

HPE-bug-id: LUS-2106
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Id9f0ea2fa006bee601d05e14b2e515fcf8248249
Reviewed-on: https://review.whamcloud.com/41497
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoNew tag 2.14.57 2.14.57 v2_14_57
Oleg Drokin [Mon, 24 Jan 2022 19:10:33 +0000 (14:10 -0500)]
New tag 2.14.57

Change-Id: I8596535261f83aac5b8e4900a76c200df226a863
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15370 ksocklnd: remove verbose debug messages 55/45855/2
Andreas Dilger [Tue, 14 Dec 2021 21:59:57 +0000 (14:59 -0700)]
LU-15370 ksocklnd: remove verbose debug messages

Remove excess ENTRY/EXIT debug messages from functions that don't
actually show anything useful about how those functions are used.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ideca3e2996bc3fb7737f6f16dc13ded0883ebbe5
Reviewed-on: https://review.whamcloud.com/45855
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15095 tests: skip lbug_on_grant_miscount on client 85/46185/5
Andreas Dilger [Tue, 18 Jan 2022 23:59:17 +0000 (16:59 -0700)]
LU-15095 tests: skip lbug_on_grant_miscount on client

Do not try to specify the lbug_on_grant_miscount=1 module parameter
on client-only builds (el7.9, ppc64le, aarch64) as this is a server
parameter and will not be present if the client is built without
HAVE_SERVER_SUPPORT.  Otherwise, loading ptlrpc.ko will fail.

Test-Parameters: trivial testlist=sanityn clientdistro=el7.9
Fixes: 2c787065441e ("LU-15095 target: lbug_on_grant_miscount module parameter")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieacd93edcce3f8de30b9308dbf3b03e10c3ebbe5
Reviewed-on: https://review.whamcloud.com/46185
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15402 ldlm: speedup RD flock enqueue 57/45957/2
Andriy Skulysh [Wed, 24 Nov 2021 11:33:47 +0000 (13:33 +0200)]
LU-15402 ldlm: speedup RD flock enqueue

Scanning of lr_granted can stop once a covering
granted RD lock is reached.
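
The early exit can be sketched as follows. This is a simplified model, not
the ldlm code: the Flock tuple, its fields and scan_for_rd_conflict() are
hypothetical names, and the sketch assumes that granted locks are mutually
compatible and that one owner's locks never overlap each other.

```python
from typing import List, NamedTuple, Optional, Tuple


class Flock(NamedTuple):
    owner: int
    mode: str   # "RD" or "WR"
    start: int
    end: int    # inclusive


def covers(a: Flock, b: Flock) -> bool:
    """True if the range of a fully contains the range of b."""
    return a.start <= b.start and b.end <= a.end


def overlaps(a: Flock, b: Flock) -> bool:
    """True if the two ranges share at least one byte."""
    return a.start <= b.end and b.start <= a.end


def scan_for_rd_conflict(granted: List[Flock],
                         req: Flock) -> Tuple[Optional[Flock], int]:
    """Return (conflicting lock or None, number of entries scanned)."""
    scanned = 0
    for lock in granted:
        scanned += 1
        if lock.mode == "RD" and covers(lock, req):
            # A granted read lock covers the whole request, so under the
            # stated assumptions no other granted lock can conflict with it.
            return None, scanned
        if (lock.mode == "WR" and lock.owner != req.owner
                and overlaps(lock, req)):
            return lock, scanned
    return None, scanned


granted = [Flock(1, "RD", 0, 99), Flock(2, "RD", 200, 299),
           Flock(3, "WR", 500, 599)]
print(scan_for_rd_conflict(granted, Flock(4, "RD", 10, 20)))
```

In this model the request for bytes 10-20 is resolved after looking at the
first granted entry only, instead of walking the whole list.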

Change-Id: I907cff002d9765c5f8496d377eddd5e62795d89c
HPE-bug-id: LUS-10623
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/45957
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15401 fld: don't obtain a slot for fld request 56/45956/2
Andriy Skulysh [Wed, 10 Nov 2021 20:22:06 +0000 (22:22 +0200)]
LU-15401 fld: don't obtain a slot for fld request

fld_client_rpc() is called with an ldlm_lock held.
Thus it can deadlock while obtaining a request slot:

 #0 [ffff92c9d63df568] __schedule
 #1 [ffff92c9d63df5f0] schedule
 #2 [ffff92c9d63df600] obd_get_request_slot
 #3 [ffff92c9d63df6b0] fld_client_rpc
 #4 [ffff92c9d63df700] fld_client_lookup
 #5 [ffff92c9d63df780] lmv_fld_lookup
 #6 [ffff92c9d63df7b8] lmv_unpackmd
 #7 [ffff92c9d63df810] mdc_get_lustre_md
 #8 [ffff92c9d63df850] lmv_get_lustre_md
 #9 [ffff92c9d63df888] ll_prep_inode

The request slot can be omitted for FLD requests, as they
are sent to the separate FLD_REQUEST_PORTAL portal.

Change-Id: I12987ca2e4aa0d70aa760e3a1ac20fe1a91d64b5
HPE-bug-id: LUS-10576
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/45956
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15066 utils: do not inherit limits from global pool 35/45135/10
Sergey Cheremencev [Fri, 2 Jul 2021 11:55:19 +0000 (14:55 +0300)]
LU-15066 utils: do not inherit limits from global pool

lfs_setquota retrieves quota limits from the server
before setting new ones, so that limits that are already set
are not overwritten. Before this patch, limits were retrieved
only from the global pool, even when the new limit was to
be set on a specific pool. Thus, when setting only the hard block
limit on a specific pool, that pool could inherit the soft block
limit from the global pool.
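
A minimal sketch of the inheritance problem, assuming limits are kept as
plain dictionaries. merge_limits(), the pool names and the numbers are
hypothetical and only illustrate the read-modify-write pattern described
above, not the lfs_setquota code itself.

```python
from typing import Dict, Optional


def merge_limits(current: Dict[str, int],
                 requested: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Start from the limits read from the server and override only the
    fields the user actually asked to change."""
    merged = dict(current)
    for name, value in requested.items():
        if value is not None:
            merged[name] = value
    return merged


global_pool = {"bsoft": 100, "bhard": 200}   # limits of the global pool
flash_pool = {"bsoft": 0, "bhard": 0}        # limits of the specific pool
request = {"bhard": 500}                     # user sets only the hard limit

# Old behaviour: the baseline is read from the global pool, so the
# specific pool silently picks up the global soft limit.
before_patch = merge_limits(global_pool, request)

# Patched behaviour: the baseline is read from the pool being modified.
after_patch = merge_limits(flash_pool, request)

print(before_patch, after_patch)
```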

HPE-bug-id: LUS-10186
Change-Id: I6fdbf3265f950e39b48a5ad06b059d8aa06a6218
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45135
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13468 fld: repeat rpc in fld_client_rpc after EAGAIN 02/38302/15
Vladimir Saveliev [Sun, 31 Oct 2021 06:42:35 +0000 (09:42 +0300)]
LU-13468 fld: repeat rpc in fld_client_rpc after EAGAIN

A timed-out RPC sent by fld_client_rpc() may lead to client operation
failure.

Have fld_client_rpc() repeat the RPC after a while in case of EAGAIN.

A test to illustrate the issue is added.

A typo in the failure simulation in fld_client_rpc() is fixed.
recovery-small.sh:test_110k() is changed so that fld_client_rpc()
fails only once; otherwise it would fall into an endless loop.
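
The retry pattern can be sketched as below. This is not the
fld_client_rpc() code: call_with_retry() and make_flaky_rpc() are
hypothetical helpers, and the bounded attempt count is an assumption added
here so that the example always terminates.

```python
import errno
import time


def call_with_retry(rpc, max_attempts=5, delay=0.01, sleep=time.sleep):
    """Call rpc() until it returns something other than -EAGAIN.

    Returns (return code, number of attempts made)."""
    for attempt in range(1, max_attempts + 1):
        rc = rpc()
        if rc != -errno.EAGAIN:
            return rc, attempt
        if attempt < max_attempts:
            sleep(delay)            # wait a while before repeating the RPC
    return -errno.EAGAIN, max_attempts


def make_flaky_rpc(failures):
    """Build an RPC stub that fails with -EAGAIN 'failures' times."""
    state = {"left": failures}

    def rpc():
        if state["left"] > 0:
            state["left"] -= 1
            return -errno.EAGAIN
        return 0

    return rpc


# Failure injected once, as in the adjusted test: the second attempt succeeds.
print(call_with_retry(make_flaky_rpc(1), sleep=lambda s: None))
```

A stub that fails on every call shows why the test had to be limited to a
single injected failure: without a bound, the caller would never return.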

HPE-bug-id: LUS-8652
Fixes: e3f6111dfd1c ("LU-11761 fld: lets caller to retry FLD_QUERY")
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I145e719ec2fb5f5dbf9b5aa4b2a5b7e62f98c19f
Reviewed-on: https://review.whamcloud.com/38302
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14651 llite: extend inode methods with user namespace arg 38/45938/10
Jian Yu [Fri, 14 Jan 2022 18:22:25 +0000 (10:22 -0800)]
LU-14651 llite: extend inode methods with user namespace arg

Kernel 5.12 supports idmapped mounts, which extends
vfsmount struct with a new struct user_namespace member,
and also extends some inode methods with an additional
user namespace argument.

The series can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=idmapped_mounts

Change-Id: I7cccde8cb3288e1ce3d9b6255796b954a6e115df
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45938
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15220 libcfs: fix panic_notifier_list undeclared error 12/45812/6
Jian Yu [Fri, 7 Jan 2022 11:12:29 +0000 (03:12 -0800)]
LU-15220 libcfs: fix panic_notifier_list undeclared error

In kernel 5.14 commit f39650de687e35766572ac89dbcd16a5911e2f0a,
panic and oops helpers are split out from include/linux/kernel.h.

This patch accommodates the above changes and fixes the
"'panic_notifier_list' undeclared" error.

Change-Id: I6888f9f4878906c572bb40d950a70ff642d3474e
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45812
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15220 llite: Compat for set_pagevec_dirty 27/45927/6
Patrick Farrell [Thu, 6 Jan 2022 18:20:59 +0000 (10:20 -0800)]
LU-15220 llite: Compat for set_pagevec_dirty

If we can't access account_page_dirtied either via export
or via kallsyms lookup, the benefit of vvp_set_pagevec_dirty
is mostly lost, since we have to take the xarray lock
repeatedly to handle accounting dirty pages.

Replace the more complicated compat code by just falling
back to __set_page_dirty_nobuffers in a loop, since this
has the same effect and is much simpler.

This also resolves 5.14 compatibility, as __set_page_dirty
is no longer exported there.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3feb526b8eaaec3811c689a895875b409204a159
Reviewed-on: https://review.whamcloud.com/45927
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15259 tests: skip acl tests if no bin/daemon users 68/45868/3
Andreas Dilger [Wed, 15 Dec 2021 20:22:23 +0000 (13:22 -0700)]
LU-15259 tests: skip acl tests if no bin/daemon users

If the bin/daemon users are not configured on the test system, then
sanity test_103a, test_125, test_154a will fail with:

    $ setfacl -m u:bin:rw f -- failed
    -   ? setfacl: Option -m: Invalid argument near character 3

Skip these tests until they are fixed.

Test-Parameters: trivial clientdistro=sles15sp3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I526f9318862577a6b73c3b63cfc95a3d793ebbe5
Reviewed-on: https://review.whamcloud.com/45868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13749 tests: add quota deb package for testing 98/45998/2
James Simmons [Fri, 7 Jan 2022 15:50:00 +0000 (10:50 -0500)]
LU-13749 tests: add quota deb package for testing

For the sanity-pcc.sh test script, the bash function
test_usrgrp_quota() uses quotacheck, which is a native
application on Ubuntu. Ensure the quota deb package
is installed so the sanity-pcc tests can run properly
on Ubuntu.

Change-Id: I44fb030754e9f87faf78c9711349de7a3463b15a
Test-Parameters: trivial testlist=sanity-pcc clientdistro=ubuntu2004
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45998
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15415 tests: Use correct PYTHON in Makefile.am 89/45989/2
Nathaniel Clark [Thu, 6 Jan 2022 13:50:46 +0000 (08:50 -0500)]
LU-15415 tests: Use correct PYTHON in Makefile.am

Use the already defined PYTHON from automake, instead of trying to
guess the correct one using the undefined PYTHON_VERSION.

Test-Parameters: trivial
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I44e2e5d44121c0bd39cf3032ae228bd6cd442a8c
Reviewed-on: https://review.whamcloud.com/45989
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14596 ldiskfs: Fix mounting issues for newer kernels 60/45960/3
James Simmons [Sun, 2 Jan 2022 20:50:44 +0000 (13:50 -0700)]
LU-14596 ldiskfs: Fix mounting issues for newer kernels

During the 2.15 development cycle project quotas were enabled by
default, which broke mounting ldiskfs for Ubuntu20 5.4 and
5.8 kernels. The following error showed up:

LDISKFS-fs warning (device loop0): ldiskfs_enable_quotas:6118:
  Failed to enable quota tracking (type=0, err=-3). Please run
  e2fsck to fix.

This was due to Ubuntu20 kernels compiling their quota support
as modules but distributing those modules in the package
linux-modules-extra-$(uname) which is not installed by default.
For Debian packaging, include the dependency 'linux-generic', which
should install the needed package.

The next problem noticed while debugging is the wrong value
for EXT4_MOUNT_DIRDATA with newer kernels. The current
value used is a combination of EXT4_MOUNT_QUOTA and EXT4_MOUNT_BARRIER,
which is wrong. Set EXT4_MOUNT_DIRDATA to the proper value of
0x0002.
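
Why a flag built from two other flags misbehaves can be shown with a small
sketch. Only the value 0x0002 comes from this message; the BARRIER and
QUOTA bit values below are illustrative assumptions rather than values
copied from a kernel header, and opt_is_set() is a hypothetical helper that
mimics a bitwise "is this option set" check.

```python
# Assumed, illustrative bit values (see the note above).
MOUNT_BARRIER = 0x20000
MOUNT_QUOTA = 0x40000

OLD_DIRDATA = MOUNT_QUOTA | MOUNT_BARRIER   # colliding definition
NEW_DIRDATA = 0x0002                        # value set by this patch


def opt_is_set(mount_opt: int, flag: int) -> bool:
    """Bitwise test of a mount option, true if any bit of flag is set."""
    return (mount_opt & flag) != 0


# A mount that only enables barriers and never asked for dirdata.
mount_opt = MOUNT_BARRIER

print(opt_is_set(mount_opt, OLD_DIRDATA))   # dirdata looks enabled
print(opt_is_set(mount_opt, NEW_DIRDATA))   # dirdata correctly off

# Setting the old dirdata value also turns on the two unrelated options.
with_old_dirdata = 0 | OLD_DIRDATA
print(opt_is_set(with_old_dirdata, MOUNT_QUOTA))
```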

Change-Id: I17a9008edb9ded348bda3a2bf137bb23f9e8b980
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45960
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15176 sec: allow subdir mount of encrypted dir 07/45407/10
Sebastien Buisson [Fri, 29 Oct 2021 11:29:25 +0000 (13:29 +0200)]
LU-15176 sec: allow subdir mount of encrypted dir

In case of sub-directory mount of an encrypted directory, we need to
retrieve the encryption context of the root inode of the filesystem.
This is done by making the MDT return this upon getattr reply.

Also add sanity-sec test_60 to exercise this capability.

Fixes: 40d91eafe2 ("LU-12275 sec: atomicity of encryption context getting/setting")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic7a273813533f2904225011b247cdfe995ce9be8
Reviewed-on: https://review.whamcloud.com/45407
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13717 sec: make client encryption compatible with ext4 11/45211/15
Sebastien Buisson [Thu, 6 Jan 2022 21:19:02 +0000 (14:19 -0700)]
LU-13717 sec: make client encryption compatible with ext4

In order to benefit from encrypted file handling implemented in
e2fsprogs, we need to adjust the way Lustre deals with encryption
context of files.

First, the encryption context needs to be stored in an xattr named
"encryption.c" instead of "security.c". But neither llite nor ldiskfs
has an xattr handler for this "encryption." xattr type. So we need
to export ldiskfs_xattr_get and ldiskfs_xattr_set_handle symbols for
this to work.

Second, we set the LDISKFS_ENCRYPT_FL flag on files for which we set
the 'encryption.c' xattr. But we just keep this flag for on-disk
inodes, and make sure the flag is cleared for in-memory inodes.
The purpose is to help e2fsprogs with encrypted files handling, while
not disturbing Lustre server side with the encryption flag (servers
are not supposed to know about it for Lustre client-side encryption).

To maintain compatibility with 2.14 in which encryption context is
stored in "security.c" xattr, we try to fetch enc context from this
xattr if getting it from "encryption.c" fails. On client side, in all
cases everything looks like encryption context is stored in
"encryption.c".

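The compatibility lookup described above can be sketched as follows. The
xattr store is modelled as a plain dictionary, and get_enc_context() and
client_view() are hypothetical helpers; only the two xattr names come from
this message.

```python
from typing import Dict, Optional

XATTR_ENC = "encryption.c"          # where the context is stored now
XATTR_ENC_COMPAT = "security.c"     # where 2.14 stored it


def get_enc_context(xattrs: Dict[str, bytes]) -> Optional[bytes]:
    """Fetch the encryption context, falling back to the 2.14 xattr."""
    ctx = xattrs.get(XATTR_ENC)
    if ctx is None:
        ctx = xattrs.get(XATTR_ENC_COMPAT)
    return ctx


def client_view(xattrs: Dict[str, bytes]) -> Dict[str, bytes]:
    """What the client sees: the context always appears under the new name."""
    ctx = get_enc_context(xattrs)
    return {} if ctx is None else {XATTR_ENC: ctx}


old_file = {XATTR_ENC_COMPAT: b"ctx-2.14"}
new_file = {XATTR_ENC: b"ctx-new"}
print(client_view(old_file), client_view(new_file))
```
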
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I784ec530f0dfdd2743169ba2326ff6c5cdd4e85a
Reviewed-on: https://review.whamcloud.com/45211
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14399 hsm: process hsm_actions in coordinator 45/41445/21
Sergey Cheremencev [Mon, 8 Feb 2021 22:26:46 +0000 (01:26 +0300)]
LU-14399 hsm: process hsm_actions in coordinator

Wait for mdd setup in a separate thread so as not to block mount.
The patch adds conf-sanity_131 to verify the fix.

Change-Id: Ifd0e8969d7ed4f8944ab61ab0e0ebe2655bad003
Fixes: a558006b ("LU-13920 hsm: process hsm_actions only after mdd setup")
HPE-bug-id: LUS-9750
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/41445
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15145 hsm: unlock the restore layout lock for a cancel 41/45341/5
Etienne AUJAMES [Fri, 22 Oct 2021 18:18:29 +0000 (20:18 +0200)]
LU-15145 hsm: unlock the restore layout lock for a cancel

The HSM restore EX layout lock is not unlocked by an HSM cancel action
or by the "hsm_control=purge" parameter.

This patch calls cdt_restore_handle_del() in mdt_cancel_all_cb() and
mdt_agent_record_update_cb() for restore actions (when updating the action
status to ARS_CANCELED).
The test "sanity-hsm test_103a" checks the "purge actions" with a
blocking restore.

Test-Parameters: testlist=sanity-hsm,sanity-hsm
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Id891e06aacd2a2c5950048a2d2a5d1398eedfdd7
Reviewed-on: https://review.whamcloud.com/45341
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10391 socklnd: don't deref lnet_hdr in LNDs 02/43602/12
Mr NeilBrown [Mon, 11 May 2020 03:52:34 +0000 (13:52 +1000)]
LU-10391 socklnd: don't deref lnet_hdr in LNDs

The lnet_hdr structure needs to be extended to support larger
addresses.  To assist this we need to minimize the number of places
where its contents are accessed.

Currently the internals of lnet_hdr are largely untouched inside the
various LNDs, but there are some exceptions in socklnd.
These exceptions are not necessary - the same data is available from
elsewhere in the lnet_msg.

So change those accesses to use the lnet_msg info instead.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ia37548323fafc77df7a42a1ac956c926f1b9ebf9
Reviewed-on: https://review.whamcloud.com/43602
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10391 socklnd: prepare for new KSOCK_MSG type 01/43601/12
Mr NeilBrown [Mon, 11 May 2020 01:06:11 +0000 (11:06 +1000)]
LU-10391 socklnd: prepare for new KSOCK_MSG type

Various places in socklnd assume there are only two message types:
KSOCK_MSG_NOOP and KSOCK_MSG_LNET.  We will soon add another type to
support a new lnet_hdr type with large addresses.
So do some cleanup first:

- get rid of ksock_lnet_msg - it doesn't add anything to lnet_hdr
- separate out 'struct ksock_hdr'.  We often want the size of this
  header, and instead request the offset of a field in ksock_msg.
- introduce switch statements in a couple of places to handle the
  different types of ksock_msg.
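
The second cleanup item can be illustrated with ctypes. The field names and
layout below are assumptions made for the example and are not taken from
the socklnd headers; the point is only that a separate header struct lets
code ask for its size directly instead of using the offset of the field
that follows it.

```python
import ctypes


class KsockHdr(ctypes.Structure):
    # Hypothetical header layout, for illustration only.
    _fields_ = [
        ("ksh_type", ctypes.c_uint32),
        ("ksh_csum", ctypes.c_uint32),
        ("ksh_zc_cookies", ctypes.c_uint64 * 2),
    ]


class KsockMsg(ctypes.Structure):
    _fields_ = [
        ("ksm_hdr", KsockHdr),
        ("ksm_payload", ctypes.c_uint8 * 32),   # stand-in for the message body
    ]


# Old style: header size expressed as the offset of the next field.
header_size_via_offset = KsockMsg.ksm_payload.offset

# New style: header size taken from the dedicated struct.
header_size_via_sizeof = ctypes.sizeof(KsockHdr)

print(header_size_via_offset, header_size_via_sizeof)
```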

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ibe484f76757c4100b8532cef659c3cc369b658ba
Reviewed-on: https://review.whamcloud.com/43601
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10391 lnet: use large nids in struct lnet_event 00/43600/12
Mr NeilBrown [Tue, 30 Nov 2021 14:51:37 +0000 (09:51 -0500)]
LU-10391 lnet: use large nids in struct lnet_event

All nids, including those in process_id, are changed
to struct lnet_nid / struct lnet_processid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I799dbbc22f7cfe403f07eb22f4bfc4e4b5dc23ea
Reviewed-on: https://review.whamcloud.com/43600
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10391 lnet: Change lnet_send() to take large-addr nids 99/43599/11
Mr NeilBrown [Tue, 30 Nov 2021 14:48:37 +0000 (09:48 -0500)]
LU-10391 lnet: Change lnet_send() to take large-addr nids

The src and rtr nids passed to lnet_send() are now pointers to a
'struct lnet_nid'.  NULL can be passed for the rtr nid, which is
treated the same as ANY.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id216b82ed6e2dcd81114859a7f964e0680057ff1
Reviewed-on: https://review.whamcloud.com/43599
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10391 lnet: extend nids in struct lnet_msg 98/43598/13
Mr NeilBrown [Fri, 7 Jan 2022 01:07:45 +0000 (20:07 -0500)]
LU-10391 lnet: extend nids in struct lnet_msg

struct lnet_msg contains 3 nids and one process_id (which itself
contains a nid).  Replace each of these with the 'struct lnet_nid'
version.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic6233d36bafda364894d89b2e2b055538a6033f5
Reviewed-on: https://review.whamcloud.com/43598
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-11596 osc: Fix and re-enable sanity grant test for ARM 58/40758/19
James Simmons [Sat, 20 Nov 2021 13:58:53 +0000 (08:58 -0500)]
LU-11596 osc: Fix and re-enable sanity grant test for ARM

If both OST and OSC support OBD_CONNECT_GRANT_PARAM, OST side will not
change client side claimed grant (a.k.a. o_grant_used) regardless of
the client page size. So no grant loss in this case.

Fixes: bd1e41672c97 ("LU-2049 grant: add support for OBD_CONNECT_GRANT_PARAM")
Change-Id: Ia0d3da587cb551400fec0c054dc65b116e6bd95b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/40758
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-11597 test: Fix sanityn 16a failed on arm 89/37589/15
Wang Shilong [Thu, 6 Jan 2022 14:02:24 +0000 (09:02 -0500)]
LU-11597 test: Fix sanityn 16a failed on arm

O_DIRECT now expects I/O to be aligned to the page size. On
x86_64 that is 4K, but on some other platforms it can be
64K, so use PAGE_SIZE here to make the test pass.

The O_DIRECT macro is defined when _GNU_SOURCE is defined,
according to the open(2) man page[1], and _GNU_SOURCE is already
defined at the head of fsx.c. So set OP_DIRECT to
O_DIRECT instead of hardcoding its value, as O_DIRECT can have
different values on other platforms such as Arm64[2].

[1]
https://man7.org/linux/man-pages/man2/open.2.html
"The O_DIRECT, O_NOATIME, O_PATH, and O_TMPFILE flags are Linux-
 specific.  One must define _GNU_SOURCE to obtain their definitions."
[2]
https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/aarch64/bits/fcntl.h.html#_M/__O_DIRECT
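
The same two ideas, taking the page size and the O_DIRECT value from the
platform rather than hardcoding them, can be sketched in Python. This is
not the fsx.c change: align_up() and is_direct_io_aligned() are
hypothetical helpers, and the getattr() fallback to 0 is an assumption for
platforms where os.O_DIRECT is not defined.

```python
import mmap
import os

PAGE_SIZE = mmap.PAGESIZE                 # 4K on x86_64, may be 64K elsewhere
O_DIRECT = getattr(os, "O_DIRECT", 0)     # platform-provided value, never hardcoded


def align_up(value: int, align: int = PAGE_SIZE) -> int:
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def is_direct_io_aligned(offset: int, length: int,
                         align: int = PAGE_SIZE) -> bool:
    """True if both the offset and the length are multiples of align."""
    return offset % align == 0 and length % align == 0


# A 5000-byte write is padded to a whole number of pages before being
# issued with O_DIRECT.
length = align_up(5000)
print(PAGE_SIZE, length, is_direct_io_aligned(0, length))
```

A size that is aligned for a 4K page size (for example 8192) is not aligned
when the page size is 64K, which is the failure mode the test change
addresses.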

Test-Parameters: testlist=sanityn envdefinitions=ONLY=16a
Fixes: 853d180121a6 ("LU-3606 fsx: Add fallocate operation to fsx")
Change-Id: If72d434adaf91a960dfc50c557d8b50793fda575
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/37589
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15407 test: remove dummy enc key at cleanup 38/46038/3
Sebastien Buisson [Tue, 11 Jan 2022 07:27:42 +0000 (08:27 +0100)]
LU-15407 test: remove dummy enc key at cleanup

Make sure to remove the dummy encryption key from session keyring
when cleaning up encryption tests.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I840490fca0a485110d077fe85254ced817fd55e3
Reviewed-on: https://review.whamcloud.com/46038
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10073 tests: re-enable lnet selftest smoke test 4.4+ kernels 37/46037/2
James Simmons [Mon, 10 Jan 2022 23:39:37 +0000 (18:39 -0500)]
LU-10073 tests: re-enable lnet selftest smoke test 4.4+ kernels

LNet selftest smoke test was at one time failing for kernels
4.4+. My testing on newer Ubuntu 5.X kernels shows this is now
working so re-enable it in general on the x86 platform.

Test-Parameters: trivial testlist=lnet-selftest

Change-Id: I865ffa868d05c22f2cf53c5e978ab8be9e450e99
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46037
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15364 ldlm: Kernel oops when stripe on Arm64 multiple MDTs 22/45922/5
Kevin Zhao [Wed, 22 Dec 2021 01:53:27 +0000 (09:53 +0800)]
LU-15364 ldlm: Kernel oops when stripe on Arm64 multiple MDTs

When set up with multiple MDTs, an atomic operation is needed for
the `set_bit` operation. On the Arm64 platform, atomic operations
rely on exclusive access, which requires address
alignment[1]. That is why we see the crash at __ll_sc_atomic64_or+0x4.
__ll_sc_atomic64_or+0x4 is an LDXR instruction, which loads
the value from the address exclusively.

atomic64 requires access to a 64-bit aligned address, but
the struct element ha_map is only 4-byte aligned; that is the root
cause. The error code of this crash is ESR = 0x96000021, which
indicates the alignment issue[2].

1. https://developer.arm.com/documentation/den0024/a/ch05s01s02
2. https://developer.arm.com/documentation/ddi0595/2021-06/
   AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-
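
The alignment problem can be reproduced on paper with ctypes. The two
structs below are hypothetical: only the member name ha_map comes from this
message, and the neighbouring fields are assumptions chosen so that the
bitmap lands on a 4-byte boundary. The check also assumes the struct itself
starts at an 8-byte aligned address. How the patch actually realigns the
member is not shown here; explicit padding is just one way to do it.

```python
import ctypes


class Unaligned(ctypes.Structure):
    # A 32-bit field followed by a bitmap made of 32-bit words:
    # the bitmap starts at offset 4.
    _fields_ = [
        ("ha_count", ctypes.c_uint32),
        ("ha_map", ctypes.c_uint32 * 2),
    ]


class Aligned(ctypes.Structure):
    # Same data with explicit padding so that the bitmap starts at offset 8.
    _fields_ = [
        ("ha_count", ctypes.c_uint32),
        ("ha_pad", ctypes.c_uint32),
        ("ha_map", ctypes.c_uint32 * 2),
    ]


def is_64bit_aligned(struct_type, field: str) -> bool:
    """True if the field offset is a multiple of 8 bytes."""
    return getattr(struct_type, field).offset % 8 == 0


print(Unaligned.ha_map.offset, is_64bit_aligned(Unaligned, "ha_map"))
print(Aligned.ha_map.offset, is_64bit_aligned(Aligned, "ha_map"))
```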

Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Change-Id: I3cc6d7347f05680ab55f00538e91886f006deb5d
Reviewed-on: https://review.whamcloud.com/45922
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15406 sec: fix in-kernel fscrypt support 87/45987/4
Sebastien Buisson [Thu, 6 Jan 2022 09:18:20 +0000 (10:18 +0100)]
LU-15406 sec: fix in-kernel fscrypt support

When using in-kernel fscrypt provided by Linux 5.4, the encryption
context can be retrieved by calling the .get_context function defined
in the struct fscrypt_operations of the super_block.
llite needs to retrieve the encryption context explicitly in case of
migration via volatile files.

Fixes: 09c558d16f ("LU-14677 sec: migrate/extend/split on encrypted file")
Fixes: fdbf2ffd41 ("LU-14677 sec: no encryption key migrate/extend/resync/split")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I76dbd21f0dc95920519ea375c583bc378d7c9f53
Reviewed-on: https://review.whamcloud.com/45987
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13799 llite: Implement lower/upper aio 09/44209/15
Patrick Farrell [Fri, 30 Jul 2021 16:12:05 +0000 (12:12 -0400)]
LU-13799 llite: Implement lower/upper aio

This patch creates a lower level aio struct for each set of
pages submitted, and attaches that to the llite level aio.

That means the completion of i/o (in the sense of
successful RPC/page completion) is associated with the
lower level aio struct, and the higher level aio waits for
the completion of these lower level structs.  Previously,
all pages were associated with the upper level (and only)
aio struct.
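
The relationship between the two levels can be modelled with a simple
reference count. This is a single-threaded sketch with hypothetical class
and method names, not the llite structures: each lower-level aio
represents one set of submitted pages, and the upper-level aio completes
only after the submitter and every lower-level aio have dropped their
reference.

```python
class UpperAio:
    """Upper-level aio: waits for all attached lower-level aios."""

    def __init__(self):
        self.outstanding = 1      # reference held by the submitter
        self.completed = False
        self.rc = 0               # first error seen, if any

    def attach(self):
        """Create a lower-level aio for one set of submitted pages."""
        self.outstanding += 1
        return LowerAio(self)

    def put(self, rc=0):
        """Drop one reference; complete when the last one is gone."""
        if rc and not self.rc:
            self.rc = rc
        self.outstanding -= 1
        if self.outstanding == 0:
            self.completed = True


class LowerAio:
    """Lower-level aio: completion of one set of pages."""

    def __init__(self, upper):
        self.upper = upper
        self.finished = False

    def complete(self, rc=0):
        if self.finished:
            raise RuntimeError("lower aio completed twice")
        self.finished = True
        self.upper.put(rc)


upper = UpperAio()
first = upper.attach()
second = upper.attach()
upper.put()            # submitter is done issuing I/O
first.complete()
print(upper.completed)
second.complete()
print(upper.completed)
```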

This patch is a reorganization/cleanup, which is necessary
for the next patch, which moves release pages to aio_end.
The justification for this (correctness and performance)
will be provided in that patch.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I02d6a33a0d9f9bbc1a182bcd539bd836c240bcc5
Reviewed-on: https://review.whamcloud.com/44209
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13799 osc: Always set aio in anchor 53/44153/9
Patrick Farrell [Fri, 30 Jul 2021 16:11:37 +0000 (12:11 -0400)]
LU-13799 osc: Always set aio in anchor

We currently do not set csi_aio for DIO and use this to
control when we free the aio struct.  (For AIO, we must
free it in cl_sync_io_note, but for other users, we have to
wait until after cl_sync_io_wait has been called.)

The lack of csi_aio causes trouble for the implementation
of the next patch, so instead we always set it and control
freeing by checking at that time if we are doing DIO.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I2122a6a2dad33179e9114494b53c09d0b64f0fa6
Reviewed-on: https://review.whamcloud.com/44153
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13799 llite: Simplify cda_no_aio_complete use 54/44154/8
Patrick Farrell [Fri, 30 Jul 2021 16:11:03 +0000 (12:11 -0400)]
LU-13799 llite: Simplify cda_no_aio_complete use

It is better to handle AIO and DIO the same as much as
possible, limiting the difference to setup if possible.

In this spirit, move the check for DIO (is_sync_kiocb()) to
the setup function rather than cleanup and just use
no_aio_complete.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1b91e5b8f42971cb37780597402c4ee94f82a963
Reviewed-on: https://review.whamcloud.com/44154
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15417 build: build MOFED 5.5 92/45992/3
Minh Diep [Thu, 6 Jan 2022 21:02:39 +0000 (13:02 -0800)]
LU-15417 build: build MOFED 5.5

The path to the MOFED header files has changed to
/usr/src/ofa_kernel/x86_64/<kernel>
so we cannot assume it's /usr/src/ofa_kernel/default
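
One way to avoid assuming a fixed location is to probe the candidates in
order. This is a sketch, not the build-system change: ofa_kernel_dir() is a
hypothetical helper, the kernel string in the example is made up, and the
directory check is injectable so the example runs on machines without
MOFED installed.

```python
import os
import posixpath


def ofa_kernel_dir(arch, kernel, base="/usr/src/ofa_kernel",
                   isdir=os.path.isdir):
    """Return the first existing MOFED header directory, or None."""
    candidates = [
        posixpath.join(base, arch, kernel),   # newer layout
        posixpath.join(base, "default"),      # older layout
    ]
    for path in candidates:
        if isdir(path):
            return path
    return None


# Simulate a system that only has the newer layout.
example_kernel = "5.4.0-example"              # hypothetical kernel version
existing = {"/usr/src/ofa_kernel/x86_64/" + example_kernel}
print(ofa_kernel_dir("x86_64", example_kernel, isdir=existing.__contains__))
```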

Test-Parameters: trivial
Change-Id: I10f375b459f04b84003e70951e4e423295001f40
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45992
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15396 osd: include linux/file.h 23/45923/2
Alex Zhuravlev [Thu, 23 Dec 2021 08:10:21 +0000 (11:10 +0300)]
LU-15396 osd: include linux/file.h

In some 4.x kernels we need to include linux/file.h to have
alloc_file() defined.

Fixes: b0f150eba4 ("LU-13783 osd-ldiskfs: use alloc_file_pseudo to create fake files")
Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I279945f70578030bf581fa2afc0ca7b4dfa83653
Reviewed-on: https://review.whamcloud.com/45923
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15358 tests: Variable incorrectly defined in sanity-quota 20/45820/2
Arshad Hussain [Fri, 10 Dec 2021 09:03:58 +0000 (14:33 +0530)]
LU-15358 tests: Variable incorrectly defined in sanity-quota

In sanity-quota.sh, the local variable 'accnt_cnt' was
incorrectly defined. This was exposed by running
shellcheck:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In ./lustre/tests/sanity-quota.sh line 3344:
local accnt_cnt
      ^-- SC2034: accnt_cnt appears unused. Verify it or export it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Change-Id: Ib5971e7cc95b03c1f57411c6f02156ab236babcd
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/45820
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15410 tests: Add MDS Space check for dom-performance 73/45973/2
Arshad Hussain [Wed, 5 Jan 2022 11:01:05 +0000 (06:01 -0500)]
LU-15410 tests: Add MDS Space check for dom-performance

The IOR test within dom-performance requires an MDS of at
least 20GB. This patch adds an MDS space check to
dom-performance/test_IOR so that the test is skipped when
no MDS of the required size is found.

Test-Parameters: trivial testlist=dom-performance
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I493a6ee5b549539b562aeda418a7418b94060ca9
Reviewed-on: https://review.whamcloud.com/45973
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years ago LU-15408 sec: confirm encrypted file's hash 64/45964/2
Sebastien Buisson [Tue, 4 Jan 2022 17:16:47 +0000 (18:16 +0100)]
LU-15408 sec: confirm encrypted file's hash

It is good practice to always confirm, on the server side, the
encrypted file's hash included in the digested form sent by the client.

Fixes: ed4a625d88 ("LU-13717 sec: filename encryption - digest support")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I42212a36b23e4e6e41184a78fa8244c5e2d8dd1f
Reviewed-on: https://review.whamcloud.com/45964
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15220 utils: fix gcc-11 -Werror=mismatched-dealloc error 14/45814/3
Jian Yu [Thu, 9 Dec 2021 19:18:13 +0000 (11:18 -0800)]
LU-15220 utils: fix gcc-11 -Werror=mismatched-dealloc error

This patch fixes the following -Werror=mismatched-dealloc error in
lustre_rsync.c:

lustre_rsync.c: In function ‘lr_locate_rsync’:
lustre_rsync.c:1472:17: error: ‘fclose’ called on pointer returned
from a mismatched allocation function [-Werror=mismatched-dealloc]
 1472 |                 fclose(fp);
      |                 ^~~~~~~~~~
lustre_rsync.c:1467:14: note: returned from ‘popen’
 1467 |         fp = popen(rsync, "r");
      |              ^~~~~~~~~~~~~~~~~

Test-Parameters: trivial testlist=lustre-rsync-test

Change-Id: I518db394a282c8e6123d878f63312bfb27c59235
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45814
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>