Whamcloud - gitweb
fs/lustre-release.git
2 days ago New release 2.15.7 b2_15 2.15.7 v2_15_7
Oleg Drokin [Tue, 10 Jun 2025 19:54:58 +0000 (15:54 -0400)]
New release 2.15.7

Change-Id: I6ff7ea5a287c9b26552de0c6520a64546c33fd37
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago New RC 2.15.7-RC2 2.15.7-RC2 v2_15_7-RC2
Oleg Drokin [Tue, 27 May 2025 02:07:57 +0000 (22:07 -0400)]
New RC 2.15.7-RC2

Change-Id: I3db655a794318153a0f66ac87982140da3d3dc55
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-18081 tests: improve conf-sanity/98 testing 33/59433/2
Andreas Dilger [Mon, 26 May 2025 08:44:19 +0000 (01:44 -0700)]
LU-18081 tests: improve conf-sanity/98 testing

conf-sanity test_98 is failing for newer distros like Ubuntu 24.04
and SLES15sp6, which have newer userspace tools.

Print better error messages about what options are being processed
to help debug this issue.  It appears mount is now deduplicating
options passed on the command-line, which invalidates the test case.

Change test_98 to call mount.lustre directly to verify this handling.

Lustre-change: https://review.whamcloud.com/55891
Lustre-commit: ae475331db533826fe444a71bf61ad08d42c0a21

Test-Parameters: trivial
Test-Parameters: testlist=conf-sanity env=ONLY=98 clientdistro=sles15sp6
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I34de0fdbd4ca36228d313b53251b5dbcfc502b07
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59433
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frederick Dilger <fdilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-18092 test: skip sanity-lfsck read check 29/59429/2
Lai Siyao [Mon, 26 May 2025 06:47:07 +0000 (23:47 -0700)]
LU-18092 test: skip sanity-lfsck read check

On some new kernels a read on a foreign layout doesn't fail yet, so skip
the read failure check in sanity-lfsck test_38.

Lustre-change: https://review.whamcloud.com/56224
Lustre-commit: 2ea7a0f1956fe0e00e17c14ce37d1a7c240c5c2b

Test-Parameters: trivial clientdistro=sles15sp6 testlist=sanity-lfsck env=ONLY=38,ONLY_REPEAT=30

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ied10a95db217c58383a4082f0bca11c7b47e3276
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59429
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-19035 kernel: update RHEL 8.10 [4.18.0-553.53.1.el8_10] 42/59342/3
Jian Yu [Wed, 21 May 2025 07:32:58 +0000 (00:32 -0700)]
LU-19035 kernel: update RHEL 8.10 [4.18.0-553.53.1.el8_10]

Update RHEL 8.10 kernel to 4.18.0-553.53.1.el8_10.

Lustre-change: https://review.whamcloud.com/59341
Lustre-commit: TBD (from a845ffd5da40af2f24d5a4cad4c8af16513a338e)

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testlist=sanity

Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testlist=sanity

Test-Parameters: optional fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-part-1

Test-Parameters: optional fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-part-2

Test-Parameters: optional fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-part-3

Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-zfs-part-1

Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-zfs-part-2

Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-zfs-part-3

Change-Id: Ide3ddc9dd8716e24cfb5bbbcba75237ac58041ba
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59342
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
2 weeks ago LU-19040 kernel: update SLES15 SP6 [6.4.0-150600.23.50.1] 69/59369/2
Jian Yu [Thu, 22 May 2025 06:40:56 +0000 (23:40 -0700)]
LU-19040 kernel: update SLES15 SP6 [6.4.0-150600.23.50.1]

Update SLES15 SP6 kernel to 6.4.0-150600.23.50.1 for Lustre client.

Lustre-change: https://review.whamcloud.com/59368
Lustre-commit: TBD (from a470380feef87413575d189e548a1a7c82ef0da4)

LU-17905 kernel: new kernel [SLES15 SP6 6.4.0-150600.23.14.2]

This patch makes changes to support new SLES15 SP6 release
with kernel 6.4.0-150600.23.14.2 for Lustre client.

Lustre-change: https://review.whamcloud.com/55563
Lustre-commit: 020e361cfdbad529e31525052e660268d7bd976d

Was-Change-Id: Ib9159d200122595d0a56e3581cfc66d75ddb59f6

LU-18102 tests: skip read/write in sanity/27J for some kernels

Kernel commit v5.11-10234-gcbd59c48ae2b (5.12) skips filemap_read()
when the file size is 0. Commit v6.2-rc4-61-g5956592ce337 (6.2) only
corrects the last page read in filemap_read(); it still skips the real
file read for a 0-sized file.

Lustre-change: https://review.whamcloud.com/55977
Lustre-commit: adb3d20f3c8d3a720b293d3d5977a1849a9e0cb7

Was-Change-Id: I1c562cce1374df7659c8d178fb3601e4dfb01744

Fixes: b711af7d24 ("LU-16101 tests: skip sanity/27J for more kernels")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Test-Parameters: trivial mdtcount=4 mdscount=2 \
  clientdistro=sles15sp6 testlist=sanity

Test-Parameters: optional mdtcount=4 mdscount=2 \
  clientdistro=sles15sp6 testgroup=full-dne-part-1

Test-Parameters: optional mdtcount=4 mdscount=2 \
  clientdistro=sles15sp6 testgroup=full-dne-part-2

Test-Parameters: optional mdtcount=4 mdscount=2 \
  clientdistro=sles15sp6 testgroup=full-dne-part-3

Change-Id: Ie2d530f0edb28326bbcbd1326f40e3e7db845c21
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59369
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-17784 build: improve wiretest for flexible arrays 03/59303/4
Shaun Tancheff [Thu, 22 May 2025 05:54:48 +0000 (22:54 -0700)]
LU-17784 build: improve wiretest for flexible arrays

Flexible array checking can additionally probe that the size
of the array element is correct.

Lustre-change: https://review.whamcloud.com/54929
Lustre-commit: 339b585c257d7f47d0c9b74f9d940fd20b8c04a3

Was-Change-Id: Ib7de3d156a2e77dfaf2e9ab1df8fab524c073610

LU-17504 libcfs: safer LIBCFS_ALLOC

Make the LIBCFS_ALLOC() family of macros safer by adding
parentheses around arguments such as (size) to avoid unintended
expansion.
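
As a minimal illustration of why the parentheses matter, the sketch below
uses two hypothetical macros (BYTES_UNSAFE and BYTES_SAFE are invented for
this example and are not the LIBCFS_ALLOC() definitions). An argument that
is itself an expression can be split apart by operator precedence once it
is expanded:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macros for illustration; not the LIBCFS_ALLOC() family. */
#define BYTES_UNSAFE(count, size)	(count * size)
#define BYTES_SAFE(count, size)		((count) * (size))

/* Expands to (n + 1 * 8): the multiplication binds before the addition. */
size_t demo_unsafe(size_t n)
{
	return BYTES_UNSAFE(n + 1, 8);
}

/* Expands to ((n + 1) * (8)): the argument is evaluated as a whole. */
size_t demo_safe(size_t n)
{
	return BYTES_SAFE(n + 1, 8);
}

/* Usage: the same call site yields two different sizes. */
void demo_check(void)
{
	assert(demo_unsafe(3) == 11);	/* 3 + (1 * 8) */
	assert(demo_safe(3) == 32);	/* (3 + 1) * 8 */
}
```

An undersized result of this kind is what makes an unparenthesized size
argument risky in an allocation macro.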

CoverityID: 415056 ("Integer handling issues")

Lustre-change: https://review.whamcloud.com/55015
Lustre-commit: 0d3a9607655adc8f9dd4ae1c341bde0b57fe88bf

Was-Change-Id: I9701f87025bc5ce038a6bf34413b64a3f019d998

Fixes: 718e3f3e68 ("LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error")

LU-17504 build: fix lock_handle array-index-out-of-bounds

After Linux kernel patch "ubsan: Tighten UBSAN_BOUNDS on GCC"
(commit v6.4-rc2-1-g2d47c6956ab3), flexible trailing arrays
declared like 'lock_handle[2]' will generate warnings when
CONFIG_UBSAN & co. is enabled:

    UBSAN: array-index-out-of-bounds in ldlm_request.c:1282:18
    index 2 is out of range for type 'lustre_handle [2]'

The declaration lock_handle[LDLM_LOCKREQ_HANDLES] confuses the
compiler into thinking there are only two fields in lock_handle,
but the caller often allocates extra fields beyond this for more
locks to be cancelled due to Early Lock Cancellation or from LRU.

Rather than have a second flexible array after lustre_handle[2],
declare the whole array as flexible, and fix up the few sites
that are allocating this array to ensure LDLM_LOCKREQ_HANDLES
fields are allocated at a minimum.

This subtly changes the checks in wiretest.c due to the removal
of the 2 "base" handles in ldlm_request, but I believe this is not
changing the wire protocol because it still allocates those handles
directly, and I have verified interoperability with a 2.14.0 server.
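
The declaration change described above can be sketched in userspace C.
This is only an illustration under stated assumptions: struct demo_request,
struct demo_handle and DEMO_MIN_HANDLES are hypothetical stand-ins for
ldlm_request, lustre_handle and LDLM_LOCKREQ_HANDLES, and the field layout
is not the wire format. The whole array is a flexible array member, and the
allocation helper enforces the minimum number of handles:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MIN_HANDLES 2	/* stands in for LDLM_LOCKREQ_HANDLES */

struct demo_handle {
	uint64_t cookie;
};

struct demo_request {
	uint32_t flags;
	uint32_t count;			/* number of handles allocated */
	struct demo_handle handles[];	/* flexible array member, no fixed [2] */
};

/* Bytes needed for a request holding at least DEMO_MIN_HANDLES handles. */
size_t demo_request_size(uint32_t count)
{
	if (count < DEMO_MIN_HANDLES)
		count = DEMO_MIN_HANDLES;
	return sizeof(struct demo_request) +
	       count * sizeof(struct demo_handle);
}

/* Allocate a zeroed request; the caller releases it with free(). */
struct demo_request *demo_request_alloc(uint32_t count)
{
	struct demo_request *req = calloc(1, demo_request_size(count));

	if (req != NULL)
		req->count = count < DEMO_MIN_HANDLES ?
			     DEMO_MIN_HANDLES : count;
	return req;
}

/* Store a cookie in slot idx, which must lie inside the allocation. */
void demo_request_set(struct demo_request *req, uint32_t idx, uint64_t cookie)
{
	assert(idx < req->count);
	req->handles[idx].cookie = cookie;
}
```

Because the array no longer has a declared bound of 2, indexing beyond the
first two handles is not flagged by bounds checking as long as the caller
allocated room for them.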

Lustre-change: https://review.whamcloud.com/54926
Lustre-commit: e3a9d87370c4ccc58d1d3a97ea1b221d88f9e57a

Was-Change-Id: I9695fb44f1b5c84bb750d2983cdd8b939e3ebbe5

Test-Parameters: testlist=runtests clientversion=2.14
Test-Parameters: testlist=runtests serverversion=2.14
Test-Parameters: testlist=runtests clientversion=2.15
Test-Parameters: testlist=runtests serverversion=2.15
Test-Parameters: testlist=runtests clientversion=EXA5
Test-Parameters: testlist=runtests serverversion=EXA5
Test-Parameters: testlist=runtests clientversion=EXA6
Test-Parameters: testlist=runtests serverversion=EXA6
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error

This patch fixes the following [-Werror=stringop-overread] and
[-Werror=attribute-warning] errors detected by gcc 13:

lustre/mgc/mgc_request.c:190:21: error: 'strcmp' reading 1 or
more bytes from a region of size 0 [-Werror=stringop-overread]
  190 | if (strcmp(logname, cld->cld_logname) == 0) {
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In function 'fortify_memcpy_chk',
    inlined from 'class_handle_ioctl' at
/root/lustre-release/lustre/obdclass/class_obd.c:381:3:
include/linux/fortify-string.h:528:25: error:
call to '__write_overflow_field' declared with attribute warning:
detected write beyond size of field (1st parameter);
maybe use struct_group()? [-Werror=attribute-warning]
  528 |  __write_overflow_field(p_size_field, size);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lustre-change: https://review.whamcloud.com/54834
Lustre-commit: 718e3f3e680f422d865a15890ac60e66dcd9e240

Was-Change-Id: I59f5a88b4cd64c9f4e67e568546baada371543b1

Signed-off-by: Jian Yu <yujian@whamcloud.com>
LU-17504 build: fix array-index-out-of-bounds warning

On Linux kernel 6.5, due to commit 2d47c6956ab3
("ubsan: Tighten UBSAN_BOUNDS on GCC"), flexible
trailing arrays declared like 'lc_array_sum[1];'
will generate warnings when CONFIG_UBSAN & co. is
enabled:

  UBSAN: array-index-out-of-bounds in lprocfs_status.c:1609:17
  index 1 is out of range for type '__s64 [1]'

Since LPROCFS_STATS_FLAG_IRQ_SAFE flag is only used
in one place - obd_memory() counter, we can just
remove it and change obd_memory over to a regular
percpu_counter. This would both simplify the
lprocfs_counter() code, move over to using more
kernel functionality instead of libcfs, as well as
reduce overhead slightly for the memory accounting code.

Lustre-change: https://review.whamcloud.com/54365
Lustre-commit: b698abd415bc4a810f307611fe984e50e007581e

Was-Change-Id: Ic461c4b30317bfd2b1e9f5b6be84c4a7fb4e3eb9

Signed-off-by: Jian Yu <yujian@whamcloud.com>
LU-16363 build: fiemap flexible array

Linux commit v5.19-rc2-1-g94dfc73e7cf4
 treewide: uapi: Replace zero-length arrays with flexible-array
 members
Adjust wiretest to handle flexible array when
sizeof(fiemap->fm_extents) is undefined.

Lustre-change: https://review.whamcloud.com/49305
Lustre-commit: fedf1e8bd70ccb2aaa64cb90111a7298d9bb2bf7

Was-Change-Id: Ia2692d126a871b43e9144e5d151215166604702d

HPE-bug-id: LUS-11388

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib7de3d156a2e77dfaf2e9ab1df8fab524c073610
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59303
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-18668 kernel: update RHEL 9.6 [5.14.0-570.17.1.el9_6] 95/59295/2
Jian Yu [Mon, 19 May 2025 19:02:49 +0000 (12:02 -0700)]
LU-18668 kernel: update RHEL 9.6 [5.14.0-570.17.1.el9_6]

Update RHEL 9.6 kernel to 5.14.0-570.17.1.el9_6 for Lustre client.

Lustre-change: https://review.whamcloud.com/59292
Lustre-commit: TBD (from 8fa31ecad4e288e5c949181605a87a90eff86e65)

Test-Parameters: trivial \
  mdtcount=4 mdscount=2 clientdistro=el9.6 testlist=sanity
Test-Parameters: optional clientdistro=el9.6 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.6 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.6 testgroup=full-part-3

Change-Id: Iac6973ac636953c1e64a60433ae72c0f692a24ca
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59295
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-19029 kernel: update RHEL 8.10 [4.18.0-553.52.1.el8_10] 94/59294/2
Jian Yu [Mon, 19 May 2025 18:58:25 +0000 (11:58 -0700)]
LU-19029 kernel: update RHEL 8.10 [4.18.0-553.52.1.el8_10]

Update RHEL 8.10 kernel to 4.18.0-553.52.1.el8_10.

Lustre-change: https://review.whamcloud.com/59293
Lustre-commit: TBD (from 6ad0e06d30828f1326f22ac307becaee33738841)

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testlist=sanity

Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testlist=sanity

Test-Parameters: optional fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-part-1

Test-Parameters: optional fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-part-2

Test-Parameters: optional fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-part-3

Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-zfs-part-1

Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-zfs-part-2

Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-zfs-part-3

Change-Id: I0d5a2872050a92e1bf8e8b9438ab2bd1a4a21636
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59294
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-17179 tests: check the system is clean 12/59112/2
Sergey Cheremencev [Mon, 9 Oct 2023 02:45:16 +0000 (06:45 +0400)]
LU-17179 tests: check the system is clean

The main part of the tests cannot work correctly if the system
is not clean, so check this at the beginning of sanity-quota.

Lustre-change: https://review.whamcloud.com/52630
Lustre-commit: 7e1fb1a296ec7ab21be7ec39e2b6a38fbca76b6c

Test-Parameters: trivial
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ibfbe4663dee8476486e96eb99ccbcea13216861b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59112
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-16641 tests: fix sanity-quota_12b 29/56929/7
Sergey Cheremencev [Fri, 8 Nov 2024 06:38:28 +0000 (22:38 -0800)]
LU-16641 tests: fix sanity-quota_12b

Fix sanity-quota_12b so that it does not fail after
creating $ilimit files with the same inode
hard limit. It is legal for creating 2048 files
to fail when the inode hard limit is also 2048.

Lustre-change: https://review.whamcloud.com/53969
Lustre-commit: 25896b8b88207e181eba4994323865cce9878800

Test-Parameters: trivial fstype=zfs \
  env=ONLY=12b,ONLY_REPEAT=100 testlist=sanity-quota

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Iea2e976ad1954dc2489ffa81e92e624364343069
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56929
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-18340 test: interop check for replay-single 100c 90/56690/2
Hongchao Zhang [Mon, 30 Sep 2024 15:50:57 +0000 (23:50 +0800)]
LU-18340 test: interop check for replay-single 100c

The test_100c in replay-single could fail in interop testing
with old clients, so add an interop check for it.

Test-Parameters: trivial testlist=replay-single
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ia31cd53bdef345ab706d14c27e2b2bcb3d34842f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56690
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago New RC 2.15.7-RC1 2.15.7-RC1 v2_15_7-RC1
Oleg Drokin [Wed, 21 May 2025 03:43:15 +0000 (23:43 -0400)]
New RC 2.15.7-RC1

Change-Id: I818c7be565c457aed0504af860f6aec4a8d73821
Signed-off-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-18572 lnet: Uninitialized var in lnet_peer_add 94/57494/2
Chris Horn [Fri, 6 Dec 2024 19:51:19 +0000 (12:51 -0700)]
LU-18572 lnet: Uninitialized var in lnet_peer_add

This is a regression introduced in the b2_15 port of

Lustre-change: https://review.whamcloud.com/50106
Lustre-commit: aacb16191a72bc6db1155030849efb0d6971a572

In b2_15, lnet_peer_add() takes an lnet_nid_t nid argument, but the
backport uses an uninitialized large nid variable in various places.
This results in incorrect behavior.

HPE-bug-id: LUS-12633
Test-Parameters: trivial
Fixes: b341288179 ("LU-14668 lnet: Lock primary NID logic")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6b9ca501bac97d40fd193b2c36874f632582714c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/57494
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago LU-18053 osc: add another cond_resched() to osc_lru_shrink() 90/56390/4
Brian Behlendorf [Tue, 30 Jul 2024 15:40:19 +0000 (08:40 -0700)]
LU-18053 osc: add another cond_resched() to osc_lru_shrink()

When discarding pages osc_lru_shrink() eventually calls
delete_from_page_cache() which must acquire a spin lock
on the page cache.  On systems with a large number of
CPUs this can result in spinlock contention and soft
lockups.  Add a call to cond_resched() to yield the CPU
as needed.  This is a follow up to LU-17630.

Lustre-change: https://review.whamcloud.com/55888
Lustre-commit: 66549c1540b2931ae1d1d1ebb50afbf15683baf4

Change-Id: I1ebf940a9a96c433f527f3e0dd9dc765b2645c97
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Signed-off-by: Eric Carbonneau <carbonneau1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56390
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
3 weeks ago LU-18697 lnet: lnet_peer_del_nid refcount loss 77/57977/3
Chris Horn [Tue, 4 Feb 2025 16:37:01 +0000 (09:37 -0700)]
LU-18697 lnet: lnet_peer_del_nid refcount loss

This is a regression introduced in the b2_15 port of

Lustre-change: https://review.whamcloud.com/50106
Lustre-commit: aacb16191a72bc6db1155030849efb0d6971a572

In lnet_peer_del_nid(), the call to lnet_peer_ni_find_locked() takes
a reference on the lnet_peer_ni, but this reference is not dropped
if the peer state has LNET_PEER_LOCK_PRIMARY bit set and the nid
being deleted is the primary NID of the peer.
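
The pattern behind this kind of leak can be sketched in userspace C. The
names below (struct demo_peer_ni, demo_find, demo_put, demo_del_nid) are
hypothetical and do not reproduce the LNet code or its return values. The
point is only that every exit path after a successful lookup has to drop
the reference that the lookup took:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted peer NI; not the LNet structure. */
struct demo_peer_ni {
	unsigned long nid;
	int refcount;
};

/* Look up an entry by NID and take a reference on it. */
struct demo_peer_ni *demo_find(struct demo_peer_ni *tbl, size_t n,
			       unsigned long nid)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].nid == nid) {
			tbl[i].refcount++;
			return &tbl[i];
		}
	}
	return NULL;
}

/* Drop a reference previously taken by demo_find(). */
void demo_put(struct demo_peer_ni *ni)
{
	assert(ni->refcount > 0);
	ni->refcount--;
}

/* Refuse to delete a locked primary NID; error codes are arbitrary here. */
int demo_del_nid(struct demo_peer_ni *tbl, size_t n, unsigned long nid,
		 unsigned long primary_nid, bool primary_locked)
{
	struct demo_peer_ni *ni = demo_find(tbl, n, nid);
	int rc = 0;

	if (ni == NULL)
		return -ENOENT;

	if (primary_locked && nid == primary_nid) {
		rc = -EPERM;
		goto out;	/* a bare "return rc" here would leak the ref */
	}

	/* ... actual removal elided in this sketch ... */
out:
	demo_put(ni);
	return rc;
}

/* Helper: reference count left on the entry after a refused delete. */
int demo_refcount_after_refused_delete(void)
{
	struct demo_peer_ni tbl[] = { { .nid = 42, .refcount = 1 } };

	(void)demo_del_nid(tbl, 1, 42, 42, true);
	return tbl[0].refcount;
}
```

With the single exit label, the refused delete leaves the count where it
started, which is the property a regression test for this path would check.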

A test case is added to exercise this code path.

HPE-bug-id: LUS-12709
Fixes: c9badd8648 ("LU-14668 lnet: Lock primary NID logic")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib717672189824e61a184ddcb9127d2921f2a66db
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/57977
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-18931 build: Update ZFS version to 2.2.7 34/58834/7
Jian Yu [Thu, 8 May 2025 06:48:47 +0000 (23:48 -0700)]
LU-18931 build: Update ZFS version to 2.2.7

Update ZFS version to 2.2.7. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.2.7

LU-18886 zfs-osd: za_name flexible array OI scrub fix

Initialize the zap_attribute_t.za_name_len to resolve the
OI scrub failures observed with zfs-2.3.  This is a follow up
to commit d47a71f7a894a193957fe7771d43c5767979c117 which made
an identical fix for the other sites where a zap_attribute_t
is used.

Lustre-change: https://review.whamcloud.com/58762
Lustre-commit: 18f7a2e9ff3536a6d7cefebb9e9a58ffbf460627

Was-Change-Id: Ib4295cbb7ce7e7efe0f7e67b82bc8c73a1f9df8d

Fixes: d47a71f7a894a ("LU-18360 zfs-osd: za_name flexible array")
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
LU-17924 tests: disable obdfilter-survey for ZFS

Starting with ZFS 2.2.7 the obdfilter-survey test fails due to
memory exhaustion. For now, disable the test for ZFS.

Lustre-change: https://review.whamcloud.com/58796
Lustre-commit: b0bbd61f8d7411560b6aa4ae84b8c680d7bd5c3d

Was-Change-Id: Ibebc637a9b733cf0b262d18de1baeef09108cd36

Signed-off-by: James Simmons <jsimmons@infradead.org>
LU-18153 osd: don't release uninitialized SA

If an osd-zfs object has no initialized SA, then do not
try to release it.

Lustre-change: https://review.whamcloud.com/57989
Lustre-commit: fa0e99f28aa015a721de6eea41019a58c25f8606

Was-Change-Id: I210ae1eb9cae0bfb02161efeee2f897d9c37294b

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
LU-18624 zfs: disable compression by default

By default ZFS enables compression, which is causing test
failures for us. It should be the administrator's choice to use
ZFS with compression turned on or off. Change the mount.lustre
zfs backend to handle the compression=val mount option. By default
we will turn it off.

For the test suite we can use <facet_type>_FS_MKFS_OPTS to
turn on compression for testing.

Disable many conf-sanity tests that are broken with ZFS 2.X.

Lustre-change: https://review.whamcloud.com/57990
Lustre-commit: de4f30d5c862e4887d001e8c29cf920e04c5f737

Was-Change-Id: I752c883f6f912a340aa346e1dfb8bf7bdef24939

Signed-off-by: James Simmons <jsimmons@infradead.org>
LU-17763 tests: use urandom in sanity/66

zfs-2.2.6 compresses data by default, which breaks the test
when /dev/zero is used as a source.

Lustre-change: https://review.whamcloud.com/57987
Lustre-commit: 5751da80ecd57dfae9ad3c080b8c984605da47ec

Was-Change-Id: I6853c693e7cb560ae737025507e5377da82c787b

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
LU-18728 tests: use urandom to really consume ZFS space

It appears newer ZFS is using data compression by default, so reading
from /dev/zero results in files not consuming the expected amount of
space.  Instead, read from /dev/urandom for ZFS to write files in
sanity and conf-sanity to ensure they fill the OSTs, or the image
to be used for target creation, as expected.

Lustre-change: https://review.whamcloud.com/58115
Lustre-commit: 6153eed3ee180e8695c9e2e3d9ad9db8a6b73ad8

Was-Change-Id: I7b4e95032608d8db82c75e4b6dd1ec5beb6f8d99

Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
LU-18360 zfs-osd: za_name flexible array

zfs-2.3 commit 3cf2bfa57008af7f0690f73491d7f9b4ac4ed65a
  Allocate zap_attribute_t from kmem instead of stack

za_name maximum size increased to ZAP_MAXNAMELEN_NEW

Ensure zap_attribute_t.za_name space is available and
zap_attribute_t.za_name_len now needs to be initialized
to the allocated space of MAXNAMELEN.

Also mkfs should disable longname support as it would break
MAX_NAME interop.

Lustre-change: https://review.whamcloud.com/56656
Lustre-commit: d47a71f7a894a193957fe7771d43c5767979c117

Was-Change-Id: I6c48c66a42a36ea6816b37ffce7e17f45eed3dbf

HPE-bug-id: LUS-12561

LU-14094 tests: give zfs more wait time to unlink

In sanity.sh test_311, give zfs more wait time to unlink more
files as expected.

Lustre-change: https://review.whamcloud.com/56952
Lustre-commit: faf7b20eeae7550083842c810a3941f07859ae1c

Was-Change-Id: I17f278df3826fa38b71713c610d644cc7676c1ad

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
LU-14094 tests: improve sanity.sh test_311

Improve sanity.sh test_311 to see why the number of the objects
doesn't decrease as expected.

Lustre-change: https://review.whamcloud.com/55566
Lustre-commit: c1bc42821d36f9ec5630e43c142abda60515d9e3

Was-Change-Id: Iabbaed42c5654ef31bc9f98fe9868785f8ff2f18

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
LU-18272 build: remove Summary line from osd-zfs

Resolve spurious warning:
   warning: line 390: second Summary
when building src rpm:

Lustre-change: https://review.whamcloud.com/56506
Lustre-commit: 69e67fb582f846af45ac608b32be716e435d34e4

Was-Change-Id: I6aa591aae3ae4dc07a36740e12ef3520cea035ef

HPE-bug-id: LUS-12538

LU-15963 osd-zfs: use contiguous chunk to grow blocksize

otherwise a sparse OST_WRITE can grow blocksize way too large.

Lustre-change: https://review.whamcloud.com/47768
Lustre-commit: dacc4b6d384cbe6376a4cf106cc63ad1ac0cd23d

Was-Change-Id: I729775490f9a0c8262708931f321297af943f3c0

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
LU-6142 osd-zfs: Fix style issues for osd_io.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_io.c

Lustre-change: https://review.whamcloud.com/54264
Lustre-commit: 4c0e328c9417ff196dc8e69f75c187dd21809ce7

Was-Change-Id: Ia9153be34a1d583195e3ecfc56ca4ab279781566

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
LU-14692 tests: restore sanity/312 to always_except

The sanity test_312 was incorrectly removed from ALWAYS_EXCEPT.

Lustre-change: https://review.whamcloud.com/49720
Lustre-commit: 8767d2e44110fc19e624e963d5ebc788409339d3

Was-Change-Id: I6e8ed42561809b28fd6d5b4f7ee1104080ebe756

Fixes: eaae465556 ("LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Test-Parameters: fstype=zfs
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-1
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-2
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-3

Change-Id: Ic9df52fc7933cc9129f3b6cb630199c8c44d6d59
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-17662 osd-zfs: Support for ZFS 2.2.3 36/59136/2
Shaun Tancheff [Wed, 7 May 2025 19:28:35 +0000 (12:28 -0700)]
LU-17662 osd-zfs: Support for ZFS 2.2.3

ZFS commit zfs-2.2.99-269-g9b1677fb5
   dmu: Allow buffer fills to fail
Adds a boolean_t to dmu_buf_will_fill() and dmu_buf_fill_done()

Lustre always uses B_FALSE for this argument.

Also re-arrange and split some configure macros so that all
the zfs and ldiskfs tests can be run in the same parallel pass.

Lustre-change: https://review.whamcloud.com/54530
Lustre-commit: a13fc434c57fd72e5f8908a8a197fca1a0d373e5

Was-Change-Id: I71a4723bfa8ce62ae6f270e26ab149bf98278d3f

LU-18515 build: fix configure checks for ZFS 2.2.3

These config checks were missing certain headers. Thus,
the checks failed which caused the build to also fail when
trying to compile against ZFS 2.3.0+.

Lustre-change: https://review.whamcloud.com/57452
Lustre-commit: d33e96cdd472f76d56f0e543cb1b3afb07785e54

Was-Change-Id: I8e231f0c4581f435cb4209c767fc4727cb6cbfa0

Fixes: a13fc434c57f ("LU-17662 osd-zfs: Support for ZFS 2.2.3")
Signed-off-by: Timothy Day <timday@amazon.com>
LU-16664 build: Debian server fails building crypto.c

When building deb files against a server built without
CONFIG_FS_ENCRYPTION it still attempts to build crypto.c
when it should not.

Lustre-change: https://review.whamcloud.com/50406
Lustre-commit: 71746b6277a59ede2110420f5907fe2dafd9ac2a

Was-Change-Id: Id1e67daa7b021fdfee49be4eb0beb2b86ca62c39

Fixes: 068e5f13fb ("LU-13743 build: Explicitly require encryption support")

LU-13743 build: Explicitly require encryption support

Linux commit v5.18-rc5-17-gb1241c8eb977
  ext4: move ext4 crypto code to its own file crypto.c

Update the ldiskfs Makefile to exclude crypto.c when
CONFIG_FS_ENCRYPTION is not enabled.

Lustre-change: https://review.whamcloud.com/39243
Lustre-commit: 068e5f13fb94802ced68712ee11e7f9cb106d0ae

Was-Change-Id: Ic8a40f3d395286bb52ed20693fd7cc4755b10556

LU-13485 ldiskfs: Parallel configure tests for ldiskfs

Transform the compile tests in ldiskfs to run in parallel

Lustre-change: https://review.whamcloud.com/38351
Lustre-commit: 3774b6afbe3b67e869bb61c9cb212cc37e8705fa

Was-Change-Id: I3a097ab5cd18b57e9311980d9aa708ed25f58464

LU-6142 misc: update headers in config, debian, rpm

Update the file header to have the SPDX license and
use the standard format.

Fix minor style issues with comments in a few files.
Remove `dnl` from m4 files.

Files that are uncertain are left as NOASSERTION
for the license identifier. This makes no claim
about the file. It is used to track files so they
can be addressed later.

https://spdx.github.io/spdx-spec/v2-draft/package-information/#75-package-supplier-field

Lustre-change: https://review.whamcloud.com/52106
Lustre-commit: f89529a06ed2cf6e3a02df6093e047d4d7da15ea

Was-Change-Id: I212ce05a4292bbb0d71372d9d75880ce45a219f3

Signed-off-by: Timothy Day <timday@amazon.com>
LU-13530 build: Add kernel version to depmod

The depmod commands in the postrm and
postinst scripts should use the kernel
version the package is built against.
Otherwise, depmod will use the current
kernel version - which might be different.

This patch also adds a line indicating that
the file has been modified.

Lustre-change: https://review.whamcloud.com/49573
Lustre-commit: 98338572a671e1a9c227a8589f7a9b5972b924bf

Was-Change-Id: I355420a85ea0ed301433816588758197795b5ede

Signed-off-by: Timothy Day <timday@amazon.com>
LU-13906 build: Conditionally require kmod-zfs-devel

A server with zfs support requires either kmod-zfs-devel
or configure arguments that point to the required headers
and library files.

Here we check the configure arguments for '--with-zfs-obj='.
If the zfs path is specified for configure, the package
requirement is not needed.

Otherwise require the kmod-zfs-devel package and require
one of libzfs-devel, libzfs4-devel or libzfs5-devel

Lustre-change: https://review.whamcloud.com/46356
Lustre-commit: 77d01c485e00468b2ef6e3eb64544d30c049a411

HPE-bug-id: LUS-9743, LUS-10363
Was-Change-Id: Ia12239ac7e3912ff50ec7c8e2ceb888862afbc34

LU-17171 test: improve sanity-quota test_41

On the zfs backend, the df result for a project quota may show slightly
more blocks used than the quota result, because the former is calculated
in units of the filesystem block size, which is 4K.
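
The size of the discrepancy can be illustrated with a small sketch. It
assumes, for illustration only, that the df figure is rounded up to whole
blocks while the quota figure is an exact byte count; the function name is
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to a whole number of blocks of block_size bytes. */
uint64_t demo_round_up_to_block(uint64_t bytes, uint64_t block_size)
{
	assert(block_size != 0);
	return (bytes + block_size - 1) / block_size * block_size;
}
```

Under that assumption, 5000 bytes of usage would be reported as 8192 bytes
with a 4096-byte block, so a comparison between the two figures needs to
tolerate a difference of up to one block rather than expect equality.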

Update sanity-quota test_41 to make it more robust.

Lustre-change: https://review.whamcloud.com/52591
Lustre-commit: a504f2c869bada922c6211822be432b253921096

Was-Change-Id: Ide51d9aaeb8907eb77acc30fa4fc76dcc16e8de0

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
LU-16743 lod: create stripe with correct attr

lod_xattr_set_lmv() creates directory stripes with the master object attr,
but it shouldn't change attr->la_valid, otherwise bogus data may be
set on the stripe object.

Zfs osd_create() copies attr to the object directly, so clear la_flags if
LA_FLAGS is not set in la_valid.

Lustre-change: https://review.whamcloud.com/52052
Lustre-commit: 6be9476e790ceef71e874b2745a8280443d5c90b

Was-Change-Id: I8385f36bd2eee0e55cbe6bd031b0e013cda40e06

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Test-Parameters: trivial fstype=zfs
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-1
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-2
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-3

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I71a4723bfa8ce62ae6f270e26ab149bf98278d3f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59136
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-16053 build: Update zfs configure checks 32/58832/4
Shaun Tancheff [Wed, 7 May 2025 05:10:20 +0000 (22:10 -0700)]
LU-16053 build: Update zfs configure checks

From Brian Behlendorf <behlendorf1@llnl.gov>:

update dmu_*_by_dnode checks
 Provided as a feature since ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update zap_*_by_dnode checks
 Provided as a feature since ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update multihost protection check
 Provided as a feature since ZFS 0.7.0, convert to a fatal configure
 error when unavailable.  Drop the compatibility code required to
 support OpenZFS releases older than 0.7.0.

update userobj accounting check
 Provided as a feature since ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update dmu_prefetch() check
 Provided since at least ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update dmu_object_alloc_dnsize() check
 Provided since at least ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update spa_maxblocksize() check
 Provided since at least ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update dsl_pool_config_enter/exit check
 Convert to a fatal configure error, these functions have
 been provided since at least ZFS 0.7.x.

replace sa_spill_block() check
 The sa_spill_block() function was removed after the ZFS 0.6.x
 release.  Replace the check with one for zio_buf_alloc/free
 which have been available since 0.7.x.

 The dsl_sync_task_do_nowait() function has not been provided
 since the 0.6.x releases.  Furthermore, the results of this
 check are unused by Lustre so let's just remove it.

Lustre-change: https://review.whamcloud.com/48089
Lustre-commit: 46938c53461d136b71a32c4951b1776e2d226648

Test-Parameters: trivial fstype=zfs
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-1
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-2
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-3

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I3c1597e56100961178f9001e918ffb9aa3558706
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58832
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks agoLU-13132 osd: osd-zfs to cache dbufs for llog objects 23/59123/3
Alex Zhuravlev [Tue, 6 May 2025 23:11:47 +0000 (16:11 -0700)]
LU-13132 osd: osd-zfs to cache dbufs for llog objects

working set for llog objects is tiny and very predictable. osd-zfs
can cache a couple of dbufs (the first block storing the header and
the last block for new records).

for sanity/60a (llog test) it gives 5939307 hits and 5776 misses
while average osd_write() goes down from 1.09 usec to 0.27 usec,
total time for sanity/60a: before - 153s, after - 101s.
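
The figures above can be put into ratios with a few lines of Python.
Only the numbers quoted in this message are used; the variable names are
illustrative.

```python
hits, misses = 5_939_307, 5_776      # cache counters from sanity/60a

hit_rate = hits / (hits + misses)    # fraction of lookups served from cache
write_speedup = 1.09 / 0.27          # average osd_write() before / after, usec
runtime_saving = (153 - 101) / 153   # fraction of total test time saved

print(f"hit rate:       {hit_rate:.3%}")
print(f"write speedup:  {write_speedup:.2f}x")
print(f"runtime saving: {runtime_saving:.0%}")
```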

this approach can be used in a few other cases like last_rcvd.

Lustre-change: https://review.whamcloud.com/37222
Lustre-commit: 11a89c5ec16685fda91dd7c052b72012833c2f88

Was-Change-Id: Icc0126658894085d33ef79ae41ac6c1ed4140f4c

LU-16479 utils: Add option to manage degraded ZFS OST

Add new Lustre specific ZFS dataset user property to
control/manage degraded ZFS OSTs, also modify the existing
lustre/scripts/statechange-lustre.sh zedlet accordingly.
Extend the same to mkfs.lustre utility to add this property
by default when creating a new Lustre ZFS server.

Lustre-change: https://review.whamcloud.com/49660
Lustre-commit: a2de6af65d21bff0d9357c30e6eb4ba049ff2059

Was-Change-Id: I7032538f507c9ad20d5b109b54e3c3bab8138458

HPE-bug-id: LUS-11447
Signed-off-by: Akash B <akash-b@hpe.com>
LU-14918 osd: don't declare similar zfs writes twice

in some cases (like overstriping) the same operations can be
declared multiple times (new llog records) and this leads to
a huge number of credits and performance degradation. we can
avoid this by checking for duplicate declarations.
notice each declare operation results in an allocation in ZFS.
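
A minimal Python model of the idea, assuming a declaration can be keyed
by the object it touches and the kind of operation. The class, method and
object names are hypothetical and do not correspond to the osd-zfs code.

```python
class Transaction:
    """Toy transaction that records each distinct declaration once."""

    def __init__(self):
        self._seen = set()
        self.declare_calls = 0   # how many times a caller asked to declare
        self.recorded = 0        # how many declarations were actually kept

    def declare(self, obj_id, op):
        self.declare_calls += 1
        key = (obj_id, op)
        if key in self._seen:
            return False         # duplicate: no new allocation or credits
        self._seen.add(key)
        self.recorded += 1
        return True


tx = Transaction()
for _ in range(2000):
    # Hypothetical: every stripe appends a record to the same llog object.
    tx.declare("llog-object", "write")
print(tx.declare_calls, tx.recorded)
```

The set lookup is the cost paid per declaration in exchange for keeping
the recorded count independent of the stripe count.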

the example for an overstriped file (2000 stripes over 4 OSTs):

declare ops   before   after
create:         2001       2
unlink:        10001      10

creation of 1K-stripe files (over 4 OSTs) is 2.5% faster.
removal of 1K-stripe files is 44% faster.

single-stripe file creation/removal does not degrade.

Lustre-change: https://review.whamcloud.com/49701
Lustre-commit: c1936c9d294d53ff39741e1b07ffc74f51fcddb6

Was-Change-Id: I5d9e6d3a1574ccd7bf97fd3a67ab4fff0b6a352c

LU-14918 osd: don't declare similar ldiskfs writes twice

in some cases (like overstriping) the same operations can be
declared multiple times (new llog records) and this leads to
a huge number of credits and performance degradation. we can
avoid this by checking for duplicate declarations.
As every declaration would need an allocation, limit the scope
of these checks to transactions likely to be large.

% of "large" transactions in sanity-benchmark, depending on threshold:

  creates < 5 && writes < 5:
  0.58% (mds1) and 2.97% (mds2)

  creates < 7 && writes < 7:
  0.58% and 2.4%

  creates < 9 && writes < 9:
  0.6% and 1.85%

  creates < 10 && writes < 10:
  0.0004% and 0.000001%

thus 10 creates or writes is selected as a threshold to enable this
logic.
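
The threshold rule can be sketched as follows. This assumes a transaction
counts as "large" once either counter reaches the threshold, which is the
complement of the "creates < N && writes < N" conditions listed above.
The function and constant names are hypothetical.

```python
DEDUP_THRESHOLD = 10  # value selected in the commit message


def should_check_duplicates(creates, writes, threshold=DEDUP_THRESHOLD):
    """Enable duplicate-declaration checks only for large transactions."""
    return creates >= threshold or writes >= threshold


print(should_check_duplicates(creates=2, writes=3))     # small transaction
print(should_check_duplicates(creates=2000, writes=1))  # overstriped create
```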

Lustre-change: https://review.whamcloud.com/45765
Lustre-commit: 9e6225b2e7385cbb7be0474df01075fafc4966d5

Was-Change-Id: I7c893fe3b95646b4b813b999bc832659dfcf03ad

LU-15642 obdclass: use consistent stats units

Use consistent stats units, since some were "usec" and others "usecs".
Most stats already use LPROCFS_TYPE_* to encode the stats type, so
use this to provide units for those stats, and only explicitly provide
strings for the few stats that don't match the commonly-used units.
This also reduces the number of repeat static strings in the modules.
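
The lookup described above could be modeled like this. The type constants
and unit strings are hypothetical stand-ins, since the real LPROCFS_TYPE_*
values are not shown in this log.

```python
# Hypothetical stand-ins for LPROCFS_TYPE_* values.
TYPE_REQS, TYPE_BYTES, TYPE_USECS = 1, 2, 3

# One shared table replaces per-stat unit strings.
DEFAULT_UNITS = {
    TYPE_REQS: "reqs",
    TYPE_BYTES: "bytes",
    TYPE_USECS: "usecs",
}


def stat_units(stat_type, explicit_units=None):
    """Return the unit string for a stat.

    Most stats take their units from the type. Only the few stats that do
    not match the common units pass an explicit string.
    """
    if explicit_units is not None:
        return explicit_units
    return DEFAULT_UNITS[stat_type]


print(stat_units(TYPE_USECS))
print(stat_units(TYPE_REQS, explicit_units="pages"))
```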

Lustre-change: https://review.whamcloud.com/46833
Lustre-commit: b515c6ec2ab84598c77c65eb78f1afd5e67b1ede

Was-Change-Id: I25f31478f238072ddbf9a3918cd43bb08c3ebbe5

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
LU-16150 zfs: Fix ZFS(2.1.99-1) build error on CentOS (3.10)

ZFS: (2.1.99-1)
Lustre: 27723374a38 LU-16073 utils: double snapshot_mount fix
CentOS: 3.10.0-1160.15.2.el7.x86_64

This patch fixes the build failures seen below for the
above configuration:

First:
make[4]: Entering directory `/root/lustre01/lustre-release/lustre/utils'
gcc  -rdynamic -shared -export-dynamic -pthread \
-L/root/zfs/zfs_git_lustre_build/zfs//lib/libzfs/.libs/
-L/root/zfs/zfs_git_lustre_build/zfs//lib/libnvpair/.libs/
-L/root/zfs/zfs_git_lustre_build/zfs//lib/libzpool/.libs/ -o
mount_osd_zfs.so \
`ar -t libmount_utils_zfs.a` \
-ldl   -lzfs -lnvpair -lzpool
/usr/bin/ld: cannot find -lzfs
/usr/bin/ld: cannot find -lnvpair
/usr/bin/ld: cannot find -lzpool
collect2: error: ld returned 1 exit status

Lustre-change: https://review.whamcloud.com/48536
Lustre-commit: 448963c9a33dbf0e0988ceeb407027f2488e7f42

Was-Change-Id: I32f270c7912379f7dce940e0aa2bceee5e49ad79

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
LU-15611 osd-zfs: Cleanup while mount failed

Cleanup is needed in the error-out path in osd-zfs.

Lustre-change: https://review.whamcloud.com/46678
Lustre-commit: 9b973ad37f66a10eb7db1ced6865708497ecc02b

Was-Change-Id: I47d9ee9483acb8e1d60c77e8cfc481902a1535ac

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-1
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-2
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.5 serverdistro=el8.10 testgroup=full-dne-zfs-part-3

Change-Id: Icc0126658894085d33ef79ae41ac6c1ed4140f4c
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59123
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-18970 kernel: update RHEL 8.10 [4.18.0-553.51.1.el8_10] 21/59221/6
Jian Yu [Wed, 14 May 2025 06:24:34 +0000 (23:24 -0700)]
LU-18970 kernel: update RHEL 8.10 [4.18.0-553.51.1.el8_10]

Update RHEL 8.10 kernel to 4.18.0-553.51.1.el8_10.

The patch also provides a fallback series which uses
the former base/ext4-delayed-iput.patch for kernels
before 4.18.0-553.22.1.el8_10.

Lustre-change: https://review.whamcloud.com/59201
Lustre-commit: e1ccf4865dfcaa02c52a6a27440da36779c733a4

Build-Parameters: distro=el8.10 arch=x86_64
Build-Parameters: distro=el9.5 arch=x86_64
Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testlist=sanity
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testlist=sanity
Test-Parameters: optional fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-part-1
Test-Parameters: optional fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-part-2
Test-Parameters: optional fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-part-3
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-zfs-part-1
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-zfs-part-2
Test-Parameters: optional fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testgroup=full-dne-zfs-part-3
Change-Id: I210fcf4be1bf39a0cb6fc64dcdfa898bb98f87ca
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59221
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
3 weeks agoLU-18969 kernel: update RHEL 9.5 [5.14.0-503.40.1.el9_5] 18/59218/5
Jian Yu [Wed, 14 May 2025 00:38:15 +0000 (17:38 -0700)]
LU-18969 kernel: update RHEL 9.5 [5.14.0-503.40.1.el9_5]

Update RHEL 9.5 kernel to 5.14.0-503.40.1.el9_5 for Lustre client.

Lustre-change: https://review.whamcloud.com/59200
Lustre-commit: TBD (from ef3a6d0c681ef6ee6eeb5a902b0a5f22862c33f1)

Build-Parameters: distro=el8.10 arch=x86_64
Build-Parameters: distro=el9.5 arch=x86_64
Test-Parameters: trivial \
  mdtcount=4 mdscount=2 clientdistro=el9.5 testlist=sanity
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-3

Change-Id: I62b270ad85126e6022eaf04ddbd32898fb4dc320
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59218
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
4 weeks agoLU-18668 kernel: new kernel [RHEL 9.6 5.14.0-570.16.1.el9_6] 63/59063/6
Jian Yu [Fri, 9 May 2025 20:21:43 +0000 (13:21 -0700)]
LU-18668 kernel: new kernel [RHEL 9.6 5.14.0-570.16.1.el9_6]

This patch makes changes to support new RHEL 9.6 release
for Lustre client.

Test-Parameters: trivial \
  mdtcount=4 mdscount=2 clientdistro=el9.6 testlist=sanity
Test-Parameters: optional clientdistro=el9.6 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.6 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.6 testgroup=full-part-3

Lustre-change: https://review.whamcloud.com/57876
Lustre-commit: TBD (from fb56a56e6a9620801a938e28cd539b6fb0065bf2)

Change-Id: Idf8c96ee9389978d9497da73b05c5ed400c429d4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59063
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-18516 quota: use wait_woken for qsd_op_begin0() 41/59141/2
James Simmons [Sat, 22 Feb 2025 12:56:17 +0000 (07:56 -0500)]
LU-18516 quota: use wait_woken for qsd_op_begin0()

Kernels with debugging enabled report for Lustre quota handling:

do not call blocking ops when !TASK_RUNNING;
? __might_sleep+0x9d/0xc0
  down_read_nested+0x2e/0x4b0
  lquota_disk_read+0x46e/0x800 [lquota]
  qsd_refresh_usage+0x105/0x3d0 [lquota]
  qsd_acquire+0xbe/0x7c0 [lquota]
  qsd_op_begin0+0x5f8/0xc80 [lquota]

This is due to qsd_acquire() performing operations that can sleep while
the kthread is in an idle state. The Linux kernel solution for this
is wait_woken(). Move the function qsd_op_begin0() from using
wait_event_idle_timeout() to wait_woken(). This will resolve the
potential sleeping issues.

Lustre-change: https://review.whamcloud.com/58156
Lustre-commit: 2417aeddc649840179a2575a28fdedf3fe662916

Test-Parameters: trivial testlist=sanity-quota
Change-Id: Id2b7a5886869bf0ed3d560e159524dcda841d8b0
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59141
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-15808 ptlrpc: ptlrpc_set_wait() use wait_woken 08/59108/2
Shaun Tancheff [Fri, 13 Sep 2024 02:31:08 +0000 (09:31 +0700)]
LU-15808 ptlrpc: ptlrpc_set_wait() use wait_woken

ptlrpc_set_wait() uses a potentially long-running condition,
ptlrpc_check_set(), that can also block.

If it does block during ptl_send_rpc() it could potentially
trigger a warning:
   do not call blocking ops when !TASK_RUNNING

NeilBrown <neilb@suse.de> suggested using wait_woken() instead.

Convert ptlrpc_set_wait() to use wait_woken(),
similar to the wait_woken() method used in ptlrpcd.

Lustre-change: https://review.whamcloud.com/56317
Lustre-commit: 930ad25733d925021fbce468568acacde219d67c

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I544550db58fa2e89ce18a8a43a64fdea7ed57206
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59108
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
4 weeks agoLU-18103 llite: Apply the change of splice_read 70/59170/2
Yang Sheng [Fri, 9 May 2025 20:18:14 +0000 (13:18 -0700)]
LU-18103 llite: Apply the change of splice_read

Upstream kernel commit v6.4-rc2-3-g69df79a45111
("splice: Rename direct_splice_read() to copy_splice_read()")
changed the rule of splice_read().  For newer kernels we now
choose copy_splice_read() to adapt to it.

Lustre-change: https://review.whamcloud.com/56093
Lustre-commit: 3f24011b01a8a072751899bae84839655d899620

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ic13da773691ada3c21b9803f65ea3e2533f7885a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59170
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17243 build: compatibility updates for kernel 6.6 11/59111/3
Shaun Tancheff [Tue, 6 May 2025 05:28:55 +0000 (22:28 -0700)]
LU-17243 build: compatibility updates for kernel 6.6

linux kernel v5.19-rc1-4-gc4f135d64382
  workqueue: Wrap flush_workqueue() using a macro
linux kernel v6.5-rc1-7-g20bdedafd2f6
  workqueue: Warn attempt to flush system-wide workqueues.
If __flush_workqueue(system_wq) is not available fall back to
flush_scheduled_work()

linux kernel v6.5-rc1-92-g13bc24457850
  fs: rename i_ctime field to __i_ctime
Use accessors for ctime. Provide replacements for older
kernels.

linux kernel v6.5-rc1-95-g0d72b92883c6
  fs: pass the request_mask to generic_fillattr
Provide request_mask argument where needed.

Linux commit v6.5-rc2-20-g2ddd3cac1fa9
  nsproxy: Convert nsproxy.count to refcount_t
Provide a wrapper for inc/dec of nsproxy.count

linux kernel v6.5-rc4-110-gcf95e337cb63
  mm: delete mmap_write_trylock() and vma_try_start_write()
Use down_write_trylock directly mmap_write_trylock

In preparation for kernel 6.7 the remaining inode time
accessors will be preferred:

linux kernel v6.6-rc5-86-g12cd44023651
  fs: rename inode i_atime and i_mtime fields
Use accessors for atime and mtime. Provide replacements for
older kernels.

Lustre-change: https://review.whamcloud.com/52908
Lustre-commit: a0e6d6f7327598d13661bb14098a9f21f2035285

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ide6c2e3e8db532449850b145c2d61b972d21f649
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
4 weeks agoLU-16594 build: get_random_u32_below, get_acl with dentry 09/59109/5
Shaun Tancheff [Mon, 5 May 2025 20:20:37 +0000 (13:20 -0700)]
LU-16594 build: get_random_u32_below, get_acl with dentry

Linux commit v6.1-13825-g3c202d14a9d7
  prandom: remove prandom_u32_max()

Use get_random_u32_below() and provide a replacement
when get_random_u32_below is not available.
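
One way to picture such a replacement is the multiply-shift mapping that
the removed prandom_u32_max() helper used to bound a 32-bit random value.
The Python model below is only a sketch under that assumption. The
fallback Lustre actually provides is not shown in this log, and the
function and parameter names are hypothetical.

```python
import random


def random_u32_below(ceil, rnd32=None):
    """Map a 32-bit random value into the range [0, ceil).

    rnd32 can be supplied to make the result deterministic. Otherwise a
    fresh 32-bit value is drawn.
    """
    if not 0 < ceil <= 0xFFFFFFFF:
        raise ValueError("ceil must fit in an unsigned 32-bit value")
    if rnd32 is None:
        rnd32 = random.getrandbits(32)
    # Scale the 32-bit value by ceil and keep the upper 32 bits.
    return (rnd32 * ceil) >> 32


print(random_u32_below(10, rnd32=0))
print(random_u32_below(10, rnd32=0xFFFFFFFF))
```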

Linux commit v6.1-rc1-2-g138060ba92b3
  fs: pass dentry to set acl method
Linux commit v6.1-rc1-4-g7420332a6ff4
  fs: add new get acl method

get_acl() and set_acl() have new signatures

Lustre-change: https://review.whamcloud.com/50193
Lustre-commit: 3ef773db80fc346455c9939ad108f3f56990ee9c

Was-Change-Id: I1de02f86fd2719fc75de4f014f51d73736d83c33

LU-18127 autoconf: fix configure tests -Warray-bounds errors

Kernel commit v5.17-rc3-1-ge6148767825c (Makefile:
Enable -Warray-bounds) enabled the -Warray-bounds option,
which caused Lustre configure tests to hit false failures
as follows:

  build/conftest.c: In function 'main':
  build/conftest.c:228:39: error: array subscript 0 is
    outside array bounds of 'struct inode_operations[0]'
    [-Werror=array-bounds]
    228 | ((struct inode_operations *)1)->get_acl((struct inode *)NULL, 0, false);
        |                                       ^~
  cc1: all warnings being treated as errors

This patch fixes the "struct inode_operations" related
configure tests to avoid the above failures.

The -Warray-bounds option was disabled by the following
kernel commits:
- v5.19-rc1-28-gf0be87c42cbd (gcc-12: disable '-Warray-bounds' universally for now)
- v6.2-rc3-9-g5a41237ad1d4 (gcc: disable -Warray-bounds for gcc-11 too)
- v6.3-rc7-202-g0da6e5fd6c37 (gcc: disable '-Warray-bounds' for gcc-13 too)

Lustre-change: https://review.whamcloud.com/55989
Lustre-commit: a78894ee642d423a244fc0dcbc518c57a458a2db

Was-Change-Id: Iee73962ffc117a2f98e8f339462820aff2278815
Signed-off-by: Jian Yu <yujian@whamcloud.com>
LU-18388 llite: handle -EOPNOTSUPP in get_inode_acl

When ll_xattr_list returns -EOPNOTSUPP [-95], NULL should be
returned to avoid sanity failing to run as non-root users with:

 operation mds_getxattr to node 192.168.122.50@tcp failed:  rc = -95
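
The error handling described above can be modeled in a few lines. This is
a sketch, not the llite code: the callable passed in is a hypothetical
stand-in for the xattr lookup and returns an (rc, acl) pair with rc as a
negative errno on failure.

```python
EOPNOTSUPP = 95  # Linux errno value, matching "rc = -95" in the log above


def get_inode_acl(list_xattr):
    """Return the ACL, or None when the server does not support it."""
    rc, acl = list_xattr()
    if rc == -EOPNOTSUPP:
        return None                      # treat as "no ACL", not an error
    if rc < 0:
        raise OSError(-rc, "xattr lookup failed")
    return acl


print(get_inode_acl(lambda: (-EOPNOTSUPP, None)))
print(get_inode_acl(lambda: (0, "user::rw-")))
```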

Fixes: 13fd5ebef3 ("LU-18101 sec: fix ACL handling on recent kernels again")

Lustre-change: https://review.whamcloud.com/56756
Lustre-commit: 7e457159a097d69e76097c55c75546163a44befa

Was-Change-Id: I208e7e6095c19728643a6d208becd448ed2e2539

LU-18070 sec: clear ACL caches if ACL empty

When the lli_posix_acl field of struct ll_inode_info is updated,
check if new ACL is empty, and clear ACL caches for this inode
in this case.

Also fix sanity test_103a when it is run with multiple MDTs. The
test has several requirements regarding uids and gids, but in case
they are not met, missing ids are only configured on mds1. So make
sure the directory used for the test ($DIR/$tdir) is created on mds1.

Fixes: 13fd5ebef3 ("LU-18101 sec: fix ACL handling on recent kernels again")
Fixes: aa636f8ae6 ("LU-18095 sec: fix ACL handling on recent kernels")

Lustre-change: https://review.whamcloud.com/56600
Lustre-commit: 659bb1d704317346ebfde1899c8ee8b38c9c3f80

Was-Change-Id: I91109bf98bc65dfb1fcefb2551be84d9c73f8ee2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
LU-18101 sec: fix ACL handling on recent kernels again

On recent distributions like Ubuntu 24.04, the kernel offers the
.get_inode_acl op on struct inode_operations. This must be defined
and fetch ACLs, otherwise they can end up being incorrect on inodes.

Lustre-change: https://review.whamcloud.com/56552
Lustre-commit: 13fd5ebef3a7a1ae3574458674e16ca782b181e7

Was-Change-Id: Idcc642a11f6f6198217e5eadb2a2c32e8117b8b7

Fixes: aa636f8ae6 ("LU-18095 sec: fix ACL handling on recent kernels")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
LU-18095 sec: fix ACL handling on recent kernels

On recent distributions like Ubuntu 24.04, the kernel imposes that
ACLs are fetched via the dedicated .get_acl operation (or
.get_inode_acl) instead of doing this via the xattr handlers.
So in ll_get_acl() explicitly fetch the xattr containing ACLs,
XATTR_NAME_ACL_ACCESS or XATTR_NAME_ACL_DEFAULT. This is going to
populate the xattr cache, hence avoiding multiple requests to the MDS.

Also fix sanity-sec test_23b to make sure variable comparisons are
correct. And fix test cleanup to avoid leftovers.

Lustre-change: https://review.whamcloud.com/56254
Lustre-commit: aa636f8ae6883cef018b859bba70140df9a826c4

Was-Change-Id: I467d5a558eaa524e823527a8798478934f65abf9
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
LU-18329 tests: check version for parallel-scale-nfs/test_1

Check the lustre client version of nfs server (MDS1 by default) in
parallel-scale-nfs/test_1 for interop test.

Lustre-change: https://review.whamcloud.com/56674
Lustre-commit: ebfdfd587bc5804acf48d444b182fa9b7ef4d3c9

Was-Change-Id: I76ecb3bc28f37ba7d0c24d18eead621d6b066800

Test-Parameters: testlist=parallel-scale-nfsv3 env=ONLY=1
Test-Parameters: serverversion=2.15.5 testlist=parallel-scale-nfsv3 env=ONLY=1

Fixes: 69dcd1b940 ("LU-18030 tests: Add a test to ensure permissions copy on nfs")
Signed-off-by: Feng Lei <flei@whamcloud.com>
LU-18030 tests: Add a test to ensure permissions copy on nfs

Also ensure no empty POSIX ACL is created.

Lustre-change: https://review.whamcloud.com/56176
Lustre-commit: 69dcd1b940ce43af17489b0268d16c5186cf0325

Was-Change-Id: I85d37f8eebd17d6acdb67c552fec2caa79dbd39c

Test-Parameters: testlist=parallel-scale-nfsv3

LU-18030 llite: make ll_set_acl send xattr to server unmodified

Otherwise posix_acl_update_mode might convert the acl into an inode
mode change; as a result the acl is lost and the existing ACL xattr
is removed (if any).

Lustre-change: https://review.whamcloud.com/55715
Lustre-commit: 75f55f99a328923ec9fb5bb4ca3418aa11bbe336

Was-Change-Id: I3956beac04889a657f76f9b36dbe97518ae9f2ac

LU-14707 tests: Bashify more scripts for Ubuntu et. al.

Some scripts that are not POSIX sh are being
invoked using sh. The scripts should be called
using the shell listed in the shebang.

Lustre-change: https://review.whamcloud.com/49296
Lustre-commit: 2faee9972cdcde6652db1a857b548b3f7396944b

Was-Change-Id: I7233ce56df95a5b8698b39872e6118a4fa1a029a

Signed-off-by: Timothy Day <timday@amazon.com>
HPE-bug-id: LUS-11556
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I1de02f86fd2719fc75de4f014f51d73736d83c33
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59109
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17081 build: Prefer folio_batch to pagevec 99/59099/2
Shaun Tancheff [Mon, 5 May 2025 18:22:00 +0000 (11:22 -0700)]
LU-17081 build: Prefer folio_batch to pagevec

Linux commit v5.16-rc4-36-g10331795fb79
  pagevec: Add folio_batch

Linux commit v6.2-rc4-254-g811561288397
  mm: pagevec: add folio_batch_reinit()

Linux commit v6.4-rc4-438-g1e0877d58b1e
  mm: remove struct pagevec

Use folio_batch and provide wrappers for older kernels to use
pagevec handling, conditionally provide a folio_batch_reinit

Add macros to ease adding pages to folio_batch(es) as well
as unwinding batches of struct folio where struct page is
needed.

Lustre-change: https://review.whamcloud.com/52259
Lustre-commit: b82eab822c078b584fadefd419bfa74df0edebcb

Was-Change-Id: Ie70e4851df00a73f194aaa6631678b54b5d128a1

LU-17904 build: fix typo in vvp_set_batch_dirty

Fix typo in vvp_set_batch_dirty() when kallsyms_lookup_name()
is exported and account_page_dirtied is not.

HPE-bug-id: LUS-12374
Fixes: b82eab822c0 ("LU-17081 build: Prefer folio_batch to pagevec")

Lustre-change: https://review.whamcloud.com/55301
Lustre-commit: a89458b3b2a08f78c4795816ca34716b110b8aac

Was-Change-Id: I8b2e6884e74e384aba6e563bef30072175cc0efc

LU-17903 build: enable fast path of vvp_set_batch_dirty

SUSE 15 SP6 6.4 kernel retains kallsyms_lookup_name so
the fast path of vvp_set_batch_dirty() can be enabled.

However, the combination of kallsyms_lookup_name without
lock_page_memcg breaks some old assumptions.

Prefer folio_memcg_lock to lock_page_memcg. However, since

Linux commit v5.15-12272-g913ffbdd9985
  mm: unexport folio_memcg_{,un}lock

folio_memcg_lock is also not exported, so use
kallsyms_lookup_name to acquire the symbol.

HPE-bug-id: LUS-12371
Fixes: 61e83a6f130 ("LU-16113 build: Fix configure tests for lock_page_memcg")

Lustre-change: https://review.whamcloud.com/55300
Lustre-commit: ac6dba062928c3eba5f2ddd372a6225436b4e96a

Was-Change-Id: I8ac6b7bde8ee8964db5a801c2f3c4dfb2ef459f9

HPE-bug-id: LUS-11811
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie70e4851df00a73f194aaa6631678b54b5d128a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59099
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17081 build: compatibility for 6.5 kernels 97/59097/2
Shaun Tancheff [Mon, 5 May 2025 17:03:42 +0000 (10:03 -0700)]
LU-17081 build: compatibility for 6.5 kernels

Linux commit v6.4-rc2-29-gc6585011bc1d
  splice: Remove generic_file_splice_read()

Prefer filemap_splice_read and provide alternates for older kernels.

Linux commit v6.4-rc2-30-g3fc40265ae2b
  iov_iter: Kill ITER_PIPE

ITER_PIPE and iov_iter_is_pipe() are removed, provide a replacement
for iov_iter_is_pipe

Linux commit v6.4-rc4-53-g54d020692b34
  mm/gup: remove unused vmas parameter from get_user_pages()

Use vma_lookup() to acquire the vma following get_user_pages()

Linux commit v6.4-rc7-1884-gdc97391e6610
  sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
Use sendmsg when MSG_SPLICE_PAGES is defined. Provide a wrapper
using sendpage() for older kernels.

Lustre-change: https://review.whamcloud.com/52258
Lustre-commit: 2bb54b6383d57ac61092593b9e6d9c80801263f5

Was-Change-Id: I95a0954a602c8db08d30b38a50dcd50107c8f268

LU-17193 build: fix gcc-12 compiler warnings

Building on el9.2 hit a couple of new errors in configure, ex:
  ((struct inode_operations *)1)->fileattr_get()
hits:
  error: array subscript 0 is outside array bounds
         of ‘struct inode_operations[0]’

A few instances of QCTL_COPY() should be QCTL_COPY_NO_PNAME()
as the zero-length array to hold the pool name is not
allocated in these cases.

Lustre-change: https://review.whamcloud.com/52687
Lustre-commit: 1b0de05f81372eeda9a2a38142553ead7e88a431

Was-Change-Id: I72bda8b46c51dbd42fb42bf569ba29572526acfe

LU-13485 libcfs: Remove unused iter_type check

The iter_type member check is not used, remove it.

Lustre-change: https://review.whamcloud.com/48091
Lustre-commit: c755373c567090c49589e5aa0d3134847d4b952e

Was-Change-Id: I48d536a27738e73314feb88317d41d8479c72528

HPE-bug-id: LUS-11811
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I95a0954a602c8db08d30b38a50dcd50107c8f268
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59097
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-16667 build: struct mnt_idmap, linux/filelock.h 83/59083/4
Shaun Tancheff [Sun, 4 May 2025 23:41:48 +0000 (16:41 -0700)]
LU-16667 build: struct mnt_idmap, linux/filelock.h

Linux commit v6.2-rc3-9-g5970e15dbcfe
  filelock: move file locking definitions to separate
            header file

Add configure test for linux/filelock.h and include it
where needed.

linux kernel v6.2-rc1-4-gb74d24f7a74f
  fs: port ->getattr() to pass mnt_idmap
linux kernel v6.2-rc1-3-gc1632a0f1120
  fs: port ->setattr() to pass mnt_idmap

Add a configure test for mnt_idmap and fallback to using
user_namespace for older kernels.

Lustre-change: https://review.whamcloud.com/50420
Lustre-commit: 3011aa564a8c682aafbc6071b9866e266d8a6232

Was-Change-Id: Ib8cbb3157fb11b4f1fc55f1626c2998cb202bd8c

LU-16667 build: kernel_cap_t contains u64

linux kernel v6.2-13111-gf122a08b197d
  capability: just use a 'u64' instead of a 'u32[2]' array

Add a configure test for kernel_cap_t as u64 and provide
an accessor for the least significant 32 bits.

As of linux commit v3.6-10973-g607ca46e97a1 lustre implicitly
started to ignore some capabilities, see:
   include/uapi/linux/capability.h

The last capability flag was added by:
   linux commit v5.8-rc5-1-g124ea650d307

The capabilities that Lustre currently ignores are:
 - CAP_MAC_OVERRIDE
 - CAP_MAC_ADMIN
 - CAP_SYSLOG
 - CAP_WAKE_ALARM
 - CAP_BLOCK_SUSPEND
 - CAP_AUDIT_READ
 - CAP_PERFMON
 - CAP_BPF
 - CAP_CHECKPOINT_RESTORE

None of these appear to be important to Lustre operations,
and it should be fine to continue to ignore them.
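
The effect of keeping only the least significant 32 bits can be shown
with a short Python model. The capability numbers are the ones defined in
include/uapi/linux/capability.h; the function name is hypothetical and
stands in for the accessor mentioned above.

```python
# Capability numbers from include/uapi/linux/capability.h.
CAP_SYS_ADMIN = 21
IGNORED_CAPS = {
    "CAP_MAC_OVERRIDE": 32,
    "CAP_MAC_ADMIN": 33,
    "CAP_SYSLOG": 34,
    "CAP_WAKE_ALARM": 35,
    "CAP_BLOCK_SUSPEND": 36,
    "CAP_AUDIT_READ": 37,
    "CAP_PERFMON": 38,
    "CAP_BPF": 39,
    "CAP_CHECKPOINT_RESTORE": 40,
}


def cap_low32(cap_val):
    """Return the least significant 32 bits of a 64-bit capability mask."""
    return cap_val & 0xFFFFFFFF


# A mask with CAP_SYS_ADMIN plus every capability numbered 32 or higher.
mask = 1 << CAP_SYS_ADMIN
for bit in IGNORED_CAPS.values():
    mask |= 1 << bit

print(hex(mask), hex(cap_low32(mask)))
```

Every capability in the list has a number of 32 or above, which is why
truncating to the low word drops exactly those bits and nothing else.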

Lustre-change: https://review.whamcloud.com/50421
Lustre-commit: ea9532fb731bbfe041010e2224219479c2c0d71b

Was-Change-Id: I48ad7b1a34fff378c260dc73ea91b22aaa0d7469

HPE-bug-id: LUS-11557
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib8cbb3157fb11b4f1fc55f1626c2998cb202bd8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59083
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-10994 osc: remove oap_cli 78/59078/4
John L. Hammond [Sun, 4 May 2025 07:49:11 +0000 (00:49 -0700)]
LU-10994 osc: remove oap_cli

Remove the redundant oap_cli member from struct osc_async_page.

...:(cl_page.c:216:__cl_page_alloc()) slab-alloced 'cl_page': 256 at 000000009ab84b37.

Lustre-change: https://review.whamcloud.com/47403
Lustre-commit: 9b4fb6e65414fa75a23970863e8b0cc34d621db7

Was-Change-Id: Idd088f0906a10773568495933592ac5e755dc047

LU-10994 clio: remove cpl_obj

Remove cpl_obj from struct cl_page_slice. This member is only used in
the osc layer and struct osc_page already contains a pointer to the
osc_object.

Lustre-change: https://review.whamcloud.com/47402
Lustre-commit: 10da530a2411f28988ab5287e73ccbcfb6100605

Was-Change-Id: I6451aa50ff0e8db67f1c6f4f7edbde4fa8d36c5b

LU-10994 clio: remove unused convenience functions

Remove the unused convenience functions cl_page_top(), cl_page_at(),
cl_page_at_trusted(), and cl2vm_page().

Lustre-change: https://review.whamcloud.com/47401
Lustre-commit: 27e19a5420ae3ec22d8f3fe7f0e7794676479540

Was-Change-Id: I9c994d8f4c81bc93383a9eb46def514685a27690

LU-10994 clio: remove struct vvp_page

Remove struct vvp_page and use struct cl_page_slice in its place. Use
cp_vmpage in place of vpg_page and cl_page_index() in place of
vvp_index().

Lustre-change: https://review.whamcloud.com/47400
Lustre-commit: 127570b3e5a2ff018323724d2c060ccda1fc5e3d

Was-Change-Id: I2cd408f08e6ff9f7686b591c02ea95e31ad2b2ae

LU-10994 clio: remove cpo_prep and cpo_make_ready

Remove the cpo_prep and cpo_make_ready methods from struct
cl_page_operations. These methods were only implemented by the vvp
layer and so they can be easily inlined into cl_page_prep() and
cl_page_make_ready().

Lustre-change: https://review.whamcloud.com/47399
Lustre-commit: ca161cfbcbf04bffb33a678d3a744a9276b4b78a

Was-Change-Id: I177fd8d3c3832bcc8f06ed98cdf9d30f18d49e88

LU-10994 clio: remove vvp_page_print()

Remove vvp_page_print() by placing equivalent code in cl_page_print().

Lustre-change: https://review.whamcloud.com/47398
Lustre-commit: bf1d1b0e41ff245f59280a0783e80dc81278e730

Was-Change-Id: I815c4f63dc6fe57eec0987f209a2f34d3ff58146

LU-10994 clio: Remove cl_2queue_add wrapper

Remove the wrapper function cl_2queue_add() and replace all its calls in
different files with the function it wrapped. Also, comments are added
wherever necessary to make the workings of the function clear. The
prototype of the function is also removed from the header file as it is
no longer needed.

Linux-commit: 53f1a12768a55e53b2c40e00a8804b1edfa739b3

Lustre-change: https://review.whamcloud.com/47651
Lustre-commit: 5038bf8db83d4cb409b7563f028f48cca0385c19

Was-Change-Id: Ic746c45e3dda9fdf3f1d2f8c6204d80fec5c058f

LU-10994 clio: remove cpo_assume, cpo_unassume, cpo_fini

Remove the cl_page methods cpo_assume, cpo_unassume, and
cpo_fini. These methods were only implemented by the vvp layer and so
they can be easily inlined into cl_page_assume() and
cl_page_unassume(). Remove vvp_page_delete() by inlining its contents
to cl_page_delete0().

Lustre-change: https://review.whamcloud.com/47373
Lustre-commit: 9045894fe0f5033334a39a35a6332dab4498e21e

Was-Change-Id: I260c5593983bac6742cf7577c26a4903e95ceb7c

LU-10994 clio: remove cpo_own and cpo_disown

Remove the cpo_own and cpo_disown methods from struct
cl_page_operations. These methods were only implemented by the vvp
layer so they can be inlined into cl_page_own0() and
cl_page_disown(). Move most of vvp_page_discard() and all of
vvp_transient_page_discard() into cl_page_discard().

Lustre-change: https://review.whamcloud.com/47372
Lustre-commit: 81c6dc423ce4c62a64d328e49697d26194177f9f

Was-Change-Id: I3f156d6ca3e4ea11c050b2addda38e84a84634b9

LU-10994 clio: remove cl_page_export() and cl_page_is_vmlocked()

Remove cl_page_export() and cl_page_is_vmlocked(), replacing them with
direct calls to SetPageUptodate() and PageLocked().

Lustre-change: https://review.whamcloud.com/47241
Lustre-commit: 3d52a7c5753e80e78c3b6f6bb7a0b66b37f4849b

Was-Change-Id: I883d1664f4afc7a1d4006f9f4833db8125c0e8f5

LU-10994 echo: remove client operations from echo objects

Remove the client (io, page, lock) operations from echo_client
objects. This will facilitate the simplification of CLIO.

Lustre-change: https://review.whamcloud.com/47240
Lustre-commit: 6060ee55b194e37e87031c40e9d48f967eabe314

Was-Change-Id: If9e55c7d54c171aa2e1bcf272641c2bd6be8ad48

LU-10994 lov: remove lov_page

Remove the lov page layer since it does nothing but cost 24 bytes per
page plus pointer chases.

Lustre-change: https://review.whamcloud.com/47221
Lustre-commit: 56f520b1a4c9ae64caa235e9ce7699e7fb627f0c

Was-Change-Id: Icd7b4b0041e0fe414a3a4143179f45845177960e

LU-6142 clio: make cp_ref in cl_page a refcount_t

As this is used as a refcount, it should be declared
as one.

Lustre-change: https://review.whamcloud.com/49072
Lustre-commit: e19804a3b7e793a11b1c8b5e0db9f6315f243b8c

Was-Change-Id: I8108e14e545bb56aae34a0f6ae9d5a04227fc067

LU-6142 obdclass: make ccc_users in cl_client_cache a refcount_t

As this is used as a refcount, it should be declared
as one.

Lustre-change: https://review.whamcloud.com/48881
Lustre-commit: ed0f74cd635fa110c1896981d58c389826041808

Was-Change-Id: I5af513ccb2b706a398e647ce0427affa4516a9b5

LU-6142 obdclass: change some foo0() to __foo()

Change:
  cl_io_init0 -> __cl_io_init
  cl_lock_trace0 -> __cl_lock_trace
  cl_page_delete0 -> __cl_page_delete
  cl_page_state_set0 -> __cl_page_state_set
  cl_page_own0 -> __cl_page_own
  cl_page_disown0 -> __cl_page_disown

This is more consistent with Linux naming style.

Lustre-change: https://review.whamcloud.com/48803
Lustre-commit: 8828aa8e75f28808da42f521662a245c6b8a9896

Was-Change-Id: If38b52465d42ac425d47c1e9ded62bd7f013e0eb

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Idd088f0906a10773568495933592ac5e755dc047
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-15317 osc: Add RPC to iotrace 75/59075/2
Patrick Farrell [Sat, 3 May 2025 01:38:53 +0000 (18:38 -0700)]
LU-15317 osc: Add RPC to iotrace

Add RPCs to iotrace debugging.

To avoid creating too much debug output, this message
ignores the possibility that an RPC contains non-contiguous
extents.  Thus the eventual visualization will act as
though the RPC is a continuous whole.  I judge this to be
superior to the amount of log data and complexity of
capturing each extent separately.  If that level of detail
is needed, a higher debug level can be used.
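
The simplification described above can be pictured with a minimal Python sketch. It is not the osc code; rpc_span() and the (start, end) tuple representation are assumptions made for illustration only.

```python
def rpc_span(extents):
    """Collapse (start, end) extents into one covering (start, end) range.

    Gaps between non-contiguous extents are deliberately ignored, so the
    result describes the RPC as though it were one continuous region.
    """
    if not extents:
        raise ValueError("an RPC must carry at least one extent")
    start = min(s for s, _ in extents)
    end = max(e for _, e in extents)
    return start, end


# Three extents with a hole between the second and third.
extents = [(0, 4095), (4096, 6143), (8192, 12287)]
print(rpc_span(extents))
```

A single range per RPC keeps each log entry a fixed size, at the cost of hiding any holes inside the RPC.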

Lustre-change: https://review.whamcloud.com/45894
Lustre-commit: 5cb722c384077dd2469763a4f70a72bed555c8db

Was-Change-Id: I6fe416ba44be5572f130704ba9d3e9b85d09c656

LU-15317 llite: Add COMPLETED iotrace messages

It's very useful to see how long an I/O call took.  There
are other ways to do this, but the goal is for iotrace to
provide all necessary information for basic I/O performance
analysis, so we add COMPLETED messages to iotrace.

Lustre-change: https://review.whamcloud.com/46484
Lustre-commit: d48b10cef36d74cc63cf6e9340f43a5cebd985de

Was-Change-Id: I17f52ebc87a31d5ba34f63dc8b6a279e83cd10ef

LU-15317 llite: Add FID to async ra iotrace

IOtrace log entries need to include the FID of the file
concerned.  Add this to async readahead.

Lustre-change: https://review.whamcloud.com/45912
Lustre-commit: 1f3ecfdbb4c765808a1d30677e0f67421fab6e0c

Was-Change-Id: I8d788969f29412ce88f1cafa229977f6efa20962

LU-15317 llite: Add strided readahead to iotrace

We need to capture some additional parameters to correctly
understand the behavior of strided readahead.  Add these
parameters to the existing iotrace message.

Lustre-change: https://review.whamcloud.com/45888
Lustre-commit: 5ed185955985b099b3bd7311b346f5945c0940a4

Was-Change-Id: I7caddf9dfaf9ba5f2645d045d5a4a50562cc1b54

LU-15317 llite: Make iotrace logging quieter

Most of the time, we don't read any pages with readahead,
since we're moving through the window and aren't ready to
read more yet.  That's important for readahead debug, but
there's no need to log it for iotrace.  (This matters
because without this change, this message is the large
majority of iotrace messages.)

Lustre-change: https://review.whamcloud.com/45887
Lustre-commit: a91b5d4a990c6a870774e1e856cc41f665a88854

Was-Change-Id: I58197acd1ef97c903320a2433ec1d5dcb0e46bd0

Test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I6fe416ba44be5572f130704ba9d3e9b85d09c656
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/59075
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17149 tbf: nrs_tbf_id_cli_set should not modify the fmt 74/52974/3
Etienne AUJAMES [Wed, 27 Sep 2023 11:51:36 +0000 (13:51 +0200)]
LU-17149 tbf: nrs_tbf_id_cli_set should not modify the fmt

nrs_tbf_id_cli_set() needs to parse LDLM_ENQUEUE request to find
uid/gid.

It calls req_capsule_extend(), which changes the request format (rc_fmt).
If rc_fmt was not null, the function will not restore the initial
request format.

The following crash will occur the 2nd time that nrs_tbf_id_cli_set()
is called (1st: o_cli_find(), 2nd: o_cli_init()):

LustreError: 8949:0:(req_capsule_extend())
ASSERTION( fmt->rf_fields[i].nr >= old->rf_fields[i].nr ) failed:
Call Trace TBD:
[<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[<0>] req_capsule_extend+0x174/0x1b0 [ptlrpc]
[<0>] nrs_tbf_id_cli_set+0x1ee/0x2a0 [ptlrpc]
[<0>] nrs_tbf_generic_cli_init+0x50/0x180 [ptlrpc]
[<0>] nrs_tbf_res_get+0x1fe/0x430 [ptlrpc]
[<0>] nrs_resource_get+0x6c/0xe0 [ptlrpc]
[<0>] nrs_resource_get_safe+0x87/0xe0 [ptlrpc]
[<0>] ptlrpc_nrs_req_initialize+0x58/0xb0 [ptlrpc]
[<0>] ptlrpc_server_request_add+0x248/0xa20 [ptlrpc]
[<0>] ptlrpc_server_handle_req_in+0x36a/0x8c0 [ptlrpc]
[<0>] ptlrpc_main+0xb97/0x1530 [ptlrpc]
[<0>] kthread+0x134/0x150
[<0>] ret_from_fork+0x1f/0x40

Lustre-change: https://review.whamcloud.com/52528
Lustre-commit: 855f3d03c21752c8d7136a8a9e48223ee3302512

Test-Parameters: testlist=sanityn env=ONLY=77
Test-Parameters: testlist=sanityn env=ONLY=77
Test-Parameters: testlist=sanityn env=ONLY=77
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Signed-off-by: Lei Feng <flei@ddn.com>
Change-Id: Ia762936262e8cde891ae2a9daf4ce691c6a6f616
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52974
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-18427 script: allow llog removal scripts on ZFS 13/58013/2
Mikhail Pershin [Wed, 6 Nov 2024 18:26:57 +0000 (21:26 +0300)]
LU-18427 script: allow llog removal scripts on ZFS

Make both scripts work for ZFS mounts as well.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/56906
Lustre-commit: 783d6a4677bad0cd85fe24510310a3844dbe13bd

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I713548caa2f11af334c7bd10c07ecc81c387f5e1
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58013
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-16232 script: fix the argument parse 12/58012/2
Yang Sheng [Sat, 6 May 2023 07:16:17 +0000 (15:16 +0800)]
LU-16232 script: fix the argument parse

The issue makes the script skip other arguments if
the special parameter is not the last one.

Lustre-change: https://review.whamcloud.com/50876
Lustre-commit: 99144a595b767ef79acec058c838759bea73c579

Test-Parameters: trivial

Fixes: b533700add (LU-16232 scripts: changelog/updatelog emergency cleanup)
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ia309e7b6f1a62e76b80851848601c3d0b03be8b2
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58012
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-16232 scripts: changelog/updatelog emergency cleanup 11/58011/2
Mikhail Pershin [Wed, 12 Oct 2022 09:22:14 +0000 (12:22 +0300)]
LU-16232 scripts: changelog/updatelog emergency cleanup

Emergency cleanup scripts for situations when llogs are
corrupted and can't be cleaned up in a normal way. In such
cases the recommendation is to remove/truncate those llogs.

The scripts perform all needed steps and have a debugging option to
collect llogs for further analysis.

Possible script actions are:
 - dry-run mode to check all actions and files affected
 - create archive with all llogs for analysis
 - remove llogs including all plain llogs

Lustre-change: https://review.whamcloud.com/48838
Lustre-commit: b533700add91fe4220f50d057a470e0b6f4893c9

Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3b197179bc54f451e3c5d7db36b6f1c56c076856
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58011
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-18738 utils: avoid statx() of root of mounted FS 21/58321/2
Olaf P. Faaland [Sat, 22 Feb 2025 05:29:55 +0000 (21:29 -0800)]
LU-18738 utils: avoid statx() of root of mounted FS

When looking for a specific mounted lustre file system by path, avoid
the stat() or statx() call on lustre file systems whose mountpoints do
not match the given path.

This avoids hangs if the client is disconnected from MDT0 of other
mounted file systems, but the desired file system is reachable.
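
The idea can be sketched in Python as follows. This is a model under stated assumptions, not the utility code: find_lustre_mount(), path_is_under(), and the (mountpoint, fstype) tuple list are hypothetical, and all paths are assumed to be absolute.

```python
import os


def path_is_under(mountpoint, path):
    """True if path equals mountpoint or lies below it (string check only)."""
    mountpoint = os.path.normpath(mountpoint)
    path = os.path.normpath(path)
    return os.path.commonpath([mountpoint, path]) == mountpoint


def find_lustre_mount(mounts, path, stat_fn=os.stat):
    """Return the lustre mountpoint containing path, or None.

    Only the matching mountpoint is passed to stat_fn, so unrelated
    (possibly unreachable) lustre mounts are never touched.
    """
    candidates = [mnt for mnt, fstype in mounts
                  if fstype == "lustre" and path_is_under(mnt, path)]
    if not candidates:
        return None
    best = max(candidates, key=len)  # prefer the deepest matching mount
    stat_fn(best)
    return best


# Record which mountpoints would be stat()ed instead of touching the FS.
statted = []
mounts = [("/mnt/fs1", "lustre"), ("/mnt/fs2", "lustre"), ("/", "ext4")]
print(find_lustre_mount(mounts, "/mnt/fs2/dir/file", stat_fn=statted.append))
print(statted)
```

Because the filtering is a pure string comparison, a hung /mnt/fs1 in this example cannot block a lookup that targets /mnt/fs2.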

Lustre-change: https://review.whamcloud.com/58135
Lustre-commit: 2da8542e7069af71566a5d36d53fdc840a63228a

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: I1c67214f107ae2afe34d050470155807063bda51
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58321
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-18072 ptlrpc: do not search for duplicate cancel requests 68/58968/2
Oleg Drokin [Thu, 31 Oct 2024 19:29:46 +0000 (15:29 -0400)]
LU-18072 ptlrpc: do not search for duplicate cancel requests

Cancel requests don't have any max inflight limitations, so they
can arrive in huge numbers. If they also have the resent flag set,
that leads to a lot of very expensive and totally unneeded duplicate
searches, so let's skip the check for cancels.

Lustre-change: https://review.whamcloud.com/56843
Lustre-commit: 41bb553efcf58fcfb6bdd427a41c655c191480e0

Change-Id: Id4be03a3c9406867adcdcfd31ed91ecc7b12f700
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58968
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-18229 ldlm: BL_AST|CANCELLING lock still can be batched 70/58770/3
Vitaly Fertman [Tue, 17 Sep 2024 18:07:18 +0000 (21:07 +0300)]
LU-18229 ldlm: BL_AST|CANCELLING lock still can be batched

The current code cancels BL_AST locks that are also CANCELLING
individually (one lh per RPC), because they are already in
the l_bl_list. This can still be optimised.

A small cleanup in mdc_rename(): ldlm_cli_cancel_list() is already
called by mdc_prep_elc_req()->ldlm_prep_elc_req(), aligned with other
mdc_prep_elc_req() calls.

Lustre-change: https://review.whamcloud.com/56389
Lustre-commit: 1c9115f919109936576193a4d20baeccedb8be41

HPE-bug-id: LUS-12470
Fixes: b65374d9 ("LU-16285 ldlm: send the cancel RPC asap")
Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I218d38bc56a885845c48a3c982840b35b132f213
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58770
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-18111 ptlrpc: don't drop expired cancel request 69/58769/3
Andriy Skulysh [Mon, 24 Apr 2023 10:54:05 +0000 (13:54 +0300)]
LU-18111 ptlrpc: don't drop expired cancel request

There is no need for the server to drop an expired cancel request,
because the client will resend the same content.
Even if the server is heavily loaded, processing the cancel request
helps to release ldlm resources and avoids spending time
on processing the same resends.

Add an extra check to prevent the same cookie for another client.

Lustre-change: https://review.whamcloud.com/55946
Lustre-commit: 3c4387cb61e8a4056ce56ae37ab538e86265fac7

Change-Id: Ib6e22de72262065c453a390e5563f6ac4212c5a6
HPE-bug-id: LUS-11479, LUS-11595
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58769
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17181 misc: don't block reclaim threads 68/58768/2
Alexey Lyashkov [Wed, 11 Oct 2023 09:04:32 +0000 (12:04 +0300)]
LU-17181 misc: don't block reclaim threads

Memory reclaim threads may be blocked by the Lustre reclaim
process, but Lustre doesn't get any benefit from parallel
reclaim.

Lustre-change: https://review.whamcloud.com/52627
Lustre-commit: 2c97684db9d9286a2916420138529b4fbd0e4bbe

Test-Parameters: trivial
HPE-bug-id: LUS-11872
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I624edbb8833975864706ec51537d2954f5a9cea4
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58768
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16285 ldlm: BL_AST lock cancel still can be batched 67/58767/2
Vitaly Fertman [Tue, 28 Feb 2023 01:45:15 +0000 (04:45 +0300)]
LU-16285 ldlm: BL_AST lock cancel still can be batched

The previous patch makes BL_AST locks be cancelled separately.
However, the main problem is flushing the data under the other batched
locks, so it is still possible to batch a BL_AST lock with locks that
have no data. This can be optimized only for locks that are not yet
CANCELLING; otherwise the lock is already in the l_bl_ast list.

Lustre-change: https://review.whamcloud.com/50158
Lustre-commit: 9d79f92076b6a9ca735dbe4420c122f47d240263

Fixes: b65374d9 ("LU-16285 ldlm: send the cancel RPC asap")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: Ie4a7c7f3e0f5462290f72af7c3b2ff410a31f5e7
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58767
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16285 ldlm: send the cancel RPC asap 66/58766/2
Yang Sheng [Sat, 14 Jan 2023 17:56:14 +0000 (01:56 +0800)]
LU-16285 ldlm: send the cancel RPC asap

This patch tries to send the cancel RPC as soon as possible when a
bl_ast is received from the server. The existing problem is that
a lock could be added to the regular queue before the bl_ast
arrives, for some other reason, which prevents the lock from
being cancelled in a timely manner. The other problem is
that we collect many locks in one RPC to save
network traffic, but this process can take
a long time when dirty pages are being flushed.

 - Lock cancelling will be processed even if the lock has
   already been added to the bl queue when the bl_ast arrives,
   unless the cancel RPC has already been sent.
 - Send the cancel RPC immediately for a bl_ast lock. Don't
   try to add more locks in such a case.

Lustre-change: https://review.whamcloud.com/49527
Lustre-commit: b65374d96b2027213f253e128d3e5b3943ff2240

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ie5efff3f1ed4e46448371185a0c08968233e7644
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58766
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-16424 tests: Add version check in sanity-lnet 11/57111/2
Wei Liu [Fri, 22 Nov 2024 20:29:53 +0000 (12:29 -0800)]
LU-16424 tests: Add version check in sanity-lnet

Skip sanity-lnet test_205, test_207 and test_209 if
version is older than 2.14.58 since the lnet_if_list
function was added in Fixes:
3166a201e0 ("LU-15398 tests: Use remote peers for health tests")

Lustre-change: https://review.whamcloud.com/51942
Lustre-commit: ee4f470d590dd19d9c7d188958d9305ccd666e5e

Test-Parameters: trivial testlist=sanity-lnet \
serverjob=lustre-b2_14 serverbuildno=2 \
serverdistro=el8.3

Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I9cd62d91980784e3b33cf4e30426bf74d17f717f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/57111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-15846 tests: don't use comma-separated debug flags 00/57100/2
Andreas Dilger [Thu, 21 Nov 2024 20:15:54 +0000 (12:15 -0800)]
LU-15846 tests: don't use comma-separated debug flags

To avoid test interop issues between 2.15 clients and 2.12/2.14
servers, don't use comma-separated debug flags in sanity-quota.sh
quota_init() and quota_fini().

Lustre-change: https://review.whamcloud.com/47308
Lustre-commit: fe8315c25ed093d77a6f366e2a4849aba008b680

Test-Parameters: trivial testlist=sanity-quota env=ONLY=48 serverversion=2.14.0
Fixes: 6b6fde1026 ("LU-13055 libcfs: allow comma-separated masks")
Fixes: 78be823f33 ("LU-15218 quota: delete unused quota ID")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifca39054d14292bca8bcff9b8e03ae58fd5cc3a8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/57100
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-18356 tests: allow server to specify except list 19/56919/6
Andreas Dilger [Wed, 6 Nov 2024 04:00:06 +0000 (21:00 -0700)]
LU-18356 tests: allow server to specify except list

Allow the installed server code to specify a list of subtests that
should be excluded by older clients when running a particular test
script.  This allows older clients to skip tests that they would
otherwise run from their local test script, but that do not work due
to server changes.

The files for each test script are read from the mds1 and ost1 facets.
The filename(s) under lustre/tests/except/ should start with the base
test script name (e.g. sanity), followed by '.', an optional unique
string to avoid conflicts between patches, and end with ".ex".
For example, sanity.ex, sanity.test_142.ex, sanity.acl.ex are valid
"sanity.sh" except filenames, but sanity-acl.ex is not.

Lines starting with '#' are comments and ignored.  Otherwise, lines
should have whitespace-separated fields on each line, as shown in the
examples below.

  #facet op need_version             jira     space_separated_subtests
  mds1    < v2_14_55-100-g8a84c7f9c7 LU-14927 0f
  linux   < 5.12.0                   LU-18102 27J
  client  == OST1_VERSION            LU-13081 151 156

The facet may be "client", "mds1", "ost1", or "linux" (client), and
"need_version" can be any Lustre (or Linux) version number or another
version name like OST1_VERSION, MDS1_VERSION, or CLIENT_VERSION.
The "op" can be standard math/logic comparisons ">=", "<", "!=", etc.

The version comparison is handled like the below pseudo-code:

  ${FACET}_VERSION $op $need_version OR except $subtests

In other words, the version check must be true or the subtest(s) will
not be run.  Checks within a single file should be ordered by subtest
number to make it easier to see whether some subtest is being skipped.
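
To make the rule format concrete, here is a small Python model of the parsing and the version check. It is only a sketch, not the test-framework.sh implementation: the function names are hypothetical, and it handles only plain dotted versions such as 5.12.0, not git-describe style strings such as v2_14_55-100-g8a84c7f9c7.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}


def parse_version(text):
    """Turn a dotted version like '5.12.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def parse_except_lines(lines):
    """Yield (facet, op, need_version, jira, subtests) for each rule line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        facet, op, need_version, jira, *subtests = line.split()
        yield facet, op, need_version, jira, subtests


def excluded_subtests(lines, versions):
    """Return the subtests whose version check is false."""
    skip = []
    for facet, op, need_version, _jira, subtests in parse_except_lines(lines):
        # need_version may name another facet's version, e.g. OST1_VERSION.
        if need_version.endswith("_VERSION"):
            need_version = versions[need_version[:-len("_VERSION")].lower()]
        have = parse_version(versions[facet])
        if not OPS[op](have, parse_version(need_version)):
            skip.extend(subtests)
    return skip


RULES = """
#facet op need_version jira     space_separated_subtests
linux   <  5.12.0       LU-18102 27J
client  == OST1_VERSION LU-13081 151 156
""".splitlines()

VERSIONS = {"linux": "5.14.0", "client": "2.15.7", "ost1": "2.15.7"}
print(excluded_subtests(RULES, VERSIONS))
```

With these example versions the linux check (5.14.0 < 5.12.0) is false, so 27J is excluded, while the client check is true and 151 and 156 still run.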

Lustre-change: https://review.whamcloud.com/56901
Lustre-commit: c4c3a7350b55ace0c38123c4b820c713f42e1cb7

Fix tests using "sh TESTSCRIPT.sh" instead of "bash TESTSCRIPT.sh"
to start a script that is calling test-framework.sh, since that
would now run afoul of the bashism that is added in this patch.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0216d9980147ce3409807e9d7f9759fe533ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56919
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-15553 test: mkdir_on_mdt0 in replay-vbr.sh 43/56943/3
Lai Siyao [Fri, 8 Nov 2024 19:51:53 +0000 (11:51 -0800)]
LU-15553 test: mkdir_on_mdt0 in replay-vbr.sh

Change mkdir to mkdir_on_mdt0 in several replay-vbr.sh sub tests.

Lustre-change: https://review.whamcloud.com/56540
Lustre-commit: f6c733c422eae64cea93c33fb14e6adb2eed81d0

Fixes: b9c4dc3c33 ("LU-14792 llite: enable filesystem-wide default LMV")
Test-Parameters: trivial testlist=replay-vbr mdtcount=4
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I7457c155bbadb86adf8272113a4e4202b98c20a5
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56943
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-18214 ldlm: change flock deadlock detection 84/56984/5
Yang Sheng [Thu, 14 Nov 2024 20:54:22 +0000 (12:54 -0800)]
LU-18214 ldlm: change flock deadlock detection

The flock deadlock detection code treated a request lock that is
the same as the blocking lock as a bug. In fact, this is a case
of a cyclic chain, so we should treat it as a deadlock.
Also clean up the reprocess code.
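
The cyclic-chain case can be modelled with a generic wait-for chain walk. This Python sketch is not the ldlm flock code; would_deadlock() and the waits_for mapping (owner to the owner it is blocked on) are assumptions used only to show why arriving back at the requester means deadlock rather than a bug.

```python
def would_deadlock(requester, blocker, waits_for):
    """Follow the wait-for chain from blocker; True if it reaches requester.

    waits_for maps an owner to the owner it is currently blocked on.
    """
    seen = set()
    current = blocker
    while current is not None and current not in seen:
        if current == requester:
            return True  # the chain cycled back to the requester
        seen.add(current)
        current = waits_for.get(current)
    return False


# A wants a lock held by B; B waits on C; C waits on A.
print(would_deadlock("A", "B", {"B": "C", "C": "A"}))
# Same request, but C is not waiting on anyone.
print(would_deadlock("A", "B", {"B": "C"}))
```

The seen set also stops the walk on a cycle that does not include the requester, so the check always terminates.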

Lustre-change: https://review.whamcloud.com/56319
Lustre-commit: c2e6fa41aac428222770a4fc2826567a74a6dbc6
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Icf0df4ac281c2cdb6cc57cb79db137d39ecef9e6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56984
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-17070 lov: retry layout refresh if got old layouts 88/56988/3
Bobi Jam [Tue, 12 Nov 2024 20:09:12 +0000 (12:09 -0800)]
LU-17070 lov: retry layout refresh if got old layouts

lov_layout_change() does not apply old layouts, which can get through
when the MDS doesn't take the layout lock. This patch retries fetching
the layout and re-applies it once.
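
A bounded retry of this kind might look like the Python sketch below. It is an assumption-laden model: refresh_layout(), the fetch_layout callable, and the 'gen' key used to decide whether a layout is old are all hypothetical, and returning None after a second stale result is an illustrative choice, not the behaviour of the patch.

```python
def refresh_layout(fetch_layout, known_gen, retries=1):
    """Fetch a layout, retrying up to `retries` times if it is stale.

    fetch_layout is a hypothetical callable returning a dict with a
    'gen' key; a layout is treated as old when gen < known_gen.
    Returns the layout, or None if every attempt returned an old one.
    """
    for _attempt in range(1 + retries):
        layout = fetch_layout()
        if layout["gen"] >= known_gen:
            return layout
    return None


# First reply is stale, the single retry returns a current layout.
replies = iter([{"gen": 4}, {"gen": 6}])
print(refresh_layout(lambda: next(replies), known_gen=5))
```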

Lustre-change: https://review.whamcloud.com/55061
Lustre-commit: 7974e41a26c22181be2818b3580756fa559d14d9

Fixes: 13557aa869 ("LU-15300 mdt: refresh LOVEA with LL granted")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Id29ec4ada85060a20f730f92a6a9409d755a56a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56988
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-16770 llite: prune object without layout lock first 87/56987/3
Andriy Skulysh [Tue, 12 Nov 2024 20:03:42 +0000 (12:03 -0800)]
LU-16770 llite: prune object without layout lock first

lov_layout_change() calls cl_object_prune() before
changing the layout. This may lead to eviction from the MDT
in case of a slow response from the OST.

To reduce the risk of eviction, call cl_object_prune()
without the layout lock held before calling lov_layout_change().

vvp_prune() attempts to sync and truncate page cache pages.
osc_page_delete() may encounter page cache pages in non-clean state
during truncate because there's a race window between sync and truncate.
Writes may stick into this window and generate dirty or writeback pages.

This window is usually protected with a special truncate semaphore e.g.
when truncate is requested from the truncate syscall.

Let's use this semaphore to avoid write vs truncate race in vvp_prune().

Lustre-change: https://review.whamcloud.com/50742
Lustre-commit: 9c453ba6d9a0152aa75e92b8372d54a758a10b18

HPE-bug-id: LUS-9927, LUS-11612
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: Ie2ee29ea1e792e1b34b6de068ff2b84fd8f52f2a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56987
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-15300 mdt: refresh LOVEA with LL granted 64/55464/8
Alex Zhuravlev [Tue, 12 Nov 2024 19:23:08 +0000 (11:23 -0800)]
LU-15300 mdt: refresh LOVEA with LL granted

This change tries to fix three problems:
1) mdt_reint_open() fetches LOVEA before layout lock is taken.
   this may race with another process changing the layout and
   may result in a stale layout returned with a granted layout
   lock - re-fetch LOVEA once layout lock is granted

2) lov_layout_change() should not apply old layouts which
   can get through when MDS doesn't take layout lock

3) LFSCK shouldn't ignore layout version stored on MDS to avoid
   a situation when version degrades compared to client's copy.

This patch misses an optimization and can result in a number of
useless calls to OSD to fetch LOVEA. To be fixed in a followup
patch.

Lustre-change: https://review.whamcloud.com/46413
Lustre-commit: 13557aa86904376e48a5e43256d5c1ab32c1c2d6

LU-14869 test: improve sanity-flr/200a

Make sure "flock -x" successfully returned before running mirror
resync so that it won't get into running read holding shared flock.

Lustre-change: https://review.whamcloud.com/54345
Lustre-commit: 2bf51212680b3d4117925965c368d53587bf37d4

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idee1101d152ab09947faf6d75574a8761a7690a5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55464
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18085 llite: use RCU to protect the dentry_data 07/57007/2
Yang Sheng [Thu, 14 Nov 2024 02:25:17 +0000 (18:25 -0800)]
LU-18085 llite: use RCU to protect the dentry_data

Upstream changed the dentry kill rules in
v6.7-rc1-20-g1c18edd1b7a0: the d_release callback is now
invoked before the dentry is removed from the children
list. This means changes to d_fsdata can be seen by
others. We already use call_rcu() to handle the release,
so just apply RCU on the read side to ensure safe
access.

Lustre-change: https://review.whamcloud.com/55984
Lustre-commit: 983999bda71115595df48d614ca1aaf9b746c75f

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I58713bfbf22749d6c0a5e40f710549662248e32f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/57007
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17613 tests: explicit check for eviction with dmesg parse 57/57057/2
Vladimir Saveliev [Tue, 30 Jul 2024 15:21:43 +0000 (18:21 +0300)]
LU-17613 tests: explicit check for eviction with dmesg parse

client_evicted() used to check for client eviction based on result of
lfs df. When it returned any error but EOPNOTSUPP - that was taken as
"client was evicted".

When glibc's realpath() changed to not call stat()
(see for ref
  stdlib: Sync canonicalize with gnulib [BZ #10635] [BZ #26592] [BZ
  ..
  - Realpath mishandles EOVERFLOW; stat not needed anyway (BZ#24970).
)
'lfs df' started to return EOPNOTSUPP from lfs_df(). client_evicted()
was then changed so that any non-zero return is taken as "client was
evicted".

Check for "This client was evicted" in dmesg output to make sure that
eviction happened.

Add a comment in ptlrpc_import_recovery_state_machine() to make it
clear that this specific error message is used by the test code. Avoid
ratelimiting for the message.
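
As a rough illustration of the dmesg-based check described above: the
real client_evicted() helper is a shell function in the test framework,
so this Python sketch, its function body and its sample log lines are
assumptions. Only the marker string is taken from this message.

```python
# Marker string quoted in the commit message above.
EVICTION_MARKER = "This client was evicted"


def client_evicted(dmesg_text: str) -> bool:
    """Return True only if the eviction message appears in dmesg output."""
    return any(EVICTION_MARKER in line for line in dmesg_text.splitlines())


# Hypothetical dmesg excerpts, made up for this sketch.
evicted_log = (
    "LustreError: lustre-OST0000-osc: This client was evicted by "
    "lustre-OST0000; in progress operations using this service will fail.\n"
)
healthy_log = "Lustre: lustre-OST0000-osc: Connection restored\n"

print(client_evicted(evicted_log))
print(client_evicted(healthy_log))
```

Matching on an explicit log message avoids depending on which errno
'lfs df' happens to return, which is the failure mode described above.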

Lustre-change: https://review.whamcloud.com/54299
Lustre-commit: ab5a2b63fb90b75ef07d25b347423e2db05286ef

Fixes: a5a9ded43b ("LU-16916 tests: fix client_evicted() not to ignore EOPNOTSUPP")
Test-Parameters: trivial testlist=replay-vbr,recovery-small
HPE-bug-id: LUS-11742
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I10ef99d23d630164bfdf167e54e2f177e9b85598
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/57057
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-18387 kernel: update RHEL 9.5 [5.14.0-503.14.1.el9_5] 30/57030/3
Jian Yu [Mon, 18 Nov 2024 20:18:19 +0000 (12:18 -0800)]
LU-18387 kernel: update RHEL 9.5 [5.14.0-503.14.1.el9_5]

Update RHEL 9.5 kernel to 5.14.0-503.14.1.el9_5 for Lustre client.

Lustre-change: https://review.whamcloud.com/57029
Lustre-commit: TBD (from 1879bf37e4360b46feddf9a01b531b8226b6befa)

Test-Parameters: trivial \
  mdtcount=4 mdscount=2 clientdistro=el9.5 testlist=sanity
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-3

Change-Id: I47b80f5fec166220ac25563460a3b0f4fbd2e6bb
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/57030
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
6 months ago New release 2.15.6 2.15.6 v2_15_6
Andreas Dilger [Wed, 27 Nov 2024 21:38:43 +0000 (14:38 -0700)]
New release 2.15.6

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Change-Id: Ibfd72654a39441de5f7d9aadeae05eef3f500c1e

6 months ago New RC 2.15.6-RC1 2.15.6-RC1 v2_15_6-RC1
Oleg Drokin [Mon, 18 Nov 2024 17:46:15 +0000 (12:46 -0500)]
New RC 2.15.6-RC1

Change-Id: Ib3857268cee9d89bd1fa2212e6ef53d45cf55513
Signed-off-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-18435 lod: recover layout generation from replay 89/56989/4
Alex Zhuravlev [Tue, 12 Nov 2024 20:23:03 +0000 (12:23 -0800)]
LU-18435 lod: recover layout generation from replay

The offset of the layout generation differs between struct
lov_mds_md_v1/v3.lmm_layout_gen and lov_comp_md.lcm_layout_gen.
When checking or setting the layout generation, we must use the
layout-specific field.

Otherwise the layout generation can be set to 0 (or some other random
value) after replay, and the client can't apply a new layout during a
later update.
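
A minimal sketch of the idea, not the actual lod code: pick the field
by layout magic before reading the generation. The magic values, field
offsets and field widths below are assumptions written from memory of
the public Lustre headers and should be checked against them; the
helper name layout_gen() is hypothetical.

```python
import struct

# Assumed magic values (verify against the Lustre headers).
LOV_MAGIC_V1 = 0x0BD10BD0
LOV_MAGIC_V3 = 0x0BD30BD0
LOV_MAGIC_COMP_V1 = 0x0BD60BD0

# Assumed field positions: 16-bit lmm_layout_gen at byte 30 for plain
# layouts, 32-bit lcm_layout_gen at byte 8 for composite layouts.
LMM_LAYOUT_GEN_OFF = 30
LCM_LAYOUT_GEN_OFF = 8


def layout_gen(lovea: bytes) -> int:
    """Read the layout generation from the layout-specific field."""
    (magic,) = struct.unpack_from("<I", lovea, 0)
    if magic in (LOV_MAGIC_V1, LOV_MAGIC_V3):
        (gen,) = struct.unpack_from("<H", lovea, LMM_LAYOUT_GEN_OFF)
    elif magic == LOV_MAGIC_COMP_V1:
        (gen,) = struct.unpack_from("<I", lovea, LCM_LAYOUT_GEN_OFF)
    else:
        raise ValueError(f"unknown layout magic {magic:#x}")
    return gen


# Toy buffers: magic, pattern, 16-byte object id, stripe size,
# stripe count, layout gen for v1; magic, size, layout gen for composite.
v1 = struct.pack("<II16sIHH", LOV_MAGIC_V1, 0, bytes(16), 1 << 20, 1, 7)
comp = struct.pack("<III", LOV_MAGIC_COMP_V1, 32, 9)

print(layout_gen(v1), layout_gen(comp))
# Reading the composite offset from a plain layout lands inside the
# object id instead of the generation field.
print(struct.unpack_from("<I", v1, LCM_LAYOUT_GEN_OFF)[0])
```

In this toy buffer the wrong offset reads zero, which matches the
symptom described above of the generation being reset after replay.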

Lustre-change: https://review.whamcloud.com/56950
Lustre-commit: 1d8a667073b9ef59b6c430642805efec91546ecf
Fixes: 13557aa86904 ("LU-15300 mdt: refresh LOVEA with LL granted")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5e4a63cd097d157317e0e8d1a0fca4a46817d118
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56989
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-18387 kernel: update RHEL 9.5 [5.14.0-503.11.1.el9_5] 98/56998/2
Jian Yu [Wed, 13 Nov 2024 06:14:13 +0000 (22:14 -0800)]
LU-18387 kernel: update RHEL 9.5 [5.14.0-503.11.1.el9_5]

Update RHEL 9.5 kernel to 5.14.0-503.11.1.el9_5 for Lustre client.

Lustre-change: https://review.whamcloud.com/56997
Lustre-commit: TBD (from 929971901f8ca3f90fe593005002865327b137dd)

Test-Parameters: trivial \
  mdtcount=4 mdscount=2 clientdistro=el9.5 testlist=sanity
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-3

Change-Id: I9bc6924c4a71f743acd9df99042df23fdf614593
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56998
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17974 quota: fix qmt_pool_lqes_lookup_spec 36/55536/3
Sergey Cheremencev [Wed, 13 Nov 2024 08:21:19 +0000 (00:21 -0800)]
LU-17974 quota: fix qmt_pool_lqes_lookup_spec

Return 0 from qmt_pool_lqes_lookup_spec() if a global lqe
exists among the found lqes, and return -ENOENT if
* no lqes have been found
* there is no global lqe among the found lqes
This patch aims to prevent the panic below:

 (qmt_lock.c:957:qmt_id_lock_notify())
ASSERTION( lqe->lqe_is_global ) failed:
 (qmt_lock.c:957:qmt_id_lock_notify()) LBUG
 ...
 Call Trace TBD:
 libcfs_call_trace+0x6f/0xa0 [libcfs]
 lbug_with_loc+0x3f/0x70 [libcfs]
 qmt_id_lock_notify+0x1ee/0x330 [lquota]
 qmt_site_recalc_cb+0x34b/0x550 [lquota]
 cfs_hash_for_each_tight+0x122/0x310 [libcfs]
 qmt_pool_recalc+0x375/0xa80 [lquota]
 kthread+0x134/0x150
 ret_from_fork+0x35/0x40
 Kernel panic - not syncing: LBUG
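
The return-value rule above can be sketched as follows. This is a
simplified Python model, not the kernel code: the Lqe class and the
lookup_spec_rc() helper are hypothetical names used only to show which
inputs yield 0 and which yield -ENOENT.

```python
import errno
from dataclasses import dataclass


@dataclass
class Lqe:
    """Toy stand-in for a quota entry; only the global flag matters here."""
    pool: str
    is_global: bool


def lookup_spec_rc(found):
    """Return 0 if a global lqe is among the found entries, else -ENOENT."""
    if not found:
        return -errno.ENOENT  # no lqes have been found
    if not any(lqe.is_global for lqe in found):
        return -errno.ENOENT  # entries exist, but none of them is global
    return 0


with_global = [Lqe("global", True), Lqe("pool1", False)]
pool_only = [Lqe("pool1", False)]

print(lookup_spec_rc(with_global))
print(lookup_spec_rc(pool_only))
print(lookup_spec_rc([]))
```

With this rule a caller never proceeds with a set of entries that lacks
the global one, which is the condition the assertion in the trace above
checks.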

Lustre-change: https://review.whamcloud.com/55535
Lustre-commit: c97b327758f06f6bf3229126e9aa7b36865e7b92

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I62a2175b7b05c49f28b4e87c36ed653d1b9a71cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55536
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-16639 misc: cleanup concole messages 27/56727/4
Andreas Dilger [Wed, 13 Nov 2024 02:16:20 +0000 (18:16 -0800)]
LU-16639 misc: cleanup concole messages

The lprocfs_job_cleanup() was not properly dropping all jobstats
from the hash table and printing errors from job_stat_exit() at
unmount.  Ensure all stats are "old enough" when @clear is set.

Change early libcfs cfs_cpu_init() messages from CERROR() to
pr_err() to avoid circular dependencies on libcfs setup before
printing an error message to the console during module init.

Lustre-commit: 8f40a3d7110da1af8e310a4b7f40b86f13080938
Lustre-change: https://review.whamcloud.com/50283

Test-Parameters: trivial
Fixes: ea2cd3af7b ("LU-11407 obdclass: add start time to stats files")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ide3f502103392a79419cc1836200bf5a1a3ebbe5
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Signed-off-by: Eric Carbonneau <carbonneau1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56727
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-13308 mdc: support additional flags for OBD_IOC_CHLG_POLL ioctl 82/54982/3
James Simmons [Thu, 2 May 2024 01:31:53 +0000 (21:31 -0400)]
LU-13308 mdc: support additional flags for OBD_IOC_CHLG_POLL ioctl

Currently the mdc kernel code expects the flag argument for
OBD_IOC_CHLG_POLL ioctl to only be CHANGELOG_FLAG_FOLLOW. With
IPv6 we need to send a request to the kernel to present the NID
in the struct lnet_nid format since we can't just send large NIDs
to user land if we are using older tools.

With the newer user land tools we will be sending an expanded flag
which the current kernel changelog code can't handle. Rework the
code to support the new flag if we end up with the case of newer
user land tools and an older kernel. This code will also maintain
backwards compatibility with the older user land tools.

Lustre-change: https://review.whamcloud.com/52361
Lustre-commit: 8320394725180b76e76f36b8a513f3c7bf11e65c

Change-Id: I26a80d30ce2ebf2075a2a8f510ff81d6b0b8d848
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52361
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54982
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-17152 tests: unmount NFS clients with zconf_umount_clients 86/56886/2
Jian Yu [Tue, 5 Nov 2024 04:35:27 +0000 (20:35 -0800)]
LU-17152 tests: unmount NFS clients with zconf_umount_clients

This patch fixes cleanup_nfs() to unmount NFS clients by running
zconf_umount_clients(), which can find and kill active processes
that are accessing the NFS mount point so as to avoid the
"device is busy" failure.

The patch also adds racer_on_nfs test into always_except list for
parallel-scale-nfsv4 due to LU-17154.

Lustre-change: https://review.whamcloud.com/52533
Lustre-commit: 563deecae0ac2690b6d8d5571bf7af09408943cd

Test-Parameters: trivial testlist=parallel-scale-nfsv4

Change-Id: I37a38502362399540c28e78d1343e768b490ce8b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-12706 tests: sanity-quota 4a sync timeout fix 10/56910/2
Sergey Cheremencev [Wed, 6 Nov 2024 20:56:58 +0000 (12:56 -0800)]
LU-12706 tests: sanity-quota 4a sync timeout fix

Don't sync all OSTs in a system - this might take
too much time. Instead, set striping only on OST0000
and sync only MDTs and OST0000. This fix is against
the following failure:

  FAIL: Passed grace time 20, 15669105271566910563

Lustre-change: https://review.whamcloud.com/55216
Lustre-commit: 9e7b239bbd26b601127073bb0c6789cb9def7073

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I525e6c73c6d14a126a2bde7d92bc28f11f3c78c8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56910
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-10733 tests: increase conf-sanity/106 OST size 18/56918/2
Andreas Dilger [Thu, 7 Nov 2024 20:38:57 +0000 (12:38 -0800)]
LU-10733 tests: increase conf-sanity/106 OST size

conf-sanity test_106 is trying to create ~64k files, but OST0000
only has about 48k objects in this case, so the file creates are
failing during the test.  This makes the test somewhat unreliable
and causes it to hit errors unrelated to what it was originally
intended to exercise (llog wrap handling).

Increase the OSTSIZE for this test to handle the number of objects
needed by the test so it can run more reliably.

Lustre-change: https://review.whamcloud.com/50732
Lustre-commit: 334d780617561c66c91697fb1681ce24b5379387

Test-Parameters: trivial
Test-Parameters: testlist=conf-sanity env=ONLY=106
Test-Parameters: testlist=conf-sanity env=ONLY=106
Test-Parameters: testlist=conf-sanity env=ONLY=106
Test-Parameters: testlist=conf-sanity env=ONLY=106
Test-Parameters: testlist=conf-sanity env=ONLY=106
Test-Parameters: testlist=conf-sanity env=ONLY=106

Test-Parameters: optional env=SLOW=yes,ENABLE_QUOTA=yes \
  clientdistro=el8.9 serverdistro=el8.10 testlist=conf-sanity

Test-Parameters: optional env=SLOW=yes,ENABLE_QUOTA=yes \
  clientdistro=ubuntu2204 serverdistro=el8.9 testlist=conf-sanity

Test-Parameters: optional env=SLOW=yes,ENABLE_QUOTA=yes \
  clientdistro=sles15sp5 serverdistro=el8.9 testlist=conf-sanity

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie33825801172ea565d9d1d5fb81595d2cad65677
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56918
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-18407 tests: check Lustre-patched filefrag 99/56899/2
Jian Yu [Wed, 6 Nov 2024 00:17:31 +0000 (16:17 -0800)]
LU-18407 tests: check Lustre-patched filefrag

In Lustre test suites, there are some subtests using filefrag
from Lustre-patched e2fsprogs. This patch adds checks in those
subtests to skip them if the Lustre-patched e2fsprogs is not
installed on Lustre client.

Test-Parameters: trivial
Test-Parameters: env=ONLY="228" clientdistro=ubuntu2204 testlist=sanity-hsm
Test-Parameters: env=ONLY="24a" clientdistro=ubuntu2204 testlist=sanity-pfl
Test-Parameters: env=ONLY="56" clientdistro=ubuntu2204 testlist=sanity-sec
Test-Parameters: env=ONLY="77n 130" clientdistro=ubuntu2204 testlist=sanity

Change-Id: I86e2edd18052ff7fb19e7cbcbb660aa383824372
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56899
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-7665 test: improve sanity 300p 41/56941/3
Lai Siyao [Fri, 8 Nov 2024 19:14:26 +0000 (11:14 -0800)]
LU-7665 test: improve sanity 300p

Sanity test 300p sets OBD_FAIL_OUT_ENOSPC once, but that may fail an
llog operation (not critical) instead, so the subsequent mkdir succeeds.
Change the fail_loc to always fail so the test is more robust.

Lustre-change: https://review.whamcloud.com/54625
Lustre-commit: ac04484c1beec9f46d1256e8ea236f24073344af

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I128ce39aaf97e1785a8c135a696d0b404b48a2a8
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56941
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-10994 test: remove netdisk from obdfilter-survey 13/49413/6
John L. Hammond [Mon, 11 Nov 2024 23:16:19 +0000 (15:16 -0800)]
LU-10994 test: remove netdisk from obdfilter-survey

Remove the netdisk case from obdfilter-survey. Remove subtests that
use echo_client over osc devices.

Lustre-change: https://review.whamcloud.com/47239
Lustre-commit: 51c491dac6aec99fc328732b4358e8d5732dc230

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I260001241cee3027f68e62077e5817221bd0c08b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49413
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17985 osd-ldiskfs: drop osd object if failed to create 40/56940/2
Hongchao Zhang [Fri, 8 Nov 2024 19:07:35 +0000 (11:07 -0800)]
LU-17985 osd-ldiskfs: drop osd object if failed to create

In osd_create(), if the newly created inode already contains a
correct XATTR_NAME_LMA but the OI update fails, osd_object->oo_inode
is cleared; the osd_object should also be dropped.

Lustre-change: https://review.whamcloud.com/55571
Lustre-commit: 40e27b4251bec6d60ce0a6310a5ac7094980f9a3

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I4ff5952c154ce459c78514b88b1810471635c703
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56940
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-15496 tests: fix sanity/398c to use proper OSC name 77/56977/2
Andreas Dilger [Mon, 11 Nov 2024 19:16:00 +0000 (11:16 -0800)]
LU-15496 tests: fix sanity/398c to use proper OSC name

For ppc64le and aarch64 clients, the OSC import instance name does
not have "ffff" at the start, so use the proper device name for this
subtest.

Clean up the rest of test_398c to meet modern test code style.

Lustre-change: https://review.whamcloud.com/55132
Lustre-commit: b1b57bcadeeb5a87ac75387c4aa4ae084e1a27e0

LU-15496 tests: add debugging to sanity/398c

Dump the rpc_stats to help understand why the test is failing.

Lustre-change: https://review.whamcloud.com/53462
Lustre-commit: 304ca31e2aa15c576e468a86e45d8817c8eca391

Test-Parameters: trivial testlist=sanity clientarch=ppc64le env=ONLY=398c,ONLY_REPEAT=100

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If8c72fa9b13eace009f39daf82454221eba6761b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56977
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-14472 quota: skip non-exist or inact tgt for lfs_quota 42/56942/2
Hongchao Zhang [Fri, 8 Nov 2024 19:25:58 +0000 (11:25 -0800)]
LU-14472 quota: skip non-exist or inact tgt for lfs_quota

The nonexistent or inactive targets (MDC or OSC) should be skipped
for "lfs quota".

Lustre-change: https://review.whamcloud.com/41771
Lustre-commit: b54b7ce43929ce7ff6e48cd219623c264ca6b6b3

Change-Id: I25eece413715e4e05dd94ccbfd101220da7477f9
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Feng, Lei <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56942
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-15839 tests: correct the ZFS grace time for sanity-quota 4a 27/56927/3
Etienne AUJAMES [Fri, 8 Nov 2024 19:03:02 +0000 (11:03 -0800)]
LU-15839 tests: correct the ZFS grace time for sanity-quota 4a

For sanity-quota 4a, the grace time was increased from 12s to 20s but
not actually set on the filesystem.

Lustre-change: https://review.whamcloud.com/47289
Lustre-commit: 8f306f00c02e5455cef48d227f28e8cb90127719

Fixes: 3e4c3fdc ("LU-6836 test: re-add test 4a to sanity-quota for ZFS")
Test-Parameters: fstype=zfs testlist=sanity-quota env=ONLY=4a,ONLY_REPEAT=100
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I2324e818a42a19bc9928f127b1622f1e5274db1f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56927
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-15055 lod: run qmt_pool_* only from the MDT0000 config 39/56939/2
Etienne AUJAMES [Fri, 8 Nov 2024 18:51:38 +0000 (10:51 -0800)]
LU-15055 lod: run qmt_pool_* only from the MDT0000 config

On the first MDS (with MDT0000/QMT0000), if there is more than one MDT
target, the qmt_pool_{new/del/rem/add} functions will be called several
times on QMT0000 for the same pool.

This results in the following error in dmesg:
LustreError: 5659:0:(qmt_pool.c:1390:qmt_pool_add_rem()) add to: can't
scratch-QMT0000 scratch-OST0000_UUID pool pool1: rc = -17

This patch runs qmt_pool_* only for config records from MDT0000.
The qmt_pool_add_rem() dmesg error is checked in sanity-quota test_1b.
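
A toy model of the problem and the fix, under stated assumptions: the
Qmt class, the record tuples and the replay() helper are hypothetical
and only mimic the behaviour described above, where the same pool
operation reaches QMT0000 once per MDT config and the repeat fails
with -EEXIST (-17).

```python
import errno


class Qmt:
    """Toy quota master: remembers which OSTs belong to which pool."""

    def __init__(self):
        self.pools = {}

    def pool_add(self, pool, ost):
        members = self.pools.setdefault(pool, set())
        if ost in members:
            return -errno.EEXIST  # same target added twice
        members.add(ost)
        return 0


def replay(qmt, records, only_mdt0):
    """Apply (mdt_index, pool, ost) records; optionally skip non-MDT0000 ones."""
    rcs = []
    for mdt_index, pool, ost in records:
        if only_mdt0 and mdt_index != 0:
            continue  # the fix: ignore records from other MDT configs
        rcs.append(qmt.pool_add(pool, ost))
    return rcs


# The same pool_add record appears in the configs of MDT0000 and MDT0001.
records = [(0, "pool1", "OST0000"), (1, "pool1", "OST0000")]

print(replay(Qmt(), records, only_mdt0=False))
print(replay(Qmt(), records, only_mdt0=True))
```

Filtering by the source config keeps the operation applied exactly once
without having to make every qmt_pool_* call tolerate repeats.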

Lustre-change: https://review.whamcloud.com/47059
Lustre-commit: 0f158c6a093e059d89f637f31d34742078c38209

Test-Parameters: mdtcount=2 mdscount=1 testlist=sanity-quota
Fixes: 09f9fb32 ("LU-11023 quota: quota pools for OSTs")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ia6b712abe25a4d68770753e3408c3321181db1aa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17750 kernel: update SLES15 SP4 [5.14.21-150400.24.100.2] 92/55192/8
Jian Yu [Tue, 5 Nov 2024 20:31:10 +0000 (12:31 -0800)]
LU-17750 kernel: update SLES15 SP4 [5.14.21-150400.24.100.2]

Update SLES15 SP4 kernel to 5.14.21-150400.24.100.2 for Lustre client.

Lustre-change: https://review.whamcloud.com/54823
Lustre-commit: 4cdabc2c25f71ed968d8c2300d3b717e3160d46e

Test-Parameters: trivial env=SANITY_EXCEPT="154b" \
  mdtcount=4 mdscount=2 clientdistro=sles15sp4 testlist=sanity

Test-Parameters: optional clientdistro=sles15sp4 testgroup=full-part-1
Test-Parameters: optional clientdistro=sles15sp4 testgroup=full-part-2
Test-Parameters: optional clientdistro=sles15sp4 testgroup=full-part-3

Change-Id: I401e97f602e6c8c62fac73e3603eb0226745bba1
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55192
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
6 months ago LU-18423 kernel: update RHEL 8.10 [4.18.0-553.27.1.el8_10] 95/56895/2
Jian Yu [Tue, 5 Nov 2024 20:24:09 +0000 (12:24 -0800)]
LU-18423 kernel: update RHEL 8.10 [4.18.0-553.27.1.el8_10]

Update RHEL 8.10 kernel to 4.18.0-553.27.1.el8_10.

Lustre-change: https://review.whamcloud.com/56888
Lustre-commit: TBD (from b084d5534a15741094a51ee40c9a1d5e9cfbf5e1)

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.9 serverdistro=el8.10 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.9 serverdistro=el8.10 testlist=sanity

Test-Parameters: optional clientdistro=el8.10 serverdistro=el8.10 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.10 serverdistro=el8.10 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.10 serverdistro=el8.10 \
  testgroup=full-part-3

Change-Id: I3737c1f1b2941d2095225f1ab80fd76768c4782c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56895
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-18414 kernel: update RHEL 9.4 [5.14.0-427.42.1.el9_4] 79/56879/3
Jian Yu [Tue, 5 Nov 2024 20:07:56 +0000 (12:07 -0800)]
LU-18414 kernel: update RHEL 9.4 [5.14.0-427.42.1.el9_4]

Update RHEL 9.4 kernel to 5.14.0-427.42.1.el9_4 for Lustre client.

Lustre-change: https://review.whamcloud.com/56845
Lustre-commit: TBD (from 72b19d2215d4b476faf5d5b0a955ce5c22873f86)

Test-Parameters: trivial \
  mdtcount=4 mdscount=2 clientdistro=el9.4 testlist=sanity
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-3

Change-Id: Ib1b95bcaf35a9f8ed80fe7a33b51127086dd412c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56879
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17573 lov: change default object size. 00/56100/2
Alexey Lyashkov [Thu, 22 Feb 2024 06:38:03 +0000 (09:38 +0300)]
LU-17573 lov: change default object size.

OSTs have not been able to use indirect blocks for a long time,
so switch the object size to extent-based.

Lustre-commit: f315a3a594a78ecd47fcd74177fa73fb2efff59c
Lustre-change: https://review.whamcloud.com/54137

Test-Parameters: trivial
HPE-bug-id: LUS-11428
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Signed-off-by: Eric Carbonneau <carbonneau1@llnl.gov>
Change-Id: I9759fc7122c41075ebc35d52ade342c37706b041
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56100
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17421 build: Update check for arc_prune_func_t parameters 19/54819/6
Brian Atkinson [Fri, 12 Jan 2024 00:36:59 +0000 (17:36 -0700)]
LU-17421 build: Update check for arc_prune_func_t parameters

In OpenZFS 2.2.1 the code for arc_prune_async() was unified so that
FreeBSD and Linux did not have their own implementation versions of
the same code. Part of this update changed the first parameter of
arc_prune_func_t to be a uint64_t.

Without this patch, Lustre would not build with ZFS 2.2.1 because of
a failure for incompatible pointer types for the arc_prune_func_t
function pointer passed to arc_add_prune_callback().

Lustre-change: https://review.whamcloud.com/53664
Lustre-commit: 303cfe3372349974ff7cd610ad878b618ce4ee29

Test-Parameters: trivial
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Eric Carbonneau <carbonneau1@llnl.gov>
Change-Id: Iaa03cc9421f27a8517ce04817f04102de9adb86a
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54819
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months ago LU-16791 utils: ZFS 2.2 const prop args 18/54818/9
Brian Atkinson [Tue, 26 Sep 2023 18:35:43 +0000 (12:35 -0600)]
LU-16791 utils: ZFS 2.2 const prop args

ZFS 2.2 now expects const char * from certain interfaces in
sys/nvpair.h. I updated the build system to detect if this is the case
and, if so, update the parameters passed to certain functions in
libmount_utils_zfs.c to account for these changes.

Without this patch, Lustre master would not build with ZFS master and
the 2.2 release candidates.

Lustre-change: https://review.whamcloud.com/52519
Lustre-commit: b4b32ffd22d276bc1d8f40e3336df982f3717070

Test-Parameters: trivial testgroup=review-dne-zfs-part-1
Test-Parameters: testgroup=review-dne-zfs-part-2
Test-Parameters: testgroup=review-dne-zfs-part-3
Test-Parameters: testgroup=review-dne-zfs-part-4
Test-Parameters: testgroup=review-dne-zfs-part-5
Test-Parameters: testgroup=review-dne-zfs-part-6
Test-Parameters: testgroup=review-dne-zfs-part-7
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Eric Carbonneau <carbonneau1@llnl.gov>
Change-Id: I0469eeff6dafa6c276fc616381530b6b679d9da1
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54818
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-15689 libcfs: libcfs_debug_mb set incorrectly on init 38/54538/3
Chris Horn [Wed, 23 Mar 2022 06:21:06 +0000 (01:21 -0500)]
LU-15689 libcfs: libcfs_debug_mb set incorrectly on init

If libcfs_debug_mb parameter is specified to insmod (i.e. set before
module is initialized) then it does not get initialized correctly.

libcfs_param_debug_mb_set() expects cfs_trace_get_debug_mb() to return
zero if the module has not been initialized yet, but
cfs_trace_get_debug_mb() will return 1 in this case. Modify
cfs_trace_get_debug_mb() to return zero as expected. A related issue
is that in this case we need to call cfs_trace_get_debug_mb() after
cfs_tracefile_init() so that libcfs_debug_mb gets the same value it
would get if we had set it after module init.

When libcfs_debug_mb is specified to insmod, libcfs_debug_init()
divides its value by num_possible_cpus(), but this is already done in
libcfs_param_debug_mb_set().
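
The double division can be shown with plain arithmetic. This is only a
model: the three helper functions below are hypothetical stand-ins for
the parameter setter and the init path, and the 1024 MB / 16 CPU
figures are made-up inputs.

```python
def param_set(total_mb: int, ncpus: int) -> int:
    """Model of the setter: already converts the total to a per-CPU share."""
    return total_mb // ncpus


def init_buggy(total_mb: int, ncpus: int) -> int:
    """Model of the old init path: divides the setter's result again."""
    return param_set(total_mb, ncpus) // ncpus


def init_fixed(total_mb: int, ncpus: int) -> int:
    """Model of the fixed init path: divides exactly once."""
    return param_set(total_mb, ncpus)


print(init_fixed(1024, 16))
print(init_buggy(1024, 16))
```

The error grows with the CPU count, since the buggy path shrinks the
buffer by an extra factor of num_possible_cpus().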

Lustre-change: https://review.whamcloud.com/c/46925/
Lustre-commit: d38ef181d8250b083553ec95209c28c1dc11fa99

Test-Parameters: trivial
Fixes: 8b78a3ffb5 ("LU-9859 libcfs: always range-check libcfs_debug_mb setting.")
HPE-bug-id: LUS-10839
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I1003758156acb5cf6ea30bbdfd7b45a743a2a5aa
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54538
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-15618 lnet: Return ESHUTDOWN in lnet_parse() 59/49259/3
Chris Horn [Thu, 3 Mar 2022 07:12:32 +0000 (01:12 -0600)]
LU-15618 lnet: Return ESHUTDOWN in lnet_parse()

If the peer NI lookup in lnet_parse() fails with ESHUTDOWN then we
should return that value back to the LNDs so that they can treat the
failed call the same way as other lnet_parse() failures.

Returning zero results in at least one bug in socklnd where a
reference on a ksock_conn can be leaked which prevents socklnd from
shutting down.
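
A simplified model of why the return value matters, not the socklnd
code: the Conn class and both functions are hypothetical, and the
assumption is that the LND drops its extra connection reference either
on an lnet_parse() error or when a later completion arrives, and that
no completion arrives once LNet is shutting down.

```python
import errno


class Conn:
    """Toy connection with a reference count."""

    def __init__(self):
        self.refs = 1


def lnet_parse_model(shutting_down: bool, report_shutdown: bool) -> int:
    """Return 0 on success; on shutdown return -ESHUTDOWN only if asked to."""
    if shutting_down and report_shutdown:
        return -errno.ESHUTDOWN
    return 0


def lnd_receive(conn: Conn, shutting_down: bool, report_shutdown: bool) -> int:
    conn.refs += 1  # reference held while the message is being handled
    rc = lnet_parse_model(shutting_down, report_shutdown)
    if rc != 0:
        conn.refs -= 1  # error path: the LND cleans up immediately
    elif not shutting_down:
        conn.refs -= 1  # normal path: the completion drops the reference
    # rc == 0 during shutdown: no error seen, no completion, so it leaks
    return rc


old, new = Conn(), Conn()
lnd_receive(old, shutting_down=True, report_shutdown=False)
lnd_receive(new, shutting_down=True, report_shutdown=True)
print(old.refs, new.refs)
```

Propagating the error lets the caller reuse its existing failure
handling instead of needing a separate shutdown-specific cleanup path.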

Lustre-change: https://review.whamcloud.com/46711
Lustre-commit: 4fbd0705a3d25bbc85e953f81e697e5006b215ce

Fixes: 47b7b31978 ("LU-8106 lnet: Do not drop message when shutting down LNet")
Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-15794
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Ic403619c6dccf3921c46a674808c404adad7a30e
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49259
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
7 months agoLU-12511 llite: use mapping_set_error instead of opencoded set_bit 53/55553/4
Michal Hocko [Thu, 11 Jul 2024 21:05:26 +0000 (17:05 -0400)]
LU-12511 llite: use mapping_set_error instead of opencoded set_bit

The mapping_set_error() helper sets the correct AS_ flag for the mapping
so there is no reason to open code it.  Use the helper directly.
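
As a rough userspace illustration of what such a helper does: -ENOSPC is
recorded with one flag and every other error with a generic I/O-error
flag. The struct and the bit positions below are assumptions for the
sketch, not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative bit positions only; the kernel's AS_EIO and AS_ENOSPC
 * values are not reproduced here. */
enum {
	SKETCH_AS_EIO = 0,
	SKETCH_AS_ENOSPC = 1,
};

/* Hypothetical stand-in for an address space with a flags word. */
struct sketch_mapping {
	unsigned long flags;
};

/* Record an error on the mapping: -ENOSPC gets its own flag, any other
 * non-zero error (for example -ENXIO) is folded into the EIO flag. */
static void sketch_mapping_set_error(struct sketch_mapping *m, int error)
{
	assert(m != NULL);
	if (!error)
		return;
	if (error == -ENOSPC)
		m->flags |= 1UL << SKETCH_AS_ENOSPC;
	else
		m->flags |= 1UL << SKETCH_AS_EIO;
}
```

Callers then pass the error code and let the helper pick the flag,
instead of choosing and setting a bit themselves.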

[akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
Linux-commit: 5114a97a8bce7f4ead29a32b67dee85438699b9e

Lustre-change: https://review.whamcloud.com/51372
Lustre-commit: aac625055e50e83d7716bdfc6ecfab3282eb0ad2

Change-Id: I153bc04d4745a20013820ba81572cadb37ab8f39
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51372
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55553
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
7 months agoLU-17440 lnet: prevent errorneous decref for asym route 06/54906/4
Gian-Carlo DeFazio [Thu, 29 Feb 2024 00:44:48 +0000 (16:44 -0800)]
LU-17440 lnet: prevent errorneous decref for asym route

The following stack trace was seen on a lustre server:
Call Trace TBD:
[<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[<0>] lnet_destroy_peer_ni_locked+0x44d/0x4e0 [lnet]
[<0>] lnet_handle_find_routed_path+0x86c/0xee0 [lnet]
[<0>] lnet_select_pathway+0xb95/0x16c0 [lnet]
[<0>] lnet_send+0x6d/0x1e0 [lnet]
[<0>] lnet_parse_local+0x3ed/0xdd0 [lnet]
[<0>] lnet_parse+0xd7d/0x1490 [lnet]
[<0>] kiblnd_handle_rx+0x30e/0x900 [ko2iblnd]
[<0>] kiblnd_scheduler+0x104b/0x10d0 [ko2iblnd]
[<0>] kthread+0x14c/0x170
[<0>] ret_from_fork+0x1f/0x40

It was discovered that the lnet routes between the server
and a client cluster were misconfigured, so that the clients
had routes to the server through all 8 available routers,
but the server had routes to the clients through only 7 of
the routers.

The server was contacted by a client node through the
router with the missing route. It incremented the ref count
for the corresponding struct lnet_peer_ni for that router,
but then, because it had no route through that peer, changed
the value of the struct lnet_peer_ni to a peer with a route
back to the client. It then decremented the new
struct lnet_peer_ni which resulted in the ref count being
decremented to 0 which caused an LBUG.

Detect whether the peer is a router to the appropriate net.
If so, decrement its ref count at the end of the function;
if not, decrement its ref count immediately.
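
The imbalance and the fix can be modelled in userspace as follows. This
is a sketch under stated assumptions: the struct, the has_route field
and the function names are hypothetical and much simpler than the LNet
routing code.

```c
#include <assert.h>

/* Hypothetical stand-in for struct lnet_peer_ni. */
struct sketch_peer_ni {
	int refcount;
	int has_route;	/* non-zero if this peer routes to the target net */
};

static void sketch_addref(struct sketch_peer_ni *p)
{
	p->refcount++;
}

static void sketch_decref(struct sketch_peer_ni *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Buggy pattern: the ref is taken on 'from', the pointer is switched to
 * 'gw', and the ref is then dropped on 'gw' instead of 'from'. */
static void sketch_reply_buggy(struct sketch_peer_ni *from,
			       struct sketch_peer_ni *gw)
{
	struct sketch_peer_ni *lpni = from;

	sketch_addref(lpni);
	if (!lpni->has_route)
		lpni = gw;
	sketch_decref(lpni);
}

/* Fixed pattern: if 'from' is a usable router, keep its ref until the
 * end; otherwise drop it immediately, before switching to 'gw'. */
static void sketch_reply_fixed(struct sketch_peer_ni *from,
			       struct sketch_peer_ni *gw)
{
	struct sketch_peer_ni *lpni = from;
	int drop_at_end = 0;

	sketch_addref(lpni);
	if (lpni->has_route) {
		drop_at_end = 1;
	} else {
		sketch_decref(lpni);
		lpni = gw;
	}
	/* ... the message would be sent via lpni here ... */
	if (drop_at_end)
		sketch_decref(lpni);
}
```

In the buggy variant the router without a route keeps an extra
reference and the substitute peer loses one it never gained, which is
the kind of imbalance that ends in a zero count.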

Lustre-change: https://review.whamcloud.com/53896
Lustre-commit: 2b210f39059be998b80b0acc13c12451960b63bb

Fixes: 2e27193 ("LU-17062 lnet: Update lnet_peer_*_decref_locked usage")
Test-Parameters: testlist=sanity-lnet mdscount=1 osscount=2 clientcount=1
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I2d00faef60ae8768afa7afbb1b00a62ba90535bb
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54906
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-18345 test: interop check for sanity-quota 2 88/56688/4
Hongchao Zhang [Mon, 30 Sep 2024 15:41:18 +0000 (23:41 +0800)]
LU-18345 test: interop check for sanity-quota 2

The "least qunit" was renamed to "least_qunit" in 2.15.51, so
add interop handling for it.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-quota env=ONLY=2 serverjob=lustre-master serverbuildno=4586
Fixes: cd1847e73e ("LU-14535 quota: improve quota output format")
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I1a2cbe66280c2165e0da78ca93605113f9d8e974
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56688
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
7 months agoLU-18344 test: sanity test_247f interop fix 33/56733/2
Lai Siyao [Sun, 29 Sep 2024 18:04:25 +0000 (14:04 -0400)]
LU-18344 test: sanity test_247f interop fix

2.16 always enables remote subdir mount, so update sanity test_247f.

Test-Parameters: trivial
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibe04d307a5596a6047d5fd301e19c33bf07f1e21
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56733
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-16390 tests: check Lustre filefrag in sanity-flr/49a 05/56605/2
Andreas Dilger [Tue, 8 Oct 2024 00:18:13 +0000 (17:18 -0700)]
LU-16390 tests: check Lustre filefrag in sanity-flr/49a

Check that a Lustre-patched filefrag is installed when running
sanity-flr test_49a.

Lustre-change: https://review.whamcloud.com/49386
Lustre-commit: 37f18670e49b8150170f9b724b5f7089fa176c4e

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic909ea4ca160d47480004f53a96ce7539ce5076c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56605
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <adeiter@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-18341 tests: skip sanity-flr/test_36 for new servers 93/56693/5
Bobi Jam [Tue, 15 Oct 2024 10:49:54 +0000 (18:49 +0800)]
LU-18341 tests: skip sanity-flr/test_36 for new servers

2.16 servers allow layout version updates from the client while 2.15
servers do not, so skip sanity-flr/test_36, which checks
this behavior.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-flr env=ONLY=36 serverjob=lustre-master serverbuildno=4586
Fixes: fa6574150b ("LU-14642 flr: allow layout version update from client/MDS")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I50d81922217b8a864053ba8781f4627f02410717
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56693
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
7 months agoLU-18387 kernel: new kernel [RHEL 9.5 5.14.0-503.2.1.el9_5] 54/56754/6
Shaun Tancheff [Wed, 30 Oct 2024 17:25:58 +0000 (10:25 -0700)]
LU-18387 kernel: new kernel [RHEL 9.5 5.14.0-503.2.1.el9_5]

This patch makes changes to support the new RHEL 9.5 release
for the Lustre client.

Lustre-change: https://review.whamcloud.com/56748
Lustre-commit: TBD (from a347e8bece92e00af02d5499b092700954c4fb8e)

LU-17243 build: compatibility updates for kernel 6.6

linux kernel v5.19-rc1-4-gc4f135d64382
  workqueue: Wrap flush_workqueue() using a macro
linux kernel v6.5-rc1-7-g20bdedafd2f6
  workqueue: Warn attempt to flush system-wide workqueues.
If __flush_workqueue(system_wq) is not available, fall back to
flush_scheduled_work()

Lustre-change: https://review.whamcloud.com/52908
Lustre-commit: a0e6d6f7327598d13661bb14098a9f21f2035285

LU-17592 build: compatibility updates for kernel 6.8

Linux commit v6.7-rc1-3-gda549bdd15c2
  dentry: switch the lists of children to hlist
Provide trivial wrappers to abstract the changed members

Lustre-change: https://review.whamcloud.com/54229
Lustre-commit: 6d27c2c8c72e853a238fd3fc7f42d658188ca02f

Test-Parameters: trivial \
  mdtcount=4 mdscount=2 clientdistro=el9.5 testlist=sanity
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.5 testgroup=full-part-3

Change-Id: I1bce12b2b7190bcbd880916049667630aba700c8
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56754
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-17696 llite: remove LASSERT from ll_ddelete() 30/56830/2
Jian Yu [Wed, 30 Oct 2024 17:21:57 +0000 (10:21 -0700)]
LU-17696 llite: remove LASSERT from ll_ddelete()

On Linux kernel 6.8, the changes in commit 2f42f1eb9093
("Call retain_dentry() with refcount 0") caused d_delete()
instances to be called for dentries with ->d_lock held and
refcount equal to 0, which triggered the following assertion
failure on the Lustre client:

(dcache.c:136:ll_ddelete()) ASSERTION( d_count(de) == 1 ) failed

The value of d_count(de) became 0 instead of 1. Since
retain_dentry() can now be called with a refcount of either 0 or 1,
we can simply remove the LASSERT(ll_d_count(de) == 1)
from ll_ddelete() to avoid the above failure.

Lustre-change: https://review.whamcloud.com/54676
Lustre-commit: 0176629ab3f71e88850ab95796b0e519c4d0f740

Change-Id: Ic4a39d9328326634190cd0719b4c0637e1bf315c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56830
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-16520 build: Move strscpy to libcfs common header 66/56766/4
Shaun Tancheff [Wed, 23 Oct 2024 06:25:55 +0000 (23:25 -0700)]
LU-16520 build: Move strscpy to libcfs common header

Ensure strscpy is available to lustre

Lustre-change: https://review.whamcloud.com/49863
Lustre-commit: 7fe7f4ca06b9c8d128f7ba36988e36f8141ed53d

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I0c3673c2aa7e6b61671521a8cabde8a364f7f6f8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56766
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-9859 libcfs: migrate libcfs_mem.c to lnet/lib-mem.c 65/56765/6
James Simmons [Fri, 25 Oct 2024 19:51:04 +0000 (12:51 -0700)]
LU-9859 libcfs: migrate libcfs_mem.c to lnet/lib-mem.c

Move the libcfs_mem.c code to the LNet core. The prototypes are
declared in libcfs_cpu.h, but we don't move them yet since the CPT
code depends on the libcfs_mem.c work. Moving the CPT work right away
could create a cyclic module dependency, so limit what is changed at
this point.

Lustre-change: https://review.whamcloud.com/52701
Lustre-commit: 24d515367f44de6b92b453cc9a1c8384e52b5e3f

LU-9859 lnet: move CPT handling to LNet

The CPT work is used for LNet and ptlrpc which is the Lustre LNet
interface. Move this work there and merge the lib-mem.c code as
well since they both work closely together. Move cpt debugfs
handling from libcfs to lnet. Now all remaining debugfs in libcfs
is for debugging.

Lustre-change: https://review.whamcloud.com/52923
Lustre-commit: 7f8cde3b77ada95e8b96dee1996f8d40bd17a538

LU-9859 libcfs: remove workitem.

There are no more users of the "workitem" code so it can be removed.
Lustre uses Linux workqueues instead.

Lustre-change: https://review.whamcloud.com/50462
Lustre-commit: 1782884fa247da0c1400ee6307596b64d6aaa440

Test-Parameters: trivial
Change-Id: I6bf5cd9f20033f988dde1989f0fc5f89ea74b5a2
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56765
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-9859 libcfs: move percpt_lock into lnet 93/56793/2
Mr NeilBrown [Fri, 25 Oct 2024 18:57:43 +0000 (11:57 -0700)]
LU-9859 libcfs: move percpt_lock into lnet

lnet is the only user of percpt_lock - and there are only two such
locks!
So move the code into lnet, as part of deprecating libcfs.

Lustre-change: https://review.whamcloud.com/50832
Lustre-commit: c4e2563ff3bfa84ab7558c2aced32445da543ef6

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id7091e88cf61228aa031921747fb9c7b08214931
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56793
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-9859 lnet: convert selftest to use workqueues 74/56774/2
Mr NeilBrown [Thu, 24 Oct 2024 00:29:39 +0000 (17:29 -0700)]
LU-9859 lnet: convert selftest to use workqueues

Instead of the cfs workitem library, use workqueues.

As lnet wants to provide a cpu mask of allowed cpus, it
needs to be a WQ_UNBOUND work queue so that tasks can
run on cpus other than where they were submitted.
We use alloc_ordered_workqueue for lst_sched_serial (now called
lst_serial_wq) - "ordered" means the same as "serial" did.
We use cfs_cpt_bind_queue() for the other workqueues which sets up the
CPU mask as required.

An important difference with workqueues is that there is no equivalent
to cfs_wi_exit() which can be called in the action function and which
will ensure the function is not called again - and that the item is no
longer queued.

To provide similar semantics we treat swi_state == SWI_STATE_DONE as
meaning that the wi is complete and any further calls must be no-op.
We also call cancel_work_sync() (via swi_cancel_workitem()) before
freeing or reusing memory that held a work-item.

To ensure the same exclusion that cfs_wi_exit() provided, the state is
set and tested under a lock - either crpc_lock, scd_lock, or tsi_lock
depending on which structure the wi is embedded in.

Another minor difference is that with workqueues the action function
returns void, not an int.

Also change SWI_STATE_* from #define to an enum.  The only place these
values are ever stored is in one field in a struct.
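
The "done means no-op" rule described above can be sketched in
userspace. The enum, struct and function names are hypothetical;
SKETCH_DONE plays the role of SWI_STATE_DONE, and the lock that the
real code holds while setting and testing the state is left out to keep
the example short.

```c
#include <assert.h>
#include <stddef.h>

/* States as an enum rather than #defines. Names are illustrative. */
enum sketch_state {
	SKETCH_PENDING,
	SKETCH_DONE,
};

struct sketch_workitem {
	enum sketch_state state;
	int runs;	/* how many times the body actually ran */
};

/* Action function: returns void, as workqueue callbacks do.
 * Once the item is marked done, any further call does nothing. */
static void sketch_action(struct sketch_workitem *wi)
{
	assert(wi != NULL);
	if (wi->state == SKETCH_DONE)
		return;
	wi->runs++;
	wi->state = SKETCH_DONE;
}
```

Calling sketch_action() twice on the same item runs the body once. In
the real conversion, cancel_work_sync() is additionally needed before
the memory holding the item is freed or reused.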

Linux-commit: 6106c0f82481e686b337ee0c403821fb5c3c17ef
Linux-commit: 3fc0b7d3e0a4d37e4c60c2232df4500187a07232
Linux-commit: 7d70718de014ada7280bb011db8655e18ed935b1

Lustre-change: https://review.whamcloud.com/36991
Lustre-commit: 51dd6269c91dab7543cd9dfd1848c983efa6db36

Test-Parameters: trivial testlist=lnet-selftest
Change-Id: I5ccf1399ebbfdd4cab3696749bd1ec666147b757
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56774
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-9859 libcfs: move kernel specific code out of libcfs core 63/56763/2
James Simmons [Wed, 23 Oct 2024 00:29:11 +0000 (17:29 -0700)]
LU-9859 libcfs: move kernel specific code out of libcfs core

Over time, kernel-version-specific code has leaked into the libcfs
core code. Move that code to the linux subdirectory so that
future code cleanup is not missed.

Lustre-change: https://review.whamcloud.com/52010
Lustre-commit: 8754693fe6ddac4b74e27800a05d5aea00bb0359

Test-Parameters: trivial
Change-Id: I38a00c377334066160083edd3932d4a718198497
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56763
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-8130 libcfs: don't use radix tree for xarray 62/56762/2
James Simmons [Wed, 23 Oct 2024 00:19:44 +0000 (17:19 -0700)]
LU-8130 libcfs: don't use radix tree for xarray

For newer kernels the radix tree is based entirely on Xarray. To support
RHEL7 we backported Xarray, but the backport was still using the
radix tree. There is a mismatch between what the radix tree expects
and using a struct xa_node when allocating and freeing memory. Instead,
abandon all use of the radix tree with Xarray. We use our own private
kmem cache, which is based on the radix tree code but uses xa_node.

Lustre-change: https://review.whamcloud.com/51840
Lustre-commit: 778791dd7da107710c2311935a24cfd7e7a5fd85

LU-17052 libcfs: fix build for old kernel

Fix build for kernel v4.17 to v4.19.
These old kernels already have xarray.h, which is #included by fs.h, but
they don't have full xarray support. libcfs's xarray.h must also be
#included to provide the missing xarray support.

Rename the header guard macro to ensure libcfs's xarray.h will be
included.

Lustre-change: https://review.whamcloud.com/52090
Lustre-commit: 778791dd7da107710c2311935a24cfd7e7a5fd85

Test-Parameters: trivial
Test-Parameters: testlist=sanityn envdefinitions=ONLY=77,ONLY_REPEAT=20
Fixes: 84e12028be9a ("LU-9859 libcfs: add support for Xarray")
Fixes: 778791dd7da1 ("LU-8130 libcfs: don't use radix tree for xarray")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Change-Id: I87607aa0e55a4aca039f2fef5a76fbff0bedd9b3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56762
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-18350 tests: skip sanityn 33c/d interop 96/56696/5
Lai Siyao [Fri, 27 Sep 2024 19:29:48 +0000 (15:29 -0400)]
LU-18350 tests: skip sanityn 33c/d interop

Skip sanityn 33c and 33d in interop with 2.16, since they are DNE
Commit-on-Sharing related and were refactored in 2.16.

Test-Parameters: trivial
Test-Parameters: testlist=sanityn env=ONLY=33 mdtcount=4 serverjob=lustre-master serverbuildno=4586
Fixes: 1d6b96a1cf ("LU-15529 mdt: optimize dir migration locking")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I7487e2d2a142517dd425281517629fc42159b8b9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56696
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months agoNew release 2.15.5 2.15.5 v2_15_5
Oleg Drokin [Fri, 28 Jun 2024 18:07:09 +0000 (14:07 -0400)]
New release 2.15.5

Change-Id: Ibdf3d3d3d405f49a148da2fff4eb35ae50bce7dd
Signed-off-by: Oleg Drokin <green@whamcloud.com>
11 months agoNew RC 2.15.5-RC3 2.15.5-RC3 v2_15_5-RC3
Oleg Drokin [Wed, 26 Jun 2024 18:49:41 +0000 (14:49 -0400)]
New RC 2.15.5-RC3

Change-Id: I3eeee228c9747b1e09d0370235739891b220eb14
Signed-off-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-16341 quota: fix panic in qmt_site_recalc_cb 18/55518/3
Sergey Cheremencev [Fri, 24 Jun 2022 20:38:29 +0000 (23:38 +0300)]
LU-16341 quota: fix panic in qmt_site_recalc_cb

The panic occurred due to an empty qit_lqes array after
qmt_pool_lqes_lookup_spec. This can happen if the
global lqe is not enforced. Return -ENOENT from
qmt_pool_lqes_lookup_spec if no lqes have been added.

This fixes the following panic:

    BUG: unable to handle NULL pointer dereference at 00000000000000f8
    ...
    RIP: 0010:qmt_site_recalc_cb+0x2ec/0x780 [lquota]
    ...
    cfs_hash_for_each_tight at ffffffffc0c72c81 [libcfs]
    qmt_pool_recalc at ffffffffc142dec7 [lquota]
    kthread at ffffffffb45043a6
    ret_from_fork at ffffffffb4e00255

Add test sanity-quota_14 that reproduces the above panic without the fix,
but skip it for older MDS that do not have this fix.
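
The "return -ENOENT when nothing was added" pattern from the fix can be
sketched in userspace as follows. All type and function names are
hypothetical, and the selection rule (only enforced entries are
collected) is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_LQES 4

struct sketch_lqe {
	int enforced;
	int limit;
};

/* Hypothetical iterator holding the entries found by the lookup. */
struct sketch_lqe_iter {
	struct sketch_lqe *lqes[SKETCH_MAX_LQES];
	int cnt;
};

/* Collect enforced entries. Report -ENOENT instead of success when the
 * array ends up empty, so callers never index into an empty array. */
static int sketch_lookup(struct sketch_lqe *pool, int n,
			 struct sketch_lqe_iter *it)
{
	int i;

	assert(it != NULL);
	it->cnt = 0;
	for (i = 0; i < n && it->cnt < SKETCH_MAX_LQES; i++)
		if (pool[i].enforced)
			it->lqes[it->cnt++] = &pool[i];
	return it->cnt ? 0 : -ENOENT;
}

/* Caller: only dereferences the first entry after a successful lookup. */
static int sketch_first_limit(struct sketch_lqe *pool, int n, int *limit)
{
	struct sketch_lqe_iter it;
	int rc = sketch_lookup(pool, n, &it);

	if (rc)
		return rc;
	*limit = it.lqes[0]->limit;
	return 0;
}
```

Without the error return, the caller in this model would read
it.lqes[0] from an empty array, which mirrors the kind of invalid
access the fix avoids.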

Lustre-change: https://review.whamcloud.com/49241
Lustre-commit: dfe7d2dd2b0d4c0c08faa613f44d2ab1f74c7420

HPE-bug-id: LUS-11007
Change-Id: Ie51396269fae7ed84379bef5fc964cce789eba7c
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55518
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-16709 lnet: fix locking multiple NIDs of the MR peer 86/55486/3
Serguei Smirnov [Tue, 4 Apr 2023 21:02:51 +0000 (14:02 -0700)]
LU-16709 lnet: fix locking multiple NIDs of the MR peer

If Lustre identifies the same peer with multiple NIDs,
as a result of peer discovery it is possible that
the discovered peer is found to contain a NID which is locked
as primary by a different existing peer record.
In this case it is safe to merge the peer records,
but the NID which got locked the earliest should be
kept as primary.

This allows the first of the two locked NIDs
to stay primary, as intended for the purpose of communicating
with Lustre, even if peer discovery succeeded
using a different NID of the MR peer.
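
The merge rule can be illustrated with a small userspace sketch. The
record layout, the lock_order field and the function name are
assumptions for illustration only; the sketch does not describe how
LNet actually tracks which NID was locked first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peer record for the sketch. */
struct sketch_peer {
	uint64_t primary_nid;	/* opaque NID value */
	int locked;		/* primary NID is locked */
	unsigned int lock_order;	/* lower value = locked earlier */
};

/* Pick the primary NID to keep when two peer records are merged:
 * if both primaries are locked, the one locked earliest wins. */
static uint64_t sketch_merged_primary(const struct sketch_peer *a,
				      const struct sketch_peer *b)
{
	assert(a != NULL && b != NULL);
	if (a->locked && b->locked)
		return a->lock_order <= b->lock_order ?
		       a->primary_nid : b->primary_nid;
	if (b->locked)
		return b->primary_nid;
	return a->primary_nid;
}
```

In this model, a record discovered through a later-locked NID does not
displace the NID that was locked first.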

Lustre-change: https://review.whamcloud.com/50530
Lustre-commit: 3b7a02ee4d656b7b3e044713681da2f56dddb152

Fixes: aacb16191a ("LU-14668 lnet: Lock primary NID logic")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iec9f8b70053fe24cddee552358500dfad0234b7f
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55486
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>