Whamcloud - gitweb
fs/lustre-release.git
2 years ago LU-16285 ldlm: send the cancel RPC asap
Yang Sheng [Sat, 14 Jan 2023 17:56:14 +0000 (01:56 +0800)]
LU-16285 ldlm: send the cancel RPC asap

This patch tries to send the cancel RPC as soon as possible
when a bl_ast is received from the server. One existing
problem is that a lock may already have been added to the
regular queue, for some other reason, before the bl_ast
arrives, which prevents the lock from being canceled in a
timely manner. The other problem is that we collect many
locks into one RPC to save network traffic, but this
process can take a long time while dirty pages are flushed.

 - Process the lock cancel even if the lock has already
   been added to the bl queue when the bl_ast arrives,
   unless the cancel RPC has already been sent.
 - Send the cancel RPC immediately for a bl_ast lock. Don't
   try to add more locks in that case.
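
The batching rule in the two items above can be pictured with a small toy model. This is only an illustrative sketch, not the ldlm code: the `Lock` class, `build_cancel_batch()` and the `max_batch` limit are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Lock:
    name: str
    bl_ast: bool = False        # a blocking AST arrived for this lock
    cancel_sent: bool = False   # the cancel RPC has already gone out


def build_cancel_batch(lock, lru_candidates, max_batch=4):
    """Return the locks that would go into one cancel RPC (toy model)."""
    if lock.cancel_sent:
        return []               # nothing left to do for this lock
    if lock.bl_ast:
        return [lock]           # the server is waiting: send it alone, now
    batch = [lock]
    for other in lru_candidates:
        if len(batch) >= max_batch:
            break
        if other is not lock and not other.cancel_sent:
            batch.append(other)  # piggyback idle locks to save RPCs
    return batch


urgent = Lock("A", bl_ast=True)
idle = [Lock("B"), Lock("C")]
urgent_batch = build_cancel_batch(urgent, idle)
regular_batch = build_cancel_batch(Lock("D"), idle)
```

In this model only the lock without a pending bl_ast pays the cost of collecting extra locks.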

Lustre-change: https://review.whamcloud.com/49527
Lustre-commit: b65374d96b2027213f253e128d3e5b3943ff2240

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ie5efff3f1ed4e46448371185a0c08968233e7644
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49651
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16445 sec: make nodemap root squash independent of map_mode
Sebastien Buisson [Thu, 5 Jan 2023 14:06:39 +0000 (15:06 +0100)]
LU-16445 sec: make nodemap root squash independent of map_mode

When the admin property is set to 0 on a nodemap, the root user must
be squashed, even if the map_mode property specifies to not map uids
or gids.
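
As a rough illustration of the intended ordering of checks, the sketch below squashes root before it even looks at the mapping mode. It is a simplified model, not the nodemap implementation: `map_client_uid()`, its parameters and the squash value 65534 are assumptions made for the example.

```python
def map_client_uid(uid, *, admin, map_uids, squash_uid, idmap):
    """Toy model of the uid a server would see for a client uid."""
    if uid == 0 and not admin:
        return squash_uid        # root squash applies first, always
    if not map_uids:
        return uid               # map_mode says: do not map uids
    return idmap.get(uid, squash_uid)


root_seen_as = map_client_uid(0, admin=False, map_uids=False,
                              squash_uid=65534, idmap={})
user_seen_as = map_client_uid(1000, admin=False, map_uids=False,
                              squash_uid=65534, idmap={})
```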

Enhance sanity-sec test_17 to exercise this use case.

Lustre-change: https://review.whamcloud.com/49561
Lustre-commit: 1335eb1d599ceb6423de6800e0995614cdb37bd8

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1b41caa1ccc6e544ce9fac45b47d0c4c129221f7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49797
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago EX-6791 lipe: lamigo, lpurge are not for customer use
Alexandre Ioffe [Thu, 26 Jan 2023 22:11:04 +0000 (14:11 -0800)]
EX-6791 lipe: lamigo, lpurge are not for customer use

The help text of lamigo and lpurge now notes that they are not
intended for direct customer use.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: I36ba2da080156da2d62ffa215cd7eb98b5c10adc
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49794
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn74
Andreas Dilger [Wed, 25 Jan 2023 03:23:00 +0000 (20:23 -0700)]
RM-620 build: New tag 2.14.0-ddn74

New tag 2.14.0-ddn74

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0f3d9cfe21d8b9a3829edcc357d59a24b1e3084e

2 years ago LU-16228 tests: skip sanity/205e in interop tests
Andreas Dilger [Wed, 25 Jan 2023 00:51:18 +0000 (17:51 -0700)]
LU-16228 tests: skip sanity/205e in interop tests

Add a version check to sanity.sh test_205e and update the check
in test_205d to match the actual patch version that lljobstat
was landed in, so that it is not run during interop testing.

Test-Parameters: trivial testlist=sanity env=ONLY=205 serverversion=EXA6.1.0
Fixes: e9f9822822 ("LU-16228 utils: add lljobstat util")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I34d517c4b33e88f316cedbd94c8f48ace63ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49756
Tested-by: jenkins <devops@whamcloud.com>
2 years ago LU-16486 kernel: kernel update RHEL8.7 [4.18.0-425.10.1.el8_7]
Jian Yu [Thu, 19 Jan 2023 20:27:45 +0000 (12:27 -0800)]
LU-16486 kernel: kernel update RHEL8.7 [4.18.0-425.10.1.el8_7]

Update RHEL8.7 kernel to 4.18.0-425.10.1.el8_7.

Lustre-change: https://review.whamcloud.com/49683
Lustre-commit: TBD (from 390b84b102f63ab8daade91b4a34960d097028d1)

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Change-Id: I5759d0cb06a1148689ed9b8c947cb6516ab3aca1
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49708
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16490 kernel: kernel update RHEL 7.9 [3.10.0-1160.81.1.el7]
Jian Yu [Thu, 19 Jan 2023 20:30:18 +0000 (12:30 -0800)]
LU-16490 kernel: kernel update RHEL 7.9 [3.10.0-1160.81.1.el7]

Update RHEL 7.9 kernel to 3.10.0-1160.81.1.el7.

Lustre-change: https://review.whamcloud.com/49684
Lustre-commit: TBD (from 0a6b9460584046c0344204ad5169efac4d791e59)

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I46f556f327d92fde17790e223187df5b1c33d2c1
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16165 sec: retry mechanism for identity cache
Sebastien Buisson [Fri, 16 Sep 2022 16:02:51 +0000 (18:02 +0200)]
LU-16165 sec: retry mechanism for identity cache

Implement a retry mechanism in the identity cache in case the
identity upcall times out.
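
The general shape of such a retry loop might look like the sketch below. It is not the identity cache code: `lookup_identity()`, the `retries` count and the `flaky_upcall()` stand-in are hypothetical and exist only to show the pattern.

```python
def lookup_identity(uid, upcall, retries=3):
    """Call upcall(uid), retrying when it times out (illustrative only)."""
    last_error = None
    for _attempt in range(1 + retries):
        try:
            return upcall(uid)
        except TimeoutError as err:
            last_error = err     # remember the failure and try again
    raise last_error


calls = {"n": 0}


def flaky_upcall(uid):
    """Stand-in upcall that times out twice before answering."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("identity upcall timed out")
    return {"uid": uid, "gid": uid}


identity = lookup_identity(500, flaky_upcall)
```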

Lustre-change: https://review.whamcloud.com/48579
Lustre-commit: 61c3b3a9bb848e256845462ffd79b15565cd23ad

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ib70d3b851a6da3cf66dfed49b03be51da7886d01
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49747
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago EX-6692 tests: run LFSCK in sanity-lipe-scan3/find3
Jian Yu [Mon, 23 Jan 2023 19:43:02 +0000 (11:43 -0800)]
EX-6692 tests: run LFSCK in sanity-lipe-scan3/find3

Some tests that run "lfs rm_entry" leave the FID in
.lustre/fid/ undeleted, which causes sanity-lipe-scan3
and sanity-lipe-find3 to fail. Since it is not possible to
list the .lustre/fid/ directory, we have to run LFSCK to
link the FID back into .lustre/lost+found in order to
check whether the file system needs to be reformatted.

Test-Parameters: trivial testlist=sanityn,sanity-lipe-scan3
Test-Parameters: trivial testlist=sanityn,sanity-lipe-find3

Fixes: 933691b3d7 ("EX-6692 tests: clean up test env in sanity-lipe-scan3.sh")
Fixes: de1bb57641 ("EX-6170 tests: make sanity-lipe-scan3.sh support remote MDS")
Fixes: 8b572c4de0 ("EX-6169 lipe: sanity-lipe-find3 reformat to clean lost+found")
Change-Id: If23479fc222052e25a7f21bcb70003c5176247b6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49674
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16026 llite: always enable remote subdir mount
Lai Siyao [Sun, 28 Aug 2022 19:33:29 +0000 (15:33 -0400)]
LU-16026 llite: always enable remote subdir mount

For historical reasons, ROOT is revalidated with IT_LOOKUP in
.permission to ensure its permissions are up to date, because ROOT is
never looked up. But the ROOT FID and layout cannot change; it is the
PERM lock that should be revalidated, i.e., revalidate with
IT_GETATTR instead of IT_LOOKUP.

Since the PERM|UPDATE lock is on the MDT where the object is located,
the client can cache this lock, so a remote subdir mount doesn't need
to look up ROOT on each file access.

Deprecate mdt.*.enable_remote_subdir_mount.

Per http://review.whamcloud.com/19195, replace 'df' with 'lfs df' in
sanity 228b since the former doesn't support transparent recovery.

Add sanity 247h.

Lustre-change: https://review.whamcloud.com/48535
Lustre-commit: 6f490275b0e0455a431707775d685fb3df1d322d

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I66f8ee347f6c01a8a154245b10a1d93539ea13b8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49673
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15082 osp: invalidate statfs data from the timer callback
Alex Zhuravlev [Tue, 12 Oct 2021 05:26:21 +0000 (08:26 +0300)]
LU-15082 osp: invalidate statfs data from the timer callback

osp_statfs_timer_cb() can be called just before the statfs data gets
stale. This in turn may cause an early wakeup of the precreate thread,
which would find the statfs data still up to date and go back to sleep.
If no precreate happens on this OSP (e.g. due to current space
usage), then the precreate thread will stay asleep for a long time,
the statfs data won't get refreshed, and this may block new objects
on the corresponding OST.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I86e16eed6f1068702db696a9ddec7a22994180b7
Reviewed-on: https://review.whamcloud.com/45199
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49694

2 years ago LU-16380 osd-ldiskfs: race in OI mapping
Lai Siyao [Sat, 17 Dec 2022 13:06:16 +0000 (08:06 -0500)]
LU-16380 osd-ldiskfs: race in OI mapping

There is a race between the OI scrub thread and OI mapping entry
insertion, which may add an inconsistent OI mapping entry without
starting the OI scrub thread. This may lead to osd_fid_lookup() always
returning -EINPROGRESS.

To avoid this race, osd_fid_lookup() returns -EINPROGRESS only when
the OI mapping is inconsistent and the OI scrub thread is not running.

Lustre-change: https://review.whamcloud.com/49514
Lustre-commit: 43fe6e51804f8fb4cca4445be576233595e27b42

Fixes: 558784caad ("LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs")
Test-Parameters: mdscount=2 mdtcount=4 testlist=conf-sanity env=ONLY=108b,ONLY_REPEAT=50
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I05114b6a33940c210e9952f6e24f6c36fd7f76a2
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49719
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16335 test: add fail_abort_cleanup()
Lai Siyao [Wed, 7 Dec 2022 04:04:42 +0000 (23:04 -0500)]
LU-16335 test: add fail_abort_cleanup()

Add helper fail_abort_cleanup() to unlink test directories (call lfs
rm_entry if directory is broken) after fail_abort because after
LU-16159 update logs will be canceled upon recovery abort, which may
leave broken directories.

Update replay-single.sh in places where fail_abort is called and
directory may become broken.

Lustre-change: https://review.whamcloud.com/49335
Lustre-commit: d5fe41a02a6ed57bcbfc4a4c695bb509c9c7c313

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I260689b1a6fa5b0b4db5aab5095cb062ae57d612
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49713
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16335 mdt: skip target check for rm_entry
Lai Siyao [Wed, 7 Dec 2022 02:53:25 +0000 (21:53 -0500)]
LU-16335 mdt: skip target check for rm_entry

For "lfs rm_entry", the target may not exist, so the sanity check of
it may fail and cause rm_entry to fail.

Add sanity 832.

Lustre-change: https://review.whamcloud.com/49329
Lustre-commit: ae98c5fdaaf37daeb328b7110cbcf42754752c9d

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I824c7581af05c7494cf03c0c9bc999ca1abfec01
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49712
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16335 build: remove _GNU_SOURCE dependency in lustre_user.h
Lai Siyao [Thu, 1 Dec 2022 08:17:00 +0000 (03:17 -0500)]
LU-16335 build: remove _GNU_SOURCE dependency in lustre_user.h

The lustre_user.h header uses the non-standard strchrnul() function
in userspace.  This always causes the LC_IOC_REMOVE_ENTRY configure
check to fail, so that "lfs rm_entry" always returns -ENOTSUP.

Implement an alternative approach to avoid external dependencies on
the lustre_user.h header.  Also, LC_IOC_REMOVE_ENTRY is itself
unnecessary, since the code can check for LL_IOC_REMOVE_ENTRY directly.

Replace the NFS-specific -ENOTSUP error return code with -EOPNOTSUPP.

Fix the compile test_400[ab] checks to not use "-std=c99" to verify
that the uapi headers are usable without this dependency.

Lustre-change: https://review.whamcloud.com/49328
Lustre-commit: efc5c8d4de60d394344506f7cfb188eaf04a4bac

Fixes: b59835f8b6 ("LU-13903 utils: have liblustreapi support Linux client")
Fixes: 7a7309fa84 ("LU-13274 uapi: make lustre UAPI headers C99 compliant")
Fixes: 6331eadbd6 ("LU-15420 uapi: avoid gcc-11 -Werror=stringop-overread")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If42743a2148c317b8a9b701ceb5d08bac5149f5f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49711
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16159 lod: cancel update llogs upon recovery abort
Lai Siyao [Sun, 28 Aug 2022 18:35:25 +0000 (14:35 -0400)]
LU-16159 lod: cancel update llogs upon recovery abort

If recovery is aborted, cancel the update catalogs from the catlist,
but keep them on disk for some time (for debugging purposes). This
avoids accumulating stale update records, and also avoids recovery
problems if the update llogs are corrupt.

Update llogs are canceled after recovery completes and before regular
request processing. For these logs, the ctime will be set and the log
header will be marked with LLOG_F_MAX_AGE|LLOG_F_RM_ON_ERR, and once
30 days have passed they will be removed automatically.
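
The age rule amounts to a simple comparison against the log's ctime. The sketch below only illustrates that arithmetic: `llog_expired()` and `LLOG_MAX_AGE_SECONDS` are hypothetical names, and treating exactly 30 days as expired is an assumption.

```python
import time

# The 30 days named in the commit text, expressed in seconds.
LLOG_MAX_AGE_SECONDS = 30 * 24 * 60 * 60


def llog_expired(ctime, now=None):
    """True once a canceled update llog is old enough to be removed."""
    if now is None:
        now = time.time()
    return now - ctime >= LLOG_MAX_AGE_SECONDS
```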

Tidy up recovery abort code:
* if obd_abort_recovery is set, or OBD is stopping, stop both
  client recovery and MDT recovery.
* otherwise if obd_abort_mdt_recovery is set, stop MDT recovery only.
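
The two rules above form a small decision table, which could be written as follows. This is a sketch of the stated rules only: `recoveries_to_stop()` and its boolean arguments are hypothetical stand-ins for the obd_abort_recovery and obd_abort_mdt_recovery flags and the "OBD is stopping" state.

```python
def recoveries_to_stop(abort_recovery, abort_mdt_recovery, stopping):
    """Return which recoveries get stopped, per the rules listed above."""
    if abort_recovery or stopping:
        return {"client", "mdt"}   # stop both recoveries
    if abort_mdt_recovery:
        return {"mdt"}             # stop MDT recovery only
    return set()                   # nothing was aborted
```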

lctl llog_print supports printing the update log FIDs used by the
specified MDT:
* "lctl --device <MDT> llog_print update_log" will list all update
  llog FIDs used by this MDT device.

Disabled replay-single.sh 100c stripe check because abort_recovery
will cancel update llogs, and won't replay them upon next recovery.

Added replay-single.sh 100d.

Run formatall at the end of replay-single.sh because directory unlink
may fail.

Lustre-change: https://review.whamcloud.com/48584
Lustre-commit: b054fcd7852f6a22f8ec469ce94ddf6f3331ab34

Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie2bda6c097d65f5c51cba66c2dbf6ae4a5d36dda
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49403
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16228 utils: add lljobstat util
Lei Feng [Mon, 17 Oct 2022 05:36:14 +0000 (13:36 +0800)]
LU-16228 utils: add lljobstat util

The lljobstat utility reads data from job_stats file(s), then
parses and aggregates the data and lists the top jobs.

For example:
$ ./lljobstats -n 1 -c 3
---
timestamp: 1665984678
top_jobs:
- ll_sa_3508505.0: {ops: 64, ga: 64}
- touch.500:       {ops: 6, op: 1, cl: 1, mn: 1, ga: 1, sa: 2}
- bash.0:          {ops: 3, ga: 3}
...
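
The aggregation step can be sketched in a few lines. This is not the lljobstat source: `top_jobs()` is a hypothetical helper, the input is assumed to be already parsed into dictionaries, and the split of the sample counts across two files is invented so that the totals match the example output above.

```python
from collections import Counter, defaultdict


def top_jobs(stat_files, count=3):
    """Sum per-job operation counts over all files and rank by total ops."""
    totals = defaultdict(Counter)
    for jobs in stat_files:
        for job in jobs:
            totals[job["job_id"]].update(job["ops"])
    ranked = sorted(totals.items(),
                    key=lambda item: sum(item[1].values()),
                    reverse=True)
    return [(job_id, sum(ops.values()), dict(ops))
            for job_id, ops in ranked[:count]]


# Two pretend job_stats files, e.g. from two different targets.
file_a = [
    {"job_id": "ll_sa_3508505.0", "ops": {"ga": 40}},
    {"job_id": "touch.500",
     "ops": {"op": 1, "cl": 1, "mn": 1, "ga": 1, "sa": 2}},
]
file_b = [
    {"job_id": "ll_sa_3508505.0", "ops": {"ga": 24}},
    {"job_id": "bash.0", "ops": {"ga": 3}},
    {"job_id": "cp.0", "ops": {"ga": 1}},
]
top = top_jobs([file_a, file_b], count=3)
```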

Includes part of "LU-16110 lprocfs: make job_stats and
rename_stats valid YAML" to make rename_stats valid
and verify the YAML output.

Includes "LU-16459 tests: fix YAML verification function"
to fix the test case of LU-16110.

Lustre-change: https://review.whamcloud.com/48888
Lustre-commit: TBD (from 08836199bbd26bdc1a800f5710691d9b6723b1eb)

Change-Id: I0c4ac619496c184a5aebbaf8674f5090ab722d72
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49560
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn73
Andreas Dilger [Tue, 17 Jan 2023 19:44:29 +0000 (12:44 -0700)]
RM-620 build: New tag 2.14.0-ddn73

New tag 2.14.0-ddn73

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I654b9727f7a52d24b81d07aa1b001567ad064681

2 years ago EX-6692 tests: clean up test env in sanity-lipe-scan3.sh
Jian Yu [Tue, 17 Jan 2023 07:00:05 +0000 (23:00 -0800)]
EX-6692 tests: clean up test env in sanity-lipe-scan3.sh

This patch reformats the file system in sanity-lipe-scan3.sh
to clean up the test env before running subtests.

Test-Parameters: trivial testlist=sanity-lipe-scan3
Test-Parameters: trivial mdscount=2 mdtcount=4 \
testlist=sanity-lipe-scan3

Fixes: de1bb57641 ("EX-6170 tests: make sanity-lipe-scan3.sh support remote MDS")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I301149f7f585716fcb39ba9065c2f372fb075344
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49621
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16385 obdlcass: stop MGC before MGS
Alex Zhuravlev [Mon, 12 Dec 2022 13:35:41 +0000 (16:35 +0300)]
LU-16385 obdlcass: stop MGC before MGS

Drop the reference to the MGC when the MGS is being unmounted, so that
the MGC doesn't try to disconnect from a missing MGS, which
can take a long time and hurt HA.

Lustre-change: https://review.whamcloud.com/49378
Lustre-commit: 817184a9788ae399dcd5cf53ae7c9801e4778a43

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib15f1ca56c47201bf6e29c12b3f81a11e55944ca
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49641
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-14377 tests: make parallel-scale/rr_alloc less strict
Andreas Dilger [Wed, 19 Oct 2022 00:37:58 +0000 (18:37 -0600)]
LU-14377 tests: make parallel-scale/rr_alloc less strict

test_rr_alloc() sometimes fails with a difference of 3-4 objects
per OST, after creating 1500+ objects on each OST.  This should
not be considered fatal.  Make the test more lenient, and allow
a difference of up to 0.3% of objects between the OSTs.
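
For a sense of scale, 0.3% of roughly 1500 objects is about 4.5 objects, which covers the 3-4 object differences mentioned above. The check could be expressed as in this sketch; `rr_alloc_balanced()` is a hypothetical helper, and measuring the tolerance against the largest per-OST count is an assumption, not necessarily what the test script does.

```python
def rr_alloc_balanced(objects_per_ost, tolerance=0.003):
    """True if the spread between OSTs stays within the tolerance."""
    spread = max(objects_per_ost) - min(objects_per_ost)
    return spread <= tolerance * max(objects_per_ost)


within_limit = rr_alloc_balanced([1500, 1503, 1504])   # spread of 4
beyond_limit = rr_alloc_balanced([1500, 1510, 1504])   # spread of 10
```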

Fix some code style issues in the test.

Lustre-change: https://review.whamcloud.com/48914
Lustre-commit: b104c0a27713899a4d047f56fed57c30c39b8195

Test-Parameters: trivial testlist=parallel-scale env=ONLY=rr_alloc
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib6ba8c5d8e9d3245833448a52f8ed25308698a33
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49607
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-14938 tests: fail_abort() in t-f to take care of MDTs
Alex Zhuravlev [Mon, 16 Aug 2021 17:22:00 +0000 (20:22 +0300)]
LU-14938 tests: fail_abort() in t-f to take care of MDTs

fail_abort() in test-framework ensures that the clients
are back after evictions. The same should be done for
MDTs, as otherwise any subsequent test may fail due to
another MDT observing the eviction and interrupting the
current request with -EIO.

Lustre-change: https://review.whamcloud.com/44671
Lustre-commit: 436cd4fd21ffee5830c9b4e75055db80c47547d5

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0a00ece52d28c6d28eef029a4f87a348efaa041c
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49598
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16210 llite: replace selinux_is_enabled()
Etienne AUJAMES [Thu, 6 Oct 2022 13:30:54 +0000 (15:30 +0200)]
LU-16210 llite: replace selinux_is_enabled()

selinux_is_enabled() was removed from kernel 5.1.
Commit 39e5bfa added support for such kernels by assuming SELinux is
enabled if the function selinux_is_enabled() does not exist.

This has a performance impact: on older kernels (e.g. CentOS 7),
getxattr RPCs were not sent for "security.selinux" if SELinux was
disabled. Utilities like "ls -l" always try to get "security.selinux".
See LU-549 for more information.

This patch uses security_inode_listsecurity() when mounting the
client to find out whether an LSM module (SELinux) requires an xattr
to store file contexts. If an xattr is returned, we store it and use
it for the security context sent in requests.

For getxattr/setxattr we use the stored LSM xattr to filter security
context xattrs like security.selinux. If an xattr does not match the
stored xattr name, we return -EOPNOTSUPP to userspace.
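
The filtering idea can be modelled as follows. This is a loose sketch rather than the llite code: `check_security_xattr()` and the `SECURITY_CONTEXT_XATTRS` set are hypothetical, and only "security.selinux" is treated as a security context xattr here.

```python
import errno

# Hypothetical set of xattr names that carry an LSM security context.
SECURITY_CONTEXT_XATTRS = {"security.selinux"}


def check_security_xattr(name, stored_lsm_xattr):
    """Return 0 if the xattr request may proceed, else -EOPNOTSUPP."""
    if name not in SECURITY_CONTEXT_XATTRS:
        return 0                      # not a security context xattr
    if name == stored_lsm_xattr:
        return 0                      # matches the xattr found at mount
    return -errno.EOPNOTSUPP          # e.g. SELinux disabled on the client


selinux_off = check_security_xattr("security.selinux", None)
selinux_on = check_security_xattr("security.selinux", "security.selinux")
```

With no LSM xattr stored at mount time, the request is rejected locally, so no getxattr RPC is needed for it.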

It also adds the s_security check for security_inode_notifysecctx() to
avoid calling this function if SELinux is disabled (as in
nfs_setsecurity()).

For the "Enforcing SELinux Policy Check" functionality, the SELinux
check has been moved into l_getsepol: -ENODEV is returned if SELinux
is disabled.

Add a regression test "sanity test_434" for this use case.

*Note:*
This patch detects that SELinux is disabled even if it has not been
explicitly disabled on the kernel cmdline. This is recommended for
RHEL >= 8.5.

*Performance:*
Tested with "strace -c ls -l" with 100000 files on root in a multi-VM
env (on Rocky 9). The FS is remounted for each test (so the cache is
cleared) and SELinux is disabled.
 __________________ ___________ _________
| Total time %     | lgetxattr | statx   |
|__________________|___________|_________|
|Without the patch:|    29%    |   51%   |
|__________________|___________|_________|
|With the patch:   |    0%     |   87%   |
|__________________|___________|_________|
"ls -l" uses lgetxattr to get "security.selinux".

Linux-commit: 3d252529480c68bfd6a6774652df7c8968b28e41

Lustre-change: https://review.whamcloud.com/48875
Lustre-commit: 1d8faaf6caf4acaf0e2d4943b51c024a96c80624

Fixes: 39e5bfa ("LU-12355 llite: include file linux/selinux.h removed")
Fixes: 9bcac0b ("LU-549 llite: Improve statfs performance if selinux is disabled")
Test-Parameters: clientselinux=false clientdistro=el7.9 testlist=sanity env=ONLY=434,ONLY_REPEAT=20
Test-Parameters: clientselinux=false clientdistro=el8.5 testlist=sanity env=ONLY=434,ONLY_REPEAT=20
Test-Parameters: clientselinux=false clientdistro=el8.6 testlist=sanity env=ONLY=434,ONLY_REPEAT=20
Test-Parameters: clientselinux clientdistro=el8.6 testlist=sanity-selinux
Test-Parameters: clientselinux clientdistro=el8.6 testlist=sanity-selinux
Test-Parameters: clientselinux clientdistro=el7.9 testlist=sanity-selinux
Test-Parameters: clientselinux clientdistro=el7.9 testlist=sanity-selinux
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I4dac87ac0341b45a1c2fef836cdce0361017b3f5
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49628
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16114 build: Update security_dentry_init_security args
Shaun Tancheff [Sun, 28 Aug 2022 14:38:39 +0000 (21:38 +0700)]
LU-16114 build: Update security_dentry_init_security args

Linux commit v5.15-rc1-20-g15bf32398ad4
   security: Return xattr name from security_dentry_init_security()

Adjust security_dentry_init_security() calls accordingly

Lustre-change: https://review.whamcloud.com/48359
Lustre-commit: 88bccc4fa4dd7310560f588c730eefedf423c515

Test-Parameters: trivial
HPE-bug-id: LUS-11188
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I42d3307f7fe0d2412381363f60ac5b3df2d5891a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6571 tests: Added tests for sanity-lipe-find3
Alexandre Ioffe [Thu, 12 Jan 2023 23:51:12 +0000 (15:51 -0800)]
EX-6571 tests: Added tests for sanity-lipe-find3

Added sanity tests for lipe_find3
-stripe-count
-mirror-count
-path
-ipath

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I1180bf0b667372dc9f0d48e4fbf89fbaaca7fdd7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15828 o2iblnd: reset hiw proportionally
Serguei Smirnov [Thu, 22 Dec 2022 22:42:48 +0000 (14:42 -0800)]
LU-15828 o2iblnd: reset hiw proportionally

As a result of connection negotiation, the queue depth may end up
being smaller than the value of the "peer_tx_credits" tunable. Before
this patch, the high-water mark "lnd_peercredits_hiw" would be set at
min(current hiw, queue depth - 1).

For example, considering that hiw is allowed to only be as low as
half of peer_tx_credits, negotiating queue_depth/peer_credits down
from 32 to 8 would always result in hiw set at 7, i.e. credits would
be released as late as possible.

With this patch, if queue depth is reduced, hiw is set proportionally
relative to the level it was at before:
hiw = (queue_depth * lnd_peercredits_hiw) / peer_tx_credits

Using the above example with queue depth initially at 32, negotiating
down to 8 would result in hiw set to 4 if "lnd_peercredits_hiw" is
initially at 16, 17, 18, 19; hiw set to 5 if "lnd_peercredits_hiw" is
initially at 20, 21, 22, 23, and so on.
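
The arithmetic in this example can be checked with a few lines of integer math. The function names `old_hiw()` and `new_hiw()` are made up for this sketch; the formulas themselves are the ones quoted in the commit text, with integer (floor) division assumed.

```python
def old_hiw(current_hiw, queue_depth):
    """High-water mark before the patch."""
    return min(current_hiw, queue_depth - 1)


def new_hiw(lnd_peercredits_hiw, peer_tx_credits, queue_depth):
    """High-water mark scaled proportionally to the negotiated depth."""
    return (queue_depth * lnd_peercredits_hiw) // peer_tx_credits


# Queue depth negotiated down from 32 to 8, for each allowed initial hiw.
before = {hiw: old_hiw(hiw, 8) for hiw in range(16, 24)}
after = {hiw: new_hiw(hiw, 32, 8) for hiw in range(16, 24)}
```

Here `before` maps every initial value to 7, while `after` maps 16-19 to 4 and 20-23 to 5, matching the figures in the description.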

Lustre-change: https://review.whamcloud.com/49497
Lustre-commit: e1944c29793d489429730a9445e243b448c3d751

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I633933d7448db1ca88d3c65de9c29e870ca2c9fb
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49637
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago EX-6713 doc: man pages for asynchronous PCCRO attachment
Qian Yingjin [Mon, 16 Jan 2023 03:06:50 +0000 (22:06 -0500)]
EX-6713 doc: man pages for asynchronous PCCRO attachment

This patch updates the man pages for asynchronous PCCRO attachment
for the "lfs pcc attach -A" command.

Change-Id: I7757a9d0b66a3586abdc9053b73d69944561ffbd
Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn72
Andreas Dilger [Thu, 12 Jan 2023 01:12:44 +0000 (18:12 -0700)]
RM-620 build: New tag 2.14.0-ddn72

New tag 2.14.0-ddn72

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I61033de82c0777ad4acc90ca6794780f71057e0d

2 years ago EX-6682 build: add json-c-devel into lustre-dkms.spec.in
Jian Yu [Tue, 10 Jan 2023 20:02:01 +0000 (12:02 -0800)]
EX-6682 build: add json-c-devel into lustre-dkms.spec.in

While installing the client DKMS package, the json-c-devel package
is required. This patch adds the package requirement to
lustre-dkms.spec.in.

Test-Parameters: trivial

Fixes: fbfd2d0755 ("EX-5176 pcc: use JSON string for trusted.pin xattr")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I72f7e23a8c1ec9edecfc69b2e8dda758f215b4e2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49594
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16444 enc: null-enc names cannot be digested form
Sebastien Buisson [Wed, 4 Jan 2023 15:10:02 +0000 (16:10 +0100)]
LU-16444 enc: null-enc names cannot be digested form

When encrypted files have their names encrypted, long names are in
digested form in case access is done without the encryption key. The
digest is base64-encoded, and prepended with '_'.
With null encryption for file names, names are always plain text. In
this case, a legitimate '_' at the start of a name must not be
interpreted as a digested form.
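
The rule can be summarised as a small predicate. This is only a model of the description above, not the client code: `may_be_digested()` and its flags are hypothetical names.

```python
def may_be_digested(name, *, has_key, null_name_enc):
    """Whether a leading '_' may mark a digested (base64) file name."""
    if null_name_enc:
        return False      # names are always plain text in this mode
    if has_key:
        return False      # with the key, names are not shown digested
    return name.startswith("_")


plain_underscore = may_be_digested("_notes.txt", has_key=False,
                                   null_name_enc=True)
```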

sanity-sec test_54 is improved to test the case of a file whose name
starts with '_'.

Lustre-change: https://review.whamcloud.com/49550
Lustre-commit: TBD (5487e006b1ca152be665729a4fdf273c6109f0f4)

Fixes: f18c87cb53 ("LU-13717 sec: handle null algo for filename encryption")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idaad186afd06cfbabbe1d13e78f083d12876c8ff
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49552
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16160 revert: "llite: clear stale page's uptodate bit"
Andreas Dilger [Fri, 6 Jan 2023 18:19:33 +0000 (18:19 +0000)]
LU-16160 revert: "llite: clear stale page's uptodate bit"

This reverts commit 451b4ac514dd03c4fe91726da2f95a1f5575a5a6,
which caused a bug where cl_page_own() races with ll_releasepage(),
as well as a cl_pagevec_put() assertion failure.

Lustre-change: https://review.whamcloud.com/49541
Lustre-commit: TBD (from ef330e09a59da0df2de153ecdb2e7d8729cd6b63)

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Icdb8c60f4d992c9976670e1b06c5bab5ef3a3954
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49576
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6220 lipe: lipe_find3 has no mirror-count and stripe-count
Alexandre Ioffe [Sat, 7 Jan 2023 06:07:00 +0000 (22:07 -0800)]
EX-6220 lipe: lipe_find3 has no mirror-count and stripe-count

Added mirror-count and stripe-count search options

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I79c3b1cd0b1759abce248bee73676a823441825c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49578
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoLU-14645 tests: test lfs setdirstripe with '/$'
Jian Yu [Tue, 27 Dec 2022 01:13:31 +0000 (17:13 -0800)]
LU-14645 tests: test lfs setdirstripe with '/$'

This patch improves one of the lfs setdirstripe tests to
verify that a directory name ending with '/' also works.

Lustre-change: https://review.whamcloud.com/49463
Lustre-commit: 4b9a39d3ed58a664a2498911ca1d3c9073c13bd3

Test-Parameters: trivial mdscount=2 mdtcount=4 \
env=ONLY=24B testlist=sanity

Change-Id: I237d5a9ebad42cc0569aa1db487d0df147372316
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49464
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-6170 tests: make sanity-lipe-scan3.sh support remote MDS
Jian Yu [Thu, 5 Jan 2023 07:51:38 +0000 (23:51 -0800)]
EX-6170 tests: make sanity-lipe-scan3.sh support remote MDS

The sanity-lipe-scan3.sh script was written to run only on a local
client+MDS configuration. This patch fixes it to support running
the lipe_scan3 command on a remote MDS.

Test-Parameters: trivial testlist=sanity-lipe-scan3
Test-Parameters: trivial mdscount=2 mdtcount=4 \
testlist=sanity-lipe-scan3

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I0ede3420c6f529cdfb9e97a5664945a5c2f0ff09
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49559
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn71
Andreas Dilger [Wed, 4 Jan 2023 23:07:16 +0000 (16:07 -0700)]
RM-620 build: New tag 2.14.0-ddn71

New tag 2.14.0-ddn71

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia77c09c628c0b5f76497825f8e2142cd1bc6ad96

2 years agoLU-14645 utils: setstripe cleanup
Vitaly Fertman [Tue, 20 Dec 2022 19:50:47 +0000 (11:50 -0800)]
LU-14645 utils: setstripe cleanup

lfs setstripe checks stripe parameters differently for PFL and !PFL
layouts. Whereas the PFL layout is checked in comp_args_to_layout()
individually and in llapi_layout_sanity_cb() in pairs, !PFL layout
verification is done partially in several places. Create a common
llapi_stripe_param_verify() for this purpose. Make the checks for
both cases symmetric.

Skip some excessive checks:
- do not check that the file is on a Lustre fs, the following ioctl does it;
- do not check that the stripe-index is valid, this is done on the MDS side;
- do not check that the pool exists for a !PFL file (to align with setstripe
  for PFL files).

Lustre-change: https://review.whamcloud.com/43465
Lustre-commit: 149934fe28dac22a51ec9b2873c4f215cb204947

Lustre-change: https://review.whamcloud.com/46151
Lustre-commit: 5e65d6a8e57a5a17c4c7e043cb46e86bf82b7782

Lustre-change: https://review.whamcloud.com/46152
Lustre-commit: cd1f8527d414a12ec7eb5b69fe30509a45b33ad4

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I456b1b2e876229ac1a354d4e3879624325856574
HPE-bug-id: LUS-9886
Reviewed-on: https://es-gerrit.dev.cray.com/158589
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49459
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoLU-16187 tests: Fix is_project_quota_supported()
Arshad Hussain [Mon, 26 Sep 2022 09:31:41 +0000 (15:01 +0530)]
LU-16187 tests: Fix is_project_quota_supported()

is_project_quota_supported() is called from sanity-quota.sh
to verify if $ENABLE_PROJECT_QUOTAS is true for the ldiskfs FS
and to verify if the current version of the lfs command supports
'project'.  To do this it calls 'lfs --help', which is
not supported. This patch replaces the 'lfs --help' call with
an 'lfs --list-commands' call to verify if the present
version of lfs supports 'project'.

Lustre-change: https://review.whamcloud.com/48654
Lustre-commit: d4848d779bb8716c6df2fe5438fbe00997f87f3d

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iba7e6696d3fa9e980088f448ae72b07a4b47f4f2
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49454
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
2 years agoEX-5819 tests: wait longer for lpcc.sock
Lei Feng [Fri, 30 Dec 2022 01:47:21 +0000 (09:47 +0800)]
EX-5819 tests: wait longer for lpcc.sock

Wait a little longer for lpcc.sock in sanity-pcc/test_210.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc env=ONLY=210
Change-Id: I359782b5de86d7354df2db169f85a18490602d7d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49531
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16390 tests: check Lustre filefrag in sanity-flr/49a
Andreas Dilger [Tue, 13 Dec 2022 07:01:06 +0000 (00:01 -0700)]
LU-16390 tests: check Lustre filefrag in sanity-flr/49a

Check that a Lustre-patched filefrag is installed when running
sanity-flr test_49a.

Lustre-change: https://review.whamcloud.com/49386
Lustre-commit: 37f18670e49b8150170f9b724b5f7089fa176c4e

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic909ea4ca160d47480004f53a96ce7539ce5076c
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49503
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16412 llite: check truncated page in ->readpage()
Qian Yingjin [Mon, 19 Dec 2022 06:57:39 +0000 (01:57 -0500)]
LU-16412 llite: check truncated page in ->readpage()

The page end offset calculation in filemap_get_read_batch() was
off by one. This bug was introduced in commit v5.11-10234-gcbd59c48ae
("mm/filemap: use head pages in generic_file_buffered_read")

When a read is submitted with end offset 1048575, it calculates
an end page index of 256 where it should be 255. As a result, the
readpage() call for the page with index 256 crosses the stripe
boundary and may not be covered by a DLM extent lock.
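The arithmetic can be checked with a small sketch. It assumes 4 KiB pages, and the helper names are hypothetical, not kernel functions. It shows one way such an off-by-one can arise: a rounded-up page count is used as if it were the index of the last page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this illustration: 4 KiB pages. */
#define EXAMPLE_PAGE_SIZE 4096ULL

/* Index of the page that holds the last byte of the read (end is inclusive). */
static uint64_t example_last_page_index(uint64_t end_inclusive)
{
	return end_inclusive / EXAMPLE_PAGE_SIZE;
}

/*
 * Off-by-one variant: the number of pages up to the end of the read,
 * rounded up, is a count, not an index. Using it as the last index
 * reaches one page too far.
 */
static uint64_t example_off_by_one_index(uint64_t end_inclusive)
{
	uint64_t len = end_inclusive + 1;

	assert(len != 0); /* guard against wrap-around */
	return (len + EXAMPLE_PAGE_SIZE - 1) / EXAMPLE_PAGE_SIZE;
}
```

For end offset 1048575 the first helper yields 255 and the second yields 256, which is the first page past a 1 MiB boundary.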

This happens in a corner race case: filemap_get_read_batch()
batches the page with index 256 for read, but later this page is
removed from the page cache because the lock protecting it is revoked,
while it still holds a reference count due to the batch.  As a result,
this page in the read path is not covered by any DLM lock.

The solution is simple. In ->readpage() we can check, via the
address_space pointer of the page, whether the page was truncated
and removed from the page cache. If it was truncated, return
AOP_TRUNCATED_PAGE to the upper caller.  This will cause the
kernel to retry batching pages, and the truncated page will not
be added as it was already removed from the page cache of the file.

Add sanityn/test_95 to verify it.

Lustre-change: https://review.whamcloud.com/49433
Lustre-commit: TBD (from 02fe613db9517875c03e8a919e1b42cb1ba7c619)

Test-Parameters: testlist=sanityn env=ONLY=95 clientdistro=ubuntu2204
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I192df92b1d1b79057055430cc81cb7cc760cc9ed
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15115 ptlrpc: recalc timer on EINPROGRESS reply
Alexander Zarochentsev [Fri, 15 Oct 2021 18:27:29 +0000 (21:27 +0300)]
LU-15115 ptlrpc: recalc timer on EINPROGRESS reply

ptlrpcd doesn't recalculate the wait queue timer after
getting an -EINPROGRESS reply. This may delay the request resend
until it times out.

Lustre-change: https://review.whamcloud.com/45266
Lustre-commit: 9a5bace55a5ddb8a928af2de1b199e968f3fbecd

HPE-bug-id: LUS-10366
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Idc76c688a0f7ff8e110446fd1fe13dd83f636f3b
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49513
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x
Li Dongyang [Mon, 19 Dec 2022 10:03:47 +0000 (21:03 +1100)]
LU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x

Recreate the currently broken lustre kernel patches
to allow using custom integrity functions for bio.
Note we don't need to save the generate_fn anymore,
it will be used once we call bio_integrity_prep_fn().

Add upstream fix
b13e0c718568 ("block: bio-integrity: Advance seed correctly
for larger interval sizes") for CentOS 8.0 to 8.6.

Handle the kernel API changes for the T10PI generate and
verify functions introduced in the CentOS 8.x kernel,
mostly because of switching to blk_integrity_iter.

Update the custom generate and verify functions, to sync
with upstream versions.
- Add T10-DIF-TYPE2, currently only a placeholder,
  not used in upstream either.
- Use __be16 instead of __u16 for guard tags.

Only reuse guard tags if the rpc checksum is the same
one supported on the target. We already have some protection
during checksum type negotiation, the server
will mark the target's T10PI type as the only
T10PI checksum type supported. But it's still good to
have the logic in place.

Do not call bio_integrity_prep() if the custom interface
bio_integrity_prep_fn() does not exist, submit_bio() will
do that for us.

On the servers, show the target's T10PI checksum as
the preferred checksum_type even if it's not the fastest.
Note this is only cosmetic and does not impact the checksum
type used, which is still done during negotiation.

Lustre-change: https://review.whamcloud.com/49441
Lustre-commit: TBD (from a0c96829a760a5cf199e5278bf2693f2618b77c9)

Change-Id: I2d0ba0b80ba9cde2977da24db08095671aa5373c
Test-Parameters: trivial
Fixes: 293844d132 ("LU-16222 kernel: RHEL 8.7 client and server support")
Fixes: f176efd183 ("LU-12269 kernel: RHEL 8.0 server support")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49483
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16376 obdclass: NUL terminate long jobid strings
Andreas Dilger [Thu, 8 Dec 2022 18:43:57 +0000 (11:43 -0700)]
LU-16376 obdclass: NUL terminate long jobid strings

It appears that some jobid names can be sent that are using the full
32-byte size, rather than containing an embedded NUL terminator. This
caused errors in lprocfs_job_stats_log() when it overflowed.

If lustre_msg_get_jobid() does not find a NUL terminator within
the buffer, add one, so that the rest of the code doesn't
have to deal with unterminated strings.
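A minimal sketch of this kind of defensive termination follows. It assumes a fixed 32-byte field as described above; the constant and function names are hypothetical and are not the actual Lustre code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the jobid field is a fixed 32-byte buffer, as stated above. */
#define EXAMPLE_JOBID_SIZE 32

/*
 * If no NUL byte is found within the buffer, overwrite the last byte
 * with one. A full-length jobid is thereby truncated to 31 characters;
 * shorter, already-terminated jobids are left untouched.
 */
static void example_terminate_jobid(char *jobid)
{
	assert(jobid != NULL);

	if (memchr(jobid, '\0', EXAMPLE_JOBID_SIZE) == NULL)
		jobid[EXAMPLE_JOBID_SIZE - 1] = '\0';
}
```

After this call, string functions such as strlen() can be used on the buffer without reading past its end.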

This potentially exposes a larger issue that other places may not be
handling the unterminated string properly either, which needs to be
addressed separately on both the client and server.  Terminating the
jobid to 31 chars only on the client does not totally solve the issue,
since there will still be older clients that are not doing this, so
the server needs to handle this in any case.

Lustre-change: https://review.whamcloud.com/49351
Lustre-commit: 9eba5d57297f807fddf046356c846478bbf232f4

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4c05fabdacb6a0bbf6477d3601a628fe1f3ebbe5
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14552 ptlrpc: NULL pointer dereference in ptlrpc_watchdog_fire
Andriy Skulysh [Mon, 1 Mar 2021 21:41:33 +0000 (23:41 +0200)]
LU-14552 ptlrpc: NULL pointer dereference in ptlrpc_watchdog_fire

thread->t_task isn't initialized by target_recovery_thread()

Lustre-change: https://review.whamcloud.com/43115
Lustre-commit: 14a1102268941d851ef5ef793923e39081b81ff4

Change-Id: Ia38d5ccaab6b9332a1fd60ebe5ed2461f7d5db84
HPE-bug-id: LUS-9748
Fixes: 0496cdf20 ("LU-13608 tgt: abort recovery while reading update llog")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49486
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15081 vfs: set_nlink() is not race-safe
Andrew Perepechko [Mon, 11 Oct 2021 19:11:05 +0000 (22:11 +0300)]
LU-15081 vfs: set_nlink() is not race-safe

set_nlink() is not atomic wrt race with itself and
the following warning may be triggered by VFS:

WARNING: CPU: 5 PID: 195090 at fs/inode.c:241 __destroy_inode+0xdb/0xf0

It does not seem important what exact nlink value is the result
of the race. However, we need to protect the superblock remove
counter.

Lustre-change: https://review.whamcloud.com/45191
Lustre-commit: 12b05772fdb6d080819b6c213fcd7f8705278412

Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-9825
Change-Id: I67bc345b9a9e43fb88d919a83246759d11604b03
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49452
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-6169 lipe: sanity-lipe-find3 reformat to clean lost+found
Alexandre Ioffe [Tue, 13 Dec 2022 06:20:33 +0000 (22:20 -0800)]
EX-6169 lipe: sanity-lipe-find3 reformat to clean lost+found

Reformat file system when .lustre/lost+found/ has garbage

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ib78b06e685aaeabb8356662747285ed7a27dde15
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49385
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6187 lipe: lipe_find3 missing option ipath
Alexandre Ioffe [Tue, 20 Dec 2022 07:35:13 +0000 (23:35 -0800)]
EX-6187 lipe: lipe_find3 missing option ipath

Added missing lexical ipath

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I62260e054a9c514aa31d378322b6840f75edf221
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49455
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn70
Andreas Dilger [Sat, 17 Dec 2022 02:30:27 +0000 (19:30 -0700)]
RM-620 build: New tag 2.14.0-ddn70

New tag 2.14.0-ddn70

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iaf30877c3a4c88c20a64814670c7e97e4f0cc5e0

2 years agoLU-15935 tests: add version check to replay-dual test_33
Jian Yu [Wed, 14 Dec 2022 02:13:33 +0000 (18:13 -0800)]
LU-15935 tests: add version check to replay-dual test_33

This patch adds MDS version check to replay-dual test_33
to avoid interop test failure.

Lustre-change: https://review.whamcloud.com/49398
Lustre-commit: TBD (from 0027fba3d3f797407fad9f3995f839a431e49782)

Test-Parameters: trivial \
serverjob=lustre-b_es5_2 serverbuildno=539 \
env=ONLY=33 testlist=replay-dual

Test-Parameters: trivial env=ONLY=33 testlist=replay-dual

Change-Id: I3ec665302a431d3c0f07bc819a08237dbc5b4309
Fixes: 1a79d395dd ("LU-15935 target: keep track of multirpc slots in last_rcvd")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49401
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15234 lnet: add mechanism for dumping lnd peer debug info
Serguei Smirnov [Mon, 28 Feb 2022 19:04:00 +0000 (11:04 -0800)]
LU-15234 lnet: add mechanism for dumping lnd peer debug info

Add ability to dump lnd peer debug info:
lnetctl debug peer --nid=<nid>

The debug info is dumped to the log as D_CONSOLE by the respective
lnd and can be retrieved with "lctl dk" or seen in syslog.
This mechanism has been added for socklnd and o2iblnd peers.

Lustre-change: https://review.whamcloud.com/48566
Lustre-commit: 950e59ced18d49e9fdd31c1e9de43b89a0bc1c1d

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ia9c4d59143206bcb7ec43806594cf0cfaed5f0a9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49038
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5924 lipe: lipe_scan3 ERROR replaced by WARNING
Alexandre Ioffe [Fri, 9 Dec 2022 22:52:55 +0000 (14:52 -0800)]
EX-5924 lipe: lipe_scan3 ERROR replaced by WARNING

Decrease severity of the message down to WARNING.

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I2f4b885248692e042ba9eb0f97736401e6d35de6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49355
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
2 years agoLU-16378 lnet: handles unregister/register events
Cyril Bordage [Mon, 12 Dec 2022 10:49:11 +0000 (11:49 +0100)]
LU-16378 lnet: handles unregister/register events

When network is restarted, devices are unregistered and then
registered again. When a device registers using an index that is
different from the previous one (before network was restarted), LNet
ignores it. Consequently, this device stays with link in fatal state.

To fix that, we catch unregistering events to clear the saved index
value, and when a registering event comes, we save the new value.
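The index handling described above can be sketched roughly as follows. All names here are hypothetical, and marking the link as fatal on unregister is an assumption made for illustration; a real handler would also need to identify the device (for example by name) before adopting a new index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_NO_INDEX (-1)

enum example_dev_event {
	EXAMPLE_DEV_UNREGISTER,
	EXAMPLE_DEV_REGISTER,
	EXAMPLE_DEV_UP,
};

struct example_ni {
	int  saved_index; /* device index remembered for this interface */
	bool fatal;       /* true while the link is considered down */
};

/* Returns true if the event was applied to this interface. */
static bool example_handle_event(struct example_ni *ni,
				 enum example_dev_event ev, int index)
{
	assert(ni != NULL);

	switch (ev) {
	case EXAMPLE_DEV_UNREGISTER:
		if (ni->saved_index != index)
			return false;
		/* Forget the old index so a later register can replace it. */
		ni->saved_index = EXAMPLE_NO_INDEX;
		ni->fatal = true; /* assumption: treat as link down */
		return true;
	case EXAMPLE_DEV_REGISTER:
		if (ni->saved_index != EXAMPLE_NO_INDEX)
			return false;
		/* Save the new index, even if it differs from the old one. */
		ni->saved_index = index;
		return true;
	case EXAMPLE_DEV_UP:
		if (ni->saved_index != index)
			return false;
		ni->fatal = false;
		return true;
	}
	return false;
}
```

Without the unregister step, the saved index would stay at its old value and an "up" event carrying the new index would never match, which corresponds to the stuck fatal state described above.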

Lustre-change: https://review.whamcloud.com/49375/
Lustre-commit: TBD (from 7442710a56a8f38453441c62253c0ad891fe9b8c)

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I17e93a1103d588f3e630a9c7446b345f4d472b97
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49376
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16373 tests: failover mds1 back to the primary server
Jian Yu [Thu, 15 Dec 2022 19:38:56 +0000 (11:38 -0800)]
LU-16373 tests: failover mds1 back to the primary server

This patch fixes recovery-small test 144a to failover
mds1 back to the primary server so that stack_trap can
set timeout parameter on the correct mds node.

Lustre-change: https://review.whamcloud.com/49345
Lustre-commit: TBD (from 68c75d28fe86ac890d242c004c664f872204b660)

Test-Parameters: trivial \
env=SLOW=yes,FAILURE_MODE=HARD,ONLY=144a \
clientcount=4 mdtcount=1 mdscount=2 osscount=2 \
austeroptions=-R failover=true iscsi=1 \
testlist=recovery-small

Change-Id: Idbfdb7b084c7edac8784008e0455f76632aa685b
Test-Parameters: trivial testlist=recovery-small
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49419
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16329 Revert "LU-8621 utils: cmd help to stdout or short cmd error"
Andreas Dilger [Thu, 15 Dec 2022 15:30:32 +0000 (08:30 -0700)]
LU-16329 Revert "LU-8621 utils: cmd help to stdout or short cmd error"

This reverts commit 608d763955d7e0a9c438c317e595f14825e9423b.
This breaks bash command completion.

Fixes: bc69a8d058 ("LU-8621 utils: cmd help to stdout, short cmd error")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I004ea5af499593b0f36ba17ff5f517548f0ea0f9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49416
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-6349 Revert "LU-14661 obdclass: Add peer/peer NI when processing llog"
Alex Zhuravlev [Wed, 14 Dec 2022 19:00:01 +0000 (22:00 +0300)]
EX-6349 Revert "LU-14661 obdclass: Add peer/peer NI when processing llog"

This reverts commit e8ddb2f550072cdd3489389c107af3e892a21f66.
It is causing problems with reconnection at failover.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I53594f8f93474666c4abd96291d58dadf8ac5969
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49411
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs
Lai Siyao [Tue, 15 Mar 2022 19:43:14 +0000 (15:43 -0400)]
LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs

In osd_fid_lookup(), if the FID mapping found in OI table is insane,
it will be added into a list called os_inconsistent_items, and OI
scrub will be triggered.

Later, if OI scrub can't fix this mapping, it should move this mapping
into a list called os_stale_items, and subsequent access of the same
FID should return -ESTALE immediately, rather than triggering OI
scrub repeatedly.

Add sanity-scrub 20. Remove sanity-scrub 1d, which is not a sane test
because it altered FID in LMA, which is the last to trust for an
object, and it could pass just by chance.

Lustre-change: https://review.whamcloud.com/46852
Lustre-commit: 558784caad491be50e93ae60a31d4219a1e038bc

Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3ed8928506551416b1008121adbe385dedda29bc
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49424
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn69
Andreas Dilger [Tue, 13 Dec 2022 19:12:09 +0000 (12:12 -0700)]
RM-620 build: New tag 2.14.0-ddn69

New tag 2.14.0-ddn69

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I592bd3a6fdb9db02bbe1a18c6e84d9b61a639f95

2 years agoEX-6497 lipe: Refine stats field name in lamigo
Alexandre Ioffe [Thu, 8 Dec 2022 06:45:35 +0000 (22:45 -0800)]
EX-6497 lipe: Refine stats field name in lamigo

Corrected the INFO message "processed" that lamigo prints
periodically:
- Added two additional fields:
  "running" - number of currently running jobs such as replication
  "delayed" - current number of failed and other (such as set flag)
  jobs which are waiting to be run on the next lamigo cycle
- "in queue" field is changed to "awaiting". This is the current number
  of files in the internal cache. These files are waiting to be
  processed (replicated)

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iacf0199cfcf56edcbb8ad91e0e4b62c7451900f5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49344
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6298 lipe: decrease delay before ALR restart
Alexandre Ioffe [Sat, 19 Nov 2022 05:39:08 +0000 (21:39 -0800)]
EX-6298 lipe: decrease delay before ALR restart

- Decrease the delay before restarting the access log reader and
  eliminate this delay when the read from ALR fails
  due to timeout. Increase the SSH poll/read timeout while the
  keep-alive message in ofd_access_log_reader is not
  implemented.
  This will decrease the probability of missing ALR.
- Remove the exclusion of hot-pools test_72

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I36989e9c3fd877aee5ce1cfb8525db8604e666bd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49196
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16353 config: enable_foo variables mustn't contains space
Mr NeilBrown [Thu, 1 Dec 2022 17:53:01 +0000 (09:53 -0800)]
LU-16353 config: enable_foo variables mustn't contains space

$enable_crypto is in some circumstances set to "embedded llcrypt"
which contains a space.
When the code from lustre-build.m4 then tests the value with:

   if test x$enable_crypto = xyes

we get a syntax error from ./configure

We could add quotes to this test, but for consistency we would need
to add quotes to every other test for an enable_foo variable.

It is simpler just to ensure we don't add spaces.  So change the space
to a hyphen.

Lustre-change: https://review.whamcloud.com/49282
Lustre-commit: c8a33e5322b0675680f8d737f04259799d30aa0e

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I097e857409d6ec48a765ccda1cc470d28b90e601
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49295
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16051 o2iblnd: detect link state to set fatal error on ni
Serguei Smirnov [Fri, 23 Sep 2022 22:20:51 +0000 (15:20 -0700)]
LU-16051 o2iblnd: detect link state to set fatal error on ni

To avoid selecting lnet ni which corresponds to a downed link
for sending, add a mechanism for detecting ip-layer link events
in o2iblnd. On ip link up/down events, find corresponding
ni and toggle ni_fatal_error_on flag. This complements the
existing mechanism for ib-layer link event handling.

Lustre-change: https://review.whamcloud.com/48644
Lustre-commit: 30d73908087d5b2f0b18cce95826c4825c030ad4

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I4720cd0a7bc577a522c7d40b54f821a4c12b670f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49315
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14992 tests: add more mkdir_on_mdt0 calls
Mr NeilBrown [Tue, 29 Nov 2022 02:31:21 +0000 (18:31 -0800)]
LU-14992 tests: add more mkdir_on_mdt0 calls

A previous patch changed some mkdir calls in test_133a to
mkdir_on_mdt0. This allows stats collected from mdt0 to
reflect the mkdir.

However two mkdir calls were missed, so "crossdir_rename" stats can be
wrong.

Lustre-change: https://review.whamcloud.com/49252
Lustre-commit: d56ea0c80a959ebd9b393f2da048cc179cb16127

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity env=ONLY=133a

Fixes: 543341afc3 ("LU-14992 tests: sanity/replay-vbr mkdir on MDT0")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4e5c2e5504307462bff4012a13ef9deb24f8da8c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49262
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16308 llite: wake_up after cl_object_kill
Lai Siyao [Thu, 10 Nov 2022 13:15:51 +0000 (08:15 -0500)]
LU-16308 llite: wake_up after cl_object_kill

cl_inode_fini() calls cl_object_kill() to set LU_OBJECT_HEARD_BANSHEE,
and then calls cl_object_put_last() to wait for the object refcount to
become one. It should call wake_up() in between, in case someone is
waiting on the flag.

Lustre-change: https://review.whamcloud.com/49130
Lustre-commit: 3a0a6c7a88499a78c9bfc6ac514d05eba60312c9

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I244db71ee4ed9c39118e443b99c3b8a3a0aa4bc3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49312
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6468 pcc: add threshold to determine direct I/O during attach
Qian Yingjin [Wed, 30 Nov 2022 14:29:47 +0000 (09:29 -0500)]
EX-6468 pcc: add threshold to determine direct I/O during attach

This patch adds a threshold tunable parameter that determines whether
to use direct I/O or buffered I/O for data copying during attach:
llite.*.pcc_dio_attach_threshold
The default value is the same as the direct I/O size: 32MiB.

The parameter "pcc_dio_attach_size_mb" is deprecated;
use "pcc_dio_attach_iosize_mb" instead.

Change-Id: I393d6a06523303e749192ba9978449c3d75886ae
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49286
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn68
Andreas Dilger [Tue, 6 Dec 2022 05:15:41 +0000 (22:15 -0700)]
RM-620 build: New tag 2.14.0-ddn68

New tag 2.14.0-ddn68

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id4e3d1a9f28afe251e55582c84acaf98ebfe9954

2 years agoLU-15852 lnet: Don't modify uptodate peer with temp NI
Chris Horn [Wed, 30 Mar 2022 18:35:23 +0000 (13:35 -0500)]
LU-15852 lnet: Don't modify uptodate peer with temp NI

When processing the config log it is possible that we attempt to
add temp NIs after discovery has completed on a peer. These temp NIs
may not actually exist on the peer. Since discovery has already
completed the peer is considered up-to-date and we can end up with
incorrect peer entries. We shouldn't add temp NIs to a peer that
is already up-to-date.

Lustre-change: https://review.whamcloud.com/47322
Lustre-commit: 8f718df474e453fbc69dfe90214e71565963f6db

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia484713b1e6c9e1a46e525589b7c741c6478e417
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49303
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15938 llog: more checks in llog_reader
Mikhail Pershin [Tue, 2 Aug 2022 12:41:52 +0000 (15:41 +0300)]
LU-15938 llog: more checks in llog_reader

Add more correctness checks and reports in llog_reader:
- better reporting of wrong record length and of the chunk skipping case
- add tail check: tail id and len should be the same as in head
- better reporting of gaps in record indices
- test case with two corruption types:
  1) llog has bits set in bitmap beyond file end
  2) corruption in the middle
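
The head/tail and index checks listed above can be illustrated with a small
user-space sketch. It is not the llog_reader code: the struct layouts, the
function name demo_check_record() and the error codes are hypothetical, the
record is assumed to be in host byte order, and any non-consecutive index is
treated as a gap, which is a simplifying assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record framing: a header at the start of each record and
 * a tail at its end that repeats the length and index. */
struct demo_rec_hdr  { uint32_t len; uint32_t index; uint32_t type; uint32_t id; };
struct demo_rec_tail { uint32_t len; uint32_t index; };

/* Validate one record at 'buf' with 'avail' bytes left in the chunk.
 * Returns 0 if the record looks sane, or a negative code naming the
 * first check that failed. */
int demo_check_record(const unsigned char *buf, size_t avail,
		      uint32_t prev_index)
{
	struct demo_rec_hdr hdr;
	struct demo_rec_tail tail;

	if (avail < sizeof(hdr) + sizeof(tail))
		return -1;	/* not enough data for any record */
	memcpy(&hdr, buf, sizeof(hdr));

	if (hdr.len < sizeof(hdr) + sizeof(tail) || hdr.len > avail)
		return -2;	/* wrong record length */

	/* The tail sits in the last bytes of the record. */
	memcpy(&tail, buf + hdr.len - sizeof(tail), sizeof(tail));
	if (tail.len != hdr.len || tail.index != hdr.index)
		return -3;	/* tail does not match head */

	if (hdr.index != prev_index + 1)
		return -4;	/* gap in record indices */

	return 0;
}
```

A caller would walk the chunk record by record, advancing by hdr.len and
passing the previous record's index, and report the first negative code.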

Lustre-change: https://review.whamcloud.com/48112
Lustre-commit: 386ffcdbb4c9b89f798de4c83a51a3f020542c8b

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I0c2af6ae2592c94e14e90ead12e28104409313b2
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49214
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 years agoLU-16317 build: dkms build requires flex, bison and libmount-devel
Jian Yu [Tue, 29 Nov 2022 17:14:22 +0000 (09:14 -0800)]
LU-16317 build: dkms build requires flex, bison and libmount-devel

This patch fixes lustre.spec.in and lustre-dkms.spec.in to add
requires for flex, bison, libmount and libmount-devel. The last
two have already been added into lustre.spec.in.

Lustre-change: https://review.whamcloud.com/49183
Lustre-commit: c74c630ff7596317d1b500fd385fca271b31708c

Test-Parameters: trivial

Fixes: 121a79651f ("LU-15967 build: configure script does not check for required build tools")
Fixes: f21b944127 ("LU-15940 build: add a required dependency for libmount")

Change-Id: I9923fc7eb09f974e8c38c3664138486a424e16d7
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-6373 pcc: asynchronous PCCRO attach command support
Qian Yingjin [Fri, 11 Nov 2022 09:01:02 +0000 (04:01 -0500)]
EX-6373 pcc: asynchronous PCCRO attach command support

Currently, a PCCRO attach via the command "lfs pcc attach" blocks
while the data is being copied.
This command should also be able to copy the data asynchronously.
Thus we add an option "--async|-A" to the command, with which it
does not block while the file data is being fetched.

Add sanity-pcc/test_{103, 104} to verify that it works correctly.

Change-Id: I6f31190c8b9e9b9876b34f8e484c6c8b7f16b6db
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49133
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16313 pcc: use two bits to indicate pcc type for attach
Qian Yingjin [Tue, 15 Nov 2022 06:57:08 +0000 (01:57 -0500)]
LU-16313 pcc: use two bits to indicate pcc type for attach

PCC currently supports two types: readwrite and readonly.
The attach data structure @lu_pcc_attach uses a 32-bit value to
indicate the PCC type:
struct lu_pcc_attach {
	__u32 pcca_type;
	__u32 pcca_id;
};

This patch changes it to use 2 bits to represent the PCC type.
The remaining bits in @pcca_type can be used as attach flags, such
as a flag to request an asynchronous attach via the command
"lfs pcc attach -A" for PCCRO.
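
A minimal sketch of this kind of packing follows. All DEMO_/demo_ names,
the numeric type values and the position of the async flag bit are
assumptions made for illustration; the actual definitions are in the patch.

```c
#include <stdint.h>

/* Hypothetical layout: the low 2 bits of pcca_type carry the PCC type,
 * all higher bits are available as attach flags. */
#define DEMO_PCCA_TYPE_MASK	0x3u		/* bits 0-1: PCC type */
#define DEMO_PCCA_FL_ASYNC	(1u << 2)	/* assumed bit for async attach */

enum demo_pcc_type {
	DEMO_PCC_NONE      = 0,
	DEMO_PCC_READWRITE = 1,
	DEMO_PCC_READONLY  = 2,
};

struct demo_pcc_attach {
	uint32_t pcca_type;	/* type in the low 2 bits, flags above */
	uint32_t pcca_id;
};

/* Extract the PCC type, ignoring any flag bits. */
static inline enum demo_pcc_type
demo_pcca_get_type(const struct demo_pcc_attach *a)
{
	return (enum demo_pcc_type)(a->pcca_type & DEMO_PCCA_TYPE_MASK);
}

/* Extract the flag bits, ignoring the type. */
static inline uint32_t demo_pcca_get_flags(const struct demo_pcc_attach *a)
{
	return a->pcca_type & ~DEMO_PCCA_TYPE_MASK;
}

/* Store a type and a set of flags in the single 32-bit field. */
static inline void demo_pcca_set(struct demo_pcc_attach *a,
				 enum demo_pcc_type type, uint32_t flags)
{
	a->pcca_type = ((uint32_t)type & DEMO_PCCA_TYPE_MASK) |
		       (flags & ~DEMO_PCCA_TYPE_MASK);
}
```

Because the structure size and field offsets do not change, a layout like
this keeps the two existing type values readable in the low bits while
leaving the upper bits free for flags.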

Lustre-change: https://review.whamcloud.com/49160
Lustre-commit: 6e90974b1f4ac24c5a5d45ecc9bdb4d47018dab4

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idee26018642a174b04d1d36a81952ea98a06514e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49163
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn67
Andreas Dilger [Tue, 6 Dec 2022 02:05:39 +0000 (19:05 -0700)]
RM-620 build: New tag 2.14.0-ddn67

New tag 2.14.0-ddn67

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia40ed3b7d185fa171586d5ca377714518fdc5e2e

2 years agoLU-8585 llapi: use open_by_handle_at in llapi_open_by_fid
Quentin Bouget [Sun, 2 Jan 2022 16:12:42 +0000 (11:12 -0500)]
LU-8585 llapi: use open_by_handle_at in llapi_open_by_fid

Reimplement llapi_open_by_fid() to use llapi_fid_to_handle() and
open_by_handle_at(2) rather than using ioctl().  This works for
opens on subdirectory mountpoints, unlike ".lustre/fid/<fid>".

This patch also adds llapi_open_by_fid_at() which is similar to
llapi_open_by_fid() except that it takes an open directory file
descriptor or AT_FDCWD rather than a path as its first argument.

[AD:
- Move get_root_*() functions over to a new liblustreapi_root.c
  file in expectation of further enhancements to that code.
- Cache an open file handle on the root directory so repeated
  calls to llapi_open_by_fid() and llapi_fid2path() do not need
  to search for and open the same root directory path many times.
- Add man pages for newly-added functions.

  This reduces the system calls for llapi_fid_test significantly:

      original     patched
         14511        4315   total opens
         64807       34067   total syscalls
]

There may still be a need to have a fallback from open_by_handle_at()
to using ".lustre/fid/<FID>" to open the fid (if available), but
that can be added if this initial patch does not test well.  The
open_by_handle_at() method avoids reopening the "fid/" directory
each time (though this fd could also be cached), but it has the
drawback that it reconnects dentries to the root directory each time.

Lustre-change: https://review.whamcloud.com/36603
Lustre-commit: bdf7788d19985bb7abf2385add15f1d67f3d01e4

Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I8a4904c996389da2b0894cd9fac639a398607535
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49202
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15833 llapi: don't use realpath in llapi_search_fsname()
Etienne AUJAMES [Mon, 9 May 2022 13:44:29 +0000 (15:44 +0200)]
LU-15833 llapi: don't use realpath in llapi_search_fsname()

This patch uses the st_dev value to determine the fsname in
llapi_search_fsname().
The main purpose is to limit the number of lstat() calls (made by
realpath()) in this function.

get_root_path() is modified to search for a mountpoint by dev,
and the last result of get_root_path() is cached to avoid reading
/proc/mounts on each call.

A new API function llapi_search_rootpath_by_dev() is added to get
the path of the Lustre mountpoint for the specified device value.
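
The idea of matching a mountpoint by device number can be sketched in
user space as follows. This is not the liblustreapi implementation:
demo_rootpath_by_dev() and its parameters are hypothetical, the mount
table path is passed in (typically "/proc/mounts") and no result caching
is shown. It relies on the glibc getmntent(3) family.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: scan the mount table 'mtab' for a mountpoint
 * whose st_dev equals 'dev'. If 'fstype' is non-NULL, only entries of
 * that filesystem type are considered. Returns 0 and copies the
 * mountpoint into 'out' on success, -1 if nothing matches. */
int demo_rootpath_by_dev(const char *mtab, dev_t dev, const char *fstype,
			 char *out, size_t outlen)
{
	FILE *fp = setmntent(mtab, "r");
	struct mntent *ent;
	int rc = -1;

	if (fp == NULL)
		return -1;

	while ((ent = getmntent(fp)) != NULL) {
		struct stat st;

		if (fstype != NULL && strcmp(ent->mnt_type, fstype) != 0)
			continue;
		/* One stat() per candidate mountpoint, not per path component. */
		if (stat(ent->mnt_dir, &st) != 0 || st.st_dev != dev)
			continue;
		if (strlen(ent->mnt_dir) >= outlen)
			continue;	/* does not fit in the caller's buffer */
		strcpy(out, ent->mnt_dir);
		rc = 0;
		break;
	}
	endmntent(fp);
	return rc;
}
```

A caller would stat() the file of interest once and pass its st_dev;
comparing device numbers avoids resolving the path with realpath().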

**Testing:**

*Environment:*
VMs: 1 client, 1 MDS (2MDT), 1 OSS (2 OST)
Lustre tree: test{001..100}/test{001..100}/test{01..10}/file{01..05}
(500000 files + 110100 folders)
OS: Centos 7 (no statx)
Lustre: 2.15.50_15_g1116739

*Tests*
cd <rootfs>
strace lfs getstripe -r .
echo 3 > /proc/sys/vm/drop_caches
/usr/bin/time lfs getstripe -r . (2 iterations)

*Results*
times (s):

                 ______________________________
                | user | system | real | real% |
 _______________|______|________|______|_______|
|without patch: | 6.18 | 57.3   | 427  | 0%    |
|_______________|______|________|______|_______|
|with patch:    | 2.88 | 47.3   | 404  |-5.45% |
|_______________|______|________|______|_______|

strace (only significant changes are displayed):
(*stat = lstat + stat + fstat)
                 _____________________________________________
                | *stat  | mmap   | open   | read   | all     |
 _______________|________|________|________|________|_________|
|without patch: | 760545 | 110142 | 330379 | 330325 | 4742658 |
|_______________|________|________|________|________|_________|
|with patch:    | 440484 | 0      | 220277 | 19     | 3541739 |
|_______________|________|________|________|________|_________|

-25.32% syscalls after patching.

Lustre-change: https://review.whamcloud.com/47258
Lustre-commit: 4fd7d5585d33240a658f57bf7399da4415a7eb6c

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I3812d922d5b1d194d52132cba95d11820424c5d7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49201
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoDDN-3473 build: support kernel 3.10.0-693.el7
Jian Yu [Wed, 16 Nov 2022 05:54:26 +0000 (21:54 -0800)]
DDN-3473 build: support kernel 3.10.0-693.el7

This patch fixes the following build failures to support
kernel 3.10.0-693.el7 for Lustre client:

- error: implicit declaration of function 'idr_destroy'
- error: implicit declaration of function 'gfpflags_allow_blocking'
- error: implicit declaration of function 'cdev_device_add'
- error: passing argument 1 of 'init_wait_var_entry' from
  incompatible pointer type

Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I4b5c5264fb102d3a825c92e7b1e92cf0c52540e5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49197
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-12016 tests: skip sanity/156 in interop
Andreas Dilger [Fri, 25 Nov 2022 02:28:21 +0000 (19:28 -0700)]
LU-12016 tests: skip sanity/156 in interop

Since LU-12071 was backported to b_es5_2 the version check on b_es6_0
is incorrect and this part of the test_156 should be skipped.

Test-Parameters: trivial testlist=sanity env=ONLY=156 serverversion=EXA5
Fixes: 3043c6f189 ("LU-12071 osd-ldiskfs: bypass pagecache if requested")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3fd96578e36675655fb265d83ba3f661950ab112
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49246
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15139 osp: block reads until the object is created
Alex Zhuravlev [Sun, 13 Nov 2022 14:51:30 +0000 (17:51 +0300)]
LU-15139 osp: block reads until the object is created

It is possible for a remote llog to be read and written simultaneously
during recovery. For example, the dtx recovery thread may be fetching
updates while MDD's orphan cleanup procedure is removing orphans from
PENDING.

OSP can be asked to read an object that was just created in the OSP
cache while the actual object on the remote MDS has not been created
yet. OSP should block such reads until the creation is done.

Lustre-change: https://review.whamcloud.com/47003/
Lustre-commit: 4f2914537cc32fe89c4781bcfc87c38e3fe4419c

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5596c791a758dd542746afd961eb1ed9c97845be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49146
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16295 kernel: kernel update RHEL 7.9 [3.10.0-1160.80.1.el7]
Jian Yu [Fri, 18 Nov 2022 20:13:08 +0000 (12:13 -0800)]
LU-16295 kernel: kernel update RHEL 7.9 [3.10.0-1160.80.1.el7]

Update RHEL 7.9 kernel to 3.10.0-1160.80.1.el7.

Lustre-change: https://review.whamcloud.com/49045
Lustre-commit: TBD (from 636e97a22936a1fab8d9e5fde40f6e1f9a1c5bc5)

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I50a0ee572d24ddc73f8af6dc32ef701c260e45b7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49194
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-6399 pcc: add tunable parameter for PCC attach thread
Qian Yingjin [Wed, 16 Nov 2022 09:26:33 +0000 (04:26 -0500)]
LU-6399 pcc: add tunable parameter for PCC attach thread

Currently the maximum number of kernel threads doing asynchronous
attach is a hard-coded value (1024 by default).
This patch makes it a tunable parameter:
llite.*.pcc_max_attach_thread_num

Change-Id: Ic59c15af935dd8dff586fa6be3939d4322c136d5
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49168
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6372 lipe: Remove colocation constraint from lamigo/lpurge resources
Gaurang Tapase [Fri, 11 Nov 2022 18:26:30 +0000 (23:56 +0530)]
EX-6372 lipe: Remove colocation constraint from lamigo/lpurge resources

We now rely on the node attribute *-recovered to start HP resources.
Hence, starting with ES 5.2.7, colocation constraints are not needed
to start resources. Moreover, with these rules added, base FS
target resources cannot start on the designated nodes because the
nodes get a -inf score. This prevents resource failback when the
original server comes back up after a failover.

Test-Parameters: trivial

Signed-off-by: Gaurang Tapase <gtapase@ddn.com>
Change-Id: I890b12bf8a0d75d618a041be1eb27960dc62cc7e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49179
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artur Novik <anovik@ddn.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16160 llite: clear stale page's uptodate bit
Bobi Jam [Tue, 20 Sep 2022 16:27:04 +0000 (00:27 +0800)]
LU-16160 llite: clear stale page's uptodate bit

On the truncate_inode_page()->do_invalidatepage()->ll_invalidatepage()
call path, before the vmpage is deleted from the page cache, the page
can still be picked up by ll_read_ahead_page()->grab_cache_page_nowait().

If ll_invalidatepage()->cl_page_delete() does not clear the vmpage's
uptodate bit, read-ahead could pick the page up and wrongly treat it
as already uptodate.

In ll_fault()->vvp_io_fault_start()->vvp_io_kernel_fault(),
filemap_fault() calls ll_readpage() to read the vmpage and waits for
the vmpage to be unlocked. After ll_readpage() has successfully read
and unlocked the vmpage, memory pressure or truncate can get in and
delete the cl_page; filemap_fault() then finds that the vmpage is not
uptodate and returns VM_FAULT_SIGBUS. To fix this, the patch makes
vvp_io_kernel_fault() restart filemap_fault() to get an uptodate
vmpage again.

Lustre-change: https://review.whamcloud.com/48607
Lustre-commit: 5b911e03261c3de6b0c2934c86dd191f01af4f2f

Test-Parameters: testlist=sanityn env=ONLY="16f",ONLY_REPEAT=50
Test-Parameters: testlist=sanityn env=ONLY="16g",ONLY_REPEAT=50
Test-Parameters: testlist=sanityn env=ONLY="16f 16g",ONLY_REPEAT=50
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I369e1362ffb071ec0a4de3cd5bad27a87cff5e05
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49131
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16304 kernel: kernel update RHEL8.7 [4.18.0-425.3.1.el8]
Jian Yu [Wed, 16 Nov 2022 19:56:58 +0000 (11:56 -0800)]
LU-16304 kernel: kernel update RHEL8.7 [4.18.0-425.3.1.el8]

Update RHEL8.7 kernel to 4.18.0-425.3.1.el8.

Lustre-change: https://review.whamcloud.com/49080
Lustre-commit: TBD (from 8900b469b4d521361d31ca96fed23c49a141fe93)

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I13e6d83ada1ec0c4da92f307bf56db5281c41892
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16294 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.28.1]
Jian Yu [Thu, 10 Nov 2022 18:45:10 +0000 (10:45 -0800)]
LU-16294 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.28.1]

Update SLES15 SP4 kernel to 5.14.21-150400.24.28.1 for Lustre client.

Lustre-change: https://review.whamcloud.com/49046
Lustre-commit: TBD (from 6573047b9b577a908ee3ea4ce0904d34cd867912)

Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 101j 244a" testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I651894274a09b6240f321e787736d298c5dc41ce
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49104
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list
Alexandre Ioffe [Thu, 17 Nov 2022 22:29:40 +0000 (14:29 -0800)]
EX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list

This patch adds hot-pools test 72 into ALWAYS_EXCEPT list before
it gets a real fix.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I5d73cb38d08533656c64b69f814f1d34e5e667ff
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49184
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5758 lipe: complete recovery before hotpools start
Arthur Novik [Wed, 5 Oct 2022 05:16:48 +0000 (22:16 -0700)]
EX-5758 lipe: complete recovery before hotpools start

Add Pacemaker location rules for lamigo and lpurge that allow
these resources to start only after OST/MDT recovery completes.
This is conditional on a newer Lustre Resource Agent being installed.

Lustre-change: https://review.whamcloud.com/48248
Lustre-commit: f093aef6cbc1a02f8a1b8795f79a4c6d10137a30

Test-Parameters: trivial testlist=hot-pools
Change-Id: Icb3405ca55d5ae940d978b16461d8d4bc2d4d623
Signed-off-by: Arthur Novik <artur_novik@epam.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49142
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-6379 lipe: add dump_fids option in help
Alexandre Ioffe [Tue, 15 Nov 2022 07:16:25 +0000 (23:16 -0800)]
EX-6379 lipe: add dump_fids option in help

Add the missing dump_fids command line option to the
command line help.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I197fb7beb3e8712736fa29bb49d2df1ee4517616
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49161
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13176 mdd: rename file with different project ID
Hongchao Zhang [Tue, 11 Jan 2022 15:12:55 +0000 (23:12 +0800)]
LU-13176 mdd: rename file with different project ID

This patch relaxes the restriction on rename between different
project IDs and allows normal file rename between
directories with different project IDs.

Lustre-change: https://review.whamcloud.com/45660
Lustre-commit: 88c26912a3237fb63923bbb7c7b09111f9f30bbe

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I4a2c21248d1e12ad1d00430e11e5dd50fe5eaf60
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49056
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15435 ptlrpc: unregister reply buffer on rq_err
Alexander Zarochentsev [Fri, 14 Jan 2022 15:35:48 +0000 (10:35 -0500)]
LU-15435 ptlrpc: unregister reply buffer on rq_err

Unregister reply buffer on rq_err and prevent a late reply from
modifying request flags in INTERPRET state.

Fixes: cefabee52586 ("LU-15112 mgc: do not ignore target registration failure")
HPE-bug-id: LUS-10717

Lustre-change: https://review.whamcloud.com/46132
Lustre-commit: d8012811cc6ff9c7f0fb1ddfec9461e9ff963e54

Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I0106e3fd5443c1292c103247cdbf6122f91922e8
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49090
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16222 kernel: RHEL 8.7 client and server support
Jian Yu [Fri, 11 Nov 2022 23:17:19 +0000 (15:17 -0800)]
LU-16222 kernel: RHEL 8.7 client and server support

This patch makes changes to support RHEL 8.7 release
with kernel 4.18.0-423.el8 for Lustre client and server.

Lustre-change: https://review.whamcloud.com/48879
Lustre-commit: 293844d132b79a1d256ed4200d5dbd8bb790bfb4

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Change-Id: Ie97ff67c9a5fbd46bc145ab559665dcbc630b4a0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Co-Authored-By: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49000
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn66
Andreas Dilger [Fri, 11 Nov 2022 09:46:43 +0000 (02:46 -0700)]
RM-620 build: New tag 2.14.0-ddn66

New tag 2.14.0-ddn66

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I04f5c407499930a1893d32c0c699c438264dcaf5

2 years agoEX-6298 lipe: Decrease wait time to reconnect to ALR
Alexandre Ioffe [Thu, 10 Nov 2022 19:42:33 +0000 (11:42 -0800)]
EX-6298 lipe: Decrease wait time to reconnect to ALR

1) Make the delay between reconnection attempts to ALR increase
gradually, starting from as little as 5 seconds, when the ssh
session to ALR fails. This makes reconnection attempts more
frequent initially.
2) Re-enable hot-pools test 72, which was previously excepted.
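
A gradually increasing delay of this kind could look like the sketch
below. Only the 5 second starting point comes from the description above;
the doubling step, the 60 second cap and the name demo_reconnect_delay()
are assumptions, not the policy implemented in the patch.

```c
/* Hypothetical policy: start at 5 s, double after each consecutive
 * failure, and never wait longer than the cap. */
#define DEMO_RECONNECT_MIN_SEC	5u
#define DEMO_RECONNECT_MAX_SEC	60u	/* assumed upper bound */

/* Return the delay in seconds to wait before the next reconnection
 * attempt, given how many consecutive attempts have already failed. */
unsigned int demo_reconnect_delay(unsigned int failed_attempts)
{
	unsigned int delay = DEMO_RECONNECT_MIN_SEC;

	/* Stop doubling once the cap is reached, so the loop stays short
	 * and the value cannot overflow for large attempt counts. */
	while (failed_attempts-- > 0 && delay < DEMO_RECONNECT_MAX_SEC)
		delay *= 2;

	return delay < DEMO_RECONNECT_MAX_SEC ? delay : DEMO_RECONNECT_MAX_SEC;
}
```

With these assumed constants the first retries are spaced 5, 10, 20 and
40 seconds apart, after which every retry waits 60 seconds.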

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools mdtcount=6 env=ONLY=72,ONLY_REPEAT=40
Change-Id: Iafae62d733390f92370f5d224830944f285da934
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49106
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6279 lipe: need python and pylint for all builds
Lei Feng [Tue, 1 Nov 2022 23:12:53 +0000 (07:12 +0800)]
EX-6279 lipe: need python and pylint for all builds

Check that python and pylint are available for all builds.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I7e93ec3cdd51d96ed938f6fa85953b9e3f250877
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16293 kernel: kernel update RHEL9.0 [5.14.0-70.30.1.el9_0]
Jian Yu [Tue, 8 Nov 2022 18:40:24 +0000 (10:40 -0800)]
LU-16293 kernel: kernel update RHEL9.0 [5.14.0-70.30.1.el9_0]

Update RHEL9.0 kernel to 5.14.0-70.30.1.el9_0 for Lustre client.

Lustre-change: https://review.whamcloud.com/49044
Lustre-commit: TBD (from 247849f22a32e85eb8b718d18642f65ac7663a82)

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="101j 130 244a" testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ide942f88242c80af1e103b226b65cfbce94bfb57
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49074
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15935 target: keep track of multirpc slots in last_rcvd
Etienne AUJAMES [Fri, 29 Jul 2022 12:35:33 +0000 (14:35 +0200)]
LU-15935 target: keep track of multirpc slots in last_rcvd

OBD_INCOMPAT_MULTI_RPCS is cleared by tgt_boot_epoch_update() if the
recovery is aborted. This assumes that all the clients are evicted,
but that is not always true: some clients could have successfully
finished their recovery. In that case, those clients will keep their
last_rcvd slot.

This patch modifies lut_num_client to keep track of multirpc
slots in last_rcvd.
For now the counter is used only by tgt_fini() to clear
OBD_INCOMPAT_MULTI_RPCS, so this use can be extended to
tgt_boot_epoch_update().

Add replay-dual test_33.

Lustre-change: https://review.whamcloud.com/48082
Lustre-commit: 1a79d395dd61ea2e21598bfaa5b39375e64ec22c

Test-Parameters: testlist=replay-dual env=ONLY=33,ONLY_REPEAT=30
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I70791c9dcb7cc77f018b9e5c95568598d54f0322
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15404 ldiskfs: truncate during setxattr leads to kernel panic
Andrew Perepechko [Thu, 10 Nov 2022 04:59:27 +0000 (20:59 -0800)]
LU-15404 ldiskfs: truncate during setxattr leads to kernel panic

When changing a large xattr value to a different large xattr value,
the old xattr inode is freed. Truncate during the final iput causes
the current transaction to restart. Eventually, the parent inode bh is
marked dirty and a kernel panic happens when jbd2 figures out that this
bh belongs to the committed transaction.

A possible fix is to call this final iput in a separate thread.
This way, setxattr transactions will never be split into two.
Since the setxattr code adds xattr inodes with nlink=0 into the
orphan list, old xattr inodes will be properly cleaned up in
any case.

Lustre-change: https://review.whamcloud.com/46358
Lustre-commit: e239a14001b62d96c186ae2c9f58402f73e63dcc

Change-Id: Idd70befa6a83818ece06daccf9bb6256812674b9
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-10534
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48999
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16251 obdclass: fill jobid in a safe way
Lei Feng [Wed, 19 Oct 2022 04:10:23 +0000 (12:10 +0800)]
LU-16251 obdclass: fill jobid in a safe way

jobid_interpret_string() does not fill jobid atomically.
So in lustre_get_jobid(), pass it a temporary buffer first, then copy
the buffer to jobid as a whole.
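
The pattern can be shown with a user-space sketch. The demo_* functions,
the buffer size and the "program.uid" format are hypothetical stand-ins
rather than the kernel code. Note that memcpy() is not atomic in a strict
sense; the point is only that the destination never holds a half-built
string and is left untouched on failure.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_JOBID_SIZE	32	/* assumed buffer size */

/* Hypothetical formatter standing in for a function that builds the
 * string piece by piece. Returns 0 on success, -1 if it does not fit. */
static int demo_interpret(const char *prog, unsigned int uid,
			  char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s.%u", prog, uid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build the jobid in a private buffer, then publish it in one copy. */
int demo_get_jobid(const char *prog, unsigned int uid,
		   char jobid[DEMO_JOBID_SIZE])
{
	char tmp[DEMO_JOBID_SIZE] = "";

	if (demo_interpret(prog, uid, tmp, sizeof(tmp)) != 0)
		return -1;	/* jobid keeps its previous value */

	memcpy(jobid, tmp, sizeof(tmp));
	return 0;
}
```

On failure the caller still sees the previous, complete jobid rather than
a truncated one.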

Lustre-change: https://review.whamcloud.com/48915
Lustre-commit: 9a0a89520e8b57bd63a9343fe3cdc56c61c41f6d

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ib8f6aaa93df31867982a0d142f33d7374a27234f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49081
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode
Alexander Zarochentsev [Fri, 29 Jul 2022 19:38:09 +0000 (22:38 +0300)]
LU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode

The flag should be cleared for "fast" symlinks otherwise
e2fsck complains about inode correctness.
New agent inodes of symlink type may have EXT4_EXTENT_FL flag
set if the fs has "extent" feature and it is not cleared as in
other places where "fast" symlinks are created.

Lustre-change: https://review.whamcloud.com/48093
Lustre-commit: 73ac8e35e5d64d3fe4ca6c48514dc57058e3a7b8

HPE-bug-id: LUS-10237
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib7b807bb1298cc3a9fd4fdba35747b4bda6fe034
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49016
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16258 llite: Explicitly support .splice_write
Shaun Tancheff [Fri, 21 Oct 2022 04:54:49 +0000 (23:54 -0500)]
LU-16258 llite: Explicitly support .splice_write

Linux commit v5.9-rc1-6-g36e2c7421f02
  fs: don't allow splice read/write without explicit ops

Lustre supports splice_write and previously provided handlers
for splice_read.
Explicitly use iter_file_splice_write, if it exists.

Lustre-change: https://review.whamcloud.com/48928
Lustre-commit: c619b6d6a54235cc0e34a65cf5916a632f4011c3

HPE-bug-id: LUS-11259
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I858688fc9b4dd370b6018c3b134f01e580477b25
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16207 build: add rpm-build BuildRequires for SLES15 SP3
Jian Yu [Tue, 4 Oct 2022 16:24:36 +0000 (09:24 -0700)]
LU-16207 build: add rpm-build BuildRequires for SLES15 SP3

SLES15 SP3 fails to build using rpm-build-4.14.1-29.46
from the main O/S repository with error message:

- Dependency tokens must begin with alpha-numeric,
  '_' or '/': BuildRequires: %kernel_module_package_buildreqs

Updating rpm-build to 4.14.3-150300.46.1 or higher
resolved the build issue.

Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity

Lustre-change: https://review.whamcloud.com/48760
Lustre-commit: 78c681d9f42cb56e30c8946e5d7b05f0bc6e86f2

Change-Id: I80099e7ba2d98e07b9877183879766f3dd7f3c1a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49079
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5473 tests: add version check for interop
Minh Diep [Wed, 9 Nov 2022 21:21:32 +0000 (13:21 -0800)]
EX-5473 tests: add version check for interop

sanity-quota test_75 on 2.12 servers

Test-Parameters: trivial testlist=sanity-quota

Change-Id: I57f5b6415017ec7cf81e3bcb43f289087a8621fd
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6331 lipe: lamigo --help causes Segmentation fault
Alexandre Ioffe [Tue, 8 Nov 2022 18:32:25 +0000 (10:32 -0800)]
EX-6331 lipe: lamigo --help causes Segmentation fault

Fix the NULL string argument passed to printf, which caused the seg fault
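
Passing NULL for a %s conversion is undefined behavior in C, and it can
crash depending on the libc and on how the compiler transforms the call.
One common guard is sketched below; the helper names and the help-line
format are hypothetical and are not taken from lamigo.

```c
#include <stdio.h>

/* Return 's' if it is set, otherwise a printable fallback. */
static const char *demo_str_or(const char *s, const char *fallback)
{
	return s != NULL ? s : fallback;
}

/* Format one help line; 'dflt' may be NULL when an option has no
 * default value. Returns what snprintf() returns. */
int demo_format_option(char *out, size_t len, const char *name,
		       const char *dflt)
{
	return snprintf(out, len, "  --%s (default: %s)\n",
			name, demo_str_or(dflt, "none"));
}
```

Routing every optional string through one such helper keeps the help
output safe even when a default is unset.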

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I0a9bc3cee308c8cd88d23674bb5127cddb1fdb41
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>