Whamcloud - gitweb
fs/lustre-release.git
2 years ago  LU-11912 fid: clean up OBIF_MAX_OID and IDIF_MAX_OID
Li Dongyang [Tue, 23 Nov 2021 23:45:48 +0000 (10:45 +1100)]
LU-11912 fid: clean up OBIF_MAX_OID and IDIF_MAX_OID

Define the OBIF|IDIF_MAX_OID macros as (1ULL << OBIF|IDIF_MAX_BITS) - 1.
Clean up the callers and remove OBIF|IDIF_OID_MASK, which are unused.
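
Operator precedence matters for this definition: in C, as in Python,
binary minus binds tighter than the left shift, so the shift needs its
own parentheses. A minimal sketch of the arithmetic, assuming bit widths
of 32 and 48 (illustrative values, not taken from this commit message):

```python
# Illustrative bit widths; these are assumptions, not values from the patch.
OBIF_MAX_BITS = 32
IDIF_MAX_BITS = 48


def max_oid(bits: int) -> int:
    """Largest object ID that fits in `bits` bits: (1 << bits) - 1."""
    return (1 << bits) - 1


OBIF_MAX_OID = max_oid(OBIF_MAX_BITS)
IDIF_MAX_OID = max_oid(IDIF_MAX_BITS)

# Without parentheses, `1 << bits - 1` parses as `1 << (bits - 1)`.
unparenthesized = 1 << OBIF_MAX_BITS - 1

print(hex(OBIF_MAX_OID))     # 0xffffffff
print(hex(IDIF_MAX_OID))     # 0xffffffffffff
print(hex(unparenthesized))  # 0x80000000
```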

Lustre-change: https://review.whamcloud.com/45659
Lustre-commit: bb2f0dac868cf1321277bc3d7d6fc71f016d921b

Test-Parameters: trivial
Change-Id: I9a679b930c73da5904b2eb4c74f785fc1d27a8a0
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50759
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  LU-15670 clio: Disable lockless for DIO with O_APPEND
Shaun Tancheff [Tue, 22 Mar 2022 13:08:35 +0000 (08:08 -0500)]
LU-15670 clio: Disable lockless for DIO with O_APPEND

Lockless O_DIRECT with O_APPEND can allow interleaved / racy
appends from concurrent I/O.

Disable lockless I/O when O_APPEND is set.
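
The rule can be sketched as a flag check. This is a simplified
illustration only: may_use_lockless_dio() is a hypothetical helper, and
the real client has further conditions that are not modelled here.

```python
import os

# os.O_DIRECT is not available on every platform; the fallback value is
# the x86 Linux constant and is only there so the sketch runs elsewhere.
O_DIRECT = getattr(os, "O_DIRECT", 0o40000)


def may_use_lockless_dio(open_flags: int) -> bool:
    """Hypothetical helper: lockless I/O only for O_DIRECT without O_APPEND."""
    is_direct = bool(open_flags & O_DIRECT)
    is_append = bool(open_flags & os.O_APPEND)
    return is_direct and not is_append


print(may_use_lockless_dio(os.O_WRONLY | O_DIRECT))                # True
print(may_use_lockless_dio(os.O_WRONLY | O_DIRECT | os.O_APPEND))  # False
```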

Lustre-change: https://review.whamcloud.com/46890
Lustre-commit: 649d638467c0375797cd59ab7c9ac4113e6c682e

HPE-bug-id: LUS-9776
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I5c56f92c90e631c295f56e5958985f516e1990f8
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49666
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  EX-7544 lipe: set lpcc interval to 30 by default
Lei Feng [Fri, 19 May 2023 05:01:06 +0000 (13:01 +0800)]
EX-7544 lipe: set lpcc interval to 30 by default

Set the default value of interval in lpcc.conf and lpcc_purge
to 30 seconds everywhere.

Test-Parameters: trivial testlist=sanity-pcc
Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Iaee812bdc0d3c4d04549092e5c8aa78b0a2f1dbd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51058
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-16331 utils: fix 'lfs find -O <uuid>' with gaps
Andreas Dilger [Tue, 22 Nov 2022 05:40:03 +0000 (22:40 -0700)]
LU-16331 utils: fix 'lfs find -O <uuid>' with gaps

Fix the UUID parsing in llapi_get_target_uuids() so that the OST
UUIDs are put into the right slots when there is a gap in numbering.
Otherwise, "lfs find -O <uuid>" will not be able to find the given
UUID if it is the first OST after the gap.
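
The idea of keying each UUID by its own index, rather than by its
position in the listing, can be sketched as follows. The input format
"index: uuid status" and both helper names are assumptions made for
illustration; this is not the code of llapi_get_target_uuids().

```python
def parse_target_uuids(lines):
    """Map each target index to its UUID, preserving gaps in the numbering."""
    slots = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        index_str, rest = line.split(":", 1)
        # Store by the parsed index, not by how many lines were seen so far.
        slots[int(index_str)] = rest.split()[0]
    return slots


def find_index(slots, uuid):
    """Return the target index holding `uuid`, or None if it is absent."""
    for index, candidate in slots.items():
        if candidate == uuid:
            return index
    return None


sample = [
    "0: testfs-OST0000_UUID ACTIVE",
    "1: testfs-OST0001_UUID ACTIVE",
    "10: testfs-OST000a_UUID ACTIVE",  # first OST after the gap
]
slots = parse_target_uuids(sample)
print(find_index(slots, "testfs-OST000a_UUID"))  # 10
```

Filling slots sequentially would have put the last UUID in slot 2, which
is the kind of mismatch the commit message describes.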

Add test case for 'lfs find' and 'lfs getstripe' with large/sparse
OST indices in conf-sanity test_81.

Lustre-change: https://review.whamcloud.com/49207
Lustre-commit: 05334b90a5d3ddd6c8eabc3683fd487f47df6e35

Test-Parameters: trivial testlist=conf-sanity env=ONLY="81-82"
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia0581f85f016c202514148114924509118a0f792
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51076
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  EX-7553 - reduce b_es6_0 failover-part-3 duration
Charlie Olmstead [Tue, 23 May 2023 15:18:53 +0000 (09:18 -0600)]
EX-7553 - reduce b_es6_0 failover-part-3 duration

Reduced recovery-mds-scale duration to 12 hours

Test-Parameters: trivial
Test-Parameters: austeroptions="-R" clientcount=4 clientdistro=el7.9 \
                 env=SLOW=yes failover=true iscsi=1 mdscount=2 \
                 mdssizegb=10 osscount=2 ostsizegb=2 serverdistro=el7.9 \
                 testgroup=failover-part-3

Change-Id: I791044863b5c52f44a0db36e816581b9d4655287
Signed-off-by: Charlie Olmstead <charlie@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51099
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000
Li Dongyang [Tue, 25 Jan 2022 00:53:33 +0000 (11:53 +1100)]
LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000

Fix the tests assuming that objects created for MDT0000
always have a seq number of 0, to prepare for
deprecating the IDIF sequence.

Fix sanity test_312 on ZFS to properly identify which
OST the object was created on, but do not re-enable it.

Lustre-change: https://review.whamcloud.com/46293
Lustre-commit: eaae4655567b16260237764dadb7ab57df8b0edd
Lustre-change: https://review.whamcloud.com/49720
Lustre-commit: 8767d2e44110fc19e624e963d5ebc788409339d3

Test-Parameters: trivial
Test-Parameters: testlist=sanity-scrub env=ONLY=19
Test-Parameters: testlist=sanity-sec env=ONLY=37
Change-Id: I4bffabe25a6f84cdba760aabea1da3429715a283
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50756
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  LU-15252 mdt: reduce contention at mdt_lsom_update
Alexander Boyko [Thu, 2 Dec 2021 09:43:54 +0000 (04:43 -0500)]
LU-15252 mdt: reduce contention at mdt_lsom_update

mot_som_mutex serializes all close requests with lsom updates for
the same mdt_object. For a massive open/read/close load on a single
shared file, this leads to a high load average because many threads
sleep on the mutex.
This patch introduces a cached lsom size and takes the mutex in the
update path only. Close requests with an lsom size less than or equal
to the cached size do not take the mutex at all.
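
The fast path can be sketched like this. LsomCache and its fields are
hypothetical stand-ins for the per-object state, and the acquisition
counter exists only to make the effect visible.

```python
import threading


class LsomCache:
    """Hypothetical stand-in for the per-object cached lsom size."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.cached_size = 0
        self.mutex_acquisitions = 0  # for illustration only

    def close_update(self, size: int) -> bool:
        """Return True if the stored size was updated by this close."""
        # Fast path: a size that does not grow needs no mutex at all.
        if size <= self.cached_size:
            return False
        with self._mutex:
            self.mutex_acquisitions += 1
            # Re-check under the mutex; another closer may have won the race.
            if size <= self.cached_size:
                return False
            self.cached_size = size
            return True


cache = LsomCache()
results = [cache.close_update(s) for s in (4096, 4096, 1024, 8192)]
print(results, cache.mutex_acquisitions)  # [True, False, False, True] 2
```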

Test results: MPI open/flock/funlock/close SSF
10 iterations, 10 nodes, 100 threads each, 1000 file ops per thread

       close time (secs)    MDT load average
       master  patch        master  patch
avg    0.142   0.086        47.05   38.89
max    0.164   0.129        49.39   44.77
min    0.097   0.041        44.44   34.7

Lustre-change: https://review.whamcloud.com/45709
Lustre-commit: c8b7afe4970415f8dae84f5e20661f8a3b3681a0

Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I807b468b128295df9391b0467e74d4f10240662e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51030
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  EX-6841 tests: do not truncate over s_maxbytes
Li Dongyang [Wed, 10 May 2023 00:18:20 +0000 (10:18 +1000)]
EX-6841 tests: do not truncate over s_maxbytes

sanity-lipe-scan3/104 truncates file to MAX_LFS_FILESIZE
but it should not go over s_maxbytes.

Fixes: c05dbbbbca ("EX-4015 lipe: add lipe_scan3")
Test-Parameters: trivial testlist=sanity-lipe-scan3 env=ONLY=104
Change-Id: I4a14befb0dfc34b2611052850b77a0841ff853aa
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50899
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-16529 test: verify the grant > 0 before test
Hongchao Zhang [Tue, 18 Apr 2023 18:58:26 +0000 (02:58 +0800)]
LU-16529 test: verify the grant > 0 before test

In sanity-quota test 84, the grant should be larger than 0 after
the dd completes. This patch adds a check for that and forces the
quota to be re-integrated during the check.

Lustre-change: https://review.whamcloud.com/50799
Lustre-commit: fdffbd3d7842a2f99e5fcdcb5b6e6766949f6333

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I9cddcf0c4afc12a11f3535792262ebb35a1e480e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51094
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  LU-16674 obdclass: optimize job_stats reads
Etienne AUJAMES [Tue, 28 Mar 2023 19:46:24 +0000 (21:46 +0200)]
LU-16674 obdclass: optimize job_stats reads

This patch has 2 objectives:

1/ limit the lock time on ojs_list (list of job stats)

"lctl get_param mdt.*.job_stats" can not dump job_stats in a single
read (seq_file buffer is limited to 4k). So, several reads are needed
to dump the full job list.
For each read, we have to find the job entry corresponding to the file
offset. Currently, we walk ojs_list from the beginning to find this
entry.

This patch saves the last known entry and its offset so that the next
read can start from there.
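
A minimal sketch of the saved-position idea, using a plain linked list
in place of ojs_list. All class and field names are hypothetical, each
entry is assumed to occupy one offset, and the locking is left out.

```python
class Job:
    """One entry of a singly linked job list."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.next = None


def build_list(ids):
    """Link the given job IDs together and return the head node."""
    head = prev = None
    for job_id in ids:
        node = Job(job_id)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head


class JobStatsReader:
    """Remembers the last entry served and its offset between reads."""

    def __init__(self, head):
        self.head = head
        self.last_node = None
        self.last_offset = 0
        self.steps = 0  # entries walked, for illustration only

    def seek(self, offset):
        # Resume from the saved entry when reading forward, else restart.
        if self.last_node is not None and offset >= self.last_offset:
            node, pos = self.last_node, self.last_offset
        else:
            node, pos = self.head, 0
        while node is not None and pos < offset:
            node = node.next
            pos += 1
            self.steps += 1
        if node is not None:
            self.last_node, self.last_offset = node, pos
        return node


reader = JobStatsReader(build_list([f"job.{i}" for i in range(1000)]))
for offset in range(0, 1000, 100):
    reader.seek(offset)
print(reader.steps)  # 900
```

Walking from the head on every one of these ten reads would visit
0 + 100 + ... + 900 = 4500 entries instead of 900.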

2/ avoid the lock contention when reading job_stats

This patch replaces the read lock on ojs_lock with RCU locking, so
that userspace processes reading job_stats do not interfere with the
kernel target threads.

Add the stress test sanity 205g to check for possible races.
Add stack_trap in sanity test 205a and 205e to restore jobid_name and
jobid_var.

* Performance *

The following command is used to capture records:
$ time grep -c job_id /proc/fs/lustre/mdt/lustrefs-MDT0000/job_stats

- job_stats dump with no fs activity

Here are results after ending sanity test 205g with slow mode and
job_cleanup_interval=300s.
               ___________________________________
              | nbr of job | time | rate          |
 _____________|____________|______|_______________|
|without patch| 14749      | 1.3s | 11345 jobid/s |
|_____________|____________|______|_______________|
|with patch   | 22209      | 0.6s | 37015 jobid/s |
|_____________|____________|______|_______________|
|diff %       | +43%       | -54% | +226%         |
|_____________|____________|______|_______________|

- job_stats dump with fs activity

Here are results before ending sanity test 205g with slow mode and
job_cleanup_interval=300s.
               ___________________________________
              | nbr of job | time | rate          |
 _____________|____________|______|_______________|
|without patch| 14849      | 2.3s | 6428  jobid/s |
|_____________|____________|______|_______________|
|with patch   | 22776      | 1.2s | 18823 jobid/s |
|_____________|____________|______|_______________|
|diff %       | +53%       | -47% | +192%         |
|_____________|____________|______|_______________|

Lustre-change: https://review.whamcloud.com/50459
Lustre-commit: c6890a955f89508db46fd8ffbf22b05b145976cd

Test-Parameters: testlist=sanity env=SLOW=yes,ONLY=205g,ONLY_REPEAT=10
Test-Parameters: testlist=sanity env=ONLY=205g serverversion=2.14.0
Test-Parameters: testlist=sanity env=SLOW=yes,ONLY=205
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: Ic4cd90965720af76eff0ed4e00ca897518bfbc66
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51110
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  LU-16712 cksum: fix generating T10PI guard tags for partialbrw
Li Dongyang [Wed, 5 Apr 2023 02:54:13 +0000 (12:54 +1000)]
LU-16712 cksum: fix generating T10PI guard tags for partialbrw

To get better performance, we allocate a page as the buffer for
T10PI guard tags and fill it while going over every page of the brw.
When the buffer is considered full, we use
cfs_crypto_hash_update_page() to update the hash and reuse the buffer.

This doesn't work when a page in the brw gets clipped and the
checksum sector size is 512. For a page with PAGE_SIZE of 4096
and offset at 1024, we end up with 6 guard tags, and then don't have
enough space at the very end of the buffer for a full brw page, which
needs 8.

Work out the number of guard tags for each page, update the
checksum hash and reuse the buffer when needed.
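
The per-page accounting can be sketched as below. The sketch assumes
2-byte guard tags, so a one-page buffer holds 2048 of them, and it
counts the hash updates instead of performing them. The function names
are hypothetical.

```python
SECTOR_SIZE = 512      # checksum sector size from the example above
PAGE_SIZE = 4096
GUARD_TAG_BYTES = 2    # assumption: a guard tag is a 16-bit CRC
BUFFER_TAGS = PAGE_SIZE // GUARD_TAG_BYTES  # tags fitting in the buffer page


def guard_tags_for(offset: int, length: int) -> int:
    """Number of sectors, and so guard tags, covered by one page fragment."""
    first = offset // SECTOR_SIZE
    last = (offset + length - 1) // SECTOR_SIZE
    return last - first + 1


def count_flushes(fragments):
    """Fill the tag buffer per fragment; return (total tags, hash updates)."""
    used = flushes = total = 0
    for offset, length in fragments:
        needed = guard_tags_for(offset, length)
        # Flush before this page if its tags would not fit any more.
        if used + needed > BUFFER_TAGS:
            flushes += 1
            used = 0
        used += needed
        total += needed
    if used:
        flushes += 1  # final update for the partially filled buffer
    return total, flushes


clipped = (1024, PAGE_SIZE - 1024)
print(guard_tags_for(*clipped))                           # 6
print(count_flushes([clipped] + [(0, PAGE_SIZE)] * 256))  # (2054, 2)
```

With one clipped page followed by 255 full pages the buffer holds 2046
tags, leaving room for 2 while the next full page needs 8, so the
buffer has to be flushed before that page is processed.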

Lustre-change: https://review.whamcloud.com/50540
Lustre-commit: 3999627447c01eebd96c14cc5cf8bba93f89a66b

Change-Id: Ic591e63b24534f2a42b670669520895cb35a9546
Fixes: b1e7be00cb ("LU-10472 osc: add T10PI support for RPC checksum")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51079
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  LU-12353 ldiskfs: add ext4-dquot-commit-speedup patch to more series
Jian Yu [Fri, 12 May 2023 19:10:04 +0000 (12:10 -0700)]
LU-12353 ldiskfs: add ext4-dquot-commit-speedup patch to more series

Add ext4-dquot-commit-speedup.patch to RHEL 8.x ldiskfs patch series.

Lustre-change: https://review.whamcloud.com/50853
Lustre-commit: TBD (from 06da805983c298f0957decfdb1d08cf7c39fd99b)

Test-Parameters: trivial clientdistro=el8.7 serverdistro=el8.7 testlist=sanity
Change-Id: Ib0ac325bde442b4eafedde9ba44984b02d5ea061
Fixes: dad25f258e50 ("LU-12353 ldiskfs: speedup quota journalling")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50984
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  RM-620 build: New tag 2.14.0-ddn87
Andreas Dilger [Sat, 13 May 2023 00:01:58 +0000 (18:01 -0600)]
RM-620 build: New tag 2.14.0-ddn87

New tag 2.14.0-ddn87

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6450c71a5e3ada3e0f63670867420a57c88bde5d

2 years ago  LU-16649 llite: EIO is possible on a race with page reclaim
Patrick Farrell [Fri, 12 May 2023 14:22:58 +0000 (10:22 -0400)]
LU-16649 llite: EIO is possible on a race with page reclaim

We must clear the 'uptodate' page flag when we delete a
page from Lustre, or stale reads can occur.  However,
generic_file_buffered_read requires any pages returned from
readpage() be uptodate.

So, we must retry reading if page truncation happens in
parallel with the read.

This implements the same fix as:
https://review.whamcloud.com/49647
b4da788a819f82d35b685d6ee7f02809c05ca005

did for the mmap path.

Lustre-change: https://review.whamcloud.com/50344
Lustre-commit: 1d98e5c32b41e19bb1247958e666bb66e69dbc4c

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iae0d1eb343f25a0176135347e54c309056c2613a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50346
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  Revert "LU-14541 llite: Check vmpage in releasepage"
Patrick Farrell [Fri, 12 May 2023 14:22:09 +0000 (10:22 -0400)]
Revert "LU-14541 llite: Check vmpage in releasepage"

This reverts commit c524079f4f59a39b99467d9868ee4aafdcf033e9,
because it breaks releasepage for Lustre and does not
completely fix the data consistency issue in LU-14541.

Breaking releasepage matters because it prevents direct I/O
from working if there is page cache data present, and
because it causes similar issues with GDS, which must be
able to flush page cache pages before doing I/O.

With patches:
"LU-16160 llite: SIGBUS is possible on a race with page reclaim"/
d9c23a7934747eb19e23470b30806482a1aa60f8
and
"LU-14541 llite: Check for page deletion after fault"/
19678e30147f50f813e72e8216cfb0453fe0ca6e
LU-14541 is fully resolved, so we can revert this patch.

Lustre-change: https://review.whamcloud.com/49654
Lustre-commit: e3cfb688ed7116a57b2c7f89a3e4f28291a0b69f

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I613bdb4f27161ffc3638d1d8ea38827af5a7bd47
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50304
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  EX-7395 pcc: use llapi_open_by_fid to check pinned files
Qian Yingjin [Wed, 26 Apr 2023 09:18:18 +0000 (05:18 -0400)]
EX-7395 pcc: use llapi_open_by_fid to check pinned files

When checking whether a file was pinned in the PCC backend, it reported:
"cannot read or parse pin xattr of file
'/lustre/fsr/.lustre/fid/[0x780001b83:0x2138:0x0]'.: No such file
or directory (2)"

The reason for the failure is that open by FID is not configured for
subdirectory mounts.
In this patch, we use llapi_open_by_fid (which supports
subdirectory mounts) to open the file and avoid this error.

Change-Id: If0120d745418836cfdf449a795b6f524c40f9c27
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50770
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-11170 tests: don't fail sanity/415 in VM
Andreas Dilger [Mon, 17 Apr 2023 08:34:12 +0000 (02:34 -0600)]
LU-11170 tests: don't fail sanity/415 in VM

Don't fail sanity test_415 when running in a VM due to variable
runtimes for the tests.

A proper solution would be to examine the logs to determine if
the renames are blocked or just all slow due to VM contention.

Lustre-change: https://review.whamcloud.com/50654
Lustre-commit: 73a7b1c2a3f0114618db7781adb56974ed682f24

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1a5d0f601705c9ec8559e760c4ec27c7f83ebbe5
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50938
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  LU-10733 tests: increase conf-sanity/106 OST size
Andreas Dilger [Thu, 20 Apr 2023 22:13:54 +0000 (16:13 -0600)]
LU-10733 tests: increase conf-sanity/106 OST size

conf-sanity test_106 is trying to create ~64k files, but OST0000
only has about 48k objects in this case, so the file creates are
failing during the test.  This makes the test somewhat unreliable
and hitting errors not related to what was originally intended
(llog wrap handling).

Increase the OSTSIZE for this test to handle the number of objects
needed by the test so it can run more reliably.

Lustre-change: https://review.whamcloud.com/50732
Lustre-commit: 334d780617561c66c91697fb1681ce24b5379387

Test-Parameters: trivial testlist=conf-sanity env=ONLY=106
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie33825801172ea565d9d1d5fb81595d2cad65677
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50714
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  LU-16650 kernel: update RHEL 7.9 [3.10.0-1160.88.1.el7]
Jian Yu [Thu, 6 Apr 2023 07:00:16 +0000 (00:00 -0700)]
LU-16650 kernel: update RHEL 7.9 [3.10.0-1160.88.1.el7]

Update RHEL 7.9 kernel to 3.10.0-1160.88.1.el7.

Lustre-change: https://review.whamcloud.com/50553
Lustre-commit: bd0d79456b91db58a75eeb717c7805d78d8a9a1a

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I4119595943940cca94d1853b59c94a02fed8cb71
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50556
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-14699 tests: fix server version to check
Mikhail Pershin [Wed, 26 Apr 2023 09:26:29 +0000 (12:26 +0300)]
LU-14699 tests: fix server version to check

Tests 160g and 160s check the wrong Lustre version. This patch
corrects these checks to use the correct version.

Test-Parameters: trivial
Fixes: 735136ead955 (LU-14699 mdd: proactive changelog garbage collection)
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib6a78a55672629686b59f0227e81d78f28bb81ac
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50957
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-16739 uapi: make lustre_disk.h buildable in user land
James Simmons [Wed, 26 Apr 2023 15:12:48 +0000 (11:12 -0400)]
LU-16739 uapi: make lustre_disk.h buildable in user land

The rbac work introduced a regression that makes lustre_disk.h
UAPI header no longer buildable in user land. This is causing
sanity test 400b to fail with:

lustre_disk.h:266:18: error: 'LUSTRE_NODEMAP_NAME_LENGTH' undeclared here (not in a function)
  char   ncr_name[LUSTRE_NODEMAP_NAME_LENGTH + 1];
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
lustre_disk.h:267:20: error: 'ncr_flags' is narrower than values of its type [-Werror]
  enum nm_flag_bits ncr_flags:8;
                    ^~~~~~~~~
lustre_disk.h:267:20: error: field 'ncr_flags' has incomplete type
lustre_disk.h:268:21: error: 'ncr_flags2' is narrower than values of its type [-Werror]
  enum nm_flag2_bits ncr_flags2:8;
                     ^~~~~~~~~~
lustre_disk.h:268:21: error: field 'ncr_flags2' has incomplete type
lustre_disk.h:277:2: error: unknown type name 'lnet_nid_t'
  lnet_nid_t nrr_start_nid;
  ^~~~~~~~~~
lustre/lustre_disk.h:278:2: error: unknown type name 'lnet_nid_t'
  lnet_nid_t nrr_end_nid;
  ^~~~~~~~~~

To fix this, move several pieces of nodemap handling from lustre_idl.h
to lustre_disk.h.

The git commit 5e6a51787fef20b849682d8c49ec9c2beed5c373 for Linux
kernel version 6.2.0-rc5 made guid_t only available to kernel code.
The only UAPI data structure left is uuid_le. Thankfully MCE requires
this, otherwise even uuid_le would be removed. We will need to keep
an eye on this.

Lustre-change: https://review.whamcloud.com/50641
Lustre-commit: 5a6725d19d4037026d7cab2442b0c639d1511e5d

Test-Parameters: trivial testlist=sanity envdefinitions=ONLY=400b
Fixes: b3b61b85cad ("LU-16524 nodemap: add rbac property to nodemap")
Change-Id: I4b962572ec2bf76159a17807c564390ded00d630
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50912
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  EX-7449 pcc: output valid state for valid cached files
Qian Yingjin [Thu, 4 May 2023 09:33:35 +0000 (05:33 -0400)]
EX-7449 pcc: output valid state for valid cached files

There are two cases where the command 'lfs pcc state' reports
'none' for the PCC status of a file:
- The file has not been cached into PCC at all. When the file is read,
  data will come from the remote Lustre filesystem.
- The file was cached into PCC but the system cache was dropped on the
  client later (e.g. 'sysctl -w vm.drop_caches=3'). When the file is
  read, the file layout version needs to be compared against the remote
  file system, and if the version matches, data will come from PCC.

This patch adds a valid flag to distinguish between these two
states. For the latter case, the command 'lfs pcc state' will
output as follows:
$ lfs pcc state /mnt/lustre/f105.sanity-pcc
file: /mnt/lustre/f105.sanity-pcc, type: none, flags: valid

Add sanity-pcc/test_105 to verify it works as expected.

Test-Parameters: trivial testlist=sanity-pcc
Change-Id: I1d729bfe550b1bde0e78e8b3ec8217cd598fb64c
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50860
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-16758 krb: use Kerberos machine principal in client
Sebastien Buisson [Fri, 21 Apr 2023 13:55:21 +0000 (15:55 +0200)]
LU-16758 krb: use Kerberos machine principal in client

In addition to having Lustre client rely on the
lustre_root/<hostname>@REALM principal to authenticate, support the
more standard Kerberos machine principal host/<hostname>@REALM.
That avoids the need for additional keytab entries, and brings Lustre
in line with other services such as OpenSSH and NFS.

Lustre-change: https://review.whamcloud.com/50709
Lustre-commit: 74890266a39297c1c3a41263a7bfd86e0d8e426a

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id50cef1a3a94248b958ce9ea42b5ae356f29cbf1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50911
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  EX-7447 utils: lamigo to ignore ENODATA
Alex Zhuravlev [Thu, 4 May 2023 06:33:14 +0000 (09:33 +0300)]
EX-7447 utils: lamigo to ignore ENODATA

lamigo shouldn't print an error message when replication fails
with ENODATA. This is a valid case, as llapi_layout_get_by_path()
returns ENODATA in all cases where a layout can't be fetched,
including for a non-existent file.
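
The filtering rule amounts to treating one errno value as expected. A
small sketch, where should_report_replication_error() is a hypothetical
name and not a function in lamigo:

```python
import errno


def should_report_replication_error(err: int) -> bool:
    """Hypothetical filter: success and ENODATA are not reported as errors."""
    return err not in (0, errno.ENODATA)


print(should_report_replication_error(errno.ENODATA))  # False
print(should_report_replication_error(errno.EIO))      # True
```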

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8184d4af78514016b1afb3bca2eb34caf123d3ca
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50858
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
2 years ago  LU-16782 kernel: update RHEL 9.1 [5.14.0-162.23.1.el9_1]
Jian Yu [Fri, 28 Apr 2023 00:57:50 +0000 (17:57 -0700)]
LU-16782 kernel: update RHEL 9.1 [5.14.0-162.23.1.el9_1]

Update RHEL 9.1 kernel to 5.14.0-162.23.1.el9_1 for Lustre client.

Lustre-change: https://review.whamcloud.com/50785
Lustre-commit: TBD (from ecb01fccaf9deb30ae0688e353a2b379a30ce65e)

Test-Parameters: trivial clientdistro=el9.1 testlist=sanity
Test-Parameters: trivial serverdistro=el8.7 clientdistro=el9.1 testlist=sanity

Change-Id: I961bac2129b98da2950694fa03e0bf47b780d85c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50787
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-15140 tests: cleanup of recovery-*-scale tests fails
Elena Gryaznova [Fri, 10 Dec 2021 15:58:14 +0000 (18:58 +0300)]
LU-15140 tests: cleanup of recovery-*-scale tests fails

A bash trap handler is executed only after completion of the
current command, so under heavy I/O load it can be executed
after the test and cleanup phases have finished.

Run the I/O load in the background to overcome this bash limitation.

Lustre-commit: f252abc6690247ee9608dbde80238add0ecaed8c
Lustre-change: https://review.whamcloud.com/45824

Test-Parameters: trivial clientcount=6 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=FAILOVER_PERIOD=180 testlist=recovery-double-scale env=SLOW=yes
Test-Parameters: clientcount=5 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=FAILOVER_PERIOD=180 env=DURATION=82800 testlist=recovery-mds-scale env=SLOW=yes
Test-Parameters: clientcount=5 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=DURATION=82800 testlist=recovery-random-scale env=SLOW=yes

Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-2649
Change-Id: I3c91cac4d3f9af9863e8f48ba8a6bae02190ccb4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46749
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-16163 tests: skip racer_on_nfs for NFSv3
Alex Deiter [Fri, 7 Apr 2023 19:49:23 +0000 (23:49 +0400)]
LU-16163 tests: skip racer_on_nfs for NFSv3

Export ALWAYS_EXCEPT env for child NFS test

Lustre-change: https://review.whamcloud.com/50579
Lustre-commit: 892d726f274c7cd4e505689ad69194ac68dc323b

Fixes: 513eb670b0 ("LU-16163 tests: skip racer_on_nfs for NFSv3")
Test-Parameters: trivial testlist=parallel-scale-nfsv3
Signed-off-by: Alex Deiter <adeiter@tintri.com>
Change-Id: Ibb4a9916166f13ab9bd2374b33d4313453972276
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50801
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-16772 quota: protect lqe_glbl_data in qmt_site_recalc_cb
Sergey Cheremencev [Tue, 25 Apr 2023 18:10:21 +0000 (22:10 +0400)]
LU-16772 quota: protect lqe_glbl_data in qmt_site_recalc_cb

lqe_glbl_data should be protected with lqe_glbl_data_lock in
qmt_site_recalc_cb, as is done in other places. Otherwise it
may cause the following panic:

  BUG: unable to handle kernel NULL pointer at 00000000000000f8
  qmt_site_recalc_cb+0x2f8/0x790 [lquota]
  cfs_hash_for_each_tight+0x121/0x310 [libcfs]
  qmt_pool_recalc+0x372/0x9f0 [lquota]

Also protect lqe_glbl_data access with lqe_glbl_data_lock in
qmt_lvbo_free().  Add debugging to see how often this is hit.
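
The pattern is to read the pointer and check it for NULL under the same
lock that the freeing path takes. A sketch with hypothetical names
(QuotaEntry and its "granted" field are illustrative only):

```python
import threading


class QuotaEntry:
    """Hypothetical stand-in for an entry owning optional global data."""

    def __init__(self):
        self.glbl_data_lock = threading.Lock()
        self.glbl_data = {"granted": 0}

    def free_glbl_data(self):
        # The freeing path clears the reference under the lock.
        with self.glbl_data_lock:
            self.glbl_data = None

    def recalc(self):
        # Readers take the same lock, so they see the data or None,
        # never a reference that is being torn down.
        with self.glbl_data_lock:
            data = self.glbl_data
            if data is None:
                return None
            return data["granted"]


entry = QuotaEntry()
print(entry.recalc())  # 0
entry.free_glbl_data()
print(entry.recalc())  # None
```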

Lustre-change: https://review.whamcloud.com/50748
Lustre-commit: TBD (from e3511c6dbfb097308f48957e2e2df8c25f87030a)

Fixes: 1dbcbd70f8 ("LU-15021 quota: protect lqe_glbl_data in lqe")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I030f14b02062151f1708a03ac7414a9991f798f6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50784
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago  LU-16735 test: cancel MDC locks and wait recovery
Hongchao Zhang [Tue, 4 Apr 2023 01:05:39 +0000 (21:05 -0400)]
LU-16735 test: cancel MDC locks and wait recovery

During test_35, the MDC LDLM locks should also be cancelled to
flush the pending operations, and the test should wait for recovery
to complete before checking the quota.

Lustre-change: https://review.whamcloud.com/50638
Lustre-commit: 6c42ca6e445ff41f17c92f1ec875479388a59212

Test-Parameters: trivial testlist=sanity-quota mdscount=2 mdtcount=4
Change-Id: I6508644976be77ad2895107abf90144b51790cfe
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50644
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago  LU-16286 ldiskfs: reimplement nodelalloc optimization
Andrew Perepechko [Mon, 1 May 2023 17:54:29 +0000 (10:54 -0700)]
LU-16286 ldiskfs: reimplement nodelalloc optimization

fiemap calls perform a costly delayed extent search that affects
BRW performance. However, Lustre doesn't use delayed allocation
at all, so skip this search completely, as we did in RHEL7.

Lustre-change: https://review.whamcloud.com/49007
Lustre-commit: 3dd73b5c5d61a219c702873711055cb1cc80394a

LU-16286 ldiskfs: add ext4_find_delayed_extent patch to more series

Add rhel8.4/ext4-optimize-find_delayed_extent.patch to RHEL 8.7
ldiskfs patch series.

Test-Parameters: trivial clientdistro=el8.6 serverdistro=el8.6 testlist=sanity
Test-Parameters: trivial clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Change-Id: I2c3562cf5cbdf3c5532e4b79b28a040a995322b7
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-11161
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years ago  LU-16523 lprocfs: adjust the format of rename_stats
Lei Feng [Thu, 2 Feb 2023 01:39:03 +0000 (09:39 +0800)]
LU-16523 lprocfs: adjust the format of rename_stats

Adjust the format of rename_stats to a more human-friendly YAML.

Lustre-change: https://review.whamcloud.com/49869
Lustre-commit: 73b5d7db7e8d3ede42524fc447fb30fa05ea7a3f

Fixes: bedb797c5d ("LU-16110 lprocfs: make job_stats and rename_stats valid YAML")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I20e6d07c974e907bb2e30412dd1899f845de2021
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49871
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16087 lprocfs: add histogram to stats counter
Lei Feng [Wed, 17 Aug 2022 00:48:33 +0000 (08:48 +0800)]
LU-16087 lprocfs: add histogram to stats counter

Add histogram to stats counter.
Enable histogram for read/write_bytes in mdt/obdfilter
job stats.

Sample job_stats:
- job_id:          md5sum.0
  snapshot_time   : 3143196.864165417 secs.nsecs
  start_time      : 3143196.707206168 secs.nsecs
  elapsed_time    : 0.156959249 secs.nsecs
  read_bytes:      { samples: 2, ..., hist: { 32K: 1, 1M: 1 } }
  write_bytes:     { samples: 1, ..., hist: { 1K: 1 } }
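
A minimal sketch of a power-of-two histogram like the one in the sample
output, for illustration only. The struct, the bucket count and the helper
names are hypothetical, and the exact bucket boundaries used by the real
stats code are an assumption here (a value v is counted in bucket
floor(log2(v))).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_BUCKETS 32	/* hypothetical bucket count */

struct bytes_hist {
	uint64_t samples;
	/* buckets[i] counts values in [2^i, 2^(i+1)); assumed layout */
	uint64_t buckets[HIST_BUCKETS];
};

/* Return floor(log2(value)), clamped to the last bucket; 0 maps to bucket 0 */
static unsigned int hist_bucket(uint64_t value)
{
	unsigned int idx = 0;

	while (value > 1) {
		value >>= 1;
		idx++;
	}
	return idx < HIST_BUCKETS ? idx : HIST_BUCKETS - 1;
}

/* Account one sample of the given size */
static void hist_add(struct bytes_hist *h, uint64_t value)
{
	assert(h != NULL);
	h->samples++;
	h->buckets[hist_bucket(value)]++;
}
```

With this layout a 32K read and a 1M read land in buckets 15 and 20,
which would print as "32K: 1, 1M: 1".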

Lustre-change: https://review.whamcloud.com/48278
Lustre-commit: fde40ce32c91c804cb85be085f2aaf06170047b6

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I75b6909c8b63f08b74c3c411ff3dcd27881bb839
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49760
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15642 obdclass: use consistent stats units
Andreas Dilger [Wed, 16 Mar 2022 04:51:55 +0000 (22:51 -0600)]
LU-15642 obdclass: use consistent stats units

Use consistent stats units, since some were "usec" and others "usecs".
Most stats already use LPROCFS_TYPE_* to encode the stats type, so
use this to provide units for those stats, and only explicitly provide
strings for the few stats that don't match the commonly-used units.
This also reduces the number of repeat static strings in the modules.
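
The idea can be sketched as a lookup from stats type to a shared units
string, with an optional explicit override. The enum values and the
function name below are hypothetical stand-ins, not the LPROCFS_TYPE_*
definitions, and the unit strings are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the LPROCFS_TYPE_* values */
enum sk_stat_type {
	SK_TYPE_REQS,
	SK_TYPE_BYTES,
	SK_TYPE_PAGES,
	SK_TYPE_USECS,
};

/*
 * Derive the units string from the stats type, so that only stats with
 * unusual units need to pass an explicit string.
 */
static const char *sk_stat_units(enum sk_stat_type type, const char *override)
{
	if (override != NULL)
		return override;

	switch (type) {
	case SK_TYPE_REQS:
		return "reqs";
	case SK_TYPE_BYTES:
		return "bytes";
	case SK_TYPE_PAGES:
		return "pages";
	case SK_TYPE_USECS:
		return "usecs";
	}
	return "reqs";
}
```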

Lustre-change: https://review.whamcloud.com/46833
Lustre-commit: b515c6ec2ab84598c77c65eb78f1afd5e67b1ede

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I25f31478f238072ddbf9a3918cd43bb08c3ebbe5
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49759
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-11407 obdclass: add start time to stats files
Andreas Dilger [Wed, 19 Sep 2018 21:08:47 +0000 (17:08 -0400)]
LU-11407 obdclass: add start time to stats files

When the stats files are initialized or reset, store the current
timestamp with the stats.  That allows computing average IO and
RPC rates over the accumulated stats lifetime, in addition to the
normal incremental operation rates found by comparing successive
values read from the stats file with the read interval.

Any stats that currently print the "snapshot_time:" header will
now also print "start_time:" and "elapsed_time:" fields as well.
Consolidate this printing into a helper function instead of
duplicating very similar code in many different functions.  Output
can't be exactly the same for all callers, because these fields are
embedded into different types of output files, but it is very close.
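
As an illustration of how a consumer could use the new fields, the sketch
below derives elapsed_time from snapshot_time and start_time and computes
a lifetime average rate. The function names are hypothetical and this is
not the kernel helper itself.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* elapsed_time = snapshot_time - start_time, all in nanoseconds */
static int64_t stats_elapsed_ns(int64_t snapshot_ns, int64_t start_ns)
{
	assert(snapshot_ns >= start_ns);
	return snapshot_ns - start_ns;
}

/* Average events per second over the accumulated stats lifetime */
static double stats_avg_rate(uint64_t count, int64_t snapshot_ns,
			     int64_t start_ns)
{
	int64_t elapsed = stats_elapsed_ns(snapshot_ns, start_ns);

	if (elapsed == 0)
		return 0.0;
	return (double)count * NSEC_PER_SEC / (double)elapsed;
}
```

For a snapshot_time of 3143196.864165417 and a start_time of
3143196.707206168, the elapsed time is 0.156959249 seconds.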

Change struct rename_stats and brw_stats to use a common name prefix.

Change the obd_job_stats timestamps to ktime_t so that we can use the
common helper function for printing the header.  It is easier to store
ojs_cleanup_interval internally as 1/2 of the maximum stats age, since
division is more easily done when the value is initially set as
seconds compared to when it is ktime_t.  This may also be a tiny bit
more efficient since we don't do a divide/shift on each access.

Lustre-change: https://review.whamcloud.com/33201
Lustre-commit: ea2cd3af7bfabfa6876727ee44495f4c331bea8e

LU-16231 misc: fix stats snapshot_time to use wallclock

merged into this patch here to avoid landing broken stats.

Some "init" times were not initialized when stats were allocated or
cleared, do this for all stats shown by lprocfs_stats_header().

Rename struct osc_device fields from od_ to osc_ to avoid confusion
with struct osd_device. Having two od_stats was especially confusing.

Add a test case to verify snapshot_time, start_time, elapsed_time.

Lustre-change: https://review.whamcloud.com/48821
Lustre-commit: e42efe35eec7b9725f7f4fff86aaee04093366b0

LU-16110 lprocfs: make job_stats and rename_stats valid YAML

Changes related to lprocfs_stats_header() are included here that were not
present when that patch was backported.  This fixes the output to
correctly indent the items to follow YAML formatting rules.

Lustre-change: https://review.whamcloud.com/48417
Lustre-commit: e96cb6ff1fea7a2bc62a6c0786fb0e07cbfda81a

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iacefa17def455ef53a28fd14b6d4c670463ebbe5
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49745
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-4141 test: extend expected number of keepalive msgs
Alexandre Ioffe [Thu, 27 Apr 2023 04:19:19 +0000 (21:19 -0700)]
EX-4141 test: extend expected number of keepalive msgs

Make test_165g tolerate a wide range in the number of
keepalive messages received by lamigo.

Test-Parameters: trivial env=ONLY=165g testlist=sanity
Test-Parameters: trivial env=ONLY=165g testlist=sanity
Test-Parameters: trivial env=ONLY=165g testlist=sanity
Test-Parameters: trivial env=ONLY=165g testlist=sanity
Test-Parameters: trivial env=ONLY=165g testlist=sanity
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ia3132d96420d571e9ed67d2baacdb25da2d52c4d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50778
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-7389 pcc: open file for detach O_RDONLY
Patrick Farrell [Tue, 9 May 2023 17:53:23 +0000 (13:53 -0400)]
EX-7389 pcc: open file for detach O_RDONLY

llapi_pcc_detach_file opens the file O_RDWR, but O_RDONLY is
sufficient. As a result, files can be attached but not detached
if the client is mounted read-only.

The fix is just to open the file in detach with O_RDONLY.
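
A minimal userspace sketch of the idea, assuming nothing about the real
llapi_pcc_detach_file() beyond what is stated above. open_for_detach() is
a hypothetical helper. On a read-only mount, open() with O_RDWR fails
with EROFS, while O_RDONLY succeeds.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>	/* close(), for callers of open_for_detach() */

/*
 * Hypothetical helper: detach only needs a file descriptor to issue its
 * request on, so read-only access is enough. Returns an fd or -1 with
 * errno set. The caller must close() the returned fd.
 */
static int open_for_detach(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDONLY);
}
```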

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3e289ab52ff760a8ab84a209b968109517953b52
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50749
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16768 lfs: copy optarg string other than using it directly
Bobi Jam [Wed, 19 Apr 2023 14:19:53 +0000 (22:19 +0800)]
LU-16768 lfs: copy optarg string other than using it directly

Copy the optarg string for fp_format_printf_str lest it be
modified later.
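
The pattern can be sketched as below. The struct and function names are
hypothetical; only the field name follows the message above. The point is
that the parameter owns a private copy, so later changes to the argument
buffer do not affect it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the structure holding the format string */
struct find_param_sketch {
	char *fp_format_printf_str;
};

/* Store a private copy of arg; returns 0 on success, -1 on allocation failure */
static int set_printf_format(struct find_param_sketch *p, const char *arg)
{
	size_t len;
	char *copy;

	assert(p != NULL && arg != NULL);
	len = strlen(arg) + 1;
	copy = malloc(len);
	if (copy == NULL)
		return -1;
	memcpy(copy, arg, len);

	free(p->fp_format_printf_str);	/* drop any earlier copy */
	p->fp_format_printf_str = copy;
	return 0;
}
```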

Lustre-change: https://review.whamcloud.com/50733
Lustre-commit: 75db98cef3df8f9a0e1b6e7a5150f3c332e6167b

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib32883d3261ae921adf0fdd7b05bcbf728de7557
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50690
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16662 autoconf: fix configure test compile for CONFIG_KEYS
Xinliang Liu [Thu, 27 Apr 2023 07:42:03 +0000 (00:42 -0700)]
LU-16662 autoconf: fix configure test compile for CONFIG_KEYS

This fixes the configure error below on Linux v5.19+:
$ ./configure --disable-server
...
checking whether to enable gss keyring backend... yes
checking if Linux kernel was built with CONFIG_KEYS in or as module...
no
configure: WARNING: GSS keyring backend requires that CONFIG_KEYS be
enabled in your kernel.
checking for keyctl_search in -lkeyutils... yes
configure: error: Cannot enable gss_keyring. See above for details.
$ grep CONFIG_KEYS -rn /boot/config-*
6884:CONFIG_KEYS=y

For in-tree IB support, when ./configure is run without passing the
Linux source path, LINUX_OBJ may be just a symlink to O2IBPATH, so
both point to the same directory. E.g.:
O2IBPATH='/usr/src/kernels/6.1.8-3.0.0.7.oe1.aarch64'
LINUX_OBJ='/lib/modules/6.1.8-3.0.0.7.oe1.aarch64/build'
$ ls -l /lib/modules/6.1.8-3.0.0.7.oe1.aarch64/build
lrwxrwxrwx 1 root root 42 Feb  7 00:00
/lib/modules/6.1.8-3.0.0.7.oe1.aarch64/build ->
/usr/src/kernels/6.1.8-3.0.0.7.oe1.aarch64
In this case, the current configure puts the kernel's Module.symvers
into the variable KBUILD_EXTRA_SYMBOLS. This must be avoided from kernel
v5.19 onward, which contains commit "b8422711080f modpost: make multiple
export error". That commit turns a multiply-exported symbol from a
warning into an error, which can be seen in config.log:
...
ERROR: modpost: vmlinux: 'init_uts_ns' exported twice. Previous export
was in vmlinux
...

Lustre-change: https://review.whamcloud.com/50399
Lustre-commit: 321a533b868908f37d01a4b787f5a463a02e427c

Test-Parameters: trivial
Change-Id: I35295b3acc7fffb93716362f5d8c659eb922afcb
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50569
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15300 mdt: refresh LOVEA with LL granted
Alex Zhuravlev [Fri, 21 Apr 2023 05:42:37 +0000 (08:42 +0300)]
LU-15300 mdt: refresh LOVEA with LL granted

This change tries to fix three problems:
1) mdt_reint_open() fetches LOVEA before layout lock is taken.
   this may race with another process changing the layout and
   may result in a stale layout returned with a granted layout
   lock - re-fetch LOVEA once layout lock is granted

2) lov_layout_change() should not apply old layouts which
   can get through when MDS doesn't take layout lock

3) LFSCK shouldn't ignore layout version stored on MDS to avoid
   a situation when version degrades compared to client's copy.

This patch misses an optimization and can result in a number of
useless calls to OSD to fetch LOVEA. To be fixed in a followup
patch.

Lustre-change: https://review.whamcloud.com/46413
Lustre-commit: efbe0f63eff8a9a7b192607382f6859e3b0088b8

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Idee1101d152ab09947faf6d75574a8761a7690a5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50705
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16670 enc: make sure DoM files are correctly decrypted
Sebastien Buisson [Mon, 27 Mar 2023 08:46:07 +0000 (10:46 +0200)]
LU-16670 enc: make sure DoM files are correctly decrypted

Make sure DoM files are decrypted upon read by loading their
associated encryption context, via llcrypt_prepare_readdir()/
llcrypt_get_encryption_info().

Fix sanity-sec test_50 accordingly.

Lustre-change: https://review.whamcloud.com/50429
Lustre-commit: 1c424252d37c64e3c223c19dced3cad2649c1f61

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie9ef3cbb08d2295a2fd10b9e9ab0862119c7723e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50431
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-10026 csdc: reserve layout bits for compress component
Bobi Jam [Mon, 14 Nov 2022 08:25:05 +0000 (16:25 +0800)]
LU-10026 csdc: reserve layout bits for compress component

Add layout bits for compress component layout.

* lcme_compr_type: compression type (gzip, lz4, lz4hc, lzo, etc.)
* lcme_compr_lvl: compression level (0=default, 1-15)
* lcme_compr_chunk_log_bits: chunk size = 2^(16+chunk_log_bits)

Component pattern:
* LOV_PATTERN_COMPRESS - file contains compressed data chunks and
       cannot be read by a client without decompression support.

Compress component flags:
* LCME_FL_COMPRESS - the component should be compressed with the
       compression algorithm stored in lcme_compr_type, at level
       lcme_compr_lvl, with chunk size 2^(16+lcme_compr_chunk_log_bits)
* LCME_FL_PARTIAL - the component holds some uncompressed chunks due
       to poor write size/alignment, and may benefit from being
       recompressed as the full file data is available
* LCME_FL_NOCOMPR - the component should not be compressed because
       the data was found to be incompressible, or by user request
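
The chunk size formula can be illustrated with a small sketch. The struct
below is a hypothetical stand-in for the reserved layout bits. The 4-bit
width of the level matches the 0-15 range above, while the 4-bit width of
the chunk field is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPR_CHUNK_MIN_BITS 16	/* smallest chunk is 2^16 = 64KiB */

/* Hypothetical stand-in for the compression fields of a component */
struct compr_entry_sketch {
	unsigned int compr_type;		/* gzip, lz4, lz4hc, lzo, ... */
	unsigned int compr_lvl:4;		/* 0 = default, 1-15 */
	unsigned int compr_chunk_log_bits:4;	/* width is an assumption */
};

/* chunk size = 2^(16 + chunk_log_bits) bytes */
static uint64_t compr_chunk_size(const struct compr_entry_sketch *e)
{
	assert(e != NULL);
	return (uint64_t)1 << (COMPR_CHUNK_MIN_BITS + e->compr_chunk_log_bits);
}
```

Under these assumptions chunk_log_bits = 0 gives 64KiB chunks and
chunk_log_bits = 4 gives 1MiB chunks.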

Lustre-change: https://review.whamcloud.com/49170
Lustre-commit: TBD (from 147d4eb27b85b4994a47539be6aceff212365ee5)

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idca22cca87b01bba8a5b3c85ca62044abe1d30eb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49321
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16677 utils: add bitfields and ifdefs to wiretest
Andreas Dilger [Fri, 31 Mar 2023 02:37:08 +0000 (20:37 -0600)]
LU-16677 utils: add bitfields and ifdefs to wiretest

Add CHECK_BITFIELD() for "checking" bitfields in data structures
(currently only adds a comment to wiretest.c, maybe improve later).

Add CHECK_COND_START/FINISH() for adding #ifdef/#endif conditions
into wiretest, mainly for structs not used by (upstream) client.

Lustre-change: https://review.whamcloud.com/50479
Lustre-commit: 5a730827147714136b7d5035ca6115545a6b5ef0

LU-16677 utils: synchronize wirecheck.c and wiretest.c

wirecheck.c is not compilable and out of sync with wiretest.c.
The patch adds forgotten changes to wirecheck.c and replaces
wiretest.c by a version generated by "make newwiretest".

Lustre-change: https://review.whamcloud.com/50456
Lustre-commit: d43eb211995e0afb35690946c78ef6c82b9f86ad

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8ecc4bdd5b5d651faa42f65ce8ea46da003ebbe5
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50751
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16524 sec: add fscrypt_admin rbac role
Sebastien Buisson [Wed, 1 Mar 2023 15:11:19 +0000 (16:11 +0100)]
LU-16524 sec: add fscrypt_admin rbac role

The purpose of the new fscrypt_admin rbac role is to control admin
tasks related to fscrypt. When not set, it is forbidden to all users
including root to modify existing protectors or policies, or create
new ones. But it remains possible to lock and unlock encrypted
directories.

Internally, this is achieved by marking fscrypt metadata files and
directories, i.e. everything under ROOT/.fscrypt, with a special mdt
object flag LOHA_FSCRYPT_MD.
Upon request processing, the mdt layer returns -EPERM if the flag
LOHA_FSCRYPT_MD is found on an object that is the target of a modify
request.
The LUSTRE_IMMUTABLE_FL flag is also returned to clients for such
objects.

sanity-sec test_64f is added to exercise the new fscrypt_admin flag.

Lustre-change: https://review.whamcloud.com/50184
Lustre-commit: 22bef9b6c64ef394a2efb41ce1388be71300af0d

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I99956499133994444ccd88e33340067790a182ce
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50339
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16524 sec: enforce rbac roles
Sebastien Buisson [Fri, 3 Feb 2023 13:11:51 +0000 (14:11 +0100)]
LU-16524 sec: enforce rbac roles

There are 5 different rbac roles defined via nodemap:
- byfid_ops, to allow operations by FID (e.g. 'lfs rmfid').
- chlg_ops, to allow access to Lustre Changelogs.
- dne_ops, to allow operations related to DNE (e.g. 'lfs mkdir').
- file_perms, to allow modifications of file permissions and owners.
- quota_ops, to allow quota modifications.
Enforce these roles by checking the value of the 'rbac' nodemap
property on server side and returning -EPERM if operation is
forbidden.
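
A rough sketch of such a check, treating the 'rbac' property as a bitmask
of granted roles. The enum names and bit positions are hypothetical and
are not the definitions used by nodemap.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical role bits; names and values are illustrative only */
enum rbac_role_sketch {
	RBAC_SK_BYFID_OPS  = 1 << 0,
	RBAC_SK_CHLG_OPS   = 1 << 1,
	RBAC_SK_DNE_OPS    = 1 << 2,
	RBAC_SK_FILE_PERMS = 1 << 3,
	RBAC_SK_QUOTA_OPS  = 1 << 4,
};

/* Return 0 if the role needed by the operation is granted, else -EPERM */
static int rbac_check(unsigned int nodemap_rbac, enum rbac_role_sketch needed)
{
	return (nodemap_rbac & needed) ? 0 : -EPERM;
}
```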

Add sanity-sec test_64* to exercise these capabilities.

Lustre-change: https://review.whamcloud.com/49907
Lustre-commit: 971e025f5fb77f4eaaa1e9070598dfa6292a9678

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I37057f0ab50c02fa99db03cb04149a437e35ee0a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50312
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-7391 lipe: lpcc_purge calculates disk usage as df
Lei Feng [Wed, 26 Apr 2023 01:45:41 +0000 (09:45 +0800)]
EX-7391 lipe: lpcc_purge calculates disk usage as df

lpcc_purge calculates disk usage in the same way as the df command.
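
For reference, a sketch of a df-style usage percentage, assuming "the same
way as df" means used / (used + available to unprivileged users), rounded
up, with inputs taken from the statvfs() fields f_blocks, f_bfree and
f_bavail. The function name is hypothetical and this is not the
lpcc_purge code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * df-style Use%: blocks reserved for root count as neither used nor
 * available, so the denominator is used + f_bavail, not f_blocks.
 */
static unsigned int df_used_percent(uint64_t f_blocks, uint64_t f_bfree,
				    uint64_t f_bavail)
{
	uint64_t used, total;

	assert(f_bfree <= f_blocks);
	used = f_blocks - f_bfree;
	total = used + f_bavail;
	if (total == 0)
		return 0;
	/* round up, as df does */
	return (unsigned int)((used * 100 + total - 1) / total);
}
```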

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc
Change-Id: I43fe60188b1363d0ba58ea659b560b97807dc019
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16057 obdclass: set OBD_MD_FLGROUP for ladvise RPC
Li Dongyang [Fri, 29 Jul 2022 06:35:41 +0000 (16:35 +1000)]
LU-16057 obdclass: set OBD_MD_FLGROUP for ladvise RPC

The ladvise RPC doesn't have OBD_MD_FLGROUP set, so when the RPC
reaches the server, tgt_validate_obdo() will corrupt the FID
if its seq is in the FID_SEQ_NORMAL range.

Do not mess with seq in obdo_to_ioobj() and tgt_validate_obdo(),
since 2.0 all RPCs should have OBD_MD_FLGROUP set.

Add OBD_MD_FLGROUP for ladvise RPC to fix new client talking
to old servers.

Lustre-change: https://review.whamcloud.com/48080
Lustre-commit: bee803c6e440ba6b55e0ff356e5324f44cfa63eb

Change-Id: I373b7f32458b18e29d9bb716a912fe4a54eccac5
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/48080
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50755
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-12347 llite: do not take mod rpc slot for getxattr
Vladimir Saveliev [Thu, 9 Sep 2021 12:05:24 +0000 (15:05 +0300)]
LU-12347 llite: do not take mod rpc slot for getxattr

The following scenario may lead to client eviction:
clientA                clientB                  MDS
threadA1: write to file F1, get
and hold DoM MDC LDLM lock L1:
   ->cl_io_loop()
    ->cl_io_lock()
     :
     ->mdc_lock_granted()
      ->lock->l_writers++
      [hold ref until write done]

threadA2-A8: create files F2-F8:
   ->ll_file_open()
    ->mdc_enqueue_base()
     ->ldlm_cli_enqueue()
      ->ptlrpc_get_mod_rpc_slot()
      ->ptlrpc_queue_wait()
      [hold RPC slot until create done]

                                              OST(s) in recovery.
                                              MDS waiting on OST(s) to
                                              precreate new objects.

threadA1:
   -> cl_io_start()
    -> __generic_file_aio_write()
     -> file_remove_suid()
      -> ll_xattr_cache_refill()
       -> mdc_xattr_common()
        -> ptlrpc_get_mod_rpc_slot()
        [blocked waiting for RPC slot]

                       threadB1: write file F1,
                       enqueue DoM MDC lock L1

                                              MDS sends blocking AST
                                              to clientA for lock L1

ldlm_threadA3: cannot cancel busy lock L1:
   -> ldlm_handle_bl_callback()
   ["Lock L1 referenced, will be cancelled later"]

                                              MDS evicts clientA for
                                              not cancelling lock L1

threadA1: never completes write:
  ->cl_io_end()
   ->cl_io_unlock()
    ->osc_lock_cancel()
     ->lock->l_writers--;

The fix is to add IT_GETXATTR to list of operations which do not
need mod rpc slot.
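
The shape of the fix can be sketched as a predicate over the intent
operation. The enum is a hypothetical stand-in for the IT_* values, and
which other operations are on the list is an assumption; only the
addition of the getxattr case comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the IT_* intent operations */
enum intent_op_sketch {
	SK_IT_OPEN,
	SK_IT_CREAT,
	SK_IT_GETATTR,
	SK_IT_LOOKUP,
	SK_IT_READDIR,
	SK_IT_GETXATTR,
};

/*
 * Non-modifying intents must not wait for a modify RPC slot, otherwise
 * a thread holding a lock reference can block behind stalled creates.
 */
static bool skip_mod_rpc_slot(enum intent_op_sketch op)
{
	switch (op) {
	case SK_IT_GETATTR:
	case SK_IT_LOOKUP:
	case SK_IT_READDIR:
	case SK_IT_GETXATTR:	/* the case added by the fix */
		return true;
	default:
		return false;
	}
}
```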

A test to illustrate the issue is added.

wait_for_function(): total sleep time (wait) is to be equal to max
when 1 is returned.

Lustre-change: https://review.whamcloud.com/44151
Lustre-commit: eb64594e4473af859e74a0e831316cead0f5c49b

Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
HPE-bug-id: LUS-7271
Change-Id: I1b80677df084bda141b9ac58a78b765bd0b14a41
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50754
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16341 quota: fix panic in qmt_site_recalc_cb
Sergey Cheremencev [Fri, 24 Jun 2022 20:38:29 +0000 (23:38 +0300)]
LU-16341 quota: fix panic in qmt_site_recalc_cb

The panic occurred due to an empty qit_lqes array after
qmt_pool_lqes_lookup_spec. This can happen if the
global lqe is not enforced. Return -ENOENT from
qmt_pool_lqes_lookup_spec if no lqes have been added.
This fixes the following panic:
BUG: unable to handle kernel NULL pointer dereference at
00000000000000f8
...
RIP: 0010:qmt_site_recalc_cb+0x2ec/0x780 [lquota]
...
[ffffa5564118fda0] cfs_hash_for_each_tight at ffffffffc0c72c81
[libcfs]
[ffffa5564118fe08] qmt_pool_recalc at ffffffffc142dec7 [lquota]
[ffffa5564118ff10] kthread at ffffffffb45043a6
[ffffa5564118ff50] ret_from_fork at ffffffffb4e00255

Add test sanity-quota_14 that reproduces above panic
without the fix.
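
The pattern of the fix, sketched in plain C with hypothetical types and
names: the lookup reports -ENOENT when it collected nothing, so the
caller never dereferences the first element of an empty array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a quota entry */
struct lqe_sketch {
	int enforced;
};

/*
 * Collect the enforced entries of pool[] into found[], which must have
 * room for n pointers. Returns 0 on success, or -ENOENT if no entry was
 * added so that callers can bail out early.
 */
static int lqes_lookup(const struct lqe_sketch *pool, size_t n,
		       const struct lqe_sketch **found, size_t *nfound)
{
	size_t i;

	assert(found != NULL && nfound != NULL);
	*nfound = 0;
	for (i = 0; i < n; i++)
		if (pool[i].enforced)
			found[(*nfound)++] = &pool[i];

	return *nfound ? 0 : -ENOENT;
}
```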

HPE-bug-id: LUS-11007
Change-Id: Ie51396269fae7ed84379bef5fc964cce789eba7c
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50793
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
2 years agoLU-14541 llite: Check for page deletion after fault
Patrick Farrell [Tue, 9 May 2023 15:08:50 +0000 (11:08 -0400)]
LU-14541 llite: Check for page deletion after fault

Before completing a page fault and returning to the kernel,
we lock the page and verify it has not been truncated.  But
we must also verify the page has not been deleted from
Lustre, or we can return a disconnected (ie, not tracked by
Lustre) page to the kernel.

We mark deleted pages !uptodate, but this doesn't matter
for faulted pages, because the kernel assumes they are
returned uptodate, and maps them in to the process address
space.  Once mapped, the page state is not checked until
the page is unmapped.

But because the page is referenced by the mapping, it stays
in the page cache even though it's been disconnected from
Lustre.

Because the page is disconnected from Lustre, it will not
be found and cancelled on lock cancellation.  This can
result in stale data reads.

This is particularly an issue with releasepage (called from
drop_caches or under memory pressure), which can delete
pages separately from cancelling covering locks.

If releasepage is disabled, which is effectively what
"LU-14541 llite: Check vmpage in releasepage"
does, this is not an issue.  But disabling releasepage
causes other problems and is incorrect anyway.

Lustre-change: https://review.whamcloud.com/49653
Lustre-commit: b3d2114e538cf95a7e036f8313e9095fe821da79

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If1164db8f8e92a1cf811431d56d15f30d8eb3faa
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50303
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13081 tests: skip sanity test_151/test_156
Alex Deiter [Wed, 26 Apr 2023 22:04:01 +0000 (02:04 +0400)]
LU-13081 tests: skip sanity test_151/test_156

Skip both sanity test_151 and test_156 during interop testing,
since this is really testing server-side functionality only
(OSS caching behavior). And it makes sense to just exclude
test_151 and test_156 during interop testing, otherwise it
seems that the client version of the test can become
inconsistent with the caching behavior/tunables on the OSS
and the failures don't mean anything. There is enough
non-interop testing to catch any regressions in the OSS
cache behavior.

Lustre-change: https://review.whamcloud.com/50777
Lustre-commit: TBD (from 62cd8d19ff103e6e8a2b4bb7cdc00815ddb0edac)

Test-Parameters: trivial
Signed-off-by: Alex Deiter <adeiter@tintri.com>
Change-Id: I39a8b54894d5b0c7573e6c56d1f8e1ba02b3e3fe
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50887
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16775 tests: cleanup target after sanity-sec test_31
Sebastien Buisson [Wed, 26 Apr 2023 07:57:03 +0000 (09:57 +0200)]
LU-16775 tests: cleanup target after sanity-sec test_31

sanity-sec test_31 adds an LNet network tcp999, and associated
servicenode param on MDS target. This param must be cleared when
exiting the test, otherwise it can lead to incorrect client HA
behavior, trying to reach out to the fake service nodes.

Lustre-change: https://review.whamcloud.com/50766
Lustre-commit: TBD (e8b6c5a1128210a3a2bf525b5b489224932c8f88)

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 \
    clientselinux testlist=sanity-sec env=SHARED_KEY=true,ONLY="31 61"
Fixes: c508c94268 ("LU-16557 client: -o network needs add_conn processing")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: If3a1926855bd23e9154c9a32b7a555e934e94565
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50767
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16515 tests: disable sanity test_118c/118d
Andreas Dilger [Wed, 29 Mar 2023 21:39:50 +0000 (15:39 -0600)]
LU-16515 tests: disable sanity test_118c/118d

Temporarily disable sanity test_118c and test_118d until there is
a fix available, since this is failing a large fraction of tests.

Lustre-change: https://review.whamcloud.com/50470
Lustre-commit: 7c52cbf65218d77c0594f92981173aa7d78f6758

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I16ebbc470a126bb99b5c3ecdf93407d6b73ebbe5
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50794
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn86
Andreas Dilger [Tue, 25 Apr 2023 07:22:58 +0000 (01:22 -0600)]
RM-620 build: New tag 2.14.0-ddn86

New tag 2.14.0-ddn86

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iad3d4ffca7cdbb06657a679b5b580af788b6a5ec

2 years agoLU-16603 protocol: add OBD_BRW_COMPRESSED
Alex Zhuravlev [Tue, 28 Feb 2023 09:49:11 +0000 (12:49 +0300)]
LU-16603 protocol: add OBD_BRW_COMPRESSED

so the client can hint to the OST that the data is compressed

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4b721db3ad349d5745ee6698de368d0cb0138954
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50154
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lustre-change: with short Gerrit URL https://review.whamcloud.com/50154
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50372
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15374 tests: check FULL and IDLE for client import state
Jian Yu [Thu, 13 Apr 2023 19:04:50 +0000 (12:04 -0700)]
LU-15374 tests: check FULL and IDLE for client import state

The client-to-OST import state can be FULL or IDLE.

Lustre-change: https://review.whamcloud.com/49298
Lustre-commit: 25fb82eb413389b6023e0e61f7efb71e91d15c01

Test-Parameters: trivial testgroup=review-dne-part-3

Fixes: 25606a2ce1 ("LU-15342 tests: escape "|"")
Fixes: 3da8f014fd ("LU-12857 tests: allow clients to be IDLE after recovery")
Fixes: 5a6ceb664f ("LU-7236 ptlrpc: idle connections can disconnect")

Change-Id: I3582ceb273d241ee71fe907f6d1423746e453faa
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50632
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15342 tests: escape "|"
Elena Gryaznova [Thu, 13 Apr 2023 18:58:27 +0000 (11:58 -0700)]
LU-15342 tests: escape "|"

Escape "|" in want="FULL|IDLE" to protect it from interpretation
by the shell:
  sh: IDLE: command not found

Lustre-change: https://review.whamcloud.com/45788
Lustre-commit: 25606a2ce19e94c13694d46c3f15e9a10df40a91

Fixes: af666bef05 ("LU-12857 tests: allow clients to be IDLE after recovery")
Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I2f885ea225ba43537f37b8dad1c2e0cd8f652a79
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50631
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-12857 tests: allow clients to be IDLE after recovery
Andreas Dilger [Thu, 13 Apr 2023 18:53:50 +0000 (11:53 -0700)]
LU-12857 tests: allow clients to be IDLE after recovery

If clients are not connected to an OST when it fails (connection
is IDLE), they do not need to be involved in recovery, so this
should not be considered an error when checking the client state.

Lustre-change: https://review.whamcloud.com/45318
Lustre-commit: af666bef058c5b7997527fc851a84a89375912fb

Test-Parameters: trivial testlist=recovery-mds-scale env=SLOW=no
Test-Parameters: testlist=conf-sanity
Test-Parameters: testlist=replay-dual,replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6cfeb718acd233378ed1608f22061bc15c3ebbe5
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50630
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16706 kernel: update RHEL 9.1 [5.14.0-162.22.2.el9_1]
Jian Yu [Tue, 4 Apr 2023 18:49:23 +0000 (11:49 -0700)]
LU-16706 kernel: update RHEL 9.1 [5.14.0-162.22.2.el9_1]

Update RHEL 9.1 kernel to 5.14.0-162.22.2.el9_1 for Lustre client.

Lustre-change: https://review.whamcloud.com/50512
Lustre-commit: 9c316c69b9788ac219540e560a7831b88b81b690

Test-Parameters: trivial clientdistro=el9.1 testlist=sanity
Test-Parameters: trivial serverdistro=el8.7 clientdistro=el9.1 testlist=sanity
Change-Id: Ib5186e6f0dcd89660b7000db7f37c0c5a29f944f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50528
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16710 kernel: update RHEL 8.7 [4.18.0-425.19.2.el8_7]
Jian Yu [Thu, 6 Apr 2023 01:31:34 +0000 (18:31 -0700)]
LU-16710 kernel: update RHEL 8.7 [4.18.0-425.19.2.el8_7]

Update RHEL 8.7 kernel to 4.18.0-425.19.2.el8_7.

Lustre-change: https://review.whamcloud.com/50548
Lustre-commit: TBD (from 8cd92d6bdb3726fb25286ead1276622b884805d6)

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Change-Id: I17e43d98d1a3c7217e61771e8ed78a7123f9313f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50551
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16563 lnet: use discovered ni status to set initial health
Serguei Smirnov [Thu, 16 Feb 2023 18:34:03 +0000 (10:34 -0800)]
LU-16563 lnet: use discovered ni status to set initial health

If not routing, track local NI status in the ping buffer
such that locally recognized "down" state, for example,
due to a downed network interface/link, is available
to any discovering peer.

On the active side of discovery, check peer NI status so if NI
is down, decrement its health score and queue for recovery.
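
The active-side check described above can be sketched roughly as follows. This is a toy model in Python, not LNet code: the dictionary layout, the function name check_discovered_nis, the penalty of 1 and the example health value are all assumptions made for illustration.

```python
def check_discovered_nis(peer_nis, recovery_queue, penalty=1):
    """Lower the health of NIs reported down and queue them for recovery.

    peer_nis is a list of dicts with hypothetical keys "nid", "status"
    and "health"; recovery_queue is a list of NIDs awaiting recovery.
    """
    for ni in peer_nis:
        if ni["status"] != "down":
            continue
        # Decrement the health score, never going below zero.
        ni["health"] = max(0, ni["health"] - penalty)
        # Queue the NI for recovery only once.
        if ni["nid"] not in recovery_queue:
            recovery_queue.append(ni["nid"])
    return recovery_queue
```

Only the NIs whose discovered status is "down" are touched; healthy NIs keep their score and stay out of the queue.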

Lustre-change: https://review.whamcloud.com/50027/
Lustre-commit: da230373bd14306cb97fb48748ebce205f09d468

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I513c7942099c0da9088fa6d4460f76386ea91d3b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16637 llite: call truncate_inode_pages() under inode lock
Bobi Jam [Tue, 14 Mar 2023 02:02:12 +0000 (10:02 +0800)]
LU-16637 llite: call truncate_inode_pages() under inode lock

truncate_inode_pages() is required to be called under (and serialised
by) inode lock.

Lustre-change: https://review.whamcloud.com/50284
Lustre-commit: ef9be34478036db0544753e33030fff7e32bfe44

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I0f1a09c8756522f87a2e5d8030d12f80e2f630b4
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50365
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16621 enc: file names encryption when using secure boot
Alex Deiter [Mon, 6 Mar 2023 13:59:46 +0000 (13:59 +0000)]
LU-16621 enc: file names encryption when using secure boot

Secure boot activates lockdown mode in the Linux kernel, and
debugfs is restricted when the kernel is locked down.
This patch moves file names encryption from debugfs to sysfs.

Lustre-change: https://review.whamcloud.com/50219
Lustre-commit: 716675fff642655c4d4715654b0b4880b96139b6

Test-Parameters: trivial testlist=sanity-sec
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Change-Id: I434714941ffac2a4694cabd33f613aef70933678
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50578
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
2 years agoEX-4141 lipe: keepalive message from ofd_access_log_reader
Alexandre Ioffe [Thu, 23 Mar 2023 06:39:12 +0000 (23:39 -0700)]
EX-4141 lipe: keepalive message from ofd_access_log_reader

Lamigo checks whether ofd_access_log_reader supports the keepalive
message feature and, if it does, starts ofd_access_log_reader with
the keepalive message option.
Add tests for lamigo and ofd_access_log_reader to exercise the
keepalive message feature.
Fix a bug when the trace file is opened.
Move the FATAL, ERROR and DEBUG macros to include and
use the FATAL macro wherever possible.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools,sanity
Test-Parameters: trivial testlist=hot-pools,sanity
Test-Parameters: trivial testlist=hot-pools,sanity
Change-Id: I0f218e2394c9d0ab6cd425860ba79956a10cbd58
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50389
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16615 utils: add messages in l_getidentity
Lai Siyao [Wed, 18 Jan 2023 00:23:05 +0000 (19:23 -0500)]
LU-16615 utils: add messages in l_getidentity

Add time-related messages in l_getidentity to help debugging on
timeout, which may cause -EACCES errors in user applications.

Lustre-change: https://review.whamcloud.com/50213
Lustre-commit: d5b26443a3d33d68e5747fecc591baa887bc5b89

Test-Parameters: trivial
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I87ebfb85d05e19886d8becc6b14ed0233eaed42d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49717
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16524 nodemap: add rbac property to nodemap
Sebastien Buisson [Wed, 25 Jan 2023 12:54:00 +0000 (13:54 +0100)]
LU-16524 nodemap: add rbac property to nodemap

Add new rbac property to nodemap. Internally this is a mask of allowed
roles. Externally it defaults to all, which means all roles are
allowed, and it can take the following values (multiple can be
specified, comma separated), with the semantic:
- byfid_ops, to allow operations by FID (e.g. 'lfs rmfid').
- chlg_ops, to allow access to Lustre Changelogs.
- dne_ops, to allow operations related to DNE (e.g. 'lfs mkdir').
- file_perms, to allow modifications of file permissions and owners.
- quota_ops, to allow quota modifications.
Apart from all, any role not explicitly specified is forbidden. To
forbid all roles, use the 'none' value.
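
The mask semantics described above can be sketched as follows. This is a minimal illustration in Python, not the Lustre implementation: the bit positions and the helper names parse_rbac and role_allowed are hypothetical, and only the role names and the 'all'/'none' behaviour come from the description.

```python
# Role names come from the commit message; the bit positions are
# hypothetical and chosen only for this illustration.
ROLES = {
    "byfid_ops": 1 << 0,
    "chlg_ops": 1 << 1,
    "dne_ops": 1 << 2,
    "file_perms": 1 << 3,
    "quota_ops": 1 << 4,
}
ALL_ROLES = sum(ROLES.values())


def parse_rbac(value):
    """Turn a comma-separated rbac value into a mask of allowed roles."""
    value = value.strip()
    if value == "all":
        return ALL_ROLES
    if value == "none":
        return 0
    mask = 0
    for name in value.split(","):
        name = name.strip()
        if name not in ROLES:
            raise ValueError("unknown rbac role: %r" % name)
        mask |= ROLES[name]
    return mask


def role_allowed(mask, role):
    """Any role not explicitly present in the mask is forbidden."""
    return bool(mask & ROLES[role])
```

With this model, parse_rbac("chlg_ops,quota_ops") allows Changelog access and quota modifications and forbids the three other roles.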

Update lctl-nodemap-modify man page to mention this new property.

Lustre-change: https://review.whamcloud.com/49873
Lustre-commit: 5e48ffca322c3c72d3b83b0719f245fc6f13c8e4

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4cedf03c75948f4b1e9b55292414ab9110701874
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50293
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16338 readahead: add stats for read-ahead page count
Qian Yingjin [Wed, 23 Nov 2022 09:42:53 +0000 (04:42 -0500)]
LU-16338 readahead: add stats for read-ahead page count

This patch adds the stats for read-ahead page count:

lctl get_param llite.*.read_ahead_stats
llite.lustre-ffff938b7849d000.read_ahead_stats=
snapshot_time             4011.320890492 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              4011.320890492 secs.nsecs
hits                      4 samples [pages]
misses                    1 samples [pages]
zero_size_window          4 samples [pages]
failed_to_reach_end       1 samples [pages]
failed_to_fast_read       1 samples [pages]
readahead_pages           1 samples [pages] 255 255 255
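
For scripting against this output, a small parser along these lines may be useful. It is a sketch in Python that relies only on the line layout shown above; the function name is hypothetical, and reading the three trailing numbers as min, max and sum is an assumption based on how other Lustre stats files are usually laid out.

```python
def parse_read_ahead_stats(text):
    """Parse '<name> <count> samples [<unit>] ...' lines into a dict."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and the *_time lines, which are not counters.
        if len(fields) < 4 or fields[2] != "samples":
            continue
        entry = {"samples": int(fields[1]), "unit": fields[3].strip("[]")}
        extra = [int(x) for x in fields[4:]]
        if len(extra) >= 3:
            # Assumed meaning of the trailing numbers: min, max, sum.
            entry["min"], entry["max"], entry["sum"] = extra[:3]
        stats[fields[0]] = entry
    return stats


SAMPLE = """\
snapshot_time             4011.320890492 secs.nsecs
hits                      4 samples [pages]
misses                    1 samples [pages]
readahead_pages           1 samples [pages] 255 255 255
"""
```

Calling parse_read_ahead_stats(SAMPLE) yields one entry per counter, keyed by the counter name.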

Lustre-change: https://review.whamcloud.com/49224
Lustre-commit: cdcf97e17e73dfdd65c4e46bb30c4a07f5e710cf

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Iada06eb7d78ab28cfcc7167e49d25da252da4009
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16579 llite: fix the wrong beyond read end calculation
Qian Yingjin [Mon, 20 Feb 2023 03:11:54 +0000 (22:11 -0500)]
LU-16579 llite: fix the wrong beyond read end calculation

During testing, we found an endless loop in the read path, which
returns AOP_TRUNCATED_PAGE (0x8001) endlessly.
The reason is that the offset just beyond the end of the read is
calculated incorrectly as (iter->count + iocb->ki_pos).
This end offset is supposed to stay unchanged during the read I/O
loop over each page in buffered I/O mode.
However, @iter->count is decreased by the number of bytes read when
the read of each page finishes: @iter->count -= read_bytes.

In this patch, we store the page index just beyond the end of the
read in @lcc->lcc_end_index before calling @generic_file_read_iter,
which loops over each page to read, and this fixes the bug.
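
The effect of recomputing the end from a shrinking count can be shown with a toy model. The Python sketch below is not the llite code: the function names are hypothetical, the starting offset is assumed to be page aligned, and the loop stops where the real code would return AOP_TRUNCATED_PAGE and retry.

```python
PAGE_SIZE = 4096


def end_index(pos, count):
    """Index of the first page beyond a read of `count` bytes at `pos`."""
    return (pos + count + PAGE_SIZE - 1) // PAGE_SIZE


def pages_accepted(pos, count, recompute_end):
    """Count pages a toy read loop accepts before seeing one 'past the end'.

    `pos` stays fixed during the loop while `remaining` shrinks, mirroring
    the behaviour of iocb->ki_pos and iter->count described above.
    Assumes `pos` is page aligned to keep the arithmetic simple.
    """
    saved_end = end_index(pos, count)  # computed once, before the loop
    remaining = count
    accepted = 0
    index = pos // PAGE_SIZE
    while remaining > 0:
        end = end_index(pos, remaining) if recompute_end else saved_end
        if index >= end:
            # The real code would return AOP_TRUNCATED_PAGE and retry;
            # the toy model just stops so that it terminates.
            break
        read_bytes = min(PAGE_SIZE, remaining)
        remaining -= read_bytes
        accepted += 1
        index += 1
    return accepted
```

For a four-page read at offset 0, the saved end accepts all four pages, while the recomputed end rejects the third page even though data remains to be read.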

Lustre-change: https://review.whamcloud.com/50065
Lustre-commit: ae356dc325877bd130ad94acc5f3610898de8a8a

Fixes: 2f8f38effa ("LU-16412 llite: check read page past requested")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5bb7ab82e5e2de8b9bd911798fb8ae65fc7c91af
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50068
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16494 fileset: check fileset for operations by fid
Sebastien Buisson [Thu, 19 Jan 2023 17:07:27 +0000 (18:07 +0100)]
LU-16494 fileset: check fileset for operations by fid

Some operations by FID, such as lfs rmfid, must be aware of
subdirectory mount (fileset) so that they do not operate on files
that are outside of the namespace currently mounted by the client.

For lfs rmfid, we first perform a fid2path resolution. As fid2path
is already fileset aware, it fails if a file or a link to a file is
outside of the subdirectory mount. So we carry on with rmfid only
for FIDs for which the file and all links do appear under the
current fileset.

This new behavior is enabled as soon as we detect a subdirectory mount
is done (either directly or imposed by a nodemap fileset). This means
the new behavior does not impact normal, whole-namespace client mount.

sanity test_421h is added to exercise this new capability.

Lustre-change: https://review.whamcloud.com/49696
Lustre-commit: 9a72c073d33b0454229402c0cc930dc4e796107b

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I47136ac0a3324b9afdd01b0f902abc37938bd361
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50072
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16310 sec: Lustre/HSM on enc file with enc key
Sebastien Buisson [Mon, 14 Nov 2022 16:28:36 +0000 (17:28 +0100)]
LU-16310 sec: Lustre/HSM on enc file with enc key

Support for Lustre/HSM on encrypted files when the encryption key is
available requires similar attention as with file migration.
The volatile file used for HSM restore must have the same encryption
context as the Lustre file being restored, so that file content
remains accessible after the layout swap at the end of the restore
procedure.

Please note that using Lustre/HSM with the encryption key creates
clear text copies of encrypted files on the HSM backend storage.

Lustre-change: https://review.whamcloud.com/49153
Lustre-commit: df7a8d92d2378e236ee83b559e7a1158f84e63f4

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I99cba202cd2c7c747bbe5c4ec7d9208c7f6baf4b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49899
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16205 sec: fid2path for encrypted files
Sebastien Buisson [Thu, 3 Nov 2022 10:52:02 +0000 (11:52 +0100)]
LU-16205 sec: fid2path for encrypted files

Add support of fid2path for encrypted files. Server side returns raw
encrypted path name to client, which needs to process the returned
string. This is done from top to bottom, by iteratively decrypting
parent name and then doing a lookup on it, so that child can in turn
be decrypted.

For encrypted files that do not have their names encrypted, lookups
can be skipped. Indeed, name decryption is a no-op in this case, which
means it is not necessary to fetch the encryption key associated with
the parent inode.

Without the encryption key, lookups are skipped for the same reason.
But names have to be encoded and/or digested. So server needs to
insert FIDs of individual path components in the returned string.
These FIDs are interpreted by the client to build encoded/digested
names.

Add sanity-sec test_63 to exercise this new capability.

Lustre-change: https://review.whamcloud.com/48930
Lustre-commit: fa9da556ad22b1485c53cf0337dc6872d89aedfa

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I165bf2e5657037ae2e25c9378e4713537ea94bec
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49898
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16205 sec: reserve flag for fid2path for encrypted files
Sebastien Buisson [Thu, 3 Nov 2022 10:47:46 +0000 (11:47 +0100)]
LU-16205 sec: reserve flag for fid2path for encrypted files

Reserve OBD_CONNECT2_ENCRYPT_FID2PATH connection flag for fid2path
support for encrypted files.
This connection flag is required so that newer servers continue to
return -ENODATA to older clients.

Lustre-change: https://review.whamcloud.com/49028
Lustre-commit: 6f74bb60ff6c58f4a2647556124c501100330f4c

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I505b90a061687a7ef481adacca98908c96e487be
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49897
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16369 ldiskfs: do not check enc context at lookup
Sebastien Buisson [Tue, 6 Dec 2022 16:36:02 +0000 (17:36 +0100)]
LU-16369 ldiskfs: do not check enc context at lookup

On rhel8, ldiskfs should not check for encryption context of inodes
upon lookup. On these kernels, ext4 is not encryption aware, so just
assume context is fine when target is mounted as ldiskfs.

Lustre-change: https://review.whamcloud.com/49324
Lustre-commit: 540c293a4d0fc80253670b3db8d6722da43284ad

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9f9813d290ea24b34f710e2c8219e856ca8fbc58
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16091 enc: S_ENCRYPTED flag on OST objects for enc files
Sebastien Buisson [Thu, 11 Aug 2022 15:08:11 +0000 (17:08 +0200)]
LU-16091 enc: S_ENCRYPTED flag on OST objects for enc files

Add a dumb encryption context on OST objects being created, when the
LUSTRE_ENCRYPT_FL flag gets set in the LMA, for ldiskfs backend
targets. This leads ldiskfs to internally set the LDISKFS_ENCRYPT_FL
flag on the on-disk inode. Also, it makes e2fsprogs happy to see an
enc ctx for an inode that has the LDISKFS_ENCRYPT_FL flag.

Add a dumb encryption context on OST objects being opened, if there is
not already one, for ldiskfs backend targets. This is done by adding
the LUSTRE_ENCRYPT_FL flag if necessary, at the same time as atime
gets updated. It is some sort of live self-check that fixes OST
objects created with an older Lustre version.

Enhance lfsck to detect and fix OST objects belonging to encrypted
files that are missing the encryption flag. This is implemented in the
MDT-OST consistency routine, as part of the layout checking.

Also add sanity-sec test_62 and sanity-lfsck test_42 to exercise this.

Note this patch does not add any dumb encryption context on OST
objects when the backend is ZFS.

Lustre-change: https://review.whamcloud.com/48198
Lustre-commit: 348446d6370b3f63f0da8a96997b3295f896c6fb

Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=zfs
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6bee3c82ee4d1a52275facf9e2b0d60061e0beef
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49896
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16454 mdt: Add a per-MDT "max_mod_rpcs_in_flight"
Vitaliy Kuznetsov [Wed, 8 Feb 2023 21:34:38 +0000 (00:34 +0300)]
LU-16454 mdt: Add a per-MDT "max_mod_rpcs_in_flight"

The max_mod_rpcs_per_client value does not define a static number of
slots for per-client replies; the only thing it is used for is to
pass the limit to the client. For the same reason, there does not
appear to be a particularly hard limitation preventing the client
from changing and exceeding the server-provided parameter, other
than avoiding overloading the server with too many RPCs at once.
But that may also be true of the current limit with a larger number
of clients, no different than "max_rpcs_in_flight".

This fix adds a tunable parameter "max_mod_rpcs_in_flight" per MDT
to lustre/mdt/mdt_lproc.c so that it can be set
with "lctl set_param" at runtime. The max_mod_rpcs_per_client global
setting is marked "deprecated" but is still used as the default
value when creating an MDT.

Merge
8449bd91ba LU-16558 mdt: Fix max limit for "max_mod_rpcs_in_flight"
into this patch.

Lustre-change: https://review.whamcloud.com/49749
Lustre-commit: f16c31ccd91d66caba69d3ceea6a61c1682df59e
Lustre-change: https://review.whamcloud.com/50010
Lustre-commit: 8449bd91ba45c47614231a9bfe141e700dec8bb9

Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I27cfcb68e1a534e80e6a2dbf2e1affc430803b49
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50450
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-8367 osp: wait for precreate on reformatted OST
Li Dongyang [Mon, 14 Nov 2022 13:28:37 +0000 (00:28 +1100)]
LU-8367 osp: wait for precreate on reformatted OST

We should wait for precreate rpc to finish when we see a just
reformatted/replaced OST, otherwise the client could try
to access the object on OST before it's created.

Do not use sync_trans when recreating the objects on the
reformatted/replaced OST.

Fix detecting reformatted OST for FID_SEQ_NORMAL, for such
seqs the oid will be initialized as LUSTRE_FID_INIT_OID,
which is 1.

Change-Id: I4aebb9d573aa352dd7897e5f1129dc2117a084bb
Fixes: 63e17799a3 ("LU-8367 osp: enable replay for precreation request")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49151
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50410

2 years agoLU-8367 osp: enable replay for precreation request
Alexander Boyko [Tue, 22 Mar 2022 12:09:01 +0000 (08:09 -0400)]
LU-8367 osp: enable replay for precreation request

Lustre has a kind of deadlock between osp_precreate_thread()
and stripe allocation in osp_precreate_reserve(). The stripe
allocation thread has allocated objects and sleeps waiting for more
objects in osp_precreate_reserve() in case of OST failover. After
reconnection, osp_precreate_thread() calls
osp_precreate_cleanup_orphans() to synchronize the last id and clean
up unused objects, but it waits for the object reservation count
(d->opd_pre_reserved) to drop to zero. So no more objects can be
created on the OST and no reserved objects can be freed.
This produces "slow creates" messages and hung MDT creation threads:
osp_precreate_reserve()) kjcf05-OST0003-osc-MDT0001: slow creates,
 last=[0x340000400:0x23a4f483:0x0], next=[0x340000400:0x23a4f378:0x0],
 reserved=267, sync_changes=0, sync_rpcs_in_progress=0, status=0
The issue reproduces more often with the overstriping feature.

There is no need for the orphan clean-up phase when the MDT supports
resend/replay for precreation requests. This behaviour resolves the
osp_precreate_cleanup_orphans() hang and unblocks object creation.

Force creation logic is added to support a reformatted OST with the
same index. Previously this was done during the orphan clean-up phase.

Sanity tests 27S and 822 become invalid. 27S is based on orphan
clean-up after reconnection, 822 is based on a non-resendable
OST_CREATE request. These tests are removed.

Lustre-change: https://review.whamcloud.com/46889
Lustre-commit: 63e17799a369e2ff0b140fd41dc5d7d8656d2bf0

HPE-bug-id: LUS-10793
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I21287b51252e573e796fac69ee3df6ac90e28c10
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49821
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15660 statahead: statahead thread doesn't stop
Yang Sheng [Fri, 17 Jun 2022 12:30:34 +0000 (20:30 +0800)]
LU-15660 statahead: statahead thread doesn't stop

Add a barrier to ensure that changes to sai_task are visible
when it is accessed without locking. Otherwise the statahead
thread could sleep forever because the wake_up was lost.

Lustre-change: https://review.whamcloud.com/47673
Lustre-commit: b977caa2dc7dddcec9e20d393ee79dfa9fe31c0d

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I211e99f1bdddaaaf028a205658f603fda034d389
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50559
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16478 target: disconnected export
Alex Zhuravlev [Fri, 17 Feb 2023 08:00:20 +0000 (11:00 +0300)]
LU-16478 target: disconnected export

Eviction can race with a reconnect, and this in turn can lead
to a leaked export reference that prevents a later umount:
mdt_obd_reconnect() grabs a reference via nodemap_add_member().
Call obd_disconnect() if such a case is observed, to balance
obd_reconnect().

Lustre-change: https://review.whamcloud.com/50041
Lustre-commit: 654d5f3fa4df2a0f7275a6da0f050a18881f4f75

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I3fd49429ef40ef391d58e042e091258dcb9add72
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50427
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16263 lov: continue fsync on other OST objs even on -ENOENT
Bobi Jam [Fri, 3 Mar 2023 03:50:42 +0000 (11:50 +0800)]
LU-16263 lov: continue fsync on other OST objs even on -ENOENT

When fsync races with truncate, continue with the fsync of the other
OST objects even if the fsync of some stripe returns -ENOENT, so that
the client can still potentially discard cached pages by calling
osc_io_fsync_start()->osc_cache_writeback_range().
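
The control flow can be sketched as below. This is a simplified Python model, not the lov code: fsync_all_stripes and the fsync_one callback are hypothetical names, and the handling of errors other than -ENOENT (remember the first one, keep going) is an assumption, since the description only covers -ENOENT.

```python
import errno


def fsync_all_stripes(stripes, fsync_one):
    """Call fsync_one() on every stripe, tolerating -ENOENT.

    Returns 0 on success, or the first error other than -ENOENT.
    """
    result = 0
    for stripe in stripes:
        rc = fsync_one(stripe)
        if rc == -errno.ENOENT:
            # The object vanished (e.g. a racing truncate): keep going so
            # that the remaining stripes are still processed.
            continue
        if rc < 0 and result == 0:
            result = rc
    return result
```

The point of the sketch is that a missing object on one stripe does not cut the loop short for the stripes that follow it.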

Lustre-change: https://review.whamcloud.com/50005
Lustre-commit: 927b5cd49c3369d533d7f8dc5c8df497aaf33b6e

Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Test-Parameters: testlist=sanity ostcount=4 env=ONLY="273b 273c",ONLY_REPEAT=120
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I457ba80063086e310df55aaa22778b51a6ea211e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50195
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16465 llite: fix LSOM blocks for ftruncate and close
Etienne AUJAMES [Wed, 18 Jan 2023 09:42:54 +0000 (10:42 +0100)]
LU-16465 llite: fix LSOM blocks for ftruncate and close

LSOM is updated on close and setattr requests.
For setattr, clients do not know the number of blocks yet (the OST
setattr requests have to finish first), so the block count is set to 1
by the server.

A close request sent after an ftruncate() wrongly updates LSOM back to
its old block count, because clients do not update inode.i_blocks
after an OST setattr.

The MDS then denies the client close request that would update LSOM to
its correct block count: only truncates are allowed to decrease the
block count (server side).

This patch forces the client inode update at the end of an OST setattr.
It also tries (if there is no contention on the inode_size) to update
the inode at the end of an OST fsync or a sync IO.

Update sanity test 806/807 for this use case.

Lustre-change: https://review.whamcloud.com/49675
Lustre-commit: dfb08bbf77a1362f79c3738cc3952f8db2e46511

Test-Parameters: testlist=sanity env=ONLY=806,807,ONLY_REPEAT=20
Test-Parameters: fstype=zfs testlist=sanity-flr env=ONLY=70,ONLY_REPEAT=10
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: Ib1afde97071ebae56f0b413ec444403c3cdebd02
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50611
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15404 ldiskfs: use per-filesystem workqueues to avoid deadlocks
Andrew Perepechko [Tue, 21 Mar 2023 12:30:58 +0000 (08:30 -0400)]
LU-15404 ldiskfs: use per-filesystem workqueues to avoid deadlocks

Calling flush_scheduled_work() under s_umount is dangerous and may
cause deadlocks. This patch backports the fix from
https://lore.kernel.org/all/20220402084023.1841375-1-anserper@ya.ru/

Lustre-change: https://review.whamcloud.com/50354
Lustre-commit: 616fa9b581798e1b66e4d36113c29531ad7e41a0

Fixes: e239a14001 ("LU-15404 ldiskfs: truncate during setxattr leads to kernel panic")
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Change-Id: Ia191b70166f94f34e96a282ec18bd8650871e108
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50585
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16371 ldlm: clear lock converting flag on resource cleanup
Bobi Jam [Wed, 7 Dec 2022 16:03:20 +0000 (00:03 +0800)]
LU-16371 ldlm: clear lock converting flag on resource cleanup

During resource cleanup, clear the lock's converting flag so that
ldlm_cli_cancel() won't erroneously trip the assertion; the assertion
is meant for normal lock revoke callbacks.

Lustre-change: https://review.whamcloud.com/49339
Lustre-commit: 4990f4ef5eb81d8017c9992c1f6924527dc8ce60

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I1be4d7f16dbc7e026b460fd5358a0fe509b97a59
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49337
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16612 llite: protect cp_state with vmpage lock
Bobi Jam [Thu, 2 Mar 2023 09:45:14 +0000 (17:45 +0800)]
LU-16612 llite: protect cp_state with vmpage lock

cl_page_make_ready() calls cl_page_io_start() without vmpage lock
protection, and that could mess up cl_page's cp_state/cp_owner.

Lustre-change: https://review.whamcloud.com/50180
Lustre-commit: d03b038d0dd8360dc896ceb7f3cee99245551cb8

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Id0df7e14246aa561494a9b6e581cebc55241c4b9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50182
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16646 krb: improve lookup of user's credentials
Sebastien Buisson [Wed, 22 Mar 2023 16:09:58 +0000 (17:09 +0100)]
LU-16646 krb: improve lookup of user's credentials

Rather than only looking up the user's credentials in the hard-coded
FILE:/tmp/krb5cc_<uid>, try the default ccache on the system first,
and then fall back to files matching /tmp/*krb5cc* and
/run/user/<uid>/*krb5cc*.
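
The lookup order can be sketched as follows. This is an illustrative Python model, not the Lustre GSS code: the function name, its parameters and the "FILE:" prefix on matched paths are assumptions, and the caller is expected to have resolved the system default ccache by other means.

```python
import glob
import os


def candidate_ccaches(uid, default_ccache=None,
                      tmp_dir="/tmp", run_dir="/run/user"):
    """Return credential cache candidates in the order described above."""
    candidates = []
    if default_ccache:
        # 1. The system default ccache, if the caller resolved one.
        candidates.append(default_ccache)
    # 2. Files matching /tmp/*krb5cc*.
    for path in sorted(glob.glob(os.path.join(tmp_dir, "*krb5cc*"))):
        candidates.append("FILE:" + path)
    # 3. Files matching /run/user/<uid>/*krb5cc*.
    pattern = os.path.join(run_dir, str(uid), "*krb5cc*")
    for path in sorted(glob.glob(pattern)):
        candidates.append("FILE:" + path)
    return candidates
```

The directories are parameters only so that the sketch can be exercised against a scratch directory instead of the real /tmp and /run/user.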

Lustre-change: https://review.whamcloud.com/50377
Lustre-commit: 4b12f2b9ae14556285529b40985a7c3aacdbb7f4

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic2bedb4cc12e9adad0ce63bd0617b2e0ec13907e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50653
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16646 krb: use system ccache for Lustre services
Sebastien Buisson [Fri, 17 Mar 2023 16:30:07 +0000 (17:30 +0100)]
LU-16646 krb: use system ccache for Lustre services

Instead of hard-coding a FILE credentials cache for Lustre services,
comply with the system configuration in place and use the default
ccache.

Lustre-change: https://review.whamcloud.com/50342
Lustre-commit: 9784178eff0ad61acedb50ed4bfbb423cafe32b4

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ib89fe117e9f1d937925a02c7ed786a81cd8954cb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50652
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16646 krb: get rid of MEMORY private cache for krb creds
Sebastien Buisson [Fri, 17 Mar 2023 12:39:31 +0000 (13:39 +0100)]
LU-16646 krb: get rid of MEMORY private cache for krb creds

On client side, Kerberos credentials for root are cached in MEMORY,
but this is just in addition to the original credentials cache.
As there is no need to cache credentials twice, get rid of this
MEMORY private cache.

Lustre-change: https://review.whamcloud.com/50341
Lustre-commit: 5731acd4997f422e3e5561d0ed73359537a7b65f

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ifd02ee6cfafc27347b3c31e0dbbaab15190cf883
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16630 sec: improve Kerberos cross-realm trust remapping
Sebastien Buisson [Fri, 10 Mar 2023 17:02:31 +0000 (18:02 +0100)]
LU-16630 sec: improve Kerberos cross-realm trust remapping

Improve Kerberos cross-realm trust remapping by leveraging existing
Kerberos mechanisms. gss_localname() can be used to resolve usernames:
it goes through the auth_to_local translation rules in krb5.conf and
thus can easily be configured by security administrators.
This new mechanism does not replace the existing and rudimentary
mapping based on /etc/lustre/idmap.conf. If /etc/lustre/idmap.conf
exists, it is used for user mapping. If not, the new mechanism based
on gss_localname() gets involved.
But we now print a warning that idmap.conf is deprecated if we detect
it is in use.
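
The selection between the two mechanisms can be sketched as below. This is a minimal Python model of the decision only, not the actual daemon code: the function name, the returned labels and the warning text are hypothetical, and the real mapping work (parsing idmap.conf or calling gss_localname()) is left out.

```python
import os


def pick_mapping_mechanism(idmap_path="/etc/lustre/idmap.conf", warn=print):
    """Return which user-mapping mechanism would be used."""
    if os.path.exists(idmap_path):
        # The legacy file still takes precedence, with a warning.
        warn("%s is deprecated" % idmap_path)
        return "idmap.conf"
    # No idmap.conf: rely on the auth_to_local rules via gss_localname().
    return "gss_localname"
```

Removing idmap.conf is therefore what switches a node over to the krb5.conf based rules in this model.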

Lustre-change: https://review.whamcloud.com/50259
Lustre-commit: 3214d4d860e36b6aa07addad9e600fd754fc9149

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaaf15a757dc246673e2f412181219cc978079fab
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50292
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15690 tests: skip replay-ost-single/12a on old server
hxing [Fri, 21 Apr 2023 06:57:12 +0000 (14:57 +0800)]
LU-15690 tests: skip replay-ost-single/12a on old server

Skip 12a of replay-ost-single for older server version.

Lustre-change: https://review.whamcloud.com/50701
Lustre-commit: 3201bd4ac497540f74c7295b9ec541aa9775537c

Test-Parameters: trivial testlist=replay-ost-single env=ONLY=12a
Fixes: 28769c65987c ("LU-15195 ofd: missing OST object")
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I473452a5326691f4394c9e3ab2ab5dfecbc6ec58
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50706
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16734 gss: fix lookup_user_key() bug
Aurelien Degremont [Fri, 31 Mar 2023 09:30:37 +0000 (11:30 +0200)]
LU-16734 gss: fix lookup_user_key() bug

With more recent kernels, such as on Ubuntu 22.04, trying to
delete some keyring resources triggers a kernel warning message
and the cleanup fails, leaving resources stuck and warning
messages printed regularly.

This is because Linux 5.8, in commit 8c0637e, introduced an API
change for lookup_user_key() that was not taken into account.

Update the lookup_user_key() call in _user_key() to fix it.

Lustre-change: https://review.whamcloud.com/50623
Lustre-commit: 013a6711503045b9e7154b8ff786ee85cdc3ecdd

Change-Id: I34ef4dac3f56cbb4aac6bc5a3bad36feb66b8675
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Jonathan Calmels <jcalmels@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50721
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn85
Andreas Dilger [Sat, 15 Apr 2023 00:21:02 +0000 (18:21 -0600)]
RM-620 build: New tag 2.14.0-ddn85

New tag 2.14.0-ddn85

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5bfbaad7178fca1a2af35531579e98f9f03215cf

2 years agoLU-15495 tests: fixed dbench test
Alex Deiter [Wed, 9 Nov 2022 17:06:37 +0000 (21:06 +0400)]
LU-15495 tests: fixed dbench test

* Using awk to get the list of shared libraries
* Fixed shellcheck warnings

Lustre-change: https://review.whamcloud.com/49088
Lustre-commit: 01fb7bda971ee9d5dcd3950f40668131f306b8c5

Test-Parameters: trivial testlist=sanity \
    clientdistro=el7.9 clientarch=x86_64 env=SLOW=yes,ONLY=71
Test-Parameters: trivial testlist=sanity \
    clientdistro=el8.6 clientarch=x86_64 env=SLOW=yes,ONLY=71
Test-Parameters: trivial testlist=sanity \
    clientdistro=el8.7 clientarch=aarch64 env=SLOW=yes,ONLY=71
Test-Parameters: trivial testlist=sanity \
    clientdistro=el9.0 clientarch=x86_64 env=SLOW=yes,ONLY=71
Test-Parameters: trivial testlist=sanity \
    clientdistro=sles15sp4 clientarch=x86_64 env=SLOW=yes,ONLY=71
Test-Parameters: trivial testlist=sanity \
    clientdistro=ubuntu2004 clientarch=x86_64 env=SLOW=yes,ONLY=71
Test-Parameters: trivial testlist=sanity \
    clientdistro=ubuntu2204 clientarch=x86_64 env=SLOW=yes,ONLY=71
Test-Parameters: trivial env=SLOW=yes,ONLY=26 testlist=replay-dual
Test-Parameters: trivial env=SLOW=yes,ONLY=70b testlist=replay-single
Test-Parameters: trivial env=SLOW=yes,ONLY=8 testlist=sanity-pfl
Test-Parameters: trivial env=SLOW=yes,ENABLE_QUOTA=yes,ONLY=8 \
    testlist=sanity-quota

Change-Id: Ic28bd67dcfb5ff24e65e33278ac867409a2c1cc6
Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50572
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16672 tests: auster node.yml labels Alma and Rocky as CentOS
Charlie Olmstead [Wed, 12 Apr 2023 14:45:18 +0000 (08:45 -0600)]
LU-16672 tests: auster node.yml labels Alma and Rocky as CentOS

release() assumes a node with /etc/centos-release is CentOS. This patch
removes that assumption and uses the name in the centos-release file.
Corrected the os-release code to strip off the last word if present.
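
As a rough illustration of deriving the distribution name from the release file instead of assuming CentOS, here is a minimal Python sketch. The function name distro_name() is hypothetical, and the rule for which trailing word gets stripped (here, a generic "Linux") is an assumption; the commit message does not spell it out.

```python
def distro_name(release_line):
    """Extract the distribution name from a *-release file line."""
    # Everything before the word "release" is treated as the name.
    name = release_line.split(" release ", 1)[0].strip()
    words = name.split()
    # Assumption: a trailing generic word such as "Linux" is dropped,
    # so "Rocky Linux" becomes "Rocky".
    if len(words) > 1 and words[-1] == "Linux":
        words = words[:-1]
    return " ".join(words)
```

For example, distro_name("Rocky Linux release 8.7 (Green Obsidian)") yields "Rocky" rather than a hard-coded "CentOS".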

Lustre-change: https://review.whamcloud.com/50442
Lustre-commit: TBD (from 3d6c37836b4bb7b1e6ea90bb7aacf4715e44c667)

Test-Parameters: trivial
Other-Id: Ia5acbce3351ca23f4d9265d1aaf8d952a2c8b502
Change-Id: Id16ec38d3530c4ece4fbdbb56c23af24d4c55b99
Signed-off-by: Charlie Olmstead <charlie@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50617
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15053 tests: reset quota if ENABLE_QUOTA=1
Sergey Cheremencev [Mon, 30 Jan 2023 17:33:08 +0000 (20:33 +0300)]
LU-15053 tests: reset quota if ENABLE_QUOTA=1

Quota limits set in setup_quota() with ENABLE_QUOTA=1
should be cleaned up at the end to avoid failures in
subsequent sessions.

Lustre-change: https://review.whamcloud.com/49823
Lustre-commit: 2d40d96b4ec86327ec510be293f2ce4711f00826

Test-Parameters: testlist=sanity-quota env=ENABLE_QUOTA=yes
Test-Parameters: testgroup=review-dne-part-4 env=ENABLE_QUOTA=yes
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ia6b034739cfe800c6661f199420d0a4dbe7110fc
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50634
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16510 build: include unsafe_memcpy definition
Patrick Farrell [Fri, 7 Apr 2023 18:15:43 +0000 (14:15 -0400)]
LU-16510 build: include unsafe_memcpy definition

The original LU-16510 missed a key part of the
unsafe_memcpy code from the upstream kernel, and so we
weren't actually defining unsafe_memcpy() as intended.

Thanks to Aurelien Degremont <adegremont@nvidia.com> for
pointing this out.

Fixes: 5099a3b7 ("LU-16510 build: fortified memcpy from linux 6.1")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ib9e2d56ed0b3691f1ab9fcd25403fa86ac784b6d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50574
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-7251 lnet: update locking multiple NIDs of the MR peer
Serguei Smirnov [Tue, 11 Apr 2023 15:37:43 +0000 (08:37 -0700)]
EX-7251 lnet: update locking multiple NIDs of the MR peer

Port updates to
LU-16709 lnet: fix locking multiple NIDs of the MR peer

This allows the first of the two locked NIDs
to stay primary, as intended for the purpose of communicating
with Lustre, even if peer discovery succeeded
using a different NID of the MR peer.

Lustre-change: https://review.whamcloud.com/50530
Lustre-commit: TBD (ddc9652a238e146e215157572b2e7e119de0e63b)

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ic66c3b6d4dec98540e4fa2d7fa51c0e5e2f442ed
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50603
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14661 obdclass: Add peer NI when processing llog (again)
Serguei Smirnov [Mon, 10 Apr 2023 16:51:47 +0000 (09:51 -0700)]
LU-14661 obdclass: Add peer NI when processing llog (again)

Construct peers when processing the config log so that LNet has
complete information about peer info stored in the config log.

These are "temporary" peers which can be overwritten by discovery.

In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to
take the LNet API mutex when adding/modifying peers. We don't want
to take the mutex while a spin lock is already being held, so drop
the spin lock prior to calling class_add_nids_to_uuid().
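
The lock ordering can be pictured with a small userspace model. This is a Python sketch of the pattern only, not the kernel code: ClientImport, add_nids_to_conn() and add_nids_to_uuid() are hypothetical stand-ins, and Python's threading.Lock is a sleeping lock, so it models the ordering rather than real spin lock semantics. Copying the NID list under the short lock is an assumption made for the illustration.

```python
import threading


class ClientImport:
    def __init__(self, nids):
        self.lock = threading.Lock()   # stands in for the import spin lock
        self.nids = list(nids)


api_mutex = threading.Lock()           # stands in for the LNet API mutex
peers = {}                             # uuid -> list of NIDs


def add_nids_to_uuid(uuid, nids):
    # May block on the mutex, so it must not run under the import lock.
    with api_mutex:
        peers.setdefault(uuid, []).extend(nids)


def add_nids_to_conn(imp, uuid):
    with imp.lock:
        nids = list(imp.nids)          # snapshot under the short lock
    # The import lock is released here, before the mutex is taken.
    add_nids_to_uuid(uuid, nids)
    return nids
```

The point of the ordering is that nothing that can sleep is ever entered while the short, non-sleeping lock is held.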

Lustre-change: https://review.whamcloud.com/43510
Lustre-commit: 16321de596f6395153be6cbb6192250516963077

[This was problematic when the patch first landed, but was fixed
 by commit aacb16191a ("LU-14668 lnet: Lock primary NID logic")]

Fixes: 759d488fa0 ("EX-6349 revert: Add peer NI when processing llog")
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ib587ef478251e9722b21210e896838e0344d0e47
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50589
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn84
Andreas Dilger [Wed, 5 Apr 2023 20:17:51 +0000 (14:17 -0600)]
RM-620 build: New tag 2.14.0-ddn84

New tag 2.14.0-ddn84

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I04f278c63bbc7c646dcf2fa32bf4d11d128bb7db

2 years agoEX-7251 lnet: fix locking multiple NIDs of the MR peer
Serguei Smirnov [Tue, 4 Apr 2023 21:02:51 +0000 (14:02 -0700)]
EX-7251 lnet: fix locking multiple NIDs of the MR peer

If LNetPrimaryNID is called on multiple NIDs of the same node,
as a result of peer discovery it is possible that
the discovered peer is found to contain a NID which is locked
as primary by a different existing peer record.
In this case it is safe to delete one of the peer records,
but the NID which got locked the earliest should be
kept as primary.
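
The rule for choosing which record survives can be sketched as follows. This is a simplified Python model under stated assumptions, not the LNet implementation: PeerRecord, its locked_at counter (lower means locked earlier) and merge_peers() are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeerRecord:
    primary_nid: str
    locked_at: int    # hypothetical counter: lower means locked earlier
    nids: set


def merge_peers(a, b):
    """Merge two records for the same node.

    The record whose primary NID was locked first stays primary; the
    other record is dropped after its NIDs are folded in.
    """
    keep, drop = (a, b) if a.locked_at <= b.locked_at else (b, a)
    return PeerRecord(keep.primary_nid, keep.locked_at, keep.nids | drop.nids)
```

The merge is symmetric in its arguments: whichever record discovery happens to produce, the earliest-locked NID remains primary.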

This allows the first NID listed in the mount command's
comma-separated list to stay primary, as intended
for the purpose of communicating with Lustre, even if peer
discovery succeeded using a different NID of the MR peer.

Lustre-change: https://review.whamcloud.com/50530
Lustre-commit: TBD (47df7c726987d49e92b7145c128414daa413835f)

Fixes: aacb16191a ("LU-14668 lnet: Lock primary NID logic")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iec9f8b70053fe24cddee552358500dfad0234b7f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50533
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn83
Andreas Dilger [Tue, 4 Apr 2023 17:23:27 +0000 (11:23 -0600)]
RM-620 build: New tag 2.14.0-ddn83

New tag 2.14.0-ddn83

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Change-Id: Ib5162a7d3c15b84a298ce5d30374a9b08fc7c5db

2 years agoLU-16642 tests: improve sanity-sec test_61
Sebastien Buisson [Thu, 16 Mar 2023 16:59:59 +0000 (17:59 +0100)]
LU-16642 tests: improve sanity-sec test_61

Improve sanity-sec test_61 by using a client-specific nodemap rather
than the default nodemap.

Lustre-change: https://review.whamcloud.com/50317
Lustre-commit: a7222127c7a6437e3f3561fc55f3dc4ba69a97e5

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie0c9e381e42a93d89558947dee9a60537cf01e65
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
LU-16683 tests: fix sanity-sec test_61 for SSK

When SHARED_KEY is in use, nodemap specific shared keys must be loaded
explicitly because sanity-sec test_61 defines a nodemap dedicated to
the client.

Lustre-change: https://review.whamcloud.com/50476
Lustre-commit: 05e5cb0b0c07e15f51ce4e8fa26e12c178ab404a

Fixes: a7222127c7 ("LU-16642 tests: improve sanity-sec test_61")
Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec env=ONLY=61
Test-Parameters: testlist=sanity-sec env=SHARED_KEY=true,ONLY=61
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I206205496352b6f36341c8b962bb7de4b71541d5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/50502
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>