Whamcloud - gitweb
fs/lustre-release.git
Lai Siyao [Wed, 30 Mar 2022 21:50:22 +0000 (17:50 -0400)]
LU-14719 lod: distributed transaction check space

A distributed transaction failure may cause missing files or
disconnected directories. To avoid failures when the disk is full,
check the remote MDT free space before the transaction starts.

The block/inode watermarks in obd_statfs_info are used to check
whether MDT has enough free blocks/inodes.
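
As a rough illustration of the watermark check described above, the
sketch below compares free blocks and inodes against minimum thresholds
before a transaction would be allowed to start. It is not the lod code:
`StatfsInfo`, `has_enough_space()` and `targets_short_of_space()` are
hypothetical names, and the exact comparison used by the patch is an
assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StatfsInfo:
    """Hypothetical stand-in for one MDT's statfs data plus its watermarks."""
    free_blocks: int
    free_inodes: int
    block_watermark: int  # assumed minimum number of free blocks
    inode_watermark: int  # assumed minimum number of free inodes


def has_enough_space(info: StatfsInfo) -> bool:
    """True if free blocks and free inodes are both at or above the watermarks."""
    return (info.free_blocks >= info.block_watermark
            and info.free_inodes >= info.inode_watermark)


def targets_short_of_space(targets: Dict[str, StatfsInfo]) -> List[str]:
    """Names of the targets that should make the transaction fail up front."""
    return [name for name, info in targets.items() if not has_enough_space(info)]
```

Failing before the transaction starts is the point of the check: a
target reported by `targets_short_of_space()` is rejected while nothing
has been modified yet.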

Add sanity 230x.

Lustre-commit: 6aee406c84b6b8fddf08b560acfcdf7c13c97e63
Lustre-change: https://review.whamcloud.com/47039

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0922e9c8668e8b842d313576bd68b52fa5d434ac
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Qian Yingjin [Fri, 21 Oct 2022 03:49:35 +0000 (23:49 -0400)]
EX-6193 pcc: dio attach failed on non-blksz-aligned file

PCC attach failed when doing a DIO copy of a file whose size is not
blksz-aligned.
The reason is that the copy tool ll_fid_path_copy fails on a
non-blksize-aligned file for a PCC backend (such as a local Ext4
file system) using direct I/O.
This patch fixes the bug by falling back from direct I/O to
buffered I/O mode when copying the non-blksize-aligned tail of the
file.
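
The fallback can be pictured as splitting the copy into a
block-aligned prefix and an unaligned tail. The sketch below only
shows that arithmetic; `split_for_dio()` is a hypothetical helper and
not part of ll_fid_path_copy, and it assumes the prefix is copied with
direct I/O and the tail with buffered I/O.

```python
from typing import Tuple


def split_for_dio(file_size: int, blksz: int) -> Tuple[int, int]:
    """Return (aligned_bytes, tail_bytes) for a file of file_size bytes.

    aligned_bytes is the largest multiple of blksz not exceeding
    file_size; tail_bytes is whatever remains after it.
    """
    if blksz <= 0:
        raise ValueError("blksz must be positive")
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    aligned = (file_size // blksz) * blksz
    return aligned, file_size - aligned
```

For example, `split_for_dio(10000, 4096)` gives `(8192, 1808)`: the
first 8192 bytes are block-aligned and the last 1808 bytes form the
tail.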

This patch also sets errno from the return code in the function
@get_root_path(), so that a call to @llapi_open_by_fid() with an
invalid mount path will see the correct errno.

Change-Id: I5287563029269032a91397c0094e2ccede73b9b1
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48927
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Fri, 28 Oct 2022 10:29:03 +0000 (18:29 +0800)]
LU-15031 quota: reseed glbe in qmt_lvbo_update

Reseed the glbe array in qmt_lvbo_update after changing edquot.
Without the fix, the edquot flag wasn't set in the glbe array. Later,
when edquot was cleared, the need_update (nu) flag wasn't set
in the glbe array to notify OSTs of the new edquot.

The patch also adds test 80 to check that the OST gets the correct
edquot value after failover.

Lustre-change: https://review.whamcloud.com/45032
Lustre-commit: 61ec1e0f2ca8dc4c9f7ed41f782960e65cab0920

HPE-bug-id: LUS-10029
Change-Id: I5b7e1a553e3351c22649431860d51b5a671c6fd9
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Sat, 28 May 2022 18:16:11 +0000 (21:16 +0300)]
LU-15847 tgt: move tti_ transaction params to tsi_

Move tti_mult_trans and tti_has_trans to tgt_session_info so that
they are available in all targets. This allows cleaning up old
duplicated MDT code and can be used for complex transaction
handling in MDT/OFD if needed.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47491
Lustre-commit: 0a317b171ebedcba8fc58e548991a884186c350c

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3f0c15e283b9e21c04a009f6cf346afa278e7095
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Tue, 31 May 2022 10:38:25 +0000 (13:38 +0300)]
LU-15847 tgt: reply always with the latest assigned transno

In tgt_txn_stop_cb(), don't skip transno assignment in case
of unexpected multiple last_rcvd updates, so that the latest
transno, rather than the first one, is reported back in the
reply.

Reporting just the first transno might lead to data loss at
failover, because a partially committed operation would be
considered fully committed and the rest of the operation would
not be replayed.

Reporting the last assigned transno to the client could cause
replay failures in some cases, which is still better than
possible data loss. So this patch makes the multiple
transaction case less severe.
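
The difference between the two choices can be shown with a small
model. This is a sketch under stated assumptions, not the
tgt_txn_stop_cb() logic: the function names are hypothetical, and the
client is assumed to keep a request for replay only while its reply
transno is greater than the last committed transno.

```python
from typing import List


def reply_transno_first(assigned: List[int]) -> int:
    """Behaviour being replaced: report the first assigned transno."""
    return assigned[0] if assigned else 0


def reply_transno_latest(assigned: List[int]) -> int:
    """Behaviour after the patch: report the latest assigned transno."""
    return assigned[-1] if assigned else 0


def kept_for_replay(reply_transno: int, last_committed: int) -> bool:
    """Assumed client rule: keep the request until its transno is committed."""
    return reply_transno > last_committed
```

With one operation that was assigned transnos 10 and 11 and a server
that has committed only up to 10, reporting 10 lets the client drop
the request even though part of the operation is uncommitted, while
reporting 11 keeps it available for replay.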

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47492
Lustre-commit: 4e2e8fd2fc0a9a30f47e70dc285a2101d2cbc4c2

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia07e89576127a2fc1eb2ae706551ffe8ceaa93be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Thu, 13 Jan 2022 07:27:21 +0000 (10:27 +0300)]
LU-15447 tests: sanity-flr/208 reset rotational status

New kernels (e.g. 4.18.0-305.25.1) declare loopback devices
in tmpfs as non-rotational. sanity-flr/208 wrongly assumes
that devices are non-rotational by default, so it started
to fail with new kernels.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/46088
Lustre-commit: 78dddb423f0dc8571d3c7f8ccd8f77a1c2bc28ae

Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib5c42da39667227a6cff5d379e30d2cd6c1e2773
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Sun, 28 Aug 2022 01:50:16 +0000 (18:50 -0700)]
LU-16106 lnet: allow direct messages regardless of peer NI status

If check_routers_before_use is enabled, the router needs to
be pinged before it is used, which is not possible because
its NIs are assumed to be down at start-up. Don't prevent
discovery of the router in this case.

This change allows non-routed traffic to peer NIs with "down"
status.

Lustre-commit: 3345a8a54e89c342a4ce2d8d4bcb04ee919bcd52
Lustre-change: https://review.whamcloud.com/c/48355

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I36fa60e37ef4f47c82c69855c9b0b80bad8a36f4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48669
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bobi Jam [Thu, 7 Jul 2022 07:38:54 +0000 (15:38 +0800)]
LU-16025 llite: adjust read count as file got truncated

A file read does not notice that the file was truncated by another
node, and continues to read zero-filled pages beyond the new file size.

This patch adds a bound to the read to prevent the issue and
adds a test case verifying the fix.

Lustre-change: https://review.whamcloud.com/47896
Lustre-commit: 4468f6c9d92448cb72c5a616ec74653e83ee8e10

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie51ba09201a1ca1464c3a3892d367590e978ee34
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48848
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bobi Jam [Thu, 2 Sep 2021 16:30:10 +0000 (00:30 +0800)]
LU-14642 test: add fsx mirror file test mode

- add an fsx mirror file test mode with the "-M" option so that fsx can
  exert its IO on an FLR file as well as extend/split/resync the FLR file.

- add sanity-flr test_70b() to test fsx with flrmode.

- fix a bug in "lfs mirror verify" to accommodate the max mirror count
  instead of (max - 1) mirrors.

- improve "lfs mirror verify -v" to print the proper data range of its
  CRC-32 checksum values.

Lustre-change: https://review.whamcloud.com/43473
Lustre-commit: 90ba8b4ac360b1987178445bd2ccd64f7958d912

Test-Parameters: testlist=sanity-flr env=ONLY=70a,ONLY_REPEAT=10
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib55c7b25dcd82fa0b197ad21268b16c82aab5da9
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Tue, 18 Oct 2022 15:19:01 +0000 (17:19 +0200)]
LU-16249 sec: krb5_decrypt_bulk calls decryption primitive

krb5_decrypt_bulk() was mistakenly calling an encryption primitive
instead of a decryption primitive for the confounder.

Lustre-change: https://review.whamcloud.com/48907
Lustre-commit: TBD (851f3915659941db00a0cda58867e68139e5e0d1)

Test-Parameters: trivial
Fixes: 0a65279121 ("LU-13344 gss: Update crypto to use sync_skcipher")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9251172644ed6baa3bb06a59dbe7c1bab401d817
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48909
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Wed, 19 Oct 2022 11:18:04 +0000 (19:18 +0800)]
LU-15097 quota: stop pool_recalc before killing pool

qmt_start_pool_recalc holds a reference on a pool while
it is running. This thread should be stopped before
putting the last pool reference in qmt_pool_free, to be
sure that the pool can finally be freed. This patch helps
avoid the following ASSERTION:

    qmt_pool_fini() ASSERTION(list_empty(&qmt->qmt_pool_list)) failed

Lustre-change: https://review.whamcloud.com/45256
Lustre-commit: 862f0baa7c21cb631b98d3886ef9e938f4519573

Change-Id: If72042a620d9ded693fcb669bc9148d1f96126a4
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Hongchao Zhang [Fri, 21 Oct 2022 07:43:11 +0000 (03:43 -0400)]
EX-4567 kernel: add extra field for snapshot in el8

Add extra fields to "struct jbd2_journal_handle" and
"struct journal_head" for use by snapshot, placing them in the
4-byte hole at the end of struct jbd2_journal_handle so
that they do not increase the structure size and memory
usage for this common allocation.

Use RH_KABI_EXTEND() and RH_KABI_FILL_HOLE() so that the
new fields do not affect the kernel ABI compatibility.

Change-Id: I84f52b18694e56d837d64c5c80076e45dde27eab
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48880
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Tue, 25 Oct 2022 03:06:08 +0000 (20:06 -0700)]
EX-6102 lipe: lipe_scan3 not intended for customer use

Print a warning that lipe_scan3 is not intended for customer use.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: I92f775d77e1d4ffac304d3e46ed6af7c642a3bdd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Fri, 14 Oct 2022 21:09:03 +0000 (15:09 -0600)]
LU-11388 tests: exclude replay-single/131b for ldiskfs

The test fails in about 1/10 of the test runs, even on ldiskfs.

Test-Parameters: trivial testlist=replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9c36d026944876e066a1dc36877927b7a92c537e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48946

Alexandre Ioffe [Wed, 13 Apr 2022 05:34:18 +0000 (22:34 -0700)]
EX-5099 lipe: Make ssh exec timeout controllable

- Introduce new lipe ssh API: lipe_ssh_exec_timeout() and
  lipe_ssh_start_cmd_timeout().
- Introduce new lamigo command option --ssh-exec-timeout
  to configure the ssh connection timeout for ssh exec cmd.
- Use lipe_ssh_start_cmd_timeout() to start the remote
  access log reader with a timeout.
  Use ssh_channel_read_timeout() with an infinite timeout
  when reading access log records.
- Use lipe_ssh_start_cmd_timeout() to start remote "lfs ..."
  commands with a long timeout, to prevent a premature timeout
  when an "lfs mirror extend ..." command for a big file
  takes a long time.
- Use a default timeout of 600 seconds for ssh exec cmd.
  Such a long timeout should allow long-lasting
  replications to finish.

This fixes EX-5429.

Test-Parameters: trivial clientdistro=el8.5 serverdistro=el8.5 testlist=hot-pools env=FAIL_ON_ERROR=false,ONLY="56 59",ONLY_REPEAT=20
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I8de9b1db2014abd1e6f201cda73a0812128f6bb6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47057
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Fri, 21 Oct 2022 20:35:40 +0000 (13:35 -0700)]
LU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1]

Update SLES15 SP3 kernel to 5.3.18-150300.59.93.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity

Lustre-change: https://review.whamcloud.com/48601
Lustre-commit: c3467db7e7d0652c09bdcef26e2b708ab51cba9e

Change-Id: I1e0afe6974567d13680dbb0d463fbbd873ef2e5f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48864
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Thu, 6 Oct 2022 17:31:51 +0000 (10:31 -0700)]
LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed

This patch breaks out of the loop in ptlrpc_free_committed()
if need_resched() is true or there are other threads waiting
on the imp_lock. This avoids the thread holding the
CPU for too long while freeing a large number of requests. The
remaining requests in the list will be processed the next
time this function is called. That also avoids delaying a
single thread too long if the list is long.
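
The early-exit pattern can be sketched as a loop that frees work from
the head of a queue but stops as soon as it is asked to yield. This is
only a model of the idea: `free_committed()` and `should_yield` are
hypothetical names standing in for the need_resched() and lock-waiter
checks, and the queue is assumed to be ordered by transno.

```python
from collections import deque
from typing import Callable, Deque


def free_committed(requests: Deque[int], last_committed: int,
                   should_yield: Callable[[], bool]) -> int:
    """Free committed requests from the head of the queue.

    Stops early when should_yield() returns True, leaving the rest of
    the queue for the next call. Returns the number of requests freed.
    """
    freed = 0
    while requests and requests[0] <= last_committed:
        if should_yield():
            break  # give up the CPU/lock; the remainder is handled later
        requests.popleft()
        freed += 1
    return freed
```

Because the remaining entries stay queued, calling the function again
later finishes the job without any single call running for too long.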

Lustre-change: https://review.whamcloud.com/48629
Lustre-commit: 9a3e111a2ebdfadec4b6efc65899856edc90ad18

Test-Parameters: testlist=sanity clientdistro=el8.6
Test-Parameters: testlist=sanity clientdistro=ubuntu2204 env=SANITY_EXCEPT="130 244a"
Change-Id: I50f56b87844e8b019053e569767b6c949d2a3f55
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lai Siyao [Thu, 16 Sep 2021 21:49:33 +0000 (17:49 -0400)]
LU-15009 ofd: continue precreate if LAST_ID is less on MDT

It's possible that precreate succeeded on the OST, but the MDT didn't
get the reply and assumed failure. In this case, the LAST_ID on the MDT
is smaller than that on the OST. Instead of reporting an error and
stopping precreate, it's better to move the precreate window forward.

Lustre-change: https://review.whamcloud.com/44984
Lustre-commit: 1711e26ae861c28829870c2433caf7ee232909cf

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia6ca418ec0ea6797b7eccc1610879331307fad07
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Mon, 25 Jul 2022 13:26:40 +0000 (16:26 +0300)]
LU-16044 osd: discard pagecache in truncate's declaration

Discard the pagecache in truncate's declaration to avoid taking the
pagelock inside a transaction, which conflicts with the write path
where we take the pagelock before any other lock. This should be safe
as the write path writes the pages out synchronously, so they should
be clean by the time of truncate.

Lustre-change: https://review.whamcloud.com/48033
Lustre-commit: 0bb491b2ecf494c3f78fa08a101af8af7853a0fe

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Iba555ace2ce9ef34ab5517375ecb5c176f738a02
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48885
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Mon, 8 Aug 2022 02:59:25 +0000 (10:59 +0800)]
LU-16076 utils: enhance 'lfs check' command

Add an optional argument to the 'lfs check' command so that only the
servers related to the specified Lustre file system are checked.

lustre-change: https://review.whamcloud.com/48155
lustre-commit: f5ca6853b8d8b918b0228af31fa8249be49d3000

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=113
Change-Id: I826a8e822af0a290f06ffaadadf1bb7f86899d99
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Li Dongyang [Fri, 7 Oct 2022 12:09:10 +0000 (23:09 +1100)]
LU-15305 obdclass: fix race in class_del_profile

Move the profile lookup and its removal from lustre_profile_list
into the same critical section, otherwise we could race with
class_del_profiles or another class_del_profile.

Do not create duplicate mount opts in the client config,
otherwise we will add a duplicate lustre_profile to
lustre_profile_list for a single mount.
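
Both points can be illustrated with a small registry in which lookup
and removal happen under one lock, and duplicates are refused on
insertion. This is a sketch of the pattern only: `ProfileList` and its
methods are hypothetical and do not mirror the obdclass data
structures.

```python
import threading
from typing import Dict, Optional


class ProfileList:
    """Hypothetical profile registry protected by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._profiles: Dict[str, dict] = {}

    def add(self, name: str, profile: dict) -> bool:
        """Add a profile; return False instead of creating a duplicate."""
        with self._lock:
            if name in self._profiles:
                return False
            self._profiles[name] = profile
            return True

    def delete(self, name: str) -> Optional[dict]:
        """Look up and remove in one critical section.

        Two concurrent callers cannot both find the same entry: the
        second one sees None because the first already removed it.
        """
        with self._lock:
            return self._profiles.pop(name, None)
```

If the lookup and the removal were done under separate lock
acquisitions, two deleters could both find the entry before either
removed it, which is the kind of race the commit describes.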

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/48802
Lustre-commit: 83d3f42118579d7fb7c3002533c047badcf41e0d

Change-Id: I648aa206716213b064d045f546516b219337e0ed
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48956
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Etienne AUJAMES [Fri, 21 Jan 2022 14:49:18 +0000 (15:49 +0100)]
LU-15467 tests: fix sanity-hsm test_103a timeout issue

Add an MDS version check to "sanity-hsm test_103a" for interop testing.
Limit the number of parallel HSM restore requests to
max_rpcs_in_flight.

Lustre-change: https://review.whamcloud.com/46252
Lustre-commit: 98e1e41ce47c95155a8c8d452eef5074492d22f0

Fixes: b449f3d ("LU-15145 hsm: unlock the restore layout lock for a cancel")
Test-Parameters: trivial
Test-Parameters: testlist=sanity-hsm env=ONLY=103a,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I78098042d1316cdcc9d2e25860099a0ffdba2503
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48960
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Mon, 17 Oct 2022 19:53:44 +0000 (12:53 -0700)]
LU-15646 llog: correct llog FID and path output

- fix wrong LLOG_ID-to-FID conversion to output llog FID by
  introducing PLOGID macro to expand llog ID for DFID format
- stop printing lgl_ogen along with llog FID as it is always zero
  since 2.3.51 and is not used anymore
- output correct path for update llog in llog_reader
- always print header info in llog_reader if available
- print llog flags in header info

Lustre-change: https://review.whamcloud.com/48430
Lustre-commit: e28f3ee185b2ef7bad8046f46444772fac214a40

Fixes: 5a8e47d0a1a7 ("LU-9153 llog: update llog print format to use FIDs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7ba49e8101a67d2d80c204a5fc629bfd0bce89ad
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48896
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bruno Faccini [Mon, 17 Oct 2022 19:46:25 +0000 (12:46 -0700)]
LU-6612 utils: strengthen llog_reader vs wrong format/header

The following snippet shows that llog_reader can be confused by
an invalid 0 for the number of records when parsing an expected
LLOG file header:

    root# dd if=/dev/zero bs=4096 count=1 of=/tmp/zeroes
    1+0 records in
    1+0 records out
    4096 bytes (4.1 kB) copied, 0.000263962 s, 15.5 MB/s
    root# llog_reader /tmp/zeroes
    Memory Alloc for recs_buf error.
    Could not pack buffer; rc=-12

Lustre-change: https://review.whamcloud.com/15654
Lustre-commit: 45291b8c06eebf33d3654db3a7d3cfc5836004a6

Test-Parameters: trivial testlist=sanity,sanity-hsm
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I12be79e6c6a5da384a5fd81878a76a7ea8aa5834
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48895
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Etienne AUJAMES [Mon, 17 Oct 2022 19:37:39 +0000 (12:37 -0700)]
LU-15000 llog: read canceled records in llog_backup

llog_backup() does not reproduce index "holes" in the generated copy.
This could result in a llog copy whose indexes differ from the source,
which might confuse the configuration update mechanism that relies on
indexes between the MGS source and the target copy.

These index gaps can be caused by "lctl --device MGS llog_cancel".

This patch adds a "raw" read mode to llog_process* to read canceled
records. So now llog_backup is able to reproduce an exact copy of
the original.
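
A toy model shows why skipping canceled records changes the indexes of
the copy. It is only a sketch under assumptions: `Record`, `process()`
and `backup()` are hypothetical, and the non-raw copy is assumed to
append records densely, renumbering them from 1.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Record:
    index: int
    payload: str
    canceled: bool = False


def process(records: Iterable[Record], raw: bool = False) -> Iterator[Record]:
    """Yield records; canceled ones are skipped unless raw mode is set."""
    for rec in records:
        if rec.canceled and not raw:
            continue
        yield rec


def backup(records: List[Record], raw: bool) -> List[Record]:
    """Copy a log. Raw mode keeps every record and its original index;
    otherwise the surviving records are renumbered without gaps."""
    if raw:
        return [Record(r.index, r.payload, r.canceled)
                for r in process(records, raw=True)]
    return [Record(i, r.payload)
            for i, r in enumerate(process(records), start=1)]
```

With records 1, 2 (canceled) and 3 in the source, the non-raw copy
holds indexes 1 and 2, so the record that was at index 3 moves to
index 2; the raw copy keeps indexes 1, 2 and 3.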

Lustre-change: https://review.whamcloud.com/46552
Lustre-commit: d8e2723b4e9409954846939026c599b0b1170e6e

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I811e23de8f4545bed36a44fedc2638d7418830dd
Reviewed-by: Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org>
Reviewed-by: DELBARY Gael <gael.delbary@cea.fr>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Mon, 17 Oct 2022 19:31:56 +0000 (12:31 -0700)]
LU-14098 obdclass: try to skip corrupted llog records

If a llog's header or record is found to be corrupted, then
ignore the remaining records and try with the next one.

Lustre-change: https://review.whamcloud.com/40754
Lustre-commit: 910eb97c1b43a44a9da2ae14c3b83e28ca6342fc

Fixes: 186f083722 ("LU-11924 osp: combine llog cancel operations")

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If47ec1fc1e2eaf64be7ba08d3aa9c2b93903c0cf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Yang Sheng [Mon, 17 Oct 2022 18:53:47 +0000 (11:53 -0700)]
LU-14044 llog: check fid after convert

We should convert from llog_id and then check the fid. Also
change the fid lookup to use an error check instead of LASSERT.

Lustre-change: https://review.whamcloud.com/40294
Lustre-commit: 6df76d3357fc5896b6902399ed7ce6d7c7835f58

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I673d8f16ff9e57a0482d6a3ec3ee3db33699f57f
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48892
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Fri, 14 Oct 2022 22:04:53 +0000 (16:04 -0600)]
EX-5909 tests: clean up in sanity-quota/16a

Clean up the test file in sanity-quota test_16a.  If test_16b is
run (DNE config) then the filesystem is reformatted, but in the
non-DNE config test_17 will fail if there is used quota.

Test-Parameters: trivial testlist=sanity-quota
Fixes: b54b7ce43929 ("LU-14472 quota: skip non-exist or inact tgt for lfs_quota")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id1faeab9df246d8010bf114582ab17a75846db68
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48899
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Fri, 14 Oct 2022 20:06:26 +0000 (14:06 -0600)]
RM-620 build: New tag 2.14.0-ddn64

New tag 2.14.0-ddn64

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia86edfc375e1dda7205db1a32c8c1933153a3e92

Hongchao Zhang [Fri, 22 Jul 2022 15:02:24 +0000 (23:02 +0800)]
LU-15738 test: check lfsck status before starting

If LFSCK is already running when "lfsck_start" is called
to start it, the test shouldn't fail on starting LFSCK.

Lustre-change: https://review.whamcloud.com/48018/
Lustre-commit: 29aaf679afac89359e1b116b8de0480f24b4e8ac

Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I266d9e2b9c5f37eb9e08b489fab428268b90d895
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Mon, 19 Sep 2022 16:00:15 +0000 (19:00 +0300)]
EX-5964 lamigo: disable idle disconnects

Disable idle disconnects on the connections lamigo uses
locally, to avoid storms of reconnects.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I3bc2742853e9636e38fbd8f7c2f238b3af55e0ba
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48840
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Zhuravlev [Fri, 6 Aug 2021 06:34:31 +0000 (09:34 +0300)]
EX-3142 tests: changelog processing verification

Add an extra counter to lamigo stats to catch gaps in changelog
processing. Add a new test (hot-pools/60) to verify that no
gaps happen (i.e. lamigo gets all changelog records) and
that the changelog is purged properly.

Test-Parameters: trivial testlist=hot-pools mdscount=2 mdtcount=4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I34d9d6f6f7f5766d945df43ae7d43dab7c70cef1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
John L. Hammond [Wed, 8 Jun 2022 02:15:39 +0000 (19:15 -0700)]
LU-13578 test: sleep longer in sanity test_39

In sanity test_39r(), sleep for 2 * atime_diff rather than atime_diff + 1.

Lustre-change: https://review.whamcloud.com/47346
Lustre-commit: be2525ffddb4bf55fde77e97b00d1c349119daed

Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=50
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ied508e12c848f6935d2317fb86bddc5341a6156e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48831
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Andriy Skulysh [Fri, 5 Nov 2021 10:55:08 +0000 (12:55 +0200)]
LU-15472 ldlm: optimize flock reprocess

Resource reprocessing on flock unlock can be done once,
after all pending unlock requests.
This reduces spinlock contention.
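
The saving comes from calling the expensive step once per batch rather
than once per unlock. The sketch below models only that counting
argument; `unlock_one_by_one()`, `unlock_batched()` and the `reprocess`
callback are hypothetical names, not ldlm functions.

```python
from typing import Callable, List, Set


def unlock_one_by_one(granted: Set[int], pending: List[int],
                      reprocess: Callable[[], None]) -> None:
    """Reprocess the resource after every single unlock."""
    for lock_id in pending:
        granted.discard(lock_id)
        reprocess()


def unlock_batched(granted: Set[int], pending: List[int],
                   reprocess: Callable[[], None]) -> None:
    """Apply all pending unlocks first, then reprocess the resource once."""
    for lock_id in pending:
        granted.discard(lock_id)
    if pending:
        reprocess()
```

For N pending unlocks the first variant runs the reprocess step N
times and the second runs it once, with the same locks released in
both cases.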

Lustre-change: https://review.whamcloud.com/46257
Lustre-commit: 42f377db4a24cefa7a041fcd3106dd58771eb319

Change-Id: I2809070f27fe3af7e1fc34e2b4b22603931f3dff
HPE-bug-id: LUS-10471, LUS-10909
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48818
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Etienne AUJAMES [Mon, 2 May 2022 12:27:17 +0000 (14:27 +0200)]
LU-15132 mdc: Use early cancels for hsm requests

HSM RELEASE and RESTORE requests take EX layout lock on the MDT side.
So the client can use early cancel for its local lock on the resource
to limit the contention (mdt side).

This patch does not pack ldlm request inside the hsm request because
the field (RMF_DLM_REQ) does not exist in the request. Adding this
field inside the request would break compatibility with _old_ servers.

Lustre-change: https://review.whamcloud.com/47181
Lustre-commit: 60d2a4b0efa4a944b558bd9b63b6334f7e70419b

Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I30a57b4855c28eef9c55a9645d3b6c491f962b13
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48652
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Thu, 8 Sep 2022 22:27:12 +0000 (15:27 -0700)]
LU-15885 o2iblnd: fix handling of RDMA_CM_EVENT_UNREACHABLE

RDMA_CM_EVENT_UNREACHABLE may be received not only when a connection
is being established, but also when it is being closed. Fix handling
of this event accordingly.

Lustre-change: https://review.whamcloud.com/48492
Lustre-commit: 3925b1669d519e6c038ecce1287c1ced3de623d3

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I79428188c159b2d80d36326589b2977db065d4a7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48827
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14428 libcfs: discard cfs_trace_copyin_string()
Alex Zhuravlev [Wed, 12 Oct 2022 06:35:42 +0000 (09:35 +0300)]
LU-14428 libcfs: discard cfs_trace_copyin_string()

Instead of cfs_trace_copyin_string(), use memdup_user_nul().
This combines the allocation with the copyin, and nul-terminates.

The resulting code is a lot simpler.

Lustre-change: https://review.whamcloud.com/41490
Lustre-commit: 67af976c806994cec27414d24b43f6519d72c240

LU-14788 lnet: check memdup_user_nul using IS_ERR

Crash in __proc_lnet_portal_rotor. memdup_user_nul returns an ERR_PTR
on error, not a NULL pointer. IS_ERR and PTR_ERR functions have to be
used to check and return the correct error code. The fix has been
applied in other locations having the wrong check.
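
The difference between the wrong and the right check can be sketched in
userspace C. This is only an illustration: err_ptr(), is_err() and
ptr_err() are hypothetical stand-ins that mimic the kernel's ERR_PTR(),
IS_ERR() and PTR_ERR() encoding, and dup_nul() is an assumed substitute
for memdup_user_nul(), not the kernel function itself.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins (assumptions) for ERR_PTR/PTR_ERR/IS_ERR. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical substitute for memdup_user_nul(): copy len bytes and
 * append a NUL. Errors come back as an encoded pointer, never NULL. */
static char *dup_nul(const char *src, size_t len)
{
	char *buf;

	if (src == NULL)
		return err_ptr(-EFAULT);	/* simulate a bad user pointer */
	buf = malloc(len + 1);
	if (buf == NULL)
		return err_ptr(-ENOMEM);
	memcpy(buf, src, len);
	buf[len] = '\0';
	return buf;
}

/* Returns the parsed number, or a negative errno on failure. */
static long parse_value(const char *src, size_t len)
{
	char *buf = dup_nul(src, len);
	long val;

	if (is_err(buf))	/* a "buf == NULL" test would not catch this */
		return ptr_err(buf);
	val = strtol(buf, NULL, 10);
	free(buf);
	return val;
}
```

With a NULL check instead of is_err(), the failing case would fall
through and dereference the encoded error value, which is the kind of
crash described above.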

Lustre-change: https://review.whamcloud.com/44091
Lustre-commit: 449d046e55a42cc4d1c4ab0217551cded1864bc4

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I089c5da96b59ec62d177aea2f3d170bf751c6fec
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48835
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13974 tests: update log corruption
Alexander Boyko [Tue, 24 Nov 2020 09:05:36 +0000 (04:05 -0500)]
LU-13974 tests: update log corruption

The test case reproduces a missing object for a sub transaction
during a set xattr operation.
The first setattr got -2 (-ENOENT); the second had already started
but had not yet called llog_add. In this case the llog osp object is
stale after top_trans_start, so the declaration phase cannot refresh
the llogs. At llog_osd_write_rec the osp object changes from stale to
valid (dt_attr_get), but the llog handle and llog header are invalid.
A new record would be added to the update log with a wrong index.
In that case processing of the update log fails with
fs1-MDT0001-osp-MDT0003: [0x2:0x400024d0:0x2] Invalid record: index
112926 but expected 112925
lod_sub_recovery_thread()) fs1-MDT0001-osp-MDT0003 get update log
failed: rc = -34
Recovery aborted, and clients are evicted.

Lustre-change: https://review.whamcloud.com/40743
Lustre-commit: 562837124ec7bffeba7edb4b4b899bc271833374

HPE-bug-id: LUS-9030
Test-Parameters: testlist=sanity  envdefinitions=ONLY="427"
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I6a47fed1bc01f4be62216d1d0787adc413df0cf5
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48832
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-8621 utils: cmd help to stdout or short cmd error
Aleksei Alyaev [Thu, 23 Dec 2021 08:48:22 +0000 (11:48 +0300)]
LU-8621 utils: cmd help to stdout or short cmd error

- Changed to print command help to stdout
- Changed to output short error message for an unrecognized command

Lustre-change: https://review.whamcloud.com/47162/
Lustre-commit: bc69a8d058f5bcdb75e062df57a6ccd23243d1e0

Test-Parameters: trivial
Signed-off-by: Aleksei Alyaev <aalyaev@ddn.com>
Change-Id: I67616ddb576e3347a2da130b3a731a6bf8730185
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48851
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16233 build: Add always target for SUSE15 SP3 LTSS
Shaun Tancheff [Thu, 13 Oct 2022 19:19:47 +0000 (12:19 -0700)]
LU-16233 build: Add always target for SUSE15 SP3 LTSS

SUSE 15 SP3 LTSS kernel version 5.3.18-150300.59.93
(and later) breaks lustre build tests which expect
conftest.i to be generated.

Lustre-change: https://review.whamcloud.com/48833
Lustre-commit: TBD (from 274b34c4d3a20937ebb17d139dbde0eaaed503b2)

HPE-bug-id: LUS-11286
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If23e9b31b537878a43075ffff62a99906f47fd9a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16174 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.21.2]
Jian Yu [Wed, 28 Sep 2022 07:00:22 +0000 (00:00 -0700)]
LU-16174 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.21.2]

Update SLES15 SP4 kernel to 5.14.21-150400.24.21.2 for Lustre client.

Lustre-change: https://review.whamcloud.com/48604
Lustre-commit: TBD (from 896fd88c35b6685a586c1279c83c739b48cbe846)

Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 101j 244a" testlist=sanity

Change-Id: Ia68e1c960c79f40d0f725b0f440cd562b820a19f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48689
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16177 kernel: kernel update RHEL9.0 [5.14.0-70.26.1.el9_0]
Jian Yu [Wed, 28 Sep 2022 06:46:30 +0000 (23:46 -0700)]
LU-16177 kernel: kernel update RHEL9.0 [5.14.0-70.26.1.el9_0]

Update RHEL9.0 kernel to 5.14.0-70.26.1.el9_0 for Lustre client.

Lustre-change: https://review.whamcloud.com/48676
Lustre-commit: TBD (from 9951a56c26b1ce6639cd2db350fdf6b81b3b4707)

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="101j 130 244a" testlist=sanity

Change-Id: I9da2ccdf419d6490fdba80199eda69f4f19361be
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6130 lipe: s_volume_name not NUL terminated
Alexandre Ioffe [Wed, 12 Oct 2022 00:53:20 +0000 (17:53 -0700)]
EX-6130 lipe: s_volume_name not NUL terminated

The s_volume_name field stores a string, but the field may have no
terminating NUL if the string length equals the size of the field.
Therefore on some target systems the definition of
struct ext2_super_block s_volume_name in
/usr/include/ext2fs/ext2_fs.h may have the
attribute "nonstring". In that case it conflicts with calls
which require a NUL-terminated string.
The fix replaces NUL-terminated string calls with calls that take a
limited string size (e.g. strlen() -> strnlen()).
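
A minimal sketch of the bounded-length approach follows. The struct
demo_super_block, the VOLUME_NAME_SIZE constant and both helper
functions are hypothetical and only illustrate the technique; they are
not taken from the lipe sources or from ext2_fs.h.

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <stddef.h>
#include <string.h>

#define VOLUME_NAME_SIZE 16	/* demo field size (assumption) */

/* Hypothetical structure with a fixed-size label that may lack a NUL. */
struct demo_super_block {
	char volume_name[VOLUME_NAME_SIZE];
};

/* Length of the label, never reading past the end of the field. */
static size_t volume_name_len(const struct demo_super_block *sb)
{
	return strnlen(sb->volume_name, sizeof(sb->volume_name));
}

/* Copy the label into dst and always NUL-terminate the result. */
static void volume_name_copy(const struct demo_super_block *sb,
			     char *dst, size_t dst_size)
{
	size_t len = volume_name_len(sb);

	if (dst_size == 0)
		return;
	if (len >= dst_size)
		len = dst_size - 1;
	memcpy(dst, sb->volume_name, len);
	dst[len] = '\0';
}
```

A destination buffer of VOLUME_NAME_SIZE + 1 bytes is enough to hold a
label that fills the whole field plus the terminator.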

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ieb1921a289328a8f9bfae9bb658c6c74f8ec43b7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48829
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn63
Andreas Dilger [Tue, 11 Oct 2022 08:04:59 +0000 (02:04 -0600)]
RM-620 build: New tag 2.14.0-ddn63

New tag 2.14.0-ddn63

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5e4b8d3d863cbd504fc7470b413d2083bb15e371

2 years agoLU-15481 llog: Add LLOG_SKIP_PLAIN to skip llog plain
Etienne AUJAMES [Wed, 5 Oct 2022 07:10:05 +0000 (00:10 -0700)]
LU-15481 llog: Add LLOG_SKIP_PLAIN to skip llog plain

Add the catalog callback return LLOG_SKIP_PLAIN to conditionally skip
an entire llog plain.

This could speed up catalog processing for specific usages where a
record needs to be accessed in the "middle" of the catalog. This could
be useful for changelogs with several users or for HSM.

This patch modifies chlg_read_cat_process_cb() to use LLOG_SKIP_PLAIN.
The main idea came from: d813c75d ("LU-14688 mdt: changelog purge
deletes plain llog")
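
The idea of skipping a whole plain llog from the catalog callback can
be sketched as below. Everything here is hypothetical (the demo_ names,
the index-range struct and the comparison used to decide on a skip);
it assumes each plain llog covers a contiguous index range and is not
the actual chlg_read_cat_process_cb() logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback return codes. */
enum demo_cb_rc { DEMO_PROCESS_PLAIN = 0, DEMO_SKIP_PLAIN = 1 };

/* Hypothetical summary of one plain llog: the record indexes it holds. */
struct demo_plain_llog {
	uint64_t first_index;
	uint64_t last_index;
};

/* Catalog callback: skip a plain llog that ends before start_index. */
static enum demo_cb_rc demo_cat_cb(const struct demo_plain_llog *plain,
				   uint64_t start_index)
{
	return plain->last_index < start_index ?
	       DEMO_SKIP_PLAIN : DEMO_PROCESS_PLAIN;
}

/* Walk the catalog; return how many records had to be iterated. */
static uint64_t demo_cat_process(const struct demo_plain_llog *cat,
				 size_t nr_plain, uint64_t start_index)
{
	uint64_t visited = 0;

	for (size_t i = 0; i < nr_plain; i++) {
		if (demo_cat_cb(&cat[i], start_index) == DEMO_SKIP_PLAIN)
			continue;	/* the whole plain llog is skipped */
		visited += cat[i].last_index - cat[i].first_index + 1;
	}
	return visited;
}
```

When the requested index is near the end of the catalog, only the last
plain llog is iterated instead of every record before it.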

**Performance test:**

* Environment:
2474195 changelog records stored on mds0 (40 llog plain):
mds# lctl get_param -n mdd.lustrefs-MDT0000.changelog_users
current index: 2474195
ID    index (idle seconds)
cl1   0 (3509)

* Test
Access to records at the end of the catalog (offset: 2474194):
client# time lfs changelog lustrefs-MDT0000 2474194 >/dev/null

* Results
- with the patch:  real    0m0.592s
- without the patch: real    0m17.835s (x30)

Lustre-change: https://review.whamcloud.com/46310
Lustre-commit: aa22a6826ee521ab14994a4533b0dbffb529aab0

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I887d5bef1f3a6a31c46bc58959e0f508266c53d2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48774
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6033 lipe: Add note to developers for HP scripts
Gaurang Tapase [Thu, 29 Sep 2022 05:57:45 +0000 (23:57 -0600)]
EX-6033 lipe: Add note to developers for HP scripts

The stratagem-hp-* scripts have moved to the EMF repo as
they are tightly coupled with the EXA release because of
the HA configuration. They are kept in the lustre repo so that
hotfixes do not delete them.

Test-Parameters: trivial
Change-Id: I33eecaa4ed0c9342a83973bac313322a007d72d0
Signed-off-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48698
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5936 pcc: dont take UPDATE lock when set lustre.pin xattr
Qian Yingjin [Fri, 23 Sep 2022 09:34:09 +0000 (05:34 -0400)]
EX-5936 pcc: dont take UPDATE lock when set lustre.pin xattr

In this patch, we do not take the UPDATE lock when setting the
lustre.pin XATTR during the PCC pin command.
The reason is that it may revoke the combined UPDATE|LAYOUT lock
cached on the client namespace, and invalidate the layout and PCC
cache.

Since caching of the lustre.pin xattr in the client XATTR cache is
disabled, setting the lustre.pin XATTR without taking the UPDATE lock
bit does not cause a problem.

Add test case: sanity-pcc/204d.

Change-Id: I35a0e399294020efdb0e4710500e8f7b846c290f
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14599 osp: limit allocation at osp_sync_process_committed
Alexander Boyko [Wed, 5 Oct 2022 07:06:59 +0000 (00:06 -0700)]
LU-14599 osp: limit allocation at osp_sync_process_committed

Sometimes osp cancels a very large cookie list with 64K elements.
In this case osp_sync_process_committed() tries to allocate 64 pages
and uses vmalloc.
The fix limits the memory allocation size to 4 pages with kmalloc, and
reuses the buffer in a loop.
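
The bounded-buffer pattern can be sketched in userspace C as follows.
The names (cancel_in_chunks, cancel_fn, demo_stats, count_batch) are
hypothetical, malloc() stands in for kmalloc(), and the sketch assumes
4-byte cookies and 4 KiB pages, which would be consistent with the
64K-element, 64-page figure above.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES (4 * 4096)	/* four pages, assuming 4 KiB pages */

/* Hypothetical callback that cancels one batch of cookies. */
typedef int (*cancel_fn)(const uint32_t *batch, size_t count, void *arg);

/* Cancel "total" cookies through one small buffer reused in a loop,
 * instead of allocating a single array large enough for all of them. */
static int cancel_in_chunks(const uint32_t *cookies, size_t total,
			    cancel_fn cancel, void *arg)
{
	const size_t per_chunk = CHUNK_BYTES / sizeof(uint32_t);
	uint32_t *buf = malloc(CHUNK_BYTES);
	size_t done = 0;
	int rc = 0;

	if (buf == NULL)
		return -ENOMEM;
	while (done < total) {
		size_t n = total - done < per_chunk ? total - done : per_chunk;

		memcpy(buf, cookies + done, n * sizeof(*buf));
		rc = cancel(buf, n, arg);
		if (rc != 0)
			break;
		done += n;
	}
	free(buf);
	return rc;
}

/* Demo callback that only counts what it was given. */
struct demo_stats { size_t batches; size_t cookies; };

static int count_batch(const uint32_t *batch, size_t count, void *arg)
{
	struct demo_stats *st = arg;

	(void)batch;
	st->batches++;
	st->cookies += count;
	return 0;
}
```

Under these assumptions a 64K-element list is handled in 16 batches of
4096 cookies, and peak memory use stays at four pages.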

Lustre-change: https://review.whamcloud.com/43250
Lustre-commit: 9b692e2e7d105f4926649ea46007ac65b24c4b6d

HPE-bug-id: LUS-9815
Fixes: 6d7332102 ("LU-11924 osp: combine llog cancel operations")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ic875335a28f78494fdb3cbc4b0145e5a43831ee8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48773
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16135 lod: prohibit DoM pattern in plain layout
Mikhail Pershin [Mon, 5 Sep 2022 07:41:37 +0000 (10:41 +0300)]
LU-16135 lod: prohibit DoM pattern in plain layout

The DoM pattern can be set as the default directory plain layout by
older LFS versions, which miss the DoM component sanity checks when a
plain layout is used. Such a layout is not allowed and causes a
later crash when a file is created under that directory.

LFS can prevent this, but not in all Lustre versions,
so LOD should do the check as well.

Lustre-change: https://review.whamcloud.com/48433
Lustre-commit: a8272168e3888ec4ced18035182159a8ee56a51a

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic58fdda2ab3e63083128cb6cf949fcb43ccd2c02
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48514
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15132 hsm: Protect against parallel HSM restore requests
Etienne AUJAMES [Thu, 21 Oct 2021 14:31:01 +0000 (16:31 +0200)]
LU-15132 hsm: Protect against parallel HSM restore requests

Multiple parallel accesses (read/write) to the same released file
could cause multiple HSM restore requests to be sent.
On the MDT side, each restore request waits for the first one to complete
before grabbing the MDS_INODELOCK_LAYOUT LCK_EX and registering the
llog record.

This could cause several MDT threads to hang for the same restore
request sent in parallel. In the worst case, all MDT threads can
hang and the MDS is no longer able to handle requests.

This patch checks if an HSM restore handle exists before taking the
lock.

Lustre-change: https://review.whamcloud.com/45367
Lustre-commit: 66b3e74bccf1451d135b7f331459b6af1c06431b

Test-Parameters: testlist=sanity-hsm,sanity-hsm
Test-Parameters: testlist=sanity-hsm env=ONLY=12s,ONLY_REPEAT=50
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I9584edc2c7411aa41b2e318e55f57c117d1c3dfb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48650
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16062 ldlm: improve bl_timeout for prolong
Vitaly Fertman [Tue, 4 Oct 2022 17:30:08 +0000 (10:30 -0700)]
LU-16062 ldlm: improve bl_timeout for prolong

If there is a client RPC in hand, we can do a better job of
calculating the lock callback timeout, as the RPC carries the client's
own view of the RPC timeout. Let's use it.

Lustre-change: https://review.whamcloud.com/48094
Lustre-commit: 34b2246e4a6c8ce827c404cb4e52f7c6a0a1b90b

HPE-bug-id: LUS-8866, LUS-11074
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Ibd67d37c1073d0d3cb2e08b532c801af0de116fe
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48762
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14183 ldlm: wrong ldlm_add_waiting_lock usage
Vitaly Fertman [Tue, 4 Oct 2022 17:24:31 +0000 (10:24 -0700)]
LU-14183 ldlm: wrong ldlm_add_waiting_lock usage

exp_bl_lock_at originally accounted for the period from when the BLAST
was sent until the cancel RPC reached the server. LU-6032 started to
update l_blast_sent for expired locks which are still busy, i.e. locks
prolonged when the timeout expired. In fact, it is a good idea to
cover not the whole period but only until any involved RPC arrives: it
avoids excessively large lock callback timeouts, and the IO which
prolongs the lock is also able to restart the AT cycle by updating
l_blast_sent.

Unfortunately, the change seems to have been made incidentally, as the
main prolong code was not adjusted accordingly.

Lustre-change: https://review.whamcloud.com/40868
Lustre-commit: af07c9a79e263f940fea06a911803097b57b55f4

Fixes: 292aa42e08 ("LU-6032 ldlm: don't disable softirq for exp_rpc_lock")
HPE-bug-id: LUS-9278
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Idc598508fc13aa33ac9fce56f13310ca6fc819d4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48761
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15986 ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs()
Lei Feng [Thu, 30 Jun 2022 02:46:31 +0000 (10:46 +0800)]
LU-15986 ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs()

There is a race condition: on the server side, one thread has sent an
early reply and is deleting the reply message, while another is
searching for an existing request and prints some debug information
in _debug_req() if there is a duplicated request. They both operate on
req->rq_repmsg but it is not protected in ptlrpc_req_drop_rs().
So we protect it with req->rq_early_free_lock.

Lustre-change: https://review.whamcloud.com/47839
Lustre-commit: aaef545cff2dd958418ec9fb364d4bbe1408edb9

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ied55427ee15c3ef84bdd2d579844eba398dbf010
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47860
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16166 ptlrpc: lower the message level in no resend case
Yang Sheng [Mon, 19 Sep 2022 05:46:27 +0000 (13:46 +0800)]
LU-16166 ptlrpc: lower the message level in no resend case

Don't report the wrong generation as an error message in the
rq_no_resend case.

Lustre-change: https://review.whamcloud.com/48585
Lustre-commit: d13cca56a5ae2ad44d8083025e37263e408b8f62

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I534cadc916fcd1eb6840439b6507e646d0e5d974
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48807
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6069 ldiskfs: ext4-simple-blockalloc.patch small fixes
Artem Blagodarenko [Wed, 28 Sep 2022 14:28:11 +0000 (10:28 -0400)]
EX-6069 ldiskfs: ext4-simple-blockalloc.patch small fixes

The LU-14305 change requires some cleanup.
The MB_DEFAULT_MAX_CX_BYTES #defines are not used anymore
and should be removed.

Also, in the el8 version of the patch for b_es6_0,
the THRESHOLD_BLOCKS() function should explicitly take "sbi"
as a parameter.

Test-Parameters: trivial
Fixes: d5d5cfdde2 ("add persistent tuning for mb_c3_threshold")
Change-Id: Idcb93432fdfa7694b4e7cabbf46a0bf21a412f87
Signed-off-by: Artem Blagodarenko <ablagodarenko@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48714
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16184 o2iblnd: fix deadline for tx on peer queue
Serguei Smirnov [Fri, 23 Sep 2022 19:29:59 +0000 (12:29 -0700)]
LU-16184 o2iblnd: fix deadline for tx on peer queue

In o2iblnd, deadline is checked for txs on peer queue,
but not set prior to adding the tx to the queue. This
may cause the tx to be dropped unnecessarily with
"Timed out tx for ..." warning.

Fix it by setting the tx_deadline when adding tx to peer queue.
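
The ordering issue can be sketched with a simplified queue. The types
and functions below (demo_tx, demo_peer_queue, queue_tx,
queue_has_timed_out_tx) are hypothetical and use plain integer
timestamps; they are not the o2iblnd structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tx and peer queue, reduced to what the sketch needs. */
struct demo_tx {
	int64_t deadline_ns;
	struct demo_tx *next;
};

struct demo_peer_queue {
	struct demo_tx *head;
	struct demo_tx *tail;
};

/* Stamp the deadline before the tx becomes visible on the queue, so
 * the timeout checker never sees an unset (zero) deadline. */
static void queue_tx(struct demo_peer_queue *q, struct demo_tx *tx,
		     int64_t now_ns, int64_t timeout_ns)
{
	tx->deadline_ns = now_ns + timeout_ns;
	tx->next = NULL;
	if (q->tail != NULL)
		q->tail->next = tx;
	else
		q->head = tx;
	q->tail = tx;
}

/* Timeout checker: true if any queued tx is past its deadline. */
static bool queue_has_timed_out_tx(const struct demo_peer_queue *q,
				   int64_t now_ns)
{
	for (const struct demo_tx *tx = q->head; tx != NULL; tx = tx->next)
		if (now_ns >= tx->deadline_ns)
			return true;
	return false;
}
```

If the deadline were left at zero when the tx is queued, the checker
in this sketch would report it as timed out on its first pass.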

Lustre-change: https://review.whamcloud.com/48640
Lustre-commit: 4c89ee7d7b098c7f1e6566f49fa2940db577518d

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ie7cf5590b440b60f71527049953a64bb31d53578
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48641
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
2 years agoLU-16160 osc: take ldlm lock when queue sync pages
Bobi Jam [Thu, 15 Sep 2022 06:46:34 +0000 (14:46 +0800)]
LU-16160 osc: take ldlm lock when queue sync pages

osc_queue_sync_pages() adds an osc_extent to the osc_object's IO extent
list without taking ldlm locks, and then it calls
osc_io_unplug_async() to queue the IO work for the client.

This patch makes sync page queuing take the ldlm lock in the
osc_extent.

Lustre-change: https://review.whamcloud.com/48557
Lustre-commit: 67aca1fcc6bed20794832decdba590a758d67d8f

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idefa2981e62a2a6e10d8b8a7692c0337b61b9052
Reviewed-on: https://review.whamcloud.com/48597
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5932 lipe: stratagem-hp-config.sh has wrong MDTLIST
Alexandre Ioffe [Wed, 21 Sep 2022 19:15:03 +0000 (12:15 -0700)]
EX-5932 lipe: stratagem-hp-config.sh has wrong MDTLIST

stratagem-hp-config.sh doesn't pick up the proper MDTLIST
if snapshot agents are running. Fix MDTLIST, which is used
to configure lpurge.

Test-Parameters: trivial
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ic1d58d56f1acae140122d0b582410c140759e89e
Reviewed-on: https://review.whamcloud.com/48619
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16154 obdclass: free inst_name correctly
Emoly Liu [Thu, 15 Sep 2022 01:42:47 +0000 (09:42 +0800)]
LU-16154 obdclass: free inst_name correctly

In function class_config_llog_handler(), inst_name should be freed
correctly before break.

Lustre-change: https://review.whamcloud.com/48542
Lustre-commit: e7f17c5e0c95dba3b80e192e4ca3628cc42e64b9

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I6adc0ed62c3c637237834b799f25666d0e7e1ecb
Reviewed-on: https://review.whamcloud.com/48670
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16050 build: replace ofed_info with dpkg/rpm
Jian Yu [Mon, 26 Sep 2022 18:22:56 +0000 (11:22 -0700)]
LU-16050 build: replace ofed_info with dpkg/rpm

After installing MLNX_OFED by running mlnxofedinstall command,
mlnx-ofed-kernel-modules package is not listed by ofed_info,
which causes Lustre configure to fail as follows:

checking whether to use Compat RDMA... /usr/bin/ofed_info
dpkg-query: error: --listfiles needs at least one package name argument

This patch fixes the above issue by replacing ofed_info with
"dpkg -l" and "rpm -qa" commands to find OFED package.

Lustre-change: https://review.whamcloud.com/48047
Lustre-commit: 3a7930e63c15b0fbe51ac73db81a1186939115bb

Test-Parameters: trivial
Fixes: ec03c9628cae ("LU-15417 build: find the new path for MOFED 5.5")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ia3c2d6bf10e147ca2761221741eff6f93008556c
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48662
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6014 tests: Revert "EX-4093 tests: hot-pools don't recreate pools"
Jian Yu [Wed, 28 Sep 2022 16:54:33 +0000 (09:54 -0700)]
EX-6014 tests: Revert "EX-4093 tests: hot-pools don't recreate pools"

This reverts commit 116cbacc52d8 to resolve the hot-pools
regression test failures.

After running sub-test 1, the OST pools were destroyed by
the following stack_trap in create_pool():

  stack_trap "destroy_test_pools $fsname" EXIT

If the pools are not recreated in the successive sub-tests,
then they will fail. We have to revert commit 116cbacc52d8
until we find a way to avoid triggering the stack_trap
between sub-tests.

Test-Parameters: trivial mdscount=2 mdtcount=4 \
testlist=parallel-scale-nfsv4,hot-pools

Fixes: 116cbacc52d8 ("EX-4093 tests: hot-pools don't recreate pools")
Change-Id: I464a1f9f380c55e70b78a0dd7e52723d5b0a298d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48690
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn62
Andreas Dilger [Fri, 23 Sep 2022 22:24:58 +0000 (16:24 -0600)]
RM-620 build: New tag 2.14.0-ddn62

New tag 2.14.0-ddn62

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I21b71b04905a70acbaada6d5a7fbab6c9184ca51

2 years agoRevert "EX-4141 lipe: lamigo should detect dead OST and restart ALR"
Andreas Dilger [Fri, 23 Sep 2022 19:36:53 +0000 (19:36 +0000)]
Revert "EX-4141 lipe: lamigo should detect dead OST and restart ALR"

This reverts commit 028bee14d2c6d8feb5eb418302f8751643e731c6 due to build error.

Change-Id: I6193f3e99192b618a3e6616524e28b230659fc0b
Reviewed-on: https://review.whamcloud.com/48639
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn61
Andreas Dilger [Fri, 23 Sep 2022 17:19:23 +0000 (11:19 -0600)]
RM-620 build: New tag 2.14.0-ddn61

New tag 2.14.0-ddn61

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I34c78bc6ce2fbac65e4e8b017cad1da05c78d53a

2 years agoLU-16183 tests: sanity-hsm/70 should detect python
Minh Diep [Thu, 15 Sep 2022 03:41:37 +0000 (20:41 -0700)]
LU-16183 tests: sanity-hsm/70 should detect python

Check for python2 and python3 explicitly, since the
generic python command does not exist in newer distros.

Test-Parameters: env=SLOW=yes,ENABLE_QUOTA=yes \
clientdistro=sles15sp3 testlist=sanity-hsm
Test-Parameters: env=SLOW=yes,ENABLE_QUOTA=yes \
clientdistro=el7.9 testlist=sanity-hsm
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Change-Id: I2251be461129310868868277bf9d46015545ffe2
Reviewed-on: https://review.whamcloud.com/48577
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-4141 lipe: lamigo should detect dead OST and restart ALR
Alexandre Ioffe [Tue, 29 Mar 2022 07:48:35 +0000 (00:48 -0700)]
EX-4141 lipe: lamigo should detect dead OST and restart ALR

Use a #keepalive message and ssh read with timeout
to detect that an OST is down and restart ALR.
Add stats for the ALR last seen message.
Duplicate ofd_access_log_reader from lustre/utils into
lipe/src/es_ofd_access_log_reader.
Use a common lamigo_hash.h for lamigo and
es_ofd_access_log_reader.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I26dc631a8663046821e049fc6e091108b2a62f87
Reviewed-on: https://review.whamcloud.com/46944
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John Hammond <jhammond@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 years agoLU-14962 lnet: Check for -ESHUTDOWN in lnet_parse
Chris Horn [Tue, 24 Aug 2021 16:16:17 +0000 (11:16 -0500)]
LU-14962 lnet: Check for -ESHUTDOWN in lnet_parse

The fix for LU-8106, http://review.whamcloud.com/19993, no longer
works because rc does not have the return value from
lnet_nid2peerni_locked(). Use PTR_ERR to get the return value and
restore the LU-8106 fix.

Lustre-change: https://review.whamcloud.com/44743
Lustre-commit: cce82630cbf2c7badbbdd16a8ca9c8c0065ded13

Test-Parameters: trivial
HPE-bug-id: LUS-10333
Fixes: fa8b4e6357 ("LU-7734 lnet: peer/peer_ni handling adjustments")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9cc2bc2d6e675d38cf06d99c524bdd95110bf0e9
Reviewed-on: https://review.whamcloud.com/48487
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15618 lnet: Return ESHUTDOWN in lnet_parse()
Chris Horn [Thu, 3 Mar 2022 07:12:32 +0000 (01:12 -0600)]
LU-15618 lnet: Return ESHUTDOWN in lnet_parse()

If the peer NI lookup in lnet_parse() fails with ESHUTDOWN then we
should return that value back to the LNDs so that they can treat the
failed call the same way as other lnet_parse() failures.

Returning zero results in at least one bug in socklnd where a
reference on a ksock_conn can be leaked which prevents socklnd from
shutting down.

Lustre-change: https://review.whamcloud.com/46711
Lustre-commit: 4fbd0705a3d25bbc85e953f81e697e5006b215ce

Fixes: 47b7b31978 ("LU-8106 lnet: Do not drop message when shutting down LNet")
Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-15794
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic403619c6dccf3921c46a674808c404adad7a30e
Reviewed-on: https://review.whamcloud.com/48485
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15616 lnet: ln_api_mutex deadlocks
Chris Horn [Mon, 7 Mar 2022 17:03:50 +0000 (11:03 -0600)]
LU-15616 lnet: ln_api_mutex deadlocks

LNetNIFini() acquires the ln_api_mutex and holds onto it throughout
various shutdown routines. Meanwhile, LND threads (via
lnet_nid2peerni_locked()) or the discovery thread (via
lnet_peer_data_present()) may need to acquire this mutex in order to
progress.

Address these potential deadlocks by setting the_lnet.ln_state to
LNET_STATE_STOPPING earlier in LNetNIFini(), and release the mutex
prior to any call into LND module or before any wait.

LNetNIInit() is modified to return -ESHUTDOWN if it finds that there
is a concurrent shutdown in progress.

Lustre-change: https://review.whamcloud.com/46727
Lustre-commit: 22de0bd145b649768b16dd42559d326af3c13200

Test-Parameters: trivial
HPE-bug-id: LUS-10681
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia8b28cc95ff71e66a0f99aed4f2c22ec9d44ce1e
Reviewed-on: https://review.whamcloud.com/48384
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy
Chris Horn [Fri, 11 Dec 2020 18:04:32 +0000 (12:04 -0600)]
LU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy

The MR design dictates that the peer nets and peer NIs are ordered
such that the peer net and peer NI for a peer's primary NID appears
first, followed by other peer NIs in the primary NID's peer net,
followed by other peer nets/NIs. This ordering is broken and it can
result in tripping an assertion if the primary NID of a peer is
deleted. Modify lnet_peer_attach_peer_ni() to check whether the
NI being attached is the peer's primary, and place it, and its
associated peer net, appropriately.

Modify lnet_peer_set_primary_nid() so that it updates the
lp_primary_nid before calling lnet_peer_add_nid() so that
lnet_peer_attach_peer_ni() can detect the situation where the
primary is changing and act appropriately.

Finally, modify lnet_peer_merge_data() to enforce the hierarchy
after it has finished merging the contents of the ping buffer. This
ensures we maintain the correct hierarchy in certain edge cases where
we've needed to reconcile two peers. e.g. if a peer adds a new
interface, the discovery push may arrive from that new interface
which will result in a second peer object being created which will
need to be reconciled with the original peer object.

Lustre-change: https://review.whamcloud.com/40985
Lustre-commit: 9eb9474c41c823c70f34e6bb102a8861ca21a3d1

HPE-bug-id: LUS-9630
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8397a24ba1ba0bba33846e7e97b8d60a8f26a1be
Reviewed-on: https://review.whamcloud.com/48508
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15538 lnet: DLC sets map_on_demand incorrectly
Chris Horn [Sat, 5 Feb 2022 23:15:30 +0000 (23:15 +0000)]
LU-15538 lnet: DLC sets map_on_demand incorrectly

When any NET or LND tunable is specified via CLI or yaml, then the
whole tunables struct gets memset to 0, or in the case of yaml config,
0 gets assigned to any tunable that isn't specified in the yaml. This
causes a problem for map_on_demand because 0 is a valid value for that
parameter, and ko2iblnd cannot know whether the user specified that 0
should be used or if DLC is specifying that the parameter was unset.

Rather than setting this parameter to 0 in the LND tunables struct,
have DLC set it to UINT_MAX to indicate that ko2iblnd should use the
value of the kernel module parameter.
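
The sentinel approach can be sketched as below. The macro name
MAP_ON_DEMAND_UNSET and the helper effective_map_on_demand() are
hypothetical; only the use of UINT_MAX as the "not set" marker comes
from the description above.

```c
#include <limits.h>

/* Hypothetical name for the "not specified by the user" marker. */
#define MAP_ON_DEMAND_UNSET UINT_MAX

/* Pick the value to use: an explicit tunable (including 0) wins,
 * and the sentinel falls back to the kernel module parameter. */
static unsigned int effective_map_on_demand(unsigned int tunable,
					     unsigned int module_param)
{
	return tunable == MAP_ON_DEMAND_UNSET ? module_param : tunable;
}
```

Because 0 is no longer overloaded to mean "unset", a user who asks for
0 explicitly gets 0 rather than the module default.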

Lustre-change: https://review.whamcloud.com/46492
Lustre-commit: 896f4a082b93453f5e7168f685faff4fba594ff3

Test-Parameters: trivial
HPE-bug-id: LUS-10740
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I303e64d4d402ba61b5ae3e3910873f192a4a2845
Reviewed-on: https://review.whamcloud.com/48491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-4093 tests: hot-pools don't recreate pools
Alex Zhuravlev [Wed, 21 Sep 2022 00:40:46 +0000 (17:40 -0700)]
EX-4093 tests: hot-pools don't recreate pools

The test can save some time by skipping pool recreation in every
subtest.

before: 1371 seconds
after:  1058 seconds

Test-Parameters: trivial testlist=hot-pools

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9304e29b6fc59dd68626b44844dc81500009a80f
Reviewed-on: https://review.whamcloud.com/48614
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5824 test: hot-pools test_57: data copy failed: mirror failed
Alexandre Ioffe [Thu, 8 Sep 2022 08:37:31 +0000 (01:37 -0700)]
EX-5824 test: hot-pools test_57: data copy failed: mirror failed

Add debug prints in hot-pools test_57

Test-Parameters: trivial env=FAIL_ON_ERROR=false,ONLY=56-57 testlist=hot-pools

Change-Id: I863b580f5483c14c24c6f79ebdddbc782b65e945
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-on: https://review.whamcloud.com/48477
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 years agoLU-14992 tests: sanity/replay-vbr mkdir on MDT0
James Nunez [Mon, 13 Sep 2021 16:35:30 +0000 (10:35 -0600)]
LU-14992 tests: sanity/replay-vbr mkdir on MDT0

Replace mkdir with mkdir_on_mdt0() for sanity test 133a
and replay-vbr test 7a.  These tests expect the newly
created directory to be on MDT0.

Lustre-change: https://review.whamcloud.com/44902/
Lustre-commit: TBD

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: env=SLOW=yes mdscount=2 mdtcount=4 testlist=replay-vbr
Change-Id: Icea2923a8d8d3a3aa0ddf0401f0a025480b2f6f0
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48606
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13358 libcfs: add timeout to cfs_race() to fix race
Alex Zhuravlev [Tue, 30 Mar 2021 05:57:14 +0000 (08:57 +0300)]
LU-13358 libcfs: add timeout to cfs_race() to fix race

There is no guarantee that the branches in cfs_race() are executed
in strict order, so it is possible that the second branch (with
cfs_race_state=1) is executed before the first branch, and then another
thread executing the first branch gets stuck.

This construction is used for testing only, so as a workaround
a timeout is sufficient.
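
The ordering problem and the timeout workaround can be modelled roughly
as below. This is a simplified Python sketch, not the libcfs code; the
class and method names are hypothetical, and the assumption that the
sleeping branch resets the state before waiting is part of the model.

```python
import threading


class RacePoint:
    """Simplified model of a two-branch rendezvous used only in tests."""

    def __init__(self):
        self._cond = threading.Condition()
        self._state = 0

    def first_branch(self, timeout):
        """Sleep until the second branch signals or the timeout expires.

        Returns True if woken by the second branch, False on timeout.
        """
        with self._cond:
            # Assumed behaviour: the sleeper resets the state before waiting,
            # so an earlier wake-up from the second branch is lost.
            self._state = 0
            return self._cond.wait_for(lambda: self._state != 0,
                                       timeout=timeout)

    def second_branch(self):
        """Mark the race as hit and wake any sleeper."""
        with self._cond:
            self._state = 1
            self._cond.notify_all()


race = RacePoint()
race.second_branch()             # wake-up arrives before anyone sleeps
woken = race.first_branch(0.05)  # without a timeout this wait never ends
print("woken" if woken else "timed out")
```

With the timeout, the out-of-order case ends in a bounded wait instead of
a stuck thread, which is acceptable for a test-only construct.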

Lustre-change: https://review.whamcloud.com/43161
Lustre-commit: 2d2d381f35ee004319a20f5d2d8e70d13480d6c7

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie1cc0accedb3e1a198d4b17d1ab00ce298c560f2
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48553
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14875 import: fix bad CPT read
Cyril Bordage [Thu, 17 Feb 2022 11:49:16 +0000 (12:49 +0100)]
LU-14875 import: fix bad CPT read

When importing, CPT was read from the tunables field, but in the YAML
file generated during export it is in fact at the same level as
tunables.
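
The mismatch can be illustrated with a plain dictionary standing in for
one parsed NI entry. The key names and values below are illustrative
assumptions, not copied from real export output, and both helper
functions are hypothetical.

```python
# Hypothetical, simplified shape of one exported NI entry.
ni_entry = {
    "nid": "192.168.0.1@tcp",
    "tunables": {"peer_timeout": 180},
    "CPT": "[0,1]",  # sibling of "tunables", not nested inside it
}


def read_cpt_nested(ni):
    """Old behaviour in this model: look for CPT inside 'tunables'."""
    return ni.get("tunables", {}).get("CPT")


def read_cpt_sibling(ni):
    """Fixed behaviour in this model: look for CPT next to 'tunables'."""
    return ni.get("CPT")


print(read_cpt_nested(ni_entry))   # CPT is not found under tunables
print(read_cpt_sibling(ni_entry))  # CPT is found at the NI level
```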

Lustre-change: https://review.whamcloud.com/46541
Lustre-commit: 9ad5c43f4a53f8679cfa1a60f8161b08d3dcfa66

Test-parameters: trivial testlist=sanity-lnet

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: Iea7b6189ad1a25b95ae6416d75ee2cbe4dca2fbf
Reviewed-on: https://review.whamcloud.com/48490
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5798 tests: add a version check to conf-sanity.sh test_133
Emoly Liu [Fri, 9 Sep 2022 10:18:24 +0000 (18:18 +0800)]
EX-5798 tests: add a version check to conf-sanity.sh test_133

The patch at https://review.whamcloud.com/47334 has been ported
to b_es6_0 as of 2.14.0-ddn46, so a version check is added to
conf-sanity.sh test_133 to avoid interop failures.

Test-Parameters: trivial testlist=conf-sanity serverversion=2.14.0-ddn23

Change-Id: I4bfc2986abddfd3a5a606f5586a29311582fca42
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16131 build: Do not depend on libmount during --enable-dist
Shaun Tancheff [Wed, 7 Sep 2022 04:35:51 +0000 (21:35 -0700)]
LU-16131 build: Do not depend on libmount during --enable-dist

Defer the libmount requirement when using --enable-dist to
generate the lustre-src.rpm.

This allows mock and/or yum build-deps to resolve
dependencies and pick up the libmount requirement without changing
the existing minimal build.

Lustre-change: https://review.whamcloud.com/48407
Lustre-commit: 819c8b169325045ae8bac9c4f38a58c75e22d099

Test-Parameters: trivial
HPE-bug-id: LUS-11091
Fixes: f21b944127 ("LU-15940 build: add a required dependency for libmount")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I20a7a097f9b651b6ea5519f79efda6c96b6f2199
Reviewed-on: https://review.whamcloud.com/48448
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16085 llite: fix stat attributes_mask
Sebastien Buisson [Fri, 12 Aug 2022 07:59:02 +0000 (09:59 +0200)]
LU-16085 llite: fix stat attributes_mask

Fix stat attributes_mask to return STATX_ATTR_ENCRYPTED whenever it is
possible. Also fix sanityn test_106c to expect at least the 0x30 flag
for attributes_mask.
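
An "at least these bits" check on attributes_mask could look like the
sketch below. The constants are the statx attribute bits from the Linux
UAPI headers, where 0x30 corresponds to immutable plus append-only; the
helper name is hypothetical and this is not the sanityn test code.

```python
STATX_ATTR_IMMUTABLE = 0x10
STATX_ATTR_APPEND = 0x20
STATX_ATTR_ENCRYPTED = 0x800


def mask_has_at_least(mask, required):
    """True if every bit in 'required' is set in 'mask'."""
    return (mask & required) == required


required = STATX_ATTR_IMMUTABLE | STATX_ATTR_APPEND  # 0x30

# A mask that also advertises encryption still satisfies the check.
print(mask_has_at_least(required | STATX_ATTR_ENCRYPTED, required))
# A mask missing the immutable bit does not.
print(mask_has_at_least(STATX_ATTR_APPEND | STATX_ATTR_ENCRYPTED, required))
```

Testing for a subset of bits rather than an exact value keeps the check
valid whether or not STATX_ATTR_ENCRYPTED is reported.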

Lustre-change: https://review.whamcloud.com/48208
Lustre-commit: 0e48653c27eacad29dbff1589da771ad4f5d1014
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>

LU-16085 tests: fix sanityn test_106c

Fix sanityn test_106c after modification introduced when fixing
stat attributes_mask.

Lustre-change: https://review.whamcloud.com/48435
Lustre-commit: b843e8f89fe9b697ceec4657dde445aa60c200d0

Test-Parameters: trivial testlist=sanityn env=ONLY=106c
Fixes: 0e48653c27 ("LU-16085 llite: fix stat attributes_mask")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Icd16beff058c42d77e9b04ad1a287ec2ac04dfed
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48520
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16052 llog: handle -EBADR for catalog processing
Mikhail Pershin [Fri, 29 Jul 2022 08:24:15 +0000 (11:24 +0300)]
LU-16052 llog: handle -EBADR for catalog processing

Llog catalog processing may retry reading the last llog block
to check for new records. That read may return -EBADR, which
should be treated as valid. Previously -EIO was returned in
all cases.

Run conf-sanity test_106 several times as a specific test.

Lustre-change: https://review.whamcloud.com/48070
Lustre-commit: e260f751f2a21fa126eeb4bc9e94250ba3e815f1

Test-Parameters: testlist=conf-sanity env=ONLY=106,SLOW=yes,ONLY_REPEAT=10
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I30e04ba2c91c8bdce72c95675a1209639e9f0570
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/48540
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16084 tests: fix lustre-patched filefrag check
Andreas Dilger [Wed, 10 Aug 2022 18:27:56 +0000 (12:27 -0600)]
LU-16084 tests: fix lustre-patched filefrag check

Fix sanity test_130b thru test_130g to check for "filefrag -l"
instead of "filefrag -e", since the "-e" option has been in
upstream e2fsprogs since commit v1.42.6-50-g2508eaa7.  The "-l"
option (logical extent ordering) is really what is needed to
handle Lustre-striped files anyway.

While there, fix the code style in these subtests:
- use "local" and lower-case names for local variables
- use $(...) for subshells
- use (( ... )) for numeric comparisons
- use preferred "check || action" style checks
- use "skip_env" for environment configuration checks (e2fsprogs)
- use "skip" for test-related checks that can't be "fixed"
- use pre-defined $ost1_FSTYPE for checking OST filesystem type

Lustre-change: https://review.whamcloud.com/48188
Lustre-commit: fef1db004c4230e1051f9266f34a658501bf5d03

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8eb7f17a9532796ab0274247194dd52cbc8a141c
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48555
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16082 ldiskfs: old-style EA inode fix for el8.5/el8.6
Andreas Dilger [Tue, 20 Sep 2022 18:58:35 +0000 (11:58 -0700)]
LU-16082 ldiskfs: old-style EA inode fix for el8.5/el8.6

Add the rhel8/ext4-old_ea_inodes_handling_fix.patch to the ldiskfs
series for el8.5 and el8.6 kernels.

Lustre-change: https://review.whamcloud.com/48496
Lustre-commit: ba9845274c8ea5c55f57b7fa0e839f18d76031ea

Test-Parameters: trivial testlist=sanity clientdistro=el8.6 serverdistro=el8.6
Fixes: 76c3fa96dc30 ("LU-16082 ldiskfs: old-style EA inode handling fix")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifb66a0b7d78e5153d7897bee45fbf1d0e58fbc5c
Reviewed-on: https://review.whamcloud.com/48612
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-5978 scripts: remove zfsobj2fid
Jian Yu [Wed, 21 Sep 2022 20:28:43 +0000 (13:28 -0700)]
EX-5978 scripts: remove zfsobj2fid

The zfsobj2fid utility is not needed on EXA cluster.

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="101j 130 244a" testlist=sanity

Change-Id: I40993c7c4ddef3f389c002076f5c118a9f610758
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48621
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5975 build: check OS type before using dpkg
Jian Yu [Wed, 21 Sep 2022 07:41:33 +0000 (00:41 -0700)]
EX-5975 build: check OS type before using dpkg

Bright cluster manager by default installs dpkg
on its centos/rhel installation - presumably to
allow provisioning debian nodes in the cluster,
so dpkg is in the path and can't be removed.

This patch fixes LB_USES_DPKG to check OS type
before checking if dpkg is installed.
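
The ordering of the two checks can be sketched as below. How
LB_USES_DPKG actually determines the OS type is not stated here; reading
ID and ID_LIKE from os-release text is an assumption made for this
illustration, and both function names are hypothetical.

```python
def os_ids(os_release_text):
    """Collect the ID and ID_LIKE values from os-release style text."""
    ids = set()
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in ("ID", "ID_LIKE"):
            ids.update(value.strip().strip('"').split())
    return ids


def uses_dpkg(os_release_text, dpkg_in_path):
    """Check the OS family first; only then consider whether dpkg exists."""
    debian_family = bool(os_ids(os_release_text) & {"debian", "ubuntu"})
    return debian_family and dpkg_in_path


centos = 'ID="centos"\nID_LIKE="rhel fedora"\n'
ubuntu = "ID=ubuntu\nID_LIKE=debian\n"

print(uses_dpkg(centos, dpkg_in_path=True))  # dpkg present but not Debian
print(uses_dpkg(ubuntu, dpkg_in_path=True))  # Debian family with dpkg
```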

Test-Parameters: trivial clientdistro=el8.6
Test-Parameters: trivial clientdistro=ubuntu2204 env=SANITY_EXCEPT="130 244a"

Change-Id: Idc9f6edc91f9c89b40f259421b088287e08bfe9c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48616
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16090 build: Module.symvers lookup by flavor on SUSE
Shaun Tancheff [Wed, 14 Sep 2022 07:48:16 +0000 (00:48 -0700)]
LU-16090 build: Module.symvers lookup by flavor on SUSE

When multiple kernel flavors are found we need to select only
the Module.symvers for the flavor that is being built.

Lustre-change: https://review.whamcloud.com/48195
Lustre-commit: f3a9921ae4f9c3e48328f2c682e0c7e61221e0d3

HPE-bug-id: LUS-11149
Test-Parameters: trivial
Fixes: 1f4aaefe1aae ("LU-15962 build: add in-kernel Module.symvers to symbol path")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I1c9af91108534d3a67f816077756fded4cd0b653
Reviewed-on: https://review.whamcloud.com/48329
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16059 build: Installation of dkms server builds
Shaun Tancheff [Mon, 19 Sep 2022 19:11:55 +0000 (12:11 -0700)]
LU-16059 build: Installation of dkms server builds

The linux-zfs-dkms package is passing the wrong paths
for zfs [and spl] causing the dkms build to fail.

ZFS_VERSION is not parsed correctly from 'dkms status'.

The splver and zfsver check can match against the wrong
package(s).

lustre-zfs-dkms provides: kmod-lustre-osd-zfs, and
                          lustre-osd-zfs-mount
lustre-ldiskfs-dkms provides: kmod-lustre-osd-ldiskfs and
                              lustre-osd-ldiskfs-mount

In the case of multiple zfs versions installed, build lustre
osd against the highest version number.
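
Selecting the highest version needs a numeric comparison rather than a
string comparison. A minimal Python sketch, assuming purely dotted
numeric version strings (real package versions may carry suffixes that
this does not handle); the function names are hypothetical.

```python
def version_key(version):
    """Turn '2.1.11' into (2, 1, 11) so components compare numerically."""
    return tuple(int(part) for part in version.split("."))


def highest_version(versions):
    """Return the numerically highest version string."""
    return max(versions, key=version_key)


installed = ["2.1.5", "2.1.11", "0.8.6"]
print(highest_version(installed))  # numeric ordering
print(max(installed))              # plain string ordering, for contrast
```

A plain string comparison would rank 2.1.5 above 2.1.11, which is why
the components are converted to integers first.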

Lustre-change: https://review.whamcloud.com/48083
Lustre-commit: c3dc67b2c5bf1974d792b3701d932bd04c756bd8

HPE-bug-id: LUS-11113
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ic154ca045427bf26cb7e6a44b8c467675e987aad
Reviewed-on: https://review.whamcloud.com/48594
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16089 kernel: kernel update RHEL 7.9 [3.10.0-1160.76.1.el7]
Jian Yu [Mon, 22 Aug 2022 02:11:08 +0000 (19:11 -0700)]
LU-16089 kernel: kernel update RHEL 7.9 [3.10.0-1160.76.1.el7]

Update RHEL 7.9 kernel to 3.10.0-1160.76.1.el7.

Lustre-change: https://review.whamcloud.com/48202
Lustre-commit: 94955bbc6dc82b43fd77150b82834132bc56f565

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I97d087a5d5bb27996a5c0caf382c011928c651b4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48277
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16000 utils: align updatelog parameters in llog_reader
Etienne AUJAMES [Wed, 14 Sep 2022 20:17:24 +0000 (13:17 -0700)]
LU-16000 utils: align updatelog parameters in llog_reader

Parameters in update log records are aligned on 64 bits. llog_reader
does not align these parameters: if a parameter's size is not a multiple
of 8, the next parameter's size will be read incorrectly.
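
The effect of the missing alignment can be shown with a small Python
sketch. It models only the parameter payloads and ignores any
per-parameter header, so the layout is a simplification and the function
names are hypothetical.

```python
def round_up8(size):
    """Round a non-negative size up to the next multiple of 8 bytes."""
    return (size + 7) & ~7


def param_offsets(sizes, aligned=True):
    """Return the start offset of each parameter payload."""
    offsets = []
    pos = 0
    for size in sizes:
        offsets.append(pos)
        pos += round_up8(size) if aligned else size
    return offsets


sizes = [5, 8, 3]
print(param_offsets(sizes, aligned=True))   # offsets a writer would use
print(param_offsets(sizes, aligned=False))  # offsets an unaligned reader uses
```

As soon as one size is not a multiple of 8, the unaligned reader's
offsets drift away from the writer's, so every later parameter is read
from the wrong place.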

Lustre-change: https://review.whamcloud.com/47913
Lustre-commit: 6d74b759634355e7f6647ccaefef519a1ff208e2

Test-Parameters: trivial
Fixes: 9962d6f ("LU-14617 utils: llog_reader updatelog support")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I6871614ab4ea79d59c3c3b4644b377de395bad56
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48551
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15724 tests: MDT failover hang reproducer
Alexander Boyko [Wed, 14 Sep 2022 20:13:58 +0000 (13:13 -0700)]
LU-15724 tests: MDT failover hang reproducer

The patch adds recovery-small 144a test to reproduce
MDT failover hang when precreate threads are blocked on objects.

LustreError: 0-0: Forced cleanup waiting for mdt-kjcf05-MDT0001_UUID
namespace with 46 resources in use, (rc=-110)

Lustre-change: https://review.whamcloud.com/47006
Lustre-commit: aa6250b7412e7baf6760fe4010a81f4f22187127

Test-Parameters: trivial testlist=recovery-small env=ONLY=144a
HPE-bug-id: LUS-10750
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2743a1b5c8911d6982b527f7e7b7bbbaf310cd04
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/48550
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15724 osp: wakeup all precreate threads
Alexander Boyko [Wed, 14 Sep 2022 19:56:07 +0000 (12:56 -0700)]
LU-15724 osp: wakeup all precreate threads

A number of threads can sleep in osp_precreate_reserve() waiting
for objects from the OST. When the MDT stops, Lustre should wake up
all of these threads. When opd_pre_recovering is set, any wakeup of
opd_pre_user_waitq is useless. Failover of the MDT does not produce
a disconnect event, only an inactive one, so
osp_precreate_cleanup_orphans() cannot be awakened.

LustreError: 0-0: Forced cleanup waiting for mdt-kjcf05-MDT0001_UUID
namespace with 46 resources in use, (rc=-110)

 schedule_timeout at ffffffff8e551cd3
 osp_precreate_reserve at ffffffffc17d2d83 [osp]
 osp_declare_create at ffffffffc17c7eb9 [osp]
 lod_sub_declare_create at ffffffffc156415b [lod]
 lod_qos_declare_object_on at ffffffffc155bf42 [lod]
 lod_ost_alloc_rr.constprop.23 at ffffffffc155db2f [lod]
 lod_qos_prep_create at ffffffffc15630a6 [lod]
 lod_declare_instantiate_components at ffffffffc154b237 [lod]

Lustre-change: https://review.whamcloud.com/47005
Lustre-commit: e55fc043679cdfadfff6874ef78e2e0128ec37ac

HPE-bug-id: LUS-10750
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: If0164cfbecb1e358d9857421cb234559dc8cecbc
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/48546
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15555 ldiskfs: large directory causes htree corruption
Andrew Perepechko [Wed, 14 Sep 2022 19:50:51 +0000 (12:50 -0700)]
LU-15555 ldiskfs: large directory causes htree corruption

When creating a lot of files in a single directory, it can
get corrupted because of a typo in ext4-kill-dx-root.patch.

Lustre-change: https://review.whamcloud.com/46526
Lustre-commit: ea3ee9337f9bcd42360e4523f1e34bcd04d3bf41

Change-Id: Ia36278580741e1eb905e24a3a6231ba7daaa882a
Fixes: 20a6d32 ("LU-12637 kernel: RHEL 8.1 server support")
HPE-bug-id: LUS-10730
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://review.whamcloud.com/48545
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5380 lipe: wait longer before restarting the access log reader
John L. Hammond [Tue, 14 Jun 2022 13:46:45 +0000 (08:46 -0500)]
EX-5380 lipe: wait longer before restarting the access log reader

In lamigo_alr_data_collection_thread() if the access log reader exits
with status zero then it means that no OSTs are mounted on the
host. In this case we should wait longer before restarting the access
log reader.
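
The restart policy amounts to choosing a delay from the exit status. A
minimal Python sketch; the delay values and names are hypothetical and
are not taken from lamigo.

```python
SHORT_DELAY_SEC = 5   # hypothetical delay after an abnormal exit
LONG_DELAY_SEC = 60   # hypothetical delay when no OSTs are mounted


def restart_delay(exit_status):
    """Seconds to wait before restarting the access log reader."""
    if exit_status == 0:
        # Status zero means no OSTs are mounted on the host, so there is
        # little point in retrying quickly.
        return LONG_DELAY_SEC
    return SHORT_DELAY_SEC


print(restart_delay(0))
print(restart_delay(1))
```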

Lustre-change: https://review.whamcloud.com/47627
Lustre-commit: 27c05f8cb39a8bf8d9e9386841fc7ecd700cf0fb

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I282c6b8e251c432664bc3b4eb202351a5bd7fe5b
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48380
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
2 years agoLU-14305 ldiskfs: add parameters for mb_c123_threshold
Artem Blagodarenko [Thu, 8 Sep 2022 03:13:07 +0000 (23:13 -0400)]
LU-14305 ldiskfs: add parameters for mb_c123_threshold

Add mount options for /sys/fs/ldiskfs/*/mb_c[123]_threshold values
so that they can be set persistently via mount options.

The /sys/fs/ldiskfs/*/mb_c[123]_threshold values are always shown
rounded down to the next lower percentage value due to integer
division, since internal values are stored as blocks for efficiency.

Round up the values shown to the next percent to match what was
used to originally set these parameters.
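
The rounding issue can be reproduced with plain integer arithmetic. This
is a Python sketch of the idea only; the function names and the block
count are hypothetical, and the actual ldiskfs computation may differ.

```python
def pct_to_blocks(pct, total_blocks):
    """Store a percentage threshold as a block count (integer division)."""
    return total_blocks * pct // 100


def blocks_to_pct_floor(blocks, total_blocks):
    """Show the threshold rounded down to the next lower percent."""
    return blocks * 100 // total_blocks


def blocks_to_pct_ceil(blocks, total_blocks):
    """Show the threshold rounded up to the next percent."""
    return (blocks * 100 + total_blocks - 1) // total_blocks


total = 999                         # hypothetical filesystem size in blocks
stored = pct_to_blocks(25, total)   # 249 blocks
print(blocks_to_pct_floor(stored, total))  # 24, lower than what was set
print(blocks_to_pct_ceil(stored, total))   # 25, matches what was set
```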

Lustre-change: https://review.whamcloud.com/41193
Lustre-commit: c2fd5297b46c4973aeda4d4d02cbc7ca2faa0d50

Fixes: 95f8ae567749 ("LU-12103 ldiskfs: don't search large block range if disk full")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Artem Blagodarenko <ablagodarenko@whamcloud.com>
Change-Id: Ie36a6667f8bca7481aa8179ab5b97c85d449d619
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41955
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48499

2 years agoLU-15003 sec: use enc pool for bounce pages
Sebastien Buisson [Fri, 25 Mar 2022 08:24:32 +0000 (09:24 +0100)]
LU-15003 sec: use enc pool for bounce pages

Take pages from the enc pool so that they can be used for
encryption, instead of letting llcrypt allocate a bounce page
for every call to the encryption primitives.
Pages are taken from the enc pool a whole array at a time.

This requires modifying the llcrypt API, so that new functions
llcrypt_encrypt_page() and llcrypt_decrypt_page() are exported.
These functions take a destination page parameter.
Until this change is pushed in upstream fscrypt, this performance
optimization is not available when Lustre is built and run against
the in-kernel fscrypt lib.

Using enc pool for bounce pages is a worthwhile performance win. Here
are performance penalties incurred by encryption, without this patch,
and with this patch:

                     ||=====================|=====================||
                     || Performance penalty | Performance penalty ||
                     ||    without patch    |     with patch      ||
||==========================================|=====================||
|| Bandwidth – write |        30%-35%       |   5%-10% large IOs  ||
||                   |                      |    15% small IOs    ||
||------------------------------------------|---------------------||
|| Bandwidth – read  |         20%          |    less than 10%    ||
||------------------------------------------|---------------------||
||      Metadata     |         N/A          |         5%          ||
|| creat,stat,remove |                      |                     ||
||==========================================|=====================||

Lustre-change: https://review.whamcloud.com/47149
Lustre-commit: f3fe144b8572e9e75bb55076e29057227476ebf5

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I3078d0a3349b3d24acc5e61ab53ac434b5f9d0e3
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47513
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14719 osp: add inode watermark
Lai Siyao [Fri, 1 Apr 2022 19:58:08 +0000 (15:58 -0400)]
LU-14719 osp: add inode watermark

* move block watermark from debugfs to sysfs.
* add inode watermark for OSP.

Lustre-change: https://review.whamcloud.com/47128
Lustre-commit: 336eb696299e1c9731bd1443f05e5d814314ed36

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I7c768fa2ebfb4b8c2f75255f9e9c061d4c15cf66
Reviewed-on: https://review.whamcloud.com/47866
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16161 kernel: kernel update RHEL8.6 [4.18.0-372.26.1.el8_6]
Jian Yu [Fri, 16 Sep 2022 06:49:21 +0000 (23:49 -0700)]
LU-16161 kernel: kernel update RHEL8.6 [4.18.0-372.26.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.26.1.el8_6.

Lustre-change: https://review.whamcloud.com/48564
Lustre-commit: TBD (from 66b1b4469d6e5e65b450702c6cb68ec14a51e9b0)

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I45bf6dbff5061407e1109732b6d466d0f7a8376c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48575
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-4359 build: add bio-integrity patch to rhel8 series
Andreas Dilger [Thu, 30 Jun 2022 23:36:07 +0000 (17:36 -0600)]
EX-4359 build: add bio-integrity patch to rhel8 series

Add bio-integrity-unbound-concurrency patch to the rhel8.5 and
rhel8.6 series to ensure balanced T10-PI core usage.

Test-Parameters: trivial serverdistro=el8.5 clientdistro=el8.5 testlist=sanity,conf-sanity
Test-Parameters: trivial serverdistro=el8.6 clientdistro=el8.6 testlist=sanity,conf-sanity

Fixes: 97fba9aa48ca ("DDN-2042 bio: allow BIO integrity to run on any core")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I31f9ced4eadad105466556183e2b9e9e0419164d
Reviewed-on: https://review.whamcloud.com/47848
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoLU-15795 lbuild: enable KABI
Minh Diep [Thu, 8 Sep 2022 19:54:56 +0000 (12:54 -0700)]
LU-15795 lbuild: enable KABI

Enable build kabi and clean up kmodtool patch

Lustre-change: https://review.whamcloud.com/47507
Lustre-commit: TBD (from 03fc87a2ba08e5c4b8b8787f19b4e736d2752fae)

Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.5 serverdistro=el8.5
Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.6 serverdistro=el8.6

Change-Id: I16d54af0004c4ddc1cc5e6acca81e4aa89a1a1c1
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48486
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14642 flr: allow layout version update from client/MDS
Bobi Jam [Wed, 13 Apr 2022 15:15:22 +0000 (23:15 +0800)]
LU-14642 flr: allow layout version update from client/MDS

A client write/punch request always carries its layout version so
that OFD can reject the request if the carried layout version
is stale.

This patch allows the MDS as well as the client to set a new layout
version on OST objects. During resync write, all OST objects get
their layout version updated.

Lustre-change: https://review.whamcloud.com/45443
Lustre-commit: fa6574150b6f745a668fe69b2d6d970068

Fixes: 7d97777a5d ("LU-14642 flr: abolish MDS transfer layout version to OST")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9f27af354875d48adda3361f6c8ea5a5f6def73b
Reviewed-on: https://review.whamcloud.com/47097
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-9699 osp: don't assert on OSP duplicating
Jadhav Vikram [Tue, 25 Jul 2017 07:01:37 +0000 (12:31 +0530)]
LU-9699 osp: don't assert on OSP duplicating

Writeconf on an MDT with index > 0000 will cause
"add mdc" to be added to $FSNAME-client config
and "add osp" to be added to $FSNAME-MDTXXXX configs.

However, the configs may already contain these
directives. Duplicating the OSP device will
cause the assertion failure in osp_obd_connect():
ASSERTION( osp->opd_connects == 1 ) failed

Duplicating the MDC just returns -EEXIST in a similar
situation.

A possible solution is to check configs for duplicates
before writing to them. However, sometimes we
would like to change nids which are part of
"add mdc" and "add osp".

Another solution is to mark previous entries with
SKIP flags. This patch implements this approach.
Since after revoking the config lock, the clients
and the MDTs will receive the updated log and
apply its newer entries, we still have to handle
OSP duplication, but this is only an issue
immediately after writeconf processing.

Lustre-change: https://review.whamcloud.com/27753
Lustre-commit: 98f107b53e4daa3bfaf026c379c0a9c41cb5f161

Seagate-bug-id: MRP-2634, MRP-3865
Change-Id: Idd7ad43c78d50e6bbe715850503aa0b01fcbf071
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48515
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>