Whamcloud - gitweb
fs/lustre-release.git
Cyril Bordage [Mon, 12 Dec 2022 10:49:11 +0000 (11:49 +0100)]
LU-16378 lnet: handles unregister/register events

When the network is restarted, devices are unregistered and then
registered again. When a device registers using an index that is
different from the previous one (before the network was restarted),
LNet ignores it. Consequently, the device's link stays in the fatal
state.

To fix that, we catch unregistering events to clear the saved index
value, and when a registering event comes, we save the new value.
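
The handling described above can be sketched in user-space C. This is only an illustration under stated assumptions: `ni_state`, `dev_event` and `ni_handle_event()` are hypothetical names, not the LNet structures or the kernel notifier API, and a real handler would presumably also check which device an event belongs to before adopting a new index.

```c
#include <stdbool.h>

/* Hypothetical event kinds, loosely modelled on device notifications. */
enum dev_event { DEV_UNREGISTER, DEV_REGISTER, DEV_UP, DEV_DOWN };

/* Hypothetical per-interface state; not the real LNet structure. */
struct ni_state {
	int  saved_ifindex;	/* -1 means "no index saved" */
	bool fatal;		/* true while the link is considered down */
};

/* Returns true if the event was accepted for this interface. */
bool ni_handle_event(struct ni_state *ni, enum dev_event ev, int ifindex)
{
	switch (ev) {
	case DEV_UNREGISTER:
		if (ni->saved_ifindex != ifindex)
			return false;		/* not our device */
		ni->saved_ifindex = -1;		/* forget the stale index */
		return true;
	case DEV_REGISTER:
		if (ni->saved_ifindex != -1)
			return false;		/* still bound to an index */
		ni->saved_ifindex = ifindex;	/* adopt the new index */
		return true;
	case DEV_UP:
	case DEV_DOWN:
		if (ni->saved_ifindex != ifindex)
			return false;		/* event for another index */
		ni->fatal = (ev == DEV_DOWN);
		return true;
	}
	return false;
}
```

Without the unregister/register cases, an "up" event carrying the new index would never match the saved one, which is the stuck fatal state the commit message describes.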

Lustre-change: https://review.whamcloud.com/49375/
Lustre-commit: TBD (from 7442710a56a8f38453441c62253c0ad891fe9b8c)

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I17e93a1103d588f3e630a9c7446b345f4d472b97
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49376
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Thu, 15 Dec 2022 19:38:56 +0000 (11:38 -0800)]
LU-16373 tests: failover mds1 back to the primary server

This patch fixes recovery-small test 144a to failover
mds1 back to the primary server so that stack_trap can
set timeout parameter on the correct mds node.

Lustre-change: https://review.whamcloud.com/49345
Lustre-commit: TBD (from 68c75d28fe86ac890d242c004c664f872204b660)

Test-Parameters: trivial \
env=SLOW=yes,FAILURE_MODE=HARD,ONLY=144a \
clientcount=4 mdtcount=1 mdscount=2 osscount=2 \
austeroptions=-R failover=true iscsi=1 \
testlist=recovery-small

Change-Id: Idbfdb7b084c7edac8784008e0455f76632aa685b
Test-Parameters: trivial testlist=recovery-small
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49419
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Thu, 15 Dec 2022 15:30:32 +0000 (08:30 -0700)]
LU-16329 Revert "LU-8621 utils: cmd help to stdout or short cmd error"

This reverts commit 608d763955d7e0a9c438c317e595f14825e9423b.
This breaks bash command completion.

Fixes: bc69a8d058 ("LU-8621 utils: cmd help to stdout, short cmd error")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I004ea5af499593b0f36ba17ff5f517548f0ea0f9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49416
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Zhuravlev [Wed, 14 Dec 2022 19:00:01 +0000 (22:00 +0300)]
EX-6349 Revert "LU-14661 obdclass: Add peer/peer NI when processing llog"

This reverts commit e8ddb2f550072cdd3489389c107af3e892a21f66.
It causes problems with reconnection at failover.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I53594f8f93474666c4abd96291d58dadf8ac5969
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49411
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lai Siyao [Tue, 15 Mar 2022 19:43:14 +0000 (15:43 -0400)]
LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs

In osd_fid_lookup(), if the FID mapping found in OI table is insane,
it will be added into a list called os_inconsistent_items, and OI
scrub will be triggered.

Later, if OI scrub can't fix this mapping, it should move the mapping
into a list called os_stale_items, and subsequent accesses of the same
FID should return -ESTALE immediately rather than triggering OI
scrub repeatedly.
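
A toy model of the two lists may make the intended behaviour easier to follow. Everything here is a sketch under assumptions: `fid_t`, `scrub_state`, `lookup_bad_mapping()` and `scrub_failed()` are hypothetical, the lists are fixed-size arrays, and returning -EINPROGRESS while scrub is pending is an assumption rather than something the commit message states.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS 16

/* Hypothetical stand-in for a FID; the real one has more fields. */
typedef unsigned long long fid_t;

struct fid_list {
	fid_t  items[MAX_ITEMS];
	size_t count;
};

/* Toy model of the two lists named in the commit message. */
struct scrub_state {
	struct fid_list inconsistent;	/* waiting for OI scrub */
	struct fid_list stale;		/* scrub could not repair these */
	unsigned int scrub_triggers;	/* times scrub was started */
};

bool fid_list_has(const struct fid_list *l, fid_t fid)
{
	for (size_t i = 0; i < l->count; i++)
		if (l->items[i] == fid)
			return true;
	return false;
}

void fid_list_add(struct fid_list *l, fid_t fid)
{
	if (l->count < MAX_ITEMS && !fid_list_has(l, fid))
		l->items[l->count++] = fid;
}

void fid_list_del(struct fid_list *l, fid_t fid)
{
	for (size_t i = 0; i < l->count; i++) {
		if (l->items[i] == fid) {
			l->items[i] = l->items[--l->count];
			return;
		}
	}
}

/* Lookup of a FID whose OI mapping is known to be bad. */
int lookup_bad_mapping(struct scrub_state *s, fid_t fid)
{
	if (fid_list_has(&s->stale, fid))
		return -ESTALE;		/* irreparable: do not scrub again */
	fid_list_add(&s->inconsistent, fid);
	s->scrub_triggers++;
	return -EINPROGRESS;		/* assumed "scrub pending" result */
}

/* Called when scrub gives up on a mapping. */
void scrub_failed(struct scrub_state *s, fid_t fid)
{
	fid_list_del(&s->inconsistent, fid);
	fid_list_add(&s->stale, fid);
}
```

Once a FID has been moved to the stale list, repeated lookups leave `scrub_triggers` unchanged, which is the point of the fix.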

Add sanity-scrub 20. Remove sanity-scrub 1d, which is not a sane test
because it altered FID in LMA, which is the last to trust for an
object, and it could pass just by chance.

Lustre-change: https://review.whamcloud.com/46852
Lustre-commit: 558784caad491be50e93ae60a31d4219a1e038bc

Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3ed8928506551416b1008121adbe385dedda29bc
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49424
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Tue, 13 Dec 2022 19:12:09 +0000 (12:12 -0700)]
RM-620 build: New tag 2.14.0-ddn69

New tag 2.14.0-ddn69

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I592bd3a6fdb9db02bbe1a18c6e84d9b61a639f95

Alexandre Ioffe [Thu, 8 Dec 2022 06:45:35 +0000 (22:45 -0800)]
EX-6497 lipe: Refine stats field name in lamigo

Corrected the INFO message "processed" that lamigo prints
periodically:
- Added two additional fields:
  "running" - number of currently running jobs such as replication
  "delayed" - current number of failed and other (such as set flag)
  jobs which are waiting to be run on the next lamigo cycle
- "in queue" field is changed to "awaiting". This is the current number
  of files in the internal cache. These files are awaiting
  processing (replication)

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iacf0199cfcf56edcbb8ad91e0e4b62c7451900f5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49344
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Sat, 19 Nov 2022 05:39:08 +0000 (21:39 -0800)]
EX-6298 lipe: decrease delay before ALR restart

- Decrease the delay before restarting the access log reader (ALR)
  and eliminate this delay when the read from ALR fails due to
  timeout. Increase the SSH poll/read timeout while the keep-alive
  message in ofd_access_log_reader is not implemented. This
  decreases the probability of missing ALR.
- Stop excluding hot-pools test_72

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I36989e9c3fd877aee5ce1cfb8525db8604e666bd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49196
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mr NeilBrown [Thu, 1 Dec 2022 17:53:01 +0000 (09:53 -0800)]
LU-16353 config: enable_foo variables mustn't contain spaces

$enable_crypto is in some circumstances set to "embedded llcrypt"
which contains a space.
When the code from lustre-build.m4 then tests the value with:

   if test x$enablecrypto = xyes

we get a syntax error from ./configure

We could add quotes to this test, but for consistency we would need
to add quotes to every other test for an enable_foo variable.

It is simpler just to ensure we don't add spaces.  So change the space
to a hyphen.

Lustre-change: https://review.whamcloud.com/49282
Lustre-commit: c8a33e5322b0675680f8d737f04259799d30aa0e

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I097e857409d6ec48a765ccda1cc470d28b90e601
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49295
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Fri, 23 Sep 2022 22:20:51 +0000 (15:20 -0700)]
LU-16051 o2iblnd: detect link state to set fatal error on ni

To avoid selecting lnet ni which corresponds to a downed link
for sending, add a mechanism for detecting ip-layer link events
in o2iblnd. On ip link up/down events, find corresponding
ni and toggle ni_fatal_error_on flag. This complements the
existing mechanism for ib-layer link event handling.

Lustre-change: https://review.whamcloud.com/48644
Lustre-commit: 30d73908087d5b2f0b18cce95826c4825c030ad4

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I4720cd0a7bc577a522c7d40b54f821a4c12b670f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49315
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mr NeilBrown [Tue, 29 Nov 2022 02:31:21 +0000 (18:31 -0800)]
LU-14992 tests: add more mkdir_on_mdt0 calls

A previous patch changed some mkdir calls in test_133a to
mkdir_on_mdt0. This allows stats collected from mdt0 to
reflect the mkdir.

However, two mkdir calls were missed, so "crossdir_rename" stats can be
wrong.

Lustre-change: https://review.whamcloud.com/49252
Lustre-commit: d56ea0c80a959ebd9b393f2da048cc179cb16127

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity env=ONLY=133a

Fixes: 543341afc3 ("LU-14992 tests: sanity/replay-vbr mkdir on MDT0")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4e5c2e5504307462bff4012a13ef9deb24f8da8c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49262
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lai Siyao [Thu, 10 Nov 2022 13:15:51 +0000 (08:15 -0500)]
LU-16308 llite: wake_up after cl_object_kill

cl_inode_fini() calls cl_object_kill() to set LU_OBJECT_HEARD_BANSHEE,
and then calls cl_object_put_last() to wait for the object refcount to
become one. It should call wake_up() in between, in case someone is
waiting on the flag.

Lustre-change: https://review.whamcloud.com/49130
Lustre-commit: 3a0a6c7a88499a78c9bfc6ac514d05eba60312c9

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I244db71ee4ed9c39118e443b99c3b8a3a0aa4bc3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49312
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Wed, 30 Nov 2022 14:29:47 +0000 (09:29 -0500)]
EX-6468 pcc: add threshold to determine direct I/O during attach

This patch adds a threshold tunable parameter that determines whether
direct I/O or buffered I/O is used for data copying during attach:
llite.*.pcc_dio_attach_threshold
The default value is the same as the direct I/O size: 32MiB.
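
As a rough sketch of how such a threshold could be applied: the function name and the direction of the comparison are assumptions, since the commit message gives only the parameter name and its 32MiB default.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Default from the commit message: same as the direct I/O size, 32MiB. */
#define DIO_ATTACH_THRESHOLD_DEFAULT (32 * MIB)

/*
 * Assumption: files at or above the threshold are copied with direct I/O
 * and smaller files with buffered I/O. The commit message does not say
 * which side of the threshold is inclusive.
 */
bool attach_uses_direct_io(uint64_t file_size, uint64_t threshold)
{
	return file_size >= threshold;
}
```

Under this reading, a 4MiB file would be copied through the page cache and a 64MiB file with direct I/O.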

The parameter "pcc_dio_attach_size_mb" is deprecated; use
"pcc_dio_attach_iosize_mb" instead.

Change-Id: I393d6a06523303e749192ba9978449c3d75886ae
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49286
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Tue, 6 Dec 2022 05:15:41 +0000 (22:15 -0700)]
RM-620 build: New tag 2.14.0-ddn68

New tag 2.14.0-ddn68

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id4e3d1a9f28afe251e55582c84acaf98ebfe9954

Chris Horn [Wed, 30 Mar 2022 18:35:23 +0000 (13:35 -0500)]
LU-15852 lnet: Don't modify uptodate peer with temp NI

When processing the config log it is possible that we attempt to
add temp NIs after discovery has completed on a peer. These temp NIs
may not actually exist on the peer. Since discovery has already
completed the peer is considered up-to-date and we can end up with
incorrect peer entries. We shouldn't add temp NIs to a peer that
is already up-to-date.

Lustre-change: https://review.whamcloud.com/47322
Lustre-commit: 8f718df474e453fbc69dfe90214e71565963f6db

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia484713b1e6c9e1a46e525589b7c741c6478e417
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49303
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Mikhail Pershin [Tue, 2 Aug 2022 12:41:52 +0000 (15:41 +0300)]
LU-15938 llog: more checks in llog_reader

Add more correctness checks and reports in llog_reader:
- better report wrong record length and chunk skipping case
- add tail check: tail id and len should be the same as in head
- better report for gap in record indices
- test case with two corruption types:
  1) llog has bits set in bitmap beyond file end
  2) corruption in the middle
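
The kinds of checks listed above can be illustrated with a simplified record layout. This is a sketch, not llog_reader code: `rec_hdr`, `rec_tail` and `check_record()` are hypothetical, the record is assumed to be framed as header, payload, tail, and indices are assumed to increase by exactly one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical record framing: header, payload, tail. */
struct rec_hdr  { uint32_t len; uint32_t index; };
struct rec_tail { uint32_t len; uint32_t index; };

enum rec_status { REC_OK, REC_BAD_LEN, REC_TAIL_MISMATCH, REC_INDEX_GAP };

/*
 * Check one record starting at buf[off]. prev_index is the index of the
 * previous record in the file.
 */
enum rec_status check_record(const unsigned char *buf, size_t buf_len,
			     size_t off, uint32_t prev_index)
{
	struct rec_hdr hdr;
	struct rec_tail tail;

	if (off > buf_len || buf_len - off < sizeof(hdr))
		return REC_BAD_LEN;
	memcpy(&hdr, buf + off, sizeof(hdr));

	/* A record must hold a header and a tail, and fit in the buffer. */
	if (hdr.len < sizeof(hdr) + sizeof(tail) || hdr.len > buf_len - off)
		return REC_BAD_LEN;

	/* The tail must repeat the length and index stored in the header. */
	memcpy(&tail, buf + off + hdr.len - sizeof(tail), sizeof(tail));
	if (tail.len != hdr.len || tail.index != hdr.index)
		return REC_TAIL_MISMATCH;

	if (hdr.index != prev_index + 1)
		return REC_INDEX_GAP;

	return REC_OK;
}
```

Validating the length before reading the tail matters: a corrupted length would otherwise send the tail read outside the buffer.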

Lustre-change: https://review.whamcloud.com/48112
Lustre-commit: 386ffcdbb4c9b89f798de4c83a51a3f020542c8b

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I0c2af6ae2592c94e14e90ead12e28104409313b2
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49214
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Jian Yu [Tue, 29 Nov 2022 17:14:22 +0000 (09:14 -0800)]
LU-16317 build: dkms build requires flex, bison and libmount-devel

This patch fixes lustre.spec.in and lustre-dkms.spec.in to add
requires for flex, bison, libmount and libmount-devel. The last
two have already been added into lustre.spec.in.

Lustre-change: https://review.whamcloud.com/49183
Lustre-commit: c74c630ff7596317d1b500fd385fca271b31708c

Test-Parameters: trivial

Fixes: 121a79651f ("LU-15967 build: configure script does not check for required build tools")
Fixes: f21b944127 ("LU-15940 build: add a required dependency for libmount")

Change-Id: I9923fc7eb09f974e8c38c3664138486a424e16d7
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Qian Yingjin [Fri, 11 Nov 2022 09:01:02 +0000 (04:01 -0500)]
EX-6373 pcc: asynchronous PCCRO attach command support

Currently PCCRO attach via the command "lfs pcc attach" will block
during the data copying.
There is a requirement that this command can also do data copy
asynchronously. Thus we add an option "--async|-A" to the command
which will not block while the file data is being fetched.

Add sanity-pcc/test_{103, 104} to verify that it works correctly.

Change-Id: I6f31190c8b9e9b9876b34f8e484c6c8b7f16b6db
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49133
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Tue, 15 Nov 2022 06:57:08 +0000 (01:57 -0500)]
LU-16313 pcc: use two bits to indicate pcc type for attach

PCC currently supports two types: readwrite and readonly.
The attach data structure @lu_pcc_attach uses a 32-bit value to
indicate the PCC type:

    struct lu_pcc_attach {
            __u32 pcca_type;
            __u32 pcca_id;
    };

This patch changes it to use 2 bits to represent the PCC type.
The remaining bits in @pcca_type can be used as flags for attach, such
as a flag to request asynchronous attach via the command
"lfs pcc attach -A" for PCCRO.
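
A minimal sketch of packing a 2-bit type and flags into one 32-bit field follows. The macro names, the numeric type values used in the example, and the position of the async flag are all hypothetical; the commit message only says that 2 bits hold the type and the rest may carry flags.

```c
#include <stdint.h>

/* Hypothetical layout: low 2 bits carry the type, upper 30 bits flags. */
#define PCCA_TYPE_BITS	2u
#define PCCA_TYPE_MASK	((1u << PCCA_TYPE_BITS) - 1u)	/* 0x3 */
#define PCCA_FLAG_ASYNC	(1u << PCCA_TYPE_BITS)		/* illustrative */

/* Combine a type and a set of flags into one 32-bit value. */
uint32_t pcca_pack(uint32_t type, uint32_t flags)
{
	return (type & PCCA_TYPE_MASK) | (flags & ~PCCA_TYPE_MASK);
}

/* Extract the type stored in the low bits. */
uint32_t pcca_get_type(uint32_t value)
{
	return value & PCCA_TYPE_MASK;
}

/* Extract the flags stored above the type bits. */
uint32_t pcca_get_flags(uint32_t value)
{
	return value & ~PCCA_TYPE_MASK;
}
```

Masking on both pack and unpack keeps existing type values unchanged as long as they fit in 2 bits, so old and new readers agree on the type.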

Lustre-change: https://review.whamcloud.com/49160
Lustre-commit: 6e90974b1f4ac24c5a5d45ecc9bdb4d47018dab4

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idee26018642a174b04d1d36a81952ea98a06514e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49163
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Tue, 6 Dec 2022 02:05:39 +0000 (19:05 -0700)]
RM-620 build: New tag 2.14.0-ddn67

New tag 2.14.0-ddn67

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia40ed3b7d185fa171586d5ca377714518fdc5e2e

Quentin Bouget [Sun, 2 Jan 2022 16:12:42 +0000 (11:12 -0500)]
LU-8585 llapi: use open_by_handle_at in llapi_open_by_fid

Reimplement llapi_open_by_fid() to use llapi_fid_to_handle() and
open_by_handle_at(2) rather than using ioctl().  This works for
opens on subdirectory mountpoints, unlike ".lustre/fid/<fid>".

This patch also adds llapi_open_by_fid_at() which is similar to
llapi_open_by_fid() except that it takes an open directory file
descriptor or AT_FDCWD rather than a path as its first argument.

[AD:
- Move get_root_*() functions over to a new liblustreapi_root.c
  file in expectation of further enhancements to that code.
- Cache an open file handle on the root directory so repeated
  calls to llapi_open_by_fid() and llapi_fid2path() do not need
  to search for and open the same root directory path many times.
- Add man pages for newly-added functions.

  This reduces the system calls for llapi_fid_test significantly:

      original     patched
         14511        4315   total opens
         64807       34067   total syscalls
]
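
The root-directory caching idea in the note above can be sketched as a single-entry cache. This is an illustration only: `get_root_fd()` and its static state are hypothetical, it is not thread-safe, and it does not reflect how liblustreapi_root.c is actually organised.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Single-entry cache: hypothetical, not the liblustreapi data structure. */
static char cached_path[4096];
static int cached_fd = -1;
static unsigned int real_opens;	/* open(2) calls made by the cache */

/*
 * Return an open fd for the directory root_path, reusing the cached fd
 * when the path matches. The caller must not close the returned fd.
 */
int get_root_fd(const char *root_path)
{
	int fd;

	if (cached_fd >= 0 && strcmp(cached_path, root_path) == 0)
		return cached_fd;		/* cache hit: no syscall */

	if (strlen(root_path) >= sizeof(cached_path))
		return -1;

	fd = open(root_path, O_RDONLY);
	if (fd < 0)
		return -1;
	real_opens++;

	if (cached_fd >= 0)
		close(cached_fd);		/* evict the previous entry */
	cached_fd = fd;
	strcpy(cached_path, root_path);
	return fd;
}
```

Repeated calls with the same path cost one string comparison instead of an open(), which is the kind of saving the syscall counts above reflect.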

There may still be a need to have a fallback from open_by_handle_at()
to using ".lustre/fid/<FID>" to open the fid (if available), but
that can be added if this initial patch does not test well.  The
open_by_handle_at() method avoids reopening the "fid/" directory
each time (though this fd could also be cached), but it has the
drawback that it reconnects dentries to the root directory each time.

Lustre-change: https://review.whamcloud.com/36603
Lustre-commit: bdf7788d19985bb7abf2385add15f1d67f3d01e4

Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I8a4904c996389da2b0894cd9fac639a398607535
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49202
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Etienne AUJAMES [Mon, 9 May 2022 13:44:29 +0000 (15:44 +0200)]
LU-15833 llapi: don't use realpath in llapi_search_fsname()

This patch uses the st_dev value to determine the fsname in
llapi_search_fsname().
The main purpose of this is to limit the number of lstat() calls
(made by realpath()) in this function.

get_root_path() is modified to search for a mountpoint by dev.
The last result of get_root_path() is cached to avoid reading
/proc/mounts for each call.

A new api function llapi_search_rootpath_by_dev() is added to get
the path of Lustre mountpoint using the specified device value.
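
Matching a mountpoint by device number can be sketched as follows. The function names are hypothetical and, unlike the patch, the candidate mountpoints are passed in by the caller rather than read from the mount table.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return the index of the first candidate mountpoint whose st_dev equals
 * dev, or -1 if none matches. Candidates that cannot be stat()ed are
 * skipped.
 */
int find_mount_by_dev(const char *const *candidates, size_t n, dev_t dev)
{
	struct stat st;

	for (size_t i = 0; i < n; i++) {
		if (stat(candidates[i], &st) != 0)
			continue;
		if (st.st_dev == dev)
			return (int)i;
	}
	return -1;
}

/* Convenience wrapper: which candidate mountpoint holds path? */
int find_mount_for_path(const char *const *candidates, size_t n,
			const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return find_mount_by_dev(candidates, n, st.st_dev);
}
```

The lookup needs one stat() on the target path, whereas resolving the path with realpath() typically issues an lstat() per path component.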

**Testing:**

*Environment:*
VMs: 1 client, 1 MDS (2MDT), 1 OSS (2 OST)
Lustre tree: test{001..100}/test{001..100}/test{01..10}/file{01..05}
(500000 files + 110100 folders)
OS: Centos 7 (no statx)
Lustre: 2.15.50_15_g1116739

*Tests*
cd <rootfs>
strace lfs getstripe -r .
echo 3 > /proc/sys/vm/drop_caches
/usr/bin/time lfs getstripe -r . (2 iterations)

*Results*
times (s):

                 ______________________________
                | user | system | real | real% |
 _______________|______|________|______|_______|
|without patch: | 6.18 | 57.3   | 427  | 0%    |
|_______________|______|________|______|_______|
|with patch:    | 2.88 | 47.3   | 404  |-5.45% |
|_______________|______|________|______|_______|

strace (only significant changes are displayed):
(*stat = lstat + stat + fstat)
                 _____________________________________________
                | *stat  | mmap   | open   | read   | all     |
 _______________|________|________|________|________|_________|
|without patch: | 760545 | 110142 | 330379 | 330325 | 4742658 |
|_______________|________|________|________|________|_________|
|with patch:    | 440484 | 0      | 220277 | 19     | 3541739 |
|_______________|________|________|________|________|_________|

-25.32% syscalls after patching.
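
The syscall reduction quoted above is a relative change, (after - before) / before, applied to the "all" column of the strace table. The helper below is just that arithmetic; the function name is made up for illustration.

```c
/* Relative change from before to after, in percent. */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With before = 4742658 and after = 3541739, the difference is -1200919, and dividing by 4742658 gives roughly -25.32%.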

Lustre-change: https://review.whamcloud.com/47258
Lustre-commit: 4fd7d5585d33240a658f57bf7399da4415a7eb6c

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I3812d922d5b1d194d52132cba95d11820424c5d7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49201
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Jian Yu [Wed, 16 Nov 2022 05:54:26 +0000 (21:54 -0800)]
DDN-3473 build: support kernel 3.10.0-693.el7

This patch fixes the following build failures to support
kernel 3.10.0-693.el7 for Lustre client:

- error: implicit declaration of function 'idr_destroy'
- error: implicit declaration of function 'gfpflags_allow_blocking'
- error: implicit declaration of function 'cdev_device_add'
- error: passing argument 1 of 'init_wait_var_entry' from
  incompatible pointer type

Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I4b5c5264fb102d3a825c92e7b1e92cf0c52540e5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49197
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Fri, 25 Nov 2022 02:28:21 +0000 (19:28 -0700)]
LU-12016 tests: skip sanity/156 in interop

Since LU-12071 was backported to b_es5_2, the version check on b_es6_0
is incorrect and this part of test_156 should be skipped.

Test-Parameters: trivial testlist=sanity env=ONLY=156 serverversion=EXA5
Fixes: 3043c6f189 ("LU-12071 osd-ldiskfs: bypass pagecache if requested")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3fd96578e36675655fb265d83ba3f661950ab112
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49246
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Zhuravlev [Sun, 13 Nov 2022 14:51:30 +0000 (17:51 +0300)]
LU-15139 osp: block reads until the object is created

It's possible that a remote llog can be read and written simultaneously
at recovery. For example, the dtx recovery thread is fetching updates
while MDD's orphan cleanup procedure is removing orphans from PENDING.

OSP can be asked to read an object that was just created in the OSP
cache while the actual object on the remote MDS hasn't been created
yet. OSP should block such reads until the creation is done.

Lustre-change: https://review.whamcloud.com/47003/
Lustre-commit: 4f2914537cc32fe89c4781bcfc87c38e3fe4419c

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5596c791a758dd542746afd961eb1ed9c97845be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49146
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Fri, 18 Nov 2022 20:13:08 +0000 (12:13 -0800)]
LU-16295 kernel: kernel update RHEL 7.9 [3.10.0-1160.80.1.el7]

Update RHEL 7.9 kernel to 3.10.0-1160.80.1.el7.

Lustre-change: https://review.whamcloud.com/49045
Lustre-commit: TBD (from 636e97a22936a1fab8d9e5fde40f6e1f9a1c5bc5)

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I50a0ee572d24ddc73f8af6dc32ef701c260e45b7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49194
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Wed, 16 Nov 2022 09:26:33 +0000 (04:26 -0500)]
LU-6399 pcc: add tunable parameter for PCC attach thread

Currently the max number of kernel threads doing asynchronous
attach is a hard-coded value (1024 by default).
In this patch, we make it a tunable parameter:
llite.*.pcc_max_attach_thread_num
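
A bounded thread count driven by a tunable might look like the sketch below. The structure and function names are hypothetical, the real code is kernel-side and concurrent, and refusing to start a new thread once the limit is reached is an assumption about the behaviour.

```c
#include <stdbool.h>

#define MAX_ATTACH_THREADS_DEFAULT 1024u	/* default from the commit */

/* Hypothetical bookkeeping for asynchronous attach threads. */
struct attach_limiter {
	unsigned int max_threads;	/* tunable upper bound */
	unsigned int running;		/* attach threads currently active */
};

/* Reserve a slot for a new attach thread; false if the limit is hit. */
bool attach_thread_try_start(struct attach_limiter *l)
{
	if (l->running >= l->max_threads)
		return false;
	l->running++;
	return true;
}

/* Release the slot held by a finished attach thread. */
void attach_thread_done(struct attach_limiter *l)
{
	if (l->running > 0)
		l->running--;
}
```

Lowering `max_threads` at runtime in this model does not stop threads that are already running; it only blocks new ones until enough have finished.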

Change-Id: Ic59c15af935dd8dff586fa6be3939d4322c136d5
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49168
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Gaurang Tapase [Fri, 11 Nov 2022 18:26:30 +0000 (23:56 +0530)]
EX-6372 lipe: Remove colocation constraint from lamigo/lpurge resources

We now rely on the node attribute *-recovered to start HP resources.
Hence, starting with ES 5.2.7, colocation constraints are not needed
to start resources. Moreover, with the rules added, base FS
target resources cannot start on the designated nodes because the
node gets a -inf score. This prevents resource failback when the
original server comes back up after failover.

Test-Parameters: trivial

Signed-off-by: Gaurang Tapase <gtapase@ddn.com>
Change-Id: I890b12bf8a0d75d618a041be1eb27960dc62cc7e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49179
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artur Novik <anovik@ddn.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Bobi Jam [Tue, 20 Sep 2022 16:27:04 +0000 (00:27 +0800)]
LU-16160 llite: clear stale page's uptodate bit

With the truncate_inode_page()->do_invalidatepage()->ll_invalidatepage()
call path, before the vmpage is deleted from the page cache, the page
could be picked up by ll_read_ahead_page()->grab_cache_page_nowait().

If ll_invalidatepage()->cl_page_delete() does not clear the vmpage's
uptodate bit, read-ahead could pick the page up and wrongly treat it
as already uptodate.

In ll_fault()->vvp_io_fault_start()->vvp_io_kernel_fault(),
filemap_fault() calls ll_readpage() to read the vmpage and waits for
the vmpage to be unlocked. After ll_readpage() successfully reads the
vmpage and unlocks it, memory pressure or truncate can get in and
delete the cl_page. filemap_fault() then finds that the vmpage is not
uptodate and returns VM_FAULT_SIGBUS. To fix this situation, this
patch makes vvp_io_kernel_fault() restart filemap_fault() to get an
uptodate vmpage again.

Lustre-change: https://review.whamcloud.com/48607
Lustre-commit: 5b911e03261c3de6b0c2934c86dd191f01af4f2f

Test-Parameters: testlist=sanityn env=ONLY="16f",ONLY_REPEAT=50
Test-Parameters: testlist=sanityn env=ONLY="16g",ONLY_REPEAT=50
Test-Parameters: testlist=sanityn env=ONLY="16f 16g",ONLY_REPEAT=50
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I369e1362ffb071ec0a4de3cd5bad27a87cff5e05
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49131
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Wed, 16 Nov 2022 19:56:58 +0000 (11:56 -0800)]
LU-16304 kernel: kernel update RHEL8.7 [4.18.0-425.3.1.el8]

Update RHEL8.7 kernel to 4.18.0-425.3.1.el8.

Lustre-change: https://review.whamcloud.com/49080
Lustre-commit: TBD (from 8900b469b4d521361d31ca96fed23c49a141fe93)

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I13e6d83ada1ec0c4da92f307bf56db5281c41892
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Thu, 10 Nov 2022 18:45:10 +0000 (10:45 -0800)]
LU-16294 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.28.1]

Update SLES15 SP4 kernel to 5.14.21-150400.24.28.1 for Lustre client.

Lustre-change: https://review.whamcloud.com/49046
Lustre-commit: TBD (from 6573047b9b577a908ee3ea4ce0904d34cd867912)

Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 101j 244a" testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I651894274a09b6240f321e787736d298c5dc41ce
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49104
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Thu, 17 Nov 2022 22:29:40 +0000 (14:29 -0800)]
EX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list

This patch adds hot-pools test 72 into ALWAYS_EXCEPT list before
it gets a real fix.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I5d73cb38d08533656c64b69f814f1d34e5e667ff
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49184
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Arthur Novik [Wed, 5 Oct 2022 05:16:48 +0000 (22:16 -0700)]
EX-5758 lipe: complete recovery before hotpools start

Added Pacemaker location rules for lamigo and lpurge which force
these resources to start only after OST/MDT recovery completes.
This is conditional on a newer Lustre Resource Agent being installed.

Lustre-change: https://review.whamcloud.com/48248
Lustre-commit: f093aef6cbc1a02f8a1b8795f79a4c6d10137a30

Test-Parameters: trivial testlist=hot-pools
Change-Id: Icb3405ca55d5ae940d978b16461d8d4bc2d4d623
Signed-off-by: Arthur Novik <artur_novik@epam.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49142
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alexandre Ioffe [Tue, 15 Nov 2022 07:16:25 +0000 (23:16 -0800)]
EX-6379 lipe: add dump_fids option in help

Added the missing dump_fids command-line option to the
command-line help.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I197fb7beb3e8712736fa29bb49d2df1ee4517616
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49161
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13176 mdd: rename file with different project ID
Hongchao Zhang [Tue, 11 Jan 2022 15:12:55 +0000 (23:12 +0800)]
LU-13176 mdd: rename file with different project ID

This patch relaxes the limitation for rename between different
project IDs, and it will allow the normal file rename between
directories with different project IDs.

Lustre-change: https://review.whamcloud.com/45660
Lustre-commit: 88c26912a3237fb63923bbb7c7b09111f9f30bbe

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I4a2c21248d1e12ad1d00430e11e5dd50fe5eaf60
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49056
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15435 ptlrpc: unregister reply buffer on rq_err
Alexander Zarochentsev [Fri, 14 Jan 2022 15:35:48 +0000 (10:35 -0500)]
LU-15435 ptlrpc: unregister reply buffer on rq_err

Unregister reply buffer on rq_err and prevent a late reply from
modifying request flags in INTERPRET state.

Fixes: cefabee52586 ("LU-15112 mgc: do not ignore target registration failure")
HPE-bug-id: LUS-10717

Lustre-change: https://review.whamcloud.com/46132
Lustre-commit: d8012811cc6ff9c7f0fb1ddfec9461e9ff963e54

Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I0106e3fd5443c1292c103247cdbf6122f91922e8
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49090
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16222 kernel: RHEL 8.7 client and server support
Jian Yu [Fri, 11 Nov 2022 23:17:19 +0000 (15:17 -0800)]
LU-16222 kernel: RHEL 8.7 client and server support

This patch makes changes to support RHEL 8.7 release
with kernel 4.18.0-423.el8 for Lustre client and server.

Lustre-change: https://review.whamcloud.com/48879
Lustre-commit: 293844d132b79a1d256ed4200d5dbd8bb790bfb4

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Change-Id: Ie97ff67c9a5fbd46bc145ab559665dcbc630b4a0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Co-Authored-By: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49000
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn66
Andreas Dilger [Fri, 11 Nov 2022 09:46:43 +0000 (02:46 -0700)]
RM-620 build: New tag 2.14.0-ddn66

New tag 2.14.0-ddn66

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I04f5c407499930a1893d32c0c699c438264dcaf5

2 years agoEX-6298 lipe: Decrease wait time to reconnect to ALR
Alexandre Ioffe [Thu, 10 Nov 2022 19:42:33 +0000 (11:42 -0800)]
EX-6298 lipe: Decrease wait time to reconnect to ALR

1) Made the delay between reconnections to ALR increase gradually,
starting from as little as 5 seconds, when the ssh session
to ALR fails. This makes reconnect attempts more frequent
initially.
2) Enabled hot-pools test 72, which was previously excepted.
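
A minimal sketch of the reconnect schedule described in item 1. Only the
5-second starting delay comes from this commit message; the doubling factor,
the 60-second cap, and the name reconnect_delays are illustrative assumptions,
not lamigo's actual code.

```python
from itertools import islice


def reconnect_delays(initial=5, factor=2, cap=60):
    """Yield the wait (in seconds) before each successive reconnect attempt.

    The factor and cap are assumed values chosen for illustration.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


# First five waits under the assumed parameters.
print(list(islice(reconnect_delays(), 5)))  # [5, 10, 20, 40, 60]
```

Starting small keeps the early retries frequent, while the cap bounds how long
a recovered ALR could go unnoticed.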

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools mdtcount=6 env=ONLY=72,ONLY_REPEAT=40
Change-Id: Iafae62d733390f92370f5d224830944f285da934
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49106
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6279 lipe: need python and pylint for all builds
Lei Feng [Tue, 1 Nov 2022 23:12:53 +0000 (07:12 +0800)]
EX-6279 lipe: need python and pylint for all builds

Check that python and pylint are available for all builds.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I7e93ec3cdd51d96ed938f6fa85953b9e3f250877
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16293 kernel: kernel update RHEL9.0 [5.14.0-70.30.1.el9_0]
Jian Yu [Tue, 8 Nov 2022 18:40:24 +0000 (10:40 -0800)]
LU-16293 kernel: kernel update RHEL9.0 [5.14.0-70.30.1.el9_0]

Update RHEL9.0 kernel to 5.14.0-70.30.1.el9_0 for Lustre client.

Lustre-change: https://review.whamcloud.com/49044
Lustre-commit: TBD (from 247849f22a32e85eb8b718d18642f65ac7663a82)

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="101j 130 244a" testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ide942f88242c80af1e103b226b65cfbce94bfb57
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49074
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15935 target: keep track of multirpc slots in last_rcvd
Etienne AUJAMES [Fri, 29 Jul 2022 12:35:33 +0000 (14:35 +0200)]
LU-15935 target: keep track of multirpc slots in last_rcvd

OBD_INCOMPAT_MULTI_RPCS is cleared by tgt_boot_epoch_update() if the
recovery is aborted. This supposes that all the clients are evicted
but that is not true. Some clients could have successfully finished
their recovery. In that case, those clients will keep their last_rcvd
slot.

This patch modifies lut_num_client to keep track of multirpc
slots in last_rcvd.
For now the counter is used only by tgt_fini() to clear
OBD_INCOMPAT_MULTI_RPCS, so we can extend this use to
tgt_boot_epoch_update().

Add replay-dual test_33.

Lustre-change: https://review.whamcloud.com/48082
Lustre-commit: 1a79d395dd61ea2e21598bfaa5b39375e64ec22c

Test-Parameters: testlist=replay-dual env=ONLY=33,ONLY_REPEAT=30
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I70791c9dcb7cc77f018b9e5c95568598d54f0322
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15404 ldiskfs: truncate during setxattr leads to kernel panic
Andrew Perepechko [Thu, 10 Nov 2022 04:59:27 +0000 (20:59 -0800)]
LU-15404 ldiskfs: truncate during setxattr leads to kernel panic

When changing a large xattr value to a different large xattr value,
the old xattr inode is freed. Truncate during the final iput causes
current transaction restart. Eventually, parent inode bh is marked
dirty and kernel panic happens when jbd2 figures out that this bh
belongs to the committed transaction.

A possible fix is to call this final iput in a separate thread.
This way, setxattr transactions will never be split into two.
Since the setxattr code adds xattr inodes with nlink=0 into the
orphan list, old xattr inodes will be properly cleaned up in
any case.

Lustre-change: https://review.whamcloud.com/46358
Lustre-commit: e239a14001b62d96c186ae2c9f58402f73e63dcc

Change-Id: Idd70befa6a83818ece06daccf9bb6256812674b9
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-10534
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48999
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16251 obdclass: fill jobid in a safe way
Lei Feng [Wed, 19 Oct 2022 04:10:23 +0000 (12:10 +0800)]
LU-16251 obdclass: fill jobid in a safe way

jobid_interpret_string() does not fill jobid in an atomic way.
So in lustre_get_jobid() give it a buffer first, then copy the
buffer to jobid as a whole.
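
The pattern can be modelled outside the kernel as "assemble in a private
buffer, then publish in one copy". The sketch below is only a Python model of
that ordering; build_jobid, set_jobid and the 32-byte size are hypothetical
names and values, and it makes no claim about the kernel code's actual
locking or atomicity.

```python
def build_jobid(parts, size=32):
    """Assemble a job ID in a private, NUL-padded buffer of `size` bytes."""
    scratch = bytearray(size)
    pos = 0
    for part in parts:
        # Leave room for at least one trailing NUL byte.
        data = part.encode()[: size - 1 - pos]
        scratch[pos:pos + len(data)] = data
        pos += len(data)
    return bytes(scratch)


def set_jobid(shared, parts):
    """Publish the finished job ID with a single whole-buffer copy."""
    shared[:] = build_jobid(parts, len(shared))


shared = bytearray(b"x" * 32)
set_jobid(shared, ["sleep", ".", "1000"])
print(shared.rstrip(b"\0"))  # bytearray(b'sleep.1000')
```

Readers of the shared buffer never see the piecewise intermediate states that
occur while the private buffer is being filled.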

Lustre-change: https://review.whamcloud.com/48915
Lustre-commit: 9a0a89520e8b57bd63a9343fe3cdc56c61c41f6d

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ib8f6aaa93df31867982a0d142f33d7374a27234f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49081
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode
Alexander Zarochentsev [Fri, 29 Jul 2022 19:38:09 +0000 (22:38 +0300)]
LU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode

The flag should be cleared for "fast" symlinks otherwise
e2fsck complains about inode correctness.
New agent inodes of symlink type may have EXT4_EXTENT_FL flag
set if the fs has "extent" feature and it is not cleared as in
other places where "fast" symlinks are created.

Lustre-change: https://review.whamcloud.com/48093
Lustre-commit: 73ac8e35e5d64d3fe4ca6c48514dc57058e3a7b8

HPE-bug-id: LUS-10237
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib7b807bb1298cc3a9fd4fdba35747b4bda6fe034
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49016
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16258 llite: Explicitly support .splice_write
Shaun Tancheff [Fri, 21 Oct 2022 04:54:49 +0000 (23:54 -0500)]
LU-16258 llite: Explicitly support .splice_write

Linux commit v5.9-rc1-6-g36e2c7421f02
  fs: don't allow splice read/write without explicit ops

Lustre supports splice_write and previously provided handlers
for splice_read.
Explicitly use iter_file_splice_write, if it exists.

Lustre-change: https://review.whamcloud.com/48928
Lustre-commit: c619b6d6a54235cc0e34a65cf5916a632f4011c3

HPE-bug-id: LUS-11259
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I858688fc9b4dd370b6018c3b134f01e580477b25
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16207 build: add rpm-build BuildRequires for SLES15 SP3
Jian Yu [Tue, 4 Oct 2022 16:24:36 +0000 (09:24 -0700)]
LU-16207 build: add rpm-build BuildRequires for SLES15 SP3

SLES15 SP3 fails to build using rpm-build-4.14.1-29.46
from the main O/S repository with error message:

- Dependency tokens must begin with alpha-numeric,
  '_' or '/': BuildRequires: %kernel_module_package_buildreqs

Updating rpm-build to 4.14.3-150300.46.1 or higher
resolved the build issue.

Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity

Lustre-change: https://review.whamcloud.com/48760
Lustre-commit: 78c681d9f42cb56e30c8946e5d7b05f0bc6e86f2

Change-Id: I80099e7ba2d98e07b9877183879766f3dd7f3c1a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49079
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5473 tests: add version check for interop
Minh Diep [Wed, 9 Nov 2022 21:21:32 +0000 (13:21 -0800)]
EX-5473 tests: add version check for interop

sanity-quota test_75 on 2.12 servers

Test-Parameters: trivial testlist=sanity-quota

Change-Id: I57f5b6415017ec7cf81e3bcb43f289087a8621fd
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6331 lipe: lamigo --help causes Segmentation fault
Alexandre Ioffe [Tue, 8 Nov 2022 18:32:25 +0000 (10:32 -0800)]
EX-6331 lipe: lamigo --help causes Segmentation fault

Fixed printf NULL string argument which causes the seg fault

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I0a9bc3cee308c8cd88d23674bb5127cddb1fdb41
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15847 target: report multiple transno to debug log
Mikhail Pershin [Wed, 26 Oct 2022 08:17:11 +0000 (11:17 +0300)]
LU-15847 target: report multiple transno to debug log

Don't report multiple transaction cases to the console;
log them as debug messages instead.

Lustre-change: https://review.whamcloud.com/49027
Lustre-commit: TBD (1550da71c46f65b72951c0348f32835ed7f617fb)

Fixes: 4e2e8fd2fc0a ("LU-15847 tgt: reply always with the latest assigned transno")
Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: If9b47dfedcaf67487954189e8a75d2029a502469
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49027
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list
Jian Yu [Wed, 9 Nov 2022 19:06:37 +0000 (11:06 -0800)]
EX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list

This patch adds hot-pools test 72 into ALWAYS_EXCEPT list before
it gets a real fix.

Test-Parameters: trivial testlist=hot-pools

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: If214f7285dfb96dee24e6c5968f1f19c81ce1ddf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49085
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15179 tests: add trap cleanup_quota_test
Sergey Cheremencev [Wed, 2 Nov 2022 10:08:50 +0000 (18:08 +0800)]
LU-15179 tests: add trap cleanup_quota_test

Add stack_trap cleanup_quota_test to the tests that
use setup_quota_test. If a test fails without calling
cleanup_quota_test, it may cause later tests to fail
due to used space > 0.

Remove ${tdir}_dom, if exists, in cleanup_quota_test.
sanity-quota_75 doesn't remove test_dom directory.

Lustre-change: https://review.whamcloud.com/#/c/45418/
Lustre-commit: c44b2bea1bacc3cb9173353037cf3a616f13669f

Test-Parameters: trivial  testlist=sanity-quota
Fixes: a4fbe734("LU-14739 quota: nodemap squashed root cannot bypass quota")
Change-Id: Ife4fd499b427bee79f74a5e172d233fe6a83e240
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48705
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14958 kernel: use rhashtable for revoke records in jbd2
Alex Zhuravlev [Wed, 12 Oct 2022 07:32:36 +0000 (10:32 +0300)]
LU-14958 kernel: use rhashtable for revoke records in jbd2

A resizable hashtable should improve journal replay time when
the journal has millions of revoke records.

before:
1048576 records - 95 seconds
2097152 records - 580 seconds

after:
1048576 records - 2 seconds
2097152 records - 3 seconds
4194304 records - 7 seconds
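
The timings above can be turned into speedup and scaling ratios with a few
lines of arithmetic. The numbers are taken from this commit message; the
reading that the old replay time grew much faster than linearly is an
interpretation of those numbers, not a statement from the patch.

```python
# Replay times in seconds, keyed by number of revoke records.
before = {1048576: 95, 2097152: 580}
after = {1048576: 2, 2097152: 3, 4194304: 7}

for records in sorted(before):
    speedup = before[records] / after[records]
    print(f"{records} records: {speedup:.1f}x faster")
# 1048576 records: 47.5x faster
# 2097152 records: 193.3x faster

# How replay time grows when the record count doubles from 1M to 2M.
growth_before = before[2097152] / before[1048576]
growth_after = after[2097152] / after[1048576]
print(f"before: {growth_before:.1f}x, after: {growth_after:.1f}x")
# before: 6.1x, after: 1.5x
```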

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8f54a51df5e3387277b976e046eea70c26d54dcd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48522
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16232 scripts: changelog/updatelog emergency cleanup
Mikhail Pershin [Wed, 12 Oct 2022 09:22:14 +0000 (12:22 +0300)]
LU-16232 scripts: changelog/updatelog emergency cleanup

Emergency cleanup scripts for situations when llogs are
corrupted and can't be cleaned up in a normal way. In such
cases the recommendation is to remove/truncate those llogs.

The scripts perform all needed steps and have a debugging option to
collect llogs for further analysis.

Possible script actions are:
 - dry-run mode to check all actions and files affected
 - create archive with all llogs for analysis
 - remove llogs including all plain llogs

Lustre-change: https://review.whamcloud.com/48838
Lustre-commit: b533700add91fe4220f50d057a470e0b6f4893c9

Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3b197179bc54f451e3c5d7db36b6f1c56c076856
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49023
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16203 llog: skip bad records in llog
Mikhail Pershin [Mon, 3 Oct 2022 15:35:25 +0000 (18:35 +0300)]
LU-16203 llog: skip bad records in llog

This patch further develops the idea of skipping bad
(corrupted) llog data. If an llog has fixed-size records,
it is possible to skip a single record rather than the rest of
the llog block.
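
A minimal sketch of why fixed-size records make single-record skipping
possible: the next record always starts REC_SIZE bytes later, so one damaged
header does not hide the records behind it. The 8-byte layout, REC_SIZE and
read_block are hypothetical and do not reflect the real llog record format.

```python
import struct

REC_SIZE = 8  # hypothetical fixed record size: 4-byte length + 4-byte index


def read_block(block):
    """Return (good_indices, skipped_count) for a block of fixed-size records."""
    good, skipped = [], 0
    for off in range(0, len(block) - REC_SIZE + 1, REC_SIZE):
        rec_len, index = struct.unpack_from("<II", block, off)
        if rec_len != REC_SIZE:
            # Corrupted header: skip this record only and keep going.
            skipped += 1
            continue
        good.append(index)
    return good, skipped


block = b"".join(struct.pack("<II", REC_SIZE, i) for i in range(1, 5))
damaged = block[:8] + b"\xff" * 8 + block[16:]  # overwrite record 2
print(read_block(damaged))  # ([1, 3, 4], 1)
```

With variable-size records the damaged length field would be needed to find
the next record, which is why the rest of the block has to be skipped in that
case.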

Patch also fixes the skipping to the next chunk:
 - make sure to skip to the next block for partial chunk
   or it causes the same block re-read.
 - handle index == 0 as goal for the llog_next_block() as
   expected exclusion and just return requested block
 - set new index after block was skipped to the first one
   in block
 - don't create fake padding record in llog_osd_next_block()
   as the caller can handle it and would know about
 - restore test_8 functionality to check corruption handling

Lustre-change: https://review.whamcloud.com/48776
Lustre-commit: TBD (from 5896c420d82507f90473414df3e6d342126cc21f)

Fixes: ec4194e4e78c ("LU-11591 llog: add synchronization for the last record")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6f88269e8626269268352f8bfd6d7950de438f3a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48897
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14661 obdclass: Add peer/peer NI when processing llog
Chris Horn [Thu, 3 Sep 2020 20:06:08 +0000 (15:06 -0500)]
LU-14661 obdclass: Add peer/peer NI when processing llog

Construct peers when processing the config log so that LNet has
complete information about peer info stored in the config log.

These are "temporary" peers which can be overwritten by discovery.

In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to take
the LNet API mutex when adding/modifying peers. We don't want to take
the mutex while a spin lock is already being held, so drop the spin
lock prior to calling class_add_nids_to_uuid().

Lustre-change: https://review.whamcloud.com/43510
Lustre-commit: 16321de596f6395153be6cbb6192250516963077

HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie0e35434c9b76f917c1448064c5217c821b1ad87
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14661 lnet: Provide kernel API for adding peers
Chris Horn [Wed, 2 Sep 2020 20:07:25 +0000 (15:07 -0500)]
LU-14661 lnet: Provide kernel API for adding peers

Implement LNetAddPeer() API to allow other kernel modules to add
peers to LNet.

Peers created via this API are not marked as having been configured
by DLC. As such, they can be overwritten by discovery.

Lustre-change: https://review.whamcloud.com/43509
Lustre-commit: ac201366ad5700edc860c139955af8a09bf53a1a

Test-Parameters: trivial
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibb057f702ea29d60233fbd1680d8caec98064d5d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48965
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16269 kernel: kernel update RHEL8.6 [4.18.0-372.32.1.el8_6]
Jian Yu [Thu, 3 Nov 2022 20:09:15 +0000 (13:09 -0700)]
LU-16269 kernel: kernel update RHEL8.6 [4.18.0-372.32.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.32.1.el8_6.

Lustre-change: https://review.whamcloud.com/48969
Lustre-commit: TBD (from c4a23690d3328447c7b4ddbb8f567b2de21457b6)

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I5576180ddf10ed2b0a5e2ef85b58fef993de65a4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49033
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15259 tests: use existing usernames for setfacl
Andreas Dilger [Tue, 13 Sep 2022 18:06:10 +0000 (02:06 +0800)]
LU-15259 tests: use existing usernames for setfacl

In SLES15.2 and Ubuntu 20 the "bin" and "daemon" users are not
defined in /etc/passwd, causing setfacl to print a cryptic error:

  setfacl -m u:bin:rw f -- failed
  ~     ? setfacl: Option -m: Invalid argument near character 3

Replace "bin" and "daemon" in ACL tests so they are run with user
and group names that exist on all distros currently being tested.
They can also be specified via ACLUSR1/ACLUSR2 in the test config.

The "permission_xattr" test also needs "nobody" user and group.

Also, the "getfacl" command prints users and groups in numerical
order, so the ACL tests will fail if "daemon" < "bin", or if either
group is higher than the "users" group.  Fix them as needed.
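
The ordering problem can be illustrated with a short sketch: if entries are
listed by numeric ID, the same two names appear in a different order on a
system where their IDs are swapped. The ID values and the helper name
acl_print_order are made up for illustration and are not taken from any
particular distro.

```python
def acl_print_order(name_to_id):
    """Return names in the order a numerically sorted listing would show them."""
    return [name for name, _ in sorted(name_to_id.items(), key=lambda kv: kv[1])]


distro_a = {"bin": 1, "daemon": 2}  # hypothetical IDs
distro_b = {"bin": 2, "daemon": 1}  # same names, IDs swapped

print(acl_print_order(distro_a))  # ['bin', 'daemon']
print(acl_print_order(distro_b))  # ['daemon', 'bin']
```

A test that compares the listing against a fixed expected text passes on the
first system and fails on the second, even though the ACLs are equivalent.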

Lustre-change: https://review.whamcloud.com/45627
Lustre-commit: 60188994e24b95db5915b8e6802f7963ffb2fd9c

Test-Parameters: trivial testlist=sanity-quota,sanity-sec,pjdfstest
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el7.9 serverdistro=el7.9
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el8.6
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=ubuntu2004

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7003e95577ab3a9314e8d4d29bb6b1784b9f8ae7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48497
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-11787 test: Fix checkfilemap tests for 64K page
James Simmons [Mon, 31 Jan 2022 17:44:46 +0000 (12:44 -0500)]
LU-11787 test: Fix checkfilemap tests for 64K page

File mapping is page size aligned. Modify the tests to handle 64K
page.
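
A minimal sketch of page-size alignment, showing how the same byte range maps
to different aligned extents with 4K and 64K pages. The helper page_align is
hypothetical; how the checkfilemap tests actually compute their expected
values is not described in this commit message.

```python
def page_align(offset, length, page_size):
    """Return the (start, end) byte range rounded out to page boundaries."""
    start = offset - offset % page_size
    # Ceiling division, then scale back up to bytes.
    end = -(-(offset + length) // page_size) * page_size
    return start, end


print(page_align(5000, 1000, 4096))   # (4096, 8192)
print(page_align(5000, 1000, 65536))  # (0, 65536)
```

Expected values hard-coded for 4K pages therefore do not hold on a 64K-page
client such as the aarch64 configuration in the test parameters.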

Lustre-change: https://review.whamcloud.com/45629
Lustre-commit: 7c88dfd28b5cc6114a85f187ecb2473657d42c9d

Test-Parameters: trivial clientdistro=el8.5 clientarch=aarch64 testlist=sanityn env=ONLY="71a 71b"
Change-Id: I316a197db8cdd0f9064431f8c572b43adf6110b8
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48945
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15278 lod: distinguish DIR/REGULAR lod_object members
Bobi Jam [Sat, 25 Dec 2021 14:36:40 +0000 (22:36 +0800)]
LU-15278 lod: distinguish DIR/REGULAR lod_object members

In lod_striping_free_nolock(), we need to distinguish the lod_object
type. Since the DIR and REGULAR lod_object structures share the same
memory region, treating a DIR lod_object as a REGULAR one, or vice
versa, could accidentally free unintended memory.

Lustre-change: https://review.whamcloud.com/45710
Lustre-commit: 7a9c9ccabe93f2d96c80e90f8cbb786faca74835

Fixes: 6a20bdcc608b ("LU-11376 lov: new foreign LOV format")
Fixes: fdad38781ccc ("LU-11376 lmv: new foreign LMV format")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I2d4c563725b35f7a75f0f1fbf9c1d35b1799eff4
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/45940
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoEX-4147 tests: fix interop for sanity test_160h
Xing Huang [Thu, 27 Oct 2022 11:41:11 +0000 (19:41 +0800)]
EX-4147 tests: fix interop for sanity test_160h

Add a check to sanity test_160h for whether /sbin/umount.lustre is
installed on the MDS, since this subtest checks whether the MDS unmount
process has completed and otherwise fails during interop testing.

Test-Parameters: testlist=sanity env=ONLY=160 serverversion=EXA5
Fixes: 6d62073950ac ("EX-3209 lipe: add lpcc util and service")
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I6720b9e27a3a92e543ed877453802d23c0eef36d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48970
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn65
Andreas Dilger [Mon, 31 Oct 2022 04:11:09 +0000 (22:11 -0600)]
RM-620 build: New tag 2.14.0-ddn65

New tag 2.14.0-ddn65

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7bb4b45f5addc0c0d62dcf81c53cb114ad6454c1

2 years agoLU-15829 llite: don't use a kms if it invalid.
Alexey Lyashkov [Thu, 19 May 2022 17:35:18 +0000 (20:35 +0300)]
LU-15829 llite: don't use a kms if it invalid.

Lockless DIO doesn't update the KMS as other IO types do,
so the next read doesn't know the real file size
to be read. Let's avoid using an invalid KMS.

Lustre-change: https://review.whamcloud.com/47395
Lustre-commit: dc907414db16d99e77aecf6bfd41d82b8cf7c36e

Fixes: 6bce5367 (LU-4198 clio: turn on lockless for some kind of IO)
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ie71d3f3cc24fc16c03ed07f9f5a3a17c7fdfa684
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-4141 lipe: lamigo should detect dead OST and restart ALR
Alexandre Ioffe [Tue, 29 Mar 2022 07:48:35 +0000 (00:48 -0700)]
EX-4141 lipe: lamigo should detect dead OST and restart ALR

Use the '# keepalive' message and an ssh read with a timeout
to detect that an OST is down and restart ALR.
Add stats for the ALR last seen message.

To keep lamigo compatible with older versions of
ofd_access_log_reader, lamigo can work in two modes:
1. lamigo does not expect a '# keepalive' message.
In this case, after a timeout it restarts
ofd_access_log_reader silently.
2. lamigo expects a periodic '# keepalive'
message. If lamigo does not receive a keepalive message
or any other message from ofd_access_log_reader
within the timeout, it reports an error message and
restarts ofd_access_log_reader.
lamigo switches from mode 1 to mode 2 once it receives
a '# keepalive' message.
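
The two modes can be sketched as a small state machine. The class and method
names below are hypothetical and the real lamigo is not written this way;
only the behaviour (silent restart until a keepalive has been seen, reported
restart afterwards) follows the description above.

```python
class AlrWatcher:
    """Toy model of the keepalive handling described above."""

    def __init__(self):
        # Mode 1: no keepalive seen yet, so none is expected.
        self.expects_keepalive = False

    def on_message(self, line):
        """Record any message; a keepalive switches the watcher to mode 2."""
        if line.startswith("# keepalive"):
            self.expects_keepalive = True

    def on_timeout(self):
        """Return (action, log_level) when the ssh read times out."""
        if self.expects_keepalive:
            return ("restart", "error")
        return ("restart", "silent")


watcher = AlrWatcher()
print(watcher.on_timeout())  # ('restart', 'silent')
watcher.on_message("# keepalive")
print(watcher.on_timeout())  # ('restart', 'error')
```

Both modes restart the reader on timeout; the difference is only whether the
timeout is treated as an error worth reporting.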

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I55bc92b03ef5b45b72ff59ffd4b450cd1927cdb0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48647
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14719 lod: distributed transaction check space
Lai Siyao [Wed, 30 Mar 2022 21:50:22 +0000 (17:50 -0400)]
LU-14719 lod: distributed transaction check space

A distributed transaction failure may cause missing files or
disconnected directories. To avoid failure when the disk is full, check
remote MDT free space before the transaction starts.

The block/inode watermarks in obd_statfs_info are used to check
whether MDT has enough free blocks/inodes.
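
A minimal sketch of such a pre-check, assuming each MDT involved reports its
free blocks and inodes together with the corresponding watermarks. The field
names, the strict "greater than" comparison, and the helper names are
assumptions made for illustration; they are not the obd_statfs_info layout.

```python
from dataclasses import dataclass


@dataclass
class MdtSpace:
    """Hypothetical per-MDT view of free space and watermarks."""
    free_blocks: int
    free_inodes: int
    block_watermark: int
    inode_watermark: int


def has_room(space):
    """True if both free blocks and free inodes are above their watermarks."""
    return (space.free_blocks > space.block_watermark
            and space.free_inodes > space.inode_watermark)


def can_start(targets):
    """Only start the distributed transaction if every MDT has room."""
    return all(has_room(t) for t in targets)


local = MdtSpace(10_000, 5_000, 1_000, 100)
nearly_full = MdtSpace(500, 5_000, 1_000, 100)
print(can_start([local]))               # True
print(can_start([local, nearly_full]))  # False
```

Refusing up front is cheaper than failing after part of the operation has
already been committed on another MDT.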

Add sanity 230x.

Lustre-commit: 6aee406c84b6b8fddf08b560acfcdf7c13c97e63
Lustre-change: https://review.whamcloud.com/47039

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0922e9c8668e8b842d313576bd68b52fa5d434ac
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-6193 pcc: dio attach failed on non-blksz-aligned file
Qian Yingjin [Fri, 21 Oct 2022 03:49:35 +0000 (23:49 -0400)]
EX-6193 pcc: dio attach failed on non-blksz-aligned file

PCC attach failed when doing a DIO copy on files whose size is not
blksz-aligned.
The reason is that the copy tool ll_fid_path_copy fails on
a non-blksize-aligned file for a PCC backend (such as a local Ext4
file system) using direct I/O.
This patch fixes the bug by falling back from direct I/O to
buffered I/O mode when copying the non-blksize-aligned tail of
the file.
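
The fallback can be sketched as a copy plan that splits the file into a
block-aligned body and an unaligned tail. The function plan_copy and its
tuple format are hypothetical; the sketch only shows the split, not how
ll_fid_path_copy performs the I/O.

```python
def plan_copy(file_size, blksz):
    """Return a list of (mode, offset, length) steps for copying a file."""
    aligned = file_size - file_size % blksz
    plan = []
    if aligned:
        # Whole blocks can go through direct I/O.
        plan.append(("direct", 0, aligned))
    if file_size > aligned:
        # The remaining partial block falls back to buffered I/O.
        plan.append(("buffered", aligned, file_size - aligned))
    return plan


print(plan_copy(10000, 4096))  # [('direct', 0, 8192), ('buffered', 8192, 1808)]
print(plan_copy(8192, 4096))   # [('direct', 0, 8192)]
```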

This patch also sets the errno with return code in the function
@get_root_path(), thus the call for @llaip_open_by_fid() with
invalid mount path will see the correct errno.

Change-Id: I5287563029269032a91397c0094e2ccede73b9b1
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48927
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15031 quota: reseed glbe in qmt_lvbo_udate
Sergey Cheremencev [Fri, 28 Oct 2022 10:29:03 +0000 (18:29 +0800)]
LU-15031 quota: reseed glbe in qmt_lvbo_udate

Reseed the glbe array in qmt_lvbo_update after changing edquot.
Without the fix, the edquot flag wasn't set in the glbe array. Later,
when edquot was cleared, the need_update (nu) flag wasn't set
in the glbe array to notify OSTs of the new edquot.

The patch also adds test 80 to check that OST gets correct
edquot value after failover.

Lustre-change: https://review.whamcloud.com/45032
Lustre-commit: 61ec1e0f2ca8dc4c9f7ed41f782960e65cab0920

HPE-bug-id: LUS-10029
Change-Id: I5b7e1a553e3351c22649431860d51b5a671c6fd9
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15847 tgt: move tti_ transaction params to tsi_
Mikhail Pershin [Sat, 28 May 2022 18:16:11 +0000 (21:16 +0300)]
LU-15847 tgt: move tti_ transaction params to tsi_

Move tti_mult_trans and tti_has_trans to tgt_session_info so they
are available in all targets. This allows cleaning up old duplicated
MDT code and can be used for complex transaction
handling in MDT/OFD if needed.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47491
Lustre-commit: 0a317b171ebedcba8fc58e548991a884186c350c

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3f0c15e283b9e21c04a009f6cf346afa278e7095
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15847 tgt: reply always with the latest assigned transno
Mikhail Pershin [Tue, 31 May 2022 10:38:25 +0000 (13:38 +0300)]
LU-15847 tgt: reply always with the latest assigned transno

In tgt_txn_stop_cb(), don't skip transno assignment in case
of unexpected multiple last_rcvd updates, so that the latest
transno, not the first one, is reported back in the reply.

Reporting just the first transno might lead to data loss at
failover, because a partially committed operation would be
considered fully committed and the rest of the operation would
not be replayed.

Reporting the last assigned transno to the client could cause
replay failures in some cases, which is still better than
possible data loss. So the patch makes the multiple transaction
case less severe.
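
The difference between the two reporting choices can be illustrated with a toy model. This is only a sketch: `reply_transno` and its list argument are hypothetical names for illustration, not the tgt_txn_stop_cb() code.

```python
def reply_transno(assigned, report_latest=True):
    """Pick the transno to report for one request.

    `assigned` is a hypothetical list of the transnos handed out, in
    order, by each last_rcvd update made while handling the request.
    """
    if not assigned:
        return 0  # no transaction was started for this request
    # Old behaviour: keep the first transno. New behaviour: keep the last.
    return assigned[-1] if report_latest else assigned[0]


# One request that unexpectedly produced three transactions.
transnos = [101, 102, 103]
first = reply_transno(transnos, report_latest=False)
latest = reply_transno(transnos)
```

In this model, a client told `first` would treat the request as committed once transno 101 is on disk, even though the work done under 102 and 103 could still be lost at failover.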

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47492
Lustre-commit: 4e2e8fd2fc0a9a30f47e70dc285a2101d2cbc4c2

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia07e89576127a2fc1eb2ae706551ffe8ceaa93be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15447 tests: sanity-flr/208 reset rotational status
Alex Zhuravlev [Thu, 13 Jan 2022 07:27:21 +0000 (10:27 +0300)]
LU-15447 tests: sanity-flr/208 reset rotational status

New kernels (e.g. 4.18.0-305.25.1) declare loopback devices
in tmpfs as non-rotational. sanity-flr/208 wrongly assumes
that devices are non-rotational by default. Thus,
sanity-flr/208 started to fail with new kernels.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/46088
Lustre-commit: 78dddb423f0dc8571d3c7f8ccd8f77a1c2bc28ae

Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib5c42da39667227a6cff5d379e30d2cd6c1e2773
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16106 lnet: allow direct messages regardless of peer NI status
Serguei Smirnov [Sun, 28 Aug 2022 01:50:16 +0000 (18:50 -0700)]
LU-16106 lnet: allow direct messages regardless of peer NI status

If check_routers_before_use is enabled, the router needs to
be pinged before it is used, which is not possible because
its NIs are assumed to be down at start-up. Don't prevent
discovery of the router in this case.

This change allows non-routed traffic to peer NIs with "down"
status.

Lustre-commit: 3345a8a54e89c342a4ce2d8d4bcb04ee919bcd52
Lustre-change: https://review.whamcloud.com/c/48355

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I36fa60e37ef4f47c82c69855c9b0b80bad8a36f4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48669
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16025 llite: adjust read count as file got truncated
Bobi Jam [Thu, 7 Jul 2022 07:38:54 +0000 (15:38 +0800)]
LU-16025 llite: adjust read count as file got truncated

A file read does not notice when the file is truncated by another node,
and continues to read zero-filled pages beyond the new file size.

This patch limits the read to the new file size to prevent the issue,
and adds a test case verifying the fix.
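
The adjustment amounts to clamping the requested byte count against the current file size. A minimal sketch, assuming the up-to-date size is already known to the reader; `clamp_read_count` is a hypothetical helper, not the llite function:

```python
def clamp_read_count(pos, count, file_size):
    """Return how many bytes may be read at `pos` without passing EOF."""
    if pos >= file_size:
        return 0  # the read starts at or beyond the (possibly new) EOF
    return min(count, file_size - pos)


# A 4096-byte read at offset 8192 after another node truncated the
# file to 10000 bytes is shortened to the 1808 bytes that remain.
remaining = clamp_read_count(8192, 4096, 10000)
```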

Lustre-change: https://review.whamcloud.com/47896
Lustre-commit: 4468f6c9d92448cb72c5a616ec74653e83ee8e10

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie51ba09201a1ca1464c3a3892d367590e978ee34
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48848
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14642 test: add fsx mirror file test mode
Bobi Jam [Thu, 2 Sep 2021 16:30:10 +0000 (00:30 +0800)]
LU-14642 test: add fsx mirror file test mode

- add fsx mirror file test mode with "-M" option so that fsx can exert
its IO to FLR file as well as extend/split/resync the FLR file.

- add sanity-flr test_70b() to test fsx with flrmode.

- fix a bug in "lfs mirror verify" to accommodate max mirror count
instead of (max - 1) mirrors.

- improve "lfs mirror verify -v" to print the proper data range of its
crc-32 checksum values.

Lustre-change: https://review.whamcloud.com/43473
Lustre-commit: 90ba8b4ac360b1987178445bd2ccd64f7958d912

Test-Parameters: testlist=sanity-flr env=ONLY=70a,ONLY_REPEAT=10
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib55c7b25dcd82fa0b197ad21268b16c82aab5da9
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16249 sec: krb5_decrypt_bulk calls decryption primitive
Sebastien Buisson [Tue, 18 Oct 2022 15:19:01 +0000 (17:19 +0200)]
LU-16249 sec: krb5_decrypt_bulk calls decryption primitive

krb5_decrypt_bulk() was mistakenly calling an encryption primitive
instead of a decryption primitive for the confounder.

Lustre-change: https://review.whamcloud.com/48907
Lustre-commit: TBD (851f3915659941db00a0cda58867e68139e5e0d1)

Test-Parameters: trivial
Fixes: 0a65279121 ("LU-13344 gss: Update crypto to use sync_skcipher")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9251172644ed6baa3bb06a59dbe7c1bab401d817
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48909
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15097 quota: stop pool_recalc before killing pool
Sergey Cheremencev [Wed, 19 Oct 2022 11:18:04 +0000 (19:18 +0800)]
LU-15097 quota: stop pool_recalc before killing pool

qmt_start_pool_recalc holds a reference on a pool while
it is running. This thread should be stopped before
putting the last pool reference in qmt_pool_free to be
sure that the pool can finally be freed. The patch helps to
avoid the following ASSERTION:

    qmt_pool_fini() ASSERTION(list_empty(&qmt->qmt_pool_list)) failed
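
The ordering described above (stop the worker, then drop the last reference) can be sketched with a toy reference-counted object. The `Pool` class and its methods are hypothetical and only model the idea; they are not the QMT code.

```python
import threading


class Pool:
    """Toy reference-counted pool with a background recalc worker."""

    def __init__(self):
        self.refs = 1          # reference owned by the pool list
        self.freed = False
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._worker = None

    def get(self):
        with self._lock:
            self.refs += 1

    def put(self):
        with self._lock:
            self.refs -= 1
            if self.refs == 0:
                self.freed = True

    def start_recalc(self):
        self.get()             # the worker pins the pool while it runs
        self._worker = threading.Thread(target=self._recalc)
        self._worker.start()

    def _recalc(self):
        try:
            self._stop.wait()  # stands in for the recalculation work
        finally:
            self.put()         # worker drops its reference on exit

    def free(self):
        # Stop and join the worker first, so that the put() below
        # really is the last reference.
        if self._worker is not None:
            self._stop.set()
            self._worker.join()
        self.put()


pool = Pool()
pool.start_recalc()
pool.free()
```

Without the join in `free()`, the final `put()` could leave the worker's reference outstanding, so the pool would still be alive when the caller expects it to be gone.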

Lustre-change: https://review.whamcloud.com/45256
Lustre-commit: 862f0baa7c21cb631b98d3886ef9e938f4519573

Change-Id: If72042a620d9ded693fcb669bc9148d1f96126a4
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-4567 kernel: add extra field for snapshot in el8
Hongchao Zhang [Fri, 21 Oct 2022 07:43:11 +0000 (03:43 -0400)]
EX-4567 kernel: add extra field for snapshot in el8

Add extra fields used by snapshot to "struct jbd2_journal_handle"
and "struct journal_head". The new fields are placed into the
4-byte hole at the end of struct jbd2_journal_handle so
that they do not increase the structure size and memory
usage for this common allocation.

Use RH_KABI_EXTEND() and RH_KABI_FILL_HOLE() so that the
new fields do not affect the kernel ABI compatibility.

Change-Id: I84f52b18694e56d837d64c5c80076e45dde27eab
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48880
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6102 lipe: lipe_scan3 not intended for customer use
Alexandre Ioffe [Tue, 25 Oct 2022 03:06:08 +0000 (20:06 -0700)]
EX-6102 lipe: lipe_scan3 not intended for customer use

Print a warning that lipe_scan3 is not intended for customer use.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: I92f775d77e1d4ffac304d3e46ed6af7c642a3bdd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-11388 tests: exclude replay-single/131b for ldiskfs
Andreas Dilger [Fri, 14 Oct 2022 21:09:03 +0000 (15:09 -0600)]
LU-11388 tests: exclude replay-single/131b for ldiskfs

The test is failing in about 1/10 of the test runs, even on ldiskfs.

Test-Parameters: trivial testlist=replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9c36d026944876e066a1dc36877927b7a92c537e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48946

2 years agoEX-5099 lipe: Made controllable ssh exec timeout
Alexandre Ioffe [Wed, 13 Apr 2022 05:34:18 +0000 (22:34 -0700)]
EX-5099 lipe: Made controllable ssh exec timeout

- Introduce new lipe ssh API: lipe_ssh_exec_timeout() and
  lipe_ssh_start_cmd_timeout().
- Introduce new lamigo command option --ssh-exec-timeout
  to configure the ssh connection timeout for ssh exec cmd.
- Use lipe_ssh_start_cmd_timeout() to start the remote
  access log reader with a timeout.
  Use ssh_channel_read_timeout() with an infinite timeout
  when reading access log records.
- Use lipe_ssh_start_cmd_timeout() to start remote "lfs ..."
  commands with a long timeout to prevent a premature timeout
  when an "lfs mirror extend ..." command for a big file
  takes too long.
- Use a default timeout of 600 seconds for ssh exec cmd.
  Such a long timeout should allow long-lasting replications
  to finish.

This fixes EX-5429.
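
The idea of a caller-controlled execution timeout with a long default can be sketched as follows. This is an illustration only: it uses Python's `subprocess` module on a local command rather than the lipe ssh API, and `exec_with_timeout` is a hypothetical helper name.

```python
import subprocess
import sys

DEFAULT_EXEC_TIMEOUT = 600  # seconds, the default named in this commit


def exec_with_timeout(argv, timeout=DEFAULT_EXEC_TIMEOUT):
    """Run a command; return its exit code, or None if it timed out."""
    try:
        done = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising.
        return None
    return done.returncode


# A short command finishes well within the default timeout.
rc = exec_with_timeout([sys.executable, "-c", "pass"])
```

A long-running command, such as a replication of a big file, would be given a larger `timeout` by the caller instead of being cut off early.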

Test-Parameters: trivial clientdistro=el8.5 serverdistro=el8.5 testlist=hot-pools env=FAIL_ON_ERROR=false,ONLY="56 59",ONLY_REPEAT=20
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I8de9b1db2014abd1e6f201cda73a0812128f6bb6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47057
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1]
Jian Yu [Fri, 21 Oct 2022 20:35:40 +0000 (13:35 -0700)]
LU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1]

Update SLES15 SP3 kernel to 5.3.18-150300.59.93.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity

Lustre-change: https://review.whamcloud.com/48601
Lustre-commit: c3467db7e7d0652c09bdcef26e2b708ab51cba9e

Change-Id: I1e0afe6974567d13680dbb0d463fbbd873ef2e5f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48864
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed
Andreas Dilger [Thu, 6 Oct 2022 17:31:51 +0000 (10:31 -0700)]
LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed

This patch breaks out of the loop in ptlrpc_free_committed()
if need_resched() is true or there are other threads waiting
on the imp_lock. This avoids the thread holding the CPU for
too long while freeing a large number of requests. The
remaining requests in the list will be processed the next
time this function is called. That also avoids delaying a
single thread too long if the list is long.
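
The early-exit pattern can be sketched as a loop that frees entries until a yield condition fires and leaves the rest for the next call. The names below are hypothetical; `should_yield` stands in for the need_resched() and lock-contention checks.

```python
from collections import deque


def free_committed(replay_list, last_committed, should_yield):
    """Free committed entries from the head of the list.

    Returns the number of entries freed in this call. Entries are
    transnos in increasing order; anything left over is handled by a
    later call.
    """
    freed = 0
    while replay_list and replay_list[0] <= last_committed:
        replay_list.popleft()
        freed += 1
        # Check after freeing so that every call makes progress.
        if should_yield():
            break
    return freed


pending = deque([1, 2, 3, 4, 5])
answers = iter([False, True])
first_pass = free_committed(pending, 4, lambda: next(answers))
second_pass = free_committed(pending, 4, lambda: False)
```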

Lustre-change: https://review.whamcloud.com/48629
Lustre-commit: 9a3e111a2ebdfadec4b6efc65899856edc90ad18

Test-Parameters: testlist=sanity clientdistro=el8.6
Test-Parameters: testlist=sanity clientdistro=ubuntu2204 env=SANITY_EXCEPT="130 244a"
Change-Id: I50f56b87844e8b019053e569767b6c949d2a3f55
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15009 ofd: continue precreate if LAST_ID is less on MDT
Lai Siyao [Thu, 16 Sep 2021 21:49:33 +0000 (17:49 -0400)]
LU-15009 ofd: continue precreate if LAST_ID is less on MDT

It's possible that precreate succeeded on the OST, but the MDT didn't
get the reply and assumed failure. In this case, the LAST_ID on the MDT
is smaller than that on the OST. Instead of reporting an error and
stopping precreate, it's better to move the precreate window forward.

Lustre-change: https://review.whamcloud.com/44984
Lustre-commit: 1711e26ae861c28829870c2433caf7ee232909cf

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia6ca418ec0ea6797b7eccc1610879331307fad07
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16044 osd: discard pagecache in truncate's declaration
Alex Zhuravlev [Mon, 25 Jul 2022 13:26:40 +0000 (16:26 +0300)]
LU-16044 osd: discard pagecache in truncate's declaration

to avoid taking pagelock inside a transaction, which conflicts
with the write path where we take pagelock before any other one.
This should be safe as the write path writes the pages out
synchronously, so they should be clean by truncate.

Lustre-change: https://review.whamcloud.com/48033
Lustre-commit: 0bb491b2ecf494c3f78fa08a101af8af7853a0fe

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Iba555ace2ce9ef34ab5517375ecb5c176f738a02
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48885
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16076 utils: enhance 'lfs check' command
Lei Feng [Mon, 8 Aug 2022 02:59:25 +0000 (10:59 +0800)]
LU-16076 utils: enhance 'lfs check' command

Add an optional argument to the 'lfs check' command so that only the
servers related to the specified Lustre file system are checked.

lustre-change: https://review.whamcloud.com/48155
lustre-commit: f5ca6853b8d8b918b0228af31fa8249be49d3000

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=113
Change-Id: I826a8e822af0a290f06ffaadadf1bb7f86899d99
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15305 obdclass: fix race in class_del_profile
Li Dongyang [Fri, 7 Oct 2022 12:09:10 +0000 (23:09 +1100)]
LU-15305 obdclass: fix race in class_del_profile

Move profile lookup and remove from lustre_profile_list
into the same critical section, otherwise we could race with
class_del_profiles or another class_del_profile.

Do not create duplicate mount opts in the client config,
otherwise we will add duplicate lustre_profile to
lustre_profile_list for a single mount.
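
Both points (lookup and removal under one lock, and no duplicate entries) can be sketched with a small locked registry. `ProfileList` is a hypothetical stand-in for illustration, not the obdclass code.

```python
import threading


class ProfileList:
    """Toy profile registry guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._profiles = {}

    def add(self, name, profile):
        with self._lock:
            if name in self._profiles:
                return False  # refuse to add a duplicate profile
            self._profiles[name] = profile
            return True

    def delete(self, name):
        # Lookup and removal happen in the same critical section, so
        # two concurrent deleters cannot both find the same entry.
        with self._lock:
            return self._profiles.pop(name, None)


profiles = ProfileList()
added = profiles.add("lustre-client", {"mount_opts": "flock"})
duplicate = profiles.add("lustre-client", {"mount_opts": "flock"})
removed = profiles.delete("lustre-client")
removed_again = profiles.delete("lustre-client")
```

If the lookup were done under the lock but the removal outside it, a second deleter could find the same profile in between and remove or free it twice.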

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/48802
Lustre-commit: 83d3f42118579d7fb7c3002533c047badcf41e0d

Change-Id: I648aa206716213b064d045f546516b219337e0ed
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48956
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15467 tests: fix sanity-hsm test_103a timeout issue
Etienne AUJAMES [Fri, 21 Jan 2022 14:49:18 +0000 (15:49 +0100)]
LU-15467 tests: fix sanity-hsm test_103a timeout issue

Add an MDS version check to "sanity-hsm test_103a" for interop testing.
Limit the number of parallel hsm restore requests to
max_rpcs_in_flight.

Lustre-change: https://review.whamcloud.com/46252
Lustre-commit: 98e1e41ce47c95155a8c8d452eef5074492d22f0

Fixes: b449f3d ("LU-15145 hsm: unlock the restore layout lock for a cancel")
Test-Parameters: trivial
Test-Parameters: testlist=sanity-hsm env=ONLY=103a,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I78098042d1316cdcc9d2e25860099a0ffdba2503
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48960
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15646 llog: correct llog FID and path output
Mikhail Pershin [Mon, 17 Oct 2022 19:53:44 +0000 (12:53 -0700)]
LU-15646 llog: correct llog FID and path output

- fix wrong LLOG_ID-to-FID conversion to output llog FID by
  introducing PLOGID macro to expand llog ID for DFID format
- stop printing lgl_ogen along with llog FID as it is always zero
  since 2.3.51 and is not used anymore
- output correct path for update llog in llog_reader
- always print header info in llog_reader if available
- print llog flags in header info

Lustre-change: https://review.whamcloud.com/48430
Lustre-commit: e28f3ee185b2ef7bad8046f46444772fac214a40

Fixes: 5a8e47d0a1a7 ("LU-9153 llog: update llog print format to use FIDs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7ba49e8101a67d2d80c204a5fc629bfd0bce89ad
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48896
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-6612 utils: strengthen llog_reader vs wrong format/header
Bruno Faccini [Mon, 17 Oct 2022 19:46:25 +0000 (12:46 -0700)]
LU-6612 utils: strengthen llog_reader vs wrong format/header

The following snippet shows that llog_reader can be confused by
an invalid 0 for the number of records when parsing an expected
LLOG file header:
root# dd if=/dev/zero bs=4096 count=1 of=/tmp/zeroes
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000263962 s, 15.5 MB/s
root# llog_reader /tmp/zeroes
Memory Alloc for recs_buf error.
Could not pack buffer; rc=-12
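
A defensive check of this kind validates the record count before anything is allocated from it. The sketch below assumes a made-up layout (a 4096-byte block whose first four bytes hold a little-endian record count), which is NOT the real llog header format; `read_record_count` is a hypothetical name.

```python
import struct

# Hypothetical layout for illustration only, not the real llog header.
HEADER_SIZE = 4096


def read_record_count(block):
    """Return the record count from a header block, or raise ValueError."""
    if len(block) < HEADER_SIZE:
        raise ValueError("short header block")
    (count,) = struct.unpack_from("<I", block, 0)
    if count == 0:
        # An all-zero file lands here instead of failing later in an
        # allocation sized from the bogus count.
        raise ValueError("invalid header: record count is 0")
    return count


valid_block = struct.pack("<I", 3) + bytes(HEADER_SIZE - 4)
count = read_record_count(valid_block)
```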

Lustre-change: https://review.whamcloud.com/15654
Lustre-commit: 45291b8c06eebf33d3654db3a7d3cfc5836004a6

Test-Parameters: trivial testlist=sanity,sanity-hsm
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I12be79e6c6a5da384a5fd81878a76a7ea8aa5834
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48895
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15000 llog: read canceled records in llog_backup
Etienne AUJAMES [Mon, 17 Oct 2022 19:37:39 +0000 (12:37 -0700)]
LU-15000 llog: read canceled records in llog_backup

llog_backup() does not reproduce index "holes" in the generated copy.
This could result in a llog copy whose indexes differ from the source.
That might confuse the configuration update mechanism, which relies on
indexes between the MGS source and the target copy.

These index gaps can be caused by "lctl --device MGS llog_cancel".

This patch adds a "raw" read mode to llog_process* to read canceled
records. So now llog_backup is able to reproduce an exact copy of
the original.
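
The effect of the "raw" mode on indexes can be shown with a toy log model. The tuple layout and the `backup` function are hypothetical and only illustrate why skipping canceled records shifts the indexes in the copy.

```python
def backup(source, raw):
    """Copy a toy log of (index, payload, canceled) tuples.

    The destination assigns indexes sequentially as records are
    appended, so skipping canceled records renumbers what follows.
    """
    dest = []
    for _index, payload, canceled in source:
        if canceled and not raw:
            continue  # normal processing never sees canceled records
        dest.append((len(dest) + 1, payload, canceled))
    return dest


def live_indexes(log):
    """Indexes of the records that are not canceled."""
    return [index for index, _payload, canceled in log if not canceled]


source = [(1, "a", False), (2, "b", True), (3, "c", False)]
plain_copy = backup(source, raw=False)
raw_copy = backup(source, raw=True)
```

In this model the plain copy holds record "c" at index 2 while the source has it at index 3, whereas the raw copy keeps the hole and therefore the same indexes as the source.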

Lustre-change: https://review.whamcloud.com/46552
Lustre-commit: d8e2723b4e9409954846939026c599b0b1170e6e

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I811e23de8f4545bed36a44fedc2638d7418830dd
Reviewed-by: Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org>
Reviewed-by: DELBARY Gael <gael.delbary@cea.fr>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14098 obdclass: try to skip corrupted llog records
Alex Zhuravlev [Mon, 17 Oct 2022 19:31:56 +0000 (12:31 -0700)]
LU-14098 obdclass: try to skip corrupted llog records

if llog's header or record is found corrupted, then
ignore the remaining records and try with the next one.

Lustre-change: https://review.whamcloud.com/40754
Lustre-commit: 910eb97c1b43a44a9da2ae14c3b83e28ca6342fc

Fixes: 186f083722 ("LU-11924 osp: combine llog cancel operations")

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If47ec1fc1e2eaf64be7ba08d3aa9c2b93903c0cf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14044 llog: check fid after convert
Yang Sheng [Mon, 17 Oct 2022 18:53:47 +0000 (11:53 -0700)]
LU-14044 llog: check fid after convert

We should convert from llog_id and then check the fid. Also
change fid-lookup to an error check instead of LASSERT.

Lustre-change: https://review.whamcloud.com/40294
Lustre-commit: 6df76d3357fc5896b6902399ed7ce6d7c7835f58

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I673d8f16ff9e57a0482d6a3ec3ee3db33699f57f
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48892
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-5909 tests: clean up in sanity-quota/16a
Andreas Dilger [Fri, 14 Oct 2022 22:04:53 +0000 (16:04 -0600)]
EX-5909 tests: clean up in sanity-quota/16a

Clean up the test file in sanity-quota test_16a.  If test_16b is
run (DNE config) then the filesystem is reformatted, but in the
non-DNE config test_17 will fail if there is used quota.

Test-Parameters: trivial testlist=sanity-quota
Fixes: b54b7ce43929 ("LU-14472 quota: skip non-exist or inact tgt for lfs_quota")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id1faeab9df246d8010bf114582ab17a75846db68
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48899
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn64
Andreas Dilger [Fri, 14 Oct 2022 20:06:26 +0000 (14:06 -0600)]
RM-620 build: New tag 2.14.0-ddn64

New tag 2.14.0-ddn64

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia86edfc375e1dda7205db1a32c8c1933153a3e92

2 years agoLU-15738 test: check lfsck status before starting
Hongchao Zhang [Fri, 22 Jul 2022 15:02:24 +0000 (23:02 +0800)]
LU-15738 test: check lfsck status before starting

If the LFSCK has been started before calling "lfsck_start"
to start it, the test shouldn't fail for starting LFSCK.

Lustre-change: https://review.whamcloud.com/48018/
Lustre-commit: 29aaf679afac89359e1b116b8de0480f24b4e8ac

Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I266d9e2b9c5f37eb9e08b489fab428268b90d895
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5964 lamigo: disable idle disconnects
Alex Zhuravlev [Mon, 19 Sep 2022 16:00:15 +0000 (19:00 +0300)]
EX-5964 lamigo: disable idle disconnects

on the connections lamigo uses locally to avoid storms
of reconnects.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I3bc2742853e9636e38fbd8f7c2f238b3af55e0ba
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48840
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-3142 tests: changelog processing verification
Alex Zhuravlev [Fri, 6 Aug 2021 06:34:31 +0000 (09:34 +0300)]
EX-3142 tests: changelog processing verification

add extra counter to lamigo stats to catch gaps in changelog
processing. add a new test (hot-pools/60) to verify that no
gaps happen (i.e. lamigo gets all changelog records), verify
that the changelog is purged properly.

Test-Parameters: trivial testlist=hot-pools mdscount=2 mdtcount=4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I34d9d6f6f7f5766d945df43ae7d43dab7c70cef1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13578 test: sleep longer in sanity test_39
John L. Hammond [Wed, 8 Jun 2022 02:15:39 +0000 (19:15 -0700)]
LU-13578 test: sleep longer in sanity test_39

In sanity test_39r(), sleep for 2 * atime_diff rather than atime_diff + 1.

Lustre-change: https://review.whamcloud.com/47346
Lustre-commit: be2525ffddb4bf55fde77e97b00d1c349119daed

Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=50
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ied508e12c848f6935d2317fb86bddc5341a6156e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48831
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoLU-15472 ldlm: optimize flock reprocess
Andriy Skulysh [Fri, 5 Nov 2021 10:55:08 +0000 (12:55 +0200)]
LU-15472 ldlm: optimize flock reprocess

Resource reprocess on flock unlock can be done once
after all pending unlock requests.
This reduces spinlock contention.
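
The saving can be illustrated by counting reprocess passes in a toy model. `Resource`, `unlock_each` and `unlock_batched` are hypothetical names, not the ldlm code.

```python
class Resource:
    """Toy lock resource that counts reprocess passes."""

    def __init__(self, locks):
        self.locks = set(locks)
        self.reprocess_count = 0

    def reprocess(self):
        # Stands in for rescanning the waiting list under the spinlock.
        self.reprocess_count += 1


def unlock_each(res, unlocks):
    """Reprocess the resource after every single unlock."""
    for lock in unlocks:
        res.locks.discard(lock)
        res.reprocess()


def unlock_batched(res, unlocks):
    """Apply all pending unlocks, then reprocess the resource once."""
    for lock in unlocks:
        res.locks.discard(lock)
    if unlocks:
        res.reprocess()


per_unlock = Resource({"a", "b", "c"})
unlock_each(per_unlock, ["a", "b", "c"])
batched = Resource({"a", "b", "c"})
unlock_batched(batched, ["a", "b", "c"])
```

Both variants leave the resource with no locks, but the batched one runs a single reprocess pass instead of one per unlock.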

Lustre-change: https://review.whamcloud.com/46257
Lustre-commit: 42f377db4a24cefa7a041fcd3106dd58771eb319

Change-Id: I2809070f27fe3af7e1fc34e2b4b22603931f3dff
HPE-bug-id: LUS-10471, LUS-10909
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48818
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15132 mdc: Use early cancels for hsm requests
Etienne AUJAMES [Mon, 2 May 2022 12:27:17 +0000 (14:27 +0200)]
LU-15132 mdc: Use early cancels for hsm requests

HSM RELEASE and RESTORE requests take EX layout lock on the MDT side.
So the client can use early cancel for its local lock on the resource
to limit the contention (mdt side).

This patch does not pack ldlm request inside the hsm request because
the field (RMF_DLM_REQ) does not exist in the request. Adding this
field inside the request would break compatibility with _old_ servers.

Lustre-change: https://review.whamcloud.com/47181
Lustre-commit: 60d2a4b0efa4a944b558bd9b63b6334f7e70419b

Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I30a57b4855c28eef9c55a9645d3b6c491f962b13
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48652
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>