Whamcloud - gitweb
James Simmons [Thu, 12 Jan 2023 14:02:54 +0000 (09:02 -0500)]
LU-9680 utils: new llapi_param_display_value().
Currently the special YAML handling is done in lustre_cfg.c
for param handling. Other functionality internal to
liblustreapi.so will use this as well so move the handling
internal to liblustreapi.so. For now the new
llapi_param_display_value() function is visible only to the
liblustreapi internal code. Later when we support /sys access
we can make this available for general use.
The "lctl get_param" and "lctl list_param" commands generally worked
for non-root users, but not for parameters under
/sys/kernel/debug, due to permission changes in the kernel, so
non-root access was still incomplete. Implement full lctl get_param
functionality for non-root users, and make lctl list_param work for
non-root users as well. These changes also work with any
parameters implemented with Netlink.
Change-Id: Ifd9aad16decb0803a336314d4dea38664ff41aa4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Qian Yingjin [Tue, 16 Aug 2022 07:57:47 +0000 (03:57 -0400)]
LU-14393 recovery: reply reconstruction for batched RPCs
Batched RPC can boost the metadata performance for Lustre
dramatically. However, it also increases the complexity of the
recovery, such as how to reconstruct the reply in case of the RPC
resend if the reply was lost.
This patch adds a new field @lrd_batch_idx to the data
structure @lsd_reply_data, which is stored in each slot of the
"reply_data" file:
struct lsd_reply_data {
	__u64 lrd_transno;	/* transaction number */
	__u64 lrd_xid;		/* transmission id */
	__u64 lrd_data;		/* per-operation data */
	__u32 lrd_result;	/* request result */
	__u32 lrd_client_gen;	/* client generation */
	__u32 lrd_batch_idx;	/* index in a batched RPC */
	__u32 lrd_padding[7];	/* unused fields */
};
When a batched RPC is found to be a resend, and the index of a
sub request in the batched RPC is smaller than or equal to
@lrd_batch_idx in the reply data, the sub request has already
been executed and the server will reconstruct the reply for it;
if the index is larger than @lrd_batch_idx, the server will
re-execute the sub request in the batched RPC.
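The resend rule above can be sketched in userspace C as follows. This is an illustration only, not the code from the patch: uint64_t/uint32_t stand in for the kernel's __u64/__u32, and batch_resend_action() and its enum are hypothetical names introduced here.

```c
#include <stdint.h>

/* Userspace mirror of the slot layout quoted above; the fixed-width
 * types stand in for the kernel's __u64/__u32. */
struct lsd_reply_data {
	uint64_t lrd_transno;     /* transaction number */
	uint64_t lrd_xid;         /* transmission id */
	uint64_t lrd_data;        /* per-operation data */
	uint32_t lrd_result;      /* request result */
	uint32_t lrd_client_gen;  /* client generation */
	uint32_t lrd_batch_idx;   /* index in a batched RPC */
	uint32_t lrd_padding[7];  /* unused fields */
};

/* 3 * 8 + 3 * 4 + 7 * 4 = 64 bytes per slot in this layout. */
_Static_assert(sizeof(struct lsd_reply_data) == 64, "unexpected slot size");

/* Hypothetical names, used only for this illustration. */
enum batch_resend_action {
	BATCH_RECONSTRUCT_REPLY,  /* sub request was already executed */
	BATCH_REEXECUTE,          /* sub request has not been executed */
};

static enum batch_resend_action
batch_resend_action(const struct lsd_reply_data *lrd, uint32_t sub_idx)
{
	return sub_idx <= lrd->lrd_batch_idx ? BATCH_RECONSTRUCT_REPLY
					     : BATCH_REEXECUTE;
}
```

With lrd_batch_idx == 2, sub requests 0 through 2 get a reconstructed reply and sub request 3 is executed again.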
Disable conf-sanity/32{a,b,c,d,e,f,g}, 108{a,b} temporarily until
the compatibility issue during upgrade for new reply data format
is fixed.
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Id48ecc263002cb783f5032642d05e1f3f6673837
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48228
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Qian Yingjin [Mon, 1 Feb 2021 03:51:08 +0000 (11:51 +0800)]
LU-14393 protocol: basic batching processing framework
Batch processing can boost performance. The larger the
batch size, the higher the latency for the entire batch. Although
the latency for the entire batch of operations is higher than the
latency of any single operation, the throughput of the batch of
operations is much higher.
This patch implements the basic batching processing framework for
Lustre. It could be used for the future batching statahead and
WBC.
A batched RPC does not require that the opcodes of sub requests in
a batch are the same. Each sub request has its own opcode. It allows
batching not only read-only requests but also multiple
modification updates with different opcodes, and even a mixed
workload which contains both read-only requests and modification
updates.
For recovery, only the batched RPC contains a client XID;
there is no separate client XID for each sub request. Although the
server generates a transno for each update sub request, the
transno is only stored in the batched RPC (in @ptlrpc_body)
when the sub update request is finished. Thus the batched RPC only
stores the transno of the last sub update request. Only the
batched RPC contains the @ptlrpc_body message field; a sub
request in a batched RPC does not.
A new field named @lrd_batch_idx is added to the client reply data
@lsd_reply_data. It indicates the sub request index in a batched
RPC. When the server finishes a sub update request, it updates
@lrd_batch_idx accordingly.
When a batched RPC is found to be a resend, and the index of a
sub request in the batched RPC is smaller than or equal to
@lrd_batch_idx in the reply data, the sub request has already
been executed and committed, and the server will reconstruct the
reply for it; if the index is larger than @lrd_batch_idx, the
server will re-execute the sub request in the batched RPC.
To simplify the reply/resend of the batched RPCs, the batch
processing stops at the first failure in the current design.
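A minimal sketch of "stop at the first failure" over sub requests with mixed opcodes, assuming a per-sub-request handler that returns 0 or a negative errno. The names batch_execute(), sub_req_handler_t and demo_handler() are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: 0 on success, negative errno on failure. */
typedef int (*sub_req_handler_t)(int opcode);

/* Demo handler for the sketch: negative opcodes are treated as invalid. */
static int demo_handler(int opcode)
{
	return opcode < 0 ? -EINVAL : 0;
}

/* Run the sub requests in order and stop at the first failure.
 * Returns how many sub requests completed successfully and stores
 * the first error (or 0) in *rc. */
static size_t batch_execute(const int *opcodes, size_t count,
			    sub_req_handler_t handler, int *rc)
{
	size_t i;

	*rc = 0;
	for (i = 0; i < count; i++) {
		*rc = handler(opcodes[i]);
		if (*rc != 0)
			break;
	}
	return i;
}
```

Because processing stops at the first failure, everything before the failing index has completed and everything from it onward has not, which keeps the resend decision a single index comparison.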
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idaa814e82c968811bdda1c750b18c878b2c2ca67
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41378
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Chris Horn [Sat, 29 Oct 2022 22:30:17 +0000 (16:30 -0600)]
LU-16452 kfilnd: Check replay deadline before send
The LND timeout needs to account for the total time needed for bulk
operations to complete. On cassini, this can be ~120 seconds due to
the CXI retry-handler timeout on both the sender and target. i.e. LND
timeout is really the max round trip time, and (LND timeout)/2 is the
max one-way trip time.
When we replay a transaction we want to at least ensure we have enough
time to deliver the message to the receiver, as this gives us a
chance at still completing transactions. We should ensure that we
still have (LND timeout)/2 seconds remaining before posting a new
transaction.
Introduce kfilnd_transaction::tn_replay_deadline,
which is set to the transaction deadline minus (LND timeout)/2.
Check the replay deadline in kfilnd_tn_state_idle() before attempting
to post the transaction. If we've exceeded that deadline then fail
the transaction with -ETIMEDOUT and set a NETWORK_TIMEOUT health
status.
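The deadline arithmetic can be sketched with plain seconds instead of ktime_t. This is an assumption-laden illustration: struct tn_sketch and both helpers are hypothetical, and setting the transaction deadline to "now + LND timeout" is an assumption made here, not something the commit message states.

```c
#include <errno.h>
#include <stdint.h>

/* Times are plain seconds here; the kernel code uses ktime_t. */
struct tn_sketch {
	int64_t deadline;         /* absolute transaction deadline */
	int64_t replay_deadline;  /* deadline - (LND timeout) / 2 */
};

/* Assumption for this sketch: the deadline is now + lnd_timeout. */
static void tn_set_deadlines(struct tn_sketch *tn, int64_t now,
			     int64_t lnd_timeout)
{
	tn->deadline = now + lnd_timeout;
	tn->replay_deadline = tn->deadline - lnd_timeout / 2;
}

/* Returns 0 while we are still before the replay deadline, and
 * -ETIMEDOUT once there is no longer enough time for a one-way trip. */
static int tn_check_replay(const struct tn_sketch *tn, int64_t now)
{
	return now < tn->replay_deadline ? 0 : -ETIMEDOUT;
}
```

With a 120 second LND timeout starting at time 0, the replay deadline falls at 60 seconds, so a replay attempted at 59 seconds is posted and one attempted at 61 seconds is failed.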
Modify the throttle check in kfilnd_tn_state_idle() to check
kfilnd_transaction::tn_replay_deadline instead of
kfilnd_transaction::deadline to determine when we should timeout
a transaction that is being throttled. Note, this check is switched
to using ktime_before() rather than ktime_after() since the case
is about checking whether we are currently before the deadline rather
than after it. The current code isn't wrong. It is just grammatically
awkward.
HPE-bug-id: LUS-11304
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I1911d51cee4acea20577e3fc45c99b8948b79523
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49593
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Chris Horn [Fri, 28 Oct 2022 22:27:17 +0000 (16:27 -0600)]
LU-16451 kfilnd: Throttle traffic to down peers
If a transaction fails with -EHOSTUNREACH then this suggests the
target is actually down. We want to avoid consuming resources on the
local NIC by trying to send messages to down peers, so we will
require a hello handshake before injecting other new messages to a
peer we suspect is down.
Introduce a new kfilnd_peer state, KP_STATE_DOWN. Peers in either
KP_STATE_UPTODATE or KP_STATE_STALE can transition to KP_STATE_DOWN.
We'll transition a peer to KP_STATE_DOWN if we fail a transaction with
them and the errno we get is either -EHOSTUNREACH or -ENOTCONN.
kfilnd_peer_down() transitions a peer to KP_STATE_DOWN as appropriate.
Similar to stale peers, if we continue to fail transactions with peers
that are down then we want to eventually purge them from the peer
cache. This logic in kfilnd_peer_stale() is moved to
kfilnd_peer_purge_old_peer(), and this new function is called by both
kfilnd_peer_stale() and kfilnd_peer_down().
Introduce kfilnd_peer_needs_throttle() that determines whether we
should queue a message for future replay pending a successful
handshake. Integrate this into kfilnd_tn_state_idle() so that we queue
messages for peers in KP_STATE_DOWN in addition to peers in
KP_STATE_NEW. Modify debug statements in this area to remove redundant
info and reflect that we can hit these conditions for down peers, not
just new peers.
Introduce kfilnd_peer_tn_failed() to interpret the errno for a
transaction failure and call kfilnd_peer_down() or kfilnd_peer_stale()
as appropriate. This function replaces all existing calls to
kfilnd_peer_stale(). kfilnd_peer_stale() is now a static function.
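The state handling described above could look roughly like the sketch below. The state names come from the commit message; the enum values, the function names peer_tn_failed() and peer_needs_throttle(), and the handling of errors other than the two listed are simplifying assumptions, and the purge of old peers is left out.

```c
#include <errno.h>
#include <stdbool.h>

enum kp_state {
	KP_STATE_NEW,
	KP_STATE_UPTODATE,
	KP_STATE_STALE,
	KP_STATE_DOWN,
};

/* Map the errno of a failed transaction onto the next peer state. */
static enum kp_state peer_tn_failed(enum kp_state cur, int error)
{
	bool up_or_stale = cur == KP_STATE_UPTODATE || cur == KP_STATE_STALE;

	/* These two errors suggest the target is actually down. */
	if (error == -EHOSTUNREACH || error == -ENOTCONN)
		return up_or_stale ? KP_STATE_DOWN : cur;

	/* Assumed for the sketch: other errors make an up-to-date peer stale. */
	return cur == KP_STATE_UPTODATE ? KP_STATE_STALE : cur;
}

/* New and down peers need a completed handshake before new messages
 * are injected, so their messages are queued for replay. */
static bool peer_needs_throttle(enum kp_state state)
{
	return state == KP_STATE_NEW || state == KP_STATE_DOWN;
}
```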
HPE-bug-id: LUS-11314
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I206075c3a1b2836715dc79b49b098dab51c6bb94
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49591
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Chris Horn [Tue, 25 Oct 2022 19:21:17 +0000 (13:21 -0600)]
LU-16450 kfilnd: Cancel TNs if handshake fails
When sending a message to a new peer a HELLO is sent first and the
original message waits for the handshake to complete. If the HELLO
fails to be sent then the original message will continue to wait for
the full LND timeout. When we retry the original message we should
check whether there is actually an outstanding HELLO. If not, then
this indicates the HELLO failed and we should cancel the TN.
HPE-bug-id: LUS-11310
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4ed07964d5af0bcc3bdca33c1ea46fd436af2e98
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49590
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Elena Gryaznova [Sun, 8 Jan 2023 18:05:19 +0000 (19:05 +0100)]
LU-16455 tests: recovery-small test_139() fix
The mds device calculated before stop() cannot be used
after stop() because the device-mapper device is removed and
the facet device is restored:
stop () ->
elif dm_flakey_supported $facet; then
if [[ -n ${!failover_host} &&
${!failover_host} != ${!host} ]]
dm_cleanup_dev $facet ->
unexport_dm_dev $facet
Without this fix test_139 fails on a failover setup:
losetup: /dev/mapper/mds1_flakey: failed to set up loop device:
No such file or directory
To reproduce the failure just run:
sh llmountcleanup.sh
ONLY=139 sh recovery-small.sh
on a failover setup where mds1_HOST != mds1failover_HOST
Fixes:
4597fa7d88 ("LU-13061 osp: check catlog FID after reading in")
Test-Parameters: trivial testlist="recovery-small" failover=true iscsi=1 \
env=ONLY=139,SLOW=yes mdssizegb=10 clientcount=4 osscount=2 mdscount=2 \
mdtcount=2 austeroptions=-R
Test-Parameters: trivial testlist="recovery-small" failover=true iscsi=1 \
env=FAILURE_MODE=HARD,ONLY=139,SLOW=yes mdssizegb=10 clientcount=4 \
osscount=2 mdscount=2 mdtcount=2 austeroptions=-R
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
HPE-bug-id: LUS-10912
Change-Id: I67d98f633de4023a4430b55c6b4d308c7f17d988
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49579
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Thu, 5 Jan 2023 22:44:50 +0000 (15:44 -0700)]
LU-2771 ldlm: remove obsolete LDLM_FL_SERVER_LOCK
The LDLM_FL_SERVER_LOCK flag and accompanying accessor macros have
never been used since they were first introduced. Remove them.
It looks like this may have been duplicated by LDLM_FL_NS_SRV.
Test-Parameters: trivial
Fixes:
caa55aec4a ("LU-2771 dlmlock: compress out unused space")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iffc9b126334a327a9054f9acae86f4a0d03ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Thu, 5 Jan 2023 17:59:24 +0000 (12:59 -0500)]
LU-13642 lnet: modify lnet_inetdev to work with large NIDS
Change the li_ipv6 field in struct lnet_inetdev to li_size, which
now represents the size of the NID address. This will work
with the GUID of InfiniBand as well. The second change is
to always store li_ipaddr in network format. This allows
direct comparison between li_ipaddr and the nid_addr of
struct lnet_nid. We will ensure AF_IB will also be in the
same format as what will be stored in struct lnet_nid.
Implement setup with a NID address for the ko2iblnd LND driver.
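The benefit of keeping addresses in network format can be illustrated in userspace: once both sides hold raw bytes in network byte order plus a size, comparison is a plain memcmp(). struct addr_sketch and both helpers are hypothetical stand-ins, not the lnet_inetdev or lnet_nid definitions.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: raw address bytes in network byte order. */
struct addr_sketch {
	size_t  size;       /* address length in bytes, e.g. 4 or 16 */
	uint8_t bytes[16];
};

/* Build an entry from a host-order IPv4 address. */
static struct addr_sketch addr_from_ipv4_host(uint32_t host_order)
{
	struct addr_sketch a = { .size = sizeof(uint32_t) };
	uint32_t net = htonl(host_order);

	memcpy(a.bytes, &net, sizeof(net));
	return a;
}

/* Same size and same bytes means same address; no byte swapping needed. */
static bool addr_equal(const struct addr_sketch *a,
		       const struct addr_sketch *b)
{
	return a->size == b->size && memcmp(a->bytes, b->bytes, a->size) == 0;
}
```

Carrying a size rather than an "is IPv6" flag lets the same comparison cover 4-byte, 16-byte, and other address lengths.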
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I7c27edb67263dd5bda4728c536aee266d38a4592
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49525
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Sat, 17 Dec 2022 13:06:16 +0000 (08:06 -0500)]
LU-16380 osd-ldiskfs: race in OI mapping
There is a race between the OI scrub thread and OI mapping entry
insertion, which may add an inconsistent OI mapping entry without
starting the OI scrub thread. This may lead to osd_fid_lookup()
always returning -EINPROGRESS.
To avoid this race, osd_fid_lookup() returns -EINPROGRESS only when
the OI mapping is inconsistent and the OI scrub thread is not running.
Fixes:
558784caad ("LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs")
Test-Parameters: mdscount=2 mdtcount=4 testlist=conf-sanity env=ONLY=108b,ONLY_REPEAT=50
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I05114b6a33940c210e9952f6e24f6c36fd7f76a2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49514
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Wed, 7 Dec 2022 02:53:25 +0000 (21:53 -0500)]
LU-16335 mdt: skip target check for rm_entry
For "lfs rm_entry", the target may not exist, so the sanity check of
it may fail and cause rm_entry to fail.
Add sanity 832.
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I824c7581af05c7494cf03c0c9bc999ca1abfec01
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49329
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Shaun Tancheff [Fri, 2 Dec 2022 09:46:31 +0000 (03:46 -0600)]
LU-16302 llite: Use alloc_inode_sb() to allocate inodes
linux-commit: v5.17-49-g8b9f3ac5b01d
fs: introduce alloc_inode_sb() to allocate filesystems specific
inode
Filesystems are expected to use alloc_inode_sb to allocate inodes
for proper lru handling.
Test-Parameters: trivial
HPE-bug-id: LUS-11332
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie6f091a01df33738ed2ef6f7fef9c1f9c1a51e03
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Lei Feng [Tue, 10 Jan 2023 08:51:27 +0000 (16:51 +0800)]
LU-16459 tests: fix YAML verification function
YAML verification function is not correct in tests.
Fix it and change test case accordingly.
Fixes:
bedb797c5d ("LU-16110 lprocfs: make job_stats and rename_stats valid YAML")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I109e2294aea3d1bffa08e6d2c39a5911fa8ef7df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49584
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Elena Gryaznova [Fri, 14 Oct 2022 14:51:03 +0000 (17:51 +0300)]
LU-16239 tests: do not cleanup clients dirs
This patch adds the ability to not remove the client
directories. Just rename them if CLEANUP is set to
false.
Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11158
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ibc55d32ef4946a62b00dcbf745567c123650ced9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48870
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Chris Horn [Mon, 22 Aug 2022 19:43:36 +0000 (13:43 -0600)]
LU-16214 kfilnd: Proactively handshake old peers
If asked to send a message to a peer that we haven't communicated with
for some time, then we run the risk of that peer having a stale
(or missing) peer entry for us. This can result in the target peer
silently dropping our message. To reduce the chance of this happening,
proactively handshake any peer we haven't talked to in the last 2x LND
timeouts.
Note, kfilnd_peer_needs_hello() is called on both the send and receive
path. We only want to proactively handshake on the send path, so an
argument is added to this function so it can distinguish between the
two situations.
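A rough sketch of the check with the extra argument, using plain seconds. struct peer_sketch, its fields, and the argument name proactive_handshake are assumptions for illustration, not the actual kfilnd definitions.

```c
#include <stdbool.h>
#include <stdint.h>

struct peer_sketch {
	bool    uptodate;    /* a handshake completed earlier */
	int64_t last_alive;  /* seconds; last time we talked to the peer */
};

static bool peer_needs_hello(const struct peer_sketch *kp, int64_t now,
			     int64_t lnd_timeout, bool proactive_handshake)
{
	/* Peers without a completed handshake always need a hello. */
	if (!kp->uptodate)
		return true;

	/* Only the send path passes proactive_handshake = true. */
	return proactive_handshake &&
	       now - kp->last_alive > 2 * lnd_timeout;
}
```

On the receive path the same quiet peer does not trigger a handshake, which is the distinction the new argument exists to make.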
HPE-bug-id: LUS-11125
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaacb48e5c45305869bd22335ce112b21cf67e848
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48786
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Chris Horn [Fri, 19 Aug 2022 20:27:26 +0000 (14:27 -0600)]
LU-16214 kfilnd: Keep stale peer entries
A peer is currently removed from the cache whenever there is a network
failure associated with the peer. This leads to situations where
incoming messages from that peer will be dropped until a handshake
can be completed.
If we instead keep these stale peer entries then we at least have a
chance of completing future transactions with the peer.
To accomplish this, we introduce states to struct kfilnd_peer.
When a kfilnd_peer is newly allocated it is assigned a state of
KP_STATE_NEW. kfilnd_peer_is_new_peer() is modified to check for this
state rather than check if kp_version is set.
When a handshake is completed the peer is assigned a state of
KP_STATE_UPTODATE.
When a peer that is up-to-date experiences a failed network operation
then it is assigned a state of KP_STATE_STALE. kfilnd_peer_stale() is
introduced to set this state. Existing callers of kfilnd_peer_down()
are converted to call kfilnd_peer_stale(). kfilnd_peer_down() is
renamed to kfilnd_peer_del().
We will initiate a handshake to any peer that is in either
KP_STATE_NEW or KP_STATE_STALE. kfilnd_peer_needs_hello() is
modified accordingly.
struct kfilnd_peer::kp_last_alive is checked by kfilnd_peer_stale().
If we haven't heard from a stale peer within five LND timeout periods,
then that peer is deleted.
An additional kfilnd_peer_alive() call is added to
kfilnd_tn_state_idle() for the TN_EVENT_RX_HELLO case, so that
peer aliveness is updated when we receive a hello request or response.
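The life cycle described above can be sketched as a small state machine. The state names and the five-timeout rule come from the commit message; the struct, the helper names, and the use of plain seconds are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum kp_state {
	KP_STATE_NEW,       /* newly allocated, no handshake yet */
	KP_STATE_UPTODATE,  /* handshake completed */
	KP_STATE_STALE,     /* a network operation failed */
};

struct peer_sketch {
	enum kp_state state;
	int64_t       last_alive;  /* seconds; stands in for kp_last_alive */
};

/* A handshake is initiated to peers that are new or stale. */
static bool peer_needs_hello(const struct peer_sketch *kp)
{
	return kp->state == KP_STATE_NEW || kp->state == KP_STATE_STALE;
}

/* Mark the peer stale after a failed operation. Returns true when the
 * peer has been silent for more than five LND timeouts and should be
 * deleted rather than kept. */
static bool peer_stale(struct peer_sketch *kp, int64_t now,
		       int64_t lnd_timeout)
{
	if (kp->state == KP_STATE_UPTODATE)
		kp->state = KP_STATE_STALE;

	return kp->state == KP_STATE_STALE &&
	       now - kp->last_alive > 5 * lnd_timeout;
}
```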
HPE-bug-id: LUS-11125
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Icfb722e58fa334d983df02742dc456a55ac2abc3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48785
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Chris Horn [Mon, 15 Aug 2022 21:06:25 +0000 (15:06 -0600)]
LU-16213 kfilnd: Finalize replay TNs with deleted peer
If there are transactions on the replay queue awaiting a hello
response, and the peer is marked for removal (e.g. because the hello
TN failed) then let's finalize those TNs right away rather than wait
for them to hit the timeout.
HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6dc77cadaf850ab9ec37bf50241074bc3f5650b5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48784
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Chris Horn [Fri, 19 Aug 2022 19:48:37 +0000 (13:48 -0600)]
LU-16213 kfilnd: Allow one HELLO in-flight per peer
Allow one HELLO message to be in-flight, per peer, at one time.
Accomplished by adding a flag to struct kfilnd_peer to indicate
whether a hello request has been sent to the peer. Cleared if the
send fails or when the hello response is received.
To detect the situation where a hello response is never received we add
kp_hello_ts to struct kfilnd_peer to record timestamp of when the
hello request was sent. If this is more than LND timeout seconds in
the past then we may send another hello.
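The flag-plus-timestamp scheme might be sketched like this, again with plain seconds. The field names loosely follow kp_hello_pending and kp_hello_ts from the message; hello_try_send() and hello_clear() are hypothetical helpers, and the locking that a real implementation would need is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

struct hello_sketch {
	bool    hello_pending;  /* a hello request is in flight */
	int64_t hello_ts;       /* seconds; when that request was sent */
};

/* Returns true and records the send if a hello may be sent now. */
static bool hello_try_send(struct hello_sketch *kp, int64_t now,
			   int64_t lnd_timeout)
{
	/* One hello in flight at a time, unless the pending one is older
	 * than the LND timeout and its response was presumably lost. */
	if (kp->hello_pending && now - kp->hello_ts <= lnd_timeout)
		return false;

	kp->hello_pending = true;
	kp->hello_ts = now;
	return true;
}

/* Called when the send fails or when the hello response is received. */
static void hello_clear(struct hello_sketch *kp)
{
	kp->hello_pending = false;
}
```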
Fix return value of kfilnd_send() when we're unable to allocate a
kfilnd_tn for the hello.
There's some code duplication with updating a peer based on hello
request and response. Consolidate processing of these hello messages
into a single function.
A race exists where a peer can be marked for removal in between a call
to kfilnd_peer_needs_hello() and the call to kfilnd_tn_alloc() inside
kfilnd_send_hello_request(). This would cause a hello request to be
sent to a new peer, created by kfilnd_peer_get() inside
kfilnd_tn_alloc(), without properly setting the kp_hello_pending flag
on that new peer. To avoid this situation, introduce
kfilnd_tn_alloc_for_peer() which takes a struct kfilnd_peer pointer
as an argument to assign to kfilnd_transaction::tn_kp. Use this to
allocate the kfilnd_transaction for the hello request inside
kfilnd_send_hello_request().
HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6bb0928a629cb398c270366fae6d1040ad67df3f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48783
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Chris Horn [Fri, 19 Aug 2022 16:48:01 +0000 (10:48 -0600)]
LU-16213 kfilnd: Fail sends of particular message type
Add ability to use failure injection to specify a message type for
simulated failure.
For example, to simulate failure of all immediate messages:
lctl set_param fail_loc=0xF114 fail_val=1
To simulate failure of a single hello request:
lctl set_param fail_loc=0x8000F114 fail_val=4
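The two fail_loc values differ only in the top bit. The sketch below decodes them under the assumption that bit 31 is the "fail once" flag and the low 16 bits are the fail-location id; the macro and function names are invented for this example and the masks should be checked against the libcfs headers before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FAIL_ONCE     0x80000000u  /* assumed "fail once" flag bit */
#define SKETCH_FAIL_MASK_LOC 0x0000FFFFu  /* assumed fail-location id mask */

/* Extract the fail-location id, dropping the flag bits. */
static uint32_t fail_loc_id(uint32_t fail_loc)
{
	return fail_loc & SKETCH_FAIL_MASK_LOC;
}

/* True when the injection should trigger a single time only. */
static bool fail_loc_once(uint32_t fail_loc)
{
	return (fail_loc & SKETCH_FAIL_ONCE) != 0;
}
```

Under these assumptions both commands select location 0xF114; the first fails every matching message, and the second fails a single one. In both cases fail_val selects the message type.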
HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4a20e92826df75812ef5b81979944526e4b94d83
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48782
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Chris Horn [Thu, 18 Aug 2022 19:02:18 +0000 (13:02 -0600)]
LU-16213 kfilnd: Add peer info to some debug statements
Add kfilnd_peer pointer address to some debug statements.
Use 0x%llx format consistently when printing kfilnd_peer::kp_addr
Also add the message type to the TN debug macro.
HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4410ca9215f9d0a6eb65e6d4f953234fa7fba5ea
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48781
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Chris Horn [Thu, 11 Aug 2022 19:03:04 +0000 (13:03 -0600)]
LU-16213 kfilnd: Rename struct kfilnd_peer members
Prefix members of struct kfilnd_peer with kp_ to make these variable
names easier to find.
Also use 'kp' as a standard name for pointers to struct kfilnd_peer
instead of 'peer' (again to make these pointers easier to find). As
such, struct kfilnd_transaction::peer is also renamed to
struct kfilnd_transaction::tn_kp.
HPE-bug-id: LUS-11128
Test-Parameters: trivial
Change-Id: Id535c7af28a5335026037a55920c706a4e16f947
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48780
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Tue, 26 Oct 2021 08:38:50 +0000 (11:38 +0300)]
LU-15163 osd: osd_obj_map_recover() to restart transaction
osd_obj_map_recover() stops the transaction when it needs to call
vfs_link(), and it has to start a new transaction to modify
the filesystem.
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I6efe5444ddc959b19092bebc6e3c7dc25a29cea1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45368
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Li Dongyang [Tue, 25 Jan 2022 00:53:33 +0000 (11:53 +1100)]
LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000
Fix the tests assuming objects created for MDT0000
always have a seq number of 0, to prepare for
deprecating IDIF sequence.
Fix sanity test_312 on ZFS to properly identify which
OST the object was created on, and re-enable it.
Test-Parameters: testlist=sanity env=ONLY="39r 312"
Test-Parameters: testlist=sanity-scrub env=ONLY=19
Test-Parameters: testlist=sanity-sec env=ONLY=37
Change-Id: I4bffabe25a6f84cdba760aabea1da3429715a283
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46293
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Thu, 12 Jan 2023 14:34:10 +0000 (09:34 -0500)]
LU-16460 lnet: validate data sent from user land properly
Testing with improper settings from user land exposed some bugs in
the kernel code's handling of these cases. For tunables sent from
user land we need to do proper range checking. An improper cast
in the new Netlink tunables code prevented setting the default
LND tunable settings. Also silently ignore attempts to set LND
tunables when it's not supported. We shouldn't stop NI setup in
this case. Lastly, set the NI tunables to -1 when user land
doesn't provide any input. This tells the LND driver to use its
default values for the tunables. Resolve a double free when
setting up an NI with a non-existent interface. Another fix is for
net locking in lnet_net_cmd().
For lnetctl, fix the YAML handling when only conns_per_peer is
requested. Previously I only tested conns_per_peer and NI tunables
changes together, which missed this case.
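The "-1 means use the driver default" convention combined with range checking can be sketched as below. The macro, both function names, and the choice of -ERANGE are assumptions for illustration; the actual LNet code and limits may differ.

```c
#include <errno.h>

#define TUNABLE_UNSET (-1)  /* user land gave no value; use the LND default */

/* Hypothetical validator: accept "unset" or a value within [min, max]. */
static int tunable_validate(long value, long min, long max)
{
	if (value == TUNABLE_UNSET)
		return 0;
	if (value < min || value > max)
		return -ERANGE;
	return 0;
}

/* Pick the value the driver should actually use. */
static long tunable_resolve(long value, long lnd_default)
{
	return value == TUNABLE_UNSET ? lnd_default : value;
}
```

Validating before resolving keeps a bad value from user land from reaching the driver, while an omitted value falls through to the driver default.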
Fixes:
8f8f6e2f3 ("LU-10003 lnet: use Netlink to support old and new NI APIs.")
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I7c5e993de57e3d674ecb8e3cc1bd62506470d416
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49588
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Mon, 9 Jan 2023 23:27:46 +0000 (16:27 -0700)]
LU-14598 tests: skip conf-sanity test_122b in interop
Code was fixed in 2.15.0.
Test-Parameters: trivial testlist=conf-sanity env=ONLY=122b serverversion=2.14.0
Fixes:
747fed818b ("LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6d9480f4b43706b597df6bd74c65959776cf2b5b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Bobi Jam [Tue, 3 Jan 2023 05:57:24 +0000 (13:57 +0800)]
LU-16160 revert: "llite: clear stale page's uptodate bit"
This reverts commit
5b911e03261c3de6b0c2934c86dd191f01af4f2f
which caused a race between cl_page_own() and ll_releasepage()
and an assertion failure in cl_pagevec_put().
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Icdb8c60f4d992c9976670e1b06c5bab5ef3a3954
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49541
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Mr. NeilBrown [Tue, 20 Dec 2022 17:03:32 +0000 (12:03 -0500)]
LU-6142 osc: tidy up osc_init()
A module_init() function that registers the services
of the module should do that last, after all other
initialization has succeeded.
This patch moves the class_register_type() call to the
end and ensures everything else that might have been
set up, is cleaned up on error.
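The ordering rule is the usual "register last, unwind with goto" init pattern. The userspace sketch below uses hypothetical setup steps and a fail_step knob to simulate errors; it is not the osc_init() code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a module's setup steps. */
static bool caches_ready, shrinker_ready, type_registered;
static int fail_step;  /* demo knob: 1..3 makes that step fail, 0 = none */

static int setup_caches(void)
{
	if (fail_step == 1)
		return -ENOMEM;
	caches_ready = true;
	return 0;
}

static int setup_shrinker(void)
{
	if (fail_step == 2)
		return -ENOMEM;
	shrinker_ready = true;
	return 0;
}

static int register_type(void)
{
	if (fail_step == 3)
		return -EEXIST;
	type_registered = true;
	return 0;
}

static int mod_init_sketch(void)
{
	int rc;

	rc = setup_caches();
	if (rc)
		return rc;

	rc = setup_shrinker();
	if (rc)
		goto out_caches;

	/* Register last: once the type is visible it must be fully usable. */
	rc = register_type();
	if (rc)
		goto out_shrinker;

	return 0;

out_shrinker:
	shrinker_ready = false;  /* undo setup_shrinker() */
out_caches:
	caches_ready = false;    /* undo setup_caches() */
	return rc;
}
```

Because registration is the final step, a failure anywhere earlier never leaves a half-initialized service visible to other code.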
Linux-commit:
e67f133d02e ("staging: lustre: osc: tidy up osc_init()")
Change-Id: I2a5ffb116c6d7c33a4530bab6e89a5ffe6117cea
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49458
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Wed, 4 Jan 2023 00:28:55 +0000 (19:28 -0500)]
LU-10003 lnet: use Netlink to support LNet ping commands
Completely replace the old pre-MR ping command ioctl with
Netlink, which will also handle large NIDs. We do update
IOC_LIBCFS_PING_PEER, which supports only small NIDs,
so older tools will keep working.
Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Change-Id: Ic82a18dc38e4bd4e78bf61da766f7a847da509a8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mr. NeilBrown [Sat, 10 Dec 2022 14:27:08 +0000 (09:27 -0500)]
LU-6142 ldlm: use list_first_entry in ldlm_lock
This makes the code (slightly) more readable.
Linux-commit: ef7e70a ("staging: lustre: ldlm: use list_first_entry in ldlm_lock")
Change-Id: If9789fef1dec55d08dec25819aaf5152946819c5
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49359
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mr. NeilBrown [Sat, 10 Dec 2022 14:17:46 +0000 (09:17 -0500)]
LU-6142 ldlm: tidy list walking in ldlm_flock()
Use list_for_each_entry variants to
avoid the explicit list_entry() calls.
This allows us to use list_for_each_entry_safe_from()
instead of adding a local list-walking macro.
Also improve some comments so that it is more obvious
that the locks are sorted per-owner and that we need
to find the insertion point.
Linux-commit: 3ac5a67 ("staging: lustre: ldlm: tidy list walking in ldlm_flock()")
Change-Id: Ie9a756a898a9c58db1b4f446694603a4efa37352
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49358
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Tue, 6 Dec 2022 16:36:02 +0000 (17:36 +0100)]
LU-16369 ldiskfs: do not check enc context at lookup
On rhel8, ldiskfs should not check for encryption context of inodes
upon lookup. On these kernels, ext4 is not encryption aware, so just
assume context is fine when target is mounted as ldiskfs.
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9f9813d290ea24b34f710e2c8219e856ca8fbc58
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49324
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Wed, 14 Dec 2022 19:31:26 +0000 (14:31 -0500)]
LU-8915 lnet: migrate LNet selftest group handling to Netlink
Replace the LSTIO_GROUP_LIST and LSTIO_GROUP_INFO ioctls with a Netlink
backend. Make this transition transparent to the user. Be aware this
newer version of lnet_selftest.ko doesn't support older versions of the
lst tool. While the old interface allows only setting one group up at
a time, the Netlink interface can be used to set up many groups at one
time. Currently we don't change the interface to handle larger NIDs but
this new interface will allow us to use the new NID format in a follow
on patch.
Change-Id: I18f07b380d353425c6e127e4fbd0f30e41f66944
Test-Parameters: trivial testlist=lnet-selftest
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49314
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Wed, 4 Jan 2023 15:10:02 +0000 (16:10 +0100)]
LU-16444 enc: null-enc names cannot be digested form
When encrypted files have their names encrypted, long names are in
digested form in case access is done without the encryption key. The
digest is base64-encoded, and prepended with '_'.
With null encryption for file names, names are always plain text. In
this case, a legitimate '_' at the start of a name must not be
interpreted as a digested form.
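The rule above can be written as a small predicate. This is only an illustrative sketch: name_may_be_digest() and its names_encrypted flag are hypothetical and do not correspond to a function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: could this name be a digested form?
 * names_encrypted is false when null encryption is used for file names.
 */
bool name_may_be_digest(const char *name, bool names_encrypted)
{
	assert(name != NULL);

	/* With null encryption names are plain text, so '_' is just a character. */
	if (!names_encrypted)
		return false;

	/* Digested names are prefixed with '_'. */
	return name[0] == '_';
}
```

The point is that the '_' prefix is only meaningful once it is known that names are actually encrypted.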
sanity-sec test_54 is improved to test the case of a file whose name
starts with '_'.
Fixes:
f18c87cb53 ("LU-13717 sec: handle null algo for filename encryption")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idaad186afd06cfbabbe1d13e78f083d12876c8ff
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49550
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Sun, 28 Aug 2022 19:33:29 +0000 (15:33 -0400)]
LU-16026 llite: always enable remote subdir mount
For historical reasons, ROOT is revalidated with IT_LOOKUP in
.permission to ensure permission is up to date because ROOT is
never looked up. But ROOT FID and layout is not changeable, it's
PERM lock that should be revalidated, i.e., revalidate with
IT_GETATTR instead of IT_LOOKUP.
Since PERM|UPDATE lock is on the MDT where object is located, client
can cache this lock, therefore remote subdir mount doesn't need to
lookup ROOT in each file access.
Deprecate mdt.*.enable_remote_subdir_mount.
Per http://review.whamcloud.com/19195, replace 'df' with 'lfs df' in
sanity 228b since the former doesn't support transparent recovery.
Add sanity 247h.
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I66f8ee347f6c01a8a154245b10a1d93539ea13b8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Mon, 16 Jan 2023 17:58:24 +0000 (17:58 +0000)]
LU-14824 Revert "test: sanity 413a/b unlink timeout"
This reverts commit
5ff3e400f1a74ea49b7eb9cf19715f0fae08c3f5.
The test_413a is timing out regularly for ldiskfs MDTs.
Change-Id: Iafd28ec648f0b30b3c9e48e8f8479979a8cb0d60
Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 fstype=ldiskfs testlist=sanity env=ONLY="413a 413b"
Test-Parameters: mdscount=2 mdtcount=4 fstype=zfs testlist=sanity env=ONLY="413a 413b"
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49646
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andrew Perepechko [Tue, 1 Nov 2022 16:26:54 +0000 (19:26 +0300)]
LU-16286 ldiskfs: reimplement nodelalloc optimization
fiemap calls perform costly delayed extent search affecting
BRW performance, however, in Lustre we don't use delayed
allocation at all. Let's skip this search completely as we did
in RHEL7.
Change-Id: I2c3562cf5cbdf3c5532e4b79b28a040a995322b7
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-11161
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49007
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Alexander Zarochentsev [Thu, 20 Oct 2022 19:23:39 +0000 (22:23 +0300)]
LU-16272 libcfs: cfs_hash_for_each_empty optimization
Restarts from bucket 0 in cfs_hash_for_each_empty()
cause excessive CPU consumption while rechecking the leading empty
buckets.
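The cost of restarting can be illustrated with a toy model. This is a sketch under stated assumptions, not the libcfs code: buckets are modelled as an array of item counts, one item is removed per pass, and drain_restart() and drain_resume() are hypothetical names. Resuming from the current bucket is shown as one possible way to avoid the rescans; the commit message does not spell out the exact approach taken.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Remove every item, restarting the scan at bucket 0 after each removal.
 * Returns the number of bucket checks performed.
 */
unsigned long drain_restart(int *buckets, size_t n)
{
	unsigned long checks = 0;

	assert(buckets != NULL);
	for (;;) {
		size_t i;

		for (i = 0; i < n; i++) {
			checks++;
			if (buckets[i] > 0)
				break;
		}
		if (i == n)
			return checks;	/* a full pass found nothing left */
		buckets[i]--;
	}
}

/*
 * Remove every item, staying on the current bucket until it is empty
 * and never going back to earlier buckets.
 */
unsigned long drain_resume(int *buckets, size_t n)
{
	unsigned long checks = 0;
	size_t i = 0;

	assert(buckets != NULL);
	while (i < n) {
		checks++;
		if (buckets[i] > 0)
			buckets[i]--;
		else
			i++;
	}
	return checks;
}
```

With four buckets where only the last one holds three items, the restarting variant performs 16 bucket checks and the resuming variant 7; the gap grows with the number of leading empty buckets.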
HPE-bug-id: LUS-11311
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ic03875ea25101052468213043128912ac46daf32
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48972
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Elena Gryaznova [Tue, 3 Jan 2023 18:37:32 +0000 (19:37 +0100)]
LU-16440 tests: recovery-double-scale typo fix
Fix the typo.
Fixes:
f8e56a25cfc3 ("LU-15412 tests: Let init_clients_lists() export client vars")
Test-Parameters: trivial testlist=recovery-double-scale
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11422
Change-Id: I91a0c545f1eb82e6b502d9b0dc434fdb174db295
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49544
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Fri, 30 Dec 2022 21:29:59 +0000 (21:29 +0000)]
LU-15626 tests: Fix shellcheck error for rpc
This patch addresses the errors and warnings
reported by shellcheck for rpc.sh. It also
breaks up the triple nested subshell for better
readability.
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I0d4afa83a6b9d4f825f31896a52dd30319b4bf51
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Serguei Smirnov [Thu, 22 Dec 2022 22:42:48 +0000 (14:42 -0800)]
LU-15828 o2iblnd: reset hiw proportionally
As a result of connection negotiation, queue depth may end up
being shorter than "peer_tx_credits" tunables value. Before this
patch, the high-water mark "lnd_peercredits_hiw" would be set at
min(current hiw, queue depth - 1).
For example, considering that hiw is allowed to only be as low as
half of peer_tx_credits, negotiating queue_depth/peer_credits down
from 32 to 8 would always result in hiw set at 7, i.e. credits would
be released as late as possible.
With this patch, if queue depth is reduced, hiw is set proportionally
relative to the level it was at before:
hiw = (queue_depth * lnd_peercredits_hiw) / peer_tx_credits
Using the above example with queue depth initially at 32, negotiating
down to 8 would result in hiw set to 4 if "lnd_peercredits_hiw" is
initially at 16, 17, 18, 19; hiw set to 5 if "lnd_peercredits_hiw" is
initially at 20, 21, 22, 23, and so on.
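The arithmetic above can be checked with a short C sketch. The function names hiw_before() and hiw_after() are hypothetical; only the two formulas are taken from the description, and integer division is assumed.

```c
#include <assert.h>

/* Old behaviour: min(current hiw, queue depth - 1). */
int hiw_before(int hiw, int queue_depth)
{
	int cap = queue_depth - 1;

	return hiw < cap ? hiw : cap;
}

/* New behaviour: scale hiw by the same ratio as the queue depth. */
int hiw_after(int queue_depth, int lnd_peercredits_hiw, int peer_tx_credits)
{
	assert(peer_tx_credits > 0);
	return (queue_depth * lnd_peercredits_hiw) / peer_tx_credits;
}
```

For a queue depth negotiated down from 32 to 8, hiw_before(16, 8) gives 7, while hiw_after(8, 16, 32) gives 4 and hiw_after(8, 20, 32) gives 5, matching the examples in the text.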
Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I633933d7448db1ca88d3c65de9c29e870ca2c9fb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49497
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mr NeilBrown [Tue, 13 Dec 2022 00:59:26 +0000 (11:59 +1100)]
LU-16382 config: ensure lutf.sh is included in dist
The official 2.15.1 source distribution does not contain
lutf.sh. As lustre.spec lists it (when LUTF is enabled) this causes a
build error.
It is likely not included because "./configure --enable-dist" was run
in a context where swig was not installed.
So when determining whether to enable lutf, first check for
enable_dist and in that case set enable_lutf="yes"
Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If5f856985a6d642822baba4b6ee301c04f851217
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49382
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Mon, 12 Dec 2022 13:35:41 +0000 (16:35 +0300)]
LU-16385 obdlcass: stop MGC before MGS
Drop the reference to MGC when MGS is being unmounted so that
MGC doesn't try to disconnect from a missing MGS, which
can take a long time and hurt HA.
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib15f1ca56c47201bf6e29c12b3f81a11e55944ca
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49378
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Gian-Carlo DeFazio [Thu, 8 Dec 2022 23:17:26 +0000 (15:17 -0800)]
LU-14555 lnet: asym route inconsistency warning
remove LNET_UNDEFINED_HOPS from lnet_check_route_inconsistency()
where it is being treated as equivalent to 1 for the
value of lr_hops.
Due to the changes made in commit
3f2844dc9
"LU-14945 lnet: don't use hops to determine the route state",
LNET_UNDEFINED_HOPS is no longer considered equivalent to 1
for lr_hops in all cases, and it is valid to leave hops undefined
for multi-hop routes.
Therefore, having a multi-hop route with a hops value of
LNET_UNDEFINED_HOPS is no longer inconsistent.
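The change in the check can be sketched as two predicates. This is a simplified illustration and not the body of lnet_check_route_inconsistency(): the function names, the single_hop flag, and the DEMO_UNDEFINED_HOPS sentinel value are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "hops was not configured". */
#define DEMO_UNDEFINED_HOPS UINT32_MAX

/* Before: an undefined hop count was treated like hops == 1. */
bool route_inconsistent_before(bool single_hop, uint32_t hops)
{
	return !single_hop && (hops == 1 || hops == DEMO_UNDEFINED_HOPS);
}

/* After: only an explicit hops == 1 contradicts a multi-hop route. */
bool route_inconsistent_after(bool single_hop, uint32_t hops)
{
	return !single_hop && hops == 1;
}
```

Under this model, a multi-hop route with hops left undefined was flagged before the change and is accepted after it, while an explicit hops of 1 on a multi-hop route is flagged in both cases.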
Fixes:
6ab060e58e ("LU-14555 lnet: asym route inconsistency warning")
Test-Parameters: trivial
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: Iab8597f59c5f8d27b16dbeda79b41e9ec4777f52
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49352
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lei Feng [Tue, 1 Nov 2022 02:57:39 +0000 (10:57 +0800)]
LU-16284 utils: lfs getstripe follows symlink
'lfs getstripe' prints the information of symlink target by default.
With '--no-follow' option it prints the information of symlink itself.
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I6cef01af5bb2235bdcbf0b5c99af4b9ed5869515
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49003
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Mon, 14 Nov 2022 09:30:23 +0000 (03:30 -0600)]
LU-16202 build: bio_alloc takes struct block_device
Linux commit v5.17-rc2-21-g07888c665b40
block: pass a block_device and opf to bio_alloc
Create a compatible bio_alloc wrapper to handle the change
in arguments and behavior.
HPE-bug-id: LUS-11267
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I060229b25785f46a9749fcdb18727af292a940ac
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48820
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Fri, 16 Sep 2022 16:02:51 +0000 (18:02 +0200)]
LU-16165 sec: retry mechanism for identity cache
Implement a retry mechanism in the identity cache in case the
identity up call times out.
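A retry on timeout can be sketched generically as follows. This is not the identity cache code: upcall_with_retry(), the upcall_fn callback type, and flaky_upcall() are hypothetical, and the retry policy (a fixed number of attempts, retrying only on -ETIMEDOUT) is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical upcall signature: returns 0 or a negative errno. */
typedef int (*upcall_fn)(void *arg);

/* Call fn up to max_attempts times, retrying only when it times out. */
int upcall_with_retry(upcall_fn fn, void *arg, int max_attempts)
{
	int rc = -EINVAL;

	assert(fn != NULL && max_attempts > 0);
	for (int attempt = 0; attempt < max_attempts; attempt++) {
		rc = fn(arg);
		if (rc != -ETIMEDOUT)
			break;	/* success or a non-timeout error: stop retrying */
	}
	return rc;
}

/* Fake upcall for demonstration: times out while *arg is positive. */
int flaky_upcall(void *arg)
{
	int *timeouts_left = arg;

	if (*timeouts_left > 0) {
		(*timeouts_left)--;
		return -ETIMEDOUT;
	}
	return 0;
}
```

An upcall that times out twice succeeds within three attempts, while one that keeps timing out still returns -ETIMEDOUT once the attempts are exhausted, so the caller sees the same error as without retries.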
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ib70d3b851a6da3cf66dfed49b03be51da7886d01
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48579
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Tue, 8 Nov 2022 15:26:46 +0000 (09:26 -0600)]
LU-16121 llite: invalidate_folio and dirty_folio
linux commit v5.17-rc4-10-g128d1f8241d6
fs: Add invalidate_folio() aops method
A struct folio is often analogous to a struct page however
a struct folio can represent (contain) multiple pages.
linux commit v5.17-rc4-38-g6f31a5a261db
fs: Add aops->dirty_folio
__set_page_dirty_nobuffers() is replaced with filemap_dirty_folio()
Test-Parameters: trivial
HPE-bug-id: LUS-11197
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iefe67615b333e066c49c4b884dad5bea3b3ae226
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48366
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Sergey Cheremencev [Fri, 15 Jul 2022 10:06:43 +0000 (13:06 +0300)]
LU-16342 mdt: not copy pool_name to quotactl in reply
Do not copy pool_name in the MDT reply to avoid an out-of-bounds access:
BUG: KASAN: slab-out-of-bounds in mdt_quotactl+0x13ff/0x1430 [mdt]
HPE-bug-id: LUS-10579
Change-Id: I34c4cd8aaccd938c95005dca06644e02132def34
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/160899
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49242
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Sebastien Buisson [Thu, 11 Aug 2022 15:08:11 +0000 (17:08 +0200)]
LU-16091 enc: S_ENCRYPTED flag on OST objects for enc files
Add a dumb encryption context on OST objects being created, when the
LUSTRE_ENCRYPT_FL flag gets set in the LMA, for ldiskfs backend
targets. This leads ldiskfs to internally set the LDISKFS_ENCRYPT_FL
flag on the on-disk inode. Also, it makes e2fsprogs happy to see an
enc ctx for an inode that has the LDISKFS_ENCRYPT_FL flag.
Add a dumb encryption context on OST objects being opened, if there is
not already one, for ldiskfs backend targets. This is done by adding
the LUSTRE_ENCRYPT_FL flag if necessary, at the same time as atime
gets updated. It is some sort of live self-check that fixes OST
objects created with an older Lustre version.
Enhance lfsck to detect and fix OST objects belonging to encrypted
files that are missing the encryption flag. This is implemented in the
MDT-OST consistency routine, as part of the layout checking.
Also add sanity-sec test_62 and sanity-lfsck test_42 to exercise this.
Note this patch does not add any dumb encryption context on OST
objects when the backend is ZFS.
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=zfs
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=zfs
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=zfs
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=zfs
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6bee3c82ee4d1a52275facf9e2b0d60061e0beef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48198
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Wed, 13 Jul 2022 07:50:24 +0000 (10:50 +0300)]
LU-16008 tests: don't enforce umount in recovery-small/150
as such an enforcement disconnects all MDS clients, then
another MDS trying to talk to that original MDS gets evicted
and an unlucky RPC (e.g. rmdir in test cleanup) can fail with:
rm: cannot remove '...d110h.recovery-small/source_dir': Is a directory
Fixes:
57f3262baa7 ("LU-15788 lmv: try another MDT if statfs failed")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I593e1425b44fc19cb7b2b7da33fa10590532f930
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47940
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Mon, 29 Jun 2020 07:23:04 +0000 (01:23 -0600)]
LU-930 misc: improve .mailmap coverage
Improve .mailmap coverage and correctness for "git shortlog"
and related commands.
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I41a2474f2c69e1e49b5f8569ca6cc7bfcf3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Wed, 22 Dec 2021 12:26:49 +0000 (07:26 -0500)]
LU-14824 test: sanity 413a/b unlink timeout
Unlinking remote/striped directories is slow on ZFS systems. Limit the
total number of directories for the 1-stripe directory test in 413a/b on
ZFS systems, and don't test striped directories, to avoid timeouts.
Also limit total stripe object count to avoid timeout.
Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 fstype=ldiskfs testlist=sanity env=ONLY="413a 413b",ONLY_REPEAT=50
Test-Parameters: mdscount=2 mdtcount=4 fstype=zfs testlist=sanity env=ONLY="413a 413b",ONLY_REPEAT=50
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie116e6df5aee3877ed9f093f58e7bd71f6c6d9d5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45955
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Elena Gryaznova [Wed, 12 May 2021 18:59:23 +0000 (21:59 +0300)]
LU-14683 tests: get rid of no longer actual test
replay-single test_40() is no longer relevant for
modern Lustre with Layout lock support.
Fixes:
945a97dbc2f0 ("LU-2628 tests: disable test_40 of replay-single")
Test-Parameters: trivial testlist=replay-single env=ONLY=40
Signed-off-by: Elenai Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9970
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I51c3a05ef40f389535e04bd50cdf9fe51bca8acd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43676
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Etienne AUJAMES [Thu, 6 Oct 2022 13:30:54 +0000 (15:30 +0200)]
LU-16210 llite: replace selinux_is_enabled()
selinux_is_enabled() was removed from kernel 5.1.
The commit 39e5bfa added kernel support by assuming SELinux to be
enabled if the function selinux_is_enabled() does not exist.
This has a performance impact: on older kernels (e.g. CentOS 7), getxattr
RPCs were not sent for "security.selinux" if SELinux was disabled.
Utilities like "ls -l" always try to get "security.selinux".
See LU-549 for more information.
This patch uses security_inode_listsecurity() when mounting the
client to know if an LSM module (selinux) requires an xattr to store
file contexts. If an xattr is returned, we store it and use it in the
request security context.
For getxattr/setxattr we use the stored LSM's xattr to filter xattr
security contexts like security.selinux. If the xattr does not match the
stored xattr name, we return -EOPNOTSUPP to userspace.
It also adds the s_security check for security_inode_notifysecctx() to
avoid calling this function if selinux is disabled (as in
nfs_setsecurity()).
For the "Enforcing SELinux Policy Check" functionality, the selinux check
has been moved into l_getsepol: -ENODEV is returned if selinux is
disabled.
Add a regression test "sanity test_434" for this use case.
*Note:*
This patch detects that selinux is disabled without it being explicitly
disabled on the kernel cmdline. This is recommended for RHEL >= 8.5.
*Performance:*
Tests with "strace -c ls -l" with 100000 files on root in a multi-VM
env (on Rocky 9). The FS is remounted for each test (cache is cleaned) and
selinux is disabled.
__________________ ___________ _________
| Total time % | lgetxattr | statx |
|__________________|___________|_________|
|Without the patch:| 29% | 51% |
|__________________|___________|_________|
|With the patch: | 0% | 87% |
|__________________|___________|_________|
"ls -l" uses lgetxattr to get "security.selinux".
Linux-commit:
3d252529480c68bfd6a6774652df7c8968b28e41
Fixes: 39e5bfa ("LU-12355 llite: include file linux/selinux.h removed")
Fixes: 9bcac0b ("LU-549 llite: Improve statfs performance if selinux is disabled")
Test-Parameters: clientselinux=false clientdistro=el7.9 testlist=sanity env=ONLY=434,ONLY_REPEAT=20
Test-Parameters: clientselinux=false clientdistro=el8.5 testlist=sanity env=ONLY=434,ONLY_REPEAT=20
Test-Parameters: clientselinux=false clientdistro=el8.6 testlist=sanity env=ONLY=434,ONLY_REPEAT=20
Test-Parameters: clientselinux clientdistro=el8.6 testlist=sanity-selinux
Test-Parameters: clientselinux clientdistro=el8.6 testlist=sanity-selinux
Test-Parameters: clientselinux clientdistro=el7.9 testlist=sanity-selinux
Test-Parameters: clientselinux clientdistro=el7.9 testlist=sanity-selinux
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I4dac87ac0341b45a1c2fef836cdce0361017b3f5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48875
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Timothy Day [Thu, 15 Dec 2022 19:47:57 +0000 (19:47 +0000)]
LU-16227 utils: Add warning for lfs setdirstripe -D -i x,y,z
Adjust setdirstripe to be more user friendly. The
use of "-D -i x,y,z" now returns a clear error
that this is creating a default striped directory
layout and that this is a bad idea, if it is not
accompanied by "-c N" that matches the number of
index values given.
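The consistency check described above might look roughly like this. It is a sketch only: count_indices() and default_layout_args_ok() are hypothetical helpers, not the lfs implementation, and treating a single index (or no index list) as always acceptable is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the entries in a comma-separated list such as "x,y,z". */
size_t count_indices(const char *list)
{
	size_t n;

	if (list == NULL || *list == '\0')
		return 0;

	n = 1;
	for (; *list != '\0'; list++)
		if (*list == ',')
			n++;
	return n;
}

/*
 * Hypothetical check: a default layout ("-D") with several indices is
 * accepted only if the stripe count ("-c N") matches the number of indices.
 * stripe_count <= 0 means "-c" was not given.
 */
bool default_layout_args_ok(bool is_default, const char *index_list,
			    int stripe_count)
{
	size_t n = count_indices(index_list);

	assert(stripe_count >= 0);
	if (!is_default || n <= 1)
		return true;

	return stripe_count > 0 && (size_t)stripe_count == n;
}
```

For example, "-D -i 0,1,2" alone or with "-c 2" would be rejected under this model, while "-D -i 0,1,2 -c 3" would pass.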
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ic9f91853d4016bf0edfb3845ac9f1edafdf73d55
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49420
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexander Zarochentsev [Sat, 20 Aug 2022 11:34:25 +0000 (14:34 +0300)]
LU-16082 ldiskfs: Large EA upgrade test
Check whether old Lustre-only ea inodes
are accessible under new ext4 versions;
additional fixes for 32newtarball test
to work with dm-flakey devices;
32newtarball now creates ldiskfs fs with
ea_inode fs feature enabled;
disk2_12 ldiskfs image is replaced by
a new disk image with a large xattr test
file;
Fix FLR file creation in 32newtarball test.
Test-Parameters: env=ONLY=32 testlist=conf-sanity serverdistro=el7.9
Test-Parameters: env=ONLY=32 testlist=conf-sanity serverdistro=el8.5
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Id8c33b91f7ca7d68a97384dce8922dd25e8ecd68
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48350
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Aurelien Degremont [Mon, 2 Jan 2023 16:26:15 +0000 (16:26 +0000)]
LU-16439 socklnd: clarify error message on timeout
When the local peer times out when writing
to another peer, print an explicit error message
rather than a generic one. This makes it clearer
for admins and easier to debug.
Add the port to help determine whether this is always
the same one or not.
Test-Parameters: trivial
Change-Id: Iaefbc601963b50293743a22ff9329018e8a5fc4f
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49540
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Aurelien Degremont [Mon, 2 Jan 2023 16:04:31 +0000 (16:04 +0000)]
LU-16438 llite: remove false outdated comment
Old commit
99727c7a1 from Lustre 2.6 changed
ll_i2gids() behavior without updating the function
documentation accordingly. Fix it as this is confusing.
Test-Parameters: trivial
Fixes: 99727c7 ("LU-4476 kernel: support process namespace containers")
Change-Id: Iccc50fe6ac9e02de9bae7fd8f91e3e73ff45e327
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49539
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Li Dongyang [Mon, 19 Dec 2022 10:03:47 +0000 (21:03 +1100)]
LU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x
Recreate the currently broken lustre kernel patches
to allow using custom integrity functions for bio.
Note we don't need to save the generate_fn anymore,
it will be used once we call bio_integrity_prep_fn().
Add upstream fix
b13e0c718568 ("block: bio-integrity: Advance seed correctly
for larger interval sizes") for CentOS 8.0 to 8.6.
Handle the kernel api changes for the T10PI generate and
verify functions introduced in CentOS 8.x kernel,
mostly because of switching to blk_integrity_iter.
Update the custom generate and verify functions, to sync
with upstream versions.
- Add T10-DIF-TYPE2, currently only a place holder,
not used in upstream either.
- Use __be16 instead of __u16 for guard tags.
Only reuse guard tags if the rpc checksum is the same
one supported on the target. We already have some protection
during checksum type negotiation, the server
will mark the target's T10PI type as the only
T10PI checksum type supported. But it's still good to
have the logic in place.
Do not call bio_integrity_prep() if the custom interface
bio_integrity_prep_fn() does not exist, submit_bio() will
do that for us.
On the servers, show the target's T10PI checksum as
the preferred checksum_type even if it's not the fastest.
Note this is only cosmetic and does not impact the checksum
type used, which is still done during negotiation.
Change-Id: I2d0ba0b80ba9cde2977da24db08095671aa5373c
Test-Parameters: trivial
Fixes:
293844d132 ("LU-16222 kernel: RHEL 8.7 client and server support")
Fixes:
f176efd183 ("LU-12269 kernel: RHEL 8.0 server support")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49441
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mr NeilBrown [Tue, 13 Dec 2022 03:35:30 +0000 (14:35 +1100)]
LU-14409 ldiskfs: remove stray tracing code
These lines should never have landed :-(
Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Fixes:
3a83078628a4 ("LU-14409 ldiskfs: Add support for SUSE 5.3.18-24.46.1")
Change-Id: I7720158605cce81721738a5f6640ccb4e0440b09
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49383
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Alexander Zarochentsev [Tue, 6 Dec 2022 17:10:41 +0000 (20:10 +0300)]
LU-16387 lustre: switch OBD_ALLOC_LARGE to vmalloc faster
There is no need to waste time trying hard to kmalloc a large memory
chunk in OBD_ALLOC_LARGE. Reduce memory allocation attempts
by specifying __GFP_NORETRY for all allocations > PAGE_SIZE
(as kvmalloc does in the linux-4.18 kernel),
so that the kmalloc attempt fails quickly.
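The flag selection described above can be sketched in user space as follows. This is a minimal sketch, not the patch itself: the SKETCH_* constants and sketch_kmalloc_flags() are hypothetical stand-ins for the kernel's PAGE_SIZE and GFP flags, with made-up values.

```c
#include <stddef.h>

/* Hypothetical stand-ins; real values come from the kernel headers. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_GFP_KERNEL  0x1u
#define SKETCH_GFP_NORETRY 0x2u

/* Choose flags for the first, kmalloc-style allocation attempt. */
static unsigned int sketch_kmalloc_flags(size_t size)
{
    unsigned int flags = SKETCH_GFP_KERNEL;

    /* Above one page, do not retry: let the vmalloc fallback run sooner. */
    if (size > SKETCH_PAGE_SIZE)
        flags |= SKETCH_GFP_NORETRY;
    return flags;
}
```

Requests of at most one page keep the default behaviour; only larger requests give up early on the first attempt.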
HPE-bug-id: LUS-11409
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I7ff8acfb6b467a4f5a7e61b2b8ec631bea89f8a5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49380
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Timothy Day [Thu, 8 Dec 2022 19:09:38 +0000 (19:09 +0000)]
LU-15626 tests: Fix shellcheck warning for acceptance-small
This patch addresses the warning and style suggestions
reported by shellcheck. The patch also ensures that
all spaces have been converted to tabs, and the script now
logs which test suites are about to be run.
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia88758d0bf89e7d0aa67dfae31d969c780507b88
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49350
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Wed, 7 Dec 2022 04:04:42 +0000 (23:04 -0500)]
LU-16335 test: add fail_abort_cleanup()
Add helper fail_abort_cleanup() to unlink test directories (call lfs
rm_entry if directory is broken) after fail_abort because after
LU-16159 update logs will be canceled upon recovery abort, which may
leave broken directories.
Update replay-single.sh in places where fail_abort is called and
directory may become broken.
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single,replay-single,replay-single,replay-single
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I260689b1a6fa5b0b4db5aab5095cb062ae57d612
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49335
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Xinliang Liu [Wed, 26 Oct 2022 08:58:14 +0000 (08:58 +0000)]
LU-16322: build: Add client build support for openEuler
The kernel of current openEuler LTS version 22.03 is based on Linux
5.10.0 which is already supported in Lustre master. Thus we only need
to add build support for openEuler client.
Although openEuler Linux is not compatible with RHEL, it uses the
same package manager and tools (DNF) as RHEL and follows the package
naming of RHEL. Thus we can reuse most of the RHEL build logic/scripts
for building the openEuler client.
openEuler Linux is becoming the mainstream Linux distro in China, so
adding support for it makes sense for users. For more details
see: https://www.openeuler.org/en/.
Test-Parameters: trivial
Change-Id: I8e8b59d36e566c6e49b12346c2fde985153f014d
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49187
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Cyril Bordage [Mon, 31 Oct 2022 11:08:44 +0000 (12:08 +0100)]
LU-16279 lnet: improve error reporting in LUTF
When an error occurs without using an RPC, the error report lacks
a traceback and lists only the exception itself. This patch adds the
traceback to the error string reported by R().
Test-Parameters: @lnet
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I3fe5f7628a3f96aeb7941ec75db6b6b5e49e9d84
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48987
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mikhail Pershin [Tue, 25 Oct 2022 15:34:45 +0000 (18:34 +0300)]
LU-16268 mdd: set effective changelog mask correctly
When the changelog mask is changed from MINMASK to a particular
value, the recalculation is skipped, so the effective mask could
stay unchanged, contrary to expectations.
The patch adds a check of whether the old mask is MINMASK
to decide if mask recalculation is needed.
Test 160o is extended to cover this issue.
Fixes: ffe259f81cda ("LU-13055 changelog: use default mask if server has no mask")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia3c93e19daeb71ff1042ebdb555e918faf89f844
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48961
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Shaun Tancheff [Thu, 10 Nov 2022 06:53:47 +0000 (00:53 -0600)]
LU-16117 build: Avoid excessive modpost warnings
To avoid modpost warnings about duplicate symbols, do not add
the LINUX_OBJ kernel symbols to the KBUILD_EXTRA_SYMBOLS list.
Test-Parameters: trivial
HPE-bug-id: LUS-11192
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I85fc90661efcb66e4aa39c9bd3393dbe4f7ba5eb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48362
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Shaun Tancheff [Tue, 15 Nov 2022 04:06:05 +0000 (22:06 -0600)]
LU-16113 build: Fix configure tests for lock_page_memcg
Linux commit v5.15-12273-gab2f9d2d3626
mm: unexport {,un}lock_page_memcg
The build fails when lock_page_memcg exists but is not exported.
Adjust usage of [un]lock_page_memcg() to vvp_[un]lock_page_memcg() and
define the mapping accordingly to avoid the compile error.
Test-Parameters: trivial
HPE-bug-id: LUS-11189
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I18029d078a00a0b21a14721bcdf953939b4118a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49144
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Shaun Tancheff [Wed, 9 Nov 2022 13:33:40 +0000 (07:33 -0600)]
LU-16116 build: Configure tests for rhltable, bitmap_alloc...
rhel8.6 with kernel 5.18 breaks a couple of compile tests
struct rhltable test fails with:
... error: ‘hlt’ is used uninitialized in this function
[-Werror=uninitialized]
rdma_wr() test fails with:
... error: assignment discards ‘const’ qualifier from pointer
target type [-Werror=discarded-qualifiers]
wr = rdma_wr(NULL);
nla_strdup() test fails due to unused variable 'tmp'
Test-Parameters: trivial
HPE-bug-id: LUS-11191
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib2b1d223ac809cea157158fe35fd2535b04367df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48361
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Shaun Tancheff [Sat, 12 Nov 2022 09:29:42 +0000 (03:29 -0600)]
LU-16118 build: Use pde_data() when available
Linux commit v5.16-11573-g6dfbbae14a7b
introduce pde_data() and
Linux commit v5.16-11574-g359745d78351
remove PDE_DATA()
Use PDE_DATA() when pde_data is not available.
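A compatibility mapping of this kind might look like the sketch below. It is only an illustration under stated assumptions: SKETCH_HAVE_PDE_DATA is a hypothetical configure result, and struct sketch_inode and the PDE_DATA() stub stand in for the kernel's types and helper so that the fragment compiles in user space.

```c
/* Stand-ins for the kernel inode and the older PDE_DATA() helper. */
struct sketch_inode {
    void *data;
};

static void *PDE_DATA(const struct sketch_inode *inode)
{
    return inode->data;
}

/*
 * SKETCH_HAVE_PDE_DATA is a hypothetical configure macro: when the
 * kernel does not provide pde_data(), map it onto PDE_DATA().
 */
#ifndef SKETCH_HAVE_PDE_DATA
#define pde_data(inode) PDE_DATA(inode)
#endif
```

Callers can then use pde_data() unconditionally, and only the mapping depends on the kernel version.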
Test-Parameters: trivial
HPE-bug-id: LUS-11193
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ida570462acd466a251adc81a14bc1fbf35d96b00
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48363
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Frank Sehr [Thu, 16 Jun 2022 18:40:06 +0000 (11:40 -0700)]
LU-13642 lnet: Allow IP specification
Allows selecting an interface by specifying an IP address in the NID.
All variations of interface and IP address are considered.
1 no interface and no IP address is specified: Select first interface
2 interface and no IP: Select main IP address
3 no interface and IP specified: Select first interface
that has the IP address
4 interface and IP specified: Verify that interface and IP match
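The four cases above can be sketched as a single lookup. This is a simplified illustration, not the socklnd code: struct sketch_iface and sketch_select_iface() are hypothetical, each interface carries a single address as text, and NULL means "not specified in the NID".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified interface record: a name and its main IP. */
struct sketch_iface {
    const char *name;
    const char *ip;
};

/*
 * Return the index of the interface to use, or -1 if nothing matches.
 * A NULL name or ip means that part was not specified.
 */
static int sketch_select_iface(const struct sketch_iface *ifs, int count,
                               const char *name, const char *ip)
{
    int i;

    for (i = 0; i < count; i++) {
        /* Cases 2 and 4: a given interface name must match. */
        if (name != NULL && strcmp(ifs[i].name, name) != 0)
            continue;
        /* Cases 3 and 4: a given IP address must match. */
        if (ip != NULL && strcmp(ifs[i].ip, ip) != 0)
            continue;
        /* Case 1 falls through here on the first interface. */
        return i;
    }
    return -1;
}
```

With both arguments NULL the first interface is chosen. With both set, a mismatch between interface and address yields -1, which corresponds to the verification in case 4.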
The change does not have any effect on current configurations and
will be active when the changes in lnetctl, YAML or
module parameter are applied.
This patch affects only the socklnd component. A macro is defined in
lnet-types.h to check if an IP address is set (IPV4 or IPV6).
Further IPV6 changes are not integrated.
For further reference please read
IP specification in LNet
https://wiki.whamcloud.com/display/LNet/IP+specification+in+LNet
Test-Parameters: trivial
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: Ifdf8f884ce1ee1fb1b97ca3121aa83efb46f8ef0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47660
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Cyril Bordage [Tue, 7 Dec 2021 22:14:43 +0000 (23:14 +0100)]
LU-15288 lnet: increase transaction timeout
In LU-13145, it was decided to increase the default transaction timeout
(LNET_TRANSACTION_TIMEOUT_DEFAULT) to 150s. However, the associated
patch set it to 50s. This change will also change
lnd_timeout (from 16 to 49).
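The change from 16 to 49 is consistent with lnd_timeout being derived as (transaction_timeout - 1) / (retry_count + 1) with a retry count of 2. Both the formula and the retry count are assumptions here, since the commit message does not state them. The helper name below is hypothetical.

```c
/* Assumed default retry count; not stated in the commit message. */
#define SKETCH_RETRY_COUNT 2

/* Assumed derivation of the LND timeout from the transaction timeout. */
static int sketch_lnd_timeout(int transaction_timeout)
{
    return (transaction_timeout - 1) / (SKETCH_RETRY_COUNT + 1);
}
```

Under these assumptions, a transaction timeout of 50 gives 49 / 3 = 16 and a timeout of 150 gives 149 / 3 = 49 (integer division).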
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I13a8b5d14230bb6e8936cb3e18540f19dbc62985
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45780
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Shaun Tancheff [Fri, 2 Dec 2022 10:19:59 +0000 (04:19 -0600)]
LU-16321 osd: Allow fiemap on kernel buffers
Linux commit v5.17-rc3-19-g967747bbc084
uaccess: remove CONFIG_SET_FS
With KERNEL_DS gone, Lustre needs an alternative way for fiemap to
copy extents to kernel-space memory.
Direct in-kernel calls to inode->f_ops->fiemap() can utilize
an otherwise unused flag on fiemap_extent_info fi_flags
to indicate the fiemap extent buffer is allocated in kernel space.
Include ldiskfs patches for ldiskfs_fiemap() to
define EXT4_FIEMAP_FLAG_MEMCPY and utilize it.
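The idea of selecting the copy method from a flag can be modelled in user space as below. This is a sketch under assumptions: the flag value, struct sketch_extent, and both helpers are hypothetical, and sketch_copy_to_user() merely stands in for the kernel's copy_to_user() while counting how often it is called.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical value; the real flag is defined by the ldiskfs patches. */
#define SKETCH_FIEMAP_FLAG_MEMCPY 0x80000000u

struct sketch_extent {
    unsigned long long fe_logical;
    unsigned long long fe_length;
};

/* Counts calls to the user-space copy stand-in below. */
static int sketch_user_copies;

/* Stand-in for copy_to_user(); returns 0 on success. */
static int sketch_copy_to_user(void *dst, const void *src, size_t len)
{
    sketch_user_copies++;
    memcpy(dst, src, len);
    return 0;
}

/* Store one extent, using a plain memcpy when the buffer is in kernel space. */
static int sketch_store_extent(unsigned int fi_flags,
                               struct sketch_extent *dest,
                               const struct sketch_extent *src)
{
    if (fi_flags & SKETCH_FIEMAP_FLAG_MEMCPY) {
        memcpy(dest, src, sizeof(*src));
        return 0;
    }
    if (sketch_copy_to_user(dest, src, sizeof(*src)) != 0)
        return -14; /* -EFAULT */
    return 0;
}
```

The in-kernel caller sets the flag and never goes through the user-space copy path, which is the behaviour set_fs(KERNEL_DS) used to provide implicitly.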
HPE-bug-id: LUS-11337
Fixes: d0337cab8e ("LU-14195 osd: don't use set_fs() for ->fiemap() calls.")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I7a8edb481833fd1bdcf7b6cd6e08397c1754baee
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49190
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Tue, 20 Dec 2022 20:24:25 +0000 (12:24 -0800)]
LU-14645 tests: test lfs setdirstripe with '/$'
This patch improves one of the lfs setdirstripe tests to
verify that dir name ending with '/' also works.
Test-Parameters: trivial mdscount=2 mdtcount=4 \
env=ONLY=24B testlist=sanity
Change-Id: I237d5a9ebad42cc0569aa1db487d0df147372316
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49463
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Thu, 8 Dec 2022 07:56:36 +0000 (23:56 -0800)]
LU-16373 tests: failover mds1 back to the primary server
This patch fixes recovery-small test 144a to failover
mds1 back to the primary server so that stack_trap can
set timeout parameter on the correct mds node.
Test-Parameters: trivial \
env=SLOW=yes,FAILURE_MODE=HARD,ONLY=144a \
clientcount=4 mdtcount=1 mdscount=2 osscount=2 \
austeroptions=-R failover=true iscsi=1 \
testlist=recovery-small
Change-Id: Idbfdb7b084c7edac8784008e0455f76632aa685b
Test-Parameters: trivial testlist=recovery-small
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49345
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Thu, 29 Dec 2022 08:21:32 +0000 (00:21 -0800)]
LU-16433 llite: check vvp_account_page_dirtied
This patch removes duplicated code from vvp_set_pagevec_dirty()
and checks vvp_account_page_dirtied to determine whether to fall back
to calling __set_page_dirty_nobuffers().
HAVE_ACCOUNT_PAGE_DIRTIED_EXPORT also needs to be checked because
vvp_account_page_dirtied is not defined if account_page_dirtied
is exported.
Test-Parameters: trivial clientdistro=el8.6 testlist=sanity
Test-Parameters: trivial clientdistro=el8.7 testlist=sanity
Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="130 244a" testlist=sanity
Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 101j 244a" testlist=sanity
Change-Id: I272033d7494a157145224b1b8ce999a80958aa6c
Fixes: 4bf090b811 ("LU-15959 kernel: new kernel [SLES15 SP4 5.14.21-150400.24.18.1]")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49512
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Shaun Tancheff [Fri, 16 Dec 2022 09:41:54 +0000 (03:41 -0600)]
LU-16120 build: Add support for kobj_type default_groups
Linux commit v5.1-rc3-29-gaa30f47cf666
kobject: Add support for default attribute groups to kobj_type
Linux commit v5.18-rc1-2-gcdb4f26a63c3
kobject: kobj_type: remove default_attrs
Switch to using kobj_type default_groups when it is available.
Provide support for default_attrs for older kernels.
HPE-bug-id: LUS-11196
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I43b03c67c22307293a2abc444aa1a73889ca09ee
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48365
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexander Boyko [Thu, 3 Nov 2022 11:23:20 +0000 (07:23 -0400)]
LU-16297 ptlrpc: don't panic during reconnection
ptlrpc_send_rpc() could race with ptlrpc_connect_import_locked()
in the middle of the assertion check, which leads to a spurious panic.
The assertion first checks
(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
then reconnect changes the import state and flags,
and then the second part is checked
(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
!(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT)))
MSGHDR_AT_SUPPORT is disabled during client reconnection.
Locking is undesirable in this hot path, so the fix changes
the assertion to a report.
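Turning such an assertion into a report can be sketched as follows. This is a user-space model, not the ptlrpc code: the SKETCH_* constants have made-up values, struct sketch_import flattens the real import structure, and the message text is illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical values; the real constants live in the Lustre headers. */
#define SKETCH_IMP_FULL          1
#define SKETCH_MSGHDR_AT_SUPPORT 0x1u
#define SKETCH_OBD_CONNECT_AT    0x2u

struct sketch_import {
    int imp_state;
    unsigned int imp_msghdr_flags;
    unsigned long long ocd_connect_flags;
};

/*
 * Evaluate the condition that used to be asserted. On failure, log a
 * message and return false instead of panicking.
 */
static bool sketch_check_at_support(const struct sketch_import *imp,
                                    bool at_off)
{
    bool ok = at_off || imp->imp_state != SKETCH_IMP_FULL ||
              (imp->imp_msghdr_flags & SKETCH_MSGHDR_AT_SUPPORT) ||
              !(imp->ocd_connect_flags & SKETCH_OBD_CONNECT_AT);

    if (!ok)
        fprintf(stderr, "import state and flags disagree, "
                "possibly racing with reconnect\n");
    return ok;
}
```

The check stays lock-free, so a racing reconnect can still produce a transient failure, but the failure is now only logged.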
HPE-bug-id: LUS-10985
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ifc9e413c679c3e8a4c8f4f541251bebabae41c82
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49029
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Wed, 14 Dec 2022 02:31:05 +0000 (18:31 -0800)]
LU-15935 tests: add version check to replay-dual test_33
This patch adds MDS version check to replay-dual test_33
to avoid interop test failure.
Test-Parameters: trivial \
serverjob=lustre-b2_15 serverbuildno=28 \
env=ONLY=33 testlist=replay-dual
Test-Parameters: trivial env=ONLY=33 testlist=replay-dual
Change-Id: I3ec665302a431d3c0f07bc819a08237dbc5b4309
Fixes: 1a79d395dd ("LU-15935 target: keep track of multirpc slots in last_rcvd")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49398
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Li Dongyang [Mon, 14 Nov 2022 13:28:37 +0000 (00:28 +1100)]
LU-8367 osp: wait for precreate on reformatted OST
We should wait for the precreate RPC to finish when we see a just
reformatted/replaced OST; otherwise the client could try
to access the object on the OST before it's created.
Do not use sync_trans when recreating the objects on the
reformatted/replaced OST.
Fix detecting a reformatted OST for FID_SEQ_NORMAL: for such
seqs the oid will be initialized as LUSTRE_FID_INIT_OID,
which is 1.
Change-Id: I4aebb9d573aa352dd7897e5f1129dc2117a084bb
Fixes: 63e17799a3 ("LU-8367 osp: enable replay for precreation request")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49151
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Aurelien Degremont [Wed, 8 Jun 2022 07:49:32 +0000 (07:49 +0000)]
LU-15921 tests: fix sanity-hsm 24c
Fix bad copy-paste in test sanity-hsm 24c causing
the test to save 3 different tunables, but actually
restoring the same one three times.
Also improve the code to support values including spaces.
Test-Parameters: trivial testlist=sanity-hsm,sanity-pcc
Fixes: 2042bce ("LU-9474 tests: rewrite copytool_setup to use stack_trap")
Fixes: f172b11 ("LU-10092 llite: Add persistent cache on client")
Change-Id: I34cc61515ebb862d5996f41cdb2055ac53ccac65
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47564
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Cyril Bordage [Mon, 31 Oct 2022 08:57:10 +0000 (09:57 +0100)]
LU-16277 lnet: fix bad parameter in LUTF
In SimpleLustreNode, the exception parameter is not passed to BaseTest,
so this parameter is not used when a remote agent is used.
Test-Parameters: @lnet
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: Ie458ef4a41dc059da8f069d8d62d365c21c9f25d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48985
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Mon, 12 Dec 2022 09:52:16 +0000 (12:52 +0300)]
LU-16384 tests: dump lustre log if DEBUG_RMMOD set
This simplifies local development by reusing existing code in
the lustre_rmmod script:
DEBUG_RMMOD=<logfile> sh sanity.sh will dump a text lustre log to <logfile>.
Use DEBUG_RMMOD=- to direct the lustre log to standard output.
Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8d72e1e9cecb354bcc5d41ab3cca5767a298c668
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49374
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Wed, 22 Jun 2022 12:25:56 +0000 (17:55 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck (3/5)
This patch fixes "error" issues reported by shellcheck
for file lustre/tests/test-framework.sh. This patch also
converts spaces to tabs.
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5c802e268e68edc118d89d86063a23bedf972013
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49437
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Tue, 13 Dec 2022 05:23:26 +0000 (10:53 +0530)]
LU-16386 utils: Improve mkfs.lustre.8 man page
This patch:
- Improves the Options section for the "--version" argument
- Adds the "--version" option to the Examples section
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7fd3e7f1ea9a313a33db5620a92a595f2c4bd36f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49384
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Tue, 20 Dec 2022 04:50:59 +0000 (20:50 -0800)]
LU-16348 tests: export TESTLOG_PREFIX and TESTNAME to rpc.sh
In Lustre test suites, if the remote function fails while
running do_rpc_nodes and error() is called,
then gather_logs() cannot gather logs with a correct
prefix name because the TESTLOG_PREFIX and TESTNAME variables
were not exported to rpc.sh.
Test-Parameters: trivial testlist=sanity,conf-sanity
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I2bdbca7f1886f376160a87293ef367f3a4a59f86
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49260
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Wed, 22 Jun 2022 12:47:30 +0000 (18:17 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck (4/5)
This patch fixes "error" issues reported by shellcheck
for file lustre/tests/test-framework.sh. This patch also
converts spaces to tabs.
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I84b43cba5b50d6618bee756d2f3c7f59ab0d74da
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49438
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexander Boyko [Mon, 28 Nov 2022 14:20:05 +0000 (09:20 -0500)]
LU-16271 ptlrpc: fix eviction right after recovery
When recovery is finished, exports could time out, since the
recovery thread waits for stale clients and no more requests
come after the final ping. This was handled by updating export timers
after final ping processing. LU-16002 introduced fast evictions
and brought an error: eviction right after recovery.
Process export timer updates before obd_recovering is cleared.
Fixes: 6bdeda7afe ("LU-16002 ptlrpc: reduce pinger eviction time")
Test-Parameters: testlist=replay-single env=ONLY=89,ONLY_REPEAT=20
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ibf3b2f632d6d3aa1de57038fdecbec38cf9a97cf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49257
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Wed, 28 Dec 2022 18:08:20 +0000 (10:08 -0800)]
LU-16434 tests: replace '-m' with '-i' in sanity/230j
In lfs_setdirstripe(), '-m' was originally used for '--mode'.
Fix sanity test_230j to replace '-m 0' with '-i 0' to force
directory creation on MDT0000 as the test expected.
Test-Parameters: trivial mdscount=2 mdtcount=4 \
env=ONLY=230j testlist=sanity
Change-Id: I10d435719f4b29ec47fa06c478caee9fcc8134a5
Fixes: 8deea7888c ("LU-11508 mdt: reject DoM file migration")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49523
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Oleg Drokin [Sat, 24 Dec 2022 03:47:34 +0000 (22:47 -0500)]
New tag 2.15.53
Change-Id: I93c2e581fd13b3d233030ce3b178c23059276b01
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Cyril Bordage [Sat, 10 Dec 2022 00:51:16 +0000 (01:51 +0100)]
LU-16378 lnet: handles unregister/register events
When the network is restarted, devices are unregistered and then
registered again. When a device registers using an index that is
different from the previous one (before the network was restarted), LNet
ignores it. Consequently, this device remains with its link in the fatal state.
To fix that, we catch unregistering events to clear the saved index
value, and when a registering event comes, we save the new value.
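The bookkeeping described above can be modelled with a small state machine. This is a simplified sketch that tracks a single device: the event names, struct sketch_ni, and the handler are hypothetical, and the real code reacts to kernel netdevice notifier events and matches devices more carefully.

```c
/* Hypothetical event codes for the sketch. */
enum sketch_event {
    SKETCH_UNREGISTER,
    SKETCH_REGISTER,
    SKETCH_LINK_UP
};

#define SKETCH_NO_INDEX (-1)

struct sketch_ni {
    int saved_ifindex; /* index of the device this NI is bound to */
    int link_up;       /* 0 models the fatal link state */
};

static void sketch_handle_event(struct sketch_ni *ni, enum sketch_event ev,
                                int ifindex)
{
    switch (ev) {
    case SKETCH_UNREGISTER:
        /* Forget the old index so a new one can be adopted later. */
        if (ni->saved_ifindex == ifindex) {
            ni->saved_ifindex = SKETCH_NO_INDEX;
            ni->link_up = 0;
        }
        break;
    case SKETCH_REGISTER:
        /* Save the new index only if the old one was cleared. */
        if (ni->saved_ifindex == SKETCH_NO_INDEX)
            ni->saved_ifindex = ifindex;
        break;
    case SKETCH_LINK_UP:
        /* Events for other indexes are still ignored. */
        if (ni->saved_ifindex == ifindex)
            ni->link_up = 1;
        break;
    }
}
```

Without the unregister and register cases, a link-up event carrying the new index would never match the stale saved index, which is the failure described above.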
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I17e93a1103d588f3e630a9c7446b345f4d472b97
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49375
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Thu, 1 Dec 2022 08:17:00 +0000 (03:17 -0500)]
LU-16335 build: remove _GNU_SOURCE dependency in lustre_user.h
The lustre_user.h header uses the non-standard strchrnul() function
in userspace. This always causes the LC_IOC_REMOVE_ENTRY configure
check to fail, so that "lfs rm_entry" always returns -ENOTSUP.
Implement an alternative approach to avoid external dependencies on
the lustre_user.h header. Also, LC_IOC_REMOVE_ENTRY is itself
unnecessary, the code can check for LL_IOC_REMOVE_ENTRY directly.
Replace the NFS-specific -ENOTSUP error return code with -EOPNOTSUPP.
Fix the compile test_400[ab] checks to not use "-std=c99" to verify
that the uapi headers are usable without this dependency.
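One portable way to get strchrnul() behaviour without _GNU_SOURCE is a small local helper such as the one below. This is only a sketch of the general technique: the name sketch_strchrnul() is hypothetical, and the patch may well use a different approach in lustre_user.h.

```c
#include <stddef.h>

/*
 * Return a pointer to the first occurrence of c in s, or to the
 * terminating NUL if c does not occur. Uses only standard C.
 */
static const char *sketch_strchrnul(const char *s, int c)
{
    while (*s != '\0' && *s != (char)c)
        s++;
    return s;
}
```

Unlike strchr(), the helper never returns NULL, so callers can compute the length of the leading component as the returned pointer minus s without a separate check.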
Fixes: b59835f8b6 ("LU-13903 utils: have liblustreapi support Linux client")
Fixes: 7a7309fa84 ("LU-13274 uapi: make lustre UAPI headers C99 compliant")
Fixes: 6331eadbd6 ("LU-15420 uapi: avoid gcc-11 -Werror=stringop-overread")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If42743a2148c317b8a9b701ceb5d08bac5149f5f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49328
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Tue, 13 Dec 2022 07:01:06 +0000 (00:01 -0700)]
LU-16390 tests: check Lustre filefrag in sanity-flr/49a
Check that a Lustre-patched filefrag is installed when running
sanity-flr test_49a.
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic909ea4ca160d47480004f53a96ce7539ce5076c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49386
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Arshad Hussain [Mon, 12 Dec 2022 14:42:44 +0000 (09:42 -0500)]
LU-16386 mkfs: Handle --version argument correctly
Running mkfs.lustre with --version or -V argument
fails instead of printing the version. This patch
fixes the error.
Without patch:
--------------
$ ./lustre/utils/mkfs.lustre --version
usage: mkfs.lustre <target type> [--backfstype=ldiskfs]
<snip>
With patch:
-----------
$ ./lustre/utils/mkfs.lustre --version
mkfs.lustre 2.15.52_175_ge7aa83d
Test-Parameters: trivial fstype=zfs testlist=sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I4d4d1144d669fce8b02e9f8c3fb5f45f68b337b4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49379
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mr NeilBrown [Fri, 9 Dec 2022 05:31:13 +0000 (16:31 +1100)]
LU-14073 ldiskfs: don't test LDISKFS_IOC_FSSETXATTR
EXT4_IOC_FSSETXATTR was removed upstream in Linux 5.9, commit
cb29a02d3a9d ("ext4: use generic names for generic ioctls").
So we cannot use it to test if project quotas are supported.
Instead test if EXT4_MAXQUOTAS is 3. This was changed to 3 upstream
in the commit immediately before EXT4_IOC_FSSETXATTR was added, so it
is effectively the same test.
Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I88c51c03959ebe98cd5066596f5158fac570a625
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49353
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Thu, 8 Dec 2022 18:43:57 +0000 (11:43 -0700)]
LU-16376 obdclass: NUL terminate long jobid strings
It appears that some jobid names can be sent that are using the full
32-byte size, rather than containing an embedded NUL terminator. This
caused errors in lprocfs_job_stats_log() when it overflowed.
If lustre_msg_get_jobid() does not find a NUL terminator
within the buffer, add one, so that the rest of the code doesn't
have to deal with unterminated strings.
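Handling a fixed-size field that may lack a terminator can be sketched as below. This is one possible approach under assumptions, not the patch itself: it copies into a destination one byte larger than the field, whereas the actual fix may terminate in place. SKETCH_JOBID_SIZE mirrors the 32-byte size mentioned above, and the helper name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_JOBID_SIZE 32 /* full field size mentioned above */

/*
 * Copy a jobid field that may use all SKETCH_JOBID_SIZE bytes without
 * a terminator. dst must hold SKETCH_JOBID_SIZE + 1 bytes.
 * Returns the length of the resulting string.
 */
static size_t sketch_copy_jobid(char *dst, const char *field)
{
    size_t len = 0;

    /* Never read past the end of the fixed-size field. */
    while (len < SKETCH_JOBID_SIZE && field[len] != '\0')
        len++;
    memcpy(dst, field, len);
    dst[len] = '\0';
    return len;
}
```

The bounded scan is the important part: calling strlen() directly on the field would read past its end whenever all 32 bytes are in use.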
This potentially exposes a larger issue that other places may not be
handling the unterminated string properly either, which needs to be
addressed separately on both the client and server. Truncating the
jobid to 31 chars only on the client does not totally solve the issue,
since there will still be older clients that are not doing this, so
the server needs to handle this in any case.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4c05fabdacb6a0bbf6477d3601a628fe1f3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49351
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Thu, 1 Dec 2022 19:18:31 +0000 (19:18 +0000)]
LU-14707 tests: Bashify more scripts for Ubuntu et al.
Some scripts that are not POSIX sh are being
invoked using sh. The scripts should be called
using the shell listed in the shebang.
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7233ce56df95a5b8698b39872e6118a4fa1a029a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49296
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Thu, 6 Oct 2022 11:40:41 +0000 (07:40 -0400)]
LU-15014 osc: Fix possible null pointer
Change init to fix possible null pointer access.
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id1bee8b5ea5fb92a8831992ad44c487c69d52e1e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44975
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Thu, 13 Oct 2022 06:05:04 +0000 (00:05 -0600)]
LU-16231 misc: rename lprocfs_stats functions
Rename lprocfs_{alloc,register,clear,free}_stats() to be
lprocfs_stats_*() so these functions can be found more easily
in relation to struct lprocfs_stats.
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I671284a86ee2a1fd3c58da75923f9467e72540e5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48847
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ellis Wilson <elliswilson@microsoft.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexey Lyashkov [Wed, 14 Sep 2022 19:59:11 +0000 (22:59 +0300)]
LU-16157 lnet: lst read-outside of allocation
lnet_selftest expects some parameters from userspace,
but they are never sent. This caused a read outside of the
allocation, such as:
BUG: KASAN: slab-out-of-bounds in lstcon_testrpc_prep+0x19e7/0x1bb0
Read of size 4 at addr ffff8888bbaa866c by task lt-lst/6371
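A generic guard against this class of bug is to check the supplied
length before reading a parameter block. The sketch below is an
illustration under assumptions, not the fix in this patch: struct
demo_test_param and demo_get_param() are hypothetical and do not
reflect the real lnet_selftest structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block; not the real lnet_selftest layout. */
struct demo_test_param {
	uint32_t flags;
	uint32_t size;
};

/*
 * Copy a parameter block out of a caller-supplied buffer only if the
 * buffer is large enough to hold it. Returns 0 on success and -EINVAL
 * if the buffer is missing or too short.
 */
static int demo_get_param(const void *buf, size_t buf_len,
			  struct demo_test_param *out)
{
	assert(out != NULL);

	if (buf == NULL || buf_len < sizeof(*out))
		return -EINVAL;

	memcpy(out, buf, sizeof(*out));
	return 0;
}
```

Rejecting a short buffer up front means later code never reads past
the end of the allocation.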
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I2a98e60c4be65c49fa9da4b418e50f1c7309b69d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48547
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>