Whamcloud - gitweb
fs/lustre-release.git
7 months ago LU-17000 coverity: Fix Resource Leak(0) 72/52272/4
Arshad Hussain [Tue, 5 Sep 2023 10:32:15 +0000 (16:02 +0530)]
LU-17000 coverity: Fix Resource Leak(0)

This patch fixes the resource leak errors reported
by a Coverity run.

CoverityID: 339696 ("Resource Leak"): liblustreapi_layout.c
CoverityID: 397918 ("Resource Leak"): lsnapshot.c
CoverityID: 397894 ("Resource Leak"): obd.c
CoverityID: 397851 ("Resource Leak"): lfs.c
CoverityID: 397832 ("Resource Leak"): liblustreapi.c
CoverityID: 397772 ("Resource Leak"): liblustreapi_utils.c
CoverityID: 397721 ("Resource Leak"): obd.c

Test-Parameters: trivial fstype=zfs testlist=sanity,conf-sanity,sanity-lsnapshot
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5c0152014f987264df17fac78390a2afc12c9255
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52272
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-17087 lmv: update stale tgt statfs every 1 hour 70/52270/5
Lai Siyao [Mon, 4 Sep 2023 12:45:34 +0000 (08:45 -0400)]
LU-17087 lmv: update stale tgt statfs every 1 hour

Some tgt statfs may not be initialized upon mount due to network
issues. If the filesystem is imbalanced, these tgts won't be chosen
for directory creation because their bavail and ffree are 0.

If the MDT is chosen by QoS, update any tgt statfs that is one hour
overdue; otherwise check and update the statfs of the tgt that is
chosen.
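
The refresh policy described here can be modelled roughly as follows. This is
an illustrative Python sketch, not the lmv code: the names Target,
STALE_SECONDS and targets_to_refresh are hypothetical, and the model only
captures which targets get a statfs refresh.

```python
from dataclasses import dataclass

STALE_SECONDS = 3600  # one hour, per the commit message


@dataclass
class Target:
    index: int
    last_statfs: float  # time of the last successful statfs, 0 if never


def targets_to_refresh(targets, chosen, chosen_by_qos, now):
    """Return the targets whose statfs should be refreshed (hypothetical model)."""
    if chosen_by_qos:
        # QoS path: refresh every target whose statfs is at least an hour old,
        # so targets that were never initialized can become candidates again.
        return [t for t in targets if now - t.last_statfs >= STALE_SECONDS]
    # Non-QoS path: only the target that was actually chosen is refreshed.
    return [chosen]
```

A target whose statfs was never initialized has last_statfs of 0 in this
model, so it is always picked up on the QoS path.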

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I06af8b8bd342f66cb794471df3ee0f3b127ffe05
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52270
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
7 months ago LU-16954 llite: add SB_I_CGROUPWB on super block for cgroup 55/51955/7
Qian Yingjin [Wed, 16 Aug 2023 04:02:22 +0000 (00:02 -0400)]
LU-16954 llite: add SB_I_CGROUPWB on super block for cgroup

Cgroup support can be enabled per super_block by setting
SB_I_CGROUPWB in ->s_iflags.
Cgroup writeback requires support from both the bdi and
filesystem.
This patch adds SB_I_CGROUPWB flag on super block for Lustre.
This is required by the subsequent patch series to support
cgroup in Lustre.

Adding this flag to the Lustre super block causes a remount
failure in Maloo testing on the Ubuntu 22.04 v5.15 kernel due to
a duplicate filename (sysfs) for the bdi device.
To avoid the remount failure, we explicitly unregister the sysfs
entry for the @bdi.

Test-Parameters: clientdistro=ubuntu2204 testlist=sanity-sec
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I7fff4f26aa1bfdb0e5de0c4bdbff44ed74d18c2d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51955
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
7 months ago LU-13308 mdc: support additional flags for OBD_IOC_CHLG_POLL ioctl 61/52361/4
James Simmons [Tue, 19 Sep 2023 13:40:24 +0000 (09:40 -0400)]
LU-13308 mdc: support additional flags for OBD_IOC_CHLG_POLL ioctl

Currently the mdc kernel code expects the flag argument for
OBD_IOC_CHLG_POLL ioctl to only be CHANGELOG_FLAG_FOLLOW. With
IPv6 we need to send a request to the kernel to present the NID
in the struct lnet_nid format since we can't just send large NIDs
to user land if we are using older tools.

With the newer user land tools we will be sending an expanded flag
which the current kernel changelog code can't handle. Rework the
code to support the new flag if we end up with the case of newer
user land tools and an older kernel. This code will also maintain
backwards compatibility with the older user land tools.

Change-Id: I26a80d30ce2ebf2075a2a8f510ff81d6b0b8d848
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52361
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-16134 utils: adds lctl apply_yaml 19/48419/5
Alexander Boyko [Fri, 2 Sep 2022 08:41:32 +0000 (04:41 -0400)]
LU-16134 utils: adds lctl apply_yaml

The command set_param -F parses a YAML file of settings and runs
set_param -P for each of them. Many types of settings are specific
to a device and are of conf_param type. When such settings go
through set_param -P, all nodes try to apply them and many errors
occur, for example:
systemd-udevd[568906]: Process '/usr/sbin/lctl set_param
'osp.kjcf04-OST0003-osc-MDT0000.resend_count=43''
failed with exit code 2.

The patch adds handling for conf_param events in the YAML file,
and introduces lctl apply_yaml for both types of event.
YAML example:
- {device: testfs-MDT0001, event: conf_param, index: 76, parameter:
   lod.qos_threshold_rr=100}
- { index: 17, event: set_param, device: general,
    parameter: jobid_var, value: procname_uid }
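
To make the two event types concrete, here is a minimal Python sketch of how
records like the ones above might be dispatched. It is not the lctl
implementation: record_to_command is a hypothetical helper, the records are
assumed to be already parsed into dictionaries, and the resulting command
lines are an assumption about what each event corresponds to.

```python
def record_to_command(record):
    """Map one parsed YAML record to an lctl argument list (illustrative only)."""
    event = record["event"]
    if event == "conf_param":
        # In the example record the parameter already carries "name=value",
        # and the setting is bound to a specific device.
        return ["lctl", "conf_param", f'{record["device"]}.{record["parameter"]}']
    if event == "set_param":
        # General settings carry the value in a separate field.
        return ["lctl", "set_param", "-P", f'{record["parameter"]}={record["value"]}']
    raise ValueError(f"unsupported event: {event}")


records = [
    {"device": "testfs-MDT0001", "event": "conf_param", "index": 76,
     "parameter": "lod.qos_threshold_rr=100"},
    {"index": 17, "event": "set_param", "device": "general",
     "parameter": "jobid_var", "value": "procname_uid"},
]
commands = [record_to_command(r) for r in records]
```

Keeping the device name in the conf_param case is the point of the split:
a device-specific setting is no longer applied blindly on every node.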

Test-Parameters: trivial
HPE-bug-id: LUS-11116
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iec3b1f14b9ddb85ef3e110bbc4467d0d6c80c136
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48419
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
7 months ago LU-14470 dne: striped mkdir replay by client request 85/47385/22
Lai Siyao [Sun, 21 Nov 2021 09:53:09 +0000 (04:53 -0500)]
LU-14470 dne: striped mkdir replay by client request

If all MDTs involved in a striped mkdir were rebooted, or MDT
recovery was aborted, the mkdir is replayed by client request.
To replay such a mkdir correctly, pack the directory LMV in the
mkdir reply and save it from the reply into the request; the MDS
should then use this layout to replay the mkdir.

In the MDT recovery abort case, the original mkdir may have been
partially executed, so mkdir replay should check for the cases below
and not treat them as errors:
* name entry is found on parent directory on remote MDT.
* stripe exists on remote MDT.

For backward compatibility, add the MDS_MKDIR_LMV flag to indicate
that a client requires the directory LMV in the mkdir reply.

Updated replay-single 100c since striped mkdir can replay now.

Updated recovery-small 130 since create fetches layout now.

Added replay-single 100e.

Test-Parameters: mdscount=2 mdtcount=4 testlist=racer,racer,racer,racer,racer
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If0cc8f4aebbe55cc28786d6b4198dbb57743feb3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47385
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-13805 clio: bounce buffer for unaligned DIO 16/45616/131
Patrick Farrell [Thu, 27 Jul 2023 18:36:30 +0000 (14:36 -0400)]
LU-13805 clio: bounce buffer for unaligned DIO

Direct I/O must normally be page aligned both in terms of
I/O size and memory alignment.  This is because the I/O
must be page aligned before being written to disk.  This
is also true for buffered I/O, but the I/O is aligned
using the page cache.

In recent versions of Lustre, direct I/O is significantly
faster than buffered I/O, due to lower overhead for page
management.  Thus it is desirable to be able to do more
I/O as direct I/O.

This patch allows unaligned direct I/O by creating a buffer
inside the kernel and aligning the I/O by copying in to
this aligned buffer.  Because the main cost of buffered I/O
is locking in the page cache rather than memcopy(), this is
still significantly faster than buffered I/O (though slower
than normal direct I/O).

This will eventually allow us to convert buffered I/O to
direct I/O when doing so would increase performance.

Here are some comparative benchmarks using IOR, all single
process.

UDIO is unaligned DIO.

io size    | 1M         | 4M         | 16M        | 64M
-----------+------------+------------+------------+-----------
BIO Write  | 1502 MiB/s | 1382 MiB/s | 1683 MiB/s | 1677 MiB/s
BIO Read   | 2169 MiB/s | 1902 MiB/s | 2131 MiB/s | 1955 MiB/s
DIO Write  | 1010 MiB/s | 2778 MiB/s | 5905 MiB/s | 7917 MiB/s
DIO Read   | 893 MiB/s  | 2657 MiB/s | 4724 MiB/s | 7579 MiB/s
UDIO Write | 848 MiB/s  | 1666 MiB/s | 2117 MiB/s | 2243 MiB/s
UDIO Read  | 933 MiB/s  | 2412 MiB/s | 3690 MiB/s | 5370 MiB/s

At I/O sizes of 4M and above, unaligned DIO outperforms buffered
writes and buffered reads, but is of course slower than aligned DIO.
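
The relative gain can be read off the table directly. The short Python sketch
below only restates the BIO and UDIO rows from the table and divides them;
the helper name udio_speedup is made up for illustration.

```python
SIZES = ["1M", "4M", "16M", "64M"]
# Throughput in MiB/s, copied from the benchmark table.
BIO = {"write": [1502, 1382, 1683, 1677], "read": [2169, 1902, 2131, 1955]}
UDIO = {"write": [848, 1666, 2117, 2243], "read": [933, 2412, 3690, 5370]}


def udio_speedup(op):
    """Return {io_size: UDIO throughput / BIO throughput} for 'read' or 'write'."""
    return {size: round(u / b, 2) for size, u, b in zip(SIZES, UDIO[op], BIO[op])}


write_ratio = udio_speedup("write")
read_ratio = udio_speedup("read")
```

A ratio below 1 means buffered I/O was faster for that size, which in this
data set is only the case for the 1M column.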

Notice on this node the best case DIO performance is
~8 GiB/s.  On a node with 12 GiB/s best case DIO, best case
UDIO read is 8 GiB/s and best case UDIO write is 2.5 GiB/s.

This is because UDIO read is fully parallel, whereas UDIO write is
not.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I7eeebf9a608f006c8095b95f0677adb99f19d640
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45616
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-13805 llite: unaligned direct_rw_pages 93/49993/33
Patrick Farrell [Tue, 14 Feb 2023 18:47:54 +0000 (13:47 -0500)]
LU-13805 llite: unaligned direct_rw_pages

Add support for ll_direct_rw_pages to handle unaligned
IO by allowing both the first and last page to be partial
pages.
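
As an illustration of what "partial first and last page" means, the Python
sketch below splits an arbitrary byte range into per-page pieces. It is not
the ll_direct_rw_pages code: split_into_pages and the 4096-byte PAGE_SIZE are
assumptions chosen for the example.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def split_into_pages(offset, length, page_size=PAGE_SIZE):
    """Yield (page_index, offset_in_page, bytes_in_page) for a byte range."""
    end = offset + length
    pos = offset
    while pos < end:
        page_index = pos // page_size
        in_page = pos % page_size
        # Never cross a page boundary and never run past the end of the I/O.
        count = min(page_size - in_page, end - pos)
        yield page_index, in_page, count
        pos += count


pieces = list(split_into_pages(100, 8192))
```

Only the first piece can start at a non-zero offset within its page and only
the last piece can end short of a page boundary; every piece in between
covers a whole page.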

This has been broken off from the main unaligned DIO patch
to make it more reviewable.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I055105589d5416fe6aa82fb6a087db7b8b38c8d1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
7 months ago LU-16976 ldiskfs: add support for openEuler 22.03 SP2 53/51753/6
Xinliang Liu [Tue, 18 Jul 2023 03:42:19 +0000 (03:42 +0000)]
LU-16976 ldiskfs: add support for openEuler 22.03 SP2

Add ldiskfs server support for oe2203sp2.
Sync with ldiskfs-5.14-rhel9.2.series adding missing patches.
Also refine openEuler lbuild scripts.

Change-Id: I91841a7140a9f8f3182a4a329b9f04639a85e94d
Test-Parameters: trivial
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-16713 llite: remove unused ccc_unstable_waitq 85/52485/2
Oleg Drokin [Sat, 23 Sep 2023 06:07:00 +0000 (02:07 -0400)]
LU-16713 llite: remove unused ccc_unstable_waitq

Previous patch removed the only waiter on this waitq, so there's
no point in having it around.

Change-Id: Iceb1da2fb8958ae0bd7b0f4241cb263d02ca6dbd
Test-parameters: trivial
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52485
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
7 months ago LU-16713 llite: writeback/commit pages under memory pressure 44/50544/26
Qian Yingjin [Tue, 6 Jun 2023 08:11:30 +0000 (15:11 +0700)]
LU-16713 llite: writeback/commit pages under memory pressure

Lustre buffered I/O does not work well with restrictive memcg
control. This may result in OOM when the system is under memory
pressure.

Lustre has implemented unstable pages support similar to NFS,
but it is disabled by default for performance reasons.

In Lustre, a client pins the cache pages for writes until the
write transaction is committed on the server (OST), even if
writeback of these pinned pages has finished. The server starts
a transaction commit either because the commit interval (5
seconds by default) for the backend storage (i.e. OST/ldiskfs)
has been reached or because there is not enough room in the
journal for a particular handle to start. Until the write
transaction has been committed and the client notified, these
pages are pinned and cannot be flushed in any way by the kernel.
This means that when a client hits memory pressure there can
be a large number of unfreeable (pinned and uncommitted) pages,
so the application on the client ends up OOM-killed because
it cannot free memory when asked to.
This is particularly common with cgroups, because when cgroups
are in use the memory limit is generally much lower than the
total system memory limit and is more likely to be reached.

The Linux kernel has a mature memory reclaim mechanism to avoid
OOM even with cgroups.
After performing a dirtying write for a page, the kernel calls
@balance_dirty_pages(). If the dirtied and uncommitted pages
are over the background threshold for the global memory limits or
memory cgroup limits, the writeback threads are woken to perform
some writeout.
When allocating a new page for I/O under memory pressure, the
kernel will try direct reclaim and then allocate. For cgroups,
it will try to reclaim pages from the memory cgroup that is over
its soft limit. The slow page allocation path with direct reclaim
will call @wakeup_flusher_threads() with WB_REASON_VMSCAN to start
writeback of dirty pages.

Our solution uses the page reclaim mechanism in the kernel
directly.
On completion of page writeback (in @brw_interpret), call
@__mark_inode_dirty() to add the dirty inode, which has pinned
uncommitted pages, to the @bdi_writeback. Each memory cgroup has
its own @bdi_writeback to control the writeback for buffered
writes within it.
Thus under memory pressure, the writeback threads will be woken
up and will call @ll_writepages() to write out data.
For background writeout (over the background dirty threshold) or
writeback with WB_REASON_VMSCAN for direct reclaim, we first
flush dirtied pages to the OSTs, then sync them on the OSTs and
force a commit of these pages so they are released quickly.

When a cgroup is under memory pressure, the kernel asks for
writeback and then does an fsync to the OSTs. This commits the
uncommitted/unstable pages, and the kernel can then finally
free them.

Some performance results follow.
The client has 512G memory in total.
1. dd if=/dev/zero of=$test bs=1M count=$size
I/O size          128G  256G  512G  1024G
unpatched (GB/s)  2.2   2.2   2.1   2.0
patched (GB/s)    2.2   2.2   2.1   2.0
There is no performance regression after enabling unstable page
accounting with the patch.

2. One process under different memcg limits, with total I/O
size varying from 2X memlimit to 0.5X memlimit:
dd if=/dev/zero of=$file bs=1M count=$((memlimit_mb * time))
memcg limits         1G   4G   16G  64G
2X memlimit (GB/s)   1.7  1.6  1.8  1.7
1X memlimit (GB/s)   1.9  1.9  2.2  2.2
.5X memlimit (GB/s)  2.3  2.3  2.2  2.3
Without this patch, dd with I/O size > memcg limit will be
OOM-killed.

3. Multiple cgroups testing:
8 cgroups in total, each with a memory limit of 8G.
Run dd write in each cgroup with an I/O size of 2X the memory
limit (16G).
17179869184 bytes (17 GB, 16 GiB) copied, 12.7842 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.7889 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.9504 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.9577 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.4066 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.5397 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.5769 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.6605 s, 1.3 GB/s

4. Two dd writers: one (A) is under memcg control and the other
(B) is not. The total data written is 128G. The memcg limit varies
from 1G to 128G.
cmd: ./t2p.sh $memlimit_mb
memlimit  dd writer (A)  dd writer (B)
1G        1.3GB/s        2.2GB/s
4G        1.3GB/s        2.2GB/s
16G       1.4GB/s        2.2GB/s
32G       1.5GB/s        2.2GB/s
64G       1.8GB/s        2.2GB/s
128G      2.1GB/s        2.1GB/s

The results demonstrate that the process with memcg limits
has almost no impact on the performance of the process without
limits.

Test-Parameters: clientdistro=el8.7 testlist=sanity env=ONLY=411b,ONLY_REPEAT=10
Test-Parameters: clientdistro=el9.1 testlist=sanity env=ONLY=411b,ONLY_REPEAT=10
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I7b548dcc214995c9f00d57817028ec64fd917eab
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50544
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
7 months ago LU-17015 obdclass: new primitives for upcall cache 89/52389/2
Sebastien Buisson [Fri, 15 Sep 2023 11:23:19 +0000 (13:23 +0200)]
LU-17015 obdclass: new primitives for upcall cache

This patch adds 2 new primitives to the upcall cache mechanism:
- upcall_cache_get_entry_raw: get a ref on an existing entry;
- upcall_cache_update_entry: modify expiry time and state of an entry.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4825f09ae807abb52ebe0e24719dcd915e8c8aef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52389
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-16699 osc: Prefer NR_ZONE_WRITE_PENDING 99/50499/11
Shaun Tancheff [Sun, 2 Apr 2023 16:33:44 +0000 (11:33 -0500)]
LU-16699 osc: Prefer NR_ZONE_WRITE_PENDING

Linux commit v4.7-5966-g5a1c84b404a7
 mm: remove reclaim and compaction retry approximations

Introduced NR_ZONE_WRITE_PENDING which should be used
in mod_zone_page_state.

Older kernels should fall back to NR_UNSTABLE_NFS
or NR_WRITEBACK.

Test-Parameters: trivial
HPE-bug-id: LUS-11559
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I90f22d4bd56f5986eaa5d4a042a2c8ed31fbf752
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50499
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
7 months ago LU-16671 osc: fix unstable pages for short IO 51/50451/13
Patrick Farrell [Tue, 28 Mar 2023 15:02:40 +0000 (11:02 -0400)]
LU-16671 osc: fix unstable pages for short IO

Unstable pages support was written with theoretical support for
short IO (i.e., no bulk, data-in-rpc, LU-1757), but since the
short IO code wasn't merged until years later, they were
probably never tested together.  And when you do, it
crashes.

In truth, short IO has no separate pages to be tracked,
which is why this is crashing.  This means that small write
RPCs won't be tracked in unstable pages, but that's a very
minor limitation and unlikely to cause trouble.  (And since
RPC allocations are not 'pages', they're just malloc'ed,
there's no good way to track them anyway.)

Fixes: 70f092a ("LU-1757 brw: add short io osc/ost transfer.")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I34b09f8324424c3ff0b0c09c86f01c938b643e37
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50451
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-17108 nodemap: make map_mode available for default nm 36/52336/3
Sebastien Buisson [Mon, 11 Sep 2023 15:31:09 +0000 (17:31 +0200)]
LU-17108 nodemap: make map_mode available for default nm

The map_mode property controls how mapping is carried out. It
is already available on regular nodemaps, to decide whether uids, gids
and/or projids will be mapped.
On the default nodemap, where it is not possible to define mappings,
the map_mode property will be taken into account when trusted is 0 and
deny_unknown is 0. Unmapped IDs will be left unchanged.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I16a2f5cfda11a8435b56a00f3e97bdc70741c156
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52336
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-17104 build: Correct test for bad allocation 30/52330/3
Shaun Tancheff [Mon, 11 Sep 2023 03:23:11 +0000 (22:23 -0500)]
LU-17104 build: Correct test for bad allocation

Expect a non-zero value following allocation. If it is zero,
reply to the caller with -ENOMEM.

This patch fixes a build issue reported by gcc 12.

Fixes: 09f9fb3211 ("LU-11023 quota: quota pools for OSTs")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2ce197c31bf444d9f179942e516cfd9bdaf7dd9c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52330
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
7 months ago LU-17045 lnet: ksocklnd-config report errors on cmd failure 27/52327/2
Frank Sehr [Fri, 8 Sep 2023 23:38:05 +0000 (16:38 -0700)]
LU-17045 lnet: ksocklnd-config report errors on cmd failure

Make sure that ksocklnd-config script logs an error if any of the
commands it attempts to execute fail.

The script already logs a warning if it finds that any of the routes
it intends to add already exist. It should also report any failed
command execution, to make the user aware that the MR routing
setup could not be applied.

Test-Parameters: trivial
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: If5a240d224f6a45015d1fc1a9d0a8df58ed661e4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52327
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-17098 tests: fix sanity-pcc failures on Ubuntu 10/52310/2
James Simmons [Thu, 7 Sep 2023 20:11:04 +0000 (14:11 -0600)]
LU-17098 tests: fix sanity-pcc failures on Ubuntu

A few sanity-pcc tests fail due to the system setup. One of those
failures was due to uidmap not being installed on my Ubuntu system.
The rest of the test failures were due to assuming uid / gid
were set values (500 and 1000), which is not the case for all
systems.

Test-Parameters: trivial testlist=sanity-pcc
Change-Id: I667f399854d626d4b22efed2b341ad5c330e0cfe
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52310
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-17091 tests: check correct return value in lfs_df
Lei Feng [Wed, 6 Sep 2023 02:39:49 +0000 (10:39 +0800)]
LU-17091 tests: check correct return value in lfs_df

$? is the return value of the last command in a pipe.
We should check the return value of the first command, 'lfs df',
in this case.
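
The effect is easy to reproduce outside the test suite. The Python sketch
below builds a two-stage pipeline with subprocess and shows that the last
stage can succeed while the first one fails. It is only an analogy: the
helper run_pipeline is hypothetical and the two stages are stand-in Python
one-liners, not 'lfs df'. In bash itself the first command's status is
available as ${PIPESTATUS[0]}.

```python
import subprocess
import sys


def run_pipeline(first_exit_code):
    """Run `first | second` and return both exit codes (illustrative only)."""
    first = subprocess.Popen(
        [sys.executable, "-c",
         f"import sys; print('data'); sys.exit({first_exit_code})"],
        stdout=subprocess.PIPE,
    )
    second = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        stdin=first.stdout,
        stdout=subprocess.DEVNULL,
    )
    first.stdout.close()  # let the second process see EOF when the first exits
    second.wait()
    first.wait()
    return first.returncode, second.returncode


first_status, last_status = run_pipeline(3)
```

Checking only last_status, which is what $? reports for a pipe, would hide
the failure of the first stage.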

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn
Change-Id: I7daa38f27c878e5195181ed82717cd28ca345dbc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52285
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-17015 obdclass: set cache entry/acquire expiry at init 71/52271/3
Sebastien Buisson [Tue, 5 Sep 2023 09:08:16 +0000 (11:08 +0200)]
LU-17015 obdclass: set cache entry/acquire expiry at init

Give the ability to define values for cache entry expire and acquire
expire directly at upcall cache init.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iee0dea66943ab6747d85a378861ae98c29faa11a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52271
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
7 months ago LU-16981 lod: update llc_stripe_count after ost inactive 59/51759/5
Thomas Bertschinger [Tue, 25 Jul 2023 16:03:47 +0000 (12:03 -0400)]
LU-16981 lod: update llc_stripe_count after ost inactive

If an OST gets deactivated while lod_ost_alloc_qos() is trying to
allocate stripes for a file create, then normally this is caught and
EAGAIN is returned, which causes lod_comp->llc_stripe_count to
be updated to accurately reflect the stripe count. But there is a
race condition: if the OST is deactivated after the call to
ltd_qos_is_usable() but before the stripes are allocated, the
stripe count is never updated.

This causes an LBUG later in lod_striped_create() because fewer
stripes are allocated than the number in llc_stripe_count so it
finds a stripe that is NULL.

The solution is to properly update lod_comp->llc_stripe_count when
the number of stripes created is less than expected.

Fixes: ced540165ef5 ("LU-16623 lod: handle object allocation consistently")
Test-Parameters: testlist=sanity env=ONLY=27V,ONLY_REPEAT=100
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Ia1264f24904fed00454b3bc3c0d6c7b9b947737f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51759
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
7 months ago LU-16408 tests: fix replay-dual test 33 34/50434/10
Etienne AUJAMES [Mon, 27 Mar 2023 10:14:34 +0000 (12:14 +0200)]
LU-16408 tests: fix replay-dual test 33

A client can be evicted in REPLAY_LOCK. Wait for the REPLAY_WAIT
import state before aborting the recovery on the MDS.

When unmounting a combined MDT and MGT, imperative recovery is
disabled. So, we have to force an update of the client import states
(MGC/MDC).

Test-Parameters: trivial
Test-Parameters: testlist=replay-dual env=ONLY="33",ONLY_REPEAT=50
Test-Parameters: testlist=replay-dual
Test-Parameters: testlist=replay-dual
Test-Parameters: testlist=replay-dual
Test-Parameters: testlist=replay-dual
Fixes: 1a79d395dd ("LU-15935 target: keep track of multirpc slots in last_rcvd")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I0869fe968a18795dae39cf39a7009cf444820017
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-16194 lod: define negative extent offset as invalid 84/48684/3
Lei Feng [Wed, 28 Sep 2022 02:16:17 +0000 (10:16 +0800)]
LU-16194 lod: define negative extent offset as invalid

If lu_extent.e_start/e_end is negative after converting to s64,
regard it as invalid data except -1 (EOF).
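
The rule can be written down in a few lines. The Python sketch below is an
illustration rather than the lod code: to_s64 and extent_offset_is_valid are
hypothetical helpers, and the only facts taken from the commit message are
the s64 conversion and the -1 (EOF) exception.

```python
EXTENT_EOF = -1  # the one negative value that is still valid


def to_s64(value):
    """Reinterpret an unsigned 64-bit integer as a signed 64-bit integer."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value - (1 << 64) if value >= (1 << 63) else value


def extent_offset_is_valid(raw_u64):
    """Apply the rule: negative after conversion is invalid, except -1."""
    signed = to_s64(raw_u64)
    return signed >= 0 or signed == EXTENT_EOF
```

For example, 0xFFFFFFFFFFFFFFFF converts to -1 and is accepted as EOF, while
0x8000000000000000 converts to the most negative s64 value and is rejected.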

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I79276a5185f339e9de48fe87c4da39052c7974e1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48684
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
7 months ago LU-10499 pcc: use foreign layout for PCCRO on server side 75/51375/3
Qian Yingjin [Mon, 11 Sep 2023 15:53:31 +0000 (11:53 -0400)]
LU-10499 pcc: use foreign layout for PCCRO on server side

This patch adds the code for using a foreign layout for PCCRO
on the server side (LOD|MDD|MDT layers).

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I48467be9fef54bd05432528b685241aa53978d24
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51375
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
7 months ago LU-17088 dom: don't create different size DOM component 69/52269/2
Bobi Jam [Tue, 5 Sep 2023 06:54:44 +0000 (14:54 +0800)]
LU-17088 dom: don't create different size DOM component

Multiple DOM components are allowed in different mirrors, but they
must be of the same size; mirror extend should check this constraint.

Fix another glitch in lov_init_composite() where dom_size is used
as a __u64 value but declared as boolean.

Fixes: 44a721b8c1 ("LU-11421 dom: manual OST-to-DOM migration via mirroring")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ia0d08c697dbeeb3aa8d20d9849226afa06360012
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52269
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months ago LU-13306 mgc: handle large NID formats 50/50750/23
James Simmons [Thu, 24 Aug 2023 23:11:55 +0000 (19:11 -0400)]
LU-13306 mgc: handle large NID formats

For newer versions of Lustre the MGS can send mgs_nidtbl_entry
containing NIDs of a larger format. It's also possible that an old
MGS will send NIDs of the previous size. We need to handle
both cases. We reuse the mcb_nm_cur_pass field of struct
mgs_config_body, which is only used for nodemap, to send the
NID size from the client to the MGS. Pre-IPv6 clients will
by default have a zero mcb_nm_cur_pass / mcb_nid_size. When
mcb_nid_size is zero the MGS will treat the client as
pre-IPv6 and send small NIDs back to the client. This avoids
needing to patch older clients. If the MGS is older, then
small NIDs will be sent back, which the new MGC layer can
handle by converting those lnet_nid_t to struct lnet_nid.

To handle this new code the "swab" of the entry is split into
two parts. The "header" is "swab"ed as soon as we know the entry
is large enough for that to make sense. The content containing
NID information is swabbed later once the header has been found
to look sane.

Test-Parameters: serverversion=2.15 testlist=runtests,sanity,recovery-small
Change-Id: I97ebdcecc1ee0fbfe676cbdbdc77edee13e60891
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50750
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
7 months agoLU-16837 llite: handle unknown layout component 60/51060/14
Bobi Jam [Fri, 19 May 2023 09:40:31 +0000 (17:40 +0800)]
LU-16837 llite: handle unknown layout component

If the Lustre client encounters an unknown layout component pattern
in a mirrored file, this patch makes the client mark that mirror as
invalid and skip it.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie5f44212ab96bdc706cc5a9e11f330234fc01069
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51060
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
7 months agoLU-10465 lov: increase default stripe size to 4MB 18/37318/13
Andreas Dilger [Thu, 23 Jan 2020 20:15:10 +0000 (20:15 +0000)]
LU-10465 lov: increase default stripe size to 4MB

Increase the default stripe size from 1MB to 4MB for better
performance and reduced LDLM lock contention for larger writes.

This can also reduce the need to cache data on the client on a
striped file before a full RPC is generated, since the default
RPC size is 4MB, but with 1MB stripe size, the file would need
4x full stripe_count * stripe_size writes before an RPC is full.
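
The arithmetic can be worked through with a small helper. This is a
sketch assuming a sequential write from offset 0 and round-robin
striping; bytes_until_first_full_rpc() is a hypothetical name, not a
Lustre function.

```c
#include <stdint.h>

/*
 * Bytes of sequential data that must be written before the first
 * object has accumulated rpc_size bytes, i.e. one full RPC.
 */
static uint64_t bytes_until_first_full_rpc(uint64_t stripe_size,
					   uint64_t stripe_count,
					   uint64_t rpc_size)
{
	/* stripe rows that contribute to one RPC on a single object */
	uint64_t rows = (rpc_size + stripe_size - 1) / stripe_size;
	/* bytes still needed from the final row */
	uint64_t last = rpc_size - (rows - 1) * stripe_size;

	return (rows - 1) * stripe_count * stripe_size + last;
}
```

With a 1MB stripe size, 4 stripes and a 4MB RPC this gives 13MB: the
first object only fills during the fourth stripe row. With a 4MB
stripe size the first RPC is full after 4MB, independent of the
stripe count.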

Patch includes several test fixes:
- sanity-pfl: takes into account stripe size in some tests
- sanity-flr: use bigger component size and amount of data to
  saturate all stripes as expected by test
- sanity: 130g to use 1M stripe prior to FIEMAP calcs
- sanity-lfsck: 36[a-c] to use 1M stripe as expected by calcs

Test-Parameters: testlist=sanity-compr
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3cef8805247fc5253e0a0ac05157b9d609054df9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/37318
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17041 kernel: update RHEL 8.8 [4.18.0-477.21.1.el8_8] 03/52003/2
Jian Yu [Fri, 18 Aug 2023 21:28:43 +0000 (14:28 -0700)]
LU-17041 kernel: update RHEL 8.8 [4.18.0-477.21.1.el8_8]

Update RHEL 8.8 kernel to 4.18.0-477.21.1.el8_8.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.8 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.8 serverdistro=el8.8 testlist=sanity

Change-Id: Ie24c8e438dd33afafb900664d9a4010160bc1a45
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52003
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17096 debian: add obd_test.ko, llog_test.ko to lustre-tests 98/52398/8
Timothy Day [Sun, 17 Sep 2023 01:00:35 +0000 (21:00 -0400)]
LU-17096 debian: add obd_test.ko, llog_test.ko to lustre-tests

The obd_test.ko module was missing from the lustre-tests
Debian package. Hence, it wasn't being installed on the
Ubuntu clients during testing. This caused sanity/55a and
sanity/55b to consistently fail.

Add llog_test.ko to lustre-tests also. It's not unheard of to
use Ubuntu for Lustre server. So the package may as well include
llog_test.ko.

Also, update debian/.gitignore.

Test-Parameters: trivial testlist=sanity env=ONLY=55,ONLY_REPEAT=50 clientdistro=ubuntu2204
Test-Parameters: trivial testlist=sanity env=ONLY=55,ONLY_REPEAT=50
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I050de4563478996828886ca623fa96b58f9fef5e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52398
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 months agoLU-16661 build: remove -dev packages for Debian 81/52281/4
Andreas Dilger [Tue, 5 Sep 2023 20:18:31 +0000 (14:18 -0600)]
LU-16661 build: remove -dev packages for Debian

Don't depend on libmount-dev, libsnmp-dev, libkeyutils-dev for the
lustre-client-utils and lustre-server-utils packages.  These are
only needed for build and for the lustre-client-dkms package.

Disable SNMP by default as this is no longer used anywhere.

Test-Parameters: trivial testlist=runtests clientdistro=ubuntu2204
Fixes: 7dc6e1128a ("LU-15888 build: Debian dkms-debs requires ed and libkeyutils")
Fixes: af2f77633b ("LU-13818 build: use libsnmp-dev instead of libsnmp30")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib788a97028ee40a9c61070d00b823620ec3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52281
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
8 months agoLU-16661 build: use "Recommends: perl" for lustre-iokit 25/52225/3
Jian Yu [Mon, 4 Sep 2023 07:04:50 +0000 (00:04 -0700)]
LU-16661 build: use "Recommends: perl" for lustre-iokit

In lustre-iokit, the "plot" commands all use perl, but
the actual "*-survey" scripts are written in bash, so
the "Requires: perl" in lustre.spec.in for lustre-iokit
could be downgraded to "Recommends: perl" for RHEL 8+
(RHEL 7 does not handle "Recommends:").

Test-Parameters: trivial testlist=obdfilter-survey

Change-Id: I55f3c57e73ac91cedce745dc4f424c3542978cd4
Fixes: 800a9ec58f78 ("LU-16661 build: improve lustre.spec.in Requires")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52225
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17000 utils: Fix Resource leak under mount_utils.c 18/52218/3
Arshad Hussain [Fri, 1 Sep 2023 07:12:18 +0000 (12:42 +0530)]
LU-17000 utils: Fix Resource leak under mount_utils.c

This patch fixes resource leak error reported
by coverity run.

CoverityID: 399700 ("Resource leak"): mount_utils.c

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ib3281d922936822a0ac298a15d6e8863b3c2c9b7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52218
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17000 ptlrpc: fix string overflow warnings 10/52210/4
Andreas Dilger [Thu, 31 Aug 2023 20:50:56 +0000 (14:50 -0600)]
LU-17000 ptlrpc: fix string overflow warnings

Fix potential string overflow warnings in sptlrpc_flavor2name()
calling strncat() with the full size of the target buffer
instead of the *remaining* space in the target buffer.

Fix potential string overflow warning in sepol_seq_write_old()
and sepol_seq_write() potentially copying an unterminated string
from userspace via strncpy() and not terminating it afterward.

Since the maximum incoming parameter size is known in advance,
is reasonably small (~342 bytes), and is only used temporarily,
reorganize the code to avoid two buffer allocations and copies.
Use memcpy() to copy the string since its length is known, and
always add a NUL terminator to the string afterward.
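
Both patterns can be sketched in plain C. The helper names
demo_append() and demo_copy_param() are hypothetical and are not the
functions touched by this patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string already in dst.
 * Passing dst_size itself to strncat() would be wrong, because the
 * limit must be the space still free, minus one byte for the NUL.
 */
static void demo_append(char *dst, size_t dst_size, const char *src)
{
	size_t used = strlen(dst);

	if (used + 1 < dst_size)
		strncat(dst, src, dst_size - used - 1);
}

/*
 * Copy len bytes of possibly unterminated input into dst and always
 * terminate the result. Fails if the input cannot fit.
 */
static int demo_copy_param(char *dst, size_t dst_size,
			   const char *src, size_t len)
{
	if (len >= dst_size)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}
```

memcpy() is enough in the second helper because the length is known
up front, so there is no need for strncpy() to scan for a terminator.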

Improvements to error messages and code style in these functions.

Addresses-Coverity: 199034 ("Out-of-bounds access")
Addresses-Coverity: 199063 ("Out-of-bounds access")
Addresses-Coverity: 199108 ("Out-of-bounds access")
Addresses-Coverity: 397374 ("String not null terminated")
Addresses-Coverity: 397394 ("String not null terminated")

Test-Parameters: trivial testlist=sanity-sec,sanity-selinux
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia810ce9f07b663a90049bb78af21c06f0e3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52210
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16796 obd: Change struct obd_device to use kref 79/52179/2
Arshad Hussain [Wed, 30 Aug 2023 09:39:58 +0000 (15:09 +0530)]
LU-16796 obd: Change struct obd_device to use kref

This patch changes struct obd_device to use
kref(refcount_t) instead of atomic_t
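
For readers unfamiliar with the pattern, here is a userspace model of
the kref get/put lifecycle built on C11 atomics. It is not the kernel
implementation: struct demo_kref, struct demo_obd and the demo_*
functions are hypothetical stand-ins, and the model lacks the
overflow and underflow checking that refcount_t adds over a bare
atomic_t.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_kref {
	atomic_int refcount;
};

/* A new object starts with one reference held by its creator. */
static void demo_kref_init(struct demo_kref *k)
{
	atomic_init(&k->refcount, 1);
}

static void demo_kref_get(struct demo_kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Drop a reference; run release() and return 1 when it was the last. */
static int demo_kref_put(struct demo_kref *k,
			 void (*release)(struct demo_kref *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical object embedding the reference count. */
struct demo_obd {
	struct demo_kref ref;
	int released;
};

static void demo_obd_release(struct demo_kref *k)
{
	/* Recover the containing object from the embedded member. */
	struct demo_obd *obd = (struct demo_obd *)
		((char *)k - offsetof(struct demo_obd, ref));

	obd->released = 1;
}
```

The release callback is the main structural difference from open-coded
atomic_t counting: teardown lives in one place and runs exactly once,
on the final put.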

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia8539abb11357b41edd4cf532896d3bc1e66e92f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52179
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16796 mdt: Change struct cdt_agent_record_loc to use kref 53/52153/3
Arshad Hussain [Tue, 29 Aug 2023 07:41:51 +0000 (13:11 +0530)]
LU-16796 mdt: Change struct cdt_agent_record_loc to use kref

This patch changes struct cdt_agent_record_loc to use
kref(refcount_t) instead of atomic_t

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I99141b00b4cfc7b4b46a87462b9ce21735bb0e7d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52153
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17015 obdclass: make upcall cache hashtable size dynamic 28/52128/3
Sebastien Buisson [Mon, 28 Aug 2023 09:37:51 +0000 (11:37 +0200)]
LU-17015 obdclass: make upcall cache hashtable size dynamic

The hash table used by the upcall cache mechanism should have an
adjustable size, depending on the purpose and context where it is
used.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I53c5cb14f9a5630fc269d97cead9a5ca6a33895e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52128
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16743 lod: create stripe with correct attr 52/52052/6
Lai Siyao [Mon, 21 Aug 2023 22:47:33 +0000 (18:47 -0400)]
LU-16743 lod: create stripe with correct attr

lod_xattr_set_lmv() creates directory stripes with the master object
attr, but it shouldn't change attr->la_valid, otherwise bogus data may
be set on the stripe object.

ZFS osd_create() copies attr to the object directly, so clear la_flags
if LA_FLAGS is not set in la_valid.

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 mdtfilesystemtype=zfs testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 mdtfilesystemtype=zfs testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 mdtfilesystemtype=zfs testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 mdtfilesystemtype=zfs testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 mdtfilesystemtype=zfs testlist=sanity
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8385f36bd2eee0e55cbe6bd031b0e013cda40e06
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52052
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16838 tests: use import name in 398a 64/51064/3
Patrick Farrell [Fri, 19 May 2023 16:24:25 +0000 (12:24 -0400)]
LU-16838 tests: use import name in 398a

The LU-15670 test change assumes ost1_import is always
OST0000.  This isn't quite always true, so the test is
failing in certain configurations.

Change it to use the import name.

Fixes: 649d638467 ("LU-15670 clio: Disable lockless for DIO with O_APPEND")
Test-parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ifaefc503d1118ecd6fd45b661cbe94607f7ad799
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51064
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
8 months agoLU-16834 obdfilter: Do not attach device if already present 34/51034/5
Arshad Hussain [Wed, 17 May 2023 09:17:24 +0000 (05:17 -0400)]
LU-16834 obdfilter: Do not attach device if already present

When running obdfilter-survey with "case=disk" and targets that
repeat the same OST names, obdfilter-survey throws
"error: attach: File exists". This is because on the first
iteration the attach and setup are already done, so the subsequent
attach fails as the device/uuid is already present.

Test-Parameters: trivial testlist=obdfilter-survey
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8ab9ea905ec86b9e1aa8906bebcc38fee0fdbc23
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51034
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16521 tests: allow + separator for racer 66/49866/10
Elena Gryaznova [Wed, 31 May 2023 19:42:57 +0000 (22:42 +0300)]
LU-16521 tests: allow + separator for racer

The Test-Parameters: line parses ',' even with quotes, so it cannot
be used as a separator in Autotest for RACER_PROGS and RACER_EXTRA.

Allow both ',' and '+' as a separator for both RACER_PROGS and
RACER_EXTRA tasks so specific racer tasks can be run.
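
The racer scripts themselves are shell, so the following C fragment
only illustrates the splitting rule (either ',' or '+' ends a task
name). demo_split_tasks() is a hypothetical helper, not part of the
test framework.

```c
#include <string.h>

/*
 * Split list in place on ',' or '+' and store up to max token
 * pointers in out[]. Returns the number of tokens stored.
 */
static int demo_split_tasks(char *list, char **out, int max)
{
	char *p = list;
	int n = 0;

	while (n < max) {
		p += strspn(p, ",+");	/* skip any leading separators */
		if (*p == '\0')
			break;
		out[n++] = p;
		p += strcspn(p, ",+");	/* advance to the end of the token */
		if (*p == '\0')
			break;
		*p++ = '\0';		/* terminate the token */
	}
	return n;
}
```

Given "file_rename+file_truncate,dir_create" this yields three task
names, so a value written with '+' survives the ',' splitting done on
the Test-Parameters line.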

Do not always enable dir_remote and dir_migrate if RACER_PROGS set.

Test-Parameters: trivial testlist=racer env=RACER_PROGS=file_rename+file_truncate,RACER_EXTRA=file_create:5+dir_create:5+dir_remote:5
Fixes: 6d9e74580e ("LU-14274 tests: enhance racer to set extra layout")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11466
Change-Id: I3f3b4da6f76ccfac2680068184dc4714187a9a4d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49866
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-15233 llite: Remove extra cl_page_delete call 83/45583/4
Patrick Farrell [Mon, 15 Nov 2021 22:14:07 +0000 (17:14 -0500)]
LU-15233 llite: Remove extra cl_page_delete call

"LU-5108 osc: Performance tune for LRU" added a call to
cl_page_delete to the page discard code used by the OSC
lru shrinker.

This seems to have been a mistake.  cl_page_discard causes
page invalidation, which calls ll_invalidatepage, which
calls cl_page_delete if the page can be found.

Since the page is locked here and ll_invalidatepage checks
for the cl_page, this extra call to cl_page_delete has
probably never caused an issue.

But it's extraneous and kind of weird, and misled me a bit
when working on another bug.  Let's remove it.

Fixes: b117bc837c02 ("LU-5108 osc: Performance tune for LRU")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1380f532359ba949a0bbb8b53227a6c8e6491030
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16943 tests: use primary ost1 server in replay-single/135 58/52058/3
Jian Yu [Thu, 24 Aug 2023 00:56:05 +0000 (17:56 -0700)]
LU-16943 tests: use primary ost1 server in replay-single/135

This patch fixes replay-single test_135() to make sure
the primary ost1 server is used at the beginning of the test.

Test-Parameters: trivial env=REPLAY_SINGLE_EXCEPT=200 testlist=replay-single

Test-Parameters: trivial env=REPLAY_SINGLE_EXCEPT=200,FAILURE_MODE=HARD \
    clientcount=4 mdtcount=1 mdscount=2 osscount=2 \
    austeroptions=-R failover=true iscsi=1 \
    testlist=replay-single,mmp

Fixes: 1b73b6465b77 ("LU-16943 tests: fix replay-single/135 under hard failure mode")
Change-Id: Ia25314255c9f00ba71687e1f757517f37031caed
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52058
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-12518 llite: rename count and nob variables to bytes 54/38154/14
Andreas Dilger [Thu, 31 Aug 2023 16:26:35 +0000 (12:26 -0400)]
LU-12518 llite: rename count and nob variables to bytes

Rename "*count", "*nob", and "cnt" and similar variables to use
"*bytes" to make it clear what the units are vs. number of pages.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I195f2db4182e4b3099b3f4aa2e25b91f9f3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38154
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-8585 llite: add special fid handling for fhandle API 07/51707/11
James Simmons [Thu, 10 Aug 2023 01:59:28 +0000 (21:59 -0400)]
LU-8585 llite: add special fid handling for fhandle API

Lustre has been moving its FID handling to the fhandle API. This
works well for normal files but Lustre has special FIDs that don't
map to normal files which are used by user land applications. Add
special handling to ll_iget_for_nfs() so the fhandle API can work
with these special FIDs. These FIDs should also work with filesets.

Change-Id: I4b55d96cc9eea0b1fb898f94c071c8b30c7b2bd5
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51707
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-6142 misc: update headers in config, debian, rpm 06/52106/3
Timothy Day [Sun, 27 Aug 2023 00:19:52 +0000 (00:19 +0000)]
LU-6142 misc: update headers in config, debian, rpm

Update the file header to have the SPDX license and
use the standard format.

Fix minor style issues with comments in a few files.
Remove `dnl` from m4 files.

Files that are uncertain are left as NOASSERTION
for the license identifier. This makes no claim
about the file. It is used to track files so they
can be addressed later.

https://spdx.github.io/spdx-spec/v2-draft/package-information/#75-package-supplier-field

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I212ce05a4292bbb0d71372d9d75880ce45a219f3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52106
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16847 ldiskfs: reduce a memory usage by ost IO threads 92/51392/6
Alexey Lyashkov [Wed, 21 Jun 2023 09:10:27 +0000 (12:10 +0300)]
LU-16847 ldiskfs: reduce a memory usage by ost IO threads

The page array is redundant now that the lnb array has been added,
since the pages can be addressed via lnb->lnb_page. Remove it to
reduce memory consumption.

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ieb0c186e27f56c770fd2ebfbddce9ccf19791611
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51392
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16847 ldiskfs: refactor code brw_stats code. 91/51391/9
Alexey Lyashkov [Tue, 20 Jun 2023 14:01:03 +0000 (17:01 +0300)]
LU-16847 ldiskfs: refactor code brw_stats code.

Counting the number of disk or logical extents doesn't
need a loop.
All the information is available around ldiskfs_map_blocks.

HPE-bug-id: LUS-11645
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I77f3707b88e9bdf6ea06acc950af2a41f056f5d0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51391
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-930 doc: add static analysis documentation 86/52186/2
Timothy Day [Wed, 30 Aug 2023 17:59:20 +0000 (17:59 +0000)]
LU-930 doc: add static analysis documentation

Add more documentation about Clang/LLVM and other
static analysis tools for Lustre. This will make it
easier for other developers to try out various tools.
It will also serve as a place to record best practices
and experiences. Hopefully, this will increase awareness
and usage of these various tools and improve the Lustre
codebase as a result.

This patch also has a few other small doc updates.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I4bd860775729aaa4ef1ae1cc2cceb6435f3affdd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52186
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16796 mdt: Change struct cdt_agent_req to use kref 48/52148/3
Arshad Hussain [Tue, 29 Aug 2023 07:00:35 +0000 (12:30 +0530)]
LU-16796 mdt: Change struct cdt_agent_req to use kref

This patch changes struct cdt_agent_req to use
kref(refcount_t) instead of atomic_t

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0a99002504ff453b8b748391f08bd1020c545321
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52148
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17058 build: add help and checkpatch as make targets 42/52142/3
Timothy Day [Mon, 28 Aug 2023 19:52:04 +0000 (19:52 +0000)]
LU-17058 build: add help and checkpatch as make targets

Add `make help` to print out available make targets. The
output is styled after the Linux kernel `make help`.
Add `make checkpatch` to run checkpatch.pl script
against most recent commit.

Update README to mention `make help`.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I65ce84040502994ae7caa0c8b72d808442f6b79e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52142
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-13031 tests: skip sanity/test_205h,205i in interop 95/52095/2
Thomas Bertschinger [Fri, 25 Aug 2023 16:28:16 +0000 (12:28 -0400)]
LU-13031 tests: skip sanity/test_205h,205i in interop

Skip sanity tests 205h and 205i when the MDS version is too old
to have the jobid xattr changes. Fix test 103a to not try to set
the job_xattr parameter when it does not exist.

Fixes: 23a2db28dcf1 ("LU-13031 jobstats: store jobid in xattr when files are created")
Test-Parameters: trivial testlist=sanity env=ONLY="103a 205h 205i"
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Iaa5d0c1a7f3fa6769fab4340ade315e7a49df009
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52095
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-16831 lod: replace (__u16)-1 with LOV_ALL_STRIPES 55/52055/3
Jian Yu [Wed, 23 Aug 2023 19:05:22 +0000 (12:05 -0700)]
LU-16831 lod: replace (__u16)-1 with LOV_ALL_STRIPES

This patch replaces "(__u16)-1" with constant LOV_ALL_STRIPES
and replaces "(__u64)-1" with OBD_OBJECT_EOF.
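
The effect on readability can be seen in a small sketch. The DEMO_*
macros below are local stand-ins with the all-ones values implied by
the casts they replace; the real definitions live in the Lustre
headers, and demo_wants_all_stripes() is a hypothetical helper.

```c
#include <stdint.h>

/* Local stand-ins for illustration only. */
#define DEMO_LOV_ALL_STRIPES	0xffff
#define DEMO_OBD_OBJECT_EOF	0xffffffffffffffffULL

/* Reads as intent, where "stripe_count == (uint16_t)-1" reads as a trick. */
static int demo_wants_all_stripes(uint16_t stripe_count)
{
	return stripe_count == DEMO_LOV_ALL_STRIPES;
}

/* Same idea for a 64-bit "to end of object" extent end. */
static int demo_extent_to_eof(uint64_t extent_end)
{
	return extent_end == DEMO_OBD_OBJECT_EOF;
}
```

The comparison results are unchanged by the substitution; only the
meaning of the magic value becomes visible at the call site.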

Test-Parameters: trivial

Change-Id: I2345c5e67da20328e7173c9add8da27015df9d13
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52055
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
8 months agoLU-10026 csdc: DoM pattern could be a combined value 78/51978/5
Bobi Jam [Thu, 17 Aug 2023 17:00:03 +0000 (01:00 +0800)]
LU-10026 csdc: DoM pattern could be a combined value

DoM pattern is LOV_PATTERN_MDT for now, and in the future it could
be combined with LOV_PATTERN_COMPRESS to represent a compressed
DoM component.

Fix a minor glitch for lov_getstripe_old code path (in
ll_lov_getstripe_ea_info), which intends to return the last component
stripe info but the commit abf04e7ea3 omits to correctly set the
last component stripe info before using it.

Fixes: abf04e7ea3 ("LU-14337 lov: return valid stripe_count/size for PFL files")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Id0779c30c004b6979f88bf96b7b7b74a8b8c26e4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 months agoLU-17023 krb: use a Kerberos realm different from default 14/51914/10
Sebastien Buisson [Thu, 10 Aug 2023 11:05:52 +0000 (13:05 +0200)]
LU-17023 krb: use a Kerberos realm different from default

It makes sense to give the ability to specify a Kerberos realm that is
different from the default realm as returned by
krb5_get_default_realm().

On client side, the desired realm needs to be specified via the new
'-R' option to lgss_keyring. This can be specified in the config file
/etc/request-key.d/lgssc.conf to replace the default domain, e.g.:
create lgssc * * /usr/sbin/lgss_keyring -R DOMAIN.COM %o %k %t %d %c %u %g %T %P %S

On server side, the desired realm can be specified via the new '-R'
parameter of the lsvcgssd daemon, replacing the default realm.

This patch adds sanity-krb5 test_1b to exercise the new realm options,
by just re-using the same realm as the test system is configured to
use. And former test_1 is renamed test_1a.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9c91d5cb9904781d546e77b1e46115fed433618f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51914
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17050 tests: test Kerberos env in sanity-krb5 68/52068/4
Sebastien Buisson [Thu, 24 Aug 2023 09:40:46 +0000 (11:40 +0200)]
LU-17050 tests: test Kerberos env in sanity-krb5

Test that the Kerberos environment is sane before trying to launch
sanity-krb5 tests.

Test-Parameters: trivial kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1675ba7db8c62687c69359a15cc931b5dfd40018
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52068
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17000 lnet: fix various bugs in lib-move.c 76/51876/5
Timothy Day [Sat, 5 Aug 2023 19:12:15 +0000 (19:12 +0000)]
LU-17000 lnet: fix various bugs in lib-move.c

In lnet_select_peer_ni, best_lpni_is_preferred is often written
to before being immediately overwritten.

Addresses-Coverity-ID: 397646 ("Unused value")
Addresses-Coverity-ID: 397434 ("Unused value")

Both LNetPut and LNetGet were not freeing msg under certain failure
conditions. This leaks a small amount of memory each time it occurs.
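
The shape of this kind of fix can be sketched in userspace C. The
names below (struct demo_msg, demo_put(), the fail_lookup flag and the
live-object counter) are hypothetical and only model the pattern, not
the LNet code paths.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_msg {
	int payload;
};

static int demo_live_msgs;	/* allocations not yet freed */

static struct demo_msg *demo_msg_alloc(void)
{
	struct demo_msg *msg = calloc(1, sizeof(*msg));

	if (msg != NULL)
		demo_live_msgs++;
	return msg;
}

static void demo_msg_free(struct demo_msg *msg)
{
	if (msg != NULL) {
		free(msg);
		demo_live_msgs--;
	}
}

/*
 * On success the message is handed to the caller through *out.
 * On failure nothing is handed over, so msg must be freed here.
 */
static int demo_put(int fail_lookup, struct demo_msg **out)
{
	struct demo_msg *msg = demo_msg_alloc();

	if (msg == NULL)
		return -1;
	if (fail_lookup) {
		/* Every early exit after the allocation releases msg. */
		demo_msg_free(msg);
		return -2;
	}
	*out = msg;
	return 0;
}
```

A counter such as demo_live_msgs makes the leak observable in a test:
after a failed demo_put() it must be back at its previous value.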

Addresses-Coverity-ID: 397644 ("Resource leak")
Addresses-Coverity-ID: 397133 ("Resource leak")

Fix potential null dereference in lnet_find_best_ni_on_local_net
when best_lp gets defined but best_net doesn't.

Addresses-Coverity-ID: 397568 ("Explicit null dereferenced")

lnet_post_send_locked has an un-needed null check, since every path
leading to that block of code must dereference ni anyway.

Addresses-Coverity-ID: 397278 ("Dereference before null check")

In the other usage of msg_peerrtrcredit, it is accessed under
lpni_lock. Change the second usage to also be accessed under this
lock.

Addresses-Coverity-ID: 397606 ("Data race condition")

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I4012f407b10d0c9644535d49cce83a6c95d3d22d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16988 mdd: update projid when merging layout 59/51859/5
Hongchao Zhang [Mon, 14 Aug 2023 07:28:17 +0000 (15:28 +0800)]
LU-16988 mdd: update projid when merging layout

When creating mirrors by the special directory ".lustre/fid",
the project ID could not be set correctly, which causes
wrong quota calculation for the projid.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ia4c3a8973b8c467642e12629d36fa42d64162084
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51859
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16977 utils: access_log_reader accesses beyond batch array 54/51754/4
Alexandre Ioffe [Tue, 25 Jul 2023 02:08:26 +0000 (19:08 -0700)]
LU-16977 utils: access_log_reader accesses beyond batch array

Fix access_log_reader reading past the upper boundary of the sorted
batch array when batch-fraction is 100%. Treat fraction = 100% as a
special case that requires no sorting or filtering, and use a
separate thread function to process it.
Also make some minor changes using an enum type to tidy up the code.

Test-Parameters: trivial testlist=sanity
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iba1734b17dc901875f343c793688aec17b9f7a93
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51754
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16763 obdclass: add unit tests for OBD life cycle 03/51103/7
Timothy Day [Fri, 12 May 2023 04:32:53 +0000 (04:32 +0000)]
LU-16763 obdclass: add unit tests for OBD life cycle

Add some simple OBD life cycle tests. These tests
consist of a kernel module which defines a simple OBD
device, and a few sanity tests. The new OBD device
prints logs validating that it has been loaded
correctly. Unlike other OBD devices, this one has
minimal side-effects. The new test OBD device has
been added to the test rpm and dkms.

sanity/55a aims to test that a device can be loaded
properly and found by the various OBD device search
functions.

sanity/55b aims to load the maximum number of allowed
OBD devices, which is currently 8192. It also times how
long it takes to perform the loading and unloading. In
the future, this could be used to test for performance
regression.

The tests avoid using any userspace function, like lctl
or lfs, since I noticed bugs when using them with a large
number of devices. Follow-up patches will include fixes
and more testing.

I used a variation of these tests when debugging
sanity/60a failures, and when debugging removing
MAX_OBD_DEVICES.

This test (obd_test.c) and the llog test (llog_test.c)
should probably be moved to a different directory in a
follow-up patch.

Test-Parameters: trivial testlist=sanity env=ONLY=55,ONLY_REPEAT=25
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ibc347ac962c59a4bbc26410c30f9cc5529e6c84d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51103
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-14111 tests: only support recovery-small test 146 for 2.15.54+ 98/52098/4
James Simmons [Fri, 25 Aug 2023 22:42:23 +0000 (18:42 -0400)]
LU-14111 tests: only support recovery-small test 146 for 2.15.54+

If you are running newer clients with older servers (2.15.3) then
recovery-small test 146 will fail since the old servers lack
the new sysfs file eviction_count.

Fixes: 3c69d46e176 ("LU-14111 obdclass: count eviction per obd_device")
Test-Parameters: trivial testlist=recovery-small env=ONLY=146 mdsversion=2.15.3
Change-Id: I53f6dabd305ec920e8de1d9fde407b2f2c15ba69
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52098
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-16314 llite: Prefer %pK with seq_printf 12/51212/7
Shaun Tancheff [Thu, 24 Aug 2023 09:14:09 +0000 (04:14 -0500)]
LU-16314 llite: Prefer %pK with seq_printf

Update procfs and sysfs users to prefer %pK when
printing pointers so that when kptr_restrict is set to 1
a real pointer value is provided.

To enable printing non-hashed pointer values:
  sysctl -w kernel/kptr_restrict=1

This change also sets kptr_restrict to 1 for all clients
and servers under test by test framework.

Test-Parameters: trivial
HPE-bug-id: LUS-10945
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iccfce1399648e752cb7b78afc75aacbfb0bde390
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51212
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16043 osc: allow error for write on CL_FSYNC_DISCARD 32/48032/4
Vladimir Saveliev [Wed, 26 Jul 2023 13:09:18 +0000 (16:09 +0300)]
LU-16043 osc: allow error for write on CL_FSYNC_DISCARD

In case of CL_FSYNC_DISCARD, an error is allowed for write of an osc object.

Otherwise, the included test fails in rm with:
  (osc_page.c:174:osc_page_delete()) Trying to teardown failed: -16
  (osc_page.c:175:osc_page_delete()) ASSERTION( 0 ) failed:
  (osc_page.c:175:osc_page_delete()) LBUG

Test-Parameters: trivial testlist=sanity env=ONLY=907
HPE-bug-id: LUS-10410
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I0aae0dc470ba0371964e7643a6d84b19a1b4e106
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-15619 osc: Rename brw_page members 15/46715/4
Patrick Farrell [Wed, 30 Aug 2023 20:05:07 +0000 (16:05 -0400)]
LU-15619 osc: Rename brw_page members

The brw_page members have generic names - add a structure
related tag to the names.

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I98e6f874902074934eb01476a9595f502526bc38
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46715
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-8130 osc: convert osc_quota hash to xarray 38/32038/76
James Simmons [Mon, 28 Aug 2023 14:03:03 +0000 (10:03 -0400)]
LU-8130 osc: convert osc_quota hash to xarray

The cl_quota_hash originally had 3 hashes, one for each type of quota
(USR, GRP, PRJ) that just stored on the client whether a particular
quota ID was over its limit. This was overkill since cl_quota_hash
only needs one bit to check if a particular ID has exceeded quota
with IO from this client, and there will usually be only a few IDs
that are actually exceeding their limit where a client is involved.
Instead, use the quota ID as the index into an Xarray, and store
a value with the quota TYPE(s) that are over the limit for that ID.
We only need to test the presence/absence of an ID and a quota type
without the need to store any additional values (the clients do not
track the actual quota usage or limits).
Testing whether a quota is exceeded for a particular ID is a two-step
process. First check if there is any entry for the particular ID,
and if it exists then check which quota type (USR, GRP, PRJ) is
over the limit for that ID value.  The same is done when marking
a particular quota ID/TYPE as over its limit - first look up the
ID and then add the TYPE flag to the value if not already set.
The Xarray implementation does offer using "marks" (up to 3 bits
per index) but in this case there is no other value that needs to
be stored into the Xarray other than one bit for any exceeded type,
so they are not used here.
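
The two-step lookup can be sketched as follows. This is an
illustration in Python, not the kernel code: a dict stands in for
the Xarray, and the flag values, class name, and method names are
all hypothetical.

```python
# Hypothetical flag values; the real Lustre constants may differ.
QUOTA_USR = 1 << 0
QUOTA_GRP = 1 << 1
QUOTA_PRJ = 1 << 2


class QuotaOverLimit:
    """Maps a quota ID to a bitmask of quota types that are over limit."""

    def __init__(self):
        self._ids = {}  # stands in for the Xarray indexed by quota ID

    def set_exceeded(self, qid, qtype):
        # Look up the ID first, then add the type flag if not already set.
        self._ids[qid] = self._ids.get(qid, 0) | qtype

    def clear_exceeded(self, qid, qtype):
        remaining = self._ids.get(qid, 0) & ~qtype
        if remaining:
            self._ids[qid] = remaining
        else:
            # Drop the entry entirely once no type is over limit.
            self._ids.pop(qid, None)

    def is_exceeded(self, qid, qtype):
        # Step 1: is there any entry for this ID? Step 2: is this type set?
        return bool(self._ids.get(qid, 0) & qtype)
```

Only IDs that are over some limit occupy an entry, which matches the
expectation that few IDs exceed quota at any time.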

Change-Id: I9355ed2a7158f0d5cc0d600ad51ea1a1434f3e98
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/32038
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-17052 libcfs: fix build for old kernel 90/52090/4
Xinliang Liu [Fri, 25 Aug 2023 03:24:12 +0000 (03:24 +0000)]
LU-17052 libcfs: fix build for old kernel

Fix build for kernel v4.17 to v4.19.
These old kernels already have xarray.h, which is #included by fs.h,
but don't have full xarray support. libcfs's xarray.h must also be
#included to provide the missing xarray support.

Rename the header guard macro to ensure libcfs's xarray.h will be
included.

Test-Parameters: trivial
Test-Parameters: testlist=sanityn envdefinitions=ONLY=77,ONLY_REPEAT=20
Fixes: 778791dd7da1 ("LU-8130 libcfs: don't use radix tree for xarray")
Change-Id: I760c394cc1d885c2de79d1770243ab7f292b9b3a
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52090
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 months agoLU-16973 osd: adds SB_KERNMOUNT flag 31/51731/6
Alexander Boyko [Fri, 7 Jul 2023 19:35:51 +0000 (15:35 -0400)]
LU-16973 osd: adds SB_KERNMOUNT flag

During umount mntput() is called. It uses the delayed_mntput()
function, which can take a long time to finish. The block
device remains occupied during the delayed work.

[ 8753.941980] Lustre: server umount XXX complete
[ 8800.129136] sysrq: SysRq : Trigger a crash

PID: 319306   TASK:XXXX   CPU: 2    COMMAND: "kworker/2:0"
 #0 __schedule at ffffffff9754e1d4
 #1 preempt_schedule_common at ffffffff9754e6fa
 #2 _cond_resched at ffffffff9754e72d
 #3 invalidate_mapping_pages at ffffffff96e72da5
 #4 invalidate_bdev at ffffffff96f5d13c
 #5 ldiskfs_put_super at ffffffffc1c82e34 [ldiskfs]
 #6 generic_shutdown_super at ffffffff96f1bdcc
 #7 kill_block_super at ffffffff96f1bed1
 #8 deactivate_locked_super at ffffffff96f1b784
 #9 cleanup_mnt at ffffffff96f3b86b

Use the SB_KERNMOUNT flag during mount, which leads to a
synchronous mntput().
The patch also calls flush_delayed_fput during umount to finish
delayed fput.

HPE-bug-id: LUS-11629
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ia6729f6cbac85c3626562e946a4b96665a143714
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51731
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-12064 ptlrpc: set at_min=5 by default 09/50609/4
Andreas Dilger [Sat, 18 Mar 2023 00:34:06 +0000 (18:34 -0600)]
LU-12064 ptlrpc: set at_min=5 by default

Having at_min=0 as the default value can result in clients timing
out and/or being evicted too easily when there is a sudden spike
in server load.  Increase at_min to 5s by default.

For large clusters, at_min=15 is more reasonable, but distributing
a variable at_min value to clients will need more complex changes.
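
As a rough illustration of what the floor does, assuming the
adaptive timeout is simply clamped between at_min and at_max (the
function name is hypothetical, and the at_max value of 600 is an
assumption made for this sketch):

```python
def clamp_timeout(estimate, at_min=5, at_max=600):
    """Clamp an adaptive timeout estimate (seconds) to [at_min, at_max]."""
    return max(at_min, min(estimate, at_max))
```

With at_min=0 a 1s estimate stays at 1s, so a sudden load spike can
exceed it; with at_min=5 the same estimate is raised to 5s.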

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3463cbc642458f6dd5977fe34478b135d1cd0219
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50609
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
8 months agoLU-13805 llite: add mm to dio struct 47/49947/31
Patrick Farrell [Wed, 8 Feb 2023 19:01:24 +0000 (14:01 -0500)]
LU-13805 llite: add mm to dio struct

When copying to or from userspace, we must use the mm from
the userspace thread.  This can be done either by running
in that thread or borrowing its mm.  Unaligned DIO does
some memory movement to userspace in ptlrpcd threads, so it
requires the user mm be stored in the sub dio.

This will be used by the main unaligned DIO patch and has
been split out for reviewability.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I419cb9f1899b8c8f9790ce25b3aba1d6f07397aa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49947
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
8 months agoLU-13805 clio: Add csi_complete 13/49913/35
Patrick Farrell [Mon, 6 Feb 2023 18:10:10 +0000 (13:10 -0500)]
LU-13805 clio: Add csi_complete

The next patch will make end_io potentially sleep, so we
need to modify how completion works to avoid holding a
spinlock over the end_io() call.
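
The general pattern, sketched in Python with hypothetical names (this
is not the csi_complete mechanism itself, only the idea of deciding
under the lock and calling the callback after releasing it):

```python
import threading


class SyncIO:
    """Counts outstanding transfers and runs end_io once, outside the lock."""

    def __init__(self, nr_pending, end_io):
        self._lock = threading.Lock()
        self._pending = nr_pending
        self._end_io = end_io

    def note_completion(self):
        with self._lock:
            self._pending -= 1
            last = self._pending == 0
        # The lock is released here, so end_io() may block safely.
        if last:
            self._end_io()
```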

This patch is strictly supporting work for the next patch
and has been pulled out so it can be tested by itself.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iba3388a0e09fdd0ab2f4a95f1cde96908a485cfa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49913
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoNew tag 2.15.58 2.15.58 v2_15_58
Oleg Drokin [Fri, 1 Sep 2023 20:38:40 +0000 (16:38 -0400)]
New tag 2.15.58

Change-Id: I6d58a43d5904c24d32575b4790bcaabd9ebdfb6f
Signed-off-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17038 tests: remove unused compile.sh script 54/52054/2
Timothy Day [Wed, 23 Aug 2023 16:26:41 +0000 (16:26 +0000)]
LU-17038 tests: remove unused compile.sh script

This script just runs make automatically. It doesn't
appear to be called by any other Lustre sanity
test script. I doubt it has been used in many
years. This patch removes it.

Checked for usage using:

 `git grep -i "compile.sh"`

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If1615196bc8d004a63ad8baddd1d3fe3e360dc74
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52054
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17038 tests: remove mlink utility 51/52051/4
Timothy Day [Wed, 23 Aug 2023 02:52:32 +0000 (02:52 +0000)]
LU-17038 tests: remove mlink utility

The mlink utility is nearly identical to the link utility
provided by coreutils. They only differ by some GNU
boilerplate. All tests using mlink are replaced with link.
Luckily, mlink is only used in a few places.

Used the following command:

 `git grep -i mlink | grep -i -v symlink`

to track down all uses of mlink.

Test-Parameters: trivial testlist=recovery-small
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I197235572d2cb267ee68930c64058e4f5ffe5be1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52051
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-12678 lnet: discard lnet_kvaddr_to_page 41/52041/3
Mr NeilBrown [Wed, 23 Aug 2023 00:18:41 +0000 (20:18 -0400)]
LU-12678 lnet: discard lnet_kvaddr_to_page

This function is not needed, so discard it.

Change-Id: Iffe9745adf477a5f4b78d8ef191849179426cb07
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52041
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17043 enc: fix osd lookup cache for long encrypted names 16/52016/2
Sebastien Buisson [Mon, 21 Aug 2023 09:44:32 +0000 (11:44 +0200)]
LU-17043 enc: fix osd lookup cache for long encrypted names

Fix osd lookup cache to support files with long encrypted names.
Those encrypted names can be up to 256 bytes, not NUL terminated.

Fixes: 29f8eb2a67 ("LU-16405 osd: lookup cache")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ica2329c8a0990395307a14fe9bb9d43db3b364ed
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52016
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-15367 llite: iotrace standardization 02/52002/3
Patrick Farrell [Fri, 18 Aug 2023 18:31:32 +0000 (14:31 -0400)]
LU-15367 llite: iotrace standardization

Clean up and standardize some of the iotrace messages for
easier parsing.

Add a clear 'START' indicator.

Remove a now-redundant debug message in the mmap code.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia620cc8c783509cbc3f47b21a274d67d860b80e7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52002
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
8 months agoLU-17039 build: cleanup ib_dma_map_sg 79/51979/2
Shaun Tancheff [Fri, 18 Aug 2023 04:50:56 +0000 (23:50 -0500)]
LU-17039 build: cleanup ib_dma_map_sg

CONFIG_INFINIBAND_VIRT_DMA is a kernel configuration option
that in some cases conflicts with the configuration of the
externally provided OFED stack.

During configure, when ib_dma_map_sg fails to build correctly,
we can simply #undef CONFIG_INFINIBAND_VIRT_DMA to resolve
the inconsistent configuration that breaks ib_dma_map_sg.

Test-Parameters: trivial
HPE-bug-id: LUS-11771
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Id0849464d3ffbd573cac13016191d80c6ea991af
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17038 tests: remove munlink utility 77/51977/4
Andreas Dilger [Thu, 17 Aug 2023 22:06:36 +0000 (16:06 -0600)]
LU-17038 tests: remove munlink utility

The munlink utility is obsoleted by the unlink command added in
the coreutils package many moons ago, and can be removed.  All
tests using munlink are replaced with unlink.

Test-Parameters: trivial testlist=recovery-small,replay-dual,replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I984406525ed958814bd8af74a2d81c4920e320b0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51977
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16510 build: check if CONFIG_FORTIFY_SOURCE is defined 73/51973/2
Jian Yu [Thu, 17 Aug 2023 20:46:38 +0000 (13:46 -0700)]
LU-16510 build: check if CONFIG_FORTIFY_SOURCE is defined

The linux/fortify-string.h header file should not be
included while the kernel config option CONFIG_FORTIFY_SOURCE
is not defined.

Change-Id: I2e1905406e892b182f143d512a2d3722b141e52d
Fixes: 919b93b951d4 ("LU-16510 build: fortified memcpy from linux 6.1")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51973
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-17036 utils: make sure resize option is legit 70/51970/2
Li Dongyang [Thu, 17 Aug 2023 13:27:00 +0000 (23:27 +1000)]
LU-17036 utils: make sure resize option is legit

To align the metadata on 1MB boundaries we manually
set the resize blocks to 16368G for 4K block size,
however mke2fs expects the resize blocks to be bigger
than the device size.

For devices between 16368G and 16384G the mke2fs
will fail with:
The resize maximum must be greater than the filesystem size.
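
A sketch of the check, in Python with hypothetical names. The commit
message does not say what is done when the check fails; this sketch
simply omits the option in that case, which is an assumption.

```python
GIB = 1 << 30
BLOCK_SIZE = 4096
# 16368G expressed in 4K blocks, the value named in the commit message.
ALIGNED_RESIZE_BLOCKS = 16368 * GIB // BLOCK_SIZE


def resize_option(device_bytes):
    """Return the resize block count to pass, or None to omit it."""
    device_blocks = device_bytes // BLOCK_SIZE
    # mke2fs requires the resize maximum to exceed the filesystem size.
    if ALIGNED_RESIZE_BLOCKS > device_blocks:
        return ALIGNED_RESIZE_BLOCKS
    return None
```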

Change-Id: I4567a79c1405e9527d7f0f9bec4c8a7aae0eba6c
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51970
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17031 build: fix redefine __compiletime_strlen error 53/51953/2
Qian Yingjin [Wed, 16 Aug 2023 02:11:39 +0000 (22:11 -0400)]
LU-17031 build: fix redefine __compiletime_strlen error

Lustre build failed on Ubuntu 2204 kernel v5.17 with "redefine
__compiletime_strlen".
This patch fixes this build error.

Fixes: 919b93b951 ("LU-16510 build: fortified memcpy from linux 6.1")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ic26daecd6b91614e01b5b0030f40eede205a21f7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51953
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17030 llite: allow setting max_cached_mb to a % 52/51952/7
Patrick Farrell [Tue, 15 Aug 2023 23:08:12 +0000 (19:08 -0400)]
LU-17030 llite: allow setting max_cached_mb to a %

Lustre's max_cached_mb parameter is hard to use because it
must be set to a specific numeric value, so in effect it
cannot be set on the server side unless all clients are
guaranteed identical.

Let's add the ability to set that to a % of memory to make
it more useful.
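
A minimal sketch of accepting both forms, in Python with a
hypothetical function name; the actual parsing and range limits in
the patch may differ.

```python
def parse_max_cached_mb(value, total_ram_mb):
    """Accept either an absolute MB count ("4096") or a percentage ("50%")."""
    value = value.strip()
    if value.endswith("%"):
        percent = int(value[:-1])
        if not 0 <= percent <= 100:
            raise ValueError("percentage must be between 0 and 100")
        return total_ram_mb * percent // 100
    return int(value)
```

The same "50%" setting then yields 8192 on a 16384 MB client and
32768 on a 65536 MB client, without per-client tuning.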

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1f9f5a8a5d671ab00b7ab6133bb9b1d1214ca59e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-10885 docs: note flock now being enabled by default 48/51948/2
Laura Hild [Tue, 15 Aug 2023 17:04:37 +0000 (13:04 -0400)]
LU-10885 docs: note flock now being enabled by default

mount -o flock was made the default, but the mount.lustre(8) man-page
still said noflock is default.  Text based on comments in LU-10885 and
http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes.

Signed-off-by: Laura Hild <lsh@jlab.org>
Change-Id: I48bfc0260fb948771f5cf4fb8cbc6ee9588e2217
Test-Parameters: trivial
Fixes: 16fb13eb3863 ("LU-10885 llite: enable flock mount option by default")
Fixes: 3613af3e15cb ("LU-10885 llite: enable flock mount option by default")
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51948
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17015 gss: support large kerberos token on client 46/51946/6
Aurelien Degremont [Tue, 15 Aug 2023 14:03:07 +0000 (16:03 +0200)]
LU-17015 gss: support large kerberos token on client

If the current Kerberos setup uses large tokens, for example
when the PAC feature is enabled for Kerberos, the client can crash.

Return an error instead of asserting to avoid the crash
and increase the default buffer size to 4kB instead of 1kB.
This will only increase the SEC_CTX_INIT request size, and
the buffer is shrunk before being sent over the wire.

This will allow security tokens up to 2kB to be properly
handled by Lustre. Above that size, a different issue will
happen on the server side that will require another patch.

Test-Parameters: trivial kerberos=true testlist=sanity-krb5
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: I9ce30ee7f8c95bfe41525c49986ffac45ffac97c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51946
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-17006 lnet: set up routes for going across subnets 21/51921/4
Serguei Smirnov [Fri, 11 Aug 2023 00:58:11 +0000 (17:58 -0700)]
LU-17006 lnet: set up routes for going across subnets

Modify ksocklnd-config to set up a route that uses the
default gateway for the subnet if a default gateway
is defined, for example:
        ip route add default via <gw_for_eth0> dev eth0 table eth0
which results in a route similar to the following added to
the eth0 route table:
        default via <gw_for_eth0> dev eth0

If there's no gateway found for the eth0 subnet, keep the old
behaviour which results in the following added to eth0
route table:
        <eth0_subnet> dev eth0 proto kernel scope link src <eth0_ip>

This makes sure that MR traffic goes out the intended interface
as selected by LNet no matter whether going across subnets or not.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I84a299c8b7eb4cdb4fc24408a1e42ad0283d9219
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51921
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16766 obdclass: trim kernel thread names in jobids 19/51919/2
Thomas Bertschinger [Thu, 13 Jul 2023 22:32:52 +0000 (18:32 -0400)]
LU-16766 obdclass: trim kernel thread names in jobids

When collecting jobstats on operations coming from kernel threads, it
is more useful and reduces the noisiness of the data if the names of
kernel threads are trimmed so that all "kworker/CPU:ID" threads are
collected under "kworker", all "ll_sa_PID" threads under ll_sa, etc.
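
The trimming can be sketched as below. This is a Python illustration
with a hypothetical function name and an assumed pattern (strip
everything from the first "/" or a trailing "_<digits>"); the patch
may recognize thread names differently.

```python
import re

# Assumed pattern for illustration only.
_KTHREAD_SUFFIX = re.compile(r"(/.*|_\d+)$")


def trim_kthread_name(comm, is_kthread=True):
    """Collapse per-CPU or per-PID kernel thread names to a common prefix."""
    if not is_kthread:
        # User process names are left untouched.
        return comm
    return _KTHREAD_SUFFIX.sub("", comm)
```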

Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Icd82a99c1153de0277ea5ed3f4b1d92535809921
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51919
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17020 kernel: update RHEL 9.2 [5.14.0-284.25.1.el9_2] 86/51886/4
Jian Yu [Tue, 8 Aug 2023 22:43:03 +0000 (15:43 -0700)]
LU-17020 kernel: update RHEL 9.2 [5.14.0-284.25.1.el9_2]

Update RHEL 9.2 kernel to 5.14.0-284.25.1.el9_2.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el9.2 serverdistro=el9.2 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el9.2 serverdistro=el9.2 testlist=sanity

Change-Id: Icdbd9cfa18a72d3e6f09f366952e6e0f2ac1ebd2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17013 lov: fill FIEMAP_EXTENT_LAST flag 63/51863/9
Lei Feng [Thu, 3 Aug 2023 09:44:15 +0000 (17:44 +0800)]
LU-17013 lov: fill FIEMAP_EXTENT_LAST flag

If a file has N extents and the fiemap is requested with exactly N
extent slots, the last extent will be missing the FIEMAP_EXTENT_LAST
flag. Fix it.
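
The intended behaviour, sketched in Python with a hypothetical helper
and tuple layout (not the lov code): the final extent of the file
carries the flag even when it lands in the last available slot.

```python
FIEMAP_EXTENT_LAST = 0x00000001  # value from linux/fiemap.h


def fill_extents(file_extents, slots):
    """Copy up to `slots` extents and flag the file's final extent.

    file_extents is a list of (logical, length) tuples; the result is a
    list of (logical, length, flags) tuples.
    """
    out = []
    for index, (logical, length) in enumerate(file_extents[:slots]):
        flags = 0
        # Flag the final extent of the file even when it lands in the
        # last available slot, i.e. when slots == len(file_extents).
        if index == len(file_extents) - 1:
            flags |= FIEMAP_EXTENT_LAST
        out.append((logical, length, flags))
    return out
```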

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: testlist=sanityn env=ONLY=71a+71b+71c
Change-Id: I4556b31f0d04bdf8e83f323e83b871b093beaa5e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51863
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
8 months agoLU-17011 utils: monotonic clock in lfs mirror 52/51852/4
Alex Zhuravlev [Wed, 2 Aug 2023 10:31:57 +0000 (13:31 +0300)]
LU-17011 utils: monotonic clock in lfs mirror

Use monotonic clocks instead of realtime to avoid affecting
bandwidth or hanging the transfer if the clock is changed.
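
A sketch of bandwidth throttling on a monotonic clock, in Python
with hypothetical names (lfs itself is C and its limiter may be
structured differently):

```python
import time


def throttle_delay(bytes_sent, start, bandwidth_bps, now=None):
    """Return seconds to sleep so the average rate stays under bandwidth_bps.

    start and now are readings of time.monotonic(), which never jumps
    when the wall clock is adjusted.
    """
    if now is None:
        now = time.monotonic()
    elapsed = now - start
    expected = bytes_sent / bandwidth_bps  # time the transfer should have taken
    return max(0.0, expected - elapsed)
```

With a realtime clock, setting the clock back would make elapsed
negative and the computed delay arbitrarily long, which is the hang
described above.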

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I58cf327d235448e93fa2ed63cefdf4dd01306e71
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51852
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-17009 tests: fix runtests to read file name with backslash 47/51847/2
Jian Yu [Wed, 2 Aug 2023 07:16:04 +0000 (00:16 -0700)]
LU-17009 tests: fix runtests to read file name with backslash

If a file in /etc dir has a name with backslash, then runtests
will fail because the read command considers the backslash as
an escape character. This patch fixes the issue by adding "-r"
option to read.

Change-Id: Iab912ba9708f5b64e6bb8d8adc266ff23ed32de5
Test-Parameters: trivial testlist=runtests
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51847
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17000 lnet: remove redundant errno check in liblnetconfig.c 46/51846/3
Jake McManus [Thu, 10 Aug 2023 03:12:03 +0000 (23:12 -0400)]
LU-17000 lnet: remove redundant errno check in liblnetconfig.c

Variable root is assigned NULL at the beginning of
lustre_lnet_show_stats(). If l_ioctl() fails, its return value,
stored in rc, takes the true path in the following conditional.
This conditional currently contains a redundant check for errno,
even though rc would equal -errno in this case. If errno had
changed between the l_ioctl() call and this subsequent read, errno
could be 0, which would, from the out: label, lead to a NULL
root being passed to cYAML_insert_sibling() and the NULL root
pointer being dereferenced.

Replaced l_errno's use as a parameter in strerror with -rc, and
removed the declaration of and other references to l_errno.
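
The idea of deriving the message from rc alone can be sketched as
follows, in Python with a hypothetical helper (the real code is C in
liblnetconfig.c):

```python
import errno
import os


def describe_failure(rc):
    """Build an error string from a negative return code alone.

    rc already holds -errno from the failed call, so there is no need to
    re-read the global errno, which may have changed (even to 0) since.
    """
    if rc >= 0:
        return None
    return os.strerror(-rc)
```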

Addresses-Coverity-ID: 397850 ("Explicit null dereferenced")

Signed-off-by: Jake McManus <jacobpmcmanus@gmail.com>
Change-Id: I78f080837b60c8216c52bda8562d4c0f9f45a132
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51846
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16866 tests: Use wait_update to check LNet recovery state 45/51845/4
Chris Horn [Mon, 31 Jul 2023 19:03:57 +0000 (13:03 -0600)]
LU-16866 tests: Use wait_update to check LNet recovery state

The monitor thread is sometimes woken up on demand and sometimes sleeps
for one-second intervals. This makes it hard to precisely predict how
long we need to sleep for ping counts to update and NIs to be
processed out of recovery.
Use wait_update when checking LNet recovery queues and ping counts.
Additional drop rules are added to tests 210 and 211 because it has
been observed that other test instances may issue pings to the node
running 210/211 and cause the ping_count to reset. These additional
drop rules ensure that any incoming messages are dropped.
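
The idea of polling for a state instead of sleeping for a fixed time can
be sketched generically. wait_update itself is a shell helper in the test
framework; the C code below is only an illustration of the technique, and
wait_for(), check_fn and counter_drained() are hypothetical names that do
not exist in the Lustre tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited state holds. */
typedef bool (*check_fn)(void *arg);

/*
 * Re-check a condition every interval_ms milliseconds until it holds or
 * max_tries attempts have been made. Returns the number of attempts used
 * on success, or -1 if the condition never held.
 */
static int wait_for(check_fn check, void *arg, int max_tries, long interval_ms)
{
	struct timespec ts = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	int i;

	assert(check != NULL && max_tries > 0);

	for (i = 1; i <= max_tries; i++) {
		if (check(arg))
			return i;
		nanosleep(&ts, NULL);
	}
	return -1;
}

/* Example predicate: true once a counter has been polled down to zero. */
static bool counter_drained(void *arg)
{
	int *remaining = arg;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

The caller picks an upper bound on attempts rather than a single sleep
length, so the check finishes as soon as the state is reached and only
fails after the whole budget is spent.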

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=210,211,216
Test-Parameters: testlist=sanity-lnet env=ONLY=211,ONLY_REPEAT=100
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ief84388222e46c23952af4ad1d098924e73a8598
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51845
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17000 misc: remove Coverity annotations 93/51793/2
Timothy Day [Fri, 28 Jul 2023 04:49:50 +0000 (04:49 +0000)]
LU-17000 misc: remove Coverity annotations

These Coverity function annotations were added
around 10 years ago. Since then, Coverity seems
to produce fewer false positives. Out of about 20
annotations, only 3 warnings get suppressed.
Thus, the applicability of these annotations
should be re-evaluated.

Coverity has more advanced tools now for reducing
false positives. Various Lustre functions and
macros could be modeled rather than using
function annotations. But first, we need to get
a good idea of what kinds of false positives are
being generated.

https://scan.coverity.com/tune

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ibcb9cf55574675e20b13a4f7a1b9142a3b75e262
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51793
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16984 tests: replay-dual/31 checks file from DIR2 62/51762/2
Lei Feng [Wed, 26 Jul 2023 00:52:10 +0000 (08:52 +0800)]
LU-16984 tests: replay-dual/31 checks file from DIR2

In replay-dual/test_31, check file existence from DIR2.
Add more messages for diagnosis.

Fixes: 07764c4eeb ("LU-16953 tests: wait longer in replay-dual/test_31")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=replay-dual env=ONLY=31,ONLY_REPEAT=100
Change-Id: Iee679ee94ac2cb51baad1651bfaddf452fafdbd1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51762
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16961 clang: plugins and build system integration 59/51659/4
Timothy Day [Thu, 13 Jul 2023 04:19:41 +0000 (04:19 +0000)]
LU-16961 clang: plugins and build system integration

Clang has a plugin system. Compiler extensions can be created
by making a shared library and loading it via the "-fplugin"
option. This makes it simple to implement custom warnings
and static analyzers.

This patch adds a plugin to detect functions that should have
been made static. This plugin has been run over the majority
of the Lustre tree and patches have been submitted for all
warnings. The plugin did not return any false positives in
my testing.
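
For illustration, the kind of function such a check targets looks like
the sketch below. This is not output from the plugin and does not show how
the plugin decides; helper_add(), helper_add_static() and public_sum3()
are hypothetical names. helper_add() has external linkage although it is
used only in this file and declared in no header, while
helper_add_static() shows the same helper with internal linkage.

```c
#include <assert.h>

/* Prototype as it would appear in a header: part of the public interface. */
int public_sum3(int a, int b, int c);

/*
 * Used only inside this translation unit and declared in no header.
 * It has external linkage by default, so it is a candidate for 'static'.
 */
int helper_add(int a, int b)
{
	return a + b;
}

/* The same helper with internal linkage, invisible to other object files. */
static int helper_add_static(int a, int b)
{
	return a + b;
}

int public_sum3(int a, int b, int c)
{
	assert(helper_add(a, b) == helper_add_static(a, b));
	return helper_add(helper_add_static(a, b), c);
}
```

Marking file-local helpers static keeps them out of the object file's
exported symbols and lets the compiler warn when they become unused.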

It also adds the "--enable-compiler-plugins" configure option,
which automatically builds and sets up the in-tree C compiler
plugins. The option force-enables the plugin regardless of
which compiler is in use. This behavior could be changed if
there is ever a need to support GCC-specific plugins.

Also, add the configure checks needed to support building C++
in the Lustre tree. Clang and GCC plugins (and the compilers
themselves) are written in C++.

The license for the plugin mirrors that of the LLVM project
itself. This leaves the door open for contributing this
plugin upstream in the future. This isn't being upstreamed
now because it lacks any significant user community. Hence,
the plugin does not appear to meet the requirements for
upstreaming based on https://clang.llvm.org/get_involved.html.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I747ed91b53e765cc58e91a3eb9ec6c12b9908a96
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51659
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16605 lfs: Add -n option to fid2path 26/51626/12
Arshad Hussain [Tue, 11 Jul 2023 05:55:36 +0000 (11:25 +0530)]
LU-16605 lfs: Add -n option to fid2path

Add '-n' option to fid2path to allow printing
only the filename of the file instead of the
whole parent pathname.

Test-case sanity/226d added.

Test-Parameters: trivial testlist=sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ieebd39a1655b4e3ad20cdbb4941dbb44882845f4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51626
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16943 tests: fix replay-single/135 under hard failure mode 74/51574/6
Jian Yu [Wed, 12 Jul 2023 13:48:41 +0000 (21:48 +0800)]
LU-16943 tests: fix replay-single/135 under hard failure mode

This patch fixes replay-single test_135() to load libcfs module
on the failover partner node to avoid 'fail_val' setting error.
It also fixes the issue that not all of the OSTs are mounted after
failing back ost1.

Test-Parameters: trivial env=REPLAY_SINGLE_EXCEPT=200 testlist=replay-single
Test-Parameters: trivial env=REPLAY_SINGLE_EXCEPT=200 fstype=zfs testlist=replay-single

Test-Parameters: trivial env=REPLAY_SINGLE_EXCEPT=200,FAILURE_MODE=HARD \
    clientcount=4 mdtcount=1 mdscount=2 osscount=2 \
    austeroptions=-R failover=true iscsi=1 \
    testlist=replay-single

Change-Id: Id46c722a6db9d832829a739f41f7462b32a6d9d9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51574
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16936 auster: add --client-only option 09/51509/4
Timothy Day [Thu, 29 Jun 2023 15:37:21 +0000 (15:37 +0000)]
LU-16936 auster: add --client-only option

Add flag to auster to run sanity tests only on the
client-side. This leverages some existing functionality
to avoid having to set up ssh to filesystem hosts and
some other tedious setup.

Force test-framework.sh to honor the --no-setup flag.
Several test suites attempt to set up Lustre even if
auster says not to. Some lower level tests, like those
related to OBD device loading, require that Lustre not
be set up.

Change some [ to [[ in test-framework.sh to silence
some error messages.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I24de10743c3845b51fe29518ffc993b15a7c2cdd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51509
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16883 ldiskfs: update for ext4-delayed-iput for RHEL9.0 76/51376/2
Shaun Tancheff [Tue, 20 Jun 2023 07:31:53 +0000 (14:31 +0700)]
LU-16883 ldiskfs: update for ext4-delayed-iput for RHEL9.0

ext4-delayed-iput patch does not apply cleanly to RHEL9.0

Adjust the minor conflict in ext4_put_super()

Test-Parameters: trivial
Fixes: 616fa9b581 ("LU-15404 ldiskfs: use per-filesystem workqueues to avoid deadlocks")
HPE-bug-id: LUS-11661
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia8c2dcda50417b113399973f177a14283514a1ff
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51376
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16896 flr: resync should not change file size 44/51344/6
Bobi Jam [Sat, 17 Jun 2023 00:51:26 +0000 (08:51 +0800)]
LU-16896 flr: resync should not change file size

Mirror resync could punch a hole reaching the end of file in a
mirror, which could change the file size when that mirror is accessed.

This patch calls truncate after punch in this case to keep the file
size unchanged in the mirror.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ia0fc1f220a32a60f3516c69e86867796ae5c35c7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51344
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>