Timothy Day [Mon, 23 Oct 2023 15:50:23 +0000 (11:50 -0400)]
LU-12610 obd: delete OBD_ -> CFS_ redefinitions
With all consumers of the OBD macros fixed, finally
remove OBD macros that are simply redefinitions of CFS
macros.
Remove:
OBD_FAIL_PRECHECK(id)
OBD_FAIL_CHECK(id)
OBD_FAIL_CHECK_VALUE(id, value)
OBD_FAIL_CHECK_ORSET(id, value)
OBD_FAIL_CHECK_RESET(id, value)
OBD_FAIL_RETURN(id, ret)
OBD_FAIL_TIMEOUT(id, secs)
OBD_FAIL_TIMEOUT_MS(id, ms)
OBD_FAIL_TIMEOUT_ORSET(id, value, secs)
OBD_RACE(id)
OBD_FAIL_ONCE
OBD_FAILED
Avoid losing the unlikely optimization with OBD_FAIL_PRECHECK by
adding unlikely to CFS_FAIL_PRECHECK. For libcfs_private.h not
all callers of CFS_FAIL_PRECHECK had unlikely so this is fixed
as well.
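
The idea of putting the branch hint inside the precheck can be sketched as follows. This is a minimal userspace illustration, not the libcfs code: demo_unlikely, DEMO_FAIL_MASK_LOC, demo_fail_loc and DEMO_FAIL_PRECHECK are hypothetical stand-ins, and the mask value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Branch hint as commonly written for GCC/Clang (hypothetical name). */
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-ins, not the real libcfs definitions. */
#define DEMO_FAIL_MASK_LOC 0x0000ffffUL
static unsigned long demo_fail_loc; /* 0 means no fault is armed */

/*
 * The hint lives inside the precheck macro, so every caller gets it
 * without having to wrap the call in unlikely() itself.
 */
#define DEMO_FAIL_PRECHECK(id)                                \
	demo_unlikely(demo_fail_loc != 0 &&                   \
		      (demo_fail_loc & DEMO_FAIL_MASK_LOC) == \
		      ((id) & DEMO_FAIL_MASK_LOC))

/* Returns true when the fault identified by id is currently armed. */
static bool demo_fault_armed(unsigned long id)
{
	assert(id != 0);
	if (DEMO_FAIL_PRECHECK(id))
		return true; /* cold path: fault injection requested */
	return false;        /* hot path: normal operation */
}
```

With the hint in one place, a caller that forgets its own unlikely() no longer loses the optimization.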
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: I6620bae389a9e29da2c0258b07f0ca2a7f67c14a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/35640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Bobi Jam [Wed, 25 Oct 2023 06:58:35 +0000 (14:58 +0800)]
LU-16837 lov: NULL dereference in lov_delete_composite
Commit 14ed4a6f8f reintroduced the issue fixed by commit
5da049d9ef ("LU-14389 lov: avoid NULL dereference in cleanup"). This
patch makes the fix cover the new case added by 14ed4a6f8f.
Fixes: 14ed4a6f8f ("LU-16837 llite: handle unknown layout component")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I4a2b72e21139b60519ed523b4851723c91f523c1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52826
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Li Dongyang [Mon, 23 Oct 2023 11:49:55 +0000 (22:49 +1100)]
LU-11912 tests: fix racing in force_new_seq_all
We run force_new_seq in parallel to reduce time spent
on consuming precreated objects.
However this could be racy when multiple MDTs are on
the same MDS, a task could finish for one MDT early
and reset the fail_loc to 0 on MDS while other tasks
are still working on other MDTs.
Replace OBD_FAIL_OSP_FORCE_NEW_SEQ with a new param
prealloc_force_new_seq for osp, so we can control
the seq rollover individually for each osp device.
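
The race described above can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: struct demo_osp, demo_global_fail_loc and the step functions are hypothetical and only replay the problematic interleaving, they are not the osp code or the test script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one server-wide flag versus one flag per device. */
struct demo_osp {
	bool force_new_seq; /* per-device switch, like the new param */
	bool rolled_over;   /* set once this device moved to a new seq */
};

static bool demo_global_fail_loc; /* shared by every device on the server */

/* Old scheme: rollover only happens while the shared flag is set. */
static void demo_step_global(struct demo_osp *osp)
{
	assert(osp != NULL);
	if (demo_global_fail_loc)
		osp->rolled_over = true;
}

/* New scheme: each device consults only its own switch. */
static void demo_step_per_device(struct demo_osp *osp)
{
	assert(osp != NULL);
	if (osp->force_new_seq)
		osp->rolled_over = true;
}

/* The task for device a finishes first and clears the shared flag. */
static bool demo_race_with_global(void)
{
	struct demo_osp a = { false, false }, b = { false, false };

	demo_global_fail_loc = true;
	demo_step_global(&a);
	demo_global_fail_loc = false; /* a's cleanup hits b as well */
	demo_step_global(&b);
	return a.rolled_over && b.rolled_over;
}

/* Same interleaving, but a's cleanup only touches a's own switch. */
static bool demo_race_per_device(void)
{
	struct demo_osp a = { true, false }, b = { true, false };

	demo_step_per_device(&a);
	a.force_new_seq = false;
	demo_step_per_device(&b);
	return a.rolled_over && b.rolled_over;
}
```

In this model the shared flag leaves device b without a rollover, while the per-device switch lets both devices complete.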
Change-Id: I52dbd550564ca628a8a85c42951694d58b2b93a9
Fixes: 656fc937cf ("LU-11912 tests: consume precreated objects in parallel")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52801
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Fri, 20 Oct 2023 08:25:47 +0000 (04:25 -0400)]
LU-6142 tests: Add missing /tmp/target2 under cleanup_src_tgt
cleanup_src_tgt() is called after each test case
under lustre-rsync-test.sh. The cleanup function
was missing /tmp/target2/... cleanup. This patch
adds the missing cleanup of folder /tmp/target2/...
It is good to have a clean start for next test case
as the diff (comparison) is performed with these
folders.
This patch also fixes a space/tabs mismatch reported
by checkpatch.
Test-Parameters: trivial testlist=lustre-rsync-test
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ieb7cfa60d894f43f1aa7b2510d03bd07eeb90a1e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52770
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Li Dongyang [Wed, 18 Oct 2023 05:15:41 +0000 (16:15 +1100)]
LU-11912 tests: force new seq in runtests
If seq rollover happens during runtests/1,
the new seq on OST will consume some space and
this will fail the free space check.
Force a new seq before running test case to
prevent this.
Change-Id: I7bb1156127eb423889626bf84bc6c87dd68e6ece
Test-Parameters: trivial testlist=runtests
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52741
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Sun, 21 May 2023 16:48:25 +0000 (12:48 -0400)]
LU-13805 llite: add flag to disable unaligned DIO
As with any new IO feature, it's a good idea to have the
option to turn off unaligned DIO support if needed.
It would be reasonable to merge this patch with the core
patch implementing the feature; I have kept it separate for
ease of review.
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ibc86d84704151a7f30afcc538d9c03e3fdf1c38a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51125
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Sun, 9 Jul 2023 13:55:39 +0000 (09:55 -0400)]
LU-10003 lnet: migrate old route API to Netlink / YAML API
With LNet route management now supported with the Netlink / YAML
API we can move the pre-MR route handling away from the
old ioctls. This allows use of large NIDs as well for IPv6
support.
Test-Parameters: trivial
Test-Parameters: testlist=conf-sanity env=ONLY="67"
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I1526d3e27967a10f556b8e31f15803659c0164bd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50440
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Mon, 7 Aug 2023 13:23:15 +0000 (09:23 -0400)]
LU-10391 lnet: migrate router management to Netlink
Add the doit Netlink handling for routes. The creation and
deletion of routes can now work with large NIDs using the
Netlink API.
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Ia9fc64785962ce3e74abeef9c77e8b2df0789dd7
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50254
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Chris Horn [Mon, 25 Sep 2023 19:03:20 +0000 (14:03 -0500)]
LU-17103 lnet: Avoid deadlock when updating ping buffer
lnet_peer_send_push() adds a reference to the the_lnet.ln_ping_target
lnet_ping_buffer. This reference is dropped by
lnet_discovery_event_handler. When the LNet configuration is modified
the ln_api_mutex is held and lnet_ping_target_update() is called to
update the ln_ping_target to reflect the new configuration.
While holding the ln_api_mutex, lnet_ping_target_update() will wait
until all refs on the old ping buffer are dropped. This can result
in a deadlock if the ln_api_mutex is required to complete the push.
Here is one scenario where this can happen:
1. PUSH is sent by discovery thread
2. LNet configuration is modified. lnetctl process is holding
ln_api_mutex and waiting in lnet_ping_target_update()
3. Local NI goes into recovery
4. Monitor thread wakes and attempts to send ping to local NI. If this
is the first ping sent to this NI then monitor thread needs
ln_api_mutex to create peer NI object for local NI.
(LNetGet->
lnet_send->
lnet_select_pathway->
lnet_peerni_by_nid_locked->
mutex_lock(&the_lnet.ln_api_mutex))
5. PUSH (1) fails with local timeout. It is placed on monitor thread
resend queue.
6. monitor thread cannot process resend queue until it acquires
ln_api_mutex. ln_api_mutex cannot be acquired until monitor thread
processes resend queue. Deadlock.
Fix is to drop ln_api_mutex before calling lnet_ping_md_unlink() in
lnet_ping_target_update(). This should ensure that updates to the
ping target are still synchronized via ln_api_mutex as intended, but
we're able to clear refs on the old ping buffer as needed.
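
The effect of dropping the mutex before waiting can be shown with a small single-threaded model. Everything here is hypothetical (struct demo_lnet, demo_monitor_run, demo_ping_target_update); a bounded loop stands in for the blocking wait, and retaking the mutex after the wait is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model; all names are hypothetical. */
struct demo_lnet {
	bool api_mutex_held; /* stands in for ln_api_mutex */
	int  old_buf_refs;   /* refs still held on the old ping buffer */
};

/*
 * Stand-in for the monitor thread: it can only process its resend
 * queue, and so drop the ref, while the mutex is free.
 */
static void demo_monitor_run(struct demo_lnet *net)
{
	if (!net->api_mutex_held && net->old_buf_refs > 0)
		net->old_buf_refs--;
}

/*
 * Models the update path. Returns true if the old buffer's refs reach
 * zero, false if the wait could never finish (the deadlock).
 */
static bool demo_ping_target_update(struct demo_lnet *net, bool drop_mutex)
{
	bool done;
	int i;

	assert(net != NULL);
	net->api_mutex_held = true;          /* configuration change begins */

	if (drop_mutex)
		net->api_mutex_held = false; /* the fix: unlock before waiting */

	/* Bounded polling stands in for the real blocking wait. */
	for (i = 0; i < 8 && net->old_buf_refs > 0; i++)
		demo_monitor_run(net);
	done = net->old_buf_refs == 0;

	if (drop_mutex)
		net->api_mutex_held = true;  /* assumed: retaken to finish */
	net->api_mutex_held = false;         /* final unlock */
	return done;
}
```

Called with drop_mutex set to false, the model never sees the reference dropped, which corresponds to steps 5 and 6 of the scenario.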
Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=207,ONLY_REPEAT=50
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I20cda185a865192f1ad162eaef1b8b4e5d751b2c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52479
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Alex Zhuravlev [Thu, 12 Oct 2023 18:13:23 +0000 (21:13 +0300)]
LU-17188 mdt: remove \n from LDLM_DEBUG
LDLM_DEBUG() doesn't need \n in an extra message
Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5a62cccb0a17b3f878206e8bbec6c1fbe07c4753
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52673
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Arshad Hussain [Thu, 12 Oct 2023 11:12:10 +0000 (07:12 -0400)]
LU-17000 coverity: Fix 'Extra argument' under lst.c
This patch fixes 'Extra argument to printf' error reported
by coverity run.
CID: 397265 ("Extra argument to printf"): lst.c
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iae039e9c6a27a71a852e7d44baab840eac17795e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52660
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Thu, 12 Oct 2023 10:27:16 +0000 (06:27 -0400)]
LU-17000 coverity: Fix use before null under pcc.c
This patch fixes 'dereference before null' error reported
by coverity run.
CoverityID: 397538 ("Dereference before null"): pcc.c
Test-Parameters: trivial testlist=sanity-pcc
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic125f74b61e500e2184bde894aade7119b7a2899
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52658
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lei Feng [Thu, 12 Oct 2023 07:58:12 +0000 (15:58 +0800)]
LU-17182 utils: pool_add send OSTs in one batch
The 'lctl pool_add' command now sends all requests in one batch
and then checks the results. This way, the command won't take
too long if the OSTs are specified on the command line
one by one.
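
Why batching helps can be seen from a rough cost model. This sketch assumes that each request needs a fixed number of ticks before its result is visible; that delay, and the function names, are assumptions for illustration and not measurements of lctl.

```c
#include <assert.h>

/* Hypothetical cost model: each request needs `delay` ticks to take effect. */

/* Send one request, wait for its result, then move to the next OST. */
static int demo_serial_ticks(int nr_osts, int delay)
{
	int now = 0;
	int i;

	assert(nr_osts >= 0 && delay >= 0);
	for (i = 0; i < nr_osts; i++)
		now += delay; /* each wait starts only after the previous one */
	return now;
}

/* Send every request first, then check the results in a second pass. */
static int demo_batch_ticks(int nr_osts, int delay)
{
	int now = 0;
	int i;

	assert(nr_osts >= 0 && delay >= 0);
	for (i = 0; i < nr_osts; i++) {
		int ready_at = delay; /* all requests were sent at tick 0 */

		if (now < ready_at)
			now = ready_at; /* only the first check really waits */
	}
	return now;
}
```

Under this model the serial cost grows with the number of OSTs, while the batched cost stays at a single delay.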
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: Ibd6e7ed5104e100d44c5f4288a25e7378cd9cfe8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52654
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Tue, 10 Oct 2023 00:07:24 +0000 (00:07 +0000)]
LU-17151 tests: increase sanity/411b memory limit
This test fails most of the time when run using
arm clients. It seems like the cgroup memory limit
was increased in a past revision for a similar issue.
Increase it a bit more for aarch64. Increase it a
smaller amount for x86.
Also, add some better logging for some other issues.
There's likely a better fix for this, but hopefully
this will let the test pass and provide some value
without having to do a full revert.
Fixes: 8aa231a99 ("LU-16713 llite: writeback/commit pages under memory pressure")
Test-Parameters: trivial
Test-Parameters: testgroup=review-ldiskfs-arm testlist=sanity env=ONLY=411b,ONLY_REPEAT=50
Test-Parameters: clientdistro=el8.7 testlist=sanity env=ONLY=411b,ONLY_REPEAT=50
Test-Parameters: clientdistro=el9.1 testlist=sanity env=ONLY=411b,ONLY_REPEAT=50
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If850077c0d7f6466082433776d370d24eee9736c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Fri, 6 Oct 2023 05:41:53 +0000 (11:11 +0530)]
LU-17000 coverity: llverfs: Add check for -n option
CID: 397833 (Uninitialized scalar variable): llverfs.c
This patch also adds the missing "-n" option to the man page.
Test-Parameters: trivial testlist=sanity,conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8c795edba0cfde80402db0f877caeb91663629a3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Wei Liu [Tue, 3 Oct 2023 22:31:52 +0000 (15:31 -0700)]
LU-10026 tests: Fix sanity test_56ab for CSDC
Use /dev/urandom in sanity test_56ab so the data cannot be compressed
Test-Parameters: trivial testlist=sanity env=ONLY=56ab
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: Ib0108f4507a7b46ad5ca973ef43351e005232edf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52572
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bruno Faccini [Tue, 26 Sep 2023 09:15:35 +0000 (11:15 +0200)]
LU-17146 tests: avoid sanity-lfsck/test_38 failure
This regression has been introduced in kernels after commit
v5.11-10234-gcbd59c48ae2b (5.12), and is fixed with
commit v6.2-rc4-61-g5956592ce337 (6.2).
The issue has been introduced by upstream commit 8c8387ee3f55
("mm: stop filemap_read() from grabbing a superfluous page").
Skip sanity-lfsck/test_38 for this range of kernels.
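
The version range can be expressed as a half-open interval. The actual check belongs to the shell test framework; this C sketch only illustrates the range logic, and DEMO_KERNEL_VERSION and demo_skip_test_38 are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Same packing as the kernel's KERNEL_VERSION() for small sublevels. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* True for kernels from 5.12 up to, but not including, 6.2. */
static bool demo_skip_test_38(int major, int minor)
{
	int v;

	assert(major >= 0 && minor >= 0 && minor < 256);
	v = DEMO_KERNEL_VERSION(major, minor, 0);

	return v >= DEMO_KERNEL_VERSION(5, 12, 0) &&
	       v < DEMO_KERNEL_VERSION(6, 2, 0);
}
```

Treating 6.2 as the first fixed release keeps the upper bound exclusive.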
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ic6066e43959c913c2f225d229927803471f06cee
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52537
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Tue, 3 Oct 2023 20:43:11 +0000 (13:43 -0700)]
LU-17152 tests: unmount NFS clients with zconf_umount_clients
This patch fixes cleanup_nfs() to unmount NFS clients by running
zconf_umount_clients(), which can find and kill active processes
that are accessing the NFS mount point so as to avoid the
"device is busy" failure.
The patch also adds racer_on_nfs test into always_except list for
parallel-scale-nfsv4 due to LU-17154.
Test-Parameters: trivial testlist=parallel-scale-nfsv4
Change-Id: I37a38502362399540c28e78d1343e768b490ce8b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52533
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Fri, 22 Sep 2023 16:19:34 +0000 (18:19 +0200)]
LU-17138 enc: prefer specific crypto engines
Some ciphers provided by external accelerators might register under
the generic cipher name. To avoid using them with Lustre, prefer the
AES-NI variant implemented directly in the CPU. And fallback to the
generic cipher if AES-NI is not available.
Introduce a new libcfs kernel module parameter named
'client_encryption_engine' to give the ability to choose the cipher.
By default its value is 'aes-ni', which makes Lustre look for the
AES-NI cipher first. This parameter can be set to 'system-default'
which makes Lustre pick the generic cipher.
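
The selection order described above can be sketched as a small function. The cipher labels and demo_pick_cipher are hypothetical, and falling back to the generic cipher for unrecognised parameter values is an assumption of this sketch, not something the commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical cipher labels; the real driver names are not shown here. */
#define DEMO_CIPHER_AESNI   "aes-ni-variant"
#define DEMO_CIPHER_GENERIC "generic"

static const char *demo_pick_cipher(const char *engine, bool aesni_available)
{
	assert(engine != NULL);
	/* 'system-default' skips the preference and takes the generic cipher. */
	if (strcmp(engine, "system-default") == 0)
		return DEMO_CIPHER_GENERIC;
	/* 'aes-ni' (the default) prefers AES-NI and falls back if absent. */
	if (strcmp(engine, "aes-ni") == 0 && aesni_available)
		return DEMO_CIPHER_AESNI;
	/* Assumed: anything else also ends up on the generic cipher. */
	return DEMO_CIPHER_GENERIC;
}
```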
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8b00f1c3c8dcf11c58e9f40a410b57b2f255e642
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52477
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Thu, 7 Sep 2023 07:28:45 +0000 (09:28 +0200)]
LU-17015 gss: support large kerberos token for rpc sec init
If the current Kerberos setup uses large tokens, for example when the
PAC feature is enabled, authentication can fail because the server
side is unable to exchange the token between kernel and userspace.
This limitation is inherent to the sunrpc cache mechanism, which can
only handle tokens up to PAGE_SIZE.
For the RPC sec init phase, use Lustre's upcall cache mechanism
instead of the deprecated kernel sunrpc cache. The upcall calls a new
userspace command 'l_getauth', which forwards the sec init request to
the lsvcgssd daemon via Unix domain sockets.
Test-Parameters: kerberos=true testlist=sanity-krb5
Change-Id: I709cd79894a5a13fc4cdfab2109c86f2230db3b8
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Arshad Hussain [Fri, 1 Sep 2023 04:37:56 +0000 (10:07 +0530)]
LU-17000 misc: Fix Unused Value error(0)
This patch fixes unused value error reported
by coverity run.
CoverityID: 397913 ("Unused value"): lquota_disk.c
CoverityID: 397912 ("Unused value"): libmount_utils_zfs.c
Test-Parameters: trivial fstype=zfs testlist=sanity-quota,conf-sanity,sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I509592ddc9b8fa3e9a6a7dcef4e476ad4fc9d9fc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52216
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Thu, 31 Aug 2023 15:29:37 +0000 (11:29 -0400)]
LU-17065 build: Remove snmp support
Last patched in 2012 and not well tended to before that,
snmp support can probably be removed.
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I1831aed42e560531e57a6ff8aa978f3e5286fd44
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52194
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Fri, 15 Sep 2023 13:34:24 +0000 (09:34 -0400)]
LU-9859 libcfs: move kernel specific code out of libcfs core
Over time kernel version specific code has leaked into the libcfs
core code. Move that code to the linux subdirectory code so in
the future code cleanup is not missed.
Test-Parameters: trivial
Change-Id: I38a00c377334066160083edd3932d4a718198497
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52010
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Fri, 13 Oct 2023 13:59:42 +0000 (08:59 -0500)]
LU-17193 build: fix gcc-12 compiler warnings
Building on el9.2 hit a couple of new errors in configure, ex:
((struct inode_operations *)1)->fileattr_get()
hits:
error: array subscript 0 is outside array bounds
of ‘struct inode_operations[0]’
A few instances of QCTL_COPY() should be QCTL_COPY_NO_PNAME()
as the zero-length array to hold the pool name is not
allocated in these cases.
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I72bda8b46c51dbd42fb42bf569ba29572526acfe
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Sun, 17 Sep 2023 04:34:21 +0000 (23:34 -0500)]
LU-17024 build: Make o2ib defines consistent
Move all O2IB specific configure tests to use HAVE_O2IB_
as the #define prefix.
Making the defines consistent removes some complexity
when cloning the o2ib sources to create in-kernel-o2iblnd
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I82e72a993abc51ad260d880317da32a75929583b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51937
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Timothy Day [Mon, 22 May 2023 14:26:21 +0000 (14:26 +0000)]
LU-930 doc: add lnet man page
Add a simple lnet man page which provides a short description of
lnet and points to all lnet related commands, like lnetctl and lst.
This page will also help users find more lnet related pages using
apropos.
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia3c1be4e60118e00ab9378ef7cd2d53b25196cf2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51085
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Tue, 9 May 2023 05:07:36 +0000 (05:07 +0000)]
LU-8802 obd: remove MAX_OBD_DEVICES
Remove this arbitrary limit by reimplementing the array as an
Xarray. An Xarray can grow and shrink dynamically, hence saving
memory and allowing for many more OBD devices. There is still
technically a limit, OBD_MAX_INDEX, which is xa_limit_31b.max
or around 2 billion. This is far more than is practically
useful.
This patch also adds various iterators for OBD devices, which
are used to simplify code in various places.
Remove class_obd_list() since it is unused. Rename
class_dev_by_str() to class_str2obd() to keep the pattern.
Several class_* functions have been refactored to improve
locking. The larger issue of OBD device locking will be
addressed separately.
Update the OBD device lifecycle test to try loading
more devices (about 24,000 for now).
Currently, adding an additional OBD device is an O(n^2)
operation due to the class_name2dev calls in
class_register_device(). This will be addressed in a future
patch adding a hash table for OBD device name lookups.
Further, OBD life cycle management could likely be simplified
by using Xarray marks. Right now, it is handled by a bit
field in the obd_device struct. Since the scope of the changes
needed to simplify this seem large, this will also be addressed
separately.
Test-Parameters: testlist=sanity env=ONLY=55,ONLY_REPEAT=10
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Icb2cd94a5529e79f5d3ebd0de5e0f225cf212075
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lei Feng [Thu, 18 Nov 2021 00:34:46 +0000 (08:34 +0800)]
LU-15246 ptlrpc: per-device adaptive timeout parameters
When a client is mounting multiple filesystems with different
MGSes setting global parameters at_min, at_max, etc., then the
settings from one filesystem's MGS config will also apply to RPCs
sent for the OSC, MDC, and MGC devices on the other filesystem(s).
Typically the settings of the last filesystem to mount on the client
override the earlier values, and there is no way to separate them.
Moving the parameters to be per-device values allows them to be
set independently for each set of client devices, so that the
client can interact properly with each set of servers. This allows
e.g. different timeouts for local and remote mounts, or for flash
and HDD filesystems that have different load and performance.
Add per-device adaptive timeout parameters that can optionally
replace global parameters of the same name:
at_min -> *.<fsname>*.at_min
at_max -> *.<fsname>*.at_max
at_history -> *.<fsname>*.at_history
ldlm_enqueue_min -> *.<fsname>*.ldlm_enqueue_min
These parameters should always be set with fsname in the device
name, rather than pure wildcard '*' device names, or they will
behave the same as the global parameters in the end (settings from one
MGS will apply to devices on other filesystems). That is a bug
in how "lctl set_param -P" works, but will be fixed separately.
Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I5b04c9aa53a446fb5a78bfaff372b4f236c9eb8a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45598
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Vladimir Saveliev [Mon, 11 Sep 2023 19:32:00 +0000 (22:32 +0300)]
LU-14708 ptlrpc: skip unnecessary client eviction
A server does not synchronously update the last_rcvd file when new
clients connect. If the server fails over before the last_rcvd update is
committed, a recently connected client may find itself evicted
unexpectedly.
If a client has not cached any data from a server and has not
performed any modifying RPCs to the server, let the client connect
as a new one instead of considering itself evicted.
A test to illustrate the issue is included.
Fixes: dcc8b9c00d5 "LU-9679 ptlrpc: list_for_each improvements"
Change-Id: I0c2d9c3b67cbc69c3283422f1f581b42f7f13a1a
HPE-bug-id: LUS-7141
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43834
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Ben Evans [Mon, 20 May 2019 12:46:10 +0000 (08:46 -0400)]
LU-12316 llog: convert llog_client defines to functions
Convert preprocessor functions into actual functions. The
original commit that added these macros (81c713adeb) did
not provide an explanation for why these were chosen to
be macros rather than functions. Hence, bias towards using
functions rather than complex macros since they appear
in stack traces, are easier to debug, and simpler to
develop with.
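
A generic illustration of the macro versus function trade-off follows. The names are hypothetical and unrelated to the llog_client code; the sketch also shows double evaluation of arguments, a further hazard of function-like macros that the commit message does not mention.

```c
#include <assert.h>

/* Hypothetical macro in the style being replaced: evaluates x twice. */
#define DEMO_SQUARE_MACRO(x) ((x) * (x))

/*
 * Function form: the argument is evaluated once, its type is checked,
 * and the call can appear as a frame in a stack trace when not inlined.
 */
static int demo_square(int x)
{
	return x * x;
}

static int demo_calls; /* counts how often the argument was evaluated */

static int demo_next(void)
{
	demo_calls++;
	return 3;
}

/* Returns how many times demo_next() ran for one use of the macro. */
static int demo_macro_evaluations(void)
{
	int r;

	demo_calls = 0;
	r = DEMO_SQUARE_MACRO(demo_next());
	assert(r == 9);
	return demo_calls;
}

/* Returns how many times demo_next() ran for one function call. */
static int demo_function_evaluations(void)
{
	int r;

	demo_calls = 0;
	r = demo_square(demo_next());
	assert(r == 9);
	return demo_calls;
}
```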
Signed-off-by: Ben Evans <bevans@cray.com>
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7a73e9af57dcf5edb9d013272ff5cf85e4234146
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/34904
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Ryan Haasken [Tue, 3 May 2016 19:49:57 +0000 (15:49 -0400)]
LU-5134 utils: Add parallel option to lctl set_param
Add a "-t" option to lctl set_param to enable setting multiple matched
parameters in parallel. When called with "-t", lctl will set up a work
queue of matched file names and spawn a fixed number of threads per
CPU. Each thread will pull items off the work queue, write to the file
associated with each work item, and return when there are no more
items on the work queue.
A field called po_parallel_threads is added to struct param_opts to
indicate the number of threads set_param should run in parallel. If in
parallel, jt_lcfg_setparam initializes a work queue and passes it to
do_param_op, which adds each matched item to the work queue. Once
jt_lcfg_setparam has called do_param_op for each param-value pair, it
passes the work queue to sp_run_threads, which creates threads, each
of which calls write_param to set the parameter. If not in parallel,
jt_lcfg_setparam does not pass a work queue to do_param_op, and
do_param_op directly calls write_param on each matched param.
param_display was renamed to do_param_op to more accurately reflect
what it does.
If lctl is compiled without pthread support, "lctl set_param" will
still accept the "-t" option, but it will print a warning message, and
it will set the parameters in series.
The new "-t" option to set_param was documented in the lctl usage and
in the man page.
HPE-bug-id: LUS-2592
Signed-off-by: Ryan Haasken <haasken@cray.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3f96a6f06c50d4ba2ce97050c35f46b976dfc005
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/10555
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Wed, 20 Sep 2023 06:43:14 +0000 (12:13 +0530)]
LU-6142 checkpatch: Add check for %zd
CWARN() or CERROR() messages require a certain format: they
should start with '%s:' and end with '%d' or '%ld',
otherwise checkpatch emits a warning. This patch extends
the check to accept '%zd' as well, since a few return types
could be of size_t.
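
The rule can be illustrated with a small format-string check. checkpatch itself is a Perl script, so this C version is only a sketch of the rule as described here; demo_format_ok and demo_ends_with are hypothetical, and ignoring one trailing newline is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the first len bytes of s end with suffix. */
static bool demo_ends_with(const char *s, size_t len, const char *suffix)
{
	size_t n = strlen(suffix);

	return len >= n && memcmp(s + len - n, suffix, n) == 0;
}

/* Hypothetical check: "%s:" prefix and a %d, %ld, or %zd at the end. */
static bool demo_format_ok(const char *fmt)
{
	size_t len;

	assert(fmt != NULL);
	len = strlen(fmt);
	if (len > 0 && fmt[len - 1] == '\n')
		len--; /* assumed: one trailing newline is ignored */
	if (strncmp(fmt, "%s:", 3) != 0)
		return false;
	return demo_ends_with(fmt, len, "%d") ||
	       demo_ends_with(fmt, len, "%ld") ||
	       demo_ends_with(fmt, len, "%zd");
}
```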
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ida4f41af6b8fda0b028daeea18231b2d01905c31
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52421
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Sat, 16 Sep 2023 05:54:54 +0000 (00:54 -0500)]
LU-17062 lnet: Update lnet_peer_*_decref_locked usage
Move decref's to occur after last reference to prevent
use after free.
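
The ordering rule can be sketched with a plain refcounted object. struct demo_peer and its helpers are hypothetical userspace stand-ins, not the lnet_peer structures or the *_decref_locked helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object, not the real lnet peer structures. */
struct demo_peer {
	int refcount;
	int health;
};

static struct demo_peer *demo_peer_alloc(int health)
{
	struct demo_peer *p = malloc(sizeof(*p));

	if (p != NULL) {
		p->refcount = 1;
		p->health = health;
	}
	return p;
}

/* Drops one reference; frees the object when the last one goes away. */
static void demo_peer_decref(struct demo_peer *p)
{
	assert(p != NULL && p->refcount > 0);
	if (--p->refcount == 0)
		free(p);
}

/* Reads everything it needs first, and only then drops the reference. */
static int demo_peer_health_and_put(struct demo_peer *p)
{
	int health = p->health; /* last use of p */

	demo_peer_decref(p);    /* p may be freed here; do not touch it again */
	return health;
}
```

Reading p->health after the decref would be the use-after-free pattern the patch removes.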
HPE-bug-id: LUS-11799
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2382ece560039383f644b6aee73a9481d6bb5673
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52184
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexander Boyko [Thu, 17 Aug 2023 10:03:07 +0000 (06:03 -0400)]
LU-17040 scrub: inconsistent item
When the OI does not include the fid, scrub will attempt to
fix it with a zero inode number. There is a
low chance that the fid would be found during a full inode
scan, but the inode scan requires an empty inconsistent
list. With repeated EINPROGRESS replies, the inconsistent list is
never empty.
Move fids with zero inode numbers to the stale list.
1 scrub fix to print real OI resurrect and
skip unrelated ones
2 out_handle debug for a failed dt_locate() of a fid
3 debug for out requests when they were interrupted
HPE-bug-id: LUS-10780
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iad9e9cba90b4648eb0fe8fa6c99984ada60fde70
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51997
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexey Lyashkov [Wed, 12 Jul 2023 14:37:26 +0000 (17:37 +0300)]
LU-16952 debug: use right io object to check
cp_owner is io_top(io), not the io itself.
Fix the check to make the invariants happy.
HPE-bug-id: LUS-11707
Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ib473e8616e2c339fd54cd96af7933e437e8f5869
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51676
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexey Lyashkov [Wed, 12 Jul 2023 14:35:16 +0000 (17:35 +0300)]
LU-16952 debug: don't put extra new line at output
'\n' is already in the output line.
HPE-bug-id: LUS-11707
Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I30667b5fe269aa8b80d5b2101d397f34b75f8e30
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51675
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Mon, 7 Aug 2023 13:38:51 +0000 (09:38 -0400)]
LU-10391 lnet: migrate peer NI control to Netlink
Move peer creation and deletion to the Netlink API. This change
enables the creation of peers with large NID addresses.
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I7f2f75e73e3f39856751f65e240f2172f703d0bc
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49574
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Sat, 7 Oct 2023 06:21:30 +0000 (02:21 -0400)]
LU-9680 lnet: collect data about peer_ni by using Netlink
Migrate the LNet peer NI API to use the Netlink API for
the case of collecting data on peer. This change also
allows large NID support for IPv6. Since this doesn't
cover creation and deletion of peers we can't setup
large NID peers just yet.
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Iefa3f566255e768047b0f9ff21d64bc74634f284
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49516
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Thu, 5 Oct 2023 13:52:54 +0000 (09:52 -0400)]
LU-10391 lnet: migrate full LNet NI information collection
Fill in the rest of the LNet NI state to report to user land. This
mostly covers the stats information of the NI and LND specific
tunables. To handle the LND specific tunables we need to reorder
the code to send an updated key table. With the additional
information I found status wasn't properly set and the nesting
wasn't properly set for multiple NIs per NET. This is now fixed.
Test-Parameters: trivial testlist=sanity-lnet
Fixes:
8f8f6e2f36e ("LU-10003 lnet: use Netlink to support old and new NI APIs")
Change-Id: I32b06b1ce8cb049a33f45f2310d31897ffa7dc90
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50255
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Wed, 13 Sep 2023 16:24:12 +0000 (12:24 -0400)]
LU-6174 lov: use standard Linux 64 division macros
For the case of 64 bit platforms lov_do_div64() is basically the
same as div_u64_rem(). For the 32 bit platform case lustre tries
to work around the issue of the divisor being larger than 32 bits.
We can use div64_u64_rem() for 32 bit platforms too, since
Linux supports it there as well.
With the move to Linux div64 operations we need to migrate the
64 bit size fields. To pick the correct div64 function
we need to select the proper signedness. The lov layer often uses
loff_t, which is signed, for stripe sizes, which doesn't make sense
since sizes should always be positive. Creating a negative size by
mistake will give a bizarre offset into a file. Avoid any potential
signed vs unsigned issues by replacing the loff_t use
in this case with u64.
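As a rough userspace illustration of the unsigned quotient/remainder
split, assuming a stand-in with the same shape as the kernel's
div64_u64_rem(). The names demo_div64_u64_rem() and demo_stripe_number()
are hypothetical and do not appear in the lov code.

```c
#include <stdint.h>

/* Userspace stand-in shaped like the kernel's div64_u64_rem(). */
static uint64_t demo_div64_u64_rem(uint64_t dividend, uint64_t divisor,
				   uint64_t *remainder)
{
	*remainder = dividend % divisor;
	return dividend / divisor;
}

/*
 * Hypothetical helper: split a file offset into a stripe number and an
 * offset inside that stripe. Both values are unsigned 64 bit, so a
 * negative size cannot sneak in and the divisor may exceed 32 bits.
 */
static uint64_t demo_stripe_number(uint64_t file_off, uint64_t stripe_size,
				   uint64_t *stripe_off)
{
	return demo_div64_u64_rem(file_off, stripe_size, stripe_off);
}
```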
Test-Parameters: testlist=large-lun env=ONLY=5,REFORMAT=yes
Change-Id: Ieadc266d43d6be1d6d47ee14ba9ac0dab01e7d86
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39343
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Tue, 19 Sep 2023 18:12:22 +0000 (12:12 -0600)]
LU-15671 llite: cleanup code style in xattr.c
Clean up code style in llite/xattr.c.
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I511d12daefdb509adbee65aaba9dea911ebaa18a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52418
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Wed, 11 Oct 2023 20:27:04 +0000 (23:27 +0300)]
LU-17187 ldiskfs: use ext4_fsblk_t for block pointers
instead of ext4_lblk_t which is 32bit while actual block
pointers can be larger than 2^32.
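The truncation can be reproduced with plain integer types. The typedefs
below are hypothetical stand-ins chosen to mirror the widths named above
(32 bit for ext4_lblk_t, 64 bit for ext4_fsblk_t); this is a sketch, not
ldiskfs code.

```c
#include <stdint.h>

typedef uint32_t demo_lblk_t;	/* 32 bit, like a logical block number */
typedef uint64_t demo_fsblk_t;	/* 64 bit, like a physical block number */

/* Buggy variant: the block pointer is squeezed through a 32 bit type. */
static demo_fsblk_t demo_store_truncated(demo_fsblk_t pblk)
{
	demo_lblk_t tmp = (demo_lblk_t)pblk;	/* high bits are lost here */

	return tmp;
}

/* Fixed variant: the value stays in a 64 bit type end to end. */
static demo_fsblk_t demo_store_full(demo_fsblk_t pblk)
{
	demo_fsblk_t tmp = pblk;

	return tmp;
}
```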
Fixes:
0f7e6c02a9 ("LU-16843 ldiskfs: merge extent blocks")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I201cfa5cb04907eef05bc87abc5701e8aed39d62
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52633
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Etienne AUJAMES [Thu, 14 Jan 2021 16:58:40 +0000 (17:58 +0100)]
LU-14027 tests: Fix test_135 of replay-single
The test_135 ("Server failure in lock replay phase") of replay-single
uses an incorrect method to get the pid of a background process.
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Fixes:
7ca495ec67 ("LU-14027 ldlm: Do not hang if recovery restarted during lock replay")
Test-Parameters: trivial
Test-Parameters: testlist=replay-single env=ONLY=135,ONLY_REPEAT=50
Test-Parameters: testlist=replay-single env=ONLY_REPEAT=20,ONLY=135,FAILURE_MODE=HARD clientcount=4 mdtcount=1 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 testlist=replay-single
Change-Id: I6ed41d75f4cbba796e39288bad8895ee1c24459f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41227
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Tue, 26 Sep 2023 16:15:38 +0000 (19:15 +0300)]
LU-17076 nrs: wait for RCU completion
before we destroy the slab holding the objects scheduled
for a release via RCU.
BUG: unable to handle kernel paging request at
000000046474e5c6
Oops: 0000 [#1] SMP
CPU: 0 PID: 9 Comm: ksoftirqd/0 4.18.0 #3
RIP: 0010:kmem_cache_free+0xe3/0x170
Call Trace:
rcu_core+0x27a/0x770
__do_softirq+0xc2/0x44d
run_ksoftirqd+0x35/0x50
smpboot_thread_fn+0xb8/0x160
kthread+0x14a/0x170
Fixes:
42bf5f78ba ("LU-8130 nrs: convert NRS ORR/TRR to rhashtable")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia4b1cf6f190f17c3b85548fcb6876be72822cdd2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52515
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Sebastien Buisson [Fri, 22 Sep 2023 15:48:51 +0000 (17:48 +0200)]
LU-17015 gss: bump token buffer size to 16KiB
A 4 KiB large buffer is not enough to hold the GSS token under some
circumstances. So bump GSS_CTX_INIT_MAX_LEN value to 16 KiB.
Fixes:
9758129177 ("LU-17015 gss: support large kerberos token on client")
Test-Parameters: trivial kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8e72f1447593d2bf2ae537fcc920ceee20e93c09
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52475
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Fri, 22 Sep 2023 09:20:22 +0000 (11:20 +0200)]
LU-12896 gss: key can be unlinked when timeout expires
The key associated with a GSS context could appear to be already
unlinked when the upcall timeout expires. In this case, do not assert
but report this case with a warning message.
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I714af3a1ce54648c4ba29ef13015f9291de52765
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52473
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Tue, 26 Sep 2023 19:45:07 +0000 (12:45 -0700)]
LU-16218 utils: add component flags "prefrd" and "prefwr"
The initial implementation of "lfs setstripe ... --comp-flags=prefer"
only allowed specifying a single "prefer" argument for a given
mirror component, which would set both the "LCME_FL_PREF_RD" and
"LCME_FL_PREF_WR" flags at the same time.
This patch adds the separated component flags "prefrd" and "prefwr"
to allow setting the individual flags on a component.
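A small sketch of how the three names could map onto two independent
bits. The DEMO_PREF_* values and demo_comp_flag() are hypothetical; the
real LCME_FL_PREF_RD and LCME_FL_PREF_WR values and the lfs parser live in
the Lustre sources and may differ.

```c
#include <string.h>

/* Illustrative bit values only, not the real LCME_FL_* definitions. */
#define DEMO_PREF_RD 0x1u
#define DEMO_PREF_WR 0x2u

/* Map a component flag name to its bits; returns 0 for an unknown name. */
static unsigned int demo_comp_flag(const char *name)
{
	if (strcmp(name, "prefer") == 0)
		return DEMO_PREF_RD | DEMO_PREF_WR;	/* both at once */
	if (strcmp(name, "prefrd") == 0)
		return DEMO_PREF_RD;
	if (strcmp(name, "prefwr") == 0)
		return DEMO_PREF_WR;
	return 0;
}
```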
Test-Parameters: trivial testlist=sanity-flr
Change-Id: I3e413cb37fab7ab2834946536705ce61a3feeed4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52508
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Fri, 22 Sep 2023 13:01:56 +0000 (16:01 +0300)]
LU-17136 ldiskfs: increase max extent tree depth
This is a workaround until LU-16843 is ready.
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5829c10888bf32649fe7a7a72c8ee697647a89cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52474
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Tue, 19 Sep 2023 07:03:20 +0000 (09:03 +0200)]
LU-17129 tests: cleanup fileset info on nodemaps
In sanity-sec, fileset info added to nodemaps via 'set_param -P' must
be removed afterwards with 'set_param -P -d', otherwise those commands
will remain in the llogs.
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity-sec env=ONLY=27
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I75bd0dc263f71c7f5d9ece028cc038eb1f2ca9a4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52408
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Wed, 20 Sep 2023 18:45:43 +0000 (11:45 -0700)]
LU-17109 kernel: new kernel [SLES15 SP5 5.14.21-150500.55.22.1]
This patch makes changes to support new SLES15 SP5 release
with kernel 5.14.21-150500.55.22.1 for Lustre client.
Test-Parameters: trivial clientdistro=sles15sp5 testlist=sanity
Test-Parameters: trivial clientdistro=sles15sp4 testlist=sanity
Change-Id: I278017a5c996a8cf4e3d604aa928e968ca007312
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52340
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Mon, 4 Sep 2023 17:30:26 +0000 (20:30 +0300)]
LU-17084 lod: fix comparison in lod_striping_load()
In if (rc > sizeof(struct lmv_foreign_md)) the latter
is unsigned, so gcc treats rc (which is defined as int
and can be negative to encode an error) as unsigned.
This way -EIO becomes greater than the size of the
structure. Make sizeof() signed to avoid confusion.
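The effect is easy to reproduce outside the kernel. In this sketch
struct demo_md is a hypothetical stand-in for struct lmv_foreign_md, and
the explicit (size_t) cast spells out the conversion the compiler applies
implicitly in the original comparison.

```c
#include <stddef.h>

/* Hypothetical stand-in for the structure whose size is compared. */
struct demo_md {
	char payload[32];
};

/*
 * Buggy comparison. The cast shows what happens implicitly in
 * "rc > sizeof(...)": rc is converted to an unsigned type, so a
 * negative error code turns into a huge positive number.
 */
static int demo_is_large_buggy(int rc)
{
	return (size_t)rc > sizeof(struct demo_md);
}

/* Fixed comparison: make the size signed so a negative rc stays negative. */
static int demo_is_large_fixed(int rc)
{
	return rc > (int)sizeof(struct demo_md);
}
```

With an error code such as -5 the buggy variant reports "larger than the
structure", while the fixed variant does not.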
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie6735578649e397ed05b6951fab941f97051305b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52265
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Arshad Hussain [Tue, 29 Aug 2023 10:09:48 +0000 (15:39 +0530)]
LU-16796 target: Change struct barrier_instance to use kref
This patch changes struct barrier_instance to use
kref (refcount_t) instead of atomic_t.
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I2d2312b64d873e58bbef449c0867c679feb0c31b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Fri, 23 Jun 2023 20:31:30 +0000 (20:31 +0000)]
LU-8191 libcfs: convert functions to static, removed function
Static analysis shows that a number of functions
could be made static. This patch declares several
functions in libcfs static.
Remove cfs_expr_list_values_free in string.c, since
it is not used.
A header was missing in param.c, causing a number of
functions to be missing declarations.
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia580881efa806bde49d532e5b2d8f5097f2294e0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51428
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Fri, 15 Sep 2023 18:38:01 +0000 (13:38 -0500)]
LU-16962 build: cleanup configure messages
Take advantage of LB2_MSG_LINUX_TEST_RESULT to cleanup
the remaining configure checks.
Re-order checking of OpenSSL support so checking
message and result are not split.
Test-Parameters: trivial
HPE-bug-id: LUS-11709
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2880e2b50f4cc79106201c241fe7c078e5d8c37e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51857
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Li Dongyang [Thu, 20 Jul 2023 13:12:19 +0000 (23:12 +1000)]
LU-15002 utils: disable meta_bg and enable packed_meta_blocks
To take advantage of the upcoming e2fsprogs changes,
disable meta_bg to allow group descriptor blocks to be allocated
beyond group#0. This allows the group descriptor blocks
to be packed and placed at the beginning, rather than
distributed across the entire file system with meta_bg.
Enable packed_meta_blocks to place bitmaps, inode table
and journal at the beginning of the file system.
Due to LU-16971, use 200 blocks when testing mke2fs options.
Re-enable lazy_journal_init, which got turned off accidentally.
Fix up the handling of extended options: there should be spaces
before and after -E when building the mke2fs cmd.
Test-Parameters: trivial testlist=conf-sanity
Change-Id: If4385c1d3740bf7a47bd12f90c2c93046bc8bead
Fixes:
701cc24959 ("LU-13533 utils: ext4lazyinit should be disabled")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51723
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Mon, 4 Sep 2023 12:34:44 +0000 (15:34 +0300)]
LU-16966 osd: take trunc_lock for fallocate
As fallocate may need a few transactions (or a transaction restart),
we have to avoid any concurrent writes/truncates on this object
until fallocate supports 'restart-from-beginning': first stop the
transaction, then release the lock, then repeat again (like
the write path does).
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0bf38b1886fbf24656b45fe0f87fcbad2227672a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Tue, 19 Sep 2023 15:09:54 +0000 (10:09 -0500)]
LU-17131 ldiskfs: el8.1 use rhel8/ext4-enc-flag.patch
Correctly specify the ext4-enc-flag.patch for el8.1
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ice1eca4f7092de7d65e5c1e5338dba6cc6e8f4ec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52411
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Mon, 18 Sep 2023 16:55:37 +0000 (09:55 -0700)]
LU-17128 build: fix lnet.service missing issue on Ubuntu 22.04
The lnet.service file was in the lustre-client-utils package
built on Ubuntu 20.04, but it was missing from the package
built on Ubuntu 22.04.
This is caused by the dpkg-buildpackage change introduced by
dpkg version 1.21.1ubuntu2.1 installed by default on Ubuntu 22.04.
To fix this issue, we need to specify build profiles explicitly
to dpkg-buildpackage via -P|--build-profiles option instead of
just setting the environment variable DEB_BUILD_PROFILES.
Test-Parameters: trivial clientdistro=ubuntu2204
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I9975ef357f0aba722c56d27eaa9b2cfbccc9c524
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52404
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Shaun Tancheff [Mon, 18 Sep 2023 10:42:24 +0000 (05:42 -0500)]
LU-17124 llite: Write and wait on FIEMAP_FLAG_SYNC
fiemap FIEMAP_FLAG_SYNC flag expects filemap_write_and_wait()
HPE-bug-id: LUS-11854
Fixes:
c16ecc8600 ("LU-5823 clio: add cl_object_fiemap()")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I40aa8d423164cf460fda5c11093a5f7b25682a96
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Thu, 14 Sep 2023 20:59:58 +0000 (14:59 -0600)]
LU-17050 tests: remove sanity-sec IDENTITY_UPCALL
In sanity-sec.sh there was an IDENTITY_UPCALL variable that
conflicted with an identically-named global variable in
test-parameters. Due to new checks by Gerrit Janitor, this
was causing any patch that ran sanity-sec.sh to log a warning.
Remove the parameter from sanity-sec.sh as it is unused.
Update code style in functions updating identity_upcall.
Test-Parameters: trivial testlist=sanity-sec mdscount=2 mdtcount=4
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2f2ace33cf01153d16f4f25038065d33443ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52400
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexander Boyko [Sun, 20 Aug 2023 23:58:35 +0000 (19:58 -0400)]
LU-16830 lod: accurate OSTs iteration for the last speed
The first fix removed lqr->lqr_start_idx from calculation
of ost_idx for the last speed loop. It was not enough
because lqr->lqr_offset_idx is used in the calculation and
could also be changed by another thread, which leads
to the same OST index being used during a loop.
1694009994: lod_ost_alloc_rr()) #0 strt 2166 act 0 strp 0 ary 0 idx 0
1694010094: lod_ost_alloc_rr()) #1 strt 2167 act 0 strp 0 ary 0 idx 0
1694010197: lod_ost_alloc_rr()) rc -28
Fixes:
cacdaa9251 ("LU-16830 lod: improve rr allocation")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Idf4b528dfbf3995cd03580a4214aff9206a52378
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52393
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Fri, 15 Sep 2023 05:54:46 +0000 (11:24 +0530)]
LU-17000 coverity: Fix Resource Leak(1)
This patch fixes Resource leak error reported
by coverity run.
CoverityID: 397493 ("Resource Leak"): liblustreapi_mirror.c
CoverityID: 397494 ("Resource Leak"): lustre_rsync.c
Test-Parameters: trivial fstype=zfs testlist=sanity,sanityn,conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I10e9fd0945cf13824c25faa62b4310796b09bade
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52384
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Wed, 13 Sep 2023 07:24:35 +0000 (00:24 -0700)]
LU-17111 kernel: update RHEL 9.2 [5.14.0-284.30.1.el9_2]
Update RHEL 9.2 kernel to 5.14.0-284.30.1.el9_2.
Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el9.2 serverdistro=el9.2 testlist=sanity
Test-Parameters: trivial fstype=zfs \
clientdistro=el9.2 serverdistro=el9.2 testlist=sanity
Change-Id: Id80dbba6b4434a83cf925d6961d727941274edf4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52358
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Wed, 13 Sep 2023 05:12:18 +0000 (23:12 -0600)]
LU-17010 lfsck: don't dump stack repeatedly
If there are transactions started with LFSCK in dry-run mode, don't
dump the stack repeatedly, as this can spam the console logs and
significantly hurt performance.
Test-Parameters: trivial testlist=sanity-lfsck
Fixes:
0c1ae1cb9c ("LU-13124 scrub: check for multiple linked file")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0b0d64911453dc8ab947e284656311b5d0300c1e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52356
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Etienne AUJAMES [Tue, 12 Sep 2023 16:06:25 +0000 (18:06 +0200)]
LU-17110 llite: fix slab corruption with fm_extent_count=0
If userspace uses fiemap with .fm_extent_count=0, .fm_extents[0] is
not allocated. Writing on the first entry without checking the extent
count could lead to memory corruption (slab).
This patch also fixes the case when the osc is disabled: FIEMAP_EXTENT_LAST
should be set on the extent (fe_flags) and not on the fiemap struct.
Add a regression test sanityn 71d to test fiemap with
fm_extent_count=0.
Add a regression test sanity-hsm 408 to test fiemap on released files.
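The fm_extent_count=0 guard described above can be sketched as follows.
The demo_* structures are simplified, hypothetical stand-ins for struct
fiemap and struct fiemap_extent (a pointer replaces the flexible array),
and demo_fill_one() is not the actual llite/lov code.

```c
#include <stdint.h>

/* Simplified stand-ins for struct fiemap_extent / struct fiemap. */
struct demo_extent {
	uint64_t logical;
	uint32_t flags;
};

struct demo_fiemap {
	uint32_t mapped_extents;	/* out: number of extents found */
	uint32_t extent_count;		/* in: capacity of extents[] */
	struct demo_extent *extents;	/* unset when extent_count == 0 */
};

#define DEMO_EXTENT_LAST 0x1u

/*
 * Report a single extent. With extent_count == 0 the caller only wants
 * the count, so extents[0] does not exist and must not be written.
 * The "last" marker goes on the extent, not on the fiemap itself.
 */
static void demo_fill_one(struct demo_fiemap *fm, uint64_t logical)
{
	fm->mapped_extents = 1;
	if (fm->extent_count == 0)
		return;

	fm->extents[0].logical = logical;
	fm->extents[0].flags = DEMO_EXTENT_LAST;
}
```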
Fixes: 4097196 ("LU-11848 lov: FIEMAP support for PFL and FLR file")
Test-Parameters: testlist=sanityn
Test-Parameters: testlist=sanityn env=ONLY=71d,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm env=ONLY=408,ONLY_REPEAT=20
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: Id63c6973540187e678020977f2d555dfcbf3c634
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52352
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Wed, 13 Sep 2023 18:00:53 +0000 (11:00 -0700)]
LU-17095 build: avoid modules.order nonexistence failure
The modules.order file is a temporary output file generated by
kbuild while running the "make" command. Sometimes, there is
a race condition that causes the file not to be created and makes
the make command fail as follows:
cat: ...//modules.order: No such file or directory
This patch creates an empty modules.order file to avoid
the error.
Test-Parameters: trivial
Change-Id: If779a727731f18e9409c35c0cd0deddd79559d3a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52323
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Etienne AUJAMES [Wed, 6 Sep 2023 12:24:47 +0000 (14:24 +0200)]
LU-17076 orr: take a ref after lookup_get_insert_fast()
We need to take a reference on all existing objects returned by
nrs_orr_res_get(). Otherwise, the object could be freed before
nrs_orr_res_put().
Therefore, a ref should be taken if
rhashtable_lookup_get_insert_fast() returns an existing object.
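A userspace sketch of the rule, assuming a lookup-or-insert call that
returns NULL when the new object was inserted and the existing object
otherwise. The one-slot table and every demo_* name are hypothetical;
this is not the NRS ORR code or the rhashtable API.

```c
#include <stddef.h>

struct demo_obj {
	int key;
	int refs;
};

/* One-slot "table"; a different key simply replaces the slot. */
static struct demo_obj *demo_slot;

/*
 * Mimics a lookup-or-insert contract: returns NULL when @obj was
 * inserted, or the object already stored under the same key.
 */
static struct demo_obj *demo_lookup_get_insert(struct demo_obj *obj)
{
	if (demo_slot && demo_slot->key == obj->key)
		return demo_slot;
	demo_slot = obj;
	return NULL;
}

/* Every object handed back to the caller carries its own reference. */
static struct demo_obj *demo_res_get(struct demo_obj *fresh)
{
	struct demo_obj *old = demo_lookup_get_insert(fresh);

	if (old) {
		old->refs++;	/* existing object: take a reference */
		return old;
	}
	fresh->refs = 1;	/* newly inserted object starts at one */
	return fresh;
}
```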
This avoids the following use-after-free in sanityn test_77c:
ptlrpc_nrs_req_stop_nolock+0x3b/0x150
ptlrpc_server_finish_active_request+0x2b/0x140 [ptlrpc]
ptlrpc_server_handle_request+0x40e/0xc00 [ptlrpc]
ptlrpc_main+0xc8c/0x1680 [ptlrpc]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21
Fixes: 42bf5f7 ("LU-8130 nrs: convert NRS ORR/TRR to rhashtable")
Test-Parameters: fstype=zfs testlist=sanityn env=ONLY=77c,ONLY_REPEAT=100
Test-Parameters: fstype=zfs testlist=sanityn env=ONLY=77c,ONLY_REPEAT=100
Test-Parameters: fstype=zfs testlist=sanityn env=ONLY=77c,ONLY_REPEAT=100
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I267287a1075ee91019d3a4492b57f272a4b0cadd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52295
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jeya ganesh babu Jegatheesan [Thu, 31 Aug 2023 22:29:07 +0000 (15:29 -0700)]
LU-17089 mdd: fix for bi_writers ref counter in case of error
If mdd_child_ops() called from mdd_trans_create() returns an error,
we don't call barrier_exit() to decrement bi_writers. Call
barrier_exit() in case of an error returned from mdd_child_ops().
This patch also adds more CDEBUG messages in lustre/target/barrier.c.
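The counter balance can be sketched like this. All demo_* names are
hypothetical, and the assumption that the success path drops the counter
later, when the transaction is stopped, is made for illustration rather
than taken from the mdd code.

```c
#include <errno.h>

/* Hypothetical writer counter, standing in for bi_writers. */
static int demo_writers;

static void demo_barrier_entry(void) { demo_writers++; }
static void demo_barrier_exit(void)  { demo_writers--; }

/* Stand-in for the lower-layer call that may fail. */
static int demo_child_op(int fail)
{
	return fail ? -EIO : 0;
}

/*
 * On success the counter stays raised until demo_trans_stop(). On
 * failure nobody will reach the stop path, so the counter has to be
 * dropped right here or it leaks.
 */
static int demo_trans_create(int fail)
{
	int rc;

	demo_barrier_entry();
	rc = demo_child_op(fail);
	if (rc < 0)
		demo_barrier_exit();
	return rc;
}

static void demo_trans_stop(void)
{
	demo_barrier_exit();
}
```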
Signed-off-by: Jeya ganesh babu J <jeyaga@amazon.com>
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia430df404b700167cb9207eb13ac938575a2030a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Tue, 5 Sep 2023 10:32:15 +0000 (16:02 +0530)]
LU-17000 coverity: Fix Resource Leak(0)
This patch fixes Resource leak error reported
by coverity run.
CoverityID: 339696 ("Resource Leak"): liblustreapi_layout.c
CoverityID: 397918 ("Resource Leak"): lsnapshot.c
CoverityID: 397894 ("Resource Leak"): obd.c
CoverityID: 397851 ("Resource Leak"): lfs.c
CoverityID: 397832 ("Resource Leak"): liblustreapi.c
CoverityID: 397772 ("Resource Leak"): liblustreapi_utils.c
CoverityID: 397721 ("Resource Leak"): obd.c
Test-Parameters: trivial fstype=zfs testlist=sanity,conf-sanity,sanity-lsnapshot
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5c0152014f987264df17fac78390a2afc12c9255
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52272
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Mon, 4 Sep 2023 12:45:34 +0000 (08:45 -0400)]
LU-17087 lmv: update stale tgt statfs every 1 hour
Some tgt statfs may not be initialized upon mount due to network
issues. If the filesystem is imbalanced, these tgts won't be chosen to
create directories because their bavail and ffree are 0.
If the MDT is chosen by QoS, update any tgt statfs that is one hour
overdue; otherwise check and update the statfs of the tgt that is chosen.
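The age check might look roughly like the sketch below. The struct, the
constant name, and the convention that a timestamp of 0 means "never
refreshed" are assumptions for illustration, not the lmv implementation.

```c
#include <stdint.h>

#define DEMO_STATFS_MAX_AGE 3600	/* seconds, i.e. one hour */

/* Hypothetical per-target record. */
struct demo_tgt {
	int64_t statfs_time;	/* last refresh in seconds, 0 = never */
};

/* A target needs a refresh if it was never updated or is over an hour old. */
static int demo_statfs_is_stale(const struct demo_tgt *tgt, int64_t now)
{
	return tgt->statfs_time == 0 ||
	       now - tgt->statfs_time > DEMO_STATFS_MAX_AGE;
}
```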
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I06af8b8bd342f66cb794471df3ee0f3b127ffe05
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52270
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Qian Yingjin [Wed, 16 Aug 2023 04:02:22 +0000 (00:02 -0400)]
LU-16954 llite: add SB_I_CGROUPWB on super block for cgroup
Cgroup support can be enabled per super_block by setting
SB_I_CGROUPWB in ->s_iflags.
Cgroup writeback requires support from both the bdi and
filesystem.
This patch adds SB_I_CGROUPWB flag on super block for Lustre.
This is required by the subsequent patch series to support
cgroup in Lustre.
Adding this flag to the Lustre super block causes a remount
failure in Maloo testing on the Ubuntu 22.04 v5.15 kernel due to
a duplicate filename (sysfs) for the bdi device.
To avoid remount failure, we explicitly unregister the sysfs for
the @bdi.
Test-Parameters: clientdistro=ubuntu2204 testlist=sanity-sec
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I7fff4f26aa1bfdb0e5de0c4bdbff44ed74d18c2d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51955
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
James Simmons [Tue, 19 Sep 2023 13:40:24 +0000 (09:40 -0400)]
LU-13308 mdc: support additional flags for OBD_IOC_CHLG_POLL ioctl
Currently the mdc kernel code expects the flag argument for
OBD_IOC_CHLG_POLL ioctl to only be CHANGELOG_FLAG_FOLLOW. With
IPv6 we need to send a request to the kernel to present the NID
in the struct lnet_nid format since we can't just send large NIDs
to user land if we are using older tools.
With the newer user land tools we will be sending an expanded flag
which the current kernel changelog code can't handle. Rework the
code to support the new flag if we end up with the case of newer
user land tools and an older kernel. This code will also maintain
backwards compatibility with the older user land tools.
Change-Id: I26a80d30ce2ebf2075a2a8f510ff81d6b0b8d848
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52361
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexander Boyko [Fri, 2 Sep 2022 08:41:32 +0000 (04:41 -0400)]
LU-16134 utils: adds lctl apply_yaml
The command set_param -F is used to parse a YAML file with settings,
and issues set_param -P for it. Many types of settings are based
on a specific device and have the conf_param type. When such settings
go to set_param -P, all nodes try to apply them and many errors
happen.
systemd-udevd[568906]: Process '/usr/sbin/lctl set_param
'osp.kjcf04-OST0003-osc-MDT0000.resend_count=43''
failed with exit code 2.
The patch adds functionality for the conf_param event of a YAML file,
and introduces lctl apply_yaml for both types of events.
YAML example:
- {device: testfs-MDT0001, event: conf_param, index: 76, parameter:
lod.qos_threshold_rr=100}
- { index: 17, event: set_param, device: general,
parameter: jobid_var, value: procname_uid }
Test-Parameters: trivial
HPE-bug-id: LUS-11116
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iec3b1f14b9ddb85ef3e110bbc4467d0d6c80c136
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48419
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Lai Siyao [Sun, 21 Nov 2021 09:53:09 +0000 (04:53 -0500)]
LU-14470 dne: striped mkdir replay by client request
Once all involved MDTs of a striped mkdir were rebooted, or MDT
recovery was aborted, this mkdir will be replayed by client request.
To correctly replay such mkdir, pack directory LMV in mkdir reply,
and save it to request from reply, and MDS should use this layout to
replay mkdir.
For the MDT recovery abort case, the original mkdir may have been
partially executed, so mkdir replay should check for the cases below
and not treat them as errors:
* name entry is found on parent directory on remote MDT.
* stripe exists on remote MDT.
For backward compatibility, add the MDS_MKDIR_LMV flag to indicate
that a client requires the directory LMV in the mkdir reply.
Updated replay-single 100c since striped mkdir can replay now.
Updated recovery-small 130 since create fetches layout now.
Added replay-single 100e.
Test-Parameters: mdscount=2 mdtcount=4 testlist=racer,racer,racer,racer,racer
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If0cc8f4aebbe55cc28786d6b4198dbb57743feb3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47385
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Thu, 27 Jul 2023 18:36:30 +0000 (14:36 -0400)]
LU-13805 clio: bounce buffer for unaligned DIO
Direct I/O must normally be page aligned both in terms of
I/O size and memory alignment. This is because the I/O
must be page aligned before being written to disk. This
is also true for buffered I/O, but the I/O is aligned
using the page cache.
In recent versions of Lustre, direct I/O is significantly
faster than buffered I/O, due to lower overhead for page
management. Thus it is desirable to be able to do more
I/O as direct I/O.
This patch allows unaligned direct I/O by creating a buffer
inside the kernel and aligning the I/O by copying in to
this aligned buffer. Because the main cost of buffered I/O
is locking in the page cache rather than memcopy(), this is
still significantly faster than buffered I/O (though slower
than normal direct I/O).
This will eventually allow us to convert buffered I/O to
direct I/O when doing so would increase performance.
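The bounce-buffer idea can be illustrated in user space. The sketch below is not the kernel code path from this patch: `aligned_span`, `bounce_write`, the 4096-byte page size, and the in-memory `device` are all assumptions made for illustration. It shows an unaligned write being widened to a page-aligned range, staged in an aligned buffer, and written back as whole pages.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def aligned_span(offset: int, length: int, page_size: int = PAGE_SIZE):
    """Return (start, end) of the page-aligned range covering the I/O."""
    start = (offset // page_size) * page_size
    end = -(-(offset + length) // page_size) * page_size  # round up
    return start, end


def bounce_write(device: bytearray, offset: int, payload: bytes,
                 page_size: int = PAGE_SIZE) -> None:
    """Write an unaligned payload to `device` through an aligned buffer.

    `device` is a stand-in for storage that only accepts whole pages.
    """
    start, end = aligned_span(offset, len(payload), page_size)
    if len(device) < end:
        # Grow the simulated device to a whole number of pages.
        device.extend(bytes(end - len(device)))
    # Stage the covering pages in the bounce buffer, patch in the
    # payload, then write the whole aligned range back.
    bounce = bytearray(device[start:end])
    rel = offset - start
    bounce[rel:rel + len(payload)] = payload
    device[start:end] = bounce
```

The extra copy into `bounce` is the cost the description refers to: it is paid on every unaligned request, which is why this path is slower than aligned direct I/O.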
Here are some comparative benchmarks using IOR, all single
process.
UDIO is unaligned DIO.
io size    |     1M     |     4M     |    16M     |    64M
-----------+------------+------------+------------+-----------
BIO Write  | 1502 MiB/s | 1382 MiB/s | 1683 MiB/s | 1677 MiB/s
BIO Read   | 2169 MiB/s | 1902 MiB/s | 2131 MiB/s | 1955 MiB/s
DIO Write  | 1010 MiB/s | 2778 MiB/s | 5905 MiB/s | 7917 MiB/s
DIO Read   |  893 MiB/s | 2657 MiB/s | 4724 MiB/s | 7579 MiB/s
UDIO Write |  848 MiB/s | 1666 MiB/s | 2117 MiB/s | 2243 MiB/s
UDIO Read  |  933 MiB/s | 2412 MiB/s | 3690 MiB/s | 5370 MiB/s
Unaligned DIO offers benefits vs buffered write and
buffered read, but is of course slower than DIO.
Notice on this node the best case DIO performance is
~8 GiB/s. On a node with 12 GiB/s best case DIO, best case
UDIO read is 8 GiB/s and best case UDIO write is 2.5 GiB/s.
This is because UDIO read is fully parallel, UDIO write is
not.
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I7eeebf9a608f006c8095b95f0677adb99f19d640
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45616
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Tue, 14 Feb 2023 18:47:54 +0000 (13:47 -0500)]
LU-13805 llite: unaligned direct_rw_pages
Add support for ll_direct_rw_pages to handle unaligned
IO by allowing both the first and last page to be partial
pages.
This has been broken off from the main unaligned DIO patch
to make it more reviewable.
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I055105589d5416fe6aa82fb6a087db7b8b38c8d1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Xinliang Liu [Tue, 18 Jul 2023 03:42:19 +0000 (03:42 +0000)]
LU-16976 ldiskfs: add support for openEuler 22.03 SP2
Add ldiskfs server support for oe2203sp2.
Sync with ldiskfs-5.14-rhel9.2.series adding missing patches.
Also refine openEuler lbuild scripts.
Change-Id: I91841a7140a9f8f3182a4a329b9f04639a85e94d
Test-Parameters: trivial
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Oleg Drokin [Sat, 23 Sep 2023 06:07:00 +0000 (02:07 -0400)]
LU-16713 llite: remove unused ccc_unstable_waitq
Previous patch removed the only waiter on this waitq, so there's
no point in having it around.
Change-Id: Iceb1da2fb8958ae0bd7b0f4241cb263d02ca6dbd
Test-parameters: trivial
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52485
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Qian Yingjin [Tue, 6 Jun 2023 08:11:30 +0000 (15:11 +0700)]
LU-16713 llite: writeback/commit pages under memory pressure
Lustre buffered I/O does not work well with restrictive memcg
control. This may result in OOM when the system is under memory
pressure.
Lustre has implemented unstable pages support similar to NFS,
but it is disabled by default for performance reasons.
In Lustre, a client pins the cache pages for writes until the
write transaction is committed on the server (OST), even if
writeback of these pinned pages has finished. The server starts
a transaction commit either because the commit interval (5
seconds by default) for the backend storage (i.e. OST/ldiskfs)
has been reached or because there is not enough room in the
journal for a particular handle to start. Until the write
transaction has been committed and the client notified, these
pages are pinned and not flushable in any way by the kernel.
This means that when a client hits memory pressure there can
be a large number of unfreeable (pinned and uncommitted) pages,
so the application on the client will end up OOM killed because
when asked to free up memory it can not.
This is particularly common with cgroups, because when cgroups
are in use the memory limit is generally much lower than the
total system memory, so the limit is more likely to be reached.
The Linux kernel has a mature memory reclaim mechanism to avoid
OOM even with cgroups.
After performing a dirtying write to a page, the kernel calls
@balance_dirty_pages(). If the dirtied and uncommitted pages
are over the background threshold for the global memory limits
or memory cgroup limits, the writeback threads are woken to
perform some writeout.
When allocating a new page for I/O under memory pressure, the
kernel will try direct reclaim and then allocate. For cgroups,
it will try to reclaim pages from the memory cgroup over its soft
limit. The slow page allocation path with direct reclaim will
call @wakeup_flusher_threads() with WB_REASON_VMSCAN to start
writeback of dirty pages.
Our solution uses the page reclaim mechanism in the kernel
directly.
On completion of page writeback (in @brw_interpret), call
@__mark_inode_dirty() to add the dirty inode, which has pinned
uncommitted pages, to the @bdi_writeback. Each memory cgroup has
its own @bdi_writeback to control the writeback for buffered
writes within it.
Thus under memory pressure, the writeback threads will be woken
up and will call @ll_writepages() to write out data.
For background writeout (over the background dirty threshold) or
writeback with WB_REASON_VMSCAN for direct reclaim, we first
flush dirtied pages to the OSTs, then sync them to the OSTs and
force a commit of these pages to release them quickly.
When a cgroup is under memory pressure, the kernel asks for
writeback and then does an fsync to the OSTs. This commits the
uncommitted/unstable pages, and then the kernel can finally free
them.
Some performance results follow.
The client has 512G memory in total.
1. dd if=/dev/zero of=$test bs=1M count=$size
I/O size          128G   256G   512G   1024G
unpatched (GB/s)  2.2    2.2    2.1    2.0
patched (GB/s)    2.2    2.2    2.1    2.0
There is no performance regression after enabling unstable page
accounting with the patch.
2. One process under different memcg limits, with total I/O
size varying from 2X memlimit to 0.5X memlimit:
dd if=/dev/zero of=$file bs=1M count=$((memlimit_mb * time))
memcg limits          1G    4G    16G   64G
2X memlimit (GB/s)    1.7   1.6   1.8   1.7
1X memlimit (GB/s)    1.9   1.9   2.2   2.2
0.5X memlimit (GB/s)  2.3   2.3   2.2   2.3
Without this patch, dd with I/O size > memcg limit will be
OOM-killed.
3. Multiple cgroups testing:
8 cgroups in total, each with a memory limit of 8G.
Run dd write on each cgroup with an I/O size of 2X the memory
limit (16G).
17179869184 bytes (17 GB, 16 GiB) copied, 12.7842 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.7889 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.9504 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.9577 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.4066 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.5397 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.5769 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.6605 s, 1.3 GB/s
4. Two dd writers: one (A) is under memcg control and the other
(B) is not. The total write data is 128G. Memcg limits vary
from 1G to 128G.
cmd: ./t2p.sh $memlimit_mb
memlimit  dd writer (A)  dd writer (B)
1G        1.3GB/s        2.2GB/s
4G        1.3GB/s        2.2GB/s
16G       1.4GB/s        2.2GB/s
32G       1.5GB/s        2.2GB/s
64G       1.8GB/s        2.2GB/s
128G      2.1GB/s        2.1GB/s
The results demonstrate that the process with memcg limits
has nearly no impact on the performance of the process without
limits.
Test-Parameters: clientdistro=el8.7 testlist=sanity env=ONLY=411b,ONLY_REPEAT=10
Test-Parameters: clientdistro=el9.1 testlist=sanity env=ONLY=411b,ONLY_REPEAT=10
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I7b548dcc214995c9f00d57817028ec64fd917eab
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50544
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Sebastien Buisson [Fri, 15 Sep 2023 11:23:19 +0000 (13:23 +0200)]
LU-17015 obdclass: new primitives for upcall cache
This patch adds 2 new primitives to the upcall cache mechanism:
- upcall_cache_get_entry_raw: get a ref on an existing entry;
- upcall_cache_update_entry: modify expiry time and state of an entry.
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4825f09ae807abb52ebe0e24719dcd915e8c8aef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52389
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Sun, 2 Apr 2023 16:33:44 +0000 (11:33 -0500)]
LU-16699 osc: Prefer NR_ZONE_WRITE_PENDING
Linux commit v4.7-5966-g5a1c84b404a7
mm: remove reclaim and compaction retry approximations
Introduced NR_ZONE_WRITE_PENDING which should be used
in mod_zone_page_state.
Older kernels should fall back to NR_UNSTABLE_NFS
or NR_WRITEBACK.
Test-Parameters: trivial
HPE-bug-id: LUS-11559
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I90f22d4bd56f5986eaa5d4a042a2c8ed31fbf752
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50499
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Patrick Farrell [Tue, 28 Mar 2023 15:02:40 +0000 (11:02 -0400)]
LU-16671 osc: fix unstable pages for short IO
Unstable pages support was written with theoretical support
for short IO (i.e., no bulk, data-in-rpc, LU-1757), but since
the short IO code wasn't merged until years later, they were
probably never tested together. When they are used together,
it crashes.
In truth, short IO has no separate pages to be tracked,
which is why this is crashing. This means that small write
RPCs won't be tracked in unstable pages, but that's a very
minor limitation and unlikely to cause trouble. (and since
RPC allocations are not 'pages', they're just malloc'ed,
there's no good way to track them anyway)
Fixes: 70f092a ("LU-1757 brw: add short io osc/ost transfer.")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I34b09f8324424c3ff0b0c09c86f01c938b643e37
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50451
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Mon, 11 Sep 2023 15:31:09 +0000 (17:31 +0200)]
LU-17108 nodemap: make map_mode available for default nm
The map_mode property controls the way mapping is carried out. It
is already available on regular nodemaps, to decide whether uids, gids
and/or projids will be mapped.
On the default nodemap, where it is not possible to define mappings,
the map_mode property will be taken into account when trusted is 0 and
deny_unknown is 0. Unmapped IDs will be left unchanged.
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I16a2f5cfda11a8435b56a00f3e97bdc70741c156
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52336
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Mon, 11 Sep 2023 03:23:11 +0000 (22:23 -0500)]
LU-17104 build: Correct test for bad allocation
Expect a non-zero value following allocation. If it is zero,
return -ENOMEM to the caller.
This patch fixes a build issue reported by gcc 12.
Fixes:
09f9fb3211 ("LU-11023 quota: quota pools for OSTs")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2ce197c31bf444d9f179942e516cfd9bdaf7dd9c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52330
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Frank Sehr [Fri, 8 Sep 2023 23:38:05 +0000 (16:38 -0700)]
LU-17045 lnet: ksocklnd-config report errors on cmd failure
Make sure that ksocklnd-config script logs an error if any of the
commands it attempts to execute fail.
The script already logs a warning if it finds that any of the routes
it intends to add already exist. It should also report if any
command execution failed, to make the user aware that MR routing
setup could not be applied.
Test-Parameters: trivial
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: If5a240d224f6a45015d1fc1a9d0a8df58ed661e4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52327
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Thu, 7 Sep 2023 20:11:04 +0000 (14:11 -0600)]
LU-17098 tests: fix sanity-pcc failures on Ubuntu
A few sanity-pcc tests fail due to the system setup. One of those
failures was due to uidmap not being installed on my Ubuntu system.
The rest of the test failures were due to assuming the uid / gid
were set values (500 and 1000), which is not the case for all
systems.
Test-Parameters: trivial testlist=sanity-pcc
Change-Id: I667f399854d626d4b22efed2b341ad5c330e0cfe
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52310
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lei Feng [Wed, 6 Sep 2023 02:39:49 +0000 (10:39 +0800)]
LU-17091 tests: check correct return value in lfs_df
$? is the return value of the last command in a pipe.
We should check the return value of the first command, 'lfs df',
in this case.
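The pitfall can be reproduced outside the test suite. The sketch below is not the patched test code; it builds a two-stage pipeline with Python's `subprocess` module (the helper name `run_pipeline` is made up for illustration) and shows that the exit status of the last stage says nothing about the first stage. In bash, the first stage's status is available as `${PIPESTATUS[0]}`; whether the patch uses that or another approach is not stated in this message.

```python
import subprocess
import sys


def run_pipeline(first_exit_code: int):
    """Run the equivalent of `first | second` and return both exit codes.

    The first stage prints one line and exits with first_exit_code.
    The second stage only consumes its input and exits 0, much like a
    filter placed after the command whose status actually matters.
    """
    first = subprocess.Popen(
        [sys.executable, "-c",
         f"import sys; print('data'); sys.exit({first_exit_code})"],
        stdout=subprocess.PIPE,
    )
    second = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        stdin=first.stdout,
    )
    first.stdout.close()  # the second stage now owns the read end
    second_rc = second.wait()
    first_rc = first.wait()
    return first_rc, second_rc
```

The second element of the returned tuple corresponds to what `$?` reports after a shell pipeline; a failure in the first stage is visible only in the first element.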
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn
Change-Id: I7daa38f27c878e5195181ed82717cd28ca345dbc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52285
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Tue, 5 Sep 2023 09:08:16 +0000 (11:08 +0200)]
LU-17015 obdclass: set cache entry/acquire expiry at init
Give the ability to define values for cache entry expire and acquire
expire directly at upcall cache init.
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iee0dea66943ab6747d85a378861ae98c29faa11a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52271
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Thomas Bertschinger [Tue, 25 Jul 2023 16:03:47 +0000 (12:03 -0400)]
LU-16981 lod: update llc_stripe_count after ost inactive
If an OST gets deactivated while lod_ost_alloc_qos() is trying to
allocate stripes for a file create, then normally this is caught and
EAGAIN is returned which causes the lod_comp->llc_stripe_count to
get updated to accurately reflect the stripe count. But there is a
race condition: if the OST is deactivated after the call to
ltd_qos_is_usable() but before the stripes are allocated, then
the stripe count is never updated.
This causes an LBUG later in lod_striped_create() because fewer
stripes are allocated than the number in llc_stripe_count so it
finds a stripe that is NULL.
The solution is to properly update lod_comp->llc_stripe_count when
the number of stripes created is less than expected.
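The invariant behind the fix can be sketched as follows. This is a simplified model, not the lod code: `Component`, `allocate`, and the `is_active` callback are hypothetical names, and the only point illustrated is that the recorded stripe count is lowered whenever fewer stripes were actually allocated than requested.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Component:
    """Hypothetical stand-in for a layout component."""
    stripe_count: int
    stripes: List[str] = field(default_factory=list)


def allocate(comp: Component, candidates: List[str],
             is_active: Callable[[str], bool]) -> Component:
    """Allocate up to comp.stripe_count stripes from active candidates."""
    comp.stripes = []
    for ost in candidates:
        if len(comp.stripes) == comp.stripe_count:
            break
        # An OST may have been deactivated after the earlier
        # usability check, so it is tested again at allocation time.
        if is_active(ost):
            comp.stripes.append(ost)
    # Keep the recorded count consistent with what was allocated, so
    # later code never indexes a stripe that does not exist.
    if len(comp.stripes) < comp.stripe_count:
        comp.stripe_count = len(comp.stripes)
    return comp
```

With the final adjustment in place, any consumer that iterates `stripe_count` entries sees only allocated stripes.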
Fixes:
ced540165ef5 ("LU-16623 lod: handle object allocation consistently")
Test-Parameters: testlist=sanity env=ONLY=27V,ONLY_REPEAT=100
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Ia1264f24904fed00454b3bc3c0d6c7b9b947737f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51759
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Etienne AUJAMES [Mon, 27 Mar 2023 10:14:34 +0000 (12:14 +0200)]
LU-16408 tests: fix replay-dual test 33
The client can be evicted in REPLAY_LOCK. Wait for the REPLAY_WAIT
import state before aborting the recovery on the MDS.
When unmounting a combined MDT and MGT, imperative recovery is
disabled. So, we have to force an update of the client import states
(MGC/MDC).
Test-Parameters: trivial
Test-Parameters: testlist=replay-dual env=ONLY="33",ONLY_REPEAT=50
Test-Parameters: testlist=replay-dual
Test-Parameters: testlist=replay-dual
Test-Parameters: testlist=replay-dual
Test-Parameters: testlist=replay-dual
Fixes:
1a79d395dd ("LU-15935 target: keep track of multirpc slots in last_rcvd")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I0869fe968a18795dae39cf39a7009cf444820017
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lei Feng [Wed, 28 Sep 2022 02:16:17 +0000 (10:16 +0800)]
LU-16194 lod: define negative extent offset as invalid
If lu_extent.e_start/e_end is negative after converting to s64,
regard it as invalid data, except for -1 (EOF).
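The rule can be written out as a small check. The sketch assumes the offsets arrive as unsigned 64-bit values, as the reference to an s64 conversion suggests; the helper names `to_s64` and `extent_offset_is_valid` are illustrative and do not come from the lod code.

```python
U64_MASK = (1 << 64) - 1
EOF_S64 = -1  # the one negative value that remains valid (EOF)


def to_s64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    value &= U64_MASK
    return value - (1 << 64) if value >= (1 << 63) else value


def extent_offset_is_valid(value_u64: int) -> bool:
    """Accept non-negative offsets and -1 (EOF); reject other negatives."""
    signed = to_s64(value_u64)
    return signed >= 0 or signed == EOF_S64
```

Under this reading, the all-ones value is accepted as EOF, while any other value with the top bit set is rejected.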
Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I79276a5185f339e9de48fe87c4da39052c7974e1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48684
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Qian Yingjin [Mon, 11 Sep 2023 15:53:31 +0000 (11:53 -0400)]
LU-10499 pcc: use foreign layout for PCCRO on server side
This patch adds the code for using a foreign layout for PCCRO
on the server side (LOD|MDD|MDT layers).
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I48467be9fef54bd05432528b685241aa53978d24
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51375
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bobi Jam [Tue, 5 Sep 2023 06:54:44 +0000 (14:54 +0800)]
LU-17088 dom: don't create different size DOM component
Multiple DOM components are allowed in different mirrors, but they
must be of the same size; mirror extend should check this constraint.
Fix another glitch in lov_init_composite() where dom_size is used
as a __u64 value but declared as boolean.
Fixes:
44a721b8c1 ("LU-11421 dom: manual OST-to-DOM migration via mirroring")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ia0d08c697dbeeb3aa8d20d9849226afa06360012
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52269
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Thu, 24 Aug 2023 23:11:55 +0000 (19:11 -0400)]
LU-13306 mgc: handle large NID formats
For newer versions of Lustre the MGS can send mgs_nidtbl_entry
containing NIDs of a larger format. It's also possible an old
MGS will send NIDs of the previous size. We need to handle
both cases. We reuse the mcb_nm_cur_pass field of struct
mgs_config_body, which is only used for nodemap, to send the
NID size from the client to the MGS. Pre-IPv6 clients will
by default have a zero mcb_nm_cur_pass / mcb_nid_size. When
mcb_nid_size is zero the MGS will treat the client as
pre-IPv6 and send small NIDs back to the client. This avoids
needing to patch older clients. If the MGS is older, then
small size NIDs will be sent back, which the new MGC layer can
handle by converting those lnet_nid_t to struct lnet_nid.
To handle this new code the "swab" of the entry is split into
two parts. The "header" is "swab"ed as soon as we know the entry
is large enough for that to make sense. The content containing
NID information is swabbed later once the header has been found
to look sane.
Test-Parameters: serverversion=2.15 testlist=runtests,sanity,recovery-small
Change-Id: I97ebdcecc1ee0fbfe676cbdbdc77edee13e60891
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50750
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Bobi Jam [Fri, 19 May 2023 09:40:31 +0000 (17:40 +0800)]
LU-16837 llite: handle unknown layout component
If a Lustre client encounters an unknown layout component pattern
in a mirror file, this patch makes the client mark this mirror as
invalid and skip it.
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie5f44212ab96bdc706cc5a9e11f330234fc01069
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51060
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Thu, 23 Jan 2020 20:15:10 +0000 (20:15 +0000)]
LU-10465 lov: increase default stripe size to 4MB
Increase the default stripe size from 1MB to 4MB for better
performance and reduced LDLM lock contention for larger writes.
This can also reduce the need to cache data on the client on a
striped file before a full RPC is generated, since the default
RPC size is 4MB, but with 1MB stripe size, the file would need
4x full stripe_count * stripe_size writes before an RPC is full.
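The arithmetic behind this point can be worked through with a small model. The sketch assumes plain round-robin striping and a sequential write starting at offset 0, and it counts the bytes written until the first object has accumulated one RPC's worth of data; the function name and this exact counting are illustrative assumptions, so the result is a little under the four full stripe widths quoted above.

```python
MIB = 1 << 20


def bytes_until_first_full_rpc(stripe_size: int, stripe_count: int,
                               rpc_size: int = 4 * MIB) -> int:
    """Sequential bytes written before the first object holds rpc_size bytes.

    Assumes round-robin striping: each pass over the stripes adds
    stripe_size bytes to every object in turn.
    """
    full_rounds, remainder = divmod(rpc_size, stripe_size)
    if remainder == 0:
        # The last chunk for the first object is a whole stripe.
        full_rounds -= 1
        remainder = stripe_size
    return full_rounds * stripe_count * stripe_size + remainder
```

For a 4-stripe file and a 4 MiB RPC, this model gives 13 MiB of sequential data with 1 MiB stripes, against 4 MiB with 4 MiB stripes, which is the reduction in client-side caching the message describes.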
Patch includes several test fixes:
- sanity-pfl: takes into account stripe size in some tests
- sanity-flr: use bigger component size and amount of data to
saturate all stripes as expected by test
- sanity: 130g to use 1M stripe prior FIEMAP calcs
- sanity-lfsck: 36[a-c] to use 1M stripe as expected by calcs
Test-Parameters: testlist=sanity-compr
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3cef8805247fc5253e0a0ac05157b9d609054df9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/37318
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Fri, 18 Aug 2023 21:28:43 +0000 (14:28 -0700)]
LU-17041 kernel: update RHEL 8.8 [4.18.0-477.21.1.el8_8]
Update RHEL 8.8 kernel to 4.18.0-477.21.1.el8_8.
Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.8 serverdistro=el8.8 testlist=sanity
Test-Parameters: trivial fstype=zfs \
clientdistro=el8.8 serverdistro=el8.8 testlist=sanity
Change-Id: Ie24c8e438dd33afafb900664d9a4010160bc1a45
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52003
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Sun, 17 Sep 2023 01:00:35 +0000 (21:00 -0400)]
LU-17096 debian: add obd_test.ko, llog_test.ko to lustre-tests
The obd_test.ko module was missing from the lustre-tests
Debian package. Hence, it wasn't being installed on the
Ubuntu clients during testing. This caused sanity/55a and
sanity/55b to consistently fail.
Add llog_test.ko to lustre-tests also. It's not unheard of to
use Ubuntu for Lustre server. So the package may as well include
llog_test.ko.
Also, update debian/.gitignore.
Test-Parameters: trivial testlist=sanity env=ONLY=55,ONLY_REPEAT=50 clientdistro=ubuntu2204
Test-Parameters: trivial testlist=sanity env=ONLY=55,ONLY_REPEAT=50
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I050de4563478996828886ca623fa96b58f9fef5e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52398
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Tue, 5 Sep 2023 20:18:31 +0000 (14:18 -0600)]
LU-16661 build: remove -dev packages for Debian
Don't depend on libmount-dev, libsnmp-dev, libkeyutils-dev for the
lustre-client-utils and lustre-server-utils packages. These are
only needed for build and for the lustre-client-dkms package.
Disable SNMP by default as this is no longer used anywhere.
Test-Parameters: trivial testlist=runtests clientdistro=ubuntu2204
Fixes: 7dc6e1128a ("LU-15888 build: Debian dkms-debs requires ed and libkeyutils")
Fixes: af2f77633b ("LU-13818 build: use libsnmp-dev instead of libsnmp30")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib788a97028ee40a9c61070d00b823620ec3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52281
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Jian Yu [Mon, 4 Sep 2023 07:04:50 +0000 (00:04 -0700)]
LU-16661 build: use "Recommends: perl" for lustre-iokit
In lustre-iokit, the "plot" commands all use perl, but
the actual "*-survey" scripts are written in bash, so
the "Requires: perl" in lustre.spec.in for lustre-iokit
could be downgraded to "Recommends: perl" for RHEL 8+
(RHEL 7 does not handle "Recommends:").
Test-Parameters: trivial testlist=obdfilter-survey
Change-Id: I55f3c57e73ac91cedce745dc4f424c3542978cd4
Fixes: 800a9ec58f78 ("LU-16661 build: improve lustre.spec.in Requires")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52225
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>