Whamcloud - gitweb
Qian Yingjin [Wed, 24 Jan 2024 02:43:38 +0000 (21:43 -0500)]
LU-17463 osc: add option to disable page cache shrinker
Pages mapped into VM_LOCKED (mlock()ed) VMAs are unevictable and
are marked with PG_mlocked.
However, the Lustre page cache shrinker treats all cached pages
equally, even though some of them are unevictable, so it may wrongly
evict pages locked by mlock() or mlockall().
This patch adds a tunable option to enable or disable the page cache
shrinker:
- osc.*.enable_page_cache_shrink
It is enabled by default.
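A minimal userspace model of the problem, assuming a shrinker that scans cached pages without ever looking at PG_mlocked. The struct, the `enable_page_cache_shrink` global and the functions are hypothetical stand-ins, not the osc code; the global only mirrors the role of the osc.*.enable_page_cache_shrink tunable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cached page; not a Lustre structure. */
struct model_page {
	bool cached;	/* still in the client page cache */
	bool mlocked;	/* mapped by an mlock()ed VMA (PG_mlocked) */
};

/* Plays the role of osc.*.enable_page_cache_shrink (enabled by default). */
static bool enable_page_cache_shrink = true;

/*
 * Evict up to nr_to_scan cached pages and return how many were evicted.
 * Like the shrinker described above, it treats every cached page the
 * same and ignores the mlocked flag; the only protection for mlocked
 * pages is to disable the shrinker entirely.
 */
static size_t model_shrink(struct model_page *pages, size_t npages,
			   size_t nr_to_scan)
{
	size_t evicted = 0;

	assert(pages != NULL || npages == 0);
	if (!enable_page_cache_shrink)
		return 0;

	for (size_t i = 0; i < npages && evicted < nr_to_scan; i++) {
		if (pages[i].cached) {
			pages[i].cached = false;
			evicted++;
		}
	}
	return evicted;
}

/* Count mlocked pages that have been evicted from the cache. */
static size_t lost_mlocked(const struct model_page *pages, size_t npages)
{
	size_t n = 0;

	for (size_t i = 0; i < npages; i++)
		if (pages[i].mlocked && !pages[i].cached)
			n++;
	return n;
}
```

With the switch on, the model evicts mlocked pages along with the rest; with it off, nothing is evicted. On a real client the parameter would presumably be changed with lctl set_param (for example osc.*.enable_page_cache_shrink=0); that value syntax is an assumption, not taken from the commit.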
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I23ebf6d438a71c7917b0cb3375407a64587e15db
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53795
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lei Feng [Tue, 23 Jan 2024 04:01:21 +0000 (12:01 +0800)]
LU-16194 tests: set minversion of MDS for sanity/65q
There are two sanity/65p tests; rename one to 65q.
Old MDS versions are not expected to check for a negative
start/end, so check the MDS version in 65q.
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity env=ONLY=65 serverversion=2.15
Change-Id: I1cb7716c37a349f441ed248613f569dd5ab78330
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53771
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Fri, 8 Mar 2024 21:54:57 +0000 (16:54 -0500)]
LU-13642 lnet: Allow dynamic IP specification
Currently an NI can be set up only by naming its device interface,
but a device interface may have more than one IP address. This
change updates lnet_net_cmd() to set up an NI using a specific
network address.
For further reference, see "IP specification in LNet":
https://wiki.whamcloud.com/display/LNet/IP+specification+in+LNet
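To show why naming an address matters when one interface carries several, here is a small sketch that picks the (interface, address) entry matching a requested IP string. It is not lnet_net_cmd(); the `if_addr` table and `find_by_ip()` are hypothetical, and only IPv4 is handled. A real tool might fill such a table from getifaddrs().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical (interface, IPv4 address) pair. */
struct if_addr {
	const char *ifname;
	struct in_addr addr;
};

/*
 * Find the entry whose address equals the requested IP string, so an NI
 * can be bound to that address rather than to whichever address the
 * interface happens to report first. Returns NULL if the string is not
 * a valid IPv4 address or no entry carries it.
 */
static const struct if_addr *find_by_ip(const struct if_addr *tbl, size_t n,
					const char *ip)
{
	struct in_addr want;

	assert(tbl != NULL || n == 0);
	if (inet_pton(AF_INET, ip, &want) != 1)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (tbl[i].addr.s_addr == want.s_addr)
			return &tbl[i];
	return NULL;
}
```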
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I2c456790fe9534bbfe02b0330cce73e80318cc1c
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53605
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Tue, 12 Dec 2023 16:15:56 +0000 (21:45 +0530)]
LU-16796 lfsck: Change lfsck_layout_slave_target to use kref
This patch changes struct lfsck_layout_slave_target
to use a kref instead of an atomic_t for reference counting.
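A kref packages the get/put-with-release pattern that an open-coded atomic_t counter implements by hand. The sketch below is a userspace model using C11 atomics, not the kernel API (which provides kref_init(), kref_get() and kref_put()); `struct slave_target` is a hypothetical stand-in for lfsck_layout_slave_target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct kref. */
struct model_kref {
	atomic_int refcount;
};

static void model_kref_init(struct model_kref *k)
{
	atomic_init(&k->refcount, 1);
}

static void model_kref_get(struct model_kref *k)
{
	int old = atomic_fetch_add(&k->refcount, 1);

	assert(old > 0);	/* taking a reference on a freed object is a bug */
}

/* Drop a reference; call release() on the last put. Returns true if released. */
static bool model_kref_put(struct model_kref *k,
			   void (*release)(struct model_kref *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return true;
	}
	return false;
}

/* Hypothetical object embedding the refcount; fields are illustrative. */
struct slave_target {
	int index;
	struct model_kref ref;
};

static int released;	/* counts release() calls */

static void slave_target_release(struct model_kref *k)
{
	/* container_of(): recover the enclosing object from its member */
	struct slave_target *t = (struct slave_target *)
		((char *)k - offsetof(struct slave_target, ref));

	released++;
	free(t);
}
```

The release callback is the design point: the code that drops the last reference frees the object, so no caller has to compare the counter against zero itself.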
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7ea87e2b94a72363971b71415c9430e5b7ded8cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53422
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Thu, 18 Jan 2024 15:59:25 +0000 (10:59 -0500)]
LU-17334 lmv: exclude newly added MDT in mkdir
Exclude a newly added MDT from QoS mkdir for 30 seconds, in case the
connections between MDTs are not ready yet, which may cause lookups to fail.
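A minimal sketch of the exclusion rule, assuming each MDT records the time it was added and that an MDT becomes eligible once 30 seconds have fully elapsed. The struct and functions are hypothetical, not the LMV code; which clock the real code uses and the exact boundary are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Grace period from the commit message. */
#define NEW_MDT_GRACE_SEC 30

/* Hypothetical per-MDT state; not an LMV structure. */
struct mdt_slot {
	time_t added_at;	/* when the MDT was added */
	bool active;
};

static bool mdt_usable_for_qos(const struct mdt_slot *m, time_t now)
{
	return m->active && now - m->added_at >= NEW_MDT_GRACE_SEC;
}

/* Count the MDTs that QoS mkdir may choose from at time 'now'. */
static size_t qos_candidates(const struct mdt_slot *mdts, size_t n,
			     time_t now)
{
	size_t count = 0;

	assert(mdts != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (mdt_usable_for_qos(&mdts[i], now))
			count++;
	return count;
}
```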
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibb5e6eda29ddfff8f66708d72e33453a96f5e7ef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53860
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Li Dongyang [Wed, 1 Nov 2023 11:36:10 +0000 (22:36 +1100)]
LU-17248 kernel: wait for pages under writeback for bdev
Since RHEL 8.6, wait_for_stable_page() is controlled by a new
superblock flag, SB_I_STABLE_WRITES.
However, this flag is not set on the bdev pseudo superblock, which
means that writes made directly to the block device do not wait for
page writeback. This can trigger false block integrity errors: a page
may be modified again while it is under writeback, so the integrity
checksum no longer matches the new data.
Upstream has a pending patch
https://lore.kernel.org/linux-mm/20231025141020.192413-1-hch@lst.de/
which works for RHEL 9 kernels.
For RHEL 8 kernels, the bdev changes made that patch difficult to
backport, so a different patch is used to check and wait for bdev
stable_pages.
Change-Id: Ie088abf29f40b294c31f993bcfad56d6081a3fce
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52922
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Wed, 28 Feb 2024 02:38:04 +0000 (21:38 -0500)]
LU-15367 llite: add 'rc' to all iotrace messages
It's easy to add the return code to iotrace, so let's do it.
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <paf0187@gmail.com>
Change-Id: Ic2357d3d32fd4954e96878174f13b7fe907df2df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52007
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Wed, 28 Feb 2024 02:33:08 +0000 (21:33 -0500)]
LU-15367 llite: add lseek to iotrace
Add iotrace messages for lseek.
Credit to Qian Yingjin <qian@ddn.com> for original patch.
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I2beed5e80ea9a3d6278ddd40e9deb6b56754fabe
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52004
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Vladimir Saveliev [Wed, 23 Aug 2023 14:19:38 +0000 (17:19 +0300)]
LU-16935 llite: avoid hopeless i/o repeats
On SLES12SP5 kernels (4.12.14_122.147, 4.12.14-122.162), a race between
ll_filemap_fault and ll_imp_inval may lead to a livelock:
- ll_filemap_fault loops endlessly because filemap_fault()->readpage()
returns VM_FAULT_SIGBUS (it cannot send the read RPC because the import
is invalid) and because ll_page_inv_lock gets incremented within
cl_page_discard()->..->vvp_page_delete(), which is called after the
readpage failure.
- ll_imp_inval gets stuck in
obd_import_event(IMP_EVENT_INVALIDATE)->..->osc_object_invalidate
(before recovery), waiting for completion of an I/O that
ll_filemap_fault cannot complete.
@ll_page_inv_lock is used to detect that a page being read by the
kernel has been deleted from Lustre, which avoids potential stale
data reads. This seqlock lets us see that a page was potentially
deleted, catch that case, and repeat the I/O in ll_filemap_fault()
or vvp_io_read_start().
To avoid wrongly repeating the I/O endlessly, this patch increments
@ll_page_inv_lock in vvp_page_delete() only when the page being
deleted is in the PageUptodate state. A page that is not PageUptodate
is usually deleted because of an error that does not require a retry.
This way, ll_filemap_fault() and vvp_io_read_start() no longer loop
endlessly on errors that do not require repeating the I/O, because
the @ll_page_inv_lock seqlock does not change.
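The retry rule can be modelled in userspace as below. This is a sketch under stated assumptions, not the llite code: an atomic counter stands in for the @ll_page_inv_lock seqlock (real seqlock readers also wait out odd, in-progress values, which the model omits), and every name here is hypothetical. It shows that a failed read which deletes a never-uptodate page is not repeated, while a read that races with deletion of an uptodate page is.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sequence counter standing in for @ll_page_inv_lock. */
static atomic_uint page_inv_seq;

struct model_page {
	bool uptodate;
};

/*
 * Model of vvp_page_delete() after this patch: only deleting an
 * uptodate page bumps the counter. Adding 2 stands for one complete
 * write_seqlock()/write_sequnlock() pair.
 */
static void model_page_delete(struct model_page *pg)
{
	if (pg->uptodate)
		atomic_fetch_add(&page_inv_seq, 2);
}

/*
 * Repeat the read only if the counter moved while it ran, i.e. an
 * uptodate page was deleted underneath it. max_tries only bounds the
 * model. Returns the number of attempts; the last result goes in *rc.
 */
static int read_with_retry(int (*do_read)(void *), void *arg,
			   int max_tries, int *rc)
{
	unsigned int seq;
	int tries = 0;

	assert(max_tries > 0);
	do {
		seq = atomic_load(&page_inv_seq);
		*rc = do_read(arg);
		tries++;
	} while (atomic_load(&page_inv_seq) != seq && tries < max_tries);
	return tries;
}

struct read_ctx {
	struct model_page page;
	int calls;
};

/* Read fails (e.g. the import is invalid); the never-uptodate page is
 * deleted on the error path, which no longer forces a retry. */
static int read_fails(void *arg)
{
	struct read_ctx *ctx = arg;

	ctx->calls++;
	model_page_delete(&ctx->page);
	return -1;
}

/* The first attempt races with deletion of an uptodate page, so it is
 * repeated; the second attempt succeeds. */
static int read_races_once(void *arg)
{
	struct read_ctx *ctx = arg;

	if (ctx->calls++ == 0)
		model_page_delete(&ctx->page);
	return 0;
}
```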
A test to illustrate the issue is added.
The sanity.sh tests exercise I/O error handling.
cl_io_loop(): avoid restart if ci_tried_all_mirrors flag is set.
HPE-bug-id: LUS-11686
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I3b62bc95db01bf11f6098011bf29e4064c7e201e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51505
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Hongchao Zhang [Sat, 20 Jan 2024 06:39:33 +0000 (14:39 +0800)]
LU-14535 quota: get all quota info in LFS
This patch adds the "-a" option to "lfs quota" to get the quota info
of all quota IDs. It iterates over the quota settings saved in the
global quota setting files "quota_master/md-0x0" and
"quota_master/dt-0x0" on the QMT, and over the quota usage info saved
in the accounting quota files of the backend FS (LDiskFS or ZFS) on
the QSDs, then merges the two kinds of quota info on the client and
prints them in a format similar to "lfs quota -u|-g|-p".
$ lfs quota -a -u /mnt/lustre
Filesystem /mnt/lustre, Disk usr quotas
quota_id  kbytes   quota   limit   grace   files   quota   limit   grace
root        9684       0       0       -    1019       0       0       -
bin            4       0  102400       -       1       0   10240       -
daemon         4       0  102400       -       1       0   10240       -
adm            4       0  102400       -       1       0   10240       -
lp             4       0  102400       -       1       0   10240       -
sync           4       0  102400       -       1       0   10240       -
shutdown       4       0  102400       -       1       0   10240       -
halt           4       0  102400       -       1       0   10240       -
mail           4       0  102400       -       1       0   10240       -
$ lfs quota -a -g /mnt/lustre
Filesystem /mnt/lustre, Disk grp quotas
quota_id  kbytes   quota   limit   grace   files   quota   limit   grace
root        9684       0       0       -    1019       0       0       -
bin            4       0  204800       -       1       0   20480       -
daemon         4       0  204800       -       1       0   20480       -
adm            4       0  204800       -       1       0   20480       -
lp             4       0  204800       -       1       0   20480       -
sync           4       0  204800       -       1       0   20480       -
shutdown       4       0  204800       -       1       0   20480       -
halt           4       0  204800       -       1       0   20480       -
mail           4       0  204800       -       1       0   20480       -
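The client-side merge of settings (from the QMT) and usage (from the QSDs) can be sketched as a merge of two lists sorted by quota ID. This is a hypothetical illustration, not the lfs or lquota code: the record layouts are invented, and the usage list is assumed to be already summed across QSDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical records; not the lquota on-disk or wire formats. */
struct qsetting { uint32_t id; uint64_t hardlimit; };	/* from the QMT */
struct qusage { uint32_t id; uint64_t kbytes; };	/* from the QSDs */
struct qrow { uint32_t id; uint64_t kbytes; uint64_t hardlimit; };

/*
 * Merge two lists sorted by ascending id into one row per id. An id
 * present on only one side gets 0 for the missing field. Returns the
 * number of rows written; out must hold at least ns + nu rows.
 */
static size_t merge_quota(const struct qsetting *s, size_t ns,
			  const struct qusage *u, size_t nu,
			  struct qrow *out)
{
	size_t i = 0, j = 0, n = 0;

	assert(out != NULL);
	while (i < ns || j < nu) {
		struct qrow r = { 0, 0, 0 };

		if (j >= nu || (i < ns && s[i].id < u[j].id)) {
			r.id = s[i].id;		/* setting only */
			r.hardlimit = s[i++].hardlimit;
		} else if (i >= ns || u[j].id < s[i].id) {
			r.id = u[j].id;		/* usage only */
			r.kbytes = u[j++].kbytes;
		} else {			/* same id on both sides */
			r.id = s[i].id;
			r.hardlimit = s[i++].hardlimit;
			r.kbytes = u[j++].kbytes;
		}
		out[n++] = r;
	}
	return n;
}
```

A sorted merge keeps the pass linear in the number of IDs, which matters when a filesystem has many thousands of them (the test parameters below use NUM_QIDS=20000).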
This patch also fixes a deadlock in qmt_pool_recalc: the rw_semaphore
"qmt_pool_info.qpi_sarr.osts.op_rw_sem" is already held (read mode) by
qmt_pool_recalc, but qmt_seed_glbe_all acquires it again (read mode),
which blocks if another thread has a pending write-mode acquisition.
qsd_reint_qpool D
Call Trace:
schedule+0x29/0x70
rwsem_down_read_failed+0x105/0x1c0
call_rwsem_down_read_failed+0x18/0x30
down_read+0x20/0x40
qmt_seed_glbe_all+0x3a0/0x800 [lquota]
qmt_site_recalc_cb+0x3c7/0x800 [lquota]
cfs_hash_for_each_tight+0x11e/0x330
cfs_hash_for_each+0x10/0x20 [libcfs]
qmt_pool_recalc+0x9fc/0x1310 [lquota]
llog_process_th D
Call Trace:
schedule+0x29/0x70
rwsem_down_write_failed+0x215/0x3c0
call_rwsem_down_write_failed+0x17/0x30
down_write+0x2d/0x3d
lu_tgt_pool_remove+0x36/0x1e0 [obdclass]
qmt_pool_add_rem+0x655/0x920 [lquota]
qmt_pool_rem+0x10/0x20 [lquota]
lod_pool_remove_q+0xd6/0x1d0 [lod]
class_process_config+0x16f2/0x2b20
class_config_llog_handler+0x839/0x1540
llog_process_thread+0x913/0x1c10
llog_process_thread_daemonize+0x9f/0xe0
Test-Parameters: testlist=sanity-quota env=SLOW=yes,ONLY=49,NUM_QIDS=20000
Change-Id: I08feb928fbf34635ec9c5c341de993c718798dc9
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/42098
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Mon, 23 Jul 2018 14:19:25 +0000 (22:19 +0800)]
LU-10499 pcc: add readonly mode for PCC
Readonly Persistent Client Cache (RO-PCC) shares the same framework
with Readwrite Persistent Client Cache, except that no HSM mechanism
is used in the readonly mode of PCC. Instead, RO-PCC adds a new flag,
LCM_FL_PCC_RDONLY, to the file object's layout to indicate that the
file is in the PCC read-only state. It is protected by the layout
lock.
With the readonly layout feature, the I/O path changes as follows.
For reads, if the file is validly cached in RO-PCC, its data can be
read from PCC directly; otherwise, data is read from the OSTs using
the normal I/O path. For data-modifying operations (write or
truncate), the readonly flag of the layout must first be cleared on
the MDT (which invalidates the RO-PCC cached state on clients via the
layout lock blocking callback) before the I/O can be performed.
For RO-PCC, because the PCC-cached file is actually a replica of the
Lustre file, a failed read from PCC can be tolerated by falling back
to the normal read path: reading the data from the OSTs.
Refer to the paper "LPCC: hierarchical persistent client caching for
Lustre" for more design details.
Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I6badd72e00a106a0f68950621ce6f82471731a95
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38305
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Tue, 12 Mar 2024 13:24:19 +0000 (09:24 -0400)]
LU-10003 lnet: migrate lnet setup and tear down to Netlink
Migrate the LNet setup and tear down functionality from ioctls to
Netlink. With this change lnet_ioctl() is no longer needed, but we
keep it for now to support older tools. This work will be used in a
follow-on patch to tell LNet to set up large NIDs by default for
testing.
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Id69810e114818d423102d6e85ff93529f04c337f
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54359
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Sat, 9 Mar 2024 00:47:09 +0000 (19:47 -0500)]
LU-10391 lnet: use correct type in nid_addr_is_set
nid_addr_is_set() uses the NID_ADDR_BYTES macro, a byte count, as the
bound when scanning the nid_addr array in struct lnet_nid. Each
nid_addr element is actually a u32, so the scan goes beyond the 4
nid_addr elements that exist. Fix this by casting nid_addr to u8 so
the scan proceeds byte by byte.
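The difference between the buggy and fixed loops is easiest to see on a simplified structure. The sketch below is not the real struct lnet_nid: the `model_nid` layout and the `MODEL_NID_ADDR_BYTES()` rule (nid_size plus 4 bytes) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct lnet_nid: the address is stored as
 * four 32-bit words. Field names and the size rule are assumptions. */
struct model_nid {
	uint8_t nid_size;	/* address bytes beyond the first 4 */
	uint32_t nid_addr[4];
};

/* Number of meaningful address bytes (4 for IPv4, 16 for IPv6). */
#define MODEL_NID_ADDR_BYTES(nid) ((nid)->nid_size + 4)

/*
 * Buggy shape (do not use): the loop bound counts bytes, but each step
 * reads a 32-bit word, so a 16-byte NID touches nid_addr[4..15], far
 * past the end of the array:
 *
 *	for (i = 0; i < MODEL_NID_ADDR_BYTES(nid); i++)
 *		if (nid->nid_addr[i])
 *			return 1;
 */

/* Fixed shape: view the words as bytes and scan exactly that many. */
static int model_nid_addr_is_set(const struct model_nid *nid)
{
	const uint8_t *addr = (const uint8_t *)nid->nid_addr;
	int i;

	assert(MODEL_NID_ADDR_BYTES(nid) <= (int)sizeof(nid->nid_addr));
	for (i = 0; i < MODEL_NID_ADDR_BYTES(nid); i++)
		if (addr[i])
			return 1;
	return 0;
}
```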
Fixes:
14cdcd61985 ("LU-13642 lnet: Allow IP specification")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I220bc9d2adad09225ce44f7c1b96fba5a8f6dd26
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54338
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Etienne AUJAMES [Thu, 7 Mar 2024 17:27:13 +0000 (12:27 -0500)]
LU-12452 lnet: inherit default lnd tunings from modparams
When a network is added dynamically (via Netlink), LNet assumes that
all "unset" or default LND parameters are 0. But in some cases, 0 is
a valid value.
This patch modifies the callback lnet_lnd.lnd_nl_set() to set default
values: a NULL attr argument means "set the default value".
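The "NULL attr means default" convention keeps an explicit 0 distinguishable from "not specified". A minimal sketch with a simplified, hypothetical signature (the real lnet_lnd.lnd_nl_set() callback takes Netlink attributes); `modparam_timeout` and its value of 50 are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module-parameter default, e.g. a timeout in seconds. */
static int modparam_timeout = 50;

struct model_tunables {
	int timeout;
};

/*
 * A NULL attribute means "use the module-parameter default"; a non-NULL
 * attribute is applied as given, even when its value is 0.
 */
static void model_nl_set_timeout(struct model_tunables *t, const int *attr)
{
	assert(t != NULL);
	t->timeout = attr ? *attr : modparam_timeout;
}
```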
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ifb91ae63d96131ed87d9fae7d91b8b18df4c9ce9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Wed, 23 Aug 2023 22:46:53 +0000 (18:46 -0400)]
LU-13814 clio: Improve cl_io_submit_sync comments
Add notes on what cl_io_submit_sync is for.
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5e32f1a7e6893b63d82f14848a865f90d30fb079
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Fri, 23 Feb 2024 15:58:47 +0000 (10:58 -0500)]
LU-13814 osc: Remove most uses of oap_obj
Removing most uses of oap_obj makes it easier to do the
upcoming transient page cl_page removal.
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic8acaed2ce3c6831f9a0d2bd13d859b9c564efdd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52072
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Fri, 15 Mar 2024 15:19:08 +0000 (11:19 -0400)]
LU-17638 util: remove newer lnetctl export handling
On the current Maloo VMs, lnetctl export segfaults. For now, go back
to the original code until we figure out what is different about this
setup, since the new code works elsewhere. Only a partial revert is
done because other important work that is ready to land would be
delayed by a full revert.
Fixes:
d3ef8f6993 ("LU-9680 lnet: add NLM_F_DUMP_FILTERED support")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Ibd3437ee619cde9667d049455d641a602ea50174
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54436
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Timothy Day [Sat, 2 Mar 2024 21:36:04 +0000 (21:36 +0000)]
LU-17600 lnet: delete lbstats and lnetunload
It's not likely that anyone still uses these scripts.
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I418bdf2a1428905d598fdffdf27dff80831350d0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54250
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Thu, 22 Feb 2024 18:46:12 +0000 (13:46 -0500)]
LU-16752 test: improve sanity 413a/b reliability
Set qos_maxage to 1 early in test_qos_mkdir() to ensure that statfs
data is updated in the round-robin mkdir test, so that the subsequent
QoS mkdir behaves as expected.
Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Fixes:
233344d451 ("LU-13417 test: generate uneven MDTs early for sanity 413")
Fixes:
c1d0a355a6 ("LU-12624 lod: alloc dir stripes by QoS")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I08f94b5b4e355ffff0704bd0f661bb99a82a9234
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54168
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Wed, 21 Feb 2024 20:00:14 +0000 (15:00 -0500)]
LU-16695 llite: remove O_APPEND check for sync
A check for O_APPEND was accidentally introduced when determining
whether a write is 'sync'. This forces all O_APPEND writes to be
synchronous, which is wrong.
Fixes:
dad7079dfd ("LU-16695 llite: switch to ki_flags from f_flags")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iafae63ebda527834bd45d6fcbfb0cebb0340f4e4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54128
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Aurelien Degremont [Tue, 20 Feb 2024 11:46:03 +0000 (12:46 +0100)]
LU-17566 mdt: remove duplicate call to mdt_init_ucred_reint()
Remove duplicate call to mdt_init_ucred_reint() from
mdt_reint_setxattr().
mdt_init_ucred_reint() is called in mdt_reint_internal(), which
covers all actual reint handlers. However, when SETXATTR was converted
to the reint framework in fd908da, this call was not removed.
As a result, mdt_init_ucred_reint() is called first in
mdt_reint_internal() and then again in the specific
mdt_reint_setxattr() handler, with nothing special done to the ucred
in between.
Also merge __mdt_init_ucred() and mdt_init_cred(), which was
called only once and has the same prototype.
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: I90fed1d2709edf7337a27dd9c3cb0f75f7625135
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Bruno Faccini <bfaccini@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lei Feng [Wed, 31 Jan 2024 08:39:55 +0000 (16:39 +0800)]
LU-17490 tests: verify fanotify works for lustre
The fanotify API provides notification and interception of filesystem
events. Add a small utility that monitors open/read/write/close events
on files in a filesystem, and verify that it works on Lustre.
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: Id57a59bca16133db645e6804024cba9f11d60f1d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53869
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Tue, 16 Jan 2024 19:18:30 +0000 (14:18 -0500)]
LU-17434 lmv: add exclude list for remote dir
Apache Spark creates a _temporary subdirectory for staging files, and
it should be created on the same MDT as its parent directory. Add a
tunable, lmv.*.qos_exclude_prefixes: if a directory name starts with
a prefix in this list, lmv_create() puts the directory on its
parent's MDT.
This prefix list follows the same rules as the shell PATH environment
variable, using ':' to separate prefixes. For convenience, '+' and '-'
can be used to add and remove prefixes.
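A sketch of PATH-style prefix-list handling, assuming a plain value replaces the list, a leading '+' or '-' applies to every ':'-separated item, and matching is a prefix test on the new directory's name. This is not the lmv implementation; the fixed-size `prefix_list` and all function names are hypothetical, and the real parsing rules may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PREFIXES 16
#define MAX_PREFIX_LEN 64

/* Hypothetical in-memory prefix list; not the lmv implementation. */
struct prefix_list {
	char items[MAX_PREFIXES][MAX_PREFIX_LEN];
	int count;
};

static int find_prefix(const struct prefix_list *l, const char *p)
{
	for (int i = 0; i < l->count; i++)
		if (strcmp(l->items[i], p) == 0)
			return i;
	return -1;
}

static void add_prefix(struct prefix_list *l, const char *p)
{
	if (*p && find_prefix(l, p) < 0 && l->count < MAX_PREFIXES)
		snprintf(l->items[l->count++], MAX_PREFIX_LEN, "%s", p);
}

static void remove_prefix(struct prefix_list *l, const char *p)
{
	int i = find_prefix(l, p);

	if (i < 0)
		return;
	l->count--;
	/* shift the following entries down over the removed one */
	memmove(l->items[i], l->items[i + 1],
		(size_t)(l->count - i) * MAX_PREFIX_LEN);
}

/* "a:b" replaces the list, "+a:b" adds items, "-a:b" removes items. */
static void set_prefixes(struct prefix_list *l, const char *val)
{
	char buf[MAX_PREFIXES * MAX_PREFIX_LEN];
	char op = val[0];
	char *tok, *save;

	assert(l != NULL);
	if (op == '+' || op == '-')
		val++;
	else
		l->count = 0;

	snprintf(buf, sizeof(buf), "%s", val);
	for (tok = strtok_r(buf, ":", &save); tok;
	     tok = strtok_r(NULL, ":", &save)) {
		if (op == '-')
			remove_prefix(l, tok);
		else
			add_prefix(l, tok);
	}
}

/* Should a new directory with this name stay on its parent's MDT? */
static bool name_excluded(const struct prefix_list *l, const char *name)
{
	for (int i = 0; i < l->count; i++)
		if (strncmp(name, l->items[i], strlen(l->items[i])) == 0)
			return true;
	return false;
}
```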
Add sanity 413k.
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I4c8a118f0630c19054934a87bee3599bdb1fe7bb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53780
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Wed, 15 Nov 2023 10:22:13 +0000 (11:22 +0100)]
LU-17175 gss: start lsvcgssd from l_getauth
If l_getauth detects that it cannot connect to the socket that
lsvcgssd is supposed to open, it tries to launch the daemon with
predefined default values.
Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I3961ce0f548fb6ea23458edcb01a03fb8b3a617f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53142
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sergey Cheremencev [Mon, 9 Oct 2023 02:45:16 +0000 (06:45 +0400)]
LU-17179 tests: check the system is clean
Most tests cannot work correctly if the system is not clean,
so check this at the beginning of sanity-quota.
Test-Parameters: trivial
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ibfbe4663dee8476486e96eb99ccbcea13216861b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52630
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Thu, 29 Feb 2024 02:34:17 +0000 (21:34 -0500)]
LU-9859 lnet: move CPT handling to LNet
The CPT code is used by LNet and by ptlrpc, which is Lustre's LNet
interface. Move this code into LNet and merge the lib-mem.c code as
well, since the two work closely together. Also move the CPT debugfs
handling from libcfs to LNet, so that all remaining debugfs entries
in libcfs are for debugging.
Test-Parameters: trivial
Change-Id: I016a90520bd7c6428b45bafff8618bc864e9112b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Qian Yingjin [Tue, 5 Mar 2024 03:40:31 +0000 (22:40 -0500)]
LU-14361 statahead: add connect flag check for batch RPC
The sanity/test_123 series of test cases all fail against servers
that do not have batch RPC support.
This patch adds a check of mdc.*.import for the batch RPC connect
feature flag and skips the batch stat-ahead tests when the flag is
absent.
Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I54c95722df803131727e5882156570c9da5293ee
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Mr. NeilBrown [Fri, 23 Feb 2024 21:07:35 +0000 (16:07 -0500)]
LU-8066 obdclass: fix module load locking.
Safe module loading requires that we call try_module_get() in a context
where the module cannot be unloaded, typically protected by
a spinlock that module-unload has to take.
This doesn't currently happen in class_get_type().
As free_module() calls synchronize_rcu() between calling the
exit function and freeing the module, we can use rcu_read_lock()
to check if the exit function has been called, and try_module_get()
if it hasn't.
We must also check the return status of try_module_get().
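The "refuse new references once unloading has started, and check the result" pattern can be modelled in userspace. This is a hypothetical sketch, not the obdclass or kernel module code: it omits the RCU part entirely (in the kernel, rcu_read_lock() is what keeps the module structure from being freed while it is checked; here it is simply never freed), and `model_try_get()` is only similar in spirit to try_module_get().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a loadable module's lifetime; not kernel code. */
struct model_module {
	atomic_bool going;	/* set once the exit function has started */
	atomic_int refcnt;
};

/* Like try_module_get(): refuse new references once unloading has begun. */
static bool model_try_get(struct model_module *m)
{
	atomic_fetch_add(&m->refcnt, 1);
	if (atomic_load(&m->going)) {
		atomic_fetch_sub(&m->refcnt, 1);
		return false;
	}
	return true;
}

/* Hypothetical registered type owned by a module, loosely like obd_type. */
struct model_type {
	const char *name;
	struct model_module *owner;
};

/*
 * Return t with a module reference held, or NULL if its module is going
 * away. Callers must check for NULL: using the type without a reference
 * is exactly what ignoring the try_get result would allow.
 */
static struct model_type *model_get_type(struct model_type *t)
{
	if (t == NULL || !model_try_get(t->owner))
		return NULL;
	return t;
}

static void model_put_type(struct model_type *t)
{
	int old = atomic_fetch_sub(&t->owner->refcnt, 1);

	assert(old > 0);
}
```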
Linux-commit:
71707c276e0acff160e7f2bd38d5b61eb1f61ab2
Change-Id: Ia551a951db8fd97db51140123d340b1649a159cd
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/36043
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Thu, 29 Feb 2024 10:05:19 +0000 (15:35 +0530)]
LU-6142 mdd: Fix style issues for mdd_dir.c
This patch fixes issues reported by checkpatch
for file lustre/mdd/mdd_dir.c
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I255a2cfe2dd7c09ce421cb7c5387cef0bba73611
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54217
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Thu, 29 Feb 2024 00:41:08 +0000 (06:11 +0530)]
LU-6142 lfsck: Fix style issues for lfsck_striped_dir.c
This patch fixes issues reported by checkpatch
for file lustre/lfsck/lfsck_striped_dir.c
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I6469e5973a5ee33c408ced48bb9ab162307fdf07
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54214
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Wed, 28 Feb 2024 21:43:43 +0000 (03:13 +0530)]
LU-6142 lfsck: Fix style issues for lfsck_namespace.c
This patch fixes issues reported by checkpatch
for file lustre/lfsck/lfsck_namespace.c
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie415d9ace24adaa845a4298499128b2766dc66aa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54213
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Wed, 28 Feb 2024 22:50:21 +0000 (04:20 +0530)]
LU-6142 lfsck: Fix style issues for lfsck_engine.c
This patch fixes issues reported by checkpatch
for file lustre/lfsck/lfsck_engine.c
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Icf9941210e7e403088ac9216de38f8c49f52e72e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54212
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Fri, 23 Feb 2024 16:44:30 +0000 (22:14 +0530)]
LU-6142 lfsck: Fix style issues under lustre/lfsck
This patch fixes issues reported by checkpatch
for all files under folder lustre/lfsck
Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7fed1e66f82c691d94198390ad89e91db9bfcdea
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54165
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bruno Faccini [Fri, 23 Feb 2024 12:16:36 +0000 (13:16 +0100)]
LU-17578 lnet: fix &the_lnet.ln_mt_peerNIRecovq race
To avoid a race, &the_lnet.ln_mt_peerNIRecovq must always be
accessed under lnet_net_lock(0) protection.
Test-Parameters: trivial
Fixes: da23037 ("LU-16563 lnet: use discovered ni status to set initial health")
Change-Id: Ic5e0194020200afdecba4cbf5afed274b14da388
Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54163
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
James Simmons [Wed, 28 Feb 2024 19:34:38 +0000 (14:34 -0500)]
LU-9680 lnet: add NLM_F_DUMP_FILTERED support
In addition to the different API levels for the netlink packets, we
can also filter the data sent back when userland sends the
NLM_F_DUMP_FILTERED flag. Support this across the various netlink
dumpit functions.
This work is needed for proper support of the lnetctl export
command. Update the export command to work with the Netlink API. This
results in proper IPv6 support for the export command.
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I0e8993b1f9a08199f282965601781aa6fd0e4844
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53004
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Fri, 23 Feb 2024 15:56:00 +0000 (10:56 -0500)]
LU-13814 osc: Remove oap_request
oap_request isn't actually per-page, it's per-request, so
move it up and associate it with the request async args.
The goal is to shift away from page lists at the RPC level
for DIO.
The first step of this is to move everything that can be
moved from osc_async_page to the osc_brw_async_args level.
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I188039b0abd4b639755dbebfab02597da13d5ddf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52071
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Chris Horn [Wed, 31 Jan 2024 04:00:47 +0000 (22:00 -0600)]
LU-10391 obd: Update lmd_parse to handle IPv6 NIDs
lmd_find_delimiter()/lmd_parse_nidlist() are updated to handle IPv6
NIDs.
class_parse_value() is updated to handle IPv6 NIDs that begin with
'::'. Also, when parsing a Lustre network name, the buffer needn't
contain an '@', nor do we want to search for '@' when locating the
next delimiter.
Fixes: 101f6e8 ("LU-10391 obdclass: handle large NIDs for mount strings")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib9e5a0d161babfea368989dd9521d923ec592185
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexey Lyashkov [Tue, 6 Feb 2024 14:58:04 +0000 (17:58 +0300)]
LU-16011 lnet: use preallocate bulk for server
The server side wants preallocated bulk buffers to avoid heavy lock
contention on the page cache.
Without them, LST is limited to 35Gb/s on a 3-rail host (HDR each)
due to high CPU usage.
Preallocated bulks increase memory consumption for small bulks,
but performance improves dramatically, up to 74Gb/s, with very low
CPU usage.
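The trade-off described above can be illustrated with a minimal sketch of a preallocated buffer pool. All buffers are allocated once at setup and then recycled, so the hot path never touches the allocator. The names bulk_pool, bulk_pool_init(), bulk_pool_get(), bulk_pool_put() and bulk_pool_fini() are hypothetical. The sketch is also single-threaded, whereas a real server would need per-CPU pools or locking.

```c
#include <stdlib.h>

/* Hypothetical fixed-size buffer pool (not Lustre's implementation). */
struct bulk_pool {
	void **free_bufs;  /* stack of buffers not currently in use */
	size_t nfree;      /* number of buffers on the stack */
	size_t capacity;   /* total number of buffers owned by the pool */
};

/* Free the buffers currently held by the pool; call once all are returned. */
void bulk_pool_fini(struct bulk_pool *p)
{
	while (p->nfree > 0)
		free(p->free_bufs[--p->nfree]);
	free(p->free_bufs);
	p->free_bufs = NULL;
	p->capacity = 0;
}

/* Pay the memory cost up front: allocate every buffer at setup time. */
int bulk_pool_init(struct bulk_pool *p, size_t count, size_t bufsize)
{
	p->nfree = 0;
	p->capacity = count;
	p->free_bufs = calloc(count, sizeof(*p->free_bufs));
	if (p->free_bufs == NULL)
		return -1;
	for (size_t i = 0; i < count; i++) {
		void *buf = malloc(bufsize);

		if (buf == NULL) {
			bulk_pool_fini(p);
			return -1;
		}
		p->free_bufs[p->nfree++] = buf;
	}
	return 0;
}

/* Hot path: pop a ready buffer, or NULL if the pool is exhausted. */
void *bulk_pool_get(struct bulk_pool *p)
{
	return p->nfree > 0 ? p->free_bufs[--p->nfree] : NULL;
}

/* Return a buffer to the pool for reuse. */
void bulk_pool_put(struct bulk_pool *p, void *buf)
{
	if (p->nfree < p->capacity)
		p->free_bufs[p->nfree++] = buf;
}
```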
Test-Parameters: testgroup=review-ldiskfs-arm testlist=sanity-lnet,lnet-selftest
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Icf396ba2ecfbded807b5722bb2c4cbe4d0084300
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50276
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mikhail Pershin [Wed, 6 Mar 2024 09:12:33 +0000 (12:12 +0300)]
LU-17611 utils: fix wrong static declarations
Revert wrong changes made to zfs mount utils
Fixes:
c7e9bdf8d4 ("LU-8191 utils: remove unused, fix non-static functions")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I162d349ebadbf93a89abf49bd41465979d561423
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54293
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Thu, 22 Feb 2024 06:07:05 +0000 (06:07 +0000)]
LU-17576 nodemap: remove nodemap_rbtree.c
Remove nodemap_rbtree.c. This was a port of an
in-progress rbtree patch to the kernel. Every
kernel that Lustre supports should have the needed
macro. The rest of the stuff in the file is unused.
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I25d6eea90a1e9b983fb0be690384e50c6808cb7b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54136
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
James Simmons [Thu, 22 Feb 2024 00:57:21 +0000 (19:57 -0500)]
LU-14391 utils: handle very large YAML data sets.
Some functionality for Lustre and even LNet can return huge
amounts of Netlink data that can overwhelm the internal libyaml
buffers. To resolve this, we create a resizable internal
buffer to collect all the Netlink data that is formatted into
YAML. After the message has been completed, we feed this
data in chunks small enough for libyaml's smaller internal
buffer to handle. The libyaml internal buffer is a rolling buffer,
so it will be updated when we exceed its internal size. This
allows collecting every single type of Lustre stat in one go,
even for sites that have very large LNet router setups.
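The collect-then-feed approach above can be sketched in a few lines of C. The output is first accumulated in a growable buffer and then handed to a consumer in bounded chunks. The names growbuf, growbuf_append(), growbuf_feed() and the example consumer count_bytes() are hypothetical. With libyaml itself, the consumer side would more naturally be a pull-style read handler registered through yaml_parser_set_input().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that collects all formatted output first. */
struct growbuf {
	char *data;
	size_t len;
	size_t cap;
};

/* Append n bytes, doubling the capacity as needed. */
int growbuf_append(struct growbuf *b, const char *src, size_t n)
{
	if (b->len + n > b->cap) {
		size_t newcap = b->cap ? b->cap : 64;

		while (newcap < b->len + n)
			newcap *= 2;
		char *p = realloc(b->data, newcap);
		if (p == NULL)
			return -1;
		b->data = p;
		b->cap = newcap;
	}
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}

/* Feed the collected data in chunks of at most chunk_size bytes;
 * returns the number of chunks delivered. */
size_t growbuf_feed(const struct growbuf *b, size_t chunk_size,
		    void (*consume)(const char *chunk, size_t n, void *arg),
		    void *arg)
{
	size_t off = 0, calls = 0;

	while (off < b->len) {
		size_t n = b->len - off < chunk_size ? b->len - off : chunk_size;

		consume(b->data + off, n, arg);
		off += n;
		calls++;
	}
	return calls;
}

/* Example consumer: just counts the bytes it receives. */
static void count_bytes(const char *chunk, size_t n, void *arg)
{
	(void)chunk;
	*(size_t *)arg += n;
}
```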
Test-Parameters: trivial
Change-Id: I20fdbb19b0f3de3ab52e8ad568c6926f61f627b9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54132
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Mon, 19 Feb 2024 18:40:52 +0000 (01:40 +0700)]
LU-17560 build: with kfi check for kfabric or cray-kfabric
The default location of kfabric is either
/usr/src/kfabric
or
/usr/src/cray-kfabric
Check for either during rpmbuild --with kfi
Test-Parameters: trivial
HPE-bug-id: LUS-12160
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ibcd21335d554b66ec925619c60e61f87d79be63d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54097
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Shaun Tancheff [Sat, 17 Feb 2024 04:22:23 +0000 (11:22 +0700)]
LU-17552 kernel: el9 rhashtable for revoke records in jbd2
The resizable hashtable journal replay is applicable to the
el9.x series of kernels as well.
Lustre-commit:
c3bb2b778d6b40a5cecb01993b55fcc107305b4a
Lustre-change: https://review.whamcloud.com/45122
Lustre-author: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I5a8fae6c14d50337c0c454e301112f100f611ab0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54085
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Serguei Smirnov [Fri, 16 Feb 2024 19:01:21 +0000 (11:01 -0800)]
LU-17476 lnet: use bits only to match ME in all cases
If NIDs belong to the same peer and matchbits are matching,
declare a match even if matchbits are matched as not available
or ignored.
Test-Parameters: testlist=sanity env=ONLY=350,ONLY_REPEAT=10
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I394c492381a2d069b34516c473220192df05fbd2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54082
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Bruno Faccini [Thu, 15 Feb 2024 18:07:00 +0000 (19:07 +0100)]
LU-17545 lnet: use unsafe_memcpy() when flexible array
To avoid false-positive <memcpy: detected field-spanning write
(size 64) of single field "&lp->lp_data->pb_info" at
.../lnet/lnet/peer.c:2456 (size 16)> error messages.
Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Change-Id: I4e2fc58e31f60b434a9050393cd65b89c54f0798
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54069
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Yang Sheng [Tue, 30 Jan 2024 13:14:33 +0000 (21:14 +0800)]
LU-17276 tests: Enqueue same range flocks
The ldlm_interval buffer will be released and referenced
between flocks or extent locks. Add a test case to trigger
such a scenario.
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ifa23ba7c16ad8601b1e3e7891a136589ea44e54a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53881
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Serguei Smirnov [Sat, 27 Jan 2024 20:17:34 +0000 (12:17 -0800)]
LU-17476 lnet: prefer to use bits only to match ME
In some cases, it has been observed that a reply will arrive
at the portal with the correct match bits, but is dropped by
lnet_parse_put(). This appears to happen with LNet Multi-Rail
peers, each having two separate NIDs.
If a reply arrives with matchbits available and matching, but
the NIDs don't match, confirm the match if the NIDs are found
to belong to the same peer. This will only happen in cases
where the reply would be dropped entirely, causing hundreds of
seconds of delay until the RPC is resent, so the extra overhead
of checking for a peer match before dropping the request is
only in the error path and minimal compared to the alternative.
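The decision described above can be modeled as a small predicate. Match bits must always agree. An exact NID match is the fast path, and a same-peer check is tried only when the NIDs differ, which is where the reply would otherwise be dropped. The types and the same_peer() lookup are hypothetical simplifications and are not LNet's data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: each NID records which peer owns it. */
struct nid_peer {
	uint64_t nid;
	int peer_id;
};

/* Stand-in for a peer lookup: do both NIDs belong to the same peer? */
static bool same_peer(const struct nid_peer *a, const struct nid_peer *b)
{
	return a->peer_id == b->peer_id;
}

bool me_matches(const struct nid_peer *me_src, uint64_t me_bits,
		const struct nid_peer *msg_src, uint64_t msg_bits)
{
	if (me_bits != msg_bits)
		return false;              /* match bits must always agree */
	if (me_src->nid == msg_src->nid)
		return true;               /* fast path: exact NID match */
	/* Error path only: accept another NID owned by the same peer. */
	return same_peer(me_src, msg_src);
}
```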
Add CFS_FAIL_CHECK() for exercising the match NIDs code.
That is in a hot codepath, but CFS_FAIL_CHECK() is marked unlikely()
and this check is in the error case and _should_ only be hit when
the message would have been dropped anyway, so it seems unlikely to
impact performance in any meaningful way.
Test-Parameters: testlist=sanity env=ONLY=350,ONLY_REPEAT=10
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I10e1a2142539ddf5dabc26ce962cec1f2cfcf3db
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53843
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Tue, 23 Jan 2024 13:29:11 +0000 (14:29 +0100)]
LU-16307 tests: fix sanity-sec test_31
In order to improve sanity-sec test_31 resiliency, reorganize the way
the new LNet '999' is handled, and make sure everything is correctly
cleaned up after the test.
The test is also updated to handle IPv6 and numeric NIDs, and it has
been tweaked to run out of tree.
Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idd657c7555e598d0ebc08387eac537b1c73e35bd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53818
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Thu, 14 Dec 2023 01:56:53 +0000 (18:56 -0700)]
LU-14518 libcfs: print CFS_FAIL_CHECK() location
Print the file/function/line where cfs_fail_loc is hit.
This allows better debugging of issues with this code.
This adds the CDEBUG_LIMIT_LOC() macro to allow printing
the location passed to the caller instead of the function,
file, and line number where the macro is located.
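The "print the caller's location" idea can be sketched as follows. A wrapper macro captures __FILE__, __func__ and __LINE__ at the call site and passes them to a helper, so the message names the caller and not the helper. LOG_HERE() and log_at() are hypothetical names and are not the CDEBUG_LIMIT_LOC() macro itself.

```c
#include <stdio.h>
#include <string.h>

/* Format a message tagged with a caller-supplied location,
 * keeping only the basename of the file. */
int log_at(char *out, size_t outsz, const char *file, const char *func,
	   int line, const char *msg)
{
	const char *base = strrchr(file, '/');

	base = base ? base + 1 : file;
	return snprintf(out, outsz, "%s:%d:%s() %s", base, line, func, msg);
}

/* Expands at the call site, so the location is the caller's. */
#define LOG_HERE(out, outsz, msg) \
	log_at((out), (outsz), __FILE__, __func__, __LINE__, (msg))
```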
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieadace61b014d3576c0535f181256c728c7ec6f8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53451
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Sat, 17 Feb 2024 12:27:15 +0000 (13:27 +0100)]
LU-17175 gss: background speedtest for Miller-Rabin rounds
The number of rounds used for Miller-Rabin testing of the prime
provided as input parameter to DH_check() is evaluated when the
lsvcgssd daemon starts. This speed test takes between 5 and 10 seconds
so it makes sense to run it in the background.
Any prime tested before the right number of rounds has been determined
would use the default from OpenSSL. This can lead to longer request
processing time, but this is only for a temporary and short period of
time.
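The background approach above can be sketched with a worker thread that publishes its result through an atomic variable. Request handlers read the current value without waiting and see the default until the measurement lands. DEFAULT_ROUNDS (0 taken to mean "use the library default"), calibrate_rounds(), start_calibration() and current_rounds() are hypothetical. The timing measurement itself is elided: the thread simply publishes the value it is given.

```c
#include <pthread.h>
#include <stdatomic.h>

#define DEFAULT_ROUNDS 0  /* hypothetical sentinel: library default */

static atomic_int mr_rounds = DEFAULT_ROUNDS;

/* Worker: a real version would time primality checks here. */
static void *calibrate_rounds(void *arg)
{
	int measured = *(int *)arg;  /* placeholder for the measured value */

	atomic_store(&mr_rounds, measured);
	return NULL;
}

/* Start the speed test in the background; the daemon keeps serving. */
int start_calibration(pthread_t *tid, int *result)
{
	return pthread_create(tid, NULL, calibrate_rounds, result);
}

/* Request handlers never block: they use whatever value is current. */
int current_rounds(void)
{
	return atomic_load(&mr_rounds);
}
```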
Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: If77f4374c5af463fdadd15979a594af1786af1df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54088
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Wed, 18 Oct 2023 17:00:39 +0000 (17:00 +0000)]
LU-17217 obd: reserve server-side connection policy bits
Reserve bits for a new mechanism enabling the server to refuse
client connections based on arbitrary criteria that Lustre
admins can define via policies.
The policies envisioned currently are:
ALLOW (allow connection to proceed as normal)
WARN (generate a warning on the client side)
SOFT_BLOCK (block mounting unless client overrides)
HARD_BLOCK (block mounting entirely)
In order to SOFT_BLOCK a client, servers need to be able to
differentiate between clients that support scp and those that
don't. Older clients would not have the mechanism to
override a server-side SOFT_BLOCK, so they would be HARD_BLOCKed
instead.
We also need a bitmask for the client/server to communicate
policy opinions (e.g. don't soft-block me).
Therefore, this patch reserves:
1) OBD_CONNECT2_CONN_POLICY
2) 8 bits of obd_connect_data
It also explicitly defines the use of some of the bits
via an enum.
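The four policies listed above can be shown as an illustrative decision function. It also shows why an older client, which cannot override, ends up hard-blocked under SOFT_BLOCK. The enum names and values, conn_result, and conn_policy_check() are hypothetical and do not reflect the bits actually reserved in obd_connect_data.

```c
#include <stdbool.h>

/* Hypothetical encoding of the four policies. */
enum conn_policy {
	CONN_POLICY_ALLOW,
	CONN_POLICY_WARN,
	CONN_POLICY_SOFT_BLOCK,
	CONN_POLICY_HARD_BLOCK,
};

enum conn_result { CONN_OK, CONN_OK_WARN, CONN_REFUSED };

/* client_has_scp: client advertised support for connection policies;
 * client_override: client asked not to be soft-blocked. */
enum conn_result conn_policy_check(enum conn_policy pol, bool client_has_scp,
				   bool client_override)
{
	switch (pol) {
	case CONN_POLICY_ALLOW:
		return CONN_OK;
	case CONN_POLICY_WARN:
		return CONN_OK_WARN;
	case CONN_POLICY_SOFT_BLOCK:
		/* Older clients cannot override, so this acts as a hard block. */
		return (client_has_scp && client_override) ? CONN_OK : CONN_REFUSED;
	case CONN_POLICY_HARD_BLOCK:
	default:
		return CONN_REFUSED;
	}
}
```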
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If717045728e516eece7c2d812f8ee6e7ebba9497
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52793
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Sat, 23 Sep 2023 18:22:25 +0000 (13:22 -0500)]
LU-17137 utils: Deprecate l_getidentity 'files' alias
The use of 'files' masks the nss module files.
The pseudo module lustre should be used instead of files now.
The 'files' alias should be deprecated and a periodic warning
written (once per hour).
In addition, a file /etc/lustre/perm.conf-warning is created with
the warning that 'lookup lustre' should be used and 'files'
is deprecated.
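The "once per hour" warning above can be produced with a simple time-based rate limiter. The current time is passed in, so that callers would typically use time(NULL) and tests can use fixed values. warn_limit and warn_should_print() are hypothetical names and are not taken from l_getidentity.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical rate limiter: allow a warning at most once per interval. */
struct warn_limit {
	bool warned;      /* has the warning been printed at least once? */
	time_t last;      /* time of the last printed warning */
	time_t interval;  /* minimum seconds between warnings, e.g. 3600 */
};

bool warn_should_print(struct warn_limit *wl, time_t now)
{
	if (wl->warned && now - wl->last < wl->interval)
		return false;
	wl->warned = true;
	wl->last = now;
	return true;
}
```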
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8167c8827cb4e2120404c08c3f10220f13087a2f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52486
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Mr NeilBrown [Wed, 9 Aug 2023 06:11:51 +0000 (16:11 +1000)]
LU-17022 osd: convert od_connects to atomic_t
od_connects in ldiskfs is protected with od_osfs_lock;
in zfs it is protected with obd->obd_dev_lock.
If we convert it to atomic_t, we get cheaper locking and less
confusion.
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4e28c22c8988c7f6a5e67064073541e917a209db
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51907
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mr NeilBrown [Sun, 25 Feb 2024 23:41:46 +0000 (18:41 -0500)]
LU-17022 obdclass: convert obd_conn_inprogress to atomic_t
Using atomic_t for obd_conn_inprogress means we don't need to take a
spinlock.
Also send wakeup when value reaches zero, and wait for the wakeup
instead of using a yield() loop.
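The counter-plus-wakeup pattern above can be sketched in userspace with C11 atomics and a pthread condition variable. The thread that drops the count to zero wakes any waiters, and a waiter sleeps instead of spinning. The waker takes the mutex before broadcasting, so a waiter that has just checked the count cannot miss the wakeup. Kernel code would typically use a wait queue (for example wait_var_event()/wake_up_var()) instead. inprogress, conn_begin(), conn_end(), conn_wait_idle() and conn_count() are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical in-progress counter (not obd_conn_inprogress itself). */
static atomic_int inprogress;
static pthread_mutex_t idle_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t idle_cond = PTHREAD_COND_INITIALIZER;

void conn_begin(void)
{
	atomic_fetch_add(&inprogress, 1);  /* no spinlock needed */
}

void conn_end(void)
{
	/* fetch_sub returns the old value: 1 means the count just hit zero. */
	if (atomic_fetch_sub(&inprogress, 1) == 1) {
		pthread_mutex_lock(&idle_lock);
		pthread_cond_broadcast(&idle_cond);
		pthread_mutex_unlock(&idle_lock);
	}
}

/* Sleep until nothing is in progress, instead of a yield() loop. */
void conn_wait_idle(void)
{
	pthread_mutex_lock(&idle_lock);
	while (atomic_load(&inprogress) != 0)
		pthread_cond_wait(&idle_cond, &idle_lock);
	pthread_mutex_unlock(&idle_lock);
}

int conn_count(void)
{
	return atomic_load(&inprogress);
}
```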
Change-Id: I9af29e068203cde951e592c408906d121702fa18
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51906
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Sat, 20 Jan 2024 18:19:11 +0000 (18:19 +0000)]
LU-8191 utils: remove unused, fix non-static functions
Remove several functions which are never called.
Static analysis shows that a number of functions
could be made static. This patch declares several
functions in various Lustre utils static.
Some missing headers caused some functions to be
incorrectly flagged as possible candidates for
being made static. These missing headers have
been added.
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id51f922be57c33c011ee2f9e509ca164cc480edf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51439
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Zhuravlev [Wed, 1 Mar 2023 18:28:25 +0000 (21:28 +0300)]
LU-10026 osd-ldiskfs: use preallocation for dense writes
Use the inode's preallocation chunks as per-inode group preallocation:
just grab the very first available blocks from the window.
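"Grab the very first available blocks from the window" can be illustrated with a reserved contiguous range that is consumed strictly from the front. This means consecutive writes land in adjacent blocks. prealloc_window and prealloc_take() are hypothetical and do not reflect ldiskfs/ext4 mballoc internals.

```c
#include <stdint.h>

/* Hypothetical per-inode preallocation window: blocks [start, start + len). */
struct prealloc_window {
	uint64_t start;
	uint64_t len;
};

/* Take up to 'want' blocks from the front of the window. Returns the
 * number of blocks granted and stores the first granted block in *first. */
uint64_t prealloc_take(struct prealloc_window *w, uint64_t want,
		       uint64_t *first)
{
	uint64_t got = want < w->len ? want : w->len;

	*first = w->start;
	w->start += got;
	w->len -= got;
	return got;
}
```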
Test-Parameters: env=ONLY=1000,ONLY_REPEAT=11 testlist=sanity-compr
Test-Parameters: env=ONLY=fsx,ONLY_REPEAT=11 testlist=sanity-compr
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9d36701f569f4c6305bc46f3373bfc054fcd61a9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50171
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Wed, 14 Feb 2024 17:41:32 +0000 (12:41 -0500)]
LU-15069 llite: Add RAS_CDEBUG in needed spots
Some of the basic readahead state controlling functions
don't dump the readahead state.
Fix that.
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia36a8437d1877a31bfc18c1b6a4170f31383ae66
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Wed, 14 Feb 2024 17:40:40 +0000 (12:40 -0500)]
LU-15069 llite: Rename 'skip' label
There's a goto label in ras_update named just "skip".
Skip what? This is extra confusing because the concept of
"skip index" is used in neighboring code, and this is
unrelated.
Give it a more descriptive name.
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1e6ec7a75b6d9a296bfdea4c70a333497d804564
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45653
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Patrick Farrell [Wed, 14 Feb 2024 17:40:21 +0000 (12:40 -0500)]
LU-15069 llite: Remove ras_set_start
ras_set_start is a one line function and serves only to
obfuscate how simple "set_start" actually is.
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I95d0b891ea2c88354dcb9e5b5a205cafa19380c7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45652
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Mon, 12 Feb 2024 16:56:00 +0000 (11:56 -0500)]
LU-15274 llite: whole file read fixes
There are two significant issues with whole file read.
1. Whole file read does not interact correctly with fast
reads - specifically, whole file read is not recognized by
the fast read code so files below the
"max_read_ahead_whole_mb" limit will not use fast reads.
This has a significant performance impact.
2. Whole file read does not start from the beginning of the
file, it starts from the current IO index. This causes
issues with unusual IO patterns, and can also confuse
readahead more generally (I admit to not fully understanding
what happens here, but the change is reasonable regardless.)
This is particularly important for cases where the read
doesn't start at the beginning of the file but still reads
the whole file (e.g., random or backwards reads).
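Point 2 above amounts to a range choice, sketched here in page units. If the whole file fits under the whole-file limit, read from page 0 to EOF. Otherwise, fall back to a window starting at the current IO index. ra_range and ra_pick_range() are hypothetical and are not llite's readahead code. The sketch assumes window_pages >= 1.

```c
#include <stdint.h>

/* Hypothetical readahead range in page indexes, end inclusive. */
struct ra_range {
	uint64_t start;
	uint64_t end;
};

struct ra_range ra_pick_range(uint64_t file_pages, uint64_t whole_limit_pages,
			      uint64_t io_index, uint64_t window_pages)
{
	struct ra_range r;

	if (file_pages > 0 && file_pages <= whole_limit_pages) {
		r.start = 0;                  /* whole file: begin at page 0 */
		r.end = file_pages - 1;
	} else {
		r.start = io_index;           /* normal window at current IO */
		r.end = io_index + window_pages - 1;
	}
	return r;
}
```

With 4 KiB pages, a 64 MiB file is 16384 pages and a 65 MiB file is 16640 pages, which matches the two cases in the performance data.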
Performance data:
max_read_ahead_whole_mb defaults to 64 MiB, so a 64 MiB
file is read with whole file, and a 65 MiB file is not.
Without this fix:
rm -f file
truncate -s 64M file
dd if=file bs=4K of=/dev/null
67108864 bytes (67 MB, 64 MiB) copied, 7.40127 s, 9.1 MB/s
rm -f file
truncate -s 65M file
dd if=file bs=4K of=/dev/null
68157440 bytes (68 MB, 65 MiB) copied, 0.0932216 s, 630 MB/s
Whole file readahead: 9.1 MB/s
Non whole file readahead: 630 MB/s
With this fix (same test as above):
Whole file readahead: 994 MB/s
Non whole file readahead: 630 MB/s (unchanged)
Fixes: 7864a68 ("LU-12043 llite,readahead: don't always use max RPC size")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I72f0b58e289e83a2f2a3868ef0d433a50889d4c0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54011
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Patrick Farrell [Mon, 12 Feb 2024 18:57:01 +0000 (13:57 -0500)]
LU-15033 llite: Increase default RA sizes
The default max_readahead_mb is 1/32 of all cached pages,
which doesn't make much sense but isn't usually a problem
since most real nodes have very high RAM or it is tuned to
larger values.
It is reduced further for the per-file limit, which is also
reasonable.
However, on test VMs with smaller RAM sizes, this results
in hilariously tiny max_read_ahead_per_file_mb values, like
20. This is small enough it causes extra misses because
two RPCs cannot be reliably sent. This edge case isn't
important for performance, but it makes small scale testing
of readahead nearly impossible.
To avoid this, we add a minimum readahead requirement of
256 MiB, which is used unless it's > half of RAM. This
should avoid this case on test VMs without changing the
behavior for real clients unless they are extremely small.
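The default-size rule above can be written out as arithmetic. Start from 1/32 of memory, then raise the result to a 256 MiB minimum unless that minimum would exceed half of RAM. This sketch assumes "all cached pages" is approximated by total RAM. default_max_readahead() is a hypothetical helper, and the further per-file reduction is not modeled.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical helper: default readahead size in bytes for a given RAM size. */
uint64_t default_max_readahead(uint64_t ram_bytes)
{
	uint64_t ra = ram_bytes / 32;
	uint64_t min_ra = 256 * MIB;

	/* Apply the minimum only if it is at most half of RAM. */
	if (ra < min_ra && min_ra <= ram_bytes / 2)
		ra = min_ra;
	return ra;
}
```

For example, a 4 GiB test VM would get 256 MiB instead of 128 MiB. A 256 MiB VM keeps 8 MiB because 256 MiB exceeds half its RAM. A 64 GiB client keeps 2 GiB.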
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ie8aab6b04ad520e4633d634d846e7ef23cc91ced
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46475
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Tue, 6 Feb 2024 17:45:53 +0000 (12:45 -0500)]
LU-10366 test: re-enable sanity test 410 for Ubuntu
For older Ubuntu, the pr_err() messages in the kinode module were not
making it to the dmesg ring buffer due to the default loglevel used
in their environment. Now that older versions of Ubuntu are dropped,
and thanks to the work of LU-17096 the kinode module is in its proper
place, sanity 410 should pass.
This patch also changes test_410 and kinode to load the module
successfully and unload it after the test is done. The Lustre spec
file is adjusted to accommodate this change.
Test-Parameters: trivial
Test-Parameters: testlist=sanity env=ONLY=410,ONLY_REPEAT=10
Test-Parameters: testlist=sanity env=ONLY=410,ONLY_REPEAT=10 clientdistro=ubuntu2204
Change-Id: Iac96efe64db721f9d7247a889f6e9bd4c7d45e2a
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Timothy Day <timday@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/31921
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sergey Cheremencev [Mon, 26 Feb 2024 13:29:49 +0000 (16:29 +0300)]
LU-16356 tests: add trap 0 in cleanup_echo_devs
"trap 0" was accidentally removed from cleanup_echo_devs
in "LU-16356 hsm: store crh in rhashtable instead of list".
Add it back.
Test-Parameters: trivial
Fixes:
dc13a56187 ("LU-16356 hsm: store crh in rhashtable instead of list")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I5d309a926f376165aacc8f0fe1c3b04dcd86f545
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54183
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Timothy Day [Sun, 25 Feb 2024 02:47:07 +0000 (02:47 +0000)]
LU-6142 lnet: SPDX for lnet/utils/
Convert from verbose license text to SPDX.
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I0568f692c6799834794ed9c565bdac7ec9aef1d3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Thu, 22 Feb 2024 04:46:16 +0000 (10:16 +0530)]
LU-6142 llite: Fix style issues for vvp_io.c
This patch fixes issues reported by checkpatch
for file lustre/llite/vvp_io.c
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia79639369e553d74f791d6a13a956240e4cdd82c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54135
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Arshad Hussain [Tue, 20 Feb 2024 10:56:13 +0000 (16:26 +0530)]
LU-6142 tests: Fix style issues under lustre/tests
This patch fixes issues reported by checkpatch
for all files under folder lustre/tests
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I93f18f737c219222593b9689cd3c1b5eaba7630d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54110
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Tue, 20 Feb 2024 08:28:28 +0000 (13:58 +0530)]
LU-6142 osd: Fix style issues for osd_iam.c
This patch fixes issues reported by checkpatch
for file lustre/osd-ldiskfs/osd_iam.c
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I727e5229a8ec89a496d878046c3b4f1a429be59d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54109
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Tue, 20 Feb 2024 09:12:49 +0000 (14:42 +0530)]
LU-6142 lmv: Fix style issues for lmv_obd.c
This patch fixes issues reported by checkpatch
for file lustre/lmv/lmv_obd.c
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I3d1f16b4b33bd6000855e93117b3f73344605e98
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54108
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Timothy Day [Tue, 20 Feb 2024 02:44:12 +0000 (02:44 +0000)]
LU-6142 contrib: git plugins
Git plugins to quickly see the progress towards
checkpatch cleanups in Lustre.
The original (git-tabcheck) was authored by
Andreas Dilger to help with space to tabs conversion.
The second (git-checkpatch) was modified from
the original script to help capture generic
checkpatch warnings.
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Iab7aaa70690d00f1b7dd5ebcd2412865dae34729
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54101
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Mon, 19 Feb 2024 10:37:00 +0000 (16:07 +0530)]
LU-6142 mgs: Fix style issues for mgc_handler.c
This patch fixes issues reported by checkpatch
for file lustre/mgs/mgc_handler.c
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I53781b40464676fb36b704bdfcc960d30e81acd1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54093
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Mon, 19 Feb 2024 09:20:57 +0000 (14:50 +0530)]
LU-6142 mgc: Fix style issues for mgc_request.c
This patch fixes issues reported by checkpatch
for file lustre/mgc/mgc_request.c
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I06c443716e527e38fa49cffcdbab03a40df5cffb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54092
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Mon, 12 Feb 2024 11:26:55 +0000 (16:56 +0530)]
LU-6142 llite: Fix style issues for llite_internal.h
This patch fixes issues reported by checkpatch
for file lustre/llite/llite_internal.h
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I366570f4789ab2803c736b80be80bc46bb136eba
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54007
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Bobi Jam [Fri, 26 Jan 2024 10:06:50 +0000 (18:06 +0800)]
LU-17482 llite: short read could mess up next read offset
When a read reaches EOF, it could read data from stale page cache;
in that case we need to restore iocb->ki_pos so that the next read
continues from the correct offset.
Fixes:
4468f6c9d9 ("LU-16025 llite: adjust read count as file got truncated")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib8b62c41bf65f8efec82dda53fcfbdb68ad08b38
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53827
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Sat, 24 Feb 2024 15:15:35 +0000 (10:15 -0500)]
LU-10391 lnet: support updating LNet local NI settings
The LNet API allows updating specific settings instead of a full new
configuration for NIs. We can accomplish this using NLM_F_REPLACE with
the LNET_CMD_NETS command. The only change for the user land tools is
that you can now use large NID addresses.
Another change in the user land tools is increasing the intf_name field
in size from IFNAMSIZ to LNET_MAX_STR_LEN, which requires increasing
err_str handling. This is because we use struct lnet_dlc_intf_descr
to store network addresses and/or network interfaces.
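As an illustration of where NLM_F_REPLACE sits in a generic Netlink request, here is a minimal sketch. It is not the liblnetconfig code: struct replace_req and build_replace_request() are hypothetical, and the family id and command number are caller-supplied placeholders (a real tool resolves the LNet family id at runtime and uses the LNET_CMD_NETS value from the LNet headers).

```c
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/* Hypothetical request layout: Netlink header followed by the generic
 * Netlink header; NI attributes would follow in a real message. */
struct replace_req {
	struct nlmsghdr nlh;
	struct genlmsghdr genl;
};

/* Fill in headers for a request that modifies an existing object.
 * family_id and cmd are placeholders supplied by the caller. */
static void build_replace_request(struct replace_req *req, uint16_t family_id,
				  uint8_t cmd, uint32_t seq)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
	req->nlh.nlmsg_type = family_id;
	/* NLM_F_REPLACE: update the existing entry instead of creating one */
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_REPLACE;
	req->nlh.nlmsg_seq = seq;
	req->genl.cmd = cmd;
	req->genl.version = 1;	/* placeholder family version */
}
```

Only the flags differ from a "create" request; the attributes carried after the headers decide which NI settings are changed.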
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Id334ed3a73ac6ec7a342d4616e32dcfef46907a7
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53560
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Courrier Guillaume [Thu, 17 Nov 2022 12:15:19 +0000 (13:15 +0100)]
LU-13048 mdd: allow release after a non-blocking migrate
lfs setstripe -i0 file
lfs hsm_archive file
lfs migrate -n -i1 file
lfs hsm_release file
These actions lead to "Cannot send HSM request ...: Operation not
permitted". This happens because of data version mismatch. This error
is returned by mdt_hsm_release() when the data versions are not the
same.
This patch only corrects the non-blocking migrations.
mdd_swap_layouts is updated to check and update the HSM archive
version when possible. The new and old data versions are added as
arguments to this function. If the old data version does not match
the data version in the HSM attribute, we don't update the HSM
attribute because we don't know what caused the inconsistency.
During a swap between a volatile and a regular file, if both objects
have an HSM xattr, mdd_swap_layouts was called from the MDT HSM layer
(release and restore). In this case, we want to swap the HSM xattr
(previously done using SWAP_LAYOUTS_MDS_HSM as a last argument to
mdd_swap_layouts).
If only the regular file has an HSM attribute, mdd_swap_layouts was
called after a migration (blocking or not). In this case, we want to
update the HSM archive version only if the file is not dirty and if
the new data version is provided.
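The archive-version rule for the post-migration case can be modelled as a small decision function. This is a hypothetical sketch, not mdd_swap_layouts() itself: struct hsm_state and maybe_update_arch_ver() are illustrative names, and using 0 for "no new data version provided" is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the HSM attribute of the regular file. */
struct hsm_state {
	uint64_t arch_ver;	/* data version recorded at archive time */
	bool dirty;		/* file modified since it was archived */
};

/* Returns true when arch_ver was updated to new_dv.
 * new_dv == 0 stands for "no new data version provided" (assumption). */
static bool maybe_update_arch_ver(struct hsm_state *hsm, uint64_t old_dv,
				  uint64_t new_dv)
{
	/* Unexplained mismatch: leave the HSM attribute untouched. */
	if (old_dv != hsm->arch_ver)
		return false;
	/* Dirty file or missing new version: nothing safe to record. */
	if (hsm->dirty || new_dv == 0)
		return false;
	hsm->arch_ver = new_dv;
	return true;
}
```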
Also, this patch removes the CL_LAYOUT event that was emitted for a
release. Since a CL_HSM event with the HE_RELEASE flag is also emitted,
the CL_LAYOUT event is unnecessary.
For "lfs swap_layouts", the operation is denied on 2 files with HSM
xattr (HSM xattr swap will cause inconsistencies).
With non-HSM file and archived file, the operation is allowed but the
dirty flag is set on the HSM file.
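The "lfs swap_layouts" policy above reduces to a three-way verdict. The sketch below is hypothetical (the enum and function are illustrative, not Lustre code), and treating the case where neither file has an HSM xattr as an ordinary swap is an assumption, since the text does not discuss it.

```c
#include <stdbool.h>

/* Hypothetical outcome of "lfs swap_layouts" as described above. */
enum swap_verdict {
	SWAP_DENY,		/* refuse the swap */
	SWAP_ALLOW,		/* plain swap, no HSM involved (assumed) */
	SWAP_ALLOW_SET_DIRTY,	/* swap, then mark the HSM file dirty */
};

static enum swap_verdict swap_layouts_verdict(bool a_has_hsm, bool b_has_hsm)
{
	/* Swapping two HSM xattrs would leave them inconsistent. */
	if (a_has_hsm && b_has_hsm)
		return SWAP_DENY;
	/* One archived file: its data changes, so the archive copy is stale. */
	if (a_has_hsm || b_has_hsm)
		return SWAP_ALLOW_SET_DIRTY;
	return SWAP_ALLOW;
}
```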
Add lustre_swab_close_data_special() to swab close_data fields inside
the union (specific to some types of close).
Add regression test sanity-hsm 607a, 607b and 607c.
Test-Parameters: clientversion=2.15.4 testlist=sanity-hsm
Test-Parameters: serverversion=2.15.4 testlist=sanity-hsm env=EXCEPT="114 409a"
Test-Parameters: testlist=sanity-hsm env=ONLY=607,ONLY_REPEAT=15
Signed-off-by: Courrier Guillaume <guillaume.courrier@cea.fr>
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I6e90131235f96255b636eea366ad0cef5f4f0b19
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49236
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Thu, 15 Feb 2024 18:51:30 +0000 (13:51 -0500)]
LU-9680 utils: fix nested attribute handling in liblnetconfig
Testing with several different YAML layouts revealed several
limitations. The first breakage, discovered while porting LNet
export to Netlink, was that for a nested list, if the first
attribute processed was another nested list, the generated YAML
was missing the needed '-'. Now we insert it manually.
The second problem, discovered while testing lustre stats, was
that updating an individual key didn't work.
We moved the printing of the new key directly under the
NLA_NESTED case. This required creating yaml_nested_header(),
which handles both empty nested lists and ones containing data.
The comments added to the library should make this clear.
Sending Netlink packets also had some bugs that have been
resolved. The function yaml_fill_scalar_data() is used to
parse out simple scalar values and key value pairs. The
original code's parsing of the input string altered the
string. This broke the do-while loop over entry, since
entry dropped the rest of the configuration data. Instead
of altering the string, we now carefully parse it
without modifying it.
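A minimal sketch of non-destructive "key: value" parsing, in the spirit of the yaml_fill_scalar_data() fix. split_key_value() is a hypothetical helper, not the liblnetconfig function: it copies the key and value into caller buffers and never writes into the input, so a loop can still advance to the next line afterwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split the first line of 'entry' ("key: value")
 * into 'key' and 'val' without modifying 'entry'. */
static bool split_key_value(const char *entry, char *key, size_t key_sz,
			    char *val, size_t val_sz)
{
	size_t line_len = strcspn(entry, "\n");	/* stay on this line */
	const char *end = entry + line_len;
	const char *colon = memchr(entry, ':', line_len);
	const char *v;
	size_t klen, vlen;

	if (!colon)
		return false;
	klen = (size_t)(colon - entry);
	v = colon + 1;
	while (v < end && *v == ' ')		/* skip blanks after ':' */
		v++;
	vlen = (size_t)(end - v);
	if (klen >= key_sz || vlen >= val_sz)
		return false;
	memcpy(key, entry, klen);
	key[klen] = '\0';
	memcpy(val, v, vlen);
	val[vlen] = '\0';
	return true;
}
```

Because the input stays intact, the caller can move to the next entry with strchr(entry, '\n') and keep parsing the rest of the configuration.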
Handle the case when nla_nest_start() fails to create
a nlattr in lnet_genl_parse_list() which prevents a
node crash when we run out of space in the skbuff.
Make sure the skbuff is large enough for LNet NI
Netlink data collection by setting cb->min_dump_alloc
to U16_MAX.
Test-Parameters: trivial testlist=sanity-lnet
Fixes:
d137e9823ca ("LU-10003 lnet: use Netlink to support LNet ping commands")
Change-Id: I2d702c9211abffc051db3203ec3811ceaedb2376
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53889
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Li Dongyang [Mon, 19 Feb 2024 02:27:22 +0000 (13:27 +1100)]
LU-16692 osp: osp_fid_diff vs rollover_new_seq race
osp_fid_diff()/osp_objs_precreated() access
last_created_fid and pre_used_fid without holding opd_pre_lock,
which can race with osp_precreate_rollover_new_seq()
when it updates them to new FIDs.
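The race is a torn read of two related fields. The sketch below is a simplified userspace model, not OSP code: struct fid, struct precreate_state and both functions are hypothetical, a pthread mutex stands in for opd_pre_lock, and returning 0 for FIDs in different sequences is an assumption of the sketch.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified FID: (sequence, object id). */
struct fid { uint64_t seq; uint32_t oid; };

struct precreate_state {
	pthread_mutex_t lock;		/* stands in for opd_pre_lock */
	struct fid last_created;	/* stands in for last_created_fid */
	struct fid pre_used;		/* stands in for pre_used_fid */
};

/* Both FIDs are read under the same lock, so a concurrent rollover that
 * moves both to a new sequence is never seen half-done. */
static int64_t objs_precreated(struct precreate_state *s)
{
	struct fid last, used;

	pthread_mutex_lock(&s->lock);
	last = s->last_created;
	used = s->pre_used;
	pthread_mutex_unlock(&s->lock);

	if (last.seq != used.seq)
		return 0;	/* assumption: no meaningful diff across sequences */
	return (int64_t)last.oid - (int64_t)used.oid;
}

/* Rollover updates both FIDs in one critical section. */
static void rollover_new_seq(struct precreate_state *s, uint64_t new_seq)
{
	pthread_mutex_lock(&s->lock);
	s->last_created = (struct fid){ new_seq, 0 };
	s->pre_used = (struct fid){ new_seq, 0 };
	pthread_mutex_unlock(&s->lock);
}
```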
Change-Id: I3a61c99570b5532776ddc43247c1513b8c89fb32
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54087
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Thu, 16 Jun 2022 05:03:45 +0000 (23:03 -0600)]
LU-15913 tests: add rename stress test via racer
Add a rename stress test using the racer framework. Use
mrename if found, to avoid stat and allow directory rename.
Sometimes create and rename files to/from subdirectories.
Run e2fsck after every run to confirm filesystem structure.
Allow tunable parameters via environment variables so they
can be set via Test-Parameters. Parameters can be set on
different nodes via variables CLIENT_LCTL_SETPARAM_PARAM,
MDS_LCTL_SETPARAM_PARAM, OSS_LCTL_SETPARAM_PARAM.
Test-Parameters: trivial testlist=racer env=ONLY=2
Test-Parameters: testlist=racer env=ONLY=2 mdtcount=2
Test-Parameters: testlist=racer env=ONLY=2 mdtcount=2
Test-Parameters: testlist=racer env=ONLY=2 mdtcount=2
Test-Parameters: testlist=racer env=ONLY=2 mdtcount=2
Test-Parameters: testlist=racer env=ONLY=2 mdtcount=2
Test-Parameters: testlist=racer env=ONLY=2 mdtcount=2
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2ae034b864a5ccb8a59bf7028d22cd67c643f51f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47643
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
adilger [Tue, 27 Feb 2024 18:50:11 +0000 (11:50 -0700)]
Create SECURITY.md
Sebastien Buisson [Thu, 15 Feb 2024 08:58:16 +0000 (09:58 +0100)]
LU-17528 gss: cleanup gss api usage
The lucid context support has been available from at least
krb5 1.7, and even RHEL7 ships with a more recent version.
So drop support for non-lucid api, and cleanup gss api usage.
Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I91fb706d2444c199156423b57a8c1ef24a0c3420
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54063
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Bruno Faccini <bfaccini@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Wed, 14 Feb 2024 12:38:25 +0000 (07:38 -0500)]
LU-14291 tests: make module loading of ost optional
Future Lustre versions will no longer have an ost kernel module,
so load_module in the test framework will fail. Capture the
failure and ignore it. We will need this for interop testing.
Change-Id: Iedff4f6a36ceffa9428e3f891db78b7538217085
Test-Parameters: trivial
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Sat, 10 Feb 2024 08:02:17 +0000 (15:02 +0700)]
LU-17522 build: Distribute clang build infrastructure
The macro files lustre-toolchain.m4 and lustre-compiler-plugins.m4
and the cc-plugins directory should be included in the distributed
files unconditionally.
Test-Parameters: trivial
Fixes:
d684885098 ("LU-16961 clang: plugins and build system integration")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I6ddedd82c6180ffd1c4134fda6af6df6bd23dd34
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53991
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sergey Cheremencev [Fri, 9 Feb 2024 17:22:07 +0000 (20:22 +0300)]
LU-17520 tests: change DEBUG_SIZE logic
Don't set DEBUG_SIZE to 2MB*CPU_num. With that formula the
Lustre debug buffer could be just 4MB on a system with 2 CPUs,
despite 3GB of RAM. This is often why the time period covered by
the debug logs doesn't include the cause of a failure (depending
on the debug level, the logs may hold just several seconds). When
DEBUG_SIZE is not set, debug_mb is calculated inside libcfs based
on RAM and CPU count.
Test-Parameters: trivial
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Iacccc625ec6564c982c75172561c8c3e4114e4b7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53988
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Thu, 8 Feb 2024 12:44:21 +0000 (13:44 +0100)]
LU-17484 gss: reply error for SEC_CTX_INIT on wrong node
When a server receives a SEC_CTX_INIT request for a target that is not
available (either stopping, or not set up yet, or moved to a failover
node), the request gets dropped. This makes the client-side RPC time
out, increasing the time it takes to establish a proper gss context
with the target, because it slows down the HA mechanism that tries
alternate failover NIDs.
Instead of dropping the SEC_CTX_INIT request, the server needs to
send back a proper error reply. The client will then be able to
immediately try alternate failover NIDs, speeding up the
mount/reconnect process and avoiding potential eviction.
Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id2cefaa7d54729a63c7be13b65d7ace579bcaa78
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53970
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Serguei Smirnov [Wed, 7 Feb 2024 18:48:08 +0000 (10:48 -0800)]
LU-17258 socklnd: stop connecting on too many retries
If peer repeatedly rejects connection requests with EALREADY,
assume that it doesn't support as many connections as we're trying
to create. Make sure to stop connecting to the peer altogether and
either continue with already created connections if there's at least
one of each type, or fail.
This helps avoid the assertion:
"ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed"
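The retry policy can be sketched as a small state machine. Everything here is hypothetical (connection-type bits, the retry threshold, struct and function names); it only illustrates "stop after repeated EALREADY, keep going only if each type has at least one connection", not the socklnd implementation or its real type numbering.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical connection types, one bit each. */
enum { CONN_CONTROL = 1u << 0, CONN_BULK_IN = 1u << 1, CONN_BULK_OUT = 1u << 2 };
#define CONN_ALL (CONN_CONTROL | CONN_BULK_IN | CONN_BULK_OUT)
#define MAX_EALREADY_RETRIES 3	/* hypothetical threshold */

struct peer_conn_state {
	unsigned int established;	/* types with at least one connection */
	int ealready_count;		/* consecutive EALREADY rejections */
	bool stop_connecting;		/* no further connection attempts */
};

/* Record one connection attempt of 'type' that returned 'rc'.
 * Returns 0 to keep using the peer, -1 to give up on it. */
static int handle_connect_result(struct peer_conn_state *p, int rc,
				 unsigned int type)
{
	if (rc == 0) {
		p->established |= type;
		p->ealready_count = 0;
		return 0;
	}
	if (rc == -EALREADY && ++p->ealready_count >= MAX_EALREADY_RETRIES) {
		/* Peer keeps refusing: stop connecting altogether. */
		p->stop_connecting = true;
		/* Continue only if every type already has a connection. */
		return (p->established & CONN_ALL) == CONN_ALL ? 0 : -1;
	}
	return 0;	/* below the threshold: caller may retry */
}
```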
Test-Parameters: trivial testlist=sanity-lnet
Fixes:
5afe3b053 ("LU-17258 socklnd: ensure connection type established upon race")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I6072e91cc36544fc2f56c91cd78f6637cf82ecbc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53955
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mikhail Pershin [Tue, 6 Feb 2024 09:36:40 +0000 (12:36 +0300)]
LU-17379 ptlrpc: fix check for callback discard
In ptlrpc_unregister_reply(), the decision about whether to
discard the request-out callback is made too early, before
LNetMDUnlink() invokes the reply callback. Therefore, at the
moment of the discard check, rq_reply_unlinked is not set yet
and the discard is always skipped.
The patch removes the discard check from __ptlrpc_cli_wait_unlink()
and does it right after the LNetMDUnlink() call inside
ptlrpc_unregister_reply().
That makes __ptlrpc_cli_wait_unlink() unused, so it was
removed and only ptlrpc_cli_wait_unlink() remains.
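The ordering bug can be reduced to "sample a flag before the call that sets it". This is a hypothetical model, not ptlrpc code: struct req keeps only a stand-in for rq_reply_unlinked, md_unlink() stands in for LNetMDUnlink() running the reply callback, and the real discard decision may involve more conditions than this one flag.

```c
#include <stdbool.h>

/* Hypothetical request: only the flag relevant to the discard check. */
struct req { bool reply_unlinked; };

/* Stand-in for LNetMDUnlink(): unlinking runs the reply callback,
 * which marks the reply as unlinked. */
static void md_unlink(struct req *r)
{
	r->reply_unlinked = true;
}

/* Old order: the flag is read before the unlink sets it. */
static bool should_discard_buggy(struct req *r)
{
	bool discard = r->reply_unlinked;	/* always false here */

	md_unlink(r);
	return discard;
}

/* Fixed order: unlink first, then decide. */
static bool should_discard_fixed(struct req *r)
{
	md_unlink(r);
	return r->reply_unlinked;
}
```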
Fixes:
babf0232273 ("LU-13368 lnet: discard the callback")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6448cafa8a0b81d7ba0172ad1709e75e592d4924
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53937
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Arshad Hussain [Tue, 6 Feb 2024 10:11:57 +0000 (15:41 +0530)]
LU-17000 utils: Use ssize_t to store return from sysconf()
Use ssize_t instead of size_t to capture the return value
of sysconf(), as it can return a negative value.
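For reference, sysconf() returns a long and reports failure as -1, so the result has to live in a signed type for an error check to work; with size_t, -1 wraps to SIZE_MAX and "< 0" is never true, which is what Coverity flags. get_page_size() is a hypothetical helper showing the pattern, not the patched utility code.

```c
#include <sys/types.h>
#include <unistd.h>

/* Return the system page size, or -1 if sysconf() fails.
 * A signed type (ssize_t) keeps the -1 error value detectable. */
static ssize_t get_page_size(void)
{
	ssize_t pagesz = sysconf(_SC_PAGESIZE);

	if (pagesz < 0)
		return -1;
	return pagesz;
}
```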
Test-Parameters: trivial testlist=sanity-flr
CoverityID: 414674 ("Unsigned compared against 0")
CoverityID: 414673 ("Unsigned compared against 0")
CoverityID: 414672 ("Unsigned compared against 0")
CoverityID: 414671 ("Unsigned compared against 0")
CoverityID: 414670 ("Unsigned compared against 0")
CoverityID: 414669 ("Unsigned compared against 0")
CoverityID: 414668 ("Unsigned compared against 0")
CoverityID: 414667 ("Unsigned compared against 0")
Fixes:
b02a9bc1 ("LU-17000 utils: Add check after calling sysconf(_SC_PAGESIZE)")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I03f280f25beb7b6b8b41888c379b0709a6195d9c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53936
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Tue, 6 Feb 2024 07:44:13 +0000 (14:44 +0700)]
LU-17507 build: Allow symlink in mofed default path
A default installation makes /usr/src/ofa_kernel/default
a symlink, and it is also the default place users expect
to find the MOFED kernel development headers.
Explicitly pass -H to find so that the command line
arguments to find are allowed to be symlinks.
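The difference between find's default behaviour and -H for a command-line argument comes down to lstat() versus stat(). The hypothetical helper below mirrors that choice: without -H the link itself is examined, so a symlink to a directory is not treated as a directory; with -H the link is followed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/* Would a command-line argument be treated as a directory?
 * follow_args == false mirrors find's default (-P): lstat() looks at
 * the link itself. follow_args == true mirrors -H: stat() follows it. */
static bool arg_is_dir(const char *path, bool follow_args)
{
	struct stat st;
	int rc = follow_args ? stat(path, &st) : lstat(path, &st);

	return rc == 0 && S_ISDIR(st.st_mode);
}
```

On Linux, /proc/self/cwd is a convenient symlink to a directory for trying this out.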
Test-Parameters: trivial
Fixes:
3c66185c84 ("LU-17398 build: detect mlnx-ofa_kernel-devel contents")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I5d54f9b0a70db52c4be6a9a6ccaed2c59185098b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53934
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Serguei Smirnov [Mon, 5 Feb 2024 23:27:15 +0000 (15:27 -0800)]
LU-17505 socklnd: return NETWORK_TIMEOUT to LNet on ETIMEOUT
Returning LNET_MSG_STATUS_LOCAL_TIMEOUT to LNet on ETIMEDOUT
causes LNet to only decrement the local NI health score,
while the issue may actually be with the remote NI.
Changing this to return LNET_MSG_STATUS_NETWORK_TIMEOUT
causes LNet to decrement both local NI and peer NI health.
If local NI is ok, it will recover its health score quickly,
but the affected peer NI health is lowered until peer NI is recovered.
This helps LNet select healthy NIs of the same peer in the meantime.
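The effect of the status change on health bookkeeping can be shown with a toy model. The enum values are hypothetical stand-ins for LNET_MSG_STATUS_LOCAL_TIMEOUT and LNET_MSG_STATUS_NETWORK_TIMEOUT, and the starting value and per-failure decrement are arbitrary; the real LNet health accounting is more involved.

```c
/* Hypothetical stand-ins for the two LNet message statuses. */
enum msg_status { STATUS_LOCAL_TIMEOUT, STATUS_NETWORK_TIMEOUT };

struct ni_health {
	int local_ni;	/* health of the local NI */
	int peer_ni;	/* health of the remote (peer) NI */
};

#define HEALTH_DEC 100	/* arbitrary decrement per failure */

static int health_dec(int v)
{
	return v > HEALTH_DEC ? v - HEALTH_DEC : 0;	/* never below 0 */
}

/* Both statuses penalize the local NI; only a network timeout also
 * penalizes the peer NI. */
static void apply_timeout(struct ni_health *h, enum msg_status st)
{
	h->local_ni = health_dec(h->local_ni);
	if (st == STATUS_NETWORK_TIMEOUT)
		h->peer_ni = health_dec(h->peer_ni);
}
```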
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I916772477d1fd63571447262880a33830746f002
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53930
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Serguei Smirnov [Mon, 5 Feb 2024 20:14:30 +0000 (12:14 -0800)]
LU-17379 lnet: add LNetPeerDiscovered to LNet API
LNetPeerDiscovered is added to allow Lustre to check
whether the peer has been successfully discovered by LNet
before attempting to open a connection to it.
For example, given a mount command with a list of NIDs,
Lustre can use LNetAddPeer API to initiate discovery on
every candidate first, and later use LNetPeerDiscovered
to select a reachable peer to connect to.
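The selection step of that flow can be sketched as a scan over the candidates. select_reachable() is hypothetical, and the real LNetPeerDiscovered() signature is not reproduced: the sketch assumes the per-NID discovery results were already collected into a bool array after discovery was started on every candidate.

```c
#include <stdbool.h>
#include <stddef.h>

/* discovered[i] holds the (assumed pre-collected) discovery result for
 * nids[i]. Returns the first discovered NID, or NULL if none is. */
static const char *select_reachable(const char *const *nids,
				    const bool *discovered, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (discovered[i])
			return nids[i];
	return NULL;
}
```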
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I7c9964148a5a2a24d7889b8b4c2e488a433ca258
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53926
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sergey Cheremencev [Fri, 2 Feb 2024 20:07:00 +0000 (23:07 +0300)]
LU-17500 qmt: avoid "enforced bit set, but neither"
Don't call qmt_revalidate_qunit in qmt_set_with_lqe,
as it is possible that the lqe_enforced bit is not cleared
when the hard and soft limits are being set to 0.
There is no reason to recalculate qunit and edquot when we
set limits to 0. When the limits are changed, qunit and
edquot will be calculated below in the "dirtied" branch,
so there is no reason to do this twice.
Patch helps to avoid following error:
LustreError: 21362:0:(qmt_entry.c:746:qmt_adjust_qunit())
$$$ enforced bit set, but neither hard nor soft limit are set
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I8f5d9630f43b66ae7ea2be0bf2c735a02e1f6299
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53893
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Yang Sheng [Thu, 1 Feb 2024 16:31:13 +0000 (00:31 +0800)]
LU-17481 mdt: count all opens in mdt.*.md_stats
Count all opens for the MDT in mdt.*.md_stats. Also add a test
case to verify it.
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I2fa90cc2b4ce8d7d039736a5f40a70cbeb04bf8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53880
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Wed, 31 Jan 2024 14:40:44 +0000 (15:40 +0100)]
LU-17454 nodemap: allow mapping for root
Allow an id mapping for root, to match what is implemented for regular
users, with the following behavior:
- if admin property is set, root remains root.
- if admin property is not set, the idmap for '0' is taken into
account.
- if admin property is not set and there is no idmap for '0' and
deny_unknown property is not set, root is squashed to the squash
uid/gid.
- if admin property is not set and there is no idmap for '0' and
deny_unknown property is set, root is blocked.
Note that map_mode remains ignored for root. Also, capabilities are
not dropped for root when mapped, just like it is done for regular
users. If admins want to drop root capabilities, root must be
squashed.
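The four rules for root form a small decision table. nodemap_root_outcome() and its enum are hypothetical names used only to make the precedence explicit (admin first, then an idmap for '0', then deny_unknown); they are not the nodemap code.

```c
#include <stdbool.h>

enum root_outcome { ROOT_KEEP, ROOT_MAPPED, ROOT_SQUASHED, ROOT_DENIED };

/* Decision table for uid 0, following the rules listed above. */
static enum root_outcome nodemap_root_outcome(bool admin,
					      bool has_idmap_for_0,
					      bool deny_unknown)
{
	if (admin)
		return ROOT_KEEP;	/* root remains root */
	if (has_idmap_for_0)
		return ROOT_MAPPED;	/* idmap for '0' applies */
	/* no admin, no idmap: squash or block depending on deny_unknown */
	return deny_unknown ? ROOT_DENIED : ROOT_SQUASHED;
}
```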
sanity-sec test_15 is updated to test root mapping.
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id2e950b99e3b3ba27179408c647e1f7b7c49e32e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53870
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Matt Ezell [Tue, 23 Jan 2024 15:40:52 +0000 (18:40 +0300)]
LU-13257 llite: Disallow users to set/clear group lock flag
Group locks are created/freed via dedicated ioctls. Disallow manually
setting or clearing the flag.
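One way to express "users may not flip this bit" is to compare the current and requested flag words. The sketch is hypothetical: HYP_FLAG_GROUP_LOCK is a placeholder bit, check_user_flags() is illustrative, and returning -EPERM (rather than, say, -EINVAL) is an assumption.

```c
#include <errno.h>

#define HYP_FLAG_GROUP_LOCK 0x1u	/* hypothetical "group lock held" bit */

/* Reject a user flag update that would set or clear the group-lock bit;
 * only the dedicated group-lock ioctls may change it. */
static int check_user_flags(unsigned int cur, unsigned int requested)
{
	if ((cur ^ requested) & HYP_FLAG_GROUP_LOCK)
		return -EPERM;
	return 0;
}
```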
HPE-bug-id: LUS-12078
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Signed-off-by: Matt Ezell <ezellma@ornl.gov>
Change-Id: Id5022cc02a7bdce2f0150592470e8336b4537a61
Reviewed-on: https://es-gerrit.hpc.amslabs.hpecorp.net/162708
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Tested-by: Alexander Lezhoev <alexander.lezhoev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53782
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Mon, 5 Feb 2024 06:47:49 +0000 (13:47 +0700)]
LU-17453 llite: use dget_parent to access dentry.d_parent
Use dget_parent() to acquire the d_parent member of a dentry
to ensure the parent dentry stays valid while it is accessed.
HPE-bug-id: LUS-11889
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Icb0a25ece5a3a3d50da076708fcd631176652a1b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53757
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Thu, 18 Jan 2024 09:49:48 +0000 (02:49 -0700)]
LU-17441 mdc: use MDS_IO_PORTAL for rename
Some workloads like Apache Spark are very rename intensive, and there
may be many concurrent renames that need the BFL lock (more than
the number of MDS_REQUEST_PORTAL service threads). They will block
these threads until each is able to get the rename lock, and prevent
other MDS_REINT RPCs from being processed.
Since the MDS_IO_PORTAL is often unused (only needed for DoM files),
and has existed since 2.11.0, it seems possible to move the rename
RPCs to be serviced by the MDS_IO_PORTAL threads to avoid contention
on the primary MDS service threads. Also, it will avoid blocking
normal file open, setattr, statfs, and other common operations if the
BFL lock is contended. Even with DoM files they may have read-on-open
handling and only DoM writes would be blocked by the uncommon rename.
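The client-side change amounts to routing one opcode to a different portal. The sketch is hypothetical: the portal and opcode enums are illustrative rather than the Lustre constants, and server_has_io_portal is an assumed compatibility check standing in for whatever the client uses to know the server provides MDS_IO_PORTAL.

```c
#include <stdbool.h>

/* Hypothetical portal and opcode names; only the routing idea is shown. */
enum portal { REQUEST_PORTAL, IO_PORTAL };
enum reint_op { OP_CREATE, OP_SETATTR, OP_OPEN, OP_RENAME };

/* Send renames to the (usually idle) IO portal, so a rename waiting on
 * the BFL lock occupies an IO-portal thread instead of a thread that
 * serves open, setattr, statfs and other common requests. */
static enum portal pick_portal(enum reint_op op, bool server_has_io_portal)
{
	if (op == OP_RENAME && server_has_io_portal)
		return IO_PORTAL;
	return REQUEST_PORTAL;
}
```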
Test-Parameters: testlist=sanity serverversion=2.15 \
env=SANITY_EXCEPT="56x 56xa 56xc 65p 70a 119h 119i 123g 123h 123i 398d 398o"
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I623a27de1482778f3c9fc6bb5bbcf917611dc75b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53725
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Alex Zhuravlev [Thu, 11 Jan 2024 05:28:40 +0000 (08:28 +0300)]
LU-17415 ldlm: lock conversion to skip cancelled locks
ldlm_cli_inodebits_convert() should re-check whether the lock is
being cancelled, and skip such locks to avoid an assertion:
LustreError:
15208:0:(ldlm_lock.c:1095:ldlm_grant_lock_with_skiplist())
ASSERTION( ldlm_is_granted(lock) ) failed:
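The fix follows the familiar "re-check state under the lock before acting" pattern. This is a hypothetical userspace model, not LDLM code: struct hyp_lock and convert_lock() are illustrative, and the sketch assumes the cancel path sets 'cancelling' while holding the same mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical lock object. */
struct hyp_lock {
	pthread_mutex_t mutex;
	bool cancelling;	/* set (under mutex) once cancel starts */
	unsigned int bits;	/* inodebits currently held */
};

/* Drop 'drop_bits' unless the lock is being cancelled.
 * Returns false when the lock was skipped. */
static bool convert_lock(struct hyp_lock *lk, unsigned int drop_bits)
{
	bool done = false;

	pthread_mutex_lock(&lk->mutex);
	if (!lk->cancelling) {		/* re-check under the mutex */
		lk->bits &= ~drop_bits;
		done = true;
	}
	pthread_mutex_unlock(&lk->mutex);
	return done;
}
```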
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If212931d8fa6a2d8f56c44714de830d5fb4a9a6b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53645
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Tue, 12 Dec 2023 16:49:49 +0000 (17:49 +0100)]
LU-17357 mgc: wait for sptlrpc config log
The sptlrpc config log is mandatory to establish connections to
targets with proper security context. So wait for its retrieval.
Add sanity-sec test_68 to exercise this, and improve test_32
for mgssec.
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I5352e926dc6a9a68db1224629c68a42b74bee8a4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53423
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Tue, 5 Dec 2023 13:14:58 +0000 (14:14 +0100)]
LU-17317 sec: add srpc_serverctx proc file
GSS srpc contexts for client connections can already be dumped via
proc file <mdc,osc>.*.srpc_contexts.
This patch adds a new proc file to dump server side GSS srpc contexts,
e.g.:
mgs.MGS.gss.srpc_serverctx
mdt.testfs-MDT0000.gss.srpc_serverctx
obdfilter.testfs-OST0000.gss.srpc_serverctx
The GSS context information is dumped as YAML, with one line per
context, like this:
- 0000000013221bdf: { peer_nid: 192.168.56.206@tcp, uid: 0, ctxref: 1, expire: 1707934985, delta: 3401, flags: [uptodate, cached], seq: 0, win: 2048, key: 00000000, keyref: 0, hdl: "0x5ae1a771fd57043:0x65a64972fda4e200", mech: "krb5 (aes256-cts-hmac-sha1-96)" }
Because of this new syntax, sanity-sec test_28 needs to be fixed.
Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I37da9ffe6dd5884006b36271185a4d7155ead65b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53376
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Tue, 5 Dec 2023 05:20:58 +0000 (08:20 +0300)]
LU-17337 osd: ask for more revoke credits
Starting from 4.x kernels, JBD2 tracks the number of potential
revoked blocks separately from regular journal blocks and
checks that a transaction doesn't exceed the declared number.
Before the extent merging patch, a regular block allocation could
free only a very limited number of blocks. Now, with extent
merging, when an extent tree is really big and a few extents
are inserted in a single transaction, such an allocation
can exceed the default revoke credits (8).
The patch uses the number of extents in the transaction to calculate
the potential number of revoke records (max tree depth * default).
Fixes:
0f7e6c02a9 ("LU-16843 ldiskfs: merge extent blocks")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4967deb56e5aba82b68ffdc91de589fffae6a64a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53365
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>