Whamcloud - gitweb
fs/lustre-release.git
4 weeks ago LU-16692 tests: remove force_new_seq from some test suites 33/54433/3
Li Dongyang [Fri, 15 Mar 2024 11:39:30 +0000 (22:39 +1100)]
LU-16692 tests: remove force_new_seq from some test suites

force_new_seq was used in some tests to avoid the
situation where the sequence from a replay request
could differ from the one osp is at, because the
previous sequence width had been used up.

This case can now be handled, so remove force_new_seq
to speed up test runs.
Some force_new_seq calls are still required to make sure
there are enough objects in the current precreate pool
for the overstriping test cases.

Change-Id: Id1bc6760e721db61c11b1c3d6b2fa82965459728
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54433
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-16692 osp: do not assert on seq got over network 20/54020/15
Li Dongyang [Tue, 13 Feb 2024 04:10:53 +0000 (15:10 +1100)]
LU-16692 osp: do not assert on seq got over network

Replay requests have FIDs already assigned and the
sequence could be different to the osp:
seq rollover happened after the original request,
then something triggers replay, or osp lost the
seq rollover record on storage.

Detect this and avoid the assert in osp_fid_diff().
We don't update the last id on osp in this case,
otherwise orphan cleanup could clean up the objects
in the current osp's sequence.

Also, when seq rollover happens in osp, do not
LASSERT() if we didn't get a new seq; most likely
the previous seq update was lost on storage on ofd/ost.
We can return the error code and let the precreate
thread try again.

Cleanup lu_fid_diff() which is not used.
In osp_create(), do not call osp_update_last_fid()
again for the regular non-replay case, it's already
done via osp_object_assign_fid()->osp_precreate_get_fid().

Change-Id: I509c00b998933d45865c9540e12a2db7d1b2b8ed
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54020
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17053 libcfs: make a debugfs equivalent for markers 63/54363/7
James Simmons [Mon, 18 Mar 2024 14:53:53 +0000 (08:53 -0600)]
LU-17053 libcfs: make a debugfs equivalent for markers

Most of the ioctl handling that was LNet related has been moved
from libcfs to LNet. One leftover is markers, which allow
injection of strings into the Lustre debug buffer, located in
libcfs_ioctl(). This is the only functionality which exists in
libcfs yet is only available in the lnet module. We can create
a debugfs equivalent that also allows injection of strings
into the Lustre debug buffers with scripts.

Change-Id: I22395b6b19f94de3c95ba8517a14d2ea251fe37a
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54363
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17409 scripts: correct ldev MGS handling 19/53619/3
Olaf P. Faaland [Tue, 9 Jan 2024 05:36:50 +0000 (21:36 -0800)]
LU-17409 scripts: correct ldev MGS handling

ldev was incorrectly parsing the line specifying the hosts that can
run the MGS, when of the form

    gopher1  gopher2  MGS  gopher1/mgs

as it assumed every target specified included a filesystem name,
like 'lustre3-MDT0000'.

This corrects that, assuming that an MGS may not be related to a
specific file system.

When such an input line is found, assume that the MGS is used by any file
systems included in the ldev.conf.  When the user includes option '-F
<fsname>' as well as '-R MGS', include that MGS in the targets
reported.

Test-Parameters: trivial
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Ifab5db1dfb094755e29747ec6b90d1566b16c18c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53619
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cameron Harr <charr@llnl.gov>
Reviewed-by: Eric Carbonneau <carbonneau1@llnl.gov>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17666: configure lnet before add net in sanity-sec:31 43/54543/3
Li Xi [Fri, 22 Mar 2024 12:30:57 +0000 (20:30 +0800)]
LU-17666: configure lnet before add net in sanity-sec:31

If "options lnet config_on_load=1" is not configured in
modprobe.d, LNet will not be configured when trying to
add a network, and the command fails:

/usr/sbin/lnetctl net add --if eth1 --net tcp999
add:
    - net:
          errno: -22
          descr: "cannot add network: Invalid argument"

Test-Parameters: trivial testlist=sanity-sec env=ONLY=31

Change-Id: If65b7cb372d4f04a10ea066d62f3ae43029fcf65
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54543
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17000 utils: Pass statx parameter as reference 43/54443/5
Arshad Hussain [Mon, 18 Mar 2024 08:56:12 +0000 (14:26 +0530)]
LU-17000 utils: Pass statx parameter as reference

In printf_format_file_attributes() the 'struct statx' parameter
was passed by value. Since copying a large value is inefficient,
this patch changes 'struct statx' to be passed by
reference.
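
A minimal sketch of the difference, assuming a hypothetical
'struct big_attrs' as a stand-in for 'struct statx'; the helper names
are illustrative and are not the functions touched by this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a large attribute structure such as struct statx. */
struct big_attrs {
	uint64_t size;
	uint64_t blocks;
	uint32_t mode;
	uint8_t  padding[232];	/* makes the struct large enough that copies matter */
};

/* By value: the whole structure is copied into the callee on every call. */
static uint64_t attrs_size_by_value(struct big_attrs a)
{
	return a.size;
}

/* By reference: only a pointer is passed; const documents read-only use. */
static uint64_t attrs_size_by_ref(const struct big_attrs *a)
{
	assert(a != NULL);
	return a->size;
}
```

Both helpers return the same result; the by-reference form just avoids
copying sizeof(struct big_attrs) bytes per call.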

Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=65
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=65
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=65
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=65
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=65
Fixes: f0ab3ac6d6 ("LU-16760 utils: support 'lfs find --attrs' and '-printf %La'")
CoverityID: 399698 ("Big parameter passed by value")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic20feff84d7043000ebaa1eaec98d54c73fc1a7e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54443
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17640 test: wait osts up after restart 80/54380/2
Hongchao Zhang [Sat, 9 Mar 2024 14:35:41 +0000 (22:35 +0800)]
LU-17640 test: wait osts up after restart

In test_18e of sanity-lfsck, the OSTs may not be ready on all MDTs,
so the LFSCK status will be incorrect because the LFSCK notification
cannot be sent to all OSTs.

Change-Id: If1ed5d920d5c8b99d42f59f92a1e245a9e2a8267
Test-Parameters: trivial testlist=sanity-lfsck,sanity-lfsck,sanity-lfsck,sanity-lfsck,sanity-lfsck
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54380
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17634 hsm: serialize HSM restore for a file on a client 66/54366/6
Qian Yingjin [Wed, 13 Mar 2024 01:33:19 +0000 (21:33 -0400)]
LU-17634 hsm: serialize HSM restore for a file on a client

For a file in the HSM released, exists, archived state, when tens of
processes read it in parallel on a client, one read process
may report a "No data available" error.

After analyzing the error, we found the following bug in the HSM code.
Reading a released file with the LAYOUT lock already granted on a client:
P1:
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
  ->ll_layout_restore()
    ->mdc_ioc_hsm_request()
      ->mdc_hsm_request_lock_to_cancel()
        ->ldlm_cancel_resource_local()
          remove LAYOUT lock from resource into cancel list
          NOT yet cancel the LAYOUT lock on the client via ELC...

P2:
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
  ->ll_layout_restore()
    ->mdc_ioc_hsm_request()
      ->mdc_hsm_request_lock_to_cancel()
      SKIP: no conflicting LAYOUT lock on the resource lock list, as P1
      has already moved it (if any) into its cancel list
    ->mdt_hsm_request()
      ->cdt_restore_handle_add()
        ->cdt_restore_handle_find()
        ->list_add_tail(): add @crh to restore handle list
        EX LAYOUT lock NOT yet obtained to cancel cached LAYOUT
        locks on client side...

P3:
->ll_file_read_iter()
->ll_do_fast_read(): => return -ENODATA;
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
  ->ll_layout_restore()
    ->mdc_ioc_hsm_request()
      ->mdc_hsm_request_lock_to_cancel()
      SKIP as P1 has already moved the conflicting LAYOUT lock
      (if any) into its cancel list
    ->mdt_hsm_request()
      ->cdt_restore_handle_add()
        ->cdt_restore_handle_find()
        SKIP as a restore handle with the same FID was found in
        the restore handle list added by P2.
  ->ll_layout_refresh()
  ->io->ci_need_restart = vio->vui_layout_gen != gen;
  ->LAYOUT gen does not change as the LAYOUT lock on
    the client is not revoked yet, so I/O will not restart...
->return -ENODATA; =>from fast read

We can fix this bug simply by serializing the HSM restore operation on a
client using @lli->lli_layout_mutex.
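
The idea of serializing the restore path can be sketched in user space
as follows. This is only an illustration under stated assumptions:
'struct file_sketch', do_restore() and read_prepare() are hypothetical
names, a pthread mutex plays the role of lli_layout_mutex, and the real
restore involves RPCs and LDLM locks that are not modelled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-file client state; names are illustrative only. */
struct file_sketch {
	pthread_mutex_t layout_mutex;	/* plays the role of lli_layout_mutex */
	bool released;			/* true while the file data is released */
	int restore_requests;		/* number of restore requests issued */
};

/* Stand-in for sending a restore request and waiting for the new layout. */
static void do_restore(struct file_sketch *f)
{
	f->restore_requests++;
	f->released = false;
}

/*
 * The check and the restore happen entirely under the mutex, so a second
 * reader either waits for the first restore to finish or already sees the
 * restored state; it never races past a half-finished restore.
 */
static int read_prepare(struct file_sketch *f)
{
	int rc;

	assert(f != NULL);
	pthread_mutex_lock(&f->layout_mutex);
	if (f->released)
		do_restore(f);
	rc = f->released ? -1 : 0;
	pthread_mutex_unlock(&f->layout_mutex);
	return rc;
}
```

With this shape, calling read_prepare() from many readers issues a
single restore, and no reader returns while the file is still seen as
released.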

Add sanity-hsm/test_12{t, u} to verify it.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idc2a8c1818386c64798d7e28500c20c80ff369f1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54366
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17056 tests: force osc import reconnect in sanity-sec 30b 49/54349/9
Sebastien Buisson [Mon, 28 Aug 2023 08:09:53 +0000 (10:09 +0200)]
LU-17056 tests: force osc import reconnect in sanity-sec 30b

In sanity-sec test_30b, force reconnect of idle osc imports
so that security flavor is correctly updated.
In case of failure, dump more information about state of the imports
and the srpc connections.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec env=ONLY=30b,ONLY_REPEAT=50,SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Test-Parameters: testlist=sanity-sec env=ONLY=30b,ONLY_REPEAT=50,SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Test-Parameters: testlist=sanity-sec env=ONLY=30b,ONLY_REPEAT=50,SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Test-Parameters: testlist=sanity-sec env=ONLY=30b,ONLY_REPEAT=50,SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Test-Parameters: testlist=sanity-sec env=ONLY=30b,ONLY_REPEAT=50,SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Test-Parameters: testlist=sanity-sec env=ONLY=30b,ONLY_REPEAT=50,SHARED_KEY=true mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaecc7321b12e61a266e97d3640a3288f0e7ec9dd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54349
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17448 lod: don't skip uninited components 02/54302/3
Alex Zhuravlev [Wed, 6 Mar 2024 18:00:51 +0000 (21:00 +0300)]
LU-17448 lod: don't skip uninited components

Don't skip uninitialized components during declaration, as we need
to declare potential llog records in case the component is created
later in this transaction.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia1cbfaae9b28e40fd68fa125d748ec0b5319f512
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54302
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks ago LU-17599 ldiskfs: restore ldiskfs patch attribution 47/54247/4
Shaun Tancheff [Fri, 8 Mar 2024 11:18:11 +0000 (18:18 +0700)]
LU-17599 ldiskfs: restore ldiskfs patch attribution

Over time various ports of ldiskfs patches to newer kernel
releases and distribution kernels have lost or confused the
original history and author of many patches.  It is also
helpful to have a summary for the reason behind each patch.

Thanks-to: Andreas Dilger <adilger@whamcloud.com>
for digging through the history of these patches.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ibf9dd8c9583816251836bd396acd4543116ccc1e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54247
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks ago LU-17606 build: remove el8.[0-3] kernel patch series 25/54325/2
Andreas Dilger [Fri, 8 Mar 2024 02:22:00 +0000 (19:22 -0700)]
LU-17606 build: remove el8.[0-3] kernel patch series

Remove el8.[0123] kernel patch series.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie6f12d868fd92299fecfa9277947b7d8883ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54325
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17606 build: remove el7.[678] kernel patch series 24/54324/4
Andreas Dilger [Fri, 8 Mar 2024 02:06:36 +0000 (19:06 -0700)]
LU-17606 build: remove el7.[678] kernel patch series

Remove the kernel patches for el7.[678] along with another
obsolete configure check in the ldiskfs tree.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie5ea5dd96fb7a9e7315e9a80117dad5de63ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54324
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17557 osd: only accounting inodes are special 91/54091/8
Alex Zhuravlev [Mon, 19 Feb 2024 08:18:45 +0000 (11:18 +0300)]
LU-17557 osd: only accounting inodes are special

Don't treat all inodes as special (system), because 5.14 turns the
filesystem read-only when we try to access a non-existent inode with
the LDISKFS_IGET_SPECIAL flag.

Fixes: 2c0b2b7540 ("LU-13166 osd-ldiskfs: fix to allow to get system inode")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0c05adaf7b94e04c094cb069e8271bf478010b8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54091
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks ago LU-17556 llite: removed dead branches 90/54090/3
Shaun Tancheff [Mon, 19 Feb 2024 07:53:07 +0000 (14:53 +0700)]
LU-17556 llite: removed dead branches

A few cases have disabled branches:
   if (0 && ...

Code disabled for many years should be removed.

Fixes: 39f63cf54c6 ("LU-4476 kernel: support process namespace containers")
Fixes: 99727c7a1a4 ("LU-4017 quota: add setting/getting project id function")
Fixes: c3e10ade1ee ("Moved IAM code from ldiskfs to OSD.")
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8f7ba09881d66845acea9fdf24f499fb7b5366fa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54090
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17546 osd: use __vfs_removexattr 72/54072/12
Alex Zhuravlev [Fri, 16 Feb 2024 05:31:41 +0000 (08:31 +0300)]
LU-17546 osd: use __vfs_removexattr

Otherwise, vfs_removexattr() takes the inode lock, which conflicts with
osd_execute_truncate(), while we don't really need the inode lock
because another per-object lock has already been taken.

Fixes: dcd5607ce0 ("LU-13430 vfs: add ll_vfs_getxattr/ll_vfs_setxattr compat macro")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I43c1c60d2a9f911b6395e1b7546507074a90b1cf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54072
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks ago LU-17499 llite: inode lock in ll_migrate() 41/54041/9
Alex Zhuravlev [Thu, 15 Feb 2024 16:24:12 +0000 (19:24 +0300)]
LU-17499 llite: inode lock in ll_migrate()

The inode lock should be taken after the data version check, as this
is the correct locking order used in other paths such as lseek.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0bafb8db215a2ea004928ff36049d8f053507c6f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54041
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks ago LU-17497 obdclass: check upcall incorrect values 78/53878/8
Sebastien Buisson [Thu, 1 Feb 2024 15:52:22 +0000 (16:52 +0100)]
LU-17497 obdclass: check upcall incorrect values

Identity upcall is set via lctl set_param mdt.*.identity_upcall=xxx,
and rsi upcall is set via lctl set_param sptlrpc.gss.rsi_upcall=xxx.
Possible values are a valid path to an executable, and also NONE to
disable identity upcall.
Add an upcall cache function that checks the user provided string, to
make sure we do not store an invalid value. And print a message to
stdout to explain the accepted values.
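
As a rough illustration of such a check (not the function added by
this patch), the sketch below accepts either the literal NONE or
something that looks like an absolute path. The name
upcall_value_is_valid(), the UPCALL_MAX_LEN limit, and the "must start
with '/'" rule are all assumptions made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define UPCALL_MAX_LEN 1024	/* hypothetical limit, for illustration only */

/* Return true if @val is "NONE" or looks like an absolute path. */
static bool upcall_value_is_valid(const char *val)
{
	size_t len;

	assert(val != NULL);
	if (strcmp(val, "NONE") == 0)
		return true;

	len = strlen(val);
	/* Assumption: a usable path is absolute, non-trivial, and bounded. */
	return len > 1 && len < UPCALL_MAX_LEN && val[0] == '/';
}
```

A caller would reject the user-provided string before storing it when
this returns false, for example for the placeholder value "xxx".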

Add sanity-sec test_69 to exercise this.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaf59e72aa1612f5579db175d8999dcf0053308ed
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53878
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 ptlrpc: ptlrpc: rename ptlrpc_req_finished 48/53648/7
Patrick Farrell [Thu, 11 Jan 2024 10:51:11 +0000 (16:21 +0530)]
LU-16741 ptlrpc: ptlrpc: rename ptlrpc_req_finished

First of a series of patches that rename ptlrpc_req_finished
to ptlrpc_req_put.

Change it as part of a general refactor of the ptlrpc
request put/freeing code.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I3f897b74debe383c4efb25c9a0becc1c27faa3d9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53648
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17344 lfsck: check the validity of the res_id 65/53565/2
Hongchao Zhang [Mon, 11 Dec 2023 19:54:19 +0000 (03:54 +0800)]
LU-17344 lfsck: check the validity of the res_id

While processing an object destroy request in LFSCK,
the incoming FID could generate an invalid ldlm_res_id even if
the dt_object is loaded successfully by this FID. This patch
adds checks for the validity of the generated res_id and
returns an error instead of triggering a panic.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ibc5d16c3c7781fd92c44e48960c3746be81739d5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53565
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16972 tests: createmany sets xattr 27/51827/6
Li Dongyang [Tue, 1 Aug 2023 04:16:36 +0000 (14:16 +1000)]
LU-16972 tests: createmany sets xattr

Add a -x option to make createmany set an xattr named
user.createmany when it opens and creates the files.
The xattr content is unique to avoid sharing on the
backend filesystem.
This could be useful for testing EA blocks and the ea_inode
feature.

Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Ic2effe1f8cc60065dfda649d4ce003d7f10a135c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51827
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-9325 obdclass: use match_table for server mount options 09/51209/11
James Simmons [Mon, 4 Mar 2024 20:31:43 +0000 (15:31 -0500)]
LU-9325 obdclass: use match_table for server mount options

We can greatly simplify lmd_parse() by using the match_table
API of the Linux kernel.

Change-Id: I0bc48da25553d9eb18c3cc188536d7dacd09cbd6
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51209
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-9859 lnet: move expr parsing from libcfs 45/50845/10
Mr NeilBrown [Sat, 9 Mar 2024 14:30:35 +0000 (09:30 -0500)]
LU-9859 lnet: move expr parsing from libcfs

The expr parsing is used for lnet and lnet-related data, so move it
into lnet/lnet.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4e65358babf6f522df6d7e9b3622c2b3e517bb7d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50845
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
5 weeks ago LU-15840 lov: return st_blocks=1 for HSM released files 91/47291/4
Qian Yingjin [Wed, 11 May 2022 08:21:12 +0000 (16:21 +0800)]
LU-15840 lov: return st_blocks=1 for HSM released files

The MDT will return st_blocks=1 for a HSM released file.
In the call ->coo_attr_get in LOV layer, the client should also
return st_blocks=1 for a HSM released file.

Otherwise, the client may get a block count of 0. It is very easy to
reproduce this problem via the following commands for an archived
file:
# $LFS hsm_restore $file
# $LFS hsm_release $file
# $LFS hsm_release $file
After releasing a file twice, the block count reported by the stat()
call becomes 0.

Change-Id: Id1841147e40a7df0ca615e887f324cff8e613f11
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47291
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
5 weeks ago LU-15714 pcc: reserve layout intent flags for PCCRO 81/46981/10
Qian Yingjin [Sat, 2 Apr 2022 09:26:29 +0000 (05:26 -0400)]
LU-15714 pcc: reserve layout intent flags for PCCRO

Add wirecheck checks for the PCCRO data structures
struct lu_pcc_attach, lu_pcc_detach, and lu_pcc_state.

Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: If8a414103ab13155aa483179247c81908b6ced69
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46981
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-6142 lnet: SPDX for lnet/lnet/ 52/54252/3
Timothy Day [Sat, 2 Mar 2024 22:15:24 +0000 (22:15 +0000)]
LU-6142 lnet: SPDX for lnet/lnet/

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I8ef4aaabdd0a0b89a60a9187756b451c67a43492
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54252
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-6142 misc: SPDX for lustre-iokit 80/54180/2
Timothy Day [Sun, 25 Feb 2024 19:39:04 +0000 (19:39 +0000)]
LU-6142 misc: SPDX for lustre-iokit

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I70f0c1387bcfe2dbe862a292ccb0f549e4e80d31
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54180
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-6142 lnet: SPDX for lnet/util/lnetconfig/ 72/54172/3
Timothy Day [Sun, 25 Feb 2024 02:39:17 +0000 (02:39 +0000)]
LU-6142 lnet: SPDX for lnet/util/lnetconfig/

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I25c57a0db2554f551bda0b9b5f6c03893ff83646
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-12452 socklnd: allow setting IP ToS value 80/54080/5
Etienne AUJAMES [Sat, 23 Mar 2024 19:43:38 +0000 (15:43 -0400)]
LU-12452 socklnd: allow setting IP ToS value

This patch adds a new tunable to set the IP "Type of Service" value for
TCP QoS.

It adds the module parameter "tos":
...
options ksocklnd tos=106

tos=-1 means "disable": the LND will not try to set the ToS value.
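
For readers unfamiliar with the socket option involved, here is a
user-space sketch of the same convention, assuming Linux. The helper
name sock_apply_tos() is hypothetical; ksocklnd sets the value from
kernel code, which is not shown here. Only the standard
setsockopt(IPPROTO_IP, IP_TOS) call is used.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Apply an IP ToS value to a socket.
 * tos == -1 means "disabled": leave the socket untouched and succeed.
 * Returns 0 on success (or when disabled), -1 on error or invalid value.
 */
static int sock_apply_tos(int fd, int tos)
{
	if (tos == -1)
		return 0;
	/* The ToS field is a single byte in the IPv4 header. */
	if (tos < 0 || tos > 255)
		return -1;
	return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}
```

Note that the kernel may adjust the low-order (ECN) bits of the
requested value for TCP sockets, so reading the option back with
getsockopt() is not guaranteed to return exactly what was written.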

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I15d3d1dfb645cc778763713c5018f66bea8567c6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54080
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17496 lnet: retry cleanup during shutdown 76/53876/4
Shaun Tancheff [Thu, 1 Feb 2024 08:20:14 +0000 (15:20 +0700)]
LU-17496 lnet: retry cleanup during shutdown

LNet can work a little harder to clean up during teardown to
avoid an assert on module removal.

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ic19c7774fa354b55bbfe21e8d87171dd024748c4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17602 mdd: use correct fid in mdd_rename 60/54260/6
Alex Zhuravlev [Mon, 4 Mar 2024 04:32:04 +0000 (07:32 +0300)]
LU-17602 mdd: use correct fid in mdd_rename

mdd_rename() can re-insert the target name as part of error
handling. Use the correct fid for that, not the target directory's own fid.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0662fa005459416b070157a2d049fcf5ed08ae91
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54260
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks ago LU-17525 tests: sanity/133a interop version checking 66/54466/5
Shaun Tancheff [Thu, 21 Mar 2024 04:43:43 +0000 (11:43 +0700)]
LU-17525 tests: sanity/133a interop version checking

sanity/133a with 2.15.4 and master fails with:
   Error: 'The open counter on mds1 is 1, not 2'

Add a version check for at least 2.15.62 to exclude the extra
checks when the MDS does not have v2_15_61-63-g055f939979
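
The shape of such a version gate can be sketched in C as below. The
test suite itself is shell, so this is only an illustration: the
version_code() packing (one byte per component) and the helper
mds_counts_all_opens() are assumptions made for the example, not the
test framework's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packing: one byte each for major, minor and patch. */
static uint32_t version_code(unsigned int major, unsigned int minor,
			     unsigned int patch)
{
	assert(major < 256 && minor < 256 && patch < 256);
	return ((uint32_t)major << 16) | ((uint32_t)minor << 8) | patch;
}

/* Run the extra open-counter checks only against new enough servers. */
static bool mds_counts_all_opens(uint32_t mds_version)
{
	return mds_version >= version_code(2, 15, 62);
}
```

With this gate, a 2.15.4 server skips the extra checks, while 2.15.62
and later run them.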

Test-Parameters: testlist=sanity env=ONLY=133a clientarch=aarch64 clientdistro=el8.8
Test-Parameters: testlist=sanity env=ONLY=133a serverversion=2.15.4 serverdistro=el8.8
Test-Parameters: testlist=sanity env=ONLY=133a clientarch=aarch64 clientdistro=el8.8 serverversion=2.15.4 serverdistro=el8.8
Fixes: 055f939979 ("LU-17481 mdt: count all opens in mdt.*.md_stats")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ibb6eca7a5dcf295b419f7025a0167d70babe0f1f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54466
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 weeks ago LU-17675 tests: skip sanity-flr/61a for el9.3 12/54612/2
Andreas Dilger [Thu, 28 Mar 2024 22:05:24 +0000 (16:05 -0600)]
LU-17675 tests: skip sanity-flr/61a for el9.3

The atime update appears to be broken in this kernel, skip the test
for now.

Test-Parameters: trivial testlist=sanity env=ONLY=61 clientdistro=el9.3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2e27e2fbaa9a6a9e11049c4629b10998b3824c12
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54612
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks ago LU-17392 build: compatibility updates for kernel 6.7 21/53621/17
Shaun Tancheff [Thu, 29 Feb 2024 20:55:58 +0000 (03:55 +0700)]
LU-17392 build: compatibility updates for kernel 6.7

Linux commit v6.6-rc4-53-gc42d50aefd17
  mm: shrinker: add infrastructure for dynamically allocating
      shrinker

Users of struct shrinker must dynamically allocate shrinker objects
to avoid run-time warnings.

Provide a wrapper for older kernels to alloc+register shrinkers
and unregister+free.

Use get_group_info() and put_group_info() wrappers instead of
open coding the reference counting on group_info.usage

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie07bdb7fe3eb6060bd84f95f860f1b53d120a605
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53621
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
6 weeks agoLU-17243 build: compatibility updates for kernel 6.6 08/52908/20
Shaun Tancheff [Thu, 29 Feb 2024 20:33:54 +0000 (03:33 +0700)]
LU-17243 build: compatibility updates for kernel 6.6

linux kernel v5.19-rc1-4-gc4f135d64382
  workqueue: Wrap flush_workqueue() using a macro
linux kernel v6.5-rc1-7-g20bdedafd2f6
  workqueue: Warn attempt to flush system-wide workqueues.
If __flush_workqueue(system_wq) is not available fall back to
flush_scheduled_work()

linux kernel v6.5-rc1-92-g13bc24457850
  fs: rename i_ctime field to __i_ctime
Use accessors for ctime. Provide replacements for older
kernels.

linux kernel v6.5-rc1-95-g0d72b92883c6
  fs: pass the request_mask to generic_fillattr
Provide request_mask argument where needed.

Linux commit v6.5-rc2-20-g2ddd3cac1fa9
  nsproxy: Convert nsproxy.count to refcount_t
Provide a wrapper for inc/dec of nsproxy.count

linux kernel v6.5-rc4-110-gcf95e337cb63
  mm: delete mmap_write_trylock() and vma_try_start_write()
Use down_write_trylock() directly instead of mmap_write_trylock().

In preparation for kernel 6.7 the remaining inode time
accessors will be preferred:

linux kernel v6.6-rc5-86-g12cd44023651
  fs: rename inode i_atime and i_mtime fields
Use accessors for atime and mtime. Provide replacements for
older kernels.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ide6c2e3e8db532449850b145c2d61b972d21f649
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52908
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
6 weeks agoLU-17081 build: Prefer folio_batch to pagevec 59/52259/25
Shaun Tancheff [Tue, 5 Mar 2024 03:15:54 +0000 (10:15 +0700)]
LU-17081 build: Prefer folio_batch to pagevec

Linux commit v5.16-rc4-36-g10331795fb79
  pagevec: Add folio_batch

Linux commit v6.2-rc4-254-g811561288397
  mm: pagevec: add folio_batch_reinit()

Linux commit v6.4-rc4-438-g1e0877d58b1e
  mm: remove struct pagevec

Use folio_batch and provide wrappers for older kernels to use
pagevec handling, conditionally provide a folio_batch_reinit

Add macros to ease adding pages to folio_batch(es) as well
as unwinding batches of struct folio where struct page is
needed.

HPE-bug-id: LUS-11811
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie70e4851df00a73f194aaa6631678b54b5d128a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52259
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
6 weeks agoLU-6142 osd-zfs: Fix style issues for osd_quota.c 65/54265/2
Arshad Hussain [Mon, 4 Mar 2024 06:44:59 +0000 (01:44 -0500)]
LU-6142 osd-zfs: Fix style issues for osd_quota.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_quota.c

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Idb21a3004fb6cb711b2b6a48b4bba735f28ecb31
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54265
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-6142 osd-zfs: Fix style issues for osd_scrub.c 61/54261/5
Arshad Hussain [Mon, 4 Mar 2024 05:44:43 +0000 (11:14 +0530)]
LU-6142 osd-zfs: Fix style issues for osd_scrub.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_scrub.c

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5b14e20f07e43b36cf974fa358c49661a569ef8b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54261
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-6142 osd: Fix style issues for osd_oi.c 55/54255/6
Arshad Hussain [Sun, 3 Mar 2024 17:38:57 +0000 (23:08 +0530)]
LU-6142 osd: Fix style issues for osd_oi.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_oi.c

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5a1292790007f921c206803dec5230ebda16ebf9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54255
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-6142 osd: Fix style issues for osd_index.c 31/54231/2
Arshad Hussain [Fri, 1 Mar 2024 08:54:21 +0000 (14:24 +0530)]
LU-6142 osd: Fix style issues for osd_index.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_index.c

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: If2fa688443dd2d7528b6c2551f4f6cd21e39e8b3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54231
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-6142 mdd: Fix style issues for mdd_orphans.c 26/54226/4
Arshad Hussain [Fri, 1 Mar 2024 03:49:15 +0000 (09:19 +0530)]
LU-6142 mdd: Fix style issues for mdd_orphans.c

This patch fixes issues reported by checkpatch
for file lustre/mdd/mdd_orphans.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I4edbce9d25093afe5a6ab5293e9872b8f3681ae7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54226
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 weeks agoLU-6142 lfsck: Fix style issues for lfsck_lib.c 15/54215/2
Arshad Hussain [Thu, 29 Feb 2024 01:47:26 +0000 (07:17 +0530)]
LU-6142 lfsck: Fix style issues for lfsck_lib.c

This patch fixes issues reported by checkpatch
for file lustre/lfsck/lfsck_lib.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I10e8d63c36221ffddc2258b54c86d2f64092c0ce
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54215
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-6142 lfsck: Fix style issues for lfsck_layout.c 74/54174/2
Arshad Hussain [Sun, 25 Feb 2024 04:15:09 +0000 (09:45 +0530)]
LU-6142 lfsck: Fix style issues for lfsck_layout.c

This patch fixes issues reported by checkpatch
for file lustre/lfsck/lfsck_layout.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5d6f8156f4552ef8b89248a7a349eb43b74d7b9f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54174
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-6142 selftest: SPDX for lnet/selftest/ 71/54171/2
Timothy Day [Sun, 25 Feb 2024 02:20:30 +0000 (02:20 +0000)]
LU-6142 selftest: SPDX for lnet/selftest/

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I131a74e146c02e601c2474f648550b1bedf37a28
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54171
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-6142 llite: Fix style issues for llite_mmap.c 43/54143/2
Arshad Hussain [Thu, 22 Feb 2024 07:56:22 +0000 (13:26 +0530)]
LU-6142 llite: Fix style issues for llite_mmap.c

This patch fixes issues reported by checkpatch
for file lustre/llite/llite_mmap.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I9ead23aea0b13f91427a208feac3390d241f1b07
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54143
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-6142 osd: Fix style issues for lov_cl_internal.h 24/54124/2
Arshad Hussain [Wed, 21 Feb 2024 11:38:00 +0000 (17:08 +0530)]
LU-6142 osd: Fix style issues for lov_cl_internal.h

This patch fixes issues reported by checkpatch
for file lustre/lov/lov_cl_internal.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8a2a8266745cf75e819698ea794d27edf32fa8c2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54124
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17572 lov: remove noise from lov_init_sub() 25/54125/3
Alex Zhuravlev [Wed, 21 Feb 2024 11:47:32 +0000 (14:47 +0300)]
LU-17572 lov: remove noise from lov_init_sub()

lov_init_sub() generates too many messages in applications like racer.
Make it a bit less noisy.

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I00ae75597b550c29122d8fb9d34d4e0d24c38dd5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54125
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 weeks agoLU-17632 o2iblnd: graceful handling of CM_EVENT_CONNECT_ERROR 53/54353/2
Serguei Smirnov [Mon, 11 Mar 2024 17:59:29 +0000 (10:59 -0700)]
LU-17632 o2iblnd: graceful handling of CM_EVENT_CONNECT_ERROR

There were examples in the field with RoCE setups which demonstrate
that RDMA_CM_EVENT_CONNECT_ERROR may be received when conn state
is neither IBLND_CONN_ACTIVE_CONNECT nor IBLND_CONN_PASSIVE_WAIT.
Handle this in a more graceful manner: report the event as unexpected
and allow the flow to continue.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I58b2482207cfd821f6eac142bdefc8f5bc50f8b4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54353
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-14869 test: improve sanity-flr/200a 45/54345/2
Bobi Jam [Mon, 11 Mar 2024 02:47:25 +0000 (10:47 +0800)]
LU-14869 test: improve sanity-flr/200a

Make sure "flock -x" successfully returned before running mirror
resync so that it won't get into running read holding shared flock.

Test-Parameters: trivial testlist=sanity-flr env=ONLY=200a,ONLY_REPEAT=10
Test-Parameters: trivial testlist=sanity-flr env=ONLY=200a,ONLY_REPEAT=10
Test-Parameters: trivial testlist=sanity-flr env=ONLY=200a,ONLY_REPEAT=10
Test-Parameters: trivial testlist=sanity-flr env=ONLY=200a,ONLY_REPEAT=10
Test-Parameters: trivial testlist=sanity-flr env=ONLY=200a,ONLY_REPEAT=10
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I6383af5d5761980d24af19efd4a4ac899f369a7d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54345
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17621 lnet: fix conns_per_peer bounds check 26/54326/4
Arshad Hussain [Fri, 8 Mar 2024 03:28:43 +0000 (08:58 +0530)]
LU-17621 lnet: fix conns_per_peer bounds check

Logical operator '||' would always result in 'TRUE',
allowing any arbitrary conns_per_peer value to be set.
Change Logical operator from '||' to '&&' to correctly
compare that the value is within range.
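
As an illustration only, the following user-space C sketch shows why the
'||' form accepts every value. The function names and the bounds 1..127
are hypothetical and are not taken from the LNet code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bounds, for illustration only. */
#define DEMO_CONNS_MIN 1
#define DEMO_CONNS_MAX 127

/* Buggy form: any int is either >= MIN or <= MAX, so this is always true. */
bool demo_in_range_buggy(int value)
{
	return value >= DEMO_CONNS_MIN || value <= DEMO_CONNS_MAX;
}

/* Fixed form: both comparisons must hold for the value to be in range. */
bool demo_in_range_fixed(int value)
{
	return value >= DEMO_CONNS_MIN && value <= DEMO_CONNS_MAX;
}

/* Small self-check that contrasts the two forms. */
void demo_in_range_check(void)
{
	assert(demo_in_range_buggy(-5));   /* out of range, yet accepted */
	assert(demo_in_range_buggy(1000)); /* out of range, yet accepted */
	assert(!demo_in_range_fixed(-5));
	assert(!demo_in_range_fixed(1000));
	assert(demo_in_range_fixed(4));
}
```

Calling demo_in_range_check() from a test program exercises both forms.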

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 9b05e872482e ("LU-10391 lnet: support updating LNet local NI settings")
CoverityID: 415060 ("Logically dead code")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ibcaf18060cae1fc62fe41ee6237abaad1fd2de7f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54326
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 weeks agoLU-17623 libcfs: save msg_fn pointer to avoid race 20/54320/2
Andreas Dilger [Thu, 7 Mar 2024 22:00:31 +0000 (15:00 -0700)]
LU-17623 libcfs: save msg_fn pointer to avoid race

Save msgdata->msg_fn pointer at the start of libcfs_debug_msg() to
avoid a race condition if another thread calls CDEBUG_WITH_LOC()
at the same time and has a different calling function name.  The
msg_file pointer was already being saved.  Otherwise it is possible
to fail the __LASSERT(debug_buf == string_buf) check if formatted
string length changes between prep and write passes.

Use existing header.ph_mask and .ph_line instead of duplicating them.

Test-Parameters: testlist=sanity-scrub env=ONLY=4d,ONLY_REPEAT=25
Fixes: 1a9bd41846 ("LU-14518 libcfs: print CFS_FAIL_CHECK() location")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3fd30cc9eed2ec8dabd795e9622fe1908a3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54320
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17612 gss: always try to unlink key in error 16/54316/4
Sebastien Buisson [Thu, 7 Mar 2024 15:30:59 +0000 (16:30 +0100)]
LU-17612 gss: always try to unlink key in error

In case of error in context negotiation carried out in userspace,
always try to unlink key to avoid leaking it.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic771f1e4f1b6474caaa89f63c3b02678e163d3d3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54316
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17057 tests: Fix "...endpoint shutdown" under sanity-sec 11/54311/2
Sebastien Buisson [Thu, 7 Mar 2024 08:28:53 +0000 (13:58 +0530)]
LU-17057 tests: Fix "...endpoint shutdown" under sanity-sec

This patch fixes test_0 failing with "Cannot send after
transport endpoint shutdown" by introducing wait_ssk()
in sec_setup() to deterministically apply the SSK flavor.

Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia14021ab82913507df02dbb5a12c8596663f15d9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54311
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17612 sec: return keyring errors to userspace 96/54296/3
Aurelien Degremont [Tue, 5 Mar 2024 08:29:23 +0000 (09:29 +0100)]
LU-17612 sec: return keyring errors to userspace

In current code, Linux keyring errors, when using GSS Kerberos,
are all masked under a generic ECONNREFUSED error. That makes
it hard to understand the root cause of the problem
for the I/O caller.

Update the code to propagate errors from request_key() up to
the application.

struct ptlrpc_cli_ctx * gss_sec_lookup_ctx_kr(...) is modified
to now return a NULL pointer or -errval. This is tested by callers
and propagated. NULL values are still converted to ECONNREFUSED.
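
The "NULL pointer or -errval" convention can be sketched in user space as
follows. This is not the patch itself: the demo_* helpers are simplified
stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and the lookup
function, its arguments and the EACCES error used in the check are
hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's ERR_PTR helpers. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int demo_is_err(const void *ptr)
{
	/* Encoded errors occupy the top DEMO_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_ctx {
	int id;
};

static struct demo_ctx demo_valid_ctx = { 1 };

/* Hypothetical lookup: key_err != 0 models a failing request_key(). */
struct demo_ctx *demo_lookup_ctx(int key_err, int found)
{
	if (key_err != 0)
		return demo_err_ptr(-key_err);
	return found ? &demo_valid_ctx : NULL;
}

/* Caller: propagate encoded errors, still map NULL to -ECONNREFUSED. */
int demo_refresh_ctx(int key_err, int found)
{
	struct demo_ctx *ctx = demo_lookup_ctx(key_err, found);

	if (ctx == NULL)
		return -ECONNREFUSED;
	if (demo_is_err(ctx))
		return (int)demo_ptr_err(ctx);
	return 0;
}

/* Small self-check covering the three outcomes. */
void demo_refresh_check(void)
{
	assert(demo_refresh_ctx(0, 1) == 0);
	assert(demo_refresh_ctx(0, 0) == -ECONNREFUSED);
	assert(demo_refresh_ctx(EACCES, 1) == -EACCES);
}
```

The point of the sketch is that a single pointer return value carries
three outcomes: a valid context, "not found" (NULL), and a specific error.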

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Change-Id: I13792f141a961036bc9f7629a4a2db692e245c41
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54296
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 weeks agoLU-930 ptlrpc: remove spurious error messages 85/54285/2
Andreas Dilger [Tue, 5 Mar 2024 22:01:14 +0000 (15:01 -0700)]
LU-930 ptlrpc: remove spurious error messages

Stop errors being printed when a disconnect RPC times out, since
this error is ignored anyway, and adds no value to be printed.

Remove -61 = -ENODATA error from seq_server_init() since this is an
"expected" event for a new target and the error code is not useful.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0f695409617061d46a1b910108cda05f863ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54285
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17609 sec: nodemap readonly_mount for remount 82/54282/3
Sebastien Buisson [Tue, 5 Mar 2024 13:43:02 +0000 (14:43 +0100)]
LU-17609 sec: nodemap readonly_mount for remount

The readonly_mount property on nodemaps forces read-only mount from
clients. Clients trying rw remount (via mount -o remount,rw) should
also be forced to read-only.

Also improve sanity-sec test_61 to exercise client remount.

Fixes: e7ce67de92 ("LU-15451 sec: read-only nodemap flag")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I61f8141001d2ff9e832e5c93d8f5997479af98a6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54282
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17317 sec: fix sanity-sec test_28 80/54280/2
Sebastien Buisson [Mon, 4 Mar 2024 09:41:08 +0000 (10:41 +0100)]
LU-17317 sec: fix sanity-sec test_28

Improve sanity-sec test_28 to verify that srpc_contexts is valid
YAML output.
Also remove the ctx information from the output, as printing out a
kernel pointer is not ideal.

Fixes: f6687bafcb ("LU-17317 sec: add srpc_serverctx proc file")
Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie48dc61adfd5017a2313981f27407c9d3b69dd71
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54280
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 weeks agoLU-17606 ldiskfs: remove old el8.[123] patches/series 72/54272/3
Jian Yu [Tue, 5 Mar 2024 00:17:06 +0000 (16:17 -0800)]
LU-17606 ldiskfs: remove old el8.[123] patches/series

Remove the old ldiskfs el8.[123] patch series files, and the resulting
patch files that are no longer referenced by any patch series file:

    ./contrib/scripts/clearpatches.sh -d ldiskfs/kernel_patches

Test-Parameters: trivial
Change-Id: I0de7302e9de06ad557e9b35d7eea1b9c4084ecae
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54272
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17606 ldiskfs: remove old el7.[678] patches/series 71/54271/2
Andreas Dilger [Mon, 4 Mar 2024 22:11:32 +0000 (15:11 -0700)]
LU-17606 ldiskfs: remove old el7.[678] patches/series

Remove the old ldiskfs el7.[678] patch series files, and the resulting
patch files that are no longer referenced by any patch series file:

    ./contrib/scripts/clearpatches.sh -d

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6727d88af1261c9b4090984ad8cab51a5dce7057
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54271
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17605 obdclass: do not wait forever acquiring entry 69/54269/2
Sebastien Buisson [Mon, 4 Mar 2024 16:26:30 +0000 (17:26 +0100)]
LU-17605 obdclass: do not wait forever acquiring entry

The process of refreshing an entry via refresh_entry() goes through
an upcall/downcall. If the upcall succeeds, we enter a wait queue.
If after that the downcall is never called, we hit the expiry timeout,
and we get removed from the wait queue.
But if the entry is not new, the expiry time will be
MAX_SCHEDULE_TIMEOUT == LONG_MAX, which means an infinite wait.
So avoid waiting forever if an entry could not be refreshed, and call
wake_up_all() if the wait for the ACQUIRING state failed.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I50ee59654adc221027c79cb68fa182b9abed50fa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54269
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17510 obdclass: fix wake up when queuing close request. 59/54259/2
Mr NeilBrown [Mon, 4 Mar 2024 02:15:17 +0000 (13:15 +1100)]
LU-17510 obdclass: fix wake up when queuing close request.

The waitqueue for requests that need to be sent but that haven't been
allocated a slot is kept ordered by request arrival for fairness.  So
new requests are added to the end.

For requests other than 'close' there is a limit to the number of
active requests (slots) and requests are assigned to slot on a
first-come-first-served basis, so they are simply removed from the
head of the list.

For 'close' requests it is important that these not block indefinitely
behind other requests so there is one slot that can only be used
by a close request - and only if no other slots are used by a close
request.  These requests do not follow a strict FIFO order.

When a non-"close" request completes we wake the first request on the
list.  There is no point searching all the way down the list for a
close request that could also be woken.  We only do that when a
"close" request completes.  This optimises the common case.

However: when a request is first queued we add it to the end of the
queue and then wake up the first deserving request if there is one.
When there are free slots, this is expected to wake the request just
queued.  When there are no free slots, nothing is woken.

When a "close" request is queued and added to the end of the queue
after other non-close requests, we need to potentially search to the
end of the queue for a close request to wake, just as we do when a
close request completes.  Unfortunately we don't.  This can result in
a close request blocking indefinitely.

So: change the wakeup in obd_get_mod_rpc_slot() to match the wakeup in
obd_put_mod_rpc_slot().  This ensures consistent handling and in
particular will handle a close request immediately if there are no
other close requests in flight.
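
The difference between the two wake-up strategies can be modelled with a
small user-space sketch. This is a simplified model built from the
description above, not the code in obd_get_mod_rpc_slot(); all demo_*
names and the slot accounting are assumptions made for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_waiter {
	bool is_close;
};

struct demo_slots {
	int max_regular;     /* slots usable by any request */
	int in_flight;       /* requests currently holding a slot */
	int close_in_flight; /* close requests currently holding a slot */
};

/*
 * A request may claim a slot if a regular slot is free, or, for a close
 * request, if no other close request is in flight (the reserved slot).
 */
bool demo_can_claim(const struct demo_slots *s, bool is_close)
{
	if (s->in_flight < s->max_regular)
		return true;
	return is_close && s->close_in_flight == 0;
}

/*
 * Return the index of the first waiter that can be woken, or -1.
 * With scan_all == false only the head of the queue is considered.
 */
int demo_pick_waiter(const struct demo_slots *s,
		     const struct demo_waiter *queue, size_t count,
		     bool scan_all)
{
	size_t limit = scan_all ? count : (count > 0 ? 1 : 0);

	for (size_t i = 0; i < limit; i++)
		if (demo_can_claim(s, queue[i].is_close))
			return (int)i;
	return -1;
}

/* Regular slots full, no close in flight, close queued behind a non-close. */
void demo_pick_waiter_check(void)
{
	struct demo_slots s = { 2, 2, 0 };
	struct demo_waiter queue[] = { { false }, { true } };

	assert(demo_pick_waiter(&s, queue, 2, false) == -1); /* close stuck */
	assert(demo_pick_waiter(&s, queue, 2, true) == 1);   /* close woken */
}
```

In this model, checking only the head of the queue leaves the close
request waiting even though the reserved slot is free, while scanning the
queue finds it.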

Clarify the comment in claim_mod_rpc_function() and perform minor
code cleanup there.

Fixes: b5fde4d6c023 ("LU-17197 obdclass: preserve fairness when waiting for rpc slot")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7b658efc0298a091166f0f18ce460fc3148047eb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54259
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17606 ldiskfs: remove old compatibility code 74/54274/3
Andreas Dilger [Tue, 5 Mar 2024 03:10:58 +0000 (20:10 -0700)]
LU-17606 ldiskfs: remove old compatibility code

The JOURNAL_START_HAS_3ARGS and EXT4_HT_MISC checks are always true
for kernels since 3.10, and no longer have any value to keep around.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I39333026b1f24c3d60fbc3f8c51be693353ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54274
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
6 weeks agoLU-17563 kernel: update SLES15 SP5 [5.14.21-150500.55.49.1] 40/54240/2
Jian Yu [Fri, 1 Mar 2024 22:39:04 +0000 (14:39 -0800)]
LU-17563 kernel: update SLES15 SP5 [5.14.21-150500.55.49.1]

Update SLES15 SP5 kernel to 5.14.21-150500.55.49.1 for Lustre client.

Test-Parameters: trivial mdtcount=4 mdscount=2 \
  clientdistro=sles15sp5 testlist=sanity

Test-Parameters: optional clientdistro=sles15sp5 testgroup=full-part-1
Test-Parameters: optional clientdistro=sles15sp5 testgroup=full-part-2
Test-Parameters: optional clientdistro=sles15sp5 testgroup=full-part-3

Change-Id: I23868ff25ae093a52f004e556789805a644832ac
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54240
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17593 kernel: update RHEL 8.9 [4.18.0-513.18.1.el8_9] 38/54238/2
Jian Yu [Fri, 1 Mar 2024 19:22:41 +0000 (11:22 -0800)]
LU-17593 kernel: update RHEL 8.9 [4.18.0-513.18.1.el8_9]

Update RHEL 8.9 kernel to 4.18.0-513.18.1.el8_9.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
  testgroup=full-part-3

Change-Id: I2c928e4c08af278dacce1d1dc7a14fa77ffffa33
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54238
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17561 kernel: update RHEL 9.3 [5.14.0-362.18.1.el9_3] 36/54236/2
Jian Yu [Fri, 1 Mar 2024 18:23:10 +0000 (10:23 -0800)]
LU-17561 kernel: update RHEL 9.3 [5.14.0-362.18.1.el9_3]

Update RHEL 9.3 kernel to 5.14.0-362.18.1.el9_3.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.2 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.2 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el9.2 serverdistro=el9.3 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.2 serverdistro=el9.3 testlist=sanity

Test-Parameters: optional clientdistro=el9.3 serverdistro=el9.3 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el9.3 serverdistro=el9.3 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el9.3 serverdistro=el9.3 \
  testgroup=full-part-3

Change-Id: Iddfe57197d854e0be864c0ce64699f92fcc181d1
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54236
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17589 flock: Flock blocking information becomes stale 19/54219/3
Andriy Skulysh [Fri, 18 Aug 2023 11:23:25 +0000 (14:23 +0300)]
LU-17589 flock: Flock blocking information becomes stale

Blocking information keeps pointing to an already cancelled lock.

Find a new blocker on each reprocess.

Change-Id: I8d353795170f4fd0ae55dd646035cf8feb4cc162
HPE-bug-id: LUS-11784, LUS-11999
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54219
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-11085 tests: Add performance test for ldlm_extent code 04/54204/17
Mr NeilBrown [Tue, 20 Feb 2024 00:57:29 +0000 (11:57 +1100)]
LU-11085 tests: Add performance test for ldlm_extent code

Add a new test module "ldlm_extent" which exercises the extent code by
creating multiple extent locks, and discarding them.
Each run is timed and a number of runs are combined to provide a
mean and standard deviation.

Two different tests are performed, each with a ramp in the number of
locks kept so that any scalability issues become visible:
1/ create lots of non-overlapping extents in
   random order, keeping up to 8000 at a time.
2/ create both random tiny extents and whole-file
   extents, alternating.  Keep up to 1,000,000.
   These are PR and so don't conflict.

Each test runs for at most 5 minutes
(30 loops of 10 seconds each = 300 seconds).
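
As an illustration of the timing summary described above, here is a minimal
Python sketch (not the kernel test module itself) that combines per-run
timings into a mean and standard deviation. The function name
summarize_runs and the sample timings are hypothetical, and the use of the
sample (n - 1) deviation is an assumption, since the message does not say
which form the module reports.

```python
import math


def summarize_runs(durations):
    """Return (mean, sample standard deviation) of per-run timings."""
    n = len(durations)
    if n < 2:
        raise ValueError("need at least two runs")
    mean = sum(durations) / n
    # Sample variance (divide by n - 1); an assumption of this sketch.
    variance = sum((d - mean) ** 2 for d in durations) / (n - 1)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    mean, stddev = summarize_runs([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    print(f"mean={mean:.3f} stddev={stddev:.3f}")
```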

Test-Parameters: trivial env=SLOW=yes env=ONLY=842 testlist=sanity
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I552da3c64fb467cbefb7d25eee709dd038bd454f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54204
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 weeks agoLU-17566 mdt: move squash code in new/old_init_ucred 94/54194/7
Aurelien Degremont [Tue, 27 Feb 2024 12:20:33 +0000 (13:20 +0100)]
LU-17566 mdt: move squash code in new/old_init_ucred

Move the uid/gid squashing code to the same place,
at the bottom of the function, to make later code refactoring
simpler.

The squashing code mostly clears suppgids from ucred,
and no code used them between the old and new positions in
the function, so the move should be safe.

Handle suppgids clearing the same way for both functions
and for both UID and GID squashing.

Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: I29669af26cf68491bf1b6020548116acf318c0c7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54194
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17571 tests: set idle_timeout in sanity 77l, 812[ab], 816 23/54123/2
Vladimir Saveliev [Wed, 21 Feb 2024 11:15:24 +0000 (14:15 +0300)]
LU-17571 tests: set idle_timeout in sanity 77l, 812[ab], 816

sanity.sh:test_77l,812a,812b,816 rely on idle_timeout being set to a
non-zero value. Have the tests take care of that.

Test-Parameters: trivial
HPE-bug-id: LUS-10951
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I6c0cf3e7264263221b5ec54292673868f4bda25c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54123
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-14535 quota: free lvbo in a wq 07/54107/6
Sergey Cheremencev [Sat, 20 Jan 2024 06:38:38 +0000 (14:38 +0800)]
LU-14535 quota: free lvbo in a wq

The mutex lqe_glbl_data_lock held in
qmt_lvbo_free might cause sleeping
while atomic if
cfs_hash_for_each_relax takes a
spinlock at an upper layer:

 BUG: sleeping function called from invalid context at kernel/mutex.c:104
 ...
 Call Trace:
 dump_stack+0x19/0x1b
 __might_sleep+0xd9/0x100
 mutex_lock+0x20/0x40
 qmt_lvbo_free+0xc7/0x380 [lquota]
 mdt_lvbo_free+0x12d/0x140 [mdt]
 ldlm_resource_putref+0x189/0x250 [ptlrpc]
 ldlm_lock_put+0x1c8/0x760 [ptlrpc]
 ldlm_export_lock_put+0x12/0x20 [ptlrpc]
 cfs_hash_for_each_relax+0x3ff/0x450 [libcfs]
 cfs_hash_for_each_empty+0x9a/0x210 [libcfs]
 ldlm_export_cancel_locks+0xc2/0x1a0 [ptlrpc]
 ldlm_bl_thread_main+0x7c8/0xb00 [ptlrpc]
 kthread+0xe4/0xf0
 ret_from_fork_nospec_begin+0x7/0x21

Move freeing of lvbo to a workqueue. This
patch can probably be reverted as soon
as https://review.whamcloud.com/45882
lands.

Fixes: 1dbcbd70f8 ("LU-15021 quota: protect lqe_glbl_data in lqe")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I56aee72a7adbc6514b40689bae30669e607b5ecd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54107
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 weeks agoLU-17555 ptlrpc: removed unused lm_repsize accessors 89/54089/6
Shaun Tancheff [Sat, 24 Feb 2024 03:00:58 +0000 (10:00 +0700)]
LU-17555 ptlrpc: removed unused lm_repsize accessors

ptlrpc_req_get_repsize() and ptlrpc_req_set_repsize() are unused.

ptlrpc_req_set_repsize() is superseded by
ptlrpc_request_set_replen() so remove the unused variants.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib89a57c00605b110ec28040614aecba9826f5ffa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54089
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 weeks agoLU-17517 lov: fix large shift in lov_update_statfs() 43/54043/4
Timothy Day [Wed, 14 Feb 2024 06:17:00 +0000 (06:17 +0000)]
LU-17517 lov: fix large shift in lov_update_statfs()

UBSAN detected:

 shift exponent 65 is too large for
 64-bit type 'long long unsigned int'

in lov_update_statfs() in lov_request.c. This
patch caps shift at 32 since os_bsize is of
type u32. This avoids the invocation of
undefined behavior.
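
The effect of capping the shift can be modeled in a few lines. This is a
Python sketch of the arithmetic only, not the code in lov_request.c:
scale_blocks is a hypothetical helper, the right-shift direction is an
assumption made for illustration, and the 64-bit mask merely mimics a C
unsigned long long because Python integers never overflow.

```python
U64_MASK = (1 << 64) - 1
MAX_SHIFT = 32  # os_bsize is a u32, so a larger shift is not meaningful


def scale_blocks(blocks, shift):
    """Scale a 64-bit block count by a shift capped at MAX_SHIFT.

    In C, shifting a 64-bit value by 64 or more bits is undefined
    behavior; capping the exponent keeps the operation well defined.
    """
    shift = min(shift, MAX_SHIFT)
    return (blocks & U64_MASK) >> shift


if __name__ == "__main__":
    # An out-of-range exponent such as 65 is treated as 32.
    print(scale_blocks(1 << 40, 65))
```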

Reported-by: Ake Sandgren <ake.sandgren@hpc2n.umu.se>
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I9ed6dd145279631e8a362c85c6fd46f147ab6946
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54043
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-12019 build: remove global depmod.d conf from debs 25/54025/2
Timothy Day [Tue, 13 Feb 2024 17:40:25 +0000 (17:40 +0000)]
LU-12019 build: remove global depmod.d conf from debs

Lustre should not be creating a global depmod.d configuration
file that affects the load order of all modules installed
on the system.

Yet, Lustre has a depmod.d configuration file that attempts
to mirror the default configuration (the man page is
identical for Debian):

https://manpages.ubuntu.com/manpages/xenial/en/man5/depmod.d.5.html

 "By default, depmod will give a higher priority to a
  directory with the name updates using this built-in
  search string: "updates built-in" but more complex
  arrangements are possible and are used in several
  popular distributions."

However, when we switched from:

 search updates built-in

to:

 search updates/kernel built-in

Ubuntu depmod was forced to prefer built-in modules, since
the modules are not in `updates/kernel` as in Debian. If a
user has third-party modules installed on their system, this
could make Ubuntu load the wrong module by default.

This patch removes the lustre.conf depmod.d file. By
leaving these load-order decisions to the distribution, this
patch addresses the regression on Ubuntu.

Fixes: 7ea4e0c ("LU-12019 build: Recognize Debian Kernel and set KMP dir")
Test-Parameters: trivial clientdistro=ubuntu2204
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I96fe1c0e64c48d045d46d62a10e8c8bd6ad2cb7d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54025
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17463 osc: add option to disable page cache shrinker 95/53795/8
Qian Yingjin [Wed, 24 Jan 2024 02:43:38 +0000 (21:43 -0500)]
LU-17463 osc: add option to disable page cache shrinker

Pages mapped into VM_LOCKED [mlock()ed] VMAs are unevictable
and are marked with PG_mlocked.
However, the page cache shrinker in Lustre treats all cached pages
equally, even though some of them are unevictable. It may wrongly
evict pages locked by mlock() or mlockall() calls.

This patch adds a tunable option to enable or disable the page cache
shrinker:
- osc.*.enable_page_cache_shrink
It is enabled by default.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I23ebf6d438a71c7917b0cb3375407a64587e15db
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53795
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-16194 tests: set minversion of MDS for sanity/65q 71/53771/4
Lei Feng [Tue, 23 Jan 2024 04:01:21 +0000 (12:01 +0800)]
LU-16194 tests: set minversion of MDS for sanity/65q

There are two sanity/65p tests; rename one to 65q.
Old versions of the MDS are not expected to check for negative
start/end, so check the MDS version in 65q.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity env=ONLY=65 serverversion=2.15
Change-Id: I1cb7716c37a349f441ed248613f569dd5ab78330
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53771
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-13642 lnet: Allow dynamic IP specification 05/53605/6
James Simmons [Fri, 8 Mar 2024 21:54:57 +0000 (16:54 -0500)]
LU-13642 lnet: Allow dynamic IP specification

Currently an NI can only be set up using the device interface.
A device interface may have more than one IP
address. This change updates lnet_net_cmd() to set up an NI using
a specific network address.

For further reference please read

IP specification in LNet
https://wiki.whamcloud.com/display/LNet/IP+specification+in+LNet

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I2c456790fe9534bbfe02b0330cce73e80318cc1c
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53605
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-16796 lfsck: Change lfsck_layout_slave_target to use kref 22/53422/2
Arshad Hussain [Tue, 12 Dec 2023 16:15:56 +0000 (21:45 +0530)]
LU-16796 lfsck: Change lfsck_layout_slave_target to use kref

This patch changes struct lfsck_layout_slave_target
to use kref instead of atomic_t

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7ea87e2b94a72363971b71415c9430e5b7ded8cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53422
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17334 lmv: exclude newly added MDT in mkdir 60/53860/3
Lai Siyao [Thu, 18 Jan 2024 15:59:25 +0000 (10:59 -0500)]
LU-17334 lmv: exclude newly added MDT in mkdir

Exclude a newly added MDT from QoS mkdir for 30 seconds in case
connections between MDTs are not ready, which may cause lookups to fail.
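
The exclusion window can be pictured with a small Python model. This is a
sketch under stated assumptions, not the lmv code: eligible_mdts, the
added_at mapping and the timestamps are hypothetical, and only the 30 second
figure comes from the description above.

```python
import time

NEW_MDT_GRACE_SECONDS = 30  # from the description above


def eligible_mdts(added_at, now=None):
    """Return the MDT indices old enough to be QoS mkdir candidates.

    added_at maps an MDT index to the monotonic time at which the MDT
    was added (a hypothetical bookkeeping structure for this sketch).
    """
    if now is None:
        now = time.monotonic()
    return [idx for idx, added in sorted(added_at.items())
            if now - added >= NEW_MDT_GRACE_SECONDS]


if __name__ == "__main__":
    # MDT 1 was added 20 seconds ago, so it is skipped for now.
    print(eligible_mdts({0: 0.0, 1: 80.0, 2: 60.0}, now=100.0))
```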

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibb5e6eda29ddfff8f66708d72e33453a96f5e7ef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53860
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-17248 kernel: wait for pages under writeback for bdev 22/52922/6
Li Dongyang [Wed, 1 Nov 2023 11:36:10 +0000 (22:36 +1100)]
LU-17248 kernel: wait for pages under writeback for bdev

Since RHEL 8.6 wait_for_stable_page() is controlled by
a new flag SB_I_STABLE_WRITES on the super block.

However, the new flag is not set on the bdev pseudo sb,
which means that writes made directly to the block device
do not wait on page writeback. This can trigger
false block integrity errors: a page can be modified
again while under writeback, so the integrity checksum no
longer matches the new data.

Upstream has a pending patch
https://lore.kernel.org/linux-mm/20231025141020.192413-1-hch@lst.de/
which works for RHEL 9 kernels.

For RHEL 8 kernels the bdev changes made the upstream patch
difficult to backport, so a different patch is used to check
and wait for bdev stable_pages.

Change-Id: Ie088abf29f40b294c31f993bcfad56d6081a3fce
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52922
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-15367 llite: add 'rc' to all iotrace messages 07/52007/6
Patrick Farrell [Wed, 28 Feb 2024 02:38:04 +0000 (21:38 -0500)]
LU-15367 llite: add 'rc' to all iotrace messages

It's easy to add the return code to iotrace, so let's do it.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <paf0187@gmail.com>
Change-Id: Ic2357d3d32fd4954e96878174f13b7fe907df2df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52007
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-15367 llite: add lseek to iotrace 04/52004/4
Patrick Farrell [Wed, 28 Feb 2024 02:33:08 +0000 (21:33 -0500)]
LU-15367 llite: add lseek to iotrace

Add iotrace messages for lseek.

Credit to Qian Yingjin <qian@ddn.com> for original patch.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I2beed5e80ea9a3d6278ddd40e9deb6b56754fabe
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52004
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-16935 llite: avoid hopeless i/o repeats 05/51505/10
Vladimir Saveliev [Wed, 23 Aug 2023 14:19:38 +0000 (17:19 +0300)]
LU-16935 llite: avoid hopeless i/o repeats

On SLES12SP5 kernels (4.12.14_122.147, 4.12.14-122.162) a race between
ll_filemap_fault and ll_imp_inval may lead to a livelock:

  - ll_filemap_fault loops endlessly as filemap_fault()->readpage()
    returns VM_FAULT_SIGBUS (it is unable to send read rpc as import
    is invalid) and as ll_page_inv_lock gets incremented within
    cl_page_discard()->..->vvp_page_delete() called after readpage
    failure.

  - ll_imp_inval gets stuck in
    obd_import_event(IMP_EVENT_INVALIDATE)->..->osc_object_invalidate
    (before recovery) waiting for completion of i/o that
    ll_filemap_fault cannot complete.

@ll_page_inv_lock is used to detect a page being read by the kernel
after it has been deleted from Lustre, which avoids potential
stale data reads. This seqlock allows us to see that a page was
potentially deleted, catch that case and repeat the I/O in
ll_filemap_fault() or vvp_io_read_start().

To avoid repeating I/O endlessly, this patch only increases
@ll_page_inv_lock in vvp_page_delete() when the page being deleted
is in PageUptodate state. A page that is not in PageUptodate
state is usually deleted due to an error that does not require a
retry.
This way, ll_filemap_fault() and vvp_io_read_start() will not loop
endlessly on errors that do not need the I/O to be repeated, as the
seqlock @ll_page_inv_lock does not change.
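
The retry rule above can be modeled with a plain counter. The following
Python sketch is a toy, not the llite code: PageInvalidationCounter and
read_with_retry are hypothetical names, and the max_attempts bound exists
only so that the sketch always terminates.

```python
class PageInvalidationCounter:
    """Toy stand-in for the @ll_page_inv_lock sequence counter."""

    def __init__(self):
        self.seq = 0

    def page_deleted(self, uptodate):
        # Only an up-to-date page could have been read stale, so only
        # that case has to make readers repeat their I/O.
        if uptodate:
            self.seq += 1


def read_with_retry(counter, attempt, max_attempts=10):
    """Repeat attempt() while the counter changes underneath it.

    Returns (result, number_of_attempts).
    """
    for n in range(1, max_attempts + 1):
        start = counter.seq
        result = attempt()
        if counter.seq == start:
            return result, n
    raise RuntimeError("gave up after %d attempts" % max_attempts)


if __name__ == "__main__":
    counter = PageInvalidationCounter()

    def failing_read():
        # The read fails and its never-uptodate page is deleted.
        counter.page_deleted(uptodate=False)
        return "EIO"

    print(read_with_retry(counter, failing_read))
```

With the counter left untouched for pages that were never up to date, a
failing read returns its error after a single attempt instead of looping.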

A test to illustrate the issue is added.

The added sanity.sh tests exercise i/o error handling.

cl_io_loop(): avoid restart if ci_tried_all_mirrors flag is set.

HPE-bug-id: LUS-11686
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I3b62bc95db01bf11f6098011bf29e4064c7e201e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51505
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-14535 quota: get all quota info in LFS 98/42098/65
Hongchao Zhang [Sat, 20 Jan 2024 06:39:33 +0000 (14:39 +0800)]
LU-14535 quota: get all quota info in LFS

This patch adds the option "-a" to LFS to get the quota info of
all quota IDs. It iterates over the quota settings saved in the global
quota setting files "quota_master/md-0x0" and "quota_master/dt-0x0"
from the QMT and over the quota usage info saved in the acct quota
files in the backend FS (LDiskFS or ZFS) from the QSDs, then merges
the two kinds of quota info on the client and prints it in a similar
way to "lfs quota -u|-g|-p".

  $lfs quota -a -u /mnt/lustre
  Filesystem /mnt/lustre, Disk usr quotas
  quota_id  kbytes   quota   limit   grace   files quota limit  grace
      root    9684       0       0       -    1019    0     0       -
       bin       4       0  102400       -       1    0  10240      -
    daemon       4       0  102400       -       1    0  10240      -
       adm       4       0  102400       -       1    0  10240      -
        lp       4       0  102400       -       1    0  10240      -
      sync       4       0  102400       -       1    0  10240      -
  shutdown       4       0  102400       -       1    0  10240      -
      halt       4       0  102400       -       1    0  10240      -
      mail       4       0  102400       -       1    0  10240      -

  $lfs quota -a -g /mnt/lustre
  Filesystem /mnt/lustre, Disk grp quotas
  quota_id  kbytes   quota   limit   grace   files quota limit  grace
      root    9684       0       0       -    1019    0      0      -
       bin       4       0  204800       -       1    0  20480      -
    daemon       4       0  204800       -       1    0  20480      -
       adm       4       0  204800       -       1    0  20480      -
        lp       4       0  204800       -       1    0  20480      -
      sync       4       0  204800       -       1    0  20480      -
  shutdown       4       0  204800       -       1    0  20480      -
      halt       4       0  204800       -       1    0  20480      -
      mail       4       0  204800       -       1    0  20480      -
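
The client-side merge described above can be sketched as follows. This is a
simplified Python model, not the lfs implementation: merge_quota_info and
its dictionary layout are hypothetical, and both the summing of usage across
targets and the default limit of 0 for IDs without a setting are assumptions.

```python
def merge_quota_info(limits, usage_per_target):
    """Merge quota limits with usage reported by several targets.

    limits: {quota_id: limit}, standing in for the QMT settings.
    usage_per_target: list of {quota_id: usage}, one dict per QSD.
    """
    merged = {qid: {"limit": limit, "usage": 0}
              for qid, limit in limits.items()}
    for usage in usage_per_target:
        for qid, used in usage.items():
            # IDs with usage but no configured limit still get listed.
            entry = merged.setdefault(qid, {"limit": 0, "usage": 0})
            entry["usage"] += used
    return merged


if __name__ == "__main__":
    limits = {"bin": 102400}
    usage = [{"root": 9000, "bin": 4}, {"root": 684}]
    for qid, info in sorted(merge_quota_info(limits, usage).items()):
        print(qid, info["usage"], info["limit"])
```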

This patch also fixes a deadlock issue in qmt_pool_recalc:
the rw_semaphore "qmt_pool_info.qpi_sarr.osts.op_rw_sem" had been
acquired in qmt_pool_recalc (read mode), but was acquired once
more in qmt_seed_glbe_all (read mode), which gets stuck if there
is a pending write mode lock acquisition from another thread.

 qsd_reint_qpool D
 Call Trace:
    schedule+0x29/0x70
    rwsem_down_read_failed+0x105/0x1c0
    call_rwsem_down_read_failed+0x18/0x30
    down_read+0x20/0x40
    qmt_seed_glbe_all+0x3a0/0x800 [lquota]
    qmt_site_recalc_cb+0x3c7/0x800 [lquota]
    cfs_hash_for_each_tight+0x11e/0x330
    cfs_hash_for_each+0x10/0x20 [libcfs]
    qmt_pool_recalc+0x9fc/0x1310 [lquota]

 llog_process_th D
 Call Trace:
    schedule+0x29/0x70
    rwsem_down_write_failed+0x215/0x3c0
    call_rwsem_down_write_failed+0x17/0x30
    down_write+0x2d/0x3d
    lu_tgt_pool_remove+0x36/0x1e0 [obdclass]
    qmt_pool_add_rem+0x655/0x920 [lquota]
    qmt_pool_rem+0x10/0x20 [lquota]
    lod_pool_remove_q+0xd6/0x1d0 [lod]
    class_process_config+0x16f2/0x2b20
    class_config_llog_handler+0x839/0x1540
    llog_process_thread+0x913/0x1c10
    llog_process_thread_daemonize+0x9f/0xe0

Test-Parameters: testlist=sanity-quota env=SLOW=yes,ONLY=49,NUM_QIDS=20000
Change-Id: I08feb928fbf34635ec9c5c341de993c718798dc9
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/42098
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 weeks agoLU-10499 pcc: add readonly mode for PCC 05/38305/38
Qian Yingjin [Mon, 23 Jul 2018 14:19:25 +0000 (22:19 +0800)]
LU-10499 pcc: add readonly mode for PCC

Readonly Persistent Client Cache (RO-PCC) shares the same framework
with Readwrite Persistent Client Cache, except that no HSM mechanism
is used in the readonly mode of PCC. Instead, RO-PCC adds a new flag
field in the file object's layout named LCM_FL_PCC_RDONLY to
indicate that the file is in PCC read-only state. It is protected
under the layout lock.

After introducing the readonly feature for the layout, the IO path
has some changes. For read, if the file has a valid RO-PCC cached
copy, the file data can be read from PCC directly; otherwise, the
data is read from OSTs using the normal I/O path. A data modifying
operation (write or truncate) must first clear the readonly flag of
the layout on the MDT (which invalidates the RO-PCC cached state on
clients via the layout lock blocking callback), and then it can
perform I/O.

For RO-PCC, as the PCC cached file is actually a replica of the
Lustre file, a failed data read on PCC can be tolerated by
falling back to the normal read path: read data from OSTs.
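
The read path just described can be sketched in a few lines of Python. This
is an illustrative model, not the llite/pcc code: read_file, pcc_read,
ost_read and pcc_valid are hypothetical names, and using OSError to signal a
failed cache read is an assumption of the sketch.

```python
def read_file(offset, length, pcc_read, ost_read, pcc_valid):
    """Read from the RO-PCC copy when valid, else from the OSTs.

    A failed cache read is tolerated because the cached file is only
    a replica: the normal OST read path is used instead.
    """
    if pcc_valid:
        try:
            return pcc_read(offset, length)
        except OSError:
            pass  # fall through to the normal read path
    return ost_read(offset, length)


if __name__ == "__main__":
    def broken_cache(offset, length):
        raise OSError("cache read failed")

    data = read_file(0, 4, broken_cache, lambda o, n: b"ost!", True)
    print(data)
```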

Refer to paper (LPCC: hierarchical persistent client caching for
Lustre) for more design details.

Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I6badd72e00a106a0f68950621ce6f82471731a95
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38305
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 weeks agoLU-10003 lnet: migrate lnet setup and tear down to Netlink 59/54359/2
James Simmons [Tue, 12 Mar 2024 13:24:19 +0000 (09:24 -0400)]
LU-10003 lnet: migrate lnet setup and tear down to Netlink

Migrate the LNet setup and tear down functionality from ioctls to
Netlink. This change means lnet_ioctl() is no longer needed, but
we will keep it for now to support older tools. The work here will
be used in a follow-on patch to tell lnet to set up large NIDs by
default for testing.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Id69810e114818d423102d6e85ff93529f04c337f
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54359
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-10391 lnet: use correct type in nid_addr_is_set 38/54338/2
James Simmons [Sat, 9 Mar 2024 00:47:09 +0000 (19:47 -0500)]
LU-10391 lnet: use correct type in nid_addr_is_set

For nid_addr_is_set() we use the NID_ADDR_BYTES macro to scan the
nid_addrs in struct lnet_nid. Each nid_addr is actually a u32,
so we go beyond the 4 nid_addr entries that exist when checking
whether the nid_addr is set. Fix this by casting nid_addr to u8 so
we can scan byte by byte properly.
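
The difference between indexing by word and by byte can be shown with a
short Python model. This is a sketch, not the LNet source: the function
below only mirrors the idea of viewing four 32-bit words as 16 bytes, and
the big-endian packing is an assumption made for illustration.

```python
import struct


def nid_addr_is_set(nid_addr_words, addr_bytes):
    """Return True if any of the first addr_bytes address bytes is set.

    nid_addr_words holds four 32-bit values. Using a byte count as an
    index into this 4-element array would run past its end, so the
    words are first viewed as a 16-byte buffer.
    """
    raw = struct.pack(">4I", *nid_addr_words)  # 4 words -> 16 bytes
    return any(b != 0 for b in raw[:addr_bytes])


if __name__ == "__main__":
    print(nid_addr_is_set([0, 0, 0, 0], 16))
    print(nid_addr_is_set([0, 0, 0, 1], 16))
```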

Fixes: 14cdcd61985 ("LU-13642 lnet: Allow IP specification")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I220bc9d2adad09225ce44f7c1b96fba5a8f6dd26
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54338
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-12452 lnet: inherit default lnd tunings from modparams 78/54078/7
Etienne AUJAMES [Thu, 7 Mar 2024 17:27:13 +0000 (12:27 -0500)]
LU-12452 lnet: inherit default lnd tunings from modparams

When a network is added dynamically (via Netlink), LNet assumes that
all the "unset" or default LND parameters are 0. But for some
use cases, 0 could be a valid value.

This patch modifies the callback lnet_lnd.lnd_nl_set() to set default
values: a NULL attr argument means "set the default value".
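
The convention that a missing attribute selects the default can be modeled
as below. This is a Python sketch only, not the lnet_lnd.lnd_nl_set()
callback: nl_set, the tunable names and the default values are hypothetical
placeholders, with None standing in for a NULL attr.

```python
# Hypothetical defaults, standing in for values taken from modparams.
DEFAULT_TUNABLES = {"example_timeout": 50, "example_credits": 8}


def nl_set(tunables, name, attr=None):
    """Store a tunable; attr=None means "use the default value".

    An explicit 0 is kept as a valid setting rather than being
    mistaken for "unset".
    """
    tunables[name] = DEFAULT_TUNABLES[name] if attr is None else attr


if __name__ == "__main__":
    tunables = {}
    nl_set(tunables, "example_timeout")      # default applied
    nl_set(tunables, "example_credits", 0)   # explicit zero preserved
    print(tunables)
```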

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ifb91ae63d96131ed87d9fae7d91b8b18df4c9ce9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 weeks agoLU-13814 clio: Improve cl_io_submit_sync comments 73/52073/20
Patrick Farrell [Wed, 23 Aug 2023 22:46:53 +0000 (18:46 -0400)]
LU-13814 clio: Improve cl_io_submit_sync comments

Add notes on what cl_io_submit_sync is for.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5e32f1a7e6893b63d82f14848a865f90d30fb079
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 weeks agoLU-13814 osc: Remove most uses of oap_obj 72/52072/20
Patrick Farrell [Fri, 23 Feb 2024 15:58:47 +0000 (10:58 -0500)]
LU-13814 osc: Remove most uses of oap_obj

Removing most uses of oap_obj makes it easier to do the
upcoming transient page cl_page removal.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic8acaed2ce3c6831f9a0d2bd13d859b9c564efdd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52072
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
7 weeks agoLU-17638 util: remove newer lnetctl export handling 36/54436/3
James Simmons [Fri, 15 Mar 2024 15:19:08 +0000 (11:19 -0400)]
LU-17638 util: remove newer lnetctl export handling

On the current maloo VMs lnetctl export ends up segfaulting. For
now go back to the original code until we figure out what is
different on this setup, given that it works elsewhere. The reason
for a partial revert is that other important work is ready to land
and would be delayed by a full revert.

Fixes: d3ef8f6993 ("LU-9680 lnet: add NLM_F_DUMP_FILTERED support")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Ibd3437ee619cde9667d049455d641a602ea50174
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54436
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 weeks agoLU-17600 lnet: delete lbstats and lnetunload 50/54250/2
Timothy Day [Sat, 2 Mar 2024 21:36:04 +0000 (21:36 +0000)]
LU-17600 lnet: delete lbstats and lnetunload

It's not likely that anyone still uses these scripts.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I418bdf2a1428905d598fdffdf27dff80831350d0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54250
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-16752 test: improve sanity 413a/b reliability 68/54168/2
Lai Siyao [Thu, 22 Feb 2024 18:46:12 +0000 (13:46 -0500)]
LU-16752 test: improve sanity 413a/b reliability

Set qos_maxage to 1 early in test_qos_mkdir() to ensure statfs is
updated in the round-robin mkdir test, so that the subsequent QoS
mkdir behaves as expected.

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Fixes: 233344d451 ("LU-13417 test: generate uneven MDTs early for sanity 413")
Fixes: c1d0a355a6 ("LU-12624 lod: alloc dir stripes by QoS")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I08f94b5b4e355ffff0704bd0f661bb99a82a9234
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54168
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-16695 llite: remove O_APPEND check for sync 28/54128/2
Patrick Farrell [Wed, 21 Feb 2024 20:00:14 +0000 (15:00 -0500)]
LU-16695 llite: remove O_APPEND check for sync

A check for O_APPEND in determining 'sync' or not was
accidentally introduced.  This forces O_APPEND writes to
all be synchronous, which is of course wrong.

Fixes: dad7079dfd ("LU-16695 llite: switch to ki_flags from f_flags")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iafae63ebda527834bd45d6fcbfb0cebb0340f4e4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54128
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 weeks agoLU-17566 mdt: remove duplicate call to mdt_init_ucred_reint() 11/54111/2
Aurelien Degremont [Tue, 20 Feb 2024 11:46:03 +0000 (12:46 +0100)]
LU-17566 mdt: remove duplicate call to mdt_init_ucred_reint()

Remove duplicate call to mdt_init_ucred_reint() from
mdt_reint_setxattr().

mdt_init_ucred_reint() is called in mdt_reint_internal(), which
covers all actual reinters. However, SETXATTR was converted to the
reinters framework in fd908da and this call was not removed.
So mdt_init_ucred_reint() is called first in mdt_reint_internal() then
again in the specific mdt_reint_setxattr() handler, without anything
special being done on the ucred between them.

Also merge __mdt_init_ucred() and mdt_init_cred() which was
called only once, and with the same prototype.

Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: I90fed1d2709edf7337a27dd9c3cb0f75f7625135
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Bruno Faccini <bfaccini@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17490 tests: verify fanotify works for lustre 69/53869/7
Lei Feng [Wed, 31 Jan 2024 08:39:55 +0000 (16:39 +0800)]
LU-17490 tests: verify fanotify works for lustre

The fanotify API provides notification and interception of filesystem
events. This patch adds a small utility to monitor open/read/write/close
events on files in a filesystem, and verifies that it works on a Lustre
filesystem.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: Id57a59bca16133db645e6804024cba9f11d60f1d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53869
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17434 lmv: add exclude list for remote dir 80/53780/5
Lai Siyao [Tue, 16 Jan 2024 19:18:30 +0000 (14:18 -0500)]
LU-17434 lmv: add exclude list for remote dir

Apache Spark creates a _temporary subdirectory for staging files, and
it should be created on the same MDT as its parent directory. Add a
tunable lmv.*.qos_exclude_prefixes: if a directory prefix is in this
list, lmv_create() should put the directory on its parent MDT.

This prefix list follows the same rule as the shell PATH environment
variable: ':' is the separator between prefixes. For convenience, '+'
and '-' can be used to add and remove prefixes.
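
The list handling described above can be sketched roughly as follows. This is
a minimal model, not the lmv implementation: both function names are
hypothetical, and it assumes that a leading '+' or '-' applies to the whole
':'-separated value, that a value without either sign replaces the list, and
that a directory matches when its name starts with a listed prefix.

```python
def apply_prefix_update(current, update):
    """Return a new prefix list after applying a PATH-like update string.

    Assumed semantics: '+a:b' adds, '-a:b' removes, 'a:b' replaces.
    """
    sign = update[:1] if update[:1] in ("+", "-") else ""
    names = [p for p in update[len(sign):].split(":") if p]
    if sign == "+":
        result = list(current)
        for name in names:
            if name not in result:
                result.append(name)
        return result
    if sign == "-":
        return [p for p in current if p not in names]
    return names


def stays_on_parent_mdt(dirname, prefixes):
    """Assumed matching rule: the directory name starts with a listed prefix."""
    return any(dirname.startswith(p) for p in prefixes)
```

For example, starting from an empty list, the updates "_temporary:.staging",
"+tmp" and "-.staging" leave the prefixes "_temporary" and "tmp", so a new
"_temporary" directory would stay on its parent MDT.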

Add sanity 413k.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I4c8a118f0630c19054934a87bee3599bdb1fe7bb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53780
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17175 gss: start lsvcgssd from l_getauth 42/53142/12
Sebastien Buisson [Wed, 15 Nov 2023 10:22:13 +0000 (11:22 +0100)]
LU-17175 gss: start lsvcgssd from l_getauth

If l_getauth detects it cannot connect to the socket supposed
to be opened by lsvcgssd, it tries to launch the daemon, with
predefined default values.
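
The fallback can be pictured with the sketch below. It is an illustration
only: the function name, the socket path argument and the launch callback are
hypothetical, and the commit message does not say what l_getauth does after
starting the daemon, so the sketch simply returns None in that case.

```python
import socket


def connect_or_launch(sock_path, launch):
    """Try the daemon's UNIX socket; on failure ask `launch` to start it.

    Returns the connected socket, or None if the daemon had to be launched.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(sock_path)
        return sock
    except OSError:
        # Nothing is listening on the socket: start the daemon instead.
        sock.close()
        launch()
        return None
```

Passing the launcher as a callback keeps the sketch free of any assumption
about the daemon's command line or default values.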

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I3961ce0f548fb6ea23458edcb01a03fb8b3a617f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53142
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17179 tests: check the system is clean 30/52630/5
Sergey Cheremencev [Mon, 9 Oct 2023 02:45:16 +0000 (06:45 +0400)]
LU-17179 tests: check the system is clean

Most of the tests cannot work correctly if the system is not
clean, so check this at the beginning of sanity-quota.

Test-Parameters: trivial
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ibfbe4663dee8476486e96eb99ccbcea13216861b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52630
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-9859 lnet: move CPT handling to LNet 23/52923/7
James Simmons [Thu, 29 Feb 2024 02:34:17 +0000 (21:34 -0500)]
LU-9859 lnet: move CPT handling to LNet

The CPT code is used by LNet and by ptlrpc, which is the Lustre
interface to LNet. Move this code to LNet and merge the lib-mem.c code
as well, since the two work closely together. Move cpt debugfs
handling from libcfs to lnet. Now all remaining debugfs in libcfs
is for debugging.

Test-Parameters: trivial
Change-Id: I016a90520bd7c6428b45bafff8618bc864e9112b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-14361 statahead: add connect flag check for batch RPC 75/54275/3
Qian Yingjin [Tue, 5 Mar 2024 03:40:31 +0000 (22:40 -0500)]
LU-14361 statahead: add connect flag check for batch RPC

The tests (sanity/test_123 test case series) are all failing for
servers that do not have batch RPC support.
In this patch we add the connect feature flag check in
mdc.*.import for batch RPC support and skip the batch stat-ahead
tests without this support.
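
A check of this kind could look like the sketch below. It is not the test
framework's code: the function name is hypothetical, and both the
"connect_flags: [ ... ]" layout of the import text and the "batch_rpc" flag
name used in the usage note are assumptions.

```python
import re


def import_has_flag(import_text, flag):
    """Return True if `flag` appears in the connect_flags list of the text."""
    match = re.search(r"connect_flags:\s*\[(.*?)\]", import_text, re.S)
    if not match:
        return False
    flags = [f.strip() for f in match.group(1).split(",")]
    return flag in flags
```

A test would read the text from mdc.*.import, call
import_has_flag(text, "batch_rpc"), and skip the batch stat-ahead cases when
the result is False.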

Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I54c95722df803131727e5882156570c9da5293ee
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>