Whamcloud - gitweb
fs/lustre-release.git
5 months ago | LU-17280 scrub: skip dir stripes with OI | 78/53078/2
Alexander Boyko [Wed, 8 Nov 2023 10:32:55 +0000 (05:32 -0500)]
LU-17280 scrub: skip dir stripes with OI

After a fresh mount and LFSCK start, all directory stripes
are added to the inconsistent list, so scrub prints the LFSCK
message "inconsistent OI FID...fixed" for every stripe.
Let's check the FID-to-OI mapping before adding an object to the
inconsistent list.

Also fix additional debug output for scrub.
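The check described above can be sketched in C as follows. This is a minimal, self-contained model under stated assumptions, not the osd scrub implementation: struct fid_sketch, the array-backed OI table and the helpers oi_mapping_ok() and should_add_to_inconsistent() are all hypothetical. A stripe is queued only when its FID has no OI entry or maps to a different inode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified FID and OI table entry (FID -> inode number). */
struct fid_sketch {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

struct oi_entry {
	struct fid_sketch fid;
	uint64_t ino;
};

static bool fid_eq(const struct fid_sketch *a, const struct fid_sketch *b)
{
	return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* True if the OI table already maps @fid to @ino. */
static bool oi_mapping_ok(const struct oi_entry *oi, size_t n,
			  const struct fid_sketch *fid, uint64_t ino)
{
	for (size_t i = 0; i < n; i++) {
		if (fid_eq(&oi[i].fid, fid))
			return oi[i].ino == ino;
	}
	return false;	/* no OI entry at all */
}

/* Queue a directory stripe for repair only if its mapping is missing
 * or points at a different inode. */
static bool should_add_to_inconsistent(const struct oi_entry *oi, size_t n,
				       const struct fid_sketch *fid,
				       uint64_t ino)
{
	return !oi_mapping_ok(oi, n, fid, ino);
}
```

With the check done first, stripes whose OI entries are already correct after a fresh mount are never queued, so scrub has nothing to report for them.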

HPE-bug-id: LUS-11777
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I869f1cf71eb6c10f386a3f388a38032c73d2b41a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17275 kernel: RHEL 8.9 client and server support | 71/53071/7
Jian Yu [Mon, 20 Nov 2023 18:46:20 +0000 (10:46 -0800)]
LU-17275 kernel: RHEL 8.9 client and server support

This patch makes changes to support the RHEL 8.9 release
with kernel 4.18.0-513.5.1.el8_9 for the Lustre client and server.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-3

Change-Id: Ia3672d134534b877bb6aaffb4cea0339bc55974f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53071
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17274 kernel: new kernel [RHEL 9.3 5.14.0-362.8.1.el9_3] | 54/53054/5
Jian Yu [Mon, 13 Nov 2023 17:03:02 +0000 (09:03 -0800)]
LU-17274 kernel: new kernel [RHEL 9.3 5.14.0-362.8.1.el9_3]

This patch makes changes to support the new RHEL 9.3 release
for the Lustre client.

Test-Parameters: trivial env=SANITY_EXCEPT="906" \
  mdtcount=4 mdscount=2 clientdistro=el9.3 testlist=sanity
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-3

Change-Id: I9cce1a7d2249cb4df39106c44ba4417411ee0757
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53054
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago | LU-17278 ldlm: don't grant failed lock | 51/53051/2
Alex Zhuravlev [Thu, 9 Nov 2023 13:29:03 +0000 (16:29 +0300)]
LU-17278 ldlm: don't grant failed lock

Lock convert can re-grant a lock if it loses some bits. This
procedure can race with the import's invalidation, so the
lock can become invalid (l_granted_mode=LCK_MINMODE) and trip:
LustreError: 8637:0:(ldlm_lock.c:1095:ldlm_grant_lock_with_skiplist())
ASSERTION( ldlm_is_granted(lock) )
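A minimal C sketch of the guard, assuming a simplified lock: struct lock_sketch, the trimmed mode enum (MODE_INVALID standing in for LCK_MINMODE) and lock_regrant_after_convert() are hypothetical. The real fix has to perform this check under the lock that serializes with import invalidation; the sketch leaves locking out.

```c
#include <stdbool.h>

/* Hypothetical, trimmed lock modes; MODE_INVALID plays the role of
 * LCK_MINMODE, the mode an invalidated lock is left in. */
enum lock_mode_sketch {
	MODE_INVALID = 0,
	MODE_PR,
	MODE_EX,
};

/* Hypothetical, simplified inodebits lock. */
struct lock_sketch {
	enum lock_mode_sketch granted_mode;
	enum lock_mode_sketch req_mode;
	unsigned long ibits;
};

/* Re-grant a lock after a convert dropped some bits. If import
 * invalidation has already failed the lock, leave it untouched and
 * report that, instead of granting it and hitting the assertion. */
static bool lock_regrant_after_convert(struct lock_sketch *lock,
				       unsigned long new_bits)
{
	if (lock->granted_mode == MODE_INVALID)
		return false;

	lock->ibits = new_bits;
	lock->granted_mode = lock->req_mode;
	return true;
}
```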

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7bb20d62948224647d7632f2822fba44d39a7713
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53051
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17265 tests: allow margin for sanity/39r | 35/53035/5
Arshad Hussain [Wed, 8 Nov 2023 06:38:07 +0000 (12:08 +0530)]
LU-17265 tests: allow margin for sanity/39r

The timestamp may be slightly outdated due to the gap between
writing a file and checking its timestamp, so take that into
account and allow a 2-second leniency when comparing
timestamps.

The on-disk inode may also not be flushed from the journal
immediately, so allow some time for it to be updated.

This patch also converts the hex value read via debugfs
to decimal.
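The two ideas in this patch, a tolerance window and hex-to-decimal conversion, are shown below as a C sketch. The actual change lives in the sanity.sh shell test; the helper names here are hypothetical, and the input is assumed to be a hex seconds value that may be followed by other fields.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Parse a timestamp printed in hex (with or without a "0x" prefix).
 * strtoll() stops at the first non-hex character, so trailing fields
 * such as ":nsec" are ignored. */
static long long hex_to_seconds(const char *hex)
{
	return strtoll(hex, NULL, 16);
}

/* Compare two timestamps, tolerating a gap of up to @margin seconds
 * in either direction. */
static bool timestamps_close(long long a, long long b, long long margin)
{
	long long diff = a > b ? a - b : b - a;

	return diff <= margin;
}
```

For sanity/39r the margin would be 2 seconds, so a value read back one or two seconds later than the one recorded still matches.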

Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=100
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I9e765f9cd572fb25821f9a0401c34209b7c3f574
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53035
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago | LU-17230 socklnd: treat UNKNOWN netif operstate as UP | 42/52842/6
Serguei Smirnov [Thu, 26 Oct 2023 18:15:28 +0000 (11:15 -0700)]
LU-17230 socklnd: treat UNKNOWN netif operstate as UP

"UNKNOWN" (IF_OPER_UNKNOWN) operational state doesn't necessarily
mean that the interface can't be used and may be the result of
particular network driver not providing UP/DOWN states,
so it may be incorrect for socklnd to initiate
setting of a "fatal error" flag on a NI using an interface
in "UNKNOWN" operstate.
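A minimal C sketch of the policy, assuming a local copy of the RFC 2863 operational states (the kernel exposes the same set as IF_OPER_* in <linux/if.h>). netif_state_is_fatal() is hypothetical, and treating every state other than UP and UNKNOWN as fatal is an assumption of the sketch, not something this patch states.

```c
#include <stdbool.h>

/* Local copy of the RFC 2863 operational states, in the same order as
 * the kernel's IF_OPER_* values; kept local so the sketch is portable. */
enum oper_state {
	OPER_UNKNOWN,
	OPER_NOTPRESENT,
	OPER_DOWN,
	OPER_LOWERLAYERDOWN,
	OPER_TESTING,
	OPER_DORMANT,
	OPER_UP,
};

/* Hypothetical policy: should this state mark the NI with a fatal error? */
static bool netif_state_is_fatal(enum oper_state st)
{
	switch (st) {
	case OPER_UP:
	case OPER_UNKNOWN:	/* driver may not report UP/DOWN at all */
		return false;
	default:		/* sketch assumption: anything else is down */
		return true;
	}
}
```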

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I39dfa01f3758809440d50cf8b6b11555889ef366
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52842
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
5 months ago | LU-17216 ofd: make enable_health_write tunable | 82/52782/14
Timothy Day [Sat, 21 Oct 2023 19:11:42 +0000 (19:11 +0000)]
LU-17216 ofd: make enable_health_write tunable

enable_health_write should be tunable rather than a
compilation option. This allows us to test it more
easily and gives admins the option to try it out
without having to recompile their Lustre servers.
It will still be disabled by default.

Add the sanity/70 test to run a simple check ensuring that
enable_health_write and health_check don't crash.
It is not a thorough check, but it at least verifies
that the interfaces appear to work.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7b1832f8acf578b891386e28c5af760070a6862c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52782
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17212 gss: survive improper obd or imp at ctx init | 55/52755/3
Sebastien Buisson [Thu, 19 Oct 2023 09:11:48 +0000 (11:11 +0200)]
LU-17212 gss: survive improper obd or imp at ctx init

GSS context init requests can happen even after a client has been
unmounted, because they come from userspace (request-key,
lgss_keyring).
In this case they must be ignored, and the code must be robust enough
to survive an improper, already shut down, or partially shut down obd
device or import.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I541727165eadf1fcb7715e416da85d100976cf2f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52755
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17097 osc: delete items in Xarray before its destroy | 81/52381/23
James Simmons [Wed, 1 Nov 2023 19:25:12 +0000 (15:25 -0400)]
LU-17097 osc: delete items in Xarray before its destroy

On older debug kernels, we get a double free from Xarray's RCU usage:

WARNING: CPU: 2 PID: 21477 at lib/debugobjects.c:286
debug_print_object+0x83/0xa0
 ODEBUG: activate active (active state 1) object type:
rcu_head hint:           (null)
 Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE)
ost(OE) mdt(OE) mdd(OE) mgs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE)
mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE)
ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) crc32_generic libcfs(OE)
crc_t10dif crct10dif_generic crct10dif_common rpcsec_gss_krb5 squashfs
pcspkr i2c_piix4 i2c_core binfmt_misc ip_tables ext4 mbcache jbd2
ata_generic pata_acpi ata_piix serio_raw libata
 CPU: 2 PID: 21477 Comm: umount Tainted: G           OE
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.16.2-1.fc38 04/01/2014
 Call Trace:
  [<ffffffff817ded29>] dump_stack+0x19/0x1b
  [<ffffffff8108d558>] __warn+0xd8/0x100
  [<ffffffff8108d5df>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffff81414723>] debug_print_object+0x83/0xa0
  [<ffffffff814150af>] debug_object_activate+0x1af/0x210
  [<ffffffff817e8d7e>] ? _raw_spin_unlock+0xe/0x20
  [<ffffffffa0189e60>] ? xas_alloc+0xd0/0xd0 [libcfs]
  [<ffffffff8114dc8f>] __call_rcu+0x3f/0x2d0
  [<ffffffff8114df3d>] call_rcu_sched+0x1d/0x20
  [<ffffffffa0189f44>] xas_free_nodes+0xa4/0xf0 [libcfs]
  [<ffffffffa018b26f>] xa_destroy+0xdf/0xf0 [libcfs]

This can be solved by cleaning up individual items in the Xarray
before destroying the Xarray.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-quota env=ONLY=1,ONLY_REPEAT=100 clientdistro=el7.9
Change-Id: I49c5fb588d1b5c44f37e55500a6f33a2cd3988ee
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52381
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago | LU-8191 lustre: convert lmv,lod,lov functions to static | 79/51479/7
Timothy Day [Fri, 23 Jun 2023 20:53:00 +0000 (20:53 +0000)]
LU-8191 lustre: convert lmv,lod,lov functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in lmv, lod, and lov static.

Also, remove one unused function: lov_lsm_entry().

Another function is intentionally unused for
debugging purposes. It was detected by static
analysis, but it has been left untouched.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If409226ea201587c7f95d4a65ffaef72671b5ac2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51479
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-10391 socklnd: handle IPv6 for zero copy messages | 50/53150/2
James Simmons [Wed, 15 Nov 2023 17:53:15 +0000 (12:53 -0500)]
LU-10391 socklnd: handle IPv6 for zero copy messages

When messages exceed a certain size, zero-copy messages are
created. To support zero-copy messages for IPv6, we need to add
KSOCK_PROTO_V4 support. This resolves the error:

LNetError: 5978:0:(socklnd_cb.c:1237:ksocknal_process_receive()) 12345-2601:8c1:c180:2000::36b6@tcp: Unknown ZC-ACK cookie: 0, 272

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I4bc3d03cc5157a0f6ddb1e36ddeac225ed5d0984
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53150
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-10391 mgs: copy full nid string | 15/53115/5
James Simmons [Wed, 22 Nov 2023 01:51:18 +0000 (20:51 -0500)]
LU-10391 mgs: copy full nid string

During IPv6 testing, mgs_steal_client_llog_handler() was found not to
copy the full NID string: the copy used the size of a pointer rather
than the length of the NID string. Copy the full NID string.
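The bug class is easy to reproduce in plain C. In this hedged sketch, copy_nid_buggy() sizes the copy with sizeof() applied to a pointer, as described above, while copy_nid_fixed() sizes it from the string itself. Both helpers are hypothetical, not the MGS code, and the sample NID is simply an IPv6 NID long enough to show the truncation.

```c
#include <stdio.h>
#include <string.h>

/* Buggy pattern: inside this function @src is a pointer, so sizeof(src)
 * is the pointer size (8 on LP64), not the string length. The copy is
 * silently truncated to sizeof(src) - 1 characters. @dst must hold at
 * least sizeof(src) bytes. */
static void copy_nid_buggy(char *dst, const char *src)
{
	snprintf(dst, sizeof(src), "%s", src);
}

/* Fixed pattern: size the copy from the string itself and refuse to
 * truncate a NID that does not fit in the destination buffer. */
static int copy_nid_fixed(char *dst, size_t dstlen, const char *src)
{
	size_t len = strlen(src);

	if (len >= dstlen)
		return -1;
	memcpy(dst, src, len + 1);	/* include the terminating NUL */
	return 0;
}
```

Compilers often flag the buggy form (for example GCC's -Wsizeof-pointer-memaccess), which is a cheap way to catch it elsewhere.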

Fixes: c0cb747e ("LU-13306 mgs: use large NIDS in the nid table on the MGS")
Change-Id: I7e19db0b0d3806c1c6fabe2ede0d880a45fe3052
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53115
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago | LU-16827 obdfilter: Fix "emfperf obdfilter-survey" error | 83/53083/4
Vitaliy Kuznetsov [Fri, 10 Nov 2023 20:09:42 +0000 (21:09 +0100)]
LU-16827 obdfilter: Fix "emfperf obdfilter-survey" error

This patch fixes the definition of the lctl variable. It changes
the logic so that the LCTL value is assigned only when it was
defined earlier.

Fixes: 91a3b286ba ("LU-16827 obdfilter: Fix obdfilter-survery/1a")
Test-Parameters: trivial testlist=obdfilter-survey
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I4dfd7e3d1f78208b33b897d8e6680e59b690014c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53083
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago | LU-17277 build: Distribute lutf.sh unconditionally | 50/53050/2
Shaun Tancheff [Thu, 9 Nov 2023 10:36:52 +0000 (04:36 -0600)]
LU-17277 build: Distribute lutf.sh unconditionally

Do not exclude lutf.sh when building the src.rpm, regardless
of whether the build host is suitable for running lutf.sh tests.

HPE-bug-id: LUS-11975
Test-Parameters: trivial
Fixes: ba1fa08a0fd ("LU-10973 lnet: LUTF Python infra")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I3beeabbae9f1435a002656bfd27d49a02c3bee27
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53050
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-10391 lnet: missing some peer functionality | 18/53018/3
James Simmons [Fri, 10 Nov 2023 14:30:22 +0000 (09:30 -0500)]
LU-10391 lnet: missing some peer functionality

When creating a peer, if we encounter a bad setting in the peer
NIs, we need to clean up the entire peer setup.

For the peers API, in the peer deletion case, if one of the peer
NIs is the same as the primary NID, treat the request as tearing
down all of the peer's NIs.
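A minimal C sketch of the deletion rule only, assuming a peer is identified by its primary NID string: struct peer_sketch and delete_tears_down_peer() are hypothetical, and the creation-time cleanup described above is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified peer identified by its primary NID. */
struct peer_sketch {
	const char *primary_nid;
};

/* A delete request tears down the whole peer (all of its NIs) if any
 * NID it names is the peer's primary NID. */
static bool delete_tears_down_peer(const struct peer_sketch *peer,
				   const char *const *nids, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (strcmp(nids[i], peer->primary_nid) == 0)
			return true;
	}
	return false;
}
```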

Change-Id: I57d2a63a9e31860a5ad7587f73f159a9cad2b3c9
Test-Parameters: trivial testlist=sanity-lnet
Fixes: 8a0fdfa0b28 ("LU-10391 lnet: migrate peer NI control to Netlink")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53018
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-10729 tests: replay-dual/22d to wait | 43/52343/2
hxing [Tue, 12 Sep 2023 03:38:26 +0000 (11:38 +0800)]
LU-10729 tests: replay-dual/22d to wait

replay-dual/22d should follow a procedure similar to 23d's:
replay-dual/23d simulates a dropped reply for the executed
update, but previous tests can break this:
 - the update modifies a remote llog
 - there can be another update to that remote llog
   (from the previous tests)
 - fail_loc (OBD_FAIL_UPDATE_OBJ_NET) is applied to the
   old update
 - 23d's update gets stuck

so the test has to ensure there are no pending/in-flight
updates.

Test-Parameters: trivial testlist=replay-dual mdscount=2 mdtcount=4
Test-Parameters: testlist=replay-dual fstype=zfs mdscount=2 mdtcount=4
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I2e3d3d4d1e5e33ffbb5c953edb21bcae884022c3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52343
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17174 misc: fix hash functions | 11/52611/4
Alexey Lyashkov [Tue, 10 Oct 2023 08:38:21 +0000 (11:38 +0300)]
LU-17174 misc: fix hash functions

1) The LU-16518 landing caused a bug which is visible with a debug kernel:

UBSAN: Undefined behaviour in include/linux/hash.h:81:31
shift exponent 64 is too large for 64-bit type
'long long unsigned int'
Call Trace:
dump_stack+0x8e/0xd0
ubsan_epilogue+0x5/0x21
ldlm_export_lock_hash+0x49/0x4d [ptlrpc]
cfs_hash_bd_from_key+0x88/0x2e0 [libcfs]

2) Use the high bits instead of the low bits, as they are better distributed.
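Both points can be illustrated with a small C sketch. GOLDEN_RATIO_64 has the same value as in Linux <linux/hash.h>; hash64_high() and hash64_low() are hypothetical helpers, not the Lustre functions. A multiplicative hash shifts the product right by 64 - bits, so bits == 0 means a shift by 64, which is undefined in C and is the kind of shift UBSAN reports above; and the low bits of the product depend only on the low bits of the key, so keys that differ only in higher bits collide.

```c
#include <stdint.h>

/* Same value as GOLDEN_RATIO_64 in Linux <linux/hash.h>. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBULL

/* Multiplicative hash keeping the HIGH @bits bits of the product
 * (@bits in 0..64). The generic form "product >> (64 - bits)" shifts
 * by 64 when bits == 0, which is undefined behaviour in C, so that
 * case is handled first. */
static uint64_t hash64_high(uint64_t val, unsigned int bits)
{
	if (bits == 0)
		return 0;
	return (val * GOLDEN_RATIO_64) >> (64 - bits);
}

/* For contrast: keeping the LOW bits. They depend only on the low bits
 * of @val, so keys differing only in higher bits share a bucket. */
static uint64_t hash64_low(uint64_t val, unsigned int bits)
{
	uint64_t mask = bits >= 64 ? ~0ULL : (1ULL << bits) - 1;

	return (val * GOLDEN_RATIO_64) & mask;
}
```

For example, with 8 bits the keys 0x100 and 0x200 land in the same bucket with hash64_low() but in different buckets with hash64_high().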

HPE-bug-id: LUS-11925
Fixes: 239e8268 (LU-16518 misc: use fixed hash code)
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ie1c531ad220f44e55fbf80674a49472fb6024252
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52611
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
5 months ago | Revert "LU-17131 ldiskfs: el9.2 encdata and filename-encode" | 69/53069/3
Andreas Dilger [Fri, 10 Nov 2023 04:54:35 +0000 (04:54 +0000)]
Revert "LU-17131 ldiskfs: el9.2 encdata and filename-encode"

This reverts commit b0cc96a1ff516f79f26be32945a237ef8373e408
as it is likely causing ldiskfs to crash immediately at mount:

 LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: journalled.
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: error_code(0x0000) - not-present page
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 0 PID: 7148 Comm: mkfs.lustre  5.14.0-284.30.1_lustre.el9.x86_64 #1
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:__ldiskfs_find_entry+0xab/0x440 [ldiskfs]
 Call Trace:
  ldiskfs_lookup.part.0+0x6c/0x2c0 [ldiskfs]
  __lookup_hash+0x70/0xa0
  __filename_create+0x87/0x150
  do_mkdirat+0x4b/0x160
  __x64_sys_mkdir+0x48/0x70

Change-Id: Idc8448c9e6d2300bc5eccb6ea190252eaaca9f75
Test-Parameters: trivial
Test-Parameters: serverdistro=el9.2 testlist=sanity
Test-Parameters: serverdistro=el9.2 testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53069
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago | LU-17257 build: use pkg-config to find krb5 libdir | 10/53010/3
Jian Yu [Tue, 7 Nov 2023 18:18:32 +0000 (10:18 -0800)]
LU-17257 build: use pkg-config to find krb5 libdir

This patch fixes kerberos5.m4 to use pkg-config to
find the krb5 libdir instead of looking for the krb5
libraries in a static list of paths.

Test-Parameters: trivial kerberos=true testlist=sanity-krb5

Change-Id: Ia15812932942171b019f3e73034a78f9185c16ce
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53010
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17263 utils: 'lfs find -blocks' to use 512-byte units | 93/52993/3
Andreas Dilger [Sun, 5 Nov 2023 05:32:19 +0000 (23:32 -0600)]
LU-17263 utils: 'lfs find -blocks' to use 512-byte units

Change the default units for 'lfs find -blocks' from 1KiB blocks
to 512-byte blocks to better match the behavior of find(1).  This
also matches what "-printf %b" will print.

Change llapi_parse_size() to accept a 'c' suffix to specify
characters (bytes), and accept a "B" or "iB" suffix if provided.
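The unit handling can be sketched as a small C parser. This is a hedged illustration, not llapi_parse_size(): parse_blocks_arg() is hypothetical, and it assumes that "k", "M" and "G" are binary multiples whether or not "B" or "iB" follows.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for a --blocks style argument, "<n>[unit]".
 * No unit: 512-byte blocks, as printed by find(1) "-printf %b".
 * 'c': characters (bytes). 'k'/'K', 'M', 'G': binary multiples,
 * optionally followed by "B" or "iB". Returns 0 and sets *bytes. */
static int parse_blocks_arg(const char *arg, uint64_t *bytes)
{
	unsigned long long n;
	uint64_t mult;
	char *end;

	n = strtoull(arg, &end, 10);
	if (end == arg)
		return -1;		/* no digits */

	switch (*end) {
	case '\0':
		mult = 512;
		break;
	case 'c':
		mult = 1;
		end++;
		break;
	case 'k':
	case 'K':
		mult = 1ULL << 10;
		end++;
		break;
	case 'M':
		mult = 1ULL << 20;
		end++;
		break;
	case 'G':
		mult = 1ULL << 30;
		end++;
		break;
	default:
		return -1;
	}

	/* optional "B" / "iB" after a binary multiple, e.g. "4KiB" */
	if (mult >= (1ULL << 10) &&
	    (strcmp(end, "B") == 0 || strcmp(end, "iB") == 0))
		end += strlen(end);

	if (*end != '\0')
		return -1;		/* trailing garbage */

	*bytes = (uint64_t)n * mult;
	return 0;
}
```

So "8", "4096c", "4k" and "4KiB" all describe the same 4096 bytes, while a bare number now counts 512-byte blocks rather than 1KiB blocks.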

Fixes: c043f46025 ("LU-10705 utils: add "lfs find --blocks"")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If8345f15bf53912501cadc0fa7f981a9f787b767
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17251 osp: force precreate if create_count grows | 68/52968/6
Andreas Dilger [Fri, 3 Nov 2023 00:32:44 +0000 (18:32 -0600)]
LU-17251 osp: force precreate if create_count grows

Force the MDS to precreate OST objects if "osp.*.create_count" is
written and the OSP does not have at least that many precreated
objects locally.  This avoids doing complex operations in test
scripts to force precreation to run, which slows down the tests
and increases the chance that a test might fail.
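A minimal C sketch of the trigger, assuming a simplified OSP state: struct osp_precreate_sketch, precreated_available() and create_count_store() are hypothetical. In the real code, setting the force flag would also need to wake the precreate thread; the sketch only records the decision.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of an OSP's precreate state. */
struct osp_precreate_sketch {
	unsigned int create_count;		/* objects per precreate round */
	unsigned long long last_created;	/* highest precreated object id */
	unsigned long long next_free;		/* next id to hand out */
	bool precreate_force;			/* run a precreate round now */
};

static unsigned long long
precreated_available(const struct osp_precreate_sketch *d)
{
	if (d->next_free > d->last_created)
		return 0;
	return d->last_created - d->next_free + 1;
}

/* Writing create_count: if fewer objects are precreated locally than
 * the new count, force a precreate instead of waiting for the pool to
 * drain. */
static void create_count_store(struct osp_precreate_sketch *d,
			       unsigned int val)
{
	d->create_count = val;
	if (precreated_available(d) < val)
		d->precreate_force = true;
}
```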

Previously, opd_precreate_force was only used for handling OSTs
that were reformatted, and this reset "create_count" to the minimum.
Move that reset to the reformat case rather than the precreate code
path, so it does not reset "create_count" when it was just set.

Remove the "env" argument from several precreate-related functions,
since it wasn't used in those functions, and that made it difficult
to call them from the "create_count" parameter handling.

Test-Parameters: testlist=parallel-scale env=ONLY=test_rr_alloc
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iac35c1b981fcd6ab2d1ea5abc9ffe2e4563ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52968
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
5 months ago | LU-17000 coverity: Fix Logically dead code under lnetctl.c | 21/52921/6
Arshad Hussain [Wed, 1 Nov 2023 10:44:08 +0000 (16:14 +0530)]
LU-17000 coverity: Fix Logically dead code under lnetctl.c

This patch fixes the "Logically dead code" defect reported by
a Coverity run. Fixing it uncovers the missing call to
lustre_lnet_list_peer() for listing peers under the
old API.

CoverityID: 404746 ("Logically dead code")

Test-Parameters: trivial testlist=sanity-lnet
Fixes: f0be00678c ("LU-9680 lnet: collect data about peer_ni by using Netlink")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0659ce403110118697fb8c88ade70f1695509382
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52921
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17000 coverity: Fix Dereference before null under obd_sysfs.c | 03/52903/2
Arshad Hussain [Tue, 31 Oct 2023 11:14:49 +0000 (16:44 +0530)]
LU-17000 coverity: Fix Dereference before null under obd_sysfs.c

This patch fixes the "Dereference before null check" defect
reported by a Coverity run.

CoverityID: 404751 ("Dereference before null")
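For readers unfamiliar with this Coverity checker, the pattern looks like the C sketch below. It is generic and hypothetical (struct obd_sketch and both helpers are made up), since the commit does not show the affected code: the buggy version dereferences the pointer before testing it, so the NULL test comes too late, and the fix moves the test first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device structure for the example. */
struct obd_sketch {
	const char *name;
};

/* Buggy: @obd is dereferenced before it is tested, so a NULL @obd
 * crashes before the check runs, and the compiler may even drop the
 * check as redundant. */
static size_t name_len_buggy(const struct obd_sketch *obd)
{
	size_t len = strlen(obd->name);

	if (obd == NULL)
		return 0;
	return len;
}

/* Fixed: test first, dereference afterwards. */
static size_t name_len_fixed(const struct obd_sketch *obd)
{
	if (obd == NULL || obd->name == NULL)
		return 0;
	return strlen(obd->name);
}
```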

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I89bcc820244ab17a60bf1d5c86f9d6a8747b43ed
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52903
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17000 coverity: Fix Resource Leak(3) | 02/52902/2
Arshad Hussain [Tue, 31 Oct 2023 10:24:27 +0000 (15:54 +0530)]
LU-17000 coverity: Fix Resource Leak(3)

This patch fixes a resource leak reported by a Coverity run.

CoverityID: 404744 ("Resource leak")

Test-Parameters: trivial testlist=sanity,conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ib5a22dd09870fe43a36047e407d1dd57944c9413
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52902
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-16868 tests: skip conf-sanity/66 in interop | 99/52899/2
Andreas Dilger [Tue, 31 Oct 2023 07:36:49 +0000 (01:36 -0600)]
LU-16868 tests: skip conf-sanity/66 in interop

Do not run conf-sanity.sh test_66* in interop testing.  Otherwise,
it is possible that the version of the test script running on the
client does not perform the upgrades with the right steps needed
for remote servers that are running a different version.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=66
Test-Parameters: testlist=conf-sanity env=ONLY=66 serverversion=2.12.9
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7b28b5f123a7348f87d43c54c806eaf6173ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52899
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17221 kernel: update SLES15 SP4 [5.14.21-150400.24.92.1] | 20/52820/2
Jian Yu [Tue, 24 Oct 2023 19:25:39 +0000 (12:25 -0700)]
LU-17221 kernel: update SLES15 SP4 [5.14.21-150400.24.92.1]

Update SLES15 SP4 kernel to 5.14.21-150400.24.92.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp4 testlist=sanity

Change-Id: Id82d0ce48179df1f12dc367cced8cf84e1b918d9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52820
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-16796 mgc: Change config_llog_data to use refcount_t | 13/52813/6
Arshad Hussain [Tue, 24 Oct 2023 11:44:15 +0000 (17:14 +0530)]
LU-16796 mgc: Change config_llog_data to use refcount_t

This patch changes struct config_llog_data to use refcount_t
instead of atomic_t.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ieec4de5d957b8dfa82c8cdef80f3a9f73aa55126
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52813
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
5 months ago | LU-9859 libcfs: refactor libcfs initialization. | 00/52700/6
Mr. NeilBrown [Thu, 9 Nov 2023 02:15:09 +0000 (21:15 -0500)]
LU-9859 libcfs: refactor libcfs initialization.

Many lustre modules depend on libcfs having initialized
properly, but do not explicitly check that it did.
When lustre is built as discrete modules, this does not
cause a problem because if the libcfs module fails
initialization, the other modules don't even get loaded.

When lustre is compiled into the kernel, all module_init()
routines get run, so they need to check that the required
initialization succeeded.

This patch splits out the initialization of libcfs into a new
libcfs_setup(), and has all modules call that.

The misc_register() call is kept separate as it does not allocate any
resources, and if it fails, it fails hard, so there is no point in retrying.
Other set-up allocates resources and so is best delayed until they
are needed, and can be worth retrying.
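The call-before-use pattern can be sketched in userspace C. This is an illustration under assumptions, not the libcfs code: lib_setup(), allocate_resources() and the simulate_alloc_failure knob are hypothetical, and a pthread mutex stands in for kernel locking. Success is remembered, so repeated calls from several modules are cheap, while a failed allocation leaves the state unset so that a later call can retry.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t setup_lock = PTHREAD_MUTEX_INITIALIZER;
static bool setup_done;
static int setup_attempts;		/* for illustration only */
static bool simulate_alloc_failure;	/* hypothetical test knob */

/* Stand-in for the resource-allocating part of initialization. */
static int allocate_resources(void)
{
	setup_attempts++;
	return simulate_alloc_failure ? -ENOMEM : 0;
}

/* Called by every dependent module before it uses the library. Safe to
 * call repeatedly: success is remembered, failure can be retried. */
static int lib_setup(void)
{
	int rc = 0;

	pthread_mutex_lock(&setup_lock);
	if (!setup_done) {
		rc = allocate_resources();
		if (rc == 0)
			setup_done = true;
	}
	pthread_mutex_unlock(&setup_lock);
	return rc;
}
```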

Ideally, the initialization would happen at mount time (or similar)
rather than at load time.  Doing this requires each module to
check dependencies when they are activated rather than when
they are loaded.  Achieving that is a much larger job that would
have to progress in stages.

For now, this change ensures that if some initialization in libcfs
fails, other modules will fail-safe.

Linux-commit: 64bf0b1a079d61e9e059b9dc7a58e064c7d994ae

Change-Id: I6b5ecdba0defc6e033f78d8fc2b9be9e26c7f720
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52700
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-17015 gss: avoid request replay | 89/52689/9
Sebastien Buisson [Fri, 13 Oct 2023 15:19:16 +0000 (17:19 +0200)]
LU-17015 gss: avoid request replay

Lustre's upcall cache has a retry mechanism in case the upcall was
interrupted or failed and we timed out waiting. In this case we do our
best to retry and do the upcall again.
But when the upcall cache is used for GSS contexts, the upcall cannot
be done twice with the same data. The GSSAPI implements security measures
that forbid that kind of request replay, for instance to prevent
man-in-the-middle attacks.

Add a new uc_acquire_replay field to struct upcall_cache, so that
upcall cache users can specify whether the acquire upcall can be replayed.
For the identity upcall, this replay is fine. But for GSS contexts we need
to avoid those replays.
Also bump the upcall cache timeout value from 20s to 30s for GSS context
init requests.
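A minimal C sketch of how such a flag can gate the retry loop. struct upcall_cache_sketch, its acquire_replay member (playing the role of uc_acquire_replay), cache_acquire() and flaky_upcall() are hypothetical; the real upcall cache also handles timeouts and waiting, which are left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified upcall cache. */
struct upcall_cache_sketch {
	bool acquire_replay;		/* may a failed upcall be re-issued? */
	int max_tries;
	int (*upcall)(void *data);	/* returns 0 on success */
};

/* Run the acquire upcall, retrying only if the cache allows replay. */
static int cache_acquire(const struct upcall_cache_sketch *uc, void *data)
{
	int rc = -1;

	for (int attempt = 0; attempt < uc->max_tries; attempt++) {
		rc = uc->upcall(data);
		if (rc == 0 || !uc->acquire_replay)
			break;	/* done, or replaying the same data is forbidden */
	}
	return rc;
}

/* Demo upcall: fails on its first call, succeeds afterwards. */
static int flaky_upcall(void *data)
{
	int *calls = data;

	return (*calls)++ == 0 ? -1 : 0;
}
```

With acquire_replay set (identity-style), a transient failure is retried; with it cleared (GSS-style), the first failure is returned as-is, so the same context data is never sent twice.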

Also add more debug messages to gss code for both client and server
sides, and both kernel and userspace.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I56decc83a4f0d21be420e87cb0417826011932af
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52689
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago | LU-16518 lod: fix new clang error in lod_lov.c | 66/52566/5
Timothy Day [Tue, 3 Oct 2023 15:59:30 +0000 (15:59 +0000)]
LU-16518 lod: fix new clang error in lod_lov.c

The variable hsmsize was uninitialized in some
situations. By moving the initialization earlier,
we can avoid this.

Fixes: aebb405e32e ("LU-10499 pcc: use foreign layout for PCCRO")
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I3385e3349ad00d037b8b94337cb3352623d0a40a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52566
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago | LU-17054 lnet: Change cpt-of-nid to get result from kernel | 02/52502/5
Chris Horn [Tue, 29 Aug 2023 16:46:13 +0000 (10:46 -0600)]
LU-17054 lnet: Change cpt-of-nid to get result from kernel

The lnetctl cpt-of-nid command leverages a userspace implementation
of the kernel hash_long() function to compute the CPT for a given
NID. However, the kernel hash_long() function has changed over time
such that the userspace version may give a different result than the
kernel version. Since Lustre supports such a wide range of kernels
we cannot simply update the userspace implementation of hash_long() to
match newer kernels.

Address this by re-implementing lnetctl cpt-of-nid to call into kernel
space to compute the CPT and return the result to userspace.
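To illustrate why a userspace copy of hash_long() can disagree with the kernel, here is a sketch of the two 64-bit multipliers used by hash_64() in older and newer kernels (the change landed around v4.7). Both keep the top bits of a 64-bit product, but different constants give different buckets for the same input. How LNet turns the hash into a CPT is not shown here.

```c
#include <stdint.h>

/* Multiplier of the older kernel hash_64(), which computed this
 * product with an equivalent shift/add sequence. */
#define OLD_GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ULL
/* Multiplier of the newer kernel hash_64(). */
#define NEW_GOLDEN_RATIO_64 0x61C8864680B583EBULL

/* Both variants return the top @bits bits (1..63) of val * multiplier;
 * unsigned overflow wraps modulo 2^64 as intended. */
static unsigned int old_hash_64(uint64_t val, unsigned int bits)
{
	return (unsigned int)((val * OLD_GOLDEN_RATIO_PRIME_64) >> (64 - bits));
}

static unsigned int new_hash_64(uint64_t val, unsigned int bits)
{
	return (unsigned int)((val * NEW_GOLDEN_RATIO_64) >> (64 - bits));
}
```

For val = 1 and bits = 4, the old variant yields bucket 9 and the new one bucket 6. A tool that embeds either copy will therefore be wrong on kernels using the other, so asking the kernel is the robust choice.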

lnetctl cpt-of-nid now works with extended NIDs (e.g., IPv6).

lnetctl cpt-of-nid no longer accepts the --ncpt argument because the
kernel functions for computing the cpt do not support this.

lnetctl cpt-of-nid no longer accepts the --nid argument. Instead, the
command now takes a space separated list of nids.

Example:
$ lnetctl cpt-of-nid 867@kfi 5.3.0.9@tcp
cpt-of-nid:
- nid: 867@kfi
  cpt: 0
- nid: 5.3.0.9@tcp
  cpt: 1
$

Because the old implementation could return a wrong result it is
completely removed.

HPE-bug-id: LUS-11785
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I7c2bc48c5c0da7da8a4425d319c0b99207814ae1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52502
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17075 osd: destroy declare shouldn't panic 96/52496/7
Alex Zhuravlev [Mon, 25 Sep 2023 10:11:04 +0000 (13:11 +0300)]
LU-17075 osd: destroy declare shouldn't panic

if the object doesn't exist during declaration.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7d42cad0c04e7941a2f7950fdddaf7c473998b12
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52496
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
5 months agoLU-17120 build: Remove deprecated option from dkms.conf 92/52392/3
Shaun Tancheff [Fri, 15 Sep 2023 15:45:37 +0000 (10:45 -0500)]
LU-17120 build: Remove deprecated option from dkms.conf

dkms-commit: 7114c62aa7ead0036b2c3dc9bac8eac482ef2b20
dkms-change: https://github.com/dell/dkms/commit/7114c62aa7ead0036b2c3dc9bac8eac482ef2b20
  Deprecated feature: --no-initrd
  Deprecated feature: REMAKE_INITRD

In dkms.mkconf REMAKE_INITRD="no" is a deprecated option.

It should be removed.
This also breaks installation with some versions of dkms.

Test-Parameters: trivial
HPE-bug-id: LUS-11846
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I628193b6b9920fed6037b31ef2344d37d8a85bd7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52392
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-17015 gss: support large kerberos token for rpc sec ctxt 05/52305/23
Sebastien Buisson [Thu, 7 Sep 2023 07:33:36 +0000 (09:33 +0200)]
LU-17015 gss: support large kerberos token for rpc sec ctxt

If the current Kerberos setup uses large tokens, as when the PAC
feature is enabled for Kerberos, authentication can fail because the
server side is unable to exchange tokens between kernel and userspace.
This limitation is inherent to the sunrpc cache mechanism, which can
only handle tokens up to PAGE_SIZE.

For RPC sec context phase, use Lustre's upcall cache mechanism
instead of the kernel's deprecated sunrpc cache. Note this phase does not
involve a proper upcall; only the downcall part is relevant, to
populate the context computed in userspace.
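A sketch of the general idea of a downcall that is not bounded by one page: the receiver sizes its buffer from a length field instead of assuming the data fits in PAGE_SIZE. The wire layout (a 32-bit length followed by token bytes), the gss_token struct and parse_downcall_token() are hypothetical and do not describe Lustre's actual downcall format.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token container. */
struct gss_token {
	uint32_t len;
	unsigned char *data;
};

/* Parse @buf (@buflen bytes): a 32-bit length, then that many token
 * bytes. The allocation follows the length field, so tokens larger
 * than a page are fine. Returns 0 on success, -1 on a bad message. */
static int parse_downcall_token(const unsigned char *buf, size_t buflen,
				struct gss_token *tok)
{
	uint32_t len;

	if (buflen < sizeof(len))
		return -1;
	memcpy(&len, buf, sizeof(len));
	if (len > buflen - sizeof(len))
		return -1;	/* truncated message */
	tok->data = malloc(len ? len : 1);
	if (!tok->data)
		return -1;
	memcpy(tok->data, buf + sizeof(len), len);
	tok->len = len;
	return 0;
}
```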

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I94e945a99cab60d5b6a4c40076c40fffede217ab
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52305
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months agoLU-8191 liblustre: add missing functions to header 34/51434/7
Timothy Day [Sat, 24 Jun 2023 18:48:31 +0000 (18:48 +0000)]
LU-8191 liblustre: add missing functions to header

A number of functions were missing from lustreapi.h,
causing them to be marked incorrectly as functions that
could be made static. They have been added to the
header so applications can use them.

Static analysis shows that a number of functions
could be made static. This patch also declares
several functions in liblustre static.

liblustreapi_nodemap.c and liblustreapi_ioctl.c were
missing an internal header, causing some functions
to be incorrectly flagged. This patch also adds that
header.

Initialize a previously uninitialized variable in
llapi_fid_to_handle().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I67b9c59418b62602ffe36eb4284eb1e8d4a3b19b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-16901 utils: l_getidentity with nss module support 29/51329/10
Shaun Tancheff [Sat, 23 Sep 2023 17:51:00 +0000 (12:51 -0500)]
LU-16901 utils: l_getidentity with nss module support

Enable l_getidentity to fetch a user's supplementary group
info from NIS, LDAP and/or any other service for which an
NSS module exists.

Add support for local lustre specific users in
  /etc/lustre/passwd
  /etc/lustre/group

Specify the list of modules to be searched, in order, which
allows lookups to skip group or user searches in later modules
for particular user(s) and group(s).

To enable this feature add "lookup <mod1> <mod2> ... <modN>"
as the first line to:
  /etc/lustre/perm.conf

An example usage:
[/etc/lustre/perm.conf]
lookup lustre ldap

[/etc/lustre/passwd]
root:x:0:0:root:/root:/bin/bash

[/etc/lustre/group]
root:x:0:root

The special users in /etc/lustre do not incur the
expense of ldap queries.
Other special users like nobody, anon, etc. may be
useful to have on a cluster but not present in ldap, nis, ...
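A sketch of parsing the "lookup <mod1> ... <modN>" directive into an ordered module list. The parse_lookup_line() helper and MAX_LOOKUP_MODS limit are hypothetical; this is not the l_getidentity implementation.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <string.h>

#define MAX_LOOKUP_MODS 8	/* hypothetical limit */

/* Split a line such as "lookup lustre ldap" into module names, kept in
 * search order. Returns the module count, or -1 if the line is not a
 * lookup directive or lists more than @max modules. @line is modified
 * in place and @mods points into it. */
static int parse_lookup_line(char *line, char *mods[], int max)
{
	char *save = NULL;
	char *tok = strtok_r(line, " \t\n", &save);
	int n = 0;

	if (!tok || strcmp(tok, "lookup") != 0)
		return -1;
	while ((tok = strtok_r(NULL, " \t\n", &save)) != NULL) {
		if (n == max)
			return -1;
		mods[n++] = tok;
	}
	return n;
}
```

For the example perm.conf above this yields {"lustre", "ldap"}, so a user found in /etc/lustre/passwd never reaches the LDAP module.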

HPE-bug-id: MRP-1137, MRP-2132, LUS-2503, LUS-2453
Test-Parameters: trivial
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I55387e1df08bf2786ab78740403a1daf5a718d64
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51329
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months agoLU-16802 build: compatibility for 6.4 kernels 75/50875/14
Shaun Tancheff [Fri, 27 Oct 2023 06:21:13 +0000 (01:21 -0500)]
LU-16802 build: compatibility for 6.4 kernels

linux kernel v6.3-rc4-32-g6eb203e1a868
  iov_iter: remove iov_iter_iovec()

Provide a replacement iov_iter_iovec() when one is not provided.

linux kernel v6.3-rc4-34-g747b1f65d39a
  iov_iter: overlay struct iovec and ubuf/len

This renames iov_iter member iov to __iov and provides the
iter_iov() accessor.
Define __iov as iov when __iov is not present.
Provide an iter_iov() for older kernels.
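A userspace mock of the compatibility pattern: an accessor over the renamed __iov member, and a replacement for the removed iov_iter_iovec(), which returned the rest of the current segment capped at the iterator's remaining count. The mock_* struct and functions are hypothetical stand-ins; the real kernel struct iov_iter has more fields.

```c
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

/* Minimal stand-in for struct iov_iter using the newer member name. */
struct mock_iov_iter {
	size_t iov_offset;		/* offset into the current segment */
	size_t count;			/* bytes left in the whole iterator */
	const struct iovec *__iov;	/* current segment */
};

/* Accessor in the style of the kernel's iter_iov(). */
static inline const struct iovec *mock_iter_iov(const struct mock_iov_iter *i)
{
	return i->__iov;
}

/* Replacement for iov_iter_iovec(): remaining part of the current
 * segment, limited by the iterator's total remaining count. */
static inline struct iovec mock_iov_iter_iovec(const struct mock_iov_iter *i)
{
	const struct iovec *cur = mock_iter_iov(i);
	size_t left = cur->iov_len - i->iov_offset;

	return (struct iovec) {
		.iov_base = (char *)cur->iov_base + i->iov_offset,
		.iov_len = i->count < left ? i->count : left,
	};
}
```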

linux kernel v6.3-rc1-13-g1aaba11da9aa
  driver core: class: remove module * from class_create()

Provide an ll_class_create() to pass THIS_MODULE, or not,
as needed by class_create().

Linux commit v6.2-rc1-20-gf861646a6562
  quota: port to mnt_idmap

Update osd_dquot_transfer to use mnt_idmap and fall back
to user_ns, if needed, by dquot_transfer.

Linux commit v6.3-rc7-2433-gcf64b9bce950
  SUNRPC: return proper error from get_expiry()

Updated get_expiry() requires a time64_t pointer to be passed
to hold the expiry time. A non-zero return value indicates an
error, nominally -EINVAL. Provide a wrapper for kernels that
return a time64_t and return -EINVAL on error.
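A userspace sketch of such a wrapper: it exposes the newer convention (result through a pointer, non-zero error return) on top of an older-style function that returns the time directly. It assumes, as in older kernels, that a return of 0 signals an error; old_get_expiry() here is a hypothetical stand-in with trivial parsing, not the SUNRPC code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t time64_t;	/* stand-in for the kernel type */

/* Hypothetical old-style helper: parse a decimal time from *bpp,
 * advance the cursor, and return the value, or 0 on error. */
static time64_t old_get_expiry(char **bpp)
{
	char *end;
	long long v = strtoll(*bpp, &end, 10);

	if (end == *bpp || v <= 0)
		return 0;
	*bpp = end;
	return (time64_t)v;
}

/* Newer calling convention built on the old one: 0 on success,
 * -EINVAL on error, expiry stored in *rvp. */
static int compat_get_expiry(char **bpp, time64_t *rvp)
{
	*rvp = old_get_expiry(bpp);
	return *rvp ? 0 : -EINVAL;
}
```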

Test-Parameters: trivial
HPE-bug-id: LUS-11614
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I765d6257eec8b5a9bf1bd5947f03370eb9df1625
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50875
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
5 months agoLU-17156 tests: Improve zfs_or_rotational() 73/52973/8
Arshad Hussain [Fri, 3 Nov 2023 09:06:03 +0000 (14:36 +0530)]
LU-17156 tests: Improve zfs_or_rotational()

Improve zfs_or_rotational() in test-framework.sh to handle
get_param failure gracefully instead of throwing a bash syntax error.

Fix ostname_from_index() to print the OST name once instead of
twice if there are multiple mountpoints (e.g. sanityn).  If the
caller wants the specific name when there are two different
filesystems mounted, the specific mountpoint should be given.

Test-Parameters: trivial testlist=sanityn
Fixes: 43c3a804fe ("LU-13805 tests: Add racing tests of BIO, DIO")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0b914236865574dadd4ba0cb9a0ba7a7775fefc5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52973
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-10391 lnet: filter out white spaces 20/53020/3
James Simmons [Tue, 7 Nov 2023 16:39:52 +0000 (11:39 -0500)]
LU-10391 lnet: filter out white spaces

For the libyaml library, two methods exist to construct an internal
YAML document. One is to create a yaml_event_t and submit it to the
emitter with yaml_emitter_emit(). The second method is to use
some source like a file. In both cases the input is processed and
placed into an internal buffer which is passed to the read handler,
yaml_netlink_read_handler(). This buffer ends up looking like the
raw text of the configuration file passed in, including all
the various whitespace. Due to an internal processing bug, the two
creation methods don't yield exactly the same internal buffer
contents. In the sequence case, input sourced from a file will
contain extra whitespace. Our current Netlink implementation
doesn't filter out that extra whitespace, so the values packed
into the Netlink packet contain leading whitespace, which is
seen as an error. The fix is to skip the extra whitespace if
it exists.
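A sketch of the filtering step: skip leading whitespace in a value before packing it. The skip_leading_ws() helper is hypothetical and is not the code added to yaml_netlink_read_handler().

```c
#include <ctype.h>
#include <stddef.h>

/* Return a pointer to the first non-whitespace byte of the value in
 * [val, val + len) and store the remaining length in *out_len. */
static const char *skip_leading_ws(const char *val, size_t len,
				   size_t *out_len)
{
	while (len > 0 && isspace((unsigned char)*val)) {
		val++;
		len--;
	}
	*out_len = len;
	return val;
}
```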

Change-Id: I7445ffb486d6d39c681ab4e5a85e0b835509c9ee
Test-Parameters: trivial testlist=sanity-lnet
Fixes: 70149f4ea89 ("LU-9680 utils: fix Netlink / YAML library handling")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53020
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-10391 lnet: hops -1 is valid 19/53019/2
James Simmons [Tue, 7 Nov 2023 16:30:02 +0000 (11:30 -0500)]
LU-10391 lnet: hops -1 is valid

For route setup, a hops value of -1 is valid. We were assuming
userland would never send -1, which is wrong.
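A sketch of range validation that treats -1 as "hop count not specified" rather than as an error. The 1..255 range and the HOPS_* names are assumptions for illustration, not taken from this commit.

```c
#include <errno.h>

/* Assumed convention: -1 means "unspecified"; an explicit value must
 * fall within 1..255. Names and bounds are illustrative only. */
#define HOPS_UNDEFINED	(-1)
#define HOPS_MIN	1
#define HOPS_MAX	255

static int validate_route_hops(long hops)
{
	if (hops == HOPS_UNDEFINED)
		return 0;	/* valid: let the default apply */
	if (hops < HOPS_MIN || hops > HOPS_MAX)
		return -EINVAL;
	return 0;
}
```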

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 6557cd4b8c8 ("LU-10391 lnet: migrate router management to Netlink")
Change-Id: I616334fccfe3aba6409f1a856c62cf02d07782a9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53019
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10391 lnet: use lnet_ni_get_status_locked for lnet_net_show_dump 16/53016/2
James Simmons [Tue, 7 Nov 2023 15:57:46 +0000 (10:57 -0500)]
LU-10391 lnet: use lnet_ni_get_status_locked for lnet_net_show_dump

In my testing of IPv6 I was always seeing the NI state as "down".
This is incorrect and I found this was due to reading ni->ni_status
directly. Using lnet_ni_get_status_locked() fixes the issue.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 8f8f6e2f36e ("LU-10003 lnet: use Netlink to support old and new NI APIs.")
Change-Id: I490144ceae4a5c1cdd7c920661f8220033df8cd5
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53016
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10391 lnet: support setting LND timeouts 13/53013/4
James Simmons [Thu, 9 Nov 2023 19:54:43 +0000 (14:54 -0500)]
LU-10391 lnet: support setting LND timeouts

The patch that added support for NI setup with Netlink was
developed before support for individual LND timeout settings was
merged. Add these missing settings. For ksocklnd we already
supported conns_per_peer, so rearrange the code into a switch
statement.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 8f8f6e2f36e ("LU-10003 lnet: use Netlink to support old and new NI APIs.")
Change-Id: Iba955da7f5fa78b8a624bab6af66b577c75917e0
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53013
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17142 mgc: reconnection without pinger 98/52498/5
Alexander Boyko [Tue, 22 Aug 2023 09:53:14 +0000 (05:53 -0400)]
LU-17142 mgc: reconnection without pinger

When the MGS was offline for some time, AT (Adaptive Timeout) is
increased and the connection request deadline is high. Reconnecting
through the pinger waits for the request deadline before the next
attempt. The situation is worse with a failover partner, when
different connections are used. Reconnection could fail with a local
MGS too.

Here is the error seen when the MGC could not connect to a local
MGS (MDT combined with MGS).

LustreError: 15c-8: MGC90@kfi:
Confguration from log kjlmo12-MDT0000 failed from MGS -5.

The patch forces reconnection with import invalidate and aborts
inflight requests.

ptlrpc_recover_import() aborted its wait when the import reached the
disconnected state. But the import is disconnected between connection
attempts, and that state is valid. This is fixed.

Reset Adaptive Timeout when local MGS starts. It allows MGC to
reconnect efficiently.

mgs_barrier_gl_interpret_reply() should handle EINVAL from a client;
it means the client doesn't have the lock.

HPE-bug-id: LUS-11633
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ie631e04fb3e72900af076cf7f268f20f7b285445
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52498
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13805 llite: Implement unaligned DIO connect flag 26/51126/44
Patrick Farrell [Tue, 24 Oct 2023 18:29:27 +0000 (14:29 -0400)]
LU-13805 llite: Implement unaligned DIO connect flag

Unupgraded ZFS servers may crash if they receive unaligned
DIO, so we need a compat flag and a test to recognize those
servers.

This patch implements that logic.
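A sketch of gating unaligned DIO on a server-advertised connect flag. EXAMPLE_CONNECT_UNALIGNED_DIO and its bit value are hypothetical (the real flag lives in Lustre's headers), and what the client does when the check fails is outside this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical connect flag bit, not Lustre's real name or value. */
#define EXAMPLE_CONNECT_UNALIGNED_DIO	(1ULL << 7)

/* A DIO is aligned here if offset, length and buffer address are all
 * multiples of @align (for example the page size). */
static bool dio_is_aligned(uint64_t off, uint64_t len, uintptr_t addr,
			   uint64_t align)
{
	return off % align == 0 && len % align == 0 && addr % align == 0;
}

/* Aligned DIO is always allowed; unaligned DIO only if the server
 * advertised support for it at connect time. */
static bool can_send_dio(uint64_t server_flags, uint64_t off, uint64_t len,
			 uintptr_t addr, uint64_t align)
{
	if (dio_is_aligned(off, len, addr, align))
		return true;
	return (server_flags & EXAMPLE_CONNECT_UNALIGNED_DIO) != 0;
}
```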

Fixes: 7194eb6431 ("LU-13805 clio: bounce buffer for unaligned DIO")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5d6ee3fa5dca989c671417f35a981767ee55d6e2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51126
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months agoLU-16468 llite: protect layout before read IO going 22/49622/6
Bobi Jam [Fri, 13 Jan 2023 04:36:01 +0000 (12:36 +0800)]
LU-16468 llite: protect layout before read IO going

It's possible that the before the read IO, file_read_confine_iter()
->lov_attr_get() to get proper kms (known minimum size of the file),
and lov_attr_get() presumes that it's called under ongoing IO, which
protected the layout from changing, while it's not in this case.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I1b36ec6e158331e63e8026ee2b986d5a7e3cb6dc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49622
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12998 mds: add no_create parameter to stop creates 24/47124/19
Andreas Dilger [Sat, 23 Apr 2022 00:10:36 +0000 (18:10 -0600)]
LU-12998 mds: add no_create parameter to stop creates

Add a target tunable parameter and mount option "no_create" to
disable new *directory* creation on an MDT.  This sends the
flag OS_STATFS_NOCREATE to the clients, and the DNE MDT space
balance will avoid selecting that MDT when creating a new
subdirectory, without disabling access to existing files/dirs.
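A sketch of how a client-side selector can honour such a flag: MDTs reporting it are skipped, and the remaining one with the most free space wins. EXAMPLE_STATFS_NOCREATE stands in for OS_STATFS_NOCREATE with a made-up value, and the real DNE space balancing is more involved than picking the maximum.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value standing in for OS_STATFS_NOCREATE. */
#define EXAMPLE_STATFS_NOCREATE	0x1u

struct mdt_stat {
	uint64_t free_bytes;
	uint32_t state;		/* statfs state flags from the MDT */
};

/* Return the index of the eligible MDT with the most free space, or
 * -1 if every MDT reports the no-create flag. */
static int pick_mdt(const struct mdt_stat *mdts, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (mdts[i].state & EXAMPLE_STATFS_NOCREATE)
			continue;	/* existing files stay accessible */
		if (best < 0 || mdts[i].free_bytes > mdts[best].free_bytes)
			best = (int)i;
	}
	return best;
}
```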

This allows "soft disabling" an MDT in advance of storage
upgrades to minimize new directories and files created on that
MDT, reduce future migration, and/or backup/restore workload.

As yet it does not totally disable *file* creation on the MDT,
but it may be extended to do so in the future.

This is analogous to the "no_precreate" option that was added
on the OSTs, and "no_create" has been added to the OSTs for
consistency ("no_precreate" is kept for compatibility for now).

Test-Parameters: testlist=conf-sanity env=ONLY=112b,ONLY_REPEAT=50
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I53cfb48ade2f844b18bfc630e7fcea6de9ce7057
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47124
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10465 osd-ldiskfs: 8MiB IOs should bypass cache 89/52989/3
Andreas Dilger [Fri, 3 Nov 2023 23:49:29 +0000 (17:49 -0600)]
LU-10465 osd-ldiskfs: 8MiB IOs should bypass cache

Changes the writethrough_max_io_mb and readcache_max_io_mb
params to check for IO size >= max_io_mb instead of > max_io_mb
when deciding to bypass cache.

Read/write IOs that are 8MiB in size should bypass the pagecache
on the OSTs, rather than requiring IOs that are slightly larger
than this.  8MiB is enough to submit 1MiB to each HDD spindle in
an 8+2 RAID6, and caching these writes on the OSS is not helping.
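The effect of the comparison change, and the arithmetic behind the 8MiB figure, can be sketched as follows. The helper names are hypothetical; only the ">=" versus ">" rule and the 8+2 RAID6 layout come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bypass the page cache when the IO is at least max_io_mb MiB.
 * With the old '>' test, an IO of exactly max_io_mb MiB was cached. */
static bool bypass_cache(uint64_t io_bytes, uint64_t max_io_mb)
{
	return io_bytes >= (max_io_mb << 20);
}

/* Share of a full-stripe IO landing on each data disk, e.g. the 8
 * data disks of an 8+2 RAID6. */
static uint64_t per_disk_bytes(uint64_t io_bytes, unsigned int data_disks)
{
	return io_bytes / data_disks;
}
```

With a limit of 8, an 8MiB IO now bypasses the cache, and it places 8MiB / 8 = 1MiB on each spindle.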

Test-Parameters: trivial
Fixes: 3043c6f189 ("LU-12071 osd-ldiskfs: bypass pagecache if requested")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iae435f5b99e2e8bc6a9458fedad65a81c2853350
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52989
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-16958 llite: migrate deadlock on not responding lock cancel 88/52388/13
Bobi Jam [Wed, 13 Sep 2023 17:24:19 +0000 (01:24 +0800)]
LU-16958 llite: migrate deadlock on not responding lock cancel

An lfs migrate race makes the MDS hang with the following backtrace:

[ 3683.248584] [<0>] ldlm_completion_ast+0x78d/0x8e0 [ptlrpc]
[ 3683.250122] [<0>] ldlm_cli_enqueue_local+0x2fd/0x840 [ptlrpc]
[ 3683.251363] [<0>] mdt_object_local_lock+0x50e/0xb10 [mdt]
[ 3683.252615] [<0>] mdt_object_lock_internal+0x187/0x430 [mdt]
[ 3683.253793] [<0>] mdt_object_lock_try+0x22/0xa0 [mdt]
[ 3683.254857] [<0>] mdt_getattr_name_lock+0x1317/0x1dc0 [mdt]
[ 3683.256016] [<0>] mdt_intent_getattr+0x264/0x440 [mdt]
[ 3683.257105] [<0>] mdt_intent_opc+0x452/0xa80 [mdt]
[ 3683.258126] [<0>] mdt_intent_policy+0x1fd/0x390 [mdt]
[ 3683.259191] [<0>] ldlm_lock_enqueue+0x469/0xa90 [ptlrpc]
[ 3683.260350] [<0>] ldlm_handle_enqueue0+0x61a/0x16c0 [ptlrpc]
[ 3683.261596] [<0>] tgt_enqueue+0xa4/0x200 [ptlrpc]
[ 3683.262662] [<0>] tgt_request_handle+0xc9c/0x1a40 [ptlrpc]
[ 3683.263860] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
[ 3683.265220] [<0>] ptlrpc_main+0xbf3/0x1540 [ptlrpc]
[ 3683.266303] [<0>] kthread+0x134/0x150
[ 3683.267111] [<0>] ret_from_fork+0x35/0x40

The deadlock happens as follows:
T1:
vvp_io_init()
  ->ll_layout_refresh() <= take lli_layout_mutex
  ->ll_layout_intent()
  ->ll_take_md_lock()  <= take the CR layout lock ref
  ->ll_layout_conf()
    ->vvp_prune()
    ->vvp_inode_ops() <= release lli_layout_mutex
    ->vvp_inode_ops() <= try to acquire lli_layout_mutex
    -> racer wait here for T2
T2:
->ll_file_write_iter()
  ->vvp_io_init()
    ->ll_layout_refresh() <= take lli_layout_mutex
    ->ll_layout_intent() <= Request layout from MDT
    -> racer wait from server...

The server wants to cancel the CR layout lock held by T1, which won't
happen. T1 could also have taken an extent ldlm lock while waiting for
the lli_layout_mutex held by T2, and ofd_destroy_hdl does not get the
lock cancellation response from T1.

lli_layout_mutex is only needed for enqueuing the layout lock from the
server, so ll_layout_conf() does not need to take lli_layout_mutex.

Fixes: 8f2c1592c3 ("LU-16958 llite: migrate vs regular ops deadlock")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib94de2c63544c3a962199aa0537418255980ae8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52388
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17034 quota: lqeg_arr memory corruption 94/52094/9
Sergey Cheremencev [Fri, 25 Aug 2023 06:22:26 +0000 (10:22 +0400)]
LU-17034 quota: lqeg_arr memory corruption

Fix memory corruption caused by accessing memory
outside the lqeg_arr array. It could happen when at least
one OST has an index larger than the total number
of OSTs, for example if the system has 4 OSTs with
indexes 0001, 0002, 00c9, 00ca. This issue most often
corrupted bucket_table in obd_uuid_hash or obd_nid_hash,
causing crashes in the rhashtable code. However, it could
be the cause of other panics depending on the type
of the corrupted neighbouring memory region.

This patch adds an lge_idx field to each lqe global entry
to store the index of the OST. It is needed to map the OST index
to the array index to avoid out-of-bounds array access.
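A sketch of why storing the OST index in each entry avoids the overrun: with sparse indexes such as 0x1, 0x2, 0xc9, 0xca, the array has only 4 slots, so an entry must be found by its stored index rather than addressed as arr[ost_idx]. The simplified struct and the linear lge_find() search are illustrative assumptions, not the qmt code.

```c
#include <stddef.h>

/* Simplified global-pool entry; lge_idx records the OST index so the
 * array can stay dense even when OST indexes are sparse. */
struct lqe_glbl_entry {
	unsigned int lge_idx;	/* OST index, e.g. 0xc9 */
};

/* Map an OST index to its slot, or -1 if absent. Using arr[ost_idx]
 * directly would overrun the array whenever ost_idx >= n. */
static int lge_find(const struct lqe_glbl_entry *arr, size_t n,
		    unsigned int ost_idx)
{
	for (size_t i = 0; i < n; i++)
		if (arr[i].lge_idx == ost_idx)
			return (int)i;
	return -1;
}
```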

This patch also adds locking to protect lqe_glbl_data in
qmt_set_revoke and qmt_clear_lgeg_arr_nu. This was
forgotten in 50ff4d1da6.

This patch begins to store all connected MDTs in the quota
global pool. From this patch on, MDTs are handled the same
way as OSTs stored in the global pool. It is the
first step toward introducing MDT pools.

Add conf-sanity_33c, which reproduces the memory
corruption described above when the fix is not applied.

Fixes: 50ff4d1da6 ("LU-16772 quota: protect lqe_glbl_data in qmt_site_recalc_cb")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Id6e4bcde09d9f32726d69f711eedb82729a2266e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52094
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoNew tag 2.15.59 2.15.59 v2_15_59
Oleg Drokin [Wed, 8 Nov 2023 22:18:20 +0000 (17:18 -0500)]
New tag 2.15.59

Change-Id: Icf2d43e6bf0847f2ddf81ef4f80b939a18adaa42
Signed-off-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17184 mgc: remove damaged local configs 97/52697/6
Mikhail Pershin [Fri, 13 Oct 2023 21:28:58 +0000 (00:28 +0300)]
LU-17184 mgc: remove damaged local configs

If a local config llog is damaged, it can't be removed and
prevents the target from mounting. This happens because
mgc_llog_local_copy() uses llog_erase() to remove llogs,
which can't do the job if the llog header is damaged.

The patch changes:
- llog_erase() to not initialize the header but just destroy
  the llog file
- mgc_llog_local_copy() to not exit on a failed backup to the
  temp file but continue with remote llog copying anyway
- add conf-sanity test_151 to check that the target can
  mount with a damaged local config

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I637749c38fd5ed03bdac5ca1cd60196f724ab0d1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52697
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17213 llite: check sdio before freeing it 57/52757/2
Patrick Farrell [Thu, 19 Oct 2023 14:30:57 +0000 (10:30 -0400)]
LU-17213 llite: check sdio before freeing it

We check something in the sdio after freeing it.  Oops.
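The general fix pattern is to read whatever is needed before the free. The struct and its needs_release field below are hypothetical; the commit does not say which sdio field was checked.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the sub-DIO structure. */
struct sdio {
	bool needs_release;
};

/* Free @sdio and report whether it needed release. The field is read
 * before free(); reading it afterwards would be a use-after-free. */
static bool sdio_free(struct sdio *sdio)
{
	bool needs_release = sdio->needs_release;

	free(sdio);
	return needs_release;
}
```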

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1eae4bfe5fd83e5d8763266b1a7b3c5cb3118158
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52757
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-17259 lnet: kgnilnd_nl_get should return 0 on success 72/52972/3
Shaun Tancheff [Fri, 3 Nov 2023 07:35:16 +0000 (02:35 -0500)]
LU-17259 lnet: kgnilnd_nl_get should return 0 on success

Fix build failure
error: control reaches end of non-void function [-Werror=return-type]

Test-Parameters: trivial
Fixes: d15bfca078 ("LU-10391 lnet: migrate full LNet NI information collection")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I09dd76c46620107d6c3f89cf59b9d9190578ef60
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52972
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-17258 socklnd: ensure connection type established upon race 57/52957/3
Chris Horn [Thu, 2 Nov 2023 19:28:45 +0000 (12:28 -0700)]
LU-17258 socklnd: ensure connection type established upon race

When a connection race is hit between two peers, only increment the
retry count if a connection of the specific type has already been
established; otherwise, this can lead to an unexpected value set in
ksnr_connected and some of the assertions being triggered in
ksocknal_connect():

"ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed"

Fixes: da893c6c97 ("LU-16191 socklnd: limit retries on conns_per_peer mismatch")
HPE-bug-id: LUS-11922
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I6e8abb39ad3c0bcd7fbc8f8c5478c903029df908
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52957
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
6 months agoLU-17256 debian: allow building client dkms on arm64 51/52951/3
Aurelien Degremont [Thu, 2 Nov 2023 13:46:18 +0000 (14:46 +0100)]
LU-17256 debian: allow building client dkms on arm64

Just add 'arm64' to the supported architecture list
for the 'lustre-client-modules-dkms' debian package.

Test-Parameters: trivial
Change-Id: I2af307ee87448faeec81f6e0e27573ae980710f1
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52951
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
6 months agoLU-17000 coverity: Fix Logically dead code under liblnetconfig.c 50/52950/2
Arshad Hussain [Wed, 1 Nov 2023 07:53:54 +0000 (13:23 +0530)]
LU-17000 coverity: Fix Logically dead code under liblnetconfig.c

This patch fixes a "Logically dead code" issue reported
by a Coverity run.

CoverityID: 404752 ("Logically dead code")

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5a435324a19e04805c2a7c555ac2a0c1433ce2d0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52950
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17254 lnet: Fix ofed detection with specific kernel version 49/52949/2
Aurelien Degremont [Thu, 2 Nov 2023 10:55:30 +0000 (11:55 +0100)]
LU-17254 lnet: Fix ofed detection with specific kernel version

Improve the OFED configure step for LNET when the kernel version
contains special characters that could be interpreted in regexp
mode.

It is not uncommon in the Debian world to have '+' in the kernel version.
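The configure step itself is shell, but the problem can be shown in C with POSIX extended regular expressions: an unescaped '+' acts as a quantifier, so a version string fails to match itself until its metacharacters are escaped. The version "6.1.0+1" and both helpers are made-up examples, not the code in this patch.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Escape POSIX ERE metacharacters in @in so it matches literally. */
static void ere_escape(const char *in, char *out, size_t outlen)
{
	const char *meta = ".[]{}()\\*+?|^$";
	size_t o = 0;

	for (; *in && o + 2 < outlen; in++) {
		if (strchr(meta, *in))
			out[o++] = '\\';
		out[o++] = *in;
	}
	out[o] = '\0';
}

/* Return true if @text contains a match for ERE @pattern. */
static bool ere_matches(const char *pattern, const char *text)
{
	regex_t re;
	bool hit;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	hit = regexec(&re, text, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

Unescaped, "6.1.0+1" reads as "0 one or more times, then 1" and does not match the literal string; escaped as 6\.1\.0\+1 it does.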

Test-parameters: trivial
Change-Id: Ia3da59c74d8c2e59e16525dd50c7b83c2b5fada8
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52949
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17249 ptlrpc: protect scp_rqbd_idle list operations 31/52931/2
Mikhail Pershin [Wed, 1 Nov 2023 14:55:39 +0000 (17:55 +0300)]
LU-17249 ptlrpc: protect scp_rqbd_idle list operations

Protect getting an entry from the scp_rqbd_idle list with the
spinlock in ptlrpc_service_purge_all(), as is done in all
other places where the rqbd_list linkage is managed.
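A userspace sketch of the pattern: each entry is unlinked from the idle list while the lock is held, and freed afterwards. A pthread mutex stands in for the kernel spinlock, the structures are simplified, and freeing outside the lock is a choice of this sketch rather than a description of ptlrpc_service_purge_all().

```c
#include <pthread.h>
#include <stdlib.h>

struct rqbd {
	struct rqbd *next;
};

struct svc_part {
	pthread_mutex_t lock;	/* stands in for the spinlock */
	struct rqbd *idle;	/* stands in for scp_rqbd_idle */
};

/* Unlink idle buffers one at a time under the lock, freeing each
 * after dropping it. Returns the number of buffers freed. */
static int purge_idle(struct svc_part *p)
{
	int n = 0;

	for (;;) {
		struct rqbd *r;

		pthread_mutex_lock(&p->lock);
		r = p->idle;
		if (r)
			p->idle = r->next;
		pthread_mutex_unlock(&p->lock);
		if (!r)
			break;
		free(r);
		n++;
	}
	return n;
}
```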

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Iace37b1ee79bfd0c3a54a35722952e17d860a91c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52931
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-17000 coverity: Fix Dereference after null under client.c 01/52901/4
Arshad Hussain [Tue, 31 Oct 2023 08:48:57 +0000 (14:18 +0530)]
LU-17000 coverity: Fix Dereference after null under client.c

This patch fixes a "Dereference after null check" issue reported
by a Coverity run.

CoverityID: 404748 ("Dereference after null check")

Fixes: 0f2bc318d7 ("LU-15246 ptlrpc: per-device adaptive timeout parameters")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Id26de5e700b0a420168b359b050d739222574bd2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52901
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months ago LU-17242 debug: remove CFS_CHECK_STACK 83/52883/3
Timothy Day [Mon, 30 Oct 2023 02:18:01 +0000 (02:18 +0000)]
LU-17242 debug: remove CFS_CHECK_STACK

CFS_CHECK_STACK is primitive, doesn't work on
x86_64, and only dumps a stack in kernel log
when we are fairly close to passing the stack
limit anyway.

Admins and developers can grab the same info from
debug/tracing/stack_trace and debug/tracing/stack_max_size
on a live system. The kernel will also dump a stack
trace if it Oopses from going over the stack limit.

We don't need an additional Lustre specific stack
checking mechanism.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Icc7c82f6a0dcd727de6ce2c2d40ba071ee349c0c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52883
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-16868 tests: skip conf-sanity/32 in interop 35/52835/3
Andreas Dilger [Thu, 26 Oct 2023 00:57:47 +0000 (18:57 -0600)]
LU-16868 tests: skip conf-sanity/32 in interop

Do not run conf-sanity.sh test_32* in interop testing.  Otherwise,
it is possible that the version of the test script running on the
client does not perform the upgrades with the right steps needed
for remote servers that are running a different version.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=32a
Test-Parameters: testlist=conf-sanity env=ONLY=32a serverversion=2.12.9
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iabe1469a87d58c49e3c38b76ab18f8997f3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52835
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-16796 lfsck: Change lfsck_assistant_object to use kref 11/52811/2
Arshad Hussain [Tue, 24 Oct 2023 08:42:50 +0000 (14:12 +0530)]
LU-16796 lfsck: Change lfsck_assistant_object to use kref

This patch changes struct lfsck_assistant_object to use
kref (refcount_t) instead of atomic_t.
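
The benefit of kref/refcount_t over a bare atomic_t is that taking a reference from zero or dropping below zero is treated as a bug instead of silently wrapping. The following is a hedged userspace analogue built on C11 <stdatomic.h>; my_kref and its helpers are hypothetical stand-ins, not the kernel's kref_get()/kref_put(), and plain assert() replaces refcount_t's saturate-and-warn behavior.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct kref (NOT the kernel API). */
struct my_kref {
	atomic_uint refcount;
};

static void my_kref_init(struct my_kref *k)
{
	atomic_init(&k->refcount, 1);	/* creator holds the first reference */
}

static void my_kref_get(struct my_kref *k)
{
	unsigned int old = atomic_fetch_add(&k->refcount, 1);

	/* taking a reference on an already-released object is a bug */
	assert(old != 0);
	(void)old;
}

/* Drop a reference; call release() and return true on the final put. */
static bool my_kref_put(struct my_kref *k, void (*release)(struct my_kref *))
{
	unsigned int old = atomic_fetch_sub(&k->refcount, 1);

	/* an underflow here would mean a double put / use-after-free */
	assert(old != 0);
	if (old == 1) {
		release(k);
		return true;
	}
	return false;
}

/* Demo release callback: just counts how often it ran. */
static unsigned int demo_releases;

static void demo_release(struct my_kref *k)
{
	(void)k;
	demo_releases++;
}
```

The release callback runs exactly once, on the put that drops the last reference, which is the contract kref users rely on.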

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I763a44d2c74f758da5a137c6673f8dfd2ef6dc0a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-16796 target: Change struct job_stat to use kref 99/52799/2
Arshad Hussain [Mon, 23 Oct 2023 09:24:06 +0000 (14:54 +0530)]
LU-16796 target: Change struct job_stat to use kref

This patch changes struct job_stat to use
kref (refcount_t) instead of atomic_t.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I02bcf91d4c19915bc37601c90172b1de37b87811
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52799
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17205 utils: add lctl get_param -H option 30/52730/6
Aurelien Degremont [Tue, 17 Oct 2023 13:07:45 +0000 (15:07 +0200)]
LU-17205 utils: add lctl get_param -H option

- Add a new '-H' option to 'lctl get_param' that will prefix
each output line with the parameter name instead of only
the first line by default.

That makes grepping lctl get_param with wildcards much easier
as you can now easily know which parameter returns which value.

  $ lctl get_param -H osc.*.state | grep current
  osc.lustre-OST0000-osc-ff1148c0.state=current_state: FULL
  osc.lustre-OST0001-osc-ff1248c0.state=current_state: DISCONN
  osc.lustre-OST0002-osc-ff1348c0.state=current_state: FULL
  osc.lustre-OST0003-osc-ff1448c0.state=current_state: FULL
  osc.lustre-OST0004-osc-ff1548c0.state=current_state: FULL

It also prints an output line even for empty values. That also
makes life easier for admins.

- The patch also removes the forced line feed when the parameter
value is longer than 80 chars. This was considered a misfeature
and is now dropped for all usages, with or without -H.
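
The per-line prefixing described above can be sketched as a small formatter. format_with_header() is a hypothetical helper, not lctl source: it prefixes every line of a possibly multi-line value with "name=", still emits one line for an empty value, and never inserts a forced line feed for long values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not lctl source): write "name=<line>\n" for every
 * line of 'value', emitting a single "name=\n" when 'value' is empty.
 * Long values are never split. Returns bytes written, or -1 if 'out'
 * is too small. */
static int format_with_header(const char *name, const char *value,
			      char *out, size_t outlen)
{
	size_t used = 0;
	const char *line = value;

	do {
		const char *nl = strchr(line, '\n');
		int len = nl ? (int)(nl - line) : (int)strlen(line);
		int n = snprintf(out + used, outlen - used, "%s=%.*s\n",
				 name, len, line);

		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
		line = nl ? nl + 1 : NULL;
	} while (line != NULL && *line != '\0');

	return (int)used;
}
```

A trailing newline in the value does not produce an extra empty "name=" line; only a value that is empty from the start does.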

Test-Parameters: trivial
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: Ib1fa0dc400db4c19fed10ad4cced9be5668418e3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52730
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17204 lod: don't panic on short LOVEA 27/52727/3
Alex Zhuravlev [Tue, 17 Oct 2023 12:21:18 +0000 (15:21 +0300)]
LU-17204 lod: don't panic on short LOVEA

When we request the LOVEA and find that the existing buffer is too
small, we ask for the LOVEA's size and reallocate the buffer. But the
LOVEA can shrink in parallel (e.g. new default striping), so our
expectation that the size must be greater than the size of the
existing buffer is not correct. Replace the corresponding assertion
with a simple retry plus an extra check for a livelock.
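
A hedged userspace sketch of that retry pattern: read into the current buffer and, on -ERANGE, query the size again; if the attribute shrank in the meantime, simply retry the read instead of asserting, and give up after a bounded number of attempts to avoid a livelock. struct fake_ea, fake_get(), read_ea() and MAX_RETRIES are all hypothetical, and -EAGAIN as the give-up code is an arbitrary choice; fake_get() replays a scripted sequence of values to simulate a LOVEA changing concurrently.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RETRIES 8	/* hypothetical livelock guard */

/* Hypothetical EA source: each call returns the next scripted value,
 * simulating a LOVEA that changes between calls. */
struct fake_ea {
	const char *vals[4];
	int call;
};

/* buf == NULL queries the size; otherwise copy, or -ERANGE if too small. */
static long fake_get(struct fake_ea *ea, char *buf, size_t len)
{
	const char *v = ea->vals[ea->call < 3 ? ea->call : 3];
	size_t sz = strlen(v);

	ea->call++;
	if (buf == NULL)
		return (long)sz;
	if (len < sz)
		return -ERANGE;
	memcpy(buf, v, sz);
	return (long)sz;
}

/* Read the EA into *bufp (size *lenp, non-NULL), growing it on demand.
 * Returns the EA length or a negative errno. */
static long read_ea(struct fake_ea *ea, char **bufp, size_t *lenp)
{
	for (int i = 0; i < MAX_RETRIES; i++) {
		long rc = fake_get(ea, *bufp, *lenp);
		long size;
		char *nbuf;

		if (rc != -ERANGE)
			return rc;	/* success or a hard error */
		size = fake_get(ea, NULL, 0);
		if (size < 0)
			return size;
		if ((size_t)size <= *lenp)
			continue;	/* EA shrank meanwhile: no assert, retry */
		nbuf = realloc(*bufp, (size_t)size);
		if (nbuf == NULL)
			return -ENOMEM;
		*bufp = nbuf;
		*lenp = (size_t)size;
	}
	return -EAGAIN;	/* kept changing: give up instead of livelocking */
}
```

In the scripted scenario below, the size query on the second call reports a value no larger than the current buffer, which is exactly the case the removed assertion would have tripped on.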

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I26ad5091228bf78858f8538478dbcbdb235cddf4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52727
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-17203 libcfs: ignore remaining items 26/52726/6
Alex Zhuravlev [Tue, 17 Oct 2023 11:48:58 +0000 (14:48 +0300)]
LU-17203 libcfs: ignore remaining items

Remove the assertion checking the libcfs hashtable for emptiness
in cfs_hash_for_each_empty(). The only user of this hashtable
is the per-export set of ldlm locks. In this case it is legal that
some locks cannot be removed from the hashtable while they are in
the process of being enqueued. The hashtable is destroyed from the
export destroy function, which in turn is called only when all
RPCs on this export are done (exp_rpc_count == 0).

Fixes: 306a9b666e ("LU-16272 libcfs: cfs_hash_for_each_empty optimization")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2b853b017bb7247a0c60cc8f464c2e08d649f0eb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52726
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months ago LU-16796 ofd: Change struct ofd_seq to use refcount_t 22/52722/7
Arshad Hussain [Tue, 17 Oct 2023 08:22:31 +0000 (13:52 +0530)]
LU-16796 ofd: Change struct ofd_seq to use refcount_t

This patch changes struct ofd_seq to use refcount_t
instead of atomic_t.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie149a6812671ea872e17d2881e52cf6096d147ff
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52722
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17196 tests: sanity-lnet test_310 MR support 95/52695/2
Chris Horn [Sun, 8 Oct 2023 04:52:18 +0000 (23:52 -0500)]
LU-17196 tests: sanity-lnet test_310 MR support

Modify sanity-lnet test_310 to work with multi-rail configs.

Test-Parameters: trivial testlist=sanity-lnet env=ONLY=310
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I19d02d5a7da1f9ca2b9c0de791bb63b94dff1fdd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52695
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-16518 rsync: fix new clang error in lustre_rsync.c 65/52565/3
Timothy Day [Tue, 3 Oct 2023 16:06:18 +0000 (16:06 +0000)]
LU-16518 rsync: fix new clang error in lustre_rsync.c

If we bail out of the function early, changelog_priv
may not be initialized when we check it during the
cleanup code. Initialize changelog_priv to NULL to
avoid this.
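
This bug class is the classic goto-cleanup pattern. In the hypothetical do_work() below, priv stands in for changelog_priv: the early exit jumps to the cleanup label before priv is assigned, so without the "= NULL" initializer the cleanup would test an indeterminate pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical function; 'priv' plays the role of changelog_priv. */
static int do_work(int fail_early, int *freed)
{
	char *priv = NULL;	/* the fix: never leave it uninitialized */
	int rc = 0;

	*freed = 0;
	if (fail_early) {
		rc = -1;
		goto out;	/* bails out before priv is assigned */
	}

	priv = malloc(16);
	if (priv == NULL) {
		rc = -1;
		goto out;
	}
	/* ... real work using priv would go here ... */
out:
	/* safe only because priv starts out NULL */
	if (priv != NULL) {
		free(priv);
		*freed = 1;
	}
	return rc;
}
```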

Fixes: 4fc99832 ("LU-17000 coverity: Fix Resource Leak(1)")
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ide0430284b18f085e421cdea677ea14e0a4bac14
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52565
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
6 months ago LU-17207 lnet: race b/w monitor thr stop and discovery push 34/52734/6
Serguei Smirnov [Tue, 17 Oct 2023 18:43:14 +0000 (11:43 -0700)]
LU-17207 lnet: race b/w monitor thr stop and discovery push

As a result of a race, the discovery thread may attempt to dereference
a message on ln_mt_resendqs which was just freed by the stopping
monitor thread. Make sure the discovery thread is stopped first.

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I0dfcf3bc5bb3c8df195388599f571bdd3caaa3d7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52734
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
6 months ago LU-16896 flr: resync could mess mirror content 89/52489/3
Bobi Jam [Sat, 23 Sep 2023 16:18:36 +0000 (00:18 +0800)]
LU-16896 flr: resync could mess mirror content

This can happen in the following case: a component with extent
[0, 0x1000) is being synced with another mirror that has data in
[0x100, 0x200), and the file size is 0x300.

The out-of-sync mirror gets punched in [0, 0x100) and the data is
correctly written in [0x100, 0x200), but the punch of [0x200, 0x300)
is missed, even though the mirror can be correctly truncated at 0x300.
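
The expected result for this example can be worked through with a small helper. punch_ranges() is a hypothetical illustration, not the FLR resync code: it clips the component to the file size and returns the parts not covered by the synced data, i.e. the ranges the stale mirror must have punched. For the case above it yields both [0, 0x100) and [0x200, 0x300); the second one is the range that was missed.

```c
#include <assert.h>

/* Half-open byte range [start, end). */
struct ext {
	unsigned long long start, end;
};

/* Hypothetical helper: the parts of component 'comp' below EOF ('size')
 * that are not covered by the synced 'data' range, i.e. what the stale
 * mirror must punch. Assumes data.start < data.end. Returns 0-2 ranges. */
static int punch_ranges(struct ext comp, struct ext data,
			unsigned long long size, struct ext out[2])
{
	unsigned long long lo = comp.start;
	unsigned long long hi = comp.end < size ? comp.end : size;
	int n = 0;

	if (lo >= hi)
		return 0;		/* component lies entirely beyond EOF */
	if (data.start > lo)		/* hole before the data */
		out[n++] = (struct ext){ lo, data.start < hi ? data.start : hi };
	if (data.end < hi)		/* hole after the data, up to EOF */
		out[n++] = (struct ext){ data.end > lo ? data.end : lo, hi };
	return n;
}
```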

Fixes: b9ce342ee1 ("LU-16896 flr: resync should not change file size")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: If81433260f42c6161a53157cc0dd9d115eabbfb9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52489
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17132 kernel: update RHEL 8.8 [4.18.0-477.27.1.el8_8] 22/52422/2
Jian Yu [Wed, 20 Sep 2023 07:18:13 +0000 (00:18 -0700)]
LU-17132 kernel: update RHEL 8.8 [4.18.0-477.27.1.el8_8]

Update RHEL 8.8 kernel to 4.18.0-477.27.1.el8_8.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.8 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.8 serverdistro=el8.8 testlist=sanity

Change-Id: I4edd823b273c75618bc6dea236be8d64ed7c13ed
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52422
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17191 osc: only call xa_insert for new entries 13/52713/9
James Simmons [Fri, 3 Nov 2023 00:33:50 +0000 (20:33 -0400)]
LU-17191 osc: only call xa_insert for new entries

The handling of updating the XArray entry is incorrect. If the
entry already exists and we call xa_insert(), it returns -EBUSY
and does not continue examining the entry. The correct approach is
to never call xa_insert() if xa_load() returns a value. With the
return value of xa_load() we test whether the type bit is set and,
if not, set it.
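
The load-then-insert pattern can be illustrated without kernel headers. struct toy_xa, toy_load(), toy_insert() and set_type_bit() are hypothetical: toy_insert() only mimics the -EBUSY behavior of xa_insert() on an occupied index described above, and the in-place update stands in for whatever store the real osc code performs.

```c
#include <assert.h>
#include <errno.h>

#define TOY_SIZE 64

/* Hypothetical toy index -> value map; 0 means "no entry". */
struct toy_xa {
	unsigned long slot[TOY_SIZE];
};

static unsigned long toy_load(const struct toy_xa *xa, unsigned int idx)
{
	assert(idx < TOY_SIZE);
	return xa->slot[idx];
}

/* Like xa_insert(): refuses to overwrite an existing entry. */
static int toy_insert(struct toy_xa *xa, unsigned int idx, unsigned long v)
{
	assert(idx < TOY_SIZE);
	if (xa->slot[idx] != 0)
		return -EBUSY;
	xa->slot[idx] = v;
	return 0;
}

/* The fixed pattern: insert only when load finds nothing; otherwise
 * set the type bit on the existing entry if it is not already set. */
static int set_type_bit(struct toy_xa *xa, unsigned int idx, unsigned long bit)
{
	unsigned long cur = toy_load(xa, idx);

	if (cur == 0)
		return toy_insert(xa, idx, bit);
	if (!(cur & bit))
		xa->slot[idx] = cur | bit;
	return 0;
}
```

Calling toy_insert() directly on the second update would fail with -EBUSY and leave the second bit unset, which is the bug the patch describes.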

We add ll_xa_insert to spelling.txt since the return value of
xa_insert() changed between kernel versions. This will avoid future
mistakes in cases where we look at the return value of xa_insert().

Test-Parameters: testlist=sanity-quota
Test-parameters: clientdistro=ubuntu2204 testlist=sanity-quota
Fixes: ac8c28f959 ("LU-8130 osc: convert osc_quota hash to xarray")
Change-Id: Icd88f5e47fb24f669bf362c1daec500b743b447b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52713
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-17115 quota: fix race of acquiring locks in qmt 71/52371/7
Hongchao Zhang [Thu, 26 Oct 2023 12:46:44 +0000 (20:46 +0800)]
LU-17115 quota: fix race of acquiring locks in qmt

In qmt_delete_qid and qmt_reset_qid, the order in which the locks of
the lquota_entry and the journal are acquired differs from that in
qmt_dqacq0, which could cause a deadlock in some cases.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ic439f2c5d6ca22429422b87f0dde65e0d2e6113d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52371
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17071 o2iblnd: IBLND_REJECT_EARLY condition causes Oops 02/52202/4
Serguei Smirnov [Thu, 31 Aug 2023 19:34:00 +0000 (12:34 -0700)]
LU-17071 o2iblnd: IBLND_REJECT_EARLY condition causes Oops

The message printed when kiblnd_passive_connect recognizes
IBLND_REJECT_EARLY condition introduced by LU-16393 is trying
to dereference a NULL pointer in the parameter list. Fix this.

Test-parameters: trivial
Fixes: 673ff86a84a ("LU-16393 o2iblnd: add IBLND_REJECT_EARLY reject reason")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I711e5855383c140b9f7c35b27f48995f3f0e25ee
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52202
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
6 months ago LU-17232 build: fix ext4-misc for el7.6 server 50/52850/3
Shaun Tancheff [Fri, 27 Oct 2023 05:14:52 +0000 (00:14 -0500)]
LU-17232 build: fix ext4-misc for el7.6 server

rhel7.6/ext4-misc.patch was only partially updated;
the intended addition of:

+EXPORT_SYMBOL(ext4_chunk_trans_blocks);

is not present.

Test-Parameters: trivial
HPE-bug-id: LUS-11954
Fixes: 9e5040a304 ("LU-16847 ldiskfs: do not copy ldiskfs_chunk_trans_blocks")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I28c3f1cc52af61b8b1b5036cf8c7cbce75b5c895
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52850
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
6 months ago LU-9859 libcfs: migrate libcfs_mem.c to lnet/lib-mem.c 01/52701/5
James Simmons [Sun, 15 Oct 2023 00:25:55 +0000 (20:25 -0400)]
LU-9859 libcfs: migrate libcfs_mem.c to lnet/lib-mem.c

Move the libcfs_mem.c code to the LNet core. The prototypes are
declared in libcfs_cpu.h, but we don't move them yet since the CPT
code depends on the libcfs_mem.c work. Moving the CPT work right
away could end up in a cyclic module dependency, so limit what is
changed at this point.

Test-Parameters: trivial
Change-Id: I6bf5cd9f20033f988dde1989f0fc5f89ea74b5a2
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52701
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-17131 ldiskfs: el9.2 encdata and filename-encode 12/52412/6
Shaun Tancheff [Wed, 18 Oct 2023 09:37:21 +0000 (04:37 -0500)]
LU-17131 ldiskfs: el9.2 encdata and filename-encode

Add encryption support for el9.2

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I03a42777eb7e23ab0934452461d3581d0b670af1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52412
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 months ago LU-16097 quota: release preacquired quota when over limits 76/48576/25
Hongchao Zhang [Thu, 19 Oct 2023 06:33:47 +0000 (14:33 +0800)]
LU-16097 quota: release preacquired quota when over limits

The pre-acquired quota on each MDT or OST should be released when
the whole quota is over limits, for instance, after the quota limits
had been decreased for some quota ID by an administrator.

Test-Parameters: testlist=sanity-quota ossversion=2.15.3
Test-Parameters: testlist=sanity-quota mdsversion=2.15.3
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I6263b835d4ae6a3fd03f9a2bc4f463949cbc74d4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48576
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
6 months ago LU-14361 statahead: add tunable for fname pattern statahead 72/51572/9
Qian Yingjin [Wed, 5 Jul 2023 09:05:57 +0000 (05:05 -0400)]
LU-14361 statahead: add tunable for fname pattern statahead

This patch adds a tunable parameter for fname pattern statahead.
Currently fname pattern statahead is disabled.
It will be enabled by default once the fname pattern statahead
patch series is finished and stable.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I7a207e66df10b45bff1b3993e8724116489365b7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51572
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
6 months ago LU-4974 lod: Change pool_desc to "[lod|lov]_pool_desc" 14/11114/8
Arshad Hussain [Thu, 26 Oct 2023 16:42:11 +0000 (22:12 +0530)]
LU-4974 lod: Change pool_desc to "[lod|lov]_pool_desc"

This patch renames 'struct pool_desc' under lov and lod
to 'struct lov_pool_desc' and 'struct lod_pool_desc' respectively.
This is the first step toward checking whether anything is
common and can be unified. Although both layers use
'struct pool_desc' to define their pool descriptors, 'struct
pool_desc' under lod has changed and grown. Therefore, to remove
ambiguity, a lod/lov prefix is added to the pool_desc struct of
the respective layer.

This patch also adds function descriptions wherever
applicable.

This patch also fixes space/tab issues reported by
checkpatch.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I3fee3f2e9c321145779d9177a8e4582d123f1e8d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/11114
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-10391 ksocklnd: use ksocknal_protocol v4 for IPv6 11/52911/2
James Simmons [Tue, 31 Oct 2023 16:53:40 +0000 (12:53 -0400)]
LU-10391 ksocklnd: use ksocknal_protocol v4 for IPv6

During testing of IPv6 I encountered the following error:

LNetError: 11c-c: Protocol error connecting to ...

This was due to the ksocklnd protocol not being set to the correct
version when using IPv6 addresses. Test whether we have an IPv6 peer
and set the proper protocol.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I19183e904acee4cb8b9d3b7e77284c81f6cdc2b4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52911
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17235 o2iblnd: adding alias ib interface causes crash 94/52894/3
Serguei Smirnov [Mon, 30 Oct 2023 19:13:45 +0000 (12:13 -0700)]
LU-17235 o2iblnd: adding alias ib interface causes crash

Commit 09c6e2b872 (LU-16836) causes o2iblnd startup routine to crash
when alias ib interface is used:

        ifconfig ib0:0 10.1.0.52 up
        modprobe lnet
        lnetctl lnet configure
        lnetctl net add --net o2ib --if ib0:0

Fix the code that attempts to set the NI status on startup so that
it gracefully handles the case where the corresponding net_device is
not found.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 09c6e2b872 ("LU-16836 lnet: ensure dev notification")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iaef9280a10f27ac28b872d9f4bc119c4d459ef40
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17225 utils: check_iam print records 15/52815/3
Alexander Boyko [Sat, 21 Oct 2023 08:30:12 +0000 (04:30 -0400)]
LU-17225 utils: check_iam print records

This fix adds the ability to print records with the -r option.
The format is FID ino/igen:

[0x200000001:0x3:0x0] 88/2116153859
[0x200000001:0x4:0x0] 89/182164966

If a root block has wrong sizes, stop processing.
Also fix a test.
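
The record format follows the usual Lustre FID notation. This sketch reproduces the first sample line with snprintf(); struct fid_demo mirrors the f_seq/f_oid/f_ver fields of struct lu_fid but is redefined here, and format_iam_rec() is hypothetical, not the check_iam source. Explicit "0x" prefixes are used because "%#x" would print a zero version as "0" rather than "0x0".

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Same fields as Lustre's struct lu_fid, redefined to stay self-contained. */
struct fid_demo {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Hypothetical helper: "[seq:oid:ver] ino/igen" as in the sample output. */
static int format_iam_rec(char *buf, size_t len, const struct fid_demo *fid,
			  uint32_t ino, uint32_t igen)
{
	return snprintf(buf, len,
			"[0x%" PRIx64 ":0x%" PRIx32 ":0x%" PRIx32 "] %" PRIu32 "/%" PRIu32,
			fid->f_seq, fid->f_oid, fid->f_ver, ino, igen);
}
```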

Test-Parameters: trivial testlist=conf-sanity env=ONLY=134
HPE-bug-id: LUS-11905
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2e11d1a8d675e6be9ea79023348a18370a0bb5a0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52815
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17197 obdclass: preserve fairness when waiting for rpc slot 38/52738/6
Shaun Tancheff [Wed, 18 Oct 2023 03:54:59 +0000 (22:54 -0500)]
LU-17197 obdclass: preserve fairness when waiting for rpc slot

When obd_get_mod_rpc_slot() waits for an available slot it places the
waiting thread at the HEAD of the queue, so it will be woken before
anything else that is already queued.  This is clearly unfair and can
hurt performance.

So change to always add to the tail to ensure a FIFO ordering (except
that CLOSE might sometimes be woken a bit early).
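
The unfairness is easy to see with a toy circular doubly linked list in the style of list_head. The helpers below are hypothetical stand-ins, not the kernel wait-queue API: waiters added at the head are woken in LIFO order, while adding at the tail gives the FIFO order the patch restores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy waiter on a circular doubly linked list with a
 * dummy head node, in the style of the kernel's list_head. */
struct waiter {
	int id;
	struct waiter *prev, *next;
};

static void wq_init(struct waiter *q)
{
	q->prev = q->next = q;
}

static void insert_between(struct waiter *w, struct waiter *prev,
			   struct waiter *next)
{
	w->prev = prev;
	w->next = next;
	prev->next = w;
	next->prev = w;
}

/* Old behavior: new waiter goes to the front and is woken first (LIFO). */
static void wq_add_head(struct waiter *q, struct waiter *w)
{
	insert_between(w, q, q->next);
}

/* Fixed behavior: new waiter queues behind existing ones (FIFO). */
static void wq_add_tail(struct waiter *q, struct waiter *w)
{
	insert_between(w, q->prev, q);
}

/* Remove and return the first waiter, or NULL if the queue is empty. */
static struct waiter *wq_wake_first(struct waiter *q)
{
	struct waiter *w = q->next;

	if (w == q)
		return NULL;
	q->next = w->next;
	w->next->prev = q;
	w->prev = w->next = w;
	return w;
}
```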

This regression was introduced in a rewrite that was supposed to make
waiting more fair - by avoiding a broadcast wakeup for "close"
requests.

Also fix some stale comments and expose __add_wait_queue_entry_tail.

Running mdtest with the patch applied shows about a 3% improvement:

                           master            patched
  mdtest-easy-write    350.585906 kIOPS   353.783545 kIOPS
  mdtest-easy-stat    1320.329353 kIOPS  1408.320419 kIOPS
  mdtest-easy-delete   285.084103 kIOPS   289.625900 kIOPS
  [SCORE]              509.115803 kIOPS   524.516113 kIOPS

Fixes: 5243630b09d2 ("LU-15947 obdclass: improve precision of wakeups for mod_rpcs")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If767c4299bcbab71589b0f3c01e85bf461686ca5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52738
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-10391 lnet: handle discovery with Netlink 53/50253/3
James Simmons [Mon, 23 Oct 2023 14:18:51 +0000 (10:18 -0400)]
LU-10391 lnet: handle discovery with Netlink

Move the LNet discover feature to the Netlink API. This change
enables the detection of remote LNet setups using large NID
addresses. We treat LNet discover as a ping doit function since
the output is nearly identical to pings. Both successful and failed
attempts to discover the requested NIDs are returned.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Id0eb4adcb4561cfae96040086aae85d6ff804259
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50253
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-16356 hsm: add running ref to the coordinator 56/51256/20
Etienne AUJAMES [Thu, 31 Aug 2023 14:46:20 +0000 (16:46 +0200)]
LU-16356 hsm: add running ref to the coordinator

This patch replaces fe5706e by adding a reference "cdt_ref" that is
held while the coordinator is running (rather than trusting the HSM
state). This avoids de-initializing the coordinator while it is still
in use (e.g. by a thread adding an HSM request) and avoids complex
locking on the HSM state.

It also causes the coordinator thread to exit if
cdt_start_pending_restore() fails. Otherwise, this can produce a lot
of unexpected behavior (hangs, crashes).

The patch modifies mdc_kuc_reregister() to register the HSM agent in
the background. This makes reconnection independent of agent
registration, which allows re-enabling resend for HSM_CT_REGISTER
without reintroducing the LU-13455 problem. The coordinator returns
EINPROGRESS if it is not ready, and the client resends the request
in that case, so copytools can wait for the coordinator to be ready.

Add regression test sanity-hsm 409a.

Fixes: fe5706e ("LU-16235 hsm: check CDT state before adding actions llog")
Fixes: 3d58403 ("LU-13455 ptlrpc: connect to MDT stucks")
Test-Parameters: testlist=sanity-hsm
Test-Parameters: testlist=sanity-hsm env=ONLY=107,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm env=ONLY=409a,ONLY_REPEAT=20
Test-Parameters: testlist=conf-sanity env=ONLY=132,ONLY_REPEAT=20
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I14302d1053abbe76eeaaa1a63c6fd6d9b530baa9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51256
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-16032 osd: move unlink of large objects to separate thread 95/47995/31
Artem Blagodarenko [Fri, 13 Oct 2023 07:49:07 +0000 (15:49 +0800)]
LU-16032 osd: move unlink of large objects to separate thread

Final unlink and freeing of blocks for large objects can lead to
a hung thread with this call stack:

  Net: Service thread pid 1739 was inactive for 200.16s.
  The thread might be hung, or it might only be slow and will
  resume later.
  Dumping the stack trace for debugging purposes:
    __wait_on_buffer+0x2a/0x30
    ldiskfs_wait_block_bitmap+0xe0/0xf0 [ldiskfs]
    ldiskfs_read_block_bitmap+0x31/0x60 [ldiskfs]
    ldiskfs_free_blocks+0x329/0xbb0 [ldiskfs]
    ldiskfs_ext_remove_space+0x8a9/0x1150 [ldiskfs]
    ldiskfs_ext_truncate+0xb0/0xe0 [ldiskfs]
    ldiskfs_truncate+0x3b7/0x3f0 [ldiskfs]
    ldiskfs_evict_inode+0x58a/0x630 [ldiskfs]
    evict+0xb4/0x180
    iput+0xfc/0x190
    osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
    lu_object_free.isra.30+0x68/0x170 [obdclass]
    lu_object_put+0xc5/0x3e0 [obdclass]
    ofd_destroy_by_fid+0x20e/0x500 [ofd]
    ofd_destroy_hdl+0x267/0x9f0 [ofd]
    tgt_request_handle+0xaee/0x15f0 [ptlrpc]
    ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
    ptlrpc_main+0xb34/0x1470 [ptlrpc]
    kthread+0xd1/0xe0

Let's move the final unlink to a workqueue if the inode size is > 1GB.
The size threshold can be configured by setting the minimum async
truncate size with the "osd-ldiskfs.*.delay_unlink_mb" parameter.
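
The threshold check amounts to converting the tunable from MiB to bytes before comparing. should_delay_unlink() is a hypothetical helper; it assumes delay_unlink_mb is expressed in MiB and that the "> 1GB" rule corresponds to a value of 1024. The exact comparison used by osd-ldiskfs may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: defer the final unlink to a workqueue when the
 * inode is larger than delay_unlink_mb MiB (assumed unit). */
static bool should_delay_unlink(unsigned long long isize_bytes,
				unsigned int delay_unlink_mb)
{
	/* widen before shifting so large tunables cannot overflow */
	unsigned long long threshold = (unsigned long long)delay_unlink_mb << 20;

	return isize_bytes > threshold;
}
```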

Writes to "osd-ldiskfs.*.force_sync" parameter will flush pending
delayed unlinks so that space can be reclaimed as needed.

Change-Id: Id535ae4c58732769effabee42835bc2da8cb5cc1
Signed-off-by: Artem Blagodarenko <ablagodarenko@whamcloud.com>
DDN-bug-id: DDN-3144
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47995
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months ago LU-17181 misc: don't block reclaim threads 27/52627/2
Alexey Lyashkov [Wed, 11 Oct 2023 09:04:32 +0000 (12:04 +0300)]
LU-17181 misc: don't block reclaim threads

Memory reclaim threads may be blocked by the Lustre reclaim
process, but Lustre does not gain any benefit from parallel
reclaim.

Test-Parameters: trivial
HPe-bug-id: LUS-11872
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I624edbb8833975864706ec51537d2954f5a9cea4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17103 lnet: use workqueue for lnd ping buffer updates 22/52522/18
Serguei Smirnov [Tue, 26 Sep 2023 23:57:46 +0000 (16:57 -0700)]
LU-17103 lnet: use workqueue for lnd ping buffer updates

Introduce workqueue for handling lnd-initiated ping buffer
update requests.

This is done to avoid the possibility of monitor thread
lock up waiting for the "old" ping buffer refcount to get
decremented during the update, while the message which
triggers the decrement is on the monitor thread's own queue
waiting to be processed.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY="207 500",ONLY_REPEAT=50
Fixes: 7ac399c5 ("LU-16949 lnet: get monitor thread to update ping buffer")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I5176581703e52f4adbfff417040bebcc2489b79e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52522
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
6 months ago LU-17198 tests: running_in_vm to recognize qemu 04/52704/4
Alex Zhuravlev [Sun, 15 Oct 2023 18:58:07 +0000 (21:58 +0300)]
LU-17198 tests: running_in_vm to recognize qemu

qemu is reported in dmidecode's system-manufacturer field,
not in system-product-name.
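
As an illustration only (running_in_vm() itself is a shell function in
the test framework), the check can be sketched in C. On Linux the SMBIOS
manufacturer that dmidecode prints is also exposed as
/sys/class/dmi/id/sys_vendor; a QEMU guest typically reports "QEMU" there,
while the product name is something like "Standard PC (Q35 + ICH9, 2009)".
The function names below are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive substring search (strcasestr() is a GNU extension). */
static bool contains_nocase(const char *hay, const char *needle)
{
	size_t n = strlen(needle);

	for (; *hay; hay++) {
		size_t i = 0;

		while (i < n && hay[i] &&
		       tolower((unsigned char)hay[i]) ==
		       tolower((unsigned char)needle[i]))
			i++;
		if (i == n)
			return true;
	}
	return false;
}

/* Hypothetical check: look for qemu in the manufacturer string,
 * since the product name does not mention it. */
static bool manufacturer_is_qemu(const char *sys_vendor)
{
	return contains_nocase(sys_vendor, "qemu");
}
```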

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id7e01ae2825835080d29ebec1750e53b87f5cf04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52704
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months ago LU-17000 coverity: Fix Resource Leak(2) 86/52686/4
Arshad Hussain [Fri, 13 Oct 2023 08:27:38 +0000 (13:57 +0530)]
LU-17000 coverity: Fix Resource Leak(2)

This patch fixes errors reported by a Coverity run.

CoverityID: 397140 ("Resource leak"): lfs.c
CoverityID: 397370 ("Resource leak"): dir.c
CoverityID: 397378 ("Resource leak"): obd.c
CoverityID: 397406 ("Resource leak"): obd.c
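
The individual fixes are not reproduced here. As a generic illustration
of the kind of defect Coverity reports as a "Resource leak", the sketch
below shows the single-exit "goto out" cleanup style common in C code:
an early return after the first allocation would leak it. The allocation
counter and function names are hypothetical and exist only to make a leak
observable.

```c
#include <stdlib.h>
#include <string.h>

static int live_allocs;	/* hypothetical counter of outstanding buffers */

static void *tracked_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* fail_early simulates an error after the first buffer is acquired. */
static int do_work(int fail_early)
{
	char *a = tracked_alloc(64);
	char *b = NULL;
	int rc = 0;

	if (!a)
		return -1;	/* nothing acquired yet, plain return is fine */
	if (fail_early) {
		rc = -1;
		goto out;	/* "return -1" here would leak a */
	}
	b = tracked_alloc(64);
	if (!b) {
		rc = -1;
		goto out;
	}
	memset(a, 0, 64);
	memset(b, 0, 64);
out:
	tracked_free(b);	/* safe on NULL */
	tracked_free(a);
	return rc;
}
```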

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I92bbf6749d667631b92868bcaf78af558c250441
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52686
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17000 coverity: Fix leak under mgc_request.c 56/52656/3
Arshad Hussain [Thu, 12 Oct 2023 08:46:43 +0000 (04:46 -0400)]
LU-17000 coverity: Fix leak under mgc_request.c

This patch fixes a resource leak error reported
by a Coverity run.

CoverityID: 403113 ("Resource leak"): mgc_request.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie569a18cbacdb48c186d38ccc466ce86eeb1b28f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17149 tbf: nrs_tbf_id_cli_set should not modify the fmt 28/52528/3
Etienne AUJAMES [Wed, 27 Sep 2023 11:51:36 +0000 (13:51 +0200)]
LU-17149 tbf: nrs_tbf_id_cli_set should not modify the fmt

nrs_tbf_id_cli_set() needs to parse the LDLM_ENQUEUE request to find
the uid/gid.

It calls req_capsule_extend(), which changes the request format
(rc_fmt). If rc_fmt was not NULL, the function does not restore the
initial request format.

The following crash occurs the 2nd time nrs_tbf_id_cli_set()
is called (1st: o_cli_find(), 2nd: o_cli_init()):

LustreError: 8949:0:(req_capsule_extend())
ASSERTION( fmt->rf_fields[i].nr >= old->rf_fields[i].nr ) failed:
Call Trace TBD:
[<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[<0>] req_capsule_extend+0x174/0x1b0 [ptlrpc]
[<0>] nrs_tbf_id_cli_set+0x1ee/0x2a0 [ptlrpc]
[<0>] nrs_tbf_generic_cli_init+0x50/0x180 [ptlrpc]
[<0>] nrs_tbf_res_get+0x1fe/0x430 [ptlrpc]
[<0>] nrs_resource_get+0x6c/0xe0 [ptlrpc]
[<0>] nrs_resource_get_safe+0x87/0xe0 [ptlrpc]
[<0>] ptlrpc_nrs_req_initialize+0x58/0xb0 [ptlrpc]
[<0>] ptlrpc_server_request_add+0x248/0xa20 [ptlrpc]
[<0>] ptlrpc_server_handle_req_in+0x36a/0x8c0 [ptlrpc]
[<0>] ptlrpc_main+0xb97/0x1530 [ptlrpc]
[<0>] kthread+0x134/0x150
[<0>] ret_from_fork+0x1f/0x40
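
The general rule behind the fix is that a helper which temporarily
switches the request format should put the caller's format back. The
sketch below shows that save/restore pattern with hypothetical stand-in
types; it is not the ptlrpc req_capsule API and not necessarily how the
patch implements it. capsule_extend() mirrors the "extend may only add
fields" ASSERTION seen in the crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the request format and capsule. */
struct req_fmt {
	const char *name;
	int nr_fields;
};

struct capsule {
	const struct req_fmt *rc_fmt;
};

/* Extending may only add fields, like the ASSERTION in the log above. */
static void capsule_extend(struct capsule *pill, const struct req_fmt *fmt)
{
	if (pill->rc_fmt != NULL)
		assert(fmt->nr_fields >= pill->rc_fmt->nr_fields);
	pill->rc_fmt = fmt;
}

static const struct req_fmt enqueue_fmt = { "LDLM_ENQUEUE", 3 };

/* Parse uid/gid using a temporary format, then restore the caller's
 * format, whether it was NULL or not, so repeated calls see the same
 * starting state. */
static int id_cli_set(struct capsule *pill)
{
	const struct req_fmt *saved = pill->rc_fmt;

	capsule_extend(pill, &enqueue_fmt);
	/* ... unpack the request body and read uid/gid here ... */
	pill->rc_fmt = saved;
	return 0;
}
```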

Test-Parameters: testlist=sanityn env=ONLY=77
Test-Parameters: testlist=sanityn env=ONLY=77
Test-Parameters: testlist=sanityn env=ONLY=77
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ia762936262e8cde891ae2a9daf4ce691c6a6f616
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52528
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-17171 test: improve sanity-quota test_41 91/52591/5
Lai Siyao [Sat, 30 Sep 2023 20:10:54 +0000 (16:10 -0400)]
LU-17171 test: improve sanity-quota test_41

On a ZFS backend, df may report slightly more blocks used for a
project than the quota result does, because the former is
calculated in the filesystem block size, which is 4K.

Update sanity-quota test_41 to make it more robust.
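
A worked example of the rounding for a single file, under the assumption
that df rounds usage up to whole 4K blocks while quota counts the exact
KiB used. The helpers and the 1025 KiB figure are hypothetical; test_41
itself is a shell test, and with many files the per-file rounding could
add up to more than one block.

```c
#include <stdbool.h>

/* Round usage in KiB up to whole filesystem blocks of blk_kb KiB. */
static unsigned long long round_up_kb(unsigned long long used_kb,
				      unsigned long long blk_kb)
{
	return (used_kb + blk_kb - 1) / blk_kb * blk_kb;
}

/* Hypothetical tolerance for one file: df may exceed the quota
 * figure, but by less than one block. */
static bool df_matches_quota(unsigned long long df_kb,
			     unsigned long long quota_kb,
			     unsigned long long blk_kb)
{
	return df_kb >= quota_kb && df_kb - quota_kb < blk_kb;
}
```

For 1025 KiB of quota usage, df would show round_up_kb(1025, 4) = 1028
KiB, which an exact-equality check would wrongly treat as a mismatch.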

Test-Parameters: trivial testlist=sanity-quota fstype=zfs
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ide51d9aaeb8907eb77acc30fa4fc76dcc16e8de0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52591
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
6 months ago LU-14361 statahead: Add test for statahead advise 30/51730/9
Qian Yingjin [Fri, 21 Jul 2023 10:21:11 +0000 (06:21 -0400)]
LU-14361 statahead: Add test for statahead advise

This patch adds a test program "aheadmany" and sanity/test_123g to
verify that statahead advise works as expected.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I751313ecb790c66f70a09bf8e9d13846b3c0032d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51730
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months ago LU-12610 obd: delete OBD_ -> CFS_ redefinitions 40/35640/7
Timothy Day [Mon, 23 Oct 2023 15:50:23 +0000 (11:50 -0400)]
LU-12610 obd: delete OBD_ -> CFS_ redefinitions

With all consumers of the OBD macros fixed, finally
remove OBD macros that are simply redefinitions of CFS
macros.

Remove:
OBD_FAIL_PRECHECK(id)
OBD_FAIL_CHECK(id)
OBD_FAIL_CHECK_VALUE(id, value)
OBD_FAIL_CHECK_ORSET(id, value)
OBD_FAIL_CHECK_RESET(id, value)
OBD_FAIL_RETURN(id, ret)
OBD_FAIL_TIMEOUT(id, secs)
OBD_FAIL_TIMEOUT_MS(id, ms)
OBD_FAIL_TIMEOUT_ORSET(id, value, secs)
OBD_RACE(id)
OBD_FAIL_ONCE
OBD_FAILED

Avoid losing the unlikely() optimization of OBD_FAIL_PRECHECK by
adding unlikely() to CFS_FAIL_PRECHECK. In libcfs_private.h, not
all callers of CFS_FAIL_PRECHECK used unlikely(), so this is fixed
as well.
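
A simplified sketch of moving the branch hint into the CFS_ macro so the
OBD_ wrapper is no longer needed. The unlikely() definition matches the
usual GCC/Clang __builtin_expect idiom; cfs_fail_loc, FAIL_MASK_LOC and
the macro body are simplified assumptions, not the exact libcfs
definitions, and fail_point_hit() is hypothetical.

```c
#include <stdbool.h>

/* GCC/Clang branch-prediction hint, defined as in the kernel. */
#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/* Simplified stand-ins for the libcfs fail_loc state and mask. */
static unsigned long cfs_fail_loc;
#define FAIL_MASK_LOC 0x0000ffffUL

/* The hint lives inside the CFS_ macro itself, so every caller gets it. */
#define CFS_FAIL_PRECHECK(id) \
	unlikely(cfs_fail_loc != 0 && \
		 (cfs_fail_loc & FAIL_MASK_LOC) == ((id) & FAIL_MASK_LOC))

/* Previously (assumed), the OBD_ redefinition only added the hint:
 * #define OBD_FAIL_PRECHECK(id) unlikely(CFS_FAIL_PRECHECK(id)) */

static bool fail_point_hit(unsigned long id)
{
	return CFS_FAIL_PRECHECK(id);
}
```

Because fail points are almost never armed in production, hinting the
check as unlikely lets the compiler lay out the common path first.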

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: I6620bae389a9e29da2c0258b07f0ca2a7f67c14a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/35640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-16837 lov: NULL dereference in lov_delete_composite 26/52826/3
Bobi Jam [Wed, 25 Oct 2023 06:58:35 +0000 (14:58 +0800)]
LU-16837 lov: NULL dereference in lov_delete_composite

Commit 14ed4a6f8f reintroduced the issue fixed by commit
5da049d9ef ("LU-14389 lov: avoid NULL dereference in cleanup"); this
patch extends the fix to cover the new case added by 14ed4a6f8f.
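
The general shape of such a fix is a teardown loop that tolerates slots
never initialized, for example for a layout component the client does not
understand. The structures and delete_composite() below are hypothetical
stand-ins, not the lov code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical component entry; unknown components leave a NULL slot. */
struct comp_entry {
	int id;
};

struct composite_layout {
	int count;
	struct comp_entry **entries;
};

/* Free every initialized component and return how many were freed. */
static int delete_composite(struct composite_layout *lay)
{
	int freed = 0;
	int i;

	for (i = 0; i < lay->count; i++) {
		struct comp_entry *e = lay->entries[i];

		if (e == NULL)	/* never set up: nothing to tear down */
			continue;
		free(e);
		lay->entries[i] = NULL;
		freed++;
	}
	return freed;
}
```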

Fixes: 14ed4a6f8f ("LU-16837 llite: handle unknown layout component")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I4a2b72e21139b60519ed523b4851723c91f523c1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52826
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
6 months ago LU-11912 tests: fix racing in force_new_seq_all 01/52801/2
Li Dongyang [Mon, 23 Oct 2023 11:49:55 +0000 (22:49 +1100)]
LU-11912 tests: fix racing in force_new_seq_all

We run force_new_seq in parallel to reduce the time spent
consuming precreated objects.

However, this can race when multiple MDTs are on the
same MDS: the task for one MDT can finish early and
reset fail_loc to 0 on the MDS while other tasks are
still working on other MDTs.

Replace OBD_FAIL_OSP_FORCE_NEW_SEQ with a new OSP
parameter, prealloc_force_new_seq, so the sequence
rollover can be controlled individually for each OSP
device.
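
The race can be shown with one deterministic interleaving in which the
task for the first device finishes before the others roll over. This is a
model only: NDEV, simulate() and the flags are hypothetical, with one
shared flag standing in for the MDS-wide fail_loc and an array standing
in for a per-OSP parameter such as prealloc_force_new_seq.

```c
#include <stdbool.h>

#define NDEV 3	/* hypothetical number of OSP devices on one MDS */

/* Return how many devices actually rolled over to a new sequence. */
static int simulate(bool per_device)
{
	bool global_flag = true;		/* shared, like fail_loc */
	bool dev_flag[NDEV] = { true, true, true };	/* one per device */
	int rolled = 0;
	int i;

	for (i = 0; i < NDEV; i++) {
		bool force = per_device ? dev_flag[i] : global_flag;

		if (force)
			rolled++;
		/* the task for device i finishes and clears "its" flag */
		if (per_device)
			dev_flag[i] = false;
		else
			global_flag = false;	/* also disarms the others */
	}
	return rolled;
}
```

With the shared flag only the first device rolls over; with per-device
flags all three do, regardless of which task finishes first.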

Change-Id: I52dbd550564ca628a8a85c42951694d58b2b93a9
Fixes: 656fc937cf ("LU-11912 tests: consume precreated objects in parallel")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52801
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months ago LU-6142 tests: Add missing /tmp/target2 under cleanup_src_tgt 70/52770/2
Arshad Hussain [Fri, 20 Oct 2023 08:25:47 +0000 (04:25 -0400)]
LU-6142 tests: Add missing /tmp/target2 under cleanup_src_tgt

cleanup_src_tgt() is called after each test case
in lustre-rsync-test.sh. The cleanup function
was missing cleanup of /tmp/target2/... This patch
adds the missing cleanup of the /tmp/target2/...
directory. Each test case should start clean,
because the diff (comparison) is performed against
these directories.

This patch also fixes space/tab mismatches reported
by checkpatch.

Test-Parameters: trivial testlist=lustre-rsync-test
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ieb7cfa60d894f43f1aa7b2510d03bd07eeb90a1e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52770
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>