Whamcloud - gitweb
fs/lustre-release.git
3 years ago LU-14540 o2iblnd: Use REMOTE_DROPPED for ECONNREFUSED 14/42114/3
Chris Horn [Fri, 19 Mar 2021 18:22:26 +0000 (13:22 -0500)]
LU-14540 o2iblnd: Use REMOTE_DROPPED for ECONNREFUSED

ECONNREFUSED means that we received a response from the remote end,
so setting the LNet health status to REMOTE_DROPPED is more
appropriate than setting LOCAL_DROPPED. Using REMOTE_DROPPED will
decrement the peer NI health and allow us to try other peer NIs for
future sends.

Decrementing the peer NI health will also result in routes being
marked down, as appropriate, for cases where a router has refused the
connection request.
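
As a rough illustration of the classification described above (not the o2iblnd code itself), the sketch below maps a connection error to a health status. The enum and function names are hypothetical, and treating every other error as a local drop is an assumption made only to keep the example small; the one rule taken from this commit message is that ECONNREFUSED counts against the remote side.

```c
#include <errno.h>

/* Hypothetical stand-ins for the LNet health statuses named above. */
enum health_status_sketch {
	HEALTH_LOCAL_DROPPED,	/* failure attributed to the local NI */
	HEALTH_REMOTE_DROPPED,	/* failure attributed to the peer NI */
};

/* Classify a connect failure; accepts either ECONNREFUSED or -ECONNREFUSED. */
enum health_status_sketch classify_connect_error(int err)
{
	if (err < 0)
		err = -err;

	/* The remote end answered and refused, so blame the peer NI. */
	if (err == ECONNREFUSED)
		return HEALTH_REMOTE_DROPPED;

	/* Assumption for this sketch: everything else stays a local drop. */
	return HEALTH_LOCAL_DROPPED;
}
```

A caller would then decrement the health of whichever side the returned status points at.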

Test-Parameters: trivial
HPE-bug-id: LUS-9853
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8190f5d78a76ec25553908c4f215362c0c2051fc
Reviewed-on: https://review.whamcloud.com/42114
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14538 gss: make namespace optional in lgss_keyring 12/42112/2
Sebastien Buisson [Fri, 19 Mar 2021 14:46:58 +0000 (15:46 +0100)]
LU-14538 gss: make namespace optional in lgss_keyring

Introduce a new tunable 'sptlrpc.gss.gss_check_upcall_ns' to
make namespace support optional in lgss_keyring.
By default it is set to 1, which selects the standard behavior:
check the caller's namespace and switch namespaces
if necessary.
When the tunable is set to 0, lgss_keyring sticks to the current
namespace.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ib9d4e47935a718d4aae31fbb0d13f6bc8a4005a5
Reviewed-on: https://review.whamcloud.com/42112
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14522 ldlm: reprocess locks if enqueue failed 31/42031/7
Alex Zhuravlev [Sun, 14 Mar 2021 04:29:11 +0000 (07:29 +0300)]
LU-14522 ldlm: reprocess locks if enqueue failed

If the export is disconnected during enqueue, ldlm_handle_enqueue0()
drops the lock but can skip reprocessing, so all subsequent
waiting locks that conflict with the dropped one may get stuck.

With the patch most racers succeed; without it, about 1/4 of runs get stuck.

Fixes: 37932c4beb ("LU-10175 ldlm: IBITS lock convert instead of cancel")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I584b0de2656840da5dfa86a894fe02f138e1389d
Reviewed-on: https://review.whamcloud.com/42031
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14487 lustre: remove references to Sun Trademark. 80/41880/3
Mr NeilBrown [Thu, 4 Mar 2021 02:51:23 +0000 (13:51 +1100)]
LU-14487 lustre: remove references to Sun Trademark.

"lustre" is no longer a Trademark of Sun Microsystems.  There is no
need to acknowledge the trademark in every file, so just remove all
these claims.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I214670b39c5718f2b691193f268a64856e0cd743
Reviewed-on: https://review.whamcloud.com/41880
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 years ago LU-14450 kernel: kernel update RHEL8.3 [4.18.0-240.15.1.el8_3] 04/41704/5
Jian Yu [Wed, 24 Feb 2021 18:43:59 +0000 (10:43 -0800)]
LU-14450 kernel: kernel update RHEL8.3 [4.18.0-240.15.1.el8_3]

Update RHEL8.3 kernel to 4.18.0-240.15.1.el8_3.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.3 serverdistro=el8.3 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.3 serverdistro=el8.3 testlist=sanity

Change-Id: I92ca7769fac17221da376788cfe79887ecc4c19c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41704
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14119 osd: add mount option "resetoi" 02/41402/8
Lai Siyao [Wed, 3 Feb 2021 03:44:15 +0000 (11:44 +0800)]
LU-14119 osd: add mount option "resetoi"

OI files on zfs are special, and they can't be deleted by user space
tools like rm. Sometimes the OI files may contain stale OI mappings,
and they need to be removed for namespace consistency. Add a mount
option 'resetoi' to recreate the OI files at mount time; it
supports both ldiskfs and zfs. This should be the standard way to
recreate OI files, rather than mounting the target as its backend
filesystem and unlinking them manually.

Add sanity-scrub 17.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Idc0e4c2f3b81675c49c6c005bc30b61d8fd04503
Reviewed-on: https://review.whamcloud.com/41402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14119 osd: delete stale OI mapping entry 41/41741/4
Lai Siyao [Wed, 24 Feb 2021 03:31:06 +0000 (11:31 +0800)]
LU-14119 osd: delete stale OI mapping entry

Once the LMA check shows that an OI mapping entry is stale, delete it
from the OI table, which avoids removing whole OI files.

Don't add an OI mapping to the cache until osd_fid_lookup(), because
the mapping in the OI is not trustworthy until the FID in the LMA is
checked; otherwise it may mislead LFSCK.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I4b50dcc02149d485e4bf4a361ca2994daa280feb
Reviewed-on: https://review.whamcloud.com/41741
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14119 osd-zfs: enable LUDA_VERIFY 74/41274/3
Lai Siyao [Tue, 19 Jan 2021 13:37:50 +0000 (21:37 +0800)]
LU-14119 osd-zfs: enable LUDA_VERIFY

In osd_dir_it_rec(), if the dirent is fetched successfully and the FID
in the dirent is sane, it returns right away. However, if
LUDA_VERIFY|LUDA_VERIFY_DRYRUN is set, the FID in the dirent should be
compared with the FID in the LMA, and replaced with the latter if
they differ.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I35e2a4d4606044cd37cc5847cffc577740918988
Reviewed-on: https://review.whamcloud.com/41274
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14119 mdc: set fid2path RPC interruptible 19/41219/3
Lai Siyao [Wed, 13 Jan 2021 09:29:50 +0000 (17:29 +0800)]
LU-14119 mdc: set fid2path RPC interruptible

Sometimes OI scrub can't fix the inconsistency in FID and name, and
server will return -EINPROGRESS for fid2path request. Upon such
failure, client will keep resending the request. Set such request
to be interruptible to avoid deadlock.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I82192cb8a8256064ca632cabfe5581b12e86423b
Reviewed-on: https://review.whamcloud.com/41219
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
3 years ago LU-14291 ptlrpc: format UPDATE messages in server-only code 25/41125/7
Mr NeilBrown [Fri, 30 Oct 2020 04:01:00 +0000 (15:01 +1100)]
LU-14291 ptlrpc: format UPDATE messages in server-only code

There are some ptlrpc messages that are only used for targets to
communicate with each other: Object Updates between Targets (OUT).

These are never needed by the client, so the code for handling them
can be conditionally compiled with HAVE_SERVER_SUPPORT.

The code in layout.c needs struct declarations that are in the file, so
group them at the end of the file and add #ifdef.
The code in pack_generic.c can stand alone, so move it to a new
pack_server.c and compile that only when server code is requested.

For simplicity, also make req_check_sepol() completely server-side
and provide an inline stub for client-only code.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I788352575a2109df389760fff45207ad6de3391b
Reviewed-on: https://review.whamcloud.com/41125
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14195 libcfs: switch to kfree_sensitive 08/40908/5
Mr NeilBrown [Wed, 9 Dec 2020 01:49:13 +0000 (12:49 +1100)]
LU-14195 libcfs: switch to kfree_sensitive

In Linux 5.10, kzfree() has been renamed kfree_sensitive().

So switch to the new name and provide back-compat support for older
kernels.
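
The back-compat pattern can be sketched in userspace as follows. This is only an analogue under stated assumptions: HAVE_FREE_SENSITIVE, free_sensitive(), legacy_zfree() and zero_sensitive() are hypothetical names standing in for the configure check, the new name, the old name and the zeroing step, and the explicit length argument exists only because userspace malloc() does not expose the allocation size the way the kernel helper can look it up.

```c
#include <stdlib.h>

/* Zero a buffer through a volatile pointer so the compiler keeps the stores. */
void zero_sensitive(void *p, size_t len)
{
	volatile unsigned char *b = p;

	while (len--)
		*b++ = 0;
}

/* Stand-in for the old name: wipe the contents, then free the buffer. */
void legacy_zfree(void *p, size_t len)
{
	if (p == NULL)
		return;
	zero_sensitive(p, len);
	free(p);
}

/*
 * HAVE_FREE_SENSITIVE is a hypothetical configure-time macro. When the
 * platform does not provide the new name, map it onto the old one so
 * callers can use the new spelling everywhere.
 */
#ifndef HAVE_FREE_SENSITIVE
#define free_sensitive(p, len) legacy_zfree((p), (len))
#endif
```

Callers are written once against the new name, and the conditional definition is the only place that knows about the rename.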

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If665168477a0b6241a8ddf31a111cd465fe97783
Reviewed-on: https://review.whamcloud.com/40908
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13783 libcfs: provide fallback kallsyms_lookup_name() 26/40826/6
Mr NeilBrown [Tue, 2 Mar 2021 00:49:01 +0000 (11:49 +1100)]
LU-13783 libcfs: provide fallback kallsyms_lookup_name()

Since Linux 5.7, kallsyms_lookup_name() is no longer exported, so we
cannot rely on it.

So test for this, and when not available provide a fallback which just
returns NULL.

As this was the only way to access apply_workqueue_attrs() in recent
kernels, we need to cope with the absence of that function.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I09cc00047ec163a9395c5acd415505a8586e4e99
Reviewed-on: https://review.whamcloud.com/40826
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-14132 lod: do not initialize sub llogs twice 05/40605/12
Alex Zhuravlev [Wed, 11 Nov 2020 08:00:23 +0000 (11:00 +0300)]
LU-14132 lod: do not initialize sub llogs twice

This can happen during MDT re-activation and then result in leaked
objects:
lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 )

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0afb335ffb20532f9171dd2e514100b12f4d9a76
Reviewed-on: https://review.whamcloud.com/40605
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years ago LU-11776 utils: add support lfs find with mdt hash flag 40/39340/7
Yang Sheng [Fri, 10 Jul 2020 15:31:17 +0000 (23:31 +0800)]
LU-11776 utils: add support lfs find with mdt hash flag

lfs find can now use the MDT hash flag as a condition. Also
change it so that it can match more than one MDT hash type.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I599bb1a3cc2c9ea2a523f50f119bd93a5520d213
Reviewed-on: https://review.whamcloud.com/39340
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13397 lfs: mirror resync to keep sparseness 73/40773/11
Mikhail Pershin [Wed, 25 Nov 2020 16:05:05 +0000 (19:05 +0300)]
LU-13397 lfs: mirror resync to keep sparseness

Use SEEK_HOLE/SEEK_DATA in llapi_mirror_resync_many() to
copy just data chunks between components. Holes at the last
component are done with truncate(), holes in other components
are done with fallocate(FALLOC_FL_PUNCH_HOLE). In case of any
punch() error the hole is just copied via read(), i.e. as zeroes.

fallocate(FALLOC_FL_PUNCH_HOLE) is not supported yet,
so for now resync preserves sparseness only for last components.
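
The SEEK_DATA/SEEK_HOLE walk can be sketched as below. This is not llapi_mirror_resync_many(); sparse_copy() and copy_range() are hypothetical helpers, and the sketch assumes the destination is a freshly created, empty file, so interior holes can simply be skipped instead of punched. The trailing hole is reproduced by setting the length, which mirrors the truncate() step mentioned above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for SEEK_DATA / SEEK_HOLE on glibc */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA		/* Linux values, in case the feature macro came too late */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Copy the byte range [start, end) from src to dst at the same offsets. */
static int copy_range(int src, int dst, off_t start, off_t end)
{
	char buf[65536];

	while (start < end) {
		size_t want = sizeof(buf);
		ssize_t got, done = 0;

		if ((off_t)want > end - start)
			want = (size_t)(end - start);
		got = pread(src, buf, want, start);
		if (got < 0)
			return -1;
		if (got == 0) {		/* source shrank underneath us */
			errno = EIO;
			return -1;
		}
		while (done < got) {	/* handle short writes */
			ssize_t put = pwrite(dst, buf + done, got - done,
					     start + done);
			if (put <= 0)
				return -1;
			done += put;
		}
		start += got;
	}
	return 0;
}

/* Copy only the data extents of src into dst; returns 0 or -1. */
int sparse_copy(int src, int dst)
{
	off_t end = lseek(src, 0, SEEK_END);
	off_t pos = 0;

	if (end < 0)
		return -1;
	while (pos < end) {
		off_t data = lseek(src, pos, SEEK_DATA);
		off_t hole;

		if (data < 0) {
			if (errno == ENXIO)	/* only a trailing hole is left */
				break;
			return -1;
		}
		hole = lseek(src, data, SEEK_HOLE);
		if (hole < 0)
			return -1;
		if (copy_range(src, dst, data, hole) < 0)
			return -1;
		pos = hole;
	}
	/* Reproduce a trailing hole by setting the length, as truncate() would. */
	return ftruncate(dst, end);
}
```

On filesystems that do not track holes, the kernel reports the whole file as one data extent, so the loop degrades to a plain copy.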

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id249739c5cd2d1c8a998da3341d326de1a8b8d32
Reviewed-on: https://review.whamcloud.com/40773
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-6142 lustre: convert IFTODT to S_DT 41/40641/3
Mr NeilBrown [Thu, 12 Nov 2020 23:01:06 +0000 (10:01 +1100)]
LU-6142 lustre: convert IFTODT to S_DT

Linux v5.1-rc1~141^2~1 introduced include/linux/fs_types.h, which
adds macros for manipulating file types, including S_DT(), which
does what the userspace IFTODT() macro does.

So change kernel code to use S_DT() instead of IFTODT(), and provide
definitions for kernels which don't yet have this file.

fs_types.h is included by fs.h, so we don't need to explicitly include
it anywhere.
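
A fallback definition of this kind might look like the userspace sketch below. The macro body follows the shift-by-12 convention that IFTODT() uses and that, to my understanding, fs_types.h uses as well; mode_to_dtype() is a hypothetical wrapper added only so the result can be checked against the DT_* constants.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for DT_* and S_IFMT on glibc */
#endif
#include <dirent.h>		/* DT_DIR, DT_REG, DT_LNK */
#include <sys/stat.h>		/* S_IFMT, S_IFDIR, ... */
#include <sys/types.h>

/* Provide S_DT() only where the headers do not already define it. */
#ifndef S_DT
#define S_DT_SHIFT	12
#define S_DT(mode)	(((mode) & S_IFMT) >> S_DT_SHIFT)
#endif

/* Hypothetical helper: dirent type (DT_*) for a stat mode. */
unsigned char mode_to_dtype(mode_t mode)
{
	return (unsigned char)S_DT(mode);
}
```

Because the definition is guarded by #ifndef, it becomes a no-op wherever the real macro is available.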

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If001f7e7a97992af690222b7524770c5e4b7003d
Reviewed-on: https://review.whamcloud.com/40641
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14090 mgs: no local logs flag 48/40448/7
Artem Blagodarenko [Thu, 16 Jul 2020 08:37:51 +0000 (04:37 -0400)]
LU-14090 mgs: no local logs flag

There is a feature that starts a target with a local copy of
config log in order to avoid a delay in communicating with
an MGS and to load mgs log updates later on. However, that
feature is not always useful.

When replace_nids adds records with new NIDs, it does not
append to the remote config logs but overwrites the corresponding
records in place. If a target starts using its local config
log, it gets confused by outdated NIDs.

This patch adds the tunefs.lustre --nolocallogs option, which
sets the nolocallogs flag, telling the target to ignore its local config copy.
The flag is reset once new logs are uploaded from MGS.

tunefs.lustre --nolocallogs is suggested to be executed on
targets together with replace_nids on MGS.

HPE-bug-id: LUS-2510
Change-Id: I949c19ac701d287e1c1199bc12445989476a707b
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157574
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Nikitas Angelinas <nangelinas@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/40448
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12142 clio: fix hang on urgent cached pages 37/40237/12
Wang Shilong [Wed, 14 Oct 2020 02:49:49 +0000 (10:49 +0800)]
LU-12142 clio: fix hang on urgent cached pages

This patch addresses a few problems:

1) We try to reserve cl_pages in batch, but we don't do
that for append IO. There is no reason to skip it.

2) IO might not be page aligned, so calculate the reserved pages
correctly for this case.

3) If we issue one large IO whose block size is larger
than max_cached_mb, the IO will never finish because
we don't have enough cl pages to complete it. Split the IO
in this case.

4) Readahead should fail if we are short of LRU page
slots to avoid deadlock.

After the above adjustments, LRU slots are guaranteed for normal
buffered writes before IO starts; if the block size is too large
for the max LRU slots, the IO will be split.
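
Points 2) and 3) come down to simple page arithmetic, sketched below. This is not the clio code: the function names are hypothetical, a 4 KiB page size is assumed, and max_pages stands in for whatever page budget max_cached_mb translates to.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Number of pages touched by the byte range [pos, pos + len). */
uint64_t pages_spanned(uint64_t pos, uint64_t len)
{
	if (len == 0)
		return 0;
	return ((pos + len - 1) >> SKETCH_PAGE_SHIFT) -
	       (pos >> SKETCH_PAGE_SHIFT) + 1;
}

/*
 * Length of the next piece of an IO starting at pos with len bytes left,
 * chosen so that the piece never needs more than max_pages pages.
 * Returns 0 if no progress is possible (max_pages == 0).
 */
uint64_t next_chunk_len(uint64_t pos, uint64_t len, uint64_t max_pages)
{
	uint64_t limit;

	if (max_pages == 0)
		return 0;
	if (pages_spanned(pos, len) <= max_pages)
		return len;

	/* First byte past the max_pages-th page, counting from pos's page. */
	limit = ((pos >> SKETCH_PAGE_SHIFT) + max_pages) << SKETCH_PAGE_SHIFT;
	return limit - pos;
}
```

An unaligned 4096-byte write starting at offset 1 touches two pages rather than one, which is the case point 2) is about; a caller would loop on next_chunk_len() to split an oversized IO as in point 3).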

For extra readahead, don't try hard and quit if we
are short of LRU pages, since readahead can tolerate
errors and applications won't be aware of them.

Besides the newly added tests, the following command was run with
max_cached_mb set to 64M, and the client no longer hangs:

/usr/lib64/openmpi/bin/mpirun --allow-run-as-root -np 12
-wd /mnt/lustre ior -g -e -w -r -b 1g -T 10 -F -C -t 64m

Todo:
Performance benchmark for readahead

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I5c85454a40daeefb4fb97609d6aa28df2eafb99c
Reviewed-on: https://review.whamcloud.com/40237
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12142 readahead: limit over reservation 60/42060/5
Wang Shilong [Wed, 17 Mar 2021 09:58:00 +0000 (17:58 +0800)]
LU-12142 readahead: limit over reservation

For performance reasons, exceeding @ra_max_pages is allowed in order to
cover the current read window, but this should be limited to the RPC
size in case a large block size read is issued. Trim to the RPC boundary.

Otherwise, too many readahead pages might be issued, leaving the
client short of LRU pages.

Fixes: 777b04a093 ("LU-13386 llite: allow current readahead to exceed reservation")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Icf74b5fbc75cf836fedcad5184fcdf45c7b037b4
Reviewed-on: https://review.whamcloud.com/42060
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-10632 tests: recovery-small test_26 idle_timeout 06/42006/3
Andreas Dilger [Thu, 11 Mar 2021 09:39:57 +0000 (02:39 -0700)]
LU-10632 tests: recovery-small test_26 idle_timeout

In recovery-small test_26() use "lfs df" instead of plain "df"
since statfs may be fetched from the MDS cache and will not
ensure that the client->OST connections are currently active.

Also, check a few entries further back in the OSC state log for an
EVICTED message, in case the client idle disconnects from the server
again while checking all of the imports.

Test-Parameters: trivial testlist=recovery-small env=ONLY=26a,ONLY_REPEAT=100
Fixes: 5a6ceb664f07 ("LU-7236 ptlrpc: idle connections can disconnect")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8c370cb75f4e06258ef3c032630fc20354a15dcc
Reviewed-on: https://review.whamcloud.com/42006
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14534 gss: do not refresh context for LDLM callback 76/42076/2
Sebastien Buisson [Thu, 18 Mar 2021 16:17:31 +0000 (17:17 +0100)]
LU-14534 gss: do not refresh context for LDLM callback

If the request to be sent is an LDLM callback, do not try to
refresh context.
An LDLM callback is sent by a server to a client in order to make
it release a lock, on a communication channel that uses a reverse
context. It cannot be refreshed on its own, as it is the 'reverse'
(server-side) representation of a client context.
We do not care if the reverse context is expired, and want to send
the LDLM callback anyway. Once the client receives the AST, it is
its job to refresh its own context if it has expired, hence
refreshing the associated reverse context on server side, before
being able to send the LDLM_CANCEL requested by the server.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic8f4fe203f16ed5cfafd3da355c78cf58d96c3eb
Reviewed-on: https://review.whamcloud.com/42076
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14527 kernel: kernel update RHEL7.9 [3.10.0-1160.21.1.el7] 50/42050/4
Jian Yu [Tue, 16 Mar 2021 18:42:56 +0000 (11:42 -0700)]
LU-14527 kernel: kernel update RHEL7.9 [3.10.0-1160.21.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.21.1.el7.

Test-Parameters: clientdistro=el7.9 serverdistro=el7.9

Change-Id: I1a46fe492d280b19c0f93458aaac975a4c873caf
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/42050
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14204 tests: use first available import 19/42019/3
Sebastien Buisson [Fri, 12 Mar 2021 08:48:09 +0000 (09:48 +0100)]
LU-14204 tests: use first available import

In test suite, be careful to use first available import in case there
are multiple mount points.

Test-Parameters: trivial
Test-Parameters: env=SHARED_KEY=true testlist=sanity
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ib099cd5c9666e9d4faf9445846c91a225f4a8f57
Reviewed-on: https://review.whamcloud.com/42019
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-8837 lod: move lod-specific pool config code into lod_dev 93/41993/4
Mr NeilBrown [Fri, 8 Jan 2021 03:23:01 +0000 (14:23 +1100)]
LU-8837 lod: move lod-specific pool config code into lod_dev

obd_config.c contains code that only applies to lod devices, for
managing a QMT pool along-side each normal pool.

As this code is specific to lod, it is best to move it into the lod
module.  This is particularly helpful as it removes it from
client-only builds.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9e0d014a299c28b73e48ce2e06581cb011acce47
Reviewed-on: https://review.whamcloud.com/41993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years ago LU-14289 libcfs: discard cfs_array_alloc() 92/41992/2
Mr NeilBrown [Wed, 24 Feb 2021 23:57:19 +0000 (10:57 +1100)]
LU-14289 libcfs: discard cfs_array_alloc()

cfs_array_alloc() and _free() are used for precisely one array, and
provide little value beyond open-coding the alloc and free.

So discard these functions and alloc/free in the loops that already
exist for setup and cleanup.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2a66be311dbba269b0b43c3a75f17ccc8e946538
Reviewed-on: https://review.whamcloud.com/41992
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14507 mdt: handle default stripe_count=-1 properly 83/41983/6
Andreas Dilger [Wed, 10 Mar 2021 16:57:44 +0000 (09:57 -0700)]
LU-14507 mdt: handle default stripe_count=-1 properly

If the default LMV stripe_count=-1 print it as a signed value
instead of unsigned, to better match how it is set with "-c -1".
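
The difference is easy to see in isolation. The sketch below assumes the count is stored as an unsigned 32-bit field, where -1 appears as all ones; format_stripe_count() is a hypothetical helper, not the function changed by this patch. Printing the all-ones value with "%u" yields 4294967295, while converting it to a signed type first yields -1.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 32-bit stripe count as a signed value, so that the all-ones
 * "use all targets" value reads as -1. The unsigned-to-signed conversion
 * is implementation-defined before C23 but gives -1 on two's-complement
 * platforms.
 */
int format_stripe_count(char *buf, size_t len, uint32_t count)
{
	return snprintf(buf, len, "%" PRId32, (int32_t)count);
}
```

Ordinary counts are unaffected, since small values print identically as signed or unsigned.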

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I106f266c33e2c2cf0f5bcc1491e4bc5ac93ebbe5
Reviewed-on: https://review.whamcloud.com/41983
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14506 hsm: correct default stripe offset in import 78/41978/2
John L. Hammond [Wed, 10 Mar 2021 15:20:29 +0000 (09:20 -0600)]
LU-14506 hsm: correct default stripe offset in import

In lhsmtool_posix, when calling llapi_hsm_import(), pass a stripe
offset of -1 rather than 0 to select the default. Add sanity-hsm
test_11c() to check that a file may be imported to a directory with a
default striping specifying a pool that does not include OST0000.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I40636c0620b2f9314eb13bf23a8cf6d02990f851
Reviewed-on: https://review.whamcloud.com/41978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
3 years ago LU-14462 gss: remove HAVE_SETNS from lgss_keyring 67/41967/3
Sebastien Buisson [Tue, 9 Mar 2021 16:11:44 +0000 (17:11 +0100)]
LU-14462 gss: remove HAVE_SETNS from lgss_keyring

For the sake of simplification, a previous patch removed the config
check that sets HAVE_SETNS, because setns() always exists in
kernels 3.10+.
As a result, every #ifdef on HAVE_SETNS is now wrong: the macro is
never set even though the function is available.
So remove all references to HAVE_SETNS in the code.

Fixes: 8e88bbfef5 ("LU-12477 lustre: remove obsolete config checks")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iab0726c3e847a210185cc8c9353a79976acb1381
Reviewed-on: https://review.whamcloud.com/41967
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-12678 lnet: convert lpni_refcount to a kref 41/41941/5
Mr. NeilBrown [Thu, 11 Mar 2021 22:43:41 +0000 (17:43 -0500)]
LU-12678 lnet: convert lpni_refcount to a kref

This refcount is used exactly like a kref, so change it to one.
kref uses refcount_t, which warns on increment-from-zero and
similar problems (when enabled with a CONFIG option), so we don't
need the LASSERT calls.
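
For readers unfamiliar with the pattern, here is a minimal userspace analogue of a kref-style counter. It is not the kernel API: every name is hypothetical, C11 atomics stand in for refcount_t, and the sketch has none of the saturation or warning checks mentioned above. It only shows the shape of the idiom, where the object embeds the counter and the final put invokes a release callback that recovers the containing object.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical embedded reference counter. */
struct ref_sketch {
	atomic_int count;
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

void ref_init(struct ref_sketch *r)
{
	atomic_init(&r->count, 1);	/* the creator holds the first reference */
}

void ref_get(struct ref_sketch *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Drop one reference; run release() and return 1 if it was the last. */
int ref_put(struct ref_sketch *r, void (*release)(struct ref_sketch *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Hypothetical object that embeds the counter. */
struct peer_ni_sketch {
	int released;
	struct ref_sketch refcount;
};

void peer_ni_sketch_release(struct ref_sketch *r)
{
	struct peer_ni_sketch *p =
		sketch_container_of(r, struct peer_ni_sketch, refcount);

	p->released = 1;	/* a real release function would free the object */
}
```

With the release logic tied to the last put, callers no longer need to assert on the counter value themselves.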

Change-Id: I857dff2c9838cb7d8f4b5f023f75f2d66119344f
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41941
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-9859 libcfs: remove linux-curproc.c 38/41938/10
Mr. NeilBrown [Thu, 18 Mar 2021 21:57:03 +0000 (17:57 -0400)]
LU-9859 libcfs: remove linux-curproc.c

The only real functionality remaining here is cfs_curproc_cap_pack(),
and it can be trivially implemented as an inline in curproc.h.
So do that and remove the file.

The rest can be moved to jobid.c.

Linux-commit: 37d3b407dc14a13ec8bba3a4d7737c92f996e9c0

Change-Id: I3546841fa44accb19d0867099c17b16ede48228e
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/41938
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14479 ssk: explicitly set perm on key 29/41929/3
Sebastien Buisson [Mon, 8 Mar 2021 14:20:00 +0000 (15:20 +0100)]
LU-14479 ssk: explicitly set perm on key

When an SSK key is loaded, either via lgss_sk command or thanks to
skpath mount option, try to set permissions on the key.
This is to avoid a 'Permission denied' error when a Lustre client or
server wants to make use of the key later on.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1ed712ae4d07be306cc76b4e59fab303437558bb
Reviewed-on: https://review.whamcloud.com/41929
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-14494 mdt: check object exists in mdt_close_handle_layouts() 05/41905/7
John L. Hammond [Fri, 5 Mar 2021 18:47:43 +0000 (12:47 -0600)]
LU-14494 mdt: check object exists in mdt_close_handle_layouts()

In mdt_close_handle_layouts() the client supplied FID may not identify
an existing object. So check for this before calling lu_object_attr().

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ib1710ca4bf7587e0496b3a37a2afb65f81250455
Reviewed-on: https://review.whamcloud.com/41905
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
3 years ago LU-14337 lov: return valid stripe_count/size for PFL files 03/41803/6
Emoly Liu [Tue, 16 Mar 2021 03:04:46 +0000 (11:04 +0800)]
LU-14337 lov: return valid stripe_count/size for PFL files

Dump struct lov_comp_md_v1 in function ll_lov_getstripe_ea_info()
correctly to avoid stripe_count=0 or stripe_size=0 returned by
old interface llapi_file_get_stripe(), which will cause
divide-by-zero for older userspace that calls this ioctl,
e.g. lustre ADIO driver.
The rule is:
- if stripe_count=0, return stripe_count=1;
- if stripe_size=0,
  -- for DoM files, return the stripe size of the second component,
     since the first component of DoM file data is placed on the
     MDT for faster access;
  -- else, return the stripe size of the last component.
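
The rule above can be sketched as a small fix-up function. This is not ll_lov_getstripe_ea_info(): the name and signature are hypothetical, the per-component stripe sizes are passed as a plain array, and the commit message does not say which component supplies the initial count and size, so the sketch takes them as inputs. Falling back to the last component for a DoM file with only one component is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Replace zero values that would make older callers divide by zero.
 * comp_stripe_size[] holds the stripe size of each layout component,
 * in order; is_dom is non-zero for a Data-on-MDT file.
 */
void fixup_stripe_info(uint32_t *count, uint32_t *size,
		       const uint32_t *comp_stripe_size, size_t ncomp,
		       int is_dom)
{
	if (*count == 0)
		*count = 1;

	if (*size == 0 && ncomp > 0) {
		if (is_dom && ncomp > 1)
			/* first DoM component lives on the MDT, use the second */
			*size = comp_stripe_size[1];
		else
			*size = comp_stripe_size[ncomp - 1];
	}
}
```

Values that are already non-zero pass through unchanged.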

Also, lov_getstripe_old.c and santy-pfl.sh test_25 is added to
verify this patch.

Test-parameters: testlist=sanity-pfl env=ONLY=25

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I4023ca4baff1b1ad2a439aa497baaabc56e891d2
Reviewed-on: https://review.whamcloud.com/41803
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13641 socklnd: remove tcp bonding 00/40000/12
Serguei Smirnov [Tue, 16 Mar 2021 21:34:26 +0000 (17:34 -0400)]
LU-13641 socklnd: remove tcp bonding

TCP bonding in the socklnd has become obsolete with LNet
Multi-Rail and there's no evidence it's being used anywhere.
Remove it to keep the code simple.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ib456f951b8ccd59112c460085632a2cb3c982004
Reviewed-on: https://review.whamcloud.com/40000
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13569 lnet: Recover peer NI w/exponential backoff interval 20/39720/15
Chris Horn [Sun, 23 Aug 2020 15:16:18 +0000 (10:16 -0500)]
LU-13569 lnet: Recover peer NI w/exponential backoff interval

Perform LNet recovery pings of peer NIs with an exponential backoff
interval.
 - The interval is equal to 2^(number failed pings) up to a maximum
   of 900 seconds (15 minutes).
 - When a message is received the count of failed pings for the
   associated peer NI is reset to 0 so that recovery can happen more
   quickly.
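
The interval rule can be written down directly. The sketch below is not the LNet implementation; the function and macro names are hypothetical, and only the formula (2 to the power of the failed-ping count, capped at 900 seconds) is taken from the description above.

```c
#include <stdint.h>

#define SKETCH_RECOVERY_INTERVAL_MAX 900	/* seconds (15 minutes) */

/* Seconds to wait before the next recovery ping. */
uint32_t recovery_ping_interval(uint32_t failed_pings)
{
	/* 2^10 = 1024 already exceeds the cap, so never shift further. */
	if (failed_pings >= 10)
		return SKETCH_RECOVERY_INTERVAL_MAX;
	return 1u << failed_pings;
}
```

Resetting the failed-ping count to 0 on message receipt brings the interval back to one second, which is what lets recovery proceed quickly again.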

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic7e60455015a0236a96010c07fc0ddd02078cf92
Reviewed-on: https://review.whamcloud.com/39720
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years ago LU-13569 lnet: Only recover known good peer NIs 19/39719/15
Chris Horn [Thu, 16 Jul 2020 03:38:52 +0000 (22:38 -0500)]
LU-13569 lnet: Only recover known good peer NIs

A peer NI should not be eligible for recovery if we've never
received a message from it.

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iec2fd015f6410ab91c6ef7c222cbed0204243106
Reviewed-on: https://review.whamcloud.com/39719
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13569 lnet: Age peer NI out of recovery 18/39718/15
Chris Horn [Sun, 23 Aug 2020 15:14:22 +0000 (10:14 -0500)]
LU-13569 lnet: Age peer NI out of recovery

No longer send recovery pings to a peer NI that has been in recovery
for the recovery time limit. A peer NI will become eligible for
recovery again once we receive a message from it.

The existing lpni_last_alive field is utilized for this new purpose.

A check for NULL lpni is removed from
lnet_handle_remote_failure_locked() because all callers of that
function already ensure the lpni is non-NULL.

lnet_peer_ni_add_to_recoveryq_locked() now takes the recovery queue
as an argument rather than using the_lnet.ln_mt_peerNIRecovq. This
allows the function to be used by lnet_recover_peer_nis().
lnet_peer_ni_add_to_recoveryq_locked() is also modified to take a ref
on the peer NI if it is added to the recovery queue. Previously, it
was the responsibility of callers to take this ref.

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib4676540ac4bb040690a4fb047236c54eea0e752
Reviewed-on: https://review.whamcloud.com/39718
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-6142 checkpatch: treat CNETERR and CEMERG as log function 96/41996/2
Mr NeilBrown [Wed, 10 Mar 2021 23:22:54 +0000 (10:22 +1100)]
LU-6142 checkpatch: treat CNETERR and CEMERG as log function

CNETERR and CEMERG are log functions and should be treated as such by
checkpatch.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I295f0de9244578ebdc925e0e0783d3b436fc6fb0
Reviewed-on: https://review.whamcloud.com/41996
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 years agoLU-6142 lustre: ptlrpc: don't use list_for_each_entry_safe unnecessarily. 39/41939/3
Neil Brown [Mon, 1 Mar 2021 15:13:56 +0000 (10:13 -0500)]
LU-6142 lustre: ptlrpc: don't use list_for_each_entry_safe unnecessarily.

list_for_each_entry_safe() is only needed if the body of the
loop might change the list, or if it might drop a lock that would
otherwise prevent the list from being changed.

When the body does neither of these, list_for_each_entry() should be
preferred as it makes the behaviour of the loop more clear to readers.

In each of the cases changed here, the list cannot change while the
loop proceeds.
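
The distinction can be sketched in user-space C with a singly linked
list. The macros for_each_node() and for_each_node_safe() below are
hypothetical stand-ins written for this illustration; they are not the
kernel's list.h macros, but they show the same idea: the "safe" form
caches the successor before the body runs.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
	int value;
	struct node *next;
};

/* Plain traversal: reads pos->next only after the body has run. */
#define for_each_node(pos, head) \
	for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

/* "Safe" traversal: caches the successor in tmp before the body runs. */
#define for_each_node_safe(pos, tmp, head) \
	for ((pos) = (head); \
	     (pos) != NULL && ((tmp) = (pos)->next, 1); \
	     (pos) = (tmp))

/* Read-only body: the list cannot change, so the plain form suffices. */
int sum_nodes(struct node *head)
{
	struct node *pos;
	int sum = 0;

	for_each_node(pos, head)
		sum += pos->value;
	return sum;
}

/* The body frees pos, so its successor must be saved first. */
int free_nodes(struct node *head)
{
	struct node *pos, *tmp;
	int freed = 0;

	for_each_node_safe(pos, tmp, head) {
		free(pos);
		freed++;
	}
	return freed;
}

/* Helper for building a list: push a new node onto the front. */
struct node *push_node(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return head;
	n->value = value;
	n->next = head;
	return n;
}
```

Using the plain form in sum_nodes() tells the reader at a glance that
the body does not modify the list.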

Change-Id: Ib0f08c5d4d7959b80a7a1490fb606e40e1cf5f85
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13397 lfs: mirror extend/copy keeps sparseness 72/40772/10
Mikhail Pershin [Mon, 23 Nov 2020 11:06:12 +0000 (14:06 +0300)]
LU-13397 lfs: mirror extend/copy keeps sparseness

- make ll_lseek() work under group lock and on a designated
  mirror
- enhance lfs mirror copy functions migrate_copy_data() and
  llapi_mirror_copy_many() with lseek() to find holes and copy
  only data chunks.

Both the 'migrate' and 'copy' lfs functions rewrite the designated
mirror fully, so holes are not punched in the destination file, but
truncate is called first to make sure old data is erased.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic4a8768b816c921acd7f0adb3311138caac05a7c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/40772
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoNew tag 2.14.51 2.14.51 v2_14_51
Oleg Drokin [Tue, 23 Mar 2021 05:41:16 +0000 (01:41 -0400)]
New tag 2.14.51

Change-Id: Iaab01cccdeb761183879a9baf42d5106e0e880ce

3 years agoLU-14502 lov: fault page update cp_lov_index 54/41954/4
Bobi Jam [Tue, 9 Mar 2021 09:15:20 +0000 (17:15 +0800)]
LU-14502 lov: fault page update cp_lov_index

In fault IO, vvp_io_fault_start() could find an existing cl_page
associated with the vmpage covering the fault index, and the page
may still refer to another mirror of an old IO.

This patch updates the fault page's cp_lov_index in lov_io_fault_start().

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I50639700159a76061437fd2f1a09dadf25cfd33f
Reviewed-on: https://review.whamcloud.com/41954
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14473 test: check RUNAS and RUNAS_ID 85/41785/5
Olaf Faaland [Sat, 27 Feb 2021 00:53:38 +0000 (16:53 -0800)]
LU-14473 test: check RUNAS and RUNAS_ID

Validate RUNAS and RUNAS_ID before testing a file create, so
that the error messages can be more specific.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: I87b2c279f981b34ab979cca42a8ae06128a294cc
Reviewed-on: https://review.whamcloud.com/41785
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14291 lustre: further cleanup of acl code. 32/42032/3
Mr NeilBrown [Sun, 14 Mar 2021 22:34:55 +0000 (09:34 +1100)]
LU-14291 lustre: further cleanup of acl code.

Code in lustre/obdclass/acl.c is only used in lustre/mdd/, so move the
file there, renaming to mdd_acl.c and removing EXPORT_SYMBOL()
declarations.

The function prototypes in lustre_eacl.h are moved to mdd_internal.h,
and the remainder of that file is discarded.  The
HAVE_STRUCT_ACL_XATTR stanza, in particular, is unnecessary as it
exists in lustre_compat.h.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Idb0978758640c5ad527d2c68c4fdf6dee32a731c
Reviewed-on: https://review.whamcloud.com/42032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 years agoLU-8837 lmv: don't use lqr_alloc spinlock in lmv 49/41949/6
Mr NeilBrown [Mon, 8 Mar 2021 22:28:48 +0000 (09:28 +1100)]
LU-8837 lmv: don't use lqr_alloc spinlock in lmv

The only place the lqr_alloc spinlock is used in lmv is in
lmv_locate_tgt_rr().  The purpose here is presumably to protect
lmv_qos_rr_index from concurrent updates.  This is a field that is
only tangentially related to the structure that holds the spinlock.

lmv_qos_rr_index is directly in 'struct lmv_obd' while lqr_alloc
is in struct lu_qos_rr which is in struct lu_qos, which is in lmv_obd.

As there is a spinlock in 'struct lmv_obd' (lmv_lock) it makes more
sense to use that to protect lmv_qos_rr_index.  Then the entire
lu_qos_rr structure will be unused on the client and can be made
server-only.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I926e6d31ca0ee1cbfff9905192428e28485ed448
Reviewed-on: https://review.whamcloud.com/41949
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14385 utils: add range check to strtol() in lfs.c 56/41756/4
Jian Yu [Thu, 11 Mar 2021 23:10:29 +0000 (15:10 -0800)]
LU-14385 utils: add range check to strtol() in lfs.c

Most of the strtol() and strtoll() functions called
in lfs.c did not check the range of the return value.
This patch fixes those issues.
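
A minimal sketch of such a range check, assuming a helper named
parse_long_range() (a hypothetical name, not the one used in lfs.c).
It relies only on standard strtol() behaviour: errno is set to ERANGE
on overflow, and the end pointer shows where parsing stopped.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a base-10 integer from str into *out, accepting only values
 * in [min, max]. Returns 0 on success, -EINVAL for malformed input
 * and -ERANGE for out-of-range values. *out is untouched on failure.
 */
int parse_long_range(const char *str, long min, long max, long *out)
{
	char *end;
	long val;

	if (str == NULL || *str == '\0')
		return -EINVAL;

	errno = 0;
	val = strtol(str, &end, 10);
	if (end == str || *end != '\0')
		return -EINVAL;		/* no digits, or trailing junk */
	if (errno == ERANGE || val < min || val > max)
		return -ERANGE;		/* overflowed long, or outside [min, max] */

	*out = val;
	return 0;
}
```

A caller would pass the limits of the destination type, for example 0
and INT_MAX when the result is later stored in an int.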

Change-Id: I9ff51662bf0d2320961a7838da08f09552e9ef1e
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41756
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14428 libcfs: discard cfs_trace_copyin_string() 90/41490/4
Mr NeilBrown [Tue, 9 Feb 2021 00:49:30 +0000 (11:49 +1100)]
LU-14428 libcfs: discard cfs_trace_copyin_string()

Instead of cfs_trace_copyin_string(), use memdup_user_nul().
This combines the allocation with the copyin, and nul-terminates.

The resulting code is a lot simpler.
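
For readers unfamiliar with the pattern, here is a user-space analogue
of "allocate, copy, and nul-terminate in one call". dup_nul() is a
hypothetical name chosen for this sketch; unlike memdup_user_nul() it
copies from ordinary memory rather than from user space, and it
returns NULL on failure rather than an error pointer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate len bytes from src and append a NUL byte.
 * Returns NULL on allocation failure; the caller frees the result.
 */
char *dup_nul(const void *src, size_t len)
{
	char *buf;

	if (len == SIZE_MAX)	/* len + 1 would overflow */
		return NULL;

	buf = malloc(len + 1);
	if (buf == NULL)
		return NULL;

	memcpy(buf, src, len);
	buf[len] = '\0';
	return buf;
}
```

Because the buffer is sized from the input length, the caller no
longer needs a separate fixed-size buffer and a length check.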

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I089c5da96b59ec62d177aea2f3d170bf751c6fec
Reviewed-on: https://review.whamcloud.com/41490
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14428 libcfs: discard cfs_trace_console_buffers[] 89/41489/4
Mr NeilBrown [Tue, 9 Feb 2021 00:28:45 +0000 (11:28 +1100)]
LU-14428 libcfs: discard cfs_trace_console_buffers[]

cfs_trace_console_buffers[] is a collection of buffers into which
various messages are formatted - with vscnprintf or similar - and
which are then passed to cfs_print_to_console which adds more
formatted information.

The two levels of formatting can instead be achieved using the "%pV"
format which takes a format-and-args.  If we do this, we don't need
cfs_trace_console_buffers[] any more.

One minor drawback is that cfs_tty_write_message() requires a final
string to print, not a format plus arguments.  This is only minor
because there is precisely one message that is ever sent to
cfs_tty_write_message(), and it contains no formatting.  So we now
generate a warning if the string passed with D_TTY ever contains
formatting, and just print that string ignoring any formatting.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic78ac3703e5b6321dade8c367753c0aec1cae60b
Reviewed-on: https://review.whamcloud.com/41489
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14398 hsm: use llapi_fid2path_at() in the copytool 08/41408/2
John L. Hammond [Wed, 3 Feb 2021 20:19:05 +0000 (14:19 -0600)]
LU-14398 hsm: use llapi_fid2path_at() in the copytool

In lhsmtool_posix.c and liblustreapi_hsm.c, convert several uses of
llapi_fid2path() to llapi_fid2path_at().

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ice64d02010b4260287be4d4e26c6b75b178bc81b
Reviewed-on: https://review.whamcloud.com/41408
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14179 lfs: avoid lfs find error with long paths 37/41337/8
Stephane Thiell [Fri, 26 Feb 2021 20:33:04 +0000 (12:33 -0800)]
LU-14179 lfs: avoid lfs find error with long paths

Test that files created in a directory having an absolute path length
of up to PATH_MAX-1 are properly found with lfs find. This change
might not cover other very deep directory trees (above PATH_MAX).

Signed-off-by: Stephane Thiell <sthiell@stanford.edu>
Change-Id: I44726efd5053c593094587e5c8a4652a3a876641
Reviewed-on: https://review.whamcloud.com/41337
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14119 lfsck: replace dt_lookup() with dt_lookup_dir() 18/41218/3
Lai Siyao [Wed, 13 Jan 2021 09:16:55 +0000 (17:16 +0800)]
LU-14119 lfsck: replace dt_lookup() with dt_lookup_dir()

Lfsck code calls dt_lookup() in many places to look up a file under a
directory, but this function requires the directory to be initialized
with dt_try_as_dir() first, and that call is missing in several
places. Since the overhead is trivial, call dt_lookup_dir() instead.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I40bd8d51edece50353af1729cf867572a0abea78
Reviewed-on: https://review.whamcloud.com/41218
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14110 obdclass: Protect cl_env_percpu[] 65/40565/11
Etienne AUJAMES [Tue, 3 Nov 2020 14:35:17 +0000 (15:35 +0100)]
LU-14110 obdclass: Protect cl_env_percpu[]

cl_env_percpu is not protected against multi client mounts on the
same node: "keys_fill" could be called with the same cl_env_percpu
context by several mount processes (race on lu_context.lc_value).

This patch adds a mutex for cl_env_percpu to protect context
"refill".

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Icfd6f3715899fa4ac5279e932f462e7cf29d98bd
Reviewed-on: https://review.whamcloud.com/40565
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-6142 llite: use d_is_symlink to test if dentry is a symlink 70/41770/5
Mr NeilBrown [Fri, 16 Oct 2020 00:07:21 +0000 (11:07 +1100)]
LU-6142 llite: use d_is_symlink to test if dentry is a symlink

Using d_is_symlink() is preferred to testing ->get_link or
->follow_link.

A recent patch made this work for foreign files/dirs by making sure
the entry type in d_flags is correct, so we can simplify the code in
ll_revalidate_dentry().

Fixes: 15d44e787e17 ("LU-12682 llite: fake symlink type of foreign file/dir")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie4c33ae1fb9a660ccbd50e2c70b6cde65cc9b990
Reviewed-on: https://review.whamcloud.com/41770
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14480 pool: wrong usage with ost list 15/41815/3
Vitaly Fertman [Wed, 16 Dec 2020 22:02:32 +0000 (01:02 +0300)]
LU-14480 pool: wrong usage with ost list

When the OST list is given on setstripe, it should have priority
over the pool. Also, at creation time we check only whether the 1st
OST is in the pool, which worked well in the past with -c and
works even with -C, but not with the OST list when some of the OSTs
are outside the pool.

Make the --pool and --ost options mutually exclusive.
Drop the pool inheritance if the OST list is given.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I94a7fe97391f1185392f986f78ab1a372238972a
Reviewed-on: https://es-gerrit.dev.cray.com/158198
HPE-bug-id: LUS-9579
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/41815
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14182 lov: cancel layout lock on replay deadlock 67/40867/2
Vitaly Fertman [Fri, 4 Dec 2020 16:35:19 +0000 (19:35 +0300)]
LU-14182 lov: cancel layout lock on replay deadlock

Layout locks are not replayed and are instead cancelled as unused,
which requires taking lov_conf_lock. The semaphore may already be held
by cl_lock_flush(), which prepares a new IO that cannot be sent to the
MDS while it is in recovery.

HPE-bug-id: LUS-9232
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I1a1a91a81c19ad4deca9ff581107512642f0b666
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/40867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14291 build: use tgt_pool for lov layer 83/39683/8
James Simmons [Fri, 26 Feb 2021 21:41:09 +0000 (16:41 -0500)]
LU-14291 build: use tgt_pool for lov layer

New general code was created for target pool handling. We can
use this new code with the lov layer. Place this tgt_pool.c in
the obdclass instead of having a special target directory just to
build this code for the client.

Change-Id: I05542c1d654d79647f5e0853bb1d587ff265fdf9
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/39683
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-6142 lustre: remove ll_file_*_flag wrappers. 92/40292/6
Mr NeilBrown [Thu, 15 Oct 2020 23:34:04 +0000 (10:34 +1100)]
LU-6142 lustre: remove ll_file_*_flag wrappers.

ll_file_{test,set,clear,test_and_set}_flag are simple wrappers around
the various *_bit() functions.  They don't aid readability and the
convention in the kernel is to use the *_bit() functions directly.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0d50f8936ad9f97882f4771dd3210cc05fe43989
Reviewed-on: https://review.whamcloud.com/40292
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-8837 ptlrpc: mark some functions as static 47/41947/5
Mr NeilBrown [Mon, 1 Mar 2021 04:38:42 +0000 (15:38 +1100)]
LU-8837 ptlrpc: mark some functions as static

The functions
 ptlrpc_start_threads,
 ptlrpc_start_thread,
 ptlrpc_stop_all_threads
 ptlrpc_nrs_policy_register
and
 ptlrpc_nrs_policy_register

are only used in the same file that defines them, so mark them as
'static' and remove the declarations from include files.

 ptlrpc_nrs_policy_unregister

is never used at all, so remove it completely.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id7862b9da3c58ab980c0fcd4d07c1f119fbf7581
Reviewed-on: https://review.whamcloud.com/41947
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 years agoLU-14289 ptlrpc: rename cfs_binheap to simply binheap 75/41375/3
Mr NeilBrown [Mon, 1 Feb 2021 02:16:12 +0000 (13:16 +1100)]
LU-14289 ptlrpc: rename cfs_binheap to simply binheap

As the binheap code is no longer part of libcfs, the cfs_ prefix is
misleading.  As this code is local to one module and doesn't conflict
with anything global, there is no need for a prefix at all.  So change
cfs_binheap to binheap.

This patch was prepared using 'sed', then fixing a few text-alignment
issues caused by the loss of those 4 characters.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I168bec50898ec7b9ab72dc91b080af4852ddb3a4
Reviewed-on: https://review.whamcloud.com/41375
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14291 lustre: clean up lustre_eacl.h and make server-only 26/41126/4
Mr NeilBrown [Thu, 29 Oct 2020 05:13:36 +0000 (16:13 +1100)]
LU-14291 lustre: clean up lustre_eacl.h and make server-only

lustre_eacl.h contains a number of declarations that are never used:
remove them.

The declarations which are used are only needed on server-side files,
so remove the #include from elsewhere.

As obdclass/acl.c is only built server-side, remove the
 #ifdef HAVE_SERVER_SUPPORT
in the file.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If1a3d908bf8357041c38ab9d335efa1e051cef16
Reviewed-on: https://review.whamcloud.com/41126
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-13783 libcfs: don't depend on sysctl support for debugfs 32/40832/4
Mr NeilBrown [Thu, 12 Nov 2020 00:16:28 +0000 (11:16 +1100)]
LU-13783 libcfs: don't depend on sysctl support for debugfs

Since Linux v5.8-rc1~55^2~6 sysctl support routines like
proc_dointvec() expect a pointer to kernel-space, not userspace.

So stop using these functions for debugfs files, and instead
provide bespoke functions.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I340a748bbfbd066054a73299ce32698aa39a0e2d
Reviewed-on: https://review.whamcloud.com/40832
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13783 libcfs: support __vmalloc with only 2 args. 28/40328/7
Mr NeilBrown [Wed, 21 Oct 2020 04:26:35 +0000 (15:26 +1100)]
LU-13783 libcfs: support __vmalloc with only 2 args.

Since v5.8-rc1~201^2~19 Commit 88dca4ca5a93 ("mm: remove the pgprot
argument to __vmalloc") __vmalloc only takes 2 arguments.

So introduce __ll_vmalloc which takes 2 args, and calls
__vmalloc with correct number of args.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2c89512a12e28b27544a891620e448a9b752b089
Reviewed-on: https://review.whamcloud.com/40328
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13903 utils: move userland only nidstr.h handling 15/39115/5
James Simmons [Mon, 8 Mar 2021 14:09:40 +0000 (09:09 -0500)]
LU-13903 utils: move userland only nidstr.h handling

The function cfs_expand_nidlist() no longer exists for kernel
internals. We can move the function prototype from the UAPI
header to string.h which is a libcfs user land header.
The structure netstrfns that is defined in a UAPI header
has been gaining user land only handling. Additionally, it
uses struct list_head, which will confuse reviewers since
kernel developers see this as a kernel only thing.

Test-Parameters: trivial

Change-Id: Ifc3c87f6d3237a94d282d009455ff389278e73ea
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39115
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-10391 socklnd: convert ksocknal_add_peer to take sockaddr 08/38408/7
Mr NeilBrown [Tue, 28 Jan 2020 01:15:13 +0000 (12:15 +1100)]
LU-10391 socklnd: convert ksocknal_add_peer to take sockaddr

ksocknal_add_peer() now takes a 'struct sockaddr' which is currently
always an IPv4 address.  ksocknal_launch_packet() is the main place
where the nid is converted to an IP address.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I194248662798542096e5cc9af985e6c0063a038a
Reviewed-on: https://review.whamcloud.com/38408
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12752 mdt: commitrw_write() - check dying object under lock 97/41797/5
Vladimir Saveliev [Mon, 1 Mar 2021 08:52:51 +0000 (11:52 +0300)]
LU-12752 mdt: commitrw_write() - check dying object under lock

If a process writes to an unlinked file, the following race between
mdt_commitrw_write() and mdd_close() may occur because
mdt_commitrw_write() checks whether an object is dying without a lock:

mdt_commitrw_write() checks lu_object_is_dying(&mo->mot_header) and it
is not yet dying

mdd_close() interposes and destroys the object via
  mdo_destroy()
    lod_destroy()
      lod_sub_destroy()
        osd_destroy()
          obj->oo_destroyed = 1;

mdt_commitrw_write() continues, locks the object and returns ENOENT
from

  dt_attr_get()
    osd_attr_get()
      if (unlikely(obj->oo_destroyed))
        return -ENOENT;

If the file is built of DoM and raid components, ll_delete_inode() calls
cl_sync_file_range(), which is meant to iterate over both mdt and raid
components via mdc_io_fsync_start() and osc_io_fsync_start().  As
mdc_io_fsync_start() fails with -ENOENT due to failed write rpc,
osc_io_fsync_start() does not get called. Then
truncate_inode_pages_final() finds not-discarded pages and fails with:

  (osc_page.c:183:osc_page_delete()) Trying to teardown failed: -16
  (osc_page.c:184:osc_page_delete()) ASSERTION( 0 ) failed:
  (osc_page.c:184:osc_page_delete()) LBUG

A test to illustrate the issue is added.

The fix is to call lu_object_is_dying() under object lock.
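
The general shape of "check under the lock" can be sketched in
user-space C with a pthread mutex. Everything here (struct object,
object_destroy(), object_commit_write(), and the choice to skip the
write and report 0 bytes for a dying object) is a hypothetical
illustration, not the mdt code or its actual return values.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object that a concurrent close may destroy. */
struct object {
	pthread_mutex_t lock;
	bool dying;
	long size;
};

/* Mark the object as dying; the flag only changes with the lock held. */
void object_destroy(struct object *obj)
{
	pthread_mutex_lock(&obj->lock);
	obj->dying = true;
	pthread_mutex_unlock(&obj->lock);
}

/*
 * Commit a write of len bytes and return the number of bytes committed.
 * The dying check happens after the lock is taken, so a destroy cannot
 * slip in between the check and the update. Skipping the write for a
 * dying object is an assumption made for this sketch.
 */
long object_commit_write(struct object *obj, long len)
{
	long committed = 0;

	pthread_mutex_lock(&obj->lock);
	if (!obj->dying) {
		obj->size += len;
		committed = len;
	}
	pthread_mutex_unlock(&obj->lock);
	return committed;
}
```

With the check outside the lock, object_destroy() could run between the
check and the update, which is the window described above.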

Change-Id: I463c8a6f85d4f5fd934b167c6194f50ae9d4b7d4
HPE-bug-id: LUS-7189
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-on: https://review.whamcloud.com/41797
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14184 tests: component-add/del tests for DOM 70/40870/3
Vitaly Fertman [Fri, 4 Dec 2020 18:55:41 +0000 (21:55 +0300)]
LU-14184 tests: component-add/del tests for DOM

make duplicates of sanity-pfl 2,3 tests for DOM layout

HPE-bug-id: LUS-8282
Test-parameters: testlist="sanity-pfl/2.* sanity-pfl/3.*"
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: If73d7a436b2fc6b6b564cc6eec14ec9e7e4d6937
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/40870
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12828 ldlm: not freed req on enqueue 18/41818/2
Vitaly Fertman [Tue, 2 Mar 2021 20:43:08 +0000 (23:43 +0300)]
LU-12828 ldlm: not freed req on enqueue

ldlm_cli_enqueue() may allocate a req, then fail to allocate a req
slot and return an error without freeing the req.
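
The class of bug is easy to show in a user-space sketch. All names
below (struct request, request_alloc(), slot_get(), enqueue(),
live_requests) are hypothetical and only model the pattern: a resource
allocated early must be released on every later error path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct request {
	int id;
};

int live_requests;	/* allocations that have not been freed yet */

struct request *request_alloc(void)
{
	struct request *req = calloc(1, sizeof(*req));

	if (req != NULL)
		live_requests++;
	return req;
}

void request_free(struct request *req)
{
	free(req);
	live_requests--;
}

/* Hypothetical slot allocator: fails when no slot is available. */
int slot_get(bool available)
{
	return available ? 0 : -ENOMEM;
}

int enqueue(bool slot_available)
{
	struct request *req = request_alloc();
	int rc;

	if (req == NULL)
		return -ENOMEM;

	rc = slot_get(slot_available);
	if (rc != 0) {
		request_free(req);	/* the step the error path must not skip */
		return rc;
	}

	/* A real caller would send the request here; the sketch just releases it. */
	request_free(req);
	return 0;
}
```

Counting live allocations, as live_requests does here, is one simple
way for a test to catch a leak on the failure path.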

Fixes: 85a12c6c8d ("LU-12828 ldlm: FLOCK request can be processed twice")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I9663528bbf2bf64f6439fed6c27d0bc3f274b867
HPE-bug-id: LUS-9337
Reviewed-on: https://es-gerrit.dev.cray.com/158433
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/41818
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14183 ldlm: wrong ldlm_add_waiting_lock usage 68/40868/2
Vitaly Fertman [Fri, 4 Dec 2020 17:22:55 +0000 (20:22 +0300)]
LU-14183 ldlm: wrong ldlm_add_waiting_lock usage

Originally, exp_bl_lock_at accounted for the period from the BLAST
being sent until the cancel RPC reached the server. LU-6032 started to
update l_blast_sent for expired locks which are still busy - prolonged
locks when the timeout expired. In fact, it is a good idea to cover
not the whole period but only the time until any involved RPC arrives -
it avoids excessively large lock callback timeouts - and the IO which
does the lock prolong is also able to restart the AT cycle by updating
l_blast_sent.

Unfortunately, the change seems to have been made accidentally, as the
main prolong code was not adjusted accordingly.

Fixes: 292aa42e08 ("LU-6032 ldlm: don't disable softirq for exp_rpc_lock")
HPE-bug-id: LUS-9278
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Idc598508fc13aa33ac9fce56f13310ca6fc819d4
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/40868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted 36/41936/3
Yang Sheng [Mon, 8 Mar 2021 14:53:13 +0000 (22:53 +0800)]
LU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted

The request may still be referenced by another target even after the
service threads have been stopped. This is caused by some portals
being shared among different services. Just wait for the request to be
released as a workaround.

LustreError: (service.c::ptlrpc_service_purge_all())
ASSERTION( list_empty(&svcpt->scp_rqbd_posted) ) failed:
LustreError: (service.c::ptlrpc_service_purge_all()) LBUG
Pid: 21, comm: umount 3.10.0 #1 SMP
Call Trace:
 [<a01c47dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
 [<a01c488c>] lbug_with_loc+0x4c/0xa0 [libcfs]
 [<a0b534dd>] ptlrpc_unregister_service+0xced/0xd90 [ptlrpc]
 [<a005e122>] ost_cleanup+0x82/0x1b0 [ost]
 [<a08e0bfa>] class_free_dev+0x1ca/0x630 [obdclass]
 [<a08e1240>] class_export_put+0x1e0/0x2b0 [obdclass]
 [<a08e2cc5>] class_unlink_export+0x135/0x170 [obdclass]
 [<a08f8030>] class_decref+0x80/0x160 [obdclass]
 [<a08f8481>] class_detach+0x1b1/0x2e0 [obdclass]
 [<a08fef21>] class_process_config+0x1a91/0x2820 [obdclass]
 [<a08ffe90>] class_manual_cleanup+0x1e0/0x6d0 [obdclass]
 [<a092a115>] server_stop_servers+0xd5/0x160 [obdclass]
 [<a092f6c6>] server_put_super+0x126/0xca0 [obdclass]
 [<8121068a>] generic_shutdown_super+0x6a/0xf0
 [<81210a62>] kill_anon_super+0x12/0x20
 [<a09027e2>] lustre_kill_super+0x32/0x50 [obdclass]
 [<81210e59>] deactivate_locked_super+0x49/0x60
 [<812115a6>] deactivate_super+0x46/0x60
 [<8123019f>] cleanup_mnt+0x3f/0x80
 [<81230232>] __cleanup_mnt+0x12/0x20
 [<810ab085>] task_work_run+0xb5/0xf0
 [<8102ac12>] do_notify_resume+0x92/0xb0
 [<81783c83>] int_signal+0x12/0x17
 Kernel panic - not syncing: LBUG

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Idfb19df123ceae177a0e447e9344bac6861166bf
Reviewed-on: https://review.whamcloud.com/41936
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14492 tests: sanity 27Cb skip condition 03/41903/3
Alexander Zarochentsev [Thu, 15 Oct 2020 08:09:09 +0000 (11:09 +0300)]
LU-14492 tests: sanity 27Cb skip condition

The test skip condition is wrong and causes the
test to be skipped if large xattrs are not supported.
Fixing other tests as well.

Test-Parameters: trivial
Fixes: 591a9b4ce ("LU-9846 lod: Add overstriping support")
HPE-bug-id: LUS-9454
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I7b9d96abb5e4cf2a3955e20828e57a64978e6229
Reviewed-on: https://review.whamcloud.com/41903
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
3 years agoLU-9859 libcfs: remove cfs_capable 83/41783/3
Peng Tao [Mon, 11 Jan 2021 15:49:38 +0000 (10:49 -0500)]
LU-9859 libcfs: remove cfs_capable

Use capable() directly.

Linux-commit: 2eb90a757e9d953c9e2a8fce530422189992fb1b

Change-Id: Iadaa3c743a350def37558b23d954f5dfd4e0844a
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/41783
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14430 mdd: don't assert on default ACL big buffer 75/41775/6
Mikhail Pershin [Fri, 26 Feb 2021 14:48:36 +0000 (17:48 +0300)]
LU-14430 mdd: don't assert on default ACL big buffer

A previous patch may cause situations where the default ACL buffer
is bigger than the ACL buffer, so that the default ACL EA may fit
into the former but not into the latter, causing an assertion in
mdd_acl_init().

There is actually no need for the assertion; just return -ERANGE so
the ACL buffer will be re-allocated.
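
The idea of reporting -ERANGE instead of asserting can be sketched in
plain userspace C. This is only an illustration, not the mdd code:
copy_ea() and copy_ea_realloc() are hypothetical helpers, and the retry
policy (grow the buffer to exactly the needed size, retry once) is an
assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy an EA of ea_len bytes into a caller buffer.
 * Returns the number of bytes copied, or -ERANGE if the buffer is too
 * small. */
static int copy_ea(void *buf, size_t buf_len, const void *ea, size_t ea_len)
{
	if (ea_len > buf_len)
		return -ERANGE;	/* report the problem instead of asserting */
	memcpy(buf, ea, ea_len);
	return (int)ea_len;
}

/* Hypothetical caller: on -ERANGE, grow the buffer and retry once. */
static int copy_ea_realloc(void **buf, size_t *buf_len,
			   const void *ea, size_t ea_len)
{
	int rc = copy_ea(*buf, *buf_len, ea, ea_len);

	if (rc == -ERANGE) {
		void *bigger = realloc(*buf, ea_len);

		if (bigger == NULL)
			return -ENOMEM;
		*buf = bigger;
		*buf_len = ea_len;
		rc = copy_ea(*buf, *buf_len, ea, ea_len);
	}
	return rc;
}
```

The caller keeps ownership of the buffer, so a too-small buffer becomes
a recoverable condition rather than a crash.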

Fixes: f3d03bc38a3a ("LU-14430 mdd: fix inheritance of big default ACLs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Change-Id: I8c0665ba693c60506812926a8372b61095d08f78
Reviewed-on: https://review.whamcloud.com/41775
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14427 libcfs: restore LNET_DUMP_ON_PANIC functionality. 88/41488/2
Mr NeilBrown [Tue, 9 Feb 2021 04:30:46 +0000 (15:30 +1100)]
LU-14427 libcfs: restore LNET_DUMP_ON_PANIC functionality.

The functionality enabled by --enable-panic-dumplog was inadvertently
removed in Commit ae0704381efc ("LU-9859 libcfs: merge linux-debug.c
into debug.c")

Restore it.

While we are there, add conditional compilation for other code that is
only needed when this is enabled.

Test-Parameters: trivial
Fixes: ae0704381efc ("LU-9859 libcfs: merge linux-debug.c into debug.c")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If85882045c66e54ff8493396589d4ecbf13f8f3d
Reviewed-on: https://review.whamcloud.com/41488
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14401 sec: fix migrate for encrypted dir 13/41413/8
Sebastien Buisson [Thu, 4 Feb 2021 08:22:56 +0000 (17:22 +0900)]
LU-14401 sec: fix migrate for encrypted dir

When setting an encryption policy on a directory that we want to
be encrypted, we need to make sure it is empty.
But, in some cases, setting the LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr
should be allowed on non-empty directories, for instance when a
directory is migrated across MDTs into new shard directories.
Also, it is required for the encryption key to be available on the
client when migrating a directory so that the filenames can be
properly rehashed for the new MDT directory shard.
And, in any case, we need to prevent explicit setting of
LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr outside of encryption policy
definition.

Update sanity-sec test_49 to test migration of non-empty encrypted
directory, and add sanity-sec test_57 to test security.c protection.

Fixes: e8f74fb0f5 ("LU-12275 sec: verify dir is empty when setting enc policy")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2466ea35a871c6c07bdcf9fba7191485e855e655
Reviewed-on: https://review.whamcloud.com/41413
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-10973 lnet: initial LUTF C infrastructure 86/38086/41
Amir Shehata [Wed, 25 Mar 2020 02:07:58 +0000 (19:07 -0700)]
LU-10973 lnet: initial LUTF C infrastructure

LNet Unit test Framework is a utility that functionally tests LNet
via python scripts. It operates in a master/slave configuration.
Slaves run on multiple test nodes, while the master is responsible
for managing the slaves to perform specific tests.

The LUTF exercises the different LNet features via configuring
LNet through the lnetconfig interface or lnetctl, running traffic
and monitoring statistics and other logging to ensure that tests
have passed.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Iefcc4d48d5f144a2abe1fdc0865331e9a9d27318
Reviewed-on: https://review.whamcloud.com/38086
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13107 utils: remove lctl lov_getconfig command 06/37106/11
Andreas Dilger [Fri, 26 Feb 2021 21:57:28 +0000 (16:57 -0500)]
LU-13107 utils: remove lctl lov_getconfig command

The "lctl lov_getconfig" command has been obsolete for some time,
but was kept around for sanity test_44a to work properly.  Now
that LU-11656 has landed, "lfs getstripe -d $DIR" can be used to
get the actual layout used for files created in a directory.

Remove the lov_getconfig command along with the IOC definition
it was using.

Test-Parameters: envdefinitions=ONLY=41a testlist=sanity
Change-Id: If94471b50fafc157c043d241dc19cdcd714cab07
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37106
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12885 mds: add enums for MDS_ATTR flags 12/33512/20
Andreas Dilger [Mon, 14 Oct 2019 03:13:18 +0000 (21:13 -0600)]
LU-12885 mds: add enums for MDS_ATTR flags

Add mds_attr_flags to the code to make it easier to follow the logic.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I833a6e6102f947a9276cb6bf03826fd4a53ebbe5
Reviewed-on: https://review.whamcloud.com/33512
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14491 ldiskfs: do not corrupt journal with bh modification 96/41896/2
Andrew Perepechko [Fri, 5 Mar 2021 14:10:38 +0000 (17:10 +0300)]
LU-14491 ldiskfs: do not corrupt journal with bh modification

Currently, ldiskfs_xattr_delete_inode() zeroes xattr inode
references in cached buffers that haven't been prepared by
get_write_access().

When using journal checksums, it is possible that these buffers
are modified after the checksum is calculated but before the
buffer has been written to journal. Journal replay will fail
with a journal checksum error message if this transaction needs
to be replayed.
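
Why a late buffer modification breaks replay can be shown with a toy
model. This sketch does not use the real journal checksum; toy_csum()
is a made-up stand-in and replay_ok() is a hypothetical check, used
only to show that changing a buffer after its checksum was recorded
makes verification fail.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy polynomial checksum, standing in for the real journal checksum. */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return sum;
}

/* Returns 1 if the buffer still matches the checksum recorded earlier. */
static int replay_ok(const uint8_t *buf, size_t len, uint32_t recorded)
{
	return toy_csum(buf, len) == recorded;
}
```

If the checksum is recorded first and one byte of the buffer is zeroed
afterwards, replay_ok() returns 0 for the modified buffer.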

Change-Id: Ia3d44f24715ad97b505e08706933e4eb608c115f
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-9483
Reviewed-on: https://review.whamcloud.com/41896
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-9679 osc: simplify osc_extent_find() 91/41691/5
NeilBrown [Thu, 13 Dec 2018 00:32:56 +0000 (11:32 +1100)]
LU-9679 osc: simplify osc_extent_find()

osc_extent_find() contains some code with the same functionality as
osc_extent_merge().  So replace that code with a call to
osc_extent_merge().

This requires that we set cur->oe_grants earlier, as
osc_extent_merge() needs that.
It also requires that osc_extent_merge() allow the victim to be
OES_INV.

Also:

 - fix a pre-existing bug - osc_extent_merge() should never try to
   merge two extents with different ->oe_mppr as later alignment
   checks can get confused.
 - Remove a redundant list_del_init() which is already included in
   __osc_extent_remove().

Linux-Commit: 85ebb57ddc5b ("lustre: osc: simplify osc_extent_find()")

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I8a1e0d492f583ba9baf28bafa42d4e31c29ac0da
Reviewed-on: https://review.whamcloud.com/41691
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-6142 llite: create file_operations registration function. 08/40608/6
James Simmons [Thu, 4 Feb 2021 14:48:15 +0000 (09:48 -0500)]
LU-6142 llite: create file_operations registration function.

Create new ll_register_file_operations() to set sbi->ll_ops to the
correct struct file_operations. We can make all the struct
file_operations static.

Change-Id: I0369a4f64de5233d5272bc403f222366f9559000
Test-Parameters: trivial
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/40608
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-6142 lustre: Make dev/body/type operations const 98/39398/7
Mr NeilBrown [Thu, 16 Jul 2020 04:14:10 +0000 (14:14 +1000)]
LU-6142 lustre: Make dev/body/type operations const

Many of
  struct md_device_operations
  struct dt_body_operations
  struct dt_object_operations
  struct dt_device_operations
  struct dt_index_operations
  struct lu_object_operations
  struct lu_device_operations
  struct lu_device_type_operations
are already const.  This patch makes the remainder 'const',
and changes a few to 'static'.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ife82c870a27a9e68e57208d49f51983a552e86ec
Reviewed-on: https://review.whamcloud.com/39398
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 years agoLU-14504 lod: lod_xattr_del() check obj existence 76/41976/2
Lai Siyao [Wed, 10 Mar 2021 10:13:18 +0000 (18:13 +0800)]
LU-14504 lod: lod_xattr_del() check obj existence

lod_declare_xattr_del() skips object if it doesn't exist, but
lod_xattr_del() doesn't, which may trigger assertion in
osp_xattr_del() if a stripe doesn't exist.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I00723d3b0243efd1357107c59dd86967e076e2af
Reviewed-on: https://review.whamcloud.com/41976
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14490 lmv: striped directory as subdirectory mount 93/41893/4
Lai Siyao [Fri, 5 Mar 2021 09:07:34 +0000 (17:07 +0800)]
LU-14490 lmv: striped directory as subdirectory mount

lmv_intent_lookup() will replace fid1 with the stripe FID, but if a
striped directory is mounted as a subdirectory mount, it should be
handled differently. Because fid2 is the directory master object, if the
stripe is located on a different MDT than the master object, it will be
treated as a remote object by the server, so the server won't reply with
a LOOKUP lock, and each file access then needs to look up "/".

A remote directory (either plain or striped) shouldn't be used for a
subdirectory mount either, because a remote object can't get a LOOKUP
lock. Add an option "mdt_enable_remote_subdir_mount" (1 by default for
backward compatibility); mdt_get_root() will return -EREMOTE if the
user-specified subdir is a remote directory and this option is
disabled.

Add sanity 247g, updated 247f.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I5e8f95ee95c4155336098e55b7569ed7a43865c1
Reviewed-on: https://review.whamcloud.com/41893
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13730 lod: don't confuse stale with primary flag 03/42003/7
Alex Zhuravlev [Thu, 11 Mar 2021 05:47:34 +0000 (08:47 +0300)]
LU-13730 lod: don't confuse stale with primary flag

There can be a few in-sync replicas which are not primary.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8b984463a2665bc88f2f76247df5366a68d74ea6
Reviewed-on: https://review.whamcloud.com/42003
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12678 o2iblnd: change some ints to bool. 04/39304/5
Mr NeilBrown [Mon, 6 Jul 2020 12:34:39 +0000 (08:34 -0400)]
LU-12678 o2iblnd: change some ints to bool.

Each of these ints can suitably be bool.

Also fix various style issues.

Change-Id: Ic956366afc945f74e692dd5f8953149730a3703e
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39304
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13073 osp: don't block waiting for new objects 74/40274/42
Alex Zhuravlev [Fri, 16 Oct 2020 16:09:04 +0000 (19:09 +0300)]
LU-13073 osp: don't block waiting for new objects

If an OST is down, then it's possible that a few threads trying
to get an already precreated object will get stuck. Even worse,
all QoS-based allocations are then serialized by a
single semaphore, even those that wouldn't try to allocate
on the failed OST.

The patch introduces a noblock flag in the allocation hint
which is passed to OSP. The QoS code then tries to allocate
objects in a non-blocking manner.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I38e66d7569aefecf800dbc32f1049ac87853439e
Reviewed-on: https://review.whamcloud.com/40274
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-9855 lustre: use with_imp_locked() more broadly. 95/39595/11
Mr NeilBrown [Fri, 7 Aug 2020 02:05:35 +0000 (12:05 +1000)]
LU-9855 lustre: use with_imp_locked() more broadly.

Several places in lustre take u.cli.cl_sem to protect access to
u.cli.cl_import, and so could use with_imp_locked() achieving cleaner
code.

Using with_imp_locked() in functions calling
ptlrpc_set_import_active() requires care as that function gets a
write-lock on ->cl_sem.  So they need to use with_imp_locked() only to
get a counted reference on the imp, and must drop the lock before
calling ptlrpc_set_import_active().

This patch makes those changes and also:

- introduces with_imp_locked_nested() for sptlrpc_conf_client_adapt(),
- re-indents obd_cleanup_client_import(), which is only tangentially
  related to the main purpose of this patch,
- removes code in ldlm_flock_completion_ast() which takes a copy
  of cl_import, and doesn't use it.
- adds with_imp_locked() to two functions named 'active_store' which
  weren't using it but should
- removes with_imp_locked() from ping_show() and instead includes it
  in ptlrpc_obd_ping() where 'imp' is actually used.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I01a713c200a1698af222bc72cf4f955227a98305
Reviewed-on: https://review.whamcloud.com/39595
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14395 kernel: kernel update RHEL7.9 [3.10.0-1160.15.2.el7] 22/41822/4
Jian Yu [Wed, 3 Mar 2021 01:41:12 +0000 (17:41 -0800)]
LU-14395 kernel: kernel update RHEL7.9 [3.10.0-1160.15.2.el7]

Update RHEL7.9 kernel to 3.10.0-1160.15.2.el7.

Change debuginfo download location since debuginfo.centos.org
does not provide kernel-debuginfo-common anymore.

The patch also reverts the following fix from RHEL 7.9 kernel
since version 3.10.0-1160.8.1.el7:

- [kernel] timer: Fix lockup in __run_timers() caused by
  large jiffies/timer_jiffies delta (Waiman Long) [1849716]

The above fix caused Hard LOCKUP kernel panic.

Test-Parameters: clientdistro=el7.9 serverdistro=el7.9

Change-Id: Icdd9e8bf4bd595dece266f6c5a9b0de344781a93
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41822
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 years agoLU-14477 lnet: handle possibility of IPv6 being unavailable. 91/41791/5
Mr NeilBrown [Mon, 1 Mar 2021 00:54:25 +0000 (11:54 +1100)]
LU-14477 lnet: handle possibility of IPv6 being unavailable.

If CONFIG_IPV6 is not enabled, the attempt to create an IPv6 socket
for accepting new incoming connections will fail.  In that case
we need to create an IPv4 socket instead.

Also ipv6_dev_get_saddr will not be available, so we must not include
the code that tries to use it.

Test-Parameters: trivial
Fixes: e4fa181abf10 ("LU-10391 lnet: allow creation of IPv6 socket")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ib576c7ea498c90f549958f3c1aa0beb7fe2b66ad
Reviewed-on: https://review.whamcloud.com/41791
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14478 ldiskfs: support Ubuntu 20.04.1 kernel 5.4.0-66 86/41786/2
Jian Yu [Sun, 28 Feb 2021 05:06:56 +0000 (21:06 -0800)]
LU-14478 ldiskfs: support Ubuntu 20.04.1 kernel 5.4.0-66

This patch fixes the conflict in ext4-pdirop.patch to support
Ubuntu 20.04.1 server with kernel version greater than or
equal to 5.4.0-66.

Test-Parameters: trivial

Change-Id: I336f5bb430f87aaefc6d79a782dfd779d20e0cf7
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41786
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14460 lnet: fix mismatched printf format 55/41755/4
Lei Feng [Thu, 25 Feb 2021 00:31:56 +0000 (08:31 +0800)]
LU-14460 lnet: fix mismatched printf format

Original "%llx" does not work on all platforms. Fix it.
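
As background, "%llx" expects an unsigned long long argument, while a
64-bit type may be defined as unsigned long on some platforms. Two
common portable ways to print such a value are sketched below in
userspace C. Which approach the patch itself takes is not stated here;
fmt_cast() and fmt_macro() are hypothetical helper names.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Option 1: keep "%llx" and cast the argument to the type it expects. */
static int fmt_cast(char *out, size_t n, uint64_t v)
{
	return snprintf(out, n, "%llx", (unsigned long long)v);
}

/* Option 2: let PRIx64 expand to the right conversion for uint64_t. */
static int fmt_macro(char *out, size_t n, uint64_t v)
{
	return snprintf(out, n, "%" PRIx64, v);
}
```

Both helpers produce the same hexadecimal string for the same value;
the cast variant is the one usable where the PRI* macros are not
available.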

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I2edecbf66ccb2141c72294d324ade79574f5c084
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/41755
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14431 log: Add ending newline for some messages. 23/41723/5
Lei Feng [Tue, 23 Feb 2021 03:35:18 +0000 (11:35 +0800)]
LU-14431 log: Add ending newline for some messages.

Some log messages lack a trailing newline, so two log messages
are merged into one line and cause errors in parsing programs.
Add a trailing newline to these messages.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I79acba9fc494c148dfe2c56cdbe7694b4bbc5cf4
Test-Parameters: trivial
Reviewed-on: https://review.whamcloud.com/41723
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Anjus George <georgea@ornl.gov>
3 years agoLU-5170 utils: add lfs df -H for decimal units 71/41271/3
Andreas Dilger [Wed, 20 Jan 2021 00:48:28 +0000 (17:48 -0700)]
LU-5170 utils: add lfs df -H for decimal units

Running "lfs df -ih" prints a base-two suffix for inode counts,
which is somewhat unintuitive (e.g. 100000 becomes 97.7K inodes).
While this is consistent with upstream "df", it also has a "-H"
option to print the output with decimal suffixes.
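
The difference between the two unit styles comes down to the divisor:
100000 / 1024 is about 97.7, while 100000 / 1000 is exactly 100. A
minimal sketch follows; human() is a hypothetical helper, and its
one-decimal output format is an assumption, not necessarily what
"lfs df" prints.

```c
#include <stdio.h>

/* Scale val by base (1024 for base-two units, 1000 for decimal units)
 * and write it with one decimal place and a K/M/G/T suffix. */
static void human(char *out, size_t n, unsigned long long val, unsigned base)
{
	static const char suffix[] = "KMGT";
	double v = (double)val;
	int i = -1;

	while (v >= base && i < 3) {
		v /= base;
		i++;
	}
	if (i < 0)
		snprintf(out, n, "%llu", val);	/* below one unit, no suffix */
	else
		snprintf(out, n, "%.1f%c", v, suffix[i]);
}
```

Calling human() with base 1024 corresponds to the "-h" style and with
base 1000 to the "-H" style described above.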

Add the -H/--si option to "lfs df" also.

Document the 'f' (flash) and 'N' (noprecreate) flags for "lfs df".

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I06b8df4ae2940107720e57013bf187b3473ebbe5
Reviewed-on: https://review.whamcloud.com/41271
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-14279 test: fix block soft testing failure 94/41094/4
Wang Shilong [Mon, 28 Dec 2020 02:33:24 +0000 (10:33 +0800)]
LU-14279 test: fix block soft testing failure

Soft least qunit was introduced to avoid a performance
drop when users have reached the soft limit but the timer has
not expired. In that case it tries to acquire more space (no more
than least qunit) to get reasonable performance.

Test cases need to be aware of this, which means a slave might
exceed the quota limit a bit (but by no more than least qunit,
e.g. 4M).

Test-Parameters: trivial testlist=sanity-quota env=ONLY="3a 3b 3c"
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ia221d97d158a8da4dc1fe1611aebac2f5086440e
Reviewed-on: https://review.whamcloud.com/41094
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12477 lustre: check return status of register_shrinker() 83/40883/4
Mr NeilBrown [Mon, 7 Dec 2020 02:07:31 +0000 (13:07 +1100)]
LU-12477 lustre: check return status of register_shrinker()

register_shrinker() can fail with -ENOMEM.  We should check for that
and abort the relevant initialization functions when it happens.

For ldlm_pools, ldlm_pools_fini() can be called when ldlm_pools_init()
fails, or even in cases where it hasn't been called.  So add a static
flag to ensure ldlm_pools_fini() does not undo things that haven't been
done.
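
The guard-flag pattern can be sketched in userspace C as follows. This
is not the ldlm_pools code: fake_register() and fake_unregister() are
hypothetical stand-ins for the shrinker registration, and the names
pools_init(), pools_fini() and pools_initialized are made up for
illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a resource that init registers. */
static int registered_count;
static bool fail_register;	/* set to simulate an -ENOMEM failure */

static int fake_register(void)
{
	if (fail_register)
		return -ENOMEM;
	registered_count++;
	return 0;
}

static void fake_unregister(void)
{
	registered_count--;
}

static bool pools_initialized;	/* the static guard flag */

static int pools_init(void)
{
	int rc = fake_register();

	if (rc)
		return rc;	/* abort init; the flag stays false */
	pools_initialized = true;
	return 0;
}

static void pools_fini(void)
{
	if (!pools_initialized)
		return;		/* nothing was done, so nothing to undo */
	fake_unregister();
	pools_initialized = false;
}
```

With the flag in place, calling pools_fini() before init, after a
failed init, or twice in a row leaves the registration count balanced.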

For lu_global_init() we need to add proper cleanup if anything fails.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie66326486c7738547d4211095bb1d37dc75e0b6a
Reviewed-on: https://review.whamcloud.com/40883
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12477 libcfs: Further reduce complexity for shrinkers. 31/40831/6
Mr NeilBrown [Thu, 12 Nov 2020 05:08:26 +0000 (16:08 +1100)]
LU-12477 libcfs: Further reduce complexity for shrinkers.

Commit c4c17fa4a3f5 ("LU-12477 libcfs: Remove obsolete config checks")
reduced the complexity of shrinkers by removing support for older
kernels, but could have gone a lot further.  This patch adds
further simplification.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ibcc84f61e542b503f795b16a7144e430f8b73582
Reviewed-on: https://review.whamcloud.com/40831
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13903 build: Move GLIBC/openssl checks to where needed. 53/39653/6
Mr NeilBrown [Wed, 5 Aug 2020 02:06:52 +0000 (12:06 +1000)]
LU-13903 build: Move GLIBC/openssl checks to where needed.

Two config checks on glibc support:
LC_GLIBC_SUPPORT_FHANDLES
LC_GLIBC_SUPPORT_COPY_FILE_RANGE
and two on openssl support:
LC_OPENSSL_SSK
LC_OPENSSL_GETSEPOL

are currently only run when modules are being built.
The FHANDLES test is needed when building tests.
The COPY_FILE_RANGE test is needed when building
utils, as are the OPENSSL checks.

So move the calls to these tests to a more appropriate place, so that
  ./configure --disable-modules --disable-server
can run correctly.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id7801112cd53601b3d560119784cbd062bf9610e
Reviewed-on: https://review.whamcloud.com/39653
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 years agoLU-13779 lnet: Correct asymmetric route detection 49/39349/7
Chris Horn [Fri, 10 Jul 2020 17:33:50 +0000 (12:33 -0500)]
LU-13779 lnet: Correct asymmetric route detection

Failure to lookup the remote net for LNET_NIDNET(src_nid) indicates an
asymmetric route, but we do not drop the message in this case. Another
problem with this code is that there is no guarantee that we'll have a
route->lr_lnet that matches the net of ni->ni_nid.

We can move the asymmetric route detection to after we have looked up
the lpni of from_nid. Then, we can look at just the routes associated
with the gateway that owns the lpni. If one of those routes has
lr_net == LNET_NIDNET(src_nid), then the route is symmetrical.
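
The check described above can be sketched as a scan over the gateway's
routes. This is a simplified illustration, not the LNet code: struct
route here is a hypothetical record holding only the remote net, and
NIDNET() assumes the network number occupies the upper 32 bits of a
64-bit NID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the network number is the upper 32 bits of a NID. */
#define NIDNET(nid) ((uint32_t)((nid) >> 32))

/* Hypothetical, simplified route record: only the remote net it reaches. */
struct route {
	uint32_t lr_net;
};

/* True if any of the gateway's routes leads to the net of src_nid. */
static bool route_is_symmetric(const struct route *routes, size_t n,
			       uint64_t src_nid)
{
	for (size_t i = 0; i < n; i++)
		if (routes[i].lr_net == NIDNET(src_nid))
			return true;
	return false;
}
```

A message whose source net matches none of the gateway's routes would
be the asymmetric case in this model.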

Fixes: 4932febc12 ("LU-11894 lnet: check for asymmetrical route messages")
Test-Parameters: trivial
HPE-bug-id: LUS-9087
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8044d3f53e6f000c1e4d7c4e34b3b21afe0f9711
Reviewed-on: https://review.whamcloud.com/39349
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-12678 socklnd: change various ints to bool. 02/39302/3
Mr NeilBrown [Mon, 6 Jul 2020 12:34:41 +0000 (08:34 -0400)]
LU-12678 socklnd: change various ints to bool.

Each of these int variables, and one int function, are
really truth values, so change to bool.

Also convert some spaces to tabs etc.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ia62a86e549c90a287a20a3b2ef7533c1b700d17e
Reviewed-on: https://review.whamcloud.com/39302
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 years agoLU-13783 libcfs: support removal of kernel_setsockopt() 59/39259/7
Mr NeilBrown [Fri, 3 Jul 2020 03:51:50 +0000 (13:51 +1000)]
LU-13783 libcfs: support removal of kernel_setsockopt()

Linux 5.8 removes kernel_setsockopt() and kernel_getsockopt(), and
provides some helper functions for some accesses that are
not trivial.

This patch adds those helpers to libcfs when they are not available,
and changes (nearly) all calls to kernel_[gs]etsockopt() to
either use direct access or a helper call.

->keepalive() is not available before v4.11-rc1~94^2~43^2~14
and there is no helper function, so for SO_KEEPALIVE we
need to have #ifdef code in the C file.

TCP_BACKOFF* settings are not converted as they are not available in
any upstream kernel, so no conversion is possible.

Also include some minor style fixes and change lnet_sock_setbuf() and
lnet_sock_getbuf() to be 'void' functions.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I539cf8d20555ddb3565fa75130fdd3acf709c545
Reviewed-on: https://review.whamcloud.com/39259
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>