Whamcloud - gitweb
fs/lustre-release.git
2 years agoLU-14702 osc: cleanup comment in osc_object_is_contended 75/43775/3
Li Xi [Tue, 25 May 2021 00:49:01 +0000 (08:49 +0800)]
LU-14702 osc: cleanup comment in osc_object_is_contended

ll_file_is_contended() does not exist any more, so the comment
is invalid.

Test-Parameters: trivial
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: Ib68e8dc885e6812065c076d36dc61938a30d6980
Reviewed-on: https://review.whamcloud.com/43775
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14701 tests: wrong get[set]_osd_param() call 74/43774/2
Elena Gryaznova [Mon, 24 May 2021 15:14:26 +0000 (18:14 +0300)]
LU-14701 tests: wrong get[set]_osd_param() call

sanity-dom:sanityn:test_19 is always skipped because of
get_osd_param() is called incorrectly for DOM=yes.
For osd-* MDT device is to be used instead of default
  device=${2:-$FSNAME-OST*}

Fixes: a7625cd2f37a ("LU-3285 test: add Data-on-MDT tests and fixes")
Test-Parameters: trivial testlist=sanity-dom env=ONLY=sanityn
Test-Parameters: trivial testlist=sanityn env=ONLY=19
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9965
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I2bb9fc7fbaac966ea2254071e7ea82b963a93ad3
Reviewed-on: https://review.whamcloud.com/43774
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14693 mdt: skip DLM when opening volatile files 42/43742/2
John L. Hammond [Wed, 28 Apr 2021 18:43:51 +0000 (13:43 -0500)]
LU-14693 mdt: skip DLM when opening volatile files

In mdt_reint_open(), when opening a volatile file skip taking a
MDS_INODELOCK_UPDATE lock on the parent directory.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8ee89710f52e8097e1412897de91159702560e4a
Reviewed-on: https://review.whamcloud.com/43742
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13055 libcfs: allow comma-separated masks 41/43741/4
Andreas Dilger [Wed, 19 May 2021 07:44:32 +0000 (01:44 -0600)]
LU-13055 libcfs: allow comma-separated masks

For debug and changelog mask names, allow a comma-separated list
of names to be given, so that the space-separated list does not
need to be quoted for use.

Change sanity-quota to use a comma-separated list to verify it works.

Fix a couple of test cases where the debug parameter is set and
printed overly verbosely during tests.

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icf1e3ebc74f0e48b38a65486b2275ec4c33ebbe5
Reviewed-on: https://review.whamcloud.com/43741
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14688 mdt: changelog purge deletes plain llog 19/43719/2
Alexander Boyko [Mon, 17 May 2021 13:29:01 +0000 (09:29 -0400)]
LU-14688 mdt: changelog purge deletes plain llog

With a massive cancel records changelog could delete a plain
llog file and skip one by one record cancelling.
Also patch fixes the race between llog_destroy and llog_next_block.

HPE-bug-id: LUS-9950
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I47c2ed97945e979745255381f83b6a417d7ba8b1
Reviewed-on: https://review.whamcloud.com/43719
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11188 lfs: add "--perm" option to "lfs find" 15/43715/4
Courrier Guillaume [Thu, 29 Apr 2021 09:35:01 +0000 (11:35 +0200)]
LU-11188 lfs: add "--perm" option to "lfs find"

Add support for "--perm" option to "lfs find".
The option supports both octal and symbolic representation and
follows the POSIX standard.
As for GNU find, it supports '-' and '/' modifiers before the
permission.

Signed-off-by: Guillaume Courrier <guillaume.courrier@cea.fr>
Change-Id: I8e1292421986c3a4bde686f3c7dc7bfcb679cabc
Reviewed-on: https://review.whamcloud.com/43715
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
2 years agoLU-9537 utils: implement "lfs getstripe --fid" for directories 14/43714/4
Yoann Valeri [Wed, 12 May 2021 09:48:04 +0000 (11:48 +0200)]
LU-9537 utils: implement "lfs getstripe --fid" for directories

Enhance the lfs command by displaying a directory fid when using "lfs
getstripe --fid" on one.

When displaying information through "lfs getstripe --fid", we would
check if the given path was associated to a directory or not. If so,
the fid display would just be skipped, showing a simple blank line.
However, a user could still find the fid of a directory by using "lfs
path2fid" on the same directory.  Therefore, this patch adds a hook to
the underlying "llapi_fd2fid()" (called internally by "lfs path2fid")
when trying to display a directory fid with "lfs getstripe --fid".

Signed-off-by: Yoann Valeri <yoann.valeri@cea.fr>
Change-Id: Ia153717e3feb1a359b8b54297995365fc34a1c29
Reviewed-on: https://review.whamcloud.com/43714
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12678 lnet: use list_for_each_entry() 91/43591/4
James Simmons [Mon, 10 May 2021 20:10:29 +0000 (16:10 -0400)]
LU-12678 lnet: use list_for_each_entry()

Several loops use list_for_each(), then call list_entry()
each time in the loop This complexity can be replaced with
the use of  list_for_each_entry().

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Ib7968466c4fce5173b20cbaf6c878975ba522d43
Reviewed-on: https://review.whamcloud.com/43591
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12836 osd-zfs: Catch all ZFS pool change events 52/43552/3
Tony Hutter [Fri, 12 Mar 2021 01:23:16 +0000 (17:23 -0800)]
LU-12836 osd-zfs: Catch all ZFS pool change events

This change adds the following symlinks:

  vdev_attach-lustre -> statechange-lustre.sh
  vdev_remove-lustre -> statechange-lustre.sh
  vdev_clear-lustre -> statechange-lustre.sh

This makes it so the statechange-lustre.sh script is also called on
all ZFS events that could change the pool state.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Change-Id: I18edc86749e8ab91bb45f21aafd3fd47e78cbaef
Reviewed-on: https://review.whamcloud.com/43552
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14663 mdc: start changelog thread upon first access 13/43513/5
Alex Zhuravlev [Sun, 2 May 2021 09:16:01 +0000 (12:16 +0300)]
LU-14663 mdc: start changelog thread upon first access

thus leaving the caller a chance to set CHANGELOG_FLAG_FOLLOW,
otherwise the thread (started from open()) can reach the end
of the changelog and exit early.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ic14b6c991010bbe5197b5a8b0fedf0f4007e98c1
Reviewed-on: https://review.whamcloud.com/43513
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13717 sec: limit hard links to linkEA size for enc files 87/43387/4
Sebastien Buisson [Mon, 19 Oct 2020 14:23:05 +0000 (23:23 +0900)]
LU-13717 sec: limit hard links to linkEA size for enc files

Some operations on encrypted files require to identify all names for
files having the same FID. For instance, for lookup, getattr or unlink
on encrypted files without the encryption key, we need to perform an
operation by FID instead of the actual name.
In order to make operations by FID unambiguous on server side, we
decide to limit the number of possible hard links for encrypted files,
to what the linkEA can contain.
Currently linkEA stores 4KiB of links, that is 14 NAME_MAX links, or
119 16-byte names.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I20a01874899f95b2ff61e05b2aa6851d135633e8
Reviewed-on: https://review.whamcloud.com/43387
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14629 sec: forbid file rename from enc to unencrypted dir 04/43404/6
Sebastien Buisson [Thu, 22 Apr 2021 09:26:51 +0000 (11:26 +0200)]
LU-14629 sec: forbid file rename from enc to unencrypted dir

fscrypt allows renaming an encrypted file from an encrypted directory
into an unencrypted directory. But it leaves the file encrypted,
sitting in an unencrypted directory, which can lead to unexpected
issues.
So just prevent this kind of rename, and adapt sanity-sec test_47
accordingly.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I38e17caa4786c1c8d80a363a826a5aa298eb0980
Reviewed-on: https://review.whamcloud.com/43404
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14586 tests: set mpi np correctly 17/43217/2
Elena Gryaznova [Tue, 6 Apr 2021 08:05:02 +0000 (11:05 +0300)]
LU-14586 tests: set mpi np correctly

The number of mpi processes is to be calculated
based on the number of clients in clients subset.

Fixes: 9ecb000 ("LU-13281 tests: ha.sh improvements")
Test-Parameters: trivial
Signed-off-by: Elenai Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9716
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: If574743e2e29a309a8d7a021056fa726495fa959
Reviewed-on: https://review.whamcloud.com/43217
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14340 tests: remove stale test-framework functions 66/41266/3
Andreas Dilger [Tue, 19 Jan 2021 04:43:57 +0000 (21:43 -0700)]
LU-14340 tests: remove stale test-framework functions

Delete functions not referenced anywhere in test-framework.sh
(each one will still appear only once, where it is defined):

  $ grep "^[a-z].* *()" lustre/tests/test-framework.sh |
  sed -e 's/function //' -e 's/ *(.*//' |
  while read F; do (( $(grep $F lustre/tests/*.sh | wc -l) > 1 )) ||
      echo "$F"
  done

  mdsdevlabel ostdevlabel cleanup_check obd_name unmount_zfs
  is_empty_fs get_svr_devs at_min_get canonical_path agts_nodes
  mixed_ost_devs setstripe_nfsserver delayed_recovery_enabled
  mds_on_old_device remove_ost_objects remove_mdt_files
  duplicate_mdt_files get_block_size

The unmount_zfs() function returned by the above check *is*
used via unmount_fstype() calling it as "unmount_$fstype".

Fixes: 5a3dfc2b5d90 ("LU-7301 tests: delete old lfsck tests")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I71842152a87f15918147da860745ef8e981f6121
Reviewed-on: https://review.whamcloud.com/41266
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13950 lnet: do not crash if lnet_sock_getaddr returns error 34/39834/6
Artem Blagodarenko [Tue, 25 Aug 2020 10:01:11 +0000 (06:01 -0400)]
LU-13950 lnet: do not crash if lnet_sock_getaddr returns error

Some issues with network lead to panic in ksocknal_accept

rc = lnet_sock_getaddr(sock, true, &peer_ip, &peer_port);
LASSERT(rc == 0); /* we succeeded before */

Let's pass this error to the caller.

Change-Id: I34d43c19b4e75422db50e7abb02cac3510882b0d
hpe-bug-id: LUS-9256
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157753
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-on: https://review.whamcloud.com/39834
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14687 llite: Return errors for aio 22/43722/7
Patrick Farrell [Wed, 19 May 2021 18:08:57 +0000 (14:08 -0400)]
LU-14687 llite: Return errors for aio

The aio code incorrectly discards errors from
ll_direct_rw_pages.  Fix this and add a test for this.

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I49dadd0b3692820687fa6a1339e00516edf7a5d5
Reviewed-on: https://review.whamcloud.com/43722
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11290 osc: Batch gang_lookup cbs 89/33089/10
Patrick Farrell [Wed, 3 Mar 2021 06:50:04 +0000 (09:50 +0300)]
LU-11290 osc: Batch gang_lookup cbs

The osc_page_gang_lookup call backs can be trivially
converted to operate in batches rather than one page at a
time.  This improves cancellation time for locks protecting
large numbers of pages by about 10% (after landing
another optimization (LU-11290 ldlm: page discard speedup)
it shows 6% for canceling a lock for 30GB cached file ).

Truncate to zero time (with one lock protecting many pages)
was improved by about 5-10% as well.  Lock weighing
performance should be improved slightly as well, but is
tricky to benchmark.

HPE-bug-id: LUS-6432
Change-Id: Ib30594ae97182cbeb18051d6cee860c97ae7e119
Signed-off-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/33089
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14273 tests: enhance ha.sh to run custom cmd on bg 73/41073/2
Elena Gryaznova [Tue, 22 Dec 2020 15:19:15 +0000 (18:19 +0300)]
LU-14273 tests: enhance ha.sh to run custom cmd on bg

We need this ability to run lfs migrate in parallel
with mdtest and IOR.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9371
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: Id52ed0731aba24d3f40813da5fd2bb9b94ae63e5
Reviewed-on: https://review.whamcloud.com/41073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13942 obd: check if sbi->ll_md_exp is initialized 12/39812/6
Artem Blagodarenko [Fri, 21 Aug 2020 17:43:38 +0000 (13:43 -0400)]
LU-13942 obd: check if sbi->ll_md_exp is initialized

Null reference at the start of obd_statfs() function is possible
because of ll_fill_super vs lctl race.

ll_md_exp is initialized in ll_fill_super()->
client_common_fill_super(), but if mount process stucks
in lustre_process_log() it doesn't reach client_common_fill_super().

Change-Id: Ife72a62ba42573e2a9c6d244e36cde738b70c15a
hpe-bug-id: LUS-9150
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157732
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/39812
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14642 flr: transfer layout version on layout change 72/43472/7
Bobi Jam [Wed, 28 Apr 2021 05:16:05 +0000 (13:16 +0800)]
LU-14642 flr: transfer layout version on layout change

After layout changed (mirror extend/split), the file's layout version
needs to transfer to OST ASAP so that following IO won't be blocked
since OFD will check its layout version stored in the xattr
XATTR_NAME_FID and find that the layout version from the client IO is
bigger (ofd_verify_layout_version()).

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I353800e868eaf13e3c795926b0d76fb1eb45c535
Reviewed-on: https://review.whamcloud.com/43472
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14645 utils: setstripe cleanup 65/43465/3
Vitaly Fertman [Tue, 27 Apr 2021 19:15:30 +0000 (22:15 +0300)]
LU-14645 utils: setstripe cleanup

lfs setstripe checks stripe parameters differently for PFL and !PFL
layouts. Whereas the PFL layout is checked in comp_args_to_layout()
individually and in llapi_layout_sanity_cb() in pairs, !PFL layout
verification is done partially in several places. Create a common
llapi_stripe_param_verify() for this purpose. Make the checks for
both cases symmetric.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I456b1b2e876229ac1a354d4e3879624325856574
HPE-bug-id: LUS-9886
Reviewed-on: https://es-gerrit.dev.cray.com/158589
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/43465
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14644 vvp: wait for nrpages to be updated 64/43464/2
Vitaly Fertman [Tue, 27 Apr 2021 18:43:06 +0000 (21:43 +0300)]
LU-14644 vvp: wait for nrpages to be updated

truncate_inode_pages() says there still may be a page in a process
of deletion upon return. wait for another thread which is doing
__delete_from_page_cache() to get nrpages updated.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I165b3d0866efaf2eb7e977520ebba4ee831874ab
HPE-bug-id: LUS-8842
Reviewed-on: https://es-gerrit.dev.cray.com/158557
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/43464
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14594 ptlrpc: do not match reply with resent RPC 42/43242/4
Vitaly Fertman [Thu, 8 Apr 2021 12:00:11 +0000 (15:00 +0300)]
LU-14594 ptlrpc: do not match reply with resent RPC

The server is able to filter by the connection ID, and drop late
coming RPCs of previous connections, however it does not happen for
replies. At the same time, this is a problem in some cases.

Allocate new matchbits for resends and check replies by them, instead
of xid. Connect RPCs are exceptions due to interop with old server -
at the time of connect we do not know yet if the server supports it.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I2aad037002b488b0c3371544ede0c47940f87efe
HPE-bug-id: LUS-9596
Reviewed-on: https://es-gerrit.dev.cray.com/158446
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/43242
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14673 sec: annotate algorithms taking optional key 56/43656/3
Sebastien Buisson [Tue, 11 May 2021 08:59:03 +0000 (10:59 +0200)]
LU-14673 sec: annotate algorithms taking optional key

Crypto algorithms implementing a ->setkey() method but that can also
be used without a key must set the CRYPTO_ALG_OPTIONAL_KEY flag if
defined in the kernel.
In Lustre, adler32 implementation defines a ->setkey() method, but
its "key" is not actually a cryptographic key.

Linux-commit: a208fa8f33031b9e0aba44c7d1b7e68eb0cbd29e

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I362211d1b1aa3763fe1481cebb3629b255f29e41
Reviewed-on: https://review.whamcloud.com/43656
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14689 hsm: starting running HSM coordinator should success 20/43720/2
Li Xi [Mon, 17 May 2021 14:49:55 +0000 (22:49 +0800)]
LU-14689 hsm: starting running HSM coordinator should success

When starting a running coordinator, the command should succeed
no matter how many times the command runs.

And this should be the same for stopping a stopped coordinator.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I99169de35d6fcc11e03604ac63cdc4358e25b3d2
Reviewed-on: https://review.whamcloud.com/43720
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14549 llite: refresh layout after mirror merge/split 16/43716/3
Bobi Jam [Mon, 17 May 2021 09:14:33 +0000 (17:14 +0800)]
LU-14549 llite: refresh layout after mirror merge/split

mirror merge/split updates file's LOVEA and revokes client's layout
lock, but the client issuing the layout change needs to refresh its
layout (lov->lsm) as well.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I7671efe2fe5354ba0e1503b146045165608e042c
Reviewed-on: https://review.whamcloud.com/43716
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14430 mdd: use own rec_hdr for changelog declare 83/43683/3
Andreas Dilger [Thu, 13 May 2021 00:41:47 +0000 (18:41 -0600)]
LU-14430 mdd: use own rec_hdr for changelog declare

Do not use an lu_buf just to declare the changelog record.  This
only needs llog_rec_hdr to pass in lrh_len, so declaring rec_hdr
on the stack avoids the overhead of using the lu_buf.

Fixes: f3d03bc38a ("LU-14430 mdd: fix inheritance of big default ACLs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7b6f1d761aa98aa6ecb023894bde03dce23ebbe5
Reviewed-on: https://review.whamcloud.com/43683
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14681 obclass: fix typo of comment on job ID 67/43667/3
Li Xi [Wed, 12 May 2021 02:21:16 +0000 (10:21 +0800)]
LU-14681 obclass: fix typo of comment on job ID

Comments on how to get job ID are confusing because of the typo.

Test-Parameters: trivial
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I9d714323f106dfb76eafc8d70346409b38a9b66b
Reviewed-on: https://review.whamcloud.com/43667
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14291 lustre: rename tgt_pool_* functions. 24/43624/3
James Simmons [Mon, 10 May 2021 13:43:56 +0000 (09:43 -0400)]
LU-14291 lustre: rename tgt_pool_* functions.

Functions starting with tgt_* represents code for target handling
used by Lustre servers. Now that the pool functions are used by
both clients and servers rename it to lu_tgt_* to mirror how
lu_tgt_desc_* is used since both represents remote server targets.

Test-Parameters: trivial
Change-Id: I2a9084d5bf9cea3b373c96e15cba1a41631d1172
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43624
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14672 kernel: kernel update SLES12 SP5 [4.12.14-122.66.2] 50/43550/2
Jian Yu [Wed, 5 May 2021 18:19:39 +0000 (11:19 -0700)]
LU-14672 kernel: kernel update SLES12 SP5 [4.12.14-122.66.2]

Update SLES12 SP5 kernel to 4.12.14-122.66.2 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: Ib2bf4795ccb21dbd0bb9202228ff32d73a203eee
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43550
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14671 kernel: kernel update SLES15 SP2 [5.3.18-24.61.1] 49/43549/2
Jian Yu [Wed, 5 May 2021 18:12:24 +0000 (11:12 -0700)]
LU-14671 kernel: kernel update SLES15 SP2 [5.3.18-24.61.1]

Update SLES15 SP2 kernel to 5.3.18-24.61.1 for Lustre client.

Test-Parameters: trivial \
env=SANITY_EXCEPT="100 130 136 817" \
clientdistro=sles15sp2 serverdistro=el7.9 \
testlist=sanity

Change-Id: Ie0aab7cc7200796ed8e4d75862ceaef020943c08
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43549
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14670 kernel: kernel update RHEL7.9 [3.10.0-1160.25.1.el7] 48/43548/3
Jian Yu [Wed, 5 May 2021 17:58:18 +0000 (10:58 -0700)]
LU-14670 kernel: kernel update RHEL7.9 [3.10.0-1160.25.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.25.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: Ic846d648c45476cc4886ce86577605bf3e66d935
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43548
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14622 osd: mark pages accessed on reads 67/43367/6
Alex Zhuravlev [Mon, 19 Apr 2021 06:10:17 +0000 (09:10 +0300)]
LU-14622 osd: mark pages accessed on reads

to improve cache hit ratio

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If4850465d118ed62e9da105dc0cf144ff5041fd3
Reviewed-on: https://review.whamcloud.com/43367
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13783 ldiskfs: Add support for mainline 5.8 kernel 73/40373/4
Mr NeilBrown [Wed, 21 Oct 2020 02:34:09 +0000 (13:34 +1100)]
LU-13783 ldiskfs: Add support for mainline 5.8 kernel

Various changes needed for 5.8 over 5.4:
 - ext4_mark_inode_dirty is now a macro, so export
     __export_mark_inode_dirty instead
 - procfs additions need to use 'struct proc_ops'
 - inode-test.c is a new C file that we MUST NOT build
 - various ordinary conflicts

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I681ab26c60fb35a1ef5f518ee7cac8766e6fde47
Reviewed-on: https://review.whamcloud.com/40373
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13952 quota: default OST Pool Quotas 73/39873/18
Sergey Cheremencev [Thu, 10 Sep 2020 12:48:01 +0000 (15:48 +0300)]
LU-13952 quota: default OST Pool Quotas

Patch makes ability to set default quota
limits per OST pool.
Patch also adds sanity-quota_73.

Test-Parameters: testlist=sanity-quota
HPE-bug-id: LUS-9133
Change-Id: I9e49def231aeeed4588e5e3fbcd29fdd62a35855
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/39873
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13852 pcc: don't alloc FID in LLITE for pcc open 68/39568/11
Lai Siyao [Fri, 31 Jul 2020 18:09:06 +0000 (02:09 +0800)]
LU-13852 pcc: don't alloc FID in LLITE for pcc open

ll_lookup_it(IT_OPEN) always alloc FID on MDT0 for pcc open, but
the open request is sent to MDT where the name hash points to,
which may be different from the MDT where the FID is, which will
trigger osp_md_create() assertion because file is created remotely.

This FID allocation is not necessary, and it can be left to be done
in lmv_intent_open() by LMV layer, because the MDT is chosen in
LMV. Then when it's done, the FID allocated can be used to initialize
PCC inode.

Change assertion in osp_md_create() to error message and return
error.

Update sanity-sec 2a for this.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3ccea3f9e7cca5083695c71135b9a5805f833b14
Reviewed-on: https://review.whamcloud.com/39568
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14004 llite: default lsm update may memory leak 03/40103/5
Lai Siyao [Sun, 27 Sep 2020 06:36:57 +0000 (14:36 +0800)]
LU-14004 llite: default lsm update may memory leak

ll_update_default_lsm_md() should check whether lli_default_lsm_md
is set before setting it to the data from lustre_md, and if it's set,
release the old data to avoid memory leak.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I9c8434c5d62f9fb751788031d6769fd49427c371
Reviewed-on: https://review.whamcloud.com/40103
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14646 flr: write a FLR file downgrade SoM 71/43471/5
Bobi Jam [Wed, 28 Apr 2021 04:43:11 +0000 (12:43 +0800)]
LU-14646 flr: write a FLR file downgrade SoM

Seek over file size and write a FLR file does not change its SoM
and that makes file size incorrect.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I3075389721bdd40be60e9206c37f6c1bea514cce
Reviewed-on: https://review.whamcloud.com/43471
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14631 quota: fix qunit sort 10/43410/5
Sergey Cheremencev [Mon, 5 Apr 2021 12:27:34 +0000 (15:27 +0300)]
LU-14631 quota: fix qunit sort

Fix lqes_cmp that is used to sort lqes by qunit. As lqes_cmp returns
integer, it returns incorrects values if difference between qunits is
grater than 4GB causing write to hang instead of fail with -EDQUOT:

[<ffffffffc0701945>] cl_sync_io_wait+0x295/0x3c0 [obdclass]
[<ffffffffc07026f8>] cl_io_submit_sync+0x1c8/0x360 [obdclass]
[<ffffffffc128dc0a>] vvp_io_commit_sync+0x12a/0x460 [lustre]
[<ffffffffc128f5ee>] vvp_io_write_commit+0x4de/0x620 [lustre]
[<ffffffffc128fa39>] vvp_io_write_start+0x309/0x990 [lustre]
[<ffffffffc0700a18>] cl_io_start+0x68/0x130 [obdclass]
[<ffffffffc0702e8c>] cl_io_loop+0xcc/0x1c0 [obdclass]
[<ffffffffc1243514>] ll_file_io_generic+0x5c4/0xdc0 [lustre]
[<ffffffffc12441b9>] ll_file_aio_write+0x289/0x730 [lustre]
[<ffffffffc1244760>] ll_file_write+0x100/0x1c0 [lustre]
[<ffffffffa0241320>] vfs_write+0xc0/0x1f0
[<ffffffffa024213f>] SyS_write+0x7f/0xf0

The issue is occurred if a user hits block hard limit in a pool (pools
limit 6GB), while global limit is set to some huge value(53T in my case)

Change global limit in sanity-quota_1e to check that system doesn't
hung anymore.

HPE-bug-id: LUS-9891
Change-Id: I5a16fd3a40172187bbf35d9a9c9bfeef2ef3a108
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/43410
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14647 flr: mmap write/punch does not stale other mirrors 70/43470/5
Bobi Jam [Wed, 28 Apr 2021 05:07:36 +0000 (13:07 +0800)]
LU-14647 flr: mmap write/punch does not stale other mirrors

mmap write and punch/fallocate do not stale other mirrors and makes
FLR file contains different content in different mirrors.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I93a3eb5ba898e3bf0ce108718506b742ed485da5
Reviewed-on: https://review.whamcloud.com/43470
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14430 mdd: use own buffer for changelog 72/43672/3
Mikhail Pershin [Wed, 12 May 2021 06:55:29 +0000 (09:55 +0300)]
LU-14430 mdd: use own buffer for changelog

Use own persistent buffer for changelog needs to don't
interfere with generic big_buf in MDD thread info which
can be in use.

Fixes: f3d03bc38a ("LU-14430 mdd: fix inheritance of big default ACLs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I4f4692b5556eaa98e2e23d7b58c925e33401e4e5
Reviewed-on: https://review.whamcloud.com/43672
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14593 utils: specify libzpool 36/43236/4
Alex Zhuravlev [Thu, 8 Apr 2021 14:07:09 +0000 (17:07 +0300)]
LU-14593 utils: specify libzpool

otherwise libmount_utils_zfs doesn't build on some platforms

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id850ba519f7a898e73c099887f0c9eca5c9b5b6f
Reviewed-on: https://review.whamcloud.com/43236
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12678 o2iblnd: fix bug in list_first_entry() change. 58/43558/2
Mr NeilBrown [Thu, 6 May 2021 00:19:30 +0000 (10:19 +1000)]
LU-12678 o2iblnd: fix bug in list_first_entry() change.

This comparison should be != NULL, else a NULL pointer could be
dereferenced.

Test-Parameters: trivial
Fixes: 34b57a6f8fcd ("LU-12678 lnet: use list_first_entry() in lnet/klnds subdirectory.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4510e2e0f2eb7b5bf86626e5ddb5ee537d3fae02
Reviewed-on: https://review.whamcloud.com/43558
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-13783 libcfs: use lsmcontext in security_release_secctx 84/43284/3
Jian Yu [Tue, 4 May 2021 23:55:34 +0000 (16:55 -0700)]
LU-13783 libcfs: use lsmcontext in security_release_secctx

Kernel linux-hwe-5.8 (5.8.0-22.23~20.04.1) introduces
struct lsmcontext and uses it in security_release_secctx(),
which reduces the argruments from 2 to 1.

Change-Id: I37e185493001d335b40ea0a6102db593cb18beb3
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43284
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14526 flr: mirror split downgrade SOM 68/43168/11
Bobi Jam [Tue, 30 Mar 2021 11:20:26 +0000 (19:20 +0800)]
LU-14526 flr: mirror split downgrade SOM

After mirror split, the file's blocks on SoM is not accurate, this
patch downgrade the SoM from STRICT so that size glimpse does not
trust the SoM from the MDS.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I02350c24190d96af93fed8c1b8a0bc6beb2c4bc2
Reviewed-on: https://review.whamcloud.com/43168
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14291 lustre: limit header scope for server only handling 96/43096/12
James Simmons [Sat, 8 May 2021 16:15:33 +0000 (12:15 -0400)]
LU-14291 lustre: limit header scope for server only handling

The lustre headers have server only function declarations and
inline functions. Only include them if HAVE_SERVER_SUPPORT is
set. This gets us closer to building OpenSFS modules against
a Linux kernel with a native Lustre client. Move a few things
from UAPI headers that are used only by kernel space to kernel
only headers where they belong.

For mdc_changlog.c a debug message access a field that is
available the server which is never the case. Since its
nonsense we can remove the report of this field

Change-Id: I6e2d3bebc121aef97fe69344d496e230f62b28ad
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43096
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14195 lnet: improve compat code for IPV6_V6ONLY sock opt 59/43559/3
Mr NeilBrown [Thu, 6 May 2021 01:27:28 +0000 (11:27 +1000)]
LU-14195 lnet: improve compat code for IPV6_V6ONLY sock opt

As get_fs() and set_fs() are deprecated, using them to call
sock->ops->setsockopt() is not a good solution.
Since linux 5.9 (v5.8-rc4-1952-ga7b75c5a8c41) it has been
possible to pass a "sockptr" to ->setsockopt() which can provide
a kernel address.

Prior to 5.8, kernet_setsockopt() is available and should still be
used.

For 5.8, when neither preferred option is available, we can pass
a NULL pointer which has the same effect as a pointer to zero.

Fixes: 10d99554631b ("LU-13783 lnet: remove kernel_setsockopt() from lnet_sock_listen()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I78c1f735a73cc9c835371c139e946144c6df5108
Reviewed-on: https://review.whamcloud.com/43559
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14651 uapi: rename CONFIG_T_* to MGS_CFG_T_* 94/43494/5
James Simmons [Tue, 4 May 2021 14:12:05 +0000 (10:12 -0400)]
LU-14651 uapi: rename CONFIG_T_* to MGS_CFG_T_*

The Linux kernel uses CONFIG_* as a way to determine if a feature
is available. Using CONFIG_* in an UAPI is considered an error
and in the most recent kernels will break a build. While we don't
have any CONFIG_* in our UAPI headers we do have CONFIG_T_*
which is used for config logs. This naming confuses the Linux
kernel build system so just rename these variables to MGS_CFG_T_*
instead.

Test-Parameters: trivial
Change-Id: I574510c2da90e9ae608c9e2374d75220b7abb19d
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43494
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14583 llapi: handle symlinks in llapi_file_get_stripe() 29/43229/4
John L. Hammond [Wed, 7 Apr 2021 19:11:25 +0000 (14:11 -0500)]
LU-14583 llapi: handle symlinks in llapi_file_get_stripe()

In llapi_file_get_stripe(), if the IOC_MDC_GETFILESTRIPE ioctl handler
returns -ENOTTY or -ENODATA then try to resolve any symlinks in the
path and try again.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ic046d6ef77d8342d47336144e3066cab3a940a96
Reviewed-on: https://review.whamcloud.com/43229
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14565 ofd: Do not rely on tgd_blockbit 54/43154/23
Arshad Hussain [Mon, 29 Mar 2021 05:22:11 +0000 (10:52 +0530)]
LU-14565 ofd: Do not rely on tgd_blockbit

tgd_blockbit is recordsize bits set during mkfs.
This once set does not change. However, 'zfs set'
can be used to change the OST blocksize. Instead
of using cached value of 'tgd_blockbit' always
calculate the blocksize bits which may have
changed.

Test-case: sanity/104c added.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Icc100cca0d5ae492c41d60f0bf97512450f796bc
Reviewed-on: https://review.whamcloud.com/43154
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10948 llite: Introduce inode open heat counter 58/32158/40
Oleg Drokin [Tue, 13 Apr 2021 07:46:41 +0000 (03:46 -0400)]
LU-10948 llite: Introduce inode open heat counter

Initial framework to support detection of naive apps that
assume open-closes are "free" and proceed to open/close
same files between minute operations.

We will track number of file opens per inode and last time inode
was closed.

Initially we'll expose these controls:
llite/opencache_threshold_count - enables functionality and controls after how
                                  many opens open lock is requested
llite/opencache_threshold_ms    - if any reopen happens within this time (in
                                  ms), the open would trigger open lock request
llite/opencache_max_ms          - If last close was longer than this many ms
                                  ago - start counting opens from zero again

Once enough useful data is collected we can look into adding a heatmap
or another similar mechanism to better manage it and enable it
by default with sensible settings.

Change-Id: I1aa5455b458840acad651f651c883a7a7a67ab4c
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32158
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
2 years agoLU-14627 tests: Create unload_modules_local 25/43425/5
Chris Horn [Fri, 23 Apr 2021 19:05:02 +0000 (14:05 -0500)]
LU-14627 tests: Create unload_modules_local

t-f allows for loading modules on single node via load_modules_local.
However, there is no corresponding unload_modules_local that can be
called to cleanup after call to load_modules_local, so we create it.
unload_modules() refactored to use unload_modules_local.

Also address a potential issue that can prevent LND modules from
unloading. Some LNet setup (particularly those in sanity-lnet) may
require that we call lnetctl lnet unconfigure (or lctl net down)
to drop a ref on the module before it can be unloaded.

HPE-bug-id: LUS-9031
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6458a7728f5f559f8641c5a9e29dd775c8445c38
Reviewed-on: https://review.whamcloud.com/43425
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13783 libcfs: support absence of account_page_dirtied 27/40827/7
Mr NeilBrown [Thu, 29 Apr 2021 13:04:04 +0000 (09:04 -0400)]
LU-13783 libcfs: support absence of account_page_dirtied

Some kernels export neither account_page_dirtied nor
kallsyms_lookup_name.
For these kernels we need to use __set_page_dirty() and suffer the
cost of dropping an reclaiming the page-tree lock.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I69d934480832f3909d3ec103f11e1d62489d70d7
Reviewed-on: https://review.whamcloud.com/40827
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13722 utils: lnetctl discrepancy in YAML output 69/40269/3
Cyril Bordage [Fri, 16 Oct 2020 14:10:28 +0000 (16:10 +0200)]
LU-13722 utils: lnetctl discrepancy in YAML output

Use "max_interfaces" instead of "max_intf" in YAML output/input.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: Id0d1d1a4220d817b238456946e28b985e1fdc80c
Reviewed-on: https://review.whamcloud.com/40269
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13912 lnet: Correct the router ping interval calculation 94/39694/7
Chris Horn [Mon, 17 Aug 2020 21:02:10 +0000 (16:02 -0500)]
LU-13912 lnet: Correct the router ping interval calculation

The router ping interval is being divided by the number of local nets
which results in sending pings more frequently than defined by the
alive_router_check_interval. In addition, the current code is structured
such that we may not find a peer net in need of a ping until after
inspecting the router list multiple times. Re-work the code so that the
loop that inspects a router's peer nets will look at all of them until
it either loops back around the list or it finds one that actually
needs to be pinged.

We also move the check of LNET_PEER_RTR_DISCOVERY so that we avoid the
work of inspecting the router's peer nets if the router is already being
discovered.

Test-Parameters: trivial
HPE-bug-id: LUS-9237
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I5a4733002f29c0ade6aee62b4424313d5d245556
Reviewed-on: https://review.whamcloud.com/39694
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14641 osd-ldiskfs: write commit declaring improvement 46/43446/7
Wang Shilong [Mon, 26 Apr 2021 03:23:26 +0000 (11:23 +0800)]
LU-14641 osd-ldiskfs: write commit declaring improvement

This patch try to:

1)extent bytes could be missed to increase with less than
1M, fix to to compare it with current value, and decay
it for every allocation.

2)with system space usage growing up, mballoc codes won't
try best to scan block group to align best free extent as
we can. So extent bytes per extent could be decayed to a
very small value, this could make us reserve too many credits.
We could be more optimistic in the credit reservations, even
in a case where the filesystem is nearly full, it is extremely
unlikely that the worst case would ever be hit.

3)Add extent bytes stats and debug ability to analysis
over reservation problem.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I357c4a855147ba26a9e9bbe9ab1269bcfd44e5f3
Reviewed-on: https://review.whamcloud.com/43446
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14603 ptlrpc: quiet messages for unsupported opcodes 57/43257/3
Andreas Dilger [Sun, 11 Apr 2021 02:04:30 +0000 (20:04 -0600)]
LU-14603 ptlrpc: quiet messages for unsupported opcodes

Reduce message spew for unhandled RPC opcodes.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I35496168e3aa29ecb06076654ef0aa97ba2540e5
Reviewed-on: https://review.whamcloud.com/43257
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14669 tests: reduce time spent in sanity-sec 43/43543/4
Sebastien Buisson [Wed, 5 May 2021 13:01:15 +0000 (15:01 +0200)]
LU-14669 tests: reduce time spent in sanity-sec

Several tests in sanity-sec loop over a number of nodemaps (16),
a number of NID ranges (3), a number of IP addresses (6) and a
number of mapped IDs (10).
Whereas the benefits for test coverage of such a large number of
combinations is not clear, looping over all this certainly takes
a lot of time.
So arbitrarily reduce that to smaller numbers in case SLOW!=yes:
- 3 nodemaps;
- 2 NID ranges;
- 2 IP addresses;
- 3 mapped ID
This reduces the test time for sanity-sec from ~12000s to ~7500s.

Similarly, test_fops function, used by tests 16,17,18,19,20,21,22,
loops over a number of users (4), a number of mapped IDs (6), a
number of permission bit masks (4), and for each client.
Arbitrarily reduce that to smaller numbers in case SLOW!=yes, to
save test time without degrading test coverage:
- 2 users;
- 4 mapped IDs;
- 2 permission bit masks;
- from only one client.
This further reduces the test time from ~7500s to ~5000s.

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 clientdistro=el8.3 serverdistro=el8.3 testlist=sanity-sec
Test-Parameters: mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 clientdistro=el8.3 serverdistro=el8.3 testlist=sanity-sec env=SLOW=yes
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idbe9c8daca43feafe7ca6481902cf6b54e3fa87c
Reviewed-on: https://review.whamcloud.com/43543
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13439 lmv: qos stay on current MDT if less full 45/43445/9
Andreas Dilger [Sun, 25 Apr 2021 11:02:19 +0000 (05:02 -0600)]
LU-13439 lmv: qos stay on current MDT if less full

Keep "space balanced" subdirectories on the parent MDT if it is less
full than average, since it doesn't make sense to select another MDT
which may occasionally be *more* full.  This also reduces random
"MDT jumping" and needless remote directories.

Reduce the QOS threshold for space balanced LMV layouts, so that the
MDTs don't become too imbalanced before trying to fix the problem.

Change the LUSTRE_OP_MKDIR opcode to be 1 instead of 0, so it can
be seen that a valid opcode has been stored into the structure.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iab34c7eade03d761aa16b08f409f7e5d69cd70bd
Reviewed-on: https://review.whamcloud.com/43445
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13440 lmv: add default LMV inherit depth 31/43131/13
Lai Siyao [Mon, 15 Mar 2021 03:57:36 +0000 (11:57 +0800)]
LU-13440 lmv: add default LMV inherit depth

A new field "__u8 lum_max_inherit" is added into struct lmv_user_md,
which represents the inherit depth of default LMV. It will be
decreased by 1 for subdirectories.

The valid value of lum_max_inherit is [0, 255]:
* 0 means unlimited inherit.
* 1 means inherit end.
* 250 is the max inherit depth.
* [251, 254] are reserved.
* 255 means it's not set.

A new field "__u8 lum_max_inherit_rr" is added, if default stripe
offset is -1, lum_max_inherit_rr is non-zero, and system is balanced,
new directories are created in roundrobin mannner, otherwise they
are created on the MDT where their parents are located to avoid
creating remote directories. And similarly this value will be
decreased by 1 for each level of subdirectories.

The valid value of lum_max_inherit_rr is different:
* 0 means not set.
* 1 means inherit end.
* 250 is the max inherit depth.
* [251, 254] are reserved.
* 255 means unlimited inherit.

However for the user interface of "lfs", the valid value is [-1, 250]:
* -1 means unlimited inherit.
* 0 means not set.
* others are the same.

Add sanity 413c.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I98ccad8556a0469f83bd7d79f5086a2184d5b115
Reviewed-on: https://review.whamcloud.com/43131
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
2 years agoLU-13440 obdclass: server qos penalty miscaculated 85/43385/2
Lai Siyao [Wed, 21 Apr 2021 12:05:52 +0000 (20:05 +0800)]
LU-13440 obdclass: server qos penalty miscaculated

Server qos penalty calculation uses active target count, but it
should use server count, which will make it larger than expected,
then weight of targets are often 0, and finally cause MDT0 is
often chosen in qos allocation.

Fixes: 45222b2ef ("LU-12624 obdclass: lu_tgt_descs cleanup")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I1982363e4ff74c7344dd5e07d04e29214afa8a7f
Reviewed-on: https://review.whamcloud.com/43385
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
2 years agoLU-14628 ptlrpc: remove might_sleep() in sptlrpc_gc_del_sec() 97/43397/3
Nikitas Angelinas [Thu, 15 Apr 2021 19:09:16 +0000 (12:09 -0700)]
LU-14628 ptlrpc: remove might_sleep() in sptlrpc_gc_del_sec()

sptlrpc_gc_del_sec() calls mutex_lock() which calls might_sleep(), so
the explicit might_sleep() call can be removed as redundant.

Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Test-Parameters: trivial
Change-Id: I48714fae12e63ba5e37ec0f9aa3ab7f688b9475d
Reviewed-on: https://review.whamcloud.com/43397
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-12678 lnet: use list_first_entry() in lnet/klnds subdirectory. 19/43419/2
Mr. NeilBrown [Thu, 22 Apr 2021 18:27:38 +0000 (14:27 -0400)]
LU-12678 lnet: use list_first_entry() in lnet/klnds subdirectory.

Convert
  list_entry(foo->next .....)
to
  list_first_entry(foo, ....)

in 'lnet/klnds

In several cases the call is combined with a list_empty() test and
list_first_entry_or_null() is used

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Change-Id: I3b2b33c3c9284c02e44610614d64a1f84be300a4
Reviewed-on: https://review.whamcloud.com/43419
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14632 tests: fix sanity-hsm test_606() 09/43409/3
Elena Gryaznova [Thu, 22 Apr 2021 14:15:42 +0000 (17:15 +0300)]
LU-14632 tests: fix sanity-hsm test_606()

After check_and_setup_lustre() mds1_dev is equal to
/dev/mapper/mds1_flakey, which is unexported to saved real
device  (/dev/vdc) by stop():
  stop mds1
    elif dm_flakey_supported mds1; then
        dm_cleanup_dev mds1
          unexport_dm_dev mds1
As a result stack_trap() is called with non existing
/dev/mapper/mds1_flakey:
  stack_trap 'stop mds1; start mds1 /dev/mapper/mds1_flakey
              -o rw,user_xattr' EXIT
and failed as:
  losetup: /dev/mapper/mds1_flakey: failed to set up loop device:
              No such file or directory
Reproducer:
  run ONLY=606 sh sanity-hsm.sh on "failover" setup (mds1_HOST !=
  mds1failover_HOST), no llmount.sh before the test

Fixes: 54b9e3f78935 ("LU-684 tests: replace dev_read_only patch with dm-flakey")
Test-Parameters: trivial testlist=sanity-hsm env=ONLY=606
Test-Parameters: fstype=zfs testlist=sanity-hsm env=ONLY=606
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9920
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I9ab3cbcf67c6fd046861810a2ceab262f211436b
Reviewed-on: https://review.whamcloud.com/43409
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13717 sec: rework includes for client encryption 86/43386/3
Sebastien Buisson [Tue, 23 Mar 2021 14:19:01 +0000 (14:19 +0000)]
LU-13717 sec: rework includes for client encryption

Simplify includes for crypto, by not repeating stubs in case
HAVE_LUSTRE_CRYPTO is not defined.
Expose encoding routines that are going to be used in the Lustre
code (both client and server sides) with filename encryption.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6c5853d6da7120edd2bec3a12494251d873151a8
Reviewed-on: https://review.whamcloud.com/43386
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14616 readahead: fix reserving for unaliged read 77/43377/7
Wang Shilong [Tue, 20 Apr 2021 01:47:25 +0000 (09:47 +0800)]
LU-14616 readahead: fix reserving for unaliged read

If read is [2K, 3K] on x86 platform, we only need
read one page, but it was calculated as 2 pages.

This could be problem, as we need reserve more
pages credits, vvp_page_completion_read() will only
free actual reading pages, which cause @ra_cur_pages
leaked.

Fixes: d4a54de84c0 ("LU-12367 llite: Fix page count for unaligned reads")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I3cf03965196c1af833184d9cfc16779f79f5722c
Reviewed-on: https://review.whamcloud.com/43377
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14621 mdd: fix lock-tx order in orphan cleanup 62/43362/7
Alex Zhuravlev [Sun, 18 Apr 2021 13:52:25 +0000 (16:52 +0300)]
LU-14621 mdd: fix lock-tx order in orphan cleanup

start the transaction and then take locks.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iba4be17df55459142baa7585f47231f1b72ebf0b
Reviewed-on: https://review.whamcloud.com/43362
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14616 readahead: export pages directly without RA 38/43338/3
Wang Shilong [Fri, 16 Apr 2021 02:04:17 +0000 (10:04 +0800)]
LU-14616 readahead: export pages directly without RA

With Readahead disabled, @vpg_defer_uptodate should not
be set as we don't reserve credits for such read.
In vvp_page_completion_read() we will call ll_ra_count_put()
which makes @ra_cur_pages negative.

Fixes: 7e8efb339b ("LU-12043 llite: fix to submit complete read block with ra disabled")
Change-Id: I1c9134f5972aa0d0e7aac998f02c690cc55b433b
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/43338
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14612 utils: Add functions llapi_group_lock64/unlock64() 10/43310/8
Emoly Liu [Sun, 25 Apr 2021 00:48:42 +0000 (08:48 +0800)]
LU-14612 utils: Add functions llapi_group_lock64/unlock64()

Add new functions llapi_group_lock64/unlock64() for 64-bit usage.

Test-Parameters: trivial
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id79ae634ba6a787974be481b51743604c4adb536
Reviewed-on: https://review.whamcloud.com/43310
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14604 kernel: kernel update RHEL8.3 [4.18.0-240.22.1.el8_3] 58/43258/3
Jian Yu [Tue, 13 Apr 2021 23:41:31 +0000 (16:41 -0700)]
LU-14604 kernel: kernel update RHEL8.3 [4.18.0-240.22.1.el8_3]

Update RHEL8.3 kernel to 4.18.0-240.22.1.el8_3.

Test-Parameters: trivial \
clientdistro=el8.3 serverdistro=el8.3 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.3 serverdistro=el8.3 testlist=sanity

Change-Id: I1a3152d95822a74e05f9b44f590a6cdb1f8b02b6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43258
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14599 osp: limit allocation at osp_sync_process_committed 50/43250/5
Alexander Boyko [Fri, 9 Apr 2021 12:16:55 +0000 (08:16 -0400)]
LU-14599 osp: limit allocation at osp_sync_process_committed

Sometimes osp cancels very large cookie list with 64K elements.
In this case osp_sync_process_committed() tries to allocate 64 pages
and uses vmalloc.
The fix limits memory allocation size to 4 page with kmalloc, and
reuse it in a loop.

HPE-bug-id: LUS-9815
Fixes: 6d7332102 ("LU-11924 osp: combine llog cancel operations")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ic875335a28f78494fdb3cbc4b0145e5a43831ee8
Reviewed-on: https://review.whamcloud.com/43250
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14553 changelog: eliminate mdd_changelog_clear warning 25/43125/4
Olaf Faaland [Thu, 25 Mar 2021 01:35:10 +0000 (18:35 -0700)]
LU-14553 changelog: eliminate mdd_changelog_clear warning

When handling a changelog_clear request, the user may specify a
range of indices which do not exist.  Similarly, the user may
specify a changelog user which does not exist.  Neither indicates
a problem within Lustre that justifies a a console warning.

Change those cases to CDEBUG.

Test-Parameters: trivial
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: I64bab12ef4978c4bf7139f5f36a39f9b109616fb
Reviewed-on: https://review.whamcloud.com/43125
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14537 mdd: directory migrate skips project ID check 10/42110/3
Lai Siyao [Sat, 13 Mar 2021 15:48:54 +0000 (23:48 +0800)]
LU-14537 mdd: directory migrate skips project ID check

mdd_migrate_sanity_check() used to call mdd_rename_sanity_check(),
while the latter checks parent and sub file project ID, which is
redundant for migration because it's an internal layout change.

Add sanity 230t.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If5ac2131acb1dfb30a312dc34052287776f581c7
Reviewed-on: https://review.whamcloud.com/42110
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14359 hsm: support a flatter HSM archive format 12/41312/6
John L. Hammond [Fri, 22 Jan 2021 16:56:06 +0000 (10:56 -0600)]
LU-14359 hsm: support a flatter HSM archive format

Add versioning (v1 and v2) to the HSM archive format (directory
layout):
  v1: (oid & 0xffff)/-/-/-/-/-/FID
  v2: ((oid ^ seq) & 0xffff)/FID

v1 is the original layout and the default. v2 is the new layout which
should be selected for new installs.

Add an option --archive-format to select the archive format.

Add YAML configuration file support to lhsmtool_posix with properties
achive_format and archive_path. Add an option --config to set the
config file.

Adapt sanity-hsm and test-framework to allow testing of both archive
formats.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I6d6bd0c8817a491848b554fa76078d876549cc1f
Reviewed-on: https://review.whamcloud.com/41312
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14357 fid: simplify locking for fid updates 99/41299/2
Mr NeilBrown [Fri, 22 Jan 2021 02:42:33 +0000 (13:42 +1100)]
LU-14357 fid: simplify locking for fid updates

'struct lu_client_seq' contains a mutex (lcs_mutex) and a second
open-coded mutex (lcs_waitq, lcs_update).  Both of these are using in
gettign a new fid, possibly from the server.

lcs_mutex is the main mutex which protects the local variables.  If an
RPC to the server is required, the extra mutex is held during that
RPC.

This was apparently intended to avoid some deadlock, presumably with
seq_client_flush().  However as seq_client_flush() now takes both
mutexes as well, it is still prone to any such deadlock, but does not
appear to actually suffer from one.
See:
  Commit 23e2a370c8a3 ("b=24255 move seq_client_alloc_seq out of
   lcs_sem")
  Commit d1feb5c774d4 ("LU-662 fix conflict between seq_client_flush
   and seq_client_alloc_fid")

for some of the history.

The extra open-coded mutex appears to provide no value, so let's
remove it.

As part of this, seq_fid_alloc_fini() is open-coded into the two call
sites, which adds further simplification.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ia39eca7d925c9d49fbf942923de8af79dba4f6bf
Reviewed-on: https://review.whamcloud.com/41299
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12815 socklnd: add conns_per_peer parameter 56/41056/9
Serguei Smirnov [Tue, 30 Mar 2021 16:58:57 +0000 (12:58 -0400)]
LU-12815 socklnd: add conns_per_peer parameter

Introduce conns_per_peer ksocklnd module parameter.
In typed mode, this parameter shall control
the number of BULK_IN and BULK_OUT tcp connections,
while the number of CONTROL connections shall stay
at 1. In untyped mode, this parameter shall control
the number of untyped connections.
The default conns_per_peer is 1. Max is 127.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I70bbaf7899ae1fbc41de34553c8c4ad1c7d55f7e
Reviewed-on: https://review.whamcloud.com/41056
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13781 lnet: Local NI must be on same net as next-hop 52/39352/5
Chris Horn [Sun, 12 Jul 2020 15:47:55 +0000 (10:47 -0500)]
LU-13781 lnet: Local NI must be on same net as next-hop

When sending to a remote peer we need to restrict our selection of a
local NI to those on the same peer net as the next-hop.

The code currently selects a local NI on the peer net specified by the
lr_lnet field of the lnet_route returned by lnet_find_route_locked().
However, lnet_find_route_locked() may select a next-hop peer NI on any
local peer net - not just lr_lnet.

A redundant assignment to sd->sd_msg->msg_src_nid_param is also
removed. That variable is always set appropriately in
lnet_select_pathway().

Test-Parameters: trivial
HPE-bug-id: LUS-9095
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: If1bec26d6646b9e66b99656d7db2dc538d631a34
Reviewed-on: https://review.whamcloud.com/39352
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13107 utils: clean up lctl command usage 08/37108/4
Andreas Dilger [Sat, 28 Dec 2019 09:42:54 +0000 (02:42 -0700)]
LU-13107 utils: clean up lctl command usage

The lctl usage is confusing because it lists a number of valid
commands after "testing (DANGEROUS)", such as LFSCK and llog.

Move the useful commands before the "testing" section so it is
not mis-interpreted as all following commands are dangerous.
Group some other commands together with more related commands,
rather than whatever order they happened to be imlpemented in.

Remove function prototypes for commands that no longer exist.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I469f9c92953762cc46a68e44238c4b67ebacab07
Reviewed-on: https://review.whamcloud.com/37108
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11085 llite: reimplement range_lock with Linux interval_tree 26/39726/8
Mr NeilBrown [Tue, 25 Aug 2020 01:48:34 +0000 (11:48 +1000)]
LU-11085 llite: reimplement range_lock with Linux interval_tree

As a step towards removing the lustre interval-tree implementation,
reimplent range_lock to use Linux interval trees.

As Linux interval trees allow the same interval to be stored twice,
this allows the removal of the rl_next_lock list and associated code.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1d1669345fac0945e0e189b87a74ca8e7582e842
Reviewed-on: https://review.whamcloud.com/39726
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14618 lov: correctly handling sub-lock init failure 45/43345/2
Bobi Jam [Fri, 16 Apr 2021 15:56:01 +0000 (23:56 +0800)]
LU-14618 lov: correctly handling sub-lock init failure

In lov_lock_sub_init(), if a sublock initialization fails, it needs to
bail out of the outer loop as well as the inner one.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ic4e16f484a0a64c670eea5d47054bac19bc95144
Reviewed-on: https://review.whamcloud.com/43345
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14617 utils: llog_reader updatelog support 43/43343/4
Alexander Boyko [Fri, 16 Apr 2021 09:57:34 +0000 (05:57 -0400)]
LU-14617 utils: llog_reader updatelog support

The patch adds printing UPDATE_REC for llog_reader. It is usefull
for updatelog analyze. Here is an example of record

 [0x50001a21b:0x1233d:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
 [0x50001a211:0x475:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
 [0x3800182e3:0x475:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
 [0x200032c9a:0x245:0x0] type:xattr_set/7 params:3 p_0:0 p_1:1 p_2:2
 [0x200000001:0x15:0x0] type:write/12 params:2 p_0:3 p_1:4
 p_0 - 12/trusted.lov
 p_1 - 0/
 p_2 - 25972/\x0100000000000000000000000000000000000000000002000...
 p_3 - 25974/\x0800000000000000P\xD1AB006x0000000400EC^\x000000...
 p_4 - 1/

llog logic processing base on incrementing record index,
the fix adds checks for it. Also adds more info from header,
and drops useless - Bit X not set.

Test-Parameters: trivial
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Id50de15040526dc07ae708ac5db046832706be31
Reviewed-on: https://review.whamcloud.com/43343
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14606 llog: hide ENOENT for cancelling record 64/43264/5
Alexander Boyko [Mon, 12 Apr 2021 12:19:47 +0000 (08:19 -0400)]
LU-14606 llog: hide ENOENT for cancelling record

Llog allows parallel records processing. A record could be cancelled
at callback. If two threads processing and cancelling the same record,
one thread would get ENOENT.
The error was observed during purging changlog records.The patch
adds reproducer test sanity 160m.

This is a valid case, let's hide ENOENT error from a caller.

HPE-bug-id: LUS-9826
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Id00b959e6f329c2ad34966f8a17a52f71680f24c
Reviewed-on: https://review.whamcloud.com/43264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13937 osd: OST with isize < 256 stripe info stored in LMA 90/39790/7
Artem Blagodarenko [Tue, 1 Sep 2020 19:47:22 +0000 (15:47 -0400)]
LU-13937 osd: OST with isize < 256 stripe info stored in LMA

osd_xattr_set_pfid() expects XATTR_NAME_FID to be exist if
LMAC_STRIPE_INF is absent and delete it if LU_XATTR_REPLACE is set.
There are some cases, after upgrade, when LMAC_STRIPE_INF is absent
and there is no XATTR_NAME_FID.

No need return -ENODATA to the client, because having XATTR_NAME_FID
absent is what situation requires, so osd_xattr_set_pfid() can
continue to fill LMA.

Change-Id: If1690204912ba0875e2251264eb28286e930b3a7
hpe-bug-id: LUS-9106
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://review.whamcloud.com/39790
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14627 lnet: Allow delayed sends 16/43416/3
Chris Horn [Wed, 21 Apr 2021 19:22:46 +0000 (14:22 -0500)]
LU-14627 lnet: Allow delayed sends

The net_delay_add has some code related to delaying sends, but it
isn't fully implemented. Modify lnet_post_send_locked() to check
whether the message being sent matches a rule and should be delayed.

Fix some bugs with how the delay timers were set and checked.

HPE-bug-id: LUS-7651
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Icbd9ee81d2ff0162a01a4187807ea2114a42276d
Reviewed-on: https://review.whamcloud.com/43416
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13903 utils: separate out server code for lctl 41/42141/5
James Simmons [Wed, 7 Apr 2021 16:22:53 +0000 (12:22 -0400)]
LU-13903 utils: separate out server code for lctl

The utility lctl is used by both client and server management but
the code is not cleanly separated for each case. This breaks
building lctl for the native Linux Lustre client since it only
contains client related work. Fix up lctl to be slimed down for
client only targets.

Change-Id: I0c8c037087de1155e478a2580fc952629c341447
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/42141
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14344 mdc: make rpc set for MDS_STATFS interruptible 82/41282/10
Alex Zhuravlev [Wed, 20 Jan 2021 08:08:00 +0000 (11:08 +0300)]
LU-14344 mdc: make rpc set for MDS_STATFS interruptible

otherwise it ignores signals making imposible to interrupt
mount process with a signal which is checked by conf-sanity/23a

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I882c774f1143d03c248282ecf7a0e2079d0627f5
Reviewed-on: https://review.whamcloud.com/41282
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14397 ptlrpc: idle import vs lock enqueue race 03/41403/6
Alexander Boyko [Wed, 3 Feb 2021 11:04:52 +0000 (06:04 -0500)]
LU-14397 ptlrpc: idle import vs lock enqueue race

There is a window after ptlrpc_check_import_is_idle()
and setting LUSTRE_IMP_CONNECTING for lock enqueue.
The lock get granted on OST and is returned to the client.
Server's lock is destroyed on OST_DISCONNECT.

Perform import counters check with setting LUSTRE_IMP_CONNECTING.
A regression test_812c was added to sanity.

HPE-bug-id: LUS-8705
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I85da18b29ca58f811ecde8ce72ba24373388947e
Reviewed-on: https://review.whamcloud.com/41403
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14579 flr: mirror unlink and split race 69/43369/5
Bobi Jam [Fri, 2 Apr 2021 04:47:32 +0000 (12:47 +0800)]
LU-14579 flr: mirror unlink and split race

- protect lod_object::ldo_comp_entries during
  lod_obj_for_each_stripe(), since other thread could change the
  ldo_comp_entries at the same time.

- protect LOD in-memory layout during layout change
  layout_{add|set|del} and purge_mirror.

- fix lock-tx order in mdd_unlink: start the transaction and then
  take locks. (introduced in commit
  55d5235354d49aee0a330ad64beef4ed9004a27f)

- Add test case for mirror split and unlink race.

Fixes: 55d5235354 ("LU-14579 flr: GPF in lod_sub_declare_destroy")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ic54245c8755f660087fce46d1cad0ef7fa091245
Reviewed-on: https://review.whamcloud.com/43369
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14619 utils: mount.lustre max_sectors_kb help message 50/43350/2
Sebastien Piechurski [Fri, 16 Apr 2021 20:37:50 +0000 (22:37 +0200)]
LU-14619 utils: mount.lustre max_sectors_kb help message

Document the max_sectors_kb option in the usage message of
mount.lustre.

Test-Parameters: trivial=true
Signed-off-by: Sebastien Piechurski <sebastien.piechurski@atos.net>
Change-Id: I6fda43c17877b471fead4dbae1045cf82bd621d9
Reviewed-on: https://review.whamcloud.com/43350
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 years agoLU-13783 libcfs: add cfs_kallsyms_lookup_name() 96/43296/3
Jian Yu [Tue, 13 Apr 2021 05:12:27 +0000 (22:12 -0700)]
LU-13783 libcfs: add cfs_kallsyms_lookup_name()

The inline kallsyms_lookup_name() added by
commit d7249d9d70a caused the following failures:

libcfs/include/libcfs/linux/linux-misc.h:150:21:
error: conflicting types for ‘kallsyms_lookup_name’
  150 | static inline void *kallsyms_lookup_name(char *func)
      |                     ^~~~~~~~~~~~~~~~~~~~

include/linux/kallsyms.h:76:15:
note: previous declaration of ‘kallsyms_lookup_name’ was here
   76 | unsigned long kallsyms_lookup_name(const char *name);
      |               ^~~~~~~~~~~~~~~~~~~~

This patch removes the inline kallsyms_lookup_name() definition
from linux-misc.h and adds cfs_kallsyms_lookup_name() to wrap
kallsyms_lookup_name() if it is exported or return NULL in case of
kallsyms_lookup_name() is not exported.

Fixes: d7249d9d70a ("LU-13783 libcfs: provide fallback kallsyms_lookup_name()")
Change-Id: I4b2d4499948a8586b48db68484491ec76c3a609d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43296
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14592 lfsck: SEL files support 34/43234/5
Vitaly Fertman [Fri, 25 Dec 2020 23:12:50 +0000 (02:12 +0300)]
LU-14592 lfsck: SEL files support

lfsck works directly on the mdt, and therefore it can see SEL magic,
handle it properly in the lfsck code

check the extent range alignment by stripe size, as done in other
places, not by MIN_STRIPE_SIZE

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Iae9b0583f837f107eb0e5eb36eeb70643b92a12b
HPE-bug-id: LUS-9686, LUS-9639
Reviewed-on: https://es-gerrit.dev.cray.com/158202
Reviewed-on: https://es-gerrit.dev.cray.com/158287
Reviewed-on: https://es-gerrit.dev.cray.com/158216
Reviewed-on: https://es-gerrit.dev.cray.com/158463
Reviewed-by: Alexander Boyko <c17825@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/43234
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14566 lnet: Skip discovery in LNetPrimaryNID if DD disabled 41/43141/2
Chris Horn [Fri, 26 Mar 2021 16:28:18 +0000 (11:28 -0500)]
LU-14566 lnet: Skip discovery in LNetPrimaryNID if DD disabled

If discovery is disabled locally then the discovery thread will not
modify any peer objects as a result of the discovery process. Thus,
the primary NID of any peer we're asked to discover will not change
as a result of discovery. Therefore, we do not need to actually
perform discovery in LNetPrimaryNID() if discovery is disabled
locally. Since this routine can result in long client mount times
when a Lustre server is down we should avoid this unnecessary
discovery.

Test-Parameters: trivial
HPE-bug-id: LUS-9887
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6d188e16422ad47a146d52bb24cdd1b77a30aa71
Reviewed-on: https://review.whamcloud.com/43141
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14487 libcfs: remove references to Sun Trademark. 37/42137/6
Mr NeilBrown [Mon, 22 Mar 2021 22:22:53 +0000 (09:22 +1100)]
LU-14487 libcfs: remove references to Sun Trademark.

"lustre" is no longer a Trademark of Sun Microsystems.  There is no
need to acknowledge the trademark in every file, so just remove all
these claims.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ife96ed8a26fd56a3c91933ae0a6b96784a8cc70a
Reviewed-on: https://review.whamcloud.com/42137
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 years agoLU-14430 mdt: fix maximum ACL handling 13/42013/4
Mikhail Pershin [Sun, 28 Feb 2021 09:24:12 +0000 (12:24 +0300)]
LU-14430 mdt: fix maximum ACL handling

Having maximum ACL cause big reply buffer and in that case
server could return -ERANGE in mdt_pack_acl2body() expecting
a client to resend RPC with bigger buffer. The problem is
that even in that case server can return -ERANGE causing
userspace tool to get this error after all.

Instead of estimating reply sizes in mdt_pack_acl2body()
let's just rely on mdt_fix_reply() code which does buffer
grow when it is needed

- add more credits for osd_create in ldiskfs because it
  copies also default ACLs during create
- remove code returning -ERANGE in mdt_pack_acl2body() and
  rely on mdt_fix_reply() reply buffers grow
- test is added to create as many ACLs as possible

Test-Parameters: env=ONLY=103e testlist=sanity
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: If7af5c61f89ee1220d7982d4c61a7357051a811c
Reviewed-on: https://review.whamcloud.com/42013
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 years agoLU-8837 lustre: move lu_tgt_pool out of obd_target.h 51/41951/7
Mr NeilBrown [Sun, 18 Apr 2021 00:32:13 +0000 (20:32 -0400)]
LU-8837 lustre: move lu_tgt_pool out of obd_target.h

'struct lu_tgt_pool' is the only part of obd_target.h that is needed
on the client.
So move it and related declarations to lu_object.h.
Then obd_target.h does not need to be included by lu_object.h,
and it will only be included server-side;

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ia578d766fb7fae6ae2fbcfa651fe5b5264b1ff14
Reviewed-on: https://review.whamcloud.com/41951
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14428 libcfs: simplify task management in tracefile.c 92/41492/7
Mr NeilBrown [Tue, 9 Feb 2021 02:33:31 +0000 (13:33 +1100)]
LU-14428 libcfs: simplify task management in tracefile.c

The waitqueue, mutex, and two completions are not needed.
We can use kthread_stop/kthread_should_stop to synchronize
shutdown, cmpxchg() to ensure only one task is started, and a simple
wake_up_process() to wake the process.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I45390c2cb58b09a85033a7006412a0e2033130c5
Reviewed-on: https://review.whamcloud.com/41492
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy 85/40985/3
Chris Horn [Fri, 11 Dec 2020 18:04:32 +0000 (12:04 -0600)]
LU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy

The MR design dictates that the peer nets and peer NIs are ordered
such that the peer net and peer NI for a peer's primary NID appears
first, followed by other peer NIs in the primary NID's peer net,
followed by other peer nets/NIs. This ordering is broken and it can
result in tripping an assertion if the primary NID of a peer is
deleted. Modify lnet_peer_attach_peer_ni() to check whether the
NI being attached is the peer's primary, and place it, and its
associated peer net, appropriately.

Modify lnet_peer_set_primary_nid() so that it updates the
lp_primary_nid before calling lnet_peer_add_nid() so that
lnet_peer_attach_peer_ni() can detect the situation where the
primary is changing and act appropriately.

Finally, modify lnet_peer_merge_data() to enforce the hierarchy
after it has finished merging the contents of the ping buffer. This
ensures we maintain the correct hierarchy in certain edge cases where
we've needed to reconcile two peers. e.g. if a peer adds a new
interface, the discovery push may arrive from that new interface
which will result in a second peer object being created which will
need to be reconciled with the original peer object.

HPE-bug-id: LUS-9630
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8397a24ba1ba0bba33846e7e97b8d60a8f26a1be
Reviewed-on: https://review.whamcloud.com/40985
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14206 lnet: Router ping timeout with discovery disabled 23/40923/3
Chris Horn [Wed, 9 Dec 2020 20:38:57 +0000 (14:38 -0600)]
LU-14206 lnet: Router ping timeout with discovery disabled

Discovery pings are used to determine the health of gateways and
associated routes. Ping replies from gateways with dynamic discovery
(DD) disabled (or if DD is disabled locally) are handled in
a special routine, lnet_router_discovery_ping_reply(), but this
function and related code doesn't handle the case where a discovery
ping hits the response tracker timeout and is unlinked by the
monitor thread. In this case, an UNLINK event is generated and we
do not call the lnet_router_discovery_ping_reply(). For gateways
with DD enabled (and DD enabled locally), we handle this case
in lnet_router_discovery_complete(). If discovery failed then
lp_dc_error is set and we mark all routes down for the gateway. We
can simply extend this logic to the case of gateways w/DD disabled
(or DD disabled locally).

Test-Parameters: trivial
Fixes: 2832478194 ("LU-13029 lnet: fix asym routing with multi-hop")
HPE-bug-id: LUS-9612
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I009c69d4f8990b72d83d9426c782c0e55c1023a4
Reviewed-on: https://review.whamcloud.com/40923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13933 tests: remove some unnecessary back-slashes. 50/39750/6
Mr NeilBrown [Thu, 13 Aug 2020 05:16:24 +0000 (15:16 +1000)]
LU-13933 tests: remove some unnecessary back-slashes.

Various scripts use "\ " where the "\" isn't needed.

In an 'awk' pattern, using "\ " is wrong.
In the arg to 'tr', there is no need to '\' a space.

In shell code, <<" foo ">> is easier to read than <<\ foo\ >>.

Also:

In shell code <<name="value">> can be <<name=value>> if <<value>>
doesn't contain unquoted spaces.  If value does contain quoted <<">>,
the extra quotes make it hard to read.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I53d0522071016286b4ef9f576f4a4e46b1ba2d6d
Reviewed-on: https://review.whamcloud.com/39750
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13569 lnet: Deprecate lnet_recovery_interval 22/39722/16
Chris Horn [Fri, 21 Aug 2020 19:42:48 +0000 (14:42 -0500)]
LU-13569 lnet: Deprecate lnet_recovery_interval

We no longer use a static recovery interval, so remove its remaining
uses and add warning that it has been deprecated.

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iedc79803ef5b7ba041a531961eb77acd338abfb5
Reviewed-on: https://review.whamcloud.com/39722
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>