Whamcloud - gitweb
fs/lustre-release.git
4 months agoLU-14513 osd: release o_guard before quota acquisition 18/42018/11
Alex Zhuravlev [Fri, 12 Mar 2021 06:17:11 +0000 (09:17 +0300)]
LU-14513 osd: release o_guard before quota acquisition

to avoid deadlocks as regular transactions (like write) start
a transaction, then grab o_guard.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2678677ed6c213e4bed30cc1218e48b8f2900dc4
Reviewed-on: https://review.whamcloud.com/42018
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-930 utils: add --help option to lfs sub-commands 59/34659/21
Andreas Dilger [Tue, 21 Jan 2020 10:29:36 +0000 (03:29 -0700)]
LU-930 utils: add --help option to lfs sub-commands

Add the "--help" and "-h" options to lfs sub-commands, and
print out an error message if an invalid argument is given.
Otherwise, it is possible to get a help message but have no
idea why the command is failing (e.g. typo in argument name).

Format the usage messages consistently, using {} to indicate a
choice between multiple required parameters, putting arguments
in [] for optional parameters, and using capitalized arguments.

Update respective man pages to list "--help|-h" option.

Remove the old SETSTRIPE and GETSTRIPE checks from spelling.txt
to avoid spurious checkpatch warnings.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic583c8161d1d5380e353f43a8613dd86c93ebbe5
Reviewed-on: https://review.whamcloud.com/34659
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14489 utils: fix 'lfs find --mdt-count' 66/43866/3
Andreas Dilger [Fri, 28 May 2021 21:15:10 +0000 (15:15 -0600)]
LU-14489 utils: fix 'lfs find --mdt-count'

Running "lfs find --mdt-count" causes the find to exit if there
is no directory striping, rather than continuing to the next item.

If cb_get_dirstripe() receives ENODATA then it should consider
that directory as not having any striping and move on, rather
than returning this error to the caller.

Don't crash in cb_getdirstripe() if it is called with a NULL
directory pointer or no directory is opened.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If8dd135a86a6a8911bf804542132b2e7a3ce7057
Reviewed-on: https://review.whamcloud.com/43866
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12577 llog: protect partial updates from readers 89/43589/8
Alex Zhuravlev [Sun, 9 May 2021 06:32:55 +0000 (09:32 +0300)]
LU-12577 llog: protect partial updates from readers

llog_osd_write_rec() adds a record in few steps: the header is
updated first, then the record itself is appended. per-loghandle
semaphore is used, but remote readers allocate a new separate
loghandle for every access (header reading, blocks), the the
readers can't use loghandle's semaphore to avoid accessing partial
updates. use object-based locking [censored] to serialize the writer
vs the readers.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie4e4d4a1e9a6fcdea9fcca7d80b0da920e786424
Reviewed-on: https://review.whamcloud.com/43589
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13124 scrub: check for multiple linked file 94/37194/31
Hongchao Zhang [Tue, 8 Jun 2021 21:50:34 +0000 (05:50 +0800)]
LU-13124 scrub: check for multiple linked file

The files on OSTs should have only one link, but it could
have more than one link when there are some disk failures
"multiply claimed block(s)" and fixed by e2fsck to clone
these conflicted blocks. This patch adds the check of these
multiple linked files in Scrub on OST.

The name of the objects in "O" depends on the object's FID,
the directory pattern is O/[FID_SEQ]/[SUB_DIR]/[FID_OID],
the inodes of these multiple linked files are normal, but
there is only one directroy entry compatible with the object,
this patch scans all files under "O" to check whether its name
is matched with its FID.

Change-Id: I280a725939b037006935d47e9ef426a4a6a7b317
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37194
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14627 lnet: Ensure ref taken when queueing for discovery 18/43418/9
Chris Horn [Thu, 22 Apr 2021 19:51:44 +0000 (14:51 -0500)]
LU-14627 lnet: Ensure ref taken when queueing for discovery

Call lnet_peer_queue_for_discovery() in
lnet_discovery_event_handler() to ensure that we take a ref on
the peer when forcing it onto the discovery queue. This also ensures
that the peer state has LNET_PEER_DISCOVERING.

Add a test to sanity-lnet.sh that can trigger the refcount loss bug
in discovery.

HPE-bug-id: LUS-7651
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie2908668c4ffde0f993b5b7ea9aa58acd1d6fa9c
Reviewed-on: https://review.whamcloud.com/43418
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13569 tests: Check LNet Health recovery logic 23/39723/24
Chris Horn [Mon, 24 Aug 2020 21:14:07 +0000 (16:14 -0500)]
LU-13569 tests: Check LNet Health recovery logic

Add test cases to validate LNet Health recovery of local and peer
NIs.

The new test cases are added to the except list for aarch64 due to
an unresolved issue with the LNet drop functionality on that
architecture.

A few style issues are also addressed by this patch.

An asterisk was being supplied to the lctl net_drop_del commands when
this should have been the '-a' flag.

A bug in cleanup_testsuite is addressed where we were using the
wrong filename for the tmp files created by the subtests.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I965df2449770631caa03ced7726abb0ea76c17e6
Reviewed-on: https://review.whamcloud.com/39723
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13569 lnet: Add health ping stats 14/40314/12
Chris Horn [Thu, 15 Oct 2020 22:33:33 +0000 (17:33 -0500)]
LU-13569 lnet: Add health ping stats

Add the NI and peer NI ping count and next ping timestamp to
detailed output of lnetctl peer and net output.

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I208cb3ea0b08a2984572cf0ec9874dbd09f6168e
Reviewed-on: https://review.whamcloud.com/40314
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14729 osd-ldiskfs: declare dirty block groups correctly 90/43890/2
Wang Shilong [Wed, 2 Jun 2021 01:52:39 +0000 (09:52 +0800)]
LU-14729 osd-ldiskfs: declare dirty block groups correctly

Calculate dirty block groups only include estimated extents,
indirect blocks and extent node/leaf blocks are missed, this
could make us short of credits.

Fixes: 0271b17b80a82 ("LU-14134 osd-ldiskfs: reduce credits for new writing")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Iec8525823b04e909c030f94bf75b8eca60d31c50
Reviewed-on: https://review.whamcloud.com/43890
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14093 llapi: remove ignored qualifier 12/43712/3
Dominique Martinet [Sat, 15 May 2021 22:32:53 +0000 (07:32 +0900)]
LU-14093 llapi: remove ignored qualifier

Fixes the following warning on newer gcc with -Wextra:
.../include/lustre/lustreapi.h:1000:1: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
 1000 | const __u16 llapi_layout_string_flags(char *string);
      | ^~~~~

As the parameter is ignored, this should make no code difference

Test-parameters: trivial

Change-Id: I049166bbc586007cdecc93225d508693607ef04e
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-on: https://review.whamcloud.com/43712
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14093 utils: fix format-overflow warning 11/43711/3
Dominique Martinet [Sat, 15 May 2021 22:32:47 +0000 (07:32 +0900)]
LU-14093 utils: fix format-overflow warning

Fix the following warning on gcc11 by making numbuf big enough to fit
format content.

lfs.c: In function ‘print_quota’:
lfs.c:7719:48: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=]
 7719 |                         sprintf(numbuf[0], "%s*", strbuf);
      |                                                ^
lfs.c:7719:25: note: ‘sprintf’ output between 2 and 33 bytes into a destination of size 32
 7719 |                         sprintf(numbuf[0], "%s*", strbuf);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test-parameters: trivial

Change-Id: I021e6ffff2e1405eadbe689f718674af4d4d6376
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-on: https://review.whamcloud.com/43711
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
4 months agoLU-13811 client: don't panic for mgs evictions 55/43655/3
Alexander Boyko [Tue, 11 May 2021 09:33:36 +0000 (05:33 -0400)]
LU-13811 client: don't panic for mgs evictions

Avoid client panics for MGS evictions.
Create a function to check if the eviction is coming
from an MGS, and if so to ignore it.

Rework dump_on_eviction and lbug_on_eviction so
all logic is handled in one place.

Test-Parameters: trivial
HPE-bug-id: LUS-197
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Signed-off-by: Ben Evans <jevans@cray.com>
Change-Id: Iaa8b06f52fa22ac891b569bc8a2271c8e1e63a3b
Reviewed-on: https://review.whamcloud.com/43655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13971 quota: report Pool Quotas for a user 75/39975/12
Sergey Cheremencev [Fri, 18 Sep 2020 13:24:10 +0000 (16:24 +0300)]
LU-13971 quota: report Pool Quotas for a user

Patch adds ability to show quota limits and usage
from all pools per user. Since this patch
long option --pool without argument results
in printing Pool Quotas for all known pools:
lfs quota -u quota_usr --pool /mnt/testfs
Pools from lustre:
Quotas for pool: qpool1
Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre       0       0   10240       -       0       0       0       -
Quotas for pool: qpool2
Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre       0       0   20480       -       0       0       0       -

To get information for specific pool you still
need to set pool name after --pool:
lfs quota -u quota_usr --pool flash /mnt/testfs

Patch also adds sanity-quota_74 to check new
feature.

Test-Parameters: trivial testlist=sanity-quota
HPE-bug-id: LUS-8720
Change-Id: Ib918eef84c2352946ce13342471f36e2b500df32
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/39975
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13419 osc: Move shrink update to per-write 14/38214/6
Patrick Farrell [Mon, 13 Apr 2020 16:23:42 +0000 (11:23 -0500)]
LU-13419 osc: Move shrink update to per-write

Updating the grant shrink interval is currently done for
each page submitted, rather than once per write.  Since
the grant shrink interval is in seconds, this is
unnecessary.

This came up because this function showed up in the perf
traces for https://review.whamcloud.com/#/c/38151/, and
it is called with the cl_loi_list_lock held.

Note that this change makes this access to the grant shrink
interval a 'dirty' access, without locking, but the grant
shrink interval is:
A) Already accessed like this in various places, and
B) can safely be out of date or suffer a lost update
without affecting correctness or performance.

IOR performance testing with this test:
mpirun -np 36 $IOR -o $LUSTRE -w -t 1M -b 2G -i 1 -F

No patches:
5942 MiB/s
With 38151:
14950 MiB/s
With 38151+this:
15320 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I8110b3c2570c183d58be2bccdbf76813ea3e373a
Reviewed-on: https://review.whamcloud.com/38214
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14732 utils: ensure hsm_user_request extent fields are zero 93/43893/3
James Simmons [Tue, 1 Jun 2021 23:00:58 +0000 (19:00 -0400)]
LU-14732 utils: ensure hsm_user_request extent fields are zero

Another patch changed the linking flags for liblustreapi which
caused sanit-hsm 29c to fail. Debugging revealed the reason was
the extent fields in hsm_user_request sent to the kernel was
not zeroed out. For some reason the compiler flags hide this
bug. I found this is easily reproduced by removing
lhsmtool_posix_DEPENDENCIES which is not needed anyways since
the build system can calculate those dependencies for us.
Update the function llapi_hsm_user_request_alloc to zero out
the struct hsm_user_request allocated.

Change-Id: I02c1d6b5cec1ed3e89086a6a00e30ca81768409c
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43893
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10350 lod: adjust stripe count to available ost count 82/43882/4
Bobi Jam [Fri, 28 May 2021 08:25:52 +0000 (16:25 +0800)]
LU-10350 lod: adjust stripe count to available ost count

* When user specifies -1 stripe count or more stripe count than the
  ost count of a pool, we'd adjust the stripe count otherwise we
  cannot alloc enough stripe objects, as LOD reports as follows:

  lod_alloc_specific() can't lstripe objid [obj_fid]: have %d want %u

  where %d is the ost count of a pool, and %u is the total ost count
  if user specifies -1 stripe count of a bigger stripe count value
  than %d as user specifies.

* In ost-pool.sh, reset $MOUNT's stripe offset, so that the created
  diretory will not inherit it from root directory.

* Preserve the root directory layout in replay-single (run before
  ost-pools) to avoid leaving a bad layout on the root dir.
  Lustre-change: https://review.whamcloud.com/43872

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idf6884faf1271a3864710aeab0ba0eca154bf492
Reviewed-on: https://review.whamcloud.com/43882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14711 osc: Notify server if cache discard takes a long time 57/43857/7
Oleg Drokin [Fri, 28 May 2021 02:34:44 +0000 (22:34 -0400)]
LU-14711 osc: Notify server if cache discard takes a long time

Discarding a large number of pages from a mapping under a
single lock can take a really long time (750GB is over 170s).
Since there is no stream of RPCs sent to the server as with
read or write to prolong the DLM lock timeout, the server
may evict the client as it does not see progress is being made.

As such send periodic "empty" RPCs to the server to show the
client is still alive and working on the pages under the lock.

For compatibility reasons the RPC is formed as a one-byte
OST_READ request with a special flag set to avoid doing
actual IO, but older servers actually do the one-byte read

Change-Id: I4603c83e92c328d93e29adce8cbfac3d561b25d5
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43857
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
4 months agoLU-14721 tests: wait_destroy_complete should check MDTs 70/43870/3
Oleg Drokin [Sat, 29 May 2021 03:45:20 +0000 (23:45 -0400)]
LU-14721 tests: wait_destroy_complete should check MDTs

Ever since destroys handling was moved to MDTs we need to
move waiting for destroys completion to MDTs as well.

Change-Id: I31440ec048b960206a903387d7050aa13e45008d
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43870
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoNew tag 2.14.52 2.14.52 v2_14_52
Oleg Drokin [Fri, 11 Jun 2021 17:09:09 +0000 (13:09 -0400)]
New tag 2.14.52

Change-Id: I9882f84941588ab2a92f1d736559a6e903b32d49
Signed-off-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13783 procfs: fix improper prop_ops fields 80/43880/3
James Simmons [Mon, 31 May 2021 17:05:50 +0000 (13:05 -0400)]
LU-13783 procfs: fix improper prop_ops fields

The lod pool and nodemap proc_ops missed renaming the fields to
start with .proc_*. On newer distros like Ubuntu 20.04 HWE you
get the following compile error:

lustre-release/lustre/ptlrpc/nodemap_lproc.c:686:3: error: ‘const struct proc_ops’ has no member named ‘open’
  686 |  .open   = nodemap_ranges_open,

Test-Parameters: trivial
Fixes: 13cd0f9f667 ("LU-13344 libcfs: Abstract proc_fs with proc_ops")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I5fff7519a801f585690d468255f7ca6c73adcc90
Reviewed-on: https://review.whamcloud.com/43880
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14725 tests: sanity/27Q to remove own symlink in /tmp 75/43875/5
Alex Zhuravlev [Sun, 30 May 2021 15:36:39 +0000 (18:36 +0300)]
LU-14725 tests: sanity/27Q to remove own symlink in /tmp

otherwise any subsequent restart of MDS/MGS on a local setup
with ZFS backend gets stuck as zpool import scans /tmp and
stat's every found file.

Test-Parameters: trivial
Fixes: cd4caef54f ("LU-14583 llapi: handle symlinks in llapi_file_get_stripe()")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2eb4cb8819670acef0302e1fe5ab767be7f46842
Reviewed-on: https://review.whamcloud.com/43875
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14607 osp: separate buffer for large XATTR 36/43736/6
Lai Siyao [Wed, 19 May 2021 02:58:19 +0000 (10:58 +0800)]
LU-14607 osp: separate buffer for large XATTR

Once XATTR is too large to fit into PAGE_SIZE, allocate value in a
separate buffer for osp_xattr_entry.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ied090ff73e2e5cdeaf2d91a3670067210f2ab1d7
Reviewed-on: https://review.whamcloud.com/43736
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14661 lnet: Check if discovery toggled off in ping reply 08/43508/4
Chris Horn [Wed, 27 Jan 2021 18:22:09 +0000 (12:22 -0600)]
LU-14661 lnet: Check if discovery toggled off in ping reply

If a peer is initially discovered and found to have discovery
enabled, but the peer later reloads LNet with discovery disabled,
then we can delete the peer and re-create it the next time the peer
is discovered.

It is safe to delete and re-create the peer as long as it wasn't
configured manually.

In lnet_peer_deletion(), we need to use lnet_del_init() when removing
the peer from the discovery queue because the lnet_peer_del() code
path can result in a call to lnet_peer_queue_for_discovery() where
we check if the lp_dc_list is empty.

Test-Parameters: trivial
HPE-bug-id: LUS-9178
Fixes: aa7de0af69 ("LU-13895 lnet: Prevent discovery on peer marked deletion")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0b43d7541711a3b94c492082d4a29487ebe72b09
Reviewed-on: https://review.whamcloud.com/43508
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14660 lnet: Fix destination NID for discovery PUSH 07/43507/2
Chris Horn [Fri, 29 Jan 2021 14:08:08 +0000 (17:08 +0300)]
LU-14660 lnet: Fix destination NID for discovery PUSH

If we're sending a discovery PUSH after receiving a discovery
REPLY then we want to send via the same NID that the reply was
sent to. This introduces a challenge in selecting an appropriate
destination NID for the PUSH because lnet_select_pathway() will not
run the MR selection algorithm for choosing a peer NI if the source
NI has been specified.

It is reasonable to assume that the NID used by the message
originator in sending the REPLY is a suitable destination for the
discovery PUSH. Thus, we record this NID in the same location we
currently record the lp_disc_src_nid, and use it when sending the
PUSH. With this change, the only other user of lnet_peer_select_nid()
is lnet_peer_send_ping(). In the ping case we do not set a source NID,
so lnet_select_pathway() is free to choose any peer NI. So this change
allows us to get rid of lnet_peer_select_nid() altogether.

Alternatively, we would need to reproduce a lot of the path selection
algorithm inside lnet_peer_select_nid() in order to avoid sending to
unhealthy NIDs. It seems undesirable and unnecessary to duplicate that
logic.

Test-Parameters: trivial
HPE-bug-id: LUS-9333
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I47ef856075f049d71c395565974204b8f6fa9003
Reviewed-on: https://review.whamcloud.com/43507
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14432 misc: update e2fsprogs to 1.46.2.wc1 69/43469/2
Li Dongyang [Tue, 27 Apr 2021 23:31:59 +0000 (09:31 +1000)]
LU-14432 misc: update e2fsprogs to 1.46.2.wc1

Update Changelog for the new e2fsprogs release.

Change-Id: I173c43f1c777b7223a56841a06545c1741e1a903
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/43469
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
4 months agoLU-11839 iokit: Fix help message 14/43114/2
Arshad Hussain [Thu, 25 Mar 2021 10:50:59 +0000 (16:20 +0530)]
LU-11839 iokit: Fix help message

This patch fixes help message of iokit-gather-stats
to properly add "--help" instead of "-help". Two
hyphen/dashes are expected by getopts.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I64270598fc19377571b68066d617b50fcb48cc12
Reviewed-on: https://review.whamcloud.com/43114
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14544 tests: duplicated export entries 30/42130/5
Elena Gryaznova [Mon, 22 Mar 2021 16:55:02 +0000 (19:55 +0300)]
LU-14544 tests: duplicated export entries

File /etc/exports could have $MNTPNT entry if
previous NFS tests interrupted.
With duplicated entries in /etc/exports nfs server
service fails to start/restart in RHEL 8 and SLES15.
Patch cleanups exports file before adding $MNTPNT entry.

Test-Parameters: trivial     clientdistro=el8.3 serverdistro=el8.3     testlist=parallel-scale-nfsv3,parallel-scale-nfsv4
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-6291
Reviewed-by: Alexander Lezhoev <alexander.lezhoev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I738bd0e8e79dc1ba84e6aa70e06fa47c49a935e0
Reviewed-on: https://review.whamcloud.com/42130
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14274 tests: enhance racer to set extra layout 77/41077/3
Elena Gryaznova [Mon, 24 May 2021 16:45:12 +0000 (19:45 +0300)]
LU-14274 tests: enhance racer to set extra layout

Patch adds an ability to set:
  - a generic "RACER_EXTRA_LAYOUT" contained any kind of
    layout in addition to layouts defined for
    RACER_ENABLE_*;
  - an initial racer RACER_PROGS commands list.
    The additional commands specified by RACER_EXTRA,
    RACER_ENABLE_REMOTE_DIRS, RACER_ENABLE_STRIPED_DIRS
    and RACER_ENABLE_MIGRATION are not ignored, i.e. the
    following parameters are to be set to run file_create only:
        RACER_ENABLE_REMOTE_DIRS=false
        RACER_ENABLE_STRIPED_DIRS=false
        RACER_ENABLE_MIGRATION=false.
Patch fixes NUM_THREADS and MAX_FILES to be passed correctly.

Test-Parameters: testlist=racer
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9142
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I9994dfdb0555a3acd75daa4cfd27a0cb62074e36
Reviewed-on: https://review.whamcloud.com/41077
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14138 ptlrpc: move more members in PTLRPC request into pill 69/40669/11
Qian Yingjin [Tue, 17 Nov 2020 15:12:44 +0000 (23:12 +0800)]
LU-14138 ptlrpc: move more members in PTLRPC request into pill

Some data members in the data structure @ptlrpc_request can be
moved into the data structure @rep_capsule:
/** Request message - what client sent */
struct lustre_msg *rq_reqmsg;
/** Reply message - server response */
struct lustre_msg *rq_repmsg;
/** Fields that help to see if request and reply were swabbed */
__u32 rq_req_swab_mask;
__u32 rq_rep_swab_mask;

After these data structures are reconstructed, @rep_capsule can
be more common used and it makes pack and unpack sub requests
in a batch PtlRPC request for the coming batch metadata processing
more easily.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ib6d942b79ebf1a444d63b55ad4bc94813cf947c7
Reviewed-on: https://review.whamcloud.com/40669
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months agoLU-14459 lmv: change default hash type to crush 84/43684/5
Andreas Dilger [Thu, 13 May 2021 01:20:04 +0000 (19:20 -0600)]
LU-14459 lmv: change default hash type to crush

Change the default hash type to CRUSH to minimize the number
of directory entries that need to be migrated.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I75aff45898044be9d12ae1bfad31b4693b3ebbe5
Reviewed-on: https://review.whamcloud.com/43684
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14690 kernel: new kernel [RHEL 8.4 4.18.0-305.3.1.el8_4] 25/43725/6
Jian Yu [Fri, 4 Jun 2021 07:37:18 +0000 (00:37 -0700)]
LU-14690 kernel: new kernel [RHEL 8.4 4.18.0-305.3.1.el8_4]

This patch makes changes to support new RHEL 8.4 release
for Lustre client.

Test-Parameters: trivial clientdistro=el8.4

Change-Id: I47d4706f9175d489ef0e6226492af20f44f0677e
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43725
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14703 build: fixup lustre-libcfs.m4 comments 76/43776/2
Olaf Faaland [Tue, 25 May 2021 02:10:36 +0000 (19:10 -0700)]
LU-14703 build: fixup lustre-libcfs.m4 comments

Several macros whose ends are commented to identify the macro being
defined, like

.. # LIBCFS_FUBAR

have the wrong macro named in the comment.  Fix those
end comments so they match the opening comment or the
AC_DEFUN correctly.

Test-Parameters: trivial
Change-Id: Ia40ccb9e271e90306df37d0028734a84684e42ef
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/43776
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14682 tests: sanity-flr to remove temporary files 69/43669/4
Alex Zhuravlev [Wed, 12 May 2021 04:33:02 +0000 (07:33 +0300)]
LU-14682 tests: sanity-flr to remove temporary files

otherwise the test fails frequently on small local systems due
to lack of space.

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7076bcf2346ae1ec7a4d1bead3d94b2c4bb57bbf
Reviewed-on: https://review.whamcloud.com/43669
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14665 lnet: simplify lnet_ni_add_interface 25/43525/12
Olaf Faaland [Tue, 4 May 2021 02:40:22 +0000 (19:40 -0700)]
LU-14665 lnet: simplify lnet_ni_add_interface

Remove an unnecessary counter and move the comment before
the relevant code.  Improve error messages.

Test-parameters: trivial

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Iffc7a128b16bc1b2be7a44413a5972c97b12a5fa
Reviewed-on: https://review.whamcloud.com/43525
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13284 tests: few tests miss MDS_MOUNT_OPTS/OST_MOUNT_OPTS 69/37669/18
Alex Zhuravlev [Thu, 20 Feb 2020 11:57:08 +0000 (14:57 +0300)]
LU-13284 tests: few tests miss MDS_MOUNT_OPTS/OST_MOUNT_OPTS

Some tests mount servers without MDS_MOUNT_OPTS or OST_MOUNT_OPTS,
then localrecov mount option is lost and subsequent tests may fail
in a local testing environment.

Fixes: 8bd04b4e57 ("LU-12722 target: disable recovery for local clients")

Change-Id: I4e5d3a8678d027809ea9a0d129fbfbc8c6beae09
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37669
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14702 osc: cleanup comment in osc_object_is_contended 75/43775/3
Li Xi [Tue, 25 May 2021 00:49:01 +0000 (08:49 +0800)]
LU-14702 osc: cleanup comment in osc_object_is_contended

ll_file_is_contended() does not exist any more, so the comment
is invalid.

Test-Parameters: trivial
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: Ib68e8dc885e6812065c076d36dc61938a30d6980
Reviewed-on: https://review.whamcloud.com/43775
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14701 tests: wrong get[set]_osd_param() call 74/43774/2
Elena Gryaznova [Mon, 24 May 2021 15:14:26 +0000 (18:14 +0300)]
LU-14701 tests: wrong get[set]_osd_param() call

sanity-dom:sanityn:test_19 is always skipped because of
get_osd_param() is called incorrectly for DOM=yes.
For osd-* MDT device is to be used instead of default
  device=${2:-$FSNAME-OST*}

Fixes: a7625cd2f37a ("LU-3285 test: add Data-on-MDT tests and fixes")
Test-Parameters: trivial testlist=sanity-dom env=ONLY=sanityn
Test-Parameters: trivial testlist=sanityn env=ONLY=19
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9965
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I2bb9fc7fbaac966ea2254071e7ea82b963a93ad3
Reviewed-on: https://review.whamcloud.com/43774
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14693 mdt: skip DLM when opening volatile files 42/43742/2
John L. Hammond [Wed, 28 Apr 2021 18:43:51 +0000 (13:43 -0500)]
LU-14693 mdt: skip DLM when opening volatile files

In mdt_reint_open(), when opening a volatile file skip taking a
MDS_INODELOCK_UPDATE lock on the parent directory.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8ee89710f52e8097e1412897de91159702560e4a
Reviewed-on: https://review.whamcloud.com/43742
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13055 libcfs: allow comma-separated masks 41/43741/4
Andreas Dilger [Wed, 19 May 2021 07:44:32 +0000 (01:44 -0600)]
LU-13055 libcfs: allow comma-separated masks

For debug and changelog mask names, allow a comma-separated list
of names to be given, so that the space-separated list does not
need to be quoted for use.

Change sanity-quota to use a comma-separated list to verify it works.

Fix a couple of test cases where the debug parameter is set and
printed overly verbosely during tests.

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icf1e3ebc74f0e48b38a65486b2275ec4c33ebbe5
Reviewed-on: https://review.whamcloud.com/43741
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14688 mdt: changelog purge deletes plain llog 19/43719/2
Alexander Boyko [Mon, 17 May 2021 13:29:01 +0000 (09:29 -0400)]
LU-14688 mdt: changelog purge deletes plain llog

With a massive cancel records changelog could delete a plain
llog file and skip one by one record cancelling.
Also patch fixes the race between llog_destroy and llog_next_block.

HPE-bug-id: LUS-9950
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I47c2ed97945e979745255381f83b6a417d7ba8b1
Reviewed-on: https://review.whamcloud.com/43719
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-11188 lfs: add "--perm" option to "lfs find" 15/43715/4
Courrier Guillaume [Thu, 29 Apr 2021 09:35:01 +0000 (11:35 +0200)]
LU-11188 lfs: add "--perm" option to "lfs find"

Add support for "--perm" option to "lfs find".
The option supports both octal and symbolic representation and
follows the POSIX standard.
As for GNU find, it supports '-' and '/' modifiers before the
permission.

Signed-off-by: Guillaume Courrier <guillaume.courrier@cea.fr>
Change-Id: I8e1292421986c3a4bde686f3c7dc7bfcb679cabc
Reviewed-on: https://review.whamcloud.com/43715
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
4 months agoLU-9537 utils: implement "lfs getstripe --fid" for directories 14/43714/4
Yoann Valeri [Wed, 12 May 2021 09:48:04 +0000 (11:48 +0200)]
LU-9537 utils: implement "lfs getstripe --fid" for directories

Enhance the lfs command by displaying a directory fid when using "lfs
getstripe --fid" on one.

When displaying information through "lfs getstripe --fid", we would
check if the given path was associated to a directory or not. If so,
the fid display would just be skipped, showing a simple blank line.
However, a user could still find the fid of a directory by using "lfs
path2fid" on the same directory.  Therefore, this patch adds a hook to
the underlying "llapi_fd2fid()" (called internally by "lfs path2fid")
when trying to display a directory fid with "lfs getstripe --fid".

Signed-off-by: Yoann Valeri <yoann.valeri@cea.fr>
Change-Id: Ia153717e3feb1a359b8b54297995365fc34a1c29
Reviewed-on: https://review.whamcloud.com/43714
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12678 lnet: use list_for_each_entry() 91/43591/4
James Simmons [Mon, 10 May 2021 20:10:29 +0000 (16:10 -0400)]
LU-12678 lnet: use list_for_each_entry()

Several loops use list_for_each(), then call list_entry()
each time in the loop This complexity can be replaced with
the use of  list_for_each_entry().

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Ib7968466c4fce5173b20cbaf6c878975ba522d43
Reviewed-on: https://review.whamcloud.com/43591
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12836 osd-zfs: Catch all ZFS pool change events 52/43552/3
Tony Hutter [Fri, 12 Mar 2021 01:23:16 +0000 (17:23 -0800)]
LU-12836 osd-zfs: Catch all ZFS pool change events

This change adds the following symlinks:

  vdev_attach-lustre -> statechange-lustre.sh
  vdev_remove-lustre -> statechange-lustre.sh
  vdev_clear-lustre -> statechange-lustre.sh

This makes it so the statechange-lustre.sh script is also called on
all ZFS events that could change the pool state.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Change-Id: I18edc86749e8ab91bb45f21aafd3fd47e78cbaef
Reviewed-on: https://review.whamcloud.com/43552
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14663 mdc: start changelog thread upon first access 13/43513/5
Alex Zhuravlev [Sun, 2 May 2021 09:16:01 +0000 (12:16 +0300)]
LU-14663 mdc: start changelog thread upon first access

thus leaving the caller a chance to set CHANGELOG_FLAG_FOLLOW,
otherwise the thread (started from open()) can reach the end
of the changelog and exit early.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ic14b6c991010bbe5197b5a8b0fedf0f4007e98c1
Reviewed-on: https://review.whamcloud.com/43513
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13717 sec: limit hard links to linkEA size for enc files 87/43387/4
Sebastien Buisson [Mon, 19 Oct 2020 14:23:05 +0000 (23:23 +0900)]
LU-13717 sec: limit hard links to linkEA size for enc files

Some operations on encrypted files require to identify all names for
files having the same FID. For instance, for lookup, getattr or unlink
on encrypted files without the encryption key, we need to perform an
operation by FID instead of the actual name.
In order to make operations by FID unambiguous on server side, we
decide to limit the number of possible hard links for encrypted files,
to what the linkEA can contain.
Currently linkEA stores 4KiB of links, that is 14 NAME_MAX links, or
119 16-byte names.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I20a01874899f95b2ff61e05b2aa6851d135633e8
Reviewed-on: https://review.whamcloud.com/43387
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14629 sec: forbid file rename from enc to unencrypted dir 04/43404/6
Sebastien Buisson [Thu, 22 Apr 2021 09:26:51 +0000 (11:26 +0200)]
LU-14629 sec: forbid file rename from enc to unencrypted dir

fscrypt allows renaming an encrypted file from an encrypted directory
into an unencrypted directory. But it leaves the file encrypted,
sitting in an unencrypted directory, which can lead to unexpected
issues.
So just prevent this kind of rename, and adapt sanity-sec test_47
accordingly.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I38e17caa4786c1c8d80a363a826a5aa298eb0980
Reviewed-on: https://review.whamcloud.com/43404
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14586 tests: set mpi np correctly 17/43217/2
Elena Gryaznova [Tue, 6 Apr 2021 08:05:02 +0000 (11:05 +0300)]
LU-14586 tests: set mpi np correctly

The number of mpi processes is to be calculated
based on the number of clients in clients subset.

Fixes: 9ecb000 ("LU-13281 tests: ha.sh improvements")
Test-Parameters: trivial
Signed-off-by: Elenai Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9716
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: If574743e2e29a309a8d7a021056fa726495fa959
Reviewed-on: https://review.whamcloud.com/43217
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14340 tests: remove stale test-framework functions 66/41266/3
Andreas Dilger [Tue, 19 Jan 2021 04:43:57 +0000 (21:43 -0700)]
LU-14340 tests: remove stale test-framework functions

Delete functions not referenced anywhere in test-framework.sh
(each one will still appear only once, where it is defined):

  $ grep "^[a-z].* *()" lustre/tests/test-framework.sh |
  sed -e 's/function //' -e 's/ *(.*//' |
  while read F; do (( $(grep $F lustre/tests/*.sh | wc -l) > 1 )) ||
      echo "$F"
  done

  mdsdevlabel ostdevlabel cleanup_check obd_name unmount_zfs
  is_empty_fs get_svr_devs at_min_get canonical_path agts_nodes
  mixed_ost_devs setstripe_nfsserver delayed_recovery_enabled
  mds_on_old_device remove_ost_objects remove_mdt_files
  duplicate_mdt_files get_block_size

The unmount_zfs() function returned by the above check *is*
used via unmount_fstype() calling it as "unmount_$fstype".

Fixes: 5a3dfc2b5d90 ("LU-7301 tests: delete old lfsck tests")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I71842152a87f15918147da860745ef8e981f6121
Reviewed-on: https://review.whamcloud.com/41266
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13950 lnet: do not crash if lnet_sock_getaddr returns error 34/39834/6
Artem Blagodarenko [Tue, 25 Aug 2020 10:01:11 +0000 (06:01 -0400)]
LU-13950 lnet: do not crash if lnet_sock_getaddr returns error

Some issues with network lead to panic in ksocknal_accept

rc = lnet_sock_getaddr(sock, true, &peer_ip, &peer_port);
LASSERT(rc == 0); /* we succeeded before */

Let's pass this error to the caller.

Change-Id: I34d43c19b4e75422db50e7abb02cac3510882b0d
hpe-bug-id: LUS-9256
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157753
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-on: https://review.whamcloud.com/39834
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14687 llite: Return errors for aio 22/43722/7
Patrick Farrell [Wed, 19 May 2021 18:08:57 +0000 (14:08 -0400)]
LU-14687 llite: Return errors for aio

The aio code incorrectly discards errors from
ll_direct_rw_pages.  Fix this and add a test for this.

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I49dadd0b3692820687fa6a1339e00516edf7a5d5
Reviewed-on: https://review.whamcloud.com/43722
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-11290 osc: Batch gang_lookup cbs 89/33089/10
Patrick Farrell [Wed, 3 Mar 2021 06:50:04 +0000 (09:50 +0300)]
LU-11290 osc: Batch gang_lookup cbs

The osc_page_gang_lookup call backs can be trivially
converted to operate in batches rather than one page at a
time.  This improves cancellation time for locks protecting
large numbers of pages by about 10% (after landing
another optimization (LU-11290 ldlm: page discard speedup)
it shows 6% for canceling a lock for 30GB cached file ).

Truncate to zero time (with one lock protecting many pages)
was improved by about 5-10% as well.  Lock weighing
performance should be improved slightly as well, but is
tricky to benchmark.

HPE-bug-id: LUS-6432
Change-Id: Ib30594ae97182cbeb18051d6cee860c97ae7e119
Signed-off-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/33089
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14273 tests: enhance ha.sh to run custom cmd on bg 73/41073/2
Elena Gryaznova [Tue, 22 Dec 2020 15:19:15 +0000 (18:19 +0300)]
LU-14273 tests: enhance ha.sh to run custom cmd on bg

We need this ability to run lfs migrate in parallel
with mdtest and IOR.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9371
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: Id52ed0731aba24d3f40813da5fd2bb9b94ae63e5
Reviewed-on: https://review.whamcloud.com/41073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13942 obd: check if sbi->ll_md_exp is initialized 12/39812/6
Artem Blagodarenko [Fri, 21 Aug 2020 17:43:38 +0000 (13:43 -0400)]
LU-13942 obd: check if sbi->ll_md_exp is initialized

Null reference at the start of obd_statfs() function is possible
because of ll_fill_super vs lctl race.

ll_md_exp is initialized in ll_fill_super()->
client_common_fill_super(), but if mount process stucks
in lustre_process_log() it doesn't reach client_common_fill_super().

Change-Id: Ife72a62ba42573e2a9c6d244e36cde738b70c15a
hpe-bug-id: LUS-9150
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157732
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/39812
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-14642 flr: transfer layout version on layout change 72/43472/7
Bobi Jam [Wed, 28 Apr 2021 05:16:05 +0000 (13:16 +0800)]
LU-14642 flr: transfer layout version on layout change

After layout changed (mirror extend/split), the file's layout version
needs to transfer to OST ASAP so that following IO won't be blocked
since OFD will check its layout version stored in the xattr
XATTR_NAME_FID and find that the layout version from the client IO is
bigger (ofd_verify_layout_version()).

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I353800e868eaf13e3c795926b0d76fb1eb45c535
Reviewed-on: https://review.whamcloud.com/43472
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14645 utils: setstripe cleanup 65/43465/3
Vitaly Fertman [Tue, 27 Apr 2021 19:15:30 +0000 (22:15 +0300)]
LU-14645 utils: setstripe cleanup

lfs setstripe checks stripe parameters differently for PFL and !PFL
layouts. Whereas the PFL layout is checked in comp_args_to_layout()
individually and in llapi_layout_sanity_cb() in pairs, !PFL layout
verification is done partially in several places. Create a common
llapi_stripe_param_verify() for this purpose. Make the checks for
both cases symmetric.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I456b1b2e876229ac1a354d4e3879624325856574
HPE-bug-id: LUS-9886
Reviewed-on: https://es-gerrit.dev.cray.com/158589
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/43465
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14644 vvp: wait for nrpages to be updated 64/43464/2
Vitaly Fertman [Tue, 27 Apr 2021 18:43:06 +0000 (21:43 +0300)]
LU-14644 vvp: wait for nrpages to be updated

truncate_inode_pages() says there still may be a page in a process
of deletion upon return. wait for another thread which is doing
__delete_from_page_cache() to get nrpages updated.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I165b3d0866efaf2eb7e977520ebba4ee831874ab
HPE-bug-id: LUS-8842
Reviewed-on: https://es-gerrit.dev.cray.com/158557
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/43464
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14594 ptlrpc: do not match reply with resent RPC 42/43242/4
Vitaly Fertman [Thu, 8 Apr 2021 12:00:11 +0000 (15:00 +0300)]
LU-14594 ptlrpc: do not match reply with resent RPC

The server is able to filter by the connection ID, and drop late
coming RPCs of previous connections, however it does not happen for
replies. At the same time, this is a problem in some cases.

Allocate new matchbits for resends and check replies by them, instead
of xid. Connect RPCs are exceptions due to interop with old server -
at the time of connect we do not know yet if the server supports it.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I2aad037002b488b0c3371544ede0c47940f87efe
HPE-bug-id: LUS-9596
Reviewed-on: https://es-gerrit.dev.cray.com/158446
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/43242
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14673 sec: annotate algorithms taking optional key 56/43656/3
Sebastien Buisson [Tue, 11 May 2021 08:59:03 +0000 (10:59 +0200)]
LU-14673 sec: annotate algorithms taking optional key

Crypto algorithms implementing a ->setkey() method but that can also
be used without a key must set the CRYPTO_ALG_OPTIONAL_KEY flag if
defined in the kernel.
In Lustre, adler32 implementation defines a ->setkey() method, but
its "key" is not actually a cryptographic key.

Linux-commit: a208fa8f33031b9e0aba44c7d1b7e68eb0cbd29e

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I362211d1b1aa3763fe1481cebb3629b255f29e41
Reviewed-on: https://review.whamcloud.com/43656
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-14689 hsm: starting running HSM coordinator should success 20/43720/2
Li Xi [Mon, 17 May 2021 14:49:55 +0000 (22:49 +0800)]
LU-14689 hsm: starting running HSM coordinator should success

When starting a running coordinator, the command should succeed
no matter how many times the command runs.

And this should be the same for stopping a stopped coordinator.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I99169de35d6fcc11e03604ac63cdc4358e25b3d2
Reviewed-on: https://review.whamcloud.com/43720
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14549 llite: refresh layout after mirror merge/split 16/43716/3
Bobi Jam [Mon, 17 May 2021 09:14:33 +0000 (17:14 +0800)]
LU-14549 llite: refresh layout after mirror merge/split

mirror merge/split updates file's LOVEA and revokes client's layout
lock, but the client issuing the layout change needs to refresh its
layout (lov->lsm) as well.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I7671efe2fe5354ba0e1503b146045165608e042c
Reviewed-on: https://review.whamcloud.com/43716
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14430 mdd: use own rec_hdr for changelog declare 83/43683/3
Andreas Dilger [Thu, 13 May 2021 00:41:47 +0000 (18:41 -0600)]
LU-14430 mdd: use own rec_hdr for changelog declare

Do not use an lu_buf just to declare the changelog record.  This
only needs llog_rec_hdr to pass in lrh_len, so declaring rec_hdr
on the stack avoids the overhead of using the lu_buf.

Fixes: f3d03bc38a ("LU-14430 mdd: fix inheritance of big default ACLs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7b6f1d761aa98aa6ecb023894bde03dce23ebbe5
Reviewed-on: https://review.whamcloud.com/43683
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14681 obclass: fix typo of comment on job ID 67/43667/3
Li Xi [Wed, 12 May 2021 02:21:16 +0000 (10:21 +0800)]
LU-14681 obclass: fix typo of comment on job ID

Comments on how to get job ID are confusing because of the typo.

Test-Parameters: trivial
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I9d714323f106dfb76eafc8d70346409b38a9b66b
Reviewed-on: https://review.whamcloud.com/43667
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-14291 lustre: rename tgt_pool_* functions. 24/43624/3
James Simmons [Mon, 10 May 2021 13:43:56 +0000 (09:43 -0400)]
LU-14291 lustre: rename tgt_pool_* functions.

Functions starting with tgt_* represents code for target handling
used by Lustre servers. Now that the pool functions are used by
both clients and servers rename it to lu_tgt_* to mirror how
lu_tgt_desc_* is used since both represents remote server targets.

Test-Parameters: trivial
Change-Id: I2a9084d5bf9cea3b373c96e15cba1a41631d1172
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43624
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-14672 kernel: kernel update SLES12 SP5 [4.12.14-122.66.2] 50/43550/2
Jian Yu [Wed, 5 May 2021 18:19:39 +0000 (11:19 -0700)]
LU-14672 kernel: kernel update SLES12 SP5 [4.12.14-122.66.2]

Update SLES12 SP5 kernel to 4.12.14-122.66.2 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: Ib2bf4795ccb21dbd0bb9202228ff32d73a203eee
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43550
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14671 kernel: kernel update SLES15 SP2 [5.3.18-24.61.1] 49/43549/2
Jian Yu [Wed, 5 May 2021 18:12:24 +0000 (11:12 -0700)]
LU-14671 kernel: kernel update SLES15 SP2 [5.3.18-24.61.1]

Update SLES15 SP2 kernel to 5.3.18-24.61.1 for Lustre client.

Test-Parameters: trivial \
env=SANITY_EXCEPT="100 130 136 817" \
clientdistro=sles15sp2 serverdistro=el7.9 \
testlist=sanity

Change-Id: Ie0aab7cc7200796ed8e4d75862ceaef020943c08
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43549
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14670 kernel: kernel update RHEL7.9 [3.10.0-1160.25.1.el7] 48/43548/3
Jian Yu [Wed, 5 May 2021 17:58:18 +0000 (10:58 -0700)]
LU-14670 kernel: kernel update RHEL7.9 [3.10.0-1160.25.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.25.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: Ic846d648c45476cc4886ce86577605bf3e66d935
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43548
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14622 osd: mark pages accessed on reads 67/43367/6
Alex Zhuravlev [Mon, 19 Apr 2021 06:10:17 +0000 (09:10 +0300)]
LU-14622 osd: mark pages accessed on reads

to improve cache hit ratio

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If4850465d118ed62e9da105dc0cf144ff5041fd3
Reviewed-on: https://review.whamcloud.com/43367
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13783 ldiskfs: Add support for mainline 5.8 kernel 73/40373/4
Mr NeilBrown [Wed, 21 Oct 2020 02:34:09 +0000 (13:34 +1100)]
LU-13783 ldiskfs: Add support for mainline 5.8 kernel

Various changes needed for 5.8 over 5.4:
 - ext4_mark_inode_dirty is now a macro, so export
     __export_mark_inode_dirty instead
 - procfs additions need to use 'struct proc_ops'
 - inode-test.c is a new C file that we MUST NOT build
 - various ordinary conflicts

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I681ab26c60fb35a1ef5f518ee7cac8766e6fde47
Reviewed-on: https://review.whamcloud.com/40373
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13952 quota: default OST Pool Quotas 73/39873/18
Sergey Cheremencev [Thu, 10 Sep 2020 12:48:01 +0000 (15:48 +0300)]
LU-13952 quota: default OST Pool Quotas

Patch makes ability to set default quota
limits per OST pool.
Patch also adds sanity-quota_73.

Test-Parameters: testlist=sanity-quota
HPE-bug-id: LUS-9133
Change-Id: I9e49def231aeeed4588e5e3fbcd29fdd62a35855
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/39873
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13852 pcc: don't alloc FID in LLITE for pcc open 68/39568/11
Lai Siyao [Fri, 31 Jul 2020 18:09:06 +0000 (02:09 +0800)]
LU-13852 pcc: don't alloc FID in LLITE for pcc open

ll_lookup_it(IT_OPEN) always alloc FID on MDT0 for pcc open, but
the open request is sent to MDT where the name hash points to,
which may be different from the MDT where the FID is, which will
trigger osp_md_create() assertion because file is created remotely.

This FID allocation is not necessary, and it can be left to be done
in lmv_intent_open() by LMV layer, because the MDT is chosen in
LMV. Then when it's done, the FID allocated can be used to initialize
PCC inode.

Change assertion in osp_md_create() to error message and return
error.

Update sanity-sec 2a for this.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3ccea3f9e7cca5083695c71135b9a5805f833b14
Reviewed-on: https://review.whamcloud.com/39568
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14004 llite: default lsm update may memory leak 03/40103/5
Lai Siyao [Sun, 27 Sep 2020 06:36:57 +0000 (14:36 +0800)]
LU-14004 llite: default lsm update may memory leak

ll_update_default_lsm_md() should check whether lli_default_lsm_md
is set before setting it to the data from lustre_md, and if it's set,
release the old data to avoid memory leak.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I9c8434c5d62f9fb751788031d6769fd49427c371
Reviewed-on: https://review.whamcloud.com/40103
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14646 flr: write a FLR file downgrade SoM 71/43471/5
Bobi Jam [Wed, 28 Apr 2021 04:43:11 +0000 (12:43 +0800)]
LU-14646 flr: write a FLR file downgrade SoM

Seek over file size and write a FLR file does not change its SoM
and that makes file size incorrect.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I3075389721bdd40be60e9206c37f6c1bea514cce
Reviewed-on: https://review.whamcloud.com/43471
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14631 quota: fix qunit sort 10/43410/5
Sergey Cheremencev [Mon, 5 Apr 2021 12:27:34 +0000 (15:27 +0300)]
LU-14631 quota: fix qunit sort

Fix lqes_cmp that is used to sort lqes by qunit. As lqes_cmp returns
integer, it returns incorrects values if difference between qunits is
grater than 4GB causing write to hang instead of fail with -EDQUOT:

[<ffffffffc0701945>] cl_sync_io_wait+0x295/0x3c0 [obdclass]
[<ffffffffc07026f8>] cl_io_submit_sync+0x1c8/0x360 [obdclass]
[<ffffffffc128dc0a>] vvp_io_commit_sync+0x12a/0x460 [lustre]
[<ffffffffc128f5ee>] vvp_io_write_commit+0x4de/0x620 [lustre]
[<ffffffffc128fa39>] vvp_io_write_start+0x309/0x990 [lustre]
[<ffffffffc0700a18>] cl_io_start+0x68/0x130 [obdclass]
[<ffffffffc0702e8c>] cl_io_loop+0xcc/0x1c0 [obdclass]
[<ffffffffc1243514>] ll_file_io_generic+0x5c4/0xdc0 [lustre]
[<ffffffffc12441b9>] ll_file_aio_write+0x289/0x730 [lustre]
[<ffffffffc1244760>] ll_file_write+0x100/0x1c0 [lustre]
[<ffffffffa0241320>] vfs_write+0xc0/0x1f0
[<ffffffffa024213f>] SyS_write+0x7f/0xf0

The issue is occurred if a user hits block hard limit in a pool (pools
limit 6GB), while global limit is set to some huge value(53T in my case)

Change global limit in sanity-quota_1e to check that system doesn't
hung anymore.

HPE-bug-id: LUS-9891
Change-Id: I5a16fd3a40172187bbf35d9a9c9bfeef2ef3a108
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/43410
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14647 flr: mmap write/punch does not stale other mirrors 70/43470/5
Bobi Jam [Wed, 28 Apr 2021 05:07:36 +0000 (13:07 +0800)]
LU-14647 flr: mmap write/punch does not stale other mirrors

mmap write and punch/fallocate do not stale other mirrors and makes
FLR file contains different content in different mirrors.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I93a3eb5ba898e3bf0ce108718506b742ed485da5
Reviewed-on: https://review.whamcloud.com/43470
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14430 mdd: use own buffer for changelog 72/43672/3
Mikhail Pershin [Wed, 12 May 2021 06:55:29 +0000 (09:55 +0300)]
LU-14430 mdd: use own buffer for changelog

Use own persistent buffer for changelog needs to don't
interfere with generic big_buf in MDD thread info which
can be in use.

Fixes: f3d03bc38a ("LU-14430 mdd: fix inheritance of big default ACLs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I4f4692b5556eaa98e2e23d7b58c925e33401e4e5
Reviewed-on: https://review.whamcloud.com/43672
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14593 utils: specify libzpool 36/43236/4
Alex Zhuravlev [Thu, 8 Apr 2021 14:07:09 +0000 (17:07 +0300)]
LU-14593 utils: specify libzpool

otherwise libmount_utils_zfs doesn't build on some platforms

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id850ba519f7a898e73c099887f0c9eca5c9b5b6f
Reviewed-on: https://review.whamcloud.com/43236
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-12678 o2iblnd: fix bug in list_first_entry() change. 58/43558/2
Mr NeilBrown [Thu, 6 May 2021 00:19:30 +0000 (10:19 +1000)]
LU-12678 o2iblnd: fix bug in list_first_entry() change.

This comparison should be != NULL, else a NULL pointer could be
dereferenced.

Test-Parameters: trivial
Fixes: 34b57a6f8fcd ("LU-12678 lnet: use list_first_entry() in lnet/klnds subdirectory.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4510e2e0f2eb7b5bf86626e5ddb5ee537d3fae02
Reviewed-on: https://review.whamcloud.com/43558
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-13783 libcfs: use lsmcontext in security_release_secctx 84/43284/3
Jian Yu [Tue, 4 May 2021 23:55:34 +0000 (16:55 -0700)]
LU-13783 libcfs: use lsmcontext in security_release_secctx

Kernel linux-hwe-5.8 (5.8.0-22.23~20.04.1) introduces
struct lsmcontext and uses it in security_release_secctx(),
which reduces the argruments from 2 to 1.

Change-Id: I37e185493001d335b40ea0a6102db593cb18beb3
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43284
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14526 flr: mirror split downgrade SOM 68/43168/11
Bobi Jam [Tue, 30 Mar 2021 11:20:26 +0000 (19:20 +0800)]
LU-14526 flr: mirror split downgrade SOM

After mirror split, the file's blocks on SoM is not accurate, this
patch downgrade the SoM from STRICT so that size glimpse does not
trust the SoM from the MDS.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I02350c24190d96af93fed8c1b8a0bc6beb2c4bc2
Reviewed-on: https://review.whamcloud.com/43168
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14291 lustre: limit header scope for server only handling 96/43096/12
James Simmons [Sat, 8 May 2021 16:15:33 +0000 (12:15 -0400)]
LU-14291 lustre: limit header scope for server only handling

The lustre headers have server only function declarations and
inline functions. Only include them if HAVE_SERVER_SUPPORT is
set. This gets us closer to building OpenSFS modules against
a Linux kernel with a native Lustre client. Move a few things
from UAPI headers that are used only by kernel space to kernel
only headers where they belong.

For mdc_changlog.c a debug message access a field that is
available the server which is never the case. Since its
nonsense we can remove the report of this field

Change-Id: I6e2d3bebc121aef97fe69344d496e230f62b28ad
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43096
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14195 lnet: improve compat code for IPV6_V6ONLY sock opt 59/43559/3
Mr NeilBrown [Thu, 6 May 2021 01:27:28 +0000 (11:27 +1000)]
LU-14195 lnet: improve compat code for IPV6_V6ONLY sock opt

As get_fs() and set_fs() are deprecated, using them to call
sock->ops->setsockopt() is not a good solution.
Since linux 5.9 (v5.8-rc4-1952-ga7b75c5a8c41) it has been
possible to pass a "sockptr" to ->setsockopt() which can provide
a kernel address.

Prior to 5.8, kernet_setsockopt() is available and should still be
used.

For 5.8, when neither preferred option is available, we can pass
a NULL pointer which has the same effect as a pointer to zero.

Fixes: 10d99554631b ("LU-13783 lnet: remove kernel_setsockopt() from lnet_sock_listen()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I78c1f735a73cc9c835371c139e946144c6df5108
Reviewed-on: https://review.whamcloud.com/43559
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14651 uapi: rename CONFIG_T_* to MGS_CFG_T_* 94/43494/5
James Simmons [Tue, 4 May 2021 14:12:05 +0000 (10:12 -0400)]
LU-14651 uapi: rename CONFIG_T_* to MGS_CFG_T_*

The Linux kernel uses CONFIG_* as a way to determine if a feature
is available. Using CONFIG_* in an UAPI is considered an error
and in the most recent kernels will break a build. While we don't
have any CONFIG_* in our UAPI headers we do have CONFIG_T_*
which is used for config logs. This naming confuses the Linux
kernel build system so just rename these variables to MGS_CFG_T_*
instead.

Test-Parameters: trivial
Change-Id: I574510c2da90e9ae608c9e2374d75220b7abb19d
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43494
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14583 llapi: handle symlinks in llapi_file_get_stripe() 29/43229/4
John L. Hammond [Wed, 7 Apr 2021 19:11:25 +0000 (14:11 -0500)]
LU-14583 llapi: handle symlinks in llapi_file_get_stripe()

In llapi_file_get_stripe(), if the IOC_MDC_GETFILESTRIPE ioctl handler
returns -ENOTTY or -ENODATA then try to resolve any symlinks in the
path and try again.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ic046d6ef77d8342d47336144e3066cab3a940a96
Reviewed-on: https://review.whamcloud.com/43229
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14565 ofd: Do not rely on tgd_blockbit 54/43154/23
Arshad Hussain [Mon, 29 Mar 2021 05:22:11 +0000 (10:52 +0530)]
LU-14565 ofd: Do not rely on tgd_blockbit

tgd_blockbit is recordsize bits set during mkfs.
This once set does not change. However, 'zfs set'
can be used to change the OST blocksize. Instead
of using cached value of 'tgd_blockbit' always
calculate the blocksize bits which may have
changed.

Test-case: sanity/104c added.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Icc100cca0d5ae492c41d60f0bf97512450f796bc
Reviewed-on: https://review.whamcloud.com/43154
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10948 llite: Introduce inode open heat counter 58/32158/40
Oleg Drokin [Tue, 13 Apr 2021 07:46:41 +0000 (03:46 -0400)]
LU-10948 llite: Introduce inode open heat counter

Initial framework to support detection of naive apps that
assume open-closes are "free" and proceed to open/close
same files between minute operations.

We will track number of file opens per inode and last time inode
was closed.

Initially we'll expose these controls:
llite/opencache_threshold_count - enables functionality and controls after how
                                  many opens open lock is requested
llite/opencache_threshold_ms    - if any reopen happens within this time (in
                                  ms), the open would trigger open lock request
llite/opencache_max_ms          - If last close was longer than this many ms
                                  ago - start counting opens from zero again

Once enough useful data is collected we can look into adding a heatmap
or another similar mechanism to better manage it and enable it
by default with sensible settings.

Change-Id: I1aa5455b458840acad651f651c883a7a7a67ab4c
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32158
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
5 months agoLU-14627 tests: Create unload_modules_local 25/43425/5
Chris Horn [Fri, 23 Apr 2021 19:05:02 +0000 (14:05 -0500)]
LU-14627 tests: Create unload_modules_local

t-f allows for loading modules on single node via load_modules_local.
However, there is no corresponding unload_modules_local that can be
called to cleanup after call to load_modules_local, so we create it.
unload_modules() refactored to use unload_modules_local.

Also address a potential issue that can prevent LND modules from
unloading. Some LNet setup (particularly those in sanity-lnet) may
require that we call lnetctl lnet unconfigure (or lctl net down)
to drop a ref on the module before it can be unloaded.

HPE-bug-id: LUS-9031
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6458a7728f5f559f8641c5a9e29dd775c8445c38
Reviewed-on: https://review.whamcloud.com/43425
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13783 libcfs: support absence of account_page_dirtied 27/40827/7
Mr NeilBrown [Thu, 29 Apr 2021 13:04:04 +0000 (09:04 -0400)]
LU-13783 libcfs: support absence of account_page_dirtied

Some kernels export neither account_page_dirtied nor
kallsyms_lookup_name.
For these kernels we need to use __set_page_dirty() and suffer the
cost of dropping an reclaiming the page-tree lock.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I69d934480832f3909d3ec103f11e1d62489d70d7
Reviewed-on: https://review.whamcloud.com/40827
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13722 utils: lnetctl discrepancy in YAML output 69/40269/3
Cyril Bordage [Fri, 16 Oct 2020 14:10:28 +0000 (16:10 +0200)]
LU-13722 utils: lnetctl discrepancy in YAML output

Use "max_interfaces" instead of "max_intf" in YAML output/input.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: Id0d1d1a4220d817b238456946e28b985e1fdc80c
Reviewed-on: https://review.whamcloud.com/40269
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months agoLU-13912 lnet: Correct the router ping interval calculation 94/39694/7
Chris Horn [Mon, 17 Aug 2020 21:02:10 +0000 (16:02 -0500)]
LU-13912 lnet: Correct the router ping interval calculation

The router ping interval is being divided by the number of local nets
which results in sending pings more frequently than defined by the
alive_router_check_interval. In addition, the current code is structured
such that we may not find a peer net in need of a ping until after
inspecting the router list multiple times. Re-work the code so that the
loop that inspects a router's peer nets will look at all of them until
it either loops back around the list or it finds one that actually
needs to be pinged.

We also move the check of LNET_PEER_RTR_DISCOVERY so that we avoid the
work of inspecting the router's peer nets if the router is already being
discovered.

Test-Parameters: trivial
HPE-bug-id: LUS-9237
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I5a4733002f29c0ade6aee62b4424313d5d245556
Reviewed-on: https://review.whamcloud.com/39694
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14641 osd-ldiskfs: write commit declaring improvement 46/43446/7
Wang Shilong [Mon, 26 Apr 2021 03:23:26 +0000 (11:23 +0800)]
LU-14641 osd-ldiskfs: write commit declaring improvement

This patch try to:

1)extent bytes could be missed to increase with less than
1M, fix to to compare it with current value, and decay
it for every allocation.

2)with system space usage growing up, mballoc codes won't
try best to scan block group to align best free extent as
we can. So extent bytes per extent could be decayed to a
very small value, this could make us reserve too many credits.
We could be more optimistic in the credit reservations, even
in a case where the filesystem is nearly full, it is extremely
unlikely that the worst case would ever be hit.

3)Add extent bytes stats and debug ability to analysis
over reservation problem.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I357c4a855147ba26a9e9bbe9ab1269bcfd44e5f3
Reviewed-on: https://review.whamcloud.com/43446
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14603 ptlrpc: quiet messages for unsupported opcodes 57/43257/3
Andreas Dilger [Sun, 11 Apr 2021 02:04:30 +0000 (20:04 -0600)]
LU-14603 ptlrpc: quiet messages for unsupported opcodes

Reduce message spew for unhandled RPC opcodes.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I35496168e3aa29ecb06076654ef0aa97ba2540e5
Reviewed-on: https://review.whamcloud.com/43257
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14669 tests: reduce time spent in sanity-sec 43/43543/4
Sebastien Buisson [Wed, 5 May 2021 13:01:15 +0000 (15:01 +0200)]
LU-14669 tests: reduce time spent in sanity-sec

Several tests in sanity-sec loop over a number of nodemaps (16),
a number of NID ranges (3), a number of IP addresses (6) and a
number of mapped IDs (10).
Whereas the benefits for test coverage of such a large number of
combinations is not clear, looping over all this certainly takes
a lot of time.
So arbitrarily reduce that to smaller numbers in case SLOW!=yes:
- 3 nodemaps;
- 2 NID ranges;
- 2 IP addresses;
- 3 mapped ID
This reduces the test time for sanity-sec from ~12000s to ~7500s.

Similarly, test_fops function, used by tests 16,17,18,19,20,21,22,
loops over a number of users (4), a number of mapped IDs (6), a
number of permission bit masks (4), and for each client.
Arbitrarily reduce that to smaller numbers in case SLOW!=yes, to
save test time without degrading test coverage:
- 2 users;
- 4 mapped IDs;
- 2 permission bit masks;
- from only one client.
This further reduces the test time from ~7500s to ~5000s.

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 clientdistro=el8.3 serverdistro=el8.3 testlist=sanity-sec
Test-Parameters: mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 clientdistro=el8.3 serverdistro=el8.3 testlist=sanity-sec env=SLOW=yes
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idbe9c8daca43feafe7ca6481902cf6b54e3fa87c
Reviewed-on: https://review.whamcloud.com/43543
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13439 lmv: qos stay on current MDT if less full 45/43445/9
Andreas Dilger [Sun, 25 Apr 2021 11:02:19 +0000 (05:02 -0600)]
LU-13439 lmv: qos stay on current MDT if less full

Keep "space balanced" subdirectories on the parent MDT if it is less
full than average, since it doesn't make sense to select another MDT
which may occasionally be *more* full.  This also reduces random
"MDT jumping" and needless remote directories.

Reduce the QOS threshold for space balanced LMV layouts, so that the
MDTs don't become too imbalanced before trying to fix the problem.

Change the LUSTRE_OP_MKDIR opcode to be 1 instead of 0, so it can
be seen that a valid opcode has been stored into the structure.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iab34c7eade03d761aa16b08f409f7e5d69cd70bd
Reviewed-on: https://review.whamcloud.com/43445
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13440 lmv: add default LMV inherit depth 31/43131/13
Lai Siyao [Mon, 15 Mar 2021 03:57:36 +0000 (11:57 +0800)]
LU-13440 lmv: add default LMV inherit depth

A new field "__u8 lum_max_inherit" is added into struct lmv_user_md,
which represents the inherit depth of default LMV. It will be
decreased by 1 for subdirectories.

The valid value of lum_max_inherit is [0, 255]:
* 0 means unlimited inherit.
* 1 means inherit end.
* 250 is the max inherit depth.
* [251, 254] are reserved.
* 255 means it's not set.

A new field "__u8 lum_max_inherit_rr" is added, if default stripe
offset is -1, lum_max_inherit_rr is non-zero, and system is balanced,
new directories are created in roundrobin mannner, otherwise they
are created on the MDT where their parents are located to avoid
creating remote directories. And similarly this value will be
decreased by 1 for each level of subdirectories.

The valid value of lum_max_inherit_rr is different:
* 0 means not set.
* 1 means inherit end.
* 250 is the max inherit depth.
* [251, 254] are reserved.
* 255 means unlimited inherit.

However for the user interface of "lfs", the valid value is [-1, 250]:
* -1 means unlimited inherit.
* 0 means not set.
* others are the same.

Add sanity 413c.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I98ccad8556a0469f83bd7d79f5086a2184d5b115
Reviewed-on: https://review.whamcloud.com/43131
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
5 months agoLU-13440 obdclass: server qos penalty miscaculated 85/43385/2
Lai Siyao [Wed, 21 Apr 2021 12:05:52 +0000 (20:05 +0800)]
LU-13440 obdclass: server qos penalty miscaculated

Server qos penalty calculation uses active target count, but it
should use server count, which will make it larger than expected,
then weight of targets are often 0, and finally cause MDT0 is
often chosen in qos allocation.

Fixes: 45222b2ef ("LU-12624 obdclass: lu_tgt_descs cleanup")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I1982363e4ff74c7344dd5e07d04e29214afa8a7f
Reviewed-on: https://review.whamcloud.com/43385
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
5 months agoLU-14628 ptlrpc: remove might_sleep() in sptlrpc_gc_del_sec() 97/43397/3
Nikitas Angelinas [Thu, 15 Apr 2021 19:09:16 +0000 (12:09 -0700)]
LU-14628 ptlrpc: remove might_sleep() in sptlrpc_gc_del_sec()

sptlrpc_gc_del_sec() calls mutex_lock() which calls might_sleep(), so
the explicit might_sleep() call can be removed as redundant.

Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Test-Parameters: trivial
Change-Id: I48714fae12e63ba5e37ec0f9aa3ab7f688b9475d
Reviewed-on: https://review.whamcloud.com/43397
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-12678 lnet: use list_first_entry() in lnet/klnds subdirectory. 19/43419/2
Mr. NeilBrown [Thu, 22 Apr 2021 18:27:38 +0000 (14:27 -0400)]
LU-12678 lnet: use list_first_entry() in lnet/klnds subdirectory.

Convert
  list_entry(foo->next .....)
to
  list_first_entry(foo, ....)

in 'lnet/klnds

In several cases the call is combined with a list_empty() test and
list_first_entry_or_null() is used

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Change-Id: I3b2b33c3c9284c02e44610614d64a1f84be300a4
Reviewed-on: https://review.whamcloud.com/43419
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-14632 tests: fix sanity-hsm test_606() 09/43409/3
Elena Gryaznova [Thu, 22 Apr 2021 14:15:42 +0000 (17:15 +0300)]
LU-14632 tests: fix sanity-hsm test_606()

After check_and_setup_lustre() mds1_dev is equal to
/dev/mapper/mds1_flakey, which is unexported to saved real
device  (/dev/vdc) by stop():
  stop mds1
    elif dm_flakey_supported mds1; then
        dm_cleanup_dev mds1
          unexport_dm_dev mds1
As a result stack_trap() is called with non existing
/dev/mapper/mds1_flakey:
  stack_trap 'stop mds1; start mds1 /dev/mapper/mds1_flakey
              -o rw,user_xattr' EXIT
and failed as:
  losetup: /dev/mapper/mds1_flakey: failed to set up loop device:
              No such file or directory
Reproducer:
  run ONLY=606 sh sanity-hsm.sh on "failover" setup (mds1_HOST !=
  mds1failover_HOST), no llmount.sh before the test

Fixes: 54b9e3f78935 ("LU-684 tests: replace dev_read_only patch with dm-flakey")
Test-Parameters: trivial testlist=sanity-hsm env=ONLY=606
Test-Parameters: fstype=zfs testlist=sanity-hsm env=ONLY=606
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9920
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I9ab3cbcf67c6fd046861810a2ceab262f211436b
Reviewed-on: https://review.whamcloud.com/43409
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-13717 sec: rework includes for client encryption 86/43386/3
Sebastien Buisson [Tue, 23 Mar 2021 14:19:01 +0000 (14:19 +0000)]
LU-13717 sec: rework includes for client encryption

Simplify includes for crypto, by not repeating stubs in case
HAVE_LUSTRE_CRYPTO is not defined.
Expose encoding routines that are going to be used in the Lustre
code (both client and server sides) with filename encryption.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6c5853d6da7120edd2bec3a12494251d873151a8
Reviewed-on: https://review.whamcloud.com/43386
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>