Whamcloud - gitweb
fs/lustre-release.git
3 months agoLU-14762 lmv: compare space to mkdir on parent MDT 97/43997/5
Lai Siyao [Mon, 14 Jun 2021 07:26:47 +0000 (15:26 +0800)]
LU-14762 lmv: compare space to mkdir on parent MDT

In QOS subdirectory creation, subdirectories are kept on the parent MDT
if it is less full than average. However, the check compares weight
rather than free space, and since "weight = free space - penalty", the
result is not accurate if MDTs have different penalties, so this may
not work.

Check free space instead, and loosen the criterion to allow the
free space within the range of QOS threshold.
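
A rough model of the difference between the two checks. This is a sketch only: the function names, the percentage form of the threshold, and the exact inequality are assumptions, not taken from the patch.

```python
def weight(free_space, penalty):
    # From the commit message: weight = free space - penalty.
    return free_space - penalty


def stay_on_parent(parent_free, all_free, threshold_pct):
    """Keep the new subdirectory on the parent MDT if its free space is
    no more than threshold_pct percent below the average free space.

    threshold_pct stands in for the QOS threshold; how the real code
    expresses it is an assumption here.
    """
    avg_free = sum(all_free) / len(all_free)
    return parent_free >= avg_free * (100 - threshold_pct) / 100


# Hypothetical numbers: the parent MDT (index 0) has the most free space
# but also carries a large penalty.
free = [900, 800, 700]
penalties = [300, 0, 0]
weights = [weight(f, p) for f, p in zip(free, penalties)]
avg_weight = sum(weights) / len(weights)

by_weight = weights[0] >= avg_weight          # weight-based check
by_space = stay_on_parent(free[0], free, 10)  # free-space check
```

With these made-up values the weight-based check would move the directory away from the parent even though the parent has the most free space, while the free-space check keeps it there.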

Fixes: 3f6fc483013d ("LU-13439 lmv: qos stay on current MDT if less full")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id34cf8f3f58fee9d329f0d05c2f7a6463b67dfe1
Reviewed-on: https://review.whamcloud.com/43997
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-14654 tests: Ensure recovery_limit zero works as expected 02/43502/3
Chris Horn [Thu, 29 Apr 2021 18:09:07 +0000 (13:09 -0500)]
LU-14654 tests: Ensure recovery_limit zero works as expected

When lnet_recovery_limit is set to zero (the default) peer NIs are
eligible for recovery pings indefinitely. Verify this functionality
by modifying sanity-lnet test_211 to use recovery_limit 0 to make
a peer NI re-eligible for recovery.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-9953
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I00cb0940133e15ec73491e875d08b6db2bff3fe5
Reviewed-on: https://review.whamcloud.com/43502
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14654 lnet: Correct peer NI recovery age out calculation 01/43501/3
Chris Horn [Thu, 29 Apr 2021 18:14:34 +0000 (13:14 -0500)]
LU-14654 lnet: Correct peer NI recovery age out calculation

The calculation to age a peer NI out of recovery is only valid if
lnet_recovery_limit is non-zero. When set to zero, we allow peer NIs
to be in recovery indefinitely.
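
The intended rule can be sketched as follows. The function and parameter names are hypothetical and the timestamps are plain seconds; this is not the LNet code itself.

```python
def aged_out_of_recovery(now, last_alive, recovery_limit):
    """Return True if a peer NI should no longer get recovery pings.

    now and last_alive are timestamps in seconds (hypothetical names).
    A recovery_limit of 0 means "no limit".
    """
    if recovery_limit == 0:
        return False  # eligible for recovery indefinitely
    return now > last_alive + recovery_limit
```

The point of the sketch is the early return: the age comparison is only evaluated when the limit is non-zero.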

Test-Parameters: trivial
HPE-bug-id: LUS-9953
Fixes: cc27201a76 ("LU-13569 lnet: Age peer NI out of recovery")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6bb40ca3a9affa0eaaae9deb1cecdb03e4bb42c5
Reviewed-on: https://review.whamcloud.com/43501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13055 mdd: per-user changelog names and mask 80/43380/9
Mikhail Pershin [Tue, 22 Jun 2021 18:16:26 +0000 (21:16 +0300)]
LU-13055 mdd: per-user changelog names and mask

Allow specifying a name for newly-registered changelog users,
rather than the default "clNNN" that is otherwise used. This
allows services to register a "well-known" changelog user,
rather than having to store the changelog username in HA storage
outside of the filesystem.

Each changelog user still has a unique ID appended to it, to allow
the changelog_clear and changelog_deregister commands to be run
using only the ID if necessary/desired. The user name can also be
used to deregister, and is unique per server.

If no name is given, then the default "cl" format is used.

With this new functionality, it is possible to specify the name like:
 # lctl --device testfs-MDT0000 changelog_register --user watcher
   testfs-MDT0000: Registered changelog userid 'cl13-watcher'

Per-user mask is also added to allow specific operation logging on
per-user basis. Mask can be set only during registration. Resulting
mask from per-server mask and all user masks is used for current
changelog operations.
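
A small model of the naming and mask rules described above. The helper names are hypothetical, and combining the masks with a bitwise OR is an assumption about what "resulting mask" means.

```python
def changelog_user_name(user_id, name=None):
    # Matches the example above: id 13 with name "watcher" gives
    # "cl13-watcher"; without a name the default "cl<ID>" form is used.
    base = f"cl{user_id}"
    return f"{base}-{name}" if name else base


def effective_mask(server_mask, user_masks):
    # Assumption: the resulting mask is the bitwise OR of the per-server
    # mask and every registered user's mask.
    mask = server_mask
    for m in user_masks:
        mask |= m
    return mask
```

Under that assumption, an operation is logged if the server mask or any registered user asks for it.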

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I56028f54cc97bbc9af03fd6559c19ef854f759d8
Reviewed-on: https://review.whamcloud.com/43380
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-4684 tests: enable racer directory migration 59/41359/4
Andreas Dilger [Thu, 28 Jan 2021 20:44:27 +0000 (13:44 -0700)]
LU-4684 tests: enable racer directory migration

Enable the dir_migrate test by default in racer test runs.

Update test selection logic to match newer script code style.

Test-Parameters: trivial testlist=racer env=DURATION=3600
Test-Parameters: fstype=zfs testlist=racer env=DURATION=600
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifba84c64b30d90b4a159232751b68c48c88dafcc
Reviewed-on: https://review.whamcloud.com/41359
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14139 llite: simplify callback handling for async getattr 12/40712/11
Qian Yingjin [Thu, 19 Nov 2020 15:15:37 +0000 (23:15 +0800)]
LU-14139 llite: simplify callback handling for async getattr

This patch prepares the inode and sets the lock data directly in
the interpret callback of the intent async getattr RPC request (in
ptlrpcd context), simplifying the old implementation that deferred this
work to the statahead thread.

In a benchmark running "ls -l" on a large directory containing 1M
files (47001 bytes), without any caching on the server or the client,
the measured elapsed times are:
- w/o patch: 180 seconds;
- w/ patch: 181 seconds;

There is no obvious performance regression.

Change-Id: Ifcfad3eb26d831bec3beea0c3d7045f31d35fa6a
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/40712
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-6142 obdclass: resolve lu_ref checkpatch issues 88/44088/2
James Simmons [Sat, 26 Jun 2021 18:05:15 +0000 (14:05 -0400)]
LU-6142 obdclass: resolve lu_ref checkpatch issues

Fix up all the checkpatch issues reported for the code handling
lu_ref. Also change USE_LU_REF to CONFIG_LUSTRE_DEBUG_LU_REF
which will match what will be upstream.

Change-Id: I100e2679fc04c97eb67e4d44c4f6a6b530da6fa8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44088
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14734 osd-ldiskfs: enable large_dir automatically 31/43931/7
Andreas Dilger [Sat, 5 Jun 2021 08:34:15 +0000 (02:34 -0600)]
LU-14734 osd-ldiskfs: enable large_dir automatically

Enable the large_dir feature automatically at mount time for
filesystems that do not have it enabled already.  Otherwise,
the REMOTE_PARENT_DIR may overflow if there are many remote
entries created, or for object directories on very large OSTs.
It isn't really needed on a dedicated MGS filesystem.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1c4ead26b09d60567ad12945d7b366b53475cebb
Reviewed-on: https://review.whamcloud.com/43931
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14516 mgc: configurable wait-to-reprocess time 20/42020/19
Alex Zhuravlev [Fri, 12 Mar 2021 09:00:37 +0000 (12:00 +0300)]
LU-14516 mgc: configurable wait-to-reprocess time

So we can set it shorter, for testing purposes at least. To change
the minimal wait time, the MGC module option 'mgc_requeue_timeout_min'
should be used (in seconds). Additionally, a random value up to
mgc_requeue_timeout_min is added to avoid a flood of config re-read
requests from clients. If mgc_requeue_timeout_min is set to 0,
then the random part will be up to 1 second.
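
The delay rule can be modelled roughly as below. The function name is hypothetical and a uniform distribution is assumed for the random part; the MGC code may compute it differently.

```python
import random


def requeue_delay(timeout_min, rng=random):
    """Seconds to wait before re-reading the config (a model, not the
    MGC code): the minimum plus a random part of up to timeout_min,
    or of up to 1 second when timeout_min is 0."""
    jitter_max = timeout_min if timeout_min > 0 else 1
    return timeout_min + rng.uniform(0, jitter_max)
```

So a setting of 5 would give a wait somewhere between 5 and 10 seconds, and a setting of 0 a wait of at most 1 second.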

ost-pools: before: 5840s, after: 3474s
sanity-flr: before: 1575s, after: 1381s
sanity-quota: before: 10679s, after: 9703s

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iff7dad4ba14d687b7e891a1c346397e4c370800d
Reviewed-on: https://review.whamcloud.com/42020
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14074 scripts: automatic LNet unconfigure 98/41698/3
Cyril Bordage [Fri, 19 Feb 2021 17:12:45 +0000 (18:12 +0100)]
LU-14074 scripts: automatic LNet unconfigure

After using the lnetctl utility a reference count is taken on the LNet
modules. lnetctl lnet unconfigure is called in order for
lustre_rmmod to remove the LNet module.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I7251a0c62c45da7b3cb0fddea97394b32cb6902a
Reviewed-on: https://review.whamcloud.com/41698
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13799 osc: Simplify clipping for transient pages 40/39440/12
Patrick Farrell [Fri, 7 May 2021 15:38:07 +0000 (11:38 -0400)]
LU-13799 osc: Simplify clipping for transient pages

The combination of page clip and page flag setting for
transient pages takes up several % of the time when
submitting them for async DIO.

But neither is required - Transient pages do not change
after creation except in limited cases, and in any case,
they are only accessible from the submitting thread -
there is no possibility of parallel access.

So we can set the page flags, etc, at init time.

This patch improves i/o time in ms/GiB by:
Write: 17 ms/GiB
Read: 22 ms/GiB

Totals:
Write: 204 ms/GiB
Read: 198 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     4647 MiB/s
read      4888 MiB/s

Plus this patch:
write     5030 MiB/s
read      5174 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I974ebb0f55734a8628f1f7e1c01092eb2ce5f83b
Reviewed-on: https://review.whamcloud.com/39440
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13799 clio: Implement real list splice 39/39439/11
Patrick Farrell [Fri, 7 May 2021 15:37:40 +0000 (11:37 -0400)]
LU-13799 clio: Implement real list splice

Lustre's list_splice is actually just a slightly
depressing list_for_each; let's use a real list_splice.

This saves significant time in AIO/DIO page submission,
getting a several % performance boost.

This patch reduces i/o time in ms/GiB by:
Write: 16 ms/GiB
Read: 14 ms/GiB

Totals:
Write: 220 ms/GiB
Read: 209 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     4326 MiB/s
read      4587 MiB/s

With this patch:
write     4647 MiB/s
read      4888 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Icfd4a3d9dd6f162b011b402a1c88d7dae53eff40
Reviewed-on: https://review.whamcloud.com/39439
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-13799 osc: Don't get time for each page 37/39437/13
Patrick Farrell [Fri, 7 May 2021 15:35:28 +0000 (11:35 -0400)]
LU-13799 osc: Don't get time for each page

Getting the time when each batch of pages starts is
sufficiently accurate, and ktime_get() is several % of the
CPU time when doing AIO + DIO.

This relies on previous patches in this series.

Measuring this in milliseconds/gigabyte lets us measure the
improvement in absolute terms, rather than just relative
terms.
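
For reference, the ms/GiB values follow from the IOR throughput quoted in this commit message. The helper below is only an illustration of the arithmetic; its name is made up.

```python
MIB_PER_GIB = 1024


def ms_per_gib(mib_per_s):
    # Time to move one GiB at the given throughput, in milliseconds.
    return MIB_PER_GIB / mib_per_s * 1000.0


# IOR figures from this commit message (MiB/s).
before = {"write": 4030, "read": 4468}
after = {"write": 4326, "read": 4587}

for op in ("write", "read"):
    total = ms_per_gib(after[op])
    saved = ms_per_gib(before[op]) - total
    print(f"{op}: {total:.0f} ms/GiB total, {saved:.0f} ms/GiB saved")
```

The absolute saving stays comparable from patch to patch, whereas the same saving looks larger in percentage terms as the total shrinks.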

This patch reduces i/o time in ms/GiB by:
Write: 17 ms/GiB
Read: 6 ms/GiB

Totals:
Write: 237 ms/GiB
Read: 223 ms/GiB

IOR:
mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect
Without the patch:
write     4030 MiB/s
read      4468  MiB/s

With patch:
write     4326 MiB/s
read      4587 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I02897bf810683bc77a7d09156cdb83ba1d25ebf1
Reviewed-on: https://review.whamcloud.com/39437
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-13798 llite: parallelize direct i/o issuance 36/39436/30
Patrick Farrell [Fri, 28 May 2021 23:53:55 +0000 (19:53 -0400)]
LU-13798 llite: parallelize direct i/o issuance

Currently, the direct i/o code issues an i/o to a given
stripe, and then waits for that i/o to complete.  (This is
for i/os from a single process.)  This forces DIO to send
only one RPC at a time, serially.

In the case of multi-stripe files and larger i/os from
userspace, this means that i/o is serialized - so single
thread/single process direct i/o doesn't see any benefit
from the combination of extra stripes & larger i/os.

Using part of the AIO support, it is possible to move this
waiting up a level, so it happens after all the i/o is
issued.  (See LU-4198 for AIO support.)

This means we can issue many RPCs and then wait,
dramatically improving performance vs waiting for each RPC
serially.

This is referred to as 'parallel dio'.

Notes:
AIO is not supported on pipes, so we fall back to the old
sync behavior if the source or destination is a pipe.

Error handling is similar to buffered writes: We do not
wait for individual chunks, so we can get an error on an RPC
in the middle of an i/o.  The solution is to return an
error in this case, because we cannot know how many bytes
were written contiguously.  This is similar to buffered i/o
combined with fsync().
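
The "issue everything, then wait" pattern and its error rule can be sketched in user space like this. send_rpc and the chunk dictionaries are stand-ins invented for the example; the real code works on RPCs in the kernel, not on threads.

```python
from concurrent.futures import ThreadPoolExecutor


def send_rpc(chunk):
    # Stand-in for one stripe-sized RPC; returns bytes written or
    # raises OSError. Purely hypothetical.
    if chunk.get("fail"):
        raise OSError(5, "simulated I/O error")
    return chunk["len"]


def parallel_dio(chunks, max_in_flight=8):
    """Issue every chunk, then wait for all of them at the end.

    Returns the total byte count, or raises if any chunk failed, since
    the contiguous byte count is unknown in that case.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(send_rpc, c) for c in chunks]  # issue all
        results = []
        first_error = None
        for f in futures:                                     # then wait
            try:
                results.append(f.result())
            except OSError as e:
                if first_error is None:
                    first_error = e
    if first_error is not None:
        raise first_error
    return sum(results)
```

A failure in the middle chunk makes the whole call fail rather than report a partial count, which mirrors the behaviour described in the notes.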

The performance improvement from this is dramatic, and
greater at larger sizes.

lfs setstripe -c 8 -S 4M .
mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect
Without the patch:
write     764.85 MiB/s
read      682.87 MiB/s

With patch:
write     4030 MiB/s
read      4468  MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I7e8df7d16b131b55a235f57c3280509559f94476
Reviewed-on: https://review.whamcloud.com/39436
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9680 utils: add netlink infrastructure 30/34230/36
James Simmons [Wed, 16 Jun 2021 19:28:13 +0000 (15:28 -0400)]
LU-9680 utils: add netlink infrastructure

Netlink was designed as a successor to ioctl as defined under
RFC 3549. There are several advantages to using netlink over
ioctls or virtual file system interfaces like proc. Collecting
proc doesn't scale well which was seen with power drain on Android
phones. A netlink implementation was developed to remove this
performance hit. Details can be read at:

https://lwn.net/Articles/406975

Besides the scaling gains the other benefit is the flexibility
with API changes. Adding or removing information to be transmitted
doesn't require creating a new interface as ioctls do. Instead
you add new code to handle the stream of attributes read from the
socket. Lastly you can multiplex data to N listeners with groups
using one request.
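
To illustrate why an attribute stream tolerates API changes, here is a user-space sketch of the netlink attribute layout (a 16-bit length and 16-bit type header followed by a payload padded to 4 bytes). The attribute numbers and names are hypothetical and unrelated to the attributes this patch defines.

```python
import struct

NLA_HDRLEN = 4      # struct nlattr: u16 nla_len, u16 nla_type
NLA_ALIGNTO = 4


def nla_align(n):
    return (n + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def pack_attr(attr_type, payload):
    length = NLA_HDRLEN + len(payload)   # nla_len excludes padding
    pad = nla_align(length) - length
    return struct.pack("=HH", length, attr_type) + payload + b"\0" * pad


def parse_attrs(buf, known):
    """Walk the attribute stream, keeping only types listed in `known`."""
    out, off = {}, 0
    while off + NLA_HDRLEN <= len(buf):
        length, attr_type = struct.unpack_from("=HH", buf, off)
        if length < NLA_HDRLEN or off + length > len(buf):
            break                        # malformed attribute
        if attr_type in known:
            out[known[attr_type]] = buf[off + NLA_HDRLEN:off + length]
        off += nla_align(length)         # unknown types are skipped
    return out


# Hypothetical attribute numbers; a newer sender adds type 3.
stream = pack_attr(1, b"lustre") + pack_attr(2, b"\x07") + pack_attr(3, b"new")
old_reader = parse_attrs(stream, {1: "name", 2: "count"})
```

The older reader still decodes the attributes it knows and steps over type 3, so the sender could add it without a new interface.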

This patch adds netlink handling in a generic way that can be
used by the libyaml library. This greatly lowers the barrier by
only requiring the implementor to understand the libyaml API.

Change-Id: Idcdac653a1f9cc9931238e869c3beadaefcf3410
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/34230
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13716 tests: skip sanity 205b for older servers 93/43993/2
James Nunez [Sat, 12 Jun 2021 00:05:08 +0000 (18:05 -0600)]
LU-13716 tests: skip sanity 205b for older servers

Lustre job stats and sanity test 205b were modified in Lustre
version 2.13.54.91.  When we run version interop testing with
servers less than this version and clients that are greater,
the test will fail.

Skip sanity test 205b for Lustre servers with version less than
2.13.54.91 and client greater than that version.
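
The skip condition amounts to a field-by-field version comparison. The sketch below is an illustration with hypothetical helper names; the test scripts themselves are shell and use their own version helpers.

```python
def parse_version(text):
    # "2.13.54.91" -> (2, 13, 54, 91); missing fields count as 0.
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def should_skip_205b(server, client, changed_in="2.13.54.91"):
    limit = parse_version(changed_in)
    return parse_version(server) < limit and parse_version(client) > limit
```

Comparing integer tuples rather than strings matters here, because as strings "2.9" would sort after "2.13".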

Test-Parameters: trivial
Test-Parameters: serverdistro=el7.9 serverversion=2.12.6 env=ONLY=205 testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Icc5d6a6adcf03e5bd16b678596f28590fe31516e
Reviewed-on: https://review.whamcloud.com/43993
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
3 months agoLU-14533 tests: skip sanity-pfl 0d for older servers 71/43971/3
James Nunez [Thu, 10 Jun 2021 21:05:18 +0000 (15:05 -0600)]
LU-14533 tests: skip sanity-pfl 0d for older servers

sanity-pfl test 0d was added to Lustre version 2.14.50.115.
When we run version interop testing with servers with
version less than this, the test will fail.

We should skip sanity-pfl test 0d if the Lustre server
version is less than 2.14.50.115.

Fixes: 83e38bba62 ("LU-14180 utils: verify setstripe comp_end is valid")

Test-Parameters: trivial
Test-Parameters: serverversion=2.14.0 serverdistro=el8.3 env=ONLY=0d testlist=sanity-pfl
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I49b45c7a1e4804fece33d53a4fb946b49254de2b
Reviewed-on: https://review.whamcloud.com/43971
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
3 months agoLU-14322 tests: skip sanityn 51e for old servers 69/43969/3
James Nunez [Thu, 10 Jun 2021 18:54:51 +0000 (12:54 -0600)]
LU-14322 tests: skip sanityn 51e for old servers

sanityn test 51e was added to Lustre version 2.13.54.148.
When we run version interop testing with servers less than
this version, the test will fail.

We should skip sanityn test 51e if the server version is
less than 2.13.54.148.

Fixes: 3ea729fe82 ("LU-13693 lfs: check early for MDS_OPEN_DIRECTORY")

Test-Parameters: trivial
Test-Parameters: serverversion=2.12.6 serverdistro=el7.9 env=ONLY=51e testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Id2f165b275c97c3a1396a0da18a3f254dbe5efa7
Reviewed-on: https://review.whamcloud.com/43969
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
3 months agoLU-14755 tests: create custom pools 66/43966/2
Elena Gryaznova [Thu, 10 Jun 2021 09:51:52 +0000 (12:51 +0300)]
LU-14755 tests: create custom pools

We are interested in running some tests on a filesystem with
pools. The proposed enhancement allows creating
$FS_NPOOLS pools, each containing $FS_POOL_NOSTS
OSTs. If $FS_NPOOLS is not set, the number of
pools created is $OSTCOUNT / $FS_POOL_NOSTS.
Pool names are based on $FS_POOL. Pools are not created if
FS_POOL is not set.
Example 1:
  FS_POOL=global OSTCOUNT=2
lustre.global0
OST lustre-OST0000_UUID
OST lustre-OST0001_UUID
Example 2:
  FS_POOL=global OSTCOUNT=6 FS_POOL_NOSTS=3
lustre.global0
OST lustre-OST0000_UUID
OST lustre-OST0001_UUID
OST lustre-OST0002_UUID
lustre.global1
OST lustre-OST0003_UUID
OST lustre-OST0004_UUID
OST lustre-OST0005_UUID
Example 3:
  FS_POOL=p OSTCOUNT=5 KEEP_POOLS=true FS_NPOOLS=7 FS_POOL_NOSTS=3
Pool: lustre.p0
lustre-OST0000_UUID
lustre-OST0001_UUID
lustre-OST0002_UUID
Pool: lustre.p1
lustre-OST0003_UUID
lustre-OST0004_UUID
lustre-OST0000_UUID
Pool: lustre.p2
lustre-OST0001_UUID
lustre-OST0002_UUID
lustre-OST0003_UUID
Pool: lustre.p3
lustre-OST0004_UUID
lustre-OST0000_UUID
lustre-OST0001_UUID
Pool: lustre.p4
lustre-OST0002_UUID
lustre-OST0003_UUID
lustre-OST0004_UUID
Pool: lustre.p5
lustre-OST0000_UUID
lustre-OST0001_UUID
lustre-OST0002_UUID
Pool: lustre.p6
lustre-OST0003_UUID
lustre-OST0004_UUID
lustre-OST0000_UUID
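
The three examples are consistent with a simple round-robin assignment of OSTs to pools, modelled below. This is an illustration, not the test-framework code: the function name is made up, and defaulting FS_POOL_NOSTS to OSTCOUNT is an assumption inferred from Example 1.

```python
def build_pools(fs_pool, ostcount, fs_pool_nosts=None, fs_npools=None,
                fsname="lustre"):
    if not fs_pool:
        return {}                          # no pools without FS_POOL
    # Assumption: FS_POOL_NOSTS defaults to OSTCOUNT (see Example 1).
    nosts = fs_pool_nosts or ostcount
    npools = fs_npools or ostcount // nosts
    pools, next_ost = {}, 0
    for i in range(npools):
        members = []
        for _ in range(nosts):
            # Wrap around once every OST has been used.
            members.append(f"{fsname}-OST{next_ost % ostcount:04x}_UUID")
            next_ost += 1
        pools[f"{fsname}.{fs_pool}{i}"] = members
    return pools
```

Calling build_pools("p", 5, 3, 7) reproduces the pool membership listed in Example 3, including the wrap-around in lustre.p1.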

The patch adds the ability to remove all old pools at the
start if DELETE_OLD_POOLS is set to true (default is false)
and the ability to keep the new pools at the
end if KEEP_POOLS is set to true (default is false).

Test-Parameters: trivial testlist=sanity-flr,ost-pools,ost-pools,sanity-pfl,sanity,sanityn
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-8172
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I73b72f9f39933b5b875978ce4fede5e9828c4c71
Reviewed-on: https://review.whamcloud.com/43966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14327 tests: skip sanity-sec test 55 for older servers 49/43949/5
James Nunez [Tue, 8 Jun 2021 16:34:29 +0000 (10:34 -0600)]
LU-14327 tests: skip sanity-sec test 55 for older servers

sanity-sec test 55 was added to lustre-master version
2.13.57.12 and to lustre-b2_12 version 2.12.6.3.  When
we run version interop testing with Lustre servers less
than these versions, the test will fail.  Thus, skip
sanity-sec test 55 for Lustre servers less than 2.12.6.3.

Fixes: 355787745f21 ("LU-14121 nodemap: do not force fsuid/fsgid squashing")

Test-Parameters: trivial
Test-Parameters: serverversion=2.12.6 serverdistro=el7.9 env=ONLY=55 testlist=sanity-sec
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ie002c921e853897105396185b38485799df31b7a
Reviewed-on: https://review.whamcloud.com/43949
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9897 utils: allow setting llverfs subdir count 47/39347/2
Andreas Dilger [Fri, 30 Aug 2019 23:19:29 +0000 (17:19 -0600)]
LU-9897 utils: allow setting llverfs subdir count

Allow specifying the subdirectory count directly rather
than calculating it based on the filesystem size.

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Idcae188ef4bdb417f0f983718bce7e55093ebbe5
Reviewed-on: https://review.whamcloud.com/39347
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Anjus George <georgea@ornl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12982 tests: skip conf-sanity 5i for old servers 11/36811/4
James Nunez [Wed, 20 Nov 2019 22:03:11 +0000 (15:03 -0700)]
LU-12982 tests: skip conf-sanity 5i for old servers

conf-sanity test 5i was added to lustre-master with version
2.12.54.  For all version interop testing with Lustre servers with
version less than 2.12.54 and newer clients, conf-sanity test 5i
will fail and should be skipped.

Fixes: d1b5146eda4f ("LU-12206 mdt: mdt_init0 failure handling")

Test-Parameters: trivial
Test-Parameters: serverversion=2.12.6 serverdistro=el7.9 fstype=ldiskfs env=ONLY=5 testlist=conf-sanity

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ia493b6f80b42fbd92254150e8d40a6fbb1039635
Reviewed-on: https://review.whamcloud.com/36811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14752 obdclass: handle EBUSY returned for lu_object hashtable 68/43968/4
James Simmons [Thu, 10 Jun 2021 16:53:57 +0000 (12:53 -0400)]
LU-14752 obdclass: handle EBUSY returned for lu_object hashtable

When the rhashtable grows to a certain size it will be rescaled.
During rescaling, an insertion can return an ENOMEM or EBUSY error. This
was reported as:

LustreError: 3594004:0:(lu_object.c:2472:lu_object_assign_fid()) ASSERTION( rc == 0 ) failed: failed hashtable insertion: rc = -16
LustreError: 3594004:0:(lu_object.c:2472:lu_object_assign_fid()) LBUG
Pid: 3594004, comm: mdt01_020 4.18.0-240.22.1.1toss.t4.x86_64 #1 SMP Tue Apr 13 17:18:40 PDT 2021
Call Trace TBD:
Kernel panic - not syncing: LBUG
...
Call Trace:
dump_stack+0x5c/0x80
panic+0xe7/0x2a9
lbug_with_loc.cold.10+0x18/0x18 [libcfs]
lu_object_assign_fid+0x3b8/0x3c0 [obdclass]

Add handling of the EBUSY case for our lu_object hash.
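
One common way to treat a transient EBUSY is to retry the insertion instead of asserting on it. Whether this patch retries or handles the error differently is not stated here, so the sketch below is an assumption, with a hypothetical try_insert callable.

```python
import errno


def insert_with_retry(try_insert, max_retries=1000):
    """Call try_insert() until it stops returning -EBUSY.

    try_insert is a hypothetical callable returning 0 on success or a
    negative errno, mimicking a kernel-style return code.
    """
    for _ in range(max_retries):
        rc = try_insert()
        if rc != -errno.EBUSY:
            return rc          # success, or an error such as -ENOMEM
    return -errno.EBUSY
```

Other errors are passed straight back to the caller, so only the transient case is retried.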

Fixes: aff14dbc522 ("LU-8130 lu_object: convert lu_object cache to rhashtable")
Change-Id: Id85f32633117e02850b799e8d95e3e35d982cbd4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43968
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14741 obdclass: Wake up entire queue of requests on close completion 41/43941/3
Oleg Drokin [Mon, 7 Jun 2021 19:17:27 +0000 (15:17 -0400)]
LU-14741 obdclass: Wake up entire queue of requests on close completion

Since close requests could be stuck behind normal requests and get
more slots, we need to either wake up the entire accumulated queue
waiting for the next modrpc slot or have an additional waitqueue just
for close requests.

This patch goes with the former approach.
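
A toy model of why waking only the head of the queue can leave a close request stuck. The slot rule used here (a close may use one slot beyond the normal limit) and all names are assumptions made for the illustration.

```python
def can_start(kind, in_flight, max_mod_rpcs):
    # Assumption: a close request may use one slot beyond the normal limit.
    limit = max_mod_rpcs + 1 if kind == "close" else max_mod_rpcs
    return in_flight < limit


def on_slot_freed(waiters, in_flight, max_mod_rpcs, wake_all):
    """Return the waiters that get to run after a slot is released."""
    woken = list(waiters) if wake_all else waiters[:1]
    started = []
    for kind in woken:
        if can_start(kind, in_flight, max_mod_rpcs):
            started.append(kind)
            in_flight += 1
    return started


# Two normal RPCs are in flight and a close has just completed; a normal
# request is queued ahead of another close.
waiters = ["normal", "close"]
wake_one = on_slot_freed(waiters, in_flight=2, max_mod_rpcs=2, wake_all=False)
wake_everyone = on_slot_freed(waiters, in_flight=2, max_mod_rpcs=2, wake_all=True)
```

In this model, waking one waiter starts nothing because the head of the queue is a normal request with no slot available, while waking everyone lets the queued close proceed.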

Fixes: 1fc013f901 ("LU-5319 mdc: manage number of modify RPCs in flight")
Change-Id: Ib4333c7f6731dd435364d5e5f529577a1600a235
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43941
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
3 months agoLU-13417 test: use mkdir_on_mdt0() in replay-dual 92/43492/6
Lai Siyao [Thu, 29 Apr 2021 03:51:33 +0000 (11:51 +0800)]
LU-13417 test: use mkdir_on_mdt0() in replay-dual

Replace mkdir with mkdir_on_mdt0() in replay-dual.sh if directory
needs to be created on MDT0.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-dual
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I9093e633412991571e18cb0ea264af013672bd8b
Reviewed-on: https://review.whamcloud.com/43492
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
3 months agoLU-13417 test: use mkdir_on_mdt0() in misc tests 91/43491/4
Lai Siyao [Thu, 29 Apr 2021 03:46:21 +0000 (11:46 +0800)]
LU-13417 test: use mkdir_on_mdt0() in misc tests

Replace mkdir with mkdir_on_mdt0() if directory needs to be created
on MDT0 in following tests:
* conf-sanity
* lustre-rsync-test
* ost-pools
* replay-ost-single
* replay-single
* replay-vbr
* sanity-hsm
* sanity-pcc
* sanity-quota
* sanity-sec

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=conf-sanity,lustre-rsync-test,ost-pools,replay-ost-single,replay-single,replay-vbr,sanity-hsm,sanity-pcc,sanity-quota,sanity-sec
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I96369f25982558a1dac7f4f7fe80a95bc1c0207d
Reviewed-on: https://review.whamcloud.com/43491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
3 months agoLU-13417 test: add mkdir_on_mdt0() 89/43489/7
Lai Siyao [Wed, 28 Apr 2021 14:36:24 +0000 (22:36 +0800)]
LU-13417 test: add mkdir_on_mdt0()

Once default LMV is set on ROOT, and default stripe offset is "-1",
mkdir may not create directory on MDT0, but it's a premise for many
tests. Add a function mkdir_on_mdt0() to create directory on MDT0
by "lfs mkdir -i 0".

Replace mkdir with mkdir_on_mdt0() for such tests in sanity.sh and
sanityn.sh.

Test-Parameters: trivial testlist=sanityn
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I6155d036e6b28153d0bdbdbc01088bd68ee9e0af
Reviewed-on: https://review.whamcloud.com/43489
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
3 months agoLU-10948 mdt: New connect flag for non-open-by-fid lock request 07/43907/4
Oleg Drokin [Thu, 3 Jun 2021 00:10:47 +0000 (20:10 -0400)]
LU-10948 mdt: New connect flag for non-open-by-fid lock request

We removed the 2.1 check for open by fid when an open
lock is requested, but clients talking to old servers that don't
have that patch get an open error, so introduce a compat
flag.

Change-Id: I94d50ad98a2828519853a35fa90c5063adf2feab
Fixes: 41d99c4902 ("LU-10948 llite: Introduce inode open heat counter")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43907
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 months agoLU-14742 socklnd: detect link state to set fatal error on ni 52/43952/4
Serguei Smirnov [Tue, 8 Jun 2021 21:11:41 +0000 (14:11 -0700)]
LU-14742 socklnd: detect link state to set fatal error on ni

To help avoid selecting for sending an LNet NI that corresponds to
a downed ethernet link, add a mechanism for detecting link
events in socklnd. On link up/down events, find the corresponding
NI and toggle the ni_fatal_error_on flag, similar to the way o2iblnd does.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ie9f4f02fcb8b988c77bf63f751d5a621e79e9f58
Reviewed-on: https://review.whamcloud.com/43952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14729 osd-ldiskfs: fix to declare write commits 94/43994/4
Wang Shilong [Mon, 14 Jun 2021 01:28:51 +0000 (09:28 +0800)]
LU-14729 osd-ldiskfs: fix to declare write commits

Fallocate might introduce unwritten extents, and writing
data to them will trigger extent splits, so we should reserve
credits for this case. To avoid a complicated calculation,
we just use the normal credits calculation if the extent is mapped
as unwritten.

See comments in ext4:
If we add a single extent, then in the worse case, each tree
level index/leaf need to be changed in case of the tree split.
If more extents are inserted, they could cause the whole tree
split more than once, but this is really rare.

Lustre always reserves extents as in the 1 extent case, which is wrong.
Also fix the indirect blocks calculation.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I9b67ec7b002711f040f46d0c77a645bb6f57a7de
Reviewed-on: https://review.whamcloud.com/43994
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12888 tests: remove big files in sanity 11/36511/10
Alex Zhuravlev [Fri, 18 Oct 2019 04:56:52 +0000 (07:56 +0300)]
LU-12888 tests: remove big files in sanity

otherwise sanity easily fails on a local setup

Test-Parameters: trivial

Change-Id: Ia0a561e650fca05837445eebe25ff1dea15366e4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36511
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
4 months agoLU-14093 utils: fix DLSYM buffer over flow 38/43938/3
James Simmons [Mon, 7 Jun 2021 12:33:59 +0000 (08:33 -0400)]
LU-14093 utils: fix DLSYM buffer over flow

The 'name' string passed to DLSYM macro is created from the fsname
buffer in load_backfs_module(). That buffer is greater than 512
bytes in size but the temporary buffer in DLSYM is only 64. The
newest gcc version detects this bug.

mount_utils.c: In function ‘load_backfs_module’:
mount_utils.c:530:36: error: ‘%s’ directive output may be truncated writing up to 507 bytes into a region of size 64 [-Werror=format-truncation=]
  530 |   snprintf(_fname, sizeof(_fname), "%s_%s", prefix, #func); \
      |                                    ^~~~~~~
mount_utils.c:593:2: note: in expansion of macro ‘DLSYM’
  593 |  DLSYM(name, ops, init);
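
The truncation that gcc warns about can be detected at run time from
the return value of snprintf(). The following is a minimal sketch, not
the code from this patch; build_symbol_name() is a hypothetical helper
that only mirrors the "%s_%s" format shown in the warning.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from mount_utils.c): build "<prefix>_<func>"
 * into dst. Returns 0 on success, -1 if the result would not fit.
 */
static int build_symbol_name(char *dst, size_t dst_size,
			     const char *prefix, const char *func)
{
	/* snprintf() returns the length it wanted to write, excluding NUL */
	int len = snprintf(dst, dst_size, "%s_%s", prefix, func);

	if (len < 0 || (size_t)len >= dst_size)
		return -1;	/* output error or truncated output */
	return 0;
}
```

With a 64-byte destination, a short prefix such as "ldiskfs" fits, while
a prefix of several hundred bytes is reported as an error instead of
being silently cut off.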

Change-Id: I8ae30a5288f236fb9272dffd40f44175e5e03ef9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43938
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14736 utils: Change leak_finder to use stdout 34/43934/3
Patrick Farrell [Sat, 5 Jun 2021 21:17:23 +0000 (17:17 -0400)]
LU-14736 utils: Change leak_finder to use stdout

It is not an error for a leak checking script to find a
leak, so don't have leak_finder.pl print to stderr.  It also
prints several pieces of basic status to stderr, for which
there is no reason at all.

This makes it easier to redirect the output for interactive
use.

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Iab226726ca4b36ada40a305962beedc363398c37
Reviewed-on: https://review.whamcloud.com/43934
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
4 months agoLU-14731 mdd: clear orphans changelog entries 01/43901/4
John L. Hammond [Wed, 2 Jun 2021 17:05:01 +0000 (12:05 -0500)]
LU-14731 mdd: clear orphans changelog entries

In mdd_changelog_llog_init(), adjust the orphan changelog index logic
to account for the case when no users are registered. Add sanity
test_160n() to verify this.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I03b0c1002a0e16f26af8ec23bf06c9a07dec858a
Reviewed-on: https://review.whamcloud.com/43901
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14690 kernel: RHEL 8.4 server support 91/43791/8
Jian Yu [Fri, 4 Jun 2021 07:47:14 +0000 (00:47 -0700)]
LU-14690 kernel: RHEL 8.4 server support

This patch makes changes to support RHEL 8.4 release with
kernel 4.18.0-305.3.1.el8_4 for Lustre server.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity

Change-Id: I484af80c4764367b40b28ce459a6ff9d87edf3a8
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43791
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14653 tests: Correct include path for sanity-lnet test_300 00/43500/6
Chris Horn [Thu, 29 Apr 2021 17:45:56 +0000 (12:45 -0500)]
LU-14653 tests: Correct include path for sanity-lnet test_300

We need to supply an appropriate include path for sanity-lnet
test_300 when we're running in tree.

Test-Parameters: trivial testlist=sanity-lnet env=ONLY=300
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia04a713ef6f1989507a77a618328d31f74d48e0d
Reviewed-on: https://review.whamcloud.com/43500
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14649 lnet: Correct distance calculation of local NIDs 98/43498/3
Chris Horn [Wed, 28 Apr 2021 16:33:40 +0000 (11:33 -0500)]
LU-14649 lnet: Correct distance calculation of local NIDs

Multi-rail peers can have multiple local NIDs on the same net, but
LNetDist() may only identify a NID as local if it is the first one
returned by lnet_get_next_ni_locked().

We need to check all local NIs to find a match for the target NID
in LNetDist().

Add test to check LNetDist() calculation of local NIDs for a peer with
multiple NIDs on the same net.
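
The idea of checking every local NI rather than only the first one can
be sketched as a plain linear search. This is an illustration only:
demo_nid_is_local() and the flat array of 64-bit NIDs are assumptions
made for the example, not the LNet data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the local NI list: a flat array of NIDs.
 * Returns true if target matches any local NID, not just the first.
 */
static bool demo_nid_is_local(const uint64_t *local_nids, size_t count,
			      uint64_t target)
{
	for (size_t i = 0; i < count; i++) {
		if (local_nids[i] == target)
			return true;
	}
	return false;
}
```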

HPE-bug-id: LUS-9964
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ic8855f7798a90972c69d89d039d0bba882d8aed1
Reviewed-on: https://review.whamcloud.com/43498
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14513 osd: release o_guard before quota acquisition 18/42018/11
Alex Zhuravlev [Fri, 12 Mar 2021 06:17:11 +0000 (09:17 +0300)]
LU-14513 osd: release o_guard before quota acquisition

to avoid deadlocks as regular transactions (like write) start
a transaction, then grab o_guard.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2678677ed6c213e4bed30cc1218e48b8f2900dc4
Reviewed-on: https://review.whamcloud.com/42018
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-930 utils: add --help option to lfs sub-commands 59/34659/21
Andreas Dilger [Tue, 21 Jan 2020 10:29:36 +0000 (03:29 -0700)]
LU-930 utils: add --help option to lfs sub-commands

Add the "--help" and "-h" options to lfs sub-commands, and
print out an error message if an invalid argument is given.
Otherwise, it is possible to get a help message but have no
idea why the command is failing (e.g. typo in argument name).

Format the usage messages consistently, using {} to indicate a
choice between multiple required parameters, putting arguments
in [] for optional parameters, and using capitalized arguments.

Update respective man pages to list "--help|-h" option.

Remove the old SETSTRIPE and GETSTRIPE checks from spelling.txt
to avoid spurious checkpatch warnings.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic583c8161d1d5380e353f43a8613dd86c93ebbe5
Reviewed-on: https://review.whamcloud.com/34659
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14489 utils: fix 'lfs find --mdt-count' 66/43866/3
Andreas Dilger [Fri, 28 May 2021 21:15:10 +0000 (15:15 -0600)]
LU-14489 utils: fix 'lfs find --mdt-count'

Running "lfs find --mdt-count" causes the find to exit if there
is no directory striping, rather than continuing to the next item.

If cb_get_dirstripe() receives ENODATA then it should consider
that directory as not having any striping and move on, rather
than returning this error to the caller.

Don't crash in cb_getdirstripe() if it is called with a NULL
directory pointer or no directory is opened.
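
A minimal sketch of the ENODATA handling described above, assuming a
callback that receives a negative errno from the striping lookup.
demo_handle_dirstripe_rc() and its parameters are hypothetical and do
not reproduce the lfs source.

```c
#include <errno.h>

/*
 * Hypothetical helper: translate the result of a directory striping
 * lookup. -ENODATA means "no striping", so the walk should continue.
 */
static int demo_handle_dirstripe_rc(int rc, int found_count,
				    int *stripe_count)
{
	if (rc == -ENODATA) {
		*stripe_count = 0;	/* unstriped directory, not an error */
		return 0;
	}
	if (rc < 0)
		return rc;		/* real failure, pass it up */

	*stripe_count = found_count;
	return 0;
}
```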

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If8dd135a86a6a8911bf804542132b2e7a3ce7057
Reviewed-on: https://review.whamcloud.com/43866
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12577 llog: protect partial updates from readers 89/43589/8
Alex Zhuravlev [Sun, 9 May 2021 06:32:55 +0000 (09:32 +0300)]
LU-12577 llog: protect partial updates from readers

llog_osd_write_rec() adds a record in a few steps: the header is
updated first, then the record itself is appended. A per-loghandle
semaphore is used, but remote readers allocate a new separate
loghandle for every access (header reading, blocks), so the
readers can't use the loghandle's semaphore to avoid accessing partial
updates. Use object-based locking to serialize the writer
vs the readers.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie4e4d4a1e9a6fcdea9fcca7d80b0da920e786424
Reviewed-on: https://review.whamcloud.com/43589
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13124 scrub: check for multiple linked file 94/37194/31
Hongchao Zhang [Tue, 8 Jun 2021 21:50:34 +0000 (05:50 +0800)]
LU-13124 scrub: check for multiple linked file

Files on OSTs should have only one link, but a file could
have more than one link after a disk failure reported as
"multiply claimed block(s)" is fixed by e2fsck cloning the
conflicting blocks. This patch adds a check for these
multiply linked files to Scrub on the OST.

The name of an object under "O" depends on the object's FID;
the directory pattern is O/[FID_SEQ]/[SUB_DIR]/[FID_OID].
The inodes of these multiply linked files are normal, but
only one directory entry is compatible with the object, so
this patch scans all files under "O" to check whether each name
matches its FID.

Change-Id: I280a725939b037006935d47e9ef426a4a6a7b317
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37194
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14627 lnet: Ensure ref taken when queueing for discovery 18/43418/9
Chris Horn [Thu, 22 Apr 2021 19:51:44 +0000 (14:51 -0500)]
LU-14627 lnet: Ensure ref taken when queueing for discovery

Call lnet_peer_queue_for_discovery() in
lnet_discovery_event_handler() to ensure that we take a ref on
the peer when forcing it onto the discovery queue. This also ensures
that the peer state has LNET_PEER_DISCOVERING.

Add a test to sanity-lnet.sh that can trigger the refcount loss bug
in discovery.

HPE-bug-id: LUS-7651
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie2908668c4ffde0f993b5b7ea9aa58acd1d6fa9c
Reviewed-on: https://review.whamcloud.com/43418
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13569 tests: Check LNet Health recovery logic 23/39723/24
Chris Horn [Mon, 24 Aug 2020 21:14:07 +0000 (16:14 -0500)]
LU-13569 tests: Check LNet Health recovery logic

Add test cases to validate LNet Health recovery of local and peer
NIs.

The new test cases are added to the except list for aarch64 due to
an unresolved issue with the LNet drop functionality on that
architecture.

A few style issues are also addressed by this patch.

An asterisk was being supplied to the lctl net_drop_del commands when
this should have been the '-a' flag.

A bug in cleanup_testsuite is addressed where we were using the
wrong filename for the tmp files created by the subtests.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I965df2449770631caa03ced7726abb0ea76c17e6
Reviewed-on: https://review.whamcloud.com/39723
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13569 lnet: Add health ping stats 14/40314/12
Chris Horn [Thu, 15 Oct 2020 22:33:33 +0000 (17:33 -0500)]
LU-13569 lnet: Add health ping stats

Add the NI and peer NI ping count and next ping timestamp to
detailed output of lnetctl peer and net output.

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I208cb3ea0b08a2984572cf0ec9874dbd09f6168e
Reviewed-on: https://review.whamcloud.com/40314
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14729 osd-ldiskfs: declare dirty block groups correctly 90/43890/2
Wang Shilong [Wed, 2 Jun 2021 01:52:39 +0000 (09:52 +0800)]
LU-14729 osd-ldiskfs: declare dirty block groups correctly

The dirty block group calculation only includes the estimated
extents; indirect blocks and extent node/leaf blocks are missed,
which could leave us short of credits.

Fixes: 0271b17b80a82 ("LU-14134 osd-ldiskfs: reduce credits for new writing")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Iec8525823b04e909c030f94bf75b8eca60d31c50
Reviewed-on: https://review.whamcloud.com/43890
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14093 llapi: remove ignored qualifier 12/43712/3
Dominique Martinet [Sat, 15 May 2021 22:32:53 +0000 (07:32 +0900)]
LU-14093 llapi: remove ignored qualifier

Fixes the following warning on newer gcc with -Wextra:
.../include/lustre/lustreapi.h:1000:1: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
 1000 | const __u16 llapi_layout_string_flags(char *string);
      | ^~~~~

As the qualifier is ignored, this should make no code difference.

Test-parameters: trivial

Change-Id: I049166bbc586007cdecc93225d508693607ef04e
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-on: https://review.whamcloud.com/43712
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14093 utils: fix format-overflow warning 11/43711/3
Dominique Martinet [Sat, 15 May 2021 22:32:47 +0000 (07:32 +0900)]
LU-14093 utils: fix format-overflow warning

Fix the following warning on gcc11 by making numbuf big enough to fit
format content.

lfs.c: In function ‘print_quota’:
lfs.c:7719:48: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=]
 7719 |                         sprintf(numbuf[0], "%s*", strbuf);
      |                                                ^
lfs.c:7719:25: note: ‘sprintf’ output between 2 and 33 bytes into a destination of size 32
 7719 |                         sprintf(numbuf[0], "%s*", strbuf);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
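
The arithmetic behind the warning: a 32-byte source holds up to 31
characters, so "%s*" can need 31 + 1 + 1 = 33 bytes including the NUL.
The sketch below sizes the destination accordingly. The macro and
function names are hypothetical, and the actual size chosen by the
patch is not stated here.

```c
#include <stdio.h>

#define STRBUF_LEN 32			/* source buffer size from the warning */
#define NUMBUF_LEN (STRBUF_LEN + 1)	/* text + '*' marker + NUL */

/*
 * Hypothetical helper: append a '*' marker to strbuf's text in dst,
 * which must hold NUMBUF_LEN bytes. Returns the length written,
 * or -1 if the result would not fit.
 */
static int mark_over_limit(char *dst, const char *strbuf)
{
	int len = snprintf(dst, NUMBUF_LEN, "%s*", strbuf);

	return (len >= 0 && len < NUMBUF_LEN) ? len : -1;
}
```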

Test-parameters: trivial

Change-Id: I021e6ffff2e1405eadbe689f718674af4d4d6376
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-on: https://review.whamcloud.com/43711
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
4 months agoLU-13811 client: don't panic for mgs evictions 55/43655/3
Alexander Boyko [Tue, 11 May 2021 09:33:36 +0000 (05:33 -0400)]
LU-13811 client: don't panic for mgs evictions

Avoid client panics for MGS evictions.
Create a function to check if the eviction is coming
from an MGS, and if so to ignore it.

Rework dump_on_eviction and lbug_on_eviction so
all logic is handled in one place.

Test-Parameters: trivial
HPE-bug-id: LUS-197
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Signed-off-by: Ben Evans <jevans@cray.com>
Change-Id: Iaa8b06f52fa22ac891b569bc8a2271c8e1e63a3b
Reviewed-on: https://review.whamcloud.com/43655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13971 quota: report Pool Quotas for a user 75/39975/12
Sergey Cheremencev [Fri, 18 Sep 2020 13:24:10 +0000 (16:24 +0300)]
LU-13971 quota: report Pool Quotas for a user

This patch adds the ability to show quota limits and usage
from all pools per user. With this patch, the long option
--pool without an argument prints Pool Quotas for all
known pools:
lfs quota -u quota_usr --pool /mnt/testfs
Pools from lustre:
Quotas for pool: qpool1
Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre       0       0   10240       -       0       0       0       -
Quotas for pool: qpool2
Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre       0       0   20480       -       0       0       0       -

To get information for a specific pool you still
need to give the pool name after --pool:
lfs quota -u quota_usr --pool flash /mnt/testfs

The patch also adds sanity-quota_74 to check the new
feature.

Test-Parameters: trivial testlist=sanity-quota
HPE-bug-id: LUS-8720
Change-Id: Ib918eef84c2352946ce13342471f36e2b500df32
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/39975
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13419 osc: Move shrink update to per-write 14/38214/6
Patrick Farrell [Mon, 13 Apr 2020 16:23:42 +0000 (11:23 -0500)]
LU-13419 osc: Move shrink update to per-write

Updating the grant shrink interval is currently done for
each page submitted, rather than once per write.  Since
the grant shrink interval is in seconds, this is
unnecessary.

This came up because this function showed up in the perf
traces for https://review.whamcloud.com/#/c/38151/, and
it is called with the cl_loi_list_lock held.

Note that this change makes this access to the grant shrink
interval a 'dirty' access, without locking, but the grant
shrink interval is:
A) Already accessed like this in various places, and
B) can safely be out of date or suffer a lost update
without affecting correctness or performance.

IOR performance testing with this test:
mpirun -np 36 $IOR -o $LUSTRE -w -t 1M -b 2G -i 1 -F

No patches:
5942 MiB/s
With 38151:
14950 MiB/s
With 38151+this:
15320 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I8110b3c2570c183d58be2bccdbf76813ea3e373a
Reviewed-on: https://review.whamcloud.com/38214
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14732 utils: ensure hsm_user_request extent fields are zero 93/43893/3
James Simmons [Tue, 1 Jun 2021 23:00:58 +0000 (19:00 -0400)]
LU-14732 utils: ensure hsm_user_request extent fields are zero

Another patch changed the linking flags for liblustreapi, which
caused sanity-hsm 29c to fail. Debugging revealed the reason was
that the extent fields in the hsm_user_request sent to the kernel
were not zeroed out. For some reason the compiler flags hid this
bug. I found this is easily reproduced by removing
lhsmtool_posix_DEPENDENCIES, which is not needed anyway since
the build system can calculate those dependencies for us.
Update the function llapi_hsm_user_request_alloc to zero out
the struct hsm_user_request allocated.
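
Zeroing at allocation time can be sketched with calloc(). The struct
below is a simplified, hypothetical stand-in and does not match the
real hsm_user_request layout; only the pattern is the point.

```c
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical, simplified request item; not the real layout. */
struct demo_extent {
	uint64_t offset;
	uint64_t length;
};

struct demo_request {
	uint32_t action;
	uint32_t item_count;
	struct demo_extent items[];	/* flexible array member */
};

/* Allocate a request whose fields, including all extents, start at zero. */
static struct demo_request *demo_request_alloc(uint32_t item_count)
{
	size_t len = sizeof(struct demo_request) +
		     (size_t)item_count * sizeof(struct demo_extent);
	struct demo_request *req = calloc(1, len);	/* zero-filled */

	if (req != NULL)
		req->item_count = item_count;
	return req;
}
```

Using calloc() rather than malloc() means the result does not depend on
what the heap happened to contain, which is the kind of difference that
build or link flags can mask.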

Change-Id: I02c1d6b5cec1ed3e89086a6a00e30ca81768409c
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43893
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10350 lod: adjust stripe count to available ost count 82/43882/4
Bobi Jam [Fri, 28 May 2021 08:25:52 +0000 (16:25 +0800)]
LU-10350 lod: adjust stripe count to available ost count

* When the user specifies a stripe count of -1, or a stripe count
  larger than the ost count of a pool, adjust the stripe count;
  otherwise we cannot alloc enough stripe objects, as LOD reports:

  lod_alloc_specific() can't lstripe objid [obj_fid]: have %d want %u

  where %d is the ost count of the pool, and %u is the total ost count
  if the user specified a stripe count of -1, or the user-specified
  stripe count if that value is larger than %d.

* In ost-pool.sh, reset $MOUNT's stripe offset, so that the created
  directory will not inherit it from the root directory.

* Preserve the root directory layout in replay-single (run before
  ost-pools) to avoid leaving a bad layout on the root dir.
  Lustre-change: https://review.whamcloud.com/43872
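
The stripe count adjustment in the first item can be sketched as a
simple clamp. demo_adjust_stripe_count() is a hypothetical function
written for illustration; it is not the LOD code, and treating every
negative value like -1 is an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: limit a requested stripe count to the number
 * of OSTs in the pool. A request of -1 means "all available OSTs";
 * this sketch treats any negative value the same way.
 */
static uint32_t demo_adjust_stripe_count(int32_t requested,
					 uint32_t pool_ost_count)
{
	if (requested < 0 || (uint32_t)requested > pool_ost_count)
		return pool_ost_count;
	return (uint32_t)requested;
}
```

For a pool with 4 OSTs, requests of -1 and 8 both become 4, while a
request of 2 is left unchanged.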

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idf6884faf1271a3864710aeab0ba0eca154bf492
Reviewed-on: https://review.whamcloud.com/43882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14711 osc: Notify server if cache discard takes a long time 57/43857/7
Oleg Drokin [Fri, 28 May 2021 02:34:44 +0000 (22:34 -0400)]
LU-14711 osc: Notify server if cache discard takes a long time

Discarding a large number of pages from a mapping under a
single lock can take a really long time (750GB is over 170s).
Since there is no stream of RPCs sent to the server as with
read or write to prolong the DLM lock timeout, the server
may evict the client as it does not see progress is being made.

As such send periodic "empty" RPCs to the server to show the
client is still alive and working on the pages under the lock.

For compatibility reasons the RPC is formed as a one-byte
OST_READ request with a special flag set to avoid doing
actual IO, but older servers actually do the one-byte read.

Change-Id: I4603c83e92c328d93e29adce8cbfac3d561b25d5
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43857
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
4 months agoLU-14721 tests: wait_destroy_complete should check MDTs 70/43870/3
Oleg Drokin [Sat, 29 May 2021 03:45:20 +0000 (23:45 -0400)]
LU-14721 tests: wait_destroy_complete should check MDTs

Ever since destroys handling was moved to MDTs we need to
move waiting for destroys completion to MDTs as well.

Change-Id: I31440ec048b960206a903387d7050aa13e45008d
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43870
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoNew tag 2.14.52 2.14.52 v2_14_52
Oleg Drokin [Fri, 11 Jun 2021 17:09:09 +0000 (13:09 -0400)]
New tag 2.14.52

Change-Id: I9882f84941588ab2a92f1d736559a6e903b32d49
Signed-off-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-13783 procfs: fix improper prop_ops fields 80/43880/3
James Simmons [Mon, 31 May 2021 17:05:50 +0000 (13:05 -0400)]
LU-13783 procfs: fix improper prop_ops fields

The lod pool and nodemap proc_ops missed renaming the fields to
start with .proc_*. On newer distros like Ubuntu 20.04 HWE you
get the following compile error:

lustre-release/lustre/ptlrpc/nodemap_lproc.c:686:3: error: ‘const struct proc_ops’ has no member named ‘open’
  686 |  .open   = nodemap_ranges_open,

Test-Parameters: trivial
Fixes: 13cd0f9f667 ("LU-13344 libcfs: Abstract proc_fs with proc_ops")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I5fff7519a801f585690d468255f7ca6c73adcc90
Reviewed-on: https://review.whamcloud.com/43880
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14725 tests: sanity/27Q to remove own symlink in /tmp 75/43875/5
Alex Zhuravlev [Sun, 30 May 2021 15:36:39 +0000 (18:36 +0300)]
LU-14725 tests: sanity/27Q to remove own symlink in /tmp

otherwise any subsequent restart of MDS/MGS on a local setup
with ZFS backend gets stuck as zpool import scans /tmp and
stat's every found file.

Test-Parameters: trivial
Fixes: cd4caef54f ("LU-14583 llapi: handle symlinks in llapi_file_get_stripe()")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2eb4cb8819670acef0302e1fe5ab767be7f46842
Reviewed-on: https://review.whamcloud.com/43875
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14607 osp: separate buffer for large XATTR 36/43736/6
Lai Siyao [Wed, 19 May 2021 02:58:19 +0000 (10:58 +0800)]
LU-14607 osp: separate buffer for large XATTR

Once XATTR is too large to fit into PAGE_SIZE, allocate value in a
separate buffer for osp_xattr_entry.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ied090ff73e2e5cdeaf2d91a3670067210f2ab1d7
Reviewed-on: https://review.whamcloud.com/43736
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14661 lnet: Check if discovery toggled off in ping reply 08/43508/4
Chris Horn [Wed, 27 Jan 2021 18:22:09 +0000 (12:22 -0600)]
LU-14661 lnet: Check if discovery toggled off in ping reply

If a peer is initially discovered and found to have discovery
enabled, but the peer later reloads LNet with discovery disabled,
then we can delete the peer and re-create it the next time the peer
is discovered.

It is safe to delete and re-create the peer as long as it wasn't
configured manually.

In lnet_peer_deletion(), we need to use lnet_del_init() when removing
the peer from the discovery queue because the lnet_peer_del() code
path can result in a call to lnet_peer_queue_for_discovery() where
we check if the lp_dc_list is empty.

Test-Parameters: trivial
HPE-bug-id: LUS-9178
Fixes: aa7de0af69 ("LU-13895 lnet: Prevent discovery on peer marked deletion")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0b43d7541711a3b94c492082d4a29487ebe72b09
Reviewed-on: https://review.whamcloud.com/43508
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14660 lnet: Fix destination NID for discovery PUSH 07/43507/2
Chris Horn [Fri, 29 Jan 2021 14:08:08 +0000 (17:08 +0300)]
LU-14660 lnet: Fix destination NID for discovery PUSH

If we're sending a discovery PUSH after receiving a discovery
REPLY then we want to send via the same NID that the reply was
sent to. This introduces a challenge in selecting an appropriate
destination NID for the PUSH because lnet_select_pathway() will not
run the MR selection algorithm for choosing a peer NI if the source
NI has been specified.

It is reasonable to assume that the NID used by the message
originator in sending the REPLY is a suitable destination for the
discovery PUSH. Thus, we record this NID in the same location we
currently record the lp_disc_src_nid, and use it when sending the
PUSH. With this change, the only other user of lnet_peer_select_nid()
is lnet_peer_send_ping(). In the ping case we do not set a source NID,
so lnet_select_pathway() is free to choose any peer NI. So this change
allows us to get rid of lnet_peer_select_nid() altogether.

Alternatively, we would need to reproduce a lot of the path selection
algorithm inside lnet_peer_select_nid() in order to avoid sending to
unhealthy NIDs. It seems undesirable and unnecessary to duplicate that
logic.

Test-Parameters: trivial
HPE-bug-id: LUS-9333
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I47ef856075f049d71c395565974204b8f6fa9003
Reviewed-on: https://review.whamcloud.com/43507
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14432 misc: update e2fsprogs to 1.46.2.wc1 69/43469/2
Li Dongyang [Tue, 27 Apr 2021 23:31:59 +0000 (09:31 +1000)]
LU-14432 misc: update e2fsprogs to 1.46.2.wc1

Update Changelog for the new e2fsprogs release.

Change-Id: I173c43f1c777b7223a56841a06545c1741e1a903
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/43469
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
4 months agoLU-11839 iokit: Fix help message 14/43114/2
Arshad Hussain [Thu, 25 Mar 2021 10:50:59 +0000 (16:20 +0530)]
LU-11839 iokit: Fix help message

This patch fixes the help message of iokit-gather-stats
to properly add "--help" instead of "-help". Two
hyphens/dashes are expected by getopts.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I64270598fc19377571b68066d617b50fcb48cc12
Reviewed-on: https://review.whamcloud.com/43114
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14544 tests: duplicated export entries 30/42130/5
Elena Gryaznova [Mon, 22 Mar 2021 16:55:02 +0000 (19:55 +0300)]
LU-14544 tests: duplicated export entries

File /etc/exports could have a $MNTPNT entry if
previous NFS tests were interrupted.
With duplicated entries in /etc/exports, the NFS server
service fails to start/restart on RHEL 8 and SLES15.
The patch cleans up the exports file before adding the $MNTPNT entry.

Test-Parameters: trivial clientdistro=el8.3 serverdistro=el8.3 testlist=parallel-scale-nfsv3,parallel-scale-nfsv4
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-6291
Reviewed-by: Alexander Lezhoev <alexander.lezhoev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I738bd0e8e79dc1ba84e6aa70e06fa47c49a935e0
Reviewed-on: https://review.whamcloud.com/42130
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14274 tests: enhance racer to set extra layout 77/41077/3
Elena Gryaznova [Mon, 24 May 2021 16:45:12 +0000 (19:45 +0300)]
LU-14274 tests: enhance racer to set extra layout

The patch adds the ability to set:
  - a generic "RACER_EXTRA_LAYOUT" containing any kind of
    layout in addition to layouts defined for
    RACER_ENABLE_*;
  - an initial racer RACER_PROGS command list.
    The additional commands specified by RACER_EXTRA,
    RACER_ENABLE_REMOTE_DIRS, RACER_ENABLE_STRIPED_DIRS
    and RACER_ENABLE_MIGRATION are not ignored, i.e. the
    following parameters are to be set to run file_create only:
        RACER_ENABLE_REMOTE_DIRS=false
        RACER_ENABLE_STRIPED_DIRS=false
        RACER_ENABLE_MIGRATION=false.
The patch also fixes NUM_THREADS and MAX_FILES so that they are passed correctly.

Test-Parameters: testlist=racer
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9142
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I9994dfdb0555a3acd75daa4cfd27a0cb62074e36
Reviewed-on: https://review.whamcloud.com/41077
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14138 ptlrpc: move more members in PTLRPC request into pill 69/40669/11
Qian Yingjin [Tue, 17 Nov 2020 15:12:44 +0000 (23:12 +0800)]
LU-14138 ptlrpc: move more members in PTLRPC request into pill

Some data members in the data structure @ptlrpc_request can be
moved into the data structure @req_capsule:

  /** Request message - what client sent */
  struct lustre_msg *rq_reqmsg;
  /** Reply message - server response */
  struct lustre_msg *rq_repmsg;
  /** Fields that help to see if request and reply were swabbed */
  __u32 rq_req_swab_mask;
  __u32 rq_rep_swab_mask;

After these members are moved, @req_capsule can be used more
generally, and it becomes easier to pack and unpack sub-requests
in a batch PtlRPC request for the upcoming batch metadata
processing.
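
As a rough illustration of the move described above, the following sketch uses stand-in types rather than the real Lustre definitions. Every name ending in _sketch, and both helper functions, are hypothetical; the sketch only shows a request embedding a capsule that owns the message pointers and swab masks.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for struct lustre_msg; the real layout is not reproduced here. */
struct msg_sketch {
    uint32_t magic;
};

/* Hypothetical capsule ("pill") that owns the message state. */
struct capsule_sketch {
    struct msg_sketch *reqmsg;   /* request message - what client sent */
    struct msg_sketch *repmsg;   /* reply message - server response */
    uint32_t req_swab_mask;      /* request fields already swabbed */
    uint32_t rep_swab_mask;      /* reply fields already swabbed */
};

/* Hypothetical request: it embeds the capsule instead of duplicating it. */
struct request_sketch {
    struct capsule_sketch pill;
};

/* Mark field number 'index' (0..31) of the request message as swabbed. */
void capsule_set_req_swabbed(struct capsule_sketch *pill, unsigned int index)
{
    pill->req_swab_mask |= UINT32_C(1) << index;
}

/* Return nonzero if field 'index' (0..31) of the request was swabbed. */
int capsule_req_swabbed(const struct capsule_sketch *pill, unsigned int index)
{
    return (pill->req_swab_mask & (UINT32_C(1) << index)) != 0;
}
```

With this shape, code that only holds a pointer to the capsule can inspect the messages and swab state without needing the enclosing request, which is the property the batch processing is said to rely on.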

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ib6d942b79ebf1a444d63b55ad4bc94813cf947c7
Reviewed-on: https://review.whamcloud.com/40669
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months ago LU-14459 lmv: change default hash type to crush 84/43684/5
Andreas Dilger [Thu, 13 May 2021 01:20:04 +0000 (19:20 -0600)]
LU-14459 lmv: change default hash type to crush

Change the default hash type to CRUSH to minimize the number
of directory entries that need to be migrated.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I75aff45898044be9d12ae1bfad31b4693b3ebbe5
Reviewed-on: https://review.whamcloud.com/43684
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14690 kernel: new kernel [RHEL 8.4 4.18.0-305.3.1.el8_4] 25/43725/6
Jian Yu [Fri, 4 Jun 2021 07:37:18 +0000 (00:37 -0700)]
LU-14690 kernel: new kernel [RHEL 8.4 4.18.0-305.3.1.el8_4]

This patch makes changes to support the new RHEL 8.4 release
for the Lustre client.

Test-Parameters: trivial clientdistro=el8.4

Change-Id: I47d4706f9175d489ef0e6226492af20f44f0677e
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43725
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14703 build: fixup lustre-libcfs.m4 comments 76/43776/2
Olaf Faaland [Tue, 25 May 2021 02:10:36 +0000 (19:10 -0700)]
LU-14703 build: fixup lustre-libcfs.m4 comments

Several macros whose ends are commented to identify the macro being
defined, like

.. # LIBCFS_FUBAR

have the wrong macro named in the comment.  Fix those
end comments so they match the opening comment or the
AC_DEFUN correctly.

Test-Parameters: trivial
Change-Id: Ia40ccb9e271e90306df37d0028734a84684e42ef
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/43776
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14682 tests: sanity-flr to remove temporary files 69/43669/4
Alex Zhuravlev [Wed, 12 May 2021 04:33:02 +0000 (07:33 +0300)]
LU-14682 tests: sanity-flr to remove temporary files

otherwise the test fails frequently on small local systems due
to lack of space.

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7076bcf2346ae1ec7a4d1bead3d94b2c4bb57bbf
Reviewed-on: https://review.whamcloud.com/43669
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14665 lnet: simplify lnet_ni_add_interface 25/43525/12
Olaf Faaland [Tue, 4 May 2021 02:40:22 +0000 (19:40 -0700)]
LU-14665 lnet: simplify lnet_ni_add_interface

Remove an unnecessary counter and move the comment before
the relevant code.  Improve error messages.

Test-parameters: trivial

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Iffc7a128b16bc1b2be7a44413a5972c97b12a5fa
Reviewed-on: https://review.whamcloud.com/43525
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13284 tests: few tests miss MDS_MOUNT_OPTS/OST_MOUNT_OPTS 69/37669/18
Alex Zhuravlev [Thu, 20 Feb 2020 11:57:08 +0000 (14:57 +0300)]
LU-13284 tests: few tests miss MDS_MOUNT_OPTS/OST_MOUNT_OPTS

Some tests mount servers without MDS_MOUNT_OPTS or OST_MOUNT_OPTS,
so the localrecov mount option is lost and subsequent tests may fail
in a local testing environment.

Fixes: 8bd04b4e57 ("LU-12722 target: disable recovery for local clients")

Change-Id: I4e5d3a8678d027809ea9a0d129fbfbc8c6beae09
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37669
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14702 osc: cleanup comment in osc_object_is_contended 75/43775/3
Li Xi [Tue, 25 May 2021 00:49:01 +0000 (08:49 +0800)]
LU-14702 osc: cleanup comment in osc_object_is_contended

ll_file_is_contended() does not exist any more, so the comment
is invalid.

Test-Parameters: trivial
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: Ib68e8dc885e6812065c076d36dc61938a30d6980
Reviewed-on: https://review.whamcloud.com/43775
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14701 tests: wrong get[set]_osd_param() call 74/43774/2
Elena Gryaznova [Mon, 24 May 2021 15:14:26 +0000 (18:14 +0300)]
LU-14701 tests: wrong get[set]_osd_param() call

sanity-dom:sanityn:test_19 is always skipped because
get_osd_param() is called incorrectly for DOM=yes.
For osd-*, the MDT device is to be used instead of the default
  device=${2:-$FSNAME-OST*}

Fixes: a7625cd2f37a ("LU-3285 test: add Data-on-MDT tests and fixes")
Test-Parameters: trivial testlist=sanity-dom env=ONLY=sanityn
Test-Parameters: trivial testlist=sanityn env=ONLY=19
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9965
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I2bb9fc7fbaac966ea2254071e7ea82b963a93ad3
Reviewed-on: https://review.whamcloud.com/43774
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14693 mdt: skip DLM when opening volatile files 42/43742/2
John L. Hammond [Wed, 28 Apr 2021 18:43:51 +0000 (13:43 -0500)]
LU-14693 mdt: skip DLM when opening volatile files

In mdt_reint_open(), when opening a volatile file, skip taking an
MDS_INODELOCK_UPDATE lock on the parent directory.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8ee89710f52e8097e1412897de91159702560e4a
Reviewed-on: https://review.whamcloud.com/43742
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13055 libcfs: allow comma-separated masks 41/43741/4
Andreas Dilger [Wed, 19 May 2021 07:44:32 +0000 (01:44 -0600)]
LU-13055 libcfs: allow comma-separated masks

For debug and changelog mask names, allow a comma-separated list
of names to be given, so that the space-separated list does not
need to be quoted for use.
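
A minimal sketch of the idea, assuming a simple name-to-bit table: the parser below treats spaces and commas as equivalent separators. The table contents, bit values, and the function name parse_mask_list() are hypothetical and are not taken from libcfs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mask table; the bit values are chosen for illustration. */
struct mask_name {
    const char *name;
    unsigned int bit;
};

static const struct mask_name mask_table[] = {
    { "trace",    1u << 0 },
    { "inode",    1u << 1 },
    { "quota",    1u << 2 },
    { "dlmtrace", 1u << 3 },
};

/*
 * Parse a list of mask names separated by spaces and/or commas.
 * Returns 0 and stores the combined bits in *mask, or -1 if a name
 * is not in the table (in which case *mask is left unchanged).
 */
int parse_mask_list(const char *str, unsigned int *mask)
{
    static const char seps[] = " ,";
    unsigned int result = 0;
    const char *p = str;

    while (*p != '\0') {
        size_t len, i;
        int found = 0;

        p += strspn(p, seps);       /* skip any run of separators */
        len = strcspn(p, seps);     /* length of the next name */
        if (len == 0)
            break;

        for (i = 0; i < sizeof(mask_table) / sizeof(mask_table[0]); i++) {
            if (strlen(mask_table[i].name) == len &&
                strncmp(mask_table[i].name, p, len) == 0) {
                result |= mask_table[i].bit;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
        p += len;
    }
    *mask = result;
    return 0;
}
```

Because both separators are handled by the same strspn()/strcspn() pair, "trace quota" and "trace,quota" produce the same mask, and only the first form needs shell quoting.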

Change sanity-quota to use a comma-separated list to verify it works.

Fix a couple of test cases where the debug parameter is set and
printed overly verbosely during tests.

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icf1e3ebc74f0e48b38a65486b2275ec4c33ebbe5
Reviewed-on: https://review.whamcloud.com/43741
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14688 mdt: changelog purge deletes plain llog 19/43719/2
Alexander Boyko [Mon, 17 May 2021 13:29:01 +0000 (09:29 -0400)]
LU-14688 mdt: changelog purge deletes plain llog

When cancelling a large number of records, changelog purge can delete
a whole plain llog file instead of cancelling records one by one.
The patch also fixes the race between llog_destroy and llog_next_block.

HPE-bug-id: LUS-9950
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I47c2ed97945e979745255381f83b6a417d7ba8b1
Reviewed-on: https://review.whamcloud.com/43719
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-11188 lfs: add "--perm" option to "lfs find" 15/43715/4
Courrier Guillaume [Thu, 29 Apr 2021 09:35:01 +0000 (11:35 +0200)]
LU-11188 lfs: add "--perm" option to "lfs find"

Add support for the "--perm" option to "lfs find".
The option supports both octal and symbolic representations and
follows the POSIX standard.
As with GNU find, it supports the '-' and '/' modifiers before the
permission.
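
The three matching modes can be sketched as follows. This is an illustration of the semantics documented for GNU find, not the code added to lfs; the enum, the PERM_BITS macro, and perm_matches() are hypothetical names, and the treatment of an empty mode under '/' follows GNU find's documentation rather than anything stated in this message.

```c
#include <sys/types.h>

enum perm_match {
    PERM_EXACT,   /* "--perm MODE":  permission bits equal MODE */
    PERM_ALL,     /* "--perm -MODE": every bit in MODE is set */
    PERM_ANY,     /* "--perm /MODE": at least one bit in MODE is set */
};

#define PERM_BITS 07777   /* setuid, setgid, sticky and rwx bits */

/* Return nonzero if file_mode satisfies 'want' under the given mode. */
int perm_matches(mode_t file_mode, mode_t want, enum perm_match how)
{
    mode_t bits = file_mode & PERM_BITS;

    want &= PERM_BITS;
    switch (how) {
    case PERM_EXACT:
        return bits == want;
    case PERM_ALL:
        return (bits & want) == want;
    case PERM_ANY:
        /* GNU find documents that an empty MODE matches any file. */
        return want == 0 || (bits & want) != 0;
    }
    return 0;
}
```

For a regular file with mode 0644, an exact test against 0644 matches, "-0600" matches because both owner bits are set, and "/0111" does not match because no execute bit is set.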

Signed-off-by: Guillaume Courrier <guillaume.courrier@cea.fr>
Change-Id: I8e1292421986c3a4bde686f3c7dc7bfcb679cabc
Reviewed-on: https://review.whamcloud.com/43715
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
4 months ago LU-9537 utils: implement "lfs getstripe --fid" for directories 14/43714/4
Yoann Valeri [Wed, 12 May 2021 09:48:04 +0000 (11:48 +0200)]
LU-9537 utils: implement "lfs getstripe --fid" for directories

Enhance the lfs command by displaying a directory fid when using "lfs
getstripe --fid" on one.

When displaying information through "lfs getstripe --fid", we would
check if the given path was associated with a directory or not. If so,
the fid display would just be skipped, showing a simple blank line.
However, a user could still find the fid of a directory by using "lfs
path2fid" on the same directory.  Therefore, this patch adds a hook to
the underlying "llapi_fd2fid()" (called internally by "lfs path2fid")
when trying to display a directory fid with "lfs getstripe --fid".

Signed-off-by: Yoann Valeri <yoann.valeri@cea.fr>
Change-Id: Ia153717e3feb1a359b8b54297995365fc34a1c29
Reviewed-on: https://review.whamcloud.com/43714
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-12678 lnet: use list_for_each_entry() 91/43591/4
James Simmons [Mon, 10 May 2021 20:10:29 +0000 (16:10 -0400)]
LU-12678 lnet: use list_for_each_entry()

Several loops use list_for_each(), then call list_entry()
each time in the loop. This complexity can be replaced with
the use of list_for_each_entry().
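
To make the difference concrete, the sketch below reimplements a small subset of the kernel list macros in user space and iterates the same list both ways. The element type struct peer_sketch and the two sum functions are hypothetical; the macros are simplified stand-ins for the ones in the kernel's list.h and rely on the __typeof__ extension of GCC and Clang.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Append 'item' just before 'head', i.e. at the tail of the list. */
void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

#define list_for_each_entry(pos, head, member)                        \
    for (pos = list_entry((head)->next, __typeof__(*pos), member);    \
         &pos->member != (head);                                      \
         pos = list_entry(pos->member.next, __typeof__(*pos), member))

/* Hypothetical element type, used only for this illustration. */
struct peer_sketch {
    int id;
    struct list_head link;
};

/* Before: iterate raw list nodes and convert each one by hand. */
int sum_ids_old(struct list_head *head)
{
    struct list_head *tmp;
    int sum = 0;

    list_for_each(tmp, head) {
        struct peer_sketch *p = list_entry(tmp, struct peer_sketch, link);

        sum += p->id;
    }
    return sum;
}

/* After: the macro yields the containing structure directly. */
int sum_ids_new(struct list_head *head)
{
    struct peer_sketch *p;
    int sum = 0;

    list_for_each_entry(p, head, link)
        sum += p->id;
    return sum;
}
```

Both functions visit the same elements in the same order; the second form simply drops the temporary struct list_head cursor and the per-iteration list_entry() call.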

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Ib7968466c4fce5173b20cbaf6c878975ba522d43
Reviewed-on: https://review.whamcloud.com/43591
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-12836 osd-zfs: Catch all ZFS pool change events 52/43552/3
Tony Hutter [Fri, 12 Mar 2021 01:23:16 +0000 (17:23 -0800)]
LU-12836 osd-zfs: Catch all ZFS pool change events

This change adds the following symlinks:

  vdev_attach-lustre -> statechange-lustre.sh
  vdev_remove-lustre -> statechange-lustre.sh
  vdev_clear-lustre -> statechange-lustre.sh

This makes it so the statechange-lustre.sh script is also called on
all ZFS events that could change the pool state.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Change-Id: I18edc86749e8ab91bb45f21aafd3fd47e78cbaef
Reviewed-on: https://review.whamcloud.com/43552
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14663 mdc: start changelog thread upon first access 13/43513/5
Alex Zhuravlev [Sun, 2 May 2021 09:16:01 +0000 (12:16 +0300)]
LU-14663 mdc: start changelog thread upon first access

thus leaving the caller a chance to set CHANGELOG_FLAG_FOLLOW,
otherwise the thread (started from open()) can reach the end
of the changelog and exit early.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ic14b6c991010bbe5197b5a8b0fedf0f4007e98c1
Reviewed-on: https://review.whamcloud.com/43513
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13717 sec: limit hard links to linkEA size for enc files 87/43387/4
Sebastien Buisson [Mon, 19 Oct 2020 14:23:05 +0000 (23:23 +0900)]
LU-13717 sec: limit hard links to linkEA size for enc files

Some operations on encrypted files require identifying all names for
files having the same FID. For instance, for lookup, getattr or unlink
on encrypted files without the encryption key, we need to perform an
operation by FID instead of the actual name.
In order to make operations by FID unambiguous on the server side, we
limit the number of possible hard links for encrypted files
to what the linkEA can contain.
Currently linkEA stores 4KiB of links, that is 14 NAME_MAX links, or
119 16-byte names.
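
The two figures above can be reproduced with a small calculation. The sketch assumes a 24-byte linkEA header and 18 bytes of per-link overhead (a 2-byte record length plus a 16-byte parent FID); these sizes are assumptions chosen to be consistent with the numbers quoted in the message, not values stated by it, and all identifiers are hypothetical.

```c
#include <stddef.h>

#define LINKEA_SIZE_SKETCH      4096  /* "4KiB of links" from the message */
#define LINKEA_HDR_SKETCH       24    /* assumed header size */
#define LINKEA_ENTRY_OVERHEAD   18    /* assumed: 2-byte length + 16-byte FID */

/* Number of link records that fit when every name is 'name_len' bytes. */
size_t linkea_max_links(size_t name_len)
{
    size_t entry = LINKEA_ENTRY_OVERHEAD + name_len;

    return (LINKEA_SIZE_SKETCH - LINKEA_HDR_SKETCH) / entry;
}
```

Under these assumptions, names of NAME_MAX (255) bytes take 273 bytes each and 4072 / 273 rounds down to 14, while 16-byte names take 34 bytes each and 4072 / 34 rounds down to 119.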

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I20a01874899f95b2ff61e05b2aa6851d135633e8
Reviewed-on: https://review.whamcloud.com/43387
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14629 sec: forbid file rename from enc to unencrypted dir 04/43404/6
Sebastien Buisson [Thu, 22 Apr 2021 09:26:51 +0000 (11:26 +0200)]
LU-14629 sec: forbid file rename from enc to unencrypted dir

fscrypt allows renaming an encrypted file from an encrypted directory
into an unencrypted directory. But it leaves the file encrypted,
sitting in an unencrypted directory, which can lead to unexpected
issues.
So just prevent this kind of rename, and adapt sanity-sec test_47
accordingly.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I38e17caa4786c1c8d80a363a826a5aa298eb0980
Reviewed-on: https://review.whamcloud.com/43404
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14586 tests: set mpi np correctly 17/43217/2
Elena Gryaznova [Tue, 6 Apr 2021 08:05:02 +0000 (11:05 +0300)]
LU-14586 tests: set mpi np correctly

The number of MPI processes is to be calculated
based on the number of clients in the clients subset.

Fixes: 9ecb000 ("LU-13281 tests: ha.sh improvements")
Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9716
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: If574743e2e29a309a8d7a021056fa726495fa959
Reviewed-on: https://review.whamcloud.com/43217
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14340 tests: remove stale test-framework functions 66/41266/3
Andreas Dilger [Tue, 19 Jan 2021 04:43:57 +0000 (21:43 -0700)]
LU-14340 tests: remove stale test-framework functions

Delete functions not referenced anywhere in test-framework.sh
(each one will still appear only once, where it is defined):

  $ grep "^[a-z].* *()" lustre/tests/test-framework.sh |
  sed -e 's/function //' -e 's/ *(.*//' |
  while read F; do (( $(grep $F lustre/tests/*.sh | wc -l) > 1 )) ||
      echo "$F"
  done

  mdsdevlabel ostdevlabel cleanup_check obd_name unmount_zfs
  is_empty_fs get_svr_devs at_min_get canonical_path agts_nodes
  mixed_ost_devs setstripe_nfsserver delayed_recovery_enabled
  mds_on_old_device remove_ost_objects remove_mdt_files
  duplicate_mdt_files get_block_size

The unmount_zfs() function returned by the above check *is*
used via unmount_fstype() calling it as "unmount_$fstype".

Fixes: 5a3dfc2b5d90 ("LU-7301 tests: delete old lfsck tests")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I71842152a87f15918147da860745ef8e981f6121
Reviewed-on: https://review.whamcloud.com/41266
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13950 lnet: do not crash if lnet_sock_getaddr returns error 34/39834/6
Artem Blagodarenko [Tue, 25 Aug 2020 10:01:11 +0000 (06:01 -0400)]
LU-13950 lnet: do not crash if lnet_sock_getaddr returns error

Some network issues lead to a panic in ksocknal_accept():

rc = lnet_sock_getaddr(sock, true, &peer_ip, &peer_port);
LASSERT(rc == 0); /* we succeeded before */

Let's pass this error to the caller.
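
A minimal sketch of the pattern, using stand-in types: the accept path checks the return code and hands it back instead of asserting on it. struct sock_sketch, sock_getaddr_sketch(), and accept_sketch() are hypothetical, and the -ENOTCONN error is only an example of what a failing lookup might return.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical socket handle; 'connected' models whether the peer is still there. */
struct sock_sketch {
    int connected;
    uint32_t peer_ip;
    int peer_port;
};

/* Stand-in for lnet_sock_getaddr(): fails once the connection is gone. */
int sock_getaddr_sketch(const struct sock_sketch *sock, uint32_t *ip, int *port)
{
    if (!sock->connected)
        return -ENOTCONN;
    *ip = sock->peer_ip;
    *port = sock->peer_port;
    return 0;
}

/* Accept path: return the error to the caller rather than asserting rc == 0. */
int accept_sketch(const struct sock_sketch *sock)
{
    uint32_t peer_ip;
    int peer_port;
    int rc;

    rc = sock_getaddr_sketch(sock, &peer_ip, &peer_port);
    if (rc != 0)
        return rc;      /* previously: LASSERT(rc == 0) */

    /* ... the real code would go on to use peer_ip and peer_port ... */
    (void)peer_ip;
    (void)peer_port;
    return 0;
}
```

The assertion encoded the belief that a call which succeeded earlier cannot fail later; a connection that drops between the two calls breaks that belief, so the failure is treated as an ordinary error.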

Change-Id: I34d43c19b4e75422db50e7abb02cac3510882b0d
hpe-bug-id: LUS-9256
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157753
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-on: https://review.whamcloud.com/39834
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14687 llite: Return errors for aio 22/43722/7
Patrick Farrell [Wed, 19 May 2021 18:08:57 +0000 (14:08 -0400)]
LU-14687 llite: Return errors for aio

The aio code incorrectly discards errors from
ll_direct_rw_pages.  Fix this and add a test for this.

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I49dadd0b3692820687fa6a1339e00516edf7a5d5
Reviewed-on: https://review.whamcloud.com/43722
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-11290 osc: Batch gang_lookup cbs 89/33089/10
Patrick Farrell [Wed, 3 Mar 2021 06:50:04 +0000 (09:50 +0300)]
LU-11290 osc: Batch gang_lookup cbs

The osc_page_gang_lookup callbacks can be trivially
converted to operate in batches rather than one page at a
time.  This improves cancellation time for locks protecting
large numbers of pages by about 10% (after landing
another optimization, "LU-11290 ldlm: page discard speedup",
the improvement is 6% when canceling a lock for a 30GB cached file).

Truncate to zero time (with one lock protecting many pages)
was improved by about 5-10% as well.  Lock weighing
performance should be improved slightly as well, but is
tricky to benchmark.
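
The general shape of such a conversion can be sketched as follows: the walker collects items into a small array and invokes the callback once per batch instead of once per page. This is a generic illustration, not the osc code; the batch size, the callback type, and all function names are hypothetical.

```c
#include <stddef.h>

#define BATCH_SIZE_SKETCH 16    /* hypothetical batch size */

/* Callback type: receives up to BATCH_SIZE_SKETCH page indices at once. */
typedef void (*batch_cb_t)(const unsigned long *pages, size_t count,
                           void *data);

/*
 * Walk the half-open range [start, end) and invoke the callback once
 * per batch. Returns the number of callback invocations.
 */
size_t for_each_page_batched(unsigned long start, unsigned long end,
                             batch_cb_t cb, void *data)
{
    unsigned long batch[BATCH_SIZE_SKETCH];
    size_t n = 0, calls = 0;
    unsigned long idx;

    for (idx = start; idx < end; idx++) {
        batch[n++] = idx;
        if (n == BATCH_SIZE_SKETCH) {
            cb(batch, n, data);     /* flush a full batch */
            calls++;
            n = 0;
        }
    }
    if (n > 0) {
        cb(batch, n, data);         /* flush the final partial batch */
        calls++;
    }
    return calls;
}

/* Example callback: add the number of pages received to a counter. */
void count_pages_cb(const unsigned long *pages, size_t count, void *data)
{
    (void)pages;
    *(size_t *)data += count;
}
```

For 40 pages and a batch size of 16, the callback runs three times (16, 16, and 8 pages) rather than 40 times, which is where the per-call overhead is saved.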

HPE-bug-id: LUS-6432
Change-Id: Ib30594ae97182cbeb18051d6cee860c97ae7e119
Signed-off-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/33089
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14273 tests: enhance ha.sh to run custom cmd on bg 73/41073/2
Elena Gryaznova [Tue, 22 Dec 2020 15:19:15 +0000 (18:19 +0300)]
LU-14273 tests: enhance ha.sh to run custom cmd on bg

We need this ability to run lfs migrate in parallel
with mdtest and IOR.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9371
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: Id52ed0731aba24d3f40813da5fd2bb9b94ae63e5
Reviewed-on: https://review.whamcloud.com/41073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-13942 obd: check if sbi->ll_md_exp is initialized 12/39812/6
Artem Blagodarenko [Fri, 21 Aug 2020 17:43:38 +0000 (13:43 -0400)]
LU-13942 obd: check if sbi->ll_md_exp is initialized

A NULL dereference at the start of the obd_statfs() function is
possible because of a race between ll_fill_super() and lctl.

ll_md_exp is initialized in ll_fill_super()->
client_common_fill_super(), but if the mount process gets stuck
in lustre_process_log() it doesn't reach client_common_fill_super().
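
The guard can be sketched as below, with stand-in types. The structure names, statfs_sketch(), and the -ENODEV return value are hypothetical; the message does not say which error the real check returns.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the metadata export object. */
struct export_sketch {
    int unused;
};

/* Hypothetical superblock info: md_exp stays NULL until mount setup completes. */
struct sbi_sketch {
    struct export_sketch *md_exp;
};

/* Refuse the request while the export is not set up yet. */
int statfs_sketch(const struct sbi_sketch *sbi)
{
    if (sbi->md_exp == NULL)
        return -ENODEV;     /* error code chosen for illustration only */

    /* ... the real code would call into the metadata export here ... */
    return 0;
}
```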

Change-Id: Ife72a62ba42573e2a9c6d244e36cde738b70c15a
hpe-bug-id: LUS-9150
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157732
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/39812
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-14642 flr: transfer layout version on layout change 72/43472/7
Bobi Jam [Wed, 28 Apr 2021 05:16:05 +0000 (13:16 +0800)]
LU-14642 flr: transfer layout version on layout change

After a layout change (mirror extend/split), the file's layout version
needs to be transferred to the OST as soon as possible so that
subsequent IO is not blocked when OFD checks the layout version stored
in the xattr XATTR_NAME_FID and finds that the layout version from the
client IO is bigger (ofd_verify_layout_version()).

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I353800e868eaf13e3c795926b0d76fb1eb45c535
Reviewed-on: https://review.whamcloud.com/43472
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14645 utils: setstripe cleanup 65/43465/3
Vitaly Fertman [Tue, 27 Apr 2021 19:15:30 +0000 (22:15 +0300)]
LU-14645 utils: setstripe cleanup

lfs setstripe checks stripe parameters differently for PFL and !PFL
layouts. Whereas the PFL layout is checked in comp_args_to_layout()
individually and in llapi_layout_sanity_cb() in pairs, !PFL layout
verification is done partially in several places. Create a common
llapi_stripe_param_verify() for this purpose. Make the checks for
both cases symmetric.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I456b1b2e876229ac1a354d4e3879624325856574
HPE-bug-id: LUS-9886
Reviewed-on: https://es-gerrit.dev.cray.com/158589
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/43465
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14644 vvp: wait for nrpages to be updated 64/43464/2
Vitaly Fertman [Tue, 27 Apr 2021 18:43:06 +0000 (21:43 +0300)]
LU-14644 vvp: wait for nrpages to be updated

truncate_inode_pages() says there may still be a page in the process
of deletion upon return. Wait for the other thread that is doing
__delete_from_page_cache() to get nrpages updated.

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I165b3d0866efaf2eb7e977520ebba4ee831874ab
HPE-bug-id: LUS-8842
Reviewed-on: https://es-gerrit.dev.cray.com/158557
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/43464
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14594 ptlrpc: do not match reply with resent RPC 42/43242/4
Vitaly Fertman [Thu, 8 Apr 2021 12:00:11 +0000 (15:00 +0300)]
LU-14594 ptlrpc: do not match reply with resent RPC

The server is able to filter by the connection ID and drop late
RPCs from previous connections. However, no such filtering happens
for replies, and this is a problem in some cases.

Allocate new matchbits for resends and check replies by them, instead
of xid. Connect RPCs are exceptions due to interop with old server -
at the time of connect we do not know yet if the server supports it.
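
One way to picture the scheme is the sketch below: each send, including a resend, gets fresh matchbits, and a reply is accepted only if it carries the matchbits of the latest send. The types and functions are hypothetical, the counter is a simplification, and the assumption that the xid is unchanged across resends is inferred from the wording above.

```c
#include <stdint.h>

/* Hypothetical client-side request state. */
struct rpc_sketch {
    uint64_t xid;           /* assumed to stay the same across resends */
    uint64_t matchbits;     /* refreshed on every send and resend */
};

static uint64_t next_matchbits = 1000;  /* arbitrary starting value */

/* First transmission: record the xid and allocate matchbits. */
void rpc_send(struct rpc_sketch *rpc, uint64_t xid)
{
    rpc->xid = xid;
    rpc->matchbits = next_matchbits++;
}

/* Resend: keep the xid but allocate fresh matchbits. */
void rpc_resend(struct rpc_sketch *rpc)
{
    rpc->matchbits = next_matchbits++;
}

/* A reply is accepted only if it carries the matchbits of the latest send. */
int reply_matches(const struct rpc_sketch *rpc, uint64_t reply_matchbits)
{
    return rpc->matchbits == reply_matchbits;
}
```

A late reply to the original transmission carries the old matchbits and is rejected, even though its xid would still match the request.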

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I2aad037002b488b0c3371544ede0c47940f87efe
HPE-bug-id: LUS-9596
Reviewed-on: https://es-gerrit.dev.cray.com/158446
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/43242
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14673 sec: annotate algorithms taking optional key 56/43656/3
Sebastien Buisson [Tue, 11 May 2021 08:59:03 +0000 (10:59 +0200)]
LU-14673 sec: annotate algorithms taking optional key

Crypto algorithms implementing a ->setkey() method but that can also
be used without a key must set the CRYPTO_ALG_OPTIONAL_KEY flag if
defined in the kernel.
In Lustre, the adler32 implementation defines a ->setkey() method, but
its "key" is not actually a cryptographic key.

Linux-commit: a208fa8f33031b9e0aba44c7d1b7e68eb0cbd29e

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I362211d1b1aa3763fe1481cebb3629b255f29e41
Reviewed-on: https://review.whamcloud.com/43656
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
5 months ago LU-14689 hsm: starting running HSM coordinator should success 20/43720/2
Li Xi [Mon, 17 May 2021 14:49:55 +0000 (22:49 +0800)]
LU-14689 hsm: starting running HSM coordinator should success

When starting a running coordinator, the command should succeed
no matter how many times the command runs.

And this should be the same for stopping a stopped coordinator.
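
The intended behaviour is that start and stop are idempotent. A minimal sketch with a hypothetical two-state coordinator (all names are illustrative and the real coordinator has more states):

```c
/* Hypothetical two-state model of the coordinator. */
enum cdt_state_sketch {
    CDT_STOPPED_SKETCH,
    CDT_RUNNING_SKETCH,
};

struct cdt_sketch {
    enum cdt_state_sketch state;
    int transitions;    /* counts real state changes, for illustration */
};

/* Starting an already running coordinator succeeds and changes nothing. */
int cdt_start_sketch(struct cdt_sketch *cdt)
{
    if (cdt->state == CDT_RUNNING_SKETCH)
        return 0;
    cdt->state = CDT_RUNNING_SKETCH;
    cdt->transitions++;
    return 0;
}

/* Stopping an already stopped coordinator succeeds and changes nothing. */
int cdt_stop_sketch(struct cdt_sketch *cdt)
{
    if (cdt->state == CDT_STOPPED_SKETCH)
        return 0;
    cdt->state = CDT_STOPPED_SKETCH;
    cdt->transitions++;
    return 0;
}
```

Scripts can then issue the start or stop command unconditionally without first querying the current state.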

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I99169de35d6fcc11e03604ac63cdc4358e25b3d2
Reviewed-on: https://review.whamcloud.com/43720
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14549 llite: refresh layout after mirror merge/split 16/43716/3
Bobi Jam [Mon, 17 May 2021 09:14:33 +0000 (17:14 +0800)]
LU-14549 llite: refresh layout after mirror merge/split

Mirror merge/split updates the file's LOVEA and revokes the client's
layout lock, but the client issuing the layout change needs to refresh
its layout (lov->lsm) as well.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I7671efe2fe5354ba0e1503b146045165608e042c
Reviewed-on: https://review.whamcloud.com/43716
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14430 mdd: use own rec_hdr for changelog declare 83/43683/3
Andreas Dilger [Thu, 13 May 2021 00:41:47 +0000 (18:41 -0600)]
LU-14430 mdd: use own rec_hdr for changelog declare

Do not use an lu_buf just to declare the changelog record.  This
only needs llog_rec_hdr to pass in lrh_len, so declaring rec_hdr
on the stack avoids the overhead of using the lu_buf.
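
A minimal user-space sketch of the idea, assuming a declare step that reads
only the record length: the header lives on the caller's stack and no
separate buffer is prepared. demo_rec_hdr, demo_declare_record and
demo_changelog_declare are hypothetical names modelled loosely on
llog_rec_hdr, not the real mdd or llog interfaces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a log record header; only lrh_len is used. */
struct demo_rec_hdr {
	uint32_t lrh_len;
	uint32_t lrh_index;
	uint32_t lrh_type;
	uint32_t lrh_id;
};

/* Stand-in declare step: reserves space based on the header length alone. */
static int demo_declare_record(const struct demo_rec_hdr *hdr,
			       size_t *reserved)
{
	if (hdr == NULL || hdr->lrh_len == 0)
		return -1;

	*reserved += hdr->lrh_len;
	return 0;
}

/* The header is a plain stack variable, so no buffer setup is needed. */
static int demo_changelog_declare(size_t reclen, size_t *reserved)
{
	struct demo_rec_hdr rec_hdr = { .lrh_len = (uint32_t)reclen };

	return demo_declare_record(&rec_hdr, reserved);
}
```

The stack variable goes out of scope when the declare call returns, which is
acceptable in this model because the declare step only reads lrh_len.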

Fixes: f3d03bc38a ("LU-14430 mdd: fix inheritance of big default ACLs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7b6f1d761aa98aa6ecb023894bde03dce23ebbe5
Reviewed-on: https://review.whamcloud.com/43683
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14681 obclass: fix typo of comment on job ID 67/43667/3
Li Xi [Wed, 12 May 2021 02:21:16 +0000 (10:21 +0800)]
LU-14681 obclass: fix typo of comment on job ID

The comments on how to get the job ID are confusing because of a typo.

Test-Parameters: trivial
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I9d714323f106dfb76eafc8d70346409b38a9b66b
Reviewed-on: https://review.whamcloud.com/43667
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>