Whamcloud gitweb: fs/lustre-release.git
2 months ago | New tag 2.14.53 | tags: 2.14.53, v2_14_53
Oleg Drokin [Fri, 30 Jul 2021 19:20:13 +0000 (15:20 -0400)]
New tag 2.14.53

Change-Id: Idff781cb6333d4f0af90c0e729f3cd0231022a5a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag | 13/40813/4
Qian Yingjin [Tue, 1 Dec 2020 08:22:08 +0000 (16:22 +0800)]
LU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag

The upcoming PCC-RO feature is combined with FLR and extends
the on-disk data structure 'enum lov_comp_md_flags' for layout
components. It adds a new layout flag: LCM_FL_PCC_RDONLY.

  enum lov_comp_md_flags {
          LCM_FL_NONE          = 0x0,
          LCM_FL_RDONLY        = 0x1,
          LCM_FL_WRITE_PENDING = 0x2,
          LCM_FL_SYNC_PENDING  = 0x3,
          LCM_FL_PCC_RDONLY    = 0x8,
          LCM_FL_FLR_MASK      = 0xB,
  };

The LCM_FL_PCC_RDONLY flag, which is dedicated to PCC-RO, is
distinct from LCM_FL_RDONLY.
A PCC-RO cached file can be in one of the following states:
- LCM_FL_PCC_RDONLY | LCM_FL_RDONLY: all FLR components are
  synced and up to date, and the replicated file is in the
  read-only state. A client has then attached the file to the
  PCC backend in PCC-RO mode.
- LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING: the file was
  modified at some point and the data in its layout components
  is not synced. The MDT has already picked a primary replica
  and marked the other components as STALE. A client can still
  attach the file in PCC-RO mode; on that client, both the
  primary component and the PCC copy are up to date.
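Because LCM_FL_RDONLY, LCM_FL_WRITE_PENDING and LCM_FL_SYNC_PENDING are enumerated values (0x1, 0x2, 0x3) rather than independent bits, while LCM_FL_PCC_RDONLY is a separate bit (0x8) inside LCM_FL_FLR_MASK, a reader of lcm_flags has to separate the two before comparing states. The helpers flr_state() and pcc_ro_attached() below are hypothetical, and treating lcm_flags as a 16-bit field is an assumption; this is a sketch of the decoding, not Lustre's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from enum lov_comp_md_flags quoted above. */
enum lov_comp_md_flags {
	LCM_FL_NONE          = 0x0,
	LCM_FL_RDONLY        = 0x1,
	LCM_FL_WRITE_PENDING = 0x2,
	LCM_FL_SYNC_PENDING  = 0x3,
	LCM_FL_PCC_RDONLY    = 0x8,
	LCM_FL_FLR_MASK      = 0xB,
};

/* Hypothetical helper: the FLR state is the FLR bits with the PCC-RO bit
 * cleared, so it can be compared with ==, never tested bit by bit. */
static uint16_t flr_state(uint16_t lcm_flags)
{
	return lcm_flags & LCM_FL_FLR_MASK & ~LCM_FL_PCC_RDONLY;
}

/* Hypothetical helper: is the file attached in PCC-RO mode? */
static bool pcc_ro_attached(uint16_t lcm_flags)
{
	return (lcm_flags & LCM_FL_PCC_RDONLY) != 0;
}
```

With this split, LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING (0xA) decodes to the WRITE_PENDING state plus a PCC-RO attachment, which is the second case listed above.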

Because a new LCM_FL_PCC_RDONLY flag is added, old clients may not
understand this new FLR layout flag, which may result in
inconsistent data access.

This patch adds this new flag for the purpose of compatibility and
interoperability.

Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I2e96f413c0b35355503c20dfea0a39d39a216d90
Reviewed-on: https://review.whamcloud.com/40813
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-14814 osc: osc: Do not flush on lockless cancel | 52/44152/7
Patrick Farrell [Tue, 6 Jul 2021 15:20:56 +0000 (11:20 -0400)]
LU-14814 osc: osc: Do not flush on lockless cancel

The cancellation of an OSC lock without an LDLM lock
(a 'lockless' OSC lock) should not flush pages.  Only
direct i/o is allowed to use a lockless OSC lock, and
direct i/o does not create flushable pages.

DIO pages are not flushable because:
A) they are all synced as soon as possible, and
B) the OSC extents created for them are not added to the
extent tree, which is used to track flushable pages.

Instead, flushing when a lockless lock is cancelled ends up
trying to flush pages from ongoing buffered i/o.  This can
lead to crashes like the following:

osc_cache_writeback_range()) ASSERTION(hp == 0 && discard == 0) failed

This assert essentially says the lock cancellation
(hp == 1) found an active i/o (an extent in the OES_ACTIVE
state).

This is not allowed because the flushing code assumes an
LDLM lock is being cancelled, which will only start once
there is no active i/o.  Because the OSC lock being
cancelled is not associated with an LDLM lock, this is not
true, and nothing prevents active i/o under a different
lock, leading to this assert.

The solution is simply to not flush pages when cancelling a
no-LDLM-lock OSC lock.
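The rule applied here can be modelled as a cancel path that only flushes when the OSC lock is backed by an LDLM lock. This is a toy sketch: osc_lock_stub, dlm_lock, flush_pages_under_lock() and osc_lock_cancel_sketch() are hypothetical stand-ins rather than Lustre's osc_lock API, and the "flush" is just a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the OSC structures. */
struct ldlm_lock_stub { int id; };

struct osc_lock_stub {
	struct ldlm_lock_stub *dlm_lock; /* NULL for a lockless (DIO) lock */
	int flushes;                     /* counts calls to the flush step */
};

/* Stand-in for writing back or discarding cached pages under the lock. */
static void flush_pages_under_lock(struct osc_lock_stub *lock)
{
	lock->flushes++;
}

/* Only a lock backed by an LDLM lock flushes on cancellation. DIO pages
 * never enter the extent tree, so a lockless lock owns nothing to flush,
 * and flushing would instead hit other i/o's active extents. */
static void osc_lock_cancel_sketch(struct osc_lock_stub *lock)
{
	assert(lock != NULL);
	if (lock->dlm_lock == NULL)
		return;
	flush_pages_under_lock(lock);
}
```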

Additional note:
New lockless OSC locks cannot be created if they are
blocked by a regular OSC lock, but a new regular lock can
be created if there is a lockless lock present.

Thus, the sequence is something like this:
1. Direct i/o creates a lockless OSC lock.
2. Buffered i/o creates an OSC lock and an LDLM lock on the same range.
3. Direct i/o finishes and starts cancelling its OSC lock.
4. Buffered i/o is still ongoing, with extents in OES_ACTIVE.

This results in the above crash during the OSC lock
cancellation.

Note it would be possible to resolve this issue by not
allowing lockless OSC locks to match regular OSC locks, but
this is not necessary, since there's no reason for lockless
locks to flush pages on cancellation.

Test-Parameters: env=ONLY=398b,ONLY_REPEAT=200 testlist=sanity
Test-Parameters: env=ONLY=77,ONLY_REPEAT=100 testlist=sanityn
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iceb1747b66232cad3f7e90ec271310a13a687a33
Reviewed-on: https://review.whamcloud.com/44152
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-14838 osc: Remove client contention support | 05/44205/5
Patrick Farrell [Fri, 9 Jul 2021 20:13:36 +0000 (16:13 -0400)]
LU-14838 osc: Remove client contention support

Lockless buffered i/o and contention detection don't work:
lockless buffered i/o is unfixable, and contention detection
is broken enough that it will have to be rewritten.

Let's remove both.  This patch starts the removal by
pulling the client side support.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If8583eff176bddb33e197befb967d229f8ca5688
Reviewed-on: https://review.whamcloud.com/44205
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-14838 osc: Remove lockless truncate | 04/44204/5
Patrick Farrell [Fri, 9 Jul 2021 20:13:09 +0000 (16:13 -0400)]
LU-14838 osc: Remove lockless truncate

Lockless truncate does not work and cannot be made to work.

Fundamentally, it has no means of ensuring consistency
across clients because it can't force them all to drop
cached data without locking.

It's been off for years - let's just get rid of it.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia2979fb6b31a61da6d4833e9f463fcd5b6dbd718
Reviewed-on: https://review.whamcloud.com/44204
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-9859 libcfs: make lnet_debugfs_symlink_def local to libcfs/modules.c | 32/44332/2
Mr. NeilBrown [Fri, 16 Jul 2021 12:24:10 +0000 (08:24 -0400)]
LU-9859 libcfs: make lnet_debugfs_symlink_def local to libcfs/modules.c

This type is only used in libcfs/module.c, so make it local to there.
If any other module ever wanted to add its own symlinks,
it would probably be easiest to export lnet_debugfs_root
and just call debugfs_create_symlink as required.

Linux-commit: c4f907719736b720aa831447828809840e533371

Test-Parameters: trivial
Change-Id: I08221cfc781735451a292ba20cd35b9172fc20f2
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/44332
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
2 months ago | LU-14637 flr: get rid of excluding dom+flr support test | 85/44185/8
Bobi Jam [Thu, 8 Jul 2021 14:34:18 +0000 (22:34 +0800)]
LU-14637 flr: get rid of excluding dom+flr support test

Now that DoM+FLR are supported, fix the tests that expect this
combination of features on a file to fail.

Fixes: 0bff64be320fd ("LU-9771 flr: to not support dom+flr for phase 1")
Fixes: 44a721b8c1063 ("LU-11421 dom: manual OST-to-DOM migration via mirroring")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9fc76e797e469744107e5d0453b78729226be0ee
Reviewed-on: https://review.whamcloud.com/44185
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago | LU-14789 tests: make sanity 133f and 133g working | 84/44184/4
Cyril Bordage [Thu, 8 Jul 2021 14:35:44 +0000 (16:35 +0200)]
LU-14789 tests: make sanity 133f and 133g working

Tests sanity 133f and 133g were doing nothing after change 38567.
Because the argument given to badarea_io was not a path, 0 was always
returned. This patch finds the complete path of the parameters
returned by "lctl get_param" and gives them to badarea_io.

Fixes: 1c54733894f8 ("LU-10401 tests: add -F so list_param prints entry type")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I7a8914e2950d5a8b2a93948c4fbe889520a3198c
Reviewed-on: https://review.whamcloud.com/44184
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 months ago | LU-14788 lnet: check memdup_user_nul using IS_ERR | 91/44091/4
Cyril Bordage [Mon, 28 Jun 2021 13:13:21 +0000 (15:13 +0200)]
LU-14788 lnet: check memdup_user_nul using IS_ERR

This fixes a crash in __proc_lnet_portal_rotor(). memdup_user_nul()
returns an ERR_PTR on error, not a NULL pointer, so IS_ERR() and
PTR_ERR() must be used to check for and return the correct error
code. The same fix is applied to other locations with the same
incorrect check.
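The ERR_PTR convention can be reproduced in user space to show why a NULL check misses the error. The macros below re-create the kernel's include/linux/err.h idea (errors encoded in the top MAX_ERRNO addresses); dup_nul() is a hypothetical stand-in for memdup_user_nul() and parse_rotor() a hypothetical caller, not the real __proc_lnet_portal_rotor().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User-space re-creation of the kernel ERR_PTR convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for memdup_user_nul(): copy len bytes and
 * NUL-terminate. Failure returns ERR_PTR(-errno), never NULL. */
static void *dup_nul(const char *src, size_t len)
{
	char *p;

	if (src == NULL)
		return ERR_PTR(-EFAULT);
	p = malloc(len + 1);
	if (p == NULL)
		return ERR_PTR(-ENOMEM);
	memcpy(p, src, len);
	p[len] = '\0';
	return p;
}

/* Correct caller: test with IS_ERR() and propagate PTR_ERR(). A NULL
 * check would let the encoded error through as a bogus pointer. */
static long parse_rotor(const char *ubuf, size_t count)
{
	char *buf = dup_nul(ubuf, count);
	long rc;

	if (IS_ERR(buf))
		return PTR_ERR(buf);
	rc = (long)strlen(buf);	/* placeholder for the real parsing */
	free(buf);
	return rc;
}
```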

Fixes: 67af976c806 ("LU-14428 libcfs: discard cfs_trace_copyin_string()")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I1fabf2499b6bbee7b94a2f802fbcbd9270d901b3
Reviewed-on: https://review.whamcloud.com/44091
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13055 doc: update changelog manpages | 22/44022/5
Mikhail Pershin [Thu, 17 Jun 2021 14:11:51 +0000 (17:11 +0300)]
LU-13055 doc: update changelog manpages

Add lctl-changelog_register.8 and lctl-changelog_deregister.8
manpages and update lctl.8 manpage to refer to them.

Fixes: 15305c3c3fe7 ("LU-12214 build: fix build without lustre_utils")
Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ie41db630c72f61a884cd8000e0a4aeeb42ca60eb
Reviewed-on: https://review.whamcloud.com/44022
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-14748 build: gcc9 fix address of packed member warning | 61/43961/4
Shaun Tancheff [Wed, 9 Jun 2021 16:25:21 +0000 (11:25 -0500)]
LU-14748 build: gcc9 fix address of packed member warning

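gcc9 complains (-Waddress-of-packed-member) about the use of
__swabXXs() with some packed structures, because these helpers
take a pointer to a possibly misaligned packed member.
Use __swabXX() for these cases.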
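The difference between the two helper styles can be sketched in user space: a pointer-based swap forces the caller to take the address of a packed member, while a value-based swap never forms that pointer. The struct wire_hdr, swab32_ptr() and swab32_val() below are hypothetical illustrations built on GCC's __builtin_bswap32(), not Lustre's __swab32s()/__swab32().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed on-wire structure. */
struct wire_hdr {
	uint8_t  magic;
	uint32_t length;	/* misaligned because of packing */
} __attribute__((packed));

/* Value-based swap, in the style of __swab32(): no pointer is formed. */
static inline uint32_t swab32_val(uint32_t x)
{
	return __builtin_bswap32(x);
}

/* Pointer-based swap, in the style of __swab32s(). Passing &hdr->length
 * here is what gcc 9 flags, since the pointer may be misaligned. */
static inline void swab32_ptr(uint32_t *p)
{
	*p = __builtin_bswap32(*p);
}

static void hdr_to_cpu(struct wire_hdr *hdr)
{
	/* swab32_ptr(&hdr->length);  <- warns under gcc 9 */
	hdr->length = swab32_val(hdr->length);	/* swap by value instead */
}
```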

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I7437c6f21d38c209ef452b41760aad6d1d3d6034
Reviewed-on: https://review.whamcloud.com/43961
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-14740 llite: avoid project quota overflow | 39/43939/8
Wang Shilong [Mon, 7 Jun 2021 15:40:22 +0000 (23:40 +0800)]
LU-14740 llite: avoid project quota overflow

Currently, the project ID is stored as a u32, so its maximum
possible value is 4294967295.

However, the VFS reserves the maximum value for special use; see
the following function:

  static inline bool
  qid_has_mapping(struct user_namespace *ns, struct kqid qid)
  {
          return from_kqid(ns, qid) != (qid_t) -1;
  }

So qid_has_mapping() returns false for ID 4294967295.
Trying the same IDs with chown:

  $ chown 4294967295:4294967295 c.sh
  chown: invalid user: ‘4294967295:4294967295’
  $ chown 4294967294:4294967294 c.sh

Fix this by checking the maximum possible project ID on the
client (kernel) side, and add a test case for it.
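The check amounts to rejecting (u32)-1 before it is stored, leaving 4294967294 as the largest usable project ID. projid_valid() and set_projid() below are hypothetical names for illustration, and returning -EINVAL is an assumption about the error code, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* (qid_t)-1 is reserved by the VFS, so the largest usable project ID
 * is UINT32_MAX - 1 (4294967294). */
#define PROJID_INVALID ((uint32_t)-1)

static bool projid_valid(uint32_t projid)
{
	return projid != PROJID_INVALID;
}

/* Hypothetical setter: refuse the reserved value before it reaches the
 * quota code (the -EINVAL choice is an assumption). */
static int set_projid(uint32_t *stored, uint32_t projid)
{
	if (!projid_valid(projid))
		return -EINVAL;
	*stored = projid;
	return 0;
}
```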

Test-parameters: trivial testlist=sanity-quota
Fixes: 7b5c1f1404c3 ("LU-13845 utils: Quota id 0xFFFFFFFF is invalid")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ide8b9cc79d9b7f2a8b9860a0c0f683ec903b8f91
Reviewed-on: https://review.whamcloud.com/43939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-14430 mdd: rename mti_fid to mdi_fid and friends | 40/43740/5
Andreas Dilger [Tue, 18 May 2021 17:56:32 +0000 (11:56 -0600)]
LU-14430 mdd: rename mti_fid to mdi_fid and friends

Rename mdd_thread_info fields to avoid confusion with mdt_thread_info.
This is the final patch renaming mdd_thread_info fields to a unique prefix:

  mti_cattr->mdi_cattr
  mti_fid->mdi_fid
  mti_fid2->mdi_fid2
  MTI_KEEP_KEY->MDI_KEEP_KEY
  mti_la_for_fix->mdi_la_for_fix
  mti_la_for_start->mdi_la_for_start
  mti_pattr->mdi_pattr
  mti_tattr->mdi_tattr
  mti_tpattr->mdi_tpattr

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I17bcc3ddfae400a5ca76e4f654c696da6d3ebbe5
Reviewed-on: https://review.whamcloud.com/43740
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13717 sec: handle null algo for filename encryption | 88/43388/6
Sebastien Buisson [Thu, 25 Mar 2021 16:55:35 +0000 (17:55 +0100)]
LU-13717 sec: handle null algo for filename encryption

Encrypted files created with Lustre 2.14 have clear text file names.
With the new code implementing filename encryption, newly created files
will have cipher text names, unless they are in an encrypted directory
created with Lustre 2.14.

So we need to make sure the llcrypt library can properly handle the
"null" algorithm for client side filename encryption, which is
basically a no-op.
Handling this "null" algorithm for filename encryption will not be
possible with the in-kernel fscrypt library, so change configure to
build with the embedded llcrypt by default, and only build against the
in-kernel fscrypt if explicitly requested via the
--enable-crypto=in-kernel configure option.

The objective is to urge users to convert their encrypted directories
to the new format that encrypts filenames.
However, with the new code some operations on encrypted files created
with 2.14, such as migrate, might not be possible, so explicitly forbid
migrate on files that use the "null" algorithm for client side
filename encryption.

Finally, we revert commit 11fcbfa9de4a5170abc2c5df2a6e4e02f0f84268
("LU-12275 sec: force file name encryption policy to null") so that
new encrypted directories will enforce filename encryption.
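The "null" filename algorithm and the migrate restriction can be sketched as a mode check: in null mode, "encrypting" a name is just a copy, and migrate is refused. Everything here is hypothetical: enum fname_mode, fname_encrypt() and migrate_check() are illustrative names, not llcrypt identifiers, the cipher branch is deliberately unimplemented, and -EOPNOTSUPP is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical modes; llcrypt's real identifiers are not shown here. */
enum fname_mode {
	FNAME_MODE_NULL,	/* 2.14-era directories: names in clear text */
	FNAME_MODE_CIPHER,	/* newer directories: names encrypted */
};

/* In null mode, "encryption" is a plain copy of the name. The cipher
 * branch is outside the scope of this sketch. */
static int fname_encrypt(enum fname_mode mode, const char *name,
			 char *out, size_t outlen)
{
	size_t len = strlen(name);

	if (mode != FNAME_MODE_NULL)
		return -ENOSYS;
	if (len + 1 > outlen)
		return -ENAMETOOLONG;
	memcpy(out, name, len + 1);
	return (int)len;
}

/* Migrate is refused for files using the null algorithm
 * (the errno chosen here is an assumption). */
static int migrate_check(enum fname_mode mode)
{
	return mode == FNAME_MODE_NULL ? -EOPNOTSUPP : 0;
}
```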

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I393945adc9b720a56544b5da0669cb2848507457
Reviewed-on: https://review.whamcloud.com/43388
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13799 osc: Improve osc_queue_sync_pages | 82/39482/23
Patrick Farrell [Tue, 15 Jun 2021 14:23:04 +0000 (10:23 -0400)]
LU-13799 osc: Improve osc_queue_sync_pages

This patch was split and partially done in:
https://review.whamcloud.com/38214

So the text below refers to the combination of this patch
and that one.  This patch now just improves a looped atomic
add by replacing it with a single one.  The rest of the grant
calculation change is in
https://review.whamcloud.com/38214
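Replacing a looped atomic add with a single one can be sketched with C11 atomics: both versions leave the counter at the same value, but the batched one performs one atomic read-modify-write per batch instead of one per page. The counter dirty_bytes and the 4 KiB page size are assumptions for illustration; the real accounting in osc_queue_sync_pages() is not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096L	/* assumption: 4 KiB pages */

/* Hypothetical counter standing in for the real accounting. */
static atomic_long dirty_bytes;

/* Before: one atomic operation per page. */
static void account_per_page(size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		atomic_fetch_add(&dirty_bytes, SKETCH_PAGE_SIZE);
}

/* After: one atomic operation for the whole batch. */
static void account_per_batch(size_t npages)
{
	atomic_fetch_add(&dirty_bytes, (long)npages * SKETCH_PAGE_SIZE);
}
```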

(I am retaining the text below to show the performance
improvement)
----------
osc_queue_sync_pages now has a grant calculation component,
and this has a painful impact on the new, faster DIO
performance.  Specifically, the per-page ktime_get() and the
per-page atomic_add cost close to 10% of total CPU time in
the DIO path.

We can make this per batch of pages rather than for each
page, which reduces this cost from 10% of CPU to almost
nothing.

This improves write performance by about 10% (but has no
effect on reads, since they don't use grant).

This patch reduces i/o time in ms/GiB by:
Write: 10 ms/GiB
Read: 0 ms/GiB

Totals:
Write: 158 ms/GiB
Read: 161 ms/GiB

mpirun -np 1 $IOR -w -t 1G -b 64G -o $FILE --posix.odirect

Before patch:
write     6071 MiB/s

After patch:
write     6470 MiB/s

(Read is similar.)
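The ms/GiB figures in these messages appear to follow from the IOR bandwidth: at R MiB/s, one GiB (1024 MiB) takes 1024/R seconds, i.e. 1024 × 1000 / R milliseconds. The helper ms_per_gib() is a hypothetical name; it reproduces the totals above, e.g. 6071 MiB/s gives about 169 ms/GiB and 6470 MiB/s about 158 ms/GiB, a reduction of roughly 10 ms/GiB.

```c
#include <assert.h>

/* Milliseconds needed to move one GiB (1024 MiB) at a given rate. */
static double ms_per_gib(double mib_per_s)
{
	return 1024.0 * 1000.0 / mib_per_s;
}
```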

This also fixes a mistake in c24c25dc1b / LU-13419 where it
removed the shrink interval update entirely from the direct
i/o path.

Fixes: c24c25dc1b ("LU-13419 osc: Move shrink update to per-write")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Ic606e03be58239c291ec0382fa89eba64560da53
Reviewed-on: https://review.whamcloud.com/39482
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13799 clio: Skip prep for transients | 48/39448/17
Patrick Farrell [Fri, 7 May 2021 19:51:32 +0000 (15:51 -0400)]
LU-13799 clio: Skip prep for transients

The work done by cpo_prep() (and related methods) is unnecessary
for transient pages.  This gives only a minimal performance
boost and is better seen as a step towards removing the
cl_page abstraction for transient pages.

However, it does consistently give around 1% better
performance.

This patch reduces i/o time in ms/GiB by:
Write: 1 ms/GiB
Read: 1 ms/GiB

Totals:
Write: 169 ms/GiB
Read: 161 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        6028 MiB/s
read         6305 MiB/s

Plus this patch:
write        6071 MiB/s
read         6355 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Ib94f57cde468c9aaea952e1bb89db8fcf4b35e07
Reviewed-on: https://review.whamcloud.com/39448
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13799 llite: Adjust dio refcounting | 47/39447/16
Patrick Farrell [Fri, 7 May 2021 19:50:15 +0000 (15:50 -0400)]
LU-13799 llite: Adjust dio refcounting

We get a page reference in cl_page_find, then immediately
add another for cl_2queue_add and remove the first
reference.  This is unnecessary, since both references have
the same life cycle.
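The refcounting change amounts to handing the lookup reference to the queue instead of taking a second reference and dropping the first. This is a toy model: page_stub, page_find(), queue_add_old() and queue_add_new() are hypothetical and stand in for cl_page_find() and cl_2queue_add() only conceptually. Both paths end with one reference held, but the new one does two fewer refcount operations per page.

```c
#include <assert.h>

/* Hypothetical page with a plain reference count. */
struct page_stub { int refs; };

static void page_get(struct page_stub *pg) { pg->refs++; }
static void page_put(struct page_stub *pg) { assert(pg->refs > 0); pg->refs--; }

/* Stand-in lookup that returns the page with one reference held. */
static struct page_stub *page_find(struct page_stub *pg)
{
	page_get(pg);
	return pg;
}

/* Before: take a second reference for the queue, then drop the first. */
static void queue_add_old(struct page_stub *pg)
{
	struct page_stub *p = page_find(pg);

	page_get(p);	/* reference for the queue */
	page_put(p);	/* drop the lookup reference */
}

/* After: the lookup reference is simply owned by the queue. */
static void queue_add_new(struct page_stub *pg)
{
	(void)page_find(pg);
}
```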

This improves DIO/AIO page submission by around 2%.

This patch reduces i/o time in ms/GiB by:
Write: 2 ms/GiB
Read: 2 ms/GiB

Totals:
Write: 170 ms/GiB
Read: 162 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        5955 MiB/s
read         6218 MiB/s

Plus this patch:
write        6028 MiB/s
read         6305 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I228eca6d48c6007bbf2c8caae5e477b7d40521d1
Reviewed-on: https://review.whamcloud.com/39447
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13799 lov: Improve DIO submit | 46/39446/16
Patrick Farrell [Fri, 7 May 2021 19:42:20 +0000 (15:42 -0400)]
LU-13799 lov: Improve DIO submit

Skip some unnecessary looping in page submission for the
DIO case.

This gives about a 2% improvement for AIO/DIO page
submission.

This patch reduces i/o time in ms/GiB by:
Write: 2 ms/GiB
Read: 2 ms/GiB

Totals:
Write: 172 ms/GiB
Read: 165 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        7726 MiB/s
read         5899 MiB/s

Plus this patch:
write        5954 MiB/s
read         6217 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Iedad978438ee3f1f3290d990311532626cba9e2d
Reviewed-on: https://review.whamcloud.com/39446
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13799 llite: Remove transient page counting | 41/39441/15
Patrick Farrell [Sat, 29 May 2021 01:32:43 +0000 (21:32 -0400)]
LU-13799 llite: Remove transient page counting

Transient page counting is not used for anything, as
already noted in the commit message, but costs something
like 4% of the time in DIO page submission.

Remove it.

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

This patch reduces i/o time in ms/GiB by:
Write: 6 ms/GiB
Read: 11 ms/GiB

Totals:
Write: 174 ms/GiB
Read: 167 ms/GiB

With previous patches in series:
write     5703 MiB/s
read      5756 MiB/s

Plus this patch:
write     5900 MiB/s
read      6136 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I825de4f1b5d1dd1476a4a711bfa51e7d24b5027a
Reviewed-on: https://review.whamcloud.com/39441
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago | LU-13799 llite: Modify AIO/DIO reference counting | 42/39442/14
Patrick Farrell [Fri, 7 May 2021 15:50:51 +0000 (11:50 -0400)]
LU-13799 llite: Modify AIO/DIO reference counting

For DIO pages, it is enough to hold a reference on the
cl_object associated with the AIO.  This avoids taking a
reference on the cl_object for each page, which saves about
5% of the time when doing DIO/AIO.

This is possible because the aio struct always outlives its
associated pages.
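Holding one object reference per AIO rather than one per page can be sketched as below: pages reach the object through the aio, and the aio drops its single reference only after all of its pages are gone. obj_stub, aio_stub, dio_page_stub and the helpers are hypothetical stand-ins for cl_object, the aio struct and DIO pages, not Lustre types.

```c
#include <assert.h>
#include <stddef.h>

struct obj_stub { int refs; };			/* stands in for cl_object */
struct aio_stub { struct obj_stub *obj; };	/* stands in for the aio struct */
struct dio_page_stub { struct aio_stub *aio; };	/* reaches obj via the aio */

static void obj_get(struct obj_stub *o) { o->refs++; }
static void obj_put(struct obj_stub *o) { assert(o->refs > 0); o->refs--; }

static void aio_init(struct aio_stub *aio, struct obj_stub *obj)
{
	obj_get(obj);		/* one reference for the whole aio */
	aio->obj = obj;
}

/* Safe only because every page of the aio is finished before this. */
static void aio_fini(struct aio_stub *aio)
{
	obj_put(aio->obj);
	aio->obj = NULL;
}

static void dio_page_init(struct dio_page_stub *pg, struct aio_stub *aio)
{
	pg->aio = aio;		/* no per-page obj_get() */
}
```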

This patch reduces i/o time in ms/GiB by:
Write: 6 ms/GiB
Read: 1 ms/GiB

Totals:
Write: 198 ms/GiB
Read: 197 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     5030 MiB/s
read      5174 MiB/s

Plus this patch:
write     5183 MiB/s
read      5200 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I970cda20417265b4b66a8eed6e74440e5d3373b8
Reviewed-on: https://review.whamcloud.com/39442
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13326 mds: remove MDS_SETATTR_PORTAL and service | 98/37798/7
Andreas Dilger [Wed, 4 Mar 2020 20:28:26 +0000 (12:28 -0800)]
LU-13326 mds: remove MDS_SETATTR_PORTAL and service

Remove the MDS_SETATTR_PORTAL and the service threads listening on
this portal, since they have been unused since Lustre 2.1 and are no
longer needed.

Remove module tunables related to the mds_attr service threads:
- mds_attr_num_threads
- mds_attr_cpu_bind
- mds_attr_num_cpts

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I64f4f3f0004e1895ef7b49b31a4ad687a1abcca2
Reviewed-on: https://review.whamcloud.com/37798
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-13417 test: mkdir_on_mdt0() in more tests | 15/44315/8
Lai Siyao [Thu, 8 Jul 2021 08:09:01 +0000 (16:09 +0800)]
LU-13417 test: mkdir_on_mdt0() in more tests

Replace mkdir with mkdir_on_mdt0() in several tests.

Update recovery-small test_110k() to handle the case where files
are still open on MDT1, which would cause umount to stall.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Iebc32568b7fc146b658f47c5f5053fd3db24432f
Reviewed-on: https://review.whamcloud.com/44315
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago | LU-14655 lnet: Protect lpni deref in lnet_health_check | 03/43503/3
Chris Horn [Wed, 28 Apr 2021 01:10:16 +0000 (20:10 -0500)]
LU-14655 lnet: Protect lpni deref in lnet_health_check

The discovery thread can modify the peer NI/peer net/peer
relationship, so we need to be careful when dereferencing the peer
NI pointer in lnet_health_check(). The discovery thread operates
under the net lock, so move the peer NI dereference under the net
lock, which is already taken to increment the health stats.

Move some of the other code that is only relevant for messages with a
health status != LNET_MSG_STATUS_OK under the appropriate condition.
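The pattern is to dereference the pointer only while holding the same lock the writer uses. In this sketch a plain pthread mutex stands in for LNet's per-CPT net lock, and peer_stub, peer_ni_stub, discovery_move() and health_check() are hypothetical simplifications, not LNet structures or functions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified peer structures. */
struct peer_stub { int health_events; };
struct peer_ni_stub { struct peer_stub *peer; };	/* discovery may re-parent */

/* Plain mutex standing in for the net lock (an assumption). */
static pthread_mutex_t net_lock = PTHREAD_MUTEX_INITIALIZER;

/* Discovery side: changes the relationship only under net_lock. */
static void discovery_move(struct peer_ni_stub *lpni, struct peer_stub *newp)
{
	pthread_mutex_lock(&net_lock);
	lpni->peer = newp;
	pthread_mutex_unlock(&net_lock);
}

/* Health-check side: read lpni->peer only inside the critical section
 * that already exists for updating the stats, so it cannot go stale. */
static void health_check(struct peer_ni_stub *lpni)
{
	pthread_mutex_lock(&net_lock);
	assert(lpni->peer != NULL);
	lpni->peer->health_events++;
	pthread_mutex_unlock(&net_lock);
}
```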

HPE-bug-id: LUS-9962
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3e6763b71bcdc9281f46b79c59e40f939190d468
Reviewed-on: https://review.whamcloud.com/43503
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago | LU-13417 test: generate uneven MDTs early for sanity 413 | 84/44384/6
Lai Siyao [Tue, 20 Jul 2021 01:24:36 +0000 (09:24 +0800)]
LU-13417 test: generate uneven MDTs early for sanity 413

Fill MDT early to generate uneven MDTs for sanity test_413, and
add test_413z to unlink these directories.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-1
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I84e3670bb40c3666488139d6a272f29188b0dfae
Reviewed-on: https://review.whamcloud.com/44384
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months ago | LU-14868 llite: revert 'simplify callback handling for async getattr' | 71/44371/6
Andreas Dilger [Wed, 21 Jul 2021 23:38:37 +0000 (23:38 +0000)]
LU-14868 llite: revert 'simplify callback handling for async getattr'

This reverts commit cbaaa7cde45f59372c75577d7274f7e2e38acd24.

This is causing process hangs and timeouts during file removal.

Test-Parameters: trivial
Fixes: cbaaa7cde4 ("LU-14139 llite: simplify callback handling for async getattr")
Change-Id: I77f5bc460850bfe7a5143e22b0c5f3e14a40474a
Reviewed-on: https://review.whamcloud.com/44371
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months ago | LU-14826 mdt: getattr_name("..") under striped directory | 68/44168/6
Lai Siyao [Thu, 8 Jul 2021 14:25:51 +0000 (10:25 -0400)]
LU-14826 mdt: getattr_name("..") under striped directory

For getattr_name(".."), the FID of the master object should be
returned for striped directories. This includes changes on both the
client and the server:
* lmv_getattr_name() should use the master object FID if it is
  looking up "..".
* mdt_raw_lookup() should check whether the parent object is a
  stripe; if so, it needs to look up again to get the master object
  FID. For old clients without the above change, this needs to be
  checked twice.

This is needed for NFS export, because ll_get_parent() finds the
parent via getattr_name("..").

Re-enable check_fhandle_syscall and update sanityn test_102.
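One reading of the server-side change: the ".." entry of a subdirectory points at the stripe that holds it, so a lookup that lands on a stripe has to be resolved once more to reach the master object, giving NFS a stable parent FID. The types and lookup_dotdot() below are hypothetical and greatly simplified; this is a sketch of the idea, not mdt_raw_lookup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified FID and directory types. */
struct fid_stub { uint64_t seq; uint32_t oid; };

struct dir_stub {
	struct fid_stub fid;
	bool is_stripe;			/* sub-stripe of a striped directory */
	const struct dir_stub *dotdot;	/* target of this directory's ".." */
};

static struct fid_stub lookup_dotdot(const struct dir_stub *dir)
{
	const struct dir_stub *p = dir->dotdot;

	assert(p != NULL);
	/* A ".." that lands on a stripe is resolved once more, so the
	 * caller sees the master object's FID rather than a stripe's. */
	if (p->is_stripe)
		p = p->dotdot;
	return p->fid;
}
```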

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I72c951293e41656ce3778750147402d7f8ca4cec
Reviewed-on: https://review.whamcloud.com/44168
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 months ago | LU-14833 sec: quiet spurious gss_init_svc_upcall() message | 97/44197/2
Sebastien Buisson [Fri, 9 Jul 2021 12:52:40 +0000 (14:52 +0200)]
LU-14833 sec: quiet spurious gss_init_svc_upcall() message

Switch from CWARN to CDEBUG(D_SEC) for message printed by
gss_init_svc_upcall():
Init channel is not opened by lsvcgssd, following request might be
dropped until lsvcgssd is active
Indeed, this message is printed whether or not GSS is enabled, and
we have no way to check this at the time the kernel module is
loaded.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I66c8c2a16e58ca75973226c80e0f4a92c90b4025
Reviewed-on: https://review.whamcloud.com/44197
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months ago | LU-14114 lnet: print device status in net show command | 69/44169/2
Cyril Bordage [Wed, 7 Jul 2021 13:27:54 +0000 (15:27 +0200)]
LU-14114 lnet: print device status in net show command

A device can be in a fatal state if the cable was disconnected or the
port was brought down on the switch side. In these cases, the LND
(currently o2iblnd) flags the device as being in a fatal state, and
the device is not used any further. However, its health value is not
decremented, which causes some confusion when examining the state of
the node.
It is better to print the device status in the output of the lnetctl
net show command.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I7c635ab1062f6153449fcec1bc07585065818a72
Reviewed-on: https://review.whamcloud.com/44169
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago | LU-14804 nodemap: do not return error for improper ACL | 27/44127/3
Sebastien Buisson [Thu, 1 Jul 2021 15:20:39 +0000 (00:20 +0900)]
LU-14804 nodemap: do not return error for improper ACL

In nodemap_map_acl(), in case the ACL is incorrect, do nothing
and just return initial size to caller.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I26aba9ce43e4a8878bfa47e145b1b44cfff89403
Reviewed-on: https://review.whamcloud.com/44127
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago | LU-14300 quota: avoid nested lqe lookup | 26/43326/5
Sergey Cheremencev [Mon, 12 Apr 2021 23:44:34 +0000 (02:44 +0300)]
LU-14300 quota: avoid nested lqe lookup

When lqe_locate() is called from qmt_pool_lqes_lookup() for an lqe
that has no on-disk entry, it calls qmt_lqe_set_default().
This may call qmt_set_id_notify()->qmt_pool_lqes_spec()
and overwrite lqes already added to the qti. The overwritten
lqes may trigger an assertion:

LustreError: 5072:0:(qmt_pool.c:838:qmt_pool_lqes_lookup()) ASSERTION( (((qmt_info(env)->qti_lqes_num) > 16 ? qmt_info(env)->qti_lqes : qmt_info(env)->qti_lqes_small)[(qmt_info(env)->qti_glbl_lqe_idx)])->lqe_is_global ) failed:
LustreError: 5072:0:(qmt_pool.c:838:qmt_pool_lqes_lookup()) LBUG
Pid: 5072, comm: mdt_rdpg00_003 3.10.0-957.1.3957.1.3.x4.1.15.x86_64 #1 SMP Mon Nov 18 14:47:03 PST 2019
Call Trace:
 [<ffffffffc046f62c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
 [<ffffffffc046f94c>] lbug_with_loc+0x4c/0xa0 [libcfs]
 [<ffffffffc0e4ae38>] qmt_pool_lqes_lookup+0x798/0x8f0 [lquota]
 [<ffffffffc0e3b0ce>] qmt_intent_policy+0x86e/0xe00 [lquota]
 [<ffffffffc109d53d>] mdt_intent_opc+0x3bd/0xb40 [mdt]
 [<ffffffffc10a5134>] mdt_intent_policy+0x1a4/0x360 [mdt]
 [<ffffffffc0a7bedb>] ldlm_lock_enqueue+0x3cb/0xad0 [ptlrpc]
 [<ffffffffc0aa4a46>] ldlm_handle_enqueue0+0xa56/0x1610 [ptlrpc]
 [<ffffffffc0b304b2>] tgt_enqueue+0x62/0x210 [ptlrpc]
 [<ffffffffc0b3753a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]

or a deadlock (the same lqe appears twice in the qti_lqes array):

 call_rwsem_down_write_failed+0x17/0x30
 qti_lqes_write_lock+0xb1/0x1b0 [lquota]
 qmt_dqacq0+0x2ee/0x1ac0 [lquota]
 qmt_intent_policy+0xbfe/0xe00 [lquota]
 mdt_intent_opc+0x3ba/0xb50 [mdt]
 mdt_intent_policy+0x1a1/0x360 [mdt]
 ldlm_lock_enqueue+0x3d6/0xaa0 [ptlrpc]
 ldlm_handle_enqueue0+0xa76/0x1620 [ptlrpc]
 tgt_enqueue+0x62/0x210 [ptlrpc]
 tgt_request_handle+0x96a/0x1680 [ptlrpc]
 kthread+0xd1/0xe0

This patch adds sanity-quota test_73b to verify that the issue
no longer occurs.

Change-Id: Ib1ebe82c3b6e819b2538f30af08930060bd659ae
HPE-bug-id: LUS-9902
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/158581
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/43326
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14508 lfs: make mirror operations preserve timestamps 09/42009/17
John L. Hammond [Thu, 11 Mar 2021 16:02:54 +0000 (10:02 -0600)]
LU-14508 lfs: make mirror operations preserve timestamps

Save and try to restore the file timestamps around the various mirror
operations. Add sanity-flr tests 61[abc] to verify this.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I5ef754e46cfbe82c731a709209576bbfcc73af3d
Reviewed-on: https://review.whamcloud.com/42009
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12214 build: fix SLES build/install 72/39972/20
Alexey Lyashkov [Fri, 15 Jan 2021 15:21:15 +0000 (10:21 -0500)]
LU-12214 build: fix SLES build/install

Red Hat and SUSE can use different library package names for the
same devel package, so drop the strict requirement on the library
package name and ask rpm to use an autoprovide option instead.

Test-Parameters: trivial
Test-Parameters: clientdistro=sles15sp1 ossdistro=el7.7 mdsdistro=el8.2
HPE-bug-id: LUS-7204
Fixes: e1bf37870d LU-12214 build: fix build with gss enabled
Fixes: d746e64fe1 LU-13562 build: SUSE build support for azure
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I7e0fe83f9090e7616ab156fa75fed4821099406e
Reviewed-on: https://review.whamcloud.com/39972
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12022 tests: error on resync failure sanity-flr 54/35754/7
James Nunez [Tue, 15 Jun 2021 17:14:49 +0000 (11:14 -0600)]
LU-12022 tests: error on resync failure sanity-flr

In sanity-flr test 200, we should error if the final resync
fails. Replace all calls to 'mirror_io resync' that do
not inject an error with '$LFS mirror resync'.

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I9b2ec1beb7060086808b7529467bef80c8e9659f
Reviewed-on: https://review.whamcloud.com/35754
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
3 months agoLU-6142 libcfs: checkpatch cleanup of libcfs fail.c 07/44207/4
James Simmons [Sat, 10 Jul 2021 14:54:23 +0000 (10:54 -0400)]
LU-6142 libcfs: checkpatch cleanup of libcfs fail.c

Resolve several checkpatch issues reported for fail.c. This brings
us into alignment with the native Linux client version.

Test-Parameters: trivial
Change-Id: I71e71f48a94fa20756f7696b5fbf115c919d05d3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44207
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-6142 lnet: convert kiblnd/ksocknal_thread_start to vararg 22/44122/3
Mr NeilBrown [Thu, 1 Jul 2021 03:19:29 +0000 (13:19 +1000)]
LU-6142 lnet: convert kiblnd/ksocknal_thread_start to vararg

Rather than requiring the caller to format a thread name into a temp
buffer, change these thread_start functions to accept a format and
args, and to hand them directly to kthread_run().

This is done with a macro rather than a function as the functions are
trivial and varargs is slightly easier with macros.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I73926ef38a9e84061d1a3f9acf5c0be4a247f957
Reviewed-on: https://review.whamcloud.com/44122
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-6142 lnet: discard lnet_current_net_count 89/44089/2
Mr NeilBrown [Mon, 28 Jun 2021 06:22:02 +0000 (16:22 +1000)]
LU-6142 lnet: discard lnet_current_net_count

The variable lnet_current_net_count is never used, so remove it.
The function lnet_get_net_count() is only used to update that
variable, so remove it too.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id61f381f6220356c5b96c8a5822d8748a8ba43a4
Reviewed-on: https://review.whamcloud.com/44089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14217 osd-zfs: allow SEEK_HOLE/DATA only with sync 70/40970/2
Mikhail Pershin [Tue, 15 Dec 2020 11:47:20 +0000 (14:47 +0300)]
LU-14217 osd-zfs: allow SEEK_HOLE/DATA only with sync

ZFS doesn't report a valid offset for SEEK_DATA if there is dirty
data, but may still report SEEK_HOLE correctly. This causes
unreliable results: the same offset can be reported as a HOLE
(correctly) and also as DATA (incorrectly, because of the fallback
to the generic approach, which assumes the whole file is data with
a hole beyond the end of file).

To avoid that, we have to sync dirty data when dmu_offset_next()
reports EBUSY and repeat the lseek call. Since this can cause a
slowdown, this behavior is controlled via a new 'sync_on_lseek'
option. With this option turned off, osd-zfs reports that it doesn't
support SEEK_DATA/SEEK_HOLE, because we cannot use unreliable results
in our tools to copy sparse data.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic92c127628ce517a9c2f79f595a1d16116930383
Reviewed-on: https://review.whamcloud.com/40970
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14805 llite: No locked parallel DIO 31/44131/3
Patrick Farrell [Fri, 2 Jul 2021 17:24:48 +0000 (13:24 -0400)]
LU-14805 llite: No locked parallel DIO

If we are doing locked DIO, the OSC & LDLM locks are
released at the end of cl_io_loop, i.e., before we wait for
parallel DIO at the llite layer.

This is problematic because the locks are released before
the i/o done under them is complete; this can lead to data
inconsistencies.  (And at least one LBUG, see LU-14805.)

The easiest solution for now is to only do parallel DIO when
working lockless (which is the default; DIO only switches
to locked to manage conflicts with buffered i/o).

This problem & fix apply to AIO as well as parallel DIO.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If98a0551d6dde54220b406b26e978e284a6b1ebf
Reviewed-on: https://review.whamcloud.com/44131
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13440 utils: fix handling of lsa_stripe_off = -1 30/43530/11
Andreas Dilger [Tue, 4 May 2021 01:25:23 +0000 (19:25 -0600)]
LU-13440 utils: fix handling of lsa_stripe_off = -1

Use LMV_OFFSET_DEFAULT instead of "-1" when parsing in lfs_setdirstripe()
since parse_targets() will return "(__u32)-1" to the caller for the
stripe index, but lsa_stripe_off is a signed long long so it is
interpreted as 4294967295.  This causes the parsing to fail when
"lfs setdirstripe -i -1 --max-inherit-rr 1" is used.

Update sanity test_413a/413c to also specify "-i -1" to verify this.

In sanity tests 413a, 413b and 413c, create the "qos" directory on the most
full directory, so that its subdirectories won't be created on the
same MDT.

Fixes: f167f78e3bfd ("LU-13440 lmv: add default LMV inherit depth")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic934f859173155b1b2df56fcd315c8da633ebbe5
Reviewed-on: https://review.whamcloud.com/43530
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14541 llite: avoid stale data reading 76/43476/5
Wang Shilong [Wed, 28 Apr 2021 14:26:10 +0000 (22:26 +0800)]
LU-14541 llite: avoid stale data reading

remove_mapping() can refuse to remove a page from the page cache
because the page refcount is not 2, so clear the uptodate flag in
vvp_page_delete() to avoid reading stale data later.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I322debec951b1a342246475456c0f40e10b0e578
Reviewed-on: https://review.whamcloud.com/43476
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14828 tests: Remove extra debug 75/44175/3
Patrick Farrell [Wed, 7 Jul 2021 16:44:52 +0000 (12:44 -0400)]
LU-14828 tests: Remove extra debug

Test 398m was accidentally committed with extra debugging.
This sometimes causes OOMs in testing, and it was a
mistake anyway.

Fixes: cba07b68f9 ("LU-13798 llite: parallelize direct i/o issuance")
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I734aa3a952d2c085b3fc0014af1bdc0e881000e6
Reviewed-on: https://review.whamcloud.com/44175
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-14817 build: __xa_set_mark is not checked anymore 38/44138/3
Vitaly Fertman [Sat, 3 Jul 2021 09:25:14 +0000 (12:25 +0300)]
LU-14817 build: __xa_set_mark is not checked anymore

LC__XA_SET_MARK does not check for __xa_set_mark anymore after
LU-9859, however the result variable still exists and its value
has changed from 'no' to 'yes'.

Test-Parameters: trivial
Fixes: 84e12028be ("LU-9859 libcfs: add support for Xarray")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I24fffe7f2727b1d892ec3cabfc6e65ae8f68e024
Reviewed-on: https://review.whamcloud.com/44138
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14808 utils: fix YAML support for DOM files 33/44133/3
Vitaly Fertman [Tue, 15 Jun 2021 14:47:25 +0000 (17:47 +0300)]
LU-14808 utils: fix YAML support for DOM files

lfs getstripe never reports LLAPI_LAYOUT_DEFAULT for any stripe
parameter, but 0 or -1, whichever is appropriate.

LU-3285 added extra verification for the DOM parameters: the stripe
count, size and offset are meaningless for DOM and are expected to
be LLAPI_LAYOUT_DEFAULT. However, this breaks the YAML support,
which uses the getstripe output as the wanted values.

Also move the sanity-flr test_6 to ALWAYS_EXCEPT due to LU-14818.

Fixes: 6744eb8eeb ("LU-3285 lfs: add parameter for Data-on-MDT file")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
HPE-bug-id: LUS-10090
Change-Id: Ide0c0fc264c7d1bac487306edf896d90153cf768
Reviewed-on: https://es-gerrit.dev.cray.com/158810
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/44133
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14786 lod: create missing debugfs file 13/44113/3
James Simmons [Tue, 29 Jun 2021 17:14:39 +0000 (13:14 -0400)]
LU-14786 lod: create missing debugfs file

While cleaning up debugfs symlinks, the needed but unused lod debugfs
directory was dropped. This results in the broken symlink

/sys/kernel/debug/lustre/lov/lustre-MDT0000-mdtlov

The lctl parameter handling didn't notice this because glob() returns
only valid directory entries, so the error was never reported by
stat(). Restore the debugfs directory and add a new test to conf-sanity
to detect any potential breakage in the future.

Change-Id: I8fe0732d6caeeb83554833205998e24214343f88
Test-Parameters: env=ONLY=10a testlist=conf-sanity
Fixes: 462d476d ("LU-8066 obd: cleanup server sysfs symlinks handling")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44113
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14677 sec: migrate/extend/split on encrypted file 78/43878/6
Sebastien Buisson [Fri, 28 May 2021 16:11:53 +0000 (18:11 +0200)]
LU-14677 sec: migrate/extend/split on encrypted file

lfs migrate/extend/split makes use of volatile files to swap layouts.
When the operation is carried out on an encrypted file, the volatile file
must be assigned the same encryption context as the original file, so
that data moved/copied to different OSTs is identical to the original
file's.
Also update sanity-sec test_52 to exercise these commands.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I3878b5e9e6d3738dfee0ce0f89a3646e6a7b976f
Reviewed-on: https://review.whamcloud.com/43878
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14430 mdd: rename mti_oa to mdi_oa and friends 39/43739/5
Andreas Dilger [Thu, 13 May 2021 10:42:20 +0000 (04:42 -0600)]
LU-14430 mdd: rename mti_oa to mdi_oa and friends

Rename fields in mdd_thread_info to avoid confusion with
mdt_thread_info. This is the second of several patches to rename all
mdd_thread_info fields to use a more unique field prefix:

  mti_dof->mdi_dof
  mti_dt_rec->mdi_dt_rec
  mti_ent->mdi_ent
  mti_flags->mdi_flags
  mti_hint->mdi_hint
  mti_key->mdi_key
  mti_link_data->mdi_link_data
  mti_name->mdi_name
  mti_oa->mdi_oa
  mti_range->mdi_range
  mti_spec->mdi_spec

The mti_lmv and mti_lrl fields are removed since they are unused.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6fd4b7f26b7e9561d8a8585eaa5438d6093ebbe5
Reviewed-on: https://review.whamcloud.com/43739
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14430 mdd: rename mti_big_buf to mdi_big_buf 38/43738/6
Andreas Dilger [Thu, 13 May 2021 10:27:49 +0000 (04:27 -0600)]
LU-14430 mdd: rename mti_big_buf to mdi_big_buf

Avoid serious confusion with the MDT mti_big_buf, and other fields
in mdd_thread_info, since they are completely separate buffers.

  mti_big_buf->mdi_big_buf
  mti_chlg_buf->mdi_chlg_buf
  mti_link_buf->mdi_link_buf
  mti_xattr_buf->mdi_xattr_buf

This is the first of several patches to rename all mdd_thread_info fields.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib0ec91c8481e747ed058afe5c08c3f60203ebbe5
Reviewed-on: https://review.whamcloud.com/43738
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-10499 pcc: introducing OBD_CONNECT2_PCCRO flag 91/40791/8
Qian Yingjin [Mon, 30 Nov 2020 02:08:17 +0000 (10:08 +0800)]
LU-10499 pcc: introducing OBD_CONNECT2_PCCRO flag

Add a new connection flag OBD_CONNECT2_PCCRO to address access
consistency for old clients without PCC-RO support.

By necessity, also include definitions for OBD_CONNECT2_MODE_CONVERT
and OBD_CONNECT2_BATCH_RPC so obd_connect_names[] works.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I19716e94a86e53353c1628d414c92e61e084dfc9
Reviewed-on: https://review.whamcloud.com/40791
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
3 months agoLU-14780 llite: failed ASSERTION(ldlm_has_layout(lock)) 54/44054/2
Bobi Jam [Fri, 4 Jun 2021 03:58:29 +0000 (11:58 +0800)]
LU-14780 llite: failed ASSERTION(ldlm_has_layout(lock))

When setting the layout under the layout lock, the lock could lose its
layout bits; in that case, try to fetch the layout lock again.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I10f96e4cb03cfe228d3c1ea1500b1a8d8e4e5e54
Reviewed-on: https://review.whamcloud.com/44054
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14459 mdt: support fixed directory layout 91/43291/7
Lai Siyao [Wed, 28 Apr 2021 21:30:00 +0000 (05:30 +0800)]
LU-14459 mdt: support fixed directory layout

Users may not want directories to be split automatically in some cases:
* the directory has been migrated;
* the directory has been restriped.

To support this, an LMV flag LMV_HASH_FLAG_FIXED is added, and it will
be set on migrated/restriped directories. NB: if a directory is migrated
or restriped to a one-stripe directory, it won't be transformed into a
plain directory, because this flag needs to be kept.

Update sanity 230q.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Icd12b2aa34d391e32c3323a8b9c24449ea3e3d0e
Reviewed-on: https://review.whamcloud.com/43291
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14459 mdt: restripe parent may be a stripe 90/43290/8
Lai Siyao [Mon, 12 Apr 2021 03:30:13 +0000 (11:30 +0800)]
LU-14459 mdt: restripe parent may be a stripe

mdt_restripe() checks the parent LMV with lmv_is_sane(), but the parent
may be a stripe, so use lmv_is_sane2() instead.

Clear lmv_migrate_hash/offset in layout shrink/update; although leaving
them set causes no issue, it is confusing to see the values in debug logs.

Add more race checks between directory restripe, auto-split and
migration.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I38950a07a8c9a8b4b20a2fd7aff229d27dbb403c
Reviewed-on: https://review.whamcloud.com/43290
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14459 llite: reset pfid after dir migration 89/43289/7
Lai Siyao [Mon, 12 Apr 2021 03:17:37 +0000 (11:17 +0800)]
LU-14459 llite: reset pfid after dir migration

A plain directory is turned into a stripe upon migration/restripe;
conversely, if the target is a plain directory, the target stripe is
turned into a directory afterwards.

In the first case, set pfid; in the latter case, clear pfid.
Otherwise ll_lock_cancel_bits() will use the wrong master inode.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I01cac0103dc79d493166e6b090508d24f9678a57
Reviewed-on: https://review.whamcloud.com/43289
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14739 quota: nodemap squashed root cannot bypass quota 88/43988/7
Sebastien Buisson [Fri, 11 Jun 2021 14:49:47 +0000 (16:49 +0200)]
LU-14739 quota: nodemap squashed root cannot bypass quota

When root on a client is squashed via a nodemap's squash_uid/squash_gid,
its IOs must not bypass quota enforcement as they normally do without
squashing.
So on the client side, do not set OBD_BRW_FROM_GRANT for every page
used by root. On the server side, check if root is squashed via a
nodemap and, if so, remove OBD_BRW_NOQUOTA.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I95b31277273589e363193cba8b84870f008bb079
Reviewed-on: https://review.whamcloud.com/43988
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14733 o2iblnd: Avoid double posting invalidate 90/44190/3
Mike Marciniszyn [Wed, 7 Jul 2021 19:16:01 +0000 (15:16 -0400)]
LU-14733 o2iblnd: Avoid double posting invalidate

When the kib_tx is provisioned during kiblnd_fmr_pool_map(), spare
WRs in the kib_fast_reg_descriptor are set up and the mapping of
pages is given to the mr.

kiblnd_post_tx_locked() then posts the spare WRs from the
kib_fast_reg_descriptor.

	if (rc == 0)
		return 0;

The code returns, and the kib_fast_reg_descriptor still contains
the spare WRs. The next time the kib_tx is used, the
now-obsolete WRs will be inadvertently posted. For rdmavt, the
obsolete invalidate will cause -EINVAL to be returned from
the post send.

Fix by adding a state variable frd_posted to the kib_fast_reg_descriptor.
The variable is set to false in kiblnd_fmr_pool_unmap().
kiblnd_post_tx_locked() is adjusted to avoid prepending the
kib_fast_reg_descriptor WRs when frd_posted is true.   After
the post succeeds, the frd_posted is set to true.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Change-Id: I426dd05e635392e75d1aa48808782a229e83ce5f
Reviewed-on: https://review.whamcloud.com/44190
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14733 o2iblnd: Move racy NULL assignment 89/44189/2
Mike Marciniszyn [Wed, 7 Jul 2021 19:16:00 +0000 (15:16 -0400)]
LU-14733 o2iblnd: Move racy NULL assignment

kiblnd_fmr_pool_unmap() can race with map and subsequent processing
because of this flaw in unmap:

	if (frd) {
		frd->frd_valid = false;
		spin_lock(&fps->fps_lock);
		list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
		spin_unlock(&fps->fps_lock);
		fmr->fmr_frd = NULL;
	}

The fmr can be pulled off the list in kiblnd_fmr_pool_unmap() on
another CPU, and fmr_frd could be in a state of flux and
potentially be seen incorrectly later on as the kib_tx is processed.

Fix by moving the fmr_frd assignment to before the fmr is added to the
list.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Change-Id: Ibddf132a363ecfe9db3cc06287cec873c021d2fb
Reviewed-on: https://review.whamcloud.com/44189
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14798 lnet: RMDA infrastructure updates 09/44109/2
Amir Shehata [Thu, 6 Feb 2020 01:46:03 +0000 (17:46 -0800)]
LU-14798 lnet: RMDA infrastructure updates

Add infrastructure to force RDMA for payloads < 4K.
Add infrastructure to extract the first page in a
payload, which is useful for determining the type of
the payload to be transmitted.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id7dc26c83f00dadd26feca94fc4d8233872650d3
Lustre-change: https://review.whamcloud.com/37453
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44109
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-14779 utils: no DNS lookups for NID in get_param 56/44056/4
Andreas Dilger [Wed, 23 Jun 2021 08:20:24 +0000 (02:20 -0600)]
LU-14779 utils: no DNS lookups for NID in get_param

Calling libcfs_str2nid() speculatively in "lctl get_param" to see
if there is a NID in the parameter name results in multiple DNS
lookups for invalid hostnames (e.g. "exports.192.168.0.10"). That
may take a very long time if there are a large number of connected
clients, and if the DNS server is overloaded or having problems.

Instead of doing these speculative NID conversions, skip the whole
NID string in the parameter name for the two known parameters that
may contain a NID ("*.exports.<NID>.*" and "*.MGC<NID>.*").  This
is considerably faster since it is only working on a local string.

If new parameters are added that contain a NID (unlikely, but
possible), then "clean_path()" would need to be updated as part
of that change.

Fixes: 85cbe1a3ee69 ("LU-5030 util: migrate lctl params functions to use cfs_get_paths()")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I51f865e4ce3a7bc4879f9d688c4b3a68d731810f
Reviewed-on: https://review.whamcloud.com/44056
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14778 readahead: fix to reserve min pages 50/44050/3
Wang Shilong [Tue, 22 Jun 2021 01:26:40 +0000 (09:26 +0800)]
LU-14778 readahead: fix to reserve min pages

@pages_min may be larger than @pages, which indicates
that more pages should be read; this causes a warning
later.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ifd82f709c3877172f08b87ab0551da735a0613e0
Reviewed-on: https://review.whamcloud.com/44050
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14776 ldiskfs: Add Ubuntu 20.04 HWE support 39/44039/4
James Simmons [Mon, 28 Jun 2021 16:15:43 +0000 (10:15 -0600)]
LU-14776 ldiskfs: Add Ubuntu 20.04 HWE support

Use the already landed ldiskfs support for Linux 5.8.0 to enable
support for the Ubuntu 20.04 HWE 5.8.0-53 kernel. Another change
that started with the 5.7 kernel is the removal of the flag
EXT4_GET_BLOCKS_KEEP_SIZE. The code was no longer needed after the
removal of EXT4_EOFBLOCKS_FL, which happened in 2012. e2fsprogs
support for this flag was removed as of version 1.42.2.

Change-Id: I60db446bab50178a601e1c2c20e782435f9f50f2
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44039
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14767 utils: mkfs.lustre allow lazy_itable_init=1 19/44019/3
Andreas Dilger [Wed, 16 Jun 2021 20:48:33 +0000 (14:48 -0600)]
LU-14767 utils: mkfs.lustre allow lazy_itable_init=1

When "lazy_itable_init=0" was added to the mke2fs options the call
to append_unique() to see whether "lazy_itable_init" was already
listed in the mke2fs options was incorrect. It checks to see if
"lazy_itable_init=0" is already present in the options, and doesn't
match "lazy_itable_init=1" if it was specified on the command-line.

Separate the key and value passed to append_unique() so that it can
check if any form of the key is present in the existing options.

Test-Parameters: trivial testlist=conf-sanity
Fixes: 701cc249594e ("LU-13533 utils: ext4lazyinit should be disabled")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic7a6dbb81f004dd35f0f1c5f5ddec0fb363ebbe5
Reviewed-on: https://review.whamcloud.com/44019
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-11872 quota: add get/set project support for non-dir/file 06/44006/7
Wang Shilong [Tue, 22 Jun 2021 12:09:29 +0000 (20:09 +0800)]
LU-11872 quota: add get/set project support for non-dir/file

Add the ability to get/set the project ID and state of non-dir/file objects.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ib8eee09254f9751797b5deb7f753c34eb2c0d5a5
Reviewed-on: https://review.whamcloud.com/44006
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14750 lnet: use ni fatal error when calculating net health 62/43962/2
Serguei Smirnov [Wed, 9 Jun 2021 21:22:12 +0000 (14:22 -0700)]
LU-14750 lnet: use ni fatal error when calculating net health

When an NI is flagged with "fatal_error" by the LND, its health score
remains unaffected. This allows the net containing such an NI
to be selected for tx even if it is the only NI in that net.
Take the "fatal_error" status of the NI into account when calculating
the net health score.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ib76245f835f1458873f0c05ad9b6727d295857de
Reviewed-on: https://review.whamcloud.com/43962
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-11698 libcfs: Add checksum speed under /sys/fs 43/43943/11
Arshad Hussain [Tue, 8 Jun 2021 09:32:01 +0000 (05:32 -0400)]
LU-11698 libcfs: Add checksum speed under /sys/fs

This patch adds the total number of registered checksums, and the
names of all registered checksums along with their speeds, under
/sys/kernel/debug/lustre/checksum_speed.

Test case sanity/77m is added.

Sample output:
$ lctl get_param checksum_speed
checksum_speed=adler32: 1955
crc32: 2423
crc32c: 14035

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: If125032e35bfd9221eb66e6f77bf7e3753ffcc0f
Reviewed-on: https://review.whamcloud.com/43943
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14093 lnet: annotate LNET_WIRE_HANDLE_COOKIE_NONE as u64 13/43713/6
Dominique Martinet [Sat, 15 May 2021 22:32:56 +0000 (07:32 +0900)]
LU-14093 lnet: annotate LNET_WIRE_HANDLE_COOKIE_NONE as u64

Fix the following warning with newer gcc and -Wextra when
lustre_idl.h is included in an external project:

.../include/linux/lnet/lnet-types.h: In function 'LNetMDHandleIsInvalid':
.../include/linux/lnet/lnet-types.h:355:46:
   error: comparison of integer expressions of different signedness:
   'int' and '__u64' {aka 'long long unsigned int'} [-Werror=sign-compare]
        return (LNET_WIRE_HANDLE_COOKIE_NONE == h.cookie);
                                             ^~
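The warning arises because the constant has type int while h.cookie is a __u64. A minimal sketch of the kind of fix the subject line describes, giving the constant an explicit unsigned 64-bit type; the macro name, the all-bits-set value, and the use of uint64_t in place of the kernel's __u64 are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for LNET_WIRE_HANDLE_COOKIE_NONE. With an
 * explicit 64-bit unsigned type, the comparison below is
 * unsigned == unsigned, so -Wsign-compare has nothing to report.
 */
#define SKETCH_COOKIE_NONE ((uint64_t)-1)

struct sketch_handle {
	uint64_t cookie;
};

static bool sketch_handle_is_invalid(struct sketch_handle h)
{
	return SKETCH_COOKIE_NONE == h.cookie;
}
```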

Change-Id: I05f21dcca5fe9dd15d1e0b6cb9a29c3999bcd807
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-on: https://review.whamcloud.com/43713
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-14648 lod: protect lod_object layout info 71/43671/3
Bobi Jam [Wed, 12 May 2021 08:18:00 +0000 (16:18 +0800)]
LU-14648 lod: protect lod_object layout info

Access to the lod_object layout needs to be protected by ldo_layout_mutex.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I2c4a2078bdce64d15485d3ff18f6670d42ca90ba
Reviewed-on: https://review.whamcloud.com/43671
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14762 lmv: compare space to mkdir on parent MDT 97/43997/5
Lai Siyao [Mon, 14 Jun 2021 07:26:47 +0000 (15:26 +0800)]
LU-14762 lmv: compare space to mkdir on parent MDT

In QoS subdirectory creation, a subdirectory is kept on the parent MDT
if that MDT is less full than average. However, the check compares
weight rather than free space, where "weight = free space - penalty".
If MDTs have different penalties, the result is not accurate, so the
check may not work as intended.

Check free space instead, and loosen the criterion so that the parent
MDT is also accepted when its free space is within the range of the
QoS threshold.
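A minimal sketch of the revised check, assuming the QoS threshold is a percentage (0 to 100) and that "within the range of the threshold" means the parent's free space is at least average * (100 - threshold) / 100. The helper name and the exact formula are illustrative assumptions, not the LMV code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a new subdirectory may stay on the parent MDT.
 * Uses raw free space (not weight = free - penalty), and accepts the
 * parent if its free space is no more than threshold_pct percent below
 * the average free space. threshold_pct is assumed to be <= 100.
 */
static bool sketch_stay_on_parent(uint64_t parent_free, uint64_t avg_free,
				  unsigned int threshold_pct)
{
	/* Lowest free space still considered close enough to average. */
	uint64_t floor = avg_free / 100 * (100 - threshold_pct);

	return parent_free >= floor;
}
```

Because penalties no longer enter the comparison, two MDTs with equal free space but different penalties are treated the same.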

Fixes: 3f6fc483013d ("LU-13439 lmv: qos stay on current MDT if less full")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id34cf8f3f58fee9d329f0d05c2f7a6463b67dfe1
Reviewed-on: https://review.whamcloud.com/43997
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-14654 tests: Ensure recovery_limit zero works as expected 02/43502/3
Chris Horn [Thu, 29 Apr 2021 18:09:07 +0000 (13:09 -0500)]
LU-14654 tests: Ensure recovery_limit zero works as expected

When lnet_recovery_limit is set to zero (the default) peer NIs are
eligible for recovery pings indefinitely. Verify this functionality
by modifying sanity-lnet test_211 to use recovery_limit 0 to make
a peer NI re-eligible for recovery.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-9953
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I00cb0940133e15ec73491e875d08b6db2bff3fe5
Reviewed-on: https://review.whamcloud.com/43502
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14654 lnet: Correct peer NI recovery age out calculation 01/43501/3
Chris Horn [Thu, 29 Apr 2021 18:14:34 +0000 (13:14 -0500)]
LU-14654 lnet: Correct peer NI recovery age out calculation

The calculation to age a peer NI out of recovery is only valid if
lnet_recovery_limit is non-zero. When set to zero, we allow peer NIs
to be in recovery indefinitely.
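A minimal sketch of the corrected age-out rule, assuming lnet_recovery_limit is expressed in seconds and that a peer NI ages out once it has been in recovery longer than the limit. The helper is hypothetical, not the actual LNet function.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Should a peer NI be aged out of recovery?
 * recovery_limit == 0 means "never age out" (recover indefinitely).
 */
static bool sketch_recovery_expired(time_t now, time_t recovery_start,
				    unsigned int recovery_limit)
{
	if (recovery_limit == 0)
		return false;   /* keep sending recovery pings forever */

	return now - recovery_start > (time_t)recovery_limit;
}
```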

Test-Parameters: trivial
HPE-bug-id: LUS-9953
Fixes: cc27201a76 ("LU-13569 lnet: Age peer NI out of recovery")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6bb40ca3a9affa0eaaae9deb1cecdb03e4bb42c5
Reviewed-on: https://review.whamcloud.com/43501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13055 mdd: per-user changelog names and mask 80/43380/9
Mikhail Pershin [Tue, 22 Jun 2021 18:16:26 +0000 (21:16 +0300)]
LU-13055 mdd: per-user changelog names and mask

Allow specifying a name for newly-registered changelog users,
rather than the default "clNNN" that is otherwise used. This
allows services to register a "well-known" changelog user,
rather than having to store the changelog username in HA storage
outside of the filesystem.

Each changelog user name still includes a unique ID, so that the
changelog_clear and changelog_deregister commands can be run using
only the ID if necessary/desired. The user name can also be used to
deregister, and it is unique per server.

If no name is given, the default "clNNN" format is used.

With this new functionality, the name can be specified as follows:
 # lctl --device testfs-MDT0000 changelog_register --user watcher
   testfs-MDT0000: Registered changelog userid 'cl13-watcher'
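The registered name above combines the numeric ID with the requested name. A minimal sketch of that formatting, assuming the form "cl<ID>" with an optional "-<name>" suffix, as suggested by the output 'cl13-watcher'; the helper is hypothetical.

```c
#include <stdio.h>

/*
 * Build a changelog user name: "cl<id>" by default, or "cl<id>-<name>"
 * when a name was given at registration. Returns the snprintf() result,
 * so a value >= len indicates truncation.
 */
static int sketch_changelog_user_name(char *buf, size_t len,
				      unsigned int id, const char *name)
{
	if (name == NULL || name[0] == '\0')
		return snprintf(buf, len, "cl%u", id);

	return snprintf(buf, len, "cl%u-%s", id, name);
}
```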

A per-user mask is also added to allow logging specific operations on
a per-user basis. The mask can be set only during registration. The
mask used for current changelog operations is the result of combining
the per-server mask with all user masks.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I56028f54cc97bbc9af03fd6559c19ef854f759d8
Reviewed-on: https://review.whamcloud.com/43380
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-4684 tests: enable racer directory migration 59/41359/4
Andreas Dilger [Thu, 28 Jan 2021 20:44:27 +0000 (13:44 -0700)]
LU-4684 tests: enable racer directory migration

Enable the dir_migrate test by default in racer test runs.

Update test selection logic to match newer script code style.

Test-Parameters: trivial testlist=racer env=DURATION=3600
Test-Parameters: fstype=zfs testlist=racer env=DURATION=600
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifba84c64b30d90b4a159232751b68c48c88dafcc
Reviewed-on: https://review.whamcloud.com/41359
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14139 llite: simplify callback handling for async getattr 12/40712/11
Qian Yingjin [Thu, 19 Nov 2020 15:15:37 +0000 (23:15 +0800)]
LU-14139 llite: simplify callback handling for async getattr

This patch prepares the inode and sets the lock data directly in
the interpret callback of the intent async getattr RPC request (in
ptlrpcd context), simplifying the old implementation that deferred
this work to the statahead thread.

Benchmark: "ls -l" on a large directory containing 1M files
(47001 bytes), run on a client with no caching on either the server
or the client. Measured elapsed time:
- w/o patch: 180 seconds;
- w patch: 181 seconds;

There is no obvious performance regression.

Change-Id: Ifcfad3eb26d831bec3beea0c3d7045f31d35fa6a
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/40712
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-6142 obdclass: resolve lu_ref checkpatch issues 88/44088/2
James Simmons [Sat, 26 Jun 2021 18:05:15 +0000 (14:05 -0400)]
LU-6142 obdclass: resolve lu_ref checkpatch issues

Fix up all the checkpatch issues reported for the code handling
lu_ref. Also rename USE_LU_REF to CONFIG_LUSTRE_DEBUG_LU_REF
to match the name that will be used upstream.

Change-Id: I100e2679fc04c97eb67e4d44c4f6a6b530da6fa8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44088
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14734 osd-ldiskfs: enable large_dir automatically 31/43931/7
Andreas Dilger [Sat, 5 Jun 2021 08:34:15 +0000 (02:34 -0600)]
LU-14734 osd-ldiskfs: enable large_dir automatically

Enable the large_dir feature automatically at mount time for
filesystems that do not have it enabled already.  Otherwise,
the REMOTE_PARENT_DIR may overflow if there are many remote
entries created, or for object directories on very large OSTs.
It isn't really needed on a dedicated MGS filesystem.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1c4ead26b09d60567ad12945d7b366b53475cebb
Reviewed-on: https://review.whamcloud.com/43931
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14516 mgc: configurable wait-to-reprocess time 20/42020/19
Alex Zhuravlev [Fri, 12 Mar 2021 09:00:37 +0000 (12:00 +0300)]
LU-14516 mgc: configurable wait-to-reprocess time

Make the wait-to-reprocess time configurable so it can be set
shorter, at least for testing purposes. To change the minimal wait
time, use the MGC module option 'mgc_requeue_timeout_min' (in
seconds). Additionally, a random value of up to
mgc_requeue_timeout_min is added to avoid a flood of config re-read
requests from clients. If mgc_requeue_timeout_min is set to 0, the
random part will be up to 1 second.

ost-pools: before: 5840s, after: 3474s
sanity-flr: before: 1575s, after: 1381s
sanity-quota: before: 10679s, after: 9703s
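A minimal sketch of the delay rule described above: the configured minimum plus a random part of up to that minimum, or up to 1 second when the minimum is 0. The helper name, the millisecond granularity, and the use of rand() in place of a kernel random source are assumptions.

```c
#include <stdlib.h>

/*
 * MGC requeue delay in milliseconds for a given
 * mgc_requeue_timeout_min (seconds). The random spread equals the
 * minimum, or 1 second when the minimum is 0.
 */
static unsigned long sketch_requeue_delay_ms(unsigned int timeout_min_s)
{
	unsigned long spread_ms = (timeout_min_s ? timeout_min_s : 1) * 1000UL;
	unsigned long random_ms = (unsigned long)rand() % (spread_ms + 1);

	return timeout_min_s * 1000UL + random_ms;
}
```

Randomizing the delay spreads the config re-read requests from many clients over time instead of having them all arrive at once.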

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iff7dad4ba14d687b7e891a1c346397e4c370800d
Reviewed-on: https://review.whamcloud.com/42020
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14074 scripts: automatic LNet unconfigure 98/41698/3
Cyril Bordage [Fri, 19 Feb 2021 17:12:45 +0000 (18:12 +0100)]
LU-14074 scripts: automatic LNet unconfigure

Using the lnetctl utility takes a reference count on the LNet
modules, so lustre_rmmod calls "lnetctl lnet unconfigure" in order
to be able to remove the LNet module.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I7251a0c62c45da7b3cb0fddea97394b32cb6902a
Reviewed-on: https://review.whamcloud.com/41698
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13799 osc: Simplify clipping for transient pages 40/39440/12
Patrick Farrell [Fri, 7 May 2021 15:38:07 +0000 (11:38 -0400)]
LU-13799 osc: Simplify clipping for transient pages

The combination of page clipping and page flag setting for
transient pages takes up several % of the time when
submitting them for async DIO.

But neither is required: transient pages do not change
after creation except in limited cases, and in any case
they are only accessible from the submitting thread, so
there is no possibility of parallel access.

So we can set the page flags, etc., at init time.

This patch improves i/o time in ms/GiB by:
Write: 17 ms/GiB
Read: 22 ms/GiB

Totals:
Write: 204 ms/GiB
Read: 198 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     4647 MiB/s
read      4888 MiB/s

Plus this patch:
write     5030 MiB/s
read      5174 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I974ebb0f55734a8628f1f7e1c01092eb2ce5f83b
Reviewed-on: https://review.whamcloud.com/39440
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13799 clio: Implement real list splice 39/39439/11
Patrick Farrell [Fri, 7 May 2021 15:37:40 +0000 (11:37 -0400)]
LU-13799 clio: Implement real list splice

Lustre's list_splice is actually just a list_for_each that
moves entries one at a time; use a real list_splice instead.

This saves significant time in AIO/DIO page submission,
giving a performance boost of several %.
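For illustration, a minimal sketch of what a "real" splice means on a kernel-style circular doubly linked list: the whole source list is relinked in constant time by updating four pointers, however many entries it holds. The types and names are stand-ins, not Lustre's code (the Linux kernel provides list_splice_tail_init() in <linux/list.h>).

```c
/* Minimal circular doubly linked list in the style of list_head. */
struct sk_list {
	struct sk_list *next, *prev;
};

static void sk_list_init(struct sk_list *head)
{
	head->next = head;
	head->prev = head;
}

static void sk_list_add_tail(struct sk_list *item, struct sk_list *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/*
 * O(1) splice: move every entry of src to the tail of dst by relinking
 * the boundary pointers, instead of walking src entry by entry.
 * src is left empty.
 */
static void sk_list_splice_tail_init(struct sk_list *src, struct sk_list *dst)
{
	struct sk_list *first = src->next;
	struct sk_list *last = src->prev;

	if (first == src)   /* src is empty: nothing to move */
		return;

	first->prev = dst->prev;
	dst->prev->next = first;
	last->next = dst;
	dst->prev = last;

	sk_list_init(src);
}
```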

This patch reduces i/o time in ms/GiB by:
Write: 16 ms/GiB
Read: 14 ms/GiB

Totals:
Write: 220 ms/GiB
Read: 209 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     4326 MiB/s
read      4587 MiB/s

With this patch:
write     4647 MiB/s
read      4888 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Icfd4a3d9dd6f162b011b402a1c88d7dae53eff40
Reviewed-on: https://review.whamcloud.com/39439
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-13799 osc: Don't get time for each page 37/39437/13
Patrick Farrell [Fri, 7 May 2021 15:35:28 +0000 (11:35 -0400)]
LU-13799 osc: Don't get time for each page

Getting the time when each batch of pages starts is
sufficiently accurate, and ktime_get() is several % of the
CPU time when doing AIO + DIO.

This relies on previous patches in this series.

Measuring this in milliseconds/gigabyte lets us measure the
improvement in absolute terms, rather than just relative
terms.
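The conversion behind these figures is simple arithmetic: since 1 GiB = 1024 MiB, time per GiB in milliseconds is 1024 * 1000 divided by the bandwidth in MiB/s. A minimal sketch (the helper name is hypothetical):

```c
/* Convert bandwidth in MiB/s to time per GiB in milliseconds. */
static double sketch_ms_per_gib(double mib_per_s)
{
	return 1024.0 * 1000.0 / mib_per_s;
}
```

For example, the write results quoted below give 1024000 / 4030, about 254 ms/GiB, without the patch and 1024000 / 4326, about 237 ms/GiB, with it: a difference of about 17 ms/GiB, matching the stated write improvement and total.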

This patch reduces i/o time in ms/GiB by:
Write: 17 ms/GiB
Read: 6 ms/GiB

Totals:
Write: 237 ms/GiB
Read: 223 ms/GiB

IOR:
mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect
Without the patch:
write     4030 MiB/s
read      4468  MiB/s

With patch:
write     4326 MiB/s
read      4587 MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I02897bf810683bc77a7d09156cdb83ba1d25ebf1
Reviewed-on: https://review.whamcloud.com/39437
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-13798 llite: parallelize direct i/o issuance 36/39436/30
Patrick Farrell [Fri, 28 May 2021 23:53:55 +0000 (19:53 -0400)]
LU-13798 llite: parallelize direct i/o issuance

Currently, the direct i/o code issues an i/o to a given
stripe, and then waits for that i/o to complete.  (This is
for i/os from a single process.)  This forces DIO to send
only one RPC at a time, serially.

In the case of multi-stripe files and larger i/os from
userspace, this means that i/o is serialized - so single
thread/single process direct i/o doesn't see any benefit
from the combination of extra stripes & larger i/os.

Using part of the AIO support, it is possible to move this
waiting up a level, so it happens after all the i/o is
issued.  (See LU-4198 for AIO support.)

This means we can issue many RPCs and then wait,
dramatically improving performance vs waiting for each RPC
serially.

This is referred to as 'parallel dio'.

Notes:
AIO is not supported on pipes, so we fall back to the old
sync behavior if the source or destination is a pipe.

Error handling is similar to buffered writes: We do not
wait for individual chunks, so we can get an error on an RPC
in the middle of an i/o.  The solution is to return an
error in this case, because we cannot know how many bytes
were written contiguously.  This is similar to buffered i/o
combined with fsync().
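A minimal sketch of the error-reporting rule described above, assuming each chunk's result is either a byte count (>= 0) or a negative errno: if every chunk succeeds, the total is returned; otherwise the first error is returned instead of a partial count, since the bytes written may not be contiguous. The function is hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>

/* Combine per-chunk results of a parallel DIO into one return value. */
static ssize_t sketch_dio_result(const ssize_t *chunk_rc, size_t nchunks)
{
	ssize_t total = 0;

	for (size_t i = 0; i < nchunks; i++) {
		if (chunk_rc[i] < 0)
			return chunk_rc[i];  /* report the error, not bytes */
		total += chunk_rc[i];
	}
	return total;
}
```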

The performance improvement from this is dramatic, and
greater at larger sizes.

lfs setstripe -c 8 -S 4M .
mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect
Without the patch:
write     764.85 MiB/s
read      682.87 MiB/s

With patch:
write     4030 MiB/s
read      4468  MiB/s

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I7e8df7d16b131b55a235f57c3280509559f94476
Reviewed-on: https://review.whamcloud.com/39436
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9680 utils: add netlink infrastructure 30/34230/36
James Simmons [Wed, 16 Jun 2021 19:28:13 +0000 (15:28 -0400)]
LU-9680 utils: add netlink infrastructure

Netlink was designed as a successor to ioctl as defined under
RFC 3549. There are several advantages to using netlink over
ioctls or virtual file system interfaces like proc. Collecting data
from proc doesn't scale well, as was seen with power drain on Android
phones. A netlink implementation was developed to remove this
performance hit. Details can be read at:

https://lwn.net/Articles/406975

Besides the scaling gains, the other benefit is flexibility
with API changes. Adding or removing information to be transmitted
doesn't require creating a new interface like ioctls do. Instead,
you add new code to handle the stream of attributes read from the
socket. Lastly, you can multiplex data to N listeners with groups
using one request.

This patch adds netlink handling in a generic way that can be
used by the libyaml library. This greatly lowers the barrier to
entry, since the implementor only needs to understand the libyaml API.

Change-Id: Idcdac653a1f9cc9931238e869c3beadaefcf3410
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/34230
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13716 tests: skip sanity 205b for older servers 93/43993/2
James Nunez [Sat, 12 Jun 2021 00:05:08 +0000 (18:05 -0600)]
LU-13716 tests: skip sanity 205b for older servers

Lustre job stats and sanity test 205b were modified in Lustre
version 2.13.54.91.  When we run version interop testing with
servers older than this version and clients that are newer,
the test will fail.

Skip sanity test 205b for Lustre servers with version less than
2.13.54.91 and client greater than that version.
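The skip condition above boils down to comparing packed version numbers. The test scripts do this in shell; the following C sketch only illustrates the comparison, assuming one byte per version component (major, minor, patch, fix) and treating a client at the cutoff version as new. Both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a four-part version (each part < 256) into one comparable integer. */
static uint32_t sketch_version_code(unsigned int major, unsigned int minor,
				    unsigned int patch, unsigned int fix)
{
	return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
	       ((uint32_t)patch << 8) | (uint32_t)fix;
}

/* Skip sanity 205b when the server predates 2.13.54.91 but the client does not. */
static bool sketch_skip_205b(uint32_t server_ver, uint32_t client_ver)
{
	uint32_t cutoff = sketch_version_code(2, 13, 54, 91);

	return server_ver < cutoff && client_ver >= cutoff;
}
```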

Test-Parameters: trivial
Test-Parameters: serverdistro=el7.9 serverversion=2.12.6 env=ONLY=205 testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Icc5d6a6adcf03e5bd16b678596f28590fe31516e
Reviewed-on: https://review.whamcloud.com/43993
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
3 months agoLU-14533 tests: skip sanity-pfl 0d for older servers 71/43971/3
James Nunez [Thu, 10 Jun 2021 21:05:18 +0000 (15:05 -0600)]
LU-14533 tests: skip sanity-pfl 0d for older servers

sanity-pfl test 0d was added to Lustre version 2.14.50.115.
When we run version interop testing with servers with
version less than this, the test will fail.

We should skip sanity-pfl test 0d if the Lustre server
version is less than 2.14.50.115.

Fixes: 83e38bba62 ("LU-14180 utils: verify setstripe comp_end is valid")

Test-Parameters: trivial
Test-Parameters: serverversion=2.14.0 serverdistro=el8.3 env=ONLY=0d testlist=sanity-pfl
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I49b45c7a1e4804fece33d53a4fb946b49254de2b
Reviewed-on: https://review.whamcloud.com/43971
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
3 months agoLU-14322 tests: skip sanityn 51e for old servers 69/43969/3
James Nunez [Thu, 10 Jun 2021 18:54:51 +0000 (12:54 -0600)]
LU-14322 tests: skip sanityn 51e for old servers

sanityn test 51e was added to Lustre version 2.13.54.148.
When we run version interop testing with servers less than
this version, the test will fail.

We should skip sanityn test 51e if the server version is
less than 2.13.54.148.

Fixes: 3ea729fe82 ("LU-13693 lfs: check early for MDS_OPEN_DIRECTORY")

Test-Parameters: trivial
Test-Parameters: serverversion=2.12.6 serverdistro=el7.9 env=ONLY=51e testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Id2f165b275c97c3a1396a0da18a3f254dbe5efa7
Reviewed-on: https://review.whamcloud.com/43969
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
3 months agoLU-14755 tests: create custom pools 66/43966/2
Elena Gryaznova [Thu, 10 Jun 2021 09:51:52 +0000 (12:51 +0300)]
LU-14755 tests: create custom pools

We are interested in running some tests on a filesystem with
pools. The proposed enhancement allows creating $FS_NPOOLS
pools, each containing $FS_POOL_NOSTS OSTs. If $FS_NPOOLS is
not set, the number of pools created is $OSTCOUNT / $FS_POOL_NOSTS.
Pool names are based on $FS_POOL. Pools are not created if
FS_POOL is not set.
Example 1:
  FS_POOL=global OSTCOUNT=2
lustre.global0
OST lustre-OST0000_UUID
OST lustre-OST0001_UUID
Example 2:
  FS_POOL=global OSTCOUNT=6 FS_POOL_NOSTS=3
lustre.global0
OST lustre-OST0000_UUID
OST lustre-OST0001_UUID
OST lustre-OST0002_UUID
lustre.global1
OST lustre-OST0003_UUID
OST lustre-OST0004_UUID
OST lustre-OST0005_UUID
Example 3:
  FS_POOL=p OSTCOUNT=5 KEEP_POOLS=true FS_NPOOLS=7 FS_POOL_NOSTS=3
Pool: lustre.p0
lustre-OST0000_UUID
lustre-OST0001_UUID
lustre-OST0002_UUID
Pool: lustre.p1
lustre-OST0003_UUID
lustre-OST0004_UUID
lustre-OST0000_UUID
Pool: lustre.p2
lustre-OST0001_UUID
lustre-OST0002_UUID
lustre-OST0003_UUID
Pool: lustre.p3
lustre-OST0004_UUID
lustre-OST0000_UUID
lustre-OST0001_UUID
Pool: lustre.p4
lustre-OST0002_UUID
lustre-OST0003_UUID
lustre-OST0004_UUID
Pool: lustre.p5
lustre-OST0000_UUID
lustre-OST0001_UUID
lustre-OST0002_UUID
Pool: lustre.p6
lustre-OST0003_UUID
lustre-OST0004_UUID
lustre-OST0000_UUID
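The OST membership in Example 3 follows a round-robin pattern that wraps around the OST list. A minimal sketch that reproduces it, assuming pool p takes FS_POOL_NOSTS consecutive OST indices starting at p * FS_POOL_NOSTS, modulo OSTCOUNT; the helpers are hypothetical, and the real test-framework code is shell.

```c
/*
 * Index of the j-th OST in pool `pool` when each pool takes `nosts`
 * consecutive OSTs and the assignment wraps around `ostcount` OSTs.
 */
static unsigned int sketch_pool_ost(unsigned int pool, unsigned int j,
				    unsigned int nosts, unsigned int ostcount)
{
	return (pool * nosts + j) % ostcount;
}

/* Default pool count when FS_NPOOLS is unset: OSTCOUNT / FS_POOL_NOSTS. */
static unsigned int sketch_default_npools(unsigned int ostcount,
					  unsigned int nosts)
{
	return ostcount / nosts;
}
```

With nosts = 3 and ostcount = 5, pool p1 gets OSTs 3, 4, 0 and pool p6 gets 3, 4, 0, as in Example 3; with ostcount = 6 the default is 2 pools, as in Example 2.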

The patch also adds the ability to remove all old pools at the
start if DELETE_OLD_POOLS is set to true (default is false),
and the ability to keep the new pools instead of deleting them
at the end if KEEP_POOLS is set to true (default is false).

Test-Parameters: trivial testlist=sanity-flr,ost-pools,ost-pools,sanity-pfl,sanity,sanityn
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-8172
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I73b72f9f39933b5b875978ce4fede5e9828c4c71
Reviewed-on: https://review.whamcloud.com/43966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14327 tests: skip sanity-sec test 55 for older servers 49/43949/5
James Nunez [Tue, 8 Jun 2021 16:34:29 +0000 (10:34 -0600)]
LU-14327 tests: skip sanity-sec test 55 for older servers

sanity-sec test 55 was added to lustre-master version
2.13.57.12 and to lustre-b2_12 version 2.12.6.3.  When
we run version interop testing with Lustre servers less
than these versions, the test will fail.  Thus, skip
sanity-sec test 55 for Lustre servers less than 2.12.6.3.

Fixes: 355787745f21 (“LU-14121 nodemap: do not force fsuid/fsgid squashing”)

Test-Parameters: trivial
Test-Parameters: serverversion=2.12.6 serverdistro=el7.9 env=ONLY=55 testlist=sanity-sec
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ie002c921e853897105396185b38485799df31b7a
Reviewed-on: https://review.whamcloud.com/43949
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-9897 utils: allow setting llverfs subdir count 47/39347/2
Andreas Dilger [Fri, 30 Aug 2019 23:19:29 +0000 (17:19 -0600)]
LU-9897 utils: allow setting llverfs subdir count

Allow specifying the subdirectory count directly rather
than calculating it based on the filesystem size.

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Idcae188ef4bdb417f0f983718bce7e55093ebbe5
Reviewed-on: https://review.whamcloud.com/39347
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Anjus George <georgea@ornl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-12982 tests: skip conf-sanity 5i for old servers 11/36811/4
James Nunez [Wed, 20 Nov 2019 22:03:11 +0000 (15:03 -0700)]
LU-12982 tests: skip conf-sanity 5i for old servers

conf-sanity test 5i was added to lustre-master with version
2.12.54.  For all version interop testing with Lustre servers with
version less than 2.12.54 and newer clients, conf-sanity test 5i
will fail and should be skipped.

Fixes: d1b5146eda4f (LU-12206 mdt: mdt_init0 failure handling)

Test-Parameters: trivial
Test-Parameters: serverversion=2.12.6 serverdistro=el7.9 fstype=ldiskfs env=ONLY=5 testlist=conf-sanity

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ia493b6f80b42fbd92254150e8d40a6fbb1039635
Reviewed-on: https://review.whamcloud.com/36811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14752 obdclass: handle EBUSY returned for lu_object hashtable 68/43968/4
James Simmons [Thu, 10 Jun 2021 16:53:57 +0000 (12:53 -0400)]
LU-14752 obdclass: handle EBUSY returned for lu_object hashtable

When the rhashtable grows to a certain size, it is rescaled.
During rescaling, an insertion can return -ENOMEM or -EBUSY. This
was reported as:

LustreError: 3594004:0:(lu_object.c:2472:lu_object_assign_fid()) ASSERTION( rc == 0 ) failed: failed hashtable insertion: rc = -16
LustreError: 3594004:0:(lu_object.c:2472:lu_object_assign_fid()) LBUG
Pid: 3594004, comm: mdt01_020 4.18.0-240.22.1.1toss.t4.x86_64 #1 SMP Tue Apr 13 17:18:40 PDT 2021
Call Trace TBD:
Kernel panic - not syncing: LBUG
...
Call Trace:
dump_stack+0x5c/0x80
panic+0xe7/0x2a9
lbug_with_loc.cold.10+0x18/0x18 [libcfs]
lu_object_assign_fid+0x3b8/0x3c0 [obdclass]

Add handling of the EBUSY case for the lu_object hash.
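One common way to handle a transient -EBUSY during a hash table resize is to retry the insertion instead of asserting. A minimal sketch of such a retry loop, with a hypothetical insert callback standing in for the real rhashtable call; whether the actual patch retries in exactly this way is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical insert operation: returns 0, -EBUSY (retry), or another -errno. */
typedef int (*sketch_insert_fn)(void *table, void *obj);

/*
 * Retry the insertion while the table reports -EBUSY, up to max_tries
 * attempts; any other result (success or a different error) is returned
 * to the caller unchanged.
 */
static int sketch_insert_retry(sketch_insert_fn insert, void *table,
			       void *obj, int max_tries)
{
	int rc = -EBUSY;

	for (int i = 0; i < max_tries && rc == -EBUSY; i++)
		rc = insert(table, obj);

	return rc;
}

/* Demo insert: fails with -EBUSY until the counter in *table reaches 0. */
static int sketch_busy_then_ok(void *table, void *obj)
{
	int *busy_left = table;

	(void)obj;
	if (*busy_left > 0) {
		(*busy_left)--;
		return -EBUSY;
	}
	return 0;
}
```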

Fixes: aff14dbc522 ("LU-8130 lu_object: convert lu_object cache to rhashtable")
Change-Id: Id85f32633117e02850b799e8d95e3e35d982cbd4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43968
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-14741 obdclass: Wake up entire queue of requests on close completion 41/43941/3
Oleg Drokin [Mon, 7 Jun 2021 19:17:27 +0000 (15:17 -0400)]
LU-14741 obdclass: Wake up entire queue of requests on close completion

Since close requests can be stuck behind normal requests, yet may
use more slots, we need either to wake up the entire accumulated
queue waiting for the next modify RPC slot, or to have an additional
waitqueue just for close requests.

This patch takes the former approach.

Fixes: 1fc013f901 ("LU-5319 mdc: manage number of modify RPCs in flight")
Change-Id: Ib4333c7f6731dd435364d5e5f529577a1600a235
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43941
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
3 months agoLU-13417 test: use mkdir_on_mdt0() in replay-dual 92/43492/6
Lai Siyao [Thu, 29 Apr 2021 03:51:33 +0000 (11:51 +0800)]
LU-13417 test: use mkdir_on_mdt0() in replay-dual

Replace mkdir with mkdir_on_mdt0() in replay-dual.sh if directory
needs to be created on MDT0.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-dual
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I9093e633412991571e18cb0ea264af013672bd8b
Reviewed-on: https://review.whamcloud.com/43492
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
3 months agoLU-13417 test: use mkdir_on_mdt0() in misc tests 91/43491/4
Lai Siyao [Thu, 29 Apr 2021 03:46:21 +0000 (11:46 +0800)]
LU-13417 test: use mkdir_on_mdt0() in misc tests

Replace mkdir with mkdir_on_mdt0() if the directory needs to be
created on MDT0 in the following tests:
* conf-sanity
* lustre-rsync-test
* ost-pools
* replay-ost-single
* replay-single
* replay-vbr
* sanity-hsm
* sanity-pcc
* sanity-quota
* sanity-sec

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=conf-sanity,lustre-rsync-test,ost-pools,replay-ost-single,replay-single,replay-vbr,sanity-hsm,sanity-pcc,sanity-quota,sanity-sec
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I96369f25982558a1dac7f4f7fe80a95bc1c0207d
Reviewed-on: https://review.whamcloud.com/43491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
3 months agoLU-13417 test: add mkdir_on_mdt0() 89/43489/7
Lai Siyao [Wed, 28 Apr 2021 14:36:24 +0000 (22:36 +0800)]
LU-13417 test: add mkdir_on_mdt0()

Once a default LMV is set on ROOT with a default stripe offset of
"-1", mkdir may not create the directory on MDT0, although many tests
rely on that. Add a function mkdir_on_mdt0() that creates a directory
on MDT0 with "lfs mkdir -i 0".

Replace mkdir with mkdir_on_mdt0() for such tests in sanity.sh and
sanityn.sh.

Test-Parameters: trivial testlist=sanityn
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I6155d036e6b28153d0bdbdbc01088bd68ee9e0af
Reviewed-on: https://review.whamcloud.com/43489
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
3 months agoLU-10948 mdt: New connect flag for non-open-by-fid lock request 07/43907/4
Oleg Drokin [Thu, 3 Jun 2021 00:10:47 +0000 (20:10 -0400)]
LU-10948 mdt: New connect flag for non-open-by-fid lock request

Although we removed the 2.1 check for open-by-FID when an open
lock is requested, clients talking to old servers that don't
have that patch get an open error, so introduce a compat
flag.
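The client-side decision the compat flag enables can be sketched as below. This is a minimal model under stated assumptions, not the Lustre code: the flag name `HYP_CONNECT_OPEN_LOCK_NOFID`, its bit value, and `hyp_may_request_open_lock()` are hypothetical, and it is assumed that open lock requests on open-by-FID opens remain acceptable to old servers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical connect-flag bit: the real flag name and value live in
 * the Lustre wire headers and are not reproduced here. */
#define HYP_CONNECT_OPEN_LOCK_NOFID	(1ULL << 0)

/*
 * Decide whether the client may ask for an open lock.
 * server_flags: connect flags the server advertised at connect time.
 * open_by_fid:  true if this open is an open-by-FID.
 */
static bool hyp_may_request_open_lock(uint64_t server_flags, bool open_by_fid)
{
	/* Assumption: open lock requests on open-by-FID opens are
	 * accepted by old and new servers alike. */
	if (open_by_fid)
		return true;

	/* Old servers return an open error for non-open-by-FID open
	 * lock requests, so only send one if the server set the flag. */
	return (server_flags & HYP_CONNECT_OPEN_LOCK_NOFID) != 0;
}
```

Gating on a flag advertised at connect time lets a new client keep working against old servers without probing them with a request that fails.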

Change-Id: I94d50ad98a2828519853a35fa90c5063adf2feab
Fixes: 41d99c4902 ("LU-10948 llite: Introduce inode open heat counter")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43907
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 months agoLU-14742 socklnd: detect link state to set fatal error on ni 52/43952/4
Serguei Smirnov [Tue, 8 Jun 2021 21:11:41 +0000 (14:11 -0700)]
LU-14742 socklnd: detect link state to set fatal error on ni

To help avoid selecting, for sending, an LNet NI that corresponds
to a downed Ethernet link, add a mechanism for detecting link
events in socklnd. On link up/down events, find the corresponding
NI and toggle its ni_fatal_error_on flag, similar to the way
o2iblnd does it.
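The NI lookup and flag toggle described above can be sketched as a small user-space model. It is not socklnd code: `struct hyp_ni` and `hyp_handle_link_event()` are hypothetical, and NIs are assumed to be matched to a link by kernel interface index. In the kernel the up/down events would typically arrive through a netdevice notifier, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an LNet NI bound to an Ethernet interface. */
struct hyp_ni {
	int  ifindex;		/* kernel interface index backing this NI */
	bool fatal_error_on;	/* models ni_fatal_error_on */
};

/*
 * Apply one link up/down event to every NI on the affected interface.
 * Returns the number of NIs whose flag was updated.
 */
static int hyp_handle_link_event(struct hyp_ni *nis, size_t count,
				 int ifindex, bool link_up)
{
	int updated = 0;

	for (size_t i = 0; i < count; i++) {
		if (nis[i].ifindex != ifindex)
			continue;
		/* A downed link makes the NI unsuitable for sending;
		 * clear the flag again once the link comes back up. */
		nis[i].fatal_error_on = !link_up;
		updated++;
	}
	return updated;
}
```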

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ie9f4f02fcb8b988c77bf63f751d5a621e79e9f58
Reviewed-on: https://review.whamcloud.com/43952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14729 osd-ldiskfs: fix to declare write commits 94/43994/4
Wang Shilong [Mon, 14 Jun 2021 01:28:51 +0000 (09:28 +0800)]
LU-14729 osd-ldiskfs: fix to declare write commits

Fallocate may introduce unwritten extents, and writing data into
them triggers an extent split, so we must reserve credits for
this case. To avoid a complicated calculation, we simply use the
normal credits calculation if an extent is mapped as unwritten.

See the comment in ext4:
  If we add a single extent, then in the worse case, each tree
  level index/leaf need to be changed in case of the tree split.
  If more extents are inserted, they could cause the whole tree
  split more than once, but this is really rare.

Lustre always reserved credits for the single-extent case only,
which is wrong. Also fix the indirect blocks calculation.
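The reasoning above can be illustrated with a deliberately simplified credit model. This is a sketch, not the osd-ldiskfs calculation: `hyp_extent_credits()` and `hyp_write_credits()` are hypothetical, and the model only captures two points from the text, namely that one inserted extent may touch every index level plus the leaf, and that unwritten extents are charged like a normal allocation.

```c
#include <stdbool.h>

/*
 * Simplified journal-credit model (hypothetical, not osd-ldiskfs code).
 * Following the ext4 comment, inserting one extent may modify every
 * index level plus the leaf of the extent tree if it splits.
 */
static unsigned int hyp_extent_credits(unsigned int tree_depth,
				       unsigned int nr_extents)
{
	unsigned int per_extent = tree_depth + 1;	/* index levels + leaf */

	/* Charge each inserted extent instead of assuming only one. */
	return per_extent * nr_extents;
}

/*
 * Credits needed to write into a range.  In this model a mapped,
 * already-written range needs no extent-tree change, while an
 * unwritten (fallocated) range is charged like a fresh allocation
 * because writing into it splits the extent.
 */
static unsigned int hyp_write_credits(unsigned int tree_depth,
				      unsigned int nr_extents,
				      bool mapped, bool unwritten)
{
	if (mapped && !unwritten)
		return 0;
	return hyp_extent_credits(tree_depth, nr_extents);
}
```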

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I9b67ec7b002711f040f46d0c77a645bb6f57a7de
Reviewed-on: https://review.whamcloud.com/43994
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-12888 tests: remove big files in sanity 11/36511/10
Alex Zhuravlev [Fri, 18 Oct 2019 04:56:52 +0000 (07:56 +0300)]
LU-12888 tests: remove big files in sanity

Remove the big files created by sanity tests; otherwise sanity
easily fails on a local setup.

Test-Parameters: trivial

Change-Id: Ia0a561e650fca05837445eebe25ff1dea15366e4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36511
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
4 months agoLU-14093 utils: fix DLSYM buffer over flow 38/43938/3
James Simmons [Mon, 7 Jun 2021 12:33:59 +0000 (08:33 -0400)]
LU-14093 utils: fix DLSYM buffer over flow

The 'name' string passed to the DLSYM macro is created from the
fsname buffer in load_backfs_module(). That buffer is larger than
512 bytes, but the temporary buffer in DLSYM is only 64 bytes. The
newest gcc versions detect this bug:

mount_utils.c: In function ‘load_backfs_module’:
mount_utils.c:530:36: error: ‘%s’ directive output may be truncated writing up to 507 bytes into a region of size 64 [-Werror=format-truncation=]
  530 |   snprintf(_fname, sizeof(_fname), "%s_%s", prefix, #func); \
      |                                    ^~~~~~~
mount_utils.c:593:2: note: in expansion of macro ‘DLSYM’
  593 |  DLSYM(name, ops, init);
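One way to make this kind of name construction safe is to check the snprintf() return value, which reports the length the full string would have had. The sketch below is not the actual mount_utils.c fix: `hyp_build_sym_name()` is a hypothetical helper that rejects names which do not fit instead of silently truncating them.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "<prefix>_<func>" into buf and fail with
 * -ENAMETOOLONG instead of silently truncating the symbol name.
 * Returns the length of the resulting name on success.
 */
static int hyp_build_sym_name(char *buf, size_t buflen,
			      const char *prefix, const char *func)
{
	int n = snprintf(buf, buflen, "%s_%s", prefix, func);

	if (n < 0)
		return -EINVAL;		/* output error from snprintf */
	if ((size_t)n >= buflen)
		return -ENAMETOOLONG;	/* full name did not fit */
	return n;
}
```

Sizing the temporary buffer from the input buffer, or rejecting names that do not fit as above, both avoid the truncation gcc warns about; this log does not show which approach the patch takes.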

Change-Id: I8ae30a5288f236fb9272dffd40f44175e5e03ef9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43938
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14736 utils: Change leak_finder to use stdout 34/43934/3
Patrick Farrell [Sat, 5 Jun 2021 21:17:23 +0000 (17:17 -0400)]
LU-14736 utils: Change leak_finder to use stdout

It is not an error for a leak-checking script to find a
leak, so don't have leak_finder.pl print leaks to stderr. It
also prints several pieces of basic status information to
stderr for no reason at all.

This makes it easier to redirect the output for interactive
use.

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: Iab226726ca4b36ada40a305962beedc363398c37
Reviewed-on: https://review.whamcloud.com/43934
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
4 months agoLU-14731 mdd: clear orphans changelog entries 01/43901/4
John L. Hammond [Wed, 2 Jun 2021 17:05:01 +0000 (12:05 -0500)]
LU-14731 mdd: clear orphans changelog entries

In mdd_changelog_llog_init(), adjust the orphan changelog index logic
to account for the case when no users are registered. Add sanity
test_160n() to verify this.
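The edge case can be illustrated with a small model of how far changelog records may be purged. This is a hypothetical sketch, not mdd_changelog_llog_init(): `hyp_purge_upto()` is an invented helper, and it assumes that records acknowledged by every registered user are purgeable, so with no users at all every record is an orphan.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: return the highest changelog record index that
 * no registered user still needs, i.e. how far records may be purged.
 * user_idx[i] is the last index acknowledged by user i.
 */
static uint64_t hyp_purge_upto(const uint64_t *user_idx, size_t nr_users,
			       uint64_t last_rec_idx)
{
	uint64_t min = last_rec_idx;

	/* With no registered users every record is an orphan, so the
	 * whole log up to the last record can be cleared. */
	if (nr_users == 0)
		return last_rec_idx;

	/* Otherwise stop at the slowest user's acknowledged index. */
	for (size_t i = 0; i < nr_users; i++)
		if (user_idx[i] < min)
			min = user_idx[i];
	return min;
}
```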

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I03b0c1002a0e16f26af8ec23bf06c9a07dec858a
Reviewed-on: https://review.whamcloud.com/43901
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-14690 kernel: RHEL 8.4 server support 91/43791/8
Jian Yu [Fri, 4 Jun 2021 07:47:14 +0000 (00:47 -0700)]
LU-14690 kernel: RHEL 8.4 server support

This patch adds Lustre server support for the RHEL 8.4 release
with kernel 4.18.0-305.3.1.el8_4.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity

Change-Id: I484af80c4764367b40b28ce459a6ff9d87edf3a8
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43791
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>