git://git.whamcloud.com - fs/lustre-release.git/log

LU-10030 utils: add lfs tool to change/list project of file

Currently, we use chattr/lsattr as the project quota
interface, which has some problems:

1) The client side needs a patched e2fsprogs or the latest upstream e2fsprogs.
2) Project quota will no longer be osd-ldiskfs only; ZFS will support it
too, and ZFS users might dislike depending on an ldiskfs tool.
3) Customers argue that chattr might be a little dangerous.

So this patch adds a native lfs tool for project quota.
usage: project [-p id] [-s] [-r] <file|directory...>
          set project ID and/or inherit flag for specified
          file(s) or directory.
       project [-d|-r [-0]] <file|directory...>
          list project ID and flags on file(s) or directory,
          print outliers
       project -c [-d|-r [-p id] [-0]] <file|directory...>
          check project ID and flags on file(s) or directory,
          print outliers
       project -C [-r] [-k] <file|directory...>
          clear the project inherit flag and ID on the file
          or directory
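A minimal userspace sketch of what the "set" mode boils down to on Linux, assuming the generic FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR ioctls and struct fsxattr from <linux/fs.h> (kernel 4.5 or later). set_project() is a hypothetical helper, not the lfs code, and it does not implement -r, -c or -C.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>	/* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_PROJINHERIT */
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the project ID on one file or directory and,
 * if requested, the project-inherit flag so new children get the same ID.
 * Returns 0 on success, or -1 with errno set.
 */
static int set_project(const char *path, unsigned int projid, bool inherit)
{
	struct fsxattr fsx;
	int fd, rc, saved_errno;

	assert(path != NULL);
	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -1;

	/* Read-modify-write so that other extended flags are preserved. */
	rc = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	if (rc == 0) {
		fsx.fsx_projid = projid;
		if (inherit)
			fsx.fsx_xflags |= FS_XFLAG_PROJINHERIT;
		rc = ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
	}

	saved_errno = errno;	/* close() must not clobber the ioctl error */
	close(fd);
	errno = saved_errno;
	return rc;
}
```

Calling set_project("dir", 1000, true) corresponds roughly to "project -p 1000 -s dir" without recursion.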

Test-Parameters: testlist=sanity-quota,sanity-quota,sanity-quota,\
    sanity-quota clientdistro=el7 serverdistro=el7 \
    ostfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I45960fb8fbd12e22a654792fba517896c0447447
Reviewed-on: https://review.whamcloud.com/29190
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10192 lfsck: verify agent entry

Originally, we only supported agent entries for the ldiskfs backend,
and osd-ldiskfs only maintained agent entries for cross-MDT
directories, NOT for cross-MDT regular files. So if someone creates a
cross-MDT hard link or renames a regular file across MDTs, then the
related object becomes invisible to userspace when the MDT is
mounted as 'ldiskfs' directly.

On the other hand, old ZFS-based MDTs did not support agent
entries either. When upgrading from an old ZFS-based device,
or migrating from an old ldiskfs-based MDT to a new ZFS-based MDT,
some (or all) agent entries need to be created.

So we enhance the namespace LFSCK logic to check whether the
agent entry is properly set up or not. If not, the LFSCK will
trigger the lower layer's agent entry verification mechanism via
an xattr set operation.
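The check can be pictured as comparing what the linkEA implies with what is on disk. This is a sketch under stated assumptions: struct link_rec and both functions are hypothetical, and treating a stale agent entry (present but not needed) as a mismatch is an assumption, not something this message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified linkEA record: which MDT holds the name entry. */
struct link_rec {
	unsigned int parent_mdt;	/* MDT index of the parent directory */
};

/* True if any name entry for the object lives on a different MDT. */
static bool needs_agent_entry(unsigned int obj_mdt,
			      const struct link_rec *links, size_t nlinks)
{
	assert(links != NULL || nlinks == 0);
	for (size_t i = 0; i < nlinks; i++)
		if (links[i].parent_mdt != obj_mdt)
			return true;
	return false;
}

/*
 * True when LFSCK should ask the OSD to verify the agent entry, i.e. the
 * on-disk state disagrees with what the linkEA implies.
 */
static bool lfsck_agent_mismatch(unsigned int obj_mdt,
				 const struct link_rec *links, size_t nlinks,
				 bool agent_present)
{
	return needs_agent_entry(obj_mdt, links, nlinks) != agent_present;
}
```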

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I0aa83aff8b39b894dbde19f573c078faf0ef249c
Reviewed-on: https://review.whamcloud.com/29985
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9796 kernel: improve metadata performance for RHEL7

Port the following upstream patches to RHEL7:

commit de92c8caf16ca84926fa31b7a5590c0fb9c0d5ca
Author: Jan Kara <jack@suse.cz>
Date:   Mon Jun 8 12:46:37 2015 -0400

    jbd2: speedup jbd2_journal_get_[write|undo]_access()

    jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are
    frequently called for buffers that are already part of the running
    transaction - most frequently it is the case for bitmaps, inode table
    blocks, and superblock. Since in such cases we have nothing to do, it is
    unfortunate we still grab reference to journal head, lock the bh, lock
    bh_state only to find out there's nothing to do.

    Improving this is a bit subtle though since until we find out journal
    head is attached to the running transaction, it can disappear from under
    us because checkpointing / commit decided it's no longer needed. We deal
    with this by protecting journal_head slab with RCU. We still have to be
    careful about journal head being freed & reallocated within slab and
    about exposing journal head in consistent state (in particular
    b_modified and b_frozen_data must be in correct state before we allow
    user to touch the buffer).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
From 087ffd4eae9929afd06f6a709861df3c3508492a Mon Sep 17 00:00:00 2001
Date: Fri, 4 Dec 2015 12:29:28 -0500
Subject: [PATCH] jbd2: fix null committed data return in undo_access

     introduced jbd2_write_access_granted() to improve write|undo_access
     speed, but missed to check the status of b_committed_data which caused
     a kernel panic on ocfs2.
     ...

     Fixes: de92c8caf16c("jbd2: speedup jbd2_journal_get_[write|undo]_access()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
These patches improve Lustre file create performance by 10% and
file unlink performance by 17%.
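The jbd2 fast path can be modeled in userspace as a lock-free check that the buffer already belongs to the running transaction, with the lock taken only on the slow path. This is a hypothetical sketch using C11 atomics and a pthread mutex; the struct and function names are stand-ins, and it deliberately does not model the RCU-protected journal_head slab that makes the real check safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the real jbd2 structures. */
struct transaction { int tid; };

struct jhead {
	_Atomic(struct transaction *) b_transaction;	/* owning transaction */
	atomic_bool b_modified;				/* modified in it already */
	pthread_mutex_t lock;				/* models the bh_state lock */
	unsigned long slow_path_calls;			/* for illustration only */
};

/* Lock-free check: is the buffer already part of the running transaction? */
static bool write_access_granted(struct jhead *jh, struct transaction *running)
{
	if (atomic_load_explicit(&jh->b_transaction, memory_order_acquire) != running)
		return false;
	/* Published before b_transaction (release below), so this read is safe. */
	return atomic_load_explicit(&jh->b_modified, memory_order_relaxed);
}

static int get_write_access(struct jhead *jh, struct transaction *running)
{
	assert(running != NULL);
	if (write_access_granted(jh, running))
		return 0;	/* fast path: nothing to do, no locking */

	pthread_mutex_lock(&jh->lock);	/* slow path: file the buffer */
	jh->slow_path_calls++;
	atomic_store_explicit(&jh->b_modified, true, memory_order_relaxed);
	atomic_store_explicit(&jh->b_transaction, running, memory_order_release);
	pthread_mutex_unlock(&jh->lock);
	return 0;
}
```

Repeated calls for the same buffer in the same transaction then skip the lock entirely.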

Change-Id: I8082be396209d8f658e3265cedf32670e15a53f5
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/28276
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9543 ofd: fiemap deadlock

lock_zero_regions() locks all zero regions by acquiring
a set of independent locks.
It can deadlock with a PW lock for the whole file from
a client.

In fact, it is not necessary to hold locks on all zero regions
at once; we only need to force clients to flush data for
these regions.
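A sketch of the per-region approach, assuming hypothetical lock_region()/flush_region()/unlock_region() hooks in which counters stand in for real LDLM locks: at most one region lock is held at any time, so there is no set of held locks to deadlock against a client's whole-file PW lock.

```c
#include <assert.h>
#include <stddef.h>

struct region { unsigned long long start, end; };

/* Counters let us observe the locking pattern (illustration only). */
static int locks_held, max_locks_held, flushed;

static void lock_region(const struct region *r)
{
	(void)r;
	if (++locks_held > max_locks_held)
		max_locks_held = locks_held;
}

static void unlock_region(const struct region *r)
{
	(void)r;
	assert(locks_held > 0);
	locks_held--;
}

/* Stand-in for "make clients flush cached data for [start, end]". */
static void flush_region(const struct region *r)
{
	(void)r;
	flushed++;
}

/* Take each lock only long enough to force the flush, then drop it. */
static void flush_zero_regions(const struct region *regs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		lock_region(&regs[i]);
		flush_region(&regs[i]);
		unlock_region(&regs[i]);
	}
}
```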

Change-Id: Ib48e2bd9e6f715eb54a7821acde7b38b0de6650c
Seagate-bug-id: MRP-4393
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: http://es-gerrit.xyus.xyratex.com:8080/15734
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@seagate.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@seagate.com>
Tested-by: Elena V. Gryaznova <elena.gryaznova@seagate.com>
Reviewed-on: https://review.whamcloud.com/27224
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Artem Blagodarenko <c17828@cray.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8895 target: limit grant allocation

tgt_grant_alloc() is missing a check for the amount of space already
granted to a client. If a client submits a number of RPCs
simultaneously while its grant is below its maximum, the server may
give the client substantially more grant than was requested in any
one RPC. With a sizeable number of clients, this may lead to ENOSPC
long before the disk actually runs out of space.

Limit the grant given to a client to the requested amount plus the
grant for 2 full write RPCs.

A test to illustrate the issue is included.
The test needs to lower the debug level so that dd provides sufficient
I/O throughput.
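One reading of the new limit, expressed as a hypothetical helper: the total grant held by a client may not exceed the requested amount plus two full write RPCs' worth. The function name, the byte units and the exact trimming rule are assumptions for illustration, not the tgt_grant_alloc() code.

```c
#include <assert.h>

/* Returns how much new grant to hand out, in bytes (hypothetical rule). */
static unsigned long long grant_alloc(unsigned long long granted,
				      unsigned long long want,
				      unsigned long long rpc_bytes)
{
	unsigned long long limit = want + 2 * rpc_bytes;

	assert(rpc_bytes > 0);
	if (granted >= limit)
		return 0;		/* client already holds enough */
	if (want > limit - granted)
		want = limit - granted;	/* trim so the total stays within limit */
	return want;
}
```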

Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Seagate-bug-id: MRP-4013
Change-Id: Ie6a8abbad28a06bc1d55ff2fd042b9664a29e9e4
Reviewed-on: https://review.whamcloud.com/24096
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6609 test: wait for import state FULL

recovery-small test 26a sometimes could not remove its sub-test dirs.
A decrease in the number of exports may be caused by network issues,
so the test now passes only when the import state becomes EVICTED.
At the end, it waits for the import state FULL before removing the
sub-test dirs.

Test-Parameters: trivial envdefinitions=SLOW=yes,ONLY=26a testlist=recovery-small

Change-Id: Ib6156f4761bc79d89b42654898b51cc86c2ef40a
Reviewed-on: http://es-gerrit.xyus.xyratex.com:8080/1277
Signed-off-by: Sergey Cheremencev <Sergey_Cheremencev@xyratex.com>
Xyratex-bug-id: MRP-1168
Tested-by: Jenkins
Tested-by: Elena Gryaznova <elena_gryaznova@xyratex.com>
Reviewed-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-by: Andrew Perepechko <andrew_perepechko@xyratex.com>
Reviewed-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-on: https://review.whamcloud.com/14843
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9019 llite: change lli_glimpse_time to ktime

Currently lli_glimpse_time is in jiffies, whose rate can vary between
platforms. Migrate to ktime, since we need sub-second time resolution
that is consistent on every platform. Replace the last
cfs_time_current_sec() call with ktime_get_real_seconds().

Change-Id: I352c3adbd07d9dadb7e5dbe180447a1cb18a48d2
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30601
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5637 tests: set filefrag blocksize to 1024

If the blocksize for filefrag is unspecified, it defaults to 1024
bytes, but on SL7, for example, the default blocksize is 4096.
Set it to 1024 explicitly so the results are the same everywhere.

Test-Parameters: trivial envdefinitions=ONLY=130 testlist=sanity
Change-Id: I550446c6c7c0b85aa769f0f8a7575a6d33b2dc4b
Cray-bug-id: MRP-2933
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-on: https://review.whamcloud.com/30391
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>

LU-10269 ldlm: allow trybits in waiting queue

Lock trybits can be kept while a lock is waiting, and each new
lock filters out the trybits of locks in the waiting queue. When the
lock is finally granted, the remaining trybits are added to the
granted bits. Therefore, trybits can be granted to a blocked lock if
no other lock takes these bits while it is waiting.
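A bitmask sketch of these rules, assuming a hypothetical struct ibits_lock. Here a new lock filters the waiting trybits with its required bits only, which is an assumption about what "filters" means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified inodebits lock. */
struct ibits_lock {
	uint64_t bits;		/* bits the lock must get */
	uint64_t try_bits;	/* bits it gets only if nobody else wants them */
};

/* A new lock strips the bits it needs from every waiting lock's trybits. */
static void filter_waiting_trybits(struct ibits_lock *waiting, size_t n,
				   const struct ibits_lock *newlock)
{
	assert(newlock != NULL && (waiting != NULL || n == 0));
	for (size_t i = 0; i < n; i++)
		waiting[i].try_bits &= ~newlock->bits;
}

/* On grant, whatever trybits survived are granted along with the bits. */
static uint64_t grant_bits(struct ibits_lock *lk)
{
	lk->bits |= lk->try_bits;
	lk->try_bits = 0;
	return lk->bits;
}
```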

Test-Parameters: mdscount=1 mdtcount=1 testlist=racer,racer,racer,racer
Signed-off-by: Mikhal Pershin <mike.pershin@intel.com>
Change-Id: I775f776f4cf8b581e32e4a1585e862e1764b5bed
Reviewed-on: https://review.whamcloud.com/30343
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8626 hsm: expose the number of active hsm requests per type

This patch creates 3 new proc files under the hsm directory:
- archive_count
- restore_count
- remove_count

These should help monitor the coordinator's health and allow
policy engines to adapt their request flow.

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I30c9fb658e8c14a181b094b51408c92df609c3ca
Reviewed-on: https://review.whamcloud.com/30336
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8271 nodemap: wait before getting large conf if changed

If a nodemap configuration spans multiple RPCs, it's possible for the
nodemap config to change between RPCs. Previously, the MGC would
immediately retry fetching the nodemap config. This patch modifies the
behavior so the nodemap config lock is re-added to the wait queue,
which delays the retry by ~10s.

Test-Parameters: envdefinitions=SLOW=yes \
testlist=sanity-sec,sanity-sec,sanity-sec

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Ie4e70def712e5eaa38adecc450e39c0380e34b69
Reviewed-on: https://review.whamcloud.com/26781
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10321 lfsck: not start lfsck during umount

There is a race condition between lfsck_start and umount:
the LFSCK may be triggered just after the LFSCK was stopped
while unmounting the target. Then nobody will stop the newly
started LFSCK, so the umount may be blocked.

This patch sets a flag on the lfsck instance during umount
that prevents any subsequent lfsck_start.
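A minimal sketch of the flag, assuming a hypothetical struct lfsck_instance with a pthread mutex and an -ESRCH return (both illustrative). The key point is that the flag is checked under the same mutex that umount takes to set it, so a start cannot slip in after the stop.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down lfsck instance. */
struct lfsck_instance {
	pthread_mutex_t li_mutex;
	bool li_stopping;	/* set once umount has begun */
	bool li_running;
};

static int lfsck_start(struct lfsck_instance *lfsck)
{
	int rc = 0;

	assert(lfsck != NULL);
	pthread_mutex_lock(&lfsck->li_mutex);
	if (lfsck->li_stopping)
		rc = -ESRCH;	/* hypothetical error: target is going away */
	else
		lfsck->li_running = true;
	pthread_mutex_unlock(&lfsck->li_mutex);
	return rc;
}

/* Called from umount: stop the current run and forbid new ones. */
static void lfsck_umount_stop(struct lfsck_instance *lfsck)
{
	pthread_mutex_lock(&lfsck->li_mutex);
	lfsck->li_stopping = true;
	lfsck->li_running = false;
	pthread_mutex_unlock(&lfsck->li_mutex);
}
```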

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I779f862d4195d4289bb9dd96575cd7746ac4b35b
Reviewed-on: https://review.whamcloud.com/30513
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10321 lfsck: allow to stop the in-starting lfsck

The LFSCK start logic holds li_mutex on the lfsck instance
during LFSCK start processing. The LFSCK stop logic also needs
to take li_mutex on the lfsck instance to stop the LFSCK.
If someone triggers lfsck_stop (such as when unmounting the target)
before lfsck_start returns, then lfsck_stop will be blocked
on li_mutex. If the li_mutex holder is itself blocked by something
else, for example, waiting for an LFSCK RPC to be handled by a
remote server (MDT/OST) while the connection or the remote server
is not ready yet, then lfsck_stop will stay blocked as well.

To avoid such cascading blocks, this patch allows lfsck_stop to
proceed without taking li_mutex, so that it can directly notify
the related LFSCK engines of the stop event even if the former
lfsck_start has not completed yet.
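A userspace model of the change, with hypothetical names: lfsck_start() holds li_mutex while it waits, and lfsck_stop() only sets an atomic flag, so it never queues on li_mutex. The busy-wait with sched_yield() and the -ECANCELED result are simplifications for illustration, not the LFSCK wait logic.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

struct lfsck_instance {
	pthread_mutex_t li_mutex;	/* serializes lfsck_start() */
	atomic_bool li_stop_requested;	/* set by lfsck_stop() without li_mutex */
};

/* Holds li_mutex while "waiting" on a remote server that never gets ready. */
static int lfsck_start(struct lfsck_instance *lfsck)
{
	assert(lfsck != NULL);
	pthread_mutex_lock(&lfsck->li_mutex);
	while (!atomic_load(&lfsck->li_stop_requested))
		sched_yield();	/* stands in for waiting on the LFSCK RPC */
	pthread_mutex_unlock(&lfsck->li_mutex);
	return -ECANCELED;	/* hypothetical: start aborted by the stop */
}

/* Does NOT take li_mutex, so a stuck start cannot block it. */
static void lfsck_stop(struct lfsck_instance *lfsck)
{
	atomic_store(&lfsck->li_stop_requested, true);
}

static void *lfsck_start_thread(void *arg)
{
	static int rc;

	rc = lfsck_start(arg);
	return &rc;
}
```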

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I6e168d955db33d74778142235a8ed2802d3577d9
Reviewed-on: https://review.whamcloud.com/30420
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10212 ldlm: fix prolong for destroyed lock

For an I/O request, ofd_prolong_extent_locks() uses a fast path
if the lock is found by handle. If the lock has LDLM_FL_DESTROYED
set, prolong should fall back to the general path.

Previously, no lock was accounted for an I/O request with a
destroyed lock, and the client got an ESTALE error:

operation ost_read to node x.x.x.x@o2ib failed: rc = -116
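A sketch of the decision, with a hypothetical struct lock, LOCK_FL_DESTROYED bit and path counters (not the LDLM definitions): the handle fast path is taken only for a lock that is not destroyed, and a return of 0 models the "no lock accounted" case that surfaced as ESTALE.

```c
#include <assert.h>
#include <stddef.h>

#define LOCK_FL_DESTROYED 0x1u	/* hypothetical flag bit, not the LDLM value */

struct lock { unsigned int flags; };

static int fast_path_used, general_path_used;

/* General path: count every live candidate lock covering the I/O. */
static int prolong_general_path(const struct lock *locks, size_t n)
{
	int count = 0;

	assert(locks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!(locks[i].flags & LOCK_FL_DESTROYED))
			count++;
	return count;
}

/* Returns the number of locks prolonged; 0 models the old ESTALE case. */
static int prolong_extent_locks(const struct lock *by_handle,
				const struct lock *locks, size_t n)
{
	if (by_handle != NULL && !(by_handle->flags & LOCK_FL_DESTROYED)) {
		fast_path_used++;
		return 1;
	}
	general_path_used++;
	return prolong_general_path(locks, n);
}
```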

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I63e619d0330279bb2ae678ed98b1c0e899ad4e08
Reviewed-on: https://review.whamcloud.com/29992
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>

LU-5163 mdd: migrated entry may not exist

During dirent migration, we should not assert that the file exists.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I11bbc5556007ec045b7a5d57a250981082ef6d70
Reviewed-on: https://review.whamcloud.com/26620
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6142 gnilnd: handle LNet core typedef removal

The LNet layer has removed all the typedefs. Update the gnilnd
driver to handle the removals.

Test-parameters: trivial

Change-Id: I96377a977bc9ef689a0c48afb77106aa2a8993a1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30474
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9943 lnd: correct WR fast reg accounting

Ensure that enough WRs are allocated for the fast reg
case, which needs two additional WRs per transfer:
the first for memory window registration and the second for
memory window invalidation.

Failure to allocate these causes the following problem:
mlx5_warn:mlx5_0:begin_wqe:4085(pid 9590): work queue overflow
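The WR arithmetic as a hypothetical helper. The two extra WRs per transfer for fast reg come from the text above; counting one WR for the LNet message itself on top of the fragments, and multiplying by the queue depth, are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical send-WR budget for one connection. Per transfer: one WR for
 * the message (assumption), one per RDMA fragment, plus two for fast reg
 * (memory window registration and invalidation).
 */
static int send_wrs_needed(int max_frags, int queue_depth, bool fastreg)
{
	int per_transfer = 1 + max_frags;

	assert(max_frags >= 0 && queue_depth > 0);
	if (fastreg)
		per_transfer += 2;
	return per_transfer * queue_depth;
}
```

With 256 fragments and a queue depth of 8, omitting the fast reg WRs under-allocates by 16 WRs (2056 instead of 2072).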

Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Icf98b6bbb3d98fb29794173da84412070f13541b
Reviewed-on: https://review.whamcloud.com/30311
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10185 gnilnd: Change gnc_tx_bits to unsigned long

gnc_tx_bits was declared as __u8. Change it to unsigned long.

The goal is to align gnc_tx_bits[] to a 32-bit boundary.
I believe the initial declaration was not a good choice:
given how gnc_tx_bits is used, it makes more sense as a long than
as an 8-bit value.
It is used by:
static inline int test_and_clear_bit(int nr, unsigned long *addr);
unsigned long find_next_zero_bit(unsigned long *addr, unsigned long
size, unsigned long offset);
static inline int test_and_set_bit(int nr, unsigned long *addr);
all of which take an unsigned long pointer.

Test-parameters: trivial

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I559e2a356182f253716d30f69bf675c485fe1b72
Reviewed-on: https://review.whamcloud.com/29897
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10185 gnilnd: change default credits

It has been found that a credit value of 64 reduces the likelihood of
file-I/O-induced congestion, without degrading performance, when a
many-router, single-compute pattern occurs.
Change the default compute credits to 64 based on this finding.

Test-parameters: trivial

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I0a1f46bdb2c327a7e6dc9eeb145fe1418d691da1
Reviewed-on: https://review.whamcloud.com/29896
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Shimek <knathrak@gmail.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10185 gnilnd: Use consistent size for GNI MBOX.

KMALLOC_MAX_SIZE is not consistent across allocators.
For example, with the SLUB allocator the size is set to
30 bits, while with the SLAB allocator the max size is
25 bits, or 32MB.

This is a no-op change for SLAB; for other allocators, the
size is now also set to 32MB.

Test-parameters: trivial

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Icab4c69f9afd276d55360a76991ba9783f83ac8b
Reviewed-on: https://review.whamcloud.com/29895
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10185 gnilnd: Add mod param to adjust vmalloc's retry flag

In kernel 3.12, gnilnd had issues with the memory allocator
getting stuck trying to find pages for long periods of time,
so __GFP_NORETRY was added to fail fast. The memory subsystem
has changed enough since then that it is believed vmalloc will
fail if it can't get pages.

Add a module parameter to allow the no_retry flag to be enabled
or disabled dynamically.

Test-parameters: trivial

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I067a8d7b52d237e464fceae7e2220f2488b68957
Reviewed-on: https://review.whamcloud.com/29894
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Shimek <knathrak@gmail.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10185 gnilnd: treat seq_printf as void function

Starting with the 4.3 Linux kernel, seq_printf() is a void
function and no longer returns the character count. Earlier
cleanups updated the rest of Lustre for this change but did
not update all of gnilnd. This change takes care of what was
missed.

Test-parameters: trivial

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ie76c1cc7578cb5919a1f2aca6fb15fd8b75882c1
Reviewed-on: https://review.whamcloud.com/29893
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10185 gnilnd: SLES 12 SP2 support and cleanup

Handle a couple of changes for SLES 12 SP2:
- Handle rename of set_mb() to smp_store_mb()
- Handle rename of __GFP_WAIT to __GFP_RECLAIM
- Handle parameter change to sock_create_kern()

Also remove some dead code.

Test-parameters: trivial

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I9b5799a492b347fe7961f2d70e24bed5cc2e7021
Reviewed-on: https://review.whamcloud.com/29892
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10371 ptlrpc: check for posix_acl a_entries

Since kernel commit 2211d5ba5c6c4e972ba6dbc912b2897425ea6621,
the posix_acl_xattr_entry a_entries[0] member has been removed.
Make sure the LASSERTF test works with kernels that include
this commit.
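With the a_entries[0] member gone, code that needs the entries has to locate them from the header size instead. A userspace sketch, assuming simplified mirror structs (acl_xattr_header and acl_xattr_entry are hypothetical names, and the real on-disk layout is little-endian); it is not the actual LASSERTF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the POSIX ACL xattr layout. */
struct acl_xattr_header { uint32_t a_version; };	/* no a_entries[0] */
struct acl_xattr_entry { uint16_t e_tag; uint16_t e_perm; uint32_t e_id; };

/* Size of an xattr holding 'count' entries: header, then the entries. */
static size_t acl_xattr_size(size_t count)
{
	return sizeof(struct acl_xattr_header) +
	       count * sizeof(struct acl_xattr_entry);
}

/* Entries start right after the header. */
static const struct acl_xattr_entry *
acl_xattr_entries(const struct acl_xattr_header *hdr)
{
	assert(hdr != NULL);
	return (const struct acl_xattr_entry *)(hdr + 1);
}
```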

Test-Parameters: trivial
Signed-off-by: Thomas Stibor <t.stibor@gsi.de>
Change-Id: If2404c89775a5f9077c7d9379d73c8187b796a3a
Reviewed-on: https://review.whamcloud.com/30495
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10302 ldlm: destroy lock if LVB init fails

In ldlm_cli_enqueue_local(), destroy the newly created lock if
ldlm_lvbo_init() fails. Rename sanity test_232() to test_232a(), fix
the fail_loc setting to happen on the correct node, and check that the
OST can be unmounted after the failed write. Add sanity test_232b() to
do the same but for data version since that uses a different LDLM
path.
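The fixed error path, as a sketch with hypothetical names (lock_sk, lvbo_init(), enqueue_local()) and -ENOMEM standing in for any LVB init failure: the newly created lock is destroyed before the error is returned, rather than being left behind where it could keep the target from unmounting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct lock_sk { int refs; };

static int destroyed;	/* counts destroyed locks, for illustration */

static struct lock_sk *lock_create(void)
{
	struct lock_sk *lk = calloc(1, sizeof(*lk));

	if (lk != NULL)
		lk->refs = 1;
	return lk;
}

static void lock_destroy(struct lock_sk *lk)
{
	assert(lk != NULL && lk->refs == 1);
	destroyed++;
	free(lk);
}

/* Hypothetical LVB initializer that can be told to fail. */
static int lvbo_init(struct lock_sk *lk, bool fail)
{
	(void)lk;
	return fail ? -ENOMEM : 0;
}

static int enqueue_local(struct lock_sk **out, bool fail_lvb)
{
	struct lock_sk *lk = lock_create();
	int rc;

	*out = NULL;
	if (lk == NULL)
		return -ENOMEM;

	rc = lvbo_init(lk, fail_lvb);
	if (rc < 0) {
		lock_destroy(lk);	/* do not leak the new lock */
		return rc;
	}
	*out = lk;
	return 0;
}
```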

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I18e9594b9a2461afbc66128f477d3185a6627bc0
Reviewed-on: https://review.whamcloud.com/30477
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10149 llite: avoid live-lock when concurrent mmap()s

This patch attempts to delay the page-fault retry on the client side
for an inode whose extents are being mmap()'ed, to prevent a live-lock
with other competitors for the same pages.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Id04664fce1a5dad4dbdd7ad4b183dffb8e38b844
Reviewed-on: https://review.whamcloud.com/30465
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10301 kernel: kernel update RHEL7.4 [3.10.0-693.11.1.el7]

Update RHEL 7.4 kernel to 3.10.0-693.11.1.el7

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Iea7a4604c333c26fdc1b2439e6a00f27472d2410
Reviewed-on: https://review.whamcloud.com/30401
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10291 lnd: remove concurrent_sends tunable

The concurrent_sends tunable was intended to limit the number of in-flight
transfers per connection. However, the queue depth does exactly the same job.
For example, if the queue depth is negotiated to 16 and
concurrent_sends is set to 32, the maximum number of in-flight
transfers does not exceed 16. There is no need to keep concurrent_sends
around, since it does not add any unique functionality.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I4dd7f5ecac5a7d10403c4f0ab0f01374a2f10206
Reviewed-on: https://review.whamcloud.com/30312
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9998 libcfs: split single NUMA node into partitions

For machines with a single NUMA node, change the default behavior
and split the node into cpu_npartitions partitions, as was done
before 2.8.59. See LU-5050 (libcfs: default CPT matches NUMA topology).
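A hypothetical sketch of the default CPU partition mapping: with several NUMA nodes each node becomes one CPT (the LU-5050 behavior), while a single node is split into npartitions roughly equal groups. cpu_to_cpt() and the contiguous-block split are assumptions, not the libcfs algorithm.

```c
#include <assert.h>

static int cpu_to_cpt(int cpu, int ncpus, int nnodes, int npartitions,
		      const int *cpu_node)
{
	int per_cpt;

	assert(cpu >= 0 && cpu < ncpus);
	if (nnodes > 1)
		return cpu_node[cpu];	/* one CPT per NUMA node */
	if (npartitions <= 1)
		return 0;

	/* Ceiling division guarantees cpu / per_cpt < npartitions. */
	per_cpt = (ncpus + npartitions - 1) / npartitions;
	return cpu / per_cpt;
}
```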

Change-Id: I7f9122931a88fd5770628d7ae21b764efc21c134
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/29645
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10192 osd-zfs: create agent entry for remote entry

In a DNE environment, an object (whether a regular file
or a directory) and its name entry may reside on different
MDTs. In such a case, we create an agent entry on
the MDT where the object resides. The agent entry references
the object locally, which makes the object visible to
userspace when the MDT is mounted as 'zfs' directly. Userspace
tools such as 'tar' can then handle the object properly.
This behavior is compatible between the ldiskfs and ZFS backends.

We handle the agent entry when setting the linkEA, which is the
common interface for both regular files and directories and
covers all kinds of cases, such as create/link/unlink/rename.

NOTE: we can NOT do that in ea_{insert,delete}, which is only
used for directories.
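Because create/link/unlink/rename all rewrite the linkEA, the agent entry decision can be made in one place. A sketch with hypothetical types: at most one agent entry exists per object, created when the first remote name appears and removed when the last one goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct link_rec { unsigned int parent_mdt; };	/* hypothetical linkEA record */

enum agent_action { AGENT_NONE, AGENT_CREATE, AGENT_DELETE };

static size_t count_remote(unsigned int obj_mdt,
			   const struct link_rec *links, size_t n)
{
	size_t remote = 0;

	for (size_t i = 0; i < n; i++)
		remote += links[i].parent_mdt != obj_mdt;
	return remote;
}

/* Called whenever the linkEA is rewritten for the object. */
static enum agent_action linkea_update_agent(unsigned int obj_mdt,
					     const struct link_rec *links,
					     size_t n, bool agent_exists)
{
	bool need;

	assert(links != NULL || n == 0);
	need = count_remote(obj_mdt, links, n) > 0;
	if (need && !agent_exists)
		return AGENT_CREATE;
	if (!need && agent_exists)
		return AGENT_DELETE;
	return AGENT_NONE;
}
```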

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Icc4a63027221edf279994fbecda4d47cc121b799
Reviewed-on: https://review.whamcloud.com/29617
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10192 osd-ldiskfs: create agent entry for remote entry

In a DNE environment, an object (whether a regular file
or a directory) and its name entry may reside on different
MDTs. In such a case, we create an agent entry on
the MDT where the object resides. The agent entry references
the object locally, which makes the object visible to
userspace when the MDT is mounted as 'ldiskfs' directly. Userspace
tools such as 'tar' can then handle the object properly.

We handle the agent entry when setting the linkEA, which is the
common interface for both regular files and directories and
covers all kinds of cases, such as create/link/unlink/rename.

NOTE: we can NOT do that in ea_{insert,delete}, which is only
used for directories.

There are two known issues:
1. For one object, we create at most one agent entry even if
   there are multiple cross-MDT hard links to the object.
   So the local e2fsck may claim that the object's nlink is larger
   than the number of name entries that reference the inode, and
   will then fix the nlink attribute to match the local
   references. This makes the object's nlink attribute
   inconsistent with the global references. It is bad but not
   fatal: ref_del() can handle the zero-referenced case, and the
   global namespace LFSCK can repair the object's attribute
   according to the linkEA.
2. There may be so many hard links on the object that its linkEA
   overflows, in which case the linkEA entry for a cross-MDT
   reference may be discarded. If that happens, we do not know at
   this point whether any cross-MDT references exist. But since
   there are local references, the object is still guaranteed to
   be visible to userspace when mounted as 'ldiskfs'. That is enough.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ic9cc68a4dc864307dd5dd5fdb930bfac699c8379
Reviewed-on: https://review.whamcloud.com/29984
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9019 libcfs: add ktime_compare for platforms lacking it.

The function ktime_compare() was added to the Linux kernel
starting with version 3.19. Add support to libcfs for older
platforms lacking this function.
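A userspace sketch of the fallback semantics, assuming ktime_t is a signed 64-bit nanosecond count (true for recent kernels; older kernels wrapped it in a union, so a real backport may need .tv64). ktime_sk_t and ktime_compare_sk() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ktime_sk_t;	/* stand-in for ktime_t: signed ns count */

static_assert(sizeof(ktime_sk_t) == 8, "ktime stand-in must be 64-bit");

/* Returns -1, 0 or 1 as cmp1 is before, equal to, or after cmp2. */
static inline int ktime_compare_sk(const ktime_sk_t cmp1, const ktime_sk_t cmp2)
{
	if (cmp1 < cmp2)
		return -1;
	if (cmp1 > cmp2)
		return 1;
	return 0;
}
```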

Test-Parameters: trivial

Change-Id: I7826b8e78d0dc2c633490a2949210176a0003d9a
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30475
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-1757 brw: Fix short i/o and enable for mdc

The short i/o flag was left out of the OST flags in the
original patch, meaning the feature was not actually enabled.
Also, the short_io_size value was used uninitialized, meaning
it was sometimes non-zero, which could lead to several issues.

Also add the short i/o flag to the MDC/MDT for data on MDT.
Quick testing suggests this works fine with no further
changes.

Cray-bug-id: LUS-187
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I4154b87d5ad73b53467b0382368fad7c5ba177fe
Reviewed-on: https://review.whamcloud.com/30435
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10213 lnd: calculate qp max_send_wrs properly

The maximum number of in-flight transfers cannot exceed the
negotiated queue depth. Instead of calculating max_send_wr
as the negotiated number of frags * concurrent sends, it
should be the negotiated number of frags * queue depth.

If that value is too large for successful qp creation, then
we reduce the queue depth in a loop until we successfully
create the qp or the queue depth dips below 2.

Due to the queue depth negotiation protocol, it is guaranteed
that the queue depths on the active and passive sides
will match.

This change resolves the discrepancy created by the previous
code, which reduced max_send_wr by a quarter.

That could lead to the following error when o2iblnd transfers
a message that requires more WRs than the maximum that has
been allocated:
mlx5_ib_post_send:4184:(pid 26272): Failed to prepare WQE
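The QP sizing loop as a hypothetical sketch: create_qp_sketch() stands in for QP creation against a made-up device limit, and decrementing the queue depth by one per attempt is an assumption (the text only says it is reduced in a loop).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device limit standing in for what QP creation accepts. */
static int dev_max_send_wr = 4096;

static int create_qp_sketch(int max_send_wr)
{
	return max_send_wr <= dev_max_send_wr ? 0 : -ENOMEM;
}

/*
 * max_send_wr = frags per transfer * queue depth. On failure, shrink the
 * queue depth and retry until the QP is created or the depth drops below 2.
 * Returns the queue depth that worked, or a negative errno.
 */
static int setup_qp(int max_frags, int queue_depth)
{
	assert(max_frags > 0);
	while (queue_depth >= 2) {
		if (create_qp_sketch(max_frags * queue_depth) == 0)
			return queue_depth;
		queue_depth--;
	}
	return -ENOMEM;
}
```

For example, with 256 fragments and a requested depth of 32, the sketch settles on a queue depth of 16.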

Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I88f96f950bf4c0a8efd4df812d44e5e20d5907dc
Reviewed-on: https://review.whamcloud.com/30310
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9019 sec: migrate to 64 bit time

Replace cfs_time_current_sec() with ktime_get_real_seconds() to
avoid the overflow issues in 2038. Migrate the rest of the gss
code to time64_t to avoid the same 2038 overflow issue.

Currently, in encrypt_page_pools we report jiffy cycles for
"max wait time", which not only doesn't make sense but can vary
from platform to platform. Instead, we now report it in
milliseconds. That requires changing epp_st_max_wait to ktime_t,
since we need better than one-second precision. Lastly, the
"last access" and "last shrink" times in encrypt_page_pools were
showing up negative. This was due to the epp_last_* fields being
set to the number of seconds since the epoch instead of the
number of seconds since the node booted. Setting epp_last_* with
ktime_get_seconds() instead of ktime_get_real_seconds() resolves
this problem.

Test-Parameters: trivial

Change-Id: Ia2d559454287675699a067121760543a2e6877da
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/29859
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9810 lnd: use less CQ entries for each connection

Currently there are two work request chains per transfer. This
means the OFED stack will generate at most two events if a transfer
fails. Reduce the number of CQ entries to avoid extra resource
consumption.

Test-Parameters: trivial
Seagate-bug-id: MRP-4508
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Change-Id: I0c06fef9589478f40ef7e1eeacff2aec7013e562
Reviewed-on: https://review.whamcloud.com/28279
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10129 lnd: rework map_on_demand behavior

The use of map_on_demand is ambiguous. In kernels which supported
global memory regions, map-on-demand was used to limit the number of
RDMA fragments transferred between peers. That tunable didn't affect
memory allocation, since the maximum allowed was always allocated. It
was, however, used as a variable in determining max_send_wr. With the
introduction of FMR, if the number of fragments exceeds the negotiated
number of fragments, which is determined by the map-on-demand value,
then FMR is used, because we can transfer all the fragments in one FMR
fragment.

The latest kernels have removed support for global memory regions, so
the use of map-on-demand has become ineffectual. However, we need to
keep it for backwards compatibility. The behavior has changed as
follows:

map_on_demand is a flag used to determine if we can use FMR or
FastReg. This is applicable for kernels which support global memory
regions. For later kernels this flag is always enabled, since we will
always use either FMR or FastReg. For kernels which support global
memory regions, map_on_demand defaults to 0, which means we will be
using global memory regions exclusively. If it is set to a value
other than 0, then we will behave as follows:
  1. Always default the number of fragments to IBLND_MAX_RDMA_FRAGS
  2. Create FMR/FastReg pools
  3. Negotiate the supported number of fragments per connection
  4. Attempt to transmit using global memory regions only if
     map-on-demand is off, otherwise use FMR or FastReg.
  5. When transmitting a tx with gaps over FMR, we need to
     transmit it as multiple fragments. See the comments in
     kiblnd_fmr_map_tx() for an explanation of the behavior.

For later kernels we default map_on_demand to 1 and do not allow it
to be set to 0, since there is no longer support for global memory
regions.
Behavior:
  1. Default the number of fragments to IBLND_MAX_RDMA_FRAGS
  2. Create FMR/FastReg pools
  3. Negotiate the supported number of fragments per connection
  4. See the comments in kiblnd_fmr_map_tx() for an explanation of
     the behavior when transmitting with gaps versus contiguously.
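The decision described above can be summarised as a small function. This is a sketch under stated assumptions, not the ko2iblnd code: the enum and function names are hypothetical, the device capabilities are reduced to two booleans, and preferring FMR over FastReg when both are available is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical registration modes. */
enum reg_mode {
	REG_GLOBAL_MR,	/* use the global memory region only */
	REG_FMR,
	REG_FASTREG,
	REG_NONE,	/* no usable registration method */
};

/* map_on_demand == 0 is only honoured on kernels with global MRs;
 * later kernels force it to 1. */
static int effective_map_on_demand(bool kernel_has_global_mr, int map_on_demand)
{
	if (!kernel_has_global_mr && map_on_demand == 0)
		return 1;
	return map_on_demand;
}

static enum reg_mode choose_reg_mode(bool kernel_has_global_mr,
				     int map_on_demand,
				     bool dev_has_fmr, bool dev_has_fastreg)
{
	if (effective_map_on_demand(kernel_has_global_mr, map_on_demand) == 0)
		return REG_GLOBAL_MR;
	/* Preferring FMR when both are available is an assumption. */
	if (dev_has_fmr)
		return REG_FMR;
	if (dev_has_fastreg)
		return REG_FASTREG;
	return REG_NONE;
}
```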

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I20b0ea57ec394a050603b5a638f515dfc4ac9446
Reviewed-on: https://review.whamcloud.com/29995
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9714 llog: fix llog_process_thread race

lgh_cur_offset is used in llog_write_osd_rec and modified in
llog_process_thread. When two llog_process_thread threads are
processing the same llog, one thread can be doing llog_write
while the other modifies lgh_cur_offset. This leads to incorrect
modification of the llog.

Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Seagate-bug-id: MRP-4455
Change-Id: I7a63850c876146b14118a7d395cf3cfb3a40dd67
Reviewed-on: https://review.whamcloud.com/27838
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

Revert "LU-5216 hsm: cancel hsm actions running on CT when killed"

This is too complicated given what it accomplishes, and review
comments to that effect were not addressed.

This reverts commit 462c7aae05dfc9cd730f44ffdc661c4c36294012.

Change-Id: I7ab12e62780a8c3f4d4428980d9ce5be02101761
Reviewed-on: https://review.whamcloud.com/30615
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>

Revert "LU-9019 osc: migrate to time64_t"

This causes frequent failures tracked in LU-10403

This reverts commit 4f2a5d5887492da9abe320074511811415e0a06c.

Change-Id: I7d255fcee654508b6df233728624a39917853b98
Reviewed-on: https://review.whamcloud.com/30571
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10232 lov: call cl_object_attr_get under cl_attr lock

cl_object_attr_get() must be called with cl_object_attr_lock
held. There is a place in lov_getstripe() where it is called
without that lock.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ia0a2322ba4ff0ff4affb081375cb108fbf2988c4
Reviewed-on: https://review.whamcloud.com/30052
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10005 osp: cache non-exist EA

OSP should also cache a non-existent EA; otherwise it will keep
sending RPCs trying to get a remote EA that does not exist.

For example, if the default stripe EA does not exist on the root
MDT, file creation on a non-root MDT will always send an extra
OSP RPC to try to get the EA.
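Negative caching of this kind can be sketched as a three-state cache entry. The structure and function names are hypothetical, and the RPC is reduced to a counter, so this only shows the idea that a cached "absent" result suppresses repeated remote lookups.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache entry for a remote EA. */
enum ea_state { EA_UNKNOWN, EA_PRESENT, EA_ABSENT };

struct ea_cache {
	enum ea_state state;
	int value;	/* valid only when state == EA_PRESENT */
	int rpc_count;	/* number of simulated remote fetches */
};

/*
 * Return true and set *value if the EA exists. The remote side is
 * simulated by remote_has_ea/remote_value. Both positive and negative
 * results are cached, so only the first call "sends an RPC".
 */
static bool get_ea(struct ea_cache *c, bool remote_has_ea,
		   int remote_value, int *value)
{
	if (c->state == EA_UNKNOWN) {
		c->rpc_count++;
		if (remote_has_ea) {
			c->state = EA_PRESENT;
			c->value = remote_value;
		} else {
			c->state = EA_ABSENT;	/* the negative entry */
		}
	}
	if (c->state == EA_ABSENT)
		return false;
	*value = c->value;
	return true;
}
```

A real cache also has to invalidate the entry when the EA is later set on the remote MDT; that is not shown here.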

This patch also fixes an LFSCK bug that may cause LFSCK-repaired
items to be counted repeatedly.

Test-Parameters: mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs mdscount=2 mdtcount=4 testlist=sanity-lfsck,sanity-lfsck,sanity-lfsck
Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Ie9c8cb5fd54e404f1af97de47e809f6f96de8d86
Reviewed-on: https://review.whamcloud.com/29078
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>

LU-10331 out: use OBD_ALLOC_LARGE() for update buffers

In out_handle() the update buffers may be up to 100KB so use
OBD_ALLOC_LARGE() to avoid high order page allocation errors.
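For a sense of scale, assuming 4KB pages: a 100KB buffer spans 25 pages, and a physically contiguous allocation of that size is rounded up to an order-5 block (32 pages, 128KB), which can easily fail on a fragmented node. The helper below only reproduces that rounding, in the same way as the kernel's get_order(); its name and the fixed page size are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed 4KB pages */

/* Smallest order such that (PAGE_SIZE << order) >= size, i.e. how
 * many pages (as a power of two) a contiguous allocation needs. */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```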

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ie9970eb219ff0e5357132e0858cb16db8f703209
Reviewed-on: https://review.whamcloud.com/30455
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>

LU-10038 test: ignore readdir races in sanity 133g

In sanity test_133g() writing to some files (for example
mdt/*/exports/clear) may cause the proc tree to change while
find is running. Avoid errors in these cases by adding the
-ignore_readdir_race option to the find invocations.

Test-Parameters: trivial testlist=sanity

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I3ce61ee0f4e1041041d6872a0fe03488a9df363c
Reviewed-on: https://review.whamcloud.com/30451
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>

LU-10346 utils: use the correct lib path in mount_utils.c

The backfs_load_module() function of mount_utils.c used an outdated
path to link itself against the OSD library. This patch fixes it.

Test-Parameters: trivial
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I90a523b45b5772c05f7520e810eec7eac14b75be
Reviewed-on: https://review.whamcloud.com/30431
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10338 test: handle empty CLIENTS in mountcli()

In mountcli() use if [ -n "$CLIENTS" ]; ... rather than
[ -n "$CLIENTS" ] && ... to avoid immediate exit on error when CLIENTS is
the empty string.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I440c3eefb4bee134caee1db5ff2ecdbdf9c1ee5c
Reviewed-on: https://review.whamcloud.com/30413
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10330 tests: fix version check in sanity.sh test_101g

The 16MB RPC patch is available in 2.7.x, so an additional version
check for 2.7.17 or later is needed to avoid interop failures.

Test-Parameters: trivial

Change-Id: I6eeefbf74017aaeeb8998accd6da04d8de75606c
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/30389
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9535 tests: check server version in sanityn 77j

The QoS feature in the TBF policy was added to Lustre with
tag 2.9.53 and a patch with commit hash d2c403363f6. Thus,
sanityn test 77j should only be run for servers with version
2.9.53 and later.
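The version gate can be illustrated by packing each version into an integer so that versions compare numerically. The one-byte-per-component layout is similar to the one used by Lustre's OBD_OCD_VERSION() macro, but version_code() and should_run_77j() are hypothetical C helpers; the actual test uses the shell test framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack major.minor.patch.fix, one byte each, so that a later version
 * is always a larger integer. */
static uint32_t version_code(unsigned major, unsigned minor,
			     unsigned patch, unsigned fix)
{
	return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
	       ((uint32_t)patch << 8) | (uint32_t)fix;
}

/* sanityn 77j: only run against servers at 2.9.53 or later. */
static bool should_run_77j(uint32_t server_version)
{
	return server_version >= version_code(2, 9, 53, 0);
}
```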

Test-Parameters: trivial mdsjob=lustre-b2_9 ossjob=lustre-b2_9 serverbuildno=2 testlist=sanityn
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ia2a04b5ae688809d4a84a2dc8459598f02932119
Reviewed-on: https://review.whamcloud.com/30385
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10267 llog: fix EOF handling in llog_client_next_block()

In llog_client_next_block() update *cur_idx and *cur_offset in the
special case that the handler has returned -EIO after reaching the end
of the log without finding the desired record. This fixes client side
EOF detection in llog_process_thread().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: If30ebec065ddd38fb7b06c3e16f96bb3cd76fa1b
Reviewed-on: https://review.whamcloud.com/30313
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10248 mdd: set PFID for swap and merge layout

The PFID of the OST objects should be updated if the layout has
been moved to another file. So far, only swap layout and merge
layout have this requirement.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ic2fd7d7a88cd0ef4501d2f02baf58dd177e07973
Reviewed-on: https://review.whamcloud.com/30292
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8999 test: check chown result

In sanity-quota test_38, check whether the chown operation
succeeds. The patch also collects the output of quota users
if the test fails.

Change-Id: I6c04f9519f3f097af7126064380faf1bdc4fff6a
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/30243
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10257 clio: remove unused cl_lock layers

Remove the struct vvp_lock and struct lovsub_lock, omitting these
layers from cl_lock. Adjust cl_lock_enqueue() to allow for empty locks
(corresponding to unstriped files).

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ib1478d0cfb9604540ffc38eb9b01da4f4f96212a
Reviewed-on: https://review.whamcloud.com/30192
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9474 tests: rewrite copytool_setup to use stack_trap

Replace copytool_setup with its rewrite (copytool setup) as much
as possible. There are some interactions between copytool_setup()
and other test functions (mainly import_file() and its derivatives).
Later patch(es) will handle those interactions and completely
remove copytool_setup() from sanity-hsm.

Among the benefits of using the new copytool() function:
- it has a nice command line interface
- it uses stack_trap() to register clean up actions
- it makes it easier to run sanity-hsm with other copytools

Test-Parameters: trivial clientcount=3 mdscount=2 testlist=sanity-hsm,sanity-hsm
Change-Id: I81635da39f19d001e0f5c6348b233b45c4298fd0
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-on: https://review.whamcloud.com/30097
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10227 ptlrpc: simplify struct ptlrpc_request_set

Remove obd_statfs_rqset(), replacing its only use with
obd_statfs(). Collapse lov_statfs_async() and lov_statfs() into a
single function, removing the need for lov_statfs_interpret().

Remove the then unused set_wakeup_ptr, set_cblist, set_interpret, and
set_arg members of struct ptlrpc_request_set. Remove struct
ptlrpc_set_cbdata and ptlrpc_set_add_cb(). On x86_64 this reduces the
size of struct ptlrpc_request_set from 152 bytes to 112.
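The size reduction can be checked with a little arithmetic on an LP64 target such as x86_64: a list_head is two pointers (16 bytes) and each of the other three removed members is one pointer (8 bytes), so the removed members total 40 bytes, and 152 - 40 = 112. The member types below are assumptions modelled on the names in this message, not copied from the Lustre headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of the kernel's struct list_head: two pointers. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* The members removed from struct ptlrpc_request_set (types assumed);
 * sizes in the comments are for LP64. */
struct removed_set_members {
	int *set_wakeup_ptr;				/* 8 bytes */
	struct sketch_list_head set_cblist;		/* 16 bytes */
	int (*set_interpret)(void *set, void *arg, int rc); /* 8 bytes */
	void *set_arg;					/* 8 bytes */
};

/* Sizes quoted in the commit message for x86_64. */
enum { OLD_SET_SIZE = 152, NEW_SET_SIZE = 112 };

static const size_t removed_bytes = sizeof(struct removed_set_members);
```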

Add sanity test_118n() to ensure that OST_STATFS requests are still
async.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Iec8aa378157367f03de96f82d67158b281ec374c
Reviewed-on: https://review.whamcloud.com/30060
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9774 mdt: mdt_dump_lmm(), don't panic

Skip compound layouts, as mdt_dump_lmm() doesn't understand them yet.

Change-Id: I48d447f81d58466f473de871bc59d011d8b6f7ba
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/29188
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-10190 osd-zfs: create agent object for remote object

In a DNE environment, an object and its name entry may reside
on different MDTs. In such a case, we create an agent object
on the MDT where the name entry resides. The agent object is
empty and indicates that the real object for this name entry
resides on another MDT. Without the agent object, the related
name entry would be skipped when performing an MDT-side
file-level backup/restore via ZPL with a userspace tool such
as 'tar'.

Test-Parameters: envdefinitions=ONLY=803 testlist=sanity mdtfilesystemtype=zfs ostfilesystemtype=zfs mdscount=2 mdtcount=4
Test-Parameters: envdefinitions=ONLY=34 testlist=sanity-lfsck mdtfilesystemtype=zfs ostfilesystemtype=zfs mdscount=2 mdtcount=4
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I7fb8c54ca774f3877bcd326bc10d8e99083ac90a
Reviewed-on: https://review.whamcloud.com/28855
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10189 osd: handle PFID EA in LMA properly

Originally, the issue was caused by old ldiskfs OST devices
with 256-byte inodes. Because the inode inline space was
very limited, we had to store the PFID EA inside the LMA
EA for stripe and PFL component information.

When such an old OST is restored via server-side file-level
backup, the composite LMA is carried over to the new OST
even if the new OST inode has enough inline space to hold
a separate PFID EA.

Furthermore, if the old OST is migrated from ldiskfs to ZFS,
the composite LMA will also be on the ZFS-based OST, although
the PFID EA could be stored independently on ZFS.

So the OSD logic, whether ldiskfs or ZFS, needs to
understand the composite LMA EA and handle it properly.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I2b66787e725e13da7984f1bc2df45760dfbe4c4d
Reviewed-on: https://review.whamcloud.com/29696
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10187 osd-zfs: repair FID-in-dirent for ZFS backend

If the ZFS backend is restored from an MDT file-level backup, the
FID-in-dirent will be lost. The patch can regenerate/repair the
lost or corrupted FID-in-dirent during namespace LFSCK.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I747cdb75adc93971a7450f9e6b5e99598c79f656
Reviewed-on: https://review.whamcloud.com/28609
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10186 osd-zfs: move LAST_ID OI mapping out of oi.xx

Originally, the OI mapping for LAST_ID was stored in the
oi.xx file, which may cause some issues:

1) If the oi.xx file is removed, the last ID for the related
   sequence cannot be known until the OI scrub has scanned the
   whole device. That makes the system fail to start, because
   the OI scrub will be stopped by force.

2) It is incompatible with the ldiskfs backend, which causes
   trouble after migrating from an ldiskfs-based device.

This patch moves the LAST_ID OI mapping from oi.xx file to
/O/<seq>/LAST_ID as ldiskfs backend does.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ibfef1d9ef5be72f6e04ff79f92302483f7fe8a15
Reviewed-on: https://review.whamcloud.com/28679
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>

LU-10188 osd-zfs: handle non 4K aligned block size

After a restore from a server-side file-level backup, files
created via ZPL may use a block size that is not 4KB-aligned.
When the device is mounted as Lustre, osd-zfs adjusts the block
size of client-visible OST objects to be at least 4KB-aligned.
For objects whose block size cannot be reset, the OSD logic
needs to handle the non-aligned case properly. Otherwise, the
Lustre I/O handler may cause an osd-zfs or ZFS panic in
osd_bufs_get_read().

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ic07aeec3fc774508cedd6d24cca54b76171d143b
Reviewed-on: https://review.whamcloud.com/29241
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8602 libcfs: call proper crypto algo when keys are passed in

In most cases keys are not passed to cfs_crypto_hash_alloc()
but if they are then crypto_ahash_setkey() will fail. Keys
are only handled by the hmac version of the algorithm requested.
If a key is passed into cfs_crypto_hash_alloc() then we should
request the hmac version of the algorithm when calling
crypto_alloc_ahash().
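The name selection can be sketched as follows. hash_alg_name() is a hypothetical helper; the only convention taken from the kernel crypto API is that the keyed variant of an algorithm is requested by wrapping its name in the "hmac(...)" template, for example "hmac(sha256)".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the algorithm name to request into buf: "hmac(<alg>)" when a
 * key will be set, the bare name otherwise. Returns 0 on success or
 * -1 if buf is too small. */
static int hash_alg_name(char *buf, size_t len, const char *alg,
			 size_t key_len)
{
	int n;

	if (key_len > 0)
		n = snprintf(buf, len, "hmac(%s)", alg);
	else
		n = snprintf(buf, len, "%s", alg);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string is what would be passed to crypto_alloc_ahash(), so that a later crypto_ahash_setkey() call has a keyed transform to work with.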

Change-Id: I080d89bc864b236524ef11f50df41b750ecab9fe
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/25199
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5020 llite: don't reconnect MGC if connecting

1. In mgc_set_info_async(KEY=KEY_INIT_RECOV_BACKUP "init_recov_bk"),
   the MGC import should be reconnected only if its state is
   LUSTRE_IMP_DISCON.

2. In mgc_target_register(), if the target will regenerate the config,
   use a longer delay limit when waiting for the MGC to connect to
   the MGS, because the target (server) will fail to exit if the
   request expires due to the delay limit.

3. In the case of parallel mounts, the asynchronous cleanup of the OSS
   affects the following mount, since the OSS can't be set up again,
   so there should be a barrier to synchronize with the OSS cleanup.

Change-Id: I805b84cf12100ec2cc68f95bb65a9c396e0fbc1b
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/10229
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-618 llite: IO accounting of page read

When CONFIG_TASK_IO_ACCOUNTING is used with Lustre, writes are
accounted for but reads are not.

The accounting is normally done in the kernel for page writeback
and readahead functionality. Therefore, since Lustre implements its
own readahead, it must also maintain its own accounting for reads
(but not for writes).

Change-Id: I19f330be65324a8da002f9d61cb9262345ecb012
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/1636
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-10268 lfsck: postpone lfsck start until initialized

Sometimes the LFSCK start request may come (from a remote server)
before the local target is initialized. If we start LFSCK right away
on the current server, the LFSCK engine may access a NULL pointer,
for example when looking up a FID with a NULL 'ss_server_fld'.

To avoid such trouble, the LFSCK start logic will return -EINPROGRESS
to the request sponsor. It is the sponsor's duty to retry the start
request some time later.
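The sponsor side of this contract can be sketched as a bounded retry loop. lfsck_start_remote() and struct fake_target are hypothetical stand-ins for the remote start RPC, and the retry bound is an assumption; a real sponsor would also wait between attempts.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical remote target: answers -EINPROGRESS for its first
 * `ready_after` start requests, then accepts. */
struct fake_target {
	int calls;
	int ready_after;
};

static int lfsck_start_remote(struct fake_target *t)
{
	t->calls++;
	return t->calls > t->ready_after ? 0 : -EINPROGRESS;
}

/* Retry while the target reports that it is still initializing,
 * giving up after max_tries attempts (sleeping omitted). */
static int start_with_retry(struct fake_target *t, int max_tries)
{
	int rc = -EINPROGRESS;

	for (int i = 0; i < max_tries && rc == -EINPROGRESS; i++)
		rc = lfsck_start_remote(t);
	return rc;
}
```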

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: If7bc44e025b5f3c4f977b3a35e3784ada548a2df
Reviewed-on: https://review.whamcloud.com/30259
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10269 ldlm: fix the issues introduced by try bits

Optional try_bits are granted only if the lock is not blocked
and can be granted immediately; otherwise the try bits are
cleared. Therefore no lock in either the granted or the waiting
queue carries try bits.
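The rule can be sketched on plain bitmasks. struct lock_req and enqueue() are hypothetical and ignore lock modes and conflicts; `blocked` stands for the result of the real compatibility check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock request: required bits plus optional try bits. */
struct lock_req {
	uint64_t bits;
	uint64_t try_bits;
	bool granted;
};

/* Grant the try bits only together with an immediate grant; in every
 * case clear them, so neither queue holds a lock with try bits. */
static void enqueue(struct lock_req *req, bool blocked)
{
	if (!blocked) {
		req->bits |= req->try_bits;
		req->granted = true;
	} else {
		req->granted = false;	/* goes to the waiting queue */
	}
	req->try_bits = 0;
}
```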

Also, mdt_object_lock_try() is changed not to return a negative
value in the try-lock-only case.

Test-Parameters: mdscount=1 mdtcount=1 testlist=racer,racer,racer
Signed-off-by: Mikhal Pershin <mike.pershin@intel.com>
Change-Id: If3f5734355511ad1af71d5996026d04c60f3e8c1
Reviewed-on: https://review.whamcloud.com/30246
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9019 osc: migrate to time64_t

Change od_contention_time from int to time64_t to make it clear
this field is in units of seconds. With this change, move the other
*_contention_time fields from jiffies to time64_t to avoid the
overhead of switching between the two different time formats.
Change ops_submit_time also to time64_t since using jiffies
doesn't gain us anything and having it in time64_t makes it clear
this is code related to time.

Change-Id: I97d3685ac61781e6b1dc8634bc105bb0ffe76aa0
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30063
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8626 hsm: count the number of started requests of each type

Add counters in the coordinator to keep track of how many started hsm
requests of each type (ARCHIVE, RESTORE, REMOVE) the coordinator is
managing. There is no counter for CANCEL requests as the coordinator
does not keep track of them like it does for the other requests.
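Per-type counters of this kind can be sketched as an array indexed by action type. The enum and function names are hypothetical and deliberately not the Lustre identifiers; a real coordinator updates such counters concurrently, so they would need to be atomic or lock-protected, which is omitted here.

```c
#include <assert.h>

/* Hypothetical action types; CANCEL is last so it can size the array. */
enum act_type { ACT_ARCHIVE, ACT_RESTORE, ACT_REMOVE, ACT_CANCEL };

struct cdt_counters {
	int started[ACT_CANCEL];	/* ARCHIVE, RESTORE, REMOVE only */
};

static void cdt_request_started(struct cdt_counters *c, enum act_type a)
{
	if (a != ACT_CANCEL)	/* cancels are not tracked */
		c->started[a]++;
}

static void cdt_request_finished(struct cdt_counters *c, enum act_type a)
{
	if (a != ACT_CANCEL)
		c->started[a]--;
}
```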

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I0c1e9881f2f6a4f005dfc3545b1b51714eb91b7b
Reviewed-on: https://review.whamcloud.com/28677
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5216 hsm: cancel hsm actions running on CT when killed

This patch handles:

1. An unexpected eviction of a client (data mover node) could
cause ongoing HSM requests to be stuck in the "STARTED"
state, because the copytool running on the data mover node
is no longer available and the requests cannot be
finished. This patch unregisters the copytool and
cancels all the requests on the copytool's agent.

2. The way to stop a copytool is to kill it. In this
case too, as explained above, ongoing HSM operations
get stuck in the "STARTED" state, and if another copytool
tries to perform any HSM activity on that file, it will not be
processed because the HSM status of the file is still STARTED.

In such cases, i.e. if the copytool's agent is killed, we
add an HSM cancel request to the agent record and mark the
request state as ARS_CANCELED. The mdt_cordinator() thread
running in the background checks the request state and
marks it as CANCELED. This keeps the HSM status in the
proper state and allows any further operation to proceed
as expected.

Add test_62 to sanity-hsm.sh to verify the fix.

Test-Parameters: trivial testlist=sanity-hsm
Seagate-bug-id: MRP-2464
Signed-off-by: vinayakswami hariharmath <vinayakswami.hariharmath@seagate.com>
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Change-Id: Iceb1a3450bcbcec287fe11c6a9fce45fc6097e3c
Reviewed-on: https://review.whamcloud.com/24238
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6349 idl: remove obsolete RPC MSG flags

Add comments for MSG_* flags, and delete obsolete flags
MSG_LAST_REPLAY, MSG_AT_SUPPORT, MSG_DELAY_REPLAY, MSG_VERSION_REPLAY.

Remove now-unused imp_no_lock_replay field.

Add comments for MSG_CONNECT_* flags, and delete obsolete flag
MSG_CONNECT_ASYNC.

Delete them from wirecheck files.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I4f6ba93151246d173d796a6625a683cf65f767f7
Reviewed-on: https://review.whamcloud.com/17831
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Steve Guminski <stephenx.guminski@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6349 idl: clean up and document ptlrpc structures

Miscellaneous cleanups for wire-related structures found during the
Lustre protocol documentation project.

Remove the OBD_QC_CALLBACK RPC, obsolete and unused since 2.3.

Move definition of constants used by lustre_msg_v2 and ptlrpc_body
fields to immediately before their respective structures.

Add comments for struct members for lustre_msg_v2, ptlrpc_body_v3,
ldlm_request, and obd_statfs.

Rename RQF_MDS_INTENT_CLOSE to RQF_MDS_CLOSE_INTENT to make it more
easily found with RQF_MDS_CLOSE. Rename mdt_intent_close_client()
and mdc_intent_close_pack() to mdt_close_intent_client() and
mdc_close_intent_pack() to match.

Rename RQF_LDLM_GL_DESC_CALLBACK to RQF_LDLM_GL_CALLBACK_DESC to make
it match RQF_LDLM_FL_CALLBACK and RQF_LDLM_BL_CALLBACK.

Remove unused MSG_OP_FLAG_MASK, MSG_OP_FLAG_SHIFT, MSG_GEN_FLAG_MASK.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I21be2ec0aacef9965ff9c835174b5b017b3ebbe5
Reviewed-on: https://review.whamcloud.com/17830
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Reviewed-by: Steve Guminski <stephenx.guminski@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10134 lfsck: not add requests if engine out of work

There is a race condition between the LFSCK assistant engine
and the LFSCK request generators: before the LFSCK assistant
engine exits, it marks itself as 'stopping', then cleans up the
in-queue requests, and then marks itself as 'stopped'. The
'stopping' status is expected to prevent generators from adding
more LFSCK requests, but the current implementation only checks
whether the engine is 'stopped'. So if the LFSCK engine thread
exits before the whole system has been scanned (because of some
failure or on demand), more LFSCK requests may be added to the
queue after the cleanup.

The patch fixes the wrong logic by checking whether the engine
is 'running', and stops adding LFSCK requests if it is not.
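The difference between the two checks can be shown with a state enum. The names are hypothetical; in real code the check and the enqueue also have to be serialized against the state change (for example under the lock protecting the request queue), which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical assistant-engine states. */
enum engine_state { ENG_RUNNING, ENG_STOPPING, ENG_STOPPED };

/* Old check: refuses only once fully stopped, so requests can still be
 * queued while 'stopping', i.e. after the queue has been cleaned up. */
static bool can_queue_old(enum engine_state s)
{
	return s != ENG_STOPPED;
}

/* Fixed check: only a running engine accepts new requests. */
static bool can_queue_new(enum engine_state s)
{
	return s == ENG_RUNNING;
}
```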

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ic2b5ca3f5e80b5be5a5c60aa24f0b54682b717d9
Reviewed-on: https://review.whamcloud.com/30165
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9983 osp: align the OSP request size by 4k

In osp_object_update_request_create() round up the size of the object
update request to a multiple of the PAGE_SIZE to avoid issues with IB
HW that cannot handle gaps in memory regions.
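Rounding up to a multiple of the page size is the usual power-of-two alignment arithmetic, the same as the kernel's round_up() helper. round_up_to_page() and the fixed 4KB page size are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, a power of two */

/* Round size up to the next multiple of the page size. */
static size_t round_up_to_page(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```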

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Id5c16b031dcb16d764c4e4f325f51b9ecf454533
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-on: https://review.whamcloud.com/29270
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7585 osd-ldiskfs: auto scrub control

Originally, there was an lproc interface in the OSD, named
"auto_scrub", used to control automatic detection of
inconsistent OI mappings and triggering of OI scrub. But that
switch is too simple: it is either 'on' or 'off', which is not
convenient for real system control. For example, if we have
just finished one cycle of OI scrub, should we auto-detect OI
inconsistency in subsequent lookup() calls? If yes, it causes
some unnecessary overhead. If no, then once an OI scrub has
run, we have no further chance to auto-detect corrupted OI
mappings.

To resolve this, the patch enhances the lproc interface
"auto_scrub" to allow the system administrator to specify
how long (in seconds) the system can be trusted after the
latest OI scrub. During that trusted interval, inconsistent
OI mappings will not be auto-detected. The default value is
one month (60 * 60 * 24 * 30 seconds).
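The trusted-interval check can be sketched as a comparison of ages. should_auto_detect() is hypothetical, and whether the boundary itself counts as trusted is an assumption; the default works out to 60 * 60 * 24 * 30 = 2592000 seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default trust window: one month. */
#define SKETCH_AUTO_SCRUB_DEFAULT ((int64_t)60 * 60 * 24 * 30)

/* Auto-detect inconsistent OI mappings only once the trusted interval
 * since the last completed OI scrub has fully elapsed. */
static bool should_auto_detect(int64_t now, int64_t last_scrub_done,
			       int64_t trusted_interval)
{
	return now - last_scrub_done > trusted_interval;
}
```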

The patch also replaces cfs_time_current_sec() with
ktime_get_real_seconds() for some cleanup work.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Iae2c7dd1da92c27d40357c62cd94e886228c86f7
Reviewed-on: https://review.whamcloud.com/29710
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7585 scrub: general framework for OI scrub

Restructure the OI scrub code so that some data structures and
functions can be shared between osd-ldiskfs and osd-ZFS.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I4c4e3fcbd11078d10592136e9e55de66516a2b16
Reviewed-on: https://review.whamcloud.com/28607
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>

LU-10129 lnd: set device capabilities

MLX4, MLX5 and OPA support different capabilities. Query the
device and cache its capabilities for future use.

MLX5 supports fast registration and gaps.
MLX4 and OPA only support FMR.
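Caching the result as a bit mask lets later code test a capability without querying the device again. In the sketch below the capability names, the device-type enum, and the mapping function are all hypothetical; the real LND derives these bits from the device attributes rather than from a device type.

```c
#include <stdbool.h>

/* Hypothetical capability bits; the real ko2iblnd names may differ. */
enum dev_caps {
	DEV_CAP_FMR          = 1 << 0, /* fast memory registration (FMR) */
	DEV_CAP_FASTREG      = 1 << 1, /* fast registration */
	DEV_CAP_FASTREG_GAPS = 1 << 2, /* registration of regions with gaps */
};

enum dev_type { DEV_MLX4, DEV_MLX5, DEV_OPA };

/* Map each device type to the capabilities listed in the commit
 * message; the result would be cached once per device. */
static unsigned int query_dev_caps(enum dev_type type)
{
	switch (type) {
	case DEV_MLX5:
		return DEV_CAP_FASTREG | DEV_CAP_FASTREG_GAPS;
	case DEV_MLX4:
	case DEV_OPA:
	default:
		return DEV_CAP_FMR;
	}
}

/* Check a cached capability mask for one capability. */
static bool dev_supports(unsigned int caps, enum dev_caps cap)
{
	return (caps & cap) != 0;
}
```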

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I70d468f8af52d263139e7b51341bf4b5150b89c1
Reviewed-on: https://review.whamcloud.com/30309
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7340 mdd: changelogs garbage collection

When the changelogs are almost full (few free entries left in
the catalog), try to recover some space by unregistering users
that have been idle for too long. Idleness is determined either
from the time of the user's last changelog cancel request,
tracked in a new Changelog User record field (which reuses the
previously unused cur_padding field), or, for older registered
users, from the gap between the user's index and the current
ChangeLog record index.
sanity/test_160[f,g] have been added to verify the feature.
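The idle-user selection can be sketched as follows, reading the message as "use the cancel timestamp when present, otherwise fall back to the record gap". The struct, the function name, and the two threshold parameters are hypothetical; the real tunables and their defaults are not given in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a registered changelog user. */
struct cl_user {
	uint64_t last_rec;    /* index of the last record this user cleared */
	int64_t  last_cancel; /* time of the last cancel request, 0 for older users */
};

/* Decide whether a user is a garbage-collection candidate once the
 * changelog catalog is nearly full. Assumes cur_rec >= last_rec. */
static bool cl_user_is_idle(const struct cl_user *u, int64_t now,
			    uint64_t cur_rec, int64_t max_idle_secs,
			    uint64_t max_idle_recs)
{
	/* Users with a recorded cancel time: compare against the time limit. */
	if (u->last_cancel != 0)
		return now - u->last_cancel > max_idle_secs;
	/* Older users without a timestamp: fall back to the record gap. */
	return cur_rec - u->last_rec > max_idle_recs;
}
```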

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I2100b101772e6d027675e5efa5606d4be24342a0
Reviewed-on: https://review.whamcloud.com/27103
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-10344 test: create files on local node in sanity-hsm

In sanity-hsm, rewrite create_file() to call dd ... directly rather
than do_facet $SINGLEAGT dd ... Rewrite create_archive_file() to call
do_facet $SINGLEAGT ... directly instead of calling
create_file(). Move the Lustre parent directory creation from
create_archive_file() to import_file().

Test-Parameters: trivial testlist=sanity-hsm

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I923475c32fa69b0ac59b95b90b42c147cf361274
Reviewed-on: https://review.whamcloud.com/30433
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>

New tag 2.10.56

Change-Id: I128e691f84fed31ea3b9332be6e254252c6de174
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10296 obdclass: no errors on missing o_cleanup

It is legal to have o_cleanup undefined, so do not report an
error when it is missing.
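Treating a missing optional method as a no-op looks roughly like this. The operations table and wrapper below are trimmed-down, hypothetical stand-ins for the real obd_ops handling.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down operations table; only the optional
 * cleanup hook matters for this sketch. */
struct obd_ops_sketch {
	int (*o_cleanup)(void *obd);
};

/* An undefined o_cleanup is legal, so treat it as "nothing to do"
 * and return success instead of an error. */
static int obd_cleanup_sketch(const struct obd_ops_sketch *ops, void *obd)
{
	if (ops->o_cleanup == NULL)
		return 0;
	return ops->o_cleanup(obd);
}
```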

Change-Id: Ifd80dbd36fabcbfbd9a1609746d79a37e5a01023
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/30314
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-10297 lod: prepare inuse array always

The OST inuse array should be prepared for composite files and for
plain files as well, since creating OST objects in both cases needs
to use it.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ibb37b680c68f8883650cdee6bebebc1c4d844623
Reviewed-on: https://review.whamcloud.com/30334
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-930 man: better describe conf_param -d in lctl.8

Add a bit more description to "lctl conf_param -d".

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ib7a71122ea3e77e910d817a0ecdc4e6d1ddfd2eb
Reviewed-on: https://review.whamcloud.com/30241
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4761 tests: clean up spurious messages and files

Reduce spurious messages that appear in the test and console logs
that are not useful for debugging, and just clutter things up.

Clean up a few files left over from tests.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ie17888a8f9c5864530dcb455a0cd4e51f83ebbe5
Reviewed-on: https://review.whamcloud.com/29012
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4134 obdclass: fix double free in failure path

If registration fails, we should only drop the obd reference.
Since class_export_put() will invoke class_free_dev(), also
freeing the device explicitly may result in a double free.
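The fix follows the usual reference-counting rule: once the last put frees the object, the error path must not free it again. The sketch below is a hypothetical stand-in, not the obdclass code; dev_put() plays the role that class_export_put() leading to class_free_dev() plays in the message, and dev_free_count exists only to make the single free observable.

```c
#include <stdlib.h>

/* Hypothetical reference-counted device. */
struct dev_sketch {
	int refcount;
};

/* Counts frees so the example can show the device is freed once. */
static int dev_free_count;

/* Dropping the last reference frees the device. */
static void dev_put(struct dev_sketch *dev)
{
	if (--dev->refcount == 0) {
		free(dev);
		dev_free_count++;
	}
}

/* Failure path after a failed registration: only drop the reference.
 * Calling free(dev) here as well would be a double free. */
static int dev_register_failed(struct dev_sketch *dev, int rc)
{
	dev_put(dev);
	return rc;
}
```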

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: Ia8d1e487c69c4de1c7c247158cc8615aa6b6093a
Reviewed-on: https://review.whamcloud.com/29967
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-9727 lustre: add uid/gid to Changelogs entries

Add a new changelog extension named changelog_ext_uidgid to hold
uid/gid information.
uid/gid info is added to every Changelog entry type except MARK, in
the form 'u=<uid>:<gid>':
5 01CREAT 15:44:32.385864793 2017.07.18 0x0 t=[0x200000402:0x3:0x0]
ef=0x1 u=500:500 p=[0x200000402:0x2:0x0] file1
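Rendering the uid/gid field in the 'u=<uid>:<gid>' form shown in the sample record is a single formatted print. The helper below is hypothetical and only reproduces the textual form and the MARK exception described above; it is not the code that builds or prints the changelog_ext_uidgid extension.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: produce the 'u=<uid>:<gid>' text for one record.
 * MARK records carry no uid/gid, so nothing is emitted for them.
 * Returns the number of characters written (as snprintf does), or 0
 * for MARK records. */
static int format_uidgid(char *buf, size_t len, bool is_mark,
			 unsigned int uid, unsigned int gid)
{
	if (is_mark) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "u=%u:%u", uid, gid);
}
```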

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie09e4cd146dea75985de00b1da58f75c2a5928f2
Reviewed-on: https://review.whamcloud.com/28114
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>

LU-10224 obd: free obd_svc_stats when all users are gone

During object device shutdown, obd_svc_stats must only be freed
once it can no longer be accessed from userspace, to prevent a
race and a subsequent crash.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Iea4f5b0486779c1721c90f32538af1a723f76a79
Reviewed-on: https://review.whamcloud.com/30249
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5761 tests: re-add test_89 to replay-single ALWAYS_EXCEPT

Re-add replay-single test_89 to the ALWAYS_EXCEPT list, as it
is still failing intermittently.

Test-Parameters: trivial testlist=replay-single
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ia98e28a2c4b863fd48cf0cd5d5ca9fe9cbafe9f3
Reviewed-on: https://review.whamcloud.com/30252
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

Merge "LU-9771 flr: Merge branch 'flr'"

LU-9897 utils: make liblnetconfig a hard requirement

With the recent changes to LNet, lnetctl is now a hard
requirement. This patch makes Lustre require the libyaml
libraries in order to build. In turn, liblnetconfig and lnetctl
are now always built.

Test-Parameters: trivial

Change-Id: I26ff9397f3d5ba11da5ab4e76658ffd8c27ed035
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30204
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10256 hsm: enable setting archive_id in hsm_set

Setting HSM flags in lfs_hsm_change_flags(...,LFS_HSM_SET)
does not allow specifying the archive_id; that is,
llapi_hsm_state_set(path, mask, 0, 0 /* archive_id */)
always passes archive_id = 0, which means no identifier change.
For full flexibility (e.g. for debugging), introduce the
additional option --archive-id to hsm_set. If the option is not
provided, the default behavior (archive_id = 0) is used and the
archive identifier is not changed. In addition, a test case is
provided to check the functionality, and the shell function
get_hsm_archive_id() is modified to use a more robust grep for
the contents following a pattern.
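The archive_id convention described above (0 means "keep the current identifier") reduces to a one-line rule. The function below is a hypothetical illustration of that rule only, not the llapi or MDT implementation.

```c
#include <stdint.h>

/* Hypothetical helper: archive_id == 0 leaves the file's archive
 * identifier unchanged; any other value replaces it. */
static uint32_t apply_archive_id(uint32_t current_id, uint32_t requested_id)
{
	return requested_id == 0 ? current_id : requested_id;
}
```

This is why omitting --archive-id stays backward compatible: the tool keeps passing 0, which the rule maps to "no change".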

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Thomas Stibor <t.stibor@gsi.de>
Change-Id: I2145a18ecf32479527bb045140e5e881e58dd115
Reviewed-on: https://review.whamcloud.com/30150
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8497 osp: handle remote -ENOMEM from osp_send_update_req()

In osp_send_update_req() detect an unsent request by checking
rq_queued_time == 0 rather than rq_set == NULL, which is always true
after returning from ptlrpc_queue_wait().
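The distinction can be sketched with a hypothetical, trimmed-down request structure. The field names mirror the ones in the message, but the types and the helper are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical subset of a ptlrpc request. */
struct req_sketch {
	time_t rq_queued_time; /* set when the request is queued for sending */
	void  *rq_set;         /* NULL after ptlrpc_queue_wait() returns */
};

/* rq_set == NULL holds for sent and unsent requests alike once
 * ptlrpc_queue_wait() has returned, so it cannot tell them apart.
 * rq_queued_time stays 0 only if the request was never queued. */
static bool req_was_unsent(const struct req_sketch *req)
{
	return req->rq_queued_time == 0;
}
```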

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ief959b71600157a9c3521775cc06994326e50c51
Reviewed-on: https://review.whamcloud.com/30083
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8402 llite: simplify ll_inode_revalidate()

ll_inode_revalidate() is only called by ll_getattr() so move the
contents of ll_inode_revalidate() to ll_getattr() and rename
__ll_inode_revalidate() to ll_inode_revalidate().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ieeba1b74551b7a531e1db47ccb0638c72c90adb6
Reviewed-on: https://review.whamcloud.com/30036
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8402 llite: assume OBD_CONNECT_ATTRFID

OBD_CONNECT_ATTRFID has been supported by MDSs since before 1.6 so add
it to the client side required flags and remove some code to handle
servers that do not support it.
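Making a connect flag "required" amounts to a mask check during connection setup. In the sketch below the bit values are illustrative only (the real OBD_CONNECT_* values are defined in the Lustre headers), and connect_flags_ok() is a hypothetical helper. IBITS is included because the companion change below treats it as always supported as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only, not the real OBD_CONNECT_* values. */
#define SKETCH_CONNECT_IBITS    (1ULL << 0)
#define SKETCH_CONNECT_ATTRFID  (1ULL << 1)

/* Flags the client requires every MDS to support. */
#define SKETCH_CLIENT_REQUIRED  (SKETCH_CONNECT_IBITS | SKETCH_CONNECT_ATTRFID)

/* Accept a server only if every required bit is present, so no
 * fallback code for servers lacking ATTRFID is needed. */
static bool connect_flags_ok(uint64_t server_flags)
{
	return (server_flags & SKETCH_CLIENT_REQUIRED) == SKETCH_CLIENT_REQUIRED;
}
```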

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I841ae21811c0903c10a84e659d2d1f8255c8109f
Reviewed-on: https://review.whamcloud.com/30010
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-8402 ldlm: assume OBD_CONNECT_IBITS

Clients and MDSs have supported and required OBD_CONNECT_IBITS since
before 1.6 so remove obsolete code to handle clients that do not
support this flag.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I24a623b063c518755b81d02fc48fa0f76aacd318
Reviewed-on: https://review.whamcloud.com/30009
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10165 llite: disable statahead if starting statahead fails

If starting the statahead thread fails, statahead should be
disabled. The current code only does this when "sai != NULL";
instead it should check whether the current process is the one
opening this directory, so that in cases such as the current
file not being the first dirent, or sai allocation failing,
statahead is not retried.
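The corrected condition can be sketched with a hypothetical per-directory state. Here "the process opening the directory" is modeled as a stored pid; that representation, and the struct and function names, are assumptions made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical per-directory statahead state. */
struct sa_dir_sketch {
	int  opener_pid; /* process that opened the directory, 0 if none */
	bool sa_enabled;
};

/* On failure to start the statahead thread, disable statahead whenever
 * the current process is the directory opener, regardless of whether
 * the statahead info (sai) was allocated. */
static void sa_start_failed(struct sa_dir_sketch *d, int cur_pid)
{
	if (d->opener_pid == cur_pid)
		d->sa_enabled = false;
}
```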

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Iaedddd3659cdffeab51800f45b02f0b39c4a1ec1
Reviewed-on: https://review.whamcloud.com/29817
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5170 lfs: Standardize error messages in macros

Error messages in ARG2INT() and ARG2ULL() are updated to a standard
format. Messages are prefixed with the name of the utility and the
command that caused the error. User-provided values are delimited
with single quotes.
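A parser in the style of ARG2INT() that emits this kind of message might look like the sketch below. The function name and the exact wording "bad ... value" are assumptions; only the structure (utility name, command name, quoted user value) comes from the message above.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical ARG2INT-style parser. On error it prints
 * "<utility> <command>: bad <name> value '<arg>'". */
static int arg_to_int(const char *progname, const char *cmd,
		      const char *name, const char *arg, int *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 0);
	if (errno != 0 || end == arg || *end != '\0' ||
	    val < INT_MIN || val > INT_MAX) {
		fprintf(stderr, "%s %s: bad %s value '%s'\n",
			progname, cmd, name, arg);
		return -EINVAL;
	}
	*out = (int)val;
	return 0;
}
```

For instance, arg_to_int("lfs", "setstripe", "stripe count", "4x", &v) would print `lfs setstripe: bad stripe count value '4x'` and return -EINVAL.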

Test-Parameters: trivial
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I31a0a30ac15681826e7e25b8a44d56174fb23e08
Reviewed-on: https://review.whamcloud.com/28250
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>

LU-9727 lustre: Add an additional set of 64 changelog flags.

This adds a new changelog extension containing 64 additional flag
bits, to be used for future changelog extensions.
The presence of the extension is signalled using the last remaining
unused changelog flag bit.
The new extension is present in all changelog records by default, but
will be removed from records read by legacy changelog consumers.
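The presence-flag plus strip-for-legacy pattern can be sketched on a record header alone. The flag bit, field widths, and names below are illustrative assumptions, not the real changelog layout. The real code also has to remove the extension bytes from the variable-length record, which this header-only sketch does not model.

```c
#include <stdint.h>

/* Illustrative bit value; the real flag name and bit are defined in
 * the Lustre changelog headers. */
#define SKETCH_CLF_EXTRA_FLAGS (1U << 15)

/* Hypothetical record header: base flags plus an optional 64-bit
 * extension that is only meaningful when the flag bit is set. */
struct cl_rec_sketch {
	uint16_t flags;
	uint64_t extra_flags;
};

/* Legacy consumers do not know the extension: clear the flag that
 * signals its presence and drop the extension contents. */
static void strip_extra_flags(struct cl_rec_sketch *rec)
{
	rec->flags &= (uint16_t)~SKETCH_CLF_EXTRA_FLAGS;
	rec->extra_flags = 0;
}
```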

Signed-off-by: Matthew Sanderson <matthew.sanderson@anu.edu.au>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iad05880e82400d7e84d927e84e44d1454e240a80
Reviewed-on: https://review.whamcloud.com/28045
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10271 hsm: don't release with wrong size

The result of the ll_merge_attr() operation was ignored during
ll_hsm_release(). In that case the released file could have the
wrong size on the MDS, for example:
File: ‘released.file’
Size: 0 Blocks: 0 IO Block: 4194304 regular empty file

This patch adds sanity-hsm test_253 to check the HSM release
operation when cl_object_attr_get() fails, which previously
produced the wrong size-on-MDS for a released file.
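The shape of the fix is "check the merge result before releasing". The struct and function below are hypothetical stand-ins for the attribute merge and release steps named above, reduced to the control flow.

```c
#include <errno.h>

/* Hypothetical stand-in for the merged attributes. */
struct attr_sketch {
	long long size;
};

/* merge_rc is the result of the attribute merge (ll_merge_attr() in
 * the message), e.g. 0 or -EIO. The size is only reported to the MDS,
 * and the release only proceeds, when the merge succeeded. */
static int hsm_release_sketch(int merge_rc, const struct attr_sketch *merged,
			      long long *size_for_mds)
{
	if (merge_rc != 0)
		return merge_rc; /* do not release with a possibly wrong size */
	*size_for_mds = merged->size;
	return 0;
}
```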

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I94df1889265e5b6d03b16745de93e52af95d5b7c
Reviewed-on: https://review.whamcloud.com/30240
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-10176 tests: limit DoM tests to new servers only

Run DoM tests only if the server supports DoM, by checking that
its version is at least 2.10.55.
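Comparing versions works by packing major, minor, patch, and fix numbers into one integer. The macro below assumes the same packing as Lustre's OBD_OCD_VERSION() (one byte per component, major in the top byte) and is redefined locally so the sketch is self-contained. The actual tests perform this check in shell, so the C helper is only an illustration of the arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror OBD_OCD_VERSION(): one byte per component. */
#define VERSION_CODE(maj, min, patch, fix) \
	(((uint32_t)(maj) << 24) | ((uint32_t)(min) << 16) | \
	 ((uint32_t)(patch) << 8) | (uint32_t)(fix))

/* Hypothetical helper: DoM tests need a server at 2.10.55 or later. */
static bool server_supports_dom(uint32_t server_version)
{
	return server_version >= VERSION_CODE(2, 10, 55, 0);
}
```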

Signed-off-by: Mikhal Pershin <mike.pershin@intel.com>
Change-Id: I79f26cd3c80fbfc4bf48ddc3fcc762248d491034
Reviewed-on: https://review.whamcloud.com/30085
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>

LU-6142 uapi: Get rid of lustre_fid typedef

Replace it with struct lu_fid. Update the userland code and man
pages to reflect this change.
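After this change, userland code names the type as struct lu_fid directly. The sketch below carries a local copy of the structure (assumption: field names as in the uapi header) so it compiles without the Lustre headers. It also includes a hypothetical helper that prints a FID in the bracketed form seen in changelog records.

```c
#include <stdint.h>
#include <stdio.h>

/* Local copy of the struct lu_fid layout for a self-contained sketch. */
struct lu_fid {
	uint64_t f_seq; /* sequence number */
	uint32_t f_oid; /* object id within the sequence */
	uint32_t f_ver; /* version */
};

/* Formerly declared with the removed lustre_fid typedef; now takes
 * struct lu_fid. Prints e.g. "[0x200000402:0x3:0x0]". */
static int format_fid(char *buf, size_t len, const struct lu_fid *fid)
{
	return snprintf(buf, len, "[0x%llx:0x%x:0x%x]",
			(unsigned long long)fid->f_seq,
			(unsigned int)fid->f_oid, (unsigned int)fid->f_ver);
}
```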

Linux-commit: d8f6bc9a53f97d1ea4b2b955672904338643308b

Test-Parameters: trivial

Change-Id: I0b7e0770dd9da9bdac55c02c2ec98aea7cea7100
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/29849
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>

LU-10220 mdd: fix buf alloc in mdd_changelog_data_store_by_fid

Fix the allocation of mti_big_buf by the call to
lu_buf_check_and_alloc() in mdd_changelog_data_store_by_fid():
reclen must take the header size of struct llog_changelog_rec
into account.

No memory corruption was seen before, probably because the
buffer size allocated by a previous call to
mdd_declare_changelog_store() already covered the need. But
audit will add more information to changelog records, which
would provoke memory corruption without this fix.
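The sizing rule can be sketched as "header plus payload". The header struct below is a hypothetical 16-byte stand-in for the llog record header, and both helpers are illustrative. The real record layout, including any tail, comes from the Lustre headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the llog record header (16 bytes here). */
struct llog_hdr_sketch {
	uint32_t lrh_len;
	uint32_t lrh_index;
	uint32_t lrh_type;
	uint32_t lrh_id;
};

/* Buffer size needed for one changelog record: the header must be
 * counted in addition to the changelog payload (extensions + name). */
static size_t changelog_reclen(size_t payload_len)
{
	return sizeof(struct llog_hdr_sketch) + payload_len;
}

/* Hypothetical check-and-grow decision: reallocate when too small. */
static int buf_needs_grow(size_t cur_size, size_t payload_len)
{
	return cur_size < changelog_reclen(payload_len);
}
```

Sizing the buffer from the payload alone would leave it short by exactly the header size, which is the under-allocation the message describes.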

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id0a06c412b54c0ae12c15d53f3e166e3e5d9ed68
Reviewed-on: https://review.whamcloud.com/30014
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>