git://git.whamcloud.com - fs/lustre-release.git/log

LU-6284 ptlrpc: comment for FLD_QUERY RPC reply swab

The 'fld_read_server' uses 'RMF_GENERIC_DATA' to hold the 'FLD_QUERY'
RPC reply, which is composed of 'struct lu_seq_range_array'. But no
swabber function is registered for 'RMF_GENERIC_DATA', so the RPC
peers need to handle the RPC reply in a fixed little-endian format.

In theory, we could define a new structure with a registered swabber
to handle the 'FLD_QUERY' RPC reply automatically. But in practice
that is not easy to do within the current 'struct req_msg_field'
framework, because the sequence range array in the RPC reply is not
fixed length: its length depends on the 'lu_seq_range' count, which
is unknown when the RPC buffer is prepared. Generally, for such
variable-length RPC usage, there is a field in the RPC layout to
indicate the data length. But for the 'FLD_READ' RPC, we have no way
to do that unless we add a new length field, which would break the
on-wire RPC protocol and cause interoperability trouble with old
peers.
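
As an illustration only (not part of this patch), handling a reply in a
fixed little-endian format can be sketched in Python. The record layout
(two 64-bit fields followed by two 32-bit fields) and the helper names
are assumptions made for this example, not the definition of
'struct lu_seq_range':

```python
import struct

# Hypothetical record layout for this sketch: two unsigned 64-bit fields
# followed by two unsigned 32-bit fields. "<" forces little-endian byte
# order with no padding, whatever the host byte order is.
RECORD = struct.Struct("<QQII")


def pack_ranges(ranges):
    """Encode a list of 4-tuples as a little-endian byte string."""
    return b"".join(RECORD.pack(*r) for r in ranges)


def unpack_ranges(buf):
    """Decode a little-endian byte string back into a list of 4-tuples.

    The record count is derived from the buffer length, because this
    sketch has no separate length field.
    """
    if len(buf) % RECORD.size:
        raise ValueError("buffer is not a whole number of records")
    return [RECORD.unpack_from(buf, off)
            for off in range(0, len(buf), RECORD.size)]
```

Because the format string starts with "<", both peers in this sketch
produce and read the same bytes regardless of host byte order.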

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I466a7e229e4ecbb062e6d0f8eea3c6f053ef5e75
Reviewed-on: http://review.whamcloud.com/22309
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>

LU-7898 osd: remove unnecessary declarations

Refactor the code a bit to remove unnecessary declarations
(which are very expensive in ZFS). The patch also introduces
initial preparations to support large dnodes - it tracks all
EAs declared at object creation, and the tracked number can
be used to request a dnode of appropriate size.

With this patch + LU-7918, the disk/memory space reserved for a
single-stripe creation goes down from ~33MB to 4.6MB.

Performance improvements from this patch are also significant.
Running mdtest create performance on a test node (ramdisk):

    Threads    0.6.5   0.6.5+patch
        1       9933       14279
        2      12870       20469
        4      16405       26407
        8      19320       28254
       16      15648       26620
       32      14107       26483

Change-Id: I2c25542e51a320b1b48b4782b5f0b43799de5fe9
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/19101
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/22296

LU-8524 tests: Awk command re-structured to pass correct value

The variable "selinux_policy" was assigned an incorrect value
because a couple of commands were structured incorrectly.
Both commands have been restructured so that the correct value
is passed to "selinux_policy".
The test was run successfully with this modified code, and the
results can be found in the comments section of the ticket.

Test-Parameters: trivial testlist=sanity-selinux,sanity-selinux,sanity-selinux
Signed-off-by: Saurabh Tandan <saurabh.tandan@intel.com>
Change-Id: I3c6c86d607edeadd03ab694435fb201c08c23654
Reviewed-on: http://review.whamcloud.com/22070
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8165 target: detect race by checking last_rcvd slot index

A race can occur on the server between a client connection and a
concurrent eviction, when the client's last_rcvd slot index has
still not been assigned (-1).
This patch adds a check to address this condition.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ifead82719a0dc9411f1b79d6c8c59eb9ef339fa5
Reviewed-on: http://review.whamcloud.com/20328
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Grégoire Pichon <gregoire.pichon@bull.net>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8081 osd-ldiskfs: improve transaction debug message

Print the actual credit limits that were exceeded when complaining
on the console about problems with transaction credit accounting.

Ensure all transaction credit messages include the device name.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I17125bb39ecaf699a722ac77bf29060cde3ebbe5
Reviewed-on: http://review.whamcloud.com/19865
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7650 o2iblnd: Put back work queue check previously removed

The previous patch, http://review.whamcloud.com/21304/, removed
a check needed until LU-5718 is properly addressed. With
the check, LU-5718 results in an error message and a lost
RDMA operation. Without it, we have memory corruption and
a crash (much harder to debug).

Putting the check back in case LU-5718 is not fixed soon.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I2efcc4e60a80794b38174da707d2a7fc27f81b6a
Reviewed-on: http://review.whamcloud.com/22281
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8410 ldiskfs: new FIEMAP API

With RH 6.5 the old API was deprecated and then removed.
Backport the new API from upstream ext4 instead of copy-pasting
the older, buggy code that FIEMAP uses in the current code.

The upstream kernel commit is 91dd8c114499e9818f2d5919ef0b9eee61810220
ext4: prevent race while walking extent tree for fiemap.

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Change-Id: I7790c9e1a9cbfbd2cc429292aa764250e0525e21
Reviewed-on: http://review.whamcloud.com/21603
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8348 recovery: don't send last_committed after panic

Do not update last_committed if we are not sure the
commit was successful.

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Seagate-bug-id: MRP-3013
Change-Id: I176b86a01cac46bd7d6af85843135a57a3df0e87
Reviewed-on: http://review.whamcloud.com/21060
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7782 scrub: handle slave obj of striped directory

When looking up an item under a striped directory, we need to
locate the master MDT-object of the striped directory first; then
the client sends a lookup (getattr_by_name) RPC to the MDT with
some slave MDT-object's FID and the item's name. If the system
is restored from an MDT file-level backup, then before the OI scrub
has completely rebuilt the OI files, the OI mappings of the master
MDT-object and slave MDT-object may be invalid. Usually, this is
not a problem for the master MDT-object, because when locating the
master MDT-object we do a name-based lookup (for the striped
directory itself) first, and during that process we can set up the
correct OI mapping for the master MDT-object. But it is a problem
for the slave MDT-object, because the client does not trigger a
name-based lookup on the MDT to locate the slave MDT-object before
locating an item under the striped directory. osd_fid_lookup()
then finds that the OI mapping for the slave MDT-object is invalid
and does not know what the right OI mapping is, so the MDT has to
return -EINPROGRESS to the client to notify it that the OI scrub
is rebuilding the OI file, that the related OI mapping is not yet
known, and that it should try again later. The client then retries
the RPC again and again until the related OI mapping has been
updated. That is quite inefficient.

To resolve this, we handle the following two cases:

1) The slave MDT-object and the master MDT-object are on different
   MDTs. This case is relatively easy. As a remote MDT-object,
   the slave MDT-object is linked under /REMOTE_PARENT_DIR
   with its FID string as the name. We can locate the slave
   MDT-object by looking it up in /REMOTE_PARENT_DIR directly.

2) The slave MDT-object and the master MDT-object reside on the
   same MDT. In this case, while looking up the master MDT-object,
   we look up the slave MDT-objects via readdir on the master
   MDT-object, because the slave MDT-objects are stored as
   sub-directories with the name "${FID}:${index}". When a local
   slave MDT-object is found, its OI mapping is recorded, so a
   subsequent osd_fid_lookup() will know the correct OI mapping
   for the slave MDT-object.
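
As a rough illustration of case 2 (a Python model, not the code in this
patch), entries named "${FID}:${index}" can be picked out of a readdir
result and their mappings recorded. The FID string form used in the
test values, the (name, inode number) entry shape, and the function
names are assumptions made for this sketch:

```python
def parse_slave_name(name):
    """Split a "${FID}:${index}" entry name into (fid_string, index).

    Returns None when the name does not end in ":<decimal index>".
    """
    fid, sep, index = name.rpartition(":")
    if not sep or not fid or not index.isdigit():
        return None
    return fid, int(index)


def record_slave_mappings(entries, oi_cache):
    """Record FID -> inode number for every slave entry found.

    entries: iterable of (name, inode_number) pairs, a hypothetical
    stand-in for what a readdir on the master object would yield.
    """
    for name, ino in entries:
        parsed = parse_slave_name(name)
        if parsed is not None:
            oi_cache[parsed[0]] = ino
    return oi_cache
```

The sketch trusts the name alone; any entry that happens to end in a
colon and digits would be treated as a slave, which a real
implementation would need to rule out.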

The patch also fixes a race between osd_fid_lookup() and the OI
scrub: the OI scrub thread removes the osd_inconsistent_item from
the global list before updating the related OI mapping, and if
someone calls osd_fid_lookup() for that OI mapping during this
interval, the lookup fails and triggers a full-mode OI scrub by
mistake.

The patch also enhances sanity-scrub to avoid DNE tests in
sanity-scrub on a single MDT.

Test-Parameters: mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs mdscount=2 mdtcount=4 testlist=sanity-scrub,sanity-scrub,sanity-scrub
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I6b449ef86221410dfc16005a586ed140b9a48b38
Reviewed-on: http://review.whamcloud.com/21506
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7433 ldlm: xattr locks are lost on mdt

mdt_intent_getxattr() can return EFAULT if a buffer cannot be found.
The error is returned after lock_replace, where a new lock is
installed into lockp. An error forces ldlm_lock_enqueue() to destroy
the original lock, but ldlm_handle_enqueue0() drops the reference on
the new lock. The xattr client code assumed that an intent error is
returned under a lock, which is immediately cancelled. Check whether
a lock was obtained and cancel it properly in the error cases. Note:
we should support both cases for interop needs, an intent error under
a lock and with a lock abort. Keep returning a lock with an intent
error for interop purposes for now, to be dropped later when clients
get old enough.
Make all intent ops work through md_intent_lock: getxattr
and layout, which should extract the intent error.

Signed-off-by: Vitaly Fertman <vitaly.fertman@seagate.com>
Change-Id: I7b628b50448c4bdb26a3a8758fc16a44212ad9ac
Seagate-bug-id: MRP-3072 MRP-3137
Reviewed-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Tested-by: Elena V. Gryaznova <elena.gryaznova@seagate.com>
Reviewed-on: http://review.whamcloud.com/17220
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8565 test: change sanity 255a to not fail for performance when running in VM

We may run our testing in VMs alongside other parallel workloads,
and our VMs are also short on memory. Therefore the completion time
of an I/O task is unreliable and depends on the workload on the host
machine while the task is running.
So, as Andreas suggested, change sanity 255a to not fail even if
the performance isn't as expected when running in a VM, as we did
for sanity 248.

Test-Parameters: trivial

Change-Id: If2a76c64f053dc6c7dc8acf4afd5a68ea3a757b6
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Reviewed-on: http://review.whamcloud.com/22375
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-8518 kernel: kernel update [SLES12 SP1 3.12.62-60.62]

Update target and kernel_config files for new version

Test-Parameters: clientdistro=sles12 testgroup=review-ldiskfs \
mdsdistro=sles12 ossdistro=sles12 mdsfilesystemtype=ldiskfs \
mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I3f15d7910e4d356ee696696c3c9af9d9b9d589f2
Reviewed-on: http://review.whamcloud.com/22045
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8513 kernel: kernel update RHEL7.2 [3.10.0-327.28.3.el7]

Update RHEL7.2 kernel to 3.10.0-327.28.3.el7

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I39b888ff6bcb905dd5f5b58c3a014734e4144742
Reviewed-on: http://review.whamcloud.com/22049
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8510 dne: set osd_obj_ea_ops::dt_invalidate

git commit 226fd401f9d8bfcd1a71bf264d9baef1e0842441 omitted setting
the dt_invalidate operation for osd_obj_ea_ops.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I33ae8b7239e056b3fb6981c9bc2dc0ec3c530e15
Reviewed-on: http://review.whamcloud.com/22017
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-522 lod: do not ignore degraded flag of ost.

The QoS allocation algorithm ignores the degraded flag of OSTs.
Add a check for the degraded OST flag in lod_alloc_qos().

Seagate-bug-id: MRP-2820

Signed-off-by: Jadhav Vikram <jadhav.vikram@seagate.com>
Change-Id: Ib2390518afff7b9bd459ce64bf609af99071e46d
Reviewed-on: http://es-gerrit.xyus.xyratex.com:8080/9966
Tested-by: Jenkins
Reviewed-by: Ujjwal Lanjewar <ujjwal.lanjewar@seagate.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@seagate.com>
Tested-by: Parinay Vijayprakash Kondekar <parinay.kondekar@seagate.com>
Reviewed-on: http://review.whamcloud.com/20747
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7655 tests: ost fake write for performance testing

Just drop the pages in ofd_commitrw_write(), but maintain the
correct file size and always create a transaction so the client can
pin those pages in memory until the transaction commits.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ia9a2af0a159c8969479656d3a7016db3cda71a91
Reviewed-on: http://review.whamcloud.com/5164
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8560 llite: handle is_compat_task() rename

The linux kernel 4.6 renamed is_compat_task() to
in_compat_syscall().

Change-Id: I2d3733a1ec03873d000b9f25aa8a98c3b02be410
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/22208
Reviewed-by: Frank Zago <fzago@cray.com>
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8560 libcfs: handle stacktrace function address() change

Starting in linux kernel 4.6 the address() function
from struct stacktrace now returns an int. Update
Lustre to handle this change.

Change-Id: I7d14c9134de3ae5642e2cad7d1d3829eb4ee9c50
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/22207
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8560 libcfs: handle PAGE_CACHE_* removal in newer kernels

Starting with linux kernel 4.6 all the PAGE_CACHE_* defines
have been removed. Now it is required to use PAGE_* instead.
This is a simple blanket change since PAGE_CACHE_* was always
the same as PAGE_*.

Change-Id: I3ba8954d44969e2473afa939bbb8b8b5b1345446
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/22206
Tested-by: Maloo <hpdd-maloo@intel.com>
Tested-by: Jenkins
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8560 libcfs: add autoconf test for crypto changes

For linux 4.5 kernels the simple ifdef test in
linux-crypto.c worked but with linux 4.6+ kernels
we need to add a proper crypto api test for the
new inline functions crypto_ahash_alg_name() and
crypto_ahash_driver_name().

Test-Parameters: trivial

Change-Id: Ic18808b622d374cf6dc2417220ed83adc43ea692
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/22205
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8560 lustre: remove unused crypto handlers in lustre_compat.h

The unused crypto code in lustre_compat.h doesn't
build with linux kernel version 4.6+. Since it's
not used, just delete it.

Test-Parameters: trivial

Change-Id: If7634428357837372f4756b0ace3af9c2cd77366
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/22204
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8407 recovery: more clear message about recovery failure

Currently, DNE recovery depends on the update logs on the MDTs.
If the update logs cannot be fetched from some MDT(s), the recovery
cannot proceed. Unlike a client-side recovery failure, a cross-MDT
recovery failure may cause namespace inconsistency. Because we do
not want to export an inconsistent namespace to clients, we make
the recovery wait (rather than abort because of a timeout) until
the related update logs are available.

So if some MDT is not up or not mounted, the recovery on the other
MDTs will hang. As time goes on, client (re)connections trigger
warning messages on the MDTs about the hung recovery. But such
messages do not clearly describe what happened.

This patch adds a callback interface in target_distribute_txn_data,
called 'tdtd_show_update_logs_retrievers'. It allows users to check
which MDTs are still fetching update logs, so the admin can check
the related MDTs in detail when hitting recovery trouble.

This patch also introduces a new recovery status "WAITING" for the
case where the update logs are not ready for some MDT(s). In that
case, the indices of the non-ready MDTs and the time waited are shown.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: If5ed4487fe1e6d94f02479d83f6a187d6427b3a7
Reviewed-on: http://review.whamcloud.com/21759
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8361 lfsck: detect Lustre device automatically

Originally, to start/stop/query LFSCK, the user needed to
specify the Lustre device explicitly via the "-M" option. Even
if there was only a single Lustre device on the current server,
or the user wanted to start LFSCK on all devices with the "-A"
option, the "-M" option was still required. This requirement
is inconvenient. This patch enhances the LFSCK user interface
to allow the user to run LFSCK commands without "-M" specified;
the available Lustre device on the current server is selected
automatically. But the "-M" option is still required in the
following cases: if there are multiple devices on the current
server that belong to different Lustre filesystems, or if the
"-A" option is not specified and there are multiple devices on
the current server.
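
The selection rule described above can be sketched as follows. This is
a Python model for illustration, not the LFSCK utility code; the
function name, the (fsname, device name) tuple shape, and the choice to
return the first device when "-A" is given are assumptions of this
sketch:

```python
def pick_device(devices, all_devices=False):
    """Pick a device automatically, or explain why "-M" is needed.

    devices: list of (fsname, device_name) tuples found on this
    server (hypothetical shape). all_devices models the "-A" option.
    """
    if not devices:
        raise LookupError("no Lustre device found on this server")
    if len({fsname for fsname, _ in devices}) > 1:
        raise ValueError("-M required: devices belong to different "
                         "Lustre filesystems")
    if len(devices) > 1 and not all_devices:
        raise ValueError("-M required: multiple devices and -A not "
                         "specified")
    # A single device, or several devices of one filesystem with -A.
    return devices[0][1]
```

With a single device the call succeeds with or without "-A"; with
several devices of one filesystem it succeeds only when all_devices is
set.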

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I291b958440b2409c93cdc8ef3a5e3fbe14885141
Reviewed-on: http://review.whamcloud.com/21596
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-1482 mdd: Setting xattr are properly checked with and without ACLs

Permissions for setting extended attributes are now properly checked
with and without ACLs. In the user.* namespace, only regular files and
directories can have extended attributes. For sticky directories, only
the owner and privileged users can write attributes.
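
The two user.* rules above can be modelled in a few lines of Python.
This is a sketch for illustration, not the mdd code; the function name
and the use of EPERM as the error code are assumptions, and ordinary
write-permission and ACL checks are deliberately left out:

```python
import errno
import stat


def check_user_xattr_write(mode, owner_uid, caller_uid, privileged=False):
    """Return 0 if a user.* xattr may be written, else an errno value.

    Models only the two rules in the text: the file type restriction
    and the sticky-directory restriction.
    """
    # Only regular files and directories can carry user.* attributes.
    if not (stat.S_ISREG(mode) or stat.S_ISDIR(mode)):
        return errno.EPERM
    # On a sticky directory, only the owner or a privileged user may write.
    if (stat.S_ISDIR(mode) and (mode & stat.S_ISVTX)
            and caller_uid != owner_uid and not privileged):
        return errno.EPERM
    return 0
```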

Intel-bug-id: LDEV-40
Intel-change: http://review.whamcloud.com/15848

Change-Id: Ibd79dcc15e61839d878f4847f7836f29d823be61
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/21496
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8498 nodemap: new zfs index files not properly initialized

Calling index ->next on a new zfs index returns a non-zero RC, but
ldiskfs indexes start with a blank record. This change modifies the
config load code to always write the default nodemap to an empty
index file.

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: I30a365f65463979889f09f7ad5ffcdacc83fa868
Reviewed-on: http://review.whamcloud.com/21939
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8333 test: make sure COS is cleared

In subtest 21b of replay-dual, the COS could be set after the MDT
has failed over, and the test will fail in this case.

Change-Id: I9401b905593c76f8fddfab19ab9eb6c0fe886e41
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/21924
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7903 mdt: dump exports information on console

To avoid being truncated in debug log, obd_exports_barrier() should
dump the exports information on console along with the "Is it stuck?"
warning message.

Test-Parameters: testlist=recovery-small,recovery-small,recovery-small
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I9dbaa7ed1d590db89ad6f42b66ec883dfb8b7ce1
Reviewed-on: http://review.whamcloud.com/21599
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-3815 tests: sanity-hsm - Remove tests from Always_Except

Removing tests 34/35/36 from the ALWAYS_EXCEPT list

Test-Parameters: trivial \
testlist=sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm

Signed-off-by: Saurabh Tandan <saurabh.tandan@intel.com>
Change-Id: I293b45ab0f8ff27c4f35500ffa30ba348489e788
Reviewed-on: http://review.whamcloud.com/20079
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

Revert "LU-7898 osd: remove unnecessary declarations"

This patch causes build failures in master due to the
reverted LU-7899 6cd79ab5860c5 patch, which I failed
to catch in time due to a deficiency in my build process.

This cannot be easily fixed since apparently a big
chunk of functionality was yanked from under this patch,
so I can only revert it for now.

This reverts commit ead6df2feee9c143b617cb60e50e403c955bd401.

Change-Id: I5ee89bf0c9260312f157c251b83dd417fa2cf260
Reviewed-on: http://review.whamcloud.com/22293
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8175 ldlm: conflicting PW & PR extent locks on a client

A PW lock isn't replayed once it is marked
LDLM_FL_CANCELING, and a glimpse lock doesn't wait for
conflicting locks on the client. So the server will
grant a PR lock in response to the glimpse lock request,
which conflicts with the PW lock in LDLM_FL_CANCELING
state on the client.

A lock in LDLM_FL_CANCELING state may still have pending IO,
so it should be replayed until LDLM_FL_BL_DONE is set, to
avoid the server granting a conflicting lock.
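
The change in the replay decision can be modelled as below. This is a
Python sketch of the rule as described in this message, not the ldlm
code; the bit values and function names are made up for the example,
and other conditions that may affect replay are ignored:

```python
# Hypothetical bit values, chosen for this example only.
FL_CANCELING = 0x1
FL_BL_DONE = 0x2


def replayed_before(flags):
    """Old behaviour as described: stop replaying once CANCELING is set."""
    return not (flags & FL_CANCELING)


def replayed_after(flags):
    """New behaviour as described: keep replaying until BL_DONE is set."""
    return not (flags & FL_BL_DONE)
```

The difference shows up for a lock that is CANCELING but not yet
BL_DONE: it may still have pending IO, and only the second predicate
keeps it in the replay set.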

Change-Id: I99a1d81a8932ac7b7b3346558446f9d638156309
Seagate-bug-id: MRP-3311
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: http://review.whamcloud.com/20345
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8500 ldlm: fix export reference problem

1. In client_import_del_conn, the export returned from
class_conn2export is not released after use.

2. In ptlrpc_connect_interpret, the export is not released
if the connect_flags are not compatible.

Change-Id: Ie7ef9cb0de2fa1aba71d3981ce47ae87c75e82d8
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/22031
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-2547 test: re-enable 24a/b of recovery-small

Re-enable test_24a/b of recovery-small.

Test-Parameters: trivial testlist=recovery-small,recovery-small,recovery-small
Test-Parameters: testlist=recovery-small,recovery-small,recovery-small

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ie3d111e36a5a3792b3c3b5a7bd7f6b9979a321d5
Reviewed-on: http://review.whamcloud.com/22020
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8349 ldlm: ASSERTION(flock->blocking_export!=0) failed

Hash lock protects only during .hs_put_locked.
Switch to atomic blocking_refs.

The whole policy structure was zeroed twice:
once during enqueue and a second time during resend or replay.

Policy structure should be initialized with default values
only in ldlm_lock_new().

Change-Id: Ib916f64cd03cfe812c86463b4354bf5a9bbcdd56
Seagate-bug-id: MRP-2536, MRP-2909
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-by: Alexander Boyko <alexander.boyko@seagate.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@seagate.com>
Signed-off-by: Ben Evans <bevans@cray.com>
Reviewed-on: http://review.whamcloud.com/21061
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7898 osd: remove unnecessary declarations

Refactor the code a bit to remove unnecessary declarations
(which are very expensive in ZFS). The patch also introduces
initial preparations to support large dnodes - it tracks all
EAs declared at object creation, and the tracked number can
be used to request a dnode of appropriate size.

With this patch + LU-7918, the disk/memory space reserved for a
single-stripe creation goes down from ~33MB to 4.6MB.

Performance improvements from this patch are also significant.
Running mdtest create performance on a test node (ramdisk):

    Threads    0.6.5   0.6.5+patch
        1       9933       14279
        2      12870       20469
        4      16405       26407
        8      19320       28254
       16      15648       26620
       32      14107       26483

Change-Id: I0778ad8d13ba1f7a5fa5ad5d874fbb1bd7203958
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/19101
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7044 test: Skip sanityn test_77e/77f/77g

Skip sanityn test_77e/77f/77g if server is older than 2.7.58

Test-Parameters: trivial testlist=sanityn

Change-Id: Ic2d93d74027d66f4471a4916cf35c830fd4225bb
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/19054
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7813 tests: clean up ost-pools.sh

Clean up the tests in ost-pools.sh to drop archaic use of
"lfs getstripe -v" that parses the output text in favour of
using options for "lfs getstripe -c" for OST count.

Add the check for newly-created dir/file being in the pool
into create_dir() and create_file().

Test-Parameters: trivial testlist=ost-pools

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ib2df663a62f89df48a70d07702b41f05f0194ef9
Reviewed-on: http://review.whamcloud.com/18889
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7593 target: umount vs tgt_last_rcvd_update deadlock

tgt_client_del() and
ofd_commitrw_write->tgt_last_rcvd_update
take transaction and ted->ted_lcd_lock
in different order:

thread1:
    osd_trans_start
    tgt_client_data_update
    tgt_client_del       <<< mutex_lock(&ted->ted_lcd_lock);
    ofd_obd_disconnect
    class_disconnect_export_list
    class_disconnect_exports
    class_cleanup
    ...
    sys_umount

thread2:
    __mutex_lock_slowpath
    mutex_lock          <<< mutex_lock(&ted->ted_lcd_lock);
    tgt_last_rcvd_update
    tgt_txn_stop_cb
    dt_txn_hook_stop
    osd_trans_stop
    ofd_trans_stop
    ofd_commitrw_write
    ...
    tgt_brw_write

Take the lock only around tgt_client_data_write() inside
tgt_client_data_update().
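
The lock-order inversion shown in the two stacks can be illustrated
with a small Python model (not Lustre code). Each thread is reduced to
the order in which it takes the two resources; the helper names and the
resource labels are assumptions of this sketch:

```python
def order_pairs(seq):
    """Return all (earlier, later) pairs for one acquisition sequence."""
    return {(a, b) for i, a in enumerate(seq) for b in seq[i + 1:]}


def has_inversion(*sequences):
    """True if two sequences take the same two resources in opposite order."""
    seen = set()
    for seq in sequences:
        pairs = order_pairs(seq)
        if any((b, a) in seen for a, b in pairs):
            return True
        seen |= pairs
    return False


# Before: umount takes ted_lcd_lock and then starts a transaction, while
# the write path holds a transaction and then takes ted_lcd_lock.
before = has_inversion(["ted_lcd_lock", "transaction"],
                       ["transaction", "ted_lcd_lock"])

# After: the lock is taken only around the write inside the transaction,
# so both paths use the same order.
after = has_inversion(["transaction", "ted_lcd_lock"],
                      ["transaction", "ted_lcd_lock"])
```

In this model "before" reports an inversion and "after" does not, which
is the property the narrowed locking is meant to restore.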

Change-Id: Id3f60636be2abb3b70a99ee44b735aab7dfb7657
Seagate-bug-id: MRP-3109
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: http://review.whamcloud.com/17704
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7149 tests: restore writethrough_cache_enable

sanity.sh test_224c fails as expected if executed separately
and passes if executed by the automated system. Tests 155d, 155f,
155h, and 156 do "set_cache writethrough off" and don't restore the
state. This makes subsequent tests work incorrectly.

This patch adds a writethrough_cache_enable restore to each of the
tests above.

Test-Parameters: trivial

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@seagate.com>
Xyratex-bug-id: MRP-2590
Change-Id: I5f4f3f6c419a3aa415426607e776403da9822c2c
Reviewed-on: http://review.whamcloud.com/16424
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6245 libcfs: move uid handling to linux directory

Simple patch to move the uid handling, which was added to support
older kernels, to the linux directory. The linux directory is
where we handle the differences between newer kernel APIs and
older distribution kernels.

Test-Parameters: trivial

Change-Id: Ie3676d33ce33ebc0f98ffa460cba37ab55928617
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/22139
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8540 o2iblnd: Add support for 5arg ib_map_mr_sg()

Starting in kernel v4.7, ib_map_mr_sg() takes five arguments
rather than four. It added an "sg_offset_p" offset pointer
argument.

RHEL7.3 also contains this change.

Change-Id: Ie63c992421bdf4ca195cf55152e6dfed9cf40e1d
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/22126
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Li Dongyang <dongyang.li@anu.edu.au>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8507 lnet: Enable setting per NI peer_credits

The code to allow peer_credits to be set per NI was originally
"left inactive" because there were concerns about peer_credits
interfering with the ability for IB nodes to connect to each
other when peer_credits are not the same (peer_credits controls
the queue depth for IB). With LU-3322, the values do not have
to match so it is now safe to enable this code so peer_credits
can be set per NI.

This patch enables existing code for setting per NI peer_credits.

Second, this patch fixes a long-standing bug: the conf data was
not used to set variables in the lnet_ni structure until after
lnd_startup() was called, which meant LND drivers ignored the
struct lnet_ni tunable values being set. Now we set the
struct lnet_ni data fields from the conf data before calling
lnd_startup().

Test-Parameters: trivial
Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: I28ede7a139c43ca9a3d1b22255d3358694057918
Reviewed-on: http://review.whamcloud.com/21948
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8501 lnet: Ensure routing is turned on first time

In lnet_rtrpools_enable(), a mistake was made and routing
was not being turned on when the rtrpools are being allocated
for the first time.

This patch fixes that routine so we remember to turn on
routing after allocating the rtrpools.

Test-Parameters: trivial
Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I8ef3e11bc8082cdce93e53d640f69e59ddbe9588
Reviewed-on: http://review.whamcloud.com/21934
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7803 tests: Cleanup after sanity/78

Remove large file created by sanity/78 regardless of failure. If this
file is left after failure, it causes some cascading failures because
of limited space available.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ib359b9024360015ce92f209e5350f2d679071cb8
Reviewed-on: http://review.whamcloud.com/21808
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8443 utils: exclude "resize" parameter with meta_bg option

Partitions with size > 256TB must use the meta_bg option. This option
is not compatible with the "resize_inode" option or the "resize"
extended option. As an optimization, the "resize" option is enabled
by default. For filesystems with < 2^32 blocks this optimization is
useless.

This patch disables the resize option if meta_bg is enabled. A test
that formats a Lustre FS with "^resize_inode,meta_bg" options on the
OST is added.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@seagate.com>
Seagate-bug-id: MRP-3647
Change-Id: Ibea2d18f79498636a165a682cf6b6435f7cebfba
Reviewed-on: http://review.whamcloud.com/21545
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8025 llite: make vvp_io_write_start lockless for newer kernels

When support for newer kernels was backported from the
upstream kernel, it lacked the enhancements made for
newer versions of lustre. This work makes the newer
kernel support lockless writes like the rest of the
lustre llite code.

Change-Id: I6ea32dbb3097aea3e2031e1121e238e549bccc9b
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Ben Evans <bevans@cray.com>
Reviewed-on: http://review.whamcloud.com/19840
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7927 llite: Deadlock between ll_setattr and write/ll_fsync

The patch http://review.whamcloud.com/10013 (commit 85bd36cc695)
"LU-4840 lfs: Use file lease to implement migration" moves
lli_trunc_sem into vvp layer. It violates lli_trunc_sem/i_mutex
locking order. So i_mutex should be taken after lli_trunc_sem now.

Change-Id: I2ecd52b7ae6eca74c6db7d94b1de1333560bc45d
Seagate-bug-id: MRP-3372
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: http://review.whamcloud.com/19165
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6245 libcfs: cleanup list handling

For the kernel space side we should use list.h directly,
except in the case of kernel API changes that impact us;
then we use linux-list.h, which handles those API changes.
A few of the userland utilities use a list implementation,
so we provide a separate list implementation for the
libcfs library.

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: I1280d74a629dbaa9c11a3c506fd635fab99ce182
Reviewed-on: http://review.whamcloud.com/15200
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8514 mdd: transaction failure should be checked

Transaction failure should not be silently ignored; otherwise the
MDT doesn't know whether the current operation has a transaction.
Therefore, save the lock upon transaction failure.

Add sanity.sh 407 for this.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Ie133a77c7f1bf890319dbd3cc2b03412a23f5c82
Reviewed-on: http://review.whamcloud.com/22071
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8408 mgc: handle config_llog_data::cld_refcount properly

Originally, the logic for handling config_llog_data::cld_refcount
was somewhat confusing; it could cause the cld_refcount to be leaked
or trigger "LASSERT(atomic_read(&cld->cld_refcount) > 0);" when
putting the reference. This patch cleans up the related logic as
follows:

1) When the 'cld' is created, its reference count is set to 1.

2) No additional reference is needed when adding the 'cld' to the
   list 'config_llog_list'.

3) Increase 'cld_refcount' when setting the lock data after
   mgc_enqueue() completes successfully in mgc_process_log().

4) When mgc_requeue_thread() traverses the 'config_llog_list',
   it needs to take an additional reference on each 'cld' to avoid
   it being freed during subsequent processing. The reference also
   prevents the 'cld' from being dropped from the 'config_llog_list',
   so mgc_requeue_thread() can safely locate the next 'cld' and
   then decrease the 'cld_refcount' of the previous one.

5) mgc_blocking_ast() will drop the 'cld_refcount' reference
   that is taken in mgc_process_log().

6) Other callers need to call config_log_find() to find the 'cld'
   if they want to access the related config log data. That will
   increase the 'cld_refcount' to avoid it being freed during the
   access. The caller needs to call config_log_put() after using
   the 'cld'.

7) Other confusing or redundant logic is dropped.

In addition, the patch enhances the protection of the
'config_llog_data' flags, such as 'cld_stopping'/'cld_lostlock',
as follows.

a) Use 'config_list_lock' (spinlock) to handle possible parallel
   access to these flags between mgc_requeue_thread() and other
   config llog data visitors, such as mount/umount, blocking_ast,
   and so on.

b) Use 'config_llog_data::cld_lock' (mutex) to protect other
   parallel access to these flags among blockable operations,
   such as mount, umount, and blocking ast.

The 'config_llog_data::cld_lock' is also used for protecting
the sub-cld members, such as 'cld_sptlrpc'/'cld_params', and
so on.
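
The reference rules 1), 4) and 6) can be sketched with a small
user-space model. This is a Python illustration only: the class and
function names are hypothetical, there is no locking, and "freed" is
just a flag rather than a real release of memory.

```python
class Cld:
    """Toy model of a reference-counted config log entry."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1      # rule 1: created with one reference
        self.freed = False

    def get(self):
        assert self.refcount > 0, "get on a freed entry"
        self.refcount += 1

    def put(self):
        # Mirrors the idea of the LASSERT: a put needs a live reference.
        assert self.refcount > 0, "unbalanced put"
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


def find(cld_list, name):
    # Rule 6: a lookup returns the entry with an extra reference held;
    # the caller must call put() when done.
    for cld in cld_list:
        if cld.name == name:
            cld.get()
            return cld
    return None


def requeue_pass(cld_list, process):
    # Rule 4: pin each entry while it is in use, and drop the pin on
    # the previous entry only after the next one has been pinned.
    prev = None
    for cld in list(cld_list):
        cld.get()
        if prev is not None:
            prev.put()
        process(cld)
        prev = cld
    if prev is not None:
        prev.put()
```

After a full pass every entry is back at its original count, so a
traversal by itself neither leaks nor drops a reference.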

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9fb6c3b7ae23dcea147aca7ffec240e0f33ef746
Reviewed-on: http://review.whamcloud.com/21616
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

New tag 2.8.57

Change-Id: I00319d4310725e3ffce4bdad12ab532663b88c17
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8523 test: sanity 311 is too strict

sanity 311 unlinks 1000 files, but the number of objects actually
destroyed may be lower, because there is some delay between when the
files are unlinked and when the MDS destroys the objects on the OSTs.
Previously the test checked that at least 900 objects were destroyed,
but autotest found only 880 objects destroyed in some cases, so the
threshold is now reduced to 800.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I88f45ae744475f2e2cdf8f82c1405164d6f4cd1c
Reviewed-on: http://review.whamcloud.com/22210
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-4865 zfs: grow block size by write pattern

This patch grows the block size based on write RPCs. The osd-zfs
blocksize used to be fixed at 128KB, which is too big for random
writes and too small for sequential writes.

This patch decides the block size from the first few RPCs. If the
first few RPCs are sequential, it will in most cases pick the maximum
block size for the object; otherwise, a feasible block size will be
picked based on the RPC size.
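
A minimal sketch of such a heuristic follows, in Python for
illustration only. The size limits, the "largest power of two that
fits the smallest RPC" rule and all names are assumptions of this
sketch, not the values or logic used by osd-zfs.

```python
def pick_block_size(rpcs, min_bs=4096, max_bs=1 << 20):
    """rpcs: list of (offset, length) for the first few write RPCs.

    min_bs and max_bs are hypothetical limits chosen for the example.
    """
    if not rpcs:
        return min_bs
    # Sequential means each RPC starts where the previous one ended.
    sequential = len(rpcs) > 1 and all(
        prev_off + prev_len == off
        for (prev_off, prev_len), (off, _) in zip(rpcs, rpcs[1:])
    )
    if sequential:
        return max_bs
    # Otherwise size the block from the RPC size: the largest power
    # of two that does not exceed the smallest RPC seen so far.
    smallest = min(length for _, length in rpcs)
    bs = min_bs
    while bs * 2 <= smallest and bs * 2 <= max_bs:
        bs *= 2
    return bs
```

For example, two back-to-back 1MiB writes select max_bs, while two
scattered 16KiB writes select a 16KiB block in this sketch.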

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I66f7cbdc2b5e0365058b152b4865b00cdabb0cf3
Reviewed-on: http://review.whamcloud.com/18441
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Don Brady <don.brady@intel.com>

LU-8006 ptlrpc: specify ordering of TBF policy rules

With this patch, the rank of a new rule can be given with the "start"
command when the rule is inserted. The rank of an existing rule can
also be changed with the "change" command.

lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"start $NAME jobid={$ID} rate=$RATE rank=$NEXT"
lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"change $NAME rate=$RATE rank=$NEXT"

$NAME is the target rule name. $NEXT is the name of the rule that the
target rule will be placed before.
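
The "placed before $NEXT" semantics can be modelled on a plain
ordered list. This is a Python sketch only: the function names are
hypothetical and it does not reflect how the NRS TBF policy stores
its rules.

```python
def start_rule(rules, name, rank):
    """Insert rule `name` immediately before the rule named `rank`."""
    if name in rules:
        raise ValueError(f"rule {name!r} already exists")
    rules.insert(rules.index(rank), name)   # ValueError if rank is unknown


def change_rule(rules, name, rank):
    """Move existing rule `name` immediately before the rule named `rank`."""
    if name not in rules or rank not in rules:
        raise ValueError("unknown rule")
    if name == rank:
        return
    rules.remove(name)
    rules.insert(rules.index(rank), name)
```

Starting from ["a", "b", "default"], start_rule(rules, "c", "b")
yields ["a", "c", "b", "default"], and a following
change_rule(rules, "b", "a") yields ["b", "a", "c", "default"].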

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I6b465342365d6c09710616cd3c9e068b66a8fc89
Reviewed-on: http://review.whamcloud.com/19476
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7845 lnet: check if ni is in current net namespace

Add a new 'ni_net_ns' field to struct lnet_ni to hold a reference
to the original net namespace in which the ni was created.
In LNetDist(), check whether the ni was created in the same net
namespace as the current one. If not, assign an order above
0xffff0000, so that this ni is not given priority.
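
The effect of the large order value can be sketched as follows.
This is a Python illustration; only the constant 0xffff0000 comes
from this message, while the function names and the way the base
order is derived are assumptions.

```python
FOREIGN_NS_PENALTY = 0xffff0000  # value quoted in the commit message


def ni_order(base_order, ni_ns, current_ns):
    """Return the sort order for one interface; lower is preferred.

    base_order is assumed to start at 1.
    """
    if ni_ns == current_ns:
        return base_order
    # Interfaces from another namespace sort after all local ones.
    return FOREIGN_NS_PENALTY + base_order


def rank_interfaces(nis, current_ns):
    """nis: list of (name, namespace) in configuration order."""
    scored = [
        (ni_order(pos, ns, current_ns), name)
        for pos, (name, ns) in enumerate(nis, start=1)
    ]
    return [name for _, name in sorted(scored)]
```

With interfaces o2ib0 (nsA), tcp0 (nsB) and tcp1 (nsA) and a caller
in nsA, the preferred order becomes o2ib0, tcp1, tcp0.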

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I5abde6e325983352b42c0eafe16aef22567e3e0e
Reviewed-on: http://review.whamcloud.com/21884
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7977 lnet: Have selftest use proper units (MB/s or MiB/s)

lnet-selftest currently reports bandwidth statistics as
MB/s but it is really calculated as MiB/s.

This patch corrects the output to say MiB/s and adds a
new option, "--mbs" to the "lst stat" command to change
the units to MB/s.
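
The difference between the two units is plain arithmetic, shown here
in Python (the function name and signature are made up for the
example and are not part of lst):

```python
MIB = 1 << 20      # 1 MiB = 1,048,576 bytes
MB = 10 ** 6       # 1 MB  = 1,000,000 bytes


def bandwidth(nbytes, seconds, decimal=False):
    """Return (value, unit) for a transfer of nbytes in seconds."""
    unit_size = MB if decimal else MIB
    unit_name = "MB/s" if decimal else "MiB/s"
    return nbytes / seconds / unit_size, unit_name
```

Moving 1,048,576,000 bytes in one second is 1000 MiB/s but about
1048.6 MB/s, so labelling the first figure as MB/s understates the
rate by roughly 4.9%.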

Test-Parameters: trivial
Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: Iae8f6ca92b9b0ee00e6307eaf22e5c0791ed323d
Reviewed-on: http://review.whamcloud.com/20891
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8285 test: Allow LNet logging as default in autotest

The default in the local.sh configuration file for autotest
is to turn off all logging from three subsystems: lnet, lnd,
and pinger. There is no good reason to be doing this and
this could be hiding important logs highlighting bugs.

This patch makes the default to allow all subsystems to log.

Test-Parameters: trivial
Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I8ef88679b1aa716311a10f7be43480ee3184d1a0
Reviewed-on: http://review.whamcloud.com/20818
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8249 lnet: potential deadlock in lnet

Fixes potential deadlock in LNetMDAttach (vfree must not be called in
interrupt context in linux kernel versions prior to 3.10).

Signed-off-by: Quentin Bouget <quentin.bouget.ocre@cea.fr>
Change-Id: I1b421b470bab97d58f441040c39b9f1caf11b1fe
Reviewed-on: http://review.whamcloud.com/20676
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8480 ofd: hold obd_dev_lock across grant comparison

Hold obd_dev_lock until the global ofd_tot_* grant values are saved,
so that their comparison is not racy. Otherwise it is possible to
report grant inconsistencies when multiple clients are unmounted.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I19ffd102b657df2df539d01d182a782aa17ad924
Reviewed-on: http://review.whamcloud.com/21813
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7528 test: Deregister changelog client on test fail.

The changelog-related tests did not deregister their changelog
client when a failure was encountered, so all subsequent changes
were still recorded, leading to huge test stdouts. This patch adds
the function changelog_cleanup, which deregisters the changelog
client created by the test before exiting. It is to be used in
place of changelog deregister in any test that registers a
changelog client, and a trap statement is included to execute
it on EXIT.

Test-Parameters: trivial

Seagate-bug-id: MRP-3063
Signed-off-by: Kirtankumar Krishna Shetty <kirtan.shetty@seagate.com>
Change-Id: I7f26f266ba8bda294b75ff5619d95c26704fd83f
Reviewed-on: http://review.whamcloud.com/17506
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6844 tests: re-enable striped dir

Since this failure should be fixed by
http://review.whamcloud.com/21088

Let's revert http://review.whamcloud.com/20022
to re-enable striped dir in replay-single 70b.

Test-Parameters: trivial testlist=replay-single,replay-single,replay-single,replay-single,replay-single,replay-single
Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Ie7fc18d4d57a74be6925d8a635fdb09d4917a2e7
Reviewed-on: http://review.whamcloud.com/21508
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8383 build: Spec file cleanup after LU-5614

Add dependency from kmod-%{lustre_name}-tests
Fix BuildRequires: %kernel_module_package_buildreqs

Test-Parameters: trivial

Change-Id: I92325687812f10fb308971391e67bb80c08ae5db
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/22125
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8056 build: announce linux kernel 4.5.7 support

Bump kernel version in ChangeLog to latest supported
kernel which is 4.5.7

Test-Parameters: trivial

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: I9607aa68e67174e588b284d4e3048131e2dcc2bd
Reviewed-on: http://review.whamcloud.com/21970
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8495 kernel: kernel update [SLES11 SP4 3.0.101-80]

Update SLES11 SP4 kernel to 3.0.101-80

Test-Parameters: mdsdistro=sles11sp4 ossdistro=sles11sp4 \
  clientdistro=sles11sp4 mdsfilesystemtype=ldiskfs \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
  testgroup=review-ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I2878d388bd58905643ff73401eeae166c34aac95
Reviewed-on: http://review.whamcloud.com/21866
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8056 mem: handle GFP_IOFS removal in newer kernels

Starting with linux kernel 4.5, GFP_IOFS has been removed.
GFP_IOFS was meant to be a shorthand to clear two
GFP flags but it was never used properly. Replace it with
GFP_NOFS instead.

Change-Id: I97e045b1363ce216426ae709145b839a838e5762
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/21781
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6401 headers: Move functions out of lustre_idl.h

Migrate functions
lma_to_lustre_flags, lustre_to_lma_flags
set/get_mrc_cr_flags
ldlm_res_eq
ldlm_extent_overlap
ldlm_extent_contain
ldlm_request_bufsize
rec_tail
agent_req_in_final_state
lustre_print_user_md
all PTLRPC dump_* functions
lovea_slot_is_dummy

Delete unused
lmv_mds_md_stripe_count

Signed-off-by: Ben Evans <bevans@cray.com>
Change-Id: If65e9f63b727889f4952d5c326b18356cc4dae9d
Reviewed-on: http://review.whamcloud.com/21484
Reviewed-by: Frank Zago <fzago@cray.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>

LU-8371 llite: Trust creates in revalidate too.

By forcing creates to always go via lookup we lose some
important caching benefits too.
Instead let's trust creates with positive cached entries.

Then we have 3 possible outcomes:
1. Negative dentry - we go via atomic_open and do the create
   by name there.
2. Positive dentry, no contention - we just go straight to
   ll_intent_file_open and open by fid.
3. Positive dentry, contention - by the time we reach the server,
   the inode is gone. We get ENOENT which is unacceptable to return
   from create. But since we know it's a create, we substitute it
   with ESTALE and VFS retries again with LOOKUP_REVAL set, we catch
   that in revalidate and force a lookup (same path as before this
   patch).
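
The decision and the error substitution described above can be
sketched as two small functions. This is Python for illustration;
the function names are hypothetical, and real revalidation considers
more state than these two flags.

```python
import errno


def trust_cached_dentry_for_create(positive, lookup_reval):
    """Return True to open by fid, False to go through lookup/atomic_open.

    positive: the cached dentry has an inode attached.
    lookup_reval: the VFS is retrying with LOOKUP_REVAL set.
    """
    return positive and not lookup_reval


def fixup_create_error(rc, is_create):
    """Map ENOENT to ESTALE for creates so that the VFS retries."""
    if is_create and rc == -errno.ENOENT:
        return -errno.ESTALE
    return rc
```

On the retry lookup_reval is set, so the first function returns False
and the create takes the lookup path, as in case 3.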

Change-Id: I7b006a50703bfb37e8747dca0f95b2c512b82429
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/21168
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-8084 lfsck: handle linkea record length properly

The record length in the linkea may be corrupted. If we do not handle
an invalid record length when locating the next linkea record or
deleting the current record, it may cause invalid memory accesses and
corrupt other data in RAM, which then leads to various strange memory
issues.
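
The kind of check involved can be illustrated on a simplified record
format. This Python sketch assumes a hypothetical layout in which
each record starts with a 2-byte big-endian total length; it is not
the on-disk linkea format.

```python
import struct

HDR = 2  # bytes used by the (hypothetical) length header


def iter_records(buf):
    """Yield each record payload, validating every length before use."""
    off = 0
    while off < len(buf):
        if len(buf) - off < HDR:
            raise ValueError(f"truncated header at offset {off}")
        (reclen,) = struct.unpack_from(">H", buf, off)
        # A length that is too small would stall the walk; one that is
        # too large would step outside the buffer.
        if reclen <= HDR or off + reclen > len(buf):
            raise ValueError(f"invalid record length {reclen} at offset {off}")
        yield buf[off + HDR:off + reclen]
        off += reclen
```

A record claiming 100 bytes inside a 5-byte buffer is rejected
instead of being followed past the end of the data.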

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I27d724025c8157ecf51e3269a39e2fdfbc27a27d
Reviewed-on: http://review.whamcloud.com/19877
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7160 mgs: Skip processing .bak files on MGS

The lctl replace_nids command saves the previous version of each
config file to a file named original_name.bak.
These files should never be processed by the MGS.

This patch adds code that skips files with the .bak extension
in the list of files to be processed by the MGS.
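
In spirit the filter is a suffix check, sketched here in Python (the
function name and the sample log names are made up for the example):

```python
def logs_to_process(names):
    """Drop backup copies; keep every other config log name."""
    return [name for name in names if not name.endswith(".bak")]
```

Given ["lustre-MDT0000", "lustre-MDT0000.bak", "lustre-client"],
only the first and last names remain.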

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@seagate.com>
Xyratex-bug-id: MRP-2742
Change-Id: I5ad5cf5548d395459d2245394ef3f7764fe8f0ca
Reviewed-on: http://review.whamcloud.com/16428
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8454 llite: normal user can't set FS default stripe

The current client doesn't check permission before updating the
filesystem default stripe on the MGS, which is neither secure nor
obvious.

Since we setattr on the MDS first, and then set the default stripe on
the MGS, we can just return an error upon setattr failure.

The filesystem default stripe is now stored in ROOT on the MDT, so
saving it in the system config is only for compatibility with old
servers. This will be removed in the future.

Add sanity 65m to verify this.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Ia224a9211c1ceab08a3a064adc67bc945ee3fc11
Reviewed-on: http://review.whamcloud.com/21612
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8493 osp: Do not set stale for new osp obj

Do not set stale for the new OSP object, otherwise
it will cause ESTALE failure for the following
write operation, see osp_md_declare_write().

This problem was introduced by
http://review.whamcloud.com/19041

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Ib92deab4e0c900d59fbdc2bf50e17fd29fd2ecce
Reviewed-on: http://review.whamcloud.com/21861
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7650 o2iblnd: handle mixed page size configurations.

Currently it is not possible to send LNet traffic between
two nodes using infiniband hardware that have different
page sizes when RDMA fragments are used.
When two nodes establish a connection, each tells the other
node the maximum number of RDMA fragments it supports.
The issue is that the units are pages, and 256 64K pages
corresponds to 16MB of data, whereas a 4K page system is
limited to messages with 1MB of data. The solution is to
report over the wire the maximum number of fragments in
4K units regardless of the native page size. The recipient
then uses its native page size to translate this into the
maximum number of page-sized fragments it can send to
the other node.
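
The unit conversion can be checked with a little arithmetic, shown
here in Python (the function names are hypothetical and the rounding
choice is an assumption of this sketch):

```python
WIRE_UNIT = 4096   # fragments are reported over the wire in 4K units


def native_to_wire(native_frags, native_page_size):
    """Express a count of native pages as a count of 4K units."""
    return native_frags * native_page_size // WIRE_UNIT


def wire_to_native(wire_frags, native_page_size):
    """Translate a 4K-unit count into whole native page-sized fragments."""
    return wire_frags * WIRE_UNIT // native_page_size
```

A limit of 256 units is 1MB on the wire; a 64K-page node translates
it into 16 native fragments, while a 4K-page node keeps 256.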

Change-Id: I5aa4a464a0320fbd1841f9ad3add810e7b4f124a
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/21304
Tested-by: Jenkins
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6245 client: remove types abstraction from client code

Originally, when lustre code was built for userland, we needed
a proper way to handle 32 bit and 64 bit platforms when
reporting unsigned longs. Now that this code is only built
for kernel space and the kernel has its own special string
handling functions, we don't need this abstraction anymore.
Remove this abstraction from the client side code.

Change-Id: Ic0c55a413237bdf57d60031c12d5d9b62fa39cef
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/20590
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4433 tests: fix mds-survey.sh to support multiple MDTs

This patch fixes mds-survey.sh and mds-survey to support
multiple MDTs.

Test-Parameters: mdtcount=1 testlist=mds-survey
Test-Parameters: envdefinitions=PTLDEBUG=-1,DEBUG_SIZE=150 mdscount=2 mdtcount=4 testlist=mds-survey
Signed-off-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I9193b7fb65ab8b5dfd0817a8c203dae463deb090
Reviewed-on: http://review.whamcloud.com/19437
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7903 hsm: leaked export refcount

Add missing class_export_put() in mdt_hsm_agent_send().

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ie9119c53f11901573161034a85bfa7bf83ca6ff8
Reviewed-on: http://review.whamcloud.com/21942
Tested-by: Jenkins
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8258 nodemap: fix userspace address access in proc code

The fileset proc write handler was incorrectly passing the userspace
buffer address directly to the nodemap code. This patch copies it to
kernel space before passing it. Because the buffer could be greater
than 2k, allocate the buffer off stack.

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: If90c1a95c80b2afd2a4cf6a70dc41d28dd157a2f
Reviewed-on: http://review.whamcloud.com/21857
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8314 utils: revert lfs_getdirstripe to non-recursive mode

Since 2.7, 'lfs getdirstripe' has enabled recursive mode by default.
This is not the obvious behavior, so this patch reverts it to
non-recursive mode.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I683f5dec203230b36ee3da404e7f0817e91d090f
Reviewed-on: http://review.whamcloud.com/21516
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8460 osc: max_pages_per_rpc should be chunk size aligned

max_pages_per_rpc should be chunk size aligned.

obd_brw_size needs to be at least one block size.

Improve the LASSERT() to an LASSERTF() that prints the related
parameters to help debug problems.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: If73b8f05052f96970f3e97015a4642152ace2a38
Reviewed-on: http://review.whamcloud.com/21825
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4931 ladvise: Add willread advice support for ladvise

This patch adds WILLREAD advice to the ladvise framework. The OSS
will prefetch data into memory when this hint is provided. It is not
guaranteed how long the cached pages will be kept in memory.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I21394b88a22a8c46ceae7151402341364860ee88
Reviewed-on: http://review.whamcloud.com/12458
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7646 lnet: Stop Infinite CON RACE Condition

In current code, when a CON RACE occurs, the passive side will
let the node with the higher NID value win the race.

We have a field case where a node can have a "stuck"
connection which never goes away and is the trigger of a
never-ending loop of re-connections.

This patch introduces a counter of how many times a
connection in a connecting state has been the cause of a CON RACE
rejection. After 20 times (constant MAX_CONN_RACES_BEFORE_ABORT),
we assume the connection is stuck and let the other side (with
lower NID) win.
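
The counter logic can be sketched as follows. This is a Python
illustration only: apart from the constant name and its value, all
names are hypothetical, and the exact point at which the real code
gives up may differ by one.

```python
MAX_CONN_RACES_BEFORE_ABORT = 20


def on_conn_race(local_nid, peer_nid, state):
    """Decide what the passive side does with a racing request.

    state: dict with a "races" counter kept for the connecting peer.
    Returns "accept" to let the peer's request win, "reject" otherwise.
    """
    if peer_nid > local_nid:
        return "accept"                  # higher NID wins the race
    if state["races"] >= MAX_CONN_RACES_BEFORE_ABORT:
        # Our own attempt looks stuck: stop rejecting the peer.
        state["races"] = 0
        return "accept"
    state["races"] += 1
    return "reject"
```

In this sketch a lower-NID peer is rejected 20 times in a row and
accepted on the next attempt, which breaks the reconnection loop.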

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I32e035806e95868b13c28c42e241b969940a35c9
Reviewed-on: http://review.whamcloud.com/19430
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7483 tests: sanity test_103a misc test in acl corrected

sanity test_103a was failing on SELinux-enabled clients with a
"misc test failed" error message. With the previous command
"ls -dl d/l | awk 'sub(/\\./, "", $1); {print $1}'"
the output consisted of 2 rows, which caused the failure.
The issue is described in the comments section of LU-7483.

The solution is to filter the results using
"ls -dl d/l | awk '{ sub(/\\.$/, "", $1); print $1 }'"
instead, which gives the desired result. Results of a successful
sanity test_103a run on an SELinux-enabled client can be found in
the ticket.
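
The effect of the anchored pattern can be reproduced outside awk.
This Python sketch uses a made-up sample line and function name; it
only illustrates stripping the trailing "." that ls appends to the
mode field when an SELinux context is present.

```python
import re


def mode_field(ls_line):
    """Return the first field of an `ls -dl` line without a trailing dot."""
    first = ls_line.split()[0]
    # "\.$" matches a dot only at the very end of the field.
    return re.sub(r"\.$", "", first)
```

Both "lrwxrwxrwx. 1 root root ..." and "lrwxrwxrwx 1 root root ..."
yield "lrwxrwxrwx", and exactly one value is produced per input line.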

Test-Parameters: trivial testlist=sanity,sanity
Signed-off-by: Saurabh Tandan <saurabh.tandan@intel.com>
Change-Id: I3fcd60161873040b66d7004fb1cf682b41a0b8d9
Reviewed-on: http://review.whamcloud.com/21722
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8468 kernel: kernel update RHEL7.2 [3.10.0-327.28.2.el7]

Update RHEL7.2 kernel to 3.10.0-327.28.2.el7

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I63fe2cde33efba13be29e0bff0a4ef6b9a3306f5
Reviewed-on: http://review.whamcloud.com/21692
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8474 tests: stop MGS before setup_noconfig in conf-sanity.sh

In conf-sanity.sh, some sub-tests leave the MGS mounted
in a separate MGT and MDT configuration, which causes
setup_noconfig() to fail. This patch fixes the issue by stopping
the MGS before running setup_noconfig().

Test-Parameters: trivial combinedmdsmgs=false envdefinitions=ONLY=55 testlist=conf-sanity

Test-Parameters: trivial envdefinitions=ONLY=55 testlist=conf-sanity

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I53f5549c392d300a2c76bbc4ce68e9a8198ba559
Reviewed-on: http://review.whamcloud.com/21653
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Parinay Kondekar <parinay.kondekar@seagate.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8473 tests: skip conf-sanity test 41a with separate MGT and MDT

conf-sanity test 41a tests the “nosvc” and “nomgs” mount options
on a combined MGT/MDT device. It is not applicable to a separate
MGT and MDT configuration. This patch adds code to check for that.

Test-Parameters: trivial combinedmdsmgs=false envdefinitions=ONLY=41a testlist=conf-sanity

Test-Parameters: trivial envdefinitions=ONLY=41a testlist=conf-sanity

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I967f63c70953c7c8e5bb296e832ee5335e56f69c
Reviewed-on: http://review.whamcloud.com/21651
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7061 osd-ldiskfs: NULL pointer in osd_scrub_refresh_mapping

Commit c0dafc483c (change 16138) missed a spot: id can be NULL for
a DTO_INDEX_DELETE operation.

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Id73f8dfb1834ff5275da006c03f59d4c56286aa7
Reviewed-on: http://review.whamcloud.com/20620
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8085 scrub: increase iteration cursor to skip unused inodes

After the OI scrub iteration has handled the last used bits in the
inode table, it should advance the iteration position to the next
group before the jump, to avoid looping forever.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ib7a63202a134ecc82070868b9630430f054b69fa
Reviewed-on: http://review.whamcloud.com/19876
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-8083 lfsck: repair symbol file nlink properly

The symbolic link case was not checked in lfsck_namespace_repair_nlink.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I420d558803672100292990f1ff4888c03888c39a
Reviewed-on: http://review.whamcloud.com/19874
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6688 tests: use proper nodes for NRS test

Several of the NRS tests for sanityn are reporting
error: set_param: ost/OSS/*/nrs_policies: Found no match.
The reason for this is that the tests are attempting
to configure NRS oss settings on the MDS servers.
Those oss settings don't exist on MDS servers.
Change the tests to alter the oss NRS settings on
the OSS servers instead.

Test-Parameters: trivial testlist=sanityn

Change-Id: I83600165fc1b9f0d9c6ee0d093f54604c46328b9
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/21764
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>

LU-7472 tests: Allows to specify IOR block size with suffix

Add a variable io_blockUnit, which can be set to K, M or G
to adjust the unit of the IOR block sizes.
m is kept as the default option, so the default block
size for IOR tests is 6MB.

Test-Parameters: trivial
Seagate-bug-id: MRP-2685
Change-Id: Ie7a11d8cb06faad902abc56bca4fc5914df8f42d
Signed-off-by: Aditya Pandit <aditya.pandit@seagate.com>
Signed-off-by: Ashish Purkar <ashish.purkar@seagate.com>
Reviewed-on: http://review.whamcloud.com/17354
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7225 llite: ladvise protocol changes

This patch makes some changes to the ladvise API and
protocol to support lock ahead and possible future users.

Primarily, it separates the userspace API arguments from
the structures which go out on the network, and adds a
number of 'value' fields without a predefined use.

The meaning of each value field can be different for
different advice types, allowing some extensibility.

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I7ac18e546f16a20c3c6bc6849becb0d45e3d5dc9
Reviewed-on: http://review.whamcloud.com/20666
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7800 llog: check if next llog exists

The next llog is only checked and created in the
declare phase, and access to the catllog is not
serialized across the whole declare_add and add
process. So if multiple threads access the catlog
at the same time and the llog creation does not
succeed, next_log (in catllog) might be NULL, which
causes a panic in llog_cat_current_log().
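
The kind of check implied by the subject line can be sketched as follows.
This is a simplified illustration, not the llog code: struct log_handle,
struct cat_handle and cat_pick_log() are hypothetical stand-ins, and
returning -ENOENT when the next log is missing is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the llog handle types. */
struct log_handle {
	bool full;			/* no room left for new records */
};

struct cat_handle {
	struct log_handle *current_log;
	struct log_handle *next_log;	/* NULL if creation did not succeed */
};

/*
 * Pick the log that the next record should go to. On success, store it
 * in *out and return 0. If the current log is unusable and no next log
 * exists, return a negative errno instead of dereferencing NULL.
 */
int cat_pick_log(struct cat_handle *cat, struct log_handle **out)
{
	*out = NULL;

	if (cat->current_log != NULL && !cat->current_log->full) {
		*out = cat->current_log;
		return 0;
	}

	/* The check this sketch is about: the next log may not exist. */
	if (cat->next_log == NULL)
		return -ENOENT;		/* error code is an assumption */

	/* Switch over to the next log. */
	cat->current_log = cat->next_log;
	cat->next_log = NULL;
	*out = cat->current_log;
	return 0;
}
```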

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I2343023c1f3109c077c98d78d3669377d95ed42f
Reviewed-on: http://review.whamcloud.com/18542
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7117 osp: set ptlrpc_request::rq_allow_replay properly

In the ptlrpc layer, if ptlrpc_request::rq_allow_replay is set,
the RPC can be sent to the remote peer during the remote server's
recovery even if it is not a replay RPC. This flag is used
for sending RPCs when the current server and the remote
server are both in recovery.

On the other hand, misusing this flag causes trouble.
For example, consider DNE mode, where MDT_m is in recovery and
MDT_n is healthy. At that time, a client can send a normal
reint unlink RPC to MDT_n to remove file_A (which resides
on MDT_n) under dir_B (which resides on MDT_m). In that
case, MDT_n needs to look up dir_B with file_A's name,
which means MDT_n needs to send a lookup OUT RPC to MDT_m,
but first it needs to lock dir_B with an LDLM_ENQUEUE RPC.
Because MDT_m is recovering and the LDLM_ENQUEUE
RPC is not a replay, it should be blocked until recovery
is done on MDT_m. That is the expected behavior. But if MDT_n
(via OSP) sets ptlrpc_request::rq_allow_replay improperly, then
the LDLM_ENQUEUE RPC may be sent to MDT_m during MDT_m
recovery and granted without conflict. The subsequent
lookup OUT RPC may then obtain stale information from MDT_m
if dir_B has NOT been recovered yet.

So ptlrpc_request::rq_allow_replay will be set only during the
current MDT's recovery. However, multiple threads are involved
in recovery, such as target_recovery_thread and
lod_sub_recovery_thread. Because obd_device::obd_recovering
is controlled by target_recovery_thread, which is started later
than lod_sub_recovery_thread, checking only the obd_recovering
flag does not work in some cases. So other flags need to be
checked as well: obd_device::obd_replayable and
obd_device::obd_no_conn, to distinguish recovery-related RPCs properly.
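
A check that combines the three flags named above could look like the
sketch below. The message does not spell out the exact condition, so the
combination shown here is an assumption, and struct obd_flags and
allow_replay() are hypothetical stand-ins for obd_device and the OSP code.

```c
#include <stdbool.h>

/* Hypothetical stand-in holding the three obd_device flags named above. */
struct obd_flags {
	bool obd_recovering;	/* controlled by target_recovery_thread */
	bool obd_replayable;	/* the target supports recovery */
	bool obd_no_conn;	/* the target does not accept connections yet */
};

/*
 * Return true if a request sent on behalf of this target may carry
 * rq_allow_replay. Assumed rule: the target is either marked as
 * recovering, or it is replayable and has not opened for connections
 * yet (the window before target_recovery_thread sets obd_recovering).
 */
bool allow_replay(const struct obd_flags *obd)
{
	return obd->obd_recovering ||
	       (obd->obd_replayable && obd->obd_no_conn);
}
```

Under this rule a healthy target (not recovering, accepting connections)
never sets the flag, so its LDLM_ENQUEUE RPC to a recovering peer is blocked.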

So in the case above, the client-initiated unlink will be blocked on
MDT_n, waiting on the LDLM_ENQUEUE RPC, until MDT_m recovery is done.

Test-Parameters: mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs mdscount=2 mdtcount=4 testlist=replay-single,replay-single,replay-single
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Id9ac542751cc0042fba0a94166dfc57ace52dc69
Reviewed-on: http://review.whamcloud.com/20940
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-2805 tests: sanity -Remove test_184c from Always Except

Remove test_184c from the ALWAYS_EXCEPT list in
sanity.sh.

Test-Parameters: trivial mdtfilesystemtype=zfs ostfilesystemtype=zfs mdsfilesystemtype=zfs envdefinitions=SLOW=yes testlist=sanity,sanity,sanity,sanity
Signed-off-by: Saurabh Tandan <saurabh.tandan@intel.com>
Change-Id: Iba0b5b0ff2613e94c1cc90921face828605d05cb
Reviewed-on: http://review.whamcloud.com/20104
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

Revert "LU-7899 osd: batch EA updates"

Reverting this patch as it seems to be causing the OOM issues
documented in LU-8449 and there does not seem to be an
easy fix in sight.

This reverts commit 6cd79ab5860c59c2a640a9e8ca4ee86eec050b43.

Change-Id: I934af93d893b01dad7190471b6b1a7bdffb1b509
Reviewed-on: http://review.whamcloud.com/21878
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-8479 obdclass: Reserve some value for OBD_FAIL_* macros

These values are used by another branch, so
reserve them in master for consistency.

Test-Parameters: trivial envdefinitions=ONLY=0 testlist=sanity

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I3b262c34b3d86effeeeecb924092c2ffc8764c42
Reviewed-on: http://review.whamcloud.com/21766
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

Revert "LU-8383 build: Spec file cleanup after LU-5614"

This patch appears to break SLES builds with:
error: Failed build dependencies:
kernel-syms is needed by lustre-2.8.56_23_ge8273a3-1.x86_64
make: *** [srpm] Error 1

This reverts commit 55836cd0e55eb1912911c6f195412c99852115aa.

Change-Id: I612e02431a7aafa4bb3daa7b3fb14a31e08175e3
Reviewed-on: http://review.whamcloud.com/21877
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6447 mdt: mdt_identity_upcall to not block with rwlock held

mdt_identity_upcall is currently calling call_usermodehelper
with an rwlock held, which is a no-no since it allocates memory
and schedules. Just replace the rwlock with a rw_semaphore.

Change-Id: I7b063a4db47313fbae6241da7bcec2c397b8e8c4
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-on: http://review.whamcloud.com/14432
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>

LU-8371 llite: optimize atomic_open of negative dentry.

No point in talking to MDS in that case if we are not creating,
just return -ENOENT.
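
The shortcut can be sketched as follows. This is a minimal illustration,
not the llite code: atomic_open_negative(), its parameters and the
NEED_MDS_RPC return value are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

#define NEED_MDS_RPC 1	/* hypothetical "go ask the MDS" return value */

/*
 * Decide how to handle an open on a dentry. If the dentry is already
 * known to be negative and the caller is not creating the file, the
 * answer is -ENOENT without any RPC to the MDS.
 */
int atomic_open_negative(bool dentry_known_negative, int open_flags)
{
	if (dentry_known_negative && !(open_flags & O_CREAT))
		return -ENOENT;

	return NEED_MDS_RPC;
}
```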

Change-Id: I15c00fdc841e5e9d4d1923b2353f7fdc5910d67b
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/21161
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-8471 obdclass: restore EXPORT_SYMBOL for lu_ref* functions

LU-5829 removed a lot of EXPORT_SYMBOLs, including those for the lu_ref_*
functions. As these functions are only compiled in when
--enable-lu_ref is passed to configure, the breakage was missed. This
patch restores the missing EXPORT_SYMBOLs that were present, except
for lu_ref_print and lu_ref_is_marker, which are only used in the
obdclass module.

Test-Parameters: trivial
Signed-off-by: Frank Zago <fzago@cray.com>
Change-Id: I16e6065b75c568a18386c0f0a746484fdad38d6e
Reviewed-on: http://review.whamcloud.com/21640
Reviewed-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>