git://git.whamcloud.com - fs/lustre-release.git/log

LU-7124 o2iblnd: limit cap.max_send_wr for MLX5

Decrease cap.max_send_wr until it is accepted by rdma_create_qp()
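
The retry idea can be sketched roughly as follows. This is not the o2iblnd
code: try_create_qp() is a hypothetical stand-in for rdma_create_qp(), the
device limit of 16384 is invented for illustration, and shrinking by a
quarter per attempt is an assumed policy.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for rdma_create_qp(): accepts a send queue
 * depth only if it does not exceed an assumed device limit. */
static int try_create_qp(unsigned int max_send_wr)
{
    const unsigned int assumed_device_limit = 16384;

    return max_send_wr <= assumed_device_limit ? 0 : -EINVAL;
}

/* Shrink the requested depth until creation succeeds.
 * Returns the accepted depth, or 0 if even a depth of 1 is rejected. */
static unsigned int create_qp_with_backoff(unsigned int max_send_wr)
{
    while (max_send_wr > 0) {
        /* Assumed policy: drop by a quarter, but at least by one. */
        unsigned int step = max_send_wr / 4 ? max_send_wr / 4 : 1;

        if (try_create_qp(max_send_wr) == 0)
            return max_send_wr;
        assert(step > 0 && step <= max_send_wr);
        max_send_wr -= step;
    }
    return 0;
}
```

A caller would pass its preferred depth and use whatever value comes back
for the rest of the connection setup.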

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ib76f07d997ea579f86fca467329ad357ed26b36f
Reviewed-on: http://review.whamcloud.com/18347
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7003 utils: must quote the value of the context option

As per mount(8) notes, the context value might contain commas,
in which case the value has to be properly quoted, otherwise
mount(8) will interpret the comma as a separator between mount
options.
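
A minimal sketch of the quoting, assuming the options are built in a
fixed-size string buffer. append_quoted_context() is a hypothetical helper
written for illustration, not the function changed by this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append ,context="<value>" to opts, quoting the value so that commas
 * inside it are not taken as option separators.
 * Returns 0 on success, -1 if the result would not fit in opts. */
static int append_quoted_context(char *opts, size_t size, const char *value)
{
    size_t used;
    int n;

    assert(opts != NULL && value != NULL);
    used = strlen(opts);
    if (used >= size)
        return -1;
    n = snprintf(opts + used, size - used, "%scontext=\"%s\"",
                 used > 0 ? "," : "", value);
    if (n < 0 || (size_t)n >= size - used) {
        opts[used] = '\0';      /* undo the partial append */
        return -1;
    }
    return 0;
}
```

With opts holding "rw" and a value of system_u:object_r:tmp_t:s0:c127,c456,
the buffer ends up as rw,context="system_u:object_r:tmp_t:s0:c127,c456".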

Signed-off-by: Frederic Saunier <frederic.saunier@atos.net>
Change-Id: I75e958da26273fe2a15ccae01a1176c3549821b8
Reviewed-on: http://review.whamcloud.com/18294
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7002 utils: SELinux context repeated in mount opts

SELinux context can only be specified once in mount options,
so append_context_for_mount() should not apply it again
when called at mount time.

Signed-off-by: Frederic Saunier <frederic.saunier@atos.net>
Change-Id: I350b67ec42691b875aab085224185f2a7582d41d
Reviewed-on: http://review.whamcloud.com/18319
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Grégoire Pichon <gregoire.pichon@bull.net>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7736 scripts: ensure lustre_rmmod unload all modules

The lustre_rmmod script unloads the lustre modules recursively, starting
from libcfs and unloading each dependent module first.

If the module dependency order causes an LND module to be unloaded
before the ptlrpc module, the LND unload fails, leaving the
lnet and libcfs modules still loaded at the end.

# modprobe lustre
# lctl list_nids
10.1.0.64@o2ib
# lustre_rmmod
Modules still loaded:
lnet/lnet/lnet.o libcfs/libcfs/libcfs.o

This patch ensures modules are all unloaded by the lustre_rmmod
script.

Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Change-Id: Id94308332a4f5f95f4617f0d5882a9e2857ee20d
Reviewed-on: http://review.whamcloud.com/18279
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Frederic Saunier <frederic.saunier@atos.net>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7727 mdt: fail FMODE_WRITE open if the client is read only

O_WRONLY/O_RDWR open on a file will get EROFS on a read only client,
but the rpc gets sent to the mdt anyway.
mdt will increase the mot_write_count of the mdt object, blocking
subsequent FMODE_EXEC open to the same file.

This patch makes sure we fail the FMODE_WRITE open with EROFS on mdt
if the open request is from a read only client.
We also do a similar check on the client so we can fail with EROFS
straight away without sending the rpc to mdt.
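
The check itself is small. The sketch below is a user-space analogue that
tests the O_ACCMODE bits of the open flags rather than the kernel's
FMODE_WRITE; check_open_on_rdonly() is a hypothetical name and the real
mdt and client code paths differ.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Return -EROFS if the open flags request write access on a read-only
 * client, 0 otherwise. */
static int check_open_on_rdonly(int open_flags, bool client_rdonly)
{
    int acc = open_flags & O_ACCMODE;

    if (client_rdonly && (acc == O_WRONLY || acc == O_RDWR))
        return -EROFS;
    return 0;
}
```

Doing the same test on both sides means the client can fail early, while
the mdt stays protected against clients that send the request anyway.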

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Change-Id: I8b08e9d100a1ab8edf2fa47d4e2ebc5170f36df5
Reviewed-on: http://review.whamcloud.com/18242
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ian Costello <icostello@ddn.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7689 obdclass: limit lu_site hash table size on clients

Allocating a big hash table using the formula for osd
does not really work for clients. A new hash table is
created for each mount on a single client, which uses
far more memory than expected.

This patch limits the hash table on clients to 8M,
which holds 524288 entries.

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Change-Id: I908fda102ec5fd46c1325e0e41f5fe291aaa3378
Reviewed-on: http://review.whamcloud.com/18048
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7608 kernel: kernel upgrade [SLES12 SP1 3.12.51-60.25]

change supported version from base SLES12 to SLES12 SP1
Update target file for new version
Add kernel config for server build
Revise one of the ldiskfs patches for sles12,
as the rhel7 version it was sharing no longer applies.
Revise lbuild for sles12 server builds.

Test-Parameters: clientdistro=sles12 testgroup=review-ldiskfs \
mdsdistro=sles12 ossdistro=sles12 mdsfilesystemtype=ldiskfs \
mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I72a6964615356a2dfa0a9b4dae49a9457ed617b0
Reviewed-on: http://review.whamcloud.com/17857
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7781 kernel: kernel update RHEL7.2 [3.10.0-327.10.1.el7]

Update RHEL7.2 kernel to 3.10.0-327.10.1.el7

Test-Parameters: clientdistro=el7 mdsdistro=el7 ossdistro=el7 \
mdsfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs \
ostfilesystemtype=ldiskfs testgroup=review-ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ic4b375e61cb069697c289c665484090787a93fe3
Reviewed-on: http://review.whamcloud.com/18478
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7024 tests: Skip squash_id if server version is older than 2.5.53

Add server version check in squash_id to skip if older than 2.5.53

Change-Id: I5e725a03873b4c776a9d90e003071ef777343901
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/17799
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7524 fld: fld_clientlookup retries next target

fld_client_lookup() retries with another target if the remote
target is inactive. This was introduced in
http://review.whamcloud.com/#/c/14313/. To get the
next export target from the list, we use:
target = list_entry(target->ft_chain.next,
                    struct lu_fld_target, ft_chain);

Now for tests that deactivate the last target,
&(target->ft_chain) is the last entry in the list, and the
next of the last entry (target->ft_chain.next) is the list_head.
The list_entry macro maps a list_head pointer back into
a pointer to the structure that contains it. Thus,
it treats the head of the list as if it were embedded in a
containing structure (lu_fld_target).
Since the head of the list does not have any data
associated with it, the containing structure (i.e. target)
formed from the head of the list does not have any valid data.
Therefore, an export target with no obd device data is generated.
This corrupted export target (generated from the head of the
list) causes the assertion.

The fix: when fld_client_lookup() retries with another target and
the next entry in the export target list is the head of the
list (&fld->lcf_targets), move to the entry after the
head (target->ft_chain.next->next) and retrieve the target.
Otherwise retrieve the next target entry (target->ft_chain.next).
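
The wrap-around can be sketched in user space as below. struct list_head,
list_entry and list_add_tail are minimal copies of the kernel helpers,
struct target is a simplified stand-in for struct lu_fld_target, and
next_target() is a hypothetical helper, not the patched function.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space copies of the kernel list helpers. */
struct list_head {
    struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Simplified stand-in for struct lu_fld_target. */
struct target {
    struct list_head ft_chain;
    int ft_idx;
};

/* Return the target after cur, skipping the list head, which is not
 * embedded in any struct target. */
static struct target *next_target(struct list_head *head, struct target *cur)
{
    struct list_head *pos = cur->ft_chain.next;

    assert(head->next != head);     /* the list must not be empty */
    if (pos == head)
        pos = pos->next;
    return list_entry(pos, struct target, ft_chain);
}
```

With two targets on the list, asking for the one after the last entry
yields the first entry again instead of a bogus structure computed from
the head.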

Seagate-bug-id: MRP-3200
Signed-off-by: Noopur Maheshwari <noopur.maheshwari@seagate.com>
Change-Id: Ia353437a315de0f1bb44d8822e836ac969b0567f
Reviewed-on: http://review.whamcloud.com/17683
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4423 libcfs: Merge linux-proc.c into module.c

module.c was previously the sole exporter of symbols from linux-proc.c
This patch removes the global symbols by merging the two files

Linux-commit: 87643abf92484074937594897145bb53efc0e77e

Signed-off-by: Matthew Tyler <matt.tyler@flashics.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I557efbfd37b5a23d2fdd8cd5a2aa399a7a494bc6
Reviewed-on: http://review.whamcloud.com/17201
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-3782 test: Fix for failure when no files are created.

test_18 of ost-pools.sh failed with a divide by zero
error when the createmany function returned zero.
Fix the issue by adding a check for the return value,
and also fix an incorrect average calculation.

Seagate-bug-id: MRP-1117
Signed-off-by: Kirtankumar Krishna Shetty <kirtan.shetty@seagate.com>
Change-Id: I13ea0c1aeb974b5e74dfc7d92fe4e3b744a1fba6
Reviewed-on: http://review.whamcloud.com/16939
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4640 mdt: implement Remove Archive on Last Unlink policy

This patch introduces RAoLU policy where an implicit remove
request will be sent to archive/Agent upon last close of an
unlinked file.
The policy can be enabled/disabled using an lprocfs tunable.
If the CDT is not running, requests will be queued.

Related tests test_26[a-c] have also been added in sanity-hsm.

This patch also contains a small fix to prevent unnecessary
progress information from being gathered for REMOVE requests.

The patch now also handles cases where an unlinked file is closed
from mdt_export_cleanup() after client eviction. A
specific test_26d has been added in sanity-hsm for this.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I7affb20b2834bcd0618412349fc3adc7f6744de0
Reviewed-on: http://review.whamcloud.com/14384
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Aurelien Degremont <aurelien.degremont@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6391 llite: Add client mount opt to ignore suppress_pings

When Lustre servers enable 'suppress_pings', all clients will stop
pinging. However, some clients may not have an external mechanism
to notify Lustre servers of node death and therefore need to
preserve the Lustre ping.

This patch provides a mount option 'always_ping' so that the
client will not stop pinging even if the server has enabled
'suppress_pings'.

Signed-off-by: Wally Wang <wang@cray.com>
Change-Id: Ia7b45e8d2dbb53f02157ef2ab1d327d9483c2455
Reviewed-on: http://review.whamcloud.com/14127
Tested-by: Jenkins
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5505 clio: revise read ahead algorithm

ras_window_len should only be updated in ras_update() by read
pattern and it can't be adjusted in ll_readahead() at all;
ras_consecutive_pages is used to detect read pattern from
mmap. It will be used to increase read ahead window length
gradually.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I78b41646ccd8d9d1c810196a8cbcf58adbcb9319
Reviewed-on: http://review.whamcloud.com/11528
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7786 tests: improve racer cleanup

On cleanup racer terminates child scripts: file_create.sh,
dir_create.sh, etc. Children of those scripts do not get terminated
that way. Long running commands, like dd, cause annoying warnings:
/mnt/lustre2 is still busy, wait one second
on attempts to umount $DIR2.

Add a trap to all child scripts so that they clean up on exit.

Seagate-bug-id: MRP-2106
Change-Id: Ie9453449ceea3657881ebc0ce1edeb9e259c848e
Signed-off-by: Lokesh Nagappa Jaliminche <lokesh.jaliminche@seagate.com>
Reviewed-on: http://review.whamcloud.com/18475
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7718 utils: lfs getstripe does not work on bind mount

As /etc/mtab does not list the original mount type and fsname
of a bind mount point, use /proc/mounts to look up the
properties of the original mount point of the bind mount.

Signed-off-by: vinayakswami hariharmath <vinayakswami.hariharmath@seagate.com>
Change-Id: Icf3644303552d56ad4e336decc5fadca581ff358
Reviewed-on: http://review.whamcloud.com/18195
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7627 tests: parse filefrag output correctly

Tests 130[bcde] failed because numbers parsed with leading zeros
were treated as octal, giving a "value too great for base" error
when the parsed lun id was greater than 0007. Removing leading spaces
from the parsed number and adding the prefix 0x in the condition solved
the issue. Sanity tests 130[bcde] can now run on 10+ OSTs.
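
The failure itself is in shell arithmetic, but the same base confusion can
be shown in C with strtol(). This is only an analogy: the helper names are
hypothetical, and treating the field as hexadecimal is an assumption drawn
from the 0x prefix mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a zero-padded field with automatic base detection (base 0).
 * A leading 0 selects octal, so "0008" stops at the '8'.
 * Returns 1 if the whole string was consumed, 0 otherwise. */
static int parse_auto_base(const char *s, long *out)
{
    char *end;

    assert(s != NULL && out != NULL);
    *out = strtol(s, &end, 0);
    return end != s && *end == '\0';
}

/* Parse the same field with an explicit base of 16 (assumed here). */
static int parse_hex(const char *s, long *out)
{
    char *end;

    assert(s != NULL && out != NULL);
    *out = strtol(s, &end, 16);
    return end != s && *end == '\0';
}
```

parse_auto_base() accepts "0007" but rejects "0008", while parse_hex()
handles both, which mirrors why the tests only broke once lun ids went
past 7.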

Seagate-bug-id: MRP-2107
Signed-off-by: Jadhav Vikram <jadhav.vikram@seagate.com>
Change-Id: I29d8c77905e5f23666b0e691cc2928d0cc858f59
Reviewed-on: http://review.whamcloud.com/17794
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6245 libcfs: remove typedefs in libcfs source code

Convert most of the typedefs used in libcfs source code
to a standard struct. Only a few are left, which will be
completely removed in later patches. The function pointer
typedefs will be the only ones remaining.

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: I5c9efa10d23904c456154d5afbb80c4197e91641
Reviewed-on: http://review.whamcloud.com/17202
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6245 libcfs: remove types.h from userland code

With the cleanup of lustre_idl.h, many of the lustre
specific types from the libcfs types.h header have
been removed. This cleanup removes the remaining
userland references, allowing us to make libcfs
types.h a kernel specific header.

Change-Id: I888ace937993bab9f63be00fd3ef5e8f3a0a1803
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16879
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5783 o2iblnd: Add Fast Reg memory registration support

FMR is deprecated and is not supported by the mlx5 driver.
This patch adds memory management extensions support as
a backup for FMR.

Change-Id: I58f01aac3cbef0edc0934d75bcf13888f84beb0d
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/17606
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7256 tests: wait current LFSCK to exit before next test

During the sanity-lfsck tests, some test cases only check the
LFSCK status on MDT0, and go ahead if the status matches the
expected one. For DNE cases, such a check may not be enough, and
may leave unfinished LFSCK instances on other MDT(s). This may
cause the following problems:

1) When moving to the next test case, the unfinished LFSCK instance
   may cause the new LFSCK instance to fail or cause other strange
   LFSCK check/repair behaviour.

2) If it is the last test case and some MDT(s) are unmounted, then
   when an unfinished LFSCK instance on another online MDT
   needs to talk with an unmounted MDT, the related RPC will
   trigger a reconnect to the unmounted MDT, and the LFSCK engine
   will hang there until that MDT is mounted again. That is the case
   hit in this ticket. In fact, we should allow the server to
   umount even though some LFSCK instances run on other nodes.
   The LU-6684 patch (http://review.whamcloud.com/#/c/17032/)
   will handle that more properly.

This patch mainly adjusts the test scripts to wait for all
the LFSCK instances to exit before the next test case.

There are two options for waiting for all the LFSCK instances to exit:

1) Check the LFSCK status via the related lproc interface on each
   target (MDT/OST) one by one.

2) Export a new interface to check the status of all the LFSCK
   instances via a single command.

We choose the latter solution because it is more convenient for the
sys-admin. The new interface is 'lctl lfsck_query'. Its usage:

lctl lfsck_query <-M | --device MDT_device> [-h | --help]
                 [-t | --type check_type[,check_type...]]
                 [-w | --wait]

options:
-M: device to query LFSCK on.
-t: LFSCK type(s) to be queried (default is all).
-w: do not return until LFSCK is no longer running.
-h: help message.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I0bdab85e47eb290bfe3605dfc37caf7ea35d186a
Reviewed-on: http://review.whamcloud.com/17406
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7038 obdclass: lu_site_purge() to handle purge-all

If the caller wants to purge all objects, then scanning
should start from the first bucket.

Change-Id: I9f75ff0bd10e1d501e7c790b03ef6a73819a96d1
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/18505
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7815 mdt: pinger should not evict MDT-MDT export

MDT-MDT export should not be added to the obd_chained_timed
list, to avoid being evicted by the pinger eviction thread.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I99ae008905b3654a9ddc66ec60c27613f9930592
Reviewed-on: http://review.whamcloud.com/18676
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7825 mdt: release parent lock correctly for rename

Release source and target parent lock correctly in
mdt_reint_rename_internal().

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Iea4f6c4571362bff700167bdf29d2833d5f30a4d
Reviewed-on: http://review.whamcloud.com/18707
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6757 ldiskfs: large EA support

For large EA support, ext4_xattr_check_names() will return -EIO. This
patch fixes that by checking whether the large EA value is saved in an
external EA inode or not.

This mod copies a small subset of http://review.whamcloud.com/16012, which
made this change in el7 ldiskfs, into sles12 ldiskfs.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I5f29eb5d87e1a3a0a928298ed3ac993c7a0bcdd1
Reviewed-on: http://review.whamcloud.com/18449
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7581 ldiskfs: RHEL7.2 fix wrong EA inode backpointer check

Port http://review.whamcloud.com/17675 for RHEL7.2:

The EA inode is linked back to the parent inode using the
i_mtime.tv_sec field. An inode number bigger than 2G gets
mangled due to sign bit extension over the high bits
of tv_sec. It causes parent backpointer checks to fail.
Add an explicit integer type conversion to ignore the high
bits of i_mtime.tv_sec.
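
The effect can be reproduced with plain integer types. The sketch below
models tv_sec as an int64_t and the inode number as a uint32_t; the
function names are hypothetical and the real ldiskfs check looks different.

```c
#include <assert.h>
#include <stdint.h>

/* Backpointer as it reads after a 32-bit value has been sign-extended
 * into a 64-bit seconds field. The conversion to int32_t wraps on
 * common compilers for values above INT32_MAX. */
static int64_t stored_backpointer(uint32_t parent_ino)
{
    return (int64_t)(int32_t)parent_ino;
}

/* Naive check: compares the full 64-bit field with the inode number. */
static int backpointer_matches_naive(int64_t tv_sec, uint32_t parent_ino)
{
    return (uint64_t)tv_sec == (uint64_t)parent_ino;
}

/* Fixed check: the explicit cast drops the sign-extended high bits. */
static int backpointer_matches(int64_t tv_sec, uint32_t parent_ino)
{
    return (uint32_t)tv_sec == parent_ino;
}
```

For an inode number such as 0x80000001 the naive comparison fails because
the stored value reads back as 0xFFFFFFFF80000001, while the cast version
still matches.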

Alexander Zarochentsev <alexander.zarochentsev@seagate.com>

Fix other code style issues and comments for upstream kernel.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I7d402f000cde5ffb1ededf7a80276538e4465757
Reviewed-on: http://review.whamcloud.com/18436
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7151 tests: fix sanity test_205 on SLES12

Invoke cancel_lru_locks to avoid read from cache.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I46c96629fbba8439439b653ef596c3fdbc96e80c
Reviewed-on: http://review.whamcloud.com/17826
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

Revert "LU-7646 o2iblnd: connrace protocol improvement"

This reverts commit a62050bbcf70831f3c16b5c61a04816c1296909b.

Change-Id: I6743a44130dae02ffcd8ca0adf43cfe3b6d461ed
Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-on: http://review.whamcloud.com/18541
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7778 osd: check if the object is destroyed

Do not increase the reference count if the object is
destroyed.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I260f9850fe325b8a5bc5693bc3e25a84eeec6da7
Reviewed-on: http://review.whamcloud.com/18509
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7766 lnet: Don't call roundup_pow_of_two on zero in LNetEQAlloc

The return value of roundup_pow_of_two() is undefined when it is
called with a zero argument, so don't call it like that.
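
A guard of this kind might look like the sketch below. roundup_pow2() is a
user-space stand-in for the kernel macro, eq_size() is a hypothetical
helper, and passing a zero count through unchanged is an assumption about
the intended behaviour.

```c
#include <assert.h>

/* User-space stand-in for the kernel's roundup_pow_of_two(); like the
 * kernel macro, it is only meaningful for n > 0. */
static unsigned int roundup_pow2(unsigned int n)
{
    unsigned int p = 1;

    assert(n > 0 && n <= 0x80000000u);
    while (p < n)
        p <<= 1;
    return p;
}

/* A count of 0 is valid and is passed through instead of being rounded. */
static unsigned int eq_size(unsigned int count)
{
    return count ? roundup_pow2(count) : 0;
}
```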

This fixes a problem introduced by commit http://review.whamcloud.com/16914
since 0 is a valid count parameter for LNetEQAlloc. Also manifesting
itself as an annoying kernel warning:
LNet: 3486:0:(lib-eq.c:85:LNetEQAlloc()) EQ callback is guaranteed to get every event, do you still want to set eqcount 1 for polling event which will have locking overhead? Please contact with developer to confirm

Change-Id: I9874d50807fff7bb3a039aa9c2eb4f9ca8565242
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/18370
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>

LU-7704 utils: check LOOP_CTL_GET_FREE against target kernel

instead of the build host.

Change-Id: I838d4c6a8ed076013aaba7bc0aa8eb434fa10be8
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/18121
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7764 kernel: kernel update RHEL 6.7 [2.6.32-573.18.1.el6]

Update RHEL 6.7 kernel to 2.6.32-573.18.1.el6

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I3742514df667277fc16b4e4a71a75f3e8b16a8f9
Reviewed-on: http://review.whamcloud.com/18395
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7724 mdd: create mdd_changelog_on and _off functions

Split mdd_changelog_on(), which took an argument for on or off, into
two distinct functions, one for on and one for off.

Signed-off-by: Ben Evans <bevans@cray.com>
Change-Id: I42f930fb6a81421c8cd42ebfe95a521007bf5df6
Reviewed-on: http://review.whamcloud.com/18223
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5092 nodemap: save id maps to targets in new index file

Creates a new nodemap index file on registered targets and modifies
it on changes to the nodemaps. On init, the nodemap/idmaps are
loaded into memory from the index file. Currently only the MGS
registers itself.

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Ie2b84e25ecc02d5d3daebf268d2f72ffaf142758
Reviewed-on: http://review.whamcloud.com/11813
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6389 utils: fix lustre_rsync read retry

The read() syscall could return less than the amount of data
requested. Retry the read call until all data is read or an
error is returned.

Even though Lustre will retry the short read internally, the
code may as well be written correctly.
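
A retry loop of this kind is commonly written as below. read_full() is a
hypothetical helper for illustration and is not taken from lustre_rsync;
retrying on EINTR is an extra assumption.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Read up to count bytes from fd, retrying after short reads.
 * Returns the number of bytes read (less than count only at end of
 * file), or -1 with errno set on error. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL);
    while (done < count) {
        ssize_t rc = read(fd, p + done, count - done);

        if (rc < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return -1;
        }
        if (rc == 0)
            break;              /* end of file */
        done += (size_t)rc;
    }
    return (ssize_t)done;
}
```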

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I712969c4f920b53fa6dc27ddcb968cb82df88a44
Reviewed-on: http://review.whamcloud.com/18275
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-2049 grant: add support for OBD_CONNECT_GRANT_PARAM

Add support for grant overhead calculation on the client side.
To do so, clients track usage on a per-extent basis. An extent is
composed of contiguous blocks.
The OST now returns to the OSC layer several parameters to consume
grant more accurately:
- the backend filesystem block size which is the minimal grant
  allocation unit;
- the maximum extent size;
- the extent insertion cost.
Clients now pack in bulk write how much grant space was consumed for
the RPC. Dirty data accounting also adopts the same scheme.

Moreover, each backend OSD now reports its own set of parameters:
- For ldiskfs, we usually have a 4KB block size with a maximum extent
  size of 32MB (theoretical limit of 128MB) and an extent insertion
  cost of 6 x 4KB = 24KB
- For ZFS, we report a block size of 128KB, an extent size of 128
  blocks (i.e. 16MB with 128KB block size) and a block insertion cost
  of 112KB.

Besides, there is now no more generic metadata overhead reservation
done inside each OSD. Instead grant space is inflated for clients
that do not support the new grant parameters. That said, a tiny
percentage (typically 0.76%) of the free space is still reserved
inside each OSD to avoid fragmentation which might hurt performance
and impact our grant calculation (e.g. extents are broken due to
fragmentation).

This patch also fixes several other issues:

- Bulk write resent by ptlrpc after reconnection could trigger
  spurious error messages related to broken dirty accounting.
  The issue was that oa_dirty is discarded for resent requests
  (grant flag cleared in ost_brw_write()), so we can legitimately
  have grant > fed_dirty in ofd_grant_check().
  This was fixed by resetting fed_dirty on reconnection and skipping
  the dirty accounting check in ofd_grant_check() in the case of
  ptlrpc resend.

- In obd_connect_data_seqprint(), the connection flags cannot fit
  in a 32-bit integer.

- When merging two OSC extents, an extent tax should be released
  in both the merged extent and in the grant accounting.

Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I9c738235583324dfae7eade034db28a8161f8ef5
Reviewed-on: http://review.whamcloud.com/7793
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7713 osd: osd-zfs should serialize destroy vs. others

otherwise we can get unexpected EEXIST from DMU at any time.
sanityn/91 hits this regularly.

Change-Id: I948413a0689f1ceae7f073a1e33adef023eb274c
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/18155
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7437 utils: continue on errors in lctl {get,set}_param

The lctl get_param, list_param, and set_param code was accidentally
changed from "handle all parameters and return any error at the end"
to "stop at the first failed parameter and return an error".

The "try all parameters" behaviour was originally implemented by
user request in 2.2.90-14-g5ad45a6 http://review.whamcloud.com/3245
but appears to have recently been broken in v2_7_63_0-45-g2d11035
by http://review.whamcloud.com/17223 moving lprocfs_param_pattern()
into the parameter handling loop rather than in the *param_display()
functions.  The new error handling for lprocfs_param_pattern() was
handled as an error, when it would previously only have been saved.
This was further aggravated by the reorganization to cfs_get_paths()
in v2_7_66_0-10-g85cbe1a.

Restore the old error handling and add tests to verify it works.
Print error messages whenever parameter processing fails before
continuing to the next parameter, so that the user can see which
parameter(s) had the error.  Include the operation type in the
messages printed by param_display(), since this is available.
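
The "handle all parameters and return any error at the end" pattern can be
sketched as follows. handle_param() and handle_all_params() are
hypothetical, the message format is illustrative only, and returning the
first error rather than the last is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical per-parameter handler: fails for names starting with '!'. */
static int handle_param(const char *name)
{
    return name[0] == '!' ? -ENOENT : 0;
}

/* Process every parameter, report each failure as it happens, and
 * return the first error seen (0 if all succeeded). The number of
 * parameters handled successfully is stored in *handled. */
static int handle_all_params(const char *const *names, int count,
                             int *handled)
{
    int rc = 0;
    int i;

    assert(names != NULL && handled != NULL);
    *handled = 0;
    for (i = 0; i < count; i++) {
        int rc2 = handle_param(names[i]);

        if (rc2 < 0) {
            fprintf(stderr, "error: %s: rc = %d\n", names[i], rc2);
            if (rc == 0)
                rc = rc2;       /* remember it, but keep going */
            continue;
        }
        (*handled)++;
    }
    return rc;
}
```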

The cfs_get_paths() patch also broke "lctl list_param -D" handling
(print only directories) by returning an error when this option was
used because display_param() returned NULL for non-directories (so
they wouldn't be printed), but the error handling treated this as
ENOMEM.  Move the handling for non-directories out of display_param
to avoid this, and add a test for "lctl list_param -D".

Fix a new memory leak in display_name() if realloc() fails.  Use the
existing param_name and don't append the type suffix in this case.
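
The usual way to avoid such a leak is to keep the old pointer until
realloc() has succeeded. append_suffix() below is a hypothetical helper
showing the pattern, not the display_name() code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Append suffix to a heap-allocated name. On allocation failure the
 * original buffer is returned untouched rather than leaked. */
static char *append_suffix(char *name, const char *suffix)
{
    size_t len, slen;
    char *tmp;

    assert(name != NULL && suffix != NULL);
    len = strlen(name);
    slen = strlen(suffix);
    tmp = realloc(name, len + slen + 1);
    if (tmp == NULL)
        return name;    /* keep the existing name, skip the suffix */
    memcpy(tmp + len, suffix, slen + 1);
    return tmp;
}
```

Assigning the result of realloc() straight back to the only pointer to the
buffer is what loses the original allocation when the call fails.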

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I710585a4b8614f00e3837560a968cd4f0c300c1e
Reviewed-on: http://review.whamcloud.com/18338
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7579 mdd: do not mark object as an orphan early

Do not set LUSTRE_ORPHAN_FL before calling __mdd_orphan_add(),
as a racing mdd_la_get() may set ORPHAN_OBJ, causing a false
assertion in __mdd_orphan_add().

Change-Id: If8a9417cdb3c0a9d1e96ac1345e841dc5fc89b53
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/18444
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7774 lnet: issue in the offset in hash table

The offset in the hash table overflows for non-wildcard portals.
Correct the offset for the non-wildcard case in the same way as
was done for the wildcard case in LU-1622.

Signed-off-by: Alyona Romanenko <alyona.romanenko@seagate.com>
Change-Id: Ib45539ade0e3ed127d82448333da8f91b3146291
Reviewed-on: http://review.whamcloud.com/18422
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7680 mdd: put migrated object on the orphan list

Do this unconditionally. mdd_finish_unlink() can't do this
properly as the la_nlink cached in ma is greater than 0. This
results in lost inodes (i.e. we still have an entry in OI,
but the corresponding inode doesn't exist) and many messages
like:
LustreError: 3458:0:(osd_handler.c:3239:osd_object_ref_del())
lustre-MDT0001-osd: nlink == 0 on [0x240000403:0x53b:0x0],
maybe an upgraded file? (LU-3915)

The patch also adds a sanity check in osd_object_release() to
ensure that nobody is trying to leave a non-destroyed object with
nlink = 0.

Change-Id: Iecfae75944854d8e9613431acb68ad17dfea90f0
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/18032
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7579 osd: move ORPHAN/DEAD flag to OSD

If a directory is unlinked while it is open, the
LUSTRE_ORPHAN_FL flag is set on the osd object, so
the remote target can get this status by dt_attr_get().

LUSTRE_ORPHAN_FL is also stored inside the
LMA (converted to LMAI_ORPHAN). In addition, this patch
removes the LUSTRE_SLAVE_DEAD_FL flag for dead
stripes of a striped directory, i.e. all
dead directories (stripes) will use LMAI_ORPHAN
in LMA to indicate the status.

Also, osp_xattr should take the real length of the
EA from the return value in the reply, instead of
using the reply buffer length.

Change-Id: I14e933b5f008981cbacbdfa478e4ec8cbebf97dc
Signed-off-by: Di Wang <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/18024
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

New tag 2.8.50

Start of 2.9 release development cycle

Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Change-Id: I2643371cba4b25743eac142849d3ffc5935b3b50

LU-7725 osp: get update reply from replied req

Only try to retrieve the update reply from a replied request,
to avoid an unnecessary error message on the console.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I7b31d3e6b2a66fcce8fc77173f2a438216573f98
Reviewed-on: http://review.whamcloud.com/18232
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7737 lod: not return -EIO during process updates log

Do not return -EIO in lod_process_updates_recovery(),
otherwise the update log might be deleted incorrectly,
especially when doing umount during recovery, see
llog_process_thread().

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I822dae9984eb044ce3d63a30d8bb24294f46dd65
Reviewed-on: http://review.whamcloud.com/18308
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6020 gss: add lsvcgssd init.d wrapper

This patch adds a trivial init.d wrapper for
lsvcgssd so that it can be started automatically.

Xyratex-bug-id: SNT-15

Change-Id: Ida6770a80ee1dd04025d6eabf4202a61179dc9be
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-on: http://review.whamcloud.com/15546
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7679 build: strengthen lustre-dkms during on-target build

This patch modifies internal scripts to strengthen
lustre[-client]-dkms package vs issues ([spl,zfs]-dkms not built,
configure errors, ...) that could be encountered during on-target
DKMS build step.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I2dc8c91ff5874228dde0bc7995e45a6dcc6973d4
Reviewed-on: http://review.whamcloud.com/18167
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Reviewed-by: Michael MacDonald <michael.macdonald@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-3531 doc: update lfs migrate and mkdir documentation.

The lfs mkdir command has its own man page. Update the main lfs man
page to point directly at that page. lfs migrate is updated to
accurately record usage and limitations.

Signed-off-by: Richard Henwood <richard.henwood@intel.com>
Change-Id: Ie247f1f267a4d55a43ce5aaa734c8b8da487771c
Reviewed-on: http://review.whamcloud.com/17392
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7623 mdt: Match up prototype and definition of mdt_hsm_cdt_control_seq_write

Change-Id: I2cc59f8d165358e5256f59106ba819dfb50db85c
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17789
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-4423 socklnd: use kernel socket sockopt apis

Change old way of ops->setsockopt or ops->getsockopt in kernel
to kernel_setsockopt or kernel_getsockopt.

Imported from mainline kernel,
commit 80db2734acbc78db12798cfb611d6acc7fe389e6

Change-Id: I996c4b8cd24bd506ffcc87e4ff5ae731c59a9109
Signed-off-by: Fredrick John Berchmans <fredrickprashanth@gmail.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17786
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-7623 lnet: Properly declare lnet_ping() forward declaration

It was missing __user attribute for the userspace pointer before.

Change-Id: I21937c153294387f14e3114d7970dc879a055cbe
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17784
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-7623 lmv: Mark lmv_hsm_ct_register/unregister uarg as __user

Since it is a userspace pointer, this makes things neater and
sparse happier.

Change-Id: I3249ecba20a2018b6ebba4d257ce918b4bd9aed1
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17783
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-7623 lmv: Properly mark lmv_fid2path uarg argument as __user

This makes sparse happy too.

Change-Id: Ice8067168af9a6d13900e6224d3224dbb6bf0541
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17782
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-7623 Update obd iocontrol methods with __user attribute

lmv_iocontrol, osc_iocontrol, mdt_iocontrol, mgs_iocontrol, ofd_iocontrol,
osc_iocontrol, osp_iocontrol and echo_client_brw_ioctl were somehow missing
the __user attribute for uarg.

Change-Id: I10603823f5856fee6ca48c2aea03273e9d29144e
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17781
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-6587 obdclass: use OBD_FREE_LARGE with OBD_ALLOC_LARGE

The change to use is_vmalloc_addr() instead of checking the allocation
size was introduced in commit 919b85d796f8, which allows trying
kmalloc() before vmalloc(), but the deprecation of OBD_FREE_LARGE()
should not have happened since this adds needless overhead.

Use OBD_FREE_LARGE() for memory allocated with OBD_ALLOC_LARGE() so
that we only need to check is_vmalloc_addr() in OBD_FREE_LARGE()
instead of every call to OBD_FREE().

Add comments to data structures using OBD_ALLOC_LARGE() memory so
that it is clear to the users that OBD_FREE_LARGE() must be used
when freeing that memory.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ief38142f6f777eec4ec0dae4ec64bfbf78b804ed
Reviewed-on: http://review.whamcloud.com/18034
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>

LU-6719 osd-zfs: Ignore EEXIST during object init

ZFS can return EEXIST if object exists but is being destroyed.

Specifically see dnode_hold_impl()

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Id99b406b2f02a1337b9f1566fba30dbced755d5d
Reviewed-on: http://review.whamcloud.com/18054
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7578 gnilnd: Return correct error on GNI_RC_ERROR_NOMEM

gni_mem_register() can now return GNI_RC_ERROR_NOMEM.
The upper layers need GNI_RC_ERROR_RESOURCE returned so that the
registration will retry.
In kgnilnd_mem_register, convert GNI_RC_ERROR_NOMEM to
GNI_RC_ERROR_RESOURCE.
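
The conversion described above amounts to normalizing one return code
before the upper layers see it. A minimal sketch, assuming stand-in
enum values (the real codes come from the GNI headers, and the numeric
values and the helper name sketch_normalize_reg_rc() are hypothetical):

```c
#include <assert.h>

/*
 * Stand-in return codes for illustration only.  The real definitions
 * live in the GNI headers; the values below are assumptions.
 */
enum sketch_gni_rc {
	SKETCH_GNI_RC_SUCCESS = 0,
	SKETCH_GNI_RC_ERROR_RESOURCE,
	SKETCH_GNI_RC_ERROR_NOMEM,
	SKETCH_GNI_RC_INVALID_PARAM,
};

/*
 * Map NOMEM onto RESOURCE so that a caller which only retries on
 * RESOURCE will also retry when the registration ran out of memory.
 * All other codes pass through unchanged.
 */
static enum sketch_gni_rc sketch_normalize_reg_rc(enum sketch_gni_rc rc)
{
	/* Only codes known to this sketch are expected here. */
	assert(rc <= SKETCH_GNI_RC_INVALID_PARAM);

	if (rc == SKETCH_GNI_RC_ERROR_NOMEM)
		return SKETCH_GNI_RC_ERROR_RESOURCE;
	return rc;
}
```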

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I117acbe7ed24447bb2cf6d36b7f4814eea05ac2d
Reviewed-on: http://review.whamcloud.com/17666
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7578 gnilnd: Handle new return code in gni_mem_register()

gni_mem_register() can now return GNI_RC_ERROR_NOMEM. Add
GNI_RC_ERROR_NOMEM to the case statement of handled return codes.

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ib591212f070b5eb15240fa4bdd247aa3deb4357a
Reviewed-on: http://review.whamcloud.com/17665
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7578 gnilnd: Add module parameter reg_fail_timeout

During network outages on very large machines, it is possible to use
up all of GART space with connections that are in purgatory waiting
to be freed when we finally make a new connection.
This mod adds a timeout parameter so that when we fail registering
memory for fma blocks for a period of time, we can bring the node down
so it is not stuck in a state of being up but unusable.
This can only happen on service nodes as there can potentially be 10s
of thousands of connections.
A recommended setting for reg_fail_timeout would be 60 - 300 seconds.
The default setting for reg_fail_timeout is -1 (disabled).
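
The timeout behaviour described above can be sketched as a small state
machine. This is an illustration under assumptions, not the gnilnd
code: the struct, the helper name, and the choice of ">=" for the
expiry comparison are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical state; names are not taken from the gnilnd source. */
struct sketch_reg_fail {
	int    timeout_sec;	/* -1 disables the check */
	bool   failing;		/* true while registrations keep failing */
	time_t first_failure;	/* time of the first failure in this run */
};

/*
 * Record the outcome of one registration attempt at time 'now'.
 * Returns true once failures have persisted for the whole timeout,
 * which is the point where the caller would bring the node down.
 * Any success in between resets the window, so transient failures
 * never trigger it.
 */
static bool sketch_reg_fail_expired(struct sketch_reg_fail *st,
				    bool failed, time_t now)
{
	assert(st != NULL);

	if (!failed) {
		st->failing = false;
		return false;
	}
	if (st->timeout_sec < 0)
		return false;
	if (!st->failing) {
		st->failing = true;
		st->first_failure = now;
		return false;
	}
	return (now - st->first_failure) >= st->timeout_sec;
}
```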

Set fail_loc 0xf002 which fails memory registrations and see that we
BUG after the required timeout.
Test that transient registration failures within the timeout period
do not cause BUG.

Signed-off-by: Chris Horn <hornc@cray.com>
Signed-off-by: Chuck Fossen <chuckf@cray.com>
Change-Id: I214b5e5a297c547f3c4675fcc263e5dd8aaed24f
Reviewed-on: http://review.whamcloud.com/17664
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7646 o2iblnd: connrace protocol improvement

This patch allows a peer that has the lower NID to win the connection
race if it has already lost the race many times.
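
A minimal sketch of such a tie-break, assuming that the peer with the
higher NID normally wins and that a per-peer counter tracks lost races.
The threshold SKETCH_MAX_RACES, the struct, and the helper name are
hypothetical and are not taken from the o2iblnd source:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RACES 20	/* hypothetical threshold */

struct sketch_peer {
	uint64_t     nid;
	unsigned int races_lost;  /* consecutive races this peer has lost */
};

/*
 * Decide whether to accept an incoming connection from 'peer' while
 * we are also trying to connect to it.  Assumption: the higher NID
 * wins by default.  A lower-NID peer is let through once it has lost
 * SKETCH_MAX_RACES times in a row, and its counter is then reset.
 */
static bool sketch_accept_racing_conn(struct sketch_peer *peer,
				      uint64_t my_nid)
{
	assert(peer != NULL);

	if (peer->nid > my_nid)
		return true;

	if (peer->races_lost >= SKETCH_MAX_RACES) {
		peer->races_lost = 0;
		return true;
	}

	peer->races_lost++;
	return false;
}
```

Bounding the number of lost races keeps a lower-NID peer from being
rejected indefinitely when both sides keep reconnecting at once.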

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: I49c8151469ff9c4019213117396c49231f6b6948
Reviewed-on: http://review.whamcloud.com/18037
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7729 target: fix process_req_last_xid() return value

process_req_last_xid() returns ptlrpc_error() on error, which
actually returns 0 to the caller by mistake.

Test-Parameters: envdefinitions=ONLY=failover_ost \
clientcount=4 osscount=2 mdscount=2 mdtcount=1 \
austeroptions=-R failover=true iscsi=1 \
testlist=recovery-mds-scale

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I136a8ef153a3ea08dcbf05e11fb412e31947be20
Reviewed-on: http://review.whamcloud.com/18245
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7584 tests: create file on single MDS in sanity test 129

In sanity test 129, it creates only one more file to check
whether the directory size exceeds the limit or not. However,
with DNE configuration, the new file might be created in a
different stripe from the previous one that hit ENOSPC.
So, directory size might not exceed the limit, which causes
the test to fail.

Since the test is for checking ldiskfs dir size parameters, the
patch just fixes it to create files on single MDS so as to make
sure creating new files will increase the directory size.

Test-Parameters: envdefinitions=ONLY=129 clientdistro=el7 ossdistro=el7 mdsdistro=el7 mdscount=2 mdtcount=4 testlist=sanity
Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I75a2437fe3a4f6b160651d8704799ce8478a0041
Reviewed-on: http://review.whamcloud.com/18192
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6824 ldiskfs: add dir htree growing warning patch

RHEL 7.2 and SLES 12 were supported after landing commit
07660ad33a7d109cced29b6400f99f25adab3f54. This patch adds
the missing ext4-give-warning-with-dir-htree-growing.patch
into the series files for both distros.

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I40b4d34de467bc933dd43e175d78e37f59d91b16
Reviewed-on: http://review.whamcloud.com/18169
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7564 osp: Do not match the lock for OSP

In DNE operation, we do not need to match the lock
in the OSP cache, so lock the remote object
exclusively on the master MDT; then other threads on the
master MDT will not be able to access the remote
object at the same time.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I69a4f243fb26f4e37857fea6fd63b650b6ad046e
Reviewed-on: http://review.whamcloud.com/18206
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6225 test: test-framework does not cleanup for failed tests

adding reset_fail_loc to error_noexit() func in test-framework
which resets fail_loc and makes sure that the next test
will be started with no error injected.

Xyratex-bug-id: MRP-2079
Signed-off-by: gaurav mahajan <gaurav.mahajan@seagate.com>
Change-Id: I8cadd21a794d0eb429aee4734d47bd56caf0b8fe
Signed-off-by: gaurav mahajan <gaurav.mahajan@seagate.com>
Reviewed-on: http://review.whamcloud.com/13692
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-3953 build: Fix duplicate snmp directory packaging

The %{_datadir}/lustre/snmp/mibs is in conflict with the later
%{_datadir}/lustre in the %files section. Fortunately, it just
prints a warning rather than aborting the process. But we can
fix that warning.

We remove the more specific %{_datadir}/lustre/snmp/mibs since
the files are already included with the more general form.

Change-Id: I293f0bf07760719f7cf3e1a963e49c007a483311
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/18191
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Brian J. Murrell <brian.murrell@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7578 gnilnd: Modify allocator flags to prevent waiting

kgnilnd currently utilizes several flags to try to prevent specific
things from causing the node to hang. This has not been enough to
prevent OOM conditions from stalling all network traffic on compute
nodes during periods where memory filling tests are run doing IO.
Based on discussions with the kernel group we are adding a new flag
__GFP_NORETRY to the allocator flags in the hopes that it prevents the
allocator from spinning forever. Change GFP_NOFS to GFP_NOIO to fully
protect against any "IO" occurring in an IO path.

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I2bcc71ebf6e8ff75d2ac41cae44387294328c74c
Reviewed-on: http://review.whamcloud.com/17663
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7468 tests: update maloo_upload.sh to create upload.tar.gz

Uploaded files are now expected to have the '.tar.gz' extension.
This patch updates maloo_upload.sh to create upload.tar.gz before
uploading.

Signed-off-by: Leonel Ochoa <leonel.ochoa@intel.com>
Change-Id: Id8b6dd08dde873fad9e85438360e451945903e9c
Reviewed-on: http://review.whamcloud.com/17344
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7703 mdd: linkea should be updated properly at migration

when we're migrating a directory and fix children's linkeas,
do this correctly - search for old fid, replace with a new one.
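
The search-and-replace step described above can be sketched on a
simplified in-memory list of links. The structures and the helper
sketch_relink() are hypothetical stand-ins; the real linkEA format and
the mdd code are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for a FID and a linkEA entry. */
struct sketch_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

struct sketch_link {
	struct sketch_fid parent;	/* FID of the parent directory */
	const char       *name;		/* name under that parent */
};

static bool sketch_fid_eq(const struct sketch_fid *a,
			  const struct sketch_fid *b)
{
	return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/*
 * Replace the parent FID of every link that points at 'old_fid' with
 * 'new_fid', leaving links to other parents untouched.  Returns the
 * number of links that were updated.
 */
static size_t sketch_relink(struct sketch_link *links, size_t count,
			    const struct sketch_fid *old_fid,
			    const struct sketch_fid *new_fid)
{
	size_t i, replaced = 0;

	assert(links != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (sketch_fid_eq(&links[i].parent, old_fid)) {
			links[i].parent = *new_fid;
			replaced++;
		}
	}
	return replaced;
}
```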

Change-Id: Ib48f73d51ca635083d733202c59a9bdcdfe116fb
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/18109
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7443 llog: remove unused and empty llog

This patch adds the ability to remove a plain llog during record
cancellation for an inactive plain llog. Previously such files
were removed only during the mount operation, which is not enough
for changelog: the current marker of the catalog could reach an
undeleted record, which causes changelog problems.

Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Seagate-bug-id: MRP-2897
Change-Id: Ic24a1643f2fb264ad1212668e382a0bbc9b735b7
Reviewed-on: http://review.whamcloud.com/17227
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5030 util: migrate lctl params functions to use cfs_get_paths()

Make the normal lctl set_param, list_param, and get_param
operations use the new cfs_get_paths() function, which
enables sysfs support alongside procfs.

Change-Id: I5817e96c3172de53930776f0891f2a642907bfde
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Wang Chao <chao.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/17466
Reviewed-by: Ryan Haasken <haasken@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7715 out: fix err_serious in out_handle

Only return err_serious before out_handle() packs the reply.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I088501019c3b79561e8a0c43609e33f3a5a7d746
Reviewed-on: http://review.whamcloud.com/18187
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7716 mdt: No is_subdir check for same dir rename

In rename, if the source and target are in the same
directory, then it does not need is_subdir check.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I03a4aff71b2c284197a8f78f6306568249162aca
Reviewed-on: http://review.whamcloud.com/18172
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7705 ldlm: make round_timeout() static

to make gcc5 happy.

Change-Id: I5e92facd497c04b2595dea3782935f2cc5791de1
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/18119
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7576 llapi: use dirname() in opendir_parent()

In opendir_parent() pass the path through dirname() so that the
resulting directory may be used with basename().
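
A minimal sketch of why this helps with trailing slashes, using the
POSIX dirname() and basename() from <libgen.h>. The helper
sketch_split_path() is hypothetical and is not the real
opendir_parent():

```c
#include <assert.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split 'path' into its parent directory and
 * final component.  POSIX dirname() and basename() may modify their
 * argument, so each one works on its own scratch copy.
 * Returns 0 on success, -1 on allocation failure or truncation.
 */
static int sketch_split_path(const char *path, char *parent, size_t psz,
			     char *name, size_t nsz)
{
	size_t len;
	char *copy1, *copy2;
	int rc = -1;

	assert(path != NULL && parent != NULL && name != NULL);

	len = strlen(path) + 1;
	copy1 = malloc(len);
	copy2 = malloc(len);
	if (copy1 != NULL && copy2 != NULL) {
		memcpy(copy1, path, len);
		memcpy(copy2, path, len);
		/* Copy the results out before the scratch buffers go away. */
		if (snprintf(parent, psz, "%s", dirname(copy1)) < (int)psz &&
		    snprintf(name, nsz, "%s", basename(copy2)) < (int)nsz)
			rc = 0;
	}
	free(copy1);
	free(copy2);
	return rc;
}
```

For the input "/mnt/lustre/dir/" this yields the parent "/mnt/lustre"
and the name "dir", because both POSIX functions ignore trailing
slashes. Stripping everything after the last '/' by hand would instead
leave an empty name.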

Add test_230i() to sanity.sh to ensure that lfs migrate -m tolerates
trailing slashes.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I330717da540618052bc5efbb5df9cbe6c4194050
Reviewed-on: http://review.whamcloud.com/17796
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7229 hsm: relax time check of sanity-hsm test_60

If the copytool and test script round clock time in different
ways, a strict time check would cause a failure.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I97ebe02d6a0cdd9425ef68e5770e63ac9968ebaa
Reviewed-on: http://review.whamcloud.com/17742
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7578 gnilnd: Revert max_immediate setting

max_immediate was changed based on performance testing for
5.2UP04 and 6.0. This caused the eager_recv path to always use vmalloc
when allocating space for new eager messages. The vmalloc path is very
slow, especially when constantly freeing at the same time across all
CPUs.

This change will also cause more messages to be governed by the
service node RDMA engine.

Modifications
max_immediate default is now 2048.
max_immediate is now read only.
eager_credits is now writeable at run time.

Signed-off-by: James Shimek <jshimek@cray.com>
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I2754d28b1f05a7aeaaeac7fc5f41f1f36568d79c
Reviewed-on: http://review.whamcloud.com/17667
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6245 libcfs: remove userland headers from libcfs.h

Currently libcfs.h is used as a master header that
contains all the needed headers. Since Lustre user
land utilities and applications no longer have a
strong dependency on libcfs.h we can remove all
the added user land headers contained in libcfs.h.

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: I6403d109875a1d42d8490a3a1c7635f2dac9fc90
Reviewed-on: http://review.whamcloud.com/16914
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6245 libcfs: make libcfs_ioctl.h and lnetctl.h uapi compliant

For UAPI headers the policy is to only have data
structures shared between user land and kernel
space. Everything that is not a data structure, except a reference
to libcfs_ioctl_data_adjust(), has been removed.
libcfs_ioctl_data_adjust() can go away when the two
module.c files for libcfs are merged. For lnetctl.h
we remove the userland-only function prototypes.

Change-Id: I4e09041a7f0b590d7eb81eda32f0bccdfb9d28ac
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/17643
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6684 lfsck: set the lfsck notify as interruptible

If the LFSCK engine is notifying the remote LFSCK engine about some
LFSCK event, such as LE_PHASE1_DONE, and the remote server (MDT
or OST) is offline, then the notification RPC will be blocked until
the remote server is online. During that time, anyone who wants to
stop the LFSCK has to wait.

To avoid this, make the LFSCK notification RPC
interruptible. Then even if some remote server is offline, the
running LFSCK can still be stopped.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ie9220bc578eb9fe1b1b804a6732fe8ecfba4affb
Reviewed-on: http://review.whamcloud.com/18082
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

New tag 2.7.66

Change-Id: I540150c9567b137ea14fb4799fa1e2e942ac6b52
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7710 test: sync all clients in recovery-small 130[acb]

In recovery-small test_130[abc]() call sync on all clients rather than
just on the client where the test script is running.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I1d7aec650d08a6fb417a5df3509b657e9ccda902
Reviewed-on: http://review.whamcloud.com/18138
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7070 tests: Skip sanity 24x based on server version

sanity test 24x tests cross MDT rename and link. Cross-MDT
rename and link was added to Lustre after the 2.7.55 tag. Thus,
only run sanity 24x for server version 2.7.56 or later.
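
The version gate described above can be sketched as a comparison of
packed version codes. The packing scheme (one byte per component) and
both helper names are assumptions made for illustration; the actual
check lives in the sanity.sh shell script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack a four-part version into one integer so that versions compare
 * in the natural order.  One byte per component is an assumption of
 * this sketch.
 */
static uint32_t sketch_version_code(unsigned int major, unsigned int minor,
				    unsigned int patch, unsigned int fix)
{
	assert(major < 256 && minor < 256 && patch < 256 && fix < 256);

	return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
	       ((uint32_t)patch << 8) | (uint32_t)fix;
}

/* Run the test only against servers at version 2.7.56 or later. */
static bool sketch_should_run_24x(uint32_t server_version)
{
	return server_version >= sketch_version_code(2, 7, 56, 0);
}
```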

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I0c2e8c8581b8499ec7f1a25092b17be29aa49c1e
Reviewed-on: http://review.whamcloud.com/17990
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7273 tests: dump stacks upon CT stop failure

This patch adds a dump of all thread stacks upon copytool stop failure
at the end of the grace period, in sanity-hsm/wait_copytools().

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I3da4876b55fbc72c941bbf75cc89819acecc82c0
Reviewed-on: http://review.whamcloud.com/16782
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6788 build: Remove build/lbuild backwards compatibility symlink

Enough time has passed since lbuild was moved to contrib to remove
the symlink that we left behind in the build directory to accommodate
Intel's build farm.

Change-Id: I4d3b6038aad0663c3030590d161b6d71d05e6d43
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/15464
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5147 doc: design docs in documentation dir

Move design documents into the ./Documentation directory.
Update references to design documentation in the source code.
Minor readability updates to ldiskfs.txt.

Signed-off-by: Richard Henwood <richard.henwood@intel.com>
Change-Id: Ia4d1662225d019358876caade6f564c48f450fff
Reviewed-on: http://review.whamcloud.com/10618
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Christopher J. Morrone <morrone2@llnl.gov>

LU-7309 lod: notify client retry creation

In lod_alloc_rr, if there are no available OSTs on which to allocate
the object required by some client and an OSP is connecting
to an OST at the same time, then it should tell the client to
retry the creation request later.

Change-Id: I6740edf830dbe736e33e24c92387df371f070570
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/17839
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7623 lov: Get rid of an ugly statfs hack in lov_iocontrol

For some crazy reason ll_obd_statfs decided to decode async flag
passed from userspace and then pass it via a userspace pointer
argument to lov_iocontrol.
This patch moves flags decoding to lov_iocontrol where it belongs.

Change-Id: I1b54e778d60b878fc3fc463c256aad360b2cab21
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17780
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-7623 lnet: Get rid of IOC_LIBCFS_PORTALS_COMPATIBILITY ioctl

This has been unused for ages and could be safely removed now.

Change-Id: I89af1bcce77119780de623b69ee1c74da1bfcce2
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17779
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-7623 lnet: Get rid of IOC_LIBCFS_DEBUG_PEER hack

IOC_LIBCFS_DEBUG_PEER was added back in the stone ages to print debug
statistics on a peer when a peer timeout happens.
Redo it properly as a separate LNet API call.
Also get rid of "ioctl" forwarding into the underlying LNDs,
since no current LNDs implement this function anymore.

Change-Id: I3ec68a28faf840eb67d6084aa0fa5dcbbe2d7567
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17778
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>

LU-3538 dne: Commit-on-Sharing for DNE

This patch contains three parts:
1. Sync-on-Cancel for cross-MDT locks, which eliminates the
   dependency between transactions and a distributed transaction
   that modified a remote object. This guarantees that the changes
   of the distributed transaction will not be lost.
2. Enable Commit-on-Sharing for DNE. PW/EX locks will be converted
   to COS locks, but they are ignored by default. When an operation
   finds that it is a distributed transaction, it will lock with
   the LDLM_FL_COS_INCOMPAT flag to check against existing COS locks.
   This eliminates the dependency between a distributed transaction
   and transactions which modify the same local object, and it
   guarantees that a distributed transaction can always be recovered.
3. Striped directory creation needs to ensure that its parent is
   permanent on disk. To ensure this, cache child locks in mkdir.

Sync-on-Cancel for cross-MDT lock

When two operations depend on the same object and the first
operation holds a PW/EX cross-MDT lock on it, trigger a
transaction commit on the MDT where the object resides to
eliminate the dependency. In short, this patch eliminates the
dependency between locks and an existing PW/EX cross-MDT lock.

This patch contains the following changes:
* enable Sync on Cancel for DNE by default.
* save the cross-MDT lock into tgt_uncommitted_soc_locks after use;
  it will be released upon transaction commit. Note that only
  a lock refcount is taken when the lock is saved; the read/write
  count is released in mdt_object_unlock().
* the saved cross-MDT lock will be discarded upon BAST,
  because the MDT where the object resides will do sync on lock
  cancel.
* use the existing BLOCKING_SYNC_ON_CANCEL mechanism to commit
  the transaction upon cross-MDT lock cancel.

Commit-on-Sharing for DNE

On DNE, Commit-on-Sharing is disabled by default, but an MDT local
PW/EX lock will be saved as a COS lock. Such a lock is ignored in
the compatibility check by default unless the check is required,
which happens in two situations:
1. when a distributed transaction locks a local object, it will
   conflict with COS locks.
2. when a distributed transaction enqueues a cross-MDT lock, it will
   conflict with COS locks.

This patch contains the following changes:
* on DNE, a local PW/EX lock is converted to COS and saved like
  before even when COS is not enabled.
* the above COS locks will be ignored in the lock compatibility
  check by default, so for local operations COS won't take effect.
  But if an operation finds that it may modify a remote MDT object,
  it will take all local locks with COS checked.
* a cross-MDT lock will always conflict with COS locks.
* if the operation is a reint, it will check whether it is a
  distributed operation (the involved objects are remote or striped);
  if so, it checks against COS locks when enqueuing locks.
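
The compatibility rule described in this section can be sketched as follows.
This is a simplified Python model, not the LDLM code: the Lock class, the
mode string, the FL_COS_INCOMPAT constant (standing in for
LDLM_FL_COS_INCOMPAT) and the function name are all hypothetical, and the
normal lock mode compatibility matrix is deliberately left out.

```python
from dataclasses import dataclass

LCK_COS = "COS"        # mode name used for illustration
FL_COS_INCOMPAT = 0x1  # stands in for LDLM_FL_COS_INCOMPAT


@dataclass
class Lock:
    mode: str


def conflicts_with_cos(granted, request_flags, request_cross_mdt):
    """Return True if a new enqueue must wait for the granted COS lock."""
    if granted.mode != LCK_COS:
        # Ordinary mode compatibility is out of scope for this sketch.
        return False
    # A cross-MDT enqueue always conflicts with COS locks.
    if request_cross_mdt:
        return True
    # A local enqueue conflicts only when the operation found itself to
    # be distributed and asked for COS locks to be checked.
    return bool(request_flags & FL_COS_INCOMPAT)
```

In this model a purely local operation passes no flag and never waits on a
COS lock, which matches the statement that COS does not take effect for
local operations by default.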

Eliminate dependency in dir creation

Mkdir needs to take a lock on child, so that any subsequent
distributed operation using that directory would observe a conflict
and ensure that the original mkdir is committed.

Benchmark results with createmany/unlinkmany are as follows:
          mkdir  rmdir   open  unlink  mknod  unlink  (ops/sec)
2.6        1194   1310   1314    1185   2242    1396
master      978   1166    937    1028   1681    1202
current     930   1161    918    1018   1691    1202

* 10 createmany/unlinkmany processes running on a local client
  (on the MDS), 4M dirs/files created/unlinked; the numbers are
  the average over the 10 processes.
* for 2.6, each process is running on a separate mountpoint.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I91928d097cbb26bd1e1089c3f8851ac6a6440a69
Reviewed-on: http://review.whamcloud.com/12530
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7638 recovery: do not abort update recovery.

When normal recovery times out and update replays
remain in the queue, keep the exports of the other
MDTs and continue update replay until recovery is
manually aborted.

Add tdtd_recovery_threads_count/waitq to manage
the update recovery threads (which retrieve the
update logs), so that during abort these threads
are stopped first and the update replay requests
in the list can then be cleaned up.
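
A rough sketch of the counter-plus-wait-queue pattern described above, in
Python with threading.Condition playing the role of the wait queue. The
class and method names are hypothetical; only the idea (count the running
recovery threads, wait for the count to reach zero during abort, then clean
up the queued replay requests) comes from the commit message.

```python
import threading


class UpdateRecovery:
    def __init__(self):
        self._cond = threading.Condition()  # plays the role of the waitq
        self._threads_count = 0             # running recovery threads
        self._stop = threading.Event()
        self.replay_reqs = []

    def start_thread(self, work):
        # Count the thread before it starts so abort() cannot miss it.
        with self._cond:
            self._threads_count += 1
        t = threading.Thread(target=self._run, args=(work,))
        t.start()
        return t

    def _run(self, work):
        try:
            for item in work:
                if self._stop.is_set():
                    break
                with self._cond:
                    self.replay_reqs.append(item)
        finally:
            # Drop the count and wake anyone waiting in abort().
            with self._cond:
                self._threads_count -= 1
                self._cond.notify_all()

    def abort(self):
        """Stop the threads, wait for them, then drop queued requests."""
        self._stop.set()
        with self._cond:
            self._cond.wait_for(lambda: self._threads_count == 0)
            dropped = len(self.replay_reqs)
            self.replay_reqs.clear()
        return dropped
```

Waiting for the count to reach zero before clearing the list is what keeps
a still-running thread from appending to a list that is being cleaned up.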

Fix the negative recovery time console message.

Add test cases replay-single 119 and 120 to verify
these cases.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Iedcc4922f1500aedec664ff70266b6d2e9f812de
Reviewed-on: http://review.whamcloud.com/17885
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7490 recovery: abort update recovery once fails

If update or MDT-MDT recovery fails, then we abort
the replay and resend, because further updates might
cause filesystem or llog corruption.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Icc7241e94159f7f46a99fb003643605fe2a13c8d
Reviewed-on: http://review.whamcloud.com/17199
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7054 o2iblnd: less intense allocating retry

ko2iblnd may retry too frequently when growing pools: all schedulers
spin while another thread is in the middle of allocating a new pool
and cannot finish right away because of high system load.
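
The general idea can be sketched as below: when a thread sees that someone
else is already growing the pool, it sleeps for a short interval before
retrying instead of looping immediately. This is a Python illustration
under assumed names (Pool, retry_interval, grow_size); it does not
reproduce how ko2iblnd implements the delay.

```python
import threading
import time


class Pool:
    def __init__(self, grow_size=4, retry_interval=0.01):
        self._lock = threading.Lock()
        self._free = []
        self._growing = False          # True while one thread allocates
        self._grow_size = grow_size
        self._retry_interval = retry_interval
        self._next_id = 0
        self.retries = 0               # how often a caller had to back off

    def _allocate(self):
        # Stand-in for a slow allocation; only one grower runs at a time.
        start = self._next_id
        self._next_id += self._grow_size
        return list(range(start, self._next_id))

    def get(self):
        while True:
            with self._lock:
                if self._free:
                    return self._free.pop()
                grower = not self._growing
                if grower:
                    self._growing = True
                else:
                    self.retries += 1
            if grower:
                items = self._allocate()   # done outside the lock
                with self._lock:
                    self._free.extend(items)
                    self._growing = False
            else:
                # Another thread is allocating: back off, do not spin.
                time.sleep(self._retry_interval)
```

The sleep trades a small amount of latency for not burning CPU on every
scheduler while the single allocating thread is delayed.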

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: I21be43c6f77b1ae13d500ecbd6795b6d0099d2f1
Reviewed-on: http://review.whamcloud.com/16470
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7635 utils: Fix lhsmtool_posix interval reporting

At specified time intervals lhsmtool_posix reports how much data it
has written. It should report how much data has been written since the
last update, but it reports the total data written.
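
The intended behaviour amounts to remembering the total at the previous
report and printing the difference. A small Python sketch, with a
hypothetical ProgressReporter class that is not part of lhsmtool_posix:

```python
class ProgressReporter:
    def __init__(self):
        self.total_written = 0
        self._last_reported = 0   # total at the time of the last report

    def add(self, nbytes):
        self.total_written += nbytes

    def interval_report(self):
        """Return the bytes written since the previous report."""
        delta = self.total_written - self._last_reported
        self._last_reported = self.total_written
        return delta
```

The bug described above corresponds to returning total_written from
interval_report() instead of the difference.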

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I0e85b81fa2a8cf16474cc832bca30bf1425fa81c
Reviewed-on: http://review.whamcloud.com/17878
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Robert Read <robert.read@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7666 llog: use correct size when freeing log header

In llog_cat_new_log() pass the allocated size of the llog header to
OBD_FREE_LARGE().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ib8a2ae8608918a9913b01dda967365cd9f7a3925
Reviewed-on: http://review.whamcloud.com/18009
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7482 tests: fix uninitialized value in llapi_hsm_test

llapi_hsm_user_request_alloc doesn't zero the memory, so all the
fields in the returned structure must be set.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ib2d99138a5bab6253c00da5d48ebb90e9679e235
Reviewed-on: http://review.whamcloud.com/17364
Reviewed-by: Aurelien Degremont <aurelien.degremont@cea.fr>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7039 llog: update llog header and size

Once an update request fails due to eviction or another failure,
all update requests in the sending list should fail as well,
because after the failure the update log in the following
requests will have the wrong llog bitmap. When this happens:

1. All requests in the sending list are invalidated.
2. lod_sub updates the llog header from the remote target.
3. The sending list can then accept new requests.

Also a few other fixes for llog corruption:

1. The size in the OSP cache is not safe, because no lock
protects it. So add lgh_write_offset to the loghandle to
track the write offset for the remote update llog, and revalidate
the offset when updating the llog header.

2. Roll back the lgh_index and bitmap if adding new records
fails.
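
The rollback in the second fix can be pictured with the following Python
sketch. LogHandle, its fields and the write callback are hypothetical
stand-ins; only the idea of restoring the index and the bitmap bit when the
write fails is taken from the commit message.

```python
class LogHandle:
    def __init__(self, nrecords=8):
        self.index = 0                    # plays the role of lgh_index
        self.bitmap = [False] * nrecords  # one bit per record slot
        self.records = {}

    def add_record(self, rec, write):
        """Reserve the next slot, write the record, undo on failure."""
        if self.index + 1 >= len(self.bitmap):
            raise IndexError("log full")
        saved_index = self.index
        self.index += 1
        idx = self.index
        self.bitmap[idx] = True
        try:
            write(idx, rec)               # may raise OSError
        except OSError:
            # Roll back so the header matches what is really in the log.
            self.bitmap[idx] = False
            self.index = saved_index
            raise
        self.records[idx] = rec
        return idx
```

Without the rollback, a failed write would leave a set bit and an advanced
index for a record that does not exist, so the next record would land in
the wrong slot.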

Add replay-single.sh 118 to verify the case.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I2d3a700d3363867ac60aeb6b7641eceb65dfe12a
Reviewed-on: http://review.whamcloud.com/16969
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-7324 lnet: Use after free in lnet_ptl_match_delay()

In lnet_ptl_match_delay() we check msg->msg_rx_delayed to see whether
the message has been added to the delay queue. But this check is done
after lnet_ptl_unlock() and lnet_res_unlock(), and the message can be
processed and freed before the check.

Replace the check with checking rc against LNET_MATCHMD_NONE, which
is how the callers of lnet_ptl_match_delay() know whether the message
was added to the delay queue. To make this work we reset rc in the
loop when there was no match and the message hasn't been delayed. In
addition reorganize the code and add comments to clarify the logic.

In lnet_ptl_match_md() a similar msg->msg_rx_delayed check is replaced
for the same reason.

Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: Ifbc6573664fdc4849b9155b6102c8589e692996b
Reviewed-on: http://review.whamcloud.com/17840
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>