git://git.whamcloud.com - fs/lustre-release.git/log

LU-6020 kerberos: proper sg list initialization

This patch adds sg_init_table() calls so that sg lists are
properly initialized, including magics, table sizes, etc.

Without it, kernels built with the CONFIG_DEBUG_SG option
can crash as follows:

kernel BUG at include/linux/scatterlist.h:65!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/system/cpu/online
CPU 0

Pid: 4911, comm: ptlrpcd_3 Not tainted 2.6.32-431 #7 /D525MWV
RIP: 0010:[<ffffffffa0b60170>] [<ffffffffa0b60170>] krb5_make_checksum+0x750/0x770 [ptlrpc_gss]

Change-Id: Ic6c52c8b15393d8d7f67f4bf675c1f57cf27004a
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-on: http://review.whamcloud.com/13631
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>

LU-6030 ldiskfs: clean up ext4-fiemap patch

Move the contents of the ext4-fiemap patch into osd-ldiskfs so
that the patch itself can be removed entirely.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I639733f6f106398bbc3d5e2ffc6fa8a06ffe867f
Reviewed-on: http://review.whamcloud.com/13571
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6215 osc: use list_for_each_entry_safe() when delete items

Since we will remove items off the list using list_del_init() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().
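A minimal userspace sketch of why the safe variant matters is shown
below. It is not the osc code: struct item, item_of() and remove_odd()
are made up for illustration, and the list helpers only imitate the
kernel ones. The loop saves the next pointer before the body can
unlink the current entry.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink the entry and point it back at itself, like list_del_init(). */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;
}

/* Hypothetical list member used only for this illustration. */
struct item {
	int value;
	struct list_head link;
};

#define item_of(p) \
	((struct item *)((char *)(p) - offsetof(struct item, link)))

/* Remove every odd-valued item; returns the number removed. */
static int remove_odd(struct list_head *head)
{
	struct list_head *pos, *tmp;
	int removed = 0;

	/* tmp caches pos->next BEFORE the body can unlink pos. */
	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		if (item_of(pos)->value % 2) {
			list_del_init(pos);
			removed++;
		}
	}
	return removed;
}
```

With a plain forward walk, list_del_init() would leave pos->next
pointing at pos itself, so the loop could never reach the head again.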

Linux-commit: f13ab92effb94c8fc5eade75f6f246facd7ef5be

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I6ec6d8073da6e0aa45e9d8a6ee7cde84ed9cab07
Reviewed-on: http://review.whamcloud.com/13956
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6261 gnilnd: Cray interconnect rollup

I am leaving a few lines in structure definitions that are
longer than 80 columns. It's not the time to reformat the
whole structure.
-------------------------------------------------------------
Subject: Update debug messages for rca and quiesce events
Description:
Change informational message when receiving down event for
better tracking of RCA event issues to display under console
logging.
Clarify the message printed when we receive connection request
from a down node.
Simplify quiesce messages to just report the start and end of
quiesce.
-------------------------------------------------------------
Subject: Limit fma block allocations.
Description:
Under network pressure whereby thousands of nodes need to
reconnect all at the same time, routers can run out of memory
allocating fma blocks for mailboxes since the previous ones
cannot be cleaned up until a new connection is established.
Limit the amount of fma blocks that can be allocated to three
quarters of total memory. This leaves memory free for other
allocations, which tend to be much smaller than the mailboxes.
This should only be needed on service nodes.
Clean up some whitespace in kgn_data_t.
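The three-quarters limit on fma block allocations can be sketched as
a simple admission check. This is only an illustration under stated
assumptions: demo_fma_alloc_allowed() is a hypothetical name, and the
units (bytes here) and the exact accounting used by kgnilnd are not
given in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: may we allocate `request` more bytes of fma
 * blocks when `allocated` bytes are already in use and the node has
 * `total` bytes of memory?
 */
static bool demo_fma_alloc_allowed(uint64_t allocated, uint64_t request,
				   uint64_t total)
{
	/* Divide first so that total * 3 cannot overflow. */
	uint64_t limit = total / 4 * 3;

	/* Written without additions so that the check cannot wrap. */
	return request <= limit && allocated <= limit - request;
}
```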
-------------------------------------------------------------
Subject: Double deregistration error.
Description:
lustre:18920 introduced a bug which causes us to deregister
the same memory twice when the transfer is unaligned.
Clean up the tx_buffer_copy after a deregistration so that
kgnilnd_rdma can properly register the memory on the retry.
-------------------------------------------------------------
Subject: Stack reset is causing pings to timeout instead of
failing immediately.
Description:
It is possible to register with the same MDD after a stack
reset causing pings to timeout instead of failing right away.
During a stack reset, we need to deregister with a hold
timeout set so we don't use the same mdd after the stack reset
is complete.
This was found by gnilnd regression test 110c.
-------------------------------------------------------------
Subject: Post rdma resource error
Description:
Handle kgni_post_rdma resource error by unmapping the tx and
putting it back on the TX_MAPQ.
Also fixed:
The fast_reconn variable check was using the pointer instead of
its value.
A bug that caused a stall when calling
kgnilnd_wakeup_rca_thread() when a regression test causes
startup failures and the rca thread has not started yet.
Only call sock_release if the socket was created.
Changed some stats prints to print unsigned values so they
don't show as negative.
-------------------------------------------------------------
Subject: limit kgnilnd conns in purgatory
Description:
Currently kgnilnd allows for an infinite number of connections
in purgatory, which in the face of a missed rca event can
cause nodes to slowly run out of memory from continued timed
out connection requests to those halted or dead nodes.
This mod makes the following changes to alleviate this issue:
1. Add a module parameter and live tunable allowing us to
   limit the number of connections per peer held in purgatory.
2. Remove the fast reconnect path on the server by making
   that tunable contain different settings for computes
   and service nodes. fast_reconnect is on for computes and
   off for service nodes. This setting can be changed on a
   live system.
3. In the kgnilnd reaper code utilize the tunable and remove
   the oldest purgatoried connections as new connections are
   put into purgatory. This will keep memory usage down and
   allow a system to stay up in the face of nodes being down
   and rca not informing us that they are down.
-------------------------------------------------------------
Subject: Update kgnilnd to be KNC aware.
Description:
Kgnilnd currently ignores rt_accel nodetype events coming from
RCA. This is incorrect as KNC's down and up events are
reported as rt_accel.
Since we currently ignore rt_accel events this causes us to
continually attempt to talk to down KNC nodes.
With this mod we now recognize rt_accel events, allowing us to
prevent communications with down KNC nodes.
-------------------------------------------------------------
Subject: Always notify LNET on GNILND_RCA_NODE_DOWN
Description:
When an LNET router fails it can take router_ping_timeout +
live_router_check_interval seconds for all peers to detect the
down router. For peers on a gni network this can be over two
minutes. During this time peers will continue to use the
failed router.
In some situations gnilnd will receive an event from RCA
notifying that the node is down within 30 seconds of the node
failure. This is much faster than relying on the router
pinger, so gnilnd should call lnet_notify() to notify LNET,
upon receipt of the RCA event, that a peer is down.
-------------------------------------------------------------
Subject: Add fast reconnect path and update lnet_notify last
alive timestamp.
Description:
A lustre client can time out a router during a blade failure
which causes multiple quiesce cycles.
When we time out a connection, reconnect even if there are no
tx's waiting to be sent. This causes an lnet_notify up
notification so we don't need to wait for the router pinger to
bring the connection back up.
At the end of a quiesce, call lnet_notify that the peer is
still up which updates the last alive timestamp.
Various debug message cleanup.
-------------------------------------------------------------
Subject: gnilnd proc_dir_entry port - part 2
Description:
PDE_DATA is defined by libcfs in Cray-master and therefore
only needed by b2_5
-------------------------------------------------------------
Subject: gnilnd proc_dir_entry port
Description:
In SLES12 create_proc_entry and create_proc_read_entry have
been removed, and struct proc_dir_entry is no longer public.
This mod ports all proc functions to use seq_file.
-------------------------------------------------------------
Subject: Remove system.h from gnilnd
Description:
There is no longer system.h for x86 and gnilnd doesn't seem to
need it.
Remove it from gnilnd include.
-------------------------------------------------------------

Signed-off-by: Chuck Fossen <chuckf@cray.com>
Change-Id: Iad14538751cc50fbd03fd3d4876ca41f4c0a223f
Reviewed-on: http://review.whamcloud.com/13812
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4727 hsm: use IOC_MDC_GETFILEINFO in restore

Use IOC_MDC_GETFILEINFO rather than fstatat() to get the original file
attributes during restore. Add test_12p to sanity-hsm to check that
triggering an implicit restore from the copytool's own mount point
does not wedge the copytool.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I1b1eeb703c60907a2759fdb6d8fb8728a13f8918
Reviewed-on: http://review.whamcloud.com/13750
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6049 obdclass: Add synchro in lu_context_key_degister()

When unloading a module, it may happen that lu_context_key_degister()
removes a key while a thread is either registering it in a new
context (lu_context_init(), lu_context_refill()), or using it when
exiting from a context (lu_context_exit(), lu_context_fini()).

In these cases, we reference a key which no longer exists, and
the system crashes either because we use a *POISON'ed* pointer
in key_fini() -> key->lct_fini(), or because one of the following
assertions fails:
- lu_context_key_degister():
        ASSERTION(cfs_atomic_read(&key->lct_used) == 1)
                  failed: key has instances: 2

- lu_context_exit():
        ASSERTION(key != NULL)

- key_fini():
        ASSERTION(atomic_read(&key->lct_used) > 1)

This can also lead to SLAB objects which are not freed:
        slab error in kmem_cache_destroy(): cache `echo_thread_kmem':
                   Can't free all objects

Note: ptlrpc service threads need to call lu_context_init/fini in
each loop (for each RPC), and this could be a big performance issue
on fat SMP machines if we add serialization by a spinlock and need
to lock/unlock it multiple times for each RPC.

So the aim of this patch, which only impacts some infrequently used
functions, is:
1) to add a synchronization in lu_context_key_quiesce(), also called
    by lu_context_key_degister(), to wait until all key::lct_init()
    methods have completed, by serializing with keys_fill()
2) to add a synchronization in lu_context_key_degister(), to wait
    until all transient contexts referencing this key have run
    key::lct_fini() method

Signed-off-by: Patrick Valentin <patrick.valentin@bull.net>
Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Change-Id: Id4ad974e8c7b8053d6e35ebce60cfbcf91dc230b
Reviewed-on: http://review.whamcloud.com/13164
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6203 tests: early lock cancel to allow early copytool death

Since the copytool death check and timing were introduced by the
patch for LU-5622, sanity-hsm/test_251() has failed several times
because lock cancel delays the copytool death, leading to a
timeout.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I399b37854b98626c4c92a367d543b79aebf9eb4e
Reviewed-on: http://review.whamcloud.com/13646
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6321 lfsck: make lfsck_namespace trace file as index

Originally, the "lfsck_namespace" file stored both the namespace
LFSCK statistics and the FIDs to be double scanned. To improve
namespace LFSCK performance (since Lustre-2.7), we switched to
multiple trace files named "lfsck_namespace_xx". From then on, the
original "lfsck_namespace" file only needed to record the namespace
LFSCK statistics, so we made it a regular file, NOT an index file.
That change causes trouble when downgrading to Lustre-2.6 or older,
because the old namespace LFSCK needs an index trace file instead of
a regular file. To avoid the compatibility issues, we will keep the
"lfsck_namespace" file as an index file on b2_7 and newer releases.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I76d8b1416c4c507793aa9bbab2d52cc7d8daa440
Reviewed-on: http://review.whamcloud.com/13945
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6307 obdclass: distinguish MGC/MDT connection properly

In commit 5f8847bca12afb798de600299356ed2e3655a53e, we introduced
version checking for the MDT-MDT connection. But there is a corner
case: the MGC sets OBD_CONNECT_MNE_SWAB (which is defined with the
same value as OBD_CONNECT_MDS_MDS) in the connection flags to handle
Imperative Recovery interoperability issues with the MGS. So the
server needs to tell whether the connection really comes from another
MDT or from the MGC by checking OBD_CONNECT_FID (which is not set for
the MGC-MGS connection).
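The check described above can be sketched as a plain flag test. The
bit values below are placeholders chosen for this example, not the
real OBD_CONNECT_* values, and demo_is_mdt_mdt() is a hypothetical
helper; the only property taken from this commit message is that the
MNE_SWAB and MDS_MDS flags share one bit and that the FID flag is not
set on MGC-MGS connections.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, NOT the real OBD_CONNECT_* definitions. */
#define DEMO_CONNECT_MDS_MDS	(1ULL << 0)
#define DEMO_CONNECT_MNE_SWAB	DEMO_CONNECT_MDS_MDS	/* same bit */
#define DEMO_CONNECT_FID	(1ULL << 1)

/*
 * Treat the connection as MDT-MDT only when the shared bit is set
 * AND the FID flag is present; an MGC sets the shared bit without
 * the FID flag.
 */
static bool demo_is_mdt_mdt(uint64_t flags)
{
	return (flags & DEMO_CONNECT_MDS_MDS) != 0 &&
	       (flags & DEMO_CONNECT_FID) != 0;
}
```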

Test-Parameters: envdefinitions=ONLY=105 clientjob=lustre-b2_6 clientbuildno=19 testlist=recovery-small
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9cee743d5474702b77adbb8c3dedd6c19faef15f
Reviewed-on: http://review.whamcloud.com/13927
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5760 rpm: remove Red Hat specific check for init scripts

The issue with building under mock-based environments is related to
a sloppy heuristic that checks for the existence of two files under
/etc and assumes that is a good way to identify a Red Hat system.
We had a concern about this for other systems.

So, let's remove this Red Hat specific check of /etc files.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ibc6af75ebea51b39d5ff4c8473db2e3828ffea68
Reviewed-on: http://review.whamcloud.com/12377
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6173 llite: allocate and free client cache asynchronously

Since an inflight request holds an import refcount as well as an
export refcount, sometimes obd_disconnect() in
client_common_put_super() can't put the last refcount of the OSC
import (e.g. due to network disconnection). This causes cl_cache
to be accessed after it has been freed.

To fix this issue, ccc_users is used as the cl_cache refcount, and
lov, llite, and osc each hold one cl_cache reference. This also
avoids the race in which a new OST is added to the system when the
client is mounted.
The following cl_cache functions are added:
- cl_cache_init(): allocate and initialize cl_cache
- cl_cache_incref(): increase cl_cache refcount
- cl_cache_decref(): decrease cl_cache refcount and free the cache
if refcount=0.
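The lifetime rule behind these three functions can be sketched in
userspace C11. This is an illustration only: struct demo_cache and
the demo_cache_*() names are hypothetical stand-ins, the counter
plays the role of ccc_users, and the real cl_cache carries more state
than shown here.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct demo_cache {
	atomic_int users;	/* plays the role of ccc_users */
};

static int demo_cache_freed;	/* counts frees, for demonstration */

/* Allocate the cache; the creator holds the first reference. */
static struct demo_cache *demo_cache_init(void)
{
	struct demo_cache *c = malloc(sizeof(*c));

	if (c != NULL)
		atomic_init(&c->users, 1);
	return c;
}

static void demo_cache_incref(struct demo_cache *c)
{
	atomic_fetch_add(&c->users, 1);
}

/* Drop one reference; whoever drops the last one frees the cache. */
static void demo_cache_decref(struct demo_cache *c)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&c->users, 1) == 1) {
		free(c);
		demo_cache_freed++;
	}
}
```

Because each layer drops only its own reference, the cache stays
valid until the last holder is done, whichever layer that is.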

Also, the fix for LU-2543 is no longer needed, so it is reverted.

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: I22ff10b683b683d49d603e5dc2de3397746a79bb
Reviewed-on: http://review.whamcloud.com/13746
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5393 osd-ldiskfs: read i_size once to protect against race

There have been several occurrences of ASSERTION(local_nb[i].rc == 0)
failures in ost_brw_read(), where the inode's i_size has changed due
to a racing write/growth beyond EOF. osd_read_prep() must protect
itself against this legal behavior by only reading i_size once.
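The "read once" idea can be sketched as follows. This is not the
osd_read_prep() code: struct demo_inode and demo_bytes_before_eof()
are hypothetical, and the point is only that every decision in the
function is derived from one snapshot of the size.

```c
#include <stdint.h>

/* Hypothetical inode whose size may grow under a concurrent writer. */
struct demo_inode {
	volatile int64_t size;
};

/* How many bytes of [off, off + len) lie below EOF. */
static int64_t demo_bytes_before_eof(const struct demo_inode *ino,
				     int64_t off, int64_t len)
{
	/* Single snapshot: both tests below use the same value. */
	int64_t isize = ino->size;

	if (off >= isize)
		return 0;
	return len < isize - off ? len : isize - off;
}
```

If the size were re-read between the two tests, a racing extension
could make them disagree about where EOF is.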

Also removed the declaration and usage of local variable m, which
appear to be outdated.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I5d931d5254b970e7031363f37114d0bad8b573fa
Reviewed-on: http://review.whamcloud.com/13707
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4306 test: bump grace time in test_4a of s-q

Use a longer grace time in test_4a of s-q to make it more
tolerant of timing.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I8580779fe2f7d2f4bb8e119be78b574fb6ac01cb
Reviewed-on: http://review.whamcloud.com/13704
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5937 lfs: ensure a valid directory size in lfs find

For a striped directory (and a striped file) the size returned by
LL_IOC_MDC_GETINFO may not be valid. In cb_find_init(), if the size
of a directory is needed, get it by calling fstat().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Iddb9aa8e6664a09ff866a3995741cae17e1c9962
Reviewed-on: http://review.whamcloud.com/13456
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6070 libcfs: provide separate buffers for libcfs_*2str()

Provide duplicates with separate buffers for libcfs_*2str() functions.

Replace libcfs_nid2str() with libcfs_nid2str_r() function in critical
places.
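The difference between the two flavours can be sketched with a toy
formatter. demo_id2str() and demo_id2str_r() are hypothetical, and a
single static buffer is assumed here as the simplest form of shared
storage; the actual buffering inside libcfs is not described in this
log.

```c
#include <stdio.h>

/* Shared-storage flavour: every call returns the same buffer. */
static char *demo_id2str(unsigned int id)
{
	static char buf[16];

	snprintf(buf, sizeof(buf), "id-%u", id);
	return buf;
}

/* Reentrant flavour: the caller owns the buffer and passes its size. */
static char *demo_id2str_r(unsigned int id, char *buf, size_t size)
{
	snprintf(buf, size, "id-%u", id);
	return buf;
}
```

Two results of demo_id2str() cannot be alive at the same time, while
each demo_id2str_r() result lives as long as the caller's buffer.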

Provide buffer size for nf_addr2str functions.

Use __u32 as nf_type always

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I7505271954745d1b1e288ef4e09a7f52bd970536
Reviewed-on: http://review.whamcloud.com/13185
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-5843 tests: fix recovery-small test_61

The test used the obdfilter last_id as a number while it is an OID,
e.g. 0x100000000:16. The patch fixes the test to extract the object
ID from the OID.
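The extraction step can be sketched in C, although the test itself is
a shell script. demo_parse_last_id() is a hypothetical helper, and it
assumes the value has the form shown in the example above, with the
object ID following a colon.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse the part after the colon into *objid.
 * Returns 0 on success, -1 if the string is not of that form.
 */
static int demo_parse_last_id(const char *s, unsigned long long *objid)
{
	const char *colon = strchr(s, ':');
	char *end;

	if (colon == NULL || colon[1] == '\0')
		return -1;
	/* Base 0 accepts decimal as well as 0x-prefixed hex. */
	*objid = strtoull(colon + 1, &end, 0);
	return *end == '\0' ? 0 : -1;
}
```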

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: If921cf41253450ab035a75be6fb34145aee1a197
Reviewed-on: http://review.whamcloud.com/12653
Tested-by: Jenkins
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4839 tests: wait for copytool start sanity-hsm/60

Wait for the copytool to start the transfer before checking the
progress interval. In certain environments (e.g. a heavily loaded
NFS-backed target), the copytool can take an extraordinarily long
time (>30s) to open the destination file.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I56908a16240b61a51fe1395a8104eddc6aa3131f
Reviewed-on: http://review.whamcloud.com/13731
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6137 ldiskfs: simplify nocmtime patch

Simplify the nocmtime patch by patching only ext4_current_time().
This fixes the defect that the original patch doesn't handle the
setacl code path, and it also avoids the risk of future changes
adding new places that need to be fixed.

Remove the obsolete xattr-no-update-ctime patch.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I02928c4f867e9476f0bc1815dd3256e3d79dadf7
Reviewed-on: http://review.whamcloud.com/13705
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>

LU-4223 tests: fix conf-sanity test_32 typo

The t32_wait_til_devices_gone() function incorrectly calls
"lctl devices_list" instead of "lctl device_list" if there
is a timeout waiting for the loop devices to be cleaned up.
Since this is only used for debugging output after an error,
it wasn't actually causing any additional failures.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I858e789b16251835bce7af46e4f5233c95500c1e
Reviewed-on: http://review.whamcloud.com/13265
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6038 osd-zfs: sa_spill_alloc()/sa_spill_free() compat

The sa_spill_alloc()/sa_spill_free() interfaces have been retired.
Callers may either use the more memory efficient zio_buf_alloc()/
zio_buf_free() which are now exported, or they may use their own
allocator.

For the purposes of this patch an osd_zio_buf_alloc()/
osd_zio_buf_free() wrapper function was introduced which layers
on whichever interface is provided.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Change-Id: Id1d19d7b4c440808b8b3fd042f687b10c1b869f2
Reviewed-on: http://review.whamcloud.com/13097
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-6038 osd-zfs: Avoid redefining KM_SLEEP

Due to some long overdue memory management cleanup in the ZoL kmem
implementation the definition of KM_SLEEP has changed.  This change
was expected to be transparent to consumers but it causes issues
for Lustre because it explicitly redefines KM_SLEEP.  This was
originally done to avoid overriding the Linux slab interfaces.

This change implements a more portable fix.  Instead of preventing
the inclusion of the kmem.h header by setting the guard, the
kmem_cache_* preprocessor macros are explicitly undefined to make
the Linux interface available.

The related ZoL pull requests are as follows:

  https://github.com/zfsonlinux/spl/pull/414
  https://github.com/zfsonlinux/zfs/pull/2918

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Change-Id: Id1d19d7b4c440808b8b3fd042f687b10c1b869f3
Reviewed-on: http://review.whamcloud.com/13096
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-2828 test: Remove tests from ALWAYS_EXCEPT list

conf-sanity tests 59 and 64 were added to the ALWAYS_EXCEPT list
in commit b2c829b7be757cd2bc523ab0d2857a77eeb7a349 for LU-2469.

Commit 1e7845ecbe5f3e8ac1aa0d3e345e6cf6cf6f0543, for LU-2828, resolves
the cause of the conf-sanity test 59 and 64 failures.

conf-sanity test 59 and 64 need to be removed from the except list.

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I4b70485f91e0096c2e4387ebcdc95cf5720a7e16
Reviewed-on: http://review.whamcloud.com/13757
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-2194 test: remove test_19b from except list

The patch to fix the problem has landed, so test_19b in
recovery-small should be removed from the except list.

Change-Id: I748a7dfb4f70a42a0f17ab93803cb2d6d05b32db
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/13671
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-3680 osd: reduce osd_thread_info in ldiskfs osd

Reduce the structure size by unioning a few rarely used fields.
Now the structure should fit in a page:

(gdb) p sizeof(struct osd_thread_info)
$1 = 3296
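The effect of overlaying rarely used fields can be sketched in plain
C. The two structures below are hypothetical and unrelated to the
real osd_thread_info layout; they only show that fields which are
never needed at the same time can share storage.

```c
/* Both scratch buffers get their own storage. */
struct demo_info_flat {
	char	name_buf[256];
	char	xattr_buf[512];
	int	state;
};

/* The buffers overlap; only one may be in use at any moment. */
struct demo_info_union {
	union {
		char	name_buf[256];
		char	xattr_buf[512];
	} u;
	int	state;
};
```

The union costs as much as its largest member, so the saving is the
size of the smaller buffer, at the price of the caller having to
guarantee that the two are never live together.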

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: I75d5c6fefa41884390ce155781e0963884a3ad2c
Reviewed-on: http://review.whamcloud.com/9726
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>

LU-5264 obdclass: fix race during key quiescency

Upon umount, presumably of the last device using the same OSD
back-end, lu_context_key_quiesce() is run to prepare for module
unload by removing all of the module's key references in any
context linked on the lu_context_remembered list.
Threads must protect against such traversal when exiting from
their contexts.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: If2c8199fa764236308b49950672129a63b8877f5
Reviewed-on: http://review.whamcloud.com/13103
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6301 llite: cleanup open handle for client open failure

In the open case, the client-side open handling thread may hit an
error after the MDT has granted the open. In that case, the client
should send a close RPC to the MDT as cleanup; otherwise, the open
handle will be leaked on the MDT until the client unmounts or is
evicted.

If the LFSCK marks LU_OBJECT_HEARD_BANSHEE on an MDT-object that is
opened by others in order to repair some inconsistency, such as a
multiple-referenced OST-object, then because the leaked open handle
still references the MDT-object, it will block subsequent threads
that want to locate the object via FID.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I1fff2cde179b039e3bee562ef79d5cf3587fe3c8
Reviewed-on: http://review.whamcloud.com/13709
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6280 lod: delete xattr on striped dir

lod_xattr_del() needs to delete the EA on all stripes of a
striped directory.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I398a03d6a41daee34a344104d67cf8efa7d97f6a
Reviewed-on: http://review.whamcloud.com/13867
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6312 lfsck: modify llsd_master_list with spin_lock

There was a spin_lock leak in the layout LFSCK function
lfsck_layout_slave_quit() that could cause
lfsck_layout_slave_data::llsd_master_list to be modified without
the spin_lock while others traverse the list with the spin_lock
held, so that the latter access invalid RAM or fall into a
soft-lockup.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I61749ebd6c36d4b21eb20bcc1c46dbe16a1c7f2c
Reviewed-on: http://review.whamcloud.com/13921
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6256 test: Skip sanity test_184e if MDS version older than 2.6.94

Skip sanity test_184e if MDS version older than 2.6.94

Change-Id: Ib491b079a3adc998a12d9bbcb7985ad2e718453b
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/13845
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5938 mdd: fixed oops when dereferencing structure

In mdd_changelog_ns_store() and mdd_changelog_data_store(),
lu_ucred(env) can be NULL, so do not dereference it.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I45d0cbbb171f05ee1d04e628a3b31c256e0d3951
Reviewed-on: http://review.whamcloud.com/13619
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

Revert "LU-5417 lfs: fix comparison between signed and unsigned"

This change is incorrect after all. While it's a noop on x86_64,
it's a very important overflow check for 32bit arches.

This reverts commit b5b354a75b5e697e90892878ecb26459cb9a6a21.

Change-Id: I8810da3407d91e63c6e1c062a483a26ffc1bcd97
Reviewed-on: http://review.whamcloud.com/13903
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5912 build: Fix XeonPhi build

Need an extra check for old kernel style parameters.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I92b1b8579d2190bf526b3194cd83d0917fb3b4af
Reviewed-on: http://review.whamcloud.com/13730
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6216 tests: compile fixes for PPC64, and for clang

Fix the following warnings for PPC64:
  llapi_hsm_test.c: In function 'test101_progress':
  llapi_hsm_test.c:563: error: format '%llu' expects type 'long long
    unsigned int', but argument 8 has type '__u64'

and move the nested functions outside their current functions since
clang doesn't support them.
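One common way to silence this class of warning is an explicit cast
at the call site, sketched below. Whether the patch uses a cast or
another approach is not stated in this message; demo_format_u64() is
a hypothetical helper, and uint64_t stands in for __u64.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value portably with "%llu". */
static int demo_format_u64(char *buf, size_t size, uint64_t v)
{
	/*
	 * The cast makes the argument match "%llu" even where the
	 * 64-bit type is a typedef for unsigned long.
	 */
	return snprintf(buf, size, "%llu", (unsigned long long)v);
}
```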

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I034b097f3817a5919adcb8dc3465b00833174f63
Reviewed-on: http://review.whamcloud.com/13800
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

Move master branch to 2.8 development

Change-Id: If8635d108b6a10b02e01b747b694bdfab4594ba2
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6263 lmv: fix parent FID for migration

If the directory being migrated is under a striped directory, the
right stripe FID needs to be set for its parent.

Update migration test script (sanity test_230) to do migration
under striped dir.

Add -i to test_mkdir().

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ic230f9b63bc21c1391e397a0d3ff689e3f0ba5dc
Reviewed-on: http://review.whamcloud.com/13817
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Jenkins

LU-6230 lfsck: reload OSP-object via set LOV EA on LOD-object

Generally, we should use the bottom device (OSD) to update the
parent LOV EA. But the LOD-object still references the wrong
OSP-object, which should be detached after the parent's LOV EA is
refreshed, and unfortunately there is no suitable API for that.
So we have to make the LOD re-load the OSP-object(s) by replacing
the LOV EA on the LOD-object.

Once the DNE2 patches have been landed, we can replace the
LOD device with the OSD device.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I960f42dacc8ee23dd98a2b986f0a83cb53b62c15
Reviewed-on: http://review.whamcloud.com/13848
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6138 lfsck: set async windows size properly

If the async window size is set to zero, then the LFSCK main engine
on the MDT pre-loads objects as fast as possible. In that case,
if the peer server(s) cannot handle the pre-load requests in time,
many pre-load requests queue up on the MDT and cause memory
pressure. To avoid this, setting the LFSCK async window size to zero
or to a too-large value (> LFSCK_ASYNC_WIN_MAX) is forbidden.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I3468236a4a0705ea60b49704583b051c99c77cd5
Reviewed-on: http://review.whamcloud.com/13818
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-5791 lfsck: use bottom device to locate object

For an LFSCK modification, if only a single object is updated, or the
objects to be updated reside on the same server (whether local
or remote), then locate the object(s) against the bottom (OSD
or OSP) device; otherwise, some updates will be on the local
server and others on a remote server, so either locate the object(s)
against the LOD device or use two transactions for the modification.

Similarly, the transaction handle will be created on the proper device
corresponding to the object(s).

This patch also fixes some memory leaks caused by using the wrong
device for remote modification, one of the causes of LU-6138.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I09a60bed3bd49a193d57214c4252904cb4546ab2
Reviewed-on: http://review.whamcloud.com/13392
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6231 osp: prepare OUT RPC after remote transaction start

According to our current transaction/dt_object_lock framework
(which makes cross-MDT modification workable for DNE1),
the transaction sponsor starts the transaction first, then
tries to acquire the related dt_object_lock if needed. Under these
rules, preparing the OUT RPC in the transaction declare phase
requires the related attr/xattr to be known without dt_object_lock.
That condition may not hold for some remote transactions. For
example:

When repairing a linkEA (by LFSCK), before the LFSCK thread obtains
the dt_object_lock on the target MDT-object, it cannot know whether
the MDT-object has a linkEA, nor whether that linkEA is valid.

Since the LFSCK thread cannot hold dt_object_lock before the remote
transaction starts (otherwise a deadlock is possible),
it cannot prepare the related OUT RPC for repairing during the declare
phase as other normal transactions do.

To resolve this, OSP prepares the related OUT RPC
after the remote transaction has started, and triggers the remote
update (sends the RPC) at trans_stop. Upper-layer users, such as
LFSCK, can then follow the general rule for trans_start/dt_object_lock
when repairing linkEA inconsistency, without treating remote
MDT-objects specially.

In fact, above solution for remote transaction should be the normal
model without considering DNE1. The trouble brought by DNE1 will be
resolved in DNE2. At that time, this patch can be removed.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ib2ed4c290c9ae12b6f544575aa5313f0dc83a5af
Reviewed-on: http://review.whamcloud.com/13710
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5914 lfsck: dt_index_try before dt_lookup

Otherwise dt_lookup() may LBUG when locating a parent
directory MDT-object that is not in cache.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ibbed865e58d8f9a4d4b67265b02ba804efb9719e
Reviewed-on: http://review.whamcloud.com/13801
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-5275 gnilnd: Add definition for PDE_DATA

With the move of PDE_DATA to lprocfs_status.h there
was one klnd driver, gnilnd, that needed this define.
So the simple solution is to include the needed header.

Change-Id: I0b2bbc8d2efeab8e253f11b0e58df51c0002d5ae
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13527
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5604 tgt: return missed fail ids

OBD_FAIL_LDLM_REPLY is missing from tgt_enqueue, and it's actually
not suitable for tgt_enqueue anymore because tgt_enqueue() is a
common handler now.

This patch includes a few changes:
- tgt_enqueue sets tgt_session_info::tsi_reply_fail_id to
  OBD_FAIL_MGS/MDS/OST_LDLM_REPLY_NET based on type of target.

- rewrite test_52 of replay-single, the only reason that test_52
  can pass is because there is a typo:

  $CHECKSTAT -t file $DIR/$tfile-* which should be $DIR/$tfile

- add definitions for OBD_FAIL_LDLM_SRV_CP/BL/GL_AST and resolve
  OBD_FAIL conflicts

- OBD_FAIL_UPDATE_OBJ_NET_REP was renamed to
  OBD_FAIL_OUT_UPDATE_NET_REP but referenced with old name in tests.

- OBD_FAIL_MDS_FAIL_LOV_LOG_ADD check is obsoleted as well as tests.
  Meanwhile the OSP code was updated to fix panic in case of error.

- OBD_FAIL_TGT_LAST_REPLAY is removed along with test. It was never
  used and it seems it was even introduced by mistake.

Test-Parameters: envdefinitions=SLOW=yes alwaysuploadlogs testlist=replay-dual,replay-single
Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: If5113e459f5628047e17114b6bc20ba910f3c142
Reviewed-on: http://review.whamcloud.com/12232
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6138 lfsck: NOT hold reference on pre-loaded object

To improve LFSCK performance, the LFSCK main engine pre-loads
the object locally or remotely, generates an LFSCK request
that references the pre-loaded object, and pushes the request into
the related LFSCK pipeline. The LFSCK assistant thread handles the
LFSCK request asynchronously some time later.

Originally, the LFSCK request held the pre-loaded object reference,
so the assistant thread could handle it directly without locating the
object by FID again. But holding the object reference prevents the
object from being purged from RAM. If some LFSCK request holds the
object, and someone else unlinks the object before the LFSCK assistant
thread handles the request, then the unlinked object stays
cached in RAM until the last reference is released. Because the LFSCK
main engine and assistant thread run asynchronously, we do not know
when the LFSCK request holding the object reference will be
handled. If the assistant thread needs to locate the object with
the same FID before that, it will deadlock on itself forever.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I516653aa2143bb32a5f350b314951b78dead3e79
Reviewed-on: http://review.whamcloud.com/13666
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6235 scrub: replace the stale OI mapping

If the OI mapping on the OST contains an invalid entry, then the OI
lookup via osd_obj_map_lookup() may return -ENOENT. From the point of
view of OI scrub, this is indistinguishable from the case where no
such OI mapping exists, so the OI scrub wrongly uses "INSERT"
@ops for osd_scrub_refresh_mapping() to repair the inconsistency.
So osd_obj_map_lookup() should return -ESTALE when an invalid OI
mapping exists; the OI scrub can then use
"UPDATE" @ops for osd_scrub_refresh_mapping() to repair it.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I013125eb0aaec683ac8f56ec32a30e7858262f87
Reviewed-on: http://review.whamcloud.com/13745
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6239 doc: include missing lnetctl.8

Running "man lnetctl" currently doesn't work on a system
with lustre installed, because lnetctl.8 is not
included in the generated rpms. This simple fix
ensures lnetctl.8 is included in the rpms.

Change-Id: I72e2ef2841f5936e1d0def538c239ee2da32d7c3
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13749
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Jenkins
Reviewed-by: Isaac Huang <he.huang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5873 ldiskfs: osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed

The bug happens when 16TB-4KB limit is exceeded during write.

Add check for maximum file size on client and server sides.
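
A minimal sketch of such a size check, using the 16TB-4KB limit named
above (taken here as 16 TiB - 4 KiB). The function name and the choice
of -EFBIG as the error are assumptions of this sketch, not taken from
the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* 16 TiB - 4 KiB, the limit named in the commit message. */
#define SKETCH_MAX_FILE_SIZE ((UINT64_C(16) << 40) - 4096)

/* Return 0 if [offset, offset + count) stays within the limit. */
int sketch_check_write(uint64_t offset, uint64_t count)
{
	if (offset > SKETCH_MAX_FILE_SIZE)
		return -EFBIG;
	/* Compare against the remaining room so the sum cannot overflow. */
	if (count > SKETCH_MAX_FILE_SIZE - offset)
		return -EFBIG;
	/* Postcondition: the end of the write is inside the limit. */
	assert(offset + count <= SKETCH_MAX_FILE_SIZE);
	return 0;
}
```

Comparing count against the remaining room, instead of computing
offset + count first, keeps the check correct for offsets near the
top of the 64-bit range.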

Xyratex-bug-id: MRP-2131
Change-Id: I73f0ee803670ada869c2618f275049948668848e
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: http://review.whamcloud.com/12600
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6222 statahead: add to list before make ready

__sa_make_ready() sets the entry ready before adding it to the list,
so revalidate_statahead_dentry()->sa_kill() may free an entry which
is not in any list yet.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I0b5f7200fb74c88450133d66bf7bf38d9355036f
Reviewed-on: http://review.whamcloud.com/13708
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5549 mdc: cl_default_mds_easize not refreshed

The client_obd::cl_default_mds_easize field should track the largest
observed EA size advertised by the MDT, subject to a reasonable upper
bound.  The MDC uses cl_default_mds_easize to calculate the initial
size of request buffers.  The default value should be small enough to
avoid wasted memory and excessive use of vmalloc(), yet large enough
to accommodate the common use case.

In the current code, the default value is only updated if
client_obd::cl_max_mds_easize is strictly less than
mdt_body::mbo_max_mdsize. This condition is almost never met, because
client_obd::cl_max_mds_easize is computed at client mount-time based
on the number of OSTs in the filesystem, so the MDT won't ever observe
and advertise an EA size larger than that.

As a result, client_obd::cl_default_mds_easize indefinitely retains
its initial value, which is computed at client mount-time based on
the filesystem's default stripe width. Any getattr() requests for
widely striped files will consequently allocate a request buffer
that is too small, forcing reallocations on both the client and
server side. To avoid this, update client_obd::cl_default_mds_easize
independently of the value of client_obd::cl_max_mds_easize.

In addition, this patch includes these changes:

- Add comments to the client_obd structure to clarify what the
  cl_{default,max}_mds_{cookie,ea}size values mean.

- Prevent mdc_get_info() from storing uninitialized data in
  client_obd::cl_max_mds_cookiesize.

- Use 4096 as an upper bound for the default values.  The former
  bound of PAGE_CACHE_SIZE is too large on 64k-page platforms
  (i.e. PPC), so it fails to prevent the vmalloc() spinlock
  contention described in LU-3338. The new value was chosen to
  be large enough to accommodate common use cases while staying
  well below the 16k threshold at which allocations start using
  vmalloc().

- Add test case 27E to ./lustre/tests/sanity.sh.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Kyle Blatter <kyleblatter@llnl.gov>
Change-Id: I363017844d6af3e6b67b7c03bd206226f9495116
Reviewed-on: http://review.whamcloud.com/11614
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5549 llite: make default_easize writeable in /proc

Allow default_easize to be tuned via /proc. A system administrator
might want this if a rare access to widely striped files drives up the
value on a filesystem where narrowly striped files are the more common
case. In practice, however, this is wanted primarily to facilitate
a test case for LU-5549.

- Plumb the necessary interfaces through the LMV and MDC layers
  to expose write access to this value by higher layers.

- Add block comments to modified functions.

- Correct misspelling of "default" in /proc handler function names
  in lustre/llite/lproc_llite.c. The file names in /proc were already
  spelled correctly so there are no issues with backward
  compatibility.

- Convert remaining space-indented lines in lmv_set_info_async()
  to tabs.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Change-Id: Iae2c8d0ca28cccf12af9372b1a10a0f9d170fddf
Reviewed-on: http://review.whamcloud.com/13112
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>

LU-5523 mdt: add --index option to default dir stripe

Add an --index option to the default dirstripe EA. If the MDT finds
that the client sent the create request to the wrong MDT because
of the default stripeEA, it returns -EREMOTE; the client then
retrieves the default stripeEA through the xattr cache and
re-creates the object.

Add a delete option (-d) to delete the default stripeEA of a
directory.

Add ldo_dir_def_striping_cached and ldo_def_striping_cached
to indicate whether the default striping EA has been cached in
ldo_object.

ldo_striping_cached indicates whether the object's own striping
has been loaded from disk.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ic2896e9050f1581344db9368b8f7b25bfded3d7d
Reviewed-on: http://review.whamcloud.com/13360
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4647 lctl: add nodemap man pages to lctl

This patch adds separate man pages for the 8 lctl nodemap commands,
and updates the lctl man page to point to them.

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Ia1350471a2878a8f4057d66a91141ad8dd132bc2
Reviewed-on: http://review.whamcloud.com/13478
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6086 obdclass: check peer's version for MDT-MDT connection

Because new DNE/LFSCK changed some wire protocol, we cannot support
the interoperations between different version MDTs. The basic rules
for the permitted connection are:
1) The @major in the connection version should be the same;
2) The @minor in the connection version should be the same;
3) The difference of the @patch in the connection version should NOT
be more than 3.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9e77f305c7552ad01e92c97f1eda0756f1291d30
Reviewed-on: http://review.whamcloud.com/13285
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-6199 ldiskfs: delete bad WARN_ON_ONCE from ldiskfs

lustre needs to call certain ext4/ldiskfs entry points without locking
i_mutex in order to avoid deadlocks. This triggers a warning check
in ext4 code that is new in el6.6 and not present in el6.5. It is
already fixed in the ldiskfs patches for future kernel versions, but
was not fixed for el6.6.

This mod adds an ldiskfs patch to eliminate the warning.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ia375a6d851a5262c578d722e2f8f4db2ea5249b7
Reviewed-on: http://review.whamcloud.com/13604
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

new tag 2.6.94

Change-Id: I7cbdaaa209cb5c3db1612f0f9f36ac6668906962

LU-6084 ptlrpc: prevent request timeout grow due to recovery

This patch fixes the growing request timeout which occurred
after commit 1d889090 for LU-5079. While that commit itself is correct,
it reveals another issue. If a request is processed for a long
time on the server, then the client adaptive timeouts adapt to that
after receiving the reply, and new requests get a bigger timeout.
Another problem is that the server AT history is corrupted by recovery
request processing time, which is not pure service time but also
includes the time spent waiting for clients to recover.

The patch prevents AT stats updates from early replies on the client
and from recovering-request processing time on the server.
ptlrpc_at_recv_early_reply() still updates the current request
timeout as asked by the server, but doesn't include this in the AT
stats. The real reply will bring that data from the server after all.

Test-Parameters: alwaysuploadlogs testlist=replay-vbr,replay-dual

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ifcadfd669162013b6ccb386eb2b508bd9f0b22d9
Reviewed-on: http://review.whamcloud.com/13520
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6082 tests: fix too slow nodemap SLOW test

The SLOW test for nodemap is too slow to complete. This patch changes
the test to do 000-007, 010-070, 100-700 (octal) instead of testing
all modes, as was done before.
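
For illustration, the reduced set of modes can be enumerated as
below. This sketch assumes the ranges step one octal digit at a time
(000..007, then 010..070, then 100..700); the real test is a shell
script and the function name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill modes[] with 000-007, 010-070 and 100-700 (octal), assuming
 * each range varies a single octal digit.  Returns the number of
 * entries written, at most cap.
 */
size_t sketch_nodemap_modes(unsigned int *modes, size_t cap)
{
	size_t n = 0;
	unsigned int digit;
	int shift;

	assert(modes != NULL);
	/* shift 0: "other" bits, 3: "group" bits, 6: "owner" bits */
	for (shift = 0; shift <= 6; shift += 3) {
		/* 000 is emitted once, as part of the first range */
		for (digit = (shift == 0) ? 0 : 1; digit <= 7; digit++) {
			if (n == cap)
				return n;
			modes[n++] = digit << shift;
		}
	}
	return n;
}
```

Under that assumption the list holds 22 modes (8 + 7 + 7).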

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes \
mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
mdtcount=1 testlist=sanity-sec

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Ic92a3718de078ccfd13cf0b6580ab078dfedb144
Reviewed-on: http://review.whamcloud.com/13605
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-6109 lfsck: check FID validity before locating object

It is possible that the FID from iteration or linkEA is corrupted.
The LFSCK needs to check its validity before locating the object
with it, to avoid hanging or entering another unexpected state.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I1df8d085bf5abf926d03882457cb8b221633d3aa
Reviewed-on: http://review.whamcloud.com/13511
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>

LU-6010 lnet: prevent assert on LNet module unload

There is a use case where lnet can be unloaded while there are
no NIs configured.  Removing lnet in this case will cause
LNetFini() to be called without a prior call to LNetNIFini().
This will cause the LASSERT(the_lnet.ln_refcount == 0) to be
triggered.

To deal with this use case, a reference count on the module is
taken using try_module_get() when LNet is configured.  This way
LNet must be unconfigured before it can be removed, which
avoids the above case.  When LNet is unconfigured, module_put()
is called to return the reference count.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I0f283eeb395fa9a076a4d65ab3edd5e7807fc169
Reviewed-on: http://review.whamcloud.com/13110
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6175 ha: add health_check routine to the MDS, MGS and OSD

The patch adds obd_health_check() methods to the MDS and MGS to check
ptlrpc service health as the OST does. It also adds a health_check()
routine directly to the OSD to check that it is mounted and not
read-only.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ib4af652b08e7e3616ebb3b99ce3e4ad03bdd5ab5
Reviewed-on: http://review.whamcloud.com/13558
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6166 utils: fix bug in lr_link

When a hard link to a file is created, the path and the file name are
the same if the file is in the root directory, so the length of the
path and the name will be the same in this case.

Signed-off-by: Wu Libin <lwu@ddn.com>
Change-Id: I3a72491efdc041ad0e96d036b04600b76bb646fe
Reviewed-on: http://review.whamcloud.com/13546
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6167 utils: fix bugs in lustre_sync

lustre_rsync can fall into an endless loop and dump core; this
patch fixes these problems. The function lr_cascade_move should
delete the "curr" node from the "parents" list first, then move on
to the next lr_cascade_move.

Signed-off-by: Wu Libin <lwu@ddn.com>
Change-Id: I5a5686ab89379da37453d07a5a00df4fd217ee59
Reviewed-on: http://review.whamcloud.com/13545
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6125 test: sanity test_27i defect: missing test_mkdir()

Fix sanity test_27i() to call test_mkdir()

Signed-off-by: Elena Gryaznova <elena.gryaznova@seagate.com>
Xyratex-bug-id: MRP-1194
Reviewed-by: Alexander Zarochentsev <alexander_zarochentsev@xyratex.com>
Change-Id: I093cb44590b98857189d69d1b8f6e9e9c423d3bc
Reviewed-on: http://review.whamcloud.com/13407
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5510 scrub: ldiskfs_create_inode returns locked inode

There was a race condition between creating a new inode and OI scrub:
the OI scrub may find the newly created inode just after the creator
creates it but before the LMA EA is set. Originally, to resolve
this, the creator set the newly created inode's state
to LDISKFS_STATE_LUSTRE_NOSCRUB. But that state was set after the
new inode was unlocked, so the OI scrub still had some chance to find
the newly created inode with neither LDISKFS_STATE_LUSTRE_NOSCRUB
nor LMA EA.

As an improvement, this patch makes ldiskfs_create_inode()
return the newly created inode locked. The caller can set more
state (not only for LFSCK, but also for other purposes in future)
on the newly created inode before unlocking it.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Idc1a8fbd3701f7e431ef4b7858cfdf4674d74add
Reviewed-on: http://review.whamcloud.com/13187
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>

LU-5722 obdclass: reorganize busy object accounting

Due to some accounting bug, lsb_busy of a hash bucket can become
larger than the total number of objects in said bucket. A busy object
can be counted more than once. When that happens, a negative value is
returned by the shrinker to Linux's shrink_slab() function. In older
kernels, such as the 2.6.32 used in RHEL 6, this causes an endless
loop inside shrink_slab(), in essence hanging the host.

Instead of trying (and failing) to count the busy objects, count the
objects that are not busy, i.e. the objects that are present on the
lsb_lru list. The number of busy objects is then the difference
between the number of objects in the hash and the objects on the
lsb_lru list.
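
The accounting idea can be sketched as follows: keep a counter for
the LRU list and derive the busy count from it. The struct and
function names are hypothetical; only the "busy = total - on LRU"
relation comes from the description above.

```c
#include <assert.h>

/* Hypothetical per-bucket counters. */
struct sketch_bucket {
	long total;	/* objects in the hash bucket */
	long lru;	/* objects on the LRU list, i.e. not busy */
};

/* A new object enters the hash in use, so only total changes. */
void sketch_obj_insert(struct sketch_bucket *b)
{
	b->total++;
}

/* The last user dropped the object: it moves onto the LRU list. */
void sketch_obj_idle(struct sketch_bucket *b)
{
	assert(b->lru < b->total);
	b->lru++;
}

/* An idle object is picked up again: it leaves the LRU list. */
void sketch_obj_reuse(struct sketch_bucket *b)
{
	assert(b->lru > 0);
	b->lru--;
}

/* Busy count is derived, so it can never exceed total or go negative. */
long sketch_bucket_busy(const struct sketch_bucket *b)
{
	assert(b->lru >= 0 && b->lru <= b->total);
	return b->total - b->lru;
}
```

Because the busy count is computed rather than maintained, an object
cannot be counted as busy more than once.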

Change-Id: Ia6973991a1ff7fc53cdf8132bf2aab532934cf94
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12468
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6120 lfsck: notify ever failed server to exit LFSCK

During the first-stage scanning, the local LFSCK instance records
which OSTs have failed at least once to respond to LFSCK verification
requests (perhaps because of network issues or trouble on the OST
itself). Then, before starting the second-stage scanning, the local
LFSCK instance notifies those OSTs via la_sync_failures() to skip
orphan handling, since they missed some OST-object verifications.

Originally, after la_sync_failures(), the related OSTs were removed
from the LFSCK targets list regardless of whether la_sync_failures()
succeeded, so subsequent LFSCK notification RPCs were not
sent to those OSTs. As a result, some OST(s) might not exit LFSCK
as expected, and a subsequent LFSCK start would fail
because the former LFSCK instance had not exited.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Id0283c78527d6a3a6c563de7ce6af1fe2d3f1a30
Reviewed-on: http://review.whamcloud.com/13525
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6050 target: control OST-index in IDIF via ROCOMPAT flag

Introduce new flag OBD_ROCOMPAT_IDX_IN_IDIF that is stored in the
last_rcvd file. For a newly formatted OST device, it is set
automatically; when upgrading from an old OST device, it can be
enabled via the lproc interface osd-ldiskfs.index_in_idif. With the
flag enabled, the IDIF-in-LMA of a newly created OST-object contains
the OST-index; for an existing OST-object, the OSD converts the
old-format IDIF to the new-format IDIF, with the OST-index stored in
the LMA EA, when the OST-object is accessed or via OI scrub. Once the
flag is enabled it cannot be reverted, so the system cannot
be downgraded to the original incompatible version.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9e6e089d54fdb3970bb201eedac8dc09be2cc1c1
Reviewed-on: http://review.whamcloud.com/13516
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6063 kernel: use proper flags for call_usermodehelper

When a parameter is permanently changed on the MGS, the
MGS sends a changelog packet to the nodes that
are affected by the change. Once the nodes receive the
change, they call the userland utility lctl to
change their local value. When calling a userland
application from the kernel you specify a flag to
control the interaction with the application. Originally
the flag was set to 0 by default, which is UMH_NO_WAIT,
which meant lctl was being called asynchronously. In
older kernels this was fine since UMH_NO_WAIT and
UMH_WAIT_PROC had nearly the same logic. This changed
with newer kernels, which broke updating our parameters.
In addition, UMH_NO_WAIT doesn't report back an error
if something goes wrong with lctl. The fix is to set
the flag to UMH_WAIT_PROC so kernel space waits until
lctl has finished and we get a proper error code if
something does go wrong with lctl. Secondly, the patch
uses the proper flag name instead of a number for the
use of call_usermodehelper in mdt_identity.c so the
code is more readable.
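
As a userspace analogy only (this is not kernel code and does not use
call_usermodehelper), the sketch below shows why waiting for the
helper matters: only by waiting can the caller see the helper's exit
status. The function name is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] and wait for it to finish, similar in spirit to waiting
 * for the process to complete.  Returns the helper's exit status, or
 * -1 if it could not be started or did not exit normally.
 */
int sketch_run_and_wait(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execv(argv[0], argv);
		_exit(127);	/* exec failed in the child */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A fire-and-forget variant would return right after fork() and could
never report that the helper failed.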

Change-Id: I016fd4342315e9db6ec3ef544bcfb3a477c97b52
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13677
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-6154 zfs: striped directory and migration on ZFS

1. Increase/decrease the refcount for the sub_stripe object,
because the refcount must be increased/decreased explicitly
for a ZFS directory.

2. setup/cleanup sequence service for osd-zfs, so it can
create FID for local OSD.

3. Do not zero dah_eadata in the OSD layer; set it in the MDD
layer instead, so the striping create process is not interfered with.

4. Put 0 at the end of link data during migration, since
osd-zfs does not do it when reading link.

5. Create orphan object with linkEA data, so if migration
is interrupted, then other threads are able to read entries
from this half-migrated directory, because osd-zfs needs to
retrieve the parent FID from linkea data during read dir
entries (see osd_dir_it_rec()).

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I67cbd0b09d2716b163277425066dcf155df68039
Reviewed-on: http://review.whamcloud.com/13518
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6162 kernel: kernel update RHEL6.6 [2.6.32-504.8.1.el6]

Update RHEL6.6 kernel to 2.6.32-504.8.1.el6

Test-Parameters: clientdistro=el6.6 mdsdistro=el6.6\
ossdistro=el6.6 mdsfilesystemtype=ldiskfs\
mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: If1bf2bca5f70e305be4859d8f5f196b3574abed3
Reviewed-on: http://review.whamcloud.com/13560
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6106 tests: Skip test_16 to test_23 if MDS version older than 2.6.90

Skip sanity-sec test_16 to test_23 if MDS version older than 2.6.90

Change-Id: I0f95dae3a7a0bdef52160a3ca76fefac6765007c
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/13509
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

Revert "LU-1214 ptlrpc: start minimum service threads"

This seems to have broken something and causes wide conf-sanity failures.
See LU-6206 for more info.

This reverts commit 43f96aa9cc3cec66d9b9e0a03e5fc23e094525e7.

Change-Id: Ie0d7124c72c7e590581ec92c2ab49c3d7bfa09fe
Reviewed-on: http://review.whamcloud.com/13647
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5829 ptlrpc: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Change-Id: I5dad1093f136577fa268cd7ecbebd1d660cfa8ef
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12510
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4870 lfsck: lock old MDT-object in migrating

In the current metadata migration implementation, until the old
MDT-object is removed, both the new MDT-object and the old MDT-object
reference the same LOV layout. If the layout LFSCK races with the
migration and finds the new MDT-object, it will treat the related
OST-object(s) as multiply referenced and will try to create new
OST-object(s) for the new MDT-object. To avoid this, the layout LFSCK
needs to lock the old MDT-object before confirming the
multiple-reference case.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9e42cb86683c33bedfef01ae7f6e2cc305f1137d
Reviewed-on: http://review.whamcloud.com/13182
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-4712 llite: lock the inode to be migrated

Because the inode and its connected dentries will be cleared
out of the cache after migration, the inode needs to be locked
during the migration.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ibbbb33473de1a67df85ef8930debcf22cd775bcb
Reviewed-on: http://review.whamcloud.com/9689
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5242 osd-zfs: umount hang in sanity 133g

Disable 78, 79 and 80, which are known to trigger a txg_wait_open()
hang that would block umount forever.

Change-Id: I3770c11120790f55ecc021cc054971e00acc951b
Signed-off-by: Isaac Huang <he.huang@intel.com>
Test-Parameters: mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs testlist=sanity,sanity,sanity,sanity,sanity,sanity
Reviewed-on: http://review.whamcloud.com/13600
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5820 lfsck: use multiple namespace LFSCK trace files

The namespace LFSCK uses a trace file to record the FID of each object
that has multiple hard links, has a remote name entry, contains some
uncertain inconsistency, and so on. A single namespace LFSCK trace
file may not be efficient, especially when there are millions of FIDs
to be recorded. So use multiple namespace LFSCK trace files, with a
per-trace-file semaphore to control concurrent access to each trace
file.
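
As a rough illustration of spreading records across several trace
files, the sketch below maps a FID to a slot with a simple hash. The
slot count, the hash, and every name here are assumptions made for the
example; this log does not show how the patch actually picks a trace
file.

```c
#include <stdint.h>

#define EXAMPLE_TRACE_FILES 32	/* hypothetical count, not from the patch */

/* Simplified stand-in for a Lustre FID; field names are illustrative. */
struct example_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/*
 * Map a FID to one of EXAMPLE_TRACE_FILES slots. The same FID always
 * lands in the same slot, so only that slot's lock would need to be
 * taken when recording or looking up the FID.
 */
unsigned int example_trace_index(const struct example_fid *fid)
{
	uint64_t h = fid->seq ^ ((uint64_t)fid->oid << 1);

	return (unsigned int)(h % EXAMPLE_TRACE_FILES);
}
```

With one lock per slot, writers that hash to different slots do not
contend with each other.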

For Lustre-2.x (x <= 6), the LFSCK used LFSCK_NAMESPACE_MAGIC_V1 as
the namespace trace file magic. After a downgrade to such an old
release, the old LFSCK will not recognize the new
LFSCK_NAMESPACE_MAGIC_V2 in the new trace file; it will reset the
whole LFSCK rather than fail to start. A similar reset happens after
an upgrade from such an old release.

This patch also drops some repeated FID recording in the namespace
LFSCK trace file. The related FID should already have been recorded in
the trace file via lfsck_namespace_exec_oit(), so it is unnecessary to
record it again when scanning the directory.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Iec27c52b21789dbde1e4c1153f61162f028ceac3
Reviewed-on: http://review.whamcloud.com/12809
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>

LU-6095 tests: define TRUNCATE program for racer

In file_truncate.sh of racer, TRUNCATE was not defined for remote
clients. Let it point to tests/truncate in case it's not defined.

The same thing happens to MCREATE and LFS, so fix them as well and do
some cleanup.

Test-Parameters: alwaysuploadlogs testlist=racer
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ie6898f1573bd19810a2d8f14dc0aa375d3774e08
Reviewed-on: http://review.whamcloud.com/13501
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5357 lod: hold thandle during lod_trans_stop

Hold thandle during lod_trans_stop, to avoid the thandle
being freed in local transaction stop.
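
The general pattern of holding a reference across a call that may drop
the last one can be sketched as follows. This is a minimal userspace
illustration only: the structure, the helper names, and the reference
counting scheme are all hypothetical and are not taken from the lod
code.

```c
#include <stdlib.h>

/* Hypothetical handle with a plain reference count. */
struct example_handle {
	int refs;
	int last_rc;
	int *freed_flag;	/* set to 1 when the handle is freed */
};

struct example_handle *example_handle_new(int *freed_flag)
{
	struct example_handle *h = malloc(sizeof(*h));

	if (h != NULL) {
		h->refs = 1;
		h->last_rc = 0;
		h->freed_flag = freed_flag;
	}
	return h;
}

void example_handle_get(struct example_handle *h)
{
	h->refs++;
}

void example_handle_put(struct example_handle *h)
{
	if (--h->refs == 0) {
		*h->freed_flag = 1;
		free(h);
	}
}

/* Stands in for a local stop that drops the caller's reference. */
int example_sub_stop(struct example_handle *h)
{
	example_handle_put(h);
	return 0;
}

int example_top_stop(struct example_handle *h)
{
	int rc;

	example_handle_get(h);		/* keep h alive across the sub stop */
	rc = example_sub_stop(h);
	h->last_rc = rc;		/* safe: our extra reference is held */
	example_handle_put(h);		/* may free h; do not touch it after */
	return rc;
}
```

Without the extra get/put pair, the write to h->last_rc would touch
memory that example_sub_stop() had already freed.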

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I2448d725e35b119a61bbfb2e9567446d203bec16
Reviewed-on: http://review.whamcloud.com/13420
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6115 test: sanity 133g defect: missing return after "skip"

Patch fixes test_133g(): add return() after skip()

Signed-off-by: Elena Gryaznova <elena.gryaznova@seagate.com>
Xyratex-bug-id: MRP-2153
Change-Id: I1787e1300930542c5a34c5a7e8bd277df28bf17a
Reviewed-on: http://review.whamcloud.com/13389
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>

LU-5829 obdclass: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Removed now unused function cat_cancel_cb() and fixed 3 comments in
test code mentioning this function.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ia0fa1e8e65f197235c04997f56b49d8fd87d4fd6
Reviewed-on: http://review.whamcloud.com/13323
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5829 misc: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ibb6dd722c47c7c76275ac24f1a6d8a4a988f433a
Reviewed-on: http://review.whamcloud.com/13321
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-2430 utils: fix "lfs mv" command parsing

Fix the lfs_mv() long option parsing so that it uses "--mdt-index"
instead of incorrectly requiring "----mdt-index" for the short "-M"
option.

Fix up some error messages in lfs_mv() as well, and change a test
case to use the long option form.
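
For reference, a getopt_long() table entry names the long option
without its leading dashes; writing the dashes into the name field is
what produces a "----mdt-index" requirement. A minimal sketch,
assuming a hypothetical helper rather than the real lfs_mv():

```c
#include <getopt.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not the real lfs_mv(): returns the MDT index
 * given via -M or --mdt-index, or -1 if it is absent or an unknown
 * option is seen.
 */
int example_parse_mdt_index(int argc, char **argv)
{
	/* The name must NOT include the leading "--"; getopt_long()
	 * strips the dashes itself before matching. */
	static const struct option long_opts[] = {
		{ "mdt-index", required_argument, NULL, 'M' },
		{ NULL, 0, NULL, 0 }
	};
	int mdt_index = -1;
	int c;

	optind = 1;	/* restart scanning so the helper can be reused */
	opterr = 0;	/* keep getopt_long() quiet on unknown options */
	while ((c = getopt_long(argc, argv, "M:", long_opts, NULL)) != -1) {
		if (c != 'M')
			return -1;
		mdt_index = atoi(optarg);
	}
	return mdt_index;
}
```

Both "-M 2" and "--mdt-index 2" reach the same case because the table
entry maps the long name to the short option character.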

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I20ffde97fb5d31364e91d6b21d407eb3323ebbe5
Reviewed-on: http://review.whamcloud.com/13161
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5478 lov: get rid of obd_* typedefs

We have a bunch of typedefs for common things that made no sense
and hid the actual type from plain view.
Replace them with proper uXX or sXX types.
The exception is lustre_idl.h and lustre_ioctl.h, where
they are replaced with __uXX and __sXX so that the headers can be
included in userspace. Replace obd_off with loff_t.

patch 3 in series: modify lov/lmv

Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I9dfcc0bac691160c64ef8a120887b160c0c6986f
Reviewed-on: http://review.whamcloud.com/13144
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>

LU-2675 lnet: assume a kernel build

In lnet/lnet/ and lnet/selftest/ assume a kernel build (assume that
__KERNEL__ is defined). Remove some common code only needed for user
space LNet.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I79d6f50bac895116628c93c35e23f64dd102780f
Reviewed-on: http://review.whamcloud.com/13121
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5957 mdt: Update MDT flags after layout swap

Swap MOF_LOV_CREATED flags between MDT objects after a layout swap to
guarantee that the layout will be re-created on the next write if its
LOV has been deleted.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: I3d0497d8be2a7335c1fb43e10af2b222243e6a81
Reviewed-on: http://review.whamcloud.com/12877
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-2445 lfs: fixed support for lfs migrate -b

-b is the short alias for --block to the lfs migrate command, but
wasn't set in the call to getopt_long().
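
The failure mode is easy to reproduce in isolation: a long option
entry only enables the "--block" spelling, and the short form works
only if its letter also appears in the short option string passed to
getopt_long(). A minimal sketch, using a hypothetical helper rather
than the actual lfs migrate code:

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the real lfs migrate code: returns 1 if
 * blocking mode was requested via -b or --block, 0 if not, and -1 on
 * an unrecognized option.
 */
int example_parse_block(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{ "block", no_argument, NULL, 'b' },
		{ NULL, 0, NULL, 0 }
	};
	int block = 0;
	int c;

	optind = 1;	/* restart scanning so the helper can be reused */
	opterr = 0;	/* keep getopt_long() quiet on unknown options */
	/* The "b" in the short option string is what makes "-b" work;
	 * the long_opts entry alone only enables "--block". */
	while ((c = getopt_long(argc, argv, "b", long_opts, NULL)) != -1) {
		if (c != 'b')
			return -1;
		block = 1;
	}
	return block;
}
```

Dropping the "b" from the short option string makes "-b" an unknown
option while "--block" keeps working.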

Change-Id: Ie7397b994a34de71b9978cf51b55961b4c9ded69
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12627
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-5521 grant: quiet message on grant waiting timeout

Use at_max in osc_enter_cache() to bound how long we wait for grant
space before switching to synchronous I/O. Do not print a message
on the console when the timeout is hit, since such a long wait can
be legitimate with a flaky network (i.e. BRW is resent multiple times).

Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Change-Id: I63b40783381f6133e2f77dbc0f827e13f571ccd2
Reviewed-on: http://review.whamcloud.com/12146
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5023 tests: check FID seq properly for sanity-lfsck t_11b

Guarantee that the right FID seq is checked.

Improve error handling in other scripts.

Try to collect more logs.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I51cb75c15cc7421721ea0bc149fc2a5a72c13cc6
Reviewed-on: http://review.whamcloud.com/10276
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6081 user: use random() instead of /dev/urandom

/dev/urandom gives good random numbers, but using it is error-prone,
and opening and closing the device every time a number is needed
takes time.

Instead, initialize the library with our seed by calling srandom(),
and then use random(). Export a boolean variable
liblustreapi_initialized to let applications check that the library
was properly initialized by the loader.
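
A minimal sketch of the seed-once pattern, assuming a GCC/Clang
constructor is what runs at load time. The flag and function names
are hypothetical stand-ins, and the seed mix is an assumption; the
library's real seed derivation is not shown in this log.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the exported "initialized" flag. */
bool example_initialized;

/* Run by the loader before main() (GCC/Clang extension). */
static __attribute__((constructor)) void example_init(void)
{
	/* Assumed seed mix, for illustration only. */
	srandom((unsigned int)(time(NULL) ^ getpid()));
	example_initialized = true;
}

/* Return a pseudo-random value in [0, limit); limit must be > 0. */
long example_random_below(long limit)
{
	return random() % limit;
}
```

The modulo introduces a slight bias for limits that do not divide the
range of random() evenly, which is usually acceptable for
non-cryptographic uses such as picking a starting index.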

Signed-off-by: frank zago <fzago@cray.com>
Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: Ie6ced0d39df29d7054919e239add58a23115ec35
Reviewed-on: http://review.whamcloud.com/13277
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-1214 ptlrpc: start minimum service threads

If the ptlrpc_min_threads parameter is changed via /proc after the
service has started, then at least the requested number of service
threads should be started. Otherwise this parameter would only be
used at initial thread startup and ignored if changed via /proc.

Fix conf-sanity.sh test_52[ab] to verify that at least the minimum
number of threads has been started when threads_min parameter is
changed, instead of just checking the parameter itself. Also fix
test code style for 80-column line wrapping and tabs for indents.

The head utility does not always support the shortcut "-1" option, so
the option should be specified as "-n1".

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I6e4bb4131d7500a93952b64102f885c76558cab0
Reviewed-on: http://review.whamcloud.com/2876
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5816 target: don't trigger watchdog waiting in recovery

In target_recovery_thread, the process should not be considered to be
in the "blocked" state while it is waiting for something to happen.
Otherwise, the kernel watchdog will print:

task tgt_recov:19764 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
tgt_recov D 0000000000000003 0 19764 2 0x00000000
Call Trace:
check_for_clients+0x0/0x70 [ptlrpc]
target_recovery_overseer+0x9d/0x230 [ptlrpc]
exp_connect_healthy+0x0/0x20 [ptlrpc]
autoremove_wake_function+0x0/0x40
target_recovery_thread+0x0/0x1920 [ptlrpc]

Change-Id: Ic1ad4dce1df974dd99e0b28cee211de173d178e5
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/12672
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>

LU-6147 lfsck: NOT purge object by OI scrub

Originally, when the OI scrub found an inconsistent FID mapping, it
would repair the FID mapping and ask others to reload the object by
purging it. Such behavior may cause others to hang. If the object
corresponding to the FID has already been established in RAM, and
someone else holds a reference on it (for example, the LFSCK engine
holds .lustre/lost+found/MDTxxxx), then purging the object sets
LU_OBJECT_HEARD_BANSHEE on it. Any subsequent lookup of that FID is
then blocked until the object's reference count drops to zero and the
object is re-established in RAM. Unfortunately, if the reference
holder itself tries to find the same object, it will block on itself
forever.

On the other hand, on the server side, the OI scrub will repair the
bad OI mapping. If the object was established in RAM before its bad
FID mapping was repaired, then it must be marked as non-existent, and
should not be cached in RAM after the last reference is released.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I651ef5f5e8f4f478f07bcbb5622b345deed7cb31
Reviewed-on: http://review.whamcloud.com/13493
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6031 test: Check server version in recovery-small test 10d

Test should check server version for interoperability needs.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I3b46ba9291c8c64cc3d3c235c0985f88df23f633
Reviewed-on: http://review.whamcloud.com/13557
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>

LU-6171 kernel: kernel update [RHEL7 3.10.0-123.20.1.el7]

Update RHEL7 kernel to 3.10.0-123.20.1.el7

Test-Parameters: clientdistro=el7 mdsfilesystemtype=ldiskfs\
mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ieb1e8a2bb4cd86268721af91dd15d2c5bc69d0bf
Reviewed-on: http://review.whamcloud.com/13570
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-3536 osd: allocate it for each iteration.

Add the osd iteration structure (osd_it_ea) to a dedicated slab cache,
and allocate a new osd_it_ea for each iteration, so that iterations
can be nested, which will help DNE and LFSCK.

Since iam and quota iterations are infrequent, just allocate them
with the normal OBD_ALLOC_PTR.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I6402259708264f9341f314e7a2f6afe16cc66481
Reviewed-on: http://review.whamcloud.com/13223
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5971 llite: rename ccc_req to vvp_req

Rename struct ccc_req to struct vvp_req and move related functions
from lustre/llite/lcommon_cl.c to the new file lustre/llite/vvp_req.c.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I6589cd1e039b41e55fcd833476f6a58ff2492900
Reviewed-on: http://review.whamcloud.com/13377
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-6003 lnet: improvement to router checker

This patch always starts the router checker thread.

The router checker only checks routes by ping if
live_router_check_interval or dead_router_check_interval are set
to something other than 0, and there are routes configured.

If these conditions are not met, the router checker sleeps until it is
woken up when a route is added. It is also woken up whenever the RC is
being stopped, to ensure the thread doesn't hang.
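
The condition described above can be written as a small predicate.
This is only a sketch: the function and parameter names are
hypothetical, and the real checker reads its intervals and route
count from LNet state rather than from arguments.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: ping routers only when at least one of the
 * check intervals is non-zero and at least one route is configured.
 * Otherwise the checker thread would go back to sleep.
 */
bool example_rc_should_ping(int live_interval, int dead_interval,
			    int nroutes)
{
	return (live_interval != 0 || dead_interval != 0) && nroutes > 0;
}
```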

In the future when DLC starts configuring the live and dead
router_check_interval parameters, then by manipulating them
the router checker can be turned on and off by the user.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I778690755e7121abd575f1a261637cb6dc754edd
Reviewed-on: http://review.whamcloud.com/13035
Tested-by: Jenkins
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5823 clio: add cl_object_find_cbdata()

* Delete obsolete obd_ops::o_find_cbdata interface.
* Delete obsolete obd_ops::o_change_cbdata interface.
* Add cl_object_find_cbdata().

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I2e64e2e9a112783cb5c66bf4580fd1aec794417b
Reviewed-on: http://review.whamcloud.com/12494
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

LU-5971 llite: move vvp_io functions to vvp_io.c

Move all vvp_io related functions from lustre/llite/lcommon_cl.c to
the sole file where they are used, lustre/llite/vvp_io.c.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I5b7d9671a32aaff7a2ebce42b0f5ff10e2eeb4ab
Reviewed-on: http://review.whamcloud.com/13376
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

New tag 2.6.93

Change-Id: I826747da53ed1d9b0b2417b7b597dab3b76088a3