Whamcloud - gitweb
fs/lustre-release.git
4 years ago LU-6070 libcfs: provide separate buffers for libcfs_*2str() 85/13185/6
Dmitry Eremin [Thu, 25 Dec 2014 12:12:02 +0000 (15:12 +0300)]
LU-6070 libcfs: provide separate buffers for libcfs_*2str()

Provide duplicates with separate buffers for libcfs_*2str() functions.

Replace libcfs_nid2str() with libcfs_nid2str_r() function in critical
places.

Provide buffer size for nf_addr2str functions.

Use __u32 as nf_type always

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I7505271954745d1b1e288ef4e09a7f52bd970536
Reviewed-on: http://review.whamcloud.com/13185
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
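
The reason for the separate-buffer variants in LU-6070 can be sketched as follows. This is a minimal illustration of the caller-supplied-buffer pattern, not the libcfs code: demo_id2str() and demo_id2str_r() are hypothetical names, the single static buffer is a simplification, and the "addr@netN" format is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_STR_SIZE 32

/* Non-reentrant form: every call returns the same static buffer, so a
 * second call overwrites the result of the first. */
static char *demo_id2str(unsigned int addr, unsigned int net)
{
	static char buf[DEMO_STR_SIZE];

	snprintf(buf, sizeof(buf), "%u@net%u", addr, net);
	return buf;
}

/* Reentrant form: the caller owns the buffer and passes its size. */
static char *demo_id2str_r(unsigned int addr, unsigned int net,
			   char *buf, size_t buf_size)
{
	snprintf(buf, buf_size, "%u@net%u", addr, net);
	return buf;
}
```

With the _r form, two results can be held at the same time (for example in one log statement) because each lives in its own buffer.
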
4 years ago LU-5843 tests: fix recovery-small test_61 53/12653/6
Mikhail Pershin [Mon, 10 Nov 2014 17:12:39 +0000 (20:12 +0300)]
LU-5843 tests: fix recovery-small test_61

The test treated the obdfilter last_id as a number, but it is an OID,
e.g. 0x100000000:16. The patch fixes the test to extract the object ID
from the OID.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: If921cf41253450ab035a75be6fb34145aee1a197
Reviewed-on: http://review.whamcloud.com/12653
Tested-by: Jenkins
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
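
The extraction described in LU-5843 can be illustrated in C, although the actual fix is in a shell test script. This sketch assumes the value has the form "<seq>:<id>" with a decimal ID after the colon, as in the example 0x100000000:16; demo_last_id_to_objid() is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the object ID from a "<seq>:<id>" string, or -1 when there is
 * no colon or no digits follow it. The ID is assumed to be decimal. */
static long long demo_last_id_to_objid(const char *last_id)
{
	const char *colon = strchr(last_id, ':');
	char *end;
	unsigned long long id;

	if (colon == NULL)
		return -1;
	id = strtoull(colon + 1, &end, 10);
	if (end == colon + 1)
		return -1;
	return (long long)id;
}
```
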
4 years ago LU-4839 tests: wait for copytool start sanity-hsm/60 31/13731/2
Nathaniel Clark [Wed, 11 Feb 2015 13:48:12 +0000 (08:48 -0500)]
LU-4839 tests: wait for copytool start sanity-hsm/60

Wait for copytool to start the transfer before checking the progress
interval. In certain environments (heavily loaded NFS-backed target),
copytool can take an extraordinarily long time (>30s) to open the
destination file.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I56908a16240b61a51fe1395a8104eddc6aa3131f
Reviewed-on: http://review.whamcloud.com/13731
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6137 ldiskfs: simplify nocmtime patch 05/13705/2
Niu Yawei [Tue, 10 Feb 2015 03:21:00 +0000 (22:21 -0500)]
LU-6137 ldiskfs: simplify nocmtime patch

Simplify the nocmtime patch by patching only ext4_current_time().
This fixes a defect in the original patch, which did not handle the
setacl code path, and avoids the risk of future changes adding new
places that need to be fixed.

Remove the obsolete xattr-no-update-ctime patch.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I02928c4f867e9476f0bc1815dd3256e3d79dadf7
Reviewed-on: http://review.whamcloud.com/13705
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
4 years ago LU-4223 tests: fix conf-sanity test_32 typo 65/13265/2
Andreas Dilger [Wed, 7 Jan 2015 10:16:10 +0000 (03:16 -0700)]
LU-4223 tests: fix conf-sanity test_32 typo

The t32_wait_til_devices_gone() function incorrectly calls
"lctl devices_list" instead of "lctl device_list" if there
is a timeout waiting for the loop devices to be cleaned up.
Since this is only used for debugging output after an error,
it wasn't actually causing any additional failures.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I858e789b16251835bce7af46e4f5233c95500c1e
Reviewed-on: http://review.whamcloud.com/13265
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6038 osd-zfs: sa_spill_alloc()/sa_spill_free() compat 97/13097/6
Brian Behlendorf [Wed, 17 Dec 2014 00:14:17 +0000 (16:14 -0800)]
LU-6038 osd-zfs: sa_spill_alloc()/sa_spill_free() compat

The sa_spill_alloc()/sa_spill_free() interfaces have been retired.
Callers may either use the more memory efficient zio_buf_alloc()/
zio_buf_free() which are now exported, or they may use their own
allocator.

For the purposes of this patch an osd_zio_buf_alloc()/
osd_zio_buf_free() wrapper function was introduced which layers
on whichever interface is provided.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Change-Id: Id1d19d7b4c440808b8b3fd042f687b10c1b869f2
Reviewed-on: http://review.whamcloud.com/13097
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
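
The wrapper idea in LU-6038 (layering on whichever interface is provided) might look roughly like this. It is a user-space sketch under stated assumptions: HAVE_DEMO_NATIVE_BUF_ALLOC is a hypothetical configure-time macro, the demo_* names are invented for the example, and malloc()/free() stand in for "their own allocator".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* HAVE_DEMO_NATIVE_BUF_ALLOC is a hypothetical configure-time macro. */
#ifdef HAVE_DEMO_NATIVE_BUF_ALLOC
/* Preferred path: use the exported buffer allocator directly. */
void *demo_native_buf_alloc(size_t size);
void demo_native_buf_free(void *buf, size_t size);
#define demo_buf_alloc(size)		demo_native_buf_alloc(size)
#define demo_buf_free(buf, size)	demo_native_buf_free(buf, size)
#else
/* Fallback path: the caller's own allocator, here malloc()/free(). */
static inline void *demo_buf_alloc(size_t size)
{
	return malloc(size);
}

static inline void demo_buf_free(void *buf, size_t size)
{
	(void)size;	/* kept so both branches share one signature */
	free(buf);
}
#endif
```

Callers only ever use demo_buf_alloc()/demo_buf_free(), so the choice of backend stays in one place.
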
4 years ago LU-6038 osd-zfs: Avoid redefining KM_SLEEP 96/13096/5
Brian Behlendorf [Tue, 16 Dec 2014 23:24:28 +0000 (15:24 -0800)]
LU-6038 osd-zfs: Avoid redefining KM_SLEEP

Due to some long overdue memory management cleanup in the ZoL kmem
implementation the definition of KM_SLEEP has changed.  This change
was expected to be transparent to consumers but it causes issues
for Lustre because it explicitly redefines KM_SLEEP.  This was
originally done to avoid overriding the Linux slab interfaces.

This change implements a more portable fix.  Instead of preventing
the inclusion of the kmem.h header by setting the guard, the
kmem_cache_* preprocessor macros are explicitly undefined to make
the Linux interface available.

The related ZoL pull requests are as follows:

  https://github.com/zfsonlinux/spl/pull/414
  https://github.com/zfsonlinux/zfs/pull/2918

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Change-Id: Id1d19d7b4c440808b8b3fd042f687b10c1b869f3
Reviewed-on: http://review.whamcloud.com/13096
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-2828 test: Remove tests from ALWAYS_EXCEPT list 57/13757/2
James Nunez [Fri, 13 Feb 2015 17:12:30 +0000 (10:12 -0700)]
LU-2828 test: Remove tests from ALWAYS_EXCEPT list

conf-sanity tests 59 and 64 were added to the ALWAYS_EXCEPT list
in commit b2c829b7be757cd2bc523ab0d2857a77eeb7a349 for LU-2469.

Commit 1e7845ecbe5f3e8ac1aa0d3e345e6cf6cf6f0543, for LU-2828, resolves
the cause of the conf-sanity test 59 and 64 failures.

conf-sanity test 59 and 64 need to be removed from the except list.

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I4b70485f91e0096c2e4387ebcdc95cf5720a7e16
Reviewed-on: http://review.whamcloud.com/13757
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-2194 test: remove test_19b from except list 71/13671/2
Hongchao Zhang [Wed, 19 Nov 2014 15:45:33 +0000 (23:45 +0800)]
LU-2194 test: remove test_19b from except list

The patch to fix the problem has landed, so test_19b in
recovery-small should be removed from the except list.

Change-Id: I748a7dfb4f70a42a0f17ab93803cb2d6d05b32db
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/13671
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-3680 osd: reduce osd_thread_info in ldiskfs osd 26/9726/26
Alex Zhuravlev [Wed, 19 Mar 2014 08:20:16 +0000 (12:20 +0400)]
LU-3680 osd: reduce osd_thread_info in ldiskfs osd

Reduce the size of osd_thread_info by placing a few rarely used
fields in a union. The structure should now fit in a page:

(gdb) p sizeof(struct osd_thread_info)
$1 = 3296

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: I75d5c6fefa41884390ce155781e0963884a3ad2c
Reviewed-on: http://review.whamcloud.com/9726
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
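
The size reduction in LU-3680 relies on fields that are never needed at the same time sharing storage. A minimal sketch of that technique follows; the demo_* structures and their sizes are invented for illustration and do not reflect the real osd_thread_info layout, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>

struct demo_scratch_a { char name[256]; };
struct demo_scratch_b { char path[512]; };

/* Before: each scratch area has its own storage. */
struct demo_info_separate {
	int			di_flags;
	struct demo_scratch_a	di_a;
	struct demo_scratch_b	di_b;
};

/* After: areas that are never live at the same time share storage. */
struct demo_info_union {
	int			di_flags;
	union {
		struct demo_scratch_a	di_a;
		struct demo_scratch_b	di_b;
	} u;
};

/* Compile-time guard that the structure still fits in an assumed
 * 4096-byte page. */
_Static_assert(sizeof(struct demo_info_union) <= 4096,
	       "demo_info_union must fit in one page");
```

The union is only safe when no code path uses two of the overlapped fields at once, which is why the commit restricts it to rarely used fields.
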
4 years ago LU-5264 obdclass: fix race during key quiescency 03/13103/3
Bruno Faccini [Wed, 17 Dec 2014 09:57:07 +0000 (10:57 +0100)]
LU-5264 obdclass: fix race during key quiescency

Upon umount, presumably of the last device using the same OSD back-end,
lu_context_key_quiesce() is run to prepare for module unload. It
removes all of the module's key references from every context linked
on the lu_context_remembered list.
Threads must protect against this list traversal when exiting from
their context.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: If2c8199fa764236308b49950672129a63b8877f5
Reviewed-on: http://review.whamcloud.com/13103
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6301 llite: cleanup open handle for client open failure 09/13709/13
Fan Yong [Sun, 30 Nov 2014 00:26:29 +0000 (08:26 +0800)]
LU-6301 llite: cleanup open handle for client open failure

In the open case, the client-side open handling thread may hit an error
after the MDT has granted the open. In that case, the client should
send a close RPC to the MDT as cleanup; otherwise, the open handle
on the MDT will be leaked until the client is unmounted or evicted.

If LFSCK marks LU_OBJECT_HEARD_BANSHEE on an MDT-object that is
opened by others in order to repair some inconsistency, such as a
multiple-referenced OST-object, the leaked open handle still
references the MDT-object, which blocks subsequent threads
that want to locate the object via FID.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I1fff2cde179b039e3bee562ef79d5cf3587fe3c8
Reviewed-on: http://review.whamcloud.com/13709
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6280 lod: delete xattr on striped dir 67/13867/2
wang di [Tue, 24 Feb 2015 04:22:03 +0000 (20:22 -0800)]
LU-6280 lod: delete xattr on striped dir

lod_xattr_del() needs to delete the EA on all stripes of a
striped directory.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I398a03d6a41daee34a344104d67cf8efa7d97f6a
Reviewed-on: http://review.whamcloud.com/13867
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6312 lfsck: modify llsd_master_list with spin_lock 21/13921/2
Fan Yong [Thu, 4 Dec 2014 14:00:50 +0000 (22:00 +0800)]
LU-6312 lfsck: modify llsd_master_list with spin_lock

There was a spin_lock leak in the layout LFSCK function
lfsck_layout_slave_quit() that could cause
lfsck_layout_slave_data::llsd_master_list to be modified without the
spin_lock while others traverse the list with the spin_lock held. The
traversing threads could then access invalid RAM or fall into a
soft lockup.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I61749ebd6c36d4b21eb20bcc1c46dbe16a1c7f2c
Reviewed-on: http://review.whamcloud.com/13921
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6256 test: Skip sanity test_184e if MDS version older than 2.6.94 45/13845/4
Wei Liu [Mon, 23 Feb 2015 21:19:05 +0000 (13:19 -0800)]
LU-6256 test: Skip sanity test_184e if MDS version older than 2.6.94

Skip sanity test_184e if the MDS version is older than 2.6.94.

Change-Id: Ib491b079a3adc998a12d9bbcb7985ad2e718453b
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/13845
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
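
The version gate in LU-6256 lives in a shell test script; the comparison it performs can be sketched in C as below. The packing used by demo_version_code() is a hypothetical scheme chosen for the example (each component is assumed to be below 256), and both function names are invented.

```c
#include <assert.h>

/* Hypothetical packing of major.minor.patch into one comparable
 * integer; each component is assumed to be below 256. */
static unsigned int demo_version_code(unsigned int major, unsigned int minor,
				      unsigned int patch)
{
	return (major << 16) | (minor << 8) | patch;
}

/* Return 1 when the test should be skipped for this MDS version. */
static int demo_should_skip_184e(unsigned int mds_version)
{
	return mds_version < demo_version_code(2, 6, 94);
}
```
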
4 years ago LU-5938 mdd: fixed oops when dereferencing structure 19/13619/5
Frank Zago [Tue, 3 Feb 2015 18:37:00 +0000 (12:37 -0600)]
LU-5938 mdd: fixed oops when dereferencing structure

In mdd_changelog_ns_store() and mdd_changelog_data_store(),
lu_ucred(env) can be NULL, so do not dereference it.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I45d0cbbb171f05ee1d04e628a3b31c256e0d3951
Reviewed-on: http://review.whamcloud.com/13619
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
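
The guard described in LU-5938 amounts to checking the credential pointer before reading through it. A minimal sketch, assuming a caller-supplied fallback value is acceptable when no credentials exist; struct demo_ucred and demo_record_uid() are hypothetical, and what the real code records in the NULL case is not stated in the message.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ucred {
	unsigned int uc_uid;
};

/* Return the uid to record, or the fallback when no credentials are
 * attached to the environment (uc == NULL). */
static unsigned int demo_record_uid(const struct demo_ucred *uc,
				    unsigned int fallback)
{
	return uc != NULL ? uc->uc_uid : fallback;
}
```
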
4 years ago Revert "LU-5417 lfs: fix comparison between signed and unsigned" 03/13903/2
Oleg Drokin [Fri, 27 Feb 2015 07:43:55 +0000 (07:43 +0000)]
Revert "LU-5417 lfs: fix comparison between signed and unsigned"

This change is incorrect after all. While it is a no-op on x86_64, it
is a very important overflow check for 32-bit architectures.

This reverts commit b5b354a75b5e697e90892878ecb26459cb9a6a21.

Change-Id: I8810da3407d91e63c6e1c062a483a26ffc1bcd97
Reviewed-on: http://review.whamcloud.com/13903
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
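
Why a range check can be a no-op on x86_64 yet essential on 32-bit architectures can be shown with a generic example. This is not the reverted lfs code, whose exact form is not given above; the two helpers are hypothetical, and uint32_t stands in for a 32-bit "unsigned long".

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Return 1 if value can be stored in a 32-bit unsigned word. uint32_t
 * stands in for "unsigned long" on a 32-bit architecture. */
static int demo_fits_in_32bit_word(unsigned long long value)
{
	return value <= UINT32_MAX;
}

/* The same check written against the native unsigned long: always
 * true where unsigned long is 64 bits wide, meaningful where it is
 * 32 bits wide. */
static int demo_fits_in_ulong(unsigned long long value)
{
	return value <= ULONG_MAX;
}
```

A compiler warning about an "always true" comparison on a 64-bit build is therefore not evidence that the check is dead code everywhere.
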
4 years ago LU-5912 build: Fix XeonPhi build 30/13730/2
Dmitry Eremin [Wed, 11 Feb 2015 13:39:31 +0000 (16:39 +0300)]
LU-5912 build: Fix XeonPhi build

Need an extra check for old kernel style parameters.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I92b1b8579d2190bf526b3194cd83d0917fb3b4af
Reviewed-on: http://review.whamcloud.com/13730
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6216 tests: compile fixes for PPC64, and for clang 00/13800/2
Frank Zago [Thu, 19 Feb 2015 00:56:34 +0000 (18:56 -0600)]
LU-6216 tests: compile fixes for PPC64, and for clang

Fix the following warnings for PPC64:
  llapi_hsm_test.c: In function 'test101_progress':
  llapi_hsm_test.c:563: error: format '%llu' expects type 'long long
    unsigned int', but argument 8 has type '__u64'

and move the nested functions outside their current functions since
clang doesn't support them.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I034b097f3817a5919adcb8dc3465b00833174f63
Reviewed-on: http://review.whamcloud.com/13800
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
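
The format warning quoted in LU-6216 arises because a 64-bit unsigned type may be defined as "unsigned long" rather than "unsigned long long" on some 64-bit ABIs. One common remedy is an explicit cast, sketched below with uint64_t standing in for __u64; whether the patch uses a cast or another approach is not stated in the message, and demo_format_u64() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit value with "%llu". The cast makes the argument type
 * match the conversion regardless of how uint64_t is defined. */
static int demo_format_u64(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```
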
4 years ago Move master branch to 2.8 development 2.7.50 v2_7_50 v2_7_50_0
Oleg Drokin [Fri, 27 Feb 2015 05:56:17 +0000 (00:56 -0500)]
Move master branch to 2.8 development

Change-Id: If8635d108b6a10b02e01b747b694bdfab4594ba2
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6263 lmv: fix parent FID for migration 17/13817/8
wang di [Wed, 18 Feb 2015 18:58:29 +0000 (10:58 -0800)]
LU-6263 lmv: fix parent FID for migration

If the directory being migrated is under a striped directory, the
correct stripe FID must be set for its parent.

Update migration test script (sanity test_230) to do migration
under striped dir.

Add -i to test_mkdir().

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ic230f9b63bc21c1391e397a0d3ff689e3f0ba5dc
Reviewed-on: http://review.whamcloud.com/13817
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Jenkins
4 years ago LU-6230 lfsck: reload OSP-object via set LOV EA on LOD-object 48/13848/2
Fan Yong [Sun, 30 Nov 2014 00:22:10 +0000 (08:22 +0800)]
LU-6230 lfsck: reload OSP-object via set LOV EA on LOD-object

Generally, we should use the bottom device (OSD) to update the parent
LOV EA. However, the LOD-object still references the wrong
OSP-object, which should be detached after the parent's LOV EA is
refreshed, and there is no suitable API for that.
So we have to make the LOD reload the OSP-object(s) by
replacing the LOV EA on the LOD-object.

Once the DNE2 patches have landed, we can replace the
LOD device with the OSD device.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I960f42dacc8ee23dd98a2b986f0a83cb53b62c15
Reviewed-on: http://review.whamcloud.com/13848
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6138 lfsck: set async windows size properly 18/13818/5
Fan Yong [Mon, 24 Nov 2014 21:02:28 +0000 (05:02 +0800)]
LU-6138 lfsck: set async windows size properly

If the async window size is set to zero, the LFSCK main engine
on the MDT will pre-load objects as fast as possible. In that case,
if the peer server(s) cannot handle the pre-load requests in time,
many pre-load requests will queue up on the MDT and cause memory
pressure. To avoid this, setting the LFSCK async window size to zero
or to a value that is too large (> LFSCK_ASYNC_WIN_MAX) is forbidden.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I3468236a4a0705ea60b49704583b051c99c77cd5
Reviewed-on: http://review.whamcloud.com/13818
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
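
The bound check described in LU-6138 can be sketched as a small validation helper. The value of DEMO_ASYNC_WIN_MAX is a placeholder (the real LFSCK_ASYNC_WIN_MAX is not given above), demo_set_async_window() is a hypothetical name, and rejecting with -EINVAL rather than clamping is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder bound; the real LFSCK_ASYNC_WIN_MAX is not given above. */
#define DEMO_ASYNC_WIN_MAX 1024

/* Accept only window sizes in [1, DEMO_ASYNC_WIN_MAX]. On rejection
 * the current window size is left untouched. */
static int demo_set_async_window(unsigned int *win, unsigned int requested)
{
	if (requested == 0 || requested > DEMO_ASYNC_WIN_MAX)
		return -EINVAL;
	*win = requested;
	return 0;
}
```
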
4 years ago LU-5791 lfsck: use bottom device to locate object 92/13392/10
Fan Yong [Mon, 24 Nov 2014 09:57:48 +0000 (17:57 +0800)]
LU-5791 lfsck: use bottom device to locate object

For an LFSCK modification that updates only a single object, or
objects that all reside on the same server (whether local or remote),
locate the object(s) against the bottom (OSD or OSP) device.
Otherwise, some updates are on the local server and others on a
remote server, so either locate the object(s) against the LOD device
or use two transactions for the modification.

Similarly, the transaction handle will be created on the proper device
corresponding to the object(s).

This patch also fixes some memory leaks caused by using the wrong
device for remote modification, one of the causes of LU-6138.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I09a60bed3bd49a193d57214c4252904cb4546ab2
Reviewed-on: http://review.whamcloud.com/13392
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6231 osp: prepare OUT RPC after remote transaction start 10/13710/4
Fan Yong [Thu, 20 Nov 2014 05:21:40 +0000 (13:21 +0800)]
LU-6231 osp: prepare OUT RPC after remote transaction start

According to our current transaction/dt_object_lock framework
(which makes cross-MDT modification workable for DNE1),
the transaction sponsor starts the transaction first, then
tries to acquire the related dt_object_lock if needed. Under these
rules, preparing the OUT RPC in the transaction declare phase requires
the related attr/xattr to be known without the dt_object_lock. That
condition may not hold for some remote transactions. For
example:

In the linkEA repair (by LFSCK) case, before the LFSCK thread obtains
the dt_object_lock on the target MDT-object, it cannot know whether
the MDT-object has a linkEA, or whether that linkEA is valid.

Since the LFSCK thread cannot hold the dt_object_lock before the remote
transaction starts (otherwise there is a potential deadlock),
it cannot prepare the related OUT RPC for the repair during the declare
phase as other normal transactions do.

To resolve this, OSP prepares the related OUT RPC after the remote
transaction has started, and triggers the remote update
(sends the RPC) at trans_stop. Upper-layer users, such as LFSCK,
can then follow the general rule for trans_start/dt_object_lock
when repairing linkEA inconsistency, without treating remote
MDT-objects specially.

In fact, the above solution for remote transactions should be the normal
model without considering DNE1. The trouble brought by DNE1 will be
resolved in DNE2. At that time, this patch can be removed.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ib2ed4c290c9ae12b6f544575aa5313f0dc83a5af
Reviewed-on: http://review.whamcloud.com/13710
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-5914 lfsck: dt_index_try before dt_lookup 01/13801/3
Fan Yong [Mon, 24 Nov 2014 04:08:30 +0000 (12:08 +0800)]
LU-5914 lfsck: dt_index_try before dt_lookup

Call dt_index_try() before dt_lookup(); otherwise dt_lookup() may
LBUG when locating a parent directory MDT-object that is not in
the cache.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ibbed865e58d8f9a4d4b67265b02ba804efb9719e
Reviewed-on: http://review.whamcloud.com/13801
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years ago LU-5275 gnilnd: Add definition for PDE_DATA 27/13527/4
James Simmons [Tue, 10 Feb 2015 23:37:30 +0000 (18:37 -0500)]
LU-5275 gnilnd: Add definition for PDE_DATA

With the move of PDE_DATA to lprocfs_status.h, one klnd driver,
gnilnd, was left needing this define. The simple solution is to
include the needed header.

Change-Id: I0b2bbc8d2efeab8e253f11b0e58df51c0002d5ae
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13527
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-5604 tgt: return missed fail ids 32/12232/7
Liang Zhen [Mon, 17 Nov 2014 15:35:54 +0000 (23:35 +0800)]
LU-5604 tgt: return missed fail ids

OBD_FAIL_LDLM_REPLY is missing from tgt_enqueue, and it's actually
not suitable for tgt_enqueue anymore because tgt_enqueue() is a
common handler now.

This patch includes a few changes:
- tgt_enqueue sets tgt_session_info::tsi_reply_fail_id to
  OBD_FAIL_MGS/MDS/OST_LDLM_REPLY_NET based on type of target.

- rewrite test_52 of replay-single; the only reason test_52
  could pass is a typo:

  $CHECKSTAT -t file $DIR/$tfile-* which should be $DIR/$tfile

- add definitions for OBD_FAIL_LDLM_SRV_CP/BL/GL_AST and resolve
  OBD_FAIL conflicts

- OBD_FAIL_UPDATE_OBJ_NET_REP was renamed to
  OBD_FAIL_OUT_UPDATE_NET_REP but referenced with old name in tests.

- The OBD_FAIL_MDS_FAIL_LOV_LOG_ADD check is obsolete, as are its tests.
  Meanwhile the OSP code was updated to fix a panic in case of error.

- OBD_FAIL_TGT_LAST_REPLAY is removed along with test. It was never
  used and it seems it was even introduced by mistake.

Test-Parameters: envdefinitions=SLOW=yes alwaysuploadlogs testlist=replay-dual,replay-single
Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: If5113e459f5628047e17114b6bc20ba910f3c142
Reviewed-on: http://review.whamcloud.com/12232
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6138 lfsck: NOT hold reference on pre-loaded object 66/13666/5
Fan Yong [Thu, 20 Nov 2014 04:55:33 +0000 (12:55 +0800)]
LU-6138 lfsck: NOT hold reference on pre-loaded object

To improve LFSCK performance, the LFSCK main engine pre-loads
the object locally or remotely, generates the related LFSCK request
that references the pre-loaded object, and pushes the request into
the related LFSCK pipeline. The LFSCK assistant thread handles the
LFSCK request asynchronously some time later.

Originally, the LFSCK request held a reference on the pre-loaded
object so that the assistant thread could handle it directly without
locating the object by FID again. But holding the object reference
prevents the object from being purged from RAM. If an LFSCK request
holds the object and another thread unlinks the object before the
LFSCK assistant thread handles the request, the unlinked object stays
cached in RAM until the last reference is released. Because the LFSCK
main engine and assistant thread run asynchronously, we do not know
when the LFSCK request holding the object reference will be
handled. If the assistant thread needs to locate the object with
the same FID before that, it will deadlock against itself forever.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I516653aa2143bb32a5f350b314951b78dead3e79
Reviewed-on: http://review.whamcloud.com/13666
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6235 scrub: replace the stale OI mapping 45/13745/3
Fan Yong [Thu, 20 Nov 2014 04:54:59 +0000 (12:54 +0800)]
LU-6235 scrub: replace the stale OI mapping

If the OI mapping on the OST contains an invalid entry, the OI
lookup via osd_obj_map_lookup() may return -ENOENT. From the point of
view of OI scrub, this is indistinguishable from the case where no
such OI mapping exists, so OI scrub wrongly uses the "INSERT"
@ops for osd_scrub_refresh_mapping() to repair the inconsistency.
osd_obj_map_lookup() should therefore return -ESTALE when
an invalid OI mapping exists, so that OI scrub can use the
"UPDATE" @ops for osd_scrub_refresh_mapping() to repair it.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I013125eb0aaec683ac8f56ec32a30e7858262f87
Reviewed-on: http://review.whamcloud.com/13745
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
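
The distinction drawn in LU-6235 between "no mapping" and "stale mapping" can be sketched as a dispatch on the lookup return code. The enum and demo_pick_repair_op() are hypothetical; only the -ENOENT/INSERT and -ESTALE/UPDATE pairing comes from the message above.

```c
#include <assert.h>
#include <errno.h>

enum demo_scrub_op {
	DEMO_OP_NONE,
	DEMO_OP_INSERT,
	DEMO_OP_UPDATE,
};

/* Choose the repair operation from the lookup result. */
static enum demo_scrub_op demo_pick_repair_op(int lookup_rc)
{
	switch (lookup_rc) {
	case -ENOENT:	/* no mapping at all: add one */
		return DEMO_OP_INSERT;
	case -ESTALE:	/* a mapping exists but is invalid: overwrite it */
		return DEMO_OP_UPDATE;
	default:	/* valid mapping or unrelated error: nothing to pick */
		return DEMO_OP_NONE;
	}
}
```
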
5 years ago LU-6239 doc: include missing lnetctl.8 49/13749/2
James Simmons [Thu, 12 Feb 2015 19:14:50 +0000 (14:14 -0500)]
LU-6239 doc: include missing lnetctl.8

Running "man lnetctl" currently doesn't work on a system
with Lustre installed, because lnetctl.8 is not
included in the generated RPMs. This simple fix
ensures lnetctl.8 is included in the RPMs.

Change-Id: I72e2ef2841f5936e1d0def538c239ee2da32d7c3
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13749
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Jenkins
Reviewed-by: Isaac Huang <he.huang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-5873 ldiskfs: osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed 00/12600/9
Andriy Skulysh [Mon, 10 Nov 2014 10:48:11 +0000 (12:48 +0200)]
LU-5873 ldiskfs: osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed

The bug happens when the 16TB-4KB limit is exceeded during a write.

Add check for maximum file size on client and server sides.

Xyratex-bug-id: MRP-2131
Change-Id: I73f0ee803670ada869c2618f275049948668848e
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: http://review.whamcloud.com/12600
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
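
A maximum-file-size check of the kind LU-5873 adds might look like the sketch below. It assumes "16TB-4KB" means 16 TiB minus 4 KiB; demo_check_write_range() is a hypothetical helper, and returning -EFBIG is an assumption about the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* 16 TiB - 4 KiB, the limit quoted in the commit message. */
#define DEMO_MAX_FILE_SIZE ((UINT64_C(16) << 40) - 4096)

/* Return 0 if [offset, offset + count) stays within the limit, else
 * -EFBIG. Written as a subtraction so offset + count cannot wrap. */
static int demo_check_write_range(uint64_t offset, uint64_t count)
{
	if (offset > DEMO_MAX_FILE_SIZE ||
	    count > DEMO_MAX_FILE_SIZE - offset)
		return -EFBIG;
	return 0;
}
```

Doing the same check on both client and server, as the commit describes, lets the client fail early while the server still protects itself from other callers.
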
5 years ago LU-6222 statahead: add to list before make ready 08/13708/2
Lai Siyao [Tue, 10 Feb 2015 13:44:44 +0000 (21:44 +0800)]
LU-6222 statahead: add to list before make ready

__sa_make_ready() sets the entry ready before adding it to the list, so
revalidate_statahead_dentry()->sa_kill() may free an entry that
is not in any list yet.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I0b5f7200fb74c88450133d66bf7bf38d9355036f
Reviewed-on: http://review.whamcloud.com/13708
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-5549 mdc: cl_default_mds_easize not refreshed 14/11614/12
Ned Bass [Wed, 17 Dec 2014 00:05:42 +0000 (16:05 -0800)]
LU-5549 mdc: cl_default_mds_easize not refreshed

The client_obd::cl_default_mds_easize field should track the largest
observed EA size advertised by the MDT, subject to a reasonable upper
bound.  The MDC uses cl_default_mds_easize to calculate the initial
size of request buffers.  The default value should be small enough to
avoid wasted memory and excessive use of vmalloc(), yet large enough
to accommodate the common use case.

In the current code, the default value is only updated if
client_obd::cl_max_mds_easize is strictly less than
mdt_body::mbo_max_mdsize. This condition is almost never met, because
client_obd::cl_max_mds_easize is computed at client mount-time based
on the number of OSTs in the filesystem, so the MDT won't ever observe
and advertise an EA size larger than that.

As a result, client_obd::cl_default_mds_easize indefinitely retains
its initial value, which is computed at client mount-time based on
the filesystem's default stripe width. Any getattr() requests for
widely striped files will consequently allocate a request buffer
that is too small, forcing reallocations on both the client and
server side. To avoid this, update client_obd::cl_default_mds_easize
independently of the value of client_obd::cl_max_mds_easize.

In addition, this patch includes these changes:

- Add comments to the client_obd structure to clarify what the
  cl_{default,max}_mds_{cookie,ea}size values mean.

- Prevent mdc_get_info() from storing uninitialized data in
  client_obd::cl_max_mds_cookiesize.

- Use 4096 as an upper bound for the default values.  The former
  bound of PAGE_CACHE_SIZE is too large on 64k-page platforms
  (i.e. PPC), so it fails to prevent the vmalloc() spinlock
  contention described in LU-3338. The new value was chosen to
  be large enough to accommodate common use cases while staying
  well below the 16k threshold at which allocations start using
  vmalloc().

- Add test case 27E to ./lustre/tests/sanity.sh.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Kyle Blatter <kyleblatter@llnl.gov>
Change-Id: I363017844d6af3e6b67b7c03bd206226f9495116
Reviewed-on: http://review.whamcloud.com/11614
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
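
The update rule described in LU-5549 (track the largest observed EA size, subject to an upper bound of 4096) can be sketched as follows. demo_update_default_easize() and the macro name are hypothetical, and the real code operates on client_obd fields rather than plain integers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DEFAULT_EASIZE_CAP 4096u	/* upper bound named above */

/* Track the largest EA size seen so far, capped at the upper bound.
 * The update does not depend on any separate "max" value. */
static uint32_t demo_update_default_easize(uint32_t current, uint32_t observed)
{
	uint32_t next = observed > current ? observed : current;

	return next > DEMO_DEFAULT_EASIZE_CAP ? DEMO_DEFAULT_EASIZE_CAP : next;
}
```
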
5 years ago LU-5549 llite: make default_easize writeable in /proc 12/13112/7
Ned Bass [Wed, 17 Dec 2014 00:03:10 +0000 (16:03 -0800)]
LU-5549 llite: make default_easize writeable in /proc

Allow default_easize to be tuned via /proc. A system administrator
might want this if a rare access to widely striped files drives up the
value on a filesystem where narrowly striped files are the more common
case. In practice, however, this is wanted primarily to facilitate
a test case for LU-5549.

- Plumb the necessary interfaces through the LMV and MDC layers
  to expose write access to this value by higher layers.

- Add block comments to modified functions.

- Correct misspelling of "default" in /proc handler function names
  in lustre/llite/lproc_llite.c. The file names in /proc were already
  spelled correctly so there are no issues with backward
  compatibility.

- Convert remaining space-indented lines in lmv_set_info_async()
  to tabs.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Change-Id: Iae2c8d0ca28cccf12af9372b1a10a0f9d170fddf
Reviewed-on: http://review.whamcloud.com/13112
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
5 years agoLU-5523 mdt: add --index option to default dir stripe 60/13360/10
wang di [Fri, 16 Jan 2015 00:23:44 +0000 (16:23 -0800)]
LU-5523 mdt: add --index option to default dir stripe

Add an --index option to the default dirstripe EA. If the MDT finds
that the client sent the create request to the wrong MDT because of
the default stripeEA, it returns -EREMOTE; the client then retrieves
the default stripeEA through the xattr cache and re-creates the
object.

Add a delete option (-d) to remove a directory's default stripeEA.

Add ldo_dir_def_striping_cached and ldo_def_striping_cached to
indicate whether the default striping EA has been cached in
ldo_object.

ldo_striping_cached indicates whether the object's own striping
has been loaded from disk.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ic2896e9050f1581344db9368b8f7b25bfded3d7d
Reviewed-on: http://review.whamcloud.com/13360
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-4647 lctl: add nodemap man pages to lctl 78/13478/3
Kit Westneat [Wed, 21 Jan 2015 02:55:13 +0000 (21:55 -0500)]
LU-4647 lctl: add nodemap man pages to lctl

This patch adds separate man pages for the 8 lctl nodemap commands,
and updates the lctl man page to point to them.

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Ia1350471a2878a8f4057d66a91141ad8dd132bc2
Reviewed-on: http://review.whamcloud.com/13478
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6086 obdclass: check peer's version for MDT-MDT connection 85/13285/5
Fan Yong [Tue, 4 Nov 2014 09:32:15 +0000 (17:32 +0800)]
LU-6086 obdclass: check peer's version for MDT-MDT connection

Because the new DNE/LFSCK work changed parts of the wire protocol, we
cannot support interoperation between MDTs of different versions. The
basic rules for a permitted connection are:
1) The @major in the connection version should be the same;
2) The @minor in the connection version should be the same;
3) The difference of the @patch in the connection version should NOT
   be more than 3.
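
The three rules above can be sketched as a single predicate. This is a
minimal illustration only: the struct and function names are
hypothetical, and the real check operates on the connection version
carried in the connect data rather than on a struct like this one.

```c
#include <stdbool.h>

/* Hypothetical split-out form of a connection version. */
struct conn_version {
	unsigned int major;
	unsigned int minor;
	unsigned int patch;
};

/* Return true if two MDTs may connect under the three rules above. */
bool mdt_versions_compatible(struct conn_version a, struct conn_version b)
{
	unsigned int diff = a.patch > b.patch ? a.patch - b.patch
					      : b.patch - a.patch;

	return a.major == b.major && a.minor == b.minor && diff <= 3;
}
```

Under this sketch 2.6.94 may connect to 2.6.91, but not to 2.6.90 or
to 2.7.94.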

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9e77f305c7552ad01e92c97f1eda0756f1291d30
Reviewed-on: http://review.whamcloud.com/13285
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoLU-6199 ldiskfs: delete bad WARN_ON_ONCE from ldiskfs 04/13604/5
Bob Glossman [Tue, 3 Feb 2015 00:39:07 +0000 (16:39 -0800)]
LU-6199 ldiskfs: delete bad WARN_ON_ONCE from ldiskfs

Lustre needs to call certain ext4/ldiskfs entry points without locking
i_mutex in order to avoid deadlocks.  This triggers a warning check
in ext4 code that is new in el6.6 and not present in el6.5.  It is
already fixed in the ldiskfs patches for future kernel versions, but
was not fixed for el6.6.

This mod adds an ldiskfs patch to eliminate the warning.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ia375a6d851a5262c578d722e2f8f4db2ea5249b7
Reviewed-on: http://review.whamcloud.com/13604
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agonew tag 2.6.94 2.6.94 v2_6_94 v2_6_94_0
Oleg Drokin [Mon, 9 Feb 2015 06:24:32 +0000 (01:24 -0500)]
new tag 2.6.94

Change-Id: I7cbdaaa209cb5c3db1612f0f9f36ac6668906962

5 years agoLU-6084 ptlrpc: prevent request timeout grow due to recovery 20/13520/9
Mikhail Pershin [Tue, 3 Feb 2015 18:30:14 +0000 (10:30 -0800)]
LU-6084 ptlrpc: prevent request timeout grow due to recovery

Patch fixes the issue with growing request timeout which occurred
after commit 1d889090 for LU-5079. While the commit itself is correct,
it reveals another issue. If a request is processed for a long
time on the server, then client adaptive timeouts will adapt to that
after receiving the reply, and new requests will have a bigger timeout.
Another problem is that the server AT history is corrupted by recovery
request processing time, which is not pure service time but also
includes the time spent waiting for clients to recover.

Patch prevents the AT stats update from early replies on the client
and from the processing time of recovering requests on the server.
ptlrpc_at_recv_early_reply() still updates the current request
timeout as asked by the server, but doesn't include this in AT stats.
The real reply will bring that data from the server after all.

Test-Parameters: alwaysuploadlogs testlist=replay-vbr,replay-dual

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ifcadfd669162013b6ccb386eb2b508bd9f0b22d9
Reviewed-on: http://review.whamcloud.com/13520
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6082 tests: fix too slow nodemap SLOW test 05/13605/5
Kit Westneat [Sat, 7 Feb 2015 08:38:42 +0000 (00:38 -0800)]
LU-6082 tests: fix too slow nodemap SLOW test

The SLOW test for nodemap is too slow to complete. This patch changes
the test to do 000-007, 010-070, 100-700 (octal) instead of testing
all modes, as was done before.
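
For illustration, the reduced set of modes can be enumerated as below.
The actual test is a shell script; this C sketch uses a hypothetical
function name and assumes the three ranges are inclusive and step one
octal digit at a time.

```c
#include <stddef.h>

/*
 * Fill modes[] with 000-007, 010-070 and 100-700 (octal) and return
 * how many values were written (at most cap).
 */
size_t nodemap_test_modes(unsigned int *modes, size_t cap)
{
	size_t n = 0;

	/* shift 0 = "other" digit, 3 = "group" digit, 6 = "user" digit */
	for (unsigned int shift = 0; shift <= 6; shift += 3) {
		/* mode 000 is only emitted once, in the first range */
		for (unsigned int bits = (shift == 0 ? 0 : 1); bits <= 7; bits++) {
			if (n < cap)
				modes[n++] = bits << shift;
		}
	}
	return n;
}
```

Under these assumptions the function yields 8 + 7 + 7 = 22 modes.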

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes \
mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
mdtcount=1 testlist=sanity-sec

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Ic92a3718de078ccfd13cf0b6580ab078dfedb144
Reviewed-on: http://review.whamcloud.com/13605
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years agoLU-6109 lfsck: check FID validity before locating object 11/13511/4
Fan Yong [Mon, 10 Nov 2014 09:46:33 +0000 (17:46 +0800)]
LU-6109 lfsck: check FID validity before locating object

It is possible that the FID from iteration or linkEA is corrupted.
The LFSCK needs to check its validity before locating the object
with it, to avoid hanging or falling into another unexpected state.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I1df8d085bf5abf926d03882457cb8b221633d3aa
Reviewed-on: http://review.whamcloud.com/13511
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
5 years agoLU-6010 lnet: prevent assert on LNet module unload 10/13110/16
Amir Shehata [Wed, 17 Dec 2014 17:35:15 +0000 (09:35 -0800)]
LU-6010 lnet: prevent assert on LNet module unload

There is a use case where lnet can be unloaded while there are
no NIs configured.  Removing lnet in this case will cause
LNetFini() to be called without a prior call to LNetNIFini().
This will cause the LASSERT(the_lnet.ln_refcount == 0) to be
triggered.

To deal with this use case, when LNet is configured a reference
count on the module is taken using try_module_get().  This way
LNet must be unconfigured before it can be removed, thereby
avoiding the above case.  When LNet is unconfigured, module_put()
is called to return the reference count.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I0f283eeb395fa9a076a4d65ab3edd5e7807fc169
Reviewed-on: http://review.whamcloud.com/13110
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6175 ha: add health_check routine to the MDS, MGS and OSD 58/13558/2
Mikhail Pershin [Tue, 27 Jan 2015 23:25:04 +0000 (02:25 +0300)]
LU-6175 ha: add health_check routine to the MDS, MGS and OSD

Patch adds obd_health_check() methods in MDS and MGS to check
ptlrpc services health like OST does. Patch also adds a health_check()
routine directly to OSD to check that it is mounted and not read-only.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ib4af652b08e7e3616ebb3b99ce3e4ad03bdd5ab5
Reviewed-on: http://review.whamcloud.com/13558
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6166 utils: fix bug in lr_link 46/13546/2
Wu Libin [Wed, 28 Jan 2015 06:17:06 +0000 (14:17 +0800)]
LU-6166 utils: fix bug in lr_link

When creating a hard link to a file, the path and the file name are
the same if it is in the root directory, so the length of the path
and the name will be the same in this case.

Signed-off-by: Wu Libin <lwu@ddn.com>
Change-Id: I3a72491efdc041ad0e96d036b04600b76bb646fe
Reviewed-on: http://review.whamcloud.com/13546
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6167 utils: fix bugs in lustre_sync 45/13545/3
Wu Libin [Wed, 28 Jan 2015 05:55:20 +0000 (13:55 +0800)]
LU-6167 utils: fix bugs in lustre_sync

lustre_rsync can fall into an endless loop and dump core; this
patch fixes these problems. In function lr_cascade_move, the
"curr" node should be deleted from the "parents" list first, before
moving to the next lr_cascade_move.

Signed-off-by: Wu Libin <lwu@ddn.com>
Change-Id: I5a5686ab89379da37453d07a5a00df4fd217ee59
Reviewed-on: http://review.whamcloud.com/13545
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6125 test: sanity test_27i defect: missing test_mkdir() 07/13407/4
Elena Gryaznova [Tue, 3 Feb 2015 00:10:14 +0000 (04:10 +0400)]
LU-6125 test: sanity test_27i defect: missing test_mkdir()

Fix sanity test_27i() to call test_mkdir()

Signed-off-by: Elena Gryaznova <elena.gryaznova@seagate.com>
Xyratex-bug-id: MRP-1194
Reviewed-by: Alexander Zarochentsev <alexander_zarochentsev@xyratex.com>
Change-Id: I093cb44590b98857189d69d1b8f6e9e9c423d3bc
Reviewed-on: http://review.whamcloud.com/13407
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5510 scrub: ldiskfs_create_inode returns locked inode 87/13187/6
Fan Yong [Thu, 13 Nov 2014 16:45:52 +0000 (00:45 +0800)]
LU-5510 scrub: ldiskfs_create_inode returns locked inode

There was a race condition between creating a new inode and OI scrub:
the OI scrub may find the newly created inode just after the creator
creates it but before the LMA EA is set. Originally, to resolve
this, the creator set the newly created inode's state to
LDISKFS_STATE_LUSTRE_NOSCRUB. But that state was set after the
new inode was unlocked, so the OI scrub still had some chance to find
the newly created inode with neither LDISKFS_STATE_LUSTRE_NOSCRUB
nor LMA EA.

As an improvement, this patch makes ldiskfs_create_inode()
return the newly created inode locked. The caller can set more
state (not only for LFSCK, but also for other purposes in future)
on the newly created inode before unlocking it.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Idc1a8fbd3701f7e431ef4b7858cfdf4674d74add
Reviewed-on: http://review.whamcloud.com/13187
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years agoLU-5722 obdclass: reorganize busy object accounting 68/12468/5
Frank Zago [Tue, 28 Oct 2014 22:02:14 +0000 (17:02 -0500)]
LU-5722 obdclass: reorganize busy object accounting

Due to an accounting bug, lsb_busy of a hash bucket can become
larger than the total number of objects in said bucket. A busy object
can be counted more than once. When that happens, a negative value is
returned by the shrinker to Linux's shrink_slab() function. In older
kernels, such as the 2.6.32 used in RHEL 6, this will cause an endless
loop inside shrink_slab(), in essence hanging the host.

Instead of trying (and failing) to count the busy objects, count the
objects that are not busy, i.e. the objects that are present on the
lsb_lru list. The number of busy objects is then the difference
between the number of objects in the hash and the objects on the
lsb_lru list.
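
A minimal sketch of this accounting scheme follows. The struct, field
and function names are hypothetical and do not mirror the obdclass
code; the point is only that the busy count is derived, not tracked.

```c
/* Hypothetical per-bucket counters. */
struct bucket_counts {
	unsigned long total;	/* objects in the hash bucket */
	unsigned long on_lru;	/* idle objects on the LRU list */
};

/* A new object enters the bucket in the busy state. */
void bucket_insert(struct bucket_counts *b)
{
	b->total++;
}

/* The last user drops an object: it moves onto the LRU list. */
void bucket_put_idle(struct bucket_counts *b)
{
	b->on_lru++;
}

/* An idle object is picked up again: it leaves the LRU list. */
void bucket_get_idle(struct bucket_counts *b)
{
	b->on_lru--;
}

/* Busy objects are whatever is in the hash but not on the LRU. */
unsigned long bucket_busy(const struct bucket_counts *b)
{
	return b->total - b->on_lru;
}
```

Because an object is either on the LRU list or not, on_lru cannot
exceed total as long as each transition is counted once, so the
derived busy count cannot exceed the bucket size.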

Change-Id: Ia6973991a1ff7fc53cdf8132bf2aab532934cf94
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12468
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6120 lfsck: notify ever failed server to exit LFSCK 25/13525/3
Fan Yong [Mon, 10 Nov 2014 08:48:08 +0000 (16:48 +0800)]
LU-6120 lfsck: notify ever failed server to exit LFSCK

During the first-stage scanning, the local LFSCK instance records
which OSTs have ever failed to respond to LFSCK verification requests
(maybe because of network issues or trouble on the OST itself). Then,
before starting the second-stage scanning, the local LFSCK instance
will notify those ever-failed OSTs to skip orphan handling, since
they missed some OST-object verification, via la_sync_failures().

Originally, after la_sync_failures(), the related OSTs were removed
from the LFSCK targets list regardless of whether la_sync_failures()
succeeded, so subsequent LFSCK notification RPCs were not sent to
those OSTs. That may leave some OST(s) unable to exit LFSCK as
expected, and then a subsequent LFSCK start will fail because the
former LFSCK instance has not exited.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Id0283c78527d6a3a6c563de7ce6af1fe2d3f1a30
Reviewed-on: http://review.whamcloud.com/13525
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6050 target: control OST-index in IDIF via ROCOMPAT flag 16/13516/5
Fan Yong [Mon, 10 Nov 2014 20:48:24 +0000 (04:48 +0800)]
LU-6050 target: control OST-index in IDIF via ROCOMPAT flag

Introduce a new flag OBD_ROCOMPAT_IDX_IN_IDIF that is stored in the
last_rcvd file. For a newly formatted OST device, it is set
automatically; when upgrading from an old OST device, you can enable
it via the lproc interface osd-ldiskfs.index_in_idif. With this flag
enabled, the IDIF-in-LMA of a newly created OST-object will contain
the OST-index; for an existing OST-object, the OSD will convert the
old-format IDIF to the new-format IDIF, with the OST-index stored in
the LMA EA, when accessing the OST-object or via OI scrub. Once the
flag is enabled, it cannot be reverted, so the system cannot
be downgraded to the original incompatible version.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9e6e089d54fdb3970bb201eedac8dc09be2cc1c1
Reviewed-on: http://review.whamcloud.com/13516
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6063 kernel: use proper flags for call_usermodehelper 77/13677/2
James Simmons [Fri, 6 Feb 2015 17:43:05 +0000 (12:43 -0500)]
LU-6063 kernel: use proper flags for call_usermodehelper

When a parameter is permanently changed on the MGS, the
MGS sends a changelog packet to the nodes that
are affected by the change. Once the nodes receive the
change, they call the userland utility lctl to
change their local value. When calling a userland
application from the kernel, you specify a flag to
control the interaction with the application. Originally
the flag was set to 0, which is UMH_NO_WAIT,
which meant lctl was being called asynchronously. In
older kernels this was fine since UMH_NO_WAIT and
UMH_WAIT_PROC had nearly the same logic. This changed
with newer kernels, which broke updating our parameters.
In addition, UMH_NO_WAIT doesn't report back an error
if something goes wrong with lctl. The fix is to set
the flag to UMH_WAIT_PROC so kernel space waits until
lctl has finished and we get a proper error code if
something does go wrong with lctl. Secondly, the patch
uses the proper flag name instead of a number for the
use of call_usermodehelper in mdt_identity.c so the
code is more readable.
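
The difference between the two modes can be illustrated with a
user-space analogy. The sketch below is not kernel code and does not
use call_usermodehelper(); run_helper() is a hypothetical function
that starts a shell command and either returns at once or waits for
the child, which is the only way the caller can learn its exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run "sh -c cmd".  With wait_for_exit == 0 the call returns 0 as
 * soon as the child has been started, so any failure in the child
 * goes unnoticed.  With wait_for_exit != 0 the child's exit status
 * is returned, or -1 if it could not be started or reaped normally.
 */
int run_helper(const char *cmd, int wait_for_exit)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);	/* exec failed */
	}
	if (!wait_for_exit)
		return 0;	/* fire and forget */
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

In this analogy run_helper("exit 3", 0) reports success even though
the command failed, while run_helper("exit 3", 1) returns 3.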

Change-Id: I016fd4342315e9db6ec3ef544bcfb3a477c97b52
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13677
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoLU-6154 zfs: striped directory and migration on ZFS 18/13518/5
wang di [Sun, 11 Jan 2015 20:19:30 +0000 (12:19 -0800)]
LU-6154 zfs: striped directory and migration on ZFS

1. Increase/decrease the refcount for the sub_stripe object,
because we need to explicitly increase/decrease the refcount
for a ZFS directory.

2. Set up/clean up the sequence service for osd-zfs, so it can
create FIDs for the local OSD.

3. Do not zero dah_eadata in the OSD layer; set it in the MDD layer
instead, so the striping create process will not be interfered with.

4. Put 0 at the end of link data during migration, since
osd-zfs does not do it when reading link.

5. Create the orphan object with linkEA data, so that if migration
is interrupted, other threads are able to read entries
from this half-migrated directory, because osd-zfs needs to
retrieve the parent FID from linkea data when reading dir
entries (see osd_dir_it_rec()).

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I67cbd0b09d2716b163277425066dcf155df68039
Reviewed-on: http://review.whamcloud.com/13518
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6162 kernel: kernel update RHEL6.6 [2.6.32-504.8.1.el6] 60/13560/5
Bob Glossman [Tue, 27 Jan 2015 22:39:59 +0000 (14:39 -0800)]
LU-6162 kernel: kernel update RHEL6.6 [2.6.32-504.8.1.el6]

Update RHEL6.6 kernel to 2.6.32-504.8.1.el6

Test-Parameters: clientdistro=el6.6 mdsdistro=el6.6\
  ossdistro=el6.6 mdsfilesystemtype=ldiskfs\
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: If1bf2bca5f70e305be4859d8f5f196b3574abed3
Reviewed-on: http://review.whamcloud.com/13560
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6106 tests: Skip test_16 to test_23 if MDS version older than 2.6.90 09/13509/4
Wei Liu [Fri, 23 Jan 2015 01:12:44 +0000 (17:12 -0800)]
LU-6106 tests: Skip test_16 to test_23 if MDS version older than 2.6.90

Skip sanity-sec test_16 to test_23 if the MDS version is older than
2.6.90.

Change-Id: I0f95dae3a7a0bdef52160a3ca76fefac6765007c
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/13509
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoRevert "LU-1214 ptlrpc: start minimum service threads" 47/13647/2
Oleg Drokin [Wed, 4 Feb 2015 18:11:53 +0000 (18:11 +0000)]
Revert "LU-1214 ptlrpc: start minimum service threads"

This seems to have broken something and causes wide conf-sanity
failures. See LU-6206 for more info.

This reverts commit 43f96aa9cc3cec66d9b9e0a03e5fc23e094525e7.

Change-Id: Ie0d7124c72c7e590581ec92c2ab49c3d7bfa09fe
Reviewed-on: http://review.whamcloud.com/13647
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5829 ptlrpc: remove unnecessary EXPORT_SYMBOL 10/12510/13
Frank Zago [Fri, 9 Jan 2015 18:21:12 +0000 (12:21 -0600)]
LU-5829 ptlrpc: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Change-Id: I5dad1093f136577fa268cd7ecbebd1d660cfa8ef
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12510
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-4870 lfsck: lock old MDT-object in migrating 82/13182/6
Fan Yong [Tue, 21 Oct 2014 13:54:21 +0000 (21:54 +0800)]
LU-4870 lfsck: lock old MDT-object in migrating

According to the current metadata migration implementation, before the
old MDT-object is removed, both the new MDT-object and the old
MDT-object reference the same LOV layout. If the layout LFSCK then
finds the new MDT-object in a race, it will regard the related
OST-object(s) as a multiple referenced case and will try to create new
OST-object(s) for the new MDT-object. To avoid such trouble, the
layout LFSCK needs to lock the old MDT-object before confirming the
multiple referenced case.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9e42cb86683c33bedfef01ae7f6e2cc305f1137d
Reviewed-on: http://review.whamcloud.com/13182
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-4712 llite: lock the inode to be migrated 89/9689/8
wang di [Mon, 17 Mar 2014 18:23:02 +0000 (11:23 -0700)]
LU-4712 llite: lock the inode to be migrated

Because the inode and its connected dentries will be cleared
out of the cache after migration, the inode needs to be locked
during the migration.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ibbbb33473de1a67df85ef8930debcf22cd775bcb
Reviewed-on: http://review.whamcloud.com/9689
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5242 osd-zfs: umount hang in sanity 133g 00/13600/2
Isaac Huang [Mon, 2 Feb 2015 23:43:30 +0000 (16:43 -0700)]
LU-5242 osd-zfs: umount hang in sanity 133g

Disable 78, 79 and 80, which are known to trigger a txg_wait_open()
hang that would block umount forever.

Change-Id: I3770c11120790f55ecc021cc054971e00acc951b
Signed-off-by: Isaac Huang <he.huang@intel.com>
Test-Parameters: mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs testlist=sanity,sanity,sanity,sanity,sanity,sanity
Reviewed-on: http://review.whamcloud.com/13600
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5820 lfsck: use multiple namespace LFSCK trace files 09/12809/19
Fan Yong [Thu, 6 Nov 2014 11:59:27 +0000 (19:59 +0800)]
LU-5820 lfsck: use multiple namespace LFSCK trace files

The namespace LFSCK uses a trace file to record the FID of any object
that has multiple hard links, has a remote name entry, or contains
some uncertain inconsistency, and so on. A single namespace LFSCK
trace file may not be efficient, especially when there are millions
of FIDs to be recorded. So use multiple namespace LFSCK trace files
and a per-trace-file semaphore to control concurrent access
to each trace file.

For Lustre-2.x (x <= 6), the LFSCK used LFSCK_NAMESPACE_MAGIC_V1 as
the namespace trace file magic. After a downgrade to such an old
release, the old LFSCK will not recognize the new
LFSCK_NAMESPACE_MAGIC_V2 in the new trace file, so it will reset the
whole LFSCK rather than fail to start. A similar case will happen
when upgrading from such an old release.

This patch also drops some repeated FID recording in the namespace
LFSCK trace file. Related FID should have been recorded in the trace
file via lfsck_namespace_exec_oit(), it is unnecessary to do that
again when scanning the directory.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Iec27c52b21789dbde1e4c1153f61162f028ceac3
Reviewed-on: http://review.whamcloud.com/12809
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
5 years agoLU-6095 tests: define TRUNCATE program for racer 01/13501/6
Jinshan Xiong [Thu, 22 Jan 2015 20:52:12 +0000 (12:52 -0800)]
LU-6095 tests: define TRUNCATE program for racer

In file_truncate.sh of racer, TRUNCATE was not defined for remote
clients. Let it point to tests/truncate in case it's not defined.

The same thing happens to MCREATE and LFS; fix them as well and do
some cleanup.

Test-Parameters: alwaysuploadlogs testlist=racer
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ie6898f1573bd19810a2d8f14dc0aa375d3774e08
Reviewed-on: http://review.whamcloud.com/13501
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5357 lod: hold thandle during lod_trans_stop 20/13420/2
wang di [Wed, 14 Jan 2015 13:25:31 +0000 (05:25 -0800)]
LU-5357 lod: hold thandle during lod_trans_stop

Hold thandle during lod_trans_stop, to avoid the thandle
being freed in local transaction stop.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I2448d725e35b119a61bbfb2e9567446d203bec16
Reviewed-on: http://review.whamcloud.com/13420
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6115 test: sanity 133g defect: missing return after "skip" 89/13389/3
Elena Gryaznova [Wed, 14 Jan 2015 21:00:51 +0000 (01:00 +0400)]
LU-6115 test: sanity 133g defect: missing return after "skip"

Patch fixes test_133g(): add return() after skip()

Signed-off-by: Elena Gryaznova <elena.gryaznova@seagate.com>
Xyratex-bug-id: MRP-2153
Change-Id: I1787e1300930542c5a34c5a7e8bd277df28bf17a
Reviewed-on: http://review.whamcloud.com/13389
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
5 years agoLU-5829 obdclass: remove unnecessary EXPORT_SYMBOL 23/13323/4
Frank Zago [Fri, 9 Jan 2015 18:25:18 +0000 (12:25 -0600)]
LU-5829 obdclass: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Removed now unused function cat_cancel_cb() and fixed 3 comments in
test code mentioning this function.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ia0fa1e8e65f197235c04997f56b49d8fd87d4fd6
Reviewed-on: http://review.whamcloud.com/13323
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5829 misc: remove unnecessary EXPORT_SYMBOL 21/13321/2
Frank Zago [Fri, 9 Jan 2015 18:28:13 +0000 (12:28 -0600)]
LU-5829 misc: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ibb6dd722c47c7c76275ac24f1a6d8a4a988f433a
Reviewed-on: http://review.whamcloud.com/13321
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-2430 utils: fix "lfs mv" command parsing 61/13161/2
Andreas Dilger [Sat, 20 Dec 2014 00:03:34 +0000 (17:03 -0700)]
LU-2430 utils: fix "lfs mv" command parsing

Fix the lfs_mv() long option parsing so that it uses "--mdt-index"
instead of incorrectly requiring "----mdt-index" for the short "-M"
option.
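
For reference, the name field of a getopt_long() option entry is
written without leading dashes; presumably an entry whose name already
contains "--" is what makes the four-dash form necessary. The sketch
below uses a hypothetical parse_mdt_index() helper and is not the
lfs_mv() code.

```c
#include <getopt.h>
#include <stdlib.h>

/*
 * Return the MDT index given with -M or --mdt-index, or -1 if the
 * option is absent or an unknown option is seen.
 */
int parse_mdt_index(int argc, char **argv)
{
	static const struct option long_opts[] = {
		/* no leading dashes here: getopt_long() matches the
		 * text that follows "--" on the command line */
		{ "mdt-index", required_argument, NULL, 'M' },
		{ NULL, 0, NULL, 0 },
	};
	int c, idx = -1;

	optind = 1;	/* restart scanning so repeated calls work */
	opterr = 0;	/* keep the sketch quiet on bad options */
	while ((c = getopt_long(argc, argv, "M:", long_opts, NULL)) != -1) {
		if (c != 'M')
			return -1;
		idx = atoi(optarg);
	}
	return idx;
}
```

With this table both "-M 2" and "--mdt-index 2" are accepted, while
"----mdt-index 2" is rejected as an unknown option.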

Fix up some error messages in lfs_mv() as well, and change a test
case to use the long option form.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I20ffde97fb5d31364e91d6b21d407eb3323ebbe5
Reviewed-on: http://review.whamcloud.com/13161
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5478 lov: get rid of obd_* typedefs 44/13144/5
Dmitry Eremin [Fri, 19 Dec 2014 13:42:51 +0000 (16:42 +0300)]
LU-5478 lov: get rid of obd_* typedefs

We have a bunch of typedefs for common things that made no sense
and hid the actual type from plain view.
Replace them with proper uXX or sXX types.
Exception is in lustre_idl.h and lustre_ioctl.h where
they are replaced with __uXX and __sXX to be able to be included
in userspace. Replace obd_off with loff_t.

patch 3 in series: modify lov/lmv

Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I9dfcc0bac691160c64ef8a120887b160c0c6986f
Reviewed-on: http://review.whamcloud.com/13144
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
5 years agoLU-2675 lnet: assume a kernel build 21/13121/6
John L. Hammond [Wed, 28 Jan 2015 16:40:23 +0000 (11:40 -0500)]
LU-2675 lnet: assume a kernel build

In lnet/lnet/ and lnet/selftest/ assume a kernel build (assume that
__KERNEL__ is defined). Remove some common code only needed for user
space LNet.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I79d6f50bac895116628c93c35e23f64dd102780f
Reviewed-on: http://review.whamcloud.com/13121
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5957 mdt: Update MDT flags after layout swap 77/12877/2
Henri Doreau [Thu, 27 Nov 2014 13:51:09 +0000 (14:51 +0100)]
LU-5957 mdt: Update MDT flags after layout swap

Swap MOF_LOV_CREATED flags between MDT objects after a layout swap to
guarantee that layout will be re-created on next write if its LOV has
been deleted.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: I3d0497d8be2a7335c1fb43e10af2b222243e6a81
Reviewed-on: http://review.whamcloud.com/12877
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-2445 lfs: fixed support for lfs migrate -b 27/12627/4
Frank Zago [Fri, 7 Nov 2014 21:15:15 +0000 (15:15 -0600)]
LU-2445 lfs: fixed support for lfs migrate -b

-b is the short alias for --block to the lfs migrate command, but
wasn't set in the call to getopt_long().
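
As a rough illustration of the failure mode (not the lfs source), a short
alias is only recognised by getopt_long() when its letter appears in the
optstring argument, even if the long option table already maps --block to
'b'. The helper name parse_block_flag() is hypothetical:

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if --block or -b was given, 0 if neither was given,
 * and -1 if an unknown option was seen.
 */
static int parse_block_flag(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{ "block", no_argument, NULL, 'b' },
		{ NULL, 0, NULL, 0 }
	};
	int block = 0;
	int c;

	assert(argc >= 1 && argv != NULL);
	optind = 1;	/* restart scanning so the helper can be called again */
	opterr = 0;	/* keep getopt quiet; errors are reported via -1 */

	/* "-b" is accepted only because 'b' is listed in the optstring. */
	while ((c = getopt_long(argc, argv, "b", long_opts, NULL)) != -1) {
		if (c == 'b')
			block = 1;
		else
			return -1;
	}
	return block;
}
```

With an empty optstring the same table would still accept --block but
reject -b, which matches the symptom described above.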

Change-Id: Ie7397b994a34de71b9978cf51b55961b4c9ded69
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12627
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoLU-5521 grant: quiet message on grant waiting timeout 46/12146/6
Johann Lombardi [Mon, 1 Sep 2014 10:38:31 +0000 (12:38 +0200)]
LU-5521 grant: quiet message on grant waiting timeout

Use at_max in osc_enter_cache() to bound how long we wait for grant
space before switching to synchronous I/Os. Do not print a message
on the console when the timeout is hit, since such a long wait can
be legitimate on a flaky network (i.e. BRW is resent multiple times).

Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Change-Id: I63b40783381f6133e2f77dbc0f827e13f571ccd2
Reviewed-on: http://review.whamcloud.com/12146
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5023 tests: check FID seq properly for sanity-lfsck t_11b 76/10276/4
Fan Yong [Fri, 10 Oct 2014 18:14:04 +0000 (02:14 +0800)]
LU-5023 tests: check FID seq properly for sanity-lfsck t_11b

Guarantee that the right FID seq is checked.

Improve error handling in other scripts.

Try to collect more logs.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I51cb75c15cc7421721ea0bc149fc2a5a72c13cc6
Reviewed-on: http://review.whamcloud.com/10276
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6081 user: use random() instead of /dev/urandom 77/13277/12
Patrick Farrell [Tue, 9 Dec 2014 04:26:28 +0000 (22:26 -0600)]
LU-6081 user: use random() instead of /dev/urandom

/dev/urandom gives good random numbers, but using it is very prone to
error, and opening/closing the device every time a number is needed
takes time.

Instead, initialize the library with our seed by calling srandom(),
and then use random(). Export a boolean variable
liblustreapi_initialized to let applications check that the library
was properly initialized by the loader.
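
A minimal sketch of the pattern described above, assuming a GCC/Clang
style constructor attribute as the "loader" hook. The names demo_init(),
demo_random() and demo_initialized are hypothetical stand-ins, and the
seed mix is an assumption rather than the library's actual seed:

```c
#define _XOPEN_SOURCE 700	/* expose srandom()/random() under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical flag; applications can check it before relying on the PRNG. */
static bool demo_initialized;

/* Runs once when the program or shared object is loaded (GCC/Clang extension). */
__attribute__((constructor)) static void demo_init(void)
{
	/* Assumed seed: wall-clock time mixed with the process ID. */
	unsigned int seed = (unsigned int)time(NULL) ^ (unsigned int)getpid();

	srandom(seed);
	demo_initialized = true;
}

/* Cheap per-call random number: no device open/close on every request. */
static long demo_random(void)
{
	assert(demo_initialized);
	return random();
}
```

Seeding once and calling random() afterwards avoids a file open, read and
close for every number, at the cost of weaker randomness than
/dev/urandom provides.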

Signed-off-by: frank zago <fzago@cray.com>
Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: Ie6ced0d39df29d7054919e239add58a23115ec35
Reviewed-on: http://review.whamcloud.com/13277
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-1214 ptlrpc: start minimum service threads 76/2876/18
Andreas Dilger [Wed, 14 Jan 2015 14:55:44 +0000 (09:55 -0500)]
LU-1214 ptlrpc: start minimum service threads

If the ptlrpc_min_threads parameter is changed via /proc after the
service has started, then at least the requested number of service
threads should be started.  Otherwise this parameter would only be
used at initial thread startup and ignored if changed via /proc.

Fix conf-sanity.sh test_52[ab] to verify that at least the minimum
number of threads has been started when threads_min parameter is
changed, instead of just checking the parameter itself.  Also fix
test code style for 80-column line wrapping and tabs for indents.

The head utility does not always support shortcut "-1" option. It
should be specified as "-n1".

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I6e4bb4131d7500a93952b64102f885c76558cab0
Reviewed-on: http://review.whamcloud.com/2876
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5816 target: don't trigger watchdog waiting in recovery 72/12672/7
Hongchao Zhang [Thu, 9 Oct 2014 22:43:31 +0000 (06:43 +0800)]
LU-5816 target: don't trigger watchdog waiting in recovery

In target_recovery_thread, the process should not be considered
to be in the "blocked" state if it is waiting for something to happen;
otherwise, the kernel watchdog will print:

task tgt_recov:19764 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
tgt_recov     D 0000000000000003     0 19764      2 0x00000000
Call Trace:
check_for_clients+0x0/0x70 [ptlrpc]
target_recovery_overseer+0x9d/0x230 [ptlrpc]
exp_connect_healthy+0x0/0x20 [ptlrpc]
autoremove_wake_function+0x0/0x40
target_recovery_thread+0x0/0x1920 [ptlrpc]

Change-Id: Ic1ad4dce1df974dd99e0b28cee211de173d178e5
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/12672
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
5 years agoLU-6147 lfsck: NOT purge object by OI scrub 93/13493/8
Fan Yong [Sun, 9 Nov 2014 04:00:41 +0000 (12:00 +0800)]
LU-6147 lfsck: NOT purge object by OI scrub

Originally, when the OI scrub found an inconsistent FID mapping,
it would repair the FID mapping and ask others to reload the object
by purging it. Such behavior may cause others to hang: if the object
corresponding to the FID has already been established in RAM, and
someone else holds a reference to it (for example, the LFSCK engine
holds .lustre/lost+found/MDTxxxx), then purging the object sets
LU_OBJECT_HEARD_BANSHEE on it, and any subsequent lookup of that FID
is blocked until the object's reference count drops to zero and the
object is re-established in RAM. Unfortunately, if the reference
holder itself tries to find the same object, it blocks on itself
forever.

On the other hand, on the server side, the OI scrub will repair
the bad OI mapping; if the object was established in RAM before
its bad FID mapping was repaired, then it must be marked as
non-existent, and should not be cached in RAM after the last
reference is released.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I651ef5f5e8f4f478f07bcbb5622b345deed7cb31
Reviewed-on: http://review.whamcloud.com/13493
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6031 test: Check server version in recovery-small test 10d 57/13557/2
Mikhail Pershin [Wed, 28 Jan 2015 21:33:02 +0000 (00:33 +0300)]
LU-6031 test: Check server version in recovery-small test 10d

Test should check server version for interoperability needs.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I3b46ba9291c8c64cc3d3c235c0985f88df23f633
Reviewed-on: http://review.whamcloud.com/13557
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoLU-6171 kernel: kernel update [RHEL7 3.10.0-123.20.1.el7] 70/13570/2
Bob Glossman [Fri, 30 Jan 2015 15:46:55 +0000 (07:46 -0800)]
LU-6171 kernel: kernel update [RHEL7 3.10.0-123.20.1.el7]

update RHEL7 kernel to 3.10.0-123.20.1.el7

Test-Parameters: clientdistro=el7 mdsfilesystemtype=ldiskfs\
        mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ieb1e8a2bb4cd86268721af91dd15d2c5bc69d0bf
Reviewed-on: http://review.whamcloud.com/13570
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-3536 osd: allocate it for each iteration. 23/13223/8
wang di [Mon, 22 Dec 2014 23:08:41 +0000 (15:08 -0800)]
LU-3536 osd: allocate it for each iteration.

Add the osd iteration structure (osd_it_ea) to a specific slab,
and allocate a new osd_it_ea for each iteration, so iterations
can be nested, which will help DNE and LFSCK.
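
The benefit of per-iteration state can be sketched in user-space C. This
is an illustration under assumptions, not the osd code: struct demo_it and
its helpers are hypothetical, and calloc() stands in for a slab
allocation. Because each iteration owns its own state, an inner iteration
cannot disturb the position of the outer one:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical iterator state, allocated separately for each iteration. */
struct demo_it {
	const int *items;
	size_t count;
	size_t pos;
};

static struct demo_it *demo_it_init(const int *items, size_t count)
{
	struct demo_it *it = calloc(1, sizeof(*it));

	if (it == NULL)
		return NULL;
	it->items = items;
	it->count = count;
	return it;
}

/* Returns 1 and stores the next item in *out, or 0 at the end. */
static int demo_it_next(struct demo_it *it, int *out)
{
	assert(it != NULL && out != NULL);
	if (it->pos >= it->count)
		return 0;
	*out = it->items[it->pos++];
	return 1;
}

static void demo_it_fini(struct demo_it *it)
{
	free(it);
}

/* Nested iteration: visits every (outer, inner) pair and counts them. */
static size_t demo_count_pairs(const int *items, size_t count)
{
	struct demo_it *outer = demo_it_init(items, count);
	size_t pairs = 0;
	int a, b;

	if (outer == NULL)
		return 0;
	while (demo_it_next(outer, &a)) {
		struct demo_it *inner = demo_it_init(items, count);

		if (inner == NULL)
			break;
		while (demo_it_next(inner, &b))
			pairs++;
		demo_it_fini(inner);
	}
	demo_it_fini(outer);
	return pairs;
}
```

If both loops shared a single iterator structure, the inner loop would
advance the shared position to the end and the outer loop would stop
after its first element.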

Since iterations for iam and quota are infrequent,
we just allocate them with the normal OBD_ALLOC_PTR.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I6402259708264f9341f314e7a2f6afe16cc66481
Reviewed-on: http://review.whamcloud.com/13223
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5971 llite: rename ccc_req to vvp_req 77/13377/3
John L. Hammond [Tue, 13 Jan 2015 16:06:42 +0000 (10:06 -0600)]
LU-5971 llite: rename ccc_req to vvp_req

Rename struct ccc_req to struct vvp_req and move related functions
from lustre/llite/lcommon_cl.c to the new file lustre/llite/vvp_req.c.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I6589cd1e039b41e55fcd833476f6a58ff2492900
Reviewed-on: http://review.whamcloud.com/13377
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6003 lnet: improvement to router checker 35/13035/6
Amir Shehata [Thu, 11 Dec 2014 18:52:26 +0000 (10:52 -0800)]
LU-6003 lnet: improvement to router checker

This patch always starts the router checker thread.

The router checker only checks routes by ping if
live_router_check_interval or dead_router_check_interval are set
to something other than 0, and there are routes configured.

If these conditions are not met the router checker sleeps until woken
up when a route is added.  It is also woken up whenever the RC is
being stopped to ensure the thread doesn't hang.

In the future when DLC starts configuring the live and dead
router_check_interval parameters, then by manipulating them
the router checker can be turned on and off by the user.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I778690755e7121abd575f1a261637cb6dc754edd
Reviewed-on: http://review.whamcloud.com/13035
Tested-by: Jenkins
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5823 clio: add cl_object_find_cbdata() 94/12494/13
Bobi Jam [Thu, 30 Oct 2014 07:00:22 +0000 (15:00 +0800)]
LU-5823 clio: add cl_object_find_cbdata()

* Delete obsolete obd_ops::o_find_cbdata interface.
* Delete obsolete obd_ops::o_change_cbdata interface.
* Add cl_object_find_cbdata().

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I2e64e2e9a112783cb5c66bf4580fd1aec794417b
Reviewed-on: http://review.whamcloud.com/12494
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5971 llite: move vvp_io functions to vvp_io.c 76/13376/3
John L. Hammond [Tue, 13 Jan 2015 15:29:14 +0000 (09:29 -0600)]
LU-5971 llite: move vvp_io functions to vvp_io.c

Move all vvp_io related functions from lustre/llite/lcommon_cl.c to
the sole file where they are used lustre/llite/vvp_io.c.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I5b7d9671a32aaff7a2ebce42b0f5ff10e2eeb4ab
Reviewed-on: http://review.whamcloud.com/13376
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoNew tag 2.6.93 2.6.93 2_6_93 2_6_93_0 v2_6_93 v2_6_93_0
Oleg Drokin [Tue, 27 Jan 2015 18:04:42 +0000 (13:04 -0500)]
New tag 2.6.93

Change-Id: I826747da53ed1d9b0b2417b7b597dab3b76088a3

5 years agoLU-6114 test: add $mbench_OPTIONS to run_metabench() 88/13388/3
Elena Gryaznova [Wed, 14 Jan 2015 21:32:29 +0000 (01:32 +0400)]
LU-6114 test: add $mbench_OPTIONS to run_metabench()

Cray's metabench version requires -p <dictionary> parameter.
Patch adds mbench_OPTIONS to metabench call.

Signed-off-by: Elena Gryaznova <elena.gryaznova@seagate.com>
Xyratex-bug-id: MRP-1113
Reviewed-by: Vladimir Saveliev <vladimir_saveliev@xyratex.com>
Reviewed-by: Alexander Lezhoev <Alexander_Lezhoev@xyratex.com>
Change-Id: Id00f96c034f3d2d501421c0dd435354becea7512
Reviewed-on: http://review.whamcloud.com/13388
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6081 lfs: fixed bad return value 93/13293/2
Frank Zago [Thu, 8 Jan 2015 16:09:14 +0000 (10:09 -0600)]
LU-6081 lfs: fixed bad return value

When a command parameter line is invalid, CMD_HELP should be returned.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Icca4886ca2d6497837ea359b3a96398253467e19
Reviewed-on: http://review.whamcloud.com/13293
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: jacques-Charles Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Aurelien Degremont <aurelien.degremont@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5499 tests: use grep -w to search /proc/mounts 09/12409/4
Andreas Dilger [Fri, 24 Oct 2014 00:42:45 +0000 (18:42 -0600)]
LU-5499 tests: use grep -w to search /proc/mounts

When searching for /sbin/mount.lustre in /proc/mounts, use "grep -qw"
instead of using a trailing space, because if the mount.lustre binary
is deleted while it is mounted (e.g. by "make clean") it may have a
non-printable character following it and not be found and unmounted.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ia33c4d4b7efa73f543999f73da198fa0698cab07
Reviewed-on: http://review.whamcloud.com/12409
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-1095 llite: improve max_readahead console messages 99/12399/3
Andreas Dilger [Thu, 23 Oct 2014 11:47:30 +0000 (05:47 -0600)]
LU-1095 llite: improve max_readahead console messages

Improve the max_readahead_mb, max_readahead_per_file_mb, and
max_read_ahead_whole_mb console error messages to print the
parameters properly in MB instead of PAGE_SIZE units, and include
the filesystem name and bad parameters in the output.
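
The unit conversion involved can be sketched as follows. This assumes
4 KiB pages, and the message wording and helper names are invented for
the example; they are not the strings printed by llite:

```c
#include <assert.h>
#include <stdio.h>

/* Assumption for this sketch: 4 KiB pages (shift of 12). */
#define DEMO_PAGE_SHIFT 12

/* Convert a page count to whole megabytes (1 MB = 2^20 bytes). */
static unsigned long demo_pages_to_mb(unsigned long pages)
{
	return pages >> (20 - DEMO_PAGE_SHIFT);
}

/*
 * Format a rejection message that names the filesystem and reports
 * both the requested value and the limit in MB rather than in pages.
 */
static int demo_format_ra_error(char *buf, size_t len, const char *fsname,
				unsigned long req_pages,
				unsigned long max_pages)
{
	assert(buf != NULL && len > 0 && fsname != NULL);
	return snprintf(buf, len,
			"%s: cannot set max_readahead_mb=%lu, limit is %luMB\n",
			fsname, demo_pages_to_mb(req_pages),
			demo_pages_to_mb(max_pages));
}
```

Printing the raw page count (25600 in this case) next to a parameter
whose name ends in _mb would suggest a value 256 times larger than the
100 MB the administrator actually requested.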

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ifae8bd7012c2b5e11306fd8ecb53ef7fe500c1e2
Reviewed-on: http://review.whamcloud.com/12399
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5997 mdd: initialize mdd's obd->obd_vars 80/12980/6
Vladimir Saveliev [Mon, 5 Jan 2015 15:01:53 +0000 (10:01 -0500)]
LU-5997 mdd: initialize mdd's obd->obd_vars

mdd_procfs_init() initializes obd->obd_vars of the mdt's obd instead
of the mdd's. Leaving the mdd's obd->obd_vars uninitialized causes
conf_param to fail when setting the mdd's parameters.

Xyratex-bug-id: MRP-2277
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@seagate.com>
Change-Id: I065dc9e4577816ce08f22787116fae4f7e971db5
Reviewed-on: http://review.whamcloud.com/12980
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-2675 lustre: remove lustre/include/linux for debian 95/13495/2
Li Xi [Thu, 22 Jan 2015 08:35:02 +0000 (16:35 +0800)]
LU-2675 lustre: remove lustre/include/linux for debian

The directory of lustre/include/linux has been removed. Build
system for Debian shouldn't pack that directory any more.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I7d28681f574a990b8c54261567a3f107f9a9d159
Reviewed-on: http://review.whamcloud.com/13495
Tested-by: Jenkins
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5275 build: add LPROCFS to the deprecated symbol list 63/13463/2
John L. Hammond [Tue, 20 Jan 2015 02:29:29 +0000 (20:29 -0600)]
LU-5275 build: add LPROCFS to the deprecated symbol list

In contrib/scripts/checkpatch.pl deprecate LPROCFS and suggest use of
CONFIG_PROC_FS instead.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I801acd18b97c5c1aa474aa3960c9bfc0758e3652
Reviewed-on: http://review.whamcloud.com/13463
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6087 lod: use correct attrs in striped directory create 73/13473/2
John L. Hammond [Tue, 20 Jan 2015 22:02:45 +0000 (16:02 -0600)]
LU-6087 lod: use correct attrs in striped directory create

In lod_xattr_set_lmv() use the times, ownership, and mode of the local
object when creating the shards. Add test_33f to sanity.sh to check
that the ownership is handled properly.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Icc511d0f56888bcc8c095f0da4a6bdf99ccdeab5
Reviewed-on: http://review.whamcloud.com/13473
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5863 tests: add a separate MGS/MDS test case into conf-sanity 91/13391/3
Jian Yu [Wed, 14 Jan 2015 01:16:54 +0000 (17:16 -0800)]
LU-5863 tests: add a separate MGS/MDS test case into conf-sanity

In conf-sanity.sh, test 21d is a basic test case that verifies
separate MGS/MDS. However, it's always skipped under combined
MGS/MDS configuration. This patch adds a new test case 21e to
set up another Lustre filesystem to verify separate MGS/MDS without
depending on the configuration of the original Lustre filesystem.

Test-Parameters: alwaysuploadlogs \
envdefinitions=SLOW=yes,ONLY=21 \
mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs \
ostfilesystemtype=ldiskfs testlist=conf-sanity

Test-Parameters: alwaysuploadlogs \
envdefinitions=SLOW=yes,ONLY=21 \
mdtfilesystemtype=zfs mdsfilesystemtype=zfs \
ostfilesystemtype=zfs testlist=conf-sanity

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I3defa936a9b4f97dc3849c3a4a9626332da53d0f
Reviewed-on: http://review.whamcloud.com/13391
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6146 tests: race condition for check/use cfs_fail_val 81/13481/9
Fan Yong [Tue, 4 Nov 2014 08:02:22 +0000 (16:02 +0800)]
LU-6146 tests: race condition for check/use cfs_fail_val

There are some race conditions when checking and using cfs_fail_val.
For example, when injecting a failure stub for an LFSCK test as follows:

764   if (OBD_FAIL_CHECK(OBD_FAIL_LFSCK_DELAY2) &&
765       cfs_fail_val > 0) {
766           struct l_wait_info lwi;
767
768           lwi = LWI_TIMEOUT(cfs_time_seconds(cfs_fail_val),
769                             NULL, NULL);
770           l_wait_event(thread->t_ctl_waitq,
771                        !thread_is_running(thread),
772                        &lwi);
773
774           if (unlikely(!thread_is_running(thread))) {
775                   CDEBUG(D_LFSCK, "%s: scan dir exit for engine "
776                          "stop, parent "DFID", cookie "LPX64"\n",
777                          lfsck_lfsck2name(lfsck),
778                          PFID(lfsck_dto2fid(dir)),
779                          lfsck->li_cookie_dir);
780                   RETURN(0);
781           }
782   }

The "cfs_fail_val" may be reset to zero by another thread after the
check at line 765 but before it is used at line 768. The LFSCK engine
will then fall into "wait" until someone runs "lfsck_stop".
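
One common way to close the window between lines 765 and 768 is to read
the shared value once into a local variable and both check and use that
snapshot. The sketch below shows the idea in portable C11; it is an
assumption about the general technique, not necessarily how this patch
fixes it, and demo_fail_val and demo_injected_delay() are hypothetical
names:

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the shared fault-injection value (not the libcfs variable). */
static atomic_uint demo_fail_val;

/*
 * Returns the number of seconds to delay, or 0 when no delay should be
 * injected. fail_check_hit models the result of the fail-point check.
 */
static unsigned int demo_injected_delay(int fail_check_hit)
{
	/* Read the shared value exactly once. */
	unsigned int delay = atomic_load(&demo_fail_val);

	if (!fail_check_hit || delay == 0)
		return 0;

	/*
	 * The snapshot cannot change behind our back, so the value used
	 * for the timeout is the same non-zero value that was checked.
	 */
	assert(delay != 0);
	return delay;
}
```

A concurrent writer may still clear demo_fail_val after the load, but the
caller then waits for the bounded delay it already observed instead of
computing a timeout from zero.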

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I418621faaf6a1f42ba1d541b37374c1dc21831be
Reviewed-on: http://review.whamcloud.com/13481
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoLU-5859 llog: do not cleanup orphans in remote catalogs 14/13414/3
Alex Zhuravlev [Thu, 15 Jan 2015 10:26:43 +0000 (13:26 +0300)]
LU-5859 llog: do not cleanup orphans in remote catalogs

When a catalog is being processed by the client, just ignore
empty llogs; do not try to clean them up, as the client has no
direct access to the storage.

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: Ida933d44475fd392fe3db96bcdd4a05076b63881
Reviewed-on: http://review.whamcloud.com/13414
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-3716 obdecho: create a separate root object for echo access 30/10130/12
Jian Yu [Sat, 17 Jan 2015 07:00:47 +0000 (23:00 -0800)]
LU-3716 obdecho: create a separate root object for echo access

Currently, while echo client and normal client are attached at the
same time, both md echo objects and normal objects are created
and looked up under the same root object (ROOT), which will cause
ASSERTION( lu_device_is_mdt(o->lo_dev) ) failure.

This patch fixes the issue by creating a separate root object
(ROOT_ECHO) for echo access. The md echo objects created under
this root object can only be accessed by echo client. Normal client
will never see these echo objects.

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes \
mdtcount=1 testlist=mds-survey

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I8d8a9bd2c467bb40a7993d492aa3d4ba6676ac8f
Reviewed-on: http://review.whamcloud.com/10130
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6088 lmv: Do not revalidate stripes with master lock 32/13432/5
wang di [Wed, 14 Jan 2015 22:47:53 +0000 (14:47 -0800)]
LU-6088 lmv: Do not revalidate stripes with master lock

Do not revalidate slave stripes while holding master lock.
Otherwise if the revalidating slaves are blocked, then the
master lock can not be released in time.

Remove some unnecessary merging in ll_revalidate_slave(); the
attributes will be stored in each stripe and merged only
if required.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I57c43236894e2bbbf9a20b1d90c5ab2a5dc62ef1
Reviewed-on: http://review.whamcloud.com/13432
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-4951 scripts: remove nodemap.ko in dkms.conf 84/12784/2
Bruno Faccini [Wed, 19 Nov 2014 15:45:25 +0000 (16:45 +0100)]
LU-4951 scripts: remove nodemap.ko in dkms.conf

This second patch for a similar cause (a new or removed Lustre module)
fixes the dkms.conf creation script to comply with the removal of
nodemap.ko, which was introduced by the LU-4647 patch (Gerrit change
#9299, at http://review.whamcloud.com/9299, commit
83f04354ff68a14d7492e35a9576c91492a1206c) that landed in
master (between tags 2.6.54 and 2.6.90).

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I9d0eb4dd3da31c46d7eda54e0ced998edb837741
Reviewed-on: http://review.whamcloud.com/12784
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>