Whamcloud - gitweb
fs/lustre-release.git
5 years ago LU-6047 mdt: remove Size on MDS support 42/13442/3
John L. Hammond [Fri, 16 Jan 2015 19:33:46 +0000 (13:33 -0600)]
LU-6047 mdt: remove Size on MDS support

Remove size on MDS support from lustre/mdt/. In struct mdt_object
change the struct mutex mot_ioepoch_mutex member to spinlock_t
mot_write_lock and rename mot_writecount to mot_write_count.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I271117618f7b88a22ddbcca4db5a4723ab48e3ea
Reviewed-on: http://review.whamcloud.com/13442
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6313 tests: more robust for scrub test_11 57/13957/3
Fan Yong [Tue, 3 Mar 2015 21:13:28 +0000 (05:13 +0800)]
LU-6313 tests: more robust for scrub test_11

For sanity-scrub test_11, besides the objects known to be created by
the test scripts, other objects (such as llog objects) may have been
created before the first OI scrub scan. So it is not easy to estimate
how many objects should be skipped during the first OI scrub scan.
So we only check that the number of skipped files is more than the
number known to be created.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Iced9fb255559394117880514c5e716d05a81a177
Reviewed-on: http://review.whamcloud.com/13957
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6335 kernel: kernel upgrade [RHEL7.1 3.10.0-229.el7] 90/14090/4
Bob Glossman [Thu, 12 Mar 2015 17:05:26 +0000 (10:05 -0700)]
LU-6335 kernel: kernel upgrade [RHEL7.1 3.10.0-229.el7]

upgrade from RHEL7.0 to RHEL7.1 3.10.0-229.el7 kernel

Test-Parameters: clientdistro=el7 testgroup=review-ldiskfs \
  mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I6b733eb4571b57339889e927c4658c02e7ac7f34
Reviewed-on: http://review.whamcloud.com/14090
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6221 utils: hsm_root is also required for --dry-run 73/13673/2
Bruno Faccini [Fri, 6 Feb 2015 12:13:02 +0000 (13:13 +0100)]
LU-6221 utils: hsm_root is also required for --dry-run

Not specifying hsm_root on the copytool/lhsmtool_posix command line in
--dry-run mode can lead to a failure/error.
This patch ensures that hsm_root is required even for --dry-run.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Icaa2af6d1365751d9e77b2be3f60aacc9c1f6a5c
Reviewed-on: http://review.whamcloud.com/13673
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6378 kernel: simplify quota-avoid-dqget-call.patch 35/14135/3
Niu Yawei [Mon, 23 Mar 2015 05:23:32 +0000 (01:23 -0400)]
LU-6378 kernel: simplify quota-avoid-dqget-call.patch

Backport the patch from the upstream kernel, which doesn't rely
on I_NEW to skip dqget()/dqput() calls; it checks i_dquot
directly instead.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I10e2e8284704bc7cf9ffae4ee88f06fafef14b1a
Reviewed-on: http://review.whamcloud.com/14135
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
5 years ago LU-6356 ptlrpc: ret -ECONNREFUSED if not context found in req 43/14043/4
Sebastien Buisson [Wed, 11 Mar 2015 10:31:08 +0000 (11:31 +0100)]
LU-6356 ptlrpc: ret -ECONNREFUSED if not context found in req

Return -ECONNREFUSED instead of -ENOMEM in sptlrpc_req_get_ctx()
if no context is found in req.
This seems more graceful.

Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Change-Id: If1b142199a94d1976093a7d26a05e49a63f50469
Reviewed-on: http://review.whamcloud.com/14043
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6245 libcfs: cleanup libcfs lock handling 93/13793/4
James Simmons [Mon, 9 Mar 2015 20:53:41 +0000 (16:53 -0400)]
LU-6245 libcfs: cleanup libcfs lock handling

Previously, because libcfs was built for both user land and kernel
space, wrappers were created to handle locking transparently.
Now that user land support has been removed, this patch deletes
all those locking wrappers.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: Icbd9b5c0918cb01202439416b220b6f327144a91
Reviewed-on: http://review.whamcloud.com/13793
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years ago LU-6245 lnet: remove kernel defines in userland headers 92/13792/6
James Simmons [Thu, 19 Mar 2015 15:57:34 +0000 (11:57 -0400)]
LU-6245 lnet: remove kernel defines in userland headers

Currently the lnet headers used by user land applications
contain various kernel definitions. This is because libcfs
contains kernel wrappers for user land, which will be going
away. This patch sorts the header data so that all structures
that contain kernel definitions are moved out of the headers
that user land will use.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I3904cd692bf2debd3123cbf8ca98dfc518ce0a97
Reviewed-on: http://review.whamcloud.com/13792
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years ago LU-6395 mgc: one byte shorter for logname allocation 46/14146/2
wang di [Sun, 22 Mar 2015 15:15:19 +0000 (08:15 -0700)]
LU-6395 mgc: one byte shorter for logname allocation

The logname allocation in mgc_llog_local_copy() is one byte too
short, which might cause a buffer overflow in the following sprintf().
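
A minimal sketch of this class of off-by-one, not the actual mgc code:
the helper name make_logname() and the "-bak" suffix used in the note
below are assumptions for illustration. The point is only that a buffer
filled by sprintf() needs room for the terminating NUL byte.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "<name><suffix>" in a heap buffer.
 * Returns NULL on allocation failure; the caller frees the result. */
static char *make_logname(const char *name, const char *suffix)
{
	size_t len;
	char *buf;

	assert(name != NULL && suffix != NULL);

	/* +1 for the terminating NUL that sprintf()/snprintf() writes.
	 * Leaving it out makes the allocation one byte too short. */
	len = strlen(name) + strlen(suffix) + 1;
	buf = malloc(len);
	if (buf == NULL)
		return NULL;

	/* snprintf() bounds the write even if len is miscomputed. */
	snprintf(buf, len, "%s%s", name, suffix);
	return buf;
}
```

For example, make_logname("lustre-client", "-bak") needs 13 + 4 + 1 = 18
bytes for a 17-character result.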

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ie758c3650c1cf7848874d9fd3a02a5618043eb8f
Reviewed-on: http://review.whamcloud.com/14146
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years ago LU-5953 build: use installed OFED by default 86/12686/19
Bruno Faccini [Wed, 12 Nov 2014 14:23:06 +0000 (15:23 +0100)]
LU-5953 build: use installed OFED by default

During the LNET autoconf phase, if OFED is installed and its devel
headers are available, default to using it instead of the in-kernel
IB driver.

Also handle the error case where OFED is installed but its devel
package is not, which prevents building against OFED. This required
handling the new "patches" (vs. "kernel_patches") directory name used
in recent OFED versions, while keeping that check from colliding with
the in-kernel IB build case.

The current OFED header detection mechanism allows for a non-standard
prefix but relies on the "ofed_info" command and on the "%prefix/openib"
link (both are OK for 1.5.x and 3.x versions), and should work
for both source and DKMS Lustre builds.

Test-Parameters: nettypes=o2ib
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I82639f8392d5fe707a3b1a1719d53ab937e918b5
Reviewed-on: http://review.whamcloud.com/12686
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6218 osd-zfs: increase redundancy for meta data 41/13741/4
Isaac Huang [Thu, 12 Feb 2015 01:45:13 +0000 (18:45 -0700)]
LU-6218 osd-zfs: increase redundancy for meta data

Use DMU_OTN_UINT8_METADATA for local objects so their
data blocks would get an additional ditto copy. This
increases redundancy and hence the chance of recovery
by zpool scrub in the event of corruption.

Change-Id: I502da680521027733ea53744905c47f569a1b531
Signed-off-by: Isaac Huang <he.huang@intel.com>
Test-Parameters: mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs
Reviewed-on: http://review.whamcloud.com/13741
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-5757 hsm: strengthen checks for flags and archive id 37/13337/9
Bruno Faccini [Sat, 10 Jan 2015 11:33:49 +0000 (12:33 +0100)]
LU-5757 hsm: strengthen checks for flags and archive id

Prior to this patch, undefined flag bits and an out-of-range
archive id could be set.
Also change the related error handling that was recently
added (LU-5732) as part of sanity-hsm/test_500.
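
As an illustration of the kind of check involved, here is a sketch under
stated assumptions: the struct, the flag values, and the archive id
limit are hypothetical and are not the real Lustre HSM definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical flag bits and limit, for illustration only. */
#define EX_FLAG_A       0x0001u
#define EX_FLAG_B       0x0002u
#define EX_FLAG_C       0x0004u
#define EX_FLAGS_MASK   (EX_FLAG_A | EX_FLAG_B | EX_FLAG_C)
#define EX_ARCHIVE_MAX  32u

/* Hypothetical request carrying flags and an archive id. */
struct ex_hsm_req {
	uint32_t flags;
	uint32_t archive_id;
};

/* Return 0 if the request is acceptable, -EINVAL otherwise. */
static int ex_hsm_req_check(const struct ex_hsm_req *req)
{
	assert(req != NULL);

	/* Reject any bit outside the set of defined flags. */
	if (req->flags & ~EX_FLAGS_MASK)
		return -EINVAL;

	/* Reject archive ids beyond the supported range. */
	if (req->archive_id > EX_ARCHIVE_MAX)
		return -EINVAL;

	return 0;
}
```

Masking with the complement of the known flags rejects every undefined
bit in one comparison, so bits defined later cannot slip through
unnoticed.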

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I64403de4529f0214bab55c2fc13281b0a3d30a11
Reviewed-on: http://review.whamcloud.com/13337
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6245 libcfs: move lucache from libcfs to lustre 83/13783/8
James Simmons [Mon, 9 Mar 2015 20:06:38 +0000 (16:06 -0400)]
LU-6245 libcfs: move lucache from libcfs to lustre

The lucache handling in libcfs is used only for
idmap handling in the obdclass and mdt layers.
Since this is the case we can move the lucache
handling into the lustre stack. As a bonus the
lucache will only be built when we enable server
support instead of the current state of it being
built for clients as well.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I812f2c5952ea79bd023435e5fac1955316c9c59f
Reviewed-on: http://review.whamcloud.com/13783
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6245 libcfs: remove tcpip abstraction from libcfs 60/13760/19
James Simmons [Fri, 13 Mar 2015 17:11:19 +0000 (13:11 -0400)]
LU-6245 libcfs: remove tcpip abstraction from libcfs

Since libcfs no longer builds for user land, we can
move the existing tcpip abstraction to the LNET
layer, which is the only place that uses it. Also,
the migrated code will use native Linux kernel
APIs directly instead of going through wrappers.

Change-Id: Iaa39e4f581f18cfe586feb5bfbf4233a2f2335c7
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13760
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-5823 clio: add cl_object_fiemap() 35/12535/19
Bobi Jam [Mon, 3 Nov 2014 10:52:29 +0000 (18:52 +0800)]
LU-5823 clio: add cl_object_fiemap()

* Add cl_object_operations::coo_fiemap().
* Add cl_object_fiemap() to get FIEMAP mappings.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ie32eb5ddb8d2daa1a66055f347cef4757d039e75
Reviewed-on: http://review.whamcloud.com/12535
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6319 tests: Clean up sanityn ALWAYS_EXCEPT list 53/13953/2
James Nunez [Tue, 3 Mar 2015 17:39:45 +0000 (10:39 -0700)]
LU-6319 tests: Clean up sanityn ALWAYS_EXCEPT list

At some point between Lustre 1.8 and 2.1, sanityn test 22 was removed.
Test number 22 is still included in the ALWAYS_EXCEPT list and in the
EXCEPT list, the list of tests that will not be run under normal
(autotest) testing. Remove test 22 from the ALWAYS_EXCEPT and EXCEPT
lists.

Also, tests 11 and 14 are skipped for SUSE10. All Lustre branches from
b2_4 to current master are not built for and are no longer tested on
SLES10. Remove the check for SUSE10 and remove tests 11 and 14 from
the ALWAYS_EXCEPT list.

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I2222770874c6c2da816cfd54b371ddc9c0da370b
Reviewed-on: http://review.whamcloud.com/13953
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-5929 tests: conf-sanity test 72 call tune2fs on MDSs 61/12761/3
James Nunez [Tue, 18 Nov 2014 01:32:29 +0000 (18:32 -0700)]
LU-5929 tests: conf-sanity test 72 call tune2fs on MDSs

The call to tune2fs tuning MDTs with "-O extents" is now
run on the MDS node(s) and not the client.

Test-Parameters: alwaysuploadlogs testlist=conf-sanity

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ie4e1bb2d9447d86c9b0144e8f2564b5c2444842d
Reviewed-on: http://review.whamcloud.com/12761
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-5504 utils: add const qualifier to changelog accessors. 87/13787/3
Frank Zago [Tue, 17 Feb 2015 21:11:40 +0000 (15:11 -0600)]
LU-5504 utils: add const qualifier to changelog accessors.

Commit 6e1365 changed 4 functions, and commit 0f22e4 accidentally
reverted them.

This patch puts them back, and also adds const qualifiers to
changelog_rec_size(), changelog_rec_varsize(), changelog_rec_rename()
and changelog_rec_jobid().

6e1365 also changed changelog_rec_name() and changelog_rec_sname() to
return a const, but that is not possible anymore.
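
The shape of a const-qualified accessor can be sketched as follows. The
record layout, flag, and sizes here are hypothetical stand-ins, not the
real struct changelog_rec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout, for illustration only. */
struct ex_rec {
	uint16_t namelen;
	uint16_t flags;
};

#define EX_FLAG_JOBID 0x1u   /* hypothetical: record carries a job id */
#define EX_JOBID_SIZE 32u    /* hypothetical size of that extension */

/* Taking a const pointer lets callers that only hold a read-only
 * record use the accessor without casting the qualifier away. */
static inline size_t ex_rec_varsize(const struct ex_rec *rec)
{
	assert(rec != NULL);
	return (rec->flags & EX_FLAG_JOBID) ? EX_JOBID_SIZE : 0;
}

/* Total size: fixed header, optional extension, then the name. */
static inline size_t ex_rec_size(const struct ex_rec *rec)
{
	return sizeof(*rec) + ex_rec_varsize(rec) + rec->namelen;
}
```

An accessor that returns a pointer into the record is a different case:
if callers need to write through the returned pointer, it cannot be
const.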

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I19e389c422795c2ece4d7af369099cb733d3cb1a
Reviewed-on: http://review.whamcloud.com/13787
Tested-by: Jenkins
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years ago LU-6322 lfsck: show start/complete time directly 48/13948/4
Fan Yong [Tue, 3 Mar 2015 21:00:32 +0000 (05:00 +0800)]
LU-6322 lfsck: show start/complete time directly

It is easier for users to understand when the LFSCK was started
and/or completed if the related times are shown directly.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ibdacccf1abba6041eaddd6bb5456fb122e9ca994
Reviewed-on: http://review.whamcloud.com/13948
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6317 lfsck: NOT count the objects repeatedly 33/13933/4
Fan Yong [Sun, 7 Dec 2014 04:41:18 +0000 (12:41 +0800)]
LU-6317 lfsck: NOT count the objects repeatedly

The namespace LFSCK uses object-table based iteration plus namespace
based directory traversing to scan the system. So one object will be
returned twice by them. Counting the objects repeatedly will confuse
the users. So the namespace LFSCK should only count the objects that
are scanned via namespace based directory traversing.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I36459743843e1db1e9372d46d3aafddef033d699
Reviewed-on: http://review.whamcloud.com/13933
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6316 lfsck: skip dot name entry 23/13923/2
Fan Yong [Thu, 4 Dec 2014 15:19:57 +0000 (23:19 +0800)]
LU-6316 lfsck: skip dot name entry

It is unnecessary for the namespace LFSCK to verify the dot
entry since it is always on the local MDT and has no linkEA.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I01289b04c8807e930c6f777007f1e1fb3295431d
Reviewed-on: http://review.whamcloud.com/13923
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6235 osd-ldiskfs: NOT skip LMAC_NOT_IN_OI in osd_check_lma 22/13922/2
Fan Yong [Thu, 4 Dec 2014 14:21:13 +0000 (22:21 +0800)]
LU-6235 osd-ldiskfs: NOT skip LMAC_NOT_IN_OI in osd_check_lma

Sometimes, the ost-object may reference a wrong inode because of
an invalid OI mapping. Usually, the OSD can detect that automatically
via osd_check_lma(). For old systems, if the inode's LMA contains the
flag LMAC_NOT_IN_OI, it would skip the related checking. But such
behavior is wrong: it may cause the osd-object to reference some
important system inode, and cause a system crash via subsequent
modification.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ib5cdf2ee4d9893a87fde3caf81109eabdad9ecfa
Reviewed-on: http://review.whamcloud.com/13922
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-5682 lfsck: optimize ldlm lock used by LFSCK 66/12766/11
Fan Yong [Tue, 25 Nov 2014 14:24:59 +0000 (22:24 +0800)]
LU-5682 lfsck: optimize ldlm lock used by LFSCK

When LFSCK repairs some inconsistency, it first needs to take the
related ldlm lock(s) to prevent concurrent modifications or to purge
the client-side cache. Originally, to simplify the implementation,
LFSCK simply acquired LCK_EX mode ibits lock(s) on the related
object(s). But such a coarse-grained lock policy may be inefficient
for some directory-based modifications, such as inserting a name
entry into the directory.

This patch introduces the lfsck PDO (Parallel Directory Operations)
lock for directory-based LFSCK modification. It locks only the part of
the directory matching the given <object, name> pairs, which allows
others to access or modify different part(s) of the directory in
parallel, and also avoids purging the client-side cache unnecessarily.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I29bad81112c14e3aaecaa2b808e60ea74c10a702
Reviewed-on: http://review.whamcloud.com/12766
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6129 lnet: DLC design doc 19/13419/6
Amir Shehata [Thu, 15 Jan 2015 18:30:24 +0000 (10:30 -0800)]
LU-6129 lnet: DLC design doc

Add a DLC design document to lustre/doc/

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I3ec283f960aba7b56afadcc1d2a7770604efb023
Reviewed-on: http://review.whamcloud.com/13419
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years ago LU-6384 mdt: propagate find errors in mdt_fid2path() 08/14108/2
John L. Hammond [Thu, 19 Mar 2015 14:41:19 +0000 (09:41 -0500)]
LU-6384 mdt: propagate find errors in mdt_fid2path()

In mdt_fid2path() propagate the specific error from mdt_object_find()
rather than returning -EINVAL.
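
The pattern can be sketched in user space as follows. This is not the
mdt code: the ex_* helpers are hypothetical stand-ins for the kernel's
ERR_PTR()/PTR_ERR()/IS_ERR() macros and for the object lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define EX_MAX_ERRNO 4095

static inline void *ex_err_ptr(long err)
{
	return (void *)err;
}

static inline long ex_ptr_err(const void *p)
{
	return (long)p;
}

static inline int ex_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-EX_MAX_ERRNO);
}

struct ex_object {
	int id;
};

/* Hypothetical lookup that encodes the failure reason in the pointer. */
static struct ex_object *ex_object_find(int id)
{
	static struct ex_object obj = { 1 };

	if (id < 0)
		return ex_err_ptr(-EINVAL);
	if (id != obj.id)
		return ex_err_ptr(-ENOENT);
	return &obj;
}

/* Propagate the specific lookup error instead of a blanket -EINVAL. */
static int ex_fid2path(int id)
{
	struct ex_object *obj = ex_object_find(id);

	if (ex_is_err(obj))
		return (int)ex_ptr_err(obj);

	assert(obj->id == id);
	return 0;
}
```

With this shape, a caller can tell a missing object (-ENOENT) apart from
a malformed request (-EINVAL).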

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ib09f3741f95c0f3484f9a7839e31c583ecc34761
Reviewed-on: http://review.whamcloud.com/14108
Tested-by: Jenkins
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years ago LU-6134 utils: lfs should only open/stat files if needed 22/13822/3
Andreas Dilger [Fri, 20 Feb 2015 12:25:43 +0000 (05:25 -0700)]
LU-6134 utils: lfs should only open/stat files if needed

Since commit 322968acf183, "lfs find" would needlessly open() and
fstat() every file if the --ost, -uid/user, -gid/group, -[amc]time,
or -size options were used, to get the MDT index for each file.
This was causing "lfs find --ost" to fail if an OST was offline, and
added needless overhead that "lfs find" was meant to avoid.

The MDT index is only needed if --mdt is used, so only get it in
that case.  It also wasn't necessary to call fstat() in this case,
because the file type was already known at this point.

Some other minor cleanups related to fetching the MDT index:
- don't use ret in cb_get_dirstripe() as it isn't needed
- fix cb_get_mdt_index() to avoid a Coverity false positive due to
  initializing rc and having a conditional branch that is always taken
- convert spaces to tabs for related code, other minor style fixes

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ib41ced742fe5068f504f540479e6b4718d2540e5
Reviewed-on: http://review.whamcloud.com/13822
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6345 test: compare /bin/sleep in sanity-hsm.sh test_30c 25/14025/3
Emoly Liu [Wed, 4 Mar 2015 22:07:21 +0000 (06:07 +0800)]
LU-6345 test: compare /bin/sleep in sanity-hsm.sh test_30c

In case /bin/sleep is modified during the test, we do a checksum
at the beginning and the end of the test respectively, and won't
mark the test a failure if the checksum has changed.

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: I1ff472ea6052e2df9ba9fd4c78a4cf53686e1ccd
Reviewed-on: http://review.whamcloud.com/14025
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6014 mdt: remove unused function 26/12326/7
Alex Zhuravlev [Fri, 17 Oct 2014 12:13:25 +0000 (16:13 +0400)]
LU-6014 mdt: remove unused function

mdt_trans_stop() is not used.

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: I34da179288bef9a928173aa79e53cc74082abed7
Reviewed-on: http://review.whamcloud.com/12326
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
5 years ago LU-6283 ptlrpc: re-add NRS policy registration symbol exports 03/14003/2
Nikitas Angelinas [Fri, 6 Mar 2015 21:38:34 +0000 (13:38 -0800)]
LU-6283 ptlrpc: re-add NRS policy registration symbol exports

Export the ptlrpc_nrs_policy_(register|unregister)() functions, in
order to allow modules other than ptlrpc to load NRS policies on
demand.

These symbols were unexported as part of a subsystem-wide
symbol-unexporting effort for PTLRPC by commit 3ee0e09.

Signed-off-by: Nikitas Angelinas <nikitas.angelinas@seagate.com>
Xyratex-bug-id: MRP-2489
Change-Id: Ic294a94202fb644f997f11b931f7f9bc36d221ba
Reviewed-on: http://review.whamcloud.com/14003
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Chris Horn <hornc@cray.com>
5 years ago LU-6219 utils: remove O_NONBLOCK usage for archive file 72/13672/2
Bruno Faccini [Fri, 6 Feb 2015 11:03:32 +0000 (12:03 +0100)]
LU-6219 utils: remove O_NONBLOCK usage for archive file

In the first implementations of the POSIX copytool, file archive/restore
used a loop around select() with handling/retry on EAGAIN.
This was later found to be useless for regular files and was changed
in the patch for LU-3971 (http://review.whamcloud.com/7583, commit
397ebc93cef378e6d77450cdd095e2737b94f2f6).
But the O_NONBLOCK flag has still been used during open() of the file
on the archive since then. Even though it is ineffective, it must be
removed to improve code clarity.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ie6350e6f3951545f50783c9fff6753793b7a9a33
Reviewed-on: http://review.whamcloud.com/13672
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Robert Read <robert.read@intel.com>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6205 tests: fix bash expansion of fid 18/13618/5
Frank Zago [Tue, 3 Feb 2015 20:27:54 +0000 (14:27 -0600)]
LU-6205 tests: fix bash expansion of fid

When calling lfs path2fid, the FID is returned in brackets. When
that fid variable is used unquoted, the shell may expand it to
something else. For instance:

  $ touch x
  $ ../utils/lfs fid2path lustre [0x200000be7:0xb:0x0]
  bad FID format [x], should be [0x1:0x2:0x0]

  fid2path: error on FID x: Invalid argument

This will cause some tests, like 154A or 238, to sometimes fail.

Use quotes where the FIDs are used.
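
Why the unquoted FID turns into "x" can be checked with POSIX fnmatch(),
which implements the same pattern rules the shell applies during
pathname expansion. This is a small sketch; the helper name
glob_matches() is made up for the example.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Return 1 if an unquoted shell word, taken as a glob pattern,
 * matches the given file name, and 0 otherwise. */
static int glob_matches(const char *word, const char *filename)
{
	assert(word != NULL && filename != NULL);
	return fnmatch(word, filename, 0) == 0;
}
```

Taken as a pattern, [0x200000be7:0xb:0x0] is a bracket expression: a set
of single characters containing 0, x, 2, b, e, 7 and ':'. It therefore
matches a file named "x" in the current directory, and the shell
replaces the word with that file name. Quoting the word suppresses
pathname expansion.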

Replace "$(lfs ..." with "$($LFS ..." and make a couple of variables
local.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I3c1a34585ebaa596d66063f5ada3ccfc4d202ade
Reviewed-on: http://review.whamcloud.com/13618
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
5 years ago LU-5829 ldlm: remove unnecessary EXPORT_SYMBOL 24/13324/2
Frank Zago [Fri, 9 Jan 2015 18:25:58 +0000 (12:25 -0600)]
LU-5829 ldlm: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ic182e844d621e6ba3c22e685c72b3702ccbb793b
Reviewed-on: http://review.whamcloud.com/13324
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-5829 lnet: remove unnecessary EXPORT_SYMBOL 20/13320/5
Frank Zago [Fri, 9 Jan 2015 18:24:18 +0000 (12:24 -0600)]
LU-5829 lnet: remove unnecessary EXPORT_SYMBOL

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I8374d2da55d839e361be5721425d7270425f2286
Reviewed-on: http://review.whamcloud.com/13320
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6078 utils: fix copytool file bounds checking 26/13226/3
Bruno Faccini [Fri, 2 Jan 2015 12:37:37 +0000 (13:37 +0100)]
LU-6078 utils: fix copytool file bounds checking

Strengthen copytool file bounds checking in either full
or partial/extent mode.
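
One way such a bounds check can look is sketched below. This is not the
copytool's actual logic: the function name, the "to end of file" marker,
and the clamping policy are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker meaning "copy up to the end of the file". */
#define EX_LEN_TO_EOF UINT64_MAX

/* Validate an extent against the file size. On success return 0 and
 * store the number of bytes to copy in *out_len; return -EINVAL if
 * the extent starts past the end of the file. */
static int ex_extent_check(uint64_t file_size, uint64_t offset,
			   uint64_t length, uint64_t *out_len)
{
	assert(out_len != NULL);

	if (offset > file_size)
		return -EINVAL;

	/* Compare against the bytes remaining after offset rather than
	 * computing offset + length, which could overflow. */
	if (length == EX_LEN_TO_EOF || length > file_size - offset)
		length = file_size - offset;

	*out_len = length;
	return 0;
}
```

The same function covers both modes: full mode passes offset 0 with the
"to end of file" marker, and extent mode passes an explicit range.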

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I36939f9a18362ca3131e39b8d390978dfc79405a
Reviewed-on: http://review.whamcloud.com/13226
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-4820 osd: drop memcpy in zfs osd 91/12991/12
Alex Zhuravlev [Mon, 24 Mar 2014 15:30:19 +0000 (19:30 +0400)]
LU-4820 osd: drop memcpy in zfs osd

dmu_read() was called from osd_read_prep(), copying from
ARC bufs into the same ARC bufs. This seems to be a remnant
of the pre-zerocopy age.

Change-Id: I87c10a2d484b7fe0be370349a2bfeb857ddd74e9
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/12991
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-4239 tests: test FID related APIs 45/12545/11
Frank Zago [Wed, 15 Oct 2014 19:19:44 +0000 (14:19 -0500)]
LU-4239 tests: test FID related APIs

This adds a few stress tests to the user lustre API, related to FIDs.

Change-Id: I34144a8f4c446e55c6630d31cae6a133d61eb304
Signed-off-by: Frank Zago <fzago@cray.com>
Test-Parameters: alwaysuploadlogs envdefinitions=ONLY=154g testlist=sanity
Reviewed-on: http://review.whamcloud.com/12545
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-5657 doc: allow the use of rst2man to build man pages 40/12040/6
frank zago [Thu, 25 Sep 2014 00:39:59 +0000 (19:39 -0500)]
LU-5657 doc: allow the use of rst2man to build man pages

The man page sources can now be written in reStructuredText (rst), and
the man pages will automatically be generated with rst2man.

Added a build dependency on rst2man and the package python-docutils.

Converted lustreapi.7 to ReST to validate the solution.

Change-Id: I69e9892a238a002eb86769ed65b758cba55543bb
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12040
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6340 lnet: LNet startup script fix 00/14000/5
Amir Shehata [Fri, 6 Mar 2015 20:17:49 +0000 (12:17 -0800)]
LU-6340 lnet: LNet startup script fix

When starting up LNet via the startup script, check if the default
LNet YAML configuration file is present. If it is, make sure
to use "lnetctl lnet configure" to bring up LNet instead of
"lctl network up".  The latter configures networks and routes
defined in the modparams, while the former does not, since the
configuration defined in the YAML file will be used.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I1a05cba2a9a6b7a2179b541f1ea5db6d2e89b243
Reviewed-on: http://review.whamcloud.com/14000
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6107 tests: Skip sanityn test_82 if server version is older than 2.6.92 10/13510/3
Wei Liu [Fri, 23 Jan 2015 06:11:31 +0000 (22:11 -0800)]
LU-6107 tests: Skip sanityn test_82 if server version is older than 2.6.92

Skip sanityn test_82 if server version is older than 2.6.92

Change-Id: I2361a333ad1edfb546f18d1a1bb34c9d9173c2e5
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/13510
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-1593 tests: Remove sanity 34h from ALWAYS_EXCEPT 18/13918/4
James Nunez [Fri, 27 Feb 2015 23:27:30 +0000 (16:27 -0700)]
LU-1593 tests: Remove sanity 34h from ALWAYS_EXCEPT

Sanity test 34h is skipped for all ZFS testing. Since the
issue in LU-1593 is resolved, test 34h needs to be removed
from the ALWAYS_EXCEPT list.

Test-Parameters: alwaysuploadlogs

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I922a179f76fba7643f9bd7251509433848f384ec
Reviewed-on: http://review.whamcloud.com/13918
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years ago LU-6357 kernel: kernel update RHEL6.6 [2.6.32-504.12.2.el6] 58/14058/2
Bob Glossman [Wed, 11 Mar 2015 18:02:40 +0000 (11:02 -0700)]
LU-6357 kernel: kernel update RHEL6.6 [2.6.32-504.12.2.el6]

Update RHEL6.6 kernel to 2.6.32-504.12.2.el6

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I75af90922aac0e3e06aa7952ad87aec8b57bc1d2
Reviewed-on: http://review.whamcloud.com/14058
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5022 ldiskfs: enable support for RHEL7 49/10249/39
Yang Sheng [Mon, 9 Mar 2015 15:05:52 +0000 (23:05 +0800)]
LU-5022 ldiskfs: enable support for RHEL7

This patch adds support for RHEL7.1 [3.10.0-229.el7] kernel.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: Ifbc294a53bd21eb35d373637d3326fc3c611c9f0
Reviewed-on: http://review.whamcloud.com/10249
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-3259 clio: Revise read ahead implementation 59/10859/17
Jinshan Xiong [Fri, 16 Jan 2015 19:23:38 +0000 (11:23 -0800)]
LU-3259 clio: Revise read ahead implementation

In this implementation, read ahead will hold the underlying DLM lock
to add read ahead pages. A new cl_io operation cio_read_ahead() is
added for this purpose. It takes parameter cl_read_ahead{} so that
each layer can adjust it according to its own requirements. For
example, at the OSC layer, it will make sure the read ahead region is
covered by an LDLM lock; at the LOV layer, it will make sure that the
region won't cross a stripe boundary.
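
As an illustration only, the two per-layer adjustments described above
can be sketched in userspace C. The struct ra_window type and both
helper names are hypothetical stand-ins, not the real cl_read_ahead{}
layout or cio_read_ahead() code; the use of page indices and a stripe
size expressed in pages are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a read-ahead window, in page indices. */
struct ra_window {
	uint64_t start;	/* first page index of the window */
	uint64_t end;	/* last page index of the window, inclusive */
};

/* LOV-like step: keep the window inside the stripe that holds start. */
static void ra_clamp_to_stripe(struct ra_window *ra, uint64_t pages_per_stripe)
{
	uint64_t stripe_end;

	assert(ra != NULL && pages_per_stripe > 0);
	stripe_end = (ra->start / pages_per_stripe + 1) * pages_per_stripe - 1;
	if (ra->end > stripe_end)
		ra->end = stripe_end;
}

/* OSC-like step: keep the window inside the extent covered by the lock. */
static void ra_clamp_to_lock(struct ra_window *ra, uint64_t lock_end)
{
	assert(ra != NULL);
	if (ra->end > lock_end)
		ra->end = lock_end;
}
```

Each helper only ever shrinks the window, so the order in which the
layers apply their limits does not change the final result.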

Legacy callback cpo_is_under_lock() is removed.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ic388e3a3f744ea5a8352cc8529e32a71073bddb3
Reviewed-on: http://review.whamcloud.com/10859
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6324 osd: allow larger osd_thread_info for debug 55/13955/2
John L. Hammond [Tue, 3 Mar 2015 20:18:13 +0000 (14:18 -0600)]
LU-6324 osd: allow larger osd_thread_info for debug

In osd_mod_init() skip the CLASSERT() on the size of struct
osd_thread_info if CONFIG_DEBUG_MUTEXES or CONFIG_DEBUG_SPINLOCK is
defined.
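
A minimal sketch of the idea, assuming a C11 compiler and using
static_assert in place of CLASSERT(). The struct thread_info layout,
the THREAD_INFO_MAX_SIZE budget and the helper function are
hypothetical; only the two CONFIG_DEBUG_* symbols come from the text
above.

```c
#include <assert.h>
#include <stddef.h>

#define THREAD_INFO_MAX_SIZE 4096	/* hypothetical size budget */

/* Hypothetical per-thread scratch structure. */
struct thread_info {
	char scratch[1024];
	long counters[16];
};

/*
 * Lock debugging options enlarge every embedded lock, so the size
 * check is compiled out when either option is defined.
 */
#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_DEBUG_SPINLOCK)
#define THREAD_INFO_SIZE_CHECKED 0
#else
static_assert(sizeof(struct thread_info) <= THREAD_INFO_MAX_SIZE,
	      "struct thread_info exceeds its size budget");
#define THREAD_INFO_SIZE_CHECKED 1
#endif

/* Report whether the compile-time size check was active in this build. */
static int thread_info_size_checked(void)
{
	return THREAD_INFO_SIZE_CHECKED;
}
```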

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I1320403a345886cefaf538dbf80d7c49fa226183
Reviewed-on: http://review.whamcloud.com/13955
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5155 scripts: added lustre/scripts/zfsobj2fid 21/13721/5
Christopher J. Morrone [Wed, 11 Feb 2015 04:13:45 +0000 (21:13 -0700)]
LU-5155 scripts: added lustre/scripts/zfsobj2fid

The zfsobj2fid script converts ZFS object xattr FID to
standard Lustre FID format so it can be used with Lustre
tools like "lfs fid2path".

Change-Id: Id87ff0533a5431a292bca24a76815642f4318083
Signed-off-by: Isaac Huang <he.huang@intel.com>
Reviewed-on: http://review.whamcloud.com/13721
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5848 lfsck: debug log for sanity-lfsck test_18e 50/13950/3
Fan Yong [Tue, 3 Mar 2015 21:09:52 +0000 (05:09 +0800)]
LU-5848 lfsck: debug log for sanity-lfsck test_18e

More debug information for sanity-lfsck test_18e.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I18682ef13c0a12063e3cb595b2e16961451bbe89
Reviewed-on: http://review.whamcloud.com/13950
Tested-by: Jenkins
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6030 osd-ldiskfs: improve mount option handling 72/13572/13
Yang Sheng [Thu, 12 Feb 2015 18:24:54 +0000 (02:24 +0800)]
LU-6030 osd-ldiskfs: improve mount option handling

--handle force-over-128tb option to osd layer
--handle bigendian-check option to osd layer
--strip out extents option & remove extents-mount-options patch
--strip out iopen & mballoc mount options
--back LDISKFS_SUPER_MAGIC to EXT4_SUPER_MAGIC

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: Ic9bf431d0826d6279fc76f7fd1d7e356e421f292
Reviewed-on: http://review.whamcloud.com/13572
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years agoLU-6047 obd: remove client Size on MDS support 69/13169/3
John L. Hammond [Mon, 22 Dec 2014 18:40:21 +0000 (12:40 -0600)]
LU-6047 obd: remove client Size on MDS support

Remove the unused OBD MD API method md_done_writing(). Remove the
unused logcookie and struct md_open_data ** parameters from
md_setattr(). Remove the unused functions iattr_from_obdo(),
md_from_obdo(), and obdo_refresh_inode().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I59bf2b101807f5b582eb7ab27e5a742284800979
Reviewed-on: http://review.whamcloud.com/13169
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6047 llite: remove client Size on MDS support 26/13126/10
John L. Hammond [Mon, 15 Dec 2014 18:47:21 +0000 (12:47 -0600)]
LU-6047 llite: remove client Size on MDS support

Size on MDS support has been in preview since at least 2.0.0. Remove
support for it from lustre/llite/.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I0b31d893453ef57e54cc9052d4fb6a669a11e28f
Reviewed-on: http://review.whamcloud.com/13126
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6040 lnet: remove messages from lazy portal on NI shutdown 36/13836/4
Amir Shehata [Sat, 21 Feb 2015 00:05:31 +0000 (16:05 -0800)]
LU-6040 lnet: remove messages from lazy portal on NI shutdown

When shutting down an NI in a busy system, some messages received
on this NI might be on the lazy portal.  They would have grabbed
a ref count on the NI.  Therefore the NI will not be removed until
those messages are processed.

To avoid this scenario, when an NI is shut down, go through
all messages queued on the lazy portal and drop the messages for the
NI being shut down.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I67c8b720a6eb62fded4f084c1acea69dcdc8d2b6
Reviewed-on: http://review.whamcloud.com/13836
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6001 build: cleanup build scripts after reorganization 87/12987/10
Dmitry Eremin [Mon, 8 Dec 2014 15:25:45 +0000 (18:25 +0300)]
LU-6001 build: cleanup build scripts after reorganization

After passing a few configuration parameters to rpmbuild as
"--with/--without" options, some code became useless.

Don't pass options through configure_args that can be passed through
rpmbuild options. This avoids unexpected behavior during
the build from source rpm.

Change the module-dist-hook: target according to the coding guidelines.

Remove obsolete liblustre.{a,so} from .spec file that were actually
removed in commit cdfbc722f4d63d3ed3740cbb549062f712010d90.

Don't add the version of kernel to .src.rpm.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ib5f50d257b5d95efe9c45d1865f9dab9ccc3c19a
Reviewed-on: http://review.whamcloud.com/12987
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5231 hsm: display file size in decimal not hex 78/13678/3
Frank Zago [Fri, 6 Feb 2015 19:55:07 +0000 (13:55 -0600)]
LU-5231 hsm: display file size in decimal not hex

'lfs hsm_action' displays the file sizes in hex:
  somebigfile: ARCHIVE running (0xf1c00000 bytes moved)

This is not user friendly. Use decimal instead.
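
For illustration, a small sketch of the formatting change in portable
C. The helper name format_bytes_moved() is hypothetical and this is
not the lfs source; it only shows the same byte count printed with a
decimal conversion instead of a hex one.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a byte count in decimal. Returns the value returned by
 * snprintf(), i.e. the length the full text needs.
 */
static int format_bytes_moved(char *buf, size_t len, uint64_t bytes)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRIu64 " bytes moved", bytes);
}
```

With the value from the example above, 0xf1c00000, the text becomes
"4055891968 bytes moved".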

Remove the last occurrences of LPX64 in lfs.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ib964c162b275bc836104cec3500a2f03c73dffeb
Reviewed-on: http://review.whamcloud.com/13678
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoLU-6209 lnet: Delete all obsolete LND drivers 63/13663/4
James Simmons [Tue, 10 Feb 2015 02:28:45 +0000 (21:28 -0500)]
LU-6209 lnet: Delete all obsolete LND drivers

Remove ralnd, mxlnd, qswlnd drivers. They are no
longer supported and have not even been buildable
for a long time.

Change-Id: I9c88b446028e79122b5847448fdd23fb6cb5c530
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13663
Tested-by: Jenkins
Reviewed-by: Isaac Huang <he.huang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6020 kerberos: proper sg list initialization 31/13631/3
Andrew Perepechko [Wed, 4 Feb 2015 13:50:10 +0000 (16:50 +0300)]
LU-6020 kerberos: proper sg list initialization

This patch adds sg_init_table() calls in order
to have proper sg list initialization including
magics, tables sizes, etc.

Without it, when using kernels with CONFIG_DEBUG_SG
option, the following crash can happen:

kernel BUG at include/linux/scatterlist.h:65!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/system/cpu/online
CPU 0

Pid: 4911, comm: ptlrpcd_3 Not tainted 2.6.32-431 #7                  /D525MWV
RIP: 0010:[<ffffffffa0b60170>]  [<ffffffffa0b60170>] krb5_make_checksum+0x750/0x770 [ptlrpc_gss]

Change-Id: Ic6c52c8b15393d8d7f67f4bf675c1f57cf27004a
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-on: http://review.whamcloud.com/13631
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
5 years agoLU-6030 ldiskfs: clean up ext4-fiemap patch 71/13571/11
Yang Sheng [Thu, 29 Jan 2015 10:13:55 +0000 (18:13 +0800)]
LU-6030 ldiskfs: clean up ext4-fiemap patch

Move ext4-fiemap patch to osd-ldiskfs. So we can
remove this patch entirely.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I639733f6f106398bbc3d5e2ffc6fa8a06ffe867f
Reviewed-on: http://review.whamcloud.com/13571
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6215 osc: use list_for_each_entry_safe() when delete items 56/13956/3
Andreas Dilger [Tue, 3 Mar 2015 20:22:51 +0000 (13:22 -0700)]
LU-6215 osc: use list_for_each_entry_safe() when delete items

Since we will remove items off the list using list_del_init() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().
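
The reason can be sketched in userspace C with a hand-rolled circular
list. The struct node type and the helpers below are simplified
stand-ins written for this example, not the kernel's list.h macros.
After list_del_init() a node points back at itself, so a loop that
advances through pos->next after the deletion would never reach the
head again; saving the next pointer first avoids that.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular doubly linked list node (stand-in for list_head). */
struct node {
	struct node *prev, *next;
	int value;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and make it point at itself, as list_del_init() does. */
static void list_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Remove every node holding an odd value; return how many were removed. */
static int remove_odd(struct node *head)
{
	struct node *pos, *tmp;
	int removed = 0;

	assert(head != NULL);
	/* tmp holds the successor before pos can be unlinked. */
	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		if (pos->value % 2 != 0) {
			list_del_init(pos);
			removed++;
		}
	}
	return removed;
}
```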

Linux-commit: f13ab92effb94c8fc5eade75f6f246facd7ef5be

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I6ec6d8073da6e0aa45e9d8a6ee7cde84ed9cab07
Reviewed-on: http://review.whamcloud.com/13956
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6261 gnilnd: Cray interconnect rollup 12/13812/4
Chuck Fossen [Thu, 19 Feb 2015 21:21:42 +0000 (15:21 -0600)]
LU-6261 gnilnd: Cray interconnect rollup

I am leaving a few lines in structure definitions that are
longer than 80 columns. It's not the time to reformat the
whole structure.
-------------------------------------------------------------
Subject: Update debug messages for rca and quiesce events
Description:
Change informational message when receiving down event for
better tracking of RCA event issues to display under console
logging.
Clarify the message printed when we receive connection request
from a down node.
Simplify quiesce messages to just report the start and end of
quiesce.
-------------------------------------------------------------
Subject: Limit fma block allocations.
Description:
Under network pressure whereby thousands of nodes need to
reconnect all at the same time, routers can run out of memory
allocating fma blocks for mailboxes since the previous ones
cannot be cleaned up until a new connection is established.
Limit the amount of fma blocks that can be allocated to three
quarters of total memory. This leaves memory free for other
allocations which tend to be much smaller than the mailboxes.
This should only be needed on service nodes.
Clean up some whitespace in kgn_data_t.
-------------------------------------------------------------
Subject: Double deregistration error.
Description:
lustre:18920 introduced a bug which causes us to deregister
the same memory twice when the transfer is unaligned.
Clean up the tx_buffer_copy after a deregistration so that
kgnilnd_rdma can properly register the memory on the retry.
-------------------------------------------------------------
Subject: Stack reset is causing pings to timeout instead of
failing immediately.
Description:
It is possible to register with the same MDD after a stack
reset causing pings to timeout instead of failing right away.
During a stack reset, we need to deregister with a hold
timeout set so we don't use the same mdd after the stack reset
is complete.
This was found by gnilnd regression test 110c.
-------------------------------------------------------------
Subject: Post rdma resource error
Description:
Handle kgni_post_rdma resource error by unmapping the tx and
putting it back on the TX_MAPQ.
Also fixed:
fast_reconn variable check was using the pointer instead of
its value.
bug that causes a stall when calling
kgnilnd_wakeup_rca_thread() when regression test causes
startup failures and the rca thread has not started yet.
Only call sock_release if socket was created.
Changed some stats prints to print unsigned values so they
don't show as negative.
-------------------------------------------------------------
Subject: limit kgnilnd conns in purgatory
Description:
Currently kgnilnd allows for an infinite number of connections
in purgatory, which in the face of a missed rca event can
cause nodes to slowly run out of memory from continued timed
out connection requests to those halted or dead nodes.
This mod makes the following changes to alleviate this issue:
1. Add a module parameter and live tunable allowing us to
   limit the number of connections per peer held in purgatory.
2. Remove the fast reconnect path on the server by making
   that tunable contain different settings for computes
   and service nodes. fast_reconnect is on for computes and
   off for service nodes. This setting can be changed on a
   live system.
3. In the kgnilnd reaper code utilize the tunable and remove
   the oldest purgatoried connections as new connections are
   put into purgatory. This will keep memory usage down and
   allow a system to stay up in the face of nodes being down
   and rca not informing us that they are down.
-------------------------------------------------------------
Subject: Update kgnilnd to be KNC aware.
Description:
Kgnilnd currently ignores rt_accel nodetype events coming from
RCA. This is incorrect as KNC's down and up events are
reported as rt_accel.
Since we currently ignore rt_accel events this causes us to
continually attempt to talk to down KNC nodes.
With this mod we now recognize rt_accel events allowing us to
prevent communications with down KNC nodes.
-------------------------------------------------------------
Subject: Always notify LNET on GNILND_RCA_NODE_DOWN
Description:
When an LNET router fails it can take router_ping_timeout +
live_router_check_interval seconds for all peers to detect the
down router. For peers on a gni network this can be over two
minutes. During this time peers will continue to use the
failed router.
In some situations gnilnd will receive an event from RCA
notifying that the node is down within 30 seconds of the node
failure. This is much faster than relying on the router
pinger, so gnilnd should call lnet_notify() to notify LNET,
upon receipt of the RCA event, that a peer is down.
-------------------------------------------------------------
Subject: Add fast reconnect path and update lnet_notify last
alive timestamp.
Description:
A lustre client can time out a router during a blade failure
which causes multiple quiesce cycles.
When we time out a connection, reconnect even if there are no
tx's waiting to be sent. This causes an lnet_notify up
notification so we don't need to wait for the router pinger to
bring the connection back up.
At the end of a quiesce, call lnet_notify that the peer is
still up which updates the last alive timestamp.
Various debug message cleanup.
-------------------------------------------------------------
Subject: gnilnd proc_dir_entry port - part 2
Description:
PDE_DATA is defined by libcfs in Cray-master and therefore
only needed by b2_5
-------------------------------------------------------------
Subject: gnilnd proc_dir_entry port
Description:
In SLES12 create_proc_entry and create_proc_read_entry have
been removed, and struct proc_dir_entry is no longer public.
This mod ports all proc functions to use seq_file.
-------------------------------------------------------------
Subject: Remove system.h from gnilnd
Description:
There is no longer system.h for x86 and gnilnd doesn't seem to
need it.
Remove it from gnilnd include.
-------------------------------------------------------------

Signed-off-by: Chuck Fossen <chuckf@cray.com>
Change-Id: Iad14538751cc50fbd03fd3d4876ca41f4c0a223f
Reviewed-on: http://review.whamcloud.com/13812
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-4727 hsm: use IOC_MDC_GETFILEINFO in restore 50/13750/6
John L. Hammond [Thu, 12 Feb 2015 19:53:18 +0000 (13:53 -0600)]
LU-4727 hsm: use IOC_MDC_GETFILEINFO in restore

Use IOC_MDC_GETFILEINFO rather than fstatat() to get the original file
attributes during restore. Add test_12p to sanity-hsm to check that
triggering an implicit restore from the copytool's own mount point
does not wedge the copytool.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I1b1eeb703c60907a2759fdb6d8fb8728a13f8918
Reviewed-on: http://review.whamcloud.com/13750
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6049 obdclass: Add synchro in lu_context_key_degister() 64/13164/7
Patrick Valentin [Mon, 22 Dec 2014 10:11:54 +0000 (11:11 +0100)]
LU-6049 obdclass: Add synchro in lu_context_key_degister()

When unloading a module, it may happen that lu_context_key_degister()
removes a key while a thread is either registering it in a new
context (lu_context_init(), lu_context_refill()), or using it when
exiting from a context (lu_context_exit(), lu_context_fini()).

In these cases, we reference a key which no longer exists, and
the system crashes either because we use a *POISON'ed* pointer
in key_fini() -> key->lct_fini(), or because one of the following
assertions fails:
 - lu_context_key_degister():
        ASSERTION(cfs_atomic_read(&key->lct_used) == 1)
                  failed: key has instances: 2

 - lu_context_exit():
        ASSERTION(key != NULL)

 - key_fini():
        ASSERTION(atomic_read(&key->lct_used) > 1)

This can also lead to SLAB objects which are not freed:
        slab error in kmem_cache_destroy(): cache `echo_thread_kmem':
                   Can't free all objects

Note: ptlrpc service threads need to call lu_context_init/fini in
each loop (for each RPC), and this could be a big performance issue
on fat SMP machines if we add serialization by a spinlock and need
to lock/unlock it for multiple times for each RPC.

So the aim of this patch, which only impacts some infrequently used
functions, is:
 1) to add a synchronization in lu_context_key_quiesce(), also called
    by lu_context_key_degister(), to wait until all key::lct_init()
    methods have completed, by serializing with keys_fill()
 2) to add a synchronization in lu_context_key_degister(), to wait
    until all transient contexts referencing this key have run
    key::lct_fini() method

Signed-off-by: Patrick Valentin <patrick.valentin@bull.net>
Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Change-Id: Id4ad974e8c7b8053d6e35ebce60cfbcf91dc230b
Reviewed-on: http://review.whamcloud.com/13164
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6203 tests: early lock cancel to allow early copytool death 46/13646/2
Bruno Faccini [Wed, 4 Feb 2015 16:39:38 +0000 (17:39 +0100)]
LU-6203 tests: early lock cancel to allow early copytool death

Since copytool death check+timing has been introduced with patch for
LU-5622, sanity-hsm/test_251() has experienced several failures
due to copytool death being delayed and to timeout, because of lock
cancel.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I399b37854b98626c4c92a367d543b79aebf9eb4e
Reviewed-on: http://review.whamcloud.com/13646
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6321 lfsck: make lfsck_namespace trace file as index 45/13945/2
Fan Yong [Sun, 7 Dec 2014 01:00:55 +0000 (09:00 +0800)]
LU-6321 lfsck: make lfsck_namespace trace file as index

Originally, the "lfsck_namespace" file stored both the namespace
LFSCK statistics information and the FIDs to be double scanned.
But to improve the namespace LFSCK performance (since Lustre-2.7),
we used multiple trace files with the name "lfsck_namespace_xx".
At that time, the original "lfsck_namespace" file only needed to
record the namespace LFSCK statistics information, so we made it
a regular file, NOT an index file. That change causes trouble
when downgrading to Lustre-2.6 or older, because the old namespace
LFSCK needs an index trace file instead of a regular file. To avoid
the compatibility issues, we will keep the "lfsck_namespace" file
as an index file on b2_7 and newer releases.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I76d8b1416c4c507793aa9bbab2d52cc7d8daa440
Reviewed-on: http://review.whamcloud.com/13945
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6307 obdclass: distinguish MGC/MDT connection properly 27/13927/2
Fan Yong [Thu, 4 Dec 2014 16:58:06 +0000 (00:58 +0800)]
LU-6307 obdclass: distinguish MGC/MDT connection properly

In commit 5f8847bca12afb798de600299356ed2e3655a53e, we introduced
version checking for the MDT-MDT connection. But there is a corner
case: the MGC will set OBD_CONNECT_MNE_SWAB (which is defined as the
same value as OBD_CONNECT_MDS_MDS) in the connection flags for Imperative
Recovery interoperability issues with MGS. So the server needs to
know whether the connection is really from another MDT or from the
MGC via checking OBD_CONNECT_FID (that is not set for the MGC-MGS
connection).
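
A hedged sketch of the check described above, in standalone C. The
macro names are shortened and the bit positions are placeholders, not
the real OBD_CONNECT_* values; is_mdt_mdt_connection() is a
hypothetical helper, not the function changed by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for the sketch, not the real flag values. */
#define CONNECT_FID		(1ULL << 0)
#define CONNECT_MDS_MDS		(1ULL << 1)
#define CONNECT_MNE_SWAB	CONNECT_MDS_MDS	/* shares the same bit */

static_assert(CONNECT_MNE_SWAB == CONNECT_MDS_MDS,
	      "the sketch relies on both names sharing one bit");

/*
 * The shared bit alone is ambiguous. Treat the peer as another MDT
 * only when the FID flag, which the MGC does not set, is also present.
 */
static bool is_mdt_mdt_connection(uint64_t flags)
{
	return (flags & CONNECT_MDS_MDS) != 0 && (flags & CONNECT_FID) != 0;
}
```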

Test-Parameters: envdefinitions=ONLY=105 clientjob=lustre-b2_6 clientbuildno=19 testlist=recovery-small
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9cee743d5474702b77adbb8c3dedd6c19faef15f
Reviewed-on: http://review.whamcloud.com/13927
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5760 rpm: remove Red Hat specific check for init scripts 77/12377/6
Dmitry Eremin [Wed, 22 Oct 2014 12:05:18 +0000 (16:05 +0400)]
LU-5760 rpm: remove Red Hat specific check for init scripts

The issue with builds under mock-based environments is related to
a sloppy heuristic that checks for the existence of two files
under /etc and assumes that this is a good way to identify
a Red Hat system. We had a concern about this for other systems.

So, let's remove this Red Hat specific check of /etc files.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ibc6af75ebea51b39d5ff4c8473db2e3828ffea68
Reviewed-on: http://review.whamcloud.com/12377
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6173 llite: allocate and free client cache asynchronously 46/13746/4
Emoly Liu [Fri, 13 Feb 2015 02:15:48 +0000 (10:15 +0800)]
LU-6173 llite: allocate and free client cache asynchronously

Since the inflight request holds the import refcount as well as export,
sometimes obd_disconnect() in client_common_put_super() can't put
the last refcount of the OSC import (e.g. due to network disconnection).
This causes cl_cache to be accessed after it has been freed.

To fix this issue, ccc_users is used as cl_cache refcount, and
lov/llite/osc all hold one cl_cache refcount respectively, to avoid
the race that a new OST is being added into the system when the client
is mounted.
The following cl_cache functions are added:
- cl_cache_init(): allocate and initialize cl_cache
- cl_cache_incref(): increase cl_cache refcount
- cl_cache_decref(): decrease cl_cache refcount and free the cache
  if refcount=0.
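
The init/incref/decref pattern listed above can be sketched in
userspace C11 as follows. This is an assumption-laden illustration,
not the cl_cache code: struct cache, its fields and the cache_*()
names are hypothetical, and C11 atomics stand in for the kernel's
atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted cache descriptor. */
struct cache {
	atomic_int users;		/* number of holders */
	unsigned long max_pages;	/* example payload */
};

/* Allocate a cache with one reference, owned by the caller. */
static struct cache *cache_init(unsigned long max_pages)
{
	struct cache *c = malloc(sizeof(*c));

	if (c == NULL)
		return NULL;
	atomic_init(&c->users, 1);
	c->max_pages = max_pages;
	return c;
}

/* Take an additional reference for a new holder. */
static void cache_incref(struct cache *c)
{
	atomic_fetch_add(&c->users, 1);
}

/* Drop one reference; free the cache and return 1 when it was the last. */
static int cache_decref(struct cache *c)
{
	int old = atomic_fetch_sub(&c->users, 1);

	assert(old > 0);
	if (old == 1) {
		free(c);
		return 1;
	}
	return 0;
}
```

Because every holder drops its own reference, the object stays valid
until the last holder is done, whatever order they finish in.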

Also, the fix for LU-2543 is not needed anymore, so it is reverted.

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: I22ff10b683b683d49d603e5dc2de3397746a79bb
Reviewed-on: http://review.whamcloud.com/13746
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5393 osd-ldiskfs: read i_size once to protect against race 07/13707/6
Bruno Faccini [Tue, 10 Feb 2015 10:07:11 +0000 (11:07 +0100)]
LU-5393 osd-ldiskfs: read i_size once to protect against race

There have been several occurrences of ASSERTION(local_nb[i].rc == 0)
failures in ost_brw_read(), where inode's i_size has changed due to
a racing write/growth beyond EOF. osd_read_prep() must protect
itself against this legal behavior by only reading i_size once.
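
The "read once" idea can be sketched in userspace C11. struct file_obj
and readable_len() are hypothetical and do not mirror osd_read_prep();
an atomic field stands in for an i_size that a concurrent writer may
grow at any time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object whose size may grow concurrently. */
struct file_obj {
	_Atomic uint64_t size;
};

/*
 * Return how many of the requested bytes lie below EOF. The size is
 * loaded exactly once, so the bounds check and the length computation
 * agree even if a writer extends the file in between.
 */
static uint64_t readable_len(struct file_obj *f, uint64_t off, uint64_t len)
{
	uint64_t size;

	assert(f != NULL);
	size = atomic_load(&f->size);	/* single snapshot */
	if (off >= size)
		return 0;
	if (len > size - off)
		len = size - off;
	return len;
}
```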

Also removed m local variable declaration/usage apparently outdated.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I5d931d5254b970e7031363f37114d0bad8b573fa
Reviewed-on: http://review.whamcloud.com/13707
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-4306 test: bump grace time in test_4a of s-q 04/13704/2
Niu Yawei [Tue, 10 Feb 2015 02:37:24 +0000 (21:37 -0500)]
LU-4306 test: bump grace time in test_4a of s-q

Use a longer grace time in test_4a of s-q to make it more
tolerant of timing.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I8580779fe2f7d2f4bb8e119be78b574fb6ac01cb
Reviewed-on: http://review.whamcloud.com/13704
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5937 lfs: ensure a valid directory size in lfs find 56/13456/4
John L. Hammond [Sun, 18 Jan 2015 02:31:59 +0000 (18:31 -0800)]
LU-5937 lfs: ensure a valid directory size in lfs find

For a striped directory (and a striped file) the size returned by
LL_IOC_MDC_GETINFO may not be valid. In cb_find_init() if the size of
a directory is needed then get it by calling fstat().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Iddb9aa8e6664a09ff866a3995741cae17e1c9962
Reviewed-on: http://review.whamcloud.com/13456
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6070 libcfs: provide separate buffers for libcfs_*2str() 85/13185/6
Dmitry Eremin [Thu, 25 Dec 2014 12:12:02 +0000 (15:12 +0300)]
LU-6070 libcfs: provide separate buffers for libcfs_*2str()

Provide duplicates with separate buffers for libcfs_*2str() functions.

Replace libcfs_nid2str() with libcfs_nid2str_r() function in critical
places.

Provide buffer size for nf_addr2str functions.
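
To illustrate why a caller-supplied buffer with an explicit size
helps, here is a minimal sketch of a reentrant formatter in C.
addr2str_r() and its dotted-quad output are hypothetical and are not
the libcfs implementation; the point is only that two results can be
alive at once because neither lives in a shared static buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical reentrant formatter: writes a dotted-quad address into
 * the caller's buffer, never more than buf_size bytes, and returns it.
 */
static char *addr2str_r(uint32_t addr, char *buf, size_t buf_size)
{
	assert(buf != NULL && buf_size > 0);
	snprintf(buf, buf_size, "%u.%u.%u.%u",
		 (unsigned)((addr >> 24) & 0xff),
		 (unsigned)((addr >> 16) & 0xff),
		 (unsigned)((addr >> 8) & 0xff),
		 (unsigned)(addr & 0xff));
	return buf;
}
```

A caller can format two addresses into two separate arrays and print
both in one message without the second call overwriting the first.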

Use __u32 as nf_type always

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I7505271954745d1b1e288ef4e09a7f52bd970536
Reviewed-on: http://review.whamcloud.com/13185
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoLU-5843 tests: fix recovery-small test_61 53/12653/6
Mikhail Pershin [Mon, 10 Nov 2014 17:12:39 +0000 (20:12 +0300)]
LU-5843 tests: fix recovery-small test_61

The test used the obdfilter last_id as a number, while it is an OID,
e.g. 0x100000000:16. The patch fixes the test to extract the object ID
from the OID.
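
The test itself is a shell script; purely as an illustration, the same
extraction can be sketched in C. last_id_object() is a hypothetical
helper, and it assumes the value has the "SEQ:ID" shape of the example
above, with the object ID after the last colon.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the object ID from a "SEQ:ID" string such as "0x100000000:16".
 * A string without a colon is parsed as a plain number.
 * Returns -1 if no valid number follows the colon.
 */
static long long last_id_object(const char *last_id)
{
	const char *colon, *p;
	char *end;
	unsigned long long id;

	assert(last_id != NULL);
	colon = strrchr(last_id, ':');
	p = colon != NULL ? colon + 1 : last_id;
	id = strtoull(p, &end, 0);	/* base 0 accepts decimal and 0x hex */
	if (end == p || *end != '\0')
		return -1;
	return (long long)id;
}
```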

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: If921cf41253450ab035a75be6fb34145aee1a197
Reviewed-on: http://review.whamcloud.com/12653
Tested-by: Jenkins
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-4839 tests: wait for copytool start sanity-hsm/60 31/13731/2
Nathaniel Clark [Wed, 11 Feb 2015 13:48:12 +0000 (08:48 -0500)]
LU-4839 tests: wait for copytool start sanity-hsm/60

Wait for copytool to start transfer before checking progress interval.
The copytool, in certain environments (heavily loaded NFS backed target),
can take an extraordinarily long time (>30s) to open destination file.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I56908a16240b61a51fe1395a8104eddc6aa3131f
Reviewed-on: http://review.whamcloud.com/13731
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6137 ldiskfs: simplify nocmtime patch 05/13705/2
Niu Yawei [Tue, 10 Feb 2015 03:21:00 +0000 (22:21 -0500)]
LU-6137 ldiskfs: simplify nocmtime patch

Simplify the nocmtime patch by patching only ext4_current_time().
This fixes the defect that the original patch does not handle the
setacl code path, and it avoids the risk of future changes adding
new places that need to be fixed.

Remove the obsolete xattr-no-update-ctime patch.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I02928c4f867e9476f0bc1815dd3256e3d79dadf7
Reviewed-on: http://review.whamcloud.com/13705
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
5 years agoLU-4223 tests: fix conf-sanity test_32 typo 65/13265/2
Andreas Dilger [Wed, 7 Jan 2015 10:16:10 +0000 (03:16 -0700)]
LU-4223 tests: fix conf-sanity test_32 typo

The t32_wait_til_devices_gone() function incorrectly calls
"lctl devices_list" instead of "lctl device_list" if there
is a timeout waiting for the loop devices to be cleaned up.
Since this is only used for debugging output after an error,
it wasn't actually causing any additional failures.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I858e789b16251835bce7af46e4f5233c95500c1e
Reviewed-on: http://review.whamcloud.com/13265
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6038 osd-zfs: sa_spill_alloc()/sa_spill_free() compat 97/13097/6
Brian Behlendorf [Wed, 17 Dec 2014 00:14:17 +0000 (16:14 -0800)]
LU-6038 osd-zfs: sa_spill_alloc()/sa_spill_free() compat

The sa_spill_alloc()/sa_spill_free() interfaces have been retired.
Callers may either use the more memory efficient zio_buf_alloc()/
zio_buf_free() which are now exported, or they may use their own
allocator.

For the purposes of this patch an osd_zio_buf_alloc()/
osd_zio_buf_free() wrapper function was introduced which layers
on whichever interface is provided.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Change-Id: Id1d19d7b4c440808b8b3fd042f687b10c1b869f2
Reviewed-on: http://review.whamcloud.com/13097
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years agoLU-6038 osd-zfs: Avoid redefining KM_SLEEP 96/13096/5
Brian Behlendorf [Tue, 16 Dec 2014 23:24:28 +0000 (15:24 -0800)]
LU-6038 osd-zfs: Avoid redefining KM_SLEEP

Due to some long overdue memory management cleanup in the ZoL kmem
implementation the definition of KM_SLEEP has changed.  This change
was expected to be transparent to consumers but it causes issues
for Lustre because it explicitly redefines KM_SLEEP.  This was
originally done to avoid overriding the Linux slab interfaces.

This change implements a more portable fix.  Instead of preventing
the inclusion of the kmem.h header by setting the guard, the
kmem_cache_* preprocessor macros are explicitly undefined to make
the Linux interface available.

The related ZoL pull requests are as follows:

  https://github.com/zfsonlinux/spl/pull/414
  https://github.com/zfsonlinux/zfs/pull/2918

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Change-Id: Id1d19d7b4c440808b8b3fd042f687b10c1b869f3
Reviewed-on: http://review.whamcloud.com/13096
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-2828 test: Remove tests from ALWAYS_EXCEPT list 57/13757/2
James Nunez [Fri, 13 Feb 2015 17:12:30 +0000 (10:12 -0700)]
LU-2828 test: Remove tests from ALWAYS_EXCEPT list

conf-sanity tests 59 and 64 were added to the ALWAYS_EXCEPT list
in commit b2c829b7be757cd2bc523ab0d2857a77eeb7a349 for LU-2469.

Commit 1e7845ecbe5f3e8ac1aa0d3e345e6cf6cf6f0543, for LU-2828, resolves
the cause of the conf-sanity test 59 and 64 failures.

conf-sanity test 59 and 64 need to be removed from the except list.

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I4b70485f91e0096c2e4387ebcdc95cf5720a7e16
Reviewed-on: http://review.whamcloud.com/13757
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-2194 test: remove test_19b from except list 71/13671/2
Hongchao Zhang [Wed, 19 Nov 2014 15:45:33 +0000 (23:45 +0800)]
LU-2194 test: remove test_19b from except list

The patch that fixes the problem has landed, so test_19b in
recovery_small should be removed from the except list.

Change-Id: I748a7dfb4f70a42a0f17ab93803cb2d6d05b32db
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/13671
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-3680 osd: reduce osd_thread_info in ldiskfs osd 26/9726/26
Alex Zhuravlev [Wed, 19 Mar 2014 08:20:16 +0000 (12:20 +0400)]
LU-3680 osd: reduce osd_thread_info in ldiskfs osd

Reduce the structure by unioning a few rarely used fields. Now the
structure should fit in a page:

(gdb) p sizeof(struct osd_thread_info)
$1 = 3296

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: I75d5c6fefa41884390ce155781e0963884a3ad2c
Reviewed-on: http://review.whamcloud.com/9726
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
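
The size effect of unioning rarely used fields (LU-3680 above) can be illustrated with Python's ctypes. This is a minimal sketch: the field names and sizes are hypothetical and are not the members of struct osd_thread_info.

```python
import ctypes


class ScratchA(ctypes.Structure):
    # Hypothetical rarely used scratch buffer.
    _fields_ = [("buf", ctypes.c_char * 512)]


class ScratchB(ctypes.Structure):
    # Hypothetical rarely used ID array, never needed together with ScratchA.
    _fields_ = [("ids", ctypes.c_uint64 * 32)]


class Separate(ctypes.Structure):
    # Both scratch areas occupy their own storage.
    _fields_ = [("common", ctypes.c_uint64), ("a", ScratchA), ("b", ScratchB)]


class Scratch(ctypes.Union):
    # The two areas overlap, so only the larger one determines the size.
    _fields_ = [("a", ScratchA), ("b", ScratchB)]


class Unioned(ctypes.Structure):
    _fields_ = [("common", ctypes.c_uint64), ("u", Scratch)]


if __name__ == "__main__":
    print(ctypes.sizeof(Separate), ctypes.sizeof(Unioned))
```

The union is only safe when the overlapped fields are never live at the same time, which is why the commit restricts it to rarely used fields.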
5 years agoLU-5264 obdclass: fix race during key quiescency 03/13103/3
Bruno Faccini [Wed, 17 Dec 2014 09:57:07 +0000 (10:57 +0100)]
LU-5264 obdclass: fix race during key quiescency

Upon umount, presumably of the last device using the same OSD back-end,
lu_context_key_quiesce() is run to prepare for module unload by
removing all of the module's key references in any context linked on
the lu_context_remembered list.
Threads must protect against such list traversal when
exiting from their context.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: If2c8199fa764236308b49950672129a63b8877f5
Reviewed-on: http://review.whamcloud.com/13103
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6301 llite: cleanup open handle for client open failure 09/13709/13
Fan Yong [Sun, 30 Nov 2014 00:26:29 +0000 (08:26 +0800)]
LU-6301 llite: cleanup open handle for client open failure

In the open case, the client-side open handling thread may hit an error
after the MDT grants the open. In that case, the client should
send a close RPC to the MDT as cleanup; otherwise, the open handle
on the MDT will be leaked until the client unmounts or is evicted.

If the LFSCK marks LU_OBJECT_HEARD_BANSHEE on an MDT-object that is
opened by others in order to repair some inconsistency, such as a
multiple-referenced OST-object, then the leaked open handle still
references the MDT-object and blocks subsequent threads
that want to locate the object via FID.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I1fff2cde179b039e3bee562ef79d5cf3587fe3c8
Reviewed-on: http://review.whamcloud.com/13709
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
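
The cleanup rule from LU-6301 above (close the handle if the client fails after the server granted the open) can be sketched like this. All names here (FakeMDT, client_open, finish_open) are hypothetical stand-ins, not llite or MDT interfaces.

```python
class FakeMDT:
    """Hypothetical stand-in for the MDT; it only tracks open handles."""

    def __init__(self):
        self.open_handles = set()
        self._next_handle = 1

    def open(self, fid):
        handle = self._next_handle
        self._next_handle += 1
        self.open_handles.add(handle)
        return handle

    def close(self, handle):
        self.open_handles.discard(handle)


def client_open(mdt, fid, finish_open):
    """Open fid; if client-side processing fails, close the granted handle."""
    handle = mdt.open(fid)          # the server has granted the open
    try:
        finish_open(handle)         # client-side work that may still fail
    except Exception:
        mdt.close(handle)           # cleanup so the handle is not leaked
        raise
    return handle
```

Without the close in the error path, the handle would stay in open_handles until the client went away, which mirrors the leak the commit describes.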
5 years agoLU-6280 lod: delete xattr on striped dir 67/13867/2
wang di [Tue, 24 Feb 2015 04:22:03 +0000 (20:22 -0800)]
LU-6280 lod: delete xattr on striped dir

lod_xattr_del() needs to delete the EA on all stripes of a
striped directory.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I398a03d6a41daee34a344104d67cf8efa7d97f6a
Reviewed-on: http://review.whamcloud.com/13867
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6312 lfsck: modify llsd_master_list with spin_lock 21/13921/2
Fan Yong [Thu, 4 Dec 2014 14:00:50 +0000 (22:00 +0800)]
LU-6312 lfsck: modify llsd_master_list with spin_lock

There was a spin_lock leak in the layout LFSCK lfsck_layout_slave_quit
that could cause lfsck_layout_slave_data::llsd_master_list to be modified
without the spin_lock while others traverse the list with the spin_lock,
so that the latter access invalid RAM or fall into a soft lockup.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I61749ebd6c36d4b21eb20bcc1c46dbe16a1c7f2c
Reviewed-on: http://review.whamcloud.com/13921
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
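
The locking rule behind LU-6312 above (every modification and every traversal of the list takes the same lock, and no path leaves it held) can be sketched with a Python lock. This is an analogy only: threading.Lock is not a kernel spin_lock, and the class name is hypothetical.

```python
import threading


class MasterList:
    """List whose modifications and traversals share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, item):
        with self._lock:
            self._items.append(item)

    def remove(self, item):
        # "with" releases the lock on every exit path, including early
        # returns and exceptions, so the lock cannot leak.
        with self._lock:
            if item in self._items:
                self._items.remove(item)

    def for_each(self, fn):
        with self._lock:
            for item in list(self._items):
                fn(item)
```

A path that returns with the lock still held, or one that edits the list without taking it, breaks the guarantee for every other user of the list.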
5 years agoLU-6256 test: Skip sanity test_184e if MDS version older than 2.6.94 45/13845/4
Wei Liu [Mon, 23 Feb 2015 21:19:05 +0000 (13:19 -0800)]
LU-6256 test: Skip sanity test_184e if MDS version older than 2.6.94

Skip sanity test_184e if MDS version older than 2.6.94

Change-Id: Ib491b079a3adc998a12d9bbcb7985ad2e718453b
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/13845
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5938 mdd: fixed oops when dereferencing structure 19/13619/5
Frank Zago [Tue, 3 Feb 2015 18:37:00 +0000 (12:37 -0600)]
LU-5938 mdd: fixed oops when dereferencing structure

In mdd_changelog_ns_store() and mdd_changelog_data_store(),
lu_ucred(env) can be NULL, so do not dereference it.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I45d0cbbb171f05ee1d04e628a3b31c256e0d3951
Reviewed-on: http://review.whamcloud.com/13619
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoRevert "LU-5417 lfs: fix comparison between signed and unsigned" 03/13903/2
Oleg Drokin [Fri, 27 Feb 2015 07:43:55 +0000 (07:43 +0000)]
Revert "LU-5417 lfs: fix comparison between signed and unsigned"

This change is incorrect after all. While it's a noop on x86_64, it's a very important overflow check for 32bit arches.

This reverts commit b5b354a75b5e697e90892878ecb26459cb9a6a21.

Change-Id: I8810da3407d91e63c6e1c062a483a26ffc1bcd97
Reviewed-on: http://review.whamcloud.com/13903
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5912 build: Fix XeonPhi build 30/13730/2
Dmitry Eremin [Wed, 11 Feb 2015 13:39:31 +0000 (16:39 +0300)]
LU-5912 build: Fix XeonPhi build

Need an extra check for old kernel style parameters.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I92b1b8579d2190bf526b3194cd83d0917fb3b4af
Reviewed-on: http://review.whamcloud.com/13730
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6216 tests: compile fixes for PPC64, and for clang 00/13800/2
Frank Zago [Thu, 19 Feb 2015 00:56:34 +0000 (18:56 -0600)]
LU-6216 tests: compile fixes for PPC64, and for clang

Fix the following warnings for PPC64:
  llapi_hsm_test.c: In function 'test101_progress':
  llapi_hsm_test.c:563: error: format '%llu' expects type 'long long
    unsigned int', but argument 8 has type '__u64'

and move the nested functions outside their current functions since
clang doesn't support them.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I034b097f3817a5919adcb8dc3465b00833174f63
Reviewed-on: http://review.whamcloud.com/13800
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoMove master branch to 2.8 development 2.7.50 v2_7_50 v2_7_50_0
Oleg Drokin [Fri, 27 Feb 2015 05:56:17 +0000 (00:56 -0500)]
Move master branch to 2.8 development

Change-Id: If8635d108b6a10b02e01b747b694bdfab4594ba2
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6263 lmv: fix parent FID for migration 17/13817/8
wang di [Wed, 18 Feb 2015 18:58:29 +0000 (10:58 -0800)]
LU-6263 lmv: fix parent FID for migration

If the migrating directory is under a striped directory, the right
stripe FID needs to be set for its parent.

Update migration test script (sanity test_230) to do migration
under striped dir.

Add -i to test_mkdir().

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ic230f9b63bc21c1391e397a0d3ff689e3f0ba5dc
Reviewed-on: http://review.whamcloud.com/13817
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Jenkins
5 years agoLU-6230 lfsck: reload OSP-object via set LOV EA on LOD-object 48/13848/2
Fan Yong [Sun, 30 Nov 2014 00:22:10 +0000 (08:22 +0800)]
LU-6230 lfsck: reload OSP-object via set LOV EA on LOD-object

Generally, we should use the bottom device (OSD) to update the parent
LOV EA. But the LOD-object still references the wrong
OSP-object, which should be detached after the parent's LOV EA is
refreshed, and unfortunately there is no suitable API for that.
So we have to make the LOD re-load the OSP-object(s) by
replacing the LOV EA against the LOD-object.

Once the DNE2 patches have been landed, we can replace the
LOD device with the OSD device.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I960f42dacc8ee23dd98a2b986f0a83cb53b62c15
Reviewed-on: http://review.whamcloud.com/13848
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6138 lfsck: set async windows size properly 18/13818/5
Fan Yong [Mon, 24 Nov 2014 21:02:28 +0000 (05:02 +0800)]
LU-6138 lfsck: set async windows size properly

If the async windows size is set to zero, then the LFSCK main engine
on the MDT will pre-load objects as fast as possible. In that case,
if the peer server(s) cannot handle the pre-load requests in time,
a lot of pre-load requests will wait on the MDT and cause memory
pressure. To avoid such trouble, we forbid setting the LFSCK async
windows size to zero or to a too-large (> LFSCK_ASYNC_WIN_MAX) value.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I3468236a4a0705ea60b49704583b051c99c77cd5
Reviewed-on: http://review.whamcloud.com/13818
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
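
The range check described in LU-6138 above might look like the sketch below. The commit does not give the value of LFSCK_ASYNC_WIN_MAX, so ASSUMED_ASYNC_WIN_MAX is a placeholder, and rejecting the value (rather than clamping it) is an assumption about how "forbid" is implemented.

```python
# Placeholder only: the real LFSCK_ASYNC_WIN_MAX value is not stated above.
ASSUMED_ASYNC_WIN_MAX = 1024


def check_async_window(value: int) -> int:
    """Return value if it is an acceptable window size, else raise."""
    # Zero would mean "pre-load as fast as possible", which is what the
    # commit wants to prevent; overly large values have a similar effect.
    if value <= 0 or value > ASSUMED_ASYNC_WIN_MAX:
        raise ValueError(
            f"async window size must be in 1..{ASSUMED_ASYNC_WIN_MAX}, got {value}"
        )
    return value
```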
5 years agoLU-5791 lfsck: use bottom device to locate object 92/13392/10
Fan Yong [Mon, 24 Nov 2014 09:57:48 +0000 (17:57 +0800)]
LU-5791 lfsck: use bottom device to locate object

For an LFSCK modification, if it updates only a single object, or if the
objects to be updated reside on the same server, whether local
or remote, then try to locate the object(s) against the bottom (OSD
or OSP) device; otherwise, there will be some update(s) on the local
server and others on a remote server, so either locate the object(s)
against the LOD device or use two transactions for the modification.

Similarly, the transaction handle will be created on the proper device
corresponding to the object(s).

This patch also fixes some memory leak issues caused by using the wrong
device for remote modification, one of the reasons for LU-6138.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I09a60bed3bd49a193d57214c4252904cb4546ab2
Reviewed-on: http://review.whamcloud.com/13392
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
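
The device selection rule from LU-5791 above can be written as a small decision function. This is a sketch under assumptions: servers are represented by plain identifiers, the function name is hypothetical, and the "two transactions" alternative for the mixed case is not modelled.

```python
def pick_device(object_servers, local_server):
    """Return which device layer to locate the object(s) against.

    object_servers: the server identifier for each object to be updated.
    local_server:   the identifier of the server running the LFSCK.
    """
    servers = set(object_servers)
    if not servers:
        raise ValueError("no objects to update")
    if len(servers) == 1:
        # Single object, or all objects on one server: use the bottom
        # device, OSD for a local target and OSP for a remote one.
        return "OSD" if servers == {local_server} else "OSP"
    # Updates span several servers: go through LOD (the commit also allows
    # splitting the work into two transactions instead).
    return "LOD"
```

For example, pick_device(["mdt0"], "mdt0") selects the OSD, while objects spread over "mdt0" and "mdt1" select the LOD.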
5 years agoLU-6231 osp: prepare OUT RPC after remote transaction start 10/13710/4
Fan Yong [Thu, 20 Nov 2014 05:21:40 +0000 (13:21 +0800)]
LU-6231 osp: prepare OUT RPC after remote transaction start

According to our current transaction/dt_object_lock framework
(to make the cross-MDT modifications for DNE1 workable),
the transaction sponsor starts the transaction first, then
tries to acquire the related dt_object_lock if needed. Under such rules,
if we want to prepare the OUT RPC in the transaction declare phase,
then the related attr/xattr must be known without the dt_object_lock. But
that condition may not hold for some remote transaction cases. For
example:

For the linkEA repairing (by LFSCK) case, before the LFSCK thread has
obtained the dt_object_lock on the target MDT-object, it cannot know
whether the MDT-object has a linkEA or not, nor whether it is valid.

Since the LFSCK thread cannot hold dt_object_lock before the remote
transaction start (otherwise there will be some potential deadlock),
it cannot prepare related OUT RPC for repairing during the declare
phase as other normal transactions do.

To resolve the trouble, we make the OSP prepare the related OUT RPC
after the remote transaction has started, and trigger the remote update
(send the RPC) at trans_stop. Then upper-layer users, such as LFSCK,
can follow the general rule to handle trans_start/dt_object_lock
for repairing linkEA inconsistency without distinguishing remote
MDT-objects.

In fact, above solution for remote transaction should be the normal
model without considering DNE1. The trouble brought by DNE1 will be
resolved in DNE2. At that time, this patch can be removed.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ib2ed4c290c9ae12b6f544575aa5313f0dc83a5af
Reviewed-on: http://review.whamcloud.com/13710
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
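
The ordering introduced by LU-6231 above (start the transaction, queue updates once the lock can be taken, send them at stop) can be sketched as follows. The class and method names are hypothetical and do not correspond to OSP functions; send stands in for issuing the OUT RPC.

```python
class RemoteTransaction:
    """Collects updates after start() and sends them in one batch at stop()."""

    def __init__(self, send):
        self._send = send        # callable standing in for "send the RPC"
        self._started = False
        self._updates = []

    def start(self):
        self._started = True

    def add_update(self, update):
        # Updates are prepared only after the transaction has started, when
        # the caller may already hold the object lock and know the xattr state.
        if not self._started:
            raise RuntimeError("transaction not started")
        self._updates.append(update)

    def stop(self):
        # The remote update is triggered when the transaction stops.
        if self._updates:
            self._send(list(self._updates))
        self._updates.clear()
        self._started = False
```

A caller would run start(), take its lock, inspect the object, call add_update() as needed, release the lock, and then call stop().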
5 years agoLU-5914 lfsck: dt_index_try before dt_lookup 01/13801/3
Fan Yong [Mon, 24 Nov 2014 04:08:30 +0000 (12:08 +0800)]
LU-5914 lfsck: dt_index_try before dt_lookup

Otherwise dt_lookup() may LBUG when locating a parent
directory MDT-object that is not in cache.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ibbed865e58d8f9a4d4b67265b02ba804efb9719e
Reviewed-on: http://review.whamcloud.com/13801
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoLU-5275 gnilnd: Add definition for PDE_DATA 27/13527/4
James Simmons [Tue, 10 Feb 2015 23:37:30 +0000 (18:37 -0500)]
LU-5275 gnilnd: Add definition for PDE_DATA

With the move of PDE_DATA to lprocfs_status.h there
was one klnd driver, gnilnd, that needed this define.
So the simple solution is to include the needed header.

Change-Id: I0b2bbc8d2efeab8e253f11b0e58df51c0002d5ae
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13527
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5604 tgt: return missed fail ids 32/12232/7
Liang Zhen [Mon, 17 Nov 2014 15:35:54 +0000 (23:35 +0800)]
LU-5604 tgt: return missed fail ids

OBD_FAIL_LDLM_REPLY is missing from tgt_enqueue, and it's actually
not suitable for tgt_enqueue anymore because tgt_enqueue() is a
common handler now.

This patch includes a few changes:
- tgt_enqueue sets tgt_session_info::tsi_reply_fail_id to
  OBD_FAIL_MGS/MDS/OST_LDLM_REPLY_NET based on type of target.

- rewrite test_52 of replay-single, the only reason that test_52
  can pass is because there is a typo:

  $CHECKSTAT -t file $DIR/$tfile-* which should be $DIR/$tfile

- add definitions for OBD_FAIL_LDLM_SRV_CP/BL/GL_AST and resolve
  OBD_FAIL conflicts

- OBD_FAIL_UPDATE_OBJ_NET_REP was renamed to
  OBD_FAIL_OUT_UPDATE_NET_REP but referenced with old name in tests.

- OBD_FAIL_MDS_FAIL_LOV_LOG_ADD check is obsoleted as well as tests.
  Meanwhile the OSP code was updated to fix panic in case of error.

- OBD_FAIL_TGT_LAST_REPLAY is removed along with test. It was never
  used and it seems it was even introduced by mistake.

Test-Parameters: envdefinitions=SLOW=yes alwaysuploadlogs testlist=replay-dual,replay-single
Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: If5113e459f5628047e17114b6bc20ba910f3c142
Reviewed-on: http://review.whamcloud.com/12232
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6138 lfsck: NOT hold reference on pre-loaded object 66/13666/5
Fan Yong [Thu, 20 Nov 2014 04:55:33 +0000 (12:55 +0800)]
LU-6138 lfsck: NOT hold reference on pre-loaded object

To improve the LFSCK performance, the LFSCK main engine will pre-load
the object locally or remotely, then generate related LFSCK request
that reference the pre-loaded object, and then push the request into
related LFSCK pipeline. The LFSCK assistant thread will handle the
LFSCK request asynchronously at some later time.

Originally, the LFSCK request held the pre-loaded object reference,
so the assistant thread could handle it directly without locating the
object by FID again. But holding the object reference prevents the
object from being purged from RAM. If some LFSCK request holds the
object, and someone else unlinks the object before the LFSCK assistant
thread handles the LFSCK request, then the unlinked object will be
cached in RAM until the last reference is released. Because the LFSCK
main engine and assistant thread run asynchronously, we do not know
when the LFSCK request that holds the object reference will be
handled. If the assistant thread needs to locate the object with
the same FID before that, it will self-deadlock forever.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I516653aa2143bb32a5f350b314951b78dead3e79
Reviewed-on: http://review.whamcloud.com/13666
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-6235 scrub: replace the stale OI mapping 45/13745/3
Fan Yong [Thu, 20 Nov 2014 04:54:59 +0000 (12:54 +0800)]
LU-6235 scrub: replace the stale OI mapping

If the OI mapping on the OST contains an invalid entry, then the OI
lookup via osd_obj_map_lookup() may return -ENOENT. From the point of
view of OI scrub, this is indistinguishable from the case where there is
no such OI mapping, so the OI scrub wrongly uses "INSERT"
@ops for osd_scrub_refresh_mapping() to repair the inconsistency.
So osd_obj_map_lookup() should return -ESTALE when
an invalid OI mapping exists; then the OI scrub can use
"UPDATE" @ops for osd_scrub_refresh_mapping() to repair it.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I013125eb0aaec683ac8f56ec32a30e7858262f87
Reviewed-on: http://review.whamcloud.com/13745
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
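
The distinction that LU-6235 above relies on (a missing mapping versus a stale one) can be sketched as below. The dictionary-based table, the is_valid callback, and both function names are hypothetical; only the -ENOENT/-ESTALE return codes and the INSERT/UPDATE choice come from the commit message.

```python
import errno


def lookup_mapping(oi_table, fid, is_valid):
    """Return 0 if a valid mapping exists, else a negative errno."""
    if fid not in oi_table:
        return -errno.ENOENT      # no mapping at all
    if not is_valid(oi_table[fid]):
        return -errno.ESTALE      # a mapping exists but it is invalid
    return 0


def choose_repair_op(rc):
    """Map the lookup result to the repair operation the scrub should use."""
    if rc == -errno.ENOENT:
        return "INSERT"
    if rc == -errno.ESTALE:
        return "UPDATE"
    return None                   # nothing to repair
```

If both failure cases returned -ENOENT, the caller could not tell them apart and would attempt an insert over an existing entry.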
5 years agoLU-6239 doc: include missing lnetctl.8 49/13749/2
James Simmons [Thu, 12 Feb 2015 19:14:50 +0000 (14:14 -0500)]
LU-6239 doc: include missing lnetctl.8

Running "man lnetctl" currently doesn't work on a system
with lustre installed, because lnetctl.8 is not
included in the generated rpms. This simple fix
ensures lnetctl.8 is included in the rpms.

Change-Id: I72e2ef2841f5936e1d0def538c239ee2da32d7c3
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/13749
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Tested-by: Jenkins
Reviewed-by: Isaac Huang <he.huang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5873 ldiskfs: osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed 00/12600/9
Andriy Skulysh [Mon, 10 Nov 2014 10:48:11 +0000 (12:48 +0200)]
LU-5873 ldiskfs: osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed

The bug happens when the 16TB-4KB limit is exceeded during a write.

Add a check for the maximum file size on both the client and server sides.

Xyratex-bug-id: MRP-2131
Change-Id: I73f0ee803670ada869c2618f275049948668848e
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: http://review.whamcloud.com/12600
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
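
The maximum file size check from LU-5873 above amounts to simple arithmetic. The sketch below assumes "16TB" means 16 * 2^40 bytes and "4KB" means 4096 bytes, and that exceeding the limit is reported as EFBIG; the function name is hypothetical.

```python
import errno

BLOCK_SIZE = 4096                           # assumed 4KB block
MAX_FILE_SIZE = 16 * 2**40 - BLOCK_SIZE     # assumed meaning of 16TB-4KB


def check_write(offset: int, length: int) -> None:
    """Raise OSError(EFBIG) if the write would end past MAX_FILE_SIZE."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if offset + length > MAX_FILE_SIZE:
        raise OSError(errno.EFBIG, "write exceeds maximum file size")
```

Checking on both the client and the server, as the commit does, means an oversized write is refused early and is still caught if a client skips the check.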
5 years agoLU-6222 statahead: add to list before make ready 08/13708/2
Lai Siyao [Tue, 10 Feb 2015 13:44:44 +0000 (21:44 +0800)]
LU-6222 statahead: add to list before make ready

__sa_make_ready() sets the entry ready before adding it to the list, so
revalidate_statahead_dentry()->sa_kill() may free an entry which
is not in any list yet.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I0b5f7200fb74c88450133d66bf7bf38d9355036f
Reviewed-on: http://review.whamcloud.com/13708
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>