Whamcloud - gitweb
fs/lustre-release.git
LU-3314 scrub: use special fixed FID for .lustre
Fan Yong [Sat, 4 May 2013 13:39:32 +0000 (21:39 +0800)]
LU-3314 scrub: use special fixed FID for .lustre

For lustre-2.x (x <= 3), ".lustre" has NO FID-in-LMA, so the
client will get an IGIF for the ".lustre" object when the MDT
restarts.

From the OI scrub's point of view, when the MDT is upgraded to
Lustre-2.4, it does not know whether any old clients cached the
".lustre" IGIF during the upgrade. There are two choices:

1) Generate IGIF-in-LMA and IGIF-in-OI for ".lustre".
   This allows already-connected old clients to access
   ".lustre" with the cached IGIF, but it causes other code
   on the MDT to fail the "fid_is_dot_lustre()" check.

2) Use the fixed FID {FID_SEQ_DOT_LUSTRE, FID_OID_DOT_LUSTRE, 0}
   for ".lustre" regardless of whether any clients have
   cached the ".lustre" IGIF. This makes the
   "fid_is_dot_lustre()" check work on the MDT, although
   already-connected old clients will no longer be able to
   access ".lustre" with the cached IGIF.

It is rare for already-connected old clients to access
".lustre" with a cached IGIF, so we prefer solution 2).
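
To make the trade-off concrete, here is a minimal C sketch of why a cached IGIF for ".lustre" can never pass a fixed-FID check. The simplified struct, the numeric values of FID_SEQ_IGIF, FID_SEQ_IGIF_MAX, FID_SEQ_DOT_LUSTRE and FID_OID_DOT_LUSTRE (recalled from lustre_idl.h, so verify them against the tree) and the igif_build() helper are all assumptions. The point is only that an IGIF (inode number in f_seq, generation in f_oid) lives in a different sequence range from the fixed ".lustre" FID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified FID: 64-bit sequence, 32-bit object ID, 32-bit version. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Assumed constant values; check them against lustre_idl.h. */
#define FID_SEQ_IGIF		12ULL
#define FID_SEQ_IGIF_MAX	0x0ffffffffULL
#define FID_SEQ_DOT_LUSTRE	0x200000002ULL
#define FID_OID_DOT_LUSTRE	1U

/* Hypothetical helper: an IGIF packs ino# into f_seq and gen# into f_oid. */
static struct lu_fid igif_build(uint32_t ino, uint32_t gen)
{
	struct lu_fid fid = { .f_seq = ino, .f_oid = gen, .f_ver = 0 };

	return fid;
}

static bool fid_is_igif(const struct lu_fid *fid)
{
	return fid->f_seq >= FID_SEQ_IGIF && fid->f_seq <= FID_SEQ_IGIF_MAX;
}

/* With choice 2), ".lustre" is recognized only by its fixed FID. */
static bool fid_is_dot_lustre(const struct lu_fid *fid)
{
	return fid->f_seq == FID_SEQ_DOT_LUSTRE &&
	       fid->f_oid == FID_OID_DOT_LUSTRE &&
	       fid->f_ver == 0;
}
```

An old client holding igif_build(ino, gen) for ".lustre" presents an IGIF, which fid_is_dot_lustre() rejects; that is exactly the access the commit accepts losing.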

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ifa491850ddab0de0b67aab124dc206ad4f714428
Reviewed-on: http://review.whamcloud.com/6309
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-3187 ost: check pre 2.4 echo client in obdo validation
wang di [Tue, 7 May 2013 07:00:46 +0000 (00:00 -0700)]
LU-3187 ost: check pre 2.4 echo client in obdo validation

The old echo client still uses o_id/o_seq for the object ID, while
the new echo client uses a FID. Add OBD_CONNECT_FID for the 2.4 echo
client, so that a 2.4 OST converts o_id/o_seq to a FID when the
request comes from an old echo client.

Add the local o_flags flag OBD_FL_OSTID to indicate that the OST
does not support FIDs yet, in which case the echo client still
sends o_id/o_seq to the OST.

Clean up ost_validate_obdo().

Test-Parameters: clientjob=lustre-b2_1 clientbuildno=197 testlist=sanity,obdfilter-survey
Test-Parameters: serverjob=lustre-b2_1 serverbuildno=197 testlist=sanity,obdfilter-survey
Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I6001c813b668cf53a66d0d9d74f322bad63765ed
Reviewed-on: http://review.whamcloud.com/6287
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
LU-3303 test: sanity-quota test_18 incomplete error message
James Nunez [Thu, 9 May 2013 17:04:45 +0000 (11:04 -0600)]
LU-3303 test: sanity-quota test_18 incomplete error message

test_18 passes its error message to quota_error as two string
literals with no line continuation between them, so quota_error
essentially ignores the second string literal.

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I54e5cccf307349fab072d6fac2832f6bd98a3bfd
Reviewed-on: http://review.whamcloud.com/6302
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-3100 tests: Skip recovery-small test_111
James Nunez [Fri, 12 Apr 2013 23:19:28 +0000 (17:19 -0600)]
LU-3100 tests: Skip recovery-small test_111

Recovery-small test_111 now checks the metadata server version and
will run the test for versions 2.3.62 and above. The test will be
skipped for all other server versions.
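
The version gate can be sketched in C as packing "major.minor.patch" into a single integer and comparing the results. version_code() and should_run_test_111() are hypothetical helpers (the real check lives in the shell test scripts), and the packing assumes minor and patch numbers below 256.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: pack "major.minor.patch" into one comparable integer. */
static unsigned int version_code(const char *ver)
{
	unsigned int major = 0, minor = 0, patch = 0;

	/* Missing trailing components stay 0, e.g. "2.4" -> 2.4.0. */
	if (sscanf(ver, "%u.%u.%u", &major, &minor, &patch) < 1)
		return 0;
	return (major << 16) | (minor << 8) | patch;
}

/* Run test_111 only for MDS versions 2.3.62 and above; skip otherwise. */
static bool should_run_test_111(const char *mds_version)
{
	return version_code(mds_version) >= version_code("2.3.62");
}
```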

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I58aa58dd7cd8a27209fb1c4ee258de9d3a84b144
Reviewed-on: http://review.whamcloud.com/5975
Tested-by: Hudson
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Revert "LU-3188 osc: shorten IO calling path"
Oleg Drokin [Fri, 10 May 2013 18:28:14 +0000 (14:28 -0400)]
Revert "LU-3188 osc: shorten IO calling path"

This commit seems to have caused more problems than it fixed.

This reverts commit 83ae17df2bdce837e62473aec27c03d67312c8ea.

LU-3302 llog: Do not use ostid swab for llogid
wang di [Tue, 7 May 2013 18:43:43 +0000 (11:43 -0700)]
LU-3302 llog: Do not use ostid swab for llogid

Since the llog ID still uses the id/seq format in the request, swab it
with its own swab function instead of the ostid swab, which might
incorrectly interpret the llog ID as a FID.
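
As a rough illustration of giving the llog ID "its own swab function", the sketch below byte-swaps each id/seq field as a plain integer, so nothing ever tries to interpret the pair as a FID. struct llogid_sketch, swab64(), swab32() and llogid_swab() are hypothetical names, not Lustre's actual llog ID definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified llog ID that keeps the plain id/seq layout. */
struct llogid_sketch {
	uint64_t id;
	uint64_t seq;
	uint32_t gen;
};

static uint64_t swab64(uint64_t v)
{
	v = (v << 32) | (v >> 32);
	v = ((v & 0x0000ffff0000ffffULL) << 16) | ((v >> 16) & 0x0000ffff0000ffffULL);
	return ((v & 0x00ff00ff00ff00ffULL) << 8) | ((v >> 8) & 0x00ff00ff00ff00ffULL);
}

static uint32_t swab32(uint32_t v)
{
	v = (v << 16) | (v >> 16);
	return ((v & 0x00ff00ffU) << 8) | ((v >> 8) & 0x00ff00ffU);
}

/* Swab every field as a plain integer; never reinterpret id/seq as a FID. */
static void llogid_swab(struct llogid_sketch *lid)
{
	lid->id = swab64(lid->id);
	lid->seq = swab64(lid->seq);
	lid->gen = swab32(lid->gen);
}
```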

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I35f554e99a765384218ee3a357c931d6be1de6b3
Reviewed-on: http://review.whamcloud.com/6305
Tested-by: Hudson
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
LU-2886 scrub: non-fixed FIDs for some local files
Fan Yong [Sat, 4 May 2013 06:16:01 +0000 (14:16 +0800)]
LU-2886 scrub: non-fixed FIDs for some local files

For old lustre-2.x (x < 4), the local files PENDING and lfsck_bookmark
were created with fixed/reserved FIDs. Since lustre-2.4 they use
variable local FIDs instead, so adjust OI scrub to match these changes.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I6d1f2f19afa3d777b16838ae03f46a39a58b537a
Reviewed-on: http://review.whamcloud.com/6299
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
LU-3301 utils: Replace %llu with LPU64
Christopher J. Morrone [Thu, 9 May 2013 00:28:02 +0000 (17:28 -0700)]
LU-3301 utils: Replace %llu with LPU64

Replace incorrect use of %llu with LPU64 to allow compilation
on ppc64.
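
A minimal sketch of the portability problem, assuming a userspace build: on ppc64 the kernel's __u64 has traditionally been typedef'd as unsigned long rather than unsigned long long, so a hard-coded %llu does not match the argument type. Here C99's PRIu64 stands in for LPU64; this is not Lustre's own definition of the macro, and format_objid() is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Take the conversion from the type's definition instead of hard-coding
 * "%llu". PRIu64 is C99's answer; LPU64 plays the same role in Lustre.
 */
#ifndef LPU64
#define LPU64 "%" PRIu64
#endif

/* Hypothetical helper that formats a 64-bit object ID. */
static int format_objid(char *buf, size_t len, uint64_t objid)
{
	return snprintf(buf, len, "objid=" LPU64, objid);
}
```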

Change-Id: I85794527a43ed1577cf43ddb3470a9c8d070f11b
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/6296
Reviewed-by: Bobi Jam <bobijam.xu@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
LU-2886 obdclass: use common way to store lastid
Mikhail Pershin [Mon, 29 Apr 2013 16:34:25 +0000 (20:34 +0400)]
LU-2886 obdclass: use common way to store lastid

The last IDs for local files are stored in the root directory in files
named seq-xxx-lastid, while the last ID for OST objects is stored in
the special O/seq/LAST_ID object with zero OID and handled by the OSD.
This patch reworks local files to store their last ID in O/seq/LAST_ID
too, using the same format.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I72710e84cf0f7a0903b0b88ac3f9432eb59ea716
Reviewed-on: http://review.whamcloud.com/6199
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
LU-3271 lov: only dump header in lsm_lmm_verify
Andreas Dilger [Fri, 3 May 2013 18:49:17 +0000 (12:49 -0600)]
LU-3271 lov: only dump header in lsm_lmm_verify

If lsm_lmm_verify_*() finds an error in the lov_mds_md header
structure, don't dump the full stripe information, since this
can be totally bogus (e.g. if stripe_count == -1 or similar).
Instead, just dump the header information for debugging.
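
The idea can be sketched in C: validate the header first and, on failure, print only the fixed-size header fields, never walking a stripe array whose length comes from an untrusted stripe_count. struct lmm_header_sketch, the stripe-count limit, lmm_dump_header() and lmm_verify() are hypothetical; the magic value is an assumed LOV_MAGIC_V1 and should be checked against the tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define LMM_MAGIC_SKETCH	0x0BD10BD0U	/* assumed LOV_MAGIC_V1 value */
#define LMM_STRIPE_COUNT_MAX	2000U		/* hypothetical sanity limit */

/* Hypothetical, simplified lov_mds_md header (stripe entries omitted). */
struct lmm_header_sketch {
	uint32_t magic;
	uint32_t stripe_size;
	uint16_t stripe_count;
};

/* Print only the fixed-size header fields into buf. */
static void lmm_dump_header(const struct lmm_header_sketch *h,
			    char *buf, size_t len)
{
	snprintf(buf, len, "magic %#x stripe_size %u stripe_count %u",
		 (unsigned int)h->magic, (unsigned int)h->stripe_size,
		 (unsigned int)h->stripe_count);
}

/*
 * Returns 0 if the header looks sane. On error, dump the header only:
 * a bogus stripe_count (e.g. 65535 from -1) must not drive a loop over
 * stripe entries that may not exist.
 */
static int lmm_verify(const struct lmm_header_sketch *h, char *buf, size_t len)
{
	if (h->magic != LMM_MAGIC_SKETCH ||
	    h->stripe_count > LMM_STRIPE_COUNT_MAX) {
		lmm_dump_header(h, buf, len);
		return -EINVAL;
	}
	return 0;
}
```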

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Idf8c8bc35b156181aff9f0c5f0ea1f73c89e33d1
Reviewed-on: http://review.whamcloud.com/6261
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-2886 obdclass: remove obsoleted md_local_file.c
Mikhail Pershin [Sun, 21 Apr 2013 16:15:29 +0000 (20:15 +0400)]
LU-2886 obdclass: remove obsoleted md_local_file.c

This library is no longer used and has been replaced by
local_storage.c. This patch removes its last remnants.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I4d1371908db898efc8c5cd650357c449f369f51b
Reviewed-on: http://review.whamcloud.com/6107
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
LU-3281 osc: some cleanup to reduce stack overflow chance
Bobi Jam [Mon, 6 May 2013 15:38:21 +0000 (23:38 +0800)]
LU-3281 osc: some cleanup to reduce stack overflow chance

ptlrpcd_add_req() may wake up other processes, so do not hold a
spinlock when calling ptlrpcd_queue_work()->ptlrpcd_add_req().

If the current process is allocating memory, the memory shrinker could
reach osc_lru_del(); do not go on to call osc_lru_shrink() from there,
since it could lead to a long call chain.

Use a static OES_STRINGS string in OSC_EXTENT_DUMP() to reduce the
stack footprint.

Allocate crattr on the heap in osc_build_rpc() to reduce the stack
footprint.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I1a1ce0b46850773a2ae45ce16be6b708adc40ab8
Reviewed-on: http://review.whamcloud.com/6270
Tested-by: Hudson
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
LU-3280 ldlm: suppress useless lock RPC for layout
jcl [Mon, 6 May 2013 11:12:55 +0000 (13:12 +0200)]
LU-3280 ldlm: suppress useless lock RPC for layout

In ldlm_lock_decref_internal(), when l_lvb_data is freed to
reduce memory consumption, LDLM_FL_LVB_READY is not
cleared, so when the lock is later reused the LVB is not
updated. But clearing LDLM_FL_LVB_READY forces a layout refetch
on every file access, so it is better to remove the optimization.
This case arises after an HSM restore.

Signed-off-by: JC Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Change-Id: I3aa56cf39fe34941d227400410b27db32479b1b1
Reviewed-on: http://review.whamcloud.com/6268
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-3190 mdd: not return linkEA for dead obj
wang di [Thu, 2 May 2013 13:23:59 +0000 (06:23 -0700)]
LU-3190 mdd: not return linkEA for dead obj

1. Do not return linkEA for a dead object.
2. After getting the real LMA from the object, check that
lma_self_fid matches the object FID.
3. Clear the OI cache during OI delete.
4. Correct the error value if ldlm_handle_enqueue() returns
some value other than ldlm_err_t (for example -ESTALE).

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I1d2345eb01ff58584ffba31f86bb408396780aeb
Reviewed-on: http://review.whamcloud.com/6252
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
New tag 2.3.65 (tags: 2.3.65, v2_3_65, v2_3_65_0)
Oleg Drokin [Tue, 7 May 2013 19:15:09 +0000 (15:15 -0400)]
New tag 2.3.65

Change-Id: I65f1eca5083842566443fd577e5a88f1dc57e88e

LU-3275 osd-ldiskfs: re-order osd device init/fini
Fan Yong [Wed, 1 May 2013 11:22:47 +0000 (19:22 +0800)]
LU-3275 osd-ldiskfs: re-order osd device init/fini

There was a race condition between the background OI scrub thread and
osd_device_init0(): the OI scrub thread could try to access the
uninitialized osd_device::od_ost_map. So the OSD initialization/start
process should NOT trigger OI scrub until all the OI components (OI
files, /O, etc.) have been initialized. Use the reverse order for OSD
device fini.
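
One common C idiom for this kind of ordering is goto-based unwinding: bring components up in dependency order, start the background scrub last, and tear everything down in exactly the reverse order. Everything below (struct osd_dev_sketch, the setup_*/cleanup_* helpers, and the fail_map switch that simulates an allocation failure) is hypothetical and only mirrors the ordering rule described above, not osd_device_init0() itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; ost_map_ready stands in for od_ost_map. */
struct osd_dev_sketch {
	bool oi_files_ready;
	bool ost_map_ready;
	bool scrub_running;
};

static int setup_oi_files(struct osd_dev_sketch *d)
{
	d->oi_files_ready = true;
	return 0;
}

static void cleanup_oi_files(struct osd_dev_sketch *d)
{
	d->oi_files_ready = false;
}

/* fail simulates an allocation failure while building the map. */
static int setup_ost_map(struct osd_dev_sketch *d, bool fail)
{
	if (fail)
		return -ENOMEM;
	d->ost_map_ready = true;
	return 0;
}

static void cleanup_ost_map(struct osd_dev_sketch *d)
{
	/* The scrub thread must already be stopped before the map goes away. */
	assert(!d->scrub_running);
	d->ost_map_ready = false;
}

static int start_scrub(struct osd_dev_sketch *d)
{
	/* The scrub thread may touch every OI component, so all must exist. */
	assert(d->oi_files_ready && d->ost_map_ready);
	d->scrub_running = true;
	return 0;
}

static void stop_scrub(struct osd_dev_sketch *d)
{
	d->scrub_running = false;
}

static int osd_dev_init(struct osd_dev_sketch *d, bool fail_map)
{
	int rc;

	rc = setup_oi_files(d);
	if (rc != 0)
		return rc;
	rc = setup_ost_map(d, fail_map);
	if (rc != 0)
		goto out_oi;
	/* Start OI scrub only after every OI component is initialized. */
	rc = start_scrub(d);
	if (rc != 0)
		goto out_map;
	return 0;

out_map:
	cleanup_ost_map(d);
out_oi:
	cleanup_oi_files(d);
	return rc;
}

/* Tear down in the reverse order of osd_dev_init(). */
static void osd_dev_fini(struct osd_dev_sketch *d)
{
	stop_scrub(d);
	cleanup_ost_map(d);
	cleanup_oi_files(d);
}
```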

Test-Parameters: testlist=sanity-scrub

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I99a2a72ace947a0f79ace5c6f445cf896d884e63
Reviewed-on: http://review.whamcloud.com/6267
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-3117 build: zfs-0.6.1 kmod+dkms compatibility
Brian Behlendorf [Thu, 28 Mar 2013 19:18:27 +0000 (12:18 -0700)]
LU-3117 build: zfs-0.6.1 kmod+dkms compatibility

With the release of zfs-0.6.1 the default install location of
the zfs kmod headers and objects has changed.  The kmod headers
which are common for a zfs version are now installed under
/usr/src/zfs-<version>/ path.  The objects, which are kernel
specific, are installed under /usr/src/zfs-<version>/<kernel>/.

This was done just prior to the official 0.6.1 release because
this scheme satisfies the packaging requirements of the major
distributions.  Making the change now means we shouldn't need
to change it again.

To accommodate this change, lustre-build-zfs.m4 has been
updated in the following ways:

* The new zfs header and object paths were added to the list
  of default search paths.  The DKMS build paths were also added
  to allow compilation against zfs-kmod or zfs-dkms packages.

* Support for building the spl and zfs code recursively as
  part of the Lustre build process was removed.

* The lustre-osd-zfs packages 'Requires' line was changed to
  require zfs-kmod.  Either the zfs-kmod or zfs-dkms packages
  can be used to satisfy this requirement.

* Fix incorrect usage of @ZFS_OBJ@ in osd-zfs/Makefile.in;
  the include directory is under @ZFS@ with the headers.
  These happened to be the same location before, so it never
  caused issues.

* EXTRA_LIBZFS_INCLUDE was renamed ZFS_LIBZFS_INCLUDE for
  consistency.

* Failing to build ldiskfs should not automatically disable
  all server support.  The zfs osd may still be buildable.

* General m4 cleanup and simplification of lustre-build-zfs.m4.

* Ensure new zfs/spl build correctly with lbuild.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Change-Id: Ib686211c4f9ace39a41053ce8a20112d1121def9
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-on: http://review.whamcloud.com/5960
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alexey Shvetsov <alexxy@gentoo.org>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-2752 build: Enhance build for cross compilation for MIC
Dmitry Eremin [Wed, 13 Mar 2013 13:40:47 +0000 (17:40 +0400)]
LU-2752 build: Enhance build for cross compilation for MIC

Enhance the Lustre build to support cross compilation for the
Intel(R) Xeon Phi(TM) card. In addition to the standard build, the
GNU cross toolchain for the Intel(R) Xeon Phi(TM) can be used to
produce client binaries for the card. To enable this, specify the
appropriate --host and --build options for ./configure.

For example, to produce Lustre client binaries for the Intel(R) Xeon
Phi(TM) card, execute the following commands:

NOTE: You should have "intel-mic-gpl-<version>.x86_64" package
installed and MIC GPL sources unpacked in /opt/intel/mic/src.

export PATH=/usr/linux-k1om-4.7/bin:$PATH

sh ./autogen.sh

./configure --with-linux=/opt/intel/mic/src/card/kernel \
    --disable-server --without-o2ib \
    --host=x86_64-k1om-linux --build=x86_64-pc-linux

make

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I4347c65f67bd836116532989c2132457f5eee934
Reviewed-on: http://review.whamcloud.com/5273
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
LU-2915 doc: update Lustre doc for LFSCK
Fan Yong [Mon, 15 Apr 2013 22:05:15 +0000 (06:05 +0800)]
LU-2915 doc: update Lustre doc for LFSCK

Add a section describing the new LFSCK.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I11f1763a1f1527f8a7c0893df0fbce2788679775
Reviewed-on: http://review.whamcloud.com/5913
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Tested-by: Hudson
Reviewed-by: Richard Henwood <richard.henwood@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Revert "LU-2139 osc: Track and limit "unstable" pages"
Oleg Drokin [Mon, 6 May 2013 20:26:06 +0000 (16:26 -0400)]
Revert "LU-2139 osc: Track and limit "unstable" pages"

This seems to be causing multiple issues: LU-3274, LU-3277

This reverts commit 5661651b2cc6414686e7da581589c2ea0e1f1969.

LU-1095 debug: quiet noisy console error messages
Andreas Dilger [Fri, 3 May 2013 23:25:47 +0000 (17:25 -0600)]
LU-1095 debug: quiet noisy console error messages

Quiet a number of overly noisy and unhelpful console error
messages.  Improve the format of other nearby errors.

In the case of {lod,lov}_fix_desc_stripe_size(), this doesn't
even need a console message unless it is actually changing
a nonzero stripe size that is below the minimum. Typically the
value is just zero and is being bumped up to the default.
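
The console-message policy can be sketched in C. MIN_STRIPE_SIZE (64 KiB) and DEFAULT_STRIPE_SIZE (1 MiB) are assumed values, replacing a too-small nonzero size with the default is also an assumption (the message does not say what it is changed to), and this fix_desc_stripe_size() is a hypothetical stand-in that returns whether a message would be printed instead of printing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 64 KiB minimum stripe size, 1 MiB default. */
#define MIN_STRIPE_SIZE		(1U << 16)
#define DEFAULT_STRIPE_SIZE	(1U << 20)

/*
 * Replace an unusable stripe size with the default. Returns true only when
 * a console message would be justified: a nonzero size below the minimum.
 * A zero size is the common case and is bumped to the default silently.
 */
static bool fix_desc_stripe_size(uint64_t *val)
{
	bool noisy;

	if (*val >= MIN_STRIPE_SIZE)
		return false;

	noisy = (*val != 0);
	*val = DEFAULT_STRIPE_SIZE;
	return noisy;
}
```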

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I0830e514c426080b8c3446c0a1a359c313b530d9
Reviewed-on: http://review.whamcloud.com/6264
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-2193 ofd: check object exists before orphan destroy
Bobi Jam [Mon, 6 May 2013 02:05:05 +0000 (10:05 +0800)]
LU-2193 ofd: check object exists before orphan destroy

Object destroys replayed by the MDS after recovery may hit
nonexistent objects; skip them rather than performing further
futile actions.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ibf233a07fa73d3226fdde5e2c020e73f51428f74
Reviewed-on: http://review.whamcloud.com/6266
Tested-by: Hudson
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
LU-3267 utils: missing setting echo seq for getattr/setattr
wang di [Fri, 3 May 2013 07:00:10 +0000 (00:00 -0700)]
LU-3267 utils: missing setting echo seq for getattr/setattr

The echo sequence must be set before doing echo getattr/setattr;
otherwise echo_client will treat the object as having the MDT0 sequence.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I1ea6cf0c9ef1edc5a81ba6f50345916deea4f95c
Reviewed-on: http://review.whamcloud.com/6263
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
LU-2742 oi: handle IGIF lookup
Fan Yong [Mon, 29 Apr 2013 04:15:27 +0000 (12:15 +0800)]
LU-2742 oi: handle IGIF lookup

For an MDT upgraded from a 1.8 device, OI scrub immobilizes the
IGIF and inserts the "IGIF <=> ino#" mapping into the OI file.
There are three cases to handle:

1) After the upgrade, all IGIFs have been processed and inserted
   into the OI files. Any IGIF object lookup should then be
   processed like a normal FID lookup, i.e. via the OI. If there
   is no entry in the OI file, the IGIF object does not exist.

2) During the upgrade, some IGIFs may already be in the OI files
   while others are not yet. In this case, looking up an IGIF in
   the OI files may return -ENOENT, but that does not mean the
   IGIF object does not exist. Since the newly immobilized IGIF
   is the same as the original one before backup/restore, the OSD
   generates the local identifier directly from the IGIF, as old
   2.x did.

3) The upgrade is paused and a backup/restore is performed. The
   upgrade from 1.8 may be only partly processed, but some clients
   may hold immobilized IGIFs and use them to access the related
   objects. In this case, the OSD does not know whether a given
   IGIF has already been processed or is still to be processed,
   and it also cannot generate the local ino#/gen# directly from
   the immobilized IGIF because of the backup/restore. So force
   the OSD to look up the given IGIF in the OI files and, if there
   is no entry, ask the client to retry after the upgrade
   completes. There is no better choice.
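
The three cases reduce to a small decision table. This is a hedged C sketch: the enum names and igif_lookup_action() are hypothetical, and the upgrade state is collapsed into a single enum as a simplification of however the OSD actually tracks it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the upgrade state seen by the OSD. */
enum upgrade_state {
	UPGRADE_DONE,		 /* case 1: all IGIFs are in the OI files */
	UPGRADE_IN_PROGRESS,	 /* case 2: partially processed, no restore */
	UPGRADE_PAUSED_RESTORED, /* case 3: paused, then backup/restore */
};

enum igif_action {
	IGIF_USE_OI_ENTRY,	/* OI lookup succeeded */
	IGIF_NOT_FOUND,		/* the object really does not exist */
	IGIF_DERIVE_INO_GEN,	/* build ino#/gen# from the IGIF, as old 2.x did */
	IGIF_RETRY_LATER,	/* ask the client to retry after the upgrade */
};

static enum igif_action igif_lookup_action(bool found_in_oi,
					   enum upgrade_state st)
{
	if (found_in_oi)
		return IGIF_USE_OI_ENTRY;

	switch (st) {
	case UPGRADE_DONE:
		return IGIF_NOT_FOUND;
	case UPGRADE_IN_PROGRESS:
		return IGIF_DERIVE_INO_GEN;
	case UPGRADE_PAUSED_RESTORED:
	default:
		return IGIF_RETRY_LATER;
	}
}
```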

Test-Parameters: testlist=sanity-scrub

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I7fff7898e2fcdf6b075bb04817f3b825c9a911b0
Reviewed-on: http://review.whamcloud.com/6254
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
LU-3188 osc: shorten IO calling path
Bobi Jam [Sun, 28 Apr 2013 09:11:40 +0000 (17:11 +0800)]
LU-3188 osc: shorten IO calling path

Use osc_io_unplug_async() in osc_queue_sync_pages() to shorten
the IO calling path and reduce the chance of stack overflow.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I06112d9fb7b069d68c7945641e65d6430d0bb380
Reviewed-on: http://review.whamcloud.com/6191
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Keith Mannthey <kemannthey@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-3245 mdd: Fixed OBF of the FS root
Henri Doreau [Mon, 29 Apr 2013 13:07:42 +0000 (15:07 +0200)]
LU-3245 mdd: Fixed OBF of the FS root

Added a check to prevent fid access to the filesystem root directory
from returning EINVAL.

Added a sanity check accordingly.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Iffb8bdb71377b878d165e9e48049c6b2ebd14859
Reviewed-on: http://review.whamcloud.com/6209
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
LU-2677 utils: update e2fsprogs to 1.42.7.wc1
Andreas Dilger [Thu, 2 May 2013 03:42:59 +0000 (21:42 -0600)]
LU-2677 utils: update e2fsprogs to 1.42.7.wc1

Update the required e2fsprogs version to 1.42.7.wc1 to handle
smaller on-disk LMA and filter_fid structure sizes.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Id5bcbe1cd1bc2c170e253dbcb0bd0beb1a46c36f
Reviewed-on: http://review.whamcloud.com/6233
Tested-by: Hudson
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-3179 fids: fix compilation error with gcc 4.7.2
Alex Zhuravlev [Tue, 16 Apr 2013 15:21:13 +0000 (19:21 +0400)]
LU-3179 fids: fix compilation error with gcc 4.7.2

Initialize oi.oi.oi_id, which gcc 4.7.2 warns may be used
uninitialized later.

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: Iaeb6eac01340d80786463efe67ff74479017f074
Reviewed-on: http://review.whamcloud.com/6064
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-2442 kernel: SLES11 performance fixes and updates
James Simmons [Sun, 28 Apr 2013 13:46:37 +0000 (09:46 -0400)]
LU-2442 kernel: SLES11 performance fixes and updates

This patch completes the SLES11 SP2 kernel support. The first problem
solved is the metadata performance slowdown caused by the heavyweight
dqptr_sem. We remove dqptr_sem (but keep it in struct quota_info to
leave the kernel ABI unchanged) and implement its functionality with
other locks:

  * i_lock protects only the i_dquot pointer itself; the contents of
    the structure it points to are protected by dq_data_lock.

  * Q_GETFMT is now protected with dqonoff_mutex instead of dqptr_sem.

The second set of changes is small block-level tunable optimizations.
Support was also added to simulate failover for testing purposes. The
bulk of the changes remove obsolete SLES10 support and sync the SLES11
SP1 kernel-side support to 2.6.32.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I474bdd8cf8293d2918273afa074e442aafa28d4c
Reviewed-on: http://review.whamcloud.com/6168
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-3208 tests: Fix typo in replay-single/70b
Nathaniel Clark [Tue, 23 Apr 2013 15:56:23 +0000 (11:56 -0400)]
LU-3208 tests: Fix typo in replay-single/70b

A missing space in an if statement caused a syntax error:

replay-single.sh: line 1912 [: missing `]`

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I2543f9b8fe41153d6109a64d2619a8443e700d9d
Reviewed-on: http://review.whamcloud.com/6131
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
LU-3172 ost: call lustre_msg_get_opc() once inside ost_handle()
Nikitas Angelinas [Mon, 15 Apr 2013 12:17:15 +0000 (13:17 +0100)]
LU-3172 ost: call lustre_msg_get_opc() once inside ost_handle()

lustre_msg_get_opc() is called a few times inside ost_handle();
there is no real benefit from this, and we can just make one call
when entering the function.

There may be other occurrences in the source where this applies,
apart from ost_handle(), but we can at least make this change here
for now, as this is a frequently-called function.
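
A minimal sketch of the same refactoring: fetch the opcode once on entry and branch on the local copy. struct msg_sketch, msg_get_opc() and the opcode values are hypothetical; the call counter exists only to show that the accessor runs once per request.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request with a counting accessor for its opcode. */
struct msg_sketch {
	uint32_t opc;
	unsigned int getter_calls;
};

/* Illustrative opcode values only. */
enum { OP_READ = 3, OP_WRITE = 4, OP_PUNCH = 10 };

static uint32_t msg_get_opc(struct msg_sketch *m)
{
	m->getter_calls++;
	return m->opc;
}

/* Read the opcode once on entry, then branch on the local copy. */
static int handle(struct msg_sketch *m)
{
	uint32_t opc = msg_get_opc(m);

	if (opc == OP_READ || opc == OP_WRITE)
		return 1;	/* bulk I/O path */

	switch (opc) {
	case OP_PUNCH:
		return 2;
	default:
		return 0;
	}
}
```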

Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Change-Id: I7a8badc30ca31cb6826463ae5390cef96dec345f
Xyratex-bug-id: MRP-698
Reviewed-by: Andrew Perepechko <andrew_perepechko@xyratex.com>
Reviewed-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-on: http://review.whamcloud.com/6055
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-3023 build: fix fuzzy logic in get_root_path()
Sebastien Buisson [Mon, 25 Mar 2013 13:52:34 +0000 (14:52 +0100)]
LU-3023 build: fix fuzzy logic in get_root_path()

Thanks to the call to llapi_is_lustre_mnt(), we are sure that
mnt.mnt_fsname contains ":/".
Also, if no other mount point is found (len == 0), there is no reason
to abort scanning, so the check and the -EINVAL return can simply be
removed.
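
For reference, a small C sketch of why ":/" is a safe thing to rely on: a Lustre client mount's mnt_fsname has the form "<MGS NID(s)>:/<fsname>", so the file system name is whatever follows the first ":/". fsname_from_mnt() is a hypothetical helper, and its NULL return is purely defensive given that the code relies on llapi_is_lustre_mnt() having been called first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the file system name part of a Lustre client mnt_fsname such as
 * "10.0.0.1@tcp:/lustre", i.e. everything after the first ":/".
 */
static const char *fsname_from_mnt(const char *mnt_fsname)
{
	const char *sep = strstr(mnt_fsname, ":/");

	/* Defensive only: callers have already verified it is a Lustre mount. */
	return sep != NULL ? sep + 2 : NULL;
}
```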

Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Change-Id: Id77c41fbf53782d657034f152f724a3bdc83408c
Reviewed-on: http://review.whamcloud.com/5832
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
LU-2826 tests: add useful text to error() calls
Emoly Liu [Tue, 2 Apr 2013 07:18:29 +0000 (15:18 +0800)]
LU-2826 tests: add useful text to error() calls

Many tests just call "error" without any arguments, but this does not
provide any information to Maloo about why the test fails.  This
in turn causes autovet to match the empty failure message to many
different and irrelevant bugs, making Maloo statistics inaccurate.

Change a number of error() calls to have some valid error text, and
print something more useful if error() is called without arguments.
Not all of the error() calls in sanity.sh and sanityn.sh have been
converted, but it is at least a good start at cleaning this up, and
catches the most frequent failure cases currently.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Liu Ying <emoly.liu@intel.com>
Change-Id: Ie148e3e26e1ecb888ac021a8df7cd995183ebbe5
Reviewed-on: http://review.whamcloud.com/5750
Tested-by: Hudson
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
LU-2950 lnet: Add a mechanism to configure routes from a file
Amir Shehata [Wed, 1 May 2013 02:01:59 +0000 (19:01 -0700)]
LU-2950 lnet: Add a mechanism to configure routes from a file

Created a bash script lustre_routes_config which takes in a file
with routes configured in the following format:
<network>: { gateway: <gateway>, [hop:<hop>], [priority: <prio>] }
The script parses the file and generates:
lctl --net <network> add_route <gateway> [hop [priority]]
for each route.
The script can also be used to unconfigure routes by running it
as follows:

   lustre_routes_config --cleanup <file>

In this case it removes all configured routes via lctl del_route.
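
A rough C sketch of the per-line transformation described above. The actual tool is a bash script, so this illustrates the mapping rather than its logic. route_to_lctl() and get_field() are hypothetical; they assume one route per line with the network name at the start of the line and, following the "[hop [priority]]" argument order, only emit a priority when a hop is also present.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the value following key (e.g. "hop:") into out, skipping leading
 * spaces and stopping at a space, comma or closing brace.
 * Returns 1 if a non-empty value was found, 0 otherwise.
 */
static int get_field(const char *line, const char *key, char *out, size_t len)
{
	const char *p = strstr(line, key);
	size_t n = 0;

	if (p == NULL)
		return 0;
	p += strlen(key);
	while (*p == ' ')
		p++;
	while (p[n] != '\0' && strchr(" ,}", p[n]) == NULL && n + 1 < len)
		n++;
	memcpy(out, p, n);
	out[n] = '\0';
	return n > 0;
}

/*
 * Turn "<network>: { gateway: <gw>, [hop:<hop>], [priority: <prio>] }"
 * into "lctl --net <network> add_route <gw> [hop [priority]]".
 * Returns 0 on success, -1 on a malformed line or a too-small buffer.
 */
static int route_to_lctl(const char *line, char *cmd, size_t len)
{
	char net[64], gw[64], hop[16], prio[16];
	const char *colon = strchr(line, ':');
	size_t netlen;
	int has_hop, has_prio, n;

	if (colon == NULL)
		return -1;
	netlen = (size_t)(colon - line);
	if (netlen == 0 || netlen >= sizeof(net))
		return -1;
	memcpy(net, line, netlen);
	net[netlen] = '\0';

	if (!get_field(colon, "gateway:", gw, sizeof(gw)))
		return -1;
	has_hop = get_field(colon, "hop:", hop, sizeof(hop));
	/* A priority is only passed after a hop count. */
	has_prio = has_hop && get_field(colon, "priority:", prio, sizeof(prio));

	if (has_prio)
		n = snprintf(cmd, len, "lctl --net %s add_route %s %s %s",
			     net, gw, hop, prio);
	else if (has_hop)
		n = snprintf(cmd, len, "lctl --net %s add_route %s %s",
			     net, gw, hop);
	else
		n = snprintf(cmd, len, "lctl --net %s add_route %s", net, gw);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```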

Also added another script, lustre_routes_conversion, which converts
the legacy route configuration syntax to the new syntax described
above. The script is run on a file containing the legacy syntax and
generates a new file with the given name:

   lustre_routes_conversion <legacy file> <new file>

Added two man pages to describe the usage of these scripts.

Added a test case in conf-sanity.sh to test the new scripts
lustre_routes_conversion and lustre_routes_config. The test case
takes a sample routes file in the old syntax, runs the
lustre_routes_conversion script to convert it to the new syntax,
then runs the lustre_routes_config script with --dry-run to ensure
that the script configures the routes.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I57f81176c2d926fedefa8ea3be34586aa1ac9d76
Reviewed-on: http://review.whamcloud.com/5757
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
LU-1199 build: Remove unused LB_LDISKFS_RELEASE macros
Christopher J. Morrone [Sat, 9 Feb 2013 01:32:39 +0000 (17:32 -0800)]
LU-1199 build: Remove unused LB_LDISKFS_RELEASE macros

The products of the LB_LDISKFS_RELEASE macros are unused.  They
also make some bad assumptions that complicate building against
a future lustre-devel package.  So we remove them.

If Lustre needs to know ldiskfs's version in the future, we
should add explicit version defines to the new ldiskfs_config.h.in
header that is in the works (or possibly landed by now).

Change-Id: I40b96dc72f5076ba9a9dda3f41b31088fdfd4341
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/5881
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
LU-2677 utils: fix swabbing of f_oid
Andreas Dilger [Thu, 25 Apr 2013 08:45:06 +0000 (02:45 -0600)]
LU-2677 utils: fix swabbing of f_oid

ll_recover_lost_found_objs uses le64_to_cpu() to swab the f_oid
field, which is a 32-bit value.  Swab it correctly using
le32_to_cpu().
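
Why the width matters can be sketched in C by simulating what le*_to_cpu() reduce to on a big-endian host, namely a byte swap. Assuming the 64-bit result ends up stored in a 32-bit variable, swapping 8 bytes moves the value into the upper half and truncation leaves 0, while a 32-bit swap recovers it. swab32(), swab64() and the two f_oid_swab_* helpers are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte swaps: what le32_to_cpu()/le64_to_cpu() do on a big-endian host. */
static uint32_t swab32(uint32_t v)
{
	v = (v << 16) | (v >> 16);
	return ((v & 0x00ff00ffU) << 8) | ((v >> 8) & 0x00ff00ffU);
}

static uint64_t swab64(uint64_t v)
{
	v = (v << 32) | (v >> 32);
	v = ((v & 0x0000ffff0000ffffULL) << 16) | ((v >> 16) & 0x0000ffff0000ffffULL);
	return ((v & 0x00ff00ff00ff00ffULL) << 8) | ((v >> 8) & 0x00ff00ff00ff00ffULL);
}

/* Buggy: 64-bit conversion of a 32-bit field, stored back into 32 bits. */
static uint32_t f_oid_swab_buggy(uint32_t raw)
{
	return (uint32_t)swab64(raw);
}

/* Fixed: the conversion width matches the 32-bit field. */
static uint32_t f_oid_swab_fixed(uint32_t raw)
{
	return swab32(raw);
}
```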

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I2679b49dc16351140dbd1170b17056489c3ebbe5
Reviewed-on: http://review.whamcloud.com/6159
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-3239 ofd: refill env in ofd_get_info
wang di [Mon, 29 Apr 2013 09:31:59 +0000 (02:31 -0700)]
LU-3239 ofd: refill env in ofd_get_info

ofd_get_info(KEY_FIEMAP) might be called from
ptlrpc_server_handle_req_in() (see the stack below),
where the env might not be initialized correctly (see the LBUG
below), so refill the env in ofd_get_info().

LustreError: 19182:0:(ofd_internal.h:518:ofd_info_init()) LBUG
Pid: 19182, comm: ll_ost_io00_001
Call Trace:
[<ffffffffa044e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa044ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0e03e62>] ofd_info_init+0x92/0x130 [ofd]
[<ffffffffa0e05835>] ofd_get_info+0x2e5/0xa90 [ofd]
[<ffffffff812805cd>] ? pointer+0x8d/0x830
[<ffffffffa029f7e5>] ? lprocfs_counter_add+0x125/0x182 [lvfs]
[<ffffffffa078528a>] nrs_orr_range_fill_physical+0x18a/0x540 [ptlrpc]
[<ffffffffa0762dd6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
[<ffffffffa073e630>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
[<ffffffffa07871d7>] nrs_orr_res_get+0x817/0xb80 [ptlrpc]
[<ffffffffa077d306>] nrs_resource_get+0x56/0x110 [ptlrpc]
[<ffffffffa077dccb>] nrs_resource_get_safe+0x8b/0x100 [ptlrpc]
[<ffffffffa0780248>] ptlrpc_nrs_req_initialize+0x38/0x90 [ptlrpc]
[<ffffffffa074cff0>] ptlrpc_main+0x1170/0x16f0 [ptlrpc]
[<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]
[<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Iee4b68fe331a895c61e6ccd0a14d6c60f5f9215c
Reviewed-on: http://review.whamcloud.com/6204
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3030 build: Update Master Copyrights pre 2.4 split
Keith Mannthey [Tue, 30 Apr 2013 21:10:07 +0000 (14:10 -0700)]
LU-3030 build: Update Master Copyrights pre 2.4 split

This is the output of the tool. Please note the tool
changed to using commit dates over author dates and
there are a few spots with erroneous 2011 copyrights
that have been removed by the tool.

Please see script update for further details.

Signed-off-by: Keith Mannthey <keith.mannthey@intel.com>
Change-Id: I6600369df53c01c425f33f62d5d6c4f8f1b48498
Reviewed-on: http://review.whamcloud.com/5841
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3030 build: Update Copyright Script
Keith Mannthey [Tue, 30 Apr 2013 21:03:24 +0000 (14:03 -0700)]
LU-3030 build: Update Copyright Script

Add previous mass copyright update and cfs wrapper changes
to the list of excluded patches.

Many files have incorrect author date set, revert to
using the git committer date for the patch date.

--follow and --author were not working properly.
Screen for Author after the git log call.  Git log
call code came via Andreas Dilger.

Signed-off-by: Keith Mannthey <keith.mannthey@intel.com>
Change-Id: I46cf74b48095b54ac58dc17b6b0f94c9cf23cb8a
Reviewed-on: http://review.whamcloud.com/6183
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
7 years agoLU-2154 osp: osd-zfs: remove 2.3.x debugging code
Andreas Dilger [Tue, 30 Apr 2013 22:23:03 +0000 (16:23 -0600)]
LU-2154 osp: osd-zfs: remove 2.3.x debugging code

When LUSTRE_VERSION_CODE is increased beyond 2.3.90 the version
based debug code generates a compile warning and fails to build.
Originally added in commit 04e1d0cb95e1ad12 and 2acb75c36511aca9.

Since we have been running with these checks for months without
problem, and it will soon be inactivated anyway, we may as well
remove the code entirely instead of just fixing it.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I1c104ef388f112c619d4ff4b7f00d17383500c1e
Reviewed-on: http://review.whamcloud.com/6219
Tested-by: Hudson
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3160 clio: don't ignore layout on writeback
Niu Yawei [Thu, 25 Apr 2013 03:05:25 +0000 (23:05 -0400)]
LU-3160 clio: don't ignore layout on writeback

In some cases, such as kernel writeback, we shouldn't ignore the
layout; otherwise, it could race with an ongoing layout change.

Test-Parameters: envdefinitions=DURATION=7200  clientdistro=el6 serverdistro=el6 clientcount=4  osscount=2 mdscount=2 austeroptions=-R failover=true  useiscsi=true testlist=recovery-random-scale
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ib9d0aa581de90711c92db4c631c52f1950ad5b67
Reviewed-on: http://review.whamcloud.com/6154
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3244 utils: tunefs.lustre should preserve virgin label
Alex Zhuravlev [Tue, 30 Apr 2013 19:05:21 +0000 (23:05 +0400)]
LU-3244 utils: tunefs.lustre should preserve virgin label

so that the target can register properly with the MGS if
tunefs.lustre is run right after mkfs.

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: I6245e5e4d10cd0a13a4e9068a9b758da8580b537
Reviewed-on: http://review.whamcloud.com/6216
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-1606 api: compile and build lustreapi test
Richard Henwood [Fri, 26 Apr 2013 15:39:09 +0000 (10:39 -0500)]
LU-1606 api: compile and build lustreapi test

To avoid regressions in the client API, conf-sanity includes a new
test 73 that verifies lustreapi can be compiled and linked against.
All the files with the .c extension in the directory
$LUSTRE_TESTS_API_DIR are compiled and linked. The directory is
located in the build tree at ./lustre/tests/clientapi.

Signed-off-by: Richard Henwood <richard.henwood@intel.com>
Change-Id: I0ad6a1671bf7033ec2ad5bc7a82d82e468cd31c2
Reviewed-on: http://review.whamcloud.com/3440
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1199 build: Increase ldiskfs version to 4.1.0
Christopher J. Morrone [Wed, 3 Apr 2013 22:39:20 +0000 (15:39 -0700)]
LU-1199 build: Increase ldiskfs version to 4.1.0

The ldiskfs version number has not changed in quite some time.

This brings the versioning in line with what LLNL has been using
externally, and allows us to realign on the same code.

Add an explicit requirement for "lustre-ldiskfs >= 4.1.0"
to the lustre spec file at the same time.

Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Change-Id: I72c9f52093942d881ee02dad8f65c2f04dbef35e
Reviewed-on: http://review.whamcloud.com/5938
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
7 years agoLU-2038 osd: Rename do_*punch to dbo_*punch
Girish Shilamkar [Sun, 31 Mar 2013 07:22:14 +0000 (12:52 +0530)]
LU-2038 osd: Rename do_*punch to dbo_*punch

Minor fix where dt_body_operations::do_punch and
dt_body_operations::do_declare_punch were renamed to
dt_body_operations::dbo_punch and
dt_body_operations::dbo_declare_punch, respectively, to keep the
field names consistent.

Signed-off-by: Girish Shilamkar <gshilamkar@ddn.com>
Change-Id: Id807805bd69bb20460898552830a658d8a25c238
Reviewed-on: http://review.whamcloud.com/5895
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3203 mdd: check result of dt_trans_create() using IS_ERR().
John L. Hammond [Mon, 22 Apr 2013 19:18:38 +0000 (14:18 -0500)]
LU-3203 mdd: check result of dt_trans_create() using IS_ERR().

In mdd_convert_linkea() and orphan_object_destroy() check the result
of dt_trans_create() using IS_ERR().  In mdd_close() avoid passing an
error pointer to mdd_trans_stop().  In local_oid_storage_init() avoid
a spurious mutex unlock after failure in dt_trans_create().  Trivially
simplify cleanup in mdd_convert_remove_dots() and mdd_convert_lma().
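
A userspace sketch of the error-pointer convention these checks rely on. `demo_err_ptr()`, `demo_is_err()` and `demo_ptr_err()` are hypothetical re-implementations of the kernel's `ERR_PTR()`/`IS_ERR()`/`PTR_ERR()`, and `demo_trans_create()` only mimics the contract assumed here for `dt_trans_create()`: it returns a handle or an error pointer, never NULL, so a NULL check lets failures through.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* The top DEMO_MAX_ERRNO addresses encode a negative errno, not an object. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_thandle {
	int started;
};

/* Hypothetical creator: a valid handle or an error pointer, never NULL. */
static struct demo_thandle *demo_trans_create(int fail)
{
	struct demo_thandle *th;

	if (fail)
		return demo_err_ptr(-ENOMEM);
	th = calloc(1, sizeof(*th));
	return th != NULL ? th : demo_err_ptr(-ENOMEM);
}

static int demo_do_update(int fail)
{
	struct demo_thandle *th = demo_trans_create(fail);

	/* "if (th == NULL)" would miss this; IS_ERR() catches it. */
	if (demo_is_err(th))
		return (int)demo_ptr_err(th);

	th->started = 1;
	/* ... declare and apply updates, then stop the transaction ... */
	free(th);
	return 0;
}
```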

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I4a5c53d3efa9c69c1428a8e5a875a531c9206d12
Reviewed-on: http://review.whamcloud.com/6117
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3106 ofd: create additional symlinks to osd
Alex Zhuravlev [Thu, 28 Mar 2013 17:54:47 +0000 (21:54 +0400)]
LU-3106 ofd: create additional symlinks to osd

to preserve compatibility with obdfilter stats:
read_cache_enable, readcache_max_filesize and
writethrough_cache_enable are accessible via
/proc/.../obdfilter/.. again.

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: Iee5315a58c6434f4633692a80f57e68890f5c415
Reviewed-on: http://review.whamcloud.com/5873
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3086 build: fix 'uninitialized variables' errors
Sebastien Buisson [Tue, 2 Apr 2013 12:14:18 +0000 (14:14 +0200)]
LU-3086 build: fix 'uninitialized variables' errors

Fix 'uninitialized variables' defects found by Coverity version 6.5.1:
Uninitialized scalar variable (UNINIT)
Using uninitialized value.

Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Change-Id: I8df7c60823e392b37880ca5ddb1ea107554784e6
Reviewed-on: http://review.whamcloud.com/5916
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1330 proc: remove osd proc handlers from lprocfs_status.c
John L. Hammond [Thu, 14 Mar 2013 18:35:36 +0000 (13:35 -0500)]
LU-1330 proc: remove osd proc handlers from lprocfs_status.c

Move lprocfs_osd_rd_{blksize,{files,kbytes}{free,avail}} to dt_object.c
and rename them to lprocfs_dt_rd_blksize, etc.  Remove the unused
function lprocfs_obd_rd_mntdev().  Do not include
lustre_{fsfilt,log,disk}.h or dt_object.h in lprocfs_status.c.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ic54973f8bc6ec1d440fa76aadf0dd13629ce345c
Reviewed-on: http://review.whamcloud.com/5721
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-1199 build: Improve git hook link installation
Christopher J. Morrone [Sat, 9 Feb 2013 01:40:49 +0000 (17:40 -0800)]
LU-1199 build: Improve git hook link installation

Check for .git/hooks directory before trying anything else.  Removes
one barrier to compiling lustre without .git being present.

Change-Id: I5fdf8fae9a4958099a40772d927fa59e42446a5b
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/5879
Tested-by: Hudson
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-2989 build: some make targets are broken
Dmitry Eremin [Tue, 26 Mar 2013 18:03:47 +0000 (22:03 +0400)]
LU-2989 build: some make targets are broken

Add make targets "distclean" and "maintainer-clean" in
ldiskfs/ldiskfs/Makefile. The global "make distclean" now
executes without an error.

Also add a lustre/target/.gitignore file so that git does not
report the untracked files there.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ic8841d11960d5aacd8293ecfcaf8c628dd743c45
Reviewed-on: http://review.whamcloud.com/5766
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alexey Shvetsov <alexxy@gentoo.org>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-1199 lbuild: Fix error handling
Christopher J. Morrone [Thu, 21 Feb 2013 00:26:11 +0000 (16:26 -0800)]
LU-1199 lbuild: Fix error handling

Improper grouping of expressions in the error() function meant
that even a simple "lbuild -h" would trigger a backtrace and an email to
someone at whamcloud.com.  That is fixed here.

Also stop sending email to a static email address on errors.

Also fix a minor typo.

Change-Id: I5a3f6131baaadc9dba23414e9e959f72fbda2679
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/5961
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian J. Murrell <brian.murrell@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2877 Add a roundtrip delay in sanity test 34h
Oleg Drokin [Fri, 29 Mar 2013 03:48:36 +0000 (23:48 -0400)]
LU-2877 Add a roundtrip delay in sanity test 34h

It seems we are still getting false failures in test 34h due to an
overloaded server that takes longer than 2 seconds to process
a request.
To even things out, do another sync RPC to the OST first before
starting the timeout.

Change-Id: I070345233398d7a15a105162523ef6dc81c1a929
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/5882
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
7 years agoLU-3242 obdclass: skip zero cookie in class_handle_hash()
John L. Hammond [Mon, 29 Apr 2013 14:33:54 +0000 (09:33 -0500)]
LU-3242 obdclass: skip zero cookie in class_handle_hash()

In class_handle_hash() really skip the zero cookie when it comes
around.
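
A minimal sketch of the wrap-around rule, assuming cookies are generated by adding a fixed increment to a 64-bit base and that 0 must never be handed out. `demo_next_cookie()` is hypothetical and omits the locking and hash insertion done in class_handle_hash().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cookie generator: returns the next non-zero cookie. */
static uint64_t demo_next_cookie(uint64_t *base, uint64_t incr)
{
	*base += incr;		/* unsigned wrap-around is well defined */
	if (*base == 0)		/* wrapped onto the reserved value: skip it */
		*base += incr;
	return *base;
}
```

Because the check happens before the value is returned, the caller can never observe the zero cookie, even on the increment that wraps.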

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I042d7a43b70feb9710a7b3a27e732fc314ec1cf5
Reviewed-on: http://review.whamcloud.com/6198
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3182 lmv: fix duplicate directory entries
Ned Bass [Wed, 17 Apr 2013 00:58:02 +0000 (17:58 -0700)]
LU-3182 lmv: fix duplicate directory entries

lmv_readpage() fails to overwrite the ldp_hash_end and ldp_flags of
the first lu_dirpage in a CFS_PAGE with the values from the last
lu_dirpage. This causes duplicate directory entries to be returned
from readdir() for sufficiently large directories.  This is only
an issue on platforms where CFS_PAGE_SIZE > LU_PAGE_SIZE, e.g. PPC.

* Fix the regression introduced in commit 5e91e5b, which was the
  apparently accidental removal of these lines from lmv_readpage():

-                        hash_end = dp->ldp_hash_end;
-                        flags = dp->ldp_flags;

* Refactor the lmv_readpage() function and add some comments.
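
A sketch of the header fix-up restored by this patch, assuming a simplified `demo_dirpage` struct in place of `struct lu_dirpage` (only the fields involved are kept). With 64 KiB pages and 4 KiB LU pages, for instance, 16 sub-pages share one page, and the first header has to describe all of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the header fields of struct lu_dirpage. */
struct demo_dirpage {
	uint64_t hash_start;
	uint64_t hash_end;
	uint32_t flags;
};

/*
 * n sub-pages of LU_PAGE_SIZE share one larger page. The first header
 * must report the hash_end and flags of the last sub-page; otherwise the
 * caller resumes from the first sub-page's hash_end and sees entries again.
 */
static void demo_merge_dirpages(struct demo_dirpage *dp, size_t n)
{
	assert(n > 0);
	dp[0].hash_end = dp[n - 1].hash_end;
	dp[0].flags = dp[n - 1].flags;
}
```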

Signed-off-by: Ned Bass <bass6@llnl.gov>
Change-Id: I04e9a98b90216a7da7ce9d9325080d6b6c4010c7
Reviewed-on: http://review.whamcloud.com/6071
Tested-by: Hudson
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-718 mds: fix mds-max-threads configure parameter
Andreas Dilger [Thu, 14 Mar 2013 19:03:38 +0000 (12:03 -0700)]
LU-718 mds: fix mds-max-threads configure parameter

Commit 648b69c2c9 renamed MDT_MAX_THREADS to MDS_MAX_THREADS, but
didn't update the configuration parameter for setting this value
at build time.  Fortunately, the configure parameter itself has
the correct name to begin with, so no build-visible changes are needed.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I21dc22e470e7e3788ebef8009643b5c7a98bc220
Reviewed-on: http://review.whamcloud.com/5723
Tested-by: Hudson
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2838 build: Syntax issues in sources
Dmitry Eremin [Thu, 14 Mar 2013 11:47:21 +0000 (15:47 +0400)]
LU-2838 build: Syntax issues in sources

Just fix syntax issues in source code.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I822b943f31785e18ca09125466a9b21d7f7ce558
Reviewed-on: http://review.whamcloud.com/5476
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
7 years agoLU-2856 mdt: Workaround for filenames w/zero hash
Daniel Kobras [Wed, 10 Apr 2013 12:25:12 +0000 (14:25 +0200)]
LU-2856 mdt: Workaround for filenames w/zero hash

Zero is a valid return value of full_name_hash(), but
consumers of mlh_pdo_hash don't handle this case correctly.
Work around problems by consistently mapping zero hashes to
a different value.
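
A sketch of the workaround. FNV-1a stands in for full_name_hash() here, since any hash can produce 0. Mapping 0 to 1 is an arbitrary illustrative choice and not necessarily the value the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in name hash (32-bit FNV-1a); 0 is a possible result. */
static uint32_t demo_name_hash(const char *name, size_t len)
{
	uint32_t h = 2166136261U;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= 16777619U;
	}
	return h;
}

/* Consumers of the hash mishandle 0, so map it consistently to a fixed
 * non-zero value (1 here, chosen only for illustration). */
static uint32_t demo_pdo_hash(uint32_t h)
{
	return h != 0 ? h : 1;
}
```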

Signed-off-by: Daniel Kobras <d.kobras@science-computing.de>
Change-Id: I8b2622cb9b04d9f005f8e414ad35102155820c7d
Reviewed-on: http://review.whamcloud.com/6166
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
7 years agoLU-2139 osc: Track and limit "unstable" pages
Prakash Surya [Tue, 2 Apr 2013 20:36:39 +0000 (13:36 -0700)]
LU-2139 osc: Track and limit "unstable" pages

This change adds a global counter to track the number of "unstable"
pages held by a given client, along with per file system counters. An
"unstable" page is defined as a page which has been sent to the server
as part of a bulk request, but is uncommitted to stable storage.

In addition to simply tracking the unstable pages, they now also count
towards the maximum number of "pinned" pages on the system at any given
time. Thus, a client will now be bounded in the number of dirty and
unstable pages it can pin in memory. Previously only dirty pages were
accounted for in this limit.
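
A minimal sketch of the accounting rule described above, using C11 atomics. The counter and function names are hypothetical. The per-filesystem counters and the NR_UNSTABLE_NFS update are omitted, and the dirty-to-unstable transition is simplified to a single step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical client-wide page counters (static, so they start at 0). */
static atomic_long demo_dirty_pages;
static atomic_long demo_unstable_pages;

static void demo_page_dirtied(void)
{
	atomic_fetch_add(&demo_dirty_pages, 1);
}

/* Simplification: a page stops being dirty and becomes unstable once it
 * has been sent to the OST in a bulk write... */
static void demo_page_sent(void)
{
	atomic_fetch_sub(&demo_dirty_pages, 1);
	atomic_fetch_add(&demo_unstable_pages, 1);
}

/* ...and is released only when the server reports it committed. */
static void demo_page_committed(void)
{
	atomic_fetch_sub(&demo_unstable_pages, 1);
}

/* Dirty and unstable pages both count against the pinned-page limit. */
static bool demo_may_pin_page(long max_pinned)
{
	return atomic_load(&demo_dirty_pages) +
	       atomic_load(&demo_unstable_pages) < max_pinned;
}
```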

In addition to tracking the number of unstable pages in Lustre, the
NR_UNSTABLE_NFS memory zone is also incremented and decremented for
easy monitoring using the "NFS_Unstable:" field in /proc/meminfo.
This field is also used internally by the kernel to limit the total
amount of unstable pages on the system.

The motivation for this change is twofold. First, the client must not
allow itself to disconnect from an OST while still holding unstable
pages. Otherwise, these unstable pages can get lost due to an OST
failure, and replay is not possible due to the disconnect via unmount.

Secondly, the client needs a mechanism to prevent it from allocating too
much of its available RAM to unreclaimable pages pinned by the ptlrpc
layer. If this case occurs, out of memory events can trigger as a side
effect, which we need to avoid.

The current number of unstable pages accounted for on a per file system
granularity is exported by the unstable_stats proc file, contained under
each file system's llite namespace. An example of retrieving this
information is below:

$ lctl get_param llite.*.unstable_stats

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Change-Id: Ic43d34bfa39da9ac4fe159000c7db5908467fd7b
Reviewed-on: http://review.whamcloud.com/4245
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3196 tests: a few fixes for > 10 OSTs.
wang di [Sun, 21 Apr 2013 07:00:38 +0000 (00:00 -0700)]
LU-3196 tests: a few fixes for > 10 OSTs.

1. Because the stripe index does not necessarily follow OST index
order, even when the index is specified, sanity 27C should not check
the layout by OST index order.

2. In test 33c, print the OSTXXXX name with %.4x instead of
%.4d.

3. In replay 44, the timeout is under mds, not mdt.
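
A quick C illustration of the 33c fix, using the same printf-style conversions the test relies on. OST names carry the index in hex, so index 10 has to print as OST000a; "%.4d" would give OST0010. `demo_ost_name()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an OST index as it appears in target names: at least four
 * hex digits, zero-padded by the precision. */
static void demo_ost_name(char *buf, size_t len, unsigned int idx)
{
	snprintf(buf, len, "OST%.4x", idx);
}
```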

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Id60ffece99f76cd7dd2e88a655e78e9abcbdfa9b
Reviewed-on: http://review.whamcloud.com/6110
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3133 lfsck: remove objects from OST
Emoly Liu [Thu, 28 Mar 2013 17:15:23 +0000 (01:15 +0800)]
LU-3133 lfsck: remove objects from OST

In lfsck test, remove_objects() should remove objects from OST,
just like remove_files() removes files from MDT.

Test-Parameters: testlist=lfsck
Signed-off-by: Liu Ying <emoly.liu@intel.com>
Change-Id: I826bea7c780de275c1890e920deb3c8e8e942c53
Reviewed-on: http://review.whamcloud.com/5981
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3014 utils: Fix an offset overflow in file_create()
Li Wei [Fri, 22 Mar 2013 06:41:11 +0000 (14:41 +0800)]
LU-3014 utils: Fix an offset overflow in file_create()

On an x86_64 machine, creating a 3 GB ZFS-based target using a file
VDev failed like this:

  mkfs.lustre FATAL: mkfs.lustre: Unable to truncate backing store:
  Invalid argument

The error, returned by the ftruncate() call in file_create(), was due
to the "int"-type calculation for the "off_t" argument.  The byte
number of 3 GB overflowed the "int" type and became a negative
"off_t".  This patch changes file_create() to take an "__u64" size
instead of an "int" one and adds "_FILE_OFFSET_BITS=64" to the
AM_CPPFLAGS of lustre/utils, so that file VDevs larger than 2 GB can
be created on both 32-bit and 64-bit x86 architectures.
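
A sketch of the corrected helper under stated assumptions: `demo_file_create()` is hypothetical, `uint64_t` plays the role of `__u64`, and `_FILE_OFFSET_BITS` is defined in the source rather than via AM_CPPFLAGS. For scale, 3 GB is 3 * 2^30 = 3221225472 bytes, which exceeds the 32-bit INT_MAX of 2147483647.

```c
/* Must precede every system header so off_t is 64-bit on 32-bit hosts. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* The size travels as a 64-bit byte count, never as an int, so a 3 GB
 * backing file stays positive when converted to off_t. */
static int demo_file_create(const char *path, uint64_t size)
{
	int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
	int rc;

	if (fd < 0)
		return -1;
	rc = ftruncate(fd, (off_t)size);	/* extends as a sparse file */
	close(fd);
	return rc;
}
```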

Change-Id: Id7e6bfc963b0ccba8266795ba2bf9832e9c641ba
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/5805
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-2311 osd-ldiskfs: fix link refcounting
Hongchao Zhang [Mon, 8 Apr 2013 02:59:49 +0000 (10:59 +0800)]
LU-2311 osd-ldiskfs: fix link refcounting

Fix potential problem with osd_object_ref_{add,del}() where the link
count is temporarily set to an invalid value before being corrected
later on.  Since the link count is accessed in some places without a
lock, there is a chance an invalid value is sampled.  This was fixed
in ext4 via kernel commit 909a4cf1ffe4b875c87abf38239a9bfd25167e0c.
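
A sketch of the general pattern, modeled loosely on ext4's dir_nlink handling where a directory link count of 1 means "too many to count". `DEMO_LINK_MAX` and `demo_dir_ref_add()` are hypothetical. The point is to compute the final value locally and publish it with one store, instead of storing an out-of-range count and correcting it afterwards.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_LINK_MAX 65000U	/* hypothetical on-disk link count limit */

/*
 * Racy shape (avoided here): increment first, notice the count is now
 * out of range, then reset it to 1. A lockless reader running between
 * the two stores samples an invalid link count.
 *
 * Fixed shape: compute the final value, then store it once, so readers
 * only observe valid counts. Writers are assumed to hold a lock.
 */
static void demo_dir_ref_add(atomic_uint *nlink)
{
	unsigned int n = atomic_load(nlink);

	/* For directories, 1 means "too many subdirectories to count". */
	if (n == 1 || n + 1 >= DEMO_LINK_MAX)
		n = 1;
	else
		n++;
	atomic_store(nlink, n);
}
```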

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Change-Id: I396bff24f8a23d6b7622cebfb39b8f847e500c1e
Reviewed-on: http://review.whamcloud.com/4675
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
7 years agoLU-3110 osd-ldiskfs: Dynamic LBUG in osd declares tracking
Bruno Faccini [Mon, 29 Apr 2013 10:21:29 +0000 (12:21 +0200)]
LU-3110 osd-ldiskfs: Dynamic LBUG in osd declares tracking

This patch implements a dynamic way to enable/disable osd
declaration tracking LBUGs.

Usage of the OSD_TRACK_DECLARES define has been removed, and
tracking of declares is no longer a compile-time option.

Declare-tracking LBUGs are enabled or disabled via the new
global lprocfs boolean "track_declares_assert", which is also
a module parameter.
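
A minimal sketch of replacing a compile-time option with a runtime switch. `demo_track_declares_assert` is a plain global standing in for the lprocfs entry and module parameter, and returning -1 stands in for LBUG(); neither is the actual osd-ldiskfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical runtime switch, off by default. */
static bool demo_track_declares_assert;

/*
 * Tracking is always compiled in. An undeclared update is always
 * reported, and only escalates when the switch is on.
 * Returns -1 where the real code would LBUG(), 0 otherwise.
 */
static int demo_check_declared(bool declared, const char *op)
{
	if (declared)
		return 0;
	fprintf(stderr, "undeclared %s in transaction\n", op);
	return demo_track_declares_assert ? -1 : 0;
}
```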

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I5164c51c3c3362a77d1a0c7cb7b9f63383b00403
Reviewed-on: http://review.whamcloud.com/6032
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-2831 tests: Sync MDS data also when doing sync_all_data
Nathaniel Clark [Wed, 24 Apr 2013 14:10:25 +0000 (10:10 -0400)]
LU-2831 tests: Sync MDS data also when doing sync_all_data

The quota data was differing in sanity-quota/35 before and after
restart because the MDS data had not synced prior to checking the
values.  This patch forces MDS data to sync whenever sync_all_data is
called.  This should prevent this error from recurring here and
elsewhere.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ic2e1dafefbc2d83fc281d76845504c4547dd0694
Reviewed-on: http://review.whamcloud.com/6142
Tested-by: Hudson
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2547 tests: EXCEPT recovery-small/24a for ZFS
Nathaniel Clark [Mon, 22 Apr 2013 20:46:48 +0000 (16:46 -0400)]
LU-2547 tests: EXCEPT recovery-small/24a for ZFS

EXCEPT this test for ZFS as it is failing over 50% of the time
currently.  Re-enable when root cause of failure is located.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I6c02a4db5a0d9f876cb4e8cff4e614eb3ac86dfd
Reviewed-on: http://review.whamcloud.com/6119
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Hudson
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3238 ptlrpc: expression expansion bug in macro
Liang Zhen [Sun, 28 Apr 2013 08:51:52 +0000 (16:51 +0800)]
LU-3238 ptlrpc: expression expansion bug in macro

The slab flag passed into __OBD_SLAB_ALLOC_VERBOSE is not
parenthesized in the macro body, so the macro evaluates the wrong
value after unexpected expression expansion and hits an assertion.
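
A self-contained illustration of this class of bug. The macro names and flag values are hypothetical and are not the actual __OBD_SLAB_ALLOC_VERBOSE definition.

```c
#include <assert.h>

#define DEMO_FLAG_IO   0x0010U
#define DEMO_FLAG_ZERO 0x8000U

/* Argument used unparenthesized: DEMO_STRIP_ZERO_BAD(DEMO_FLAG_ZERO |
 * DEMO_FLAG_IO) expands to "DEMO_FLAG_ZERO | DEMO_FLAG_IO & ~DEMO_FLAG_ZERO",
 * and since & binds tighter than |, the zero flag is not stripped. */
#define DEMO_STRIP_ZERO_BAD(flags) (flags & ~DEMO_FLAG_ZERO)

/* Fixed: parenthesize every use of the argument in the macro body. */
#define DEMO_STRIP_ZERO(flags) ((flags) & ~DEMO_FLAG_ZERO)
```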

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: I4f09ccf743d6bda765bab64c60801a87d9a6b9b5
Reviewed-on: http://review.whamcloud.com/6190
Tested-by: Hudson
Reviewed-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3231 fld: fix previous range copy in fld_cache_lookup()
John L. Hammond [Thu, 25 Apr 2013 18:37:44 +0000 (13:37 -0500)]
LU-3231 fld: fix previous range copy in fld_cache_lookup()

In fld_cache_lookup() when returning the previous seq range, copy the
appropriate part of the previous fld_cache_entry.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Iabc1146e5c5ac188c8d37c953ffbb645450e070d
Reviewed-on: http://review.whamcloud.com/6171
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
7 years agoLU-3135 wireshark: Add version check to Makefile
Nathaniel Clark [Wed, 10 Apr 2013 14:49:27 +0000 (10:49 -0400)]
LU-3135 wireshark: Add version check to Makefile

This will check for a minimum wireshark version, and warn if it is not
present.  While the info is in the README, this will make it clearer
why things fail to build.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Id4d5a5734befaa86e8e4fd77d4c35c25d1bbfb85
Reviewed-on: http://review.whamcloud.com/6011
Tested-by: Hudson
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3205 llite: Set layout_gen before compatibility check
Jinshan Xiong [Tue, 23 Apr 2013 23:59:56 +0000 (16:59 -0700)]
LU-3205 llite: Set layout_gen before compatibility check

We should set the IO's layout generation to the current generation of
the inode, no matter what it is. Otherwise, it will go into an infinite
loop on the layout change check.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I9f1f0d9ae0dd53dea130db46bbcbdbd29526b6eb
Reviewed-on: http://review.whamcloud.com/6137
Tested-by: Hudson
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3006 utils: mount to pass/clear UPDATE flag
Alex Zhuravlev [Tue, 23 Apr 2013 12:11:00 +0000 (16:11 +0400)]
LU-3006 utils: mount to pass/clear UPDATE flag

this flag is maintained in mountdata (or an fs attribute in the case
of ZFS) and is used to signal that MGS parameters have been changed
and this needs to be reflected in configuration profiles.

the flag is cleared before the actual mount, as the device can
be opened exclusively by the filesystem, preventing any updates.

the scope of the patch is limited to UPDATE flag, but later
we can try to use the approach to deal with WRITECONF/VIRGIN
flags as well.

test 73 added to conf-sanity to verify the case.

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: Ic0b4eb6b86798450e4dd4201611e4f5a1c54ef40
Reviewed-on: http://review.whamcloud.com/5982
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
7 years agoLU-2459 osd: add LMA incompat flag check
Bobi Jam [Mon, 8 Apr 2013 05:55:41 +0000 (13:55 +0800)]
LU-2459 osd: add LMA incompat flag check

* Add LMA incompatibility flags checking after object initialization.
* Add a sanity test case (test_17o).

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Iec83f3c87bd5ff769c3544c5897011333aa2d656
Reviewed-on: http://review.whamcloud.com/6121
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Hudson
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3142 tests: gather logs on passive server nodes
Jian Yu [Mon, 22 Apr 2013 07:26:02 +0000 (15:26 +0800)]
LU-3142 tests: gather logs on passive server nodes

This patch adds all_mdts_nodes(), all_osts_nodes(), all_server_nodes()
and all_nodes() into the test-framework.sh to get the active and
passive server nodes. By using these functions, after recovery-*-scale
tests failed in failover configuration, the logs on passive server
nodes can also be gathered.

This patch also fixes yml_entities() to add MGS entity and entities
for passive server nodes.

Test-Parameters: clientdistro=el6 serverdistro=el6 clientcount=4 \
osscount=2 mdscount=2 austeroptions=-R failover=true \
useiscsi=true testlist=recovery-mds-scale

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: Idbba6f02f4f50b7d5097a78ea6b6954d5347919a
Reviewed-on: http://review.whamcloud.com/6112
Tested-by: Hudson
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3142 tests: use echoerr() to redirect debug logs
Jian Yu [Thu, 11 Apr 2013 10:37:47 +0000 (18:37 +0800)]
LU-3142 tests: use echoerr() to redirect debug logs

This patch fixes run_dd.sh to use echoerr() for redirecting
debug logs to the log file instead of messing up the output.

Test-Parameters: clientdistro=el6 serverdistro=el6 \
clientarch=x86_64 serverarch=x86_64 clientcount=4 \
osscount=2 mdscount=2 austeroptions=-R failover=true \
useiscsi=true testlist=recovery-double-scale

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I72c02a38c22fca4e7e806af6771e135efd506478
Reviewed-on: http://review.whamcloud.com/6027
Tested-by: Hudson
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Tested-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3216 kernel: Kernel update [RHEL6.4 2.6.32-358.6.1.el6]
yangsheng [Thu, 25 Apr 2013 01:48:37 +0000 (09:48 +0800)]
LU-3216 kernel: Kernel update [RHEL6.4 2.6.32-358.6.1.el6]

Update RHEL6.4 kernel to 2.6.32-358.6.1.el6.

Signed-off-by: yang sheng <yang.sheng@intel.com>
Change-Id: I8c02c826a4e40eb58bc0df017eb0654e3c0465de
Reviewed-on: http://review.whamcloud.com/6153
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3142 mdt: update mdt_max_mdsize in mdt_lvbo_size
Hongchao Zhang [Sun, 21 Apr 2013 13:12:52 +0000 (21:12 +0800)]
LU-3142 mdt: update mdt_max_mdsize in mdt_lvbo_size

When mdt_lvbo_size() is called to get the size of the LSM, mdt_max_mdsize
can be smaller than this object's LSM size in failover mode, so the
value needs to be checked and updated.
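
A minimal sketch of the check-and-update described above. The struct and
function names are hypothetical and model only the one field the message
mentions; the real mdt_lvbo_size() does more than this.

```c
#include <assert.h>

/* Hypothetical stand-in for the MDT device: only the field named in
 * the commit message is modelled. */
struct mdt_device_sketch {
	int mdt_max_mdsize;
};

/* Before sizing the LVB for an object whose LSM is lsm_size bytes,
 * grow mdt_max_mdsize if it is smaller (as can happen after failover).
 * Returns the possibly updated maximum. */
static int max_mdsize_update_sketch(struct mdt_device_sketch *mdt,
				    int lsm_size)
{
	assert(lsm_size >= 0);
	if (mdt->mdt_max_mdsize < lsm_size)
		mdt->mdt_max_mdsize = lsm_size;
	return mdt->mdt_max_mdsize;
}
```

The maximum only ever grows, so a smaller LSM seen later leaves it alone.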

Test-Parameters: envdefinitions=RECOVERY_MDS_SCALE_EXCEPT=failover_ost \
clientdistro=el6 serverdistro=el6 clientarch=x86_64 serverarch=x86_64 \
clientcount=4 osscount=2 mdscount=2 austeroptions=-R failover=true \
useiscsi=true testlist=recovery-mds-scale

Change-Id: Id490c346ca9f34edcc963334e396678f1b41562c
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Signed-off-by: Jian Yu <jian.yu@intel.com>
Reviewed-on: http://review.whamcloud.com/6102
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1189 tests: run save_lustre_params() on facets
Jian Yu [Mon, 8 Apr 2013 08:15:21 +0000 (16:15 +0800)]
LU-1189 tests: run save_lustre_params() on facets

This patch fixes save_lustre_params() and restore_lustre_params()
to get and set Lustre parameters through do_facet() instead of
do_node(). This fixes the issue where restore_lustre_params() tries
to restore the saved parameters on the original node, which has
likely become inactive in a failover test environment.

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes \
clientcount=4 osscount=2 mdscount=2 austeroptions=-R \
failover=true useiscsi=true \
testlist=replay-vbr,replay-dual

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I5bfc4797727aaaa15fac1e34b6a7182ae8f26da8
Reviewed-on: http://review.whamcloud.com/5628
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
7 years agoLU-992 build: remove obsolete rhel5 server configurations
James Simmons [Thu, 18 Apr 2013 15:28:32 +0000 (11:28 -0400)]
LU-992 build: remove obsolete rhel5 server configurations

RHEL5 server-side support was removed for 2.4, but some technical
debt remained. This patch removes the no longer needed RHEL5 kernel
configurations and the CONFIG_LDISKFSDEV_* options that were left
over from the RHEL5 ext3 days.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I345b1b5d182602b82be018b1e18a7bdbe939f578
Reviewed-on: http://review.whamcloud.com/5930
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3202 osd: no inode::i_mutex inside osd_object_destroy
Fan Yong [Mon, 22 Apr 2013 06:06:40 +0000 (14:06 +0800)]
LU-3202 osd: no inode::i_mutex inside osd_object_destroy

Originally, inode::i_mutex was used to control the race between the
OI scrub inserting an OI mapping and osd_object_destroy() removing
the OI mapping for the same target.

But the unlink thread that calls osd_object_destroy() has already
started a transaction handle. This lock ordering differs from the
other code paths and can cause a deadlock between starting a
transaction and obtaining inode::i_mutex.

So now osd_object_destroy() no longer obtains inode::i_mutex.
Instead, the OI scrub checks whether the inode was unlinked while
it was rebuilding the OI mapping, and removes the newly inserted
OI mapping if that race happened.
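
The insert-then-recheck ordering described above, reduced to a
single-threaded sketch. The OI table, the inode type and
scrub_rebuild_mapping() are hypothetical; using nlink == 0 as the
"unlinked" signal is an assumption of the sketch, not necessarily what
the OI scrub tests.

```c
#include <assert.h>
#include <stdbool.h>

#define OI_SLOTS 16

/* Hypothetical, heavily simplified OI table: slot i maps FID i to an
 * inode number, 0 meaning "no mapping". */
struct oi_table_sketch {
	unsigned long ino[OI_SLOTS];
};

struct inode_sketch {
	unsigned long ino;
	unsigned int nlink;	/* 0 once the object has been destroyed */
};

/* OI scrub rebuilding one mapping without taking inode::i_mutex:
 * insert first, then re-check whether the object was unlinked in the
 * meantime and, if so, undo the insert.  Returns true if kept. */
static bool scrub_rebuild_mapping(struct oi_table_sketch *oi,
				  unsigned int fid,
				  const struct inode_sketch *inode)
{
	assert(fid < OI_SLOTS);
	oi->ino[fid] = inode->ino;	/* 1. insert the mapping */
	if (inode->nlink == 0) {	/* 2. raced with destroy? */
		oi->ino[fid] = 0;	/* 3. remove what we inserted */
		return false;
	}
	return true;
}
```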

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ic5c2e2b1d967e52a2b980d4f6bcaed4bdcf8368b
Reviewed-on: http://review.whamcloud.com/6124
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Cliff White <cliff.white@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3029 osd-ldiskfs: clear old FMODE_32BITHASH for readdir
Fan Yong [Mon, 22 Apr 2013 05:31:20 +0000 (13:31 +0800)]
LU-3029 osd-ldiskfs: clear old FMODE_32BITHASH for readdir

On the MDS, an RPC service thread's "osd_thread_info" is reused
without being completely reset when the thread switches from
processing one RPC to processing another.

Some old clients do not claim "OBD_CONNECT_64BITHASH" when connecting
to the MDS, so they are regarded as 32-bit clients. For a readdir RPC
from such an old client, the MDS uses a 32-bit dirhash, and the RPC
service thread's "osd_thread_info::oti_it_ea::oie_file::f_mode" is
set to "FMODE_32BITHASH", which is not dropped until restart.

If some RPC service threads (for readdir) are "FMODE_32BITHASH"
but others are not, a new client that supports 64-bit dirhash
can run into trouble when traversing a large directory, as follows:

1) The first readdir RPC is served by RPC service thread1, which
   is marked "FMODE_32BITHASH" because it previously served a
   readdir RPC from an old client. So thread1 still generates a
   32-bit dirhash (major hash only) for the new client's readdir.

2) The new client sends another readdir RPC for the same directory,
   using the 32-bit dirhash returned by thread1 as the cookie.

3) This time, the readdir RPC is served by another RPC service
   thread2, which is NOT marked "FMODE_32BITHASH". Thread2
   interprets the readdir cookie as a 64-bit dirhash, which is
   wrong, so thread2 can NOT locate the position correctly.

So we need to clear some fields in "osd_thread_info" to prevent
them from being reused when switching from one RPC to another.
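
A sketch of the fix's idea: derive the hash mode from the current client
at the start of every readdir instead of trusting state left on the
reused thread. SKETCH_FMODE_32BITHASH and both helpers are hypothetical,
and the cookie packing is a simplification (ext4's own hash-to-position
encoding differs).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; the kernel's FMODE_32BITHASH differs. */
#define SKETCH_FMODE_32BITHASH 0x1u

struct thread_info_sketch {
	unsigned int f_mode;	/* stands in for oti_it_ea.oie_file.f_mode */
};

/* Called at the start of every readdir RPC on a reused service thread:
 * clear the stale flag, then set it only for a 32-bit client. */
static void readdir_prepare(struct thread_info_sketch *info,
			    bool client_has_64bithash)
{
	info->f_mode &= ~SKETCH_FMODE_32BITHASH;
	if (!client_has_64bithash)
		info->f_mode |= SKETCH_FMODE_32BITHASH;
}

/* Simplified cookie: 32-bit clients get the major hash only. */
static uint64_t readdir_cookie(const struct thread_info_sketch *info,
			       uint32_t major, uint32_t minor)
{
	if (info->f_mode & SKETCH_FMODE_32BITHASH)
		return major;
	return ((uint64_t)major << 32) | minor;
}
```

Because the flag is recomputed per RPC, a thread that last served an old
client no longer leaks its 32-bit mode into a new client's readdir.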

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I3b9aaede0bccab900f1d198c7093b98f0fc48945
Reviewed-on: http://review.whamcloud.com/6138
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Prakash Surya <surya1@llnl.gov>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-631 osc: wait OSC to complete initial connection with OST
Hongchao Zhang [Mon, 8 Apr 2013 02:49:09 +0000 (10:49 +0800)]
LU-631 osc: wait OSC to complete initial connection with OST

In the lov_prep_*_set functions, if an OSC is not active and is
still trying its first connection to the OST, wait up to obd_timeout
for that connection to complete.
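
A sketch of the bounded wait, using a simulated clock so it runs
instantly. The OSC struct, tick() and wait_initial_connect() are
hypothetical; the polling loop only stands in for whatever wait
mechanism the real code uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical OSC state for the sketch. */
struct osc_sketch {
	bool active;			/* import is connected and usable */
	bool first_connect_pending;	/* still on its very first attempt */
	int connect_done_at;		/* simulated time the connect finishes */
};

/* Simulated clock in seconds, so the sketch needs no real sleeping. */
static int now_s;

static void tick(struct osc_sketch *osc)
{
	now_s++;
	if (osc->first_connect_pending && now_s >= osc->connect_done_at) {
		osc->first_connect_pending = false;
		osc->active = true;
	}
}

/* Give an OSC that is still on its first connection attempt up to
 * obd_timeout seconds to finish.  Returns true if the OSC is usable. */
static bool wait_initial_connect(struct osc_sketch *osc, int obd_timeout)
{
	int deadline = now_s + obd_timeout;

	while (!osc->active && osc->first_connect_pending && now_s < deadline)
		tick(osc);
	return osc->active;
}
```

An OSC that is inactive but not on its first attempt is not waited for.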

Change-Id: I424dbf81b6ceebf2cfd1cf48b0f89be40c4c3df4
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/2469
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3143 tests: pass correct facet to wait_osc_import_state()
Jian Yu [Mon, 22 Apr 2013 11:12:58 +0000 (19:12 +0800)]
LU-3143 tests: pass correct facet to wait_osc_import_state()

This patch fixes recovery-small test 29a and some other tests
to pass the correct facet to wait_osc_import_state(), so that
the tests pass in a failover configuration.

Test-Parameters: envdefinitions=SLOW=yes \
clientdistro=el6 serverdistro=el6 clientarch=x86_64 \
serverarch=x86_64 clientcount=4 osscount=2 mdscount=2 \
austeroptions=-R failover=true useiscsi=true \
testlist=recovery-small

Change-Id: I68b8d2f2528223a5140eed5724c2a4ffdd1988fa
Signed-off-by: Jian Yu <jian.yu@intel.com>
Reviewed-on: http://review.whamcloud.com/6100
Tested-by: Hudson
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3201 lmv: check result of lmv_find_target()
John L. Hammond [Mon, 22 Apr 2013 18:42:23 +0000 (13:42 -0500)]
LU-3201 lmv: check result of lmv_find_target()

In lmv_locate_mds() and lmv_iocontrol() check the result of
lmv_find_target() using IS_ERR() before dereferencing it.  In
lmv_get_target() return ERR_PTR(-ENODEV) rather than NULL when the
indicated MDS cannot be found among the members of lmv->tgts.  In
__lmv_fid_alloc() check the result of lmv_get_target() using IS_ERR().
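
The ERR_PTR()/IS_ERR() convention the fix relies on, re-created in
userspace so it can run outside the kernel. The three helpers mirror
include/linux/err.h; the target table and both lookup functions are
hypothetical stand-ins for lmv_get_target() and its callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Small negative errno values are encoded in the top 4095 bytes of
 * the address space, where no valid object can live. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical target table standing in for lmv->tgts. */
struct tgt_sketch { unsigned int idx; };

static struct tgt_sketch *get_target_sketch(struct tgt_sketch *tgts,
					    size_t count, unsigned int mds)
{
	for (size_t i = 0; i < count; i++)
		if (tgts[i].idx == mds)
			return &tgts[i];
	return ERR_PTR(-ENODEV);	/* an error pointer, not NULL */
}

/* Caller pattern: check IS_ERR() before dereferencing. */
static long locate_mds_sketch(struct tgt_sketch *tgts, size_t count,
			      unsigned int mds)
{
	struct tgt_sketch *tgt = get_target_sketch(tgts, count, mds);

	if (IS_ERR(tgt))
		return PTR_ERR(tgt);
	return tgt->idx;
}
```

Returning an error pointer instead of NULL lets every caller propagate
the precise errno with one IS_ERR() test.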

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I8cfce869e4be329fd432a7b6a88f48fbc81d69dd
Reviewed-on: http://review.whamcloud.com/6116
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: wangdi <di.wang@intel.com>
7 years agoLU-3118 lfsck: resume LFSCK from the last checkpoint
Fan Yong [Mon, 22 Apr 2013 01:58:42 +0000 (09:58 +0800)]
LU-3118 lfsck: resume LFSCK from the last checkpoint

This miscellaneous patch adds support for resuming LFSCK from the last
checkpoint: no object is skipped, and the object(s) before the
checkpoint are not scanned again.

Other fixes:
1) Simplify the LFSCK checkpoint logic.
2) Add a linkEA for .lustre itself, so LFSCK will not be misguided.
3) Set the LFSCK status to "failed" if an error is hit in the prepare phase.
4) osd_otable_it_store() should return the cursor position instead
of the pre-load position.
5) Other code cleanup.
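
A sketch of the resume rule: the checkpoint records the position of the
next unprocessed object (the cursor, not a pre-load position), so a
restarted scan neither skips nor repeats anything. The scanner struct and
scan_resume() are hypothetical; the budget parameter only models an
interruption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state: objects are positions 0..nobjs-1 in
 * iteration order. */
struct scan_sketch {
	size_t checkpoint;		/* next object NOT yet processed */
	unsigned int visits[32];	/* how often each object was scanned */
};

/* Scan from the checkpoint, processing at most budget objects.  The
 * checkpoint advances only past objects actually processed. */
static void scan_resume(struct scan_sketch *s, size_t nobjs, size_t budget)
{
	size_t pos;

	assert(nobjs <= sizeof(s->visits) / sizeof(s->visits[0]));
	for (pos = s->checkpoint; pos < nobjs && budget > 0; pos++, budget--) {
		s->visits[pos]++;
		s->checkpoint = pos + 1;
	}
}
```

If the checkpoint stored a read-ahead position instead, objects between
the cursor and that position would be skipped on resume.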

Test-Parameters: testlist=sanity-scrub,sanity-lfsck
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I3f6e988da323239ff2655ad4b13eae711e892ebe
Reviewed-on: http://review.whamcloud.com/6078
Tested-by: Hudson
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-2747 mdt: use of mdt_XXX helpers
jcl [Mon, 4 Feb 2013 18:09:51 +0000 (19:09 +0100)]
LU-2747 mdt: use of mdt_XXX helpers

Replace explicit use of some obd fields with the mdt helpers:
mdt_obd_name() mdt2obd_dev() mdt_lu_site() mdt_seq_site()

Signed-off-by: JC Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Change-Id: I1a67145478a143bf6444c9a248c5cc213dc91687
Reviewed-on: http://review.whamcloud.com/5261
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-398 ptlrpc: Add the NRS ORR and TRR policies
Nikitas Angelinas [Wed, 9 Jan 2013 02:40:21 +0000 (02:40 +0000)]
LU-398 ptlrpc: Add the NRS ORR and TRR policies

The ORR (Object-based Round Robin) policy schedules brw RPCs in
per-backend-filesystem-object groupings; RPCs in each group are
sorted according to their logical file or physical disk offsets.

The TRR (Target-based Round Robin) policy performs the same
function as ORR, but instead schedules brw RPCs in per-OST
groupings.

Both policies aim to increase read throughput in certain use
cases, mainly by minimizing costly disk seek operations (by
ordering OST_READ, and perhaps also OST_WRITE, RPCs); they may
also improve performance through better resource utilization and
by taking advantage of the locality of reference characteristics
of the I/O load.
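
A sketch of the per-group ordering only: RPCs are grouped by a key (a
backend object for ORR, an OST for TRR) and sorted by offset within each
group. The struct and comparator are hypothetical, and the round-robin
rotation between groups and the NRS framework itself are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, minimal view of a queued brw RPC. */
struct brw_rpc_sketch {
	uint64_t group_key;	/* backend object (ORR) or OST index (TRR) */
	uint64_t offset;	/* logical file or physical disk offset */
};

/* Order by group first, then by offset within the group. */
static int orr_cmp(const void *a, const void *b)
{
	const struct brw_rpc_sketch *x = a, *y = b;

	if (x->group_key != y->group_key)
		return x->group_key < y->group_key ? -1 : 1;
	if (x->offset != y->offset)
		return x->offset < y->offset ? -1 : 1;
	return 0;
}

static void orr_order(struct brw_rpc_sketch *rpcs, size_t n)
{
	qsort(rpcs, n, sizeof(*rpcs), orr_cmp);
}
```

Serving each group's RPCs in ascending offset order is what reduces disk
seeks; a scheduler would then rotate across groups.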

Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Co-authored-by: Liang Zhen <liang@whamcloud.com>
Change-Id: I1f5a367f2f4a1cf296a3b38f3e395ab28a10668e
Oracle-bug-id: b=13634
Xyratex-bug-id: MRP-73
Reviewed-on: http://review.whamcloud.com/4938
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
7 years agoRevert "LU-2459 osd: add LMA incompat flag check"
Oleg Drokin [Tue, 23 Apr 2013 17:57:16 +0000 (13:57 -0400)]
Revert "LU-2459 osd: add LMA incompat flag check"

This reverts commit 9ee6e92bcf4a142e76e27d5b8ac8b34684749002.

Unfortunately, this disruptively broke Maloo testing.

7 years agoLU-3103 obdclass: Remove EXPORT_SYMBOL on static function
Christopher J. Morrone [Wed, 3 Apr 2013 23:43:14 +0000 (16:43 -0700)]
LU-3103 obdclass: Remove EXPORT_SYMBOL on static function

In LU-2912 commit 7e915f5d7177b22bd3cc800137fb505781a2c037,
the function linkea_entry_pack() was accidentally declared static
and then also explicitly exported with EXPORT_SYMBOL. On ppc64,
gcc balks at this conflict.

linkea_entry_pack() is not declared in lustre_linkea.h, so leave
it static and remove the EXPORT_SYMBOL.

Change-Id: I60093fc3da8b82e51530ed93427e5ee8d8e6745d
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/5939
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3125 layout: allow stripeless layouts swap
Lai Siyao [Tue, 9 Apr 2013 19:16:15 +0000 (03:16 +0800)]
LU-3125 layout: allow stripeless layouts swap

* allow stripeless layouts swap
* `lfs swap_layouts` should open with O_LOV_DELAY_CREATE

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I3947005d1060ee9be189a9c2adbab61064a8e6f0
Reviewed-on: http://review.whamcloud.com/5998
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
7 years agoLU-2976 tools: ZFS upstream sonames are versioned
Alexey Shvetsov [Sat, 16 Mar 2013 18:24:57 +0000 (22:24 +0400)]
LU-2976 tools: ZFS upstream sonames are versioned

Current zfsonlinux upstream uses versioned sonames for the ZFS
libraries, while lustre/utils/mount_utils_zfs.c uses unversioned ones.

The versioned names are:
libzfs.so -> libzfs.so.1
libnvpair.so -> libnvpair.so.1

Signed-off-by: Alexey Shvetsov <alexxy@gentoo.org>
Change-Id: I871613081c117731b5ec89fc2d79349df0668f94
Reviewed-on: http://review.whamcloud.com/5742
Tested-by: Hudson
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-2799 ptlrpc: reduce verbosity of warning
Nathaniel Clark [Sat, 16 Feb 2013 15:07:22 +0000 (10:07 -0500)]
LU-2799 ptlrpc: reduce verbosity of warning

Reduce the verbosity of a warning about a large number of threads,
a setting that the user can change but should not.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I2713a0629da5c7d4d1a8a6c0d4c35cdea765e5f0
Reviewed-on: http://review.whamcloud.com/5447
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Reviewed-by: Prakash Surya <surya1@llnl.gov>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3020 obdclass: Lustre returns EINTR when SA_RESTART is set
Patrick Farrell [Wed, 27 Mar 2013 21:27:28 +0000 (16:27 -0500)]
LU-3020 obdclass: Lustre returns EINTR when SA_RESTART is set

When Lustre is in a read or write system call and receives a
SIGALRM, it currently returns EINTR at this location. This is
problematic because it prevents the system call from being restarted
if the signal handler was installed with SA_RESTART.

This patch changes this location to return ERESTARTSYS when a
signal is pending.
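
The user-visible contract at stake can be demonstrated on an ordinary
pipe, with no Lustre involved: a blocking read() interrupted by SIGALRM
is restarted when the handler was installed with SA_RESTART, and fails
with EINTR otherwise. Returning ERESTARTSYS is what lets the kernel make
that choice. The helper names are illustrative, and the demo takes about
four seconds to run.

```c
#define _XOPEN_SOURCE 700	/* for SA_RESTART under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t alarms_seen;

static void on_alarm(int sig)
{
	(void)sig;
	alarms_seen++;
}

/* Block in read() on a pipe while SIGALRM fires after 1 s; a child
 * writes 4 bytes after 2 s.  Returns read()'s byte count, or -errno. */
static ssize_t read_across_alarm(int use_restart)
{
	struct sigaction sa;
	int fds[2];
	char buf[8];
	ssize_t n;
	pid_t child;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = use_restart ? SA_RESTART : 0;
	if (sigaction(SIGALRM, &sa, NULL) != 0 || pipe(fds) != 0)
		return -errno;

	child = fork();
	if (child < 0)
		return -errno;
	if (child == 0) {
		close(fds[0]);
		sleep(2);
		if (write(fds[1], "data", 4) != 4)
			_exit(1);
		_exit(0);
	}
	close(fds[1]);

	alarm(1);
	n = read(fds[0], buf, sizeof(buf));	/* SIGALRM arrives here */
	if (n < 0)
		n = -errno;
	close(fds[0]);
	waitpid(child, NULL, 0);
	return n;
}
```

With SA_RESTART the read transparently resumes and returns the 4 bytes;
without it, the caller sees -EINTR, which is what Lustre was reporting
even when SA_RESTART was set.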

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I26e24b8e8e325c5b0bd7d5d20fa97e2180c12263
Reviewed-on: http://review.whamcloud.com/5814
Reviewed-by: Cory Spitz <spitzcor@cray.com>
Tested-by: Hudson
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-2986 mdc: Kernel Oops on ioctl LL_IOC_GET_MDTIDX
Bruno Faccini [Tue, 19 Mar 2013 23:37:02 +0000 (00:37 +0100)]
LU-2986 mdc: Kernel Oops on ioctl LL_IOC_GET_MDTIDX

ll_get_mdt_idx() calls md_getattr() with NULL as its 3rd
parameter, a (struct ptlrpc_request **). This can lead to a
SEGV if lmv is skipped, as on a single-MDS system without an
LMV, and mdc_getattr() is called directly.
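
The message does not spell out the patch; one plausible shape, shown
here as an assumption, is to give the getattr call real storage for its
reply pointer. All names below are hypothetical stand-ins, and the NULL
check in getattr_sketch() only illustrates the hazard.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ptlrpc_request. */
struct request_sketch {
	int mdt_index;
};

/* Like md_getattr(), this stores its reply in *reqp, so reqp must
 * point at real storage. */
static int getattr_sketch(int mdt_index, struct request_sketch **reqp)
{
	struct request_sketch *req;

	if (reqp == NULL)
		return -EINVAL;
	req = malloc(sizeof(*req));
	if (req == NULL)
		return -ENOMEM;
	req->mdt_index = mdt_index;
	*reqp = req;
	return 0;
}

/* Caller pattern: pass the address of a local pointer, read the reply,
 * then release it. */
static int get_mdt_idx_sketch(int mdt_index)
{
	struct request_sketch *req = NULL;
	int rc = getattr_sketch(mdt_index, &req);

	if (rc == 0) {
		rc = req->mdt_index;
		free(req);
	}
	return rc;
}
```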

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I0f9e5f6a4ae09c9e9a26b231d26b803418827c23
Reviewed-on: http://review.whamcloud.com/5769
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
7 years agoLU-3008 lnet: Update support for Cray's interconnects
James R. Shimek [Thu, 21 Mar 2013 22:41:24 +0000 (17:41 -0500)]
LU-3008 lnet: Update support for Cray's interconnects

This patch updates gnilnd to include all of Cray's
patches from the year since the initial push.

Included changes

----------------------------------------------------------------------
Subject
Reverse rdma kgnilnd fixes
Description
When an LNET_PUT is matched and parsed on the receiving side, it
can call kgnilnd_recv with mlen == 0. Previously, the reverse_rdma
code in kgnilnd did not handle this and asserted. This mod adds
handling for mlen == 0, and also handles an LNET_GET whose lnetmsg
is NULL, another case that the non-reverse_rdma path handles but
the reverse_rdma path did not.

----------------------------------------------------------------------
Subject
Gnilnd refcount changes
Description
This mod adjusts connection refcount handling so that adding and
removing references matches what was expected. This was raised
during the Whamcloud review but left undone on their end.

----------------------------------------------------------------------
Subject
kgnilnd peer_timeout enhancement for peer_health
Description
Currently kgnilnd peer_health is enabled on router nodes. When
peer_health is enabled, it sets a default timeout factor of
kgn_timeout+kgn_timeout/8. This value currently cannot be adjusted
except by adjusting kgn_timeout. This mod allows the user to
increase the value by setting the module parameter peer_timeout in
conjunction with peer_health.

When peer_timeout is set and peer_health is enabled, the timeout
passed to LNet will be the value the user specified, as long as it
is greater than the previous fudge calculation. If the user specifies
a value less than the fudge value, kgnilnd will fail to load and
print an error to the console.

Changes
1. Added module parameter peer_timeout, when peer_health is enabled
   this allows manipulation of the ni_peertimeout value passed to
   lnet.
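
The rule above, written as a function. The name is hypothetical; the
sketch assumes peer_health is enabled, treats peer_timeout == 0 as "not
set", and accepts a value equal to the fudge, which the message leaves
unspecified.

```c
#include <assert.h>
#include <errno.h>

/* Returns the timeout to hand to LNet, or -EINVAL to refuse loading. */
static int peer_timeout_sketch(int kgn_timeout, int peer_timeout)
{
	int fudge = kgn_timeout + kgn_timeout / 8;

	if (peer_timeout == 0)		/* not set: keep the default */
		return fudge;
	if (peer_timeout < fudge)	/* may only raise the timeout */
		return -EINVAL;
	return peer_timeout;
}
```

With kgn_timeout = 60 the fudge is 60 + 60/8 = 67, so 120 is accepted
and 30 is rejected.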

----------------------------------------------------------------------
Subject
kgnilnd conn double free refcount fix
Description
Currently kgnilnd has a possible race condition on service nodes
between two scheduler threads. When a connection is scheduled, another
scheduler can act upon the conn before the first has decremented its
reference.
Currently kgnilnd_conn_decref uses a separate atomic_read after it
decrefs to decide what to do next. Two threads calling
kgnilnd_conn_decref could therefore both see a value of zero, even
though one thread brought the refcount to one and the other to zero.
The same issue can occur with kgnilnd_peer_decref.

This mod introduces changes to the scheduler to prevent two decrefs
at the same time in different scheduler threads. Also it updates
kgnilnd_conn_decref to utilize the value that is returned by
atomic_dec_return instead of doing a second atomic_read to verify
the reference count.

Changes
1. Changed kgnilnd_conn_decref to use the val returned by
   atomic_sub_return instead of doing atomic_reads to get the value.
2. Changed kgnilnd_peer_decref to use the val returned by
   atomic_sub_return instead of doing atomic_reads to get the value.
3. Updated kgnilnd_schedule_conn and kgnilnd_schedule_process conn
   so that when a connection is scheduled from within a scheduler
   thread it carries the reference forward instead of removing it.
   This in addition to the kgnilnd_conn_decref change should remove
   the double free problem.
4. Changed assertions in kgnilnd_peer_addref, kgnilnd_conn_addref so
   they catch when the value is incremented up from 0 to 1.
5. Use a magic value to verify the conn is not being freed twice.
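
Items 1, 2 and 4 in a userspace sketch with C11 atomics. The struct and
functions are hypothetical; atomic_fetch_sub() returns the old value, so
the new value is old - 1, which is what the kernel's atomic_sub_return()
hands back directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct conn_sketch {
	atomic_int refcount;
	bool freed;	/* stands in for actually freeing the object */
};

/* Catch an increment from 0 to 1: reviving a dead conn is a bug. */
static void conn_addref(struct conn_sketch *conn)
{
	int old = atomic_fetch_add(&conn->refcount, 1);

	assert(old > 0);
}

/* Act on the value produced by the decrement itself, never on a second
 * read, so exactly one caller can observe the transition to 0. */
static void conn_decref(struct conn_sketch *conn)
{
	int val = atomic_fetch_sub(&conn->refcount, 1) - 1;

	assert(val >= 0);
	if (val == 0) {
		assert(!conn->freed);	/* double-free check */
		conn->freed = true;
	}
}
```

A separate atomic_load() after the decrement could let two threads both
read 0, which is exactly the double free described above.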

----------------------------------------------------------------------
Subject
Debug for mailbox corruption.
Description
We have two peers (routers) writing to the same mailbox of a compute
node.

Add more debug to identify the cause of two peers getting the same
mailbox information.
- Store both the previous nid and the previous purgatory nid for this
  mailbox.
- Store the dgram type in the conn so we can tell if the conn
  resulted from a matched wildcard or a direct connection request.
- Keep track of the total allocations of a mailbox and the current
  number of allocations.
- Add a proc file, peer_conns, containing the peer's connection
  information.
  - writing a nid value (echo 1234 > /proc/kgnilnd/peer_conns) will
    allow the read (cat /proc/kgnilnd/peer_conns) to produce a list
    of conns associated with the specified nid.

----------------------------------------------------------------------
Subject
Ignore events generated from 'xtcli set/clr_reserve'
Description
'xtcli set_reserve' and 'xtcli clr_reserve' operations overload the
ec_node_unavailable event as described in bug 785850.  Since gnilnd
uses ec_node_unavailable events, we need to ignore them when they
originate from those commands.

----------------------------------------------------------------------
Subject
Close connection upon receipt of RCA unavailable event.
Description
When a blade is powered down, messages sent to the nodes will
cause ORB timeouts, which cause a quiesce and ORB scrub. The quiesce
causes gnilnd to bump its timeouts, so we continue sending traffic,
causing more ORB timeouts.

----------------------------------------------------------------------
Subject
kgnilnd_dgram_mover thread runtime deadline
Description
Currently there is no deadline for starting outbound dgrams within
the kgnilnd_dgram_mover thread; the thread loops while the list is
not empty. When there are many network problems, the thread could
run for a very long time. This mod adds a deadline check so that the
dgram thread stops attempting to post dgrams once the deadline
passes; the thread then schedules itself and is woken up normally
later to continue its work.

Changes
1. Added deadline to kgnilnd_dgram_mover so
   kgnilnd_start_outbound_dgrams is bounded in runtime by size of
   list and by a maximum runtime deadline.
2. Added error injection to verify dgram deadline.
3. Added module parameter to adjust deadline of dgram thread.
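
A sketch of the bounded loop, with a simulated clock standing in for
jiffies and a counter standing in for the dgram list. All names are
hypothetical; kernel code would compare times with time_before() rather
than a plain <.

```c
#include <assert.h>
#include <stddef.h>

struct dgram_mover_sketch {
	size_t pending;			/* outbound dgrams waiting to be posted */
	unsigned long now;		/* simulated jiffies */
	unsigned long cost_per_post;	/* ticks consumed by one post */
};

/* Post dgrams until the list is empty or the deadline passes, whichever
 * comes first.  Returns how many were posted; the rest wait for the
 * thread's next wakeup. */
static size_t start_outbound_dgrams_sketch(struct dgram_mover_sketch *m,
					   unsigned long max_runtime)
{
	unsigned long deadline = m->now + max_runtime;
	size_t posted = 0;

	while (m->pending > 0 && m->now < deadline) {
		m->pending--;			/* "post" one dgram */
		m->now += m->cost_per_post;
		posted++;
	}
	return posted;
}
```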

----------------------------------------------------------------------
Subject
fix peer_conn_lock deadlock
Description
kgnilnd_tx_done() called with lock held.
There is an error case whereby kgnilnd_tx_done will be called by
kgnilnd_queue_tx(). This can cause a deadlock if lnet calls back
needing the write lock.

Remove the call to kgnilnd_tx_done, since the tx will be processed by
kgnilnd_process_fmaq() (like the EAGAIN case).

----------------------------------------------------------------------
Subject
Make kgnilnd_bump_timeouts aware of DONE connections
Description
Currently, when kgnilnd comes out of quiesce, all connection timeouts
are bumped so the connections don't close because of the period they
were paused. kgnilnd_bump_timeouts schedules all the connections on a
peer, including ones that are in purgatory in the GNILND_CONN_DONE
state. These connections are not supposed to go through the scheduler
once they are in the DONE state.

An LBUG can occur if, after the quiesce, the scheduler thread does not
push the newly scheduled conns through the state machine fast enough.
This can leave DONE conns on the scheduled list when a stack reset is
triggered. Stack reset then puts any scheduled conns through
kgnilnd_complete_closed_conn, which asserts when it sees a conn in the
GNILND_CONN_DONE state.

Changes
1. Add if statement so kgnilnd_bump_timeouts does not schedule DONE
   connections.
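
The single added check, sketched over an array of connections. The state
names and struct are hypothetical, reduced from the GNILND_CONN_* states;
only scheduling is skipped, since the message does not say whether DONE
conns still get their timeout refreshed.

```c
#include <assert.h>
#include <stddef.h>

enum conn_state_sketch { CONN_ESTABLISHED, CONN_CLOSING, CONN_DONE };

struct gconn_sketch {
	enum conn_state_sketch state;
	unsigned long timeout_base;	/* refreshed when timeouts are bumped */
	int scheduled;
};

/* After a quiesce, refresh every connection's timeout base, but never
 * put a DONE (purgatory) conn back through the scheduler. */
static void bump_timeouts_sketch(struct gconn_sketch *conns, size_t n,
				 unsigned long now)
{
	for (size_t i = 0; i < n; i++) {
		conns[i].timeout_base = now;
		if (conns[i].state != CONN_DONE)
			conns[i].scheduled = 1;
	}
}
```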

----------------------------------------------------------------------
Subject
Subscribe GNILND to UXACT errors
Description
Aries has a new type of error that GNILND needs to be subscribed to
for stack reset initiation. This mod adds that error type to our
callback subscription routine.

Changes
1. Add GNI_ERRMASK_UNKNOWN_TRANSACTION to mask passed into
   kgnilnd_subscribe_errors function.

----------------------------------------------------------------------
Subject
kgnilnd reverse bte rdma transactions
Description
Currently GNILND executes all of its kgni bte rdma transactions
using GNI_POST_RDMA_PUT. On Cascade systems this can cause IOMMU
thrashing on router nodes, because many computes initiate rdma to a
single service node, and performance can degrade linearly as more
and more computes attempt to write into a single service node's
memory space. To alleviate this, we change how rdmas are done and
use GNI_POST_RDMA_GET, so the service node initiates the transfer of
data to itself instead of thousands of clients all trying at once.
A run time tunable that switches to GNI_POST_RDMA_GET lets us govern
the RDMA from the receiving node.

Changes
1. Added new message types that exist side by side with current
   infrastructure so different nodes can have rdma setting tuned
   and all nodes will handle the messages.
2. Added tunables so that the REVERSE setting can be adjusted at
   run time.
3. Added support for non byte aligned data transfers so that gets
   will succeed when non byte aligned offsets and lengths are
   provided to kgnilnd.
4. Added the capability to send checksum information in the message
   being sent to the side that will be initiating the rdma.
   This works side by side with existing rdma checksum capabilities.
5. Corrected rdma nak problems when RDMA mapping fails for a specific
   type of tx.
6. Added rdma counters for when a copy must be made due to
   unaligned data; these let us see whether performance is
   hindered because a large number of vmalloc calls have to be
   made.
7. Changed the entire call tree for rdma to support the handling of
   the new message types.
8. On Aries platforms service nodes will be defaulted to
   GNILND_REVERSE_GET, compute nodes defaulted to GNILND_REVERSE_PUT.

----------------------------------------------------------------------
Subject
Generate/check checksum over the number of bytes actually transferred
Description
It is possible for PUTs to have a different length than the
length stored in lntmsg->msg_len, since LNET can adjust this
length based on its buffer size and offset.
lnet_try_match_md() sets the mlength that we use to do the
RDMA transfer.

Therefore we need to compute the checksum using tx->tx_rdma_desc.length
and verify the checksum using the length returned in
msg->gnm_u.completion.gncm_retval, which contains the actual number
of bytes transmitted.
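
A sketch of which length each side uses. Adler-32 stands in for gnilnd's
actual checksum routine, and the helper names are hypothetical; the
point is that both sides checksum the mlength actually transferred, not
lntmsg->msg_len.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adler-32, used here only as a stand-in checksum. */
static uint32_t cksum_sketch(const unsigned char *buf, size_t len)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

/* Sender: checksum over the RDMA descriptor length (the matched
 * mlength), modelled on tx->tx_rdma_desc.length. */
static uint32_t tx_cksum_sketch(const unsigned char *buf, size_t rdma_len)
{
	return cksum_sketch(buf, rdma_len);
}

/* Receiver: verify over the byte count reported in the completion
 * message, modelled on gncm_retval. */
static int rx_cksum_ok_sketch(const unsigned char *buf, size_t completed,
			      uint32_t sent)
{
	return cksum_sketch(buf, completed) == sent;
}
```

Verifying over msg_len instead of the transferred count would flag a
correct but shortened PUT as corrupt.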

----------------------------------------------------------------------
Subject
GniLND needs to filter accelerator events.
Description
Change the kgnilnd_rca thread to filter out accelerator events.
----------------------------------------------------------------------
Subject
kgnilnd BTE Delivery MODE tunable
Description
Currently kgnilnd exposes only a few options to tune kgni's rdma
bte delivery mode. This works well for Gemini systems, but on Cascade
we would like finer-grained control. This mod allows us to change the
delivery mode at run time through the exposed tunable interface,
giving us the capability to tune the delivery modes without having to
restart the system or make code changes.

Changes
1.  Added tunable bte_dlvr_mode which takes a mask/number for the
    delivery mode and uses that to set the bte delivery option for
    rdma.
2.  Removed extraneous tunables that were only single tunable
    specific.
3.  Added Gemini and Aries header options if in the future we need to
    change the defaults on Aries or Gemini.

----------------------------------------------------------------------
Subject
GniLND connection serialization, debug for compute bad message type.
Description
Introduce a semaphore for connection processing serialization within
the scheduler thread for bugs 789853 and 789855.
  - The main work of the scheduler thread is now protected by a read
    semaphore.
  - When kgnilnd_process_conns needs to do work on a connection, it
    takes a write semaphore.

----------------------------------------------------------------------
Subject
GniLND rca_thread exit fix.
Description
Prevent the kgnilnd_rca thread from exiting when it receives an error
from krca_wait_event.

----------------------------------------------------------------------
Subject
GniLND kgnilnd_recv message type unknown
Description
Add debug to print out more info in kgnilnd_recv() default case of
the gnm_type switch statement.

----------------------------------------------------------------------
Subject
fix fma_blk state when mdd is invalidated.
Description
Currently, when a VIRT_MAPPED fma_blk is unmapped, kgnilnd doesn't
change its state to IDLE. Because of this, the code that finds a free
mbox will use mboxes within the fma_blk even though its mdd has been
invalidated, causing dgram exchanges to contain bad mailboxes.

This change marks the fma_blk as having its mdd invalidated.

----------------------------------------------------------------------
Subject
gnilnd/rca integration
Description
Subscribe for the rca events ec_node_unavailable, ec_node_available
and ec_node_failed to prevent reconnect attempts to downed nodes.
We do not use the event to kill a live connection.

----------------------------------------------------------------------
Subject
kgnilnd eager_recv double free fix
Description
Currently kgnilnd_eager_recv does not verify that the connection
passed into it with an rx message is alive and valid. Normally this
is not an issue, except when connections are being closed and opened
on routers. A connection could be in the process of being destroyed
and have its refcount incremented. The next call to kgnilnd_recv
could then cause a double free.

This mod alleviates this by doing a reverse lookup of the connection
based on the information we can validate within the rx message. By
taking a read_lock on kgn_peer_conn_lock, we can look up the
connection by its nid and verify that its conn_stamp matches the one
the message expects. If we find a valid matching connection, we
increment that connection's refcount while the lock is held,
preventing it from disappearing until after the receive. Without the
lock and the reverse lookup, we could end up looking at already freed
memory.

This race showed itself as an fma_blk assertion on the router nodes:
when two destroy_conn calls occurred in parallel, sometimes one would
get past an if(fma_blk) check and then find that the fma_blk had
already been set to 0.
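
A sketch of the reverse lookup: match the conn by nid and conn_stamp,
and pin it with a reference before the lock is dropped. The struct and
function are hypothetical; the locking and atomic refcount are shown
only as comments, since the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conn record: only the lookup keys and the refcount. */
struct rconn_sketch {
	uint64_t nid;
	uint64_t conn_stamp;
	int refcount;	/* atomic in the kernel */
};

static struct rconn_sketch *rx_conn_lookup(struct rconn_sketch *conns,
					   size_t n, uint64_t nid,
					   uint64_t stamp)
{
	/* read_lock(&kgn_peer_conn_lock); */
	for (size_t i = 0; i < n; i++) {
		if (conns[i].nid == nid && conns[i].conn_stamp == stamp) {
			assert(conns[i].refcount > 0);
			conns[i].refcount++;	/* pin it past the unlock */
			/* read_unlock(&kgn_peer_conn_lock); */
			return &conns[i];
		}
	}
	/* read_unlock(&kgn_peer_conn_lock); */
	return NULL;	/* no live conn matches this rx */
}
```

A stale message whose conn_stamp no longer matches gets NULL back instead
of a pointer to a conn that may already be freed.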

----------------------------------------------------------------------
Subject
Sequence kgnilnd tx use with close of connections.
Description
Currently kgnilnd makes an incorrect assumption
that when a conn is closed and the connection is removed from
the cqid lookup table that no tx's are in use by other threads.

What can happen is that another scheduler thread is in the process
of using a tx and has called kgnilnd_tx_del_state_locked. This can
race against kgnilnd_complete_closed_conn in a different scheduler
thread as it attempts to remove all existing tx's from the conn's
tx_ref_table. kgnilnd_complete_closed_conn calls
kgnilnd_tx_del_state_locked on the connection's tx's, and since a tx
could still be in use in the first scheduler thread, an exception
can occur.

This mod marks the conn as having tx's in use while the first thread
holds a read_lock on the kgnilnd_peer_conn_lock.

Changes
1. Added to kgn_conn_t an atomic gnc_tx_in_use that is incremented
   any time kgnilnd_validate_tx_ev_id is called.
2. Added a decrement of the conn's gnc_tx_in_use after the function
   is finished using the tx.
3. Added a check in kgnilnd_process_conns that blocks entry for a
   given connection into kgnilnd_complete_closed_conn until
   gnc_tx_in_use is 0. Once the close call has removed the conn from
   the cqid hash table, only tx's that were already in use before the
   close can hold off its completion, so no livelock should be
   possible.
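
A sketch of the gnc_tx_in_use barrier using C11 atomics. The names
tx_use_begin(), tx_use_end(), can_complete_close() and the closing
flag are hypothetical. The locking that keeps new users from appearing
after the conn leaves the cqid hash is assumed to exist elsewhere and
is not shown.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of kgn_conn_t relevant to the barrier. */
struct conn {
	atomic_int tx_in_use;  /* analogous to gnc_tx_in_use */
	bool       closing;    /* set once removed from the cqid hash */
};

/* Scheduler thread: mark a tx as in use before working on it. */
void tx_use_begin(struct conn *c)
{
	atomic_fetch_add(&c->tx_in_use, 1);
}

/* Scheduler thread: drop the mark once the tx is no longer touched. */
void tx_use_end(struct conn *c)
{
	atomic_fetch_sub(&c->tx_in_use, 1);
}

/*
 * Conn-processing loop: let the close complete only when no thread still
 * holds a tx.  Returns true if completing the close is safe now, false if
 * the conn should be revisited on a later pass.
 */
bool can_complete_close(struct conn *c)
{
	return c->closing && atomic_load(&c->tx_in_use) == 0;
}
```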

----------------------------------------------------------------------
Subject
Add kgnilnd scheduler thread runtime deadline
Description
This mod makes sure that the kgnilnd scheduler threads do not sit on
the CPU longer than necessary by adding a deadline that forces a
yield once it is hit. The amount of time the scheduler will allow
itself to run without rescheduling is configurable via a module
parameter, in 1-second increments.

It was also found that the nice value of the scheduler threads was
preventing the heartbeat system from working correctly on compute
nodes with only a single scheduler thread, so we are changing the
default nice value of the threads to 0 to allow other threads to run.

Changes
1. Added sched_timeout module parameter to allow changing of
   default scheduler thread deadline.
2. Added deadline check to kgnilnd_process_conns so it does
   not spin in its while loop forever.
3. Added error injection to verify deadline is checked and
   calls to yield occur.
4. Added sched_nice module parameter to allow adjustment of
   scheduler thread priority separately from other kgnilnd
   threads.
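
A userspace sketch of the runtime deadline. It assumes a hypothetical
work callback and uses sched_yield() where the kernel thread would
reschedule; sched_timeout here is a plain variable standing in for the
module parameter of the same name, measured in seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdbool.h>
#include <time.h>

/* Stand-in for the sched_timeout module parameter (seconds). */
static unsigned int sched_timeout = 1;

static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/*
 * Keep calling process_one() until it reports no more work, yielding the
 * CPU whenever the deadline passes and then starting a fresh deadline.
 * Returns the number of times the loop yielded.
 */
int run_with_deadline(bool (*process_one)(void *), void *arg)
{
	double deadline = now_seconds() + sched_timeout;
	int yields = 0;

	while (process_one(arg)) {
		if (now_seconds() >= deadline) {
			/* A kernel thread would reschedule here; sched_yield()
			 * is the userspace analogue. */
			sched_yield();
			yields++;
			deadline = now_seconds() + sched_timeout;
		}
	}
	return yields;
}
```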

----------------------------------------------------------------------
Subject
Cleanup kgnilnd_schedule_conn races during conn close
Description
This patch reworks the previous debug patch and adds a debug framework
that addresses the shortcomings of the previous patch.

We are also removing an extraneous kgnilnd_schedule_conn call from
kgnilnd_finish_connect that was causing a large number of the
schedule-after-close occurrences.

There is still a chance that a conn can be scheduled after close, but
the current refcount framework is designed to handle the issues that
arise when that happens, which makes removing the assertion valid.

----------------------------------------------------------------------
Subject
Repost WC dgram when OOM event occurs
Description
Currently, when kgnilnd runs out of GART space while attempting to
repost a wildcard datagram, the system asserts and tips over. Instead,
we can put a mechanism in place that allows WC datagrams to be
reposted once the OOM condition resolves.

This mod removes the assertion and adds a mechanism within the dgram
mover thread to post wildcards when necessary. This allows the system
to stay up instead of crashing. When posting a dgram fails, a
D_NETERROR message is written to the console.
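
A sketch of the retry-instead-of-assert idea for wildcard datagrams.
struct dgram_mover, wc_to_post and the post_wc callback are
hypothetical; the callback stands in for whatever actually posts the
datagram and is assumed to return -ENOMEM when GART space is
exhausted.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical mover state: wildcard dgrams that still need posting. */
struct dgram_mover {
	int wc_to_post;
};

/*
 * One pass of the mover thread: try to post every outstanding wildcard.
 * post_wc() returns 0 or a negative errno.  On failure the remaining count
 * is kept so a later pass retries instead of asserting.
 * Returns the number of wildcards posted in this pass.
 */
int repost_wildcards(struct dgram_mover *m, int (*post_wc)(void *), void *arg)
{
	int posted = 0;

	while (m->wc_to_post > 0) {
		int rc = post_wc(arg);

		if (rc != 0) {
			/* Stand-in for the D_NETERROR console message. */
			fprintf(stderr, "failed to post wildcard dgram: %d\n", rc);
			break;  /* leave wc_to_post > 0; retry on a later pass */
		}
		m->wc_to_post--;
		posted++;
	}
	return posted;
}
```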

----------------------------------------------------------------------
Subject
Workaround and additional debug for scheduler assertion
Description
This mod adds debugging to help analyze the gnc_scheduled problem. It
also adds a workaround: instead of asserting,
kgnilnd_complete_closed_conn will short circuit and let
kgnilnd_process_conns handle the schedule normally when it sees that
gnc_scheduled != GNILND_CONN_PROCESS. I have also added debug to all
the calls to kgnilnd_schedule_conn so we can find the call that is
causing the assertion.
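
A sketch of the short circuit, using a hypothetical enum loosely
modeled on gnc_scheduled: when the conn is not in the processing
state, the function defers rather than asserting and leaves the conn
for the normal scheduling path. The names are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical scheduling states, loosely modeled on gnc_scheduled. */
enum conn_sched { CONN_IDLE, CONN_WANTS_SCHED, CONN_PROCESS };

struct conn {
	enum conn_sched scheduled;
	bool            close_completed;
};

/*
 * Try to finish closing a conn.  If the conn is not currently owned by the
 * processing path, defer instead of asserting; the caller leaves it for
 * the normal scheduling loop.  Returns true if the close was completed.
 */
bool complete_closed_conn(struct conn *c)
{
	if (c->scheduled != CONN_PROCESS)
		return false;   /* this case used to be an assertion */

	c->close_completed = true;
	return true;
}
```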

----------------------------------------------------------------------
Subject
Remove assertion and attempt recovery on mailbox corruption
Description
Previous mods have addressed the sequencing that could cause mailbox
corruption by fixing the state machine and adding timeouts. This mod
builds on those and turns the detection of mailbox-related issues into
a correctable error. Instead of asserting, we now close the connection
when we detect corruption and use the purgatory system to try to get
things back in order. The previous changes allow us to do this because
they prevent the close sequence corruption from spiraling out of
control.

Changes
        1. Removed the assertion in kgnilnd_check_fma_rx on seqno
           corruption and replaced it with a statement that closes the
           connection and returns -EIO. This should allow the system
           to continue without bringing the node down.
        2. Added debug so when we do detect corruption it will be
           tagged in the console. This will allow us to see how often
           the problem occurs and if it contributes to system
           problems.
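
A sketch of turning the seqno assertion into a recoverable error.
struct conn, rx_seq, close_conn() and check_fma_seq() are
hypothetical; close_conn() only sets a flag where the real code would
start the close and purgatory handling.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mailbox receive state. */
struct conn {
	uint32_t rx_seq;   /* next sequence number we expect */
	int      closing;  /* set when the conn has been asked to close */
};

/* Stand-in for closing the conn and handing it to purgatory. */
static void close_conn(struct conn *c, int error)
{
	(void)error;
	c->closing = 1;
}

/*
 * Validate the sequence number of an incoming FMA message.  Instead of
 * asserting on a mismatch, log it, close the connection and return -EIO.
 */
int check_fma_seq(struct conn *c, uint32_t msg_seq)
{
	if (msg_seq != c->rx_seq) {
		fprintf(stderr, "mailbox corruption: expected seq %u, got %u\n",
			(unsigned)c->rx_seq, (unsigned)msg_seq);
		close_conn(c, -EIO);
		return -EIO;
	}
	c->rx_seq++;
	return 0;
}
```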

----------------------------------------------------------------------
Subject
Fix race condition and sequence kgnilnd connection closing
Description
There is a race between the scheduler thread and
kgnilnd_close_conn_locked. While we take the kgn_peer_conn_lock to
close the connection, the scheduler threads don't take it when they
check the gnc_state. A scheduler thread could get all the way through
the close state machine before kgnilnd_close_conn_locked finishes,
tripping an assertion. To correct this race and improve sequencing,
we need to make sure we grab the write_lock on kgn_peer_conn_lock
whenever we change the conn's gnc_state.

Changes
        1. In kgnilnd_send_conn_close, added a write_lock when setting
           the conn's gnc_state to GNILND_CONN_CLOSED to make sure we
           sequence the close with other threads that might be
           changing the connection's state.
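
A sketch of sequencing state changes with the write lock, using a
POSIX rwlock in place of kgn_peer_conn_lock and a hypothetical
conn_state enum. Doing the check and the transition under one write
lock means no thread can advance the state while another holds the
lock partway through a close.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical subset of connection states. */
enum conn_state { CONN_ESTABLISHED, CONN_CLOSING, CONN_CLOSED, CONN_DONE };

struct conn {
	enum conn_state state;
};

/* Stand-in for kgn_peer_conn_lock. */
static pthread_rwlock_t peer_conn_lock = PTHREAD_RWLOCK_INITIALIZER;

/*
 * Move c from `from` to `to` only if it is still in `from`.  The check and
 * the update happen under the write lock, so they cannot interleave with
 * another thread holding the lock while it closes the conn.
 * Returns 1 if the transition happened, 0 otherwise.
 */
int transition_conn_state(struct conn *c, enum conn_state from,
			  enum conn_state to)
{
	int ok = 0;

	pthread_rwlock_wrlock(&peer_conn_lock);
	if (c->state == from) {
		c->state = to;
		ok = 1;
	}
	pthread_rwlock_unlock(&peer_conn_lock);
	return ok;
}

/* Readers (e.g. scheduler threads) sample the state under the read lock. */
enum conn_state read_conn_state(struct conn *c)
{
	enum conn_state s;

	pthread_rwlock_rdlock(&peer_conn_lock);
	s = c->state;
	pthread_rwlock_unlock(&peer_conn_lock);
	return s;
}
```
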
----------------------------------------------------------------------

Signed-off-by: James R. Shimek <jshimek@cray.com>
Change-Id: I5b8de3b72cdc17b32134cb2532c9ad7dc4fa621c
Reviewed-on: http://review.whamcloud.com/5815
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1750 ofd: always update last_id.
wangdi [Fri, 17 Jan 2014 10:13:54 +0000 (02:13 -0800)]
LU-1750 ofd: always update last_id.

Always update last_id during orphan cleanup, even if the orphan
might have been cleaned up by the previous request.

Change-Id: I824c97b29b5e03906e66f27e044876cf097ce534
Signed-off-by: Wang Di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/4331
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3048 llapi: make lfs getstripe less crashy, more correct
John L. Hammond [Wed, 3 Apr 2013 20:57:05 +0000 (15:57 -0500)]
LU-3048 llapi: make lfs getstripe less crashy, more correct

Use of get_param_obdvar() in get_mds_md_size() could cause the
max_easize param from the wrong filesystem to be read, possibly
preventing userspace from allocating a sufficiently large buffer to
receive the results of the IOC_MDC_GETFILESTRIPE ioctl() and causing a
buffer overrun in copy_to_user().  Add internal functions
get_param_{cli,llite,lmv,lov}(), which find the correct params for the
filesystem containing the path argument.

In common_param_init() the lum buffer used for IOC_MDC_GETFILESTRIPE
is sized based on the return of get_mds_md_size().  For file systems
with a small number of OSTs this buffer may be too small to hold a
path component.  Fix this by ensuring that the allocated buffer has
size at least PATH_MAX + 1.
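
A sketch of the sizing rule from the paragraph above. lum_buffer_size()
is a hypothetical helper, and md_size is assumed to be the value
obtained from get_mds_md_size(); the point is only that the result
never drops below PATH_MAX + 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stddef.h>

#ifndef PATH_MAX
#define PATH_MAX 4096  /* assumed fallback if <limits.h> does not define it */
#endif

/*
 * The buffer must be large enough for the layout (md_size) and also for a
 * path component, so take the larger of md_size and PATH_MAX + 1.
 */
size_t lum_buffer_size(size_t md_size)
{
	size_t min_size = (size_t)PATH_MAX + 1;

	return md_size > min_size ? md_size : min_size;
}
```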

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I9847a1414cb4306f4bce5f7c30d1d1cddfab8621
Reviewed-on: http://review.whamcloud.com/5934
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-2983 osd: osd-zfs to handle errors in IO path
Alex Zhuravlev [Tue, 19 Mar 2013 12:27:51 +0000 (16:27 +0400)]
LU-2983 osd: osd-zfs to handle errors in IO path

- handle an error returned by dmu_buf_hold_array_by_bonus():
  release already pinned buffers, return an error to the caller.
- OFD to handle an error returned by dt_bufs_get()
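
A generic sketch of the error path described in the first item: pin
buffers one at a time and, on failure, release the ones already
pinned before returning the error. struct buf, pin_fn and pin_all()
are hypothetical; this does not call dmu_buf_hold_array_by_bonus()
itself.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical buffer handle; pinned tracks whether we hold a reference. */
struct buf {
	int pinned;
};

/* Hypothetical pin operation that may fail with a negative errno. */
typedef int (*pin_fn)(struct buf *b, size_t idx, void *arg);

static void unpin(struct buf *b)
{
	b->pinned = 0;
}

/*
 * Pin n buffers.  If pinning buffer i fails, release buffers 0..i-1 and
 * return the error, so the caller never sees a half-pinned array.
 * Returns 0 on success.
 */
int pin_all(struct buf *bufs, size_t n, pin_fn pin, void *arg)
{
	for (size_t i = 0; i < n; i++) {
		int rc = pin(&bufs[i], i, arg);

		if (rc != 0) {
			while (i-- > 0)
				unpin(&bufs[i]);
			return rc;
		}
	}
	return 0;
}
```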

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: I1fe46364967dbc527d0d80f3729673c00ab7154c
Reviewed-on: http://review.whamcloud.com/5784
Tested-by: Hudson
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3044 llite: LSeek SEEK_CUR incorrect after O_APPEND write
Patrick Farrell [Fri, 29 Mar 2013 17:41:34 +0000 (12:41 -0500)]
LU-3044 llite: LSeek SEEK_CUR incorrect after O_APPEND write

When a file is opened with O_APPEND set and a write is done,
the file offset immediately after the write is incorrect:
it is too large by approximately the length of the write.

This can be seen by calling lseek with SEEK_CUR immediately after
the write. It does not cause corruption on subsequent writes
because, with O_APPEND, the VFS resets the file offset to EOF
before each write.

This is resolved by removing the change made for BUG:17711,
which was to set crw_pos in ll_prepare_write.

That change was to pass the LASSERT(cl_page_in_io(page, io)) in
cl_io_prepare_write().  However, this assert has since been
modified to exclude the O_APPEND case, making this unnecessary.

crw_pos is also updated in cl_io_rw_advance, which is why the
offset ends up larger than expected.

Removing the extra update to crw_pos in ll_prepare_write fixes
this.
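
A small POSIX sketch for checking the symptom: after an O_APPEND write,
lseek(fd, 0, SEEK_CUR) should equal the file size.
append_offset_matches_size() is a hypothetical helper name. On a
client with the bug described above the offset would exceed the size
by roughly the write length; on a correct filesystem the function
returns 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Append len bytes to path with O_APPEND and report whether the offset seen
 * by lseek(SEEK_CUR) right after the write equals the new file size.
 * Returns 1 if they match, 0 if they differ, -1 on error.
 */
int append_offset_matches_size(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_APPEND);
	int ret = -1;
	struct stat st;

	if (fd < 0)
		return -1;

	if (write(fd, buf, len) == (ssize_t)len) {
		off_t pos = lseek(fd, 0, SEEK_CUR);

		if (pos >= 0 && fstat(fd, &st) == 0)
			ret = (pos == st.st_size);
	}
	close(fd);
	return ret;
}
```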

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I7c1ad10eefec44aae415b8cfce6b01bc9b39fc8f
Reviewed-on: http://review.whamcloud.com/5861
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3183 tests: sanity test_27f dd to the wrong file
Minh Diep [Wed, 17 Apr 2013 20:56:25 +0000 (13:56 -0700)]
LU-3183 tests: sanity test_27f dd to the wrong file

Fix a typo in the output file name passed to dd.

Signed-off-by: Minh Diep <minh.diep@intel.com>
Change-Id: I63c5627ed766b92b1446b8f8082017b0e804dbbe
Reviewed-on: http://review.whamcloud.com/6084
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-2679 grant: OFD grant as client requested upon reconnect
Lai Siyao [Fri, 1 Feb 2013 13:46:17 +0000 (21:46 +0800)]
LU-2679 grant: OFD grant as client requested upon reconnect

Part of the patch in bz20278 was lost in the OFD implementation;
add it back:
* besides recovery, grant the amount the client requested on a
  normal reconnect.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I9e06316d0bd8602663eef4ba661a4ebfebb6e1bd
Reviewed-on: http://review.whamcloud.com/5255
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>