fs/lustre-release.git
9 years ago LU-4536 ldlm: Recalculate interval in ldlm_pool_recalc() 47/12547/4
Nathaniel Clark [Tue, 4 Nov 2014 02:26:33 +0000 (21:26 -0500)]
LU-4536 ldlm: Recalculate interval in ldlm_pool_recalc()

Instead of rechecking a static value, recalculate to see if pool stats
need to be updated.
Add newline so message will print instead of warning about missing
newline.

Test-Parameters: mdsfilesystemtype=zfs mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity,sanity,sanity,sanity
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ic31cc0c1d09a85a9bd5ee04ac34c388263190df1
Reviewed-on: http://review.whamcloud.com/12547
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
9 years ago LU-5986 test: Ensure correct start for conf-sanity/84 16/13016/3
Nathaniel Clark [Wed, 10 Dec 2014 01:13:04 +0000 (20:13 -0500)]
LU-5986 test: Ensure correct start for conf-sanity/84

For review-zfs:
1) test 79 fails to reformat nodes
2) test 80 fails silently
3) test 84 chokes

Fix test 79 to reformat at the end, and fix test 80 to die when it should.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I42c635fffd89eda9ccd6f3f9d739ff8ac75afcf6
Reviewed-on: http://review.whamcloud.com/13016
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-6012 scrub: NOT miss to auto detect inconsistent OI mapping 20/13020/2
Fan Yong [Sat, 27 Sep 2014 23:10:34 +0000 (07:10 +0800)]
LU-6012 scrub: NOT miss to auto detect inconsistent OI mapping

When a full scrub is triggered automatically, its flags should
be set to SF_INCONSISTENT.

For the lookup case, we should check whether the current OI mapping
is consistent, even if the current OI scrub flag is NOT
SF_INCONSISTENT.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I99ea077ae79fcdfedd7bb16c2a664714e0ea5ea3
Reviewed-on: http://review.whamcloud.com/13020
Tested-by: Jenkins
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
9 years ago LU-1445 lod: clean up lod_fld_lookup() return codes 27/12727/2
Andreas Dilger [Fri, 14 Nov 2014 21:06:34 +0000 (14:06 -0700)]
LU-1445 lod: clean up lod_fld_lookup() return codes

Don't return "rc" when it is known that this will always be "0".
This confuses the reader into thinking that this is an error path
when it is in fact a no-op shortcut that returns success.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ie4571226b3e90c866b958cf6ab65f6077abcab07
Reviewed-on: http://review.whamcloud.com/12727
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5885 lfsck: deadlock when remove striped dir 41/12741/3
Fan Yong [Wed, 24 Sep 2014 09:30:56 +0000 (17:30 +0800)]
LU-5885 lfsck: deadlock when remove striped dir

There is a potential deadlock between removing a striped
directory and the namespace LFSCK. Consider the following
scenario:

1) The LFSCK thread obtains the master object first; at that
   time, the master object has not been destroyed yet.

2) An RPC service thread destroys the master and all its
   slave objects (shards). Because the LFSCK is referencing
   the master object, the master object is marked as dying
   in RAM. The master object in turn references all its
   slave objects, so all slave objects are marked as dying
   in RAM as well.

3) The LFSCK thread tries to find some slave object while
   still referencing the master object, and finds that the
   slave object is dying. According to the object visibility
   rules, an object with the dying flag cannot be returned to
   others. So the LFSCK thread has to wait until the dying
   object has been purged from RAM before it can allocate a new
   object (with the same FID) in RAM. Unfortunately, the LFSCK
   thread itself is referencing the master object, so the master
   object cannot be purged, and therefore the slave object cannot
   be purged either. The LFSCK thread deadlocks.

To resolve this, the LFSCK should use the non-blocking version of
lu_object_find() to locate the slave object of the striped dir,
and return failure immediately (instead of waiting) when it finds
a dying (slave) object.
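
The difference between a blocking and a non-blocking lookup can be
sketched in userspace C as follows. This is only an illustration under
stated assumptions: struct obj and obj_find_nowait() are hypothetical
names, not the real lu_object code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified cache entry; not the real struct lu_object. */
struct obj {
	unsigned long	fid;
	int		dying;	/* set once the object has been destroyed */
};

/*
 * Non-blocking lookup: return 0 and store the object in *out, -ENOENT if
 * the FID is not cached, or -EAGAIN if the cached object is dying.
 * A blocking variant would sleep in the dying case until the object is
 * purged, which can never happen while the caller pins its master object.
 */
static int obj_find_nowait(struct obj *cache, size_t n, unsigned long fid,
			   struct obj **out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (cache[i].fid != fid)
			continue;
		if (cache[i].dying)
			return -EAGAIN;	/* fail at once instead of waiting */
		*out = &cache[i];
		return 0;
	}
	return -ENOENT;
}
```

A caller that gets -EAGAIN can skip the shard and carry on, so it never
sleeps while holding the reference that keeps the dying object alive.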

This patch also controls the async pipeline depth between the
LFSCK main engine and the namespace assistant thread to avoid
potential RAM pressure.

It also includes some other code adjustments to avoid potential
data overflow that may cause weird LFSCK statistics information.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I00c601eca8ade5d2e4260c729463f7ecdba0ed53
Reviewed-on: http://review.whamcloud.com/12741
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-3353 ptlrpc: Suppress error message when imp_sec is freed 00/10200/3
Amir Shehata [Thu, 8 May 2014 17:47:56 +0000 (10:47 -0700)]
LU-3353 ptlrpc: Suppress error message when imp_sec is freed

There is a race condition on client reconnect when the import
is being destroyed.  Some outstanding client-bound requests
are still being processed when the imp_sec has already been freed.
Suppress the error message in import_sec_validate_get()
in that case.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I44bc27c804259d4e4b6564460318732113b251a9
Reviewed-on: http://review.whamcloud.com/10200
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-3573 osd-zfs: Only advance zap cursor as needed 04/12904/2
Nathaniel Clark [Wed, 5 Nov 2014 18:05:22 +0000 (13:05 -0500)]
LU-3573 osd-zfs: Only advance zap cursor as needed

Only advance the zap cursor when ozi_pos is not advanced; otherwise,
a file could occasionally get "lost" because the zap_cursor would
advance over it before the retrieve happened.  Handle '..' like '.'
when retrieving ZAP values.

Test-Parameters: mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs testlist=conf-sanity,conf-sanity,conf-sanity,conf-sanity
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Change-Id: I916573c70c8828bed6905b5eda9344b4a49b7f11
Reviewed-on: http://review.whamcloud.com/12904
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
9 years ago LU-4647 nodemap: add tests to sanity-sec for nodemap mapping 06/10406/23
Kit Westneat [Tue, 8 Jul 2014 18:57:40 +0000 (14:57 -0400)]
LU-4647 nodemap: add tests to sanity-sec for nodemap mapping

Added tests to sanity-sec.sh, as outlined in the original nodemap
spec. The tests currently only work with a single OSS node, but this
will be fixed in a future update. These tests test basic permissions
and quota handling of the nodemapper, as well as ACL mapping.

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Ieb29091e5b3110593973a5eb03680e86a769b449
Reviewed-on: http://review.whamcloud.com/10406
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5373 test: ignore command return value in sanity test_33b 92/12992/2
Bob Glossman [Mon, 8 Dec 2014 19:48:22 +0000 (11:48 -0800)]
LU-5373 test: ignore command return value in sanity test_33b

Since the test is only looking for a panic and the command
used has different returns depending on kernel version,
ignore the command return value in all cases.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I63e9d0589b1c11736c9afbb8bee8ea4e11b30a4f
Reviewed-on: http://review.whamcloud.com/12992
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
9 years ago LU-5909 kernel: kernel update RHEL6.6 [2.6.32-504.1.3.el6] 15/12815/4
Bob Glossman [Tue, 11 Nov 2014 23:16:03 +0000 (15:16 -0800)]
LU-5909 kernel: kernel update RHEL6.6 [2.6.32-504.1.3.el6]

Update RHEL6.6 kernel to 2.6.32-504.1.3.el6

Test-Parameters: clientdistro=el6.6 mdsdistro=el6.6\
  ossdistro=el6.6 mdsfilesystemtype=ldiskfs\
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I6a320f2a2806b12ee7c07645bed212792965da99
Reviewed-on: http://review.whamcloud.com/12815
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5837 llite: ll_getparent cleanup 27/12527/6
Henri Doreau [Fri, 31 Oct 2014 23:04:19 +0000 (00:04 +0100)]
LU-5837 llite: ll_getparent cleanup

Avoid unneeded allocation. Get read-only attributes from the user
getparent structure and write the modified attributes only, instead
of populating a whole structure in kernel and copying it back.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: Ifc0632870f80733194384d02d1b4962cdcd75658
Reviewed-on: http://review.whamcloud.com/12527
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5485 lnet: peer aliveness status and NI status 53/12453/4
Liang Zhen [Tue, 28 Oct 2014 10:04:51 +0000 (18:04 +0800)]
LU-5485 lnet: peer aliveness status and NI status

A couple of changes to improve aliveness detection:
- When LNet receives a message, it can determine that the peer that
  sent the message is alive.

- When LNet receives a message from a remote network, it can determine
  that the router is alive and that the NI status on the router is UP.

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: I7133987c5c8728248cce7bc0a95048b26bc6611a
Reviewed-on: http://review.whamcloud.com/12453
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5932 tests: load sunrpc module before insmod ptlrpc_gss 90/12790/2
Jian Yu [Wed, 19 Nov 2014 21:10:10 +0000 (13:10 -0800)]
LU-5932 tests: load sunrpc module before insmod ptlrpc_gss

Lustre ptlrpc_gss module depends on Linux kernel sunrpc module.
This patch fixes load_module() in test-framework.sh to load the
sunrpc module before loading ptlrpc_gss module by using insmod.

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: Idf22fa9023a2fada9038e16fbc3e8a61530266bc
Reviewed-on: http://review.whamcloud.com/12790
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-2675 llite: remove lli_lvb 49/12849/2
John L. Hammond [Mon, 24 Nov 2014 22:08:27 +0000 (16:08 -0600)]
LU-2675 llite: remove lli_lvb

In struct ll_inode_info remove the struct ost_lvb lli_lvb member and
replace it with obd_time lli_{a,m,c}time. Rename ll_merge_lvb() to
ll_merge_attr(). Remove cl_merge_lvb() and replace calls to it with
calls to ll_merge_attr().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Idaf8a89d2e4243e62a23cab949c3c129001bb9f3
Reviewed-on: http://review.whamcloud.com/12849
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5828 lnet: showing buffers problem with multiple CPTs 93/12593/3
Amir Shehata [Wed, 5 Nov 2014 22:32:59 +0000 (14:32 -0800)]
LU-5828 lnet: showing buffers problem with multiple CPTs

Overloading an iterator variable in lustre_lnet_show_routing()
caused only the first CPT information to be displayed.
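
The class of bug described above can be sketched in plain C. The two
functions below are hypothetical illustrations, not the code from
lustre_lnet_show_routing(); they only count how many (CPT, pool) pairs
a nested loop visits.

```c
#include <stddef.h>

/* Correct variant: one index per loop level. */
static size_t count_visited_fixed(size_t ncpts, size_t npools)
{
	size_t visited = 0;
	size_t cpt, pool;

	for (cpt = 0; cpt < ncpts; cpt++)
		for (pool = 0; pool < npools; pool++)
			visited++;
	return visited;
}

/*
 * Buggy variant: the inner loop reuses (clobbers) the outer index.
 * With ncpts = 4 and npools = 3 the inner loop leaves i == 3, the outer
 * increment makes it 4, and the outer loop exits after the first CPT.
 * Caution: when npools + 1 < ncpts this variant never terminates.
 */
static size_t count_visited_buggy(size_t ncpts, size_t npools)
{
	size_t visited = 0;
	size_t i;

	for (i = 0; i < ncpts; i++)
		for (i = 0; i < npools; i++)
			visited++;
	return visited;
}
```

Giving each loop level its own index is what makes every CPT show up
in the output.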

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Ic75dccc0c3537b8272d1c2687a759fbcc23052e8
Reviewed-on: http://review.whamcloud.com/12593
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5858 obdclass: eliminate NULL error return 54/12554/4
Bob Glossman [Tue, 4 Nov 2014 19:14:35 +0000 (11:14 -0800)]
LU-5858 obdclass: eliminate NULL error return

Always return an ERR_PTR() on errors, never return a NULL,
in lu_object_find_slice().  Also clean up callers who
no longer need special case handling of NULL returns.
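
A minimal userspace sketch of the convention, assuming stand-in
definitions modelled on the kernel's <linux/err.h> helpers. The
slot_find() and slot_read() functions are hypothetical and are not
lu_object_find_slice() or its callers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins modelled on the kernel's <linux/err.h> helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int table[4] = { 10, 20, 30, 40 };

/* Hypothetical lookup: returns a valid pointer or ERR_PTR(), never NULL. */
static int *slot_find(int idx)
{
	if (idx < 0 || idx >= 4)
		return ERR_PTR(-ENOENT);
	return &table[idx];
}

/* The caller needs a single IS_ERR() check and no separate NULL branch. */
static int slot_read(int idx, int *val)
{
	int *slot = slot_find(idx);

	if (IS_ERR(slot))
		return (int)PTR_ERR(slot);
	*val = *slot;
	return 0;
}
```

With one error convention, callers cannot mistake a NULL for success
or forget the second check.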

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I57ddb38abaec7caf57bb63a75dbd76e181ba72b2
Reviewed-on: http://review.whamcloud.com/12554
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5396 mdc: (and lmv, mgc, osc) make some functions static 22/12222/9
Frank Zago [Tue, 30 Sep 2014 03:10:50 +0000 (22:10 -0500)]
LU-5396 mdc: (and lmv, mgc, osc) make some functions static

Some functions and variables are only used in their C file, so reduce
their scope. This reduces the code size, and fixes sparse warnings
such as:

  warning: symbol 'proc_lnet_routes' was not declared.
      Should it be static?
  warning: symbol 'proc_lnet_routers' was not declared.
      Should it be static?

Some prototypes were removed from C files and added to the proper
header.

Signed-off-by: Frank Zago <fzago@cray.com>
Change-Id: I8dcc5224c1da75cfb5ef7afb1fdb0f72422a3ac0
Reviewed-on: http://review.whamcloud.com/12222
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5396 lov: (and ldlm) make some functions static 21/12221/10
Frank Zago [Tue, 30 Sep 2014 02:39:31 +0000 (21:39 -0500)]
LU-5396 lov: (and ldlm) make some functions static

Some functions and variables are only used in their C file, so reduce
their scope. This reduces the code size, and fixes sparse warnings
such as:

  warning: symbol 'proc_lnet_routes' was not declared.
      Should it be static?
  warning: symbol 'proc_lnet_routers' was not declared.
      Should it be static?

Some prototypes were removed from C files and added to the proper
header.

Signed-off-by: Frank Zago <fzago@cray.com>
Change-Id: I86b7ada5c768f4b875fce55745f7492faabd4617
Reviewed-on: http://review.whamcloud.com/12221
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5396 ptlrpc: make some functions static 19/12219/4
Frank Zago [Thu, 2 Oct 2014 02:05:25 +0000 (21:05 -0500)]
LU-5396 ptlrpc: make some functions static

Some functions and variables are only used in their C file, so reduce
their scope. This reduces the code size, and fixes sparse warnings
such as:

  warning: symbol 'proc_lnet_routes' was not declared.
      Should it be static?
  warning: symbol 'proc_lnet_routers' was not declared.
      Should it be static?

Some prototypes were removed from C files and added to the proper
header.

Signed-off-by: Frank Zago <fzago@cray.com>
Change-Id: Ic30c9c00be7fd161e0eb3aa2505c6d731c3d7a87
Reviewed-on: http://review.whamcloud.com/12219
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5396 llite: make some functions static 11/12211/5
Frank Zago [Mon, 8 Sep 2014 00:24:35 +0000 (19:24 -0500)]
LU-5396 llite: make some functions static

Some functions and variables are only used in their C file, so reduce
their scope. This reduces the code size, and fixes sparse warnings
such as:

  warning: symbol 'proc_lnet_routes' was not declared.
      Should it be static?
  warning: symbol 'proc_lnet_routers' was not declared.
      Should it be static?

Some prototypes were removed from C files and added to the proper
header.

Signed-off-by: Frank Zago <fzago@cray.com>
Change-Id: Id6b13d2b5ceb30de02b60ed6be24d4a496454b70
Reviewed-on: http://review.whamcloud.com/12211
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5396 libcfs: make some functions static 07/12207/7
Frank Zago [Sun, 7 Sep 2014 18:00:28 +0000 (13:00 -0500)]
LU-5396 libcfs: make some functions static

Some functions and variables are only used in their C file, so reduce
their scope. This reduces the code size, and fixes sparse warnings
such as:

  warning: symbol 'proc_lnet_routes' was not declared.
      Should it be static?
  warning: symbol 'proc_lnet_routers' was not declared.
      Should it be static?

Some prototypes were removed from C files and added to the proper
header.

Signed-off-by: Frank Zago <fzago@cray.com>
Change-Id: I5bdf94633fb94e435d32691d521ad7c1234018aa
Reviewed-on: http://review.whamcloud.com/12207
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5772 osd-zfs: irrelevant comment over __osd_xattr_load 50/12350/3
Isaac Huang [Mon, 20 Oct 2014 19:32:20 +0000 (13:32 -0600)]
LU-5772 osd-zfs: irrelevant comment over __osd_xattr_load

Moved comment over __osd_xattr_load() to __osd_xattr_get(),
where it really belongs, and converted it to Doxygen format.
Added a few other minor cleanups as well.

Signed-off-by: Isaac Huang <he.huang@intel.com>
Change-Id: If73048d046419eaa4e23cbc5acde32c09b588996
Reviewed-on: http://review.whamcloud.com/12350
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5443 libcfs: replace direct HZ access with kernel APIs 93/11993/11
Jian Yu [Tue, 18 Nov 2014 02:32:20 +0000 (18:32 -0800)]
LU-5443 libcfs: replace direct HZ access with kernel APIs

On some customers' systems, the kernel was compiled with HZ defined as
100 instead of 1000. This improves performance for HPC applications.
However, to use these systems with Lustre, customers have to rebuild
Lustre for the kernel because Lustre directly uses the defined
constant HZ.

Since kernel 2.6.21, some non-HZ-dependent timing APIs have become
non-inline functions, which can be used in Lustre code to replace
direct HZ access.

These kernel APIs include:
 jiffies_to_msecs()
 jiffies_to_usecs()
 jiffies_to_timespec()
 msecs_to_jiffies()
 usecs_to_jiffies()
 timespec_to_jiffies()

And here are some samples of the replacement:
 HZ            -> msecs_to_jiffies(MSEC_PER_SEC)
 n * HZ        -> msecs_to_jiffies(n * MSEC_PER_SEC)
 HZ / n        -> msecs_to_jiffies(MSEC_PER_SEC / n)
 n / HZ        -> jiffies_to_msecs(n) / MSEC_PER_SEC
 n / HZ * 1000 -> jiffies_to_msecs(n)

This patch replaces the direct HZ access in libcfs module.

The patch also replaces ONE_BILLION with NSEC_PER_SEC,
and ONE_MILLION with USEC_PER_SEC in linux-time.h.
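
The sample replacements above can be checked with a small userspace
model. This is only a sketch: model_msecs_to_jiffies() and
model_jiffies_to_msecs() are hypothetical simplifications of the kernel
helpers, and the tick rate is passed in as a parameter to stand for the
HZ of the running kernel.

```c
#define MSEC_PER_SEC 1000UL

/*
 * Simplified models of the kernel helpers, valid only when hz divides
 * 1000 evenly (e.g. 100, 250, 1000). "hz" stands for the tick rate of
 * the running kernel, which a module built against a different HZ
 * constant would get wrong.
 */
static unsigned long model_msecs_to_jiffies(unsigned long hz, unsigned long ms)
{
	unsigned long ms_per_tick = MSEC_PER_SEC / hz;

	return (ms + ms_per_tick - 1) / ms_per_tick;	/* round up */
}

static unsigned long model_jiffies_to_msecs(unsigned long hz, unsigned long j)
{
	return j * (MSEC_PER_SEC / hz);
}
```

For a tick rate of 100, model_msecs_to_jiffies(100, MSEC_PER_SEC)
yields 100 ticks, matching the "HZ" row, and
model_jiffies_to_msecs(100, n) / MSEC_PER_SEC matches "n / HZ".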

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I33846f378eb876cd8958ff0c397ffb56a552f256
Reviewed-on: http://review.whamcloud.com/11993
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5443 ldiskfs: replace direct HZ access with kernel APIs 79/12779/2
Jian Yu [Wed, 19 Nov 2014 02:36:32 +0000 (18:36 -0800)]
LU-5443 ldiskfs: replace direct HZ access with kernel APIs

On some customers' systems, the kernel was compiled with HZ defined as
100 instead of 1000. This improves performance for HPC applications.
However, to use these systems with Lustre, customers have to rebuild
Lustre for the kernel because Lustre directly uses the defined
constant HZ.

Since kernel 2.6.21, some non-HZ-dependent timing APIs have become
non-inline functions, which can be used in Lustre code to replace
direct HZ access.

These kernel APIs include:
 jiffies_to_msecs()
 jiffies_to_usecs()
 jiffies_to_timespec()
 msecs_to_jiffies()
 usecs_to_jiffies()
 timespec_to_jiffies()

And here are some samples of the replacement:
 HZ            -> msecs_to_jiffies(MSEC_PER_SEC)
 n * HZ        -> msecs_to_jiffies(n * MSEC_PER_SEC)
 HZ / n        -> msecs_to_jiffies(MSEC_PER_SEC / n)
 n / HZ        -> jiffies_to_msecs(n) / MSEC_PER_SEC
 n / HZ * 1000 -> jiffies_to_msecs(n)

This patch replaces the direct HZ access in ldiskfs module.

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes,MMP_EXCEPT=5 \
mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
clientcount=4 osscount=2 mdscount=2 austeroptions=-R failover=true iscsi=1 \
testlist=mmp

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: Ic111360083bd6d8973e47767cb1b291915613727
Reviewed-on: http://review.whamcloud.com/12779
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5986 test: fix conflicting conf-sanity 83 test. 84/12984/2
James Simmons [Mon, 8 Dec 2014 15:59:59 +0000 (10:59 -0500)]
LU-5986 test: fix conflicting conf-sanity 83 test.

Patches for both LU-4119 and LU-5729 introduced test
83 to conf-sanity. The simple fix is to renumber the
test from LU-4119 to test 84.

Change-Id: Idca7c97daface6768a08f7ef7cbd00b601921a1e
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/12984
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
9 years ago LU-5996 tests: check spaces and tabs in .sh files 76/12976/2
Jian Yu [Sat, 6 Dec 2014 07:16:37 +0000 (23:16 -0800)]
LU-5996 tests: check spaces and tabs in .sh files

This patch fixes checkpatch.pl script to check the following
coding style rules in .sh files:
- code indent should use tabs where possible
- no space before tabs
- no spaces at the start of a line

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I71821d2d15ca218528fcd8fb37119d0e0798027a
Reviewed-on: http://review.whamcloud.com/12976
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
9 years ago Revert "LU-4820 osd: drop memcpy in zfs osd" 90/12990/3
Andreas Dilger [Mon, 8 Dec 2014 17:37:02 +0000 (17:37 +0000)]
Revert "LU-4820 osd: drop memcpy in zfs osd"

This caused review-zfs sanity test_44 to fail in all test cases.

This reverts commit 1249edcd71e6a44f92aba1482201b30696e85d0d.

Change-Id: I972c4c68ee67443c999ce74fda6f6960b0e4b30d
Reviewed-on: http://review.whamcloud.com/12990
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5982 lfsck: not unlock the object repeatedly 43/12943/3
Fan Yong [Wed, 24 Sep 2014 04:10:29 +0000 (12:10 +0800)]
LU-5982 lfsck: not unlock the object repeatedly

The logic in lfsck_namespace_insert_orphan() was wrong: it tried to
unlock the same object twice if it failed to update the object's linkEA,
which triggered an LBUG() in the lower layer. Fix it.

In addition, the remote orphan parent object should be marked as
LOHA_EXISTS after lfsck_namespace_create_orphan_remote() is done.

Also clean up some test scripts.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I3901d5cea8afde362dca8ee25a8d2a44e9f6ffea
Reviewed-on: http://review.whamcloud.com/12943
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
9 years ago new tag 2.6.91 (tags: 2.6.91, v2_6_91, v2_6_91_0)
Oleg Drokin [Fri, 5 Dec 2014 23:38:22 +0000 (18:38 -0500)]
new tag 2.6.91

Change-Id: I9601b0cfaa9088eb4d7fda2614fbc2183dcccb52

9 years ago LU-5824 tests: auster -h gives wrong example 70/12470/2
Isaac Huang [Wed, 29 Oct 2014 05:19:47 +0000 (23:19 -0600)]
LU-5824 tests: auster -h gives wrong example

The last example given by "auster -h" should use "-i 5" rather than
"-r 5".

Signed-off-by: Isaac Huang <he.huang@intel.com>
Change-Id: Ibf1f196e5a1d4282ad9a75eda68c27c45f074a7e
Reviewed-on: http://review.whamcloud.com/12470
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
9 years ago LU-5866 build: add option to disable zfs build 76/12576/8
Wang Shilong [Mon, 10 Nov 2014 12:51:30 +0000 (20:51 +0800)]
LU-5866 build: add option to disable zfs build

Add option --disable-zfs to disable the ZFS build for Lustre.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ie7c1c5d0417979f61f0294390377eaebc36fd320
Reviewed-on: http://review.whamcloud.com/12576
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5577 obdclass: change loop indexes to unsigned 87/12387/4
Dmitry Eremin [Mon, 13 Oct 2014 17:18:21 +0000 (21:18 +0400)]
LU-5577 obdclass: change loop indexes to unsigned

Clean up warnings about comparisons between signed and unsigned.
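
A minimal sketch of why these warnings matter and how an unsigned loop
index avoids them. Both helpers are hypothetical and do not come from
obdclass.

```c
#include <stddef.h>

/* Sum "count" entries of "vals". */
static unsigned long sum_entries(const unsigned int *vals, size_t count)
{
	unsigned long sum = 0;
	size_t i;	/* same signedness as "count": nothing to warn about */

	for (i = 0; i < count; i++)
		sum += vals[i];
	return sum;
}

/*
 * The pitfall behind the warning: in "a < b" with a signed "a" and an
 * unsigned "b", C converts "a" to unsigned first. The cast below spells
 * out that implicit conversion, so -1 compares as a huge value.
 */
static int naive_less(int a, unsigned int b)
{
	return (unsigned int)a < b;
}
```

Here naive_less(-1, 1u) returns 0, which is the surprise a mixed
signed/unsigned comparison hides.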

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I2d94940251f639942142d54a561225daa8cd8a74
Reviewed-on: http://review.whamcloud.com/12387
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5451 lod: improve weird FID handling 60/11560/9
John L. Hammond [Fri, 22 Aug 2014 15:21:25 +0000 (10:21 -0500)]
LU-5451 lod: improve weird FID handling

In lod_fld_lookup() the FID in question may have come from disk or
wire. Thus if fid_is_sane() returns false then return -EIO rather than
asserting.
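
The idea can be sketched as follows. The names struct demo_fid,
demo_fid_is_sane() and demo_fld_lookup() are hypothetical, and the
sanity rule is a placeholder rather than the real fid_is_sane() logic.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified identifier; not the real struct lu_fid. */
struct demo_fid {
	uint64_t seq;
	uint32_t oid;
};

/* Stand-in sanity rule, for illustration only. */
static int demo_fid_is_sane(const struct demo_fid *fid)
{
	return fid != NULL && fid->seq != 0 && fid->oid != 0;
}

static int demo_fld_lookup(const struct demo_fid *fid, uint32_t *index)
{
	/* The FID may come from disk or wire: report an error, don't assert. */
	if (!demo_fid_is_sane(fid))
		return -EIO;
	*index = (uint32_t)(fid->seq % 4);	/* placeholder mapping */
	return 0;
}
```

An assertion is appropriate for internal invariants; data read from
disk or received from another node can be corrupt without any local bug,
so it is reported as an I/O error instead.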

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I6c7e3885a8b1aa81fcaa8891392a11e40a02fbce
Reviewed-on: http://review.whamcloud.com/11560
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5933 fiemap: set FIEMAP_EXTENT_LAST correctly 81/12781/2
Bobi Jam [Wed, 19 Nov 2014 05:47:58 +0000 (13:47 +0800)]
LU-5933 fiemap: set FIEMAP_EXTENT_LAST correctly

When we've collected as many extents as the user requested, check one
extent further to decide whether we've reached the last extent of the file.
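
A rough sketch of the look-ahead, assuming an in-memory array of
extents. struct demo_extent, DEMO_EXTENT_LAST and demo_collect() are
hypothetical stand-ins, not the Lustre fiemap code.

```c
#include <stddef.h>

#define DEMO_EXTENT_LAST 0x1u	/* stand-in for FIEMAP_EXTENT_LAST */

struct demo_extent {
	unsigned long long	start;
	unsigned int		flags;
};

/*
 * Copy up to "want" extents from "src" (which holds "total" extents) and
 * return the number copied. After the quota is filled, look one extent
 * further: the final copied extent is flagged as last only when nothing
 * follows it in the file.
 */
static size_t demo_collect(const struct demo_extent *src, size_t total,
			   struct demo_extent *dst, size_t want)
{
	size_t n = 0;

	while (n < want && n < total) {
		dst[n] = src[n];
		dst[n].flags &= ~DEMO_EXTENT_LAST;
		n++;
	}
	if (n > 0 && n == total)	/* the look-ahead found no more extents */
		dst[n - 1].flags |= DEMO_EXTENT_LAST;
	return n;
}
```

Without the extra check, a caller asking for exactly as many extents as
the file has could not tell whether more data follows.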

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ic4c4710adf98552626d87d54c893ba9fa18ef7b8
Reviewed-on: http://review.whamcloud.com/12781
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5696 ptlrpc: missing wakeup for ptlrpc_check_set 58/12158/10
Liang Zhen [Wed, 1 Oct 2014 16:47:46 +0000 (00:47 +0800)]
LU-5696 ptlrpc: missing wakeup for ptlrpc_check_set

This patch changes a few things:

- There is no guarantee that request_out_callback will happen
  before reply_in_callback. If a request gets its reply and unlinks
  the reply buffer before request_out_callback is called, then the
  thread waiting on ptlrpc_request_set will miss the wakeup event.

  This may seriously impact performance of some IO workloads or
  result in RPC timeouts.

- To make the code easier to understand, this patch changes the
  action-bits "rq_req_unlink" and "rq_reply_unlink" to the
  status-bits "rq_req_unlinked" and "rq_reply_unlinked".

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: Ie6043534af3c9b48a52da30210d327f3de83b866
Reviewed-on: http://review.whamcloud.com/12158
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5474 tests: sanity-hsm test_90 use local HSM_ARCHIVE 69/12069/11
James Nunez [Mon, 17 Nov 2014 18:29:40 +0000 (11:29 -0700)]
LU-5474 tests: sanity-hsm test_90 use local HSM_ARCHIVE

sanity-hsm test 90 suffers from frequent failures due to
slow archive speeds. If the existing archive is not
local, test 90 now uses a local disk archive to speed
the archive process.

sanity-hsm test 40 was modified to query the SINGLEAGT
node to check if the archive is a local disk for
HSM_ARCHIVE.

copytool_cleanup was modified to match copytool_setup;
remove the contents of $hsm_root and not $hsm_root itself.

Test 90 is removed from the exception list.

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I0beee30b681d4b80f23d33cb42ff5b2944fc21d1
Reviewed-on: http://review.whamcloud.com/12069
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago Revert "LU-5275 lprocfs: remove last of non seq data structs and functions." 53/12953/3
Johann Lombardi [Fri, 5 Dec 2014 09:53:39 +0000 (09:53 +0000)]
Revert "LU-5275 lprocfs: remove last of non seq data structs and functions."

This reverts commit 0ad4f8a4227ed7dd93fec99d33c6bb25056473fc.
This patch has broken the el6.6 build:
include/linux/proc_fs.h:120: note: previous declaration of 'remove_proc_subtree' was here

Change-Id: I62d9b032448d9eea5d089b69382c2ff5064c5d3d
Reviewed-on: http://review.whamcloud.com/12953
Tested-by: Jenkins
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
9 years ago LU-5581 ldlm: evict clients returning errors on ASTs 52/11752/5
Alexey Lyashkov [Sun, 2 Nov 2014 16:53:48 +0000 (11:53 -0500)]
LU-5581 ldlm: evict clients returning errors on ASTs

When a client returns an error other then EINVAL replying to blocking
ast, it is unsafe to cancel the lock on server side only, because the
client may continue its I/O assuming it still owns the lock while the
real lock may be granted already to another client.

EINVAL is the only valid error case: when the client replies to the AST
with EINVAL, cancel the lock and return ERESTART. Evict the client in
any other error case.
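
A minimal sketch of the reply classification described above. The enum
and function names are hypothetical and not the actual ldlm code, and
treating rc == 0 as a normal acknowledgement is an assumption.

```c
#include <errno.h>

/* Hypothetical outcome names, for illustration only. */
enum ast_reply_action {
	AST_REPLY_OK,           /* client acknowledged the AST */
	AST_REPLY_CANCEL_LOCK,  /* cancel the lock; the caller returns ERESTART */
	AST_REPLY_EVICT_CLIENT  /* client lock state is unknown: evict it */
};

/* Map the client's reply code to a server-side action. */
static enum ast_reply_action classify_blocking_ast_reply(int rc)
{
	if (rc == 0)
		return AST_REPLY_OK;
	if (rc == -EINVAL)
		return AST_REPLY_CANCEL_LOCK;   /* the only valid error case */
	return AST_REPLY_EVICT_CLIENT;          /* any other error */
}
```

The point of the split is that only EINVAL tells the server the client
no longer holds the lock; every other error leaves that unknown.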

Xyratex-bug-id: MRP-2041
Signed-off-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Change-Id: Ibce60ce3b2c24ba388155ac49cba8f20388893e7
Reviewed-on: http://review.whamcloud.com/11752
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-4647 nodemap: fix problem with node reclassification 75/12575/2
Kit Westneat [Wed, 5 Nov 2014 14:23:10 +0000 (09:23 -0500)]
LU-4647 nodemap: fix problem with node reclassification

nodemap_add_member can't be used to move an already hashed member
to a new nodemap, so this patch copies the needed functionality to
nm_member_reclassif_cb. This also adds a mutex lock for reclassifying
so that there is only one nodemap reclassifying at a time.
Reclassifying takes a lock on a nodemap's nm_member_hash, so a
deadlock could arise if one nodemap is trying to add a member to
another nodemap, and that second nodemap is also reclassifying and
eventually tries to add a member to the first nodemap.

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Icc93a8e6d8384afa90e45cc04f1422512974ce4a
Reviewed-on: http://review.whamcloud.com/12575
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5889 mdc: Proper accessing struct lov_user_md 83/12683/2
Yoshifumi Uemura [Wed, 12 Nov 2014 07:02:04 +0000 (16:02 +0900)]
LU-5889 mdc: Proper accessing struct lov_user_md

In mdc_setattr_pack(), access the members of struct lov_user_md in
little-endian byte order.
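
As an illustration of what little-endian access means, a portable
userspace sketch follows. The helper name is hypothetical; kernel code
would normally use the kernel's own byte-order helpers rather than
this function.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from a byte buffer.
 * Gives the same result on little- and big-endian hosts. */
static uint32_t le32_decode_sketch(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Reading the field through a plain host-order load would give a wrong
value on a big-endian client, which is the class of bug this kind of
fix avoids.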

Signed-off-by: Yoshifumi Uemura <kogexe@gmail.com>
Change-Id: I201f00f527242faa6e1a199d3e792e5cdfa48006
Reviewed-on: http://review.whamcloud.com/12683
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-4975 ofd: Fix Doxygen warnings for ofd files 65/12665/2
Doug Oucharek [Tue, 11 Nov 2014 02:35:48 +0000 (18:35 -0800)]
LU-4975 ofd: Fix Doxygen warnings for ofd files

After patches 10417 and 10586, some Doxygen warnings were created
due to incorrect syntax and missing function parameters in header.
This patch fixes those warnings.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I6859d9cbe17f52562ae1e93ea4fa0afb08b3f547
Reviewed-on: http://review.whamcloud.com/12665
Tested-by: Jenkins
Reviewed-by: Richard Henwood <richard.henwood@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
9 years agoLU-1892 osp: Fix Doxygen warnings for osp_trans.c 59/12659/2
Doug Oucharek [Mon, 10 Nov 2014 20:22:14 +0000 (12:22 -0800)]
LU-1892 osp: Fix Doxygen warnings for osp_trans.c

After patch 10361, three Doxygen warnings were created due to
incorrect syntax. This patch fixes those.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: Iaaec90f545a8f55522b5f87111dea2b544592ea2
Reviewed-on: http://review.whamcloud.com/12659
Tested-by: Jenkins
Reviewed-by: Richard Henwood <richard.henwood@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5813 lnet: Fix typo in route show command 57/12557/4
Amir Shehata [Tue, 4 Nov 2014 20:58:01 +0000 (12:58 -0800)]
LU-5813 lnet: Fix typo in route show command

Fix typo: s/vebose/verbose/

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I454dc319b670b176c96916ea6b1d44036f9f0199
Reviewed-on: http://review.whamcloud.com/12557
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5577 obdclass: lu_htable_order() return type to long 85/12385/2
Dmitry Eremin [Fri, 10 Oct 2014 19:35:16 +0000 (23:35 +0400)]
LU-5577 obdclass: lu_htable_order() return type to long

Change the return type to match its usage.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I4a2071f9ca51cc34f1fd7c73ccf7dac52a9ff0e9
Reviewed-on: http://review.whamcloud.com/12385
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5275 lprocfs: remove last of non seq data structs and functions. 98/12298/4
James Simmons [Mon, 3 Nov 2014 20:55:06 +0000 (15:55 -0500)]
LU-5275 lprocfs: remove last of non seq data structs and functions.

This patch removes the rest of the non-seq file data structs
and functions. We rename the current seq data structs and
functions to match what is in the upstream lustre client.
Some functions in newer kernels are absent in RHEL6.5 and
SLES11SP3 kernels but lustre has equivalent functions so
they have also been renamed to match what exists in newer
kernels.

Change-Id: Iec17cd214864fe7c004eae8972397be326cdfee4
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/12298
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-2675 obd: remove client_obd_lock_t 31/12231/4
John L. Hammond [Wed, 5 Nov 2014 01:10:43 +0000 (20:10 -0500)]
LU-2675 obd: remove client_obd_lock_t

Remove the definition of client_obd_lock_t and the functions
client_obd_list_{init,lock,unlock,done}(). Use spinlock_t for the
cl_{loi,lru}_list_lock members of struct client_obd and call
spin_{lock,unlock}() directly.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I3c4b9cf531b6d62c3481a40f4a1c448cf864beec
Reviewed-on: http://review.whamcloud.com/12231
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5536 target: allow FLD_READ request during recovery 99/12199/6
Mikhail Pershin [Mon, 6 Oct 2014 18:35:10 +0000 (22:35 +0400)]
LU-5536 target: allow FLD_READ request during recovery

FLD_READ opcode was introduced but not added to the
tgt_filter_recovery_request() as allowed during recovery.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I1d63a95202288e3d72b77037658e5ba0eec4103e
Reviewed-on: http://review.whamcloud.com/12199
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
9 years agoLU-5702 ldlm: suppress error message for valid case 89/12189/3
Mikhail Pershin [Mon, 6 Oct 2014 11:01:53 +0000 (15:01 +0400)]
LU-5702 ldlm: suppress error message for valid case

The LVB object may not exist, and this is a valid case.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I673933d582856a212289e228f0ccfb156c88cfb1
Reviewed-on: http://review.whamcloud.com/12189
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
9 years agoLU-5857 tests: check lctl return value in check_catastrophe() 40/12640/4
Jian Yu [Wed, 3 Dec 2014 02:26:10 +0000 (18:26 -0800)]
LU-5857 tests: check lctl return value in check_catastrophe()

This patch fixes check_catastrophe() to check the return value of
lctl command. The catastrophe value would be checked only if the
lctl command passed.

The patch also simplifies the function to check catastrophe value
on all of the test nodes without separating local and remote nodes.

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I0ffdafe27b0829dde5a8ea136be76e35b5ea8f43
Reviewed-on: http://review.whamcloud.com/12640
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5586 llite: fix dup flags names 92/12892/2
Bob Glossman [Mon, 1 Dec 2014 19:03:26 +0000 (11:03 -0800)]
LU-5586 llite: fix dup flags names

The name 'xattr' is used for two different ll_flags bits.
Change the names to be distinct and different, reflecting
the names of the bits used in LL_SBI_xbitnamex #defines.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I538cbee8f5382e1a7c74f2dcd598025886225cc3
Reviewed-on: http://review.whamcloud.com/12892
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5951 clio: update timestamps after buiding rpc 65/12865/4
Niu Yawei [Mon, 1 Dec 2014 06:05:55 +0000 (01:05 -0500)]
LU-5951 clio: update timestamps after buiding rpc

The mtime/atime/ctime in the write RPC has to be updated after
the RPC is built (where the xid is generated); otherwise, it could
race with setattr and update the wrong timestamps on the OST side.

It seems this regression was introduced when landing the clio code.

Use ofd_write_lock() to protect the fmd lookup/update in
ofd_punch_object(); otherwise, it could race with ofd_attr_set()
and ofd_commitrw().

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I16216038ea2bd064ef7f33857a1d4aba167ac5fb
Reviewed-on: http://review.whamcloud.com/12865
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
9 years agoLU-5950 mgc: add nid iteration 29/12829/3
Alexander.Boyko [Mon, 24 Nov 2014 10:55:15 +0000 (13:55 +0300)]
LU-5950 mgc: add nid iteration

mgc_apply_recover_logs uses only the first NID from an entry,
which can be a problem for a cluster where one node has several
network addresses.

Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Change-Id: I6ec348761c2d51edd613cb388e37ef7776990424
Xyratex-bug-id: MRP-2255
Reviewed-on: http://review.whamcloud.com/12829
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
9 years agoLU-5912 libcfs: use vfs api for fsync calls 31/12731/3
Bob Glossman [Fri, 14 Nov 2014 22:26:30 +0000 (14:26 -0800)]
LU-5912 libcfs: use vfs api for fsync calls

Use vfs_fsync_range() instead of direct use of filp->f_op->fsync()
routines.  Doing so will apply correct locking transparently without
needing to decide how to do it ourselves.
What we were doing was a long term violation of the locking
protocols described in Documentation/filesystems/Locking in linux
source but was never noticed until new checking code went into the
RHEL 6.6 kernel.  The new check triggered a visible error in syslog.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I551215fc340637364fe04f6e3bae963cf983c953
Reviewed-on: http://review.whamcloud.com/12731
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
9 years agoLU-5894 mds: allow 2.4/2.5 clients create remote dir 15/12715/2
Wang Di [Fri, 14 Nov 2014 07:16:09 +0000 (23:16 -0800)]
LU-5894 mds: allow 2.4/2.5 clients create remote dir

MDS will only return ENOTSUPP if old client (2.4/2.5) tries
to create striped dir with stripe count > 1, so it can still
create remote directory on the new MDS (>= 2.6).

Change-Id: I25c90ae793f91eed032949d26fd5e7fc41801e4f
Signed-off-by: Wang Di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/12715
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
9 years agoLU-5808 llog: check name strictly to avoid invalid record 37/12437/2
Li Xi [Mon, 27 Oct 2014 13:54:25 +0000 (21:54 +0800)]
LU-5808 llog: check name strictly to avoid invalid record

Records for a file system could be written to the llog of another file
system by mistake if the name of the former is a prefix of
the name of the latter. This patch fixes the problem by checking
the llog name more strictly.
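
A minimal sketch of the prefix problem and a stricter check. It
assumes log names have the form "<fsname>-<suffix>"; the function name
is hypothetical and this is not the actual patch.

```c
#include <stdbool.h>
#include <string.h>

/* Return true only if logname is "<fsname>-<something>".
 * A bare strncmp() prefix test would also accept "lustre2-MDT0000"
 * for fsname "lustre", which is the mistake described above. */
static bool logname_matches_fsname_sketch(const char *logname,
					  const char *fsname)
{
	size_t len = strlen(fsname);

	if (strncmp(logname, fsname, len) != 0)
		return false;

	/* The character right after the prefix must be the separator. */
	return logname[len] == '-';
}
```

Checking the separator is what makes the match strict: the prefix
comparison alone cannot tell "lustre" from "lustre2".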

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: If45c59b0226b71e8a95f9aa719eae8412c89a2f1
Reviewed-on: http://review.whamcloud.com/12437
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
9 years agoLU-5635 llog: prevent out-of-bound index 61/12161/3
Frank Zago [Wed, 1 Oct 2014 20:30:50 +0000 (15:30 -0500)]
LU-5635 llog: prevent out-of-bound index

llog_process_thread() can be called from llog_cat_process_cb with an
index already out of bound, leading to the following crash:

LustreError: 3773:0:(llog.c:310:llog_process_thread())
  ASSERTION(index <= last_index + 1 ) failed:
LustreError: 3773:0:(llog.c:310:llog_process_thread()) LBUG

 #0 [ffff8801144bf900] machine_kexec at ffffffff81038f3b
 #1 [ffff8801144bf960] crash_kexec at ffffffff810c5d82
 #2 [ffff8801144bfa30] panic at ffffffff8152798a
 #3 [ffff8801144bfab0] lbug_with_loc at ffffffffa02f8eeb [libcfs]
 #4 [ffff8801144bfad0] llog_process_thread at ffffffffa0413fff [obdclass]
 #5 [ffff8801144bfb80] llog_process_or_fork at ffffffffa041585f [obdclass]
 #6 [ffff8801144bfbd0] llog_cat_process_cb at ffffffffa0418612 [obdclass]
 #7 [ffff8801144bfc30] llog_process_thread at ffffffffa0413c22 [obdclass]
 #8 [ffff8801144bfce0] llog_process_or_fork at ffffffffa041585f [obdclass]
 #9 [ffff8801144bfd30] llog_cat_process_or_fork at ffffffffa0416b9d [obdclass]
    RIP: 00007f6de5e4f730  RSP: 00007fff9aa26d98  RFLAGS: 00000206
    RAX: 0000000000000000  RBX: ffffffff8100b072  RCX: 00007f6de5e4f730
    RDX: 0000000000008000  RSI: 00000000019c7000  RDI: 0000000000000003
    RBP: 00000000019c7000   R8: 00007f6de6103ee8   R9: 0000000000000001
    R10: 00007fff9aa26b20  R11: 0000000000000246  R12: ffffffffffff8000
    R13: 0000000000000003  R14: 0000000000008000  R15: 0000000000000003
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

If index is too big, simply return success.
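
A sketch of the guard, under the assumption that the bound is the same
one the failed assertion checks (index <= last_index + 1). The function
is a hypothetical stand-in, not the real llog_process_thread().

```c
/* Process records in [index, last_index] and count them.
 * Returns 0 on success. */
static int process_records_sketch(int index, int last_index, int *nprocessed)
{
	*nprocessed = 0;

	/* Start index already out of bound: nothing to do, report
	 * success instead of asserting. */
	if (index > last_index + 1)
		return 0;

	for (int i = index; i <= last_index; i++)
		(*nprocessed)++;        /* a real callback would run here */

	return 0;
}
```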

Change-Id: I81bbedbbe2bcef478c370ef40fc069447d39efbd
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12161
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5687 dt: propagate errors from failed declarations 30/12130/4
John L. Hammond [Tue, 30 Sep 2014 16:09:30 +0000 (11:09 -0500)]
LU-5687 dt: propagate errors from failed declarations

Check for and return errors from dt_declare_*() in several locations.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Id18b12d6c713e78e2f1cc782ff659d2c84cc60bb
Reviewed-on: http://review.whamcloud.com/12130
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-2675 lnet: remove ulnds 17/12117/3
John L. Hammond [Mon, 29 Sep 2014 18:33:24 +0000 (13:33 -0500)]
LU-2675 lnet: remove ulnds

Remove the unused userspace LND code (all of lnet/ulnds/) and
supporting autocrud.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I104d8b22afdde5027a2a0ef1a9ecc0423b67fae5
Reviewed-on: http://review.whamcloud.com/12117
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-2675 lmv: remove lmv_init_{lock,unlock}() 15/12115/3
John L. Hammond [Mon, 29 Sep 2014 18:12:52 +0000 (13:12 -0500)]
LU-2675 lmv: remove lmv_init_{lock,unlock}()

In struct lmv_obd rename the init_mutex member to
lmv_init_mutex. Remove the compat macros lmv_init_{lock,unlock}() and
use mutex_{lock,unlock}(&lmv->lmv_init_mutex) instead.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Iae1f5d6b7fd1f96ba430d5e7af97c51ce3e042a8
Reviewed-on: http://review.whamcloud.com/12115
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-2675 md: remove unused code from md_object.h 13/12113/3
John L. Hammond [Mon, 29 Sep 2014 17:55:46 +0000 (12:55 -0500)]
LU-2675 md: remove unused code from md_object.h

Remove several unused functions, structures, and members from
md_object.h.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I33de0ba987bfde95172e9bfb77929b6b4dcd0aa8
Reviewed-on: http://review.whamcloud.com/12113
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5622 tests: check/wait for copytool death 22/11922/5
Bruno Faccini [Mon, 15 Sep 2014 15:37:31 +0000 (17:37 +0200)]
LU-5622 tests: check/wait for copytool death

It seems that copytool death/kill may take more time, so
this condition must be handled in the sanity-hsm copytool_cleanup()
function to avoid situations where the copytool is not
restarted, but only signaled, in the next copytool_setup().

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ia72ed07f0219cf0aa2ef5b3805fb1f7faf4dab66
Reviewed-on: http://review.whamcloud.com/11922
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Tested-by: Jenkins
Reviewed-by: Robert Read <robert.read@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-3456 ptlrpc: quiet errors on initial connection 57/10057/4
Andreas Dilger [Tue, 22 Apr 2014 19:54:46 +0000 (13:54 -0600)]
LU-3456 ptlrpc: quiet errors on initial connection

It may be that a client or MDS is trying to connect to a target (OST
or peer MDT) before that target is finished setup.  Rather than
spamming the console logs during initial connection, only print a
console error message if there are repeated failures trying to
connect to the target, which may indicate an error on that node.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I98ec7b4c2109b700b53297038d3fede4773ebbe5
Reviewed-on: http://review.whamcloud.com/10057
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-4820 osd: drop memcpy in zfs osd 60/9760/10
Alex Zhuravlev [Mon, 24 Mar 2014 15:30:19 +0000 (19:30 +0400)]
LU-4820 osd: drop memcpy in zfs osd

dmu_read() was called from osd_read_prep(), copying from
ARC bufs into the same ARC bufs. This seems to be a remnant
of the pre-zerocopy era.

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: I0f3657c360d8541d7c3c6e8e32eac78bc5702b42
Reviewed-on: http://review.whamcloud.com/9760
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5878 lfs: migrate file to its proper destination 01/12601/6
Frank Zago [Thu, 6 Nov 2014 17:08:30 +0000 (11:08 -0600)]
LU-5878 lfs: migrate file to its proper destination

llapi_file_open_param() is supposed to be returning the opened file
descriptor. However, when llapi_search_ost() is called, it returns 1,
which sets rc to 1, which in turn is confused for an error later, and
returned to the caller. So when the copy happens, the destination file
descriptor is 1 (stdout).
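
The pattern can be sketched as follows. Both functions are hypothetical
stand-ins, not the liblustreapi code, and the helper's return
convention (1 = found, 0 = not found, negative = error) is an
assumption made for the example.

```c
/* Stand-in for a lookup helper: 1 when found, 0 when not found. */
static int search_sketch(int found)
{
	return found ? 1 : 0;
}

/* Stand-in for an open routine that must return a file descriptor.
 * Handling of the not-found case is omitted. */
static int open_param_sketch(int fd, int ost_found)
{
	int rc;

	rc = search_sketch(ost_found);
	if (rc < 0)
		return rc;      /* genuine error from the helper */

	/* rc == 1 only means "found"; it must not be returned in
	 * place of the descriptor. */
	return fd;
}
```

Returning the leftover rc instead of fd would hand the caller the
value 1, which it would then use as the destination descriptor.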

Fixed a typo in the function description, and formatted the parameter
descriptions.

Fixed a bad indentation.

There's no need to test lum before freeing it since at that point it is
not NULL (and free will test it anyway).

Change-Id: I16fe26480b880aa818b1bb706b22bfdd6833d69c
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12601
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
9 years agoLU-5861 lnet: invoke lnetctl properly from startup script 61/12561/2
Amir Shehata [Tue, 4 Nov 2014 21:14:39 +0000 (13:14 -0800)]
LU-5861 lnet: invoke lnetctl properly from startup script

Use the correct lnetctl command syntax to load default config:
lnetctl import < lnet.conf

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I54dd0d34f75b91c1c6ceb9745d817cb43f82ef25
Reviewed-on: http://review.whamcloud.com/12561
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-4119 ldlm: abort recovery by time_hard 78/9078/11
Sergey Cheremencev [Thu, 20 Nov 2014 16:58:43 +0000 (11:58 -0500)]
LU-4119 ldlm: abort recovery by time_hard

Set obd_abort_recovery to 1 when recovery time
reaches obd_recovery_time_hard.

Xyratex-bug-id: MRP-1365

Change-Id: Ida8f71cb63d5db9bf85bcdf2c152b4d9f71b8bca
Signed-off-by: Sergey Cheremencev <Sergey_Cheremencev@xyratex.com>
Reviewed-on: http://review.whamcloud.com/9078
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5893 kernel: kernel update [RHEL7 3.10.0-123.9.3.el7] 57/12657/4
Bob Glossman [Mon, 10 Nov 2014 19:20:18 +0000 (11:20 -0800)]
LU-5893 kernel: kernel update [RHEL7 3.10.0-123.9.3.el7]

update RHEL7 kernel to 3.10.0-123.9.3.el7

Test-Parameters: clientdistro=el7
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ife164ff8bea44369bc33cae07cfbb59d5845e406
Reviewed-on: http://review.whamcloud.com/12657
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-1453 scrub: auto trigger OI scrub more flexible 38/12738/10
Fan Yong [Sat, 13 Sep 2014 20:22:41 +0000 (04:22 +0800)]
LU-1453 scrub: auto trigger OI scrub more flexible

Generally, scanning the whole device for the OI scrub routine check may
take a long time. If the whole system contains only a few bad
OI mappings, then it is not worth triggering OI scrub automatically
at full speed when a bad OI mapping is auto-detected. Instead,
we can make the OI scrub fix only the bad OI mappings found, and
if so many bad OI mappings are found that they exceed a given
threshold, which can be adjusted via a proc interface, then the OI
scrub will run at full speed to scan the whole device.

Currently, we offer two kinds of thresholds for triggering OI scrub
to scan the whole device:

1) "the total OI mappings count" vs "the bad OI mappings count".
   If this ratio is lower than the given threshold, which can be set
   via the proc interface "full_scrub_ratio", then trigger urgent
   mode OI scrub.

2) "the speed of finding bad OI mappings". If the speed exceeds
   the given threshold, which can be adjusted via the proc interface
   "full_scrub_speed", then trigger urgent mode OI scrub.

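The two thresholds above can be sketched as a single decision function.
This is an illustration only: the struct, the units (bad mappings per
second), and treating a threshold of 0 as "disabled" are assumptions,
not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters kept while fixing bad OI mappings. */
struct scrub_stats_sketch {
	uint64_t total;         /* total OI mappings */
	uint64_t bad;           /* bad OI mappings found so far */
	uint64_t elapsed_sec;   /* seconds over which they were found */
};

/* Decide whether to switch to a full-speed, whole-device scan. */
static bool need_full_scrub_sketch(const struct scrub_stats_sketch *s,
				   uint64_t full_scrub_ratio,
				   uint64_t full_scrub_speed)
{
	if (s->bad == 0)
		return false;

	/* 1) total/bad ratio lower than the threshold */
	if (full_scrub_ratio != 0 && s->total / s->bad < full_scrub_ratio)
		return true;

	/* 2) bad mappings found per second exceeds the threshold */
	if (full_scrub_speed != 0 && s->elapsed_sec != 0 &&
	    s->bad / s->elapsed_sec > full_scrub_speed)
		return true;

	return false;
}
```
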
Test-Parameters: mdsfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs \
ostfilesystemtype=ldiskfs envdefinitions=ONLY=4 testlist=sanity-scrub
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ibc4592fef1da11994ec30eb348d20576be5ae54b
Reviewed-on: http://review.whamcloud.com/12738
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
9 years agoLU-1452 scrub: OI scrub skips uninitialized groups 37/12737/5
Fan Yong [Thu, 11 Sep 2014 23:55:43 +0000 (07:55 +0800)]
LU-1452 scrub: OI scrub skips uninitialized groups

If the ldiskfs group descriptor is marked as LDISKFS_BG_INODE_UNINIT,
it means that the inodes in that group have never been initialized,
so the otable-based iterator can skip this group directly to speed up
the scanning.

If the iteration position reaches the unused inodes area in the
group descriptor (indicated by bg_itable_unused), then skip the
remaining inodes in this group to reduce the scanning time.
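
A sketch of the two skip rules, assuming the unused inodes sit at the
end of the group's inode table. The struct and flag constant are
simplified stand-ins for the ldiskfs group descriptor, not its real
layout.

```c
#include <stdint.h>

#define BG_INODE_UNINIT_SKETCH 0x0001   /* stand-in flag value */

/* Simplified stand-in for a group descriptor. */
struct group_desc_sketch {
	uint16_t flags;
	uint32_t itable_unused; /* unused inodes at the end of the table */
};

/* Number of inodes the iterator actually has to visit in one group. */
static uint32_t inodes_to_scan_sketch(const struct group_desc_sketch *gd,
				      uint32_t inodes_per_group)
{
	/* Inodes never initialized: skip the whole group. */
	if (gd->flags & BG_INODE_UNINIT_SKETCH)
		return 0;

	if (gd->itable_unused >= inodes_per_group)
		return 0;

	/* Stop at the start of the unused area. */
	return inodes_per_group - gd->itable_unused;
}
```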

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ie8a2eb1269d288865ce51d40e211e3db54d062af
Reviewed-on: http://review.whamcloud.com/12737
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
9 years agoLU-5867 lfsck: Enable --create_mdtobj flag 78/12578/5
James Nunez [Tue, 9 Sep 2014 18:53:42 +0000 (02:53 +0800)]
LU-5867 lfsck: Enable --create_mdtobj flag

Using the --create_mdtobj flag in 'lctl lfsck_start'
creates an error. "create_mdtobj" is added to the
option struct so it will be recognized as a valid option.

When displaying the results of LFSCK, "create_mdtobj" is
not listed as a parameter. "create_mdtobj" is added to
the lfsck_param_names array so it will be printed when
used.

Also, added LSV_CREATE_MDTOBJ to the lfsck_request
valid options/flags.

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I1923bb9a71958b390b9abea248b328ac59c3caad
Reviewed-on: http://review.whamcloud.com/12578
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5963 nodemap: use proper hashing 81/12881/2
Alexey Lyashkov [Sat, 29 Nov 2014 08:55:22 +0000 (11:55 +0300)]
LU-5963 nodemap: use proper hashing

Don't hash an export pointer as a string.
Check for the situation where an export is not deleted from the
nodemap hash.

Signed-off-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Change-Id: Id53281078f165ce984abebc74992bde30fcc9f31
Reviewed-on: http://review.whamcloud.com/12881
Tested-by: Jenkins
Reviewed-by: Andrew Perepechko <andrew_perepechko@xyratex.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Kit Westneat <kit.westneat@gmail.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5727 ldlm: revert the changes for lock canceling policy 33/12733/2
Jinshan Xiong [Sat, 15 Nov 2014 01:07:37 +0000 (17:07 -0800)]
LU-5727 ldlm: revert the changes for lock canceling policy

The changes to the LRU lock policy were introduced by commit bfae5a4e,
where I was trying to revise the policy for picking locks to cancel.

However, this caused two problems as mentioned in LU-5727. The first
problem is that a lock can be picked for canceling only if
the number of LRU locks is over the preset LRU number AND it's aged; the
second problem is that mdc_cancel_weight() tends to not cancel OPEN
locks, therefore open locks can be kept forever and finally exhaust
memory on the MDT side.

The first problem is fixed by patch e8812867. This patch reverts
the rest of the changes related to the LRU policy revision.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ie1dbcd15dc6e739d01ddcae01d7e637688a1d4b2
Reviewed-on: http://review.whamcloud.com/12733
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5507 recovery: don't replay closed open 67/12667/4
Niu Yawei [Tue, 11 Nov 2014 05:54:34 +0000 (00:54 -0500)]
LU-5507 recovery: don't replay closed open

To avoid scanning the replay open list every time in
ptlrpc_free_committed(), the fix for LU-2613 (4322e0f9) changed
ptlrpc_free_committed() to skip the open list unless the
import generation has changed. That introduced a race which could
cause a closed open to be replayed:

1. The application calls ll_close_inode_openhandle()-> mdc_close()
   to close the file; rq_replay is cleared, but the open request is
   still on the imp_committed_list;

2. Before md_clear_open_replay_data() is called for the close,
   the client starts replay, and that closed open will be replayed
   mistakenly;

3. The open replay interpret callback (mdc_replay_open) could race
   with mdc_clear_open_replay_data() at the end.

This patch fixes ptlrpc_free_committed() to make sure the
open list is scanned on recovery to prevent the closed open request
from being replayed.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ia67fe5d8d501a69bafbbd7e44bd612abb9c254c6
Reviewed-on: http://review.whamcloud.com/12667
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-2833 tests: Unexempt sanity/48a for zfs 07/12607/3
Nathaniel Clark [Thu, 6 Nov 2014 20:51:37 +0000 (15:51 -0500)]
LU-2833 tests: Unexempt sanity/48a for zfs

With LU-2449 being landed this test no longer fails on ZFS.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ie82f25ac0152dee7972a8a210d8669b59798e9a7
Reviewed-on: http://review.whamcloud.com/12607
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoRevert "LU-3573 osd-zfs: Only advance zap cursor as needed" 87/12887/4
Andreas Dilger [Mon, 1 Dec 2014 09:07:00 +0000 (09:07 +0000)]
Revert "LU-3573 osd-zfs: Only advance zap cursor as needed"

This reverts commit 1da9b84b39ab36be9ba67a72ae175dde6521769b.

This patch introduced a far more serious regression in conf-sanity
test_32b LU-5924 and should be reverted until the problem is fixed.

Change-Id: I28f04a33d1c1bb4688d2ba9af6015a2737fb1d93
Reviewed-on: http://review.whamcloud.com/12887
Tested-by: Jenkins
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years agoLU-5079 tests: fix service_time in max_recovery_time() 24/12724/9
Jian Yu [Mon, 24 Nov 2014 22:32:55 +0000 (14:32 -0800)]
LU-5079 tests: fix service_time in max_recovery_time()

This patch fixes the calculation of service_time in
max_recovery_time() to use the new method in
check_and_start_recovery_timer() and new values of
CONNECTION_SWITCH_MAX and CONNECTION_SWITCH_INC.

The patch also fixes replay-dual sub-tests:
- to call wait_clients_import_state() instead of sleeping
  for an uncertain time in test_11()
- to add some margin into the recovery time comparison
  in test_20()

Test-Parameters: alwaysuploadlogs \
envdefinitions=SLOW=yes,ENABLE_QUOTA=yes,REPLAY_DUAL_EXCEPT=21 \
mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs \
ostfilesystemtype=ldiskfs mdtcount=1 \
testlist=replay-dual,replay-dual

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: Ife0fab28ed7b67ac61022f7e8a38957e3995b167
Reviewed-on: http://review.whamcloud.com/12724
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5650 mgc: check the import stat for lprocfs 27/12327/2
Hongchao Zhang [Tue, 9 Sep 2014 12:18:17 +0000 (20:18 +0800)]
LU-5650 mgc: check the import stat for lprocfs

In lprocfs_mgc_rd_ir_state(), the import state should be checked
for validity before doing further work.

Change-Id: Ic582150a1cdbef331a929ce378d6e4f987a169fd
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/12327
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5888 utils: limit max_sectors_kb tunable setting 23/12723/2
Andreas Dilger [Fri, 14 Nov 2014 19:48:39 +0000 (12:48 -0700)]
LU-5888 utils: limit max_sectors_kb tunable setting

Limit the value set by mount.lustre set_blockdev_tunables() to a
reasonable 32MB instead of the maximum possible amount, since the
parsing of max_hw_sectors_kb might be bad, or it just returns a
value much larger than we need.

Also quiet the printing of the max_sectors_kb tunable that was added
in commit 9813961151e (http://review.whamcloud.com/9865) so that it
only prints something when the value is actually changed, instead of
printing it for every tunable even if the value is the same.
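
The capping and quiet-update behaviour described above can be sketched as
follows. This is a minimal illustration, not the code from mount.lustre:
the names MAX_SECTORS_KB_CAP, clamp_max_sectors_kb() and should_update()
are hypothetical, and the 32MB cap is taken from the description above.

```c
#include <stdio.h>

/* Hypothetical cap: 32MB expressed in KB, as described in the message. */
#define MAX_SECTORS_KB_CAP (32UL * 1024UL)

/*
 * Return the value to use for max_sectors_kb given the hardware limit
 * (as read from max_hw_sectors_kb). A bogus or very large hardware
 * value is limited to the cap.
 */
static unsigned long clamp_max_sectors_kb(unsigned long hw_max_kb)
{
	return hw_max_kb > MAX_SECTORS_KB_CAP ? MAX_SECTORS_KB_CAP : hw_max_kb;
}

/*
 * Compute the new value and report it only when it differs from the
 * current one. Returns 1 if the tunable needs to be written, else 0.
 */
static int should_update(unsigned long current_kb, unsigned long hw_max_kb,
			 unsigned long *new_kb)
{
	*new_kb = clamp_max_sectors_kb(hw_max_kb);
	if (*new_kb == current_kb)
		return 0;	/* unchanged: stay quiet */
	printf("max_sectors_kb: %lu -> %lu\n", current_kb, *new_kb);
	return 1;
}
```

With this shape, a device that already has the capped value produces no
output at all, which matches the intent of only printing on a change.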

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I648c2d8484ae5cef59ab62421cd01bc0ed02fcd6
Reviewed-on: http://review.whamcloud.com/12723
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Blake Caldwell <blakec@ornl.gov>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5862 changelog: Proper record remapping 74/12574/4
Henri Doreau [Wed, 5 Nov 2014 14:01:52 +0000 (15:01 +0100)]
LU-5862 changelog: Proper record remapping

Fixed changelog_remap_rec() to correctly remap records emitted
with jobid_var=disabled, i.e. delivered by new servers but with
no jobid field.

Updated sanity test 205 accordingly.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: Ia151e9bfde2def8819913ee658bde6b71ef3ab18
Reviewed-on: http://review.whamcloud.com/12574
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Robert Read <robert.read@intel.com>
9 years ago LU-5848 debug: more debug log for dt_sync 73/12573/3
Fan Yong [Sat, 6 Sep 2014 04:39:46 +0000 (12:39 +0800)]
LU-5848 debug: more debug log for dt_sync

Add some D_CACHE logs at the entry/exit for osp_sync()/osd_sync().

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Iaa7fbfbbadb9312528b5092d64615b277de6b679
Reviewed-on: http://review.whamcloud.com/12573
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5641 tests: ensure user daemon is in group bin on mds 62/12762/2
Bob Glossman [Tue, 18 Nov 2014 01:33:57 +0000 (17:33 -0800)]
LU-5641 tests: ensure user daemon is in group bin on mds

The previous fix for this problem only fixed groups on client.
That worked as long as we were only testing with el7 client,
but was an incomplete solution for el7 client/servers.
Need to apply the same fix to mds too to keep things consistent.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I411970c591a72b0393ed892f15da1f5d6340df8c
Reviewed-on: http://review.whamcloud.com/12762
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5892 lfsck: remove improper LASSERT in lfsck_needs_scan_dir 70/12670/2
Fan Yong [Sat, 6 Sep 2014 20:13:49 +0000 (04:13 +0800)]
LU-5892 lfsck: remove improper LASSERT in lfsck_needs_scan_dir

Inside lfsck_needs_scan_dir(), when the internal variable @fid
becomes the input @obj's parent FID, the internal variable @depth
may still be zero, so the original "LASSERT(depth > 0);" is improper
in that case. Remove it.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I64f10be682c51c6ac5cc1af3497eb569281fcd21
Reviewed-on: http://review.whamcloud.com/12670
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5832 utils: Fix buffer overflow in bound string copy 16/12516/8
Dmitry Eremin [Fri, 31 Oct 2014 10:45:26 +0000 (13:45 +0300)]
LU-5832 utils: Fix buffer overflow in bound string copy

The function 'strncpy' may incorrectly check buffer boundaries
and may overflow buffer 'info->name' of fixed size (256). Also
there is one similar error on line 1135.
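
One common way to make such a copy safe is sketched below. It is only an
illustration under stated assumptions: struct info_example and copy_name()
are hypothetical stand-ins, and the actual patch may use a different
helper. The point is that strncpy() does not NUL-terminate when the source
fills the buffer, so the bound must leave room for the terminator.

```c
#include <string.h>

#define NAME_LEN 256	/* size of the fixed buffer mentioned above */

/* Hypothetical stand-in for the structure that owns 'name'. */
struct info_example {
	char name[NAME_LEN];
};

/* Copy src into info->name, truncating if needed, always NUL-terminated. */
static void copy_name(struct info_example *info, const char *src)
{
	/* Copy at most NAME_LEN - 1 bytes so the last byte stays free. */
	strncpy(info->name, src, sizeof(info->name) - 1);
	/* strncpy() adds no terminator on truncation, so add one here. */
	info->name[sizeof(info->name) - 1] = '\0';
}
```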

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I512ab6678fbf1d02bac2eb290fd13c22fca9dc2b
Reviewed-on: http://review.whamcloud.com/12516
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5568 lnet: fix kernel crash when network failed to start 12/12512/5
Amir Shehata [Fri, 31 Oct 2014 00:50:15 +0000 (17:50 -0700)]
LU-5568 lnet: fix kernel crash when network failed to start

When loading Lustre modules without proper network configuration,
it always hit the following kernel panic:
LNetError: 105-4: Error -100 starting up LNI tcp
LNetError: 2145:0:(api-ni.c:823:lnet_unprepare())
 ASSERTION( list_empty(&the_lnet.ln_nis) ) failed:
LNetError: 2145:0:(api-ni.c:823:lnet_unprepare()) LBUG
Pid: 2145, comm: modprobe
Call Trace:
[<ffffffffa044f853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[<ffffffffa044fdf5>] lbug_with_loc+0x45/0xc0 [libcfs]
[<ffffffffa04f3267>] lnet_unprepare+0x297/0x340 [lnet]
[<ffffffffa04f3b5c>] LNetNIInit+0x25c/0x3e0 [lnet]
[<ffffffff81061bc6>] ? put_online_cpus+0x56/0x80
[<ffffffffa0983000>] ? init_module+0x0/0x1000 [ptlrpc]
[<ffffffffa081310c>] ptlrpc_ni_init+0x2c/0x1a0 [ptlrpc]
[<ffffffffa0983000>] ? init_module+0x0/0x1000 [ptlrpc]
[<ffffffffa0813291>] ptlrpc_init_portals+0x11/0xf0 [ptlrpc]
[<ffffffffa0983000>] ? init_module+0x0/0x1000 [ptlrpc]
[<ffffffffa09831c4>] init_module+0x1c4/0x1000 [ptlrpc]
[<ffffffff810020e2>] do_one_initcall+0xe2/0x190
[<ffffffff810ca7fb>] load_module+0x129b/0x1a90
[<ffffffff812da590>] ? ddebug_dyndbg_module_param_cb+0x0/0x60
[<ffffffff810c7133>] ? copy_module_from_fd.isra.43+0x53/0x150
[<ffffffff810cb1a6>] SyS_finit_module+0xa6/0xd0
[<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
...
This is because lnet_startup_lndnis() may add list items to
@the_lnet.ln_nis and @the_lnet.ln_nis_cpt before it fails. But the
lnet_startup_lndnis() failure path did not clean up the lists, thus
causing the assertion in lnet_unprepare().

Fix the assertion by cleaning up using lnet_shutdown_lndnis()
if the startup fails.
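
The cleanup-on-failure pattern can be sketched in isolation as below. This
is a simplified user-space model, not LNet code: the list, startup_nis(),
shutdown_nis() and ni_init() are hypothetical stand-ins for
the_lnet.ln_nis, lnet_startup_lndnis(), lnet_shutdown_lndnis() and the
caller.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network interface on the_lnet.ln_nis. */
struct ni {
	struct ni *next;
};

static struct ni *ni_list;

/* Add 'count' interfaces; fail with -ENETDOWN when reaching 'fail_at'. */
static int startup_nis(int count, int fail_at)
{
	for (int i = 0; i < count; i++) {
		struct ni *ni;

		if (i == fail_at)
			return -ENETDOWN;	/* earlier items stay on the list */
		ni = malloc(sizeof(*ni));
		if (ni == NULL)
			return -ENOMEM;
		ni->next = ni_list;
		ni_list = ni;
	}
	return 0;
}

/* Remove and free everything that startup_nis() added. */
static void shutdown_nis(void)
{
	while (ni_list != NULL) {
		struct ni *ni = ni_list;

		ni_list = ni->next;
		free(ni);
	}
}

/* Caller: on failure, clean up the partial state before returning. */
static int ni_init(int count, int fail_at)
{
	int rc = startup_nis(count, fail_at);

	if (rc != 0)
		shutdown_nis();
	return rc;
}
```

After a failed ni_init() the list is empty again, so a later
"list must be empty" assertion of the kind seen in lnet_unprepare() holds.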

In a future enhancement the ni startup API will be modified to
clean up after itself in case of failure.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Ia344fd7c0f24c87b654554dda9e57bf5525edc85
Reviewed-on: http://review.whamcloud.com/12512
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5731 osp: flush async updates for osp_sync 59/12359/2
Fan Yong [Thu, 21 Aug 2014 04:19:25 +0000 (12:19 +0800)]
LU-5731 osp: flush async updates for osp_sync

Current osp_sync() only considers the async requests that are
handled by the osp_sync_thread, but ignores the async updates
that are handled directly by the background ptlrpcd threads.
Usually, such async updates are for LFSCK remote repairing.
This patch will flush all of them when dt_sync() is called.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I0e6d54120acbd8ab82cf776222277ae3b805812d
Reviewed-on: http://review.whamcloud.com/12359
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-4839 tests: Give copytool more time to start 82/12682/6
Nathaniel Clark [Wed, 12 Nov 2014 01:56:28 +0000 (20:56 -0500)]
LU-4839 tests: Give copytool more time to start

Copytool can take some time to start, and if the HSM archive directory
is on a busy NFS server, it can take a bit of time for the initial
opens to occur.  This allows those actions more time to complete which
should give this test a better chance of passing correctly.

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes \
mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs \
testlist=sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes,ONLY=60 \
mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
mdtcount=4 testlist=sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I28bc57b92c34b4eee07ba34a2d976f2c39dc70dc
Reviewed-on: http://review.whamcloud.com/12682
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Michael MacDonald <michael.macdonald@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5707 lfsck: store namespace LFSCK statistics info in new EA 21/12321/5
Fan Yong [Tue, 9 Sep 2014 03:23:04 +0000 (11:23 +0800)]
LU-5707 lfsck: store namespace LFSCK statistics info in new EA

For Lustre-2.6 or older releases, the namespace LFSCK statistics info
was stored as the XATTR_NAME_LFSCK_NAMESPACE EA, but in Lustre-2.7 the
namespace LFSCK will introduce more statistics information that will
cause the XATTR_NAME_LFSCK_NAMESPACE EA to be extended. If it still
used the old XATTR_NAME_LFSCK_NAMESPACE EA, then after a downgrade the
old LFSCK would get -ERANGE when loading the new trace file from disk,
and the LFSCK could not be started.

To avoid such trouble, Lustre-2.7 will use new EA to store the
namespace LFSCK statistics info: XATTR_NAME_LFSCK_NAMESPACE_V2,
and keep a dummy XATTR_NAME_LFSCK_NAMESPACE EA in the trace file
to be compatible with old LFSCK.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I55b5adb962434013b00e3938a67b671010ecc206
Reviewed-on: http://review.whamcloud.com/12321
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
9 years ago LU-5740 build: add RHEL6.6 [2.6.32-504.el6] to build selections 09/12609/4
Bob Glossman [Tue, 28 Oct 2014 17:25:04 +0000 (10:25 -0700)]
LU-5740 build: add RHEL6.6 [2.6.32-504.el6] to build selections

Add support for building with RHEL6.6 kernel version 2.6.32-504.el6
while retaining the ability to build with older RHEL 6.5 kernels.
New ldiskfs patch series for el6.6 is included.

Test-Parameters: clientdistro=el6.6 mdsdistro=el6.6\
  ossdistro=el6.6 mdsfilesystemtype=ldiskfs\
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I780feefbbc179607762c0d2997fd608830f3db8b
Reviewed-on: http://review.whamcloud.com/12609
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5941 build: build dkms build at installed source tree 02/12802/2
Minh Diep [Thu, 20 Nov 2014 16:10:53 +0000 (08:10 -0800)]
LU-5941 build: build dkms build at installed source tree

Port from:
https://github.com/
zfsonlinux/zfs/commit/46bf86a9635266dd399443f5bf5c5f8d0f280aa2

Signed-off-by: Minh Diep <minh.diep@intel.com>
Change-Id: If0c8543d955594b4f9dc305c35271a9cc94e1bbd
Reviewed-on: http://review.whamcloud.com/12802
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5941 dkms: make lustre-dkms require 2.2.0.3-28.git.7c3e7c5 01/12801/2
Minh Diep [Thu, 20 Nov 2014 16:05:25 +0000 (08:05 -0800)]
LU-5941 dkms: make lustre-dkms require 2.2.0.3-28.git.7c3e7c5

Due to a bug in dkms, we need to enforce the use of the
dkms-2.2.0.3-28.git.7c3e7c5 version.

Signed-off-by: Minh Diep <minh.diep@intel.com>
Change-Id: I9ad8ccaa5106b221f41a50c520d8bdfef160c065
Reviewed-on: http://review.whamcloud.com/12801
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-2524 test: Code clean up for conf-sanity 30/10530/7
James Nunez [Fri, 30 May 2014 19:20:21 +0000 (13:20 -0600)]
LU-2524 test: Code clean up for conf-sanity

The patch modifying the tdir variable to a single directory
has landed; http://review.whamcloud.com/#/c/8123/. We can
now conduct miscellaneous cleanup including:

Remove the `-p` (parents) option from many calls to mkdir
Replace `lfs setstripe` with $SETSTRIPE
Replace `lfs getstripe` with $GETSTRIPE
Replace `lctl` with $LCTL
Added check for and call `error` and/or added error messages
for a variety of common functions.
Replace `…` with $(...)
Remove linefeed escape after |, ||, & and && operators.
Modify directory and file names to use $tdir and $tfile
Remove 'mkdir -p $MOUNT' when 'mount_client $MOUNT' is
called right before or after mkdir

Test-Parameters: alwaysuploadlogs \
envdefinitions=SLOW=yes testlist=conf-sanity

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I94bd51ce2d2f225736e12c4f9ac1a86a3d8a23d8
Reviewed-on: http://review.whamcloud.com/10530
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
9 years ago LU-5814 llite: remove ll_objects_destroy() 18/12618/2
John L. Hammond [Fri, 7 Nov 2014 15:00:09 +0000 (09:00 -0600)]
LU-5814 llite: remove ll_objects_destroy()

Remove ll_objects_destroy(). This function is not needed for
interoperability with servers of version 2.4 or higher (after lustre
commit 5165cdd4).

Remove the then unused function lov_destroy() and its supporting
functions. Remove the lsm_destroy method of struct lsm_operations.

Remove the unused struct lov_stripe_md, MD export, and capa parameters
from obd_destroy() and its implementations.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: If8634b3d88a660d00891219c348622ec45361316
Reviewed-on: http://review.whamcloud.com/12618
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5418 echo: replace lov_stripe_md with lov_oinfo 47/12447/3
John L. Hammond [Wed, 29 Oct 2014 17:15:06 +0000 (12:15 -0500)]
LU-5418 echo: replace lov_stripe_md with lov_oinfo

In echo_client replace uses of struct lov_stripe_md with struct
lov_oinfo (since the instances of the former really only contained a
single instance of the latter). Remove the then unnecessary functions
echo_alloc_memmd(), echo_free_memmd(), osc_unpackmd(), and
obd_alloc_memmd(). Remove the struct lov_stripe_md * parameter from
obd_create(). Flatten osc_create() and osc_real_create() into a single
function.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I5fe276bcc56e1fa8138a4d3f20b9d5297cf74f3f
Reviewed-on: http://review.whamcloud.com/12447
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
9 years ago LU-3962 iokit: fix whitespace in scripts 56/10456/6
Andreas Dilger [Tue, 27 May 2014 17:53:01 +0000 (11:53 -0600)]
LU-3962 iokit: fix whitespace in scripts

Fix the whitespace in mds-survey and obdfilter-survey to use tabs
instead of 4-space indentation.  Fix coding style in several places.

Remove the use of a python script just to get the page size.  Instead,
use "getconf PAGE_SIZE" to do this.

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes \
testlist=mds-survey,obdfilter-survey

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I921007043c360b45d45fc03a8237edea9a3ebbe5
Reviewed-on: http://review.whamcloud.com/10456
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5537 ptlrpc: Fix an rq_no_reply assertion failure 40/11740/3
Li Wei [Wed, 3 Sep 2014 09:02:22 +0000 (17:02 +0800)]
LU-5537 ptlrpc: Fix an rq_no_reply assertion failure

An OSS had an assertion failure:

  LustreError: 5366:0:(ldlm_lib.c:2689:target_bulk_io()) @@@ timeout
  on bulk GET after 0+0s  req@ffff88083a61b400
  x1476486691018500/t0(4300509964)
  o4->8dda3382-83f8-6445-5eea-828fd59e4a06@192.168.1.116@o2ib1:0/0
  lens 504/448 e 391470 to 0 dl 1408494729 ref 2 fl Complete:/4/0 rc
  0/0
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) ASSERTION(
  req->rq_no_reply == 0 ) failed:
  Lustre: soaked-OST0000: Bulk IO write error with
  8dda3382-83f8-6445-5eea-828fd59e4a06 (at 192.168.1.116@o2ib1),
  client will retry: rc -110
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) LBUG
  Pid: 5432, comm: ll_ost_io03_003

  Call Trace:
  [<ffffffffa0641895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
  [<ffffffffa0641e97>] lbug_with_loc+0x47/0xb0 [libcfs]
  [<ffffffffa09cda4c>] ptlrpc_send_reply+0x4ec/0x7f0 [ptlrpc]
  [<ffffffffa09d4aae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
  [<ffffffffa09e4d75>] ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
  [<ffffffffa09dc1e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
  [<ffffffffa09e66f8>] ptlrpc_main+0x12e8/0x1990 [ptlrpc]
  [<ffffffff81069290>] ? pick_next_task_fair+0xd0/0x130
  [<ffffffff81529246>] ? schedule+0x176/0x3b0
  [<ffffffffa09e5410>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
  [<ffffffff8109abf6>] kthread+0x96/0xa0
  [<ffffffff8100c20a>] child_rip+0xa/0x20
  [<ffffffff8109ab60>] ? kthread+0x0/0xa0
  [<ffffffff8100c200>] ? child_rip+0x0/0x20

The thread in tgt_brw_write() had decided not to reply by setting
rq_no_reply, right before another thread tried to send an early reply
for the request.

Change-Id: I9096a098621a38610c0d0d2dff016c012fc4b7f2
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/11740
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
9 years ago LU-20 kernel: increase BH_LRU_SIZE to 16 77/12577/2
Sebastien Buisson [Wed, 5 Nov 2014 15:34:14 +0000 (16:34 +0100)]
LU-20 kernel: increase BH_LRU_SIZE to 16

As the kernel community did not want a complicated way of
modifying BH_LRU_SIZE, it was proposed to directly set it
to 16. This has been accepted.
This patch is merged in the upstream kernel:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/
linux.git/commit/?id=86cf78d73de8c6bfa89804b91ee0ace71a459961

Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Change-Id: I71fb455de9ec70ed90f86d402ae76ecfba1e1e61
Reviewed-on: http://review.whamcloud.com/12577
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
9 years ago LU-5729 osd: iput in case of error in osd_scrub_setup 25/12325/4
Sergey Cheremencev [Fri, 26 Sep 2014 13:00:56 +0000 (17:00 +0400)]
LU-5729 osd: iput in case of error in osd_scrub_setup

In case of ENOSPC from osd_scrub_file_store(), iput() is needed.
Otherwise there is a message in dmesg: "VFS: Busy inodes after
unmount of vdb. Self-destruct in 5 seconds. Have a nice day..."
Also added osd_oi_fini() for the case of an error from
osd_initial_OI_scrub() or osd_scrub_start().

Change-Id: Ibc6f487c9bd5b07f09cb3f7e3b5fc2bf1e329fb0
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Xyratex-bug-id: MRP-2109
Reviewed-on: http://review.whamcloud.com/12325
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5855 lfsck: misc fixes for zfs-based backend 52/12552/5
Fan Yong [Wed, 3 Sep 2014 16:25:33 +0000 (00:25 +0800)]
LU-5855 lfsck: misc fixes for zfs-based backend

It contains several fixes to make LFSCK work under DNE mode
for the zfs-based backend.

Test-Parameters: mdsfilesystemtype=zfs mdtfilesystemtype=zfs ostfilesystemtype=zfs mdscount=2 mdtcount=2 testlist=sanity-lfsck
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I8e8758336d4ce67667f7e3586475ddd72db2d419
Reviewed-on: http://review.whamcloud.com/12552
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
9 years ago LU-5833 lfsck: handle lfsck_open_dir() return-value properly 33/12533/3
Fan Yong [Tue, 2 Sep 2014 11:06:03 +0000 (19:06 +0800)]
LU-5833 lfsck: handle lfsck_open_dir() return-value properly

Inside lfsck_prep(), the value returned from lfsck_open_dir()
should be handled properly instead of being returned to the caller
directly. For example, a positive number from lfsck_open_dir() means
the end of the current directory, but if such a value is passed on to
lfsck_prep()'s caller, then the whole LFSCK first-stage scanning
will wrongly be regarded as done.
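
The return-value handling can be illustrated with the sketch below. It is
an assumption about the general shape of such a fix, not the patch itself:
open_dir_example() and prep_example() are hypothetical, and mapping a
positive value to 0 is just one plausible way of keeping "end of this
directory" from leaking out as "scan finished".

```c
#include <errno.h>

/*
 * Hypothetical return convention, following the description above:
 *   < 0  error
 *   == 0 directory opened, entries remain
 *   > 0  end of the current directory
 * For the sketch, the function simply echoes the requested outcome.
 */
static int open_dir_example(int outcome)
{
	return outcome;
}

/* Prepare scanning; do not leak the positive value to the caller. */
static int prep_example(int outcome)
{
	int rc = open_dir_example(outcome);

	if (rc > 0)
		rc = 0;	/* only this directory is done, not the whole scan */
	return rc;
}
```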

Test-Parameters: mdsfilesystemtype=zfs mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=sanity-lfsck
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9e5c32b8594a65f1b605196373034ace6c9d1881
Reviewed-on: http://review.whamcloud.com/12533
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
9 years ago LU-5817 clio: Do not allow group locks with gid 0 59/12459/4
Patrick Farrell [Mon, 10 Nov 2014 07:39:29 +0000 (01:39 -0600)]
LU-5817 clio: Do not allow group locks with gid 0

When a group lock with GID=0 is released (put_grouplock is
called), an assertion in cl_put_grouplock is hit.

We should not allow group lock requests with GID=0, instead
we should return -EINVAL.

Also fix random_group_id so it never returns gid==0.
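
Both checks can be sketched as follows. The function names
get_grouplock_checked() and random_group_id_example() are hypothetical and
the lock acquisition itself is omitted; only the GID validation and the
retry loop are shown.

```c
#include <errno.h>
#include <stdlib.h>

/* Reject group lock requests with GID 0 up front. */
static int get_grouplock_checked(unsigned long gid)
{
	if (gid == 0)
		return -EINVAL;
	/* The real lock acquisition would happen here. */
	return 0;
}

/* Pick a random group ID, retrying until it is non-zero. */
static int random_group_id_example(void)
{
	int gid;

	do {
		gid = rand();
	} while (gid == 0);
	return gid;
}
```

Rejecting the request at acquisition time means the release path never
sees GID 0, so the assertion described above cannot be reached this way.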

Change-Id: I56e58791742809da5353a4d8dfbf3b80a22f3814
Signed-off-by: Patrick Farrell <paf@cray.com>
Reviewed-on: http://review.whamcloud.com/12459
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: frank zago <fzago@cray.com>