Whamcloud - gitweb
fs/lustre-release.git
4 years ago LU-7144 tests: skip scrub/lfsck test under interoperation 20/17520/4
Fan Yong [Wed, 14 Oct 2015 16:26:45 +0000 (00:26 +0800)]
LU-7144 tests: skip scrub/lfsck test under interoperation

Since the scrub/lfsck tests only cover server-side logic,
it is unnecessary to run them in interoperation mode,
so skip them.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I044030b3bace787809d7cfd5622000b44a8be789
Reviewed-on: http://review.whamcloud.com/17520
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7450 osd: call commit_callback if no write updates 68/17268/9
Di Wang [Tue, 17 Nov 2015 16:17:12 +0000 (08:17 -0800)]
LU-7450 osd: call commit_callback if no write updates

If no write updates are needed in some failure cases,
top_trans_stop should still call commit_callback to help
release the top_thandle from the commit list. Otherwise
it will stay in the commit list forever, along with the
following top thandles; update logs will then accumulate
and cause a long recovery time.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I1feaf0bd6d20f14dfabb4572f49818083e697dbb
Reviewed-on: http://review.whamcloud.com/17268
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7209 doc: more accurate documentation for obdfilter-survey 46/16646/4
Richard Henwood [Fri, 25 Sep 2015 22:05:51 +0000 (17:05 -0500)]
LU-7209 doc: more accurate documentation for obdfilter-survey

Make the description of obdfilter-survey accurate and
precise.

Signed-off-by: Richard Henwood <richard.henwood@intel.com>
Change-Id: Icdd4adf53643e91dc8a2539f63977aae5fe28fe0
Reviewed-on: http://review.whamcloud.com/16646
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Omkar Kulkarni <omkar.kulkarni@intel.com>
Tested-by: Omkar Kulkarni <omkar.kulkarni@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7419 llog: lock new llog object creation 32/17132/4
Di Wang [Tue, 10 Nov 2015 15:31:55 +0000 (07:31 -0800)]
LU-7419 llog: lock new llog object creation

Lock the new llog object creation to prevent two
processes from creating the same object at the same time.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Icdc0eec534ca2f15cd0e195df951416953195346
Reviewed-on: http://review.whamcloud.com/17132
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7475 lnet: ensure buffer config symmetry 70/17370/2
Amir Shehata [Fri, 27 Nov 2015 03:30:01 +0000 (19:30 -0800)]
LU-7475 lnet: ensure buffer config symmetry

When showing the configuration, make sure to add a buffers block
in the YAML output, if routing is configured, in order to allow
the same YAML block to be fed back for configuration to LNet.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I3b269edf5b3688b500bbb3656c367bc82fff6b68
Reviewed-on: http://review.whamcloud.com/17370
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6020 gss: properly map buffers to sg 19/17319/4
Andrew Perepechko [Sat, 21 Nov 2015 13:55:24 +0000 (16:55 +0300)]
LU-6020 gss: properly map buffers to sg

A lot of buffer pointers passed to buf_to_sg() as input are coming
from vmalloc(), e.g. OBD_ALLOC_LARGE() in ptlrpc_add_rqs_to_pool().
sg_set_buf() uses virt_to_page() to map virtual addresses to
struct page, which does not work for vmalloc addresses.

The original code for buf_to_sg() caused the following crash:

BUG: unable to handle kernel paging request at ffffeb040057c040
IP: [<ffffffff81300367>] scatterwalk_pagedone+0x27/0x70
PGD 0
Oops: 0000 [#1] SMP
CPU 1
Pid: 2374, comm: ptlrpcd_3 Tainted: G           O 3.6.10-030610-generic
RIP: 0010:[<ffffffff81300367>]  [<ffffffff81300367>] scatterwalk_pagedone+0x27/0x70
RSP: 0018:ffff8801a3c178a8  EFLAGS: 00010282
RAX: ffffeb040057c040 RBX: ffff8801a3c17938 RCX: ffffeb040057c040
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801a3c17970
RBP: ffff8801a3c178a8 R08: 00000000000005a8 R09: ffff8801a3c17a40
R10: ffff8801a30370d0 R11: 0000000000000a68 R12: 0000000000000010
R13: ffff8801a3c17a08 R14: ffff8801a3c17970 R15: ffff88017d1c2c80
FS:  0000000000000000(0000) GS:ffff8801afa40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffeb040057c040 CR3: 0000000001c0c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ptlrpcd_3 (pid: 2374, threadinfo ffff8801a3c16000, task ffff8801a44e0000)
Stack:
 ffff8801a3c178b8 ffffffff813004bd ffff8801a3c17908 ffffffff8130303f
 ffff880100000000 ffffffff00000000 ffff8801a3c17908 ffff8801a3c17b18
 ffffc90015f015a8 0000000000000000 0000000000000010 0000000000000010
Call Trace:
 [<ffffffff813004bd>] scatterwalk_done+0x3d/0x50
 [<ffffffff8130303f>] blkcipher_walk_done+0x8f/0x230
 [<ffffffff8130a39f>] crypto_cbc_encrypt+0xff/0x190
 [<ffffffffa0688660>] ? aes_decrypt+0x80/0x80 [aesni_intel]
 [<ffffffffa0a1a1e4>] krb5_encrypt_bulk+0x164/0x5b0 [ptlrpc_gss]
 [<ffffffffa0a1a812>] gss_wrap_bulk_kerberos+0x1e2/0x490 [ptlrpc_gss]
 [<ffffffffa0a1600e>] lgss_wrap_bulk+0x2e/0x100 [ptlrpc_gss]
 [<ffffffffa0a0d98e>] gss_cli_ctx_wrap_bulk+0x44e/0x650 [ptlrpc_gss]
 [<ffffffffa0ab867c>] sptlrpc_cli_wrap_bulk+0x3c/0x70 [ptlrpc]
 [<ffffffffa0aba2d0>] sptlrpc_cli_wrap_request+0x60/0x360 [ptlrpc]
 [<ffffffffa0a8cde4>] ptl_send_rpc+0x164/0xc30 [ptlrpc]
 [<ffffffffa07be957>] ? libcfs_debug_msg+0x47/0x50 [libcfs]
 [<ffffffffa0a80ee0>] ptlrpc_send_new_req+0x3b0/0x940 [ptlrpc]
 [<ffffffffa0a86530>] ptlrpc_check_set+0x8e0/0x1d50 [ptlrpc]
 [<ffffffff816ac9f6>] ? schedule_timeout+0x146/0x260
 [<ffffffffa0ab0c9b>] ptlrpcd_check+0x4eb/0x5d0 [ptlrpc]
 [<ffffffffa0ab105f>] ptlrpcd+0x2df/0x420 [ptlrpc]
 [<ffffffff8108efa0>] ? try_to_wake_up+0x200/0x200
 [<ffffffffa0ab0d80>] ? ptlrpcd_check+0x5d0/0x5d0 [ptlrpc]
 [<ffffffff8107c5f3>] kthread+0x93/0xa0
 [<ffffffff816b8d04>] kernel_thread_helper+0x4/0x10
 [<ffffffff8107c560>] ? flush_kthread_worker+0xb0/0xb0
 [<ffffffff816b8d00>] ? gs_change+0x13/0x13

Change-Id: I346d50568b65ed10da2762ca34562fc2858a05d8
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Xyratex-bug-id: SNT-15
Reviewed-on: http://review.whamcloud.com/17319
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-5690 mount: fix lmd_parse() to handle commas in expr_list 36/17036/6
Jian Yu [Fri, 20 Nov 2015 19:13:18 +0000 (11:13 -0800)]
LU-5690 mount: fix lmd_parse() to handle commas in expr_list

The lmd_parse() function parses mount options using a comma as
the delimiter, without considering that commas can also appear in
an expr_list, which is valid LNET nid range syntax:

<expr_list>  :== '[' <range_expr> [ ',' <range_expr>] ']'

This patch fixes the above issue by using cfs_parse_nidlist()
to parse nid range list instead of using class_parse_nid_quiet()
to parse only one nid.
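
The parsing problem described above can be sketched in a few lines of
Python. This is only an illustration of bracket-aware splitting, not the
lmd_parse() or cfs_parse_nidlist() code; the function name and the option
names in the sample string (opt_a, nids, opt_b) are made up for the example.

```python
def split_mount_options(options):
    """Split on commas, but keep commas inside [...] with their option."""
    parts = []
    current = []
    depth = 0  # > 0 while we are inside a bracketed expr_list
    for ch in options:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            # A top-level comma ends the current option.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


# Hypothetical option string; only the bracketed nid range matters here.
sample = "opt_a=1,nids=10.0.0.[1,3-5]@tcp,opt_b"
print(sample.split(","))            # naive split breaks the nid range apart
print(split_mount_options(sample))  # bracket-aware split keeps it whole
```

A plain split on commas cuts the range "[1,3-5]" in two, which is the
failure mode the commit message describes.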

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I8ba6ee6eb31b4bb078a83d9db213cfca27b0fe66
Reviewed-on: http://review.whamcloud.com/17036
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6714 llog: test on-disk llog header values 87/16287/4
Mikhail Pershin [Thu, 20 Aug 2015 08:28:06 +0000 (11:28 +0300)]
LU-6714 llog: test on-disk llog header values

llog_test_2():
- Re-enable the disabled llog_open() test cases.
- Check that llog_log_hdr values are written atomically
  with the llog record addition/cancelling.

The patch also contains minor fixes:
- verify_handle() does header sanity checks first, then
  checks the number of records against the expected value.
- llog_test_3: rename test_3 static variables to show
  that they are related to test_3.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Iedcf15c8f365f9c2021abae3325edcaf08efc4c9
Reviewed-on: http://review.whamcloud.com/16287
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7030 security: put imp_sec after all requests drained off 71/16071/2
Niu Yawei [Tue, 25 Aug 2015 03:07:34 +0000 (23:07 -0400)]
LU-7030 security: put imp_sec after all requests drained off

imp_sec should be put after all requests have been drained off.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I35f572fcc79b2bd1991db14577226a3ea735630d
Reviewed-on: http://review.whamcloud.com/16071
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sebastien Buisson <sebastien.buisson@bull.net>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7530 mdt: Do not leak identity when no nodemap is present 19/17519/5
Oleg Drokin [Wed, 9 Dec 2015 02:44:18 +0000 (21:44 -0500)]
LU-7530 mdt: Do not leak identity when no nodemap is present

It looks like the nodemap structure on the export is sometimes not
there due to a race in old_init_ucred_common.
Move the nodemap check to the start so as not to leak the identity
reference in such a case.

The bug was introduced in commit 2aea469a3a6e214d from LU-7199

Also silence the warning as there's nothing sysadmins could do when
it happens.

Change-Id: I5329ccb16201a71a263eb586e3a486b26ff238db
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/17519
Tested-by: Jenkins
Reviewed-by: Kit Westneat <kit.westneat@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
4 years ago LU-6910 osp: add procfs values for OST reserved size 31/15731/20
Alexander Boyko [Mon, 10 Aug 2015 11:40:28 +0000 (14:40 +0300)]
LU-6910 osp: add procfs values for OST reserved size

osp_pre_status=-ENOSPC is used to skip an OST during object allocation.
The error was set when the OST's available space was less than 0.1% of
the total OST size. This value was not configurable, so procfs files
were added:
reserved_mb_low - low watermark; if available space is less
  than it, object allocation is stopped.
reserved_mb_high - high watermark; if available space is more
   than it, object allocation is enabled.

By default ~0.1% is reserved as the low watermark. The high watermark
is twice the low watermark by default.
The high and low watermarks can be changed with:
lctl set_param osp.lustre-OST0000-osc-MDT0000.reserved_mb_high=1024

When object allocation is disabled, clients can still append to
existing files, and 0.1% is too low for them. For example, if the OST
size is 8TB, 0.1% is 8GB; if the cluster has 1k clients, the reserved
space is ~8MB per client. The main purpose of the patch is to make it
possible to increase the reserved space.
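
The watermark behaviour and the 8TB arithmetic above can be sketched as
follows. This is a simplified model, not the osp code: the class name and
its methods are hypothetical, sizes are in decimal megabytes to match the
example, and leaving the state unchanged between the two watermarks is an
assumption drawn from the description of the two files.

```python
class OstReserve:
    """Toy model of the reserved_mb_low / reserved_mb_high watermarks."""

    def __init__(self, total_mb, low_mb=None, high_mb=None):
        # Defaults from the commit message: ~0.1% low, high = 2 x low.
        self.low_mb = low_mb if low_mb is not None else total_mb // 1000
        self.high_mb = high_mb if high_mb is not None else 2 * self.low_mb
        self.allocating = True

    def update(self, avail_mb):
        """Re-evaluate whether new objects may be allocated on this OST."""
        if avail_mb < self.low_mb:
            self.allocating = False   # below low watermark: stop
        elif avail_mb > self.high_mb:
            self.allocating = True    # above high watermark: resume
        # Between the watermarks the previous state is kept (assumption).
        return self.allocating


total_mb = 8 * 1000 * 1000            # 8TB OST, decimal units
ost = OstReserve(total_mb)
clients = 1000
print(ost.low_mb, ost.high_mb)        # default watermarks in MB
print(ost.low_mb / clients)           # reserved MB per client at the default
for avail in (9000, 7000, 12000, 17000):
    print(avail, ost.update(avail))
```

Raising reserved_mb_low in this model directly raises the space left for
appending clients, which is the motivation the commit gives.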

Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Xyratex-bug-id: MRP-2606
Change-Id: Ie48cc1a232f64aa7dc922000861004277fb47340
Reviewed-on: http://review.whamcloud.com/15731
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7136 test: allow more time for copytools to stop 99/17499/4
John L. Hammond [Mon, 7 Dec 2015 17:40:54 +0000 (11:40 -0600)]
LU-7136 test: allow more time for copytools to stop

In sanity-hsm allow up to 200 seconds for the copytool to stop. This
is needed to prevent sporadic failures due to slow NFS (for the
copytool log file) delaying the termination of the copytool.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Icafc4e9c5a00c849dcb479233826de058d2ede62
Reviewed-on: http://review.whamcloud.com/17499
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6767 osd-zfs: Track readonly status of ZFS 00/15400/3
Nathaniel Clark [Thu, 25 Jun 2015 21:24:34 +0000 (17:24 -0400)]
LU-6767 osd-zfs: Track readonly status of ZFS

Return READONLY from osd_statfs() if underlying ZFS has been set to
READONLY, or if osd_ro() has been called.  This adds a callback for
ZFS_PROP_READONLY for when it's changed.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ib7f35925904b1d93f9a457936585e9783635c849
Reviewed-on: http://review.whamcloud.com/15400
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6229 utils: fix lustre_rsync bug of cascade move 14/14914/3
Li Xi [Fri, 22 May 2015 04:07:00 +0000 (12:07 +0800)]
LU-6229 utils: fix lustre_rsync bug of cascade move

When replaying the changelog, destination files have to be
put into a special directory if their parent directory
currently has a path other than its ultimate path
because of renaming. As the replay proceeds and
the parent directory is moved to its ultimate path,
the child files should be moved under the parent directory;
this is called a cascade move.

As long as a directory has child files under the special directory,
a cascade move should happen, whether or not the directory is being
renamed from the special directory. This patch fixes the problem
that the cascade move is missing when the directory is being renamed
from an ordinary path.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I2d21604b81fe0cf08df1af2bfccc90a32986bf05
Reviewed-on: http://review.whamcloud.com/14914
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7515 obdclass: add export for lprocfs_stats_alloc_one() 43/17443/5
Chennaiah Palla [Thu, 3 Dec 2015 10:57:16 +0000 (16:27 +0530)]
LU-7515 obdclass: add export for lprocfs_stats_alloc_one()

 When compiling Lustre without optimization, when using GCOV,
 the lprocfs_stats_alloc_one() symbol is not properly exported to other modules
 and causes the ptlrpc module to fail loading with an unknown symbol.
 Added EXPORT_SYMBOL(lprocfs_stats_alloc_one) so that this works properly.

Seagate-bug-id: MRP-3188
Signed-off-by: Chennaiah Palla <chennaiah.palla@seagate.com>
Change-Id: I8ef02a0e0bf519fa93f85cb162a6340e3feeb736
Reviewed-on: http://review.whamcloud.com/17443
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-3569 utils: remove ll_recover_lost_found_obj 77/16477/6
Lai Siyao [Tue, 13 Oct 2015 09:22:40 +0000 (17:22 +0800)]
LU-3569 utils: remove ll_recover_lost_found_obj

Remove the obsolete tool ll_recover_lost_found_obj.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: I5d5f33f5c9d68bb1f05d7ab0da6fb2986e873501
Reviewed-on: http://review.whamcloud.com/16477
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7446 clio: lov_io_init() should return error code 40/17240/4
Bobi Jam [Fri, 13 Nov 2015 10:48:53 +0000 (18:48 +0800)]
LU-7446 clio: lov_io_init() should return error code

lov_io_init_empty/release() should return an error code instead of
true in the error case.

Fault IO needs to handle restart when accessing an HSM released
file.

Add a test case.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I4953c12c1e9b82a16aed9b8b1e3fe6e38d783b24
Reviewed-on: http://review.whamcloud.com/17240
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6732 llite: ll_write_begin/end not passing on errors 02/15302/6
Hiroya Nozaki [Tue, 16 Jun 2015 05:41:00 +0000 (14:41 +0900)]
LU-6732 llite: ll_write_begin/end not passing on errors

Because of the implementation of generic_perform_write(), write(2)
may return 0 with no errno even if EDQUOT or ENOSPC actually
happened in it.
This patch fixes the issue by setting a proper errno in
ci_result and retrieving it in ll_file_io_generic.
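
The idea of saving the error and handing it back when nothing was written
can be sketched as below. This is a userspace model under stated
assumptions, not the llite code: perform_write(), file_write() and the two
fake back ends are hypothetical names, and saved_error merely plays the
role that ci_result plays in the commit message.

```python
import errno


def perform_write(chunks, write_chunk):
    """Write chunks in order; return (bytes_written, saved_error)."""
    written = 0
    saved_error = 0  # stands in for ci_result in this sketch
    for chunk in chunks:
        rc = write_chunk(chunk)
        if rc < 0:
            saved_error = rc  # remember why the write stopped
            break
        written += rc
    return written, saved_error


def file_write(chunks, write_chunk):
    """Return bytes written, or the saved error if nothing was written."""
    written, saved_error = perform_write(chunks, write_chunk)
    if written > 0:
        return written
    return saved_error


def full_disk(chunk):
    """Fake back end that is always out of space."""
    return -errno.ENOSPC


def disk_with_room_for_one():
    """Fake back end that accepts one chunk, then runs out of space."""
    state = {"left": 1}

    def write_chunk(chunk):
        if state["left"] == 0:
            return -errno.ENOSPC
        state["left"] -= 1
        return len(chunk)

    return write_chunk


print(file_write([b"abcd", b"efgh"], full_disk))
print(file_write([b"abcd", b"efgh"], disk_with_room_for_one()))
```

Without the saved error, the first call would report 0 bytes and no error,
which is the symptom described above; a partial write still reports the
bytes that were written.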

Signed-off-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Change-Id: I3fc986b57d703ad5fbf41e1ea8182d2d561e8005
Reviewed-on: http://review.whamcloud.com/15302
Tested-by: Maloo <hpdd-maloo@intel.com>
Tested-by: Jenkins
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-1606 misc: clean up DFID related error messages 56/6156/9
Andreas Dilger [Thu, 25 Apr 2013 04:23:57 +0000 (22:23 -0600)]
LU-1606 misc: clean up DFID related error messages

Improve the error messages related to DFID output and parsing left
over from removal of LPU64/LPX64 usage in userspace.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I4b4fcb3cc389b8d8ec4375fa92bfee9b353ebbe5
Reviewed-on: http://review.whamcloud.com/6156
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-2524 test: Clean up sanity-quota 80/14680/8
James Nunez [Tue, 5 May 2015 17:53:30 +0000 (11:53 -0600)]
LU-2524 test: Clean up sanity-quota

Conduct miscellaneous cleanup to sanity-quota including:
Removing the `-p` (parents) option from many calls to mkdir
Replace `lfs setstripe` with $SETSTRIPE
Added a check for and a call to `error` with error messages for a variety
of common routines, like mkdir, or for functions that return a value.
Replace `…` with $(...)
Removed linefeed escape after |, ||, & and && operators.
Modified parameters in test 4b so that the --inode-grace value exceeds
the valid range.
Removed unused variables
Removed $ from variables inside $(())

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Iadedea0bd0a0f85235e0bb908ee0a6ed36503eb3
Reviewed-on: http://review.whamcloud.com/14680
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7148 osc: Remove remains of osc_ast_guard 92/16392/2
Oleg Drokin [Mon, 14 Sep 2015 04:18:46 +0000 (00:18 -0400)]
LU-7148 osc: Remove remains of osc_ast_guard

osc_ast_guard has been removed by the clio simplification.
Remove the extern declaration and lock class definition.

Change-Id: Ibcf14e7aebe1dab8b586d3cd8d81560f6d3dcc81
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/16392
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
4 years ago LU-5951 ptlrpc: track unreplied requests 59/16759/4
Niu Yawei [Thu, 8 Oct 2015 02:14:56 +0000 (22:14 -0400)]
LU-5951 ptlrpc: track unreplied requests

The request xid was used to make sure the ost object timestamps
are updated properly by out-of-order setattr/punch/write
requests. However, this mechanism was broken by the multiple rcvd
slot feature, which deferred the xid assignment from request
packing to request sending.

This patch moves the xid assignment back to request packing, and
the lowest unreplied xid is now found by scanning a list of unreplied
requests instead of scanning the sending & delay lists.

This patch also skips packing the known replied XID in connect
and disconnect requests, so that we can make sure the known replied
XID only increases on both the server & client side.
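
A minimal sketch of the bookkeeping described above, assuming a single
sorted list of unreplied xids per import. The class and method names are
hypothetical and this is not the ptlrpc implementation; it only shows xid
assignment at pack time and the lookup of the lowest unreplied xid.

```python
import bisect
import itertools


class UnrepliedTracker:
    """Assign xids at pack time and track which ones still lack a reply."""

    def __init__(self):
        self._next_xid = itertools.count(1)
        self._unreplied = []  # kept sorted in ascending xid order

    def pack_request(self):
        """Assign the xid when the request is packed, not when it is sent."""
        xid = next(self._next_xid)
        bisect.insort(self._unreplied, xid)
        return xid

    def reply_received(self, xid):
        """Drop the xid from the unreplied list once its reply arrives."""
        idx = bisect.bisect_left(self._unreplied, xid)
        if idx < len(self._unreplied) and self._unreplied[idx] == xid:
            del self._unreplied[idx]

    def lowest_unreplied(self):
        """Return the smallest unreplied xid, or None if there is none."""
        return self._unreplied[0] if self._unreplied else None


tracker = UnrepliedTracker()
first, second, third = (tracker.pack_request() for _ in range(3))
tracker.reply_received(second)     # replies may arrive out of order
print(tracker.lowest_unreplied())  # still the first request
tracker.reply_received(first)
print(tracker.lowest_unreplied())  # now the third request
```

Because the list is sorted, the lowest unreplied xid is always its first
element, so no scan of separate sending and delayed lists is needed.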

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ic98e1599085871c0ac08d28609a044c79d5af75d
Reviewed-on: http://review.whamcloud.com/16759
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Grégoire Pichon <gregoire.pichon@bull.net>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7508 ldlm: Don't check opcode with NULL rq_reqmsg 14/17414/2
Jeremy Filizetti [Tue, 1 Dec 2015 19:54:10 +0000 (14:54 -0500)]
LU-7508 ldlm: Don't check opcode with NULL rq_reqmsg

When GSS is enabled it's possible to have a NULL rq_reqmsg
if a bad signature or no context is returned during the unwrap
of the request.  Don't check the opcode in this case.

Signed-off-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Change-Id: I3a74dff7638b318190c5c4ad73acbe7ec299aa80
Reviewed-on: http://review.whamcloud.com/17414
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
4 years ago LU-7268 scrub: NOT assign LMA for EA inode 43/17043/3
Fan Yong [Thu, 17 Sep 2015 16:40:03 +0000 (00:40 +0800)]
LU-7268 scrub: NOT assign LMA for EA inode

Originally, when OI scrub scans the device, if the target inode has
no FID-in-LMA EA, then it will generate an IGIF mode FID and store
it in the LMA EA. Such behavior is not suitable if the target inode
is used for large EA. The OI scrub should skip the EA inode that is
marked as "LDISKFS_EA_INODE_FL".

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I52b05b864ef8a2797a2f3dda0f80f95227809c34
Reviewed-on: http://review.whamcloud.com/17043
Tested-by: Jenkins
Reviewed-by: Kalpak Shah <kalpak.shah@seagate.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years ago LU-6298 hsm: shutdown HSM CDTs in parallel 01/13901/4
John L. Hammond [Wed, 25 Feb 2015 22:12:49 +0000 (16:12 -0600)]
LU-6298 hsm: shutdown HSM CDTs in parallel

In sanity-hsm.sh rewrite copytool_cleanup() to shut down and restart
the MDT HSM coordinators in parallel. This saves about 8 * (MDSCOUNT -
1) seconds per call.
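
The saving quoted above can be reproduced with a small worked example. The
figure of about 8 seconds per coordinator is inferred from the formula in
the commit message rather than stated in it, and the function names are
hypothetical.

```python
def cleanup_seconds(mds_count, per_cdt_seconds=8, parallel=False):
    """Estimated time to cycle the coordinators on all MDTs."""
    if parallel:
        # All coordinators are cycled at once, so the cost is paid once.
        return per_cdt_seconds
    # One after another: the cost is paid once per MDT.
    return per_cdt_seconds * mds_count


def seconds_saved(mds_count, per_cdt_seconds=8):
    """Difference between the sequential and the parallel estimate."""
    sequential = cleanup_seconds(mds_count, per_cdt_seconds)
    in_parallel = cleanup_seconds(mds_count, per_cdt_seconds, parallel=True)
    return sequential - in_parallel


for mds_count in (1, 2, 4):
    print(mds_count, seconds_saved(mds_count))
```

With a single MDT there is nothing to parallelize, so the estimated saving
is zero; it grows linearly with MDSCOUNT.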

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I75445ad126dc73251a3d056611133e3ab6b83362
Reviewed-on: http://review.whamcloud.com/13901
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-5921 tests: enhance server target mount race testing 02/17302/2
Bruno Faccini [Fri, 20 Nov 2015 14:38:11 +0000 (15:38 +0100)]
LU-5921 tests: enhance server target mount race testing

This patch is a follow-on to LU-5299 to strengthen and enhance
concurrent server target mount race testing.
It uses the OBD_RACE() feature to better set up a concurrent/racy
situation, and also handles all mount errors instead
of only EALREADY.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I16a94e5aa046e15096d2e55d57e22899a93fa03f
Reviewed-on: http://review.whamcloud.com/17302
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7383 mdt: retry for busy lock during migration 48/17048/5
Di Wang [Tue, 3 Nov 2015 15:32:13 +0000 (07:32 -0800)]
LU-7383 mdt: retry for busy lock during migration

In migration, if the lock of the migrating object
is cached on another node, it should revoke
the lock and retry instead of returning -EBUSY.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I1317681a892b9a21f2c78d7696ca6f94d43bd9bc
Reviewed-on: http://review.whamcloud.com/17048
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago New tag 2.7.64 2.7.64 v2_7_64 v2_7_64_0
Oleg Drokin [Tue, 8 Dec 2015 04:42:39 +0000 (23:42 -0500)]
New tag 2.7.64

Change-Id: I79fb95af8bc9e979edf3214315219f786eb12599
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7428 test: disable conf-sanity, test_84 82/17482/2
Bob Glossman [Fri, 4 Dec 2015 19:50:15 +0000 (11:50 -0800)]
LU-7428 test: disable conf-sanity, test_84

Add the failing test to ALWAYS_EXCEPT.
This is a temporary workaround until a real
fix for the test failure is developed.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I73e00d658a9e7728ce52b5dc90741e9a18ce15f9
Reviewed-on: http://review.whamcloud.com/17482
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years ago LU-7437 lctl: list_param -R can't work correctly 23/17223/4
Emoly Liu [Mon, 23 Nov 2015 07:30:10 +0000 (15:30 +0800)]
LU-7437 lctl: list_param -R can't work correctly

We shouldn't call lprocfs_param_pattern() inside listparam_display(),
because it adds the prefix "/proc/{fs,sys}/{lnet,lustre}" each
time. Removing the call allows the parameters to be listed
recursively. A similar issue in {set/get}param is fixed as well.
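
The prefixing issue can be illustrated with a small recursive walk over an
in-memory tree. This is a sketch under stated assumptions, not the lctl
code: param_to_path() is a hypothetical stand-in for the prefixing step,
the fixed prefix is just one of the possible values, and fake_tree is
invented sample data.

```python
PREFIX = "/proc/fs/lustre"  # one possible prefix, fixed here for simplicity


def param_to_path(param):
    """Hypothetical prefixing step: dotted parameter name -> full path."""
    return PREFIX + "/" + param.replace(".", "/")


def walk(node, path):
    """Yield every path at or below `path`.

    `path` is already prefixed, so the recursion only extends it and
    never applies the prefix again.
    """
    yield path
    if isinstance(node, dict):
        for name in sorted(node):
            yield from walk(node[name], path + "/" + name)


def list_param_recursive(root, param):
    """Resolve `param` in the tree and list everything beneath it."""
    node = root
    for part in param.split("."):
        node = node[part]
    # The prefix is applied exactly once, before the recursion starts.
    return list(walk(node, param_to_path(param)))


# Invented sample tree: directories are dicts, leaf parameters are None.
fake_tree = {"osc": {"OST0000": {"stats": None, "max_dirty_mb": None}}}
for path in list_param_recursive(fake_tree, "osc"):
    print(path)
```

If walk() called param_to_path() on every level, each nested entry would
gain another copy of the prefix and the recursive listing would point at
paths that do not exist.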

Also, this patch adds sanity.sh test_401 to verify this function.

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: I21acec364cdbdfc025979153f66a87d44c9136e8
Reviewed-on: http://review.whamcloud.com/17223
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7297 osd-zfs: initialize oh_lock 19/16919/3
Olaf Faaland [Thu, 22 Oct 2015 20:31:05 +0000 (13:31 -0700)]
LU-7297 osd-zfs: initialize oh_lock

The ZFS osd was not initializing od_brw_stats.hist[].oh_lock.
This rectifies that.

Change-Id: I3f637b73c77908c2297bfab97e33eca63b0d5986
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: http://review.whamcloud.com/16919
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
4 years ago LU-1026 ldiskfs: make bitmaps corruption not fatal 79/16679/8
Wang Shilong [Sat, 11 Jul 2015 03:49:55 +0000 (11:49 +0800)]
LU-1026 ldiskfs: make bitmaps corruption not fatal

We still hit bitmap problems on rhel6 series kernels;
corruption happens because the ext4_mb_check_ondisk_bitmap()
check fails and the FS becomes RO again:

ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group
294corrupted: 20180 blocks free in bitmap, 20181 - in gd
Aborting journal on device dm-6-8.
LDISKFS-fs (dm-6): Remounting filesystem read-only
ldiskfs_mb_new_blocks: Updating bitmap error: [err -30]
 [pa ffff880d9d6e4d68] [phy 14974976] [logic 8192] [len 3072]
 [free 3072] [error 1] [inode 278678]
ldiskfs_ext_new_extent_cb: Journal has aborted

This might be caused by some ext4 internal bugs. This patch
does the following things:

1. ext4_read_block_bitmap() has already reported the reason
why it failed, so the caller doesn't need to call ext4_error() again.
2. Mark the block group corrupt and use ext4_warning() instead
of ext4_error().

There are still some places where bitmap corruption is not handled;
let's leave them for now, and if it really hurts, let's add the
same handling logic later.

Tested with the following script:

TEST_DEV="/dev/sdb"
TEST_MNT="/mnt/ext4"

mkdir -p $TEST_MNT
mkfs.ext4 -F $TEST_DEV >&/dev/null

# Fill the filesystem with a large file, then unmount.
mount -t ldiskfs $TEST_DEV $TEST_MNT
dd if=/dev/zero of=$TEST_MNT/largefile \
	oflag=direct bs=10485760 count=200
umount $TEST_MNT
# Overwrite on-disk blocks with zeroes to corrupt the bitmaps.
dd if=/dev/zero of=$TEST_DEV bs=4096 seek=641 \
	count=10 oflag=direct
# Remount and check that the filesystem can still be written to.
mount -t ldiskfs $TEST_DEV $TEST_MNT
rm -f $TEST_MNT/largefile
dd if=/dev/zero of=$TEST_MNT/largefile oflag=direct \
	bs=10485760 count=200 && echo \
	"FILESYSTEM still usable after bitmaps corrupts happen"
dmesg | tail
umount $TEST_MNT
e2fsck $TEST_DEV -y

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Iabb6ebf719d80d9ba4f41bee0b237e304212832b
Reviewed-on: http://review.whamcloud.com/16679
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-7315 osd-ldiskfs: handle pdo lock properly 24/16924/6
Fan Yong [Sat, 3 Oct 2015 01:45:32 +0000 (09:45 +0800)]
LU-7315 osd-ldiskfs: handle pdo lock properly

Inside osd_dirent_check_repair(), if the logic reaches
"goto again", it only unlocks "hlock" without setting
the variable @hlock to NULL. Although this does not cause any
logic failure, it may confuse readers. This
patch sets "hlock = NULL;" explicitly to avoid trouble.

On the other hand, inside ldiskfs, the pdo lock users need
to properly check whether the lock handler is NULL.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9db9dc758a2976849c299f76e06723e796da235d
Reviewed-on: http://review.whamcloud.com/16924
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6856 zfs: handle non existing file in osd_object_ref_del 11/15611/3
Jinshan Xiong [Mon, 1 Jun 2015 18:08:07 +0000 (11:08 -0700)]
LU-6856 zfs: handle non existing file in osd_object_ref_del

Remove the false assertion in zfs:osd_object_ref_del() because this
may be in the cleanup path of error handling.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ib7b9d80816bdab7f68b36a33e95140ea7f3eae8c
Reviewed-on: http://review.whamcloud.com/15611
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
4 years ago LU-6802 ptlrpc: reset imp_replay_cursor 51/17351/2
Hongchao Zhang [Mon, 16 Nov 2015 02:33:51 +0000 (10:33 +0800)]
LU-6802 ptlrpc: reset imp_replay_cursor

On the client side, the replay cursor used to speed up the lookup
of committed open requests in its obd_import should be reset
for a normal connection (not reconnection) during recovery.

Change-Id: I68816780a5d79053d9109cb68ae1c3b8ea13ede8
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/17351
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years ago LU-6693 out: not return NULL in object_update_param_get 17/16417/4
Di Wang [Sun, 4 Oct 2015 18:26:26 +0000 (11:26 -0700)]
LU-6693 out: not return NULL in object_update_param_get

Return ERR_PTR in object_update_param_get() for all cases to
avoid unnecessary confusion to callers.
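
As a rough illustration of the convention described above, here is a
user-space sketch of a getter that reports every failure as an error
pointer and never returns NULL. The ERR_PTR()/IS_ERR()/PTR_ERR() helpers
are local stand-ins for the kernel ones, and param_table/param_get() are
hypothetical names, not the real object_update layout.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical parameter table, not the real object_update layout. */
struct param_table {
	size_t       pt_count;
	const char **pt_params;
};

/* Every failure is reported as ERR_PTR(-errno); NULL is never returned. */
static const char *param_get(const struct param_table *tbl, size_t index)
{
	if (tbl == NULL || index >= tbl->pt_count)
		return ERR_PTR(-EINVAL);
	if (tbl->pt_params[index] == NULL)
		return ERR_PTR(-ENOENT);
	return tbl->pt_params[index];
}
```

With a single failure convention, callers only need one IS_ERR() check
instead of testing for both NULL and error pointers.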

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Idfcc19d99bbf308759481b3d60d95341745d19e8
Reviewed-on: http://review.whamcloud.com/16417
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7316 build: Update ZFS/SPL version to 0.6.5.3 77/16877/2
Nathaniel Clark [Mon, 19 Oct 2015 18:13:49 +0000 (14:13 -0400)]
LU-7316 build: Update ZFS/SPL version to 0.6.5.3

Bug Fixes

* Fix CPU hotplug zfsonlinux/spl#482
* Disable dynamic taskqs by default to avoid deadlock
  zfsonlinux/spl#484
* Don't import all visible pools in zfs-import init script
  zfsonlinux/zfs#3777
* Fix use-after-free in vdev_disk_physio_completion
  zfsonlinux/zfs#3920
* Fix avl_is_empty(&dn->dn_dbufs) assertion zfsonlinux/zfs#3865

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I36347630be2506bee4ff0a05f1b236ba2ba7a0ae
Reviewed-on: http://review.whamcloud.com/16877
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
4 years agoLU-4423 lnet: don't use iovec instead of kvec 05/17205/3
Al Viro [Thu, 26 Nov 2015 15:10:02 +0000 (10:10 -0500)]
LU-4423 lnet: don't use iovec instead of kvec

Replace struct iovec with struct kvec.

Linux commit: f351bad2b4b4bb74810ad4f127f6602e2d2ae403

Change-Id: Ib7bb49069e42ca82d66a149617361c73ee4d710d
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: http://review.whamcloud.com/17205
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7462 mdd: check object existence 23/17323/3
Di Wang [Sat, 21 Nov 2015 15:33:29 +0000 (07:33 -0800)]
LU-7462 mdd: check object existence

Check object existence in mdd_is_parent()
and _mdd_lookup(), so that the subsequent
attribute retrieval will not panic if the
object does not exist.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Icae243949e202f9d6a4d38b9823373101a093c74
Reviewed-on: http://review.whamcloud.com/17323
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-7375 lbuild: add missing case to lbuild 24/17024/4
Bob Glossman [Mon, 2 Nov 2015 23:39:37 +0000 (15:39 -0800)]
LU-7375 lbuild: add missing case to lbuild

Add the missing case of el6.7 to autodetect_target

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I6a8032eb03b2fb3002798aa49ee585879cf711c3
Reviewed-on: http://review.whamcloud.com/17024
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7103 test: avoid cat of /dev/urandom 50/17350/3
Bob Glossman [Tue, 24 Nov 2015 19:29:43 +0000 (11:29 -0800)]
LU-7103 test: avoid cat of /dev/urandom

Use a different command to generate a random number.
Using cat /dev/urandom doesn't work in all cases.

Test-Parameters: clientdistro=el7
testlist=ost-pools,ost-pools,ost-pools,ost-pools,ost-pools,ost-pools,ost-pools,
ost-pools,ost-pools,ost-pools,ost-pools,ost-pools,ost-pools,ost-pools

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I63cfad2654cf654460263529457538965e373f82
Reviewed-on: http://review.whamcloud.com/17350
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7447 lfsck: correct nlink attr for new created dir 69/17269/5
Fan Yong [Sat, 3 Oct 2015 07:49:46 +0000 (15:49 +0800)]
LU-7447 lfsck: correct nlink attr for new created dir

For a newly created directory object, there are three main operations:
1) insert dot entry
2) insert dotdot entry
3) ref_add on the directory object.

Sometimes the 3rd step happens ahead of the 1st, and sometimes between
the 1st and 2nd. Developers usually assume that this order is not
important, but that assumption is wrong, because ldiskfs sets the
newly created directory object's nlink attr to 2 when inserting the
dotdot entry. If ref_add() is then called after ".." is inserted, the
empty directory object's nlink attr becomes 3.

To avoid this problem and make the OSD API order-independent,
osd-ldiskfs now backs up the newly created dir object's nlink attr
before calling ldiskfs_add_dot_dotdot(), then restores it after
ldiskfs_add_dot_dotdot() returns.
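
The backup/restore idea can be modelled with a toy sketch. All names
below are hypothetical stand-ins (toy_add_dot_dotdot() mimics only the
nlink side effect of ldiskfs_add_dot_dotdot()), and the starting nlink
of 1 is an assumption made for the example.

```c
/* Toy model of a directory inode; only the link count matters here. */
struct toy_inode {
	unsigned int nlink;
};

/* Stand-in for ldiskfs_add_dot_dotdot(): forces nlink to 2 as a side effect. */
static void toy_add_dot_dotdot(struct toy_inode *dir)
{
	dir->nlink = 2;
}

/* Stand-in for ref_add() on the directory object. */
static void toy_ref_add(struct toy_inode *dir)
{
	dir->nlink++;
}

/* Order-independent insertion: save nlink, insert entries, restore it. */
static void toy_insert_dot_dotdot(struct toy_inode *dir)
{
	unsigned int saved = dir->nlink;

	toy_add_dot_dotdot(dir);
	dir->nlink = saved;
}
```

Starting from nlink 1, calling toy_ref_add() before or after
toy_insert_dot_dotdot() ends at 2 either way, whereas calling the raw
toy_add_dot_dotdot() followed by toy_ref_add() ends at 3.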

This patch also adds some missing logic for LFSCK to declare the
insertion of dot/dotdot entries, and removes some redundant logic.

Add e2fsck check for on-disk consistency verification after LFSCK.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I065f683388417f3e23d47471c065419bd9bfcb19
Reviewed-on: http://review.whamcloud.com/17269
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
4 years agoLU-7371 test: wrong read length over isize 60/17060/3
Li Xi [Fri, 6 Nov 2015 09:06:08 +0000 (17:06 +0800)]
LU-7371 test: wrong read length over isize

This patch adds tests to check read length is correct if reading
a file of size 4095.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I0c0f6b378fa4af053ed54f2f5dea2418191a7b69
Reviewed-on: http://review.whamcloud.com/17060
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-6851 lnet: Ignore hops if not explicitly set 19/15719/12
Amir Shehata [Fri, 24 Jul 2015 21:14:38 +0000 (14:14 -0700)]
LU-6851 lnet: Ignore hops if not explicitly set

Since the # of hops is not a mandatory parameter, the LU-6060
patch causes problems on existing systems, because it changes
the way a route is determined to be down.

To fix this, the # of hops now defaults to LNET_UNDEFINED_HOPS
if no hop count is specified.

LNET_UNDEFINED_HOPS is defined to ((__u32)-1). When it's printed as
%d, it displays as -1.

__u32 is used throughout the call stack for hop count to explicitly
define the size of the hop count and to avoid any sizing issues when
passing data to and from the kernel.

To keep existing behavior both lnet_compare_routes() and LNetDist()
will treat undefined hop count as hop count 1.
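
A small user-space sketch of the sentinel handling described above.
uint32_t stands in for __u32, and effective_hops()/format_hops() are
hypothetical helper names, not functions from the LNet source. The
signed conversion in format_hops() is implementation-defined in C but
yields -1 on common two's-complement platforms.

```c
#include <stdint.h>
#include <stdio.h>

#define LNET_UNDEFINED_HOPS ((uint32_t)-1)

/* Hop count used for comparisons: an unset value counts as one hop. */
static uint32_t effective_hops(uint32_t hops)
{
	return hops == LNET_UNDEFINED_HOPS ? 1 : hops;
}

/* Format a hop count the way a %d conversion would show it. */
static int format_hops(char *buf, size_t len, uint32_t hops)
{
	return snprintf(buf, len, "%d", (int)hops);
}
```

Keeping the sentinel distinct from 1 lets code tell "explicitly set to 1"
apart from "never set", while comparisons still behave as before.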

When executing the logic in lnet_parse_rc_info() there is no
longer an assumption that the default hop count is 1. If
the hop count is 1 then it must've been explicitly set by
the user.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I1a28a35a4edc2437cf95cb9d455e59c8102736fa
Reviewed-on: http://review.whamcloud.com/15719
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf@sgi.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7400 lod: register stop callbacks at create 59/17059/8
Alex Zhuravlev [Fri, 6 Nov 2015 07:20:53 +0000 (10:20 +0300)]
LU-7400 lod: register stop callbacks at create

In some cases we stop a just-created transaction (i.e. it's
empty, not started), but then top_trans_stop() waits
indefinitely for stop callbacks which were supposed to be
registered at top_trans_start(). Instead, register them at
top_trans_create().

Instead of registering the commit callback in top_trans_start(),
we should register the commit callback once the top thandle
is added to the commit list, because in some error cases
top_trans_start() might not be called; the top thandle would
then stay in the commit list forever and also block
llog cancellation.

Change-Id: I1ca528e2f8f5e4d9cf6b5dd484653a055e17cc6c
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/17059
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7199 nodemap: assign nodemap to export before connecting 02/16802/7
Kit Westneat [Tue, 13 Oct 2015 02:58:18 +0000 (22:58 -0400)]
LU-7199 nodemap: assign nodemap to export before connecting

This patch moves the nodemap assignment to be before the connection
is active, and the nodemap deassignment to be after the connection
is made inactive. It also checks for null nodemaps and returns
-EACCES if the nodemap is not set in order to avoid a kernel panic.

Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Id4dd02cf5b4208bda6d6f17aed6adb13b18c7731
Reviewed-on: http://review.whamcloud.com/16802
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7428 tests: write superblock in conf-sanity test_84 71/17371/3
Andreas Dilger [Fri, 27 Nov 2015 07:29:04 +0000 (00:29 -0700)]
LU-7428 tests: write superblock in conf-sanity test_84

In conf-sanity.sh test_84() a newly-formatted MDS filesystem's block
device is set read-only immediately after mount via replay_barrier(),
which may cause the initial formatting or configs to be discarded,
resulting in a wide variety of different failure modes for this test.

Ensure that the superblock and other configuration logs are flushed
to disk before replay_barrier() is called, so that the MDS can mount
properly again later in the test.

Test-Parameters: testlist=conf-sanity,conf-sanity,conf-sanity,conf-sanity
Test-Parameters: testlist=conf-sanity,conf-sanity,conf-sanity,conf-sanity
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I95894b600ed7596c2014cba4f35fef4b443ebbe5
Reviewed-on: http://review.whamcloud.com/17371
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7329 obdclass: sync device to flush journal callbacks 52/17052/3
Bruno Faccini [Thu, 5 Nov 2015 15:44:52 +0000 (16:44 +0100)]
LU-7329 obdclass: sync device to flush journal callbacks

Since llog_test_10() was added to check correct Catalog
wrap-around as part of LU-6556, frequent auto-test failures
have been encountered due to OOM situations on VMs, caused by
a large backlog of journal commit callbacks induced by the heavy
LLOG activity that the new test generates.
This patch forces frequent device sync to flush these journal
commit callbacks.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: If06914a395f5bacbbb55fcdc229ffcd47d742843
Reviewed-on: http://review.whamcloud.com/17052
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7098 osd-ldiskfs: don't alloc inode directly 04/16804/5
Yang Sheng [Tue, 13 Oct 2015 07:43:27 +0000 (15:43 +0800)]
LU-7098 osd-ldiskfs: don't alloc inode directly

We should allocate an ldiskfs_inode_info instead of allocating an
inode directly. Otherwise the subsequent ldiskfs_add_entry() will
overflow.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: Ib88f847ea8f160f6f0d0c7e7e680247d7ec96fb6
Reviewed-on: http://review.whamcloud.com/16804
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
4 years agoLU-1095 mdc: remove console spew from mdc_ioc_fid2path 78/17078/3
Andreas Dilger [Fri, 6 Nov 2015 21:50:01 +0000 (14:50 -0700)]
LU-1095 mdc: remove console spew from mdc_ioc_fid2path

In some cases with a very long pathname, such as with sanity.sh
test_154c, mdc_ioc_fid2path() would spew long debug messages to
the log, because libcfs_debug_vmsg2() refuses to log messages over
one page in size.

Truncate the debug message to only log the last 512 characters
of the pathname, which is sufficient for most debugging, saves a
bit of space in the debug log, and will prevent the debug logging
from printing to the console in the first place.
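
Logging only the tail of a long string can be sketched as below.
path_tail() is a hypothetical helper written for illustration, not the
code used in mdc_ioc_fid2path().

```c
#include <string.h>

/* Return a pointer to at most the last max_chars characters of path. */
static const char *path_tail(const char *path, size_t max_chars)
{
	size_t len = strlen(path);

	return len > max_chars ? path + (len - max_chars) : path;
}
```

A caller would pass path_tail(path, 512) to its debug macro, so short
paths are logged whole and long ones are cut from the front.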

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I3cad19d02e8065574b8baf5694e9894e43112318
Reviewed-on: http://review.whamcloud.com/17078
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7396 llite: check request != NULL in ll_migrate 79/17079/3
Di Wang [Thu, 5 Nov 2015 10:25:52 +0000 (02:25 -0800)]
LU-7396 llite: check request != NULL in ll_migrate

Check if the request is NULL before retrieving the reply body
from the request.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Ifec9caf270b938b7583de0315610f930fa52649d
Reviewed-on: http://review.whamcloud.com/17079
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-7276 utils: make llog_reader consistent with kernel 88/16788/2
Andreas Dilger [Fri, 9 Oct 2015 19:10:09 +0000 (13:10 -0600)]
LU-7276 utils: make llog_reader consistent with kernel

Handle the CM_SKIP records in the same manner as the kernel,
by printing "SKIP" only until the first CM_END record, rather
than until the first CM_END|CM_SKIP record.
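
One way to picture the rule above is a tiny state update applied to each
marker record. This is a sketch under assumptions: the flag bit values
are illustrative, update_skip() is a hypothetical name, and the
condition that starts a skipped section (CM_START together with CM_SKIP)
is assumed rather than taken from this commit.

```c
/* Marker flag bits; the numeric values are illustrative. */
#define CM_START 0x01
#define CM_END   0x02
#define CM_SKIP  0x04

/* Update the "skipping" state after seeing a marker record's flags. */
static int update_skip(int skipping, unsigned int cm_flags)
{
	if ((cm_flags & CM_START) && (cm_flags & CM_SKIP))
		return 1;	/* a skipped section begins */
	if (cm_flags & CM_END)
		return 0;	/* any CM_END closes it, with or without CM_SKIP */
	return skipping;
}
```

The point of the second test is that a plain CM_END ends the skipped
section just as CM_END|CM_SKIP does.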

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ie0c35ecd9d74a1e0b378bcf6642bca32e4dfa35e
Reviewed-on: http://review.whamcloud.com/16788
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7068 mdd: object leak in mdd_migrate_entries 84/16384/4
wang di [Thu, 10 Sep 2015 04:34:47 +0000 (21:34 -0700)]
LU-7068 mdd: object leak in mdd_migrate_entries

mdd_migrate_entries should release the object
if mdd_trans_start() fails.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: If10dce8b450e8c19bb93465beb08118e98c4ed96
Reviewed-on: http://review.whamcloud.com/16384
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
4 years agoLU-6666 osc: Do not merge extents with partial pages 68/15468/3
Jinshan Xiong [Thu, 2 Jul 2015 07:30:55 +0000 (15:30 +0800)]
LU-6666 osc: Do not merge extents with partial pages

Since range lock was introduced to Lustre, it is possible for
multiple threads to submit osc_extents with partial pages, and
the I/O engine may then try to merge these extents, which
triggers an assertion in osc_build_rpc().

In this patch, osc_extent::oe_no_merge is introduced, and this flag
is set if an osc_extent submitted via osc_io_submit() includes partial
pages. The I/O engine uses this flag to avoid merging such
extents.
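
The effect of such a flag on a merge decision might look like the sketch
below. Only the oe_no_merge name comes from the commit message;
toy_extent, its other fields and toy_can_merge() are hypothetical and
much simpler than the real osc_extent logic.

```c
#include <stdbool.h>

/* Reduced stand-in for osc_extent; only oe_no_merge comes from the patch. */
struct toy_extent {
	unsigned long oe_start;    /* first page index (hypothetical field use) */
	unsigned long oe_end;      /* last page index, inclusive */
	bool          oe_no_merge; /* set when the extent holds partial pages */
};

/* Adjacent extents may merge only if neither one forbids it. */
static bool toy_can_merge(const struct toy_extent *a, const struct toy_extent *b)
{
	if (a->oe_no_merge || b->oe_no_merge)
		return false;
	return a->oe_end + 1 == b->oe_start;
}
```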

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I0c851912beb0370553bc2e2b05f80dee175a2f1f
Reviewed-on: http://review.whamcloud.com/15468
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7463 osd: Change existence assert into error 24/17324/2
Di Wang [Sat, 21 Nov 2015 15:42:27 +0000 (07:42 -0800)]
LU-7463 osd: Change existence assert into error

In osd_declare_xx(), some objects might not exist,
especially when called from out_handler(). Change
these assertions into -ENOENT errors to avoid a
panic on the MDS.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: If17a013a6939a1ebe0519406d39e405fd915110f
Reviewed-on: http://review.whamcloud.com/17324
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
4 years agoLU-7461 lod: retry to get remote update log 22/17322/2
Di Wang [Sat, 21 Nov 2015 15:16:28 +0000 (07:16 -0800)]
LU-7461 lod: retry to get remote update log

If the remote MDT is also in recovery status,
then retrieving update logs in lod_sub_recovery_thread()
might return -EAGAIN, -EIO or -EBUSY. Retry in this
case until the recovery is aborted or the local MDT
is unmounted.
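
The retry policy described above can be sketched as a generic loop. The
callback types and every function name here are hypothetical; a real
implementation would also wait between attempts, which this sketch
leaves out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callbacks standing in for the fetch and the stop conditions. */
typedef int  (*fetch_fn)(void *ctx);
typedef bool (*stop_fn)(void *ctx);

static bool is_transient(int rc)
{
	return rc == -EAGAIN || rc == -EIO || rc == -EBUSY;
}

/* Retry while the error is transient and nobody asked us to stop. */
static int fetch_with_retry(fetch_fn fetch, stop_fn should_stop, void *ctx)
{
	int rc;

	do {
		rc = fetch(ctx);
	} while (is_transient(rc) && !should_stop(ctx));

	return rc;
}

/* Demo context: fail with -EAGAIN a given number of times, then succeed. */
struct demo_ctx {
	int failures_left;
	int calls;
};

static int demo_fetch(void *ctx)
{
	struct demo_ctx *d = ctx;

	d->calls++;
	if (d->failures_left > 0) {
		d->failures_left--;
		return -EAGAIN;
	}
	return 0;
}

static bool demo_never_stop(void *ctx)
{
	(void)ctx;
	return false;
}
```

Non-transient errors fall straight through, so only the three listed
codes keep the loop alive.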

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Iee945942bd01925cdcfe75c4e59dccbd63b34498
Reviewed-on: http://review.whamcloud.com/17322
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7416 osp: check rq_repmsg in osp_request_commit_cb 30/17130/2
Di Wang [Tue, 10 Nov 2015 10:22:20 +0000 (02:22 -0800)]
LU-7416 osp: check rq_repmsg in osp_request_commit_cb

Check if rq_repmsg is NULL before retrieving
last committed transno from reply message.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Ibf1e110e33df333934c65dfcf52870954e936180
Reviewed-on: http://review.whamcloud.com/17130
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7343 osd-ldiskfs: handle ldiskfs_append failure 48/17148/5
Fan Yong [Fri, 2 Oct 2015 05:21:21 +0000 (13:21 +0800)]
LU-7343 osd-ldiskfs: handle ldiskfs_append failure

In newer Linux kernels (linux-3.1x, x>=0), the ldiskfs exported
function ldiskfs_append() returns the error number via its return
value, instead of via the output parameter @err as it does on other
kernels (linux-2.6). In that case the caller must not assume that a
non-NULL return value is a valid buffer head, since it may encode an
error number. Check for that properly.
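
A caller-side check along these lines could look like the sketch below.
It runs in user space with local stand-ins for the kernel error-pointer
macros; toy_bh, toy_append_new() and toy_append_checked() are
hypothetical names, and the -EIO chosen for a NULL return is an
assumption made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
#define ERR_PTR(err) ((void *)(intptr_t)(err))
#define PTR_ERR(ptr) ((long)(intptr_t)(ptr))
#define IS_ERR(ptr)  ((uintptr_t)(ptr) >= (uintptr_t)(-MAX_ERRNO))

struct toy_bh { int data; };   /* stand-in for struct buffer_head */

/* Newer convention: failure is encoded in the returned pointer. */
static struct toy_bh *toy_append_new(struct toy_bh *bh, int fail)
{
	return fail ? ERR_PTR(-ENOSPC) : bh;
}

/* Caller-side wrapper: return a usable buffer or NULL, with *err set. */
static struct toy_bh *toy_append_checked(struct toy_bh *bh, int fail, int *err)
{
	struct toy_bh *ret = toy_append_new(bh, fail);

	*err = 0;
	if (ret == NULL) {
		*err = -EIO;
	} else if (IS_ERR(ret)) {
		*err = (int)PTR_ERR(ret);
		ret = NULL;
	}
	return ret;
}
```

Normalizing both conventions in one wrapper means the rest of the caller
can keep testing for NULL and reading the error code from one place.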

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I4dca43bcfd31aafd999f54934a51d258071dab22
Reviewed-on: http://review.whamcloud.com/17148
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-7376 tests: sanity-hsm/59 should skip old servers. 29/17029/3
Alex Zhuravlev [Tue, 3 Nov 2015 12:05:52 +0000 (15:05 +0300)]
LU-7376 tests: sanity-hsm/59 should skip old servers.

There is no point in crashing vulnerable versions.

Change-Id: Iacafd10d2a3d04ba1bb9ca70d8e343809490a349
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/17029
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7436 tests: skip conf-sanity/91 with old servers 22/17222/2
Alex Zhuravlev [Tue, 17 Nov 2015 06:39:25 +0000 (09:39 +0300)]
LU-7436 tests: skip conf-sanity/91 with old servers

due to missing functionality.

Change-Id: I2be72820413aee9a9d7082c2d8c6f308eeb6e141
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/17222
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7415 kernel: kernel update RHEL 6.7 [2.6.32-573.8.1.el6] 19/17119/3
Bob Glossman [Tue, 10 Nov 2015 16:03:49 +0000 (08:03 -0800)]
LU-7415 kernel: kernel update RHEL 6.7 [2.6.32-573.8.1.el6]

Update RHEL6.7 kernel to 2.6.32-573.8.1.el6

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I1b3c4b144b06e5e96f818e35c08f490e574ed798
Reviewed-on: http://review.whamcloud.com/17119
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7164 osc: osc_extent should hold refcount to osc_object 33/16433/3
Jinshan Xiong [Tue, 15 Sep 2015 19:19:10 +0000 (12:19 -0700)]
LU-7164 osc: osc_extent should hold refcount to osc_object

This avoids a race in which osc_extent and osc_object destruction
happen at the same time, which causes a kernel crash.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I3e3237f0d1cff4bd992bef4e4c01355a1d5c8d9f
Reviewed-on: http://review.whamcloud.com/16433
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7384 lfsck: check transaction stop status 42/17042/3
Fan Yong [Fri, 18 Sep 2015 07:52:31 +0000 (15:52 +0800)]
LU-7384 lfsck: check transaction stop status

The LFSCK modification is sent to the remote server when the
transaction stops, so for the sync transaction case we can check
the dt_trans_stop() result.

Previously, a failure of lfsck_namespace_create_orphan_dir() could
go unnoticed because the dt_trans_stop() result was ignored. That
could cause a subsequent lfsck_namespace_insert_normal() to fail
at LASSERT(lu_object_exists(o) != 0);

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: If897b7bd479ecdb61e6435f3177211f865a4e303
Reviewed-on: http://review.whamcloud.com/17042
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-3536 lfsck: reuse parameter name for re-locating object 32/16932/7
Fan Yong [Wed, 23 Sep 2015 15:24:02 +0000 (23:24 +0800)]
LU-3536 lfsck: reuse parameter name for re-locating object

Usually, LFSCK engine will locate the object against the bottom
device (OSD), then make related check/repair directly. Sometimes,
such as lfsck_namespace_repair_dirent(), we need to modify based
on LOD device. Under such case, the LFSCK will re-locate related
object with the same FID.

Originally, there was no rule about the parameter names, so it
was unclear which one should be used. For example, the input
parameter named "parent" is against the OSD, and we need to
re-locate the object based on the LOD, named "pobj". In the
subsequent logic "pobj" should be used, but unfortunately
"parent" may be used by mistake. Such invalid usage is difficult
to find.

To avoid such trouble, we prefer to reuse the (input) parameter
name after re-locating the object, name "pobj" as "parent".

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I5b6d7c5c10e1817ef2bade4931485228b26c511d
Reviewed-on: http://review.whamcloud.com/16932
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
4 years agoLU-3322 lnet: make connect parameters persistent 74/17074/5
Amir Shehata [Fri, 6 Nov 2015 20:41:01 +0000 (12:41 -0800)]
LU-3322 lnet: make connect parameters persistent

Store map-on-demand and peertx credits in the peer, since the peer
is persistent. Also made sure that when assigning the parameters
received on the connection to the peer structure through create,
that if another peer is added before grabbing the lock we assign
these parameters to it as well.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Ie68f1ba1349d15b0a31eff9a2ca454df8e408ea9
Reviewed-on: http://review.whamcloud.com/17074
Tested-by: Jenkins
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7324 lnet: recv could access freed message 65/17065/4
Liang Zhen [Fri, 6 Nov 2015 14:23:05 +0000 (22:23 +0800)]
LU-7324 lnet: recv could access freed message

When lnet_parse_put calls lnet_ptl_match_md, that function can attach
the current message to the delayed list if there is no match. This
means the message can be taken over and freed by another thread that
is posting a new MD, so it is not safe for the caller of lnet_parse_put
to check this message again.

This patch fixes this issue by adding a local variable "ready_delay"
to store corresponding status of lnet_msg, so lnet doesn't need to
check the message again if lnet_ptl_match_md returned MATCH_NONE for
it.
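
The general pattern (copy what you need into a local before a call that
may give the object away) can be sketched as follows. Everything here is
a hypothetical toy: toy_match() models the hand-off by freeing the
message, and the delayed_ok field is only loosely inspired by the
"ready_delay" variable named above.

```c
#include <stdbool.h>
#include <stdlib.h>

enum match_result { MATCH_OK, MATCH_NONE };

struct toy_msg {
	bool delayed_ok;   /* hypothetical status the caller cares about */
};

/*
 * Stand-in for lnet_ptl_match_md(): on MATCH_NONE the message is handed
 * to another owner, modelled here by freeing it immediately.
 */
static enum match_result toy_match(struct toy_msg *msg, bool have_md)
{
	if (have_md)
		return MATCH_OK;
	free(msg);
	return MATCH_NONE;
}

/* Read the status into a local before the call; never touch msg after. */
static bool toy_parse_put(struct toy_msg *msg, bool have_md)
{
	bool ready_delay = msg->delayed_ok;

	if (toy_match(msg, have_md) == MATCH_NONE)
		return ready_delay;   /* msg may already be gone */

	free(msg);                    /* matched: caller still owns msg */
	return true;
}
```

Reading msg->delayed_ok after toy_match() returned MATCH_NONE would be a
use-after-free in this model, which is the hazard the local avoids.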

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: I0f8827103dd637648112e936ce6e685266e5ca40
Reviewed-on: http://review.whamcloud.com/17065
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7221 ldlm: do not take a reference on target if stopping 40/16940/5
Bruno Faccini [Mon, 26 Oct 2015 13:37:19 +0000 (14:37 +0100)]
LU-7221 ldlm: do not take a reference on target if stopping

In the set of changes in the patch for LU-5569
(http://review.whamcloud.com/11750/,
commit 892078e3b566c04471e7dcf2c28e66f2f3584f93), one is to take a
reference on the target even if it is stopping (umount'ed). Then, upon
connection attempts, this can lead to unwanted cleanup actions
on [obd_self_]export from class_decref(), finally causing an
LBUG in class_export_put() because the export's exp_refcount has already
reached 0.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: If960437934fb694d173a4fd1fbfb9e43d496fea6
Reviewed-on: http://review.whamcloud.com/16940
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7318 out: dynamic reply size 89/16889/16
Alex Zhuravlev [Tue, 20 Oct 2015 13:53:18 +0000 (16:53 +0300)]
LU-7318 out: dynamic reply size

Every update on the initiator side can declare how many bytes
it expects back. The OUT packing library puts these numbers on the
wire and prepares an appropriate buffer for the reply. The OUT
target then does a few checks to ensure individual replies fit their
buffers.

Change-Id: I443b5c879bc321c33efb70af665ecd2b2f7baa18
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/16889
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7077 target: avoid using possible error return NULL pointer 73/16473/5
Bob Glossman [Thu, 17 Sep 2015 19:05:02 +0000 (12:05 -0700)]
LU-7077 target: avoid using possible error return NULL pointer

The previous fix http://review.whamcloud.com/15576 added a call
to cfs_hash_getref(). Add LASSERT() to ensure that a NULL return
value, which can never happen here, is in fact never seen.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Ic6132b5450534db0bb9b89c3dd6f55517450c42a
Reviewed-on: http://review.whamcloud.com/16473
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
4 years agoLU-7174 build: make git ignore dkms generated file 36/17136/3
James Simmons [Thu, 12 Nov 2015 14:31:49 +0000 (09:31 -0500)]
LU-7174 build: make git ignore dkms generated file

While testing other patches, the unrelated build byproduct
dkms.mkconf shows up in git status. To avoid adding it by
accident, list these byproduct files in the proper
.gitignore files.

Change-Id: I49a5411f8c1159a75d1cd28067dfbf3c2a677d6c
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/17136
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7169 tests: check disk corruption during failover 64/16664/9
Fan Yong [Thu, 24 Sep 2015 09:04:41 +0000 (17:04 +0800)]
LU-7169 tests: check disk corruption during failover

It is a debug patch for conf-sanity test_84. It is suspected
that there is some disk corruption during the MDT0 failover.

Test-Parameters: mdsfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs testlist=conf-sanity,conf-sanity,conf-sanity
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I7e20f26e1ecee483474ace44c8284b5776f3c602
Reviewed-on: http://review.whamcloud.com/16664
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
4 years agoLU-7377 utils: Don't fail on plugin load for mount/tunefs 28/17128/2
Nathaniel Clark [Fri, 6 Nov 2015 18:48:34 +0000 (13:48 -0500)]
LU-7377 utils: Don't fail on plugin load for mount/tunefs

While loading mount_utils_zfs, if the zfs modules aren't loaded but
the zfs plugin is present, it will return an error. This shouldn't
cause all module loading to fail.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Idad52745cdfa9d673ab9bd4afe38de4d51ae9a49
Reviewed-on: http://review.whamcloud.com/17128
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7414 target: do not share update and rdbuf 29/17129/3
Di Wang [Tue, 10 Nov 2015 10:12:00 +0000 (02:12 -0800)]
LU-7414 target: do not share update and rdbuf

Redefine lu_rdbuf structure to simplify the rdbuf
allocation in out_read().

And also move tti_u.rdbuf out of tgt_thread_info
union, otherwise rdbuf and update will share
the same memory and cause corruption, see out_read().
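
Why sharing a union is dangerous can be shown with a toy example. The
types below are hypothetical and far smaller than tgt_thread_info; they
only demonstrate that union members overlap while struct members do not.

```c
#include <string.h>

struct toy_update { char name[8]; };
struct toy_rdbuf  { char data[8]; };

/* Members of a union overlap, so writing one clobbers the other. */
union shared_info {
	struct toy_update update;
	struct toy_rdbuf  rdbuf;
};

/* Separate struct members each get their own storage. */
struct separate_info {
	struct toy_update update;
	struct toy_rdbuf  rdbuf;
};

/* Returns 1 if update.name survives a write to rdbuf, 0 otherwise. */
static int survives_in_union(void)
{
	union shared_info u;

	strcpy(u.update.name, "update");
	memset(u.rdbuf.data, 0, sizeof(u.rdbuf.data));
	return strcmp(u.update.name, "update") == 0;
}

/* Same sequence of writes, but with non-overlapping members. */
static int survives_in_struct(void)
{
	struct separate_info s;

	strcpy(s.update.name, "update");
	memset(s.rdbuf.data, 0, sizeof(s.rdbuf.data));
	return strcmp(s.update.name, "update") == 0;
}
```

The struct variant costs more memory per thread, which is the trade-off
for letting both buffers be live at the same time.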

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Idb0f5af1b00fd5fd15ebc8742aa60d9a43df0a8a
Reviewed-on: http://review.whamcloud.com/17129
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
4 years agoLU-7200 kernel: kernel update [SLES11 SP3 3.0.101-0.47.67] 17/16617/9
Bob Glossman [Wed, 23 Sep 2015 18:36:38 +0000 (11:36 -0700)]
LU-7200 kernel: kernel update [SLES11 SP3 3.0.101-0.47.67]

Update SLES11 SP3 kernel to 3.0.101-0.47.67

Test-Parameters: mdsdistro=sles11sp3 ossdistro=sles11sp3 \
  clientdistro=sles11sp3 mdsfilesystemtype=ldiskfs \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
  testgroup=review-ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Icd7bac6bea6866f82e2e03f5dbbb1bda1a4ecacf
Reviewed-on: http://review.whamcloud.com/16617
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoNew tag 2.7.63 2.7.63 v2_7_63 v2_7_63_0
Oleg Drokin [Mon, 16 Nov 2015 22:41:54 +0000 (17:41 -0500)]
New tag 2.7.63

Change-Id: I79f285380612f61679c1cf8e51446c9018e8225c
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6204 build: clean up kernel module metadata 87/16787/5
Andreas Dilger [Mon, 19 Oct 2015 15:24:22 +0000 (11:24 -0400)]
LU-6204 build: clean up kernel module metadata

Update static MODULE_VERSION() lines - this should be automated.

Improve MODULE_DESCRIPTION() descriptions.

Make the name of the module_init()/_exit() functions consistently
{module_name}_init and {module_name}_exit.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I1c3fe5698c7f41d971a38225650597c913500c1e
Reviewed-on: http://review.whamcloud.com/16787
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7345 obdclass: annotate locks in __local_file_create 57/16957/4
Olaf Faaland [Tue, 27 Oct 2015 00:51:46 +0000 (17:51 -0700)]
LU-7345 obdclass: annotate locks in __local_file_create

dt_write_lock() is called for both the child and the parent dt_objects
when a directory is created.  This triggers a false positive in
lockdep when running with CONFIG_LOCKDEP=y, as the structure
containing the lock and the name of the lock is the same, and so it
appears to be a recursive lock attempt based on lock class.

This gives the two locks different subclasses so lockdep can
differentiate between them.

Also, osd-zfs osd_object_{read,write}_lock() functions currently
ignore the subclass (role) provided by the caller, calling down_read()
instead of down_read_nested() for example.

Make osd_zfs use the _nested variants so the role takes effect.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Iab79feadfbd7d1a5a06749ecb9f6888b55a78d73
Reviewed-on: http://review.whamcloud.com/16957
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7269 ptlrpc: remove ptlrpc_prep_req 65/16765/5
Ben Evans [Wed, 7 Oct 2015 17:30:52 +0000 (12:30 -0500)]
LU-7269 ptlrpc: remove ptlrpc_prep_req

Remove unused functions ptlrpc_prep_req, ptlrpc_prep_req_pool
Combine __ptlrpc_request_bufs_pack and ptlrpc_request_bufs_pack

Signed-off-by: Ben Evans <bevans@cray.com>
Change-Id: I4e4e64aa1f7fa4c85daf311906f6417a513dcddc
Reviewed-on: http://review.whamcloud.com/16765
Tested-by: Jenkins
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Chris Horn <hornc@cray.com>
4 years agoLU-7362 lnet: Remove LASSERTS from router checker 03/17003/2
Doug Oucharek [Fri, 30 Oct 2015 21:40:59 +0000 (14:40 -0700)]
LU-7362 lnet: Remove LASSERTS from router checker

In lnet_router_checker(), there are two LASSERTS.  Neither protects
us from anything, and one of them triggered for a customer, crashing
the system unnecessarily.  This patch removes them.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: If2732632e47103fb8fa63a263c4c5ef4a44142a3
Reviewed-on: http://review.whamcloud.com/17003
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Matt Ezell <ezellma@ornl.gov>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7296 ldlm: improve lock timeout messages 24/16824/2
John L. Hammond [Wed, 14 Oct 2015 16:33:35 +0000 (11:33 -0500)]
LU-7296 ldlm: improve lock timeout messages

In ldlm_expired_completion_wait() remove the useless LCONSOLE_WARN()
message and upgrade the LDLM_DEBUG() statement to LDLM_ERROR().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I6293720bf8e038057a2c84a715359cdbb8cebe91
Reviewed-on: http://review.whamcloud.com/16824
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7120 scrub: handle osd_scrub_post return value 68/16368/2
Fan Yong [Fri, 31 Jul 2015 16:00:20 +0000 (00:00 +0800)]
LU-7120 scrub: handle osd_scrub_post return value

Handle the return value of osd_scrub_post() to avoid missing
failures while writing the scrub status to disk.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I7f77bdba184f634b4f9dd748c3f1b97609b81960
Reviewed-on: http://review.whamcloud.com/16368
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ashish Purkar <ashish.purkar@seagate.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6013 utils: don't initialize OSD code for client mount 19/13019/4
Andreas Dilger [Wed, 10 Dec 2014 10:53:49 +0000 (03:53 -0700)]
LU-6013 utils: don't initialize OSD code for client mount

Don't even try to initialize the server OSD handling code if this
is a client mountpoint.  That avoids potential problems if the OSD
code isn't working or available when it isn't needed on a client.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I104b70b9d27811d306879fc047a83f85ea3ebbe5
Reviewed-on: http://review.whamcloud.com/13019
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Jenkins
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoRevert "LU-4865 zfs: grow block size by write pattern" 53/17053/4
Andreas Dilger [Thu, 5 Nov 2015 18:47:06 +0000 (18:47 +0000)]
Revert "LU-4865 zfs: grow block size by write pattern"

This reverts commit 3e4369135127b350dbc26a4a5dc94cfa46e394cf.

This has shown problems in testing and may be the cause of LU-7392.

Change-Id: I664f7f8c943d8a90f2d2a9845aea2636535d6b1e
Reviewed-on: http://review.whamcloud.com/17053
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7330 ldlm: fix race of starting bl threads 26/17026/2
Niu Yawei [Tue, 3 Nov 2015 06:59:32 +0000 (01:59 -0500)]
LU-7330 ldlm: fix race of starting bl threads

There is a race in the code that starts bl threads. When the race
is hit, the number of threads can exceed the maximum, and it can
also lead to duplicated thread names.

This patch fixes the race and cleans up the code a bit.
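
The commit message does not show the fix itself. As a generic sketch of how this class of check-then-act race is usually closed, the example below checks the limit and claims the thread number inside one critical section. The BlPool class, its fields, and the name format are assumptions made for illustration, not the ldlm code.

```python
import threading

class BlPool:
    """Hypothetical pool that hands out at most max_threads numbered slots."""

    def __init__(self, max_threads):
        self.max_threads = max_threads
        self.lock = threading.Lock()
        self.started = 0
        self.names = []

    def try_start(self):
        # Check the limit and claim the slot under the same lock, so two
        # callers can never both pass the check or get the same number.
        with self.lock:
            if self.started >= self.max_threads:
                return None
            name = "bl_%02d" % self.started  # illustrative name format
            self.started += 1
            self.names.append(name)
            return name

pool = BlPool(max_threads=4)
workers = [threading.Thread(target=pool.try_start) for _ in range(32)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

If the limit check and the increment were done in separate steps without the lock, several callers could pass the check together, which would give both symptoms named above: too many threads and repeated names.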

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I9c9be125d1d76890b8c52476684976dad3cb3d87
Reviewed-on: http://review.whamcloud.com/17026
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-2222 mdt: restore evict-by-nid functionality 67/16867/8
Alex Zhuravlev [Mon, 19 Oct 2015 10:18:44 +0000 (13:18 +0300)]
LU-2222 mdt: restore evict-by-nid functionality

Writing a NID or UUID to mdt.*.evict_tgt_nids will evict clients
with the specified NID or UUID from all the targets (OSTs and MDTs).

Change-Id: I66a60a6c81fbac1571f5685111df7b00a306be36
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/16867
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7367 tests: fix fail_loc code in 110g 09/17009/2
Alex Zhuravlev [Sun, 1 Nov 2015 18:47:48 +0000 (21:47 +0300)]
LU-7367 tests: fix fail_loc code in 110g

Test 110g used the wrong code for OBD_FAIL_MIGRATE_NET_REP.

Change-Id: I5dc2e18fb99e35422a9ae227e2e16ba7d39600a3
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/17009
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7366 build: resolve compile issues on Ubuntu 15.04 08/17008/2
James Simmons [Sun, 1 Nov 2015 16:47:29 +0000 (11:47 -0500)]
LU-7366 build: resolve compile issues on Ubuntu 15.04

Ubuntu 15.04 has been released, and it uses a 4.2 kernel
and gcc 5.2. That gcc version fails to build lustre due
to missing headers in the GSS userland code and a
variable not being initialized in the llite layer.

Change-Id: I3615414ac039277a6ef6c6af1a541590b9d79566
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/17008
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7353 utils: fix lctl usage messages 80/16980/3
Andreas Dilger [Wed, 28 Oct 2015 20:06:13 +0000 (14:06 -0600)]
LU-7353 utils: fix lctl usage messages

Fix the lctl usage message for sub-commands that need the LNet network
to be specified.  lctl::g_net_is_set() incorrectly recommended using
the "network" command to specify the LNet network, which is incorrect
when using lctl in command-line mode:

  # lctl peer_list
  You must run the 'network' command before 'peer_list'.
  # lctl network tcp0 peer_list
  # lctl --net tcp0 peer_list
  12345-192.168.20.1@tcp [1]192.168.40.147->mookie-gig:988 #15

Fix that to correctly recommend using the "--net" option when using
non-interactive mode, and to return an error if "network" is used in
non-interactive mode with extra arguments.

Replace mention of "portals" in help messages with "LNet".
Remove mention of obsolete elan, qsw, ra network types.

Improve the help message content for related subcommands.
Fix whitespace and command descriptions for related subcommands.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Idf9bf663b16012ebc9f38566ecba9859a54cab07
Reviewed-on: http://review.whamcloud.com/16980
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7336 ofd: cleanup proc when ofd_info_init fails 34/16934/3
Li Xi [Sat, 24 Oct 2015 05:36:00 +0000 (13:36 +0800)]
LU-7336 ofd: cleanup proc when ofd_info_init fails

In ofd_init0(), if ofd_info_init() fails, it should clean up
the procs.

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I3ff278526f09ef7e36631712ce21a498a6644907
Reviewed-on: http://review.whamcloud.com/16934
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7012 osp: don't use OSP when import is deactivated 37/16937/3
Mikhail Pershin [Wed, 23 Sep 2015 19:09:27 +0000 (12:09 -0700)]
LU-7012 osp: don't use OSP when import is deactivated

Unset the opd_imp_connected flag upon an IMP_EVENT_INACTIVE event.
This stops any llog processing by that device until the import
is activated again.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ie219e536c216130f428ba933d11842511692c95b
Reviewed-on: http://review.whamcloud.com/16937
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7371 osd-ldiskfs: fix wrong read length over isize 20/17020/5
Li Xi [Mon, 2 Nov 2015 16:39:31 +0000 (00:39 +0800)]
LU-7371 osd-ldiskfs: fix wrong read length over isize

If the isize is 4095, a read length of 4096 will be
returned because of a wrong calculation of EOF. This patch fixes the
problem.
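
The faulty calculation is not quoted in this message, so the sketch below only shows the expected behaviour: a read must never return bytes past isize. The helper name clamp_read_len and its parameters are hypothetical and are not taken from osd-ldiskfs.

```python
def clamp_read_len(isize, offset, length):
    """Return how many bytes a read of `length` at `offset` may return."""
    if offset >= isize:
        # Starting at or beyond EOF: nothing to read.
        return 0
    # Never return more than the bytes remaining before EOF.
    return min(length, isize - offset)

# The case from the message: isize 4095, requested length 4096.
short_read = clamp_read_len(4095, 0, 4096)
```

For the case described above, the clamped length is 4095 rather than 4096.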

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I73b18641f000a2d96067243c08c26e51d0d53244
Reviewed-on: http://review.whamcloud.com/17020
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7261 ldiskfs: clean up code style for large_xattr 78/16778/4
Andreas Dilger [Fri, 9 Oct 2015 06:37:03 +0000 (00:37 -0600)]
LU-7261 ldiskfs: clean up code style for large_xattr

Clean up the code style for the large_xattr patches to match the
upstream kernel style (use ! instead of == 0, and similar), and
the original style of the code before the earlier versions of the
patch were applied.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Icffe339d1b002a55984856829afde9e3eae98bd9
Reviewed-on: http://review.whamcloud.com/16778
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7379 kernel: kernel update RHEL7.1 [3.10.0-229.20.1.el7] 44/17044/3
Bob Glossman [Tue, 3 Nov 2015 22:09:32 +0000 (14:09 -0800)]
LU-7379 kernel: kernel update RHEL7.1 [3.10.0-229.20.1.el7]

Test-Parameters: mdsdistro=el7 ossdistro=el7 \
  clientdistro=el7 mdsfilesystemtype=ldiskfs \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
  testgroup=review-ldiskfs

update RHEL 7.1 kernel to 3.10.0-229.20.1.el7

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I690daa6493232353703f5392cfe0a979b824f3f1
Reviewed-on: http://review.whamcloud.com/17044
Tested-by: Jenkins
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7304 ldiskfs: fix bug when bigalloc is enabled 32/16832/3
Wang Shilong [Tue, 13 Oct 2015 00:28:29 +0000 (20:28 -0400)]
LU-7304 ldiskfs: fix bug when bigalloc is enabled

The following error is seen when the bigalloc feature is enabled
for ldiskfs on rhel7:

LDISKFS-fs error (device sdb):
ldiskfs_mb_check_ondisk_bitmap:3611: comm mkdir:
 on-disk bitmap for group 8corrupted: 0 blocks free in
 bitmap, 32768 - in gd

Use EXT4_CLUSTERS_PER_GROUP instead; otherwise we get the
wrong value and the check fails, which makes the
filesystem go read-only.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I7f61918918e6f4e2f372929181b704b0648dcbca
Reviewed-on: http://review.whamcloud.com/16832
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7354 osd: avoid NULL pointer in osd_obj_update_entry 10/17010/3
Fan Yong [Sun, 13 Sep 2015 09:41:13 +0000 (17:41 +0800)]
LU-7354 osd: avoid NULL pointer in osd_obj_update_entry

In osd_obj_update_entry(), the variable @oi_fid may be NULL.
We need to check for this case before using it further, to avoid
accessing invalid memory.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ibf47e949d69f0b9e5657a6dce2007fe4f6f1a9f6
Reviewed-on: http://review.whamcloud.com/17010
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7243 misc: update Intel copyright messages 2015 58/16758/3
Andreas Dilger [Tue, 6 Oct 2015 23:25:40 +0000 (17:25 -0600)]
LU-7243 misc: update Intel copyright messages 2015

Update copyright messages in files modified by Intel employees
in 2015 by non-trivial patches.  Exclude patches that are only
deleting code, renaming functions, or adding or removing whitespace.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I70fe6a346790e15d23606a3f380e7ef8fb8b84a0
Reviewed-on: http://review.whamcloud.com/16758
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7230 llite: clear dir stripe md in ll_iget 77/16677/15
Di Wang [Thu, 8 Oct 2015 07:51:16 +0000 (00:51 -0700)]
LU-7230 llite: clear dir stripe md in ll_iget

If ll_iget fails during inode initialization, especially
during striped directory lookup after creation failed,
then it should clear the stripe MD before make_bad_inode(),
because make_bad_inode() will reset the i_mode, which
can cause ll_clear_inode() to skip freeing the stripe MD.

Remove the name entry from the directory once creation
has failed. Note: this will not roll back all of the local
operations, and LFSCK will take care of the orphan object.

Add sanity.sh 300p to verify the case.

Also enable lfs rm_entry for local objects, because
it is quite possible to create a corrupted local
striped directory, and we might need to use
"lfs rm_entry" to delete the corrupted striped dir.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I892c52117b83c8348aa0ceb888e73c84e79ffe46
Reviewed-on: http://review.whamcloud.com/16677
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-6634 llog: destroy plain llog if init fails 27/16427/15
Alex Zhuravlev [Tue, 15 Sep 2015 09:36:35 +0000 (12:36 +0300)]
LU-6634 llog: destroy plain llog if init fails

llog_cat_add_rec() should destroy the plain llog
in the same transaction if its initialization
fails. Also, llog_osd_write_rec() should check that
the object still exists, as it's possible that
another thread failed to initialize and destroyed
the llog.

Change-Id: I7b823d34b32b5caaf0cc17b4cfe278a07a78ec15
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/16427
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7277 lod: keep trying to get remote update log 86/16786/2
Di Wang [Thu, 8 Oct 2015 08:09:04 +0000 (01:09 -0700)]
LU-7277 lod: keep trying to get remote update log

Because the remote MDT might be in recovery at the same
time, keep trying to get the remote update log until
the recovery is aborted.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Id9543201ce543be730e73f9f51f3f7a0d10d3dfc
Reviewed-on: http://review.whamcloud.com/16786
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6992 test: wait device to be registered 81/16781/5
Hongchao Zhang [Thu, 22 Oct 2015 19:26:45 +0000 (03:26 +0800)]
LU-6992 test: wait device to be registered

In mount_facet, the device label can only be used after the device
has registered with the MGS and the label has been rewritten.

Change-Id: I9ed65631391f2be84e484e409fbfe59020d982be
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/16781
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>