Whamcloud - gitweb
fs/lustre-release.git
5 years agoLU-5950 mgc: add nid iteration 29/12829/3
Alexander.Boyko [Mon, 24 Nov 2014 10:55:15 +0000 (13:55 +0300)]
LU-5950 mgc: add nid iteration

mgc_apply_recover_logs uses only the first nid from each entry;
this could be a problem for a cluster in which a single node has
several network addresses.
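
A minimal sketch of the idea, trying every NID of an entry instead of only the first one. It assumes a hypothetical entry layout. `struct demo_entry`, `demo_apply_entry()` and the probe `demo_net_matches()` are illustrative names and are not the real mgc code. The probe accepts a NID whose network suffix (after `@`) matches the one requested.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_NIDS 4

/* Hypothetical entry: one target reachable through several NIDs. */
struct demo_entry {
	const char *nids[DEMO_MAX_NIDS];
	size_t nid_count;
};

/* Hypothetical probe: 0 if the NID is usable, else a negative errno. */
typedef int (*demo_try_nid_fn)(const char *nid, void *arg);

/* Example probe: accept a NID only if its network (after '@') equals arg. */
static int demo_net_matches(const char *nid, void *arg)
{
	const char *at = strchr(nid, '@');

	if (at == NULL)
		return -EINVAL;
	return strcmp(at + 1, (const char *)arg) == 0 ? 0 : -EHOSTUNREACH;
}

/*
 * Try every NID of the entry instead of only nids[0].
 * Returns the index of the first usable NID, the last probe error if
 * none is usable, or -ENOENT if the entry has no NIDs at all.
 */
static int demo_apply_entry(const struct demo_entry *e,
			    demo_try_nid_fn try_nid, void *arg)
{
	int rc = -ENOENT;
	size_t i;

	for (i = 0; i < e->nid_count && i < DEMO_MAX_NIDS; i++) {
		rc = try_nid(e->nids[i], arg);
		if (rc == 0)
			return (int)i;
	}
	return rc;
}
```

Consider a node listed as `{ "10.0.0.1@o2ib", "192.168.1.1@tcp" }`. A tcp-only peer now reaches it through the second NID instead of failing on the first.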

Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Change-Id: I6ec348761c2d51edd613cb388e37ef7776990424
Xyratex-bug-id: MRP-2255
Reviewed-on: http://review.whamcloud.com/12829
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
5 years agoLU-5912 libcfs: use vfs api for fsync calls 31/12731/3
Bob Glossman [Fri, 14 Nov 2014 22:26:30 +0000 (14:26 -0800)]
LU-5912 libcfs: use vfs api for fsync calls

Use vfs_fsync_range() instead of direct use of filp->f_op->fsync()
routines.  Doing so will apply correct locking transparently without
needing to decide how to do it ourselves.
What we were doing was a long-standing violation of the locking
protocols described in Documentation/filesystems/Locking in linux
source but was never noticed until new checking code went into the
RHEL 6.6 kernel.  The new check triggered a visible error in syslog.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I551215fc340637364fe04f6e3bae963cf983c953
Reviewed-on: http://review.whamcloud.com/12731
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
5 years agoLU-5894 mds: allow 2.4/2.5 clients create remote dir 15/12715/2
Wang Di [Fri, 14 Nov 2014 07:16:09 +0000 (23:16 -0800)]
LU-5894 mds: allow 2.4/2.5 clients create remote dir

The MDS will only return ENOTSUPP if an old client (2.4/2.5) tries
to create a striped directory with stripe count > 1, so such a client
can still create a remote directory on a new MDS (>= 2.6).
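
The rule above reduces to a single condition. The following is a hedged sketch: `demo_check_mkdir()` and its parameters are hypothetical, and the MDS's real client-version detection is not shown. `ENOTSUPP` is the kernel-internal value 524, which userspace headers do not define.

```c
#include <errno.h>
#include <stdbool.h>

/* ENOTSUPP is kernel-internal (524); userspace errno.h lacks it. */
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif

/*
 * Hypothetical check: reject only striped directories (stripe_count > 1)
 * from clients that predate striped-directory support. A plain remote
 * directory (stripe_count <= 1) is still allowed for such clients.
 */
static int demo_check_mkdir(bool client_supports_striped_dir,
			    unsigned int stripe_count)
{
	if (!client_supports_striped_dir && stripe_count > 1)
		return -ENOTSUPP;
	return 0;
}
```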

Change-Id: I25c90ae793f91eed032949d26fd5e7fc41801e4f
Signed-off-by: Wang Di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/12715
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
5 years agoLU-5808 llog: check name strictly to avoid invalid record 37/12437/2
Li Xi [Mon, 27 Oct 2014 13:54:25 +0000 (21:54 +0800)]
LU-5808 llog: check name strictly to avoid invalid record

Records for one file system could be written to the llog of another
file system by mistake if the name of the former is a prefix of the
name of the latter. This patch fixes the problem by checking the llog
name more strictly.
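
The following shows why a plain prefix comparison is too loose. With file systems `lustre` and `lustre2`, a log named `lustre2-client` starts with `lustre`. A stricter check also requires the separator immediately after the fsname. This is a sketch that assumes names of the form `<fsname>-<suffix>`. The helper names are hypothetical and are not the patch's actual functions.

```c
#include <stdbool.h>
#include <string.h>

/* Loose check (the bug): any name that merely starts with fsname matches. */
static bool demo_name_matches_loose(const char *logname, const char *fsname)
{
	return strncmp(logname, fsname, strlen(fsname)) == 0;
}

/*
 * Strict check: fsname must be followed immediately by '-'.
 * logname[len] is safe to read because the strncmp() above succeeded,
 * so logname has at least len characters.
 */
static bool demo_name_matches_strict(const char *logname, const char *fsname)
{
	size_t len = strlen(fsname);

	return strncmp(logname, fsname, len) == 0 && logname[len] == '-';
}
```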

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: If45c59b0226b71e8a95f9aa719eae8412c89a2f1
Reviewed-on: http://review.whamcloud.com/12437
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
5 years agoLU-5635 llog: prevent out-of-bound index 61/12161/3
Frank Zago [Wed, 1 Oct 2014 20:30:50 +0000 (15:30 -0500)]
LU-5635 llog: prevent out-of-bound index

llog_process_thread() can be called from llog_cat_process_cb() with an
index that is already out of bounds, leading to the following crash:

LustreError: 3773:0:(llog.c:310:llog_process_thread())
  ASSERTION(index <= last_index + 1 ) failed:
LustreError: 3773:0:(llog.c:310:llog_process_thread()) LBUG

 #0 [ffff8801144bf900] machine_kexec at ffffffff81038f3b
 #1 [ffff8801144bf960] crash_kexec at ffffffff810c5d82
 #2 [ffff8801144bfa30] panic at ffffffff8152798a
 #3 [ffff8801144bfab0] lbug_with_loc at ffffffffa02f8eeb [libcfs]
 #4 [ffff8801144bfad0] llog_process_thread at ffffffffa0413fff [obdclass]
 #5 [ffff8801144bfb80] llog_process_or_fork at ffffffffa041585f [obdclass]
 #6 [ffff8801144bfbd0] llog_cat_process_cb at ffffffffa0418612 [obdclass]
 #7 [ffff8801144bfc30] llog_process_thread at ffffffffa0413c22 [obdclass]
 #8 [ffff8801144bfce0] llog_process_or_fork at ffffffffa041585f [obdclass]
 #9 [ffff8801144bfd30] llog_cat_process_or_fork at ffffffffa0416b9d [obdclass]
    RIP: 00007f6de5e4f730  RSP: 00007fff9aa26d98  RFLAGS: 00000206
    RAX: 0000000000000000  RBX: ffffffff8100b072  RCX: 00007f6de5e4f730
    RDX: 0000000000008000  RSI: 00000000019c7000  RDI: 0000000000000003
    RBP: 00000000019c7000   R8: 00007f6de6103ee8   R9: 0000000000000001
    R10: 00007fff9aa26b20  R11: 0000000000000246  R12: ffffffffffff8000
    R13: 0000000000000003  R14: 0000000000008000  R15: 0000000000000003
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

If index is too big, simply return success.
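
A sketch of the guard, assuming a simplified record loop. `demo_process_records()` and its callback type are hypothetical, and the real llog header and bitmap handling is omitted. A starting index past `last_index + 1` is the case that used to trip the assertion. Here it just returns success.

```c
/* Hypothetical per-record callback: 0 to continue, nonzero to stop. */
typedef int (*demo_rec_cb)(int index, void *data);

/* Example callback: count visited records. */
static int demo_count_cb(int index, void *data)
{
	(void)index;
	++*(int *)data;
	return 0;
}

static int demo_process_records(int index, int last_index,
				demo_rec_cb cb, void *data)
{
	int rc;

	/* Previously an assertion (index <= last_index + 1); now a no-op. */
	if (index > last_index + 1)
		return 0;

	for (; index <= last_index; index++) {
		rc = cb(index, data);
		if (rc != 0)
			return rc;
	}
	return 0;
}
```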

Change-Id: I81bbedbbe2bcef478c370ef40fc069447d39efbd
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12161
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5687 dt: propagate errors from failed declarations 30/12130/4
John L. Hammond [Tue, 30 Sep 2014 16:09:30 +0000 (11:09 -0500)]
LU-5687 dt: propagate errors from failed declarations

Check for and return errors from dt_declare_*() in several locations.
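
The pattern looks like the following sketch. The three `demo_declare_*()` functions are hypothetical stand-ins for `dt_declare_*()` calls, and `fail_mask` exists only to simulate failures. Each result is checked, and the first error is returned instead of being ignored.

```c
#include <errno.h>

/* Hypothetical declaration steps standing in for dt_declare_*() calls. */
static int demo_declare_create(int fail) { return fail ? -ENOSPC : 0; }
static int demo_declare_xattr_set(int fail) { return fail ? -ENOMEM : 0; }
static int demo_declare_ref_add(int fail) { return fail ? -EIO : 0; }

/*
 * Return the first declaration error rather than starting the
 * transaction anyway. Bits 0, 1 and 2 of fail_mask make the
 * corresponding step fail.
 */
static int demo_declare_all(unsigned int fail_mask)
{
	int rc;

	rc = demo_declare_create(fail_mask & 1);
	if (rc != 0)
		return rc;

	rc = demo_declare_xattr_set(fail_mask & 2);
	if (rc != 0)
		return rc;

	return demo_declare_ref_add(fail_mask & 4);
}
```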

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Id18b12d6c713e78e2f1cc782ff659d2c84cc60bb
Reviewed-on: http://review.whamcloud.com/12130
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-2675 lnet: remove ulnds 17/12117/3
John L. Hammond [Mon, 29 Sep 2014 18:33:24 +0000 (13:33 -0500)]
LU-2675 lnet: remove ulnds

Remove the unused userspace LND code (all of lnet/ulnds/) and
supporting autocrud.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I104d8b22afdde5027a2a0ef1a9ecc0423b67fae5
Reviewed-on: http://review.whamcloud.com/12117
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-2675 lmv: remove lmv_init_{lock,unlock}() 15/12115/3
John L. Hammond [Mon, 29 Sep 2014 18:12:52 +0000 (13:12 -0500)]
LU-2675 lmv: remove lmv_init_{lock,unlock}()

In struct lmv_obd rename the init_mutex member to
lmv_init_mutex. Remove the compat macros lmv_init_{lock,unlock}() and
use mutex_{lock,unlock}(&lmv->lmv_init_mutex) instead.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Iae1f5d6b7fd1f96ba430d5e7af97c51ce3e042a8
Reviewed-on: http://review.whamcloud.com/12115
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-2675 md: remove unused code from md_object.h 13/12113/3
John L. Hammond [Mon, 29 Sep 2014 17:55:46 +0000 (12:55 -0500)]
LU-2675 md: remove unused code from md_object.h

Remove several unused functions, structures, and members from
md_object.h.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I33de0ba987bfde95172e9bfb77929b6b4dcd0aa8
Reviewed-on: http://review.whamcloud.com/12113
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5622 tests: check/wait for copytool death 22/11922/5
Bruno Faccini [Mon, 15 Sep 2014 15:37:31 +0000 (17:37 +0200)]
LU-5622 tests: check/wait for copytool death

It seems that copytool death after a kill may take some time, so
this condition must be handled in the sanity-hsm copytool_cleanup()
function to avoid situations where the copytool is not restarted,
but only signaled, in the next copytool_setup().

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ia72ed07f0219cf0aa2ef5b3805fb1f7faf4dab66
Reviewed-on: http://review.whamcloud.com/11922
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Tested-by: Jenkins
Reviewed-by: Robert Read <robert.read@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-3456 ptlrpc: quiet errors on initial connection 57/10057/4
Andreas Dilger [Tue, 22 Apr 2014 19:54:46 +0000 (13:54 -0600)]
LU-3456 ptlrpc: quiet errors on initial connection

It may be that a client or MDS is trying to connect to a target (OST
or peer MDT) before that target has finished setup.  Rather than
spamming the console logs during initial connection, only print a
console error message if there are repeated failures trying to
connect to the target, which may indicate an error on that node.
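
The following sketch shows the "only complain on repeated failures" policy. The threshold `DEMO_CONNECT_QUIET_LIMIT` and `struct demo_import` are hypothetical. The actual patch may count attempts differently.

```c
#include <stdbool.h>

/* Hypothetical threshold: consecutive failures before printing an error. */
#define DEMO_CONNECT_QUIET_LIMIT 3

struct demo_import {
	unsigned int failed_attempts; /* consecutive failed connects */
};

/*
 * Record the outcome of a connection attempt and return true if a
 * console error should be printed. Early failures stay quiet because
 * the target may simply not have finished setup yet.
 */
static bool demo_connect_result(struct demo_import *imp, bool success)
{
	if (success) {
		imp->failed_attempts = 0;
		return false;
	}
	imp->failed_attempts++;
	return imp->failed_attempts >= DEMO_CONNECT_QUIET_LIMIT;
}
```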

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I98ec7b4c2109b700b53297038d3fede4773ebbe5
Reviewed-on: http://review.whamcloud.com/10057
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-4820 osd: drop memcpy in zfs osd 60/9760/10
Alex Zhuravlev [Mon, 24 Mar 2014 15:30:19 +0000 (19:30 +0400)]
LU-4820 osd: drop memcpy in zfs osd

dmu_read() was called from osd_read_prep(), copying from ARC
buffers into the same ARC buffers. This seems to be a remnant of
the pre-zerocopy era.

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: I0f3657c360d8541d7c3c6e8e32eac78bc5702b42
Reviewed-on: http://review.whamcloud.com/9760
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5878 lfs: migrate file to its proper destination 01/12601/6
Frank Zago [Thu, 6 Nov 2014 17:08:30 +0000 (11:08 -0600)]
LU-5878 lfs: migrate file to its proper destination

llapi_file_open_param() is supposed to return the opened file
descriptor. However, when llapi_search_ost() is called, it returns 1,
which sets rc to 1, which in turn is treated as an error code later
and returned to the caller. So when the copy happens, the destination
file descriptor is 1 (stdout).
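
The following illustrates the bug pattern and its fix. It is a sketch only: `demo_search_ost()` models `llapi_search_ost()` returning 1 on a match, as the message says. Its 0 and negative return values, and `demo_open_param()` itself, are assumptions for illustration. The point is that a positive "found" status must not leak out as the function's result.

```c
#include <errno.h>

/* Hypothetical search: 1 if found, 0 if not found, negative errno on error. */
static int demo_search_ost(int ost_idx, int ost_count)
{
	if (ost_idx < 0)
		return -EINVAL;
	return ost_idx < ost_count ? 1 : 0;
}

/* Should return an open fd (>= 0) or a negative errno. */
static int demo_open_param(int ost_idx, int ost_count, int fd)
{
	int rc = demo_search_ost(ost_idx, ost_count);

	if (rc < 0)
		return rc;
	if (rc == 0)
		return -EINVAL; /* requested OST does not exist */

	/* rc == 1 only means "found"; returning it would hand back fd 1. */
	return fd;
}
```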

Fixed a typo in the function description, and formatted the parameter
descriptions.

Fixed a bad indentation.

There's no need to test lum before freeing it, since at that point it
is not NULL (and free() will test it anyway).

Change-Id: I16fe26480b880aa818b1bb706b22bfdd6833d69c
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12601
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
5 years agoLU-5861 lnet: invoke lnetctl properly from startup script 61/12561/2
Amir Shehata [Tue, 4 Nov 2014 21:14:39 +0000 (13:14 -0800)]
LU-5861 lnet: invoke lnetctl properly from startup script

Use the correct lnetctl command syntax to load default config:
lnetctl import < lnet.conf

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I54dd0d34f75b91c1c6ceb9745d817cb43f82ef25
Reviewed-on: http://review.whamcloud.com/12561
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-4119 ldlm: abort recovery by time_hard 78/9078/11
Sergey Cheremencev [Thu, 20 Nov 2014 16:58:43 +0000 (11:58 -0500)]
LU-4119 ldlm: abort recovery by time_hard

Set obd_abort_recovery to 1 when recovery time
reaches obd_recovery_time_hard.

Xyratex-bug-id: MRP-1365

Change-Id: Ida8f71cb63d5db9bf85bcdf2c152b4d9f71b8bca
Signed-off-by: Sergey Cheremencev <Sergey_Cheremencev@xyratex.com>
Reviewed-on: http://review.whamcloud.com/9078
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5893 kernel: kernel update [RHEL7 3.10.0-123.9.3.el7] 57/12657/4
Bob Glossman [Mon, 10 Nov 2014 19:20:18 +0000 (11:20 -0800)]
LU-5893 kernel: kernel update [RHEL7 3.10.0-123.9.3.el7]

update RHEL7 kernel to 3.10.0-123.9.3.el7

Test-Parameters: clientdistro=el7
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ife164ff8bea44369bc33cae07cfbb59d5845e406
Reviewed-on: http://review.whamcloud.com/12657
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-1453 scrub: auto trigger OI scrub more flexible 38/12738/10
Fan Yong [Sat, 13 Sep 2014 20:22:41 +0000 (04:22 +0800)]
LU-1453 scrub: auto trigger OI scrub more flexible

Generally, scanning the whole device for an OI scrub routine check may
take a long time. If the whole system only contains a few bad OI
mappings, then it is not worth triggering a full-speed OI scrub
automatically when some bad OI mapping is auto-detected. Instead, the
OI scrub can fix only the bad OI mappings that were found, and if more
and more bad OI mappings are found, exceeding a given threshold that
can be adjusted via a proc interface, then the OI scrub will run at
full speed to scan the whole device.

Currently, we offer two kinds of thresholds for triggering OI scrub
to scan the whole device:

1) "the total OI mappings count" vs "the bad OI mappings count".
   If this ratio is lower than the given threshold, which can be set
   via the proc interface "full_scrub_ratio", then trigger urgent
   mode OI scrub.

2) "the rate at which bad OI mappings are found". If the rate exceeds
   the given threshold, which can be adjusted via the proc interface
   "full_scrub_speed", then trigger urgent mode OI scrub.

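The two triggers can be sketched as below. It assumes integer division and treats a zero tunable as "disabled". Both are assumptions, and `struct demo_scrub_tunables` and `demo_need_full_scrub()` are hypothetical names, not the osd code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables mirroring the proc knobs named above. */
struct demo_scrub_tunables {
	uint64_t full_scrub_ratio; /* trigger if total/bad drops below this */
	uint64_t full_scrub_speed; /* trigger if bad mappings/sec exceeds this */
};

/* Decide whether to switch to a full-speed scan of the whole device. */
static bool demo_need_full_scrub(const struct demo_scrub_tunables *t,
				 uint64_t total, uint64_t bad,
				 uint64_t elapsed_sec)
{
	if (bad == 0)
		return false;

	/* Trigger 1: bad mappings are too large a fraction of the total. */
	if (t->full_scrub_ratio != 0 && total / bad < t->full_scrub_ratio)
		return true;

	/* Trigger 2: bad mappings are being found too quickly. */
	if (t->full_scrub_speed != 0 && elapsed_sec != 0 &&
	    bad / elapsed_sec > t->full_scrub_speed)
		return true;

	return false;
}
```

For example, with `full_scrub_ratio = 10000`, 500 bad mappings out of 1,000,000 gives a ratio of 2000, which is below 10000, so a full scan is triggered.
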
Test-Parameters: mdsfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs \
ostfilesystemtype=ldiskfs envdefinitions=ONLY=4 testlist=sanity-scrub
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ibc4592fef1da11994ec30eb348d20576be5ae54b
Reviewed-on: http://review.whamcloud.com/12738
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
5 years agoLU-1452 scrub: OI scrub skips uninitialized groups 37/12737/5
Fan Yong [Thu, 11 Sep 2014 23:55:43 +0000 (07:55 +0800)]
LU-1452 scrub: OI scrub skips uninitialized groups

If the ldiskfs group descriptor is marked as LDISKFS_BG_INODE_UNINIT,
it means that the inodes in that group have never been initialized,
so the otable-based iterator can skip this group directly to speed up
the scanning.

If the iteration position reaches the unused inodes area in the
group descriptor (indicated by bg_itable_unused), then skip the
remaining inodes in this group to reduce the scanning time.
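
The following sketch shows how many inodes per group still need visiting under these two rules. `struct demo_group_desc` is a hypothetical, simplified descriptor. `DEMO_BG_INODE_UNINIT` uses the same value (0x0001) as ext4's `EXT4_BG_INODE_UNINIT`, on the assumption that ldiskfs mirrors it.

```c
#include <stdint.h>

#define DEMO_BG_INODE_UNINIT 0x0001 /* assumed to match LDISKFS_BG_INODE_UNINIT */

/* Hypothetical, simplified group descriptor. */
struct demo_group_desc {
	uint16_t bg_flags;
	uint32_t bg_itable_unused; /* unused inodes at the end of the group */
};

/*
 * Inodes the iterator must visit in one group: none if the inode table
 * was never initialized, otherwise only the used prefix.
 */
static uint32_t demo_inodes_to_scan(const struct demo_group_desc *gd,
				    uint32_t inodes_per_group)
{
	if (gd->bg_flags & DEMO_BG_INODE_UNINIT)
		return 0;
	if (gd->bg_itable_unused >= inodes_per_group)
		return 0;
	return inodes_per_group - gd->bg_itable_unused;
}
```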

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ie8a2eb1269d288865ce51d40e211e3db54d062af
Reviewed-on: http://review.whamcloud.com/12737
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
5 years agoLU-5867 lfsck: Enable --create_mdtobj flag 78/12578/5
James Nunez [Tue, 9 Sep 2014 18:53:42 +0000 (02:53 +0800)]
LU-5867 lfsck: Enable --create_mdtobj flag

Using the --create_mdtobj flag in 'lctl lfsck_start'
creates an error. "create_mdtobj" is added to the
option struct so it will be recognized as a valid option.

When displaying the results of LFSCK, "create_mdtobj" is
not listed as a parameter. "create_mdtobj" is added to
the lfsck_param_names array so it will be printed when
used.

Also, added LSV_CREATE_MDTOBJ to the lfsck_request
valid options/flags.

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I1923bb9a71958b390b9abea248b328ac59c3caad
Reviewed-on: http://review.whamcloud.com/12578
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5963 nodemap: use proper hashing 81/12881/2
Alexey Lyashkov [Sat, 29 Nov 2014 08:55:22 +0000 (11:55 +0300)]
LU-5963 nodemap: use proper hashing

Don't hash an export pointer as a string.
Check for the situation where an export is not deleted from the
nodemap hash.
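
Hashing a pointer key by its value, rather than treating it as a string, looks roughly like the following. This sketch uses multiplicative (golden-ratio) hashing, which is a common technique chosen here for illustration and is not necessarily what the patch uses. `demo_hash_ptr()` is hypothetical, and `bits` must be between 1 and 32.

```c
#include <stdint.h>

/*
 * Hash an object by its address. The low 3 bits are dropped because
 * allocations are aligned; the multiply spreads the remaining bits and
 * the top `bits` bits form the bucket index (1 <= bits <= 32).
 */
static unsigned int demo_hash_ptr(const void *ptr, unsigned int bits)
{
	uint64_t key = (uint64_t)(uintptr_t)ptr >> 3;

	key *= 0x9E3779B97F4A7C15ULL; /* 64-bit golden-ratio constant */
	return (unsigned int)(key >> (64 - bits));
}
```

In kernel code, `hash_ptr()` from `<linux/hash.h>` serves the same purpose.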

Signed-off-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Change-Id: Id53281078f165ce984abebc74992bde30fcc9f31
Reviewed-on: http://review.whamcloud.com/12881
Tested-by: Jenkins
Reviewed-by: Andrew Perepechko <andrew_perepechko@xyratex.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Kit Westneat <kit.westneat@gmail.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5727 ldlm: revert the changes for lock canceling policy 33/12733/2
Jinshan Xiong [Sat, 15 Nov 2014 01:07:37 +0000 (17:07 -0800)]
LU-5727 ldlm: revert the changes for lock canceling policy

The changes to the LRU lock policy were introduced by commit bfae5a4e,
where I was trying to revise the policy used to pick locks for
canceling.

However, this caused two problems, as mentioned in LU-5727. The first
problem is that a lock can be picked for canceling only if the number
of LRU locks is over the preset LRU number AND the lock is aged; the
second problem is that mdc_cancel_weight() tends not to cancel OPEN
locks, so open locks can be kept forever and eventually exhaust
memory on the MDT side.

The first problem is fixed by patch e8812867. This patch reverts the
rest of the changes related to the LRU policy revision.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ie1dbcd15dc6e739d01ddcae01d7e637688a1d4b2
Reviewed-on: http://review.whamcloud.com/12733
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-5507 recovery: don't replay closed open 67/12667/4
Niu Yawei [Tue, 11 Nov 2014 05:54:34 +0000 (00:54 -0500)]
LU-5507 recovery: don't replay closed open

To avoid scanning the replay open list every time in
ptlrpc_free_committed(), the fix for LU-2613 (4322e0f9) changed
ptlrpc_free_committed() to skip the open list unless the import
generation has changed. That introduced a race which could cause a
closed open to be replayed:

1. The application calls ll_close_inode_openhandle() -> mdc_close()
   to close the file; rq_replay is cleared, but the open request is
   still on the imp_committed_list;

2. Before md_clear_open_replay_data() is called for the close, the
   client starts replay, and that closed open is replayed
   mistakenly;

3. The open replay interpret callback (mdc_replay_open) could race
   with mdc_clear_open_replay_data() at the end.

This patch fixes ptlrpc_free_committed() to make sure the open list
is scanned on recovery, preventing the closed open request from
being replayed.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ia67fe5d8d501a69bafbbd7e44bd612abb9c254c6
Reviewed-on: http://review.whamcloud.com/12667
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoLU-2833 tests: Unexempt sanity/48a for zfs 07/12607/3
Nathaniel Clark [Thu, 6 Nov 2014 20:51:37 +0000 (15:51 -0500)]
LU-2833 tests: Unexempt sanity/48a for zfs

Now that LU-2449 has landed, this test no longer fails on ZFS.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ie82f25ac0152dee7972a8a210d8669b59798e9a7
Reviewed-on: http://review.whamcloud.com/12607
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
5 years agoRevert "LU-3573 osd-zfs: Only advance zap cursor as needed" 87/12887/4
Andreas Dilger [Mon, 1 Dec 2014 09:07:00 +0000 (09:07 +0000)]
Revert "LU-3573 osd-zfs: Only advance zap cursor as needed"

This reverts commit 1da9b84b39ab36be9ba67a72ae175dde6521769b.

This patch introduced a far more serious regression in conf-sanity
test_32b LU-5924 and should be reverted until the problem is fixed.

Change-Id: I28f04a33d1c1bb4688d2ba9af6015a2737fb1d93
Reviewed-on: http://review.whamcloud.com/12887
Tested-by: Jenkins
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5079 tests: fix service_time in max_recovery_time() 24/12724/9
Jian Yu [Mon, 24 Nov 2014 22:32:55 +0000 (14:32 -0800)]
LU-5079 tests: fix service_time in max_recovery_time()

This patch fixes the calculation of service_time in
max_recovery_time() to use the new method in
check_and_start_recovery_timer() and new values of
CONNECTION_SWITCH_MAX and CONNECTION_SWITCH_INC.

The patch also fixes replay-dual sub-tests:
- to call wait_clients_import_state() instead of sleeping
  uncertain time in test_11()
- to add some margin into the recovery time comparison
  in test_20()

Test-Parameters: alwaysuploadlogs \
envdefinitions=SLOW=yes,ENABLE_QUOTA=yes,REPLAY_DUAL_EXCEPT=21 \
mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs \
ostfilesystemtype=ldiskfs mdtcount=1 \
testlist=replay-dual,replay-dual

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: Ife0fab28ed7b67ac61022f7e8a38957e3995b167
Reviewed-on: http://review.whamcloud.com/12724
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5650 mgc: check the import stat for lprocfs 27/12327/2
Hongchao Zhang [Tue, 9 Sep 2014 12:18:17 +0000 (20:18 +0800)]
LU-5650 mgc: check the import stat for lprocfs

In lprocfs_mgc_rd_ir_state(), the validity of the import state should
be checked before doing further work.

Change-Id: Ic582150a1cdbef331a929ce378d6e4f987a169fd
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/12327
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5888 utils: limit max_sectors_kb tunable setting 23/12723/2
Andreas Dilger [Fri, 14 Nov 2014 19:48:39 +0000 (12:48 -0700)]
LU-5888 utils: limit max_sectors_kb tunable setting

Limit the value set by mount.lustre set_blockdev_tunables() to a
reasonable 32MB instead of the maximum possible amount, since the
parsing of max_hw_sectors_kb might be bad, or it just returns a
value much larger than we need.

Also quiet the printing of the max_sectors_kb tunable that was added
in commit 9813961151e (http://review.whamcloud.com/9865) so that it
only prints something when the value is actually changed, instead of
printing it for every tunable even if the value is the same.
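
The two behaviours above, capping at 32MB and reporting only real changes, can be sketched as follows. `demo_pick_max_sectors_kb()` is hypothetical and is not the mount.lustre code. Reading the sysfs files and writing the new value are left out.

```c
#include <stdbool.h>

/* 32MB expressed in KB, the cap described above. */
#define DEMO_MAX_SECTORS_KB_CAP (32 * 1024)

/*
 * Choose the new max_sectors_kb: the hardware maximum, but never more
 * than 32MB. *changed tells the caller whether to write (and print)
 * the value, so nothing is logged when it already matches.
 */
static unsigned long demo_pick_max_sectors_kb(unsigned long max_hw_kb,
					      unsigned long current_kb,
					      bool *changed)
{
	unsigned long want = max_hw_kb;

	if (want > DEMO_MAX_SECTORS_KB_CAP)
		want = DEMO_MAX_SECTORS_KB_CAP;

	*changed = (want != current_kb);
	return want;
}
```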

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I648c2d8484ae5cef59ab62421cd01bc0ed02fcd6
Reviewed-on: http://review.whamcloud.com/12723
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Blake Caldwell <blakec@ornl.gov>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5862 changelog: Proper record remapping 74/12574/4
Henri Doreau [Wed, 5 Nov 2014 14:01:52 +0000 (15:01 +0100)]
LU-5862 changelog: Proper record remapping

Fixed changelog_remap_rec() to correctly remap records emitted
with jobid_var=disabled, i.e. delivered by new servers but with
no jobid field.

Updated sanity test 205 accordingly.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: Ia151e9bfde2def8819913ee658bde6b71ef3ab18
Reviewed-on: http://review.whamcloud.com/12574
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Robert Read <robert.read@intel.com>
6 years agoLU-5848 debug: more debug log for dt_sync 73/12573/3
Fan Yong [Sat, 6 Sep 2014 04:39:46 +0000 (12:39 +0800)]
LU-5848 debug: more debug log for dt_sync

Add some D_CACHE logs at the entry/exit for osp_sync()/osd_sync().

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Iaa7fbfbbadb9312528b5092d64615b277de6b679
Reviewed-on: http://review.whamcloud.com/12573
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5641 tests: ensure user daemon is in group bin on mds 62/12762/2
Bob Glossman [Tue, 18 Nov 2014 01:33:57 +0000 (17:33 -0800)]
LU-5641 tests: ensure user daemon is in group bin on mds

The previous fix for this problem only fixed groups on the client.
That worked as long as we were only testing with an el7 client,
but was an incomplete solution for el7 clients and servers.
The same fix needs to be applied to the MDS too, to keep things consistent.

Signed-off-by: Bob Gossman <bob.glossman@intel.com>
Change-Id: I411970c591a72b0393ed892f15da1f5d6340df8c
Reviewed-on: http://review.whamcloud.com/12762
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5892 lfsck: remove improper LASSERT in lfsck_needs_scan_dir 70/12670/2
Fan Yong [Sat, 6 Sep 2014 20:13:49 +0000 (04:13 +0800)]
LU-5892 lfsck: remove improper LASSERT in lfsck_needs_scan_dir

Inside lfsck_needs_scan_dir(), when the internal variable @fid
becomes the input @obj's parent FID, the internal variable @depth
may still be zero, so the original "LASSERT(depth > 0);" is improper
in that case. Remove it.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I64f10be682c51c6ac5cc1af3497eb569281fcd21
Reviewed-on: http://review.whamcloud.com/12670
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5832 utils: Fix buffer overflow in bound string copy 16/12516/8
Dmitry Eremin [Fri, 31 Oct 2014 10:45:26 +0000 (13:45 +0300)]
LU-5832 utils: Fix buffer overflow in bound string copy

The function 'strncpy' may incorrectly check buffer boundaries
and may overflow buffer 'info->name' of fixed size (256). Also
there is one similar error on line 1135.
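
The following is one common way to make such a copy safe. It is a strlcpy-style helper, offered as an illustration and not necessarily what the patch does. `demo_strlcpy()` is a hypothetical name. Unlike `strncpy(dst, src, sizeof(dst))`, it always NUL-terminates and reports the source length, so callers can detect truncation.

```c
#include <string.h>

/*
 * Copy src into dst (dst_size bytes), always NUL-terminating when
 * dst_size > 0. Returns strlen(src); a value >= dst_size means the
 * result was truncated.
 */
static size_t demo_strlcpy(char *dst, const char *src, size_t dst_size)
{
	size_t len = strlen(src);

	if (dst_size != 0) {
		size_t n = len < dst_size - 1 ? len : dst_size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```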

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I512ab6678fbf1d02bac2eb290fd13c22fca9dc2b
Reviewed-on: http://review.whamcloud.com/12516
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5568 lnet: fix kernel crash when network failed to start 12/12512/5
Amir Shehata [Fri, 31 Oct 2014 00:50:15 +0000 (17:50 -0700)]
LU-5568 lnet: fix kernel crash when network failed to start

When loading Lustre modules without a proper network configuration,
it always hits the following kernel panic:
LNetError: 105-4: Error -100 starting up LNI tcp
LNetError: 2145:0:(api-ni.c:823:lnet_unprepare())
 ASSERTION( list_empty(&the_lnet.ln_nis) ) failed:
LNetError: 2145:0:(api-ni.c:823:lnet_unprepare()) LBUG
Pid: 2145, comm: modprobe
Call Trace:
[<ffffffffa044f853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[<ffffffffa044fdf5>] lbug_with_loc+0x45/0xc0 [libcfs]
[<ffffffffa04f3267>] lnet_unprepare+0x297/0x340 [lnet]
[<ffffffffa04f3b5c>] LNetNIInit+0x25c/0x3e0 [lnet]
[<ffffffff81061bc6>] ? put_online_cpus+0x56/0x80
[<ffffffffa0983000>] ? init_module+0x0/0x1000 [ptlrpc]
[<ffffffffa081310c>] ptlrpc_ni_init+0x2c/0x1a0 [ptlrpc]
[<ffffffffa0983000>] ? init_module+0x0/0x1000 [ptlrpc]
[<ffffffffa0813291>] ptlrpc_init_portals+0x11/0xf0 [ptlrpc]
[<ffffffffa0983000>] ? init_module+0x0/0x1000 [ptlrpc]
[<ffffffffa09831c4>] init_module+0x1c4/0x1000 [ptlrpc]
[<ffffffff810020e2>] do_one_initcall+0xe2/0x190
[<ffffffff810ca7fb>] load_module+0x129b/0x1a90
[<ffffffff812da590>] ? ddebug_dyndbg_module_param_cb+0x0/0x60
[<ffffffff810c7133>] ? copy_module_from_fd.isra.43+0x53/0x150
[<ffffffff810cb1a6>] SyS_finit_module+0xa6/0xd0
[<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
...
This is because lnet_startup_lndnis() may add list items to
@the_lnet.ln_nis and @the_lnet.ln_nis_cpt before it fails, but the
lnet_startup_lndnis() failure path did not clean up those lists, thus
causing the assertion in lnet_unprepare().

Fix the assertion by cleaning up using lnet_shutdown_lndnis()
if the startup fails.

In a future enhancement, the NI startup API will be modified to
clean up after itself in case of failure.
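
The following is a sketch of the caller-side cleanup described above, using a hypothetical array-backed list in place of `the_lnet.ln_nis`. `demo_startup_nis()` returns on error without removing what it already added, and `demo_ni_init()` shuts the list down so that a later "list is empty" check holds. The error value matches the log: -100 is `ENETDOWN` on Linux.

```c
#include <errno.h>

#define DEMO_MAX_NIS 8

/* Hypothetical stand-in for the_lnet.ln_nis: NIs started so far. */
struct demo_net {
	int nis[DEMO_MAX_NIS];
	int ni_count;
};

/* Tear down every NI on the list, leaving it empty. */
static void demo_shutdown_nis(struct demo_net *net)
{
	net->ni_count = 0;
}

/*
 * Start up to count NIs; fail_at is the index that fails (-1 for none).
 * On error this returns without removing NIs it already added.
 */
static int demo_startup_nis(struct demo_net *net, int count, int fail_at)
{
	int i;

	for (i = 0; i < count; i++) {
		if (i == fail_at)
			return -ENETDOWN;
		if (net->ni_count >= DEMO_MAX_NIS)
			return -E2BIG;
		net->nis[net->ni_count++] = i;
	}
	return 0;
}

/* Caller-side fix: clean up on failure so the list is empty again. */
static int demo_ni_init(struct demo_net *net, int count, int fail_at)
{
	int rc = demo_startup_nis(net, count, fail_at);

	if (rc != 0)
		demo_shutdown_nis(net);
	return rc;
}
```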

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Ia344fd7c0f24c87b654554dda9e57bf5525edc85
Reviewed-on: http://review.whamcloud.com/12512
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5731 osp: flush async updates for osp_sync 59/12359/2
Fan Yong [Thu, 21 Aug 2014 04:19:25 +0000 (12:19 +0800)]
LU-5731 osp: flush async updates for osp_sync

Current osp_sync() only considers the async requests that are
handled by the osp_sync_thread, but ignores the async updates
that are handled directly by the background ptlrpcd threads.
Usually, such async updates are for LFSCK remote repairing.
This patch will flush all of them when dt_sync() is called.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I0e6d54120acbd8ab82cf776222277ae3b805812d
Reviewed-on: http://review.whamcloud.com/12359
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-4839 tests: Give copytool more time to start 82/12682/6
Nathaniel Clark [Wed, 12 Nov 2014 01:56:28 +0000 (20:56 -0500)]
LU-4839 tests: Give copytool more time to start

Copytool can take some time to start, and if the HSM archive directory
is on a busy NFS server, it can take a bit of time for the initial
opens to occur.  This allows those actions more time to complete which
should give this test a better chance of passing correctly.

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes \
mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs \
testlist=sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes,ONLY=60 \
mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
mdtcount=4 testlist=sanity-hsm,sanity-hsm,sanity-hsm,sanity-hsm

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I28bc57b92c34b4eee07ba34a2d976f2c39dc70dc
Reviewed-on: http://review.whamcloud.com/12682
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Michael MacDonald <michael.macdonald@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5707 lfsck: store namespace LFSCK statistics info in new EA 21/12321/5
Fan Yong [Tue, 9 Sep 2014 03:23:04 +0000 (11:23 +0800)]
LU-5707 lfsck: store namespace LFSCK statistics info in new EA

In Lustre 2.6 and older releases, the namespace LFSCK statistics info
was stored in the XATTR_NAME_LFSCK_NAMESPACE EA. In Lustre 2.7, the
namespace LFSCK introduces more statistics information, which would
require extending the XATTR_NAME_LFSCK_NAMESPACE EA. If the extended
data were still stored in XATTR_NAME_LFSCK_NAMESPACE, then after a
downgrade the old LFSCK would get -ERANGE when loading the new trace
file from disk, and LFSCK could not be started.

To avoid this, Lustre 2.7 uses a new EA, XATTR_NAME_LFSCK_NAMESPACE_V2,
to store the namespace LFSCK statistics info, and keeps a dummy
XATTR_NAME_LFSCK_NAMESPACE EA in the trace file for compatibility
with the old LFSCK.
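A minimal sketch of why growing the old EA breaks an older reader and how a versioned EA avoids it, assuming getxattr(2)-like semantics (a stored value larger than the caller's buffer fails with -ERANGE). The record layouts, the in-memory EA store, and the names "lfsck_namespace" / "lfsck_namespace_v2" are hypothetical; this is not the real LFSCK trace-file format.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical record layouts; the real LFSCK trace file differs. */
struct ns_stats_v1 {                 /* what an old reader expects */
	unsigned long long checked;
	unsigned long long repaired;
};

struct ns_stats_v2 {                 /* extended layout from a newer release */
	unsigned long long checked;
	unsigned long long repaired;
	unsigned long long dirs_checked;
};

/* Tiny in-memory stand-in for the EAs of one trace file. */
#define MAX_XATTRS 4
struct fake_xattr {
	const char *name;
	unsigned char value[64];
	size_t len;
};
static struct fake_xattr xattrs[MAX_XATTRS];

/* Create or overwrite an EA. */
static int fake_setxattr(const char *name, const void *val, size_t len)
{
	if (len > sizeof(xattrs[0].value))
		return -ERANGE;
	for (int i = 0; i < MAX_XATTRS; i++) {
		if (xattrs[i].name == NULL || strcmp(xattrs[i].name, name) == 0) {
			xattrs[i].name = name;
			memcpy(xattrs[i].value, val, len);
			xattrs[i].len = len;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Like getxattr(2): -ERANGE if the stored value does not fit in buf. */
static long fake_getxattr(const char *name, void *buf, size_t size)
{
	for (int i = 0; i < MAX_XATTRS && xattrs[i].name != NULL; i++) {
		if (strcmp(xattrs[i].name, name) != 0)
			continue;
		if (xattrs[i].len > size)
			return -ERANGE;
		memcpy(buf, xattrs[i].value, xattrs[i].len);
		return (long)xattrs[i].len;
	}
	return -ENODATA;
}

/* Old reader: knows only the original EA name and the v1 layout. */
static long old_load(struct ns_stats_v1 *st)
{
	return fake_getxattr("lfsck_namespace", st, sizeof(*st));
}

/* Problematic writer: grows the original EA in place. */
static int naive_store(const struct ns_stats_v2 *st)
{
	return fake_setxattr("lfsck_namespace", st, sizeof(*st));
}

/* Versioned writer: full stats under a new name plus a dummy v1 EA. */
static int versioned_store(const struct ns_stats_v2 *st)
{
	struct ns_stats_v1 dummy = { 0, 0 };
	int rc = fake_setxattr("lfsck_namespace_v2", st, sizeof(*st));

	return rc ? rc : fake_setxattr("lfsck_namespace", &dummy, sizeof(dummy));
}
```

With naive_store() the old reader fails with -ERANGE; after versioned_store() it reads the dummy record successfully, while a new reader finds the full statistics under the new name.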

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I55b5adb962434013b00e3938a67b671010ecc206
Reviewed-on: http://review.whamcloud.com/12321
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
6 years ago LU-5740 build: add RHEL6.6 [2.6.32-504.el6] to build selections 09/12609/4
Bob Glossman [Tue, 28 Oct 2014 17:25:04 +0000 (10:25 -0700)]
LU-5740 build: add RHEL6.6 [2.6.32-504.el6] to build selections

Add support for building with RHEL6.6 kernel version 2.6.32-504.el6
while retaining the ability to build with older RHEL 6.5 kernels.
New ldiskfs patch series for el6.6 is included.

Test-Parameters: clientdistro=el6.6 mdsdistro=el6.6\
  ossdistro=el6.6 mdsfilesystemtype=ldiskfs\
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I780feefbbc179607762c0d2997fd608830f3db8b
Reviewed-on: http://review.whamcloud.com/12609
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5941 build: build dkms build at installed source tree 02/12802/2
Minh Diep [Thu, 20 Nov 2014 16:10:53 +0000 (08:10 -0800)]
LU-5941 build: build dkms build at installed source tree

Port from:
https://github.com/zfsonlinux/zfs/commit/46bf86a9635266dd399443f5bf5c5f8d0f280aa2

Signed-off-by: Minh Diep <minh.diep@intel.com>
Change-Id: If0c8543d955594b4f9dc305c35271a9cc94e1bbd
Reviewed-on: http://review.whamcloud.com/12802
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5941 dkms: make lustre-dkms require 2.2.0.3-28.git.7c3e7c5 01/12801/2
Minh Diep [Thu, 20 Nov 2014 16:05:25 +0000 (08:05 -0800)]
LU-5941 dkms: make lustre-dkms require 2.2.0.3-28.git.7c3e7c5

Due to a bug in dkms, we need to enforce the use of
dkms-2.2.0.3-28.git.7c3e7c5.

Signed-off-by: Minh Diep <minh.diep@intel.com>
Change-Id: I9ad8ccaa5106b221f41a50c520d8bdfef160c065
Reviewed-on: http://review.whamcloud.com/12801
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-2524 test: Code clean up for conf-sanity 30/10530/7
James Nunez [Fri, 30 May 2014 19:20:21 +0000 (13:20 -0600)]
LU-2524 test: Code clean up for conf-sanity

The patch changing the tdir variable to a single directory
(http://review.whamcloud.com/#/c/8123/) has landed, so we can
now do miscellaneous cleanup, including:

* Remove the `-p` (parents) option from many calls to mkdir
* Replace `lfs setstripe` with $SETSTRIPE
* Replace `lfs getstripe` with $GETSTRIPE
* Replace `lctl` with $LCTL
* Add checks that call `error` and/or add error messages
  for a variety of common functions
* Replace backtick command substitution (`…`) with $(...)
* Remove the line-continuation escape after |, ||, & and && operators
* Modify directory and file names to use $tdir and $tfile
* Remove 'mkdir -p $MOUNT' when 'mount_client $MOUNT' is
  called right before or after mkdir

Test-Parameters: alwaysuploadlogs \
envdefinitions=SLOW=yes testlist=conf-sanity

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I94bd51ce2d2f225736e12c4f9ac1a86a3d8a23d8
Reviewed-on: http://review.whamcloud.com/10530
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years ago LU-5814 llite: remove ll_objects_destroy() 18/12618/2
John L. Hammond [Fri, 7 Nov 2014 15:00:09 +0000 (09:00 -0600)]
LU-5814 llite: remove ll_objects_destroy()

Remove ll_objects_destroy(). This function is not needed for
interoperability with servers of version 2.4 or higher (after lustre
commit 5165cdd4).

Remove the then unused function lov_destroy() and its supporting
functions. Remove the lsm_destroy method of struct lsm_operations.

Remove the unused struct lov_stripe_md, MD export, and capa parameters
from obd_destroy() and its implementations.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: If8634b3d88a660d00891219c348622ec45361316
Reviewed-on: http://review.whamcloud.com/12618
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5418 echo: replace lov_stripe_md with lov_oinfo 47/12447/3
John L. Hammond [Wed, 29 Oct 2014 17:15:06 +0000 (12:15 -0500)]
LU-5418 echo: replace lov_stripe_md with lov_oinfo

In echo_client replace uses of struct lov_stripe_md with struct
lov_oinfo (since the instances of the former really only contained a
single instance of the latter). Remove the then unnecessary functions
echo_alloc_memmd(), echo_free_memmd(), osc_unpackmd(), and
obd_alloc_memmd(). Remove the struct lov_stripe_md * parameter from
obd_create(). Flatten osc_create() and osc_real_create() into a single
function.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I5fe276bcc56e1fa8138a4d3f20b9d5297cf74f3f
Reviewed-on: http://review.whamcloud.com/12447
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years ago LU-3962 iokit: fix whitespace in scripts 56/10456/6
Andreas Dilger [Tue, 27 May 2014 17:53:01 +0000 (11:53 -0600)]
LU-3962 iokit: fix whitespace in scripts

Fix the whitespace in mds-survey and obdfilter-survey to use tabs
instead of 4-space indentation.  Fix coding style in several places.

Remove the use of a python script just to get the page size.  Instead,
use "getconf PAGE_SIZE" to do this.
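For C code, the counterpart of "getconf PAGE_SIZE" is the POSIX call sysconf(_SC_PAGESIZE), which should report the same value on Linux. A small sketch; the helper names page_size() and bytes_to_pages() are illustrative only and do not appear in the scripts.

```c
#include <assert.h>
#include <unistd.h>

/* Page size as reported by sysconf(); returns -1 if unavailable. */
static long page_size(void)
{
	return sysconf(_SC_PAGESIZE);
}

/* Number of pages needed to hold @bytes, rounding up. */
static long bytes_to_pages(long bytes, long psize)
{
	return (bytes + psize - 1) / psize;
}
```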

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes \
testlist=mds-survey,obdfilter-survey

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I921007043c360b45d45fc03a8237edea9a3ebbe5
Reviewed-on: http://review.whamcloud.com/10456
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5537 ptlrpc: Fix an rq_no_reply assertion failure 40/11740/3
Li Wei [Wed, 3 Sep 2014 09:02:22 +0000 (17:02 +0800)]
LU-5537 ptlrpc: Fix an rq_no_reply assertion failure

An OSS had an assertion failure:

  LustreError: 5366:0:(ldlm_lib.c:2689:target_bulk_io()) @@@ timeout
  on bulk GET after 0+0s  req@ffff88083a61b400
  x1476486691018500/t0(4300509964)
  o4->8dda3382-83f8-6445-5eea-828fd59e4a06@192.168.1.116@o2ib1:0/0
  lens 504/448 e 391470 to 0 dl 1408494729 ref 2 fl Complete:/4/0 rc
  0/0
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) ASSERTION(
  req->rq_no_reply == 0 ) failed:
  Lustre: soaked-OST0000: Bulk IO write error with
  8dda3382-83f8-6445-5eea-828fd59e4a06 (at 192.168.1.116@o2ib1),
  client will retry: rc -110
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) LBUG
  Pid: 5432, comm: ll_ost_io03_003

  Call Trace:
  [<ffffffffa0641895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
  [<ffffffffa0641e97>] lbug_with_loc+0x47/0xb0 [libcfs]
  [<ffffffffa09cda4c>] ptlrpc_send_reply+0x4ec/0x7f0 [ptlrpc]
  [<ffffffffa09d4aae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
  [<ffffffffa09e4d75>] ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
  [<ffffffffa09dc1e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
  [<ffffffffa09e66f8>] ptlrpc_main+0x12e8/0x1990 [ptlrpc]
  [<ffffffff81069290>] ? pick_next_task_fair+0xd0/0x130
  [<ffffffff81529246>] ? schedule+0x176/0x3b0
  [<ffffffffa09e5410>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
  [<ffffffff8109abf6>] kthread+0x96/0xa0
  [<ffffffff8100c20a>] child_rip+0xa/0x20
  [<ffffffff8109ab60>] ? kthread+0x0/0xa0
  [<ffffffff8100c200>] ? child_rip+0x0/0x20

The thread in tgt_brw_write() had decided not to reply by setting
rq_no_reply, right before another thread tried to send an early reply
for the request.
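This is a check-then-act race between the handler and the early-reply path. A minimal sketch of one generic way to close such a race, assuming both sides take the same per-request lock so that "decide not to reply" and "check, then send early reply" cannot interleave. struct fake_req and both helpers are hypothetical and do not reproduce the actual patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical request state; not the real struct ptlrpc_request. */
struct fake_req {
	pthread_mutex_t lock;
	bool no_reply;        /* handler decided not to reply at all */
	int early_replies;    /* early replies actually "sent" */
};

/* Handler thread: record, under the lock, that no reply will be sent. */
static void req_set_no_reply(struct fake_req *req)
{
	pthread_mutex_lock(&req->lock);
	req->no_reply = true;
	pthread_mutex_unlock(&req->lock);
}

/* Early-reply thread: check the flag and "send" under the same lock,
 * so a request already marked no_reply is never answered early. */
static bool req_try_early_reply(struct fake_req *req)
{
	bool sent = false;

	pthread_mutex_lock(&req->lock);
	if (!req->no_reply) {
		req->early_replies++;   /* stand-in for sending the reply */
		sent = true;
	}
	pthread_mutex_unlock(&req->lock);
	return sent;
}
```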

Change-Id: I9096a098621a38610c0d0d2dff016c012fc4b7f2
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/11740
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
6 years ago LU-20 kernel: increase BH_LRU_SIZE to 16 77/12577/2
Sebastien Buisson [Wed, 5 Nov 2014 15:34:14 +0000 (16:34 +0100)]
LU-20 kernel: increase BH_LRU_SIZE to 16

The kernel community did not want a complicated mechanism for
modifying BH_LRU_SIZE, so it was proposed to set it directly to 16,
and this was accepted. The change is merged in the upstream kernel:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=86cf78d73de8c6bfa89804b91ee0ace71a459961
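BH_LRU_SIZE sizes a small most-recently-used cache (in the kernel, a per-CPU array of buffer_head pointers checked before the full buffer-cache lookup). The sketch below is a simplified model of such a cache, with block numbers standing in for buffer heads; the struct and helper names are hypothetical and this is not the kernel implementation. A larger size keeps more recently used metadata blocks cached, at the cost of a longer linear scan.

```c
#include <assert.h>
#include <string.h>

#define LRU_SIZE 16   /* mirrors the BH_LRU_SIZE value discussed above */

/* Slot 0 holds the most recently used block; 0 means an empty slot. */
struct small_lru {
	unsigned long blocks[LRU_SIZE];
};

/* Look up a block; on a hit, move it to the front and return 1. */
static int lru_lookup(struct small_lru *lru, unsigned long blk)
{
	for (int i = 0; i < LRU_SIZE; i++) {
		if (blk != 0 && lru->blocks[i] == blk) {
			memmove(&lru->blocks[1], &lru->blocks[0],
				i * sizeof(lru->blocks[0]));
			lru->blocks[0] = blk;
			return 1;
		}
	}
	return 0;
}

/* Install a block at the front, evicting the least recently used one. */
static void lru_install(struct small_lru *lru, unsigned long blk)
{
	memmove(&lru->blocks[1], &lru->blocks[0],
		(LRU_SIZE - 1) * sizeof(lru->blocks[0]));
	lru->blocks[0] = blk;
}
```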

Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Change-Id: I71fb455de9ec70ed90f86d402ae76ecfba1e1e61
Reviewed-on: http://review.whamcloud.com/12577
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
6 years ago LU-5729 osd: iput in case of error in osd_scrub_setup 25/12325/4
Sergey Cheremencev [Fri, 26 Sep 2014 13:00:56 +0000 (17:00 +0400)]
LU-5729 osd: iput in case of error in osd_scrub_setup

If osd_scrub_file_store() fails with -ENOSPC, iput() is needed.
Otherwise dmesg shows the message "VFS: Busy inodes after unmount
of vdb. Self-destruct in 5 seconds. Have a nice day..."
Also add osd_oi_fini() for the case of an error from
osd_initial_OI_scrub() or osd_scrub_start().
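The fix follows the usual kernel error-unwinding pattern: every resource acquired before the failing step is released, in reverse order, via goto labels. A minimal sketch with hypothetical counters standing in for the inode reference (iput) and the OI setup (osd_oi_fini); the step order here is an assumption, not the real osd_scrub_setup() flow.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state: counters stand in for real kernel resources. */
struct fake_dev {
	int inode_refs;   /* like an inode reference released by iput() */
	int oi_ready;     /* like OI state released by osd_oi_fini() */
	int fail_step;    /* which step should fail: 0 = none */
};

static int inject(struct fake_dev *d, int n, int err)
{
	return d->fail_step == n ? err : 0;
}

static int scrub_setup(struct fake_dev *d)
{
	int rc;

	d->inode_refs++;               /* take an inode reference */
	rc = inject(d, 1, -ENOSPC);    /* like storing the scrub file */
	if (rc)
		goto out_iput;

	d->oi_ready = 1;               /* like setting up the OI files */
	rc = inject(d, 2, -EIO);       /* like starting the initial scrub */
	if (rc)
		goto out_oi_fini;

	return 0;

out_oi_fini:
	d->oi_ready = 0;               /* undo in reverse order ... */
out_iput:
	d->inode_refs--;               /* ... so nothing is leaked */
	return rc;
}
```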

Change-Id: Ibc6f487c9bd5b07f09cb3f7e3b5fc2bf1e329fb0
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Xyratex-bug-id: MRP-2109
Reviewed-on: http://review.whamcloud.com/12325
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5855 lfsck: misc fixes for zfs-based backend 52/12552/5
Fan Yong [Wed, 3 Sep 2014 16:25:33 +0000 (00:25 +0800)]
LU-5855 lfsck: misc fixes for zfs-based backend

This patch contains several fixes to make LFSCK work in DNE mode
with a ZFS-based backend.

Test-Parameters: mdsfilesystemtype=zfs mdtfilesystemtype=zfs ostfilesystemtype=zfs mdscount=2 mdtcount=2 testlist=sanity-lfsck
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I8e8758336d4ce67667f7e3586475ddd72db2d419
Reviewed-on: http://review.whamcloud.com/12552
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5833 lfsck: handle lfsck_open_dir() return-value properly 33/12533/3
Fan Yong [Tue, 2 Sep 2014 11:06:03 +0000 (19:06 +0800)]
LU-5833 lfsck: handle lfsck_open_dir() return-value properly

Inside lfsck_prep(), the value returned by lfsck_open_dir() must be
handled properly instead of being returned directly to the caller.
For example, a positive value from lfsck_open_dir() means the end of
the current directory has been reached; if lfsck_prep() passes that
value on to its caller, the whole first-stage LFSCK scan is wrongly
regarded as done.

Test-Parameters: mdsfilesystemtype=zfs mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=sanity-lfsck
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9e5c32b8594a65f1b605196373034ace6c9d1881
Reviewed-on: http://review.whamcloud.com/12533
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years ago LU-5817 clio: Do not allow group locks with gid 0 59/12459/4
Patrick Farrell [Mon, 10 Nov 2014 07:39:29 +0000 (01:39 -0600)]
LU-5817 clio: Do not allow group locks with gid 0

When a group lock with GID=0 is released (put_grouplock is
called), an assertion in cl_put_grouplock is hit.

Group lock requests with GID=0 should not be allowed; return
-EINVAL for them instead.

Also fix random_group_id so it never returns gid==0.
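A minimal sketch of the two rules above: reject gid 0 with -EINVAL, and generate only nonzero ids. check_group_lock_gid() and random_nonzero_gid() are hypothetical stand-ins, not the actual llite code or the test-suite random_group_id helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Reject group-lock requests that use gid 0. */
static int check_group_lock_gid(unsigned long gid)
{
	return gid == 0 ? -EINVAL : 0;
}

/* rand() % RAND_MAX is in [0, RAND_MAX - 1], so the result is in
 * [1, RAND_MAX] and can never be 0. */
static unsigned long random_nonzero_gid(void)
{
	return (unsigned long)(rand() % RAND_MAX) + 1;
}
```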

Change-Id: I56e58791742809da5353a4d8dfbf3b80a22f3814
Signed-off-by: Patrick Farrell <paf@cray.com>
Reviewed-on: http://review.whamcloud.com/12459
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
6 years ago LU-5732 hsm: sanity check for progress input 85/12285/4
Frank Zago [Sun, 12 Oct 2014 18:57:05 +0000 (13:57 -0500)]
LU-5732 hsm: sanity check for progress input

During an HSM archive or restore, the progress is reported by the
copytool, in userspace. That value may be bogus. For instance, this
will crash the MDS in interval_set():

he.offset = -1;
he.length = 10;
rc = llapi_hsm_action_progress(hcp, &he, length, 0);

So check that userspace is giving a sane progress extent value.
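With offset = -1 stored in an unsigned 64-bit field, offset + length wraps around. A minimal sketch of a wrap-around check, assuming the extent is a pair of unsigned 64-bit byte values; struct progress_extent and check_progress_extent() are hypothetical, and the real MDS check may validate more than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of an HSM progress extent, in bytes. */
struct progress_extent {
	uint64_t offset;
	uint64_t length;
};

/* Reject extents whose end (offset + length) would overflow. */
static int check_progress_extent(const struct progress_extent *he)
{
	if (he->offset > UINT64_MAX - he->length)
		return -EINVAL;
	return 0;
}
```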

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I0eb3fa9a66400a4ff3cee2f256c08e1d84744111
Reviewed-on: http://review.whamcloud.com/12285
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-3270 statahead: small fixes and cleanup 67/9667/12
Lai Siyao [Fri, 14 Mar 2014 12:10:32 +0000 (20:10 +0800)]
LU-3270 statahead: small fixes and cleanup

small fixes:
* when 'unplug' is set for ll_statahead(), sa_put() shouldn't kill
  the entry found, because its inflight RPC may not have finished yet.
* remove 'sai_generation' and add 'lli_sa_generation', because the
  former is not safe to access without a lock.
* revalidate_statahead_dentry() may fail to wait for the statahead
  entry to become ready; in this case it should not release the
  entry, because it may still be used by an inflight statahead RPC.

cleanups:
* rename ll_statahead_enter() to ll_statahead().
* move dentry 'lld_sa_generation' update to ll_statahead() to
  simplify code and logic.
* other small cleanups.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I65759c7dfcbe879b42f14152dbfe5949e3d37ea0
Reviewed-on: http://review.whamcloud.com/9667
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5473 tests: print space usage in sanity test_51b 85/12185/7
Andreas Dilger [Fri, 24 Oct 2014 07:23:14 +0000 (00:23 -0700)]
LU-5473 tests: print space usage in sanity test_51b

In sanity.sh test_51b print out the space usage before and after
the test so that the failure can be debugged.

Skip test_51b and test_51ba for ZFS when running regular review
tests, since there isn't a limit of 60000 subdirectories (ZFS
nlink is a 64-bit number), and they take a long time to run in
a VM (20 minutes combined).

Test-Parameters: alwaysuploadlogs \
envdefinitions=SLOW=yes,ENABLE_QUOTA=yes,ONLY=51 \
mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs \
clientdistro=el6 ossdistro=el6 mdsdistro=el6 \
mdtcount=1 mdssizegb=2 ostcount=2 ostsizegb=8 \
testlist=sanity

Test-Parameters: alwaysuploadlogs \
envdefinitions=SLOW=yes,ENABLE_QUOTA=yes,ONLY=51 \
mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs \
clientdistro=el6 ossdistro=el6 mdsdistro=el6 \
mdtcount=1 mdssizegb=2 ostcount=2 ostsizegb=8 nettypes=o2ib \
testlist=sanity

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I21b072fbcb05dea3fd7803bf3353de11ffbcab07
Reviewed-on: http://review.whamcloud.com/12185
Tested-by: Jenkins
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5778 lod: Fix lod_qos_statfs_update() 17/12617/2
Aurelien Degremont [Fri, 7 Nov 2014 13:10:03 +0000 (14:10 +0100)]
LU-5778 lod: Fix lod_qos_statfs_update()

When an OST is sick or inactive, LOD cannot fetch its statfs
information. In lod_qos_statfs_update(), this prevented LOD from
getting the information for the other OSTs, because the refresh
stopped at the first error.
This patch fixes this behaviour so that one failing OST no longer
stops the refresh of the others.
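A minimal sketch of a refresh loop that skips a failing OST instead of aborting, assuming per-OST statfs calls that can fail independently. struct ost_info, fake_statfs() and refresh_all() are hypothetical and do not mirror the real LOD data structures.

```c
#include <assert.h>
#include <errno.h>

#define NUM_OSTS 4

/* Hypothetical per-OST state. */
struct ost_info {
	int active;       /* 0 = sick or inactive, statfs fails */
	long free_kb;     /* last known free space */
	int refreshed;    /* set when free_kb was updated this round */
};

static int fake_statfs(const struct ost_info *ost, long *free_kb)
{
	if (!ost->active)
		return -ENOTCONN;
	*free_kb = 1000;  /* pretend the OST answered */
	return 0;
}

/* Refresh every OST; a failure only skips that OST.
 * Returns the number of OSTs that could not be refreshed. */
static int refresh_all(struct ost_info *osts, int n)
{
	int failed = 0;

	for (int i = 0; i < n; i++) {
		long free_kb;

		osts[i].refreshed = 0;
		if (fake_statfs(&osts[i], &free_kb) != 0) {
			failed++;
			continue;  /* the remaining OSTs still get updated */
		}
		osts[i].free_kb = free_kb;
		osts[i].refreshed = 1;
	}
	return failed;
}
```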

Signed-off-by: Aurelien Degremont <aurelien.degremont@cea.fr>
Change-Id: Id0217f228381ef7a41fdbfd99f5499dcc97ace0e
Reviewed-on: http://review.whamcloud.com/12617
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5865 lfsck: avoid NULL pointer 97/12597/2
Fan Yong [Thu, 4 Sep 2014 01:37:08 +0000 (09:37 +0800)]
LU-5865 lfsck: avoid NULL pointer

Do not pass NULL as the @lmv parameter of lfsck_record_lmv(); then
the subsequent handling inside lfsck_record_lmv() does not need to
worry about the "lmv == NULL" case.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I5f308818edd5ded2c4ccc7d59fb0908791b8aae3
Reviewed-on: http://review.whamcloud.com/12597
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-181 ptlrpc: reorganize ptlrpc_request 06/8806/16
Liang Zhen [Sun, 5 Jan 2014 15:00:26 +0000 (23:00 +0800)]
LU-181 ptlrpc: reorganize ptlrpc_request

Some members of ptlrpc_request are used only on the client side and
others only on the server side. This patch moves these members into
separate structures and puts those structures into a union.

This decreases the size of ptlrpc_request by about 300 bytes. Besides
saving memory, it can also reduce the memory footprint while
processing requests.

Another change in this patch: osp no longer uses rq_exp_list,
because it is now a server-only member. Instead, osp uses
ptlrpc_req_async_args to store the commit_cb parameters.
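A minimal sketch of the layout idea: a request is either client-side or server-side, so the two groups of members can share storage in a union and the struct shrinks by roughly the size of the smaller group. All field names are hypothetical; they are not the real ptlrpc_request members, and the real saving depends on the actual fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client-only and server-only member groups. */
struct cli_part {
	void *interpret_cb;
	void *async_args[8];
	long resend_count;
};

struct srv_part {
	void *export_list_link[2];
	void *reply_state;
	long arrival_time[4];
};

/* Flat layout: every request carries both groups. */
struct req_flat {
	long xid;
	int opcode;
	struct cli_part cli;
	struct srv_part srv;
};

/* Union layout: only the group that applies is ever used. */
struct req_union {
	long xid;
	int opcode;
	union {
		struct cli_part cli;
		struct srv_part srv;
	} u;
};
```

The trade-off is that code must never touch a server-only member on a client request (or vice versa), which is why a member such as rq_exp_list can no longer be used from client-side code like osp.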

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: Id910ac225b8e9d33a0cae40b3124ce55f1a3fbc9
Reviewed-on: http://review.whamcloud.com/8806
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
6 years ago LU-5654 osp: Call obd_fid_fini() on osp_init0() error path 37/12037/2
Li Wei [Wed, 24 Sep 2014 09:09:48 +0000 (17:09 +0800)]
LU-5654 osp: Call obd_fid_fini() on osp_init0() error path

osp_init0() should call obd_fid_fini() on its error path to avoid
leaks.

Change-Id: I1a679db172ae60c74049d2dd3e111c93cfcbeda2
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/12037
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
6 years ago LU-5854 lnet: make YAML output/input consistent 56/12556/6
Amir Shehata [Tue, 4 Nov 2014 20:49:51 +0000 (12:49 -0800)]
LU-5854 lnet: make YAML output/input consistent

The YAML formats used for configuring and showing networks were not
consistent. This patch makes both formats consistent.

For example:
net:
    - net: tcp
      nid: 192.168.206.130@tcp
      status: up
      interfaces:
          0: eth0
      tunables:
          peer_timeout: 180
          peer_credits: 8
          peer_buffer_credits: 0
          credits: 256
      CPT: [0,1]

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Id4314679930709ac43104f1ba544bb6d1ca8cb0a
Reviewed-on: http://review.whamcloud.com/12556
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5831 lfsck: extend lfsck_request::lr_pool_name 34/12534/3
Fan Yong [Mon, 1 Sep 2014 12:11:30 +0000 (20:11 +0800)]
LU-5831 lfsck: extend lfsck_request::lr_pool_name

Fix some issues found during Lustre source static analysis:

1) Extend lfsck_request::lr_pool_name size to match the
   lmv_mds_md_v1::lmv_pool_name.
2) Check lfsck->li_obj_dir inside lfsck_close_dir() before using.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I84443089135c5de1b9fa89eb76e5cd623412a01f
Reviewed-on: http://review.whamcloud.com/12534
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
6 years ago LU-4217 build: Remove ACL mount options 54/12154/3
James Nunez [Wed, 1 Oct 2014 14:06:08 +0000 (08:06 -0600)]
LU-4217 build: Remove ACL mount options

The "acl" mount option on the client has been
deprecated since Lustre 1.8. The "acl" mount
option code is now obsolete and is removed.

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ib765476a71ebb732d9ffda60b336530e0a758943
Reviewed-on: http://review.whamcloud.com/12154
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5006 mdd: don't call attr_set on object create 43/12243/4
Niu Yawei [Thu, 9 Oct 2014 08:11:31 +0000 (04:11 -0400)]
LU-5006 mdd: don't call attr_set on object create

The object attributes are already initialized in the OSD layer when
the object is created, so there is no need to initialize them again
in the MDD layer.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I6f4094d4384b2c153d4dad2666d64281c0450059
Reviewed-on: http://review.whamcloud.com/12243
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5848 lnet: fix inconsistent seq_no names 62/12562/2
Amir Shehata [Tue, 4 Nov 2014 21:29:06 +0000 (13:29 -0800)]
LU-5848 lnet: fix inconsistent seq_no names

When YAML output is printed, the key "seqno" is used, but when it
is parsed, the key "seq_no" is expected.
This patch makes them consistent.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Iabf5394e858007c7f6e87c7baf892887da88f8e3
Reviewed-on: http://review.whamcloud.com/12562
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-4730 utils: lctl get_param, set_param cleanup 45/9545/10
Andreas Dilger [Mon, 3 Nov 2014 15:42:18 +0000 (10:42 -0500)]
LU-4730 utils: lctl get_param, set_param cleanup

Cleanup "lctl get_param" and "lctl set_param" code and error handling.
Deny "parameters" with embedded relative paths to avoid strangeness.
Consistently return an error if any of multiple parameters fails to
be set, even if the last one succeeded.  Remove deprecated full-path
handling.
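A minimal sketch of the "consistent error" behaviour: apply every parameter, but remember the first failure so that a later success cannot mask it. set_params() and demo_set_one() are hypothetical; keeping the first (rather than last) error is a design choice made for this sketch, not necessarily what lctl does.

```c
#include <assert.h>
#include <errno.h>

typedef int (*set_one_fn)(const char *name);

/* Demo setter: pretend any name starting with 'x' is invalid. */
static int demo_set_one(const char *name)
{
	return name[0] == 'x' ? -EINVAL : 0;
}

/* Try every parameter; return the first error seen, or 0. */
static int set_params(const char *const *names, int count, set_one_fn set_one)
{
	int rc = 0;

	for (int i = 0; i < count; i++) {
		int rc2 = set_one(names[i]);

		if (rc2 != 0 && rc == 0)
			rc = rc2;
	}
	return rc;
}
```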

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I1004b5b4da4dc9b5825ef498758e248ed52f4141
Reviewed-on: http://review.whamcloud.com/9545
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Chao Wang <chao.ornl@gmail.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5329 mgs: Remove nibtbl swab code for 2.2 clients 10/12010/3
James Nunez [Mon, 22 Sep 2014 22:50:59 +0000 (16:50 -0600)]
LU-5329 mgs: Remove nibtbl swab code for 2.2 clients

Remove obsolete code that allows compatibility with
Lustre 2.2 clients.

Due to a bug, Lustre 2.2 clients always swab nidtbl
entries even if the server and client have the
same endianness. The workaround was to let the
servers do the swabbing for the client.

Now, clients do the swabbing themselves.
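"Swab" here means byte-swapping multi-byte fields. A common receiver-side pattern is to detect the sender's byte order from a magic value and swap only when the magic arrives byte-swapped. The sketch below shows that pattern; the record layout, DEMO_MAGIC and demo_entry_fixup() are hypothetical, and whether the nidtbl code uses a magic value is not stated here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAGIC 0x0BD00BD0u   /* hypothetical record magic */

/* Reverse the byte order of a 32-bit value. */
static uint32_t swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

struct demo_entry {
	uint32_t magic;
	uint32_t nid_count;
};

/* Swab only if the magic shows the sender used the other byte order.
 * Returns 0 on success, -1 if the magic is not recognized. */
static int demo_entry_fixup(struct demo_entry *e)
{
	if (e->magic == DEMO_MAGIC)
		return 0;                     /* same endianness */
	if (e->magic == swab32(DEMO_MAGIC)) {
		e->magic = swab32(e->magic);
		e->nid_count = swab32(e->nid_count);
		return 0;
	}
	return -1;
}
```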

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I420ca986c0a68343be07272bb419cbdb1cebf148
Reviewed-on: http://review.whamcloud.com/12010
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
6 years ago LU-2675 libcfs: remove libcfs posix headers 87/11987/5
John L. Hammond [Mon, 3 Nov 2014 19:42:49 +0000 (14:42 -0500)]
LU-2675 libcfs: remove libcfs posix headers

Remove libcfs/include/libcfs/posix/. Include what was needed from
libcfs/posix/libcfs.h into libcfs/libcfs.h or in the appropriate .c
file.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ia3016c83f13554b617c5f4a6dcc86adf222d4e49
Reviewed-on: http://review.whamcloud.com/11987
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years ago LU-5551 lustre: fix messages with missisng newlines 96/11996/5
John L. Hammond [Fri, 19 Sep 2014 15:09:51 +0000 (10:09 -0500)]
LU-5551 lustre: fix messages with missisng newlines

Add missing newlines to four CERROR() messages. Restore the trailing
newline in the definition of OSC_DUMP_GRANT(). Remove an unnecessary
CDEBUG() from ldlm_pool_recalc().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I549de59dd9cd205e1a6d0fbcd70ccd1cbf5e389b
Reviewed-on: http://review.whamcloud.com/11996
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years ago LU-5871 lod: Do not return EAGAIN in lod_object_init 86/12586/4
Wang Di [Wed, 5 Nov 2014 18:46:59 +0000 (10:46 -0800)]
LU-5871 lod: Do not return EAGAIN in lod_object_init

Convert EAGAIN to EIO if fld_client_rpc() fails in
lod_object_init(). Otherwise EAGAIN confuses
lu_object_find_at() and makes it wait for no reason;
it should only wait if the object is dying.
See the call chain lu_object_find_at() -> lu_object_find_try()
-> lu_object_alloc() -> lod_object_init() -> lod_fld_lookup()
-> fld_client_rpc(). Even worse, the waitq has not been
initialized yet when the failure happens here.
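A minimal sketch of why the error code must be translated: the caller treats EAGAIN as "object is dying, wait and retry", so an unrelated RPC failure must surface as a different error. All three functions are hypothetical stand-ins for the call chain above, and the bounded retry loop replaces the real wait on the object's wait queue.

```c
#include <assert.h>
#include <errno.h>

/* Lower layer that can fail with -EAGAIN, like an RPC that was not sent. */
static int fake_lookup_rpc(int fail)
{
	return fail ? -EAGAIN : 0;
}

/* Object init: -EAGAIN from the lookup is reported as -EIO so the
 * caller does not mistake it for "object is dying". */
static int fake_object_init(int fail)
{
	int rc = fake_lookup_rpc(fail);

	if (rc == -EAGAIN)
		rc = -EIO;
	return rc;
}

/* Caller: retries only on -EAGAIN (bounded here for the demo). */
static int fake_object_find(int fail, int *retries)
{
	int rc;

	*retries = 0;
	while ((rc = fake_object_init(fail)) == -EAGAIN && *retries < 3)
		(*retries)++;   /* stands in for waiting on the object */
	return rc;
}
```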

Change-Id: Ieae434b34c239efea86a4a471fb01e397336a31c
Signed-off-by: Wang Di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/12586
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-3573 osd-zfs: Only advance zap cursor as needed 82/12582/3
Nathaniel Clark [Wed, 5 Nov 2014 18:05:22 +0000 (13:05 -0500)]
LU-3573 osd-zfs: Only advance zap cursor as needed

Only advance the zap cursor when ozi_pos is not advanced; otherwise
a file could occasionally get "lost" because the zap_cursor would
advance over it before the retrieve happened.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Iad560e2ffb4cfe2c74a1cf9197be7c2537538822
Reviewed-on: http://review.whamcloud.com/12582
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-5565 osd-ldiskfs: separate LASSERT() into two lines 98/12398/2
Andreas Dilger [Wed, 22 Oct 2014 18:35:31 +0000 (12:35 -0600)]
LU-5565 osd-ldiskfs: separate LASSERT() into two lines

Separate the compound assertions in osd-ldiskfs into two lines:

    LASSERT(dt_object_exists(dt) && !dt_object_remote(dt));
to
    LASSERT(dt_object_exists(dt));
    LASSERT(!dt_object_remote(dt));

so that it is possible to distinguish which of the two is being hit.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I3ff4fc28bffe955ab051ece665faa4c8a6500c1e
Reviewed-on: http://review.whamcloud.com/12398
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
6 years ago LU-5626 ldiskfs: update non-htree dotdot in rename 85/12585/2
Bob Glossman [Wed, 5 Nov 2014 17:40:42 +0000 (09:40 -0800)]
LU-5626 ldiskfs: update non-htree dotdot in rename

This change ports to sles11sp3 the changes previously committed
only for el6.

In 2.4+, when renaming a directory, its old dotdot entry is
removed first, then the new dotdot entry is inserted, and
ldiskfs tries to append FID-in-dirent to the new entry.
However, the space for the dotdot entry may not be enough to hold
the new dotdot with FID-in-dirent, for example on an MDT device
restored from a file-level backup, or on a device upgraded from 1.8.

In that case, for non-HTree directories, the ".." entry
will be written in the next available space in the directory
block.  This is invalid, as the ".." entry must be the
second entry in the block.

The same bug was fixed for HTree directories in LU-2638.
As Fan Yong said then: we do not want to introduce
complex logic to handle moving directory data. Instead, in
such a case, ignore the FID-in-dirent for the new dotdot entry,
and just insert the new dotdot entry.

There is one known flaw: This patch, like the one for
LU-2638, skips the entire data section rather than just
the FID.  This could cause trouble if something else ever
uses this section with ".." entries.
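A minimal sketch of the space check behind this fix, using the ext4-style record length (8-byte entry header plus name, rounded up to a multiple of 4). The FID payload size (NUL + length byte + 16-byte FID) is an assumption made for illustration, and dotdot_can_hold_fid() is hypothetical; the real ldiskfs dirdata layout may differ.

```c
#include <assert.h>
#include <stddef.h>

/* ext4-style record length: 8-byte header + name, rounded up to 4. */
static size_t dir_rec_len(size_t name_len)
{
	return (name_len + 8 + 3) & ~(size_t)3;
}

/* Assumed extra bytes for a FID-in-dirent payload; illustrative only. */
#define ASSUMED_FID_DATA_LEN (1 + 1 + 16)

/* Can the ".." entry carry a FID in the space it actually has?
 * If not, the entry is inserted without the FID, as described above. */
static int dotdot_can_hold_fid(size_t available)
{
	return dir_rec_len(2 + ASSUMED_FID_DATA_LEN) <= available;
}
```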

Test-Parameters: mdsdistro=sles11sp3 ossdistro=sles11sp3 \
 mdsfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs \
 ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Iaba11ac19ab7f802925af7a562ad7f739e6ed5c8
Reviewed-on: http://review.whamcloud.com/12585
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years ago LU-5830 scripts: use lustre_rmmod in lnet start/stop script 13/12513/4
Bruno Faccini [Fri, 31 Oct 2014 00:59:51 +0000 (01:59 +0100)]
LU-5830 scripts: use lustre_rmmod in lnet start/stop script

In the stop phase of lnet's start/stop script, use lustre_rmmod
instead of trying to unload a static list/sequence of modules.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ie8584b32e4d7cd21de0ed18954aa38124485964d
Reviewed-on: http://review.whamcloud.com/12513
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian J. Murrell <brian.murrell@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years agoNew tag 2.6.90 2.6.90 v2_6_90 v2_6_90_0
Oleg Drokin [Thu, 6 Nov 2014 18:50:28 +0000 (13:50 -0500)]
New tag 2.6.90

Change-Id: I9fcc98e0df6a44f5836c6a038fd59d8614200bd8

6 years agoLU-5863 utils: Handle the special case of ldd_svname for mgs 64/12564/3
James Simmons [Wed, 5 Nov 2014 00:43:44 +0000 (19:43 -0500)]
LU-5863 utils: Handle the special case of ldd_svname for mgs

Currently parse_ldd() checks whether ldd_svname is at least 8
characters long. This holds for all servers except the MGS,
which is always labeled "MGS", so the check can prevent the
MGT from being mounted. The solution is to detect when we are
handling an MGT, which doesn't need the extra special handling
that other OSDs need.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I68582471e2b6ce47473a4fefb21e589c8c5b3730
Reviewed-on: http://review.whamcloud.com/12564
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5814 lov: remove unused {get,set}_info handlers 45/12445/4
John L. Hammond [Mon, 27 Oct 2014 22:26:04 +0000 (17:26 -0500)]
LU-5814 lov: remove unused {get,set}_info handlers

In LOV and OSC remove handlers for the obsolete get and set info keys:
KEY_CAPA_KEY, KEY_CONNECT_FLAG, KEY_EVICT_BY_NID, KEY_LAST_ID,
KEY_LOCK_TO_STRIPE, KEY_MDS_CONN, KEY_NEXT_ID.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Iab1adaffc4ea860ea6ce2a2614b5ab6f6444e34b
Reviewed-on: http://review.whamcloud.com/12445
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years agoLU-5691 hsm: remove a request from the index if not present in the store 42/12142/2
Frank Zago [Tue, 30 Sep 2014 23:05:48 +0000 (18:05 -0500)]
LU-5691 hsm: remove a request from the index if not present in the store

When processing the list of requests that have aged out, if a
request cannot be found in the store, remove it from the index.
If that is not done, Lustre will keep trying to remove it,
leading to an endless cycle of cancellation.

This fixes the repetition of these messages:

  LustreError:
  2028:0:(mdt_coordinator.c:1465:mdt_hsm_update_request_state())
  tas01-MDT0000: Cannot find running request for cookie 0x54249515 on
  fid=[0x200000404:0x15caa:0x0]
  LustreError:
  2028:0:(mdt_coordinator.c:1465:mdt_hsm_update_request_state())
  Skipped 15979999 previous similar messages
  LustreError: 2028:0:(mdt_coordinator.c:339:mdt_coordinator_cb())
  tas01-MDT0000: Cannot cleanup timeouted request:
  [0x200000404:0x15caa:0x0] for cookie 0x54249515 action=CANCEL
  LustreError: 2028:0:(mdt_coordinator.c:339:mdt_coordinator_cb())
  Skipped 15979999 previous similar messages

Change-Id: Ie7a2a98be8cc97db9af7a64476c06fc7321544eb
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12142
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5853 build: fix el7 build regression 46/12546/2
Bob Glossman [Mon, 3 Nov 2014 22:43:27 +0000 (14:43 -0800)]
LU-5853 build: fix el7 build regression

Correct the build failure caused by the recent master landing of
"LU-4647 nodemap: add mapping functionality" by using PDE_DATA()
instead of PDE() in the new code.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I5b43e485cf5ba25e8473ed5783848aca77b96048
Reviewed-on: http://review.whamcloud.com/12546
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Jenkins
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5383 utils: fix array index out of bounds 24/12524/2
Dmitry Eremin [Fri, 31 Oct 2014 15:33:59 +0000 (18:33 +0300)]
LU-5383 utils: fix array index out of bounds

Possible attempt to access element -8..-1 of array 'ldd_svname'.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ib4ec6a6d74ff6e805725d0ff4487868b7cbffa2f
Reviewed-on: http://review.whamcloud.com/12524
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5577 changelog: fix comparison between signed and unsigned 74/12474/2
Dmitry Eremin [Wed, 29 Oct 2014 12:36:25 +0000 (15:36 +0300)]
LU-5577 changelog: fix comparison between signed and unsigned

Change the return type of changelog_*{namelen,size}() to size_t.
Fix the format specifiers for unsigned types.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ie24c87242328d14ee608ad38b530a6e581db93b9
Reviewed-on: http://review.whamcloud.com/12474
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
6 years agoLU-5814 echo: remove userspace LSM handling 46/12446/4
John L. Hammond [Mon, 27 Oct 2014 22:53:04 +0000 (17:53 -0500)]
LU-5814 echo: remove userspace LSM handling

In lustre/obdecho/echo_client.c, remove handling of lov_stripe_md
passed from userspace (since userspace never passes it). Remove the
LOV specific code (ed_next_islov) from the echo client (since it
doesn't work).

Remove echo_get_stripe_off_id() and all calls to it, since the stripe
count of the passed-in lsm is always 0 and the function does nothing
in this case. Remove the then-unused lsm parameters of
echo_client_page_debug_setup() and echo_client_page_debug_check().

In the OBD_IOC_GETATTR and OBD_IOC_SETATTR cases of
echo_client_iocontrol() do not set the oi_md member of struct obd_info
since only LOV OBD methods access it.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: If5d31ca3bf798d2e4f6c4f63c2012160e50f8cd7
Reviewed-on: http://review.whamcloud.com/12446
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
6 years agoLU-5814 lov: remove LL_IOC_RECREATE_{FID,OBJ} 42/12442/4
John L. Hammond [Mon, 27 Oct 2014 21:13:21 +0000 (16:13 -0500)]
LU-5814 lov: remove LL_IOC_RECREATE_{FID,OBJ}

Remove the obsolete ioctls LL_IOC_RECREATE_FID and LL_IOC_RECREATE_OBJ
along with their handlers in llite. Remove the then unused OBD method
lov_create(). Remove OBD_FL_RECREATE_OBJS handling from osc_create().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ib7183235d9eb761d2dfa2072dbeb8dd4d918e4ad
Reviewed-on: http://review.whamcloud.com/12442
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
6 years agoLU-2675 utils: remove loadgen 95/12395/2
John L. Hammond [Wed, 22 Oct 2014 18:33:01 +0000 (13:33 -0500)]
LU-2675 utils: remove loadgen

Remove lustre/utils/loadgen.c. It doesn't work and is not being
maintained.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I214cbbde5a3f18dd2050e852d33cca2bc2998b6a
Reviewed-on: http://review.whamcloud.com/12395
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
6 years agoLU-5577 obdclass: change uuid_unpack arg to size_t 89/12389/2
Dmitry Eremin [Mon, 13 Oct 2014 17:28:08 +0000 (21:28 +0400)]
LU-5577 obdclass: change uuid_unpack arg to size_t

Clean up warnings about comparisons between signed and unsigned.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ib577a36879a1a57f20f13a5c4c697ba404e113fa
Reviewed-on: http://review.whamcloud.com/12389
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5577 obdclass: change lu_site->ls_purge_start to unsigned 84/12384/2
Dmitry Eremin [Fri, 10 Oct 2014 19:26:30 +0000 (23:26 +0400)]
LU-5577 obdclass: change lu_site->ls_purge_start to unsigned

Change the type to match its usage.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ic2d6906eff21ab1fe964899f0da9732e68c193f7
Reviewed-on: http://review.whamcloud.com/12384
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5577 mdd: lu_dirent_calc_size() return type to size_t 83/12383/2
Dmitry Eremin [Fri, 10 Oct 2014 19:15:24 +0000 (23:15 +0400)]
LU-5577 mdd: lu_dirent_calc_size() return type to size_t

Change the type to match its usage.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I2451940225dd30015928cf85a2e0cc0e6cc8dfeb
Reviewed-on: http://review.whamcloud.com/12383
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5577 ldlm: count of pools is unsigned long 04/12304/3
Dmitry Eremin [Wed, 15 Oct 2014 19:12:13 +0000 (23:12 +0400)]
LU-5577 ldlm: count of pools is unsigned long

Function ldlm_pools_count() returns unsigned long, but the counter is an int.
Use ldlm_pool_granted() everywhere.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I6ee7bb1ba7f9590a776465b00e63584aada5f7dc
Reviewed-on: http://review.whamcloud.com/12304
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5387 test: Skip sanity test_239 if MDS version older than 2.5.60 41/12241/2
Wei Liu [Wed, 8 Oct 2014 22:37:41 +0000 (15:37 -0700)]
LU-5387 test: Skip sanity test_239 if MDS version older than 2.5.60

Skip sanity test_239 if the MDS version is older than 2.5.60.

Change-Id: Ic2c71235a0f05ecb4a8b111a6044efe51c5270c8
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/12241
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
6 years agoLU-5668 test: enable ior data consistency check 58/12058/9
Johann Lombardi [Thu, 25 Sep 2014 14:04:13 +0000 (16:04 +0200)]
LU-5668 test: enable ior data consistency check

Check that file content is consistent during the read phase of IOR
(-W), and change the task ordering to n+1 for readback (-C).

Test-Parameters: testlist=parallel-scale

Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Change-Id: Ic9b1c3cda48ebd08907a6251f62cf2c845d00476
Reviewed-on: http://review.whamcloud.com/12058
Tested-by: Jenkins
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5443 lustre: replace direct HZ access with kernel APIs 52/12052/8
Jian Yu [Fri, 31 Oct 2014 14:15:10 +0000 (10:15 -0400)]
LU-5443 lustre: replace direct HZ access with kernel APIs

On some customers' systems, the kernel was compiled with HZ defined
as 100 instead of 1000, which improves performance for HPC
applications. However, to use these systems with Lustre, customers
have to rebuild Lustre for that kernel, because Lustre uses the
HZ constant directly.

Since kernel 2.6.21, some HZ-independent timing APIs have become
non-inline functions, which Lustre code can use in place of direct
HZ access.

These kernel APIs include:
 jiffies_to_msecs()
 jiffies_to_usecs()
 jiffies_to_timespec()
 msecs_to_jiffies()
 usecs_to_jiffies()
 timespec_to_jiffies()

And here are some samples of the replacement:
 HZ            -> msecs_to_jiffies(MSEC_PER_SEC)
 n * HZ        -> msecs_to_jiffies(n * MSEC_PER_SEC)
 HZ / n        -> msecs_to_jiffies(MSEC_PER_SEC / n)
 n / HZ        -> jiffies_to_msecs(n) / MSEC_PER_SEC
 n / HZ * 1000 -> jiffies_to_msecs(n)

This patch replaces the direct HZ access in lustre modules.

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: Ib0ed9b5faf6ed311ff5423873d1c125b02ec4ab5
Reviewed-on: http://review.whamcloud.com/12052
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5545 ptlrpc: false alarm in AT network latency measuring 18/12018/5
Liang Zhen [Tue, 23 Sep 2014 13:00:27 +0000 (21:00 +0800)]
LU-5545 ptlrpc: false alarm in AT network latency measuring

If the early reply to a client RPC is lost, and the client RPC
expires and is resent, the server will drop the resent RPC because
the original is already being processed. The server may still send
a reply or early reply to the client, which can still match the
reply buffer of the original request. In this case, the client
measures time from the resend, but the server reports the service
time of the original RPC, which is longer than the time measured by
the client.

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: I772fb054480a3212e28dc018a6f592f3da7a87b5
Reviewed-on: http://review.whamcloud.com/12018
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
6 years agoLU-5651: ptlrpc: fix import state during replay 15/12015/4
Andriy Skulysh [Tue, 21 Oct 2014 20:53:11 +0000 (16:53 -0400)]
LU-5651: ptlrpc: fix import state during replay

The client doesn't restore the import state correctly on reconnect
during replay: it resends lock replay after the server has already
queued the final ping. The server then fails with
"target_queue_recovery_request())
ASSERTION( req->rq_export->exp_lock_replay_needed ) failed"

Add imp_replay_state to store the last replay state. imp_state is
restored from imp_replay_state during reconnect.

Xyratex-bug-id: MRP-2022
Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Change-Id: Iaa14fe968cc31f266b605785df4fa676083fbca4
Reviewed-on: http://review.whamcloud.com/12015
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5591 lod: fix Null pointer dereference in lod_ah_init() 70/11770/8
Dmitry Eremin [Fri, 5 Sep 2014 13:35:55 +0000 (17:35 +0400)]
LU-5591 lod: fix Null pointer dereference in lod_ah_init()

A NULL pointer may be dereferenced in lod_ah_init().

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I76567d222ac7eb74440c74692aaa79f7078bca61
Reviewed-on: http://review.whamcloud.com/11770
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-5589 obdclass: fix NULL pointer dereference 69/11769/5
Dmitry Eremin [Fri, 5 Sep 2014 13:13:34 +0000 (17:13 +0400)]
LU-5589 obdclass: fix NULL pointer dereference

A NULL pointer is dereferenced without a check in lsi_prepare().

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I144c9bcd3739c68563c7460799efa4897489a1a7
Reviewed-on: http://review.whamcloud.com/11769
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-2675 obd: cleanup struct md_op_data and uses 34/11734/4
John L. Hammond [Tue, 2 Sep 2014 19:30:07 +0000 (14:30 -0500)]
LU-2675 obd: cleanup struct md_op_data and uses

Make the following changes in or around struct md_op_data:

* Move the definition of enum op_cli_flags from lclient.h to obd.h and
  rename it to enum md_cli_flags.

* Change the type of the op_flags member from __u32 to enum
  md_op_flags.

* Remove the used but never set member op_npages.

* Remove the set but never used member op_stripe_offset (an alias for
  op_ioepoch).

* Remove the op_max_pages alias for op_valid and add an op_max_pages
  member.

* Add a new member op_attr_flags.

* Remove the definition and all uses of struct ll_iattr. This
  structure was only used in expressions of the form
  ((struct ll_iattr *)&op_data->op_attr)->ia_attr_flags which can all
  be rewritten as op_data->op_attr_flags.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I17aabfcecdfd1a02dbee04362b033ef404a2cb27
Reviewed-on: http://review.whamcloud.com/11734
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-4167 tests: correct version check to enable ff_convert 56/11556/6
Emoly Liu [Sat, 30 Aug 2014 21:12:48 +0000 (05:12 +0800)]
LU-4167 tests: correct version check to enable ff_convert

In conf-sanity.sh test_32d, filter FID conversion can only be done
when both of the following conditions are met:
- the OST device image version is < 2.3.64 (struct filter_fid_old
  was introduced in 2.3.64)
- the OST server version is >= 2.5

Also, this patch fixes the ofd_iocontrol() message to print the ioctl()
CMD argument as hex instead of a signed integer.

Test-Parameters: testlist=conf-sanity envdefinitions=SLOW=yes,ONLY=32 ossjob=lustre-b2_4 mdsjob=lustre-b2_4 ossbuildno=73 mdsbuildno=73  ossdistro=el6 mdsdistro=el6
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: I4045fe6b7504a3ed30436da5a1319c09331fc261
Reviewed-on: http://review.whamcloud.com/11556
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
6 years agoLU-5577 mdc: fix comparison between signed and unsigned 79/11379/17
Dmitry Eremin [Wed, 29 Oct 2014 11:37:08 +0000 (14:37 +0300)]
LU-5577 mdc: fix comparison between signed and unsigned

Change the type of client_obd->*_mds_*size, and of the arguments of
related create/rename/setattr functions, from int to __u32.
Change the type of op_data->op_namelen to size_t.
Change the type of the size argument of all mdc_*_pack() to size_t.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I97f4fa6928c24fa416c334206c75f9885266b1ae
Reviewed-on: http://review.whamcloud.com/11379
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-4176 tests: re-enable sanity-hsm/test_31a 77/9577/5
Bruno Faccini [Sun, 2 Nov 2014 20:06:46 +0000 (15:06 -0500)]
LU-4176 tests: re-enable sanity-hsm/test_31a

Because of frequent failures, and even though the first fix for
LU-4176 had landed, sanity-hsm/test_31a was disabled, like other
sub-tests, in a patch for LU-4178. Since test_31b and test_31c,
which triggered the same issue, now run fine with the original
fix for LU-4176, it is time to re-enable test_31a.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I5cc6c23b7cb48e190438eb3b84fa55ffdb198739
Reviewed-on: http://review.whamcloud.com/9577
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
6 years agoLU-5807 qos: enable QOS_DEBUG() 34/12434/3
Niu Yawei [Mon, 27 Oct 2014 12:43:07 +0000 (08:43 -0400)]
LU-5807 qos: enable QOS_DEBUG()

Enable QOS_DEBUG() by default.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: If2b0e1388a5f12edd5b03ffc1081709b9efb1c13
Reviewed-on: http://review.whamcloud.com/12434
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
6 years agoLU-4856 obdclass: check val in proc_max_dirty_pages_in_mb() 69/12269/4
Jian Yu [Fri, 10 Oct 2014 19:39:26 +0000 (12:39 -0700)]
LU-4856 obdclass: check val in proc_max_dirty_pages_in_mb()

In proc_max_dirty_pages_in_mb(), assigning "__u64 val" to
"unsigned long obd_max_dirty_pages" will cause values over
2^32 to be truncated on a 32-bit client (where "unsigned long"
is 32 bits, and not 64 bits).

This patch fixes the above issue by checking that "val" is within
an acceptable range before assigning it to obd_max_dirty_pages.

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I838b5bab283a0068f72a86f5a990d909c892a9d9
Reviewed-on: http://review.whamcloud.com/12269
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
6 years agoLU-5380 at: net AT after connect 55/11155/2
Alexander.Boyko [Mon, 21 Jul 2014 10:18:23 +0000 (14:18 +0400)]
LU-5380 at: net AT after connect

Once connected, the previously gathered AT statistics are no longer
valid, because they may reflect different routing, etc. The connect
itself could take a long time for various reasons (e.g. the server
was not ready), driving the measured network latency very high (see
import_select_connection()), which does not reflect the current
situation.

Take into account only the latency of the current (re-)CONNECT RPC.

Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Xyratex-bug-id: MRP-1285
Change-Id: I6edc0e232a92319e7c8535aced28fe1ad3436c54
Reviewed-on: http://review.whamcloud.com/11155
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
6 years agoLU-4810 utils: print messages when set tunables 65/9865/5
Niu Yawei [Mon, 28 Apr 2014 04:39:38 +0000 (00:39 -0400)]
LU-4810 utils: print messages when set tunables

When the scheduler and max_sectors_kb are set on mount, messages
should be printed to notify the user that Lustre has changed these
tunables.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I6a3618b7e5eb0127e9aea18631397cd4dcbde546
Reviewed-on: http://review.whamcloud.com/9865
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Blake Caldwell <blakec@ornl.gov>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-3259 clio: cl_lock simplification 58/10858/15
Jinshan Xiong [Fri, 26 Sep 2014 21:46:30 +0000 (14:46 -0700)]
LU-3259 clio: cl_lock simplification

In this patch, the cl_lock cache is eliminated. cl_lock becomes a
cacheless data container describing the locks an I/O needs in order
to complete. A cl_lock is created before the I/O starts and destroyed
when the I/O is complete.

cl_lock relies on the LDLM lock to provide lock semantics. The LDLM
lock is attached to the cl_lock at the OSC layer, and LDLM locks are
still cacheable.

Two major methods are supported for cl_lock: clo_enqueue and
clo_cancel. A cl_lock is enqueued by cl_lock_request(), which
calls the clo_enqueue() method of each layer to enqueue the lock.
At the LOV layer, if a cl_lock consists of multiple sub-cl_locks,
each sub-lock is enqueued accordingly. At the OSC layer, the lock
enqueue request tends to reuse a cached LDLM lock; otherwise a
new LDLM lock has to be requested from the OST.

cl_lock_cancel() must be called to release a cl_lock after use.
The clo_cancel() method is called for each layer to release the
resources held by the lock. At the OSC layer, the reference on the
LDLM lock taken at clo_enqueue time is released.

An LDLM lock can only be canceled when no cl_lock is using it.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I6a61250549cfbc28070fe4bb7789ba7429eaf089
Reviewed-on: http://review.whamcloud.com/10858
Tested-by: Jenkins
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>