Whamcloud - gitweb
fs/lustre-release.git
9 years ago LU-4090 kernel: a kernel patch for jbd2 hung 60/8060/2 b1_8
Bobi Jam [Thu, 24 Oct 2013 04:23:09 +0000 (12:23 +0800)]
LU-4090 kernel: a kernel patch for jbd2 hung

Patch taken from vanilla Linux kernel commit
0ef54180e0187117062939202b96faf04c8673bc (v3.10-rc2):

jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space()

While trying to debug an issue under extreme I/O loading
on preempt-rt kernels, the following backtrace was observed
via SysRQ output:

rm              D ffff8802203afbc0  4600  4878   4748 0x00000000
ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
Call Trace:
[<ffffffff8172dc34>] schedule+0x24/0x70
[<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
[<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
[<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
[<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
[<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
[<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
[<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
[<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
[<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
[<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
[<ffffffff8114370f>] do_rmdir+0xdf/0x120
[<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
[<ffffffff81002889>] ? do_notify_resume+0x89/0x100
[<ffffffff817361ae>] ? int_signal+0x12/0x17
[<ffffffff81145d85>] sys_unlinkat+0x25/0x40
[<ffffffff81735f22>] system_call_fastpath+0x16/0x1b

What is interesting here, is that we call log_wait_commit, from
within wait_for_space, but we are still holding the checkpoint_mutex
as it surrounds mostly the whole of wait_for_space.  And then, as we
are waiting, journal_commit_transaction can run, and if the
JBD2_FLUSHED bit is set, then we will also try to take the same
checkpoint_mutex.

It seems that we need to drop the checkpoint_mutex while sitting in
jbd2_log_wait_commit, if we want to guarantee that progress can be
made by jbd2_journal_commit_transaction().  There does not seem to be
anything preempt-rt specific about this, other than perhaps increasing
the odds of it happening.
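
The locking problem described above can be modelled in user space. The sketch below is not the kernel patch: it uses Python threads, with a lock standing in for the checkpoint mutex and an event standing in for "the commit finished". The function names and the timeout are assumptions made for illustration.

```python
import threading

checkpoint_mutex = threading.Lock()   # stands in for the checkpoint mutex
commit_done = threading.Event()       # stands in for "transaction committed"


def commit_thread():
    """Model of the commit path, which needs the mutex before it can finish."""
    with checkpoint_mutex:
        commit_done.set()


def wait_for_space(drop_mutex, timeout=1.0):
    """Model of the waiter; returns True if the commit completed in time."""
    commit_done.clear()
    checkpoint_mutex.acquire()
    worker = threading.Thread(target=commit_thread)
    worker.start()
    if drop_mutex:
        # Patched behaviour: release the mutex for the duration of the wait.
        checkpoint_mutex.release()
        finished = commit_done.wait(timeout)
        checkpoint_mutex.acquire()
    else:
        # Old behaviour: wait while still holding the mutex.
        finished = commit_done.wait(timeout)
    checkpoint_mutex.release()
    worker.join()
    return finished
```

With `drop_mutex=False` the commit thread can only make progress after the waiter has given up, so the wait times out; with `drop_mutex=True` it completes.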

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ifc1ebe6516ea349381331d22bd4f226255330d93
Reviewed-on: http://review.whamcloud.com/8060
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Xi <pkuelelixi@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
10 years ago LU-3617 o2ib: Correctly detect infiniband features 88/7488/3
James Simmons [Thu, 29 Aug 2013 14:50:35 +0000 (10:50 -0400)]
LU-3617 o2ib: Correctly detect infiniband features

The header file rdma_cm.h refers to the fc_compat.h
header for a function declaration. Lustre tests to
see if certain infiniband features are available, but
in order for those tests to work properly the
fc_compat.h header must be included. Otherwise the
tests will always fail, thus disabling potential
features. This patch includes fc_compat.h when needed.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I9e6a9726dec04b3acd5898d6f633ef510144e4b2
Reviewed-on: http://review.whamcloud.com/7488
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
10 years ago LU-2253 tests: fix typo in test_2 of s-q 41/7341/2
Niu Yawei [Thu, 15 Aug 2013 10:54:00 +0000 (06:54 -0400)]
LU-2253 tests: fix typo in test_2 of s-q

In test_2 of s-q, the awk column number should be changed from
5 to 4 when replacing the 'lfs df' with 'lfs_df'.

Test-Parameters: envdefinitions=SLOW=yes testlist=sanity-quota

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ia37df21915dd631c9ee7d8c46249291dc1a6f6b5
Reviewed-on: http://review.whamcloud.com/7341
Tested-by: Hudson
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
10 years ago LU-1378 seq: swab lu_seq_range before accessing it
wang di [Sun, 21 Apr 2013 07:22:40 +0000 (00:22 -0700)]
LU-1378 seq: swab lu_seq_range before accessing it

In seq_client_rpc, it should swab lu_seq_range after getting
it from the reply.
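
Why the swab must happen before the range is read can be shown with a small user-space sketch. The field layout below (two 64-bit and two 32-bit fields) and the helper name are assumptions for illustration only, not the definition used in the Lustre source.

```python
import struct

# Assumed layout for illustration only: two 64-bit and two 32-bit fields.
RANGE_FORMAT = "QQII"


def unpack_range(buf, sender_is_little_endian):
    """Decode the reply buffer using the sender's byte order."""
    prefix = "<" if sender_is_little_endian else ">"
    start, end, index, flags = struct.unpack(prefix + RANGE_FORMAT, buf)
    return {"start": start, "end": end, "index": index, "flags": flags}
```

Decoding a big-endian buffer as little-endian yields a wrong `start` value, which is the kind of error that accessing the range before swabbing would produce.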

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I8a2118e1895d2c89430961997dbad4a3f20a6762
Reviewed-on: http://review.whamcloud.com/2655
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
10 years ago LU-1410 test: Allow sanity test 200 to work with one OST
Keith Mannthey [Tue, 21 Aug 2012 02:01:51 +0000 (19:01 -0700)]
LU-1410 test: Allow sanity test 200 to work with one OST

Sanity test 200 fails when run with only one OST. When
the test adds machines to the pool, it does not pass
valid arguments.

Setting TGTPOOL_FIRST=0 allows the test to run.

Signed-off-by: Keith Mannthey <keith@whamcloud.com>
Change-Id: I0709429e7888d0b3d90a0160778ca6224d9cb12e
Reviewed-on: http://review.whamcloud.com/3731
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
10 years ago LU-3041 lfsck: improve lfsck.sh on b1_8
Emoly Liu [Tue, 7 May 2013 16:33:26 +0000 (00:33 +0800)]
LU-3041 lfsck: improve lfsck.sh on b1_8

The test directory of lfsck.sh contains some files referencing the same
object, which could cause an error when removing the directory on test
cleanup. Also, in lfsck.sh we shouldn't use debugfs to remove objects,
since that causes a quota usage inconsistency warning in e2fsck.

This patch is a backport of commits 622148 and 2df010, and it also
includes part of the patches for LU-3133 and LU-3180.

Test-Parameters: testlist=lfsck

Signed-off-by: Liu Ying <emoly.liu@intel.com>
Change-Id: I6fd5db0c921e744f12a92676eb906d730b9c3d10
Reviewed-on: http://review.whamcloud.com/5968
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
10 years ago LU-3036 test: make atime update properly with 2.x servers
Emoly Liu [Tue, 7 May 2013 21:40:53 +0000 (05:40 +0800)]
LU-3036 test: make atime update properly with 2.x servers

This patch includes the following fixes:
- set both MDS_ATTR_xTIME | MDS_ATTR_xTIME_SET when converting
  from OBD_MD_FLATIME in mdc_close_pack_20(). This will fix
  new 1.8 clients with old 2.x servers.
- improve sanityn.sh test_23
- remove sanity.sh test_203 since it performs the same check as
  sanityn.sh test_23

Signed-off-by: Liu Ying <emoly.liu@intel.com>
Change-Id: Ia5a2749dc548614d55d1b50a3ac57e34dfed56c4
Reviewed-on: http://review.whamcloud.com/6289
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
11 years ago LU-2913 kernel: kernel update [RHEL5.9 2.6.18-348.3.1.el5]
yangsheng [Tue, 2 Apr 2013 04:13:03 +0000 (12:13 +0800)]
LU-2913 kernel: kernel update [RHEL5.9 2.6.18-348.3.1.el5]

Update RHEL5.9 kernel to 2.6.18-348.3.1.el5.

Signed-off-by: yang sheng <yang.sheng@intel.com>
Change-Id: Ibaa3bc680c669f54fc0f9e44f2695b6114d1dc53
Reviewed-on: http://review.whamcloud.com/5914
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2309 config: ignore unknown configuration param
Jian Yu [Mon, 8 Apr 2013 09:56:49 +0000 (17:56 +0800)]
LU-2309 config: ignore unknown configuration param

Client or server should not fail to mount if it hits
a configuration parameter that it doesn't understand.
This patch fixes class_process_config() to meet
the above requirement.
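
The intended behaviour can be sketched as a log-and-continue loop. This is a user-space model, not class_process_config() itself; the handler table and the function name are hypothetical.

```python
# Hypothetical handler table; the real parameter set and handlers differ.
KNOWN_PARAMS = {"timeout": int, "ldlm_timeout": int}


def process_config(records):
    """Apply known parameters; skip unknown ones instead of failing the mount."""
    applied, skipped = {}, []
    for name, value in records:
        handler = KNOWN_PARAMS.get(name)
        if handler is None:
            # Note the unknown parameter and continue rather than return an error.
            skipped.append(name)
            continue
        applied[name] = handler(value)
    return applied, skipped
```

A record list containing an unrecognised name is still processed to the end, and the unknown name is only reported.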

The patch also improves conf-sanity test 42 to verify
that invalid sys config param should not prevent client
or server from mounting.

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I04b7a0fe90a558b41c68b6e1218823db1ceed8b4
Reviewed-on: http://review.whamcloud.com/5972
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2823 obdfilter: always return last id when del orphan
Niu Yawei [Sun, 17 Feb 2013 03:32:54 +0000 (22:32 -0500)]
LU-2823 obdfilter: always return last id when del orphan

In filter_destroy_precreated(), it should still return the last id
even if the fo_destroy_in_progress was cleared already, otherwise
the last_id on MDS will not be synced with the last_id on obdfilter.

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes \
clientdistro=el6 serverdistro=el5 \
clientarch=x86_64 serverarch=x86_64 \
nettypes=o2ib clientibstack=inkernel \
serveribstack=inkernel testlist=large-scale

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I01a5a5da201e227c139c3406adcaa896e5d5b71f
Reviewed-on: http://review.whamcloud.com/5448
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-2824 mds: do not deactivate OSC if osc_create returns -EBUSY 1.8.9-wc1 v1_8_9_WC1 v1_8_9_WC1_RC2
Jian Yu [Sun, 17 Feb 2013 11:03:15 +0000 (19:03 +0800)]
LU-2824 mds: do not deactivate OSC if osc_create returns -EBUSY

During MDS<->OST orphan recovery, osc_create() will likely return
-EBUSY while the OSCC_FLAG_SYNC_IN_PROGRESS flag is still set
due to a slow ll_mdt service thread. __mds_lov_synchronize() will
then deactivate the OSC, which will cause mds_create_objects() to get
an -EIO error.

This patch fixes the above issue by checking the return value of
mds_lov_clear_orphans(). If it's -EBUSY, then do not mark the
OSC as inactive.
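
The decision described above can be sketched as follows. This is a simplified model, not the code in __mds_lov_synchronize(): the function name is hypothetical and a plain dict stands in for the OSC state.

```python
import errno


def synchronize_target(clear_orphans_rc, target):
    """Decide whether to deactivate a target after orphan cleanup.

    `clear_orphans_rc` models the return value of the orphan-cleanup step;
    `target` is a plain dict standing in for the OSC state.
    """
    if clear_orphans_rc == -errno.EBUSY:
        # A sync is still in progress: keep the target active so that
        # later object creation is not failed with -EIO.
        return target
    if clear_orphans_rc < 0:
        target["active"] = False
    return target
```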

Test-Parameters: envdefinitions=DURATION=21600 \
clientdistro=el5 serverdistro=el5 clientarch=x86_64 \
serverarch=x86_64 clientcount=4 osscount=2 mdscount=2 \
austeroptions=-R failover=true \
useiscsi=true testlist=recovery-mds-scale

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I7e04648524714999bbd72d730d1d618239b630e6
Reviewed-on: http://review.whamcloud.com/5450
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Tested-by: Hudson
11 years ago LU-2657 recovery: don't delete objects
Niu Yawei [Mon, 21 Jan 2013 09:13:19 +0000 (04:13 -0500)]
LU-2657 recovery: don't delete objects

In mds_lov_update_objids(), when a data object ID gap is
detected during recovery, all the objects in the gap are
deleted. That is not quite correct, because we can't guarantee
that the ID always increases in transno order.
Furthermore, it could cause serious trouble when the
lov_objid file has been removed manually (to rebuild a corrupted
lov_objid file).

Fix type defect in filter_recov_log_unlink_cb(), where
oa->o_id should be increased by each loop cycle.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I47247a584da10b1434bf7cb24f606073c6afa903
Reviewed-on: http://review.whamcloud.com/5137
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-1484 lprocfs: enable HAVE_PROCFS_USERS for rhel kernels
Nathaniel Clark [Thu, 14 Feb 2013 21:27:05 +0000 (16:27 -0500)]
LU-1484 lprocfs: enable HAVE_PROCFS_USERS for rhel kernels

For rhel kernels, given only the kernel-devel rpm it is not possible
to tell if proc_dir_entry_aux is defined, so assume it is.  It has
been included since late in the 5.x release cycle.

Test-Parameters: envdefinitions=SLOW=yes clientdistro=el5 \
  serverdistro=el5 clientarch=x86_64 serverarch=x86_64 \
  testlist=recovery-small

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: If17ecb0902ec90a1af6228d2b9b1b72bc68a6672
Reviewed-on: http://review.whamcloud.com/5439
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-1969 release: bump version to 1.8.9-wc1 v1_8_9_WC1_RC1
Johann Lombardi [Tue, 12 Feb 2013 21:52:38 +0000 (22:52 +0100)]
LU-1969 release: bump version to 1.8.9-wc1

Get ready for RC1.

Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Change-Id: I64590ae4110605f56448e96a609f79b8c0fed4ba
Reviewed-on: http://review.whamcloud.com/5328

11 years ago LU-2703 osc: cancel lock outside of spinlock
Hongchao Zhang [Fri, 8 Feb 2013 17:05:13 +0000 (01:05 +0800)]
LU-2703 osc: cancel lock outside of spinlock

While calling *_ap_completion, an ldlm lock needs to be
cancelled if the page belongs to readahead. This patch moves the
cancellation out of the cl_loi_list_lock spinlock, because it is a heavy
operation involving lock acquisition (mutex) and memory allocation.
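
The general pattern, collecting the deferred work under the lock and running it after the lock is released, can be sketched in user space. The names below are hypothetical and a Python lock stands in for the spinlock; this is not the osc code.

```python
import threading

list_lock = threading.Lock()   # stands in for the cl_loi_list_lock spinlock


def complete_pages(pages, cancel_lock):
    """Finish pages under the lock, but run cancellations after dropping it."""
    to_cancel = []
    with list_lock:
        for page in pages:
            page["done"] = True
            if page.get("readahead"):
                # Only remember the handle here; do not cancel under the lock.
                to_cancel.append(page["lock_handle"])
    # The potentially blocking work happens with the lock released.
    for handle in to_cancel:
        cancel_lock(handle)
    return to_cancel
```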

Change-Id: Iecd4c27d405ffa968cc74f264dfc48cb2e2f5671
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/5285
Tested-by: Hudson
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-2183 quota: update tests for DNE
Niu Yawei [Fri, 1 Feb 2013 04:19:05 +0000 (23:19 -0500)]
LU-2183 quota: update tests for DNE

Update test_13 of sq since the osp is changed to lwp in DNE.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I002bc6e3377ff578767e03657b32abf02f9562d5
Reviewed-on: http://review.whamcloud.com/5241
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2467 protocol: Add OBD_CONNECT_PINGLESS
Li Wei [Thu, 31 Jan 2013 09:48:23 +0000 (17:48 +0800)]
LU-2467 protocol: Add OBD_CONNECT_PINGLESS

Reserve a bit for OBD_CONNECT_PINGLESS, which indicates a client is
capable of suppressing keep-alive OBD_PINGs.  If granted by a server,
it means the server does not require (but still allows) pings.
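
The negotiation can be pictured as a bitmask intersection. The bit positions and helper names below are hypothetical and chosen only for this illustration; the real value of the flag is not given in this message.

```python
# Hypothetical bit positions, chosen only for this illustration.
OBD_CONNECT_PINGLESS = 1 << 0
OBD_CONNECT_OTHER = 1 << 1


def negotiate(client_flags, server_supported):
    """Return the flags granted by the server: the common subset."""
    return client_flags & server_supported


def client_must_ping(granted_flags):
    """Pings stay mandatory unless the server granted PINGLESS."""
    return not (granted_flags & OBD_CONNECT_PINGLESS)
```

A client that asks for the flag but talks to a server that does not grant it keeps pinging as before.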

Change-Id: Id19f979650ffdcb117feb3c28fe7755add1013c6
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/5231
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2631 build: fix o2ib build for 2.6.18-348.el5 kernel
Shuichi Ihara [Thu, 17 Jan 2013 11:16:16 +0000 (20:16 +0900)]
LU-2631 build: fix o2ib build for 2.6.18-348.el5 kernel

Add scsi/fc_compat.h to lustre-lnet.m4 and o2iblnd.h so that the
o2ib modules build for the 2.6.18-348.el5 kernel.

Signed-off-by: Shuichi Ihara <sihara@ddn.com>
Change-Id: I496eaf9977f44e90bfde7ff68215da8e651688dd
Reviewed-on: http://review.whamcloud.com/5051
Tested-by: Hudson
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-1484 lprocfs: handle hidden proc_dir_entry users
Andreas Dilger [Sat, 2 Feb 2013 00:22:13 +0000 (17:22 -0700)]
LU-1484 lprocfs: handle hidden proc_dir_entry users

The RHEL 5.9 2.6.18-348.1.1.el5 kernel uses both the new style
"pde_users" and the old style "deleted" flags for marking a
proc_dir_entry as deleted.  Unfortunately, the new "pde_users"
data is hidden in an external structure that is not visible to
the callers or in the headers (for binary compatibility I guess?)
so our configure checks cannot find it.

Instead, just check for proc_fops == NULL in a racy manner on
such kernels, since we cannot do locking and the locking is
mostly just needed as a memory barrier since pde_fops could
become NULL at any time after dropping the lock.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I05cae305e24ffff09a06e3ad17c28c175c3ebbe5
Reviewed-on: http://review.whamcloud.com/5253
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2601 kernel: kernel update [RHEL5.9 2.6.18-348.1.1.el5]
yangsheng [Fri, 18 Jan 2013 20:24:32 +0000 (04:24 +0800)]
LU-2601 kernel: kernel update [RHEL5.9 2.6.18-348.1.1.el5]

Update RHEL5.9 kernel to 2.6.18-348.1.1.el5

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: I62442e898774aa1513f3496e3e5701b7ec4c2833
Reviewed-on: http://review.whamcloud.com/5132
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2616 build: B1_8 Copyright update: Script
Keith Mannthey [Wed, 16 Jan 2013 01:35:59 +0000 (17:35 -0800)]
LU-2616 build: B1_8 Copyright update: Script

This is a copy of the current master updatecw.sh tool as of
Jan 14 2013. The previous update commit was added
to the commit exclude list.

This tool updates a file's copyright, including assigning
copyrights to Intel where appropriate.

An addition to build/autoMakefile.am.toplevel was required
for the Ubuntu client build to pass.

Signed-off-by: Keith Mannthey <keith.mannthey@intel.com>
Change-Id: Ie86a6263f7932875a00c676dfa5ced133871c648
Reviewed-on: http://review.whamcloud.com/5019
Tested-by: Hudson
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-1484 kernel: pass RHEL5 build for 2.6.18-308
Bobi Jam [Fri, 18 Jan 2013 13:48:49 +0000 (21:48 +0800)]
LU-1484 kernel: pass RHEL5 build for 2.6.18-308

For the vanilla kernel, proc_dir_entry::deleted and ::pde_users coexist
from 2.6.23 to 2.6.23.17.

Some RHEL5 kernels define both
proc_dir_entry::deleted and proc_dir_entry_aux::pde_users.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ic0a381aa44708ff0545da376c1be05b13efa523f
Reviewed-on: http://review.whamcloud.com/5129
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Peng Tao <bergwolf@gmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2516 kernel: kernel update [RHEL6.3 2.6.32-279.19.1.el6]
yangsheng [Wed, 16 Jan 2013 16:27:52 +0000 (00:27 +0800)]
LU-2516 kernel: kernel update [RHEL6.3 2.6.32-279.19.1.el6]

Update RHEL6.3 kernel to 2.6.32-279.19.1.el6 (client only)

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: Idd0a3bd08c79e2acafedb2c8271a1be1199bc476
Reviewed-on: http://review.whamcloud.com/4989
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2550 test: fix sanity test_122
Niu Yawei [Thu, 17 Jan 2013 09:44:57 +0000 (04:44 -0500)]
LU-2550 test: fix sanity test_122

The injected error in sanity test_122 could cause reconnect & resend
infinitely, then the test will wait on 'sync' forever.

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes testlist=sanity

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I53f0a476c98904e9fb570cf9e13ccdacdcd80af0
Reviewed-on: http://review.whamcloud.com/5050
Tested-by: Hudson
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2592 tests: do not remove $TMP/*active in rpc.sh
Yu Jian [Sat, 12 Jan 2013 15:44:56 +0000 (23:44 +0800)]
LU-2592 tests: do not remove $TMP/*active in rpc.sh

The $TMP/*active files record the current active facets info
under failover test configuration. They are removed from
init_test_env() initially. If failover tests use do_rpc_nodes()
which performs rpc.sh and in which init_test_env() is used,
then the active facets info will be lost during the testing.

This patch fixes the above issue by introducing an RPC_MODE
variable which controls that the $TMP/*active files will not
be removed from init_test_env() in rpc.sh.

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes clientcount=4 osscount=2 mdscount=2 austeroptions=-R failover=true useiscsi=true testlist=replay-vbr
Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I24c560f2be92bbf6ab92e4d5de1905092eb926b4
Reviewed-on: http://review.whamcloud.com/5008
Tested-by: Hudson
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2616 build: B1_8 Copyright update: Updates
Keith Mannthey [Mon, 14 Jan 2013 18:02:39 +0000 (10:02 -0800)]
LU-2616 build:  B1_8 Copyright update: Updates

The b1_8 branch needs to update its copyrights.
This is the output of the updatecw.sh tool.

Whamcloud copyrights are assigned to Intel.

No manual edits were/are required.

Signed-off-by: Keith Mannthey <keith.mannthey@intel.com>
Change-Id: Ia75ffe198a72fa21d52224ff554266296ce863c4
Reviewed-on: http://review.whamcloud.com/5018
Tested-by: Hudson
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-1018 tests: reduce runtime value for compilebench
Minh Diep [Tue, 15 Jan 2013 16:14:45 +0000 (08:14 -0800)]
LU-1018 tests: reduce runtime value for compilebench

We are reducing cbench_IDIRS and cbench_RUNS to 2.
These variables can be adjusted for runs other than the
sanity check.

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes testlist=parallel-scale
Signed-off-by: Minh Diep <minh.diep@intel.com>
Change-Id: I6613c9d0245d6c49c8d94f1252eb13ac87a621a1
Reviewed-on: http://review.whamcloud.com/5032
Tested-by: Hudson
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-1526 tests: Adapt oos to the new grant and osd-zfs behavior
Yu Jian [Thu, 10 Jan 2013 08:19:58 +0000 (16:19 +0800)]
LU-1526 tests: Adapt oos to the new grant and osd-zfs behavior

This patch is backported from commit 938947f of LU-1415 to
support oos interoperability testing with 2.4 servers.

The patch moves the oos checking code from oos.sh and
oos2.sh into a common function, oos_full(), in test-framework.sh.

The oos_full() allows 1% of total space in bavail because of
delayed allocation with ZFS which might release some free space
after txg commit.
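
The 1% tolerance amounts to a simple comparison. The sketch below is an assumption about how such a check could look, not the shell code of oos_full(); the function name, the units and the use of "less than or equal" are illustrative.

```python
def ost_is_full(total_kb, bavail_kb, slack_percent=1):
    """Treat an OST as full if the space still available is within the slack."""
    # Integer arithmetic avoids rounding: bavail/total <= slack_percent/100.
    return bavail_kb * 100 <= total_kb * slack_percent
```

For example, with 1,000,000 KB total, 5,000 KB still available counts as full, while 20,000 KB available does not.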

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I8933a6c8614a96d7bab5689c9d3e3a1f59bbc133
Reviewed-on: http://review.whamcloud.com/4986
Tested-by: Hudson
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-1129 obdfilter: handle race condition of recreating objects
Yu Jian [Mon, 14 Jan 2013 06:29:14 +0000 (14:29 +0800)]
LU-1129 obdfilter: handle race condition of recreating objects

During OST recovery, a race can happen while handling replayed
OST_WRITE request during the MDS->OST orphan recovery period to
recreate missing objects, which can trigger ASSERTION(diff >= 0)
failure.

This patch handles the above issue by adding obd->obd_recovering
into the assertion to check whether the OST is in recovery or not.
If it's in recovery and diff < 0, then no assertion failure occurs,
the object has been recreated. If the OST is not in recovery and
diff < 0, then the assertion failure occurs.
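
The relaxed check can be written out as a small decision function. This is a model of the condition described above, not the obdfilter code; the function name and return strings are hypothetical.

```python
def check_precreate_diff(diff, recovering):
    """Model of the relaxed assertion: diff < 0 is tolerated only in recovery."""
    if diff >= 0:
        return "proceed"
    if recovering:
        # Benign race during recovery: the object has already been recreated.
        return "already-recreated"
    raise AssertionError("diff < 0 outside recovery")
```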

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes testlist=sanity-quota
Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I346875378802385a95b0832b76d19f9957910cdf
Reviewed-on: http://review.whamcloud.com/5013
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-797 tests: fix ost-pools test timeout issues on b1_8
Liu Ying [Mon, 14 Jan 2013 15:10:45 +0000 (23:10 +0800)]
LU-797 tests: fix ost-pools test timeout issues on b1_8

The test time of the ost-pools subtests is unreasonably long.

test_14 fills an OST to 90% full, regardless of the OST size.
Skip the test if the amount of data to be written is too large
to run in a practical time.

test_18 creates 3x3x30000 files to compare performance with/without
pools enabled.  Instead of creating a fixed number of files, use
createmany to run for a specific (short) time to measure
performance.

test_23 tried to fill all OSTs 100% full.  Split this test into two:
- test_23a to test quota with a file in a pool
- test_23b to test OOS with a file striped over pool

The following patches are merged into this one:
- LU-797 tests: speed up ost-pools tests
(master patch eea698c944283b755882d8f504d2fcc8ea371bd8)
- LU-797 tests: skip ost-pools.sh 23b when SLOW=no
(master patch f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)
- LU-797 tests: process lfs df output properly
(master patch b1a1ec6300a5ec3925b725d5d2b783314dff3f8)
- LU-797 tests: improve test_23b of ost-pools.sh
(master patch 6dd41a43e3cdff1b2e0713cfc163734889d8650a)

Test-Parameters: envdefinitions=SLOW=yes testlist=ost-pools

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Niu Yawei <niu@whamcloud.com>
Signed-off-by: Liu Ying <emoly.liu@intel.com>
Change-Id: I391e641664890e7172d2ed1da815894e656826ce
Reviewed-on: http://review.whamcloud.com/4898
Tested-by: Hudson
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-1779 tests: fix run_one_logged() to log SKIP status
Yu Jian [Sat, 5 Jan 2013 07:28:02 +0000 (15:28 +0800)]
LU-1779 tests: fix run_one_logged() to log SKIP status

In the current test framework, only those tests which are in the
$ALWAYS_EXCEPT list are logged with SKIP status, other skipped
tests are all logged with PASS status.

This patch fixes the above issue by setting the SKIP status in
pass() and logging the status in run_one_logged().

Test-Parameters: clientarch=x86_64 serverarch=x86_64 testlist=mmp
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I766cf6d2bb984b6097b20d2c089925890b86f9b1
Reviewed-on: http://review.whamcloud.com/4955
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
11 years ago LU-958 tests: debug_mb set incorrectly for smp or vm
Emoly Liu [Tue, 8 Jan 2013 08:29:15 +0000 (16:29 +0800)]
LU-958 tests: debug_mb set incorrectly for smp or vm

For CPUs with multiple cores, or for some VMs, the number of possible
CPUs in the system can be greater than the number of CPUs reported by
getconf.
Added a check for the maximum debug buffer size.
Added a check that "possible" exists; if not, the old method is used.

The patch for LU-1249 is also included to auto-correct improper debug
buffer size settings.

port of patch 2ccb34d882b01305794e7780b6dd691179ddae7e
port of patch 28817cbd133c626042f9b142600c03187ba4a7ce
Xyratex-bug-id: MRP-219 incorrect settings for debug_mb

Test-Parameters: clientarch=x86_64 serverarch=x86_64 testlist=mmp

Signed-off-by: Denis Kondratenko <Denis_Kondratenko@xyratex.com>
Signed-off-by: Bobi Jam <bobijam@gmail.com>
Signed-off-by: Liu Ying <emoly.liu@intel.com>
Change-Id: Ib1e39a26e4d4af8e599d6e5fcdb1fecff8a6f4fa
Reviewed-on: http://review.whamcloud.com/4962
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-1526 tests: new sanity-quota tests
Yu Jian [Tue, 8 Jan 2013 07:52:55 +0000 (15:52 +0800)]
LU-1526 tests: new sanity-quota tests

This patch is ported from commit d6f2a9f to add a new sanity-quota.sh
for interoperability testing with the new quota architecture;
meanwhile, the old sanity-quota.sh is kept for interoperability
with old servers.

The patch also contains the fixups for LU-2174, LU-2283, LU-2284,
LU-2329 and LU-2526.

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes clientarch=x86_64 serverarch=x86_64 testlist=conf-sanity,sanity-quota
Signed-off-by: Niu Yawei <niu@whamcloud.com>
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I5d6834e5b7c257f0a44f45710674b2a236039bf1
Reviewed-on: http://review.whamcloud.com/4915
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Tested-by: Hudson
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-1625 test: reduce test duration for nfs mode
Emoly Liu [Tue, 8 Jan 2013 03:16:27 +0000 (11:16 +0800)]
LU-1625 test: reduce test duration for nfs mode

There isn't much value in running for a long duration in nfs mode.
Cut down the IOR test as well.
Based on original work by Minh Diep.

port of patch f518a40d96d3431f3d68de9eac99ea33498894c7
port of patch e69d9852bc095695ceecb219b84bd8a48d5aa10c

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes \
testlist=parallel-scale-nfsv3,parallel-scale-nfsv4

Signed-off-by: Keith Mannthey <keith@whamcloud.com>
Change-Id: I308d2dcedcb86bbc86b3d2b875e91ceeb2b96f6e
Signed-off-by: Liu Ying <emoly.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/4949
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-1484 lprocfs: refine LC_PROCFS_USERS check
Bobi Jam [Wed, 9 Jan 2013 01:16:08 +0000 (09:16 +0800)]
LU-1484 lprocfs: refine LC_PROCFS_USERS check

In some RHEL-patched 2.6.18 kernels, the pde_users member is added in
a separate struct proc_dir_entry_aux, rather than in struct
proc_dir_entry as in the later 2.6.23 kernel.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Icee65893b2fbf4d0c3b3e957cb038be99aaf6eb8
Reviewed-on: http://review.whamcloud.com/4976
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years ago LU-2550 osc: set resend count properly
Niu Yawei [Mon, 7 Jan 2013 09:12:33 +0000 (04:12 -0500)]
LU-2550 osc: set resend count properly

The resend count of new io request should be set properly
in osc_brw_redo_request().

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I31285df00513ad02befd84d9d37cfcbb48055bb0
Reviewed-on: http://review.whamcloud.com/4964
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-1526 tests: Handle OFD procfs changes
Yu Jian [Tue, 8 Jan 2013 08:09:02 +0000 (16:09 +0800)]
LU-1526 tests: Handle OFD procfs changes

In order to interop with 2.4 server, the following procfs entry
changes need to be handled:

- obdfilter.*.mntdev -> osd-*.*.mntdev
- obdfilter.*.<cache_related> -> osd-*.*.<cache_related>
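
One way to picture the mapping above is a helper that picks the parameter pattern by server type. This is only a sketch: the function name and the boolean argument are hypothetical, and how the test scripts actually detect the server type is not shown in this message.

```python
def server_param_names(uses_osd_layer):
    """Return the parameter patterns to query for the given kind of server."""
    # Newer servers expose these entries under osd-*.*, older ones under obdfilter.*.
    prefix = "osd-*.*" if uses_osd_layer else "obdfilter.*"
    return {"mntdev": prefix + ".mntdev"}
```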

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes clientarch=x86_64 serverarch=x86_64 testlist=lfsck,sanity
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I74bfaa1e6d68203951de31676f23fbd8250ec652
Reviewed-on: http://review.whamcloud.com/4958
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-2420 tests: Have POSIX testing on b1_8
Emoly Liu [Thu, 27 Dec 2012 06:46:39 +0000 (14:46 +0800)]
LU-2420 tests: Have POSIX testing on b1_8

Have POSIX testing automated on b1_8 as it is on master.
LU-2274 port is included to change the baseline filesystem to ext3
for POSIX testing on SLES distro.
Also, this patch adds add_group() and add_user() functions into the
test-framework.sh. They are used by setup_posix_users() in posix.sh.

port of b2_1 patch f55a2af51af0bbb1d97e6987a45ca501adbc4ab6
port of b2_1 patch f9531ab0803cbafcb68003f3470307e4b826129f

Test-Parameters: envdefinitions=SLOW=yes testlist=posix

Signed-off-by: Liu Ying <emoly.liu@intel.com>
Change-Id: I8e3e8d5e87b13f4fd2d0b972d2161e9f1afbc4e9
Reviewed-on: http://review.whamcloud.com/4894
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-1842 quota: t-f changes for new quota
Niu Yawei [Fri, 28 Dec 2012 07:19:57 +0000 (15:19 +0800)]
LU-1842 quota: t-f changes for new quota

Add new quota functions to the t-f for the new quota
architecture; the old functions are kept for
interoperability with old servers.

The patch also adds version_code() and lustre_version_code()
functions into the test framework.

One minor defect fixed:
- in facet_up(), add '-x' option to grep for exact matching,
otherwise this function will not work in single node test;

Test-Parameters: envdefinitions=ENABLE_QUOTA=yes testlist=sanity-quota

Signed-off-by: Niu Yawei <niu@whamcloud.com>
Change-Id: I7d96b7ea7cd14331aee7cfcca711a4e876025e2f
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Reviewed-on: http://review.whamcloud.com/4897
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-1762 tests: get correct MMP update and check intervals
Yu Jian [Fri, 4 Jan 2013 15:40:43 +0000 (23:40 +0800)]
LU-1762 tests: get correct MMP update and check intervals

This patch fixes the get_mmp_update_interval() and
get_mmp_check_interval() in mmp.sh to get the correct
MMP update and check intervals from both the old and
new outputs of debugfs.

The patch also improves test_8() to increase the running
time of e2fsck to allow mount operation to be started
before e2fsck operation stops.

Test-Parameters: testlist=mmp
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: If614659e3b7fe45c4b406d5541b1c2944b3c37ce
Reviewed-on: http://review.whamcloud.com/4953
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
11 years agoLU-1526 tests: add --index support to the test framework
Yu Jian [Thu, 27 Dec 2012 04:57:25 +0000 (12:57 +0800)]
LU-1526 tests: add --index support to the test framework

This patch improves mkfs_opts() in test-framework.sh to
add --index option for MDT and OST targets automatically.

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: Ic00bf4c498b336e283ae86e4543f43709eda01aa
Reviewed-on: http://review.whamcloud.com/4893
Tested-by: Hudson
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years agoLU-1526 tests: Support for MDS-initiated OST_DESTROYs
Yu Jian [Sat, 5 Jan 2013 10:10:11 +0000 (18:10 +0800)]
LU-1526 tests: Support for MDS-initiated OST_DESTROYs

This patch is backported from commit af5f388 of LU-1303 to
support interoperating with 2.4 server.

The patch makes sure the tests work with MDSs that destroy OST
objects asynchronously on behalf of clients.

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I8d8cb9e3699b6e7f63af106a5f45363f61f3ce7c
Reviewed-on: http://review.whamcloud.com/4959
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years agoLU-1075 tests: auster detect directory as script name
Emoly Liu [Fri, 28 Dec 2012 05:54:00 +0000 (13:54 +0800)]
LU-1075 tests: auster detect directory as script name

Auster should use -f to check that the script name exists as a
regular file, rather than -e, which treats a directory the same
as a script.
Also, the patch of LU-412 is included to fix test script
lookup in auster.

port of master patch 0a79b541ad736bb296ea051e58b667c6195731a1
port of master patch 39b98cb351866da5648ea1a2216c108f8791226f

Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Signed-off-by: Li Wei <liwei@whamcloud.com>
Signed-off-by: Liu Ying <emoly.liu@intel.com>
Change-Id: Id7d526b73051124b8b76deca974345573c6faf2b
Reviewed-on: http://review.whamcloud.com/4895
Tested-by: Hudson
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-1520 ldlm: Revert "improve ldlm_pools_shrink algorithm"
Johann Lombardi [Tue, 11 Dec 2012 15:47:59 +0000 (10:47 -0500)]
LU-1520 ldlm: Revert "improve ldlm_pools_shrink algorithm"

This reverts commit c861cc7e0b6f7e82fd55b9658dd29578f97b5607
The patch should land on master first.

Change-Id: I9b3739defed6bf315646f8a107d3218414a14d25
Reviewed-on: http://review.whamcloud.com/4799
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Tested-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-676 tests: machinefile option for mpirun via a variable
Jay J. Lan [Tue, 18 Oct 2011 19:04:07 +0000 (12:04 -0700)]
LU-676 tests: machinefile option for mpirun via a variable

Not all MPI implementations pass the host file to mpirun via the same
option. Common options are -machinefile and -hostfile.

This problem can be resolved by using a variable MACHINEFILE_OPTION
instead. A default value is assigned if the variable is not defined.

Signed-off-by: Jay J Lan <jay.j.lan@nasa.gov>
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I3362a6e62a27318cff733aea2f99b1356b3ff02e
Reviewed-on: http://review.whamcloud.com/1540
Tested-by: Hudson
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-1322 llite: revalidate dentry if f_version is 0
Peng Tao [Mon, 25 Jun 2012 13:31:18 +0000 (21:31 +0800)]
LU-1322 llite: revalidate dentry if f_version is 0

If a file is lseek()ed before i_version changes, it is possible
for application to call into ll_readdir and have f_pos pointing
to some garbage data and cause kernel hang.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Change-Id: I49ab94ad5c63c3029d5ad96e27e38e124a135ed8
Reviewed-on: http://review.whamcloud.com/3181
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Lai Siyao <laisiyao@whamcloud.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
11 years agoLU-1520 ldlm: improve ldlm_pools_shrink algorithm
Hongchao Zhang [Tue, 4 Sep 2012 15:27:09 +0000 (23:27 +0800)]
LU-1520 ldlm: improve ldlm_pools_shrink algorithm

1, shrink namespaces by batches of 64 namespaces, the batch is
   implemented as list
2, limit number of simultaneously shrinking threads to 32 threads
3, have ldlm_pools_recalc to operate with namespaces similar to
   ldlm_pools_shrink
4, use global counters of unused locks on client and granted
   locks on servers to avoid iterating over namespaces

Change-Id: I5fb3f56748ae10961c50b4b06c300c9c7f5fca87
Signed-off-by: Vladimir Saveliev <valdimir.saveliev@oracle.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@whamcloud.com>
Reviewed-on: http://review.whamcloud.com/3270
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <yong.fan@whamcloud.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-1901 ldiskfs: compile error for sles11 when JBD2_DEBUG is on
Vladimir Saveliev [Sun, 9 Sep 2012 08:57:03 +0000 (12:57 +0400)]
LU-1901 ldiskfs: compile error for sles11 when JBD2_DEBUG is on

The only change is the below hunk for ext4/inode.c:ext4_forget():
        jbd_debug(4, "forgetting bh %p: is_metadata = %d, mode %o, "
-                 "data mode %x\n",
+                 "data mode %Lx\n",
                  bh, is_metadata, inode->i_mode,
                  test_opt(inode->i_sb, DATA_FLAGS));

It is needed because for sles11 s_mount_opt of struct ext4_sb_info
is changed to unsigned long long.
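
The width mismatch can be illustrated with a small user-space sketch.
The helper names masked_opts() and format_opts() are hypothetical and
are not part of ldiskfs. The sketch uses the standard C "%llx"
conversion; the hunk above uses the kernel's "%Lx" spelling for the
same 64-bit argument.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for test_opt(): mask a 64-bit mount-option word. */
unsigned long long masked_opts(unsigned long long mount_opt,
                               unsigned long long mask)
{
    return mount_opt & mask;
}

/* Format the value with a conversion that matches its 64-bit width.
 * Passing an unsigned long long to a plain "%x" would be a mismatch. */
int format_opts(char *buf, size_t len, unsigned long long value)
{
    return snprintf(buf, len, "data mode %llx", value);
}
```

Compiling with format checking enabled (for example -Wformat in GCC)
reports this kind of mismatch at build time.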

Signed-off-by: Vladimir Saveliev <vladimir.saveliev@oracle.com>
Change-Id: I59646b821b83eed08a67124a9f52ab8dcb9b46ae
Reviewed-on: http://review.whamcloud.com/3943
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
11 years agoLU-2371 ptlrpc: add support for -EINPROGRESS
Niu Yawei [Fri, 13 Jan 2012 08:33:22 +0000 (00:33 -0800)]
LU-2371 ptlrpc: add support for -EINPROGRESS

Backport patches from LU-904, LU-1329 and LU-1788 to introduce
support for -EINPROGRESS in lustre 1.8. This is needed for
quota interoperability with 2.4 servers.

Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Change-Id: I9136112ca82dbf6caba41c2d41643ec646372852
Signed-off-by: Niu Yawei <niu@whamcloud.com>
Reviewed-on: http://review.whamcloud.com/4655
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-1064 mds: fix mds_lookup lma removal error path
Andreas Dilger [Sat, 1 Dec 2012 07:15:34 +0000 (00:15 -0700)]
LU-1064 mds: fix mds_lookup lma removal error path

In commit 1fd243c89e3b221d40ce74b8ef47f1bca760c8f9 if an error is hit
removing the "lma" xattr from an updated 2.x inode, then the open
transaction handle would never be committed, and the MDS would hang.

This is unlikely to be a problem, as the only errors fsfilt_set_md()
will hit that are not programming bugs are due to IO errors from the
underlying disk (which is an even bigger problem).

Make sure that the transaction is committed, even after an error.
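
The shape of the fix can be sketched in plain C. Everything below is
hypothetical (struct handle, handle_start(), handle_commit(), and the
op callbacks are illustration only, not the MDS or fsfilt API); the
point is only that the error path falls through to the commit instead
of returning early.

```c
/* Hypothetical transaction handle: open != 0 means "still uncommitted". */
struct handle {
    int open;
};

void handle_start(struct handle *h)  { h->open = 1; }
void handle_commit(struct handle *h) { h->open = 0; }

/* Example operations: one succeeds, one fails with an I/O-style error. */
int op_ok(void)   { return 0; }
int op_fail(void) { return -5; }

/* Run op inside a transaction. There is no early return on error, so the
 * handle is always committed and the original error code is preserved. */
int run_in_transaction(struct handle *h, int (*op)(void))
{
    int rc;

    handle_start(h);
    rc = op();
    handle_commit(h);   /* reached on both success and failure */
    return rc;
}
```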

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If7d19a2337a12efafacd20d5c4e5c00e85300c1e
Reviewed-on: http://review.whamcloud.com/4729
Tested-by: Hudson
Reviewed-by: Iurii Golovach <Iurii_Golovach@xyratex.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-1064 mds: journal locking issue was fixed
Iurii Golovach [Fri, 16 Mar 2012 03:06:10 +0000 (20:06 -0700)]
LU-1064 mds: journal locking issue was fixed

During the downgrade procedure, a locking problem is
observed while the lma data is removed. The patch fixes
this issue by moving the lma removal under the
mutex.

Reviewed-by: Vitaly Fertman <vfertman@xyratex.com>
Reviewed-by: Andrew Perepechko <aperepechko@xyratex.com>
Xyratex-bug-id: MRP-251

Signed-off-by: Iurii Golovach <Iurii_Golovach@xyratex.com>
Change-Id: I01e7bda03e3b0dfae92b7e03672c56b23a73989d
Reviewed-on: http://review.whamcloud.com/2077
Tested-by: Hudson
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Keith Mannthey <kemannthey@gmail.com>
11 years agoLU-1517 ptlrpc: throw net error to ptlrpc for bulk
Alexander.Boyko [Mon, 17 Sep 2012 13:48:59 +0000 (17:48 +0400)]
LU-1517 ptlrpc: throw net error to ptlrpc for bulk

Start reconnect and resend if a network error occurs
during the bulk transfer.

Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Change-Id: I0cf2ee1230a039336f081fbb520c1ce768882088
Xyratex-bug-id: MRP-523
Reviewed-on: http://review.whamcloud.com/3102
Tested-by: Hudson
Reviewed-by: Liang Zhen <liang@whamcloud.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Cory Spitz <spitzcor@cray.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-588 ldiskfs: Don't release super block buffer_head too early
Jeremy Filizetti [Mon, 30 Jul 2012 16:10:45 +0000 (12:10 -0400)]
LU-588 ldiskfs: Don't release super block buffer_head too early

If the super block buffer_head is released prior to MMP stopping,
kmmpd can get a zeroed buffer_head and exit. The following code
causes kmmpd to exit when the s_feature_incompat is zero:

    if (!(le32_to_cpu(es->s_feature_incompat) &
        LDISKFS_FEATURE_INCOMPAT_MMP)) {
            ldiskfs_warning(sb, "kmmpd being stopped since MMP feature"
                                " has been disabled.");
            LDISKFS_SB(sb)->s_mmp_tsk = NULL;
            goto failed;
    }

A deadlock can occur with the kthread_stop_lock mutex because
ldiskfs_put_super calls kthread_stop on an already stopped thread
(kmmpd) so it waits for completion of the thread stopping before
releasing the kthread_stop_lock.

This is the result of a race between the kmmpd thread setting
s_mmp_tsk to NULL and ldiskfs_put_super in another thread, which
checks s_mmp_tsk for NULL prior to calling kthread_stop.

Signed-off-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Change-Id: Ia15d8ff829705a5d51dea4f86e40ba7c5745a9c5
Reviewed-on: http://review.whamcloud.com/3172
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yu Jian <yujian@whamcloud.com>
11 years agoLU-1770 ptlrpc: introducing OBD_CONNECT_FLOCK_OWNER flag
Iurii.Golovach [Tue, 16 Oct 2012 13:39:07 +0000 (16:39 +0300)]
LU-1770 ptlrpc: introducing OBD_CONNECT_FLOCK_OWNER flag

After the flock policy fix was applied to 1.8, users hit an issue
where 1.8 clients with the fixed flock policy were recognized
incorrectly by 2.x servers.
This flag identifies 1.8 clients with the fixed flock policy so
that 2.x servers can recognize the flock policy correctly.
Patches with the functionality changes were attached for review at LU-1575.

Xyratex-bug-id: MRP-489

Reviewed-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-by: Andriy Skulysh <andriy_skulysh@xyratex.com>
Signed-off-by: Iurii Golovach <iurii_golovach@xyratex.com>
Change-Id: I0b203a7e181310c2888ae5bbe8c90ca0a5bbe549
Reviewed-on: http://review.whamcloud.com/3723
Reviewed-by: Cory Spitz <spitzcor@cray.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-630 lnet: only router checks peer health
Lai Siyao [Mon, 5 Dec 2011 07:28:39 +0000 (15:28 +0800)]
LU-630 lnet: only router checks peer health

The peer health code is designed for routers, so a non-router node
always assumes peers to be alive.

Signed-off-by: Lai Siyao <laisiyao@whamcloud.com>
Change-Id: Iacdc7359c69e0f172de0914048b35bd6fe06133e
Reviewed-on: http://review.whamcloud.com/4287
Tested-by: Hudson
Reviewed-by: Isaac Huang <he.huang@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-1306 ldlm: LBUG at ldlm_lock.c:213
Andriy Skulysh [Wed, 11 Apr 2012 11:55:28 +0000 (14:55 +0300)]
LU-1306 ldlm: LBUG at ldlm_lock.c:213

Protect l_flags with locking to prevent race on
signal reception.

Xyratex-bug-id: MRP-420
Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Reviewed-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-by: Iurii Golovach <iurii_golovach@xyratex.com>
Change-Id: I98ba5e6e7a287090f6bd2a270c89a7671875bb9a
Reviewed-on: http://review.whamcloud.com/2727
Reviewed-by: Iurii Golovach <Iurii_Golovach@xyratex.com>
Reviewed-by: Cory Spitz <spitzcor@cray.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
11 years agoLU-1585 lnet: Fix an incorrect timestamp calculation in lst.c
Doug Oucharek [Thu, 26 Jul 2012 05:21:46 +0000 (22:21 -0700)]
LU-1585 lnet: Fix an incorrect timestamp calculation in lst.c

The operation in routine lst_timeval_diff() (in lst.c) has
a bug. It uses tv_sec where it should be using tv_usec.
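
For reference, a correct timeval subtraction looks roughly like the
sketch below. This is not the code from lst.c; timeval_diff() is a
hypothetical name, and the sketch assumes both inputs are normalized
(0 <= tv_usec < 1000000).

```c
#include <sys/time.h>

/* Store (later - earlier) in *out. The microsecond fields are subtracted
 * from each other (not from tv_sec), borrowing one second when the
 * microsecond difference would otherwise be negative. */
void timeval_diff(const struct timeval *later,
                  const struct timeval *earlier,
                  struct timeval *out)
{
    out->tv_sec = later->tv_sec - earlier->tv_sec;
    out->tv_usec = later->tv_usec - earlier->tv_usec;
    if (out->tv_usec < 0) {
        out->tv_sec -= 1;
        out->tv_usec += 1000000;
    }
}
```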

Signed-off-by: Doug Oucharek <doug@whamcloud.com>
Change-Id: I898dacd3d4a2c84594148d0514fda731d24a49bf
Reviewed-on: http://review.whamcloud.com/3474
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Liang Zhen <liang@whamcloud.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
11 years agoLU-1969 release: bump version to 1.8.8.60-wc1 v1_8_8_60_WC1
Johann Lombardi [Mon, 17 Sep 2012 19:14:14 +0000 (21:14 +0200)]
LU-1969 release: bump version to 1.8.8.60-wc1

Bump version to 1.8.8.60-wc1.

Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Change-Id: I6c8e6191afc21f674b92ebe65a869fb76bb52bd4
Reviewed-on: http://review.whamcloud.com/4014

11 years agoLU-1789 protocol: reserve connect flag for lightweight conn
Johann Lombardi [Tue, 4 Sep 2012 07:52:00 +0000 (09:52 +0200)]
LU-1789 protocol: reserve connect flag for lightweight conn

Reserve connection flag for lightweight connection support.
Although this feature will never be supported on 1.8, it still avoids
flag conflicts.

Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Change-Id: I1b943579d8c46cce0d70a5df45d35874bf63b29b
Reviewed-on: http://review.whamcloud.com/3852

11 years agoLU-1675: fix fid for ll_get_parent
Alexander.Boyko [Thu, 26 Jul 2012 06:16:51 +0000 (10:16 +0400)]
LU-1675: fix fid for ll_get_parent

When an NFS re-export occurred between a 1.8 client and a 2.0 server,
an NFS client doing mkdir a, cd a got a -521 error, ll_get_parent()
failed with a -22 error, and the MDS printed
"mdt_body_unpack()) Invalid fid: [0x2010e0901000001:0x0:0x4000]".
This patch fixes fid translation for the ptlrpc request/reply.

Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Xyratex-bug-id: MRP-522
Change-Id: Ib1a79e36ead478ebb874e7e04761bc43f33a410f
Reviewed-on: http://review.whamcloud.com/3475
Tested-by: Hudson
Reviewed-by: Fan Yong <yong.fan@whamcloud.com>
Reviewed-by: Cory Spitz <spitzcor@cray.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1674 llite: opencreate without mode can crash llite
Liang Zhen [Thu, 26 Jul 2012 00:38:07 +0000 (08:38 +0800)]
LU-1674 llite: opencreate without mode can crash llite

Users should specify a mode for opencreate, but if they do not, llite
will LBUG, which is not good.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I5c4044da6dda2a902bc48f408f6aaf8d02dd82a4
Reviewed-on: http://review.whamcloud.com/3469
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Liang Zhen <liang@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1838 llite: trusted. xattr is invisible to non-root
Fan Yong [Thu, 6 Sep 2012 16:18:05 +0000 (00:18 +0800)]
LU-1838 llite: trusted. xattr is invisible to non-root

Filter out all invalid xattrs in listxattr.
This includes trusted. xattrs that can cause
unnecessary "EPERM" in subsequent getxattr operations.
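
The filtering idea can be sketched in user-space C. This is not the
llite implementation: filter_xattr_list() and its privileged flag are
hypothetical, only the trusted. prefix is handled, and the buffer is
assumed to be a well-formed listxattr-style list (names separated and
terminated by NUL bytes).

```c
#include <stddef.h>
#include <string.h>

/* Compact a NUL-separated name list in place, dropping every name that
 * starts with "trusted." when the caller is not privileged.
 * Returns the length of the filtered list in bytes. */
size_t filter_xattr_list(char *list, size_t len, int privileged)
{
    size_t in = 0, out = 0;

    while (in < len) {
        size_t n = strlen(list + in) + 1;   /* name plus its NUL */
        int hide = !privileged &&
                   strncmp(list + in, "trusted.", 8) == 0;

        if (!hide) {
            memmove(list + out, list + in, n);
            out += n;
        }
        in += n;
    }
    return out;
}
```

Because hidden names never appear in the list, a non-root caller has
no reason to issue a getxattr for them and hit "EPERM".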

Signed-off-by: Fan Yong <yong.fan@whamcloud.com>
Signed-off-by: Bob Glossman <bogl@whamcloud.com>
Change-Id: I9613444adcdb14067a775f68f951af7a9b941e9a
Reviewed-on: http://review.whamcloud.com/3892
Tested-by: Hudson
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years agoLU-1754 kernel: Kernel update [RHEL6.3 2.6.32-279.5.1.el6]
yangsheng [Fri, 24 Aug 2012 16:31:03 +0000 (00:31 +0800)]
LU-1754 kernel: Kernel update [RHEL6.3 2.6.32-279.5.1.el6]

Update RHEL6.3 kernel to 2.6.32-279.5.1.el6(client only).

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: Idffcaf6efa53119c1d093b852a37453d8c9b4116
Reviewed-on: http://review.whamcloud.com/3774
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-919 obdclass: remove hard coded 0x5a5a5a
Niu Yawei [Wed, 11 Jan 2012 04:24:59 +0000 (20:24 -0800)]
LU-919 obdclass: remove hard coded 0x5a5a5a

We assert on atomic_t values against a hard-coded 0x5a5a5a in several
places, which could result in a false assertion failure when the
reference count grows very large in some extreme cases.

The hard coded 0x5a5a5a should be replaced by LI_POISON.
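
A rough sketch of why the short constant is a problem follows. The
exact form of the assertions is not given here, so the "count below
the poison value" check is an assumption, and REF_POISON with its
32-bit value 0x5a5a5a5a is a hypothetical stand-in for LI_POISON.

```c
#define OLD_LIMIT  0x5a5a5a      /* hard-coded value named in the commit */
#define REF_POISON 0x5a5a5a5a    /* hypothetical full-width poison pattern */

/* Assumed shape of the old check: a live reference count must be
 * positive and below the hard-coded constant (about 5.9 million). */
int refcount_sane_old(int count)
{
    return count > 0 && count < OLD_LIMIT;
}

/* Same check against the full-width poison pattern (about 1.5 billion),
 * which leaves room for very large but legitimate counts. */
int refcount_sane_new(int count)
{
    return count > 0 && count < REF_POISON;
}
```

With these definitions a legitimate count of 6000000 fails the old
check but passes the new one.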

Signed-off-by: Bruno Faccini <bruno.faccini@bull.net>
Signed-off-by: Niu Yawei <niu@whamcloud.com>
Change-Id: Ia1105e48326c20a37d887ba9dc926ea300d97741
Reviewed-on: http://review.whamcloud.com/1954
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1720 kernel: Quota doesn't work over 4TB on single OST
yangsheng [Fri, 10 Aug 2012 13:07:29 +0000 (21:07 +0800)]
LU-1720 kernel: Quota doesn't work over 4TB on single OST

Fix the wrong hunk introduced by the previous kernel update patch.

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: Ice06fd70ad6f034dddb5aae5291c613e04b18d99
Reviewed-on: http://review.whamcloud.com/3599
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1782 quota: ignore sb_has_quota_active() in OFED's header
Shuichi Ihara [Thu, 23 Aug 2012 16:36:46 +0000 (01:36 +0900)]
LU-1782 quota: ignore sb_has_quota_active() in OFED's header

sb_has_quota_active() and sb_any_quota_active() are defined
in OFED's backport headers, but if these are referred to on
RHEL5's kernel, quota is broken. So, this patch ignores them.

Signed-off-by: Shuichi Ihara <sihara@ddn.com>
Change-Id: Ic78799bc5d948b583b4a515479d5091381c63185
Reviewed-on: http://review.whamcloud.com/3764
Reviewed-by: Niu Yawei <niu@whamcloud.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1496 ptlrpc: prolong rw locks even IO RPCs are finished
Bobi Jam [Thu, 21 Jun 2012 04:37:49 +0000 (12:37 +0800)]
LU-1496 ptlrpc: prolong rw locks even IO RPCs are finished

Refresh rw lock again after IO RPCs are finished to leave a time
window for clients to cancel covering dlm locks.

This is a part of LU-874 back port.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I5cd185870e601a66bce21b3cc3c91f5f800b4c27
Reviewed-on: http://review.whamcloud.com/3157
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@whamcloud.com>
Reviewed-by: Fan Yong <yong.fan@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1115 kernel: software raid6 related BUG
yangsheng [Wed, 2 May 2012 10:35:41 +0000 (18:35 +0800)]
LU-1115 kernel: software raid6 related BUG

Software raid6 hit BUGON in fs/bio.c:222 when raid chunk > 64k.
We pull upstream patch: 5b99c2ffa980528a197f26c7d876cceeccce8dd5
to deal with this issue.

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: I5330bc161e7cf5364a614547949323fc9a3ee7e3
Reviewed-on: http://review.whamcloud.com/2625
Reviewed-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years agoLU-359 llite: no close error if application has known failure
Fan Yong [Wed, 1 Aug 2012 13:02:36 +0000 (21:02 +0800)]
LU-359 llite: no close error if application has known failure

Don't return an error again on close if the application already knows
about a former write failure, to avoid potential redundant error
handling, such as a confusing error message.

Signed-off-by: Fan Yong <yong.fan@whamcloud.com>
Change-Id: I62d9cd83fc03fad22c994f2a77774ca113a6c057
Reviewed-on: http://review.whamcloud.com/596
Reviewed-by: Niu Yawei <niu@whamcloud.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1488 mdc: fix fid_res_name_eq() issue.
yangsheng [Sun, 5 Aug 2012 19:42:48 +0000 (03:42 +0800)]
LU-1488 mdc: fix fid_res_name_eq() issue.

Original error message:
LustreError: 25302:0:(namei.c:256:ll_mdc_blocking_ast())
ns: lustre-MDT0000-mdc-ffff81021762a000 lock:
The issue is caused by commit ef8bd11416bae8c03a65682f3a10a4da39922b45.
fid_res_name_eq() uses the wrong way to compare fid & res_name.

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: Iacba148b6c3ba7fa775d2b9a4a58bdbf67434d7c
Reviewed-on: http://review.whamcloud.com/3522
Tested-by: Hudson
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Fan Yong <yong.fan@whamcloud.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1511 kernel: kernel update [RHEL5.8 2.6.18-308.11.1.el5]
yangsheng [Tue, 12 Jun 2012 15:57:16 +0000 (23:57 +0800)]
LU-1511 kernel: kernel update [RHEL5.8 2.6.18-308.11.1.el5]

Update RHEL5.8 kernel to 2.6.18-308.11.1.el5.

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: I9d93c9666af54ed8c7d6d9ff33154929c94afe2e
Reviewed-on: http://review.whamcloud.com/3096
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1563 quota: Put lqs properly in quota_pending_commit()
Niu Yawei [Tue, 26 Jun 2012 09:35:01 +0000 (02:35 -0700)]
LU-1563 quota: Put lqs properly in quota_pending_commit()

In quota_pending_commit(), always check if pending > 0 to figure
out if a lqs is held from quota_check_common(), otherwise, we
could find a lqs in quota_pending_commit(), then put it twice.

Signed-off-by: Niu Yawei <niu@whamcloud.com>
Change-Id: Ia653f41d721c002bbfbebcaf688b9943dde256bf
Reviewed-on: http://review.whamcloud.com/3187
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Reviewed-by: Fan Yong <yong.fan@whamcloud.com>
11 years agoLU-1535 ldlm: backport fix for LU-1128
Lai Siyao [Tue, 19 Jun 2012 08:58:41 +0000 (16:58 +0800)]
LU-1535 ldlm: backport fix for LU-1128

Backport fix for LU-1128 to 1.8:
For ldlm server pool shrinker, we just use it to decrease SLV,
but never reclaim any memory directly, so it should always return
-1 to inform the kernel to break the shrink loop.

Signed-off-by: Lai Siyao <laisiyao@whamcloud.com>
Change-Id: I1c841e7485375017d33e93c59f2318318fae299c
Reviewed-on: http://review.whamcloud.com/3138
Reviewed-by: Niu Yawei <niu@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1459 llite: Don't LBUG when import has LUSTRE_IMP_NEW state
Jeremy Filizetti [Thu, 31 May 2012 14:30:00 +0000 (10:30 -0400)]
LU-1459 llite: Don't LBUG when import has LUSTRE_IMP_NEW state

When a disabled OSC/OST is configured in the system at mount
time, a client will LBUG if calling "lfs check servers".
Disabling the LBUG causes client to return -EIO instead.

Signed-off-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Change-Id: Ib689eb37c20d1012728abb7c35aee15f30604d54
Reviewed-on: http://review.whamcloud.com/2993
Tested-by: Hudson
Reviewed-by: Bobi Jam <bobijam@whamcloud.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1459 llite: Don't use unitialized variable
Jeremy Filizetti [Tue, 5 Jun 2012 00:14:19 +0000 (20:14 -0400)]
LU-1459 llite: Don't use unitialized variable

Currently lov_connect_obd prints warning messages using
an uninitialized stack variable. The message also only
prints a uuid instead of a meaningful target name.

Signed-off-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Change-Id: I7549acdd45ab70e3528d2f0c153a27a502b0404a
Reviewed-on: http://review.whamcloud.com/2992
Tested-by: Hudson
Reviewed-by: Bobi Jam <bobijam@whamcloud.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1448 llite: Prevent NULL pointer dereference on disabled OSC
Jeremy Filizetti [Thu, 31 May 2012 12:26:28 +0000 (08:26 -0400)]
LU-1448 llite: Prevent NULL pointer dereference on disabled OSC

When a file system is mounted with a disabled OSC, reading the import
information from the proc file system can result in a NULL pointer
dereference. The Lustre import on a disabled OSC will remain
in the LUSTRE_IMP_NEW state and imp_connection will remain NULL.

Signed-off-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Change-Id: Ib416b2d706ac9797715db2c0ea4f4eaa79bceb22
Reviewed-on: http://review.whamcloud.com/2977
Reviewed-by: Bobi Jam <bobijam@whamcloud.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1438 quota: quota active checking is missed on slave
Niu Yawei [Fri, 8 Jun 2012 04:55:40 +0000 (21:55 -0700)]
LU-1438 quota: quota active checking is missed on slave

On the quota slave, we missed checking whether quota is enabled in
quota_check_common() and several other places, which could cause
the slave to retry acquiring quota in quota_chk_acq_common()
infinitely when quota is already turned off on the master.

Signed-off-by: Niu Yawei <niu@whamcloud.com>
Change-Id: I707bc34684e95f2a0beec99548dc2d78a4ce8bbf
Reviewed-on: http://review.whamcloud.com/3060
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <yong.fan@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-1438 quota: fix race in quota_chk_acq_common()
Niu Yawei [Mon, 28 May 2012 09:12:08 +0000 (02:12 -0700)]
LU-1438 quota: fix race in quota_chk_acq_common()

quota_check_common() & qctxt_adjust_qunit() use different ways
to check if quota is enforced on a certain ID, which could result
in an infinite loop in quota_chk_acq_common() when the QB/QI_SET
flag is cleared just after checking.

This patch uses a non-intrusive way to fix this rare race.

Signed-off-by: Niu Yawei <niu@whamcloud.com>
Change-Id: I7212e9fc85e98a40e36d2773c02f838ca68339bb
Reviewed-on: http://review.whamcloud.com/2927
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-814 tests: remove leading spaces from $WRITE_DISJOINT
Yu Jian [Tue, 22 May 2012 11:38:42 +0000 (19:38 +0800)]
LU-814 tests: remove leading spaces from $WRITE_DISJOINT

In functions.sh, the WRITE_DISJOINT variable is defined as follows:

    WRITE_DISJOINT=${WRITE_DISJOINT:-\
        $(which write_disjoint 2> /dev/null || true)}

This will assign WRITE_DISJOINT a value with leading spaces,
which causes "[: too many arguments" issue while checking the
variable. The PARALLEL_GROUPLOCK variable also has the same issue.
This patch fixes it.

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I424af2acd4ef79ea67830010d17b6904769c2ca4
Reviewed-on: http://review.whamcloud.com/2866
Tested-by: Hudson
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years agoLU-121 test: Change framework to only use the short hostname. 1.8.8-wc1 v1_8_8_WC1 v1_8_8_WC1_RC3
chris [Mon, 7 May 2012 12:34:49 +0000 (08:34 -0400)]
LU-121 test: Change framework to only use the short hostname.

This means stripping everything off the name after and including
the first dot.

This change is only designed to make the .yml files consistent. The
log files will append the fully qualified name and appear to do this
consistently.

We can create a jira to make the log and yml files consistent, but
the reality is that the yml files have a short life and are only
used to send to Maloo.

This change will allow automated posting of results and automated
testing to begin.

1. Carries out the above using hostname -s
2. Adds in a quick change so that LUSTRE_BUILD in yaml.sh can be a
reference to the source rather than just the lustre version string
which is recorded and written to the yaml anyway as LUSTER_VERSION

Additionally a couple of other changes sneaked in.

1. Allows the review information to be applied to the yaml output
file by way of exporting the variable CODE_REVIEW_YAML to be a yaml
description for maloo

2. The addition of a couple of fixes to make the permissions for yaml
files allow-all. These permission changes are bracketed and so do not
change any other parts of the code.

Signed-off-by: Chris Gearing <chris@whamcloud.com>
Change-Id: I4b2431030afd206bc83490f5c81fd04e57937aad
Reviewed-on: http://review.whamcloud.com/2663
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Yu Jian <yujian@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years ago LU-458 debug: print client profile name correctly
Yu Jian [Mon, 21 May 2012 06:23:28 +0000 (14:23 +0800)]
LU-458 debug: print client profile name correctly

This patch reverts commit 48c2f667236e2f41f9fd0224b5de7a83517b3180,
which does not print client profile name correctly and introduces
a new defect that the client profile is not deleted properly.

In ll_put_super(), the memory space pointed to by profilenm is
in fact freed inside lustre_common_put_super(sb), which is called
before LCONSOLE_WARN(). In order to print the client profile name
in LCONSOLE_WARN(), we need to copy the contents of profilenm to
temporary storage before profilenm is freed.

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I9ce2f304f3bad4761d2e3c857e4cdd5df6269c38
Reviewed-on: http://review.whamcloud.com/2841
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years ago LU-1424 kernel: Kernel update [RHEL6.2 2.6.32-220.17.1.el6]
yangsheng [Mon, 21 May 2012 15:25:21 +0000 (23:25 +0800)]
LU-1424 kernel: Kernel update [RHEL6.2 2.6.32-220.17.1.el6]

Update RHEL6.2 patchless client support to 2.6.32-220.17.1.el6.

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: Ib6a164ecd7beb225107883fc21394056d2ce06bf
Reviewed-on: http://review.whamcloud.com/2848
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years ago LU-458 debug: use profilenm before running class_del_profile() v1_8_8_WC1_RC2
Yu Jian [Wed, 16 May 2012 04:57:41 +0000 (12:57 +0800)]
LU-458 debug: use profilenm before running class_del_profile()

This patch fixes the defect in ll_put_super() which uses profilenm
after running class_del_profile(profilenm).

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: Ida357c6c970f3b6bf1bbe0060a71d17e65323aa2
Reviewed-on: http://review.whamcloud.com/2799
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-425 tests: fix the issue of using "grep -w"
Yu Jian [Thu, 12 Apr 2012 15:38:17 +0000 (23:38 +0800)]
LU-425 tests: fix the issue of using "grep -w"

This patch fixes the following issue while using "grep -w"
to do exact match:

$ echo /mnt/nbp0-2 | grep -w /mnt/nbp0
/mnt/nbp0-2

Per the description of "-w" option:
-w, --word-regexp
Select only those lines containing matches that form whole words.
The test is that the matching substring must either be at the
beginning of the line, or preceded by a non-word constituent
character. Similarly, it must be either at the end of the line
or followed by a non-word constituent character. Word-constituent
characters are letters, digits, and the underscore.

So the hyphen "-" is a non-word constituent character, and "grep -w"
does not perform an exact match on strings which contain it.
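
The difference can be checked directly. In the sketch below, the
-F -x combination (fixed string, whole-line match) is shown only as
one possible way to get an exact match; it is an assumption and not
necessarily what this patch uses.

```shell
# -w: "/mnt/nbp0" is followed by "-", a non-word character, so the
# longer path still counts as a whole-word match.
if echo /mnt/nbp0-2 | grep -qw /mnt/nbp0; then
    word_match=yes
else
    word_match=no
fi

# -F -x: the fixed string must equal the entire line.
if echo /mnt/nbp0-2 | grep -qFx /mnt/nbp0; then
    line_match=yes
else
    line_match=no
fi

echo "word_match=$word_match line_match=$line_match"
```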

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I91962910033f561cc9c9a82bd88bbb6dff5594af
Reviewed-on: http://review.whamcloud.com/2528
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years ago LU-1340 release: get ready for 1.8.8-wc1 RC1
Johann Lombardi [Wed, 9 May 2012 15:49:07 +0000 (17:49 +0200)]
LU-1340 release: get ready for 1.8.8-wc1 RC1

Change lustre version to 1.8.8-wc1 for RC1

Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Change-Id: I96448a4e05c3cdbe8945642c53d007a51d0137eb
Reviewed-on: http://review.whamcloud.com/2696

11 years ago LU-1374 kernel: Kernel update [RHEL5.8 2.6.18-308.4.1.el5]
yangsheng [Fri, 4 May 2012 16:06:30 +0000 (00:06 +0800)]
LU-1374 kernel: Kernel update [RHEL5.8 2.6.18-308.4.1.el5]

Update RHEL5.8 kernel to 2.6.18-308.4.1.el5.

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: I1304dda0fae60c86ea67862095856a1c741ee3c3
Reviewed-on: http://review.whamcloud.com/2651
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-458 debug: quiet too noisy console messages at mount
yangsheng [Mon, 26 Mar 2012 17:09:15 +0000 (01:09 +0800)]
LU-458 debug: quiet too noisy console messages at mount

Quiet a number of extra debug messages printed to the console after a
remount or recovery. They provide no value and just add to the general
confusion of reading Lustre debug messages.

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: I5d5352d55d5a91f9fd4c55d077eebf1fdab61f80
Reviewed-on: http://review.whamcloud.com/2381
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years ago LU-1358 kernel: Kernel update [RHEL6.2 2.6.32-220.13.1.el6]
yangsheng [Wed, 2 May 2012 05:57:11 +0000 (13:57 +0800)]
LU-1358 kernel: Kernel update [RHEL6.2 2.6.32-220.13.1.el6]

Update RHEL6.2 patchless client to 2.6.32-220.13.1.el6.

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: I290f985bb6cbcc6c46bc1821c7d87819479eb1be
Reviewed-on: http://review.whamcloud.com/2623
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years ago LU-814 test: automated NFS over lustre testing
Minh Diep [Fri, 20 Apr 2012 18:57:37 +0000 (11:57 -0700)]
LU-814 test: automated NFS over lustre testing

Provide NFS setup within the auster framework.
Note: this change includes LU-1134, LU-1213

Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Change-Id: If28a237a23cd448c7d8b9a772a4b8951d94697ef
Reviewed-on: http://review.whamcloud.com/2593
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bob Glossman <bogl@whamcloud.com>
Tested-by: Bob Glossman <bogl@whamcloud.com>
Reviewed-by: Yu Jian <yujian@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
11 years ago LU-1312 kernel: crash at boot time in isci driver
yangsheng [Tue, 24 Apr 2012 19:50:34 +0000 (03:50 +0800)]
LU-1312 kernel: crash at boot time in isci driver

Restore SG_ALL to its default value to avoid a crash in the isci driver.

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: I855ba8c7669b749fded51f3b0316f115d18e0fcd
Reviewed-on: http://review.whamcloud.com/2595
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
11 years ago LU-1335 build: include lustre srpm in build
Minh Diep [Thu, 19 Apr 2012 00:18:46 +0000 (17:18 -0700)]
LU-1335 build: include lustre srpm in build

Add support for building lustre-*.src.rpm.

Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Change-Id: I301915c107e50ea5d1a3275ecc631f07aa2b78be
Reviewed-on: http://review.whamcloud.com/2576
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Brian J. Murrell <brian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
12 years ago LU-447 add lctl --net XXX push
Wally Wang [Tue, 30 Aug 2011 23:43:53 +0000 (16:43 -0700)]
LU-447 add lctl --net XXX push

In order to clear out peer/conn data in the gnilnd for testing after
adding lnet_notify (see LU-446), we need an alternative to
lctl --net gni del_peer, as that command nukes the peer and results
in false lnd_query failures.

Change-Id: Ie8029953a2881c0d6e3ac250101d2d4374bbf3c1
Signed-off-by: Wally Wang <wang@cray.com>
Reviewed-on: http://review.whamcloud.com/1311
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Liang Zhen <liang@whamcloud.com>
Reviewed-by: Lai Siyao <laisiyao@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
12 years ago LU-1340 release: bump version to 1.8.7.81-wc1 v1_8_7_81_WC1
Johann Lombardi [Fri, 20 Apr 2012 13:55:14 +0000 (15:55 +0200)]
LU-1340 release: bump version to 1.8.7.81-wc1

Yet another build which brings us closer to 1.8.8-wc1 RC1.

Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Change-Id: I7bac34ab4d29dc459171265319807c221a564c1a
Reviewed-on: http://review.whamcloud.com/2590

12 years ago LU-995 utils: make lfs getstripe directory output consistent.
Hongchao Zhang [Thu, 15 Mar 2012 04:03:07 +0000 (12:03 +0800)]
LU-995 utils: make lfs getstripe directory output consistent.

"lfs getstripe" should report the global default for any fields with
a value that means "use the default". This patch introduces the
following functionality:

1. If "lfs getstripe" is called on a directory and finds that striping
   EA is not set, the filesystem's defaults are looked up and printed.
2. If the striping EA is set, but the striping count and/or striping
   size has a value that means "use the default" (count = 0 and/or
   size = 0), the filesystem's default for that specific striping
   attribute is looked up and printed.
3. A new option to "lfs getstripe" is introduced; the "--raw" or
   "-R" option. If this option is specified, the previous two checks
   are skipped. In other words, if the striping EA is not set, 0, 0,
   and -1 will be printed for the striping count, size, and offset
   respectively. Also, if the striping EA is set, the values will be
   printed without first converting them into their respective
   defaults.
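
The substitution rule in items 2 and 3 can be sketched as a small
shell function. This is an illustration of the rule only, not the lfs
implementation: effective_value and its arguments are hypothetical
names, and the numbers are example values.

```shell
# Hypothetical helper: print the value to report for one striping field.
# $1 = value stored in the striping EA (0 means "use the default")
# $2 = filesystem-wide default for that field
# $3 = "raw" to skip the substitution, as with --raw / -R
effective_value() {
    stored=$1
    default=$2
    mode=${3:-}
    if [ "$mode" != raw ] && [ "$stored" -eq 0 ]; then
        echo "$default"
    else
        echo "$stored"
    fi
}

effective_value 0 1048576        # stored 0, so the default is printed
effective_value 0 1048576 raw    # raw mode, so the stored 0 is printed
effective_value 4 1              # explicit value, printed unchanged
```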

This patch relies on the /proc filesystem to determine each
filesystem's default striping attributes, and a cache is maintained
which holds the default values for the last filesystem queried.

See Also:
Bugzilla #23802, https://bugzilla.lustre.org/show_bug.cgi?id=23802

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Hongchao Zhang <hongchao.zhang@whamcloud.com>
Change-Id: Ic5b616dd83775c0a15be5c060b11cfbba05c4fbb
Reviewed-on: http://review.whamcloud.com/2117
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
12 years ago LU-891 test: waiting import state for next step.
yangsheng [Fri, 2 Mar 2012 14:46:47 +0000 (22:46 +0800)]
LU-891 test: waiting import state for next step.

There is still a rare chance that a request meets an invalid import
after recovery, so we should wait for the import to be restored to a
certain state before performing the next operation.
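
One common way to wait for a state is a bounded polling loop. The
sketch below is generic: wait_for_state and fake_import_state are
hypothetical names, not functions from the Lustre test framework, and
the state string is a placeholder.

```shell
# Hypothetical helper: poll a command until it prints the expected
# state, giving up after a number of one-second attempts.
# $1 = expected state, $2 = maximum attempts, rest = command to run
wait_for_state() {
    expected=$1
    attempts=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        [ "$("$@")" = "$expected" ] && return 0
        sleep 1
        attempts=$((attempts - 1))
    done
    return 1
}

# Placeholder for a real state query; it always reports FULL here.
fake_import_state() { echo FULL; }

if wait_for_state FULL 5 fake_import_state; then
    echo "import ready, continuing"
fi
```

Bounding the number of attempts keeps a test from hanging forever if
the expected state is never reached.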

Signed-off-by: yang sheng <ys@whamcloud.com>
Change-Id: I20bed347a16755ccaf102d4c67b0a7e87b1318a1
Reviewed-on: http://review.whamcloud.com/2248
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
12 years ago LU-734 tests: add sub-tests into recovery-*-scale tests
Yu Jian [Wed, 11 Apr 2012 06:41:49 +0000 (14:41 +0800)]
LU-734 tests: add sub-tests into recovery-*-scale tests

This patch adds sub-tests into the recovery-*-scale tests
so that test results and logs could be gathered properly
and uploaded to Maloo.

The patch also does some cleanup work on the test scripts
and moves some common functions into test-framework.sh.

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I514143e1fa29aad289d215174dbc68d2740da73b
Reviewed-on: http://review.whamcloud.com/2508
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Cliff White <cliffw@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
12 years ago LU-577 tests: FAIL replay-single test_70b rundbench load
James Simmons [Wed, 18 Apr 2012 14:09:14 +0000 (10:09 -0400)]
LU-577 tests: FAIL replay-single test_70b rundbench load

Test 70b for replay-single assumes that lustre is mounted on
/mnt/lustre, which is not the case for us. This patch passes
the proper MOUNT. The test also was not using the standard
DIR/tdir setup, so generated data files were not being
cleaned up. Increased the sleep period to match dbench's
warm up period. This gives dbench a chance to start up when
using many clients.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I3a793db35aa21d57220d0de1a9e92486e65ae21a
Reviewed-on: http://review.whamcloud.com/2518
Tested-by: Hudson
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
12 years ago LU-1062 tests: incorrect path to configuration file
Iurii Golovach [Tue, 20 Mar 2012 00:53:27 +0000 (17:53 -0700)]
LU-1062 tests: incorrect path to configuration file

This is a slightly modified port of
http://review.whamcloud.com/#change,1877
(author: Andreas Dilger)

Xyratex-bug-id: MRP-480

Reviewed-by: Sergey Glushchenko <Sergey_Glushchenko@xyratex.com>
Reviewed-by: Andriy Skulysh <andriy_skulysh@xyratex.com>
Signed-off-by: Iurii Golovach <iurii_golovach@xyratex.com>
Change-Id: I10b42de2c2d453e23142c01290742153dff7262a
Reviewed-on: http://review.whamcloud.com/2419
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
12 years ago LU-427 test: Test failure on test suite lfsck
Yu Jian [Tue, 10 Apr 2012 10:23:02 +0000 (18:23 +0800)]
LU-427 test: Test failure on test suite lfsck

- Reset $MDSDB & $OSTDB in generate_db(). Otherwise they become
  stale if the user redefines $SHARED_DIRECTORY.
- Add a function check_shared_dir() to ensure
  $SHARED_DIRECTORY is shared among test nodes.
- Fix check_logdir() and check_write_access() to avoid using
  node.$(hostname).yml files which should not be deleted.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: Ie7d1d08c0d2c701fa9fb74ef8b252fa8b31bf111
Reviewed-on: http://review.whamcloud.com/2498
Tested-by: Hudson
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
12 years ago LU-313 tests: re-enable lfsck test to run by default
Andreas Dilger [Tue, 10 Apr 2012 08:53:29 +0000 (16:53 +0800)]
LU-313 tests: re-enable lfsck test to run by default

Due to bug 13698, the lfsck part of the lfsck.sh test script was
disabled by default. After the fixes in LU-113 were landed, lfsck
should work again. Remove SKIP_LFSCK checks so lfsck.sh actually runs
lfsck instead of silently skipping it unless SKIP_LFSCK=no is set.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Change-Id: I430f7398b2d21db0d0755726fdcb6053f25b4b10
Reviewed-on: http://review.whamcloud.com/2497
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>