Whamcloud - gitweb
Fan Yong [Sat, 26 Jul 2014 23:49:29 +0000 (07:49 +0800)]
LU-4788 lfsck: verify .lustre/lost+found at the LFSCK start
/ROOT/.lustre/lost+found/ is a special directory to hold the
objects that the LFSCK does not exactly know how to handle,
such as orphans. Before the LFSCK scans the system, the consistency
of this directory needs to be verified first, so that it can be
used during the LFSCK.
fid_seq_is_dot_lustre() is a duplicate of fid_seq_is_dot();
drop it and clean up the code.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I95cac84bed1ae16c8c86e495db0120d964395b5e
Reviewed-on: http://review.whamcloud.com/10987
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Fan Yong [Sat, 26 Jul 2014 22:20:22 +0000 (06:20 +0800)]
LU-5509 osd: get PFID from linkEA for remote dir on ldiskfs
On the ldiskfs backend, a directory whose parent resides on a
remote MDT is inserted into /REMOTE_PARENT_DIR locally to satisfy
the local e2fsck. To make lookup(..) on the directory return the
real parent FID, we append the real parent FID after its ".." name
entry in /REMOTE_PARENT_DIR.
Unfortunately, such PFID-in-dirent cannot be preserved by file-level
backup, so after a restore we cannot get the right parent FID from
the ".." name entry in /REMOTE_PARENT_DIR. In that case, since
we have stored the real parent FID in the directory object's linkEA,
we can parse the linkEA for the real parent FID.
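The fallback order described above can be sketched as follows. This is a minimal userspace model, not the osd-ldiskfs code: struct fid, struct remote_dir and get_parent_fid() are hypothetical names, and the FID layout is simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a Lustre FID (assumption, not struct lu_fid). */
struct fid {
    uint64_t seq;
    uint32_t oid;
    uint32_t ver;
};

/* Hypothetical view of a remote directory as seen by the local OSD. */
struct remote_dir {
    bool       has_dirent_pfid; /* PFID still stored after the ".." entry? */
    struct fid dirent_pfid;     /* lost by file-level backup/restore */
    bool       has_linkea;      /* linkEA is kept with the object */
    struct fid linkea_pfid;     /* parent FID recorded in the linkEA */
};

/* Return 0 and fill *pfid on success, -1 if no parent FID is available. */
int get_parent_fid(const struct remote_dir *dir, struct fid *pfid)
{
    assert(dir != NULL && pfid != NULL);

    if (dir->has_dirent_pfid) {   /* normal case: PFID-in-dirent */
        *pfid = dir->dirent_pfid;
        return 0;
    }
    if (dir->has_linkea) {        /* after restore: fall back to the linkEA */
        *pfid = dir->linkea_pfid;
        return 0;
    }
    return -1;
}
```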
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Icf1e24ec911818b3a49a253f67c72334a4b75712
Reviewed-on: http://review.whamcloud.com/11485
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Fan Yong [Sat, 26 Jul 2014 22:15:42 +0000 (06:15 +0800)]
LU-5508 osp: RPC adjustment for remote transaction
1) For a remote transaction, the set_attr/set_xattr RPC should not be
prepared in the declare phase. According to our current transaction/
dt_object_lock framework, the transaction sponsor starts the
transaction first, then tries to acquire the related dt_object_lock
if needed. That is a general rule, and the LFSCK needs to follow
it when repairing an inconsistent linkEA, regardless of whether the
MDT-object is local or remote.
In the linkEA repair case, before the LFSCK thread obtains the
dt_object_lock on the target MDT-object, it cannot know whether
the MDT-object has a linkEA, nor whether that linkEA is valid.
Since the LFSCK cannot hold the dt_object_lock before the (remote)
transaction starts (otherwise there is a potential deadlock),
it cannot prepare the related RPC for the repair during the declare
phase as other normal transactions do.
To resolve this, we make OSP prepare the related RPC
(set_attr/set_xattr/del_xattr) after the remote transaction has started,
and trigger the remote update (RPC sending) at trans_stop.
Then upper-layer users, such as the LFSCK, can follow the general
rule for trans_start/dt_object_lock when repairing a linkEA
inconsistency, without distinguishing remote MDT-objects.
2) Some adjustments to OSP object attribute cache maintenance to make
the logic clearer and more reasonable.
2.1) Update the cached attribute in osp_attr_set(), but not in the
osp_declare_attr_set().
2.2) Update the cached extended attribute in osp_xattr_set(), but not
in the osp_declare_xattr_set().
2.3) Drop the cached extended attribute in osp_xattr_del().
3) Typo fixing and code cleanup.
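The ordering in item 1 (declare, start, lock, queue the update, send at stop) can be modelled roughly as below. This is a sketch under stated assumptions: struct remote_trans and the trans_*() helpers are hypothetical and are not the OSP or dt_object API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UPDATES 8

enum update_op { UPD_SET_ATTR, UPD_SET_XATTR, UPD_DEL_XATTR };

/* Hypothetical remote transaction handle (not the real OSP structures). */
struct remote_trans {
    int            started;
    int            nr_updates;
    enum update_op updates[MAX_UPDATES];
    int            rpcs_sent;
};

/* Declare phase: nothing is queued here, the RPC is not built yet. */
void trans_declare(struct remote_trans *t)
{
    (void)t;
}

void trans_start(struct remote_trans *t)
{
    t->started = 1;
}

/* Execute phase: called after trans_start and with the object lock held,
 * so the caller now knows which update is really needed. */
int trans_add_update(struct remote_trans *t, enum update_op op)
{
    assert(t != NULL);
    if (!t->started || t->nr_updates == MAX_UPDATES)
        return -1;
    t->updates[t->nr_updates++] = op;
    return 0;
}

/* Stop phase: the queued updates would be sent to the remote MDT here.
 * Returns the number of updates that were flushed. */
int trans_stop(struct remote_trans *t)
{
    t->rpcs_sent = t->nr_updates;
    t->nr_updates = 0;
    t->started = 0;
    return t->rpcs_sent;
}
```

In this model a caller such as the LFSCK would take the object lock between trans_start() and trans_add_update(), which is the point where it can finally inspect the linkEA.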
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I43c88a8fd3b184c91a4b3cbd4104e35f9915ee24
Reviewed-on: http://review.whamcloud.com/11382
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Stephen Champion [Fri, 12 Sep 2014 00:03:01 +0000 (17:03 -0700)]
LU-5610 tests: Handle quoted module options
When test-framework.sh translates module options to environment
variables for remote nodes, quotes should be escaped for the subshell.
Signed-off-by: Stephen Champion <schamp@sgi.com>
Change-Id: I937cc28b96b54ea75082c7d8789c762b4db16c5f
Reviewed-on: http://review.whamcloud.com/11887
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Nunez [Mon, 8 Sep 2014 23:08:14 +0000 (17:08 -0600)]
LU-5596 lnet: Remove obsolete LNET variable
For Lustre version 2.6.50 and later, the variable
session_features is defined as "LST_FEATS_MASK". For
earlier versions of Lustre, the same variable is defined
as "LST_FEATS_EMPTY".
Since Lustre master is at 2.6.52 or later, the second
definition of "session_features" can be removed.
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I52e7a914880509cfcd4961032ab7775bbaf626a8
Reviewed-on: http://review.whamcloud.com/11825
Tested-by: Jenkins
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Simmons [Thu, 28 Aug 2014 18:58:50 +0000 (14:58 -0400)]
LU-4746 libcfs: Use Linux kernel current_umask() function
Because Lustre does not use the kernel's current_umask() function,
GRSecurity umask handling breaks. This is also needed for the Linux API
cleanup. Replace current->fs->umask with the more secure
current_umask() function.
Change-Id: Ide0b83eb3e6c69e1e2178ede37ce708227f1c107
Signed-off-by: Andrew Prout <ajprout@hotmail.com>
Signed-off-by: Cliff White <cliffwhi@intel.com>
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/11642
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Andreas Dilger [Wed, 28 May 2014 23:15:25 +0000 (17:15 -0600)]
LU-5499 tests: keep /sbin/mount.lustre until cleanup
Don't unmount /sbin/mount.lustre in the middle of running tests
on a local test system if it is not doing final cleanup. Otherwise,
later mounts may fail.
The current /sbin/mount.lustre mountpoint is an empty stub that
returns success (0) if executed, but doesn't mount the filesystem.
Instead, create a mountpoint that prints a message if executed and
returns an error to the caller, so it is easier to debug problems.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ied7b69f536bad87333cf5c543384723412500c1e
Reviewed-on: http://review.whamcloud.com/11259
Reviewed-by: frank zago <fzago@cray.com>
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Prakash Surya [Thu, 3 Oct 2013 00:16:51 +0000 (17:16 -0700)]
LU-1669 vvp: Use lockless __generic_file_aio_write
Testing multi-threaded single shared file write performance has shown
the inode mutex to be a limiting factor when using the
generic_file_aio_write function. To work around this bottleneck, this
change replaces the locked version of that call with the lockless
version, specifically, __generic_file_aio_write.
In order to maintain POSIX consistency, Lustre must now employ its
own locking mechanism in the higher layers. Currently writes are
protected using the lli_write_mutex in the ll_inode_info structure.
To protect against simultaneous write and truncate operations, since
we no longer take the inode mutex during writes, we must down the
lli_trunc_sem semaphore.
Unfortunately, this change by itself does not garner any performance
benefits. Using FIO on a single machine with 32 GB of RAM, write
performance tests were run with and without this change applied; the
results are below:
+---------+-----------+---------+--------+--------+
| fio v2.0.13         | Write Bandwidth (KB/s)    |
+---------+-----------+---------+--------+--------+
| # Tasks | GB / Task | Test 1  | Test 2 | Test 3 |
+---------+-----------+---------+--------+--------+
|       1 |        64 |  452446 | 454623 | 457653 |
|       2 |        32 |  850318 | 565373 | 602498 |
|       4 |        16 | 1058900 | 463546 | 529107 |
|       8 |         8 | 1026300 | 468190 | 576451 |
|      16 |         4 | 1065500 | 503160 | 462902 |
|      32 |         2 | 1068600 | 462228 | 466963 |
|      64 |         1 |  991830 | 556618 | 557863 |
+---------+-----------+---------+--------+--------+
* Test 1: Lustre client running 04ec54f. File per process write
workload. This test was used as a baseline for what we
_could_ achieve in the single shared file tests if the
bottlenecks were removed.
* Test 2: Lustre client running 04ec54f. Single shared file
workload, each task writing to a unique region.
* Test 3: Lustre client running 04ec54f + this patch. Single shared
file workload, each task writing to a unique region.
In order to garner any real performance benefits out of a single
shared file workload, the lli_write_mutex needs to be broken up into a
range lock. That would allow write operations to unique regions of a
file to be executed concurrently. This work is left to be done in a
follow-up patch.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Change-Id: I0023132b5d941b3304f39f015f95106542998072
Reviewed-on: http://review.whamcloud.com/6672
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Prakash Surya [Wed, 19 Jun 2013 17:30:36 +0000 (10:30 -0700)]
LU-1669 llite: Replace write mutex with range lock
Testing has shown the ll_inode_info's lli_write_mutex to be a
limiting factor with single shared file write performance, when using
many writing threads on a single machine. Even if each thread is
writing to a unique portion of the file, the lli_write_mutex will
allow no more than a single thread to write to the file at any
one time.
This change attempts to remove this bottleneck by replacing this
mutex with a range lock. This should allow multiple threads to write
to a single file simultaneously iff the threads are writing to unique
regions of the file.
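To illustrate the idea only, here is a small userspace range lock built on pthreads. It is a sketch, not the llite implementation: the fixed-size table, the struct range_lock layout and all function names are assumptions chosen for brevity. A production range lock would typically use an interval tree rather than a linear scan.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_HELD 16

/* One held byte range, inclusive on both ends. */
struct range {
    uint64_t start;
    uint64_t end;
    bool     used;
};

struct range_lock {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    struct range    held[MAX_HELD];
};

void range_lock_init(struct range_lock *rl)
{
    pthread_mutex_init(&rl->mtx, NULL);
    pthread_cond_init(&rl->cond, NULL);
    for (int i = 0; i < MAX_HELD; i++)
        rl->held[i].used = false;
}

/* Caller holds rl->mtx. Returns a slot index, or -1 if [s, e] overlaps a
 * held range or the table is full. */
static int try_take(struct range_lock *rl, uint64_t s, uint64_t e)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_HELD; i++) {
        const struct range *r = &rl->held[i];

        if (!r->used) {
            if (free_slot < 0)
                free_slot = i;
        } else if (r->start <= e && s <= r->end) {
            return -1;          /* overlapping writer already holds it */
        }
    }
    if (free_slot >= 0)
        rl->held[free_slot] = (struct range){ s, e, true };
    return free_slot;
}

/* Non-blocking variant: returns a slot index or -1. */
int range_trylock(struct range_lock *rl, uint64_t s, uint64_t e)
{
    assert(s <= e);
    pthread_mutex_lock(&rl->mtx);
    int slot = try_take(rl, s, e);
    pthread_mutex_unlock(&rl->mtx);
    return slot;
}

/* Blocking variant: sleeps until [s, e] no longer overlaps a held range. */
int range_lock(struct range_lock *rl, uint64_t s, uint64_t e)
{
    int slot;

    assert(s <= e);
    pthread_mutex_lock(&rl->mtx);
    while ((slot = try_take(rl, s, e)) < 0)
        pthread_cond_wait(&rl->cond, &rl->mtx);
    pthread_mutex_unlock(&rl->mtx);
    return slot;
}

void range_unlock(struct range_lock *rl, int slot)
{
    pthread_mutex_lock(&rl->mtx);
    rl->held[slot].used = false;
    pthread_cond_broadcast(&rl->cond);  /* wake writers waiting on overlap */
    pthread_mutex_unlock(&rl->mtx);
}
```

Writers to disjoint regions get different slots and proceed concurrently; a writer to an overlapping region waits until range_unlock() is called on the conflicting slot.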
Performance testing shows this change to garner a significant
performance boost to write bandwidth. Using FIO on a single machine
with 32 GB of RAM, write performance tests were run with and without
this change applied; the results are below:
+---------+-----------+---------+--------+--------+--------+
| fio v2.0.13         | Write Bandwidth (KB/s)             |
+---------+-----------+---------+--------+--------+--------+
| # Tasks | GB / Task | Test 1  | Test 2 | Test 3 | Test 4 |
+---------+-----------+---------+--------+--------+--------+
|       1 |        64 |  452446 | 454623 | 457653 | 463737 |
|       2 |        32 |  850318 | 565373 | 602498 | 733027 |
|       4 |        16 | 1058900 | 463546 | 529107 | 976284 |
|       8 |         8 | 1026300 | 468190 | 576451 | 963404 |
|      16 |         4 | 1065500 | 503160 | 462902 | 830065 |
|      32 |         2 | 1068600 | 462228 | 466963 | 749733 |
|      64 |         1 |  991830 | 556618 | 557863 | 710912 |
+---------+-----------+---------+--------+--------+--------+
* Test 1: Lustre client running 04ec54f. File per process write
workload. This test was used as a baseline for what we
_could_ achieve in the single shared file tests if the
bottlenecks were removed.
* Test 2: Lustre client running 04ec54f. Single shared file
workload, each task writing to a unique region.
* Test 3: Lustre client running 04ec54f + I0023132b. Single shared
file workload, each task writing to a unique region.
* Test 4: Lustre client running 04ec54f + this patch.
Single shared file workload, each task writing to a unique
region.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Change-Id: I71e060c190065d87a20dc8df3104f898883d0583
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/6320
Tested-by: Jenkins
Reviewed-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Oleg Drokin [Fri, 12 Sep 2014 16:17:31 +0000 (16:17 +0000)]
Revert "LU-5261 osc: use wait_for_completion_killable() instead"
This is causing LU-5446
This reverts commit
2b3663dda896f669c87feb49e7f3c7d85a89cefe.
Change-Id: I8bd254137ad0d402bad5f5aac85aa52cd3d47f63
Reviewed-on: http://review.whamcloud.com/11892
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
Bob Glossman [Tue, 9 Sep 2014 19:59:10 +0000 (12:59 -0700)]
LU-5600 kernel: kernel update RHEL6.5 [2.6.32-431.29.2.el6]
Update RHEL6.5 kernel to 2.6.32-431.29.2.el6
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I4832cf12cb49eebd1c81fc4e07a53d0a1315500d
Reviewed-on: http://review.whamcloud.com/11837
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Vitaly Fertman [Sat, 21 Jun 2014 22:02:37 +0000 (02:02 +0400)]
LU-5520 ldlm: resend AST callbacks
While clients will resend client->server RPCs, servers would not
resend server->client RPCs such as LDLM callbacks (blocking
or completion callbacks/ASTs). This could result in clients being
evicted from the server if blocking callbacks were dropped by the
network (a failed router or lossy network) and the client did not
cancel the requested lock in time.
In order to fix this problem, this patch adds the ability to resend
LDLM callbacks from the server and give the client a chance to
respond within the timeout period before it is evicted:
- resend BL AST within lock callback timeout period;
- still do not resend CANCEL_ON_BLOCK;
- regular resend for CP AST without BL AST embedded;
- prolong lock callback timeout on resend;
some fixes:
- recovery-small test_10 to actually evict the client
with dropped BL AST;
- ETIMEDOUT to be returned if send limit is expired;
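The resend behaviour for a blocking AST can be sketched as a bounded retry loop. This is a simplified model: time is counted in integer ticks, send_bl_ast() and drop_first_n() are hypothetical names, and the timeout prolongation mentioned above is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport hook: returns 0 when the client acknowledged the
 * callback, nonzero when the RPC was dropped by the network. */
typedef int (*send_fn)(void *arg);

/* Send a blocking AST, resending every 'interval' ticks until it is
 * acknowledged or the lock callback timeout expires. */
int send_bl_ast(send_fn send, void *arg, int timeout, int interval)
{
    assert(send != NULL && interval > 0);

    for (int elapsed = 0; elapsed < timeout; elapsed += interval) {
        if (send(arg) == 0)
            return 0;           /* client replied in time */
    }
    return -ETIMEDOUT;          /* send limit expired */
}

/* Example hook that drops the first *arg sends, then succeeds. */
int drop_first_n(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```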
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Change-Id: Ie65fac94ea68defffd1769cbbb0f74381c11262c
Tested-by: Elena Gryaznova <Elena_Gryaznova@xyratex.com>
Reviewed-by: Alexey Lyashkov <Alexey_Lyashkov@xyratex.com>
Reviewed-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Xyratex-bug-id: MRP-417
Reviewed-on: http://review.whamcloud.com/9335
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Vitaly Fertman [Thu, 28 Aug 2014 22:33:54 +0000 (02:33 +0400)]
LU-5496 ldlm: reconstruct proper flags on enqueue resend
Otherwise, a waiting lock may get granted because no BLOCKED_GRANTED
flag is returned.
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Xyratex-bug-id: MRP-1944
Change-Id: I5e938ff0454d5e8694b09f9fff3c1f82d746360d
Reviewed-on: http://review.whamcloud.com/11644
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Dmitry Eremin [Fri, 5 Sep 2014 14:36:31 +0000 (18:36 +0400)]
LU-5592 ofd: fix incorrect check for NULL
Fix incorrect check for NULL in ofd_version_get_check()
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ia671f86ab051eafa5336ba18f0378cdac846f0c5
Reviewed-on: http://review.whamcloud.com/11773
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Alexander.Boyko [Tue, 1 Jul 2014 19:17:28 +0000 (23:17 +0400)]
LU-3192 osc: split different type of IO
Do not allow different types of pages in the same RPC.
Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Xyratex-bug-id: MRP-859
Change-Id: Ic595a29f685757faf4d9c3de9c2d2ae8fd039baf
Reviewed-on: http://review.whamcloud.com/10930
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Jeff Mahoney [Wed, 3 Sep 2014 13:37:36 +0000 (09:37 -0400)]
LU-4416 osd-ldiskfs: limit trans size to match reservations
Linux commit
8f7d89f36 (jbd2: transaction reservation support)
shrunk the limit of a transaction size to be
j_max_transaction_buffers / 2 to accommodate the reservations.
This can have a performance impact, so we limit this change to
3.10+ kernels. The same commit also changed the number of
arguments to the ext4_journal_start function, so we use that as
the test. This patch adjusts the limit in osd_trans_start to
match, depending on the kernel version.
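As a rough illustration of the limit selection, assuming the older kernels allow the full j_max_transaction_buffers per transaction (an assumption, as are the function names and the boolean flag standing in for the build-time test):

```c
#include <assert.h>
#include <stdbool.h>

/* Upper bound on credits for one transaction. 'has_reservations' stands in
 * for the configure-time check on the ext4_journal_start() signature. */
unsigned int trans_credit_limit(unsigned int j_max_transaction_buffers,
                                bool has_reservations)
{
    if (has_reservations)       /* 3.10+ kernels: half is kept for reservations */
        return j_max_transaction_buffers / 2;
    return j_max_transaction_buffers;
}

/* True if a transaction asking for 'credits' fits under the limit. */
bool trans_fits(unsigned int credits, unsigned int j_max_transaction_buffers,
                bool has_reservations)
{
    return credits <= trans_credit_limit(j_max_transaction_buffers,
                                         has_reservations);
}
```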
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I10a65474dbaf7b67fe7cd59cc2544759d1e3270d
Reviewed-on: http://review.whamcloud.com/10376
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Jinshan Xiong [Fri, 5 Sep 2014 19:31:59 +0000 (12:31 -0700)]
LU-3676 llite: to configure max_cached_mb correctly
If an MGS conf_param reduces the memory cache size
max_cached_mb, it will fail because dt_exp is not initialized
yet.
It should just go ahead and configure it, because there are
certainly enough free LRU slots to deduct from ccc_lru_left.
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I317e556413bc63086378dec294d8ba2792afca52
Reviewed-on: http://review.whamcloud.com/11783
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
James Simmons [Wed, 3 Sep 2014 16:40:20 +0000 (12:40 -0400)]
LU-5275 libcfs: cleanup the proc hash and cfs wrappers
Remove the non seq file version of the cfs_hash functions.
Remove the basic cfs wrappers for procfs definitions.
Change-Id: Ie4996656267998b78e44cdedfbf972f358e04d50
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/11743
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Nathaniel Clark [Mon, 21 Jul 2014 13:31:11 +0000 (09:31 -0400)]
LU-4334 utils: Only set a single property for nodes
For servicenode, failnode and mgsnode, set only a single property but
concatenate the data. ZFS can only set a single property for a given
name, so this prevents erroneously overwriting previous entries.
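A minimal sketch of the append-instead-of-overwrite idea follows. The ':' separator, the function name and the example NIDs are assumptions for illustration; the patch itself defines the actual format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append 'value' to the existing property string instead of replacing it.
 * Returns 0 on success, -1 if the result would not fit in 'size' bytes
 * (in which case 'prop' is left unchanged). */
int prop_append(char *prop, size_t size, const char *value)
{
    assert(prop != NULL && value != NULL);

    size_t used = strlen(prop);
    assert(used < size);

    int n = snprintf(prop + used, size - used, "%s%s",
                     used ? ":" : "", value);
    if (n < 0 || (size_t)n >= size - used) {
        prop[used] = '\0';      /* roll back a truncated append */
        return -1;
    }
    return 0;
}
```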
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ia0316e138b0d68dcda1d14811e43db3bbed64457
Reviewed-on: http://review.whamcloud.com/11161
Tested-by: Jenkins
Reviewed-by: Isaac Huang <he.huang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Lai Siyao [Sun, 7 Sep 2014 15:09:20 +0000 (11:09 -0400)]
LU-3270 statahead: statahead thread wait for RPCs to finish
The statahead thread should wait for in-flight stat RPCs to finish,
because the statahead RPC callback may access data allocated in the
statahead thread context.
ll_sa_entry_fini() should keep old entry if stat RPC is not
finished yet.
Simplify sai refcounting:
* newly allocated sai will hold one refcount, and it will put it
after starting statahead thread.
* statahead thread holds one refcount.
* agl thread holds one refcount.
* stat process calls do_statahead_enter() which will try to get
sai, and if it's valid, it will revalidate from statahead cache,
and put refcount after use.
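The refcount rules above can be walked through with a toy model. This is single-threaded and uses a plain int; the real code would need atomic operations, and struct sai with its helpers here is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the statahead info (sai) object. */
struct sai {
    int refcount;
};

/* A newly allocated sai holds one reference, owned by the creator. */
struct sai *sai_alloc(void)
{
    struct sai *s = calloc(1, sizeof(*s));

    if (s != NULL)
        s->refcount = 1;
    return s;
}

struct sai *sai_get(struct sai *s)
{
    assert(s != NULL && s->refcount > 0);
    s->refcount++;
    return s;
}

/* Drop one reference. Returns 1 if it was the last one and the sai was
 * freed, 0 otherwise. */
int sai_put(struct sai *s)
{
    assert(s != NULL && s->refcount > 0);
    if (--s->refcount == 0) {
        free(s);
        return 1;
    }
    return 0;
}
```

In this model the creator calls sai_put() once the statahead thread has taken its own reference, and the object is freed only when the last holder drops out.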
Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I55a4fe66a5f6c04595d3bc84f0cd3750f20e0ee4
Reviewed-on: http://review.whamcloud.com/9663
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Simmons [Mon, 11 Aug 2014 13:50:42 +0000 (09:50 -0400)]
LU-4993 osd-ldiskfs: Support 3.14 kernel changes to bio api
In the 3.14 kernel code base several data fields in
struct bio were moved into a new structure called
bvec_iter. This patch updates osd-ldiskfs to handle
this api change.
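One common way to absorb such a change is a pair of accessor macros selected at build time. The sketch below uses mock structures in userspace; HAVE_BVEC_ITER, the bio_sector()/bio_size() macros and bio_end_sector_sketch() are hypothetical names, not necessarily what the patch uses.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

#define HAVE_BVEC_ITER 1        /* hypothetical configure result */

#ifdef HAVE_BVEC_ITER
/* Mock of the 3.14+ layout: iteration state lives in bi_iter. */
struct bvec_iter {
    sector_t     bi_sector;
    unsigned int bi_size;
    unsigned int bi_idx;
};
struct bio {
    struct bvec_iter bi_iter;
};
#define bio_sector(bio) ((bio)->bi_iter.bi_sector)
#define bio_size(bio)   ((bio)->bi_iter.bi_size)
#else
/* Mock of the older layout: the fields sit directly in struct bio. */
struct bio {
    sector_t     bi_sector;
    unsigned int bi_size;
};
#define bio_sector(bio) ((bio)->bi_sector)
#define bio_size(bio)   ((bio)->bi_size)
#endif

/* Caller code is written once against the accessors. */
sector_t bio_end_sector_sketch(const struct bio *bio)
{
    return bio_sector(bio) + (bio_size(bio) >> 9);  /* 512-byte sectors */
}
```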
Change-Id: I849a1d62462c58a79766c176060b27c621627646
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/10995
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Dmitry Eremin [Thu, 24 Jul 2014 17:39:17 +0000 (21:39 +0400)]
LU-5200 llite: Cmp of unsigned value against 0 is always true
Comparison of an unsigned value against 0 is always true.
desc->bd_iov_count is declared as int, and the first argument of
atomic_sub_return() is also int.
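The class of bug can be reproduced in a few lines. The function names below are made up for the example; only the unsigned-versus-signed behaviour is the point.

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy pattern: an unsigned result can never be negative, so this check
 * always passes (GCC reports it with -Wtype-limits). */
bool count_ok_unsigned(unsigned int count, unsigned int sub)
{
    unsigned int left = count - sub;    /* wraps instead of going negative */

    return left >= 0;
}

/* Fixed pattern: with a signed type the underflow is visible. */
bool count_ok_signed(int count, int sub)
{
    int left = count - sub;

    return left >= 0;
}
```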
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I89cedc4991d1a117f824ce6b5d8cf045595fdc0f
Reviewed-on: http://review.whamcloud.com/11217
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Thu, 14 Aug 2014 19:17:28 +0000 (14:17 -0500)]
LU-2675 lnet: add lnet/nidstr.h
Add lnet/include/lnet/nidstr.h to break the include loop between
libcfs/libcfs.h and lnet/types.h. Where possible include lnet/types.h
or lnet/nidstr.h rather than lnet/lnet.h. Remove the unnecessary
headers lnet/{,darwin/,linux/}api-support.h.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ide4cd79295eba8705c0d413449cbb812343cbec9
Reviewed-on: http://review.whamcloud.com/11506
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Mon, 18 Aug 2014 17:30:28 +0000 (12:30 -0500)]
LU-2675 lustre: move lustre_intent.h to lustre/include
Move lustre_intent.h from lustre/include/linux to lustre/include.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I8b3947bc558e6721e4a238526497124d3ad193d0
Reviewed-on: http://review.whamcloud.com/11499
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
John L. Hammond [Mon, 18 Aug 2014 17:06:57 +0000 (12:06 -0500)]
LU-2675 lustre: remove linux/lustre_handles.h
Remove the cfs_rcu_head_t typedef and remove
lustre/include/linux/lustre_handles.h.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I75a9d455ae68de6173b3c47b9e10a6a6eda65989
Reviewed-on: http://review.whamcloud.com/11498
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
John L. Hammond [Mon, 18 Aug 2014 16:57:47 +0000 (11:57 -0500)]
LU-2675 lustre: remove linux/lustre_dlm.h
Remove lustre/include/linux/lustre_dlm.h.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I7895cbe7a42146e4be14a0c36fecd28f642d1cd1
Reviewed-on: http://review.whamcloud.com/11497
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Mon, 18 Aug 2014 16:50:34 +0000 (11:50 -0500)]
LU-2675 lustre: remove linux/lustre_debug.h
Move the definition of LL_CDEBUG_PAGE() to
lustre/include/lustre_debug.h and remove
lustre/include/linux/lustre_debug.h.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ie5450ea76bfb1abeef8256da27aac5ea3009a1f2
Reviewed-on: http://review.whamcloud.com/11496
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Chao Wang [Fri, 1 Aug 2014 16:14:58 +0000 (12:14 -0400)]
LU-5030 utils: fix hard-coded /proc/fs/lustre in scripts
In the upstream Linux kernel, the files under /proc/fs/lustre and lnet
will be moved in the future to use sysfs. Lustre handles this by
providing access to this data with the tool lctl which is independent
of where the data is located. Many scripts directly access the proc file
system instead of using lctl so this patch migrates those scripts to
do the proper thing.
Signed-off-by: Chao Wang <chao.ornl@gmail.com>
Change-Id: I1d96ccd27fee2b0eb0bf173a4e37adacb628f83c
Reviewed-on: http://review.whamcloud.com/10534
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Yang Sheng [Thu, 7 Aug 2014 11:05:31 +0000 (19:05 +0800)]
LU-5276 build: handle RHEL ldiskfs series more accurately
RHEL7 changed the RHEL_RELEASE macro format, so we need
a unified way to handle it. Use RHEL_MAJOR and
RHEL_MINOR to decide which ldiskfs series to choose.
Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I147c4ff01372f4aafcdbce6197c7a85e9d64027b
Reviewed-on: http://review.whamcloud.com/11398
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Simmons [Tue, 2 Sep 2014 14:14:05 +0000 (10:14 -0400)]
LU-3963 libcfs: remove last cfs wrappers for cpu node handling
The last of the cfs wrappers, cfs_for_each_possible_cpu, is removed
with this patch.
Change-Id: I2c626e587343c3950599d1da6442cd25cb2d27a6
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/11729
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Henri Doreau [Fri, 22 Aug 2014 13:50:48 +0000 (15:50 +0200)]
LU-5538 mdc: Report D_CHANGELOG messages as D_HSM.
Removed the D_CHANGELOG pseudo-debug flag that wasn't actually defined
as a usable one. Report the D_CHANGELOG messages as D_HSM ones instead
since this is the primary user of these messages.
Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: I4d82b16935d053e1b7e43512a1b9368f4b0316b5
Reviewed-on: http://review.whamcloud.com/11558
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Thomas Leibovici [Tue, 19 Aug 2014 11:51:05 +0000 (13:51 +0200)]
LU-5504 utils: add const qualifier to changelog accessors.
The following accessors don't need to modify their argument:
changelog_rec_size(), changelog_rec_name(), changelog_rec_snamelen(),
changelog_rec_sname().
Make their prototype more rigorous by adding a "const" qualifier.
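The effect of the qualifier can be shown on a cut-down record type. struct rec and its accessors are stand-ins invented for this sketch; the real struct changelog_rec has more fields and its accessors may return different types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a changelog record. */
struct rec {
    unsigned short namelen;
    char           name[32];
};

/* Read-only accessors take a pointer to const, so they can be called on
 * records the caller is not allowed to modify. */
static inline size_t rec_size(const struct rec *r)
{
    return offsetof(struct rec, name) + r->namelen;
}

static inline const char *rec_name(const struct rec *r)
{
    return r->name;
}
```

Without the qualifier, passing a `const struct rec *` to these accessors would draw a compiler diagnostic about discarding the qualifier.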
Signed-off-by: Thomas Leibovici <thomas.leibovici@cea.fr>
Change-Id: I7915a2487b2b772482c777c4c19ec947b7015b74
Reviewed-on: http://review.whamcloud.com/11517
Tested-by: Jenkins
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Simmons [Tue, 2 Sep 2014 15:44:38 +0000 (11:44 -0400)]
LU-5275 obdclass: remove lproc_var argument to lprocfs_add_simple
The struct lproc_var argument is no longer used so we can
simplify the lprocfs_add_simple function.
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I2099e92c9e93c67843a41fb8e46a481a75fbd004
Reviewed-on: http://review.whamcloud.com/11732
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Bob Glossman [Tue, 26 Aug 2014 03:04:59 +0000 (20:04 -0700)]
LU-5543 ldiskfs: export ldiskfs_map_blocks
Regression caused by recent commit
31190547864cbcac1f6b85e88fd129dfe7de0977. Seen only in sles11sp3 as
it requires a server build on a 3.0 or later kernel to make it happen.
sles11sp3 is the only currently supported distro where that is true.
New code in osd-ldiskfs that is conditional when
HAVE_LDISKFS_MAP_BLOCKS is set by autoconf calls ldiskfs_map_blocks.
That entry point exists in ext4/ldiskfs code, but isn't exported with
EXPORT_SYMBOL. This causes a runtime failure at module load time of
osd-ldiskfs with an error like:
osd_ldiskfs: Unknown symbol ldiskfs_map_blocks (err 0)
This mod adds the needed EXPORT_SYMBOL() to an ldiskfs patch.
Test-Parameters: mdsdistro=sles11sp3 ossdistro=sles11sp3 clientdistro=sles11sp3 mdsfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I0722bfaf73d736247f07d5402aea05dadcfcd394
Reviewed-on: http://review.whamcloud.com/11591
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Simmons [Thu, 28 Aug 2014 17:32:50 +0000 (13:32 -0400)]
LU-5275 obdclass: Remove lprocfs_vars argument from class_register_type function
Lustre no longer uses struct lprocfs_vars with any instance
of class_register_type function. We can safely remove it.
Change-Id: Ia4ecabab4f286bfe9991d1a860e2485694fca5d0
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/11640
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Simmons [Thu, 28 Aug 2014 17:24:12 +0000 (13:24 -0400)]
LU-5275 obdclass: Remove non seq file proc routines
Now that we have moved everything over to seq_files
for the proc handling we can remove all non seq_file
routines in lustre.
Change-Id: I20bd22fe920e430183b266219029a64beeb9747a
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/11451
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Amir Shehata [Wed, 27 Aug 2014 23:21:11 +0000 (16:21 -0700)]
LU-5540 lnet: fix crash due to NULL networks string
If there is an invalid networks or ip2nets parameter,
lnet_parse_networks() gets called with a NULL 'network' string parameter.
lnet_parse_networks() needs to sanitize its input string now that
it's being called from multiple places. Check for
a NULL string inside the function every time it is called, which
reduces the probability of errors with other code modifications.
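The defensive check amounts to validating the argument at the top of the parser. The function below is a hypothetical, much simplified parser that only counts comma-separated names; rejecting an empty string as well is an extra assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Count the comma-separated network names in 'networks'.
 * Returns -EINVAL for a NULL (or, by assumption, empty) string instead of
 * dereferencing it. */
int parse_networks_sketch(const char *networks)
{
    int count = 1;

    if (networks == NULL || *networks == '\0')
        return -EINVAL;

    for (const char *p = networks; *p != '\0'; p++)
        if (*p == ',')
            count++;
    return count;
}
```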
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Ifcc1f6f74a3e0e804cb65e3d1b83f85a24f44d9b
Reviewed-on: http://review.whamcloud.com/11626
Tested-by: Jenkins
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Mon, 18 Aug 2014 16:43:31 +0000 (11:43 -0500)]
LU-2675 lustre: remove linux/lustre_common.h
Remove lustre/include/linux/lustre_common.h and several unnecessary
calls to cfs_cleanup_group_info().
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I149673dea6559b02a5de1c0a160836d67ea96119
Reviewed-on: http://review.whamcloud.com/11495
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Henri Doreau [Fri, 18 Apr 2014 19:47:37 +0000 (21:47 +0200)]
LU-4928 utils: LLAPI helpers for file leases
Introduced llapi_lease_{get,check,put} functions to abstract the ioctl
implementation to manipulate file leases.
Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: I956408defe550c1b432526e142825ea736d0e285
Reviewed-on: http://review.whamcloud.com/10022
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
James Simmons [Tue, 2 Sep 2014 14:26:22 +0000 (10:26 -0400)]
LU-3963 libcfs: Delete empty source files for linux layer
Some of the files in libcfs/libcfs/linux are actually
empty. Remove them.
Change-Id: Ic49d3bea7777488ab51fdcdc1f93ed516654c665
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/11725
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Lai Siyao [Mon, 30 Jun 2014 03:45:21 +0000 (11:45 +0800)]
LU-4975 doxygen: add comments for lproc_osp.c
- Fix up GPL header block to reference proper GPLv2 license URL.
- Remove mention of contacting Sun.
- Add introductory comment block for the lproc_osp.c file.
- Add function comments blocks to all functions in the lproc_osp.c
file
Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: If037ab1777b0640e8cd6b66f0a78ca54339907ff
Reviewed-on: http://review.whamcloud.com/10893
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Richard Henwood <richard.henwood@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Fan Yong [Sun, 13 Jul 2014 12:14:10 +0000 (20:14 +0800)]
LU-4788 lfsck: take ldlm lock before modifying visible object
Before the LFSCK modifies a namespace-visible object, it needs to
acquire the related ibits lock to prevent the client from caching
stale information. The .lustre/lost+found/ directory and its
sub-directories also need the related ldlm locks.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I439e02e1b7b24e87e7e6e25c5084f1c98116e7f7
Reviewed-on: http://review.whamcloud.com/10986
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Fan Yong [Sun, 13 Jul 2014 12:11:56 +0000 (20:11 +0800)]
LU-5395 lfsck: deadlock between LFSCK and destroy
There is a potential deadlock between object destroy and the layout
LFSCK. Consider the following scenario:
1) The LFSCK thread obtains the parent object first; at that
time, the parent object has not been destroyed yet.
2) An RPC service thread destroys the parent and all its
child objects. Because the LFSCK is referencing the
parent object, the parent object is marked as dying in
RAM. In turn, the parent object is referencing all its
child objects, so all the child objects are marked as
dying in RAM as well.
3) The LFSCK thread tries to find some child object while
still referencing the parent object, and finds that the
child object is dying. According to the object visibility
rules, an object with the dying flag cannot be returned to
others. So the LFSCK thread has to wait until the dying
object has been purged from RAM before it can allocate a
new object (with the same FID) in RAM. Unfortunately, the
LFSCK thread itself is referencing the parent object, so
the parent object cannot be purged, and therefore the
child object cannot be purged either. The LFSCK thread
deadlocks.
We introduce a non-blocking version of lu_object_find() that
lets the LFSCK thread return failure immediately (instead of
waiting) when it finds a dying (child) object. The LFSCK
thread can then check whether the parent object is dying,
which avoids the deadlock above.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I7f465259011ad5fb92ef1b4dba0ff9f46d134352
Reviewed-on: http://review.whamcloud.com/11373
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
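The LU-5395 deadlock scenario above can be modelled in a few lines. This is a toy Python sketch, not Lustre code: ObjectCache, find_nowait() and scan_child() are hypothetical names standing in for the in-RAM object cache and the non-blocking lookup the commit describes. It only illustrates the control flow, in which a failed lookup gives the caller a chance to inspect the parent instead of sleeping.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedObject:
    fid: str
    dying: bool = False


class ObjectCache:
    """Toy stand-in for an in-RAM object cache keyed by FID (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, CachedObject] = {}

    def insert(self, fid: str) -> CachedObject:
        obj = CachedObject(fid)
        self._objects[fid] = obj
        return obj

    def mark_dying(self, fid: str) -> None:
        self._objects[fid].dying = True

    def find_nowait(self, fid: str) -> Optional[CachedObject]:
        """Non-blocking lookup: return None at once for a dying object.

        A blocking lookup would wait here until the dying object is purged,
        which cannot happen while the caller still references the parent.
        """
        obj = self._objects.get(fid)
        if obj is None:
            obj = self.insert(fid)
        if obj.dying:
            return None
        return obj


def scan_child(cache: ObjectCache, parent: CachedObject, child_fid: str) -> str:
    """Model of the LFSCK step: look up a child while referencing the parent."""
    child = cache.find_nowait(child_fid)
    if child is not None:
        return "process"
    # The lookup failed instead of sleeping, so the parent can be checked.
    return "skip: parent destroyed" if parent.dying else "retry later"
```

In this model the caller decides what to do after a failed lookup; the strings returned by scan_child() are illustrative only and do not correspond to real LFSCK return codes.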
Fan Yong [Sun, 13 Jul 2014 12:08:23 +0000 (20:08 +0800)]
LU-5395 lfsck: misc patch to prevent lfsck hung
1) When the LFSCK rebuilds the crashed LAST_ID files, it
notifies the MDS to sync the lastid information by disconnecting
the connection. OFD should hold the export reference before
disconnecting so that the RPC reply message can be sent.
2) When the layout LFSCK scans the OST, it needs to handle
the IDIF objects specially (use fid_idif_id() to get the
OST object ID) to avoid wrongly regarding the LAST_ID file
as corrupted.
3) The LFSCK should check the ostid_to_fid() return value for
a corrupted OSTID and/or index.
4) If the LAST_ID file is not crashed, then do not update the
LAST_ID file.
5) Do NOT change lu_buf::lb_len once lu_buf::lb_buf is
allocated, to prevent wrongly accessing released or
non-allocated RAM.
6) Other small fixes and code cleanup.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I84726ddcf0b8fa6b334163fb13d9bae273033d20
Reviewed-on: http://review.whamcloud.com/11304
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Fan Yong [Sun, 13 Jul 2014 11:19:23 +0000 (19:19 +0800)]
LU-4788 lfsck: LFSCK code framework adjustment (1)
The LFSCK wrap functions are only used by the LFSCK engines.
So move these functions from lfsck_lib.c to lfsck_engine.c.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ic192c71101c1718fe893b64f8e4ff74f4992914b
Reviewed-on: http://review.whamcloud.com/10493
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Fan Yong [Sun, 13 Jul 2014 11:11:11 +0000 (19:11 +0800)]
LU-4788 lfsck: replace cfs_list_t with list_head
Do not use cfs_list_xxx any more in LFSCK.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I74bd14fe37274b06762ce5f01aec6e1288837000
Reviewed-on: http://review.whamcloud.com/10602
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Fan Yong [Sun, 13 Jul 2014 04:14:20 +0000 (12:14 +0800)]
LU-4970 tests: wait async LFSCK updates to be done
Some async LFSCK updates may still be in progress when the
test scripts check the repair result. So make the scripts
wait/retry until the async updates have completed or a timeout occurs.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I71a1986de432631ec7158a7e90797c46ae672812
Reviewed-on: http://review.whamcloud.com/11590
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Li Xi [Mon, 25 Aug 2014 08:56:29 +0000 (16:56 +0800)]
LU-5405 llog: add newly opened llog at tail of handle list
Add newly opened llog handle at the tail of handle list to increase
lookup speed, especially for cancel operation, because the canceled
log will be removed from the list, and lookup is from the beginning.
Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I4c5c22901ea5818a8ee50ef97d6dabe4839b9a74
Reviewed-on: http://review.whamcloud.com/11575
Tested-by: Jenkins
Reviewed-by: Ian Costello <costello.ian@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
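The effect described in the LU-5405 message above can be illustrated with a small count of list visits. This is a toy Python model, not the llog code: cancel_cost() is a hypothetical helper, and it assumes that handles are cancelled in the same order they were opened, which the commit message does not state explicitly.

```python
from collections import deque
from typing import Deque, Iterable


def cancel_cost(open_order: Iterable[int], add_at_tail: bool) -> int:
    """Count list entries visited while cancelling every handle.

    Lookup scans from the head of the list, and a cancelled handle is
    removed from the list, as the commit message describes.
    """
    ids = list(open_order)
    handles: Deque[int] = deque()
    for handle_id in ids:
        if add_at_tail:
            handles.append(handle_id)
        else:
            handles.appendleft(handle_id)

    visited = 0
    for target in ids:  # assumption: cancels arrive in open order
        position = 0
        for handle_id in handles:
            visited += 1
            if handle_id == target:
                break
            position += 1
        del handles[position]
    return visited
```

Under that assumption, with four handles the tail-insert variant visits one entry per cancel, while the head-insert variant has to walk to the end of the list each time.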
Li Dongyang [Thu, 28 Aug 2014 02:17:33 +0000 (12:17 +1000)]
LU-5552 llite: make sure we do cl_page_clip on the last page
When we are doing a partial IO on both the first and the last page,
the logic currently only calls cl_page_clip on the first page, which
ends up with an incorrect i_size.
Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Change-Id: Ia7be3d71e535d583cb424bb816c14015d3141cdb
Reviewed-on: http://review.whamcloud.com/11630
Reviewed-by: Ian Costello <costello.ian@gmail.com>
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Li Xi <pkuelelixi@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
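For the LU-5552 fix above, the per-page byte ranges of a partial I/O can be worked out as follows. This is a Python sketch of the arithmetic only: page_clips() is a hypothetical helper, the 4 KiB page size is an assumption, and the real cl_page_clip() interface is not reproduced here.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_clips(pos: int, count: int) -> List[Tuple[int, int, int]]:
    """Return (page_index, start, end) for every page touched by the I/O.

    start and end are byte offsets inside the page; end is exclusive.
    """
    if count <= 0:
        return []
    io_end = pos + count                  # exclusive end of the I/O
    first = pos // PAGE_SIZE
    last = (io_end - 1) // PAGE_SIZE
    clips = []
    for index in range(first, last + 1):
        start = pos % PAGE_SIZE if index == first else 0
        # The last page needs clipping as well; otherwise the I/O
        # appears to run to the end of that page.
        end = io_end - index * PAGE_SIZE if index == last else PAGE_SIZE
        clips.append((index, start, end))
    return clips
```

For an 8192-byte write at offset 100, both the first and the last page are partial, and the three clipped ranges add up to exactly 8192 bytes.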
Oleg Drokin [Sat, 30 Aug 2014 01:28:43 +0000 (21:28 -0400)]
New tag 2.6.52
Change-Id: I1d9113e953c289e4fc6d0c8c7fbc0e573432e840
Andreas Dilger [Thu, 19 Dec 2013 23:24:00 +0000 (07:24 +0800)]
LU-4217 build: bump build version warnings to x.y.53
Move the LUSTRE_VERSION_CODE checks to trigger on x.y.53 instead of
x.y.50, so that they are hit partway into the development cycle
instead of right at the start.
In many cases, the #warning has been removed (to prevent build errors)
and instead the code is just disabled outright. The dead code can be
seen easily and removed in the future with less interruption to the
development process.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: Iae310f66557be5e250c79c216e002c6c5165b09e
Reviewed-on: http://review.whamcloud.com/8630
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
Liang Zhen [Tue, 19 Aug 2014 02:55:04 +0000 (10:55 +0800)]
LU-5548 ptlrpc: avoid list scan in ptlrpcd_check
ptlrpcd_check() always scans all requests on the ptlrpc_request_set
and tries to finish completed requests, which is inefficient.
Even worse, l_wait_event() always checks the condition twice
before sleeping and one more time after waking up, which means
it calls ptlrpcd_check() three times in each loop.
This patch moves completed requests to the head of the list
in ptlrpc_check_set(); with this change ptlrpcd_check() doesn't
need to scan all requests anymore.
Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: I84ef477717b53be9e508a596594f176c9de476bb
Reviewed-on: http://review.whamcloud.com/11513
Tested-by: Maloo <hpdd-maloo@intel.com>
Tested-by: Jenkins
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
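The idea in the LU-5548 message above, keeping completed requests at the head so the checker can stop early, can be sketched like this. It is a toy Python model with hypothetical names (Request, check_set, reap_completed) and says nothing about the real ptlrpc data structures or locking.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Request:
    name: str
    done: bool = False


def check_set(requests: Deque[Request]) -> None:
    """Full pass that moves completed requests to the head of the list."""
    completed = [r for r in requests if r.done]
    pending = [r for r in requests if not r.done]
    requests.clear()
    requests.extend(completed + pending)


def reap_completed(requests: Deque[Request]) -> List[str]:
    """Pop completed requests from the head; stop at the first pending one."""
    finished = []
    while requests and requests[0].done:
        finished.append(requests.popleft().name)
    return finished
```

Once check_set() has run, reap_completed() touches only the completed entries plus at most one pending entry, however long the list is.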
Jian Yu [Fri, 1 Aug 2014 08:41:17 +0000 (01:41 -0700)]
LU-5443 lnet: replace direct HZ access with kernel APIs
On some customers’ systems, the kernel was compiled with HZ defined as
100 instead of 1000. This improves performance for HPC applications.
However, to use these systems with Lustre, customers have to rebuild
Lustre for the kernel because Lustre directly uses the defined
constant HZ.
Since kernel 2.6.21, some non-HZ-dependent timing APIs have become
non-inline functions, which can be used in Lustre code to replace the
direct HZ access.
These kernel APIs include:
jiffies_to_msecs()
jiffies_to_usecs()
jiffies_to_timespec()
msecs_to_jiffies()
usecs_to_jiffies()
timespec_to_jiffies()
And here are some samples of the replacement:
HZ -> msecs_to_jiffies(MSEC_PER_SEC)
n * HZ -> msecs_to_jiffies(n * MSEC_PER_SEC)
HZ / n -> msecs_to_jiffies(MSEC_PER_SEC / n)
n / HZ -> jiffies_to_msecs(n) / MSEC_PER_SEC
n / HZ * 1000 -> jiffies_to_msecs(n)
This patch replaces the direct HZ access in lnet module.
Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I0be6c82636df08b0a0a763ea31dafa817c077fe1
Reviewed-on: http://review.whamcloud.com/11303
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
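The LU-5443 replacement rules above can be checked with a small model of the two conversions. This is a Python sketch, not the kernel implementation: it takes the tick rate as an explicit hz argument, only covers tick rates that divide 1000 evenly, and rounds milliseconds up to a whole tick.

```python
MSEC_PER_SEC = 1000


def _msecs_per_jiffy(hz: int) -> int:
    if hz <= 0 or MSEC_PER_SEC % hz != 0:
        raise ValueError("this model only covers tick rates that divide 1000")
    return MSEC_PER_SEC // hz


def msecs_to_jiffies(msecs: int, hz: int) -> int:
    """Milliseconds to ticks, rounding up to a whole tick."""
    step = _msecs_per_jiffy(hz)
    return (msecs + step - 1) // step


def jiffies_to_msecs(jiffies: int, hz: int) -> int:
    """Ticks to milliseconds."""
    return jiffies * _msecs_per_jiffy(hz)
```

In this model msecs_to_jiffies(MSEC_PER_SEC, hz) equals hz for 100, 250 and 1000 alike, which is why the same Lustre binary can serve kernels built with different tick rates. Because of the rounding up, a replacement such as HZ / n can differ by one tick when the division is not exact: with hz=100, 100 // 3 is 33 while msecs_to_jiffies(1000 // 3, 100) is 34.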
Amir Shehata [Thu, 10 Oct 2013 22:27:11 +0000 (15:27 -0700)]
LU-2456 lnet: Dynamic LNet Configuration (DLC) show command
This is the fifth patch of a set of patches that enables DLC.
This patch adds the new structures which will be used
in the IOCTL communication. It also adds a set of
show operations to show buffers, networks, statistics
and peer information.
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I96e5cb3dcf07289c6cd1deb46f4acb3c263ae21e
Reviewed-on: http://review.whamcloud.com/8022
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Amir Shehata [Wed, 9 Oct 2013 18:30:04 +0000 (11:30 -0700)]
LU-2456 lnet: Dynamic LNet Configuration (DLC) IOCTL changes
This is the fourth patch of a set of patches that enables DLC.
This patch changes the IOCTL infrastructure in preparation of
adding extra IOCTL communication between user and kernel space.
The changes include:
- adding a common header to be passed to ioctl infra functions
instead of passing an exact structure. This header is meant
to be included in all structures to be passed through that
interface. The IOCTL handler casts this header to a particular
type that it expects
- All sanity testing on the passed-in structure is performed in the
generic ioctl infrastructure code.
- All ioctl handlers changed to take the header instead of a
particular structure type
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I144706a14293637cd5f381d2c020faa0e9c21f6b
Reviewed-on: http://review.whamcloud.com/8021
Tested-by: Jenkins
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
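The common-header pattern from the LU-2456 IOCTL message above can be sketched as follows. This is an illustrative Python model, not the LNet ABI: the header fields (length, version), the SHOW_NET layout and every function name are made up for the example. The point is only that the generic code validates the shared header once, and each handler then reinterprets the payload as the type it expects.

```python
import struct
from typing import Callable

# Hypothetical layout: every ioctl payload starts with the same header.
HEADER = struct.Struct("<II")     # payload length in bytes, interface version
SHOW_NET = struct.Struct("<III")  # header fields plus one made-up field, net_id
EXPECTED_VERSION = 1


def check_header(buf: bytes) -> None:
    """Generic sanity checks, done once before any handler runs."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than header")
    length, version = HEADER.unpack_from(buf)
    if version != EXPECTED_VERSION:
        raise ValueError("unexpected version")
    if length != len(buf):
        raise ValueError("length field does not match buffer")


def handle_show_net(buf: bytes) -> int:
    """Handler: reinterpret the generic payload as the type it expects."""
    if len(buf) != SHOW_NET.size:
        raise ValueError("wrong payload size for this handler")
    _length, _version, net_id = SHOW_NET.unpack(buf)
    return net_id


def dispatch(buf: bytes, handler: Callable[[bytes], int]) -> int:
    """Generic entry point: validate the common header, then call the handler."""
    check_header(buf)
    return handler(buf)
```

A payload whose length field disagrees with the buffer is rejected in dispatch() before handle_show_net() ever sees it.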
Yang Sheng [Mon, 25 Aug 2014 03:58:57 +0000 (11:58 +0800)]
LU-4416 osd: using set_nlink to calm down WARN_ON message
Invoking inc_nlink with nlink=0 triggers a WARN_ON
message. The I_LINKABLE flag avoids this problem in the
latest upstream kernel. But there are kernels where the
WARN_ON is triggered and I_LINKABLE is not defined;
RHEL7 is one of them. In that case we just use set_nlink
for the nlink=0 case.
Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I1e110872b359ebd7fa883506c5592f072f26c1dc
Reviewed-on: http://review.whamcloud.com/11571
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Mon, 18 Aug 2014 16:03:14 +0000 (11:03 -0500)]
LU-2675 lustre: remove linux/lustre_acl.h
Remove lustre/include/linux/lustre_acl.h. Include linux/xattr.h where
needed.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I1faec3acb8ff0b578a2f962a4601a44f17d65c41
Reviewed-on: http://review.whamcloud.com/11490
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Mon, 18 Aug 2014 15:48:47 +0000 (10:48 -0500)]
LU-2675 lustre: remove linux/lprocfs_status.h
Remove lustre/include/linux/lprocfs_status.h. Include linux/statfs.h
where needed.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I4f5965fee6b82187a0f5f548843a2643b1438bc1
Reviewed-on: http://review.whamcloud.com/11489
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Li Wei [Fri, 25 Jul 2014 16:14:51 +0000 (00:14 +0800)]
LU-1279 kernel: Fix concurrent module loading deadlocks
Concurrently starting multiple OSTs on a single OSS frequently
triggers 30s deadlocks on module_mutex. This RHEL 6 kernel bug
applies to any module that results in additional request_module()
calls in its init callback. In Lustre, at least ptlrpc and libcfs are
affected. (RHEL 7 should have enough fixes in this area, but testing
needs to be done.) This patch adds a fix adapted from a number of
upstream commits to the RHEL 6 kernel patch series.
Change-Id: Ibdd384fd7622a0b4fcbf4cb5fdb864de87fcc25e
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/11229
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
John L. Hammond [Mon, 25 Aug 2014 16:44:48 +0000 (11:44 -0500)]
LU-5502 ofd: add a high level description of the OFD layer
Add a high-level description of the OFD module to
lustre/ofd/ofd_dev.c.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Id46a2d24ff2e2834e80e9d0f378ef2b515268bb3
Reviewed-on: http://review.whamcloud.com/11576
Tested-by: Jenkins
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Oleg Drokin [Thu, 28 Aug 2014 02:50:53 +0000 (22:50 -0400)]
LU-5424 tests: Disable sanity-sec test 4
With patch from LU-4367 the somewhat incorrect assumption
of the test about availability of dentry is broken and
the test always fails.
Disable the test for now until it's fixed one way or another.
Change-Id: I0759f4ab7022bc072bb2480eaba0b19ff0f5101d
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/11631
Tested-by: Jenkins
Minh Diep [Wed, 13 Aug 2014 16:32:32 +0000 (09:32 -0700)]
LU-5482 build: lustre-tests depends on attr, rsync, lsof, perl
Tests use get/setfattr and rsync, but the package never requires
the attr, rsync, lsof, and perl packages to be installed.
Test-Parameters: clientdistro=el7
Signed-off-by: Minh Diep <minh.diep@intel.com>
Change-Id: Id6ec7f88e6f9e71f6f1c97f2bd6a40a70f62f53c
Reviewed-on: http://review.whamcloud.com/11433
Tested-by: Jenkins
Reviewed-by: Brian J. Murrell <brian.murrell@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Fri, 15 Aug 2014 14:45:09 +0000 (09:45 -0500)]
LU-2675 build: assume __linux__ and __KERNEL__
Assume that __linux__ is defined everywhere and that __KERNEL__ is
defined in most of lustre/.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ia04e7ed4c3ab3e8ca205e14eaa1824536aedd1e3
Reviewed-on: http://review.whamcloud.com/11437
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Tue, 22 Jul 2014 20:33:25 +0000 (15:33 -0500)]
LU-5396: use gfp_t for gfp mask, instead of (unsigned) int
This fixes sparse warnings, such as:
.../echo.c:670:62: warning: incorrect type in initializer
(different base types)
.../echo.c:670:62: expected int [signed] gfp_mask
.../echo.c:670:62: got restricted gfp_t
gfp_t was introduced in 2.6.14 (end of 2005), so that should
work on all recent distributions.
Besides using gfp_t, there is no code change.
Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I3464fb21c47c174c3a04cb512ce05e0e9de146fb
Reviewed-on: http://review.whamcloud.com/11200
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Fri, 1 Aug 2014 14:59:20 +0000 (09:59 -0500)]
LU-5396: obd: make local functions static
This decreases the code by 150 bytes.
Change-Id: I517c53fe65d0b80f509b903a704eac8ef9f1130a
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11305
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
frank zago [Sun, 13 Jul 2014 17:06:02 +0000 (12:06 -0500)]
LU-5396: add sparse locking annotations
Adds __acquires / __releases / __must_hold sparse locking annotations to
several functions.
Fixes sparse warnings such as:
libcfs/libcfs/hash.c:127:1: warning: context imbalance in 'cfs_hash_spin_lock'
- wrong count at exit
libcfs/libcfs/hash.c:133:1: warning: context imbalance in 'cfs_hash_spin_unlock'
- unexpected unlock
libcfs/libcfs/hash.c:141:9: warning: context imbalance in 'cfs_hash_rw_lock'
- wrong count at exit
include/linux/rwlock_api_smp.h:221:9: warning: context imbalance in
'cfs_hash_rw_unlock' - unexpected unlock
Change-Id: I91834ea62a2bc21ee853a80b0b266e3d0e960bc3
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11295
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Fri, 15 Aug 2014 19:42:41 +0000 (14:42 -0500)]
LU-5396: define __acquires, __releases and __must_hold for userspace
Some functions that are shared between userspace and the kernel have sparse
annotations. So we need to also define these annotations in userspace.
Change-Id: Ie1e55416554d3295ed61fe26615d5241faa53818
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11481
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
frank zago [Wed, 30 Jul 2014 03:11:04 +0000 (22:11 -0500)]
LU-5396: define __must_hold() for older kernels
Backport of sparse macro __must_hold(), introduced in linux kernel
commit 8529091e:
linux/compiler.h has macros to denote functions that acquire or release
locks, but not to denote functions called with a lock held that return
with the lock still held. Add a __must_hold macro to cover that case.
Change-Id: Ic77304a5f78f1681cfc48a728a12759366bb2cb8
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11294
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Mon, 28 Jul 2014 21:17:46 +0000 (16:17 -0500)]
LU-5396: o2ib: make local functions static
This reduces the code size by about 1KiB.
Change-Id: Ib01fe7b2b47f3add55a449473b304c8030758865
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11256
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Wed, 25 Jun 2014 16:40:41 +0000 (11:40 -0500)]
LU-5396: o2ib: make local functions static
This fixes sparse warnings such as:
.../o2iblnd.c:424:1: warning: symbol 'kiblnd_get_peer_info' was not declared.
Should it be static?
This reduces the code size by 400 bytes.
The body of "the_o2iblnd" was moved to the end of the file, to avoid
having to declare some static prototypes.
Change-Id: I160a2cda3db1a581e0a961e0368d8ee6fd2781fd
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11255
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Wed, 23 Jul 2014 20:53:07 +0000 (15:53 -0500)]
LU-5396: add sparse annotation __user wherever needed
This fixes sparse warnings such as:
.../api-ni.c:1639:33: warning: incorrect type in argument 3
(different address spaces)
.../api-ni.c:1639:33: expected struct lnet_process_id_t
[noderef] [usertype] <asn:1>*ids
.../api-ni.c:1639:33: got struct lnet_process_id_t
[usertype] *<noident>
There is no code change.
Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I683ce17935ce0cc76ce4b450bc3750b7bca4d9a8
Reviewed-on: http://review.whamcloud.com/11202
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Fri, 15 Aug 2014 19:06:34 +0000 (14:06 -0500)]
LU-5396: define __user for userspace
Structures used by ioctl are shared with the kernel. Some of these
structures will have the __user annotation for sparse, and thus __user
needs to be defined in user space.
Change-Id: I2a4087a2faea0849cc816515c4aa9871699acf55
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11480
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Tue, 5 Aug 2014 16:10:10 +0000 (11:10 -0500)]
LU-5452 lmv: release request in lmv_revalidate_slaves()
In lmv_revalidate_slaves() ensure that the request returned by
md_intent_lock() is properly released on all paths.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I35d63b248d0c80261ebc97f43722b0c547eb1aac
Reviewed-on: http://review.whamcloud.com/11326
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Sat, 19 Jul 2014 14:21:10 +0000 (09:21 -0500)]
LU-5385: HSM: do not call the JSON log function if no log is open
llapi_hsm_log_ct_registration() and llapi_hsm_log_ct_progress() are
very expensive (fid2path+allocations). Do not let them do anything
if llapi_hsm_write_json_event() is going to discard the JSON record.
Make llapi_hsm_log_ct_registration() and llapi_hsm_log_ct_progress()
static too.
Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ib10023968a1bd021694bca6338c0a962f58da19a
Reviewed-on: http://review.whamcloud.com/11164
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: jacques-Charles Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
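The LU-5385 change above follows a common pattern: test the cheap condition before doing the expensive work. A minimal Python sketch, with a hypothetical EventLog class and a caller-supplied resolve_path callback standing in for the costly fid2path lookup (neither is part of the llapi interface):

```python
import json
from typing import Callable, Optional, TextIO


class EventLog:
    """Hypothetical JSON event log; stream is None when no log is open."""

    def __init__(self, stream: Optional[TextIO] = None) -> None:
        self.stream = stream

    def log_progress(self, fid: str, resolve_path: Callable[[str], str]) -> bool:
        """Write one progress record; return False if nothing was written."""
        if self.stream is None:
            # The record would be discarded, so skip the costly path lookup.
            return False
        record = {"fid": fid, "path": resolve_path(fid)}
        self.stream.write(json.dumps(record) + "\n")
        return True
```

With no stream attached, resolve_path is never called, so the lookup and the allocations behind it are avoided entirely.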
Mikhail Pershin [Sat, 19 Jul 2014 10:32:46 +0000 (14:32 +0400)]
LU-4975 ofd: documenting lprocfs_ofd.c
Fix up GPL header block to reference proper GPLv2 license URL.
Remove mention of contacting Sun, since they don't exist anymore.
Add introductory comment block for the lprocfs_ofd.c file and add
function comment blocks to all functions in it.
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I16d7e0d7be72339d91ddd7a50e5ac233397a5e45
Reviewed-on: http://review.whamcloud.com/11149
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Vitaly Fertman [Fri, 15 Aug 2014 11:09:39 +0000 (15:09 +0400)]
LU-5496 ldlm: granting the same lock twice on recovery
The previous fix was not correct: check for resend before removing
the lock from the resource, otherwise conflicting locks can be granted
in parallel. Also, some minor cleanups.
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Change-Id: I461608878d40d6bba4e23179a7379de835d526c3
Reviewed-by: Andriy Skulysh <andriy_skulysh@xyratex.com>
Reviewed-by: Alexander Boyko <alexander_boyko@xyratex.com>
Tested-by: Alexander Lezhoev <alexander_lezhoev@xyratex.com>
Xyratex-bug-id: MRP-1944
Reviewed-on: http://review.whamcloud.com/11469
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
John L. Hammond [Fri, 22 Aug 2014 19:00:56 +0000 (14:00 -0500)]
LU-5502 osp: add a high-level description of the OSP module
Add a high-level description of the OSP module to
lustre/osp/osp_dev.c.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I4dfcc3629032377aa7fa726b43fedb24680e5829
Reviewed-on: http://review.whamcloud.com/11562
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Richard Henwood <richard.henwood@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Isaac Huang [Tue, 22 Jul 2014 22:42:03 +0000 (16:42 -0600)]
LU-5391 osd-zfs: ZAP object block sizes too small
Currently osd-zfs ZAP objects use 4K for both leaf
and indirect blocks. This patch increases:
- leaf block to 16K, which equals ZFS fzap_default_block_shift
- indirect block to 16K, the default used by ZPL directories
Signed-off-by: Isaac Huang <he.huang@intel.com>
Change-Id: I5b476414d27822a14afb25e1307991fbd2e3a59e
Reviewed-on: http://review.whamcloud.com/11182
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
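The sizes in the LU-5391 message above are powers of two, usually written as block shifts. The Python arithmetic below shows what the change from 4K to 16K means for an indirect block. The 128-byte block pointer size is an assumption of this sketch, and both helper names are hypothetical.

```python
BLKPTR_SIZE = 128  # assumption: size of one ZFS block pointer in bytes


def block_size(shift: int) -> int:
    """Block size in bytes for a given block shift."""
    return 1 << shift


def pointers_per_indirect_block(shift: int) -> int:
    """How many block pointers fit in one indirect block of this size."""
    return block_size(shift) // BLKPTR_SIZE
```

A shift of 12 gives 4096 bytes and a shift of 14 gives 16384 bytes, so under the stated pointer size each indirect block goes from holding 32 pointers to holding 128.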
John L. Hammond [Wed, 23 Jul 2014 23:38:53 +0000 (18:38 -0500)]
LU-5401 mdt: handle getattr errors in mdt_reint_open()
In mdt_reint_open(), if mdt_attr_get_complex() fails, then bail out and
return an error.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I67c773b93100064e4cfdc82b4356424fd102c925
Reviewed-on: http://review.whamcloud.com/11210
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Wang Di [Wed, 13 Aug 2014 19:34:01 +0000 (12:34 -0700)]
LU-5312 ptlrpc: update export_last_request_time after recovery
Update export_last_request_time after recovery. Clients cannot
send pings to the server during recovery, so we need to refresh
last_request_time to avoid evicting the export earlier
than it should be.
Change-Id: I1d9bf5aaa3ca4215479ddc942c7052ad047c6989
Signed-off-by: Wang Di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/11443
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Wang Shilong [Wed, 6 Aug 2014 06:16:42 +0000 (14:16 +0800)]
LU-5455 ptlrpc: fix magic return value of ptlrpc_init_portals
Previously, running 'modprobe lustre' hit the following
error message when network initialisation failed:
modprobe: ERROR: could not insert 'lustre': Input/output error
However, the real error code is available, so just return it to the
caller. After this patch, the error message will be something like:
modprobe: ERROR: could not insert 'lustre': Network is down
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ib0c893052b4cd4f2e9b427fb4ce91da356065351
Reviewed-on: http://review.whamcloud.com/11337
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Wei Liu [Tue, 29 Jul 2014 04:32:30 +0000 (21:32 -0700)]
LU-5339 tests: Un-duplicate test_24 and skip if MDS is 2.5.2 or older
Un-duplicate test_24 and disable the test if MDS version
is 2.5.2 or older.
Change-Id: I3564c44a21bf149baf09294d42edd1529179736a
Signed-off-by: Wei Liu <wei3.liu@intel.com>
Reviewed-on: http://review.whamcloud.com/11262
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Simmons [Mon, 18 Aug 2014 14:37:13 +0000 (10:37 -0400)]
LU-3963 libcfs: convert lod, mdt, and gss to linux list api
Move from the cfs_[h]list api to the native linux api for
the lod, mdt, and gss part of the ptlrpc layers.
Change-Id: Ieff231f3220a850521713a6f1c997b7e09130a4c
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/10387
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: frank zago <fzago@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Simmons [Fri, 15 Aug 2014 16:32:35 +0000 (12:32 -0400)]
LU-3963 libcfs: remove invalid cfs_hlist_for_each_* checks
For newer kernels the number of arguments taken by
cfs_hlist_for_each_entry* changed, so in this case Lustre still
needs wrappers. The HPDD script that tests for decrepit wrappers
incorrectly complains about cfs_hlist_for_* functions
that are required. Also add some wrappers that were previously
missing to deal with the kernel API change.
Change-Id: Idda94c147f2e3e9c138ee50c242604494041c244
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/11005
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Dmitry Eremin [Mon, 12 May 2014 15:06:12 +0000 (19:06 +0400)]
LU-4808 tests: fix check for sanity tests prerequisites
Check of prerequisites for tests that manipulate Lustre servers.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I17d0c90356a62976e1b5478588925862b588d7a2
Reviewed-on: http://review.whamcloud.com/10295
Tested-by: Jenkins
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Ned Bass [Mon, 7 Jul 2014 21:51:37 +0000 (17:51 -0400)]
LU-2182 llapi: implementation of new llapi_layout API
Add a new set of llapi routines for interacting with file layouts that
hide the details of the wire-protocol data structures from the user.
Define an opaque struct llapi_layout to abstract the layout of a
lustre file and generic accessor functions to read and write it. The
following documented functions are added to the liblustreapi public
interface with accompanying man pages:
llapi_layout_alloc - allocate a new layout
llapi_layout_free - free memory allocated for a layout
llapi_layout_file_create - create new file with given layout
llapi_layout_file_open - open or create a file with given layout
llapi_layout_get_by_path - get file layout given a path
llapi_layout_get_by_fd - get file layout given a file descriptor
llapi_layout_get_by_fid - get file layout given a Lustre FID
llapi_layout_ost_index_get - get OST index associated with a stripe
llapi_layout_ost_index_set - set OST index associated with a stripe
llapi_layout_pattern_get - get RAID pattern of a layout
llapi_layout_pattern_set - set RAID pattern of a layout
llapi_layout_pool_name_get - get pool name of a layout
llapi_layout_pool_name_set - set pool name of a layout
llapi_layout_stripe_count_get - get stripe count of a layout
llapi_layout_stripe_count_set - set stripe count of a layout
llapi_layout_stripe_size_get - get stripe size of a layout
llapi_layout_stripe_size_set - set stripe size of a layout
The layouts are read and written using fgetxattr() and fsetxattr()
instead of ioctl() to make it easier for architectures like Blue Gene
to function-ship the system calls.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I35fb51055b6438ef3090f43c28a4083a66eaa907
Reviewed-on: http://review.whamcloud.com/5302
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Mon, 11 Aug 2014 23:24:07 +0000 (18:24 -0500)]
LU-5195: hsm: delete HSM records not found on disk
After an MDS crash, the file containing an HSM record may no longer
be present. When the MDS restarts, we get traces like this:
(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000:
failed to process HSM_ACTIONS llog (rc=-2)
(mdt_hsm_cdt_actions.c:104:cdt_llog_process())
Skipped 600 previous similar messages
(llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000:
error opening log id 0x1c:1:0: rc = -2
(llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages
(llog_cat.c:556:llog_cat_process_cb()) tas01-MDD0000:
cannot find handle for llog 0x1c:1: -2
(llog_cat.c:556:llog_cat_process_cb()) Skipped 600 previous similar messages
(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000:
failed to process HSM_ACTIONS llog (rc=-2)
(mdt_hsm_cdt_actions.c:104:cdt_llog_process())
Skipped 600 previous similar messages
(llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000:
error opening log id 0x1c:1:0: rc = -2
(llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages
No HSM operation can happen, and the only way to clean up is to
unmount the MDT and delete hsm_actions.
If the record can't be found, let the MDS delete it instead.
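The behaviour described above can be sketched like this. It is a simplified model, not the llog or HSM coordinator code: demo_record, demo_open_log() and demo_process_catalog() are hypothetical names, and the catalog is modelled as a plain array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical catalog entry; on_disk says whether its backing log exists. */
struct demo_record {
	int id;
	int on_disk;
};

/* Hypothetical open step: fails with -ENOENT when the log is missing. */
static int demo_open_log(const struct demo_record *rec)
{
	return rec->on_disk ? 0 : -ENOENT;
}

/*
 * Walk the catalog, dropping entries whose log is gone instead of
 * aborting. Returns the number of entries kept, or a negative errno
 * for any other failure.
 */
static int demo_process_catalog(struct demo_record *recs, size_t count)
{
	size_t kept = 0;

	assert(recs != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		int rc = demo_open_log(&recs[i]);

		if (rc == -ENOENT)
			continue;	/* stale entry: delete it and move on */
		if (rc < 0)
			return rc;	/* unexpected error: still fatal */
		recs[kept++] = recs[i];
	}
	return (int)kept;
}
```

The design point is that only -ENOENT (rc = -2 in the traces) is treated as "record is stale, remove it"; any other error still stops processing.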
Change-Id: I61f625d4c18750c8044909ff56d53042cf0b6d86
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11419
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ryan Haasken <haasken@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Fan Yong [Tue, 1 Jul 2014 04:22:22 +0000 (12:22 +0800)]
LU-5466 lfsck: typo in lfsck_del_target
To handle MDT or OST target differently.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I8ede89935ca89774261efa466f753f78a09a50c2
Reviewed-on: http://review.whamcloud.com/11407
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Fan Yong [Mon, 23 Jun 2014 22:02:23 +0000 (06:02 +0800)]
LU-5075 test: keep LFSCK fail_loc until recovery completed
In sanity-lfsck test_8, the old script might overwrite the fail_loc
before recovery completed. That causes the paused LFSCK to
be restarted automatically, so the subsequent manual command for
starting the LFSCK gets -EALREADY, which is unexpected.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I8500ecdccf354e83ec70be8dea88943cafa47d81
Reviewed-on: http://review.whamcloud.com/11288
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Fan Yong [Mon, 23 Jun 2014 22:01:52 +0000 (06:01 +0800)]
LU-4970 lfsck: flush async updating before exit
Otherwise, the test scripts may race and observe some internal status.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I4af07cf91f6b6c77d5cab67fc0df21b27174ee4c
Reviewed-on: http://review.whamcloud.com/11276
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Fan Yong [Sun, 22 Jun 2014 06:34:04 +0000 (14:34 +0800)]
LU-5208 tests: inject failure on the proper OST
The old test scripts assumed that a file created by MDS2 would
be striped to OST2 and OST1, but that assumption is wrong. When
the OSTs reside on different OSS nodes, sanity-lfsck test_18c
fails.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I755370e6a70384b5a70e046b1acc806e63817f6e
Reviewed-on: http://review.whamcloud.com/11275
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Niu Yawei [Wed, 14 May 2014 09:16:38 +0000 (05:16 -0400)]
LU-4976 osp: add comments for lwp_dev.c functions
Add introductory comment block for the lwp_dev.c file.
Add function comment blocks to all functions in the lwp_dev.c file.
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I8a672982964a742d56b1d64e7b8cceb31aa64a48
Reviewed-on: http://review.whamcloud.com/10335
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Richard Henwood <richard.henwood@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Thu, 21 Aug 2014 21:24:53 +0000 (16:24 -0500)]
LU-5502 lod: add a high-level description of the LOD layer
Add a high-level description of the LOD layer to lustre/lod/lod_dev.c.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I903394e96a2792683351d741614e2d326e77879d
Reviewed-on: http://review.whamcloud.com/11551
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Richard Henwood <richard.henwood@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
James Simmons [Tue, 19 Aug 2014 14:00:03 +0000 (10:00 -0400)]
LU-3963 ldlm: convert to linux list api
Move from the cfs_[h]list api to the native linux api for
all the code related to ldlm.
Change-Id: Ibedd3870c530318dc4cba6a27dfb7005e1961ece
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Reviewed-on: http://review.whamcloud.com/10945
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Dmitry Eremin [Fri, 8 Aug 2014 12:42:50 +0000 (16:42 +0400)]
LU-5417 lfs: fix comparison between signed and unsigned
The expression if (size != (ssize_t)size) is always false.
Therefore, bounds check errors are never detected.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ib8fc68b2ddbb50ad50d80aa391c9bed6308ea575
Reviewed-on: http://review.whamcloud.com/11376
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Bruno Faccini [Fri, 18 Jul 2014 11:34:33 +0000 (13:34 +0200)]
LU-5299 obdclass: avoid race during Server device start
Handle concurrent starts of the same device (multiple mounts, ...),
but still allow the separate nosvc and nomgs cases.
Also add a specific test of concurrent MDT/OST start with an
artificial delay to verify this.
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I442819a5b865ed3e98477f9d2602efc4d09d7860
Reviewed-on: http://review.whamcloud.com/11139
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Bruno Faccini [Thu, 26 Jun 2014 09:03:52 +0000 (11:03 +0200)]
LU-5042 ldlm: delay filling resource's LVB upon replay
This patch attempts to delay the unnecessary filling and resending of
a resource's LVB upon replay after a server reboot.
This should keep recovery from taking a very long time when
replaying a huge number of locks because all of the associated LVBs
are being read from disk. Now a resource's LVB is only read when it
needs to be sent to a new client.
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I20bd20bce328953c46accb4b41dcba776f3608a6
Reviewed-on: http://review.whamcloud.com/10845
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Frank Zago [Thu, 24 Jul 2014 16:27:22 +0000 (11:27 -0500)]
LU-5406: liblustre: remove \n from some llapi_error strings
Some strings passed to llapi_error() contain a \n. But llapi_error's
callback, error_callback_default(), adds one too. So fix these strings.
Converted a few spaces to tabs.
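The doubled newline can be illustrated with a small stand-in. demo_error_callback() below is hypothetical and only mimics the one property described above, that the callback appends its own newline; it is not the liblustreapi implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a default error callback: it formats the
 * message into `out` and always terminates it with a newline.
 */
static int demo_error_callback(char *out, size_t outlen, const char *msg)
{
	assert(out != NULL && msg != NULL);
	return snprintf(out, outlen, "%s\n", msg);
}
```

Passing "cannot open file\n" produces "cannot open file\n\n", which shows up as a blank line in the output; passing "cannot open file" produces a single, correctly terminated line.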
Change-Id: I0e639bd52813ac672c677e7c10713f211ea0888a
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11214
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Liang Zhen [Tue, 12 Aug 2014 14:36:42 +0000 (22:36 +0800)]
LU-5428 libcfs: false alarm of libcfs watchdog
lc_watchdog_disable does not delete the kernel timer. The benefit is
that there is no del_timer overhead in each service loop. However,
there is a corner case where lc_watchdog_touch can race with lcw_cb
when the service thread is actually idle:
1. The service thread arms a timer before processing a request. After
handling the request, it calls lc_watchdog_disable() and marks
lc_watchdog as disabled, but the kernel timer is still outstanding.
2. The service thread sleeps for many seconds because there is no
request to handle.
3. It is woken up by an incoming request.
4. It calls lc_watchdog_touch and sets the watchdog status to enabled.
Because the timer is still alive, if it expires right
before cfs_timer_arm(), lcw_cb marks the previously disabled
watchdog as "expired", which is wrong.
5. The next call to lc_watchdog_disable complains about an expired
watchdog.
There are two options to fix this issue:
1. Always call del_timer in lc_watchdog_disable. However, this may
increase system overhead when service threads are busy.
2. Set the watchdog to "enabled" after cfs_timer_arm, so there is no
window in which lcw_cb can see a false watchdog status.
This patch chooses the second approach to fix the problem.
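The ordering issue can be sketched with a single-threaded model. This is not the libcfs code: the demo_* types and functions are hypothetical, and the race is simulated by explicitly "firing" the stale timer at the point where the real callback could run.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state { DEMO_DISABLED, DEMO_ENABLED, DEMO_EXPIRED };

/* Hypothetical, simplified watchdog: a state plus the timer deadline. */
struct demo_watchdog {
	enum demo_state state;
	long expires;		/* absolute time at which the timer fires */
};

/* Timer callback: a disabled watchdog is ignored, any other expires. */
static void demo_timer_cb(struct demo_watchdog *w)
{
	if (w->state == DEMO_DISABLED)
		return;
	w->state = DEMO_EXPIRED;
}

/* Model the kernel running the callback if the deadline has passed. */
static void demo_fire_if_due(struct demo_watchdog *w, long now)
{
	if (now >= w->expires)
		demo_timer_cb(w);
}

/* Old ordering: enable first, arm second. A stale timer can fire between. */
static void demo_touch_old(struct demo_watchdog *w, long now, long timeout)
{
	assert(w != NULL && timeout > 0);
	w->state = DEMO_ENABLED;
	demo_fire_if_due(w, now);	/* stale timer fires inside the window */
	w->expires = now + timeout;
}

/* New ordering: arm first, enable second. The stale timer is harmless. */
static void demo_touch_new(struct demo_watchdog *w, long now, long timeout)
{
	assert(w != NULL && timeout > 0);
	demo_fire_if_due(w, now);	/* fires early: state is still disabled */
	w->expires = now + timeout;
	demo_fire_if_due(w, now);	/* re-armed: no longer due */
	w->state = DEMO_ENABLED;
}
```

Starting from a disabled watchdog whose old deadline has already passed, demo_touch_old() ends in DEMO_EXPIRED (the false alarm), while demo_touch_new() ends in DEMO_ENABLED with a fresh deadline.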
Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: Iac24082e2d63de8330285cf243ed585da6524ab9
Reviewed-on: http://review.whamcloud.com/11415
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Dmitry Eremin [Mon, 4 Aug 2014 08:41:14 +0000 (12:41 +0400)]
LU-5417 obdclass: fix comparison between signed and unsigned
Make lu_buf->lb_len unsigned.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: I769b0a21aabb5466096e5f3e1aba4e96bcf64a6b
Reviewed-on: http://review.whamcloud.com/11281
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>