Whamcloud - gitweb
grev [Sat, 6 Dec 2008 00:56:52 +0000 (00:56 +0000)]
b=17661
i=Brian
run mpi tests as MPI_USER
yury [Fri, 5 Dec 2008 11:43:13 +0000 (11:43 +0000)]
b=17758
r=shadow,johann
- Do NOT call server_deregister_mount() here. This leads to an
inability to clean up cleanly and free lsi and other stuff when
mgs calls server_put_mount() in the error handling case.
adilger [Thu, 4 Dec 2008 18:43:06 +0000 (18:43 +0000)]
Branch b1_6
Changing build version.
grev [Thu, 4 Dec 2008 17:48:11 +0000 (17:48 +0000)]
b=17747
i=Nathan
run_one: print PASS/FAIL depending on new TEST_FAILED var
vitaly [Wed, 3 Dec 2008 18:44:59 +0000 (18:44 +0000)]
Branch b1_6
b=17644
i=green
i=adilger
send 1 extra rpc in flight if this is a high priority request
grev [Wed, 3 Dec 2008 18:37:17 +0000 (18:37 +0000)]
b=17853
i=Adilger
check_config fix for NETTYPE=ptl
vitaly [Wed, 3 Dec 2008 17:18:08 +0000 (17:18 +0000)]
Branch b1_6
b=17748
i=grev
i=adilger
the sanityN test issue is fixed
anserper [Wed, 3 Dec 2008 16:55:24 +0000 (16:55 +0000)]
Branch b1_6
b=17770
i=Elena Gryaznova
move the cleanup/setup test to the end of the series
anserper [Wed, 3 Dec 2008 16:55:22 +0000 (16:55 +0000)]
Branch b1_6
i=Elena Gryaznova
avoid using quota_usr/quota_2usr groups
anserper [Wed, 3 Dec 2008 16:33:25 +0000 (16:33 +0000)]
Branch b1_6
b=17371
i=Andreas Dilger
move the cleanup/setup test to the end of the series
zhanghc [Wed, 3 Dec 2008 15:50:10 +0000 (15:50 +0000)]
b=16432
fix mgs_setparam, which will return -EINVAL when param
related to llite(PARAM_LLITE) is set by MDT or OST
i=johann
i=nathan.rutman
grev [Wed, 3 Dec 2008 15:48:19 +0000 (15:48 +0000)]
b=13584
i=Scjody
test_99a fix: use $TMP as working dir
zhanghc [Wed, 3 Dec 2008 15:39:17 +0000 (15:39 +0000)]
b=16432
fix mgs_setparam, which will return -EINVAL when param
related to llite is set by MDT or OST
i=johann
i=nathan.rutman
shadow [Wed, 3 Dec 2008 07:26:58 +0000 (07:26 +0000)]
shrink LOV EAs before replying
Branch b1_6
b=16693
i=shadow
i=johann
shadow [Wed, 3 Dec 2008 04:33:49 +0000 (04:33 +0000)]
don't resend llog cancels,
fix resend requests for ldlm imports.
Branch b1_6
b=17695
i=umka
i=tappro
yury [Tue, 2 Dec 2008 12:20:22 +0000 (12:20 +0000)]
b=17813
r=shadow
- take into account the limit on the host, because the higher the limit, the longer it takes to kill some locks.
grev [Mon, 1 Dec 2008 21:27:37 +0000 (21:27 +0000)]
b=16897
i=Adilger
i=Tappro
test_70b fix: use do_nodes instead of loop to run rundbench
grev [Mon, 1 Dec 2008 09:30:44 +0000 (09:30 +0000)]
b=17696
i=Adilger
skip liblustre for different client/mds versions
tianzy [Mon, 1 Dec 2008 03:52:08 +0000 (03:52 +0000)]
Branch b1_6
fix an error in the test_18b of sanity-quota.sh
b=17832
i=tianzy
zhanghc [Sun, 30 Nov 2008 15:44:59 +0000 (15:44 +0000)]
branch=b1_6
b=17031
while refreshing locks that are waiting for their I/O to complete,
take the current service time into account, instead of using only
the timeout returned by ldlm_get_enqueue_timeout
i=Andreas
i=Nathan.Rutman
grev [Fri, 28 Nov 2008 18:42:53 +0000 (18:42 +0000)]
b=17761
i=Adilger
i=Johann
test_6 fix
anserper [Fri, 28 Nov 2008 10:17:29 +0000 (10:17 +0000)]
Branch b1_6
b=17772
i=Johann Lombardi
i=ZhiYong Tian
lov_quota_check shall return an error when a target goes offline
tianzy [Fri, 28 Nov 2008 07:01:06 +0000 (07:01 +0000)]
Branch b1_6
let dqacq_in_flight() hold lock for qunit
b=16890
i=johann
i=panda
johann [Thu, 27 Nov 2008 10:56:24 +0000 (10:56 +0000)]
Branch b1_6
b=12596
i=grev
i=adilger
check striping after setstripe in recovery-small test 18*.
move get_stripe_info() to t-f.
adilger [Thu, 27 Nov 2008 05:33:11 +0000 (05:33 +0000)]
Branch b1_6
If an expected error is returned from llapi_ping() for an inactive device,
print a more useful message.
b=16208
yangsheng [Wed, 26 Nov 2008 11:10:01 +0000 (11:10 +0000)]
Branch b1_6
b=17153
i=johann, adilger
Remove 2.4 compatibility.
tianzy [Wed, 26 Nov 2008 08:23:29 +0000 (08:23 +0000)]
Branch b1_6
fix the false qunit_put in qctxt_wait_pending_dqacq()
b=17794
i=tianzy
i=panda
jxiong [Wed, 26 Nov 2008 07:15:54 +0000 (07:15 +0000)]
Restore the changes I made yesterday which broke build because I used a stale kernel.
anserper [Tue, 25 Nov 2008 23:04:43 +0000 (23:04 +0000)]
Branch b1_6
b=17371
i=Elena
fail each time cleanup/setup went wrong
adilger [Tue, 25 Nov 2008 21:42:38 +0000 (21:42 +0000)]
Cleanup ChangeLog comments.
grev [Tue, 25 Nov 2008 20:13:26 +0000 (20:13 +0000)]
b=17326
i=Brian
load_modules fn modprobe.d fix
grev [Tue, 25 Nov 2008 19:27:18 +0000 (19:27 +0000)]
b=17477
i=Huang Hua
check_config fn default network type fix
yangsheng [Tue, 25 Nov 2008 12:31:56 +0000 (12:31 +0000)]
Branch b1_6
b=17630
Add a comment to explain the change.
vs [Tue, 25 Nov 2008 11:24:04 +0000 (11:24 +0000)]
Branch b1_6
b=17359
i=adilger,bzzz
use time obtained from a client to update inode timestamps on mds.
Previously mds_reint_link, mds_reint_unlink and mds_reint_rename updated
inode timestamps with local server time.
johann [Tue, 25 Nov 2008 09:53:41 +0000 (09:53 +0000)]
Branch b1_6
i=umka
i=panda
b=17611
don't override lcm->lcm_name
yury [Tue, 25 Nov 2008 08:38:36 +0000 (08:38 +0000)]
- add 10 more seconds of margin in 124a to give the client a chance to kill some locks
shadow [Tue, 25 Nov 2008 07:54:03 +0000 (07:54 +0000)]
revert one chunk from the patch, due to a startup race.
Branch b1_6
b=16492
tianzy [Tue, 25 Nov 2008 05:59:05 +0000 (05:59 +0000)]
Branch b1_6
fix "should take longer" problem of test_18 of sanity-quota.sh
b=17773
i=johann
i=panda
tianzy [Tue, 25 Nov 2008 05:52:22 +0000 (05:52 +0000)]
Branch b1_6
change target_handle_dqacq_callback() error handling
b=16890
i=johann
i=panda
jxiong [Tue, 25 Nov 2008 03:53:38 +0000 (03:53 +0000)]
Fixed the raid5 patches.
- rebuild policy for rhel5 .21 kernel
- soft lockups fixed
b=17084
r=adilger,jay
yangsheng [Tue, 25 Nov 2008 03:11:08 +0000 (03:11 +0000)]
Branch b1_6
b=17786
i=adilger, huanghua
Initialize the request.
grev [Mon, 24 Nov 2008 22:22:27 +0000 (22:22 +0000)]
b=17747
i=Tappro
FAIL_ON_ERROR=false fix: force suites to exit 1 if some tests failed
yury [Mon, 24 Nov 2008 15:13:11 +0000 (15:13 +0000)]
b=17631
- fix previous wrong commit in part related to changes in ptlrpc_abort_bulk()
shadow [Mon, 24 Nov 2008 12:20:11 +0000 (12:20 +0000)]
Drop slow OSCs if we can, but not for the requested start idx.
This means: if an OSC is slow and it is not the requested
start OST, then it can be skipped; otherwise skip it only
if it is inactive/recovering/out-of-space.
Branch b1_6
b=16081
i=shadow
i=green
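The skip rule described in the entry above can be written as a small predicate. This is only a sketch of the stated rule, not the code from the patch: `struct sketch_osc`, its fields, and `sketch_skip_osc()` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an OSC's state (not a Lustre structure). */
struct sketch_osc {
    bool slow;      /* target is responding slowly */
    bool unusable;  /* inactive, recovering, or out of space */
};

/* Decide whether an OSC may be skipped during object allocation.
 * An unusable OSC is always skipped. A slow OSC is skipped unless it is
 * the start index that was explicitly requested. */
static bool sketch_skip_osc(const struct sketch_osc *osc, int idx,
                            int requested_start_idx)
{
    assert(osc != NULL);
    if (osc->unusable)
        return true;
    return osc->slow && idx != requested_start_idx;
}
```

With this shape, a slow target is still honoured when the caller asked for it by index, and only a target that cannot be used at all overrides that request.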
deshmukh [Mon, 24 Nov 2008 07:28:21 +0000 (07:28 +0000)]
Fixes related to mount failure path cleanup
b=17752
i=umka
i=shadow
yangsheng [Mon, 24 Nov 2008 03:51:49 +0000 (03:51 +0000)]
Branch b1_6
b=17630
i=green, adilger
Disable NFS export when the THREAD_SIZE < 8192.
yury [Sun, 23 Nov 2008 20:38:31 +0000 (20:38 +0000)]
b=17631
r=panda,shadow
- fixes a possible long synchronous bulk unlink in ptlrpcd, which would lead to an assertion at forced umount time. Basically the fix is identical to 17310, where we move the req to the special phase UNREGISTERING and go on processing other rpcs until the bulk unlink is done;
- in sync bulk and reply unlink we check for the wakeup condition every 1 sec to act quickly if the unlink comes, instead of doing it every 20 sec as before.
yury [Sun, 23 Nov 2008 12:40:34 +0000 (12:40 +0000)]
b=17750
r=shadow,deen
- fixes writing cookie beyond llcd boundaries.
yury [Sun, 23 Nov 2008 12:32:38 +0000 (12:32 +0000)]
- commit missed bit from previous commit.
yury [Sun, 23 Nov 2008 12:14:50 +0000 (12:14 +0000)]
b=17690
r=shadow
- fixes in replay-single.sh test_59b
yury [Sun, 23 Nov 2008 11:54:00 +0000 (11:54 +0000)]
b=17751
r=grev
- fixes and cleanups in test_124a from sanity.sh
grev [Fri, 21 Nov 2008 21:54:16 +0000 (21:54 +0000)]
b=17735
i=Yury.Umanets
check_mem_leak fn fix: use echo instead of log fn
yangsheng [Fri, 21 Nov 2008 16:20:31 +0000 (16:20 +0000)]
Branch b1_6
b=17201
i=shadow, bobijam
Update to RHEL5 kernel-2.6.18-92.1.17.el5.
yangsheng [Fri, 21 Nov 2008 15:27:42 +0000 (15:27 +0000)]
Branch b1_6
b=16208
i=adilger, johann
Add utility for showing mounted hosts
vs [Thu, 20 Nov 2008 23:07:38 +0000 (23:07 +0000)]
Branch b1_6
b=17132
i=adilger
Use raid5/6 rhel5 improvements
vitaly [Thu, 20 Nov 2008 21:26:57 +0000 (21:26 +0000)]
Branch b1_6
b=16129
i=adilger
i=green
- a high priority request list is added into service;
- once a lock is canceled, all the IO requests, including coming
ones, under this lock, are moved into this list;
- PING is also added into this list;
- once a lock cancel timeout occurs, the timeout is prolonged
if there is an IO rpc under this lock;
- another request list is added into the export, used to speed up
the rpc-lock matching.
fanyong [Thu, 20 Nov 2008 05:55:07 +0000 (05:55 +0000)]
Branch b1_6
b=16947
i=h.huang
i=yury.umanets
Hold lli_lock when accessing lli_sai to prevent a NULL pointer dereference.
anserper [Thu, 20 Nov 2008 01:00:20 +0000 (01:00 +0000)]
Branch b1_6
b=17371
i=Johann Lombardi
testcase for 17371
grev [Wed, 19 Nov 2008 19:00:07 +0000 (19:00 +0000)]
b=17477
i=Yury.Umanets
run acc-sm:formatall() only if forced
grev [Wed, 19 Nov 2008 18:02:08 +0000 (18:02 +0000)]
b=17477
i=Adilger
force replay-dual to check and mount MOUNT2
yury [Wed, 19 Nov 2008 09:14:20 +0000 (09:14 +0000)]
b=17686
r=panda,shadow
- fixes race in ptlrpcd which leads to busy import and obd;
- cleanups and debugs in llcd code.
grev [Wed, 19 Nov 2008 08:52:15 +0000 (08:52 +0000)]
b=17653
i=Adilger
test_21c fix: restore config
shadow [Wed, 19 Nov 2008 06:02:20 +0000 (06:02 +0000)]
fix: handle OST addition correctly
Branch b1_6
b=16492
i=umka
i=tappro
bobijam [Wed, 19 Nov 2008 01:39:07 +0000 (01:39 +0000)]
Branch b1_6
b=16992
o=johann
i=oleg.drokin (green)
i=zhenyu.xu (bobijam)
During ll_intent_lock(), the server looks up the parent and child and locks them; between these events the parent could be deleted, and then vfs_create may_access() fails with -ENOENT.
The client intent disposition then gets DISP_OPEN_CREATE | DISP_LOOKUP_NEG | DISP_LOOKUP_EXECD | DISP_IT_EXECD, and the request gets freed twice.
Solution: clear DISP_ENQ_COMPLETE when we are going to release the intent (the request cannot be reused anyway).
anserper [Tue, 18 Nov 2008 03:43:31 +0000 (03:43 +0000)]
Branch b1_6
b=12433
i=Oleg Drokin
i=Yury Umanets
fix the message about imp_inval
tianzy [Mon, 17 Nov 2008 06:47:30 +0000 (06:47 +0000)]
Branch b1_6
decay qos ost/oss penalties if MDS is not creating objects
i=nathan
i=johann
tianzy [Mon, 17 Nov 2008 06:27:29 +0000 (06:27 +0000)]
Branch b1_6
fix lov_brw_check() calling lov_stripe_intersects() with an incorrect parameter.
written by nikita
tianzy [Mon, 17 Nov 2008 06:19:11 +0000 (06:19 +0000)]
Branch b1_6
fix the error handling on quota slaves
i=johann
i=panda
adilger [Sat, 15 Nov 2008 08:34:09 +0000 (08:34 +0000)]
Branch b1_6
Remove trailing whitespace.
grev [Fri, 14 Nov 2008 18:51:39 +0000 (18:51 +0000)]
b=16488
i=Oleg.Drokin
RACER acc-sm test suite
grev [Fri, 14 Nov 2008 10:30:09 +0000 (10:30 +0000)]
b=17122
i=Adilger
skip sanity test_100 for NETTYPE != tcp
yury [Thu, 13 Nov 2008 09:04:45 +0000 (09:04 +0000)]
b=17479
r=adilger,behlendorf1
- avoid div/mod in lustre_hash code
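The entry above does not say how the division was avoided. One common approach, shown here purely as an assumption, is to keep the bucket count a power of two and replace `%` with a bit mask. `is_pow2()` and `bucket_index()` are hypothetical helpers, not functions from lustre_hash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true when n is a non-zero power of two. */
static int is_pow2(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Map a hash value to a bucket index without using '/' or '%'.
 * Only valid when nbuckets is a power of two, in which case
 * (hash & (nbuckets - 1)) equals (hash % nbuckets). */
static uint32_t bucket_index(uint32_t hash, uint32_t nbuckets)
{
    assert(is_pow2(nbuckets));
    return hash & (nbuckets - 1);
}
```

The mask is a single AND instruction, whereas an integer division can cost many cycles, which matters on a hot lookup path.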
tianzy [Thu, 13 Nov 2008 08:09:26 +0000 (08:09 +0000)]
Branch b1_6
fix lquota.ko failing to install when --disable-liblustre is used
b=17620
i=johann
i=brian
green [Thu, 13 Nov 2008 03:12:23 +0000 (03:12 +0000)]
b=16823
r=shadow,adilger
Lift 4G limit on stripe_size*stripe_count
4G limit on stripe_size remains in place, though.
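A limit on stripe_size*stripe_count typically comes from computing the product in 32 bits. The sketch below illustrates the arithmetic only and is not taken from the patch: it assumes stripe_size must stay below 4 GiB and widens the product to 64 bits. `SKETCH_MAX_STRIPE_SIZE`, `stripe_size_ok()` and `stripe_width()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-stripe limit of 4 GiB (hypothetical constant name). */
#define SKETCH_MAX_STRIPE_SIZE (1ULL << 32)

/* The per-stripe limit stays in place: size must be non-zero and below 4 GiB. */
static int stripe_size_ok(uint64_t stripe_size)
{
    return stripe_size != 0 && stripe_size < SKETCH_MAX_STRIPE_SIZE;
}

/* Total stripe width computed in 64 bits. Both factors are below 2^32,
 * so the product is below 2^64 and cannot overflow. */
static uint64_t stripe_width(uint64_t stripe_size, uint32_t stripe_count)
{
    assert(stripe_size_ok(stripe_size));
    return stripe_size * (uint64_t)stripe_count;
}
```

For example, eight stripes of 1 GiB give a width of 8 GiB, which truncates to zero if stored in a 32-bit variable.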
grev [Wed, 12 Nov 2008 21:08:20 +0000 (21:08 +0000)]
b=17634
i=Yury.Umanets
insanity cleanup (remove dup fn, sync with HEAD t-f)
yury [Wed, 12 Nov 2008 18:43:33 +0000 (18:43 +0000)]
b=17310
r=shadow,vitaly
- correct check for phase in ptlrpc_expired_set() and a couple of other places.
grev [Wed, 12 Nov 2008 16:08:27 +0000 (16:08 +0000)]
b=16488
i=Oleg.Drokin
new runracer script
yury [Wed, 12 Nov 2008 15:40:17 +0000 (15:40 +0000)]
b=17037
r=tappro,wangdi
- fixes ost cleanup issue due to a missed llcd_put() in the case where the ost does not receive a disconnect from the mds;
- do not sleep on a hanging llcd. Instead assert on it _after_ stopping recov_thread's ptlrpcd, which should kill any remaining llcds;
- fixes and cleanups, comments.
grev [Wed, 12 Nov 2008 11:11:20 +0000 (11:11 +0000)]
b=17555
i=Adilger
use current config instead of reformat fs to have a single ost
zhanghc [Wed, 12 Nov 2008 02:57:47 +0000 (02:57 +0000)]
branch b1_6
b=17505
remove "mfd" from "cloing_list" for the "mfd" will be freed in mds_mfd_close
i=robert.read
huanghua [Wed, 12 Nov 2008 02:52:18 +0000 (02:52 +0000)]
Branch b1_6
b=17602
i=yury.umanets
i=tappro
use 1.8/2.0 compatible MDT config for 1.6 mds, easy to upgrade.
grev [Tue, 11 Nov 2008 21:54:05 +0000 (21:54 +0000)]
b=16551
i=Adilger
conf-sanity test_32* fix to not be skipped for remote setup
yangsheng [Tue, 11 Nov 2008 06:12:49 +0000 (06:12 +0000)]
Branch b1_6
b=17374
i=shadow, bobijam
kernel update for sles9 2.6.5-7.314.
zhanghc [Tue, 11 Nov 2008 06:11:35 +0000 (06:11 +0000)]
branch b1_6
b=17176
fixed a bug in 14774 patch -- compare peer's nid instead of self's nid
in ptlrpc_connection when selecting failover MDS/OST nodes
i=deen
yangsheng [Tue, 11 Nov 2008 05:34:59 +0000 (05:34 +0000)]
Branch b1_6
b=17458
i=shadow, bobijam
Update kernel to SLES10 SP2 2.6.16.60-0.31.
zhanghc [Tue, 11 Nov 2008 02:50:51 +0000 (02:50 +0000)]
b=17495
move the check of the recovering state of the OST in osc_precreate
out of the "if (oscc->oscc_last_id < oscc->oscc_next_id)" condition
so that create operations don't use a recovering OST
i=adilger
i=nathan.rutman
shadow [Sat, 8 Nov 2008 16:05:19 +0000 (16:05 +0000)]
don't panic on nfs reexport.
Branch b1_6
b=16492
i=green
i=johann
yury [Sat, 8 Nov 2008 10:32:30 +0000 (10:32 +0000)]
b=17310
r=shadow,johann
- make sure that no new inflight rpcs may come after ptlrpcd_deactivate_import() for both
synchronous and asynchronous sending. To do so we make sure that imp_inflight++ is done only when
permission is granted by ptlrpc_import_delay_req(), which decides whether the req should be sent,
deferred or killed because the import is not in a state to send it in the observable future. For async
sending, an rpc is only counted inflight when it is added to the sending or delaying list instead of just
adding it to the set for processing.
This fixes an assert in ptlrpc_invalidate_import() and a number of other issues;
- synchronize imp_inflight and the presence on the sending or delaying list for the ptlrpc_queue_wait()
case, so that it is now guaranteed that if imp_inflight != 0 we can always find the hanging rpc either
in the sending or in the delaying list;
- make sure that in ptlrpc_queue_wait() we remove the rpc from the sending or delaying list and dec
inflight only after ptlrpc_unregister_reply() is done. This way we make sure that accounting is
correct. An rpc can't be returned to the pool or counted finished until lnet lets us go with a finished
reply unlink;
- check for inflight and rq_list in pinger;
- comments, cleanups;
grev [Fri, 7 Nov 2008 20:52:49 +0000 (20:52 +0000)]
b=17477
i=Adilger
replace cleanup_and_setup_lustre fn by check_and_setup_lustre fn
yury [Fri, 7 Nov 2008 20:39:21 +0000 (20:39 +0000)]
b=17511
r=adilger,johann
- removes deadlock possibility by disabling rehash in hash_del() operations and moving hash_add()
out of spin_locks when calling. Hash table has own mechanisms for protecting its structures and it
also has hash_add_unique() method for using in concurrent run contexts;
- fixed missed lh_put() in hash_add_unique() which led to extra refs in some cases (extra ref to
export) and inability to cleanup;
- fixed __lustre_hash_set_theta() which set @max theta into ->lh_min_theta;
- in lustre_hash_rehash_size() disable rehash also for the case when the new and old hash sizes are equal
in corner cases (max_size or min_size). Before this fix it was possible to do needless
rehashes, performing this expensive operation when the size did not actually change;
- disable rehash in hash_add_unique() if no actual add happened since entry with the same key is
already found in the table;
- some cleanups in hash table code;
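The rehash-size corner case from the entry above can be sketched as follows. This is a simplified model rather than the actual lustre_hash_rehash_size(): `struct sketch_hash` and `sketch_rehash_size()` are hypothetical, and the convention of returning 0 for "no rehash needed" is an assumption made for the example.

```c
#include <assert.h>

/* Hypothetical, simplified hash table descriptor (not the Lustre structure). */
struct sketch_hash {
    unsigned cur_size;  /* current number of buckets */
    unsigned min_size;  /* lower bound on the bucket count */
    unsigned max_size;  /* upper bound on the bucket count */
};

/* Return the size to rehash to, or 0 when no rehash is needed.
 * The wanted size is clamped to [min_size, max_size]; if the clamped
 * value equals the current size, the expensive rehash is skipped. */
static unsigned sketch_rehash_size(const struct sketch_hash *h, unsigned wanted)
{
    assert(h->min_size <= h->max_size);
    if (wanted < h->min_size)
        wanted = h->min_size;
    if (wanted > h->max_size)
        wanted = h->max_size;
    return wanted == h->cur_size ? 0 : wanted;
}
```

A table already at max_size that asks to grow therefore gets 0 back instead of a request to rehash to the size it already has.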
grev [Fri, 7 Nov 2008 17:20:01 +0000 (17:20 +0000)]
b=17477
i=Adilger
check config if lustre is mounted before acc-sm run
grev [Fri, 7 Nov 2008 16:38:41 +0000 (16:38 +0000)]
b=14384
i=Brian
assert_DIR cleanup
yury [Fri, 7 Nov 2008 13:18:17 +0000 (13:18 +0000)]
b=17445
r=tappro,johann
- implements proper locking for rq pool freeing
johann [Fri, 7 Nov 2008 10:31:56 +0000 (10:31 +0000)]
Branch b1_6
b=16860
i=nathan
i=rread
Description: Excessive recovery window
Details : With AT enabled, the recovery window can be excessively long (6000+
seconds). To address this problem, we no longer use
OBD_RECOVERY_FACTOR when extending the recovery window (the connect
timeout no longer depends on the service time, it is set to
INITIAL_CONNECT_TIMEOUT now) and clients report the old service
time via pb_service_time.
bobijam [Fri, 7 Nov 2008 03:20:42 +0000 (03:20 +0000)]
Branch b1_6
b=16578
o=adilger
A faster way to get long string.
yury [Thu, 6 Nov 2008 13:39:20 +0000 (13:39 +0000)]
b=17310
- make sure that rpcs in RQ_PHASE_UNREGISTERING phase can be marked expired and interrupted.
yury [Thu, 6 Nov 2008 12:12:12 +0000 (12:12 +0000)]
b=17310
r=johann,shadow
- fixes ptlrpcd blocking on very long reply unlink waiting. To do so a new rpc phase,
RQ_PHASE_UNREGISTERING, is introduced, in which requests stay until reply_in_callback() is called by lnet,
signaling that the reply is unlinked. All requests in this state are skipped in processing by ptlrpcd
instead of waiting n * 300s on each of them. This allows ptlrpcd to process other rpcs in the set;
- make sure that inflight count is coherent with being present on sending or delay list. That is,
if we see inflight != 0, rpc must be on one of these lists. This is very helpful in
ptlrpc_invalidate_import() to show all rpcs still waiting after invalidating import;
- in ptlrpc_invalidate_import() wait maximal rq_deadline - now from all inflight rpcs instead of
obd_timeout which may be much longer. If calculated timeout is 0, obd_timeout is used. This fixes
the issue that rq_deadline - now > obd_timeout (very easy to see in logs) which led to inflight !=
0 assert because inflight rpcs timed out later than our wait period is finished;
- in ptlrpc_invalidate_import() wait forever for rpcs in UNREGISTERING phase. Check in assert for
inflight == 0 for wait timed out case if no rpcs in UNREGISTERING phase. Only those in
UNREGISTERING phase are allowed to stay longer than obd_timeout;
- added ptlrpc_move_rqphase() function. All phase changes go through it. Add debug_req() there to
track down all phase changes;
- conf_sanity.sh test_45 added to emulate very long reply unlink and also situation when
rq_deadline - now > obd_timeout;
- do not wait forever in ptlrpc_unregister_reply() for async case (using it from sets). sync case
left unchanged;
- make sure that ptlrpc_set_next_timeout() yields 1s timeout (instead of 0s) for the set with rpcs
in "unregistering" stage to prevent ptlrpcd from sleeping forever and hanging in test_45;
- in ptlrpcd() make sure that we do not sleep on 0 timeout.
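The wait-time rule for ptlrpc_invalidate_import() in the entry above (wait for the largest rq_deadline - now among inflight rpcs, falling back to obd_timeout when that is 0) can be sketched like this. It only illustrates the arithmetic: `sketch_invalidate_timeout()` is a hypothetical function, and deadlines are modelled as plain seconds in a `long` array rather than as fields of real request structures.

```c
#include <assert.h>
#include <stddef.h>

/* Compute how long to wait for inflight rpcs, in seconds.
 * deadlines[i] is the absolute deadline of the i-th inflight rpc and
 * now is the current time, both in seconds. The result is the largest
 * remaining time; if no rpc has time left, obd_timeout is used instead. */
static long sketch_invalidate_timeout(const long *deadlines, size_t n,
                                      long now, long obd_timeout)
{
    long timeout = 0;

    assert(obd_timeout > 0);
    for (size_t i = 0; i < n; i++) {
        long left = deadlines[i] - now;
        if (left > timeout)
            timeout = left;
    }
    return timeout > 0 ? timeout : obd_timeout;
}
```

Using the largest remaining deadline rather than a fixed obd_timeout covers the case the entry mentions, where rq_deadline - now exceeds obd_timeout and rpcs would otherwise time out after the wait has already finished.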
anserper [Wed, 5 Nov 2008 22:31:06 +0000 (22:31 +0000)]
Branch b1_6
b=17371
i=Johann Lombardi
i=Oleg Drokin
fix a race between requeue thread processing and umount
grev [Wed, 5 Nov 2008 17:47:14 +0000 (17:47 +0000)]
b=16551
i=Adilger
correct remote_[mds|ost] fn to work correctly on configuration
with several MDS/OSS nodes
kalpak [Wed, 5 Nov 2008 09:13:08 +0000 (09:13 +0000)]
b=16438
i=adilger
i=girish
Mounting a filesystem with the extents feature will fail on big-endian systems, since ext3-based ldiskfs is not supported on big-endian systems. This can be overridden with the "bigendian_extents" mount option.