Whamcloud - gitweb
zhanghc [Tue, 11 Nov 2008 06:11:35 +0000 (06:11 +0000)]
branch b1_6
b=17176
fixed a bug in 14774 patch -- compare peer's nid instead of self's nid
in ptlrpc_connection when selecting failover MDS/OST nodes
i=deen
yangsheng [Tue, 11 Nov 2008 05:34:59 +0000 (05:34 +0000)]
Branch b1_6
b=17458
i=shadow, bobijam
Update kernel to SLES10 SP2 2.6.16.60-0.31.
zhanghc [Tue, 11 Nov 2008 02:50:51 +0000 (02:50 +0000)]
b=17495
move the check of the OST's recovering state in osc_precreate
out of the "if (oscc->oscc_last_id < oscc->oscc_next_id)" condition
so that create operations don't use a recovering OST
i=adilger
i=nathan.rutman
shadow [Sat, 8 Nov 2008 16:05:19 +0000 (16:05 +0000)]
don't panic on nfs reexport.
Branch b1_6
b=16492
i=green
i=johann
yury [Sat, 8 Nov 2008 10:32:30 +0000 (10:32 +0000)]
b=17310
r=shadow,johann
- make sure that no new inflight rpcs may come after ptlrpcd_deactivate_import() for both
synchronous and asynchronous sending. To do so, we make sure that imp_inflight++ is done only when
permission is granted by ptlrpc_import_delay_req(), which decides whether the req should be sent,
deferred, or killed because the import is not in a state to send it in the foreseeable future. For async
sending, an rpc is only counted as inflight when it is added to the sending or delaying list, instead of
when it is just added to the set for processing.
This fixes an assert in ptlrpc_invalidate_import() and a number of other issues;
- synchronize imp_inflight and the presence on the sending or delaying list for the ptlrpc_queue_wait()
case, so that it is now guaranteed that if imp_inflight != 0 we can always find the hanging rpc either
in the sending or in the delaying list;
- make sure that in ptlrpc_queue_wait() we remove the rpc from the sending or delaying list and dec
inflight only after ptlrpc_unregister_reply() is done. This way we make sure that accounting is
correct. An rpc can't be returned to the pool or counted as finished until lnet lets us go with a finished
reply unlink;
- check for inflight and rq_list in pinger;
- comments, cleanups;
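The accounting rule in this commit can be illustrated with a small sketch. It is not the ptlrpc code: `demo_import`, `demo_decision` and `demo_queue_request()` are hypothetical names, and the enum only stands in for the decision that the log attributes to ptlrpc_import_delay_req(). The point shown is that the inflight counter is bumped only when a request is actually placed on the sending or delaying list.

```c
#include <assert.h>

/* Hypothetical outcome of the "may this request be sent now?" check. */
enum demo_decision { DEMO_SEND, DEMO_DELAY, DEMO_KILL };

/* Hypothetical, simplified import: one counter plus two list lengths. */
struct demo_import {
    int inflight;   /* stands in for imp_inflight */
    int sending;    /* requests on the sending list */
    int delayed;    /* requests on the delaying list */
};

/*
 * Queue one request according to the decision. Returns 1 if the request
 * was counted as inflight, 0 if it was killed and never counted.
 */
int demo_queue_request(struct demo_import *imp, enum demo_decision d)
{
    switch (d) {
    case DEMO_SEND:
        imp->sending++;
        break;
    case DEMO_DELAY:
        imp->delayed++;
        break;
    default:
        /* Killed requests never touch the counter. */
        return 0;
    }
    imp->inflight++;
    /* Invariant from the log message: inflight != 0 implies a listed rpc. */
    assert(imp->inflight == imp->sending + imp->delayed);
    return 1;
}
```

Keeping the increment next to the list insertion is what makes the invariant checkable in one place.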
grev [Fri, 7 Nov 2008 20:52:49 +0000 (20:52 +0000)]
b=17477
i=Adilger
replace cleanup_and_setup_lustre fn by check_and_setup_lustre fn
yury [Fri, 7 Nov 2008 20:39:21 +0000 (20:39 +0000)]
b=17511
r=adilger,johann
- removes deadlock possibility by disabling rehash in hash_del() operations and moving hash_add()
calls out of spin_locks. The hash table has its own mechanisms for protecting its structures, and it
also has a hash_add_unique() method for use in concurrent contexts;
- fixed missed lh_put() in hash_add_unique() which led to extra refs in some cases (extra ref to
export) and inability to cleanup;
- fixed __lustre_hash_set_theta() which set @max theta into ->lh_min_theta;
- in lustre_hash_rehash_size() disable rehash also for the case when new and old hash sizes are equal
in corner cases (max_size or min_size). Before this fix, needless rehashes were possible: the size
did not actually change, but the expensive operation was done anyway;
- disable rehash in hash_add_unique() if no actual add happened because an entry with the same key was
already found in the table;
- some cleanups in hash table code;
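The rehash-size corner case from this commit can be sketched as follows. This is a minimal illustration, not the Lustre implementation: `demo_hash`, `demo_rehash_size()` and the doubling/halving policy are assumptions made for the example. Only the idea of skipping the rehash when the clamped size equals the current size comes from the log message.

```c
#include <assert.h>

/* Hypothetical, simplified table descriptor (not struct lustre_hash). */
struct demo_hash {
    unsigned int cur_size;  /* current number of buckets */
    unsigned int min_size;  /* lower bound on buckets */
    unsigned int max_size;  /* upper bound on buckets */
};

/*
 * Return the bucket count to rehash to, or 0 when no rehash is needed.
 * `grow` selects doubling (non-zero) or halving (zero); both policies
 * are assumptions of this sketch.
 */
unsigned int demo_rehash_size(const struct demo_hash *h, int grow)
{
    unsigned int new_size;

    assert(h->min_size <= h->cur_size && h->cur_size <= h->max_size);

    new_size = grow ? h->cur_size * 2 : h->cur_size / 2;
    if (new_size > h->max_size)
        new_size = h->max_size;
    if (new_size < h->min_size)
        new_size = h->min_size;

    /* Clamping at min_size or max_size can leave the size unchanged. */
    if (new_size == h->cur_size)
        return 0;
    return new_size;
}
```

A caller would treat a return value of 0 as "leave the table alone", which avoids the expensive operation at both bounds.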
grev [Fri, 7 Nov 2008 17:20:01 +0000 (17:20 +0000)]
b=17477
i=Adilger
check config if lustre is mounted before acc-sm run
grev [Fri, 7 Nov 2008 16:38:41 +0000 (16:38 +0000)]
b=14384
i=Brian
assert_DIR cleanup
yury [Fri, 7 Nov 2008 13:18:17 +0000 (13:18 +0000)]
b=17445
r=tappro,johann
- implements proper locking for rq pool freeing
johann [Fri, 7 Nov 2008 10:31:56 +0000 (10:31 +0000)]
Branch b1_6
b=16860
i=nathan
i=rread
Description: Excessive recovery window
Details : With AT enabled, the recovery window can be excessively long (6000+
seconds). To address this problem, we no longer use
OBD_RECOVERY_FACTOR when extending the recovery window (the connect
timeout no longer depends on the service time, it is set to
INITIAL_CONNECT_TIMEOUT now) and clients report the old service
time via pb_service_time.
bobijam [Fri, 7 Nov 2008 03:20:42 +0000 (03:20 +0000)]
Branch b1_6
b=16578
o=adilger
A faster way to get long string.
yury [Thu, 6 Nov 2008 13:39:20 +0000 (13:39 +0000)]
b=17310
- make sure that rpcs in RQ_PHASE_UNREGISTERING phase can be marked expired and interrupted.
yury [Thu, 6 Nov 2008 12:12:12 +0000 (12:12 +0000)]
b=17310
r=johann,shadow
- fixes ptlrpcd blocking on very long reply unlink waiting. To do so, a new rpc phase,
RQ_PHASE_UNREGISTERING, is introduced, in which requests stay until reply_in_callback() is called by lnet,
signaling that the reply is unlinked. All requests in this state are skipped in processing by ptlrpcd
instead of waiting n * 300s on each of them. This allows ptlrpcd to process other rpcs in the set;
- make sure that inflight count is coherent with being present on sending or delay list. That is,
if we see inflight != 0, rpc must be on one of these lists. This is very helpful in
ptlrpc_invalidate_import() to show all rpcs still waiting after invalidating import;
- in ptlrpc_invalidate_import() wait for the maximal rq_deadline - now over all inflight rpcs, which
may be much longer than obd_timeout, instead of waiting obd_timeout. If the calculated timeout is 0,
obd_timeout is used. This fixes the issue that rq_deadline - now > obd_timeout (very easy to see in
logs), which led to the inflight != 0 assert because inflight rpcs timed out after our wait period had finished;
- in ptlrpc_invalidate_import() wait forever for rpcs in UNREGISTERING phase. When the wait times out,
assert inflight == 0 only if there are no rpcs in UNREGISTERING phase. Only those in
UNREGISTERING phase are allowed to stay longer than obd_timeout;
- added ptlrpc_move_rqphase() function. All phase changes go through it. Add debug_req() there to
track down all phase changes;
- conf_sanity.sh test_45 added to emulate very long reply unlink and also situation when
rq_deadline - now > obd_timeout;
- do not wait forever in ptlrpc_unregister_reply() for async case (using it from sets). sync case
left unchanged;
- make sure that ptlrpc_set_next_timeout() yields 1s timeout (instead of 0s) for the set with rpcs
in "unregistering" stage to prevent ptlrpcd from sleeping forever and hanging in test_45;
- in ptlrpcd() make sure that we do not sleep on 0 timeout.
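The wait-time rule for ptlrpc_invalidate_import() described in this commit can be written down as a short sketch. The names `demo_req` and `demo_invalidate_timeout()` are hypothetical, and times are plain `long` seconds for simplicity. The logic follows the log message: take the largest `rq_deadline - now` over the inflight requests and fall back to `obd_timeout` when that comes out as 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: only the deadline matters for this sketch. */
struct demo_req {
    long deadline;  /* stands in for rq_deadline, in seconds */
};

/*
 * Return how long to wait for inflight requests: the maximal
 * (deadline - now) over all requests, or obd_timeout if that is 0.
 */
long demo_invalidate_timeout(const struct demo_req *reqs, size_t n,
                             long now, long obd_timeout)
{
    long timeout = 0;
    size_t i;

    assert(obd_timeout > 0);

    for (i = 0; i < n; i++) {
        long left = reqs[i].deadline - now;

        /* Already expired requests (left <= 0) do not extend the wait. */
        if (left > timeout)
            timeout = left;
    }
    return timeout != 0 ? timeout : obd_timeout;
}
```

With deadlines of 105, 130 and 90 at `now = 100`, the wait is 30 seconds even though `obd_timeout` is 20, which is the case the log describes as leading to the assert.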
anserper [Wed, 5 Nov 2008 22:31:06 +0000 (22:31 +0000)]
Branch b1_6
b=17371
i=Johann Lombardi
i=Oleg Drokin
fix a race between requeue thread processing and umount
grev [Wed, 5 Nov 2008 17:47:14 +0000 (17:47 +0000)]
b=16551
i=Adilger
fix remote_[mds|ost] fn to work correctly on configurations
with several MDS/OSS nodes
kalpak [Wed, 5 Nov 2008 09:13:08 +0000 (09:13 +0000)]
b=16438
i=adilger
i=girish
Mounting a filesystem with the extents feature will fail on big-endian systems, since ext3-based ldiskfs is not supported on big-endian systems. This can be overridden with the "bigendian_extents" mount option.
jxiong [Wed, 5 Nov 2008 02:52:06 +0000 (02:52 +0000)]
b=15715
r=adilger,green
Fixed the race between destroying and enqueuing an ldlm lock on the OST side.
bobijam [Wed, 5 Nov 2008 02:26:16 +0000 (02:26 +0000)]
Branch b1_6
b=16578
i=adilger
Description: ldlm_cancel_pack() ASSERTION(max >= dlm->lock_count + count)
Details : If there is no extra space in the request for early cancels,
ldlm_req_handles_avail() returns 0 instead of a negative value.
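A minimal sketch of the clamping described in the Details line, assuming it describes the behaviour after the fix. `demo_handles_avail()` and its parameters are hypothetical and do not mirror the real ldlm_req_handles_avail() signature. The sketch only shows a handle count that bottoms out at 0 when the request has no spare room.

```c
#include <assert.h>

/*
 * Return how many extra lock handles fit into a request.
 * max_req_size: largest request size allowed (hypothetical parameter)
 * used_size:    bytes already taken by the request (hypothetical)
 * handle_size:  size of one handle in bytes (hypothetical)
 */
int demo_handles_avail(int max_req_size, int used_size, int handle_size)
{
    int avail;

    assert(handle_size > 0);

    avail = max_req_size - used_size;
    /* No spare room: report 0 rather than a negative count. */
    if (avail < 0)
        return 0;
    return avail / handle_size;
}
```

Returning 0 lets a caller compare the result against a lock count without first checking its sign.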
liuy [Wed, 5 Nov 2008 01:51:53 +0000 (01:51 +0000)]
*** empty log message ***
yury [Tue, 4 Nov 2008 16:31:07 +0000 (16:31 +0000)]
- removed old trash which probably was committed along with copyrighting effort.
yangsheng [Tue, 4 Nov 2008 07:55:01 +0000 (07:55 +0000)]
Branch b1_6
b=17534
i=adilger, yangsheng
Fix client crash caused by old-style mount command.
tianzy [Tue, 4 Nov 2008 07:28:35 +0000 (07:28 +0000)]
Branch b1_6
Replace LBUG with RETURN(-EINVAL) to avoid crashing
b=5135
i=adilger
i=johann
tappro [Mon, 3 Nov 2008 22:06:46 +0000 (22:06 +0000)]
- test fix from 12512
b=12512
i=grev, adilger
anserper [Mon, 3 Nov 2008 21:17:59 +0000 (21:17 +0000)]
b=17493
i=Andreas Dilger
i=Johann Lombardi
handling of a broken readonly key
tianzy [Mon, 3 Nov 2008 12:55:21 +0000 (12:55 +0000)]
Branch b1_6
fix an error in the test_18 of sanity-quota.sh
b=17523
i=johann
i=panda
adilger [Mon, 3 Nov 2008 04:26:03 +0000 (04:26 +0000)]
Branch b1_6
Quiet compiler warning about unused label.
Conditional check will be optimized away by compiler.
adilger [Mon, 3 Nov 2008 04:08:38 +0000 (04:08 +0000)]
Branch b1_6
Fix 80-column line wrapping.
grev [Fri, 31 Oct 2008 18:19:06 +0000 (18:19 +0000)]
b=17122
i=Nikita
sanity test_100 fix
grev [Fri, 31 Oct 2008 17:05:10 +0000 (17:05 +0000)]
b=17540
i=Nikita
test_53 fix
adilger [Fri, 31 Oct 2008 17:00:48 +0000 (17:00 +0000)]
Branch b1_6
Remove trailing whitespace.
grev [Fri, 31 Oct 2008 16:10:13 +0000 (16:10 +0000)]
b=16551
o=Robert.Read
i=grev
test_27u fix
anserper [Fri, 31 Oct 2008 14:13:45 +0000 (14:13 +0000)]
Branch b1_6
b=13904
i=Johann Lombardi
i=ZhiYong Tian
64-bit quota support for kernel
cvs2svn [Fri, 31 Oct 2008 14:13:44 +0000 (14:13 +0000)]
This commit was manufactured by cvs2svn to create branch 'b1_6'.
yangsheng [Fri, 31 Oct 2008 08:18:09 +0000 (08:18 +0000)]
Branch b1_6
b=17379
i=adilger, johann
Test case for recursive symlink.
yangsheng [Fri, 31 Oct 2008 07:51:42 +0000 (07:51 +0000)]
Branch b1_6
b=17379
i=Brian(LLNL), johann
Set recursive symlink depth to 5 when kernel has 4K stack.
tianzy [Fri, 31 Oct 2008 07:51:37 +0000 (07:51 +0000)]
Branch b1_6
fix a possible NULL pointer in client_quota_ctl()
b=17486
i=johann
i=panda
girish [Thu, 30 Oct 2008 18:04:57 +0000 (18:04 +0000)]
Remove the LBUG and instead, return an error if npages > OST_THREAD_POOL_SIZE
i=johann
i=adilger
b=17448
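The pattern in this commit, returning an error where the code used to hit LBUG, can be sketched as below. `demo_check_npages()` and the value of `DEMO_THREAD_POOL_SIZE` are hypothetical, and the choice of `-EINVAL` is an assumption, since the log does not say which error code is returned.

```c
#include <errno.h>

/* Hypothetical limit; the real OST_THREAD_POOL_SIZE value is not given in the log. */
#define DEMO_THREAD_POOL_SIZE 256

/*
 * Validate a page count coming from a request. An oversized count is
 * reported to the caller instead of crashing the node.
 */
int demo_check_npages(int npages)
{
    if (npages < 0 || npages > DEMO_THREAD_POOL_SIZE)
        return -EINVAL;  /* assumed error code, see lead-in */
    return 0;
}
```

The caller can then fail the single request and keep serving others.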
bobijam [Thu, 30 Oct 2008 02:48:35 +0000 (02:48 +0000)]
Branch b1_6
b=16887
i=pravin.shelar
i=adilger
Address LBUG, ASSERTION(client_stat->nid_exp_ref_count == 0) failed:count -1
* add client stat on obd_nid_stat after client stat is ready.
* properly decrease exp_nid_stats' nid_exp_ref_count in lprocfs_exp_cleanup().
wangdi [Wed, 29 Oct 2008 23:11:02 +0000 (23:11 +0000)]
Branch: b1_6
remove unnecessary return.
wangdi [Wed, 29 Oct 2008 22:55:02 +0000 (22:55 +0000)]
Branch: b1_6
Once the unmatched stride IO mode is detected, shrink the stride-ahead window to 0.
If a cache miss does occur and the read pattern is still stride-IO mode,
do not reset the stride window, but also do not increase the stride
window length in this case.
b=17197
i=Nikita
i=Andreas
nathan [Wed, 29 Oct 2008 21:59:36 +0000 (21:59 +0000)]
b=15899
i=johann
i=adilger
coverity fix. thought I landed this awhile ago, but apparently not...
robert.read [Tue, 28 Oct 2008 23:17:40 +0000 (23:17 +0000)]
Branch b1_6
b=17491
i=nathan
i=rread
Quick fix patch from behlendorf1@llnl.gov.
kalpak [Tue, 28 Oct 2008 17:59:03 +0000 (17:59 +0000)]
b=16680
i=adilger, kalpak (o=bzzz)
Detect on-disk corruption of block bitmap and better checking of preallocated blocks.
johann [Tue, 28 Oct 2008 17:36:40 +0000 (17:36 +0000)]
Branch b1_6
b=17089
i=wangdi
fix mistake made when the patch was landed.
bobijam [Tue, 28 Oct 2008 05:46:54 +0000 (05:46 +0000)]
Branch b1_6
b=17093
o=Brian Behlendorf (behlendorf1@llnl.gov) for mgs part
o=E. Gryaznova (grev) for test framework
i=nathan.rutman
i=adilger
Do writeconf only when explicitly required.
johann [Mon, 27 Oct 2008 11:36:01 +0000 (11:36 +0000)]
Branch b1_6
b=17385
i=green
i=shadow
grab lock reference when the lock is added to the waiting or expired list.
zhanghc [Mon, 27 Oct 2008 07:35:37 +0000 (07:35 +0000)]
Branch b1_6
handle the problem of test_120a in sanity.sh: 1 cancel RPC occured
b=14502
i=adilger
i=grev
zhanghc [Mon, 27 Oct 2008 01:14:36 +0000 (01:14 +0000)]
Branch b1_6
handle "Unexpected: can't find mdc_open_data,
but the close succeed, Please tell <http://bugzilla.lustre.org/>."
printed in mdc_close in mdc_request.c
b=17089
i=johann
grev [Fri, 24 Oct 2008 20:46:31 +0000 (20:46 +0000)]
b=17477
i=Adilger
i=Tappro
init facets vars for mounted lustre
grev [Fri, 24 Oct 2008 20:01:25 +0000 (20:01 +0000)]
b=16551
i=Nathan
skip replay/recovery tests if remote MDS/OSS with nodsh
lost_test55 fix
tianzy [Fri, 24 Oct 2008 10:52:53 +0000 (10:52 +0000)]
Branch b1_6
handle errors returned by lustre_swab_re{q,p}buf in quota_get_qdata() and
quota_copy_qdata()
b=17324
i=johann
i=panda
shadow [Fri, 24 Oct 2008 05:04:33 +0000 (05:04 +0000)]
Kill extra argument for llog_connect, and don't access ld_tgt_count
without protection.
Branch b1_6
b=16693
i=umka
i=tappro
shadow [Thu, 23 Oct 2008 19:12:44 +0000 (19:12 +0000)]
In rare cases, an inode in a catalog can have an i_no lower than its parent's
i_no. This produces the wrong locking order during open, and a parallel
unlink can deadlock with open. Teach mds_open to grab locks in
resource id order, not in parent -> child order.
Branch b1_6
b=16492
i=johann
i=alex
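Ordering locks by resource id rather than by role is a standard way to avoid this kind of deadlock, and can be sketched as follows. The types and functions here (`demo_res_id`, `demo_res_cmp()`, `demo_lock_order()`) are hypothetical and only illustrate the ordering rule. They are not the mds_open code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resource id: compared field by field, first field most significant. */
struct demo_res_id {
    uint64_t name[2];
};

/* Three-way comparison of two resource ids. */
int demo_res_cmp(const struct demo_res_id *a, const struct demo_res_id *b)
{
    size_t i;

    for (i = 0; i < 2; i++) {
        if (a->name[i] != b->name[i])
            return a->name[i] < b->name[i] ? -1 : 1;
    }
    return 0;
}

/*
 * Decide which of parent/child to lock first. Every caller that uses this
 * helper takes the two locks in the same global order, whatever their roles.
 */
void demo_lock_order(const struct demo_res_id *parent,
                     const struct demo_res_id *child,
                     const struct demo_res_id **first,
                     const struct demo_res_id **second)
{
    assert(first != NULL && second != NULL);

    if (demo_res_cmp(child, parent) < 0) {
        *first = child;
        *second = parent;
    } else {
        *first = parent;
        *second = child;
    }
}
```

When the child id happens to be smaller than the parent id, the child is locked first, so an open and an unlink on the same pair agree on the order.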
grev [Thu, 23 Oct 2008 18:14:27 +0000 (18:14 +0000)]
b=16551
i=Nathan
skip replay/recovery tests if remote MDS/OSS with nodsh
lost test_27 fix
yury [Thu, 23 Oct 2008 18:06:01 +0000 (18:06 +0000)]
b=17323
r=tappro
- small fix from Mike
kalpak [Thu, 23 Oct 2008 10:01:46 +0000 (10:01 +0000)]
b=12800
o=alex.zhuravlev
i=kalpak
i=adilger
Add support for tunable preallocation window and new tunables for large/small requests
yury [Thu, 23 Oct 2008 09:42:20 +0000 (09:42 +0000)]
b=17447
r=adilger,deen
- missed lustre_put_lsi() in couple of places;
- do not call deregister_mount() in mount error path; this makes it impossible for MDT to do put_mount() and thus its lsi is left unreleased;
- fixes error handling with llog_setup/llog_cleanup in couple of places;
- fixes error handling after hash_init errors in class_setup();
- cleanups.
anserper [Wed, 22 Oct 2008 21:12:11 +0000 (21:12 +0000)]
Branch b1_6
b=17302
i=Johann Lombardi
i=ZhiYong Tian
fix error output messages
grev [Wed, 22 Oct 2008 19:16:40 +0000 (19:16 +0000)]
b=15711
i=Johann
recovery-small exception for FAILURE_MODE=HARD and mixed ost devices
grev [Wed, 22 Oct 2008 19:00:16 +0000 (19:00 +0000)]
b=17442
i=Nikita
do_nodes fix for case when list contains a single node $HOSTNAME
grev [Wed, 22 Oct 2008 11:55:15 +0000 (11:55 +0000)]
b=16551 (att 19784)
o=Adilger
i=grev
skip replay/recovery tests if remote MDS/OSS with nodsh
b=16551 (att 19835)
i=Nathan
do skip_rem[mds|ost] check only if test suite is run; fail acc-sm if
test suites were skipped due to nodsh
b=17326
i=Alexey.Lyashkov
remove now-useless remount/reconfig for liblustre;
always add accept=all for lnet module
robert.read [Tue, 21 Oct 2008 22:36:59 +0000 (22:36 +0000)]
Branch b1_6
b=1819
i=adilger
i=nathan
Add an import file to the osc, mdc, and mgc proc dir,
and include test for new proc file.
yury [Tue, 21 Oct 2008 16:57:25 +0000 (16:57 +0000)]
- roll back invalid changes in sanity.sh
yury [Tue, 21 Oct 2008 16:52:41 +0000 (16:52 +0000)]
b=17323
r=adilger,johann
- handle log_cancel resent correctly;
- some cleanups in llog.
yury [Tue, 21 Oct 2008 15:21:17 +0000 (15:21 +0000)]
b=17353
r=wangdi,shadow
- fixes killing of live objects on OST during recovery due to a wrong logid being added to the catalog;
- some cleanups.
tianzy [Mon, 20 Oct 2008 07:48:21 +0000 (07:48 +0000)]
Branch b1_6
fix the test_19 of sanity-quota.sh
b=14909
i=johann
adilger [Fri, 17 Oct 2008 22:53:44 +0000 (22:53 +0000)]
Branch b1_6
Fix autoconf messages.
adilger [Fri, 17 Oct 2008 22:51:32 +0000 (22:51 +0000)]
Branch b1_6
Add in OBD_CONNECT flags from b1_8 so that they are not mistakenly used for
something else.
adilger [Fri, 17 Oct 2008 22:04:34 +0000 (22:04 +0000)]
Branch b1_6
Quiet printf format warning.
adilger [Fri, 17 Oct 2008 21:05:02 +0000 (21:05 +0000)]
Branch b1_6
Use $RM macro for portability.
Make it more clear when sub-makes are finished.
grev [Fri, 17 Oct 2008 19:12:34 +0000 (19:12 +0000)]
b=16551
i=Adilger
fix for remote [mds|ost] with nodsh
yangsheng [Fri, 17 Oct 2008 14:53:37 +0000 (14:53 +0000)]
Branch b1_6
b=17357
i=johann, shadow, bobijam
Reset rep_swap_mask to prevent confusion after resend.
grev [Fri, 17 Oct 2008 12:05:33 +0000 (12:05 +0000)]
b=16551
i=Adilger
fix for remote [mds|ost] with nodsh
grev [Fri, 17 Oct 2008 11:13:11 +0000 (11:13 +0000)]
b=16551
i=Adilger
fix for remote [mds|ost] with nodsh
grev [Fri, 17 Oct 2008 11:03:17 +0000 (11:03 +0000)]
b=16551
i=Adilger
fix for remote [mds|ost] with nodsh
grev [Fri, 17 Oct 2008 10:30:48 +0000 (10:30 +0000)]
b=15266
i=Brian
create machinefile on $TMP, cleanup machinefile
grev [Fri, 17 Oct 2008 10:27:34 +0000 (10:27 +0000)]
b=15266
i=Brian
create machinefile on $TMP, cleanup machinefile
liuy [Fri, 17 Oct 2008 09:47:09 +0000 (09:47 +0000)]
Branch HEAD
b=12521
To avoid extent lock conflicts, if avail_cb_nodes < stripe_count*CO,
avail_cb_nodes should divide (stripe_count*CO) exactly. So that each OST
can be accessed by one or more constant clients.
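The divisibility rule in this commit can be illustrated with a short sketch. `demo_adjust_cb_nodes()` is a hypothetical helper, and choosing the largest divisor not above the available count is an assumption of this example. The log only states that avail_cb_nodes should divide stripe_count*CO exactly when it is smaller than that product.

```c
#include <assert.h>

/*
 * Return an aggregator count that divides stripe_count * co exactly.
 * If enough nodes are available (avail_cb_nodes >= stripe_count * co),
 * the count is returned unchanged, since the log does not cover that case.
 */
int demo_adjust_cb_nodes(int avail_cb_nodes, int stripe_count, int co)
{
    int total;
    int n;

    assert(avail_cb_nodes > 0 && stripe_count > 0 && co > 0);

    total = stripe_count * co;
    if (avail_cb_nodes >= total)
        return avail_cb_nodes;

    /* Walk down to the largest divisor of total; 1 always divides. */
    for (n = avail_cb_nodes; n > 1; n--) {
        if (total % n == 0)
            break;
    }
    return n;
}
```

For example, with stripe_count = 4 and CO = 2, five available nodes are reduced to four, so each OST is served by a fixed set of clients.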
yangsheng [Fri, 17 Oct 2008 07:45:00 +0000 (07:45 +0000)]
Branch b1_6
b=17151
i=nathan, adilger
Patch provided by LLNL.
Validate ptlrpc body checksum before swabbing.
huanghua [Thu, 16 Oct 2008 16:58:48 +0000 (16:58 +0000)]
Branch HEAD
b=17403
i=adilger
i=yury.umanets
create objects in correct directory on OST.
yangsheng [Thu, 16 Oct 2008 08:27:54 +0000 (08:27 +0000)]
Branch HEAD
b=17199
i=johann, bobijam
Patch provided by LLNL.
Skip dumping log if panic_on_lbug is set.
ericm [Thu, 16 Oct 2008 04:15:39 +0000 (04:15 +0000)]
build setting for b_hd_sptlrpc.
bobijam [Thu, 16 Oct 2008 01:51:51 +0000 (01:51 +0000)]
Branch b1_6
b=17038
i=johann
regression test case for getxattr upon symlink file.
bobijam [Thu, 16 Oct 2008 01:41:11 +0000 (01:41 +0000)]
Branch HEAD
b=17038
i=johann
regression test case for getxattr upon symlink file.
anserper [Wed, 15 Oct 2008 22:40:59 +0000 (22:40 +0000)]
Branch b1_6
b=17302
i=Johann Lombardi
i=ZhiYong Tian
pass QFMT through qc_id to be compatible with older Lustre versions
grev [Wed, 15 Oct 2008 17:11:01 +0000 (17:11 +0000)]
b=14471
i=Adilger
replace lustre proc by lctl [set|get]_param
grev [Wed, 15 Oct 2008 16:13:21 +0000 (16:13 +0000)]
b=14471
i=Adilger
replace lustre proc by lctl [set|get]_param
grev [Wed, 15 Oct 2008 15:33:53 +0000 (15:33 +0000)]
b=12599
i=Nathan
fix obsolete run_one CLEANUP
liuy [Wed, 15 Oct 2008 15:18:11 +0000 (15:18 +0000)]
Branch HEAD
b=12521
- set Lustre hints (except striping hints) "anywhere"
- perform collective I/O for interleaving, no matter how big the req size is
- keep the semantic information of cb_nodes
- perform collective I/O by the same client if the whole file access portion
is no bigger than stripe size and cb_nodes is changed by the user
- fix some bugs in the error handling
- remove redundant code
grev [Wed, 15 Oct 2008 15:04:40 +0000 (15:04 +0000)]
b=12599
i=Nathan
fix obsolete run_one CLEANUP
grev [Wed, 15 Oct 2008 14:46:43 +0000 (14:46 +0000)]
b=16932
i=Johann
skip some tests if there are several ost services on oss node
grev [Wed, 15 Oct 2008 14:42:26 +0000 (14:42 +0000)]
b=16932
i=Johann
skip some tests if there are several ost services on oss node
adilger [Wed, 15 Oct 2008 07:57:11 +0000 (07:57 +0000)]
Branch b1_6
Backport warning fixes from b1_8.
i=girish (original patch)
i=robert
adilger [Wed, 15 Oct 2008 07:46:15 +0000 (07:46 +0000)]
Branch b1_6
Remove shadow variable, quiet use-before-free.
yury [Wed, 15 Oct 2008 07:36:32 +0000 (07:36 +0000)]
- somebody declared @err twice in lov_destroy().
anserper [Tue, 14 Oct 2008 19:26:50 +0000 (19:26 +0000)]
Branch HEAD
b=17152
i=Johann Lombardi
i=Alexey Lyashkov
Take additional references to lov while operating over it
anserper [Tue, 14 Oct 2008 19:14:58 +0000 (19:14 +0000)]
Branch b1_6
b=17152
i=Johann Lombardi
i=Alexey Lyashkov
Take additional references to lov while operating over it
ericm [Tue, 14 Oct 2008 18:59:51 +0000 (18:59 +0000)]
branch: HEAD
do not repost buffer before all requests are finished.
b=17228
r=wangdi
r=nathan
isaac [Tue, 14 Oct 2008 17:28:01 +0000 (17:28 +0000)]
b=13490,i=maxim:
- fix credit flow deadlock in uptllnd.