Whamcloud - gitweb
phil [Sun, 21 Dec 2003 07:46:39 +0000 (07:46 +0000)]
b=2425
Jacob reported that when MDS/OST recovery requires new objects to be
created, the OST throws an assertion.
Bug 2425 remains open to track the creation of many more tests for
missing MDS/OST recovery cases.
phil [Sun, 21 Dec 2003 07:41:47 +0000 (07:41 +0000)]
Remove pesky $Id tag which only causes conflicts
rread [Fri, 19 Dec 2003 19:45:29 +0000 (19:45 +0000)]
b=2353
r=shaver
Delete IOC_CONNECT,DISCONNECT and use obd_self_export instead
of creating connections for lctl. Also delete the IOC_DEVICE command
and make the ioctl interface stateless. The lctl probe command is now
a noop, and lctl device is still used to set the device, although the
current device state is only saved in lctl now, and not the kernel.
shaver [Fri, 19 Dec 2003 14:17:00 +0000 (14:17 +0000)]
b=2420: don't acquire a duplicate lock when processing a resent GETATTR, just
grab the dchild directly and sample the data. Fixes recovery-small.sh.
r=phik,buffalo
alex [Fri, 19 Dec 2003 11:16:11 +0000 (11:16 +0000)]
- tcp_sendpage_zccd() must be exported always
phil [Thu, 18 Dec 2003 10:21:23 +0000 (10:21 +0000)]
b=2383
Stop taking a PR lock in mds_readpage; a PR is already held by the
client, so if there is a PW in the queue, deadlock will result. Just
assume that the client has a lock.
phil [Thu, 18 Dec 2003 09:45:44 +0000 (09:45 +0000)]
Print the service name in the mds RECOVERY: message
zab [Thu, 18 Dec 2003 04:13:42 +0000 (04:13 +0000)]
b=2252
r=adilger
(didn't see regressions in buffalo, confirmed read throughput increases
with sf and fpp multi-node IOR)
This cleans up llite's readpage path and implements our own read-ahead window
that hangs off of ll_file_data. The broad goal is to keep a fair amount of
read-ahead pages issued and queued which can be fired off into read rpcs as
read-ahead rpcs are completed.
zab [Thu, 18 Dec 2003 03:59:08 +0000 (03:59 +0000)]
- put llite page cache pages in a list_head for the duration
of their stay in the page cache. This lets us display the contents
of the page cache via llite/*/dump_pgcache file. This was done as part
of b=2252 and is being committed separately from the read-ahead work.
adilger [Wed, 17 Dec 2003 19:49:18 +0000 (19:49 +0000)]
Silence bogus compiler warning.
adilger [Wed, 17 Dec 2003 19:48:16 +0000 (19:48 +0000)]
We can never hit the end of mds_finish_open() with a non-zero error code
because we exit early on error, so the mds_destroy_mfd() is bogus. I left
RETURN(rc) in case things change in the future though.
We don't use request_body() anywhere inside mds_put_write_access(), but
since all of that code is just commented out I didn't do a real cleanup.
Just a bogus compiler warning fixed.
ericm [Wed, 17 Dec 2003 13:42:47 +0000 (13:42 +0000)]
file dir.c was initially added on branch b_eq.
zab [Wed, 17 Dec 2003 00:04:24 +0000 (00:04 +0000)]
- move the osc histogram helpers into lprocfs and rename accordingly
- export brw histograms from the filter that record discontiguous offsets
in the brw request and discontiguous blocks that satisfy the request
(seen as /proc/fs/lustre/obdfilter/$name/brw_stats)
zab [Tue, 16 Dec 2003 22:13:03 +0000 (22:13 +0000)]
- get rid of some ancient unused left-overs
green [Tue, 16 Dec 2003 17:46:23 +0000 (17:46 +0000)]
r=zab,phil
Fix for bug 974. Also adds a test to check for OOM (modified script from
bug 1135), fixes to sanity.sh's test 45 to obtain a grant (closes 2387).
phil [Tue, 16 Dec 2003 17:01:07 +0000 (17:01 +0000)]
b=1557/2316
Back out patch from bug 1557, because it causes the crash described in
bug 2316.
alex [Mon, 15 Dec 2003 20:42:09 +0000 (20:42 +0000)]
- large kernel address space support against vanilla-2.4.22
ericm [Mon, 15 Dec 2003 12:03:33 +0000 (12:03 +0000)]
file sanity.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:32 +0000 (12:03 +0000)]
file echo_test.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:31 +0000 (12:03 +0000)]
file Makefile.am was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:30 +0000 (12:03 +0000)]
file test_lock_cancel.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:29 +0000 (12:03 +0000)]
file test_common.h was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:28 +0000 (12:03 +0000)]
file test_common.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:27 +0000 (12:03 +0000)]
file replay_single.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:26 +0000 (12:03 +0000)]
file recovery_small.c was initially added on branch b_eq.
green [Mon, 15 Dec 2003 10:36:15 +0000 (10:36 +0000)]
Implement saving of previous value of max_dirty_mb, as suggested by Andreas
tianying [Mon, 15 Dec 2003 06:22:42 +0000 (06:22 +0000)]
b: 2356
r: Andreas and Phil
To increase the mount count of mds.
phil [Mon, 15 Dec 2003 06:14:25 +0000 (06:14 +0000)]
change debug_client_off from 0 to the minimal but still useful 0x3f0400
phil [Mon, 15 Dec 2003 04:39:38 +0000 (04:39 +0000)]
- fix iopentest*.c to produce error messages with filenames
- remove sanity test 55
green [Sun, 14 Dec 2003 22:05:30 +0000 (22:05 +0000)]
Whoops, the just-added test for #2319 was a bit flawed and failed for no good reason
green [Sun, 14 Dec 2003 21:39:09 +0000 (21:39 +0000)]
r=shaver
fix for #2319: allocate osic separately and implement proper
refcounting for it.
Also adds a test to sanity.sh that checks for (fixed) crash.
green [Sun, 14 Dec 2003 17:42:07 +0000 (17:42 +0000)]
r=phik
fix for #2348
alex [Sun, 14 Dec 2003 12:49:50 +0000 (12:49 +0000)]
- xattr-related fixes against chaos-2.4.21
phil [Sun, 14 Dec 2003 05:15:10 +0000 (05:15 +0000)]
fix "empty case at end of compound statement" warning in newer GCCs
phil [Sun, 14 Dec 2003 03:59:16 +0000 (03:59 +0000)]
change default debug level to a more reasonable production setting
phil [Sun, 14 Dec 2003 02:50:28 +0000 (02:50 +0000)]
b=2371
Updated the BUILDING file, to at least remove the lies, and point
people at more helpful documentation
phil [Sat, 13 Dec 2003 06:10:30 +0000 (06:10 +0000)]
ignore generated files
phil [Sat, 13 Dec 2003 04:28:44 +0000 (04:28 +0000)]
b=2368
fix a useless error message
alex [Fri, 12 Dec 2003 17:11:48 +0000 (17:11 +0000)]
- chaos-2.4.21 series against 2.4.21-p4smp-12chaos
wangchao [Fri, 12 Dec 2003 06:43:41 +0000 (06:43 +0000)]
b=1792
r=Chris
add sanity test for "iopen_connect_dentry() on already-connected dentry"
adilger [Fri, 12 Dec 2003 01:38:33 +0000 (01:38 +0000)]
Fix path to include lctl (was already in PATH at LLNL).
adilger [Fri, 12 Dec 2003 00:20:51 +0000 (00:20 +0000)]
Allow sanityN.sh to run with a zconf-mounted setup.
Be more verbose about what the specific error is.
adilger [Fri, 12 Dec 2003 00:17:08 +0000 (00:17 +0000)]
Make ONLY=setup not do cleanup at the end, while we use replay-dual.sh as
a proxy for mount2.sh.
zab [Fri, 12 Dec 2003 00:01:42 +0000 (00:01 +0000)]
- silence trivial unused variable warning
adilger [Thu, 11 Dec 2003 22:33:40 +0000 (22:33 +0000)]
Add lock-order regression test.
b=1844
zab [Thu, 11 Dec 2003 20:06:24 +0000 (20:06 +0000)]
- fix up rc = typo spotted by adilger
zab [Thu, 11 Dec 2003 19:04:49 +0000 (19:04 +0000)]
b=2339
filter_precreate() was setting the oid returned based on the last_id for the
requested object group, but was always creating objects in group 0 by virtue of
passing NULL in as the obdo to the _next_id functions. In the process of
fixing this we stop NULLing out the obdo in the loop and get rid of the
_setattr() and obdo_from_inode() which are artifacts from when the client
performed obd_create().
Also some cleanup_phase beautification.
wangdi [Thu, 11 Dec 2003 08:30:50 +0000 (08:30 +0000)]
b:2316 Save the owner of f_op before replacing it with the llite special file operations
wangchao [Thu, 11 Dec 2003 08:29:09 +0000 (08:29 +0000)]
a trivial fix to add description for lfs commands
wangchao [Thu, 11 Dec 2003 02:19:12 +0000 (02:19 +0000)]
b=1135
r=Andreas
Add a regression test script to test OST out-of-space.
ccooper [Thu, 11 Dec 2003 00:01:27 +0000 (00:01 +0000)]
- ignore write_disjoint
alex [Wed, 10 Dec 2003 23:26:10 +0000 (23:26 +0000)]
- kernel_text_address patch against chaos-2.4.18 series
alex [Wed, 10 Dec 2003 21:40:11 +0000 (21:40 +0000)]
- list_for_each_entry_safe(), list_move() and list_move_tail() have been added
alex [Wed, 10 Dec 2003 19:10:15 +0000 (19:10 +0000)]
- list_for_each_entry() added
niu [Wed, 10 Dec 2003 10:13:55 +0000 (10:13 +0000)]
b: 1991
r: Peter
lfs catinfo <keyword>
Fetch log information from the client node. Keywords currently include
config and deletions; others will be added in the future.
wangchao [Wed, 10 Dec 2003 09:51:51 +0000 (09:51 +0000)]
b=2237
A small fix. We should use 0 instead of 1 as the stripe_start parameter, because OST indices start at 0. With only one OST, 1 will fail.
wangchao [Wed, 10 Dec 2003 07:05:25 +0000 (07:05 +0000)]
b=2237
r=phil
lstripe should fail when offset > numobd
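A rough sketch of the kind of range check the two b=2237 entries describe. The helper name and the exact bound are assumptions for illustration, not the lstripe code: the entry says "offset > numobd", while this sketch rejects any index outside 0..ost_count-1 because OST indices are 0-based.

```c
#include <errno.h>

/* Hypothetical helper (not taken from lstripe): validate a requested
 * starting OST index against the number of configured OSTs.
 * Indices are 0-based, so with one OST the only valid index is 0. */
int check_stripe_start(int stripe_start, int ost_count)
{
    if (stripe_start < 0 || stripe_start >= ost_count)
        return -EINVAL;
    return 0;
}
```

With a single OST, check_stripe_start(0, 1) succeeds and check_stripe_start(1, 1) returns -EINVAL, which is the failure the stripe_start fix works around.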
wangdi [Wed, 10 Dec 2003 03:23:30 +0000 (03:23 +0000)]
Do the endian conversion on the constant instead of the variable, per Andreas's advice (bug 1989)
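The entry above does not show the code, so the following is only an illustration of the general idea. In the kernel, cpu_to_le32() applied to a constant is folded at compile time, so comparing a little-endian field against a converted constant avoids swapping the field at run time. REC_MAGIC, to_le32() and magic_matches() are hypothetical names; to_le32() is a portable userspace stand-in for cpu_to_le32().

```c
#include <stdint.h>
#include <string.h>

#define REC_MAGIC 0x10645539u   /* hypothetical magic value */

/* Portable stand-in for the kernel's cpu_to_le32(): returns a value whose
 * in-memory bytes are the little-endian encoding of v. */
static uint32_t to_le32(uint32_t v)
{
    uint8_t b[4];
    uint32_t out;

    b[0] = (uint8_t)(v & 0xff);
    b[1] = (uint8_t)((v >> 8) & 0xff);
    b[2] = (uint8_t)((v >> 16) & 0xff);
    b[3] = (uint8_t)((v >> 24) & 0xff);
    memcpy(&out, b, sizeof(out));
    return out;
}

/* Convert the constant, not the field: the little-endian field is compared
 * as-is and is never byte-swapped. */
int magic_matches(uint32_t magic_le)
{
    return magic_le == to_le32(REC_MAGIC);
}
```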
zab [Wed, 10 Dec 2003 02:03:24 +0000 (02:03 +0000)]
b=2230
Allocation failures during heavy bulk IO load were causing timeouts. Using
GFP_NOFS throughout lustre, and in particular instead of 0 as sk->allocation,
is our most recent attempt to appease the VM. Make lots of noise if you see
allocation failures or deadlocks involving threads waiting for memory.
niu [Wed, 10 Dec 2003 01:51:01 +0000 (01:51 +0000)]
b: 1988
r: Andreas
Make log record alignment 8 bytes.
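For reference, rounding a record length up to an 8-byte boundary is usually written as below. The function name is hypothetical and this is not the llog code itself.

```c
#include <stddef.h>

/* Hypothetical helper: round len up to the next multiple of 8.
 * Works because 8 is a power of two. */
size_t rec_len_align8(size_t len)
{
    return (len + 7) & ~(size_t)7;
}
```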
niu [Wed, 10 Dec 2003 01:36:14 +0000 (01:36 +0000)]
b: 2226
r: Phil
Remove all orphans on the OSTs during MDS startup, and set last_id correctly.
radhika [Tue, 9 Dec 2003 19:39:56 +0000 (19:39 +0000)]
The newly added "jt_llog_check" function was not declared here.
zab [Tue, 9 Dec 2003 18:37:20 +0000 (18:37 +0000)]
- bring the filter_survey script up to date with recent lctl interface changes
phil [Tue, 9 Dec 2003 16:26:16 +0000 (16:26 +0000)]
b=2330
Add sanity test #62 for obd_match error checking, to avoid regression
wangdi [Tue, 9 Dec 2003 13:00:35 +0000 (13:00 +0000)]
add llog_check, and add removal of the catalog's logs to llog_remove r:peter
wangchao [Tue, 9 Dec 2003 11:42:17 +0000 (11:42 +0000)]
b=2284
r=Robert
scsi support for dev_read_only
wangchao [Tue, 9 Dec 2003 04:14:25 +0000 (04:14 +0000)]
b=2284
r=Robert
scsi support for dev_read_only
phil [Mon, 8 Dec 2003 15:22:47 +0000 (15:22 +0000)]
b=2321
Fix two rare exit paths which will leak an l_lock() reference:
- an allocation failure in ldlm_server_blocking_ast
- an unlikely race condition in ldlm_resource_add_lock
I blame the latter for the problem reported in bug 2321.
phil [Mon, 8 Dec 2003 14:48:36 +0000 (14:48 +0000)]
Fix confusing MDC error message
zab [Fri, 5 Dec 2003 23:51:27 +0000 (23:51 +0000)]
- bring the generic_hweight32 x86_64 insmod fix over from b_eq
zab [Fri, 5 Dec 2003 20:28:01 +0000 (20:28 +0000)]
b=2330
minor state cleanup from matching error return paths
phil [Fri, 5 Dec 2003 17:46:41 +0000 (17:46 +0000)]
b=2334
A slight reorganization of ll_intent_release, so we can drop the MDS
lock early.
phil [Fri, 5 Dec 2003 15:18:18 +0000 (15:18 +0000)]
b=2334
Break cyclic locking deadlock by dropping the MDC read lock before we
take the OSC read lock during getattr intents
shaver [Fri, 5 Dec 2003 14:45:23 +0000 (14:45 +0000)]
b=1897: use the rpcd to send closes, so that we can resend in the case of a
reconnect after user interruption, and avoid leaking an open-count.
Also, allocate repmsg _before_ reconstructing a close into it.
r=phik
phil [Fri, 5 Dec 2003 11:49:50 +0000 (11:49 +0000)]
b=2313
My fix to bug 2313 accidentally created a lot of noise by returning
non-zero return codes when multiple clients had a file open for write.
phil [Fri, 5 Dec 2003 11:01:10 +0000 (11:01 +0000)]
I am very stupid. I put the extra debugging code in the wrong path.
phil [Fri, 5 Dec 2003 09:31:53 +0000 (09:31 +0000)]
b=2306
r=alex
Replace i_sem with BKL in ext3_fsfilt_write_record
phil [Fri, 5 Dec 2003 05:35:10 +0000 (05:35 +0000)]
b=2333
Fix i_sem/journal inversion in mds_client_add, which was never updated
when we decided to re-order these a few months ago. This became much
easier to hit after we fixed bug 2306.
phil [Fri, 5 Dec 2003 05:33:12 +0000 (05:33 +0000)]
b=2330
Be more careful about the return codes from obd_match, lest we try to
cancel a lock which was never granted.
phil [Fri, 5 Dec 2003 03:20:24 +0000 (03:20 +0000)]
b=1505
r=shaver
Print a much more meaningful error when a client is rejected because a
service node is waiting for recoverable clients.
phil [Fri, 5 Dec 2003 03:18:46 +0000 (03:18 +0000)]
b=2313
r=shaver
This bug happens when a file is opened twice for write, then both close it
at the same time. If they both drop the writecount, then race to
compare it against 0, one will free the fsdata and the other will assert.
This looks like a big patch, but it's mostly plumbing. I had to do some
different argument passing, in order to keep everything protected under the
same lock.
I removed the writecount spinlock, and use the epoch semaphore for all three
things: management of the epoch, protection of the writecount, and atomicity of
writecount modifications which result in allocation or freeing of the fsdata.
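A minimal userspace sketch of the pattern described above, under stated assumptions: a pthread mutex stands in for the epoch semaphore, and struct file_state and put_write_access() are hypothetical names, not the MDS code. The point is that the decrement, the comparison against 0, and the free all happen under one lock, so two closers cannot both observe a zero writecount.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-file state; 'lock' stands in for the epoch semaphore. */
struct file_state {
    pthread_mutex_t lock;
    int writecount;
    void *fsdata;
};

/* Drop one write reference. Returns 1 if this caller freed fsdata,
 * 0 otherwise. The decrement and the test against 0 are not separable
 * by another thread because both run with the lock held. */
int put_write_access(struct file_state *fs)
{
    int freed = 0;

    pthread_mutex_lock(&fs->lock);
    if (--fs->writecount == 0) {
        free(fs->fsdata);
        fs->fsdata = NULL;
        freed = 1;
    }
    pthread_mutex_unlock(&fs->lock);
    return freed;
}
```

With two writers, the first close returns 0 and leaves fsdata in place; only the second close frees it.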
phil [Fri, 5 Dec 2003 03:15:19 +0000 (03:15 +0000)]
b=2313
r=shaver
This bug happens when a file is opened twice for write, then both close it
at the same time. If they both drop the writecount, then race to
compare it against 0, one will free the fsdata and the other will assert.
This looks like a big patch, but it's mostly plumbing. I had to do some
different argument passing, in order to keep everything protected under the
same lock.
I removed the writecount spinlock, and use the epoch semaphore for all three
things: management of the epoch, protection of the writecount, and atomicity of
writecount modifications which result in allocation or freeing of the fsdata.
alex [Thu, 4 Dec 2003 20:17:51 +0000 (20:17 +0000)]
- suse-2.4.21 builds on x86_64 now
rread [Thu, 4 Dec 2003 20:12:45 +0000 (20:12 +0000)]
- more error checking and less verbosity for insanity
- fixed shell brainos preventing client failures from working
shaver [Thu, 4 Dec 2003 20:07:30 +0000 (20:07 +0000)]
b=2329: move osc_rpcd into ptlrpc as ptlrpcd, for non-OSC applications. Largely
mechanical, plus a tiny Makefile.am cleanup in ptlrpc.
r=zab
alex [Thu, 4 Dec 2003 17:58:26 +0000 (17:58 +0000)]
- tcp_sendpage_zccd() must be exported always
alex [Thu, 4 Dec 2003 17:48:22 +0000 (17:48 +0000)]
- tcp_sendpage_zccd() must be exported always
eeb [Thu, 4 Dec 2003 14:00:06 +0000 (14:00 +0000)]
* Added ENOMEM detection and retry on socknal sends
niu [Thu, 4 Dec 2003 08:18:25 +0000 (08:18 +0000)]
r: TianYing
close all open logs before the user changes logs via lctl, and reopen them
after the lctl operation finishes.
rread [Thu, 4 Dec 2003 05:25:40 +0000 (05:25 +0000)]
- insanity cleanups.
- Call the right function to shutdown the osts
- just sleep when powering off the machine.
- use checkstat, instead of ls -ld
phil [Thu, 4 Dec 2003 04:55:28 +0000 (04:55 +0000)]
force D_OTHER on for the duration of the ldlm_namespace_dump, then restore
phil [Thu, 4 Dec 2003 04:41:32 +0000 (04:41 +0000)]
b=2328
r=shaver
- Make sure that all locks which have been marked as receiving a blocking AST
are eventually added to the waiting list, to evict badly-behaving clients.
- If a service node times out waiting for a lock, dump the namespace
to the log, no more than once every 5 minutes
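The "no more than once every 5 minutes" limit in the second item can be sketched as a simple time gate. The names and the use of time_t are assumptions for illustration; the actual patch may track this differently.

```c
#include <time.h>

#define DUMP_INTERVAL_SECS (5 * 60)

/* Hypothetical gate: returns 1 if a namespace dump is allowed at 'now'
 * and arms the next allowed time; returns 0 if the last dump was less
 * than DUMP_INTERVAL_SECS ago. */
int should_dump_namespace(time_t now, time_t *next_dump)
{
    if (now < *next_dump)
        return 0;
    *next_dump = now + DUMP_INTERVAL_SECS;
    return 1;
}
```

A caller would keep next_dump in long-lived state (initially 0) and call the gate each time a lock timeout is detected.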
phil [Thu, 4 Dec 2003 02:56:32 +0000 (02:56 +0000)]
when kernel_thread fails, print the return code instead of 0 (or nothing)
niu [Thu, 4 Dec 2003 01:10:02 +0000 (01:10 +0000)]
b: 2226
r: Phil
Set the correct last object id when cleaning up orphans during MDS setup.
alex [Wed, 3 Dec 2003 22:35:57 +0000 (22:35 +0000)]
- fix against wrong lock order in fsfilt_ext3_write_record()
bug 2306
shaver [Wed, 3 Dec 2003 21:39:49 +0000 (21:39 +0000)]
Instrumentation for reproducing and verifying 1897 (open-count leaked if close
is interrupted on the client). r=robert.
shaver [Wed, 3 Dec 2003 21:24:55 +0000 (21:24 +0000)]
file llmount-upcall.sh was initially added on branch b_devel.
phil [Wed, 3 Dec 2003 10:58:26 +0000 (10:58 +0000)]
b=2322
In ldlm_process_{plain,extent}_lock, we used to remove and re-add the
lock to the waiting list after every -ERESTART loop. But because of
the logic in the ldlm_*_compat_queue functions, in a very rare case,
this could lead to lock re-ordering and subsequent deadlock.
phil [Wed, 3 Dec 2003 09:39:55 +0000 (09:39 +0000)]
Indentation
phil [Wed, 3 Dec 2003 05:31:34 +0000 (05:31 +0000)]
b=1844
Andreas's patch to fix MDS lock inversions in getattr/reint paths.
I'm giving it one more day to bake on ALC before I commit to the 1.0.x
branch.