Whamcloud - gitweb
jerrifer [Wed, 7 Jan 2004 04:39:25 +0000 (04:39 +0000)]
update config
eeb [Mon, 5 Jan 2004 13:35:01 +0000 (13:35 +0000)]
* Fixed type of lib_finalize()'s 'status' parameter
ericm [Tue, 30 Dec 2003 08:25:50 +0000 (08:25 +0000)]
merge HEAD to b_eq: tag 20031230
ericm [Mon, 22 Dec 2003 12:29:21 +0000 (12:29 +0000)]
TCPNAL:
array out-of-bounds in tcpnal_send().
eeb [Fri, 19 Dec 2003 13:58:28 +0000 (13:58 +0000)]
* PtlMDUnlink() can no longer return PTL_MD_INUSE, since it commits the MD
for destruction. If no network I/O is current at the time, a
PTL_EVENT_UNLINK event is created.
* The 'unlinked_me' field of an event has been replaced by a simple flag
'unlinked' that is set if the event signals the destruction of the MD.
* Events have a new 'status' field. This is PTL_OK on successful
completion, and any other portals errno on completion with failure.
CWARN() messages in these callbacks log abnormal completion.
* All event callbacks changed to handle the UNLINK event, completion
status and unlinked flag.
* All abnormal completions changed to work with PtlMDUnlink() and the new
callbacks.
* Removed bd_complete from ptlrpc_bulk_desc and added bd_success.
Communications have completed when bd_network_rw gets cleared. If
bd_success is set, then bd_nob_transferred tells you how much data
was sent/received.
* Changed MDS and OST bulk completion to deal with failed bulk transfers.
The PtlBD server just LASSERTS things went OK, so we can be reminded to
implement better error handling there too.
* ptlrpc_wake_client_req() inline helper.
* Changed the lib/NAL interface as follows....
. cb_callback() is optional and defaults to calling the event queue's
callback if it is left NULL.
. cb_read(), cb_write(), cb_map(), cb_map_pages(), return PTL_OK on
success and another portals errno on failure.
. cb_send(), cb_send_pages(), cb_recv(), cb_recv_pages() return PTL_OK
if and only if they can commit to calling lib_finalize() when the
relevant message completes (possibly with error).
. cb_send(), cb_send_pages(), cb_recv(), cb_recv_pages() may not modify
the iovec/ptl_kiov_t they are passed, and must do I/O on the
subsection of this scatter/gather buffer starting at 'offset' for
'mlen' bytes. This greatly simplifies portals lib level descriptor
management at minimal expense to the NAL.
. portals lib now exports lib_extract_iov() and lib_extract_kiov(), and the
other iov helpers take an additional 'offset' parameter, to simplify
offset buffer coding in the NAL.
. lib_parse() is void (i.e. returns no value).
. lib_finalize() takes an additional ptl_errno_t completion status.
...note that NALs other than qswnal and socknal need to have these
changes implemented properly and tested.
* Swapped some loose fprintf()s for CERROR()
* Dropped PORTAL_SLAB_ALLOC(); portals just uses PORTAL_ALLOC() now.
Since there are no slabs now, I also changed #ifdef PTL_USE_SLAB_CACHE
to #ifndef PTL_USE_LIB_FREELIST
* Changed lib_msg_alloc() so it is _never_ called with the statelock held,
just like all the other allocators.
* Changed dynamic MD allocation to size the MD by the number of fragments.
* Dropped a bunch of dross, plus the iovs from lib_msg_t so they become
tiny again.
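
The event changes described in this commit can be sketched roughly as
follows. This is a minimal illustration, not the actual Lustre or Portals
code: every demo_* type, field and function is a hypothetical stand-in
(demo_bulk_desc_t loosely mirrors the bd_network_rw, bd_success and
bd_nob_transferred fields named above), and the real headers differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Portals types; the real headers differ. */
typedef enum { DEMO_PTL_OK = 0, DEMO_PTL_FAIL = 1 } demo_errno_t;
typedef enum { DEMO_EVENT_PUT, DEMO_EVENT_UNLINK } demo_event_type_t;

typedef struct {
    demo_event_type_t type;
    demo_errno_t      status;   /* DEMO_PTL_OK on success, an error otherwise */
    int               unlinked; /* set if the event signals MD destruction */
    size_t            mlength;  /* bytes moved; meaningful only on success */
} demo_event_t;

typedef struct {
    int    network_rw;      /* cleared once communications have completed */
    int    success;         /* set only if the transfer completed OK */
    size_t nob_transferred; /* valid only when success is set */
} demo_bulk_desc_t;

/* Event callback: handles completion status, UNLINK and the unlinked flag. */
void demo_bulk_callback(const demo_event_t *ev, demo_bulk_desc_t *desc)
{
    assert(ev != NULL && desc != NULL);

    /* Only a successful data event records how much was transferred. */
    if (ev->type != DEMO_EVENT_UNLINK && ev->status == DEMO_PTL_OK) {
        desc->success = 1;
        desc->nob_transferred = ev->mlength;
    }

    /* The MD is gone, so no further network I/O can touch the buffers. */
    if (ev->unlinked)
        desc->network_rw = 0;
}
```

A waiter in this sketch would sleep until network_rw is cleared and only
then look at success, never at the byte count alone.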
cvs2svn [Wed, 17 Dec 2003 13:42:50 +0000 (13:42 +0000)]
This commit was manufactured by cvs2svn to create branch 'unlabeled-1.1.4'.
green [Tue, 16 Dec 2003 17:46:23 +0000 (17:46 +0000)]
r=zab,phil
Fix for bug 974. Also adds a test to check for OOM (modified script from
bug 1135), fixes to sanity.sh's test 45 to obtain a grant (closes 2387).
phil [Tue, 16 Dec 2003 17:01:07 +0000 (17:01 +0000)]
b=1557/2316
Back out patch from bug 1557, because it causes the crash described in
bug 2316.
alex [Mon, 15 Dec 2003 20:42:09 +0000 (20:42 +0000)]
- large kernel address space support against vanilla-2.4.22
ericm [Mon, 15 Dec 2003 12:31:19 +0000 (12:31 +0000)]
liblustre:
use noinst_LIBRARIES instead of lib_LIBRARIES for the immediate libraries
ericm [Mon, 15 Dec 2003 12:03:33 +0000 (12:03 +0000)]
file sanity.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:32 +0000 (12:03 +0000)]
file echo_test.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:31 +0000 (12:03 +0000)]
file Makefile.am was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:30 +0000 (12:03 +0000)]
file test_lock_cancel.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:29 +0000 (12:03 +0000)]
file test_common.h was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:28 +0000 (12:03 +0000)]
file test_common.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:27 +0000 (12:03 +0000)]
file replay_single.c was initially added on branch b_eq.
ericm [Mon, 15 Dec 2003 12:03:26 +0000 (12:03 +0000)]
file recovery_small.c was initially added on branch b_eq.
green [Mon, 15 Dec 2003 10:36:15 +0000 (10:36 +0000)]
Implement saving of previous value of max_dirty_mb, as suggested by Andreas
tianying [Mon, 15 Dec 2003 06:22:42 +0000 (06:22 +0000)]
b: 2356
r: Andreas and Phil
To increase the mount count of mds.
phil [Mon, 15 Dec 2003 06:14:25 +0000 (06:14 +0000)]
change debug_client_off from 0 to the minimal but still useful 0x3f0400
phil [Mon, 15 Dec 2003 04:39:38 +0000 (04:39 +0000)]
- fix iopentest*.c to produce error messages with filenames
- remove sanity test 55
green [Sun, 14 Dec 2003 22:05:30 +0000 (22:05 +0000)]
Whoops, the just-added test for #2319 was a bit flawed and failed for no good reason
green [Sun, 14 Dec 2003 21:39:09 +0000 (21:39 +0000)]
r=shaver
fix for #2319: allocate osic separately and implement proper
refcounting for it.
Also adds a test to sanity.sh that checks for (fixed) crash.
green [Sun, 14 Dec 2003 17:42:07 +0000 (17:42 +0000)]
r=phik
fix for #2348
alex [Sun, 14 Dec 2003 12:49:50 +0000 (12:49 +0000)]
- xattr-related fixes against chaos-2.4.21
phil [Sun, 14 Dec 2003 05:15:10 +0000 (05:15 +0000)]
fix "empty case at end of compound statement" warning in newer GCCs
phil [Sun, 14 Dec 2003 03:59:16 +0000 (03:59 +0000)]
change default debug level to a more reasonable production setting
phil [Sun, 14 Dec 2003 02:50:28 +0000 (02:50 +0000)]
b=2371
Updated the BUILDING file, to at least remove the lies, and point
people at more helpful documentation
phil [Sat, 13 Dec 2003 06:10:30 +0000 (06:10 +0000)]
ignore generated files
phil [Sat, 13 Dec 2003 04:28:44 +0000 (04:28 +0000)]
b=2368
fix a useless error message
alex [Fri, 12 Dec 2003 17:11:48 +0000 (17:11 +0000)]
- chaos-2.4.21 series against 2.4.21-p4smp-12chaos
ericm [Fri, 12 Dec 2003 07:57:08 +0000 (07:57 +0000)]
liblustre:
fix: the -fPIC flag in portals/utils/Makefile.am confused automake. The
reason is still not fully understood; it looks like an automake bug in
handling this case. As a workaround, the ptlctl library is given a
different name in the kernel and liblustre builds.
wangchao [Fri, 12 Dec 2003 06:43:41 +0000 (06:43 +0000)]
b=1792
r=Chris
add sanity test for "iopen_connect_dentry() on already-connected dentry"
adilger [Fri, 12 Dec 2003 01:38:33 +0000 (01:38 +0000)]
Fix path to include lctl (was already in PATH at LLNL).
adilger [Fri, 12 Dec 2003 00:20:51 +0000 (00:20 +0000)]
Allow sanityN.sh to run with a zconf-mounted setup.
Be more verbose about what the specific error is.
adilger [Fri, 12 Dec 2003 00:17:08 +0000 (00:17 +0000)]
Make ONLY=setup not do cleanup at the end, while we use replay-dual.sh as
a proxy for mount2.sh.
zab [Fri, 12 Dec 2003 00:01:42 +0000 (00:01 +0000)]
- silence trivial unused variable warning
adilger [Thu, 11 Dec 2003 22:33:40 +0000 (22:33 +0000)]
Add lock-order regression test.
b=1844
zab [Thu, 11 Dec 2003 20:06:24 +0000 (20:06 +0000)]
- fix up rc = typo spotted by adilger
zab [Thu, 11 Dec 2003 19:04:49 +0000 (19:04 +0000)]
b=2339
filter_precreate() was setting the oid returned based on the last_id for the
requested object group, but was always creating objects in group 0 by virtue of
passing NULL in as the obdo to the _next_id functions. In the process of
fixing this we stop NULLing out the obdo in the loop and get rid of the
_setattr() and obdo_from_inode() which are artifacts from when the client
performed obd_create().
Also some cleanup_phase beautification.
wangdi [Thu, 11 Dec 2003 08:30:50 +0000 (08:30 +0000)]
b:2316 Save the owner of f_op before replacing it with the llite special file operations
wangchao [Thu, 11 Dec 2003 08:29:09 +0000 (08:29 +0000)]
a trivial fix to add description for lfs commands
wangchao [Thu, 11 Dec 2003 02:19:12 +0000 (02:19 +0000)]
b=1135
r=Andreas
Add a regression test script to test OST out-of-space.
ccooper [Thu, 11 Dec 2003 00:01:27 +0000 (00:01 +0000)]
- ignore write_disjoint
alex [Wed, 10 Dec 2003 23:26:10 +0000 (23:26 +0000)]
- kernel_text_address patch against chaos-2.4.18 series
alex [Wed, 10 Dec 2003 21:40:11 +0000 (21:40 +0000)]
- list_for_each_entry_safe(), list_move() and list_move_tail() have been added
alex [Wed, 10 Dec 2003 19:10:15 +0000 (19:10 +0000)]
- list_for_each_entry() added
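
For readers unfamiliar with the helper, the idea behind list_for_each_entry()
can be sketched in userspace as below. This is an illustrative
reimplementation, not the kernel patch itself: all demo_* names are
hypothetical, and __typeof__ is a GCC/Clang extension.

```c
#include <stddef.h>

/* Minimal doubly linked list head, embedded in each element. */
struct demo_list_head { struct demo_list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Iterate over the enclosing structures rather than the raw list heads. */
#define demo_list_for_each_entry(pos, head, member)                        \
    for (pos = demo_container_of((head)->next, __typeof__(*pos), member);  \
         &pos->member != (head);                                           \
         pos = demo_container_of(pos->member.next, __typeof__(*pos), member))

void demo_list_init(struct demo_list_head *h)
{
    h->next = h->prev = h;
}

void demo_list_add_tail(struct demo_list_head *n, struct demo_list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

struct demo_item {
    int value;
    struct demo_list_head link;
};

/* Sum the values of every demo_item linked on 'head'. */
int demo_sum(struct demo_list_head *head)
{
    struct demo_item *it;
    int sum = 0;

    demo_list_for_each_entry(it, head, link)
        sum += it->value;
    return sum;
}
```

The sketch does not cover the _safe variant; removing the current entry
inside this loop would break the iteration.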
niu [Wed, 10 Dec 2003 10:13:55 +0000 (10:13 +0000)]
b: 1991
r: Peter
lfs catinfo <keyword>
Fetch log information from a client node. Keywords currently include
config and deletions; others will be added in the future.
wangchao [Wed, 10 Dec 2003 09:51:51 +0000 (09:51 +0000)]
b=2237
a small fix. We should use 0 instead of 1 as the stripe_start parameter, because OST numbering starts at 0. If we have only one OST, 1 will fail.
wangchao [Wed, 10 Dec 2003 07:05:25 +0000 (07:05 +0000)]
b=2237
r=phil
lstripe should fail when offset > numobd
wangdi [Wed, 10 Dec 2003 03:23:30 +0000 (03:23 +0000)]
Do endian conversion on the constant instead of the variable, per Andreas's advice (bug 1989)
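
One reading of this change, sketched with hypothetical names (DEMO_MAGIC,
DEMO_SWAB32 and demo_check_magic() are illustrative only, not identifiers
from the patch): byte-swapping a constant lets the compiler do the swap at
build time, so the run-time check is a plain comparison against the raw
field.

```c
#include <stdint.h>

/* Byte-swap a 32-bit value; with a constant argument this folds at
 * compile time, so no swap is executed at run time. */
#define DEMO_SWAB32(x)                            \
    ((((uint32_t)(x) & 0x000000ffu) << 24) |      \
     (((uint32_t)(x) & 0x0000ff00u) <<  8) |      \
     (((uint32_t)(x) & 0x00ff0000u) >>  8) |      \
     (((uint32_t)(x) & 0xff000000u) >> 24))

#define DEMO_MAGIC         0x12345678u            /* hypothetical magic */
#define DEMO_MAGIC_SWABBED DEMO_SWAB32(DEMO_MAGIC)

/* Returns 0 if 'raw' is in native byte order, 1 if it is byte-swapped,
 * and -1 if it is not the magic at all. The raw field is never swapped. */
int demo_check_magic(uint32_t raw)
{
    if (raw == DEMO_MAGIC)
        return 0;
    if (raw == DEMO_MAGIC_SWABBED)
        return 1;
    return -1;
}
```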
zab [Wed, 10 Dec 2003 02:03:24 +0000 (02:03 +0000)]
b=2230
Allocation failures during heavy bulk IO load were causing timeouts. Using
GFP_NOFS throughout lustre, and in particular instead of 0 as sk->allocation,
is our most recent attempt to appease the VM. Make lots of noise if you see
allocation failures or deadlocks involving threads waiting for memory.
niu [Wed, 10 Dec 2003 01:51:01 +0000 (01:51 +0000)]
b: 1988
r: Andreas
Make log record alignment 8 bytes.
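
Rounding a record length up to an 8-byte boundary can be sketched as
follows; demo_log_rec_size() and DEMO_LOG_ALIGN are hypothetical names,
not the helpers used in the patch.

```c
#include <stddef.h>

#define DEMO_LOG_ALIGN 8u   /* alignment taken from the commit message */

/* Round 'len' up to the next multiple of DEMO_LOG_ALIGN. The mask trick
 * works only because the alignment is a power of two. */
size_t demo_log_rec_size(size_t len)
{
    return (len + DEMO_LOG_ALIGN - 1) & ~(size_t)(DEMO_LOG_ALIGN - 1);
}
```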
niu [Wed, 10 Dec 2003 01:36:14 +0000 (01:36 +0000)]
b: 2226
r: Phil
Remove all orphans on the OST during MDS startup, and set last_id correctly.
radhika [Tue, 9 Dec 2003 19:39:56 +0000 (19:39 +0000)]
The newly added "jt_llog_check" function was not declared here.
zab [Tue, 9 Dec 2003 18:37:20 +0000 (18:37 +0000)]
- bring the filter_survey script up to date with recent lctl interface changes
phil [Tue, 9 Dec 2003 16:26:16 +0000 (16:26 +0000)]
b=2330
Add sanity test #62 for obd_match error checking, to avoid regression
wangdi [Tue, 9 Dec 2003 13:00:35 +0000 (13:00 +0000)]
add llog_check, and remove the catalog's logs in llog_remove r:peter
wangchao [Tue, 9 Dec 2003 11:42:17 +0000 (11:42 +0000)]
b=2284
r=Robert
scsi support for dev_read_only
wangchao [Tue, 9 Dec 2003 04:14:25 +0000 (04:14 +0000)]
b=2284
r=Robert
scsi support for dev_read_only
phil [Mon, 8 Dec 2003 15:22:47 +0000 (15:22 +0000)]
b=2321
Fix two rare exit paths which will leak an l_lock() reference:
- an allocation failure in ldlm_server_blocking_ast
- an unlikely race condition in ldlm_resource_add_lock
I blame the latter for the problem reported in bug 2321.
phil [Mon, 8 Dec 2003 14:48:36 +0000 (14:48 +0000)]
Fix confusing MDC error message
zab [Fri, 5 Dec 2003 23:51:27 +0000 (23:51 +0000)]
- bring the generic_hweight32 x86_64 insmod fix over from b_eq
zab [Fri, 5 Dec 2003 20:28:01 +0000 (20:28 +0000)]
b=2330
minor state cleanup from matching error return paths
phil [Fri, 5 Dec 2003 17:46:41 +0000 (17:46 +0000)]
b=2334
A slight reorganization of ll_intent_release, so we can drop the MDS
lock early.
phil [Fri, 5 Dec 2003 15:18:18 +0000 (15:18 +0000)]
b=2334
Break cyclic locking deadlock by dropping the MDC read lock before we
take the OSC read lock during getattr intents
shaver [Fri, 5 Dec 2003 14:45:23 +0000 (14:45 +0000)]
b=1897: use the rpcd to send closes, so that we can resend in the case of a
reconnect after user interruption, and avoid leaking an open-count.
Also, allocate repmsg _before_ reconstructing a close into it.
r=phik
phil [Fri, 5 Dec 2003 11:49:50 +0000 (11:49 +0000)]
b=2313
My fix to bug 2313 accidentally created a lot of noise by returning
non-zero return codes when multiple clients had a file open for write.
phil [Fri, 5 Dec 2003 11:01:10 +0000 (11:01 +0000)]
I am very stupid. I put the extra debugging code in the wrong path.
phil [Fri, 5 Dec 2003 09:31:53 +0000 (09:31 +0000)]
b=2306
r=alex
Replace i_sem with BKL in ext3_fsfilt_write_record
phil [Fri, 5 Dec 2003 05:35:10 +0000 (05:35 +0000)]
b=2333
Fix i_sem/journal inversion in mds_client_add, which was never updated
when we decided to re-order these a few months ago. This became much
easier to hit after we fixed bug 2306.
phil [Fri, 5 Dec 2003 05:33:12 +0000 (05:33 +0000)]
b=2330
Be more careful about the return codes from obd_match, lest we try to
cancel a lock which was never granted.
phil [Fri, 5 Dec 2003 03:20:24 +0000 (03:20 +0000)]
b=1505
r=shaver
Print a much more meaningful error when a client is rejected because a
service node is waiting for recoverable clients.
phil [Fri, 5 Dec 2003 03:18:46 +0000 (03:18 +0000)]
b=2313
r=shaver
This bug happens when a file is opened twice for write, then both close it
at the same time. If they both drop the writecount, then race to
compare it against 0, one will free the fsdata and the other will assert.
This looks like a big patch, but it's mostly plumbing. I had to do some
different argument passing, in order to keep everything protected under the
same lock.
I removed the writecount spinlock, and use the epoch semaphore for all three
things: management of the epoch, protection of the writecount, and atomicity of
writecount modifications which result in allocation or freeing of the fsdata.
phil [Fri, 5 Dec 2003 03:15:19 +0000 (03:15 +0000)]
b=2313
r=shaver
This bug happens when a file is opened twice for write, then both close it
at the same time. If they both drop the writecount, then race to
compare it against 0, one will free the fsdata and the other will assert.
This looks like a big patch, but it's mostly plumbing. I had to do some
different argument passing, in order to keep everything protected under the
same lock.
I removed the writecount spinlock, and use the epoch semaphore for all three
things: management of the epoch, protection of the writecount, and atomicity of
writecount modifications which result in allocation or freeing of the fsdata.
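
The shape of the fix described in the two commits above can be sketched in
userspace like this. It is an illustration under stated assumptions, not
the MDS code: struct demo_file and the demo_* functions are hypothetical,
and a pthread mutex stands in for the epoch semaphore.

```c
#include <pthread.h>
#include <stdlib.h>

struct demo_file {
    pthread_mutex_t lock;   /* one lock, standing in for the epoch semaphore */
    int   writecount;       /* number of open-for-write handles */
    void *fsdata;           /* allocated by first writer, freed by last */
};

void demo_file_init(struct demo_file *f)
{
    pthread_mutex_init(&f->lock, NULL);
    f->writecount = 0;
    f->fsdata = NULL;
}

/* Returns 0 on success, -1 if the fsdata allocation fails. */
int demo_open_write(struct demo_file *f)
{
    int rc = 0;

    pthread_mutex_lock(&f->lock);
    if (f->writecount == 0) {
        f->fsdata = malloc(64);
        if (f->fsdata == NULL)
            rc = -1;
    }
    if (rc == 0)
        f->writecount++;
    pthread_mutex_unlock(&f->lock);
    return rc;
}

void demo_close_write(struct demo_file *f)
{
    /* The decrement, the comparison against 0 and the free all happen
     * under the same lock, so two closers cannot both see the count
     * change and then race on the fsdata. */
    pthread_mutex_lock(&f->lock);
    if (--f->writecount == 0) {
        free(f->fsdata);
        f->fsdata = NULL;
    }
    pthread_mutex_unlock(&f->lock);
}
```

With separate locks for the count and the fsdata, the window between the
decrement and the comparison is exactly where the race described above
would sit.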
alex [Thu, 4 Dec 2003 20:17:51 +0000 (20:17 +0000)]
- suse-2.4.21 builds on x86_64 now
rread [Thu, 4 Dec 2003 20:12:45 +0000 (20:12 +0000)]
- more error checking and less verbosity for insanity
- fixed shell brainos preventing client failures from working
shaver [Thu, 4 Dec 2003 20:07:30 +0000 (20:07 +0000)]
b=2329: move osc_rpcd into ptlrpc as ptlrpcd, for non-OSC applications. Largely
mechanical, plus a tiny Makefile.am cleanup in ptlrpc.
r=zab
alex [Thu, 4 Dec 2003 17:58:26 +0000 (17:58 +0000)]
- tcp_sendpage_zccd() must be exported always
alex [Thu, 4 Dec 2003 17:48:22 +0000 (17:48 +0000)]
- tcp_sendpage_zccd() must be exported always
eeb [Thu, 4 Dec 2003 15:06:50 +0000 (15:06 +0000)]
* merged HEAD
eeb [Thu, 4 Dec 2003 14:00:06 +0000 (14:00 +0000)]
* Added ENOMEM detection and retry on socknal sends
eeb [Thu, 4 Dec 2003 11:06:57 +0000 (11:06 +0000)]
* Merged HEAD
* "clobbered" (as Phil would say) kernel_patches/patches
niu [Thu, 4 Dec 2003 08:18:25 +0000 (08:18 +0000)]
r: TianYing
close all open logs before the user changes logs via lctl, and reopen them
after the lctl operation has finished.
ericm [Thu, 4 Dec 2003 06:29:55 +0000 (06:29 +0000)]
tcpnal:
since we switched to the new timeout mechanism, the old hack code can
be removed now.
ericm [Thu, 4 Dec 2003 05:55:53 +0000 (05:55 +0000)]
tcpnal:
the nal thread calls select() to block while waiting for incoming packets,
but in 2 cases (1. a new connection is created, 2. tcpnal shuts down) the
upper thread needs to wake the nal thread from sleep immediately.
here we use a local socket, monitored by the same select(), to notify the
nal thread to wake up. this again brings unclean code into tcpnal.
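
The wakeup technique this commit describes can be sketched as below. The
commit only says "a local socket"; using socketpair() here is an
assumption, and struct demo_waker and the demo_waker_* functions are
hypothetical names, not tcpnal code.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

struct demo_waker {
    int rd;   /* end watched by the blocking thread's select() */
    int wr;   /* end written by the thread that wants to wake it */
};

int demo_waker_init(struct demo_waker *w)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    w->rd = sv[0];
    w->wr = sv[1];
    return 0;
}

/* Called by the upper thread: one byte is enough to make select() return. */
int demo_waker_kick(struct demo_waker *w)
{
    char b = 1;

    return write(w->wr, &b, 1) == 1 ? 0 : -1;
}

/* Called by the blocking thread. Returns 1 if woken by a kick, 0 on
 * timeout and -1 on error. A real loop would add its network sockets
 * to the same fd_set. */
int demo_waker_wait(struct demo_waker *w, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    char b;
    int rc;

    FD_ZERO(&rfds);
    FD_SET(w->rd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(w->rd + 1, &rfds, NULL, NULL, &tv);
    if (rc <= 0)
        return rc;
    /* Drain the byte so the next select() blocks again. */
    if (FD_ISSET(w->rd, &rfds) && read(w->rd, &b, 1) == 1)
        return 1;
    return -1;
}
```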
ericm [Thu, 4 Dec 2003 05:25:51 +0000 (05:25 +0000)]
tcpnal:
originally tcpnal came with PtEQWait_timeout(), which was implemented with
longjmp. It shows some problems with pthreads, and does not work at all
on Opteron machines.
Now use the pthread's internal timer to do the timeout, which also brings
some unclean things between portals and tcpnal, but that is OK for the moment.
rread [Thu, 4 Dec 2003 05:25:40 +0000 (05:25 +0000)]
- insanity cleanups.
- Call the right function to shutdown the osts
- just sleep when powering off the machine.
- use checkstat, instead of ls -ld
phil [Thu, 4 Dec 2003 04:55:28 +0000 (04:55 +0000)]
force D_OTHER on for the duration of the ldlm_namespace_dump, then restore
phil [Thu, 4 Dec 2003 04:41:32 +0000 (04:41 +0000)]
b=2328
r=shaver
- Make sure that all locks which have been marked as receiving a blocking AST
are eventually added to the waiting list, to evict badly-behaving clients.
- If a service node times out waiting for a lock, dump the namespace
to the log, no more than once every 5 minutes
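
The "no more than once every 5 minutes" limit could look roughly like the
following. demo_should_dump() and DEMO_DUMP_INTERVAL are hypothetical
names, and treating a zero timestamp as "never dumped" is an assumption of
this sketch, not something stated in the commit.

```c
#include <time.h>

#define DEMO_DUMP_INTERVAL (5 * 60)   /* seconds between namespace dumps */

/* Returns 1 and records 'now' if a dump is allowed, 0 if the previous
 * dump was less than DEMO_DUMP_INTERVAL seconds ago. A *last_dump of 0
 * means no dump has happened yet. */
int demo_should_dump(time_t now, time_t *last_dump)
{
    if (*last_dump != 0 && now - *last_dump < DEMO_DUMP_INTERVAL)
        return 0;
    *last_dump = now;
    return 1;
}
```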
phil [Thu, 4 Dec 2003 02:56:32 +0000 (02:56 +0000)]
when kernel_thread fails, print the return code instead of 0 (or nothing)
niu [Thu, 4 Dec 2003 01:10:02 +0000 (01:10 +0000)]
b: 2226
r: Phil
Set the correct last object id when cleaning up orphans during MDS setup.
alex [Wed, 3 Dec 2003 22:35:57 +0000 (22:35 +0000)]
- fix against wrong lock order in fsfilt_ext3_write_record()
bug 2306
shaver [Wed, 3 Dec 2003 21:39:49 +0000 (21:39 +0000)]
Instrumentation for reproducing and verifying 1897 (open-count leaked if close
is interrupted on the client). r=robert.
shaver [Wed, 3 Dec 2003 21:24:55 +0000 (21:24 +0000)]
file llmount-upcall.sh was initially added on branch b_devel.
phil [Wed, 3 Dec 2003 10:58:26 +0000 (10:58 +0000)]
b=2322
In ldlm_process_{plain,extent}_lock, we used to remove and re-add the
lock to the waiting list after every -ERESTART loop. But because of
the logic in the ldlm_*_compat_queue functions, in a very rare case,
this could lead to lock re-ordering and subsequent deadlock.
phil [Wed, 3 Dec 2003 09:39:55 +0000 (09:39 +0000)]
Indentation
phil [Wed, 3 Dec 2003 05:31:34 +0000 (05:31 +0000)]
b=1844
Andreas's patch to fix MDS lock inversions in getattr/reint paths.
I'm giving it one more day to bake on ALC before I commit to the 1.0.x
branch.
phil [Wed, 3 Dec 2003 05:12:52 +0000 (05:12 +0000)]
land 1.0.1 fixes on main development branch (head)