Whamcloud - gitweb
ericm [Thu, 4 Dec 2003 06:29:55 +0000 (06:29 +0000)]
tcpnal:
since we switched to the new timeout mechanism, the old hack code can be
removed now.
ericm [Thu, 4 Dec 2003 05:55:53 +0000 (05:55 +0000)]
tcpnal:
the nal thread calls select() to block while waiting for incoming packets,
but in 2 cases (1. new connection created 2. tcpnal shutdown) the upper
thread needs to wake the nal thread from sleep immediately.
here we use a local socket, monitored by the same select(), to notify the
nal thread to wake up. this again brings unclean code into tcpnal.
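A minimal sketch of the wake-up technique described above, in Python rather than the C of tcpnal. It is not the tcpnal code: the function name, the one-byte "Q" shutdown code, and the stand-in data socket are all assumptions made for illustration.

```python
import select
import socket
import threading


def nal_thread(data_sock, wake_sock, events):
    """Block in select() until a packet or a wake-up byte arrives."""
    while True:
        readable, _, _ = select.select([data_sock, wake_sock], [], [])
        if wake_sock in readable:
            reason = wake_sock.recv(1)
            events.append(("wakeup", reason))
            if reason == b"Q":  # hypothetical code meaning "shut down"
                return
        if data_sock in readable:
            events.append(("data", data_sock.recv(4096)))


# Local socket pair: the upper thread writes to wake_tx, select() watches wake_rx.
wake_rx, wake_tx = socket.socketpair()
# Stand-in for a real peer connection; nothing is sent on it in this demo.
data_rx, data_tx = socket.socketpair()

events = []
worker = threading.Thread(target=nal_thread, args=(data_rx, wake_rx, events))
worker.start()
wake_tx.send(b"Q")  # upper thread: wake the sleeping thread and ask it to exit
worker.join(timeout=5)

for s in (wake_rx, wake_tx, data_rx, data_tx):
    s.close()
```

The sleeping thread needs no polling interval: it returns from select() as soon as the upper thread writes a byte to the local socket.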
ericm [Thu, 4 Dec 2003 05:25:51 +0000 (05:25 +0000)]
tcpnal:
originally tcpnal came with PtEQWait_timeout(), which was implemented with
longjmp. It shows some problems with pthreads, and doesn't work at all
on Opteron machines.
Now use the pthread's internal timer to do the timeout, which also brings some
unclean things between portals and tcpnal, but it's ok at this moment.
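A sketch of one way a timed wait can be built on the thread library's own timer instead of longjmp, assuming the mechanism is a timed condition-variable wait (the commit does not say which primitive is used). The EventQueue class and its methods are hypothetical names, written in Python for brevity.

```python
import threading
from collections import deque


class EventQueue:
    """Toy event queue whose wait() gives up after a timeout."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()

    def post(self, event):
        with self._cond:
            self._events.append(event)
            self._cond.notify()

    def wait(self, timeout):
        """Return the next event, or None if `timeout` seconds pass first."""
        with self._cond:
            # wait_for() re-checks the predicate after every wake-up and
            # returns a false value if the timeout expires first.
            if not self._cond.wait_for(lambda: self._events, timeout=timeout):
                return None
            return self._events.popleft()


eq = EventQueue()
timed_out = eq.wait(0.05)   # nothing posted yet
eq.post("put_end")
delivered = eq.wait(0.05)   # event is already queued
```

The timeout is handled entirely inside the wait call, so no signal handler or non-local jump is involved.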
zab [Wed, 3 Dec 2003 01:25:18 +0000 (01:25 +0000)]
- oops, put _CFLAGS = -fPIC crap in if LIBLUSTRE conditionals. the
portals/utils instance is commented out until someone can explain why
(horrible waste of time) automake changes behaviour with it uncommented in
a !LIBLUSTRE build
zab [Tue, 2 Dec 2003 19:48:37 +0000 (19:48 +0000)]
- use -fPIC to build liblustre to stop the .os from containing relocations
that x86_64 can't put in .sos
- don't include ldlm *and* ptlrpc libs in liblustre.{a,so}
zab [Fri, 28 Nov 2003 23:04:59 +0000 (23:04 +0000)]
get x86_64 kernel-side lustre building and inserting.
- __init/exit aren't used in declarations, only definitions.
- x86_64 doesn't have a working hweight32, apparently.
- really only do the stack business in the i386 arch build.
phil [Fri, 28 Nov 2003 22:12:53 +0000 (22:12 +0000)]
merge b_devel into b_eq:
- fixes from alex, tian, eeb
- small differences in opinion
phil [Fri, 28 Nov 2003 21:53:54 +0000 (21:53 +0000)]
merge part of b_eq to b_devel:
removes ptlrpc_pack_msg(), adds ptlrpc_pack_request() and ptlrpc_pack_reply()
phil [Fri, 28 Nov 2003 21:51:37 +0000 (21:51 +0000)]
more innocent b_eq to b_devel merging:
- updates to liblustre-specific files
- updates to #ifdef-ed code
- Makefiles et al
phil [Fri, 28 Nov 2003 08:12:52 +0000 (08:12 +0000)]
merge b_devel into b_eq
phil [Fri, 28 Nov 2003 06:54:24 +0000 (06:54 +0000)]
b=2254
At truncate time, ext3 is zeroing indirect blocks and writing a
transaction to the journal. At some point, perhaps when that
transaction commits, it gives the dirty buffer for that indirect block
to the buffer cache to write to disk. Meanwhile, having marked that
block as unused, it reallocates it to us as a normal data block into
which IOR puts some data.
The obdfilter writes that block with brw_kiovec, which writes
immediately to the disk with no regard for what data might be in the
buffer cache. Shortly thereafter, the buffer cache writes the block
of zeroes over top our valuable data.
The correct fix is to modify our special ext3 block allocation code to
look in the buffer cache for us, and discard any pending writes to the
newly-allocated blocks, much like the direct I/O code does. As a
workaround, for kernels which do not yet have this change, I added
some code to the obdfilter to do this after the call to
ext3_map_inode_page returns.
This introduces kernel version 32, but doesn't force an upgrade. I
updated the kernel patches for 2.4.18 and 2.4.20, but not 2.6. I also:
- tested the ext3_map_inode_page change on vanilla-2.4.20
- tested the workaround change on chaos-2.4.18
- compile-tested a version 32 chaos-2.4.18 kernel
phil [Thu, 27 Nov 2003 06:31:33 +0000 (06:31 +0000)]
Slightly less obvious, but still very innocent, parts of b_eq:
- code or includes in the main build, but contained in #if __KERNEL__
- userspace-only portals code
- very few and minor other changes, such as renaming a function
phil [Thu, 27 Nov 2003 06:01:11 +0000 (06:01 +0000)]
Innocent beginnings of the b_eq merge: the liblustre directory, which
accounts for almost half of the patch and doesn't build in kernelspace.
eeb [Wed, 26 Nov 2003 18:40:20 +0000 (18:40 +0000)]
* Added a generic facility for a callback to be called any time
l_wait_event() happens, to take the place of daemons in the kernel
implementation
* Changed liblustre_services() to use the new callback.
* Backed out the liblustre OSC rpcd changes, and converted that to use
the new callback too.
* Disabled mmap in alloc_pages() FTTB (it seems to screw up under linux;
dunno exactly why right now).
* Left some hacking utensils behind in tcpnal_send() FTTB, just to make
it easy to see what's getting sent.
* Added some error checking in tcpnal_send(); it will abort if something
screws up.
rread [Wed, 26 Nov 2003 01:26:49 +0000 (01:26 +0000)]
b=2296
r=shaver
- Build ldlm into ptlrpc.
- Remove all the ptlrpc_ldlm_* hooks
This almost certainly breaks building when $(SRDIR) != $(OBJDIR).
tianying [Tue, 25 Nov 2003 12:51:51 +0000 (12:51 +0000)]
1. move llog_origin_handle_cancel to llog_server.c
2. destroy useless plain logs during both llog_cleanup and llog_setup
3. change some CERRORs to CWARNs where they just print out status information
ericm [Tue, 25 Nov 2003 12:14:09 +0000 (12:14 +0000)]
liblustre: follow up last merge from b_devel.
ericm [Tue, 25 Nov 2003 11:34:23 +0000 (11:34 +0000)]
merge b_devel to b_eq:
20031125. kernel part only.
rread [Mon, 24 Nov 2003 23:57:01 +0000 (23:57 +0000)]
cleanup clients correctly in insanity.sh
set the mds upcall, so the mds can recover from ost failures
ericm [Mon, 24 Nov 2003 03:06:06 +0000 (03:06 +0000)]
tcpnal: use a mutex to protect tcpnal_write, since in the case of a bulk put
the nal_thread competes with the upper thread to write into the same socket.
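A small sketch of why the lock matters when two threads share one stream. The length-prefixed framing and the LockedWriter and read_messages names are assumptions for the example, not the tcpnal wire format.

```python
import io
import struct
import threading


class LockedWriter:
    """Serialise whole messages onto one shared stream."""

    def __init__(self, sink):
        self._sink = sink  # any object with write(bytes)
        self._lock = threading.Lock()

    def write_message(self, payload):
        # Header and payload go out under one lock hold, so a second
        # writer can never slip its bytes in between them.
        with self._lock:
            self._sink.write(struct.pack("!I", len(payload)))
            self._sink.write(payload)


def read_messages(data):
    """Split a byte string back into its length-prefixed payloads."""
    messages, offset = [], 0
    while offset < len(data):
        (length,) = struct.unpack_from("!I", data, offset)
        offset += 4
        messages.append(data[offset:offset + length])
        offset += length
    return messages


sink = io.BytesIO()  # stands in for the shared socket
writer = LockedWriter(sink)


def send_many(tag):
    for _ in range(50):
        writer.write_message(tag * 8)


threads = [threading.Thread(target=send_many, args=(tag,)) for tag in (b"A", b"B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

frames = read_messages(sink.getvalue())
```

Without the lock, one thread's header could be followed by the other thread's payload and the receiver would misparse the stream.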
adilger [Sat, 22 Nov 2003 19:32:05 +0000 (19:32 +0000)]
Add fsfilt_map_inode_page() instead of calling ext3_map_inode_page()
directly from filter_direct_io(). Update fsfilt_extN.c.
adilger [Fri, 21 Nov 2003 23:27:33 +0000 (23:27 +0000)]
Include add_page_private.patch in chaos kernel series.
alex [Fri, 21 Nov 2003 15:06:25 +0000 (15:06 +0000)]
- patch to fix 64-bit pointer arithmetic bug in the ext3 extended
attributes code.
rread [Fri, 21 Nov 2003 09:10:56 +0000 (09:10 +0000)]
move client_df to test-framework.
rread [Fri, 21 Nov 2003 08:42:35 +0000 (08:42 +0000)]
- integrate support for FAILURE_MODE into test-framework
- add configs for mdev
rread [Fri, 21 Nov 2003 03:27:13 +0000 (03:27 +0000)]
- cleanup old active files
- cleanup insanity test_0
- add checks to make sure the nodes are up during setup
rread [Fri, 21 Nov 2003 01:04:55 +0000 (01:04 +0000)]
add usage() to test_framework, and use getopts because getopt is broken
insanity fixes for hardware
shaver [Fri, 21 Nov 2003 00:19:11 +0000 (00:19 +0000)]
Verify data from a read that spans a failover.
shaver [Fri, 21 Nov 2003 00:05:31 +0000 (00:05 +0000)]
Verify data written by operations that span a failure.
rread [Thu, 20 Nov 2003 21:42:24 +0000 (21:42 +0000)]
Override the local_nid in the logs during zeroconf mount. If the
local_nid option is not specified, then mount.lustre will use
the hostname for socknal, or /proc/elan/device0/position for qswnal.
For nettype=elan, if remote_nid isn't set, mount.lustre will
attempt to parse the hostname and use the first series of numbers as
the nid. So, mdev20-eth1 will become elan id 20.
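The hostname rule above ("first series of numbers") can be sketched as follows. This is an illustration in Python, not the mount.lustre source; the function name is hypothetical.

```python
import re


def elan_nid_from_hostname(hostname):
    """Return the first run of digits in `hostname` as an integer NID."""
    match = re.search(r"\d+", hostname)
    if match is None:
        raise ValueError("no digits in hostname: %r" % hostname)
    return int(match.group())


nid = elan_nid_from_hostname("mdev20-eth1")  # the example from the commit message
```

Only the first digit run is used, so the trailing "1" in "eth1" is ignored.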
shaver [Thu, 20 Nov 2003 21:41:15 +0000 (21:41 +0000)]
Small test and upcall fixes; we fail over during a write now.
adilger [Thu, 20 Nov 2003 19:05:02 +0000 (19:05 +0000)]
Create a per-fs lock for obdfilter block allocation. This is held only
during actual block allocation and not during RPCs or writes. This allows
us to allocate contiguous chunks of disk (if available) up to the size of
each RPC, instead of interleaving block allocations.
It slows down writes in the contention case, because we might be holding
the lock while waiting for a bitmap or something to be loaded from disk,
and in the current 2.4 IO code reads-behind-lots-of-writes can be punishing.
We might benefit here and elsewhere from AKPM's read priority patch.
The big benefit is that at read time, or after some amount of create-delete
we don't have a maximally fragmented block allocation to deal with, which
causes pathological seeking on the disks.
b=2260
r=peter,phil
rread [Thu, 20 Nov 2003 07:42:54 +0000 (07:42 +0000)]
- Integrate insanity.sh with test-framework
- Centralize the config data to files in tests/cfg/
- Different config file can be used by setting CONFIG=file or
using command line option: --config <file>
- FAILURE_MODE option for insanity.sh:
FAILURE_MODE=SOFT - shutdown services with lconf --force
FAILURE_MODE=HARD - power down the service nodes using $POWER_DOWN
and $POWER_UP to reboot.
This feature should be integrated with the other tests.
- replay-single.sh, replay-dual.sh, and conf-sanity.sh all pass on hardware
- replay-ost-dual.sh and insanity.sh don't pass yet
rread [Thu, 20 Nov 2003 00:49:00 +0000 (00:49 +0000)]
- print the correct strings
rread [Tue, 18 Nov 2003 20:36:15 +0000 (20:36 +0000)]
- Remove last references to OBD_CLASS_UUID. Now the obd->obd_uuid is
used as the client uuid for self and lctl exports.
rread [Tue, 18 Nov 2003 09:52:36 +0000 (09:52 +0000)]
insanity.sh - this is test17 turned into a regular test, if you could
call this test regular. Still a work in progress, but does at least
setup/cleanup.
zab [Mon, 17 Nov 2003 18:46:30 +0000 (18:46 +0000)]
- give the user an error message when the dump file can't be opened
alex [Mon, 17 Nov 2003 17:32:25 +0000 (17:32 +0000)]
- kernel part of b1933
rread [Sat, 15 Nov 2003 01:49:00 +0000 (01:49 +0000)]
return true value from zconf_mount
rread [Sat, 15 Nov 2003 01:17:53 +0000 (01:17 +0000)]
b=2250
create lctl commands set_lustre_upcall and set_timeout, so they
can be saved as part of the 0conf log.
adilger [Fri, 14 Nov 2003 22:50:30 +0000 (22:50 +0000)]
Code is dead, dead, dead, and keeps showing up in my greps.
rread [Fri, 14 Nov 2003 22:32:26 +0000 (22:32 +0000)]
More test cleanups.
rread [Fri, 14 Nov 2003 22:24:19 +0000 (22:24 +0000)]
In osc_interpret_create, don't overwrite rc if it's already set.
adilger [Fri, 14 Nov 2003 20:12:23 +0000 (20:12 +0000)]
Interoperability for different PAGE_SIZE/wordsize machines. Tested on ia64
and i386 separately, and with ia64 client + i386 MDS/OST.
Mostly aligning structs to have 64-bit fields aligned on 64-bit boundaries.
Remove some VFS constants and replace them with Lustre constants on the wire.
Since the MDS doesn't really open files itself, we don't need to convert from
the wire constants back to local flags at all.
Frobbing of niobufs on the targets to split them into PAGE_SIZE chunks (this
may be a problem on large PAGE_SIZE servers with small PAGE_SIZE clients,
not sure yet).
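A rough sketch of the splitting step, assuming a buffer is described by a byte offset and length and that chunks must not cross a target page boundary. The function name and the (offset, length) representation are hypothetical, not the obdfilter data structures.

```python
def split_into_pages(offset, length, page_size):
    """Split [offset, offset + length) into chunks that stay within one page."""
    chunks = []
    end = offset + length
    while offset < end:
        # First byte of the page after the one containing `offset`.
        page_end = (offset // page_size + 1) * page_size
        chunk_len = min(end, page_end) - offset
        chunks.append((offset, chunk_len))
        offset += chunk_len
    return chunks


# A 16 KiB buffer from a large-page client, landing on a 4 KiB-page target.
aligned = split_into_pages(0, 16384, 4096)
# An unaligned buffer yields a short first chunk up to the page boundary.
unaligned = split_into_pages(1000, 5000, 4096)
```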
I have tested and this appears to be compatible with old filesystems.
At worst we should only need another --write_conf on the MDS.
b=686, b=1821, b=1343, b=2042
phil [Fri, 14 Nov 2003 09:24:33 +0000 (09:24 +0000)]
I touched one thing in lustre_mds.h, and I was dismayed to see the
entire source tree rebuild!
It took just a few minutes to remove a number of #includes which
violated all manner of abstraction boundary. ericm assures me that I
have not broken anything major in liblustre.
Also, I moved a good chunk of inline functions into llite_internal.h
(which is where I secretly believe that most of lustre_lite.h will end up),
and renamed ll_ino2fid (which no longer takes an inode) to mdc_pack_fid.
rread [Thu, 13 Nov 2003 19:22:32 +0000 (19:22 +0000)]
- don't crash if there are no options
eeb [Thu, 13 Nov 2003 18:49:21 +0000 (18:49 +0000)]
* Changed liblustre/libtest to call into lctl for interactive echo tests
* Changed some global names in the utilities to avoid conflicts
* Added set_ioc_handler() to allow portals/obd ioctl redirects to
the liblustre handler (as well as dumping to a file)
alex [Thu, 13 Nov 2003 15:06:14 +0000 (15:06 +0000)]
- kmem_cache_validate patch has been removed from all the series
NOTE: all the supported series still build
ericm [Thu, 13 Nov 2003 12:06:04 +0000 (12:06 +0000)]
opts could be NULL, cause segfault.
ericm [Thu, 13 Nov 2003 08:02:59 +0000 (08:02 +0000)]
liblustre: minor fix for last merge.
ericm [Thu, 13 Nov 2003 07:54:51 +0000 (07:54 +0000)]
again merge b_devel to b_eq
20031113
yesterday's merge brought in some nasty bugs.
tianying [Thu, 13 Nov 2003 05:56:59 +0000 (05:56 +0000)]
b:2215 - OSTs fetch unlink llog records from MDS post replay
1. add lop_connect and lop_precleanup to llog_operations
2. rename llog_obd_ctxt to llog_ctxt; llog_commit_data to llog_canceld_ctxt
3. split out llog functions in llog_client.c and llog_server.c and remove llogd.c
4. add one test-59 to sanity.sh to verify cancellation of llog records async
5. fix calling of mds_cleanup_orphans, add test-34 to replay-single.sh
6. fix some llog code
rread [Thu, 13 Nov 2003 01:52:01 +0000 (01:52 +0000)]
braino
rread [Thu, 13 Nov 2003 01:27:35 +0000 (01:27 +0000)]
- return replay-ost-single to runnable state
now just need to pass
- commonize the zconf mount/umount
rread [Thu, 13 Nov 2003 00:04:13 +0000 (00:04 +0000)]
- include pcfg's in dump_log
adilger [Wed, 12 Nov 2003 23:04:30 +0000 (23:04 +0000)]
Remove straggler .pc file.
rread [Wed, 12 Nov 2003 21:15:12 +0000 (21:15 +0000)]
- remove extra dec in the error cleanup case.
shaver [Wed, 12 Nov 2003 14:10:30 +0000 (14:10 +0000)]
b=2239
- rq_timeout halving for imp_server_timeout imps
- set sending_error in the resent-RPC case in check_set
- wake and error out if a create fails due to invalidated OSC
r=phik
ericm [Wed, 12 Nov 2003 11:03:26 +0000 (11:03 +0000)]
merge b_devel to b_eq:
20031112
for robert's final zeroconf code.
adilger [Wed, 12 Nov 2003 09:40:51 +0000 (09:40 +0000)]
Fix compile warning.
rread [Wed, 12 Nov 2003 06:18:47 +0000 (06:18 +0000)]
- add /sbin/mount.lustre to rpm
rread [Wed, 12 Nov 2003 06:16:30 +0000 (06:16 +0000)]
fix cut-n-paste error
rread [Tue, 11 Nov 2003 23:13:17 +0000 (23:13 +0000)]
Add new "lustre" fs type which supports only zeroconf mounts.
- old zeroconf client code removed from lustre-lite, and the lconf
--zeroconf option deleted
- common code factored out for ll/lustre fill_super and put_super
- lconf still uses lustre_lite (but not for long)
llmount will be used by mount for lustre filesystems if copied to
/sbin/mount.lustre:
mount -t lustre mds_host:/mds_service/profile /mnt/lustre
Multiple mounts of the same filesystem are supported.
Remove unused mds and filter nspath code.
eeb [Tue, 11 Nov 2003 16:45:19 +0000 (16:45 +0000)]
* removed some more build warnings
eeb [Tue, 11 Nov 2003 16:21:56 +0000 (16:21 +0000)]
* reduced the number of warnings for liblustre compilation
ericm [Tue, 11 Nov 2003 10:05:28 +0000 (10:05 +0000)]
merge b_devel to b_eq:
20031111, for recent kernel patches updates.
adilger [Mon, 10 Nov 2003 16:01:27 +0000 (16:01 +0000)]
Despite Zach's assurances, it seems that conditional_schedule() is a
low-latency RHism, so define a compat macro for non-RH systems.
b=2227
ericm [Mon, 10 Nov 2003 10:45:40 +0000 (10:45 +0000)]
liblustre: temporarily add accepter installation, for Jerrifer to build rpm.
ericm [Sat, 8 Nov 2003 15:22:46 +0000 (15:22 +0000)]
tcpnal: need to exchange NIDs when a tcp connection is created; ksocknal now
requires this by default.
rread [Fri, 7 Nov 2003 21:45:46 +0000 (21:45 +0000)]
transparently convert '-' to '_' in command line options.
so, for example, you can use --write-conf if you want
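A minimal sketch of the normalisation, assuming only the option name (not any "=value" part) should be rewritten. The helper name is hypothetical and this is not the lconf implementation.

```python
def normalize_option(arg):
    """Rewrite '-' to '_' in a long option's name, leaving its value alone."""
    if not arg.startswith("--"):
        return arg  # short options and positional arguments pass through
    name, sep, value = arg[2:].partition("=")
    return "--" + name.replace("-", "_") + sep + value


argv = ["--write-conf", "--node=mds-1", "-v", "config.xml"]
normalized = [normalize_option(a) for a in argv]
```

Leaving the value untouched matters for arguments such as host names, which may legitimately contain '-'.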
rread [Fri, 7 Nov 2003 20:58:06 +0000 (20:58 +0000)]
- fix typo so replay-single test_32 runs again.
rread [Fri, 7 Nov 2003 18:56:51 +0000 (18:56 +0000)]
i really should make this a compile time option, at least.
rread [Fri, 7 Nov 2003 18:52:22 +0000 (18:52 +0000)]
- remove use of "lctl device_list" from lconf, now using
/proc/fs/lustre/devices
rread [Fri, 7 Nov 2003 18:52:21 +0000 (18:52 +0000)]
b=2225 fix obd_self_export issues
ericm [Fri, 7 Nov 2003 15:50:56 +0000 (15:50 +0000)]
merge b_devel to b_eq:
20031107
kernel passed sanity, but liblustre is broken since the lconf changes. needs
a fix by robert.
sice [Fri, 7 Nov 2003 08:22:16 +0000 (08:22 +0000)]
Add define of LCONF LMC LCTL in replay-dual.sh replay-single.sh replay-ost-single.sh test-framework.sh
rread [Fri, 7 Nov 2003 06:19:32 +0000 (06:19 +0000)]
Remind the user to use 'cfg_device' to set target device for config commands.
rread [Wed, 5 Nov 2003 08:35:46 +0000 (08:35 +0000)]
- change NETWORKTYPE to NETTYPE, and make sure it can be overridden.
rread [Wed, 5 Nov 2003 05:57:47 +0000 (05:57 +0000)]
r=2152
replay-single.sh now supports multiple nodes. The script must be run
on the client, but the mds, mdsfailover, and ost can all be different
nodes. If mdsfailover_HOST is set, then the MDS service will be
failed between the two mds nodes.
uml1# PDSH="pdsh -w" mds_HOST=uml2 ost_HOST=uml3 mdsfailover_HOST=uml4
./replay-single.sh
It still runs on a single node as before.
phil [Wed, 5 Nov 2003 00:47:41 +0000 (00:47 +0000)]
b=1028
Andreas pointed out that we already have a function obdo_from_inode,
and that we might as well pack all valid fields, and let the client
take what it can.
phil [Wed, 5 Nov 2003 00:00:19 +0000 (00:00 +0000)]
b=1028
r=zab
Return the object's block count from write requests, and store it in
the client inode.
adilger [Mon, 3 Nov 2003 23:41:32 +0000 (23:41 +0000)]
Don't do setattr after transaction handle has been committed.
Combine size and timestamp setattrs, and update size under i_sem.
Use client timestamps instead of server timestamps for files.
rread [Mon, 3 Nov 2003 07:00:17 +0000 (07:00 +0000)]
return the osc create grow counts to 2000, which sadly doesn't work
very well on my machine.
alex [Sat, 1 Nov 2003 11:48:07 +0000 (11:48 +0000)]
- filter_finish_transno() moved into filter_direct_io()
- fsfilt_commit_async() is called unconditionally in filter_direct_io()
rread [Fri, 31 Oct 2003 19:39:27 +0000 (19:39 +0000)]
land zcfg on devel
- includes changes from b_llogging
- the MDS -> LOV connection is created using the MDS config log, so the
log must exist. Lconf will create the log automatically when
--reformat is used. To create the config logs on an existing
filesystem, run lconf on the MDS with --write_conf. This only needs to
be done on the MDS.
- LOV does not connect to the MDS during setup. Instead, the MDS and
MDC use obd_get_info("lovdesc") to get the stripe info. The LOVDESC
and LOVTGTS files on the MDS are no longer used (and not created
on new filesystems.)
- Zeroconf clients are now supported by lconf --zeroconf, and
replay-single.sh uses this to mount the client. The exact arguments
needed for zeroconf will be changing quickly, so don't use
lconf --zeroconf in other scripts yet. Instead, once the dust
clears, lconf will be changed to always do zeroconf mounts, and
eventually, once we don't need lconf anymore, we will start changing
the test scripts.
rread [Fri, 31 Oct 2003 18:50:49 +0000 (18:50 +0000)]
merge devel zcfg
rread [Fri, 31 Oct 2003 06:53:23 +0000 (06:53 +0000)]
merge devel zcfg
alex [Wed, 29 Oct 2003 22:38:26 +0000 (22:38 +0000)]
- the async commit/wait API has been changed a bit. it used oti_handler, but
it must not
rread [Wed, 29 Oct 2003 19:35:58 +0000 (19:35 +0000)]
merge devel zcfg
shaver [Wed, 29 Oct 2003 17:35:57 +0000 (17:35 +0000)]
b=2200: perform delorphan recovery when an OST is reintegrated.
- use generic per-obd notification system
- relay LOV's notifications to MDS
- support running delorphan on only one OST, if an OST
UUID is specified
- suppress CERRORs and refailing for -EIO in interpret_create,
because we handle it more gracefully now
r=phik.
adilger [Wed, 29 Oct 2003 08:37:23 +0000 (08:37 +0000)]
Add extN compat.
wangdi [Wed, 29 Oct 2003 08:02:47 +0000 (08:02 +0000)]
Made a mistake in commit dsp.patch, just revert it, sorry
wangdi [Wed, 29 Oct 2003 07:52:31 +0000 (07:52 +0000)]
fix patch conflicts on dsp.patch for 2.4.20-rh kernel
ericm [Wed, 29 Oct 2003 05:14:23 +0000 (05:14 +0000)]
merge b_devel -> b_eq:
20031029
kernel passes sanity.sh, liblustre is broken
youfeng [Wed, 29 Oct 2003 03:47:04 +0000 (03:47 +0000)]
b_2036
formats nids for readability, from "nid" to "nid (format)"
nid : using LPX64 everywhere
format : "%u:%d.%d.%d.%d" for IP NIDs (leading part is cluster ID)
"%u:%u" for non-IP NIDs (leading part is cluster ID)
niu [Wed, 29 Oct 2003 02:28:00 +0000 (02:28 +0000)]
b: 1990
r: Andreas
Padding record header to 64 bits size.
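For reference, rounding a size up to a 64-bit (8-byte) boundary is usually done with a mask. The sketch below shows the arithmetic only; the function name is hypothetical and the actual llog header layout is not reproduced here.

```python
def pad_to_64_bits(size_in_bytes):
    """Round a byte count up to the next multiple of 8 (64 bits)."""
    return (size_in_bytes + 7) & ~7


padded_sizes = [pad_to_64_bits(n) for n in (0, 1, 8, 13)]
```

Sizes that are already a multiple of 8 are returned unchanged.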
rread [Tue, 28 Oct 2003 21:46:17 +0000 (21:46 +0000)]
merge devel zcfg
This merge broke config llog, so this branch doesn't mount right now:
llog_write_rec()) ASSERTION((buflen %% LLOG_MIN_REC_SIZE) == 0)
alex [Tue, 28 Oct 2003 17:49:06 +0000 (17:49 +0000)]
- brown paper bug fixed: it seems I pressed dd randomly :(
alex [Tue, 28 Oct 2003 13:19:17 +0000 (13:19 +0000)]
- b2188: filter fixes: sync transaction, commit earlier
niu [Tue, 28 Oct 2003 09:29:05 +0000 (09:29 +0000)]
b: 1990
r: Andreas
Eliminate u8/u16 from log header, move padding from record header/tail to
record body.