pschwan [Fri, 24 May 2002 17:09:48 +0000 (17:09 +0000)]
It did indeed fix it; here's the ones that I forgot.
pschwan [Fri, 24 May 2002 16:58:06 +0000 (16:58 +0000)]
Do lots of explicit EXPORT_SYMBOLs to see if this cures ia64 problems
adilger [Thu, 23 May 2002 23:21:05 +0000 (23:21 +0000)]
Don't BUG() if we just run out of space.
Some minor changes made when using page index/offset debugging (which is
not included here).
adilger [Thu, 23 May 2002 23:18:01 +0000 (23:18 +0000)]
Add the xattr files to the list of dependencies.
Add a comment that the export patch can fail (this happens if the kernel
source has already been patched to export ext3_bread()).
adilger [Thu, 23 May 2002 23:00:32 +0000 (23:00 +0000)]
Add support for the updated journal callback interface (it should still work
with the old interface until the kernel has been updated).
Add methods for extN support. These require that the extN module be available
in order to load mds.o.
adilger [Thu, 23 May 2002 22:57:14 +0000 (22:57 +0000)]
Hopefully final version of journal commit callback patch. Stephen agreed
with this one and should include it into CVS/release at some point. This
includes a status for the callback function, because the callbacks need to
always be called (to clean up memory), but the actual operations may have
had an error for some reason (no memory, IO, etc).
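A minimal userspace sketch of the idea described above: every registered callback runs exactly once and receives a status, so it can free its memory even when the commit itself failed. All names here (`sketch_txn`, `sketch_txn_add_callback`, `sketch_txn_finish`, `sketch_record_cb`) are hypothetical and are not the JBD patch's actual interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback type: status is 0 on a clean commit, or a
 * negative errno-style value if the operation failed. */
typedef void (*sketch_commit_cb)(void *data, int status);

struct sketch_cb_entry {
    struct sketch_cb_entry *next;
    sketch_commit_cb        func;
    void                   *data;
};

/* Hypothetical transaction handle carrying a list of callbacks. */
struct sketch_txn {
    struct sketch_cb_entry *callbacks;
};

/* Register a callback; returns 0 on success, -1 if out of memory. */
int sketch_txn_add_callback(struct sketch_txn *txn, sketch_commit_cb func,
                            void *data)
{
    struct sketch_cb_entry *cb;

    assert(txn != NULL && func != NULL);
    cb = malloc(sizeof(*cb));
    if (cb == NULL)
        return -1;
    cb->func = func;
    cb->data = data;
    cb->next = txn->callbacks;
    txn->callbacks = cb;
    return 0;
}

/* Run every callback exactly once, whether or not the commit succeeded,
 * passing the outcome along so each one can clean up appropriately. */
void sketch_txn_finish(struct sketch_txn *txn, int status)
{
    struct sketch_cb_entry *cb;

    assert(txn != NULL);
    cb = txn->callbacks;
    while (cb != NULL) {
        struct sketch_cb_entry *next = cb->next;

        cb->func(cb->data, status);
        free(cb);
        cb = next;
    }
    txn->callbacks = NULL;
}

/* Example callback: counts invocations and remembers the last status. */
struct sketch_cb_record {
    int calls;
    int last_status;
};

void sketch_record_cb(void *data, int status)
{
    struct sketch_cb_record *rec = data;

    rec->calls++;
    rec->last_status = status;
}
```

Calling `sketch_txn_finish(&txn, -5)` still invokes `sketch_record_cb`, which sees the error but gets its chance to release resources.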
pschwan [Thu, 23 May 2002 22:36:34 +0000 (22:36 +0000)]
changed ioctl 'cmd' type from 'long' to 'int'
adilger [Thu, 23 May 2002 22:15:04 +0000 (22:15 +0000)]
Rename dump to debug to hold generic debugging code.
adilger [Thu, 23 May 2002 22:10:36 +0000 (22:10 +0000)]
Save and restore the journal context between preprw and commitrw at the OST.
adilger [Thu, 23 May 2002 21:19:46 +0000 (21:19 +0000)]
Don't use 4096 hard-coded for PAGE_SIZE.
adilger [Thu, 23 May 2002 21:18:50 +0000 (21:18 +0000)]
Add journal_info to ptlrpc_bulk_desc for ext3.
adilger [Thu, 23 May 2002 21:16:40 +0000 (21:16 +0000)]
Quiet compiler warnings about unused values when we don't use EA cache.
adilger [Thu, 23 May 2002 21:14:19 +0000 (21:14 +0000)]
Fix locking with filter. This changes the way objects are looked up in
the filter to always use lookup_one_len() from the parent (which is gotten
at filter_setup time) instead of doing a file open each time.
There appears to be some sort of leak which prevents the loop device from
being cleaned up, but e.g. bonnie runs just fine and all of the modules
can be unloaded. Will look at it again later once I've gotten some more
urgent things out of the way.
adilger [Thu, 23 May 2002 20:57:10 +0000 (20:57 +0000)]
Don't BUG if we only run out of space.
pschwan [Thu, 23 May 2002 20:52:49 +0000 (20:52 +0000)]
- Fixed symlinks
- Fixed connect-failure-leads-to-segfault bug
pschwan [Thu, 23 May 2002 18:28:48 +0000 (18:28 +0000)]
- Fixed problem where all files were owned by root
- Fixed a couple warnings that fell out of the Portals changes
pschwan [Thu, 23 May 2002 18:14:51 +0000 (18:14 +0000)]
- Moved filp_close()s up to avoid unnecessarily unlinking open files
- Added two missing dput()s -- why didn't this cause us more trouble before?
pschwan [Thu, 23 May 2002 18:09:39 +0000 (18:09 +0000)]
Copied some patch infrastructure from ext2obd to extN
adilger [Thu, 23 May 2002 17:47:34 +0000 (17:47 +0000)]
Quiet some overly verbose messages.
adilger [Thu, 23 May 2002 16:39:49 +0000 (16:39 +0000)]
Add a few more files to cvsignore
adilger [Wed, 22 May 2002 21:30:42 +0000 (21:30 +0000)]
Update patches to more closely match changes in 2.5 kernel.
Commit fix for htree index-split bug discussed on ext2-devel last week.
adilger [Tue, 21 May 2002 22:06:17 +0000 (22:06 +0000)]
Update configurations to set up LDLM where needed.
Convert scripts over to new setup methods where possible to avoid them
becoming increasingly outdated. Some scripts are already broken, and
I don't use them so I'm not sure whether to remove them or fix them.
Scripts updated are llmount.sh, llrmount.sh, llecho.sh, lldlm.sh,
llmount-client.sh and llmount-server.sh. They use the default config
scripts net*.cfg, obd*.cfg, ldlm.cfg, mds.cfg as needed to do the same
thing they used to do.
adilger [Tue, 21 May 2002 21:36:23 +0000 (21:36 +0000)]
Add LDLM setup/cleanup to subsystems set up via new configuration scripts.
It is also now possible to do "incremental" setup of subsystems (e.g.
"llcleanup.sh client-mount.cfg; llsetup.sh client-mount.cfg" or similar
without shutting everything else down).
*** NOTE:
*** You need to have a line "SETUP_LDLM=y" in your .cfg file (or add
*** ldlm.cfg to your command-line) in order for the CVS HEAD to be usable.
pschwan [Tue, 21 May 2002 04:29:51 +0000 (04:29 +0000)]
Fix small variable confusion that corrupted MDS data
pschwan [Tue, 21 May 2002 03:58:05 +0000 (03:58 +0000)]
- Fixed really stupid bug in events.c that was dereferencing a freed struct
- Made llrmount.sh not suck.
pschwan [Fri, 17 May 2002 16:18:11 +0000 (16:18 +0000)]
* Split struct niobuf into niobuf_local and niobuf_remote
- niobuf_remote is offset, length, xid, and flags
- niobuf_local is all of the above, plus an address and sometimes a page
- The former is sent across the network, the latter used internally
* Small ldlm fixes brought over from the (now-defunct) ldlm_testing branch
- SMP deadlock fix
- comment fix
* Bulk descriptor refactoring
- You create a bulk descriptor and then n bulk pages that get hooked in
- Pages sent all at once, optional callback per page
- Another optional callback when the final ack has been received, although
Eric tells me that elan doesn't guarantee packet ordering, so this needs
to be revisited
* A few key bugfixes in the MDC/MDS/OSC/OST bulk code; these probably bit us if
we sent it a signal during bulk processing
* A few LOV pieces (mostly in genops.c)
- A temporary gen_multi_setup/cleanup to get the LOV rolling; it won't remain
in this form
I've tested these fixes, but not exhaustively.
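The niobuf split described in this commit might look roughly like the following. The field list comes from the message; the field types, widths, and the helper `sketch_niobuf_to_remote` are assumptions for illustration, not the actual Lustre definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sent across the network: offset, length, xid, and flags.
 * Types and widths are guesses. */
struct sketch_niobuf_remote {
    uint64_t offset;
    uint32_t len;
    uint32_t xid;
    uint32_t flags;
};

/* Used internally: all of the above, plus an address and sometimes a page. */
struct sketch_niobuf_local {
    uint64_t offset;
    uint32_t len;
    uint32_t xid;
    uint32_t flags;
    void    *addr;  /* local buffer address */
    void    *page;  /* backing page; may be NULL */
};

/* Copy only the network-visible fields out of a local descriptor, so
 * that local pointers never end up in a wire structure. */
void sketch_niobuf_to_remote(const struct sketch_niobuf_local *local,
                             struct sketch_niobuf_remote *remote)
{
    assert(local != NULL && remote != NULL);
    remote->offset = local->offset;
    remote->len    = local->len;
    remote->xid    = local->xid;
    remote->flags  = local->flags;
}
```

Keeping the two structs separate means the wire format does not change size with the local pointer width.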
adilger [Thu, 16 May 2002 18:23:39 +0000 (18:23 +0000)]
Vmalloc ns_hash instead of kmalloc (it is 128kB). This appears to have
been checked in only in Phil's LDLM branch and not in main.
behlendo [Tue, 14 May 2002 23:43:28 +0000 (23:43 +0000)]
2.4.9 kernel patch against LLNL chaos14 kernels.
braam [Sun, 12 May 2002 02:34:55 +0000 (02:34 +0000)]
- make directIO conditional on kernel version
- add ext2obd patch for 2.4.9
- change ha_assist2 to failover at LLNL
- fix exit code from llmountcleanup.sh to allow kimberlite to work.
braam [Sun, 12 May 2002 01:08:41 +0000 (01:08 +0000)]
- test programs for directio, writing and opening
- phase 2 ha assistance program
braam [Sun, 12 May 2002 01:06:29 +0000 (01:06 +0000)]
- mds failover code
- connection and recovd subsystem
- refined handling of replies/timeout with levels:
- requests are delayed until the request level is lower than or
equal to the connection level
- much updated network documentation
- updated file system recovery documentation
- server maintains lists of open files and handles "re-opening";
the lists are kept in the metadata client info structures.
- flags on requests to indicate their disposition after a reply,
e.g. retain until commit, retain until explicitly canceled etc.
- new failure instrumentation to drop a reply, but execute the
request.
- handling of re-sent creation requests
- move file attribute updates on mds to close, remove from write
- reconnection routine in llight.
- work through recovery list more orderly:
- retain list in sent order
- handle according to disposition of request
- return integers not void
- add direct (0-copy) I/O support -- doesn't compile on 2.4.9
- failure handling in client reintegration code
- replay handling in server reintegration code
- add names to client systems to understand debugging/tracing output better
- remove most lists from the client structure: the multiple lists
introduced request reordering. We now use one list and flag the
requests.
- re-addressing of connections: invoked by the client recovery scripts
- don't reallocate reply buffers if they were already there and not
consumed in case of re-sending requests.
- introduce a request replay function: I want this to be merged with
ptlrpc_queue_wait soon.
- small support routines for continuing delayed requests, restarting
requests for which replies were lost, etc.
- try to get negative errors back even when Portals errors return
positive problems.
- make last committed and received 64 bit in network packets.
- write test programs that:
- keep files open
- do I/O every second
- include 5 basic regression cases for failover recovery:
runfailure-client-mds.sh
- simplify ha_assist.sh -- the secondary ha_assist program does the
work
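The level rule from this commit (a request is delayed until its level is lower than or equal to the connection level) can be sketched as below. The struct layouts and function names are hypothetical; only the comparison itself is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection and request records. */
struct sketch_conn {
    int level;   /* current level of the connection */
};

struct sketch_req {
    int  level;  /* level this request needs before it may be sent */
    bool sent;
};

/* A request may go out once its level is <= the connection level. */
bool sketch_req_may_send(const struct sketch_conn *conn,
                         const struct sketch_req *req)
{
    assert(conn != NULL && req != NULL);
    return req->level <= conn->level;
}

/* Walk the queue in order and send whatever the current connection
 * level allows; everything else stays delayed. Returns the number of
 * requests newly marked as sent. */
size_t sketch_flush_queue(const struct sketch_conn *conn,
                          struct sketch_req *reqs, size_t n)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (!reqs[i].sent && sketch_req_may_send(conn, &reqs[i])) {
            reqs[i].sent = true;
            sent++;
        }
    }
    return sent;
}
```

Raising `conn.level` (for example after recovery completes) and flushing again releases the requests that were held back.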
adilger [Fri, 10 May 2002 23:56:38 +0000 (23:56 +0000)]
Fix each-entry-in-own-block problem for unindexed directories.
adilger [Fri, 10 May 2002 18:23:13 +0000 (18:23 +0000)]
Bug fix for incorrect directory size - it was not setting i_disksize when
appending new directory blocks.
adilger [Thu, 9 May 2002 21:31:40 +0000 (21:31 +0000)]
Routines to "pretty print" various lustre data structs. Useful for debugging.
adilger [Thu, 9 May 2002 20:27:41 +0000 (20:27 +0000)]
Insmod extN if we are using a filesystem of that type.
adilger [Thu, 9 May 2002 20:19:59 +0000 (20:19 +0000)]
Exit on setup error.
adilger [Thu, 9 May 2002 20:16:55 +0000 (20:16 +0000)]
Add extN support to new_fs helper function.
adilger [Thu, 9 May 2002 20:11:41 +0000 (20:11 +0000)]
Macros useful for debugging the file offset/page index corruption, allowing
you to set the maximum file size in a single place (maybe a /proc/sys/lustre
value which could be set at runtime would be more useful at a later date).
adilger [Thu, 9 May 2002 20:09:34 +0000 (20:09 +0000)]
Whitespace cleanup only.
adilger [Thu, 9 May 2002 20:06:58 +0000 (20:06 +0000)]
Ignore extN include files.
adilger [Thu, 9 May 2002 20:06:24 +0000 (20:06 +0000)]
One more extN ignore.
adilger [Thu, 9 May 2002 20:05:55 +0000 (20:05 +0000)]
Add some more files to extN cvsignore.
pschwan [Thu, 9 May 2002 17:08:39 +0000 (17:08 +0000)]
Landing the ldlm_testing branch; now the only difference is that the locking
calls are #if 0ed out of the trunk's ll_file_read and ll_file_write
adilger [Wed, 8 May 2002 20:35:39 +0000 (20:35 +0000)]
Add ext3 extended attributes patch to extN. This needed some massaging in
order to get it to fit with htree. Note that the ext3 EA patch has been
stripped of all the syscall stuff to avoid intruding into the kernel too
much (we still need the VFS xattr methods, but those are really small).
adilger [Wed, 8 May 2002 20:17:14 +0000 (20:17 +0000)]
Fix minor typo.
adilger [Wed, 8 May 2002 20:16:44 +0000 (20:16 +0000)]
Add MDS filesystem methods for extN. For now they are identical to the
ext3 filesystem methods, but the fs_{get,set}_objid methods will change
to use EAs in extN. We will probably also need to take additional blocks
for large directories into account when calculating the transaction size.
adilger [Wed, 8 May 2002 19:57:28 +0000 (19:57 +0000)]
Add ext3 extended attributes patch. This does not include any of the EA
syscall interface code, nor the ACL code.
This _does_ require that the kernel sources be patched to add the xattr
VFS inode methods, but you do not actually need to rebuild the kernel before
using extN - the extra methods are defined in a struct declared by the extN
module so it has no problems if it has a different struct size.
adilger [Wed, 8 May 2002 19:52:03 +0000 (19:52 +0000)]
Add extended attribute VFS methods to the inode operations struct, and a
couple of other EA-related header files.
adilger [Wed, 8 May 2002 19:49:37 +0000 (19:49 +0000)]
For some reason extN complains about "ntohl" not being exported, so rather
than fix that I changed it to be "be32_to_cpu()" which is equivalent. When
I get a chance I will look into this.
adilger [Wed, 8 May 2002 07:07:43 +0000 (07:07 +0000)]
Remove page.c from list of files (holdover from source Makefile.am)
Split the EXPORT stuff from the main patch to allow it to fail silently
if that patch is already applied to the ext3 sources in the kernel.
Minor changes to htree patch to disable debugging output.
adilger [Tue, 7 May 2002 23:47:42 +0000 (23:47 +0000)]
Ignore extN files copied into tree from kernel.
Add extN patches:
- ext3-ino_sb-macro.diff: abstracts access to u.ext3_{sb,i} because we do
not have a u.extN_{sb,i} in the inode struct and we need to use u.generic.
This patch is generic and could be included in the stock kernel (2.5 already
has this abstraction)
- extN-ino_sb-fixup.diff: use the u.generic_{ip,sbp} pointer instead of extN,
a few bits of cleanup from the above patch related to the hashed directory
changes, and includes the extN_bread() export which we will no longer need
to apply to the stock kernel.
- Makefile to do all of the conversion from ext3 to extN and such.
adilger [Tue, 7 May 2002 07:29:58 +0000 (07:29 +0000)]
We don't actually use bulk_vec anywhere in ost_brw_read(), remove it.
adilger [Mon, 6 May 2002 22:47:05 +0000 (22:47 +0000)]
Minor change to niobuf variable name so it is consistent.
wmarcusm [Wed, 1 May 2002 21:30:59 +0000 (21:30 +0000)]
WMM
difftime() macro causing general protection faults when
type int is promoted to double. Perhaps a gcc code
generation bug.
pschwan [Wed, 1 May 2002 17:10:59 +0000 (17:10 +0000)]
Fixed recovd deadlock
pschwan [Wed, 1 May 2002 15:44:11 +0000 (15:44 +0000)]
Avoid cli_lock deadlock in ptlrpc_free_req
pschwan [Tue, 30 Apr 2002 22:08:05 +0000 (22:08 +0000)]
- added a 'dying' head to fix very bad bug in yesterday's request code
- removed request->rq_lock (never used)
- made a ptlrpc_thread structure, and a list of those in ptlrpc_service
- adapted service code to support multithreading
- removed service->srv_id (duplicated existing local_id)
- updated llecho
adilger [Tue, 30 Apr 2002 18:28:51 +0000 (18:28 +0000)]
Fix OSC_DEVNO. It was set initially in the inferior environment variable
config method, but we can name devices now. It is still convenient to
save this value to avoid having to get it for each test. Maybe the
--thread and --device code can be changed to support device names directly
(if they don't already by virtue of using the same device setup code).
pschwan [Tue, 30 Apr 2002 17:15:23 +0000 (17:15 +0000)]
Fixup CFLAGS for building userspace test apps
pschwan [Mon, 29 Apr 2002 22:20:00 +0000 (22:20 +0000)]
Create sparse files unless using one of the gzipped sizes. Waiting for
6GB dd runs has lost all appeal.
pschwan [Mon, 29 Apr 2002 22:04:40 +0000 (22:04 +0000)]
Trivial whitespace, struct, etc changes to bring the ldlm_testing branch more in
line with the trunk.
pschwan [Mon, 29 Apr 2002 21:54:30 +0000 (21:54 +0000)]
removed srv_ev from ptlrpc_service and put it on the service thread's stack
braam [Mon, 29 Apr 2002 20:40:12 +0000 (20:40 +0000)]
- and here are the new files with the previous commit
braam [Mon, 29 Apr 2002 20:36:57 +0000 (20:36 +0000)]
- see message on previous commit.
braam [Mon, 29 Apr 2002 20:36:26 +0000 (20:36 +0000)]
- documentation update for MDS recovery
- remove unused MGR_ constants
- remove rpc fallout from Andreas's merges
- add last committed updates to close/reint
- add handling of last committed to client file system
- add replay handling for recovery to client fs & rpc
- mark requests as completed and committed on the client to
be agnostic of the ordering of these events
- state machine for recovd - basics in place
- last_committed and last_received moved in the lustre_msg from body
- client cleanup is called when system cleans up
- set transaction numbers properly on MDS
- mds_connect call completed
- obd interface for high availability new connection announcements
braam [Mon, 29 Apr 2002 15:25:48 +0000 (15:25 +0000)]
default is not relevant and leads to errors on all calls.
pschwan [Sun, 28 Apr 2002 19:53:31 +0000 (19:53 +0000)]
- small 64-bit warning fix
- removed namespace creation from OSC--it's fixed in the branch and doesn't
belong there anyways.
adilger [Sat, 27 Apr 2002 08:41:47 +0000 (08:41 +0000)]
Update llext3.sh and llrext3.sh scripts to use new config files. This
reduces these scripts to basically llsetup.sh using some .cfg files.
adilger [Sat, 27 Apr 2002 08:33:54 +0000 (08:33 +0000)]
Send last_rcvd values around when talking to the MDS. The MDC gets the
last_{rcvd,committed,xid} values on mdc_connect, but doesn't yet do
anything with this new data except print it to the debug logs.
A select number of MDS operations get last_{rcvd,committed} values sent
in the reply (mds_body) - create, getattr, open. It is not totally
clear to me how to add in the mds_body to an RPC reply if it doesn't
already exist, so there is a little more work to do there.
At connect and reint time, client "UUIDs" are looked up and handled
appropriately for new and existing clients. Currently, since the
RPCs don't actually contain any UUID values, all updates go to UUID "",
which is enough for testing, and should "just work" when UUIDs appear.
adilger [Sat, 27 Apr 2002 08:22:03 +0000 (08:22 +0000)]
Add a helper function to abstract the actual location of the UUID, to
avoid the need for changes when UUIDs move around.
adilger [Sat, 27 Apr 2002 00:20:46 +0000 (00:20 +0000)]
Remove redundant inode parameter from mds_fs_journal_data().
adilger [Sat, 27 Apr 2002 00:17:13 +0000 (00:17 +0000)]
Add lustre_fsync() helper function.
adilger [Sat, 27 Apr 2002 00:16:14 +0000 (00:16 +0000)]
Add last_committed, last_rcvd, and last_xid to the RPC mds_body.
adilger [Fri, 26 Apr 2002 15:59:27 +0000 (15:59 +0000)]
Update journal callback patch so that we can tell if it is applied.
adilger [Fri, 26 Apr 2002 15:57:30 +0000 (15:57 +0000)]
Add support for JBD journal callbacks to update MDS last_committed value.
This needs the most recent kernel patch in order to work properly. If
the kernel patch isn't applied, you will get a message like:
"no journal callback kernel patch, faking it..."
Likewise, ext2 has "fake" support for commit callbacks.
adilger [Wed, 24 Apr 2002 21:09:30 +0000 (21:09 +0000)]
Add callbacks from the JBD (journal) to allow async notification of when
a handle has been committed to disk.
adilger [Wed, 24 Apr 2002 20:16:19 +0000 (20:16 +0000)]
Add client RPC xid to per-client last_rcvd data. If the MDS dies but the
client lives, the on-disk xid tells the client which operations the MDS
has completed, even if the client never got a reply (hence no last_rcvd #).
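A sketch of how a client could use the on-disk xid after an MDS restart. It assumes xids increase monotonically per client, which the commit message does not state; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_replay_action {
    SKETCH_ALREADY_DONE,  /* server completed it; only the reply was lost */
    SKETCH_MUST_RESEND    /* server never completed it */
};

/* Assumption: xids increase monotonically per client, so anything at
 * or below the server's recorded xid has already been executed. */
enum sketch_replay_action sketch_classify(uint64_t req_xid,
                                          uint64_t server_last_xid)
{
    return req_xid <= server_last_xid ? SKETCH_ALREADY_DONE
                                      : SKETCH_MUST_RESEND;
}

/* Count how many outstanding requests still have to be resent. */
size_t sketch_count_resends(const uint64_t *xids, size_t n,
                            uint64_t server_last_xid)
{
    size_t resend = 0;

    assert(xids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sketch_classify(xids[i], server_last_xid) == SKETCH_MUST_RESEND)
            resend++;
    }
    return resend;
}
```

With outstanding xids 10, 11, 12 and a server-side xid of 11, only request 12 needs to be resent.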
adilger [Wed, 24 Apr 2002 20:03:54 +0000 (20:03 +0000)]
Code to update the last_rcvd file within a transaction.
adilger [Wed, 24 Apr 2002 08:46:14 +0000 (08:46 +0000)]
Minor fixups to avoid warnings on 64-bit platforms.
adilger [Wed, 24 Apr 2002 08:34:15 +0000 (08:34 +0000)]
Nice change to the OBD_ALLOC and OBD_FREE macros - it prints the name of
the pointer which is being allocated or freed, to make debugging easier.
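Printing the pointer's name can be done with the preprocessor's stringizing operator (`#`). The macros below are a userspace sketch of the idea using `calloc` and `fprintf`; `SKETCH_ALLOC` and `SKETCH_FREE` are hypothetical stand-ins, not the real OBD_ALLOC and OBD_FREE definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate zeroed memory and log the *name* of the pointer variable;
 * #ptr turns the macro argument into a string literal. */
#define SKETCH_ALLOC(ptr, size)                                      \
    do {                                                             \
        (ptr) = calloc(1, (size));                                   \
        fprintf(stderr, "alloc '%s': %zu bytes at %p\n",             \
                #ptr, (size_t)(size), (void *)(ptr));                \
    } while (0)

/* Log, free, and clear the pointer so stale uses are easier to spot. */
#define SKETCH_FREE(ptr, size)                                       \
    do {                                                             \
        fprintf(stderr, "free '%s': %zu bytes at %p\n",              \
                #ptr, (size_t)(size), (void *)(ptr));                \
        free(ptr);                                                   \
        (ptr) = NULL;                                                \
    } while (0)

/* Hypothetical structure used only to exercise the macros. */
struct sketch_export {
    int id;
};

/* Allocate a record, touch it, and release it again.
 * Returns 0 on success, -1 on allocation failure. */
int sketch_roundtrip(void)
{
    struct sketch_export *exp;

    SKETCH_ALLOC(exp, sizeof(*exp));
    if (exp == NULL)
        return -1;
    assert(exp->id == 0);  /* calloc zero-fills the allocation */
    exp->id = 42;
    SKETCH_FREE(exp, sizeof(*exp));
    return exp == NULL ? 0 : -1;
}
```

The log lines then name `exp` directly, which makes it easier to match an allocation with its free when reading debug output.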
adilger [Wed, 24 Apr 2002 08:32:47 +0000 (08:32 +0000)]
The code to read the last_rcvd file at MDS startup.
adilger [Wed, 24 Apr 2002 08:15:49 +0000 (08:15 +0000)]
Don't print out bogus rootfid on error.
adilger [Wed, 24 Apr 2002 06:31:47 +0000 (06:31 +0000)]
Add llcleanup.sh script. This is the opposite of the llsetup.sh script,
and also needs a config file in order to work. Some time soon when
network configuration is included, this will be able to do the network
cleanup, unlike the "llmountcleanup.sh" script.
adilger [Wed, 24 Apr 2002 06:29:55 +0000 (06:29 +0000)]
Add in a bit of explanation to the config file documentation. Also added
some notes about the 'runtests' test.
adilger [Wed, 24 Apr 2002 06:07:15 +0000 (06:07 +0000)]
Update the new test configuration stuff to use the newly implemented obdctl
features (newdev, name2dev FOO, and setup $FOO). Also changed the "runtests"
script over to using the new configuration setup so that it is easier to run
with both ext2/ext3 MDS and obdext2/obdfilter OBDs.
adilger [Wed, 24 Apr 2002 06:00:47 +0000 (06:00 +0000)]
Fixups to handle error recovery when we are out of memory. Some of them
need a bit closer inspection, but should be mostly correct.
adilger [Wed, 24 Apr 2002 05:56:16 +0000 (05:56 +0000)]
Allow obdctl to use "$OBDDEV" to resolve a device number in setup.
This required changing the NAME2DEV ioctl so that it didn't change the
currently selected device when it was resolving a name. Now the NAME2DEV
ioctl only resolves the name, and obdctl selects the returned device
explicitly (to user-space the "name2dev" command works exactly the same).
Cleaned up setup scripts to remove last vestiges of hard-coded device numbers.
Instead we use "setup $OBDDEV" (note that the '$' must be escaped from the
shell if using it in a shell script).
adilger [Wed, 24 Apr 2002 01:16:55 +0000 (01:16 +0000)]
Added nesting of journaled operations to last_rcvd file. Currently has a
no-op for the last_rcvd update.
adilger [Wed, 24 Apr 2002 00:50:51 +0000 (00:50 +0000)]
Only set up the MDS service after the filesystem-specific stuff is set up.
Still working towards my broken tree - haven't hit the problem yet.
adilger [Tue, 23 Apr 2002 22:03:30 +0000 (22:03 +0000)]
Small start to committing MDS changes. Testing/committing in separate tree
to ensure they are not the cause of my problems. This one just adds new
fields into the MDS structs (no functional change).
adilger [Tue, 23 Apr 2002 21:44:42 +0000 (21:44 +0000)]
Change llext3.sh and llrmount.sh to use obdfilter, so that mount/remount
will work properly. Fixes "fatal - invalid inode" bug Peter reported.
adilger [Tue, 23 Apr 2002 21:33:05 +0000 (21:33 +0000)]
Commit minor cleanups to reduce size of outstanding changes in my tree.
adilger [Tue, 23 Apr 2002 21:26:12 +0000 (21:26 +0000)]
More changes to OBDDEV so that cleanup works properly.
adilger [Tue, 23 Apr 2002 21:18:49 +0000 (21:18 +0000)]
Another change to OBDDEV so that cleanup works properly.
adilger [Tue, 23 Apr 2002 20:57:36 +0000 (20:57 +0000)]
Use debugging macros to aid in tracing.
adilger [Tue, 23 Apr 2002 20:55:19 +0000 (20:55 +0000)]
Remove extraneous RSH_MDS from elan-server.cfg.
Use OBDDEV for llext3.sh setup script, so cleanup works.
adilger [Tue, 23 Apr 2002 20:30:30 +0000 (20:30 +0000)]
Fix symlinks when building a new tree outside the source tree.
braam [Tue, 23 Apr 2002 19:23:24 +0000 (19:23 +0000)]
- newdev feature in obdctl
braam [Tue, 23 Apr 2002 18:52:46 +0000 (18:52 +0000)]
Description of how to run the tests
adilger [Tue, 23 Apr 2002 07:37:38 +0000 (07:37 +0000)]
Missed one of the "OBDDEV" changes to allow common cleanup in llecho.sh.