Whamcloud - gitweb
adilger [Thu, 23 May 2002 16:39:49 +0000 (16:39 +0000)]
Add a few more files to cvsignore
adilger [Wed, 22 May 2002 21:30:42 +0000 (21:30 +0000)]
Update patches to more closely match changes in 2.5 kernel.
Commit fix for htree index-split bug discussed on ext2-devel last week.
adilger [Tue, 21 May 2002 22:06:17 +0000 (22:06 +0000)]
Update configurations to set up LDLM where needed.
Convert scripts over to new setup methods where possible to avoid them
becoming increasingly outdated. Some scripts are already broken, and
I don't use them so I'm not sure whether to remove them or fix them.
Scripts updated are llmount.sh, llrmount.sh, llecho.sh, lldlm.sh,
llmount-client.sh and llmount-server.sh. They use the default config
scripts net*.cfg, obd*.cfg, ldlm.cfg, mds.cfg as needed to do the same
thing they used to do.
adilger [Tue, 21 May 2002 21:36:23 +0000 (21:36 +0000)]
Add LDLM setup/cleanup to subsystems set up via new configuration scripts.
It is also now possible to do "incremental" setup of subsystems (e.g.
"llcleanup.sh client-mount.cfg; llsetup.sh client-mount.cfg" or similar
without shutting everything else down).
*** NOTE:
*** You need to have a line "SETUP_LDLM=y" in your .cfg file (or add
*** ldlm.cfg to your command-line) in order for the CVS HEAD to be usable.
pschwan [Tue, 21 May 2002 04:29:51 +0000 (04:29 +0000)]
Fix small variable confusion that corrupted MDS data
pschwan [Tue, 21 May 2002 03:58:05 +0000 (03:58 +0000)]
- Fixed really stupid bug in events.c that was dereferencing a freed struct
- Made llrmount.sh not suck.
pschwan [Fri, 17 May 2002 16:18:11 +0000 (16:18 +0000)]
* Split struct niobuf into niobuf_local and niobuf_remote
- niobuf_remote is offset, length, xid, and flags
- niobuf_local is all of the above, plus an address and sometimes a page
- The former is sent across the network, the latter used internally
* Small ldlm fixes brought over from the (now-defunct) ldlm_testing branch
- SMP deadlock fix
- comment fix
* Bulk descriptor refactoring
- You create a bulk descriptor and then n bulk pages that get hooked in
- Pages sent all at once, optional callback per page
- Another optional callback when the final ack has been received, although
Eric tells me that elan doesn't guarantee packet ordering, so this needs
to be revisited
* A few key bugfixes in the MDC/MDS/OSC/OST bulk code; these probably bit us if
we sent it a signal during bulk processing
* A few LOV pieces (mostly in genops.c)
- A temporary gen_multi_setup/cleanup to get the LOV rolling; it won't remain
in this form
I've tested these fixes, but not exhaustively.
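A rough sketch of the niobuf split described above may help when reading the bulk code. The struct names, field names, and field widths below are hypothetical and are not copied from the tree; only the division of fields (offset, length, xid, and flags on the wire; an address and an optional page kept locally) comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire descriptor: only these fields cross the network. */
struct demo_niobuf_remote {
    uint64_t offset;
    uint32_t len;
    uint32_t xid;
    uint32_t flags;
};

/* Hypothetical node-local descriptor: the same fields plus memory references. */
struct demo_niobuf_local {
    uint64_t offset;
    uint32_t len;
    uint32_t xid;
    uint32_t flags;
    void *addr; /* local buffer address */
    void *page; /* optional page handle, NULL when unused */
};

/* Build the wire form by copying the shared fields and dropping the pointers. */
static struct demo_niobuf_remote
demo_niobuf_to_remote(const struct demo_niobuf_local *local)
{
    struct demo_niobuf_remote remote;

    assert(local != NULL);
    remote.offset = local->offset;
    remote.len = local->len;
    remote.xid = local->xid;
    remote.flags = local->flags;
    return remote;
}
```

Keeping the pointers out of the wire struct means a peer never sees addresses that are only meaningful on the sending node.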
adilger [Thu, 16 May 2002 18:23:39 +0000 (18:23 +0000)]
Vmalloc ns_hash instead of kmalloc (it is 128kB). This appears to have
been checked in only in Phil's LDLM branch and not in main.
behlendo [Tue, 14 May 2002 23:43:28 +0000 (23:43 +0000)]
2.4.9 kernel patch against LLNL chaos14 kernels.
braam [Sun, 12 May 2002 02:34:55 +0000 (02:34 +0000)]
- make directIO conditional on kernel version
- add ext2obd patch for 2.4.9
- change ha_assist2 to failover at LLNL
- fix exit code from llmountcleanup.sh to allow kimberlite to work.
braam [Sun, 12 May 2002 01:08:41 +0000 (01:08 +0000)]
- test programs for directio, writing and opening
- phase 2 ha assistance program
braam [Sun, 12 May 2002 01:06:29 +0000 (01:06 +0000)]
- mds failover code
- connection and recovd subsystem
- refined handling of replies/timeout with levels:
- requests are delayed until the request level is lower than or
equal to the connection level
- much updated network documentation
- updated file system recovery documentation
- server maintains lists of open files and handles "re-opening";
the list is maintained in the metadata client info structures.
- flags on requests to indicate their disposition after a reply,
e.g. retain until commit, retain until explicitly canceled etc.
- new failure instrumentation to drop a reply, but execute the
request.
- handling of re-sent creation requests
- move file attribute updates on mds to close, remove from write
- reconnection routine in llight.
- work through recovery list more orderly:
- retain list in sent order
- handle according to disposition of request
- return integers not void
- add direct (0-copy) I/O support -- doesn't compile on 2.4.9
- failure handling in client reintegration code
- replay handling in server reintegration code
- add names to client systems to understand debugging/tracing output better
- remove most lists from the client structure: the multiple lists
introduced request reordering. We now use one list and flag the
requests.
- re-addressing of connections: invoked by the client recovery scripts
- don't reallocate reply buffers if they were already there and not
consumed in case of re-sending requests.
- introduce a request replay function: I want this to be merged with
ptlrpc_queue wait soon.
- small support routines for continuing delayed requests, restarting
requests for which replies were lost, etc.
- try to get negative errors back even when Portals errors return
positive problems.
- make last committed and received 64 bit in network packets.
- write test programs that:
- keep files open
- do I/O every second
- include 5 basic regression cases for failover recovery:
runfailure-client-mds.sh
- simplify ha_assist.sh -- the secondary ha_assist program does the
work
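The level rule mentioned above (a request is held until its level is lower than or equal to the connection level) can be sketched as a single comparison. This is an illustration only: the struct names and the numeric level values are assumptions, and the field names c_level and rq_level are borrowed from an earlier commit message rather than from the code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical levels; the real values and names may differ. */
enum demo_level {
    DEMO_LEVEL_DISCONNECTED = 0,
    DEMO_LEVEL_CONNECTED = 1,
    DEMO_LEVEL_RECOVERED = 2
};

struct demo_connection {
    int c_level; /* how far connection recovery has progressed */
};

struct demo_request {
    int rq_level; /* minimum connection level this request needs */
};

/* A request may go out only once rq_level <= c_level; otherwise it is delayed. */
static bool demo_request_may_send(const struct demo_connection *conn,
                                  const struct demo_request *req)
{
    assert(conn != NULL && req != NULL);
    return req->rq_level <= conn->c_level;
}
```

With this shape, raising c_level as recovery progresses releases the delayed requests without touching the requests themselves.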
adilger [Fri, 10 May 2002 23:56:38 +0000 (23:56 +0000)]
Fix each-entry-in-own-block problem for unindexed directories.
adilger [Fri, 10 May 2002 18:23:13 +0000 (18:23 +0000)]
Bug fix for incorrect directory size - it was not setting i_disksize when
appending new directory blocks.
adilger [Thu, 9 May 2002 21:31:40 +0000 (21:31 +0000)]
Routines to "pretty print" various lustre data structs. Useful for debugging.
adilger [Thu, 9 May 2002 20:27:41 +0000 (20:27 +0000)]
Insmod extN if we are using a filesystem of that type.
adilger [Thu, 9 May 2002 20:19:59 +0000 (20:19 +0000)]
Exit on setup error.
adilger [Thu, 9 May 2002 20:16:55 +0000 (20:16 +0000)]
Add extN support to new_fs helper function.
adilger [Thu, 9 May 2002 20:11:41 +0000 (20:11 +0000)]
Macros useful for debugging the file offset/page index corruption, allowing
you to set the maximum file size in a single place (maybe a /proc/sys/lustre
value which could be set at runtime would be more useful at a later date).
adilger [Thu, 9 May 2002 20:09:34 +0000 (20:09 +0000)]
Whitespace cleanup only.
adilger [Thu, 9 May 2002 20:06:58 +0000 (20:06 +0000)]
Ignore extN include files.
adilger [Thu, 9 May 2002 20:06:24 +0000 (20:06 +0000)]
One more extN ignore.
adilger [Thu, 9 May 2002 20:05:55 +0000 (20:05 +0000)]
Add some more files to extN cvsignore.
pschwan [Thu, 9 May 2002 17:08:39 +0000 (17:08 +0000)]
Landing the ldlm_testing branch; now the only difference is that the locking
calls are #if 0ed out of the trunk's ll_file_read and ll_file_write
adilger [Wed, 8 May 2002 20:35:39 +0000 (20:35 +0000)]
Add ext3 extended attributes patch to extN. This needed some massaging in
order to get it to fit with htree. Note that the ext3 EA patch has been
stripped of all the syscall stuff to avoid intruding into the kernel too
much (we still need the VFS xattr methods, but those are really small).
adilger [Wed, 8 May 2002 20:17:14 +0000 (20:17 +0000)]
Fix minor typo.
adilger [Wed, 8 May 2002 20:16:44 +0000 (20:16 +0000)]
Add MDS filesystem methods for extN. For now they are identical to the
ext3 filesystem methods, but the fs_{get,set}_objid methods will change
to use EAs in extN. We will probably also need to take additional blocks
for large directories into account when calculating the transaction size.
adilger [Wed, 8 May 2002 19:57:28 +0000 (19:57 +0000)]
Add ext3 extended attributes patch. This does not include any of the EA
syscall interface code, nor the ACL code.
This _does_ require that the kernel sources be patched to add the xattr
VFS inode methods, but you do not actually need to rebuild the kernel before
using extN - the extra methods are defined in a struct declared by the extN
module so it has no problems if it has a different struct size.
adilger [Wed, 8 May 2002 19:52:03 +0000 (19:52 +0000)]
Add extended attribute VFS methods to the inode operations struct, and a
couple of other EA-related header files.
adilger [Wed, 8 May 2002 19:49:37 +0000 (19:49 +0000)]
For some reason extN complains about "ntohl" not being exported, so rather
than fix that I changed it to be "be32_to_cpu()" which is equivalent. When
I get a chance I will look into this.
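For readers wondering why the two calls are interchangeable: both convert a 32-bit big-endian (network order) value to host order. The userspace sketch below uses a hypothetical stand-in, demo_be32_to_cpu(), since the kernel's be32_to_cpu() is not available outside the kernel, and compares it against the POSIX ntohl().

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for be32_to_cpu(): decode a stored big-endian value. */
static uint32_t demo_be32_to_cpu(uint32_t be_value)
{
    unsigned char bytes[4];

    /* Read the bytes in memory order; the first byte is the most significant. */
    memcpy(bytes, &be_value, sizeof(bytes));
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
}

/* True when the stand-in agrees with ntohl() for the given stored value. */
static bool demo_matches_ntohl(uint32_t be_value)
{
    return demo_be32_to_cpu(be_value) == ntohl(be_value);
}
```

Because the decode works byte by byte, the result is the same on little-endian and big-endian hosts.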
adilger [Wed, 8 May 2002 07:07:43 +0000 (07:07 +0000)]
Remove page.c from list of files (holdover from source Makefile.am)
Split the EXPORT stuff from the main patch to allow it to fail silently
if that patch is already applied to the ext3 sources in the kernel.
Minor changes to htree patch to disable debugging output.
adilger [Tue, 7 May 2002 23:47:42 +0000 (23:47 +0000)]
Ignore extN files copied into tree from kernel.
Add extN patches:
- ext3-ino_sb-macro.diff: abstracts access to u.ext3_{sb,i} because we do
not have a u.extN_{sb,i} in the inode struct and we need to use u.generic.
This patch is generic and could be included in the stock kernel (2.5 already
has this abstraction)
- extN-ino_sb-fixup.diff: use the u.generic_{ip,sbp} pointer instead of extN,
a few bits of cleanup from the above patch related to the hashed directory
changes, and includes the extN_bread() export which we will no longer need
to apply to the stock kernel.
- Makefile to do all of the conversion from ext3 to extN and such.
adilger [Tue, 7 May 2002 07:29:58 +0000 (07:29 +0000)]
We don't actually use bulk_vec anywhere in ost_brw_read(), remove it.
adilger [Mon, 6 May 2002 22:47:05 +0000 (22:47 +0000)]
Minor change to niobuf variable name so it is consistent.
wmarcusm [Wed, 1 May 2002 21:30:59 +0000 (21:30 +0000)]
WMM
difftime() macro causing general protection faults when
type int is promoted to double. Perhaps a gcc code
generation bug.
pschwan [Wed, 1 May 2002 17:10:59 +0000 (17:10 +0000)]
Fixed recovd deadlock
pschwan [Wed, 1 May 2002 15:44:11 +0000 (15:44 +0000)]
Avoid cli_lock deadlock in ptlrpc_free_req
pschwan [Tue, 30 Apr 2002 22:08:05 +0000 (22:08 +0000)]
- added a 'dying' head to fix very bad bug in yesterday's request code
- removed request->rq_lock (never used)
- made a ptlrpc_thread structure, and a list of those in ptlrpc_service
- adapted service code to support multithreading
- removed service->srv_id (duplicated existing local_id)
- updated llecho
adilger [Tue, 30 Apr 2002 18:28:51 +0000 (18:28 +0000)]
Fix OSC_DEVNO. It was set initially in the inferior environment variable
config method, but we can name devices now. It is still convenient to
save this value to avoid having to get it for each test. Maybe the
--thread and --device code can be changed to support device names directly
(if they don't already by virtue of using the same device setup code).
pschwan [Tue, 30 Apr 2002 17:15:23 +0000 (17:15 +0000)]
Fixup CFLAGS for building userspace test apps
pschwan [Mon, 29 Apr 2002 22:20:00 +0000 (22:20 +0000)]
Create sparse files unless using one of the gzipped sizes. Waiting for
6GB dd runs has lost all appeal.
pschwan [Mon, 29 Apr 2002 22:04:40 +0000 (22:04 +0000)]
Trivial whitespace, struct, etc changes to bring the ldlm_testing branch more in
line with the trunk.
pschwan [Mon, 29 Apr 2002 21:54:30 +0000 (21:54 +0000)]
removed srv_ev from ptlrpc_service and put it on the service thread's stack
braam [Mon, 29 Apr 2002 20:40:12 +0000 (20:40 +0000)]
- and here are the new files with the previous commit
braam [Mon, 29 Apr 2002 20:36:57 +0000 (20:36 +0000)]
- see message on previous commit.
braam [Mon, 29 Apr 2002 20:36:26 +0000 (20:36 +0000)]
- documentation update for MDS recovery
- remove unused MGR_ constants
- remove rpc fallout from Andreas mergers
- add last committed updates to close/reint
- add handling of last committed to client file system
- add replay handling for recovery to client fs & rpc
- mark requests as completed and committed on the client to
be agnostic of the ordering of these events
- state machine for recovd - basics in place
- last_committed and last_received moved in the lustre_msg from body
- client cleanup is called when system cleans up
- set transaction numbers properly on MDS
- mds_connect call completed
- obd interface for high availability new connection announcements
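One way to be agnostic of the order in which "completed" and "committed" arrive, as mentioned above, is to record each event as a flag and act only when both are set. The sketch below is an assumption about the general shape, not the code from this commit; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request event flags. */
enum {
    DEMO_RQ_COMPLETED = 1u << 0, /* the reply has been received */
    DEMO_RQ_COMMITTED = 1u << 1  /* the server reported the change as on disk */
};

struct demo_replay_req {
    unsigned int rq_flags;
};

/*
 * Record one event. Returns true once both events have been seen, in
 * whichever order they arrived, meaning the request no longer needs to
 * be retained for replay.
 */
static bool demo_mark_event(struct demo_replay_req *req, unsigned int event)
{
    const unsigned int both = DEMO_RQ_COMPLETED | DEMO_RQ_COMMITTED;

    assert(req != NULL);
    req->rq_flags |= event;
    return (req->rq_flags & both) == both;
}
```

Setting a flag twice is harmless here, so duplicate notifications do not change the outcome.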
braam [Mon, 29 Apr 2002 15:25:48 +0000 (15:25 +0000)]
default is not relevant and leads to errors on all calls.
pschwan [Sun, 28 Apr 2002 19:53:31 +0000 (19:53 +0000)]
- small 64-bit warning fix
- removed namespace creation from OSC--it's fixed in the branch and doesn't
belong there anyways.
adilger [Sat, 27 Apr 2002 08:41:47 +0000 (08:41 +0000)]
Update llext3.sh and llrext3.sh scripts to use new config files. This
reduces these scripts to basically llsetup.sh using some .cfg files.
adilger [Sat, 27 Apr 2002 08:33:54 +0000 (08:33 +0000)]
Send last_rcvd values around when talking to the MDS. The MDC gets the
last_{rcvd,committed,xid} values on mdc_connect, but doesn't yet do
anything with this new data except print it to the debug logs.
A select number of MDS operations get last_{rcvd,committed} values sent
in the reply (mds_body) - create, getattr, open. It is not totally
clear to me how to add in the mds_body to an RPC reply if it doesn't
already exist, so there is a little more work to do there.
At connect and reint time, client "UUIDs" are looked up and handled
appropriately for new and existing clients. Currently, since the
RPCs don't actually contain any UUID values, all updates go to UUID "",
which is enough for testing, and should "just work" when UUIDs appear.
adilger [Sat, 27 Apr 2002 08:22:03 +0000 (08:22 +0000)]
Add a helper function to abstract the actual location of the UUID, to
avoid the need for changes when UUIDs move around.
adilger [Sat, 27 Apr 2002 00:20:46 +0000 (00:20 +0000)]
Remove redundant inode parameter from mds_fs_journal_data().
adilger [Sat, 27 Apr 2002 00:17:13 +0000 (00:17 +0000)]
Add lustre_fsync() helper function.
adilger [Sat, 27 Apr 2002 00:16:14 +0000 (00:16 +0000)]
Add last_committed, last_rcvd, and last_xid to the RPC mds_body.
adilger [Fri, 26 Apr 2002 15:59:27 +0000 (15:59 +0000)]
Update journal callback patch so that we can tell if it is applied.
adilger [Fri, 26 Apr 2002 15:57:30 +0000 (15:57 +0000)]
Add support for JBD journal callbacks to update MDS last_committed value.
This needs the most recent kernel patch in order to work properly. If
the kernel patch isn't applied, you will get a message like:
"no journal callback kernel patch, faking it..."
Likewise, ext2 has "fake" support for commit callbacks.
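As a rough illustration of what a commit callback of this kind might do, the sketch below advances a last_committed value when the journal reports that a transaction reached disk. The struct, the function name, and the callback arguments are hypothetical and do not reflect the actual JBD patch interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slice of MDS state. */
struct demo_mds {
    uint64_t last_committed; /* highest transaction number known to be on disk */
};

/*
 * Hypothetical commit callback: invoked once the transaction identified
 * by transno has been committed. The value only moves forward, so a late
 * or repeated callback cannot roll it back.
 */
static void demo_commit_cb(struct demo_mds *mds, uint64_t transno, int error)
{
    assert(mds != NULL);
    if (error != 0)
        return; /* a failed commit must not advance last_committed */
    if (transno > mds->last_committed)
        mds->last_committed = transno;
}
```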
adilger [Wed, 24 Apr 2002 21:09:30 +0000 (21:09 +0000)]
Add callbacks from the JBD (journal) to allow async notification of when
a handle has been committed to disk.
adilger [Wed, 24 Apr 2002 20:16:19 +0000 (20:16 +0000)]
Add client RPC xid to per-client last_rcvd data. If the MDS dies but the
client lives, the on-disk xid tells the client which operations the MDS
has completed, even if the client never got a reply (hence no last_rcvd #).
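A small sketch of how a client could use the on-disk xid after an MDS restart. It assumes xids increase monotonically per client, which the message does not state, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: xids grow monotonically, so anything at or below the
 * on-disk value was completed by the MDS before it died. */
static bool demo_mds_completed(uint64_t xid, uint64_t disk_last_xid)
{
    return xid <= disk_last_xid;
}

/* Count the sent requests the MDS never completed and that need resending. */
static size_t demo_count_unfinished(const uint64_t *sent_xids, size_t count,
                                    uint64_t disk_last_xid)
{
    size_t unfinished = 0;
    size_t i;

    assert(sent_xids != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (!demo_mds_completed(sent_xids[i], disk_last_xid))
            unfinished++;
    }
    return unfinished;
}
```

A request whose reply was lost but whose xid is at or below the on-disk value was still executed, so the client can treat it as done rather than sending it again.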
adilger [Wed, 24 Apr 2002 20:03:54 +0000 (20:03 +0000)]
Code to update the last_rcvd file within a transaction.
adilger [Wed, 24 Apr 2002 08:46:14 +0000 (08:46 +0000)]
Minor fixups to avoid warnings on 64-bit platforms.
adilger [Wed, 24 Apr 2002 08:34:15 +0000 (08:34 +0000)]
Nice change to the OBD_ALLOC and OBD_FREE macros - it prints the name of
the pointer which is being allocated or freed, to make debugging easier.
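The usual way to get a pointer's name into a debug message is the preprocessor's # (stringify) operator. The userspace sketch below shows the idea with hypothetical DEMO_ALLOC and DEMO_FREE macros; it is not the OBD_ALLOC/OBD_FREE definition from the tree, and it uses calloc() and free() in place of the kernel allocators.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Name of the pointer seen by the most recent trace call (for inspection). */
static const char *demo_last_ptr_name = "";

static void demo_trace(const char *op, const char *name, void *ptr, size_t size)
{
    demo_last_ptr_name = name;
    fprintf(stderr, "%s '%s': %lu bytes at %p\n",
            op, name, (unsigned long)size, ptr);
}

/* True when the most recently traced pointer had the given name. */
static bool demo_last_name_is(const char *name)
{
    return strcmp(demo_last_ptr_name, name) == 0;
}

/* #ptr expands to the macro argument's source text as a string literal. */
#define DEMO_ALLOC(ptr, size)                                 \
    do {                                                      \
        (ptr) = calloc(1, (size));                            \
        demo_trace("alloc", #ptr, (void *)(ptr), (size));     \
    } while (0)

#define DEMO_FREE(ptr, size)                                  \
    do {                                                      \
        demo_trace("free", #ptr, (void *)(ptr), (size));      \
        free(ptr);                                            \
        (ptr) = NULL;                                         \
    } while (0)
```

Note that the size argument is evaluated more than once in this sketch, so it should not have side effects.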
adilger [Wed, 24 Apr 2002 08:32:47 +0000 (08:32 +0000)]
The code to read the last_rcvd file at MDS startup.
adilger [Wed, 24 Apr 2002 08:15:49 +0000 (08:15 +0000)]
Don't print out bogus rootfid on error.
adilger [Wed, 24 Apr 2002 06:31:47 +0000 (06:31 +0000)]
Add llcleanup.sh script. This is the opposite of the llsetup.sh script,
and also needs a config file in order to work. Some time soon when
network configuration is included, this will be able to do the network
cleanup, unlike the "llmountcleanup.sh" script.
adilger [Wed, 24 Apr 2002 06:29:55 +0000 (06:29 +0000)]
Add in a bit of explanation to the config file documentation. Also added
some notes about the 'runtests' test.
adilger [Wed, 24 Apr 2002 06:07:15 +0000 (06:07 +0000)]
Update the new test configuration stuff to use the newly implemented obdctl
features (newdev, name2dev FOO, and setup $FOO). Also changed the "runtests"
script over to using the new configuration setup so that it is easier to run
with both ext2/ext3 MDS and obdext2/obdfilter OBDs.
adilger [Wed, 24 Apr 2002 06:00:47 +0000 (06:00 +0000)]
Fixups to handle error recovery when we are out of memory. Some of them
need a bit closer inspection, but should be mostly correct.
adilger [Wed, 24 Apr 2002 05:56:16 +0000 (05:56 +0000)]
Allow obdctl to use "$OBDDEV" to resolve a device number in setup.
This required changing the NAME2DEV ioctl so that it didn't change the
currently selected device when it was resolving a name. Now the NAME2DEV
ioctl only resolves the name, and obdctl selects the returned device
explicitly (to user-space the "name2dev" command works exactly the same).
Cleaned up setup scripts to remove last vestiges of hard-coded device numbers.
Instead we use "setup $OBDDEV" (note that the '$' must be escaped from the
shell if using it in a shell script).
adilger [Wed, 24 Apr 2002 01:16:55 +0000 (01:16 +0000)]
Added nesting of journaled operations to last_rcvd file. Currently has a
no-op for the last_rcvd update.
adilger [Wed, 24 Apr 2002 00:50:51 +0000 (00:50 +0000)]
Only set up the MDS service after the filesystem-specific stuff is set up.
Still working towards my broken tree - haven't hit the problem yet.
adilger [Tue, 23 Apr 2002 22:03:30 +0000 (22:03 +0000)]
Small start to committing MDS changes. Testing/committing in separate tree
to ensure they are not the cause of my problems. This one just adds new
fields into the MDS structs (no functional change).
adilger [Tue, 23 Apr 2002 21:44:42 +0000 (21:44 +0000)]
Change llext3.sh and llrmount.sh to use obdfilter, so that mount/remount
will work properly. Fixes "fatal - invalid inode" bug Peter reported.
adilger [Tue, 23 Apr 2002 21:33:05 +0000 (21:33 +0000)]
Commit minor cleanups to reduce size of outstanding changes in my tree.
adilger [Tue, 23 Apr 2002 21:26:12 +0000 (21:26 +0000)]
More changes to OBDDEV so that cleanup works properly.
adilger [Tue, 23 Apr 2002 21:18:49 +0000 (21:18 +0000)]
Another change to OBDDEV so that cleanup works properly.
adilger [Tue, 23 Apr 2002 20:57:36 +0000 (20:57 +0000)]
Use debugging macros to aid in tracing.
adilger [Tue, 23 Apr 2002 20:55:19 +0000 (20:55 +0000)]
Remove extraneous RSH_MDS from elan-server.cfg.
Use OBDDEV for llext3.sh setup script, so cleanup works.
adilger [Tue, 23 Apr 2002 20:30:30 +0000 (20:30 +0000)]
Fix symlinks when building a new tree outside the source tree.
braam [Tue, 23 Apr 2002 19:23:24 +0000 (19:23 +0000)]
- newdev feature in obdctl
braam [Tue, 23 Apr 2002 18:52:46 +0000 (18:52 +0000)]
Description of how to run the tests
adilger [Tue, 23 Apr 2002 07:37:38 +0000 (07:37 +0000)]
Missed one of the "OBDDEV" changes to allow common cleanup in llecho.sh.
adilger [Tue, 23 Apr 2002 07:36:15 +0000 (07:36 +0000)]
Cleanup - avoid extraneous indirection in ost_brw_write_cb().
adilger [Tue, 23 Apr 2002 07:33:49 +0000 (07:33 +0000)]
Minor cleanups, get ready to make statfs not complain.
adilger [Tue, 23 Apr 2002 07:33:04 +0000 (07:33 +0000)]
Add in UUID fields.
adilger [Tue, 23 Apr 2002 07:30:13 +0000 (07:30 +0000)]
Add a couple of helper functions to get ll data from a superblock.
adilger [Tue, 23 Apr 2002 07:28:39 +0000 (07:28 +0000)]
Update scripts to use name2dev.
For OST devices, use "OBDDEV" for all types (ext2obd,filterobd,echo) so that
you can clean them all up the same way.
adilger [Tue, 23 Apr 2002 07:21:40 +0000 (07:21 +0000)]
Clean up and restart before running the removal test.
adilger [Tue, 23 Apr 2002 07:20:35 +0000 (07:20 +0000)]
Use underscores instead of dashes in device environment variables.
adilger [Tue, 23 Apr 2002 07:15:49 +0000 (07:15 +0000)]
Return an error from simple_mkdir() if the target exists and isn't a directory.
adilger [Tue, 23 Apr 2002 07:14:51 +0000 (07:14 +0000)]
Clear the current device in the filehandle if name2dev fails.
pschwan [Mon, 22 Apr 2002 22:06:42 +0000 (22:06 +0000)]
64-bit warning fixes. Someone should take a closer look at ext2_obd.c
adilger [Mon, 22 Apr 2002 18:26:50 +0000 (18:26 +0000)]
Cosmetic cleanup.
adilger [Mon, 22 Apr 2002 18:18:28 +0000 (18:18 +0000)]
Only put the ldlm connection if we are not connected locally.
adilger [Mon, 22 Apr 2002 18:15:42 +0000 (18:15 +0000)]
Add mds_fs_journal_data() method to enable data journaling on last_rcvd file.
adilger [Mon, 22 Apr 2002 18:12:37 +0000 (18:12 +0000)]
Fix minor error in error checking.
adilger [Mon, 22 Apr 2002 18:11:30 +0000 (18:11 +0000)]
Fix MDS dir truncation bug.
braam [Mon, 22 Apr 2002 17:34:25 +0000 (17:34 +0000)]
- minor further changes to the test script:
- add a fail function to common.sh to notify user that umount failed
- give all attaches a name.
- clean up llmountcleanup.sh with name2dev
- remove debugging printouts from obdctl
braam [Mon, 22 Apr 2002 16:51:27 +0000 (16:51 +0000)]
- small changes to name2dev:
Usage:
obdctl > attach osc THEOSC
obdctl > quit
mount -t lustre_lite -o device=`obdctl name2dev THEOSC` none /mnt/lustre
- free a tiny leak
- temporary fix to runfailure-net
- also add to obdctl the setting of an environment variable when
setting the name of a device. Not clear yet if this is useful.
braam [Mon, 22 Apr 2002 06:56:45 +0000 (06:56 +0000)]
- fix mds_connect memory leak
- install name parameter for jt_attach in obdctl
- add name2dev feature to find device by name
braam [Mon, 22 Apr 2002 05:57:35 +0000 (05:57 +0000)]
- rename ha_mgr to recovd
- rename connmgr_obd to recovd_obd
- pack fids as part of body_pack/body_unpack
- do body_pack/unpack for both requests and replies
- clean up 3 different groups of constants:
- PTL_RPC_MSG_ERR/REQUEST -- into _idl: part of lustre_msg
- PTL_RPC_FL_{TIMEOUT,REPLY...,} -- bitmask part of request->rq_flags
will control the state machine for recovery somewhat
- PTL_RPC_TYPE_REQUEST/REPLY -- request->rq_type:
to determine what kind of packet is being sent
- ptlrpc_error: set the msg type field to an error message, otherwise
the reply body is accidentally unpacked
- add a c_level field to the connection: the level will control what
RPC's will go out during recovery and which ones are held up until
recovery completes. This will be compared with an rq_level field
(still to be added).
- mdc_connect further finished:
- it gets the fid of ROOT on the MDS and
- llite/super.c now uses that as the root inode. Didn't see major
havoc.
- the mds has a mds_rootfid field accordingly. This is set in
mds_prep.