eeb [Thu, 13 Oct 2005 18:24:45 +0000 (18:24 +0000)]
* the userspace version of LIBCFS_ALLOC() stopped zeroing memory
This was reported as bug 9506
pjkirner [Thu, 13 Oct 2005 15:53:56 +0000 (15:53 +0000)]
* Now handle portals pid correctly (To support catamount clients)
nathan [Tue, 11 Oct 2005 19:26:40 +0000 (19:26 +0000)]
Branch b_hd_newconfig
r=eeb
only drop self-reference on error, not already-initted return value
nathan [Mon, 10 Oct 2005 22:50:21 +0000 (22:50 +0000)]
b=8076
update from b1_4
pjkirner [Mon, 10 Oct 2005 13:53:54 +0000 (13:53 +0000)]
* weak alias doesn't seem to be working, especially on the XT3 Catamount build system. Go to a more explicit registration based on #ifdefs.
pjkirner [Mon, 10 Oct 2005 13:45:45 +0000 (13:45 +0000)]
* Added include/libcfs/types.h to fix the XT3 (Catamount) build error (i.e. no "types.h" in that environment)
liangzhen [Sat, 8 Oct 2005 11:01:32 +0000 (11:01 +0000)]
Fix problem building usocklnd and uptllnd with --disable-modules.
pjkirner [Fri, 7 Oct 2005 17:43:12 +0000 (17:43 +0000)]
* Fix build problems on Catamount
r=eeb
pjkirner [Fri, 7 Oct 2005 17:41:04 +0000 (17:41 +0000)]
* Fixed bug in liblustre that always caused LNET to fail to init
r=eeb
eeb [Fri, 7 Oct 2005 12:48:55 +0000 (12:48 +0000)]
* Candidate fix for 9470: changed LNET config-on-load (which is now the
default) to actually spawn a separate thread to configure LNET since it
looks like prepare/schedule_work() doesn't always avoid the modprobe
deadlock.
eeb [Thu, 6 Oct 2005 17:12:57 +0000 (17:12 +0000)]
* fixed 9468: problem with patch I submitted for 9414 (graceful
handling of gid lookup failures).
pjkirner [Wed, 5 Oct 2005 20:48:13 +0000 (20:48 +0000)]
* Fixed bug with uninitialized data in plni
pjkirner [Wed, 5 Oct 2005 20:47:27 +0000 (20:47 +0000)]
* Fix build error that got committed by accident
* Remove unused variable
pjkirner [Wed, 5 Oct 2005 18:36:35 +0000 (18:36 +0000)]
* Add separate thread for non-critical work
* Moved peer locks so they are not held when Ptl API calls are made
* Fixed location of returning credits and checking sends (to before lnet_parse() is called)
* Added TBD list
nathan [Tue, 4 Oct 2005 22:22:18 +0000 (22:22 +0000)]
Update b1_4_newconfig from b1_4
Add add_conn back to lconf so failover works
pjkirner [Tue, 4 Oct 2005 20:20:43 +0000 (20:20 +0000)]
* PTL_EQ_HANDLER_NONE not defined in standard portals (i.e. on Catamount XT3)
eeb [Tue, 4 Oct 2005 15:07:49 +0000 (15:07 +0000)]
* Added lctl commands...
which_nid NID [NID...] prints the closest NID
list_nids [all] lists local NIDs (all to include lo)
* Stopped 'lctl network' from printing local NIDs when passed 0 args; this
arg overloading was a bit hackish anyway.
pjkirner [Tue, 4 Oct 2005 15:04:58 +0000 (15:04 +0000)]
* Fix XT3 build errors on Catamount (that were already resolved on Linux XT3 build) by making a common file that contains the necessary definitions.
eeb [Sat, 1 Oct 2005 11:48:42 +0000 (11:48 +0000)]
* Updated gm-reg-phys patch with the fix from Myricom that allows
userspace programs to link again.
pjkirner [Fri, 30 Sep 2005 17:58:06 +0000 (17:58 +0000)]
* Fix leaking MD on Put on sender side. Introduced when I added threshold +1 for get operations, which should only have been on the recv side.
* A few more stats to track MDs
* Cleanup a few ASSERTS
pjkirner [Fri, 30 Sep 2005 14:31:16 +0000 (14:31 +0000)]
* Cleanup stats
they are unsigned
shorten names
fix typo
add new stat for credit noop msg
* Fix bug with bucket check rounding error
pjkirner [Fri, 30 Sep 2005 13:07:52 +0000 (13:07 +0000)]
* Adjusted comment on PTL_MD_MAX_IOV now that I've got confirmation from Cray on this fact
pjkirner [Fri, 30 Sep 2005 13:03:38 +0000 (13:03 +0000)]
* Remove unused code
eeb [Fri, 30 Sep 2005 08:23:46 +0000 (08:23 +0000)]
* correction to ip2nets matching code
eeb [Fri, 30 Sep 2005 05:51:17 +0000 (05:51 +0000)]
* Added the 'ip2nets' lnet module parameter. It may be specified
instead of 'networks', and it consists of a list of networks and IP match
expressions. If any IP addresses on the node match the IP match
expression, the given network is added to the list of local networks.
This (along with recent route table fixes etc) allows a single site-wide
modprobe.conf
* Included a recursion check in lnet_finalize(). Recursion through
lnet_return_credits_locked() was possible if a set of blocked buffers
suddenly all complete in-line (e.g. when a peer dies). The recursion check
limits the number of threads actually processing finalized messages to the
number of CPUs.
* Cosmetic changes to /proc/lnet/{buffers,nis,peers} to make the
buffer/credit stats more readable.
* Fixed configure-on-load (it used to deadlock in modprobe when lnet loaded
LNDs) and made it the default.
* Added a shell script 'lnetunload' to unload lnet and any loaded LNDs.
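
Note on the 'ip2nets' entry above: the selection rule it describes (add a network when any local IP matches that network's expression) can be sketched as follows. This is only an illustration, not the lnet implementation. The helper names, the rule struct, the network names in the usage note, and the pattern syntax (four dotted fields, each a decimal number or '*') are all assumptions made for the example; the log does not spell out the real match-expression syntax.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: match a dotted-quad IP against a pattern whose four
 * fields are each a decimal number or '*'. Returns 1 on match, else 0.
 * The pattern syntax is an assumption, not the real ip2nets grammar. */
static int ip_matches(const char *pattern, const char *ip)
{
    assert(pattern != NULL && ip != NULL);

    for (int field = 0; field < 4; field++) {
        char *pend, *iend;
        long ipval = strtol(ip, &iend, 10);

        if (iend == ip)
            return 0;                      /* no digits in this IP field */

        if (*pattern == '*') {
            pend = (char *)pattern + 1;    /* wildcard accepts any value */
        } else {
            long pval = strtol(pattern, &pend, 10);
            if (pend == pattern || pval != ipval)
                return 0;
        }

        pattern = pend;
        ip = iend;

        if (field < 3) {                   /* fields are separated by '.' */
            if (*pattern != '.' || *ip != '.')
                return 0;
            pattern++;
            ip++;
        }
    }
    return *pattern == '\0' && *ip == '\0';
}

/* Hypothetical rule: one network name plus one IP match expression. */
struct ip2net_rule {
    const char *net;
    const char *pattern;
};

/* Collect the networks whose pattern matches any local IP address.
 * Returns the number of networks written to out[]. */
static int select_networks(const struct ip2net_rule *rules, int nrules,
                           const char **local_ips, int nips,
                           const char **out, int max_out)
{
    int n = 0;

    for (int r = 0; r < nrules && n < max_out; r++) {
        for (int i = 0; i < nips; i++) {
            if (ip_matches(rules[r].pattern, local_ips[i])) {
                out[n++] = rules[r].net;
                break;                     /* one match per rule is enough */
            }
        }
    }
    return n;
}
```

With rules {"tcp0", "192.168.0.*"} and {"elan0", "10.1.*.*"}, a node whose addresses are 127.0.0.1 and 192.168.0.17 would select only tcp0, which is how one rule list can serve every node at a site.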
liangzhen [Fri, 30 Sep 2005 04:27:05 +0000 (04:27 +0000)]
- Add new option --disable-libpthread for building the Lustre library;
usocklnd will not be built when the option is used.
- Fix build problem when there is no libpthread on the system.
- Change configuration message to make it clearer.
pjkirner [Thu, 29 Sep 2005 20:55:13 +0000 (20:55 +0000)]
* Fix build breakage on XT3 introduced earlier today
pjkirner [Thu, 29 Sep 2005 20:04:49 +0000 (20:04 +0000)]
* make scheduler threads balance rx and tx more evenly
* move peer allocation outside spin lock
* fix bug with irqs not being disabled on spinlocks that need synchronization with callbacks
eeb [Thu, 29 Sep 2005 14:01:10 +0000 (14:01 +0000)]
* PTL_MTU,PTL_MD_MAX_IOV -> LNET_MTU,LNET_MAX_IOV
* moved portals mtu and max iov out of lib-types.h into types.h so ptllnd
can see PTL_MTU/PTL_MD_MAX_IOV
CAVEAT EMPTOR: this might break ptllnd builds with non-lustre portals
since these defines are not in the official spec. Also, ptllnd really
needs the underlying portals to cope with LNET's MTU and MAX_IOV; it
simply asserts it currently.
eeb [Thu, 29 Sep 2005 13:16:35 +0000 (13:16 +0000)]
* cleaned up socklnd descriptor handling
eeb [Thu, 29 Sep 2005 12:13:04 +0000 (12:13 +0000)]
* More PORTALS -> LND renaming
eeb [Thu, 29 Sep 2005 00:11:36 +0000 (00:11 +0000)]
* Changed LNetGet() LNetPut() to include a source nid parameter
Set this to LNET_NID_ANY to have the source nid set automatically,
otherwise it checks that the destination is reachable from the given
source NID and sends using that. This ensures the source NID is
deterministic in the presence of equivalent routes.
* Changed LNetDist() to return the count of networks traversed rather than
routers (i.e. 0 == local, 1 == local net, 2 == via 1 router etc). It also
returns a source NID (== NID of local LNET interface that will send).
* Changed LNet() completion events to include the target process ID so the app
can tell which NID the sender sent to.
* Removed the 'implicit_loopback' lnet module parameter. You have to use
0@lo explicitly if you want loopback.
* Changed lustre to remember the source NID returned by LNetDist() to ensure
client requests always arrive with a consistent initiator NID.
* Changed lustre to remember the target NID of incoming requests to ensure
server replies are always sent with the same source NID that the client
sent her requests to.
* Changed lustre to use a NID of 0@lo if the distance to the target NID is 0
(i.e. use the loopback LNET interface when talking to local servers).
pjkirner [Wed, 28 Sep 2005 18:01:31 +0000 (18:01 +0000)]
* Update license text for all PTLLND from phil's boilerplate
pjkirner [Wed, 28 Sep 2005 17:52:16 +0000 (17:52 +0000)]
* Add eager_recv
eeb [Wed, 28 Sep 2005 10:07:00 +0000 (10:07 +0000)]
* minor build fixes for ulnds/ptllnd
pjkirner [Wed, 28 Sep 2005 00:35:48 +0000 (00:35 +0000)]
* Add ptllnd.h to libptllnd_a_SOURCES
pjkirner [Wed, 28 Sep 2005 00:01:08 +0000 (00:01 +0000)]
* Commit small patch for lnet to allow Lustre to build against a 2.6.12 kernel.
(From adilger)
eeb [Tue, 27 Sep 2005 15:33:01 +0000 (15:33 +0000)]
* first cut ulnds/ptllnd
pjkirner [Tue, 27 Sep 2005 13:51:27 +0000 (13:51 +0000)]
Patch by Liang
1. lnet/ulnds/socklnd/debug.c will not be built anymore; lnet/libcfs/libcfs.a will be created while building liblustre, and the functions in lnet/ulnds/socklnd/debug.c are moved to lnet/libcfs/debug.c.
2. LNET_SINGLE_THREADED will be set to 1 if no libpthread is found.
3. usocklnd will not be built if there is no libpthread (I'm not sure if it's reasonable)
pjkirner [Mon, 26 Sep 2005 14:15:28 +0000 (14:15 +0000)]
* Cleanup NULL ptr usage
pjkirner [Mon, 26 Sep 2005 13:54:42 +0000 (13:54 +0000)]
* Complete the partial checkin of the MSG_TYPE_PUT typo. The build works again.
eeb [Sun, 25 Sep 2005 18:31:01 +0000 (18:31 +0000)]
* fixed typo PTLLND_MSG_TYPE_PUT
eeb [Fri, 23 Sep 2005 16:25:43 +0000 (16:25 +0000)]
* Added check for non-deterministic routes
pjkirner [Fri, 23 Sep 2005 15:00:56 +0000 (15:00 +0000)]
* Resolved "unload problem" due to PTL_MD_LUSTRE_COMPLETION_SEMANTICS not being specified explicitly when running on Cray Portals.
* Removed TESTING_WITH_LOOPBACK (early unit testing code will never have any value in the future)
* Added more comments on the differences between Portals implementations in the headers
eeb [Fri, 23 Sep 2005 14:30:47 +0000 (14:30 +0000)]
* Changed parsing of 'routes=' to allow routes via different networks
and with different hopcounts; it just uses the shortest hopcount.
NB this still needs a change to LNet{Put,Get} to allow the caller to
specify the source NI and for ptlrpc to use it.
* forwarding="enabled" explicitly enables the node as a router
forwarding="disabled" explicitly disables the node as a router
otherwise if any of the node's NIDs are mentioned as routers in 'routes='
the node is implicitly enabled as a router.
* Added LNET_SINGLE_THREADED to tell LNET if the userspace runtime supports
posix threads.
* Added lnd_wait() to tell LNET to call into the LND if the app wants to
block for an event.
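
Note on the routing entries above: the 'forwarding' rule (explicit "enabled" or "disabled", otherwise implied by whether the node's NIDs appear as routers in 'routes=') and the shortest-hopcount choice can be sketched as below. The function names, the struct, and the -1 return for an unrecognised value are assumptions for illustration; this is not the lnet parser.

```c
#include <assert.h>
#include <string.h>

/* Decide whether this node forwards messages.
 * forwarding: value of the 'forwarding' parameter, or NULL/"" if unset.
 * nid_listed_as_router: non-zero if any local NID is named as a router
 *                       in 'routes='.
 * Returns 1 (router), 0 (not a router), or -1 for an unrecognised value
 * (the -1 case is an assumption of this sketch). */
static int routing_enabled(const char *forwarding, int nid_listed_as_router)
{
    if (forwarding == NULL || forwarding[0] == '\0')
        return nid_listed_as_router ? 1 : 0;   /* implicit rule */
    if (strcmp(forwarding, "enabled") == 0)
        return 1;
    if (strcmp(forwarding, "disabled") == 0)
        return 0;
    return -1;
}

/* Hypothetical route record: a gateway name and its hop count. */
struct route_sketch {
    const char *gateway;
    unsigned int hops;
};

/* Return the route with the smallest hop count, or NULL if there is none.
 * On a tie the earliest entry wins. */
static const struct route_sketch *
shortest_route(const struct route_sketch *routes, int n)
{
    const struct route_sketch *best = NULL;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (best == NULL || routes[i].hops < best->hops)
            best = &routes[i];
    }
    return best;
}
```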
liangzhen [Fri, 23 Sep 2005 12:01:32 +0000 (12:01 +0000)]
1. Add new option --disable-usocklnd, to disable build & link of libsocklnd.
2. Small fix for linking liblustre.a
pjkirner [Fri, 23 Sep 2005 02:38:05 +0000 (02:38 +0000)]
* Fixed problem where struct iovec != ptl_md_iovec_t, causing >1 page PtlMDAttach() to fail with error 26.
pjkirner [Thu, 22 Sep 2005 21:29:33 +0000 (21:29 +0000)]
* ptllnd stats via /proc entry
* a few more stats.
pjkirner [Thu, 22 Sep 2005 19:07:37 +0000 (19:07 +0000)]
* Fix KIOV mapping when used with CRAY PORTALS
* Replaced KIOV mapping for LUSTRE PORTALS that was previously there.
pjkirner [Thu, 22 Sep 2005 17:34:14 +0000 (17:34 +0000)]
* Invalid flags to cfs_mem_cache_alloc (Caused oops on XT3)
pjkirner [Thu, 22 Sep 2005 15:42:27 +0000 (15:42 +0000)]
* Remove non-compatible irqs_disabled(). Basically backing out the changes from last night that introduced it.
eeb [Thu, 22 Sep 2005 13:05:19 +0000 (13:05 +0000)]
* Fixed bugs in new lnet_[k]iov2[k]iov() that skipped fragments incorrectly.
These bugs would manifest as data corruption or crashes.
* Reduced verbosity of router buffer allocation CDEBUGs
eeb [Thu, 22 Sep 2005 10:59:57 +0000 (10:59 +0000)]
* removed #include <asm/arch/system.h> from kp30.h
* Fixed bug in lnet_send() that broke portals compatibility.
* Fixed places where lnd_eager_recv() fails but lnet doesn't decref
the relevant peer before freeing the msg.
pjkirner [Thu, 22 Sep 2005 04:15:10 +0000 (04:15 +0000)]
* Use a real PID
* Fixed defect in a case where the code did lock,action,lock rather than lock,action,unlock
* Fixed locking in error path (x2)
* Use Cray SS IFACE type
pjkirner [Thu, 22 Sep 2005 03:23:27 +0000 (03:23 +0000)]
* Fixed asserts
pjkirner [Thu, 22 Sep 2005 03:07:33 +0000 (03:07 +0000)]
* Additional checks for correct locking
pjkirner [Thu, 22 Sep 2005 03:04:31 +0000 (03:04 +0000)]
* Add support for irqs_disabled() on all platforms
pjkirner [Wed, 21 Sep 2005 21:19:21 +0000 (21:19 +0000)]
* Add additional asserts to check caller context
pjkirner [Wed, 21 Sep 2005 20:41:14 +0000 (20:41 +0000)]
* Cleaned up send path
* Fixed lock being held while calling lnet_finalize (only in error path)
* Fixed calls to kptllnd_peer_queue_tx_locked(), to actually hold the lock.
eeb [Wed, 21 Sep 2005 18:45:40 +0000 (18:45 +0000)]
* Moved some #defines out of klnds/ptllnd.h into a shared place
pjkirner [Wed, 21 Sep 2005 18:31:25 +0000 (18:31 +0000)]
* Adjusted TX Pool to meet new LNET requirements (no blocking)
* Fixed bug with routed REPLY
pjkirner [Wed, 21 Sep 2005 18:15:26 +0000 (18:15 +0000)]
* Fix a typo
eeb [Wed, 21 Sep 2005 17:45:46 +0000 (17:45 +0000)]
* Fixed LND registration / setting the_lnet.ln_init ordering
eeb [Wed, 21 Sep 2005 16:54:30 +0000 (16:54 +0000)]
* Added lnet/ulnds/ptllnd
* Changed userspace LNET to register all LNDs it has been linked with
and construct the default set of networks from them.
* Added support for userspace LNET config via environment variables
LNET_NETWORKS and LNET_ROUTES. They work just like the kernel module
parameters.
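
Note on the LNET_NETWORKS / LNET_ROUTES entry above: reading configuration from these two environment variables might look roughly like the sketch below. The struct and function names are hypothetical, and the fallback to a caller-supplied default (standing in for the set of networks built from the linked LNDs) is an assumption of this example rather than the userspace LNET code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for the two userspace configuration strings. */
struct lnet_env_config {
    const char *networks;
    const char *routes;
};

/* Read LNET_NETWORKS and LNET_ROUTES from the environment.
 * default_networks is used when LNET_NETWORKS is unset or empty; in this
 * sketch it stands in for the default set built from the linked LNDs. */
static struct lnet_env_config read_lnet_env(const char *default_networks)
{
    struct lnet_env_config cfg;
    const char *value;

    assert(default_networks != NULL);

    value = getenv("LNET_NETWORKS");
    cfg.networks = (value != NULL && value[0] != '\0') ? value
                                                       : default_networks;

    value = getenv("LNET_ROUTES");
    cfg.routes = (value != NULL) ? value : "";   /* no routes by default */

    return cfg;
}
```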
pjkirner [Wed, 21 Sep 2005 14:30:36 +0000 (14:30 +0000)]
* Fixed dist problem with ptllnd_wire.h file that moved
* Fixed dist problem with missing file in libcfs/linux
eeb [Wed, 21 Sep 2005 08:21:08 +0000 (08:21 +0000)]
* fixed a ./config warning
* moved klnds/ptllnd/ptllnd_wire.h into lnet/include/lnet/ so
ulnds/ptllnd can get at it.
* fixed/added some .cvsignore files
* fixed ulnds/tcplnd to give itself some send credits
pjkirner [Wed, 21 Sep 2005 05:13:14 +0000 (05:13 +0000)]
* Fixed some XT3 Compilation issues
* Clarified a few LNET vs PTL ambiguities
* Added code for mapping kiov -> iovec
pjkirner [Wed, 21 Sep 2005 02:08:05 +0000 (02:08 +0000)]
b=7982
* Portals LND
pjkirner [Wed, 21 Sep 2005 00:38:08 +0000 (00:38 +0000)]
* Undoing changes from the b_newconfig_rdmarouting landing that have negatively affected the ptllrpc build.
pjkirner [Tue, 20 Sep 2005 18:10:37 +0000 (18:10 +0000)]
* Fix buffalo build error with removed file.
pjkirner [Tue, 20 Sep 2005 17:43:10 +0000 (17:43 +0000)]
* Removed problematic building of ut (unit test tool) so we can move ahead with buffalo testing.
eeb [Tue, 20 Sep 2005 17:19:08 +0000 (17:19 +0000)]
* removed lnet_parse() rc ambiguity: 0 on success
pjkirner [Tue, 20 Sep 2005 17:00:04 +0000 (17:00 +0000)]
b=7981
* Landing of b_newconfig_rdmarouting
* Passed sanity.sh
* 9348 is still open, but this landing hasn't introduced it.
pjkirner [Mon, 19 Sep 2005 21:56:31 +0000 (21:56 +0000)]
* Fix incorrect path in ogdb file
pjkirner [Mon, 19 Sep 2005 20:41:31 +0000 (20:41 +0000)]
* Fixed problem with distributed build.
pjkirner [Mon, 19 Sep 2005 13:50:41 +0000 (13:50 +0000)]
* Add simple LNET unit test modules
pjkirner [Fri, 16 Sep 2005 16:42:38 +0000 (16:42 +0000)]
* Fixed build warning (and possible 64-bit error)
pjkirner [Fri, 16 Sep 2005 15:33:32 +0000 (15:33 +0000)]
b=8021
* Landing EEB's b_newconfig_rdmarouting branch
pjkirner [Fri, 16 Sep 2005 13:21:26 +0000 (13:21 +0000)]
* Apply Nikita's patch from portals tree to lnet
pjkirner [Fri, 16 Sep 2005 13:10:48 +0000 (13:10 +0000)]
* Removed extra unnecessary message
liangzhen [Fri, 16 Sep 2005 04:12:40 +0000 (04:12 +0000)]
Fix problem building ptllnd.
pjkirner [Thu, 15 Sep 2005 13:32:28 +0000 (13:32 +0000)]
* Fix 2.6.5 build issue
liangzhen [Thu, 15 Sep 2005 13:17:32 +0000 (13:17 +0000)]
Remove unused socklnd files in ulnds; they have been
moved to ulnds/socklnd.
liangzhen [Thu, 15 Sep 2005 09:56:38 +0000 (09:56 +0000)]
1. two options for build
a. --with-portals=<path to portals>, build ptllnd with external portals
b. --with-lustre-portals, build ptllnd and lustre portals
2. ulnd build patch, tcplnd is built as lnet/ulnds/socklnd
3. small fix for lnet/ulnds/socklnd
eeb [Thu, 15 Sep 2005 08:54:50 +0000 (08:54 +0000)]
* Moved PTL_{MTU,MD_MAX_IOV} into types.h
* Removed ref in lustre_net.h to lib-types.h and simplified how
PTLRPC_MAX_BRW_{SIZE,PAGES} are defined.
nathan [Wed, 14 Sep 2005 19:04:23 +0000 (19:04 +0000)]
fix build
pjkirner [Wed, 14 Sep 2005 13:22:02 +0000 (13:22 +0000)]
b=9318
r=eeb
* Removed MOST of the instances of CRAY_PORTALS, especially the ones that were related to the networking portion.
* In addition to what was in the patch for 9318, also included the removal of build_check.h and references.
Note: This does NOT complete the work on this bug; there are still a number of outstanding references to CRAY_PORTALS.
pjkirner [Wed, 14 Sep 2005 03:56:39 +0000 (03:56 +0000)]
* Fixed LNET undefined symbol PDE() problem on 2.4 kernel (specifically Cray XT3)
nathan [Tue, 13 Sep 2005 03:28:32 +0000 (03:28 +0000)]
b=8080
update from b1_4
eeb [Mon, 12 Sep 2005 17:41:23 +0000 (17:41 +0000)]
* Changed nal_send() to include 'target_is_router' and 'routing' flags
Where 'target_is_router' == the immediate destination is a router
and 'routing' == This message is being forwarded from another LND.
NB The routing flag isn't set yet (but will be when all routing is done in
lib-move).
* Added support for RDMA-ed REPLYs in all relevant LNDs ready for RDMA
routing. LNDs must send IMMEDIATE GETs if the local node or the target
are routers, but may RDMA the REPLY (just like a PUT) on the return
route.
eeb [Mon, 12 Sep 2005 14:17:32 +0000 (14:17 +0000)]
* viblnd: applied ARP retry patch
eeb [Mon, 12 Sep 2005 13:49:36 +0000 (13:49 +0000)]
* tidied up NID printing (s/LPX64/%s/ && s/nid/libcfs_nid2str(nid)/)
eeb [Sun, 11 Sep 2005 13:54:35 +0000 (13:54 +0000)]
* Cleaned up portals compatibility tests into a couple of inlines
in lib-lnet.h
* Added (but didn't test) portals compatibility support for
gm, openib and ra.
eeb [Sat, 10 Sep 2005 17:12:16 +0000 (17:12 +0000)]
* Added check for portals compatibility mode in LNDs that don't support it
yet.
eeb [Sat, 10 Sep 2005 17:05:04 +0000 (17:05 +0000)]
* Got vibnal LNET/portals wire compatibility working
* Removed bad LPSZ in format strings (I got rid of lnet_size_t)
* Changed vibnal NID printing from LPX64 to %s(libcfs_nid2str(nid))
eeb [Sat, 10 Sep 2005 03:48:48 +0000 (03:48 +0000)]
* LNET/portals wire compatibility working on elan and tcp. Set the lnet
module parameter "portals_compatibility" to...
"strong" Compatible with portals and LNET "strong" and "weak"
"weak" Compatible with any value of LNET portals_compatibility
"none" Compatible with LNET "weak" and "none". This is the default.
Old XML and existing old configuration profiles (logs) can be used as-is.
* Updated GM README
* Backed out most of the change to lconf that used hostaddr to construct the
LNET NID. It now signals an error if the XML contains > 1 --hostaddr, or
if the --hostaddr doesn't match the NID, since it's likely manual
intervention will be required in these cases.
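
Note on the "portals_compatibility" settings above: read as a pairwise rule, the three values give the compatibility check sketched below. The enum and function names are hypothetical and this is not the lnet code. It also assumes that a legacy portals peer can only talk to a "strong" node, since the log names portals only under "strong".

```c
#include <assert.h>

/* The three module parameter values, plus a marker for a legacy portals
 * peer. The enum itself is an assumption made for this sketch. */
enum compat_mode {
    COMPAT_NONE,     /* "none": the default */
    COMPAT_WEAK,     /* "weak" */
    COMPAT_STRONG,   /* "strong" */
    PEER_PORTALS     /* peer runs portals, not LNET */
};

/* Returns 1 if a local LNET node in mode 'local' can talk to 'peer'. */
static int peers_compatible(enum compat_mode local, enum compat_mode peer)
{
    assert(local != PEER_PORTALS);   /* the local node is always LNET here */

    if (peer == PEER_PORTALS)
        return local == COMPAT_STRONG;
    if (local == COMPAT_WEAK || peer == COMPAT_WEAK)
        return 1;                    /* "weak" works with any LNET value */
    return local == peer;            /* strong<->strong or none<->none */
}
```

Under this reading, "weak" is the stepping stone for a rolling upgrade: a "weak" node talks to both "strong" and "none" nodes, while "strong" and "none" nodes cannot talk to each other directly.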
liangzhen [Fri, 9 Sep 2005 15:40:08 +0000 (15:40 +0000)]
optional build for portals.
1. Portals will not be built by default
2. To build portals: configure --with-portals=yes .....
pjkirner [Thu, 8 Sep 2005 20:37:10 +0000 (20:37 +0000)]
* Fix ogdb-host file generation, pickup the correct lnet modules as well as the legacy portals modules (for testing purposes only)
eeb [Thu, 8 Sep 2005 17:18:24 +0000 (17:18 +0000)]
* Added GM README
eeb [Thu, 8 Sep 2005 15:18:55 +0000 (15:18 +0000)]
* Removed unused parameters from LNet??? APIs (e.g. interface handle)
* Removed unused LNet??? APIs.
* Removed many scalar typedefs inherited from portals.
* fixed up alignment in some decls that s/ptl/lnet/ had unaligned.
* updated sanity.sh to s/portals.debug/lnet.debug/
* verified lnet can zeroconf mount a pre-lnet filesystem after
lctl --write_config <pre-lnet-xml>