+2007-04-23 Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.4.11 / 1.6.1
+ * Support for networks:
+ socklnd - kernels up to 2.6.16
+ qswlnd - Qsnet kernel modules 5.20 and later
+ openiblnd - IbGold 1.8.2
+ o2iblnd - OFED 1.1
+ viblnd - Voltaire ibhost 3.4.5 and later
+ ciblnd - Topspin 3.2.0
+ iiblnd - Infiniserv 3.3 + PathBits patch
+ gmlnd - GM 2.1.22 and later
+ mxlnd - MX 1.2.1 or later
+ ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+ * bug fixes
+
+------------------------------------------------------------------------------
+
+2007-04-01 Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.4.10 / 1.6.0
+ * Support for networks:
+ socklnd - kernels up to 2.6.16
+ qswlnd - Qsnet kernel modules 5.20 and later
+ openiblnd - IbGold 1.8.2
+ o2iblnd - OFED 1.1
+ viblnd - Voltaire ibhost 3.4.5 and later
+ ciblnd - Topspin 3.2.0
+ iiblnd - Infiniserv 3.3 + PathBits patch
+ gmlnd - GM 2.1.22 and later
+ mxlnd - MX 1.2.1 or later
+ ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+ * bug fixes
+
+Severity : major
+Frequency : rare
+Bugzilla : 12458
+Description: Assertion failure in kernel ptllnd caused by posting passive
+ bulk buffers before connection establishment was complete.
+
+Severity : major
+Frequency : rare
+Bugzilla : 12455
+Description: A race in kernel ptllnd between deleting a peer and posting
+ new communications for it could hang communications -
+ manifesting as "Unexpectedly long timeout" messages.
+
+Severity : major
+Frequency : rare
+Bugzilla : 12432
+Description: Kernel ptllnd lock ordering issue could hang a node.
+
+Severity : major
+Frequency : rare
+Bugzilla : 12016
+Description: node crash on socket teardown race
+
+Severity : minor
+Frequency : when 'lctl peer_list' is issued on an MX network
+Bugzilla : 12237
+Description: Enable lctl's peer_list for MXLND
+
+Severity : major
+Frequency : after ptllnd timeouts and portals congestion
+Bugzilla : 11659
+Description: Credit overflows
+Details : This was a bug in ptllnd connection establishment. The fix
+ implements better peer stamps to disambiguate connection
+ establishment and ensure both peers enter the credit flow
+ state machine consistently.
+
+Severity : major
+Frequency : rare
+Bugzilla : 11394
+Description: kptllnd didn't propagate some network errors up to LNET
+Details : This bug was spotted while investigating 11394. The fix
+ ensures network errors on sends and bulk transfers are
+ propagated to LNET/lustre correctly.
+
+Severity : enhancement
+Bugzilla : 10316
+Description: Fixed console chatter in case of -ETIMEDOUT.
+
+Severity : enhancement
+Bugzilla : 11684
+Description: Added D_NETTRACE for recording network packet history
+ (initially only for ptllnd). Also added a separate userspace
+ ptllnd facility to gather history; this should really be
+ covered by D_NETTRACE too, but CDEBUG does not record history
+ in userspace.
+
+Severity : major
+Frequency : rare
+Bugzilla : 11616
+Description: o2iblnd now handles an early RDMA_CM_EVENT_DISCONNECTED.
+Details : If the fabric is lossy, an RDMA_CM_EVENT_DISCONNECTED
+ callback can occur before a connection has actually been
+ established. This caused an assertion failure previously.
+
+Severity : enhancement
+Bugzilla : 11094
+Description: Multiple instances for o2iblnd
+Details : Allow multiple instances of o2iblnd to enable networking over
+ multiple HCAs and routing between them.
+
+Severity : major
+Bugzilla : 11201
+Description: lnet deadlock in router_checker
+Details : turned ksnd_connd_lock, ksnd_reaper_lock, and ksock_net_t:ksnd_lock
+ into BH locks to eliminate potential deadlock caused by
+ ksocknal_data_ready() preempting code holding these locks.
+
+Severity : major
+Bugzilla : 11126
+Description: Millions of failed socklnd connection attempts cause a very slow FS
+Details : Added a new route flag, ksnr_scheduled, distinct from
+ ksnr_connecting, so that a peer's connection request is only
+ rejected to avoid a connection race when an active connection to
+ the same peer is actually in progress (rather than merely scheduled).
+
+------------------------------------------------------------------------------
+
+2007-02-09 Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.4.9
+ * Support for networks:
+ socklnd - kernels up to 2.6.16
+ qswlnd - Qsnet kernel modules 5.20 and later
+ openiblnd - IbGold 1.8.2
+ o2iblnd - OFED 1.1
+ viblnd - Voltaire ibhost 3.4.5 and later
+ ciblnd - Topspin 3.2.0
+ iiblnd - Infiniserv 3.3 + PathBits patch
+ gmlnd - GM 2.1.22 and later
+ mxlnd - MX 1.2.1 or later
+ ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+ * bug fixes
+
+Severity : major on XT3
+Bugzilla : none
+Description: libcfs overwrites /proc/sys/portals
+Details : libcfs created a symlink from /proc/sys/portals to
+ /proc/sys/lnet for backwards compatibility. This symlink is no
+ longer required, and it made the Cray portals /proc variables
+ inaccessible.
+
+Severity : minor
+Bugzilla : 11312
+Description: OFED FMR API change
+Details : This changes parameter usage to reflect a change in
+ ib_fmr_pool_map_phys() between OFED 1.0 and OFED 1.1. Note
+ that FMR support is only used in experimental versions of the
+ o2iblnd - this change does not affect standard usage at all.
+
+Severity : enhancement
+Bugzilla : 11245
+Description: new ko2iblnd module parameter: ib_mtu
+Details : The default IB MTU of 2048 performs badly on 23108 Tavor
+ HCAs. You can avoid this problem by setting the MTU to 1024
+ with this module parameter.
+
+Severity : enhancement
+Bugzilla : 11118/11620
+Description: ptllnd small request message buffer alignment fix
+Details : Set the PTL_MD_LOCAL_ALIGN8 option on small message receives,
+ and round up the small message size on sends in case this option
+ is not supported. Bug 11620 was a defect in the initial
+ implementation that effectively asserted all peers were running
+ the correct protocol version; it was fixed by always NAK-ing such
+ requests and handling any misalignments they introduce.
+
+Severity : minor
+Frequency : rarely
+Description: When kib(nal|lnd)_del_peer() is called upon a peer whose
+ ibp_tx_queue is not empty, kib(nal|lnd)_destroy_peer()'s
+ 'LASSERT(list_empty(&peer->ibp_tx_queue))' will fail.
+
+Severity : enhancement
+Bugzilla : 11250
+Description: Patchless ZC (zero-copy) socklnd
+Details : New socklnd protocol that supports zero-copy without a
+ kernel patch and remains compatible with the old socklnd. The
+ checksum option has moved from tunables to modparams.
+
+Severity : minor
+Frequency : rarely
+Description: When ksocknal_del_peer() is called upon a peer whose
+ ksnp_tx_queue is not empty, ksocknal_destroy_peer()'s
+ 'LASSERT(list_empty(&peer->ksnp_tx_queue))' will fail.
+
+Severity : normal
+Frequency : when ptlrpc is under heavy use and runs out of request buffers
+Bugzilla : 11318
+Description: In lnet_match_blocked_msg(), md can be used without holding a
+ ref on it.
+
+Severity : minor
+Frequency : very rarely
+Bugzilla : 10727
+Description: If ksocknal_lib_setup_sock() fails, a reference on the peer is
+ lost. If connd connects a route that has been closed by
+ ksocknal_shutdown(), ksocknal_create_routes() may create new
+ routes that hold references on the peer, causing the shutdown
+ process to wait forever for the peer to disappear.
+
+Severity : enhancement
+Bugzilla : 11234
+Description: Dump XT3 portals traces on kptllnd timeout
+Details : Set the kptllnd module parameter "ptltrace_on_timeout=1" to
+ dump Cray portals debug traces to a file. The kptllnd module
+ parameter "ptltrace_basename", default "/tmp/lnet-ptltrace",
+ is the basename of the dump file.
+
+Severity : major
+Frequency : infrequent
+Bugzilla : 11308
+Description: fix kernel ptllnd bug in connection re-establishment
+Details : Kernel ptllnd could produce protocol errors (e.g. illegal
+ matchbits) and/or violate the credit flow protocol when trying
+ to re-establish a connection with a peer after an error or
+ timeout.
+
+Severity : enhancement
+Bugzilla : 10316
+Description: Allow /proc/sys/lnet/debug to be set symbolically
+Details : Allow debug and subsystem debug values to be read/set by name
+ in addition to numerically, for ease of use.
+
+Severity : normal
+Frequency : only in configurations with LNET routers
+Bugzilla : 10316
+Description: routes automatically marked down and recovered
+Details : In configurations with LNET routers, if a router fails,
+ routers now actively try to recover routes that are down, unless
+ those routes were marked down by an administrator.
+
+------------------------------------------------------------------------------
+
+2006-12-09 Cluster File Systems, Inc. <info@clusterfs.com>
+
+Severity : critical
+Frequency : very rarely, in configurations with LNET routers and TCP
+Bugzilla : 10889
+Description: incorrect data written to files on OSTs
+Details : In certain high-load conditions incorrect data may be written
+ to files on the OST when using TCP networks.
+
+------------------------------------------------------------------------------
+
+2006-07-31 Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.4.7
+ - rework CDEBUG messages rate-limiting mechanism b=10375
+ - add per-socket tunables for socklnd if the kernel is patched b=10327
+
+------------------------------------------------------------------------------
+
+2006-02-15 Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.4.6
+ - fix use of portals/lnet pid to avoid dropping RPCs b=10074
+ - iiblnd wasn't mapping all memory, resulting in comms errors b=9776
+ - quiet LNET startup LNI message for liblustre b=10128
+ - Better console error messages if 'ip2nets' can't match an IP address
+ - Fixed overflow/use-before-set bugs in linux-time.h
+ - Fixed ptllnd bug that wasn't initialising rx descriptors completely
+ - LNET teardown failed an assertion about the route table being empty
+ - Fixed a crash in LNetEQPoll(<invalid handle>)
+ - Future protocol compatibility work (b_rls146_lnetprotovrsn)
+ - improve debug message for liblustre/Catamount nodes (b=10116)
+
+2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
+ * Configuration change for the XT3
+ The PTLLND is now used to run Lustre over Portals on the XT3.
+ The --with-cray-portals configure option(s) are no longer
+ used. Instead, --with-portals=<path-to-portals-includes> is
+ used to enable building on the XT3. In addition, to enable
+ XT3-specific features, the --enable-cray-xt3 option must be
+ used.
+
+2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
+ * Portals has been removed, replaced by LNET.
+ LNET is the new networking infrastructure for Lustre. It includes a
+ reorganized network configuration mode (see the user
+ documentation for full details) as well as support for routing
+ between different network fabrics. Lustre Networking Devices
+ (LNDs) for the supported network fabrics have also been created
+ for this new infrastructure.
+
+2005-08-08 Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.4.4
+ * bug fixes
+
+Severity : major
+Frequency : rare (large Voltaire clusters only)
+Bugzilla : 6993
+Description: the default number of reserved transmit descriptors was too low
+ for some large clusters
+Details : As a workaround, the number was increased. A proper fix includes
+ a run-time tunable.
+
2005-06-02 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.4.3
* bug fixes
Frequency : occasional (large-scale events, cluster reboot, network failure)
Bugzilla : 6411
Description: too many error messages on console obscure actual problem and
- can slow down/panic server, or cause recovery to fail repeatedly
+ can slow down/panic server, or cause recovery to fail repeatedly
Details : enable rate-limiting of console error messages, and some messages
- that were console errors now only go to the kernel log
+ that were console errors now only go to the kernel log
Severity : enhancement
Bugzilla : 1693
Description: add /proc/sys/portals/catastrophe entry which will report if
- that node has previously LBUGged
+ that node has previously LBUGged
2005-04-06 Cluster File Systems, Inc. <info@clusterfs.com>
* bugs