-tbd Sun Microsystems, Inc.
+TBD Intel Corporation
+ * version 2.2.0
+ * Support for networks:
+ socklnd - any kernel supported by Lustre,
+ o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
+ mxlnd - MX 1.2.10 or later
+ ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+-------------------------------------------------------------------------------
+
+09-30-2011 Whamcloud, Inc.
+ * version 2.1.0
+ * Support for networks:
+ socklnd - any kernel supported by Lustre,
+ o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
+ * Available but unsupported:
+ mxlnd - MX 1.2.10 or later
+ ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+-------------------------------------------------------------------------------
+
+2010-07-15 Oracle, Inc.
+ * version 2.0.0
+ * Support for networks:
+ socklnd - any kernel supported by Lustre,
+ qswlnd - Qsnet kernel modules 5.20 and later,
+ openiblnd - IbGold 1.8.2,
+ o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
+ viblnd - Voltaire ibhost 3.4.5 and later,
+ ciblnd - Topspin 3.2.0,
+ iiblnd - Infiniserv 3.3 + PathBits patch,
+ gmlnd - GM 2.1.22 and later,
+ mxlnd - MX 1.2.10 or later,
+ ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+Severity : minor
+Bugzilla : 21459
+Description: should update lp_alive for non-router peers
+
+Severity : enhancement
+Bugzilla : 15332
+Description: LNet router shuffler.
+
+Severity : enhancement
+Bugzilla : 15332
+Description: LNet fine grain routing support.
+
+Severity : normal
+Bugzilla : 20171
+Description: router checker stops working when system wall clock goes backward
+Details : use monotonic timing source instead of system wall clock time.
+
+Severity : enhancement
+Bugzilla : 18460
+Description: avoid asymmetrical router failures
+
+Severity : enhancement
+Bugzilla : 19735
+Description: multiple-instance support for kptllnd
+
+Severity : normal
+Bugzilla : 20897
+Description: ksocknal_close_conn_locked connection race
+Details : A race was possible when ksocknal_create_conn called
+ ksocknal_close_conn_locked for an already closed conn.
+
+Severity : normal
+Bugzilla : 18102
+Description: router_proc.c was rewritten to use the sysctl interface for
+ parameters residing in /proc/sys/lnet
+
+Severity : enhancement
+Bugzilla : 13065
+Description: port router pinger to userspace
+
+Severity : normal
+Bugzilla : 17546
+Description: kptllnd HELLO protocol deadlock
+Details : kptllnd HELLO protocol doesn't run to completion in finite time
+
+Severity : normal
+Bugzilla : 18075
+Description: LNet selftest fixes and enhancements
+
+Severity : enhancement
+Bugzilla : 19156
+Description: allow a test node to be a member of multiple test groups
+
+Severity : enhancement
+Bugzilla : 18654
+Description: MXLND: eliminate hosts file, use ARP for peer nic_id resolution
+Details : an update from the upstream developer Scott Atchley.
+
+Severity : enhancement
+Bugzilla : 15332
+Description: add a new LND option to control peer buffer credits on routers
+
+Severity : normal
+Bugzilla : 18844
+Description: Fixing deadlock in usocklnd
+Details : A deadlock was possible in usocklnd due to a race condition
+ while tearing a connection down. The problem resulted from the
+ erroneous assumption that lnet_finalize() could be called while
+ holding some LND-level locks.
+
+Severity : major
+Bugzilla : 13621, 15983
+Description: Protocol V2 of o2iblnd
+Details : o2iblnd V2 has several new features:
+ . map-on-demand: map-on-demand is disabled by default. It can
+ be enabled with the modparam "map_on_demand=@value@", where @value@
+ should be >= 0 and < 256; 0 disables map-on-demand, and any other
+ valid value enables it.
+ o2iblnd will create an FMR or physical MR for RDMA if the number
+ of fragments of an RD > @value@.
+ Enabling map-on-demand uses less memory for each new connection,
+ but slightly more CPU for RDMA.
+ . iWARP: to support iWARP, please enable map-on-demand; 32 and 64
+ are the recommended values. iWARP will probably fail for values >= 128.
+ . OOB NOOP message: to resolve deadlock on routers.
+ . tunable peer_credits_hiw (high water mark for returning credits):
+ the default value of peer_credits_hiw is (peer_credits - 1);
+ users can set it between peer_credits/2 and (peer_credits - 1).
+ A lower value is recommended for high-latency networks.
+ . tunable message queue size: it always equals peer_credits;
+ a higher value is recommended for high-latency networks.
+ . It is compatible with earlier versions of o2iblnd.
+
+Severity : normal
+Bugzilla : 18414
+Description: Fixing 'running out of ports' issue
+Details : Add a delay before the next reconnect attempt in ksocklnd
+ when a connection race is lost. Limit the frequency of query
+ requests in LNet. Improve handling of 'dead peer' notifications
+ in LNet.
+
+Severity : normal
+Bugzilla : 16034
+Description: Change ptllnd timeout and watchdog timers
+Details : Add ptltrace_on_nal_failed and bump ptllnd timeout to match
+ Portals wire timeout.
+
+Severity : normal
+Bugzilla : 16186
+Description: One down Lustre FS hangs ALL mounted Lustre filesystems
+Details : Shared routing enhancements - peer health detection.
+
+Severity : enhancement
+Bugzilla : 14132
+Description: acceptor.c cleanup
+Details : Code duplication in acceptor.c between the kernel and
+ user-space cases was removed. User-space libcfs tcpip primitives
+ were made uniform, with prototypes similar to the kernel ones.
+ Minor cosmetic changes in usocklnd to use cfs_socket_t as the
+ representation of a socket.
+
+Severity : minor
+Bugzilla : 11245
+Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off
+Details : See comment 46 in bug 11245 for details - it's indeed a bug
+ introduced by the original 11245 fix.
+
+Severity : minor
+Bugzilla : 15984
+Description: uptllnd credit overflow fix
+Details : kptl_msg_t::ptlm_credits could be overflowed by uptllnd since
+ it is only a __u8.
+
+Severity : major
+Bugzilla : 14634
+Description: socklnd protocol version 3
+Details : With protocol V2, connections on a router can be blocked and
+ unable to receive incoming messages when router buffers are
+ exhausted, so ZC-ACKs cannot be handled (the LNet message cannot
+ be finalized), causing a deadlock on the router.
+ Protocol V3 has a dedicated connection for emergency messages
+ such as ZC-ACK to the router; messages on this dedicated connection
+ don't need any credits, so they are never blocked. V3 can also send
+ keepalive pings at a specified interval for router health checking.
+
+-------------------------------------------------------------------------------
+
+12-31-2008 Sun Microsystems, Inc.
+ * version 1.8.0
+ * Support for networks:
+ socklnd - any kernel supported by Lustre,
+ qswlnd - Qsnet kernel modules 5.20 and later,
+ openiblnd - IbGold 1.8.2,
+ o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
+ viblnd - Voltaire ibhost 3.4.5 and later,
+ ciblnd - Topspin 3.2.0,
+ iiblnd - Infiniserv 3.3 + PathBits patch,
+ gmlnd - GM 2.1.22 and later,
+ mxlnd - MX 1.2.1 or later,
+ ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+Severity : major
+Bugzilla : 15983
+Description: workaround for OOM from o2iblnd
+Details : OFED needs to allocate a large chunk of memory for the QP when
+ creating an o2iblnd connection; OOM can occur if no such
+ contiguous memory chunk is available.
+ QP size is determined by the o2iblnd concurrent_sends and
+ max_fragments parameters. Users may now specify a smaller value
+ for concurrent_sends (e.g. concurrent_sends=7), which decreases
+ the memory block size required to create the QP.
+
+Severity : major
+Bugzilla : 15093
+Description: Support Zerocopy receive of Chelsio device
+Details : Chelsio driver can support zerocopy for iov[1] if it's
+ contiguous and large enough.
+
+Severity : normal
+Bugzilla : 13490
+Description: fix credit flow deadlock in uptllnd
+
+Severity : normal
+Bugzilla : 16308
+Description: finalize network operation in reasonable time
+Details : conf-sanity test_32a couldn't stop the OST and MDS because it
+ tried to access a non-existent peer, and the TCP connect took
+ a long time to time out.
+
+Severity : major
+Bugzilla : 16338
+Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
+Details : Lost reference on conn prevents peer from being destroyed, which
+ could prevent new peer creation if peer count has reached upper
+ limit.
+
+Severity : normal
+Bugzilla : 16102
+Description: LNET Selftest results in Soft lockup on OSS CPU
+Details : only hits when 8 or more o2ib clients are involved and a session
+ is torn down with 'lst end_session' without a preceding 'lst stop'.
+
+Severity : minor
+Bugzilla : 16321
+Description: concurrent_sends in IB LNDs should not be changeable at run time
+Details : concurrent_sends in IB LNDs should not be changeable at run time
+
+Severity : normal
+Bugzilla : 15272
+Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
+Details : only hits under out-of-memory situations
+
+
+-------------------------------------------------------------------------------
+
+2009-02-07 Sun Microsystems, Inc.
+ * version 1.6.7
+ * Support for networks:
+ socklnd - any kernel supported by Lustre,
+ qswlnd - Qsnet kernel modules 5.20 and later,
+ openiblnd - IbGold 1.8.2,
+ o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
+ viblnd - Voltaire ibhost 3.4.5 and later,
+ ciblnd - Topspin 3.2.0,
+ iiblnd - Infiniserv 3.3 + PathBits patch,
+ gmlnd - GM 2.1.22 and later,
+ mxlnd - MX 1.2.1 or later,
+ ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+Severity : major
+Bugzilla : 15983
+Description: workaround for OOM from o2iblnd
+Details : OFED needs to allocate a large chunk of memory for the QP when
+ creating an o2iblnd connection; OOM can occur if no such
+ contiguous memory chunk is available.
+ QP size is determined by the o2iblnd concurrent_sends and
+ max_fragments parameters. Users may now specify a smaller value
+ for concurrent_sends (e.g. concurrent_sends=7), which decreases
+ the memory block size required to create the QP.
+
+Severity : major
+Bugzilla : 15093
+Description: Support Zerocopy receive of Chelsio device
+Details : Chelsio driver can support zerocopy for iov[1] if it's
+ contiguous and large enough.
+
+Severity : normal
+Bugzilla : 13490
+Description: fix credit flow deadlock in uptllnd
+
+Severity : normal
+Bugzilla : 16308
+Description: finalize network operation in reasonable time
+Details : conf-sanity test_32a couldn't stop the OST and MDS because it
+ tried to access a non-existent peer, and the TCP connect took
+ a long time to time out.
+
+Severity : major
+Bugzilla : 16338
+Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
+Details : Lost reference on conn prevents peer from being destroyed, which
+ could prevent new peer creation if peer count has reached upper
+ limit.
+
+Severity : normal
+Bugzilla : 16102
+Description: LNET Selftest results in Soft lockup on OSS CPU
+Details : only hits when 8 or more o2ib clients are involved and a session
+ is torn down with 'lst end_session' without a preceding 'lst stop'.
+
+Severity : minor
+Bugzilla : 16321
+Description: concurrent_sends in IB LNDs should not be changeable at run time
+Details : concurrent_sends in IB LNDs should not be changeable at run time
+
+-------------------------------------------------------------------------------
+
+11-03-2008 Sun Microsystems, Inc.
+ * version 1.6.6
+ * Support for networks:
+ socklnd - any kernel supported by Lustre,
+ qswlnd - Qsnet kernel modules 5.20 and later,
+ openiblnd - IbGold 1.8.2,
+ o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
+ viblnd - Voltaire ibhost 3.4.5 and later,
+ ciblnd - Topspin 3.2.0,
+ iiblnd - Infiniserv 3.3 + PathBits patch,
+ gmlnd - GM 2.1.22 and later,
+ mxlnd - MX 1.2.1 or later,
+ ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+Severity : normal
+Bugzilla : 15272
+Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
+Details : only hits under out-of-memory situations
+
+-------------------------------------------------------------------------------
+
+04-26-2008 Sun Microsystems, Inc.
* version 1.6.5
* Support for networks:
socklnd - any kernel supported by Lustre,
gmlnd - GM 2.1.22 and later,
mxlnd - MX 1.2.1 or later,
ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+Severity : normal
+Bugzilla : 14322
+Description: excessive debug information removed
+Details : excessive debug information removed
+
+Severity : major
+Bugzilla : 15712
+Description: ksocknal_create_conn() hit ASSERTION during connection race
+Details : ksocknal_create_conn() hit ASSERTION during connection race
+
+Severity : major
+Bugzilla : 13983
+Description: ksocknal_send_hello() hit ASSERTION during connection race
+Details : ksocknal_send_hello() hit ASSERTION during connection race
+
+Severity : major
+Bugzilla : 14425
+Description: o2iblnd/ptllnd credit deadlock in a routed config.
+Details : o2iblnd/ptllnd credit deadlock in a routed config.
+
Severity : normal
Bugzilla : 14956
Description: High load after starting lnet
Description: ksocklnd fails to establish connection if accept_port is high
Details : PID remapping must not be done for active (outgoing) connections
+
--------------------------------------------------------------------------------
2008-01-11 Sun Microsystems, Inc.
gmlnd - GM 2.1.22 and later,
mxlnd - MX 1.2.1 or later,
ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
Severity : normal
Bugzilla : 14387
Description: liblustre network error
Bugzilla : 12302
Description: new userspace socklnd
Details : Old userspace tcpnal that resided in lnet/ulnds/socklnd replaced
- with new one - usocklnd.
+ with new one - usocklnd.
Severity : enhancement
Bugzilla : 11686
Bugzilla : 13236
Description: TOE Kernel panic by ksocklnd
Details : offloaded sockets provide their own implementation of sendpage,
- can't call tcp_sendpage() directly
+ can't call tcp_sendpage() directly
Severity : normal
Bugzilla : 10778
Description: kibnal_shutdown() doesn't finish; lconf --cleanup hangs
Details : races between lnd_shutdown and peer creation prevent
- lnd_shutdown from finishing.
+ lnd_shutdown from finishing.
Severity : normal
Bugzilla : 13279
Description: open files rlimit 1024 reached while liblustre testing
Details : ulnds/socklnd must close open socket after unsuccessful
- 'say hello' attempt.
+ 'say hello' attempt.
Severity : major
Bugzilla : 13482
Description: build error
Details : fix typos in gmlnd, ptllnd and viblnd
-------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
2007-07-30 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.6.1
mxlnd - MX 1.2.1 or later,
ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+--------------------------------------------------------------------------------
+
2007-06-21 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.4.11
* Support for networks:
Severity : major
Bugzilla : 12014
Description: ASSERTION failures when upgrading to the patchless zero-copy
- socklnd
+ socklnd
Details : This bug affects "rolling upgrades", causing an inconsistent
- protocol version negotiation and subsequent assertion failure
+ protocol version negotiation and subsequent assertion failure
during rolling upgrades after the first wave of upgrades.
Severity : minor
Bugzilla : 11223
Details : Change "dropped message" CERRORs to D_NETERROR so they are
- logged instead of creating "console chatter" when a lustre
+ logged instead of creating "console chatter" when a lustre
timeout races with normal RPC completion.
Severity : minor
Details : lnet_clear_peer_table can wait forever if user forgets to
- clear a lazy portal.
+ clear a lazy portal.
Severity : minor
Details : libcfs_id2str should check pid against LNET_PID_ANY.
Bugzilla : 12316
Description: Add OFED1.2 support to o2iblnd
Details : o2iblnd depends on OFED's modules, if out-tree OFED's modules
- are installed (other than kernel's in-tree infiniband), there
- could be some problem while insmod o2iblnd (mismatch CRC of
- ib_* symbols).
- If extra Module.symvers is supported in kernel (i.e, 2.6.17),
- this link provides solution:
- https://bugs.openfabrics.org/show_bug.cgi?id=355
- if extra Module.symvers is not supported in kernel, we will
- have to run the script in bug 12316 to update
- $LINUX/module.symvers before building o2iblnd.
- More details about this are in bug 12316.
+ are installed (instead of the kernel's in-tree InfiniBand), there
+ could be problems when running insmod on o2iblnd (CRC mismatch
+ for ib_* symbols).
+ If extra Module.symvers is supported by the kernel (e.g. 2.6.17),
+ this link provides a solution:
+ https://bugs.openfabrics.org/show_bug.cgi?id=355
+ If extra Module.symvers is not supported by the kernel, the
+ script in bug 12316 must be run to update
+ $LINUX/module.symvers before building o2iblnd.
+ More details about this are in bug 12316.
------------------------------------------------------------------------------
Severity : minor
Frequency : rare
Description: the_lnet.ln_finalizing was not set when the current thread is
- about to complete messages. It only affects multi-threaded
+ about to complete messages. It only affects multi-threaded
user space LNet.
Severity : normal
Frequency : rare
Bugzilla : 12458
Description: Assertion failure in kernel ptllnd caused by posting passive
- bulk buffers before connection establishment complete.
+ bulk buffers before connection establishment complete.
Severity : major
Frequency : rare
Bugzilla : 12445
Description: A race in kernel ptllnd between deleting a peer and posting
- new communications for it could hang communications -
+ new communications for it could hang communications -
manifesting as "Unexpectedly long timeout" messages.
Severity : major
Bugzilla : 11659
Description: Credit overflows
Details : This was a bug in ptllnd connection establishment. The fix
- implements better peer stamps to disambiguate connection
+ implements better peer stamps to disambiguate connection
establishment and ensure both peers enter the credit flow
state machine consistently.
Bugzilla : 11394
Description: kptllnd didn't propagate some network errors up to LNET
Details : This bug was spotted while investigating 11394. The fix
- ensures network errors on sends and bulk transfers are
+ ensures network errors on sends and bulk transfers are
propagated to LNET/lustre correctly.
Severity : enhancement
- renamed cfs_sleep_chan -> cfs_waitq
cfs_sleep_link -> cfs_waitlink
- - fixed race in linux version of arch-independent socknal
- (the ENOMEM/EAGAIN decision).
+ - fixed race in linux version of arch-independent socknal
+ (the ENOMEM/EAGAIN decision).
- Didn't fix problems in Darwin version of arch-independent socknal
- (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision)
+ (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision)
- removed libcfs types from non-socknal header files (only some types
in the header files had been changed; the .c files hadn't been