11-05-2018 Whamcloud, Inc. * version 2.12.0 * Support for networks: socklnd - any kernel supported by Lustre o2iblnd - OFED from any kernels supported by Lustre o2ilbnd - MLNX_OFED 3.4-2.0.0.0, MLNX_OFED 4.0-1.0.1.0 MLNX_OFED 4.0-2.0.0.1, MLNX_OFED 4.1-1.0.2.0 MLNX_OFED 4.2-1.0.0.0, MLNX_OFED 4.2-1.2.0.0 MLNX_OFED 4.3-1.0.1.0. MLNX_OFED 4.4-1.0.0.0 MLNX_OFED 4.4-2.0.7.0 * Features: LNet Health: Keep track of network interface health and select healthiest interface. TBD Intel Corporation * version 2.2.0 * Support for networks: socklnd - any kernel supported by Lustre, o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1 mxlnd - MX 1.2.10 or later ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x ------------------------------------------------------------------------------- 09-30-2011 Whamcloud, Inc. * version 2.1.0 * Support for networks: socklnd - any kernel supported by Lustre, o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1 * Available but unsupported: mxlnd - MX 1.2.10 or later ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x ------------------------------------------------------------------------------- 2010-07-15 Oracle, Inc. * version 2.0.0 * Support for networks: socklnd - any kernel supported by Lustre, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1 viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.10 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : minor Bugzilla : 21459 Description: should update lp_alive for non-router peers Severity : enhancement Bugzilla : 15332 Description: LNet router shuffler. Severity : enhancement Bugzilla : 15332 Description: LNet fine grain routing support. Severity : normal Bugzilla : 20171 Description: router checker stops working when system wall clock goes backward Details : use monotonic timing source instead of system wall clock time. Severity : enhancement Bugzilla : 18460 Description: avoid asymmetrical router failures Severity : enhancement Bugzilla : 19735 Description: multiple-instance support for kptllnd Severity : normal Bugzilla : 20897 Description: ksocknal_close_conn_locked connection race Details : A race was possible when ksocknal_create_conn calls ksocknal_close_conn_locked for already closed conn. Severity : normal Bugzilla : 18102 Description: router_proc.c is rewritten to use sysctl-interface for parameters residing in /proc/sys/lnet Severity : enhancement Bugzilla : 13065 Description: port router pinger to userspace Severity : normal Bugzilla : 17546 Description: kptllnd HELLO protocol deadlock Details : kptllnd HELLO protocol doesn't run to completion in finite time Severity : normal Bugzilla : 18075 Description: LNet selftest fixes and enhancements Severity : enhancement Bugzilla : 19156 Description: allow a test node to be a member of multiple test groups Severity : enhancement Bugzilla : 18654 Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution Details : an update from the upstream developer Scott Atchley. Severity : enhancement Bugzilla : 15332 Description: add a new LND optiion to control peer buffer credits on routers Severity : normal Bugzilla : 18844 Description: Fixing deadlock in usocklnd Details : A deadlock was possible in usocklnd due to race condition while tearing connection down. The problem resulted from erroneous assumption that lnet_finalize() could have been called holding some lnd-level locks. Severity : major Bugzilla : 13621, 15983 Description: Protocol V2 of o2iblnd Details : o2iblnd V2 has several new features: . map-on-demand: map-on-demand is disabled by default, it can be enabled by using modparam "map_on_demand=@value@", @value@ should >= 0 and < 256, 0 will disable map-on-demand, any other valid value will enable map-on-demand. Oi2blnd will create FMR or physical MR for RDMA if fragments of RD > @value@. Enable map-on-demand will take less memory for new connection, but a little more CPU for RDMA. . iWARP : to support iWARP, please enable map-on-demand, 32 and 64 are recommanded value. iWARP will probably fail for value >=128. . OOB NOOP message: to resolve deadlock on router. . tunable peer_credits_hiw: (high water to return credits), default value of peer_credits_hiw equals to (peer_credits -1), user can change it between peer_credits/2 and (peer_credits - 1). Lower value is recommended for high latency network. . tunable message queue size: it always equals to peer_credits, higher value is recommended for high latency network. . It's compatible with earlier version of o2iblnd Severity : normal Bugzilla : 18414 Description: Fixing 'running out of ports' issue Details : Add a delay before next reconnect attempt in ksocklnd in the case of lost race. Limit the frequency of query-requests in lnet. Improved handling of 'dead peer' notifications in lnet. Severity : normal Bugzilla : 16034 Description: Change ptllnd timeout and watchdog timers Details : Add ptltrace_on_nal_failed and bump ptllnd timeout to match Portals wire timeout. Severity : normal Bugzilla : 16186 Description: One down Lustre FS hangs ALL mounted Lustre filesystems Details : Shared routing enhancements - peer health detection. Severity : enhancement Bugzilla : 14132 Description: acceptor.c cleanup Details : Code duplication in acceptor.c for the cases of kernel and user-space removed. User-space libcfs tcpip primitives uniformed to have prototypes similar to kernel ones. Minor cosmetic changes in usocklnd to use cfs_socket_t as representation of socket. Severity : minor Bugzilla : 11245 Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off Details : See comment 46 in bug 11245 for details - it's indeed a bug introduced by the original 11245 fix. Severity : minor Bugzilla : 15984 Description: uptllnd credit overflow fix Details : kptl_msg_t::ptlm_credits could be overflown by uptllnd since it is only a __u8. Severity : major Bugzilla : 14634 Description: socklnd protocol version 3 Details : With current protocol V2, connections on router can be blocked and can't receive any incoming messages when there is no more router buffer, so ZC-ACK can't be handled (LNet message can't be finalized) and will cause deadlock on router. Protocol V3 has a dedicated connection for emergency messages like ZC-ACK to router, messages on this dedicated connection don't need any credit so will never be blocked. Also, V3 can send keepalive ping in specified period for router healthy checking. ------------------------------------------------------------------------------- 12-31-2008 Sun Microsystems, Inc. * version 1.8.0 * Support for networks: socklnd - any kernel supported by Lustre, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3 viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : major Bugzilla : 15983 Description: workaround for OOM from o2iblnd Details : OFED needs allocate big chunk of memory for QP while creating connection for o2iblnd, OOM can happen if no such a contiguous memory chunk. QP size is decided by concurrent_sends and max_fragments of o2iblnd, now we permit user to specify smaller value for concurrent_sends of o2iblnd(i.e: concurrent_sends=7), which will decrease memory block size required by creating QP. Severity : major Bugzilla : 15093 Description: Support Zerocopy receive of Chelsio device Details : Chelsio driver can support zerocopy for iov[1] if it's contiguous and large enough. Severity : normal Bugzilla : 13490 Description: fix credit flow deadlock in uptllnd Severity : normal Bugzilla : 16308 Description: finalize network operation in reasonable time Details : conf-sanity test_32a couldn't stop ost and mds because it tried to access non-existent peer and tcp connect took quite long before timing out. Severity : major Bugzilla : 16338 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure Details : Lost reference on conn prevents peer from being destroyed, which could prevent new peer creation if peer count has reached upper limit. Severity : normal Bugzilla : 16102 Description: LNET Selftest results in Soft lockup on OSS CPU Details : only hits when 8 or more o2ib clients involved and a session is torn down with 'lst end_session' without preceeding 'lst stop'. Severity : minor Bugzilla : 16321 Description: concurrent_sends in IB LNDs should not be changeable at run time Details : concurrent_sends in IB LNDs should not be changeable at run time Severity : normal Bugzilla : 15272 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails Details : only hits under out-of-memory situations ------------------------------------------------------------------------------- 2009-02-07 Sun Microsystems, Inc. * version 1.6.7 * Support for networks: socklnd - any kernel supported by Lustre, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3 viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : major Bugzilla : 15983 Description: workaround for OOM from o2iblnd Details : OFED needs allocate big chunk of memory for QP while creating connection for o2iblnd, OOM can happen if no such a contiguous memory chunk. QP size is decided by concurrent_sends and max_fragments of o2iblnd, now we permit user to specify smaller value for concurrent_sends of o2iblnd(i.e: concurrent_sends=7), which will decrease memory block size required by creating QP. Severity : major Bugzilla : 15093 Description: Support Zerocopy receive of Chelsio device Details : Chelsio driver can support zerocopy for iov[1] if it's contiguous and large enough. Severity : normal Bugzilla : 13490 Description: fix credit flow deadlock in uptllnd Severity : normal Bugzilla : 16308 Description: finalize network operation in reasonable time Details : conf-sanity test_32a couldn't stop ost and mds because it tried to access non-existent peer and tcp connect took quite long before timing out. Severity : major Bugzilla : 16338 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure Details : Lost reference on conn prevents peer from being destroyed, which could prevent new peer creation if peer count has reached upper limit. Severity : normal Bugzilla : 16102 Description: LNET Selftest results in Soft lockup on OSS CPU Details : only hits when 8 or more o2ib clients involved and a session is torn down with 'lst end_session' without preceeding 'lst stop'. Severity : minor Bugzilla : 16321 Description: concurrent_sends in IB LNDs should not be changeable at run time Details : concurrent_sends in IB LNDs should not be changeable at run time ------------------------------------------------------------------------------- 11-03-2008 Sun Microsystems, Inc. * version 1.6.6 * Support for networks: socklnd - any kernel supported by Lustre, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3 viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : normal Bugzilla : 15272 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails Details : only hits under out-of-memory situations ------------------------------------------------------------------------------- 04-26-2008 Sun Microsystems, Inc. * version 1.6.5 * Support for networks: socklnd - any kernel supported by Lustre, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1 and 1.2.0, 1.2.5 viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : normal Bugzilla : 14322 Description: excessive debug information removed Details : excessive debug information removed Severity : major Bugzilla : 15712 Description: ksocknal_create_conn() hit ASSERTION during connection race Details : ksocknal_create_conn() hit ASSERTION during connection race Severity : major Bugzilla : 13983 Description: ksocknal_send_hello() hit ASSERTION while connecting race Details : ksocknal_send_hello() hit ASSERTION while connecting race Severity : major Bugzilla : 14425 Description: o2iblnd/ptllnd credit deadlock in a routed config. Details : o2iblnd/ptllnd credit deadlock in a routed config. Severity : normal Bugzilla : 14956 Description: High load after starting lnet Details : gmlnd should sleep in rx thread in interruptible way. Otherwise, uptime utility reports high load that looks confusingly. Severity : normal Bugzilla : 14838 Description: ksocklnd fails to establish connection if accept_port is high Details : PID remapping must not be done for active (outgoing) connections -------------------------------------------------------------------------------- 2008-01-11 Sun Microsystems, Inc. * version 1.4.12 * Support for networks: socklnd - any kernel supported by Lustre, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1 and 1.2.0, 1.2.5 viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : normal Bugzilla : 14387 Description: liblustre network error Details : liblustre clients should understand LNET_ACCEPT_PORT environment variable even if they don't start lnet acceptor. Severity : normal Bugzilla : 14300 Description: Strange message from lnet (Ignoring prediction from the future) Details : Incorrect calculation of peer's last_alive value in ksocklnd -------------------------------------------------------------------------------- 2007-12-07 Cluster File Systems, Inc. * version 1.6.4 * Support for networks: socklnd - any kernel supported by Lustre, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1 and 1.2.0, 1.2.5. viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : normal Bugzilla : 14238 Description: ASSERTION(me == md->md_me) failed in lnet_match_md() Severity : normal Bugzilla : 12494 Description: increase send queue size for ciblnd/openiblnd Severity : normal Bugzilla : 12302 Description: new userspace socklnd Details : Old userspace tcpnal that resided in lnet/ulnds/socklnd replaced with new one - usocklnd. Severity : enhancement Bugzilla : 11686 Description: Console message flood Details : Make cdls ratelimiting more tunable by adding several tunable in procfs /proc/sys/lnet/console_{min,max}_delay_centisecs and /proc/sys/lnet/console_backoff. -------------------------------------------------------------------------------- 2007-09-27 Cluster File Systems, Inc. * version 1.6.3 * Support for networks: socklnd - any kernel supported by Lustre, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1 and 1.2, viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : normal Bugzilla : 12782 Description: /proc/sys/lnet has non-sysctl entries Details : Updating dump_kernel/daemon_file/debug_mb to use sysctl variables Severity : major Bugzilla : 13236 Description: TOE Kernel panic by ksocklnd Details : offloaded sockets provide their own implementation of sendpage, can't call tcp_sendpage() directly Severity : normal Bugzilla : 10778 Description: kibnal_shutdown() doesn't finish; lconf --cleanup hangs Details : races between lnd_shutdown and peer creation prevent lnd_shutdown from finishing. Severity : normal Bugzilla : 13279 Description: open files rlimit 1024 reached while liblustre testing Details : ulnds/socklnd must close open socket after unsuccessful 'say hello' attempt. Severity : major Bugzilla : 13482 Description: build error Details : fix typos in gmlnd, ptllnd and viblnd -------------------------------------------------------------------------------- 2007-07-30 Cluster File Systems, Inc. * version 1.6.1 * Support for networks: socklnd - kernels up to 2.6.16, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1 and 1.2 viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x -------------------------------------------------------------------------------- 2007-06-21 Cluster File Systems, Inc. * version 1.4.11 * Support for networks: socklnd - kernels up to 2.6.16, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1 viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : minor Bugzilla : 13288 Description: Initialize cpumask before use Severity : major Bugzilla : 12014 Description: ASSERTION failures when upgrading to the patchless zero-copy socklnd Details : This bug affects "rolling upgrades", causing an inconsistent protocol version negotiation and subsequent assertion failure during rolling upgrades after the first wave of upgrades. Severity : minor Bugzilla : 11223 Details : Change "dropped message" CERRORs to D_NETERROR so they are logged instead of creating "console chatter" when a lustre timeout races with normal RPC completion. Severity : minor Details : lnet_clear_peer_table can wait forever if user forgets to clear a lazy portal. Severity : minor Details : libcfs_id2str should check pid against LNET_PID_ANY. Severity : major Bugzilla : 10916 Description: added LNET self test Details : landing b_self_test Severity : minor Frequency : rare Bugzilla : 12227 Description: cfs_duration_{u,n}sec() wrongly calculate nanosecond part of struct timeval. Details : do_div() macro is used incorrectly. 2007-04-23 Cluster File Systems, Inc. Severity : normal Bugzilla : 11680 Description: make panic on lbug configurable Severity : major Bugzilla : 12316 Description: Add OFED1.2 support to o2iblnd Details : o2iblnd depends on OFED's modules, if out-tree OFED's modules are installed (other than kernel's in-tree infiniband), there could be some problem while insmod o2iblnd (mismatch CRC of ib_* symbols). If extra Module.symvers is supported in kernel (i.e, 2.6.17), this link provides solution: https://bugs.openfabrics.org/show_bug.cgi?id=355 if extra Module.symvers is not supported in kernel, we will have to run the script in bug 12316 to update $LINUX/module.symvers before building o2iblnd. More details about this are in bug 12316. ------------------------------------------------------------------------------ 2007-04-01 Cluster File Systems, Inc. * version 1.4.10 / 1.6.0 * Support for networks: socklnd - kernels up to 2.6.16, qswlnd - Qsnet kernel modules 5.20 and later, openiblnd - IbGold 1.8.2, o2iblnd - OFED 1.1, viblnd - Voltaire ibhost 3.4.5 and later, ciblnd - Topspin 3.2.0, iiblnd - Infiniserv 3.3 + PathBits patch, gmlnd - GM 2.1.22 and later, mxlnd - MX 1.2.1 or later, ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x Severity : minor Frequency : rare Description: Ptllnd didn't init kptllnd_data.kptl_idle_txs before it could be possibly accessed in kptllnd_shutdown. Ptllnd should init kptllnd_data.kptl_ptlid2str_lock before calling kptllnd_ptlid2str. Severity : normal Frequency : rare Description: gmlnd ignored some transmit errors when finalizing lnet messages. Severity : minor Frequency : rare Description: ptllnd logs a piece of incorrect debug info in kptllnd_peer_handle_hello. Severity : minor Frequency : rare Description: the_lnet.ln_finalizing was not set when the current thread is about to complete messages. It only affects multi-threaded user space LNet. Severity : normal Frequency : rare Bugzilla : 11472 Description: Changed the default kqswlnd ntxmsg=512 Severity : major Frequency : rare Bugzilla : 12458 Description: Assertion failure in kernel ptllnd caused by posting passive bulk buffers before connection establishment complete. Severity : major Frequency : rare Bugzilla : 12445 Description: A race in kernel ptllnd between deleting a peer and posting new communications for it could hang communications - manifesting as "Unexpectedly long timeout" messages. Severity : major Frequency : rare Bugzilla : 12432 Description: Kernel ptllnd lock ordering issue could hang a node. Severity : major Frequency : rare Bugzilla : 12016 Description: node crash on socket teardown race Severity : minor Frequency : 'lctl peer_list' issued on a mx net Bugzilla : 12237 Description: Enable lctl's peer_list for MXLND Severity : major Frequency : after Ptllnd timeouts and portals congestion Bugzilla : 11659 Description: Credit overflows Details : This was a bug in ptllnd connection establishment. The fix implements better peer stamps to disambiguate connection establishment and ensure both peers enter the credit flow state machine consistently. Severity : major Frequency : rare Bugzilla : 11394 Description: kptllnd didn't propagate some network errors up to LNET Details : This bug was spotted while investigating 11394. The fix ensures network errors on sends and bulk transfers are propagated to LNET/lustre correctly. Severity : enhancement Bugzilla : 10316 Description: Fixed console chatter in case of -ETIMEDOUT. Severity : enhancement Bugzilla : 11684 Description: Added D_NETTRACE for recording network packet history (initially only for ptllnd). Also a separate userspace ptllnd facility to gather history which should really be covered by D_NETTRACE too, if only CDEBUG recorded history in userspace. Severity : major Frequency : rare Bugzilla : 11616 Description: o2iblnd handle early RDMA_CM_EVENT_DISCONNECTED. Details : If the fabric is lossy, an RDMA_CM_EVENT_DISCONNECTED callback can occur before a connection has actually been established. This caused an assertion failure previously. Severity : enhancement Bugzilla : 11094 Description: Multiple instances for o2iblnd Details : Allow multiple instances of o2iblnd to enable networking over multiple HCAs and routing between them. Severity : major Bugzilla : 11201 Description: lnet deadlock in router_checker Details : turned ksnd_connd_lock, ksnd_reaper_lock, and ksock_net_t:ksnd_lock into BH locks to eliminate potential deadlock caused by ksocknal_data_ready() preempting code holding these locks. Severity : major Bugzilla : 11126 Description: Millions of failed socklnd connection attempts cause a very slow FS Details : added a new route flag ksnr_scheduled to distinguish from ksnr_connecting, so that a peer connection request is only turned down for race concerns when an active connection to the same peer is under progress (instead of just being scheduled). ------------------------------------------------------------------------------ 2007-02-09 Cluster File Systems, Inc. * version 1.4.9 * Support for networks: socklnd - kernels up to 2.6.16 qswlnd - Qsnet kernel modules 5.20 and later openiblnd - IbGold 1.8.2 o2iblnd - OFED 1.1 viblnd - Voltaire ibhost 3.4.5 and later ciblnd - Topspin 3.2.0 iiblnd - Infiniserv 3.3 + PathBits patch gmlnd - GM 2.1.22 and later mxlnd - MX 1.2.1 or later ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x * bug fixes Severity : major on XT3 Bugzilla : none Description: libcfs overwrites /proc/sys/portals Details : libcfs created a symlink from /proc/sys/portals to /proc/sys/lnet for backwards compatibility. This is no longer required and makes the Cray portals /proc variables inaccessible. Severity : minor Bugzilla : 11312 Description: OFED FMR API change Details : This changes parameter usage to reflect a change in ib_fmr_pool_map_phys() between OFED 1.0 and OFED 1.1. Note that FMR support is only used in experimental versions of the o2iblnd - this change does not affect standard usage at all. Severity : enhancement Bugzilla : 11245 Description: new ko2iblnd module parameter: ib_mtu Details : the default IB MTU of 2048 performs badly on 23108 Tavor HCAs. You can avoid this problem by setting the MTU to 1024 using this module parameter. Severity : enhancement Bugzilla : 11118/11620 Description: ptllnd small request message buffer alignment fix Details : Set the PTL_MD_LOCAL_ALIGN8 option on small message receives. Round up small message size on sends in case this option is not supported. 11620 was a defect in the initial implementation which effectively asserted all peers had to be running the correct protocol version which was fixed by always NAK-ing such requests and handling any misalignments they introduce. Severity : minor Frequency : rarely Description: When kib(nal|lnd)_del_peer() is called upon a peer whose ibp_tx_queue is not empty, kib(nal|lnd)_destroy_peer()'s 'LASSERT(list_empty(&peer->ibp_tx_queue))' will fail. Severity : enhancement Bugzilla : 11250 Description: Patchless ZC(zero copy) socklnd Details : New protocol for socklnd, socklnd can support zero copy without kernel patch, it's compatible with old socklnd. Checksum is moved from tunables to modparams. Severity : minor Frequency : rarely Description: When ksocknal_del_peer() is called upon a peer whose ksnp_tx_queue is not empty, ksocknal_destroy_peer()'s 'LASSERT(list_empty(&peer->ksnp_tx_queue))' will fail. Severity : normal Frequency : when ptlrpc is under heavy use and runs out of request buffer Bugzilla : 11318 Description: In lnet_match_blocked_msg(), md can be used without holding a ref on it. Severity : minor Frequency : very rarely Bugzilla : 10727 Description: If ksocknal_lib_setup_sock() fails, a ref on peer is lost. If connd connects a route which has been closed by ksocknal_shutdown(), ksocknal_create_routes() may create new routes which hold references on the peer, causing shutdown process to wait for peer to disappear forever. Severity : enhancement Bugzilla : 11234 Description: Dump XT3 portals traces on kptllnd timeout Details : Set the kptllnd module parameter "ptltrace_on_timeout=1" to dump Cray portals debug traces to a file. The kptllnd module parameter "ptltrace_basename", default "/tmp/lnet-ptltrace", is the basename of the dump file. Severity : major Frequency : infrequent Bugzilla : 11308 Description: kernel ptllnd fix bug in connection re-establishment Details : Kernel ptllnd could produce protocol errors e.g. illegal matchbits and/or violate the credit flow protocol when trying to re-establish a connection with a peer after an error or timeout. Severity : enhancement Bugzilla : 10316 Description: Allow /proc/sys/lnet/debug to be set symbolically Details : Allow debug and subsystem debug values to be read/set by name in addition to numerically, for ease of use. Severity : normal Frequency : only in configurations with LNET routers Bugzilla : 10316 Description: routes automatically marked down and recovered Details : In configurations with LNET routers if a router fails routers now actively try to recover routes that are down, unless they are marked down by an administrator. ------------------------------------------------------------------------------ 2006-12-09 Cluster File Systems, Inc. Severity : critical Frequency : very rarely, in configurations with LNET routers and TCP Bugzilla : 10889 Description: incorrect data written to files on OSTs Details : In certain high-load conditions incorrect data may be written to files on the OST when using TCP networks. ------------------------------------------------------------------------------ 2006-07-31 Cluster File Systems, Inc. * version 1.4.7 - rework CDEBUG messages rate-limiting mechanism b=10375 - add per-socket tunables for socklnd if the kernel is patched b=10327 ------------------------------------------------------------------------------ 2006-02-15 Cluster File Systems, Inc. * version 1.4.6 - fix use of portals/lnet pid to avoid dropping RPCs b=10074 - iiblnd wasn't mapping all memory, resulting in comms errors b=9776 - quiet LNET startup LNI message for liblustre b=10128 - Better console error messages if 'ip2nets' can't match an IP address - Fixed overflow/use-before-set bugs in linux-time.h - Fixed ptllnd bug that wasn't initialising rx descriptors completely - LNET teardown failed an assertion about the route table being empty - Fixed a crash in LNetEQPoll() - Future protocol compatibility work (b_rls146_lnetprotovrsn) - improve debug message for liblustre/Catamount nodes (b=10116) 2005-10-10 Cluster File Systems, Inc. * Configuration change for the XT3 The PTLLND is now used to run Lustre over Portals on the XT3. The configure option(s) --with-cray-portals are no longer used. Rather --with-portals= is used to enable building on the XT3. In addition to enable XT3 specific features the option --enable-cray-xt3 must be used. 2005-10-10 Cluster File Systems, Inc. * Portals has been removed, replaced by LNET. LNET is new networking infrastructure for Lustre, it includes a reorganized network configuration mode (see the user documentation for full details) as well as support for routing between different network fabrics. Lustre Networking Devices (LNDS) for the supported network fabrics have also been created for this new infrastructure. 2005-08-08 Cluster File Systems, Inc. * version 1.4.4 * bug fixes Severity : major Frequency : rare (large Voltaire clusters only) Bugzilla : 6993 Description: the default number of reserved transmit descriptors was too low for some large clusters Details : As a workaround, the number was increased. A proper fix includes a run-time tunable. 2005-06-02 Cluster File Systems, Inc. * version 1.4.3 * bug fixes Severity : major Frequency : occasional (large-scale events, cluster reboot, network failure) Bugzilla : 6411 Description: too many error messages on console obscure actual problem and can slow down/panic server, or cause recovery to fail repeatedly Details : enable rate-limiting of console error messages, and some messages that were console errors now only go to the kernel log Severity : enhancement Bugzilla : 1693 Description: add /proc/sys/portals/catastrophe entry which will report if that node has previously LBUGged 2005-04-06 Cluster File Systems, Inc. * bugs - update gmnal to use PTL_MTU, fix module refcounting (b=5786) 2005-04-04 Cluster File Systems, Inc. * bugs - handle error return code in kranal_check_fma_rx() (5915,6054) 2005-02-04 Cluster File Systems, Inc. * miscellania - update vibnal (Voltaire IB NAL) - update gmnal (Myrinet NAL), gmnalid 2005-02-04 Eric Barton * Landed portals:b_port_step as follows... - removed CFS_DECL_SPIN* just use 'spinlock_t' and initialise with spin_lock_init() - removed CFS_DECL_MUTEX* just use 'struct semaphore' and initialise with init_mutex() - removed CFS_DECL_RWSEM* just use 'struct rw_semaphore' and initialise with init_rwsem() - renamed cfs_sleep_chan -> cfs_waitq cfs_sleep_link -> cfs_waitlink - fixed race in linux version of arch-independent socknal (the ENOMEM/EAGAIN decision). - Didn't fix problems in Darwin version of arch-independent socknal (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision) - removed libcfs types from non-socknal header files (only some types in the header files had been changed; the .c files hadn't been updated at all).