X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=blobdiff_plain;f=lnet%2FChangeLog;h=aadf97c2a67ebe4d14a0a3f010da67c7d5431712;hp=dd608fd15b7ba58dda0ab275c2c92ad0e1f92f8f;hb=3b84a1ee5213563945225854a50e9037bb9646db;hpb=bc35985999e455c5ce3d9b2ed9f052c728df03e8

diff --git a/lnet/ChangeLog b/lnet/ChangeLog
index dd608fd..aadf97c 100644
--- a/lnet/ChangeLog
+++ b/lnet/ChangeLog
@@ -1,5 +1,187 @@
-tbd Sun Microsystems, Inc.
-        * version 1.6.6
+TBD Intel Corporation
+        * version 2.2.0
+        * Support for networks:
+         socklnd   - any kernel supported by Lustre,
+         o2iblnd   - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
+         mxlnd     - MX 1.2.10 or later
+         ptllnd    - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+-------------------------------------------------------------------------------
+
+09-30-2011 Whamcloud, Inc.
+        * version 2.1.0
+        * Support for networks:
+         socklnd   - any kernel supported by Lustre,
+         o2iblnd   - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
+        * Available but unsupported:
+         mxlnd     - MX 1.2.10 or later
+         ptllnd    - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+-------------------------------------------------------------------------------
+
+2010-07-15 Oracle, Inc.
+        * version 2.0.0
+        * Support for networks:
+         socklnd   - any kernel supported by Lustre,
+         qswlnd    - Qsnet kernel modules 5.20 and later,
+         openiblnd - IbGold 1.8.2,
+         o2iblnd   - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
+         viblnd    - Voltaire ibhost 3.4.5 and later,
+         ciblnd    - Topspin 3.2.0,
+         iiblnd    - Infiniserv 3.3 + PathBits patch,
+         gmlnd     - GM 2.1.22 and later,
+         mxlnd     - MX 1.2.10 or later,
+         ptllnd    - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+Severity   : minor
+Bugzilla   : 21459
+Description: should update lp_alive for non-router peers
+
+Severity   : enhancement
+Bugzilla   : 15332
+Description: LNet router shuffler.
+
+Severity   : enhancement
+Bugzilla   : 15332
+Description: LNet fine grain routing support.
+
+Severity   : normal
+Bugzilla   : 20171
+Description: router checker stops working when system wall clock goes backward
+Details    : use monotonic timing source instead of system wall clock time.
+
+Severity   : enhancement
+Bugzilla   : 18460
+Description: avoid asymmetrical router failures
+
+Severity   : enhancement
+Bugzilla   : 19735
+Description: multiple-instance support for kptllnd
+
+Severity   : normal
+Bugzilla   : 20897
+Description: ksocknal_close_conn_locked connection race
+Details    : A race was possible when ksocknal_create_conn calls
+             ksocknal_close_conn_locked for already closed conn.
+
+Severity   : normal
+Bugzilla   : 18102
+Description: router_proc.c is rewritten to use sysctl-interface for parameters
+             residing in /proc/sys/lnet
+
+Severity   : enhancement
+Bugzilla   : 13065
+Description: port router pinger to userspace
+
+Severity   : normal
+Bugzilla   : 17546
+Description: kptllnd HELLO protocol deadlock
+Details    : kptllnd HELLO protocol doesn't run to completion in finite time
+
+Severity   : normal
+Bugzilla   : 18075
+Description: LNet selftest fixes and enhancements
+
+Severity   : enhancement
+Bugzilla   : 19156
+Description: allow a test node to be a member of multiple test groups
+
+Severity   : enhancement
+Bugzilla   : 18654
+Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution
+Details    : an update from the upstream developer Scott Atchley.
+
+Severity   : enhancement
+Bugzilla   : 15332
+Description: add a new LND optiion to control peer buffer credits on routers
+
+Severity   : normal
+Bugzilla   : 18844
+Description: Fixing deadlock in usocklnd
+Details    : A deadlock was possible in usocklnd due to race condition while
+             tearing connection down. The problem resulted from erroneous
+             assumption that lnet_finalize() could have been called holding
+             some lnd-level locks.
+
+Severity   : major
+Bugzilla   : 13621, 15983
+Description: Protocol V2 of o2iblnd
+Details    : o2iblnd V2 has several new features:
+             . map-on-demand: map-on-demand is disabled by default, it can
+               be enabled by using modparam "map_on_demand=@value@", @value@
+               should >= 0 and < 256, 0 will disable map-on-demand, any other
+               valid value will enable map-on-demand.
+               Oi2blnd will create FMR or physical MR for RDMA if fragments of
+               RD > @value@.
+               Enable map-on-demand will take less memory for new connection,
+               but a little more CPU for RDMA.
+             . iWARP : to support iWARP, please enable map-on-demand, 32 and 64
+               are recommanded value. iWARP will probably fail for value >=128.
+             . OOB NOOP message: to resolve deadlock on router.
+             . tunable peer_credits_hiw: (high water to return credits),
+               default value of peer_credits_hiw equals to (peer_credits -1),
+               user can change it between peer_credits/2 and (peer_credits - 1).
+               Lower value is recommended for high latency network.
+             . tunable message queue size: it always equals to peer_credits,
+               higher value is recommended for high latency network.
+             . It's compatible with earlier version of o2iblnd
+
+Severity   : normal
+Bugzilla   : 18414
+Description: Fixing 'running out of ports' issue
+Details    : Add a delay before next reconnect attempt in ksocklnd in
+             the case of lost race. Limit the frequency of query-requests
+             in lnet. Improved handling of 'dead peer' notifications in
+             lnet.
+
+Severity   : normal
+Bugzilla   : 16034
+Description: Change ptllnd timeout and watchdog timers
+Details    : Add ptltrace_on_nal_failed and bump ptllnd timeout to match
+             Portals wire timeout.
+
+Severity   : normal
+Bugzilla   : 16186
+Description: One down Lustre FS hangs ALL mounted Lustre filesystems
+Details    : Shared routing enhancements - peer health detection.
+
+Severity   : enhancement
+Bugzilla   : 14132
+Description: acceptor.c cleanup
+Details    : Code duplication in acceptor.c for the cases of kernel and
+             user-space removed. User-space libcfs tcpip primitives
+             uniformed to have prototypes similar to kernel ones. Minor
+             cosmetic changes in usocklnd to use cfs_socket_t as
+             representation of socket.
+
+Severity   : minor
+Bugzilla   : 11245
+Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off
+Details    : See comment 46 in bug 11245 for details - it's indeed a bug
+             introduced by the original 11245 fix.
+
+Severity   : minor
+Bugzilla   : 15984
+Description: uptllnd credit overflow fix
+Details    : kptl_msg_t::ptlm_credits could be overflown by uptllnd since
+             it is only a __u8.
+
+Severity   : major
+Bugzilla   : 14634
+Description: socklnd protocol version 3
+Details    : With current protocol V2, connections on router can be
+             blocked and can't receive any incoming messages when there is no
+             more router buffer, so ZC-ACK can't be handled (LNet message
+             can't be finalized) and will cause deadlock on router.
+             Protocol V3 has a dedicated connection for emergency messages
+             like ZC-ACK to router, messages on this dedicated connection
+             don't need any credit so will never be blocked. Also, V3 can send
+             keepalive ping in specified period for router healthy checking.
+
+-------------------------------------------------------------------------------
+
+12-31-2008 Sun Microsystems, Inc.
+        * version 1.8.0
         * Support for networks:
          socklnd   - any kernel supported by Lustre,
          qswlnd    - Qsnet kernel modules 5.20 and later,
@@ -12,27 +194,22 @@ tbd Sun Microsystems, Inc.
          mxlnd     - MX 1.2.1 or later,
          ptllnd    - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
 
-Severity   :
-Bugzilla   :
-Description:
-Details    :
-
 Severity   : major
 Bugzilla   : 15983
 Description: workaround for OOM from o2iblnd
 Details    : OFED needs allocate big chunk of memory for QP while creating
-             connection for o2iblnd, OOM can happen if no such a contiguous
-             memory chunk.
-             QP size is decided by concurrent_sends and max_fragments of
-             o2iblnd, now we permit user to specify smaller value for
-             concurrent_sends of o2iblnd(i.e: concurrent_sends=7), which
-             will decrease memory block size required by creating QP.
+             connection for o2iblnd, OOM can happen if no such a contiguous
+             memory chunk.
+             QP size is decided by concurrent_sends and max_fragments of
+             o2iblnd, now we permit user to specify smaller value for
+             concurrent_sends of o2iblnd(i.e: concurrent_sends=7), which
+             will decrease memory block size required by creating QP.
 
 Severity   : major
 Bugzilla   : 15093
 Description: Support Zerocopy receive of Chelsio device
 Details    : Chelsio driver can support zerocopy for iov[1] if it's
-             contiguous and large enough.
+             contiguous and large enough.
 
 Severity   : normal
 Bugzilla   : 13490
@@ -42,27 +219,21 @@ Severity : normal
 Bugzilla   : 16308
 Description: finalize network operation in reasonable time
 Details    : conf-sanity test_32a couldn't stop ost and mds because it
-             tried to access non-existent peer and tcp connect took
-             quite long before timing out.
-
-Severity   : normal
-Bugzilla   : 13139
-Description: Remove portals compatibility
-Details    : Remove portals compatibility, not interoperable with releases
-             before 1.4.6
+             tried to access non-existent peer and tcp connect took
+             quite long before timing out.
 
 Severity   : major
 Bugzilla   : 16338
 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
 Details    : Lost reference on conn prevents peer from being destroyed, which
-             could prevent new peer creation if peer count has reached upper
+             could prevent new peer creation if peer count has reached upper
              limit.
 
 Severity   : normal
 Bugzilla   : 16102
 Description: LNET Selftest results in Soft lockup on OSS CPU
 Details    : only hits when 8 or more o2ib clients involved and a session is
-             torn down with 'lst end_session' without preceeding 'lst stop'.
+             torn down with 'lst end_session' without preceeding 'lst stop'.
 
 Severity   : minor
 Bugzilla   : 16321
@@ -77,6 +248,86 @@ Details : only hits under out-of-memory situations
 
 -------------------------------------------------------------------------------
 
+2009-02-07 Sun Microsystems, Inc.
+        * version 1.6.7
+        * Support for networks:
+         socklnd   - any kernel supported by Lustre,
+         qswlnd    - Qsnet kernel modules 5.20 and later,
+         openiblnd - IbGold 1.8.2,
+         o2iblnd   - OFED 1.1, 1.2.0, 1.2.5, and 1.3
+         viblnd    - Voltaire ibhost 3.4.5 and later,
+         ciblnd    - Topspin 3.2.0,
+         iiblnd    - Infiniserv 3.3 + PathBits patch,
+         gmlnd     - GM 2.1.22 and later,
+         mxlnd     - MX 1.2.1 or later,
+         ptllnd    - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+Severity   : major
+Bugzilla   : 15983
+Description: workaround for OOM from o2iblnd
+Details    : OFED needs allocate big chunk of memory for QP while creating
+             connection for o2iblnd, OOM can happen if no such a contiguous
+             memory chunk.
+             QP size is decided by concurrent_sends and max_fragments of
+             o2iblnd, now we permit user to specify smaller value for
+             concurrent_sends of o2iblnd(i.e: concurrent_sends=7), which
+             will decrease memory block size required by creating QP.
+
+Severity   : major
+Bugzilla   : 15093
+Description: Support Zerocopy receive of Chelsio device
+Details    : Chelsio driver can support zerocopy for iov[1] if it's
+             contiguous and large enough.
+Severity   : normal
+Bugzilla   : 13490
+Description: fix credit flow deadlock in uptllnd
+
+Severity   : normal
+Bugzilla   : 16308
+Description: finalize network operation in reasonable time
+Details    : conf-sanity test_32a couldn't stop ost and mds because it
+             tried to access non-existent peer and tcp connect took
+             quite long before timing out.
+
+Severity   : major
+Bugzilla   : 16338
+Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
+Details    : Lost reference on conn prevents peer from being destroyed, which
+             could prevent new peer creation if peer count has reached upper
+             limit.
+
+Severity   : normal
+Bugzilla   : 16102
+Description: LNET Selftest results in Soft lockup on OSS CPU
+Details    : only hits when 8 or more o2ib clients involved and a session is
+             torn down with 'lst end_session' without preceeding 'lst stop'.
+
+Severity   : minor
+Bugzilla   : 16321
+Description: concurrent_sends in IB LNDs should not be changeable at run time
+Details    : concurrent_sends in IB LNDs should not be changeable at run time
+
+-------------------------------------------------------------------------------
+
+11-03-2008 Sun Microsystems, Inc.
+        * version 1.6.6
+        * Support for networks:
+         socklnd   - any kernel supported by Lustre,
+         qswlnd    - Qsnet kernel modules 5.20 and later,
+         openiblnd - IbGold 1.8.2,
+         o2iblnd   - OFED 1.1, 1.2.0, 1.2.5, and 1.3
+         viblnd    - Voltaire ibhost 3.4.5 and later,
+         ciblnd    - Topspin 3.2.0,
+         iiblnd    - Infiniserv 3.3 + PathBits patch,
+         gmlnd     - GM 2.1.22 and later,
+         mxlnd     - MX 1.2.1 or later,
+         ptllnd    - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
+Severity   : normal
+Bugzilla   : 15272
+Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
+Details    : only hits under out-of-memory situations
+
+-------------------------------------------------------------------------------
 
 04-26-2008 Sun Microsystems, Inc.
         * version 1.6.5
@@ -123,6 +374,7 @@ Bugzilla : 14838
 Description: ksocklnd fails to establish connection if accept_port is high
 Details    : PID remapping must not be done for active (outgoing) connections
 
+
 --------------------------------------------------------------------------------
 
 2008-01-11 Sun Microsystems, Inc.
@@ -138,6 +390,7 @@ Details : PID remapping must not be done for active (outgoing) connections
          gmlnd     - GM 2.1.22 and later,
          mxlnd     - MX 1.2.1 or later,
          ptllnd    - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
+
 Severity   : normal
 Bugzilla   : 14387
 Description: liblustre network error
@@ -177,7 +430,7 @@ Severity : normal
 Bugzilla   : 12302
 Description: new userspace socklnd
 Details    : Old userspace tcpnal that resided in lnet/ulnds/socklnd replaced
-             with new one - usocklnd.
+             with new one - usocklnd.
 
 Severity   : enhancement
 Bugzilla   : 11686
@@ -211,26 +464,26 @@ Severity : major
 Bugzilla   : 13236
 Description: TOE Kernel panic by ksocklnd
 Details    : offloaded sockets provide their own implementation of sendpage,
-             can't call tcp_sendpage() directly
+             can't call tcp_sendpage() directly
 
 Severity   : normal
 Bugzilla   : 10778
 Description: kibnal_shutdown() doesn't finish; lconf --cleanup hangs
 Details    : races between lnd_shutdown and peer creation prevent
-             lnd_shutdown from finishing.
+             lnd_shutdown from finishing.
 
 Severity   : normal
 Bugzilla   : 13279
 Description: open files rlimit 1024 reached while liblustre testing
 Details    : ulnds/socklnd must close open socket after unsuccessful
-             'say hello' attempt.
+             'say hello' attempt.
 
 Severity   : major
 Bugzilla   : 13482
 Description: build error
 Details    : fix typos in gmlnd, ptllnd and viblnd
 
-------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
 
 2007-07-30 Cluster File Systems, Inc.
         * version 1.6.1
@@ -246,6 +499,8 @@ Details : fix typos in gmlnd, ptllnd and viblnd
          mxlnd     - MX 1.2.1 or later,
          ptllnd    - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
 
+--------------------------------------------------------------------------------
+
 2007-06-21 Cluster File Systems, Inc.
         * version 1.4.11
         * Support for networks:
@@ -267,20 +522,20 @@ Description: Initialize cpumask before use
 Severity   : major
 Bugzilla   : 12014
 Description: ASSERTION failures when upgrading to the patchless zero-copy
-             socklnd
+             socklnd
 Details    : This bug affects "rolling upgrades", causing an inconsistent
-             protocol version negotiation and subsequent assertion failure
+             protocol version negotiation and subsequent assertion failure
              during rolling upgrades after the first wave of upgrades.
 
 Severity   : minor
 Bugzilla   : 11223
 Details    : Change "dropped message" CERRORs to D_NETERROR so they are
-             logged instead of creating "console chatter" when a lustre
+             logged instead of creating "console chatter" when a lustre
              timeout races with normal RPC completion.
 
 Severity   : minor
 Details    : lnet_clear_peer_table can wait forever if user forgets to
-             clear a lazy portal.
+             clear a lazy portal.
 
 Severity   : minor
 Details    : libcfs_id2str should check pid against LNET_PID_ANY.
@@ -307,16 +562,16 @@ Severity : major
 Bugzilla   : 12316
 Description: Add OFED1.2 support to o2iblnd
 Details    : o2iblnd depends on OFED's modules, if out-tree OFED's modules
-             are installed (other than kernel's in-tree infiniband), there
-             could be some problem while insmod o2iblnd (mismatch CRC of
-             ib_* symbols).
-             If extra Module.symvers is supported in kernel (i.e, 2.6.17),
-             this link provides solution:
-             https://bugs.openfabrics.org/show_bug.cgi?id=355
-             if extra Module.symvers is not supported in kernel, we will
-             have to run the script in bug 12316 to update
-             $LINUX/module.symvers before building o2iblnd.
-             More details about this are in bug 12316.
+             are installed (other than kernel's in-tree infiniband), there
+             could be some problem while insmod o2iblnd (mismatch CRC of
+             ib_* symbols).
+             If extra Module.symvers is supported in kernel (i.e, 2.6.17),
+             this link provides solution:
+             https://bugs.openfabrics.org/show_bug.cgi?id=355
+             if extra Module.symvers is not supported in kernel, we will
+             have to run the script in bug 12316 to update
+             $LINUX/module.symvers before building o2iblnd.
+             More details about this are in bug 12316.
 
 ------------------------------------------------------------------------------
 
@@ -351,7 +606,7 @@ Description: ptllnd logs a piece of incorrect debug info in kptllnd_peer_handle_
 Severity   : minor
 Frequency  : rare
 Description: the_lnet.ln_finalizing was not set when the current thread is
-             about to complete messages. It only affects multi-threaded
+             about to complete messages. It only affects multi-threaded
              user space LNet.
 
 Severity   : normal
@@ -363,13 +618,13 @@ Severity : major
 Frequency  : rare
 Bugzilla   : 12458
 Description: Assertion failure in kernel ptllnd caused by posting passive
-             bulk buffers before connection establishment complete.
+             bulk buffers before connection establishment complete.
 
 Severity   : major
 Frequency  : rare
 Bugzilla   : 12445
 Description: A race in kernel ptllnd between deleting a peer and posting
-             new communications for it could hang communications -
+             new communications for it could hang communications -
              manifesting as "Unexpectedly long timeout" messages.
 
 Severity   : major
@@ -392,7 +647,7 @@ Frequency : after Ptllnd timeouts and portals congestion
 Bugzilla   : 11659
 Description: Credit overflows
 Details    : This was a bug in ptllnd connection establishment. The fix
-             implements better peer stamps to disambiguate connection
+             implements better peer stamps to disambiguate connection
              establishment and ensure both peers enter the credit flow state
              machine consistently.
 
@@ -401,7 +656,7 @@ Frequency : rare
 Bugzilla   : 11394
 Description: kptllnd didn't propagate some network errors up to LNET
 Details    : This bug was spotted while investigating 11394. The fix
-             ensures network errors on sends and bulk transfers are
+             ensures network errors on sends and bulk transfers are
              propagated to LNET/lustre correctly.
 
 Severity   : enhancement
@@ -670,11 +925,11 @@ Description: add /proc/sys/portals/catastrophe entry which will report if
      - renamed cfs_sleep_chan -> cfs_waitq
                cfs_sleep_link -> cfs_waitlink
-     - fixed race in linux version of arch-independent socknal
-       (the ENOMEM/EAGAIN decision).
+     - fixed race in linux version of arch-independent socknal
+       (the ENOMEM/EAGAIN decision).
      - Didn't fix problems in Darwin version of arch-independent socknal
-       (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision)
+       (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision)
      - removed libcfs types from non-socknal header files
        (only some types in the header files had been changed; the
        .c files hadn't been