3 * Support for networks:
4 socklnd - any kernel supported by Lustre,
5 qswlnd - Qsnet kernel modules 5.20 and later,
6 openiblnd - IbGold 1.8.2,
7 o2iblnd - OFED 1.3, 1.4.1, 1.4.2 and 1.5.1
8 viblnd - Voltaire ibhost 3.4.5 and later,
9 ciblnd - Topspin 3.2.0,
10 iiblnd - Infiniserv 3.3 + PathBits patch,
11 gmlnd - GM 2.1.22 and later,
12 mxlnd - MX 1.2.10 or later,
13 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
17 Description: With peer health detection, o2iblnd makes only one attempt to
18 reconnect, which is not enough with nodes running Lustre 1.6
19 because of a protocol version mismatch. Fix o2iblnd to retry one more
22 Severity : enhancement
24 Description: Quiet some LNET messages
26 Severity : enhancement
28 Description: Add OFED 1.5.1 support
30 Severity : enhancement
32 Description: The peer health code lacked some important debugging info in
33 lnd_query code paths. We've added necessary debug prints,
34 not just for bug 21678, but also for future troubleshooting.
36 -------------------------------------------------------------------------------
38 2010-04-30 Oracle, Inc.
40 * Support for networks:
41 socklnd - any kernel supported by Lustre,
42 qswlnd - Qsnet kernel modules 5.20 and later,
43 openiblnd - IbGold 1.8.2,
44 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, 1.4.1, and 1.4.2
45 viblnd - Voltaire ibhost 3.4.5 and later,
46 ciblnd - Topspin 3.2.0,
47 iiblnd - Infiniserv 3.3 + PathBits patch,
48 gmlnd - GM 2.1.22 and later,
49 mxlnd - MX 1.2.10 or later,
50 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
54 Description: lst: check # of remaining RPCs before aborting
55 Details : lstcon_rpc_trans_postwait() calls lstcon_rpc_trans_abort() only
56 when the transaction has timed out, so if we got "end_session" to
57 interrupt waiting on the transaction, then we can hit the assertion
58 failure ASSERTION(crpc->crp_stamp != 0)
62 Description: print more debug info for timed-out ZC-req
63 Details : Print more information for timed-out ZC-req and partially
64 received connection. Close connection for timed-out ZC-req.
65 Always send ZC_ACK on non-blocking connection (BULK_IN).
69 Description: Adding WIRE_ATTR attribute to LNET types
70 Details : LST nodes on different platforms might not communicate well
71 due to the lack of WIRE_ATTR attribute in some LNET structures
72 traversing network. The patch fixes the problem by adding
73 WIRE_ATTR where needed.
77 Description: hash MEs on RDMA portal
78 Details : The RDMA portal can have a very long ME list on the client side,
79 which can trigger a soft lockup because of long list searches.
80 Hashing MEs on the RDMA portal resolves this problem.
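
The idea behind hashing MEs can be sketched as follows. This is only an illustration, not the LNet code: the class and method names are hypothetical, the bucket count is arbitrary, and only exact match-bit lookups are modelled (ignore bits and wildcards are left out).

```python
from collections import defaultdict


class MatchTable:
    """Toy match-entry (ME) table using hashed buckets instead of one long list."""

    def __init__(self, nbuckets=64):
        self.nbuckets = nbuckets
        self.buckets = defaultdict(list)

    def _bucket(self, match_bits):
        # Hypothetical hash: spread MEs across buckets by their match bits.
        return match_bits % self.nbuckets

    def attach(self, match_bits, me):
        self.buckets[self._bucket(match_bits)].append((match_bits, me))

    def match(self, match_bits):
        """Return and unlink the first ME with these match bits, else None."""
        bucket = self.buckets[self._bucket(match_bits)]
        for i, (bits, me) in enumerate(bucket):
            if bits == match_bits:
                del bucket[i]
                return me
        return None
```

A lookup then walks only one bucket, so the search length shrinks roughly by the number of buckets compared with a single list.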
84 Description: fix for double release of ibc_lock in o2iblnd
85 Details : Re-acquire ibc_lock in kiblnd_post_tx_locked(). Add extra
86 reference to conn before calling kiblnd_post_tx_locked()
87 to avoid scenario when conn disappears inside
88 kiblnd_post_tx_locked().
90 -------------------------------------------------------------------------------
91 2010-01-29 Sun Microsystems, Inc.
93 * Support for networks:
94 socklnd - any kernel supported by Lustre,
95 qswlnd - Qsnet kernel modules 5.20 and later,
96 openiblnd - IbGold 1.8.2,
97 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, 1.4.1, and 1.4.2
98 viblnd - Voltaire ibhost 3.4.5 and later,
99 ciblnd - Topspin 3.2.0,
100 iiblnd - Infiniserv 3.3 + PathBits patch,
101 gmlnd - GM 2.1.22 and later,
102 mxlnd - MX 1.2.10 or later,
103 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
107 Description: should update lp_alive for non-router peers.
109 Severity : enhancement
111 Description: LNet router shuffler.
113 Severity : enhancement
115 Description: LNet fine grain routing support.
119 Description: router checker stops working when system wall clock goes backward
120 Details : use monotonic timing source instead of system wall clock time.
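
A minimal sketch of why a monotonic clock helps here. The class name and the ping interval are hypothetical and this is not the router checker itself; it only shows a deadline computed from a clock that cannot go backward.

```python
import time


class RouterChecker:
    """Toy periodic checker whose deadlines come from a monotonic clock."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable so the behaviour can be tested
        self.next_ping = clock() + interval

    def ping_due(self):
        """Return True when the next ping deadline has been reached."""
        now = self.clock()
        if now >= self.next_ping:
            self.next_ping = now + self.interval
            return True
        return False
```

With a wall clock, setting the system time back would leave `now` below `next_ping` for however far the clock jumped, and pings would stop for that long. `time.monotonic()` is not affected by such adjustments.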
122 Severity : enhancement
124 Description: avoid asymmetrical router failures
126 Severity : enhancement
128 Description: multiple-instance support for kptllnd
132 Description: ksocknal_close_conn_locked connection race
133 Details : A race was possible when ksocknal_create_conn calls
134 ksocknal_close_conn_locked for already closed conn.
136 Severity : enhancement
138 Description: port router pinger to userspace
142 Description: kptllnd HELLO protocol deadlock
143 Details : kptllnd HELLO protocol doesn't run to completion in finite time
147 Description: LNet selftest fixes and enhancements
149 Severity : enhancement
151 Description: allow a test node to be a member of multiple test groups
153 Severity : enhancement
155 Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution
156 Details : an update from the upstream developer Scott Atchley.
159 -------------------------------------------------------------------------------
160 2009-07-31 Sun Microsystems, Inc.
162 * Support for networks:
163 socklnd - any kernel supported by Lustre,
164 qswlnd - Qsnet kernel modules 5.20 and later,
165 openiblnd - IbGold 1.8.2,
166 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
167 viblnd - Voltaire ibhost 3.4.5 and later,
168 ciblnd - Topspin 3.2.0,
169 iiblnd - Infiniserv 3.3 + PathBits patch,
170 gmlnd - GM 2.1.22 and later,
171 mxlnd - MX 1.2.1 or later,
172 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
176 Description: router_proc.c is rewritten to use sysctl-interface for parameters
177 residing in /proc/sys/lnet
179 Severity : enhancement
181 Description: add a new LND option to control peer buffer credits on routers
185 Description: Fixing deadlock in usocklnd
186 Details : A deadlock was possible in usocklnd due to race condition while
187 tearing connection down. The problem resulted from erroneous
188 assumption that lnet_finalize() could have been called holding
189 some lnd-level locks.
192 Bugzilla : 13621, 15983
193 Description: Protocol V2 of o2iblnd
194 Details : o2iblnd V2 has several new features:
195 . map-on-demand: map-on-demand is disabled by default; it can
196 be enabled by using modparam "map_on_demand=@value@". @value@
197 should be >= 0 and < 256; 0 will disable map-on-demand, any other
198 valid value will enable map-on-demand.
199 o2iblnd will create FMR or physical MR for RDMA if fragments of
201 Enabling map-on-demand will take less memory for new connections,
202 but a little more CPU for RDMA.
203 . iWARP : to support iWARP, please enable map-on-demand; 32 and 64
204 are recommended values. iWARP will probably fail for values >= 128.
205 . OOB NOOP message: to resolve deadlock on router.
206 . tunable peer_credits_hiw (high water mark to return credits):
207 the default value of peer_credits_hiw equals (peer_credits - 1);
208 the user can change it between peer_credits/2 and (peer_credits - 1).
209 A lower value is recommended for high-latency networks.
210 . tunable message queue size: it always equals peer_credits;
211 a higher value is recommended for high-latency networks.
212 . It is compatible with earlier versions of o2iblnd.
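
The value ranges described above can be checked with a short sketch. The function names are hypothetical, and two behaviours are assumptions rather than facts from this entry: that peer_credits/2 uses integer division, and that an out-of-range peer_credits_hiw is clamped rather than rejected.

```python
def map_on_demand_enabled(value):
    """Validate a map_on_demand value; 0 disables, 1..255 enables."""
    if not 0 <= value < 256:
        raise ValueError("map_on_demand must be >= 0 and < 256")
    return value != 0


def effective_peer_credits_hiw(peer_credits, hiw=None):
    """Return the high-water mark, defaulting to peer_credits - 1."""
    low = peer_credits // 2   # assumed integer division
    high = peer_credits - 1
    if hiw is None:
        return high
    return max(low, min(high, hiw))  # assumed clamping behaviour
```

For example, with peer_credits=8 the default high-water mark is 7 and the allowed range is 4 to 7.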
216 Description: Fixing 'running out of ports' issue
217 Details : Add a delay before the next reconnect attempt in ksocklnd in
218 the case of a lost race. Limit the frequency of query-requests
219 in lnet. Improved handling of 'dead peer' notifications in
224 Description: Change ptllnd timeout and watchdog timers
225 Details : Add ptltrace_on_nal_failed and bump ptllnd timeout to match
226 Portals wire timeout.
230 Description: One down Lustre FS hangs ALL mounted Lustre filesystems
231 Details : Shared routing enhancements - peer health detection.
235 Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off
236 Details : See comment 46 in bug 11245 for details - it's indeed a bug
237 introduced by the original 11245 fix.
241 Description: uptllnd credit overflow fix
242 Details : kptl_msg_t::ptlm_credits could be overflowed by uptllnd since
247 Description: socklnd protocol version 3
248 Details : With the current protocol V2, connections on a router can be
249 blocked and can't receive any incoming messages when there are no
250 more router buffers, so ZC-ACK can't be handled (LNet message
251 can't be finalized), which will cause deadlock on the router.
252 Protocol V3 has a dedicated connection for emergency messages
253 like ZC-ACK to the router; messages on this dedicated connection
254 don't need any credit so will never be blocked. Also, V3 can send
255 keepalive pings at a specified interval for router health checking.
257 -------------------------------------------------------------------------------
258 12-31-2008 Sun Microsystems, Inc.
260 * Support for networks:
261 socklnd - any kernel supported by Lustre,
262 qswlnd - Qsnet kernel modules 5.20 and later,
263 openiblnd - IbGold 1.8.2,
264 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
265 viblnd - Voltaire ibhost 3.4.5 and later,
266 ciblnd - Topspin 3.2.0,
267 iiblnd - Infiniserv 3.3 + PathBits patch,
268 gmlnd - GM 2.1.22 and later,
269 mxlnd - MX 1.2.1 or later,
270 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
279 Description: workaround for OOM from o2iblnd
280 Details : OFED needs to allocate a big chunk of memory for the QP while creating
281 a connection for o2iblnd; OOM can happen if no such contiguous
283 QP size is decided by concurrent_sends and max_fragments of
284 o2iblnd; we now permit the user to specify a smaller value for
285 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
286 will decrease the memory block size required for creating the QP.
290 Description: Support Zerocopy receive of Chelsio device
291 Details : Chelsio driver can support zerocopy for iov[1] if it's
292 contiguous and large enough.
296 Description: fix credit flow deadlock in uptllnd
300 Description: finalize network operation in reasonable time
301 Details : conf-sanity test_32a couldn't stop ost and mds because it
302 tried to access non-existent peer and tcp connect took
303 quite long before timing out.
307 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
308 Details : Lost reference on conn prevents peer from being destroyed, which
309 could prevent new peer creation if peer count has reached upper
314 Description: LNET Selftest results in Soft lockup on OSS CPU
315 Details : only hits when 8 or more o2ib clients are involved and a session is
316 torn down with 'lst end_session' without a preceding 'lst stop'.
320 Description: concurrent_sends in IB LNDs should not be changeable at run time
321 Details : concurrent_sends in IB LNDs should not be changeable at run time
325 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
326 Details : only hits under out-of-memory situations
329 -------------------------------------------------------------------------------
331 2009-02-07 Sun Microsystems, Inc.
333 * Support for networks:
334 socklnd - any kernel supported by Lustre,
335 qswlnd - Qsnet kernel modules 5.20 and later,
336 openiblnd - IbGold 1.8.2,
337 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
338 viblnd - Voltaire ibhost 3.4.5 and later,
339 ciblnd - Topspin 3.2.0,
340 iiblnd - Infiniserv 3.3 + PathBits patch,
341 gmlnd - GM 2.1.22 and later,
342 mxlnd - MX 1.2.1 or later,
343 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
347 Description: workaround for OOM from o2iblnd
348 Details : OFED needs to allocate a big chunk of memory for the QP while creating
349 a connection for o2iblnd; OOM can happen if no such contiguous
351 QP size is decided by concurrent_sends and max_fragments of
352 o2iblnd; we now permit the user to specify a smaller value for
353 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
354 will decrease the memory block size required for creating the QP.
358 Description: Support Zerocopy receive of Chelsio device
359 Details : Chelsio driver can support zerocopy for iov[1] if it's
360 contiguous and large enough.
364 Description: fix credit flow deadlock in uptllnd
368 Description: finalize network operation in reasonable time
369 Details : conf-sanity test_32a couldn't stop ost and mds because it
370 tried to access non-existent peer and tcp connect took
371 quite long before timing out.
375 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
376 Details : Lost reference on conn prevents peer from being destroyed, which
377 could prevent new peer creation if peer count has reached upper
382 Description: LNET Selftest results in Soft lockup on OSS CPU
383 Details : only hits when 8 or more o2ib clients are involved and a session is
384 torn down with 'lst end_session' without a preceding 'lst stop'.
388 Description: concurrent_sends in IB LNDs should not be changeable at run time
389 Details : concurrent_sends in IB LNDs should not be changeable at run time
391 -------------------------------------------------------------------------------
393 11-03-2008 Sun Microsystems, Inc.
395 * Support for networks:
396 socklnd - any kernel supported by Lustre,
397 qswlnd - Qsnet kernel modules 5.20 and later,
398 openiblnd - IbGold 1.8.2,
399 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
400 viblnd - Voltaire ibhost 3.4.5 and later,
401 ciblnd - Topspin 3.2.0,
402 iiblnd - Infiniserv 3.3 + PathBits patch,
403 gmlnd - GM 2.1.22 and later,
404 mxlnd - MX 1.2.1 or later,
405 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
409 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
410 Details : only hits under out-of-memory situations
412 -------------------------------------------------------------------------------
415 04-26-2008 Sun Microsystems, Inc.
417 * Support for networks:
418 socklnd - any kernel supported by Lustre,
419 qswlnd - Qsnet kernel modules 5.20 and later,
420 openiblnd - IbGold 1.8.2,
421 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5
422 viblnd - Voltaire ibhost 3.4.5 and later,
423 ciblnd - Topspin 3.2.0,
424 iiblnd - Infiniserv 3.3 + PathBits patch,
425 gmlnd - GM 2.1.22 and later,
426 mxlnd - MX 1.2.1 or later,
427 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
431 Description: excessive debug information removed
432 Details : excessive debug information removed
436 Description: ksocknal_create_conn() hit ASSERTION during connection race
437 Details : ksocknal_create_conn() hit ASSERTION during connection race
441 Description: ksocknal_send_hello() hit ASSERTION while connecting race
442 Details : ksocknal_send_hello() hit ASSERTION while connecting race
446 Description: o2iblnd/ptllnd credit deadlock in a routed config.
447 Details : o2iblnd/ptllnd credit deadlock in a routed config.
451 Description: High load after starting lnet
452 Details : gmlnd should sleep in the rx thread in an interruptible way. Otherwise,
453 the uptime utility reports a confusingly high load.
457 Description: ksocklnd fails to establish connection if accept_port is high
458 Details : PID remapping must not be done for active (outgoing) connections
460 --------------------------------------------------------------------------------
462 2008-01-11 Sun Microsystems, Inc.
464 * Support for networks:
465 socklnd - any kernel supported by Lustre,
466 qswlnd - Qsnet kernel modules 5.20 and later,
467 openiblnd - IbGold 1.8.2,
468 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5
469 viblnd - Voltaire ibhost 3.4.5 and later,
470 ciblnd - Topspin 3.2.0,
471 iiblnd - Infiniserv 3.3 + PathBits patch,
472 gmlnd - GM 2.1.22 and later,
473 mxlnd - MX 1.2.1 or later,
474 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
477 Description: liblustre network error
478 Details : liblustre clients should understand LNET_ACCEPT_PORT environment
479 variable even if they don't start lnet acceptor.
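
As a rough illustration of honouring the environment variable, a client could resolve the port like this. The helper name is hypothetical, and the fallback of 988 is an assumption about the default acceptor port, not something stated in this entry.

```python
import os

DEFAULT_ACCEPT_PORT = 988  # assumed default, for illustration only


def accept_port(environ=os.environ):
    """Return the port from LNET_ACCEPT_PORT, or the assumed default."""
    raw = environ.get("LNET_ACCEPT_PORT")
    if raw is None:
        return DEFAULT_ACCEPT_PORT
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError("LNET_ACCEPT_PORT out of range: %d" % port)
    return port
```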
483 Description: Strange message from lnet (Ignoring prediction from the future)
484 Details : Incorrect calculation of peer's last_alive value in ksocklnd
486 --------------------------------------------------------------------------------
488 2007-12-07 Cluster File Systems, Inc. <info@clusterfs.com>
490 * Support for networks:
491 socklnd - any kernel supported by Lustre,
492 qswlnd - Qsnet kernel modules 5.20 and later,
493 openiblnd - IbGold 1.8.2,
494 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5.
495 viblnd - Voltaire ibhost 3.4.5 and later,
496 ciblnd - Topspin 3.2.0,
497 iiblnd - Infiniserv 3.3 + PathBits patch,
498 gmlnd - GM 2.1.22 and later,
499 mxlnd - MX 1.2.1 or later,
500 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
504 Description: ASSERTION(me == md->md_me) failed in lnet_match_md()
508 Description: increase send queue size for ciblnd/openiblnd
512 Description: new userspace socklnd
513 Details : The old userspace tcpnal that resided in lnet/ulnds/socklnd was
514 replaced with a new one, usocklnd.
516 Severity : enhancement
518 Description: Console message flood
519 Details : Make cdls ratelimiting more tunable by adding several tunables in
520 procfs: /proc/sys/lnet/console_{min,max}_delay_centisecs and
521 /proc/sys/lnet/console_backoff.
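
One plausible way the three tunables could interact is sketched below. This is a guess at the general shape of a backoff rate limiter, not the actual cdls algorithm: the class name, the reset rule after a quiet period, and the exact order of operations are all assumptions.

```python
class ConsoleLimiter:
    """Toy console rate limiter driven by min delay, max delay and backoff."""

    def __init__(self, min_delay_cs, max_delay_cs, backoff):
        self.min_delay = min_delay_cs
        self.max_delay = max_delay_cs
        self.backoff = backoff
        self.delay = min_delay_cs
        self.next_ok = None  # earliest time (centiseconds) the next print is allowed

    def should_print(self, now_cs):
        if self.next_ok is not None and now_cs < self.next_ok:
            return False  # still inside the suppression window
        if self.next_ok is None or now_cs - self.next_ok > self.max_delay:
            self.delay = self.min_delay  # first message, or a long quiet period
        else:
            # Message kept recurring: lengthen the window, capped at the maximum.
            self.delay = min(self.delay * self.backoff, self.max_delay)
        self.next_ok = now_cs + self.delay
        return True
```

With min=50, max=400 and backoff=2, a message that repeats constantly is printed at 0, 50, 150 and 350 centiseconds, then at most once every 400.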
523 --------------------------------------------------------------------------------
525 2007-09-27 Cluster File Systems, Inc. <info@clusterfs.com>
527 * Support for networks:
528 socklnd - any kernel supported by Lustre,
529 qswlnd - Qsnet kernel modules 5.20 and later,
530 openiblnd - IbGold 1.8.2,
531 o2iblnd - OFED 1.1 and 1.2,
532 viblnd - Voltaire ibhost 3.4.5 and later,
533 ciblnd - Topspin 3.2.0,
534 iiblnd - Infiniserv 3.3 + PathBits patch,
535 gmlnd - GM 2.1.22 and later,
536 mxlnd - MX 1.2.1 or later,
537 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
541 Description: /proc/sys/lnet has non-sysctl entries
542 Details : Updating dump_kernel/daemon_file/debug_mb to use sysctl variables
546 Description: TOE Kernel panic by ksocklnd
547 Details : offloaded sockets provide their own implementation of sendpage,
548 can't call tcp_sendpage() directly
552 Description: kibnal_shutdown() doesn't finish; lconf --cleanup hangs
553 Details : races between lnd_shutdown and peer creation prevent
554 lnd_shutdown from finishing.
558 Description: open files rlimit 1024 reached while liblustre testing
559 Details : ulnds/socklnd must close open socket after unsuccessful
564 Description: build error
565 Details : fix typos in gmlnd, ptllnd and viblnd
567 ------------------------------------------------------------------------------
569 2007-07-30 Cluster File Systems, Inc. <info@clusterfs.com>
571 * Support for networks:
572 socklnd - kernels up to 2.6.16,
573 qswlnd - Qsnet kernel modules 5.20 and later,
574 openiblnd - IbGold 1.8.2,
575 o2iblnd - OFED 1.1 and 1.2
576 viblnd - Voltaire ibhost 3.4.5 and later,
577 ciblnd - Topspin 3.2.0,
578 iiblnd - Infiniserv 3.3 + PathBits patch,
579 gmlnd - GM 2.1.22 and later,
580 mxlnd - MX 1.2.1 or later,
581 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
583 2007-06-21 Cluster File Systems, Inc. <info@clusterfs.com>
585 * Support for networks:
586 socklnd - kernels up to 2.6.16,
587 qswlnd - Qsnet kernel modules 5.20 and later,
588 openiblnd - IbGold 1.8.2,
590 viblnd - Voltaire ibhost 3.4.5 and later,
591 ciblnd - Topspin 3.2.0,
592 iiblnd - Infiniserv 3.3 + PathBits patch,
593 gmlnd - GM 2.1.22 and later,
594 mxlnd - MX 1.2.1 or later,
595 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
599 Description: Initialize cpumask before use
603 Description: ASSERTION failures when upgrading to the patchless zero-copy
605 Details : This bug affects "rolling upgrades", causing an inconsistent
606 protocol version negotiation and subsequent assertion failure
607 during rolling upgrades after the first wave of upgrades.
611 Details : Change "dropped message" CERRORs to D_NETERROR so they are
612 logged instead of creating "console chatter" when a lustre
613 timeout races with normal RPC completion.
616 Details : lnet_clear_peer_table can wait forever if user forgets to
620 Details : libcfs_id2str should check pid against LNET_PID_ANY.
624 Description: added LNET self test
625 Details : landing b_self_test
630 Description: cfs_duration_{u,n}sec() wrongly calculate nanosecond part of
632 Details : do_div() macro is used incorrectly.
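
The entry does not say how do_div() was misused. For reference, the kernel macro replaces its first argument with the quotient and returns the remainder, which is easy to get backward. The sketch below emulates that contract; the tick rate and the function names are illustrative assumptions, not the libcfs implementation.

```python
HZ = 1000                      # assumed tick rate, illustration only
NSEC_PER_SEC = 1_000_000_000


def do_div(n, base):
    """Emulate do_div(): n is a one-item list standing in for the C lvalue.

    After the call n[0] holds the quotient; the remainder is returned.
    """
    rem = n[0] % base
    n[0] //= base
    return rem


def duration_nsec(ticks):
    """Split a duration in ticks into (whole seconds, leftover nanoseconds)."""
    t = [ticks]
    rem_ticks = do_div(t, HZ)  # t[0] now holds whole seconds
    return t[0], rem_ticks * (NSEC_PER_SEC // HZ)
```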
634 2007-04-23 Cluster File Systems, Inc. <info@clusterfs.com>
638 Description: make panic on lbug configurable
642 Description: Add OFED1.2 support to o2iblnd
643 Details : o2iblnd depends on OFED's modules; if out-of-tree OFED modules
644 are installed (other than the kernel's in-tree InfiniBand), there
645 could be some problems when loading o2iblnd with insmod (mismatched CRC of
647 If an extra Module.symvers is supported by the kernel (e.g. 2.6.17),
648 this link provides a solution:
649 https://bugs.openfabrics.org/show_bug.cgi?id=355
650 If an extra Module.symvers is not supported by the kernel, we will
651 have to run the script in bug 12316 to update
652 $LINUX/module.symvers before building o2iblnd.
653 More details about this are in bug 12316.
655 ------------------------------------------------------------------------------
657 2007-04-01 Cluster File Systems, Inc. <info@clusterfs.com>
658 * version 1.4.10 / 1.6.0
659 * Support for networks:
660 socklnd - kernels up to 2.6.16,
661 qswlnd - Qsnet kernel modules 5.20 and later,
662 openiblnd - IbGold 1.8.2,
664 viblnd - Voltaire ibhost 3.4.5 and later,
665 ciblnd - Topspin 3.2.0,
666 iiblnd - Infiniserv 3.3 + PathBits patch,
667 gmlnd - GM 2.1.22 and later,
668 mxlnd - MX 1.2.1 or later,
669 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
673 Description: Ptllnd didn't init kptllnd_data.kptl_idle_txs before it could be
674 possibly accessed in kptllnd_shutdown. Ptllnd should init
675 kptllnd_data.kptl_ptlid2str_lock before calling kptllnd_ptlid2str.
679 Description: gmlnd ignored some transmit errors when finalizing lnet messages.
683 Description: ptllnd logs a piece of incorrect debug info in kptllnd_peer_handle_hello.
687 Description: the_lnet.ln_finalizing was not set when the current thread is
688 about to complete messages. It only affects multi-threaded
694 Description: Changed the default kqswlnd ntxmsg=512
699 Description: Assertion failure in kernel ptllnd caused by posting passive
700 bulk buffers before connection establishment complete.
705 Description: A race in kernel ptllnd between deleting a peer and posting
706 new communications for it could hang communications -
707 manifesting as "Unexpectedly long timeout" messages.
712 Description: Kernel ptllnd lock ordering issue could hang a node.
717 Description: node crash on socket teardown race
720 Frequency : 'lctl peer_list' issued on a mx net
722 Description: Enable lctl's peer_list for MXLND
725 Frequency : after Ptllnd timeouts and portals congestion
727 Description: Credit overflows
728 Details : This was a bug in ptllnd connection establishment. The fix
729 implements better peer stamps to disambiguate connection
730 establishment and ensure both peers enter the credit flow
731 state machine consistently.
736 Description: kptllnd didn't propagate some network errors up to LNET
737 Details : This bug was spotted while investigating 11394. The fix
738 ensures network errors on sends and bulk transfers are
739 propagated to LNET/lustre correctly.
741 Severity : enhancement
743 Description: Fixed console chatter in case of -ETIMEDOUT.
745 Severity : enhancement
747 Description: Added D_NETTRACE for recording network packet history
748 (initially only for ptllnd). Also a separate userspace
749 ptllnd facility to gather history which should really be
750 covered by D_NETTRACE too, if only CDEBUG recorded history in
756 Description: o2iblnd handle early RDMA_CM_EVENT_DISCONNECTED.
757 Details : If the fabric is lossy, an RDMA_CM_EVENT_DISCONNECTED
758 callback can occur before a connection has actually been
759 established. This caused an assertion failure previously.
761 Severity : enhancement
763 Description: Multiple instances for o2iblnd
764 Details : Allow multiple instances of o2iblnd to enable networking over
765 multiple HCAs and routing between them.
769 Description: lnet deadlock in router_checker
770 Details : turned ksnd_connd_lock, ksnd_reaper_lock, and ksock_net_t:ksnd_lock
771 into BH locks to eliminate potential deadlock caused by
772 ksocknal_data_ready() preempting code holding these locks.
776 Description: Millions of failed socklnd connection attempts cause a very slow FS
777 Details : added a new route flag ksnr_scheduled to distinguish from
778 ksnr_connecting, so that a peer connection request is only turned
779 down for race concerns when an active connection to the same peer
780 is in progress (instead of just being scheduled).
782 ------------------------------------------------------------------------------
784 2007-02-09 Cluster File Systems, Inc. <info@clusterfs.com>
786 * Support for networks:
787 socklnd - kernels up to 2.6.16
788 qswlnd - Qsnet kernel modules 5.20 and later
789 openiblnd - IbGold 1.8.2
791 viblnd - Voltaire ibhost 3.4.5 and later
792 ciblnd - Topspin 3.2.0
793 iiblnd - Infiniserv 3.3 + PathBits patch
794 gmlnd - GM 2.1.22 and later
795 mxlnd - MX 1.2.1 or later
796 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
799 Severity : major on XT3
801 Description: libcfs overwrites /proc/sys/portals
802 Details : libcfs created a symlink from /proc/sys/portals to
803 /proc/sys/lnet for backwards compatibility. This is no
804 longer required and makes the Cray portals /proc variables
809 Description: OFED FMR API change
810 Details : This changes parameter usage to reflect a change in
811 ib_fmr_pool_map_phys() between OFED 1.0 and OFED 1.1. Note
812 that FMR support is only used in experimental versions of the
813 o2iblnd - this change does not affect standard usage at all.
815 Severity : enhancement
817 Description: new ko2iblnd module parameter: ib_mtu
818 Details : the default IB MTU of 2048 performs badly on 23108 Tavor
819 HCAs. You can avoid this problem by setting the MTU to 1024
820 using this module parameter.
822 Severity : enhancement
823 Bugzilla : 11118/11620
824 Description: ptllnd small request message buffer alignment fix
825 Details : Set the PTL_MD_LOCAL_ALIGN8 option on small message receives.
826 Round up small message size on sends in case this option
827 is not supported. 11620 was a defect in the initial
828 implementation which effectively asserted all peers had to be
829 running the correct protocol version which was fixed by always
830 NAK-ing such requests and handling any misalignments they
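
The round-up on sends amounts to padding the message size to the next multiple of 8. A minimal sketch, with a hypothetical helper name and assuming the 8-byte alignment implied by PTL_MD_LOCAL_ALIGN8:

```python
ALIGN = 8  # alignment implied by the PTL_MD_LOCAL_ALIGN8 option name


def round_up(nob, align=ALIGN):
    """Round a byte count up to the next multiple of align (a power of two)."""
    return (nob + align - 1) & ~(align - 1)
```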
835 Description: When kib(nal|lnd)_del_peer() is called upon a peer whose
836 ibp_tx_queue is not empty, kib(nal|lnd)_destroy_peer()'s
837 'LASSERT(list_empty(&peer->ibp_tx_queue))' will fail.
839 Severity : enhancement
841 Description: Patchless ZC(zero copy) socklnd
842 Details : New protocol for socklnd: socklnd can support zero copy without
843 a kernel patch, and it is compatible with the old socklnd. Checksum is
844 moved from tunables to modparams.
848 Description: When ksocknal_del_peer() is called upon a peer whose
849 ksnp_tx_queue is not empty, ksocknal_destroy_peer()'s
850 'LASSERT(list_empty(&peer->ksnp_tx_queue))' will fail.
853 Frequency : when ptlrpc is under heavy use and runs out of request buffer
855 Description: In lnet_match_blocked_msg(), md can be used without holding a
859 Frequency : very rarely
861 Description: If ksocknal_lib_setup_sock() fails, a ref on peer is lost.
862 If connd connects a route which has been closed by
863 ksocknal_shutdown(), ksocknal_create_routes() may create new
864 routes which hold references on the peer, causing shutdown
865 process to wait for peer to disappear forever.
867 Severity : enhancement
869 Description: Dump XT3 portals traces on kptllnd timeout
870 Details : Set the kptllnd module parameter "ptltrace_on_timeout=1" to
871 dump Cray portals debug traces to a file. The kptllnd module
872 parameter "ptltrace_basename", default "/tmp/lnet-ptltrace",
873 is the basename of the dump file.
876 Frequency : infrequent
878 Description: kernel ptllnd fix bug in connection re-establishment
879 Details : Kernel ptllnd could produce protocol errors e.g. illegal
880 matchbits and/or violate the credit flow protocol when trying
881 to re-establish a connection with a peer after an error or
884 Severity : enhancement
886 Description: Allow /proc/sys/lnet/debug to be set symbolically
887 Details : Allow debug and subsystem debug values to be read/set by name
888 in addition to numerically, for ease of use.
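
Accepting both forms could look roughly like the sketch below. The flag names are a small subset, the bit values are placeholders rather than the real libcfs constants, and the space-separated syntax is an assumption made for illustration.

```python
# Placeholder bit values, NOT the real libcfs debug constants.
DEBUG_BITS = {"trace": 0x1, "inode": 0x2, "net": 0x4, "neterror": 0x8}


def parse_debug(text):
    """Accept a numeric mask ("0x5", "12") or space-separated flag names."""
    text = text.strip()
    try:
        return int(text, 0)
    except ValueError:
        pass
    mask = 0
    for name in text.split():
        mask |= DEBUG_BITS[name]  # an unknown name raises KeyError
    return mask


def format_debug(mask):
    """Render a mask back as flag names, in table order."""
    return " ".join(name for name, bit in DEBUG_BITS.items() if mask & bit)
```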
891 Frequency : only in configurations with LNET routers
893 Description: routes automatically marked down and recovered
894 Details : In configurations with LNET routers, if a router fails, routers
895 now actively try to recover routes that are down, unless they
896 are marked down by an administrator.
898 ------------------------------------------------------------------------------
900 2006-12-09 Cluster File Systems, Inc. <info@clusterfs.com>
903 Frequency : very rarely, in configurations with LNET routers and TCP
905 Description: incorrect data written to files on OSTs
906 Details : In certain high-load conditions incorrect data may be written
907 to files on the OST when using TCP networks.
909 ------------------------------------------------------------------------------
911 2006-07-31 Cluster File Systems, Inc. <info@clusterfs.com>
913 - rework CDEBUG messages rate-limiting mechanism b=10375
914 - add per-socket tunables for socklnd if the kernel is patched b=10327
916 ------------------------------------------------------------------------------
918 2006-02-15 Cluster File Systems, Inc. <info@clusterfs.com>
920 - fix use of portals/lnet pid to avoid dropping RPCs b=10074
921 - iiblnd wasn't mapping all memory, resulting in comms errors b=9776
922 - quiet LNET startup LNI message for liblustre b=10128
923 - Better console error messages if 'ip2nets' can't match an IP address
924 - Fixed overflow/use-before-set bugs in linux-time.h
925 - Fixed ptllnd bug that wasn't initialising rx descriptors completely
926 - LNET teardown failed an assertion about the route table being empty
927 - Fixed a crash in LNetEQPoll(<invalid handle>)
928 - Future protocol compatibility work (b_rls146_lnetprotovrsn)
929 - improve debug message for liblustre/Catamount nodes (b=10116)
931 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
932 * Configuration change for the XT3
933 The PTLLND is now used to run Lustre over Portals on the XT3.
934 The configure option(s) --with-cray-portals are no longer
935 used. Rather --with-portals=<path-to-portals-includes> is
936 used to enable building on the XT3. In addition, to enable
937 XT3-specific features, the option --enable-cray-xt3 must be
940 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
941 * Portals has been removed, replaced by LNET.
942 LNET is the new networking infrastructure for Lustre. It includes a
943 reorganized network configuration mode (see the user
944 documentation for full details) as well as support for routing
945 between different network fabrics. Lustre Networking Devices
946 (LNDs) for the supported network fabrics have also been created
947 for this new infrastructure.
949 2005-08-08 Cluster File Systems, Inc. <info@clusterfs.com>
954 Frequency : rare (large Voltaire clusters only)
956 Description: the default number of reserved transmit descriptors was too low
957 for some large clusters
958 Details : As a workaround, the number was increased. A proper fix includes
961 2005-06-02 Cluster File Systems, Inc. <info@clusterfs.com>
966 Frequency : occasional (large-scale events, cluster reboot, network failure)
968 Description: too many error messages on console obscure actual problem and
969 can slow down/panic server, or cause recovery to fail repeatedly
970 Details : enable rate-limiting of console error messages, and some messages
971 that were console errors now only go to the kernel log
973 Severity : enhancement
975 Description: add /proc/sys/portals/catastrophe entry which will report if
976 that node has previously LBUGged
978 2005-04-06 Cluster File Systems, Inc. <info@clusterfs.com>
980 - update gmnal to use PTL_MTU, fix module refcounting (b=5786)
982 2005-04-04 Cluster File Systems, Inc. <info@clusterfs.com>
984 - handle error return code in kranal_check_fma_rx() (5915,6054)
986 2005-02-04 Cluster File Systems, Inc. <info@clusterfs.com>
988 - update vibnal (Voltaire IB NAL)
989 - update gmnal (Myrinet NAL), gmnalid
991 2005-02-04 Eric Barton <eeb@bartonsoftware.com>
993 * Landed portals:b_port_step as follows...
995 - removed CFS_DECL_SPIN*
996 just use 'spinlock_t' and initialise with spin_lock_init()
998 - removed CFS_DECL_MUTEX*
999 just use 'struct semaphore' and initialise with init_mutex()
1001 - removed CFS_DECL_RWSEM*
1002 just use 'struct rw_semaphore' and initialise with init_rwsem()
1004 - renamed cfs_sleep_chan -> cfs_waitq
1005 cfs_sleep_link -> cfs_waitlink
1007 - fixed race in linux version of arch-independent socknal
1008 (the ENOMEM/EAGAIN decision).
1010 - Didn't fix problems in Darwin version of arch-independent socknal
1011 (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision)
1013 - removed libcfs types from non-socknal header files (only some types
1014 in the header files had been changed; the .c files hadn't been