1 tbd Sun Microsystems, Inc.
3 * Support for networks:
4 socklnd - any kernel supported by Lustre,
5 qswlnd - Qsnet kernel modules 5.20 and later,
6 openiblnd - IbGold 1.8.2,
7 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
8 viblnd - Voltaire ibhost 3.4.5 and later,
9 ciblnd - Topspin 3.2.0,
10 iiblnd - Infiniserv 3.3 + PathBits patch,
11 gmlnd - GM 2.1.22 and later,
12 mxlnd - MX 1.2.1 or later,
13 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
20 Severity : enhancement
22 Description: LNet fine grain routing support.
26 Description: router checker stops working when system wall clock goes backward
27 Details : use monotonic timing source instead of system wall clock time.
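The idea behind this fix can be illustrated with a minimal sketch. It is not the LNet router checker; the class name `RouterChecker`, its methods, and the injectable `clock` argument are hypothetical and exist only for illustration. The point is that a liveness timeout computed from a monotonic clock is unaffected by the wall clock being stepped backward.

```python
import time


class RouterChecker:
    """Hypothetical liveness tracker (illustration only, not LNet code)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        # time.monotonic() never goes backward, unlike time.time(),
        # which follows the system wall clock.
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_alive = clock()

    def note_alive(self):
        """Record that the router answered a ping just now."""
        self.last_alive = self.clock()

    def is_dead(self):
        """True once no answer has been seen for longer than the timeout."""
        return self.clock() - self.last_alive > self.timeout_s
```

With a wall-clock source, a backward step makes the elapsed time negative, so the timeout might never fire; with a monotonic source the elapsed time only grows.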
29 Severity : enhancement
31 Description: avoid asymmetrical router failures
33 Severity : enhancement
35 Description: multiple-instance support for kptllnd
39 Description: ksocknal_close_conn_locked connection race
40 Details : A race was possible when ksocknal_create_conn calls
41 ksocknal_close_conn_locked for an already closed conn.
45 Description: router_proc.c is rewritten to use sysctl-interface for parameters
46 residing in /proc/sys/lnet
48 Severity : enhancement
50 Description: port router pinger to userspace
54 Description: kptllnd HELLO protocol deadlock
55 Details : kptllnd HELLO protocol doesn't run to completion in finite time
59 Description: LNet selftest fixes and enhancements
61 Severity : enhancement
63 Description: allow a test node to be a member of multiple test groups
65 Severity : enhancement
67 Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution
68 Details : an update from the upstream developer Scott Atchley.
70 Severity : enhancement
72 Description: add a new LND option to control peer buffer credits on routers
76 Description: Fixing deadlock in usocklnd
77 Details : A deadlock was possible in usocklnd due to a race condition while
78 tearing a connection down. The problem resulted from the erroneous
79 assumption that lnet_finalize() could be called while holding
83 Bugzilla : 13621, 15983
84 Description: Protocol V2 of o2iblnd
85 Details : o2iblnd V2 has several new features:
86 . map-on-demand: map-on-demand is disabled by default, it can
87 be enabled by using modparam "map_on_demand=@value@". @value@
88 should be >= 0 and < 256; 0 disables map-on-demand, and any other
89 valid value enables it.
90 o2iblnd will create FMR or physical MR for RDMA if fragments of
92 Enabling map-on-demand takes less memory for a new connection,
93 but a little more CPU for RDMA.
94 . iWARP : to support iWARP, please enable map-on-demand; 32 and 64
95 are recommended values. iWARP will probably fail for values >= 128.
96 . OOB NOOP message: to resolve deadlock on router.
97 . tunable peer_credits_hiw (high-water mark for returning credits):
98 the default value of peer_credits_hiw equals (peer_credits - 1);
99 the user can change it between peer_credits/2 and (peer_credits - 1).
100 A lower value is recommended for high-latency networks.
101 . tunable message queue size: it always equals peer_credits;
102 a higher value is recommended for high-latency networks.
103 . It's compatible with earlier versions of o2iblnd.
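The parameter ranges listed above can be sketched as simple checks. This is only an illustration under stated assumptions: the function names are hypothetical, peer_credits/2 is taken as integer division, and out-of-range peer_credits_hiw values are clamped here, although the entry does not say whether the module clamps or rejects them.

```python
def map_on_demand_enabled(value):
    """Return True if map-on-demand is enabled for this modparam value.

    Per the entry above: valid values are >= 0 and < 256, and 0 disables it.
    """
    if not 0 <= value < 256:
        raise ValueError("map_on_demand must be >= 0 and < 256")
    return value != 0


def effective_peer_credits_hiw(peer_credits, hiw=None):
    """Pick a peer_credits_hiw within [peer_credits/2, peer_credits - 1].

    Assumption: values outside the range are clamped (not confirmed by
    the entry); integer division is assumed for peer_credits/2.
    """
    low = peer_credits // 2
    high = peer_credits - 1
    if hiw is None:
        return high  # documented default: peer_credits - 1
    return max(low, min(high, hiw))
```

For example, with peer_credits=8 the default high-water mark is 7, and a requested value of 2 would be raised to 4 under the clamping assumption.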
107 Description: Fixing 'running out of ports' issue
108 Details : Add a delay before the next reconnect attempt in ksocklnd
109 when a connection race is lost. Limit the frequency of query requests
110 in lnet. Improved handling of 'dead peer' notifications in
115 Description: Change ptllnd timeout and watchdog timers
116 Details : Add ptltrace_on_nal_failed and bump ptllnd timeout to match
117 Portals wire timeout.
121 Description: One down Lustre FS hangs ALL mounted Lustre filesystems
122 Details : Shared routing enhancements - peer health detection.
124 Severity : enhancement
126 Description: acceptor.c cleanup
127 Details : Removed code duplication in acceptor.c between the kernel and
128 user-space cases. User-space libcfs tcpip primitives were
129 unified to have prototypes similar to the kernel ones. Minor
130 cosmetic changes in usocklnd to use cfs_socket_t as the
131 representation of a socket.
135 Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off
136 Details : See comment 46 in bug 11245 for details - it's indeed a bug
137 introduced by the original 11245 fix.
141 Description: uptllnd credit overflow fix
142 Details : kptl_msg_t::ptlm_credits could be overflowed by uptllnd since
147 Description: socklnd protocol version 3
148 Details : With the current protocol V2, connections on a router can be
149 blocked and unable to receive any incoming messages when there are no
150 more router buffers, so a ZC-ACK can't be handled (the LNet message
151 can't be finalized), which causes a deadlock on the router.
152 Protocol V3 has a dedicated connection for emergency messages
153 like ZC-ACK to the router; messages on this dedicated connection
154 don't need any credit, so they will never be blocked. Also, V3 can send
155 keepalive pings at a specified interval for router health checking.
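The credit behaviour described in this entry can be shown with a toy model. It is not the socklnd wire protocol: the `RouterLink` class, its fields, and the `emergency` flag are hypothetical names chosen for illustration. Ordinary messages consume a credit and queue when none is left, while messages on the dedicated emergency path (such as ZC-ACK) bypass credits entirely.

```python
from collections import deque


class RouterLink:
    """Toy model of credit-gated sends with a credit-free emergency path."""

    def __init__(self, credits):
        self.credits = credits
        self.blocked = deque()  # ordinary messages waiting for a credit
        self.delivered = []     # messages that made it through, in order

    def send(self, msg, emergency=False):
        """Deliver msg if allowed; return False if it had to be queued."""
        if emergency:
            # Dedicated connection: no credit needed, so never blocked.
            self.delivered.append(msg)
            return True
        if self.credits == 0:
            self.blocked.append(msg)
            return False
        self.credits -= 1
        self.delivered.append(msg)
        return True

    def return_credit(self):
        """A credit comes back; hand it to the oldest queued message if any."""
        if self.blocked:
            self.delivered.append(self.blocked.popleft())
        else:
            self.credits += 1
```

In this model a link with one credit accepts the first data message, queues the second, and still delivers a ZC-ACK sent with `emergency=True`, which is the property that avoids the deadlock described above.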
157 -------------------------------------------------------------------------------
159 2008-12-31 Sun Microsystems, Inc.
161 * Support for networks:
162 socklnd - any kernel supported by Lustre,
163 qswlnd - Qsnet kernel modules 5.20 and later,
164 openiblnd - IbGold 1.8.2,
165 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
166 viblnd - Voltaire ibhost 3.4.5 and later,
167 ciblnd - Topspin 3.2.0,
168 iiblnd - Infiniserv 3.3 + PathBits patch,
169 gmlnd - GM 2.1.22 and later,
170 mxlnd - MX 1.2.1 or later,
171 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
175 Description: workaround for OOM from o2iblnd
176 Details : OFED needs to allocate a big chunk of memory for the QP while creating
177 a connection for o2iblnd; OOM can happen if no such contiguous
179 QP size is determined by the concurrent_sends and max_fragments
180 parameters of o2iblnd. Users may now specify a smaller value for
181 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
182 decreases the memory block size required to create the QP.
186 Description: Support Zerocopy receive of Chelsio device
187 Details : Chelsio driver can support zerocopy for iov[1] if it's
188 contiguous and large enough.
192 Description: fix credit flow deadlock in uptllnd
196 Description: finalize network operation in reasonable time
197 Details : conf-sanity test_32a couldn't stop ost and mds because it
198 tried to access a non-existent peer and the TCP connect took
199 a long time to time out.
203 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
204 Details : Lost reference on conn prevents peer from being destroyed, which
205 could prevent new peer creation if peer count has reached upper
210 Description: LNET Selftest results in Soft lockup on OSS CPU
211 Details : only hits when 8 or more o2ib clients are involved and a session is
212 torn down with 'lst end_session' without a preceding 'lst stop'.
216 Description: concurrent_sends in IB LNDs should not be changeable at run time
217 Details : concurrent_sends in IB LNDs should not be changeable at run time
221 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
222 Details : only hits under out-of-memory situations
225 -------------------------------------------------------------------------------
227 2009-02-07 Sun Microsystems, Inc.
229 * Support for networks:
230 socklnd - any kernel supported by Lustre,
231 qswlnd - Qsnet kernel modules 5.20 and later,
232 openiblnd - IbGold 1.8.2,
233 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
234 viblnd - Voltaire ibhost 3.4.5 and later,
235 ciblnd - Topspin 3.2.0,
236 iiblnd - Infiniserv 3.3 + PathBits patch,
237 gmlnd - GM 2.1.22 and later,
238 mxlnd - MX 1.2.1 or later,
239 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
242 Description: workaround for OOM from o2iblnd
243 Details : OFED needs to allocate a big chunk of memory for the QP while creating
244 a connection for o2iblnd; OOM can happen if no such contiguous
246 QP size is determined by the concurrent_sends and max_fragments
247 parameters of o2iblnd. Users may now specify a smaller value for
248 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
249 decreases the memory block size required to create the QP.
253 Description: Support Zerocopy receive of Chelsio device
254 Details : Chelsio driver can support zerocopy for iov[1] if it's
255 contiguous and large enough.
258 Description: fix credit flow deadlock in uptllnd
262 Description: finalize network operation in reasonable time
263 Details : conf-sanity test_32a couldn't stop ost and mds because it
264 tried to access a non-existent peer and the TCP connect took
265 a long time to time out.
269 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
270 Details : Lost reference on conn prevents peer from being destroyed, which
271 could prevent new peer creation if peer count has reached upper
276 Description: LNET Selftest results in Soft lockup on OSS CPU
277 Details : only hits when 8 or more o2ib clients are involved and a session is
278 torn down with 'lst end_session' without a preceding 'lst stop'.
282 Description: concurrent_sends in IB LNDs should not be changeable at run time
283 Details : concurrent_sends in IB LNDs should not be changeable at run time
285 -------------------------------------------------------------------------------
287 2008-11-03 Sun Microsystems, Inc.
289 * Support for networks:
290 socklnd - any kernel supported by Lustre,
291 qswlnd - Qsnet kernel modules 5.20 and later,
292 openiblnd - IbGold 1.8.2,
293 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
294 viblnd - Voltaire ibhost 3.4.5 and later,
295 ciblnd - Topspin 3.2.0,
296 iiblnd - Infiniserv 3.3 + PathBits patch,
297 gmlnd - GM 2.1.22 and later,
298 mxlnd - MX 1.2.1 or later,
299 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
303 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
304 Details : only hits under out-of-memory situations
306 -------------------------------------------------------------------------------
308 2008-04-26 Sun Microsystems, Inc.
310 * Support for networks:
311 socklnd - any kernel supported by Lustre,
312 qswlnd - Qsnet kernel modules 5.20 and later,
313 openiblnd - IbGold 1.8.2,
314 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5
315 viblnd - Voltaire ibhost 3.4.5 and later,
316 ciblnd - Topspin 3.2.0,
317 iiblnd - Infiniserv 3.3 + PathBits patch,
318 gmlnd - GM 2.1.22 and later,
319 mxlnd - MX 1.2.1 or later,
320 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
324 Description: excessive debug information removed
325 Details : excessive debug information removed
329 Description: ksocknal_create_conn() hit ASSERTION during connection race
330 Details : ksocknal_create_conn() hit ASSERTION during connection race
334 Description: ksocknal_send_hello() hit ASSERTION while connecting race
335 Details : ksocknal_send_hello() hit ASSERTION while connecting race
339 Description: o2iblnd/ptllnd credit deadlock in a routed config.
340 Details : o2iblnd/ptllnd credit deadlock in a routed config.
344 Description: High load after starting lnet
345 Details : gmlnd should sleep in the rx thread in an interruptible way. Otherwise,
346 the uptime utility reports a confusingly high load.
350 Description: ksocklnd fails to establish connection if accept_port is high
351 Details : PID remapping must not be done for active (outgoing) connections
354 --------------------------------------------------------------------------------
356 2008-01-11 Sun Microsystems, Inc.
358 * Support for networks:
359 socklnd - any kernel supported by Lustre,
360 qswlnd - Qsnet kernel modules 5.20 and later,
361 openiblnd - IbGold 1.8.2,
362 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5
363 viblnd - Voltaire ibhost 3.4.5 and later,
364 ciblnd - Topspin 3.2.0,
365 iiblnd - Infiniserv 3.3 + PathBits patch,
366 gmlnd - GM 2.1.22 and later,
367 mxlnd - MX 1.2.1 or later,
368 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
372 Description: liblustre network error
373 Details : liblustre clients should understand the LNET_ACCEPT_PORT environment
374 variable even if they don't start the lnet acceptor.
378 Description: Strange message from lnet (Ignoring prediction from the future)
379 Details : Incorrect calculation of peer's last_alive value in ksocklnd
381 --------------------------------------------------------------------------------
383 2007-12-07 Cluster File Systems, Inc. <info@clusterfs.com>
385 * Support for networks:
386 socklnd - any kernel supported by Lustre,
387 qswlnd - Qsnet kernel modules 5.20 and later,
388 openiblnd - IbGold 1.8.2,
389 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5.
390 viblnd - Voltaire ibhost 3.4.5 and later,
391 ciblnd - Topspin 3.2.0,
392 iiblnd - Infiniserv 3.3 + PathBits patch,
393 gmlnd - GM 2.1.22 and later,
394 mxlnd - MX 1.2.1 or later,
395 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
399 Description: ASSERTION(me == md->md_me) failed in lnet_match_md()
403 Description: increase send queue size for ciblnd/openiblnd
407 Description: new userspace socklnd
408 Details : The old userspace tcpnal that resided in lnet/ulnds/socklnd was replaced
409 with a new one, usocklnd.
411 Severity : enhancement
413 Description: Console message flood
414 Details : Make cdls rate limiting more tunable by adding several tunables in
415 procfs: /proc/sys/lnet/console_{min,max}_delay_centisecs and
416 /proc/sys/lnet/console_backoff.
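How a minimum delay, a maximum delay, and a backoff factor might interact can be sketched as follows. This is an assumption-laden illustration, not the libcfs cdls algorithm: the `ConsoleLimiter` class and its policy (grow the delay by the backoff factor while messages are being suppressed, reset it after a quiet period) are hypothetical. Times are plain numbers in whatever unit the caller uses, for example centiseconds.

```python
class ConsoleLimiter:
    """Hypothetical rate limiter with min/max delay and backoff factor."""

    def __init__(self, min_delay, max_delay, backoff):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.backoff = backoff
        self.delay = min_delay
        self.next_allowed = None  # earliest time the next print is allowed
        self.skipped = 0          # messages suppressed since the last print

    def should_print(self, now):
        """Return True if a message arriving at time `now` may be printed."""
        if self.next_allowed is not None and now < self.next_allowed:
            self.skipped += 1
            return False
        if self.skipped:
            # Messages were suppressed: back off, capped at max_delay.
            self.delay = min(self.delay * self.backoff, self.max_delay)
        else:
            # Quiet period: fall back to the minimum delay.
            self.delay = self.min_delay
        self.skipped = 0
        self.next_allowed = now + self.delay
        return True
```

Under this policy a burst of identical messages is printed at increasingly sparse intervals, never sparser than the maximum delay, and the limiter returns to the minimum delay once the burst ends.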
418 --------------------------------------------------------------------------------
420 2007-09-27 Cluster File Systems, Inc. <info@clusterfs.com>
422 * Support for networks:
423 socklnd - any kernel supported by Lustre,
424 qswlnd - Qsnet kernel modules 5.20 and later,
425 openiblnd - IbGold 1.8.2,
426 o2iblnd - OFED 1.1 and 1.2,
427 viblnd - Voltaire ibhost 3.4.5 and later,
428 ciblnd - Topspin 3.2.0,
429 iiblnd - Infiniserv 3.3 + PathBits patch,
430 gmlnd - GM 2.1.22 and later,
431 mxlnd - MX 1.2.1 or later,
432 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
436 Description: /proc/sys/lnet has non-sysctl entries
437 Details : Updating dump_kernel/daemon_file/debug_mb to use sysctl variables
441 Description: TOE Kernel panic by ksocklnd
442 Details : offloaded sockets provide their own implementation of sendpage,
443 can't call tcp_sendpage() directly
447 Description: kibnal_shutdown() doesn't finish; lconf --cleanup hangs
448 Details : races between lnd_shutdown and peer creation prevent
449 lnd_shutdown from finishing.
453 Description: open files rlimit 1024 reached while liblustre testing
454 Details : ulnds/socklnd must close open socket after unsuccessful
459 Description: build error
460 Details : fix typos in gmlnd, ptllnd and viblnd
462 --------------------------------------------------------------------------------
464 2007-07-30 Cluster File Systems, Inc. <info@clusterfs.com>
466 * Support for networks:
467 socklnd - kernels up to 2.6.16,
468 qswlnd - Qsnet kernel modules 5.20 and later,
469 openiblnd - IbGold 1.8.2,
470 o2iblnd - OFED 1.1 and 1.2
471 viblnd - Voltaire ibhost 3.4.5 and later,
472 ciblnd - Topspin 3.2.0,
473 iiblnd - Infiniserv 3.3 + PathBits patch,
474 gmlnd - GM 2.1.22 and later,
475 mxlnd - MX 1.2.1 or later,
476 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
478 --------------------------------------------------------------------------------
480 2007-06-21 Cluster File Systems, Inc. <info@clusterfs.com>
482 * Support for networks:
483 socklnd - kernels up to 2.6.16,
484 qswlnd - Qsnet kernel modules 5.20 and later,
485 openiblnd - IbGold 1.8.2,
487 viblnd - Voltaire ibhost 3.4.5 and later,
488 ciblnd - Topspin 3.2.0,
489 iiblnd - Infiniserv 3.3 + PathBits patch,
490 gmlnd - GM 2.1.22 and later,
491 mxlnd - MX 1.2.1 or later,
492 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
496 Description: Initialize cpumask before use
500 Description: ASSERTION failures when upgrading to the patchless zero-copy
502 Details : This bug affects "rolling upgrades", causing an inconsistent
503 protocol version negotiation and subsequent assertion failure
504 during rolling upgrades after the first wave of upgrades.
508 Details : Change "dropped message" CERRORs to D_NETERROR so they are
509 logged instead of creating "console chatter" when a lustre
510 timeout races with normal RPC completion.
513 Details : lnet_clear_peer_table can wait forever if user forgets to
517 Details : libcfs_id2str should check pid against LNET_PID_ANY.
521 Description: added LNET self test
522 Details : landing b_self_test
527 Description: cfs_duration_{u,n}sec() wrongly calculate nanosecond part of
529 Details : do_div() macro is used incorrectly.
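For context, the Linux `do_div(n, base)` macro divides `n` in place, leaving the quotient in `n`, and returns the remainder, so confusing the two results is an easy mistake. The entry does not say which misuse occurred, so the sketch below only models the macro's semantics in Python and shows one correct way to split a tick count into seconds and nanoseconds. The helper names and the tick rate `HZ = 1000` are assumptions for illustration.

```python
HZ = 1000                      # assumed tick rate, for illustration only
NSEC_PER_SEC = 1_000_000_000


def do_div(n, base):
    """Model of do_div(): returns (quotient left in n, returned remainder)."""
    return n // base, n % base


def duration_sec_nsec(ticks, hz=HZ):
    """Split a duration in ticks into whole seconds and leftover nanoseconds."""
    secs, rem_ticks = do_div(ticks, hz)
    # Convert only the remainder to nanoseconds; the quotient of this
    # second division is the nanosecond part.
    nsecs, _ = do_div(rem_ticks * NSEC_PER_SEC, hz)
    return secs, nsecs
```

With these assumptions, 2500 ticks at 1000 ticks per second is 2 seconds plus 500,000,000 nanoseconds.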
531 2007-04-23 Cluster File Systems, Inc. <info@clusterfs.com>
535 Description: make panic on lbug configurable
539 Description: Add OFED1.2 support to o2iblnd
540 Details : o2iblnd depends on OFED's modules. If out-of-tree OFED modules
541 are installed (rather than the kernel's in-tree infiniband), there
542 could be problems when loading o2iblnd with insmod (mismatched CRC of
544 If an extra Module.symvers is supported by the kernel (i.e., 2.6.17),
545 this link provides a solution:
546 https://bugs.openfabrics.org/show_bug.cgi?id=355
547 If an extra Module.symvers is not supported by the kernel, we will
548 have to run the script in bug 12316 to update
549 $LINUX/Module.symvers before building o2iblnd.
550 More details about this are in bug 12316.
552 ------------------------------------------------------------------------------
554 2007-04-01 Cluster File Systems, Inc. <info@clusterfs.com>
555 * version 1.4.10 / 1.6.0
556 * Support for networks:
557 socklnd - kernels up to 2.6.16,
558 qswlnd - Qsnet kernel modules 5.20 and later,
559 openiblnd - IbGold 1.8.2,
561 viblnd - Voltaire ibhost 3.4.5 and later,
562 ciblnd - Topspin 3.2.0,
563 iiblnd - Infiniserv 3.3 + PathBits patch,
564 gmlnd - GM 2.1.22 and later,
565 mxlnd - MX 1.2.1 or later,
566 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
570 Description: Ptllnd didn't init kptllnd_data.kptl_idle_txs before it could be
571 possibly accessed in kptllnd_shutdown. Ptllnd should init
572 kptllnd_data.kptl_ptlid2str_lock before calling kptllnd_ptlid2str.
576 Description: gmlnd ignored some transmit errors when finalizing lnet messages.
580 Description: ptllnd logs a piece of incorrect debug info in kptllnd_peer_handle_hello.
584 Description: the_lnet.ln_finalizing was not set when the current thread is
585 about to complete messages. It only affects multi-threaded
591 Description: Changed the default kqswlnd ntxmsg=512
596 Description: Assertion failure in kernel ptllnd caused by posting passive
597 bulk buffers before connection establishment complete.
602 Description: A race in kernel ptllnd between deleting a peer and posting
603 new communications for it could hang communications -
604 manifesting as "Unexpectedly long timeout" messages.
609 Description: Kernel ptllnd lock ordering issue could hang a node.
614 Description: node crash on socket teardown race
617 Frequency : 'lctl peer_list' issued on a mx net
619 Description: Enable lctl's peer_list for MXLND
622 Frequency : after Ptllnd timeouts and portals congestion
624 Description: Credit overflows
625 Details : This was a bug in ptllnd connection establishment. The fix
626 implements better peer stamps to disambiguate connection
627 establishment and ensure both peers enter the credit flow
628 state machine consistently.
633 Description: kptllnd didn't propagate some network errors up to LNET
634 Details : This bug was spotted while investigating 11394. The fix
635 ensures network errors on sends and bulk transfers are
636 propagated to LNET/lustre correctly.
638 Severity : enhancement
640 Description: Fixed console chatter in case of -ETIMEDOUT.
642 Severity : enhancement
644 Description: Added D_NETTRACE for recording network packet history
645 (initially only for ptllnd). Also a separate userspace
646 ptllnd facility to gather history which should really be
647 covered by D_NETTRACE too, if only CDEBUG recorded history in
653 Description: o2iblnd handle early RDMA_CM_EVENT_DISCONNECTED.
654 Details : If the fabric is lossy, an RDMA_CM_EVENT_DISCONNECTED
655 callback can occur before a connection has actually been
656 established. This caused an assertion failure previously.
658 Severity : enhancement
660 Description: Multiple instances for o2iblnd
661 Details : Allow multiple instances of o2iblnd to enable networking over
662 multiple HCAs and routing between them.
666 Description: lnet deadlock in router_checker
667 Details : turned ksnd_connd_lock, ksnd_reaper_lock, and ksock_net_t:ksnd_lock
668 into BH locks to eliminate potential deadlock caused by
669 ksocknal_data_ready() preempting code holding these locks.
673 Description: Millions of failed socklnd connection attempts cause a very slow FS
674 Details : added a new route flag ksnr_scheduled to distinguish from
675 ksnr_connecting, so that a peer connection request is only turned
676 down for race concerns when an active connection to the same peer
677 is in progress (instead of just being scheduled).
679 ------------------------------------------------------------------------------
681 2007-02-09 Cluster File Systems, Inc. <info@clusterfs.com>
683 * Support for networks:
684 socklnd - kernels up to 2.6.16
685 qswlnd - Qsnet kernel modules 5.20 and later
686 openiblnd - IbGold 1.8.2
688 viblnd - Voltaire ibhost 3.4.5 and later
689 ciblnd - Topspin 3.2.0
690 iiblnd - Infiniserv 3.3 + PathBits patch
691 gmlnd - GM 2.1.22 and later
692 mxlnd - MX 1.2.1 or later
693 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
696 Severity : major on XT3
698 Description: libcfs overwrites /proc/sys/portals
699 Details : libcfs created a symlink from /proc/sys/portals to
700 /proc/sys/lnet for backwards compatibility. This is no
701 longer required and makes the Cray portals /proc variables
706 Description: OFED FMR API change
707 Details : This changes parameter usage to reflect a change in
708 ib_fmr_pool_map_phys() between OFED 1.0 and OFED 1.1. Note
709 that FMR support is only used in experimental versions of the
710 o2iblnd - this change does not affect standard usage at all.
712 Severity : enhancement
714 Description: new ko2iblnd module parameter: ib_mtu
715 Details : the default IB MTU of 2048 performs badly on 23108 Tavor
716 HCAs. You can avoid this problem by setting the MTU to 1024
717 using this module parameter.
719 Severity : enhancement
720 Bugzilla : 11118/11620
721 Description: ptllnd small request message buffer alignment fix
722 Details : Set the PTL_MD_LOCAL_ALIGN8 option on small message receives.
723 Round up small message size on sends in case this option
724 is not supported. 11620 was a defect in the initial
725 implementation which effectively asserted all peers had to be
726 running the correct protocol version which was fixed by always
727 NAK-ing such requests and handling any misalignments they
732 Description: When kib(nal|lnd)_del_peer() is called upon a peer whose
733 ibp_tx_queue is not empty, kib(nal|lnd)_destroy_peer()'s
734 'LASSERT(list_empty(&peer->ibp_tx_queue))' will fail.
736 Severity : enhancement
738 Description: Patchless ZC(zero copy) socklnd
739 Details : New protocol for socklnd: socklnd can support zero copy without
740 a kernel patch, and it's compatible with the old socklnd. Checksum is
741 moved from tunables to modparams.
745 Description: When ksocknal_del_peer() is called upon a peer whose
746 ksnp_tx_queue is not empty, ksocknal_destroy_peer()'s
747 'LASSERT(list_empty(&peer->ksnp_tx_queue))' will fail.
750 Frequency : when ptlrpc is under heavy use and runs out of request buffer
752 Description: In lnet_match_blocked_msg(), md can be used without holding a
756 Frequency : very rarely
758 Description: If ksocknal_lib_setup_sock() fails, a ref on peer is lost.
759 If connd connects a route which has been closed by
760 ksocknal_shutdown(), ksocknal_create_routes() may create new
761 routes which hold references on the peer, causing shutdown
762 process to wait for peer to disappear forever.
764 Severity : enhancement
766 Description: Dump XT3 portals traces on kptllnd timeout
767 Details : Set the kptllnd module parameter "ptltrace_on_timeout=1" to
768 dump Cray portals debug traces to a file. The kptllnd module
769 parameter "ptltrace_basename", default "/tmp/lnet-ptltrace",
770 is the basename of the dump file.
773 Frequency : infrequent
775 Description: kernel ptllnd fix bug in connection re-establishment
776 Details : Kernel ptllnd could produce protocol errors e.g. illegal
777 matchbits and/or violate the credit flow protocol when trying
778 to re-establish a connection with a peer after an error or
781 Severity : enhancement
783 Description: Allow /proc/sys/lnet/debug to be set symbolically
784 Details : Allow debug and subsystem debug values to be read/set by name
785 in addition to numerically, for ease of use.
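Accepting a mask either by name or as a number can be sketched as below. The flag names and bit values in `DEBUG_BITS` are hypothetical placeholders, not the real LNET debug flags, and the space-separated syntax is an assumption; the sketch only shows the read/set-by-name idea described in this entry.

```python
# Hypothetical name-to-bit table; real flag names and values may differ.
DEBUG_BITS = {"trace": 0x1, "inode": 0x2, "neterror": 0x4}


def parse_debug(value):
    """Parse a debug mask given numerically ("0x5") or by name ("trace neterror")."""
    value = value.strip()
    try:
        return int(value, 0)  # base 0 accepts decimal and 0x-prefixed hex
    except ValueError:
        pass
    mask = 0
    for name in value.split():
        if name not in DEBUG_BITS:
            raise ValueError("unknown debug flag: " + name)
        mask |= DEBUG_BITS[name]
    return mask


def format_debug(mask):
    """Render a mask as the space-separated names of the bits that are set."""
    return " ".join(name for name, bit in DEBUG_BITS.items() if mask & bit)
```

With this table, `parse_debug("trace neterror")` and `parse_debug("0x5")` yield the same mask, and `format_debug` turns it back into names.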
788 Frequency : only in configurations with LNET routers
790 Description: routes automatically marked down and recovered
791 Details : In configurations with LNET routers, if a router fails, routers
792 now actively try to recover routes that are down, unless they
793 are marked down by an administrator.
795 ------------------------------------------------------------------------------
797 2006-12-09 Cluster File Systems, Inc. <info@clusterfs.com>
800 Frequency : very rarely, in configurations with LNET routers and TCP
802 Description: incorrect data written to files on OSTs
803 Details : In certain high-load conditions incorrect data may be written
804 to files on the OST when using TCP networks.
806 ------------------------------------------------------------------------------
808 2006-07-31 Cluster File Systems, Inc. <info@clusterfs.com>
810 - rework CDEBUG messages rate-limiting mechanism b=10375
811 - add per-socket tunables for socklnd if the kernel is patched b=10327
813 ------------------------------------------------------------------------------
815 2006-02-15 Cluster File Systems, Inc. <info@clusterfs.com>
817 - fix use of portals/lnet pid to avoid dropping RPCs b=10074
818 - iiblnd wasn't mapping all memory, resulting in comms errors b=9776
819 - quiet LNET startup LNI message for liblustre b=10128
820 - Better console error messages if 'ip2nets' can't match an IP address
821 - Fixed overflow/use-before-set bugs in linux-time.h
822 - Fixed ptllnd bug that wasn't initialising rx descriptors completely
823 - LNET teardown failed an assertion about the route table being empty
824 - Fixed a crash in LNetEQPoll(<invalid handle>)
825 - Future protocol compatibility work (b_rls146_lnetprotovrsn)
826 - improve debug message for liblustre/Catamount nodes (b=10116)
828 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
829 * Configuration change for the XT3
830 The PTLLND is now used to run Lustre over Portals on the XT3.
831 The configure option(s) --with-cray-portals are no longer
832 used. Rather --with-portals=<path-to-portals-includes> is
833 used to enable building on the XT3. In addition, to enable
834 XT3-specific features, the option --enable-cray-xt3 must be
837 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
838 * Portals has been removed, replaced by LNET.
839 LNET is a new networking infrastructure for Lustre. It includes a
840 reorganized network configuration mode (see the user
841 documentation for full details) as well as support for routing
842 between different network fabrics. Lustre Networking Devices
843 (LNDS) for the supported network fabrics have also been created
844 for this new infrastructure.
846 2005-08-08 Cluster File Systems, Inc. <info@clusterfs.com>
851 Frequency : rare (large Voltaire clusters only)
853 Description: the default number of reserved transmit descriptors was too low
854 for some large clusters
855 Details : As a workaround, the number was increased. A proper fix includes
858 2005-06-02 Cluster File Systems, Inc. <info@clusterfs.com>
863 Frequency : occasional (large-scale events, cluster reboot, network failure)
865 Description: too many error messages on console obscure actual problem and
866 can slow down/panic server, or cause recovery to fail repeatedly
867 Details : enable rate-limiting of console error messages, and some messages
868 that were console errors now only go to the kernel log
870 Severity : enhancement
872 Description: add /proc/sys/portals/catastrophe entry which will report if
873 that node has previously LBUGged
875 2005-04-06 Cluster File Systems, Inc. <info@clusterfs.com>
877 - update gmnal to use PTL_MTU, fix module refcounting (b=5786)
879 2005-04-04 Cluster File Systems, Inc. <info@clusterfs.com>
881 - handle error return code in kranal_check_fma_rx() (5915,6054)
883 2005-02-04 Cluster File Systems, Inc. <info@clusterfs.com>
885 - update vibnal (Voltaire IB NAL)
886 - update gmnal (Myrinet NAL), gmnalid
888 2005-02-04 Eric Barton <eeb@bartonsoftware.com>
890 * Landed portals:b_port_step as follows...
892 - removed CFS_DECL_SPIN*
893 just use 'spinlock_t' and initialise with spin_lock_init()
895 - removed CFS_DECL_MUTEX*
896 just use 'struct semaphore' and initialise with init_mutex()
898 - removed CFS_DECL_RWSEM*
899 just use 'struct rw_semaphore' and initialise with init_rwsem()
901 - renamed cfs_sleep_chan -> cfs_waitq
902 cfs_sleep_link -> cfs_waitlink
904 - fixed race in linux version of arch-independent socknal
905 (the ENOMEM/EAGAIN decision).
907 - Didn't fix problems in Darwin version of arch-independent socknal
908 (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision)
910 - removed libcfs types from non-socknal header files (only some types
911 in the header files had been changed; the .c files hadn't been