1 tbd Sun Microsystems, Inc.
3 * Support for networks:
4 socklnd - any kernel supported by Lustre,
5 qswlnd - Qsnet kernel modules 5.20 and later,
6 openiblnd - IbGold 1.8.2,
7 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
8 viblnd - Voltaire ibhost 3.4.5 and later,
9 ciblnd - Topspin 3.2.0,
10 iiblnd - Infiniserv 3.3 + PathBits patch,
11 gmlnd - GM 2.1.22 and later,
12 mxlnd - MX 1.2.1 or later,
13 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
20 Severity : enhancement
22 Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution
23 Details : an update from the upstream developer Scott Atchley.
25 Severity : enhancement
27 Description: add a new LND option to control peer buffer credits on routers
31 Description: Fixing deadlock in usocklnd
32 Details : A deadlock was possible in usocklnd due to a race condition while
33 tearing a connection down. The problem resulted from the erroneous
34 assumption that lnet_finalize() could be called while holding
38 Bugzilla : 13621, 15983
39 Description: Protocol V2 of o2iblnd
40 Details : o2iblnd V2 has several new features:
41 . map-on-demand: map-on-demand is disabled by default; it can
42 be enabled with the modparam "map_on_demand=@value@". @value@
43 must be >= 0 and < 256; 0 disables map-on-demand, and any other
44 valid value enables it.
45 o2iblnd will create FMR or physical MR for RDMA if fragments of
47 Enabling map-on-demand takes less memory per new connection,
48 but a little more CPU for RDMA.
49 . iWARP: to support iWARP, enable map-on-demand; 32 and 64
50 are the recommended values. iWARP will probably fail for values >= 128.
51 . OOB NOOP message: to resolve deadlock on router.
52 . tunable peer_credits_hiw (high-water mark for returning credits):
53 the default value of peer_credits_hiw is (peer_credits - 1);
54 the user can set it between peer_credits/2 and (peer_credits - 1).
55 A lower value is recommended for high-latency networks.
56 . tunable message queue size: it always equals peer_credits;
57 a higher value is recommended for high-latency networks.
58 . It is compatible with earlier versions of o2iblnd.
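The ranges given above for map_on_demand and peer_credits_hiw can be sketched as a small validation helper. This is an illustration only, not the o2iblnd source: the function names are hypothetical, and clamping an out-of-range peer_credits_hiw (rather than rejecting it) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: map_on_demand is valid in the range [0, 256). */
static bool map_on_demand_valid(int value)
{
    return value >= 0 && value < 256;
}

/* 0 disables map-on-demand; any other valid value enables it. */
static bool map_on_demand_enabled(int value)
{
    return map_on_demand_valid(value) && value != 0;
}

/*
 * Force a requested high-water mark into
 * [peer_credits / 2, peer_credits - 1].
 * Clamping instead of failing is an assumption of this sketch.
 */
static int clamp_peer_credits_hiw(int peer_credits, int requested)
{
    int lo = peer_credits / 2;
    int hi = peer_credits - 1;

    assert(peer_credits >= 2);
    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}
```

With peer_credits = 8, for example, the permitted range is 4 to 7, so a requested value of 1 would be raised to 4 and a requested value of 100 lowered to 7.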
62 Description: Fixing 'running out of ports' issue
63 Details : Add a delay before the next reconnect attempt in ksocklnd when
64 a connection race is lost. Limit the frequency of query requests
65 in lnet. Improved handling of 'dead peer' notifications in
70 Description: Change ptllnd timeout and watchdog timers
71 Details : Add ptltrace_on_nal_failed and bump ptllnd timeout to match
76 Description: One down Lustre FS hangs ALL mounted Lustre filesystems
77 Details : Shared routing enhancements - peer health detection.
79 Severity : enhancement
81 Description: acceptor.c cleanup
82 Details : Removed code duplicated in acceptor.c between the kernel and
83 user-space cases. User-space libcfs tcpip primitives were
84 unified to have prototypes similar to the kernel ones. Minor
85 cosmetic changes in usocklnd to use cfs_socket_t as the
86 representation of a socket.
90 Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off
91 Details : See comment 46 in bug 11245 for details - it's indeed a bug
92 introduced by the original 11245 fix.
96 Description: uptllnd credit overflow fix
97 Details : kptl_msg_t::ptlm_credits could be overflowed by uptllnd since
102 Description: socklnd protocol version 3
103 Details : With the current protocol V2, connections on a router can be
104 blocked and unable to receive any incoming messages when there are no
105 more router buffers, so a ZC-ACK can't be handled (the LNet message
106 can't be finalized), which causes a deadlock on the router.
107 Protocol V3 has a dedicated connection for emergency messages
108 such as ZC-ACK to the router; messages on this dedicated connection
109 don't need any credits, so they are never blocked. V3 can also send
110 keepalive pings at a specified interval for router health checking.
112 -------------------------------------------------------------------------------
114 12-31-2008 Sun Microsystems, Inc.
116 * Support for networks:
117 socklnd - any kernel supported by Lustre,
118 qswlnd - Qsnet kernel modules 5.20 and later,
119 openiblnd - IbGold 1.8.2,
120 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
121 viblnd - Voltaire ibhost 3.4.5 and later,
122 ciblnd - Topspin 3.2.0,
123 iiblnd - Infiniserv 3.3 + PathBits patch,
124 gmlnd - GM 2.1.22 and later,
125 mxlnd - MX 1.2.1 or later,
126 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
130 Description: workaround for OOM from o2iblnd
131 Details : OFED needs to allocate a big chunk of memory for the QP while creating
132 a connection for o2iblnd; OOM can happen if no such contiguous
134 QP size is determined by concurrent_sends and max_fragments of
135 o2iblnd. The user may now specify a smaller value for
136 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
137 decreases the memory block size required to create the QP.
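The dependence of QP size on concurrent_sends and max_fragments can be illustrated with a rough sizing model. The formula below is an assumption made for this sketch (one work request per RDMA fragment plus one for the message, per in-flight send); the ChangeLog does not give the exact expression, and the function name is hypothetical.

```c
#include <assert.h>

/*
 * Assumed sizing model, not taken from this ChangeLog: every in-flight
 * send may need one work request per RDMA fragment plus one for the
 * message itself, so the send queue grows linearly with concurrent_sends.
 */
static unsigned int send_work_requests(unsigned int concurrent_sends,
                                       unsigned int max_fragments)
{
    assert(concurrent_sends > 0);
    return (max_fragments + 1) * concurrent_sends;
}
```

Under this model, lowering concurrent_sends from 8 to 7 with 256 fragments reduces the send queue from 2056 to 1799 entries, which is why a smaller value lets the QP fit into a smaller contiguous allocation.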
141 Description: Support Zerocopy receive of Chelsio device
142 Details : Chelsio driver can support zerocopy for iov[1] if it's
143 contiguous and large enough.
147 Description: fix credit flow deadlock in uptllnd
151 Description: finalize network operation in reasonable time
152 Details : conf-sanity test_32a couldn't stop ost and mds because it
153 tried to access a non-existent peer and the tcp connect took
154 a long time to time out.
158 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
159 Details : Lost reference on conn prevents peer from being destroyed, which
160 could prevent new peer creation if peer count has reached upper
165 Description: LNET Selftest results in Soft lockup on OSS CPU
166 Details : only hits when 8 or more o2ib clients are involved and a session is
167 torn down with 'lst end_session' without a preceding 'lst stop'.
171 Description: concurrent_sends in IB LNDs should not be changeable at run time
172 Details : concurrent_sends in IB LNDs should not be changeable at run time
176 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
177 Details : only hits under out-of-memory situations
180 -------------------------------------------------------------------------------
182 2009-02-07 Sun Microsystems, Inc.
184 * Support for networks:
185 socklnd - any kernel supported by Lustre,
186 qswlnd - Qsnet kernel modules 5.20 and later,
187 openiblnd - IbGold 1.8.2,
188 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
189 viblnd - Voltaire ibhost 3.4.5 and later,
190 ciblnd - Topspin 3.2.0,
191 iiblnd - Infiniserv 3.3 + PathBits patch,
192 gmlnd - GM 2.1.22 and later,
193 mxlnd - MX 1.2.1 or later,
194 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
197 Description: workaround for OOM from o2iblnd
198 Details : OFED needs to allocate a big chunk of memory for the QP while creating
199 a connection for o2iblnd; OOM can happen if no such contiguous
201 QP size is determined by concurrent_sends and max_fragments of
202 o2iblnd. The user may now specify a smaller value for
203 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
204 decreases the memory block size required to create the QP.
208 Description: Support Zerocopy receive of Chelsio device
209 Details : Chelsio driver can support zerocopy for iov[1] if it's
210 contiguous and large enough.
213 Description: fix credit flow deadlock in uptllnd
217 Description: finalize network operation in reasonable time
218 Details : conf-sanity test_32a couldn't stop ost and mds because it
219 tried to access a non-existent peer and the tcp connect took
220 a long time to time out.
224 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
225 Details : Lost reference on conn prevents peer from being destroyed, which
226 could prevent new peer creation if peer count has reached upper
231 Description: LNET Selftest results in Soft lockup on OSS CPU
232 Details : only hits when 8 or more o2ib clients are involved and a session is
233 torn down with 'lst end_session' without a preceding 'lst stop'.
237 Description: concurrent_sends in IB LNDs should not be changeable at run time
238 Details : concurrent_sends in IB LNDs should not be changeable at run time
240 -------------------------------------------------------------------------------
242 11-03-2008 Sun Microsystems, Inc.
244 * Support for networks:
245 socklnd - any kernel supported by Lustre,
246 qswlnd - Qsnet kernel modules 5.20 and later,
247 openiblnd - IbGold 1.8.2,
248 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
249 viblnd - Voltaire ibhost 3.4.5 and later,
250 ciblnd - Topspin 3.2.0,
251 iiblnd - Infiniserv 3.3 + PathBits patch,
252 gmlnd - GM 2.1.22 and later,
253 mxlnd - MX 1.2.1 or later,
254 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
258 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
259 Details : only hits under out-of-memory situations
261 -------------------------------------------------------------------------------
263 04-26-2008 Sun Microsystems, Inc.
265 * Support for networks:
266 socklnd - any kernel supported by Lustre,
267 qswlnd - Qsnet kernel modules 5.20 and later,
268 openiblnd - IbGold 1.8.2,
269 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5
270 viblnd - Voltaire ibhost 3.4.5 and later,
271 ciblnd - Topspin 3.2.0,
272 iiblnd - Infiniserv 3.3 + PathBits patch,
273 gmlnd - GM 2.1.22 and later,
274 mxlnd - MX 1.2.1 or later,
275 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
279 Description: excessive debug information removed
280 Details : excessive debug information removed
284 Description: ksocknal_create_conn() hit ASSERTION during connection race
285 Details : ksocknal_create_conn() hit ASSERTION during connection race
289 Description: ksocknal_send_hello() hit ASSERTION while connecting race
290 Details : ksocknal_send_hello() hit ASSERTION while connecting race
294 Description: o2iblnd/ptllnd credit deadlock in a routed config.
295 Details : o2iblnd/ptllnd credit deadlock in a routed config.
299 Description: High load after starting lnet
300 Details : gmlnd should sleep in the rx thread in an interruptible way. Otherwise,
301 the uptime utility reports a confusingly high load.
305 Description: ksocklnd fails to establish connection if accept_port is high
306 Details : PID remapping must not be done for active (outgoing) connections
309 --------------------------------------------------------------------------------
311 2008-01-11 Sun Microsystems, Inc.
313 * Support for networks:
314 socklnd - any kernel supported by Lustre,
315 qswlnd - Qsnet kernel modules 5.20 and later,
316 openiblnd - IbGold 1.8.2,
317 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5
318 viblnd - Voltaire ibhost 3.4.5 and later,
319 ciblnd - Topspin 3.2.0,
320 iiblnd - Infiniserv 3.3 + PathBits patch,
321 gmlnd - GM 2.1.22 and later,
322 mxlnd - MX 1.2.1 or later,
323 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
327 Description: liblustre network error
328 Details : liblustre clients should understand LNET_ACCEPT_PORT environment
329 variable even if they don't start lnet acceptor.
333 Description: Strange message from lnet (Ignoring prediction from the future)
334 Details : Incorrect calculation of peer's last_alive value in ksocklnd
336 --------------------------------------------------------------------------------
338 2007-12-07 Cluster File Systems, Inc. <info@clusterfs.com>
340 * Support for networks:
341 socklnd - any kernel supported by Lustre,
342 qswlnd - Qsnet kernel modules 5.20 and later,
343 openiblnd - IbGold 1.8.2,
344 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5.
345 viblnd - Voltaire ibhost 3.4.5 and later,
346 ciblnd - Topspin 3.2.0,
347 iiblnd - Infiniserv 3.3 + PathBits patch,
348 gmlnd - GM 2.1.22 and later,
349 mxlnd - MX 1.2.1 or later,
350 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
354 Description: ASSERTION(me == md->md_me) failed in lnet_match_md()
358 Description: increase send queue size for ciblnd/openiblnd
362 Description: new userspace socklnd
363 Details : The old userspace tcpnal that resided in lnet/ulnds/socklnd was
364 replaced with a new one, usocklnd.
366 Severity : enhancement
368 Description: Console message flood
369 Details : Make cdls ratelimiting more tunable by adding several tunables in
370 procfs /proc/sys/lnet/console_{min,max}_delay_centisecs and
371 /proc/sys/lnet/console_backoff.
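How a minimum delay, a maximum delay, and a backoff factor can combine to rate-limit console messages is sketched below. This is a simplified model, not the libcfs cdls code: the struct, the function names, and the rule that resets the delay after a quiet period of at least the maximum delay are all assumptions. Times are in centiseconds to match the tunable names.

```c
#include <assert.h>

/* Hypothetical limiter state; fields mirror the three tunables. */
struct console_limiter {
    unsigned int min_delay_cs;  /* console_min_delay_centisecs */
    unsigned int max_delay_cs;  /* console_max_delay_centisecs */
    unsigned int backoff;       /* console_backoff multiplier */
    unsigned int delay_cs;      /* current gap between printed messages */
    unsigned long next_cs;      /* earliest time the next message may print */
    int active;                 /* set once the first message has printed */
};

static void limiter_init(struct console_limiter *l, unsigned int min_cs,
                         unsigned int max_cs, unsigned int backoff)
{
    assert(min_cs <= max_cs && backoff >= 1);
    l->min_delay_cs = min_cs;
    l->max_delay_cs = max_cs;
    l->backoff = backoff;
    l->delay_cs = min_cs;
    l->next_cs = 0;
    l->active = 0;
}

/* Returns 1 if a message at time now_cs should reach the console, else 0. */
static int limiter_allow(struct console_limiter *l, unsigned long now_cs)
{
    if (now_cs < l->next_cs)
        return 0;                       /* inside the quiet period */

    if (!l->active) {
        l->active = 1;                  /* first message: start at min */
        l->delay_cs = l->min_delay_cs;
    } else if (now_cs - l->next_cs >= l->max_delay_cs) {
        l->delay_cs = l->min_delay_cs;  /* long silence: reset (assumed rule) */
    } else {
        unsigned int grown = l->delay_cs * l->backoff;

        /* Repeated message: widen the gap, capped at the maximum. */
        l->delay_cs = grown > l->max_delay_cs ? l->max_delay_cs : grown;
    }
    l->next_cs = now_cs + l->delay_cs;
    return 1;
}
```

With a minimum of 50, a maximum of 400, and a backoff of 2, a message repeated continuously would be printed at gaps of 50, 100, 200, and then 400 centiseconds.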
373 --------------------------------------------------------------------------------
375 2007-09-27 Cluster File Systems, Inc. <info@clusterfs.com>
377 * Support for networks:
378 socklnd - any kernel supported by Lustre,
379 qswlnd - Qsnet kernel modules 5.20 and later,
380 openiblnd - IbGold 1.8.2,
381 o2iblnd - OFED 1.1 and 1.2,
382 viblnd - Voltaire ibhost 3.4.5 and later,
383 ciblnd - Topspin 3.2.0,
384 iiblnd - Infiniserv 3.3 + PathBits patch,
385 gmlnd - GM 2.1.22 and later,
386 mxlnd - MX 1.2.1 or later,
387 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
391 Description: /proc/sys/lnet has non-sysctl entries
392 Details : Updating dump_kernel/daemon_file/debug_mb to use sysctl variables
396 Description: TOE Kernel panic by ksocklnd
397 Details : offloaded sockets provide their own implementation of sendpage,
398 can't call tcp_sendpage() directly
402 Description: kibnal_shutdown() doesn't finish; lconf --cleanup hangs
403 Details : races between lnd_shutdown and peer creation prevent
404 lnd_shutdown from finishing.
408 Description: open files rlimit 1024 reached while liblustre testing
409 Details : ulnds/socklnd must close open socket after unsuccessful
414 Description: build error
415 Details : fix typos in gmlnd, ptllnd and viblnd
417 --------------------------------------------------------------------------------
419 2007-07-30 Cluster File Systems, Inc. <info@clusterfs.com>
421 * Support for networks:
422 socklnd - kernels up to 2.6.16,
423 qswlnd - Qsnet kernel modules 5.20 and later,
424 openiblnd - IbGold 1.8.2,
425 o2iblnd - OFED 1.1 and 1.2
426 viblnd - Voltaire ibhost 3.4.5 and later,
427 ciblnd - Topspin 3.2.0,
428 iiblnd - Infiniserv 3.3 + PathBits patch,
429 gmlnd - GM 2.1.22 and later,
430 mxlnd - MX 1.2.1 or later,
431 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
433 --------------------------------------------------------------------------------
435 2007-06-21 Cluster File Systems, Inc. <info@clusterfs.com>
437 * Support for networks:
438 socklnd - kernels up to 2.6.16,
439 qswlnd - Qsnet kernel modules 5.20 and later,
440 openiblnd - IbGold 1.8.2,
442 viblnd - Voltaire ibhost 3.4.5 and later,
443 ciblnd - Topspin 3.2.0,
444 iiblnd - Infiniserv 3.3 + PathBits patch,
445 gmlnd - GM 2.1.22 and later,
446 mxlnd - MX 1.2.1 or later,
447 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
451 Description: Initialize cpumask before use
455 Description: ASSERTION failures when upgrading to the patchless zero-copy
457 Details : This bug affects "rolling upgrades", causing an inconsistent
458 protocol version negotiation and a subsequent assertion failure
459 after the first wave of upgrades.
463 Details : Change "dropped message" CERRORs to D_NETERROR so they are
464 logged instead of creating "console chatter" when a lustre
465 timeout races with normal RPC completion.
468 Details : lnet_clear_peer_table can wait forever if user forgets to
472 Details : libcfs_id2str should check pid against LNET_PID_ANY.
476 Description: added LNET self test
477 Details : landing b_self_test
482 Description: cfs_duration_{u,n}sec() wrongly calculate nanosecond part of
484 Details : do_div() macro is used incorrectly.
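The ChangeLog does not say how do_div() was misused. A common pitfall is its unusual contract: it divides its first argument in place and returns the remainder, not the quotient. The sketch below uses a userspace stand-in with that same contract (do_div_emul and split_nsec are hypothetical names) to split a nanosecond count into seconds and a nanosecond part.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_BILLION 1000000000u

/*
 * Userspace stand-in with the same contract as the kernel's do_div():
 * *n is replaced by the quotient and the remainder is returned.
 */
static uint32_t do_div_emul(uint64_t *n, uint32_t base)
{
    uint32_t rem;

    assert(base != 0);
    rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/* Split a nanosecond count into whole seconds plus leftover nanoseconds. */
static void split_nsec(uint64_t total_ns, uint64_t *sec, uint32_t *nsec)
{
    uint64_t q = total_ns;

    *nsec = do_div_emul(&q, ONE_BILLION);  /* remainder: nanosecond part */
    *sec = q;                              /* quotient: whole seconds */
}
```

Treating the return value as the quotient, or reading the dividend after the call as if it were unchanged, would produce a wrong nanosecond part of the kind described here.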
486 2007-04-23 Cluster File Systems, Inc. <info@clusterfs.com>
490 Description: make panic on lbug configurable
494 Description: Add OFED1.2 support to o2iblnd
495 Details : o2iblnd depends on OFED's modules. If out-of-tree OFED modules
496 are installed (rather than the kernel's in-tree infiniband), there
497 could be problems when loading o2iblnd with insmod (mismatched CRC of
499 If an extra Module.symvers is supported by the kernel (e.g. 2.6.17),
500 this link provides a solution:
501 https://bugs.openfabrics.org/show_bug.cgi?id=355
502 If an extra Module.symvers is not supported by the kernel, we
503 have to run the script in bug 12316 to update
504 $LINUX/module.symvers before building o2iblnd.
505 More details about this are in bug 12316.
507 ------------------------------------------------------------------------------
509 2007-04-01 Cluster File Systems, Inc. <info@clusterfs.com>
510 * version 1.4.10 / 1.6.0
511 * Support for networks:
512 socklnd - kernels up to 2.6.16,
513 qswlnd - Qsnet kernel modules 5.20 and later,
514 openiblnd - IbGold 1.8.2,
516 viblnd - Voltaire ibhost 3.4.5 and later,
517 ciblnd - Topspin 3.2.0,
518 iiblnd - Infiniserv 3.3 + PathBits patch,
519 gmlnd - GM 2.1.22 and later,
520 mxlnd - MX 1.2.1 or later,
521 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
525 Description: Ptllnd didn't init kptllnd_data.kptl_idle_txs before it could be
526 possibly accessed in kptllnd_shutdown. Ptllnd should init
527 kptllnd_data.kptl_ptlid2str_lock before calling kptllnd_ptlid2str.
531 Description: gmlnd ignored some transmit errors when finalizing lnet messages.
535 Description: ptllnd logs a piece of incorrect debug info in kptllnd_peer_handle_hello.
539 Description: the_lnet.ln_finalizing was not set when the current thread is
540 about to complete messages. It only affects multi-threaded
546 Description: Changed the default kqswlnd ntxmsg=512
551 Description: Assertion failure in kernel ptllnd caused by posting passive
552 bulk buffers before connection establishment complete.
557 Description: A race in kernel ptllnd between deleting a peer and posting
558 new communications for it could hang communications -
559 manifesting as "Unexpectedly long timeout" messages.
564 Description: Kernel ptllnd lock ordering issue could hang a node.
569 Description: node crash on socket teardown race
572 Frequency : 'lctl peer_list' issued on a mx net
574 Description: Enable lctl's peer_list for MXLND
577 Frequency : after Ptllnd timeouts and portals congestion
579 Description: Credit overflows
580 Details : This was a bug in ptllnd connection establishment. The fix
581 implements better peer stamps to disambiguate connection
582 establishment and ensure both peers enter the credit flow
583 state machine consistently.
588 Description: kptllnd didn't propagate some network errors up to LNET
589 Details : This bug was spotted while investigating 11394. The fix
590 ensures network errors on sends and bulk transfers are
591 propagated to LNET/lustre correctly.
593 Severity : enhancement
595 Description: Fixed console chatter in case of -ETIMEDOUT.
597 Severity : enhancement
599 Description: Added D_NETTRACE for recording network packet history
600 (initially only for ptllnd). Also a separate userspace
601 ptllnd facility to gather history which should really be
602 covered by D_NETTRACE too, if only CDEBUG recorded history in
608 Description: o2iblnd handle early RDMA_CM_EVENT_DISCONNECTED.
609 Details : If the fabric is lossy, an RDMA_CM_EVENT_DISCONNECTED
610 callback can occur before a connection has actually been
611 established. This caused an assertion failure previously.
613 Severity : enhancement
615 Description: Multiple instances for o2iblnd
616 Details : Allow multiple instances of o2iblnd to enable networking over
617 multiple HCAs and routing between them.
621 Description: lnet deadlock in router_checker
622 Details : turned ksnd_connd_lock, ksnd_reaper_lock, and ksock_net_t:ksnd_lock
623 into BH locks to eliminate potential deadlock caused by
624 ksocknal_data_ready() preempting code holding these locks.
628 Description: Millions of failed socklnd connection attempts cause a very slow FS
629 Details : added a new route flag, ksnr_scheduled, distinct from
630 ksnr_connecting, so that a peer connection request is only turned
631 down for race concerns when an active connection to the same peer
632 is in progress (instead of just being scheduled).
634 ------------------------------------------------------------------------------
636 2007-02-09 Cluster File Systems, Inc. <info@clusterfs.com>
638 * Support for networks:
639 socklnd - kernels up to 2.6.16
640 qswlnd - Qsnet kernel modules 5.20 and later
641 openiblnd - IbGold 1.8.2
643 viblnd - Voltaire ibhost 3.4.5 and later
644 ciblnd - Topspin 3.2.0
645 iiblnd - Infiniserv 3.3 + PathBits patch
646 gmlnd - GM 2.1.22 and later
647 mxlnd - MX 1.2.1 or later
648 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
651 Severity : major on XT3
653 Description: libcfs overwrites /proc/sys/portals
654 Details : libcfs created a symlink from /proc/sys/portals to
655 /proc/sys/lnet for backwards compatibility. This is no
656 longer required and makes the Cray portals /proc variables
661 Description: OFED FMR API change
662 Details : This changes parameter usage to reflect a change in
663 ib_fmr_pool_map_phys() between OFED 1.0 and OFED 1.1. Note
664 that FMR support is only used in experimental versions of the
665 o2iblnd - this change does not affect standard usage at all.
667 Severity : enhancement
669 Description: new ko2iblnd module parameter: ib_mtu
670 Details : the default IB MTU of 2048 performs badly on 23108 Tavor
671 HCAs. You can avoid this problem by setting the MTU to 1024
672 using this module parameter.
674 Severity : enhancement
675 Bugzilla : 11118/11620
676 Description: ptllnd small request message buffer alignment fix
677 Details : Set the PTL_MD_LOCAL_ALIGN8 option on small message receives.
678 Round up small message size on sends in case this option
679 is not supported. 11620 was a defect in the initial
680 implementation which effectively asserted all peers had to be
681 running the correct protocol version which was fixed by always
682 NAK-ing such requests and handling any misalignments they
687 Description: When kib(nal|lnd)_del_peer() is called upon a peer whose
688 ibp_tx_queue is not empty, kib(nal|lnd)_destroy_peer()'s
689 'LASSERT(list_empty(&peer->ibp_tx_queue))' will fail.
691 Severity : enhancement
693 Description: Patchless ZC(zero copy) socklnd
694 Details : New protocol for socklnd: socklnd can support zero copy without
695 a kernel patch, and it is compatible with the old socklnd. Checksum is
696 moved from tunables to modparams.
700 Description: When ksocknal_del_peer() is called upon a peer whose
701 ksnp_tx_queue is not empty, ksocknal_destroy_peer()'s
702 'LASSERT(list_empty(&peer->ksnp_tx_queue))' will fail.
705 Frequency : when ptlrpc is under heavy use and runs out of request buffer
707 Description: In lnet_match_blocked_msg(), md can be used without holding a
711 Frequency : very rarely
713 Description: If ksocknal_lib_setup_sock() fails, a ref on peer is lost.
714 If connd connects a route which has been closed by
715 ksocknal_shutdown(), ksocknal_create_routes() may create new
716 routes which hold references on the peer, causing shutdown
717 process to wait for peer to disappear forever.
719 Severity : enhancement
721 Description: Dump XT3 portals traces on kptllnd timeout
722 Details : Set the kptllnd module parameter "ptltrace_on_timeout=1" to
723 dump Cray portals debug traces to a file. The kptllnd module
724 parameter "ptltrace_basename", default "/tmp/lnet-ptltrace",
725 is the basename of the dump file.
728 Frequency : infrequent
730 Description: kernel ptllnd fix bug in connection re-establishment
731 Details : Kernel ptllnd could produce protocol errors e.g. illegal
732 matchbits and/or violate the credit flow protocol when trying
733 to re-establish a connection with a peer after an error or
736 Severity : enhancement
738 Description: Allow /proc/sys/lnet/debug to be set symbolically
739 Details : Allow debug and subsystem debug values to be read/set by name
740 in addition to numerically, for ease of use.
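Accepting a debug mask either by name or as a number can be sketched as follows. The name table and its bit values are invented for illustration (the real names and bits are defined in libcfs), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct debug_name {
    const char *name;
    unsigned int bit;
};

/* Hypothetical table: names and bit values are made up for this sketch. */
static const struct debug_name debug_names[] = {
    { "trace",    0x01 },
    { "inode",    0x02 },
    { "neterror", 0x04 },
    { "warning",  0x08 },
};

/* Look up one token of the given length; returns 0 on success, -1 if unknown. */
static int lookup_debug_bit(const char *tok, size_t len, unsigned int *bit)
{
    size_t i;

    for (i = 0; i < sizeof(debug_names) / sizeof(debug_names[0]); i++) {
        if (strlen(debug_names[i].name) == len &&
            strncmp(debug_names[i].name, tok, len) == 0) {
            *bit = debug_names[i].bit;
            return 0;
        }
    }
    return -1;
}

/*
 * Parse either a number ("0x5", "12") or space-separated names
 * ("trace neterror") into a bit mask. Returns 0 on success, -1 on error.
 */
static int parse_debug_mask(const char *str, unsigned int *mask)
{
    char *end;
    unsigned long num;
    unsigned int result = 0;

    assert(str != NULL && mask != NULL);

    /* Numeric form: the whole string must be consumed by strtoul(). */
    num = strtoul(str, &end, 0);
    if (end != str && *end == '\0') {
        *mask = (unsigned int)num;
        return 0;
    }

    /* Symbolic form: OR together the bit of every known name. */
    while (*str != '\0') {
        size_t len;
        unsigned int bit;

        if (*str == ' ') {
            str++;
            continue;
        }
        len = strcspn(str, " ");
        if (lookup_debug_bit(str, len, &bit) != 0)
            return -1;
        result |= bit;
        str += len;
    }
    *mask = result;
    return 0;
}
```

Under this table, "trace neterror" and "0x5" produce the same mask, while an unknown name makes the parser fail instead of being silently ignored.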
743 Frequency : only in configurations with LNET routers
745 Description: routes automatically marked down and recovered
746 Details : In configurations with LNET routers, if a router fails, routers
747 now actively try to recover routes that are down, unless they
748 are marked down by an administrator.
750 ------------------------------------------------------------------------------
752 2006-12-09 Cluster File Systems, Inc. <info@clusterfs.com>
755 Frequency : very rarely, in configurations with LNET routers and TCP
757 Description: incorrect data written to files on OSTs
758 Details : In certain high-load conditions incorrect data may be written
759 to files on the OST when using TCP networks.
761 ------------------------------------------------------------------------------
763 2006-07-31 Cluster File Systems, Inc. <info@clusterfs.com>
765 - rework CDEBUG messages rate-limiting mechanism b=10375
766 - add per-socket tunables for socklnd if the kernel is patched b=10327
768 ------------------------------------------------------------------------------
770 2006-02-15 Cluster File Systems, Inc. <info@clusterfs.com>
772 - fix use of portals/lnet pid to avoid dropping RPCs b=10074
773 - iiblnd wasn't mapping all memory, resulting in comms errors b=9776
774 - quiet LNET startup LNI message for liblustre b=10128
775 - Better console error messages if 'ip2nets' can't match an IP address
776 - Fixed overflow/use-before-set bugs in linux-time.h
777 - Fixed ptllnd bug that wasn't initialising rx descriptors completely
778 - LNET teardown failed an assertion about the route table being empty
779 - Fixed a crash in LNetEQPoll(<invalid handle>)
780 - Future protocol compatibility work (b_rls146_lnetprotovrsn)
781 - improve debug message for liblustre/Catamount nodes (b=10116)
783 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
784 * Configuration change for the XT3
785 The PTLLND is now used to run Lustre over Portals on the XT3.
786 The configure option(s) --with-cray-portals are no longer
787 used. Rather --with-portals=<path-to-portals-includes> is
788 used to enable building on the XT3. In addition, to enable
789 XT3-specific features, the option --enable-cray-xt3 must be
792 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
793 * Portals has been removed, replaced by LNET.
794 LNET is the new networking infrastructure for Lustre. It includes a
795 reorganized network configuration mode (see the user
796 documentation for full details) as well as support for routing
797 between different network fabrics. Lustre Networking Devices
798 (LNDs) for the supported network fabrics have also been created
799 for this new infrastructure.
801 2005-08-08 Cluster File Systems, Inc. <info@clusterfs.com>
806 Frequency : rare (large Voltaire clusters only)
808 Description: the default number of reserved transmit descriptors was too low
809 for some large clusters
810 Details : As a workaround, the number was increased. A proper fix includes
813 2005-06-02 Cluster File Systems, Inc. <info@clusterfs.com>
818 Frequency : occasional (large-scale events, cluster reboot, network failure)
820 Description: too many error messages on console obscure actual problem and
821 can slow down/panic server, or cause recovery to fail repeatedly
822 Details : enable rate-limiting of console error messages, and some messages
823 that were console errors now only go to the kernel log
825 Severity : enhancement
827 Description: add /proc/sys/portals/catastrophe entry which will report if
828 that node has previously LBUGged
830 2005-04-06 Cluster File Systems, Inc. <info@clusterfs.com>
832 - update gmnal to use PTL_MTU, fix module refcounting (b=5786)
834 2005-04-04 Cluster File Systems, Inc. <info@clusterfs.com>
836 - handle error return code in kranal_check_fma_rx() (5915,6054)
838 2005-02-04 Cluster File Systems, Inc. <info@clusterfs.com>
840 - update vibnal (Voltaire IB NAL)
841 - update gmnal (Myrinet NAL), gmnalid
843 2005-02-04 Eric Barton <eeb@bartonsoftware.com>
845 * Landed portals:b_port_step as follows...
847 - removed CFS_DECL_SPIN*
848 just use 'spinlock_t' and initialise with spin_lock_init()
850 - removed CFS_DECL_MUTEX*
851 just use 'struct semaphore' and initialise with init_mutex()
853 - removed CFS_DECL_RWSEM*
854 just use 'struct rw_semaphore' and initialise with init_rwsem()
856 - renamed cfs_sleep_chan -> cfs_waitq
857 cfs_sleep_link -> cfs_waitlink
859 - fixed race in linux version of arch-independent socknal
860 (the ENOMEM/EAGAIN decision).
862 - Didn't fix problems in Darwin version of arch-independent socknal
863 (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision)
865 - removed libcfs types from non-socknal header files (only some types
866 in the header files had been changed; the .c files hadn't been