1 tbd Sun Microsystems, Inc.
3 * Support for networks:
4 socklnd - any kernel supported by Lustre,
5 qswlnd - Qsnet kernel modules 5.20 and later,
6 openiblnd - IbGold 1.8.2,
7 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
8 viblnd - Voltaire ibhost 3.4.5 and later,
9 ciblnd - Topspin 3.2.0,
10 iiblnd - Infiniserv 3.3 + PathBits patch,
11 gmlnd - GM 2.1.22 and later,
12 mxlnd - MX 1.2.1 or later,
13 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
21 Bugzilla : 13621, 15983
22 Description: Protocol V2 of o2iblnd
23 Details : o2iblnd V2 has several new features:
24 . map-on-demand: map-on-demand is disabled by default; it can
25 be enabled with the modparam "map_on_demand=@value@". @value@
26 should be >= 0 and < 256; 0 disables map-on-demand, any other
27 valid value enables it.
28 O2iblnd will create FMR or physical MR for RDMA if fragments of
30 Enabling map-on-demand takes less memory for new connections,
31 but a little more CPU for RDMA.
32 . iWARP : to support iWARP, please enable map-on-demand; 32 and 64
33 are recommended values. iWARP will probably fail for values >= 128.
34 . OOB NOOP message: to resolve deadlock on router.
35 . tunable peer_credits_hiw (high water mark to return credits):
36 the default value of peer_credits_hiw equals (peer_credits - 1);
37 users can change it between peer_credits/2 and (peer_credits - 1).
38 A lower value is recommended for high-latency networks.
39 . tunable message queue size: it always equals peer_credits;
40 a higher value is recommended for high-latency networks.
41 . It is compatible with earlier versions of o2iblnd.
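
The value ranges quoted above for map_on_demand and peer_credits_hiw can be sketched as simple checks. This is an illustrative Python model, not o2iblnd code: both function names are hypothetical, integer division for peer_credits/2 is an assumption, and rejecting out-of-range values (rather than clamping them) is also an assumption.

```python
def map_on_demand_enabled(value):
    """Return True if this map_on_demand setting enables the feature.

    Valid values satisfy 0 <= value < 256; 0 disables map-on-demand,
    any other valid value enables it.
    """
    if not 0 <= value < 256:
        raise ValueError("map_on_demand must be >= 0 and < 256")
    return value != 0


def effective_peer_credits_hiw(peer_credits, hiw=None):
    """Return the high-water mark used for returning credits.

    The default is peer_credits - 1; an explicit value must lie between
    peer_credits / 2 (integer division assumed) and peer_credits - 1.
    """
    low, high = peer_credits // 2, peer_credits - 1
    if hiw is None:
        return high
    if not low <= hiw <= high:
        raise ValueError("peer_credits_hiw must be in [%d, %d]" % (low, high))
    return hiw
```

For example, with peer_credits of 8 the default high-water mark is 7 and the accepted range is 4 to 7.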
45 Description: Change ptllnd timeout and watchdog timers
46 Details : Add ptltrace_on_nal_failed and bump ptllnd timeout to match
51 Description: One down Lustre FS hangs ALL mounted Lustre filesystems
52 Details : Shared routing enhancements - peer health detection.
54 Severity : enhancement
56 Description: acceptor.c cleanup
57 Details : Code duplication in acceptor.c for the kernel and
58 user-space cases removed. User-space libcfs tcpip primitives
59 unified to have prototypes similar to the kernel ones. Minor
60 cosmetic changes in usocklnd to use cfs_socket_t as
61 the representation of a socket.
65 Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off
66 Details : See comment 46 in bug 11245 for details - it's indeed a bug
67 introduced by the original 11245 fix.
71 Description: uptllnd credit overflow fix
72 Details : kptl_msg_t::ptlm_credits could be overflown by uptllnd since
77 Description: socklnd protocol version 3
78 Details : With the current protocol V2, connections on a router can be
79 blocked and unable to receive any incoming messages when there are no
80 more router buffers, so ZC-ACK can't be handled (the LNet message
81 can't be finalized), which causes a deadlock on the router.
82 Protocol V3 has a dedicated connection for emergency messages
83 like ZC-ACK to the router; messages on this dedicated connection
84 don't need any credit, so they will never be blocked. Also, V3 can send
85 keepalive pings at a specified interval for router health checking.
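
The difference between credit-gated traffic and the credit-free emergency connection can be illustrated with a toy model. This is only a sketch of the idea described above, not socklnd code: the class, its fields, and the single shared send log are all hypothetical.

```python
from collections import deque


class PeerLink:
    """Toy model: normal sends consume a credit, emergency sends do not."""

    def __init__(self, credits):
        self.credits = credits
        self.blocked = deque()  # normal messages waiting for a credit
        self.sent = []          # (kind, message) pairs, in send order

    def send(self, msg, emergency=False):
        """Try to send msg; return True if it went out immediately."""
        if emergency:
            # Dedicated connection: no credit needed, so never blocked.
            self.sent.append(("emergency", msg))
            return True
        if self.credits == 0:
            # Out of credits: the message is queued instead of being sent.
            self.blocked.append(msg)
            return False
        self.credits -= 1
        self.sent.append(("normal", msg))
        return True
```

In this model a ZC-ACK sent with emergency=True still goes out after the credits are exhausted, which is the property that avoids the router deadlock.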
87 -------------------------------------------------------------------------------
89 12-31-2008 Sun Microsystems, Inc.
91 * Support for networks:
92 socklnd - any kernel supported by Lustre,
93 qswlnd - Qsnet kernel modules 5.20 and later,
94 openiblnd - IbGold 1.8.2,
95 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
96 viblnd - Voltaire ibhost 3.4.5 and later,
97 ciblnd - Topspin 3.2.0,
98 iiblnd - Infiniserv 3.3 + PathBits patch,
99 gmlnd - GM 2.1.22 and later,
100 mxlnd - MX 1.2.1 or later,
101 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
105 Description: workaround for OOM from o2iblnd
106 Details : OFED needs to allocate a big chunk of memory for the QP while creating
107 a connection for o2iblnd; OOM can happen if no such contiguous
109 QP size is determined by concurrent_sends and max_fragments of
110 o2iblnd; we now permit the user to specify a smaller value for
111 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
112 will decrease the memory block size required for creating the QP.
116 Description: Support Zerocopy receive of Chelsio device
117 Details : Chelsio driver can support zerocopy for iov[1] if it's
118 contiguous and large enough.
122 Description: fix credit flow deadlock in uptllnd
126 Description: finalize network operation in reasonable time
127 Details : conf-sanity test_32a couldn't stop ost and mds because it
128 tried to access a non-existent peer and the tcp connect took
129 quite a long time before timing out.
133 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
134 Details : Lost reference on conn prevents peer from being destroyed, which
135 could prevent new peer creation if peer count has reached upper
140 Description: LNET Selftest results in Soft lockup on OSS CPU
141 Details : only hits when 8 or more o2ib clients are involved and a session is
142 torn down with 'lst end_session' without a preceding 'lst stop'.
146 Description: concurrent_sends in IB LNDs should not be changeable at run time
147 Details : concurrent_sends in IB LNDs should not be changeable at run time
151 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
152 Details : only hits under out-of-memory situations
155 -------------------------------------------------------------------------------
157 2009-02-07 Sun Microsystems, Inc.
159 * Support for networks:
160 socklnd - any kernel supported by Lustre,
161 qswlnd - Qsnet kernel modules 5.20 and later,
162 openiblnd - IbGold 1.8.2,
163 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
164 viblnd - Voltaire ibhost 3.4.5 and later,
165 ciblnd - Topspin 3.2.0,
166 iiblnd - Infiniserv 3.3 + PathBits patch,
167 gmlnd - GM 2.1.22 and later,
168 mxlnd - MX 1.2.1 or later,
169 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
172 Description: workaround for OOM from o2iblnd
173 Details : OFED needs to allocate a big chunk of memory for the QP while creating
174 a connection for o2iblnd; OOM can happen if no such contiguous
176 QP size is determined by concurrent_sends and max_fragments of
177 o2iblnd; we now permit the user to specify a smaller value for
178 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
179 will decrease the memory block size required for creating the QP.
183 Description: Support Zerocopy receive of Chelsio device
184 Details : Chelsio driver can support zerocopy for iov[1] if it's
185 contiguous and large enough.
188 Description: fix credit flow deadlock in uptllnd
192 Description: finalize network operation in reasonable time
193 Details : conf-sanity test_32a couldn't stop ost and mds because it
194 tried to access a non-existent peer and the tcp connect took
195 quite a long time before timing out.
199 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
200 Details : Lost reference on conn prevents peer from being destroyed, which
201 could prevent new peer creation if peer count has reached upper
206 Description: LNET Selftest results in Soft lockup on OSS CPU
207 Details : only hits when 8 or more o2ib clients are involved and a session is
208 torn down with 'lst end_session' without a preceding 'lst stop'.
212 Description: concurrent_sends in IB LNDs should not be changeable at run time
213 Details : concurrent_sends in IB LNDs should not be changeable at run time
215 -------------------------------------------------------------------------------
217 11-03-2008 Sun Microsystems, Inc.
219 * Support for networks:
220 socklnd - any kernel supported by Lustre,
221 qswlnd - Qsnet kernel modules 5.20 and later,
222 openiblnd - IbGold 1.8.2,
223 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
224 viblnd - Voltaire ibhost 3.4.5 and later,
225 ciblnd - Topspin 3.2.0,
226 iiblnd - Infiniserv 3.3 + PathBits patch,
227 gmlnd - GM 2.1.22 and later,
228 mxlnd - MX 1.2.1 or later,
229 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
233 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
234 Details : only hits under out-of-memory situations
236 -------------------------------------------------------------------------------
238 04-26-2008 Sun Microsystems, Inc.
240 * Support for networks:
241 socklnd - any kernel supported by Lustre,
242 qswlnd - Qsnet kernel modules 5.20 and later,
243 openiblnd - IbGold 1.8.2,
244 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5
245 viblnd - Voltaire ibhost 3.4.5 and later,
246 ciblnd - Topspin 3.2.0,
247 iiblnd - Infiniserv 3.3 + PathBits patch,
248 gmlnd - GM 2.1.22 and later,
249 mxlnd - MX 1.2.1 or later,
250 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
254 Description: excessive debug information removed
255 Details : excessive debug information removed
259 Description: ksocknal_create_conn() hit ASSERTION during connection race
260 Details : ksocknal_create_conn() hit ASSERTION during connection race
264 Description: ksocknal_send_hello() hit ASSERTION while connecting race
265 Details : ksocknal_send_hello() hit ASSERTION while connecting race
269 Description: o2iblnd/ptllnd credit deadlock in a routed config.
270 Details : o2iblnd/ptllnd credit deadlock in a routed config.
274 Description: High load after starting lnet
275 Details : gmlnd should sleep in the rx thread in an interruptible way. Otherwise,
276 the uptime utility reports a confusingly high load.
280 Description: ksocklnd fails to establish connection if accept_port is high
281 Details : PID remapping must not be done for active (outgoing) connections
284 --------------------------------------------------------------------------------
286 2008-01-11 Sun Microsystems, Inc.
288 * Support for networks:
289 socklnd - any kernel supported by Lustre,
290 qswlnd - Qsnet kernel modules 5.20 and later,
291 openiblnd - IbGold 1.8.2,
292 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5
293 viblnd - Voltaire ibhost 3.4.5 and later,
294 ciblnd - Topspin 3.2.0,
295 iiblnd - Infiniserv 3.3 + PathBits patch,
296 gmlnd - GM 2.1.22 and later,
297 mxlnd - MX 1.2.1 or later,
298 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
302 Description: liblustre network error
303 Details : liblustre clients should understand LNET_ACCEPT_PORT environment
304 variable even if they don't start lnet acceptor.
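
A client-side lookup of this variable might look like the following sketch. It is not liblustre code: the function name is hypothetical, and the fallback port of 988 is an assumption rather than something stated in this entry.

```python
import os

DEFAULT_ACCEPT_PORT = 988  # assumed default; not stated in this changelog


def accept_port(environ=None):
    """Return the acceptor port, honouring LNET_ACCEPT_PORT when it is set."""
    environ = os.environ if environ is None else environ
    raw = environ.get("LNET_ACCEPT_PORT")
    if raw is None:
        return DEFAULT_ACCEPT_PORT
    port = int(raw)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError("LNET_ACCEPT_PORT out of range: %d" % port)
    return port
```

Passing a dictionary instead of the real environment makes the lookup easy to exercise without touching the process state.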
308 Description: Strange message from lnet (Ignoring prediction from the future)
309 Details : Incorrect calculation of peer's last_alive value in ksocklnd
311 --------------------------------------------------------------------------------
313 2007-12-07 Cluster File Systems, Inc. <info@clusterfs.com>
315 * Support for networks:
316 socklnd - any kernel supported by Lustre,
317 qswlnd - Qsnet kernel modules 5.20 and later,
318 openiblnd - IbGold 1.8.2,
319 o2iblnd - OFED 1.1 and 1.2.0, 1.2.5.
320 viblnd - Voltaire ibhost 3.4.5 and later,
321 ciblnd - Topspin 3.2.0,
322 iiblnd - Infiniserv 3.3 + PathBits patch,
323 gmlnd - GM 2.1.22 and later,
324 mxlnd - MX 1.2.1 or later,
325 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
329 Description: ASSERTION(me == md->md_me) failed in lnet_match_md()
333 Description: increase send queue size for ciblnd/openiblnd
337 Description: new userspace socklnd
338 Details : Old userspace tcpnal that resided in lnet/ulnds/socklnd replaced
339 with new one - usocklnd.
341 Severity : enhancement
343 Description: Console message flood
344 Details : Make cdls ratelimiting more tunable by adding several tunables in
345 procfs: /proc/sys/lnet/console_{min,max}_delay_centisecs and
346 /proc/sys/lnet/console_backoff.
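
One way three such tunables could interact is a multiplicative backoff clamped between the minimum and maximum delay. The sketch below is an assumption about the general technique, not a description of the libcfs implementation; the function name, the reset-to-minimum rule, and the numbers in the usage note are all hypothetical.

```python
def next_console_delay(delay, min_delay, max_delay, backoff, flooding):
    """Return the next suppression delay, in centiseconds.

    While messages keep flooding, the delay grows by the backoff factor;
    once they stop, it drops back to the minimum. The result is always
    clamped to the range [min_delay, max_delay].
    """
    if flooding:
        delay = delay * backoff
    else:
        delay = min_delay
    return max(min_delay, min(delay, max_delay))
```

With a minimum of 50, a maximum of 60000 and a backoff of 2, a flooding message would see delays of 100, 200, 400 and so on until the maximum is reached.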
348 --------------------------------------------------------------------------------
350 2007-09-27 Cluster File Systems, Inc. <info@clusterfs.com>
352 * Support for networks:
353 socklnd - any kernel supported by Lustre,
354 qswlnd - Qsnet kernel modules 5.20 and later,
355 openiblnd - IbGold 1.8.2,
356 o2iblnd - OFED 1.1 and 1.2,
357 viblnd - Voltaire ibhost 3.4.5 and later,
358 ciblnd - Topspin 3.2.0,
359 iiblnd - Infiniserv 3.3 + PathBits patch,
360 gmlnd - GM 2.1.22 and later,
361 mxlnd - MX 1.2.1 or later,
362 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
366 Description: /proc/sys/lnet has non-sysctl entries
367 Details : Updating dump_kernel/daemon_file/debug_mb to use sysctl variables
371 Description: TOE Kernel panic by ksocklnd
372 Details : offloaded sockets provide their own implementation of sendpage,
373 can't call tcp_sendpage() directly
377 Description: kibnal_shutdown() doesn't finish; lconf --cleanup hangs
378 Details : races between lnd_shutdown and peer creation prevent
379 lnd_shutdown from finishing.
383 Description: open files rlimit 1024 reached while liblustre testing
384 Details : ulnds/socklnd must close open socket after unsuccessful
389 Description: build error
390 Details : fix typos in gmlnd, ptllnd and viblnd
392 --------------------------------------------------------------------------------
394 2007-07-30 Cluster File Systems, Inc. <info@clusterfs.com>
396 * Support for networks:
397 socklnd - kernels up to 2.6.16,
398 qswlnd - Qsnet kernel modules 5.20 and later,
399 openiblnd - IbGold 1.8.2,
400 o2iblnd - OFED 1.1 and 1.2
401 viblnd - Voltaire ibhost 3.4.5 and later,
402 ciblnd - Topspin 3.2.0,
403 iiblnd - Infiniserv 3.3 + PathBits patch,
404 gmlnd - GM 2.1.22 and later,
405 mxlnd - MX 1.2.1 or later,
406 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
408 --------------------------------------------------------------------------------
410 2007-06-21 Cluster File Systems, Inc. <info@clusterfs.com>
412 * Support for networks:
413 socklnd - kernels up to 2.6.16,
414 qswlnd - Qsnet kernel modules 5.20 and later,
415 openiblnd - IbGold 1.8.2,
417 viblnd - Voltaire ibhost 3.4.5 and later,
418 ciblnd - Topspin 3.2.0,
419 iiblnd - Infiniserv 3.3 + PathBits patch,
420 gmlnd - GM 2.1.22 and later,
421 mxlnd - MX 1.2.1 or later,
422 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
426 Description: Initialize cpumask before use
430 Description: ASSERTION failures when upgrading to the patchless zero-copy
432 Details : This bug affects "rolling upgrades", causing an inconsistent
433 protocol version negotiation and subsequent assertion failure
434 during rolling upgrades after the first wave of upgrades.
438 Details : Change "dropped message" CERRORs to D_NETERROR so they are
439 logged instead of creating "console chatter" when a lustre
440 timeout races with normal RPC completion.
443 Details : lnet_clear_peer_table can wait forever if user forgets to
447 Details : libcfs_id2str should check pid against LNET_PID_ANY.
451 Description: added LNET self test
452 Details : landing b_self_test
457 Description: cfs_duration_{u,n}sec() wrongly calculate nanosecond part of
459 Details : do_div() macro is used incorrectly.
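
In the Linux kernel, do_div(n, base) divides n in place, leaving the quotient in n and returning the remainder, so mixing up the two results gives a wrong sub-second part. The changelog does not say what the misuse was; the sketch below only models the macro's semantics in Python and shows one correct way to split a tick count into seconds and nanoseconds. The helper names and the hz parameter are hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000


def do_div(n, base):
    """Model of the kernel macro: return (quotient, remainder).

    In C the quotient replaces n and the remainder is the return value.
    """
    return n // base, n % base


def duration_sec_nsec(ticks, hz):
    """Split a duration in ticks into whole seconds and nanoseconds."""
    secs, rem_ticks = do_div(ticks, hz)
    # Scale only the leftover ticks, so the result stays below one second.
    nsec, _ = do_div(rem_ticks * NSEC_PER_SEC, hz)
    return secs, nsec
```

The nanosecond part is computed from the remainder of the first division, never from the quotient.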
461 2007-04-23 Cluster File Systems, Inc. <info@clusterfs.com>
465 Description: make panic on lbug configurable
469 Description: Add OFED1.2 support to o2iblnd
470 Details : o2iblnd depends on OFED's modules; if out-of-tree OFED modules
471 are installed (rather than the kernel's in-tree infiniband), there
472 could be problems when running insmod on o2iblnd (mismatched CRC of
474 If an extra Module.symvers is supported by the kernel (i.e., 2.6.17),
475 this link provides a solution:
476 https://bugs.openfabrics.org/show_bug.cgi?id=355
477 If an extra Module.symvers is not supported by the kernel, we will
478 have to run the script in bug 12316 to update
479 $LINUX/module.symvers before building o2iblnd.
480 More details about this are in bug 12316.
482 ------------------------------------------------------------------------------
484 2007-04-01 Cluster File Systems, Inc. <info@clusterfs.com>
485 * version 1.4.10 / 1.6.0
486 * Support for networks:
487 socklnd - kernels up to 2.6.16,
488 qswlnd - Qsnet kernel modules 5.20 and later,
489 openiblnd - IbGold 1.8.2,
491 viblnd - Voltaire ibhost 3.4.5 and later,
492 ciblnd - Topspin 3.2.0,
493 iiblnd - Infiniserv 3.3 + PathBits patch,
494 gmlnd - GM 2.1.22 and later,
495 mxlnd - MX 1.2.1 or later,
496 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
500 Description: Ptllnd didn't init kptllnd_data.kptl_idle_txs before it could be
501 possibly accessed in kptllnd_shutdown. Ptllnd should init
502 kptllnd_data.kptl_ptlid2str_lock before calling kptllnd_ptlid2str.
506 Description: gmlnd ignored some transmit errors when finalizing lnet messages.
510 Description: ptllnd logs a piece of incorrect debug info in kptllnd_peer_handle_hello.
514 Description: the_lnet.ln_finalizing was not set when the current thread is
515 about to complete messages. It only affects multi-threaded
521 Description: Changed the default kqswlnd ntxmsg=512
526 Description: Assertion failure in kernel ptllnd caused by posting passive
527 bulk buffers before connection establishment complete.
532 Description: A race in kernel ptllnd between deleting a peer and posting
533 new communications for it could hang communications -
534 manifesting as "Unexpectedly long timeout" messages.
539 Description: Kernel ptllnd lock ordering issue could hang a node.
544 Description: node crash on socket teardown race
547 Frequency : 'lctl peer_list' issued on a mx net
549 Description: Enable lctl's peer_list for MXLND
552 Frequency : after Ptllnd timeouts and portals congestion
554 Description: Credit overflows
555 Details : This was a bug in ptllnd connection establishment. The fix
556 implements better peer stamps to disambiguate connection
557 establishment and ensure both peers enter the credit flow
558 state machine consistently.
563 Description: kptllnd didn't propagate some network errors up to LNET
564 Details : This bug was spotted while investigating 11394. The fix
565 ensures network errors on sends and bulk transfers are
566 propagated to LNET/lustre correctly.
568 Severity : enhancement
570 Description: Fixed console chatter in case of -ETIMEDOUT.
572 Severity : enhancement
574 Description: Added D_NETTRACE for recording network packet history
575 (initially only for ptllnd). Also a separate userspace
576 ptllnd facility to gather history which should really be
577 covered by D_NETTRACE too, if only CDEBUG recorded history in
583 Description: o2iblnd handle early RDMA_CM_EVENT_DISCONNECTED.
584 Details : If the fabric is lossy, an RDMA_CM_EVENT_DISCONNECTED
585 callback can occur before a connection has actually been
586 established. This caused an assertion failure previously.
588 Severity : enhancement
590 Description: Multiple instances for o2iblnd
591 Details : Allow multiple instances of o2iblnd to enable networking over
592 multiple HCAs and routing between them.
596 Description: lnet deadlock in router_checker
597 Details : turned ksnd_connd_lock, ksnd_reaper_lock, and ksock_net_t:ksnd_lock
598 into BH locks to eliminate potential deadlock caused by
599 ksocknal_data_ready() preempting code holding these locks.
603 Description: Millions of failed socklnd connection attempts cause a very slow FS
604 Details : added a new route flag ksnr_scheduled to distinguish from
605 ksnr_connecting, so that a peer connection request is only turned
606 down for race concerns when an active connection to the same peer
607 is in progress (instead of just being scheduled).
609 ------------------------------------------------------------------------------
611 2007-02-09 Cluster File Systems, Inc. <info@clusterfs.com>
613 * Support for networks:
614 socklnd - kernels up to 2.6.16
615 qswlnd - Qsnet kernel modules 5.20 and later
616 openiblnd - IbGold 1.8.2
618 viblnd - Voltaire ibhost 3.4.5 and later
619 ciblnd - Topspin 3.2.0
620 iiblnd - Infiniserv 3.3 + PathBits patch
621 gmlnd - GM 2.1.22 and later
622 mxlnd - MX 1.2.1 or later
623 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
626 Severity : major on XT3
628 Description: libcfs overwrites /proc/sys/portals
629 Details : libcfs created a symlink from /proc/sys/portals to
630 /proc/sys/lnet for backwards compatibility. This is no
631 longer required and makes the Cray portals /proc variables
636 Description: OFED FMR API change
637 Details : This changes parameter usage to reflect a change in
638 ib_fmr_pool_map_phys() between OFED 1.0 and OFED 1.1. Note
639 that FMR support is only used in experimental versions of the
640 o2iblnd - this change does not affect standard usage at all.
642 Severity : enhancement
644 Description: new ko2iblnd module parameter: ib_mtu
645 Details : the default IB MTU of 2048 performs badly on 23108 Tavor
646 HCAs. You can avoid this problem by setting the MTU to 1024
647 using this module parameter.
649 Severity : enhancement
650 Bugzilla : 11118/11620
651 Description: ptllnd small request message buffer alignment fix
652 Details : Set the PTL_MD_LOCAL_ALIGN8 option on small message receives.
653 Round up small message size on sends in case this option
654 is not supported. 11620 was a defect in the initial
655 implementation which effectively asserted all peers had to be
656 running the correct protocol version which was fixed by always
657 NAK-ing such requests and handling any misalignments they
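
Rounding a small message size up to an 8-byte boundary, as described for sends in the entry above, is commonly done with a mask. A minimal sketch: the helper name is hypothetical, and the 8-byte boundary is inferred from the option name PTL_MD_LOCAL_ALIGN8.

```python
ALIGN = 8  # boundary inferred from the PTL_MD_LOCAL_ALIGN8 option name


def round_up_to_align(nbytes):
    """Round a non-negative size up to the next multiple of ALIGN."""
    if nbytes < 0:
        raise ValueError("size must be non-negative")
    # Add ALIGN - 1, then clear the low bits to land on a multiple of ALIGN.
    return (nbytes + ALIGN - 1) & ~(ALIGN - 1)
```

Sizes that are already aligned are returned unchanged, so applying the helper twice has no further effect.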
662 Description: When kib(nal|lnd)_del_peer() is called upon a peer whose
663 ibp_tx_queue is not empty, kib(nal|lnd)_destroy_peer()'s
664 'LASSERT(list_empty(&peer->ibp_tx_queue))' will fail.
666 Severity : enhancement
668 Description: Patchless ZC(zero copy) socklnd
669 Details : New protocol for socklnd; socklnd can support zero copy without
670 a kernel patch, and it is compatible with the old socklnd. Checksum is
671 moved from tunables to modparams.
675 Description: When ksocknal_del_peer() is called upon a peer whose
676 ksnp_tx_queue is not empty, ksocknal_destroy_peer()'s
677 'LASSERT(list_empty(&peer->ksnp_tx_queue))' will fail.
680 Frequency : when ptlrpc is under heavy use and runs out of request buffer
682 Description: In lnet_match_blocked_msg(), md can be used without holding a
686 Frequency : very rarely
688 Description: If ksocknal_lib_setup_sock() fails, a ref on peer is lost.
689 If connd connects a route which has been closed by
690 ksocknal_shutdown(), ksocknal_create_routes() may create new
691 routes which hold references on the peer, causing shutdown
692 process to wait for peer to disappear forever.
694 Severity : enhancement
696 Description: Dump XT3 portals traces on kptllnd timeout
697 Details : Set the kptllnd module parameter "ptltrace_on_timeout=1" to
698 dump Cray portals debug traces to a file. The kptllnd module
699 parameter "ptltrace_basename", default "/tmp/lnet-ptltrace",
700 is the basename of the dump file.
703 Frequency : infrequent
705 Description: kernel ptllnd fix bug in connection re-establishment
706 Details : Kernel ptllnd could produce protocol errors e.g. illegal
707 matchbits and/or violate the credit flow protocol when trying
708 to re-establish a connection with a peer after an error or
711 Severity : enhancement
713 Description: Allow /proc/sys/lnet/debug to be set symbolically
714 Details : Allow debug and subsystem debug values to be read/set by name
715 in addition to numerically, for ease of use.
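
Accepting a mask either numerically or by name can be sketched as below. This is not the libcfs parser: the function names are hypothetical, the flag names are borrowed from D_NETERROR and D_NETTRACE mentioned elsewhere in this file, and the bit values are made up for illustration.

```python
# Hypothetical name-to-bit table; the real bit values are not given here.
DEBUG_BITS = {"neterror": 0x1, "nettrace": 0x2}


def parse_debug(text):
    """Parse a debug mask given as a number or as space-separated names."""
    text = text.strip()
    try:
        return int(text, 0)  # accepts decimal and 0x-prefixed hex
    except ValueError:
        pass
    mask = 0
    for name in text.split():
        if name not in DEBUG_BITS:
            raise ValueError("unknown debug flag: %s" % name)
        mask |= DEBUG_BITS[name]
    return mask


def format_debug(mask):
    """Render the known bits of a mask as space-separated names."""
    return " ".join(name for name, bit in DEBUG_BITS.items() if mask & bit)
```

Reading back a mask with format_debug gives the same names that parse_debug accepts, so the two forms round-trip for known bits.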
718 Frequency : only in configurations with LNET routers
720 Description: routes automatically marked down and recovered
721 Details : In configurations with LNET routers, if a router fails, routers
722 now actively try to recover routes that are down, unless they
723 are marked down by an administrator.
725 ------------------------------------------------------------------------------
727 2006-12-09 Cluster File Systems, Inc. <info@clusterfs.com>
730 Frequency : very rarely, in configurations with LNET routers and TCP
732 Description: incorrect data written to files on OSTs
733 Details : In certain high-load conditions incorrect data may be written
734 to files on the OST when using TCP networks.
736 ------------------------------------------------------------------------------
738 2006-07-31 Cluster File Systems, Inc. <info@clusterfs.com>
740 - rework CDEBUG messages rate-limiting mechanism b=10375
741 - add per-socket tunables for socklnd if the kernel is patched b=10327
743 ------------------------------------------------------------------------------
745 2006-02-15 Cluster File Systems, Inc. <info@clusterfs.com>
747 - fix use of portals/lnet pid to avoid dropping RPCs b=10074
748 - iiblnd wasn't mapping all memory, resulting in comms errors b=9776
749 - quiet LNET startup LNI message for liblustre b=10128
750 - Better console error messages if 'ip2nets' can't match an IP address
751 - Fixed overflow/use-before-set bugs in linux-time.h
752 - Fixed ptllnd bug that wasn't initialising rx descriptors completely
753 - LNET teardown failed an assertion about the route table being empty
754 - Fixed a crash in LNetEQPoll(<invalid handle>)
755 - Future protocol compatibility work (b_rls146_lnetprotovrsn)
756 - improve debug message for liblustre/Catamount nodes (b=10116)
758 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
759 * Configuration change for the XT3
760 The PTLLND is now used to run Lustre over Portals on the XT3.
761 The configure option(s) --with-cray-portals are no longer
762 used. Rather --with-portals=<path-to-portals-includes> is
763 used to enable building on the XT3. In addition, to enable
764 XT3-specific features, the option --enable-cray-xt3 must be
767 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
768 * Portals has been removed, replaced by LNET.
769 LNET is the new networking infrastructure for Lustre; it includes a
770 reorganized network configuration mode (see the user
771 documentation for full details) as well as support for routing
772 between different network fabrics. Lustre Networking Devices
773 (LNDs) for the supported network fabrics have also been created
774 for this new infrastructure.
776 2005-08-08 Cluster File Systems, Inc. <info@clusterfs.com>
781 Frequency : rare (large Voltaire clusters only)
783 Description: the default number of reserved transmit descriptors was too low
784 for some large clusters
785 Details : As a workaround, the number was increased. A proper fix includes
788 2005-06-02 Cluster File Systems, Inc. <info@clusterfs.com>
793 Frequency : occasional (large-scale events, cluster reboot, network failure)
795 Description: too many error messages on console obscure actual problem and
796 can slow down/panic server, or cause recovery to fail repeatedly
797 Details : enable rate-limiting of console error messages, and some messages
798 that were console errors now only go to the kernel log
800 Severity : enhancement
802 Description: add /proc/sys/portals/catastrophe entry which will report if
803 that node has previously LBUGged
805 2005-04-06 Cluster File Systems, Inc. <info@clusterfs.com>
807 - update gmnal to use PTL_MTU, fix module refcounting (b=5786)
809 2005-04-04 Cluster File Systems, Inc. <info@clusterfs.com>
811 - handle error return code in kranal_check_fma_rx() (5915,6054)
813 2005-02-04 Cluster File Systems, Inc. <info@clusterfs.com>
815 - update vibnal (Voltaire IB NAL)
816 - update gmnal (Myrinet NAL), gmnalid
818 2005-02-04 Eric Barton <eeb@bartonsoftware.com>
820 * Landed portals:b_port_step as follows...
822 - removed CFS_DECL_SPIN*
823 just use 'spinlock_t' and initialise with spin_lock_init()
825 - removed CFS_DECL_MUTEX*
826 just use 'struct semaphore' and initialise with init_mutex()
828 - removed CFS_DECL_RWSEM*
829 just use 'struct rw_semaphore' and initialise with init_rwsem()
831 - renamed cfs_sleep_chan -> cfs_waitq
832 cfs_sleep_link -> cfs_waitlink
834 - fixed race in linux version of arch-independent socknal
835 (the ENOMEM/EAGAIN decision).
837 - Didn't fix problems in Darwin version of arch-independent socknal
838 (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision)
840 - removed libcfs types from non-socknal header files (only some types
841 in the header files had been changed; the .c files hadn't been