1 18-01-2021 Whamcloud, Inc.
3 * Support for networks:
4 socklnd - any kernel supported by Lustre
5 o2iblnd - OFED from any kernel supported by Lustre
6 o2iblnd - MLNX_OFED 4.5-1.0.1.0, MLNX_OFED 4.6-1.0.1.1
7 MLNX_OFED 4.7-1.0.0.1, MLNX_OFED 4.7-3.2.9.0
8 MLNX_OFED 4.9-0.1.7.0, MLNX_OFED 4.9-2.2.4.0
9 MLNX_OFED 5.0-2.1.8.0, MLNX_OFED 5.1-0.6.6.0
10 MLNX_OFED 5.1-2.3.7.1, MLNX_OFED 5.1-2.5.8.0
14 MR Routing: This feature aligns the LNET Multi-Rail behavior
15 with routing. A gateway is now viewed as a Multi-Rail
16 capable node. When a route is added, only one entry
17 per gateway should be used. That route entry should
18 be added using a reachable NID of the gateway.
19 The Multi-Rail selection algorithm is then run when
20 sending to the gateway to select the best interface
21 to send to. The gateway aliveness is now kept via
24 -------------------------------------------------------------------------------
26 11-05-2018 Whamcloud, Inc.
28 * Support for networks:
29 socklnd - any kernel supported by Lustre
30 o2iblnd - OFED from any kernel supported by Lustre
31 o2iblnd - MLNX_OFED 3.4-2.0.0.0, MLNX_OFED 4.0-1.0.1.0
32 MLNX_OFED 4.0-2.0.0.1, MLNX_OFED 4.1-1.0.2.0
33 MLNX_OFED 4.2-1.0.0.0, MLNX_OFED 4.2-1.2.0.0
34 MLNX_OFED 4.3-1.0.1.0, MLNX_OFED 4.4-1.0.0.0
37 LNet Health: Keep track of network interface health and select
42 * Support for networks:
43 socklnd - any kernel supported by Lustre,
44 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
45 mxlnd - MX 1.2.10 or later
46 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
48 -------------------------------------------------------------------------------
50 09-30-2011 Whamcloud, Inc.
52 * Support for networks:
53 socklnd - any kernel supported by Lustre,
54 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
55 * Available but unsupported:
56 mxlnd - MX 1.2.10 or later
57 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
59 -------------------------------------------------------------------------------
61 2010-07-15 Oracle, Inc.
63 * Support for networks:
64 socklnd - any kernel supported by Lustre,
65 qswlnd - Qsnet kernel modules 5.20 and later,
66 openiblnd - IbGold 1.8.2,
67 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, and 1.4.1
68 viblnd - Voltaire ibhost 3.4.5 and later,
69 ciblnd - Topspin 3.2.0,
70 iiblnd - Infiniserv 3.3 + PathBits patch,
71 gmlnd - GM 2.1.22 and later,
72 mxlnd - MX 1.2.10 or later,
73 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
77 Description: should update lp_alive for non-router peers
79 Severity : enhancement
81 Description: LNet router shuffler.
83 Severity : enhancement
85 Description: LNet fine grain routing support.
89 Description: router checker stops working when system wall clock goes backward
90 Details : use monotonic timing source instead of system wall clock time.
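The point of the fix is that elapsed-time checks must not depend on a clock that can be stepped backward. A minimal userspace sketch, assuming POSIX clock_gettime() with CLOCK_MONOTONIC; router_ping_due() is a hypothetical helper, not the LNet router checker's code. With wall-clock timestamps, stepping the clock back makes now - last_ping negative, so a check like this one never fires until the clock catches up again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Whole seconds from CLOCK_MONOTONIC. Unlike CLOCK_REALTIME (the
 * system wall clock), this clock does not jump backward when an
 * administrator or NTP steps the system time. */
static int64_t monotonic_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec;
}

/* Hypothetical router-checker helper: a ping is due once 'interval'
 * seconds have elapsed since 'last_ping'. Both timestamps must come
 * from the same monotonic source. */
static bool router_ping_due(int64_t last_ping, int64_t interval, int64_t now)
{
	return now - last_ping >= interval;
}
```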
92 Severity : enhancement
94 Description: avoid asymmetrical router failures
96 Severity : enhancement
98 Description: multiple-instance support for kptllnd
102 Description: ksocknal_close_conn_locked connection race
103 Details : A race was possible when ksocknal_create_conn calls
104 ksocknal_close_conn_locked for an already closed conn.
108 Description: router_proc.c rewritten to use the sysctl interface for parameters
109 residing in /proc/sys/lnet
111 Severity : enhancement
113 Description: port router pinger to userspace
117 Description: kptllnd HELLO protocol deadlock
118 Details : kptllnd HELLO protocol doesn't run to completion in finite time
122 Description: LNet selftest fixes and enhancements
124 Severity : enhancement
126 Description: allow a test node to be a member of multiple test groups
128 Severity : enhancement
130 Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution
131 Details : an update from the upstream developer Scott Atchley.
133 Severity : enhancement
135 Description: add a new LND option to control peer buffer credits on routers
139 Description: Fixing deadlock in usocklnd
140 Details : A deadlock was possible in usocklnd due to a race condition while
141 tearing a connection down. The problem resulted from the erroneous
142 assumption that lnet_finalize() could be called while holding
143 some LND-level locks.
146 Bugzilla : 13621, 15983
147 Description: Protocol V2 of o2iblnd
148 Details : o2iblnd V2 has several new features:
149 . map-on-demand: map-on-demand is disabled by default; it can
150 be enabled with the modparam "map_on_demand=@value@", where @value@
151 should be >= 0 and < 256; 0 disables map-on-demand, and any other
152 valid value enables it.
153 o2iblnd will create an FMR or physical MR for RDMA if fragments of
155 Enabling map-on-demand takes less memory for a new connection,
156 but a little more CPU for RDMA.
157 . iWARP: to support iWARP, enable map-on-demand; 32 and 64
158 are the recommended values. iWARP will probably fail for values >= 128.
159 . OOB NOOP message: resolves a deadlock on routers.
160 . tunable peer_credits_hiw (high-water mark for returning credits):
161 the default value of peer_credits_hiw is (peer_credits - 1);
162 users can set it between peer_credits/2 and (peer_credits - 1).
163 A lower value is recommended for high-latency networks.
164 . tunable message queue size: it always equals peer_credits;
165 a higher value is recommended for high-latency networks.
166 . It is compatible with earlier versions of o2iblnd.
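The map_on_demand and peer_credits_hiw rules above can be written down as two small checks. This is a sketch only: the function names are hypothetical, and both clamping an out-of-range peer_credits_hiw into [peer_credits/2, peer_credits - 1] (rather than rejecting it) and treating 0 as "use the default" are assumptions, not documented o2iblnd behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* map_on_demand must be >= 0 and < 256. */
static bool map_on_demand_valid(int value)
{
	return value >= 0 && value < 256;
}

/* 0 disables map-on-demand; any other valid value enables it. */
static bool map_on_demand_enabled(int value)
{
	return map_on_demand_valid(value) && value != 0;
}

/* Hypothetical: effective peer_credits_hiw for a given peer_credits.
 * Default is (peer_credits - 1); the accepted range is
 * [peer_credits / 2, peer_credits - 1]. In this sketch 0 means
 * "use the default" and out-of-range values are clamped. */
static int peer_credits_hiw_effective(int peer_credits, int hiw)
{
	int lo = peer_credits / 2;
	int hi = peer_credits - 1;

	if (hiw == 0)
		return hi;
	if (hiw < lo)
		return lo;
	if (hiw > hi)
		return hi;
	return hiw;
}
```

For example, with peer_credits=8 the sketch accepts peer_credits_hiw values from 4 to 7.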
170 Description: Fixing 'running out of ports' issue
171 Details : Add a delay before the next reconnect attempt in ksocklnd in
172 the case of a lost race. Limit the frequency of query requests
173 in lnet. Improved handling of 'dead peer' notifications in
178 Description: Change ptllnd timeout and watchdog timers
179 Details : Add ptltrace_on_nal_failed and bump ptllnd timeout to match
180 Portals wire timeout.
184 Description: One down Lustre FS hangs ALL mounted Lustre filesystems
185 Details : Shared routing enhancements - peer health detection.
187 Severity : enhancement
189 Description: acceptor.c cleanup
190 Details : Removed code duplication in acceptor.c between the kernel and
191 user-space cases. User-space libcfs tcpip primitives were made
192 uniform, with prototypes similar to the kernel ones. Minor
193 cosmetic changes in usocklnd to use cfs_socket_t as the
194 representation of a socket.
198 Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off
199 Details : See comment 46 in bug 11245 for details - it's indeed a bug
200 introduced by the original 11245 fix.
204 Description: uptllnd credit overflow fix
205 Details : kptl_msg_t::ptlm_credits could be overflowed by uptllnd since
210 Description: socklnd protocol version 3
211 Details : With the current protocol V2, connections on a router can be
212 blocked and unable to receive any incoming messages when there are no
213 more router buffers, so a ZC-ACK can't be handled (the LNet message
214 can't be finalized), which causes a deadlock on the router.
215 Protocol V3 has a dedicated connection for emergency messages
216 like ZC-ACK to the router; messages on this dedicated connection
217 don't need any credits, so they are never blocked. V3 can also send
218 keepalive pings at a specified period for router health checking.
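A simplified sketch of why the dedicated V3 connection breaks the deadlock, assuming a sender-side model in which ordinary messages need a router buffer credit and ZC-ACKs do not. The enum and function names are hypothetical and do not correspond to ksocklnd's actual connection types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message and connection classes for illustration. */
enum sketch_msg { SKETCH_MSG_DATA, SKETCH_MSG_ZC_ACK };
enum sketch_conn { SKETCH_CONN_BULK, SKETCH_CONN_CONTROL };

/* Emergency messages such as ZC-ACK use the dedicated connection;
 * everything else uses the normal one. */
static enum sketch_conn choose_conn(enum sketch_msg kind)
{
	return kind == SKETCH_MSG_ZC_ACK ? SKETCH_CONN_CONTROL
					 : SKETCH_CONN_BULK;
}

/* Messages on the dedicated connection need no credit, so they can
 * still be sent when 'credits' has dropped to zero. */
static bool can_send_now(enum sketch_msg kind, int credits)
{
	if (choose_conn(kind) == SKETCH_CONN_CONTROL)
		return true;
	return credits > 0;
}
```

Under V2 the ZC-ACK would have shared the blocked path and waited like SKETCH_MSG_DATA does here, which is the deadlock the entry describes.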
220 -------------------------------------------------------------------------------
222 12-31-2008 Sun Microsystems, Inc.
224 * Support for networks:
225 socklnd - any kernel supported by Lustre,
226 qswlnd - Qsnet kernel modules 5.20 and later,
227 openiblnd - IbGold 1.8.2,
228 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
229 viblnd - Voltaire ibhost 3.4.5 and later,
230 ciblnd - Topspin 3.2.0,
231 iiblnd - Infiniserv 3.3 + PathBits patch,
232 gmlnd - GM 2.1.22 and later,
233 mxlnd - MX 1.2.1 or later,
234 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
238 Description: workaround for OOM from o2iblnd
239 Details : OFED needs to allocate a big chunk of memory for the QP while creating
240 a connection for o2iblnd; OOM can happen if no such contiguous
242 QP size is determined by concurrent_sends and max_fragments of
243 o2iblnd; users may now specify a smaller value for
244 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
245 decreases the memory block size required to create the QP.
249 Description: Support Zerocopy receive of Chelsio device
250 Details : Chelsio driver can support zerocopy for iov[1] if it's
251 contiguous and large enough.
255 Description: fix credit flow deadlock in uptllnd
259 Description: finalize network operation in reasonable time
260 Details : conf-sanity test_32a couldn't stop ost and mds because it
261 tried to access a non-existent peer and the TCP connect took
262 quite long before timing out.
266 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
267 Details : A lost reference on a conn prevents the peer from being destroyed, which
268 could prevent new peer creation if the peer count has reached the upper
273 Description: LNET Selftest results in Soft lockup on OSS CPU
274 Details : only hits when 8 or more o2ib clients are involved and a session is
275 torn down with 'lst end_session' without a preceding 'lst stop'.
279 Description: concurrent_sends in IB LNDs should not be changeable at run time
280 Details : concurrent_sends in IB LNDs should not be changeable at run time
284 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
285 Details : only hits under out-of-memory situations
288 -------------------------------------------------------------------------------
290 2009-02-07 Sun Microsystems, Inc.
292 * Support for networks:
293 socklnd - any kernel supported by Lustre,
294 qswlnd - Qsnet kernel modules 5.20 and later,
295 openiblnd - IbGold 1.8.2,
296 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
297 viblnd - Voltaire ibhost 3.4.5 and later,
298 ciblnd - Topspin 3.2.0,
299 iiblnd - Infiniserv 3.3 + PathBits patch,
300 gmlnd - GM 2.1.22 and later,
301 mxlnd - MX 1.2.1 or later,
302 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
305 Description: workaround for OOM from o2iblnd
306 Details : OFED needs to allocate a big chunk of memory for the QP while creating
307 a connection for o2iblnd; OOM can happen if no such contiguous
309 QP size is determined by concurrent_sends and max_fragments of
310 o2iblnd; users may now specify a smaller value for
311 concurrent_sends of o2iblnd (e.g. concurrent_sends=7), which
312 decreases the memory block size required to create the QP.
316 Description: Support Zerocopy receive of Chelsio device
317 Details : Chelsio driver can support zerocopy for iov[1] if it's
318 contiguous and large enough.
321 Description: fix credit flow deadlock in uptllnd
325 Description: finalize network operation in reasonable time
326 Details : conf-sanity test_32a couldn't stop ost and mds because it
327 tried to access a non-existent peer and the TCP connect took
328 quite long before timing out.
332 Description: Continuous recovery on 33 of 413 nodes after lustre oss failure
333 Details : A lost reference on a conn prevents the peer from being destroyed, which
334 could prevent new peer creation if the peer count has reached the upper
339 Description: LNET Selftest results in Soft lockup on OSS CPU
340 Details : only hits when 8 or more o2ib clients are involved and a session is
341 torn down with 'lst end_session' without a preceding 'lst stop'.
345 Description: concurrent_sends in IB LNDs should not be changeable at run time
346 Details : concurrent_sends in IB LNDs should not be changeable at run time
348 -------------------------------------------------------------------------------
350 11-03-2008 Sun Microsystems, Inc.
352 * Support for networks:
353 socklnd - any kernel supported by Lustre,
354 qswlnd - Qsnet kernel modules 5.20 and later,
355 openiblnd - IbGold 1.8.2,
356 o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
357 viblnd - Voltaire ibhost 3.4.5 and later,
358 ciblnd - Topspin 3.2.0,
359 iiblnd - Infiniserv 3.3 + PathBits patch,
360 gmlnd - GM 2.1.22 and later,
361 mxlnd - MX 1.2.1 or later,
362 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
366 Description: ptl_send_rpc hits LASSERT when ptl_send_buf fails
367 Details : only hits under out-of-memory situations
369 -------------------------------------------------------------------------------
371 04-26-2008 Sun Microsystems, Inc.
373 * Support for networks:
374 socklnd - any kernel supported by Lustre,
375 qswlnd - Qsnet kernel modules 5.20 and later,
376 openiblnd - IbGold 1.8.2,
377 o2iblnd - OFED 1.1, 1.2.0, and 1.2.5
378 viblnd - Voltaire ibhost 3.4.5 and later,
379 ciblnd - Topspin 3.2.0,
380 iiblnd - Infiniserv 3.3 + PathBits patch,
381 gmlnd - GM 2.1.22 and later,
382 mxlnd - MX 1.2.1 or later,
383 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
387 Description: excessive debug information removed
388 Details : excessive debug information removed
392 Description: ksocknal_create_conn() hit ASSERTION during connection race
393 Details : ksocknal_create_conn() hit ASSERTION during connection race
397 Description: ksocknal_send_hello() hit ASSERTION during a connection race
398 Details : ksocknal_send_hello() hit ASSERTION during a connection race
402 Description: o2iblnd/ptllnd credit deadlock in a routed config.
403 Details : o2iblnd/ptllnd credit deadlock in a routed config.
407 Description: High load after starting lnet
408 Details : gmlnd should sleep interruptibly in the rx thread. Otherwise,
409 the uptime utility reports a confusingly high load.
413 Description: ksocklnd fails to establish connection if accept_port is high
414 Details : PID remapping must not be done for active (outgoing) connections
417 --------------------------------------------------------------------------------
419 2008-01-11 Sun Microsystems, Inc.
421 * Support for networks:
422 socklnd - any kernel supported by Lustre,
423 qswlnd - Qsnet kernel modules 5.20 and later,
424 openiblnd - IbGold 1.8.2,
425 o2iblnd - OFED 1.1, 1.2.0, and 1.2.5
426 viblnd - Voltaire ibhost 3.4.5 and later,
427 ciblnd - Topspin 3.2.0,
428 iiblnd - Infiniserv 3.3 + PathBits patch,
429 gmlnd - GM 2.1.22 and later,
430 mxlnd - MX 1.2.1 or later,
431 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
435 Description: liblustre network error
436 Details : liblustre clients should understand the LNET_ACCEPT_PORT environment
437 variable even if they don't start the lnet acceptor.
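A minimal sketch of honoring LNET_ACCEPT_PORT independently of whether an acceptor is started, assuming the customary LNet acceptor port of 988 as the fallback; accept_port_from_env() is hypothetical and is not liblustre's code. Presumably a client needs the value to connect to the servers' acceptors even though it accepts no connections itself.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

#define SKETCH_DEFAULT_ACCEPT_PORT 988	/* assumed default acceptor port */

/* Hypothetical: read LNET_ACCEPT_PORT whether or not a local acceptor
 * runs. Returns the port, the default if unset, or -1 if malformed. */
static int accept_port_from_env(void)
{
	const char *s = getenv("LNET_ACCEPT_PORT");
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return SKETCH_DEFAULT_ACCEPT_PORT;
	v = strtol(s, &end, 10);
	if (*end != '\0' || v <= 0 || v > 65535)
		return -1;
	return (int)v;
}
```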
441 Description: Strange message from lnet (Ignoring prediction from the future)
442 Details : Incorrect calculation of peer's last_alive value in ksocklnd
444 --------------------------------------------------------------------------------
446 2007-12-07 Cluster File Systems, Inc. <info@clusterfs.com>
448 * Support for networks:
449 socklnd - any kernel supported by Lustre,
450 qswlnd - Qsnet kernel modules 5.20 and later,
451 openiblnd - IbGold 1.8.2,
452 o2iblnd - OFED 1.1, 1.2.0, and 1.2.5
453 viblnd - Voltaire ibhost 3.4.5 and later,
454 ciblnd - Topspin 3.2.0,
455 iiblnd - Infiniserv 3.3 + PathBits patch,
456 gmlnd - GM 2.1.22 and later,
457 mxlnd - MX 1.2.1 or later,
458 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
462 Description: ASSERTION(me == md->md_me) failed in lnet_match_md()
466 Description: increase send queue size for ciblnd/openiblnd
470 Description: new userspace socklnd
471 Details : The old userspace tcpnal that resided in lnet/ulnds/socklnd was replaced
472 with a new one, usocklnd.
474 Severity : enhancement
476 Description: Console message flood
477 Details : Make cdls ratelimiting more tunable by adding several tunables in
478 procfs: /proc/sys/lnet/console_{min,max}_delay_centisecs and
479 /proc/sys/lnet/console_backoff.
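One way to picture the three tunables is a suppression window that starts at the minimum delay, is multiplied by the backoff factor each time a message gets through, and is capped at the maximum. This is a hypothetical model for illustration: the struct and function names are invented, times are in centiseconds, and the reset after a quiet period is an assumption rather than the exact cdls algorithm.

```c
#include <assert.h>

/* Hypothetical rate limiter echoing console_min_delay_centisecs,
 * console_max_delay_centisecs and console_backoff. */
struct ratelimit {
	long min_delay;		/* centiseconds */
	long max_delay;		/* centiseconds */
	long backoff;		/* window multiplier after each print */
	long delay;		/* current suppression window */
	long next_allowed;	/* earliest time the next message prints */
};

static void ratelimit_init(struct ratelimit *rl, long min_d, long max_d,
			   long backoff)
{
	rl->min_delay = min_d;
	rl->max_delay = max_d;
	rl->backoff = backoff;
	rl->delay = min_d;
	rl->next_allowed = 0;
}

/* Returns 1 if a message at time 'now' reaches the console, 0 if it
 * is suppressed. */
static int ratelimit_allow(struct ratelimit *rl, long now)
{
	if (now < rl->next_allowed)
		return 0;
	/* Quiet for longer than the maximum window: start over. */
	if (now >= rl->next_allowed + rl->max_delay)
		rl->delay = rl->min_delay;
	rl->next_allowed = now + rl->delay;
	/* Widen the window geometrically, capped at max_delay. */
	rl->delay *= rl->backoff;
	if (rl->delay > rl->max_delay)
		rl->delay = rl->max_delay;
	return 1;
}
```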
481 --------------------------------------------------------------------------------
483 2007-09-27 Cluster File Systems, Inc. <info@clusterfs.com>
485 * Support for networks:
486 socklnd - any kernel supported by Lustre,
487 qswlnd - Qsnet kernel modules 5.20 and later,
488 openiblnd - IbGold 1.8.2,
489 o2iblnd - OFED 1.1 and 1.2,
490 viblnd - Voltaire ibhost 3.4.5 and later,
491 ciblnd - Topspin 3.2.0,
492 iiblnd - Infiniserv 3.3 + PathBits patch,
493 gmlnd - GM 2.1.22 and later,
494 mxlnd - MX 1.2.1 or later,
495 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
499 Description: /proc/sys/lnet has non-sysctl entries
500 Details : Updating dump_kernel/daemon_file/debug_mb to use sysctl variables
504 Description: TOE Kernel panic by ksocklnd
505 Details : offloaded sockets provide their own implementation of sendpage,
506 so ksocklnd can't call tcp_sendpage() directly
510 Description: kibnal_shutdown() doesn't finish; lconf --cleanup hangs
511 Details : races between lnd_shutdown and peer creation prevent
512 lnd_shutdown from finishing.
516 Description: open files rlimit 1024 reached while liblustre testing
517 Details : ulnds/socklnd must close open socket after unsuccessful
522 Description: build error
523 Details : fix typos in gmlnd, ptllnd and viblnd
525 --------------------------------------------------------------------------------
527 2007-07-30 Cluster File Systems, Inc. <info@clusterfs.com>
529 * Support for networks:
530 socklnd - kernels up to 2.6.16,
531 qswlnd - Qsnet kernel modules 5.20 and later,
532 openiblnd - IbGold 1.8.2,
533 o2iblnd - OFED 1.1 and 1.2
534 viblnd - Voltaire ibhost 3.4.5 and later,
535 ciblnd - Topspin 3.2.0,
536 iiblnd - Infiniserv 3.3 + PathBits patch,
537 gmlnd - GM 2.1.22 and later,
538 mxlnd - MX 1.2.1 or later,
539 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
541 --------------------------------------------------------------------------------
543 2007-06-21 Cluster File Systems, Inc. <info@clusterfs.com>
545 * Support for networks:
546 socklnd - kernels up to 2.6.16,
547 qswlnd - Qsnet kernel modules 5.20 and later,
548 openiblnd - IbGold 1.8.2,
550 viblnd - Voltaire ibhost 3.4.5 and later,
551 ciblnd - Topspin 3.2.0,
552 iiblnd - Infiniserv 3.3 + PathBits patch,
553 gmlnd - GM 2.1.22 and later,
554 mxlnd - MX 1.2.1 or later,
555 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
559 Description: Initialize cpumask before use
563 Description: ASSERTION failures when upgrading to the patchless zero-copy
565 Details : This bug affects "rolling upgrades", causing an inconsistent
566 protocol version negotiation and subsequent assertion failure
567 during rolling upgrades after the first wave of upgrades.
571 Details : Change "dropped message" CERRORs to D_NETERROR so they are
572 logged instead of creating "console chatter" when a lustre
573 timeout races with normal RPC completion.
576 Details : lnet_clear_peer_table can wait forever if user forgets to
580 Details : libcfs_id2str should check pid against LNET_PID_ANY.
584 Description: added LNET self test
585 Details : landing b_self_test
590 Description: cfs_duration_{u,n}sec() wrongly calculate nanosecond part of
592 Details : do_div() macro is used incorrectly.
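The entry does not say exactly how do_div() was misused; the sketch below only illustrates the macro's contract, which is easy to get wrong: do_div(n, base) divides n in place, leaving the quotient in n, and returns the remainder. A userspace function taking a pointer stands in for the kernel macro, and SKETCH_HZ is a hypothetical tick rate.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ 250u			/* hypothetical tick rate */
#define SKETCH_NSEC_PER_SEC 1000000000ull

/* Stand-in for do_div(n, base): quotient is stored back into *n,
 * remainder is returned. Treating the return value as the quotient
 * is the classic mistake. */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Split a duration in ticks into whole seconds plus the nanosecond
 * part of the leftover fraction of a second. */
static void ticks_to_sec_nsec(uint64_t ticks, uint64_t *sec, uint64_t *nsec)
{
	uint64_t q = ticks;
	uint32_t rem = do_div_sketch(&q, SKETCH_HZ);

	*sec = q;
	*nsec = (uint64_t)rem * (SKETCH_NSEC_PER_SEC / SKETCH_HZ);
}
```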
594 2007-04-23 Cluster File Systems, Inc. <info@clusterfs.com>
598 Description: make panic on lbug configurable
602 Description: Add OFED 1.2 support to o2iblnd
603 Details : o2iblnd depends on OFED's modules; if out-of-tree OFED modules
604 are installed (other than the kernel's in-tree infiniband), there
605 could be problems when running insmod on o2iblnd (mismatched CRC of
607 If the kernel supports extra Module.symvers (e.g. 2.6.17),
608 this link provides a solution:
609 https://bugs.openfabrics.org/show_bug.cgi?id=355
610 If extra Module.symvers is not supported by the kernel, we will
611 have to run the script in bug 12316 to update
612 $LINUX/Module.symvers before building o2iblnd.
613 More details about this are in bug 12316.
615 ------------------------------------------------------------------------------
617 2007-04-01 Cluster File Systems, Inc. <info@clusterfs.com>
618 * version 1.4.10 / 1.6.0
619 * Support for networks:
620 socklnd - kernels up to 2.6.16,
621 qswlnd - Qsnet kernel modules 5.20 and later,
622 openiblnd - IbGold 1.8.2,
624 viblnd - Voltaire ibhost 3.4.5 and later,
625 ciblnd - Topspin 3.2.0,
626 iiblnd - Infiniserv 3.3 + PathBits patch,
627 gmlnd - GM 2.1.22 and later,
628 mxlnd - MX 1.2.1 or later,
629 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
633 Description: Ptllnd didn't init kptllnd_data.kptl_idle_txs before it could
634 possibly be accessed in kptllnd_shutdown. Ptllnd should init
635 kptllnd_data.kptl_ptlid2str_lock before calling kptllnd_ptlid2str.
639 Description: gmlnd ignored some transmit errors when finalizing lnet messages.
643 Description: ptllnd logs a piece of incorrect debug info in kptllnd_peer_handle_hello.
647 Description: the_lnet.ln_finalizing was not set when the current thread is
648 about to complete messages. It only affects multi-threaded
654 Description: Changed the default kqswlnd ntxmsg=512
659 Description: Assertion failure in kernel ptllnd caused by posting passive
660 bulk buffers before connection establishment was complete.
665 Description: A race in kernel ptllnd between deleting a peer and posting
666 new communications for it could hang communications -
667 manifesting as "Unexpectedly long timeout" messages.
672 Description: Kernel ptllnd lock ordering issue could hang a node.
677 Description: node crash on socket teardown race
680 Frequency : 'lctl peer_list' issued on a mx net
682 Description: Enable lctl's peer_list for MXLND
685 Frequency : after Ptllnd timeouts and portals congestion
687 Description: Credit overflows
688 Details : This was a bug in ptllnd connection establishment. The fix
689 implements better peer stamps to disambiguate connection
690 establishment and ensure both peers enter the credit flow
691 state machine consistently.
696 Description: kptllnd didn't propagate some network errors up to LNET
697 Details : This bug was spotted while investigating 11394. The fix
698 ensures network errors on sends and bulk transfers are
699 propagated to LNET/lustre correctly.
701 Severity : enhancement
703 Description: Fixed console chatter in case of -ETIMEDOUT.
705 Severity : enhancement
707 Description: Added D_NETTRACE for recording network packet history
708 (initially only for ptllnd). Also a separate userspace
709 ptllnd facility to gather history which should really be
710 covered by D_NETTRACE too, if only CDEBUG recorded history in
716 Description: o2iblnd handle early RDMA_CM_EVENT_DISCONNECTED.
717 Details : If the fabric is lossy, an RDMA_CM_EVENT_DISCONNECTED
718 callback can occur before a connection has actually been
719 established. This caused an assertion failure previously.
721 Severity : enhancement
723 Description: Multiple instances for o2iblnd
724 Details : Allow multiple instances of o2iblnd to enable networking over
725 multiple HCAs and routing between them.
729 Description: lnet deadlock in router_checker
730 Details : turned ksnd_connd_lock, ksnd_reaper_lock, and ksock_net_t:ksnd_lock
731 into BH locks to eliminate potential deadlock caused by
732 ksocknal_data_ready() preempting code holding these locks.
736 Description: Millions of failed socklnd connection attempts cause a very slow FS
737 Details : added a new route flag ksnr_scheduled to distinguish it from
738 ksnr_connecting, so that a peer connection request is only turned
739 down for race concerns when an active connection to the same peer
740 is in progress (instead of just being scheduled).
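The distinction between the two flags can be sketched as follows. The struct and helpers are hypothetical, and the exact moment at which the real ksnr_scheduled flag is cleared is an assumption; the point is only that a merely scheduled route no longer causes an incoming request from the same peer to be refused.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical route state; flag names echo ksnr_scheduled and
 * ksnr_connecting but this is not ksocklnd's route structure. */
struct sketch_route_conn {
	bool scheduled;		/* queued for the connection daemon */
	bool connecting;	/* active connect() actually under way */
};

static void route_schedule(struct sketch_route_conn *r)
{
	r->scheduled = true;
}

static void route_start_connect(struct sketch_route_conn *r)
{
	r->scheduled = false;
	r->connecting = true;
}

static void route_connect_done(struct sketch_route_conn *r)
{
	r->connecting = false;
}

/* Refuse an incoming request from the same peer only while our own
 * active connection is really in progress. */
static bool reject_incoming_for_race(const struct sketch_route_conn *r)
{
	return r->connecting;
}
```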
742 ------------------------------------------------------------------------------
744 2007-02-09 Cluster File Systems, Inc. <info@clusterfs.com>
746 * Support for networks:
747 socklnd - kernels up to 2.6.16
748 qswlnd - Qsnet kernel modules 5.20 and later
749 openiblnd - IbGold 1.8.2
751 viblnd - Voltaire ibhost 3.4.5 and later
752 ciblnd - Topspin 3.2.0
753 iiblnd - Infiniserv 3.3 + PathBits patch
754 gmlnd - GM 2.1.22 and later
755 mxlnd - MX 1.2.1 or later
756 ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
759 Severity : major on XT3
761 Description: libcfs overwrites /proc/sys/portals
762 Details : libcfs created a symlink from /proc/sys/portals to
763 /proc/sys/lnet for backwards compatibility. This is no
764 longer required and makes the Cray portals /proc variables
769 Description: OFED FMR API change
770 Details : This changes parameter usage to reflect a change in
771 ib_fmr_pool_map_phys() between OFED 1.0 and OFED 1.1. Note
772 that FMR support is only used in experimental versions of the
773 o2iblnd - this change does not affect standard usage at all.
775 Severity : enhancement
777 Description: new ko2iblnd module parameter: ib_mtu
778 Details : the default IB MTU of 2048 performs badly on 23108 Tavor
779 HCAs. You can avoid this problem by setting the MTU to 1024
780 using this module parameter.
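A sketch of translating the ib_mtu parameter (in bytes) into an IB MTU code. The enum below mirrors the values of the Linux RDMA enum ib_mtu so that it compiles without kernel headers; translate_ib_mtu() is hypothetical, and treating 0 as "off, keep the path MTU" is an assumption based on the entry about the IB path MTU when ib_mtu is off.

```c
#include <assert.h>

/* Local mirror of enum ib_mtu values (IB_MTU_256 = 1 ... 4096 = 5). */
enum sketch_ib_mtu {
	SKETCH_IB_MTU_256  = 1,
	SKETCH_IB_MTU_512  = 2,
	SKETCH_IB_MTU_1024 = 3,
	SKETCH_IB_MTU_2048 = 4,
	SKETCH_IB_MTU_4096 = 5,
};

/* Hypothetical: map an ib_mtu value in bytes to an MTU code.
 * Returns 0 for "off" and -1 for an unsupported value. */
static int translate_ib_mtu(int bytes)
{
	switch (bytes) {
	case 0:    return 0;
	case 256:  return SKETCH_IB_MTU_256;
	case 512:  return SKETCH_IB_MTU_512;
	case 1024: return SKETCH_IB_MTU_1024;
	case 2048: return SKETCH_IB_MTU_2048;
	case 4096: return SKETCH_IB_MTU_4096;
	default:   return -1;
	}
}
```

Setting ib_mtu=1024 on a Tavor HCA would, in this model, select code 3 instead of the default 2048 (code 4).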
782 Severity : enhancement
783 Bugzilla : 11118/11620
784 Description: ptllnd small request message buffer alignment fix
785 Details : Set the PTL_MD_LOCAL_ALIGN8 option on small message receives.
786 Round up small message size on sends in case this option
787 is not supported. 11620 was a defect in the initial
788 implementation which effectively asserted all peers had to be
789 running the correct protocol version which was fixed by always
790 NAK-ing such requests and handling any misalignments they
795 Description: When kib(nal|lnd)_del_peer() is called upon a peer whose
796 ibp_tx_queue is not empty, kib(nal|lnd)_destroy_peer()'s
797 'LASSERT(list_empty(&peer->ibp_tx_queue))' will fail.
799 Severity : enhancement
801 Description: Patchless ZC (zero-copy) socklnd
802 Details : New protocol for socklnd: socklnd can support zero-copy without a
803 kernel patch, and it is compatible with the old socklnd. Checksum is
804 moved from tunables to modparams.
808 Description: When ksocknal_del_peer() is called upon a peer whose
809 ksnp_tx_queue is not empty, ksocknal_destroy_peer()'s
810 'LASSERT(list_empty(&peer->ksnp_tx_queue))' will fail.
813 Frequency : when ptlrpc is under heavy use and runs out of request buffers
815 Description: In lnet_match_blocked_msg(), md can be used without holding a
819 Frequency : very rarely
821 Description: If ksocknal_lib_setup_sock() fails, a ref on the peer is lost.
822 If connd connects a route which has been closed by
823 ksocknal_shutdown(), ksocknal_create_routes() may create new
824 routes which hold references on the peer, causing the shutdown
825 process to wait forever for the peer to disappear.
827 Severity : enhancement
829 Description: Dump XT3 portals traces on kptllnd timeout
830 Details : Set the kptllnd module parameter "ptltrace_on_timeout=1" to
831 dump Cray portals debug traces to a file. The kptllnd module
832 parameter "ptltrace_basename", default "/tmp/lnet-ptltrace",
833 is the basename of the dump file.
836 Frequency : infrequent
838 Description: kernel ptllnd fix bug in connection re-establishment
839 Details : Kernel ptllnd could produce protocol errors e.g. illegal
840 matchbits and/or violate the credit flow protocol when trying
841 to re-establish a connection with a peer after an error or
844 Severity : enhancement
846 Description: Allow /proc/sys/lnet/debug to be set symbolically
847 Details : Allow debug and subsystem debug values to be read/set by name
848 in addition to numerically, for ease of use.
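Reading or setting the mask by name amounts to mapping each token onto a bit. A minimal sketch with a hypothetical four-entry table (the real libcfs flag names and bit values differ and are more numerous); it parses a space-separated list, rejects unknown names, and omits the numeric path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/bit table, not the real libcfs values. */
struct debug_flag {
	const char *name;
	unsigned int bit;
};

static const struct debug_flag sketch_flags[] = {
	{ "trace",   1u << 0 },
	{ "net",     1u << 1 },
	{ "warning", 1u << 2 },
	{ "error",   1u << 3 },
};

/* Parse a space-separated list of names into *mask.
 * Returns 0 on success, -1 if any name is unknown. */
static int debug_str2mask(const char *str, unsigned int *mask)
{
	unsigned int m = 0;
	const char *p = str;

	while (*p != '\0') {
		size_t len, i;
		int found = 0;

		while (*p == ' ')
			p++;
		if (*p == '\0')
			break;
		len = strcspn(p, " ");
		for (i = 0; i < sizeof(sketch_flags) / sizeof(sketch_flags[0]); i++) {
			if (strlen(sketch_flags[i].name) == len &&
			    strncmp(sketch_flags[i].name, p, len) == 0) {
				m |= sketch_flags[i].bit;
				found = 1;
				break;
			}
		}
		if (!found)
			return -1;
		p += len;
	}
	*mask = m;
	return 0;
}
```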
851 Frequency : only in configurations with LNET routers
853 Description: routes automatically marked down and recovered
854 Details : In configurations with LNET routers, if a router fails, routers
855 now actively try to recover routes that are down, unless they
856 are marked down by an administrator.
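The recovery rule can be sketched as a small state update driven by router pings. The struct and function names are hypothetical and not LNet's structures; the sketch only encodes the rule stated above: failures mark a route down, successful pings bring it back, and an administrative mark-down always wins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical route state for illustration. */
struct sketch_route {
	bool alive;		/* state as seen by the router checker */
	bool admin_down;	/* marked down by an administrator */
};

/* Apply the result of one router health ping. */
static void route_ping_result(struct sketch_route *r, bool ping_ok)
{
	if (!ping_ok)
		r->alive = false;
	else if (!r->admin_down)
		r->alive = true;
}

static bool route_usable(const struct sketch_route *r)
{
	return r->alive && !r->admin_down;
}
```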
858 ------------------------------------------------------------------------------
860 2006-12-09 Cluster File Systems, Inc. <info@clusterfs.com>
863 Frequency : very rarely, in configurations with LNET routers and TCP
865 Description: incorrect data written to files on OSTs
866 Details : In certain high-load conditions incorrect data may be written
867 to files on the OST when using TCP networks.
869 ------------------------------------------------------------------------------
871 2006-07-31 Cluster File Systems, Inc. <info@clusterfs.com>
873 - rework CDEBUG messages rate-limiting mechanism b=10375
874 - add per-socket tunables for socklnd if the kernel is patched b=10327
876 ------------------------------------------------------------------------------
878 2006-02-15 Cluster File Systems, Inc. <info@clusterfs.com>
880 - fix use of portals/lnet pid to avoid dropping RPCs b=10074
881 - iiblnd wasn't mapping all memory, resulting in comms errors b=9776
882 - quiet LNET startup LNI message for liblustre b=10128
883 - Better console error messages if 'ip2nets' can't match an IP address
884 - Fixed overflow/use-before-set bugs in linux-time.h
885 - Fixed ptllnd bug that wasn't initialising rx descriptors completely
886 - LNET teardown failed an assertion about the route table being empty
887 - Fixed a crash in LNetEQPoll(<invalid handle>)
888 - Future protocol compatibility work (b_rls146_lnetprotovrsn)
889 - improve debug message for liblustre/Catamount nodes (b=10116)
891 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
892 * Configuration change for the XT3
893 The PTLLND is now used to run Lustre over Portals on the XT3.
894 The configure option(s) --with-cray-portals are no longer
895 used. Rather, --with-portals=<path-to-portals-includes> is
896 used to enable building on the XT3. In addition, to enable
897 XT3-specific features, the option --enable-cray-xt3 must be
900 2005-10-10 Cluster File Systems, Inc. <info@clusterfs.com>
901 * Portals has been removed, replaced by LNET.
902 LNET is the new networking infrastructure for Lustre; it includes a
903 reorganized network configuration mode (see the user
904 documentation for full details) as well as support for routing
905 between different network fabrics. Lustre Networking Devices
906 (LNDs) for the supported network fabrics have also been created
907 for this new infrastructure.
909 2005-08-08 Cluster File Systems, Inc. <info@clusterfs.com>
914 Frequency : rare (large Voltaire clusters only)
916 Description: the default number of reserved transmit descriptors was too low
917 for some large clusters
918 Details : As a workaround, the number was increased. A proper fix includes
921 2005-06-02 Cluster File Systems, Inc. <info@clusterfs.com>
926 Frequency : occasional (large-scale events, cluster reboot, network failure)
928 Description: too many error messages on the console obscure the actual problem and
929 can slow down or panic the server, or cause recovery to fail repeatedly
930 Details : enable rate-limiting of console error messages, and some messages
931 that were console errors now only go to the kernel log
933 Severity : enhancement
935 Description: add /proc/sys/portals/catastrophe entry which will report if
936 that node has previously LBUGged
938 2005-04-06 Cluster File Systems, Inc. <info@clusterfs.com>
940 - update gmnal to use PTL_MTU, fix module refcounting (b=5786)
942 2005-04-04 Cluster File Systems, Inc. <info@clusterfs.com>
944 - handle error return code in kranal_check_fma_rx() (5915,6054)
946 2005-02-04 Cluster File Systems, Inc. <info@clusterfs.com>
948 - update vibnal (Voltaire IB NAL)
949 - update gmnal (Myrinet NAL), gmnalid
951 2005-02-04 Eric Barton <eeb@bartonsoftware.com>
953 * Landed portals:b_port_step as follows...
955 - removed CFS_DECL_SPIN*
956 just use 'spinlock_t' and initialise with spin_lock_init()
958 - removed CFS_DECL_MUTEX*
959 just use 'struct semaphore' and initialise with init_mutex()
961 - removed CFS_DECL_RWSEM*
962 just use 'struct rw_semaphore' and initialise with init_rwsem()
964 - renamed cfs_sleep_chan -> cfs_waitq
965 cfs_sleep_link -> cfs_waitlink
967 - fixed race in linux version of arch-independent socknal
968 (the ENOMEM/EAGAIN decision).
970 - Didn't fix problems in Darwin version of arch-independent socknal
971 (resetting socket callbacks, eager ack hack, ENOMEM/EAGAIN decision)
973 - removed libcfs types from non-socknal header files (only some types
974 in the header files had been changed; the .c files hadn't been