* userspace (catamount) ptllnd changes
- Error handling
Ensure all communications complete in finite time. Ensure errors cause
clean peer state teardown so that communications can be re-established
after a peer crash.
Note that this does NOT handle reconnection to a failed LNET router, which
is required for routed configurations.
- Environment tunables
PTLLND_DEBUG (boolean, dflt 0) is a global switch to enable/disable debug
features.
PTLLND_TX_HISTORY (int, dflt debug?1024:0) sets the size of the history
buffer.
PTLLND_ABORT_ON_PROTOCOL_MISMATCH (boolean, dflt 1) calls abort on
connecting to a peer running a different version of the ptllnd protocol.
PTLLND_ABORT_ON_NAK (boolean, dflt 0) aborts when a peer sends a NAK
(e.g. because it has timed out this node).
PTLLND_DUMP_ON_NAK (boolean, dflt debug?1:0) dumps peer debug and the
history on receiving a NAK.
PTLLND_WATCHDOG_INTERVAL (int, dflt 1) sets how often to check some peers
for timed-out communications while the application blocks for
communications to complete.
PTLLND_TIMEOUT (int, dflt 50) is the communications timeout in seconds.
PTLLND_LONG_WAIT (int, dflt debug?5:PTLLND_TIMEOUT) is the time in seconds
after which the ptllnd prints a warning if it is still blocked during
connection establishment, cleanup after an error, or cleanup during shutdown.
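
The defaults above depend on each other: several fall back to a value
derived from PTLLND_DEBUG, and PTLLND_LONG_WAIT falls back to
PTLLND_TIMEOUT. The following is a minimal sketch of how such tunables
could be resolved from the environment in that order. It is not the
ptllnd code itself: env_int(), struct ptl_tunables and load_tunables()
are hypothetical names, and treating any nonzero integer as boolean
"true" is an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not from ptllnd): read an integer tunable from the
 * environment, falling back to dflt when it is unset, empty or unparsable. */
static int env_int(const char *name, int dflt)
{
    const char *s = getenv(name);
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return dflt;

    errno = 0;
    v = strtol(s, &end, 0);
    if (errno != 0 || *end != '\0' || v < INT_MIN || v > INT_MAX)
        return dflt;

    return (int)v;
}

/* Hypothetical container for the tunables listed above. */
struct ptl_tunables {
    int debug;
    int tx_history;
    int abort_on_protocol_mismatch;
    int abort_on_nak;
    int dump_on_nak;
    int watchdog_interval;
    int timeout;
    int long_wait;
};

/* Resolve PTLLND_DEBUG and PTLLND_TIMEOUT first, because other defaults
 * are derived from them. Booleans are assumed to be "nonzero means true". */
static void load_tunables(struct ptl_tunables *t)
{
    t->debug = env_int("PTLLND_DEBUG", 0) != 0;
    t->tx_history = env_int("PTLLND_TX_HISTORY", t->debug ? 1024 : 0);
    t->abort_on_protocol_mismatch =
        env_int("PTLLND_ABORT_ON_PROTOCOL_MISMATCH", 1) != 0;
    t->abort_on_nak = env_int("PTLLND_ABORT_ON_NAK", 0) != 0;
    t->dump_on_nak = env_int("PTLLND_DUMP_ON_NAK", t->debug ? 1 : 0) != 0;
    t->watchdog_interval = env_int("PTLLND_WATCHDOG_INTERVAL", 1);
    t->timeout = env_int("PTLLND_TIMEOUT", 50);
    t->long_wait = env_int("PTLLND_LONG_WAIT",
                           t->debug ? 5 : t->timeout);
}
```

With this ordering, setting only PTLLND_DEBUG=1 would also switch the
history size to 1024, enable dump-on-NAK and shorten the long-wait
warning to 5 seconds, while an explicit value for any of those variables
still takes precedence.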