1 #LyX 1.2 created this file. For more info see http://www.lyx.org/
6 \renewenvironment{comment}%
7 {\begin{quote}\textbf{Discussion}: \slshape}%
17 \papersize letterpaper
22 \use_numerical_citations 0
23 \paperorientation portrait
26 \paragraph_separation indent
28 \quotes_language english
32 \paperpagestyle headings
36 The Portals 3.2 Message Passing Interface
49 Riesen are with the Scalable Computing Systems Department, Sandia National
51 Box 5800, Albuquerque, NM\SpecialChar ~
53 87111-1110, bright@cs.sandia.gov, rolf@cs.sandia.gov.
65 Maccabe is with the Computer Science Department, University of New Mexico,
66 Albuquerque, NM\SpecialChar ~
68 87131-1386, maccabe@cs.unm.edu.
71 , Rolf Riesen and Trammell Hudson
74 This report presents a specification for the Portals 3.2 message passing
76 Portals 3.2 is intended to allow scalable, high-performance network communicatio
77 n between nodes of a parallel computing system.
78 Specifically, it is designed to support a parallel computing platform composed
79 of clusters of commodity workstations connected by a commodity system area
81 In addition, Portals 3.2 is well suited to massively parallel processing
83 Portals 3.2 represents an adaptation of the data movement layer developed
84 for massively parallel processing platforms, such as the 4500-node Intel
107 \begin_inset LatexCommand \tableofcontents{}
128 \begin_inset FloatList figure
149 \begin_inset FloatList table
169 Summary of Changes for Revision 1.1
172 Updated version number to 3.2 throughout the document
176 \begin_inset LatexCommand \ref{sub:PtlGetId}
192 \begin_inset LatexCommand \ref{sec:meattach}
208 \begin_inset LatexCommand \ref{sec:meunlink}
212 : removed text referring to a list of associated memory descriptors.
216 \begin_inset LatexCommand \ref{sec:mdfree}
220 : added text to describe unlinking a free-floating memory descriptor.
224 \begin_inset LatexCommand \ref{tab:types}
236 \begin_inset LatexCommand \ref{sec:md-type}
251 added text to clarify
260 \begin_inset LatexCommand \ref{sec:mdattach}
272 \begin_inset LatexCommand \ref{sec:niinit}
276 : added text to clarify multiple calls to
284 \begin_inset LatexCommand \ref{sec:mdattach}
288 : added text to clarify
296 \begin_inset LatexCommand \ref{sec:receiving}
300 : removed text indicating that an MD will reject a message if the associated
305 \begin_inset LatexCommand \ref{sec:mdfree}
313 error code and text to indicate that only MDs with no pending operations
318 \begin_inset LatexCommand \ref{tab:retcodes}
330 \begin_inset LatexCommand \ref{sec:event-type}
334 : added user id field, MD handle field, and NI specific failure field to
343 \begin_inset LatexCommand \ref{tab:types}
355 \begin_inset LatexCommand \ref{sec:event-type}
367 \begin_inset LatexCommand \ref{tab:func}
379 \begin_inset LatexCommand \ref{sec:meattach}
384 \begin_inset LatexCommand \ref{sec:meinsert}
389 \begin_inset LatexCommand \ref{sec:put}
393 : listed allowable constants with relevant fields.
397 \begin_inset LatexCommand \ref{tab:func}
409 \begin_inset LatexCommand \ref{tab:retcodes}
425 \begin_inset LatexCommand \ref{tab:oconsts}
429 : updated to reflect new event types.
433 \begin_inset LatexCommand \ref{sec:id-type}
452 Summary of Changes for Version 3.1
458 The most significant change to the interface from version 3.0 to 3.1 involves
459 the clarification of how the interface interacts with multi-threaded applicatio
461 We adopted a generic thread model in which processes define an address
462 space and threads share the address space.
463 Consideration of the API in the light of threads led to several clarifications
464 throughout the document:
471 added a definition for
478 reworded the definition for
487 Section\SpecialChar ~
489 \begin_inset LatexCommand \ref{sec:apiover}
493 : added section\SpecialChar ~
495 \begin_inset LatexCommand \ref{sec:threads}
499 to describe the multi-threading model used by the Portals API.
503 Section\SpecialChar ~
505 \begin_inset LatexCommand \ref{sec:ptlinit}
513 must be called at least once and may be called any number of times.
517 Section\SpecialChar ~
519 \begin_inset LatexCommand \ref{sec:ptlfini}
527 should be called once as the process is terminating and not as each thread
532 Section\SpecialChar ~
534 \begin_inset LatexCommand \ref{sec:pid}
538 : Portals does not define thread ids.
542 Section\SpecialChar ~
544 \begin_inset LatexCommand \ref{sec:ni}
548 : network interfaces are associated with processes, not threads.
552 Section\SpecialChar ~
554 \begin_inset LatexCommand \ref{sec:niinit}
562 must be called at least once and may be called any number of times.
566 Section\SpecialChar ~
568 \begin_inset LatexCommand \ref{sec:eqget}
580 if a thread is blocked on
588 Section\SpecialChar ~
590 \begin_inset LatexCommand \ref{sec:eqwait}
594 : waiting threads are awakened in FIFO order.
606 were removed from the API.
611 was defined to block the calling process until all of the processes in
612 the application group had invoked
617 We now consider this functionality, along with the concept of groups (see
619 \begin_inset Quotes eld
623 \begin_inset Quotes erd
626 ), to be part of the runtime system, not part of the Portals API.
631 was defined to return the number of events in an event queue.
632 Because external operations may lead to new events being added and other
633 threads may remove events, the value returned by
637 would have to be a hint about the number of events in the event queue.
640 Handling small, unexpected messages
643 Another set of changes relates to handling small unexpected messages in
645 In designing version 3.0, we assumed that each unexpected message would
646 be placed in a unique memory descriptor.
647 To avoid the need to process a long list of memory descriptors, we moved
648 the memory descriptors out of the match list and hung them off of a single
650 In this way, large unexpected messages would only encounter a single
651 \begin_inset Quotes eld
655 \begin_inset Quotes erd
658 match list entry before encountering the
659 \begin_inset Quotes eld
663 \begin_inset Quotes erd
667 Experience with this strategy identified resource management problems with
669 In particular, a long sequence of very short (or zero length) messages
670 could quickly exhaust the memory descriptors constructed for handling unexpecte
672 Our new strategy involves the use of several very large memory descriptors
673 for small unexpected messages.
674 Consecutive unexpected messages will be written into the first of these
675 memory descriptors until the memory descriptor fills up.
676 When the first of the
677 \begin_inset Quotes eld
681 \begin_inset Quotes erd
684 descriptors fills up, it will be unlinked and subsequent short messages
685 will be written into the next
686 \begin_inset Quotes eld
690 \begin_inset Quotes erd
695 \begin_inset Quotes eld
699 \begin_inset Quotes erd
702 memory descriptor will be declared full when it does not have sufficient
703 space for the largest small unexpected message.
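
The buffer rotation described above can be sketched in C. This is only an illustration under stated assumptions: the type slab, the function deposit, and the parameter max_small are hypothetical names that do not appear in the Portals API, and each buffer is assumed to start with room for at least one largest small message.

```c
#include <stddef.h>

/* Hypothetical stand-in for one large memory descriptor that collects
 * small unexpected messages; not a Portals type. */
typedef struct slab {
    size_t capacity;  /* total bytes in the buffer */
    size_t used;      /* bytes already written */
    int    linked;    /* nonzero while the descriptor is still linked */
} slab;

/* Writes a message of len bytes into the first linked slab and returns its
 * index, or -1 if the message is not "small" or no linked slab remains.
 * A slab is declared full, and unlinked, once it can no longer hold the
 * largest small unexpected message (max_small). */
int deposit(slab *slabs, size_t n, size_t len, size_t max_small)
{
    if (len > max_small)
        return -1;                      /* not handled by this path */
    for (size_t i = 0; i < n; i++) {
        slab *s = &slabs[i];
        if (!s->linked)
            continue;                   /* already full and unlinked */
        s->used += len;                 /* the message lands here */
        if (s->capacity - s->used < max_small)
            s->linked = 0;              /* declared full: unlink */
        return (int)i;
    }
    return -1;
}
```

Because a slab is unlinked as soon as its free space drops below max_small, any slab that is still linked can always hold the next small message, so a long run of short messages consumes buffer space rather than one descriptor per message.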
706 This led to two significant changes.
707 First, each match list entry now has a single memory descriptor rather
708 than a list of memory descriptors.
709 Second, in addition to exceeding the operation threshold, a memory descriptor
710 can be unlinked when the local offset exceeds a specified value.
711 These changes have led to several changes in this document:
714 Section\SpecialChar ~
716 \begin_inset LatexCommand \ref{subsec:paddress}
724 removed references to the memory descriptor list,
727 changed the portals address translation description to indicate that unlinking
728 a memory descriptor implies unlinking the associated match list entry--match
729 list entries can no longer be unlinked independently from the memory descriptor.
734 Section\SpecialChar ~
736 \begin_inset LatexCommand \ref{sec:meattach}
744 removed unlink from argument list,
747 removed description of
754 changed wording of the error condition when the Portal table index already
755 has an associated match list.
760 Section\SpecialChar ~
762 \begin_inset LatexCommand \ref{sec:meinsert}
766 : removed unlink from argument list.
770 Section\SpecialChar ~
772 \begin_inset LatexCommand \ref{sec:md-type}
784 Section\SpecialChar ~
786 \begin_inset LatexCommand \ref{sec:mdattach}
801 removed reference to memory descriptor lists,
804 changed wording of the error condition when match list entry already has
805 an associated memory descriptor,
808 changed the description of the
817 Section\SpecialChar ~
819 \begin_inset LatexCommand \ref{sec:md}
831 Section\SpecialChar ~
833 \begin_inset LatexCommand \ref{sec:mdbind}
837 : removed references to memory descriptor list.
841 Section\SpecialChar ~
843 \begin_inset LatexCommand \ref{sec:mdfree}
847 : removed reference to memory descriptor list.
851 Section\SpecialChar ~
853 \begin_inset LatexCommand \ref{sec:summary}
857 : removed references to PtlMDInsert.
861 Section\SpecialChar ~
863 \begin_inset LatexCommand \ref{sec:semantics}
867 : removed reference to memory descriptor list.
871 Section\SpecialChar ~
873 \begin_inset LatexCommand \ref{sec:exmpi}
877 : revised the MPI example to reflect the changes to the interface.
881 Several changes have been made to improve the general documentation of the
886 Section\SpecialChar ~
888 \begin_inset LatexCommand \ref{sec:handle-type}
892 : documented the special value
900 Section\SpecialChar ~
902 \begin_inset LatexCommand \ref{sec:id-type}
906 : documented the special value
914 Section\SpecialChar ~
916 \begin_inset LatexCommand \ref{sec:mdbind}
920 : documented the return value
925 Section\SpecialChar ~
927 \begin_inset LatexCommand \ref{sec:mdupdate}
931 : clarified the description of the
939 Section\SpecialChar ~
941 \begin_inset LatexCommand \ref{sec:implvals}
945 : introduced a new section to document the implementation defined values.
949 Section\SpecialChar ~
951 \begin_inset LatexCommand \ref{sec:summary}
955 : modified Table\SpecialChar ~
957 \begin_inset LatexCommand \ref{tab:oconsts}
961 to indicate where each constant is introduced and where it is used.
968 Implementation defined limits (Section
969 \begin_inset LatexCommand \ref{sec:niinit}
976 The earlier version provided implementation defined limits for the maximum
977 number of match entries, the maximum number of memory descriptors, etc.
978 Rather than spanning the entire implementation, these limits are now associated
979 with individual network interfaces.
982 Added User Ids (Section
983 \begin_inset LatexCommand \ref{sec:uid}
990 Group Ids had been used to simplify access control entries.
991 In particular, a process could allow access for all of the processes in
993 User Ids have been introduced to regain this functionality.
997 Removed Group Ids and Rank Ids (Section
998 \begin_inset LatexCommand \ref{sec:pid}
1005 The earlier version of Portals had two forms for addressing processes: <node
1006 id, process id> and <group id, rank id>.
1007 A process group was defined as the collection of processes created during
1009 Each process in the group was given a unique rank id in the range 0 to
1011 \begin_inset Formula $n-1$
1015 \begin_inset Formula $n$
1018 was the number of processes in the group.
1019 We removed groups because they are better handled in the runtime system.
1022 Match lists (Section
1023 \begin_inset LatexCommand \ref{sec:meattach}
1030 It is no longer illegal to have an existing match entry when calling PtlMEAttach.
1031 A position argument was added to the list of arguments supplied to
1035 to specify whether the new match entry is prepended or appended to the
1037 If there is no existing match list, the position argument is ignored.
1040 Unlinking Memory Descriptors (Section
1041 \begin_inset LatexCommand \ref{sec:md}
1048 Previously, a memory descriptor could be unlinked if the offset exceeded
1049 a threshold upon the completion of an operation.
1050 In this version, the unlinking is delayed until there is a matching operation
1051 which requires more memory than is currently available in the descriptor.
1052 In addition to changes in that section, this led to a revision of Figure\SpecialChar ~
1054 \begin_inset LatexCommand \ref{fig:flow}
1061 Split Phase Operations and Events (Section
1062 \begin_inset LatexCommand \ref{sec:eq}
1069 Previously, there were five types of events:
1090 The first four of these reflected the completion of potentially long operations.
1091 We have introduced new event types to reflect the fact that long operations
1092 have a distinct starting point and a distinct completion point.
1093 Moreover, the completion may be successful or unsuccessful.
1096 In addition to providing a mechanism for reporting failure to higher levels
1097 of software, this split provides an opportunity for improved ordering
1099 Previously, if one process initiated two operations (e.g., two put operations)
1100 on a remote process, these operations were guaranteed to complete in the
1101 same order that they were initiated.
1102 Now, we only guarantee that the initiation events are delivered in the
1104 In particular, the operations do not need to complete in the order that
1108 Well-known process ids (Section
1109 \begin_inset LatexCommand \ref{sec:niinit}
1116 To support the notion of
1117 \begin_inset Quotes eld
1120 well known process ids,
1121 \begin_inset Quotes erd
1124 we added a process id argument to the arguments for PtlNIInit.
1130 API Application Programming Interface.
1131 A definition of the functions and semantics provided by a library of functions.
1139 that initiates a message operation.
1143 Message An application-defined unit of data that is exchanged between
1151 Message\SpecialChar ~
1152 Operation Either a put operation, which writes data, or a get operation,
1157 Network A network provides point-to-point communication between
1162 Internally, a network may provide multiple routes between endpoints (to
1163 improve fault tolerance or to improve performance characteristics); however,
1164 multiple paths will not be exposed outside of the network.
1168 Node A node is an endpoint in a
1173 Nodes provide processing capabilities and memory.
1174 A node may provide multiple processors (an SMP node) or it may act as a
1183 Process A context of execution.
1184 A process defines a virtual memory (VM) context.
1185 This context is not shared with other processes.
1186 Several threads may share the VM context defined by a process.
1194 that is acted upon by a message operation.
1198 Thread A context of execution that shares a VM context with other threads.
1215 pagenumbering{arabic}
1222 \begin_inset LatexCommand \label{sec:intro}
1232 This document describes an application programming interface for message
1233 passing between nodes in a system area network.
1234 The goal of this interface is to improve the scalability and performance
1235 of network communication by defining the functions and semantics of message
1236 passing required for scaling a parallel computing system to ten thousand
1238 This goal is achieved by providing an interface that will allow a quality
1239 implementation to take advantage of the inherently scalable design of Portals.
1242 This document is divided into several sections:
1245 Section\SpecialChar ~
1247 \begin_inset LatexCommand \ref{sec:intro}
1251 ---Introduction This section describes the purpose and scope of the Portals
1256 Section\SpecialChar ~
1258 \begin_inset LatexCommand \ref{sec:apiover}
1263 Overview\SpecialChar ~
1266 Portals\SpecialChar ~
1268 API This section gives a brief overview of the
1270 The goal is to introduce the key concepts and terminology used in the descripti
1275 Section\SpecialChar ~
1277 \begin_inset LatexCommand \ref{sec:api}
1281 ---The\SpecialChar ~
1282 Portals\SpecialChar ~
1284 API This section describes the functions and semantics of
1285 the Portals application programming interface.
1289 Section\SpecialChar ~
1291 \begin_inset LatexCommand \ref{sec:semantics}
1296 Semantics\SpecialChar ~
1298 Message\SpecialChar ~
1299 Transmission This section describes the semantics
1300 of message transmission.
1301 In particular, the information transmitted in each type of message and
1302 the processing of incoming messages.
1306 Section\SpecialChar ~
1308 \begin_inset LatexCommand \ref{sec:examples}
1312 ---Examples This section presents several examples intended to illustrate
1313 the use of the Portals API.
1320 Existing message passing technologies available for commodity cluster networking
1321 hardware do not meet the scalability goals required by the Cplant\SpecialChar ~
1323 \begin_inset LatexCommand \cite{Cplant}
1327 project at Sandia National Laboratories.
1328 The goal of the Cplant project is to construct a commodity cluster that
1329 can scale to the order of ten thousand nodes.
1330 This number greatly exceeds the capacity for which existing message passing
1331 technologies have been designed and implemented.
1334 In addition to the scalability requirements of the network, these technologies
1335 must also be able to support a scalable implementation of the Message Passing
1336 Interface (MPI)\SpecialChar ~
1338 \begin_inset LatexCommand \cite{MPIstandard}
1342 standard, which has become the
1346 standard for parallel scientific computing.
1347 While MPI does not impose any scalability limitations, existing message
1348 passing technologies do not provide the functionality needed to allow implement
1349 ations of MPI to meet the scalability requirements of Cplant.
1352 The following are properties of a network architecture that do not impose
1353 any inherent scalability limitations:
1356 Connectionless - Many connection-oriented architectures, such as VIA\SpecialChar ~
1358 \begin_inset LatexCommand \cite{VIA}
1362 and TCP/IP sockets, have limitations on the number of peer connections
1363 that can be established.
1367 Network independence - Many communication systems depend on the host processor
1368 to perform operations in order for messages in the network to be consumed.
1369 Message consumption from the network should not be dependent on host processor
1370 activity, such as the operating system scheduler or user-level thread scheduler.
1374 User-level flow control - Many communication systems manage flow control
1375 internally to avoid depleting resources, which can significantly impact
1376 performance as the number of communicating processes increases.
1380 OS Bypass - High performance network communication should not involve memory
1381 copies into or out of a kernel-managed protocol stack.
1385 The following are properties of a network architecture that do not impose
1386 scalability limitations for an implementation of MPI:
1389 Receiver-managed - Sender-managed message passing implementations require
1390 a persistent block of memory to be available for every process, requiring
1391 memory resources to increase with job size and requiring user-level flow
1392 control mechanisms to manage these resources.
1396 User-level Bypass - While OS Bypass is necessary for high performance, it
1397 alone is not sufficient to support the Progress Rule of MPI asynchronous
1402 Unexpected messages - Few communication systems have support for receiving
1403 messages for which there is no prior notification.
1404 Support for these types of messages is necessary to avoid flow control
1405 and protocol overhead.
1412 Portals was originally designed for and implemented on the nCube machine
1413 as part of the SUNMOS (Sandia/UNM OS)\SpecialChar ~
1415 \begin_inset LatexCommand \cite{SUNMOS}
1419 and Puma\SpecialChar ~
1421 \begin_inset LatexCommand \cite{PumaOS}
1425 lightweight kernel development projects.
1426 Portals went through two design phases, the latter of which is used on
1427 the 4500-node Intel TeraFLOPS machine\SpecialChar ~
1429 \begin_inset LatexCommand \cite{TFLOPS}
1434 Portals have been very successful in meeting the needs of such a large
1435 machine, not only as a layer for a high-performance MPI implementation\SpecialChar ~
1437 \begin_inset LatexCommand \cite{PumaMPI}
1441 , but also for implementing the scalable run-time environment and parallel
1442 I/O capabilities of the machine.
1445 The second generation Portals implementation was designed to take full advantage
1446 of the hardware architecture of large MPP machines.
1447 However, efforts to implement this same design on commodity cluster technology
1448 identified several limitations, due to the differences in network hardware
1449 as well as to shortcomings in the design of Portals.
1455 The primary goal in the design of Portals is scalability.
1456 Portals are designed specifically for an implementation capable of supporting
1457 a parallel job running on tens of thousands of nodes.
1458 Performance is critical only in terms of scalability.
1459 That is, the level of message passing performance is characterized by how
1460 far it allows an application to scale and not by how it performs in micro-bench
1461 marks (e.g., a two node bandwidth or latency test).
1464 The Portals API is designed to allow for scalability, not to guarantee it.
1465 Portals cannot overcome the shortcomings of a poorly designed application
1467 Applications that have inherent scalability limitations, either through
1468 design or implementation, will not be transformed by Portals into scalable
1470 Scalability must be addressed at all levels.
1471 Portals do not inhibit scalability, but do not guarantee it either.
1474 To support scalability, the Portals interface maintains a minimal amount
1476 Portals provide reliable, ordered delivery of messages between pairs of
1478 They are connectionless: a process is not required to explicitly establish
1479 a point-to-point connection with another process in order to communicate.
1480 Moreover, all buffers used in the transmission of messages are maintained
1482 The target process determines how to respond to incoming messages, and
1483 messages for which there are no buffers are discarded.
1489 Portals combine the characteristics of both one-sided and two-sided communication.
1491 \begin_inset Quotes eld
1495 \begin_inset Quotes erd
1499 \begin_inset Quotes eld
1503 \begin_inset Quotes erd
1507 The destination of a put (or send) is not an explicit address; instead,
1508 each message contains a set of match bits that allow the receiver to determine
1509 where incoming messages should be placed.
1510 This flexibility allows Portals to support both traditional one-sided operation
1511 s and two-sided send/receive operations.
1514 Portals allow the target to determine whether incoming messages are acceptable.
1515 A target process can choose to accept message operations from any specific
1516 process or can choose to ignore message operations from any specific process.
1519 Zero Copy, OS Bypass and Application Bypass
1522 In traditional system architectures, network packets arrive at the network
1523 interface card (NIC), are passed through one or more protocol layers in
1524 the operating system, and eventually copied into the address space of the
1526 As network bandwidth began to approach memory copy rates, reduction of
1527 memory copies became a critical concern.
1528 This concern led to the development of zero-copy message passing protocols
1529 in which message copies are eliminated or pipelined to avoid the loss of
1533 A typical zero-copy protocol has the NIC generate an interrupt for the CPU
1534 when a message arrives from the network.
1535 The interrupt handler then controls the transfer of the incoming message
1536 into the address space of the appropriate application.
1537 The interrupt latency, the time from the initiation of an interrupt until
1538 the interrupt handler is running, is fairly significant.
1539 To avoid this cost, some modern NICs have processors that can be programmed
1540 to implement part of a message passing protocol.
1541 Given a properly designed protocol, it is possible to program the NIC to
1542 control the transfer of incoming messages, without needing to interrupt
1544 Because this strategy does not need to involve the OS on every message
1545 transfer, it is frequently called
1546 \begin_inset Quotes eld
1550 \begin_inset Quotes erd
1555 \begin_inset LatexCommand \cite{ST}
1561 \begin_inset LatexCommand \cite{VIA}
1567 \begin_inset LatexCommand \cite{FM2}
1573 \begin_inset LatexCommand \cite{GM}
1577 , and Portals are examples of OS Bypass protocols.
1580 Many protocols that support OS Bypass still require that the application
1581 actively participate in the protocol to ensure progress.
1582 As an example, the long message protocol of PM requires that the application
1583 receive and reply to a request to put or get a long message.
1584 This complicates the runtime environment, requiring a thread to process
1585 incoming requests, and significantly increases the latency required to
1586 initiate a long message protocol.
1587 The Portals message passing protocol does not require activity on the part
1588 of the application to ensure progress.
1590 \begin_inset Quotes eld
1594 \begin_inset Quotes erd
1597 to refer to this aspect of the Portals protocol.
1603 Given the number of components that we are dealing with and the fact that
1604 we are interested in supporting applications that run for very long times,
1605 failures are inevitable.
1606 The Portals API recognizes that the underlying transport may not be able
1607 to successfully complete an operation once it has been initiated.
1608 This is reflected in the fact that the Portals API reports three types
1609 of events: events indicating the initiation of an operation, events indicating
1610 the successful completion of an operation, and events indicating the unsuccessf
1611 ul completion of an operation.
1612 Every initiation event is eventually followed by a successful completion
1613 event or an unsuccessful completion event.
1616 Between the time an operation is started and the time that the operation
1617 completes (successfully or unsuccessfully), any memory associated with
1618 the operation should be considered volatile.
1619 That is, the memory may be changed in unpredictable ways while the operation
1621 Once the operation completes, the memory associated with the operation
1622 will not be subject to further modification (from this operation).
1623 Notice that unsuccessful operations may alter memory in an essentially
1624 unpredictable fashion.
1627 An Overview of the Portals API
1628 \begin_inset LatexCommand \label{sec:apiover}
1635 In this section, we give a conceptual overview of the Portals API.
1636 The goal is to provide a context for understanding the detailed description
1637 of the API presented in the next section.
1641 \begin_inset LatexCommand \label{sec:dmsemantics}
1648 A Portal represents an opening in the address space of a process.
1649 Other processes can use a Portal to read (get) or write (put) the memory
1650 associated with the portal.
1651 Every data movement operation involves two processes, the
1660 The initiator is the process that initiates the data movement operation.
1661 The target is the process that responds to the operation by either accepting
1662 the data for a put operation, or replying with the data for a get operation.
1665 In this discussion, activities attributed to a process may refer to activities
1666 that are actually performed by the process or
1668 on behalf of the process
1671 The inclusiveness of our terminology is important in the context of
1676 In particular, when we note that the target sends a reply in the case of
1677 a get operation, it is possible that reply will be generated by another
1678 component in the system, bypassing the application.
1681 Figures\SpecialChar ~
1683 \begin_inset LatexCommand \ref{fig:put}
1688 \begin_inset LatexCommand \ref{fig:get}
1692 present graphical interpretations of the Portal data movement operations:
1694 In the case of a put operation, the initiator sends a put request message
1695 containing the data to the target.
1696 The target translates the Portal addressing information in the request
1697 using its local Portal structures.
1698 When the request has been processed, the target optionally sends an acknowledge
1703 \begin_inset Float figure
1711 \begin_inset Graphics FormatVersion 1
1725 \begin_inset LatexCommand \label{fig:put}
1735 In the case of a get operation, the initiator sends a get request to the
1737 As with the put operation, the target translates the Portal addressing
1738 information in the request using its local Portal structures.
1739 Once it has translated the Portal addressing information, the target sends
1740 a reply that includes the requested data.
1744 \begin_inset Float figure
1752 \begin_inset Graphics FormatVersion 1
1766 \begin_inset LatexCommand \label{fig:get}
1776 We should note that Portal address translations are only performed on nodes
1777 that respond to operations initiated by other nodes.
1778 Acknowledgements and replies to get operations bypass the Portal address
1779 translation structures.
1783 \begin_inset LatexCommand \label{subsec:paddress}
1790 One-sided data movement models (e.g., shmem\SpecialChar ~
1792 \begin_inset LatexCommand \cite{CraySHMEM}
1798 \begin_inset LatexCommand \cite{ST}
1802 , MPI-2\SpecialChar ~
1804 \begin_inset LatexCommand \cite{MPI2}
1808 ) typically use a triple to address memory on a remote node.
1809 This triple consists of a process id, memory buffer id, and offset.
1810 The process id identifies the target process, the memory buffer id specifies
1811 the region of memory to be used for the operation, and the offset specifies
1812 an offset within the memory buffer.
1815 In addition to the standard address components (process id, memory buffer
1816 id, and offset), a Portal address includes a set of match bits.
1817 This addressing model is appropriate for supporting one-sided operations
1818 as well as traditional two-sided message passing operations.
1819 Specifically, the Portals API provides the flexibility needed for an efficient
1820 implementation of MPI-1, which defines two-sided operations with one-sided
1821 completion semantics.
1824 Figure\SpecialChar ~
1826 \begin_inset LatexCommand \ref{fig:portals}
1830 presents a graphical representation of the structures used by a target
1831 in the interpretation of a Portal address.
1832 The process id is used to route the message to the appropriate node and
1833 is not reflected in this diagram.
1834 The memory buffer id, called the
1838 , is used as an index into the Portal table.
1839 Each element of the Portal table identifies a match list.
1840 Each element of the match list specifies two bit patterns: a set of
1841 \begin_inset Quotes eld
1845 \begin_inset Quotes erd
1849 \begin_inset Quotes eld
1853 \begin_inset Quotes erd
1857 In addition to the two sets of match bits, each match list element has
1858 at most one memory descriptor.
1859 Each memory descriptor identifies a memory region and an optional event
1861 The memory region specifies the memory to be used in the operation and
1862 the event queue is used to record information about these operations.
1866 \begin_inset Float figure
1874 \begin_inset Graphics FormatVersion 1
1875 filename portals.eps
1887 Portal Addressing Structures
1888 \begin_inset LatexCommand \label{fig:portals}
1898 Figure\SpecialChar ~
1900 \begin_inset LatexCommand \ref{fig:flow}
1904 illustrates the steps involved in translating a Portal address, starting
1905 from the first element in a match list.
1906 If the match criteria specified in the match list entry are met and the
1907 memory descriptor accepts the operation
1913 Memory descriptors can reject operations because a threshold has been exceeded
1914 or because the memory region does not have sufficient space, see Section\SpecialChar ~
1916 \begin_inset LatexCommand \ref{sec:md}
1923 , the operation (put or get) is performed using the memory region specified
1924 in the memory descriptor.
1925 If the memory descriptor specifies that it is to be unlinked when a threshold
1926 has been exceeded, the match list entry is removed from the match list
1927 and the resources associated with the memory descriptor and match list
1928 entry are reclaimed.
1929 Finally, if there is an event queue specified in the memory descriptor,
1930 the operation is logged in the event queue.
1934 \begin_inset Float figure
1942 \begin_inset Graphics FormatVersion 1
1943 filename flow_new.eps
1955 Portals Address Translation
1956 \begin_inset LatexCommand \label{fig:flow}
1966 If the match criteria specified in the match list entry are not met, or
1967 there is no memory descriptor associated with the match list entry, or
1968 the memory descriptor associated with the match list entry rejects the
1969 operation, the address translation continues with the next match list entry.
1970 If the end of the match list has been reached, the address translation
1971 is aborted and the incoming request is discarded.
1977 A process can control access to its portals using an access control list.
1978 Each entry in the access control list specifies a process id and a Portal
1980 The access control list is actually an array of entries.
1981 Each incoming request includes an index into the access control list (i.e.,
1983 \begin_inset Quotes eld
1987 \begin_inset Quotes erd
1991 If the id of the process issuing the request doesn't match the id specified
1992 in the access control list entry or the Portal table index specified in
1993 the request doesn't match the Portal table index specified in the access
1994 control list entry, the request is rejected.
1995 Process identifiers and Portal table indexes may include wild card values
1996 to increase the flexibility of this mechanism.
2000 Two aspects of this design merit further discussion.
2001 First, the model assumes that the information in a message header, the
2002 sender's id in particular, is trustworthy.
2003 In most contexts, we assume that the entity that constructs the header
2004 is trustworthy; however, using cryptographic techniques, we could easily
2005 devise a protocol that would ensure the authenticity of the sender.
2008 Second, because the access check is performed by the receiver, it is possible
2009 that a malicious process will generate thousands of messages that will
2010 be denied by the receiver.
2011 This could saturate the network and/or the receiver, resulting in a
2016 Moving the check to the sender using capabilities would remove the potential
2017 for this form of attack.
2018 However, the solution introduces the complexities of capability management
2019 (exchange of capabilities, revocation, protections, etc.).
2022 Multi-threaded Applications
2023 \begin_inset LatexCommand \label{sec:threads}
2030 The Portals API supports a generic view of multi-threaded applications.
2031 From the perspective of the Portals API, an application program is defined
2032 by a set of processes.
2033 Each process defines a unique address space.
2034 The Portals API defines access to this address space from other processes
2035 (using portals addressing and the data movement operations).
2036 A process may have one or more
2040 executing in its address space.
2044 With the exception of
2048 every function in the Portals API is non-blocking and atomic with respect
2049 to both other threads and external operations that result from data movement
2051 While individual operations are atomic, sequences of these operations may
2052 be interleaved between different threads and with external operations.
2053 The Portals API does not provide any mechanisms to control this interleaving.
2054 It is expected that these mechanisms will be provided by the API used to
2059 \begin_inset LatexCommand \label{sec:api}
2067 \begin_inset LatexCommand \label{sec:conv}
2074 The Portals API defines two types of entities: functions and types.
2075 Functions always start with
2079 and use mixed upper and lower case.
2080 When used in the body of this report, function names appear in italic face,
2086 The functions associated with an object type will have names that start
2091 , followed by the two letter object type code shown in Table\SpecialChar ~
2093 \begin_inset LatexCommand \ref{tab:objcodes}
2098 As an example, the function
2102 allocates resources for an event queue.
2106 \begin_inset Float table
2114 \begin_inset LatexCommand \label{tab:objcodes}
2136 \begin_inset Tabular
2137 <lyxtabular version="3" rows="5" columns="3">
2138 <features firstHeadEmpty="true">
2139 <column alignment="left" valignment="top" width="0pt">
2140 <column alignment="left" valignment="top" width="0pt">
2141 <column alignment="left" valignment="top" width="0pt">
2142 <row bottomline="true">
2143 <cell alignment="left" valignment="top" bottomline="true" usebox="none">
2153 <cell alignment="left" valignment="top" bottomline="true" usebox="none">
2161 <cell alignment="left" valignment="top" bottomline="true" usebox="none">
2171 <cell alignment="left" valignment="top" usebox="none">
2179 <cell alignment="left" valignment="top" usebox="none">
2187 <cell alignment="left" valignment="top" usebox="none">
2193 \begin_inset LatexCommand \ref{sec:eq}
2202 <cell alignment="left" valignment="top" usebox="none">
2210 <cell alignment="left" valignment="top" usebox="none">
2218 <cell alignment="left" valignment="top" usebox="none">
2224 \begin_inset LatexCommand \ref{sec:md}
2233 <cell alignment="left" valignment="top" usebox="none">
2241 <cell alignment="left" valignment="top" usebox="none">
2249 <cell alignment="left" valignment="top" usebox="none">
2255 \begin_inset LatexCommand \ref{sec:me}
2264 <cell alignment="left" valignment="top" usebox="none">
2272 <cell alignment="left" valignment="top" usebox="none">
2280 <cell alignment="left" valignment="top" usebox="none">
2286 \begin_inset LatexCommand \ref{sec:ni}
2304 Type names use lower case with underscores to separate words.
2305 Each type name starts with
2314 When used in the body of this report, type names appear in a fixed font,
2322 Names for constants use upper case with underscores to separate words.
2323 Each constant name starts with
2328 When used in the body of this report, constant names appear in a fixed font,
2339 The Portals API defines a variety of base types.
2340 These types represent a simple renaming of the base types provided by the
2341 C programming language.
2342 In most cases these new type names have been introduced to improve type
2343 safety and to avoid issues arising from differences in representation sizes
2344 (e.g., 16-bit or 32-bit integers).
2348 \begin_inset LatexCommand \label{sec:size-t}
2359 is an unsigned 64-bit integral type used for representing sizes.
2363 \begin_inset LatexCommand \label{sec:handle-type}
2370 Objects maintained by the API are accessed through handles.
2371 Handle types have names of the form
2383 is one of the two letter object type codes shown in Table\SpecialChar ~
2385 \begin_inset LatexCommand \ref{tab:objcodes}
2390 For example, the type
2394 is used for network interface handles.
2397 Each type of object is given a unique handle type to enhance type checking.
2402 , can be used when a generic handle is needed.
2403 Every handle value can be converted into a value of type
2407 without loss of information.
2410 Handles are not simple values.
2411 Every portals object is associated with a specific network interface and
2412 an identifier for this interface (along with an object identifier) is part
2413 of the handle for the object.
2424 , is used to indicate the absence of an event queue.
2426 \begin_inset LatexCommand \ref{sec:mdfree}
2432 \begin_inset LatexCommand \ref{sec:mdupdate}
2436 for uses of this value.
2440 \begin_inset LatexCommand \label{sec:index-type}
2455 are integral types used for representing Portal table indexes and access
2456 control table indexes, respectively.
2457 See Section\SpecialChar ~
2459 \begin_inset LatexCommand \ref{sec:niinit}
2463 for limits on values of these types.
2467 \begin_inset LatexCommand \label{sec:mb-type}
2478 is capable of holding unsigned 64-bit integer values.
2482 \begin_inset LatexCommand \label{sec:ni-type}
2493 is an integral type used for identifying different network interfaces.
2494 Users will need to consult the local documentation to determine appropriate
2495 values for the interfaces available.
2500 identifies the default interface.
2504 \begin_inset LatexCommand \label{sec:id-type}
2515 is an integral type used for representing node ids,
2519 is an integral type for representing process ids, and
2523 is an integral type for representing user ids.
2530 matches any process identifier, PTL_NID_ANY matches any node identifier,
2535 matches any user identifier.
2537 \begin_inset LatexCommand \ref{sec:meattach}
2543 \begin_inset LatexCommand \ref{sec:acentry}
2547 for uses of these values.
2551 \begin_inset LatexCommand \label{sec:stat-type}
2558 Each network interface maintains an array of status registers that can be
2563 function (see Section\SpecialChar ~
2565 \begin_inset LatexCommand \ref{sec:nistatus}
2574 defines the types of indexes that can be used to access the status registers.
2575 The only index defined for all implementations is
2579 which identifies the status register that counts the dropped requests for
2581 Other indexes (and registers) may be defined by the implementation.
2588 defines the types of values held in status registers.
2589 This is a signed integer type.
2590 The size is implementation dependent, but must be at least 32 bits.
2593 Initialization and Cleanup
2594 \begin_inset LatexCommand \label{sec:init}
2601 The Portals API includes a function,
2605 , to initialize the library and a function,
2609 , to clean up after the application is done using the library.
2613 \begin_inset LatexCommand \label{sec:ptlinit}
2620 int PtlInit( int *max_interfaces );
2627 function initializes the Portals library.
2628 PtlInit must be called at least once by a process before any thread makes
2629 a Portals function call, but may be safely called more than once.
2630 \layout Subsubsection
2635 PTL_OK Indicates success.
2639 PTL_FAIL Indicates an error during initialization.
2643 PTL_SEGV Indicates that
2647 is not a legal address.
2649 \layout Subsubsection
2655 \begin_inset Tabular
2656 <lyxtabular version="3" rows="1" columns="3">
2658 <column alignment="right" valignment="top" width="0pt">
2659 <column alignment="center" valignment="top" width="0pt">
2660 <column alignment="left" valignment="top" width="5in">
2662 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2672 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2682 <cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
2687 On successful return, this location will hold the maximum number of interfaces
2688 that can be initialized.
2700 \begin_inset LatexCommand \label{sec:ptlfini}
2707 void PtlFini( void );
2714 function cleans up after the Portals library is no longer needed by a process.
2715 After this function is called, calls to any of the functions defined by
2716 the Portals API or use of the structures set up by the Portals API will
2717 result in undefined behavior.
2718 This function should be called once and only once during termination by
2720 Typically, this function will be called in the exit sequence of a process.
2721 Individual threads should not call PtlFini when they terminate.
2725 \begin_inset LatexCommand \label{sec:ni}
2732 The Portals API supports the use of multiple network interfaces.
2733 However, each interface is treated as an independent entity.
2734 Combining interfaces (e.g.,
2735 \begin_inset Quotes eld
2739 \begin_inset Quotes erd
2742 to create a higher bandwidth connection) must be implemented by the application
2743 or embedded in the underlying network.
2744 Interfaces are treated as independent entities to make it easier to cache
2745 information on individual network interface cards.
2748 Once initialized, each interface provides a Portal table, an access control
2749 table, and a collection of status registers.
2750 See Section\SpecialChar ~
2752 \begin_inset LatexCommand \ref{sec:me}
2756 for a discussion of updating Portal table entries using the
2761 See Section\SpecialChar ~
2763 \begin_inset LatexCommand \ref{sec:ac}
2767 for a discussion of the initialization and updating of entries in the access
2769 See Section\SpecialChar ~
2771 \begin_inset LatexCommand \ref{sec:nistatus}
2775 for a discussion of the
2779 function which can be used to determine the value of a status register.
2782 Every other type of Portal object (e.g., memory descriptor, event queue, or
2783 match list entry) is associated with a specific network interface.
2784 The association to a network interface is established when the object is
2785 created and is encoded in the handle for the object.
2788 Each network interface is initialized and shut down independently.
2789 The initialization routine,
2793 , returns a handle for an interface object which is used in all subsequent
2799 function is used to shut down an interface and release any resources that
2800 are associated with the interface.
2801 Network interface handles are associated with processes, not threads.
2802 All threads in a process share all of the network interface handles.
2805 The Portals API also defines the
2809 function to query the status registers for a network interface, the
2813 function to determine the
2814 \begin_inset Quotes eld
2818 \begin_inset Quotes erd
2821 to another process, and the
2825 function to determine the network interface that an object is associated
2830 \begin_inset LatexCommand \label{sec:niinit}
2839 int max_match_entries;
2841 int max_mem_descriptors;
2843 int max_event_queues;
2845 ptl_ac_index_t max_atable_index;
2847 ptl_pt_index_t max_ptable_index;
2853 int PtlNIInit( ptl_interface_t interface
2857 ptl_ni_limits_t* desired,
2859 ptl_ni_limits_t* actual,
2861 ptl_handle_ni_t* handle );
2868 include the following members:
2871 max_match_entries Maximum number of match entries that can be allocated
2875 max_mem_descriptors Maximum number of memory descriptors that can be allocated
2879 max_event_queues Maximum number of event queues that can be allocated at
2883 max_atable_index Largest access control table index for this interface,
2884 valid indexes range from zero to
2891 max_ptable_index Largest Portal table index for this interface, valid indexes
2903 function is used to initialize the Portals API for a network interface.
2904 This function must be called at least once by each process before any other
2905 operations that apply to the interface by any process or thread.
2906 For subsequent calls to
2910 from within the same process (either by different threads or the same thread),
2911 the desired limits will be ignored and the call will return the existing
2913 \layout Subsubsection
2918 PTL_OK Indicates success.
2922 PTL_NOINIT Indicates that the Portals API has not been successfully initialized.
2926 PTL_INIT_DUP Indicates a duplicate initialization of
2934 PTL_INIT_INV Indicates that
2938 is not a valid network interface.
2942 PTL_NOSPACE Indicates that there is insufficient memory to initialize the
2947 PTL_INV_PROC Indicates that
2951 is not a valid process id.
2954 PTL_SEGV Indicates that
2962 is not a legal address.
2964 \layout Subsubsection
2970 \begin_inset Tabular
2971 <lyxtabular version="3" rows="5" columns="3">
2973 <column alignment="right" valignment="top" width="0pt">
2974 <column alignment="center" valignment="top" width="0pt">
2975 <column alignment="left" valignment="top" width="4.7in">
2977 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2987 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
2997 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3002 Identifies the network interface to be initialized.
3003 (See Section\SpecialChar ~
3005 \begin_inset LatexCommand \ref{sec:ni-type}
3009 for a discussion of values used to identify network interfaces.)
3014 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3024 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3034 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3039 Identifies the desired process id (for well-known process ids).
3044 may be used to have the process id assigned by the underlying library.
3049 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3059 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3069 <cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3074 If non-NULL, points to a structure that holds the desired limits.
3079 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3089 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3099 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3104 On successful return, the location pointed to by actual will hold the actual
3110 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3120 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3130 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3135 On successful return, this location will hold a handle for the interface.
3146 The use of desired is implementation dependent.
3147 In particular, an implementation may choose to ignore this argument.
3151 \begin_inset LatexCommand \label{sec:nifini}
3158 int PtlNIFini( ptl_handle_ni_t interface );
3165 function is used to release the resources allocated for a network interface.
3170 operation has been started, the results of pending API operations (e.g.,
3171 operations initiated by another thread) for this interface are undefined.
3172 Similarly, the effects of incoming operations (puts and gets) or return
3173 values (acknowledgements and replies) for this interface are undefined.
3174 \layout Subsubsection
3179 PTL_OK Indicates success.
3183 PTL_NOINIT Indicates that the Portals API has not been successfully initialized.
3187 PTL_INV_NI Indicates that
3191 is not a valid network interface handle.
3193 \layout Subsubsection
3199 \begin_inset Tabular
3200 <lyxtabular version="3" rows="1" columns="3">
3202 <column alignment="right" valignment="top" width="0pt">
3203 <column alignment="center" valignment="top" width="0pt">
3204 <column alignment="center" valignment="top" width="0pt">
3206 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3214 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3224 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3229 A handle for the interface to shut down.
3241 \begin_inset LatexCommand \label{sec:nistatus}
3248 int PtlNIStatus( ptl_handle_ni_t interface,
3250 ptl_sr_index_t status_register,
3252 ptl_sr_value_t* status );
3259 function returns the value of a status register for the specified interface.
3260 (See Section\SpecialChar ~
3262 \begin_inset LatexCommand \ref{sec:stat-type}
3266 for more information on status register indexes and status register values.)
3267 \layout Subsubsection
3272 PTL_OK Indicates success.
3276 PTL_NOINIT Indicates that the Portals API has not been successfully initialized.
3280 PTL_INV_NI Indicates that
3284 is not a valid network interface handle.
3288 PTL_INV_SR_INDX Indicates that
3292 is not a valid status register.
3296 PTL_SEGV Indicates that
3300 is not a legal address.
3302 \layout Subsubsection
3308 \begin_inset Tabular
3309 <lyxtabular version="3" rows="3" columns="3">
3311 <column alignment="right" valignment="top" width="0pt">
3312 <column alignment="center" valignment="top" width="0pt">
3313 <column alignment="left" valignment="top" width="4.7in">
3315 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3325 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3335 <cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3340 A handle for the interface to use.
3346 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3356 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3366 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3371 An index for the status register to read.
3376 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3386 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3396 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3401 On successful return, this location will hold the current value of the status
3413 The only status register that must be defined is a drop count register (
3418 Implementations may define additional status registers.
3419 Identifiers for the indexes associated with these registers should start
3430 int PtlNIDist( ptl_handle_ni_t interface,
3432 ptl_process_id_t process,
3434 unsigned long* distance );
3441 function returns the distance to another process using the specified interface.
3442 Distances are only defined relative to an interface.
3443 Distance comparisons between different interfaces on the same process may
3445 \layout Subsubsection
3450 PTL_OK Indicates success.
3454 PTL_NOINIT Indicates that the Portals API has not been successfully initialized.
3458 PTL_INV_NI Indicates that
3462 is not a valid network interface handle.
3466 PTL_INV_PROC Indicates that
3470 is not a valid process identifier.
3474 PTL_SEGV Indicates that
3478 is not a legal address.
3480 \layout Subsubsection
3486 \begin_inset Tabular
3487 <lyxtabular version="3" rows="3" columns="3">
3489 <column alignment="right" valignment="top" width="0pt">
3490 <column alignment="center" valignment="top" width="0pt">
3491 <column alignment="left" valignment="top" width="4.7in">
3493 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3503 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3513 <cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3518 A handle for the interface to use.
3524 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3534 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3544 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3549 An identifier for the process whose distance is being requested.
3555 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3565 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3575 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3580 On successful return, this location will hold the distance to the remote
3592 This function should return a static measure of distance.
3593 Examples include minimum latency, the inverse of available bandwidth, or
3594 the number of switches between the two endpoints.
3600 int PtlNIHandle( ptl_handle_any_t handle,
3602 ptl_handle_ni_t* interface );
3609 function returns a handle for the network interface with which the object
3615 If the object identified by
3619 is a network interface, this function returns the same value it is passed.
3620 \layout Subsubsection
3625 PTL_OK Indicates success.
3629 PTL_NOINIT Indicates that the Portals API has not been successfully initialized.
3633 PTL_INV_HANDLE Indicates that
3637 is not a valid handle.
3641 PTL_SEGV Indicates that
3645 is not a legal address.
3647 \layout Subsubsection
3653 \begin_inset Tabular
3654 <lyxtabular version="3" rows="2" columns="3">
3656 <column alignment="right" valignment="top" width="0pt">
3657 <column alignment="center" valignment="top" width="0pt">
3658 <column alignment="left" valignment="top" width="4.7in">
3660 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3670 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3680 <cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3685 A handle for the object.
3690 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3700 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3710 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3715 On successful return, this location will hold a handle for the network interface
3731 Every handle should encode the network interface and the object id relative
3733 Both are presumably encoded using integer values.
3737 \begin_inset LatexCommand \label{sec:uid}
3744 Every process runs on behalf of a user.
3751 int PtlGetUid( ptl_handle_ni_t ni_handle,
3754 \layout Subsubsection
3759 PTL_OK Indicates success.
3763 PTL_INV_NI Indicates that
3767 is not a valid network interface handle.
3771 PTL_NOINIT Indicates that the Portals API has not been successfully initialized.
3775 PTL_SEGV Indicates that
3779 is not a legal address.
3781 \layout Subsubsection
3787 \begin_inset Tabular
3788 <lyxtabular version="3" rows="2" columns="3">
3790 <column alignment="right" valignment="top" width="0pt">
3791 <column alignment="center" valignment="top" width="0pt">
3792 <column alignment="left" valignment="top" width="5in">
3794 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3804 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3814 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3819 A network interface handle.
3824 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3834 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3844 <cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
3849 On successful return, this location will hold the user id for the calling
3861 Note that user identifiers are dependent on the network interface(s).
3862 In particular, if a node has multiple interfaces, a process may have multiple
3866 Process Identification
3867 \begin_inset LatexCommand \label{sec:pid}
3874 Processes that use the Portals API can be identified using a node id and
3876 Every node accessible through a network interface has a unique node identifier
3877 and every process running on a node has a unique process identifier.
3878 As such, any process in the computing system can be identified by its node
3883 The Portals API defines a type,
3887 for representing process ids and a function,
3891 , which can be used to obtain the id of the current process.
3894 The Portals API does not include thread identifiers.
3895 Messages are delivered to processes (address spaces) not threads (contexts
3900 \begin_inset LatexCommand \label{sec:pid-type}
3909 ptl_nid_t nid; /* node id */
3911 ptl_pid_t pid; /* process id */
3920 type uses two identifiers to represent a process id: a node id and a process
3926 \begin_inset LatexCommand \label{sub:PtlGetId}
3933 int PtlGetId( ptl_handle_ni_t ni_handle,
3935 ptl_process_id_t* id );
3936 \layout Subsubsection
3941 PTL_OK Indicates success.
3945 PTL_INV_NI Indicates that
3949 is not a valid network interface handle.
3953 PTL_NOINIT Indicates that the Portals API has not been successfully initialized.
3957 PTL_SEGV Indicates that
3961 is not a legal address.
3963 \layout Subsubsection
3969 \begin_inset Tabular
3970 <lyxtabular version="3" rows="2" columns="3">
3972 <column alignment="right" valignment="top" width="0pt">
3973 <column alignment="center" valignment="top" width="0pt">
3974 <column alignment="left" valignment="top" width="5in">
3976 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3986 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
3996 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4001 A network interface handle.
4006 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4016 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
4026 <cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
4031 On successful return, this location will hold the id for the calling process.
4042 Note that process identifiers are dependent on the network interface(s).
4043 In particular, if a node has multiple interfaces, it may have multiple
4047 Match List Entries and Match Lists
4048 \begin_inset LatexCommand \label{sec:me}
4055 A match list is a chain of match list entries.
4056 Each match list entry includes a memory descriptor and a set of match criteria.
4057 The match criteria can be used to reject incoming requests based on process
4058 id or the match bits provided in the request.
4059 A match list is created using the
4067 functions, which create a match list consisting of a single match list
4068 entry, attach the match list to the specified Portal index, and return
4069 a handle for the match list entry.
4070 Match entries can be dynamically inserted and removed from a match list
4083 \begin_inset LatexCommand \label{sec:meattach}
4090 typedef enum { PTL_RETAIN, PTL_UNLINK } ptl_unlink_t;
4095 typedef enum { PTL_INS_BEFORE, PTL_INS_AFTER } ptl_ins_pos_t;
4100 int PtlMEAttach( ptl_handle_ni_t interface,
4102 ptl_pt_index_t index,
4104 ptl_process_id_t matchid,
4106 ptl_match_bits_t match_bits,
4108 ptl_match_bits_t ignorebits,
4110 ptl_unlink_t unlink,
4112 ptl_ins_pos_t position,
4114 ptl_handle_me_t* handle );
4121 are used to control where a new item is inserted.
4126 is used to insert the new item before the current item or before the head
4132 is used to insert the new item after the current item or after the last
4141 function creates a match list consisting of a single entry and attaches
4142 this list to the Portal table for