Up: Point-to-Point Communication Next: Buffer allocation and usage Previous: Communication Modes
A valid MPI implementation guarantees certain general properties of point-to-point communication, which are described in this section.
Messages are non-overtaking: If a sender sends two messages in succession to the same destination, and both match the same receive, then this operation cannot receive the second message if the first one is still pending. If a receiver posts two receives in succession, and both match the same message, then the second receive operation cannot be satisfied by this message, if the first one is still pending. This requirement facilitates matching of sends to receives. It guarantees that message-passing code is deterministic, if processes are single-threaded and the wildcard MPI_ANY_SOURCE is not used in receives. (Some of the calls described later, such as MPI_CANCEL or MPI_WAITANY, are additional sources of nondeterminism.)
If a process has a single thread of execution, then any two communications executed by this process are ordered. On the other hand, if the process is multi-threaded, then the semantics of thread execution may not define a relative order between two send operations executed by two distinct threads. The operations are logically concurrent, even if one physically precedes the other. In such a case, the two messages sent can be received in any order. Similarly, if two receive operations that are logically concurrent receive two successively sent messages, then the two messages can match the two receives in either order.
An example of non-overtaking messages.*Progress
CALL MPI_COMM_RANK(comm, rank, ierr) IF (rank.EQ.0) THEN CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag, comm, ierr) CALL MPI_BSEND(buf2, count, MPI_REAL, 1, tag, comm, ierr) ELSE ! rank.EQ.1 CALL MPI_RECV(buf1, count, MPI_REAL, 0, MPI_ANY_TAG, comm, status, ierr) CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag, comm, status, ierr) END IFThe message sent by the first send must be received by the first receive, and
the message sent by the second send must be received by the second receive.
If a pair of matching send and receives have been initiated on two processes, then at least one of these two operations will complete, independently of other actions in the system: the send operation will complete, unless the receive is satisfied by another message, and completes; the receive operation will complete, unless the message sent is consumed by another matching receive that was posted at the same destination process.
An example of two, intertwined matching pairs.*FairnessCALL MPI_COMM_RANK(comm, rank, ierr) IF (rank.EQ.0) THEN CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag1, comm, ierr) CALL MPI_SSEND(buf2, count, MPI_REAL, 1, tag2, comm, ierr) ELSE ! rank.EQ.1 CALL MPI_RECV(buf1, count, MPI_REAL, 0, tag2, comm, status, ierr) CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag1, comm, status, ierr) END IFBoth processes invoke their first communication call.
Since the first send of process zero uses the buffered mode, it must complete,
irrespective of the state of process one. Since no matching receive is
posted, the message will be copied into buffer space. (If insufficient
buffer space is available, then the program will fail.)
The second send is then invoked.
At that point, a matching pair of send and receive
operation is enabled, and both operations must complete. Process one next
invokes its second receive call, which will be satisfied by the buffered
message. Note that process one received the messages in the reverse order they
MPI makes no guarantee of fairness in the handling of communication. Suppose that a send is posted. Then it is possible that the destination process repeatedly posts a receive that matches this send, yet the message is never received, because it is each time overtaken by another message, sent from another source. Similarly, suppose that a receive was posted by a multi-threaded process. Then it is possible that messages that match this receive are repeatedly received, yet the receive is never satisfied, because it is overtaken by other receives posted at this node (by other executing threads). It is the programmer's responsibility to prevent starvation in such situations.
Any pending communication operation consumes system resources that are limited. Errors may occur when lack of resources prevent the execution of an MPI call. A quality implementation will use a (small) fixed amount of resources for each pending send in the ready or synchronous mode and for each pending receive. However, buffer space may be consumed to store messages sent in standard mode, and must be consumed to store messages sent in buffered mode, when no matching receive is available. The amount of space available for buffering will be much smaller than program data memory on many systems. Then, it will be easy to write programs that overrun available buffer space.
MPI allows the user to provide buffer memory for messages sent in the buffered mode. Furthermore, MPI specifies a detailed operational model for the use of this buffer. An MPI implementation is required to do no worse than implied by this model. This allows users to avoid buffer overflows when they use buffered sends. Buffer allocation and use is described in Section Buffer allocation and usage .
A buffered send operation that cannot complete because of a lack of buffer space is erroneous. When such a situation is detected, an error is signalled that may cause the program to terminate abnormally. On the other hand, a standard send operation that cannot complete because of lack of buffer space will merely block, waiting for buffer space to become available or for a matching receive to be posted. This behavior is preferable in many situations. Consider a situation where a producer repeatedly produces new values and sends them to a consumer. Assume that the producer produces new values faster than the consumer can consume them. If buffered sends are used, then a buffer overflow will result. Additional synchronization has to be added to the program so as to prevent this from occurring. If standard sends are used, then the producer will be automatically throttled, as its send operations will block when buffer space is unavailable.
In some situations, a lack of buffer space leads to deadlock situations. This is illustrated by the examples below.
An exchange of messages.
CALL MPI_COMM_RANK(comm, rank, ierr) IF (rank.EQ.0) THEN CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr) CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr) ELSE ! rank.EQ.1 CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr) CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr) END IFThis program will succeed even if no buffer space for data
is available. The standard send operation can be replaced, in this example,
with a synchronous send.
An attempt to exchange messages.CALL MPI_COMM_RANK(comm, rank, ierr) IF (rank.EQ.0) THEN CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr) CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr) ELSE ! rank.EQ.1 CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr) CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr) END IFThe receive operation of the first process must complete before its send, and
can complete only if the matching send
of the second processor is executed. The receive operation of the second
process must complete before its send and
can complete only if the matching send of the first process is executed.
This program will always deadlock. The same holds for any other send mode.
An exchange that relies on buffering.CALL MPI_COMM_RANK(comm, rank, ierr) IF (rank.EQ.0) THEN CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr) CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr) ELSE ! rank.EQ.1 CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr) CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr) END IFThe message sent by each process has to be copied out before the send operation
returns and the receive operation starts. For the program to complete, it is
necessary that at least one of the two messages sent be buffered.
Thus, this program can
succeed only if
the communication system can buffer at least count words of data.
 Advice to users.
When standard send operations are used, then a deadlock situation may occur where both processes are blocked because buffer space is not available. The same will certainly happen, if the synchronous mode is used. If the buffered mode is used, and not enough buffer space is available, then the program will not complete either. However, rather than a deadlock situation, we shall have a buffer overflow error.
A program is ``safe'' if no message buffering is required for the program to complete. One can replace all sends in such program with synchronous sends, and the program will still run correctly. This conservative programming style provides the best portability, since program completion does not depend on the amount of buffer space available or in the communication protocol used.
Many programmers prefer to have more leeway and be able to use the ``unsafe'' programming style shown in example Semantics of point-to-point communication . In such cases, the use of standard sends is likely to provide the best compromise between performance and robustness: quality implementations will provide sufficient buffering so that ``common practice'' programs will not deadlock. The buffered send mode can be used for programs that require more buffering, or in situations where the programmer wants more control. This mode might also be used for debugging purposes, as buffer overflow conditions are easier to diagnose than deadlock conditions.
Nonblocking message-passing operations, as described in
Section Nonblocking communication
, can be used to avoid the need for buffering
outgoing messages. This prevents deadlocks due to lack of buffer space, and
improves performance, by allowing overlap of computation and communication,
and avoiding the overheads of allocating buffers and copying messages into
( End of advice to users.)
Up: Point-to-Point Communication Next: Buffer allocation and usage Previous: Communication Modes
Return to MPI Standard Index
Return to MPI home page