MPIX_Win_wait_stream
Completes a GPU stream-aware RMA exposure epoch started by a call to
MPIX_Win_post_stream on win.
Definitions
C/C++ Synopsis
int MPIX_Win_wait_stream ( MPI_Win win, void * stream )
Arguments
win                 window object (handle) passed as input
stream              GPU stream handle passed as input
Description
MPIX_Win_wait_stream enqueues operations on the GPU stream identified by stream to complete the GPU stream-aware exposure epoch started by a call to MPIX_Win_post_stream on win. An exposure epoch created by a call to MPIX_Win_post_stream is closed with a call to MPIX_Win_wait_stream.
MPIX_Win_wait_stream matches the calls to MPIX_Win_complete_stream issued by each process that was granted access to the window during the exposure epoch created by MPIX_Win_post_stream. A call to MPIX_Win_wait_stream blocks the GPU stream identified by stream until all matching calls to MPIX_Win_complete_stream have completed execution.
MPIX_Win_wait_stream enqueues GPU wait kernels into the GPU stream identified by stream. Successful completion of the enqueued GPU wait kernels guarantees that all origin processes with access to the active exposure epoch have completed their access to the local window, and that all RMA operations that access the local window have completed.
Note that return from MPIX_Win_wait_stream does not guarantee completion of all RMA operations that access the local window. The operations are enqueued on the GPU control processor stream and are executed in order with respect to the other operations in that stream.
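The difference between returning from an enqueue call and completing the enqueued work can be illustrated with a small toy model. The sketch below is plain C and does not use MPI, HIP, or Cray MPICH. Every toy_ name in it is hypothetical and exists only for this illustration; it models a stream as a simple in-order queue and makes no claim about how the library is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only. None of the toy_ names belong to MPI, HIP, or Cray MPICH. */
#define TOY_STREAM_CAPACITY 8

typedef void (*toy_op_fn)(void *arg);

/* A "stream" here is just a FIFO of deferred operations. */
typedef struct {
    toy_op_fn fn[TOY_STREAM_CAPACITY];
    void     *arg[TOY_STREAM_CAPACITY];
    size_t    head; /* next operation to execute */
    size_t    tail; /* next free slot */
} toy_stream;

typedef struct {
    int epoch_closed;       /* set when the stand-in wait operation runs */
    int compare_saw_closed; /* what the later operation observed */
} toy_state;

/* Enqueue returns immediately; the operation has not run yet. */
static void toy_enqueue(toy_stream *s, toy_op_fn fn, void *arg)
{
    assert(s->tail < TOY_STREAM_CAPACITY);
    s->fn[s->tail] = fn;
    s->arg[s->tail] = arg;
    s->tail++;
}

/* Stand-in for a host-side stream synchronization: run queued work in order. */
static void toy_synchronize(toy_stream *s)
{
    while (s->head < s->tail) {
        s->fn[s->head](s->arg[s->head]);
        s->head++;
    }
}

/* Stand-in for the enqueued wait operation that closes the epoch. */
static void toy_wait_op(void *arg)
{
    ((toy_state *)arg)->epoch_closed = 1;
}

/* Stand-in for a kernel enqueued after the wait, such as compare. */
static void toy_compare_op(void *arg)
{
    toy_state *st = (toy_state *)arg;
    st->compare_saw_closed = st->epoch_closed;
}

/* Returns 0 if the model behaves as described, nonzero otherwise. */
static int toy_demo(void)
{
    toy_stream s;
    toy_state st = {0, 0};
    s.head = 0;
    s.tail = 0;

    toy_enqueue(&s, toy_wait_op, &st);
    toy_enqueue(&s, toy_compare_op, &st);
    if (st.epoch_closed != 0) return 1;       /* enqueueing alone completes nothing */

    toy_synchronize(&s);
    if (st.epoch_closed != 1) return 2;       /* completion is known only after the sync */
    if (st.compare_saw_closed != 1) return 3; /* later work ran after the wait */
    return 0;
}
```

Calling toy_demo() from a main function returns 0 in this model. The analogy to the text above is that the host may rely on the epoch being closed only after it has synchronized with the stream, while work enqueued later on the same stream is ordered after the wait.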
Return Values
None.
Examples
C/C++ Example
Example code snippets showing the usage of basic stream-triggered communication operations, compared with regular (non-GPU stream-aware) MPI active RMA operations.
Non-GPU Stream-aware Example
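for (int i = 0; i < iterations; i++) {
    /* Open the exposure epoch on the local window. */
    MPI_Win_post(group,0,win);
    increment<<<nb,nt,0,stream>>>(n,source);
    /* The host waits for the kernel before starting the access epoch. */
    CHECK(hipStreamSynchronize(stream));
    MPI_Win_start(group,0,win);
    MPI_Put(source,n,MPI_INT,left,fromL-base,n,MPI_INT,win);
    MPI_Put(source,n,MPI_INT,right,fromR-base,n,MPI_INT,win);
    /* Close the access epoch, then the exposure epoch, on the host. */
    MPI_Win_complete(win);
    MPI_Win_wait(win);
    compare<<<nb,nt,0,stream>>>(i,n,fromL);
    compare<<<nb,nt,0,stream>>>(i,n,fromR);
    /* The host waits for the compare kernels in every iteration. */
    CHECK(hipStreamSynchronize(stream));
}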
GPU Stream-aware Example
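for (int i = 0; i < iterations; i++) {
    /* Open the exposure epoch; the operation is enqueued on stream. */
    MPIX_Win_post_stream(group,win,stream);
    /* No hipStreamSynchronize here: the kernel is ordered on the same stream. */
    increment<<<nb,nt,0,stream>>>(n,source);
    MPI_Win_start(group,MPI_MODE_STREAM,win);
    MPI_Put(source,n,MPI_INT,left,fromL-base,n,MPI_INT,win);
    MPI_Put(source,n,MPI_INT,right,fromR-base,n,MPI_INT,win);
    /* Close the access epoch, then the exposure epoch, on stream. */
    MPIX_Win_complete_stream(win,stream);
    MPIX_Win_wait_stream(win,stream);
    compare<<<nb,nt,0,stream>>>(i,n,fromL);
    compare<<<nb,nt,0,stream>>>(i,n,fromR);
}
/* A single host-side synchronization follows the loop. */
CHECK(hipStreamSynchronize(stream));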