upc_all_broadcast
Date: 05-03-2013
NAME
upc_all_broadcast - collectively broadcasts shared memory
SYNOPSIS
#include <upc.h>
#include <upc_collective.h>
void upc_all_broadcast( shared void * restrict dst, shared const void * restrict src,
                        size_t nbytes, upc_flag_t flags );
IMPLEMENTATION
Cray Linux Environment (CLE)
DESCRIPTION
The upc_all_broadcast collective function copies a block of memory (src) with affinity to a single thread to a block of shared memory (dst) on each thread. The number of bytes in each block is nbytes, where nbytes > 0.
The upc_all_broadcast function treats the src pointer as if it pointed to a shared memory area with the type:
shared [] char[nbytes]
The effect is equivalent to copying the entire array pointed to by src to each block of nbytes bytes of a shared array dst with the type:
shared [nbytes] char[nbytes * THREADS]
The target of the dst pointer must have affinity to thread 0. The dst pointer is treated as if it has phase 0.
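The copy semantics described above can be sketched as a serial model in plain C. This only illustrates where the bytes end up, not how the library moves them. The function name model_broadcast and its nthreads parameter (standing in for THREADS) are hypothetical, and ordinary char buffers stand in for the shared arrays.

```c
#include <stddef.h>
#include <string.h>

/*
 * Serial model of the data movement only (not the Cray implementation).
 * dst models a shared [nbytes] char[nbytes * nthreads] array laid out
 * contiguously, so block t stands for the block with affinity to thread t.
 * nthreads is a hypothetical stand-in for THREADS.
 */
static void model_broadcast(char *dst, const char *src,
                            size_t nbytes, size_t nthreads)
{
    for (size_t t = 0; t < nthreads; ++t) {
        /* Every destination block receives the same nbytes bytes from src. */
        memcpy(dst + t * nbytes, src, nbytes);
    }
}
```

For instance, with nbytes = 4 and nthreads = 3, a 4-byte source is replicated into bytes 0-3, 4-7, and 8-11 of dst.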
Controlling Data Synchronization
The flags argument is of type upc_flag_t and specifies the data synchronization semantics of the collective function. Its value is formed by combining, with bitwise OR, one constant of the form UPC_IN_XSYNC and one constant of the form UPC_OUT_YSYNC, where X and Y may each be NO, MY, or ALL. If X is:
- NO
The function may begin to read or write data when the first thread has entered the collective function call.
- MY
The function may begin to read or write only data which has affinity to threads that have entered the collective function call.
- ALL
The function may begin to read or write data only after all threads have entered the function call.
And if Y is:
- NO
The function may read and write data until the last thread has returned from the collective function call.
- MY
The function call may return in a thread only after all reads and writes of data with affinity to the thread are complete.
- ALL
The function call may return only after all reads and writes of data are complete.
For further information, see upc_flag_t(3c).
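The way an input mode and an output mode share one flags value can be illustrated with a small plain C model. Everything named MODEL_* or model_* below is made up for this sketch, including the bit values. Real programs should use the UPC_IN_XSYNC and UPC_OUT_YSYNC constants supplied by the UPC headers, whose actual values are not assumed here.

```c
/*
 * Illustrative model only: these names and bit values are hypothetical
 * and are NOT the values used by any UPC implementation.
 */
enum {
    MODEL_IN_NOSYNC   = 1 << 0,
    MODEL_IN_MYSYNC   = 1 << 1,
    MODEL_IN_ALLSYNC  = 1 << 2,
    MODEL_OUT_NOSYNC  = 1 << 3,
    MODEL_OUT_MYSYNC  = 1 << 4,
    MODEL_OUT_ALLSYNC = 1 << 5
};

#define MODEL_IN_MASK  (MODEL_IN_NOSYNC | MODEL_IN_MYSYNC | MODEL_IN_ALLSYNC)
#define MODEL_OUT_MASK (MODEL_OUT_NOSYNC | MODEL_OUT_MYSYNC | MODEL_OUT_ALLSYNC)

typedef int model_flag_t;  /* hypothetical stand-in for upc_flag_t */

/* Extract the entry (IN) synchronization mode from a combined value. */
static int model_in_mode(model_flag_t flags)
{
    return flags & MODEL_IN_MASK;
}

/* Extract the exit (OUT) synchronization mode from a combined value. */
static int model_out_mode(model_flag_t flags)
{
    return flags & MODEL_OUT_MASK;
}
```

In this model, MODEL_IN_NOSYNC | MODEL_OUT_ALLSYNC carries both choices independently: the IN part governs when data may first be touched, and the OUT part governs when the call may return.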
EXAMPLES
Example 1:
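#include <upc.h>
#include <upc_collective.h>
shared int A[THREADS];
shared int B[THREADS];
int main(void) {
    // Initialize A (illustrative values); each thread writes its own element.
    A[MYTHREAD] = MYTHREAD;
    // With NOSYNC flags, the barriers order the broadcast against other accesses.
    upc_barrier;
    // Broadcast A[1] to every element of B (requires THREADS >= 2).
    upc_all_broadcast( B, &A[1], sizeof(int),
                       UPC_IN_NOSYNC | UPC_OUT_NOSYNC );
    upc_barrier;
    return 0;
}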
Example 2:
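#include <upc.h>
#include <upc_collective.h>
#define NELEMS 10
shared [] int A[NELEMS];
shared [NELEMS] int B[NELEMS*THREADS];
int main(void) {
    // Initialize A (illustrative values); all of A has affinity to thread 0.
    if (MYTHREAD == 0)
        for (int i = 0; i < NELEMS; i++)
            A[i] = i;
    // With ALLSYNC flags, no explicit barriers are needed around the call.
    upc_all_broadcast( B, A, sizeof(int)*NELEMS,
                       UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC );
    return 0;
}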
Example 3: Broadcasts (A[3], A[4]) to (B[0], B[1]), (B[10], B[11]), (B[20], B[21]), …, (B[NELEMS*(THREADS-1)], B[NELEMS*(THREADS-1)+1]).
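#include <upc.h>
#include <upc_collective.h>
#define NELEMS 10
shared [NELEMS] int A[NELEMS*THREADS];
shared [NELEMS] int B[NELEMS*THREADS];
int main(void) {
    // Initialize A (illustrative values); each thread writes its own block.
    for (int i = 0; i < NELEMS; i++)
        A[MYTHREAD*NELEMS + i] = MYTHREAD*NELEMS + i;
    upc_barrier;
    // The source is A[3] and A[4], both with affinity to thread 0.
    upc_all_broadcast( B, &A[3], sizeof(int)*2,
                       UPC_IN_NOSYNC | UPC_OUT_NOSYNC );
    upc_barrier;
    return 0;
}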
SEE ALSO
intro_pgas(7), upc_all_exchange(3c), upc_all_gather(3c), upc_all_gather_all(3c), upc_all_permute(3c), upc_all_reduce(3c), upc_all_scatter(3c), upc_flag_t(3c)