upc_all_reduce
Date: 08-26-2014
NAME
upc_all_reduce, upc_all_prefix_reduce - Collective computation operations
SYNOPSIS
#include <upc.h>
#include <upc_collective.h>
void upc_all_reduceT( shared void * restrict dst, shared const void * restrict src,
                      upc_op_t op, size_t nelems, size_t blk_size,
                      TYPE(*func)(TYPE, TYPE), upc_flag_t flags );
void upc_all_prefix_reduceT( shared void * restrict dst, shared const void * restrict src,
                             upc_op_t op, size_t nelems, size_t blk_size,
                             TYPE(*func)(TYPE, TYPE), upc_flag_t flags );
IMPLEMENTATION
Cray Linux Environment (CLE)
DESCRIPTION
The function prototypes above represent the 22 variations of the upc_all_reduceT and upc_all_prefix_reduceT functions (11 types for each of the two functions), where T and TYPE have the following correspondences:
----------------------
T TYPE
----------------------
C signed char
UC unsigned char
S signed short
US unsigned short
I signed int
UI unsigned int
L signed long
UL unsigned long
F float
D double
LD long double
----------------------
The upc_all_reduceT function reduces nelems values located at shared memory area src into exactly one result located at dst. In other words, on completion of upc_all_reduceT, dst[0] = src[0] @ src[1] @ … @ src[nelems - 1] where “@” is the operator specified by the variable op. See upc_op_t(3c) for valid values of op.
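The reduction formula above can be modeled serially in plain C. The following is only a sketch of the result that upc_all_reduceL is described as producing, not UPC code and not the Cray implementation. The names long_op_fn, add_long, and serial_reduce_long are hypothetical helpers introduced for illustration. The real collective runs across all threads and need not combine values in this left-to-right order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary-operator type standing in for "@" on type long. */
typedef long (*long_op_fn)(long, long);

/* Hypothetical operator: the serial counterpart of UPC_ADD. */
static long add_long(long a, long b) { return a + b; }

/* Serial model (assumption, not a UPC API):
   returns src[0] @ src[1] @ ... @ src[nelems - 1]. */
static long serial_reduce_long(const long *src, size_t nelems, long_op_fn op)
{
    assert(nelems > 0);            /* at least one element is required */
    long acc = src[0];
    for (size_t i = 1; i < nelems; i++)
        acc = op(acc, src[i]);     /* fold the next element into the result */
    return acc;
}
```

For example, with src = {1, 2, 3, 4} and add_long, this model yields 10, which is the value the description above assigns to dst[0].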
The upc_all_prefix_reduceT function reduces nelems values located at shared memory area src into nelems partial results at dst. In other words, on completion of upc_all_prefix_reduceT, dst[0] = src[0] and dst[i] = dst[i - 1] @ src[i] where “@” is the operator specified by the variable op. See upc_op_t(3c) for valid values of op.
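The prefix recurrence above can likewise be modeled serially in plain C. Again this is a sketch under stated assumptions, not UPC code: prefix_op_fn, sum_long, and serial_prefix_reduce_long are hypothetical names, and ordinary arrays stand in for the shared src and dst areas.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary-operator type standing in for "@" on type long. */
typedef long (*prefix_op_fn)(long, long);

/* Hypothetical operator: the serial counterpart of UPC_ADD. */
static long sum_long(long a, long b) { return a + b; }

/* Serial model (assumption, not a UPC API):
   dst[0] = src[0], and dst[i] = dst[i - 1] @ src[i] for i >= 1. */
static void serial_prefix_reduce_long(long *dst, const long *src,
                                      size_t nelems, prefix_op_fn op)
{
    assert(nelems > 0);                    /* at least one element is required */
    dst[0] = src[0];
    for (size_t i = 1; i < nelems; i++)
        dst[i] = op(dst[i - 1], src[i]);   /* running partial result */
}
```

For example, with src = {1, 2, 3, 4} and sum_long, this model fills dst with {1, 3, 6, 10}.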
EXAMPLES
Example 1: upc_all_reduce on type long with the UPC_ADD operation.
#include <upc.h>
#include <upc_collective.h>
#define BLK_SIZE 3
#define NELEMS 10
shared [BLK_SIZE] long A[NELEMS*THREADS];
shared long B;
// Initialize A. The result below is defined only on thread 0.
upc_barrier;
upc_all_reduceL( &B, A, UPC_ADD, NELEMS*THREADS, BLK_SIZE,
NULL, UPC_IN_NOSYNC | UPC_OUT_NOSYNC );
upc_barrier;
Example 2: upc_all_prefix_reduce on type long with the UPC_ADD operation.
#include <upc.h>
#include <upc_collective.h>
#define BLK_SIZE 3
#define NELEMS 10
shared [BLK_SIZE] long A[NELEMS*THREADS];
shared [BLK_SIZE] long B[NELEMS*THREADS];
// Initialize A.
upc_barrier;
upc_all_prefix_reduceL( B, A, UPC_ADD, NELEMS*THREADS, BLK_SIZE,
NULL, UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC );
upc_barrier;
SEE ALSO
intro_pgas(7), upc_all_broadcast(3c), upc_all_exchange(3c), upc_all_gather(3c), upc_all_gather_all(3c), upc_all_permute(3c), upc_all_scatter(3c), upc_flag_t(3c), upc_op_t(3c)