upc_all_reduce

Date:

08-26-2014

NAME

upc_all_reduce, upc_all_prefix_reduce - Collective computation operations

SYNOPSIS

#include <upc.h>
#include <upc_collective.h>

void upc_all_reduceT( shared void * restrict dst, shared const void * restrict src,
                                             upc_op_t op, size_t nelems, size_t blk_size,
                                             TYPE(*func)(TYPE, TYPE), upc_flag_t flags )

void upc_all_prefix_reduceT( shared void * restrict dst, shared const void * restrict src,
                                             upc_op_t op, size_t nelems, size_t blk_size,
                                             TYPE(*func)(TYPE, TYPE), upc_flag_t flags )

IMPLEMENTATION

Cray Linux Environment (CLE)

DESCRIPTION

The function prototypes above represent the 22 variations of the upc_all_reduceT and upc_all_prefix_reduceT functions (11 types for each of the two functions), where T and TYPE correspond as follows:

----------------------
T     TYPE
----------------------
C     signed char
UC    unsigned char
S     signed short
US    unsigned short
I     signed int
UI    unsigned int
L     signed long
UL    unsigned long
F     float
D     double
LD    long double
----------------------

The upc_all_reduceT function reduces the nelems values in the shared memory area src to exactly one result, stored at dst. In other words, on completion of upc_all_reduceT, dst[0] = src[0] @ src[1] @ ... @ src[nelems - 1], where “@” is the operator specified by op. See upc_op_t(3c) for valid values of op.
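To make the definition concrete, the following sequential C sketch computes the value that upc_all_reduceL (T = L, TYPE = long) leaves in dst[0]. It is an illustrative model only, not the library implementation: the names model_reduceL, reduce_fn and add_long are hypothetical, plain arrays stand in for the shared areas, and the fold runs strictly left to right, whereas a real implementation may combine partial results in a different order when op is commutative (which can change floating-point results).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer type for the operator "@" when TYPE is long. */
typedef long (*reduce_fn)(long, long);

/* Hypothetical stand-in for the built-in UPC_ADD operator. */
static long add_long(long a, long b) { return a + b; }

/* Sequential model of upc_all_reduceL:
 * dst[0] = src[0] @ src[1] @ ... @ src[nelems - 1].
 * No shared memory, blocking, or synchronization is modeled. */
static void model_reduceL(long *dst, const long *src, size_t nelems, reduce_fn op)
{
    assert(dst != NULL && src != NULL && nelems > 0);
    long acc = src[0];                 /* start from the first element */
    for (size_t i = 1; i < nelems; i++)
        acc = op(acc, src[i]);         /* fold the remaining elements in order */
    dst[0] = acc;                      /* exactly one result */
}
```

With add_long as the operator, a src holding 1, 2, ..., 10 yields dst[0] == 55.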

The upc_all_prefix_reduceT function reduces the nelems values in the shared memory area src to nelems partial results, stored at dst. In other words, on completion of upc_all_prefix_reduceT, dst[0] = src[0] and dst[i] = dst[i - 1] @ src[i] for 0 < i < nelems, where “@” is the operator specified by op. See upc_op_t(3c) for valid values of op.
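The recurrence can be modeled the same way. This is a sequential C sketch of what upc_all_prefix_reduceL stores in dst, under the same assumptions as any such model: the names model_prefix_reduceL, prefix_fn and plus_long are hypothetical, plain arrays replace the shared areas, and no data distribution or synchronization is represented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer type for the operator "@" when TYPE is long. */
typedef long (*prefix_fn)(long, long);

/* Hypothetical stand-in for the built-in UPC_ADD operator. */
static long plus_long(long a, long b) { return a + b; }

/* Sequential model of upc_all_prefix_reduceL: dst[i] receives the
 * reduction of src[0] through src[i]. */
static void model_prefix_reduceL(long *dst, const long *src,
                                 size_t nelems, prefix_fn op)
{
    assert(dst != NULL && src != NULL);
    if (nelems == 0)
        return;
    dst[0] = src[0];                         /* dst[0] = src[0] */
    for (size_t i = 1; i < nelems; i++)
        dst[i] = op(dst[i - 1], src[i]);     /* dst[i] = dst[i - 1] @ src[i] */
}
```

For src = { 3, 1, 4, 1, 5 } and plus_long, dst becomes { 3, 4, 8, 9, 14 }. The last element, dst[nelems - 1], equals the single result that the reduce model produces for the same src and operator.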

EXAMPLES

Example 1: upc_all_reduceL (TYPE long) with op UPC_ADD.

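#include <upc.h>
#include <upc_collective.h>
#define BLK_SIZE 3
#define NELEMS 10
shared [BLK_SIZE] long A[NELEMS*THREADS];
shared long B;    /* shared scalar: has affinity to thread 0 */

int main( void )
{
    int i;
    /* Each thread initializes the elements of A that have affinity to it. */
    upc_forall( i = 0; i < NELEMS*THREADS; i++; &A[i] )
        A[i] = i;
    upc_barrier;
    /* func is NULL because UPC_ADD is a built-in operator. */
    upc_all_reduceL( &B, A, UPC_ADD, NELEMS*THREADS, BLK_SIZE,
                     NULL, UPC_IN_NOSYNC | UPC_OUT_NOSYNC );
    upc_barrier;
    /* B now holds the sum of all elements of A. */
    return 0;
}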
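In the example above, blk_size is passed the same BLK_SIZE used to declare A. As we read the UPC collectives specification, blk_size describes the block size of src (and, for upc_all_prefix_reduceT, of dst as well), so it must match the array declaration. The plain C sketch below, built around a hypothetical helper owner_thread, shows the block-cyclic layout such a declaration implies: consecutive blocks of blk_size elements are dealt to threads 0, 1, ..., THREADS - 1 in turn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the thread that owns element i of an array declared
 * shared [blk_size] T A[...] in a run with nthreads threads.
 * Block number i / blk_size is assigned round-robin across the threads. */
static size_t owner_thread(size_t i, size_t blk_size, size_t nthreads)
{
    assert(blk_size > 0 && nthreads > 0);
    return (i / blk_size) % nthreads;
}
```

For example, with BLK_SIZE 3 and a hypothetical run on 4 threads, elements 0 to 2 of A live on thread 0, elements 3 to 5 on thread 1, element 12 wraps back to thread 0, and the last element, 39, lives on thread 1.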

Example 2: upc_all_prefix_reduceL (TYPE long) with op UPC_ADD.

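#include <upc.h>
#include <upc_collective.h>
#define BLK_SIZE 3
#define NELEMS 10
shared [BLK_SIZE] long A[NELEMS*THREADS];
shared [BLK_SIZE] long B[NELEMS*THREADS];

int main( void )
{
    int i;
    /* Each thread initializes the elements of A that have affinity to it. */
    upc_forall( i = 0; i < NELEMS*THREADS; i++; &A[i] )
        A[i] = i;
    upc_barrier;
    /* func is NULL because UPC_ADD is a built-in operator. */
    upc_all_prefix_reduceL( B, A, UPC_ADD, NELEMS*THREADS, BLK_SIZE,
                            NULL, UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC );
    upc_barrier;
    /* B[i] now holds A[0] + A[1] + ... + A[i]. */
    return 0;
}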

SEE ALSO

intro_pgas(7), upc_all_broadcast(3c), upc_all_exchange(3c), upc_all_gather(3c), upc_all_gather_all(3c), upc_all_permute(3c), upc_all_scatter(3c), upc_flag_t(3c), upc_op_t(3c)