buffered_async

Date:

11-15-2018

NAME

buffered_async - Batch PGAS operations into bulk data transfers

SYNOPSIS

#pragma pgas buffered_async
!DIR$ PGAS BUFFERED_ASYNC

IMPLEMENTATION

Cray Linux Environment (CLE) on Cray XC systems

DESCRIPTION

PGAS data references made by the single statement immediately following the pgas buffered_async directive will be batched into bulk data transfers.

Before using this directive, the user should port their code to use the defer_sync directive.

No ordering or correctness guarantees are made between buffered async (BA) and non-BA references, and no ordering guarantees are made between BA references themselves. Users who require ordering must insert a fence or barrier. This directive allows the compiler to violate language ordering semantics.
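
The ordering rule above can be sketched as follows: a batch of BA reads is issued, a single fence follows the whole batch, and only then are the results consumed. This is a minimal sketch, not a definitive implementation. The names gather_sum, SHARED and BA_FENCE are hypothetical and introduced only for illustration. The plain C fallback branch is an assumption added so the sketch also builds without a UPC compiler, in which case the pragma is ignored and the fence is a no-op.

```c
#include <assert.h>

#ifdef __UPC__                 /* predefined by UPC compilers */
#include <upc.h>
#define SHARED shared
#define BA_FENCE() upc_fence
#else                          /* assumption: plain C fallback, no PGAS runtime */
#define SHARED
#define BA_FENCE() ((void)0)
#endif

/* Hypothetical helper: gather table[idx[i]] into val[i], then sum val[]. */
long gather_sum(int *val, SHARED int *table, const int *idx, int n, int tabsize)
{
    long sum = 0;
    int i;

    for (i = 0; i < n; ++i) {
        assert(idx[i] >= 0 && idx[i] < tabsize);
        #pragma pgas buffered_async
        val[i] = table[idx[i]];   /* BA read: no ordering or completion guarantee yet */
    }

    BA_FENCE();   /* one fence after the whole batch; val[] may be read below */

    for (i = 0; i < n; ++i)
        sum += val[i];
    return sum;
}
```

Reading val[i] inside the first loop, before the fence, would be a non-BA reference racing with a BA reference, which is exactly the case for which no guarantee is made.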

Both fence and barrier imply global visibility for BA references. It is the user’s responsibility to ensure BA references do not target overlapping memory.
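
One way to honor the non-overlap requirement for BA writes is to validate the target indices before issuing the batch. The sketch below assumes the requirement applies at least to writes, and it checks only the index list of the calling thread; overlap between different threads remains the user's responsibility. The names scatter_unique, indices_are_distinct, SHARED and BA_FENCE are hypothetical, and the plain C fallback branch is an assumption added so the sketch builds without a UPC compiler.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef __UPC__                 /* predefined by UPC compilers */
#include <upc.h>
#define SHARED shared
#define BA_FENCE() upc_fence
#else                          /* assumption: plain C fallback, no PGAS runtime */
#define SHARED
#define BA_FENCE() ((void)0)
#endif

/* Hypothetical helper: 1 if idx[0..n-1] are distinct and inside [0, tabsize), else 0. */
static int indices_are_distinct(const int *idx, int n, int tabsize)
{
    int i, ok = 1;
    unsigned char *seen;

    assert(tabsize > 0);
    seen = calloc((size_t)tabsize, 1);
    if (seen == NULL)
        return 0;                 /* cannot verify, so report failure */
    for (i = 0; i < n && ok; ++i) {
        if (idx[i] < 0 || idx[i] >= tabsize || seen[idx[i]])
            ok = 0;               /* out of range or duplicate target */
        else
            seen[idx[i]] = 1;
    }
    free(seen);
    return ok;
}

/* Hypothetical helper: BA scatter of val[i] to table[idx[i]].
   Returns 0 on success, -1 if the targets overlap (nothing is written). */
int scatter_unique(SHARED int *table, const int *val, const int *idx,
                   int n, int tabsize)
{
    int i;

    if (!indices_are_distinct(idx, n, tabsize))
        return -1;
    for (i = 0; i < n; ++i) {
        #pragma pgas buffered_async
        table[idx[i]] = val[i];   /* BA write to a distinct element */
    }
    BA_FENCE();                   /* writes are globally visible after the fence */
    return 0;
}
```

The check costs one pass over the indices plus a temporary array of tabsize bytes, so it is better suited to debugging builds than to the performance-critical path.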

No automatic progress guarantees are made. Progress is guaranteed only while both the source and the target are actively making BA references or are inside a barrier or fence. User-implemented spin-wait routines may therefore deadlock.

The purpose of the buffered_async directive is to achieve higher performance by batching small references into bulk data transfers. It should be applied only to references that target non-contiguous, irregular memory, where the compiler is unable to pattern match to an optimized communication pattern.

Batching adds overhead, so care should be taken to ensure that many thousands of BA operations take place before each fence.

Using BA references may greatly increase the application’s memory footprint. More information about controlling internal buffer sizes can be found in the intro_pgas(7) manpage.

EXAMPLES

Example 1: UPC random GETs

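#include <stdlib.h>   /* rand() */
#include <upc.h>

/* Gather 'size' randomly chosen elements of 'table' into 'val'.
   tabsize must be a power of two for the index mask to be valid. */
void bulk_get( int *val, shared int *table, int size, int tabsize, int VLEN ) {
  int i, j;
  for ( i=0; i < size; i += VLEN ) {            /* walk val in blocks of VLEN */
    for ( j=0; (j < VLEN) && ((i+j) < size); ++j ) {
      #pragma pgas buffered_async
      val[i+j] = table[rand() & (tabsize-1)];
    }
  }
  /* val[] is not guaranteed valid until the caller issues a fence or barrier. */
}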

SEE ALSO

intro_directives(7), defer_sync(7), intro_pgas(7)