blockable
Date: 09-08-2014
NAME
blockable - specifies that it is legal to cache block the subsequent loops
SYNOPSIS
!DIR$ BLOCKABLE (do_variable,do_variable [,do_variable]... )
#pragma _CRI blockable(num_loops)
IMPLEMENTATION
Cray Linux Environment (CLE)
DESCRIPTION
The BLOCKABLE directive specifies that it is legal and desirable to cache block the subsequent loop nest, even when the compiler has not made such a determination. To be legally blockable, the nest must meet three conditions. It must be perfect, with no code between its constituent loops. It must be rectangular, meaning the trip counts of its member loops are fixed over the lifetime of the nest. It must be fully permutable, meaning loop interchange and unrolling are legal at all levels. This directive both permits and requests blocking of the indicated loop nest.
The Fortran directive arguments are a comma-delimited list of two or more loop control variables, do_variable.
The C directive argument is the number of subsequent loops to be blocked, num_loops.
If a BLOCKINGSIZE directive is also provided for a loop in the indicated nest, the following rules apply:
If blockingsize is at least two, the indicated blockingsize is used.
If blockingsize is zero, the loop itself is not blocked, and it is treated as an inner loop (part of the nest that traverses a cache-block tile).
If blockingsize is one, the loop itself is not blocked, and it is treated as an outer loop (part of the nest that moves from tile to tile).
When no BLOCKINGSIZE directive is supplied, the compiler chooses the blocking size according to its own heuristics.
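For a two-deep nest, the three rules can be modeled by treating each loop's blocking size as a tile width. Under this model, 0 means a single tile that spans the whole loop, and 1 means tiles that are one iteration wide. The C sketch below is an illustrative model only and is not the compiler's algorithm. The helper names tile_width and blocked_order are hypothetical. Placing both tile loops outside both within-tile loops is also an assumption, chosen because it agrees with the hand-written equivalents in Examples 3 to 5.

```c
#include <assert.h>
#include <stddef.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Hypothetical model of BLOCKINGSIZE for one loop of trip count n:
     bs >= 2 : tiles of bs iterations (the loop is blocked)
     bs == 0 : one tile spanning the whole loop, so the loop runs
               entirely within a tile (treated as an inner loop)
     bs == 1 : tiles one iteration wide, so the loop steps from tile
               to tile (treated as an outer loop) */
static int tile_width(int n, int bs) {
    assert(bs >= 0);
    return bs == 0 ? n : bs;
}

/* Records, in visit order, the (outer, inner) index pairs of a two-deep
   nest with trip counts n_o and n_i, blocked with sizes bs_o and bs_i.
   Both tile loops come first, then both within-tile loops (an assumed
   ordering). Returns the number of pairs written, n_o * n_i. */
size_t blocked_order(int n_o, int bs_o, int n_i, int bs_i,
                     int *out_o, int *out_i) {
    int w_o = tile_width(n_o, bs_o);
    int w_i = tile_width(n_i, bs_i);
    size_t k = 0;
    for (int to = 0; to < n_o; to += w_o)           /* tile loop, outer */
        for (int ti = 0; ti < n_i; ti += w_i)       /* tile loop, inner */
            for (int o = to; o < min_int(n_o, to + w_o); ++o)      /* in tile */
                for (int i = ti; i < min_int(n_i, ti + w_i); ++i) {
                    out_o[k] = o;
                    out_i[k] = i;
                    ++k;
                }
    return k;
}
```

Consider blocked_order(3, 0, 4, 2, ...), which corresponds to blockingsize(0) on the outer loop. It visits all three outer iterations within each two-wide inner tile before moving on to the next tile, which mirrors the EX0m equivalent in Example 3. Now consider blocked_order(4, 2, 3, 1, ...). Here the inner loop advances one iteration per tile, so it becomes a tile-to-tile loop, as in Example 4.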
EXAMPLES
Example 1: blockable and blockingsize Directives
% cat blk.c
#define N 1000
float A[N+1][N+1];
float B[N+1][N+1];   /* N+1: the loops read B[i+1][j+1] with i and j up to N-1 */
void
func(int n)
{
#pragma _CRI blockable(2)
#pragma _CRI blockingsize( 32 )
for (int i = 2; i <= N-1; ++i) {
#pragma _CRI blockingsize( 128 )
for (int j = 2; j <= N-1; ++j) {
A[i][j] = B[i-1][j-1]
+ B[i-1][j+1]
+ B[i+1][j-1]
+ B[i+1][j+1];
}
}
}
% cc -c -hlist=md blk.c
% cat blk.lst
...
7. func(int n)
8. {
9. #pragma _CRI blockable(2)
10. #pragma _CRI blockingsize( 32 )
11. + b-------< for (int i = 2; i <= N-1; ++i) {
12. b #pragma _CRI blockingsize( 128 )
13. b Vbr4--< for (int j = 2; j <= N-1; ++j) {
14. b Vbr4 A[i][j] = B[i-1][j-1]
15. b Vbr4 + B[i-1][j+1]
16. b Vbr4 + B[i+1][j-1]
17. b Vbr4 + B[i+1][j+1];
18. b Vbr4--> }
19. b-------> }
20. }
CC-6294 CC: VECTOR File = blk.c, Line = 11
A loop was not vectorized because a better candidate was found at line 13.
CC-6051 CC: SCALAR File = blk.c, Line = 11
A loop was blocked according to user directive with block size 32.
CC-6051 CC: SCALAR File = blk.c, Line = 13
A loop was blocked according to user directive with block size 128.
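The blocking reported above can be written out by hand, in the same style as the Fortran equivalent in Example 5. The C sketch below is one plausible source-level equivalent and not the code the compiler actually generates. It assumes that both tile loops (i in steps of 32, j in steps of 128) are placed outside the loops that traverse a tile. The names func_blocked, init_b, BI, and BJ are hypothetical.

```c
#define N 1000
#define BI 32   /* tile width for the i-loop, from blockingsize( 32 ) */
#define BJ 128  /* tile width for the j-loop, from blockingsize( 128 ) */

/* Sized N+1 so that the i+1 and j+1 accesses stay in bounds. */
static float A[N + 1][N + 1];
static float B[N + 1][N + 1];

static int min_int(int a, int b) { return a < b ? a : b; }

/* Fill B with a simple pattern so that results are easy to check:
   each A[i][j] computed below then equals 4 * (i + j). */
void init_b(void) {
    for (int i = 0; i <= N; ++i)
        for (int j = 0; j <= N; ++j)
            B[i][j] = (float)(i + j);
}

/* Hand-blocked version of func(): the two tile loops step over
   32 x 128 tiles, and the two inner loops traverse one tile, clipped
   at N-1 for the final partial tiles. */
void func_blocked(void) {
    for (int it = 2; it <= N - 1; it += BI)
        for (int jt = 2; jt <= N - 1; jt += BJ)
            for (int i = it; i <= min_int(N - 1, it + BI - 1); ++i)
                for (int j = jt; j <= min_int(N - 1, jt + BJ - 1); ++j)
                    A[i][j] = B[i-1][j-1] + B[i-1][j+1]
                            + B[i+1][j-1] + B[i+1][j+1];
}
```

Each element is still computed exactly once by the same expression, so the results match the unblocked loop. Only the order in which elements are visited changes.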
...
Example 2: noblocking Directive
Change the value of N in the previous example from 1000 to 999999, and modify func() as shown below. Compile with -hlist=md and examine the listing for automatic blocking. In this example, no blocking occurs. The noblocking directive excludes the j-loop, which leaves only the i-loop, and at least two loops are required to cache block.
Blocking sizes 0 and 1, by contrast, allow a loop to “participate” in blocking without itself being blocked.
func(int n)
{
for (int i = 2; i <= N-1; ++i) {
#pragma _CRI noblocking
for (int j = 2; j <= N-1; ++j) {
...
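The reason a single loop cannot be cache blocked shows up when one loop is strip-mined into chunks without interchanging it with another loop. In that case the iterations are visited in exactly the original order, so the memory access pattern is unchanged. The C sketch below demonstrates this. The helper strip_mined_order is hypothetical and is meant only as an illustration.

```c
#include <stddef.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Strip-mines a single loop of n iterations into chunks of bs
   iterations (bs must be at least 1) and records the order in which
   the iterations are visited. Returns the number recorded. */
size_t strip_mined_order(int n, int bs, int *out) {
    size_t k = 0;
    for (int t = 0; t < n; t += bs)                  /* chunk loop */
        for (int i = t; i < min_int(n, t + bs); ++i) /* within chunk */
            out[k++] = i;
    return k;
}
```

For any chunk size, the recorded order is 0, 1, ..., n-1, which is the same order as the original loop. Reordering accesses to reuse cached data requires a second loop to interchange with.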
Example 3: blockingsize 0 Directive Followed by its Equivalent
% cat ex0.f90
subroutine EX0(A, B, n)
real A(n,n), B(n,n)
!dir$ blockable(i,j)
!dir$ blockingsize(0)
do j = 1, n-1
!dir$ blockingsize(512)
do i = 1, n
A(i,j) = B(i,j) + B(i,j+1)
enddo
enddo
end subroutine EX0
% cat ex0m.f90
subroutine EX0m(A, B, n)
real A(n,n), B(n,n)
do is = 1, n, 512
do j = 1, n-1
do i = is, min( n, is+511 )
A(i,j) = B(i,j) + B(i,j+1)
enddo
enddo
enddo
end subroutine EX0m
Notice that the j-loop remains undivided as it traverses the tile, while the i-loop is split into an outer loop (over tiles) and an inner loop (within a tile).
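The equivalence can be checked by porting both subroutines to C. The sketch below assumes column-major, 1-based indexing to mirror Fortran's A(i,j), which is provided by the hypothetical IDX macro. It also generalizes the tile width of 512 to a parameter bs so that small test arrays exercise more than one tile. The names ex0, ex0m, and IDX are illustrative only.

```c
#include <stddef.h>

/* Column-major, 1-based element (i, j) of an n x n array. */
#define IDX(i, j, n) ((size_t)((i) - 1) + (size_t)((j) - 1) * (size_t)(n))

static int min_int(int a, int b) { return a < b ? a : b; }

/* Direct port of EX0 (no blocking). */
void ex0(float *a, const float *b, int n) {
    for (int j = 1; j <= n - 1; ++j)
        for (int i = 1; i <= n; ++i)
            a[IDX(i, j, n)] = b[IDX(i, j, n)] + b[IDX(i, j + 1, n)];
}

/* Port of EX0m: the i-loop is split into tiles of bs iterations, and
   the j-loop (blockingsize(0)) runs its full range inside each tile. */
void ex0m(float *a, const float *b, int n, int bs) {
    for (int is = 1; is <= n; is += bs)
        for (int j = 1; j <= n - 1; ++j)
            for (int i = is; i <= min_int(n, is + bs - 1); ++i)
                a[IDX(i, j, n)] = b[IDX(i, j, n)] + b[IDX(i, j + 1, n)];
}
```

Both versions compute the same elements from the same operands, so their results agree. Only the traversal order differs.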
Example 4: blockingsize 1 Directive Followed by its Equivalent
% cat ex1.f90
subroutine EX1(A, B, n)
real A(n,n), B(n,n)
!dir$ blockable(i,j)
!dir$ blockingsize(512)
do j = 1, n
!dir$ blockingsize(1)
do i = 1, n-1
A(j,i) = B(j,i) + B(j,i+1)
enddo
enddo
end subroutine EX1
% cat ex1m.f90
subroutine EX1m(A, B, n)
real A(n,n), B(n,n)
do js = 1, n, 512
do i = 1, n-1
do j = js, min( n, js+511 )
A(j,i) = B(j,i) + B(j,i+1)
enddo
enddo
enddo
end subroutine EX1m
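Notice that in this example blockingsize(1) is applied to an inner loop: the i-loop becomes a tile-to-tile loop that runs outside the j-loop. By contrast, blockingsize(0) is typically applied to an outer loop, as in Example 3.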
Example 5: blockingsize >1 at Both Levels, Followed by Equivalent
% cat ex2.f90
subroutine EX2(A, B, n)
real A(n,n), B(n,n)
!dir$ blockable(i,j)
!dir$ blockingsize(32)
do j = 1, n-1
!dir$ blockingsize(128)
do i = 1, n-1
A(i,j) = B(i,j) + B(i+1,j) + B(i,j+1)
enddo
enddo
end subroutine EX2
% cat ex2m.f90
subroutine EX2m(A, B, n)
real A(n,n), B(n,n)
do js = 1, n-1, 32
do is = 1, n-1, 128
do j = js, min( n-1, js+31 )
do i = is, min( n-1, is+127 )
A(i,j) = B(i,j) + B(i+1,j) + B(i,j+1)
enddo
enddo
enddo
enddo
end subroutine EX2m
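The min() bounds in EX2m exist to clip the final tile of each loop when the trip count is not a multiple of the blocking size. As an assumed illustration, take n = 1000, so that each loop runs 999 iterations. The j-loop then splits into 32 tiles, the last of which is 7 iterations wide, and the i-loop splits into 8 tiles, the last of which is 103 iterations wide. The hypothetical helpers below reproduce this arithmetic.

```c
/* Number of tiles for a loop of `trip` iterations blocked with tile
   width bs (bs must be at least 1): a ceiling division. */
int tile_count(int trip, int bs) {
    return trip <= 0 ? 0 : (trip + bs - 1) / bs;
}

/* Width of the final, possibly partial, tile. This is the width that
   bounds such as min( n-1, js+31 ) clip the last tile to. */
int last_tile_width(int trip, int bs) {
    return trip <= 0 ? 0 : trip - (tile_count(trip, bs) - 1) * bs;
}
```

When the trip count is an exact multiple of the blocking size, the last tile is full width and the min() bound has no effect.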
SEE ALSO
intro_directives(7)
blockingsize(7)