RAJA Support in Gdb4hpc

Gdb4hpc includes support for debugging a parallel program that uses the RAJA Performance Portability Layer (https://raja.readthedocs.io/en/develop/).

Printing `RAJA::View`s

Starting with version 4.15.0, Gdb4hpc supports printing values wrapped in RAJA::Views.

Because views often wrap large amounts of data, Gdb4hpc does not print the entire contents of a RAJA::View by default. Instead, it prints the view’s bounds and shows how to access the view’s data with Gdb4hpc’s .. array range syntax.

# a 2D RAJA::View
dbg all> whatis Aview
a{0}: RAJA::View<double, RAJA::detail::LayoutBase_impl<camp::int_seq<long, 0l, 1l>, int, -1l> >

dbg all> p Aview
a{0}: RAJA::View; Use (0..3, 0..3) for full contents.

dbg all> p Aview(0..3, 0..3)
a{0}: {{1,1,1,1},{2,2,2,2},{3,3,3,3},{4,4,4,4}}

Like other array-like objects, the .. range syntax can be used to view a subset of the data:

dbg all> p Aview(2..3, 0..3)
a{0}: {{3,3,3,3},{4,4,4,4}}

dbg all> p Aview(0..3, 2..3)
a{0}: {{1,1},{2,2},{3,3},{4,4}}

Ranges are inclusive on both ends, so 2..3 selects indices 2 and 3.

Unlike other array-like objects, views are indexed with parentheses () instead of brackets []. This more closely matches how RAJA::Views are indexed in actual source code.
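The following minimal C++ sketch makes the inclusive-range semantics concrete. It mimics what p Aview(2..3, 0..3) selects from the 4 x 4 example above, where row i holds the value i + 1. The helper inclusive_slice is hypothetical and only models the indexing. It is not part of Gdb4hpc or RAJA.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: returns rows r0..r1 and columns c0..c1 of m,
// with both bounds inclusive, like Gdb4hpc's lo..hi range syntax.
Matrix inclusive_slice(const Matrix& m, int r0, int r1, int c0, int c1) {
  assert(r0 <= r1 && c0 <= c1);
  Matrix out;
  for (int i = r0; i <= r1; ++i) {      // <= because the upper bound is inclusive
    std::vector<double> row;
    for (int j = c0; j <= c1; ++j) {
      row.push_back(m.at(i).at(j));     // at() rejects out-of-range indices
    }
    out.push_back(row);
  }
  return out;
}

// Builds the 4 x 4 data from the example above: row i holds the value i + 1.
Matrix example_data() {
  Matrix m(4, std::vector<double>(4));
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j)
      m[i][j] = i + 1;
  return m;
}
```

A range lo..hi therefore covers hi - lo + 1 indices. For example, (2..3, 0..3) yields 2 rows of 4 values.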

Supported Variations

Gdb4hpc supports printing the following types of RAJA::Views.

Normal Views

Gdb4hpc supports printing normal views, created without any special permutation or offset syntax.

Example

// in a .cpp file
RAJA::View< double, RAJA::Layout<2, int> > Aview(a, Nx, Ny);

# in gdb4hpc

dbg all> whatis Aview
a{0}: RAJA::View<double, RAJA::detail::LayoutBase_impl<camp::int_seq<long, 0l, 1l>, int, -1l> >

dbg all> p Aview
a{0}: RAJA::View; Use (0..3, 0..3) for full contents.

dbg all> p Aview(0..3, 0..3)
a{0}: {{1,1,1,1},{2,2,2,2},{3,3,3,3},{4,4,4,4}}
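A normal view maps multi-dimensional indices onto the flat pointer it was constructed from. RAJA's default RAJA::Layout is row-major, meaning the rightmost index has unit stride. The sketch below shows that mapping for the 2D case, assuming Nx rows and Ny columns as in the constructor above. The names flat_index_2d and View2D are hypothetical, and this is not RAJA's implementation.

```cpp
#include <cassert>
#include <vector>

// Row-major 2D layout: (i, j) -> i * Ny + j.
int flat_index_2d(int i, int j, int Nx, int Ny) {
  assert(0 <= i && i < Nx && 0 <= j && j < Ny);  // bounds check for the sketch
  return i * Ny + j;
}

// Minimal view-like wrapper over a flat buffer, indexed with () like RAJA::View.
// Usage: std::vector<double> a(16); View2D v{a.data(), 4, 4}; double x = v(3, 0);
struct View2D {
  const double* data;
  int Nx, Ny;
  double operator()(int i, int j) const { return data[flat_index_2d(i, j, Nx, Ny)]; }
};
```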

Permuted Views

Gdb4hpc supports printing permuted views.

Example

// in a .cpp file
std::array<RAJA::idx_t, 3> perm3a {{2, 1, 0}};
RAJA::Layout< 3, int > perm3a_layout =
  RAJA::make_permuted_layout( {{Nx, Ny, Nz}}, perm3a);
RAJA::View< int, RAJA::Layout<3, int> > perm3a_view_3D(a, perm3a_layout);

# in gdb4hpc

dbg all> whatis perm3a_view_3D
a{0}: RAJA::View<int, RAJA::detail::LayoutBase_impl<camp::int_seq<long, 0l, 1l, 2l>, int, -1l> >

dbg all> p perm3a_view_3D
a{0}: RAJA::View; Use (0..2, 0..4, 0..1) for full contents.

dbg all> p perm3a_view_3D(0..2, 0..4, 0..1)
a{0}: {{{0,15},{3,18},{6,21},{9,24},{12,27}},{{1,16},{4,19},{7,22},{10,25},{13,28}},{{2,17},{5,20},{8,23},{11,26},{14,29}}}
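This printout can be checked by hand. The printed bounds imply extents {Nx, Ny, Nz} = {3, 5, 2}. With the permutation {2, 1, 0}, the first index becomes the stride-1 dimension, so element (i, j, k) sits at offset i + 3*j + 15*k. Assuming a[n] == n, which is consistent with the output above, perm3a_view_3D(0, 0, 1) is 15.

Below is a minimal C++ sketch of the stride computation. It assumes the permutation lists dimensions from largest stride to unit stride. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// Strides for a permuted 3D layout. perm lists dimensions from largest stride
// to unit stride, so its last entry is the stride-1 dimension.
std::array<long, 3> permuted_strides(const std::array<long, 3>& extents,
                                     const std::array<int, 3>& perm) {
  std::array<long, 3> strides{};
  long stride = 1;
  for (int p = 2; p >= 0; --p) {           // walk from the stride-1 end of perm
    const int dim = perm[p];
    assert(0 <= dim && dim < 3);
    strides[dim] = stride;
    stride *= extents[dim];
  }
  return strides;
}

// Flat offset of element (i, j, k) given per-dimension strides.
long flat_index_3d(const std::array<long, 3>& strides, long i, long j, long k) {
  return i * strides[0] + j * strides[1] + k * strides[2];
}
```

With the identity permutation {0, 1, 2}, the same function yields ordinary row-major strides {10, 2, 1} for these extents.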

Offset Views

Gdb4hpc supports printing offset views.

Example

// in a .cpp file
RAJA::OffsetLayout<2, int> offlayout_2D =
  RAJA::make_offset_layout<2, int>( {{-1, -5}}, {{2, 5}} );

RAJA::View< int, RAJA::OffsetLayout<2, int> > aoview_2Doff(ao,
                                                           offlayout_2D);

# in gdb4hpc

dbg all> whatis aoview_2Doff
a{0}: RAJA::View<int, RAJA::OffsetLayout<2ul, int> >

dbg all> p aoview_2Doff
a{0}: RAJA::View; Use (-1..1, -5..4) for full contents.

dbg all> p aoview_2Doff(-1..1, -5..4)
a{0}: {{0,1,2,3,4,5,6,7,8,9},{10,11,12,13,14,15,16,17,18,19},{20,21,22,23,24,25,26,27,28,29}}
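These bounds follow from the layout arguments if the upper bounds passed to make_offset_layout are treated as exclusive. Under that reading, {{-1, -5}} to {{2, 5}} gives rows -1..1 and columns -5..4, or 30 elements in total, which is consistent with the printout.

The sketch below shows the index shift under that assumption. OffsetLayout2D is a hypothetical struct, not RAJA's class. For data filled with 0..29, it reproduces aoview_2Doff(0, 0) == 15.

```cpp
#include <array>
#include <cassert>

// 2D offset layout with lower bounds lo and exclusive upper bounds hi,
// stored row-major (rightmost index has unit stride).
struct OffsetLayout2D {
  std::array<int, 2> lo, hi;

  // Shift each index by its lower bound, then apply row-major strides.
  int flat_index(int i, int j) const {
    assert(lo[0] <= i && i < hi[0] && lo[1] <= j && j < hi[1]);
    return (i - lo[0]) * (hi[1] - lo[1]) + (j - lo[1]);
  }

  // Inclusive upper bound of dimension d, as in Gdb4hpc's "Use (lo..hi)" hint.
  int last(int d) const { return hi[d] - 1; }
};
```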

Using `RAJA::View`s in Decompositions

Gdb4hpc has a decomposition feature that lets you logically combine and divide data that is physically distributed across multiple ranks. Gdb4hpc supports using RAJA::Views with decompositions.

Example

For the following example, suppose you have a 4-rank application. Each rank has a one-dimensional RAJA::View named view_1D that is 60 elements long. Rank 0 stores the numbers {0, 1, 2, ..., 59}, rank 1 stores {0, 10, 20, ..., 590}, rank 2 stores {0, 100, 200, ..., 5900}, and rank 3 stores {0, 1000, 2000, ..., 59000}.

We can use the Gdb4hpc decomposition command to concatenate the four arrays into one 240-element logical array:

# in gdb4hpc

# (printout abbreviated)
dbg all> p view_1D(0..59)
a{0}: {0,1,2,3,4,5,etc...}
a{1}: {0,10,20,30,40,50,etc...}
a{2}: {0,100,200,300,400,500,etc...}
a{3}: {0,1000,2000,3000,4000,5000,etc...}

# create a decomposition called "concat" that is 240 elements long, split across 4 ranks
dbg all> decomposition $concat 240/4

# apply the decomposition to view_1D. Note that the (0..59) suffix is no longer required
# (printout abbreviated)
dbg all> p $concat{view_1D}
{0,1,2,3,4,5,6, ... ,57,58,59,0,10,20,30,40,50,60, ... ,570,580,590,0,100,200,300,400,500,600, ... ,5700,5800,5900,0,1000,2000,3000,4000,5000,6000, ... ,57000,58000,59000}

Decompositions can be used to handle data in more ways than shown here, and RAJA::Views are supported in all of them. See the Tutorial for more details on using decompositions.
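Conceptually, a block decomposition like $concat maps each logical index onto a (rank, local index) pair. The sketch below assumes that 240/4 means four equal, contiguous blocks of 60 elements, which matches the concatenated printout. It models only the index arithmetic, not how Gdb4hpc gathers the data. The names locate and example_value are hypothetical.

```cpp
#include <cassert>

struct Location { int rank; int local; };

// Map a logical index in a block decomposition of `total` elements over
// `nranks` ranks to the owning rank and the index within that rank's view.
Location locate(int global, int total, int nranks) {
  assert(total % nranks == 0 && 0 <= global && global < total);
  const int block = total / nranks;   // 240 / 4 = 60 elements per rank
  return {global / block, global % block};
}

// Value stored in the example: rank r holds local * 10^r in view_1D.
long example_value(Location loc) {
  long scale = 1;
  for (int r = 0; r < loc.rank; ++r) scale *= 10;
  return loc.local * scale;
}
```

For instance, logical index 61 lands on rank 1 at local index 1, which holds 10, the 62nd value in the $concat{view_1D} printout.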

Printing RAJA Reductions

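RAJA reductions combine many values, typically the elements visited by a loop, into a single value. Common operations are min, max, and sum. The minloc and maxloc variants also record the index at which the extreme value was found.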

Gdb4hpc supports printing RAJA reductions.

Example

// in a .cpp file

RAJA::ReduceSum<REDUCE_POL1, int> seq_sum(0);
RAJA::ReduceMin<REDUCE_POL1, int> seq_min(std::numeric_limits<int>::max());
RAJA::ReduceMax<REDUCE_POL1, int> seq_max(std::numeric_limits<int>::min());
RAJA::ReduceMinLoc<REDUCE_POL1, int> seq_minloc(std::numeric_limits<int>::max(), -1);
RAJA::ReduceMaxLoc<REDUCE_POL1, int> seq_maxloc(std::numeric_limits<int>::min(), -1);

RAJA::forall<EXEC_POL1>(arange, [=](int i) {
  seq_sum += a[i];

  seq_min.min(a[i]);
  seq_max.max(a[i]);

  seq_minloc.minloc(a[i], i);
  seq_maxloc.maxloc(a[i], i);
});

std::cout << "\tsum = " << seq_sum.get() << std::endl;
std::cout << "\tmin = " << seq_min.get() << std::endl;
std::cout << "\tmax = " << seq_max.get() << std::endl;
std::cout << "\tmin, loc = " << seq_minloc.get() << " , "
                             << seq_minloc.getLoc() << std::endl;
std::cout << "\tmax, loc = " << seq_maxloc.get() << " , "
                             << seq_maxloc.getLoc() << std::endl;

# in gdb4hpc

dbg all> p seq_sum
a{0}: {RAJA::reduce::detail::BaseReduceSum<int, RAJA::detail::ReduceSeq> = {RAJA::reduce::detail::BaseReduce<int, RAJA::reduce::sum, RAJA::detail::ReduceSeq> = {c = {RAJA::reduce::detail::BaseCombinable<int, RAJA::reduce::sum<int>, RAJA::detail::ReduceSeq<int, RAJA::reduce::sum<int> > > = {parent = (RAJA::reduce::detail::BaseCombinable<int, RAJA::reduce::sum<int>, RAJA::detail::ReduceSeq<int, RAJA::reduce::sum<int> > >*) 0x0, identity = 0, my_data = 1}}}}}

dbg all> p seq_sum.get()
a{0}: 1

dbg all> p seq_min
a{0}: {RAJA::reduce::detail::BaseReduceMin<int, RAJA::detail::ReduceSeq> = {RAJA::reduce::detail::BaseReduce<int, RAJA::reduce::min, RAJA::detail::ReduceSeq> = {c = {RAJA::reduce::detail::BaseCombinable<int, RAJA::reduce::min<int>, RAJA::detail::ReduceSeq<int, RAJA::reduce::min<int> > > = {parent = (RAJA::reduce::detail::BaseCombinable<int, RAJA::reduce::min<int>, RAJA::detail::ReduceSeq<int, RAJA::reduce::min<int> > >*) 0x0, identity = 2147483647, my_data = -100}}}}}

dbg all> p seq_min.get()
a{0}: -100

dbg all> p seq_minloc
a{0}: {RAJA::reduce::detail::BaseReduceMinLoc<int, long, RAJA::detail::ReduceSeq> = {RAJA::reduce::detail::BaseReduce<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min, RAJA::detail::ReduceSeq> = {c = {RAJA::reduce::detail::BaseCombinable<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min<RAJA::reduce::detail::ValueLoc<int, long, true> >, RAJA::detail::ReduceSeq<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min<RAJA::reduce::detail::ValueLoc<int, long, true> > > > = {parent = (RAJA::reduce::detail::BaseCombinable<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min<RAJA::reduce::detail::ValueLoc<int, long, true> >, RAJA::detail::ReduceSeq<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min<RAJA::reduce::detail::ValueLoc<int, long, true> > > >*) 0x0, identity = {val = 2147483647, loc = -1}, my_data = {val = -100, loc = 500000}}}}}}

dbg all> p seq_minloc.get()
a{0}: {val = -100, loc = 500000}

dbg all> p seq_minloc.getLoc()
a{0}: 500000

dbg all> p a[seq_minloc.getLoc()]
a{0}: -100
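The printed fields are easier to read with the underlying fold in mind. Each reduction object appears to start from its identity value and accumulate into my_data. The following plain C++ sketch shows the min-loc case. It is not RAJA's implementation: ValueLoc and min_loc are hypothetical stand-ins that mirror the printed {val, loc} pair, and the tie-breaking rule (keep the first minimum) is a choice made for this sketch.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Value/location pair, analogous to the {val, loc} fields in the printout.
struct ValueLoc { int val; long loc; };

// Sequential min-loc reduction: start from the identity {INT_MAX, -1}
// and keep the smallest value seen together with its index.
ValueLoc min_loc(const std::vector<int>& a) {
  ValueLoc acc{INT_MAX, -1};                    // same identity as seq_minloc above
  for (long i = 0; i < static_cast<long>(a.size()); ++i) {
    if (a[i] < acc.val) acc = {a[i], i};        // strict < keeps the first minimum
  }
  assert(a.empty() || acc.loc >= 0);            // non-empty input always finds a location
  return acc;
}
```

An empty input returns the identity unchanged. This mirrors how a reduction that visits no elements reports its initial value.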

Printing `RAJA::LocalArray`s

RAJA::LocalArrays are used to store data in stack-allocated CPU memory or GPU thread-local memory. Most often, they are used for tiling operations.

Gdb4hpc supports printing RAJA::LocalArray objects.

RAJA::LocalArrays are implemented as RAJA::View objects, so working with them in Gdb4hpc is the same as working with RAJA::Views.

Example

See the RAJA Tiled Matrix Transpose with Local Array example for more details.

https://raja.readthedocs.io/en/develop/sphinx/user_guide/tutorial/matrix_transpose_local_array.html

// in a .cpp file

using TILE_MEM =
  RAJA::LocalArray<int, RAJA::Perm<0, 1>, RAJA::SizeList<TILE_DIM, TILE_DIM>>;
TILE_MEM Tile_Array;

using SEQ_EXEC_POL_I =
  RAJA::KernelPolicy<
    RAJA::statement::Tile<1, RAJA::tile_fixed<TILE_DIM>, RAJA::loop_exec,
      RAJA::statement::Tile<0, RAJA::tile_fixed<TILE_DIM>, RAJA::loop_exec,

        RAJA::statement::InitLocalMem<RAJA::cpu_tile_mem, RAJA::ParamList<2>,

        RAJA::statement::ForICount<1, RAJA::statement::Param<0>, RAJA::loop_exec,
          RAJA::statement::ForICount<0, RAJA::statement::Param<1>, RAJA::loop_exec,
            RAJA::statement::Lambda<0>
          >
        >,

        RAJA::statement::ForICount<0, RAJA::statement::Param<1>, RAJA::loop_exec,
          RAJA::statement::ForICount<1, RAJA::statement::Param<0>, RAJA::loop_exec,
            RAJA::statement::Lambda<1>
          >
        >

        >
      >
    >
  >;

RAJA::kernel_param<SEQ_EXEC_POL_I>(
  RAJA::make_tuple(RAJA::TypedRangeSegment<int>(0, N_c),
                   RAJA::TypedRangeSegment<int>(0, N_r)),

  RAJA::make_tuple((int)0, (int)0, Tile_Array),

  [=](int col, int row, int tx, int ty, TILE_MEM &Tile_Array) {
    Tile_Array(ty, tx) = Aview(row, col);
  },

  [=](int col, int row, int tx, int ty, TILE_MEM &Tile_Array) {
    Atview(col, row) = Tile_Array(ty, tx);
  }
);

# in gdb4hpc

dbg all> p Tile_Array
a{0}: RAJA::View; Use (0..15, 0..15) for full contents.

dbg all> p Tile_Array(1, 0..15)
a{0}: {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}

dbg all> p Tile_Array(0..15, 0..15)
a{0}: {{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1},{2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2},{3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3},{4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4},{5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5},{6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6},{7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7},{8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8},{9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9},{10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10},{11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11},{12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12},{13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13},{14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14},{15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15}}
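The kernel above is a tiled transpose. Each tile of Aview is copied into Tile_Array and then written out transposed to Atview. The plain-loop C++ sketch below approximates that flow under the following assumptions:

* row-major storage,
* a hypothetical tiled_transpose function,
* a small TILE_DIM of 4, so that partial tiles are exercised.

It is not what RAJA::kernel generates.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int TILE_DIM = 4;  // assumption: small tile for illustration (the printout above shows 16 x 16)

// Plain-loop analog of the RAJA::kernel_param call above.
// A is N_r x N_c and At is N_c x N_r, both stored row-major.
void tiled_transpose(const std::vector<int>& A, std::vector<int>& At, int N_r, int N_c) {
  assert(A.size() == static_cast<std::size_t>(N_r) * N_c && At.size() == A.size());
  std::array<std::array<int, TILE_DIM>, TILE_DIM> tile{};  // stands in for Tile_Array

  for (int rt = 0; rt < N_r; rt += TILE_DIM) {        // outer Tile<1> over rows
    for (int ct = 0; ct < N_c; ct += TILE_DIM) {      // inner Tile<0> over columns
      const int r_end = std::min(rt + TILE_DIM, N_r); // last tile may be partial
      const int c_end = std::min(ct + TILE_DIM, N_c);

      // Like Lambda<0>: Tile_Array(ty, tx) = Aview(row, col),
      // where tx is the local row index and ty the local column index.
      for (int row = rt; row < r_end; ++row)
        for (int col = ct; col < c_end; ++col)
          tile[col - ct][row - rt] = A[row * N_c + col];

      // Like Lambda<1>: Atview(col, row) = Tile_Array(ty, tx).
      for (int col = ct; col < c_end; ++col)
        for (int row = rt; row < r_end; ++row)
          At[col * N_r + row] = tile[col - ct][row - rt];
    }
  }
}
```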

More Tools

Gdb4hpc treats RAJA::Views the same way it treats arrays. This means that anything described in “Handling Arrays” applies to RAJA::Views too.

For example, you can dump a large RAJA::View to a file like this:

dbg all> pipe p big_raja_view | cat > big_raja_view.txt
dbg all> shell wc -c big_raja_view.txt
988 big_raja_view.txt

See “Handling Arrays” for more.
