RAJA Support in Gdb4hpc

Gdb4hpc includes support for debugging a parallel program that uses the RAJA Performance Portability Layer (https://raja.readthedocs.io/en/develop/).

Printing RAJA::Views

Starting in Gdb4hpc 4.15.0, Gdb4hpc supports printing values wrapped in RAJA::Views.

Because views tend to hold large amounts of data, Gdb4hpc does not print the entire contents of a RAJA::View by default. Instead, it prints the bounds of the view and instructions on how to access the view’s data with Gdb4hpc’s array range (..) syntax.

# a 2D RAJA::View
dbg all> whatis Aview
a{0}: RAJA::View<double, RAJA::detail::LayoutBase_impl<camp::int_seq<long, 0l, 1l>, int, -1l> >

dbg all> p Aview
a{0}: RAJA::View; Use (0..3, 0..3) for full contents.

dbg all> p Aview(0..3, 0..3)
a{0}: {{1,1,1,1},{2,2,2,2},{3,3,3,3},{4,4,4,4}}

Like other array-like objects, the .. range syntax can be used to view a subset of the data:

dbg all> p Aview(2..3, 0..3)
a{0}: {{3,3,3,3},{4,4,4,4}}

dbg all> p Aview(0..3, 2..3)
a{0}: {{1,1},{2,2},{3,3},{4,4}}

Ranges are inclusive.

Unlike other array-like objects, views are indexed with parentheses () instead of brackets []. This more closely matches how RAJA::Views work in actual source code.
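The inclusive range semantics can be illustrated outside the debugger. The following is a minimal plain C++ sketch, not Gdb4hpc's implementation: slice_inclusive and make_example_view_data are hypothetical helpers, and the data is assumed to be the 4x4 row-major array shown in the session above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Gdb4hpc or RAJA: copy the inclusive
// sub-block rows [r0, r1] x columns [c0, c1] out of a row-major 2D array.
std::vector<std::vector<double>> slice_inclusive(
    const std::vector<double>& data, std::size_t ncols,
    std::size_t r0, std::size_t r1, std::size_t c0, std::size_t c1) {
  assert(ncols > 0 && r0 <= r1 && c0 <= c1 && c1 < ncols);
  assert((r1 + 1) * ncols <= data.size());
  std::vector<std::vector<double>> out;
  for (std::size_t r = r0; r <= r1; ++r) {  // <= because the upper bound is included
    std::vector<double> row;
    for (std::size_t c = c0; c <= c1; ++c) {
      row.push_back(data[r * ncols + c]);
    }
    out.push_back(row);
  }
  return out;
}

// Assumed example data: a 4x4 array in which row r holds the value r + 1,
// matching the Aview printout above.
std::vector<double> make_example_view_data() {
  std::vector<double> data;
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      data.push_back(static_cast<double>(r + 1));
    }
  }
  return data;
}
```

With this data, slice_inclusive(data, 4, 2, 3, 0, 3) yields the same two rows as p Aview(2..3, 0..3), because both endpoints of each range are part of the result.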

Supported Variations

Gdb4hpc supports printing the following types of RAJA::Views.

Normal Views

Gdb4hpc supports printing normal views, created without any special permutation or offset syntax.

Example
// in a .cpp file
RAJA::View< int, RAJA::Layout<2, int> > view_2D(a, Nx, Ny);

# in gdb4hpc

dbg all> whatis Aview
a{0}: RAJA::View<double, RAJA::detail::LayoutBase_impl<camp::int_seq<long, 0l, 1l>, int, -1l> >

dbg all> p Aview
a{0}: RAJA::View; Use (0..3, 0..3) for full contents.

dbg all> p Aview(0..3, 0..3)
a{0}: {{1,1,1,1},{2,2,2,2},{3,3,3,3},{4,4,4,4}}

Permuted Views

Gdb4hpc supports printing permuted views.

Example
// in a .cpp file
std::array<RAJA::idx_t, 3> perm3a {{2, 1, 0}};
RAJA::Layout< 3, int > perm3a_layout =
  RAJA::make_permuted_layout( {{Nx, Ny, Nz}}, perm3a);
RAJA::View< int, RAJA::Layout<3, int> > perm3a_view_3D(a, perm3a_layout);

# in gdb4hpc

dbg all> whatis perm3a_view_3D
a{0}: RAJA::View<int, RAJA::detail::LayoutBase_impl<camp::int_seq<long, 0l, 1l, 2l>, int, -1l> >

dbg all> p perm3a_view_3D
a{0}: RAJA::View; Use (0..2, 0..4, 0..1) for full contents.

dbg all> p perm3a_view_3D(0..2, 0..4, 0..1)
a{0}: {{{0,15},{3,18},{6,21},{9,24},{12,27}},{{1,16},{4,19},{7,22},{10,25},{13,28}},{{2,17},{5,20},{8,23},{11,26},{14,29}}}
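To see why the printed values are ordered this way, it can help to compute the strides that a permuted layout implies. The sketch below is plain C++, not RAJA's implementation. It assumes that Nx = 3, Ny = 5, Nz = 2 (inferred from the printed bounds), that the underlying array holds 0, 1, 2, ... in order (inferred from the printed values), and that the permutation lists indices from the longest stride to the shortest. The names permuted_strides and linear_index are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not RAJA's implementation: compute per-index strides
// for a permuted 3D layout. perm lists the indices from the longest stride
// to the shortest, so perm[2] names the stride-1 index.
std::array<long, 3> permuted_strides(const std::array<long, 3>& extents,
                                     const std::array<int, 3>& perm) {
  std::array<long, 3> strides{};
  long stride = 1;
  for (int p = 2; p >= 0; --p) {  // walk from the fastest index to the slowest
    assert(perm[static_cast<std::size_t>(p)] >= 0 &&
           perm[static_cast<std::size_t>(p)] < 3);
    std::size_t idx =
        static_cast<std::size_t>(perm[static_cast<std::size_t>(p)]);
    strides[idx] = stride;
    stride *= extents[idx];
  }
  return strides;
}

// Linear offset of element (i, j, k) under the given strides.
long linear_index(const std::array<long, 3>& strides, long i, long j, long k) {
  return i * strides[0] + j * strides[1] + k * strides[2];
}
```

Under these assumptions the permutation {2, 1, 0} gives strides {1, 3, 15}, so element (0, 0, 1) sits at offset 15 and element (0, 1, 0) at offset 3, which agrees with the first entries of the printout above.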

Offset Views

Gdb4hpc supports printing offset views.

Example
// in a .cpp file
RAJA::OffsetLayout<2, int> offlayout_2D =
  RAJA::make_offset_layout<2, int>( {{-1, -5}}, {{2, 5}} );

RAJA::View< int, RAJA::OffsetLayout<2, int> > aoview_2Doff(ao,
                                                           offlayout_2D);

# in gdb4hpc

dbg all> whatis aoview_2Doff
a{0}: RAJA::View<int, RAJA::OffsetLayout<2ul, int> >

dbg all> p aoview_2Doff
a{0}: RAJA::View; Use (-1..1, -5..4) for full contents.

dbg all> p aoview_2Doff(-1..1, -5..4)
a{0}: {{0,1,2,3,4,5,6,7,8,9},{10,11,12,13,14,15,16,17,18,19},{20,21,22,23,24,25,26,27,28,29}}
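The mapping from offset indices to storage positions can be sketched in a few lines of plain C++. This is an illustration only, not RAJA's implementation. It assumes row-major storage and uses the inclusive bounds that Gdb4hpc printed above (-1..1 and -5..4). OffsetBounds2D and offset_linear_index are hypothetical names.

```cpp
#include <cassert>

// Hypothetical sketch, not RAJA's implementation: a 2D offset index space
// with inclusive bounds [lo0, hi0] x [lo1, hi1], stored row-major.
struct OffsetBounds2D {
  int lo0, hi0;
  int lo1, hi1;
};

// Map the offset index (i, j) to a zero-based linear position.
int offset_linear_index(const OffsetBounds2D& b, int i, int j) {
  assert(i >= b.lo0 && i <= b.hi0);
  assert(j >= b.lo1 && j <= b.hi1);
  int extent1 = b.hi1 - b.lo1 + 1;  // number of entries along dimension 1
  return (i - b.lo0) * extent1 + (j - b.lo1);
}
```

With bounds {-1, 1, -5, 4}, index (-1, -5) maps to position 0 and index (1, 4) maps to position 29, matching the first and last values in the printout if the underlying array holds 0 through 29 in order.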

Using RAJA::Views in Decompositions

Gdb4hpc has a decomposition feature which allows the user to logically combine and divide data that is in reality distributed across multiple ranks. Gdb4hpc supports using RAJA::Views with decompositions.

Example

For the following example, suppose you have a 4-rank application. Each rank has a one-dimensional RAJA::View that is 60 elements long, named view_1D. Rank 0 stores the numbers {0, 1, 2, ..., 59}, rank 1 stores {0, 10, 20, ..., 590}, rank 2 stores {0, 100, ...}, and so on.

We can use the Gdb4hpc decomposition command to concatenate the per-rank arrays into one 240-element logical array:

# in gdb4hpc

# (printout abbreviated)
dbg all> p view_1D(0..59)
a{0}: {0,1,2,3,4,5,etc...}
a{1}: {0,10,20,30,40,50,etc...}
a{2}: {0,100,200,300,400,500,etc...}
a{3}: {0,1000,2000,3000,4000,5000,etc...}

# create a decomposition called "concat" that is 240 elements long, split across 4 ranks
dbg all> decomposition $concat 240/4

# apply the decomposition to view_1D. note that the (0..59) suffix is no longer required
# (printout abbreviated)
dbg all> p $concat{view_1D}
{0,1,2,3,4,5,6, ... ,57,58,59,0,10,20,30,40,50,60, ... ,570,580,590,0,100,200,300,400,500,600, ... ,5700,5800,5900,0,1000,2000,3000,4000,5000,6000, ... ,57000,58000,59000}

Decompositions can be used to handle data in more ways than shown here, and RAJA::Views are supported in all of them. See the Tutorial for more details on using decompositions.
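Conceptually, the 240/4 decomposition in the example treats the four 60-element views as consecutive blocks of one logical array. The plain C++ sketch below illustrates that idea only. It is not how Gdb4hpc implements decompositions, and Location, locate, and concatenate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not Gdb4hpc's implementation: where a global index of
// a block-distributed logical array lives.
struct Location {
  std::size_t rank;   // which rank owns the element
  std::size_t local;  // index within that rank's array
};

Location locate(std::size_t global, std::size_t total, std::size_t nranks) {
  assert(nranks > 0 && total % nranks == 0 && global < total);
  std::size_t per_rank = total / nranks;  // 240 / 4 = 60 in the example
  return Location{global / per_rank, global % per_rank};
}

// Concatenate the per-rank arrays in rank order into one logical array.
std::vector<long> concatenate(const std::vector<std::vector<long>>& per_rank) {
  std::vector<long> out;
  for (const auto& chunk : per_rank) {
    out.insert(out.end(), chunk.begin(), chunk.end());
  }
  return out;
}
```

In this model, global index 60 of the logical array is element 0 on rank 1, and global index 239 is element 59 on rank 3.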

Printing RAJA Reductions

RAJA reductions are used to reduce large vectors into single values. Common operations are min, max, and sum.

Gdb4hpc supports printing RAJA reductions.

Example

// in a .cpp file

RAJA::ReduceSum<REDUCE_POL1, int> seq_sum(0);
RAJA::ReduceMin<REDUCE_POL1, int> seq_min(std::numeric_limits<int>::max());
RAJA::ReduceMax<REDUCE_POL1, int> seq_max(std::numeric_limits<int>::min());
RAJA::ReduceMinLoc<REDUCE_POL1, int> seq_minloc(std::numeric_limits<int>::max(), -1);
RAJA::ReduceMaxLoc<REDUCE_POL1, int> seq_maxloc(std::numeric_limits<int>::min(), -1);

RAJA::forall<EXEC_POL1>(arange, [=](int i) {
  seq_sum += a[i];

  seq_min.min(a[i]);
  seq_max.max(a[i]);

  seq_minloc.minloc(a[i], i);
  seq_maxloc.maxloc(a[i], i);
});

std::cout << "\tsum = " << seq_sum.get() << std::endl;
std::cout << "\tmin = " << seq_min.get() << std::endl;
std::cout << "\tmax = " << seq_max.get() << std::endl;
std::cout << "\tmin, loc = " << seq_minloc.get() << " , "
                             << seq_minloc.getLoc() << std::endl;
std::cout << "\tmax, loc = " << seq_maxloc.get() << " , "
                             << seq_maxloc.getLoc() << std::endl;

# in gdb4hpc

dbg all> p seq_sum
a{0}: {RAJA::reduce::detail::BaseReduceSum<int, RAJA::detail::ReduceSeq> = {RAJA::reduce::detail::BaseReduce<int, RAJA::reduce::sum, RAJA::detail::ReduceSeq> = {c = {RAJA::reduce::detail::BaseCombinable<int, RAJA::reduce::sum<int>, RAJA::detail::ReduceSeq<int, RAJA::reduce::sum<int> > > = {parent = (RAJA::reduce::detail::BaseCombinable<int, RAJA::reduce::sum<int>, RAJA::detail::ReduceSeq<int, RAJA::reduce::sum<int> > >*) 0x0, identity = 0, my_data = 1}}}}}

dbg all> p seq_sum.get()
a{0}: 1

dbg all> p seq_min
a{0}: {RAJA::reduce::detail::BaseReduceMin<int, RAJA::detail::ReduceSeq> = {RAJA::reduce::detail::BaseReduce<int, RAJA::reduce::min, RAJA::detail::ReduceSeq> = {c = {RAJA::reduce::detail::BaseCombinable<int, RAJA::reduce::min<int>, RAJA::detail::ReduceSeq<int, RAJA::reduce::min<int> > > = {parent = (RAJA::reduce::detail::BaseCombinable<int, RAJA::reduce::min<int>, RAJA::detail::ReduceSeq<int, RAJA::reduce::min<int> > >*) 0x0, identity = 2147483647, my_data = -100}}}}}

dbg all> p seq_min.get()
a{0}: -100

dbg all> p seq_minloc
a{0}: {RAJA::reduce::detail::BaseReduceMinLoc<int, long, RAJA::detail::ReduceSeq> = {RAJA::reduce::detail::BaseReduce<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min, RAJA::detail::ReduceSeq> = {c = {RAJA::reduce::detail::BaseCombinable<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min<RAJA::reduce::detail::ValueLoc<int, long, true> >, RAJA::detail::ReduceSeq<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min<RAJA::reduce::detail::ValueLoc<int, long, true> > > > = {parent = (RAJA::reduce::detail::BaseCombinable<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min<RAJA::reduce::detail::ValueLoc<int, long, true> >, RAJA::detail::ReduceSeq<RAJA::reduce::detail::ValueLoc<int, long, true>, RAJA::reduce::min<RAJA::reduce::detail::ValueLoc<int, long, true> > > >*) 0x0, identity = {val = 2147483647, loc = -1}, my_data = {val = -100, loc = 500000}}}}}}

dbg all> p seq_minloc.get()
a{0}: {val = -100, loc = 500000}

dbg all> p seq_minloc.getLoc()
a{0}: 500000

dbg all> p a[seq_minloc.getLoc()]
a{0}: -100
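The val and loc fields in the minloc printout can be understood through a sequential equivalent. The following is a minimal plain C++ sketch, not RAJA's implementation: ValueLoc here is a hypothetical stand-in for the RAJA type of the same name, and min_with_location is a hypothetical helper. It starts from the same identity shown in the printout (the largest int and location -1). Keeping the first occurrence on ties is a choice of this sketch, not a statement about RAJA.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the value/location pair shown in the printout.
struct ValueLoc {
  int val;
  long loc;
};

// Sequential minimum-with-location over a vector.
ValueLoc min_with_location(const std::vector<int>& a) {
  // Identity element: no value seen yet, no valid location.
  ValueLoc best{std::numeric_limits<int>::max(), -1};
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] < best.val) {  // strict <, so the first minimum is kept
      best.val = a[i];
      best.loc = static_cast<long>(i);
    }
  }
  // loc is either -1 (identity) or a valid index into a.
  assert(best.loc >= -1 && best.loc < static_cast<long>(a.size()));
  return best;
}
```

For an input such as {5, -100, 7, -100}, the result has val = -100 and loc = 1, which mirrors how p seq_minloc.get() and p seq_minloc.getLoc() report the value and its index.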

Printing RAJA::LocalArrays

RAJA::LocalArrays are used to store data in CPU stack-allocated or GPU thread-local memory. Most often, they are used for tiling operations.

Gdb4hpc supports printing RAJA::LocalArray objects.

RAJA::LocalArrays are implemented as RAJA::View objects, so working with them in Gdb4hpc is the same as working with RAJA::Views.

Example

See the RAJA Tiled Matrix Transpose with Local Array example for more details.

https://raja.readthedocs.io/en/develop/sphinx/user_guide/tutorial/matrix_transpose_local_array.html

// in a .cpp file

using TILE_MEM =
  RAJA::LocalArray<int, RAJA::Perm<0, 1>, RAJA::SizeList<TILE_DIM, TILE_DIM>>;
TILE_MEM Tile_Array;

using SEQ_EXEC_POL_I =
  RAJA::KernelPolicy<
    RAJA::statement::Tile<1, RAJA::tile_fixed<TILE_DIM>, RAJA::loop_exec,
      RAJA::statement::Tile<0, RAJA::tile_fixed<TILE_DIM>, RAJA::loop_exec,

        RAJA::statement::InitLocalMem<RAJA::cpu_tile_mem, RAJA::ParamList<2>,

        RAJA::statement::ForICount<1, RAJA::statement::Param<0>, RAJA::loop_exec,
          RAJA::statement::ForICount<0, RAJA::statement::Param<1>, RAJA::loop_exec,
            RAJA::statement::Lambda<0>
          >
        >,

        RAJA::statement::ForICount<0, RAJA::statement::Param<1>, RAJA::loop_exec,
          RAJA::statement::ForICount<1, RAJA::statement::Param<0>, RAJA::loop_exec,
            RAJA::statement::Lambda<1>
          >
        >

        >
      >
    >
  >;

RAJA::kernel_param<SEQ_EXEC_POL_I>(
  RAJA::make_tuple(RAJA::TypedRangeSegment<int>(0, N_c),
                   RAJA::TypedRangeSegment<int>(0, N_r)),

  RAJA::make_tuple((int)0, (int)0, Tile_Array),

  [=](int col, int row, int tx, int ty, TILE_MEM &Tile_Array) {
    Tile_Array(ty, tx) = Aview(row, col);
  },

  [=](int col, int row, int tx, int ty, TILE_MEM &Tile_Array) {
    Atview(col, row) = Tile_Array(ty, tx);
  }
);

# in gdb4hpc

dbg all> p Tile_Array
a{0}: RAJA::View; Use (0..15, 0..15) for full contents.

dbg all> p Tile_Array(1, 0..15)
a{0}: {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}

dbg all> p Tile_Array(0..15, 0..15)
a{0}: {{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1},{2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2},{3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3},{4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4},{5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5},{6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6},{7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7},{8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8},{9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9},{10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10},{11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11},{12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12},{13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13},{14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14},{15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15}}
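For readers less familiar with RAJA kernel policies, the tiling pattern that fills Tile_Array can be sketched in plain C++. This is an illustration under assumptions, not the RAJA kernel itself: it assumes TILE_DIM is 16 (inferred from the 0..15 bounds printed above) and row-major storage. kTileDim and tiled_transpose are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTileDim = 16;  // assumed to match TILE_DIM above

// Hypothetical plain C++ sketch of a tiled transpose: A is n_r x n_c,
// At is n_c x n_r, both stored row-major.
void tiled_transpose(const std::vector<int>& A, std::vector<int>& At,
                     std::size_t n_r, std::size_t n_c) {
  assert(A.size() == n_r * n_c && At.size() == n_r * n_c);
  int tile[kTileDim][kTileDim];  // stack-allocated, like a CPU LocalArray
  for (std::size_t r0 = 0; r0 < n_r; r0 += kTileDim) {
    for (std::size_t c0 = 0; c0 < n_c; c0 += kTileDim) {
      // Partial tiles at the matrix edges are clipped.
      std::size_t rows = std::min(kTileDim, n_r - r0);
      std::size_t cols = std::min(kTileDim, n_c - c0);
      // First pass: load a tile of A into local memory.
      for (std::size_t ty = 0; ty < rows; ++ty) {
        for (std::size_t tx = 0; tx < cols; ++tx) {
          tile[ty][tx] = A[(r0 + ty) * n_c + (c0 + tx)];
        }
      }
      // Second pass: write the tile out with rows and columns swapped.
      for (std::size_t tx = 0; tx < cols; ++tx) {
        for (std::size_t ty = 0; ty < rows; ++ty) {
          At[(c0 + tx) * n_r + (r0 + ty)] = tile[ty][tx];
        }
      }
    }
  }
}
```

Stopping a debugger between the two passes is the point at which a tile like the one printed above would be fully populated.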

More Tools

Gdb4hpc treats RAJA::Views the same way it treats arrays. This means that anything described in “Handling Arrays” applies to RAJA::Views too.

For example, you can dump a large RAJA::View to a file like this:

dbg all> pipe p big_raja_view | cat > big_raja_view.txt
dbg all> shell wc -c big_raja_view.txt
988 big_raja_view.txt

See “Handling Arrays” for more.