Apprentice 3

Author:

Hewlett Packard Enterprise Development LP.

Copyright:

Copyright 2024 Hewlett Packard Enterprise Development LP.

Overview

Apprentice 3 is a graphical application for exploring the results of an HPE perftools experiment. The Linux client is available within the perftools-base module, and installers are also packaged to run the client directly on Mac or Windows.

Features

Apprentice 3 currently has multiple views for the experiment results:

  • An interactive report generator with over 100 tables focusing on overall performance, GPU usage, data flow, loops, and I/O, to help identify multiple kinds of bottlenecks.

  • A flame graph view to relate the time usage to the call-tree of the program as a whole.

  • A timeline view showing GPU performance information against the program call stack on every thread at every moment through the length of the run.

Relationship to Apprentice 2

Currently, Apprentice 3 contains only the new or updated features. It will ultimately contain all the data views in Apprentice 2 and will supplant it. In the meantime, Apprentice 2 and Apprentice 3 are both packaged, and a user may need to switch back and forth to access some features.

Apprentice 2 is not deprecated; the same experiment data can be viewed via either application. Both use files with the “.ap2” suffix.

Getting started

Redirecting X

If you have a good connection to the host machine, running Apprentice 3 and redirecting the output via X can work reasonably well:

ssh -Y -C myhostname
module load perftools-base
app3

Your setup may be slightly different; the “-C” option enables compression, which helps performance.

Using a remote desktop

If your host supports vncserver, using a remote desktop will give better graphics performance. Instructions for setting one up are beyond the scope of this document; consult your local admin.

Installing a client on your local machine

The desktop client installers for Mac and Windows are installed as part of the perftools package. If you’ve loaded the perftools-base module, they can be accessed in this directory:

${CRAY_PERFTOOLS_PREFIX}/share/desktop_installers

This directory contains Apprentice3Installer-[version].dmg for the Mac, and Apprentice3Installer-[version].exe for Windows.

Download the file to your local machine and run the relevant installer. On the Mac, we are working on getting a proper Apple signature; in the meantime, you may need to “open” the .dmg file to allow it to be installed. We hope to resolve this by 3.0.1.

Running Apprentice 3

The experiment load screen

When Apprentice 3 starts, it immediately displays a screen for choosing which experiment to open.

../../_images/ExperimentLoader.png

From left to right, this screen offers three ways to access an experiment:

  • On your local machine

    Clicking on the “Open” button brings up a standard file browser to navigate to the experiment directory.

  • From a remote machine via ssh

    • The one required entry in this section is the “Server” box, which is the host machine storing your experiment. This loader does not understand hostname aliases you might have configured; you may need to provide the entire hostname.

    • Username is needed if your account name on the host is different from your local one.

    • Password is necessary if you rely on a password to access the remote machine.

    • The “Browse” button here selects the location of your ssh key, if you use one to access the remote machine and it’s not in the usual location.

    • “Open” connects you to the remote host and brings up the file browser to navigate to the right location.

  • By referencing one you’ve recently accessed

    The right side of the window contains recent experiments. You can hover over one to get the full details, or double-click to load it.

    It can take a few seconds for the experiment window to display, but note that the title of this window changes to the name of the experiment.

jman

Depending on how you’re running, your OS may put up a dialog asking if you want to allow the application “jman” to run. jman is the server process for accessing your data; it has to be running for any perftools client to work.

The Experiment Window

When the experiment is loaded, the selection window will disappear, and the primary experiment screen will appear.

../../_images/Summary.png

The application shows the available views as selectable tabs: “Summary”, “Text Report”, “Flame Graph” and, if time line information is available, “Time Line”.

Apprentice 3 is set up as a multi-document application, like Chrome for example. There are only a couple of menu options:

  • File/Open.. Brings up an experiment chooser window, which opens another experiment window. Multiple experiments can be open at the same time.

  • File/Close.. Closes the current experiment window.

  • Help Points you to this file and shows the version number.

Summary View

The summary view has a pair of panels with:

  • Experiment Details Basic information about the system the experiment was run on, and the parameters used to run it.

  • Observations Contains a top-level analysis of the experiment results, including potential bottlenecks, possible fixes, and pointers on where to look further.

The divide between the two panels can be dragged to change the relative sizes.

Report View

The “Text Report” tab gives access to the full set of performance tables.

../../_images/ReportView.png

This panel has four main elements:

  • The report pulldown menu.

    This gives access to the more than 100 reports, grouped into themes. Note: Depending on the settings when running the experiment, not every report will be available. Apprentice 3 displays a message if the necessary data was not gathered during the experiment.

    The pulldown also contains a section for tables related to the currently displayed one.

  • The report table.

    The table supports the things you expect to be able to do with a browser table: widening, narrowing or reordering columns, and collapsing or expanding tree elements.

  • Disable/Enable Thresholds button.

    Toggles whether the table filters out very small entries.

  • Table Notes

    This is a detailed explanation of what is contained in the table.

    The panel divider can be dragged to grow or shrink the Table Notes area.
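
The effect of the thresholds toggle can be pictured as a simple filter over table rows. The sketch below is purely illustrative: the 1% cutoff, the row layout, and the `apply_threshold` name are assumptions made for this example, and the actual rule Apprentice 3 applies may differ.

```python
def apply_threshold(rows, enabled=True, min_percent=1.0):
    """Return rows whose share of the total meets the cutoff, or all rows if disabled."""
    if not enabled:
        return list(rows)
    total = sum(value for _, value in rows)
    if total == 0:
        return list(rows)
    return [(name, value) for name, value in rows
            if 100.0 * value / total >= min_percent]

# Hypothetical table rows: (function name, seconds). Not real experiment data.
rows = [("sweep_", 50.0), ("flux_err_", 49.5), ("tiny_helper_", 0.5)]
print(apply_threshold(rows))                 # thresholds enabled: small entry hidden
print(apply_threshold(rows, enabled=False))  # thresholds disabled: every entry shown
```

With thresholds enabled, only the entry contributing less than the assumed cutoff is dropped; disabling them returns the table unchanged.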

Flame Graph

The flame graph visualizes the time usage of the program aggregated into each distinct call stack.

../../_images/FlameGraph.png

The flame graph shows each function in a box scaled to the time spent in that function. Every function it calls is placed in a proportionally sized box above it.

In this example, nearly the entire run time is spent in the inner_ call, and much of the time in inner_ is spent in calls to sweep_, global_int_sum_, flux_err_ and several lesser contributors. You can get the full name and more detailed information by hovering over each box.

The time spent exclusively inside the inner_ function is indicated by the amount of the box with nothing above it.
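
The relationship between a box's width and the empty space above it can be sketched in a few lines of Python. This is an illustration only, not how Apprentice 3 computes its display: the helper names, the `main` root, and the timings are hypothetical, with the other function names borrowed from the example above.

```python
from collections import defaultdict

def inclusive_times(samples):
    """Add each sample's time to every call-stack prefix it passes through."""
    totals = defaultdict(float)
    for stack, seconds in samples:
        for depth in range(1, len(stack) + 1):
            totals[stack[:depth]] += seconds
    return dict(totals)

def exclusive_time(totals, stack):
    """Inclusive time of `stack` minus the inclusive time of its direct callees."""
    children = sum(t for s, t in totals.items()
                   if len(s) == len(stack) + 1 and s[:len(stack)] == stack)
    return totals[stack] - children

# Hypothetical timings in seconds, keyed by call stack from the root outward.
samples = [
    (("main", "inner_"), 2.0),
    (("main", "inner_", "sweep_"), 5.0),
    (("main", "inner_", "global_int_sum_"), 2.0),
    (("main", "inner_", "flux_err_"), 1.0),
]

totals = inclusive_times(samples)
inner = ("main", "inner_")
print(totals[inner])                  # total width of the inner_ box
print(exclusive_time(totals, inner))  # portion of inner_ with nothing above it
```

The first value corresponds to the full width of a box, and the second to the part of that box left uncovered by its callees.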

Clicking on a box refocuses the display on that function. Clicking on global_in.. updates the flame graph into this:

../../_images/FlameGraphFocus.png

To widen the focus again, click on a box below the current focus.

You may notice that functions in the display are color coded, so MPI functions or synchronizing calls are displayed in a different color.

Time Line

Panels

The time line view allows you to relate GPU activity to your running program for every thread, and lets you zoom in on the activity at any moment in the run.

../../_images/TimeLine.png

From top to bottom:

  • The location bar

    • PE selects the “program element”, the CPU process to display

    • TH selects the CPU thread on the current PE

    • Time shows the time of the center of the display range. You can edit this to recenter to a new interval.

    • Func/Prev/Next lets you scroll between occurrences of a specific function. Note: this feature is not active in version 3.0.

  • The stack section

    This shows a graphical view of your program stack; each box shows the beginning and end of a CPU function call. Hover over a box to get the details: function name, and call start and end times.

  • D:C:S Device Context Stream

    The left side of this row indicates the coordinates of the GPU threads associated with the current thread, listed as the indices of the device, context, and stream. “Context” and “stream” are generalized terms, since GPU manufacturers use their own nomenclature.

    The right side of this row is a color-coded bar indicating the type of processing:

    • Grey : computation

    • Green : communication

    • Empty : idle

    Clicking a rectangle in this display highlights its corresponding CPU call, which will tend to be slightly earlier in the timeline due to device lag.

  • GPU activity

    Graph of the amount of GPU activity at each time. You can select:

    • Kernel compute activity

    • In data flow into the GPU

    • Out data flow out of the GPU

  • Navigation bars

    • Panning Bar : use to move the display interval at the same resolution. It moves back to the center when you let go of the mouse so you can pan farther.

    • Zoom : narrow or widen the view interval. The scale is logarithmic.
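
A logarithmic zoom scale means that equal movements of the control multiply the visible interval by the same factor, rather than adding or subtracting a fixed amount. A minimal sketch, assuming a slider position between 0 and 1 and hypothetical interval limits (the function name and all values are made up for illustration and do not come from Apprentice 3):

```python
import math

def interval_width(position, full_width, min_width):
    """Map a slider position in [0, 1] to a visible interval width.

    Position 0 shows the full run; position 1 shows the narrowest interval.
    Interpolation is linear in log(width), so each equal step of the slider
    scales the width by the same ratio.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    log_width = ((1 - position) * math.log(full_width)
                 + position * math.log(min_width))
    return math.exp(log_width)

# Hypothetical 100-second run with a 1-microsecond minimum interval.
for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{pos:.2f} -> {interval_width(pos, 100.0, 1e-6):.6g} s")
```

On a scale like this, the same slider travel that takes the view from seconds to milliseconds also takes it from milliseconds to microseconds, which is what makes very short events reachable in a long run.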

Lassoing an interval

Click and drag within the GPU activity area to set the focus interval directly.

../../_images/TimeLineNavigate.gif
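
Conceptually, lassoing converts a horizontal drag across the panel into a time interval. The sketch below assumes a simple linear mapping from pixels to time; the function name, parameters, and pixel dimensions are hypothetical and are not taken from Apprentice 3.

```python
def lasso_to_interval(x_start, x_end, panel_width, view_start, view_end):
    """Convert a horizontal drag, in pixels, to a (start, end) time interval."""
    if panel_width <= 0:
        raise ValueError("panel_width must be positive")
    left, right = sorted((x_start, x_end))  # allow dragging in either direction
    left = max(0, left)                     # clamp the drag to the panel
    right = min(panel_width, right)
    scale = (view_end - view_start) / panel_width  # seconds per pixel
    return view_start + left * scale, view_start + right * scale

# Hypothetical 1024-pixel panel currently showing seconds 0 through 8.
print(lasso_to_interval(768, 256, 1024, 0.0, 8.0))
```

Sorting the two endpoints means a right-to-left drag selects the same interval as a left-to-right one.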

Mouse scroll wheel

Use the scroll wheel to zoom in or out. If you have a two-axis scroll, you can pan with the second axis.