Apprentice 3
- Author:
Hewlett Packard Enterprise Development LP.
- Copyright:
Copyright 2024 Hewlett Packard Enterprise Development LP.
Overview
Apprentice 3 is a graphical application for exploring the results of an HPE perftools experiment. The Linux client is available within the perftools-base module, and installers are also packaged to run the client directly on Mac or Windows.
Features
Apprentice 3 currently has multiple views for the experiment results:
An interactive report generator with over 100 tables focusing on overall performance, GPU usage, data flow, loops, and I/O, to help identify multiple kinds of bottlenecks.
A flame graph view to relate the time usage to the call-tree of the program as a whole.
A timeline view showing GPU performance information against the program call stack on every thread at every moment through the length of the run.
Relationship to Apprentice 2
Currently, Apprentice 3 contains only the new or updated features. It will ultimately contain all the data views in Apprentice 2 and will supplant it. In the meantime, Apprentice 2 and Apprentice 3 are both packaged, and a user may need to switch back and forth to access some features.
Apprentice 2 is not deprecated; the same experiment data can be viewed via either application. Both use files with the “.ap2” suffix.
Getting started
Redirecting X
If you have a good connection to the host machine, running Apprentice 3 and redirecting the output via X can work reasonably well:
ssh -Y -C myhostname
module load perftools-base
app3
Your setup may be slightly different; the “-C” option enables compression, which helps performance.
Using a remote desktop
If your host supports the vncserver, using a remote desktop will give better graphics performance. Instructions for setting one up are beyond the scope of this document; consult your local admin.
Installing a client on your local machine
The desktop client installers for Mac and Windows are installed as part of the perftools package. If you’ve loaded the perftools-base module, they can be found in this directory:
${CRAY_PERFTOOLS_PREFIX}/share/desktop_installers
This directory contains Apprentice3Installer-[version].dmg for the Mac and Apprentice3Installer-[version].exe for Windows.
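As a convenience, the installer files present on the host can be listed with a short script. This is a minimal sketch, assuming the perftools-base module is loaded so that CRAY_PERFTOOLS_PREFIX is set; the function name list_installers is hypothetical and is not part of perftools.

```python
import os
from pathlib import Path


def list_installers(prefix=None):
    """Return installer paths under <prefix>/share/desktop_installers.

    `prefix` defaults to the CRAY_PERFTOOLS_PREFIX environment variable,
    which is assumed to be set by `module load perftools-base`.
    """
    prefix = prefix or os.environ.get("CRAY_PERFTOOLS_PREFIX")
    if not prefix:
        raise RuntimeError(
            "CRAY_PERFTOOLS_PREFIX is not set; load perftools-base first"
        )
    installer_dir = Path(prefix) / "share" / "desktop_installers"
    # Match the documented names: Apprentice3Installer-[version].dmg or .exe
    return sorted(
        p
        for p in installer_dir.glob("Apprentice3Installer-*")
        if p.suffix in (".dmg", ".exe")
    )
```

Calling list_installers() on the host and printing each returned path shows which file to copy to your local machine.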
Download the file to your local machine and run the relevant installer. For the Mac, we are working on getting a proper Apple signature, but you may need to “open” the .dmg file to allow it to be installed. We hope to resolve this by version 3.0.1.
Running Apprentice 3
The experiment load screen
When Apprentice 3 starts, it immediately pops up a screen for choosing which experiment to open.
The screen offers, from left to right, three ways to access an experiment:
On your local machine
Clicking on the “Open” button brings up a standard file browser to navigate to the experiment directory.
From a remote machine via ssh
The one required entry in this section is the “Server” box, which is the host machine storing your experiment. This load screen does not understand hostname aliases you might have configured; you may need to provide the full hostname.
Username is needed if your account name on the host is different from your local account name.
Password will be necessary if you rely on a password to access the remote machine.
The “Browse” button here selects the location of your ssh key, if you use one to access the remote machine and it is not in the usual location.
“Open” connects you to the remote host and brings up the file browser to navigate to the right location.
By referencing one you’ve recently accessed
The right side of the window contains recent experiments. You can hover over one to get the full details, or double-click to load it.
It can take a few seconds for the experiment window to display, but note that the title of this window changes to the name of the experiment.
jman
Depending on how you’re running, your OS may put up a dialog asking if you want to allow the application “jman” to run. jman is the server process for accessing your data; it has to be running for any perftools client to work.
The Experiment Window
When the experiment is loaded, the selection window will disappear, and the primary experiment screen will appear.
The application shows the available views as selectable tabs: “Summary”, “Text Report”, “Flame Graph” and “Time Line” if time line information is available.
Apprentice 3 is set up as a multi-document application, like Chrome for example. There are only a couple of menu options:
File/Open.. Brings up an experiment chooser window, which opens another experiment window. Multiple experiments can be open at the same time.
File/Close.. Closes the current experiment window.
Help Points you to this file and shows the version number.
Summary View
The summary view has a pair of panels with:
Experiment Details Basic information about the system the experiment was run on, and the parameters used to run it.
Observations Contains a top-level analysis of the experiment results, including potential bottlenecks, possible fixes, and pointers on where to look further.
The divide between the two panels can be dragged to change the relative sizes.
Report View
The “Text Report” tab gives access to the host of performance tables.
This panel has four main elements:
The report pulldown menu.
This gives access to the more than 100 reports, grouped into themes. Note: Depending on the settings used when running the experiment, not every report will be available. Apprentice 3 pops up a message if the necessary data was not gathered during the experiment.
The pulldown also contains a section for tables related to the currently displayed one.
The report table.
The table supports the things you expect to be able to do with a browser table: widening, narrowing or reordering columns, and collapsing or expanding tree elements.
Disable/Enable Thresholds button.
Toggles whether the table filters out very small entries.
Table Notes
This is a detailed explanation of what is contained in the table.
The panel divider can be dragged to grow or shrink the Table Notes area.
Flame Graph
The flame graph visualizes the time usage of the program aggregated into each distinct call stack.
The flame graph shows each function in a box scaled to the time spent in that function. Every function it calls is then placed in a proportionally sized box above it.
In this example, nearly the entire run time is spent in the inner_ call, and much of the time in inner_ is spent in calls to sweep_, global_int_sum_, flux_err_ and several lesser contributors. You can get the full name and more detailed information by hovering over each box.
The time spent exclusively inside the inner_ function is indicated by the amount of the box with nothing above it.
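To make the box sizes concrete, the sketch below shows how inclusive time (the width of a box) and exclusive time (the part of a box with nothing above it) can be derived from per-call-stack totals. It is illustrative only and does not describe how Apprentice 3 computes its display; the times and the "main" root frame are made up, and only the other function names come from the example above.

```python
from collections import defaultdict


def aggregate(stack_times):
    """Compute inclusive and exclusive time per call stack.

    `stack_times` maps a call stack (tuple of function names, outermost
    first) to the time spent with exactly that stack active.
    """
    inclusive = defaultdict(float)
    exclusive = defaultdict(float)
    for stack, t in stack_times.items():
        exclusive[stack] += t
        # Every prefix of the stack is an enclosing box in the flame graph.
        for depth in range(1, len(stack) + 1):
            inclusive[stack[:depth]] += t
    return dict(inclusive), dict(exclusive)


# Hypothetical times in seconds; "main" is an assumed root frame.
samples = {
    ("main", "inner_"): 2.0,
    ("main", "inner_", "sweep_"): 5.0,
    ("main", "inner_", "global_int_sum_"): 2.0,
    ("main", "inner_", "flux_err_"): 1.0,
}
inclusive, exclusive = aggregate(samples)
inner = ("main", "inner_")
print(inclusive[inner], exclusive[inner])  # prints: 10.0 2.0
```

Here the inner_ box would span 10 seconds, of which 2 seconds have no box above them.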
Clicking on a box refocuses the display on that function.
Clicking on global_in.. updates the flame graph to this:
To widen the focus again, click on a box below the current focus.
You may notice that functions in the display are color coded, so MPI functions or synchronizing calls are displayed in a different color.
Time Line
Panels
The time line view allows you to relate GPU activity to your running program for every thread, and lets you zoom in on the activity at any moment in the run.
From top to bottom:
The location bar
PE selects the “program element”, the CPU process to display
TH selects the CPU thread on the current PE
Time shows the time of the center of the display range. You can edit this to recenter to a new interval
Func/Prev/Next lets you scroll between occurrences of a specific function. Note: this feature is not active in version 3.0.
The stack section
This shows the graphical view of your program stack; each box shows the begin and end of a CPU function call. Hover over a box to get the details: function name, call start and end times.
D:C:S Device Context Stream
The left side of this row indicates the coordinates of the GPU threads associated with the current thread, listed as the indices of the device, context, and stream. “Context” and “stream” are generalized terms, since GPU manufacturers use their own nomenclature.
The right side of this row is a color-coded bar indicating the type of processing:
Grey : computation
Green : communication
Empty : idle
Clicking a rectangle in this display highlights its corresponding CPU call, which will tend to be slightly earlier in the timeline due to device lag.
GPU activity
Graph of the amount of GPU activity at each time. You can select:
Kernel compute activity
In data flow into the GPU
Out data flow out of the GPU
Navigation bars
Panning Bar : use to move the display interval at the same resolution. It moves back to the center when you let go of the mouse so you can pan farther.
Zoom : narrow or widen the view interval. The scale is logarithmic.
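A logarithmic zoom scale means that equal movements of the control multiply or divide the visible interval by the same factor. The sketch below illustrates the idea under assumed parameters; the function names, the slider range of 0 to 1, and the minimum width are hypothetical and do not describe Apprentice 3's internals.

```python
import math


def zoom_width(slider, run_length, min_width=1e-6):
    """Map a slider position in [0, 1] to a visible interval width.

    0 shows the whole run; 1 shows `min_width` seconds. Interpolating in
    log space makes each equal slider step scale the width by a constant
    factor.
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0 and 1")
    log_w = (1.0 - slider) * math.log(run_length) + slider * math.log(min_width)
    return math.exp(log_w)


def view_interval(center, width, run_length):
    """Return (start, end) of the view, clamped to [0, run_length]."""
    width = min(width, run_length)
    start = min(max(center - width / 2.0, 0.0), run_length - width)
    return start, start + width
```

With these assumptions, moving the slider from 0 to 0.5 on a 100-second run narrows the view from the full run to about 0.01 seconds, and view_interval keeps the chosen center inside the run when the view is near either end.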
Lassoing an interval
Click and drag within the GPU activity area to set the focus interval directly.
Mouse scroll wheel
Use the scroll wheel to zoom in or out. If you have a two-axis scroll, you can pan with the second axis.