DEEP/PAPI

Why do you need DEEP/PAPI?

With modern processors, efficient use of cache memory is critical to good performance. A memory reference that hits in cache is often more than an order of magnitude faster than one that does not. How do you know how well your program is using the cache? Where are the places in your code that are not using cache efficiently? DEEP/PAPI can answer these and other performance questions by using the PAPI interface to gather hardware counter information about your program at the function and loop-nest levels. You can then use all the DEEP tools to inspect your program's source code, annotated with the hardware counter information you requested.
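
To make the cache question concrete, here is a minimal sketch of how loop order alone changes the miss count. It is a toy direct-mapped cache simulator written for this page, not part of DEEP or PAPI; the cache geometry (64 lines of 64 bytes) and the 256 x 256 array of 8-byte elements are assumptions chosen only for illustration.

```python
def count_misses(addresses, num_lines=64, line_bytes=64):
    """Count misses in a toy direct-mapped cache (not a model of any real CPU)."""
    tags = [None] * num_lines          # which memory block each cache line holds
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this byte
        index = block % num_lines      # the single cache line it can occupy
        if tags[index] != block:
            tags[index] = block        # load the block, evicting the old one
            misses += 1
    return misses


def traversal(n, row_major, elem_bytes=8):
    """Byte addresses touched when walking an n x n array stored row by row."""
    for i in range(n):
        for j in range(n):
            row, col = (i, j) if row_major else (j, i)
            yield (row * n + col) * elem_bytes


N = 256
row_misses = count_misses(traversal(N, row_major=True))
col_misses = count_misses(traversal(N, row_major=False))
print("row-major misses:   ", row_misses)
print("column-major misses:", col_misses)
```

Both loops touch exactly the same elements, yet in this toy model the column-order walk misses on every reference. A hardware cache-miss counter attributed to the loop nest is what exposes this kind of difference in a real program.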

DEEP Overview

DEEP includes many tools accessed through an integrated, interactive GUI; these tools help you quickly understand and investigate program structure, performance, and behavior. DEEP can find your performance bottlenecks and behavior problems quickly, so you can concentrate on the problem your application is trying to solve.

DEEP/PAPI features

DEEP supports all PAPI counters. The type and number of counters you can use depend on what is available on your hardware platform -- there are over 50 possible items that can be counted. With the DEEP profiler, you can make multiple data-gathering runs and select which items to count by changing an environment variable before each run (no need to recompile). DEEP lets you see the results in the source code at the loop-nest and function levels. Some of the interesting counters include:
  • Cache miss counters for Level 1, 2, and 3. (If these levels exist on that platform.)
  • Translation look-aside buffer (TLB) misses. (Worse than cache misses -- change your data structure and/or loop structure to avoid these.)
  • Number of floating point instructions executed. (What is the true MFlops rate?)
  • Number of branches mispredicted. (Do you need feedback optimization?)
  • Number of cycles that the CPU is stalled. (How well are the instructions being scheduled?)
  • Number of loads and stores. (Is the compiler spilling registers to memory, or is my critical loop staying in the registers?)
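
The questions in parentheses above are usually answered by combining raw counts into ratios. The following is a minimal sketch of that arithmetic, not DEEP/PAPI output: the dictionary keys are PAPI preset event names, but every number, the 1 GHz clock rate, and the `derived_metrics` helper are hypothetical and exist only for illustration.

```python
# Hypothetical raw counts for one loop nest, keyed by PAPI preset event names.
# The numbers are invented for illustration; they are not measured data.
counts = {
    "PAPI_TOT_CYC": 2_000_000_000,   # total cycles
    "PAPI_FP_INS": 400_000_000,      # floating point instructions
    "PAPI_L1_DCM": 5_000_000,        # level 1 data cache misses
    "PAPI_LD_INS": 80_000_000,       # load instructions
    "PAPI_SR_INS": 20_000_000,       # store instructions
    "PAPI_RES_STL": 500_000_000,     # cycles stalled on any resource
}
CLOCK_HZ = 1_000_000_000             # assumed 1 GHz clock


def derived_metrics(c, clock_hz):
    """Turn raw counter values into rates that are easier to interpret."""
    seconds = c["PAPI_TOT_CYC"] / clock_hz
    memory_refs = c["PAPI_LD_INS"] + c["PAPI_SR_INS"]
    return {
        # Counts instructions, so this only approximates a true MFlops rate.
        "mflops": c["PAPI_FP_INS"] / seconds / 1e6,
        "l1_misses_per_ref": c["PAPI_L1_DCM"] / memory_refs,
        "stall_fraction": c["PAPI_RES_STL"] / c["PAPI_TOT_CYC"],
    }


metrics = derived_metrics(counts, CLOCK_HZ)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Which events are actually available, and how many can be counted in one run, depends on the platform, so a real analysis may need several runs to collect all the inputs for these ratios.
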
To minimize overhead, DEEP/PAPI supports a "sampling" mode, where the instrumentation to collect PAPI counts is executed sparingly throughout the program and the results are scaled for the final reporting. This can give very good accuracy with very little intrusion into the running time of the program.
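
The idea behind such a sampling mode can be sketched as follows. This is an illustration of the general technique (measure a subset of executions, then scale up), not a description of how DEEP/PAPI implements it; the fixed sampling period, the `estimate_total` helper, and the synthetic per-call counts are all assumptions.

```python
def estimate_total(per_call_counts, period):
    """Measure every `period`-th call and scale the sum to all calls."""
    sampled_sum = 0
    sampled_calls = 0
    for call_index, count in enumerate(per_call_counts):
        if call_index % period == 0:   # instrumentation runs only here
            sampled_sum += count
            sampled_calls += 1
    if sampled_calls == 0:
        return 0.0
    # Scale by the ratio of total calls to measured calls.
    return sampled_sum * len(per_call_counts) / sampled_calls


# Synthetic event counts for 10,000 executions of one loop nest.
true_counts = [1000 + (i * 37) % 101 for i in range(10_000)]

exact = sum(true_counts)
estimate = estimate_total(true_counts, period=10)
relative_error = abs(estimate - exact) / exact
print(f"exact={exact} estimate={estimate:.0f} error={relative_error:.4%}")
```

With a period of 10, only one execution in ten pays the cost of reading the counters. The estimate stays close to the exact total here because the per-call counts vary little; code whose behavior changes sharply between calls would need more frequent sampling.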

Program Analysis

For an idea of the DEEP/PAPI tools, see the information on the DEEP/MPI program analysis mode. The program analysis mode includes tools for browsing program structure (such as the call tree viewer) and tools for examining program performance (profiling at various levels). These tools help the user create, understand, tune, and maintain parallel application codes at the original source code level. Structure browsing tools let the user answer questions such as: How are the procedures connected? Where are global variables referenced? Profiling tools answer questions such as: Where is the time spent? Which are the most important loops? Where are most of the messages passed? Where are most of the synchronizations done?
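
As a rough picture of what a call tree view conveys, the sketch below prints an indented tree from a table of caller-to-callee relationships. It is not DEEP code: the `render_call_tree` helper and the procedure names are hypothetical, and a real tool would derive the table from the program source rather than from a hand-written dictionary.

```python
def render_call_tree(calls, root, depth=0, path=()):
    """Return indented lines showing which procedures `root` reaches."""
    line = "  " * depth + root
    if root in path:                   # already on this call path: stop here
        return [line + " (recursive)"]
    lines = [line]
    for callee in calls.get(root, []):
        lines.extend(render_call_tree(calls, callee, depth + 1, path + (root,)))
    return lines


# Hypothetical call relationships for a small parallel solver.
calls = {
    "main": ["read_input", "solve"],
    "solve": ["exchange_halo", "update_cells"],
    "update_cells": ["solve"],         # recursion back into solve
}

tree = render_call_tree(calls, "main")
print("\n".join(tree))
```

The answer to "How are the procedures connected?" is then visible at a glance, including the recursive path from update_cells back to solve.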

Additional Information About DEEP/PAPI

Pricing and availability for DEEP/PAPI have not yet been determined. For additional information on DEEP/PAPI, please contact CBSC.

Copyright 2003 Crescent Bay Software Corp.