VAST/Parallel

VAST-F/Parallel (for Fortran) and VAST-C/Parallel (for C), from Crescent Bay Software, are automatic parallelizing preprocessors that can significantly improve the performance of your important applications on shared memory parallel platforms.


Overview


VAST parallel products increase program speed by transforming and restructuring the input source to use parallel threads, so the program can automatically make use of parallel processors. They fully support both automatic parallelization and the OpenMP standard for user-directed parallelization.

For those who want to get the utmost performance from their applications, VAST/Parallel can optionally produce a diagnostic listing that indicates areas for potential improvement. Using the listing and the VAST System's extensive set of switches and directives, you can tune performance even further.

Automatic Parallelization Features

The state-of-the-art parallelization technology used by the VAST system has been developed over many years of use on large systems and large application programs. Its features include:

Full Loop Nest Analysis. Loops are analyzed in simple and complicated loop nests; loops containing the largest amount of work are parallelized. Loops do not have to be tightly nested.
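As a rough illustration (hand-written OpenMP-style C, not actual VAST/Parallel output), the sketch below shows a loop nest that is not tightly nested, with the outer loop chosen for parallel execution because it carries the most work per iteration. The function name `scale_rows` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: scale each row of an n-by-m row-major matrix
   by its own factor. The nest is not tightly nested because the factor
   lookup sits between the two loop headers. */
void scale_rows(double *a, const double *factor, int n, int m)
{
    assert(n >= 0 && m >= 0);

    /* Parallelize the outer loop: each iteration owns one whole row. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double f = factor[i];              /* scalar code between the loops */
        for (int j = 0; j < m; j++) {
            a[(size_t)i * m + j] *= f;     /* inner loop stays serial per thread */
        }
    }
}
```

If the code is compiled without OpenMP enabled, the pragma is ignored and the nest runs serially with the same result.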

Extended Parallel Regions. VAST/Parallel extends parallel regions to include multiple parallel loops and intervening scalar code. This cuts down on parallel overhead.
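The idea can be sketched by hand in OpenMP-style C: one parallel region encloses two work-shared loops and the scalar statement between them, so the threads are started once rather than twice. This is an assumed illustration, not VAST/Parallel output; `scale_then_square` and its arguments are hypothetical names.

```c
#include <assert.h>

/* Hypothetical example: scale x, apply a boundary fix-up, then square
   into y, all inside a single parallel region. */
void scale_then_square(double *x, double *y, int n, double scale)
{
    assert(n >= 0);

    #pragma omp parallel
    {
        /* First parallel loop (implicit barrier at its end). */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            x[i] *= scale;
        }

        /* Intervening scalar code: executed by one thread only,
           with an implicit barrier before the next loop starts. */
        #pragma omp single
        {
            if (n > 0) {
                x[0] = 0.0;
            }
        }

        /* Second parallel loop reuses the same team of threads. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            y[i] = x[i] * x[i];
        }
    }
}
```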

Threshold testing. All parallel systems have some overhead. When VAST/Parallel finds a parallel region, if the amount of work in the region is not clear at compile time, then VAST/Parallel creates a run-time test. Through this run-time test, the parallel region will only be executed if there is enough work; otherwise, the original serial version is executed.
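A similar effect can be approximated by hand with the OpenMP `if` clause, as in the sketch below. The cutoff `WORK_THRESHOLD` is a made-up value for illustration, and the way VAST/Parallel actually estimates work and generates its test is not described here. With the `if` clause, a false condition makes the region run on a single thread rather than switching to a separate serial copy of the code.

```c
#include <assert.h>

/* Hypothetical cutoff: below this trip count, threading overhead is
   assumed to outweigh the benefit. Tune per machine. */
#define WORK_THRESHOLD 10000

/* Element-wise sum c = a + b, run in parallel only when n is large enough. */
void add_arrays(double *c, const double *a, const double *b, int n)
{
    assert(n >= 0);

    /* Run-time test: parallel execution only if there is enough work. */
    #pragma omp parallel for if (n >= WORK_THRESHOLD)
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```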

Dependence Analysis. VAST/Parallel has very sophisticated data dependency analysis capabilities that allow it to optimize complicated situations. All loop nests are examined to see if they can be executed in parallel safely. VAST/Parallel can resolve ambiguous subscripting by examining variable assignments outside of loops, and can restructure the use of variables to avoid certain other dependencies.

Potential Dependence Testing. When dependencies cannot be resolved at compile time, VAST/Parallel can sometimes generate run-time tests so that the loop still runs in parallel whenever the test shows it is safe.
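A hand-written sketch of such a test is shown below. The offset `k` is unknown at compile time, so whether iterations conflict can only be decided when the loop is reached. The function `shift_update` and the specific test are assumptions made for illustration, not the tests VAST/Parallel generates.

```c
#include <assert.h>

/* Hypothetical example: a[i + k] = 2 * a[i] for i = 0 .. n-1.
   The array must hold at least n + k elements, with k >= 0. */
void shift_update(double *a, int n, int k)
{
    assert(n >= 0 && k >= 0);

    /* Iterations are independent when each one touches only its own
       element (k == 0) or when the written range a[k .. k+n-1] does not
       overlap the read range a[0 .. n-1] (k >= n). */
    int independent = (k == 0) || (k >= n);

    /* Parallel only if the run-time test passes; otherwise one thread
       executes the iterations in their original order. */
    #pragma omp parallel for if (independent)
    for (int i = 0; i < n; i++) {
        a[i + k] = 2.0 * a[i];
    }
}
```

For 0 < k < n, iteration i writes an element that iteration i + k later reads, so the loop has to stay serial.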

Special Reduction Optimization. Summations and other reductions are parallelized through the use of locks or critical regions.
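The critical-region approach can be sketched by hand as follows: each thread accumulates a private partial sum, and the partial sums are combined one thread at a time. This is an illustrative assumption about the general technique, not VAST/Parallel's generated code; `sum_array` is a hypothetical name. In OpenMP the same effect is also available through the `reduction(+:total)` clause.

```c
#include <assert.h>

/* Hypothetical example: parallel summation using per-thread partial
   sums combined inside a critical region. */
double sum_array(const double *x, int n)
{
    double total = 0.0;                /* shared result */

    assert(n >= 0);

    #pragma omp parallel
    {
        double partial = 0.0;          /* private to each thread */

        #pragma omp for
        for (int i = 0; i < n; i++) {
            partial += x[i];
        }

        /* Only one thread at a time may update the shared total. */
        #pragma omp critical
        total += partial;
    }
    return total;
}
```

Because floating-point addition is not associative, a parallel sum can differ from the serial one in the last few bits when the data are not exactly representable.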

Shared/Private Determination. All variables in a parallel loop are categorized as shared (seen by all threads) or private (copy in each thread). VAST/Parallel can detect and create private arrays.
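In OpenMP terms, the classification corresponds to the `shared` and `private` data-sharing attributes. The hand-written sketch below uses a scratch array that every iteration overwrites before reading, which is why each thread can safely get its own copy. The names `row_prefix_sums`, `NCOL`, and `scratch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NCOL 4   /* hypothetical fixed row length */

/* Hypothetical example: running (prefix) sums along each row of a
   row-major nrows-by-NCOL matrix. */
void row_prefix_sums(double *out, const double *in, int nrows)
{
    double scratch[NCOL];   /* work array: must be private per thread */

    assert(nrows >= 0);

    /* in and out are shared (each row is touched by one iteration only);
       scratch is private so threads do not overwrite each other's data. */
    #pragma omp parallel for private(scratch)
    for (int i = 0; i < nrows; i++) {
        double running = 0.0;                  /* private: declared in loop */
        for (int j = 0; j < NCOL; j++) {
            running += in[(size_t)i * NCOL + j];
            scratch[j] = running;
        }
        for (int j = 0; j < NCOL; j++) {
            out[(size_t)i * NCOL + j] = scratch[j];
        }
    }
}
```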

Interprocedural Analysis for Parallel Calls. VAST/Parallel can examine call chains to determine their dependencies, and then parallelize loops containing calls or groups of calls outside loops.

Automatic recognition of parallel cases. When sections of code deal with disjoint operations, VAST/Parallel can process each section in a separate parallel case.
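The closest OpenMP construct is `sections`, where each independent block of code may be run by a different thread. The sketch below is a hand-written illustration under that assumption; `init_tables` and its arguments are hypothetical.

```c
#include <assert.h>

/* Hypothetical example: two blocks of code that write to disjoint
   arrays, so they can execute concurrently. */
void init_tables(double *squares, double *halves, int n)
{
    assert(n >= 0);

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 0; i < n; i++) {
                squares[i] = (double)i * (double)i;
            }
        }

        #pragma omp section
        {
            for (int i = 0; i < n; i++) {
                halves[i] = (double)i / 2.0;
            }
        }
    }
}
```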

Superscalar optimizations. VAST/Parallel includes scalar optimizations to boost performance even in a single thread. Parallel optimizations can be done to outer loops while inner loops are optimized for efficient execution on one thread.

Array Syntax. VAST-F/Parallel can in general parallelize and optimize multi-dimensional array syntax just as efficiently as loop nests.

Choice of static or dynamic partitioning of loop iterations. Load balancing can trade off against loop overhead. Use dynamic partitioning when you need better load balancing, and static partitioning when you are more concerned about overhead.
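In OpenMP code the same choice is expressed with the `schedule` clause. The sketch below is a hand-written illustration: a loop with uniform work per iteration uses static partitioning, while a triangular loop, whose later iterations do more work, uses dynamic partitioning. The function names and the chunk size of 4 are arbitrary assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Uniform work per iteration: static partitioning keeps overhead low. */
void axpy_static(double *y, const double *x, double alpha, int n)
{
    assert(n >= 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}

/* Row i sums i + 1 elements of an n-by-n row-major matrix, so the work
   grows with i. Dynamic partitioning hands out chunks of 4 rows as
   threads become free, which balances the load at some extra cost. */
void lower_triangle_row_sums(double *rowsum, const double *a, int n)
{
    assert(n >= 0);
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j <= i; j++) {
            s += a[(size_t)i * n + j];
        }
        rowsum[i] = s;
    }
}
```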

Number of threads can be set with an environment variable. This allows the degree of parallelism to be changed from run to run without recompiling your program: when the system is busy you can run with two threads, and when it is empty you can run with eight.
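This page does not name the variable used for automatically parallelized code. For OpenMP programs, the standard variable is `OMP_NUM_THREADS`, and a program can check the resulting thread limit at run time as in the sketch below. The helper name `available_threads` is hypothetical.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Report how many threads a parallel region may use. With OpenMP this
   reflects OMP_NUM_THREADS (or the implementation default); without
   OpenMP the program is serial, so the answer is 1. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Running the same executable with `OMP_NUM_THREADS` set to 2 on a busy system and to 8 on an idle one changes the value returned here without a rebuild.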

Choice of thread waiting strategy. You can select either busy waiting or sleep waiting for threads, so that the parallel program can adapt to loaded or dedicated workloads on the target system. Use busy waiting on a lightly loaded system, and sleep waiting when another job might need the cycles.

OpenMP Support

VAST/Parallel fully supports the OpenMP standard. (Currently version 1.0; we're in the process of implementing 2.0.) For calculations where you know exactly what you want parallelized, OpenMP provides a portable way to specify this.

VAST/Parallel supports all OpenMP directives/pragmas and functions, and provides diagnostics on incorrect use of the directives.

Features include:

  • Thread private common (choice of methods).
  • Orphan directives.
  • Nested parallelism.
  • Reduction optimizations.
  • Environment variables.
  • Efficient library implementation.
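As an illustration of one item in the list, the sketch below shows an orphan directive: a work-sharing `for` directive that appears in a function with no enclosing parallel region of its own, and that binds to whichever parallel region is active when the function is called. The function names are hypothetical, and the example is a generic OpenMP sketch rather than anything specific to VAST/Parallel.

```c
#include <assert.h>

/* Orphaned work-sharing directive: there is no parallel region in this
   function. Called from inside a parallel region, the iterations are
   divided among that region's threads; called from serial code, the
   loop simply runs on the calling thread. */
static void fill_squares(int *out, int n)
{
    #pragma omp for
    for (int i = 0; i < n; i++) {
        out[i] = i * i;
    }
}

/* The parallel region lives in the caller. */
void build_table(int *out, int n)
{
    assert(n >= 0);

    #pragma omp parallel
    {
        fill_squares(out, n);
    }
}
```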

DEEP

VAST/Parallel works very closely with the DEEP development environment to provide a complete graphical interface to the world of data parallel programming. VAST/Parallel gathers compile-time data for DEEP, and inserts instrumentation code for run-time data gathering. DEEP uses this information to display in detail the compile-time optimization notes (which loop nests have been parallelized, where data dependencies are preventing parallelization, etc.) and run-time performance data (which loop nests use the most wallclock time, which procedures are called the most, etc.) in many useful views of the program. With DEEP, you can very quickly zoom in on any performance bottlenecks in your code.

Performance

Here are some benchmark timings from a two-processor ("2-headed") RS6000 workstation. The runs were made on a dedicated machine using static partitioning, with VAST/Parallel inlining (-e78) and inner-loop parallelization (-ei) enabled, and with -O passed to the native compiler.

Benchmark    Speedup
tomcatv      1.6
hydro2d      1.8
dnasa7       1.5
matrix300    3.7
lss          1.6

As you can see, significant speed-ups can be obtained with very little work on the part of the user. Of course, these benchmarks are generally highly parallel, and the speed-ups you see on your own code will depend heavily on coding style. However, VAST/Parallel can help you restructure your code, if necessary, by producing an annotated listing file with informative messages explaining why each loop was not optimized. Some of the reported problems are only "potential" ones that the user can override with directives or options, thereby gaining additional performance.

Additional information on VAST/Parallel is available in this white paper.

Copyright 2003, 2005 Crescent Bay Software Corp.