|
VAST/AltiVec features include:
- Optimization of entire loop nests, not just inner loops.
Critical optimizations include loop fusion (squeezing multiple loops into one loop),
outer loop unrolling (unrolling an outer loop inside an inner loop),
loop collapse (making one long loop from a multi-dimensional loop nest),
and loop interchange (changing the order of the loops in a loop nest to get
more efficient memory access).
- Unrolled vector loops. Unrolling vectorized loops is very important
in making sure that the vector instructions are overlapped to the
maximum extent possible.
- Vectorization of reduction loops. Includes array summations, dot products,
minimum and maximum element of an array, product of array elements, etc. These
operations take a large fraction of the CPU time for many programs.
- Vectorization of conditional loops. "if" statements and conditional operators
are vectorized.
- Non-aligned vectors can be vectorized efficiently. VAST introduces "permute"
operations to align vectors "on the fly" prior to computation.
- 32-bit float and 8-, 16-, and 32-bit integer vectorization. Integers can be
signed or unsigned. VAST can also vectorize loops that contain mixed data sizes.
- ALIGNED pragma so that the user can inform VAST-C about arrays that
are aligned on 16-byte boundaries. The -Valigned command line switch is also available.
- -Vmessages switch to get vectorization messages for all loops in the program.
Find out what constructs are inhibiting vectorization of your important loops.
- DISJOINT, NODEPCHK pragmas for disambiguating data dependencies. Especially
useful if the target program uses lots of pointers rather than array notation.
- -L parameter for assertion levels to allow vectorization in the
presence of pointer arguments. Can be very useful if the program is written to pass most of the
data as pointer arguments.
- Vector load lifting. Move all loads to the top of the loop, as far as they will
go (safely). Allows the compiler to do a better job of instruction scheduling.
- Vectorization of complex data types. Uses the permute instructions to reorder
interleaved complex data so that it can be operated on with the vector unit.
- Testing for stride one on loops with variable stride. Inserts a run-time test
to see if variable array strides are all one; executes a vector version of the loop
if the strides are one, otherwise executes the original scalar loop.
- Partial vectorization of loops with strided or gather/scatter vectors.
- Vectorization of "table lookup" loops. Loops that have a branch out of the loop
can be vectorized in certain cases.
These and other features are explained in much more detail in the VAST/AltiVec User's Guide,
which comes with the product.
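
As a rough illustration of two of the transformations listed above, the sketch below writes out loop collapse and the run-time stride-one test by hand in plain C. It is not VAST output and uses no VAST pragmas or switches; the function names (add_nested, add_collapsed, scale_strided) and their parameters are hypothetical and chosen only for this example. The comments mark which loops a vectorizer could plausibly target.

```c
#include <stddef.h>

/* Hypothetical example: element-wise add over a rows x cols array stored
   contiguously in row-major order, written as a two-level loop nest. */
void add_nested(float *c, const float *a, const float *b,
                size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            c[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}

/* Loop collapse done by hand: because the data is contiguous, the nest
   becomes one long loop of rows * cols iterations, which gives longer
   vectors and less loop overhead than a short inner loop. */
void add_collapsed(float *c, const float *a, const float *b,
                   size_t rows, size_t cols)
{
    size_t n = rows * cols;
    for (size_t k = 0; k < n; k++)
        c[k] = a[k] + b[k];
}

/* Run-time stride test done by hand: the strides are only known at run
   time, so the loop is versioned. The unit-stride branch touches
   contiguous memory and is the candidate for vector code; the other
   branch is the original scalar loop. Strides are assumed positive. */
void scale_strided(float *a, const float *b, size_t n,
                   size_t stride_a, size_t stride_b, float k)
{
    if (stride_a == 1 && stride_b == 1) {
        for (size_t i = 0; i < n; i++)      /* vectorizable version */
            a[i] = k * b[i];
    } else {
        for (size_t i = 0; i < n; i++)      /* original scalar loop */
            a[i * stride_a] = k * b[i * stride_b];
    }
}
```

Both versions of each loop compute the same result; the point of the rewritten form is only to expose long, contiguous, unit-stride accesses. A vectorizing preprocessor performs this kind of rewriting automatically when it can prove the transformation is safe.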
|