

VAST/AltiVec performance
Overview
Performance of existing C programs can vary greatly depending on the programming
style. If a program works primarily on arrays and spends most of its time in loops,
it can generally be cast into a vectorizable form with a little effort on your part,
if it is not vectorizable already. Programs that spend all their time chasing pointers
or making function calls are generally poor candidates for vectorization and will
normally see little speedup. For more performance tips, see the FAQ.
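As a rough illustration of the two programming styles described above, here is a minimal C sketch. The names `array_sum`, `list_sum`, and `struct node` are hypothetical and not part of VAST. The first loop walks an array with unit stride and no calls in its body, which is the shape a vectorizer looks for; the second follows a linked list, where each iteration must wait for the previous `next` load before it knows which element comes next.

```c
#include <assert.h>
#include <stddef.h>

/* Array style: unit-stride accesses, a trip count known before the loop
   starts, and no function calls in the body. A vectorizer can process
   several elements of a[] per instruction. */
int array_sum(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical linked-list node, used only to illustrate pointer chasing. */
struct node {
    int value;
    struct node *next;
};

/* Pointer-chasing style: the address of the next element is not known until
   the current node has been loaded, so there is no independent stream of
   elements to pack into a vector register. */
int list_sum(const struct node *p)
{
    int sum = 0;
    for (; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```

Both functions return the same total for the same data; only the first is a good candidate for vectorization.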
The performance numbers quoted here are actual results obtained on a Linux/G4 system using gcc with -O3 on both the original and the vectorized code. Performance is quoted as a speedup factor: the ratio of the original (scalar) run time to the VASTed (vectorized) run time.
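The speedup factor defined above is S = T_scalar / T_vector. A minimal C helper is sketched below; the name `speedup_factor` is hypothetical, and the timings in the usage note are made-up values for illustration, not measurements from this page.

```c
#include <assert.h>

/* Speedup factor: original (scalar) run time divided by vectorized run time.
   A value of 6.0 means the vectorized code ran six times faster; a value
   below 1.0 would mean it ran slower. */
double speedup_factor(double scalar_seconds, double vector_seconds)
{
    assert(scalar_seconds > 0.0 && vector_seconds > 0.0);
    return scalar_seconds / vector_seconds;
}
```

For example, hypothetical timings of 12.0 s for the scalar code and 2.0 s for the vectorized code give a speedup factor of 6.0.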
Signal Processing
The first set of timings shows the speedup due to VAST on a set of signal processing
kernels. These kernels operate on 16-bit integer input data, frequently summing into 32-bit
quantities. The VASTed versions of the kernels run six to eighteen times faster than
the original code. For comparison, we also show the speedup of a hand-vectorized version
(from an independent organization). VAST's speedups are generally very close to the
hand-vectorized speedups. Since the hand-coded versions took over a man-month to create,
the advantage of using VAST is clear.
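The kernels themselves are not listed on this page. As a hedged sketch of the data-type pattern described (16-bit integer inputs summed into a 32-bit result), here is a dot product written in the simple array-loop form discussed in the overview. The name `dot_i16` is hypothetical, and this is not one of the benchmarked kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit integer samples multiplied pairwise and summed into a 32-bit
   accumulator. Each product is computed in 32-bit arithmetic before it is
   added. Assumes the total fits in int32_t (signed overflow is undefined
   in C). */
int32_t dot_i16(const int16_t *restrict x, const int16_t *restrict h, size_t n)
{
    assert((x != NULL && h != NULL) || n == 0);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * (int32_t)h[i];
    return acc;
}
```

Widening before accumulating is why such kernels sum into 32-bit quantities: a single 16-bit by 16-bit product can be as large as 2^30 (from -32768 * -32768), far outside the 16-bit range.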
Floating Point Benchmarks
Below are some performance numbers for several well-known C floating point benchmarks.
These speedup factors are smaller than those for the signal processing kernels because floating
point operands are 32 bits wide, so only four fit in a 128-bit vector register (compared with
eight 16-bit integers). Also, these are larger applications, and not all of each program can be vectorized.
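The lane-count argument above is simple arithmetic: an AltiVec register is 128 bits (16 bytes) wide, so it holds eight 16-bit integers but only four 32-bit floats. The sketch below also includes Amdahl's law, a standard model (not one stated on this page) for the second point, that the unvectorized part of a larger application limits the overall speedup. The names `lanes_per_vector`, `amdahl_speedup`, and `VECTOR_BYTES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AltiVec vector registers are 128 bits (16 bytes) wide. */
#define VECTOR_BYTES 16u

/* Elements of a given size per vector register: an upper bound on how many
   operations one vector instruction performs for that data type. */
size_t lanes_per_vector(size_t elem_bytes)
{
    assert(elem_bytes > 0 && elem_bytes <= VECTOR_BYTES);
    return VECTOR_BYTES / elem_bytes;
}

/* Amdahl's law: if a fraction f of the original run time is vectorized and
   that part runs s times faster, the whole program speeds up by
   1 / ((1 - f) + f / s). */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

For instance, amdahl_speedup(0.5, 4.0) is about 1.6: even if the vectorized half of a program ran at the full four-lane rate, the program as a whole would be well under four times faster.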
Fortran
Under construction.
It is important to note that the AltiVec unit does not support double precision (64-bit) floating point, and many Fortran programs are written to use it. Only single precision (32-bit) floating point programs will see a speedup with AltiVec. Here are some results for single precision benchmarks, comparing g77 -O3... :
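To make the single-precision point concrete in C terms, consider the SAXPY pattern y = a*x + y. Written with float, the loop below is a candidate for AltiVec; the identical loop written with double gives AltiVec nothing to work with, since it has no 64-bit floating point operations. The name `saxpy_f32` is hypothetical, and this is an illustrative sketch, not one of the benchmarks referred to above.

```c
#include <assert.h>
#include <stddef.h>

/* Single-precision y = a*x + y. With 32-bit floats, four elements fit in one
   128-bit AltiVec register, so this loop can be vectorized. Changing float
   to double here would leave a loop that AltiVec cannot accelerate. */
void saxpy_f32(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert((x != NULL && y != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```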