VAST/AltiVec performance
Overview
The speedup obtained on existing C programs varies greatly with programming style. If a program works primarily on arrays and spends most of its time in loops, it can generally be cast into a vectorizable form with a little effort on your part, if it is not already vectorizable. Programs that spend most of their time chasing pointers or making function calls are generally not good candidates for vectorization, and will normally see little speedup. For more performance tips, see the FAQ.
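To make the contrast concrete, here is a minimal sketch of the two styles. The function and type names (scale_add, list_sum, struct node) are hypothetical and chosen only for illustration; the sketch shows the shape of the code and makes no claim about what VAST generates for it.

```c
#include <stddef.h>

/* Array-based loop: every iteration is independent and walks
   contiguous memory, the style described above as a good
   candidate for vectorization. */
void scale_add(float *out, const float *a, const float *b,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * k + b[i];
}

/* Pointer-chasing loop: the address of each element is only known
   after the previous element has been loaded, so iterations cannot
   be grouped into vector operations. */
struct node {
    float value;
    struct node *next;
};

float list_sum(const struct node *head)
{
    float sum = 0.0f;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```

Moving data out of a linked structure and into a plain array before the hot loop is one common way to get from the second style to the first.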

The performance numbers quoted here are actual results obtained on a Linux/G4 system using gcc with -O3 on both the original and the vectorized code. Performance is quoted as a speedup factor, which is the ratio of the original (scalar) time to the VASTed (vectorized) time.
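As a rough illustration of how such a speedup factor can be computed, the sketch below times a kernel with the standard C clock() function and divides the scalar time by the vectorized time. The helper names (seconds_for, speedup) and the repeat-count approach are assumptions made for this example, not the harness used to produce the numbers on this page.

```c
#include <time.h>

/* Hypothetical kernel signature used only for this sketch. */
typedef void (*kernel_fn)(void);

/* CPU time in seconds spent running fn() `repeats` times. */
double seconds_for(kernel_fn fn, int repeats)
{
    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Speedup factor as defined above: scalar time / vectorized time.
   A value above 1.0 means the vectorized code is faster. */
double speedup(double scalar_seconds, double vector_seconds)
{
    return scalar_seconds / vector_seconds;
}
```

For example, a kernel that takes 10 seconds in scalar form and 2 seconds when vectorized has a speedup factor of 5.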

Signal Processing
The first set of timings shows the speedup due to VAST on a set of signal processing kernels. These kernels operate on 16-bit integer input data, frequently summing into 32-bit quantities. The VASTed versions of the kernels run roughly six to twenty-five times faster than the original code. For comparison, we also show the speedup of a hand-vectorized version (from an independent organization). VAST's speedups are generally close to the hand-vectorized speedups, and in a few cases exceed them. As the hand-coded versions took over a man-month to create, the advantage of using VAST is clear.

Benchmark Kernel    VAST Speedup    Hand Speedup
block                6.18            7.69
lms                  6.96            7.90
wht                  7.84            8.27
cfir                 7.89            8.91
rmax                10.07           11.61
iir                 13.54           12.05
miir                10.97           13.51
dotsqu              17.14           16.73
corr                18.93           22.22
fir                 18.99           22.44
energy              24.80           26.01
mfir                17.51           31.97
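The source of the benchmark kernels is not shown here, so the following is only a guess at the general shape of a kernel of this class: 16-bit integer inputs whose products are accumulated into a 32-bit sum. The names energy16 and dot16 are hypothetical, and the sketch assumes the inputs are small enough that the 32-bit accumulator does not overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of squares of 16-bit samples, accumulated in 32 bits.
   Assumes the total fits in an int32_t. */
int32_t energy16(const int16_t *x, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)x[i] * x[i];
    return sum;
}

/* Dot product of two 16-bit arrays, accumulated in 32 bits.
   Same overflow assumption as above. */
int32_t dot16(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)x[i] * y[i];
    return sum;
}
```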

Floating Point Benchmarks
Below are performance numbers for several well-known C floating point benchmarks. These speedup factors are smaller than those of the signal processing kernels because floating point operands are 32 bits wide, so only four fit in a vector register. Also, these are larger applications, and not all of each program can be vectorized.

Benchmark          Speedup
Linpack 100x100    1.78
Alvinn             1.81
Ear                2.27
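The operand-width argument above is simple arithmetic on the 128-bit AltiVec register size. The sketch below spells it out; the names VECTOR_BITS and lanes_per_vector are hypothetical and exist only for this example.

```c
/* AltiVec vector registers are 128 bits wide. */
enum { VECTOR_BITS = 128 };

/* Number of elements of the given width that fit in one register. */
unsigned lanes_per_vector(unsigned element_bits)
{
    return VECTOR_BITS / element_bits;
}
```

With 32-bit floats this gives four elements per register, while the 16-bit integers used by the signal processing kernels give eight, which is one reason those kernels have more room for speedup.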

Fortran
Under construction.

It is important to note that the AltiVec unit does not support double precision (64-bit) floating point, which many Fortran programs use. Only single precision (32-bit) floating point programs will see a speedup with AltiVec. Here are results for some single precision benchmarks, comparing g77 -O3... :

Benchmark                   Speedup
Vector Add (non-aligned)    3.07
Vector Add (aligned)        4.30
SU2COR (q.mech., SPEC92)    2.07
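The benchmark source is not shown, so the sketch below is only an assumption about what "aligned" means for the vector add: here it is taken to mean that the arrays start on a 16-byte boundary, matching the 128-bit vector register size. It is written in C rather than Fortran for consistency with the rest of this page, and the names is_aligned16 and vector_add are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Single precision vector add, the operation being timed above. */
void vector_add(float *c, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

An array that starts one float past an aligned address, for example, is 4 bytes off the boundary and falls into the non-aligned case.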


Copyright 2003, 2005 Crescent Bay Software Corp.