Run many independent loops simultaneously on the GPU.
The gfor-loop construct may be used to simultaneously launch all of the iterations of a for-loop on the GPU, as long as the iterations are independent. While the standard for-loop performs each iteration sequentially, ArrayFire's gfor-loop performs each iteration at the same time (in parallel). ArrayFire does this by tiling out the values of all loop iterations and then performing computation on those tiles in one pass.
You can think of GFOR as performing auto-vectorization of your code. For example, you might write a gfor-loop that increments every element of a vector:

gfor (array i, n) A(i) = A(i) + 1;

Behind the scenes, ArrayFire rewrites it to operate on the entire vector in parallel, using this equivalent and faster version:
A = A + 1;
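The same idea can be sketched in plain C++ with std::valarray, which supports whole-array arithmetic. This is only an analogy, not ArrayFire code, and the function names are hypothetical. The "tiled" version materializes every value the loop index takes, evaluates the loop body once on that whole index set, and so ends up equivalent to A = A + 1.

```cpp
#include <cstddef>
#include <numeric>
#include <valarray>

// Plain C++ analogy using std::valarray (not ArrayFire); names are hypothetical.

// Sequential for-loop: one iteration at a time.
std::valarray<double> increment_loop(std::valarray<double> A) {
    for (std::size_t i = 0; i < A.size(); ++i) {
        A[i] = A[i] + 1;
    }
    return A;
}

// "Tiled" version: build the tile of all iterator values {0, 1, ..., n-1},
// then evaluate the body A(i) = A(i) + 1 once on the whole tile.
std::valarray<double> increment_tiled(std::valarray<double> A) {
    std::valarray<std::size_t> iter(A.size());
    std::iota(std::begin(iter), std::end(iter), std::size_t{0});
    std::valarray<double> body = std::valarray<double>(A[iter]) + 1.0;  // gather and compute
    A[iter] = body;                                                       // scatter results back
    return A;
}

// Fully vectorized form: the equivalent of A = A + 1.
std::valarray<double> increment_vectorized(const std::valarray<double>& A) {
    return A + 1.0;
}
```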
It is best to vectorize computation as much as possible to avoid the overhead of both for-loops and gfor-loops.
As another example, you could run an FFT on every 2D slice of a volume in a for-loop, or you could "vectorize" and do it all in one gfor-loop operation:
for (int i = 0; i < N; ++i)
    A(span,span,i) = fft2(A(span,span,i));  // runs each FFT in sequence

gfor (array i, N)
    A(span,span,i) = fft2(A(span,span,i));  // runs N FFTs in parallel
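There are three formats for instantiating GFOR loops:

gfor (array var, n)                  creates the sequence {0, 1, ..., n-1}
gfor (array var, first, last)        creates the sequence {first, first+1, ..., last}
gfor (array var, first, inc, last)   creates the sequence {first, first+inc, first+2*inc, ..., last}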
So all of the following represent the same sequence, 0, 1, 2, 3, 4:

gfor (array i, 5)
gfor (array i, 0, 4)
gfor (array i, 0, 1, 4)
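To make the three formats concrete, the hypothetical gfor_sequence overloads below (plain C++, not part of ArrayFire) list the values an iterator takes in each case. They only enumerate indices, and nothing runs in parallel. Two assumptions apply. First, the three-argument form takes (first, inc, last), the order suggested by the set notation. Second, with inc > 0, when last is not reached exactly the sequence stops at the largest value that does not exceed it.

```cpp
#include <vector>

// Hypothetical helpers (plain C++, not ArrayFire API): list the values a gfor
// iterator takes under each of the three formats. Nothing here runs in parallel.

// {first, first+inc, first+2*inc, ..., last}; assumes inc > 0 and stops at the
// largest value that does not exceed last.
std::vector<int> gfor_sequence(int first, int inc, int last) {
    std::vector<int> values;
    for (int v = first; v <= last; v += inc) {
        values.push_back(v);
    }
    return values;
}

// {first, first+1, ..., last}
std::vector<int> gfor_sequence(int first, int last) {
    return gfor_sequence(first, 1, last);
}

// {0, 1, ..., n-1}
std::vector<int> gfor_sequence(int n) {
    return gfor_sequence(0, 1, n - 1);
}
```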
More examples:
array A = ones(n,n);
array B = ones(1,n);
gfor (array k, 0, n-1) {
    B(k) = A(k,span) * A(span,k);  // inner product
}
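Iteration k reads row k and column k of A and writes only B(k), so the iterations are independent. If the loop body is read as the inner product its comment describes, each iteration computes B(k) = sum over j of A(k,j) * A(j,k). That is the k-th row sum of the element-wise product of A and its transpose, so the whole loop collapses into whole-matrix operations. The plain C++ sketch below checks that the per-iteration loop and the collapsed form agree. It is not ArrayFire code, and the row-major storage and function names are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Row-major n x n matrix stored in a flat vector (an assumption of this sketch).
using Matrix = std::vector<double>;

// Sequential version: iteration k dots row k of A with column k of A.
std::vector<double> row_col_products_loop(const Matrix& A, std::size_t n) {
    std::vector<double> B(n, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        double dot = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            dot += A[k * n + j] * A[j * n + k];  // A(k,j) * A(j,k)
        }
        B[k] = dot;  // each iteration writes only B[k]
    }
    return B;
}

// Collapsed version: form P = A .* A^T element-wise, then take row sums of P.
std::vector<double> row_col_products_vectorized(const Matrix& A, std::size_t n) {
    Matrix P(n * n);
    for (std::size_t idx = 0; idx < n * n; ++idx) {
        std::size_t r = idx / n, c = idx % n;
        P[idx] = A[idx] * A[c * n + r];  // A(r,c) * A(c,r)
    }
    std::vector<double> B(n, 0.0);
    for (std::size_t idx = 0; idx < n * n; ++idx) {
        B[idx / n] += P[idx];  // row sum
    }
    return B;
}
```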
array A = ones(n,n,m);
array B = ones(n,n);
gfor (array k, 0, m-1) {
    A(span,span,k) = A(span,span,k) * B;  // matrix-matrix multiply
}
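Each iteration multiplies a different slice by the same B, so the iterations are independent. One way to see how they can be done in one pass is the identity [A_0; A_1; ...; A_{m-1}] * B = [A_0*B; A_1*B; ...; A_{m-1}*B]. Stacking the slices vertically turns m small products into one tall product. The plain C++ sketch below assumes row-major storage, under which the slices laid end to end already form that (m*n) x n matrix. It illustrates the math only and does not describe how ArrayFire evaluates the loop. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Row-major matrices in flat vectors (assumption of this sketch).
// Returns C (rows x n) = X (rows x n) * B (n x n).
std::vector<double> matmul(const std::vector<double>& X, std::size_t rows,
                           const std::vector<double>& B, std::size_t n) {
    std::vector<double> C(rows * n, 0.0);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t c = 0; c < n; ++c)
                C[r * n + c] += X[r * n + p] * B[p * n + c];
    return C;
}

// Sequential: multiply each n x n slice of the m-slice volume by B separately.
std::vector<double> batched_loop(const std::vector<double>& A, std::size_t n,
                                 std::size_t m, const std::vector<double>& B) {
    std::vector<double> out(A.size());
    for (std::size_t k = 0; k < m; ++k) {
        auto begin = A.begin() + static_cast<std::ptrdiff_t>(k * n * n);
        std::vector<double> slice(begin, begin + static_cast<std::ptrdiff_t>(n * n));
        std::vector<double> prod = matmul(slice, n, B, n);
        std::copy(prod.begin(), prod.end(),
                  out.begin() + static_cast<std::ptrdiff_t>(k * n * n));
    }
    return out;
}

// One pass: treat the stacked slices as a single (m*n) x n matrix.
std::vector<double> batched_stacked(const std::vector<double>& A, std::size_t n,
                                    std::size_t m, const std::vector<double>& B) {
    return matmul(A, m * n, B, n);
}
```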
Use local() to indicate that each iteration should have its own local copy of a variable:
array A = ones(n,m);
array B = ones(A.dims());
gfor (array k, m) {               // k = 0, 1, ..., m-1
    array a = A(span,k);          // local() not needed: subscripting with the iterator already selects a per-iteration slice
    array b = local(zeros(n,1));  // local() needed
    b(seq(2)) = a(seq(2));        // each GFOR tile gets its own unique copy of 'b'
    B(span,k) = b;
}
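The sketch below models the local() example in plain C++ under stated assumptions. Matrices are column-major flat vectors, seq(2) selects the first two elements, and the function names are hypothetical. local_loop is the sequential reference, which creates a fresh b in every iteration. local_tiled pictures the tiled evaluation. A single shared n x 1 b could hold only one iteration's values, so each tile's copy becomes one column of an n x m array.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical plain C++ model of the local() example. Column-major storage:
// column k of an n x m matrix is elements [k*n, (k+1)*n).
// Assumes seq(2) selects elements 0 and 1.

// Reference: sequential loop where each iteration creates its own 'b'.
std::vector<double> local_loop(const std::vector<double>& A, std::size_t n, std::size_t m) {
    std::vector<double> B(n * m, 1.0);                    // B = ones(A.dims())
    for (std::size_t k = 0; k < m; ++k) {
        std::vector<double> b(n, 0.0);                    // b = zeros(n,1), fresh per iteration
        for (std::size_t i = 0; i < 2 && i < n; ++i) {
            b[i] = A[k * n + i];                          // b(seq(2)) = a(seq(2))
        }
        std::copy(b.begin(), b.end(),
                  B.begin() + static_cast<std::ptrdiff_t>(k * n));  // B(span,k) = b
    }
    return B;
}

// Tiled picture: the body is evaluated once for all m iterations, so each
// tile's private 'b' is modeled as one column of an n x m array.
std::vector<double> local_tiled(const std::vector<double>& A, std::size_t n, std::size_t m) {
    std::vector<double> b(n * m, 0.0);                    // one zero column per tile
    for (std::size_t k = 0; k < m; ++k) {                 // every tile's b(seq(2)) = a(seq(2))
        for (std::size_t i = 0; i < 2 && i < n; ++i) {
            b[k * n + i] = A[k * n + i];
        }
    }
    // B(span,k) = b overwrites every column of B, so the tiled 'b' is the final B.
    return b;
}
```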
See also:
See Main Wiki for more examples