
Parallel loops: gfor

Run many independent loops simultaneously on the GPU.

Introduction

The gfor-loop construct may be used to simultaneously launch all of the iterations of a for-loop on the GPU, as long as the iterations are independent. While the standard for-loop performs each iteration sequentially, ArrayFire's gfor-loop performs each iteration at the same time (in parallel). ArrayFire does this by tiling out the values of all loop iterations and then performing computation on those tiles in one pass.

You can think of GFOR as auto-vectorizing your code. For example, you write a gfor-loop that increments every element of a vector, and ArrayFire rewrites it to operate on the entire vector in parallel. Compare the sequential for-loop with its gfor-loop counterpart:

   for (int i = 0; i < n; ++i)
       A(i) = A(i) + 1;

   gfor (array i, n)
       A(i) = A(i) + 1;

Behind the scenes, ArrayFire rewrites your code into this equivalent and faster version:

   A = A + 1;

It is best to vectorize computation as much as possible to avoid the overhead in both for-loops and gfor-loops.
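
The relationship between the element-by-element loop and the whole-array expression can be sketched on the CPU with standard C++ alone. This is only an analogy, not ArrayFire code: std::valarray stands in for an ArrayFire array, and the function names increment_loop and increment_vectorized are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <valarray>

// Sequential version: one element is updated per loop iteration.
std::valarray<float> increment_loop(std::valarray<float> a) {
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = a[i] + 1.0f;
    return a;
}

// Whole-array version: a single expression covers every element,
// similar in spirit to the vectorized `A = A + 1` shown above.
std::valarray<float> increment_vectorized(std::valarray<float> a) {
    a = a + 1.0f;
    return a;
}
```

Both functions return the same values. The second form simply leaves the choice of how to walk the elements to the library, which is the freedom a vectorized expression gives to a GPU backend.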

As another example, you could run an FFT on every 2D slice of a volume sequentially in a for-loop, or you could "vectorize" the work and run all of the FFTs at once in a single gfor-loop:

   for (int i = 0; i < N; ++i)
       A(span,span,i) = fft2(A(span,span,i)); // runs each FFT in sequence

   gfor (array i, N)
       A(span,span,i) = fft2(A(span,span,i)); // runs N FFTs in parallel

There are three formats for instantiating GFOR loops.

  1. gfor(n) creates the sequence {0, 1, ..., n-1}
  2. gfor(first,last) creates the sequence {first, first+1, ..., last}
  3. gfor(first,incr,last) creates the sequence {first, first+incr, first+2*incr, ..., last}

All of the following therefore iterate over the same sequence, 0,1,2,3,4:

   gfor (array i, 5)
   gfor (array i, 0, 4)
   gfor (array i, 0, 1, 4)
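
To make the three forms concrete, the iteration values they describe can be listed with a small standard C++ helper. This is a sketch, not part of ArrayFire: gfor_sequence is a hypothetical name, it models only the index values (nothing runs in parallel), and it assumes a positive increment and an inclusive last value, as the list above states.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an ArrayFire function): lists the iteration
// values for the form gfor(first, incr, last), with `last` inclusive.
std::vector<int> gfor_sequence(int first, int incr, int last) {
    assert(incr > 0 && "this sketch only handles positive increments");
    std::vector<int> values;
    for (int v = first; v <= last; v += incr)
        values.push_back(v);
    return values;
}

// Form gfor(first, last): the increment defaults to 1.
std::vector<int> gfor_sequence(int first, int last) {
    return gfor_sequence(first, 1, last);
}

// Form gfor(n): the sequence 0, 1, ..., n-1.
std::vector<int> gfor_sequence(int n) {
    return gfor_sequence(0, 1, n - 1);
}
```

Calling gfor_sequence(5), gfor_sequence(0, 4), and gfor_sequence(0, 1, 4) yields the same five values, mirroring the three equivalent gfor headers above.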

More examples:

   array A = ones(n,n);
   array B = ones(1,n);
   gfor (array k, 0, n-1) {
       B(k) = A(k,span) * A(span,k);  // inner product
   }

   array A = ones(n,n,m);
   array B = ones(n,n);
   gfor (array k, 0, m-1) {
       A(span,span,k) = A(span,span,k) * B; // matrix-matrix multiply
   }

   array A = randu(n,m);
   array B = zeros(n,m);
   gfor (array k, 0, m-1) {
       B(span,k) = fft(A(span,k)); // FFT of each column
   }

Use local() to indicate that each iteration has its own local copy of a variable:

   array A = ones(n,m);
   array B = ones(A.dims());
   gfor (array k, m) { // 0, 1, ..., m-1
       array a = A(span,k); // local() not needed: subscripting with the iterator k is already per-iteration
       array b = local(zeros(n,1)); // local() needed
       b(seq(2)) = a(seq(2));  // each GFOR tile gets its own unique copy of 'b'
       B(span,k) = b;
   }
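
The effect of the per-iteration copy can be modelled sequentially in standard C++. This is a sketch under stated assumptions, not ArrayFire code: copy_first_two and Column are hypothetical names, each inner vector stands for one column A(span,k), and seq(2) is assumed to select indices 0 and 1, by analogy with gfor(n) producing {0, ..., n-1}.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Column = std::vector<float>;

// Plain-C++ model of the loop above: A[k] plays the role of A(span,k).
std::vector<Column> copy_first_two(const std::vector<Column>& A) {
    std::vector<Column> B(A.size());
    for (std::size_t k = 0; k < A.size(); ++k) {
        const Column& a = A[k];        // column k, selected by the iterator
        Column b(a.size(), 0.0f);      // fresh zero column owned by this iteration
        // Assumption: seq(2) means the first two indices, 0 and 1.
        const std::size_t count = std::min<std::size_t>(2, a.size());
        std::copy(a.begin(), a.begin() + static_cast<std::ptrdiff_t>(count), b.begin());
        B[k] = b;                      // store this iteration's private result
    }
    return B;
}
```

Because b is created inside the loop body, no iteration can see another iteration's writes. That isolation is what local() requests for a variable that is not already subscripted by the iterator.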

See also:

Usage and limitations

See Main Wiki for more examples