
Quick Reference

Basic data types and arithmetic

There is one generic array container object, while the underlying data may be one of several basic types, such as f32, f64, c32, and c64, all of which appear in the examples below.

You can generate matrices directly on the device. The default underlying datatype is f32 (float) unless otherwise specified. Some examples:

    zeros(3)         // 3-by-1 column of zeros of single-precision (f32 default)
    ones(3, 2, f64)  // 3-by-2 matrix of ones of double-precision
    randu(1, 8)      // row vector (1x8) of random values (uniformly distributed)
    randn(2, 2)      // square matrix (2x2) of random values (normally distributed)
    identity(3, 3)   // 3-by-3 identity (ones along diagonal, zero elsewhere)
    randu(5, 7, c32) // complex values, all real and imaginary components from a uniform distribution

You can also initialize values from a host array:

    float hA[] = {0,1,2,3,4,5};
    array A(hA, 2, 3); // 2x3 matrix of single-precision
    print(A);
    // A = | 0 2 4 |
    //     | 1 3 5 |
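The printed layout implies column-major storage: consecutive host values fill the first column, then the second, and so on. A minimal plain C++ sketch of that mapping, assuming column-major order and using no ArrayFire calls (`at_col_major` is a hypothetical helper introduced only for illustration):

```cpp
#include <cstddef>

// Hypothetical helper: value at (row, col) of a column-major matrix
// that has `rows` rows. Consecutive elements of `data` run down a column.
inline float at_col_major(const float* data, std::size_t rows,
                          std::size_t row, std::size_t col) {
    return data[row + col * rows];
}
```

With the host array above and rows = 2, at_col_major(hA, 2, 0, 1) reads hA[2], which is the 2 shown in the first row, second column.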

You can print the contents of an array or expression:

    array a = randu(2,2);
    array b = ones(2,1);
    print(a);
    print(b);
    print(a.col(0) + b + .4);

There are hundreds of functions for element-wise arithmetic:

    array R = randu(3,3);
    array C = ones(3,3) + complex(sin(R));  // C is c32

    // rescale complex values to unit circle
    array a = randn(5,c32);
    print(a / abs(a));

    // calculate L2 norm of every column
    array X = randn(20,30);
    print(sqrt(sum(pow(X,2))));    // norm of every column vector
    print(sqrt(sum(pow(X,2),0)));  // same as above
    print(sqrt(sum(pow(X,2),1)));  // norm of every row vector
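For reference, the column-norm expression above corresponds to the following host-side loop. This is a plain C++ sketch under the assumption of column-major storage, not ArrayFire code; `column_norms` is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// L2 norm of every column of a column-major rows x cols matrix.
std::vector<float> column_norms(const std::vector<float>& m,
                                std::size_t rows, std::size_t cols) {
    std::vector<float> norms(cols, 0.0f);
    for (std::size_t c = 0; c < cols; ++c) {
        float sum_sq = 0.0f;
        for (std::size_t r = 0; r < rows; ++r) {
            const float v = m[r + c * rows];
            sum_sq += v * v;  // pow(X,2) followed by sum along dimension 0
        }
        norms[c] = std::sqrt(sum_sq);
    }
    return norms;
}
```

Summing along rows instead of columns (the third print above) swaps the roles of the two loops.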

In v1.0, A*B performs matrix multiplication by default to favor linear algebra; however, you can toggle this to be element-wise multiplication.

You can initialize a matrix from either a host or device pointer:

    float host_ptr[] = {0,1,2,3,4,5};
    array a(host_ptr, 2, 3); // f32 matrix of size 2-by-3 from host data

    float *device_ptr;
    cudaMalloc((void**)&device_ptr, 6*sizeof(float));
    cudaMemcpy(device_ptr, host_ptr, 6*sizeof(float), cudaMemcpyHostToDevice);
    array b(device_ptr, 2,3, afDevicePointer);

    // do not call cudaFree(device_ptr) -- it is freed when b is destructed.

You can get both device- and host-side pointers to the underlying data with device() and host().

    array a = randu(3, f32);
    float *host_a = a.host<float>();        // must call hostFree() later
    printf("host_a[2] = %g\n", host_a[2]);  // last element
    array::hostFree(host_a);

    float *device_a = a.device<float>();    // no need to free this
    float value;
    cudaMemcpy(&value, device_a + 2, sizeof(float), cudaMemcpyDeviceToHost);
    printf("device_a[2] = %g\n", value);

You can pull the scalar value from the first element of an array back to the CPU with scalar().

    array a = randu(3);
    float val = a.scalar<float>();
    printf("scalar value: %g\n", val);

You can access the dimensions of a matrix using a dim4 object or directly via dims() and ndims().

    array a = randu(4,5,2);
    printf("ndims(a)  %d\n",  a.ndims()); // 3

    dim4 dims = a.dims();
    printf("dims = [%d %d]\n", dims[0], dims[1]); // 4,5
    printf("dims = [%d %d]\n", a.dims(0), a.dims(1)); // 4,5

Integer support includes bitwise operations as well as the standard sort(), min/max, and indexing.

    int h_A[] = {1, 2, 4, -1, 2, 0, 4, 2, 3};
    int h_B[] = {2, 3, -5, 6, 0, 10, -12, 0, 1};
    array A = array(3,3,h_A), B = array(3,3,h_B);

    print(A & B);
    print(A | B);
    print(A ^ B);
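The operators above act element by element. A plain C++ sketch of the same idea on host vectors, with no ArrayFire calls (`elementwise` is a hypothetical helper):

```cpp
#include <cstddef>
#include <vector>

// Apply a binary integer operation to each pair of elements of two
// equally sized arrays and return the results.
template <typename Op>
std::vector<int> elementwise(const std::vector<int>& a,
                             const std::vector<int>& b, Op op) {
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = op(a[i], b[i]);
    return out;
}
```

Passing a lambda such as [](int x, int y) { return x & y; } gives the host-side analogue of A & B; for the fourth pair of values above, -1 & 6 evaluates to 6 on two's complement machines.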

Several platform-independent constants are available for reference: pi, nan, inf, i. When these variable names conflict with macros in the standard header files or with variables in scope, reference them with their full namespace, e.g. af::nan.

    array A = randu(5,5);
    A(A > .5) = af::nan;

    array x = randu(20e6), y = randu(20e6);
    double pi_est = 4 * sum<float>(hypot(x,y) < 1) / 20e6;
    printf("estimation error: %g\n", fabs(pi - pi_est));
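The pi estimate above is a Monte Carlo count of random points that land inside the unit quarter circle. A CPU-only sketch of the same estimator in standard C++, where `estimate_pi` and its seed parameter are assumptions made for this illustration:

```cpp
#include <cstddef>
#include <random>

// Draw `samples` points uniformly in the unit square and return
// 4 * (fraction of points with x^2 + y^2 < 1).
double estimate_pi(std::size_t samples, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::size_t inside = 0;
    for (std::size_t i = 0; i < samples; ++i) {
        const double x = uniform(gen);
        const double y = uniform(gen);
        if (x * x + y * y < 1.0) ++inside;
    }
    return 4.0 * static_cast<double>(inside) / static_cast<double>(samples);
}
```

The error shrinks roughly with the square root of the sample count, which is why the GPU version above uses 20 million points.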

Matrix Manipulation

Many different kinds of matrix manipulation routines are available:

tile() allows you to repeat a matrix along specified dimensions, effectively 'tiling' the matrix. Please note that the dimensions passed in indicate the number of times to replicate the matrix in each dimension, not the final dimensions of the matrix.

    float h[] = {1, 2, 3, 4};
    array small = array(2, 2, h, afHostPointer); // 2x2 matrix
    array large = tile(small, 4, 6);  // produces 8x12 matrix: (2*4)x(2*6)
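To make the replication counts concrete, here is a host-side sketch of tiling in plain C++, assuming column-major storage (`tile_col_major` is a hypothetical helper, not part of ArrayFire):

```cpp
#include <cstddef>
#include <vector>

// Repeat a column-major rows x cols matrix rep_r times down and
// rep_c times across. The result is (rows*rep_r) x (cols*rep_c).
std::vector<float> tile_col_major(const std::vector<float>& in,
                                  std::size_t rows, std::size_t cols,
                                  std::size_t rep_r, std::size_t rep_c) {
    const std::size_t out_rows = rows * rep_r;
    const std::size_t out_cols = cols * rep_c;
    std::vector<float> out(out_rows * out_cols);
    for (std::size_t c = 0; c < out_cols; ++c)
        for (std::size_t r = 0; r < out_rows; ++r)
            out[r + c * out_rows] = in[(r % rows) + (c % cols) * rows];
    return out;
}
```

Tiling the 2x2 input with counts 4 and 6 yields 8 * 12 = 96 elements, matching the 8x12 result above.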

join() allows you to join two matrices together. Matrix dimensions must match along every dimension except the dimension of joining (dimensions are 0-indexed). For example, a 2x3 matrix can be joined with a 2x4 matrix along dimension 1, but not along dimension 0, since the column counts {3,4} don't match.

    float hA[] = { 1, 2, 3, 4, 5, 6 };
    float hB[] = { 10, 20, 30, 40, 50, 60, 70, 80, 90 };
    array A = array(3, 2, hA, afHostPointer);
    array B = array(3, 3, hB, afHostPointer);

    print(join(A, B, 1)); // 3x5 matrix
    // array result = join(A, B, 0); // fail: dimension mismatch
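The dimension rule can be sketched on host buffers. Assuming column-major storage, joining along dimension 1 amounts to appending one buffer to the other once the row counts agree (`join_columns` is a hypothetical helper in plain C++):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Join two column-major matrices side by side (along dimension 1).
// Both must have the same number of rows; the columns of b follow
// the columns of a in the result.
std::vector<float> join_columns(const std::vector<float>& a, std::size_t rows_a,
                                const std::vector<float>& b, std::size_t rows_b) {
    if (rows_a != rows_b)
        throw std::invalid_argument("row counts must match to join along dimension 1");
    std::vector<float> out(a);
    out.insert(out.end(), b.begin(), b.end());
    return out;
}
```

For the 3x2 and 3x3 inputs above this produces 15 values, i.e. a 3x5 matrix.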

grid() can be used to construct a regular mesh grid from vectors x and y. For example, a mesh grid of the vectors {1,2,3,4} and {5,6} would result in two matrices:

    float hx[] = { 1, 2, 3, 4 };
    float hy[] = { 5, 6 };
    array x = array(4, hx, afHostPointer);
    array y = array(2, hy, afHostPointer);
    array u, v;
    grid(u, v, x, y);
    // produces:
    // u = |1 2 3 4|     v=|5 5 5 5|
    //     |1 2 3 4|       |6 6 6 6|
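The two outputs shown above can be reproduced on the host as follows. This is a plain C++ sketch assuming column-major storage; `mesh_grid` is a hypothetical name and not the ArrayFire implementation.

```cpp
#include <cstddef>
#include <vector>

// Build column-major mesh grids u and v, each y.size() x x.size().
// Every row of u is a copy of x; every column of v is a copy of y.
void mesh_grid(const std::vector<float>& x, const std::vector<float>& y,
               std::vector<float>& u, std::vector<float>& v) {
    const std::size_t rows = y.size();
    const std::size_t cols = x.size();
    u.assign(rows * cols, 0.0f);
    v.assign(rows * cols, 0.0f);
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            u[r + c * rows] = x[c];
            v[r + c * rows] = y[r];
        }
    }
}
```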

newdims() can be used to create a (shallow) copy of a matrix with different dimensions. The number of elements must remain the same as the original array.

    int hA[] = { 1, 2, 3, 4, 5, 6 };
    array A = array(3, 2, hA);

    print(newdims(A, 2, 3)); // 2x3 matrix
    print(newdims(A, 6, 1)); // 6x1 column vector

    // print(newdims(A, 2, 2)); // fail: wrong number of elements
    // print(newdims(A, 8, 8)); // fail: wrong number of elements
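A shallow copy with new dimensions can be pictured as a second view over the same buffer. The sketch below is plain C++ under the assumption of column-major storage; `MatrixView` and `reshape_view` are hypothetical names used only here.

```cpp
#include <cstddef>

// A non-owning column-major view: the same buffer, read with given dims.
struct MatrixView {
    const int* data;
    std::size_t rows, cols;
    int at(std::size_t r, std::size_t c) const { return data[r + c * rows]; }
};

// Reinterpret the same buffer with new dimensions. Returns false and
// leaves `out` untouched when the element counts differ.
inline bool reshape_view(const MatrixView& in, std::size_t rows,
                         std::size_t cols, MatrixView& out) {
    if (rows * cols != in.rows * in.cols) return false;
    out = MatrixView{in.data, rows, cols};
    return true;
}
```

No element is moved: the value 3 sits at (2,0) in the 3x2 view and at (0,1) in the 2x3 view of the same six integers.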

The T() and H() methods can be used to form the matrix or vector transpose.

    array x = randu(4,4,f64);
    array y = x.T();

    array c = randu(4,4,c64);
    array c_trans = c.T();  // transpose
    array c_conj = c.H();   // Hermitian (conjugate) transpose
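The difference between the two methods shows up only for complex data. A host-side sketch in plain C++ with std::complex, assuming column-major storage (`transpose` and its `conjugate` flag are hypothetical, chosen for this illustration):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Transpose a column-major rows x cols matrix into a cols x rows one.
// When `conjugate` is true each entry is also conjugated (Hermitian transpose).
std::vector<cfloat> transpose(const std::vector<cfloat>& in, std::size_t rows,
                              std::size_t cols, bool conjugate) {
    std::vector<cfloat> out(in.size());
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            const cfloat v = in[r + c * rows];
            // Input element (r, c) becomes output element (c, r).
            out[c + r * cols] = conjugate ? std::conj(v) : v;
        }
    }
    return out;
}
```

For real-valued input the two variants give identical results, since conjugation leaves real numbers unchanged.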

Indexing

There are several ways of referencing values. ArrayFire uses parentheses for subscripted referencing instead of the traditional square bracket notation. Indexing is zero-based, i.e. the first element is at index zero (A(0)). Indexing can be done with mixtures of integer scalars, end, span, seq() ranges, and other arrays, as the examples below show.

See Subscripted array indexing for the full listing.

    array A = randu(3,3);
    array a1 = A(0);   // first element
    array a2 = A(0,1); // first row, second column

    A(end);   // last element
    A(-1);    // also last element
    A(end-1); // second-to-last element

    A(1,span);       // second row
    A.row(end);      // last row
    A.cols(1,end);   // all but first column

    float b_host[] = {0,1,2,3,4,5,6,7,8,9};
    array b(b_host, 10, dim4(1,10));
    b(seq(3));       //  {0,1,2}
    b(seq(1,7));     //  {1,2,3,4,5,6,7}
    b(seq(1,2,7));   // {1,3,5,7}
    b(seq(0,2,end)); // {0,2,4,6,8}
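Going by the comments above, the three-argument form reads as (first, step, last) with an inclusive end. A plain C++ sketch of that expansion, where `expand_seq` is a hypothetical helper and the argument order is inferred from those comments only:

```cpp
#include <vector>

// Hypothetical helper: list first, first+step, ... up to and including last.
// This sketch handles positive steps only and returns an empty list otherwise.
std::vector<int> expand_seq(int first, int step, int last) {
    std::vector<int> out;
    if (step <= 0) return out;
    for (int i = first; i <= last; i += step) out.push_back(i);
    return out;
}
```

For a 10-element vector, end stands for index 9, so the last example corresponds to expand_seq(0, 2, 9).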

You can set values in an array:

    // setting entries to a constant
    A(span) = 4;        // fill entire array
    A.row(0) = -1;      // first row
    A(seq(3)) = 3.1415; // first three elements

    // copy in another matrix
    array B = ones(4,4,f64);
    B.row(0) = randu(1,4,f32); // set a row to random values (also upcast)

Use one array to reference into another.

    float h_inds[] = { 0, 3, 2, 1 }; // zero-based indexing; valid indices for 4 elements are 0..3
    array inds(h_inds, 1,4);
    array B = randu(1,4);
    array c = B(inds);   // get
    B(inds) = -1;        // set to scalar
    B(inds) = randu(4,1); // set to random

Linear algebra

Matrix decompositions are available: lu, qr, svd, eigen, cholesky, and more.

Matrix operations: inv, mpow, det, solve, hessenberg, and more.

The decompositions share the following general forms. Here is an example that returns the packed output, i.e. just the first output:

    array A = randu(5,5);
    array LU = lu(A);

To get separated lower and upper outputs:

    array in = randu(5,5);
    array l, u, p;
    lu(l, u, p, in);

    // verify outputs
    print(l);
    print(u);
    print(p);

Other examples:

    array in = randu(5,5);
    array out_inv = inv(in);        // Inverse of input
    array out_pow = mpow(in, 3);    // out_pow = in * in * in; Not element wise.
    float out_det = det<float>(in); // determinant of the input
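To illustrate why a matrix power differs from an element-wise power, here is a host-side sketch in plain C++ that forms m * m * ... * m by repeated multiplication. It assumes square column-major matrices; `matmul` and `matrix_power` are hypothetical helpers, not ArrayFire functions.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // column-major n x n

// Plain matrix product c = a * b for n x n column-major matrices.
Matrix matmul(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t i = 0; i < n; ++i)
                c[i + j * n] += a[i + k * n] * b[k + j * n];
    return c;
}

// Matrix power by repeated multiplication, for p >= 1.
Matrix matrix_power(const Matrix& m, std::size_t n, unsigned p) {
    Matrix result = m;
    for (unsigned i = 1; i < p; ++i) result = matmul(result, m, n);
    return result;
}
```

For the matrix with rows (1, 1) and (0, 1), the third power has rows (1, 3) and (0, 1), whereas cubing each element would leave the matrix unchanged.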

Convolutions

convolve() is the single entry point for all image and signal convolution.

convolve() with two inputs performs N dimensional convolution, where N is the highest input dimension:

    array image  = randu(10,10);
    array kernel = ones(3,3) / 9; // average within 3x3 window
    print(convolve(image,kernel)); // 10x10 blurred image

However, if the kernel is small and resides on the host, it is faster to use it directly from the host pointer instead of pushing it to the device first:

    array signal = randu(5000,1);
    float host_filter[] = {1, 0, -1};
    unsigned filter_dims[] = {3};
    convolve(signal,
             1,         // number of filter dimensions
             filter_dims, // filter dimensions
             host_filter);// filter inside host memory
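For readers who want to see what a 1D convolution computes, here is a CPU reference sketch in plain C++. It returns the full result; which part of the result the ArrayFire call above returns by default is not stated here, so treat the output shape as an assumption (`convolve_full` is a hypothetical name).

```cpp
#include <cstddef>
#include <vector>

// Full 1D convolution: out[n] = sum over k of signal[n - k] * filter[k].
// The output has signal.size() + filter.size() - 1 elements.
std::vector<float> convolve_full(const std::vector<float>& signal,
                                 const std::vector<float>& filter) {
    if (signal.empty() || filter.empty()) return {};
    std::vector<float> out(signal.size() + filter.size() - 1, 0.0f);
    for (std::size_t i = 0; i < signal.size(); ++i)
        for (std::size_t k = 0; k < filter.size(); ++k)
            out[i + k] += signal[i] * filter[k];
    return out;
}
```

With the filter {1, 0, -1}, each interior output is the difference between two samples that lie two positions apart, which is why this filter acts as a simple derivative.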

In some cases, a 2D filter kernel is considered "separable", meaning it can be decomposed into two orthogonal vectors. Convolving with those individual vectors is almost always faster.

    // 5x5 derivative with separable kernels
    float h_dx[] = {1.f/12, -8.f/12, 0, 8.f/12, -1.f/12}; // five point stencil (float literals avoid integer division)
    float h_spread[] = {1.f/5, 1.f/5, 1.f/5, 1.f/5, 1.f/5};
    array dx = array(5,1,h_dx);
    array spread = array(1,5,h_spread);
    array kernel = dx * spread; // 5x5 derivative kernel

    array image = randu(640,480);
    convolve(image, kernel, afConvSame); // derivative of image going down columns

    // equivalent and faster version:
    convolve(dx,spread,image, afConvSame);

    // also supports passing host pointers:
    convolve(5,h_dx, 5,h_spread, image, afConvSame);

Running the convolve.cpp example shows roughly a 2.6x difference between the separable and non-separable cases:

arrayfire/examples/misc $ ./convolve
full 2D convolution:         0.00156023
separable, device pointers:  0.000595222
separable, host pointers:    0.000590385

You can also produce different parts of the convolution with the afConv shape parameter:

    convolve(randu(3,1), randu(5,1), afConvSame)  // 3x1 output
    convolve(randu(5,1), randu(3,1), afConvSame)  // 5x1 output
    convolve(randu(3,1), randu(5,1), afConvFull)  // 7x1 output
    convolve(randu(6,1), randu(5,1), afConvValid) // 2x1 output
    convolve(randu(5,1), randu(6,1), afConvValid) // empty output since kernel bigger than image
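The output sizes in the comments above follow a simple rule for 1D inputs. A plain C++ sketch of that rule, in which `ConvShape` is a hypothetical stand-in for the afConv values and `conv_output_length` is introduced only for illustration:

```cpp
#include <cstddef>

enum class ConvShape { Full, Same, Valid };  // hypothetical stand-in for afConv*

// Output length when convolving a signal of length n with a kernel of length k.
inline std::size_t conv_output_length(std::size_t n, std::size_t k, ConvShape shape) {
    switch (shape) {
        case ConvShape::Full:  return (n == 0 || k == 0) ? 0 : n + k - 1;
        case ConvShape::Same:  return n;  // same size as the first input
        case ConvShape::Valid: return (k == 0 || k > n) ? 0 : n - k + 1;
    }
    return 0;  // not reached for the three enumerators above
}
```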

See also:
examples/misc/convolve.cpp

Integrating custom CUDA code

ArrayFire can be used in projects that involve writing CUDA kernels and compiling CUDA code. The ArrayFire examples directory contains examples/pi/pi_cuda.cu, which computes pi by launching a CUDA kernel.

Make sure you have the CUDA toolkit installed. It is required because compiling CUDA kernels needs the NVCC compiler.

Windows

The ArrayFire examples/pi directory contains solution files for both Visual Studio 2008 and Visual Studio 2010.

To compile pi_cuda in VS 2008 (VS 2010), open the pi_cuda_vs2008 (pi_cuda_vs2010) solution file, choose the configuration you want to build (Win32 or x64, Debug or Release), and build the example.

Linux

Double-check the Makefile and make sure the CUDA path is set to the CUDA toolkit installation directory. You should then be able to build the example successfully.
