dla.h File Reference

#include <cuComplex.h>
#include "defines.h"

Namespaces

namespace  af

Enumerations

enum  afSolve {
  af_solve_none = 0, af_solve_posdef = 1, af_solve_nonposdef = 2, af_solve_gaussian = 3,
  af_solve_pseudo = 4, af_solve_ctrans = 256, af_solve_trans = 512, af_solve_uppertri = 1024,
  af_solve_lowertri = 2048
}

Functions

array lu (const array &in)
 LU factorization (packed)
void lu (array &lower, array &upper, const array &in)
 LU factorization.
void lu (array &lower, array &upper, array &pivot, const array &in)
 LU factorization (with pivoting)
array qr (const array &in)
 QR factorization (packed).
void qr (array &q, array &r, const array &in)
 QR factorization.
void qr (array &q, array &r, array &tau, const array &in)
 QR factorization with tau.
array cholesky (unsigned &info, const array &X, bool is_upper=true)
 Cholesky decomposition ("Y^T * Y == X").
array hessenberg (const array &in)
 Hessenberg matrix form.
void hessenberg (array &h, array &q, const array &in)
 Hessenberg matrix h with unitary permutation matrix q.
array eigen (const array &in, bool is_diag=false)
 Eigenvalues.
void eigen (array &values, array &vectors, const array &in)
 Eigenvalues and eigenvectors.
array svd (const array &in, bool is_diag=false)
 Singular values.
void svd (array &s, array &u, array &v, const array &in)
 Singular values with unitary bases: in = u * s * v.
array inv (const array &in)
 Matrix inversion.
array pinv (const array &in)
 Pseudo inverse.
array mpow (const array &base, double exponent)
 Matrix power.
unsigned rank (const array &in, double tolerance=1e-5)
 Rank of matrix.
template<typename T >
T det (const array &in)
 Matrix determinant.
array solve (const array &A, const array &B, afSolve options=af_solve_none)
 Solve linear system.
Device pointer interface: lu Decomposition
Parameters:
[out] d_piv: Array of the pivot indices. Size: min(m, n).
[out] d_U: The upper triangular matrix. Size: (min(m, n), n).
[in,out] d_L: If d_U isn't NULL, d_L (Size: (m, n)) contains the lower triangular matrix. If d_U is NULL, d_L (first m x min(m, n) elements) contains the packed version of the LU decomposition.
[in] m: Number of rows in the input.
[in] n: Number of columns in the input.
[in] batch: Number of tiles of input being handled.
afError af_lu_S (int *d_piv, float *d_U, float *d_L, unsigned m, unsigned n, unsigned batch)
 lu Decomposition on single precision data. No DLA license required.
afError af_lu_C (int *d_piv, cuComplex *d_U, cuComplex *d_L, unsigned m, unsigned n, unsigned batch)
 lu Decomposition on single precision, complex data. DLA license required.
afError af_lu_D (int *d_piv, double *d_U, double *d_L, unsigned m, unsigned n, unsigned batch)
 lu Decomposition on double precision data. DLA license required.
afError af_lu_Z (int *d_piv, cuDoubleComplex *d_U, cuDoubleComplex *d_L, unsigned m, unsigned n, unsigned batch)
 lu Decomposition on double precision, complex data. DLA license required.
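The packed output described above can be illustrated on the host. The following is a minimal sketch, not the library's code: `unpack_lu` and `LuFactors` are hypothetical names, and it assumes column-major storage and the LAPACK convention that the packed matrix holds U on and above the diagonal and the strictly lower part of a unit-diagonal L below it. The L produced here is m x min(m, n); the header lists d_L as (m, n), so the layout the library actually uses may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side container; not part of dla.h.
struct LuFactors {
    std::vector<double> lower;  // m x k, column-major, unit diagonal
    std::vector<double> upper;  // k x n, column-major
};

// Splits a packed m x n LU result (column-major) into separate L and U.
// Assumption: LAPACK-style packing (U on and above the diagonal, L below it,
// with the unit diagonal of L implied rather than stored).
LuFactors unpack_lu(const std::vector<double>& packed, std::size_t m, std::size_t n) {
    assert(packed.size() == m * n);
    const std::size_t k = std::min(m, n);
    LuFactors f{std::vector<double>(m * k, 0.0), std::vector<double>(k * n, 0.0)};
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            const double v = packed[i + j * m];
            if (i > j) {
                f.lower[i + j * m] = v;   // strictly below the diagonal
            } else {
                f.upper[i + j * k] = v;   // on or above the diagonal
            }
        }
    }
    for (std::size_t d = 0; d < k; ++d) {
        f.lower[d + d * m] = 1.0;         // implied unit diagonal of L
    }
    return f;
}
```

Under these assumptions, multiplying `lower` by `upper` reproduces the row-permuted input.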
Device pointer interface: qr decomposition
Parameters:
[out] d_tau: Vector on device. Size: min(m, n). Contains additional information about d_Q.
[out] d_R: The upper triangular matrix. Size: (k, n).
[in,out] d_Q: If d_R isn't NULL, d_Q contains the orthogonal matrix. Size: (m, k). If d_R is NULL, d_Q contains the packed version of the QR decomposition. Size: (m, n).
[in] m: Number of rows in the input.
[in] n: Number of columns in the input.
[in] k: Number of columns to be calculated in the orthogonal matrix.
[in] batch: Number of tiles of input being handled.
afError af_qr_S (float *d_tau, float *d_R, float *d_Q, unsigned m, unsigned n, unsigned k, unsigned batch)
 qr decomposition on single precision data.
afError af_qr_C (cuComplex *d_tau, cuComplex *d_R, cuComplex *d_Q, unsigned m, unsigned n, unsigned k, unsigned batch)
 qr decomposition on single precision, complex data. DLA license required.
afError af_qr_D (double *d_tau, double *d_R, double *d_Q, unsigned m, unsigned n, unsigned k, unsigned batch)
 qr decomposition on double precision data. DLA license required.
afError af_qr_Z (cuDoubleComplex *d_tau, cuDoubleComplex *d_R, cuDoubleComplex *d_Q, unsigned m, unsigned n, unsigned k, unsigned batch)
 qr decomposition on double precision, complex data. DLA license required.
Device pointer interface: Cholesky decomposition
Parameters:
[out] d_R: Matrix of Size: (n, n).
[in] n: Width of the input matrix d_A.
[in] d_A: Input square matrix of width n.
[in] is_upper: Flag to specify the format of the required output. d_R is upper triangular if is_upper is true. d_R is lower triangular if is_upper is false.
[in] batch: Number of tiles of input being handled.
afError af_cholesky_S (float *d_R, unsigned *info, unsigned n, const float *d_A, bool is_upper, unsigned batch)
 Cholesky decomposition, single precision data. No DLA license required.
afError af_cholesky_C (cuComplex *d_R, unsigned *info, unsigned n, const cuComplex *d_A, bool is_upper, unsigned batch)
 Cholesky decomposition, single precision complex data. DLA license required.
afError af_cholesky_D (double *d_R, unsigned *info, unsigned n, const double *d_A, bool is_upper, unsigned batch)
 Cholesky decomposition, double precision data. DLA license required.
afError af_cholesky_Z (cuDoubleComplex *d_R, unsigned *info, unsigned n, const cuDoubleComplex *d_A, bool is_upper, unsigned batch)
 Cholesky decomposition, double precision, complex data. DLA license required.
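The effect of is_upper and the relation "Y^T * Y == X" can be checked against a small host-side reference. This is a sketch under stated assumptions, not the GPU implementation: `cholesky_ref` is a hypothetical name, storage is assumed column-major, and `info` is assumed to follow the LAPACK convention (0 on success, otherwise the 1-based index of the first non-positive pivot). The header does not document what the library's `info` actually contains.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side reference; not part of dla.h.
// Returns the n x n factor R (column-major). For is_upper == true, R^T * R == A;
// for is_upper == false, R * R^T == A.
std::vector<double> cholesky_ref(const std::vector<double>& a, std::size_t n,
                                 bool is_upper, unsigned& info) {
    assert(a.size() == n * n);
    std::vector<double> r(n * n, 0.0);
    info = 0;
    // Build the lower factor column by column.
    for (std::size_t j = 0; j < n; ++j) {
        double d = a[j + j * n];
        for (std::size_t k = 0; k < j; ++k) d -= r[j + k * n] * r[j + k * n];
        if (d <= 0.0) {                       // not positive definite
            info = static_cast<unsigned>(j + 1);
            return r;
        }
        r[j + j * n] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i + j * n];
            for (std::size_t k = 0; k < j; ++k) s -= r[i + k * n] * r[j + k * n];
            r[i + j * n] = s / r[j + j * n];
        }
    }
    // The upper factor is the transpose of the lower one.
    if (is_upper) {
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = j + 1; i < n; ++i)
                std::swap(r[i + j * n], r[j + i * n]);
    }
    return r;
}
```

For the matrix with rows (4, 2) and (2, 3), the upper factor has rows (2, 1) and (0, sqrt(2)).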
Device pointer interface: Hessenberg Matrix
Parameters:
[out] d_H: Hessenberg matrix of input d_A.
[out] d_Q: Unitary matrix such that d_A = d_Q * d_H * d_Q'.
[in] n: Width of the input matrix d_A.
[in] d_A: Input square matrix of width n.
[in] batch: Number of tiles of input being handled.
afError af_hessenberg_S (float *d_H, float *d_Q, unsigned n, const float *d_A, unsigned batch)
 Find the hessenberg matrix on single precision data.
afError af_hessenberg_C (cuComplex *d_H, cuComplex *d_Q, unsigned n, const cuComplex *d_A, unsigned batch)
 Find the hessenberg matrix on single precision complex data.
afError af_hessenberg_D (double *d_H, double *d_Q, unsigned n, const double *d_A, unsigned batch)
 Find the hessenberg matrix on double precision data.
afError af_hessenberg_Z (cuDoubleComplex *d_H, cuDoubleComplex *d_Q, unsigned n, const cuDoubleComplex *d_A, unsigned batch)
 Find the hessenberg matrix on double precision complex data.
Device pointer interface: Eigen vectors and Eigen values
Parameters:
[out] d_Val: The output containing the eigenvalues of d_A. For real inputs, this is allocated internally based on whether the result is real or complex, and is_imag reports which; e.g. real-valued Hermitian matrices produce real-valued eigenvalues, while non-Hermitian matrices may produce complex-valued eigenvalues.
[out] d_Vec: The output containing the eigenvectors of d_A. For real inputs, this is allocated internally based on whether the result is real or complex, and is_imag reports which. For complex inputs, the caller allocates. Pass NULL to skip computing eigenvectors (faster).
[out] is_imag: The output format specifier for real inputs. If is_imag is true, the outputs are stored as cuComplex (or cuDoubleComplex). If is_imag is false, the outputs are stored as float (or double).
[in] n: Width of the input matrix d_A.
[in] d_A: Input square matrix of width n.
[in] is_diag: Specifier of the d_Val data structure. If is_diag is true, d_Val is a diagonal matrix. If is_diag is false, d_Val is a vector. is_diag cannot be false if d_Vec is not NULL.
[in] batch: Number of tiles of input being handled.
afError af_eigen_S (void **d_Val, void **d_Vec, bool *is_imag, unsigned n, const float *d_A, bool is_diag, unsigned batch)
 Eigen value decomposition of single precision input.
afError af_eigen_D (void **d_Val, void **d_Vec, bool *is_imag, unsigned n, const double *d_A, bool is_diag, unsigned batch)
 Eigen value decomposition of double precision input.
afError af_eigen_C (void **d_Val, cuComplex *d_Vec, bool *is_imag, unsigned n, const cuComplex *d_A, bool is_diag, unsigned batch)
 Eigen value decomposition of single precision complex input.
afError af_eigen_Z (void **d_Val, cuDoubleComplex *d_Vec, bool *is_imag, unsigned n, const cuDoubleComplex *d_A, bool is_diag, unsigned batch)
 Eigen value decomposition of double precision complex input.
Device pointer interface: Singular value decomposition
Parameters:
[out] d_S: The output containing the singular values of the input. If is_diag is false, d_S is a vector. If is_diag is true, d_S is a diagonal matrix. d_A = d_U * d_S * d_V'
[out] d_U: Left unitary matrix.
[out] d_V: Right unitary matrix.
[in] jobU: Can be one of 'A', 'O', 'S', 'N', similar to LAPACK.
[in] jobV: Can be one of 'A', 'O', 'S', 'N', similar to LAPACK.
[in] m: Number of rows in the input.
[in] n: Number of columns in the input.
[in] d_A: The input matrix.
[in] m_: Number of rows required in the output.
[in] n_: Number of columns required in the output.
[in] is_diag: Data structure specifier of the output d_S. is_diag cannot be false if d_U or d_V are not NULL.
[in] batch: Number of tiles of input being handled.
afError af_svd_S (float *d_S, float *d_U, float *d_V, char jobU, char jobV, unsigned m, unsigned n, const float *d_A, unsigned m_, unsigned n_, bool is_diag, unsigned batch)
 Singular value decomposition on single precision input. DLA license not required.
afError af_svd_C (float *d_S, cuComplex *d_U, cuComplex *d_V, char jobU, char jobV, unsigned m, unsigned n, const cuComplex *d_A, unsigned m_, unsigned n_, bool is_diag, unsigned batch)
 Singular value decomposition on single precision complex input. DLA license required.
afError af_svd_D (double *d_S, double *d_U, double *d_V, char jobU, char jobV, unsigned m, unsigned n, const double *d_A, unsigned m_, unsigned n_, bool is_diag, unsigned batch)
 Singular value decomposition on double precision input. DLA license not required.
afError af_svd_Z (double *d_S, cuDoubleComplex *d_U, cuDoubleComplex *d_V, char jobU, char jobV, unsigned m, unsigned n, const cuDoubleComplex *d_A, unsigned m_, unsigned n_, bool is_diag, unsigned batch)
 Singular value decomposition on double precision complex input. DLA license not required.
Device pointer interface: Matrix inversion
Parameters:
[out] d_out: The inverted matrix of d_in.
[in] n: Width of the input matrix.
[in,out] d_in: The input matrix. Inversion is done in place if d_out is NULL.
[in] batch: Number of tiles of input being handled.
afError af_inv_S (float *out, unsigned n, float *d_in, unsigned batch)
 Inversion of single precision matrix. DLA license required.
afError af_inv_C (cuComplex *out, unsigned n, cuComplex *d_in, unsigned batch)
 Inversion of single precision complex matrix. DLA license required.
afError af_inv_D (double *out, unsigned n, double *d_in, unsigned batch)
 Inversion of double precision matrix. DLA license required.
afError af_inv_Z (cuDoubleComplex *out, unsigned n, cuDoubleComplex *d_in, unsigned batch)
 Inversion of double precision complex matrix. DLA license required.
Device pointer interface: Matrix determinant
Parameters:
[out] res: The determinant of the input matrix d_X.
[in] n: Width of the input matrix.
[in,out] d_X: The input matrix.
[in] inplace: Input data in d_X is destroyed if true.
[in] batch: Number of tiles of input being handled.
afError af_det_S (float *res, unsigned n, float *d_X, bool inplace, unsigned batch)
 Determinant of single precision matrix. DLA license not required.
afError af_det_C (cuComplex *res, unsigned n, cuComplex *d_X, bool inplace, unsigned batch)
 Determinant of single precision complex matrix. DLA license required.
afError af_det_D (double *res, unsigned n, double *d_X, bool inplace, unsigned batch)
 Determinant of double precision matrix. DLA license required.
afError af_det_Z (cuDoubleComplex *res, unsigned n, cuDoubleComplex *d_X, bool inplace, unsigned batch)
 Determinant of double precision complex matrix. DLA license required.
Device pointer interface: Matrix power
Parameters:
[out] d_out: The output containing pow(d_in, power).
[in] n: Width of the input matrix.
[in] d_in: The input matrix.
[in] power: The exponent the input has to be raised to.
[in] batch: Number of tiles of input being handled.
[out] is_cplx: Signals whether the output is real or complex.
afError af_matrixPower_S (void **d_out, unsigned n, const float *d_in, float power, unsigned batch, bool *is_cplx)
 Matrix power for single precision matrix.
afError af_matrixPower_C (void **d_out, unsigned n, const cuComplex *d_in, float power, unsigned batch, bool *is_cplx)
 Matrix power for single precision, complex matrix. DLA license required.
afError af_matrixPower_D (void **d_out, unsigned n, const double *d_in, double power, unsigned batch, bool *is_cplx)
 Matrix power for double precision matrix. DLA license required.
afError af_matrixPower_Z (void **d_out, unsigned n, const cuDoubleComplex *d_in, double power, unsigned batch, bool *is_cplx)
 Matrix power for double precision, complex matrix. DLA license required.
Device pointer interface: Solving linear systems.
Parameters:
[out] d_X: Solution to the equation d_A * d_X = d_B.
[in] m: Number of rows in d_A.
[in] n: Number of columns in d_A, number of rows in d_B.
[in] d_A: The coefficient matrix.
[in] batch_A: The number of tiles of d_A being computed.
[in] k: Number of columns in d_B.
[in] d_B: The residual matrix.
[in] batch_B: The number of tiles of d_B being computed.
[in] opts: Gives specific information about the system.
opts is a combination (sum) of:

  • 0 for no information (solves using LU/CHOL for m == n, QR for m != n)
  • 1 if d_A is positive definite
  • 2 if d_A is not positive definite
  • 3 Use Gaussian elimination (fast, cannot be combined with other options)
  • 4 Use pseudo inverse (fast, cannot be combined with other options)
  • 256 to do conjugate transpose on d_A before solving
  • 512 to do transpose on d_A before solving
  • 1024 if d_A is Upper triangular
  • 2048 if d_A is Lower triangular
afError af_linearSolve_SS (float *d_X, unsigned m, unsigned n, const float *d_A, unsigned batch_A, unsigned k, const float *d_B, unsigned batch_B, unsigned opts)
 Solve a single precision system. DLA not required for opts = 0, 3.
afError af_linearSolve_CC (cuComplex *d_X, unsigned m, unsigned n, const cuComplex *d_A, unsigned batch_A, unsigned k, const cuComplex *d_B, unsigned batch_B, unsigned opts)
 Solve a single precision complex system. DLA not required for opts = 3.
afError af_linearSolve_DD (double *d_X, unsigned m, unsigned n, const double *d_A, unsigned batch_A, unsigned k, const double *d_B, unsigned batch_B, unsigned opts)
 Solve a double precision system. DLA not required for opts = 3.
afError af_linearSolve_ZZ (cuDoubleComplex *d_X, unsigned m, unsigned n, const cuDoubleComplex *d_A, unsigned batch_A, unsigned k, const cuDoubleComplex *d_B, unsigned batch_B, unsigned opts)
 Solve a double precision complex system. DLA not required for opts = 3.
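Because opts is a sum of values, it helps to see how the method values and the modifier flags interact. The sketch below redeclares the afSolve values from this header so it compiles on its own; `make_opts`, `is_valid_opts` and `describe_opts` are hypothetical helpers, and the split into a "method" low byte (0 to 4) and independent high flag bits (256 and up) is an assumption read off the listed values, not something the header states.

```cpp
#include <cassert>
#include <string>

// Local copy of the afSolve values listed in this header.
enum afSolve {
    af_solve_none = 0, af_solve_posdef = 1, af_solve_nonposdef = 2,
    af_solve_gaussian = 3, af_solve_pseudo = 4,
    af_solve_ctrans = 256, af_solve_trans = 512,
    af_solve_uppertri = 1024, af_solve_lowertri = 2048
};

// Summing 1 and 2 silently selects 3 (Gaussian elimination), so the low
// values behave like an enumeration rather than independent bits.
static_assert((af_solve_posdef | af_solve_nonposdef) == af_solve_gaussian,
              "1 + 2 collides with af_solve_gaussian");

// Hypothetical builder: one method value plus any modifier flags (>= 256).
unsigned make_opts(afSolve method, unsigned modifiers) {
    assert((modifiers & 0xFFu) == 0 && "modifiers must use only the flag bits");
    return static_cast<unsigned>(method) | modifiers;
}

// Hypothetical check for the documented rule that options 3 and 4
// cannot be combined with other options.
bool is_valid_opts(unsigned opts) {
    const unsigned method = opts & 0xFFu;
    const unsigned flags = opts & ~0xFFu;
    if (method > af_solve_pseudo) return false;
    if (method == af_solve_gaussian || method == af_solve_pseudo) return flags == 0;
    return true;
}

// Hypothetical pretty-printer, useful when logging what was requested.
std::string describe_opts(unsigned opts) {
    std::string s;
    switch (opts & 0xFFu) {
        case af_solve_none:      s = "auto"; break;
        case af_solve_posdef:    s = "posdef"; break;
        case af_solve_nonposdef: s = "nonposdef"; break;
        case af_solve_gaussian:  s = "gaussian"; break;
        case af_solve_pseudo:    s = "pseudo"; break;
        default:                 s = "unknown"; break;
    }
    if (opts & af_solve_ctrans)   s += "+ctrans";
    if (opts & af_solve_trans)    s += "+trans";
    if (opts & af_solve_uppertri) s += "+uppertri";
    if (opts & af_solve_lowertri) s += "+lowertri";
    return s;
}
```

For example, `make_opts(af_solve_posdef, af_solve_trans)` yields 513, which is the value that would be passed as opts for a positive definite system solved with a transposed d_A.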
Get original pivot indices

The d_piv array returned by af_lu_* contains pivot indices relative to their updated locations.

af_piv_final gives the original locations of each of the pivots.

Parameters:
[out] d_out: Original pivot locations.
[in] m: Number of rows in the original input.
[in] k: Number of columns in the original input.
[in] d_piv: Contains updated pivot indices.
[in] batch: The number of input tiles.
afError af_piv_final_I (int *d_out, unsigned m, unsigned k, const int *d_piv, unsigned batch)
afError af_piv_final_U (unsigned *d_out, unsigned m, unsigned k, const int *d_piv, unsigned batch)
afError af_piv_final_S (float *d_out, unsigned m, unsigned k, const int *d_piv, unsigned batch)
afError af_piv_final_D (double *d_out, unsigned m, unsigned k, const int *d_piv, unsigned batch)
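The difference between "updated" and "original" pivot locations can be sketched on the host. The code below is an illustration, not what af_piv_final_* is confirmed to do: `final_row_order` is a hypothetical name, and it assumes LAPACK-style sequential pivots with 0-based indices, where step i swapped row i with row piv[i] of the partially permuted matrix. The library's indexing base and exact semantics are not stated in this header.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical host-side helper; not part of dla.h.
// Replays the row swaps recorded in piv and returns, for each final row
// position, the index of the original row that ended up there.
std::vector<int> final_row_order(const std::vector<int>& piv, std::size_t m) {
    std::vector<int> perm(m);
    std::iota(perm.begin(), perm.end(), 0);   // start from the identity order
    for (std::size_t i = 0; i < piv.size() && i < m; ++i) {
        assert(piv[i] >= 0 && static_cast<std::size_t>(piv[i]) < m);
        std::swap(perm[i], perm[static_cast<std::size_t>(piv[i])]);
    }
    return perm;
}
```

With three rows and piv = {2, 2, 2}, the swaps are applied one after another, so the final order is {2, 0, 1}: original row 2 ends up first, followed by original rows 0 and 1.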
