BLAS Data Types
I’m going to shed a little light on what I meant when, in my last post, I said that “BLAS and LAPACK (mostly) support” an O(1) herm operation for matrices.
First I need to tell you about the BLAS data types. There are two fundamental data types in BLAS: dense vectors and dense matrices.
Vectors
A dense vector is represented as an array of values with a fixed stride between consecutive elements; the values need not be contiguous in memory. In C, you need three things to represent a vector: the length of the vector, a pointer to the first element, and an integer stride. The stride is required to be at least 1, and the length has to be non-negative. Length is usually represented by a variable called n, the pointer is usually called x or y, and the stride (or “increment”) is named incx for x or incy for y. So, you could have
double data[] = { 1.6, 1.7, -3.1, -0.2, 2.6, 1.1 };
int n = 3;
double *x = data;
int incx = 2;
double *y = data + 1;
int incy = 1;
Then, x would be a vector with elements (1.6, -3.1, 2.6), and y would be a vector with elements (1.7, -3.1, -0.2). Being able to have a non-unit stride for vectors turns out to be incredibly useful.
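The indexing convention can be made concrete with a small helper (vector_elem is my own name for illustration, not a BLAS routine):

```c
#include <stddef.h>

/* Element i of a strided vector starting at data is data[i * inc].
 * This helper just spells out the convention; it is not part of BLAS. */
double vector_elem(const double *data, int inc, int i)
{
    return data[(size_t)i * inc];
}
```

With the data above, vector_elem(data, 2, 1) picks out the second element of x, and vector_elem(data + 1, 1, 1) the second element of y.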
Matrices
Matrices in BLAS are slightly more complicated. There are two variants: row-major, and column-major. For CBLAS both are supported equally well, but most Fortran code (including LAPACK) assumes column-major. Unfortunately, a lot of C code stores matrices in row-major order. The inconsistency turns out not to be a very big deal for real-valued matrices, but it can cause trouble for complex-valued ones.
I’m going to stick with the convention that when I say “matrix” I mean “column-major matrix”. The important thing to remember is that elements in the same column are stored contiguously, but elements in the same row are not. (For row-major, the reverse is true.)
A dense matrix is represented by four numbers: the number of rows, the number of columns, a pointer to the upper-left element, and the leading dimension. The numbers of rows and columns are usually given by m and n. The pointer to the element is usually named a, b, or c. The leading dimension is named for the pointer it is associated with, and would be lda to go with a or ldb to go with b. What’s lda? It is the stride between consecutive elements of the same row. It must be greater than or equal to max(1,m).
An example will clarify things a bit. Let’s say we want to represent the matrix a = [ 1 4; 2 5; 3 6 ]. In MATLAB notation, this is a 3-by-2 matrix; the first column is (1, 2, 3), and the second column is (4, 5, 6). In C, we would have
double data[] = { 1, 2, 3, 4, 5, 6 };
int m = 3, n = 2;
double *a = data;
int lda = 3;
Since there is no gap in memory between the last element of the first column and the first element of the second column, lda is equal to m. Now, what if we want to represent the submatrix b = a(1:2,1:2)? This is easy:
int mb = 2, nb = 2;
double *b = a;
int ldb = 3;
We can also similarly represent c = a(2:3,1:2):
int mc = 2, nc = 2;
double *c = a + 1;
int ldc = 3;
In general we can represent any submatrix whose row and column indices are contiguous.
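The addressing rule behind all of these examples is the same; here is a sketch (matrix_elem is my name, not a BLAS function):

```c
#include <stddef.h>

/* Element (i, j) of a column-major matrix with leading dimension lda
 * lives at a[i + j * lda].  An illustration of the convention only. */
double matrix_elem(const double *a, int lda, int i, int j)
{
    return a[i + (size_t)j * lda];
}
```

Note that the submatrices b and c above work precisely because they keep the original lda = 3 of the parent matrix.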
What else can we do with a matrix? Well, because of the stride parameter, we can easily represent a row of the matrix a (it has stride lda), or a column (it has stride 1). We can also represent a diagonal of a using stride lda+1.
Data Types
Now, what did I mean when I said that BLAS “supported” an O(1) herm operation?
First of all, notice that I have been a little vague when I’ve given the definitions for the vector and matrix data types. Probably most of you were expecting me to introduce a struct at some point. Sadly, BLAS does not define any new formal types. BLAS only defines functions. The “types” I talked about above are really just conventions that all of the functions adhere to. So, here’s the function signature for daxpy, the function that performs the operation y := alpha * x + y, where alpha is a scalar and x and y are vectors:
void cblas_daxpy (int n, double alpha,
const double *x, int incx,
double *y, int incy);
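For reference, here is a plain-C model of what daxpy computes (a naive sketch of the semantics, not the real optimized BLAS implementation):

```c
#include <stddef.h>

/* What cblas_daxpy computes: y := alpha * x + y, with both vectors
 * given in the (pointer, length, stride) convention. */
void ref_daxpy(int n, double alpha,
               const double *x, int incx,
               double *y, int incy)
{
    for (int i = 0; i < n; i++)
        y[(size_t)i * incy] += alpha * x[(size_t)i * incx];
}
```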
Only primitive types appear in the argument list. The only way to tell that the function operates on vectors is by looking at the names of the arguments and reading the documentation. This is annoying. Let’s fix that by defining our own vector type:
typedef struct
{
double *data;
int size;
int stride;
int is_conj;
} vector_t;
(Ignore is_conj for now; the rest should be self-explanatory.) Now, we can simplify daxpy a bit:
void my_daxpy (double alpha, const vector_t *x,
vector_t *y);
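A minimal implementation might look like the following sketch (for real-valued data is_conj has no effect, so it is ignored here):

```c
#include <assert.h>
#include <stddef.h>

/* The vector type from the text. */
typedef struct
{
    double *data;
    int size;
    int stride;
    int is_conj;
} vector_t;

/* y := alpha * x + y.  The struct lets us do run-time dimension
 * checking, which the raw-pointer interface cannot. */
void my_daxpy(double alpha, const vector_t *x, vector_t *y)
{
    assert(x->size == y->size);
    for (int i = 0; i < x->size; i++)
        y->data[(size_t)i * y->stride] +=
            alpha * x->data[(size_t)i * x->stride];
}
```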
You can imagine that this will clean up a lot of our code. Let’s see if we can use the same trick for matrices using dgemv, the function to multiply a matrix by a vector, as an example. Specifically, the function performs the operation y := alpha * op(A) * x + beta * y, where op is either “transpose”, “herm”, or “identity” (“herm” means conjugate transpose). Here is the signature:
void cblas_dgemv (enum CBLAS_ORDER order,
enum CBLAS_TRANSPOSE transa,
int m, int n,
double alpha, const double *a, int lda,
const double *x, int incx,
double beta, double *y, int incy);
There are twelve parameters. Barf. Can we clean this up? Absolutely. Here’s the matrix type we are going to use:
typedef struct
{
double *data;
int size1;
int size2;
int lda;
int is_herm;
} matrix_t;
I have introduced a boolean field is_herm to indicate whether the matrix has been transposed and conjugated. This eliminates transa from the call signature:
void my_dgemv (double alpha,
               const matrix_t *a, const vector_t *x,
               double beta,
               vector_t *y);
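Here is one possible sketch of my_dgemv. It assumes size1 and size2 are the logical dimensions of op(A) as the caller sees them, so a herm-ed matrix stores its logical element (i, j) at data[j + i * lda]; for real-valued data, herm is simply transpose. This is my own naive reference loop, not an optimized implementation:

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    double *data;
    int size;
    int stride;
    int is_conj;
} vector_t;

typedef struct
{
    double *data;
    int size1;   /* logical number of rows    */
    int size2;   /* logical number of columns */
    int lda;
    int is_herm;
} matrix_t;

/* y := alpha * op(A) * x + beta * y.  When is_herm is set, logical
 * element (i, j) is stored at data[j + i * lda]; for real data there
 * is nothing to conjugate. */
void my_dgemv(double alpha, const matrix_t *a, const vector_t *x,
              double beta, vector_t *y)
{
    assert(x->size == a->size2 && y->size == a->size1);
    for (int i = 0; i < a->size1; i++) {
        double dot = 0.0;
        for (int j = 0; j < a->size2; j++) {
            double aij = a->is_herm
                ? a->data[j + (size_t)i * a->lda]
                : a->data[i + (size_t)j * a->lda];
            dot += aij * x->data[(size_t)j * x->stride];
        }
        y->data[(size_t)i * y->stride] =
            alpha * dot + beta * y->data[(size_t)i * y->stride];
    }
}
```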
We have eliminated seven parameters from the call, and we have only sacrificed a little bit of functionality. We have gained the ability to do run-time dimension checking for the arguments.
The feature we have lost is the ability to use “transpose”. We can only do “identity” and “herm”. Why did I use a boolean for is_herm rather than the more general CBLAS_TRANSPOSE type? Because now we can have a function
void make_herm (matrix_t *a);
that takes the herm of a matrix as an O(1) operation. It would be nice to have make_trans be an O(1) operation, too. Sadly, because BLAS does not support “conjugate” as the op type in a multiplication, we can either have trans be O(1) or we can have herm be O(1), but we cannot have both. I think giving up trans is worth simplifying the interface.
(Astute readers will notice that for gemv, it is possible to get conj(a) by using “row-major” as the order and “herm” as the transpose type. This trick does not extend to dgemm, the function that multiplies two matrices.)
Now I have to come back to why vector_t has an is_conj field. It is necessary if we want getting a row or a column of a matrix to be an O(1) operation for “herm-ed” matrices.
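To see why, here is a sketch of an O(1) row view (matrix_row is my own helper, under the same size1/size2 assumptions as before). For a plain matrix, row i has stride lda; for a herm-ed matrix, row i is column i of the stored data, with each element conjugated, which is exactly what is_conj records:

```c
#include <stddef.h>

typedef struct
{
    double *data;
    int size;
    int stride;
    int is_conj;
} vector_t;

typedef struct
{
    double *data;
    int size1;   /* logical rows    */
    int size2;   /* logical columns */
    int lda;
    int is_herm;
} matrix_t;

/* Row i of a matrix as an O(1) vector view.  For a herm-ed matrix,
 * logical element (i, j) is stored at data[j + i * lda], so row i is a
 * contiguous run of stored memory whose elements must be conjugated. */
vector_t matrix_row(const matrix_t *a, int i)
{
    vector_t v;
    v.size = a->size2;
    if (a->is_herm) {
        v.data = a->data + (size_t)i * a->lda;
        v.stride = 1;
        v.is_conj = 1;
    } else {
        v.data = a->data + i;
        v.stride = a->lda;
        v.is_conj = 0;
    }
    return v;
}
```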
Extending BLAS
BLAS almost supports the simplifications in the API we’ve made by introducing vector_t and matrix_t. We need to implement the following functionality ourselves to make the simplifications work. Here are the functions we need to add:
y := alpha * conj(x) + beta * y
y := alpha * op(A) * conj(x) + beta * y
conj(y) := alpha * op(A) * x + beta * conj(y)
The second function is missing when y has non-unit stride, and the third is missing when x has non-unit stride. This is because we can always cast a vector as a matrix when it is conjugated. If the vector is not conjugated, we can only perform the cast if the stride is one. (Columns are stored contiguously for normal matrices, but not for herm-ed matrices.) Once the vectors have been cast to matrices, we can call gemm.
Summary
The BLAS API is painfully verbose. By adding our own data types and giving up the ability to take the transpose of the matrix, we can get a far simpler interface with nearly all of the power. The approach I describe here is the one I use in the blas bindings for Haskell.