python - Non-square C-order matrices in cuBLAS (Numba)
I'm trying to use cuBLAS functions in Anaconda's Numba package and I'm having an issue. I need the input matrices to be in C order. The output can be in Fortran order.
I can run the example script provided with the package, here. The script has two functions, gemm_v1 and gemm_v2. In gemm_v1, the user has to create the input matrices in Fortran order. In gemm_v2, they can be passed to the CUDA implementation of gemm and transposed on the device. I can get these examples to work with square matrices. However, I can't figure out how to get gemm_v2 to work with non-square input matrices. Is there a way to work with C-order input matrices that are non-square?
Note:
Ideally, both the input and output matrices would stay on the device after the call to gemm, so they can be used in other calculations (this is part of an iterative method).
The problem with the example is that it only works for square matrices. If the matrices are not square, you cannot calculate A^T*B^T because of a dimension mismatch (assuming the dimensions are right for A*B).
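To make the dimension argument concrete, here is a small shape check in plain Python. `matmul_shape` is a hypothetical helper written only for illustration (it is not part of Numba or cuBLAS). With A being 2x3 and B being 3x4, A*B is defined, A^T*B^T is not, and B^T*A^T is.

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of X*Y for X of shape a_shape and Y of shape b_shape."""
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    if a_cols != b_rows:
        raise ValueError(f"dimension mismatch: {a_shape} * {b_shape}")
    return (a_rows, b_cols)

A_shape, B_shape = (2, 3), (3, 4)        # A is 2x3, B is 3x4
At_shape, Bt_shape = (3, 2), (4, 3)      # shapes of A^T and B^T

print(matmul_shape(A_shape, B_shape))    # (2, 4): A*B is fine
print(matmul_shape(Bt_shape, At_shape))  # (4, 2): B^T*A^T = (A*B)^T is fine too
try:
    matmul_shape(At_shape, Bt_shape)     # A^T*B^T: (3x2)*(4x3)
except ValueError as err:
    print(err)                           # dimension mismatch: (3, 2) * (4, 3)
```

For square n x n matrices every one of these products is defined, which is why the example only appears to work in that case.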
I don't have a working cuBLAS installation at hand, so this is kind of a shot in the dark, but I would be surprised if cuBLAS worked differently from the usual BLAS. BLAS expects matrices in column-major order (aka Fortran order), but it can also be used with matrices in row-major order (aka C order).
In my opinion, which might be wrong, gemm_v2 is not the usual/best way to handle the multiplication of two C-order matrices, for example because if one multiplies two C-order matrices, one would also want a C-order matrix as the answer.
The trick to calculating the product of two C-order matrices with gemm works as follows:
Even if it is known to you, I would first like to elaborate on row-major order (C memory layout) and column-major order (Fortran memory layout), in order to flesh out my answer.
So if we have a 2x3 (i.e. 2 rows and 3 columns) matrix A and store it in contiguous memory, we get:
row-major-order(A) = a11, a12, a13, a21, a22, a23
col-major-order(A) = a11, a21, a12, a22, a13, a23
That means if we take contiguous memory that represents a matrix in row-major order and interpret it as a matrix in column-major order, we get quite a different matrix!
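A small pure-Python sketch of the two layouts, assuming a matrix is represented as a list of rows. The helpers `row_major`, `col_major` and `from_col_major` are hypothetical names for illustration; with NumPy, `ravel(order='C')` and `ravel(order='F')` give the same two orderings.

```python
def row_major(matrix):
    """Flatten a list-of-rows matrix in row-major (C) order."""
    return [x for row in matrix for x in row]

def col_major(matrix):
    """Flatten a list-of-rows matrix in column-major (Fortran) order."""
    return [row[j] for j in range(len(matrix[0])) for row in matrix]

def from_col_major(buf, rows, cols):
    """Interpret a flat buffer as a rows x cols matrix stored column-major."""
    return [[buf[i + j * rows] for j in range(cols)] for i in range(rows)]

A = [["a11", "a12", "a13"],
     ["a21", "a22", "a23"]]
buf = row_major(A)
print(buf)           # ['a11', 'a12', 'a13', 'a21', 'a22', 'a23']
print(col_major(A))  # ['a11', 'a21', 'a12', 'a22', 'a13', 'a23']
# The same row-major memory read as a column-major 2x3 matrix is not A:
print(from_col_major(buf, 2, 3))  # [['a11', 'a13', 'a22'], ['a12', 'a21', 'a23']]
```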
However, if we take a look at the transposed matrix A^T, we can see:
row-major-order(A) = col-major-order(A^T)
col-major-order(A) = row-major-order(A^T)
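The two identities can be checked with the same kind of pure-Python sketch (again, the helper names are hypothetical, and a matrix is a list of rows). The last line shows the practical consequence: the C-order buffer of A, read as a column-major 3x2 matrix, is exactly A^T, without rearranging any data in the buffer.

```python
def row_major(matrix):
    """Flatten a list-of-rows matrix in row-major (C) order."""
    return [x for row in matrix for x in row]

def col_major(matrix):
    """Flatten a list-of-rows matrix in column-major (Fortran) order."""
    return [row[j] for j in range(len(matrix[0])) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def from_col_major(buf, rows, cols):
    """Interpret a flat buffer as a rows x cols matrix stored column-major."""
    return [[buf[i + j * rows] for j in range(cols)] for i in range(rows)]

A = [["a11", "a12", "a13"],
     ["a21", "a22", "a23"]]
At = transpose(A)

assert row_major(A) == col_major(At)
assert col_major(A) == row_major(At)
# A's C-order buffer, read as a column-major 3x2 matrix, is A^T.
print(from_col_major(row_major(A), 3, 2))  # [['a11', 'a21'], ['a12', 'a22'], ['a13', 'a23']]
```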
That means that if we want the matrix C in row-major order as the result, the BLAS routine should write the transposed matrix C^T in column-major order (after all, we cannot change that) into this memory. However, C^T = (AB)^T = B^T*A^T, and B^T and A^T are just the original matrices reinterpreted in column-major order.
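The identity C^T = (AB)^T = B^T*A^T is what makes the trick work for non-square matrices too. A quick numeric check in plain Python, with a 2x3 matrix A and a 3x4 matrix B chosen arbitrarily for illustration (`matmul` and `transpose` are naive hypothetical helpers, not library calls):

```python
def matmul(x, y):
    """Naive product of two list-of-rows matrices."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

A = [[1, 2, 3],
     [4, 5, 6]]            # 2 x 3
B = [[1, 0, 2, 1],
     [0, 1, 1, 2],
     [3, 1, 0, 1]]         # 3 x 4

print(matmul(A, B))        # [[10, 5, 4, 8], [22, 11, 13, 20]]
# (AB)^T equals B^T * A^T, a 4x2 matrix.
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
```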
Now, let A be an n x k matrix and B a k x m matrix. Then the call of the gemm routine should be as follows:
gemm('N', 'N', m, n, k, 1.0, B, m, A, k, 0.0, C, m)
Please note:
- We don't have to transpose the matrices A and B, because this is handled by reinterpreting C order as Fortran order.
- We have to swap the places of the matrices A and B in order to get C^T in Fortran order as the result.
- The resulting matrix C is in C order (by reinterpreting Fortran order as C order, we get rid of the ^T).
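Since I can't run this against cuBLAS, here is only a pure-Python sketch of the trick. `gemm_colmajor` is a hypothetical stand-in written for illustration, not the cuBLAS or Numba API; it assumes the reference BLAS GEMM semantics (flat column-major buffers, leading dimensions, C := alpha*op(A)*op(B) + beta*C) and implements only the 'N', 'N' case. `a_buf` and `b_buf` hold the C-order data exactly as it would sit in memory, and the call uses the gemm argument order given above, with A and B swapped.

```python
def gemm_colmajor(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Sketch of BLAS xGEMM on flat column-major buffers ('N', 'N' only).

    op(A) is m x k, op(B) is k x n, C is m x n; C := alpha*op(A)*op(B) + beta*C.
    """
    if transa.upper() != "N" or transb.upper() != "N":
        raise NotImplementedError("this sketch only handles 'N', 'N'")
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[p + j * ldb]
            # Like BLAS, do not read C when beta == 0.
            old = beta * c[i + j * ldc] if beta != 0.0 else 0.0
            c[i + j * ldc] = alpha * acc + old

n, k, m = 2, 3, 4                       # A is n x k, B is k x m, C = A*B is n x m
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
B = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 2.0],
     [3.0, 1.0, 0.0, 1.0]]
a_buf = [x for row in A for x in row]   # C-order buffers, no transposes
b_buf = [x for row in B for x in row]
c_buf = [0.0] * (n * m)

# gemm('N', 'N', m, n, k, 1.0, B, m, A, k, 0.0, C, m)
gemm_colmajor("N", "N", m, n, k, 1.0, b_buf, m, a_buf, k, 0.0, c_buf, m)

# The column-major m x n result C^T, read as C order, is the n x m matrix C.
C = [c_buf[r * m:(r + 1) * m] for r in range(n)]
print(C)  # [[10.0, 5.0, 4.0, 8.0], [22.0, 11.0, 13.0, 20.0]]
```

With a real BLAS or cuBLAS wrapper the same argument order should apply, but it is worth checking how the particular wrapper accepts or infers the leading dimensions.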