python - non-square C-order matrices in cuBLAS (numba)
I'm trying to use the cuBLAS functions in Anaconda's numba package and I'm having an issue. I need the input matrices to be in C-order. The output can be in Fortran order.
I can run the example script provided with the package, here. The script has two functions, gemm_v1 and gemm_v2. In gemm_v1, the user has to create the input matrices in Fortran order. In gemm_v2, they can be passed to the CUDA implementation of gemm and transposed on the device. I can get these examples to work with square matrices. However, I can't figure out how to get gemm_v2 to work with non-square input matrices. Is there a way to work with C-order input matrices that are non-square?
Note:
Ideally, both the input and output matrices would stay on the device after the call to gemm, so that they can be used in other calculations (this is part of an iterative method).
The problem with the example is that it only works for square matrices. If the matrices are not square, you cannot calculate A^T*B^T because of a dimension mismatch (assuming the dimensions are right for A*B).
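A quick host-side NumPy check (no GPU needed) illustrates the mismatch. The shapes 2x3 and 3x4 are arbitrary illustrative choices, not taken from the numba example script:

```python
import numpy as np

# Illustrative non-square shapes: A is 2x3, B is 3x4, so A @ B is valid (2x4).
A = np.arange(6, dtype=np.float64).reshape(2, 3)
B = np.arange(12, dtype=np.float64).reshape(3, 4)

product = A @ B  # shape (2, 4)

# A^T is 3x2 and B^T is 4x3: the inner dimensions (2 and 4) do not agree.
try:
    A.T @ B.T
    mismatch = False
except ValueError:
    mismatch = True

print(product.shape, mismatch)
```

For square matrices the inner dimensions always agree, which is why the transposing approach only appears to work in that case.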
I don't have a working cuBLAS installation at hand, so this is kind of a shot in the dark, but I would be surprised if cuBLAS worked differently from the usual BLAS. BLAS expects matrices in column-major order (aka Fortran order), but it can also be used for matrices in row-major order (aka C-order).
In my opinion, which might be wrong, gemm_v2 is not the usual/best way to handle the multiplication of two C-order matrices, for example because if one multiplies two C-order matrices, one would also like to have a C-order matrix as the answer.
The trick to calculate the product of two C-order matrices with gemm works as follows:
Even if it is known to you, I would first like to elaborate on row-major order (C memory layout) and column-major order (Fortran memory layout), in order to flesh out my answer.
So if we have a 2x3 (i.e. 2 rows and 3 columns) matrix A and store it in continuous memory, we get:
row-major-order(A) = a11, a12, a13, a21, a22, a23
col-major-order(A) = a11, a21, a12, a22, a13, a23

That means that if we have continuous memory that represents a matrix in row-major order and interpret it as a matrix in column-major order, we get quite a different matrix!
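The two layouts of the 2x3 matrix can be checked with NumPy, using the entry values 11, 12, ... as stand-ins for a11, a12, ... (an illustrative labelling, not part of the original answer):

```python
import numpy as np

# The value 11 stands for a11, 23 for a23, and so on.
A = np.array([[11, 12, 13],
              [21, 22, 23]])

row_major = A.ravel(order="C")  # C-order: rows laid out one after another
col_major = A.ravel(order="F")  # Fortran-order: columns laid out one after another

print(row_major.tolist())  # [11, 12, 13, 21, 22, 23]
print(col_major.tolist())  # [11, 21, 12, 22, 13, 23]

# Reading the C-order buffer back as a column-major 2x3 matrix gives a different matrix.
misread = row_major.reshape((2, 3), order="F")
print(misread.tolist())  # [[11, 13, 22], [12, 21, 23]]
```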
However, if we take a look at the transposed matrix A^T, we can see:
row-major-order(A) = col-major-order(A^T)
col-major-order(A) = row-major-order(A^T)

That means that if we want the matrix C in row-major order as the result, the BLAS routine should write the transposed matrix C^T in column-major order (after all, we cannot change that) into this memory. However, C^T = (AB)^T = B^T*A^T, and B^T and A^T are just the original matrices reinterpreted in column-major order.
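These identities can be verified on the host with NumPy. The sketch below checks both layout equalities, shows that reading A's C-order buffer as a column-major matrix yields A^T without moving data, and confirms (AB)^T = B^T A^T for the illustrative shapes 2x3 and 3x4:

```python
import numpy as np

A = np.arange(6, dtype=np.float64).reshape(2, 3)   # 2x3, C-order
B = np.arange(12, dtype=np.float64).reshape(3, 4)  # 3x4, C-order

# row-major(A) == col-major(A^T) and col-major(A) == row-major(A^T)
same1 = np.array_equal(A.ravel(order="C"), A.T.ravel(order="F"))
same2 = np.array_equal(A.ravel(order="F"), A.T.ravel(order="C"))

# Reinterpreting A's C-order buffer as a column-major 3x2 matrix yields A^T.
A_as_fortran = A.ravel(order="C").reshape((3, 2), order="F")
is_transpose = np.array_equal(A_as_fortran, A.T)

# (AB)^T == B^T A^T
identity_holds = np.allclose((A @ B).T, B.T @ A.T)

print(same1, same2, is_transpose, identity_holds)
```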
Now, let A be an n x k matrix and B a k x m matrix; then the call of the gemm routine should be as follows:
gemm('n', 'n', m, n, k, 1.0, b, m, a, k, 0.0, c, m)

Please note:
- We don't have to transpose the matrices A and B, because that is handled by reinterpreting C-order as Fortran-order.
- We have to swap the places of the matrices A and B in order to get C^T in Fortran-order as the result.
- The resulting matrix C is in C-order (by reinterpreting Fortran-order as C-order, we get rid of the ^T).
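Since the index bookkeeping doesn't depend on the GPU, here is a host-side NumPy sketch that emulates a column-major, no-transpose gemm on flat buffers with explicit leading dimensions, then runs the exact argument order of the call above on two non-square C-order matrices. colmajor_gemm is a hypothetical helper written for illustration only (it is not a numba, cuBLAS or NumPy function) and it covers just the ('n', 'n') case. On the device, the same arguments would go to the cuBLAS gemm wrapper with device arrays, but the exact signature of that wrapper in numba/Accelerate is not checked here.

```python
import numpy as np


def colmajor_gemm(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Hypothetical stand-in for BLAS gemm('n', 'n', ...) on the host.

    a, b, c are flat 1-D buffers read in column-major (Fortran) order:
    element (i, j) of a matrix with leading dimension ld is at index i + j * ld.
    Computes C := alpha * A @ B + beta * C with A m x k, B k x n, C m x n.
    """
    A = a[: lda * k].reshape((lda, k), order="F")[:m, :]
    B = b[: ldb * n].reshape((ldb, n), order="F")[:k, :]
    C = c[: ldc * n].reshape((ldc, n), order="F")[:m, :]
    result = alpha * (A @ B) + beta * C
    # Write column j of the result back to c[j*ldc : j*ldc + m] (column-major).
    for j in range(n):
        c[j * ldc : j * ldc + m] = result[:, j]


# The answer's notation: A is n x k, B is k x m, both stored in C-order.
n, k, m = 2, 3, 4
A = np.arange(n * k, dtype=np.float64).reshape(n, k)
B = np.arange(k * m, dtype=np.float64).reshape(k, m)
c_buf = np.zeros(n * m)

# gemm('n', 'n', m, n, k, 1.0, b, m, a, k, 0.0, c, m): B's buffer goes first.
colmajor_gemm(m, n, k, 1.0, B.ravel(), m, A.ravel(), k, 0.0, c_buf, m)

# The buffer holds (AB)^T in Fortran order, which is AB in C-order.
C = c_buf.reshape(n, m)
print(np.allclose(C, A @ B))
```

No data is transposed or copied before the call; swapping the two input buffers and reading the output buffer back in C-order is the whole trick.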