python - Non-square C-order matrices in cuBLAS (numba)


I'm trying to use the cuBLAS functions in Anaconda's numba package, and I'm having an issue. I need my input matrices to be in C-order. The output can be in Fortran order.

I can run the example script provided with the package, here. The script has two functions, gemm_v1 and gemm_v2. In gemm_v1, the user has to create the input matrices in Fortran order. In gemm_v2, they can be passed to the CUDA implementation of GEMM and transposed on the device. I can get these examples to work with square matrices. However, I can't figure out how to get gemm_v2 to work with non-square input matrices. Is there a way to work with C-order input matrices that are non-square?

Note:
Ideally, both the input and output matrices would stay on the device after the call to GEMM, so they can be used in other calculations (this is part of an iterative method).

The problem with this example is that it only works for square matrices. If the matrices are not square, you cannot calculate A^T*B^T because of a dimension mismatch (assuming the dimensions are right for A*B).
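To make the mismatch concrete, here is a minimal NumPy sketch on the host (no GPU or cuBLAS involved). The sizes n, k, m are arbitrary values chosen for illustration:

```python
import numpy as np

# Arbitrary non-square sizes, chosen only for illustration.
n, k, m = 2, 3, 4
A = np.ones((n, k))
B = np.ones((k, m))

# A*B is well defined: (n x k) times (k x m).
product_shape = (A @ B).shape
print(product_shape)

# A^T*B^T is (k x n) times (m x k), which only works when n == m.
try:
    A.T @ B.T
    mismatch = False
except ValueError as exc:
    mismatch = True
    print("A^T @ B^T fails:", exc)
```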

I don't have a working cuBLAS installation at hand, so this is kind of a shot in the dark, but I would be really surprised if cuBLAS worked differently from the usual BLAS. BLAS expects matrices to be in column-major order (aka Fortran order), but it can also be used for matrices in row-major order (aka C-order).

In my opinion, which might be wrong, gemm_v2 is not the usual/best way to handle the multiplication of two C-order matrices, for example because if one multiplies two C-order matrices, one would want a C-order matrix as the answer.

The trick to calculate the product of two C-order matrices with the help of gemm works as follows:

Even if it is probably known to you, I would first like to elaborate on row-major order (C memory layout) and column-major order (Fortran memory layout), in order to flesh out my answer.

So if we have a 2x3 (i.e. 2 rows and 3 columns) matrix A and store it in contiguous memory, we get:

row-major-order(A) = a11, a12, a13, a21, a22, a23
col-major-order(A) = a11, a21, a12, a22, a13, a23
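The two layouts can be checked with NumPy, where `order="C"` is row-major and `order="F"` is column-major. This is only a sketch; the entries are picked so that each value spells out its own index (12 stands for a12, and so on):

```python
import numpy as np

# 2x3 matrix whose entries encode their own (row, column) index.
A = np.array([[11, 12, 13],
              [21, 22, 23]])

row_major = A.ravel(order="C").tolist()  # rows stored one after another
col_major = A.ravel(order="F").tolist()  # columns stored one after another

print(row_major)  # [11, 12, 13, 21, 22, 23]
print(col_major)  # [11, 21, 12, 22, 13, 23]
```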

That means that if we take contiguous memory which represents a matrix in row-major order and interpret it as a matrix in column-major order, we get quite a different matrix!

However, if we take a look at the transposed matrix A^T, we can easily see:

row-major-order(A) = col-major-order(A^T)
col-major-order(A) = row-major-order(A^T)
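A quick NumPy check of this identity, as a sketch: take the row-major buffer of a 2x3 matrix and reinterpret the same memory as a 3x2 column-major matrix. The variable names are illustrative only.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # 2x3 matrix, stored in row-major (C) order
buf = A.ravel(order="C")         # the flat memory as it is laid out

# Read the very same buffer as a 3x2 matrix in column-major (Fortran) order.
reinterpreted = buf.reshape((3, 2), order="F")

same = np.array_equal(reinterpreted, A.T)
print(same)  # True
```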

That means that if we would like to get the matrix C in row-major order as the result, the BLAS routine should write the transposed matrix C^T in column-major order (after all, this is something we cannot change) into that memory. However, C^T=(AB)^T=B^T*A^T, and B^T and A^T are exactly the original matrices reinterpreted in column-major order.

Now, let A be an n x k matrix and B a k x m matrix; the call of the gemm routine should then be as follows:

gemm('n', 'n', m, n, k, 1.0, b, m, a, k, 0.0, c, m) 

please note:

  1. We don't have to transpose the matrices A and B, because that is handled by reinterpreting C-order as Fortran order.
  2. We have to swap the places of the matrices A and B in order to get C^T in Fortran order as the result.
  3. The resulting matrix C is in C-order (by reinterpreting it from Fortran order to C-order, we get rid of the ^T).
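Since I can't test against cuBLAS itself, here is a host-side sketch of the argument using NumPy only. `gemm_colmajor` is a hypothetical stand-in that I wrote to mimic the column-major semantics of BLAS gemm for the no-transpose case; it is not the numba or cuBLAS API. The buffer sizes are assumed to be exactly the leading dimension times the number of columns.

```python
import numpy as np

def gemm_colmajor(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Hypothetical stand-in for BLAS gemm on flat column-major buffers.

    Only the 'N', 'N' case is sketched. Computes C = alpha*A*B + beta*C in place,
    where A is m x k, B is k x n and C is m x n, all column-major.
    """
    assert transa.upper() == "N" and transb.upper() == "N"
    # reshape(cols, ld).T views a flat buffer as a column-major (ld x cols) matrix.
    A = a.reshape(k, lda).T[:m, :]
    B = b.reshape(n, ldb).T[:k, :]
    C = c.reshape(n, ldc).T          # view, so writes land in the caller's buffer
    C[:m, :] = alpha * (A @ B) + beta * C[:m, :]

# Non-square C-order inputs: A is n x k, B is k x m (sizes are arbitrary).
n, k, m = 2, 3, 4
A = np.arange(n * k, dtype=np.float64).reshape(n, k)
B = np.arange(k * m, dtype=np.float64).reshape(k, m)
C = np.zeros((n, m))                 # C-order output buffer

# Same argument order as the call above: B first, then A, no transposes.
gemm_colmajor("N", "N", m, n, k, 1.0, B.ravel(), m, A.ravel(), k, 0.0, C.ravel(), m)

matches = np.allclose(C, A @ B)
print(matches)
```

The routine only ever sees B^T and A^T in column-major order and writes C^T, yet the C-order array C ends up holding A*B without any explicit transpose.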
