c++ - Aliasing regular array with vector intrinsics in gcc

March 15, 2012

I'm playing around with vector intrinsics in gcc, particularly for AVX, and I'm tempted to write something like this to do a vector multiply between two arrays:

#include <unistd.h>

void __attribute__((target("avx"))) vmul(float* __restrict__ cc, const float* __restrict__ aa, const float* __restrict__ bb, ssize_t size)
{
    const ssize_t vecsize=8;
    typedef float vfloat __attribute__((vector_size(sizeof(float)*vecsize)));

    // duff's device, process remainder up front
    ssize_t rem = size % vecsize;
    switch (rem) {
        case 7: cc[6] = aa[6]*bb[6]; /* fallthru */
        case 6: cc[5] = aa[5]*bb[5]; /* fallthru */
        case 5: cc[4] = aa[4]*bb[4]; /* fallthru */
        case 4: cc[3] = aa[3]*bb[3]; /* fallthru */
        case 3: cc[2] = aa[2]*bb[2]; /* fallthru */
        case 2: cc[1] = aa[1]*bb[1]; /* fallthru */
        case 1: cc[0] = aa[0]*bb[0]; /* fallthru */
        case 0: break;
    }
    size -= rem;

    // process rest of array
    const vfloat *va = (const vfloat*)(aa+rem);
    const vfloat *vb = (const vfloat*)(bb+rem);
    vfloat *vc = (vfloat*)(cc+rem);

    // size counts floats, so the number of whole vectors is size/vecsize
    for (ssize_t ii=0; ii < size/vecsize; ii++) {
        vc[ii] = va[ii]*vb[ii];
    }
}

int main() { }

The problem is the pointer aliasing required to get the data into the vector type. gcc happily lets you do it (no warning with -Wall -Wextra -ansi -pedantic), but then assumes the underlying memory has the alignment appropriate for the vector type. So it generates vmovaps instructions in the inner loop:

   0x0000000000400660 <+176>:   vmovaps (%rsi,%rax,1),%ymm0
   0x0000000000400665 <+181>:   vmulps (%rdx,%rax,1),%ymm0,%ymm0
   0x000000000040066a <+186>:   vmovaps %ymm0,(%rdi,%rax,1)
   0x000000000040066f <+191>:   add    $0x20,%rax
   0x0000000000400673 <+195>:   cmp    %r8,%rax
   0x0000000000400676 <+198>:   jne    0x400660 <_Z4vmulPfPKfS1_l+176>

Which is fine, until you pass in non-aligned memory (or a size that is not a multiple of 8 in this case), and then it happily segfaults the program trying to load unaligned memory with an aligned instruction.
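To make the failure condition concrete: vmovaps with a ymm operand needs a 32-byte-aligned address, and a float pointer offset by `rem` elements generally is not one. A minimal sketch of a check, where `is_aligned` is a hypothetical helper name that does not appear in the code above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the original code): reports whether
// pointer p sits on a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);  // guard against a modulo by zero
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With this, `is_aligned(aa + rem, 32)` would have to hold for all three pointers before the aligned inner loop is safe to run.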

Is there a proper way to do this with vector extensions?

You can reduce the alignment like this:

typedef float vfloat __attribute__((vector_size(sizeof(float)*vecsize),
    aligned(4)));

With this change, gcc generates vmovups instructions instead.
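As a rough sketch of how the reduced-alignment typedef might be used in a whole function: the names `vfloat_u` and `vmul_unaligned` are made up for illustration, and the `may_alias` attribute is my own addition (an assumption, mirroring how GCC declares its unaligned intrinsic types), not something the answer asked for. The tail is handled with a plain scalar loop instead of the switch, and the `target("avx")` attribute is left out so the sketch builds anywhere; without AVX enabled the compiler simply lowers the vector operations to whatever the target supports.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type name. aligned(4) drops the alignment requirement to that
// of a float, as in the answer; may_alias is an extra assumption so that
// accesses through this type may alias the plain float arrays.
typedef float vfloat_u
    __attribute__((vector_size(8 * sizeof(float)), aligned(4), may_alias));

// Hypothetical function name: element-wise cc[i] = aa[i] * bb[i] for
// arrays with no particular alignment and any size.
void vmul_unaligned(float* cc, const float* aa, const float* bb,
                    std::size_t size) {
    assert(size == 0 || (cc && aa && bb));

    const std::size_t vecsize = 8;
    const std::size_t nvec = size / vecsize;  // number of whole vectors

    const vfloat_u* va = reinterpret_cast<const vfloat_u*>(aa);
    const vfloat_u* vb = reinterpret_cast<const vfloat_u*>(bb);
    vfloat_u* vc = reinterpret_cast<vfloat_u*>(cc);

    // Vector part: the compiler must assume only 4-byte alignment here.
    for (std::size_t ii = 0; ii < nvec; ++ii) {
        vc[ii] = va[ii] * vb[ii];
    }

    // Scalar tail for the remaining size % vecsize elements.
    for (std::size_t ii = nvec * vecsize; ii < size; ++ii) {
        cc[ii] = aa[ii] * bb[ii];
    }
}
```

Calling it with pointers offset by one float, for example `vmul_unaligned(c + 1, a + 1, b + 1, 19)`, exercises both the misaligned vector path and the scalar tail.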
