Channel: Intel® C++-Compiler

loop was unrolled by 2: is it sufficient?

Greetings,

I use ICC from within MSVC, with /QxHOST on Haswell (256-bit AVX).

My code uses the __m256 type for my own memcpy. ICC generates a correct result, and it works well.

But looking at the assembler output, I wonder: is it sufficient to unroll by ONLY 2, when I have:

#define PACKET_SIZE_MIN             128
#define PACKET_SIZE_AVG             512
#define PACKET_SIZE_MAX             2048

...

#if defined(__INTEL_COMPILER)
#   pragma loop count min(PACKET_SIZE_MIN) avg(PACKET_SIZE_AVG) max(PACKET_SIZE_MAX)
#endif
#   pragma unroll

and the assembler output reads:

.B1.8::                         ; Preds .B1.6 .B1.8
L4::            ; optimization report
                ; LOOP WAS UNROLLED BY 2
                ; %s was not vectorized: operation cannot be vectorized
$LN15:
  00022 48 ff c1         inc rcx                                ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN16:
  00025 c5 fe 6f 04 10   vmovdqu ymm0, YMMWORD PTR [rax+rdx]    ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.14
$LN17:
  0002a c5 fe 6f 4c 10
        20               vmovdqu ymm1, YMMWORD PTR [32+rax+rdx] ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.14
$LN18:
  00030 c4 a1 7e 7f 04
        08               vmovdqu YMMWORD PTR [rax+r9], ymm0     ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.9
$LN19:
  00036 c4 a1 7e 7f 4c
        08 20            vmovdqu YMMWORD PTR [32+rax+r9], ymm1  ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.9
$LN20:
  0003d 48 83 c0 40      add rax, 64                            ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN21:
  00041 49 3b c8         cmp rcx, r8                            ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN22:
  00044 72 dc            jb .B1.8 ; Prob 63%                    ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN23:
                                ; LOE rax rdx rcx rbx rbp rsi rdi r8 r9 r10 r12 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15

 

PS: I need to "#undef" both "min" and "max" before the pragma, because the MSVC headers already define these names as macros, and that clashes with the min(...) and max(...) clauses.
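To make the setup concrete, here is a minimal sketch of the kind of loop in question. It is not the actual code from my_frame.h: the helper name copy_blocks32 and the explicit factor in "#pragma unroll(4)" are assumptions for illustration only. The idea is that an explicit factor asks for more than the default choice of 2, assuming the compiler honors it. The intrinsics are guarded so the sketch also builds without AVX enabled.

```cpp
#include <cstddef>
#include <cstring>
#if defined(__AVX__)
#   include <immintrin.h>
#endif

// Some headers define min/max as macros; drop them so the min(...) and
// max(...) clauses of the pragma below are not macro-expanded.
#if defined(min)
#   undef min
#endif
#if defined(max)
#   undef max
#endif

#define PACKET_SIZE_MIN             128
#define PACKET_SIZE_AVG             512
#define PACKET_SIZE_MAX             2048

// Hypothetical helper (not from the original code): copies `blocks`
// unaligned 32-byte blocks from src to dst.
static void copy_blocks32(void* dst, const void* src, std::size_t blocks)
{
    unsigned char*       d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);

#if defined(__INTEL_COMPILER)
#   pragma loop count min(PACKET_SIZE_MIN) avg(PACKET_SIZE_AVG) max(PACKET_SIZE_MAX)
#   pragma unroll(4)   // assumption: explicit factor instead of a bare "unroll"
#endif
    for (std::size_t i = 0; i < blocks; ++i) {
#if defined(__AVX__)
        // One unaligned 256-bit load and store per iteration (vmovdqu).
        __m256i v = _mm256_loadu_si256(
            reinterpret_cast<const __m256i*>(s + 32 * i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(d + 32 * i), v);
#else
        // Portable fallback when AVX is not enabled at compile time.
        std::memcpy(d + 32 * i, s + 32 * i, 32);
#endif
    }
}
```

Whether a larger factor actually helps would have to be measured. For a plain copy like this, the loop is likely limited by loads and stores rather than by the loop overhead that unrolling removes.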

TIA, best

