Hi All,
I am trying to compile the following sample kernel with the Intel compiler (ICC) 14.0.0 20130728 (or any version > 12). I see strange vectorization behaviour and have the following questions:
- If I change the type of the _iml variable from long int to int, the compiler doesn't vectorize the code. With -vec-report3 I get a long report listing ANTI and FLOW dependencies, which seem correct. But I don't understand what the compiler does differently to vectorize the loop when the iteration variable is long int.
- The example below is a kernel auto-generated from a domain-specific language. We have a large array and process 18 of its elements in every iteration (say those 18 elements represent a particle), so the iterations are independent. But this memory layout resembles AoS (array of structs with 18 elements each). AoS is generally not good for vectorization, so I want to understand how the Intel compiler vectorizes this code.
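To make the layout question concrete, here is a minimal sketch of the indexing involved. It is not part of the generated kernel: the helper names aos_index, soa_index and aos_to_soa are made up for illustration, and the SoA layout is only a hypothetical alternative to compare against. It shows that in the current layout the same field of consecutive particles is 18 doubles apart, while in an SoA layout it would be adjacent.

```c
#include <assert.h>
#include <stddef.h>

#define AOS_BLOCK 18  /* fields per particle, as in the kernel */

/* AoS: field f of particle i lives at pdata[i*AOS_BLOCK + f], so the same
 * field of consecutive particles is AOS_BLOCK doubles apart (stride 18). */
static size_t aos_index(size_t i, size_t f) {
    assert(f < AOS_BLOCK);
    return i * AOS_BLOCK + f;
}

/* SoA (hypothetical alternative): field f of particle i lives at
 * soa[f*n + i], so consecutive particles are adjacent (unit stride). */
static size_t soa_index(size_t i, size_t f, size_t n) {
    assert(f < AOS_BLOCK && i < n);
    return f * n + i;
}

/* Copy n particles from the AoS layout into the SoA layout. */
void aos_to_soa(const double *aos, double *soa, size_t n) {
    for (size_t f = 0; f < AOS_BLOCK; ++f)
        for (size_t i = 0; i < n; ++i)
            soa[soa_index(i, f, n)] = aos[aos_index(i, f)];
}
```

With the AoS layout, a vectorized loop over particles has to gather each field with a stride of 18 doubles, which is the part I would like to understand in the compiler's output.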
compute() is the actual kernel that I want to vectorize. Please follow the comments for more explanation:
#include <math.h>
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free, atoi, exit */

#define AOS_BLOCK 18

void compute(double *pdata, int num_mechs)
{
    double* _p;

    /* ISSUE : If I change _iml to int instead of long int
     * compiler doesn't vectorize the code. Why? */
    long int _iml;

    /* for each iteration of the loop, we process 18 elements of the pdata 1-d array */
    for (_iml = 0; _iml < num_mechs; ++_iml)
    {
        /* take pointer to the start of this 18-element block */
        _p = &pdata[_iml*AOS_BLOCK];

        /* below calculations are generated by the DSL to C code converter, looks ugly I know!
         * we do some computation on those 18 elements only, so you don't need to understand */
        if ( _p[16] == - 35.0 ) {
            _p[16] = _p[16] + 0.0001 ;
        }
        _p[8] = ( 0.182 * ( _p[16] - - 35.0 ) )/ ( 1.0 - ( exp ( - ( _p[16] - - 35.0 ) / 9.0 ) ) ) ;
        _p[9] = ( 0.124 * ( - _p[16] - 35.0 ) ) / ( 1.0 - ( exp ( - ( - _p[16] - 35.0 ) / 9.0 ) ) ) ;
        _p[6] = _p[8] / ( _p[8] + _p[9] ) ;
        _p[7] = 1.0 / ( _p[8] + _p[9] ) ;
        if ( _p[16] == - 50.0 ) {
            _p[16] = _p[16] + 0.0001 ;
        }
        _p[12] = ( 0.024 * ( _p[16] - - 50.0 ) ) / ( 1.0 - ( exp ( - ( _p[16] - - 50.0 ) / 5.0 ) ) ) ;
        if ( _p[16] == - 75.0 ) {
            _p[16] = _p[16] + 0.0001 ;
        }
        _p[13] = ( 0.0091 * ( - _p[16] - 75.0 ) ) / ( 1.0 - ( exp ( - ( - _p[16] - 75.0 ) / 5.0 ) ) ) ;
        _p[10] = 1.0 / ( 1.0 + exp ( ( _p[16] - - 65.0 ) / 6.2 ) ) ;
        _p[11] = 1.0 / ( _p[12] + _p[13] ) ;
        _p[3] = _p[3] + (1. - exp(0.01*(( ( ( -1.0 ) ) ) / _p[7])))*(- ( ( ( _p[6] ) ) / _p[7] ) / ( ( ( ( -1.0) ) ) / _p[7] ) - _p[3]);
        _p[4] = _p[4] + (1. - exp(0.01*(( ( ( -1.0 ) ) ) / _p[11])))*(- ( ( ( _p[10] ) ) / _p[11] ) / ( ( ( ( -1.0) ) ) / _p[11] ) - _p[4]);
    }
}

int main(int argc, char *argv[])
{
    int i, n;
    double * data;

    if(argc < 2)
    {
        printf("\n Pass length of an array as argument \n");
        exit(1);
    }

    n = atoi( argv[1] );

    //data = _mm_malloc( sizeof(double) * n, 32);
    data = (double *) malloc( sizeof(double) * n * AOS_BLOCK);

    /* main compute function */
    compute( data, n);

    if(argc > 3)
        for(i=0; i<n ; i++)
            printf("\t %lf", data[i]);

    free(data);
    //_mm_free(data);

    return 0;
}
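To isolate the first question, here is a reduced sketch of the two variants. It keeps only one representative statement from the kernel (the _p[7] line); the names update_block, compute_int and compute_long are made up for illustration. The two functions differ only in the type of the induction variable, so for sizes where i * AOS_BLOCK fits in an int they compute exactly the same thing. Any difference in the vectorization report should therefore come from the compiler's analysis, not from the semantics of the loop.

```c
#include <assert.h>

#define AOS_BLOCK 18

/* One representative update from the kernel (the _p[7] line). */
static void update_block(double *p) {
    p[7] = 1.0 / (p[8] + p[9]);
}

/* Variant A: int induction variable.
 * The offset i * AOS_BLOCK is computed in int arithmetic. */
void compute_int(double *pdata, int num_mechs) {
    int i;
    assert(num_mechs >= 0);
    for (i = 0; i < num_mechs; ++i)
        update_block(&pdata[i * AOS_BLOCK]);
}

/* Variant B: long int induction variable, as in the original kernel.
 * The offset i * AOS_BLOCK is computed in long arithmetic. */
void compute_long(double *pdata, int num_mechs) {
    long int i;
    assert(num_mechs >= 0);
    for (i = 0; i < num_mechs; ++i)
        update_block(&pdata[i * AOS_BLOCK]);
}
```

The only source-level difference is the width in which the array offset is computed. Whether that is what changes the dependence analysis is an assumption on my part, not something the report confirms.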
Any comments that help me understand this code and its vectorization are appreciated.
Thanks!