Channel: Intel® C++-Compiler

Vectorization Issue with loop iterations


Hi All,

I am trying to compile the following sample kernel with the Intel compiler (ICC) 14.0.0 20130728 (or any version > 12). I see strange vectorization behaviour and have the following questions:

  • If I change the type of the _iml variable from long int to int, the compiler doesn't vectorize the code. The vectorization report from -vec-report3 is long and lists ANTI and FLOW dependencies, which seem correct. But I don't understand what the compiler does differently to vectorize the loop when the loop iteration variable is long int.
  • The example below is a kernel auto-generated from a domain-specific language. We have a large array and process 18 elements of it in every iteration (say those 18 elements represent a particle), so the iterations are independent. But this memory layout looks similar to AoS (array of structs with 18 elements each). AoS is not good for vectorization, so I want to understand how the Intel compiler vectorizes this code.
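
To make the layout in the second point concrete, here is a minimal sketch of the index arithmetic. The helper names aos_index and soa_index are hypothetical and not part of the generated kernel, and the SoA variant is shown only for comparison, not as something the DSL emits. In the AoS layout, field k of particle i sits at i*18 + k, so the same field of two consecutive particles is 18 doubles apart (144 bytes with 8-byte doubles). In an SoA layout with n particles it would sit at k*n + i, one double apart.

```c
#include <assert.h>
#include <stddef.h>

#define AOS_BLOCK 18  /* doubles per particle, as in the kernel */

/* Hypothetical helper: AoS position of field k of particle i. */
static size_t aos_index(size_t i, size_t k) {
    assert(k < AOS_BLOCK);
    return i * AOS_BLOCK + k;
}

/* Hypothetical helper: SoA position of the same field for n particles
 * (comparison only; the generated code does not use this layout). */
static size_t soa_index(size_t i, size_t k, size_t n) {
    assert(k < AOS_BLOCK && i < n);
    return k * n + i;
}
```

For example, _p[16] of particles 0 and 1 corresponds to aos_index(0, 16) and aos_index(1, 16), which differ by AOS_BLOCK, so a vector load of that field across iterations has to be strided or gathered rather than contiguous.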

The compute() function is the actual compute kernel that I want to vectorize. Please follow the comments for more explanation:

#include <math.h>
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free, atoi, exit */
#define AOS_BLOCK 18

void compute(double *pdata, int num_mechs) {

    double* _p;

    /* ISSUE: if I change _iml from long int to int,
     * the compiler doesn't vectorize the code. Why?
     */
    long int _iml;

    /* for each iteration of loop, we process 18 elements of pdata 1-d array */
    for (_iml = 0; _iml < num_mechs; ++_iml) {

        /* take a pointer to the start of this iteration's 18-element block */
        _p = &pdata[_iml*AOS_BLOCK];

        /* The calculations below are generated by a DSL-to-C converter; they look ugly, I know!
         * They only operate on those 18 elements, so you don't need to understand them in detail.
         */

        if ( _p[16]  == - 35.0 ) {
            _p[16] = _p[16] + 0.0001 ;
        }

        _p[8] = ( 0.182 * ( _p[16] - - 35.0 ) )/ ( 1.0 - ( exp ( - ( _p[16] - - 35.0 ) / 9.0 ) ) ) ;
        _p[9] = ( 0.124 * ( - _p[16] - 35.0 ) )  / ( 1.0 - ( exp ( - ( - _p[16] - 35.0 ) / 9.0 ) ) ) ;
        _p[6] = _p[8] / ( _p[8] + _p[9] ) ;
        _p[7] = 1.0 / ( _p[8] + _p[9] ) ;

        if ( _p[16]  == - 50.0 ) {
            _p[16] = _p[16] + 0.0001 ;
        }

        _p[12] = ( 0.024 * ( _p[16] - - 50.0 ) ) / ( 1.0 - ( exp ( - ( _p[16] - - 50.0 ) / 5.0 ) ) ) ;

        if ( _p[16]  == - 75.0 ) {
            _p[16] = _p[16] + 0.0001 ;
        }

        _p[13] = ( 0.0091 * ( - _p[16] - 75.0 ) ) / ( 1.0 - ( exp ( - ( - _p[16] - 75.0 ) / 5.0 ) ) ) ;
        _p[10] = 1.0 / ( 1.0 + exp ( ( _p[16] - - 65.0 ) / 6.2 ) ) ;
        _p[11] = 1.0 / ( _p[12] + _p[13] ) ;

        _p[3] = _p[3] + (1. - exp(0.01*(( ( ( -1.0 ) ) ) / _p[7])))*(- ( ( ( _p[6] ) ) / _p[7] ) / ( ( ( ( -1.0) ) ) / _p[7] ) - _p[3]);
        _p[4] = _p[4] + (1. - exp(0.01*(( ( ( -1.0 ) ) ) / _p[11])))*(- ( ( ( _p[10] ) ) / _p[11] ) / ( ( ( ( -1.0) ) ) / _p[11] ) - _p[4]);
    }
}

int main(int argc, char *argv[])
{
    int i, n;
    double * data;

    if(argc < 2)
    {
        printf("\n Pass length of an array as argument \n");
        exit(1);
    }

    n = atoi( argv[1] );

    //data = _mm_malloc( sizeof(double) * n, 32);
    /* calloc zero-initializes, so compute() does not read indeterminate values */
    data = (double *) calloc( (size_t) n * AOS_BLOCK, sizeof(double) );

    /* main compute function */
    compute( data, n);

    if(argc > 3)
        for(i=0; i<n ; i++)
            printf("\t %lf", data[i]);

    free(data);
    //_mm_free(data);
}
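
One observation on the int versus long int question, offered as a sketch rather than an explanation of the vectorizer's decision: in C the product _iml*AOS_BLOCK is evaluated in the type of _iml. With int, the index is computed in int arithmetic and only then converted for pointer addressing; with long int, the arithmetic is done in long throughout (64-bit on LP64 platforms such as 64-bit Linux). The helper names below are hypothetical, and whether this difference is what changes the compiler's dependence analysis is an assumption that is not established here.

```c
#include <assert.h>
#include <stddef.h>

#define AOS_BLOCK 18  /* same constant as in the kernel */

/* Size in bytes of the index expression when the loop counter is int. */
static size_t index_size_int(void) {
    int iml = 0;
    return sizeof(iml * AOS_BLOCK);   /* int * int -> int */
}

/* Size in bytes of the index expression when the loop counter is long int. */
static size_t index_size_long(void) {
    long int iml = 0;
    return sizeof(iml * AOS_BLOCK);   /* long * int -> long */
}

/* The result types follow from the usual arithmetic conversions. */
static void check_index_types(void) {
    assert(index_size_int() == sizeof(int));
    assert(index_size_long() == sizeof(long int));
}
```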

Any comments that help me understand this code and its vectorization are appreciated.

Thanks!

