Hello,
I have run into something that looks like icc is generating invalid optimized code. Basically, I have the following code:
feenableexcept(FE_DIVBYZERO);
...
if ( chg4 ) {
   printf("Change 4\n");
   for (j = 0; j < len2; ++j) {
      if ( chg4[j] > 0 )
         maxpenalty = XMAX (maxpenalty, 1.0 / chg4[j]);
      if ( chg4[j] <= 0.0 || base->data->d2[j] >= 1e20 )
         continue;
      maplen++;
      totlen1++;
   }
}
Here 'chg4' is an array of doubles, all set to 0, len2 is the length of the array, and XMAX is a macro computing the maximum of its arguments.
If I compile this code with 'icc -g' and run it, then everything works as expected. However, when I compile it with 'icc -O' or just 'icc', running the code throws a floating point exception (division by zero). In gdb I get this:
(gdb) run
...
Program received signal SIGFPE, Arithmetic exception.
0x0000000000400fd9 in wrapper ()
(gdb) disassemble
...
0x0000000000400fbb <+747>: movaps 0x214e(%rip),%xmm7 # 0x403110
0x0000000000400fc2 <+754>: movaps 0x2157(%rip),%xmm0 # 0x403120
0x0000000000400fc9 <+761>: movslq %ecx,%rdi
0x0000000000400fcc <+764>: movaps %xmm2,%xmm10
0x0000000000400fd0 <+768>: movaps %xmm5,%xmm11
0x0000000000400fd4 <+772>: movaps (%r8,%rdi,8),%xmm9
=> 0x0000000000400fd9 <+777>: divpd %xmm9,%xmm10
0x0000000000400fde <+782>: cmpltpd %xmm9,%xmm11
0x0000000000400fe4 <+788>: cmplepd %xmm5,%xmm9
...
(gdb) p $xmm9
$1 = {v4_float = {0, 0, 0, 0}, v2_double = {0, 0}, v16_int8 = {
0 <repeats 16 times>}, v8_int16 = {0, 0, 0, 0, 0, 0, 0, 0}, v4_int32 = {0,
0, 0, 0}, v2_int64 = {0, 0}, uint128 = 0}
(gdb) p $xmm10
$2 = {v4_float = {0, 1.875, 0, 1.875}, v2_double = {1, 1}, v16_int8 = {0, 0,
0, 0, 0, 0, -16, 63, 0, 0, 0, 0, 0, 0, -16, 63}, v8_int16 = {0, 0, 0,
16368, 0, 0, 0, 16368}, v4_int32 = {0, 1072693248, 0, 1072693248},
v2_int64 = {4607182418800017408, 4607182418800017408},
uint128 = 0x3ff00000000000003ff0000000000000}
(gdb)
I took a quick look at the generated assembler code, and it looks like the offending divpd instruction is in a part that corresponds to an optimized version of the loop above. The code indeed attempts to compute 1.0/chg4[j] for elements that are zero, thereby producing a division by zero. I think that this is a bug, since my source code explicitly checks that we never do a division if the denominator of the quotient would be zero.
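To make my reading of the disassembly concrete, here is a scalar C sketch of the two variants. This is only my interpretation of the divpd/cmpltpd sequence, not actual compiler output. The XMAX definition and the function names penalty_guarded and penalty_if_converted are assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed definition; the real XMAX macro is not shown in this post. */
#define XMAX(a, b) ((a) > (b) ? (a) : (b))

/* The loop as written: the division only executes when chg4[j] > 0. */
double penalty_guarded(const double *chg4, size_t len2, double maxpenalty)
{
    for (size_t j = 0; j < len2; ++j) {
        if (chg4[j] > 0)
            maxpenalty = XMAX(maxpenalty, 1.0 / chg4[j]);
    }
    return maxpenalty;
}

/* What the optimized code appears to do, written as scalar C:
 * divide for every element first, then use the comparison only to
 * decide whether the quotient is kept. With chg4[j] == 0 the division
 * still executes, which traps once FE_DIVBYZERO is unmasked. */
double penalty_if_converted(const double *chg4, size_t len2, double maxpenalty)
{
    for (size_t j = 0; j < len2; ++j) {
        double q = 1.0 / chg4[j];           /* unconditional divide */
        int keep = chg4[j] > 0;             /* mask from the compare */
        double cand = XMAX(maxpenalty, q);
        maxpenalty = keep ? cand : maxpenalty;
    }
    return maxpenalty;
}
```

With exceptions masked, both functions return the same value, so the transformation is harmless in the default floating point environment. The difference only becomes visible once the division by zero trap is enabled.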
Here is information about my environment and how I build things:
djunglas@MACHINE:~/fpebug> uname -a
Linux MACHINE 3.0.80-0.7-default #1 SMP Tue Jun 25 18:32:49 UTC 2013 (25740f8) x86_64 x86_64 x86_64 GNU/Linux
djunglas@MACHINE:~/fpebug> $ICCPATH/12.1/composer_xe_2011_sp1.11.339/bin/intel64/icc --version
icc (ICC) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
djunglas@MACHINE:~/fpebug> $ICCPATH/12.1/composer_xe_2011_sp1.11.339/bin/intel64/icc -O -c -o main.o main.c
djunglas@MACHINE:~/fpebug> $ICCPATH/12.1/composer_xe_2011_sp1.11.339/bin/intel64/icc -O -c -o function.o function.c
djunglas@MACHINE:~/fpebug> objdump -D -r function.o > function.txt
djunglas@MACHINE:~/fpebug> $ICCPATH/12.1/composer_xe_2011_sp1.11.339/bin/intel64/icc -o fpebug main.o function.o
djunglas@MACHINE:~/fpebug> objdump -D -r fpebug > fpebug.txt
djunglas@MACHINE:~/fpebug> ./fpebug
Change 4
Floating point exception
I have attached the source code as well as object dumps of the object file and the binary. I would be very happy if someone could tell me whether this is expected behavior or indeed a bug in icc. I would also be happy to learn a way to work around this problem. Disabling FE_DIVBYZERO is not an option right now, and I would like to keep -O.
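One source-level workaround I am considering, which I have not verified against icc 12.1, is to make the divisor nonzero before dividing, so that the division is harmless even if the compiler moves it out of the branch. The sketch below assumes XMAX is a plain ternary macro; penalty_safe is a hypothetical helper name.

```c
#include <stddef.h>

/* Assumed definition; the real XMAX macro is not shown in this post. */
#define XMAX(a, b) ((a) > (b) ? (a) : (b))

double penalty_safe(const double *chg4, size_t len2, double maxpenalty)
{
    for (size_t j = 0; j < len2; ++j) {
        int positive = chg4[j] > 0;
        /* Substitute 1.0 for non-positive entries so the divisor is never zero. */
        double denom = positive ? chg4[j] : 1.0;
        double q = 1.0 / denom;
        /* The quotient is only used when the original entry was positive. */
        if (positive)
            maxpenalty = XMAX(maxpenalty, q);
    }
    return maxpenalty;
}
```

This keeps the result identical for positive entries. Whether icc still vectorizes the loop in this form, and whether that is acceptable performance-wise, is something I have not checked.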
Thanks a lot,
Daniel