Hello,
I have run into what looks like icc generating incorrect optimized code. I have the following code:
feenableexcept(FE_DIVBYZERO);
...
if ( chg4 ) {
   printf("Change 4\n");
   for (j = 0; j < len2; ++j) {
      if ( chg4[j] > 0 )
         maxpenalty = XMAX (maxpenalty, 1.0 / chg4[j]);
      if ( chg4[j] <= 0.0 || base->data->d2[j] >= 1e20 )
         continue;
      maplen++;
      totlen1++;
   }
}
Here 'chg4' is an array of doubles whose elements are all 0, len2 is the length of the array, and XMAX is a macro that computes the maximum of its two arguments.
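For reference, here is a minimal self-contained sketch of the loop above wrapped in a function. The structs data_s and base_s, the function name count_changes, and the exact XMAX definition are hypothetical stand-ins, since the snippet does not show them; the sketch only mirrors the control flow of the original loop.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed definition: the post only says XMAX computes the max of its arguments. */
#define XMAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-ins for the types behind base->data->d2. */
struct data_s { const double *d2; };
struct base_s { const struct data_s *data; };

/* The loop from the post as a function (hypothetical name).
 * Returns maplen and updates *maxpenalty and *totlen1 like the original. */
static int count_changes(const double *chg4, int len2, const struct base_s *base,
                         double *maxpenalty, int *totlen1)
{
    int j, maplen = 0;
    assert(len2 <= 0 || base != NULL);
    if (chg4) {
        printf("Change 4\n");
        for (j = 0; j < len2; ++j) {
            /* In the source program the division is only reached when
             * chg4[j] > 0, so 1.0 / 0.0 is never evaluated here. */
            if (chg4[j] > 0)
                *maxpenalty = XMAX(*maxpenalty, 1.0 / chg4[j]);
            if (chg4[j] <= 0.0 || base->data->d2[j] >= 1e20)
                continue;
            maplen++;
            (*totlen1)++;
        }
    }
    return maplen;
}
```

Called with an all-zero chg4, as in the failing run, the source program never divides, so count_changes returns 0 and leaves *maxpenalty unchanged.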
If I compile this code with 'icc -g' and run it, everything works as expected. However, when I compile it with 'icc -O' or just 'icc', running the program raises a floating-point exception (division by zero). In gdb I get this:
(gdb) run
...
Program received signal SIGFPE, Arithmetic exception.
0x0000000000400fd9 in wrapper ()
(gdb) disassemble
...
   0x0000000000400fbb <+747>:   movaps 0x214e(%rip),%xmm7        # 0x403110
   0x0000000000400fc2 <+754>:   movaps 0x2157(%rip),%xmm0        # 0x403120
   0x0000000000400fc9 <+761>:   movslq %ecx,%rdi
   0x0000000000400fcc <+764>:   movaps %xmm2,%xmm10
   0x0000000000400fd0 <+768>:   movaps %xmm5,%xmm11
   0x0000000000400fd4 <+772>:   movaps (%r8,%rdi,8),%xmm9
=> 0x0000000000400fd9 <+777>:   divpd  %xmm9,%xmm10
   0x0000000000400fde <+782>:   cmpltpd %xmm9,%xmm11
   0x0000000000400fe4 <+788>:   cmplepd %xmm5,%xmm9
...
(gdb) p $xmm9
$1 = {v4_float = {0, 0, 0, 0}, v2_double = {0, 0}, v16_int8 = {0 <repeats 16 times>}, v8_int16 = {0, 0, 0, 0, 0, 0, 0, 0}, v4_int32 = {0, 0, 0, 0}, v2_int64 = {0, 0}, uint128 = 0}
(gdb) p $xmm10
$2 = {v4_float = {0, 1.875, 0, 1.875}, v2_double = {1, 1}, v16_int8 = {0, 0, 0, 0, 0, 0, -16, 63, 0, 0, 0, 0, 0, 0, -16, 63}, v8_int16 = {0, 0, 0, 16368, 0, 0, 0, 16368}, v4_int32 = {0, 1072693248, 0, 1072693248}, v2_int64 = {4607182418800017408, 4607182418800017408}, uint128 = 0x3ff00000000000003ff0000000000000}
(gdb)
I took a quick look at the generated assembly, and the offending divpd instruction appears to be in an optimized (vectorized) version of the loop above. That code does compute 1.0/chg4[j] and therefore divides by zero. I think this is a bug, since my source code explicitly checks that the denominator is positive before it ever divides.
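One possible reading of the disassembly, offered as an assumption rather than a confirmed analysis of icc's output: the loop seems to be if-converted, so divpd computes 1.0/chg4[j] for two elements at once (%xmm10, copied from %xmm2, holds the numerators {1, 1}; %xmm9 holds the loaded chg4 values), and the comparisons that follow (cmpltpd, cmplepd, presumably against 0.0 in %xmm5) only decide afterwards which quotients to keep. The scalar emulation below of one such two-lane step uses a hypothetical function name. With floating-point traps masked, the discarded 1.0/0.0 simply becomes +inf and the result is correct; with FE_DIVBYZERO enabled, the same unconditional division traps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar emulation of one two-lane step of an if-converted
 * version of the loop. Assumes the default floating-point environment,
 * where 1.0 / 0.0 yields +inf instead of trapping. */
static double masked_max_step(double maxpenalty, const double x[2])
{
    double q[2];
    int k;
    assert(x != NULL);
    /* Like divpd: the quotient is computed for every lane, even x[k] == 0. */
    for (k = 0; k < 2; ++k)
        q[k] = 1.0 / x[k];
    for (k = 0; k < 2; ++k) {
        int keep = 0.0 < x[k];                  /* like cmpltpd: mask for chg4[j] > 0 */
        double lane = keep ? q[k] : maxpenalty; /* blend: drop masked-out quotients */
        if (lane > maxpenalty)
            maxpenalty = lane;
    }
    return maxpenalty;
}
```

For x = {0.0, 0.5} the first lane computes +inf and is then discarded, so only 1.0/0.5 = 2.0 can affect the maximum.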
Here is information about my environment and how I build things:
djunglas@MACHINE:~/fpebug> uname -a
Linux MACHINE 3.0.80-0.7-default #1 SMP Tue Jun 25 18:32:49 UTC 2013 (25740f8) x86_64 x86_64 x86_64 GNU/Linux
djunglas@MACHINE:~/fpebug> $ICCPATH/12.1/composer_xe_2011_sp1.11.339/bin/intel64/icc --version
icc (ICC) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
djunglas@MACHINE:~/fpebug> $ICCPATH/12.1/composer_xe_2011_sp1.11.339/bin/intel64/icc -O -c -o main.o main.c
djunglas@MACHINE:~/fpebug> $ICCPATH/12.1/composer_xe_2011_sp1.11.339/bin/intel64/icc -O -c -o function.o function.c
djunglas@MACHINE:~/fpebug> objdump -D -r function.o > function.txt
djunglas@MACHINE:~/fpebug> $ICCPATH/12.1/composer_xe_2011_sp1.11.339/bin/intel64/icc -o fpebug main.o function.o
djunglas@MACHINE:~/fpebug> objdump -D -r fpebug > fpebug.txt
djunglas@MACHINE:~/fpebug> ./fpebug
Change 4
Floating point exception
I have attached the source code as well as object dumps of the object file and the binary. I would be very happy if someone could tell me whether this is expected behavior or indeed a bug in icc. I would also be glad to learn a way to work around this problem. Disabling FE_DIVBYZERO is not an option right now, and I would like to keep -O.
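A possible source-level workaround, as an untested sketch: substitute 1.0 for the divisor of non-positive entries before dividing, so that no element divides by zero even if the compiler evaluates the division for every element ahead of the chg4[j] > 0 test. The function name max_penalty_safe and the XMAX definition are assumptions, and whether icc 12.1 at -O keeps the select ahead of the division would need to be checked in the generated code.

```c
#include <assert.h>
#include <stddef.h>

#define XMAX(a, b) ((a) > (b) ? (a) : (b)) /* assumed definition */

/* Hypothetical helper: returns the larger of maxpenalty and the largest
 * 1/chg4[j] over strictly positive entries. The divisor is never zero. */
static double max_penalty_safe(const double *chg4, int len2, double maxpenalty)
{
    int j;
    assert(chg4 != NULL || len2 <= 0);
    for (j = 0; j < len2; ++j) {
        int positive = chg4[j] > 0;
        /* Replace non-positive (and NaN) divisors with 1.0 before dividing. */
        double denom = positive ? chg4[j] : 1.0;
        double inv = 1.0 / denom;
        if (positive)
            maxpenalty = XMAX(maxpenalty, inv);
    }
    return maxpenalty;
}
```

The extra select is cheap next to the division itself. icc also documents an option that restricts floating-point speculation (-fp-speculation=safe) and a per-loop #pragma novector; either may be worth checking against the 12.1 documentation as an alternative that keeps -O.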
Thanks a lot,
Daniel