Hello,
I have this piece of code:
asm("nop"); asm("nop"); asm("nop"); asm("nop");
double const c = a - b;
double const d = -c;
double const f = d * e;
data->field += f;
asm("nop"); asm("nop"); asm("nop"); asm("nop");
#if !defined(BAD)
printf ("#### new field at %d: %f/%f/%llx\n", __LINE__,
        data->field, data->field, *(unsigned long long *)&data->field);
printf ("#### %f/%g/%llx %f/%g/%llx %f/%g/%llx %f/%g/%llx\n",
        data->field, data->field, *(unsigned long long *)&data->field,
        a, a, *(unsigned long long *)&a,
        b, b, *(unsigned long long *)&b,
        e, e, *(unsigned long long *)&e);
#endif
So d is basically b-a. However, if b-a is not exactly representable, my application requires d to be an upper bound on the exact value. Therefore the whole code is executed with the floating-point rounding mode FE_DOWNWARD: c = a-b is rounded down, so d = -c is an upper bound on b-a. It is not OK to compute d = b-a directly, since that would give a lower bound on the exact value.
If BAD is not defined then I get this assembly code:
610141: 90                      nop
610142: 90                      nop
610143: 90                      nop
610144: 90                      nop
610145: f2 0f 10 44 24 40       movsd  0x40(%rsp),%xmm0
61014b: f2 0f 5c 44 24 48       subsd  0x48(%rsp),%xmm0
610151: 0f 57 05 08 63 97 00    xorps  0x976308(%rip),%xmm0   # f86460 <.L_2il0floatpacket.43+0x80>
610158: f2 0f 59 44 24 10       mulsd  0x10(%rsp),%xmm0
61015e: f2 41 0f 58 44 24 20    addsd  0x20(%r12),%xmm0
610165: f2 41 0f 11 44 24 20    movsd  %xmm0,0x20(%r12)
61016c: 90                      nop
61016d: 90                      nop
61016e: 90                      nop
61016f: 90                      nop
This is more or less a literal translation of the C code, and my application works correctly in this case. If I define BAD then I get this assembly code instead:
610086: 90                      nop
610087: 90                      nop
610088: 90                      nop
610089: 90                      nop
61008a: f2 0f 10 44 24 38       movsd  0x38(%rsp),%xmm0
610090: f2 0f 5c 44 24 40       subsd  0x40(%rsp),%xmm0
610096: 0f 57 05 b3 61 97 00    xorps  0x9761b3(%rip),%xmm0   # f86250 <.L_2il0floatpacket.43+0x80>
61009d: 0f 57 05 ac 61 97 00    xorps  0x9761ac(%rip),%xmm0   # f86250 <.L_2il0floatpacket.43+0x80>
6100a4: f2 0f 59 04 24          mulsd  (%rsp),%xmm0
6100a9: f2 41 0f 58 44 24 20    addsd  0x20(%r12),%xmm0
6100b0: f2 41 0f 11 44 24 20    movsd  %xmm0,0x20(%r12)
6100b7: 90                      nop
6100b8: 90                      nop
6100b9: 90                      nop
6100ba: 90                      nop
and my application does not behave as expected (it computes wrong results). The two xorps instructions already look suspicious to me. As far as I understand, they XOR xmm0 with the same value twice and hence are essentially a no-op with respect to the result in xmm0. Single-stepping through the code in gdb, I can see the following in the case with BAD not defined:
(gdb) info registers rsp
rsp            0x7fffffffa010   0x7fffffffa010
(gdb) print *(double *)(0x7fffffffa010 + 0x40)
$1 = 5000000
(gdb) print *(double *)(0x7fffffffa010 + 0x48)
$2 = 0
while with BAD defined I get
(gdb) info registers rsp
rsp            0x7fffffffa020   0x7fffffffa020
(gdb) print *(double *)(0x7fffffffa020 + 0x38)
$2 = 0
(gdb) print *(double *)(0x7fffffffa020 + 0x40)
$3 = 5000000
Thus, without BAD the code computes d as stated in the C code, d = -(a-b), while with BAD the operands of the subtraction are swapped and the code computes d directly as d = b-a. The latter rounds inexact values in the wrong direction, which in turn produces incorrect results in my application.
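The reading of the two xorps instructions as a no-op can be checked with a small sketch. It assumes that the constant at .L_2il0floatpacket.43+0x80 is the usual sign-bit mask 0x8000000000000000 for a double; the listing above does not show its contents, so this is an assumption. The helper name is hypothetical and only models the effect on the low double of xmm0.

```c
#include <stdint.h>
#include <string.h>

/* Assumed value of the xorps operand: only the sign bit of a double is set. */
#define SIGN_MASK UINT64_C(0x8000000000000000)

/* Hypothetical helper: models one xorps with the sign mask on a double.
 * XOR with the sign bit flips the sign and leaves all other bits alone. */
double xor_sign_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret without aliasing issues */
    bits ^= SIGN_MASK;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Applying xor_sign_mask once negates the value, and applying it twice returns the original bits. Under that assumption, the BAD sequence leaves the result of subsd unchanged, which matches the swapped operands seen in gdb.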
I am using
icc (ICC) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
I compile with
-O -fno-builtin-strlen -fno-builtin-strcat -fno-builtin-strcmp -fno-builtin-strcpy -fno-builtin-strncat -fno-builtin-strncmp -fno-builtin-strrchr -m64 -fPIC -fno-strict-aliasing -diag-disable 1419 -w1 -Wcheck -Wall -Wmissing-declarations -Wmissing-prototypes -Wshadow -vec-report0 -fp-model strict
I have
#pragma fenv_access(on)
at the top-level of my source code.
Am I missing anything here or is this indeed an invalid optimization?
Thanks,
Daniel