Channel: Intel® C++-Compiler

Invalid floating point optimization?

Hello,

I have this piece of code:

asm("nop"); asm("nop"); asm("nop"); asm("nop");  /* markers to find this block in the disassembly */
double const c = a - b;
double const d = -c;
double const f = d * e;
data->field += f;
asm("nop"); asm("nop"); asm("nop"); asm("nop");
#if !defined(BAD)
/* Debug output; the last field of each triple is the raw bit pattern of the double. */
printf ("#### new field at %d: %f/%f/%llx\n", __LINE__,
        data->field, data->field, *(unsigned long long *)&data->field);
printf ("#### %f/%g/%llx %f/%g/%llx %f/%g/%llx %f/%g/%llx\n",
       data->field, data->field, *(unsigned long long *)&data->field,
       a, a, *(unsigned long long *)&a, b, b, *(unsigned long long *)&b,
       e, e, *(unsigned long long *)&e);
#endif

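So d is mathematically b-a. However, if b-a is not exactly representable, my application requires d to be an upper bound on the exact value. Therefore the whole code is executed with the floating point rounding mode set to FE_DOWNWARD: c = a-b is then rounded down, so c is a lower bound on the exact a-b, and negating it (which is exact) makes d an upper bound on the exact b-a. It is not OK to compute d=b-a directly, because under FE_DOWNWARD that would give a lower bound on the exact value.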

If BAD is not defined, I get this assembly code:

  610141:       90                      nop
  610142:       90                      nop
  610143:       90                      nop
  610144:       90                      nop
  610145:       f2 0f 10 44 24 40       movsd  0x40(%rsp),%xmm0
  61014b:       f2 0f 5c 44 24 48       subsd  0x48(%rsp),%xmm0
  610151:       0f 57 05 08 63 97 00    xorps  0x976308(%rip),%xmm0        # f86460 <.L_2il0floatpacket.43+0x80>
  610158:       f2 0f 59 44 24 10       mulsd  0x10(%rsp),%xmm0
  61015e:       f2 41 0f 58 44 24 20    addsd  0x20(%r12),%xmm0
  610165:       f2 41 0f 11 44 24 20    movsd  %xmm0,0x20(%r12)
  61016c:       90                      nop
  61016d:       90                      nop
  61016e:       90                      nop
  61016f:       90                      nop

This is more or less a literal translation of the C code, and my application works correctly in this case. If I define BAD, I get this assembly code instead:

  610086:       90                      nop
  610087:       90                      nop
  610088:       90                      nop
  610089:       90                      nop
  61008a:       f2 0f 10 44 24 38       movsd  0x38(%rsp),%xmm0
  610090:       f2 0f 5c 44 24 40       subsd  0x40(%rsp),%xmm0
  610096:       0f 57 05 b3 61 97 00    xorps  0x9761b3(%rip),%xmm0        # f86250 <.L_2il0floatpacket.43+0x80>
  61009d:       0f 57 05 ac 61 97 00    xorps  0x9761ac(%rip),%xmm0        # f86250 <.L_2il0floatpacket.43+0x80>
  6100a4:       f2 0f 59 04 24          mulsd  (%rsp),%xmm0
  6100a9:       f2 41 0f 58 44 24 20    addsd  0x20(%r12),%xmm0
  6100b0:       f2 41 0f 11 44 24 20    movsd  %xmm0,0x20(%r12)
  6100b7:       90                      nop
  6100b8:       90                      nop
  6100b9:       90                      nop
  6100ba:       90                      nop

and my application does not behave as expected (it computes wrong results). The two xorps instructions already look suspicious to me. As far as I understand, they XOR xmm0 twice with the same constant (presumably the sign-bit mask that implements the negation in the first listing), so together they are a no-op with respect to the result in xmm0. Single-stepping through the code in gdb, I can see that with BAD not defined:
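The observation about the two xorps instructions can be checked in plain C. This is a minimal sketch, assuming the constant at .L_2il0floatpacket.43+0x80 is the usual sign-bit mask (only bit 63 of each double lane set), which the listings do not show. Flipping that bit is exactly negation: it never rounds, it does not depend on the rounding mode, and doing it twice restores the original value. The function name xor_sign is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: flips the IEEE 754 sign bit of x, mirroring what
   "xorps mask, %xmm0" does to the low double lane if mask is the
   sign-bit constant (an assumption; the listings don't show its value). */
double xor_sign(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);        /* reinterpret without aliasing issues */
    bits ^= UINT64_C(0x8000000000000000);  /* toggle bit 63, the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

xor_sign(2.5) gives -2.5, and xor_sign(xor_sign(2.5)) gives 2.5 again, which is why the pair of xorps in the BAD listing leaves xmm0 unchanged.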

(gdb) info registers rsp
rsp            0x7fffffffa010	0x7fffffffa010
(gdb) print *(double *)(0x7fffffffa010 + 0x40)
$1 = 5000000
(gdb) print *(double *)(0x7fffffffa010 + 0x48)
$2 = 0

while with BAD defined I get:

(gdb) info registers rsp
rsp            0x7fffffffa020	0x7fffffffa020
(gdb) print *(double *)(0x7fffffffa020 + 0x38)
$2 = 0
(gdb) print *(double *)(0x7fffffffa020 + 0x40)
$3 = 5000000

Thus, without BAD the code loads a (5000000), subtracts b (0) and then negates the result, exactly as stated in the C code. With BAD the operands of subsd are swapped and the two xorps cancel each other, so d is computed directly as d=b-a. The latter rounds inexact values in the wrong direction, which in turn produces incorrect results in my application.

I am using

icc (ICC) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

I compile with:

-O -fno-builtin-strlen -fno-builtin-strcat -fno-builtin-strcmp -fno-builtin-strcpy -fno-builtin-strncat -fno-builtin-strncmp -fno-builtin-strrchr -m64 -fPIC -fno-strict-aliasing -diag-disable 1419 -w1 -Wcheck -Wall -Wmissing-declarations -Wmissing-prototypes -Wshadow -vec-report0 -fp-model strict

I have

#pragma fenv_access(on)

at the top-level of my source code.

Am I missing something here, or is this indeed an invalid optimization?

Thanks,

Daniel

