Intel® Parallel Studio XE 2015 Update 4 Composer Edition for Windows*

May 18, 2015, 12:26 pm

Latest and popular articles on Intel Technologies

Intel® Parallel Studio XE 2015 Update 4 Composer Edition for Windows* includes Intel's latest Fortran and C/C++ compilers and performance libraries for IA-32 and Intel® 64 architecture systems. This new product release now includes:

- Intel® Visual Fortran Compiler 15.0.4
- Intel® C++ Compiler XE Version 15.0.4
- Intel® Math Kernel Library (Intel® MKL) Version 11.2 Update 3
- Intel® Integrated Performance Primitives (Intel® IPP) Version 8.2 Update 2
- Intel® Threading Building Blocks (Intel® TBB) Version 4.3 Update 5
- Intel® Debugger Extension for Intel® Many Integrated Core Architecture (Intel® MIC Architecture) Version 7.7-8.0

New in this release:

Corrections to reported problems in installation and IDE integration
- Compiler fix list
- Intel® MKL fix list

Note: For more information on the changes listed above, please read the individual component release notes. See the previous release's ReadMe to see what was new in that release.

Resources

Intel® Parallel Studio XE 2015 Composer Edition (Click on desired product)
Intel® Parallel Studio XE 2015 Composer Edition Checksums

Contents
File: w_compxe_online_2015.4.221.exe
Online installer

File: w_compxe_2015.4.221.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, English version)

File: w_compxe_all_jp_2015.4.221.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, Japanese version)

File: w_ccompxe_redist_msi_2015.4.221.zip
C++ Redistributable Libraries for 32-bit and 64-bit msi files

File: w_fcompxe_redist_msi_2015.4.221.zip
Fortran Redistributable Libraries for 32-bit and 64-bit msi files

File: get-ipp-8.2-crypto-library.htm
Cryptography Library

↧

Intel® Parallel Studio XE 2015 Update 4 Composer Edition for C++ Windows*

May 18, 2015, 12:47 pm

Intel® Parallel Studio XE 2015 Update 4 Composer Edition for C++ Windows* includes the latest Intel C/C++ compilers and performance libraries for IA-32 and Intel® 64 architecture systems. This new product release now includes:

- Intel® C++ Compiler XE Version 15.0.4
- Intel® Math Kernel Library (Intel® MKL) Version 11.2 Update 3
- Intel® Integrated Performance Primitives (Intel® IPP) Version 8.2 Update 2
- Intel® Threading Building Blocks (Intel® TBB) Version 4.3 Update 5
- Intel® Debugger Extension for Intel® Many Integrated Core Architecture (Intel® MIC Architecture) Version 7.7-8.0

New in this release:

Corrections to reported problems in installation and IDE integration
- Compiler fix list
- Intel® MKL fix list

Note: For more information on the changes listed above, please read the individual component release notes. See the previous release's ReadMe to see what was new in that release.

Resources

Intel® Parallel Studio XE 2015 Composer Edition (Click on desired product)
Intel® Parallel Studio XE 2015 Composer Edition Checksums

Contents
File: w_ccompxe_online_2015.4.221.exe
Online installer

File: w_ccompxe_2015.4.221.exe
Product for developing 32-bit and 64-bit applications

File: w_ccompxe_redist_msi_2015.4.221.zip
Redistributable Libraries for 32-bit and 64-bit msi files

File: get-ipp-8.2-crypto-library.htm
Cryptography Library

↧

Unresolved references in MSVCRT.lib with Visual Studio 2015 RC

May 20, 2015, 4:48 am

I installed Intel Parallel Studio XE 2016 Beta Update 1 with Visual Studio Community 2015 RC and I'm getting unresolved references in MSVCRT.lib when I try to build a default Win32 console project in x64 mode:

1>ipo: : warning #11021: unresolved __vcrt_initialize
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved __vcrt_uninitialize
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved __vcrt_uninitialize_critical
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved __vcrt_thread_attach
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved __vcrt_thread_detach
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved _is_c_termination_complete
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved __acrt_initialize
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved __acrt_uninitialize
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved __acrt_uninitialize_critical
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved __acrt_thread_attach
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : warning #11021: unresolved __acrt_thread_detach
1> Referenced in MSVCRT.lib(utility.obj)
1>ipo: : error #11023: Not all components required for linking are present on command line

All project settings are the defaults set by the project wizard. I installed only the 64-bit target tools.

↧

_mm_unpackhi_epi8 and _mm_unpacklo_epi8 to convert 16 signed chars into 2 signed short vectors

May 20, 2015, 7:18 am

I am using _mm_unpacklo_epi16 and _mm_unpackhi_epi16 with a second argument vector of zeros to convert signed/unsigned short vectors into 2 signed/unsigned integer vectors, i.e.:

__m128i lowVec = _mm_unpacklo_epi16(vecA, vec0);
__m128i highVec = _mm_unpackhi_epi16(vecA, vec0);

This also works fine for converting a vector of 16 unsigned chars into 2 unsigned short vectors using _mm_unpacklo_epi8 and _mm_unpackhi_epi8. However, when the input vector holds 16 signed chars, the short values in the result vectors are all 127 + the original values.

I found a way to overcome this by adding 127 before the unpack and subtracting 127 immediately after it, yet this is very inelegant.

Another way was to use _mm_cvtepi8_epi16 and shift operations to get the wanted values, but this was less elegant than the previous add/sub and the performance was worse.

According to the documentation of _mm_unpacklo_epi8 and _mm_unpackhi_epi8, there was not supposed to be any problem with signed chars...

↧

Parallelization of dyadic product

May 22, 2015, 4:41 am

Hi,

I have two vectors (they may be the same vector) and I need to compute the product x[i]*y[j] for i,j=1..n.

What is the best way to perform this operation in parallel? I've tried

cilk_for(h=0;h<n*n;h++)r[h]=x[h/n]*y[h%n];

but I guess it is only a naive attempt. Indeed, vec-report says it is inefficient.

Thanks.

Fabio

↧

Memory leak caused or worsened by /Qipo?

May 23, 2015, 7:20 pm

I've built a DLL that I compile with /Qipo (Intel C++ Composer XE 2015). If I call the constructor and destructor of its main class, the memory doesn't get released, and after a few calls (32-bit mode) I'm out of memory. However, if I disable /Qipo, there doesn't seem to be a problem at all (I will run it for a longer period tonight, but I let it construct and destroy the object 1024 times earlier tonight and didn't notice an increase in memory usage).

If I use /Qip mode, the leak is 8 MB per call. With /Qipo it's about 300 MB.

I have checked an .EXE version of my software with Inspector XE 2015 and it doesn't report any leaks, and neither does the debugger.

Any clues to help me find the cause would be helpful. If I have a memory leak somewhere in my code, could that somehow cause all of the memory to leak when using IPO?

↧

Possible compiler bug

May 26, 2015, 1:11 am

There's a possible bug in the icc installed with Composer XE 2013 SP1 Update 5 (2013.1.5.239). The compiler compiles the code, but the result crashes at run time.

Below is a C++ program that reproduces the crash. When executed, it leads to this error:

"Run-Time Check Failure #2 - Stack around the variable 'os_.1016' was corrupted."

//------------------------------------------------
#include <new>

struct base
{
  virtual ~base() {}
};

struct test_type : virtual base
{
};

template<class T>
struct opt
{
  bool init_;
  char buffer_[sizeof(T)];

  opt() : init_(false) {}

  void recreate()
  {
    clear();
    construct();
  }

  void clear()
  {
    if (init_)
    {
      destroy();
    }
  }

  ~opt()
  {
    clear();
  }

  T& operator*()
  {
    return *address();
  }

private:
  void* raw_storage()
  {
    return &buffer_;
  }

  T* address()
  {
    return static_cast<T*>(raw_storage());
  }

  void construct()
  {
    ::new (raw_storage()) T();
    init_ = true;
  }

  void destroy()
  {
    address()->T::~T();
    init_ = false;
  }
};

void test()
{
  opt<test_type>os_;
  os_.recreate();
  os_.clear();
}

int main()
{
  test();
}
//------------------------------------------------

The program compiles and runs correctly with gcc, clang, and vc110. (Tried here: http://melpon.org/wandbox and in VS2012)

↧

proc_bind(spread) does not seem to be honored

May 26, 2015, 8:56 am

Hello Folks,

I have a program that is decomposed into two parts:
- One loop that allocates data: it does 4 iterations, one for each socket.
- One loop that does computation on the data: it does 48 iterations (each thread should work on a slice of data, hopefully a slice of data that is on the local socket).

My machine is a 4-socket Xeon machine with 12 cores per processor. I'm using ICC 15.0.1 20141023.

To have good scalability, I need to allocate data evenly across my processors.
To that end, I have found that "KMP_AFFINITY=scatter" does exactly what I need.
The problem is that this does not work well with my second loop, which does the computations. I'd like computations to occur on the socket where the data is allocated (kind of like the "owner computes" rule).

I thought that OpenMP 4.0's proc_bind(spread) for the first loop, then proc_bind(close) would allow me to have threads where I need them to be, but from my experience, and checking with "sched_getcpu()" to see where a thread is running, using "proc_bind(spread)" or not doesn't make any difference, while "KMP_AFFINITY=scatter" does exactly what I need.

Questions:
Am I right to assume that proc_bind(spread) should do the same thing as "KMP_AFFINITY=scatter"?

Do you think I'm going the wrong way trying to use OpenMP to pin my threads in a custom manner for my two loop nests?

I hope you can help me with this,

Best regards,

Guillaume

↧

CPU2006 compile issues with MSVC 2013, ICC XE 2015 rev 3, windows server 2012

May 28, 2015, 3:03 pm

ICL Version 12.0.3.208

I have problems with two CPU2006 benchmarks.

483.xalancbmk dies in execution if I compile with -O3 -ipo; it works fine with -O2

453.povray says:

file defaultrenderfrontend.cpp

error "<mathimf.h> is incompatible with system <math.h>!"

↧

Debug symbols always appear in the binary without debug enabled

May 30, 2015, 7:37 am

Hello,

My ICC version is intel_parallel_studio_xe_2015_update1 (trial version).

I used the following commands to compile:

icc -w -fpermissive -fPIE -I. -DMKL_ILP64 -DLINUX -std=c++11 -g0 -O3 -c xx.cpp -o /tmp/xx.o
xiar rcs /tmp/xx.a /tmp/xx.o

and the following command to link:

icc /tmp/release/*.o -o release/yy -L/opt/intel/composer_xe_2015.3.187/lib/intel64 -L/opt/intel/composer_xe_2015.3.187/mkl/lib/intel64 -Wl,--start-group /opt/intel/composer_xe_2015.3.187/mkl/lib/intel64/libmkl_core.a /opt/intel/composer_xe_2015.3.187/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2015.3.187/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2015.3.187/compiler/lib/intel64/libiomp5.a /opt/intel/composer_xe_2015.3.187/compiler/lib/intel64/libirc.a -Wl,--end-group -L. -Bdynamic -lm -Bstatic -pthread -lstdc++ -ldb -lsqlite3

When I run objdump -t yy|grep debug on the resulting binary yy, it shows:

0000000000000000 l d .debug_aranges   0000000000000000 .debug_aranges
0000000000000000 l d .debug_info   0000000000000000 .debug_info
0000000000000000 l d .debug_abbrev   0000000000000000 .debug_abbrev
0000000000000000 l d .debug_line   0000000000000000 .debug_line
0000000000000000 l d .debug_str   0000000000000000 .debug_str
0000000000000000 l d .debug_loc   0000000000000000 .debug_loc
0000000000000000 l d .debug_ranges   0000000000000000 .debug_ranges

So I think it is a debug build, but I actually compiled a release build.

My gcc is 4.9.2, compiled from source and used instead of the system default gcc (4.4); the OS is CentOS 6.6.

2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.9.2/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../gcc-4.9.2/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++ --disable-dssi --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --disable-multilib
Thread model: posix
gcc version 4.9.2 (GCC)

When I compile with gcc, there is no such issue. The binary built with gcc is about 5.2M, and the one built with icc is 11M.

Could you please advise a solution?

Thanks,

yixuan

↧

Compiler bug in XE 2015: error : no instance of function template "..." matches the argument list

June 1, 2015, 1:23 am

Hi,

the following code:

#include <tuple>

struct Foo {
	std::tuple<int> inner;
	template <unsigned Idx>
	auto get() const -> decltype(std::get<Idx>(inner)) { return std::get<Idx>(inner); }
};

int main()
{
	Foo f;
	f.get<0>();
}

produces the following error:

1>main.cpp(12): error : no instance of function template "Foo::get" matches the argument list
1>              object type is: Foo
1>    	f.get<0>();
1>    	  ^

It seems as if the compiler is not able to deduce the trailing return type.

Best regards,
Manuel Pöter

↧

Memory latency numbers

June 1, 2015, 5:44 am

Hi,

I've been working on a talk about cache effects. My main reference is "What every programmer should know about memory". I've been working on a small benchmark that shows the different levels of cache by walking an array of objects in linear or randomized order. The 3 levels of cache are quite obvious when walking in randomized order, and I measure about one cache line every 300 clock cycles on a Xeon 2xxx once the program has to go to main memory.

The paper "What every programmer should know about memory" states that this is more than the main memory latency, because the hardware needs to translate virtual addresses to physical addresses and you might be slowed down by the TLB.

What surprises me is that some people claim very accurate numbers for L1/L2/L3/main memory access. So here are my questions:

- What does the memory latency depend on? CPU / motherboard / RAM?

- Does Intel publish measurements of memory latency?

- How do people measure this latency? If the author of "What every programmer should know about memory" claims that 300 cycles is more than the memory latency, he must have a way of knowing this latency. Unfortunately, he does not give any hint on how to get/measure it.

↧

Which parallelization library to use for realtime processing?

June 2, 2015, 4:50 pm

I'm developing real-time audio processing software. There may be several (for example even 100) processors at each moment, in several parallel chains. I cannot let the processors cooperate and must assume any possible sequence of processing. Each of them receives a block of data, usually 256-1024 values, and needs to process it as quickly as possible, so that the results can be passed to the next item in the chain. If the data is not delivered in time, bad things happen... But in many cases only a few processors are in use, and then the goal is to keep overall CPU usage minimal. The algorithms in each processor vary a lot, so it is hard to predict anything.

The "host" for all these processors is unknown and usually implements some kind of parallelization as well. In my large test project, however, the host was reporting "near trouble" CPU usage while the system task manager reported just about 14% CPU usage on my 8-core Xeon E5, so evidently there's a lot of spare processing power.

From what I know these are the choices:

1) TBB - this one looks harder to use.

2) CILK

3) OpenMP - I actually tested this one via MSVC and sadly its worker threads seemed to wait actively (busy-wait), which means that the CPU was at 100% despite a pretty small improvement in performance.

I'd prefer if the solution could be linked statically. All of the processor implementations will be present in a single DLL (Windows) / dylib (OSX).

Any recommendations?

↧

integer constant as template parameter problem

June 3, 2015, 1:43 am

Compiling pcl-1.7.2 with Intel C++ compiler 15.0.4.221 Build 20150407 on Windows failed with errors related to Eigen-3.2.4.
I was able to reduce the problem to the following code snippet. It uses an integer constant as a parameter of a template type. This type is used as the return type of a member function of a class template. The Intel C++ compiler reports a mismatch between the declaration and the definition of this member function, but the Microsoft compiler, gcc and I agree that there is no mismatch. Is this an Intel C++ compiler problem?
(For those interested: I found a workaround by defining the integer constant with enum { ... } instead of int const ...).

namespace MyNamespace {
  int const MyConstant=42; // this leads to error in line 19 with intel compiler
  // enum {MyConstant=42};      // ok with intel compiler
 
  template<typename A, int B> class MyRetVal
  {
  };


  template<typename A>
  class MyClass
  {
  public:
    MyRetVal<A,MyConstant> func();
  };
 

  template<typename A>
  MyRetVal<A,MyConstant> MyClass<A>::func() // <-- error with intel compiler
  {
    MyRetVal<A,MyConstant> m;
    return m;
  }
}


int main()
{
  MyNamespace::MyClass<float> a;
  a.func();
  return 0;
}

Here is the compiler error message:

intel_template.cpp
intel_template.cpp(19): error: declaration is incompatible with "MyNamespace::MyRetVal<A, MyNamespace::MyConstant> MyNamespace::MyClass<A>::func()" (declared at line 14)
MyRetVal<A,MyConstant> MyClass<A>::func() // <-- error with intel compiler

compilation aborted for intel_template.cpp (code 2)

↧

Intel® Parallel Studio XE 2015 Update 4 Professional Edition for Windows*

June 2, 2015, 2:58 pm

Intel® Parallel Studio XE 2015 Update 4 Professional Edition parallel software development suite combines Intel's C/C++ compiler and Fortran compiler; performance and parallel libraries; error checking, code robustness, and performance profiling tools into a single suite offering. This new product release includes:

- Intel® Parallel Studio XE 2015 Update 4 Composer Edition - includes Intel® Visual Fortran Compiler, Intel® C++ Compiler, Intel® Integrated Performance Primitives (Intel® IPP), Intel® Threading Building Blocks (Intel® TBB) and Intel® Math Kernel Library (Intel® MKL)
- Intel® Advisor XE 2015 Update 1
- Intel® Inspector XE 2015 Update 1
- Intel® VTune™ Amplifier XE 2015 Update 4
- Sample programs
- Documentation

New in this release:

Components updated to current versions

Note: For more information on the changes listed above, please read the individual component release notes.

See the previous release's ReadMe to see what was new in that release.

Resources

Intel® Parallel Studio XE (Click on desired product)
Intel® Parallel Studio XE 2015 Professional Edition Checksums

Contents
File: parallel_studio_xe_2015_update4_online_setup.exe
Online installer

File: parallel_studio_xe_2015_update4_setup.exe
Product for developing 32-bit and 64-bit applications

↧

Intel® Parallel Studio XE 2015 Update 4 Professional Edition for C++ Windows*

June 2, 2015, 3:16 pm

Intel® Parallel Studio XE 2015 Update 4 Professional Edition for C++ parallel software development suite combines Intel's C/C++ compiler; performance and parallel libraries; error checking, code robustness, and performance profiling tools into a single suite offering. This new product release includes:

- Intel® Parallel Studio XE 2015 Update 4 Composer Edition for C++ - includes Intel® C++ Compiler, Intel® Integrated Performance Primitives (Intel® IPP), Intel® Threading Building Blocks (Intel® TBB) and Intel® Math Kernel Library (Intel® MKL)
- Intel® Advisor XE 2015 Update 1
- Intel® Inspector XE 2015 Update 1
- Intel® VTune™ Amplifier XE 2015 Update 4
- Sample programs
- Documentation

New in this release:

Components updated to current versions

Note: For more information on the changes listed above, please read the individual component release notes.

See the previous release's ReadMe to see what was new in that release.

Resources

Intel® Parallel Studio XE (Click on desired product)
Intel® Parallel Studio XE 2015 Professional Edition Checksums

Contents
File: parallel_studio_xe_2015_update4_online_setup.exe
Online installer

File: parallel_studio_xe_2015_update4_setup.exe
Product for developing 32-bit and 64-bit applications

↧

Empty icl 2015 update 4 fix list

June 4, 2015, 2:30 pm

The fix list for Update 4 is empty (there is no information about the fixes in this update): https://software.intel.com/en-us/articles/intel-composer-xe-2015-compile...

↧

icc16: openmp declare reduction : internal error: 20000_0

June 5, 2015, 1:38 pm

Hey all,

I tried to use the OpenMP declare reduction pragma for an implementation of an array reduction. The code compiles fine using g++ 4.9 and produces correct results. However, using icpc-16.0.038 gives an internal error 20000_0. I attached a small code example.

#include <iostream>
#include <vector>
#include <complex>


template <const int n>
struct complex_array {

    complex_array() : element( n, std::complex<float>(0.0f,0.0f) ) {

    }

    std::vector< std::complex<float> > element;
};

template <const int n>
static inline void add_complex_struct( complex_array<n> &x, complex_array<n> const &y ) {

    for ( int i = 0; i < n; ++i ) {

        x.element[i] += y.element[i];
    }
}


#pragma omp declare reduction( complex_array_reduction : complex_array<10> : add_complex_struct<10>(omp_out, omp_in) )


int main() {

    complex_array<10> a_array, b_array;


    for ( int i = 0; i < 10; ++i ) {

        std::complex<float> cf( 1.0f+i, 1.5f+i );

        a_array.element[i] =  cf;
    }


    #pragma omp parallel for schedule(static) reduction( complex_array_reduction:b_array )
    for ( int i = 0; i < 100; ++i ) {

        add_complex_struct<10>( b_array, a_array );
    }


    for ( int i = 0; i < 10; ++i ) {

        std::cout << b_array.element[i] << std::endl;
    }
}

I used g++ -fopenmp and icpc -qopenmp to compile the code.

Thanks,

Patrick

↧

In-class initializer bug, destructors incorrectly called, compiler 2015

June 5, 2015, 1:50 pm

I've got a test case where I have a class with 3 subobjects (A, B and C), and the 2nd subobject B throws an exception during construction. As I understand C++, the compiler should unwind the construction of the enclosing class and destroy the 1st object A, but not the 2nd (B) or 3rd (C) objects.

What I see is that if I use "in-class initialization" of the first object A, then instead of the first object A getting destroyed, the 3rd object C gets destroyed. Of course it is VERY BAD to destroy an object that has not been constructed! If, for example, C was a std::unique_ptr<T>, it would probably trigger a segmentation violation when it tries to free a garbage pointer.

If I use old school "member initialization", then this problem doesn't happen.

I don't see this with gcc 4.8

Here's the code. The class D exposes the bug. The class E should have identical function, but it does not expose the bug.

#include <iostream>
#include <string>

using namespace std;


struct A {
    A(const string& x) : x_(x) { cout << "A::A()"<< (void*)this <<endl; }
    ~A() { cout << "A::~A() "<< (void*)this<< endl;}
    string x_;
};

struct B {
    B(const A& a)  { cout << "B::B()"<< endl; throw "dead"; }
    ~B() { cout << "B::~B()"<< endl;}
};

struct C {
    C()  { cout << "C::C()"<< endl; }
    ~C() { cout << "C::~C()"<< endl;}
};


struct D  {
    A a{"foo"}; // "new school In-class initialization"
    B b{a};
    C c;
    D() { cout <<"D::D()"<< endl; }
    ~D() { cout <<"D::~D()"<< endl; }
};

struct E {
    A a;
    B b;
    C c;
    E()
        :a{"foo"}  // "old school member initialization"
        ,b(a)
        { cout <<"E::E()"<< endl; }
    ~E() { cout <<"E::~E()"<< endl; }
};

int main()
{
   try {
       D d;
   }
   catch(...)
   {
       cout << "got exception"<< endl;
   }

   try {
       E e;
   }
   catch(...)
   {
       cout << "got exception"<< endl;
   }

   return 0;
}

Here is the output. I expect to see A constructed, B partially constructed and then throwing, then A destroyed, but that is not what I see for the D case.

$ icpc -std=c++11 test.cpp
$ ./a.out
A::A()0x7fffe0a5ee90
B::B()
C::~C()
got exception

A::A()0x7fffe0a5eea0
B::B()
A::~A() 0x7fffe0a5eea0
got exception

-- update --

The section of the standard that describes what should happen is 15.2.3

For an object of class type of any storage duration whose initialization or destruction is terminated by an exception, the destructor is invoked for each of the object’s fully constructed subobjects, that is, for each subobject for which the principal constructor (12.6.2) has completed execution and the destructor has not yet begun execution, except that in the case of destruction, the variant members of a union-like class are not destroyed. The subobjects are destroyed in the reverse order of the completion of their construction. Such destruction is sequenced before entering a handler of the function-try-block of the constructor or destructor, if any.

↧

cv-qualifier bug

June 5, 2015, 3:27 pm

The following code compiles fine with clang and g++ (-std=c++14) but fails with icc 2016:

#include <iostream>

template<class Derived>
struct ConstBase {
  template<bool T = true>
  int f() const {
    return 3;
  }
};

template<class Derived>
struct Base : ConstBase<Derived> {
  using ConstBase<Derived>::f;
  template<bool T = true>
  int f() {
    const Derived& derived = static_cast<const Derived&>(*this);
    return derived.f();
  }
};

struct A : Base<A> {
};

int main() {
  A a;
  std::cout << a.f() << "\n";
  return 0;
}

↧