How to optimize a simple loop?

December 22, 2015, 4:13 am

Latest and popular articles on Intel Technologies

The loop is simple

void loop(int n, double* a, double const* b)
{
#pragma ivdep
    for (int i = 0; i < n; ++i, ++a, ++b)
        *a *= *b;
}

I am using the Intel C++ compiler and currently rely on #pragma ivdep for optimization. Is there any way to make the loop perform better, such as combining multicore execution with vectorization, or some other technique?
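
A minimal sketch of one way to combine threads and vector lanes, assuming a compiler with OpenMP 4.0 support (enabled with -qopenmp for icpc). The function name scale_in_place is made up for illustration. The loop is rewritten to index by i because the OpenMP loop form does not accept the extra ++a, ++b in the increment expression. Each iteration moves 24 bytes for a single multiplication, so the loop is probably limited by memory bandwidth and extra threads may stop helping early.

```cpp
#include <cassert>

// Element-wise a[i] *= b[i].
// Hypothetical helper; indexing by i replaces the pointer increments so the
// loop has the canonical form that OpenMP worksharing requires.
void scale_in_place(int n, double* a, const double* b)
{
    assert(n >= 0);
    // Threads split the iteration space and each thread's chunk is
    // vectorized. The simd part asserts that iterations are independent,
    // which is the role #pragma ivdep played in the original loop.
    // Without an OpenMP flag the pragma is ignored and the loop runs serially.
#pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        a[i] *= b[i];
}
```

For small n the cost of starting threads can outweigh the gain, so it is worth timing both variants on real sizes.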

↧

Evaluation license file?

December 22, 2015, 5:24 am

Hello to all,

We have a server with Parallel Studio XE that has no reliable network access (it's on a ship), so downloading 3.9 GB files is impossible.

The Parallel Studio XE licensing on the ship's computer was accidentally set up to look for a network license server that it cannot reach, so it complains and won't work.

To solve this problem temporarily, we wanted to transfer an evaluation license (license file only) to the ship so that Parallel Studio XE will work.

Unfortunately, installing Parallel Studio XE as an evaluation version does not seem to produce a license file that we could transfer to the ship.

Is there one?

Thanks

↧

xilink error

December 23, 2015, 4:12 am

Hi,

After a completely fresh installation of Visual Studio 2015 and Composer 2016 SP1 on Windows 10 Pro, I get the following xilink error:

C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Platforms\x64\PlatformToolsets\Intel C++ Compiler 16.0\Toolset.targets(1090,5): error MSB6006: "xilink.exe" .. Code -1073741701

What can I do?

Frank

↧

Please suggest me an intel CPU for best performance of my code

December 23, 2015, 4:46 am

My code is quite simple

void foo(int n, double* a, double* b, double *c, double*d, double* e, double* f, double* g)
{
    for (int i = 0; i < n; ++i)
    {
        a[i] = b[i] * a[i] + c[i] * (d[i] + e[i] + f[i] + g[i]);
    }
}

I want very good performance. Please suggest an Intel CPU that would give the best performance for my code. Is there any strategy for optimizing it with the Intel C++ compiler? Each iteration has 6 floating-point operations. Can you estimate the maximum FLOPS it can reach? Currently I get only about 3 GFLOPS on an i7. Thanks a lot for your suggestions!
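
A rough way to estimate the ceiling, assuming the arrays are too large to stay in cache: each iteration performs 6 floating-point operations but loads 7 doubles (a through g) and stores 1 (a), so it moves 64 bytes. Under that assumption the loop is bounded by memory bandwidth rather than by the arithmetic units. The helper below is a hypothetical sketch of that arithmetic, not a measurement.

```cpp
#include <cassert>

// Upper bound on GFLOP/s for foo() when all data streams from main memory.
// Hypothetical helper: the bandwidth argument is whatever the machine
// sustains, in GB/s.
double bandwidth_bound_gflops(double mem_bandwidth_gb_per_s)
{
    assert(mem_bandwidth_gb_per_s > 0.0);
    const double flops_per_iter = 6.0;                  // 2 mul + 4 add
    const double bytes_per_iter = 8.0 * sizeof(double); // 7 loads + 1 store
    return mem_bandwidth_gb_per_s * flops_per_iter / bytes_per_iter;
}
```

With an assumed bandwidth of 25.6 GB/s (an illustrative figure, not a value from the post) the bound comes out at about 2.4 GFLOP/s, the same order of magnitude as the roughly 3 GFLOPS reported. If that assumption holds, memory bandwidth would matter more than core count or clock speed when choosing a CPU.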

↧

icl error with msbuild

December 23, 2015, 8:23 am

After installing Intel V16 Update 1 and VS2015 Update 1, my build process is broken when msbuild (14.0.24720) is used.

icl complains "icl: : error : option '/Qopt-matmul' not supported" if "Enable Matrix Multiply Library Call" is set to any value other than "Default".

Building from inside VS or with devenv is not affected.

I have attached a simple test case to reproduce the problem.

Bug or feature?

(Background: the real application has 287 project files, and I don't want to change every project by hand. As a workaround I now build with devenv.)

Kind Regards

Steffen

Attachment	Size
Download sample.zip	147.92 KB

↧

Can't compile for x64 anymore after applying Parallel Studio 2016 update 1

December 26, 2015, 10:09 am

I was successfully building several projects with Intel C++ 16.0 Build 10250815 (installed with parallel_studio_xe_2016_setup.exe) for both Win32 and x64 target platforms.

After upgrading to the Intel C++ compiler 16.0 Build 20151021 (installed with parallel_studio_xe_2016_update1_setup.exe) I can't compile for x64 targets anymore (Win32 is all OK though).

I get this error message for all files I tried to compile for an x64 platform (as defined in the Configuration Manager):

>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\Platforms\x64\PlatformToolsets\Intel C++ Compiler 16.0\Microsoft.Cp.x64.Intel C++ Compiler 16.0.targets(344,5): error MSB6006: "icl.exe" exited with code -1073741819.

I'd be glad to know if there is a known fix or workaround for this issue. The only one I see at the moment is to revert to the previous version (Parallel Studio 2016 initial release).

↧

installation of trial version of latest version of c/c++ compiler

December 28, 2015, 3:46 am

How can I install the trial version of the Intel icc compiler on my machine?

↧

Intel C++ Compiler internal error: 04010002_15114

December 28, 2015, 3:58 am

Hi everyone,

I got an error while building the MPFR library (mpfr.org) with the Intel C++ Compiler.

Environment:
- Windows 10,
- Visual Studio 2015 Update 1 with integrated Intel Parallel Studio XE 2016 Update 1,
- MPIR-2.7.2 sources (http://mpir.org/mpir-2.7.2.zip),
- MPFR-3.2.0-dev-9769 sources (svn://scm.gforge.inria.fr/svn/mpfr/trunk),
- 'MPFR with Visual Studio 2015' solution (https://github.com/BrianGladman/mpfr).

The easiest way to reproduce this error is through the 'MPFR with Visual Studio 2015' solution, using Visual Studio with integrated Intel Parallel Studio (pre-compiled MPIR binaries are also needed). Building the Release configuration gives:

'error #10298: problem during post processing of parallel object compilation'
and
'error : 04010002_15114'

This affects all <shared,static>+<x86,x64> configurations whenever any of these optimization settings is used: Minimize Size (-O1), Maximize Speed (-O2), Highest Optimization (-O3), Full Optimization (-Ox). The only setting that allows MPFR to build successfully with the Intel Compiler is [Optimization] Disabled (-Od) (Visual Studio build logs are attached).

Remarks:
- the error also occurs with the combination Windows 7 + Visual Studio 2013 + Intel Parallel Studio XE 2015,
- MPFR builds without trouble from the 'MPFR with Visual Studio 2015' solution using the *Microsoft* compiler with Full Optimization (-Ox),
- in Visual Studio the error appears only when building the MPFR shared and static libraries directly (projects 'dll_mpfr' and 'lib_mpfr'); it does not appear when building the MPFR 'lib_tests' library, the tests, or the 'tuneup' utility in any <x86,x64>+<shared,static> configuration,
- in Visual Studio it does not appear when building any of the MPIR projects, including the shared and static libraries,
- after disabling Multi-processor Compilation (as recommended by Jim Dempsey), 'error #10298...' is gone, while '04010002_15114' remains,
- UPDATE1: 'Floating Point Model' was set to /fp:strict and 'Interprocedural Optimization' to No for all tests above.

For testing purposes MPFR was also built without Visual Studio, using Intel Compiler 16.0 Update 1 under MSYS2. With the '-Ox' option the build finished with the same error:

'C:\libMPFR-3.2.0-dev-9769\src\src\exceptions.c(285) (col. 16): internal error: 04010002_15114'

without *any* additional information. With '-Od', however, all shared and static builds finished successfully (logs are attached).

Remarks:
- on the command line with '-Ox', 'error: 04010002_15114' appeared in the first compilation unit of both the shared and the static MPFR library builds,
- the source of the trouble is *not* the 'MPFR with Visual Studio 2015' solution, because the same errors appeared under MSYS2.

The MPFR developers have concluded that the error described above is a bug in the Intel C++ Compiler. If so, how could it be fixed?

Regards,

Alexander

Attachment	Size
Download MPFR+MSYS2+IntelCompiler_logs.zip	46.13 KB
Download MPFR+VisualStudio+IntelCompiler_logs.zip	11.12 KB

↧

Can simd and omp parallel for work together for a loop?

December 29, 2015, 8:14 am

I have a loop. Can ivdep, simd and omp parallel for work together on it, like this:

#pragma ivdep
#pragma simd
#pragma omp parallel for
for(int i = 0; i < n; ++i)
{
    ... // code without data dependencies between iterations
}

Or might the compiler choose just one of simd or omp parallel?
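
When the work has two loop levels, one arrangement that is well defined in OpenMP 4.0 is to keep the two directives on separate loops: threads on the outer loop, vector lanes on the inner one. The sketch below assumes a row-major matrix and uses the standard `#pragma omp simd` rather than the Intel-specific `#pragma simd`; the function name add_one is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Adds 1.0 to every element of a rows x cols row-major matrix.
// Hypothetical example: outer loop shared among threads, inner loop vectorized.
void add_one(double* m, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
#pragma omp parallel for
    for (int r = 0; r < rows; ++r) {
        double* row = m + static_cast<std::size_t>(r) * cols;
        // Tells the compiler the iterations of this loop are independent.
#pragma omp simd
        for (int c = 0; c < cols; ++c)
            row[c] += 1.0;
    }
}
```

Compiled without an OpenMP option, both pragmas are ignored and the function still produces the same result serially.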

↧

What is the difference between __restrict and restrict?

December 30, 2015, 7:16 am

I tested both in my code and found that __restrict gives better performance than restrict when applied to pointers with the Intel C++ compiler. I use #pragma omp parallel for and #pragma ivdep for my loop. What is the difference between __restrict and restrict?
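
For context, `restrict` is a C99 keyword and not part of standard C++, whereas `__restrict` is a vendor extension that the Intel, GCC, Clang and Microsoft compilers accept in C++ code. As far as I know, the Intel compiler only treats plain `restrict` as a keyword in C++ when the -restrict (Linux) or /Qrestrict (Windows) option is given. The sketch below shows the extension spelling; add_arrays is a hypothetical name.

```cpp
#include <cassert>

// __restrict promises the compiler that out and in never refer to
// overlapping memory, so it may vectorize without runtime overlap checks.
// It is a compiler extension, not standard C++.
void add_arrays(int n, double* __restrict out, const double* __restrict in)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out[i] += in[i];
}
```

Calling such a function with overlapping arrays breaks the promise and the result is undefined, so the qualifier belongs only where the caller can guarantee separation.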

↧

Internal error: assertion failed at: "shared/cfe/edgcpfe/lower_name.c", line 9613

December 31, 2015, 7:46 am

Compiling the attached source yields an internal error in shared/cfe/edgcpfe/lower_name.c. The issue is not limited to pre-processed sources; I attached a pre-processed source only to make it easier for you to reproduce. If you prefer to reproduce this with normal C++ sources, I can give you instructions for compiling a patched Fruit (http://github.com/google/fruit) library.

This is the first time I have tried compiling Fruit with the Intel compiler, so I don't know whether it worked with previous versions of icc. The version I'm using is ICC 16.0.1 20151021 on Linux.

 $ icc -x c++-cpp-output -O2 -g -fPIC -std=c++11 -W -Wall -g -Werror include_test.i
/home/marco/projects/fruit/include/fruit/impl/meta/basics.h(27): internal error: assertion failed at: "shared/cfe/edgcpfe/lower_name.c", line 9613

  struct Type {
         ^

compilation aborted for /home/marco/projects/fruit/build/tests/include_test.i (code 4)

↧

Friend declaration for sibling inner class causes warning

December 31, 2015, 7:54 am

The following code causes a bogus warning:

 ~ > cat /tmp/main.cpp
template <typename T>
class Parent {
  class A {};
  class B {
    friend class Parent<T>::A;
  };
};
 ~ > icc -c /tmp/main.cpp
/tmp/main.cpp(5): warning #135: class template "Parent<T>" has no member "A"
      friend class Parent<T>::A;
                              ^

However it's perfectly legal C++ (AFAICT), and there are no warnings under either GCC 4.8.5 or Clang 3.7.0.

↧

Should we align for SIMD on modern x86?

January 1, 2016, 4:30 pm

Hi,

I've been investigating whether aligning arrays to the SIMD width is still useful on modern x86 CPUs. I finally found a piece of code that shows a difference on my computer (Core i7-4850HQ).

#include <iostream>
#include <chrono>
#include <mm_malloc.h>

int main(int argc, const char* argv[]) {
  const int n{8000};
  const int nb_loops{10000000};
  {
    char* a{new char[n]};
    char* b{new char[n]};
    char* c{new char[n]};
    for (int i = 0; i < n; ++i) {
      a[i] = 0;
      b[i] = 1;
      c[i] = 0;
    }

    auto start = std::chrono::high_resolution_clock::now();
    for (int k = 0; k < nb_loops; ++k) {
      for (int i = 0; i < n; ++i) {
        b[i] = a[i] + b[i] + c[i];
      }
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto time =
        std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)
            .count();

    std::cout << "Time unaligned: "<< time << " ns"<< std::endl;

    delete[] c;
    delete[] b;
    delete[] a;
  }

  {
    char* a{static_cast<char*>(_mm_malloc(n * sizeof(char), 32))};
    char* b{static_cast<char*>(_mm_malloc(n * sizeof(char), 32))};
    char* c{static_cast<char*>(_mm_malloc(n * sizeof(char), 32))};
    for (int i = 0; i < n; ++i) {
      a[i] = 0;
      b[i] = 1;
      c[i] = 0;
    }

    auto start = std::chrono::high_resolution_clock::now();
    for (int k = 0; k < nb_loops; ++k) {
#pragma omp simd aligned(a, b, c : 32)
      for (int i = 0; i < n; ++i) {
        b[i] = a[i] + b[i] + c[i];
      }
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto time =
        std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)
            .count();

    std::cout << "Time aligned: "<< time << " ns"<< std::endl;

    _mm_free(c);
    _mm_free(b);
    _mm_free(a);
  }

  return 0;
}

On my CPU, the unaligned version takes 1.57 s whereas the aligned version takes 1.36 s when compiled with

icpc -std=c++11 -O3 -xHost -ansi-alias -qopenmp main.cpp -o main

I would like to understand the reason for this difference. Here are the suspects:

1) Aligned SIMD loads and stores are faster than unaligned ones

2) Aligned SIMD data does not cross a cache line, which makes the memory transfer faster

3) The code for the aligned version has no loop peeling and is therefore much smaller, which makes the loop faster

Reason 1 does not seem to hold on modern CPUs. Reason 2 does not seem right either, because the aligned version loses its advantage over the first one when the alignment hint is removed. That is why I strongly suspect reason 3. To confirm it, it would be nice to have a compiled version without loop peeling and with aligned loads. Is there a way to do that?

If 3 is the reason for the better speed, why do we still have loop peeling on x86 hardware?

The Xeon Phi is another beast, as unaligned loads and stores are much slower than aligned ones. Is this penalty expected to vanish in future generations of Xeon Phi?

↧

Alignment of returned address from malloc()

January 3, 2016, 2:46 pm

Hi, guys,

I am using icc 15.0.2, which is compatible with gcc 4.4.7. Whenever I allocate memory with malloc, the returned address is aligned to 16 bytes. I know gcc's malloc provides this alignment on 64-bit processors. Does icc's malloc guarantee the same alignment? I think it affects the quality of vectorization, so I need to be sure that malloc under icc also provides it.
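
One way to check this directly, sketched under the assumption that malloc comes from the system C library rather than from the compiler (which, as far as I know, is the case for icc on Linux). The language standard only promises alignment suitable for any fundamental type, that is alignof(std::max_align_t); wider alignment such as 32 bytes for AVX needs an explicit aligned allocator like _mm_malloc. The helper name is_aligned is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// True when p is a multiple of `alignment` bytes.
// Hypothetical helper; alignment must be a non-zero power of two.
bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Calling is_aligned(ptr, 16) on a few malloc results shows what the runtime delivers on a given system, although an observed alignment is not the same thing as a documented guarantee.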

↧

Out-of-memory error when compiling a huge project for 32-bit Windows target for profile guided opts

January 4, 2016, 3:45 am

I have a huge project that needs lots of optimization, so I use profile-guided optimization, which doesn't really improve performance but shrinks the executables quite a bit (from 50 MB to 35 MB or so). First, it takes a very long time to build the instrumented executables that generate the profile database (or whatever it is called). My main problem, though, is that the 64-bit version works fine while the 32-bit version eventually (after hours of processing...) ends with an out-of-memory error, even though I have enough memory. So I assume the compiler for the 32-bit target is itself 32-bit (how else could the 64-bit compiler succeed?).

Is there a way to use the 64-bit compiler to produce 32-bit binaries? Like some parameter or something? I generally don't understand why selecting a different architecture is done by selecting a different compiler.

↧

What is "gcc compatibility mode" and how to disable it?

January 4, 2016, 5:20 am

Hello,

I am - so far - quite unsuccessful in using icpc 2016 while making boost 1.59.0. I did manage to compile a helloworld program BTW.

I am using Linux Mint 17.3

My question is: icpc shows

 ~ $ icpc -v
icpc version 16.0.0 (gcc version 3.2.0 compatibility)

What does the "compatibility mode" mean? It seems that when building bjam in order to install boost 1.59.0 I get errors because "gcc is too old", as they say on the boost mailing list. As a matter of fact, my gcc / g++ are version 4.8.4!

I didn't find anything useful when googling..

What's wrong? Thanks a lot in advance for any hint!

Andreas

↧

cmake does not find std::nullptr_t

January 4, 2016, 6:00 am

I'm trying to build a library with the Intel compiler using cmake. This library makes use of the std::nullptr_t introduced with C++11. So the cmake scripts check for '-std=c++11' which is present in the Intel 15 compiler.

However the std::nullptr_t is not supported by the system gcc (4.3.x) resulting in the following error message: "Your compiler supports the 'nullptr' keyword, but not the type std::nullptr_t.You are using an Intel compiler, where this error is typically caused by an outdated underlying system GCC."

When I load a newer gcc (5.1.0) using the modules environment the sample code compiles when I compile the extracted code by hand. But inside the cmake script it does not. The Intel compiler offers command line options for the gcc paths, but the '-cxxlibs=...' expects a base dir which includes bin, lib64, include and so on. The gcc 5.1.0 installed by the vendor however does not follow this scheme.

This is the test code from the cmake script:

#include <cstddef>

int main(void)
{
  std::nullptr_t npt = nullptr;
  return 0;
}

↧

Profile guided optimizations do not work at all on OSX

January 4, 2016, 8:51 am

I'm trying to use PGO on OSX. It works on Windows (64-bit only, though), but here I can generate a 64-bit executable for profiling, and when I then try to perform the profile-guided optimization the compiler returns this:

dyld: Library not loaded: @loader_path/libcilkrts.5.dylib
  Referenced from: /opt/intel/composer_xe_2015.3.187/bin/intel64/profmerge
  Reason: image not found
icpc: error #10106: Fatal error in /opt/intel/composer_xe_2015.3.187/bin/intel64/profmerge, terminated by trace trap

I tried reinstalling, but it didn't help. I'm using composer_xe_2015.3.187 on the newest OSX.

↧

xblas compile fails tests with intel compiler 2015.3.187

January 4, 2016, 10:13 am

FAILED dot : FAIL/TOTAL = 6/28

PASSED sum : FAIL/TOTAL = 0/4

FAILED axpby : FAIL/TOTAL = 4/12

FAILED waxpby : FAIL/TOTAL = 10/28

FAILED gemv : FAIL/TOTAL = 3/28

FAILED ge_sum_mv : FAIL/TOTAL = 13/28

FAILED gbmv : FAIL/TOTAL = 8/28

FAILED symv : FAIL/TOTAL = 15/28

FAILED spmv : FAIL/TOTAL = 13/28

FAILED sbmv : FAIL/TOTAL = 15/28

FAILED hemv : FAIL/TOTAL = 7/12

FAILED hpmv : FAIL/TOTAL = 6/12

FAILED hbmv : FAIL/TOTAL = 7/12

FAILED trmv : FAIL/TOTAL = 7/12

FAILED tpmv : FAIL/TOTAL = 8/12

FAILED trsv : FAIL/TOTAL = 8/12

FAILED tbsv : FAIL/TOTAL = 8/12

FAILED gemm : FAIL/TOTAL = 16/28

FAILED symm : FAIL/TOTAL = 14/28

FAILED hemm : FAIL/TOTAL = 7/12

FAILED gemv2 : FAIL/TOTAL = 16/28

FAILED symv2 : FAIL/TOTAL = 16/28

FAILED hemv2 : FAIL/TOTAL = 7/12

FAILED gbmv2 : FAIL/TOTAL = 16/28

↧

Installing Intel Parallel Studio

January 4, 2016, 5:03 pm

Hello,

I have been trying to install Intel Parallel Studio XE update 1, but when I put in my serial number I get the message:

"Registration failed for unknown reason. Please go to
www.intel.com/software/products/support/ for your support options."

I have tried looking through the different topics here, but whenever somebody has this issue the response is always "send a private message and we will help", and therefore I don't know the solution. When I go on the website, there is no online chat to ask a technical question. Is there anybody who can help, please?!

↧
