Channel: Intel® C++-Compiler

Modified OS Loader


The modified OS loader makes it possible to save the memory occupied by the initialization data section and to remap read-only data back to a read-only segment.

Loader Support API

The compiler provides a special library routine that can be used to initialize big-endian data in any OS loader. This routine implements an internal mechanism to perform big-endian data initialization. The task of the modified OS loader is to make a proper call to that routine.

To save memory occupied by the data initialization section, the loader should either remove the entire data initialization section from the loadable segment or remove all items except an end marker from the section. In the former case, it is additionally required to set the __initdata_begin symbol to zero.

The routine for initializing big-endian data is defined as follows:

Name:

biendian_datainit - initialize big-endian data

Syntax:

#include <loader/bedatainit.h>

int biendian_datainit(void *initdata);

Description:

The biendian_datainit API (routine) performs initialization of big-endian data for applications or shared libraries. You must load and relocate the application or shared library image in a writable memory segment prior to the big-endian data initialization. Apply any write protection on the application image only after performing data initialization.

The initdata argument is a pointer to the preloaded and relocated contents of the initdata executable and linking format (ELF) section for the application or shared library that is being initialized.

Return value:

Upon successful completion, 0 is returned. On error, biendian_datainit returns a non-zero error code.

Errors:

A complete list of errors has not yet been defined.

Prototype:

The initialization routine prototype is declared in:

<install-dir>/include/loader/bedatainit.h

Static library:

The static library libbedatainit.a containing this routine is located at:

<install-dir>/lib/libbedatainit.a


Declaring Structures with Bit Fields


If a structure contains a bit field, the bit field is allocated based on the endian convention of the bit field type. This is straightforward when the bit fields are of the same endian convention as shown below.

Big endian struct with bit fields:

/* All bit fields allocated using big endian conventions */

#pragma byte_order (push, bigendian)

struct SB01 {
int b1:16;
int b2:8;
int b3:16;
int b4:16;
} sb01;

#pragma byte_order pop

Little endian struct with bit fields:

/* All bit fields allocated using little endian conventions */

#pragma byte_order (push, littleendian)

struct SB02 {
int b1:16;
int b2:8;
int b3:16;
int b4:16;
} sb02;

#pragma byte_order pop

Note

The big endian and little endian bit fields are allocated differently in their containers. The big endian bit fields are allocated from a high to a low bit, while little endian bit fields are allocated from a low to a high bit. As a result, big endian and little endian bit fields allocated to the same container could potentially overwrite each other. Therefore, the compiler issues an error when it detects a struct containing both big endian and little endian bit fields.

Legal Information


INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R) PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS, COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: Learn About Intel® Processor Numbers (http://www.intel.com/products/processor_number).

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Intel, Intel Atom, Intel Core, Intel Cilk, Intel VTune, Itanium, MMX, Pentium, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

Copyright © 1996-2013, Intel Corporation. All rights reserved.

Portions Copyright © 2001, Hewlett-Packard Development Company, L.P.

User and Reference Guides for the Intel® C++ Compiler Standard Edition for Embedded Systems with Bi-Endian Technology v.14.0

Wic-pointer


Determines whether warnings are issued for conversions between pointers to distinct scalar types with the same representation.

Architectures

IA-32, Intel® 64 architectures

Syntax

-W[no-]ic-pointer

Arguments

None

Default

-Wic-pointer

The compiler issues warnings for conversions between pointers to distinct scalar types with the same representation.

Description

This option determines whether warnings are issued for conversions between pointers to distinct scalar types with the same representation.

For example, when compiling the following code, the compiler by default issues a warning due to the conversion from a pointer to int to a pointer to long:

 void f(int *p) { long *q = p; }

If long and int values have the same representation on the target platform, the warning can be suppressed by specifying the -Wno-ic-pointer option.

pe-dir-rule


Allows prolog and epilog files to be selected based on a directory search. This is a legacy option; consider using the -pe-(i|u)dir-rule instead.

Architectures

IA-32, Intel® 64 architectures

Syntax

-pe-dir-rule="<regular expression> <prolog file> <epilog file>"

Arguments

regular expression

Is a regular expression using POSIX Extended Regular Expression syntax. A specified prolog and epilog file will be applied to both header and source files in the directory matching the regular expression.

For example:

-pe-dir-rule="^/(proj1|proj2) /usr/include/be-prolog.h /usr/include/be-epilog.h"

prolog file

Is the name of a prolog file.

epilog file

Is the name of an epilog file.

Description

This option allows prolog and epilog files to be selected based on a directory search. More than one -pe-dir-rule option can be specified on a command line. If multiple -pe-dir-rule options are specified, they are processed in the order given until the first match.

This is a legacy option; consider using the -pe-(i|u)dir-rule instead.

See Also: Implicit Endian Usage Model

Alphabetical List of Compiler Options


Dynamic Loader Data Initialization


Game Companies Accelerate Development with Intel® Sample Code


Download PDF

Whether you are an independent game developer or a career game developer, you can benefit from code written by others. Freely licensed sample code is a useful tool for learning about new features, solving previously intractable problems, and saving time by not writing code from scratch. Intel offers a wealth of game sample code in the game developer section of the Intel® Developer Zone.

Intel Game Developer Community Code Samples
Figure 1. Intel provides game code samples at https://software.intel.com/en-us/gamedev/code-samples

Over the past few years, Intel has worked with many game developers to help them optimize their games' performance on Intel® hardware. We often arrive at important insights or build notable features that deserve to be shared with the world as sample code, and sometimes we create samples specifically to meet developers' needs. In recent years, samples we have created have been adapted for use in games published by Blizzard and Codemasters, including Adaptive Volumetric Shadow Maps (AVSM), Conservative Morphological Anti-Aliasing (CMAA), and Software Occlusion Culling.

AVSM Dramatically Improves Grid* 2

Codemasters and Intel engineers have collaborated on game development for many years. For Grid 2, Codemasters looked for ways to improve the game's visuals on Intel hardware. In discussions with Intel engineers, they decided to use the Intel PixelSync feature to add realism to smoke, seen as a high-visibility effect for a racing game in which cars kick up large smoke trails. The technique originated with Intel engineer Marco Salvi, who created an AVSM implementation using DirectX 11 and presented it at Siggraph 2010. The Intel sample code used atomic operations to achieve order-independent transparency (OIT). To bring the technique to Grid 2, Codemasters and Intel engineers worked together to modify the algorithm to use PixelSync so that it could run in bounded memory. The bounded-memory AVSM variant was also released as an Intel sample.

Codemasters and Intel engineers spent more than 14 days of on-site integration work. The initial implementation was considered complete when Codemasters had a working test layer that generated the AVSM texture from the game's own particle effects and applied self-shadowing. Once that was in place, Codemasters engineers extended the system so that animated textures better matched the game's look and feel, while Codemasters artists designed particle effects to complement the new technique, using many smaller particles rather than large billboards to represent the various smoke effects. Codemasters engineers found that the improved lighting drew attention to sorting issues in the additive-blended particle systems, which required rework to build a more reliable CPU particle sort.

With the effect looking good, the engineering team still had to hunt for problems that could occur in extreme cases. Because the game supports a player-controlled camera, the camera could get so close to a smoke effect that it filled the screen. This caused heavy overdraw that the AVSM sample could not handle. The team combined AVSM with a screen-space approach tessellated per tile, rather than per pixel and light, and the new approach successfully handled the heavy overdraw in the worst-case scenarios.

Intel sample code played several roles in solving this problem. The original research inspired Codemasters to add a new feature; Codemasters then modified the sample to better fit their game; and Intel updated and re-released the improved sample for other game developers to use.

Before and After Applying Intel AVSM Sample
Figure 2. Codemasters' Grid* 2 applies the Intel® AVSM sample for improved visuals

CMAA Makes World of Warcraft* Look Smoother

To match the striking visuals of the World of Warcraft expansion Warlords of Draenor*, Blizzard added a new anti-aliasing algorithm to the game's graphics options. CMAA, developed by Intel engineer Filip Strugar, is an image-based post-processing technique designed to provide fast, effective anti-aliasing on mainstream hardware. Because it runs on the final frame buffer, it applies anti-aliasing independently of other changes in the rendering pipeline. As some of World of Warcraft's newer deferred-rendering techniques show, this approach lets anti-aliasing be applied independently of the chosen shading model, giving developers more flexibility.

CMAA is also a simple algorithm to modify, and developers are free to adapt and enhance it for special purposes. World of Warcraft's 6.1 content patch includes another new anti-aliasing mode called SSAA 2x + CMAA, which pairs raw supersampling with post-process anti-aliasing by running CMAA on a 2x frame buffer object before downsampling to native resolution. The combination of algorithms offers the highest-fidelity anti-aliasing for advanced users.

Developers always weigh a technique against the cost of adopting it. For World of Warcraft, the decision to try CMAA was made easier because Intel provided a CMAA sample. The sample ships with a default test scene but also lets developers drop in their own images to preview the CMAA effect and measure the added cost, in milliseconds, for their resolution and use case, helping them make a mature, informed decision. The World of Warcraft team could load screenshots of the then-current Pandaren raid content into the sample and see how much edge smoothing that performance cost would buy.

Intel CMAA Sample on a Custom Image
Figure 3. The Intel® CMAA sample includes a default scene and also supports testing the effect on custom images

Once the decision was made to add CMAA to the game, some changes to the World of Warcraft engine were needed to support the DirectX 11 features CMAA uses. Although the technique slots in at the tail end of the rendering pipeline, the data still has to be prepared in a particular way. The algorithm needs a read-only view of the depth buffer, which means some engines may have to add an optional read-only flag to their texture and frame buffer objects. Some of its functionality and performance also depend on unordered access views (UAVs, also known as ImageBuffers). Many DirectX 11 engines already support UAVs, but others will need updates before they do. Beyond these additions, the sample's shader code could be reused almost wholesale, with only minor structural changes.

CMAA strikes an important balance between cost and minimal intrusion into the overall pipeline. It provides better image quality and stability than FXAA 3.8 at 90%-120% of its cost. Enhanced Subpixel Morphological Anti-Aliasing (SMAA) is another popular post-process anti-aliasing choice; its cheapest variant, SMAA 1x, provides more anti-aliasing and fewer overall artifacts, but causes more blurring and shape distortion and is more susceptible to small frame-to-frame changes (temporal instability), all at a cost 30%-120% higher than CMAA. An analysis of these algorithms by Leigh Davies and Filip Strugar is available on the Intel Developer Zone.

Unlike MSAA, CMAA's smoothing also applies to alpha-tested textures, giving the frame more complete anti-aliasing. Warlords of Draenor even showed that CMAA can be paired with SSAA, delivering anti-aliasing that looks better and more accurate than any other option. Running at 1600x900 on a 15W 4th generation Intel® Core™ processor, the algorithm stays under 3 milliseconds. In terms of algorithmic cost, it amounts to three passes at half resolution plus one final pass at native resolution.

Blizzard's World of Warcraft uses CMAA
Figure 4. With CMAA, Blizzard's World of Warcraft* runs smoothly on mainstream PCs

Software Occlusion Culling Reduces Unnecessary Rendering Work in World of Warcraft*

Another Intel sample that appealed to World of Warcraft is Software Occlusion Culling. By rendering only the objects the camera can actually see, rendering time drops substantially with almost no impact on the final image. Fabien Giesen wrote a multi-part blog series analyzing the Intel sample (which has since been updated), and Blizzard found it a good fit.

As usual, the sample code had to be rewritten to fit the game engine. Blizzard engineers took the sample's kernel and built the rest themselves. When first deployed in March 2013, the entire occlusion pass cost only 0.2-1.5 milliseconds. It has since become an indispensable part of the game.

These techniques have helped World of Warcraft keep running smoothly even as Blizzard enhances its visuals, delivering a great experience to players on mainstream hardware. The freedom to pick and choose from a full toolbox helps engineers uncover new opportunities, and things are already moving in the right direction.

Intel® Sample Code Is for Everyone

The Intel game sample code team works hard to meet game developers' needs, identifying real-world requirements and building effective implementations that any developer can use. The Intel® Code Samples license won't slow down your development or prevent you from shipping your game. The examples in this article show some of the ways sample code from the Intel Developer Zone can help improve a game's graphical fidelity and performance.

References

Code samples in the Intel Game Developer Community - https://software.intel.com/en-us/gamedev/code-samples

Adaptive Volumetric Shadow Maps - https://software.intel.com/en-us/blogs/2013/03/27/adaptive-volumetric-shadow-maps

Conservative Morphological Anti-Aliasing (CMAA) - March 2014 Update - https://software.intel.com/en-us/articles/conservative-morphological-anti-aliasing-cmaa-update

Edge-Detection-based Post-Processing in Warlords of Draenor (GDC presentation by Blizzard and Intel®) - https://software.intel.com/sites/default/files/managed/4a/38/Edge-Detection-based-Post-Processing-in-Warlords-of-Draenor.pdf

Engineers' Workshop: Engine Evolution in Warlords of Draenor - http://us.battle.net/wow/en/blog/15936285/

Intel® Code Sample License Agreement - https://software.intel.com/en-us/articles/code-samples-license-5/

OIT Approximation with Pixel Synchronization - https://software.intel.com/en-us/articles/oit-approximation-with-pixel-synchronization-update-2014

Software Occlusion Culling Update 2 - https://software.intel.com/en-us/blogs/2013/09/06/software-occlusion-culling-update-2

 

About the Authors

Brad Hill is a software engineer in Developer Relations at Intel. Brad researches new technologies on Intel hardware and shares best practices with software developers through the Intel® Developer Zone and at developer conferences. He also serves as engineering lead for student and indie hackathons, supporting code at top collegiate hackathons and game jams across the US.

John Hartwig is a software engineer in Developer Relations at Intel. John works with game developers on PC clients and Android mobile devices to implement optimizations and adopt unique hardware features. He joined Intel in 2010 as a graphics driver developer for GPGPU and media drivers. He makes art toys on the side and holds a bachelor's degree in game development from DePaul University.


    cross compile from Linux to Win


    Hello,

    Can the Linux ICC be used to produce executables that run on Windows? We are developing software that depends heavily on numerical computation, and we want to use the Intel compiler to boost our performance, but our budget currently allows for only one platform license.

    I have checked this post, and it states that this is not possible; however, it is a four-year-old post, and I was wondering if anything has changed. Also, should such a feature exist, would it retain the expected performance advantage compared to an executable compiled with MSVC or cross-compiled with MinGW from Linux to Windows?

    I have searched in SO and in this forum but I haven't found a specific answer myself.

    Thank you in advance,

    Kyr

    multimap support for unique_ptr


    I am trying to build a multimap with unique_ptr, and I am getting strange compilation errors on Linux.

    Example Code:

    #include <iostream>
    #include <memory>
    #include <map>
    
    using namespace std;
    
    class Event {
    public:
        Event (double time) : _time(time) {}
        double getTime () const { return _time; }
    private:
        Event (Event const & e);
        void operator= (Event const &e);
        double _time;
    };
    
    class Calendar {
    public:
        void addEvent (std::unique_ptr<Event> e) {
            double t = e->getTime(); // get time before trying the next line
            _events.insert(move(make_pair (t, move(e)))); // insert into multimap
        }
    private:
        multimap <double, unique_ptr<Event>> _events;
    };
    
    int main () {
        unique_ptr<Event> e (new Event(1.0));
    
        Calendar c;
        c.addEvent (move(e));
    }

    The above code compiles on OS X 10.10 with icpc 15.0.2 20150121.

    However, Linux is another story:

    % /opt/intel/bin/icpc --version && cat /etc/redhat-release && /opt/intel/bin/icpc main.cpp -std=c++11  -o main && ./main
    icpc (ICC) 15.0.2 20150121
    Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.

    CentOS release 6.6 (Final)
    /usr/include/c++/4.4.7/bits/stl_pair.h(73): error: function "std::unique_ptr<_Tp, _Tp_Deleter>::unique_ptr(const std::unique_ptr<_Tp, _Tp_Deleter> &) [with _Tp=Event, _Tp_Deleter=std::default_delete<Event>]" (declared at line 214 of "/usr/include/c++/4.4.7/bits/unique_ptr.h") cannot be referenced -- it is a deleted function
            _T2 second;                ///< @c second is a copy of the second object
                ^
              detected during:
                implicit generation of "std::pair<_T1, _T2>::pair(const std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>> &) [with _T1=const double, _T2=std::unique_ptr<Event, std::default_delete<Event>>]" at line 136 of "/usr/include/c++/4.4.7/bits/stl_tree.h"
                instantiation of class "std::pair<_T1, _T2> [with _T1=const double, _T2=std::unique_ptr<Event, std::default_delete<Event>>]" at line 136 of "/usr/include/c++/4.4.7/bits/stl_tree.h"
                instantiation of "std::_Rb_tree_node<_Val>::_Rb_tree_node(_Args &&...) [with _Val=std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>, _Args=<const std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>> &>]" at line 111 of "/usr/include/c++/4.4.7/ext/new_allocator.h"
                instantiation of "void __gnu_cxx::new_allocator<_Tp>::construct(__gnu_cxx::new_allocator<_Tp>::pointer, _Args &&...) [with _Tp=std::_Rb_tree_node<std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>>, _Args=<const std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>> &>]" at line 395 of "/usr/include/c++/4.4.7/bits/stl_tree.h"
                instantiation of "std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_Link_type std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_create_node(_Args &&...) [with _Key=double, _Val=std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>, _KeyOfValue=std::_Select1st<std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>>, _Compare=std::less<double>, _Alloc=std::allocator<std::pair<const double, std::unique_ptr<Event,
                          std::default_delete<Event>>>>, _Args=<const std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>> &>]" at line 881 of "/usr/include/c++/4.4.7/bits/stl_tree.h"
                instantiation of "std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_insert_(std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_Const_Base_ptr, std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_Const_Base_ptr, const std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::value_type &) [with _Key=double, _Val=std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>,
                          _KeyOfValue=std::_Select1st<std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>>, _Compare=std::less<double>, _Alloc=std::allocator<std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>>]" at line 1200 of "/usr/include/c++/4.4.7/bits/stl_tree.h"
                instantiation of "std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_insert_equal(const std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::value_type &) [with _Key=double, _Val=std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>, _KeyOfValue=std::_Select1st<std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>>, _Compare=std::less<double>,
                          _Alloc=std::allocator<std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>>]" at line 438 of "/usr/include/c++/4.4.7/bits/stl_multimap.h"
                instantiation of "std::multimap<_Key, _Tp, _Compare, _Alloc>::iterator std::multimap<_Key, _Tp, _Compare, _Alloc>::insert(const std::multimap<_Key, _Tp, _Compare, _Alloc>::value_type &) [with _Key=double, _Tp=std::unique_ptr<Event, std::default_delete<Event>>, _Compare=std::less<double>, _Alloc=std::allocator<std::pair<const double, std::unique_ptr<Event, std::default_delete<Event>>>>]" at line 21 of "main.cpp"

    compilation aborted for main.cpp (code 2)

    Clearly the pair is trying to copy, and this is triggering the error. I have tried this all kinds of different ways and still can't get it to compile on Linux. Suggestions? I would think icpc would support unique_ptr with multimap, right?

     

    thanks!

     

    x64 Merge Module


    Hi

    We have recently switched to the Intel compiler (Composer XE 2015), and we are trying to use WiX to create an installer for our product. We are currently having an issue with the x64 merge module (w_ccompxe_redist_intel64_2015.3.208.msm); WiX throws the following error:

    light.exe(0,0): error LGHT0204: ICE80: This 32BitComponent Comp_cilkrts20.dll.D9F09DDD_F3FE_427A_A63E_83D87E7D99CC uses 64BitDirectory compiler.D9F09DDD_F3FE_427A_A63E_83D87E7D99CC

    We have been using the Microsoft merge modules for several years and had no issues.

    Does anyone have any experience using the Intel x64 merge modules in WiX?

    Thanks

    Adrian

    Making programmers more productive


    One of the themes that ran through this year’s Intel Software Conference, in EMEA, was programmer productivity. The event took place in Seville in April and gave invited resellers and journalists an opportunity to learn more about Intel’s tools for high-performance computing (HPC), parallel programming, cross-platform development, and video processing.

    “Scaling is a big deal and power consumption is talked about a lot,” said James Reinders, chief evangelist of Intel Software products, opening the day. “But one challenge that isn’t talked about enough is programmer productivity.”

    This means not only making it easier for programmers to get things done, but also preserving their investment in skills and knowledge as the technology evolves. The Intel® Xeon Phi™ product family, for example, offers up to 61 processor cores and performs well only on parallel programs. Yet it still uses the same programming tools and models as Intel® Xeon® products, avoiding the need for programmers to learn a whole new technology. This is also why Intel works hard on standards compliance, together with other companies and standards bodies, to ensure that code is portable between architectures.

    Throughout the course of the event, there were opportunities to hear about several tools that can help to increase productivity. Laurent Duhem, senior application engineer, presented the new Intel® Advisor XE 2016 Beta for vectorization. This helps to identify where programs can use single instruction multiple data (SIMD) code, which can run the same calculation across a number of different data items simultaneously. The tool helps to ensure correctness by simulating vectorized loops and checking for any memory conflicts, and enables developers to more quickly identify where the program is spending most of its time (including the number of times a loop is called), so that these hot sections can be optimized. The tool offers hints for improving vectorization, and advice on where vectorization might be inefficient because of non-contiguous memory accesses. Vectorization is a difficult challenge, but this new tool provides guidance at each step to make it as easy as possible. You can download the beta now.

    Tackling the multiplatform challenge

    In the case of vectorization, productivity challenges might be said to arise from hardware complexity. In consumer software development, productivity is more likely to be challenged by the diverse range of operating systems, form factors, and processor architectures that make up the device landscape. Intel® Integrated Native Developer Experience (INDE) is a suite of tools that enables programmers to write fast C++ code that targets multiple operating systems and architectures, making it easier to ship applications more quickly. Alex Weggerle, technical consulting engineer, explained how it integrates with your existing developer environment and introduced its key features. For example, it includes Intel® Hardware Accelerated Execution Manager (Intel HAXM), which uses virtualization technology to run full-speed Android emulation, enabling developers to test a wide range of device sizes and types more quickly. The Graphics Frame Debugger eliminates the need to push updated OpenGL code to the target device for testing each time a change is made (a process of 5 to 10 minutes): instead, you can take a screenshot and instantly see any code changes applied to that screenshot. Alex also presented the Intel® XDK, a free HTML5 cross-platform development tool that includes templates to help you get started quickly, and the Apache Cordova* APIs to enable cross-platform access to phone hardware features.

    Parallel programming more effectively

    Intel® Parallel Studio XE 2016 Composer Edition is available now as an open beta. Heinz Bast, technical consulting engineer, introduced some of the new features in this tool suite, which is designed to support programmers as they develop parallel programs to make optimal use of the hardware. It offers improved vectorization using Intel® Cilk™ Plus and OpenMP* 4.0, with some features from the upcoming OpenMP* 4.1 already implemented. Reinders said that one of the things that excites him about OpenMP is that it helps obtain vectorization while leaving the actual code relatively intact, making it an efficient way to improve performance while keeping the code looking like the original science of the application. The new Intel Parallel Studio XE 2016 tool suite introduces loop blocking, so that data can be chunked for processing to avoid cache misses, and array reductions to avoid the bottleneck of turning off SIMD where there are data dependencies within a loop. The new annotated source listing inserts compiler diagnostics after the corresponding lines, making it easier to see what the compiler has done.

    Offloading to GPUs

    As more and more sophisticated graphics capabilities have been added to Intel® processors, they have become a key compute resource, with performance exceeding that of the CPU cores by up to 8 times. Heinz explained how work can be offloaded to the Intel® HD Graphics execution units using annotations in the Intel® C/C++ Compiler. He said a number of customers have been asking for this capability, and that Intel had chosen to support standards rather than building its own proprietary language extensions.

    Faster compilation

    There are some changes that make the compilers more time-efficient too. Intel® Fortran Compiler XE 2016 has been improved with the implementation of submodules. Previously, if you made changes to a module you had to recompile not just that module but also any other modules that call it. In a project of three million lines of code, that could cause a significant delay. With submodules, that’s no longer necessary as long as the interface between the submodule and other modules is unchanged. Intel® C/C++ Compiler 16.0 implements a number of compile time improvements, including disabling intrinsics for prototypes (which were rarely used, Bast said) by default.

    Accelerating video processing

    The conference’s final session considered a different challenge, the rise of video streaming and download. Starting with the 5th Generation Intel® Core™ processor, Intel has included hardware acceleration for video with functions built in that enable accelerated encoding and decoding of video. The Intel® Media SDK enables application developers to use those capabilities in their software, making it easier to make applications for visual analysis, media transcoding, and graphics in the cloud (including hosted desktops and cloud gaming). Intel® Media Server Studio can be used to generate random Intel® Stress Bitstreams for testing the architecture and also includes tools for analyzing, encoding and decoding video. As the resolution of video increases (4K is expected to be widespread by the time of the next World Cup in 2018), hardware-accelerated encoding and decoding will become increasingly important to deliver a good user experience.

    To find out more about Intel tools for software developers, visit the Intel Developer Zone.

     

    icc13 license


    Hi, 

    We have been using icc10 for building our binaries, and a few months back we switched to the icc13 compiler. The icc13 build time is longer than icc10's. Is this related to the license file? I have read somewhere that using a trial license will slow down the build, but we are not using the trial version; we are using the same license for icc13 that we used for icc10.
    Please share your inputs. 
    -regards, 
    Balaji

    Performance degradation due to Auto Vectorization


    Architecture:          x86_64 (Haswell with 6 cores)
    Compiler Version: icc 15.0
    Performance degradation occurs when compiling the code snippet below with auto-vectorization (-O2):

    #define N 200000
    void foo()
    {
    	__declspec(align(64)) int a[N];
    	int i,cnt=0;
    	for(cnt=0;cnt<1000000;cnt++)
            {
    		for(i = 2; i < N; i++)
    		{
    			a[i] = a[i-2] + 1;
    		}
    	}
    }

    Compilation method 1 with vectorization: icc -O2 <filename> -opt-report5
    Result: Time taken (3m 24 sec)
    The report says the loop above is getting vectorized with an estimated potential speedup of about 1.2.

    Compilation method 2 without vectorization: icc <filename> -opt-report5  -O2 -no-vec
    Result: Time taken (1m 08 sec)

    Why is auto-vectorization degrading performance even though the estimated potential speedup is 1.2?


    Internal error (C++14 with ICC 16)


    The (simplified) code:

    #include <iostream>
    #include <type_traits>
    using std::integral_constant;
    
    template <int NUM> struct Cl_Iterate {
      template <typename FUNC> static void Do (FUNC f) {
        Cl_Iterate<NUM-1>::Do(f);
        f(integral_constant<int, NUM>());
      }
    };
    
    template <> struct Cl_Iterate<0> {
      template <typename FUNC> static void Do (FUNC f)  {
          f(integral_constant<int,0>());
      }
    };
    
    template <int NUM, typename FUNC> void Iterate (FUNC f) {
      Cl_Iterate<NUM-1>::Do(f);
    }
    
    constexpr int N = 3;
    
    // Breaks compiler:
    // internal error: assertion failed at: "shared/cfe/edgcpfe/expr.c", line 31532
    int g() {
        int ii=0;
        Iterate<N> ( [&] (auto i) {
            Iterate<1+i()> ( [&] (auto j ) { ii++; });
          });
    }
    
    // Works
    int f() {
        int ii=0;
        Iterate<N> ( [&] (auto i) {
            int & kk = ii;
            Iterate<1+i()> ( [&] ( auto j ) { kk++; });
          });
    }
    
    // Works
    int h() {
        int ii=0;
        auto lambda = [&] ( auto j ) { ii++; };
        Iterate<N> ( [&] (auto i) {
            Iterate<1+i()> ( lambda );
          });
    }

    Note that in the original code a different assertion fails, and workarounds like those in f() and h() do not help.

    I attached the preprocessed file from the original code; compile it with
    icpc -openmp -std=c++1y -c l2hofe_preprocessed.cpp -o l2hofe.o

    The assertion produced by the original code:
    internal error: assertion failed at: "shared/cfe/edgcpfe/scope_stk.c", line 2025

    icpc --version gives:
    icpc (ICC) 16.0.0 20150501
    Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.

    I hope you can sort this out for the next (beta) update. We are eager to use this kind of two-dimensional compile-time loop generation.

    Regards,
    Matthias Hochsteger

    [bug report] Bug for template deduction?


    I am using the Intel C++ compiler 15.0 for Windows, and there is a simple case that does not compile.

    #include <iostream>
    
    class Foo {
    public:
        Foo() {}
        ~Foo() {}
        void test() const { std::cout << "Hello world!"<< std::endl; }
    };
    
    template<typename T> void test(const T& t) { t.test(); }
    
    template<typename T, void F(const T&)=test<T> >
    void bar(const T& t) {
        F(t);
    }
    
    int main() {
        Foo foo;
        bar<Foo>(foo);
        return 0;
    }

    It compiles successfully with other compilers, including MSVC, g++ (both Linux and Windows), clang++ (both Linux and Windows), and the Intel compiler on Linux (with the -std=c++11 option). On Windows, however, the Intel compiler (again with C++11 support) complains "error : no instance of function template "bar" matches the argument list", which does not make sense at all. Could anyone confirm this bug? Thanks in advance!

    Windows: CUDA 6.5 and Intel Compiler (2015)?


    I've been trying to find an authoritative answer for this, but everything I can find is a few years old.  Can the Intel compiler be used with CUDA 6.5 or 7.0 on Windows with Visual Studio 2013?

    I did manage to get it to work for a few hours, but then Visual Studio crashed hard and had to be repaired, and that broke it. 

    Is there support for using CUDA 6.5 or 7.0 with the Intel Compiler (2015)?

    Memory leak?


    Hi,

    I have a C++ application which is coded this way:

    • The main program itself does not need much memory (just a few variables), but it runs a loop in which we call a function.
    • This function needs about 140 MB of memory to run. The memory is allocated in the function and then released (using RAII).

    When I run this program overnight on OS X, here is the memory consumption reported by Activity Monitor (or top):

    • After the first loop, the program takes 150 MB of memory
    • After 68 loops, the program takes 220 MB of memory
    • After 394 loops, the program takes 480 MB of memory

    So it seems that the function, which allocates and deallocates 140 MB of memory, "leaks" about 1 MB each time it is called. In this function, the allocated objects are:

    • My own versions of std::vector, which I call il::Vector, il::Matrix, and il::Tensor. I have used these classes in other code and they seem fine.
    • A class that calls Pardiso from MKL. Using RAII, I make sure Pardiso's memory is properly released (using phase -1) before the class is destroyed.

    I have run Intel Pointer Checker (on a Linux workstation) and Clang's AddressSanitizer on the program (with smaller inputs, though), and neither detects anything. I don't really know what to do. Could memory fragmentation be responsible for this?

    simple vector addition


     

    Hello,

    I have a question about the following scenario on an Intel Sandy Bridge system.

    For simple vector addition code in C,

    If I allocate the arrays dynamically, the compiler vectorizes the main addition loop

        C[i] = A[i] + B[i]

    even though I do not use the restrict keyword anywhere (icc 13).

    But if I allocate the arrays statically, the compiler neither vectorizes the loop nor says anything about it in the vectorization report.

    Even when I align the arrays with __declspec(align), the loop does not vectorize.

    What could be the causes?

    Thanks in advance,

      Chaitali

     


