Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

How do I get icx to compile as fast as icc?

mochongli
Novice
2,997 Views

Code:

https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array

Compiler options:

/O3 /Ot /arch:CORE-AVX2

#include <algorithm>
#include <cstdlib>   // std::rand
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // !!! With this, the next loop runs faster.
    std::sort(data, data + arraySize);

    // Test
    clock_t start = clock();
    long long sum = 0;
    for (unsigned i = 0; i < 100000; ++i)
    {
        for (unsigned c = 0; c < arraySize; ++c)
        {   // Primary loop.
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << '\n';
    std::cout << "sum = " << sum << '\n';
}

icc: 0.02 s

icx: 0.22 s

 

https://stackoverflow.com/questions/75023152/what-compiler-commands-can-be-used-to-make-gcc-and-icc-compile-programs-as-fast

Here is my post on Stack Overflow. I now know that icx lacks many optimizations compared to icc.
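One way to make the timing less dependent on which optimizations a given compiler applies is to write the primary loop without a data-dependent branch. The following is only a sketch: the helper name `sum_at_least_128` is hypothetical and does not appear in the original post, and whether it closes the gap between icc and icx has not been measured here.

```cpp
#include <cstddef>

// Hypothetical helper (not from the original post): sums the elements
// that are >= 128 without a data-dependent branch in the loop body.
long long sum_at_least_128(const int* data, std::size_t n)
{
    long long sum = 0;
    for (std::size_t c = 0; c < n; ++c)
    {
        // The comparison yields 0 or 1; negating it gives an all-zeros or
        // all-ones mask, so the AND keeps data[c] or replaces it with 0.
        const int mask = -static_cast<int>(data[c] >= 128);
        sum += data[c] & mask;
    }
    return sum;
}
```

In the benchmark above, the inner loop body would be replaced by a call such as `sum += sum_at_least_128(data, arraySize);`.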
10 Replies
SeshaP_Intel
Moderator
2,953 Views

Hi,

 

Thank you for posting in Intel Communities.

Could you please add the /Ox optimization flag to the command? It enables maximum optimizations. Please refer to the link below for more details.

https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compiler-reference/compiler-options/optimization-options/ox.html

We are getting better performance with the Intel oneAPI C++ Compiler (icx) than with the Intel C++ Classic Compiler (icl/icc) when running your code.

Please refer to the screenshot below for the output.

[Screenshot: SeshaP_Intel_0-1673013836825.png]

 

Please let us know if you still face any issues.

 

Thanks and Regards,

Pendyala Sesha Srinivas

mochongli
Novice
2,947 Views

icx has far fewer compiler options than icc; for example, it is missing
/Qinline-factor-

mochongli
Novice
2,009 Views

Another comparison:

https://stackoverflow.com/questions/29186186/why-does-gcc-generate-a-faster-program-than-clang-in-this-recursive-fibonacci-co

#include <iostream>
#include <chrono>
using namespace std;

#define CHRONO_NOW                  chrono::high_resolution_clock::now()
#define CHRONO_DURATION(first,last) chrono::duration_cast<chrono::duration<double>>(last-first).count()

int fib(int n) {
    if (n<2) return n;
    return fib(n-1) + fib(n-2);
}

int main() {
    auto t0 = CHRONO_NOW;
    cout << fib(45) << endl;
    cout << CHRONO_DURATION(t0, CHRONO_NOW) << endl;
    return 0;
}


icc: 0.32 s
icx: 2.35 s
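The recursive `fib` above is mostly a test of how well a compiler optimizes the recursion itself. If only the value is needed, an iterative version avoids depending on that optimization at all. This is a minimal sketch: `fib_iterative` is a hypothetical name, not part of the original post, and it returns `long long` rather than `int` to leave headroom in the intermediate sum.

```cpp
// Hypothetical iterative variant (not from the original post).
// Computes the n-th Fibonacci number in O(n) additions with no recursion.
long long fib_iterative(int n)
{
    long long a = 0;  // fib(i)
    long long b = 1;  // fib(i + 1)
    for (int i = 0; i < n; ++i)
    {
        const long long next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

For the input used in the benchmark, `fib_iterative(45)` yields 1134903170, the same value the recursive version prints.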

Bernasek__Franz
2,838 Views

I have now compiled your code with icx and icpx using the -O3 option under Linux.

See the result.

SeshaP_Intel
Moderator
1,929 Views

Hi,

 

We were able to reproduce your issue. We are working on this issue internally.

We will get back to you soon.

 

Thanks and Regards,

Pendyala Sesha Srinivas

mochongli
Novice
1,643 Views
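#include <cstddef>    // std::size_t
#include <cstdint>    // std::uint8_t, std::uint32_t, std::uintptr_t
#include <iostream>
#include <string>
#include <Windows.h>
#include <nmmintrin.h> // _mm_crc32_u32 (SSE4.2)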

typedef struct _integrity_check
{
    struct section {
        std::uint8_t* name = {};
        void* address = {};
        std::uint32_t checksum = {};

        bool operator==(section& other)
        {
            return checksum == other.checksum;
        }
    }; section _cached;

    _integrity_check()
    {
        _cached = get_text_section(reinterpret_cast<std::uintptr_t>(GetModuleHandle(nullptr)));
    }

    std::uint32_t crc32(void* data, std::size_t size)
    {
        std::uint32_t result = {};

        for (std::size_t index = {}; index < size; ++index)
            result = _mm_crc32_u32(result, reinterpret_cast<std::uint8_t*>(data)[index]);

        return result;
    }

    section get_text_section(std::uintptr_t module)
    {
        section text_section = {};

        PIMAGE_DOS_HEADER dosheader = reinterpret_cast<PIMAGE_DOS_HEADER>(module);
        PIMAGE_NT_HEADERS nt_headers = reinterpret_cast<PIMAGE_NT_HEADERS>(module + dosheader->e_lfanew);

        PIMAGE_SECTION_HEADER section = IMAGE_FIRST_SECTION(nt_headers);

        for (int i = 0; i < nt_headers->FileHeader.NumberOfSections; i++, section++)
        {
            std::string name(reinterpret_cast<char const*>(section->Name));
            if (name != ".text")
                continue;

            void* address = reinterpret_cast<void*>(module + section->VirtualAddress);
            text_section = { section->Name, address, crc32(address, section->Misc.VirtualSize) };
        }
        return text_section;
    }
    /// <summary>
    /// Checks .text integrity.
    /// </summary>
    /// <returns>Returns true if it has been changed.</returns>
    bool check_integrity()
    {
        section section2 = get_text_section(reinterpret_cast<std::uintptr_t>(GetModuleHandle(nullptr)));
        return (!(_cached == section2));
    }
};
int main()  {
    _integrity_check check;

    for (;;)    {
        std::cout << std::boolalpha << check.check_integrity() << std::endl;
    }
}
mochongli
Novice
1,468 Views

https://godbolt.org/z/a81P4Tsb7
But in real-world projects, icx is often the fastest.

SeshaP_Intel
Moderator
590 Views

Hi,


Thank you for your patience. The issue you raised is targeted to be fixed in the oneAPI 2024.0 release, which will be available in the coming months.

The icpx compiler provides a good performance improvement over the icpc compiler.

If the issue persists with the new release, you can start a new discussion for the community to investigate.


Thanks and Regards,

Pendyala Sesha Srinivas

