Intel® oneAPI DPC++/C++ Compiler

/Ot (/O2) flag removes too much of the source code on Windows

Kohn-Sham
New Contributor I

Hi.

I'm struggling with code that appears to be optimized out when using MKL and TBB on Windows.

I originally posted this on the oneTBB forum, but found that it is closely related to the compiler.

The original post is below.

Application that have MKL linking work correct Intermittently in Windows

Alex_Y_Intel
Moderator

Can you please clarify what you meant by "remove too much source code in windows?"

How do you build/compile your program and run it? 

Kohn-Sham
New Contributor I

Hi.

"Remove too much source code in windows" means that the source code contains a TBB part, but the program runs as if that part were skipped.

After further testing, I found that the compiled program does sometimes work correctly: in those runs the TBB part executes successfully, so the compiler doesn't remove it.

Usually, however, the program fails to run the TBB part, as shown below.


PS C:\Users\user\Desktop\tbb\bin> ./test
PS C:\Users\user\Desktop\tbb\bin> ./test
PS C:\Users\user\Desktop\tbb\bin> ./test
PS C:\Users\user\Desktop\tbb\bin> ./test
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4402 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4864 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4418 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4014 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.435 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4246 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4774 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4221 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4319 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4685 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4153 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
input_node run successfully
input_node run successfully
1
input_node run successfully
2
input_node run successfully
3
input_node run successfully
input_node run successfully
5
input_node run successfully
6
input_node run successfully
7
input_node run successfully
8
input_node run successfully
input_node run successfully
input_node run successfully
input_node run successfully
12
input_node run successfully
12
input_node run successfully
input_node run successfully
14
input_node run successfully
15
input_node run successfully
16
input_node run successfully
input_node run successfully
18
input_node run successfully
19
20
20
20
20
20
process time : 38.7822 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4188 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4306 ms
PS C:\Users\user\Desktop\tbb\bin>

 

The new source code follows.

 

<CMakeLists.txt>

 

set(CMAKE_C_COMPILER icx)
set(CMAKE_CXX_COMPILER icx)
set(CMAKE_CXX_FLAGS_DEBUG "/Od /arch:AVX2")
set(CMAKE_CXX_FLAGS_RELEASE "/O2 /arch:AVX2")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "/O2 /arch:AVX2")
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
    set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDebugDLL) 
elseif(CMAKE_BUILD_TYPE STREQUAL "Release")
    set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDLL)
elseif(CMAKE_BUILD_TYPE STREQUAL "RelWithDebInfo")
    set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDebugDLL)
endif()

cmake_minimum_required(VERSION 3.28.0)
project(test VERSION 0.0.0 LANGUAGES C CXX)
set(MKL_LINK dynamic)
set(MKL_THREADING sequential)
set(MKL_INTERFACE ilp64)
find_package(MKL CONFIG REQUIRED PATHS $ENV{MKLROOT})
find_package(TBB REQUIRED COMPONENTS tbb tbbmalloc)
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/bin)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/bin)


add_executable(test     
    reproducer.cpp
)
set_target_properties(test PROPERTIES 
    OUTPUT_NAME ${PROJECT_NAME}
)
target_link_libraries(test
    PRIVATE MKL::MKL
    PRIVATE TBB::tbb
)

 

 

 

<reproducer.cpp>

 

#define _USE_MATH_DEFINES
#include <mkl.h>
#include <oneapi/tbb/flow_graph.h>
#include <atomic>
#include <chrono>
#include <cmath>
#include <cstdlib>
#include <cstring>
#include <iostream>
#ifdef _WIN32
#include <io.h>
#else
#include <unistd.h>
#endif


class src_body{
    MKL_LONG m_F,m_f,m_N;
    MKL_Complex8* m_arr_all_frame;

public:
    src_body(MKL_LONG F,MKL_LONG N,MKL_Complex8* arr_all_frame):m_F(F),m_N(N),m_arr_all_frame(arr_all_frame){};
    MKL_Complex8* operator()(oneapi::tbb::flow_control& fc){
        if(m_f<m_F){
            std::cout<<"input_node run successfully"<<'\n';
            MKL_Complex8* arr{(MKL_Complex8*)mkl_malloc(m_N*8,64)};
            memcpy(arr,m_arr_all_frame+m_N*m_f,m_N*8);
            m_f++;
            return arr;
        }else{
            fc.stop();
            return nullptr;
        }
    }
};

int main(){
    MKL_LONG            N{1048576},F{20};
    MKL_Complex8*       arr_all_frame{(MKL_Complex8*)mkl_malloc(N*F*8,64)};
    MKL_Complex8*       arr_out{(MKL_Complex8*)mkl_malloc(N*F*8,64)};
    oneapi::tbb::flow::graph g;
    std::atomic<int>    f_compute_node{},f_out_node{};

    // init test dataset
    for(MKL_LONG f=0;f<F;f++){
        for(MKL_LONG n=0;n<N;n++){
            arr_all_frame[f*N+n].real=sin(2*M_PI*10*f*n*1e-6);
            arr_all_frame[f*N+n].imag=0;
        }
    }
    
    // define node
    oneapi::tbb::flow::input_node<MKL_Complex8*> read_adc_node(g,src_body(F,N,arr_all_frame));
    oneapi::tbb::flow::function_node<MKL_Complex8*,MKL_Complex8*> compute_node(g,4,[&](MKL_Complex8* arr){
        int i_max=rand();
        for(int i=0;i<i_max;i++){
            float f=i+1;
        }
        f_compute_node++;
        // std::cout<<f_compute_node++<<std::endl; // monitor frame order1
        return arr;
    });
    oneapi::tbb::flow::function_node<MKL_Complex8*,int> output_node(g,1,[&](MKL_Complex8* arr){
        std::cout<<f_compute_node<<std::endl; // monitor frame order2 (Mistake: monitored here)
        memcpy(arr_out+N*f_out_node++,arr,N*8);
        mkl_free(arr);
        return 0;
    });

    // connect edge
    oneapi::tbb::flow::make_edge(read_adc_node,compute_node);
    oneapi::tbb::flow::make_edge(compute_node,output_node);

    // flow data
    std::chrono::high_resolution_clock::time_point t0{std::chrono::high_resolution_clock::now()};
    read_adc_node.activate();
    g.wait_for_all();
    std::chrono::duration<float,std::milli> d{std::chrono::high_resolution_clock::now()-t0};
    std::cout<<"process time : "<<d.count()<<" ms"<<std::endl;
    
    mkl_free(arr_out);
    mkl_free(arr_all_frame);
    return 0;
}

 

 

 

The build commands follow (the source code is in "C:/Users/user/Desktop/tbb").

 

PS C:\Users\user\Desktop\tbb> cmake.exe -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE --no-warn-unused-cli -SC:/Users/user/Desktop/tbb -Bc:/Users/user/Desktop/tbb/build -G Ninja
Not searching for unused variables given on the command line.
-- The C compiler identification is IntelLLVM 2024.2.1 with MSVC-like command-line
-- The CXX compiler identification is IntelLLVM 2024.2.1 with MSVC-like command-line
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Intel/oneAPI/compiler/latest/bin/icx.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Intel/oneAPI/compiler/latest/bin/icx.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- MKL_VERSION: 2024.2.0
-- MKL_ROOT: C:/Program Files (x86)/Intel/oneAPI/mkl/latest
-- MKL_SYCL_ARCH: None, set to ` intel64` by default
-- MKL_ARCH: None, set to ` intel64` by default
-- MKL_SYCL_LINK: static
-- MKL_LINK: static
-- MKL_SYCL_INTERFACE_FULL: intel_ilp64
-- MKL_INTERFACE_FULL: intel_ilp64
-- MKL_SYCL_THREADING: sequential
-- MKL_THREADING: sequential
-- MKL_MPI: None, set to ` intelmpi` by default
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_scalapack_ilp64.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_cdft_core.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_intel_ilp64.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_sequential.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_core.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_blacs_intelmpi_ilp64.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_sycl.lib
-- Found Python: C:/Program Files/Python310/python.exe (found version "3.10.11") found components: Interpreter Development NumPy Development.Module Development.Embed
-- Configuring done (6.3s)
-- Generating done (0.0s)
-- Build files have been written to: C:/Users/user/Desktop/tbb/build
PS C:\Users\user\Desktop\tbb> cmake.exe --build c:/Users/user/Desktop/tbb/build --config Release --target all
[2/2] Linking CXX executable C:\Users\user\Desktop\tbb\bin\test.exe
PS C:\Users\user\Desktop\tbb> cd bin
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.8528 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.8555 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.9129 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
input_node run successfully
input_node run successfully
1
input_node run successfully
2
input_node run successfully
3
input_node run successfully
input_node run successfully
5
input_node run successfully
6
input_node run successfully
7
input_node run successfully
8
input_node run successfully
input_node run successfully
10
input_node run successfully
11
input_node run successfully
input_node run successfully
13
input_node run successfully
14
input_node run successfully
15
input_node run successfully
input_node run successfully
17
input_node run successfully
18
input_node run successfully
19
20
20
20
20
20
process time : 38.6662 ms
PS C:\Users\user\Desktop\tbb\bin>

Kohn-Sham
New Contributor I

I found that the bug was in my source code.

My apologies for the confusion.
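
The post does not say which bug was found. One candidate that is visible in the reproducer above: the constructor of src_body initializes m_F, m_N and m_arr_all_frame but never m_f, so the check m_f<m_F reads an indeterminate value. That would be consistent with the input node emitting frames only in some runs, although it may not be the bug the poster meant. A minimal sketch of the fix, assuming that diagnosis, using a hypothetical stand-in class frame_counter (not part of the original code) with TBB and MKL left out so that it compiles on its own:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for src_body from the reproducer: it only tracks
// which frame comes next and leaves out TBB and MKL entirely.
class frame_counter {
    std::int64_t m_F;     // total number of frames
    std::int64_t m_f{0};  // next frame index; the default member initializer
                          // guarantees it never starts out indeterminate
public:
    explicit frame_counter(std::int64_t F) : m_F(F) {
        assert(F >= 0);
    }

    // Writes the next frame index to `out` and returns true, or returns
    // false once every frame has been handed out (the point where
    // src_body would call fc.stop()).
    bool next(std::int64_t& out) {
        if (m_f < m_F) {
            out = m_f++;
            return true;
        }
        return false;
    }
};

// Counts how many frames a freshly constructed counter hands out.
inline std::int64_t count_frames(std::int64_t F) {
    frame_counter c(F);
    std::int64_t idx = 0;
    std::int64_t n = 0;
    while (c.next(idx)) {
        ++n;
    }
    return n;
}
```

In the reproducer itself, the equivalent change would be to declare m_f with an initializer (for example m_f{0}) or to add m_f(0) to the constructor's initializer list between m_F(F) and m_N(N).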
