Intel® oneAPI Threading Building Blocks

Application linked with MKL works correctly only intermittently on Windows

Kohn-Sham

Hi.

 

The following source code runs well on Ubuntu with oneAPI 2024.0, but not on Windows with oneAPI 2024.1 or 2024.2.

The compiled app sometimes runs correctly, sometimes runs without executing the flow_graph, and sometimes does nothing at all. (When I run the app several times, it sometimes works.)

Changing the cmake build option "--config" to Debug or Release mitigated the bug, but it still occurs.

Setting the compiler optimization flag to "/Od" also mitigated the bug but did not completely resolve it.

 

<bug_tbb_mkl_problem_windows.cpp>

 

 

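#define _USE_MATH_DEFINES
#include <mkl.h>
#include <oneapi/tbb/flow_graph.h>
#include <atomic>   // std::atomic
#include <chrono>   // std::chrono::high_resolution_clock
#include <cmath>
#include <cstdint>  // uint64_t
#include <cstring>
#include <fstream>
#include <iostream>
#include <vector>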


class src_body{
    int m_f,m_N,m_F;
    MKL_Complex8* m_arr_in;

public:
    src_body(MKL_LONG N,MKL_LONG F,MKL_Complex8* arr_in):m_N(N),m_F(F),m_arr_in(arr_in){};
    MKL_Complex8* operator()(oneapi::tbb::flow_control& fc){
        if(m_f<m_F){
            MKL_Complex8*  arr{(MKL_Complex8*)mkl_malloc(m_N*8,64)};
            memcpy(arr,m_arr_in+m_N*m_f++,m_N*8);
            std::cout<<"run src_body"<<'\n';
            return arr;
        }else{
            fc.stop();
            return nullptr;
        }
    }
};

int main(){
    MKL_LONG            N{1048576},F{500};
    MKL_Complex8*       arr_in{(MKL_Complex8*)mkl_malloc(N*F*8,64)};
    MKL_Complex8*       arr_out{(MKL_Complex8*)mkl_malloc(N*F*8,64)}; 
    oneapi::tbb::flow::graph g;
    std::atomic<int>    f_fft_node{},f_out_node{};
    
    // init test dataset
    for(MKL_LONG f=0;f<F;f++){
        for(MKL_LONG n=0;n<N;n++){
            arr_in[f*N+n].real=sin(2*M_PI*10*f*n*1e-6);
            arr_in[f*N+n].imag=0;
        }
    }

    // define node
    oneapi::tbb::flow::input_node<MKL_Complex8*> input_node(g,src_body(N,F,arr_in));
    oneapi::tbb::flow::function_node<MKL_Complex8*,MKL_Complex8*> compute_node(g,2,[&](MKL_Complex8* arr){
        for(MKL_LONG n=0;n<N;n++){
            arr[n].real++;
            arr[n].imag++;
        }
        return arr;
    });
    oneapi::tbb::flow::function_node<MKL_Complex8*,uint64_t> output_node(g,1,[&](MKL_Complex8* arr){
        memcpy(arr_out+N*f_out_node,arr,N*8);
        mkl_free(arr);
        return f_out_node++;
    });

    // connect edge
    oneapi::tbb::flow::make_edge(input_node,compute_node);
    oneapi::tbb::flow::make_edge(compute_node,output_node);

    // flow data
    std::chrono::high_resolution_clock::time_point t0{std::chrono::high_resolution_clock::now()};
    input_node.activate();
    g.wait_for_all();
    std::chrono::duration<float,std::milli> d{std::chrono::high_resolution_clock::now()-t0};
    std::cout<<"process time : "<<d.count()<<" ms"<<std::endl;

    mkl_free(arr_out);
    mkl_free(arr_in);
    return 0;
}
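
One thing worth checking in the listing above: the src_body constructor initializes m_N, m_F and m_arr_in, but not the frame counter m_f, so operator() compares an indeterminate value against m_F. Reading an uninitialized int is undefined behavior in C++, and its effect can vary between runs, platforms, and optimization flags. Nobody in this thread has confirmed that this is the cause, so treat the following as a minimal sketch of the fix and not as a diagnosis. To keep it buildable without MKL or TBB, Complex8 and FlowControl are hypothetical stand-ins for MKL_Complex8 and oneapi::tbb::flow_control, and count_frames is a hypothetical helper that drives the body the way an input_node would.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MKL_Complex8 (two floats) so the sketch builds without MKL.
struct Complex8 { float real, imag; };

// Hypothetical stand-in for oneapi::tbb::flow_control.
struct FlowControl {
    bool stopped = false;
    void stop() { stopped = true; }
};

class src_body {
    std::size_t m_f = 0;                 // frame counter, explicitly starts at zero
    std::size_t m_N = 0;                 // samples per frame
    std::size_t m_F = 0;                 // number of frames
    const Complex8* m_arr_in = nullptr;  // non-owning pointer to the input data

public:
    src_body(std::size_t N, std::size_t F, const Complex8* arr_in)
        : m_N(N), m_F(F), m_arr_in(arr_in) {}

    // Returns a copy of the next frame, or an empty vector after signalling stop.
    std::vector<Complex8> operator()(FlowControl& fc) {
        if (m_f < m_F) {
            const Complex8* first = m_arr_in + m_N * m_f++;
            return std::vector<Complex8>(first, first + m_N);
        }
        fc.stop();
        return {};
    }
};

// Hypothetical driver: calls the body until it stops and counts the frames produced.
std::size_t count_frames(std::size_t N, std::size_t F) {
    std::vector<Complex8> input(N * F, Complex8{1.0f, 0.0f});
    src_body body(N, F, input.data());
    FlowControl fc;
    std::size_t produced = 0;
    while (true) {
        std::vector<Complex8> frame = body(fc);
        if (fc.stopped) break;
        ++produced;
    }
    return produced;
}
```

With the default member initializer in place, count_frames(4, 5) produces 5 frames on every run, because the counter always starts from zero. In the original class, the equivalent change would be giving m_f an initial value, either in its declaration or in the constructor's initializer list.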

 

 

 

<CMakeLists.txt>

 

 

set(CMAKE_C_COMPILER icx)
set(CMAKE_CXX_COMPILER icx)
set(CMAKE_CXX_FLAGS_DEBUG "/Od /arch:AVX2")
# set(CMAKE_CXX_FLAGS_RELEASE "/O2 /arch:AVX2")
set(CMAKE_CXX_FLAGS_RELEASE "/Od /arch:AVX2")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "/O2 /arch:AVX2")
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
    set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDebug)
    # set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDebugDLL) # not working
elseif(CMAKE_BUILD_TYPE STREQUAL "Release") # not working
    set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDebug)
    # set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreaded)
elseif(CMAKE_BUILD_TYPE STREQUAL "RelWithDebInfo") # not working
    set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDebug)
endif()

cmake_minimum_required(VERSION 3.28.0)
project(test VERSION 0.0.0 LANGUAGES C CXX)
set(MKL_LINK dynamic)
set(MKL_THREADING sequential)
set(MKL_INTERFACE ilp64)
find_package(MKL CONFIG REQUIRED PATHS $ENV{MKLROOT}) # flag setting reference : MKLConfig.cmake
find_package(TBB REQUIRED COMPONENTS tbb tbbmalloc)
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/bin)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/bin)


add_executable(test     
    bug_tbb_mkl_problem_windows.cpp
)
set_target_properties(test PROPERTIES 
    OUTPUT_NAME ${PROJECT_NAME}
)
target_link_libraries(test
    PRIVATE MKL::MKL
    PRIVATE TBB::tbb
)

 

 

2 Replies
Kohn-Sham

I tested the compiler optimization flags yesterday.

I found that the /Ot flag triggers this bug. /Os may do so as well.

The /O2 flag is equivalent to /Og /Oi /Ot /Oy /Ob2 /GF /Gy. With every one of these flags except /Ot, the code works correctly.

It looks like a compiler-related bug.

Mark_L_Intel
Moderator

Hello @Kohn-Sham, this issue may need to be moved to the compiler-related forum if it is a compiler-related bug.
