Hi.
I'm struggling with a problem where code appears to be optimized out when using MKL and TBB on Windows.
I had posted it on the oneTBB forum, but found that it seems to be closely related to the compiler.
The original post is below.
Application that have MKL linking work correct Intermittently in Windows
Can you please clarify what you meant by "remove too much source code in windows?"
How do you build/compile your program and run it?
Hi.
"remove too much source code in windows" means that the source code has a TBB part, but the program runs as if that part were skipped.
After further testing, I found that the compiled program does sometimes work correctly: in those runs the TBB part executes, so the compiler is not removing it.
But usually the program fails to run the TBB part, as shown below.
PS C:\Users\user\Desktop\tbb\bin> ./test
PS C:\Users\user\Desktop\tbb\bin> ./test
PS C:\Users\user\Desktop\tbb\bin> ./test
PS C:\Users\user\Desktop\tbb\bin> ./test
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4402 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4864 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4418 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4014 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.435 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4246 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4774 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4221 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4319 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4685 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4153 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
input_node run successfully
input_node run successfully
1
input_node run successfully
2
input_node run successfully
3
input_node run successfully
input_node run successfully
5
input_node run successfully
6
input_node run successfully
7
input_node run successfully
8
input_node run successfully
input_node run successfully
input_node run successfully
input_node run successfully
12
input_node run successfully
12
input_node run successfully
input_node run successfully
14
input_node run successfully
15
input_node run successfully
16
input_node run successfully
input_node run successfully
18
input_node run successfully
19
20
20
20
20
20
process time : 38.7822 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4188 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.4306 ms
PS C:\Users\user\Desktop\tbb\bin>
The new source code follows.
<CMakeLists.txt>
set(CMAKE_C_COMPILER icx)
set(CMAKE_CXX_COMPILER icx)
set(CMAKE_CXX_FLAGS_DEBUG "/Od /arch:AVX2")
set(CMAKE_CXX_FLAGS_RELEASE "/O2 /arch:AVX2")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "/O2 /arch:AVX2")
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
    set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDebugDLL)
elseif(CMAKE_BUILD_TYPE STREQUAL "Release")
    set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDLL)
elseif(CMAKE_BUILD_TYPE STREQUAL "RelWithDebInfo")
    set(CMAKE_MSVC_RUNTIME_LIBRARY MultiThreadedDebugDLL)
endif()
cmake_minimum_required(VERSION 3.28.0)
project(test VERSION 0.0.0 LANGUAGES C CXX)
set(MKL_LINK dynamic)
set(MKL_THREADING sequential)
set(MKL_INTERFACE ilp64)
find_package(MKL CONFIG REQUIRED PATHS $ENV{MKLROOT})
find_package(TBB REQUIRED COMPONENTS tbb tbbmalloc)
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/bin)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/bin)
add_executable(test
    reproducer.cpp
)
set_target_properties(test PROPERTIES
    OUTPUT_NAME ${PROJECT_NAME}
)
target_link_libraries(test
    PRIVATE MKL::MKL
    PRIVATE TBB::tbb
)
<reproducer.cpp>
#define _USE_MATH_DEFINES
#include <mkl.h>
#include <oneapi/tbb/flow_graph.h>
#include <atomic>   // std::atomic
#include <chrono>   // std::chrono::high_resolution_clock
#include <cmath>
#include <cstdlib>  // rand
#include <cstring>
#include <iostream>
#ifdef _WIN32
#include <io.h>
#else
#include <unistd.h>
#endif
// Source body: hands out one frame (N complex samples) per call until F frames are sent.
class src_body{
    MKL_LONG m_F,m_f,m_N; // NOTE: m_f (current frame index) is not initialized by the constructor below
    MKL_Complex8* m_arr_all_frame;
public:
    src_body(MKL_LONG F,MKL_LONG N,MKL_Complex8* arr_all_frame):m_F(F),m_N(N),m_arr_all_frame(arr_all_frame){};
    MKL_Complex8* operator()(oneapi::tbb::flow_control& fc){
        if(m_f<m_F){
            std::cout<<"input_node run successfully"<<'\n';
            // copy frame m_f into a fresh buffer; the output node frees it
            MKL_Complex8* arr{(MKL_Complex8*)mkl_malloc(m_N*8,64)};
            memcpy(arr,m_arr_all_frame+m_N*m_f,m_N*8);
            m_f++;
            return arr;
        }else{
            fc.stop();
            return nullptr;
        }
    }
};
int main(){
    MKL_LONG N{1048576},F{20};
    MKL_Complex8* arr_all_frame{(MKL_Complex8*)mkl_malloc(N*F*8,64)};
    MKL_Complex8* arr_out{(MKL_Complex8*)mkl_malloc(N*F*8,64)};
    oneapi::tbb::flow::graph g;
    std::atomic<int> f_compute_node{},f_out_node{};
    // init test dataset
    for(MKL_LONG f=0;f<F;f++){
        for(MKL_LONG n=0;n<N;n++){
            arr_all_frame[f*N+n].real=sin(2*M_PI*10*f*n*1e-6);
            arr_all_frame[f*N+n].imag=0;
        }
    }
    // define node
    oneapi::tbb::flow::input_node<MKL_Complex8*> read_adc_node(g,src_body(F,N,arr_all_frame));
    oneapi::tbb::flow::function_node<MKL_Complex8*,MKL_Complex8*> compute_node(g,4,[&](MKL_Complex8* arr){
        // dummy workload of random length; the loop body has no observable effect
        int i_max=rand();
        for(int i=0;i<i_max;i++){
            float f=i+1;
        }
        f_compute_node++;
        // std::cout<<f_compute_node++<<std::endl; // monitor frame order1
        return arr;
    });
    oneapi::tbb::flow::function_node<MKL_Complex8*,int> output_node(g,1,[&](MKL_Complex8* arr){
        std::cout<<f_compute_node<<std::endl; // monitor frame order2 (Mistake: monitored here)
        memcpy(arr_out+N*f_out_node++,arr,N*8);
        mkl_free(arr);
        return 0;
    });
    // connect edge
    oneapi::tbb::flow::make_edge(read_adc_node,compute_node);
    oneapi::tbb::flow::make_edge(compute_node,output_node);
    // flow data
    std::chrono::high_resolution_clock::time_point t0{std::chrono::high_resolution_clock::now()};
    read_adc_node.activate();
    g.wait_for_all();
    std::chrono::duration<float,std::milli> d{std::chrono::high_resolution_clock::now()-t0};
    std::cout<<"process time : "<<d.count()<<" ms"<<std::endl;
    mkl_free(arr_out);
    mkl_free(arr_all_frame);
    return 0;
}
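A side note on the dummy loop in compute_node: because `float f=i+1;` has no observable effect, an optimizing build such as /O2 is free to drop the whole loop, so it may not simulate any work. The sketch below is one possible way to keep such a loop alive, by writing through a volatile variable. The helper name `busy_work` is hypothetical and does not appear in the original post.

```cpp
#include <cassert>

// Hypothetical helper (not part of the original reproducer).
// Runs `iterations` loop steps and returns the last value written.
inline float busy_work(int iterations) {
    assert(iterations >= 0);
    // Writes to a volatile object count as observable side effects,
    // so the compiler is not allowed to remove the loop.
    volatile float sink = 0.0f;
    for (int i = 0; i < iterations; ++i) {
        sink = static_cast<float>(i + 1);
    }
    return sink;
}
```

Inside the node body this could be called as `busy_work(rand())` in place of the bare loop. It only affects how long the node runs and is unrelated to whether the input node fires.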
The build commands are as follows. (The source code is in "C:/Users/user/Desktop/tbb".)
PS C:\Users\user\Desktop\tbb> cmake.exe -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE --no-warn-unused-cli -SC:/Users/user/Desktop/tbb -Bc:/Users/user/Desktop/tbb/build -G Ninja
Not searching for unused variables given on the command line.
-- The C compiler identification is IntelLLVM 2024.2.1 with MSVC-like command-line
-- The CXX compiler identification is IntelLLVM 2024.2.1 with MSVC-like command-line
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Intel/oneAPI/compiler/latest/bin/icx.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Intel/oneAPI/compiler/latest/bin/icx.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- MKL_VERSION: 2024.2.0
-- MKL_ROOT: C:/Program Files (x86)/Intel/oneAPI/mkl/latest
-- MKL_SYCL_ARCH: None, set to ` intel64` by default
-- MKL_ARCH: None, set to ` intel64` by default
-- MKL_SYCL_LINK: static
-- MKL_LINK: static
-- MKL_SYCL_INTERFACE_FULL: intel_ilp64
-- MKL_INTERFACE_FULL: intel_ilp64
-- MKL_SYCL_THREADING: sequential
-- MKL_THREADING: sequential
-- MKL_MPI: None, set to ` intelmpi` by default
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_scalapack_ilp64.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_cdft_core.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_intel_ilp64.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_sequential.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_core.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_blacs_intelmpi_ilp64.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_sycl.lib
-- Found Python: C:/Program Files/Python310/python.exe (found version "3.10.11") found components: Interpreter Development NumPy Development.Module Development.Embed
-- Configuring done (6.3s)
-- Generating done (0.0s)
-- Build files have been written to: C:/Users/user/Desktop/tbb/build
PS C:\Users\user\Desktop\tbb> cmake.exe --build c:/Users/user/Desktop/tbb/build --config Release --target all
[2/2] Linking CXX executable C:\Users\user\Desktop\tbb\bin\test.exe
PS C:\Users\user\Desktop\tbb> cd bin
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.8528 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.8555 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
process time : 0.9129 ms
PS C:\Users\user\Desktop\tbb\bin> ./test
input_node run successfully
input_node run successfully
1
input_node run successfully
2
input_node run successfully
3
input_node run successfully
input_node run successfully
5
input_node run successfully
6
input_node run successfully
7
input_node run successfully
8
input_node run successfully
input_node run successfully
10
input_node run successfully
11
input_node run successfully
input_node run successfully
13
input_node run successfully
14
input_node run successfully
15
input_node run successfully
input_node run successfully
17
input_node run successfully
18
input_node run successfully
19
20
20
20
20
20
process time : 38.6662 ms
PS C:\Users\user\Desktop\tbb\bin>
I found that the bug was in my source code.
My apologies for the confusion.
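The post does not say which bug was found. One candidate visible in reproducer.cpp is that the src_body constructor initializes m_F, m_N and m_arr_all_frame but never m_f, so the first `m_f<m_F` check reads an indeterminate value. That would be consistent with runs that sometimes emit all 20 frames and sometimes none, though the author does not confirm it. A minimal sketch of the fix, assuming that is the cause, is below. It uses stand-in types (`flow_control_stub`, `frame_source`, `count_frames`, all hypothetical names) so it builds without oneTBB or MKL.

```cpp
#include <cassert>

// Stand-in for oneapi::tbb::flow_control so the sketch builds without oneTBB.
struct flow_control_stub {
    bool stopped = false;
    void stop() { stopped = true; }
};

// Hypothetical, simplified version of src_body: emits frame indices
// instead of copying sample buffers.
class frame_source {
    long m_F;      // total number of frames to emit
    long m_f = 0;  // next frame index; the default member initializer
                   // gives it a defined starting value
public:
    explicit frame_source(long F) : m_F(F) { assert(F >= 0); }

    // Returns the next frame index, or calls fc.stop() and returns -1
    // once all frames have been emitted.
    long operator()(flow_control_stub& fc) {
        if (m_f < m_F) {
            return m_f++;
        }
        fc.stop();
        return -1;
    }
};

// Counts how many frames the source emits before it signals stop.
inline long count_frames(long F) {
    frame_source src(F);
    flow_control_stub fc;
    long emitted = 0;
    while (src(fc) >= 0) {
        ++emitted;
    }
    return emitted;
}
```

In the original class the equivalent change would be to declare `m_f` with an initial value of 0 or to add `m_f(0)` to the constructor's initializer list.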