The physicist who wrote the module doesn't want to share its source code for some strange reason, so I don't have it, just a binary blob.
The whole package looks like this: the front-end (a GUI) talks to the modules, which talk to the DLL; the DLL sends configuration data to WinDriver, which passes it on to the FPGA. The configuration differs between modules. The FPGA card has a PLX PCI controller, and Jungo WinDriver is the driver for the card.
Since the physicist involved in the project doesn't want to bother with the accelerator card, he uses a dummy DLL that just returns an OK message.
When my client tested the software with the dummy DLL, everything worked all right: the bulk of the module, running on two quad-core processors, utilised 100% of four cores and about 30% of each of the remaining four.
But when they used the real DLL that calls WinDriver and the card, only one core was used, and at 100%. The call to WinDriver somehow messed with the OpenMP parallelization.
So I'm wondering whether the Windows XP scheduler is trying to squeeze everything that calls WinDriver onto the same processor for some reason, or whether there's some other explanation.
I know that there are API calls to set the processor affinity in Windows but I'm not familiar with them.
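For reference, a minimal sketch of the Win32 affinity calls in question. `SetProcessAffinityMask` and `SetThreadAffinityMask` are the real Win32 APIs; the helper names and mask values here are illustrative only, and the non-Windows stubs exist just so the sketch compiles anywhere:

```c
/* Sketch of the Win32 processor-affinity APIs (SetProcessAffinityMask,
 * SetThreadAffinityMask). The wrapper names and example masks are
 * illustrative, not from the thread above. On non-Windows builds the
 * wrappers are stubs so the sketch still compiles. */
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Restrict the whole process to the logical CPUs set in `mask`
   (bit 0 = CPU 0). Returns 0 on success, -1 on failure. */
int set_process_affinity(size_t mask)
{
#ifdef _WIN32
    return SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)mask) ? 0 : -1;
#else
    (void)mask;   /* stub off Windows */
    return 0;
#endif
}

/* Pin only the calling thread, e.g. the one that talks to WinDriver. */
int set_thread_affinity(size_t mask)
{
#ifdef _WIN32
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)mask) ? 0 : -1;
#else
    (void)mask;
    return 0;
#endif
}
```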
1. Front-end
2. Module1, Module2, Module3
3. DLL
4. WinDriver
5. Card with PLX controller and Xilinx FPGA
Platform:
Windows XP 64 Pro with Visual Studio 2005
Two Intel Xeon quad-core processors
The Fortran module is built on the same platform with the Intel Fortran Compiler 10.1.021 and the Intel OpenMP math libraries.
Any help would be greatly appreciated.
/Lars Malmqvist
Check tim18's suggestion first (use a critical section if calling from within a parallel region).
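A minimal C sketch of that critical-section approach, assuming the driver entry point is not thread-safe. `fake_driver_call` is a stand-in for the real WinDriver call, which is not shown in this thread; compile with `/openmp` (MSVC) or `-fopenmp`, and note that without OpenMP the pragmas are simply ignored and the loop runs serially:

```c
/* Sketch: serializing driver calls with an OpenMP critical section.
 * `fake_driver_call` stands in for the real (non-thread-safe) WinDriver
 * entry point. */
static int g_driver_calls = 0;   /* driver-side state, not thread-safe */

static void fake_driver_call(void)
{
    g_driver_calls++;            /* unprotected increment: must be serialized */
}

/* Each iteration does CPU work in parallel but enters the driver one
   thread at a time. Returns the number of driver calls made. */
int process_blocks(int nblocks)
{
    int i;
    g_driver_calls = 0;
    #pragma omp parallel for
    for (i = 0; i < nblocks; i++) {
        /* ... parallel CPU work for block i ... */
        #pragma omp critical(windriver)
        {
            fake_driver_call();  /* only one thread inside the driver at a time */
        }
    }
    return g_driver_calls;
}
```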
An alternative is to structure the code such that only one thread (usually the master) makes the call to the hardware driver (see !$OMP MASTER).
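A C sketch of that master-only pattern (the thread's module is Fortran, where the directive is `!$OMP MASTER`; `fake_fpga_configure` is a stand-in for the real DLL call). One detail worth noting: MASTER has no implied barrier, so the other threads need an explicit barrier before they rely on the driver's results:

```c
/* Sketch: only the master thread calls the driver; the others wait at a
 * barrier and then use the result. `fake_fpga_configure` stands in for
 * the real DLL/WinDriver call. */
static int g_fpga_ready = 0;

static void fake_fpga_configure(void)
{
    g_fpga_ready = 1;            /* pretend the card was configured */
}

/* Returns 1 if every thread observed the configured card. */
int run_with_master_driver_call(void)
{
    int ok = 1;
    #pragma omp parallel
    {
        #pragma omp master
        {
            fake_fpga_configure();   /* master thread only */
        }
        /* MASTER has no implied barrier, so wait here before the other
           threads touch the driver's results. */
        #pragma omp barrier
        if (!g_fpga_ready)
            ok = 0;
    }
    return ok;
}
```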
If your hardware has an initialization section where working memory buffers (or context/completion data) are specified, and if that data is local to the stack of the thread (usually the master) that performed the initialization, then, and only then, calls from multiple threads will result in threads other than the master using invalid buffer addresses (addresses not in their own stacks).
You might also find success with a structure like:
initialize FPGA
sequential processing
begin parallel region
CPU processing
end parallel region
FPGA processing
begin parallel region
CPU processing
end parallel region
...
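The phase outline above could be sketched in C as follows. The phase and item counts, the arrays of work, and `fake_fpga_step` (standing in for the driver call) are all illustrative; the point is simply that every FPGA call sits outside any parallel region:

```c
/* Sketch of the phased structure: FPGA work in sequential code, CPU work
 * in parallel regions. `fake_fpga_step` stands in for the driver call;
 * the counts are illustrative. */
#define NPHASES 4
#define NITEMS  1000

static int fpga_steps_done = 0;

static void fake_fpga_step(void)
{
    fpga_steps_done++;           /* called sequentially: no locking needed */
}

long run_phased(void)
{
    long total = 0;
    int phase, i;
    fpga_steps_done = 0;
    for (phase = 0; phase < NPHASES; phase++) {
        fake_fpga_step();        /* FPGA processing, outside any parallel region */

        #pragma omp parallel for reduction(+:total)
        for (i = 0; i < NITEMS; i++) {
            total += 1;          /* stand-in for the real CPU processing */
        }
    }
    return total;                /* NPHASES * NITEMS, with or without OpenMP */
}
```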
If you need (or want) to perform FPGA processing within parallel regions, and critical sections do not resolve the problem, then consider reworking your parallel control loops so that the master thread performs all the FPGA calls. This will be a bit more complicated programming, but it should be well within the skill level of a competent programmer.
Also, in lieu of performing the FPGA code on the OpenMP master thread, your application could spawn a non-OpenMP thread that performs all the FPGA processing by receiving requests (messages) from the OpenMP threads.
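A sketch of that dedicated-driver-thread idea, written here with POSIX threads for portability (on Windows XP you would use `_beginthreadex` plus an event or condition variable instead). The one-slot queue, `post_request`, and the summing "driver" are all hypothetical stand-ins; the real point is that exactly one non-OpenMP thread ever touches the driver:

```c
/* Sketch: OpenMP workers post requests to a queue; a single dedicated
 * (non-OpenMP) thread owns every call into the driver. POSIX threads
 * used as a portable stand-in for Win32 threads. */
#include <pthread.h>

#define QUEUE_STOP -1

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
static int q_item = 0;           /* one-slot queue: 0 means empty */
static int driver_sum = 0;       /* result accumulated by the driver thread */

/* The only thread that ever touches the (simulated) driver. */
static void *driver_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_item == 0)
            pthread_cond_wait(&q_cond, &q_lock);
        int item = q_item;
        q_item = 0;
        pthread_cond_signal(&q_cond);    /* slot is free again */
        pthread_mutex_unlock(&q_lock);

        if (item == QUEUE_STOP)
            break;
        driver_sum += item;      /* stand-in for the real WinDriver call */
    }
    return NULL;
}

/* Called by any worker thread to hand a request to the driver thread. */
void post_request(int item)
{
    pthread_mutex_lock(&q_lock);
    while (q_item != 0)
        pthread_cond_wait(&q_cond, &q_lock);
    q_item = item;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_lock);
}

/* Push requests 1..n through the driver thread; return the sum it saw. */
int run_driver_demo(int n)
{
    pthread_t tid;
    driver_sum = 0;
    pthread_create(&tid, NULL, driver_thread, NULL);
    for (int i = 1; i <= n; i++)
        post_request(i);
    post_request(QUEUE_STOP);
    pthread_join(tid, NULL);
    return driver_sum;
}
```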
Jim Dempsey
The reason this wasn't caught earlier is that the software originally ran on AMD processors, where for some reason the API calls work perfectly.
So again, many thanks!
/Lars Malmqvist