Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Intel Threading Tools

I have a C++ (Visual Studio 2003) server process that accepts sockets, reads XML-RPC requests, processes the request, then returns an XML-RPC response. I am experimenting by threading it with OpenMP, and analyzing it with the Thread Checker and Thread Profiler VTune modules.

1) The OpenMP "parallel" section doesn't perform any better than a serial process. I know that if the code is CPU bound, multiple threads won't help, but this shouldn't be the case here. For each request, a connection is made to a SQL Server 2005 instance using ADO, and the data is formatted and returned. Performance is slightly slower with multiple threads (3-5 threads versus 1). All data returned is correct and no errors occur. Lock contention could be a factor, but the one "critical" block is pretty small. Any thoughts regarding what could be going on?

2) When I run either of the threading tools, the server fails badly. It doesn't seem to crash (most of the time), but the data is badly skewed. Console output is seriously interleaved and returned data is similarly corrupted. This is the exact same EXE module that runs without problems (other than slow multithreaded performance) when run outside of the threading tools. This seems pretty serious as I wouldn't expect runtime results to vary when invoked by the tools.

My development machine is a 3.2GHz HT P4 laptop with 1GB RAM running Windows XP Pro. I've tried the project in both VS2003 and 2005, compiled with the MS and the Intel compilers.

Help would be very much appreciated!
Something to add...

I read somewhere that OpenMP typically has two implementations -- one that essentially contains no-ops, and one with real parallelization (to make it easier to test performance without recompiles).

a) Is this true?
b) Is it possible that when I run and debug the application I'm using the no-op version, but when running under Threading Tools I'm using the "for real" version? This really makes no sense otherwise.

This problem seems to occur with either the Microsoft or Intel compiler. The common factor appears to be running it through the Threading Tools or not.


First, on a single processor with HT a favorable compute bound process might see 20% improvement in performance (10% to 30%).

Second, SQL Server might be serializing your requests.

Corruption should not occur. What you are interpreting as corruption may be perfectly reasonable output for what you are requesting of your program. Keep in mind that you may need to place your console output routine in a critical section to avoid interleaving of display within a record. Permit records to interleave, but not data within a record.

Jim Dempsey

Thanks for the response. To be clear, I don't care about the interleaved console output. It's just logging. The important data is what is returned on the socket. I'm not sure what "perfectly reasonable" would be, but it's not reasonable for my needs! A client connects to the socket, the middleware server (the code in question) connects to SQL Server 2005 (not set to single-user mode), retrieves a result set, formats it, and returns it on the same socket. The same EXE runs fine in the debugger or stand-alone, but fails when invoked by the threading tools. At that point, even a single request on a single thread returns corrupted data. Very frustrating!

I know that on a single-core box (even with HT) I won't see massive parallel performance, but I should see more requests processed per second when those requests are handled on separate threads: while one thread blocks on the database or socket, the other threads can get their work in.

So is there no way that you know of for the threading tools to cause things to run differently? I read somewhere about OpenMP