Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
1696 Discussions

Looking for help defining a Threaded Programming Contest for 2009

ClayB
New Contributor I
2,601 Views
Since we had such a great response to the previous Threading Challenge Contest, we want to do another one or two or six. From that year-long contest that ended in September 2008 there are a few things we hope we can do better this time around. Some of the lessons learned include:

* Don't go on so long. Twelve months is a virtual programming marathon and I appreciate everyone that stuck with it. While it did separate the dabblers from the hardcore hackers from the thread monkeys, it discouraged participation from some innovative and deserving contestants.

* Try to have more objective judging criteria. Execution speed is the whole reason for concurrent and multi-core programming. Does it matter how you get there or what tools are used? A code that screams through a data set should be judged higher than a code that is consistently well indented, right?

* Be more open and precise about the judging machine configuration(s). It would be much better to have a single machine that all contestants could access to develop, test, and submit their entries. Unfortunately, this would require administrators and rack space and all sorts of other expenses. Second best is to be sure everyone understands how the machines are set up, both for hardware and software.

The driving idea behind threaded programming events for 2009 would be to take "classic" algorithms and parallelize them. The term "classic" would mean something that might be studied in an undergraduate course on algorithm design. Categories of classic algorithms would include things like searching, sorting, numerical computations, string processing, computational geometry, and graph algorithms. In addition to fostering threaded code solutions to "classic" algorithms, we are hoping to somehow generate content for the Intel Software Network.

Since we are still in the planning stages, things are still in flux. Other ideas that have been brought up are to limit the languages used, pick a single OS to be used, require that some problems be done in a non-traditional programming language (e.g., Haskell), or offer more than one problem per month.

In order to make this contest the best it can be, we'd like to solicit your input. If you have any ideas, especially if you participated in the Threading Challenge Contest, about how the contest could be administered, judging criteria to be used, a problem that could be posed, or how contest entries could be featured in ISN and educate readers about multi-core programming, then we want to hear them. You can reply to this post or send me a message through the email facility here in the forums.

With your help, we can make threaded programming contests this year and in coming years better and more fun, and provide exceptional examples for programmers around the world.
0 Kudos
35 Replies
jimdempseyatthecove
Honored Contributor III
1,002 Views

Pure performance is the only thing that matters.

It is easy to contrive a poor performing application that exhibits super scaling.

Jim
0 Kudos
fraggy
Beginner
1,002 Views

Quoting - jimdempseyatthecove

Pure performance is the only thing that matters.

It is easy to contrive a poor performing application that exhibits super scaling.

Jim

Performance scaling is the only thing that matters.
If it doesn't scale correctly on a 48-hardware-thread machine, it's useless :)
Don't forget that a few years from now, processors may have thousands of cores...

The truth must be somewhere in between...

Vincent
0 Kudos
jimdempseyatthecove
Honored Contributor III
1,002 Views

What I am saying is that when a competition entry comes in where the fast single-threaded version of the application runs faster than the slow 48-hardware-thread version, and the fast single-threaded application has a negative scaling slope while the 48-hardware-thread application has a nice linear scaling factor, you have a situation at an extreme. For some problems this situation is real. Good scaling means little to nothing when you cannot attain the desired performance.

The above situation can be avoided by "the committee" selecting problems that should benefit from scaling. Once selected, the measure should be placed solely on the performance of the submitted entries.

Jim Dempsey
0 Kudos
fraggy
Beginner
1,002 Views
Quoting - jimdempseyatthecove
The above situation can be avoided by "the committee" selecting problems that should benefit from scaling. Once selected, the measure should be placed solely on the performance of the submitted entries.

If a problem "should benefit from scaling", does that mean it is easy to parallelize? The contest would be more interesting if the problems were not so easy to solve.
For instance, a physics engine is not easy to scale; a lot of people have worked on that for the last 5-6 years and there is still a lot of improvement to be made.
We can grade the multithreaded version of an application against the single-threaded one, comparing pure performance results on 48 threads (or 24, or both).
We can also grade the pure scaling of the multithreaded version from 1 to 48 threads.
Final grading could be a balance of the above :)

Vincent
0 Kudos
DweeberlyLoom
Beginner
1,002 Views
Ya know, I don't think that whole "mirroring" from the blog post comments is working ... so I'm going to copy my comments here:


Well, I kind'a liked the last contest (my least favorite part was using C++ ... 'cause I wanted the points). I also rather liked the fact that humans looked at the code, because I think there is more to writing code than just how fast it goes. Still, I understand how some people didn't like it, and it had to be hard on the judges.

I would pick one central theme and build around it. So if the main idea is to parallelize "classic" algorithms then you wouldn't really need code at all :-)
If you really want people to only express the algorithms in code, then I would say that correctness should be the primary criterion, followed by speed (speed or throughput? ... I'm thinking about the use of memory-mapped files in the last contest). Is memory usage important? If my code runs 5% slower but only uses 1/4 of the memory, is it better?

Perhaps you should use a (more or less) fixed set of test machines, something like Windows 32 & 64 bit and Linux 32 & 64 bit. Within these four configurations, compare the contestants against each other; this would give a more apples-to-apples comparison.

As to the length of the contest, maybe you should consider more of a carnival approach ... play for points; when you get enough points you can redeem them for something, or keep banking them for a better prize. The prize bank would be limited, so the first person to redeem gets the prize. Unused points could (perhaps) be added to the "belt" status points.

Does MIMD/SIMD programming count toward educating about multi-core?

:-)

0 Kudos
rreis
New Contributor I
1,002 Views
I like the points idea. I also like having several categories: of programmers, of problems. Can someone state, explicitly, what the contest objectives are? Not the programming objectives, but those of the contest itself (bring more programmers to multicore, obtain better algorithms, have fun, establish some sort of "best multithreaded programmers league", ...).

That's one question. About the platform: here at the university, the IT department teaches courses with 200 students. That means that for an Artificial Intelligence course, at least 100 projects must be submitted. These all go to the same machine and are automatically tested. How hard would it be for Intel to have such a machine (or time on such a machine) where everyone would send their programs, and it would run them automatically and spit out the points (by speed, by scaling, by...)?

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,002 Views

dweeb,

Good point on the memory footprint. For some problems that may be important. For many problems small doesn't matter. When you use 500KB vs 1MB on a system with 2GB of RAM I would not think that the size of the working set would be an issue. The performance of this application would be the driving factor - and indirectly because the 500KB program might fit better in the cache, it could also run faster. The 500KB might fit in L2 cache where the 1MB may spill over into L3 or RAM.

The test machines should include Windows 32 & 64 bit and Linux 32 & 64 bit, but the choice of the machine should be up to the contestant.

Jim Dempsey
0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,002 Views
Quoting - rreis
That's one question. About the platform: here at the university, the IT department teaches courses with 200 students. That means that for an Artificial Intelligence course, at least 100 projects must be submitted. These all go to the same machine and are automatically tested. How hard would it be for Intel to have such a machine (or time on such a machine) where everyone would send their programs, and it would run them automatically and spit out the points (by speed, by scaling, by...)?


AMD Multicore Threadfest was hosted on the TopCoder platform. Contestants could submit tests during the whole marathon; tests were automatically executed, and the results were sent back to the contestant. Results included execution time, score (scoring was 100% formal), and stdout/stderr output. So contestants were able to submit small tests just to see whether one variant or another ran faster on the target machine (the test machine was the same as the one used for final evaluation). It was very, very handy.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,002 Views
Your other suggestions are good. These fall into the class of distributing a base-level program with input files and output files, and a set of guidelines as to whether the output file has to be exactly the same or has to pass as approximately the same when passed through a verification program (also supplied). The user can do whatever they want with the sample program, but they must use the supplied input data as-is.

In AMD Threadfest, participants submitted not whole programs but just a source-code implementation of some interface. It was then compiled with a test suite provided by the organizers, so there were actually no input/output files.
In this variant the test suite is also able to control the timing parameters of the workload. For example, the test suite may present a uniform workload or a bursty workload.

For example, an interface for a server might look like:

struct server
{
    void handle_request(msg1 const&);
    void handle_request(msg2 const&);
    void handle_request(msg3 const&);
    void handle_config_change(config const&);
    void fetch_result_state(result&);
};

The test suite would create an instance of that class, call some sequence of 'handle_' methods, and then call fetch_result_state() to verify the behavior of the implementation.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,002 Views
What I had in mind regarding 'scalability testing'.

The contest will probably run on a machine with a very limited number of cores, for example, 8 cores total. On such a limited number of cores, a solution may be not linearly scalable but just fast. So while that solution will be the fastest on 8 cores, it may not be "a parallel solution to the problem suitable for thousands of cores". My idea was/is to distinguish "real parallel solutions to the problem suitable for thousands of cores" from "just fast solutions suitable only for today's very limited number of cores".
For example,
Solution 1:
1 core - 1x (base case)
2 cores - 1.9x
4 cores - 3.5x
8 cores - 6.0x

Solution 2:
1 core - 0.7x
2 cores - 1.4x
4 cores - 2.8x
8 cores - 5.6x

I.e., solution 1 is the fastest on 8 cores, but we see that its scaling significantly degrades, and the degradation progresses with an increasing number of cores. Solution 2 is linearly scalable, although on 8 cores it is still slower.
Solution 1 is not a real parallel solution, and is suitable for 8/16 cores maximum. Solution 2 is a real parallel solution and is suitable for thousands of cores (at least based on this very limited info). So in a scalability-oriented competition, solution 2 must win, because if we extrapolate the performance curves to 16/32 cores, we see that it is solution 2 that is fastest.

So the formal criterion for a scalability-oriented contest may be the following: extrapolate the performance curve to, for example, 4x more cores, and then see who is the fastest on 4x more cores. The extrapolation technique must be formalized.

Although, a much better criterion for a scalability-oriented contest would be just to run the solutions on a 128-core system and see who is the fastest. On that many cores, solutions simply have to scale. If Intel is able to provide such machines, it will be great.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,002 Views

Why extrapolate? Intel is presumably hosting this contest. I would guess that they have access to a 128-core system.

Jim
0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,002 Views

Quoting - jimdempseyatthecove
Why extrapolate? Intel is presumably hosting this contest. I would guess that they have access to a 128-core system.


The variant with a 128-core system is the best. It is good in itself, and there is no need to extrapolate to measure scalability. But I am curious: on what system was the previous contest run? And on what system will the 2009 contest run?

AMD Threadfest (the last round was in Dec 2008) was run on an 8-core system (2 processors x 4 cores). This is why I am nervous about scalability :)


0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,002 Views
Quoting - Dmitriy Vyukov
AMD Threadfest (last round was in Dec 2008) was running on 8-core system (2 processors x 4 cores). This is why I am nervous about scalability :)


Here is a good quote from James Reinders:

-----------------------------------------------
In addition to "Think Parallel" the two key skills are:
* Writing scalable code. The shift in mindset here is fundamental: a good program needs to scale much more than it needs to be efficient. This is a very difficult concept for most to embrace, and if you develop a skill at seeing how to write code that scales, you will have a skill that will be in demand. To survive, we all need to cope with the issue of scaling some.
-----------------------------------------------

From:
http://www.ddj.com/hpc-high-performance-computing/214500435
0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,002 Views
Quoting - Dmitriy Vyukov
Banking server. The server holds a number of accounts; accounts are organized into a tree hierarchy (accounts/subaccounts), and a balance is attached to each account. The server handles a number of requests: adjustments to an account, transfers between accounts, outputting an account's balance, aggregations over several accounts, and creation of new accounts. Accounts can be in one of several statuses (open, preclosed, closed); the status defines the set of allowed operations. Each request and each result must be logged to some common journal. Also, there are some global settings, which can be changed at run-time and define some aspects of request processing. The task is to implement the server run-time: request handlers, the logging subsystem, and the settings-change handler.


James Reinders:

--------------------------------------------------
In addition to "Think Parallel" the two key skills are:

* Writing scalable code....
* Dealing with shared information, managing global state sharing. Do you share mutable state? if not, how do you update state globally? The desire to make global state visible, but not globally changeable, is really similar to the debates in OOP about public vs. private with all the same issues.
----------------------------------------------------

From:
http://www.ddj.com/hpc-high-performance-computing/214500435
0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,002 Views
Btw, Clay, we would also like to hear your comments on our comments on your comments on the threading contest :)

0 Kudos
Reply