Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.



Here's a interesting mathematical question: Is there a way to define a range object that lets us write a parallel_for that iterates over permutations of a set of N objects? Better yet, can it be done so that subranges are always in lexicographic order?

Here's how it might be used. The parallel_for invocation might look something like:

parallel_for( perm_range(N), Body() );

and Body::operator() might look something like:

void Body::operator()( const perm_range& r ) {
permutation p = r.begin();
while( p!=r.end() ) {
...process the permutation...
...get the next permutation...
std::next_permutation( p.begin(), p.end() );

For example, for N=3, we would want the permutations:

0 1 2
0 2 1
1 0 2
1 2 0
2 0 1
2 1 0

to be generated. The hard part is figuring out how to divide a permutation range into two roughly equal subranges.

- Arch

P.S. TBB does support recursive parallelism, so it's straightforward to write a recursive program that does permutations in parallel, using a parallel_for at each recursion level. I can post that code if anyone cares.

0 Kudos
2 Replies
The lexicographic ranking and unranking of permutations might be useful for this problem, but could be quite costly. They perform a 1-on-1 mapping between a permutation of integers {0,1,...,(n-1)} and an integer within [0,n!). The algorithms can be found in the book "Combinatorial algorithms" by Donald L. Kreher and Douglas R. Stinson. If you're interested, I can post my implementation here.
0 Kudos

I came up with a reasonably straightforward and efficient solution a while back, but didn't have time to post it until now. See code below. It still has printfs in it that are inappropriate for parallel execution, but I left them there to show where the permutations appear.

- Arch

// permute.cpp : Defines the entry point for the console application.


// Program that shows how to write a TBB-style recursive range for permutations.

// Author: Arch D. Robison

// Date: 9/18/2007



#include "tbb/parallel_for.h"

#include "tbb/task_scheduler_init.h"

//! Recursively divisible set of permutations of N objects of type T.

/** Internally, it represents all permutations x of a such that:

 x==a for all k

 x[position]is some a for k in [i,j) 

 x is some a for k in [position,N]

 Be sure to use the default simple_partitioner, not auto_partitioner, because the range cannot be iterated

 over unless is_divisible() returns true.


template<typename T, size_t N>

class permutation_range {

 unsigned position;

 //! When position reaches limit, the set of permutations is no longer considered divisible.

 const unsigned limit;

 unsigned i, j;

 T a;

 //! Check that internal state is c

 bool assert_okay() {

 assert( i
 assert( j<=N );

 assert( position<=i );

 assert( position<=limit );

 for( unsigned k=0; k
 assert( a
 for( unsigned n=0; n
 assert( n==k || a!=a );


 return true;


 void normalize_if_leaf() {

 if( position==limit-1 && i+1==j ) {

 std::rotate(&a[position],a+i, a+i+1);


 assert( assert_okay() );



 bool empty() const {

 return position==N;


 bool is_divisible() const {

 return position+1

 //! Construct initial permutation

 permutation_range( const T x, unsigned grainsize ) : 




 std::copy( x, x+N, a );

 assert( assert_okay() );


 //! Splitting constructor

 permutation_range( permutation_range& r, tbb::split ) : limit(r.limit) {

 assert( r.assert_okay() );

 assert( r.is_divisible() );

 if( r.i+1==r.j ) {


 std::rotate(&r.a[r.position],r.a+r.i, r.a+r.i+1);

 r.i = ++r.position;

 r.j = N;


 std::copy( r.a, r.a+N, a );

 int m = (r.j-r.i)/2;


 j = r.j;

 i = r.j = r.i+m;

 position = r.position;




 //! Set x[] to first permutation in this range.

 void get_first_subpermutation( T x ) const {


 std::copy( a, a+N, x );


 //! Advance x[] to the next permutation in this range. Returns true unless next permutation was a wrap.

 bool next_subpermutation( T x ) const {


 return std::next_permutation( x+limit, x+N );



const int N = 5;

struct Body {

 void operator()( const permutation_range<int,N>& r ) const {

 int a;


 // Process all permutations in the subrange

 printf("	slice

 do {

 // Operations on a given permutation

 printf("		");

 for( int k=0; k
 printf("%d ",a);

bsp; printf("

 // Get the next permutation

 } while( r.next_subpermutation( a ) );



int main() {

 const int N = 5;

 tbb::task_scheduler_init init(1);

 int a;

 for( unsigned k=0; k
 a = k;

 // Try different grain sizes

 for( int g=1; g<=N; ++g ) {

 printf("With grain size=%d

 tbb::parallel_for( permutation_range<int,N>(a,g), Body() );


 return 0;


0 Kudos