[For complete, up-to-date TBB information visit: http://www.ThreadingBuildingBlocks.org]
Summary
Computes reduction over a range.
Syntax
template<typename Range, typename Body>
void parallel_reduce( const Range& range, Body& body );
Header
#include "tbb/parallel_reduce.h"
Description
A parallel_reduce<Range,Body> performs parallel reduction of Body over each value in Range. Type Range must model the Range concept ( 3.2). The body must model the requirements in Table 9.
Table 9: Requirements for parallel_reduce Body
Pseudo-Signature
Semantics
Body::Body( Body&, split );
Splitting constructor ( 3.1). Must be able to run concurrently with operator() and method join.
Body::~Body()
Destructor
void Body::operator()( Range& range );
Accumulate result for subrange
void Body::join( Body& rhs );
Join results. The result in rhs should be merged into the result of this.
A parallel_reduce recursively splits the range into subranges to the point such that is_divisible() is false for each subrange. A parallel_reduce uses the splitting constructor to make one or more copies of the body for each thread. It may copy a body while the body’s operator() or method join runs concurrently. You are responsible for ensuring the safety of such concurrency. In typical usage, the safety requires no extra effort.
When worker threads are available ( 8.2.1 ), parallel_reduce invokes the splitting constructor for the body. For each such split of the body, it invokes method join in order to merge the results from the bodies. Define join to update this to represent the accumulated result for this and rhs. The reduction operation should be associative, but does not have to be commutative. For a noncommutative operation op, “left.join(right)” should update left to be the result of left op right.
A body is split only if the range is split, but the converse is not necessarily so. Figure 1 diagrams a sample execution of parallel_reduce. The root represents the original body b0 being applied to the half-open interval [0,20). The range is recursively split at each level into two subranges. The grain size for the example is 5, which yields four leaf ranges. The slash marks (/) denote where copies (b1 and b2) of the body were created by the body splitting constructor. Bodies b0 and b1 each evaluate one leaf. Body b2 evaluates leaf [10,15) and [15,20), in that order. On the way back up the tree, parallel_reduce invokes b0.join(b1) and b0.join(b2) to merge the results of the leaves.
b0 [0,20) b0 [0,10) b2 [10,20) b0 [0,5) b1 [5,10)b2 [10,15)b2 [15,20)
Figure 1: Example execution of parallel_reduce over blocked_range<int>(0,20,5)
shows only one possible execution. Other valid executions include splitting b2 into b2 and b3, or doing no splitting at all. With no splitting, b0 evaluates each leaf in left to right order, with no calls to join. A given body always evaluates one or more consecutive subranges in left to right order. For example, in Figure 1, body b2 is guaranteed to evaluate [10,15) before [15,20). You may rely on the consecutive left to right property for a given instance of a body, but must not rely on a particular choice of body splitting. parallel_reduce makes the choice of body splitting nondeterministically. Figure 1
When no worker threads are available, parallel_reduce executes sequentially from left to right in the same sense as for parallel_for ( 3.4). Sequential execution never invokes the splitting constructor or method join.
Complexity
If the range and body take O(1) space, and the range splits into nearly equal pieces, then the space complexity is O(P log(N)), where N is the size of the range and P is the number of threads.
Example
The following code sums the values in an array.
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"
using namespace tbb;
struct Sum {
float value;
Sum() : value(0) {}
Sum( Sum& s, split ) {value = 0;}
void operator()( const blocked_range<float*>& range ) {
float temp = value;
for( float* a=range.begin(); a!=range.end(); ++a ) {
temp += *a;
}
value = temp;
}
void join( Sum& rhs ) {value += rhs.value;}
};
Syntax
struct pre_scan_tag;
struct final_scan_tag;
Header
#include "tbb/parallel_scan.h"
Description
Types pre_scan_tag and final_scan_tag are dummy types used in conjunction with parallel_scan. See the example in Section 3.6 for how they are used in the signature of operator().
Members
namespace tbb {
struct pre_scan_tag {
static bool is_final_scan();
};
struct final_scan_tag {
static bool is_final_scan();
};
}
3.6.1.1 bool is_final_scan()
Returns
True for a final_scan_tag, otherwise false.
Summary
Template function that computes parallel prefix, with the splitting of the range guided by the Partitioner parameter.
Syntax
template<typename Range, typename Body, typename Partitioner>
void parallel_scan( const Range& range, Body& body,
Partitioner &partitioner );
Header
#include "tbb/parallel_scan.h"
[For complete, up-to-date TBB information visit: http://www.ThreadingBuildingBlocks.org]