Based on the discussion in Section 2, the problem of partitioning the iterations of a
single loop with lower bound and upper bound amongst processors
can be expressed as that of finding
, such that
Partitioning by decreasing order:
Partitioning by increasing order:
If is a multiple of , both relations reduce to
The above-mentioned partitioning techniques can also be applied to the outermost loop of a nest of loops, whether perfectly nested or not. Assuming that the bounds of the inner loops are not a function of the index of the outer loop, nor are there any conditional statements in the loop body whose execution depends on the value of the index of the outer loop then perfect load balance may be achieved.
For perfectly nested loops, partitioning may be applied to the iterations of more than one loop at the same time. To illustrate this, assume that partitioning takes place along two loops, the first having a number of iterations equal to , and the second having a number of iterations equal to , and there are processors. Then, minimising in (1) reduces to finding , , , such that is minimised. Instead of solving this, it may be preferable to apply loop coalescing [12] before partitioning. Since the inequality always holds [15], the transformed coalesced loop is expected to result in a smaller load imbalance than the original loop nest.
In the next sections, partitioning is considered only with respect to the index of the outermost loop.