next up previous
Next: Loop Nests Containing Conditionals Up: Partitioning for Load Balance Previous: Partitioning for Load Balance


Rectangular Loop Nests

Based on the discussion in Section 2, the problem of partitioning the iterations of a single loop with lower bound $l$ and upper bound $u$ amongst $p$ processors can be expressed as that of finding $l_k, u_k, 0 \leq k < p$, such that

\begin{displaymath}
\sum_{i=l}^{u} 1 =
\sum_{i=l_0}^{u_0} 1 + \sum_{i=l_1}^{u_1...
...=l_{p-1}}^{u_{p-1}} 1 =
\sum_{k=0}^{p-1} \sum_{j=l_k}^{u_k} 1,
\end{displaymath} (4)

where $u_k=l_{k+1}-1$, for $0 \leq k < p-1$, $u_{p-1}=u$, and, for all $l_k, u_k$,
\begin{displaymath}
\left\lfloor \frac{n}{p} \right\rfloor \leq
\sum_{j=l_k}^{u_k} 1 \leq
\left\lceil \frac{n}{p} \right\rceil,
\end{displaymath} (5)

where $n = u-l+1 = u_{p-1}-l_0$. The following relations satisfy both (4) and (5):

$\bullet$ Partitioning by decreasing order:

\begin{displaymath}l_k = l + k \lfloor n/p \rfloor + \min(n \bmod p, k),
\mbox{~~$k=0,1,2,\dots,p-1,$} \end{displaymath}

where the first $(n \bmod p)$ partitions have $\lceil n/p \rceil$ iterations, whilst the rest have $\lfloor n/p \rfloor$.

$\bullet$ Partitioning by increasing order:

\begin{displaymath}l_k \! = \! l + k \lfloor n/p \rfloor + \max(0, k-p+(n \bmod p)),
\mbox{$k \! = \! 0,1,2,\dots,p \! - \! 1,$} \end{displaymath}

where the last $(n \bmod p)$ partitions have $\lceil n/p \rceil$ iterations, whilst the rest have $\lfloor n/p \rfloor$.

If $n$ is a multiple of $p$, both relations reduce to

\begin{displaymath}l_k = l + kn/p, \mbox{~~$k=0,1,2,\dots,p-1,$} \end{displaymath}

and the loop is partitioned into equal partitions. In this case, assuming that the body of the loop nest does not contain statements whose execution depends on the value of the index of the outer loop, perfect load balance is achieved. Otherwise, if $n$ is not a multiple of $p$, a relative load imbalance equal to $(p - n \bmod p)/(n + p - n \bmod p)$ is expected, a quantity which approaches zero if $n \gg p$.

The above-mentioned partitioning techniques can also be applied to the outermost loop of a nest of loops, whether perfectly nested or not. Assuming that the bounds of the inner loops are not a function of the index of the outer loop, nor are there any conditional statements in the loop body whose execution depends on the value of the index of the outer loop then perfect load balance may be achieved.

For perfectly nested loops, partitioning may be applied to the iterations of more than one loop at the same time. To illustrate this, assume that partitioning takes place along two loops, the first having a number of iterations equal to $n$, and the second having a number of iterations equal to $m$, and there are $p$ processors. Then, minimising $L$ in (1) reduces to finding $p_1$, $p_2$, $p=p_1 \cdot p_2$, such that $\lceil \frac{n}{p_1} \rceil \lceil \frac{m}{p_2} \rceil$ is minimised. Instead of solving this, it may be preferable to apply loop coalescing [12] before partitioning. Since the inequality $\lceil nm/(p_1 p_2) \rceil \leq \lceil n/p_1 \rceil \lceil m/p_2 \rceil$ always holds [15], the transformed coalesced loop is expected to result in a smaller load imbalance than the original loop nest.

In the next sections, partitioning is considered only with respect to the index of the outermost loop.


next up previous
Next: Loop Nests Containing Conditionals Up: Partitioning for Load Balance Previous: Partitioning for Load Balance
Rizos Sakellariou 2000-07-31