next up previous
Next: Canonical Loop Nests Up: Partitioning for Load Balance Previous: Rectangular Loop Nests


Loop Nests Containing Conditionals

The presence, in the loop body, of conditionals whose execution does not depend on the value of the index of the outer loop does not affect the load imbalance resulting from the partitioning schemes described so far; each iteration of the outer loop still performs the same amount of work and, consequently, perfect load balance can be achieved. In the special case where the execution of a conditional depends on a condition involving the index of the outer loop and a constant, then, applying index set splitting [20] prior to partitioning, the conditional may be removed [2].

For instance, let the inequality $l \leq i \leq u$ bound the index of the outermost, parallel loop $i$, and the inequality $l_x \leq i \leq u_x$ correspond to a logical expression evaluated by a conditional in the loop body. Then, the original loop nest can be split into three consecutive loop nests, whose indices take values in the following intervals respectively:

\begin{displaymath}[l,\min(l_x-1,u)],\; [\max(l,l_x),\min(u,u_x)], \; [\max(u_x+1,l),u].
\end{displaymath} (6)

The second interval contains the values of $i$ which satisfy both $l \leq i \leq u$ and $l_x \leq i \leq u_x$. In some cases, the upper bound of an interval can be smaller than its lower bound; then, the corresponding loop nest will never be executed and, therefore, it can be eliminated. Assuming that at least two of the intervals are not empty, the question is how to partition the resulting loop nests. Two main approaches are considered which are illustrated by means of the following example.

Figure 1: Partitioning loop nests containing conditionals.
 DOALL I=L,U
    (statements.1)
    IF (I.GT.A) THEN
       (statements.2)
    ELSE
       (statements.3)
    ENDIF
 ENDDO

a) Loop with conditional.


       DOALL I=L,A
          (statements.1)
          (statements.3)
       ENDDO
       DOALL I=A+1,U
          (statements.1)
          (statements.2)
       ENDDO

b) After index set splitting.


       N=A-L+1
       M=U-A
C*KSR* PARALLEL REGION(NUMTHREADS=P,PRIVATE=I,K,LK,UK)
       K=IPR_MID()
       LK=L+K*FLOOR(N/P)+MIN(MOD(N,P),K)
       UK=L+(K+1)*FLOOR(N/P)+MIN(MOD(N,P),K+1)
       DO I=LK,UK
          (statements.1)
          (statements.3)
       ENDDO
       LK=A+1+K*FLOOR(M/P)+MAX(0,K-P+MOD(M,P))
       UK=A+1+(K+1)*FLOOR(M/P)+MAX(0,K+1-P+MOD(M,P))
       DO I=LK,UK
          (statements.1)
          (statements.2)
       ENDDO
C*KSR* END PARALLEL REGION

c) After loop partitioning.

Consider the code shown in Figure 1.a;[*] applying (6) and assuming that for the values of L, U, A it is known at compile-time that L$\leq$A$<$U, the code shown in Figure 1.b results. One approach to partition this code is to partition each loop using any of the two schemes described in Section 3.1, and then assign the same partition of each loop to the same processor; for both loops, if the number of iterations is a multiple of the number of processors, $p$, then perfect load balance is achieved. In the general case, let $n, m$ be the number of loop iterations, and $W_1, W_2$ the amount of work in the body of each loop, respectively; assuming that the same partitioning scheme (either by decreasing or increasing order) is applied to both loops, then the expected load imbalance is equal to $ L = \left\lceil \frac{n}{p} \right\rceil W_1 +
\left\lceil \frac{m}{p} \right\rceil W_2 - \frac{n W_1 + m W_2}{p}. $ Whenever $(n \bmod p) + (m \bmod p) \leq p$, the load imbalance can be reduced by partitioning one of the loops by decreasing order and the other one by increasing order. This approach is followed in the code shown in Figure 1.c.[*]


next up previous
Next: Canonical Loop Nests Up: Partitioning for Load Balance Previous: Rectangular Loop Nests
Rizos Sakellariou 2000-07-31