The previous section described an approach to partitioning canonical loop nests, as introduced in Definition 1. In this section we re-consider loop nests having the general form shown in Figure 2, but without the restrictions associated with Definition 1, except that (i.e., the loop nest is not empty). Then, applying index set splitting, the original loop nest can be transformed into multiple adjacent loop nests each of which satisfies the requirements for partitioning set by Theorem 2 or it is rectangular.
Consider the loop nest shown in Figure 2; the first step consists of finding the values of which satisfy the inequalities If no such values exist, then the loop with index is never executed. If there are such values, given by , the loop with index is always executed; therefore, this loop is canonical with respect to the outer loop, and no index set splitting is required. Conversely, if there is a subset of the values of which satisfies both inequalities, say , where , then the outer loop must be split into two consecutive loops, the first of which corresponds to the values given by , while the second corresponds to ; for the former values, the loop with index is canonical with respect to the outer loop, while, for the latter, it is never executed.
If, as a result of the previous step, there are some values of for which
the two outermost loops are canonical, the next step consists of finding
the values of which make the three outermost loops canonical;
these values must satisfy the system of inequalities
The same procedure is repeated for each loop, successively, until there are no remaining loops or else a given system of, say , , inequalities has no solutions (this would imply that the loop with index is never executed for these values of that render the outermost loops canonical). Note that, in the case where the original loop nest contains more than one consecutive loops at some level, the same procedure should be applied for each loop separately.
Example 2Consider the loop nest shown in Figure 4.a. Since
there are two consecutive loop nests in the body of the I loop,
the procedure described above must be applied separately for each
of them.
For the first loop nest, the J loop is canonical with respect to the outer loop; the K loop is canonical (with respect to the I, J loops) when 2*I-J 1000 J 2*I-1000. Thus, the J loop must be split into two consecutive loops depending on which values of J render the K loop canonical; the bounds of the first loop will be 1 and MAX(1,2*I-1000)-1, and of the second loop MAX(1,2*I-1000) and I. Since the body of the J loop does not contain statements other than the K loop, no statements are executed in the case when 1 J MAX(1,2*I-1000)-1; hence, the corresponding loop can be eliminated.
For the second loop nest, the J loop is canonical with respect to the outer loop when 2*I-500 1000 I 750; the index of the I loop is split accordingly. Furthermore, the K loop is canonical when I+J 1000 J 1000-I. The code resulting after applying the appropriate transformations is shown in Figure 4.b. Evaluating MAX(1,2*I-1000), by replacing it with appropriate conditionals which are then removed using index set splitting (see Section 3.2), results in the code shown in Figure 4.c (note that MIN(1000,1000-I) is always equal to 1000-I since I takes only positive values). Finally, partitioning I for each loop nest as in Section 3.2, and grouping the partitions according to (7) for , the partitioned code can lead to perfect load balance when using 5 processors, and, in general, a comparatively low value of load imbalance [15].