Compile-Time Minimisation of Load Imbalance in Loop Nests
Rizos Sakellariou, John R. Gurd
Department of Computer Science, University of Manchester,
Oxford Road, Manchester M13 9PL, U.K.
Parallelising compilers typically need be armed with some
performance estimate capability in order to evaluate the
trade-offs between different transformations. Such a capability
requires some sophisticated techniques for analysing the program
and providing quantitative estimates to the model. In this paper,
making use of techniques for evaluating symbolically the number
of iterations in a loop, we describe a novel compile-time
scheme for partitioning loop nests in such a way that load
imbalance is minimised. This scheme is based on a special property
of a certain class of loop nests, termed as canonical, which exhibit
a regular distribution pattern of the computational load upon partitioning
along the index of the outermost loop into equal partitions; a technique for
handling non-canonical loop nests is also presented. Essentially,
this makes it possible to partition any loop nest which consists
of loops whose bounds are linear functions of the loop indices.
Experimental results, on a virtual shared memory parallel computer,
indicate that the proposed scheme achieves generally better
performance than other compile-time based schemes.
Proceedings of ACM International Conference on Supercomputing (ICS'97), ACM Press, pp. 277-284