
Data Parallelism

In the context of COBOL applications, data parallelism can be divided into file parallelism, where a program runs in parallel against a number of files, and record parallelism, where different records of the same file are processed in parallel. In general, by viewing COBOL files as arrays of records - the first element of the array corresponds to the first record of the file - many of the techniques for FORTRAN array parallelism can be adapted (Sakellariou and O'Boyle, 1996).
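
As an illustration of file parallelism, the sketch below runs the same record-processing routine against several files concurrently. It is written in Python rather than COBOL, since standard COBOL offers no portable parallel constructs; the file names and the process_record routine are assumptions made for the example.

import multiprocessing

def process_record(record):
    # Placeholder for the per-record work a COBOL paragraph would perform.
    return record.upper()

def process_file(path):
    # The same program logic applied to one file: read every record,
    # process it, and write the result to a corresponding output file.
    with open(path) as infile, open(path + ".out", "w") as outfile:
        for record in infile:
            outfile.write(process_record(record))

if __name__ == "__main__":
    # File parallelism: one worker per file, no data shared between them.
    files = ["branch1.dat", "branch2.dat", "branch3.dat"]
    with multiprocessing.Pool(processes=len(files)) as pool:
        pool.map(process_file, files)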

Two statements can be executed in parallel only if there is no relationship between them which constrains their execution order. Such constraints are identified by means of a data dependency analysis (Banerjee et al., 1993). It is particularly important to identify dependences between different loop iterations. If iteration I1 precedes iteration I2, then three types of data dependence may exist: a flow (true) dependence, if I1 writes a value that I2 subsequently reads; an anti-dependence, if I1 reads a value that I2 subsequently overwrites; and an output dependence, if both I1 and I2 write to the same variable.
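
A minimal sketch of the three cases, expressed in Python for brevity since the notions are language-independent (the array a and the loops are illustrative assumptions): each loop carries a different kind of dependence between consecutive iterations.

n = 10
a = list(range(n))

# Flow (true) dependence: iteration i reads a[i-1], which iteration i-1
# wrote, so the iterations cannot be run in parallel as written.
for i in range(1, n):
    a[i] = a[i - 1] + 1

# Anti-dependence: iteration i reads a[i+1] before iteration i+1
# overwrites it, so the original order must be preserved unless the
# old values are first copied (renamed).
for i in range(0, n - 1):
    a[i] = a[i + 1] + 1

# Output dependence: every iteration writes the same variable 'last',
# so only the final write may survive; privatisation can remove it.
for i in range(0, n):
    last = a[i]

print(a, last)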

A frequent occurrence in COBOL applications is the update of a master file with the contents of a transaction file; this, of course, involves reading from two files and writing to one. Consider the following code, where record parallelism can be exploited through repeated execution of the UPDATE paragraph:

    SET EOF TO FALSE
    PERFORM UPDATE UNTIL EOF
    ...
UPDATE.
    READ TRANS-FILE
        AT END SET EOF TO TRUE
    END-READ
    IF NOT EOF
        (... read master according to trans-rec; update master fields)
        WRITE MASTER-REC
    END-IF
    ...

The fields of the master record give rise to an output dependency, since they are written in every iteration of the loop. By applying the privatisation technique (Banerjee et al., 1993), where each processor has its own copy of the variables, this output dependency can be removed. Essentially, the remaining problem is whether a master record written in an earlier iteration may be read in a subsequent iteration; this is ruled out if no two transaction records update the same master record. If this does not hold, then updates to the same master record should either be executed on the same processor or be synchronised. The transaction file is often sorted before the update; in that case it can be split into sub-files such that no two sub-files refer to the same master record. In general, detecting parallelism based on the content of data is difficult.
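
A minimal sketch of this splitting step, again in Python (the file names, the key extraction, and the number of processors are assumptions): because the transaction file is sorted on the master key, each sub-file receives a contiguous key range and all transactions for a given master record stay in the same sub-file, so the sub-files can then be processed in parallel without synchronisation.

def master_key(record):
    # Assumed layout: the first 8 characters of a transaction record
    # hold the key of the master record it updates.
    return record[:8]

def split_transactions(path, num_procs):
    # Split a transaction file, already sorted on the master key, into
    # num_procs sub-files, never separating records that share a key.
    outputs = [open("%s.part%d" % (path, p), "w") for p in range(num_procs)]
    with open(path) as trans:
        records = trans.readlines()
    target = max(1, len(records) // num_procs)
    part, count, previous_key = 0, 0, None
    for record in records:
        key = master_key(record)
        # Move on to the next sub-file only at a key boundary, so that
        # no two sub-files refer to the same master record.
        if count >= target and key != previous_key and part < num_procs - 1:
            part += 1
            count = 0
        outputs[part].write(record)
        count += 1
        previous_key = key
    for f in outputs:
        f.close()

split_transactions("trans.dat", 4)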

Not all COBOL loops have independent iterations; for example, many have shared variables (checks, totals, sequence numbers, controls, etc.) for which the sequential semantics must be preserved. FORTRAN dependency analysis can be used to determine what ordering has to be enforced for sequential consistency, and the query decomposition approaches of SQL, where sub-query results are correlated by a master query, can also be applied.
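
A minimal sketch of this idea for an accumulated total, in Python (the transaction amounts, the chunking, and the worker count are assumptions): each worker computes a partial total over its own chunk, and a final master step combines the partials, which reproduces the sequential result for a commutative operation such as a sum. Shared variables whose values depend on the iteration order, such as sequence numbers, cannot be handled this way and still require the ordering determined by the dependency analysis.

import multiprocessing

def partial_total(amounts):
    # Each worker accumulates its own private total over one chunk.
    return sum(amounts)

if __name__ == "__main__":
    # Hypothetical transaction amounts (in pence), one chunk per worker.
    amounts = [1250, 399, 2000, 725, 1510, 440, 999, 1800]
    chunks = [amounts[0::4], amounts[1::4], amounts[2::4], amounts[3::4]]

    with multiprocessing.Pool(processes=4) as pool:
        partials = pool.map(partial_total, chunks)

    # The master step correlates the partial results, much as a master
    # query combines sub-query results; the grand total equals what a
    # sequential loop over all records would have produced.
    grand_total = sum(partials)
    print(grand_total)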

As "what-if" analysis and forecasting models become more important to decision support and information management, the amount of computation within each iteration will be substantially more complex than above: in this case many computation optimisations may be useful.


Rizos Sakellariou 2000-07-31