Interprocedural Array Region Analyses

creusillet@cri.mines-paristech.fr

http://www.cri.mines-paristech.fr/~creusil

Outline

Introduction
READ and WRITE regions
IN and OUT regions
Interprocedural propagation
References

1. Introduction

The optimization of scientific programs for distributed memory machines, hierarchical memory machines or fault-tolerant computing environments is a particularly troublesome problem that requires a precise intra- and inter-procedural analysis of array data flow.

A recent type of analysis (see Thomas Brandes and Paul Feautrier) has opened up wide perspectives in this area. It provides an exact analysis of array data flow, originally in monoprocedural programs with static control. This last constraint has since been partially removed by Vadim Maslov and Jean-François Collard, at the expense of accuracy. Furthermore, this method has not yet been extended to interprocedural analyses, and its complexity makes it useless on large programs.

Another approach is to compute conservative summaries of the effects of procedure calls on sets of array elements (Rémi Triolet, David Callahan and Ken Kennedy). Their relatively weak complexity (in practice) allows the analysis of large programs. But since these analyses are flow insensitive, and since they do not precisely take into account the modifications of the values of integer scalar variables, they are not accurate enough to support powerful optimizations.

In PIPS, the Interprocedural Parallelizer of Scientific Programs developed at École des mines de Paris, we have extended Triolet's array regions to compute summaries that exactly represent the effects of statements and procedures on sets of array elements, whenever possible; whereas the regions originally defined by Triolet were over-approximations of these effects. The computation of READ and WRITE regions relies on a preliminary analysis of the exact effects of statements and procedures upon array elements.

The resulting regions are already used in PIPS to enhance the dependence analysis when there are procedure calls, and to efficiently compile HPF. However, they cannot be used to compute array data flow, and are thus insufficient for other optimizations such as array privatization.

We have therefore introduced two new types of exact regions: for any statement or procedure, IN regions contain its imported array elements, and OUT regions represent its set of live array elements.

The possible applications are numerous. These regions are already used in PIPS to privatize array sections. In massively parallel or heterogeneous systems, they can also be used to calculate the communications before and after the execution of a piece of code. For a hierarchical memory machine, they provide the sets of array elements that are used or reused, and hence should be prefetched (IN regions) or kept (OUT regions) in caches or local memories; the array elements that do not appear in these sets and that are accessed in the current piece of code, are only temporaries, and should be handled as such. In fault-tolerant systems where the current state is regularly saved by a software component (checkpointing) IN or OUT regions could provide the set of elements that will be used in further computations, and thus could be used to reduce the amount of data to be saved. Other applications would be the verification of software specifications, support for out-of-core computations...

To support the exactness of the analysis, an accurate interprocedural translation is needed. However, by examining the Perfect Club benchmarks, we found out that the existing methods for handling array reshapes were insufficient. We have implemented in PIPS a general linear framework that handles array reshapes in most cases, including when the arrays are not of the same type, or belong to a COMMON which does not have the same layout in the caller and the callee.

The following sections give some more details about array regions.

2. READ and WRITE regions

An array region is a set of array elements described by a convex polyhedron containing equalities and inequalities: they link the region parameters (or phi variables) that represent the array dimensions, to the values of the program integer scalar variables. Two other characteristics are also of interest:

the type of the region: READ (R) or WRITE (W) to represent the effects of statements and procedures; IN and OUT to represent the flow of array elements;
the approximation of the region: EXACT when the region exactly represents the requested set of array elements, or MAY or MUST if it is an over- or under-approximation; in PIPS, we only consider EXACT and MAY regions (previously, MUST was unfortunately used to mean EXACT).

For instance, the region:

<A(phi1,phi2)-W-MUST-{phi1==I, phi1==phi2}>

where phi1 et phi2 respectively represent the first and second dimensions of A, corresponds to an assignement of the element A(I,I).

In order to manipulate regions and propagate them along the control flow graph of the program, several binary operators are needed: union, intersection, and difference. Unary operators are also used to eliminate the variables of the program that appear in the region predicates and that are modified. See some related publications for more details on these operators.

3. IN and OUT regions

READ and WRITE regions summarize the exact effects of statements and procedures upon array elements. However, they do not represent the flow of array elements. For that purpose, two new types of region, IN and OUT regions, have been introduced, that take array kills into account.

IN regions contain the array elements, whose values are ( EXACT) or may be (MAY) imported by the current piece of code. These are the elements that are read before being possibly redefined by another instruction of the same fragment.

OUT regions corresponding to a piece of code contain the array elements that it defines, and that are ( EXACT) or may be (MAY) used afterwards, in the execution order of the program. These are the live or exported array elements.

The OUT region of a statement is not defiend per se, but depends on the future of the computation. For instance, the OUT region of S1 in program S1;S2 is a function of S1;S2 as a whole, and of S2. Thus, OUT regions are propagateed in a top-down fashion along the call graph and hierarchical control flow graph of the program. Since I/O operations are part of the program, the OUT regions of the main program, from which the other OUT regions are derived, is initialized to the empty set.

4. Interprocedural propagation of array regions

The interprocedural propagation of READ, WRITE, and IN regions is a backward analysis. At each call site, the summary regions of the called subroutine are translated from the callee's name space into the caller's name space, using the relations between actual and formal parameters, and between the declarations of global variables in both routines.

On the contrary, the interprocedural propagation of OUT regions is a forward analysis. The regions of all the call sites are first translated from the callers'name space into the callee's name space, and are then merged to form a unique summary.

Because the source and target arrays may not have the same declaration (array reshaping), the translation is not a straightforward operation.

In his thesis, Triolet gave conditions to perform the translation by simply changing the name of the array, and projecting the polyhedron onto the name space of the caller. These conditions are that the formal array is either similar to the actual array, or a subarray of the actual array, for instance a single column of a matrix.

For the Perfect Club Benchmarks, these conditions are too restrictive. In particular, they cannot handle offsets in declarations, general offsets in the references at call site, and array reshaping is rarely treated.

In order to handle the general case, subscript values (see the FORTRAN standard) are used. The subscript value of an array element is its rank in the array, given the fact that array elements are held in column order. This is equivalent to a linearization of the array. However, this may lead to non-linear expressions that are not handled in PIPS. The method tries instead to find similar dimensions, which are translated in a trivial way. The subscript values are then used for the remaining dimensions.

The translation from formal to real parameters, as well as the translation of global variables (arrays in COMMONs, even in case of a partial matching), are supported. The translation of arrays with different types of elements is also performed.

5. References

Bibliography

Related work:

Thomas Brandes. The importance of direct dependences for automatic parallelization.

International Conference on Supercomputing, pp 407-417, July 1986.

David Callahan and Ken Kennedy. Analysis of interprocedural side effects in a parallel programming environment.

Journal of Parallel and Distributed Computing, No.5, pp. 517-550, 1988.

Jean-François Collard. Automatic parallelization of while-loops using speculative execution.

International Journal of Parallel Programming, 23(2):47-56, February 1995.

Paul Feautrier. Dataflow analysis of array and scalar references.

International Journal of Parallel Programming, 10(1):23-53, September 1991.

Vadim Maslov. Lazy array data-flow analysis.

Symposium on Principles of Programming Language, pp.311-325, January 1994.

Rémi Triolet, Paul Feautrier and François Irigoin. Direct parallelization of call statements.

Proceedings of the ACM Symposium on Compiler Construction, 1986.