Extrinsic Example

Next: Rules for Extrinsic Procedures Up: Language Interoperability Previous: Language Interoperability

Extrinsic Example

Consider the following example,

  INTERFACE
   EXTRINSIC(F77_LOCAL) &
    SUBROUTINE Calc_u_like(My_P_No,Siz,Tot_Proc,a,b,c)
    INTEGER, DIMENSION(:), INTENT(IN)  :: B, C
    INTEGER, DIMENSION(:), INTENT(OUT) :: A
    INTEGER, DIMENSION(:), INTENT(IN)  :: My_P_No
    INTEGER, INTENT(IN)                :: Siz, Tot_Proc
    !HPF$ PROCESSORS, &
    !HPF$  DIMENSION(NUMBER_OF_PROCESSORS()) :: P
    !HPF$ DISTRIBUTE (BLOCK) ONTO P          :: A, B, C
    !HPF$ DISTRIBUTE (BLOCK) ONTO P          :: My_P_No
   END SUBROUTINE output ! EXTRINSIC(F77_LOCAL)
  END INTERFACE

The reason for all the arguments is that when control is inside this procedure, the processor has no idea that it is inside an extrinsic procedure that was called from an HPF program. The processor must therefore be given all the information it needs in order to calculate how big the dummy array arguments are. (FORTRAN 77 is different from Fortran 90. The mechanism for passing arrays into procedures is much more primitive, the sizes of each dimension except the last must be supplied in the declaration of the dummy argument. The last dimension may be left as a *; if the actual extent of this dimension is required then it must be passed as an argument.)

It can be seen that, even though the extrinsic is written in FORTRAN 77, (the precursor to Fortran 90,) the interface is expressed in terms of HPF. As usual, the INTENT should be specified to aid the compiler in optimisation.

The mapping information is very very important as when control passes to the extrinsic, the compiler must make sure that the mapping specified in the interface is really the way that things are. If there were no mapping directives specified in the interface, the compiler will have to ensure that all the dummy arguments possess the default mapping which is usually replication. If this was the case, every time the extrinsic subroutine Calc_u_like is called, the array actual arguments would have to be remapped so that they were replicated on every processor in the grid.

The keyword F77_LOCAL is supported by the Portland Compiler, it allows extrinsics to be written in the FORTRAN 77 language. (The Portland Compiler also defines a set of message passing routines which allow different instances of the F77_LOCAL extrinsic to communicate with each other. Explanation of this style of programming is outside the scope of this course.)

It should be pointed out that most compilers will enforce a barrier synchronisation upon entry to and exit from an EXTRINSIC, this will cause an overhead.

In general, every EXTRINSIC must:

have an explicit INTERFACE including mapping information,
work out which, if any, array elements are local

The INTERFACE is expressed using HPF concepts of INTENT, distribution and assumed-shape arrays; the EXTRINSIC is not.

These sorts of extrinsic functions are often called local routines because they only have a local picture of the machine that they are executing on. Such routines only operate on local data and may have no concept that they are part of a network of processors.

Once inside the extrinsic, the first task is to calculate the portions of A, B and C that are local. This can be worked out from the original size of the matrices, the number of processors in each dimension of the HPF processor arrangement and the distribution method. For example,

 SUBROUTINE Calc_u_like(My_P_No,Siz,Tot_Proc,a,b,c)
  INTEGER A(*), B(*), C(*), My_P_No(1), Siz, Tot_Proc
C Find blocksize
  Blk_Siz    = NINT((DBLE(Siz)/DBLE(Tot_Proc))+0.5D0)
C How many elements have I got
  My_Blk_Siz = MIN(Blk_Siz,Siz-(My_P_No(1)-1)*Blk_Siz)
  My_Blk_Siz = MAX(My_Blk_Siz,0)
C Do the Calculation
  DO 100 i = 1,My_Blk_Siz
   a(i) = b(i) + c(i)
  END DO
 END

In the calling program, P_Nos has same number of elements as there are processors and is distributed blockwise so that each processor gets a single element of this array. It has been arranged that this element contains the number of the processor; the first processor in the chain is number 1, the second is number 2 and so on. Since the size of A and the total number of processors are given, each extrinsic can work out the number of array elements that it owns. The calculation of Blk_Siz works out the maximum number of elements that any processor may have; the calculation of My_Blk_Siz works out the size of the local set. (Recall that in some situations, the last processors in a chain may have less elements than the first processors on a chain.) Once the local set has been established, calculations may proceed as normal (using FORTRAN 77 syntax).

  REAL, DIMENSION(Siz) :: A, B = 0, C = 0
  INTEGER, DIMENSION(NUMBER_OF_PROCESSORS()) :: P_Nos
  !HPF$ PROCESSORS, &
  !HPF$  DIMENSION(NUMBER_OF_PROCESSORS()) :: P
  !HPF$ DISTRIBUTE (BLOCK) ONTO P :: A, B, C, P_Nos
    ... Interface from before goes here
  NOP   = NUMBER_OF_PROCESSORS()
  P_Nos = (/ (i, i=1,NOP) /)
  CALL Calc_u_like(P_Nos,SIZE(A),NOP,A,B,C)
 END

Now try this question

C routines will have to follow a similar method in order to establish their local sets. It should be mentioned that C arrays are stored row-wise so any arrays passed as dummy arguments will need to be transposed -- this will generate a large overhead.

Some compilers, (but not Portland,) may support F90_LOCAL routines. If this is the case, then matters can be much simplified by using assumed-shape arrays. In this example, there is no need to calculate the local set; the Fortran 90 SIZE intrinsic could be used if this needs to be established.

 SUBROUTINE Calc_u_like(A,B,C)
  INTEGER, DIMENSION(:), INTENT(IN)  :: B, C
  INTEGER, DIMENSION(:), INTENT(OUT) :: A
   A = B+C
 END

Note how the extrinsic is relying on the sensible way in which Fortran 90 treats zero-sized arrays to remove the need for block-size calculations.

Return to corresponding overview page

Next: Rules for Extrinsic Procedures Up: Language Interoperability Previous: Language Interoperability