The XLF Version 5.1 compiler provides a compiler option, -qsmp, which instructs the compiler to automatically parallelize Fortran DO loops. This includes both DO loops coded explicitly by the user and DO loops generated by the compiler for array language constructs (WHERE, FORALL, array assignment, ...). However, the compiler will only parallelize loops that are independent, that is, loops whose iterations can be computed independently of any other iteration.
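As a rough illustration of the distinction, the following free source form sketch contrasts a loop whose iterations are independent with one that carries a dependency between iterations. The program and variable names are illustrative only and are not taken from the XLF documentation. Whether the compiler actually parallelizes the first loop depends on its own analysis.

```fortran
! Sketch only: names are hypothetical, chosen for illustration.
program loop_kinds
  implicit none
  integer, parameter :: n = 8
  integer :: i
  real :: a(n), b(n), c(n)

  b = 1.0
  c = 2.0

  ! Independent: iteration i reads and writes only element i,
  ! so the iterations could be computed in any order.
  do i = 1, n
    a(i) = b(i) + c(i)
  end do

  ! Not independent: iteration i reads a(i-1), which was written
  ! by iteration i-1, so the iterations must run in order.
  do i = 2, n
    a(i) = a(i-1) + b(i)
  end do

  print *, a
end program loop_kinds
```

The first loop is the kind of candidate that automatic parallelization looks for; the second is the kind it must leave serial.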
While automatic parallelization will be sufficient for some users, the SMP directives give you the option of providing additional information about the source code to the compiler. The information you pass to the compiler will either be used during automatic parallelization or to specify that certain parts of the program can be parallelized. For example, the PARALLEL DO directive specifies that the DO loop immediately following it should be executed in parallel.
These are the directives which have been added for XLF Version 5.1:
>>-directive---------------------------------------------------><
Noncomment form directives are always recognized by the compiler. They cannot be continued.
Additional statements cannot be included on the same line as a directive.
Source format rules concerning white space apply to directive lines.
>>-trigger_head---trigger_constant---directive-----------------><
The default value for the trigger_constant is IBM*. Comment form directives using IBM* as the trigger_constant are always recognized by the compiler.
All comment form directives, with the exception of the default, are treated as comments by the compiler unless the appropriate trigger_constant has been defined using the -qdirective compiler option. As a result, code containing these directives can be ported to non-SMP environments.
When compiling using either the xlf_r or xlf90_r invocation commands, the option -qdirective=IBM*:IBMT is turned on by default. If the -qsmp compiler option is used in conjunction with one of these invocation commands, the option -qdirective=IBM*:SMP$:$OMP:IBMP:IBMT is turned on by default. You can specify an alternate trigger_constant with the -qdirective compiler option. See the -qdirective compiler option in the User's Guide for more details.
XLF supports some features of the OpenMP specification. In particular, XLF has partial support for the CRITICAL, END CRITICAL, PARALLEL DO, PARALLEL SECTIONS, SECTION, and END PARALLEL SECTIONS directives. To ensure the greatest portability of code, we recommend that you use these directives whenever possible. These directives should be used with the OpenMP trigger_constant, $OMP; this trigger_constant should not be used with any other directive.
XLF also includes the trigger_constants IBMP and IBMT. IBMP is recognized if you compile using the -qsmp compiler option and is recommended for use with the SCHEDULE directive. IBMT is recognized if you compile using the -qthreaded compiler option (which is the default for the xlf_r or xlf90_r invocation commands) and is recommended for use with the THREADLOCAL directive.
XLF directives include some directives that are in common with those provided by other vendors. If you make use of these directives in your code, you can enable whichever trigger_constant that vendor has selected by specifying the trigger_constant using the -qdirective compiler option. Refer to the -qdirective compiler option in the User's Guide for details on specifying alternative trigger_constants.
A directive can be specified as a free source form or fixed source form comment, depending on the current source form.
The trigger_head follows the rules of comment lines either in Fortran 90 free source form or fixed source form. If the trigger_head is !, it does not have to be in column 1. There must be no blanks between the trigger_head and the trigger_constant.
The directive_trigger (defined as the trigger_head combined with the trigger_constant, !IBM* for example) and any directive keywords can be specified in uppercase, lowercase, or mixed case.
You can specify inline comments on directive lines.
!SMP$ INDEPENDENT, NEW(i) !This is a comment
A directive cannot follow another statement or another directive on the same line.
All comment form directives can be continued. A directive cannot be imbedded within a continued statement, nor may a statement be imbedded within a continued directive.
The directive_trigger must be specified on all continuation lines. However, the directive_trigger on a continuation line need not be identical to the directive_trigger used in the continued line. For example:
!SMP$ INDEPENDENT           &
!IBM*&  , REDUCTION (X)     &
!SMP$&  , NEW (I)
is equivalent to:
!SMP$ INDEPENDENT, REDUCTION (X), NEW (I)
provided both IBM* and SMP$ are active trigger_constants.
For more information, see "Lines and Source Formats".
In fixed source form, if the trigger_head is one of C, c, or *, it must be in column 1.
The maximum length of the trigger_constant in fixed source form is 4 for directives which are continued on one or more lines. This rule applies to the continued lines only and not to the initial line. Otherwise, the maximum length of the trigger_constant is 15. We recommend that initial line triggers should have a maximum length of 4. The maximum allowable length of 15 is permitted for the purposes of backwards compatibility.
The first line of a comment directive must have either white space or a zero in column 6 if the trigger_constant has a length of 4 or less. Otherwise, the character in column 6 is part of the trigger_constant.
The directive_trigger of a continuation line of a comment directive must appear in columns 1-5. Column 6 of a continuation line must have a character that is neither white space nor a zero.
For more information, see "Fixed Source Form".
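The column rules above might look like this in practice. This is a minimal fixed source form sketch, assuming SMP$ is an active trigger_constant; the program name FIXCON and its variables are made up for illustration. The directive_trigger !SMP$ occupies columns 1-5 on both directive lines, column 6 is blank on the initial line, and the ampersand in column 6 marks the continuation line.

```fortran
      PROGRAM FIXCON
      INTEGER I, T, A(100), B(100)
      B = 1
!SMP$ INDEPENDENT
!SMP$&, NEW (T)
      DO I = 1, 100
         T = B(I) * 2
         A(I) = T
      END DO
      PRINT *, A(100)
      END PROGRAM FIXCON
```

A compiler that does not recognize the trigger treats both directive lines as ordinary comments, so the program still compiles and runs serially.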
In free source form, the maximum length of the trigger_constant is 15.
An ampersand (&) at the end of a line indicates the directive is continued. When you continue a directive line, a directive_trigger must appear at the beginning of all continuation lines. If you are beginning a continuation line with an ampersand, the directive_trigger must precede the ampersand. For example:
!IBM* INDEPENDENT           &
!SMP$&  , REDUCTION (X)     &
!IBM*&  , NEW (I)
For more information, see "Fortran 90 Free Source Form".
This chapter describes the following directives:
ASSERT
Purpose
The ASSERT directive provides information to the compiler about the characteristics of DO loops. This assists the compiler in optimizing the source code.
The ASSERT directive only takes effect if either the -qsmp or -qhot compiler option is specified.
Format
>>-ASSERT--(--assertion_list--)--------------------------------><
Rules
The first noncomment line (not including other directives) following the ASSERT directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The ASSERT directive applies only to the DO loop immediately following the directive and not to any nested DO loops.
ITERCNT provides an estimate to the compiler about roughly how many iterations the DO loop will typically execute. There is no requirement that the value be accurate; ITERCNT will only affect performance, never correctness.
When NODEPS is specified, the user is explicitly declaring to the compiler that no loop-carried dependencies exist within the DO loop or any procedures invoked from within the DO loop. A loop-carried dependency involves two iterations within a DO loop interfering with one another. Interference occurs in the following situations:
While it is possible for two complementary ASSERT directives to apply to any given DO loop, an ASSERT directive cannot be followed by a contradicting ASSERT directive for a given DO loop:
!SMP$ ASSERT (ITERCNT(10))
!SMP$ INDEPENDENT, REDUCTION (A)
!SMP$ ASSERT (ITERCNT(20))       ! invalid
      DO I = 1, N
        A(I) = A(I) * I
      END DO
In the example above, the ASSERT(ITERCNT(20)) directive contradicts the ASSERT(ITERCNT(10)) directive and is invalid.
The ASSERT directive overrides the -qassert compiler option for the DO loop on which the ASSERT directive is specified.
Examples
Example 1:
! An example of the ASSERT directive with NODEPS.
      PROGRAM EX1
        INTEGER A(100)
!SMP$ ASSERT (NODEPS)
        DO I = 1, 100
          A(I) = A(I) * FNC1(I)
        END DO
      END PROGRAM EX1
      FUNCTION FNC1(I)
        FNC1 = I * I
      END FUNCTION FNC1
Example 2:
! An example of the ASSERT directive with NODEPS and ITERCNT.
      SUBROUTINE SUB2 (N)
        INTEGER A(N)
!SMP$ ASSERT (NODEPS,ITERCNT(100))
        DO I = 1, N
          A(I) = A(I) * FNC2(I)
        END DO
      END SUBROUTINE SUB2
      FUNCTION FNC2 (I)
        FNC2 = I * I
      END FUNCTION FNC2
Related Information
CNCALL
Purpose
When the CNCALL directive is placed before a DO loop, the user is explicitly declaring to the compiler that no loop-carried dependencies exist within any procedure called from the DO loop.
The CNCALL directive only takes effect if either the -qsmp or -qhot compiler option is specified.
Format
>>-CNCALL------------------------------------------------------><
Rules
The first noncomment line (not including other directives) following the CNCALL directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The CNCALL directive applies only to the DO loop immediately following the directive and not to any nested DO loops.
When the CNCALL directive is specified, the user is explicitly declaring to the compiler that no procedures invoked within the DO loop have any loop-carried dependencies. If the DO loop invokes a procedure, separate iterations of the loop must be able to concurrently call upon that procedure. The CNCALL directive does not assert that other operations in the loop do not have dependencies - it is only an assertion about procedure references.
A loop-carried dependency occurs when two iterations within a DO loop interfere with one another. See ASSERT for the definition of interference.
Examples
! An example of CNCALL where the procedure invoked has
! no loop-carried dependency but the code within the
! DO loop itself has a loop-carried dependency.
! The loop starts at 2 so that A(I-1) stays within the bounds of A.
      PROGRAM EX3
        INTEGER A(100)
!SMP$ CNCALL
        DO I = 2, 100
          A(I) = A(I) * FNC3(I)
          A(I) = A(I) + A(I-1)   ! This has loop-carried dependency
        END DO
      END PROGRAM EX3
      FUNCTION FNC3 (I)
        FNC3 = I * I
      END FUNCTION FNC3
Related Information
CRITICAL / END CRITICAL
Purpose
The CRITICAL construct allows you to define independent blocks of code that are to be executed by at most one thread at a time. The CRITICAL construct includes a CRITICAL directive followed by a block of code and ends with an END CRITICAL directive.
The CRITICAL and END CRITICAL directives only take effect if the -qsmp compiler option is specified.
Format
>>-CRITICAL--+------------------+------------------------------><
             +-(--lock_name--)--+
>>-block-------------------------------------------------------><
>>-END CRITICAL--+------------------+--------------------------><
                 +-(--lock_name--)--+
Rules
The optional lock_name is a name with global scope. The lock_name must not be used to identify any other global entity in the same executable program.
If the lock_name is specified on the CRITICAL directive, the same lock_name must also be specified on the corresponding END CRITICAL directive.
If the same lock_name is specified for more than one CRITICAL construct, the compiler will allow only one thread to execute any one of these CRITICAL constructs at any one time. If multiple CRITICAL constructs have differing lock_names, the compiler will allow those constructs to run in parallel.
All CRITICAL constructs which do not have an explicit lock_name specified are protected by the same lock. In other words, these CRITICAL constructs will be assigned the same lock_name by the compiler, thereby ensuring that only one thread enters any unnamed CRITICAL construct at a time.
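The effect of distinct lock names can be sketched as follows. This is an illustrative free source form example, assuming SMP$ is an active trigger_constant and the program is compiled with -qsmp; the lock names SUM_LOCK and MAX_LOCK and all variable names are hypothetical. Because the two constructs use different lock names, one thread may update TOTAL while another updates BIGGEST. Had both constructs been unnamed, they would share a single lock.

```fortran
! Sketch only: lock names and variables are hypothetical.
program critical_locks
  implicit none
  integer, parameter :: n = 100
  integer :: i, total, biggest
  integer :: a(n)

  do i = 1, n
    a(i) = i
  end do
  total = 0
  biggest = 0

!SMP$ PARALLEL DO PRIVATE(I)
  do i = 1, n
    ! Only one thread at a time may update the running sum.
!SMP$ CRITICAL (SUM_LOCK)
    total = total + a(i)
!SMP$ END CRITICAL (SUM_LOCK)
    ! A different lock protects the running maximum.
!SMP$ CRITICAL (MAX_LOCK)
    if (a(i) > biggest) biggest = a(i)
!SMP$ END CRITICAL (MAX_LOCK)
  end do

  print *, total, biggest
end program critical_locks
```

Both updates give the same result regardless of the order in which iterations run, which is what makes the shared variables safe to update inside the CRITICAL constructs.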
The lock_name must not be the same as a class 1 local entity as defined under the heading The Scope of a Name.
It is illegal to branch into or out of a CRITICAL construct. The CRITICAL construct must not refer to procedures compiled with the -qsmp=auto compiler option.
The CRITICAL construct may appear anywhere in a program.
The CRITICAL construct must not contain a PARALLEL DO directive or a PARALLEL SECTIONS construct. The CRITICAL construct must not refer to procedures containing either a PARALLEL DO directive or a PARALLEL SECTIONS construct.
Although it is possible to nest a CRITICAL construct within another CRITICAL construct, doing so is not advisable because a deadlock may result.
Examples
Example 1: Note that in this example the CRITICAL construct appears within a DO loop which has been marked with the PARALLEL DO directive.
      EXPR=0
!SMP$ PARALLEL DO PRIVATE (I)
      DO I = 1, 100
!SMP$ CRITICAL
        EXPR = EXPR + A(I) * I
!SMP$ END CRITICAL
      END DO
Example 2: An example specifying a lock_name on the CRITICAL construct.
!SMP$ PARALLEL DO PRIVATE(T)
      DO I = 1, 100
        T = B(I) * B(I-1)
!SMP$ CRITICAL (LOCK)
        SUM = SUM + T
!SMP$ END CRITICAL (LOCK)
      END DO
Related Information
EJECT
Purpose
EJECT directs the compiler to start a new full page of the source listing. If no source listing has been requested, this directive is ignored.
Format
>>-EJECT-------------------------------------------------------><
Rules
The EJECT compiler directive can have an inline comment and a label. However, if a statement label is specified, the compiler discards it. Therefore, you must not reference any label on an EJECT directive. An example of usage would be to put an EJECT directive before the start of an important DO loop that you do not want to split across pages in the listing. If you send the source listing to a printer, the EJECT directive provides a page break.
INCLUDE
Purpose
The INCLUDE compiler directive inserts a specified statement or a group of statements into a program unit.
Format
>>-INCLUDE--+-char_literal_constant-+-+----+-------------------><
            +-(--name--)------------+ +-n--+
Under the AIX operating system, the file name need not specify the full path of the desired file, but it must specify the file extension if one exists.
name must contain only characters allowable in the XL Fortran character set. See "Characters" for the character set supported by XL Fortran.
char_literal_constant is a character literal constant.
A feature called conditional INCLUDE provides a means for selectively activating INCLUDE compiler directives within the Fortran source during compilation. You specify the included files by means of the -qci compiler option.
In fixed source form, the INCLUDE compiler directive must start after column 6, and can have a label.
An inline comment can be added to the line.
Rules
An included file can contain any complete Fortran source statements and compiler directives, including other INCLUDE compiler directives. Recursive INCLUDE compiler directives are not allowed. An END statement can be part of the included group. The first included line must not be a continuation line, nor can the last included line be continued. The statements in the include file are processed with the source form of the including file.
If the SOURCEFORM directive appears in an include file, the source form reverts to that of the including file once processing of the include file is complete. After the inclusion of all groups, the resulting Fortran program must follow all of the Fortran rules for statement order.
For an INCLUDE compiler directive with the left and right parentheses syntax, XL Fortran translates the file name to lowercase unless the -qmixed compiler option is on.
The AIX file system locates the file specified by filename as follows:
Examples
INCLUDE '/u/userid/dc101'       ! full absolute file name specified
INCLUDE '/u/userid/dc102.inc'   ! INCLUDE file name has an extension
INCLUDE 'userid/dc103'          ! relative path name specified
INCLUDE (ABCdef)                ! includes file abcdef
INCLUDE '../Abc'                ! includes file Abc from parent directory
                                ! of directory being searched
Related Information
" -qci Option" in the User's Guide
INDEPENDENT
Purpose
The INDEPENDENT directive, if used, must precede a DO loop, FORALL statement, or FORALL construct. This directive specifies that each operation in the FORALL statement or FORALL construct, or each iteration of the DO loop, can be executed in any order without affecting the semantics of the program.
The INDEPENDENT directive only takes effect if either the -qsmp or -qhot compiler option is specified.
Format
                +---------------------------------------------+
                V                                             |
>>-INDEPENDENT----+------------------------------------------++-><
                  +-,--NEW--(--named_variable_list--)--------+
                  +-,--REDUCTION--(--named_variable_list--)--+
Rules
The first noncomment line (not including other directives) following the INDEPENDENT directive must be a DO loop, FORALL statement, or the first statement of a FORALL construct. This line cannot be an infinite DO or DO WHILE loop. The INDEPENDENT directive applies only to the DO loop immediately following the directive and not to any nested DO loops.
An INDEPENDENT directive can have at most one NEW clause and at most one REDUCTION clause.
If the directive applies to a DO loop, no iteration of the loop can interfere with any other iteration. Interference occurs in the following situations:
If the NEW clause is specified, the directive must apply to a DO loop. The NEW clause modifies the directive and any surrounding INDEPENDENT directives by accepting any assertions made by such directive(s) as true even if the variables specified in the NEW clause are modified by each iteration of the loop. Variables specified in the NEW clause behave as if they are private to the body of the DO loop. That is, the program is unaffected if these variables (and any variables associated with them) were to become undefined both before and after each iteration of the loop.
Any variable specified in the NEW clause or REDUCTION clause must not:
For FORALL, no combination of index values affected by the INDEPENDENT directive assigns to an atomic storage unit that is required by another combination. If a DO loop, a FORALL statement, and a FORALL construct all have the same body and each is preceded by an INDEPENDENT directive, they behave the same way.
The REDUCTION clause asserts that the named variables are updated within REDUCTION statements in the INDEPENDENT loop. Furthermore, the intermediate values of the REDUCTION variables are not used within the loop, other than in the updates themselves. Thus, the value of the REDUCTION variable after the loop is the result of a reduction tree.
If the REDUCTION clause is specified, the directive must apply to a DO loop. The only reference to a REDUCTION variable in an INDEPENDENT DO loop must be within a reduction statement.
A REDUCTION variable must be of intrinsic type but must not be of type character. A REDUCTION variable must not be an allocatable array.
A REDUCTION variable must not occur in:
>>---reduction_var_ref = expr---reduction_op---reduction_var_ref---><
>>---reduction_var_ref = reduction_var_ref---reduction_op---expr---><
>>-reduction_var_ref = reduction_function--(expr,--reduction_var_ref)-><
>>-reduction_var_ref = reduction_function--(reduction_var_ref,--expr)-><
where:
The following rules apply to reduction statements:
>>-reduction_var_ref-- = --expr-- - --reduction_var_ref--------><
Examples
Example 1:
      INTEGER A(10),B(10,12),F
!IBM* INDEPENDENT                    ! The NEW clause cannot be
      FORALL (I=1:9:2) A(I)=A(I+1)   ! specified before a FORALL
!IBM* INDEPENDENT, NEW(J)
      DO M=1,10
        J=F(M)                       ! 'J' is used as a scratch
        A(M)=J*J                     ! variable in the loop
!IBM* INDEPENDENT, NEW(N)
        DO N=1,12                    ! The first executable statement
          B(M,N)=M+N*N               ! following the INDEPENDENT must
        END DO                       ! be either a DO or FORALL
      END DO
      END
Example 2:
      X=0
!IBM* INDEPENDENT, REDUCTION(X)
      DO J = 1, M
        X = X + J**2
      END DO
Example 3:
      INTEGER A(100), B(100, 100)
!SMP$ INDEPENDENT, REDUCTION(A), NEW(J)  ! Example showing an array used
      DO I=1,100                         ! for a reduction variable
        DO J=1, 100
          A(I)=A(I)+B(J, I)
        END DO
      END DO
Related Information
PARALLEL DO
Purpose
The PARALLEL DO directive provides a means of specifying which loops should be parallelized by the compiler.
The PARALLEL DO directive only takes effect if the -qsmp compiler option is specified.
Format
                +-+--+----------------------------+
                | +--+                            |
                V                                 |
>>-PARALLEL DO----+-----------------------------+-+------------><
                  +-+---+---parallel_do_clause--+
                    +-,-+
where parallel_do_clause is:
>>-+-IF--(--scalar_logical_expr--)-----------------------+-----><
   +-LASTPRIVATE--(--named_variable_list--)--------------+
   +-PRIVATE--(--named_variable_list--)------------------+
   +-REDUCTION--(-+----------+---named_variable_list--)--+
   |              +-op_fnc :-+                           |
   +-SCHEDULE--(--sched_type-+----+---)------------------+
   |                         +-,n-+                      |
   +-SHARED--(--named_variable_list--)-------------------+
Rules
The first noncomment line (not including other directives) following the PARALLEL DO directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The PARALLEL DO directive applies only to the DO loop immediately following the directive and not to any nested DO loops.
No iteration of the DO loop can interfere with any other iteration, unless the interference occurs within a CRITICAL construct. See the definition of interference outside a CRITICAL construct for more information.
The PARALLEL DO directive must not be followed by another PARALLEL DO directive. Only one PARALLEL DO directive may be specified for a given DO loop.
The PARALLEL DO directive must not appear with the INDEPENDENT directive for a given DO loop.
Note: The INDEPENDENT directive allows you to keep your code common with HPF implementations. The PARALLEL DO directive should be used for maximum portability across multiple vendors. The PARALLEL DO directive is a prescriptive directive while the INDEPENDENT directive is an assertion about the characteristics of the loop. See the INDEPENDENT directive for more information.
A variable should be specified with the PRIVATE attribute if its value is used during the calculation of a single iteration of a loop, and that value is not dependent on any other iteration of the loop. Copies of the PRIVATE variable exist, locally, on each thread. Each iteration of the loop receives its own uninitialized copy of the PRIVATE variable. A PRIVATE variable has an undefined value or association status on entry to, and exit from, the loop. All DO loop iteration variables within the dynamic extent of the PARALLEL DO directive are given the PRIVATE attribute by default.
Local variables without the SAVE or STATIC attributes in referenced subprograms in the dynamic extent of a PARALLEL DO directive have an implicit PRIVATE attribute. Common blocks and modules in referenced subprograms in the dynamic extent of a PARALLEL DO directive have an implicit SHARED attribute, unless they are THREADLOCAL common blocks.
If one of the entities involved in an asynchronous I/O operation is a PRIVATE variable or a subobject of a PRIVATE variable, the matching WAIT statement must be executed before the end of the iteration.
If there is a call to an MPI routine which does non-blocking communication in a parallel loop, no arguments to the MPI routine should be PRIVATE or LASTPRIVATE.
The LASTPRIVATE clause functions in a manner similar to the PRIVATE clause and should be specified for variables that match the same criteria. The exception is the status of the variable upon exit from the loop. The compiler determines the value of the variable at the final iteration, and takes a copy of that value. The copy of the value is then saved in the named variable for use after the loop. A LASTPRIVATE variable is undefined on entry into the loop. If the last iteration does not define a value then the LASTPRIVATE variable is undefined after the loop.
A variable which appears in the PRIVATE or LASTPRIVATE clause of an inner DO loop must also appear in the PRIVATE or LASTPRIVATE clause of all enclosing DO loops which have the PARALLEL DO directive specified, and of all enclosing PARALLEL SECTIONS constructs. This includes both the lexical extent and the dynamic extent of the PARALLEL DO directive and the PARALLEL SECTIONS construct.
The REDUCTION clause specifies named variables that appear in reduction operations. The compiler will maintain local copies of such variables, but will combine them at loop exit. The intermediate values of the REDUCTION variables are combined in random order, dependent on which threads finish their calculations first. There is, therefore, no guarantee that bit-identical results will be obtained from one parallel run to another, even if the parallel runs use the same number of threads and the same scheduling type and chunk size.
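The reason bit-identical results are not guaranteed is that floating-point addition is not associative, so combining partial results in a different order can change the low-order bits. The following serial sketch, with hypothetical names, mimics two different combination orders by summing the same values forward and backward; it is an illustration of the arithmetic effect, not of how the compiler combines partial results.

```fortran
! Sketch only: a serial illustration of order-dependent rounding.
program sum_order
  implicit none
  integer, parameter :: n = 100000
  integer :: i
  real :: x(n), forward, backward

  do i = 1, n
    x(i) = 1.0 / real(i)
  end do

  ! Accumulate from the largest term to the smallest.
  forward = 0.0
  do i = 1, n
    forward = forward + x(i)
  end do

  ! Accumulate the same terms in the opposite order.
  backward = 0.0
  do i = n, 1, -1
    backward = backward + x(i)
  end do

  print *, 'forward    =', forward
  print *, 'backward   =', backward
  print *, 'difference =', forward - backward
end program sum_order
```

In single precision the two sums will typically differ in their last digits. A parallel reduction whose combination order varies from run to run can show the same kind of variation.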
The SHARED clause specifies variables that must be available to all threads. If a variable is specified as SHARED, the user is stating that all iterations of the loop can safely share a single copy of the variable. You should specify a variable as SHARED when:
If neither condition is satisfied, a variable may be marked SHARED only if it is used within a CRITICAL construct (see CRITICAL / END CRITICAL) and the updating of, or reference to, the variable is not dependent on the order in which the iterations of the loop are executed. All variables, with the exception of loop-iteration variables, are SHARED by default.
If a SHARED variable, subobject of a SHARED variable, or an object associated with a SHARED variable or subobject of a SHARED variable appears as an actual argument in a reference to a non-intrinsic procedure:
unless the procedure reference appears in a CRITICAL construct.
While a DO loop is executed, a variable or subobject of a variable must not be referenced, become defined, become undefined, have its association status or allocation status changed, or appear as an actual argument:
The IF clause may appear at most once in a PARALLEL DO directive.
By default, a nested parallel loop is serialized, regardless of the setting of the IF clause. You can change this default by using the -qsmp=nested_par compiler option.
The SCHEDULE clause may appear at most once in a PARALLEL DO directive.
A variable name must not appear:
A variable in the PRIVATE clause must not:
Note that a variable or a subobject of a variable in the named_variable_list of the PRIVATE or LASTPRIVATE clause may have the POINTER attribute. Such a pointer has undefined association status on entry to the DO loop and undefined association status on exit from every iteration of the DO loop, except that it will retain its association status at the end of the last iteration if the variable appeared in the LASTPRIVATE clause. Also note that a variable name in the named_variable_list of the PRIVATE clause may be an allocatable array. It must not be allocated on initial entry to the DO loop and the user must allocate and deallocate the array in every iteration of the DO loop.
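The allocatable-array rule might be followed as in the sketch below. It assumes SMP$ is an active trigger_constant and the program is compiled with -qsmp; the program name and the array WORK are hypothetical. WORK is not allocated when the loop is entered, and every iteration allocates it, uses it, and deallocates it before finishing.

```fortran
! Sketch only: names are hypothetical.
program private_alloc
  implicit none
  integer, parameter :: n = 10
  integer :: i
  real :: a(n)
  real, allocatable :: work(:)

  ! WORK is deliberately left unallocated on entry to the loop.
!SMP$ PARALLEL DO PRIVATE(I, WORK)
  do i = 1, n
    allocate(work(i))      ! each iteration allocates its own copy
    work = 2.0
    a(i) = sum(work)
    deallocate(work)       ! and releases it before the iteration ends
  end do

  print *, a
end program private_alloc
```

Because no value of WORK is carried from one iteration to the next, the iterations remain independent of one another.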
A variable in the LASTPRIVATE clause must not:
If the last iteration of the DO loop does not define a LASTPRIVATE variable, the variable is undefined after the loop.
A variable in the REDUCTION clause must be of intrinsic type. A variable in the REDUCTION clause, or any element thereof, must not:
A variable which appears in the REDUCTION clause of an inner DO loop must also appear in the PRIVATE, LASTPRIVATE, or REDUCTION clause of all enclosing DO loops which have the PARALLEL DO directive specified, and of all enclosing PARALLEL SECTIONS constructs. This includes both the lexical extent and the dynamic extent of the PARALLEL DO directive and the PARALLEL SECTIONS construct. If the REDUCTION variable of an inner DO loop appears in the PRIVATE or LASTPRIVATE clause of an enclosing DO loop or PARALLEL SECTIONS construct, the variable must be initialized before the inner DO loop.
A REDUCTION variable must not appear in either a PRIVATE or LASTPRIVATE clause in the body of the following DO loop.
A variable that appeared in the REDUCTION clause of an INDEPENDENT directive of an enclosing DO loop must not also appear in the named_variable_list of the PRIVATE or LASTPRIVATE clause.
A variable in the SHARED clause must not:
Examples
Example 1: A valid example with the LASTPRIVATE clause.
!SMP$ PARALLEL DO PRIVATE(I), LASTPRIVATE (X)
      DO I = 1,10
        X = I * I
        A(I) = X * B(I)
      END DO
      PRINT *, X                ! X has the value 100
Example 2: A valid example with the REDUCTION clause.
!SMP$ PARALLEL DO PRIVATE(I), REDUCTION(MYSUM)
      DO I = 1, 10
        MYSUM = MYSUM + IARR(I)
      END DO
Example 3: A valid example where a variable marked SHARED is accessed by more than one thread but is used only in a CRITICAL construct.
!SMP$ PARALLEL DO SHARED (X)
      DO I = 1, 10
        A(I) = A(I) * I
!SMP$ CRITICAL
        X = X + A(I)
!SMP$ END CRITICAL
      END DO
Example 4: An invalid example because the variable A appears in both the PRIVATE and the SHARED clauses.
!SMP$ PARALLEL DO PRIVATE(A), SHARED(A)
      DO I = 1,1000
        A(I) = I ** I
      END DO
Example 5: An invalid example because the SCHEDULE clause appears more than once.
!SMP$ PARALLEL DO SCHEDULE(GUIDED), SCHEDULE(STATIC, 100)
      DO I = 1, 1000
        A(I) = B(I) ** I
      END DO
Example 6: An invalid example because the REDUCTION clause specifies the division arithmetic operator as the reduction_op.
!SMP$ PARALLEL DO REDUCTION(/ : X)
      DO I = 1, 1000
        X = X / I
      END DO
Related Information
PARALLEL SECTIONS / END PARALLEL SECTIONS
Purpose
The PARALLEL SECTIONS construct allows you to define independent blocks of code which the compiler can execute concurrently. The PARALLEL SECTIONS construct includes a PARALLEL SECTIONS directive followed by one or more blocks of code delimited by the SECTION directive, and ends with an END PARALLEL SECTIONS directive.
The PARALLEL SECTIONS, SECTION and END PARALLEL SECTIONS directives only take effect if the -qsmp compiler option is specified.
Format
                      +-+--+----------------------------------+
                      | +--+                                  |
                      V                                       |
>>-PARALLEL SECTIONS----+-----------------------------------+-+-><
                        +-+----+--parallel_sections_clause--+
                          +-,--+
                        +-+--+----------------+
                        | +--+                |
                        V                     |
>>-+---------+---block----+-----------------+-+----------------><
   +-SECTION-+            +-SECTION--block--+
>>-END PARALLEL SECTIONS---------------------------------------><
where parallel_sections_clause is:
>>-+-IF--(--scalar_logical_expr--)-----------------------+-----><
   +-PRIVATE--(--named_variable_list--)------------------+
   +-REDUCTION--(-+----------+---named_variable_list--)--+
   |              +-op_fnc :-+                           |
   +-SHARED--(--named_variable_list--)-------------------+
Rules
The PARALLEL SECTIONS construct includes, as stated in the syntax above, the delimiting directives and the blocks of code they enclose. The rules below also refer to sections. A section is defined as the block of code within the delimiting directives.
The SECTION directive marks the beginning of a block of code. At least one SECTION and its block of code must appear within the PARALLEL SECTIONS construct. Note, however, that the SECTION directive does not have to be specified for the first section. The end of a block is delimited by either another SECTION directive or by the END PARALLEL SECTIONS directive.
The PARALLEL SECTIONS construct is used to specify parallel execution of the identified sections of code. There is no assumption as to the order in which sections are executed. Each section must not interfere with any other section in the construct unless the interference occurs within a CRITICAL construct. See the definition of interference outside a CRITICAL construct for more information.
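A construct with two non-interfering sections could look like the following sketch. It assumes SMP$ is an active trigger_constant and the program is compiled with -qsmp; the program and variable names are hypothetical. Each section writes only to its own array, so the result does not depend on which section runs first or on whether the sections run concurrently.

```fortran
! Sketch only: names are hypothetical.
program two_sections
  implicit none
  integer, parameter :: n = 100
  integer :: i, j
  real :: a(n), b(n)

!SMP$ PARALLEL SECTIONS
!SMP$ SECTION
  ! First section: fills A and touches nothing else.
  do i = 1, n
    a(i) = real(i)
  end do
!SMP$ SECTION
  ! Second section: fills B and touches nothing else.
  do j = 1, n
    b(j) = real(j) * 2.0
  end do
!SMP$ END PARALLEL SECTIONS

  print *, a(n), b(n)
end program two_sections
```

The loop variables I and J are not listed in a PRIVATE clause here because, as the rules for this construct state, iteration variables within its dynamic extent are PRIVATE by default.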
It is illegal to branch into or out of any block of code defined by the PARALLEL SECTIONS construct.
Within a PARALLEL SECTIONS construct, variables not appearing in the PRIVATE clause are assumed to be SHARED by default.
A variable name must not appear:
While a PARALLEL SECTIONS construct is executing, a variable or subobject of a variable must not be referenced, become defined, become undefined, have its association status or allocation status changed, or appear as an actual argument:
The IF clause may appear at most once in a PARALLEL SECTIONS directive.
By default, a nested parallel loop is serialized, regardless of the setting of the IF clause. You can change this default by using the -qsmp=nested_par compiler option.
A variable should be specified with the PRIVATE attribute if it is referenced within multiple sections, defined before it is used within a section, and its value is not used after the section ends. Copies of the PRIVATE variable exist, locally, on each thread. Each section receives its own uninitialized copy of the PRIVATE variable. A PRIVATE variable has an undefined value or association status on entry to, and exit from, the PARALLEL SECTIONS construct. All iteration variables within the dynamic extent of the PARALLEL SECTIONS construct are given the PRIVATE attribute by default.
Local variables without the SAVE or STATIC attributes in referenced subprograms in the dynamic extent of a PARALLEL SECTIONS construct have an implicit PRIVATE attribute. Common blocks and modules in referenced subprograms in the dynamic extent of a PARALLEL SECTIONS construct have an implicit SHARED attribute, unless they are THREADLOCAL common blocks.
If there is a call to an MPI routine which does non-blocking communication in a PARALLEL SECTIONS construct, no arguments to the MPI routine should be PRIVATE.
If one of the entities involved in an asynchronous I/O operation is a PRIVATE variable or a subobject of a PRIVATE variable, the matching WAIT statement must be executed before the end of the section.
A variable in the PRIVATE clause must not:
Note that a variable in the named_variable_list of the PRIVATE clause may have the POINTER attribute. Such a pointer has undefined association status on entry to the PARALLEL SECTIONS construct and undefined association status on exit from every section of the PARALLEL SECTIONS construct. Also note that a variable name in the named_variable_list of the PRIVATE clause may be an allocatable array. It must not be allocated on initial entry to the PARALLEL SECTIONS construct and the user must allocate and deallocate the array in every section.
A variable which appears in the PRIVATE clause of an inner PARALLEL SECTIONS construct must also appear in the PRIVATE or LASTPRIVATE clause of all enclosing DO loops which have the PARALLEL DO directive and all enclosing PARALLEL SECTIONS constructs. This includes both the lexical extent and the dynamic extent of the PARALLEL DO directive and the PARALLEL SECTIONS construct.
In a PARALLEL SECTIONS construct, a variable which appeared in the REDUCTION clause of an INDEPENDENT directive or the PARALLEL DO directive of an enclosing DO loop must not also appear in the named_variable_list of the PRIVATE clause.
The REDUCTION clause specifies named variables that appear in reduction operations. The compiler will maintain local copies of such variables, but will combine them upon exit from the construct. The intermediate values of the REDUCTION variables are combined in an unpredictable order that depends on which threads finish their calculations first. There is, therefore, no guarantee that bit-identical results will be obtained from one parallel run to another, even if the parallel runs use the same number of threads and the same scheduling type and chunk size.
A variable in the REDUCTION clause must be of intrinsic type. A variable in the REDUCTION clause, or any element thereof, must not:
A variable which appears in the REDUCTION clause of an inner PARALLEL SECTIONS construct must also appear in the PRIVATE, LASTPRIVATE, or REDUCTION clause of all enclosing DO loops which have the PARALLEL DO directive and all enclosing PARALLEL SECTIONS constructs. This includes both the lexical extent and the dynamic extent of the PARALLEL DO directive and the PARALLEL SECTIONS construct. If the REDUCTION variable of the inner PARALLEL SECTIONS construct appears in the PRIVATE clause of an enclosing DO loop or PARALLEL SECTIONS construct, the variable must be initialized before the inner PARALLEL SECTIONS construct.
A REDUCTION variable must not appear in a PRIVATE or LASTPRIVATE clause in the body of the PARALLEL SECTIONS construct.
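The effect of combining partial results in a different order can be seen without any directives. The following sketch adds the same three REAL values first to last and then last to first. The function names SUM_FORWARD and SUM_BACKWARD are hypothetical and are not part of XL Fortran. Assuming that default REAL is IEEE single precision and that intermediate results are not held in higher precision, the two totals differ, which is the same effect that can make REDUCTION results vary from one parallel run to another.

```fortran
      ! Hypothetical helper: adds X(1), X(2), ..., X(N) in that order.
      REAL FUNCTION SUM_FORWARD(X, N)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: N
      REAL, INTENT(IN) :: X(N)
      INTEGER :: I
      SUM_FORWARD = 0.0
      DO I = 1, N
        SUM_FORWARD = SUM_FORWARD + X(I)
      END DO
      END FUNCTION SUM_FORWARD

      ! Hypothetical helper: adds X(N), X(N-1), ..., X(1) in that order.
      REAL FUNCTION SUM_BACKWARD(X, N)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: N
      REAL, INTENT(IN) :: X(N)
      INTEGER :: I
      SUM_BACKWARD = 0.0
      DO I = N, 1, -1
        SUM_BACKWARD = SUM_BACKWARD + X(I)
      END DO
      END FUNCTION SUM_BACKWARD

      PROGRAM SUM_ORDER_DEMO
      IMPLICIT NONE
      REAL :: SUM_FORWARD, SUM_BACKWARD
      REAL :: X(3)

      ! The small term is lost when it is added to -1.0E8 first.
      X = (/ 1.0E8, -1.0E8, 1.0 /)
      PRINT *, 'first to last: ', SUM_FORWARD(X, 3)
      PRINT *, 'last to first: ', SUM_BACKWARD(X, 3)
      END PROGRAM SUM_ORDER_DEMO
```

If bit-identical results matter more than speed, one option is to perform the final accumulation outside the parallel construct, where the order of the additions is fixed.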
The SHARED clause specifies variables that must be available to all threads. If a variable is specified as SHARED, the user is stating that all sections can safely share a single copy of the variable. You should specify a variable as SHARED when:
If neither condition is satisfied, a variable may be marked SHARED only if it is used within a CRITICAL construct (see CRITICAL / END CRITICAL) and the updating of, or reference to, the variable does not depend on the order in which the sections are executed. All variables, with the exception of loop-iteration variables, are SHARED by default.
If a SHARED variable, subobject of a SHARED variable, or an object associated with a SHARED variable or subobject of a SHARED variable appears as an actual argument in a reference to a non-intrinsic procedure:
unless the procedure reference appears in a CRITICAL construct.
A variable in the SHARED clause must not:
The PARALLEL SECTIONS construct must not appear within a CRITICAL construct.
Examples
Example 1: In this example, note that a section of code need not contain a DO loop.
!SMP$ PARALLEL SECTIONS
!SMP$ SECTION
      DO I = 1, 10
        C(I) = MAX(A(I), A(I+1))
      END DO
!SMP$ SECTION
      W = U + V
      Z = X + Y
!SMP$ END PARALLEL SECTIONS
Example 2: In this example the index variable I is declared as PRIVATE. Note also that the first optional SECTION directive has been omitted.
!SMP$ PARALLEL SECTIONS PRIVATE(I)
      DO I = 1, 100
        A(I) = A(I) * I
      END DO
!SMP$ SECTION
      CALL NORMALIZE (B)
      DO I = 1, 100
        B(I) = B(I) + 1.0
      END DO
!SMP$ SECTION
      DO I = 1, 100
        C(I) = C(I) * C(I)
      END DO
!SMP$ END PARALLEL SECTIONS
Example 3: This example is invalid because there is a data dependency for the variable C across sections.
!SMP$ PARALLEL SECTIONS
!SMP$ SECTION
      DO I = 1, 10
        C(I) = C(I) * I
      END DO
!SMP$ SECTION
      DO K = 1, 10
        D(K) = C(K) + K
      END DO
!SMP$ END PARALLEL SECTIONS
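One possible way to remove the dependency in Example 3 is to keep the loop that defines C and the loop that reads C in the same section, so that they execute in order on one thread, and to place only unrelated work in the other section. The sketch below illustrates this under the rules stated above. The names UPDATE_CD and SECTIONS_DEMO, and the extra assignment to W, are hypothetical and chosen only for the illustration. When the SMP$ trigger constant is not active, the directives are treated as comments and the code runs serially with the same result.

```fortran
      ! Hypothetical restructuring of the loops from Example 3.
      SUBROUTINE UPDATE_CD(C, D, U, V, W)
      IMPLICIT NONE
      REAL, INTENT(INOUT) :: C(10)
      REAL, INTENT(OUT) :: D(10), W
      REAL, INTENT(IN) :: U, V
      INTEGER :: I, K
!SMP$ PARALLEL SECTIONS
!SMP$ SECTION
      ! The loop that defines C and the loop that reads C share
      ! one section, so no other section depends on their order.
      DO I = 1, 10
        C(I) = C(I) * I
      END DO
      DO K = 1, 10
        D(K) = C(K) + K
      END DO
!SMP$ SECTION
      ! Independent work: references neither C nor D.
      W = U + V
!SMP$ END PARALLEL SECTIONS
      END SUBROUTINE UPDATE_CD

      PROGRAM SECTIONS_DEMO
      IMPLICIT NONE
      REAL :: C(10), D(10), W
      C = 1.0
      CALL UPDATE_CD(C, D, 2.0, 3.0, W)
      PRINT *, D(10), W
      END PROGRAM SECTIONS_DEMO
```

The cost of this arrangement is that the two dependent loops no longer overlap with each other, so it only pays off when the remaining sections carry enough independent work.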
Related Information
Purpose
The PERMUTATION directive specifies that the elements of each array listed in the integer_array_name_list have no repeated values. This directive is useful when array elements are used as subscripts for other array references.
The PERMUTATION directive only takes effect if either the -qsmp or -qhot compiler option is specified.
Format
>>-PERMUTATION--(--integer_array_name_list--)------------------><
Rules
The first noncomment line (not including other directives) following the PERMUTATION directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The PERMUTATION directive applies only to the DO loop immediately following the directive and not to any nested DO loops.
Examples
      PROGRAM EX3
        INTEGER A(100), B(100)
!SMP$   PERMUTATION (A)
        DO I = 1, 100
          A(I) = I
          B(A(I)) = B(A(I)) + A(I)
        END DO
      END PROGRAM EX3
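Because the directive is an assertion by the programmer, it can be worth confirming during testing that an index array really has no repeated values before relying on PERMUTATION. The following is a minimal sketch of such a check. NO_REPEATS and CHECK_DEMO are hypothetical names, not XL Fortran routines, and the simple pairwise comparison is intended only for small arrays or debug builds.

```fortran
      ! Hypothetical helper, not an XL Fortran routine:
      ! .TRUE. when no value occurs more than once in IDX(1:N).
      LOGICAL FUNCTION NO_REPEATS(IDX, N)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: N
      INTEGER, INTENT(IN) :: IDX(N)
      INTEGER :: I, J
      NO_REPEATS = .TRUE.
      ! Compare every pair of elements once.
      DO I = 1, N - 1
        DO J = I + 1, N
          IF (IDX(I) == IDX(J)) NO_REPEATS = .FALSE.
        END DO
      END DO
      END FUNCTION NO_REPEATS

      PROGRAM CHECK_DEMO
      IMPLICIT NONE
      LOGICAL :: NO_REPEATS
      INTEGER :: A(5), B(5)

      A = (/ 3, 1, 5, 2, 4 /)
      B = (/ 3, 1, 3, 2, 4 /)
      PRINT *, 'A has no repeated values: ', NO_REPEATS(A, 5)
      PRINT *, 'B has no repeated values: ', NO_REPEATS(B, 5)
      END PROGRAM CHECK_DEMO
```

Here A would be a safe argument for the directive, while B would not, because the value 3 appears twice and two iterations could update the same element.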
Related Information
Purpose
You can specify compiler options to affect an individual compilation unit by putting the @PROCESS compiler directive in the source file. It can override options specified in the configuration file, in the default settings, or on the command line.
Format
             +-+---+-----------------------------+
             | +-,-+                             |
             V                                   |
>>-@PROCESS----option--+-----------------------+-+-------------><
                       +-(--suboption_list--)--+
Rules
In fixed source form, @PROCESS can start in column 1 or after column 6. In free source form, the @PROCESS compiler directive can start in any column.
You cannot place a statement label or inline comment on the same line as an @PROCESS compiler directive.
By default, option settings you designate with the @PROCESS compiler directive are effective only for the compilation unit in which the statement appears. If the file has more than one compilation unit, the option setting is reset to its original state before the next unit is compiled. Trigger constants specified by the DIRECTIVE option are in effect until the end of the file (or until NODIRECTIVE is processed).
The @PROCESS compiler directive must usually appear before the first statement of a compilation unit. The only exceptions are when specifying SOURCE and NOSOURCE; you can put them in @PROCESS directives anywhere in the compilation unit.
Related Information
See the User's Guide for details on compiler options.
Purpose
The SCHEDULE directive allows the user to specify the chunking method for parallelization. Work is assigned to threads in a different manner depending on the scheduling type or chunk size used.
The SCHEDULE directive only takes effect if the -qsmp compiler option is specified.
Format
>>-SCHEDULE--(--sched_type--+-------+--)-----------------------><
                            +-,--n--+
With the AFFINITY scheduling type, the iterations of the loop are initially divided into partitions, each containing CEILING(number_of_iterations / number_of_threads) iterations. Each partition is initially assigned to a thread, and is then further subdivided into chunks containing n iterations, if n has been specified. If n has not been specified, then the chunks consist of CEILING(number_of_iterations_remaining_in_partition / 2) loop iterations.
When a thread becomes free, it takes the next chunk from its initially assigned partition. If there are no more chunks in that partition, then the thread takes the next available chunk from a partition initially assigned to another thread.
The work in a partition initially assigned to a sleeping thread will be completed by threads which are active.
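With the DYNAMIC scheduling type, the iterations of the loop are divided into chunks of n iterations each (see Example 3). If n has not been specified, each chunk contains CEILING(number_of_iterations / number_of_threads) iterations.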
Threads are assigned these chunks on a "first-come, first-do" basis. Chunks of the remaining work are assigned to available threads, until all work has been assigned.
If a thread is asleep, its assigned work will be taken over by an active thread, once that thread becomes available.
With the GUIDED scheduling type, the first chunk contains CEILING(number_of_iterations / number_of_threads) iterations. Subsequent chunks consist of CEILING(number_of_iterations_remaining / number_of_threads) iterations. Available threads are assigned chunks on a "first-come, first-do" basis. Chunks of the remaining work are assigned to available threads, until all work has been assigned.
If a thread is asleep, its assigned work will be taken over by an active thread, once that thread becomes available.
At run time, the scheduling type can be specified using the environment variable XLSMPOPTS. If no scheduling type is specified using that variable, then the default scheduling type used is STATIC.
With the STATIC scheduling type, if n has not been specified, the chunks will contain CEILING(number_of_iterations / number_of_threads) iterations. Each thread is assigned one of these chunks. This is known as block scheduling.
If a thread is asleep and it has been assigned work, it will be awakened so that it may complete its work.
STATIC is the default scheduling type if the user has not specified any scheduling type at compile-time or run time.
Rules
The SCHEDULE directive must appear in the specification part of a scoping unit.
Only one SCHEDULE directive may appear in the specification part of a scoping unit.
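As an illustration of these placement rules, the sketch below puts a single SCHEDULE directive among the declarations of a subroutine. The names SCALE_ARRAY and SCHEDULE_DEMO, the choice of the SMP$ trigger constant, and the chunk size of 100 are assumptions made for this example. Whether the loop is actually scheduled this way depends on the -qsmp option being specified and on the loop being one that the directive applies to. When the directive is not recognized, it is treated as a comment and the code runs serially.

```fortran
      ! Hypothetical example routine; the directive line is a comment
      ! unless the SMP$ trigger constant is active.
      SUBROUTINE SCALE_ARRAY(A, N)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: N
      REAL, INTENT(INOUT) :: A(N)
      ! The only SCHEDULE directive in this scoping unit, placed
      ! within the specification part.
!SMP$ SCHEDULE(DYNAMIC, 100)
      INTEGER :: I
      DO I = 1, N
        A(I) = 2.0 * A(I)
      END DO
      END SUBROUTINE SCALE_ARRAY

      PROGRAM SCHEDULE_DEMO
      IMPLICIT NONE
      REAL :: A(1000)
      A = 1.0
      ! 1000 iterations with a chunk size of 100 gives 10 chunks.
      CALL SCALE_ARRAY(A, 1000)
      PRINT *, A(1), A(1000)
      END PROGRAM SCHEDULE_DEMO
```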
The SCHEDULE directive applies to
Any dummy arguments appearing or referenced in the specification expression for the chunk size n must also appear in the SUBROUTINE or FUNCTION statement and in all ENTRY statements appearing in the given subprogram.
If the specified chunk size n is greater than the number of iterations, the loop will not be parallelized and will execute on a single thread.
If you specify more than one method of determining the chunking algorithm, the compiler will follow, in order of precedence:
Examples
Example 1. Given the following information:
   number of iterations = 1000
   number of threads = 4

and using the GUIDED scheduling type, the chunk sizes would be as follows:

   250 188 141 106 79 59 45 33 25 19 14 11 8 6 4 3 3 2 1 1 1 1

The iterations would then be divided into the following chunks:

   chunk 1  = iterations    1 to  250
   chunk 2  = iterations  251 to  438
   chunk 3  = iterations  439 to  579
   chunk 4  = iterations  580 to  685
   chunk 5  = iterations  686 to  764
   chunk 6  = iterations  765 to  823
   chunk 7  = iterations  824 to  868
   chunk 8  = iterations  869 to  901
   chunk 9  = iterations  902 to  926
   chunk 10 = iterations  927 to  945
   chunk 11 = iterations  946 to  959
   chunk 12 = iterations  960 to  970
   chunk 13 = iterations  971 to  978
   chunk 14 = iterations  979 to  984
   chunk 15 = iterations  985 to  988
   chunk 16 = iterations  989 to  991
   chunk 17 = iterations  992 to  994
   chunk 18 = iterations  995 to  996
   chunk 19 = iterations  997 to  997
   chunk 20 = iterations  998 to  998
   chunk 21 = iterations  999 to  999
   chunk 22 = iterations 1000 to 1000

A possible scenario for the division of work could be:

   thread 1 executes chunks 1 5 10 13 18 20
   thread 2 executes chunks 2 7 9 14 16 22
   thread 3 executes chunks 3 6 12 15 19
   thread 4 executes chunks 4 8 11 17 21
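The chunk sizes in Example 1 follow from applying CEILING(number_of_iterations_remaining / number_of_threads) repeatedly until no iterations remain. The sketch below, which assumes that no chunk size n was specified, recomputes the chunk boundaries for 1000 iterations and 4 threads. GUIDED_CHUNK and GUIDED_DEMO are hypothetical names used only for this illustration. It models the arithmetic only, not the run-time assignment of chunks to threads.

```fortran
      ! Hypothetical helper, not an XL Fortran routine:
      ! CEILING(REMAINING / NTHREADS) using integer arithmetic.
      INTEGER FUNCTION GUIDED_CHUNK(REMAINING, NTHREADS)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: REMAINING, NTHREADS
      GUIDED_CHUNK = (REMAINING + NTHREADS - 1) / NTHREADS
      END FUNCTION GUIDED_CHUNK

      PROGRAM GUIDED_DEMO
      IMPLICIT NONE
      INTEGER, PARAMETER :: NITER = 1000, NTHREADS = 4
      INTEGER :: GUIDED_CHUNK
      INTEGER :: REMAINING, CHUNK, FIRST, NCHUNK

      REMAINING = NITER
      FIRST = 1
      NCHUNK = 0
      ! Each chunk takes 1/NTHREADS of what is left, rounded up.
      DO WHILE (REMAINING > 0)
        CHUNK = GUIDED_CHUNK(REMAINING, NTHREADS)
        NCHUNK = NCHUNK + 1
        PRINT 10, NCHUNK, FIRST, FIRST + CHUNK - 1
        FIRST = FIRST + CHUNK
        REMAINING = REMAINING - CHUNK
      END DO
 10   FORMAT('chunk ', I2, ' = iterations ', I4, ' to ', I4)
      END PROGRAM GUIDED_DEMO
```

Changing NITER or NTHREADS shows how the early chunks stay large while the final chunks shrink to a single iteration.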
Example 2. Given the following information:
   number of iterations = 100
   number of threads = 4

and using the AFFINITY scheduling type, the iterations would be divided into the following partitions:

   partition 1 = iterations  1 to  25
   partition 2 = iterations 26 to  50
   partition 3 = iterations 51 to  75
   partition 4 = iterations 76 to 100

The partitions would be divided into the following chunks:

   chunk 1a = iterations   1 to  13
   chunk 1b = iterations  14 to  19
   chunk 1c = iterations  20 to  22
   chunk 1d = iterations  23 to  24
   chunk 1e = iterations  25 to  25

   chunk 2a = iterations  26 to  38
   chunk 2b = iterations  39 to  44
   chunk 2c = iterations  45 to  47
   chunk 2d = iterations  48 to  49
   chunk 2e = iterations  50 to  50

   chunk 3a = iterations  51 to  63
   chunk 3b = iterations  64 to  69
   chunk 3c = iterations  70 to  72
   chunk 3d = iterations  73 to  74
   chunk 3e = iterations  75 to  75

   chunk 4a = iterations  76 to  88
   chunk 4b = iterations  89 to  94
   chunk 4c = iterations  95 to  97
   chunk 4d = iterations  98 to  99
   chunk 4e = iterations 100 to 100

A possible scenario for the division of work could be:

   thread 1 executes chunks 1a 1b 1c 1d 1e 4d
   thread 2 executes chunks 2a 2b 2c 2d
   thread 3 executes chunks 3a 3b 3c 3d 3e 2e
   thread 4 executes chunks 4a 4b 4c 4e

Note that in this scenario, thread 1 finished executing all the chunks in its partition and then grabbed an available chunk from the partition of thread 4. Similarly, thread 3 finished executing all the chunks in its partition and then grabbed an available chunk from the partition of thread 2.
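The partition and chunk boundaries in Example 2 can be reproduced with the two CEILING expressions given for AFFINITY scheduling. The sketch below assumes that no chunk size n was specified. CEIL_DIV and AFFINITY_DEMO are hypothetical names, and the clipping of a short or empty final partition is an assumption, since this reference does not state how an uneven division is handled. Only the arithmetic is modeled, not the run-time stealing of chunks between threads.

```fortran
      ! Hypothetical helper, not an XL Fortran routine:
      ! CEILING(A / B) for positive integers.
      INTEGER FUNCTION CEIL_DIV(A, B)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: A, B
      CEIL_DIV = (A + B - 1) / B
      END FUNCTION CEIL_DIV

      PROGRAM AFFINITY_DEMO
      IMPLICIT NONE
      INTEGER, PARAMETER :: NITER = 100, NTHREADS = 4
      INTEGER :: CEIL_DIV
      INTEGER :: PSIZE, P, FIRST, LAST, REMAINING, CHUNK

      ! One partition per thread.
      PSIZE = CEIL_DIV(NITER, NTHREADS)
      DO P = 1, NTHREADS
        FIRST = (P - 1) * PSIZE + 1
        ! Assumption: a short or empty last partition is clipped.
        LAST = MIN(P * PSIZE, NITER)
        IF (FIRST > LAST) EXIT
        PRINT 10, P, FIRST, LAST
        REMAINING = LAST - FIRST + 1
        DO WHILE (REMAINING > 0)
          ! No n given: take half of what is left, rounded up.
          CHUNK = CEIL_DIV(REMAINING, 2)
          PRINT 20, FIRST, FIRST + CHUNK - 1
          FIRST = FIRST + CHUNK
          REMAINING = REMAINING - CHUNK
        END DO
      END DO
 10   FORMAT('partition ', I1, ' = iterations ', I3, ' to ', I3)
 20   FORMAT('  chunk = iterations ', I3, ' to ', I3)
      END PROGRAM AFFINITY_DEMO
```

For a partition of 25 iterations the successive chunk sizes are 13, 6, 3, 2 and 1, which matches chunks 1a through 1e in Example 2.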
Example 3. Given the following information:
   number of iterations = 1000
   number of threads = 4

and using the DYNAMIC scheduling type and chunk size of 100, the chunk sizes would be as follows:

   100 100 100 100 100 100 100 100 100 100

The iterations would be divided into the following chunks:

   chunk 1  = iterations   1 to  100
   chunk 2  = iterations 101 to  200
   chunk 3  = iterations 201 to  300
   chunk 4  = iterations 301 to  400
   chunk 5  = iterations 401 to  500
   chunk 6  = iterations 501 to  600
   chunk 7  = iterations 601 to  700
   chunk 8  = iterations 701 to  800
   chunk 9  = iterations 801 to  900
   chunk 10 = iterations 901 to 1000

A possible scenario for the division of work could be:

   thread 1 executes chunks 1 5 9
   thread 2 executes chunks 2 8
   thread 3 executes chunks 3 6 10
   thread 4 executes chunks 4 7
Example 4. Given the following information:
   number of iterations = 100
   number of threads = 4

and using the STATIC scheduling type, the iterations would be divided into the following chunks:

   chunk 1 = iterations  1 to  25
   chunk 2 = iterations 26 to  50
   chunk 3 = iterations 51 to  75
   chunk 4 = iterations 76 to 100

A possible scenario for the division of work could be:

   thread 1 executes chunks 1
   thread 2 executes chunks 2
   thread 3 executes chunks 3
   thread 4 executes chunks 4
Related Information
Purpose
The SOURCEFORM compiler directive indicates that all subsequent lines are to be processed in the specified source form until the end of the file is reached or until an @PROCESS directive or another SOURCEFORM directive specifies a different source form.
Format
>>-SOURCEFORM--(--source--)------------------------------------><
Rules
The SOURCEFORM directive can appear anywhere within a file. An include file is compiled with the source form of the including file. If the SOURCEFORM directive appears in an include file, the source form reverts to that of the including file once processing of the include file is complete.
The SOURCEFORM directive cannot specify a label.
Tip: To modify your existing files to Fortran 90 free source form where include files exist:
Examples
@PROCESS DIRECTIVE(CONVERT*)
      PROGRAM MAIN            ! Main program not yet converted
      A=1; B=2
      INCLUDE 'freeform.f'
      PRINT *, RESULT         ! Reverts to fixed form
      END
where file freeform.f contains:
!CONVERT* SOURCEFORM(FREE(F90))
RESULT = A + B
Purpose
The THREADLOCAL directive is used to declare thread-specific common data. It is a possible method of ensuring that access to data contained within COMMON blocks is serialized.
You do not need to specify the -qsmp compiler option to use this directive, but you must use the xlf_r or xlf90_r invocation command so that the necessary libraries are linked.
Format
                         +-,------------------------+
                         V                          |
>>-THREADLOCAL--+-----+----/--common_block_name--/--+----------><
                +-::--+
Rules
Only named common blocks may be declared as THREADLOCAL. All rules and constraints that normally apply to named common blocks apply to common blocks declared as THREADLOCAL. See COMMON for more information on the rules and constraints that apply to named common blocks.
The THREADLOCAL directive must appear in the specification_part of the scoping unit. If a common block appears in a THREADLOCAL directive, it must also be declared within a COMMON statement in the same scoping unit. The THREADLOCAL directive may occur before or after the COMMON statement. See "Main Program" for more information on the specification_part of the scoping unit.
A common block cannot be given the THREADLOCAL attribute if it is declared within a PURE subprogram.
Members of a THREADLOCAL common block must not appear in NAMELIST statements.
A common block which is use-associated must not be declared as THREADLOCAL in the scoping unit that contains the USE statement.
Any pointers declared in a THREADLOCAL common block are not affected by the -qinit=f90ptr compiler option.
Objects within THREADLOCAL common blocks may be used in parallel loops and parallel sections. However, these objects are implicitly shared across the iterations of the loop, and across code blocks within parallel sections. In other words, within a scoping unit, all accessible common blocks, whether declared as THREADLOCAL or not, have the SHARED attribute within parallel loops and sections in that scoping unit.
If a common block is declared as THREADLOCAL within a scoping unit, any subprogram that declares or references the common block, and that is directly or indirectly referenced by the scoping unit, must be executed by the same thread executing the scoping unit. If two procedures that declare common blocks are executed by different threads, then they would obtain different copies of the common block, provided that the common block had been declared THREADLOCAL. Threads can be created in one of the following ways:
If a common block is declared to be THREADLOCAL in one scoping unit, it must be declared to be THREADLOCAL in every scoping unit that declares the common block.
If a THREADLOCAL common block, that does not have the SAVE attribute, is declared within a subprogram, the members of the block become undefined at subprogram RETURN or END unless there is at least one other scoping unit in which the common block is accessible that is making a direct or indirect reference to the subprogram.
Examples
Example 1: The following procedure "FORT_SUB" is invoked by two threads:
      SUBROUTINE FORT_SUB(IARG)
        INTEGER IARG

        CALL LIBRARY_ROUTINE1()
        CALL LIBRARY_ROUTINE2()
        ...
      END SUBROUTINE FORT_SUB

      SUBROUTINE LIBRARY_ROUTINE1()
        COMMON /BLOCK/ R               ! The SAVE attribute is required for the common
        SAVE /BLOCK/                   ! block because the program requires that the block
!IBM*   THREADLOCAL /BLOCK/            ! remain defined after library_routine1 is invoked.
        R = 1.0
        ...
      END SUBROUTINE LIBRARY_ROUTINE1

      SUBROUTINE LIBRARY_ROUTINE2()
        COMMON /BLOCK/ R
        SAVE /BLOCK/
!IBM*   THREADLOCAL /BLOCK/
        ... = R
        ...
      END SUBROUTINE LIBRARY_ROUTINE2
Example 2: "FORT_SUB" is invoked by multiple threads. This is an invalid example because "FORT_SUB" and "ANOTHER_SUB" both declare /BLOCK/ to be THREADLOCAL. They intend to share the common block, but they are executed by different threads.
      SUBROUTINE FORT_SUB()
        COMMON /BLOCK/ J
        INTEGER :: J
!IBM*   THREADLOCAL /BLOCK/            ! Each thread executing FORT_SUB
                                       ! obtains its own copy of /BLOCK/
        INTEGER A(10)

        ...
!IBM*   INDEPENDENT
        DO INDEX = 1,10
          CALL ANOTHER_SUB(A(INDEX))
        END DO
        ...

      END SUBROUTINE FORT_SUB

      SUBROUTINE ANOTHER_SUB(AA)       ! Multiple threads are used to execute ANOTHER_SUB
        INTEGER AA
        COMMON /BLOCK/ J               ! Each thread obtains a new copy of the
        INTEGER :: J                   ! common block /BLOCK/
!IBM*   THREADLOCAL /BLOCK/
        ...
        AA = J                         ! The value of 'J' is undefined.
      END SUBROUTINE ANOTHER_SUB
Related Information