The PSyclone Intermediate Representation (PSyIR)

The PSyclone Intermediate Representation (PSyIR) is a language-independent Intermediate Representation that PSyclone uses to represent the PSy (Parallel System) and the Kernel (serial units of work) layers of an application. It can be constructed from scratch or produced from existing code using one of the PSyIR front-ends. Its design is optimised to represent high-performance parallel applications in an expressive, mutable and extensible way:

It is expressive because it is intended to be created and/or manipulated by HPC software experts directly when optimizing or porting the code.

It is mutable because it is intended to be programmatically manipulated (usually through PSyclone scripts) and to maintain a coherent state (with valid relationships, links to symbols, links to dependencies) after each manipulation.

It is extensible because it is intended to be used as the core component of domain-specific systems which include additional abstract concepts or logic not captured in the generic representation.

To achieve these design goals we use a Normalised Heterogeneous AST representation together with a Type System and a Symbol Table. By heterogeneous we mean that we distinguish between AST nodes using the Python class inheritance system and that each node has its own (semantically relevant) navigation and behaviour methods. For instance, the Assignment node has lhs and rhs properties to navigate to the left-hand-side and right-hand-side operands of the Assignment. It also means we can identify a node by its type with isinstance(node, Assignment). Nevertheless, we maintain a normalised core of node relationships and functionality that allows us to build tree walkers, tree visitors and dependency analysis tools without the need to consider the implementation details of each individual sub-class.
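The combination of node-specific navigation and a normalised core can be illustrated with a minimal, self-contained sketch. It does not import PSyclone: the Node, Reference, Assignment and Schedule classes below are simplified stand-ins written for this illustration, and the walk method shown here is an assumption about how a generic tree walker could look, not the library's implementation.

```python
class Node:
    """Stand-in base class: the normalised core is just a list of children."""

    def __init__(self, children=None):
        self.children = list(children or [])

    def walk(self, node_type):
        """Return this node and every descendant that is an instance of
        node_type, in depth-first order."""
        found = [self] if isinstance(self, node_type) else []
        for child in self.children:
            found.extend(child.walk(node_type))
        return found


class Reference(Node):
    """Stand-in leaf node that only stores a variable name."""

    def __init__(self, name):
        super().__init__()
        self.name = name


class Assignment(Node):
    """Stand-in node with semantically relevant navigation properties."""

    @property
    def lhs(self):
        return self.children[0]

    @property
    def rhs(self):
        return self.children[1]


class Schedule(Node):
    """Stand-in for a sequence of statements."""


assignment = Assignment([Reference("a"), Reference("b")])
schedule = Schedule([assignment])

# The generic walker only relies on 'children', never on sub-class details.
reference_names = [ref.name for ref in schedule.walk(Reference)]
# Node types are identified with isinstance.
assignments = [n for n in schedule.walk(Node) if isinstance(n, Assignment)]
```

The walker never needs to know that an Assignment has lhs and rhs, while code that does care about assignments can use those properties directly.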

The common functionality that all nodes must have is defined in the PSyIR base class Node. See the list of all PSyIR common methods in the Node reference guide.

More information about the type system and symbols and how PSyIR can be transformed back to a particular language using the back-ends (Writers) is provided in the following sections of this guide.

How to create new PSyIR Nodes

In order to create a new PSyIR node, either for adding a new core PSyIR node or to extend the functionality in one of the PSyclone APIs, it is mandatory to perform the following steps:

The new node must inherit from psyclone.psyir.nodes.Node or one of its sub-classes. Note that Node won’t be accepted as a child anywhere in the tree. It may be appropriate to specialise one of the existing subclasses of Node, rather than Node itself. A good starting point would be to consider psyclone.psyir.nodes.Statement (which will be accepted inside any Schedule) or psyclone.psyir.nodes.DataNode (which will be accepted anywhere that the node can be evaluated to a data element).
Set the _text_name and the _colour class string attributes. These attributes will provide standard behaviour for the __str__ and view() methods.
Set the _children_valid_format class string attribute and specialise the static method _validate_child(position, child). These define, textually and logically, what types of nodes will be accepted as children of the new node:
- _children_valid_format is the textual representation that will be used in error messages. It is expressed using tokens with the same name as the PSyIR classes and the following symbols:
  |: or operator.
  ,: concatenation operator.
  [ expression ]*: zero or more instances of the expression.
  [ expression ]+: one or more instances of the expression.
  <LeafNode>: NULL operand (no children accepted).
  For instance, an expression that accepts a statement as a first child and one or more DataNodes after it would be: Statement [, DataNode]+.
- _validate_child(position, child) returns a boolean which indicates whether the given child is a valid component for the given position.

Note

Note that the valid children of a node are described two times, once in _children_valid_format and another in _validate_child, and it is up to the developer to keep them coherent in order to provide sensible error messages. Alternatively we could create an implementation where the textual representation is parsed and the validation method is generated automatically, hence avoiding the duplication. Issue #765 explores this alternative.

If any of the attributes introduced by this node should not be shallow-copied when creating a duplicate of this PSyIR branch, specialise the _refine_copy method to perform the appropriate copy actions.
If any of the attributes in this node should be used to compute the equality of two nodes, specialise the __eq__ member to perform the appropriate checks. The default __eq__ behaviour is to check both instance types are exactly the same, and each of their children also pass the equality check. The only restriction on this implementation is that it must call the super().__eq__(other) as part of its implementation, to ensure any inherited equality checks are correctly checked. The default behaviour ignores annotations and comment attributes, as they should not affect the semantics of the PSyIR tree.
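The equality rule in the last step can be sketched without PSyclone. The Node and NamedNode classes below are hypothetical stand-ins: Node reproduces the default behaviour described above (same exact type, pairwise-equal children) and NamedNode adds one attribute to the comparison while still calling super().__eq__(other).

```python
class Node:
    """Stand-in base class with the default equality described above."""

    def __init__(self, children=None):
        self.children = list(children or [])

    def __eq__(self, other):
        # Exactly the same type and pairwise-equal children.
        return (type(self) is type(other)
                and len(self.children) == len(other.children)
                and all(mine == theirs for mine, theirs
                        in zip(self.children, other.children)))


class NamedNode(Node):
    """Hypothetical node whose 'name' attribute takes part in equality."""

    def __init__(self, name, children=None):
        super().__init__(children)
        self.name = name

    def __eq__(self, other):
        # Run the inherited checks first, then compare the new attribute.
        return super().__eq__(other) and self.name == other.name


same = NamedNode("x", [NamedNode("y")]) == NamedNode("x", [NamedNode("y")])
different_name = NamedNode("x") == NamedNode("z")
different_type = NamedNode("x") == Node()
```

Because the inherited check runs first, the attribute comparison is only reached when both objects are of the same type, so other.name is always available at that point.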

For example, if we want to create a node that can be found anywhere where a statement is valid, and in turn it accepts one and only one DataNode as a child, we would write something like:

from psyclone.psyir.nodes import Statement, DataNode


class MyNode(Statement):
    ''' MyNode is an example node that can be found anywhere where a statement
    is valid, and in turn it accepts one and only one DataNode as a child.
    '''
    _text_name = "MyNodeName"
    _colour = "blue"
    _children_valid_format = "DataNode"

    @staticmethod
    def _validate_child(position, child):
        # Only position 0 is valid and it must hold a DataNode.
        return position == 0 and isinstance(child, DataNode)

This implementation already provides the basic PSyIR functionality and the node can be integrated and used in the PSyIR tree:

>>> from psyclone.psyir.nodes import Literal, Schedule
>>> from psyclone.psyir.symbols import INTEGER_TYPE
>>> from code_snippets.newnode import MyNode
>>> mynode = MyNode(children=[Literal("1", INTEGER_TYPE)])
>>> mynode.children.append(Literal("2", INTEGER_TYPE))
Traceback (most recent call last):
   ...
psyclone.errors.GenerationError: Generation Error: Item 'Literal' can't be child 1 of 'MyNodeName'. The valid format is: 'DataNode'.
>>> schedule = Schedule()
>>> schedule.addchild(mynode)
>>> print(schedule.view(colour=False))
Schedule[]
    0: MyNodeName[]
        Literal[value:'1', Scalar<INTEGER, UNDEFINED>]

For a full list of methods available in any PSyIR node see the Node reference guide.

Note

For convenience, the PSyIR children validation is performed with both: Node methods (e.g. node.addchild()) and also list methods (e.g. node.children.extend([node1, node2])).

To achieve this, we sub-classed the Python list and redefined all methods that modify the list by calling first the PSyIR provided validation method and subsequently, if valid, calling the associated list method and triggering an ‘update’ signal (see Dynamic Tree Updates).
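The list-subclass approach can be sketched as follows. This is a simplified illustration, not PSyclone's class: ValidatedList, its validator and on_update arguments, and the use of ValueError are all assumptions made for this example, and only append and extend are wrapped (a complete version would need to wrap every method that modifies the list).

```python
class ValidatedList(list):
    """Hypothetical list that validates new items before storing them and
    then notifies its owner that something changed."""

    def __init__(self, validator, on_update):
        super().__init__()
        self._validator = validator   # callable(position, item) -> bool
        self._on_update = on_update   # callable() invoked after a change

    def _check(self, position, item):
        if not self._validator(position, item):
            raise ValueError(f"Item {item!r} can't be child {position}")

    def append(self, item):
        self._check(len(self), item)   # validate first ...
        super().append(item)           # ... then modify the list ...
        self._on_update()              # ... and finally signal the update

    def extend(self, items):
        items = list(items)
        # Validate every item before any of them is added.
        for offset, item in enumerate(items):
            self._check(len(self) + offset, item)
        super().extend(items)
        self._on_update()


updates = []
children = ValidatedList(
    validator=lambda position, item: position == 0 and isinstance(item, int),
    on_update=lambda: updates.append("update"))

children.append(1)          # Accepted: position 0 holds an int
try:
    children.append(2)      # Rejected: only one child is allowed
    rejected = False
except ValueError:
    rejected = True
```

A rejected item leaves the list untouched and does not trigger the update callback.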

The parent-child relationship

To facilitate the PSyIR tree navigation, the parent-child relationship between nodes is represented with a double reference (providing node.parent and node.children navigational properties).

However, to maintain the consistency of the double reference, we don’t allow the node API to manually specify its parent reference. It is always the responsibility of a parent node to update the parent reference of its children. To make this possible for any operation applied to the node.children list, we provide this functionality in the same list subclass specialisation that does the child validation checks explained in the previous section. Therefore, all the following list operations will work as expected:

node.children.insert(0, node1)  # Will set node1.parent reference to node
node.children.extend([node2, node3])  # Will set node2 and node3 parent
                                      # references to node
del node.children[1]  # Will unset the parent reference of children[1]
node.children = []  # Will unset the parent references of all its previous
                    # children
node.detach()  # Will ask node.parent to free node, as node can't change
               # the connection by itself

The only exception to the previous consistency rule is when a node constructor is given the parent reference when building a PSyIR tree top-down. In this case, the single-direction reference will be accepted temporarily, but a child connection operation will need to be done eventually to satisfy the other part of the connection. Any attempt to insert the new node as a child of another node not specified in the constructor will fail as this would break the consistency with the predefined parent reference. For example:

assignment = Assignment()
rhs = Reference(symbol1, parent=assignment)  # Predefined parent reference
lhs = Reference(symbol2, parent=assignment)  # Predefined parent reference
assignment.children = [lhs, rhs]  # Finalise parent-child relationship

node = Reference(symbol3, parent=assignment)
lhs.addchild(node)  # Will produce a GenerationError because the node
                    # constructor specified that its parent would be the
                    # 'assignment' node

Note that a node which already has a parent won’t be accepted as a child of another node, as this could break any previously existing parent-child relationship.

node1.children.insert(0, child)  # Valid
node2.children.insert(0, child)  # Will produce a GenerationError

Methods like node.detach(), node.copy() and node.pop_all_children() can be used to move or replicate existing children into different nodes.

Dynamic Tree Updates

Certain modifications to a PSyIR tree will require that parent nodes also be updated. For instance, if nodes are added to or removed from an OpenACC data region, then the clauses describing the necessary data movement (to/from the accelerator device) may have to change. To support such use cases, the PSyIR Node has the update_signal method which is used to signal that the tree has been modified. This signal is propagated up the tree (i.e. from parent to parent). The default handler for this signal, Node._update_node, does nothing. If a sub-class must take action when the tree below it is modified then it must override the _update_node method as appropriate.

Note that the signalling mechanism is fully contained within the Node class and takes care of avoiding recursive updates to the same Node instance. It should therefore only be necessary for a class to implement the _update_node handler.
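The signalling scheme can be pictured with a small stand-alone sketch. The classes below are not PSyclone code: Node is a simplified stand-in, DataRegion and its refresh_count attribute are hypothetical, and the _updating flag is only one possible way to guard against recursive updates.

```python
class Node:
    """Stand-in node that propagates an update signal to its ancestors."""

    def __init__(self, parent=None):
        self.parent = parent
        self._updating = False

    def update_signal(self):
        """Signal that the tree at or below this node has been modified."""
        if self._updating:
            return  # Already handling an update: avoid recursion
        self._updating = True
        try:
            self._update_node()
        finally:
            self._updating = False
        # Propagate up the tree, from parent to parent.
        if self.parent is not None:
            self.parent.update_signal()

    def _update_node(self):
        """Default handler: does nothing."""


class DataRegion(Node):
    """Hypothetical node that must react when its subtree changes."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self.refresh_count = 0

    def _update_node(self):
        # A real node might recompute its data-movement clauses here.
        self.refresh_count += 1


region = DataRegion()
loop = Node(parent=region)
statement = Node(parent=loop)
statement.update_signal()   # Travels statement -> loop -> region
```

Only DataRegion overrides the handler, yet a change signalled two levels below it still reaches it.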

Selected Node Descriptions

ScopingNode

A ScopingNode is an abstract class node that defines a scoping region: this node and all its descendants have access to a shared set of symbols. These symbols are described in the SymbolTable (psyclone.psyir.symbols.SymbolTable) attached to this node.

There is a double-link between the ScopingNode (through the symbol_table property) and the SymbolTable (through the scope property) objects. To maintain a consistent connection between both objects the only public methods to update the connections are the attach and detach methods of SymbolTable (which take care of both sides of the connection).

Also note that the constructor will not accept as a parameter a symbol table that already belongs to another scope. The symbol table will need to be detached or deep copied before it can be assigned to the new ScopingNode.

See the full API in the ScopingNode reference guide.

Container

The Container node is a ScopingNode that contains one or more Container and/or Routine nodes. A Container can be used to capture a hierarchical grouping of Routine nodes and a hierarchy of Symbol scopes i.e. a Symbol specified in a Container is visible to all Container and Routine nodes within it and their descendants. See the full Container API in the Container reference guide.

FileContainer

The FileContainer node is a subclass of the Container node and is used to capture the concept of a file that contains one or more Container and/or Routine nodes. Whilst this structure is the same as for a Container, it is useful to distinguish between the two as backends may need to deal differently with a FileContainer and a Container.

A FileContainer is always created at the root of the PSyIR tree when parsing Fortran code, as a Fortran file can contain one or more program units (captured as Containers and/or Routines). See the full FileContainer API in the FileContainer reference guide.

Schedule

The Schedule is a ScopingNode that represents a sequence of statements. See the full Schedule API in the Schedule reference guide.

Routine

The Routine node is a subclass of Schedule that represents any program unit (subroutine, function or main program). As such it extends Schedule through the addition of the return_symbol (required when representing a function) and is_program properties. It also adds the create helper method for constructing a valid Routine instance. It is an important node in PSyclone because two of its specialisations: InvokeSchedule and KernelSchedule (described below), are used as the root nodes of PSy-layer invokes and kernel subroutines. This makes them the starting points for any walking of the PSyIR tree in PSyclone transformation scripts and a common target for the application of transformations.

InvokeSchedule

The InvokeSchedule is a PSyIR node that represents an invoke subroutine in the PSy-layer. It specialises the psyclone.psyir.nodes.Routine functionality with a reference to its associated psyclone.psyGen.Invoke object.

Note

This class will be renamed to InvokeRoutine in issue #909.

KernelSchedule

The KernelSchedule is a PSyIR node that represents a Kernel subroutine. As such it is a subclass of psyclone.psyir.nodes.Routine with return_symbol set to None and is_program set to False.

Note

This class will be renamed to KernelRoutine in issue #909.

Control-Flow Nodes

The PSyIR has four control flow nodes: IfBlock, Loop, WhileLoop and Call. These nodes represent the canonical structure with which conditional branching constructs, iteration constructs and accesses to other blocks of code are built. Additional language-specific syntax for branching and iteration will be normalised to use these same constructs. For example, Fortran has the additional branching constructs ELSE IF, SELECT CASE and SELECT TYPE: when Fortran code is translated into the PSyIR, PSyclone will build a semantically equivalent implementation using IfBlock nodes (and an additional CodeBlock containing SELECT TYPE in the case of SELECT TYPE). Similarly, Fortran also has the WHERE construct and statement, which are represented in the PSyIR with a combination of Loop and IfBlock nodes. Such nodes in the new tree structure are annotated with information to enable the original language-specific syntax to be recreated if required (see below). See the full IfBlock API in the IfBlock reference guide. The PSyIR also supports the concept of named arguments for Call nodes; see the Named arguments section for more details.

Note

A Call node (like the CodeBlock) inherits from both Statement and DataNode because it can be found in Schedules or inside Expressions. However, this has some shortcomings, see issue #1437.

Control-Flow Node annotation

If the PSyIR is constructed from existing code (using e.g. the fparser2 frontend) then it is possible that information about that code may be lost. This is because the PSyIR is only semantically equivalent to certain code constructs. In order that information is not lost (making it possible to e.g. recover the original code structure if desired) Nodes may have annotations associated with them. The annotations, the Node types to which they may be applied and their meanings are summarised in the table below:

Annotation	Node types	Origin
was_elseif	IfBlock	else if
was_single_stmt	IfBlock, Loop	if(logical-expr)expr or Fortran where(array-mask)array-expr
was_case	IfBlock	Fortran select case construct
was_where	Loop, IfBlock	Fortran where construct
was_unconditional	WhileLoop	Fortran do loop with no condition
was_type_is	IfBlock	Fortran type is construct within a select type construct
was_class_is	IfBlock	Fortran class is construct within a select type construct

Note

A Loop may currently only be given the was_single_stmt annotation if it also has the was_where annotation. (Thus indicating that this Loop originated from a WHERE statement in the original Fortran code.) The PSyIR represents Fortran single-statement loops (often called array notation) as arrays with ranges in the appropriate indices.

Loop Node

The Loop node is the canonical representation of a counted loop. It has the start, stop, step and loop_body of the loop as its children. The node has the same semantics as the Fortran do construct: the boundary values are inclusive (both are part of the iteration space) and the start, stop and step expressions are evaluated just once at the beginning of the loop.
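The inclusive-bounds semantics differ from Python's own range, which excludes its stop value. The helper below is a hypothetical illustration (inclusive_loop_values is not a PSyclone function) that lists the values the loop variable would take for integer start, stop and step.

```python
def inclusive_loop_values(start, stop, step=1):
    """Return the loop-variable values of a counted loop whose start and
    stop bounds are both inclusive, as in a Fortran do construct."""
    if step == 0:
        raise ValueError("The step of a counted loop must be non-zero")
    # Python's range excludes its stop value, so move it one past 'stop'
    # in the direction of travel to make the upper/lower bound inclusive.
    return list(range(start, stop + (1 if step > 0 else -1), step))


forward = inclusive_loop_values(1, 5)        # [1, 2, 3, 4, 5]
backward = inclusive_loop_values(10, 1, -3)  # [10, 7, 4, 1]
empty = inclusive_loop_values(5, 1)          # [] (zero iterations)
```

Because the list of values is computed once before any iteration runs, the sketch also mirrors the rule that the bounds are evaluated only at the beginning of the loop.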

For more details on the Loop node, see the full API in the reference guide.

WhileLoop Node

The WhileLoop node is the canonical representation of a while loop. The PSyIR representation of the Fortran do loop with no condition will have the annotation was_unconditional, but is otherwise no different from that of a do while loop whose condition is the logical constant .TRUE..

For more details on the WhileLoop node, see the full API in the reference guide.

Ranges

The PSyIR has the Range node which represents a range of integer values with associated start, stop and step properties. For example, the list of values [4, 6, 8, 10] would be represented by a Range with a start value of 4, a stop value of 10 and a step of 2 (all stored as Literal nodes). This class is intended to simplify the construction of Loop nodes as well as to support array slicing (see below). However, this functionality is under development and at this stage neither of those options has been implemented.

The Range node must also provide support for array-slicing constructs where a user may wish to represent either the entire range of possible index values for a given dimension of an array or a sub-set thereof. e.g. in the following Fortran:

real, dimension(10, 5) :: my_array
call some_routine(my_array(1, :))

the argument to some_routine is specified using array syntax where the lone colon means every element in that dimension. In the PSyIR, this argument would be represented by an ArrayReference node with the first entry in its shape being an integer Literal (with value 1) and the second entry being a Range. Since the colon is in the second dimension, the Range will have a start value of LBOUND(my_array, 2), a stop value of UBOUND(my_array, 2) and a step of Literal("1"). Note that LBOUND and UBOUND will be instances of BinaryOperation. (For the particular code fragment given above, the values are in fact known [1 and 5, respectively] and could be obtained by querying the Symbol Table.)

See the full Range API in the Range reference guide.

Operation Nodes

Arithmetic and logic operations are represented in the PSyIR by sub-classes of the Operation node. The operations are classified according to the number of operands:

Those having one operand are represented by psyclone.psyir.nodes.UnaryOperation nodes,
those having two operands are represented by psyclone.psyir.nodes.BinaryOperation nodes.

See the documentation for each Operation class in the Operation, UnaryOperation and BinaryOperation sections of the reference guide.

Note

Similar to Fortran, the PSyIR has two comparison operators, one for booleans (EQV) and one for non-booleans (EQ). These are not interchangeable because they have different precedence priorities and some compilers will not compile with the wrong operator. In some cases we need to insert a comparison of two expressions and we don’t know the datatype of the operands (e.g. in the select-case canonicalisation). A solution to this is to create an abstract interface with appropriate implementations for each possible datatype.

Data Type of an Operation Node

Table 7.2 of the Fortran2008 standard specifies the rules governing the types of operands and their results. The PSyIR follows these rules with the exception that there is no support for symbols of complex (imaginary) type (see #1590). For unary operations, the type of the result is just that of the operand. For a numeric, binary operation, these rules boil down to saying that if either argument is real then the result is real but if both arguments are integer then the result is integer.

If the precisions of the operands are the same, then the result must also be of that precision. Otherwise, Section 7.1.9.3 of the Fortran2008 standard says that the precision of the result is the greater of the two. In the PSyIR, if both precisions are instances of ScalarType.Precision or int then this permits the precision of the result to be determined. Otherwise, the result is given a precision of ScalarType.Precision.UNDEFINED.
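The rules for numeric binary operations can be written down as a small function. This sketch follows the rules exactly as worded above and is neither a statement of the Fortran standard nor PSyclone's implementation: types are plain (intrinsic, precision) tuples, the precision is an int or the string "UNDEFINED" (standing in for ScalarType.Precision.UNDEFINED), and numeric_result_type is a hypothetical name.

```python
UNDEFINED = "UNDEFINED"  # Stand-in for ScalarType.Precision.UNDEFINED


def numeric_result_type(lhs, rhs):
    """Return the (intrinsic, precision) pair of a numeric binary operation.

    Each operand is an (intrinsic, precision) pair where intrinsic is
    "integer" or "real" and precision is an int or UNDEFINED.
    """
    lhs_intrinsic, lhs_precision = lhs
    rhs_intrinsic, rhs_precision = rhs

    # If either operand is real the result is real, otherwise integer.
    if "real" in (lhs_intrinsic, rhs_intrinsic):
        intrinsic = "real"
    else:
        intrinsic = "integer"

    if lhs_precision == rhs_precision:
        # Identical precisions carry over to the result.
        precision = lhs_precision
    elif isinstance(lhs_precision, int) and isinstance(rhs_precision, int):
        # Both are comparable, so take the greater of the two.
        precision = max(lhs_precision, rhs_precision)
    else:
        # The precisions cannot be compared.
        precision = UNDEFINED
    return intrinsic, precision


mixed = numeric_result_type(("integer", 4), ("real", 8))
same = numeric_result_type(("integer", 4), ("integer", 4))
unknown = numeric_result_type(("real", 4), ("real", UNDEFINED))
```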

For comparison operations (e.g. <, ==), the intrinsic type of the result is always boolean. If either or both operands are arrays, then the result is a boolean array.

The PSyIR type system includes support for those situations where PSyclone is not able to fully understand a variable declaration. In such cases, the type is an instance of UnsupportedFortranType which stores both the original declaration and, optionally, a partial_datatype holding the aspects of the type that can be represented in the PSyIR. The presence of a partial_datatype implies that we fully understand the intrinsic type. Given this and an array shape, it is always possible to determine the result of a numerical operation involving such a type.

IntrinsicCall Nodes

PSyIR IntrinsicCall nodes (see IntrinsicCall) capture all PSyIR intrinsics that are not expressed as language symbols (+, -, * etc.). The latter are captured as Operation nodes. At the moment the available PSyIR IntrinsicCalls match those of the Fortran 2018 standard. In addition to Fortran intrinsics, special Fortran statements such as ALLOCATE, DEALLOCATE and NULLIFY are also PSyIR IntrinsicCalls.

IntrinsicCalls, like Calls, have properties to inform if the call is to a pure, elemental, inquiry (does not touch the first argument data) function or is available on a GPU device.

CodeBlock Node

The PSyIR CodeBlock node contains code that has no representation in the PSyIR. It is useful as it allows the PSyIR to represent complex code by using CodeBlocks to handle the parts which contain unsupported language features. One approach would be to work towards capturing all language features in the PSyIR, which would gradually remove the need for CodeBlocks. However, the purpose of the PSyIR is to capture code concepts that are relevant for performance, not all aspects of a code, therefore it is likely that CodeBlocks will continue to be an important part of the PSyIR. See the full CodeBlock API in the CodeBlock reference guide.

The code represented by a CodeBlock is currently stored as a list of fparser2 nodes. Therefore, a CodeBlock’s input and output language is limited to being Fortran. This means that only the fparser2 front-end and Fortran back-end can be used when there are CodeBlocks within a PSyIR tree. In theory, language interfaces could be written between CodeBlocks and other PSyIR Nodes to support different back-ends but this has not been implemented.

Currently the PSyIR has a single CodeBlock node that can be found in place of full Statements or as part of an expression that evaluates to a DataNode. To make this possible CodeBlock is a subclass of both Statement and DataNode. However, in certain situations we still need to differentiate which one it is. For instance, the Fortran back-end needs this information, as expressions do not need indentation and a newline whereas statements do. For this reason, CodeBlock has a structure method that indicates whether the code contains one or more unrecognized language expressions or one or more statements (which may themselves contain expressions).

The Fortran front-end populates the structure attribute using a feature of the fparser2 node list: if the first node in the list is a statement then so are all the other nodes in the list, and if the first node in the list is an expression then so are all the other nodes in the list. This allows the structure method to return a single value that represents all nodes in the list. The structure of the PSyIR hierarchy is used to determine whether the code in a CodeBlock contains expressions or statements. This is achieved by looking at the parent PSyIR Node. If the parent Node is a Schedule then the CodeBlock contains one or more statements, otherwise it contains one or more expressions.

This logic works for existing PSyIR nodes and relies on any future PSyIR nodes being constructed so this continues to be true. Another solution would be to have two different nodes: StatementsCodeBlock which subclasses Statement, and DataCodeBlock which subclasses DataNode. We have chosen the first implementation for the simplicity of having a single PSyIR node instead of two, but if things get more complicated using this implementation, the second alternative could be considered again.

ArrayMixin

ArrayMixin is an abstract “mix-in” base class which implements various methods that are specific to those nodes representing arrays and array accesses. It is subclassed by ArrayReference, ArrayOfStructuresReference, ArrayMember and ArrayOfStructuresMember.

Reference Node

The PSyIR Reference Node represents a variable access. It keeps a reference to a Symbol which will be stored in a symbol table. See the full Reference API in the Reference reference guide.

ArrayReference Node

The PSyIR ArrayReference Node represents an access to one or more elements of an array variable. It keeps a reference to a Symbol which will be stored in a symbol table. The indices used to access the array element(s) are represented by the children of the node. The ArrayReference Node inherits from both the Reference and ArrayMixin classes. See the full API in the ArrayReference reference guide.

Directive

The PSyIR Directive Node represents a Directive, such as is used in OpenMP or OpenACC. There are two subclasses, RegionDirective and StandaloneDirective. RegionDirective nodes contain a Schedule as their first child, which contains the code segment covered by the directive, for example a Loop to which an OpenMP parallel do may be applied. Both RegionDirective and StandaloneDirective may also have Clause nodes as children, which can be accessed through the clauses member. See the full API in the Directive reference guide.

Warning

Some parts of some Clauses are still under development, and not all clauses are encoded in Clauses classes yet (for example OpenACC clauses). These clause strings are instead generated inside the begin_string or gen_code methods during code generation.

Named arguments

The Call node (and its sub-classes) support named arguments.

The argument names are provided by the argument_names property. This property returns a list of names. The first entry in the list refers to the first argument, the second entry in the list refers to the second argument, etc. An argument name is stored as a string. If an argument is not a named argument then the list entry will contain None. For example, for the following call:

call example(arg0, name1=arg1, name2=arg2)

the following list would be returned by the argument_names property:

[None, "name1", "name2"]

It was decided to implement it this way, rather than adding a new (NamedArgument) node, as 1) there is no increase in the number and types of PSyIR nodes and 2) iterating over all children (the arguments) of these nodes is kept simple.

The following methods support the setting and updating of named arguments: create(), append_named_arg(), insert_named_arg() and replace_named_arg().

However, this implementation raises consistency problems as it is possible to insert, modify, move or delete children (argument) nodes directly. This would make the argument names list inconsistent as the names themselves are stored within the node.

To solve this problem, the argument names are stored internally in an _argument_names list which not only keeps the argument names but also keeps a reference (the id) to the associated child argument. An internal _reconcile() method then checks whether the internal _argument_names list and the actual arguments match and fixes any inconsistencies.

The _reconcile() method is called before the argument_names property returns its values, thereby ensuring that any access to argument_names is always consistent.

The _reconcile() method looks through the arguments and tries to match them with one of the stored ids. If there is no match it is assumed that this is not a named argument. This approach has the following behaviour: the argument names are kept if arguments are re-ordered; an argument that has replaced a named argument will not be a named argument; an inserted argument will not be a named argument; and the name of a deleted named argument will be removed.

Making a copy of the Call node also causes problems with consistency between the internal _argument_names list and the arguments. The reason for this is that the arguments get copied and therefore have a different id, whereas the ids in the internal _argument_names list are simply copied. To solve this problem, the copy() method is specialised to update the ids. A second issue is that the internal _argument_names list may already be inconsistent when a copy is made. Therefore the _reconcile() method is also called in the specialisation of the copy() method.
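The id-based reconciliation can be sketched in a few lines. The Call class below is a simplified stand-in and not PSyclone's implementation: its arguments list plays the role of the node's children, Arg is a hypothetical placeholder for an argument node, and the copy() specialisation is not shown.

```python
class Arg:
    """Hypothetical placeholder for a child (argument) node."""

    def __init__(self, label):
        self.label = label


class Call:
    """Stand-in Call that stores names alongside the id of each argument."""

    def __init__(self):
        self.arguments = []         # Plays the role of node.children
        self._argument_names = []   # List of (id(argument), name) pairs

    def append_named_arg(self, name, arg):
        self.arguments.append(arg)
        self._argument_names.append((id(arg), name))

    def _reconcile(self):
        # Rebuild the internal list from the current arguments, keeping a
        # name only when the id of the argument is already known.
        names_by_id = dict(self._argument_names)
        self._argument_names = [
            (id(arg), names_by_id.get(id(arg))) for arg in self.arguments]

    @property
    def argument_names(self):
        self._reconcile()  # Always reconcile before returning the names
        return [name for _, name in self._argument_names]


call = Call()
arg0, arg1, arg2 = Arg("arg0"), Arg("arg1"), Arg("arg2")
call.append_named_arg(None, arg0)
call.append_named_arg("name1", arg1)
call.append_named_arg("name2", arg2)
names_initial = call.argument_names

call.arguments.reverse()            # Re-ordering keeps the names
names_reordered = call.argument_names

call.arguments[0] = Arg("new")      # Replacing arg2 drops its name
names_replaced = call.argument_names

del call.arguments[1]               # Deleting arg1 removes its name
names_deleted = call.argument_names
```

The arguments list is modified directly, bypassing the Call methods, and the names still come out consistent because every access to argument_names reconciles first.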

References to Structures and Structure Members

The PSyIR has support for representing references to symbols of structure type and to members of such structures. Since the former case is still a reference to a symbol held in a symbol table, it is already captured by the Reference node. A reference that includes an access to a member of a structure is described by a StructureReference which is a subclass of Reference. As such, it has a symbol property which gives the Symbol that the reference is to. The member of the structure being accessed is described by a Member (or subclass) which is stored as the first and only child of the StructureReference. The full API is given in the StructureReference section of the reference guide.

Similarly, ArrayOfStructuresReference represents a reference to a member of one or more elements of an array of structures. As such it subclasses both ArrayMixin and StructureReference. As with the latter, the first child describes the member being accessed and will be an instance of (a subclass of) Member. Subsequent children (of which there must be at least one since this is an array reference) then describe the array-index expressions of the reference in the usual fashion for an ArrayReference. The full API is given in the ArrayOfStructuresReference section of the reference guide.

Since members of structures are not represented by symbols in a symbol table, references to them are not subclasses of Reference. They are instead represented by instances of Member (or subclasses thereof). There are four of these:

Class	Type of Accessor Nested Inside
Member	No nested accessor (i.e. is a leaf)
ArrayMember	One or more elements of an array
StructureMember	Member of a structure
ArrayOfStructuresMember	Member of one or more elements of an array of structures

These classes are briefly described below. For full details please follow the appropriate links to the Reference Guide.

Member

This node is used for accesses to members of a structure which do not contain any further accesses nested inside. In a PSyIR tree, any instance of this node type must therefore have no children and a StructureReference or StructureMember (or subclasses thereof) as parent. The full API is given in the Member section of the reference guide.

ArrayMember

This node represents an access to one or more elements of an array within a structure. As such, it subclasses both Member and ArrayMixin. Its children follow the same rules as for an ArrayReference Node. The full API is given in the ArrayMember section of the reference guide.

StructureMember

This node represents an access to a member of a structure that is itself a member of a structure. As such, it has a single child which subclasses Member and specifies which component is being accessed. The full API is given in the StructureMember section of the reference guide.

ArrayOfStructuresMember

This node represents an access to a member of one or more elements of an array of structures that is itself a member of a structure. Its first child must be a subclass of Member. Subsequent children represent the index expressions for the array access. The full API is given in the ArrayOfStructuresMember section of the reference guide.
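The choice between the four Member classes, and between StructureReference and ArrayOfStructuresReference, depends only on whether a component is indexed and whether a further accessor is nested inside it. The sketch below restates the table above as a stand-alone function. classify_access is a hypothetical helper written for this illustration; it returns class names as strings and does not create PSyIR nodes.

```python
def classify_access(parts):
    """Map each component of a structure access to a PSyIR class name.

    :param parts: (name, has_indices) pairs, root symbol first.
    :returns: (name, class name) pairs, outermost first.
    """
    if len(parts) < 2:
        raise ValueError("A structure access needs a root symbol and at "
                         "least one member.")
    root_name, root_indexed = parts[0]
    result = [(root_name, "ArrayOfStructuresReference" if root_indexed
               else "StructureReference")]
    for pos, (name, indexed) in enumerate(parts[1:], start=1):
        is_leaf = pos == len(parts) - 1
        if is_leaf:
            # Nothing is nested inside the last component.
            cls_name = "ArrayMember" if indexed else "Member"
        else:
            # A further member access is nested inside this component.
            cls_name = ("ArrayOfStructuresMember" if indexed
                        else "StructureMember")
        result.append((name, cls_name))
    return result


# An access such as grid%subdomains(i)%area:
print(classify_access([("grid", False),
                       ("subdomains", True),
                       ("area", False)]))
```

The names grid, subdomains and area are made up for the example.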

Data Type of a Structure Access

In order to get the actual data type of a structure reference, PSyclone needs to have access to the declaration of all structures involved in the accessor expression. However, these are often UnresolvedType if the module where they are declared has not been processed. In the case of some domain-API arguments added by PSyclone to a kernel call (e.g. the indices in GOcean, or additional field information in LFRic), the type of these structure accesses is actually known. When creating a structure reference, there is an option overwrite_datatype, which can be set to avoid the need to have details of the required structures. For example, the following code is used to declare that an access like op_proxy%ncell_3d is an LFRic integer:

self.append_structure_reference(
    operator["module"], operator["proxy_type"], ["ncell_3d"],
    arg.proxy_name_indexed,
    overwrite_datatype=LFRicTypes("LFRicIntegerScalarDataType")())

While most of PSyclone works without having access to this detailed information, the driver creation for kernel extraction (see PSy Kernel Extractor (PSyKE)) needs this information to declare the variables in the driver.
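The effect of overwrite_datatype can be pictured with a small stand-alone model. This is a sketch under simplifying assumptions, not PSyclone code: types are plain strings, struct_decls is a hypothetical dictionary standing in for the structure declarations that have been processed, and access_datatype is a made-up helper.

```python
UNRESOLVED = "UnresolvedType"


def access_datatype(root_type, member_names, struct_decls,
                    overwrite_datatype=None):
    """Return the type of root%member%member..., or the supplied override.

    :param root_type: name of the structure type of the root symbol.
    :param member_names: names of the members accessed, outermost first.
    :param struct_decls: maps a structure type name to a dictionary of
        its component names and their types (known declarations only).
    :param overwrite_datatype: type to use instead of resolving it.
    """
    if overwrite_datatype is not None:
        # The caller already knows the type, so no declarations are needed.
        return overwrite_datatype
    current = root_type
    for name in member_names:
        components = struct_decls.get(current)
        if components is None or name not in components:
            # The declaring module has not been processed.
            return UNRESOLVED
        current = components[name]
    return current


# Without the declaration of the proxy type the access is unresolved ...
print(access_datatype("operator_proxy_type", ["ncell_3d"], {}))
# ... unless the type is supplied explicitly.
print(access_datatype("operator_proxy_type", ["ncell_3d"], {},
                      overwrite_datatype="integer"))
```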

Comments attached to PSyIR Nodes

Since the PSyIR is designed to support source-to-source code generation, it is desirable to keep the output code as readable as possible, and this includes keeping or adding comments to the generated code. Comments are not first-class nodes in the PSyIR because it is an abstract syntax tree and it was preferable to hide the complexity of comment nodes from the PSyIR transformations and other manipulations. Therefore, comments have been implemented as string attributes (one for preceding and another for inline comments) attached to particular nodes. As a result, comments move within a PSyIR tree together with their owning node.

The group of nodes that can contain comments does not have an exclusive common ancestor, so they have been implemented with a Mixin class called CommentableMixin. A node can keep track of comments if it inherits from this class, for example:

from psyclone.psyir.nodes import Node
from psyclone.psyir.nodes.commentable_mixin import CommentableMixin

class MyNode(Node, CommentableMixin):
    ''' Example node '''

mynode = MyNode()
mynode.preceding_comment = "A multi-line\n preceding comment"
mynode.inline_comment = "An inline comment"

From the language-level PSyIR nodes, Container, Routine and Statement have the CommentableMixin trait.
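The consequence of storing comments as attributes, namely that they travel with their node when the tree is rearranged, can be shown with a stand-alone sketch. Everything here is hypothetical and independent of PSyclone: ToyCommentableMixin, ToyStatement and write_lines are invented for the illustration, the type check in the setters is an assumption of the sketch, and the "!" comment marker is just one possible output syntax.

```python
class ToyCommentableMixin:
    """Hypothetical mixin that stores two comment strings."""
    _preceding_comment = ""
    _inline_comment = ""

    @property
    def preceding_comment(self):
        return self._preceding_comment

    @preceding_comment.setter
    def preceding_comment(self, comment):
        if not isinstance(comment, str):
            raise TypeError("A comment must be a string.")
        self._preceding_comment = comment

    @property
    def inline_comment(self):
        return self._inline_comment

    @inline_comment.setter
    def inline_comment(self, comment):
        if not isinstance(comment, str):
            raise TypeError("A comment must be a string.")
        self._inline_comment = comment


class ToyStatement(ToyCommentableMixin):
    """Hypothetical statement that is just a line of text."""

    def __init__(self, text):
        self.text = text


def write_lines(statements):
    """Emit each statement together with the comments attached to it."""
    lines = []
    for stmt in statements:
        for comment_line in stmt.preceding_comment.splitlines():
            lines.append("! " + comment_line)
        code = stmt.text
        if stmt.inline_comment:
            code += "  ! " + stmt.inline_comment
        lines.append(code)
    return lines


first = ToyStatement("a = 1")
first.preceding_comment = "Initialise a"
second = ToyStatement("b = 2")
second.inline_comment = "b is fixed"

# Reordering the statements needs no separate handling of the comments.
print(write_lines([second, first]))
```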

Domain-Specific PSyIR

The discussion so far has been about generic language-level PSyIR. This is located in the psyir directory and contains nodes, symbols, transformations, front-ends and back-ends. None of this is domain specific.

To obtain domain-specific concepts the language-level PSyIR can be specialised or extended. All domains follow the PSyKAl separation of concerns, with the Algorithm-layer and the PSy-layer each having their own domain-specific concepts. These can be found in psyclone.domain.common.algorithm and psyclone.domain.common.psylayer respectively (some concepts are still in psyclone.psyGen for legacy reasons but will be moved to the new locations over time).

PSy-layer concepts

The PSyLoop is a Loop where the boundaries are given by the domain specific iteration space that the kernels are applied to. In turn it is sub-classed in all of the domains supported by PSyclone. This then allows the class to be configured with a list of valid loop ‘types’. For instance, the GOcean sub-class, GOLoop, has “inner” and “outer” while the LFRic (dynamo0.3) sub-class, LFRicLoop, has “dofs”, “colours”, “colour”, “” and “null”. The default loop type (iterating over cells) is here indicated by the empty string. The concept of a “null” loop type is currently required because the dependency analysis that determines the placement of halo exchanges is handled within the Loop class. As a result, every Kernel call must be associated with a Loop node. However, the LFRic domain has support for kernels which operate on the ‘domain’ and thus do not require a loop over cells or dofs in the generated PSy layer. Supporting an LFRicLoop of “null” type allows us to retain the dependence-analysis functionality within the Loop while not actually producing a loop in the generated code. When #1148 is tackled, the dependence-analysis functionality will be removed from the Loop class and this concept of a “null” loop can be dropped.
The Kern, which can be of type CodedKern, InlinedKern or BuiltIn, is the singular unit of computation that can be found inside a PSyLoop.
The HaloExchange is a distributed-memory concept in the PSy-layer.
The GlobalSum is a distributed-memory concept in the PSy-layer.
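The idea of a loop class that is configured with a list of valid loop types, and of a "null" type that keeps the loop node in the tree without generating a loop, can be sketched as follows. This is a stand-alone illustration and not the PSyclone classes: ToyPSyLoop, ToyLFRicLoop and gen_lines are hypothetical, and the generated text is a placeholder rather than real PSy-layer output.

```python
class ToyPSyLoop:
    """Hypothetical loop whose type must be one of a configured list."""

    def __init__(self, valid_loop_types, loop_type, body):
        self._valid_loop_types = list(valid_loop_types)
        self.loop_type = loop_type   # validated by the setter below
        self.body = list(body)

    @property
    def loop_type(self):
        return self._loop_type

    @loop_type.setter
    def loop_type(self, value):
        if value not in self._valid_loop_types:
            raise ValueError(f"Loop type '{value}' is not one of "
                             f"{self._valid_loop_types}")
        self._loop_type = value


class ToyLFRicLoop(ToyPSyLoop):
    """Hypothetical LFRic specialisation using the types listed above."""

    def __init__(self, loop_type="", body=()):
        super().__init__(["dofs", "colours", "colour", "", "null"],
                         loop_type, body)

    def gen_lines(self):
        if self.loop_type == "null":
            # The node exists in the tree but no loop is generated.
            return list(self.body)
        # The empty string is the default type (a loop over cells).
        label = self.loop_type or "cells"
        return ([f"do  ! over {label}"]
                + ["  " + line for line in self.body]
                + ["end do"])


print(ToyLFRicLoop(body=["call kernel_code()"]).gen_lines())
print(ToyLFRicLoop("null", ["call domain_kernel_code()"]).gen_lines())
```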

Other specializations

In LFRic there are specialisations for kernel-layer datatypes and symbols. For the algorithm layer in both GOcean1.0 and LFRic there are specialisations for invokes and kernel calls. This is discussed further in the following sections.

The LFRic PSyIR

The LFRic PSyIR is a set of subclasses of the PSyIR which captures LFRic-specific routines, datatypes and associated symbols. These subclasses are a work in progress and at the moment are limited to 1) a subset of the datatypes passed into LFRic kernels by argument and by use association and 2) LFRic invoke and kernel calls (LFRicAlgorithmInvokeCall and LFRicKernelFunctor) in the LFRic algorithm-layer. Over time these will be expanded to support a) all LFRic kernel datatypes, b) all LFRic PSyIR datatypes, c) subroutines (KernRoutine etc.), d) derived quantities e.g. iterator variables and eventually e) higher level LFRic PSyIR concepts, which will not be concerned with symbol tables and datatypes.

The Kernel-layer subclasses will be used to:

check that the data types, dimensions, intent etc. of a coded kernel’s subroutine arguments conform to the expected datatypes, dimensions, intent etc. as defined by the kernel metadata and associated LFRic rules.
represent coded kernels, which will make it easier to reason about the structure of a kernel. At the moment a coded kernel is translated into generic PSyIR. This generic PSyIR will be further translated into LFRic PSyIR using the expected datatypes as specified by the kernel metadata and associated LFRic rules.
replace the existing kernel stub generation implementation so that the PSyIR back ends can be used and PSyclone will rely less on f2pygen and fparser1. At the moment kernel_interface provides the same functionality as kern_stub_arg_list, except that it uses the symbol table (which keeps datatypes and their declarations together).
generate the PSy-layer, replacing the existing kern_call_arg_list and gen_call routines.

The Algorithm-layer subclasses will be used to:

help with transforming the algorithm layer.
help with reasoning about the algorithm layer e.g. to check that the algorithm layer and kernel metadata match.
generate the LFRic Algorithm-layer PSyIR e.g. in psyclone-kern.

Algorithm-layer Classes

The LFRic PSyIR for the Algorithm layer is captured in the domain/lfric/algorithm/psyir.py module. Three classes are currently provided statically: one to capture an invoke call, LFRicAlgorithmInvokeCall, and two to capture Builtin and (coded) Kernel calls within an invoke call, LFRicBuiltinFunctor and LFRicKernelFunctor respectively.

The LFRicBuiltinFunctorFactory class dynamically creates a subclass of LFRicBuiltinFunctor for every LFRic Builtin. These are named following the scheme LFRic_<BUILTIN_NAME>_Functor so that, for example, the Setval_X builtin is represented by the LFRic_Setval_X_Functor class. An instance of the appropriate class may be obtained using the factory’s create method:

LFRicBuiltinFunctorFactory.create(name, table, arguments)

Create a BuiltinFunctor for the named LFRic builtin.

Parameters:

name (str) – the built-in for which a functor is required.
table (psyclone.psyir.symbols.SymbolTable) – the symbol table to which to add a corresponding symbol.
arguments (List[psyclone.psyir.nodes.DataNode]) – the arguments to give to the functor.
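The dynamic creation of one functor class per builtin, following the LFRic_<BUILTIN_NAME>_Functor naming scheme, can be sketched with the three-argument form of Python's built-in type(). This is a stand-alone illustration, not the PSyclone factory: ToyBuiltinFunctor and ToyBuiltinFunctorFactory are hypothetical, the list of builtin names is passed in by hand, the lower-case lookup is an assumption of the sketch, and the symbol-table argument of the real create method is omitted.

```python
class ToyBuiltinFunctor:
    """Hypothetical base class for all generated functor classes."""
    _builtin_name = ""

    def __init__(self, arguments):
        self.arguments = list(arguments)


class ToyBuiltinFunctorFactory:
    """Hypothetical factory creating one subclass per builtin name."""

    def __init__(self, builtin_names):
        self._classes = {}
        for name in builtin_names:
            class_name = f"LFRic_{name}_Functor"
            # type(name, bases, namespace) creates a new class at run time.
            self._classes[name.lower()] = type(
                class_name, (ToyBuiltinFunctor,),
                {"_builtin_name": name.lower()})

    def create(self, name, arguments):
        """Return an instance of the class generated for this builtin."""
        try:
            functor_class = self._classes[name.lower()]
        except KeyError as err:
            raise ValueError(f"Unknown builtin '{name}'") from err
        return functor_class(arguments)


factory = ToyBuiltinFunctorFactory(["Setval_X", "X_plus_Y"])
functor = factory.create("setval_x", ["field_a", "field_b"])
print(type(functor).__name__)   # LFRic_Setval_X_Functor
```

Generating the classes from a list of names means that adding a builtin does not require writing a new class by hand, while isinstance checks against the common base class still work.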

Kernel-layer Classes

The class LFRicTypes in domain/lfric/lfric_types.py manages the various LFRic data types. It provides a simple interface to get standard classes for LFRic data. For example:

>>> from psyclone.domain.lfric import LFRicTypes
>>> NumberOfUniqueDofsDataSymbol = LFRicTypes("NumberOfUniqueDofsDataSymbol")

The relevant classes are dynamically generated to avoid boilerplate code and to make it simpler to change the LFRic infrastructure classes in the future.

The idea is to declare different classes for the different concepts. For example NumberOfDofsDataType() and NumberOfDofsDataSymbol() classes are created and these are subclasses of DataType and DataSymbol respectively. In NumberOfDofsDataType the intrinsic and precision properties are pre-defined, as is the fact that it is a scalar, so these do not need to be specified. All that is needed to create a undf symbol is a name and the function space it represents:

>>> UNDF_W3 = NumberOfUniqueDofsDataSymbol("undf_w3", "w3")

For arrays, (e.g. for FieldData) the dimensions must also be provided as a Reference:

>>> from psyclone.psyir.nodes import Reference
>>> RealFieldDataDataSymbol = LFRicTypes("RealFieldDataDataSymbol")
>>> FIELD1 = RealFieldDataDataSymbol("field1", [Reference(UNDF_W3)], "w3")

At the moment, argument types and values (e.g. the function space argument) are not checked, see issue #926. There is also no consistency checking between specified function spaces (e.g. that UNDF_W3 is for the same function space as FIELD1 in the above example), see issue #927. Also, the function space attribute would be better as a class rather than a string, see issue #934.

Currently entities which can have different intrinsic types (e.g. FieldData) are captured as different classes (RealFieldDataDataSymbol, IntegerFieldDataDataSymbol etc). This could be modified if a single class turns out to be preferable.

class psyclone.domain.lfric.LFRicTypes(name)

This class implements a singleton that manages LFRic types. Using the ‘call’ interface, you can query the data type for LFRic types, e.g.:

>>> from psyclone.configuration import Config
>>> from psyclone.domain.lfric import LFRicTypes
>>> config = Config.get()
>>> num_dofs_class = LFRicTypes("NumberOfUniqueDofsDataSymbol")
>>> my_var = num_dofs_class("my_num_dofs")
>>> print(my_var.name)
my_num_dofs

It uses the __new__ function to implement the access to the internal dictionary. This is done to minimise the required code for getting a value, e.g. compared with LFRicTypes.get()("something"), or LFRicTypes.get("something").
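The __new__-based lookup can be illustrated with a stand-alone sketch. It is not the LFRicTypes implementation: ToySymbol and ToyTypes are hypothetical, only two class names are registered, and raising ValueError for an unknown name is a choice made for this sketch. The point is that "calling" the class returns an entry of an internal dictionary instead of a new instance.

```python
class ToySymbol:
    """Hypothetical base class for the generated symbol classes."""

    def __init__(self, name):
        self.name = name


class ToyTypes:
    """Hypothetical registry that is queried by calling the class."""
    _name_to_class = None

    def __new__(cls, name):
        if cls._name_to_class is None:
            cls.init()   # fill the dictionary on first use
        try:
            # The returned object is not a ToyTypes instance, so Python
            # does not call __init__ on it.
            return cls._name_to_class[name]
        except KeyError as err:
            raise ValueError(f"Unknown type '{name}'") from err

    @classmethod
    def init(cls):
        cls._name_to_class = {}
        for class_name in ("NumberOfDofsDataSymbol",
                           "NumberOfUniqueDofsDataSymbol"):
            cls._name_to_class[class_name] = type(
                class_name, (ToySymbol,), {})


num_dofs_class = ToyTypes("NumberOfUniqueDofsDataSymbol")
my_var = num_dofs_class("my_num_dofs")
print(my_var.name)   # my_num_dofs
```

Because the dictionary is a class attribute, every query returns the same class object for a given name.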

Kernel arguments

At the moment, kernel arguments are generated by the KernStubArgList or KernCallArgList classes. However, whilst these classes generate the correct number of arguments in the correct order, they have no knowledge of the datatypes that the arguments correspond to and how the arguments relate to each other (they just output strings).

The logic and declaration of kernel variables is handled separately by the gen_stub method in LFRicKern and the gen_code method in LFRicInvoke. In both cases these methods make use of the subclasses of LFRicCollection to declare variables.

When using the symbol table in the LFRic PSyIR we naturally capture arguments and datatypes together. The KernelInterface class is aiming to replicate the KernStubArgList class and makes use of the LFRic PSyIR. The idea is that the former will replace the latter when it has the same or more functionality. At the moment, only methods required to pass the tests have been implemented in KernelInterface so there is more to be done, but it is also not clear what the limitations are for KernStubArgList.

Eventually the definition of LFRic datatypes should be moved to the LFRic PSyIR, but at the moment there is a lot of information defined in the LFRicCollection subclasses. This will need to be addressed over time.

The GOcean PSyIR

GOcean makes use of algorithm-layer PSyIR specialisations.

Algorithm-layer Classes

The GOcean PSyIR for the Algorithm layer is captured in domain/common/algorithm/psyir.py. Two classes are currently provided: one to capture an invoke call, AlgorithmInvokeCall, and the other to capture (coded) Kernel calls within an invoke call, KernelFunctor.