The PSyclone Intermediate Representation (PSyIR)
The PSyclone Intermediate Representation (PSyIR) is a language-independent Intermediate Representation that PSyclone uses to represent the PSy (Parallel System) and the Kernel (serial units of work) layers of an application. It can be constructed from scratch or produced from existing code using one of the PSyIR front-ends. Its design is optimised to represent high-performance parallel applications in an expressive, mutable and extensible way:
It is expressive because it is intended to be created and/or manipulated by HPC software experts directly when optimizing or porting the code.
It is mutable because it is intended to be programmatically manipulated (usually through PSyclone scripts) and maintain a coherent state (with valid relationships, links to symbols, links to dependencies) after each manipulation.
It is extensible because it is intended to be used as the core component of domain-specific systems which include additional abstract concepts or logic not captured in the generic representation.
To achieve these design goals we use a Normalised Heterogeneous AST
representation together with a Type System and a Symbol Table.
By heterogeneous we mean that we distinguish between AST nodes using the
Python class inheritance system, and each node has its own (semantically
relevant) navigation and behaviour methods. For instance, the Assignment
node has lhs and rhs properties to navigate to the left-hand and right-hand
sides of the Assignment. It also means we can identify a node by its type
with isinstance(node, Assignment).
Nevertheless, we maintain a normalised core of node relationships and
functionality that allows us to build tree walkers, tree visitors and
dependency analysis tools without the need to consider the implementation
details of each individual sub-class.
The common functionality that all nodes must have is defined in the PSyIR base class Node. See the list of all PSyIR common methods in the Node reference guide.
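The idea of a heterogeneous tree on top of a normalised core can be sketched with a few stand-alone classes. This is a toy model, not PSyclone's code: the classes below only borrow the names Node, Assignment and Reference for illustration, and the walk method is a simplified assumption about what a generic tree walker needs.

```python
class Node:
    """Toy base class: the normalised core shared by every node."""

    def __init__(self, children=None):
        self.children = list(children or [])

    def walk(self, node_type):
        """Return this node and all descendants that are node_type instances."""
        found = [self] if isinstance(self, node_type) else []
        for child in self.children:
            found.extend(child.walk(node_type))
        return found


class Reference(Node):
    """Toy leaf node that just remembers a variable name."""

    def __init__(self, name):
        super().__init__()
        self.name = name


class Assignment(Node):
    """Heterogeneous part: navigation that only makes sense here."""

    @property
    def lhs(self):
        return self.children[0]

    @property
    def rhs(self):
        return self.children[1]


assignment = Assignment([Reference("a"), Reference("b")])
tree = Node([assignment])
# The walker only relies on the common 'children' attribute.
names = [ref.name for ref in tree.walk(Reference)]
```

The walker never needs to know that Assignment has lhs and rhs, while code that does care about assignments can use those properties directly.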
More information about the type system and symbols and how PSyIR can be transformed back to a particular language using the back-ends (Writers) is provided in the following sections of this guide.
How to create new PSyIR Nodes
In order to create a new PSyIR node, either for adding a new core PSyIR node or to extend the functionality in one of the PSyclone APIs, it is mandatory to perform the following steps:
1. The new node must inherit from psyclone.psyir.nodes.Node or one of its sub-classes. Note that Node won’t be accepted as a child anywhere in the tree. It may be appropriate to specialise one of the existing subclasses of Node, rather than Node itself. A good starting point would be to consider psyclone.psyir.nodes.Statement (which will be accepted inside any Schedule) or psyclone.psyir.nodes.DataNode (which will be accepted anywhere that the node can be evaluated to a data element).
2. Set the _text_name and the _colour class string attributes. These attributes will provide standard behaviour for the __str__ and view() methods.
3. Set the _children_valid_format class string attribute and specialise the static method _validate_child(position, child). These define, textually and logically, what types of nodes will be accepted as children of the new node:
- _children_valid_format is the textual representation that will be used in error messages. It is expressed using tokens with the same name as the PSyIR classes and the following symbols:
  - | : or operator.
  - , : concatenation operator.
  - [ expression ]* : zero or more instances of the expression.
  - [ expression ]+ : one or more instances of the expression.
  - <LeafNode> : NULL operand (no children accepted).
  For instance, an expression that accepts a statement as a first child and one or more DataNodes after it would be: Statement [, DataNode]+.
- _validate_child(position, child) returns a boolean which indicates whether the given child is a valid component for the given position.
Note
Note that the valid children of a node are described twice, once in
_children_valid_format and again in _validate_child, and it is up to the
developer to keep them coherent in order to provide sensible error
messages. Alternatively, we could create an implementation where the
textual representation is parsed and the validation method is generated
automatically, hence avoiding the duplication. Issue #765 explores this
alternative.
4. If any of the attributes introduced by this node should not be shallow-copied when creating a duplicate of this PSyIR branch, specialise the _refine_copy method to perform the appropriate copy actions.
5. If any of the attributes in this node should be used to compute the equality of two nodes, specialise the __eq__ member to perform the appropriate checks. The default __eq__ behaviour is to check that both instances are of exactly the same type and that each of their children also passes the equality check. The only restriction on this implementation is that it must call super().__eq__(other) as part of its implementation, to ensure that any inherited equality checks are performed. The default behaviour ignores annotations and comment attributes, as they should not affect the semantics of the PSyIR tree.
For example, if we want to create a node that can be found anywhere where a statement is valid, and in turn it accepts one and only one DataNode as a child, we would write something like:
from psyclone.psyir.nodes import Statement, DataNode


class MyNode(Statement):
    ''' MyNode is an example node that can be found anywhere where a
    statement is valid, and in turn it accepts one and only one DataNode
    as a child.
    '''
    _text_name = "MyNodeName"
    _colour = "blue"
    _children_valid_format = "DataNode"

    @staticmethod
    def _validate_child(position, child):
        # Only position 0 exists and it must hold a DataNode.
        return position == 0 and isinstance(child, DataNode)
This implementation already provides the basic PSyIR functionality and the node can be integrated and used in the PSyIR tree:
>>> from psyclone.psyir.nodes import Literal, Schedule
>>> from psyclone.psyir.symbols import INTEGER_TYPE
>>> from code_snippets.newnode import MyNode
>>> mynode = MyNode(children=[Literal("1", INTEGER_TYPE)])
>>> mynode.children.append(Literal("2", INTEGER_TYPE))
Traceback (most recent call last):
...
psyclone.errors.GenerationError: Generation Error: Item 'Literal' can't be child 1 of 'MyNodeName'. The valid format is: 'DataNode'.
>>> schedule = Schedule()
>>> schedule.addchild(mynode)
>>> print(schedule.view(colour=False))
Schedule[]
    0: MyNodeName[]
        Literal[value:'1', Scalar<INTEGER, UNDEFINED>]
For a full list of methods available in any PSyIR node see the Node reference guide.
Note
For convenience, the PSyIR children validation is performed both by Node
methods (e.g. node.addchild()) and by list methods
(e.g. node.children.extend([node1, node2])).
To achieve this, we sub-classed the Python list and redefined all methods that modify the list by calling first the PSyIR provided validation method and subsequently, if valid, calling the associated list method and triggering an ‘update’ signal (see Dynamic Tree Updates).
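The validate-then-delegate pattern described in this note can be sketched as follows. This is a minimal stand-alone illustration, not the PSyclone implementation: the class name ChildrenList, its constructor arguments and the use of ValueError are assumptions made for the example, only append and extend are covered, and the 'update' signal is omitted.

```python
class ChildrenList(list):
    """Toy list that validates every item before delegating to list."""

    def __init__(self, owner_name, validate, valid_format):
        super().__init__()
        self._owner_name = owner_name
        self._validate = validate
        self._valid_format = valid_format

    def _check(self, position, item):
        # Raise before the underlying list is touched.
        if not self._validate(position, item):
            raise ValueError(
                f"Item '{type(item).__name__}' can't be child {position} "
                f"of '{self._owner_name}'. The valid format is: "
                f"'{self._valid_format}'.")

    def append(self, item):
        self._check(len(self), item)
        super().append(item)

    def extend(self, items):
        items = list(items)
        # Validate all items first so a failure leaves the list unchanged.
        for offset, item in enumerate(items):
            self._check(len(self) + offset, item)
        super().extend(items)


def one_int_only(position, child):
    """Hypothetical rule: exactly one child, which must be an int."""
    return position == 0 and isinstance(child, int)


children = ChildrenList("MyNodeName", one_int_only, "int")
children.append(1)
try:
    children.append(2)
    second_accepted = True
except ValueError:
    second_accepted = False
try:
    children.extend([3, 4])
    extend_accepted = True
except ValueError:
    extend_accepted = False
```

A complete version would have to override every mutating list method (insert, __setitem__, __delitem__, pop and so on) in the same way.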
The parent-child relationship
To facilitate the PSyIR tree navigation, the parent-child relationship between
nodes is represented with a double reference (providing node.parent and
node.children navigational properties).
However, to maintain the consistency of the double reference, we don’t
allow the node API to manually specify its parent reference. It is
always the responsibility of a parent node to update the parent
reference of its children. To make this possible for any operation
applied to the node.children list, we provide this functionality
in the same list subclass specialisation that does the child
validation checks explained in the previous section. Therefore, all
the following list operations will work as expected:
node.children.insert(0, node1) # Will set node1.parent reference to node
node.children.extend([node2, node3]) # Will set node2 and node3 parent
# references to node
del node.children[1] # Will unset the parent reference of children[1]
node.children = [] # Will unset the parent references of all its previous
# children
node.detach() # Will ask node.parent to free node, as node can't change
# the connection by itself
The only exception to the previous consistency rule is when a node constructor is given the parent reference when building a PSyIR tree top-down. In this case, the single-direction reference will be accepted temporarily, but a child connection operation will need to be done eventually to satisfy the other part of the connection. Any attempt to insert the new node as a child of another node not specified in the constructor will fail as this would break the consistency with the predefined parent reference. For example:
assignment = Assignment()
rhs = Reference(symbol1, parent=assignment) # Predefined parent reference
lhs = Reference(symbol2, parent=assignment) # Predefined parent reference
assignment.children = [lhs, rhs] # Finalise parent-child relationship
node = Reference(symbol3, parent=assignment)
lhs.addchild(node) # Will produce a GenerationError because the node
# constructor specified that its parent would be the
# 'assignment' node
Note that a node which already has a parent won’t be accepted as a child of another node, as this could break any previously existing parent-child relationship.
node1.children.insert(0, child) # Valid
node2.children.insert(0, child) # Will produce a GenerationError
Methods like node.detach(), node.copy() and node.pop_all_children()
can be used to move or replicate existing children into different nodes.
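The rule that only the parent's children list may set or clear a parent reference can be sketched with a small stand-alone model. The classes below are hypothetical and simplified: they reuse the names Node and ChildrenList for illustration, raise ValueError rather than GenerationError, and do not model the constructor-supplied parent described above.

```python
class ChildrenList(list):
    """Toy list that keeps each item's parent reference in step."""

    def __init__(self, owner):
        super().__init__()
        self._owner = owner

    def _adopt(self, item):
        # A node that already has a parent cannot be adopted again.
        if item._parent is not None:
            raise ValueError("Item is already a child of another node.")
        item._parent = self._owner

    def append(self, item):
        self._adopt(item)
        super().append(item)

    def insert(self, index, item):
        self._adopt(item)
        super().insert(index, item)

    def remove(self, item):
        super().remove(item)
        item._parent = None


class Node:
    def __init__(self):
        self._parent = None
        self.children = ChildrenList(self)

    @property
    def parent(self):
        # Read-only: there is deliberately no setter.
        return self._parent

    def detach(self):
        """Ask the parent to release this node, then return it."""
        if self._parent is not None:
            self._parent.children.remove(self)
        return self


node1, node2, child = Node(), Node(), Node()
node1.children.insert(0, child)          # Valid
try:
    node2.children.insert(0, child)      # Rejected: child has a parent
    second_insert_ok = True
except ValueError:
    second_insert_ok = False
node2.children.append(child.detach())    # Moving requires detaching first
```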
Tree Copying
The ability to create a deep-copy of a PSyIR tree is used heavily in PSyclone,
primarily by the PSyIR backends. (This is because those backends often need
to modify the tree while ensuring that the one provided by the caller remains
unchanged.) As mentioned in the previous section, the node.copy() method
provides this functionality:
- Node.copy()
Return a copy of this node. This is a bespoke implementation for PSyIR nodes that will deepcopy some of its recursive data-structure (e.g. the children tree), while not copying other attributes (e.g. top-level parent reference).
- Returns: a copy of this node and its children.
- Return type: psyclone.psyir.nodes.Node
As part of this copy operation, all Symbols referred to in the new tree must
also be replaced with their equivalents from the symbol tables in the new tree.
Since these symbol tables are associated with instances of ScopingNode, it
is ScopingNode._refine_copy which handles this:
- ScopingNode._refine_copy(other)
Refine the object attributes when a shallow copy is not the most appropriate operation during a call to the copy() method.
This method creates a deep copy of the SymbolTable associated with the other scoping node and then calls replace_symbols_using to update all Symbols referenced in the tree below this node.
Warning
Since replace_symbols_using only uses symbol names, this won’t get the correct symbol if the PSyIR has symbols shadowed in nested scopes, e.g.:
subroutine test
  integer :: a
  integer :: b = 1
  if condition then
    ! PSyIR declares a shadowed, locally-scoped a'
    a' = 1
    if condition2 then
      ! PSyIR declares a shadowed, locally-scoped b'
      b' = 2
      a = a' + b'
Here, the final assignment will end up being a’ = a’ + b’ and thus the semantics of the code are changed. TODO #2666.
- Parameters: other (psyclone.psyir.nodes.Node) – object we are copying from.
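The interplay between a shallow copy and a _refine_copy hook can be sketched with a toy tree. Everything below is an illustrative assumption rather than PSyclone code: Symbol, Ref and Scope are hypothetical stand-ins for the real symbol, Reference and ScopingNode classes, and the symbol table is reduced to a plain dictionary keyed by name.

```python
import copy


class Symbol:
    def __init__(self, name):
        self.name = name


class Node:
    def __init__(self, children=None):
        self.parent = None
        self.children = []
        for child in children or []:
            child.parent = self
            self.children.append(child)

    def copy(self):
        new = copy.copy(self)      # shallow copy of every attribute
        new._refine_copy(self)     # then repair what must not be shared
        return new

    def _refine_copy(self, other):
        self.parent = None         # the top-level parent is not copied
        self.children = []
        for child in other.children:
            child_copy = child.copy()
            child_copy.parent = self
            self.children.append(child_copy)


class Ref(Node):
    def __init__(self, symbol):
        super().__init__()
        self.symbol = symbol


class Scope(Node):
    def __init__(self, symbols, children=None):
        super().__init__(children)
        self.symbols = symbols     # name -> Symbol

    def _refine_copy(self, other):
        super()._refine_copy(other)
        # New table, then re-point references to the new symbols by name.
        self.symbols = {name: copy.copy(sym)
                        for name, sym in other.symbols.items()}
        stack = list(self.children)
        while stack:
            node = stack.pop()
            if isinstance(node, Ref):
                node.symbol = self.symbols[node.symbol.name]
            stack.extend(node.children)


a = Symbol("a")
scope = Scope({"a": a}, [Ref(a)])
clone = scope.copy()
```

Because the replacement is done purely by name, this toy has the same limitation with shadowed symbols in nested scopes as the warning above describes.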
Dynamic Tree Updates
Certain modifications to a PSyIR tree will require that parent nodes
also be updated. For instance, if nodes are added to or removed from
an OpenACC data region, then the clauses describing the
necessary data movement (to/from the accelerator device) may have to
change. To support such use cases, the PSyIR Node has the update_signal
method which is used to signal that the tree has been modified. This
signal is propagated up the tree (i.e. from parent to parent). The default
handler for this signal, Node._update_node, does nothing. If a sub-class
must take action when the tree below it is modified then it must override
the _update_node method as appropriate.
Note that the signalling mechanism is fully contained within the Node
class and takes care of avoiding recursive updates to the same Node instance.
It should therefore only be necessary for a class to implement the
_update_node handler.
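A minimal sketch of such a signalling mechanism is given below. It is a stand-alone toy under stated assumptions: the _updating flag, the DataRegion class and its refreshes counter are hypothetical names chosen for the example, and only update_signal and _update_node come from the text above.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self._updating = False     # guard against recursive updates

    def update_signal(self):
        """Run this node's handler, then pass the signal to the parent."""
        if self._updating:
            return                 # already handling an update: ignore
        self._updating = True
        try:
            self._update_node()
        finally:
            self._updating = False
        if self.parent is not None:
            self.parent.update_signal()

    def _update_node(self):
        """Default handler: do nothing."""


class DataRegion(Node):
    """Hypothetical region that must refresh itself when its body changes."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self.refreshes = 0

    def _update_node(self):
        self.refreshes += 1
        # A handler that itself modifies the tree may trigger another
        # signal; the guard in update_signal stops this from recursing.
        self.update_signal()


root = DataRegion()
middle = Node(parent=root)
leaf = Node(parent=middle)
leaf.update_signal()
```

Only DataRegion overrides the handler; the plain Node instances simply forward the signal upwards.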
Selected Node Descriptions
ScopingNode
A ScopingNode is an abstract class node that defines a scoping region: this node and all its descendants have access to a shared set of symbols. These symbols are described in the SymbolTable (psyclone.psyir.symbols.SymbolTable) attached to this node.
There is a double-link between the ScopingNode (through the symbol_table
property) and the SymbolTable (through the scope property) objects. To
maintain a consistent connection between both objects, the only public methods
to update the connections are the attach and detach methods of SymbolTable
(which take care of both sides of the connection).
Also note that the constructor will not accept as a parameter a symbol table that already belongs to another scope. The symbol table will need to be detached or deep copied before it can be assigned to the new ScopingNode.
See the full API in the ScopingNode reference guide.
Container
The Container node is a ScopingNode that contains one or more Container and/or Routine nodes. A Container can be used to capture a hierarchical grouping of Routine nodes and a hierarchy of Symbol scopes i.e. a Symbol specified in a Container is visible to all Container and Routine nodes within it and their descendants. See the full Container API in the Container reference guide.
FileContainer
The FileContainer node is a subclass of the Container node and is used to capture the concept of a file that contains one or more Container and/or Routine nodes. Whilst this structure is the same as for a Container, it is useful to distinguish between the two as backends may need to deal differently with a FileContainer and a Container.
A FileContainer is always created at the root of the PSyIR tree when parsing Fortran code, as a Fortran file can contain one or more program units (captured as Containers and/or Routines). See the full FileContainer API in the FileContainer reference guide.
Schedule
The Schedule is a ScopingNode that represents a sequence of statements. See the full Schedule API in the Schedule reference guide.
Routine
The Routine node is a subclass of Schedule that represents any program unit (subroutine, function or main program). As such it extends Schedule through the addition of the return_symbol (required when representing a function) and is_program properties. It also adds the create helper method for constructing a valid Routine instance. It is an important node in PSyclone because two of its specialisations, InvokeSchedule and KernelSchedule (described below), are used as the root nodes of PSy-layer invokes and kernel subroutines. This makes them the starting points for any walking of the PSyIR tree in PSyclone transformation scripts and a common target for the application of transformations.
InvokeSchedule
The InvokeSchedule is a PSyIR node that represents an invoke subroutine in the PSy-layer. It specialises the psyclone.psyir.nodes.Routine functionality with a reference to its associated psyclone.psyGen.Invoke object.
Note
This class will be renamed to InvokeRoutine in issue #909.
KernelSchedule
The KernelSchedule is a PSyIR node that represents a Kernel subroutine. As such it is a subclass of psyclone.psyir.nodes.Routine with return_symbol set to None and is_program set to False.
Note
This class will be renamed to KernelRoutine in issue #909.
Control-Flow Nodes
The PSyIR has four control flow nodes: IfBlock, Loop, WhileLoop and Call. These nodes represent the canonical structure with which conditional branching constructs, iteration constructs and accesses to other blocks of code are built. Additional language-specific syntax for branching and iteration will be normalised to use these same constructs. For example, Fortran has the additional branching constructs ELSE IF, SELECT CASE and SELECT TYPE: when a Fortran code is translated into the PSyIR, PSyclone will build a semantically equivalent implementation using IfBlock nodes (and an additional CodeBlock containing SELECT TYPE in the case of SELECT TYPE). Similarly, Fortran also has the WHERE construct and statement which are represented in the PSyIR with a combination of Loop and IfBlock nodes. Such nodes in the new tree structure are annotated with information to enable the original language-specific syntax to be recreated if required (see below). See the full IfBlock API in the IfBlock reference guide. The PSyIR also supports the concept of named arguments for Call nodes, see the Named arguments section for more details.
Note
A Call node (like the CodeBlock) inherits from both Statement and DataNode because it can be found in Schedules or inside Expressions, however this has some shortcomings, see issue #1437.
Control-Flow Node annotation
If the PSyIR is constructed from existing code (using e.g. the fparser2 frontend) then it is possible that information about that code may be lost. This is because the PSyIR is only semantically equivalent to certain code constructs. In order that information is not lost (making it possible to e.g. recover the original code structure if desired) Nodes may have annotations associated with them. The annotations, the Node types to which they may be applied and their meanings are summarised in the table below:
Annotation | Node types | Origin |
---|---|---|
was_elseif | IfBlock | else if |
was_single_stmt | IfBlock, Loop | if(logical-expr)expr or Fortran where(array-mask)array-expr |
was_case | IfBlock | Fortran select case construct |
was_where | Loop, IfBlock | Fortran where construct |
was_unconditional | WhileLoop | Fortran do loop with no condition |
was_type_is | IfBlock | Fortran type is construct within a select type construct |
was_class_is | IfBlock | Fortran class is construct within a select type construct |
Note
A Loop may currently only be given the was_single_stmt annotation if it also has the was_where annotation. (Thus indicating that this Loop originated from a WHERE statement in the original Fortran code.) The PSyIR represents Fortran single-statement loops (often called array notation) as arrays with ranges in the appropriate indices.
Loop Node
The Loop node is the canonical representation of a counted loop. It has the start, stop, step and loop_body of the loop as its children. The node has the same semantics as the Fortran do construct: the boundary values are inclusive (both are part of the iteration space) and the start, stop and step expressions are evaluated just once at the beginning of the loop.
For more details on the Loop node, see the full API in the reference guide.
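The loop semantics described above (inclusive bounds, with start, stop and step evaluated once) can be illustrated with a small stand-alone function. The function name loop_values is hypothetical and the code is only a sketch of the Fortran do-construct rule, in which the iteration count is fixed before the first iteration; it does not use any PSyclone API.

```python
def loop_values(start, stop, step=1):
    """Values visited by a counted loop whose bounds are both inclusive."""
    if step == 0:
        raise ValueError("step must be non-zero")
    # The trip count is computed once, up front, from the initial values of
    # start, stop and step; a non-positive count means zero iterations.
    count = max((stop - start + step) // step, 0)
    return [start + i * step for i in range(count)]


ascending = loop_values(1, 5)         # stop value 5 is included
strided = loop_values(4, 10, 2)
descending = loop_values(10, 1, -3)
empty = loop_values(5, 1)             # start beyond stop: no iterations
```

Because the count is fixed before the loop body runs, changing the variables used in the bound expressions inside the body would not alter the number of iterations.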
WhileLoop Node
The WhileLoop node is the canonical representation of a while loop. The PSyIR representation of the Fortran do loop with no condition will have the annotation was_unconditional, but is otherwise no different from that of a do while loop whose condition is the logical constant .TRUE..
For more details on the WhileLoop node, see the full API in the reference guide.
Ranges
The PSyIR has the Range node which represents a range of integer values with associated start, stop and step properties. For example, the list of values [4, 6, 8, 10] would be represented by a Range with a start value of 4, a stop value of 10 and a step of 2 (all stored as Literal nodes). This class is intended to simplify the construction of Loop nodes as well as to support array slicing (see below). However, this functionality is under development and at this stage neither of those options has been implemented.
The Range node must also provide support for array-slicing constructs where a user may wish to represent either the entire range of possible index values for a given dimension of an array or a sub-set thereof. e.g. in the following Fortran:
real, dimension(10, 5) :: my_array
call some_routine(my_array(1, :))
the argument to some_routine is specified using array syntax where the lone colon means every element in that dimension. In the PSyIR, this argument would be represented by an ArrayReference node with the first entry in its shape being an integer Literal (with value 1) and the second entry being a Range. Since the colon appears in the second dimension, the Range will have a start value of LBOUND(my_array, 2), a stop value of UBOUND(my_array, 2) and a step of Literal("1"). Note that LBOUND and UBOUND will be instances of IntrinsicCall. (For the particular code fragment given above, the values are in fact known [1 and 5, respectively] and could be obtained by querying the Symbol Table.)
See the full Range API in the Range reference guide.
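As an illustration only, the two uses of a range discussed in this section (an explicit start/stop/step triple and a full-dimension slice) can be modelled with a small dataclass. ToyRange and full_slice are hypothetical names, plain integers stand in for Literal nodes, and the declared bounds are passed in as a list instead of being looked up in a symbol table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRange:
    start: int
    stop: int
    step: int = 1

    def values(self):
        """All values in the range; the stop value is inclusive."""
        end = self.stop + (1 if self.step > 0 else -1)
        return list(range(self.start, end, self.step))


def full_slice(bounds, dim):
    """Range for a lone ':' in dimension dim (1-based, as in Fortran)."""
    lower, upper = bounds[dim - 1]
    return ToyRange(lower, upper)


evens = ToyRange(4, 10, 2)

# real, dimension(10, 5) :: my_array  ->  bounds of each dimension
my_array_bounds = [(1, 10), (1, 5)]
# my_array(1, :) has its colon in the second dimension.
colon = full_slice(my_array_bounds, 2)
```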
Operation Nodes
Arithmetic and logic operations are represented in the PSyIR by sub-classes of the Operation node. The operations are classified according to the number of operands:
Those having one operand are represented by psyclone.psyir.nodes.UnaryOperation nodes,
those having two operands are represented by psyclone.psyir.nodes.BinaryOperation nodes.
See the documentation for each Operation class in the Operation, UnaryOperation and BinaryOperation sections of the reference guide.
Note
Similar to Fortran, the PSyIR has two comparison operators, one for booleans (EQV) and one for non-booleans (EQ). These are not interchangeable because they have different precedence priorities and some compilers will not compile with the wrong operator. In some cases we need to insert a comparison of two expressions and we don’t know the datatype of the operands (e.g. in the select-case canonicalisation). A solution to this is to create an abstract interface with appropriate implementations for each possible datatype.
Data Type of an Operation Node
Table 7.2 of the Fortran2008 standard specifies the rules governing the types of operands and their results. The PSyIR follows these rules with the exception that there is no support for symbols of complex (imaginary) type (see #1590). For unary operations, the type of the result is just that of the operand. For a numeric, binary operation, these rules boil down to saying that if either argument is real then the result is real but if both arguments are integer then the result is integer.
If the precisions of the operands are the same, then the result must also be of that precision. Otherwise, Section 7.1.9.3 of the Fortran2008 standard says that the precision of the result is the greater of the two. In the PSyIR, if both precisions are instances of ScalarType.Precision or int then this permits the precision of the result to be determined. Otherwise, the result is given a precision of ScalarType.Precision.UNDEFINED.
For comparison operations (e.g. <, ==), the intrinsic type of the result is always boolean. If either or both operands are arrays, then the result is a boolean array.
The PSyIR type system includes support for those situations where PSyclone is not able to fully understand a variable declaration. In such cases, the type is an instance of UnsupportedFortranType which stores both the original declaration and, optionally, a partial_datatype holding the aspects of the type that can be represented in the PSyIR. The presence of a partial_datatype implies that we fully understand the intrinsic type. Given this and an array shape, it is always possible to determine the result of a numerical operation involving such a type.
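The rules for a numeric binary operation, as stated in this section, can be sketched with plain tuples. This is a simplified model and not the PSyIR type system: the function names are hypothetical, a type is an (intrinsic, precision) pair, integer precisions stand for known values, and any other precision (here a string naming a kind parameter) stands for one that cannot be compared.

```python
UNDEFINED = "UNDEFINED"


def result_precision(prec1, prec2):
    """Precision of the result, following the rules given in the text."""
    if prec1 == prec2:
        return prec1                      # same precision is preserved
    if isinstance(prec1, int) and isinstance(prec2, int):
        return max(prec1, prec2)          # both known: take the greater
    return UNDEFINED                      # cannot be determined


def numeric_result(type1, type2):
    """Result type of a numeric binary operation on two scalar types."""
    (intrinsic1, prec1), (intrinsic2, prec2) = type1, type2
    # Real wins over integer; integer only if both operands are integer.
    intrinsic = "real" if "real" in (intrinsic1, intrinsic2) else "integer"
    return intrinsic, result_precision(prec1, prec2)


mixed = numeric_result(("integer", 4), ("real", 4))
widened = numeric_result(("real", 4), ("real", 8))
same_kind = numeric_result(("real", "wp"), ("real", "wp"))
unknown = numeric_result(("real", "wp"), ("real", 8))
both_int = numeric_result(("integer", 4), ("integer", 4))
```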
IntrinsicCall Nodes
PSyIR IntrinsicCall nodes (see IntrinsicCall) capture all PSyIR intrinsics that are not expressed as language symbols (+, -, * etc.). The latter are captured as Operation nodes. At the moment the available PSyIR IntrinsicCalls match those of the Fortran 2018 standard. In addition to Fortran intrinsics, special Fortran statements such as ALLOCATE, DEALLOCATE and NULLIFY are also PSyIR IntrinsicCalls.
IntrinsicCalls, like Calls, have properties to inform if the call is to a pure, elemental, inquiry (does not touch the first argument data) function or is available on a GPU device.
SUM, PRODUCT, LBOUND, and UBOUND are not documented as having support on GPUs according to the current Nvidia documentation, however we have confirmed them experimentally and so PSyclone treats them as available on GPU devices.
CodeBlock Node
The PSyIR CodeBlock node contains code that has no representation in the PSyIR. It is useful as it allows the PSyIR to represent complex code by using CodeBlocks to handle the parts which contain unsupported language features. One approach would be to work towards capturing all language features in the PSyIR, which would gradually remove the need for CodeBlocks. However, the purpose of the PSyIR is to capture code concepts that are relevant for performance, not all aspects of a code; therefore it is likely that CodeBlocks will continue to be an important part of the PSyIR. See the full CodeBlock API in the CodeBlock reference guide.
The code represented by a CodeBlock is currently stored as a list of fparser2 nodes. Therefore, a CodeBlock’s input and output language is limited to being Fortran. This means that only the fparser2 front-end and Fortran back-end can be used when there are CodeBlocks within a PSyIR tree. In theory, language interfaces could be written between CodeBlocks and other PSyIR Nodes to support different back-ends but this has not been implemented.
Currently the PSyIR has a single CodeBlock node that can be found
in place of full Statements or as part of an expression that
evaluates to a DataNode. To make this possible, CodeBlock is a subclass
of both Statement and DataNode. However, in certain situations we
still need to differentiate which one it is; for instance, the Fortran
back-end needs this information, as expressions do not need indentation
and a newline whereas statements do.
For this reason, CodeBlock has a structure method that indicates
whether the code contains one or more unrecognized language expressions
or one or more statements (which may themselves contain expressions).
The Fortran front-end populates the structure attribute by relying on a
feature of the fparser2 node list: if the first node in the list is a
statement then so are all the other nodes in the list, and if the first
node in the list is an expression then so are all the other nodes in the
list. This allows the structure method to return a single value that
represents all nodes in the list.
The structure of the PSyIR hierarchy is used to determine whether the
code in a CodeBlock contains expressions or statements. This is
achieved by looking at the parent PSyIR Node. If the parent Node is a
Schedule then the CodeBlock contains one or more statements, otherwise
it contains one or more expressions.
This logic works for existing PSyIR nodes and relies on any future PSyIR nodes being constructed so this continues to be true. Another solution would be to have two different nodes: StatementsCodeBlock which subclasses Statement, and DataCodeBlock which subclasses DataNode. We have chosen the first implementation for the simplicity of having a single PSyIR node instead of two, but if things get more complicated using this implementation, the second alternative could be considered again.
ArrayMixin
ArrayMixin is an abstract “mix-in” base class which implements
various methods that are specific to those nodes representing arrays
and array accesses. It is subclassed by ArrayReference,
ArrayOfStructuresReference, ArrayMember and ArrayOfStructuresMember.
Reference Node
The PSyIR Reference Node represents a variable access. It keeps
a reference to a Symbol which will be stored in a symbol table.
See the full Reference API in the Reference reference guide.
ArrayReference Node
The PSyIR ArrayReference Node represents an access to one or more
elements of an array variable. It keeps a reference to a Symbol which
will be stored in a symbol table. The indices used to access the array
element(s) are represented by the children of the node. The
ArrayReference Node inherits from both the Reference and ArrayMixin
classes. See the full API in the ArrayReference reference guide.
Directive
The PSyIR Directive Node represents a directive, such as those used
in OpenMP or OpenACC. There are two subclasses, RegionDirective
and StandaloneDirective. RegionDirective nodes contain a
Schedule as their first child, which holds the code segment covered
by the directive, for example a Loop to which an OpenMP parallel
do may be applied.
Both RegionDirective and StandaloneDirective may also have
Clause nodes as children, which can be accessed through the clauses
member. See the full API in the Directive reference guide.
Warning
Some parts of some Clauses are still under development, and not all clauses
are encoded in Clause classes yet (for example OpenACC clauses). These
clause strings are instead generated inside the begin_string or gen_code
methods during code generation.
Named arguments
The Call node (and its sub-classes) support named arguments.
The argument names are provided by the argument_names property. This property returns a list of names. The first entry in the list refers to the first argument, the second entry in the list refers to the second argument, etc. An argument name is stored as a string. If an argument is not a named argument then the list entry will contain None. For example, for the following call:
call example(arg0, name1=arg1, name2=arg2)
the following list would be returned by the argument_names property:
[None, "name1", "name2"]
It was decided to implement it this way, rather than adding a new (NamedArgument) node, as 1) there is no increase in the number and types of PSyIR nodes and 2) iterating over all children (the arguments) of these nodes is kept simple.
The following methods support the setting and updating of named arguments: create(), append_named_arg(), insert_named_arg() and replace_named_arg().
However, this implementation raises consistency problems as it is possible to insert, modify, move or delete children (argument) nodes directly. This would make the argument names list inconsistent as the names themselves are stored within the node.
To solve this problem, the argument names are stored internally in an _argument_names list which not only keeps the argument names but also keeps a reference (the id) to the associated child argument. An internal _reconcile() method then checks whether the internal _argument_names list and the actual arguments match and fixes any inconsistencies.
The _reconcile() method is called before the argument_names property returns its values, thereby ensuring that any access to argument_names is always consistent.
The _reconcile() method looks through the arguments and tries to match them with one of the stored ids. If there is no match it is assumed that this is not a named argument. This approach has the following behaviour: the argument names are kept if arguments are re-ordered; an argument that has replaced a named argument will not be a named argument; an inserted argument will not be a named argument; and the name of a deleted named argument will be removed.
Making a copy of the Call node also causes problems with consistency between the internal _argument_names list and the arguments. The reason for this is that the arguments get copied and therefore have a different id, whereas the ids in the internal _argument_names list are simply copied. To solve this problem, the copy() method is specialised to update the ids. A second issue is that the internal _argument_names list may already be inconsistent when a copy is made. Therefore the _reconcile() method is also called in the specialisation of the copy() method.
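The reconciliation approach described in this section can be sketched with a toy class. ToyCall is a hypothetical stand-in for the Call node: its arguments attribute plays the role of the node's children, and only the id-matching logic is modelled (copying and the other helper methods are left out).

```python
class ToyCall:
    def __init__(self):
        self.arguments = []          # may be modified directly by users
        self._argument_names = []    # list of (id(argument), name) pairs

    def append_named_arg(self, name, arg):
        self._argument_names.append((id(arg), name))
        self.arguments.append(arg)

    def _reconcile(self):
        """Rebuild the internal list so it matches the current arguments."""
        names_by_id = dict(self._argument_names)
        # Arguments with no stored id are assumed not to be named.
        self._argument_names = [
            (id(arg), names_by_id.get(id(arg))) for arg in self.arguments]

    @property
    def argument_names(self):
        self._reconcile()
        return [name for _, name in self._argument_names]


arg0, arg1, arg2 = object(), object(), object()
call = ToyCall()
call.append_named_arg(None, arg0)
call.append_named_arg("name1", arg1)
call.append_named_arg("name2", arg2)
names_initial = call.argument_names

call.arguments.reverse()             # re-ordering keeps the names
names_reordered = call.argument_names

call.arguments[0] = object()         # replacing a named argument drops it
names_replaced = call.argument_names

del call.arguments[1]                # deleting removes the stored name
names_deleted = call.argument_names
```

Matching on id works here because every argument object stays alive for the whole example; it is the same assumption that makes the ids go stale when a node is copied.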
References to Structures and Structure Members
The PSyIR has support for representing references to symbols of
structure type and to members of such structures. Since the former
case is still a reference to a symbol held in a symbol table, it is
already captured by the Reference node. A reference that includes
an access to a member of a structure is described by a
StructureReference, which is a subclass of Reference. As such,
it has a symbol property which gives the Symbol that the
reference is to. The member of the structure being accessed is
described by a Member (or subclass) which is stored as the
first and only child of the StructureReference. The full API is
given in the StructureReference section of the reference guide.
Similarly, ArrayOfStructuresReference represents a reference to a
member of one or more elements of an array of structures. As such it
subclasses both ArrayMixin and StructureReference. As with the
latter, the first child describes the member being accessed and will
be an instance of (a subclass of) Member. Subsequent
children (of which there must be at least one since this is an array
reference) then describe the array-index expressions of the reference
in the usual fashion for an ArrayReference. The full API is given
in the ArrayOfStructuresReference section of the reference guide.
Since members of structures are not represented by symbols in a symbol
table, references to them are not subclasses of Reference. They are
instead represented by instances of Member (or subclasses
thereof). There are four of these:
| Class | Type of Accessor Nested Inside |
|---|---|
| Member | No nested accessor (i.e. is a leaf) |
| ArrayMember | One or more elements of an array |
| StructureMember | Member of a structure |
| ArrayOfStructuresMember | Member of one or more elements of an array of structures |
These classes are briefly described below. For full details please follow the appropriate links to the Reference Guide.
Member
This node is used for accesses to members of a structure which do not contain
any further accesses nested inside. In a PSyIR tree, any instance of this node
type must therefore have no children and a StructureReference or
StructureMember (or subclasses thereof) as parent. The full API is given
in the Member section of the reference guide.
ArrayMember
This node represents an access to one or more elements of an array
within a structure. As such, it subclasses both
Member and ArrayMixin. Its children follow the same rules
as for an ArrayReference node. The full API is given in the
ArrayMember section of the reference guide.
StructureMember
This node represents an access to a member of a structure that is
itself a member of a structure. As such, it has a single child which subclasses
Member and specifies which component is being accessed. The full API
is given in the StructureMember section of the reference guide.
ArrayOfStructuresMember
This node represents an access to a member of one or more elements of an array
of structures that is itself a member of a structure. Its first child must be a
subclass of Member. Subsequent children represent the index expressions
for the array access. The full API is given in the
ArrayOfStructuresMember section of the reference guide.
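To make the child-ordering rules of the four member classes concrete, the following sketch models them with plain Python classes. It is only an illustration under simplifying assumptions: the Toy* classes are hypothetical, index expressions are plain strings rather than nodes, and the render() method is invented here to show how the nesting maps onto a Fortran-style accessor.

```python
class ToyMember:
    """Leaf accessor: has no children."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def render(self):
        return self.name


class ToyArrayMember(ToyMember):
    """Array accessor: every child is an index expression."""

    def __init__(self, name, indices):
        super().__init__(name)
        self.children = list(indices)

    def render(self):
        return f"{self.name}({', '.join(self.children)})"


class ToyStructureMember(ToyMember):
    """Structure accessor: the single child is the nested member."""

    def __init__(self, name, member):
        super().__init__(name)
        self.children = [member]

    def render(self):
        return f"{self.name}%{self.children[0].render()}"


class ToyArrayOfStructuresMember(ToyMember):
    """First child is the nested member, the rest are index expressions."""

    def __init__(self, name, member, indices):
        super().__init__(name)
        self.children = [member] + list(indices)

    def render(self):
        indices = ", ".join(self.children[1:])
        return f"{self.name}({indices})%{self.children[0].render()}"


class ToyStructureReference:
    """Reference to a symbol plus one child describing the member access."""

    def __init__(self, symbol_name, member):
        self.symbol = symbol_name
        self.children = [member]

    def render(self):
        return f"{self.symbol}%{self.children[0].render()}"


# grid%regions(r)%area%data(i, j), built from the innermost accessor outwards.
data = ToyArrayMember("data", ["i", "j"])
area = ToyStructureMember("area", data)
regions = ToyArrayOfStructuresMember("regions", area, ["r"])
ref = ToyStructureReference("grid", regions)
print(ref.render())   # grid%regions(r)%area%data(i, j)
```

Only the outermost node carries a symbol; everything below it is identified by name alone, which mirrors the statement that members are not held in a symbol table.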
Data Type of a Structure Access
In order to get the actual data type of a structure reference, PSyclone
needs to have access to the declaration of all structures involved
in the accessor expression. However, these are often UnresolvedType if the
module where they are declared has not been processed. In the case of
some domain-API arguments added by PSyclone to a kernel call (e.g. the
indices in GOcean, or additional field information in LFRic), the type
of these structure accesses is actually known. When creating a
structure reference, there is an option, overwrite_datatype,
which can be set to avoid the need to have details of the required
structures. For example, the following code is used to declare that
an access like op_proxy%ncell_3d is an LFRic integer:
self.append_structure_reference(
    operator["module"], operator["proxy_type"], ["ncell_3d"],
    arg.proxy_name_indexed,
    overwrite_datatype=LFRicTypes("LFRicIntegerScalarDataType")())
While most of PSyclone works without having access to this detailed information, the driver creation for kernel extraction (see Kernel Extraction (PSyKE)) needs this information to declare the variables in the driver.
Domain-Specific PSyIR
The discussion so far has been about generic language-level
PSyIR. This is located in the psyir directory and contains nodes,
symbols, transformations, front-ends and back-ends. None of this is
domain specific.
To obtain domain-specific concepts the language-level PSyIR can be
specialised or extended. All domains follow the PSyKAl separation of
concerns, with the Algorithm-layer and the PSy-layer each having its own
domain-specific concepts. These can be found in
psyclone.domain.common.algorithm and psyclone.domain.common.psylayer
respectively (some concepts are still in psyclone.psyGen for legacy
reasons but will be moved to the new locations over time).
PSy-layer concepts
The PSyLoop is a Loop where the boundaries are given by the domain specific iteration space that the kernels are applied to. In turn it is sub-classed in all of the domains supported by PSyclone. This then allows the class to be configured with a list of valid loop ‘types’. For instance, the GOcean sub-class, GOLoop, has “inner” and “outer” while the LFRic sub-class, LFRicLoop, has “dofs”, “colours”, “colour”, “” and “null”. The default loop type (iterating over cells) is here indicated by the empty string. The concept of a “null” loop type is currently required because the dependency analysis that determines the placement of halo exchanges is handled within the Loop class. As a result, every Kernel call must be associated with a Loop node. However, the LFRic domain has support for kernels which operate on the ‘domain’ and thus do not require a loop over cells or dofs in the generated PSy layer. Supporting an LFRicLoop of “null” type allows us to retain the dependence-analysis functionality within the Loop while not actually producing a loop in the generated code. When #1148 is tackled, the dependence-analysis functionality will be removed from the Loop class and this concept of a “null” loop can be dropped.
The Kern, which can be of type CodedKern, InlinedKern or BuiltIn, is the singular unit of computation that can be found inside a PSyLoop.
The HaloExchange is a distributed-memory concept in the PSy-layer.
The GlobalSum is a distributed-memory concept in the PSy-layer.
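The way a PSyLoop sub-class is configured with its list of valid loop types can be sketched as follows. This is a minimal toy model, not PSyclone's implementation: the Toy* classes, the valid_loop_types attribute and the emits_loop() method are assumptions made for illustration, while the loop-type strings are the ones listed above.

```python
class ToyPSyLoop:
    """Toy stand-in for a PSy-layer loop (not PSyclone's PSyLoop)."""

    valid_loop_types = ()   # each domain sub-class supplies its own values

    def __init__(self, loop_type):
        self.loop_type = loop_type   # goes through the validating setter

    @property
    def loop_type(self):
        return self._loop_type

    @loop_type.setter
    def loop_type(self, value):
        if value not in self.valid_loop_types:
            raise ValueError(
                f"Loop type {value!r} is not one of {self.valid_loop_types}")
        self._loop_type = value

    def emits_loop(self):
        """Hypothetical hook: is a loop produced in the generated code?"""
        return True


class ToyGOLoop(ToyPSyLoop):
    valid_loop_types = ("inner", "outer")


class ToyLFRicLoop(ToyPSyLoop):
    # The empty string is the default loop type (iterating over cells).
    valid_loop_types = ("dofs", "colours", "colour", "", "null")

    def emits_loop(self):
        # A "null" loop exists only so that the kernel has a parent loop
        # node; no loop is produced for it in the generated code.
        return self.loop_type != "null"


print(ToyLFRicLoop("null").emits_loop())   # False
print(ToyGOLoop("inner").emits_loop())     # True
```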
Other specializations
In LFRic there are specialisations for kernel-layer datatypes and symbols. For the algorithm layer in both GOcean1.0 and LFRic there are specialisations for invokes and kernel calls. This is discussed further in the following sections.
The LFRic PSyIR
The LFRic PSyIR is a set of subclasses of the PSyIR which captures
LFRic-specific routines, datatypes and associated symbols. These
subclasses are a work in progress and at the moment are limited to 1) a
subset of the datatypes passed into LFRic kernels by argument and by
use association and 2) LFRic invoke and kernel calls
(LFRicAlgorithmInvokeCall and LFRicKernelFunctor) in the LFRic
algorithm-layer. Over time these will be expanded to support a) all
LFRic kernel datatypes, b) all LFRic PSyIR datatypes, c) subroutines
(KernRoutine etc.), d) derived quantities e.g. iterator variables and
eventually e) higher level LFRic PSyIR concepts, which will not be
concerned with symbol tables and datatypes.
The Kernel-layer subclasses will be used to:
check that the data types, dimensions, intent etc. of a coded kernel’s subroutine arguments conform to the expected datatypes, dimensions, intent etc as defined by the kernel metadata and associated LFRic rules.
represent coded kernels, which will make it easier to reason about the structure of a kernel. At the moment a coded kernel is translated into generic PSyIR. This generic PSyIR will be further translated into LFRic PSyIR using the expected datatypes as specified by the kernel metadata and associated LFRic rules.
replace the existing kernel stub generation implementation so that the PSyIR back ends can be used and PSyclone will rely less on f2pygen and fparser1. At the moment kernel_interface provides the same functionality as kern_stub_arg_list, except that it uses the symbol table (which keeps datatypes and their declarations together).
generate the PSy-layer, replacing the existing kern_call_arg_list and gen_call routines.
The Algorithm-layer subclasses will be used to:
help with transforming the algorithm layer.
help with reasoning about the algorithm layer e.g. to check that the algorithm layer and kernel metadata match.
generate the LFRic Algorithm-layer PSyIR e.g. in psyclone-kern.
Algorithm-layer Classes
The LFRic PSyIR for the Algorithm layer is captured in the
domain/lfric/algorithm/psyir.py module. Three classes are currently
provided statically: one to capture an invoke call, LFRicAlgorithmInvokeCall,
and two to capture Builtin and (coded) Kernel calls within an invoke
call, LFRicBuiltinFunctor and LFRicKernelFunctor respectively.
The LFRicBuiltinFunctorFactory class dynamically creates a
subclass of LFRicBuiltinFunctor for every LFRic
Builtin. These are named following the
scheme LFRic_<BUILTIN_NAME>_Functor so that, for example, the Setval_X
builtin is represented by the LFRic_Setval_X_Functor class. An instance
of the appropriate class may be obtained using the factory’s create method:
LFRicBuiltinFunctorFactory.create(name, table, arguments)
Create a BuiltinFunctor for the named LFRic builtin.
Parameters:
name (str) – the built-in for which a functor is required.
table (psyclone.psyir.symbols.SymbolTable) – the symbol table to which to add a corresponding symbol.
arguments (List[psyclone.psyir.nodes.DataNode]) – the arguments to give to the functor.
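The dynamic creation of one functor class per builtin can be sketched with Python's three-argument type() call. The code below is a stand-alone toy, not the PSyclone factory: the Toy* classes and the short list of builtin names are assumptions, and the symbol-table argument of the real create method is omitted for brevity. Only the LFRic_<BUILTIN_NAME>_Functor naming scheme is taken from the text.

```python
class ToyBuiltinFunctor:
    """Hypothetical base class for all generated functor classes."""

    _builtin_name = ""

    def __init__(self, arguments):
        self.arguments = list(arguments)


class ToyBuiltinFunctorFactory:
    """Toy factory that generates one functor sub-class per builtin."""

    # Assumed, deliberately short list of builtin names for illustration.
    _builtin_names = ("Setval_X", "Setval_C")

    def __init__(self):
        self._classes = {}
        for name in self._builtin_names:
            class_name = f"LFRic_{name}_Functor"
            # type(name, bases, namespace) creates a new class at run time.
            self._classes[name.lower()] = type(
                class_name, (ToyBuiltinFunctor,), {"_builtin_name": name})

    def create(self, name, arguments):
        """Return an instance of the functor class for the named builtin."""
        try:
            functor_class = self._classes[name.lower()]
        except KeyError as err:
            raise KeyError(f"Unrecognised builtin name: {name!r}") from err
        return functor_class(arguments)


factory = ToyBuiltinFunctorFactory()
functor = factory.create("setval_x", ["field_a", "field_b"])
print(type(functor).__name__)   # LFRic_Setval_X_Functor
```

Generating the classes in a loop avoids writing a near-identical class definition by hand for every builtin.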
Kernel-layer Classes
The class LFRicTypes in domain/lfric/lfric_types.py manages
the various LFRic data types. It provides a simple interface
to get standard classes for LFRic data. For example:
>>> from psyclone.domain.lfric import LFRicTypes
>>> NumberOfUniqueDofsDataSymbol = LFRicTypes("NumberOfUniqueDofsDataSymbol")
The relevant classes are dynamically generated to avoid boilerplate code and to make it simpler to change the LFRic infrastructure classes in the future.
The idea is to declare different classes for the different
concepts. For example NumberOfDofsDataType() and
NumberOfDofsDataSymbol() classes are created and these are
subclasses of DataType and DataSymbol respectively. In
NumberOfDofsDataType the intrinsic and precision
properties are pre-defined, as is the fact that it is a scalar, so
these do not need to be specified. All that is needed to create a
undf symbol is a name and the function space it represents:
>>> UNDF_W3 = NumberOfUniqueDofsDataSymbol("undf_w3", "w3")
For arrays (e.g. for FieldData), the dimensions must also be
provided as a Reference:
>>> from psyclone.psyir.nodes import Reference
>>> RealFieldDataDataSymbol = LFRicTypes("RealFieldDataDataSymbol")
>>> FIELD1 = RealFieldDataDataSymbol("field1", [Reference(UNDF_W3)], "w3")
At the moment, argument types and values (e.g. the function space
argument) are not checked - see issue #926. There is also no consistency
checking between specified function spaces (e.g. that UNDF_W3 is
for the same function space as FIELD1 in the above example) - see
issue #927. Also, the function space attribute would be better if it
were a class, rather than using a string, see issue #934.
Currently entities which can have different intrinsic types
(e.g. FieldData) are captured as different classes
(RealFieldDataDataSymbol, IntegerFieldDataDataSymbol
etc). This could be modified if a single class turns out to be
preferable.
class psyclone.domain.lfric.LFRicTypes(name)
This class implements a singleton that manages LFRic types. Using the ‘call’ interface, you can query the data type for LFRic types, e.g.:
>>> from psyclone.configuration import Config
>>> from psyclone.domain.lfric import LFRicTypes
>>> config = Config.get()
>>> num_dofs_class = LFRicTypes("NumberOfUniqueDofsDataSymbol")
>>> my_var = num_dofs_class("my_num_dofs")
>>> print(my_var.name)
my_num_dofs
It uses the __new__ function to implement the access to the internal dictionary. This is done to minimise the required code for getting a value, e.g. compared with LFRicTypes.get()("something"), or LFRicTypes.get("something").
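The __new__-based lookup described above can be sketched in a few lines. This is a toy model rather than the LFRicTypes implementation: ToyTypes, ToyDataSymbol, the _name_to_class dictionary and the _init() helper are all hypothetical names, and the two generated classes are chosen only as examples.

```python
class ToyDataSymbol:
    """Hypothetical base class for the generated symbol classes."""

    def __init__(self, name):
        self.name = name


class ToyTypes:
    """Toy registry: calling ToyTypes(name) returns a generated class."""

    _name_to_class = None   # filled lazily, shared by all calls

    def __new__(cls, name):
        # Returning something other than an instance of cls means that
        # Python skips __init__, so the call acts as a dictionary lookup.
        if cls._name_to_class is None:
            cls._init()
        return cls._name_to_class[name]

    @classmethod
    def _init(cls):
        cls._name_to_class = {}
        for base_name in ("NumberOfDofs", "NumberOfUniqueDofs"):
            class_name = f"{base_name}DataSymbol"
            # Generate the class dynamically to avoid boilerplate.
            cls._name_to_class[class_name] = type(
                class_name, (ToyDataSymbol,), {})


num_dofs_class = ToyTypes("NumberOfUniqueDofsDataSymbol")
my_var = num_dofs_class("my_num_dofs")
print(my_var.name)   # my_num_dofs
```

Because the dictionary is created once and stored on the class, repeated calls with the same name return the same class object.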
Kernel arguments
At the moment, kernel arguments are generated by the
KernStubArgList or KernCallArgList classes. However, whilst
these classes generate the correct number of arguments in the correct
order, they have no knowledge of the datatypes that the arguments
correspond to and how the arguments relate to each other (they just
output strings).
The logic and declaration of kernel variables is handled separately by
the gen_stub method in LFRicKern and the gen_code method in
LFRicInvoke. In both cases these methods make use of the subclasses
of LFRicCollection to declare variables.
When using the symbol table in the LFRic PSyIR we naturally capture
arguments and datatypes together. The KernelInterface class
aims to replicate the KernStubArgList class and makes use of
the LFRic PSyIR. The idea is that the former will replace the latter
when it has the same or more functionality. At the moment, only
methods required to pass the tests have been implemented in
KernelInterface so there is more to be done, but it is also not
clear what the limitations are for KernStubArgList.
Eventually the definition of LFRic datatypes should be moved to the
LFRic PSyIR, but at the moment there is a lot of information defined
in the LFRicCollection subclasses. This will need to be addressed
over time.
The GOcean PSyIR
GOcean makes use of algorithm-layer PSyIR specialisations.
Algorithm-layer Classes
The GOcean PSyIR for the Algorithm layer is captured in
domain/common/algorithm/psyir.py. Two classes are currently
provided: one to capture an invoke call, AlgorithmInvokeCall,
and the other to capture (coded) Kernel calls within an invoke
call, KernelFunctor.
Comments attached to PSyIR Nodes
Since the PSyIR is designed to support source-to-source code generation, it is desirable to keep the output code as readable as possible, and this includes keeping or adding comments to the generated code. Comments are not first-class nodes in the PSyIR because it is an abstract syntax tree and it was preferable to hide the complexity of comment nodes from the PSyIR transformations and other manipulations. Comments have therefore been implemented as string attributes (one for preceding and another for inline comments) attached to particular nodes, so a comment always moves together with its owning node when the PSyIR tree is modified.
The group of nodes that can contain comments does not have an exclusive common ancestor, so they have been implemented with a Mixin class called CommentableMixin. A node can keep track of comments if it inherits from this class.
Of the language-level PSyIR nodes, Container, Routine and Statement have the CommentableMixin trait.
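The mixin approach can be sketched in plain Python as below. This is a toy model, not PSyclone's CommentableMixin: the Toy* classes, the attribute names preceding_comment and inline_comment, and the render() method are assumptions chosen to match the description of one string attribute for preceding comments and one for inline comments.

```python
class ToyCommentableMixin:
    """Toy mixin that adds two comment attributes to a node class."""

    _preceding_comment = ""
    _inline_comment = ""

    @property
    def preceding_comment(self):
        return self._preceding_comment

    @preceding_comment.setter
    def preceding_comment(self, value):
        if not isinstance(value, str):
            raise TypeError("A comment must be a string.")
        self._preceding_comment = value

    @property
    def inline_comment(self):
        return self._inline_comment

    @inline_comment.setter
    def inline_comment(self, value):
        if not isinstance(value, str):
            raise TypeError("A comment must be a string.")
        self._inline_comment = value


class ToyNode:
    """Hypothetical base node without any comment support."""

    def __init__(self):
        self.children = []


class ToyStatement(ToyNode, ToyCommentableMixin):
    """A node gains comment support simply by inheriting from the mixin."""

    def __init__(self, text):
        super().__init__()
        self.text = text

    def render(self):
        lines = []
        if self.preceding_comment:
            lines.append(f"! {self.preceding_comment}")
        line = self.text
        if self.inline_comment:
            line += f"  ! {self.inline_comment}"
        lines.append(line)
        return "\n".join(lines)


stmt = ToyStatement("a = b")
stmt.preceding_comment = "Update a"
stmt.inline_comment = "in place"
print(stmt.render())
```

Because the comments are attributes of the statement object, moving the statement to another position in a tree carries its comments along without any extra bookkeeping.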