Creating Node Types
A custom language requires many NodeType classes and instances to be created. After all, every distinct type of node in a PSI tree needs a distinct NodeType instance. While it is possible to create these classes and instances by hand, it is more manageable to use tooling to generate the code.
The SDK provides the TokenGenerator tool to convert an XML file listing tokens and keywords into TokenNodeType classes and static singleton instances. The PsiGen tool is used to create a parser from a .psi file, but it also creates ITreeNode classes and CompositeNodeType classes and instances.
The output of the lexer is a stream of singleton instances of TokenNodeType derived classes. The parser doesn't need to know the actual class of the token node type; it only needs to compare it against a known singleton value and call a known method, TokenNodeType.Create. The same is true for interior tree nodes: the class of the CompositeNodeType is irrelevant. Instead, the known singleton value is used to call CompositeNodeType.Create to create the interior tree node.
As such, the usual structure when creating node types is to create them as private nested classes inside a "token type" or "element type" class, and create public static fields that expose the singleton instance.
Creating token node types
For example, the C# language defines the CSharpTokenType class. This is not to be confused with CSharpTokenNodeType, which is the class derived from TokenNodeType that acts as the base class for all C# token node types. Instead, the CSharpTokenType class contains a number of private class definitions - CSharpTokenNodeType, WhitespaceNodeType, NewLineNodeType and so on. It also contains public static fields of type TokenNodeType, such as WHITE_SPACE, NEW_LINE, END_OF_LINE_COMMENT and so on (the capitals betray the Java heritage, and ReSharper's lineage from IntelliJ).
The unique index for the node type is passed into the constructor, and is based on the LAST_GENERATED_TOKEN_TYPE_INDEX value, which in turn is generated by the TokenGenerator SDK tool.
The whitespace, new line and comment token node types are specific classes and only need the index, while the integer, float and character literal token node types are instances of GenericTokenNodeType and require a name, index and representation.
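The structure described above can be sketched roughly as follows. This is a simplified, self-contained illustration for a hypothetical "Foo" language, not the SDK's actual code: the TokenNodeType base class here is a stand-in written for the sketch, the constant value 1000 is made up, and the "+ 1", "+ 2" offsets are an assumption about how indices might be derived from LAST_GENERATED_TOKEN_TYPE_INDEX.

```csharp
// Stand-in for the SDK's TokenNodeType base class (assumption: the real class
// has more members). It exists only so that the sketch compiles on its own.
public abstract class TokenNodeType
{
    protected TokenNodeType(string name, int index)
    {
        Name = name;
        Index = index;
    }

    public string Name { get; }
    public int Index { get; }
}

// The "token type" class. It is partial so that generated code can add
// further node type classes and singleton fields in another file.
public static partial class FooTokenType
{
    // Hypothetical value; in a real project TokenGenerator would emit this.
    private const int LAST_GENERATED_TOKEN_TYPE_INDEX = 1000;

    // Base class for all Foo token node types.
    private class FooTokenNodeType : TokenNodeType
    {
        protected FooTokenNodeType(string name, int index) : base(name, index) { }
    }

    // Specific classes: these only need the index.
    private sealed class WhitespaceNodeType : FooTokenNodeType
    {
        public WhitespaceNodeType(int index) : base("WHITE_SPACE", index) { }
    }

    private sealed class NewLineNodeType : FooTokenNodeType
    {
        public NewLineNodeType(int index) : base("NEW_LINE", index) { }
    }

    // Generic class: needs a name, an index and a representation.
    private sealed class GenericTokenNodeType : FooTokenNodeType
    {
        public GenericTokenNodeType(string name, int index, string representation)
            : base(name, index)
        {
            Representation = representation;
        }

        public string Representation { get; }
    }

    // Public singleton instances, exposed through the base type only.
    public static readonly TokenNodeType WHITE_SPACE =
        new WhitespaceNodeType(LAST_GENERATED_TOKEN_TYPE_INDEX + 1);

    public static readonly TokenNodeType NEW_LINE =
        new NewLineNodeType(LAST_GENERATED_TOKEN_TYPE_INDEX + 2);

    public static readonly TokenNodeType INTEGER_LITERAL =
        new GenericTokenNodeType("INTEGER_LITERAL", LAST_GENERATED_TOKEN_TYPE_INDEX + 3, "0");
}
```

Because the nested classes are private, callers can only refer to the singleton fields, which is all a parser needs in order to compare token node types by reference.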
The node type classes also all implement the Create method, which returns an instance of ITreeNode, or more specifically, an instance of a class that derives from LeafElementBase. This is covered in more detail in the section on creating tree nodes.
Also note that the CSharpTokenType class is a partial class. This allows other token node types and instances to be created in other files. Typically, a custom language will define the base token node types by hand - CSharpTokenNodeType, WhitespaceTokenNodeType, IdentifierTokenNodeType and also FixedTokenNodeType, KeywordTokenNodeType and GenericTokenNodeType. However, fixed(-length) tokens and keywords are usually generated by TokenGenerator.
Generating token node types
The TokenGenerator SDK tool takes an input XML file and creates a C# file that contains the "token type" class, declared as partial, and defines classes and instances for each of the fixed tokens and keywords in the file. It also creates the LAST_GENERATED_TOKEN_TYPE_INDEX value mentioned above, and generates the ITreeNode classes for each token.
The format of the XML file is very simple. For example, consider tokens for a language called "Foo". The XML file might be called tokens.xml, FooTokenType.Tokens.xml or something similar.
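As a rough illustration, such a file might look like the sketch below. The element and attribute names are the ones described in this section; the namespace, class names, index value and individual tokens are hypothetical and chosen only for the example.

```xml
<!-- Illustrative sketch only: namespace, class names, index and tokens are made up. -->
<Tokens TokenTypeNamespace="MyCompany.Foo.Psi.Parsing"
        TokenTypeClass="FooTokenType"
        BaseTokenNodeTypeIndex="2000"
        KeywordNodeType="KeywordTokenNodeType"
        KeywordTokenElement="FixedTokenElement"
        TokenNodeType="FixedTokenNodeType"
        TokenTokenElement="FixedTokenElement">
  <!-- Fixed tokens: operators and punctuation -->
  <Token name="LBRACE" representation="{" />
  <Token name="RBRACE" representation="}" />
  <Token name="SEMICOLON" representation=";" />
  <!-- Keywords -->
  <Keyword name="ABSTRACT_KEYWORD" representation="abstract" />
  <Keyword name="CLASS_KEYWORD" representation="class" />
</Tokens>
```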
The attributes of the root Tokens element are as follows:
TokenTypeNamespace - the namespace of the "token type" class that will be generated. Should match the manually written "token type" class.
TokenTypeClass - the name of the "token type" class that will hold private token node type class definitions.
BaseTokenNodeTypeIndex - the initial value used for the index of each token node type. This value does not need to be unique across languages; multiple languages can reuse the same value. However, if there is any chance of token node types being reused across languages (e.g. with languages that extend other languages, such as TypeScript and JavaScript), then care should be taken that these numbers do not clash. If this value isn't specified, the default value is 1000.
KeywordNodeType - the base class to use when generating keyword token node types.
KeywordTokenElement - the base class used when generating the ITreeNode for this node type, which is returned from the token node type's Create method. See the section on creating tree nodes for more details. Typically, this is a manually created class called FixedTokenElement (there is no need for a KeywordTokenElement).
TokenNodeType - the base class to use when generating a fixed token, such as an operator or other punctuation. Typically a manually created class called FixedTokenNodeType.
TokenTokenElement - the base class used when generating the ITreeNode for this node type, which is returned from the token node type's Create method. See the section on creating tree nodes for more details. Typically, this is a manually created class called FixedTokenElement.
Dynamic - defaults to false. If true, the ITreeNode that is created is passed the text of the token, such as an identifier's name. This isn't usually needed, as the tokens (operators, punctuation and keywords) are usually fixed.
The child elements of Tokens are either Token or Keyword. A Token element will generate a class that derives from FixedTokenNodeType, while Keyword will generate a class that derives from KeywordTokenNodeType (or whatever names were specified in the XML file). Both elements take the same attributes:
name - the name of the token or keyword node type. This is typically specified in all-caps, such as ABSTRACT_KEYWORD, and is normalised by converting it to camel case, such as AbstractKeyword. The all-caps version is used as the name of the singleton instance.
title - if specified, it is used as the identifier passed to the base class, and used to construct the names of the node type and token element classes. If not specified, the normalised name is used instead.
representation - the value passed to TokenNodeType.TokenRepresentation.
filtered - defaults to false. If true, adds an implementation of TokenNodeType.IsFiltered, which returns true. This is usually only used by whitespace and comments, which tend to have their own hand written token node type implementation. However, it can be used by any insignificant syntax tokens.
Using the TokenGenerator
In order to invoke the TokenGenerator tool on the XML file, the XML file's Build Action needs to be specified in Visual Studio's Properties pane. Select the XML file, open the Properties pane, and set the Build Action to TokenGenerator.
The ReSharper SDK sets up the build process to run the TokenGenerator during a compile. However, it requires the output file to be specified in the MSBuild file. Open the .csproj, find the entry for the tokens.xml file, and give it an OutputFile value, where the path in OutputFile is the same as the path in the Include statement.
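The project file entry described above might look like the following sketch. The folder and file names are hypothetical, and writing OutputFile as a child element of the item is an assumption; the sketch also reads "the same path" as "the same folder", with a different file name for the generated C# file.

```xml
<!-- Sketch only: folder, file names and the child-element form of OutputFile are assumptions. -->
<ItemGroup>
  <TokenGenerator Include="Psi\Parsing\FooTokenType.Tokens.xml">
    <OutputFile>Psi\Parsing\FooTokenType.Tokens.generated.cs</OutputFile>
  </TokenGenerator>
</ItemGroup>
```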
If the OutputFile isn't specified, the tokens aren't generated, and the build will fail. If the token node types are generated correctly, the generated file is automatically added to the list of files to be compiled, and the build will succeed. However, the file isn't added to the project, which can lead to unresolved symbol errors, as ReSharper won't be able to find the class definitions. It is recommended to add the generated file to the project. It can be safely excluded from source control, should you wish, as the file will be rebuilt at the next compile.
Creating composite node types
The same basic pattern is followed for composite node types - create an owning type, usually called ElementType, and create private nested classes that derive from CompositeNodeType. Finally, create a public static field of type CompositeNodeType that exposes the singleton instance.
There are a couple of minor differences from how token node types are created. Typically, the composite node types are created by the psiGen parser generator SDK tool, which creates a class called simply ElementType, rather than one whose name includes the language name, as CSharpTokenType does.
Also, the names of the composite node type classes are created from the rules in the .psi grammar, by converting "CamelCase" to "SHOUTING_SNAKE_CASE", and adding _INTERNAL. So a rule such as colorProfileBlock creates a private nested composite node class called COLOR_PROFILE_BLOCK_INTERNAL, which is then exposed as a public static field instance called COLOR_PROFILE_BLOCK.
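The naming convention above can be illustrated with a small sketch. RuleNames and its methods are hypothetical helpers written for this example and are not part of the SDK; the handling of consecutive capitals or digits by the real tool is not described here, so the sketch simply starts a new word at every upper-case letter.

```csharp
using System.Text;

// Hypothetical helper, for illustration only: mimics the naming convention
// described above for rules in a .psi grammar.
public static class RuleNames
{
    // Converts a camelCase rule name to SHOUTING_SNAKE_CASE.
    public static string ToShoutingSnakeCase(string ruleName)
    {
        var builder = new StringBuilder();
        for (var i = 0; i < ruleName.Length; i++)
        {
            var c = ruleName[i];
            // Each upper-case letter (except the first character) starts a new word.
            if (i > 0 && char.IsUpper(c))
                builder.Append('_');
            builder.Append(char.ToUpperInvariant(c));
        }
        return builder.ToString();
    }

    // Name of the private nested composite node type class.
    public static string ToInternalClassName(string ruleName)
    {
        return ToShoutingSnakeCase(ruleName) + "_INTERNAL";
    }
}
```

For the rule colorProfileBlock, ToShoutingSnakeCase gives the public field name COLOR_PROFILE_BLOCK and ToInternalClassName gives the nested class name COLOR_PROFILE_BLOCK_INTERNAL.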
The psiGen tool is described in more detail in the section on parsing.