Token node types
Each leaf element in a PSI tree has a node type that is a singleton instance of a class that derives from the TokenNodeType
base class, and implements the ITokenNodeType
interface. This singleton instance is the token produced by the lexer.
Each language must provide at least one class that derives from TokenNodeType
. Typically, this will provide default values for the simple abstract properties. These values can be constants that are overridden in further derived classes (that represent whitespace, comments, etc.) or can be implemented directly by comparing this
to known singleton values for comments, whitespace, etc.
Like NodeType
, the constructor takes in a string identifier and a unique index for the node type, and passes them to the base class. The identifier is only used internally, for diagnostics and testing, and should be simple enough to identify the node type, e.g. "NEW_LINE"
, "COMMENT"
, "IDENTIFIER"
, etc.
The TokenRepresentation
property is a value that is usually passed into the constructor of a derived class, and provides a more human readable representation of the token than the identifier value passed to the constructor of TokenNodeType
. For example, the identifier might be "ANDAND"
, while the representation would be "&&"
, or "RETURN_KEYWORD"
and "return"
. This is used by the GetSampleText
and GetDescription
methods.
The GetSampleText
method is called by a language's formatter implementation, to see if two token types require whitespace to separate them (by building a string with the two pieces of sample text and attempting to lex it. If it succeeds, no whitespace is necessary). By default, this is the TokenRepresentation
.
The GetDescription
method returns a human readable description of the token for use in parse errors. Typically, this is the TokenRepresentation
abstract method, but if this is null
or empty, it returns the string identifier passed to the constructor (e.g. "IDENTIFIER"
, "NEW_LINE"
).
The class also provides two Create
methods. One which takes a string, and another that takes a string buffer and a start and end offset. These methods are used by the parser to create leaf elements for the tree - LeafElementBase
provides an abstract implementation of various parts of ITreeNode
.
Token node type hierarchies
A custom language must derive from TokenNodeType
and provide at least this implementation.
Typically, a language will also create further derived classes to represent whitespace, comments, keywords, identifiers, and so on, essentially creating a derived class for each value of the abstract properties of ITokenNodeType
.
This can be enough for some languages, while others require a more detailed hierarchy. Several languages implemented by ReSharper also create a FixedTokenNodeType
, which becomes a base class for tokens that have a fixed (and therefore also fixed-length) representation, such as operators or keywords. This is opposed to whitespace, comments and identifiers, which don't have a fixed, or fixed-length, representation. Typically, this FixedTokenNodeType
does not add any functionality over TokenNodeType
, but simply acts as a base class. Sometimes it will provide a default implementation of Create
that returns a generic implementation of ITreeNode
that will work for all fixed length tokens.
Similarly, the KeywordTokenNodeType
derives from FixedTokenNodeType
, and is used for keywords. Again, there is usually no extra functionality, other than to return true
from ITokenNodeType.IsKeyword
.
Sometimes, the language will also define a GenericTokenNodeType
, that is used as a base class for FixedTokenNodeType
, and to provide a token node type that is neither an identifier, or whitespace, comment, etc. For example, it can be used to create literals, e.g. new GenericTokenNodeType("INTEGER_LITERAL", 1176, "000")
or new GenericTokenNodeType("CHARACTER_LITERAL", 1178, "'c'")
, where the numbers are the unique index of the node type, and the string is the token representation, used in GetSampleText
and GetDescription
.