Parsing
Parsing or "syntactic analysis" is the process of taking an input stream of tokens and producing a parse tree. ReSharper uses this parse tree to build a semantic model with rich information about types, and also to navigate, analyse and manipulate the code. For example, the unit testing functionality is implemented by walking the parse tree of a file and looking for certain constructs, such as classes and methods decorated with specific attributes. Similarly, refactoring is the act of rewriting the parse tree, which has the effect of rewriting the underlying text file.
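To illustrate the kind of tree walk described above, here is a minimal, language-neutral sketch in Python. The Node type, the helper names and the NUnit-style attribute names TestFixture and Test are all hypothetical; ReSharper's real unit testing support is written in C# against its own PSI interfaces. The sketch only shows the idea: a depth-first walk that looks for attributed classes and, inside them, attributed methods.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical, simplified parse tree node; not ReSharper's ITreeNode."""
    kind: str                      # e.g. "file", "class", "method", "attribute"
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Pre-order, depth-first traversal of the whole tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def has_attribute(node: Node, attribute: str) -> bool:
    """True if one of the node's direct children is the named attribute."""
    return any(c.kind == "attribute" and c.name == attribute for c in node.children)


def find_test_methods(root: Node, class_attr: str = "TestFixture",
                      method_attr: str = "Test") -> List[str]:
    """Return 'Class.Method' for each attributed method inside an attributed class."""
    tests = []
    for cls in walk(root):
        if cls.kind == "class" and has_attribute(cls, class_attr):
            for member in cls.children:
                if member.kind == "method" and has_attribute(member, method_attr):
                    tests.append(f"{cls.name}.{member.name}")
    return tests


tree = Node("file", children=[
    Node("class", "CalculatorTests", children=[
        Node("attribute", "TestFixture"),
        Node("method", "AddsNumbers", children=[Node("attribute", "Test")]),
        Node("method", "Helper"),
    ]),
    Node("class", "Calculator", children=[Node("method", "Add")]),
])
print(find_test_methods(tree))  # ['CalculatorTests.AddsNumbers']
```

Because the search only needs node kinds, names and children, the same walk works regardless of how the tree was produced.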
A parse tree built by a ReSharper language is sometimes known as a PSI tree, named for the Program Structure Interface subsystem that is responsible for building and maintaining such trees. It is also commonly referred to as an abstract syntax tree, but it is more correct to call it a concrete parse tree. A concrete parse tree is an accurate model of the syntax in a file, while an abstract syntax tree can be simplified. See this page for a good description of the differences.
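The difference between a concrete parse tree and an abstract syntax tree can be sketched with the expression (1 + 2). In this hypothetical Python example the concrete tree keeps every token, including parentheses and whitespace, so concatenating its leaves reproduces the source exactly; the abstract form keeps only the operator and operands. The token kinds and helper names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Token:
    kind: str   # e.g. "number", "plus", "whitespace", "lparen"
    text: str


@dataclass
class Composite:
    kind: str   # e.g. "paren_expr", "binary_expr"
    children: List[Union["Composite", Token]] = field(default_factory=list)


def get_text(node: Union[Composite, Token]) -> str:
    """Concatenate leaf text; a concrete tree reproduces the source exactly."""
    if isinstance(node, Token):
        return node.text
    return "".join(get_text(child) for child in node.children)


NOISE = {"whitespace", "lparen", "rparen", "plus"}


def to_ast(node: Union[Composite, Token]):
    """Simplify into an abstract form, dropping whitespace, parentheses and the '+' token."""
    if isinstance(node, Token):
        return int(node.text)   # only number tokens survive the filtering below
    significant = [c for c in node.children
                   if not (isinstance(c, Token) and c.kind in NOISE)]
    if node.kind == "paren_expr":
        return to_ast(significant[0])   # grouping is implied by the AST's shape
    left, right = significant           # binary_expr: exactly two operands here
    return ("+", to_ast(left), to_ast(right))


source = "(1 + 2)"
concrete = Composite("paren_expr", [
    Token("lparen", "("),
    Composite("binary_expr", [
        Token("number", "1"), Token("whitespace", " "), Token("plus", "+"),
        Token("whitespace", " "), Token("number", "2"),
    ]),
    Token("rparen", ")"),
])

print(get_text(concrete) == source)  # True
print(to_ast(concrete))              # ('+', 1, 2)
```

The abstract form ('+', 1, 2) would be identical for the input 1+2, so it cannot be used to rewrite the original text faithfully; the concrete tree can, which is what refactorings that edit the file rely on.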
All nodes in a ReSharper parse tree implement a common set of interfaces, namely ITreeNode, with the root node also implementing IFile. ReSharper provides common functionality to navigate and manipulate these trees without requiring any knowledge of the language of the underlying file. The interfaces describe the relationships between nodes, and also provide useful information and services, such as whether a node has outgoing references, which are used by Go to Definition, Find Usages and the Rename refactoring.
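The value of a shared node interface is that generic code can operate on any language's tree. The following Python sketch is a hypothetical model of that idea, not ReSharper's API: every node has a parent and children, a FileNode marks the root (the role IFile plays), and naive Go to Definition and Find Usages helpers work purely through those links. Real reference resolution respects scopes and symbol tables; here a reference simply matches the first declaration with the same name in the same file.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class TreeNode:
    """Hypothetical language-neutral node, loosely modelled on the idea of ITreeNode."""
    kind: str
    declares: Optional[str] = None      # name declared by this node, if any
    refers_to: Optional[str] = None     # outgoing reference by name, if any
    children: List["TreeNode"] = field(default_factory=list, repr=False)
    parent: Optional["TreeNode"] = field(default=None, repr=False)

    def add(self, child: "TreeNode") -> "TreeNode":
        child.parent = self
        self.children.append(child)
        return child


class FileNode(TreeNode):
    """Marks the root of a tree, playing the role IFile plays in ReSharper."""


def walk(node: TreeNode) -> Iterator[TreeNode]:
    yield node
    for child in node.children:
        yield from walk(child)


def containing_file(node: Optional[TreeNode]) -> Optional[FileNode]:
    """Follow parent links up to the root file node."""
    while node is not None and not isinstance(node, FileNode):
        node = node.parent
    return node


def resolve(reference: TreeNode) -> Optional[TreeNode]:
    """Naive 'Go to Definition': first declaration of the referenced name in the same file."""
    root = containing_file(reference)
    if root is None or reference.refers_to is None:
        return None
    return next((n for n in walk(root) if n.declares == reference.refers_to), None)


def find_usages(declaration: TreeNode) -> List[TreeNode]:
    """Naive 'Find Usages': every node in the file that refers to the declared name."""
    root = containing_file(declaration)
    if root is None or declaration.declares is None:
        return []
    return [n for n in walk(root) if n.refers_to == declaration.declares]


root = FileNode("file")
greeter = root.add(TreeNode("class", declares="Greeter"))
greet = greeter.add(TreeNode("method", declares="Greet"))
call = greet.add(TreeNode("call", refers_to="Greet"))

print(resolve(call) is greet)        # True
print(find_usages(greet) == [call])  # True
```

None of the helpers inspect node kinds, which mirrors the point above: navigation and reference services can be written once against the common interface.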
A custom language can create a parser that will take a lexer as input. It will analyse the tokens produced by the lexer and return a root IFile tree node. This node represents the whole of the file, and will have children that represent the structural and semantic constructs of the file (such as class declarations, method declarations, method bodies, statements, expressions, etc.).
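The lexer-to-parser pipeline can be sketched for a tiny, hypothetical grammar in which a file is a sequence of let NAME = NUMBER; statements. In this Python sketch a regular-expression lexer produces tokens, and a recursive-descent parser consumes them and returns a root FILE node whose children are statement nodes. Whitespace tokens are kept in the tree so it remains a concrete parse tree. The grammar, token kinds and class names are assumptions; a ReSharper parser would consume its own lexer interface and produce IFile and ITreeNode instances instead.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Union

# Hypothetical toy grammar:  file := let_statement*   let_statement := "let" IDENT "=" NUMBER ";"
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("LET", r"let\b"),               # keyword is tried before IDENT
    ("IDENT", r"[A-Za-z_]\w*"),
    ("EQ", r"="),
    ("SEMI", r";"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


@dataclass
class Token:
    kind: str
    text: str


@dataclass
class Node:
    kind: str
    children: List[Union["Node", Token]] = field(default_factory=list)


def lex(source: str) -> Iterator[Token]:
    """The lexer: turns raw text into a stream of tokens."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        yield Token(match.lastgroup, match.group())
        pos = match.end()


class Parser:
    """The parser: consumes the lexer's tokens and builds a tree rooted at a FILE node."""

    def __init__(self, tokens: Iterable[Token]):
        self.tokens = list(tokens)
        self.pos = 0

    def at_end(self) -> bool:
        return self.pos >= len(self.tokens)

    def take_whitespace(self, node: Node) -> None:
        # Keep whitespace in the tree so that it stays a concrete parse tree.
        while not self.at_end() and self.tokens[self.pos].kind == "WS":
            node.children.append(self.tokens[self.pos])
            self.pos += 1

    def expect(self, node: Node, kind: str) -> None:
        self.take_whitespace(node)
        if self.at_end() or self.tokens[self.pos].kind != kind:
            raise SyntaxError(f"expected {kind} at token {self.pos}")
        node.children.append(self.tokens[self.pos])
        self.pos += 1

    def parse_file(self) -> Node:
        root = Node("FILE")
        self.take_whitespace(root)
        while not self.at_end():
            root.children.append(self.parse_let_statement())
            self.take_whitespace(root)
        return root

    def parse_let_statement(self) -> Node:
        statement = Node("LET_STATEMENT")
        for kind in ("LET", "IDENT", "EQ", "NUMBER", "SEMI"):
            self.expect(statement, kind)
        return statement


def get_text(node: Union[Node, Token]) -> str:
    return node.text if isinstance(node, Token) else "".join(map(get_text, node.children))


source = "let x = 1;\nlet y = 2;"
tree = Parser(lex(source)).parse_file()
print([c.kind for c in tree.children])  # ['LET_STATEMENT', 'WS', 'LET_STATEMENT']
print(get_text(tree) == source)         # True
```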
The parser can be created either by hand, using the PSI's TreeBuilder helper classes, or by using the PsiGen SDK tool to generate a parser from a custom .psi file format that describes the grammar. When generating a parser, the PsiGen tool also generates the classes and interfaces that make up the tree. While it is possible to create these classes by hand, it is usually quicker and more accurate to describe the grammar in a .psi file and let the tool generate them, even if the parser itself is written by hand. Writing a parser by hand is only recommended if the grammar is difficult to express in the .psi file format, or if the grammar is to be reused and extended by another language. For example, the TypeScript grammar reuses parts of the JavaScript parser, which makes it a good candidate for writing by hand; a generated parser cannot be extended and reused in this manner. Despite their different implementations, both approaches produce efficient parsers, which matters because the parse tree needs to be updated as the user edits the file.
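The advantage of a hand-written parser for reuse can be sketched with ordinary recursive descent and subclassing. In this hypothetical Python example, BaseParser handles var NAME = NUMBER; declarations and exposes an overridable hook, parse_after_name; TypedParser reuses every base rule and only overrides that hook to accept an optional : TYPE annotation. This is loosely analogous to one grammar extending another, but the class names, the hook and the grammar are invented for illustration; it does not reproduce ReSharper's TreeBuilder API or its JavaScript and TypeScript parsers.

```python
import re
from typing import Callable, Dict, List, Optional


def tokenize(source: str) -> List[str]:
    # Words and single punctuation characters; whitespace is dropped for brevity.
    return re.findall(r"\w+|[^\s\w]", source)


class BaseParser:
    """Hypothetical hand-written parser for 'var NAME = NUMBER;' declarations."""

    def __init__(self, source: str):
        self.tokens = tokenize(source)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, accepts: Callable[[str], bool], what: str) -> str:
        token = self.peek()
        if token is None or not accepts(token):
            raise SyntaxError(f"expected {what}, got {token!r}")
        self.pos += 1
        return token

    def parse_file(self) -> Dict:
        declarations = []
        while self.peek() is not None:
            declarations.append(self.parse_declaration())
        return {"kind": "FILE", "children": declarations}

    def parse_declaration(self) -> Dict:
        self.expect(lambda t: t == "var", "'var'")
        node = {"kind": "VAR_DECLARATION", "name": self.expect(str.isidentifier, "a name")}
        self.parse_after_name(node)   # extension point for derived grammars
        self.expect(lambda t: t == "=", "'='")
        node["value"] = int(self.expect(str.isdigit, "a number"))
        self.expect(lambda t: t == ";", "';'")
        return node

    def parse_after_name(self, node: Dict) -> None:
        """The base grammar has nothing between the name and '='."""


class TypedParser(BaseParser):
    """Reuses every rule of BaseParser and adds an optional ': TYPE' annotation."""

    def parse_after_name(self, node: Dict) -> None:
        if self.peek() == ":":
            self.pos += 1
            node["type"] = self.expect(str.isidentifier, "a type name")


print(BaseParser("var x = 1;").parse_file())
# {'kind': 'FILE', 'children': [{'kind': 'VAR_DECLARATION', 'name': 'x', 'value': 1}]}
print(TypedParser("var x: number = 1;").parse_file()["children"][0]["type"])  # number
```

The base parser rejects the annotated input with a SyntaxError, while the derived parser accepts it without duplicating any of the shared rules; a generated parser offers no such hook, which is the trade-off described above.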