Caching Lexers
The root node in the PSI parse tree is IFile, but the implementation of this interface should also implement IFileImpl. This exposes properties and methods that are important for the implementation of IFile, one of which is TokenBuffer.
The TokenBuffer property is an optional cache of the tokens in a file. If non-null, it contains the start and end offset, lexer state and type of every token in the file. The TokenBuffer constructor takes an ILexer, immediately scans the whole file and stores the processed tokens.
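To make the shape of this cache concrete, here is a minimal sketch in Python. It is a language-neutral model rather than the SDK's API: `CachedToken`, `toy_lex` and `ToyTokenBuffer` are hypothetical names, and the regex-based lexer is a stand-in that always reports lexer state 0.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List

@dataclass(frozen=True)
class CachedToken:
    start: int   # offset of the first character of the token
    end: int     # offset one past the last character
    state: int   # lexer state when the token began (always 0 in this toy lexer)
    type: str    # token type name

# Hypothetical toy grammar: whitespace, identifiers, numbers, single-character punctuation.
TOKEN_RE = re.compile(
    r"(?P<WS>\s+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<PUNCT>.)",
    re.DOTALL,
)

def toy_lex(text: str, offset: int = 0, state: int = 0) -> Iterator[CachedToken]:
    """Yield tokens from `offset` to the end of `text`."""
    pos = offset
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)  # the PUNCT branch guarantees a match
        yield CachedToken(m.start(), m.end(), state, m.lastgroup)
        pos = m.end()

class ToyTokenBuffer:
    """Scans the whole text eagerly in the constructor and keeps every token."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.tokens: List[CachedToken] = list(toy_lex(text))

buffer = ToyTokenBuffer("x = 42;")
```

After construction, `buffer.tokens` holds one entry per token, so later consumers can read offsets, state and type without running the lexer again.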
The CachingLexer class implements ILexer on top of a TokenBuffer, using the tokens from this cache. Under typical use, this offers little benefit over using the lexer directly, since a lexer is usually implemented efficiently with a series of lookup tables. The benefit comes with incremental parsing.
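The idea of a lexer that replays cached tokens can be sketched as follows. This is a Python model under stated assumptions, not the SDK's CachingLexer: the class name, the snake_case members and the `seek` helper are all hypothetical, and the cached tokens are written out by hand as `(start, end, lexer_state, type)` tuples.

```python
import bisect
from typing import List, Optional, Tuple

Token = Tuple[int, int, int, str]  # (start, end, lexer_state, type)

class ToyCachingLexer:
    """Presents a lexer-style cursor over tokens that were already lexed."""

    def __init__(self, tokens: List[Token]) -> None:
        self._tokens = tokens
        self._starts = [t[0] for t in tokens]  # sorted start offsets, used by seek()
        self._index = 0

    def start(self) -> None:
        self._index = 0

    def advance(self) -> None:
        if self._index < len(self._tokens):
            self._index += 1

    def seek(self, offset: int) -> None:
        # Hypothetical helper: position the cursor on the token containing `offset`.
        self._index = max(bisect.bisect_right(self._starts, offset) - 1, 0)

    def _current(self) -> Optional[Token]:
        return self._tokens[self._index] if self._index < len(self._tokens) else None

    @property
    def token_type(self) -> Optional[str]:
        tok = self._current()
        return tok[3] if tok else None  # None signals the end of the token stream

    @property
    def token_start(self) -> Optional[int]:
        tok = self._current()
        return tok[0] if tok else None

    @property
    def token_end(self) -> Optional[int]:
        tok = self._current()
        return tok[1] if tok else None

# Cached tokens for the text "x = 42;", written out by hand for this example.
CACHED: List[Token] = [
    (0, 1, 0, "IDENT"), (1, 2, 0, "WS"), (2, 3, 0, "PUNCT"),
    (3, 4, 0, "WS"), (4, 6, 0, "NUMBER"), (6, 7, 0, "PUNCT"),
]

def token_types(lexer: ToyCachingLexer) -> List[str]:
    """Drive the lexer from the start and collect every token type."""
    lexer.start()
    types = []
    while lexer.token_type is not None:
        types.append(lexer.token_type)
        lexer.advance()
    return types
```

Because the cursor is only an index into a list, moving to an arbitrary offset is a binary search rather than a re-lex from the start of the file, which is the property an incremental parser relies on.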
When a user edits a file, the PSI tree needs to be updated. To reduce the impact of this, an incremental parser will only parse the range of the file that has changed, and update the corresponding sub-tree in the PSI. For example, a change inside a C# method body will only re-parse that method body, and not the class definition or other methods in the file.
Of course, if part of a file has changed and needs to be re-parsed, the underlying tokens need to be re-lexed, too. The TokenBuffer.Rescan method will create a new lexer and re-scan the file, returning a new instance of TokenBuffer with an updated cache of tokens. An incremental parser will use a CachingLexer that uses this TokenBuffer to re-parse the affected region.
The Rescan method will try to optimise re-scanning the file and creating a new buffer of tokens. If the underlying lexer implements IIncrementalLexer, it will first copy the unchanged tokens at the start of the buffer, and then call IIncrementalLexer.Start to restart the lexer at the offset of the change, avoiding processing the unchanged part at the start of the file. This requires passing in the offset and the state of the lexer at that location, as returned by ILexerEx.LexerStateEx during the initial build of the token buffer cache, and stored in TokenBuffer. The tokens lexed for the changed section are then added to the new TokenBuffer. Rescan will then attempt to re-synchronise with the existing token buffer and, if possible, copy the tail end of the tokens into the new buffer. This means only the changed portion of the file is re-scanned, and the tokens at the start and end of the file are reused.
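The three phases described above (copy the unchanged prefix, restart the lexer at the change, re-synchronise with the old tail) can be sketched as follows. This is a simplified Python model, not the SDK's Rescan implementation: `Tok`, `lex` and `rescan` are hypothetical names, the toy lexer always reports state 0, and re-synchronisation is assumed to be safe whenever a freshly lexed token past the edit lines up with an old token at the same shifted offset and lexer state.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple

class Tok(NamedTuple):
    start: int
    end: int
    state: int  # lexer state at the start of the token (always 0 in this toy lexer)
    type: str

TOKEN_RE = re.compile(
    r"(?P<WS>\s+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<PUNCT>.)",
    re.DOTALL,
)

def lex(text: str, offset: int = 0, state: int = 0) -> Iterator[Tok]:
    """Toy lexer that can be restarted at any token boundary."""
    pos = offset
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        yield Tok(m.start(), m.end(), state, m.lastgroup)
        pos = m.end()

def rescan(old: List[Tok], new_text: str, change_start: int,
           old_len: int, new_len: int) -> Tuple[List[Tok], int]:
    """Return (new tokens, number of tokens that had to be re-lexed)."""
    delta = new_len - old_len
    # 1. Copy tokens that end strictly before the change. A token that ends
    #    exactly at the change may merge with inserted text, so it is re-lexed.
    result = [t for t in old if t.end < change_start]
    restart_offset = result[-1].end if result else 0
    # The state stored with the first discarded token is the state at the restart offset.
    restart_state = old[len(result)].state if len(result) < len(old) else 0
    old_index = {(t.start, t.state): i for i, t in enumerate(old)}
    relexed = 0
    # 2. Restart the lexer at the change and lex the modified region.
    for tok in lex(new_text, restart_offset, restart_state):
        if tok.start >= change_start + new_len:
            # 3. Past the edit: try to line up with an old token at the same
            #    (shifted) offset and state, then reuse the old tail.
            i = old_index.get((tok.start - delta, tok.state))
            if i is not None:
                result.extend(
                    t._replace(start=t.start + delta, end=t.end + delta)
                    for t in old[i:]
                )
                return result, relexed
        result.append(tok)
        relexed += 1
    return result, relexed

old_text = "foo = 1; bar = 2; baz = 3;"
new_text = "foo = 1; quux = 2; baz = 3;"  # "bar" (offset 9, length 3) replaced by "quux"
new_tokens, relexed = rescan(list(lex(old_text)), new_text, 9, 3, 4)
```

In this sketch the result is identical to lexing `new_text` from scratch, but only the tokens touching the edit are produced by the lexer; everything before is copied and everything after is copied with shifted offsets.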
If the underlying lexer does not implement IIncrementalLexer, the Rescan method will restart the lexer at the beginning and re-lex the entire file.
If a parser supports incremental parsing, it should use an instance of CachingLexer as its lexer. The ToCachingLexer extension method on ILexer will create this for you.
More detail is provided in the section on incremental parsing.