How is StubIndex data being used?

FileBasedIndex:indexFileContent() appears to be the core method that computes index information for the given file content. IntelliJ computes the stub index, ... for .java files. Attached is a sample stub tree for a .java file being indexed. This stub tree information is stored in a hashmap with # of filename as the key and the contents of the stub tree as the value. IdIndex is a sort of hashmap with '# of keyword/string' as the key and # of file as the value. How does IntelliJ search the stub index for a given keyword/string? Does IntelliJ get the list of files in which the word occurs using IdIndex and then use the stub tree to gather further information?

Any pointers related to indexing are most welcome.


1 comment

Hi Chandra,
In general, you have the right idea. Just a few clarifications:
1. # of filename is actually a file ID from the VFS. It is guaranteed to be unique while the file is present in the VFS. File IDs are heavily used in indices and other subsystems for file referencing.
2. There are a lot of indices, each of which is an extension plugged into the FileBasedIndex component. There is another (higher-level) index named "StubIndex", which is not a FileBasedIndex but is instead based on StubUpdatingIndex. StubUpdatingIndex is one of the file-based indices; it builds a StubTree for each file that may have a PSI tree associated with it. A StubTree is a rough model of a PSI tree that is easier and faster to build and that contains only general, often-needed information about the file. Thus, to query a class name we don't need to build a full-featured PSI tree and can use a stub instead, which takes less time to build and occupies less memory. When IDEA needs a more sophisticated analysis that requires more data, it lazily substitutes the stub with a full-featured PSI tree. This conversion is transparent to the client of the PSI interface, so the client doesn't have to care about it. The client sees only the PSI interface, which may be backed by either a PsiStub tree or a real PSI tree. So a StubIndex is an index that a client may build on a stub tree to make common operations on that tree faster. A typical StubIndex contains element offsets into the serialized form of a stub tree. Several orthogonal stub indices can be built over one stub tree.
3. IdIndex is a file-based index that serves as a word index over the text of files. IDEA uses it to pre-select files in search operations, as you correctly mentioned. When IDEA needs to find an identifier, it first looks it up in the ID index to find the IDs of the files that contain occurrences of this word. Then it uses the PSI interface for fine-grained filtering, taking syntactic and semantic information into consideration.
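
To make the two lookups described above more concrete, here is a minimal plain-Java sketch. It is a toy model only: the class ToyIndices and all of its methods are hypothetical names invented for illustration and are not IntelliJ Platform APIs. It assumes file IDs are plain integers, models the word index as "hash of word -> file IDs", and models a stub index as "key -> file ID -> offsets into that file's serialized stub tree". The second stage here is a simple token check that stands in for the PSI-based filtering the real IDE performs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the ideas above; none of these names are IntelliJ APIs. */
public class ToyIndices {
    // Stands in for the VFS: file ID -> file text.
    private final Map<Integer, String> fileTexts = new HashMap<>();
    // Word index (IdIndex-like): hash of a word -> IDs of files containing it.
    private final Map<Integer, Set<Integer>> wordIndex = new HashMap<>();
    // Stub index (StubIndex-like): key -> file ID -> offsets in the stub tree.
    private final Map<String, Map<Integer, List<Integer>>> classNameIndex = new HashMap<>();

    /** Index the words of one file under its file ID. */
    public void indexFile(int fileId, String text) {
        fileTexts.put(fileId, text);
        for (String word : text.split("\\W+")) {
            if (!word.isEmpty()) {
                wordIndex.computeIfAbsent(word.hashCode(), k -> new TreeSet<>()).add(fileId);
            }
        }
    }

    /** Record that a class stub with this name sits at stubOffset in the file's stub tree. */
    public void indexClassStub(int fileId, String className, int stubOffset) {
        classNameIndex.computeIfAbsent(className, k -> new HashMap<>())
                      .computeIfAbsent(fileId, k -> new ArrayList<>())
                      .add(stubOffset);
    }

    /** Stage 1: cheap lookup of candidate files; hash collisions may add false positives. */
    public Set<Integer> candidateFiles(String word) {
        return wordIndex.getOrDefault(word.hashCode(), Collections.emptySet());
    }

    /** Stage 2: fine-grained check of each candidate (a token match in this toy model). */
    public Set<Integer> filesReallyContaining(String word) {
        Set<Integer> result = new TreeSet<>();
        for (int fileId : candidateFiles(word)) {
            for (String token : fileTexts.get(fileId).split("\\W+")) {
                if (token.equals(word)) {
                    result.add(fileId);
                    break;
                }
            }
        }
        return result;
    }

    /** Stub-index query: file ID -> stub offsets for the given class name. */
    public Map<Integer, List<Integer>> stubOffsets(String className) {
        return classNameIndex.getOrDefault(className, Collections.emptyMap());
    }

    public static void main(String[] args) {
        ToyIndices indices = new ToyIndices();
        indices.indexFile(1, "class Foo { Bar b; }");
        indices.indexFile(2, "class Bar { }");
        indices.indexClassStub(1, "Foo", 0);
        indices.indexClassStub(2, "Bar", 0);

        System.out.println("Files mentioning Bar: " + indices.filesReallyContaining("Bar"));
        System.out.println("Stubs named Bar: " + indices.stubOffsets("Bar"));
    }
}
```

The point of the split is that the word lookup answers "which files are worth opening at all?" without parsing anything, while the class-name lookup answers "where is the declaration?" directly from the stub data, so neither query needs a full PSI tree up front.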

