A less locking version of StubIndexImpl.processElements

Permanently deleted user

Created February 25, 2015 21:34

I noticed that when I look for methods with a common name in a large project, a lot of time is spent in StubIndexImpl.processElements. This wouldn't be a problem per se, but the functions holds an index lock while iterating over the possible files.

Some notes because the questions will come up:

The reason the process is slow is not the work that is done for each element -- I can use an empty processor, but it still takes a long time (a long time being up to a second when searching for very common function names such as "get"). Because the function holds an index lock, and because apparently some other part of the UI waits for it, this makes the UI unacceptably laggy.
I am holding a read lock when I call this function. I am happy to relinquish that lock at any time. In fact, I already do that in other places, and I have an ApplicationListener that will let me know whenever someone wants to start a write action. If that happens, I will give up my read lock and pause my own processing until the write action is over. I will then reacquire the read lock and continue. This works great, but I cannot relinquish my read lock while I am inside of processElements -- that will quickly lead to a deadlock as the write action tries to acquire an index lock (which is still held by processElements).

I would like to have (write) a function equivalent to processElement with lesser consistency guarantees which holds the lock for a much shorter period of time. I imagine this working like this:

get the lock
make a deep copy of the ValueContainer returned by index.getData(key) using container.foreach, and call that containerCopy
release the lock
apply the container.foreach loop that does the actual processing to containerCopy, but use the version of myStubProcessingHelper.processStubsInFile that ignores index inconsistencies: processStubsInFile(... skipOnErrors=true)

This should (silently) ignore inconsistencies, and may miss elements in files that have been updated after we released the lock (and before we process the file). I'd be very happy with such an approximate set of values, if it means that the UI reliably doesn't lock up.

I cannot do this in my own code because myIndices is private, and I need to call myIndices.get(indexKey).getData(key). If StubIndex(Impl) had a public or protected function computing step 2 above, a deep copy of myIndices.get(indexKey).getData(key), that would also make me happy.

Before I go down that path, is there a better way to get what I want? I'm happy to make a patch, it doesn't look hard, but I may not be aware of some subtleties relating to indexing.

Thanks,
Martin

4 comments

Maxim Mossienko

Created February 26, 2015 14:36

Martin,
It would be nice to see sample code of the problem before going to possible solutions: generally speaking adding ability to read inconsistent will increase existing consistency issues with stubs we already having.
Lets focus first on the common problem understanding. AFAIK index.getData(key) is a few disk reads and real performance problem is that we read stub data / instantiate stub tree for each file (one random IO per file with symbol) for processor to work. Given that for your task (as I understand it) it would be enough for StubIndexImpl to get set of file ids for particular key (similar to FileBasedIndex's

getContainingFiles) and use them in IdFilter of processElements

The locking scope would be decreased then. Thoughts ?
Thanks, Maksim

Permanently deleted user

Created February 26, 2015 22:07

Maksim,

If I understand the code correctly, the ValueContainer.foreach loop in processElements already searches only those files that have values in it. But because all of that happens with a lock in hand, it looks atomic to the dispatch thread. I would like to first get the list of files (or file ids), which as you say would be fast, and then search the files one by one (searching each file wrapped in a small read action, but not a large one around searching all of them).

So what you propose would do exactly the right thing, if I could get a set of files (or ids) from StubIndex, I can then use those in later calls to processElement's idfilter argument. There would be some duplicate work, but that should be no problem.

I see that FileBasedIndex has exactly that function (getContainingFiles), but I need to access to the index JavaStubIndexKeys.METHODS, and I don't think that's available from FileBasedIndex (maybe I'm wrong? Is the methods short name index also a FileBasedIndex?).

It would be fairly straightforward to implement the equivalent of getContainingFiles for StubIndex, is that what you're proposing?

Thank you,
Martin

Maxim Mossienko

Created March 03, 2015 12:16

Martin,
Implementation of getContainingFiles for StubIndex (similar to getContainingFiles for FileBasedIndex) would be good start for your purpose.
For extra performance I would not return VirtualFile[] but some FileId iterator in both cases, hopefully I will do it in trunk for Idea 15 (https://youtrack.jetbrains.com/issue/IDEA-137196)
Regards, Maxim

Permanently deleted user

Created March 04, 2015 18:02

I wrote a pull request for this: https://github.com/JetBrains/intellij-community/pull/248

I don't think it's complete, and I'm happy to make it complete, but I don't know what your preferences are:

Should the getContainingIds appear in the interface as well, or only in the implementation?
Should there be both getContainingIds and getContainingFiles (getContainingFiles would only be a small wrapper around getContainingIds anyway)
Should getContainingIds return ValueContainer.IntIterator? I know you mentioned this, but if so, that iterator would simply wrap the Set<Integer> I return now (since generating the ids happens inside a lock, they need to be stored somewhere before they can be returned).
Should I also make a parallel getContainingIds in FileBasedIndex? The two interfaces won't be exactly the same, but they could be close.

If you let me know I can fix these issues to best fit into the existing code base.

We can move this discussion to the pull request, whatever is more convenient.

--
Martin

Please sign in to leave a comment.