com.intellij.util.diff.DiffTree (inconsistent?) behavior

Hi,

I am trying to improve performance of the standard properties editor to handle huge properties files (tens of thousands lines) which we have. The problem is that if you start editing such a big properties file, you get eventual lock ups (up to 20-30 seconds). After spending some time debugging/profiling, I tracked the issue to the Psi trees diffing algorithm. It looks like the problem is that when you add property at the end, the algorithm ends up generating too many diff's.

Looking at the DiffTree#compareLevel I noticed one subtle fact in the implementation. First it tries to go backwards (from the end to the beginning of the tree). Therefore, when it compares two PropertiesElementTypes#PROPERTIES_LIST's, it gets at the position where I added a new property. Starting from that, it enters some sort of "one off" state thinking that all properties before an added one are changed. For example, given the file:

test.properties
A =
B =
C =
D =

if I add property after property C:

A =
B =
C =
newProp
D =


the algorithm compares C to newProp, then B to C, then A to B, etc, up to the beginning of the file. Given the huge size of the properties file, it ends up with about ~20k of changes causing a major slowdown.


If I make this algorithm to leave first loop (one, that goes backwards) after encountering added property (by providing my own comparator that returns ThreeState.NO if property name does not match), it goes into the second loop (one that goes from the beginning to the end). Now the key part is that the second loop is actually smart enough to detect this "one off" difference between old Psi tree and new Psi tree and generates reasonably small amount of differences.

So, the question is, shouldn't the first loop try to match nodes using the same way as the second one?


P.S. With my custom comparator put into PsiBuilderImpl, the properties editor becomes responsive again -- no lockups. I also converted parser to use LighterASTNode, but given that I wasn't able to find any documentation telling the difference between ASTNodes and LighterASTNodes, I don't know if that makes difference, too.

3 comments

Here are the changes I made: https://github.com/idubrov/intellij-community/commit/0358cee2fe2cd813f884d135316c815409f221a5

I didn't make any benchmarking yet, but quick testing shows significant performance improvement.

0

Sounds good. Could you please create a request in the http://youtrack.jetbrains.com/issue/ ?


Alexey Kudravtsev
Intellij IDEA  developer
JetBrains, Inc
http://www.jetbrains.com
"Develop with  pleasure!"
0

After giving it some thought, I think, it is not fault of the DiffTree algorithm. The issue is caused by the fact that all "properties" look the same from the point of view of the DiffTree. Comparison of any two different properties would return "DRILL_DOWN", which will cause generation of two "diffs" (name change and value change). My "solution" is the best that I could think of -- treat two properties strictly different if their names are different. However, that would cause any change to the existing property to generate two diffs, one "insert" and one "delete". Probably, you guys know better how to fix this -- I still somewhat lost in the world of AST nodes, lighter nodes, Psi trees, Stubs, etc.

I've reported an issue: http://youtrack.jetbrains.com/issue/IDEA-101698

0

Please sign in to leave a comment.