Flex-based lexer hits invalid text range while building file-based index

I'm seeing an issue where my Flex-based lexer runs into an IndexOutOfBoundsException in java.nio.Buffer.checkIndex() while my file-based indices are being built. The log says that a seemingly random file (or files) can't be reindexed until the IDE is restarted, e.g.:

[  57204]  ERROR - napi.project.CacheUpdateRunner - Error while indexing C:\path\to\SomeFile.ext
To reindex this file IDEA has to be restarted
java.lang.IndexOutOfBoundsException
 at java.nio.Buffer.checkIndex(Buffer.java:538)
 at java.nio.CharBuffer.charAt(CharBuffer.java:1238)
 at com.intellij.lang.cacheBuilder.DefaultWordsScanner.stripWords(DefaultWordsScanner.java:107)
 at com.intellij.lang.cacheBuilder.DefaultWordsScanner.processWords(DefaultWordsScanner.java:80)
 at com.intellij.psi.impl.cache.impl.id.IdTableBuilding$WordsScannerFileTypeIdIndexerAdapter.map(IdTableBuilding.java:119)
 at com.intellij.psi.impl.cache.impl.id.IdTableBuilding$WordsScannerFileTypeIdIndexerAdapter.map(IdTableBuilding.java:106)
 at com.intellij.psi.impl.cache.impl.id.IdIndex$4.map(IdIndex.java:85)
 at com.intellij.psi.impl.cache.impl.id.IdIndex$4.map(IdIndex.java:79)
 at com.intellij.util.indexing.MapReduceIndex.update(MapReduceIndex.java:398)
 at com.intellij.util.indexing.FileBasedIndexImpl.updateSingleIndex(FileBasedIndexImpl.java:1743)
 at com.intellij.util.indexing.FileBasedIndexImpl.doIndexFileContent(FileBasedIndexImpl.java:1675)
 at com.intellij.util.indexing.FileBasedIndexImpl.indexFileContent(FileBasedIndexImpl.java:1623)
 at com.intellij.util.indexing.UnindexedFilesUpdater$2.consume(UnindexedFilesUpdater.java:101)
 at com.intellij.util.indexing.UnindexedFilesUpdater$2.consume(UnindexedFilesUpdater.java:97)
 at com.intellij.openapi.project.CacheUpdateRunner$MyRunnable$1.run(CacheUpdateRunner.java:286)
 at com.intellij.openapi.application.impl.ApplicationImpl.runReadAction(ApplicationImpl.java:908)
 at com.intellij.openapi.project.CacheUpdateRunner$MyRunnable$2.run(CacheUpdateRunner.java:305)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl$3.run(ProgressManagerImpl.java:194)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl.registerIndicatorAndRun(ProgressManagerImpl.java:281)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl.registerIndicatorAndRun(ProgressManagerImpl.java:278)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl.executeProcessUnderProgress(ProgressManagerImpl.java:233)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl.runProcess(ProgressManagerImpl.java:181)
 at com.intellij.openapi.project.CacheUpdateRunner$MyRunnable.run(CacheUpdateRunner.java:300)
 at com.intellij.openapi.application.impl.ApplicationImpl$8.run(ApplicationImpl.java:406)

The above generally occurs alongside this type of error, though not necessarily in the same file:

[  57905]  ERROR - napi.project.CacheUpdateRunner - Error while indexing C:\path\to\SomeFile.ext
To reindex this file IDEA has to be restarted
java.lang.Error: com.illuminatedcloud.intellij.lexer._ApexLexer: Error: could not match input
 at com.illuminatedcloud.intellij.lexer._ApexLexer.zzScanError(_ApexLexer.java:1301)
 at com.illuminatedcloud.intellij.lexer._ApexLexer.advance(_ApexLexer.java:2527)
 at com.intellij.lexer.FlexAdapter.locateToken(FlexAdapter.java:95)
 at com.intellij.lexer.FlexAdapter.advance(FlexAdapter.java:76)
 at com.intellij.lang.cacheBuilder.DefaultWordsScanner.processWords(DefaultWordsScanner.java:88)
 at com.intellij.psi.impl.cache.impl.id.IdTableBuilding$WordsScannerFileTypeIdIndexerAdapter.map(IdTableBuilding.java:119)
 at com.intellij.psi.impl.cache.impl.id.IdTableBuilding$WordsScannerFileTypeIdIndexerAdapter.map(IdTableBuilding.java:106)
 at com.intellij.psi.impl.cache.impl.id.IdIndex$4.map(IdIndex.java:85)
 at com.intellij.psi.impl.cache.impl.id.IdIndex$4.map(IdIndex.java:79)
 at com.intellij.util.indexing.MapReduceIndex.update(MapReduceIndex.java:398)
 at com.intellij.util.indexing.FileBasedIndexImpl.updateSingleIndex(FileBasedIndexImpl.java:1743)
 at com.intellij.util.indexing.FileBasedIndexImpl.doIndexFileContent(FileBasedIndexImpl.java:1675)
 at com.intellij.util.indexing.FileBasedIndexImpl.indexFileContent(FileBasedIndexImpl.java:1623)
 at com.intellij.util.indexing.UnindexedFilesUpdater$2.consume(UnindexedFilesUpdater.java:101)
 at com.intellij.util.indexing.UnindexedFilesUpdater$2.consume(UnindexedFilesUpdater.java:97)
 at com.intellij.openapi.project.CacheUpdateRunner$MyRunnable$1.run(CacheUpdateRunner.java:286)
 at com.intellij.openapi.application.impl.ApplicationImpl.runReadAction(ApplicationImpl.java:908)
 at com.intellij.openapi.project.CacheUpdateRunner$MyRunnable$2.run(CacheUpdateRunner.java:305)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl$3.run(ProgressManagerImpl.java:194)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl.registerIndicatorAndRun(ProgressManagerImpl.java:281)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl.registerIndicatorAndRun(ProgressManagerImpl.java:278)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl.executeProcessUnderProgress(ProgressManagerImpl.java:233)
 at com.intellij.openapi.progress.impl.ProgressManagerImpl.runProcess(ProgressManagerImpl.java:181)
 at com.intellij.openapi.project.CacheUpdateRunner$MyRunnable.run(CacheUpdateRunner.java:300)
 at com.intellij.openapi.application.impl.ApplicationImpl$8.run(ApplicationImpl.java:406)

These exact files tokenize and parse just fine otherwise, and the PSI looks fine in PsiViewer, so the failure seems circumstantial. Perhaps something isn't thread-safe?

I searched a little bit and it sounds like the latter should never happen unless there's a bug in JFlex.  I'm using the version of JFlex that comes with the Grammar-Kit plugin and am not sure if it's safe for me to upgrade it to the latest-and-greatest in-place.

Any thoughts on what might be going on here and how I might resolve it?  I imagine it's something I'm doing, but I can't figure out what that might be.

Thanks!


The error message "could not match input" means that there's a bug in your lexer. As the docs say, the lexer must match the entire contents of the file, even when it contains syntax errors.
This is guaranteed if the last rule in your *.flex file is:

[^] { return BAD_CHARACTER; }

If you have several states in your lexer, list them all in that rule's state prefix (the state names below are placeholders):

<YYINITIAL, OTHER_STATE> [^] { return BAD_CHARACTER; }
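
For readers less familiar with why a catch-all rule matters, here is a rough Java analogy. It is a hand-written miniature lexer, not JFlex output and not part of the IntelliJ API; the class, enum, and method names are all made up for illustration. The final `else` branch plays the role of the `[^]` rule: any character that no other branch recognizes becomes a one-character BAD_CHARACTER token, so the tokens always cover the whole input and the lexer never has to throw.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature lexer used only to illustrate the catch-all idea.
final class CatchAllLexerSketch {
    enum TokenType { IDENTIFIER, WHITE_SPACE, BAD_CHARACTER }

    static final class Token {
        final TokenType type;
        final int start;
        final int end; // exclusive

        Token(TokenType type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Splits the text into tokens that cover every character exactly once.
    static List<Token> tokenize(CharSequence text) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int start = pos;
            char c = text.charAt(pos);
            TokenType type;
            if (Character.isJavaIdentifierStart(c)) {
                pos++;
                while (pos < text.length() && Character.isJavaIdentifierPart(text.charAt(pos))) {
                    pos++;
                }
                type = TokenType.IDENTIFIER;
            } else if (Character.isWhitespace(c)) {
                pos++;
                while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
                    pos++;
                }
                type = TokenType.WHITE_SPACE;
            } else {
                // Catch-all branch: consume exactly one unrecognized character.
                pos++;
                type = TokenType.BAD_CHARACTER;
            }
            tokens.add(new Token(type, start, pos));
        }
        return tokens;
    }
}
```

Calling `CatchAllLexerSketch.tokenize("ab #c")` yields an identifier, whitespace, a BAD_CHARACTER token for `#`, and another identifier, with no gaps between token ranges.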

Thanks for the reply, Alexander.  You're correct about the last rule in my lexer.  I guess my question would be why it has this issue only while building file-based indices.  I can open the file otherwise and it lexes and parses without any issues, or at least without any visible/reported issues.  I only have one state in my lexer, the standard YYINITIAL, and the last rule looks like:

<YYINITIAL> [^] { yybegin(YYINITIAL); return BAD_CHARACTER; }

Should I be setting the state to YYINITIAL there?

Thanks again!

UPDATE: Just to be sure, I opened one of the files that produces the lexer error and copy/pasted it into another buffer.  There were no errors.  Doesn't that mean that the lexer is able to tokenize the contents fine?  Like I said, my token set is generally quite simple.

UPDATE 2: Debugging this a little more, I noticed that every time it happens IDEA is trying to update the IdIndex, not any of the custom file-based indices that my plugin includes.  Don't know if that helps at all, but it's curious that it's always a single index that causes this issue.


Try File | Invalidate Caches.
Is your plugin open-source?


Thanks. Yeah, I've done that several times, and all it does is demonstrate the issue clearly as the indices are rebuilt. Unfortunately my plugin isn't open source, though I know that would help you guys provide more concrete guidance.


Does your plugin have a FindUsagesProvider? If it does, how does it implement the getWordsScanner() method?


Yes.  It uses a DefaultWordsScanner against my generated Flex lexer.  I'm passing the token types of my name identifier, block/line comments, and various literals.  I think you're very much onto something here, though, because I changed it to return null like JavaFindUsagesProvider, invalidated the indices, and restarted, and it finished indexing without any of these errors!  And find usages still works properly!

So I'm a bit confused.  It looks like this works without a words scanner, so should I just continue to return null, or should I try to figure out what's wrong with my words scanner (presumably the tokens I'm supplying it)?  I'm going to leave it returning null for now, but I'd appreciate any thoughts on whether I should return a scanner and what I might be doing wrong with the one I was returning.

Thanks so much!


I'll go ahead and mark this as answered in that the original issue is gone, but I still don't know why my default words scanner was causing the issue. At least things seem to work fine when I return a null words scanner, though!


Clarifying this issue and a solution, because I had the same:

- The default FindUsagesProvider implementation uses a static instance of DefaultWordsScanner, which takes your LexerAdapter (and therefore your lexer).
- The lexer generated by JFlex is not thread-safe.
- When IntelliJ indexes files on startup, it does so on several threads at once.

This is what causes the issues, not bugs in the lexer.
The fix for me wasn't to completely disable the FindUsagesProvider, but rather to return a new instance of DefaultWordsScanner each time getWordsScanner() is called. I'm now investigating whether it's possible to make the JFlex-generated lexer thread-safe.
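
A minimal, self-contained sketch of that fix, for illustration only. None of these classes come from the IntelliJ Platform: `StatefulLexer` stands in for a JFlex-generated lexer, `WordsScanner` for DefaultWordsScanner, and the static `getWordsScanner()` for the provider method; all of those names and the thread-pool size are assumptions. The point is simply that each call hands out a scanner with its own lexer, so concurrent callers never share mutable state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// All names here are hypothetical stand-ins, not IntelliJ Platform classes.
final class FreshScannerSketch {

    // Plays the role of a generated lexer: a mutable buffer plus a position.
    static final class StatefulLexer {
        private CharSequence buffer = "";
        private int pos;

        void reset(CharSequence text) {
            buffer = text;
            pos = 0;
        }

        // Returns the next run of letters/digits, or null at end of input.
        String nextWord() {
            while (pos < buffer.length() && !Character.isLetterOrDigit(buffer.charAt(pos))) {
                pos++;
            }
            if (pos >= buffer.length()) {
                return null;
            }
            int start = pos;
            while (pos < buffer.length() && Character.isLetterOrDigit(buffer.charAt(pos))) {
                pos++;
            }
            return buffer.subSequence(start, pos).toString();
        }
    }

    // Plays the role of the words scanner: it owns exactly one lexer.
    static final class WordsScanner {
        private final StatefulLexer lexer = new StatefulLexer();

        List<String> processWords(CharSequence text) {
            List<String> words = new ArrayList<>();
            lexer.reset(text);
            for (String w = lexer.nextWord(); w != null; w = lexer.nextWord()) {
                words.add(w);
            }
            return words;
        }
    }

    // The fix: a new scanner (and lexer) per call instead of a shared static one.
    static WordsScanner getWordsScanner() {
        return new WordsScanner();
    }

    // Scans several "files" on a small thread pool, one fresh scanner per task.
    static List<List<String>> indexConcurrently(List<String> files) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String text : files) {
                Callable<List<String>> task = () -> getWordsScanner().processWords(text);
                futures.add(pool.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Allocating a scanner per call costs a small object or two, which is usually negligible next to the cost of lexing a file.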


Brent, thanks for the follow-up!  I actually should've updated this thread myself when I discovered the exact same thing a month or two back and solved it the same way (new instance of the scanner for each invocation).  I finally got to the bottom of it when I returned to this issue because some operations that would use this index were slower than they would be were it populated properly, and in the debugger I was able to see multiple threads concurrently accessing it and corrupting the state.  This is probably worth documenting in the tutorial for custom language plugins where this aspect is covered.
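
To make that kind of state corruption concrete, here is a small sketch that replays one possible interleaving deterministically on a single thread. It is not the platform's code: `StatefulLexer` and `currentWord` are hypothetical stand-ins for a shared lexer and for a scanner reading characters out of its own file text using the lexer's offsets. When a second caller restarts the shared lexer on a longer text, the first caller ends up indexing its own shorter buffer with offsets that belong to the other file, which is one plausible route to the IndexOutOfBoundsException in CharBuffer.charAt() shown in the first stack trace.

```java
import java.nio.CharBuffer;

// Hypothetical names throughout; a single-threaded replay of a two-thread race.
final class SharedLexerRaceSketch {

    // Stand-in for a generated lexer: one buffer and one pair of token offsets.
    static final class StatefulLexer {
        private CharSequence buffer = "";
        private int tokenStart;
        private int tokenEnd;

        void start(CharSequence text) {
            buffer = text;
            tokenStart = 0;
            tokenEnd = 0;
            advance();
        }

        // Moves to the next run of non-space characters.
        void advance() {
            int i = tokenEnd;
            while (i < buffer.length() && buffer.charAt(i) == ' ') {
                i++;
            }
            tokenStart = i;
            while (i < buffer.length() && buffer.charAt(i) != ' ') {
                i++;
            }
            tokenEnd = i;
        }

        int getTokenStart() { return tokenStart; }
        int getTokenEnd() { return tokenEnd; }
    }

    // Reads the current token from the caller's own text using the lexer's offsets.
    static String currentWord(CharSequence fileText, StatefulLexer lexer) {
        StringBuilder word = new StringBuilder();
        for (int i = lexer.getTokenStart(); i < lexer.getTokenEnd(); i++) {
            word.append(fileText.charAt(i));
        }
        return word.toString();
    }

    // Returns true if the simulated interleaving ends in IndexOutOfBoundsException.
    static boolean interleavedIndexingFails() {
        StatefulLexer shared = new StatefulLexer();
        CharBuffer fileA = CharBuffer.wrap("ab");
        CharBuffer fileB = CharBuffer.wrap("first second third");

        shared.start(fileA);  // "thread A" begins scanning its short file
        shared.start(fileB);  // "thread B" restarts the same lexer on a longer file
        shared.advance();     // "thread B" moves on to the token "second"
        try {
            currentWord(fileA, shared); // "thread A" resumes with B's offsets
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }
}
```

With real threads the interleaving varies from run to run, which would fit the observation that the failing file changes between indexing passes.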


Thanks Brent,

I ran into the same issue with my plugin and was able to fix it by creating a new DefaultWordsScanner for each call to my implementation of FindUsagesProvider.getWordsScanner().


FTR, this is explicitly described in the Javadoc as well:

/**
* Gets the word scanner for building a word index for the specified language.
* Note that the implementation MUST be thread-safe, otherwise you should return a new instance of your scanner
* (that can be recommended as a best practice).
*
* @return the word scanner implementation, or null if {@link com.intellij.lang.cacheBuilder.SimpleWordsScanner} is OK.
*/

I think the source of the issue for myself (and probably others) is that if you use/follow the Simple plugin example you are probably unaware of that detail.  I believe the way it's implemented there is not thread safe.  Maybe an update to the tutorial and source is in order?


Brent, you're right. It was actually updated a few months ago (https://github.com/JetBrains/intellij-sdk-docs/commit/568ea64319465d63c3f2a88608e0b3da95cbe27e#diff-90ac7a9fd72c984d13bb037807f0f983), so hopefully no one will run into the same problem again.


It looks like the tutorial is still out of date:
http://www.jetbrains.org/intellij/sdk/docs/tutorials/custom_language_support/find_usages_provider.html

That's what I based my implementation on.


Good catch, will fix it. Thanks!

