When will IntelliJ be usable again for Spark build/compilation?

I have struggled to get a working Spark 1.x build/debugging environment in IJ (the latest Ultimate, 13.1.3). From the command line, sbt compile, sbt package and sbt assembly all work fine. But in IJ, a few compilation errors persist and cannot be resolved.

Eventually I noticed on the spark-dev mailing list that the AMPLab folks have recognized this, and that they do not trust or use the IntelliJ parser/compiler for Spark 1.x.

Please, guys, try to get this working. Spark is perhaps the highest-profile Scala project. I am open to either Maven or sbt; just let us know which one to use.

Here is the relevant message from Reynold Xin, taken from the spark-dev mailing list archives:

From Reynold Xin <r...@databricks.com>
Subject Re: IntelliJ IDEA cannot compile TreeNode.scala
Date Fri, 27 Jun 2014 03:57:12 GMT
IntelliJ parser/analyzer/compiler behaves differently from Scala compiler,
and sometimes leads to inconsistent behavior. This is one of those cases.

In general while we use IntelliJ, we don't use it to build stuff. I
personally always build on the command line with sbt or Maven.

BTW, the current issue is as follows, and other developers have corroborated it on the spark-dev mailing list. But what I am looking for is not just a fix for this particular issue: I want this to really work end to end.

I followed the directions in the bug report and deleted mesos-18.1.jar, but now the following error occurs. Note that this error has also been reported on the spark-dev mailing list, so the output below just corroborates what others have already noted.

while compiling: C:\apps\incubator-spark\sql\core\src\main\scala\org\apache\spark\sql\test\TestSQLContext.scala
during phase: jvm
library version: version 2.10.4
compiler version: version 2.10.4
reconstructed args: -classpath <long classpath . -bootclasspath C:\apps\jdk1.7.0_51\jre\lib\resources.jar;C:\apps\jdk1.7.0_51\jre\lib\rt.jar;C:\apps\jdk1.7.0_51\jre\lib\sunrsasign.jar;C:\apps\jdk1.7.0_51\jre\lib\jsse.jar;C:\apps\jdk1.7.0_51\jre\lib\jce.jar;C:\apps\jdk1.7.0_51\jre\lib\charsets.jar;C:\apps\jdk1.7.0_51\jre\lib\jfr.jar;C:\apps\jdk1.7.0_51\jre\classes;C:\Users\s80035683\.m2\repository\org\scala-lang\scala-library\2.10.4\scala-library-2.10.4.jar -deprecation -feature -unchecked -language:postfixOps
last tree to typer: Literal(Constant(org.apache.spark.sql.catalyst.types.PrimitiveType))
symbol: null
symbol definition: null
tpe: Class(classOf[org.apache.spark.sql.catalyst.types.PrimitiveType])
symbol owners:
context owners: object TestSQLContext -> package test
== Enclosing template or block ==
Template( // val <local TestSQLContext>: <notype> in object TestSQLContext, tree.tpe=org.apache.spark.sql.test.TestSQLContext.type
"org.apache.spark.sql.SQLContext" // parents
// 2 statements
DefDef( // private def readResolve(): Object in object TestSQLContext
<method> private <synthetic>
<tpt> // tree.tpe=Object
test.this."TestSQLContext" // object TestSQLContext in package test, tree.tpe=org.apache.spark.sql.test.TestSQLContext.type
DefDef( // def <init>(): org.apache.spark.sql.test.TestSQLContext.type in object TestSQLContext
<tpt> // tree.tpe=org.apache.spark.sql.test.TestSQLContext.type
Block( // tree.tpe=Unit
Apply( // def <init>(sparkContext: org.apache.spark.SparkContext): org.apache.spark.sql.SQLContext in class SQLContext, tree.tpe=org.apache.spark.sql.SQLContext
TestSQLContext.super."<init>" // def <init>(sparkContext: org.apache.spark.SparkContext): org.apache.spark.sql.SQLContext in class SQLContext, tree.tpe=(sparkContext: org.apache.spark.SparkContext)org.apache.spark.sql.SQLContext
Apply( // def <init>(master: String,appName: String,conf: org.apache.spark.SparkConf): org.apache.spark.SparkContext in class SparkContext, tree.tpe=org.apache.spark.SparkContext
new org.apache.spark.SparkContext."<init>" // def <init>(master: String,appName: String,conf: org.apache.spark.SparkConf): org.apache.spark.SparkContext in class SparkContext, tree.tpe=(master: String, appName: String, conf: org.apache.spark.SparkConf)org.apache.spark.SparkContext
// 3 arguments
Apply( // def <init>(): org.apache.spark.SparkConf in class SparkConf, tree.tpe=org.apache.spark.SparkConf
new org.apache.spark.SparkConf."<init>" // def <init>(): org.apache.spark.SparkConf in class SparkConf, tree.tpe=()org.apache.spark.SparkConf
== Expanded type of tree ==
value = Constant(org.apache.spark.sql.catalyst.types.PrimitiveType)
uncaught exception during compilation: java.lang.AssertionError


Hi Stephen!

Actually, IDEA can use a bundled SBT compiler for project compilation (see "A new way to compile").

Recently, we switched the default compiler to IDEA's own incremental builder to reduce the number of recompiled files and improve performance (see "Try Faster Scala Compiler in IntelliJ IDEA 13.0.2"), but the SBT compiler is still available via "Settings / Scala / Incremental compilation by: SBT incremental compiler".

As for type inference and error highlighting, they are indeed implemented independently of the Scala compiler, and that can sometimes cause glitches. Yet precisely that independence is the source of their power, because it allows us to implement sophisticated analysis and transformation of Scala code. We are on the way to fixing the related bugs completely (see "Heading to the Perfect Scala Code Analysis").

Please provide specific examples of projects/code so we can reproduce (and then fix) the problems.


Hi Pavel,
  Thanks for responding!

The specific project is Spark master (currently 1.1.0-SNAPSHOT). Plain and simple.

       git clone https://github.com/apache/spark.git

I am completely open to any approach you suggest that lets me:

    1. Build spark-master from IJ
    2. Run one of the Spark examples from the spark-master build (one that does not require HDFS) within IJ. My OP showed the following:

Class: org.apache.spark.examples.mllib.BinaryClassification

VM options:  -Dspark.master=local[2] -Dhadoop.home.dir=c:\hadoop

Program Arguments: --algorithm LR --regType L2 --regParam 1.0   data/mllib/sample_binary_classification_data.txt

    3. Build and run a custom Spark program using either a small pom.xml or build.sbt.

Here is a sample build.sbt that should be sufficient:

import sbt._

import sbt.Keys._

// AssemblyKeys and NativePackagerKeys come from the sbt-assembly and
// sbt-native-packager plugins, which must be declared in project/plugins.sbt
import AssemblyKeys._

import NativePackagerKeys._

name := "hspark"

version := "0.1.0-SNAPSHOT"

val sparkVersion = "1.0.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // drop transitive dependencies from the org.mortbay.jetty organization
  ("org.apache.spark" % "spark-core_2.10" % sparkVersion).excludeAll(ExclusionRule("org.mortbay.jetty")),
  "org.apache.spark" % "spark-sql_2.10" % sparkVersion
)

resolvers ++= Seq(
  "Apache repo" at "https://repository.apache.org/content/repositories/releases",
  "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos",
  // ~/.m2/repository as a file: URI (no trailing comma: Scala 2.10 rejects it)
  "Local Repo" at (Path.userHome / ".m2" / "repository").toURI.toString
)

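The _2.10 suffix on the Spark artifact names is the Scala binary version, and it has to agree with scalaVersion (sbt's %% operator appends it automatically, so "org.apache.spark" %% "spark-core" % sparkVersion is the usual shorthand). The local Maven repository resolver, likewise, only needs a file: URI for ~/.m2/repository. Below is a minimal plain-Scala sketch, with no sbt dependency, of how both values are derived; binaryVersion, crossArtifact and localMavenRepoUri are hypothetical helper names, and the version rule assumes a plain major.minor.patch version rather than reproducing sbt's full cross-versioning logic.

```scala
import java.io.File

object BuildHelpers {
  // Hypothetical helper: "2.10.4" -> "2.10". Assumes a plain major.minor.patch
  // version; sbt's real cross-versioning also handles milestones, RCs, etc.
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // Hypothetical helper: the artifact name %% would resolve, e.g. "spark-core_2.10".
  def crossArtifact(name: String, scalaVersion: String): String =
    s"${name}_${binaryVersion(scalaVersion)}"

  // Hypothetical helper: file: URI of <home>/.m2/repository, built with File
  // so the path separators are correct on Windows as well as Unix.
  def localMavenRepoUri(home: String = sys.props("user.home")): String =
    new File(new File(home, ".m2"), "repository").toURI.toString

  def main(args: Array[String]): Unit = {
    println(crossArtifact("spark-core", "2.10.4")) // spark-core_2.10
    println(crossArtifact("spark-sql", "2.10.4"))  // spark-sql_2.10
    println(localMavenRepoUri())
  }
}
```

Building the path with java.io.File rather than concatenating strings keeps the separators right on Windows (the setup in this thread) and avoids the doubled slash produced by appending "/.m2/repository" to a directory URI that already ends in "/".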


And here is a tiny Spark program to build and debug:

import org.apache.log4j.Logger
import org.apache.spark.{SparkContext, SparkConf}

/**
 * SimpleRdd
 *
 * Created by sboesch on 6/26/2014.
 */
object SimpleRdd {
  val logger = Logger.getLogger(getClass.getName)

  def main(args: Array[String]) {
    // args(0) is the master URL, e.g. local[2]
    val sc = new SparkContext(args(0), "SimpleRDD")
    // 200000 strings "Val-0" .. "Val-199999" split into 4 partitions
    val rdd = sc.parallelize(Array.tabulate(200000)(ix => "Val-%s".format(ix)), 4)
    logger.info(s"Rdd count=${rdd.count}")
    logger.info(s"Rdd take 100=${rdd.take(100).mkString(",")}")
    // release the local executors when done
    sc.stop()
  }
}
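SimpleRdd reads the master URL from args(0), whereas the BinaryClassification run configuration above passes it as the VM option -Dspark.master=local[2]; a no-argument new SparkConf() picks up spark.* system properties such as that one. The sketch below, in plain Scala with no Spark dependency so it can be checked outside the IDE, shows one way to accept either form. The helper resolveMaster and its local[2] fallback are assumptions for illustration, not part of Spark or of the original program; sampleData rebuilds the same 200000-element data set locally so the values SimpleRdd should log can be checked without a cluster.

```scala
object SimpleRddLocalCheck {
  // Hypothetical helper: prefer an explicit master argument, then the
  // spark.master system property (e.g. set with -Dspark.master=local[2]),
  // then an assumed default of local[2].
  def resolveMaster(args: Array[String],
                    props: collection.Map[String, String] = sys.props): String =
    args.headOption
      .orElse(props.get("spark.master"))
      .getOrElse("local[2]")

  // Same data SimpleRdd parallelizes, built as a plain Scala array.
  def sampleData(n: Int = 200000): Array[String] =
    Array.tabulate(n)(ix => "Val-%s".format(ix))

  def main(args: Array[String]): Unit = {
    val data = sampleData()
    println(s"master=${resolveMaster(args)}")
    println(s"count=${data.length}")    // count=200000
    println(data.take(3).mkString(",")) // Val-0,Val-1,Val-2
  }
}
```

With local[2] passed as the program argument, resolveMaster returns that argument; with only the VM option -Dspark.master=local[2], it falls back to the system property.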



