How to set up an IntelliJ 14 Scala Worksheet to run Spark

I'm trying to create a SparkContext in an IntelliJ 14 Scala Worksheet. Here are my dependencies:

name := "LearnSpark"

version := "1.0"
scalaVersion := "2.11.7"
// for working with Spark API
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0"


Here is the code I run in the worksheet:


import org.apache.spark.{SparkContext, SparkConf}
val conf = new SparkConf().setMaster("local").setAppName("spark-play")
val sc = new SparkContext(conf)


The error is:


15/08/24 14:01:59 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassNotFoundException: org.apache.spark.rpc.akka.AkkaRpcEnvFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)


When I run Spark as a standalone app it works fine. For example:
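A minimal sketch of the kind of standalone app that works for me (the object name and the tiny word count are just illustrative; the SparkConf setup is the same as in the worksheet):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative standalone app: same SparkConf as the worksheet above, plus a
// trivial word count so the context does some work before shutting down.
object SparkPlay {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("spark-play")
    val sc = new SparkContext(conf)
    val counts = sc.parallelize(Seq("a", "b", "a")).map(w => (w, 1)).reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()
  }
}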

Can someone provide some guidance on where I should look to resolve this exception?

Thanks.

3 comments

I also have this question and am running into a similar error.

I somehow managed to get worksheets working with a SparkContext on my desktop, though I have no idea how. That was great for some modeling, but the SparkContext didn't handle saveAsTextFile correctly on my Windows system.

I'm now trying to get it working in IntelliJ installed on a development Hadoop cluster, and I'm running into the same issue with akka on a fresh Spark project. Can anyone provide a list of steps that will get past this error both in the Scala worksheet and in the Scala console?

I've spent quite a bit of time over the last month searching for an answer to this, and it seems to remain unanswered. It appears to be an IntelliJ-specific issue.

Thanks for your help.

Below is a code snippet:

import org.apache.spark.{SparkConf, SparkContext}
val sConf = new SparkConf()
sConf.setMaster("localhost")
sConf.setAppName("test1")
val sc = new SparkContext(sConf)

My build.sbt is:

name := "Spark-Test3"
version := "1.0"
scalaVersion := "2.11.8"

The error I receive is:

16/03/16 10:03:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(webbd); users with modify permissions: Set(webbd)
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
at com.typesafe.config.impl.SimpleConfig.findKey(test1.sc0.tmp:111)
at com.typesafe.config.impl.SimpleConfig.find(test1.sc0.tmp:132)
at com.typesafe.config.impl.SimpleConfig.find(test1.sc0.tmp:138)
at com.typesafe.config.impl.SimpleConfig.find(test1.sc0.tmp:146)
at com.typesafe.config.impl.SimpleConfig.find(test1.sc0.tmp:151)
at com.typesafe.config.impl.SimpleConfig.getString(test1.sc0.tmp:193)
at akka.actor.ActorSystem$Settings.<init>(test1.sc0.tmp:132)
at akka.actor.ActorSystemImpl.<init>(test1.sc0.tmp:466)
at akka.actor.ActorSystem$.apply(test1.sc0.tmp:107)
at akka.actor.ActorSystem$.apply(test1.sc0.tmp:100)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(test1.sc0.tmp:117)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(test1.sc0.tmp:50)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(test1.sc0.tmp:49)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(test1.sc0.tmp:1500)
at scala.collection.immutable.Range.foreach$mVc$sp(test1.sc0.tmp:137)
at org.apache.spark.util.Utils$.startServiceOnPort(test1.sc0.tmp:1491)
at org.apache.spark.util.AkkaUtils$.createActorSystem(test1.sc0.tmp:52)
at org.apache.spark.SparkEnv$.create(test1.sc0.tmp:149)
at org.apache.spark.SparkContext.<init>(test1.sc0.tmp:200)
at Spark_Test.A$A29$A$A29.sc$lzycompute(test1.sc0.tmp:5)
at Spark_Test.A$A29$A$A29.sc(test1.sc0.tmp:5)
at #worksheet#.#worksheet#(test1.sc0.tmp:5)

It might be helpful to also mention the Scala version and libraries that I included in my project.

I'm using org.apache.spark:spark-yarn_2.10:1.1.1 and SBT: org.scala-lang:scala-library:2.11.8:jar, with Scala version 2.10 to stay compatible with the _2.10 build of Spark.


I got past my issue.

Several other posts mentioned adding the dependencies in build.sbt. I had done this on my workstation but not on the CentOS development server.

More for others' benefit than my own, here is what seems to have made the difference:

I had been trying to add Maven libraries to my project using the New Project Library option. Even though I was using Scala 2.11, the Maven search would only locate org.apache.spark libraries built for Scala 2.10 (the _2.10 artifacts). I clearly didn't (and still don't at this time) understand SBT all that well.

I added the following lines to my build.sbt file:

libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "1.0.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"

Adding these lines left all of the dependencies underlined in red, with errors indicating "Unresolved dependency". A banner also appeared indicating that build.sbt had changed and offering the options Refresh project, Enable auto-import, and Ignore. I still haven't figured out how to make this banner appear again (without modifying build.sbt) or how to perform these actions manually.

After I clicked the Refresh project link in the banner, the processing bar at the bottom of the screen churned for a while, and all kinds of objects were added to my project's libraries. This is clearly what SBT is meant to do. I also received some warnings about multiple dependencies with different versions; I'm still not sure how to address these warnings.

3:29:18 PM SBT project import
[warn] Multiple dependencies with the same organization/name but different versions. To avoid conflict, pick one version:
[warn] * org.scala-lang:scala-compiler:(2.11.0, 2.11.8)
[warn] * org.scala-lang:scala-reflect:(2.11.2, 2.11.8)
[warn] * org.apache.commons:commons-lang3:(3.3.2, 3.0)
[warn] * jline:jline:(0.9.94, 2.12.1)
[warn] * org.scala-lang.modules:scala-xml_2.11:(1.0.1, 1.0.4)
[warn] * org.slf4j:slf4j-api:(1.7.10, 1.7.2)

With this completed, my Scala Worksheet now gives me a different error:

16/03/16 15:08:22 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)

I suspect this error happened because, in a different package within my IntelliJ Scala project, I had successfully created a SparkContext but never stopped it. I cleared the error by closing and re-opening IntelliJ.
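One workaround I have seen suggested, though I have not verified it in the worksheet myself, is to reuse whatever context is already running instead of constructing a new one; a sketch (the app name is from my test, local[*] is just a local master using all cores):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: SparkContext.getOrCreate (available since Spark 1.4) returns the
// context already running in this JVM if there is one, and builds a new one otherwise.
val sConf = new SparkConf().setMaster("local[*]").setAppName("test1")
val sc = SparkContext.getOrCreate(sConf)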

My next error when I tried executing my worksheet was:

16/03/16 15:37:54 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassNotFoundException: org.apache.spark.rpc.netty.NettyRpcEnvFactory

I cleared this by adding another import to my Scala worksheet.

import org.apache.spark.rpc.netty

This import returned me to the state where another SparkContext was active, so I exited and re-opened IntelliJ. I was able to create a SparkContext the first time I executed the Scala worksheet, but then got the active-SparkContext error again, so I added a line to the end of the worksheet:

sc.stop()

This didn't work, so for now I will exit and restart IntelliJ when I want to experiment with code. There are some examples online that show how to create a client that executes the Spark logic; I'll work on that next.
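In the meantime, another pattern worth trying (again just a sketch, not something I have confirmed fixes the worksheet behaviour) is to stop the context in a finally block, so it is released even when one of the expressions above it throws:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: stopping the context in finally means a failed run cannot leave a
// SparkContext alive in the worksheet's JVM. The parallelize/count call is just
// a placeholder for the real work.
val sConf = new SparkConf().setMaster("local[*]").setAppName("test1")
val sc = new SparkContext(sConf)
try {
  println(sc.parallelize(1 to 10).count())
} finally {
  sc.stop()
}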

My current Scala worksheet looks like this:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark._
import org.apache.spark.rpc.netty
//
val sConf = new SparkConf().setMaster("localhost").setAppName("test1")
val sc = new SparkContext(sConf)
sc.stop()

My build.sbt looks like this:

name := "Spark-Test3"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "1.0.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
