
I am trying to run a simple Spark application that imports the Apache Commons Math library.

This is my Scala file:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.commons.math3.random.RandomDataGenerator

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/home/donbeo/Applications/spark/spark-1.1.0/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

    println("A random number")
    val randomData = new RandomDataGenerator()
    println(randomData.nextLong(0, 100))
  }
}

And this is my SBT file:

name := "Simple Project" 

version := "1.0" 

scalaVersion := "2.10.4" 

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" 

libraryDependencies += "org.apache.commons" % "commons-math3" % "3.3" 

When I try to run the code I get this error:

donbeo@donbeo-HP-EliteBook-Folio-9470m:~/Applications/spark/spark-1.1.0$ ./bin/spark-submit --class "SimpleApp" --master local[4] /home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-project_2.10-1.0.jar 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
15/02/04 17:42:41 WARN Utils: Your hostname, donbeo-HP-EliteBook-Folio-9470m resolves to a loopback address: 127.0.1.1; using 192.168.1.45 instead (on interface wlan0) 
15/02/04 17:42:41 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
15/02/04 17:42:41 INFO SecurityManager: Changing view acls to: donbeo, 
15/02/04 17:42:41 INFO SecurityManager: Changing modify acls to: donbeo, 
15/02/04 17:42:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(donbeo,); users with modify permissions: Set(donbeo,) 
15/02/04 17:42:42 INFO Slf4jLogger: Slf4jLogger started 
15/02/04 17:42:42 INFO Remoting: Starting remoting 
15/02/04 17:42:42 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.45:45935] 
15/02/04 17:42:42 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@192.168.1.45:45935] 
15/02/04 17:42:42 INFO Utils: Successfully started service 'sparkDriver' on port 45935. 
15/02/04 17:42:42 INFO SparkEnv: Registering MapOutputTracker 
15/02/04 17:42:42 INFO SparkEnv: Registering BlockManagerMaster 
15/02/04 17:42:42 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150204174242-bbb1 
15/02/04 17:42:42 INFO Utils: Successfully started service 'Connection manager for block manager' on port 55674. 
15/02/04 17:42:42 INFO ConnectionManager: Bound socket to port 55674 with id = ConnectionManagerId(192.168.1.45,55674) 
15/02/04 17:42:42 INFO MemoryStore: MemoryStore started with capacity 265.4 MB 
15/02/04 17:42:42 INFO BlockManagerMaster: Trying to register BlockManager 
15/02/04 17:42:42 INFO BlockManagerMasterActor: Registering block manager 192.168.1.45:55674 with 265.4 MB RAM 
15/02/04 17:42:42 INFO BlockManagerMaster: Registered BlockManager 
15/02/04 17:42:42 INFO HttpFileServer: HTTP File server directory is /tmp/spark-49443053-833e-4596-9073-d74075483d35 
15/02/04 17:42:42 INFO HttpServer: Starting HTTP Server 
15/02/04 17:42:42 INFO Utils: Successfully started service 'HTTP file server' on port 41309. 
15/02/04 17:42:42 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/02/04 17:42:42 INFO SparkUI: Started SparkUI at http://192.168.1.45:4040 
15/02/04 17:42:42 INFO SparkContext: Added JAR file:/home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-project_2.10-1.0.jar at http://192.168.1.45:41309/jars/simple-project_2.10-1.0.jar with timestamp 1423071762914 
15/02/04 17:42:42 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.45:45935/user/HeartbeatReceiver 
15/02/04 17:42:43 INFO MemoryStore: ensureFreeSpace(32768) called with curMem=0, maxMem=278302556 
15/02/04 17:42:43 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 265.4 MB) 
15/02/04 17:42:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/02/04 17:42:43 WARN LoadSnappy: Snappy native library not loaded 
15/02/04 17:42:43 INFO FileInputFormat: Total input paths to process : 1 
15/02/04 17:42:43 INFO SparkContext: Starting job: count at SimpleApp.scala:13 
15/02/04 17:42:43 INFO DAGScheduler: Got job 0 (count at SimpleApp.scala:13) with 2 output partitions (allowLocal=false) 
15/02/04 17:42:43 INFO DAGScheduler: Final stage: Stage 0(count at SimpleApp.scala:13) 
15/02/04 17:42:43 INFO DAGScheduler: Parents of final stage: List() 
15/02/04 17:42:43 INFO DAGScheduler: Missing parents: List() 
15/02/04 17:42:43 INFO DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter at SimpleApp.scala:13), which has no missing parents 
15/02/04 17:42:43 INFO MemoryStore: ensureFreeSpace(2616) called with curMem=32768, maxMem=278302556 
15/02/04 17:42:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.6 KB, free 265.4 MB) 
15/02/04 17:42:43 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (FilteredRDD[2] at filter at SimpleApp.scala:13) 
15/02/04 17:42:43 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
15/02/04 17:42:43 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1283 bytes) 
15/02/04 17:42:43 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1283 bytes) 
15/02/04 17:42:43 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 
15/02/04 17:42:43 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) 
15/02/04 17:42:43 INFO Executor: Fetching http://192.168.1.45:41309/jars/simple-project_2.10-1.0.jar with timestamp 1423071762914 
15/02/04 17:42:43 INFO Utils: Fetching http://192.168.1.45:41309/jars/simple-project_2.10-1.0.jar to /tmp/fetchFileTemp3120003338190168194.tmp 
15/02/04 17:42:43 INFO Executor: Adding file:/tmp/spark-ec5e14c2-9e58-4132-a4c9-2569d237a407/simple-project_2.10-1.0.jar to class loader 
15/02/04 17:42:43 INFO CacheManager: Partition rdd_1_0 not found, computing it 
15/02/04 17:42:43 INFO CacheManager: Partition rdd_1_1 not found, computing it 
15/02/04 17:42:43 INFO HadoopRDD: Input split: file:/home/donbeo/Applications/spark/spark-1.1.0/README.md:0+2405 
15/02/04 17:42:43 INFO HadoopRDD: Input split: file:/home/donbeo/Applications/spark/spark-1.1.0/README.md:2405+2406 
15/02/04 17:42:43 INFO MemoryStore: ensureFreeSpace(7512) called with curMem=35384, maxMem=278302556 
15/02/04 17:42:43 INFO MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 7.3 KB, free 265.4 MB) 
15/02/04 17:42:43 INFO BlockManagerInfo: Added rdd_1_1 in memory on 192.168.1.45:55674 (size: 7.3 KB, free: 265.4 MB) 
15/02/04 17:42:43 INFO BlockManagerMaster: Updated info of block rdd_1_1 
15/02/04 17:42:43 INFO MemoryStore: ensureFreeSpace(8352) called with curMem=42896, maxMem=278302556 
15/02/04 17:42:43 INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 8.2 KB, free 265.4 MB) 
15/02/04 17:42:43 INFO BlockManagerInfo: Added rdd_1_0 in memory on 192.168.1.45:55674 (size: 8.2 KB, free: 265.4 MB) 
15/02/04 17:42:43 INFO BlockManagerMaster: Updated info of block rdd_1_0 
15/02/04 17:42:43 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2300 bytes result sent to driver 
15/02/04 17:42:43 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2300 bytes result sent to driver 
15/02/04 17:42:43 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 179 ms on localhost (1/2) 
15/02/04 17:42:43 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 176 ms on localhost (2/2) 
15/02/04 17:42:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/02/04 17:42:43 INFO DAGScheduler: Stage 0 (count at SimpleApp.scala:13) finished in 0.198 s 
15/02/04 17:42:43 INFO SparkContext: Job finished: count at SimpleApp.scala:13, took 0.292364402 s 
15/02/04 17:42:43 INFO SparkContext: Starting job: count at SimpleApp.scala:14 
15/02/04 17:42:43 INFO DAGScheduler: Got job 1 (count at SimpleApp.scala:14) with 2 output partitions (allowLocal=false) 
15/02/04 17:42:43 INFO DAGScheduler: Final stage: Stage 1(count at SimpleApp.scala:14) 
15/02/04 17:42:43 INFO DAGScheduler: Parents of final stage: List() 
15/02/04 17:42:43 INFO DAGScheduler: Missing parents: List() 
15/02/04 17:42:43 INFO DAGScheduler: Submitting Stage 1 (FilteredRDD[3] at filter at SimpleApp.scala:14), which has no missing parents 
15/02/04 17:42:43 INFO MemoryStore: ensureFreeSpace(2616) called with curMem=51248, maxMem=278302556 
15/02/04 17:42:43 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.6 KB, free 265.4 MB) 
15/02/04 17:42:43 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (FilteredRDD[3] at filter at SimpleApp.scala:14) 
15/02/04 17:42:43 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks 
15/02/04 17:42:43 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, ANY, 1283 bytes) 
15/02/04 17:42:43 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, ANY, 1283 bytes) 
15/02/04 17:42:43 INFO Executor: Running task 0.0 in stage 1.0 (TID 2) 
15/02/04 17:42:43 INFO Executor: Running task 1.0 in stage 1.0 (TID 3) 
15/02/04 17:42:43 INFO BlockManager: Found block rdd_1_1 locally 
15/02/04 17:42:43 INFO BlockManager: Found block rdd_1_0 locally 
15/02/04 17:42:43 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1731 bytes result sent to driver 
15/02/04 17:42:43 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1731 bytes result sent to driver 
15/02/04 17:42:43 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 14 ms on localhost (1/2) 
15/02/04 17:42:43 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 17 ms on localhost (2/2) 
15/02/04 17:42:43 INFO DAGScheduler: Stage 1 (count at SimpleApp.scala:14) finished in 0.017 s 
15/02/04 17:42:43 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/02/04 17:42:43 INFO SparkContext: Job finished: count at SimpleApp.scala:14, took 0.034833058 s 
Lines with a: 83, Lines with b: 38 
A random number 
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomDataGenerator 
    at SimpleApp$.main(SimpleApp.scala:20) 
    at SimpleApp.main(SimpleApp.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.ClassNotFoundException: org.apache.commons.math3.random.RandomDataGenerator 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358) 
    ... 9 more 
donbeo@donbeo-HP-EliteBook-Folio-9470m:~/Applications/spark/spark-1.1.0$ 

I think I am doing something wrong when I import the math3 library.

There is a detailed explanation of how I installed Spark and built the project here: submit task to Spark


It works for me as-is... have you tried running 'sbt clean'? –

Answer


You need to specify the path to the commons-math3 jar; this can be done with the --jars option:

./bin/spark-submit --class "SimpleApp"       \ 
        --master local[4]       \ 
        --jars <specify-path-of-commons-math3-jar> \ 
        /home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-project_2.10-1.0.jar 
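
For example, assuming sbt resolved commons-math3 3.3 into the default Ivy cache (the exact jar path below is an assumption and depends on your setup), the call might look like this:

./bin/spark-submit --class "SimpleApp" \
                   --master local[4] \
                   --jars /home/donbeo/.ivy2/cache/org.apache.commons/commons-math3/jars/commons-math3-3.3.jar \
                   /home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-project_2.10-1.0.jar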

Alternatively, you can build an assembly jar that contains all the dependencies.

EDIT: How to build the assembly jar:

In the build.sbt file:

import AssemblyKeys._ 

import sbtassembly.Plugin._ 

name := "Simple Project" 

version := "1.0" 

scalaVersion := "2.10.4" 

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided" 

libraryDependencies += "org.apache.commons" % "commons-math3" % "3.3" 

// This statement includes the assembly plugin capabilities 
assemblySettings 

// Configure the jar name used with the assembly plug-in 
jarName in assembly := "simple-app-assembly.jar" 

// A special option to exclude Scala itself from our assembly jar, since Spark 
// already bundles Scala. 
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false) 

In the project/assembly.sbt file:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2") 

Then build the assembly jar as follows:

sbt assembly 
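
The assembly jar can then be submitted without the --jars option, since commons-math3 is bundled inside it. A sketch, assuming the default sbt output directory and the jar name configured above:

./bin/spark-submit --class "SimpleApp" \
                   --master local[4] \
                   /home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-app-assembly.jar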

How can I build an assembly jar with all the dependencies? Isn't that done with the 'sbt package' command? – Donbeo


@Donbeo I have updated the answer to include the steps for building an assembly jar –
