
Weird behavior with spark-submit

I am running the following code in pyspark:

In [14]: conf = SparkConf() 

In [15]: conf.getAll() 

[(u'spark.eventLog.enabled', u'true'), 
(u'spark.eventLog.dir', 
    u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'), 
(u'spark.master', u'local[*]'), 
(u'spark.yarn.historyServer.address', 
    u'http://ip-10-0-0-220.ec2.internal:18088'), 
(u'spark.executor.extraLibraryPath', 
    u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), 
(u'spark.app.name', u'pyspark-shell'), 
(u'spark.driver.extraLibraryPath', 
    u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native')] 

In [16]: sc 

<pyspark.context.SparkContext at 0x7fab9dd8a750> 

In [17]: sc.version 

u'1.4.0' 

In [19]: sqlContext 

<pyspark.sql.context.HiveContext at 0x7fab9de785d0> 

In [20]: access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json") 

And everything runs smoothly (I can create tables in the Hive metastore, etc.).
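
For example, saving the DataFrame as a Hive table from the same shell session works (a sketch; the table name here is only an illustration):

# Persist the DataFrame through the Hive metastore (hypothetical table name)
access.write.saveAsTable("access_logs")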

But when I try to run the same code with spark-submit:

# -*- coding: utf-8 -*-                                                               

from __future__ import print_function 

import re 

from pyspark import SparkContext 
from pyspark.sql import HiveContext 
from pyspark.sql import Row 
from pyspark.conf import SparkConf 

if __name__ == "__main__": 

    sc = SparkContext(appName="Minimal Example 2") 

    conf = SparkConf() 

    print(conf.getAll()) 

    print(sc) 

    print(sc.version) 

    sqlContext = HiveContext(sc) 

    print(sqlContext) 

    # ## Read the access log file                                                             
    access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json") 

    sc.stop() 

submitted like this:

$ spark-submit --master yarn-cluster --deploy-mode cluster minimal-example2.py 

it apparently finishes without errors, but if you check the logs:

$ yarn logs -applicationId application_1435696841856_0027  

you see:

15/07/01 16:55:10 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-0-220.ec2.internal/10.0.0.220:8032 


Container: container_1435696841856_0027_01_000001 on ip-10-0-0-36.ec2.internal_8041 
===================================================================================== 
LogType: stderr 
LogLength: 21077 
Log Contents: 
SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/yarn/nm/usercache/nanounanue/filecache/133/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
15/07/01 16:54:00 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT] 
15/07/01 16:54:01 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1435696841856_0027_000001 
15/07/01 16:54:02 INFO spark.SecurityManager: Changing view acls to: yarn,nanounanue 
15/07/01 16:54:02 INFO spark.SecurityManager: Changing modify acls to: yarn,nanounanue 
15/07/01 16:54:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, nanounanue); users with modify permissions: Set(yarn, nanounanue) 
15/07/01 16:54:02 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread 
15/07/01 16:54:02 INFO yarn.ApplicationMaster: Waiting for spark context initialization 
15/07/01 16:54:02 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 
15/07/01 16:54:03 INFO spark.SparkContext: Running Spark version 1.4.0 
15/07/01 16:54:03 INFO spark.SecurityManager: Changing view acls to: yarn,nanounanue 
15/07/01 16:54:03 INFO spark.SecurityManager: Changing modify acls to: yarn,nanounanue 
15/07/01 16:54:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, nanounanue); users with modify permissions: Set(yarn, nanounanue) 
15/07/01 16:54:03 INFO slf4j.Slf4jLogger: Slf4jLogger started 
15/07/01 16:54:03 INFO Remoting: Starting remoting 
15/07/01 16:54:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:41190] 
15/07/01 16:54:03 INFO util.Utils: Successfully started service 'sparkDriver' on port 41190. 
15/07/01 16:54:04 INFO spark.SparkEnv: Registering MapOutputTracker 
15/07/01 16:54:04 INFO spark.SparkEnv: Registering BlockManagerMaster 
15/07/01 16:54:04 INFO storage.DiskBlockManager: Created local directory at /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-14127054-19b1-4cfe-80c3-2c5fc917c9cf 
15/07/01 16:54:04 INFO storage.DiskBlockManager: Created local directory at /data0/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-c8119846-7f6f-45eb-911b-443cb4d7e9c9 
15/07/01 16:54:04 INFO storage.MemoryStore: MemoryStore started with capacity 245.7 MB 
15/07/01 16:54:04 INFO spark.HttpFileServer: HTTP File server directory is /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/httpd-c4abf72b-2ee4-45d7-8252-c68f925bef58 
15/07/01 16:54:04 INFO spark.HttpServer: Starting HTTP Server 
15/07/01 16:54:04 INFO server.Server: jetty-8.y.z-SNAPSHOT 
15/07/01 16:54:04 INFO server.AbstractConnector: Started [email protected]:56437 
15/07/01 16:54:04 INFO util.Utils: Successfully started service 'HTTP file server' on port 56437. 
15/07/01 16:54:04 INFO spark.SparkEnv: Registering OutputCommitCoordinator 
15/07/01 16:54:04 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 
15/07/01 16:54:04 INFO server.Server: jetty-8.y.z-SNAPSHOT 
15/07/01 16:54:04 INFO server.AbstractConnector: Started [email protected]:37958 
15/07/01 16:54:04 INFO util.Utils: Successfully started service 'SparkUI' on port 37958. 
15/07/01 16:54:04 INFO ui.SparkUI: Started SparkUI at http://10.0.0.36:37958 
15/07/01 16:54:04 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 
15/07/01 16:54:04 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49759. 
15/07/01 16:54:04 INFO netty.NettyBlockTransferService: Server created on 49759 
15/07/01 16:54:05 INFO storage.BlockManagerMaster: Trying to register BlockManager 
15/07/01 16:54:05 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.0.36:49759 with 245.7 MB RAM, BlockManagerId(driver, 10.0.0.36, 49759) 
15/07/01 16:54:05 INFO storage.BlockManagerMaster: Registered BlockManager 
15/07/01 16:54:05 INFO scheduler.EventLoggingListener: Logging events to hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory/application_1435696841856_0027_1 
15/07/01 16:54:05 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#-1566924249]) 
15/07/01 16:54:05 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-0-220.ec2.internal/10.0.0.220:8030 
15/07/01 16:54:05 INFO yarn.YarnRMClient: Registering the ApplicationMaster 
15/07/01 16:54:05 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead 
15/07/01 16:54:05 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>) 
15/07/01 16:54:05 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>) 
15/07/01 16:54:05 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 5000 
15/07/01 16:54:11 INFO impl.AMRMClientImpl: Received new token for : ip-10-0-0-99.ec2.internal:8041 
15/07/01 16:54:11 INFO impl.AMRMClientImpl: Received new token for : ip-10-0-0-37.ec2.internal:8041 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching container container_1435696841856_0027_01_000002 for on host ip-10-0-0-99.ec2.internal 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl: akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, executorHostname: ip-10-0-0-99.ec2.internal 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching container container_1435696841856_0027_01_000003 for on host ip-10-0-0-37.ec2.internal 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Starting Executor Container 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl: akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, executorHostname: ip-10-0-0-37.ec2.internal 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them. 
15/07/01 16:54:11 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Starting Executor Container 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext 
15/07/01 16:54:11 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Preparing Local resources 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Preparing Local resources 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s 
ize: 162896305 timestamp: 1435784032445 type: FILE visibility: PRIVATE, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784 
032613 type: FILE visibility: PRIVATE, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: FIL 
E visibility: PRIVATE, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: FILE visibility: PRIVA 
TE) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s 
ize: 162896305 timestamp: 1435784032445 type: FILE visibility: PRIVATE, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784 
032613 type: FILE visibility: PRIVATE, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: FIL 
E visibility: PRIVATE, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: FILE visibility: PRIVA 
TE) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOO 
P_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH, SPARK_LOG_URL_STDERR -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanounan 
ue/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1435696841856_0027, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 162896305,281333,37562,2448, SPARK_USER -> nanounanue, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE,PRIVATE,PRIVATE, SPARK_YARN_MODE -> 
true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1435784032445,1435784032613,1435784032652,1435784032692, PYTHONPATH -> pyspark.zip:py4j-0.8.2.1-src.zip, SPARK_LOG_URL_STDOUT -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanou 
nanue/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/applic 
ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_14 
35696841856_0027/minimal-example2.py#minimal-example2.py) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOO 
P_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH, SPARK_LOG_URL_STDERR -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanounan 
ue/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1435696841856_0027, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 162896305,281333,37562,2448, SPARK_USER -> nanounanue, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE,PRIVATE,PRIVATE, SPARK_YARN_MODE -> 
true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1435784032445,1435784032613,1435784032652,1435784032692, PYTHONPATH -> pyspark.zip:py4j-0.8.2.1-src.zip, SPARK_LOG_URL_STDOUT -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanou 
nanue/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/applic 
ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_14 
35696841856_0027/minimal-example2.py#minimal-example2.py) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with commands: List(LD_LIBRARY_PATH="/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$LD_LIBRARY_PATH", {{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m, -Xmx 
1024m, -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.ui.port=0', '-Dspark.driver.port=41190', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, --e 
xecutor-id, 1, --hostname, ip-10-0-0-99.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with commands: List(LD_LIBRARY_PATH="/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$LD_LIBRARY_PATH", {{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m, -Xmx 
1024m, -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.ui.port=0', '-Dspark.driver.port=41190', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, --e 
xecutor-id, 2, --hostname, ip-10-0-0-37.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr) 
15/07/01 16:54:11 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-37.ec2.internal:8041 
15/07/01 16:54:14 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-99.ec2.internal:43176 
15/07/01 16:54:15 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-37.ec2.internal:58472 
15/07/01 16:54:15 INFO cluster.YarnClusterSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:49047/user/Executor#563862009]) with ID 1 
15/07/01 16:54:15 INFO cluster.YarnClusterSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:36122/user/Executor#1370723906]) with ID 2 
15/07/01 16:54:15 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8 
15/07/01 16:54:15 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 
15/07/01 16:54:15 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-99.ec2.internal:59769 with 530.3 MB RAM, BlockManagerId(1, ip-10-0-0-99.ec2.internal, 59769) 
15/07/01 16:54:16 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-37.ec2.internal:48859 with 530.3 MB RAM, BlockManagerId(2, ip-10-0-0-37.ec2.internal, 48859) 
15/07/01 16:54:16 INFO hive.HiveContext: Initializing execution hive, version 0.13.1 
15/07/01 16:54:17 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 
15/07/01 16:54:17 INFO metastore.ObjectStore: ObjectStore, initialize called 
15/07/01 16:54:17 INFO spark.SparkContext: Invoking stop() from shutdown hook 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null} 
15/07/01 16:54:17 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.0.36:37958 
15/07/01 16:54:17 INFO scheduler.DAGScheduler: Stopping DAGScheduler 
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors 
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down 
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-99.ec2.internal:49047 
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-37.ec2.internal:36122 
15/07/01 16:54:17 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.0.36:37958 
15/07/01 16:54:17 INFO scheduler.DAGScheduler: Stopping DAGScheduler 
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors 
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down 
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-99.ec2.internal:49047 
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-37.ec2.internal:36122 
15/07/01 16:54:17 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
15/07/01 16:54:17 INFO storage.MemoryStore: MemoryStore cleared 
15/07/01 16:54:17 INFO storage.BlockManager: BlockManager stopped 
15/07/01 16:54:17 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 
15/07/01 16:54:17 INFO spark.SparkContext: Successfully stopped SparkContext 
15/07/01 16:54:17 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.) 
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.) 
15/07/01 16:54:17 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered. 
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down. 
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1435696841856_0027 
15/07/01 16:54:17 INFO util.Utils: Shutdown hook called 
15/07/01 16:54:17 INFO util.Utils: Deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/pyspark-215f5c19-b1cb-47df-ad43-79da4244de61 
15/07/01 16:54:17 INFO util.Utils: Deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/tmp/spark-c96dc9dc-e6ee-451b-b09e-637f5d4ca990 

LogType: stdout 
LogLength: 2404 
Log Contents: 
[(u'spark.eventLog.enabled', u'true'), (u'spark.submit.pyArchives', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.yarn.app.container.log.dir', u'/var/log/hadoop-yarn/container/application_1435696841856_0027/container_1435696841856_0027_01_000001'), (u'spark.eventLog.dir', 
u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'), (u'spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS', u'ip-10-0-0-220.ec2.internal'), (u'spark.yarn.historyServer.address', u'http://ip-10-0-0-220.ec2.internal:18088' 
), (u'spark.ui.port', u'0'), (u'spark.yarn.app.id', u'application_1435696841856_0027'), (u'spark.app.name', u'minimal-example2.py'), (u'spark.executor.instances', u'2'), (u'spark.executorEnv.PYTHONPATH', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.submit.pyFiles', u''), 
(u'spark.executor.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.master', u'yarn-cluster'), (u'spark.ui.filters', u'org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter'), (u'spark.org.apache.hadoop.yarn.server.w 
ebproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES', u'http://ip-10-0-0-220.ec2.internal:8088/proxy/application_1435696841856_0027'), (u'spark.driver.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.yarn.app.attemptId', u 
'1')] 
<pyspark.context.SparkContext object at 0x3fd53d0> 
1.4.0 
<pyspark.sql.context.HiveContext object at 0x40a9110> 
Traceback (most recent call last): 
    File "minimal-example2.py", line 53, in <module> 
    access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json") 
    File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 591, in read 
    File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 39, in __init__ 
    File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 619, in _ssql_ctx 
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o53)) 

The important part is the last line: "You must build Spark with Hive." Why? What am I doing wrong?


Are you using a custom-built version of Spark or the version from the vendor? Also, is spark.yarn.jar set in your conf? – Holden


@Holden It is the binary distribution of Spark 1.4. I didn't use the vendor's version, since it is too old (1.2). Neither of the examples has 'spark.yarn.jar' set – nanounanue

Answers

6

I ran into this same problem recently, but it turned out the message from Spark was misleading: there were no missing jars. The problem for me was that the Java HiveContext class, which PySpark calls into, parses hive-site.xml when it is constructed, and an exception was being thrown during that construction. (PySpark catches the exception and incorrectly reports it as being caused by a missing jar.) In my case the culprit was the property hive.metastore.client.connect.retry.delay, which was set to 2s: HiveContext tries to parse it as an integer, and fails. Change it to 2, and strip the unit suffixes from hive.metastore.client.socket.timeout and hive.metastore.client.socket.lifetime as well.
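
For reference, the corrected entries in hive-site.xml would look roughly like this (a sketch; the value 2 comes from the fix described above, while the timeout value is a placeholder — keep whatever number your cluster already uses, just without the unit suffix):

<!-- hive-site.xml: plain integers, no "s" suffixes -->
<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>2</value>  <!-- was "2s", which HiveContext cannot parse as an integer -->
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>300</value>  <!-- placeholder; strip the suffix from your existing value -->
</property>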

Note that you can get a more descriptive error by calling sqlContext._get_hive_ctx().
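
For example, right after constructing the context (_get_hive_ctx() is a private PySpark API, so treat this purely as a debugging aid):

# Force Hive initialization now; the underlying Py4J error is far more
# informative than the "You must build Spark with Hive" wrapper
sqlContext._get_hive_ctx()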


Thanks, this answer helps a lot. BTW http://stackoverflow.com/a/34215330/1813988 is also useful for anyone looking for a command-line solution. – phil

-1

It also says: "An error occurred while calling None.org.apache.spark.sql.hive.HiveContext"

So the problem seems to be that the Hive part is not shipped with the spark-submit command, and the cluster cannot find the Hive dependency. Just do as it says:

Export 'SPARK_HIVE=true'

In theory, this should let you build your assembly jar with the Hive dependency included, so that Spark finds the lib it is missing.
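
Concretely, that means rebuilding the Spark assembly from source, exactly as the exception text suggests (run from the Spark source root):

$ export SPARK_HIVE=true
$ build/sbt assembly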

2

You should create a SQLContext instead of a HiveContext:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
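
Applied to the question's script, the change looks like this (a sketch; a plain SQLContext can read the JSON without touching the Hive metastore, at the cost of losing Hive-specific features):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="Minimal Example 2")
sqlContext = SQLContext(sc)  # no hive-site.xml parsing involved

# Same read as in the question
access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")
sc.stop()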