
Spark 1.4.1 - Using pyspark

I tried to run the commands below and I am getting an error.

Code

instances = sqlContext.sql("SELECT instance_id, instance_usage_code "
                           "FROM ib_instances WHERE instance_usage_code = 'OUT_OF_ENTERPRISE'")

instances.write.format("orc").save("instances2") 

hivectx.sql(""" CREATE TABLE IF NOT EXISTS instances2 (instance_id 
string, instance_usage_code STRING)""") 

hivectx.sql (" LOAD DATA LOCAL INPATH '/home/hduser/instances2' into 
table instances2 ") 

Error

Traceback (most recent call last):
  File "/home/hduser/spark_script.py", line 57, in
    instances.write.format("orc").save("instances2")
  File "/usr/local/spark-1.4.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 304, in save
  File "/usr/local/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/usr/local/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o55.save.
: java.lang.AssertionError: assertion failed: The ORC data source can only be used with HiveContext.
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.hive.orc.DefaultSource.createRelation(OrcRelation.scala:54)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:322)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

Answer


My guess is that you created a standard SQLContext rather than a Hive one (which adds several options). Create your sqlContext as an instance of HiveContext. Scala version:

val sqlContext = new HiveContext(sc)
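Since the question uses pyspark, here is a minimal PySpark sketch of the same idea against the Spark 1.4.1 API (the application name is a placeholder; the table and paths are taken from the question):

from pyspark import SparkContext
from pyspark.sql import HiveContext

# "orc_example" is just a placeholder application name.
sc = SparkContext(appName="orc_example")

# Use HiveContext instead of SQLContext: in Spark 1.4.x the ORC data source
# is only available through a HiveContext.
sqlContext = HiveContext(sc)

instances = sqlContext.sql(
    "SELECT instance_id, instance_usage_code "
    "FROM ib_instances WHERE instance_usage_code = 'OUT_OF_ENTERPRISE'")

# The write should now succeed, because the DataFrame comes from a HiveContext.
instances.write.format("orc").save("instances2")

With a HiveContext you could also write straight into a Hive table with instances.write.saveAsTable(...) instead of saving ORC files and loading them manually, but the sketch above stays close to the original code.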