java - Connect to elasticsearch 2.4.4 with spark 2.x -
from offical doc can see :
elasticsearch-hadoop allows elasticsearch used in spark in 2 ways: through dedicated support available since 2.1 or through map/reduce bridge since 2.0
but when try through dedicated support way below:
import org.elasticsearch.spark.rdd.api.java.javaesspark; sparkconf conf = new sparkconf().setappname("myapp") .set("spark.serializer", "org.apache.spark.serializer.kryoserializer") .set("es.nodes", "localhost") .set("es.port", "9200") .set("es.resource", "test/main") .set("es.index.auto.create", "true"); javasparkcontext sc = new javasparkcontext(conf); javardd<string> input = sc.textfile("file:///home/zht/pycharmprojects/test/text_file.txt"); javardd<map<string, string>> formattedrdd = input.map(...); javaesspark.savetoes(formattedrdd, "test/spark");
and run commind line:
spark-submit --conf spark.es.resource=test/main --jars $spark_home/jars/elasticsearch-hadoop-2.4.4.jar --class org.spark_examples.something.launchspark sparkexamples-1.0-snapshot-jar-with-dependencies.jar
i got error:
java.lang.nosuchmethoderror: org.apache.spark.taskcontext.addoncompletecallback(lscala/function0;)v @ org.elasticsearch.spark.rdd.esrddwriter.write(esrddwriter.scala:42) @ org.elasticsearch.spark.rdd.esspark$$anonfun$dosavetoes$1.apply(esspark.scala:84) @ org.elasticsearch.spark.rdd.esspark$$anonfun$dosavetoes$1.apply(esspark.scala:84) @ org.apache.spark.scheduler.resulttask.runtask(resulttask.scala:87) @ org.apache.spark.scheduler.task.run(task.scala:108) @ org.apache.spark.executor.executor$taskrunner.run(executor.scala:335) @ java.util.concurrent.threadpoolexecutor.runworker(threadpoolexecutor.java:1149) @ java.util.concurrent.threadpoolexecutor$worker.run(threadpoolexecutor.java:624) @ java.lang.thread.run(thread.java:748)
how fix that?
the addoncompletecallback method removed taskcontext in spark 2.0. spark 2.0 not binary compatible previous releases.
if using maven build project use below code snippet.
<dependency> <groupid>org.elasticsearch</groupid> <artifactid>elasticsearch-hadoop</artifactid> <version>5.1.2</version> </dependency>
if not, please use elasticsearch-hadoop jar version greater version 5.
reference: github issue tracker
Comments
Post a Comment