hadoop - How to fix size limit error when performing actions on Hive table in PySpark
I have a Hive table with 4 billion rows that I need to load into PySpark. When I try actions such as counting against the table, I get the following exception (followed by TaskKilled exceptions):
Py4JJavaError: An error occurred while calling o89.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6732 in stage 13.0 failed 4 times, most recent failure: Lost task 6732.3 in stage 13.0 (TID 30759, some_server.xx.net, executor 38): org.apache.hive.com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
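For reference, the action itself is nothing unusual; here is a minimal sketch of what I am running (the database and table names are placeholders for my real ones):

    from pyspark.sql import SparkSession

    # Hive-enabled session; the underlying table has roughly 4 billion rows
    spark = SparkSession.builder \
        .appName("hive-count") \
        .enableHiveSupport() \
        .getOrCreate()

    # Any action that forces a scan of the table raises the exception above
    df = spark.table("mydb.big_table")  # placeholder database.table name
    print(df.count())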
My version of HBase is 1.1.2.2.6.1.0-129, and I am unable to upgrade at this time.
Is there a way I can get around this issue without upgrading, perhaps by modifying an environment variable or a config somewhere, or by passing an argument to PySpark via the command line?
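To illustrate the last option: I know I can pass arbitrary properties when building the session (or with --conf on the command line), for example as below, but I do not know of any property that actually raises this protobuf limit. The property name here is purely a placeholder:

    from pyspark.sql import SparkSession

    # Illustrative only: "spark.hadoop.some.property" is a placeholder,
    # not a real setting known to control the protobuf message size limit
    spark = SparkSession.builder \
        .enableHiveSupport() \
        .config("spark.hadoop.some.property", "value") \
        .getOrCreate()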
I am afraid the answer is no.
based on following jiras increasing protobuf size seems require code change since these jiras resolved code patches using codedinputstream
suggested exception.
- HDFS-6102 Lower the default maximum items per directory to fix PB fsimage loading
- HDFS-10312 Large block reports may fail to decode at NameNode due to 64 MB protobuf maximum length restriction
- HBASE-14076 ResultSerialization and MutationSerialization can throw InvalidProtocolBufferException when serializing a cell larger than 64MB
- HIVE-11592 ORC metadata section can exceed protobuf message size limit
- SPARK-19109 ORC metadata section can exceed protobuf message size limit