hadoop - How to fix size limit error when performing actions on Hive table in PySpark


I have a Hive table with 4 billion rows that I need to load into PySpark. When I try actions such as counting against the table, I get the following exception (followed by TaskKilled exceptions):

Py4JJavaError: An error occurred while calling o89.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6732 in stage 13.0 failed 4 times, most recent failure: Lost task 6732.3 in stage 13.0 (TID 30759, some_server.xx.net, executor 38): org.apache.hive.com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
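For reference, the failing action is nothing exotic. A minimal sketch of the kind of job that reproduces it, assuming a Hive-enabled SparkSession (the table name below is a placeholder):

from pyspark.sql import SparkSession

# Build a session with Hive support so spark.table() can see the metastore.
spark = (
    SparkSession.builder
    .appName("hive-count-repro")
    .enableHiveSupport()
    .getOrCreate()
)

# Placeholder name standing in for the real 4-billion-row Hive table.
df = spark.table("my_db.my_big_table")

# Any action that scans the table, such as count(), raises the
# InvalidProtocolBufferException shown above.
print(df.count())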

My version of HBase is 1.1.2.2.6.1.0-129, and I am unable to upgrade at this time.

Is there a way I can get around this issue without upgrading, perhaps by modifying an environment variable or a config somewhere, or by passing an argument to PySpark via the command line?
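For clarity, the sort of config passthrough I have in mind is sketched below. The property name is purely a placeholder to illustrate the mechanism (Spark copies spark.hadoop.* properties into the underlying Hadoop Configuration); I have not found a real key that raises this protobuf limit.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("config-passthrough")
    .enableHiveSupport()
    # Hypothetical key: spark.hadoop.* entries are forwarded to the
    # Hadoop Configuration seen by the Hive/ORC readers.
    .config("spark.hadoop.some.property.name", "134217728")
    .getOrCreate()
)

# Equivalent command-line form:
#   pyspark --conf spark.hadoop.some.property.name=134217728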

I think the answer is no.

Based on the following JIRAs, increasing the protobuf size limit seems to require a code change, since these JIRAs were resolved with code patches that use CodedInputStream as suggested by the exception.

  • HDFS-6102 Lower default maximum items per directory to fix PB fsimage loading
  • HDFS-10312 Large block reports may fail to decode at NameNode due to 64 MB protobuf maximum length restriction
  • HBASE-14076 ResultSerialization and MutationSerialization can throw InvalidProtocolBufferException when serializing a cell larger than 64MB
  • HIVE-11592 ORC metadata section can exceed protobuf message size limit
  • SPARK-19109 ORC metadata section can exceed protobuf message size limit
