hadoop - How to fix size limit error when performing actions on Hive table in PySpark
I have a Hive table with 4 billion rows that I need to load into PySpark. When I try actions such as counting against the table, I get the following exception (followed by TaskKilled exceptions):
Py4JJavaError: An error occurred while calling o89.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6732 in stage 13.0 failed 4 times, most recent failure: Lost task 6732.3 in stage 13.0 (TID 30759, some_server.xx.net, executor 38): org.apache.hive.com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
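For reference, the action itself is nothing unusual; here is a minimal sketch of what I am running (the database and table names are placeholders for my real ones):

    from pyspark.sql import SparkSession

    # Hive-enabled session; the underlying table has roughly 4 billion rows
    spark = SparkSession.builder \
        .appName("hive-count") \
        .enableHiveSupport() \
        .getOrCreate()

    # Any action that forces a scan of the table raises the exception above
    df = spark.table("mydb.big_table")  # placeholder database.table name
    print(df.count())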
My version of HBase is 1.1.2.2.6.1.0-129, and I am unable to upgrade at this time.
Is there a way I can get around this issue without upgrading, perhaps by modifying an environment variable or a config somewhere, or by passing an argument to PySpark via the command line?
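To illustrate the last option: I know I can pass arbitrary properties when building the session (or with --conf on the command line), for example as below, but I do not know of any property that actually raises this protobuf limit. The property name here is purely a placeholder:

    from pyspark.sql import SparkSession

    # Illustrative only: "spark.hadoop.some.property" is a placeholder,
    # not a real setting known to control the protobuf message size limit
    spark = SparkSession.builder \
        .enableHiveSupport() \
        .config("spark.hadoop.some.property", "value") \
        .getOrCreate()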
I am afraid the answer is no.
based on following jiras increasing protobuf size seems require code change since these jiras resolved code patches using codedinputstream
suggested exception.
- HDFS-6102 Lower the default maximum items per directory to fix PB fsimage loading
- HDFS-10312 Large block reports may fail to decode at NameNode due to 64 MB protobuf maximum length restriction
- HBASE-14076 ResultSerialization and MutationSerialization can throw InvalidProtocolBufferException when serializing a cell larger than 64MB
- HIVE-11592 ORC metadata section can exceed protobuf message size limit
- SPARK-19109 ORC metadata section can exceed protobuf message size limit