Redshift table occupies exponentially more disk space in production cluster -


production cluster details:

  • node type dc1.8xlarge
  • nodes 25
  • 2.56tb ssd storage per node

test cluster details:

  • node type ds2.xlarge
  • nodes 6
  • 2tb hdd storage per node

when same table same ddl & encoding unloaded , copied production cluster test cluster, disk footprint reduces exponentially. has been tested multiple tables different distribution styles , sort key patterns.

example: table (no sort key, distsyle even) - size in production: 60gb; size in test: 0.6 gb

table b (sort key, diststyle key) - size in production: 96gb 100% sorted; size in test: 1.4 gb 100% sorted

any ideas can result in discrepancy? have read of redshift forums not able find reason issue. using admin view v_space_used_per_tbl (provided aws) calculating size of table.

if table row count small table size defined minimum table size cluster.

redshift allocates minimum of 1mb per column per slice in cluster. dc1.8xlarge node has 32 slices on 25 node cluster that's ~800mb per column.

a 60gb minimum size table has ~72 user columns plus 3 internal [fewer if of columns large varchar()]. ds2.xlarge node has 2 slices on 6 node cluster minimum size of 72 column table ~900mb.

for large cluster in particular, should consider using diststyle all if table "dimension" table (small-ish lookup table). reduces minimum size 1mb per column per node.

for detailed explanation please refer aws knowledge base article why table in amazon redshift cluster consume more disk storage space expected?


Comments

Popular posts from this blog

node.js - Node js - Trying to send POST request, but it is not loading javascript content -

javascript - Replicate keyboard event with html button -

javascript - Web audio api 5.1 surround example not working in firefox -