Redshift table occupies far more disk space in production cluster
Production cluster details:
- node type: dc1.8xlarge
- nodes: 25
- storage: 2.56 TB SSD per node

Test cluster details:
- node type: ds2.xlarge
- nodes: 6
- storage: 2 TB HDD per node
When the same table (same DDL and encoding) is unloaded from the production cluster and copied to the test cluster, the disk footprint shrinks dramatically. This has been tested with multiple tables using different distribution styles and sort key patterns.

Example: table A (no sort key, DISTSTYLE EVEN) - size in production: 60 GB; size in test: 0.6 GB.
Table B (sort key, DISTSTYLE KEY) - size in production: 96 GB, 100% sorted; size in test: 1.4 GB, 100% sorted.

Any ideas what could cause this discrepancy? I have read through the Redshift forums but have not been able to find a reason for the issue. I am using the admin view v_space_used_per_tbl (provided by AWS) to calculate table sizes.
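For reference, a comparable per-table size figure can also be read straight from the built-in SVV_TABLE_INFO system view, without the admin views. A minimal sketch, with 'my_table' as a placeholder name:

    -- size is reported in 1 MB blocks
    SELECT "table", diststyle, size AS size_mb, tbl_rows, unsorted
    FROM   svv_table_info
    WHERE  "table" = 'my_table';

Running this on both clusters for the same table makes it easy to confirm that the discrepancy is in the allocated blocks rather than in the admin view's arithmetic.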
If the table's row count is small, the table size is defined by the minimum table size for the cluster.

Redshift allocates a minimum of 1 MB per column per slice in the cluster. A dc1.8xlarge node has 32 slices, so on a 25-node cluster that's ~800 MB per column. A 60 GB minimum-size table therefore has roughly 72 user columns plus 3 internal ones (fewer if some of the columns are large VARCHARs). A ds2.xlarge node has 2 slices, so on a 6-node cluster the minimum size of the same 72-column table is only ~900 MB.
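This per-slice allocation can be checked directly in the system tables. A rough diagnostic sketch (the table name is a placeholder): each column, including the internal ones, should show at least one 1 MB block on every slice that holds rows.

    -- count 1 MB blocks and populated slices per column for one table
    SELECT b.col,
           COUNT(DISTINCT b.slice) AS slices_used,
           COUNT(*)                AS mb_blocks
    FROM   stv_blocklist b
    JOIN   stv_tbl_perm  p
      ON   b.tbl = p.id
     AND   b.slice = p.slice
    WHERE  TRIM(p.name) = 'my_table'
    GROUP  BY b.col
    ORDER  BY b.col;

On the 25-node dc1.8xlarge cluster (800 slices) you would expect mb_blocks of roughly 800 per column once every slice holds rows; on the 6-node ds2.xlarge cluster (12 slices) the same table should drop to roughly a dozen blocks per column.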
For a large cluster in particular, you should consider using DISTSTYLE ALL if the table is a "dimension" table (a small-ish lookup table). That reduces the minimum size to 1 MB per column per node, as in the sketch below.
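A minimal DDL sketch, assuming a hypothetical small lookup table (recent Redshift versions can also switch an existing table in place with ALTER TABLE ... ALTER DISTSTYLE ALL):

    -- DISTSTYLE ALL keeps a full copy of the table on each node,
    -- so the minimum footprint is ~1 MB per column per node.
    CREATE TABLE dim_country (
        country_code CHAR(2)      NOT NULL,
        country_name VARCHAR(100) NOT NULL
    )
    DISTSTYLE ALL;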
For a detailed explanation, please refer to the AWS Knowledge Center article "Why does a table in an Amazon Redshift cluster consume more disk storage space than expected?"