Scaling Kafka for Throughput
I have set up a sample Kafka cluster on AWS and am trying to identify the maximum throughput possible with the given configuration. I am following the post provided here for this analysis.
I would appreciate it if you could clarify the following issues.
I observed a throughput of 40 MB/s for messages of size 512 bytes (single producer, single consumer) with the given hardware. Assume I need to achieve a throughput of 80 MB/s.
As I understand it, one way to do this is to increase the number of partitions per topic and increase the number of threads in the producer and consumer (assuming I do not change the default values for batch size, compression ratio, etc.).
1. How do I find the maximum throughput possible with the given hardware, that is, the point after which we must improve our hardware resources if we are to further improve throughput?
(In other words, how do I make the decision "with X GB of RAM and Y GB of disk space, this is the maximum throughput I can achieve. If I need to further improve throughput, I have to upgrade RAM to XX GB and disk space to YY GB"?)
2. Should we scale the cluster vertically or horizontally? What is the recommended approach?
Thank you.
If you define throughput as the volume of data transmitted over the network per second, the maximum throughput cannot exceed (number of machines) * (bandwidth per machine). Given a single machine with a NIC configured for 1 Gbps, the maximum throughput on that machine cannot be larger than 1 Gbps. In your case, the throughput is 40 MB/s, namely 320 Mbps, which is well below 1 Gbps, meaning there is still room for improvement. However, if your target is far larger than 1 Gbps, you do need more machines.
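As a rough illustration of the unit conversion above, here is a minimal Python sketch. The function names are made up for this example, and the 1 Gbps NIC is taken as 1000 Mbps, which is an assumption about how the link speed is counted.

```python
def mb_per_s_to_mbps(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (1 byte = 8 bits)."""
    return mb_per_s * 8


def nic_utilization(throughput_mb_per_s: float, nic_mbps: float = 1000.0) -> float:
    """Fraction of a NIC's nominal bandwidth used by the given throughput.

    nic_mbps defaults to 1000 (a 1 Gbps NIC), an assumption for this sketch.
    """
    return mb_per_s_to_mbps(throughput_mb_per_s) / nic_mbps


observed = 40.0  # MB/s, the throughput measured in the question
target = 80.0    # MB/s, the throughput the question wants to reach

print(mb_per_s_to_mbps(observed))  # 320.0
print(nic_utilization(observed))   # 0.32
print(nic_utilization(target))     # 0.64
```

By this estimate, even the 80 MB/s target uses roughly two thirds of a single 1 Gbps link, so the network alone does not yet force a second machine.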
AFAIK, bandwidth can easily become the system bottleneck. Unlike CPU and RAM, it is not easy to scale vertically, so scaling horizontally might be the option.
You have to do the maths before scaling. Say the throughput target is "produce 2 billion records of 512 bytes in 1 hour". That is to say, the throughput has to reach 2,000,000,000 * 8 * 512 / 3600 / 1024 / 1024 ≈ 2170 Mbps. Assuming the available bandwidth of a single machine is 700 Mbps (over 70% usage brings 'packet loss'), at least 4 machines should be planned for the producer application.
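The same arithmetic can be written as a small Python sketch, which makes it easy to plug in other targets. The function names are hypothetical, and the 700 Mbps usable bandwidth per machine is the assumption stated above, not a measured value.

```python
import math


def required_mbps(records: int, record_bytes: int, seconds: int) -> float:
    """Throughput in Mbps needed to send `records` records in `seconds`.

    Uses 1 Mb = 1024 * 1024 bits, matching the calculation in the answer.
    """
    bits = records * record_bytes * 8
    return bits / seconds / 1024 / 1024


def machines_needed(target_mbps: float, usable_mbps_per_machine: float) -> int:
    """Smallest whole number of machines whose combined bandwidth covers the target."""
    return math.ceil(target_mbps / usable_mbps_per_machine)


# 2 billion records of 512 bytes in 1 hour
target = required_mbps(2_000_000_000, 512, 3600)

print(round(target))                 # 2170
print(machines_needed(target, 700))  # 4
```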