Overview
The article discusses Netflix's benchmarking of Apache Cassandra's scalability on AWS, achieving over a million writes per second. It highlights the automated tooling developed for rapid deployment and the linear scalability observed during testing.
What You'll Learn
1. How to deploy a large-scale Cassandra cluster on AWS using automated tooling
2. Why linear scalability is crucial for handling high write loads in distributed databases
3. When to use different EC2 instance types for Cassandra workloads
Prerequisites & Requirements
- Understanding of NoSQL databases and distributed systems
- Familiarity with AWS EC2 and Cassandra (optional)
Key Questions Answered
How did Netflix achieve over a million writes per second with Cassandra?
Netflix utilized an automated deployment process on AWS with 288 EC2 instances to run a write-oriented benchmark, achieving 1.1 million client writes per second. The data was replicated across three availability zones, resulting in a total of 3.3 million writes per second across the cluster.
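The relationship between the two throughput figures is simple arithmetic, sketched below in Python. The client write rate and instance count come from the article. The replication factor of 3 (one copy per availability zone) and the assumption that all 288 instances are Cassandra nodes sharing the load evenly are inferences from the summary, not stated measurements, and the function names are illustrative.

```python
# Figures quoted in the article.
CLIENT_WRITES_PER_SEC = 1_100_000
INSTANCES = 288

# Assumption: one replica per availability zone, i.e. replication factor 3.
REPLICATION_FACTOR = 3


def replicated_writes(client_writes: int, replication_factor: int) -> int:
    """Total writes the cluster performs once every replica is written."""
    return client_writes * replication_factor


def per_node(writes_per_sec: int, nodes: int) -> float:
    """Average writes per second per node, assuming an even spread."""
    return writes_per_sec / nodes


total = replicated_writes(CLIENT_WRITES_PER_SEC, REPLICATION_FACTOR)
print(total)                                                  # 3300000
print(round(per_node(CLIENT_WRITES_PER_SEC, INSTANCES), 2))   # 3819.44
print(round(per_node(total, INSTANCES), 2))                   # 11458.33
```

Under these assumptions each node coordinates roughly 3,800 client writes per second and applies roughly 11,500 replica writes per second.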
What EC2 instance types are best for Cassandra workloads?
The article mentions using M1 Extra Large (m1.xl) instances for write-heavy workloads and M2 Quadruple Extra Large (m2.4xl) instances for read-intensive tasks. M1 instances have four CPUs and 15GB RAM, while M2 instances have eight CPUs and 68GB RAM.
What are the costs associated with running Cassandra benchmarks on AWS?
The benchmarking tests incurred costs of a few hundred dollars, with m1.xl instances costing $0.68 per hour and m2.4xl instances costing $2.00 per hour. The tests were designed to be cost-effective by minimizing setup time and utilizing automation.
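As a rough check on the "few hundred dollars" figure, the hourly cost of a fleet can be estimated from the two quoted prices. This is a minimal sketch: the prices are the ones given in the article, while the fleet dictionaries, function names, and any run duration are hypothetical inputs, since the summary does not state how many client instances were used or how long each test ran.

```python
from decimal import Decimal

# On-demand prices per instance-hour as quoted in the article.
HOURLY_PRICE = {
    "m1.xl": Decimal("0.68"),
    "m2.4xl": Decimal("2.00"),
}


def hourly_cost(fleet: dict) -> Decimal:
    """Cost of running the fleet for one hour; fleet maps type -> count."""
    return sum(
        (HOURLY_PRICE[kind] * count for kind, count in fleet.items()),
        Decimal("0"),
    )


def run_cost(fleet: dict, hours) -> Decimal:
    """Cost of running the fleet for the given number of hours."""
    return hourly_cost(fleet) * Decimal(str(hours))


# The 288 Cassandra instances alone, for one hour.
cassandra_only = {"m1.xl": 288}
print(hourly_cost(cassandra_only))  # 195.84
```

Client instances can be added to the fleet dictionary under the "m2.4xl" key once their count is known.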
Key Statistics & Figures
Client writes per second
1.1 million
Achieved during the benchmark test using 288 EC2 instances.
Total writes per second across the cluster
3.3 million
This includes data replicated across three availability zones.
Cost of m1.xl instances
$0.68 per hour
Used for running Cassandra in the benchmark.
Cost of m2.4xl instances
$2.00 per hour
Used for client instances running stress tests.
Technologies & Tools
Database
Apache Cassandra
Used as the NoSQL data store for benchmarking.
Cloud Computing
AWS EC2
Platform used for deploying Cassandra clusters and running benchmarks.
Key Actionable Insights
1. Utilize automated deployment tools to quickly scale Cassandra clusters on AWS. This approach reduces setup time and operational overhead, allowing teams to focus on performance testing rather than infrastructure management.
2. Benchmarking in the cloud can significantly lower costs and improve testing efficiency. By leveraging AWS's pay-as-you-go model, teams can run extensive tests without the constraints of traditional data center setups.
3. Understand the implications of consistency levels in Cassandra for application performance. Choosing the right consistency level, such as ONE for faster writes or LOCAL_QUORUM for stronger consistency, affects both latency and the guarantees an application gets about reading its own writes.
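The trade-off between the two consistency levels comes down to how many replicas must acknowledge a write before it is reported as successful. The sketch below is plain Python rather than driver code. It assumes a replication factor of 3 within the local datacenter, which matches the three-zone setup described above, and the helper names are illustrative.

```python
def required_acks(level: str, local_replicas: int) -> int:
    """Replica acknowledgements needed before a write succeeds."""
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        # A quorum is a strict majority of the local replicas.
        return local_replicas // 2 + 1
    raise ValueError(f"unsupported consistency level: {level}")


def tolerated_failures(level: str, local_replicas: int) -> int:
    """How many local replicas can be down while writes still succeed."""
    return local_replicas - required_acks(level, local_replicas)


# Assumption: replication factor 3, one replica per availability zone.
for level in ("ONE", "LOCAL_QUORUM"):
    print(level, required_acks(level, 3), tolerated_failures(level, 3))
# ONE 1 2
# LOCAL_QUORUM 2 1
```

With three replicas, ONE waits for a single acknowledgement and survives two unavailable replicas, while LOCAL_QUORUM waits for two and survives one.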
Common Pitfalls
1. Failing to properly configure consistency levels can lead to data inconsistency.
It's crucial to understand the trade-offs between speed and consistency when selecting a consistency level in Cassandra, as this can impact application performance.
Related Concepts
Distributed Databases
Cloud Computing Best Practices
Cassandra Performance Tuning