Cassandra: Data-Driven Configuration

Noel Cody
8 min read · advanced

Overview

The article discusses Spotify's use of Cassandra for data-driven configuration, emphasizing the importance of load testing and capacity planning for performance optimization. It details the setup and execution of performance tests using the cassandra-stress tool, showcasing a case study for a new service that tracks user interactions with music features.

What You'll Learn

1. How to use cassandra-stress for performance testing in Cassandra
2. Why pre-launch load testing is critical for systems with strict SLAs
3. When to adjust consistency levels and how doing so affects performance

Prerequisites & Requirements

  • Understanding of Cassandra architecture and data modeling
  • Familiarity with the cassandra-stress tool (optional)

Key Questions Answered

How does changing the consistency level in Cassandra affect performance?
Raising the consistency level from ONE to QUORUM can add latency, because each operation must wait for more replicas to respond. The article notes that a 6-node cluster at consistency level ONE achieves a 95th percentile latency of under 5 ms per operation, and that the same cluster still meets the SLA at QUORUM. A 3-node cluster, however, fails to meet the SLA even at the lower consistency level.
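The cost of a stricter consistency level comes from the number of replica responses the coordinator waits for. The sketch below computes that number for ONE, QUORUM, and ALL. The replication factor of 3 is an assumption for illustration; the summary does not state the one used in the article.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return replication_factor // 2 + 1


def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Number of replica responses the coordinator waits for."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


ASSUMED_RF = 3  # assumption: not stated in the summary

for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_acks(level, ASSUMED_RF))
```

With a replication factor of 3, QUORUM waits for two replicas instead of one, so the slower of two responses sets the latency of each operation.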
What are the expected I/O patterns for a new service using Cassandra?
The new service is expected to be write-heavy, peaking at about 50K operations per second at launch, with plans to scale to nearly twice that. It will also serve periodic batch reads of the features stored for an anonymous user ID.
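As a rough capacity-planning check, the launch peak can be compared with the throughput listed under Key Statistics for the 6-node cluster. This sketch assumes the 90k figure is the tested throughput at consistency level ONE; the function name is illustrative.

```python
def headroom(tested_ops_per_sec: float, expected_peak_ops_per_sec: float) -> float:
    """Ratio of tested throughput to expected peak load."""
    if expected_peak_ops_per_sec <= 0:
        raise ValueError("expected peak must be positive")
    return tested_ops_per_sec / expected_peak_ops_per_sec


LAUNCH_PEAK_OPS = 50_000          # from the summary: about 50K ops/s at launch
TESTED_OPS_6_NODES_ONE = 90_000   # from Key Statistics: 6 nodes, consistency ONE

ratio = headroom(TESTED_OPS_6_NODES_ONE, LAUNCH_PEAK_OPS)
print(f"{ratio:.1f}x launch peak")  # prints "1.8x launch peak"
```

A ratio of 1.8 matches the stated plan to scale to nearly twice the launch load.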
What setup is required for using cassandra-stress?
The cassandra-stress tool is included with the default Cassandra installation from DataStax. Setup involves creating a .yaml profile that defines the data model, shape of the data, and queries, which can be built in three steps including dumping the schema and defining queries.
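Once a profile exists, a test run is started with the tool's `user` command. The sketch below assembles such a command line in Python. The profile file name, the `read_features` query name, the operation weights, the node address, and the thread count are all hypothetical placeholders; any named query must be defined in the profile's `queries` section.

```python
import shlex


def build_stress_command(profile_path, ops, consistency_level,
                         operation_count, nodes, threads):
    """Build an argument list for a `cassandra-stress user` run."""
    # ops maps an operation name to its relative weight, e.g. insert=9.
    ops_spec = ",".join(f"{name}={weight}" for name, weight in ops.items())
    return [
        "cassandra-stress", "user",
        f"profile={profile_path}",
        f"ops({ops_spec})",
        f"n={operation_count}",
        f"cl={consistency_level}",
        "-rate", f"threads={threads}",
        "-node", ",".join(nodes),
    ]


cmd = build_stress_command(
    profile_path="stress_profile.yaml",      # hypothetical file name
    ops={"insert": 9, "read_features": 1},   # hypothetical write-heavy mix
    consistency_level="ONE",
    operation_count=1_000_000,
    nodes=["10.0.0.1"],                      # hypothetical node address
    threads=64,
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Rerunning with `consistency_level="QUORUM"` gives a directly comparable run for the consistency-level comparison.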

Key Statistics & Figures

Operation latency
< 5 ms at the 95th percentile
This is the target latency for the service under a 6-node cluster with a consistency level of ONE.
Peak operations per second
90K
This is the expected throughput with a 6-node cluster at a consistency level of ONE.
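A 95th percentile target is checked against the distribution of per-operation latencies, not their mean. The sketch below uses the nearest-rank percentile method, which is one common definition and not necessarily the one cassandra-stress reports. The sample latencies are made up for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_sla(latencies_ms, limit_ms=5.0, pct=95):
    """True if the pct-th percentile latency is strictly under the limit."""
    return percentile(latencies_ms, pct) < limit_ms


# Illustrative data only: 18 fast operations, one slower, one outlier.
sample_latencies_ms = [1.0] * 18 + [4.0, 20.0]

print(percentile(sample_latencies_ms, 95))  # 4.0
print(meets_sla(sample_latencies_ms))       # True
```

The single 20 ms outlier does not break the target here because it falls above the 95th percentile; a second outlier of that size would.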

Technologies & Tools

Database
Cassandra
Used for managing key-value pairs in real-time data processing.
Tool
cassandra-stress
Utilized for performance testing and forecasting database performance.

Key Actionable Insights

1. Implement pre-launch load testing using cassandra-stress to validate performance expectations.
   This ensures that the system can handle production traffic effectively and meets the defined SLAs before going live.
2. Regularly monitor system stats during testing to avoid CPU, memory, or network bottlenecks.
   Using tools like htop and ifstat helps maintain the integrity of performance tests and ensures accurate results.
3. Consider scaling your Cassandra cluster based on performance testing results.
   The article highlights that a 3-node cluster may not meet performance needs, suggesting that scaling to 6 nodes could provide better results.

Common Pitfalls

1. Failing to simulate production traffic during testing can lead to unexpected performance issues post-launch.
   Without accurate load testing, the system may not perform as expected under real-world conditions, potentially leading to SLA violations.
2. Neglecting to monitor system resources during tests can result in misleading performance metrics.
   If the testing environment is constrained by CPU or memory limits, the results may not reflect the true capabilities of the Cassandra cluster.
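One way to catch a constrained test host is to record its CPU load alongside each run. The sketch below is a rough stand-in for watching htop by hand, not the article's method: it compares the 1-minute load average with the core count using only the Python standard library. The 0.8 threshold is an arbitrary assumption, and `os.getloadavg` is available on Unix-like systems only.

```python
import os


def load_generator_saturated(load_1min, cpu_cores, threshold=0.8):
    """True if the load per core is at or above the threshold."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be >= 1")
    return load_1min / cpu_cores >= threshold


if hasattr(os, "getloadavg"):
    try:
        load_1min, _, _ = os.getloadavg()
        cores = os.cpu_count() or 1
        if load_generator_saturated(load_1min, cores):
            print("warning: test host CPU is near saturation; results may be skewed")
        else:
            print(f"test host load is {load_1min:.2f} across {cores} cores")
    except OSError:
        print("load average unavailable on this host")
```

If the warning appears on the machine running the load generator, the measured latencies may reflect the client rather than the cluster.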

Related Concepts

Load Testing
Capacity Planning
Cassandra Data Modeling