Using virtual private clusters for testing Apache Samza

Rayman Preet Singh

Overview

The article discusses the use of virtual private clusters for testing Apache Samza at LinkedIn, emphasizing the importance of stability and rigorous testing in managed stream processing services. It outlines the setup using Linux Containers (LXC) to emulate a YARN cluster, enabling efficient testing of applications without the overhead of real clusters.

What You'll Learn

  1. How to set up a virtual private cluster using LXC for testing Apache Samza
  2. Why using OS-level virtualization can enhance testing efficiency
  3. How to emulate network conditions for realistic testing scenarios
  4. When to apply failure testing in a managed service environment

Prerequisites & Requirements

  • Understanding of Apache Samza and YARN concepts
  • Familiarity with Linux Containers (LXC) and Docker (optional)

Key Questions Answered

How can LXC be used to emulate a YARN cluster for testing?
LXC can be used to create multiple isolated Linux environments that simulate a YARN cluster on a single physical machine. This setup allows developers to run and test applications like Apache Samza efficiently, exercising all multi-machine code paths without the overhead of a real cluster.
What are the benefits of using OS-level virtualization for testing?
OS-level virtualization, such as LXC, offers lower overhead compared to full virtualization, allowing for higher density of instances on a single machine. This enables the creation of clusters with many nodes, facilitating comprehensive testing of distributed applications while maintaining performance.
What network conditions can be emulated for testing purposes?
Using Dummynet, the setup can emulate various network conditions such as bandwidth limits, network delays, and packet loss. This allows for realistic testing scenarios that reflect typical inter-data-center latencies, which is crucial for validating application behavior under different network conditions.
What types of testing can be performed with this setup?
The setup is useful for validation of YARN semantics, failure testing of machine and container behaviors, and development of features requiring multi-machine interactions. This comprehensive testing approach helps ensure stability and performance before deploying upgrades or new features.
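
As a rough illustration of how several LXC instances could be brought up on one host, the sketch below only builds the command lines rather than running them. The node names, the Ubuntu image passed to the download template, and the cgroup v1 memory key are assumptions made for this example and are not taken from the article; hosts using cgroup v2 would need a different memory setting.

```python
import shlex


def lxc_cluster_commands(num_nodes, memory_gb, prefix="yarn-node"):
    """Build the commands that would create, start, and memory-cap LXC instances.

    The commands are returned as argument lists and are NOT executed here.
    The name prefix, the image, and the cgroup key are illustrative assumptions.
    """
    commands = []
    for i in range(1, num_nodes + 1):
        name = f"{prefix}-{i:02d}"
        # Create the container from the generic "download" template (assumed image).
        commands.append(["lxc-create", "-n", name, "-t", "download", "--",
                         "-d", "ubuntu", "-r", "jammy", "-a", "amd64"])
        # Start the container in the background.
        commands.append(["lxc-start", "-n", name, "-d"])
        # Cap the memory the instance may use (cgroup v1 key, an assumption).
        commands.append(["lxc-cgroup", "-n", name,
                         "memory.limit_in_bytes", f"{memory_gb}G"])
    return commands


if __name__ == "__main__":
    # Print the commands for a small three-node cluster of 8 GB instances.
    for cmd in lxc_cluster_commands(3, 8):
        print(shlex.join(cmd))
```

Each instance would then run the YARN daemons, so that a Samza job submitted to the emulated cluster exercises the same multi-machine code paths it would on real hardware.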

Key Statistics & Figures

Cluster size
50 LXC instances of 8 GB each
These instances run on a single base host with 64 GB of RAM, so memory is over-subscribed.
Bandwidth limit
100 Mbps
This limit is applied to the emulated network conditions between LXC instances.
Network delay
10 ms
This delay is added to simulate typical inter-data-center latencies.
Packet loss
1%
This is randomized to reflect real-world network conditions during testing.
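
The degree of over-subscription follows directly from the cluster-size figures above. The short calculation below is only a worked example; the function name is made up for illustration.

```python
def oversubscription_ratio(instances, gb_per_instance, host_gb):
    """Return advertised memory divided by physical memory on the host."""
    return instances * gb_per_instance / host_gb


advertised_gb = 50 * 8  # 50 LXC instances of 8 GB each
ratio = oversubscription_ratio(50, 8, 64)
print(f"advertised: {advertised_gb} GB, host: 64 GB, ratio: {ratio:.2f}x")
# prints "advertised: 400 GB, host: 64 GB, ratio: 6.25x"
```

A ratio above 1 works only because the instances do not all use their full allocation at the same time; if every instance were busy at once, each would have on average 64 / 50 = 1.28 GB of physical memory.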

Technologies & Tools

Virtualization
Linux Containers
Used to create isolated environments for testing Apache Samza applications.
Cluster Management
YARN
Manages resources and scheduling for the applications running in the LXC instances.
Network Emulation
Dummynet
Emulates network conditions such as bandwidth limits and delays for realistic testing.
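
To make the Dummynet settings concrete, the sketch below assembles `ipfw` commands for one pipe with the bandwidth, delay, and loss figures listed under Key Statistics. The pipe number, rule number, and the catch-all `from any to any` match are assumptions for illustration; the article's actual rules are not given, and a real setup would more likely match only the traffic between LXC instances.

```python
def dummynet_commands(pipe_id, bandwidth_mbps, delay_ms, loss_rate, rule_number=100):
    """Build ipfw/Dummynet commands for one shaped pipe (not executed here).

    loss_rate is a fraction between 0 and 1, as expected by the `plr` option.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be a fraction between 0 and 1")
    # Configure the pipe: bandwidth cap, added propagation delay, random loss.
    config = (f"ipfw pipe {pipe_id} config bw {bandwidth_mbps}Mbit/s "
              f"delay {delay_ms}ms plr {loss_rate}")
    # Send matching traffic through the pipe (assumed catch-all match).
    rule = f"ipfw add {rule_number} pipe {pipe_id} ip from any to any"
    return [config, rule]


if __name__ == "__main__":
    # 100 Mbps, 10 ms delay, 1% packet loss
    for line in dummynet_commands(1, 100, 10, 0.01):
        print(line)
```

Keeping the three parameters as function arguments makes it easy to rerun the same test suite under different emulated conditions.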

Key Actionable Insights

  1. Implementing a virtual private cluster using LXC can significantly reduce the overhead associated with traditional cluster setups. This approach allows developers to conduct extensive testing on a single machine, making it easier to validate features and performance before deployment.
  2. Utilizing Dummynet for network emulation can enhance the realism of testing scenarios. By simulating network conditions such as latency and packet loss, developers can better understand how their applications will perform in real-world environments.
  3. Regularly validate YARN semantics and performance before adopting new versions. This ensures that critical features like host affinity remain stable, which is essential for maintaining application reliability in production.
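
One way a host-affinity check could look is sketched below: record which host each container ran on before a restart, record it again afterwards, and compute the fraction that returned to the same host. The container IDs, host names, and function are hypothetical; how the placements are collected from YARN is left out, and this is not presented as the article's own validation method.

```python
def host_affinity_rate(before, after):
    """Fraction of containers in `before` that run on the same host in `after`.

    Both arguments map a container ID to the host it was placed on.
    """
    if not before:
        raise ValueError("no placements recorded before the restart")
    kept = sum(1 for cid, host in before.items() if after.get(cid) == host)
    return kept / len(before)


# Hypothetical placements on an emulated four-node cluster.
before = {
    "container-0": "yarn-node-01",
    "container-1": "yarn-node-02",
    "container-2": "yarn-node-03",
    "container-3": "yarn-node-01",
}
after = dict(before, **{"container-2": "yarn-node-04"})  # one container moved

rate = host_affinity_rate(before, after)
print(f"host affinity honored for {rate:.0%} of containers")
# prints "host affinity honored for 75% of containers"
```

Running a check like this against both the current and the candidate YARN version gives a simple regression signal before an upgrade.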

Common Pitfalls

  1. Failing to properly emulate network conditions can lead to unrealistic testing outcomes. Without accurate network emulation, the testing may not reflect real-world scenarios, resulting in unexpected application behavior during production.
  2. Overlooking the importance of validating YARN semantics before upgrades. Neglecting this validation can lead to regressions in critical features, impacting application performance and reliability.

Related Concepts

Apache Samza
YARN
Linux Containers
Network Emulation