Overview
The article discusses how LinkedIn scaled its Salt infrastructure to support its growing needs for remote execution jobs, achieving a tenfold increase in job capacity and improved reliability. It details the architectural changes made, including the introduction of new products and a restructured Salt ecosystem.
What You'll Learn
1. How to scale Salt infrastructure for remote execution jobs
2. Why using a master-minion architecture is beneficial for task automation
3. How to implement custom modules for enhanced functionality in Salt
Prerequisites & Requirements
- Understanding of Salt architecture and its components
- Familiarity with Python and REST APIs (optional)
Key Questions Answered
How did LinkedIn scale its Salt infrastructure for remote execution?
LinkedIn scaled its Salt infrastructure by restructuring its architecture, introducing multiple li-salt-master instances, and integrating custom modules to enhance performance. This allowed the system to handle over 15,000 remote execution jobs daily across its server fleet, significantly improving reliability and scalability.
What challenges did LinkedIn face with its previous Salt setup?
The previous Salt setup faced challenges such as high load on a single master handling over 60,000 minions, leading to downtime and operational inefficiencies. Issues included poor code coverage, manual failover management, and complex configurations that hindered performance.
What are the new products developed for Salt at LinkedIn?
LinkedIn developed five new Python multiproducts, including li-salt-master for orchestrating minions and exposing REST APIs, and li-minion, an installable agent that configures itself on hosts. These products enhance the Salt ecosystem's functionality and security.
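The answers above describe moving from one overloaded master to multiple li-salt-master instances. The summary does not say how minions are mapped to masters, so the following is only a minimal sketch of one possible approach: hashing each minion ID to pick a master. The master names and the `assign_master` and `load_per_master` helpers are hypothetical and are not part of Salt or of LinkedIn's products.

```python
import hashlib
from collections import Counter
from typing import Iterable, Sequence

# Hypothetical master names, for illustration only.
MASTERS = ["salt-master-1", "salt-master-2", "salt-master-3", "salt-master-4"]


def assign_master(minion_id: str, masters: Sequence[str]) -> str:
    """Pick a master for a minion by hashing its ID.

    The same minion ID always maps to the same master as long as the
    list of masters does not change.
    """
    if not masters:
        raise ValueError("at least one master is required")
    digest = hashlib.sha256(minion_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(masters)
    return masters[index]


def load_per_master(minion_ids: Iterable[str], masters: Sequence[str]) -> Counter:
    """Count how many minions land on each master."""
    return Counter(assign_master(m, masters) for m in minion_ids)


if __name__ == "__main__":
    # Simulated fleet roughly the size of the old single-master setup.
    fleet = [f"host-{i:05d}.example.com" for i in range(65000)]
    for master, count in sorted(load_per_master(fleet, MASTERS).items()):
        print(master, count)
```

Simple modulo hashing reassigns most minions whenever the number of masters changes; a consistent-hashing ring would reduce that churn at the cost of extra code.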
Key Statistics & Figures
- Increase in remote execution jobs: 10x. Achieved by scaling the Salt infrastructure to support more jobs with improved reliability.
- Number of remote jobs executed daily: over 15,000. The new architecture supports executing over 15,000 remote jobs across LinkedIn's fleet of servers.
- Minions per master in old setup: nearly 65,000. The previous single master setup managed nearly 65,000 minions, leading to performance issues.
Technologies & Tools
- Salt (automation tool): Used for automating operational tasks at various infrastructure layers.
- Python (programming language): Used for developing custom modules and multiproducts in the Salt ecosystem.
- Apache Kafka (streaming platform): Used for streaming logs and metrics for monitoring and analysis.
Key Actionable Insights
1. Integrate Salt with existing CI/CD pipelines to streamline deployment workflows. By leveraging Salt's capabilities within CI/CD processes, teams can automate configuration management and deployment tasks, reducing manual errors and increasing deployment speed.
2. Implement custom modules to enhance Salt's functionality for specific use cases. Custom modules can address unique operational challenges and improve the overall performance of the Salt infrastructure, making it more adaptable to LinkedIn's evolving needs.
3. Monitor Salt performance metrics using a centralized logging system. Utilizing tools like Apache Kafka for log streaming allows for real-time monitoring and analysis of Salt operations, enabling proactive issue resolution.
Common Pitfalls
1. Overloading a single Salt master with too many minions can lead to performance degradation. This occurs when the master cannot handle the load, resulting in downtime and operational challenges. Distributing the load across multiple masters can mitigate this issue.
2. Neglecting security measures for client modules can expose vulnerabilities. Without proper security checks and module ownership, there is a risk of executing unsafe code. Implementing strict ACLs and security audits can help ensure safe operations.
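The second pitfall recommends strict ACLs. As a rough illustration of the idea, the sketch below implements a default-deny allowlist that maps each user to the Salt-style function names they may run. The `ACL` table, the user names, and the `is_allowed` and `authorize` helpers are all hypothetical. Salt has its own `publisher_acl` master setting for a similar purpose, and this sketch does not reproduce its exact matching rules.

```python
from fnmatch import fnmatchcase

# Hypothetical ACL: user -> glob patterns of permitted functions.
ACL = {
    "deploy-bot": ["test.ping", "state.apply", "pkg.*"],
    "oncall": ["test.*", "service.status", "service.restart"],
}


def is_allowed(user, function, acl=ACL):
    """Return True only if `function` matches one of the user's patterns.

    Unknown users have no patterns, so they are denied by default.
    """
    patterns = acl.get(user, [])
    return any(fnmatchcase(function, pattern) for pattern in patterns)


def authorize(user, function, acl=ACL):
    """Raise PermissionError unless the user may run the function."""
    if not is_allowed(user, function, acl):
        raise PermissionError(f"{user!r} may not run {function!r}")


if __name__ == "__main__":
    authorize("deploy-bot", "pkg.install")  # permitted by "pkg.*"
    try:
        authorize("oncall", "cmd.run")  # not in the oncall allowlist
    except PermissionError as exc:
        print("denied:", exc)
```

Keeping the check default-deny means that a newly added module is unreachable until someone explicitly grants access to it, which pairs naturally with assigning each module an owner.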
Related Concepts
Salt Architecture And Design Patterns
Remote Execution Strategies
CI/CD Integration Best Practices