Overview
The article discusses the implementation of Operating System Snapshot Automation (OSSA) at LinkedIn, detailing the challenges faced with manual processes and the transition to an automated system. It highlights the benefits of improved operational excellence, data accessibility via RESTful APIs, and enhanced monitoring and validation of OS snapshots.
What You'll Learn
1. How to automate the creation and validation of OS snapshots
2. Why implementing a RESTful API improves data accessibility
3. How to ensure high availability for APIs in a distributed system
Prerequisites & Requirements
- Understanding of operating systems and server management
- Familiarity with RESTful APIs and HTTP methods (optional)
Key Questions Answered
What is an OS snapshot and how is it created?
An OS snapshot is a collection of bootfiles, RPMs, and metadata that represents a specific state of an operating system. It is created by gathering the latest RPMs and bundling them into an immutable entity, ensuring consistent installations across environments.
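The idea of bundling bootfiles, RPMs, and metadata into an immutable entity can be sketched as a frozen record. This is a minimal illustration, not OSSA's actual data model: the class name `OSSnapshot`, its fields, and the helper `build_snapshot` are all hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Iterable, Mapping, Tuple


@dataclass(frozen=True)
class OSSnapshot:
    """Hypothetical snapshot record; field names are illustrative only."""
    name: str
    bootfiles: Tuple[str, ...]
    rpms: Tuple[str, ...]
    metadata: Mapping[str, str]


def build_snapshot(name: str,
                   bootfiles: Iterable[str],
                   rpms: Iterable[str],
                   metadata: Mapping[str, str]) -> OSSnapshot:
    # Copy the inputs into read-only containers so that later changes to
    # the caller's lists or dicts cannot alter the snapshot.
    return OSSnapshot(
        name=name,
        bootfiles=tuple(sorted(bootfiles)),
        rpms=tuple(sorted(rpms)),
        metadata=MappingProxyType(dict(metadata)),
    )
```

Once built, attempts to reassign a field or edit the metadata mapping raise an error, which mirrors the requirement that a snapshot stays the same across every environment that installs from it.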
How does OSSA improve the OS snapshot process?
OSSA automates the OS snapshot creation and validation process, transitioning from a manual method reliant on shell scripts to a highly available and monitored system. This enhances operational excellence and reduces the need for dedicated engineer time.
What challenges did LinkedIn face before implementing OSSA?
Before OSSA, the OS snapshot process was manual, tied to one-off scripts, and lacked a programmatic way to reference metadata. This led to issues with maintainability, scalability, and high availability, requiring significant human effort for snapshot management.
How does OSSA ensure data integrity and monitoring?
OSSA implements monitoring for missing RPMs and metadata changes, using HMAC-SHA256 for data integrity checks. Scheduled tasks notify engineers of discrepancies, ensuring that snapshots remain immutable and consistent throughout their lifecycle.
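The two checks described here, an HMAC-SHA256 integrity check on metadata and detection of missing RPMs, can be sketched with Python's standard library. The function names, the JSON serialization of metadata, and the key handling are assumptions made for illustration; the source does not describe how OSSA serializes metadata or manages keys.

```python
import hashlib
import hmac
import json
from typing import Iterable, List, Mapping


def sign_metadata(metadata: Mapping[str, str], key: bytes) -> str:
    # Serialize with sorted keys so identical metadata always yields
    # identical bytes, and therefore an identical digest.
    payload = json.dumps(dict(metadata), sort_keys=True,
                         separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_metadata(metadata: Mapping[str, str], key: bytes,
                    expected_digest: str) -> bool:
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(sign_metadata(metadata, key), expected_digest)


def missing_rpms(expected: Iterable[str], present: Iterable[str]) -> List[str]:
    # RPMs listed in the snapshot but absent from storage.
    return sorted(set(expected) - set(present))
```

A scheduled task could call `verify_metadata` and `missing_rpms` for each snapshot and notify engineers when verification fails or the missing list is non-empty.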
Key Statistics & Figures
- Storage reduction from purging snapshots: 500 GB. OSSA's first iteration of the purging process removed this amount of redundant data.
Technologies & Tools
Backend
- RESTful API: Used to expose OS snapshot data for easier access and integration with external teams.
Security
- Datavault: Utilized for token-based authorization to secure API access.
Data Management
- GoldenGate Replication: Employed for replicating data between different data centers to ensure consistency.
Infrastructure
- MaaS (Metal as a Service): Used for triggering reimages of servers during the OS snapshot validation process.
Storage
- Ambry: Used for storing and verifying the integrity of snapshot metadata.
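To illustrate how a client might query snapshot data through a token-protected RESTful API, the sketch below only constructs the HTTP request and does not send it. The URL layout (`/snapshots/<name>`), the `Bearer` header scheme, and the function name are hypothetical; the source does not document OSSA's endpoints or how Datavault tokens are presented.

```python
from urllib.parse import quote
from urllib.request import Request


def build_snapshot_request(base_url: str, snapshot_name: str,
                           token: str) -> Request:
    # Hypothetical endpoint layout: <base_url>/snapshots/<name>.
    url = f"{base_url.rstrip('/')}/snapshots/{quote(snapshot_name, safe='')}"
    return Request(
        url,
        headers={
            # Assumed header scheme for token-based authorization.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call against a real service.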
Key Actionable Insights
1. Implementing a RESTful API can significantly enhance data accessibility and integration with external systems. By exposing OS snapshot data through an API, teams can easily query and validate snapshots, improving collaboration and reducing reliance on manual tracking methods.
2. Automating the OS snapshot validation process can free up engineering resources and improve operational efficiency. Transitioning from manual validation to automated boot-testing allows for quicker iterations and more frequent updates, ensuring that servers are always running the latest, secure OS versions.
3. Regularly purging redundant or expired snapshots can help manage storage costs and improve system performance. By implementing a systematic approach to snapshot deprecation, OSSA has successfully reduced storage usage by approximately 500 GB, demonstrating the importance of maintaining a clean and efficient data environment.
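A systematic purge of expired snapshots can be sketched as a selection step that runs before any deletion. The retention rule used here (older than a cutoff and not currently in use) is an assumption for illustration; the source does not state the criteria OSSA applies when deprecating snapshots.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Set


class SnapshotRecord(NamedTuple):
    """Hypothetical record describing a stored snapshot."""
    name: str
    created: datetime


def select_purgeable(snapshots: Iterable[SnapshotRecord],
                     now: datetime,
                     max_age_days: int,
                     in_use: Set[str]) -> List[str]:
    # A snapshot qualifies for purging only if it is older than the
    # cutoff and no server is still installed from it.
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        s.name for s in snapshots
        if s.created < cutoff and s.name not in in_use
    )
```

Keeping selection separate from deletion lets the candidate list be reviewed or logged before any data is removed.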
Common Pitfalls
1. Relying on manual processes for OS snapshot management can lead to inefficiencies and increased risk of errors. Manual processes are often slow and prone to human error, which can compromise the integrity and security of the OS snapshots. Automating these processes mitigates these risks.
Related Concepts
Automation In Infrastructure Management
Operating System Lifecycle Management
Data Integrity And Monitoring Techniques