A crash course in LinkedIn's global site operations

Greg Leffler
6 min read, intermediate

Overview

The article provides an overview of LinkedIn's global site operations, highlighting the technology and tools that enable remote collaboration among Site Reliability Engineers (SREs) across various locations. It discusses the importance of tools like IRC, BlueJeans, inFormed, and EKG in maintaining operational efficiency and communication.

What You'll Learn

1. How to use IRC for effective team communication
2. Why video conferencing tools are essential for remote teams
3. How to monitor site performance using inGraphs
4. When to deploy new code using canary testing

Key Questions Answered

What tools does LinkedIn use for remote collaboration?
LinkedIn employs several tools for remote collaboration, including IRC for chat, BlueJeans for video conferencing, and inFormed for monitoring site changes. These tools help maintain effective communication and operational oversight across global teams.
How does LinkedIn ensure effective monitoring of site performance?
LinkedIn uses inFormed to track site activities, including deployments and feature releases. This tool provides a comprehensive feed of changes, allowing SREs to quickly identify issues and understand what has changed when problems arise.
What is the purpose of canary testing at LinkedIn?
Canary testing at LinkedIn involves deploying new code to a single host to compare its performance against control hosts. This method helps identify potential issues before a full rollout, ensuring stability and reliability in production environments.
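
To make the canary-versus-control comparison concrete, here is a minimal Python sketch. It is not LinkedIn's EKG implementation: the function name, the metric names, the 10% tolerance, and the assumption that a higher value is worse for every metric are all hypothetical choices for illustration.

```python
from statistics import mean


def compare_canary(
    canary: dict[str, float],
    controls: list[dict[str, float]],
    tolerance: float = 0.10,  # hypothetical threshold: flag anything >10% worse
) -> dict[str, float]:
    """Return metrics where the canary is worse than the control average.

    Assumes "higher is worse" for every metric (latency, error rate, ...).
    The result maps metric name to relative increase over the baseline.
    """
    regressions: dict[str, float] = {}
    for metric, canary_value in canary.items():
        # Collect the same metric from every control host that reports it.
        control_values = [c[metric] for c in controls if metric in c]
        if not control_values:
            continue  # nothing to compare against
        baseline = mean(control_values)
        if baseline == 0:
            continue  # relative change is undefined; skip in this sketch
        relative_change = (canary_value - baseline) / baseline
        if relative_change > tolerance:
            regressions[metric] = relative_change
    return regressions


if __name__ == "__main__":
    canary_host = {"latency_ms": 130.0, "error_rate": 0.01}
    control_hosts = [
        {"latency_ms": 100.0, "error_rate": 0.01},
        {"latency_ms": 100.0, "error_rate": 0.01},
    ]
    print(compare_canary(canary_host, control_hosts))
```

An empty result would suggest the canary is behaving like the control hosts. A non-empty result names the metrics worth inspecting before a wider rollout.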

Technologies & Tools

Communication
IRC
Used for team chat and discussion archiving.
Video Conferencing
BlueJeans
Facilitates remote meetings and interviews.
Monitoring
inFormed
Tracks site changes and deployments.
Analytics
inGraphs
Provides performance dashboards and alerting.
Deployment
CRT (Change Request Tracker)
Monitors code status and supports continuous deployment.
Performance Monitoring
EKG
Compares performance metrics between canary and control deployments.
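
As a rough illustration of how a change feed like the one inFormed provides can help during an incident, the Python sketch below narrows a list of change events to those that landed shortly before a problem was noticed. The `ChangeEvent` type, its fields, and the one-hour window are hypothetical; the source does not describe inFormed's data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record for one entry in a site change feed."""
    timestamp: datetime
    kind: str          # e.g. "deployment" or "feature_release"
    description: str


def recent_changes(
    events: list[ChangeEvent],
    incident_time: datetime,
    window: timedelta = timedelta(hours=1),  # assumed look-back period
) -> list[ChangeEvent]:
    """Return events inside the look-back window, newest first."""
    start = incident_time - window
    matches = [e for e in events if start <= e.timestamp <= incident_time]
    return sorted(matches, key=lambda e: e.timestamp, reverse=True)


if __name__ == "__main__":
    noticed_at = datetime(2024, 1, 1, 12, 0)
    feed = [
        ChangeEvent(noticed_at - timedelta(minutes=10), "deployment", "service A rollout"),
        ChangeEvent(noticed_at - timedelta(hours=3), "feature_release", "feature B enabled"),
        ChangeEvent(noticed_at - timedelta(minutes=40), "deployment", "service C rollout"),
    ]
    for event in recent_changes(feed, noticed_at):
        print(event.timestamp.isoformat(), event.kind, event.description)
```

Sorting newest first puts the most recent change, often the first suspect, at the top of the list.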

Key Actionable Insights

1. Utilize IRC for team communication to enhance collaboration and documentation.
IRC allows for real-time communication and archiving of discussions, making it easier for team members to catch up on conversations and decisions made, especially in a remote work environment.
2. Implement video conferencing tools like BlueJeans for remote meetings.
Video conferencing helps bridge the gap between remote and in-office team members, facilitating better engagement during meetings and interviews, which is crucial for team cohesion.
3. Leverage inGraphs for performance monitoring and alerting.
Using inGraphs allows teams to visualize site performance metrics and set alerts, ensuring that they are promptly notified of any issues that may affect site reliability.
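
The alerting idea can be sketched generically in Python. This is not how inGraphs is implemented or configured, which the source does not describe. It is a minimal threshold rule, with the function name and the "three consecutive samples" default chosen as assumptions to avoid alerting on a single noisy data point.

```python
def should_alert(samples: list[float], threshold: float, consecutive: int = 3) -> bool:
    """Return True if `consecutive` samples in a row exceed `threshold`.

    Hypothetical rule for illustration: requiring a run of breaches
    filters out one-off spikes in a metric series.
    """
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


if __name__ == "__main__":
    latency_ms = [120.0, 510.0, 530.0, 560.0, 140.0]
    print(should_alert(latency_ms, threshold=500.0))
```

Raising `consecutive` trades slower detection for fewer false alarms; the right balance depends on how noisy the metric is.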

Common Pitfalls

1. Neglecting to utilize effective communication tools can lead to disconnection among remote teams.
Without tools like IRC or video conferencing, team members may miss critical updates or discussions, leading to misalignment and inefficiencies.

Related Concepts

Remote Collaboration Tools
Site Reliability Engineering Practices
Continuous Deployment Strategies